
[cqlsh 5.0.1 | Cassandra 2.1.0 | CQL spec 3.2.0 | Native protocol v3]

table:
CREATE TABLE dc.event (
    id timeuuid PRIMARY KEY,
    name text
) WITH bloom_filter_fp_chance = 0.01;

How do I get a time range of data from Cassandra?

For example, when I try 'select * from event where id > maxTimeuuid('2014-11-01 00:05+0000') and id < minTimeuuid('2014-11-02 10:00+0000')', as described here: http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/timeuuid_functions_r.html

I get the following error: 'code=2200 [Invalid query] message="Only EQ and IN relation are supported on the partition key (unless you use the token() function)"'

Can I keep timeuuid as primary key and meet the requirement?

Thanks

Casey Rule
Johnny Li

5 Answers


Can I keep timeuuid as primary key and meet the requirement?

Not really, no. From http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html

WHERE clauses can include greater-than and less-than comparisons, but for a given partition key, the conditions on the clustering column are restricted to the filters that allow Cassandra to select a contiguous ordering of rows.

You could try adding "ALLOW FILTERING" to your query... but I doubt that would work. And I don't know of a good way (nor do I believe there is one) to tokenize the timeuuids. I'm about 99% sure the ordering from the partitioner would yield unexpected, bad results, even though the query itself would execute and appear correct until you dug into it.
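For concreteness, the ALLOW FILTERING attempt would look like this (a sketch only; as noted, I doubt Cassandra will accept a range on the partition key even with filtering enabled):

    SELECT * FROM dc.event
    WHERE id > maxTimeuuid('2014-11-01 00:05+0000')
      AND id < minTimeuuid('2014-11-02 10:00+0000')
    ALLOW FILTERING;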

As an aside, you should really check out a similar question that was asked about a year ago: time series data, selecting range with maxTimeuuid/minTimeuuid in cassandra

Evan Volgas

Short answer: no. Long answer: you can do something similar, e.g.:

CREATE TABLE dc.event (
    event_time timestamp,
    id timeuuid,
    name text,
    PRIMARY KEY (event_time, id)
) WITH bloom_filter_fp_chance = 0.01;

The timestamp would presumably be truncated so that it only reflects a whole day (or hour, or minute, depending on the velocity of your data). Your where clause would then include an "IN" restriction listing the timestamp buckets that fall within your timeuuid range.

If you use an appropriate chunking factor (how much you truncate your timestamp), you may even answer some of the questions you're looking for without using a range of timeuuids, just a simple where clause.
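With the table above, a time-range query becomes an IN on the partition-key buckets plus a range on the clustering column. A sketch, assuming daily truncation (so the bucket values are midnight timestamps):

    SELECT * FROM dc.event
    WHERE event_time IN ('2014-11-01 00:00+0000', '2014-11-02 00:00+0000')
      AND id > maxTimeuuid('2014-11-01 00:05+0000')
      AND id < minTimeuuid('2014-11-02 10:00+0000');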

Essentially this gives you the leeway to make the kind of query you're looking for while respecting Cassandra's restrictions. As Raedwald pointed out, you can't use the partition key in continuous ranges because Cassandra is, at bottom, a large hash map. That said, Cassandra is well known for doing some incredibly powerful things with time-series data.

mildewey

Take a look at how Newts is doing time series for ranges. The author has a great set of slides and a talk describing the data model to get precisely what you seem to be looking for. https://github.com/OpenNMS/newts/

mshuler

Cassandra cannot do this kind of query because Cassandra is a key-value store implemented as a giant hash map, not a relational database. Just like with an in-memory hash map, the only way to find the keys within a sub-range is to iterate through all the keys. That can be expensive enough for an in-memory hash map, but for Cassandra it would be crippling.
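To illustrate: the only server-side "range" available on a partition key is over its hashed token, which has no relation to time order. A sketch (the timeuuid literal here is just a placeholder value):

    -- Executes, but walks partitions in token (hash) order, not chronological order:
    SELECT * FROM dc.event
    WHERE token(id) > token(a4a70900-24e1-11df-8924-001ff3591711)
    LIMIT 100;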

Raedwald

Yes, you can do it by using Spark with Scala and the spark-cassandra-connector!

I think you should keep the number of distinct partition keys small by setting them to 'YYYY-MM-dd hh:00+0000' and filtering on dates and hours only.

Then you could use something like:

// the partition key is now the truncated timestamp string, per the bucketing above
case class TableKey(event_time: String)

val dates = Array("2014-11-02 10:00+0000", "2014-11-02 11:00+0000", "2014-11-02 12:00+0000")
val selected_data = sc.parallelize(dates)
  .map(d => TableKey(d))
  .joinWithCassandraTable("dc", "event")

And there you have your selected data rdd that you could collect:

val data = selected_data.collect

I had a similar problem...

Reshef