Cassandra Data Modelling and designing the Clustering

Question

I am a little confused about designing a data model for Cassandra, coming from a SQL background. I have gone through the DataStax documentation several times to understand Cassandra, but I am stuck on the problem below and not sure how to overcome it or which type of data model I should opt for.

The primary key along with clustering is explained really well here. The documentation says that the primary key (partition key, clustering keys) is the most important thing in the data model.

My use-case is pretty simple:

ITEM_ID    CREATED_ON     MOVED_FROM     MOVED_TO   COMMENT

ITEM_ID will be unique (the partition key), and each item might have 10-20 movement records. I want to get the movement records of an item sorted by the time they were created, so I decided to go with CREATED_ON as the clustering key.

As I read the documentation, a clustering key falls under secondary indexes, whose values should repeat as much as possible, unlike the partition key. My data model fails exactly here! How do I preserve order using clustering to achieve the same?

Obviously I can't put ID generation logic in the application, since it runs on many instances, and if I have to rely on such logic, the purpose of using Cassandra is defeated.

score 3 · Accepted Answer · answered Jan 19 '15 at 17:44


You do not actually need a secondary index for this particular example, and secondary indexes are not created by default. Your clustering key by itself will allow you to run queries that look like

SELECT * from TABLE where ITEM_ID = SOMETHING;

This automatically gives you back results sorted on your clustering key, CREATED_ON.

The reason for this is that your primary key creates partitions that internally look like

ITEM_ID => [Row with first Created_ON], [Row with second Created_ON] ...
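The layout above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions, not how Cassandra is implemented and not a driver API: `ToyTable`, `insert`, and `select` are hypothetical names, and the model sorts on read, whereas Cassandra keeps rows ordered by clustering key within the partition. It also shows that writing the same (ITEM_ID, CREATED_ON) pair twice replaces the earlier row, since the full primary key identifies a row.

```python
from collections import defaultdict
from datetime import datetime


class ToyTable:
    """Toy model of a table with PRIMARY KEY (item_id, created_on).

    Hypothetical illustration only: item_id plays the partition key,
    created_on plays the clustering key.
    """

    def __init__(self):
        # partition key -> {clustering key -> row}
        self._partitions = defaultdict(dict)

    def insert(self, item_id, created_on, moved_from, moved_to, comment=""):
        # The same (item_id, created_on) pair overwrites the earlier row,
        # mirroring upsert behaviour on a full primary key.
        self._partitions[item_id][created_on] = {
            "item_id": item_id,
            "created_on": created_on,
            "moved_from": moved_from,
            "moved_to": moved_to,
            "comment": comment,
        }

    def select(self, item_id):
        # Look up one partition and return its rows ordered by clustering key.
        rows = self._partitions.get(item_id, {})
        return [rows[key] for key in sorted(rows)]


if __name__ == "__main__":
    table = ToyTable()
    # Inserted out of order on purpose.
    table.insert("item-1", datetime(2015, 1, 19, 12, 0), "B", "C")
    table.insert("item-1", datetime(2015, 1, 19, 9, 0), "A", "B")
    table.insert("item-2", datetime(2015, 1, 18, 8, 0), "X", "Y")

    for row in table.select("item-1"):
        print(row["created_on"], row["moved_from"], "->", row["moved_to"])
```

No secondary index appears anywhere in the model: the ordering comes purely from how rows are keyed inside each partition.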

RussS


Doesn't a clustering column use a secondary index? Or am I missing something? `primary key(itemid,created_on)` – Reddy Jan 19 '15 at 17:48
It does not use a secondary index. A secondary index is a completely different concept. – RussS Jan 19 '15 at 17:50
This post and http://stackoverflow.com/questions/18168379/cassandra-choosing-a-partition-key seem to contradict each other? :-/ A bit confused. – Reddy Jan 19 '15 at 18:05
Nope, I suggest you read through that answer again. Specifically: "In the primary key we have these components: `PRIMARY KEY(partitioning key, clustering key_1 ... clustering key_n)`". The PRIMARY KEY is made up of a partitioning key and 0 or more clustering keys. These define the on-disk layout of the data in the Cassandra table and are completely separate from the idea of secondary indexes. – RussS Jan 19 '15 at 19:07
