I store my data in a single Cassandra 2.0.10 table. There is one column (named score
), type of integer, can take any values. I need to write a background job that would assign a value to another column, rank
, giving value 1 for the row with the highest value in the score field, value 2 for the next to highest and so on. The row with the smallest score
value must get the the total row count assigned to the rank
. It is currently defined in CQL as
CREATE TABLE players
(user int, rank int, score int, details blob, PRIMARY KEY(user))
Bet it something like PostgreSQL, I would do something like
select id, rank from players order by score desc offset A limit 100;
using increasing values for A and this way iterating the database in pages of size 100. It would give me top 100 players in one query, top 100 to 200 in the second, etc. Then I can fire update statements by id, one by one or in batches.
When I try to do the same in Cassandra CQL, turns out that many needed features are not supported (no order, not offset, no clear way how to visit all rows). I tried to build the index for the score column but this was not helpful.
This rank assignment is a helper job. It is no problem for it to take days or even weeks to iterate. It is ok to have it slightly inconsistent as scores may change while the job is running. It is not the main feature of the application. The main features do not use ranges queries and Cassandra works well there.
Is it possible to implement this rank assignment combining Java and CQL somehow or the limitations are severe enough I need to use a different database engine?