1

Suppose I have a RethinkDB table with type and timestamp fields. The type can be either "good" or "bad". I would like to write a RethinkDB query which gets the timestamp of the most recent "good" document while making use of a compound index with type and timestamp.

Here is an example script with one solution:

import faker
import rethinkdb as r
import dateutil.parser
import dateutil.tz

fake = faker.Faker()
fake.seed(0)            # Seed the Faker() for reproducible results

conn = r.connect('localhost', 28016)    # The RethinkDB server needs to have been launched with 'rethinkdb --port-offset 1' at the command line

# Create and clear a table
table_name = 'foo'  # Arbitrary table name
if table_name not in r.table_list().run(conn):
    r.table_create(table_name).run(conn)
r.table(table_name).delete().run(conn)      # Start on a clean slate

# Create fake data and insert it into the table
N = 5       # Half the number of fake documents
good_documents = [{'type':'good', 'timestamp': dateutil.parser.parse(fake.time()).replace(tzinfo=dateutil.tz.tzutc())} for _ in range(N)]
bad_documents = [{'type':'bad', 'timestamp': dateutil.parser.parse(fake.time()).replace(tzinfo=dateutil.tz.tzutc())} for _ in range(N)]
documents = good_documents + bad_documents
r.table(table_name).insert(documents).run(conn)

# Create compound index with 'type' and 'timestamp' fields
if 'type_timestamp' not in r.table(table_name).index_list().run(conn):
    r.table(table_name).index_create("type_timestamp", [r.row["type"], r.row["timestamp"]]).run(conn)
    r.table(table_name).index_wait("type_timestamp").run(conn)

# Get the latest 'good' timestamp in Python
good_documents = [doc for doc in documents if doc['type'] == "good"]
latest_good_timestamp_Python = max(good_documents, key=lambda doc: doc['timestamp'])['timestamp']

# Get the latest 'good' timestamp in RethinkDB
cursor = r.table(table_name).between(["good", r.minval], ["good", r.maxval], index="type_timestamp").order_by(index=r.desc("type_timestamp")).limit(1).run(conn)
document = next(cursor)
latest_good_timestamp_RethinkDB = document['timestamp']

# Assert that the Python and RethinkDB 'queries' return the same thing
assert latest_good_timestamp_Python == latest_good_timestamp_RethinkDB

Prior to running this script, I launched RethinkDB at port 28016 using the command

rethinkdb --port-offset 1

I also use the faker package to generate fake data.

The query which I use, which combines between, order_by, and limit, does not seem particularly elegant or succinct, and I was wondering whether it is possible to use max for this purpose. However, it is not immediately clear to me from the documentation (https://www.rethinkdb.com/api/python/max/) how to do this. Any ideas?

Kurt Peek
  • 34,968
  • 53
  • 191
  • 361

1 Answers1

1

Ideally, you could replace this part of your query:

.order_by(index=r.desc("type_timestamp")).limit(1)

With:

.max(index="type_timestamp")

However that isn't currently possible. See https://github.com/rethinkdb/rethinkdb/issues/5141

AtnNn
  • 5,686
  • 1
  • 23
  • 29
  • Unfortunately this gives a `rethinkdb.errors.ReqlQueryLogicError: Expected type TABLE but found TABLE_SLICE: SELECTION ON table(foo) in: r.table('foo').between(['good', r.minval], ['good', r.maxval], index='type_timestamp').max(index='type_timestamp')`. – Kurt Peek Mar 14 '17 at 10:14
  • My suggestion indeed does not work. I've updated the answer – AtnNn Mar 14 '17 at 10:25