rethinkdb: secondary compound indexes / aggregation queries and intermediate documents generation

Question

Assume a table in which each product_id has one row per update: one or more rows while status==1 (published), then a row when it becomes status==0 (unpublished), and finally a row when it becomes status==2 (deleted):

{id: <auto>, product_id: 1, last_updated: 2015-12-1, status: 1, price: 1}
{id: <auto>, product_id: 2, last_updated: 2015-12-1, status: 1, price: 10}
{id: <auto>, product_id: 1, last_updated: 2015-12-2, status: 1, price: 2}
{id: <auto>, product_id: 1, last_updated: 2015-12-3, status: 0, price: 2}
{id: <auto>, product_id: 2, last_updated: 2015-12-2, status: 0, price: 10}
{id: <auto>, product_id: 3, last_updated: 2015-12-2, status: 1, price: 123}
{id: <auto>, product_id: 1, last_updated: 2015-12-4, status: 2, price: 2}
{id: <auto>, product_id: 2, last_updated: 2015-12-4, status: 2, price: 10}

Now I am trying to find a way, maybe using a secondary compound index, to get, for a given date (as in the first column, built with r.time), the ids of the documents in each status. For example:

DATE       STATUS==1    STATUS==0    STATUS==2  

2015-12-1  [101, 102]   []           []
2015-12-2  [103, 106]   [105]        []
2015-12-3  [106]        [104, 105]   []
2015-12-4  []           []           [107, 108]

The difficulty is that, for a given date, the status of a product_id is the one in its most recent document whose last_updated date is less than or equal to that date.

I tried grouping by product_id, taking max('last_updated'), and then keeping each group's single resulting document only if status==1.
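To make the intended rule concrete, here is a minimal in-memory sketch in plain JavaScript (not ReQL) of "group by product_id, keep the latest row on or before the date, then bucket by status". The ids 101 to 108 are assumed stand-ins for the auto-generated ids of the eight sample rows, in order, the dates are assumed to be zero-padded ISO strings so that string comparison is chronological, and statusSnapshot is a hypothetical helper name.

```javascript
// Sample rows from the question. Ids 101..108 are assumed stand-ins for the
// auto-generated ids; dates are zero-padded ISO strings (an assumption) so
// that plain string comparison orders them chronologically.
const rows = [
  { id: 101, product_id: 1, last_updated: '2015-12-01', status: 1, price: 1 },
  { id: 102, product_id: 2, last_updated: '2015-12-01', status: 1, price: 10 },
  { id: 103, product_id: 1, last_updated: '2015-12-02', status: 1, price: 2 },
  { id: 104, product_id: 1, last_updated: '2015-12-03', status: 0, price: 2 },
  { id: 105, product_id: 2, last_updated: '2015-12-02', status: 0, price: 10 },
  { id: 106, product_id: 3, last_updated: '2015-12-02', status: 1, price: 123 },
  { id: 107, product_id: 1, last_updated: '2015-12-04', status: 2, price: 2 },
  { id: 108, product_id: 2, last_updated: '2015-12-04', status: 2, price: 10 },
];

// Hypothetical helper: for each product, keep the latest row whose
// last_updated is <= asOf, then bucket the surviving row ids by status.
function statusSnapshot(rows, asOf) {
  const latest = new Map(); // product_id -> latest row seen so far
  for (const row of rows) {
    if (row.last_updated > asOf) continue; // row is newer than the given date
    const current = latest.get(row.product_id);
    if (!current || row.last_updated > current.last_updated) {
      latest.set(row.product_id, row);
    }
  }
  const byStatus = { 0: [], 1: [], 2: [] };
  for (const row of latest.values()) {
    byStatus[row.status].push(row.id);
  }
  // Sort ids so the output does not depend on input order.
  for (const ids of Object.values(byStatus)) ids.sort((a, b) => a - b);
  return byStatus;
}

console.log(statusSnapshot(rows, '2015-12-03'));
// → { '0': [ 104, 105 ], '1': [ 106 ], '2': [] }
```

This is only a reference for what the query should return for one date; it scans every row, so it says nothing about how to index the table efficiently.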

One idea I have in mind is to create an index for each status / given date combination.

Another solution would be to store the result of an aggregation in a second table, with a single document per date that lists the ids of all the original documents matching the criteria for that date. I could then join against these intermediate records to fetch the values of each product_id at a given date/status.

Something like:

{
    date: <date_object>,
    documents: [
        {id: document_id, status: 1},
        {id: document_id, status: 1},
        {id: document_id, status: 2},
        {id: document_id, status: 0},
        ...
    ]
}
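As a rough illustration of how such intermediate documents could be generated, here is an in-memory sketch in plain JavaScript (not ReQL) that emits one document per distinct date in the shape above. The ids 101 to 108, the zero-padded ISO date strings, and the helper name buildSnapshots are assumptions made for the example; in practice the resulting documents would be inserted into the second table.

```javascript
// Compact copy of the question's sample rows as
// [id, product_id, last_updated, status]. Ids 101..108 are assumed stand-ins
// for the auto-generated ids; dates are zero-padded ISO strings (assumption).
const rows = [
  [101, 1, '2015-12-01', 1],
  [102, 2, '2015-12-01', 1],
  [103, 1, '2015-12-02', 1],
  [104, 1, '2015-12-03', 0],
  [105, 2, '2015-12-02', 0],
  [106, 3, '2015-12-02', 1],
  [107, 1, '2015-12-04', 2],
  [108, 2, '2015-12-04', 2],
].map(([id, product_id, last_updated, status]) => ({ id, product_id, last_updated, status }));

// Hypothetical helper: returns one snapshot document per distinct date,
// listing the latest document (id and status) of every product as of that date.
function buildSnapshots(rows) {
  // Process rows in chronological order so later rows overwrite earlier ones.
  const sorted = [...rows].sort((a, b) => a.last_updated.localeCompare(b.last_updated));
  const dates = [...new Set(sorted.map((row) => row.last_updated))];
  const latest = new Map(); // product_id -> latest row up to the current date
  let i = 0;
  return dates.map((date) => {
    // Fold in every row dated on or before this date.
    while (i < sorted.length && sorted[i].last_updated <= date) {
      latest.set(sorted[i].product_id, sorted[i]);
      i += 1;
    }
    const documents = [...latest.values()]
      .map(({ id, status }) => ({ id, status }))
      .sort((a, b) => a.id - b.id);
    return { date, documents };
  });
}

const snapshots = buildSnapshots(rows);
console.log(JSON.stringify(snapshots[2]));
// → {"date":"2015-12-03","documents":[{"id":104,"status":0},{"id":105,"status":0},{"id":106,"status":1}]}
```

Because each snapshot carries forward the previous one, a single chronological pass is enough; the trade-off is that every snapshot repeats the ids of products that did not change on that date.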

Please advise

Edit 1:

Here is an example of a query I run to analyse my data. This one gives an overview of the statuses for each group containing more than one document:

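r.db('test').table('products_10k_sample')
  // group on the secondary index named 'product_id'
  .group({index: 'product_id'})
  // within each group, newest document first
  .orderBy(r.desc('last_updated'))
  // turn the groups into {group, reduction} documents
  .ungroup()
  .map(function(x){
    return r.branch(
      // only describe products that have more than one document
      x('reduction').count().gt(1),
      x('reduction').map(function(m){
        return [m('last_updated').toISO8601(), m('status'), m('product_id')];
      }),
      // single-document groups yield null
      null
    );
  })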

Hi, due to my English, I wasn't able to understand this. Could you make your question a bit clearer and easier to understand, with some sample data and the expected output? You already did that in your question, but somehow I couldn't quite get it, probably because of my English. — kureikain, Jan 11 '16 at 18:43
