Let's assume such table content where for the same product_id, we have as many rows than updates during status==1 (published) and finally status==0 (unpublished) and then becomes==2 (deleted)
{id: <auto>, product_id: 1, last_updated: 2015-12-1, status: 1, price: 1}
{id: <auto>, product_id: 2, last_updated: 2015-12-1, status: 1, price: 10}
{id: <auto>, product_id: 1, last_updated: 2015-12-2, status: 1, price: 2}
{id: <auto>, product_id: 1, last_updated: 2015-12-3, status: 0, price: 2}
{id: <auto>, product_id: 2, last_updated: 2015-12-2, status: 0, price: 10}
{id: <auto>, product_id: 3, last_updated: 2015-12-2, status: 1, price: 123}
{id: <auto>, product_id: 1, last_updated: 2015-12-4, status: 2, price: 2}
{id: <auto>, product_id: 2, last_updated: 2015-12-4, status: 2, price: 10}
Now, I am trying to find a way, maybe using a secondary compound index, do get for example, given a date like in col1 (using r.time)
DATE STATUS==1 STATUS==0 STATUS==2
2015-12-1 [101, 102] [] []
2015-12-2 [103, 106] [105] []
2015-12-3 [106] [104, 105] []
2015-12-4 [] [] [107, 108]
The difficulty here, is that a product_id document is still to be considered as the most recent status as long as its last_updated date is less or equal to the provided date.
I try by grouping by product_id, then take the max('last_updated'), then only keep each reduction unique document if status==1
I have in mind to have an index for each status / given_date
Or another solution, would be to insert in another table the result of an aggregation which would only store a unique document per date, containing all the initial documents ids matching the same criteria, and so on... And then later perform joins using these intermediate records to fetch the values of each product_id at the given date/status.
something like:
{
date: <date_object>,
documents: [
{id: document_id, status: 1},
{id: document_id, status: 1},
{id: document_id, status: 2},
{id: document_id, status: 0},
...
]
}
Please advise
Edit 1:
This is an example of a query I try to run to analyse my data, here it is for example to get an overview of the statuses for each group with more than 1 document:
r.db('test').table('products_10k_sample')
.group({index: 'product_id'})
.orderBy(r.desc('last_updated'))
.ungroup()
.map(function(x){
return r.branch(
x('reduction').count().gt(1),
x('reduction').map(function(m){
return [m('last_updated').toISO8601(), m('status'), m('product_id')]
}),
null
)
})