1

Let's say I have a table with documents like:

{
  "country": 1,
  "merchant": 2
  "product": 123,
  ...
}

Is it possible to group all the documents into a final json structure like:

[
  {
    <country_id>: {
      <merchant_id>: {
        <product_id>: <# docs with this product id/merchant_id/country_id>,
        ... (other product_id and so on)
      },
      ... (other merchant_id_id and so on)
  },
  ... (other country_id and so on)
]

And if yes, what would be the best and most efficient way? I have more than a million of these documents, on 4 shards with powerful servers (22 Gb cache each)

I have tried this (in the data explorer, in JS, for the moment):

r.db('foo')
  .table('bar')
  .indexCreate('test1', function(d){
    return [d('country'), d('merchant'), d('product')]
  })

and then

r.db('foo')
  .table('bar')
  .group({index: 'test1'})

But the data explorer seems to hang, still working on it as you can see...

paranoid
  • 5,453
  • 14
  • 38
  • 70
DevLounge
  • 7,345
  • 2
  • 26
  • 39

1 Answers1

0

.group({index: 'test1'}).count() will do something pretty similar to what you want, except it won't produce the nested document structure. To produce the nested document structure it would probably be easiest to ungroup, then map over the ungrouped values to produce objects of the form you want, then merge all of them.

The problem with group queries on the whole table though is that they won't stream, you'll need to traverse the whole table to get the end result back. The data explorer is meant for small queries, and I think it times out if your query takes more than 5 minutes to return, so if you're traversing a giant table then it would probably be better to run that query from one of the clients.

mlucy
  • 5,046
  • 1
  • 14
  • 20