Find documents not referenced by another table

Question

Kind of a "not_eq_join", if that makes sense.

A have a table A with documents containing a _key field (it's the primary key), and another table B that has a refs field that points to A._key, this is btw a multi index so it may contain multiple references to documents in A. A is quite large, however B is not (probably less than 1000)

Entry in A:

{
  '_key': 'doc1',
  ...
}

Entry in B:

{
  'refs': ['doc1'],
  ...
}

The question is; how do I find documents in A that is not referenced by any document in B ?

My naive solution would be to retrieve all unique values from the B.refs multi index (but how to do that?) to an array variable, and then do something like: .filter(refArray.contains(R.row['_key']).not_()) on the A selection (filtered by other criteria as well).

But this seems wrong, any better ideas?

Of course, I'd like to do this "as quickly as possible" :)

score 1 · Accepted Answer · answered Oct 13 '15 at 18:29

1

Assuming that the number of documents referenced by B is small, your solution is probably the best one.

You can get the distinct values in an index (including multi-indexes) by writing r.table('B').distinct({index: 'refs'}).

If the the number of documents referenced by B is large compared to the size of A, this gets a lot harder to do efficiently.

answered Oct 13 '15 at 18:29

mlucy

5,046
1
14
20

I see. I didn't know `distinct()` could be used for that, nice! I'll accept this as answer since I sneaked in a second question. – pythonator Oct 14 '15 at 06:24

Find documents not referenced by another table

1 Answers1