Google Cloud Bigtable: query partial keys

Question

So if I have the following data in Bigtable:

DEL_6878 .....
DEL_6879 .....
BOM_5876 .....
SFO_8686 .....
SFO_8687 .....

How do I query for, say, SFO* records? I have read the documentation and I know how to get a single row, with something similar to this:

table.get("SFO_8686");

Or how to get a range, with something like getRows("SFO_8686", "SFO_8687"), which takes a startKey and an endKey. But the documentation led me to believe that one can also get records that start with a prefix, as in the SFO* example. How do I do that?

score 3 · Answer 1 · edited Jul 20 '18 at 03:25

I would think that running a Scan with a range is your most efficient option. You can also use a scan with org.apache.hadoop.hbase.filter.RowFilter with a RegexStringComparator.

answered Jul 20 '16 at 14:26 by Solomon Duskis · edited Jul 20 '18 at 03:25 by Misha Brukman

The documentation specifically says I can query with SFO* kind of operations. IF the key is made up of multiple data points, AND at query time I only know the value of part of the key, the SFO part, then how do I even formulate start and end? – Amit Jul 21 '16 at 11:30
Just for my own clarity, which documentation are you referring to? – Solomon Duskis Jul 21 '16 at 15:28
Oh.. you mean "SFO_" is an obvious start, but it's unclear what the end key should be? Maybe byte[] bytes = "SFO_".getBytes(); bytes[bytes.length-1] = (byte) (bytes[bytes.length-1] + 1); I'm not sure off hand. – Solomon Duskis Jul 21 '16 at 15:34
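
The end-key idea in the comment above can be sketched in plain Java. This is only an illustration, assuming row keys compare as unsigned bytes and that the scan's end key is exclusive; the class and method names (PrefixRange, exclusiveEndKey) are made up for this example and are not part of any Bigtable or HBase API. Unlike the one-liner in the comment, it also handles a prefix that ends in 0xFF, where simply adding 1 would overflow.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PrefixRange {
    /**
     * Returns the smallest key that sorts after every key starting with prefix.
     * Returns an empty array when no such key exists (the prefix is empty or
     * consists only of 0xFF bytes), which a caller would treat as "no end key".
     */
    public static byte[] exclusiveEndKey(byte[] prefix) {
        // Trailing 0xFF bytes cannot be incremented, so walk back past them.
        int i = prefix.length - 1;
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return new byte[0];
        }
        // Keep everything up to and including position i, then bump that byte.
        byte[] end = Arrays.copyOf(prefix, i + 1);
        end[i]++;
        return end;
    }

    public static void main(String[] args) {
        byte[] start = "SFO_".getBytes(StandardCharsets.UTF_8);
        byte[] end = exclusiveEndKey(start);
        // '_' is 0x5F, so the end key ends in 0x60, the backtick character.
        System.out.println(new String(end, StandardCharsets.UTF_8)); // prints SFO`
    }
}
```

With start = "SFO_" and this end key, the range covers exactly the keys that begin with "SFO_".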
Let's take a specific example from the documentation: https://cloud.google.com/bigtable/docs/schema-design-time-series In the smart-meter example, read this section: "Taking the queries together, the row key will need to be of the form DATE + METER ...... To query all meters for a given day, retrieve a range of rows using DATE" So how does one keep a key of DATE+METER and then query for all rows for a day, like 01July2016, across meters? With a query like 01Jul2016*? A query example in the documentation showing how to go about it is the thing I am after. – Amit Jul 22 '16 at 07:35
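
To make the DATE + METER question concrete, here is a rough sketch that uses a sorted in-memory map as a stand-in for the table. Everything in it is an assumption made for illustration: the '#' separator, the fixed-width yyyymmdd date, the meter ids, the readings, and the names MeterKeys, rowKey and keysForDay. It is not Bigtable client code; it only shows how a day's worth of rows becomes one contiguous key range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class MeterKeys {
    // Hypothetical key layout: fixed-width date, then '#', then the meter id.
    public static String rowKey(String yyyymmdd, String meterId) {
        return yyyymmdd + "#" + meterId;
    }

    // All keys for one day: the range [date + "#", date + "$").
    // '$' (0x24) is the character right after '#' (0x23), so it bounds the prefix.
    public static List<String> keysForDay(TreeMap<String, Double> rows, String yyyymmdd) {
        return new ArrayList<>(rows.subMap(yyyymmdd + "#", true, yyyymmdd + "$", false).keySet());
    }

    // Made-up sample readings, sorted by key like rows in the table.
    public static TreeMap<String, Double> sampleReadings() {
        TreeMap<String, Double> rows = new TreeMap<>();
        rows.put(rowKey("20160630", "meter-1"), 1.0);
        rows.put(rowKey("20160701", "meter-1"), 2.0);
        rows.put(rowKey("20160701", "meter-2"), 3.0);
        rows.put(rowKey("20160702", "meter-1"), 4.0);
        return rows;
    }

    public static void main(String[] args) {
        // Prints [20160701#meter-1, 20160701#meter-2]
        System.out.println(keysForDay(sampleReadings(), "20160701"));
    }
}
```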
Your easiest solution is likely to be a scan with a PrefixFilter. I'll make a note to beef up our examples for the time series use cases. – Solomon Duskis Jul 22 '16 at 15:48
Wouldn't a scan do a full table scan? Can't I do startKey = SFOa, endKey = SFOz? Assuming that the key parts after SFO are letters, wouldn't that mimic SFO*? The documentation does state that the rows are stored in lexicographic order. – Amit Jul 22 '16 at 16:04
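
One caveat with the SFOa to SFOz idea, shown as a small sketch: it only works if every character after SFO really is a lowercase letter. With the keys from the question, the next character is '_' (0x5F), which sorts before 'a' (0x61), so that range matches nothing. The example below uses a TreeMap as a stand-in for the sorted table (the keys are ASCII, so Java's String ordering agrees with byte ordering here); the class and method names are invented for illustration and this is not Bigtable client code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class LexicographicRangeDemo {
    // Stand-in for a table whose rows are kept sorted by key.
    public static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> rows = new TreeMap<>();
        for (String key : new String[] {"DEL_6878", "DEL_6879", "BOM_5876", "SFO_8686", "SFO_8687"}) {
            rows.put(key, ".....");
        }
        return rows;
    }

    // Keys in [start, end): start inclusive, end exclusive.
    public static List<String> scan(TreeMap<String, String> rows, String start, String end) {
        return new ArrayList<>(rows.subMap(start, true, end, false).keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> rows = sampleTable();
        // '_' sorts before 'a', so both SFO_ rows fall outside this range.
        System.out.println(scan(rows, "SFOa", "SFOz")); // prints []
        // "SFP" is the first key after everything that starts with "SFO".
        System.out.println(scan(rows, "SFO", "SFP"));   // prints [SFO_8686, SFO_8687]
    }
}
```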
You can definitely do a scan with a start and end key. I don't know off hand about whether or not it does a full table scan. – Solomon Duskis Jul 22 '16 at 20:20
The documentation has left anything more than trivial as "figure it out". A fit case to be put on SO Documentation. – Amit Jul 25 '16 at 07:00
Add to that the scant community on SO, and Google's claim in the help documents that Bigtable questions are actively answered on SO. I am not getting good vibes about using the system. – Amit Jul 25 '16 at 13:16
I sent your feedback to the team. – Solomon Duskis Jul 25 '16 at 16:39

score 3 · Answer 2 · edited May 23 '17 at 12:00

In my experience, PrefixFilter works well for partial row-key *-style scans and from what I managed to dig out, setting the start and end rows in addition to that should improve performance (presumably by avoiding the full scan):

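import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

// rowKey holds the known part of the key, e.g. "SFO"
PrefixFilter px = new PrefixFilter(Bytes.toBytes(rowKey));
Scan s = new Scan();
s.setStartRow(Bytes.toBytes(rowKey)); // begin at the first key that could match
s.setFilter(px);                      // keep only rows that start with the prefix
...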

Also, from what I understand from this discussion, "HBase (Easy): How to Perform Range Prefix Scan in hbase shell", in the shell environment ROWPREFIXFILTER is meant to combine the two elements above:

scan 'TableName', {ROWPREFIXFILTER => 'SFO'}

But I have not managed to find a Java equivalent of that, if that's what you are after. It would be helpful to hear if others have!
