5

So if I have the following data in Bigtable:

DEL_6878 .....
DEL_6879 .....
BOM_5876 .....
SFO_8686 .....
SFO_8687 .....

How do I query for, say, SFO* records? I read the documentation and I know how to get a single row, with something similar to this:

table.get("SFO_8686");

I also know how to get a range, with something like getRows("SFO_8686", "SFO_8687"), which takes a startKey and an endKey. But the documentation led me to believe that one can also get records that start with a given prefix, as in the SFO* example. How do I do that?

Misha Brukman
Amit

2 Answers

3

I would think that running a Scan with a range is your most efficient option. You can also use a scan with org.apache.hadoop.hbase.filter.RowFilter with a RegexStringComparator.
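To illustrate why a range scan can stand in for SFO*, here is a minimal sketch that uses a sorted in-memory map as a stand-in for the table. It is not Bigtable or HBase client code: `PrefixRangeDemo`, `sampleTable` and `scanRange` are hypothetical names made up for the example, and it assumes ASCII keys, so that Java's String ordering matches the byte-wise lexicographic order of the row keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class PrefixRangeDemo {
    // Stand-in for a table whose rows are kept sorted by row key.
    public static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> table = new TreeMap<>();
        String[] keys = {"DEL_6878", "DEL_6879", "BOM_5876", "SFO_8686", "SFO_8687"};
        for (String key : keys) {
            table.put(key, ".....");
        }
        return table;
    }

    // Returns the keys with startKey <= key < stopKey, in sorted order.
    public static List<String> scanRange(TreeMap<String, String> table,
                                         String startKey, String stopKey) {
        return new ArrayList<>(table.subMap(startKey, true, stopKey, false).keySet());
    }

    public static void main(String[] args) {
        // "SFP" is the prefix "SFO" with its last character incremented.
        System.out.println(scanRange(sampleTable(), "SFO", "SFP"));
        // prints [SFO_8686, SFO_8687]
    }
}
```

Because the keys are sorted, every key that starts with SFO falls between "SFO" (inclusive) and "SFP" (exclusive), so only that slice has to be read.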

Misha Brukman
Solomon Duskis
  • The documentation specifically says I can query with SFO* kind of operations. IF the key is made up of multiple data points, AND at query time I only know the value of part of the key, the SFO part, then how do I even formulate start and end? – Amit Jul 21 '16 at 11:30
  • Just for my own clarity, which documentation are you referring to? – Solomon Duskis Jul 21 '16 at 15:28
  • Oh.. you mean "SFO_" is an obvious start, but it's unclear what the end key should be? Maybe byte[] bytes = "SFO_".getBytes(); bytes[bytes.length-1] = (byte) (bytes[bytes.length-1] + 1); I'm not sure off hand. – Solomon Duskis Jul 21 '16 at 15:34
  • Let's take specific example from documentation: https://cloud.google.com/bigtable/docs/schema-design-time-series In the document example for smart meters, read this section: "Taking the queries together, the row key will need to be of the form DATE + METER ...... To query all meters for a given day, retrieve a range of rows using DATE" So how does one keep a key of DATE+METER, and then query for all rows for a day, like 01July2016, across meters. Query like 01Jul2016*? If in the documentation one can have a query example on how to go about it, then that's the thing I am after. – Amit Jul 22 '16 at 07:35
  • Your easiest solution is likely to be a scan with a PrefixFilter. I'll make a note to beef up our examples for the time series use cases. – Solomon Duskis Jul 22 '16 at 15:48
  • Wouldn't scan do full table scan? Can't I do startKey = SFOa, endKey=SFOz? Assuming that the key parts after SFO are alphabets, wouldn't that mimic SFO*? The documentation does state the rows are stored in lexicographic order. – Amit Jul 22 '16 at 16:04
  • You can definitely do a scan with a start and end key. I don't know off hand about whether or not it does a full table scan. – Solomon Duskis Jul 22 '16 at 20:20
  • The documentation has left anything more than trivial to "figure it out". Fit case to be put on SO documentation – Amit Jul 25 '16 at 07:00
  • Add to that scant community on SO, AND Google claim in the help documents that the BigTable questions are actively answered on SO - I am not getting good vibes on using the system. – Amit Jul 25 '16 at 13:16
  • I sent your feedback to the team. – Solomon Duskis Jul 25 '16 at 16:39
3

In my experience, PrefixFilter works well for partial row-key (*-style) scans. From what I managed to dig out, setting the start and end rows in addition to the filter should improve performance, presumably by avoiding a full scan:

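import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

// rowKey holds the key prefix to match, e.g. "SFO"
PrefixFilter px = new PrefixFilter(Bytes.toBytes(rowKey));
Scan s = new Scan();
s.setStartRow(Bytes.toBytes(rowKey)); // begin at the first key that could match the prefix
s.setFilter(px);                      // keep only rows whose key starts with the prefix
...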

Also, from what I understand from this discussion, "HBase (Easy): How to Perform Range Prefix Scan in hbase shell", in the shell environment ROWPREFIXFILTER is meant to combine the two elements above:

scan 'TableName', {ROWPREFIXFILTER => 'SFO'}

But I have not managed to find a Java equivalent of that, if that's what you are after. It would be helpful to hear if others have!
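One way to get a bounded range from Java without relying on a filter is to compute the stop row from the prefix yourself, along the lines of the "increment the last byte" idea from the comments on the other answer. The sketch below is plain Java, not HBase API: `PrefixStopKey` and `stopKeyFor` are hypothetical names, and treating an empty stop row as "scan to the end of the table" is an assumption about the client that you should verify. It may also be worth checking whether your HBase client version offers `Scan.setRowPrefixFilter(byte[])`, which as far as I know sets both bounds for a prefix.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixStopKey {
    // Returns the smallest key that sorts after every key starting with prefix.
    // Returns an empty array when no such key exists (prefix is empty or all 0xFF);
    // the caller is assumed to treat that as "no upper bound".
    public static byte[] stopKeyFor(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // safe: this byte is not 0xFF
        return stop;
    }

    public static void main(String[] args) {
        byte[] stop = stopKeyFor("SFO".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // prints SFP
    }
}
```

The prefix itself would then be the start row and the returned key the exclusive stop row. This avoids hand-picked bounds such as SFOa to SFOz, which depend on the key alphabet: '_' (0x5F) sorts before 'a' (0x61), so a start key of SFOa would skip SFO_8686.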

Community
VS_FF