4

I am trying to find out whether it is possible to set a TTL on an individual row in HBase or Bigtable.

I know that Cassandra allows setting a TTL at insert time. I want to find out whether the same is possible in HBase and in Google Cloud Bigtable.

INSERT INTO test (k,v) VALUES ('test', 1) USING TTL 10;
Maxim

2 Answers

4

There is no native support for fine-grained TTLs in Bigtable. But there are two common ways to simulate them, with different tradeoffs:

  • If you're setting up a new empty table and plan to set a TTL with every write, you can set max_age on your column families to something very small, say 1s, and explicitly set the write timestamp for each new value to the time at which you want it to expire.
    • Pro: This approach makes things easy to understand, as the timestamp has an obvious semantic meaning and no munging is required.
    • Con: If you ever forget to set a TTL and instead use the default server timestamp, that data will be considered expired right away and will be dropped on the next compaction.
    • Con: The same also holds if you try to apply this to a pre-existing table: any existing data using real timestamps will be dropped.
    • Con: It's not possible to have multiple values for any given cell which expire at the same time.
  • If you want a default TTL of X which can then be overridden, set max_age to X on your column families as normal. Writes can then adjust their TTL to Y by setting their timestamp to (real_timestamp - X + Y).
    • Pro: This approach can safely be applied to a pre-existing table.
    • Pro: There are no pitfalls if you forget to set a TTL.
    • Con: Timestamps cannot be interpreted at all, as any given cell might have a real timestamp or it might have a simulated TTL override timestamp.
    • Con: Related to the above, there could be unexpected timestamp clashes between values with default and overridden TTLs that are written (Y-X) apart.

Remember as always that Bigtable garbage collection is asynchronous, so values will not disappear immediately after their TTL. If you don't want to read TTL'd values you'll need to send an appropriate time range with your read requests. In the first approach, this would be anything later than now. In the second, it would be anything later than (now - X).
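The timestamp arithmetic for both approaches can be sketched as follows. This is a minimal illustration rather than Bigtable client code: the class and method names are hypothetical, and it assumes all values are in microseconds, the unit Bigtable uses for cell timestamps. The "live" check mirrors the read-side time range described above.

```java
// Hypothetical helper; names are illustrative and not part of any Bigtable API.
// All arguments and results are assumed to be in microseconds.
final class TtlTimestamps {

    // Approach 1: max_age is tiny, so the cell timestamp is the desired expiry time.
    static long expiryAsTimestamp(long nowMicros, long ttlMicros) {
        return nowMicros + ttlMicros;
    }

    // Approach 2: max_age is the default TTL X; shift the timestamp so that
    // the cell ages out after Y instead: real_timestamp - X + Y.
    static long overrideTimestamp(long nowMicros, long defaultTtlMicros, long ttlMicros) {
        return nowMicros - defaultTtlMicros + ttlMicros;
    }

    // Read-side lower bound for approach 1: anything later than now.
    static long readLowerBoundApproach1(long nowMicros) {
        return nowMicros;
    }

    // Read-side lower bound for approach 2: anything later than (now - X).
    static long readLowerBoundApproach2(long nowMicros, long defaultTtlMicros) {
        return nowMicros - defaultTtlMicros;
    }

    // A cell should be treated as unexpired only if its timestamp is later than the bound.
    static boolean isLive(long cellTimestampMicros, long lowerBoundMicros) {
        return cellTimestampMicros > lowerBoundMicros;
    }
}
```

For example, with a default TTL X of one day and an override Y of 10 seconds, a cell written at time t gets the timestamp t - X + Y, and it stops passing the approach-2 read bound once more than 10 seconds have elapsed, regardless of when garbage collection actually removes it.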

Both of these approaches also sacrifice all the useful properties of having a real timestamp attached to the values, including debuggability and easy versioning. You can regain some of this by writing the real timestamp to a separate column yourself, but in general it means these work best when you're also only keeping the most recent value.

0

I have not used or tested the following myself, as I never needed it, but have a look:

At the level of an individual mutation (i.e. a single Put to one row), try the following, which takes the TTL in milliseconds:

Put.setTTL(long)

To apply this at the column family level for a given table, try the following when creating the table, which takes the TTL in seconds:

ColumnFamilyDescriptorBuilder.setTimeToLive(int)

Based on my experience with other HBase functionality that follows the same pattern, I would imagine that you can set a global/default TTL for the given column family at table creation time, and then adjust it at the individual Put level if needed, as shown above. Note that, according to the HBase reference guide, a cell-level TTL can shorten but not extend the lifetime set by the column family TTL.

The above are in Java, but you can do this from the HBase shell as well, when inserting rows or creating a new table manually.

VS_FF