9

I thought that regardless of whether a NoSQL aggregate store is a key-value, column-family or document database, it would support versioning of values. After a bit of Googling, I'm concluding that this assumption is wrong and that it just depends on the DBMS implementation. Is this true?

I know that Cassandra and BigTable support it (both column-family stores). It SEEMS that HBase (column-family) and Riak (key-value) do, but Redis and Hadoop (key-value) do not. Couchbase does but MongoDB does not (document stores). I don't see any pattern here. Is there a rule of thumb? (For example, "key-value stores generally do not have versioning, while column-family and document databases do.")

What I'm trying to do: I want to create a database of website screenshots, mapping each URL to a PNG image. I'd rather use a key-value store since, versioning aside, it is the simplest solution that satisfies the problem. But when a website changes or is decommissioned and I update my database, I don't want to lose old images. Even if I select a key-value database that has versioning, I want the luxury of switching to a different key-value database without being constrained by the fact that many key-value DBs do not support versioning. So I'm trying to understand at what level of sophistication in the continuum of aggregate NoSQL databases versioning becomes a feature implicit to the data model.

Sridhar Sarnobat

2 Answers

10

You don't really need versioning support from the Key-Value store.

The only thing you really need from the data store is an efficient scan/range-query feature.

This means the datastore can retrieve entries in lexicographical order.

Most KV-stores do, so this is easy.

This is how you do it:

  1. Create versioned keys.

    In case you can't hash the original name to a fixed length, prepend the length of the original key. Then put in the hash of the key or the original key itself, and end with a fixed-length encoded version number (inverted against the maximum version, so that keys sort lexicographically from highest version to lowest).

  2. Query

    Do a range query from the maximum possible version down to version 0, but retrieve exactly one key. Because higher versions sort first, that one key is the latest version.

Done
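The two steps above can be sketched in Python. This is only an illustration under stated assumptions: `SortedStore` is a toy in-memory stand-in for a real KV store with ordered scans, and `MAX_VERSION`, `LENGTH_DIGITS`, `versioned_key` and `latest` are hypothetical names and limits chosen for the example, not part of any particular database.

```python
import bisect

# Assumed limits for this sketch; pick values that fit your own data.
MAX_VERSION = 2**32 - 1   # highest version number we allow
VERSION_DIGITS = 10       # number of decimal digits in MAX_VERSION
LENGTH_DIGITS = 6         # key names are assumed shorter than 10**6 characters


def versioned_key(name: str, version: int) -> str:
    """Build <length><name><inverted version>, all fixed-width where needed."""
    inverted = MAX_VERSION - version  # higher version -> smaller suffix
    return f"{len(name):0{LENGTH_DIGITS}d}{name}{inverted:0{VERSION_DIGITS}d}"


class SortedStore:
    """Toy stand-in for a KV store that can scan keys in lexicographic order."""

    def __init__(self):
        self._keys = []     # kept sorted
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start, end, limit):
        """Return up to `limit` (key, value) pairs with start <= key <= end."""
        i = bisect.bisect_left(self._keys, start)
        out = []
        while i < len(self._keys) and self._keys[i] <= end and len(out) < limit:
            out.append((self._keys[i], self._values[self._keys[i]]))
            i += 1
        return out


def latest(store, name):
    """Range query from the max version to version 0, fetching one entry."""
    start = versioned_key(name, MAX_VERSION)  # sorts first
    end = versioned_key(name, 0)              # sorts last
    hits = store.scan(start, end, limit=1)
    return hits[0][1] if hits else None
```

For example, after storing versions 1, 2 and 3 under `versioned_key("http://example.com", v)`, `latest(store, "http://example.com")` returns the value stored for version 3, and older versions remain reachable by scanning further.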

If you don't need explicit version numbers, you can use a timestamp instead, so you can insert without first looking up the latest version.
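A minimal Python sketch of the timestamp variant, assuming nanosecond timestamps and a 6-digit length prefix. `timestamp_key`, `MAX_TS` and `TS_DIGITS` are hypothetical names for this example. Note that two writes to the same name within the same nanosecond would produce the same key, so a real setup might need a tie-breaker.

```python
import time

# Assumed bounds for this sketch.
MAX_TS = 2**63 - 1   # largest nanosecond timestamp we allow
TS_DIGITS = 19       # number of decimal digits in MAX_TS


def timestamp_key(name, ts_ns=None):
    """Build <length><name><inverted timestamp>; newer entries sort first."""
    if ts_ns is None:
        ts_ns = time.time_ns()  # no lookup of the previous version needed
    return f"{len(name):06d}{name}{MAX_TS - ts_ns:0{TS_DIGITS}d}"
```

Since the timestamp is inverted, the key written most recently is the first one a lexicographic range scan returns for a given name.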

sleeplessnerd
  • Hmmmm, thanks for the suggestion. I guess this is quite analogous to what I want and can be generalized to any KV store. I can see it coming back to haunt me, but I have no good case for it being an undesirable approach that I can articulate. – Sridhar Sarnobat Oct 28 '14 at 20:01
5

A really interesting approach to this is the Datomic database. Rather than storing versions, Datomic has no updates, only inserts. The entire database is immutable, meaning that when you connect you can specify the moment in time at which you want to see the database, and the entire history will appear to contain only the changes made up to that point. Or, to think of it another way, anything inserted into the database can be queried for its history looking backward. You can also branch the database and create data in one branch that isn't in the other (in programming terms, it is like a database based on git, where multiple histories can be created).
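To make the insert-only idea concrete, here is a toy Python sketch. It is not Datomic's API and does not cover branching; `AppendOnlyStore`, `as_of` and `history` are hypothetical names that only illustrate how an append-only log lets you view the data as of an earlier point and look back at a key's history.

```python
class AppendOnlyStore:
    """Toy append-only store: entries are only ever added, never changed."""

    def __init__(self):
        self._log = []  # list of (tx, key, value) in insertion order

    def insert(self, key, value):
        """Append a new fact and return its transaction number."""
        tx = len(self._log) + 1
        self._log.append((tx, key, value))
        return tx

    def as_of(self, tx):
        """Return the key -> value view as it stood after transaction `tx`."""
        view = {}
        for t, k, v in self._log:
            if t > tx:
                break
            view[k] = v  # later inserts shadow earlier ones in the view
        return view

    def history(self, key):
        """Return every (tx, value) ever inserted for `key`, oldest first."""
        return [(t, v) for t, k, v in self._log if k == key]
```

Inserting a new screenshot for a URL never destroys the old one: `as_of` with an earlier transaction number still shows the previous image, and `history` lists all of them.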

Jason Sperske