my 2 pence worth. A bit longing but ...... I had a similar requirement in one of my incubation projects. Similar to yours , my key requirements where a document database ( xml in my case),with document versioning. It was for a multi-user system with a lot of collaboration use cases. My preference was to use available opensource solutions that support most of the key requirements.
To cut to the chase, I could not find any one product that provided both, in a way that was scalable enough ( number of users, usage volumes, storage and compute resources).I was biased towards git for all the promising capability, and (probable) solutions one could craft out of it. As I toyed with git option more, moving from a single user perspective to a multi ( milli) user perspective became an obvious challenge. Unfortunately, I did not get to do substantial performance analysis like you did. ( .. lazy/ quit early ....for version 2, mantra) Power to you!. Anyway, my biased idea has since morphed to the next (still biased ) alternative: a mesh-up of tools that are the best in their separate spheres, databases and version control.
While still work in progress ( ...and slightly neglected ) the morphed version is simply this .
- on the frontend: (userfacing ) use a database for the 1st level
storage ( interfacing with user applications )
- on the backend,
use a version control system (VCS)(like git ) to perform
versioning of the data objects in database
In essence it would amount to adding a version control plugin to the database, with some integration glue, which you may have to develop, but may be a lot much easier.
How it would (supposed to ) work is that the primary multi-user interface data exchanges are through the database. The DBMS will handle all the fun and complex issues such as multi-user , concurrency e, atomic operations etc. On the backend the VCS would perform version control on a single set of data objects ( no concurrency, or multi-user issues). For each effective transactions on the database, version control is only performed on the data records that would have effectively changed.
As for the interfacing glue, it will be in the form of a simple interworking function between the database and the VCS. In terms of design, as simple approach would be an event driven interface, with data updates from the database triggering the version control procedures ( hint : assuming Mysql, use of triggers and sys_exec() blah blah ...) .In terms of implementation complexity, it will range from the simple and effective ( eg scripting ) to the complex and wonderful ( some programmed connector interface) . All depends on how crazy you want to go with it , and how much sweat capital you are willing to spend. I reckon simple scripting should do the magic. And to access the end result, the various data versions, a simple alternative is to populate a clone of the database ( more a clone of the database structure) with the data referenced by the version tag/id/hash in the VCS. again this bit will be a simple query/translate/map job of an interface.
There are still some challenges and unknowns to be dealt with, but I suppose the impact, and relevance of most of these will largely depend on your application requirements and use cases. Some may just end up being non issues. Some of the issues include performance matching between the 2 key modules, the database and the VCS, for an application with high frequency data update activity, Scaling of resources (storage and processing power ) over time on the git side as the data , and users grow: steady, exponential or eventually plateau's
Of the cocktail above, here is what I'm currently brewing
- using Git for the VCS ( initially considered good old CVS for the due to the use of only changesets or deltas between 2 version )
- using mysql ( due to the highly structured nature of my data, xml with strict xml schemas )
- toying around with MongoDB (to try a NoSQl database, which closely matches the native database structure used in git )
Some fun facts
- git actually does clear things to optimize storage, such as compression, and storage of only deltas between revision of objects
- YES, git does store only changesets or deltas between revisions of data objects, where is it is applicable ( it knows when and how) . Reference : packfiles, deep in the guts of Git internals
- Review of the git's object storage ( content-addressable filesystem), shows stricking similarities ( from the concept perspective) with noSQL databases such mongoDB. Again, at the expense of sweat capital, it may provide more interesting possibilities for integrating the 2, and performance tweaking
If you got this far, let me if the above may be applicable to your case, and assuming it would be , how it would square up to some of the aspect in your last comprehensive performance analysis