10

I want to store a blog post in a database. I thought it would be nice to have different versions of that data, much like version controlling does for text files.

So, I imagine it working like a row in a table, that had version control. So, for example, you could retrieve the latest version of that row, or a previous version. You could even branch from that row.

Does anything like this exist?

Possibly useful info: I am currently using Python, Django & MySQL. I'm experimenting with MongoDB

Edit for clarity/more context: I'm looking for a solution more tailored towards "version control" of rows than of databases; I'm not so much interested in branching entire databases. For example, I would be able to query the content of blog post at 1/1/2011 and at 1/1/2010 (without switching databases).

Community
  • 1
  • 1
cammil
  • 8,112
  • 14
  • 49
  • 82
  • Have you considered using a version control system like git? Would be interesting to see pros and cons of such a solution. – milan Dec 15 '11 at 22:27
  • @milan - since when git versions database **records**? – N.B. Dec 28 '11 at 15:30
  • The question doesn't say *any* database record, it says blog posts, which are mostly text, so why not? – milan Dec 28 '11 at 17:23
  • Does this answer your question? [How to version control a record in a database](https://stackoverflow.com/questions/323065/how-to-version-control-a-record-in-a-database) – Marco Eckstein Nov 01 '20 at 16:44

3 Answers3

5

Version control is a complicated topic; doing it right is really challenging which is essentially why even using e.g. git can be tough. I wouldn't want to write a full blown version control system.

For simpler requirements, consider this structure, in pseudo-mongodb/JSON:

BlogPost {
    "_id": ObjectId("..."),
    "slug" : "how-to-version-my-posts",
    "author" : "cammil",
    "published" : date,
    "lastModified" : date,
    "publicVersion" : 32,
    "draftVersion" : 34,
    "teaserText" : "lorem ipsum dolor sit amet..."
}

BlogPostBody {
    "_id" : ObjectId("..."),
    "Version" : 32,
    "Text" : "lorem ipsum dolor sit amet..."
}

So the idea here is to store each version separately and a pointer to the current public version and the current version for editors, bloggers, etc.

My answer is a little MongoDB centric (because I built a MongoDB based blog engine for home use), but it should work similarly for any storage system.

Advantages:

  • No need to do MAX queries of the version number for either public or private posts
  • Does not correlate last edited to version numbers, which might not be desirable
  • Allows versioning even if a certain version is already published
  • Can fetch the teaser w/o having to fetch the entire article

Disadvantages:

  • Copies the entire text every time. Not a real concern for textual data I guess (try to type 1GB...). Will be a problem for bigger blogging sites, however. Mitigation: Compress text using deflate, delta-compression.
  • Needs to update two objects on update
mnemosyn
  • 42,213
  • 5
  • 68
  • 79
5

First off, I must say this is an interesting question.

In my line of work, I've to save versions of various user inputs. The way I do it, and by all means I don't really know if it's the right way or not, is the following:

I have a master table and revisions table. I chose these 2 names for the example's sake only.

What master does is stores the following info:

  • id (autoincrement)
  • version_id (int)

What revisions store is the following:

  • id
  • master_id
  • version_id
  • rest of the relevant data about entered entity (dates, etc)

What I achieved this way is that I had gotten an ID of, let's say a blog post. If someone edits the post, I'll store that info to revisions table. Via triggers I'm incrementing the version_id in revisions table. After that I update master table with the latest version_id number. That way I don't have to perform MAX() when I want to see what the latest version is.

That way I obtained simple, yet powerful version system of website's content. It's easy to see changes, and it's also extremely fast to obtain data if you abuse some MySQL cool features (in my actual tables, I'm abusing InnoDB's clustered primary key to the max. so the db design is slightly diff. than the one I posted here).

N.B.
  • 12,505
  • 3
  • 38
  • 50
2

OffScale DataGrove allows you to version you entire DB.

It tracks all the changes that happen to the DB and you can tag versions and switch back and forth between them. DataGrove is unique in the fact that it versions the entire DB - schema and data.

In your example - just add the row/data you want to the DB and tag a version. You'll always be able to go back to that version and even branch from it.

Taichman
  • 1,178
  • 2
  • 9
  • 21
  • 1
    From your description, DataGrove appears to be more oriented towards branching for whole databases than for individual rows. (See edit) – cammil Dec 28 '11 at 14:33