77

I've been trying to see if I can accomplish some requirements with a document based database, in this case CouchDB. Two generic requirements:

  • CRUD of entities, some of whose fields have a unique index
  • ecommerce web app like eBay (better description here).

And I'm beginning to think that a document-based database isn't the best choice to address these requirements. Furthermore, I can't imagine a use for a document-based database (maybe my imagination is too limited).

Can you explain to me if I am asking pears from an elm when I try to use a Document oriented database for these requirements?

Community
user2427

6 Answers

37

You need to think of how you approach the application in a document oriented way. If you simply try to replicate how you would model the problem in an RDBMS then you will fail. There are also different trade-offs that you might want to make. ([ed: not sure how this ties into the argument but:] Remember that CouchDB's design assumes you will have an active cluster of many nodes that could fail at any time. How is your app going to handle one of the database nodes disappearing from under it?)

One way to think about it is to imagine you didn't have any computers, just paper documents. How would you create an efficient business process using bits of paper being passed around? How can you avoid bottlenecks? What if something goes wrong?

Another angle you should think about is eventual consistency, where you will get into a consistent state eventually, but you may be inconsistent for some period of time. This is anathema in RDBMS land, but extremely common in the real world. The canonical transaction example is transferring money between bank accounts. How does this actually happen in the real world: through a single atomic transaction, or through different banks issuing credit and debit notices to each other? What happens when you write a cheque?
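The credit and debit notice idea can be sketched in a few lines. This is only an illustration under assumed names: the document shapes, `transfer_notices`, and `balances` are hypothetical, and the list called `store` stands in for a real database. Each transfer is recorded as two immutable notices, and a balance is derived by summing them, roughly the way a map/reduce view might.

```python
from collections import defaultdict


def transfer_notices(transfer_id, source, target, amount):
    """Return the debit and credit notices for one transfer (hypothetical shape)."""
    return [
        {"_id": f"{transfer_id}:debit", "account": source, "amount": -amount},
        {"_id": f"{transfer_id}:credit", "account": target, "amount": amount},
    ]


def balances(documents):
    """Sum every notice per account, much like a map/reduce view would."""
    totals = defaultdict(int)
    for doc in documents:
        totals[doc["account"]] += doc["amount"]
    return dict(totals)


debit, credit = transfer_notices("t1", "alice", "bob", 30)

# Only the debit notice has arrived so far: the books are temporarily
# inconsistent, because money has left one account but reached no other.
store = [debit]
in_flight = balances(store)

# Once the credit notice arrives, the two notices net out to zero.
store.append(credit)
settled = balances(store)
```

Nothing is ever updated in place here, so two nodes can accept notices independently and still converge on the same totals once they have exchanged documents.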

So let's look at your examples:

  • CRUD of entities with some fields that have a unique index.

If I understand this correctly in CouchDB terms, you want to have a collection of documents where some named value is guaranteed to be unique across all those documents? That case isn't generally supportable because documents may be created on different replicas.

So we need to look at the real world problem and see if we can model that. Do you really need them to be unique? Can your application handle multiple docs with the same value? Do you need to assign a unique identifier? Can you do that deterministically? A common scenario where this is required is where you need a unique sequential identifier. This is tough to solve in a replicated environment. In fact if the unique id is required to be strictly sequential with respect to time created it's impossible if you need the id straight away. You need to relax at least one of those constraints.
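One way to relax the constraints, sketched here under stated assumptions, is to give up on sequential IDs and derive the document ID deterministically from the value that must be unique. The `Replica` class is a toy in-memory stand-in for a database node, not a CouchDB client, and `APP_NAMESPACE`, `deterministic_id`, `random_id`, and `conflicting_ids` are hypothetical names. A single node can then reject a duplicate outright, and duplicates created on different replicas at least collide on the same ID, so they can be detected when the replicas are compared.

```python
import uuid

# Hypothetical namespace for this application; any fixed UUID would do.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def deterministic_id(kind, unique_value):
    """Same input always yields the same ID, on any node."""
    normalized = unique_value.strip().lower()
    return f"{kind}:{uuid.uuid5(APP_NAMESPACE, normalized)}"


def random_id(kind):
    """Random ID for documents that need identity but no uniqueness rule."""
    return f"{kind}:{uuid.uuid4()}"


class Replica:
    """Toy in-memory stand-in for one database node."""

    def __init__(self):
        self.docs = {}

    def create(self, doc):
        if doc["_id"] in self.docs:
            raise KeyError(f"document {doc['_id']} already exists")
        self.docs[doc["_id"]] = doc


def conflicting_ids(a, b):
    """IDs present on both replicas with different contents."""
    shared = a.docs.keys() & b.docs.keys()
    return sorted(i for i in shared if a.docs[i] != b.docs[i])


node_a, node_b = Replica(), Replica()
alice_id = deterministic_id("user", "Alice@Example.com ")
node_a.create({"_id": alice_id, "name": "Alice"})
# The same e-mail address registered on another node maps to the same ID.
node_b.create({"_id": deterministic_id("user", "alice@example.com"), "name": "Alice B."})
clashes = conflicting_ids(node_a, node_b)
```

The application still has to decide what to do with a clash (keep one, merge, or ask the user); the sketch only makes the duplicate visible.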

  • ecommerce web app like ebay

I'm not sure what to add here as the last comment you made on that post was to say "very useful! thanks". Was there something missing from the approach outlined there that is still causing you a problem? I thought MrKurt's answer was pretty full and I added a little enhancement that would reduce contention.

Tim Lovell-Smith
Kerr
  • How about using UUIDs for distributed, shared-nothing, globally unique identifiers? Do people commonly do this in the document database world? – Paul Legato Sep 27 '11 at 17:56
  • @Tim Lovell-Smith + kerrr +1 I like the real-world comparison with paper-based documents. :) Good point noting CouchDB requires/assumes clustering. Also a good point that consistency is not always guaranteed. For me as an RDB supporter this reads as (a rule among others, of course): "if consistency is crucial, use a relational database". Right? (Note: I'm currently starting a new project where I'd like to decide whether to use NoSQL or RDB.) – try-catch-finally Apr 18 '14 at 10:31
13

Is there a need to normalize the data?

  • Yes: Use relational.
  • No: Use document.
dacracot
  • 13
    I know you answered this a long time ago, but I thought I'd ask... When do you "need" to normalize? Isn't normalization a choice/best practice? – Matt Grande Jul 08 '09 at 14:53
  • 1
    @Matt, data normalization is just a tool. The degree to which you normalize data is a tradeoff between database design effort and consistency maintenance effort. – pyon Jan 27 '10 at 04:24
  • 5
    I wouldn't agree that this is a good way to distinguish which db model to use. Normalization is inevitable in both relational and document-based databases. My gut feeling is that the size of transactions is more likely to be a valid differentiator. – Munhitsu Jul 06 '11 at 21:09
  • What do you mean by normalization here? If I understand normalization correctly as a means to an end your answer seems incomplete... – Tim Lovell-Smith Oct 20 '13 at 04:15
  • It's the 2nd time I've read this rule of thumb (to look at the need for normalization). But for me, as an RDB supporter constantly trying to work out whether the next project should be implemented with a document-based or a relational database, this "rule" isn't helpful, because if I wanted to, I could design my RDB (very) unnormalized (and some engineers even recommend this for performance). – try-catch-finally Apr 18 '14 at 10:19
8

I am in the same boat. I am loving CouchDB at the moment, and I think that the whole functional style is great. But when exactly do we start to use them in earnest for applications? I mean, yes, we can all start to develop applications extremely quickly, cruft-free, with all those nasty hang-ups about normal form left by the wayside and no schemas. But, to coin a phrase, "we are standing on the shoulders of giants". There is a good reason to use an RDBMS, to normalise, and to use schemas. My old Oracle head is reeling thinking about data without form.

My main wow factor with CouchDB is the replication stuff and the versioning system working in tandem.

I have been racking my brain for the last month trying to grok the storage mechanisms of CouchDB. Apparently it uses B-trees but doesn't store data based on normal form. Does this mean that it is really, really smart and realises that bits of data are replicated, so let's just make a pointer to this B-tree entry?

So far I am thinking of XML documents, config files, and resource files streamed to base64 strings.

But would I use CouchDB for structural data? I don't know; any help on this is greatly appreciated.

Might be useful in storing RDF data or even free form text.

WeNeedAnswers
6

A possibility is to have a main relational database that stores definitions of items that can be retrieved by their IDs, and a document database for the descriptions and/or specifications of those items. For example, you could have a relational database with a Products table with the following fields:

  • ProductID
  • Description
  • UnitPrice
  • LotSize
  • Specifications

And that Specifications field would actually contain a reference to a document with the technical specifications of the product. This way, you have the best of both worlds.
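A minimal sketch of that split, assuming SQLite for the relational side and a plain dictionary standing in for the document database. The sample product, the `spec-1001` ID, and the `load_product` helper are all hypothetical; only the column names come from the list above.

```python
import sqlite3

# Toy document store: a dict keyed by document ID stands in for a real
# document database. The nested shape is free-form on purpose.
spec_documents = {
    "spec-1001": {"weight_kg": 1.2, "ports": ["usb-c", "hdmi"], "battery": {"wh": 56}},
}

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Products (
        ProductID      INTEGER PRIMARY KEY,
        Description    TEXT    NOT NULL,
        UnitPrice      REAL    NOT NULL,
        LotSize        INTEGER NOT NULL,
        Specifications TEXT    NOT NULL  -- ID of a document, not the document itself
    )"""
)
conn.execute(
    "INSERT INTO Products VALUES (?, ?, ?, ?, ?)",
    (1, "Laptop", 999.0, 10, "spec-1001"),
)


def load_product(product_id):
    """Join the relational row with its specification document."""
    row = conn.execute(
        "SELECT Description, UnitPrice, LotSize, Specifications "
        "FROM Products WHERE ProductID = ?",
        (product_id,),
    ).fetchone()
    if row is None:
        return None
    description, unit_price, lot_size, spec_id = row
    return {
        "description": description,
        "unit_price": unit_price,
        "lot_size": lot_size,
        "specifications": spec_documents.get(spec_id),
    }


product = load_product(1)
```

The structured fields stay queryable and constrained in SQL, while the specification can vary from product to product. The cost is that the reference is not enforced: nothing stops the document from being deleted while the row still points at it.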

pyon
  • 2
    SQL Server 2008 is an example of a database that can do both (using the FILESTREAM datatype). – John Saunders Jan 27 '10 at 03:17
  • Wow. Awesome feature. (I've never used SQL Server 2008.) – pyon Jan 27 '10 at 04:20
  • Just being able to store a loose 'document' or file doesn't make it a document oriented database system. Real document-oriented databases give you features to index and work with documents efficiently. – Tim Lovell-Smith Oct 20 '13 at 04:17
  • @TimLovell-Smith If there is any structure, it is most profitably taken advantage of using a relational database (or, even better, a categorical one: http://math.mit.edu/~dspivak/informatics/talks/CTDBIntroductoryTalk). What I am advocating is establishing a clean divide between the structured and unstructured parts of the data. – pyon Oct 20 '13 at 16:50
  • @TimLovell-Smith How so? You mentioned "features to index and work with documents". Indices are structures, and thus, as I said, are "most profitably taken advantage of using a relational database", even if the actual contents of the documents are not. – pyon Oct 21 '13 at 18:40
  • This hybrid solution is kind of what I was thinking about when reading the unique ID constraint mentioned in the problem. "Why not just have a private web service tied to a sequence generator?" But more and more I'm just leaning toward "Why not store JSON in a relational database?" I think it has proper support now from most of the major relational databases, or will soon (though I only checked Postgres). Even if it doesn't, there are lots of other free-form datatypes that could be used for this purpose; they just won't validate the JSON. – jm0 Apr 28 '14 at 13:41
4

Document-based DBs are best suited for storing, well, documents. Lotus Notes is a common implementation, and Notes email is an example. For what you are describing (eCommerce, CRUD, etc.), relational DBs are better designed for storage and retrieval of data items/elements that are indexed (as opposed to documents).

Jim Anderson
  • 9
    I don't agree. A document database isn't primarily for storing documents. It is for storing hierarchical pieces of data (either JSON or XML). You can index nested JSON fields and JSON arrays with, for instance, MongoDB. You can store documents (files) in MongoDB (GridFS), but MongoDB would still be useful if you couldn't store documents (files) with it. I think that MongoDB should be called a JSON db and not a document db. – Theo May 19 '10 at 15:12
  • 1
    According to the Wikipedia entry for "Document-oriented database", "...using XML, YAML or JSON for information storage has advantages similar to document oriented database" but they are not the same thing. Document databases were originally designed to store documents. If you use them for other data, you are not going to get the best performance/usage, just as if you stored documents in a relational database. This happens a lot. People store relational data in document databases and then complain how bad document databases are. If you misuse them, yes. – Jim Anderson May 28 '10 at 16:30
  • 1
    The Wikipedia entry http://en.wikipedia.org/wiki/Document-oriented_database has been updated since; it's worth a look to confirm that document-oriented databases are indeed more than filing cabinets for actual documents. – Zsolt Török Nov 10 '10 at 16:37
  • Interesting. It seems document oriented databases have "evolved" in recent years to be more than I believe they were originally meant to be. – Jim Anderson Dec 06 '10 at 00:33
2

Re CRUD: the whole REST paradigm maps directly to CRUD (or vice versa). So if you know that you can model your requirements with resources (identifiable via URIs) and a basic set of operations (namely CRUD), you may be very near to a REST-based system, which quite a few document-oriented systems provide out of the box.
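As a rough illustration of that mapping, the sketch below turns each CRUD operation into an HTTP method and path in the style of CouchDB's document API, where a document lives at `/{db}/{docid}` and updates and deletes must name the current revision. The `crud_request` helper and the sample names are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import quote


def crud_request(operation, db, doc_id, rev=None):
    """Map a CRUD operation to an (HTTP method, path) pair."""
    path = f"/{quote(db, safe='')}/{quote(doc_id, safe='')}"
    if operation == "create":
        return ("PUT", path)
    if operation == "read":
        return ("GET", path)
    if operation in ("update", "delete"):
        # Changing or removing a document requires the revision being replaced.
        if rev is None:
            raise ValueError(f"{operation} needs the current revision")
        method = "PUT" if operation == "update" else "DELETE"
        return (method, f"{path}?rev={quote(rev, safe='')}")
    raise ValueError(f"unknown operation: {operation}")


requests_for_item = [
    crud_request("create", "shop", "item-1"),
    crud_request("read", "shop", "item-1"),
    crud_request("update", "shop", "item-1", rev="1-abc"),
    crud_request("delete", "shop", "item-1", rev="2-def"),
]
```

Create and update share a method and differ only in whether a revision is supplied, which is the main place where the REST mapping is less tidy than the four letters of CRUD suggest.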

KoW
  • 1
    I don't think that comparing CRUD to REST is enough to think about using Document-oriented databases. There are a lot more things to consider, REST<>CRUD is only a small part of it. – igorsantos07 Jul 16 '12 at 17:25
  • 1
    I upvoted this as it seemed to me to obliquely reference what is known as "object-relational impedance mismatch" (see http://blogs.tedneward.com/post/the-vietnam-of-computer-science). – Tom Russell Mar 08 '19 at 06:54