11

Whats the cons and pros of using a object database or relational database for regular web development which involves a lot of CRUD?

UPDATE: I reopened the bounty reward in order to give Neville it.

ebb
  • 8,919
  • 15
  • 62
  • 119
  • Totally agree with the pros here. The industry standard is to use an ORM + RDBMS. – Stephen Chung Apr 07 '11 at 06:33
  • please look in to semantic technology. Its my personal opinion that semantic data store is superior to relational data store. AFA as OODBMS is concerned, the technology or the algorithm is not mature enough to compare with RDBMS. – uncaught_exceptions Apr 09 '11 at 09:32
  • The industry standard is wrong for regular web development. OODBMS is mature enough: look at how long JP Morgan is using Gemstone. – Stephan Eggermont Apr 09 '11 at 16:11
  • Rob Conery has a series (http://blog.wekeroad.com/category/nosql) of posts touting OODBMS. This one here (http://blog.wekeroad.com/2010/02/05/reporting-in-nosql) talks about where RDBMSs make sense in an OODBMS world. And here's a post on using both at once: http://blog.wekeroad.com/2010/05/19/no-sql-in-the-wild – Talljoe Apr 09 '11 at 16:36

8 Answers8

19

The concept of an OODBMS is quite broken, and the various commercial and free offerings that have emerged over the last few decades have barely made a dint in the marketplace.

The relational model is more powerful than object models in terms of the kinds of questions you can ask of your data. Unfortunately, SQL threw out much of the expressive power that the relational model is capable of, but even in this diluted form, it is still easier to express queries in SQL than in a typical OO database (be it ORM or OODBMS).

Queries in an OODBMS are predominantly driven by navigational operators, which means that if your sales database has sales people owning their sales, then querying for the monthly sales for a given SKU is not only likely to be cripplingly slow, but very awkward to express. Consider also a security model that grants employees access to buildings. Which is the correct way to express this? Should employees hold a collection of buildings they can access, or should buildings hold a collection of employees that have access to them? More to the point, why should either class have to have a collection of the other baked into its design? And, whichever one you choose, how would you ask which pairs of employees have more than one building they can access in common? There is no straightforward navigational pattern that can answer such a question. The sensible solution — an "Access" object — is essentially a reversion back to a properly normalised relational schema, and it requires some kind of query language that borrows heavily from the relational algebra in order to answer the question without a massive over-the-wire data transfer.

Also consider another major strength touted for the OODBMS: methods, especially inheritance with virtual methods. A sports clinic might have different risk-of-injury metrics for different kinds of athlete. In the ORM world, this would be automatically expressed as a class hierarchy, with Athlete at the root, and a virtual method, int InjuryRiskScore() implemented by each derived class. The problem is that this method is invariably implemented on the client, not at the back end, so if you want to find the 10 highest risk athletes across all sports at your clinic, the only way to do it is to fetch all athletes across the wire and pass them through a client-side priority queue. I don't know the OODBMS world as well, but I think the same problem occurs, since the storage engines generally only store enough data to rehydrate objects in the client's programming language. In the relational model or SQL, you would express risk-of-injury scoring as a view, which could be simply the union of per-athlete-type views. Then, you just ask the question. Or you can ask more complicated questions like, "Who had the greatest increase in their risk-of-injury since last month's checkup?" or even, "Which risk score has proven to be the best predictor of injury over the last year?". Most importantly, these questions can all be answered inside the DBMS with nothing more than the question and the answer travelling across the wire.

The relational model allows the DBMS to express knowledge in a highly distilled manner based on predicate logic, which allows the various dimensions of the facts you store therein to be joined, projected, filtered, grouped, summarised, and otherwise rearranged in a completely ad hoc manner. It allows you to easily cook up the data in ways that weren't anticipated when the system was originally designed. The relational model thus permits the purest expression of knowledge that we know of. In short, the relational model holds pure facts — nothing more, nothing less (and certainly not objects, or proxies thereof).


On a historical note, the relational model emerged in response to a disastrous state of affairs with the existing network and hierarchical DBMSs of the time, and largely (and rightly) displaced them for all but a small niche of application areas (and even these probably remained largely because SQL failed to deliver on the RMs power). It is deeply ironic that much of the industry is now essentially yearning for the "good old days" of network-theoretical databases, which is essentially what OODBMSs and the current crop of NoSQL databases are going back to. These efforts rightly criticise SQL for its failure to deliver on today's needs, but unfortunately they assumed (wrongly, and probably out of pure ignorance) that SQL is a high-fidelity expression of the relational model. Hence they neglected to even consider the relational model itself, which has virtually none of the limitations that has driven so many away from SQL, often towards OODBMSs.

Marcelo Cantos
  • 167,268
  • 37
  • 309
  • 353
  • 2
    It is clear you have never used Gemstone. None of your arguments make any sense – Stephan Eggermont Apr 09 '11 at 15:54
  • 1
    1 The querying model of Gemstone is much more powerful than that of relational databases – Stephan Eggermont Apr 09 '11 at 16:02
  • 1
    2 Your query example is much slower and difficult to express in a relational database than in Gemstone, especially when you try to do multi-dimensional analysis (datawarehousing) – Stephan Eggermont Apr 09 '11 at 16:05
  • 3 The relational model cannot scale. It uses the wrong abstractions (Set instead of Ordered Set). – Stephan Eggermont Apr 09 '11 at 16:08
  • 4 The limited type system and lack of behavior (storing methods) is responsible for the bad data quality seen in relational databases. 30% of rows containing bad data is not uncommon – Stephan Eggermont Apr 09 '11 at 16:13
  • 5 time to go outside and play with the kids. There are more arguments – Stephan Eggermont Apr 09 '11 at 16:16
  • 6 ORMs have nothing to do with OODBMS. – Stephan Eggermont Apr 09 '11 at 18:46
  • 7 Large systems build with relational databases are never properly normalized because of the performance problems when doing lots of joins. – Stephan Eggermont Apr 09 '11 at 18:48
  • 8 Your method argument is simply not based on facts – Stephan Eggermont Apr 09 '11 at 18:49
  • 1
    @Stephan: 1) Are you aware of the distinction between SQL and the relational model/algebra? 2) You'll need to be more specific; the problems I cited are fairly easily expressed in SQL. 3) I don't understand the statement that sets make things less scalable than ordered sets; an imposed ordering makes it harder for the optimiser, not easier. I worked on a query language for a distributed system at Microsoft that processed about 1 TB/min on full-scan ad-hoc queries; the absence of ordering constraints was vital to achieving this. 4) The relational model places no restrictions on types; SQL does. – Marcelo Cantos Apr 10 '11 at 00:00
  • 6 & 8) Then please elucidate; I stated that I'm not as *au fait* with OODBMSs as I am with ORMs, so it would be good to hear from someone with more experience in that area. Maybe I've got it all wrong — and I value the chance to learn from mistakes, humbling as it is in a public forum — but I can't learn from a bland, "You're wrong." 7) The relational model doesn't constrain the physical organisation of data; SQL does. A relational DBMS can denormalise behind the scenes based on predicted or measured usage profiles; any technique is valid as long the user still sees the relvars (s)he defined. – Marcelo Cantos Apr 10 '11 at 00:01
  • 1 let me rephrase that: SQL database implementations. 2 your example means that the table representing a sale gets more and more columns (or has to use something like a star schema) as the system covers more of the business. That is bad modeling. And it performs badly. Do you remember the ridiculous DWH example Microsoft showed with a 4D sales model and a terabyte of data? There was a reason it didn't have a realistic number of sales dimensions – Stephan Eggermont Apr 10 '11 at 08:04
  • 6 Gemstone is a persistent smalltalk vm. You might want to take a look at http://programminggems.wordpress.com/2010/02/05/scaling-objects-videos/ 8 Is simple: it is not implemented on the client, it is on the server – Stephan Eggermont Apr 10 '11 at 08:09
  • 3 I don't understand why a full scan would be influenced by ordering. – Stephan Eggermont Apr 10 '11 at 08:25
  • 1) No disagreement there; the final two sentences of my answer covered this. 2) Perhaps I didn't explain my example properly; there is no need to grow the schema. 3) The full scan and ordering are unrelated; I was merely pointing out that the engine actually consumed every bit of data (about 5–10 TB on a typical run) rather than speeding things up with indexes. We did this with a map-reduce style of interleaving to cluster the data by keys. If the system had been modeled on the idea of ordered sets, it would have been much more difficult to arrange the execution graph for optimum dataflow... – Marcelo Cantos Apr 10 '11 at 10:40
  • 6) I've already spent a fair bit of time answering and commenting; I'm not to watch a bunch of videos. It is difficult to comment any further on your claims in the absence of simple code samples. Are there any readily accessible samples and/or tutorials online? Or perhaps you can furnish an example of how one or more of the queries I suggested would be solved in Gemstone. The "pairs of employees" one is of particular interest to me along with its dual: "Which pairs of buildings have more than one employee in common?" 8) Point taken. I wasn't so sure about that case. – Marcelo Cantos Apr 10 '11 at 10:50
  • Tutorial is at http://seaside.gemstone.com/tutorial.html. The videos show quite a bit of implementation detail. – Stephan Eggermont Apr 10 '11 at 14:14
16

Relational database:

Pros:

  • Established technology - lots of tools, developers, resources
  • Wide range of Open Source and commercial products
  • Known to scale to very large sites, and very high throughput
  • Expresses many problem domains in a logical and "programmable" way
  • Fairly standard language (SQL)

Cons:

  • Impedance mismatch with OO concepts - modeling "inheritance" in a database is not natural
  • Hierarchical structures usually require vendor-specific extensions to the language
  • Non-relational data (e.g. documents) are not a natural fit
  • Changes in the business domain can be hard to implement once the schema has been defined

OOBDMS

Pros:

  • Closer fit for OO concepts
  • In theory, a developer only needs to work in one language - the persistence details are abstracted away. This should improve productivity

Cons:

  • Significantly fewer tools/resources/developers available.
  • No widely accepted standards
  • "black box" approach to persistence can make performance tuning difficult
  • persistence details often leak into the OO design (see Marcelo's examples)
Neville Kuyt
  • 27,150
  • 1
  • 34
  • 48
  • Giving the bounty in 23hours - been busy these days :) – ebb Apr 09 '11 at 17:23
  • Another pro for OODBMS: can handle much larger amounts of data (if you can afford the price) – Stephan Eggermont Apr 09 '11 at 19:05
  • BTW: coupled with an ORM framework (i.e. Hibernate) RDBMS systems can be used in a model that is more natural to developers, at the cost of performance and complexity (they encapsulate db access but you have to learn how to use them properly). ORM tools are very mature these days and provide a good solution to some of the issues introduced by an RDBMS. – NightDweller Apr 14 '11 at 12:21
  • "Expresses many problem domains in a logical and "programmable" way"? – Stephan Eggermont Apr 21 '11 at 15:51
4

I can answer your question with respect to one Object database I know well: ZODB.

The ZODB allows you to persist your data models almost completely transparently. Its usage amounts to something like:

 # 'Persistent' allows us to save things transparently
 class Car(Persistent):
   def set_wheels(number)
      self.wheels = number

 # 'database' is a ZODB database
 database.car = Car()
 database.car.set_wheels(3)

You'll have to look a long time to find that kind of readability with an RDMBS. That there is the big pro to using ZODB in a web application.

The big downside, as Marcello outlines, is lack of powerful querying. That's partly a side-effect of the convenience of the idiom above. The following is completely OK, and everything will get persisted to the database:

 database.car.colour = "yellow"
 database.car.owner = "Trotter"
 database.car.neighbour = Car()

However, this kind of flexibility makes it hard to optimize complex queries across different models. Just making list of all yellow cars with neighbours will require O(n) time unless you roll your own index.

So, it depends what you mean by "regular web development". Many websites don't actually require complex multi-dimensional queries and searches in linear time are no problem at all. In those cases, using an RDBMS can in my opinion over-complicate your code. I've written many CMS-type applications using solely an object database. Lots of CRUD doesn't particularly come into it; ZODB is very mature, and scales and caches pretty well.

However, if you're writing a web application that needs to do complex business reporting along the lines of Google Analytics, or some kind of warehouse inventory management system with many terabytes of data, then you're pretty definitely going to want an RDBMS.

To summarise, an object database can give you readability and maintainability at the cost of complex query performance. Of course, readability is a matter of opinion, and you can't ignore the fact that very many more developers know SQL than the various object database dialects.

seb
  • 3,078
  • 20
  • 19
2

In regular web development I use Seaside on Gemstone. For most applications, that means I write zero database connection code. It performs, it scales, development is about five times faster.

The only time I will ever use a relational database again for web development is when I have to connect to an existing one.

The advantages:

  • less code, faster development;
  • much better scalability;
  • can handle much more complex models;
  • much better project agility;
  • did I mention flexibility;
  • can handle changes in class models, not just data but also code;

The disdadvantages:

  • you'll probably have to train developers yourself;
  • the one you want (gemstone) costs serious money for large systems
Stephan Eggermont
  • 15,354
  • 1
  • 33
  • 64
2

Relational db

  • SQL and standards
  • easy to model
  • can use only standard and vendor types
  • referential integrity (matematically solid relational set theory)
  • lot of tools and database implementations
  • data separate from program
  • storage management and high end infrastructure support
  • transaction and concurrency management done within
  • Relational Model is Value-Based ie rows are identified by primary keys

Cons

  • no custom type
  • no extensible data types
  • impedance mismatch
  • cannot express nested relationship
  • cannot use complex entities as a single unit
  • need to define keys and various types of relationships at the data model level
  • write procedures for versioning, transactions if needed

Object DB

  • High performance
  • Faster as no joins required
  • Inherent versioning mechanism
  • Navigational interface for operations (like graph traversal)
  • Object Query Language retrieve objects declaratively
  • complex data types
  • object identity ie. equals() in which object identity is independent of value and updates
  • facilitates object sharing
  • classes and hierarchies (inheritance and encapsulation)
  • support for relationships
  • integrated with a persistence language like ODL
  • support for atomicity
  • support for nested relationships
  • semantic modelling

Cons

  • No mathematical foundation as RDB (refer Codd)
  • cons of object orientation
  • persistence difficult for complex structures, some data must be transient

Object-Relational databases (You might have seen UDTs!)

  • support for complex data types like collection, multisets etc
  • object oriented data modelling
  • extended SQL and rich types
  • support for UDT inhertance
  • powerful query language

Different approaches (OO, Relational DB or OODB) may be necessary for different applications

References

The advantage of using relational databases for large corpora

Relational Database Relational Database

OODMS manifesto

ODMG

Benefits of a relational database

The Object-Oriented Database System Manifesto

Object Oriented Database Systems

Object Relational Databases in DBMS

Completeness Criteria for Object-Relational Database Systems

Comparisons

http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems

http://en.wikipedia.org/wiki/Comparison_of_object_database_management_systems

http://en.wikipedia.org/wiki/Comparison_of_object-relational_database_management_systems

zudokod
  • 3,984
  • 2
  • 19
  • 23
  • No mathematical foundation is a flawed argument. There is more to mathematics than set theory. Turing and Dijkstra were both mathematicians. – Stephan Eggermont Apr 18 '11 at 22:20
  • The article 'The advantage of using relational databases for large corpora' does not compare to an OODB. If I read it correctly, it is a problem that would have been much easier to solve using an OODB – Stephan Eggermont Apr 18 '11 at 22:23
  • 'Benefits of a relational database' names four benefits where in fact an OODB scores significantly better on all of those benefits – Stephan Eggermont Apr 18 '11 at 22:45
  • yes, i just want to point out the different use cases for both. For OODB, there is no backing of a definite mathematical theory like RDBs do. About the article, usually corpora is defined in a graph databases (for modelling semantic relationships) but RDB can also be used. 'Benefits of a relational database' is just a simple reference – zudokod Apr 19 '11 at 04:44
  • 1
    Data separate from program should be listed as a significant con for RDBMS. It is *the* reason for low data quality. And it makes referential integrity a practically useless concept. – Stephan Eggermont Apr 21 '11 at 14:45
1

I think that everything depends on the specifics of your issue. (I'm really going out on a limb here, I know.)

All we know is that you want to use the DB for web development, and you'll be doing a lot of operations on the data.

One of the relevant questions to ask yourself is, how important is it that the DB be tightly integrated with the objects you manipulate? The more it's necessary, the more an object-oriented DB recommends itself.

On the other hand, if your data easily lends itself to the relational model, a relational DB might be better.

Think about the operations you'll need to do. Will you need analysis of all sorts of items with different attributes? How much will you need to future-proof your DB?

I should add that if your DB is likely to be fairly small, performance will not be a major issue. But if performance is, in fact, an issue, you have lots of things to worry about beyond just OO vs. relational DBs. (Just to pick one example from the relational DB world, what normalization form should you use? This is an exceedingly important, and complex, question. Are you maintaining an operational system or data warehouse? Do you know ahead of time that certain queries are of paramount importance, or of negligible importance? &c.)

Beyond the question of DB performance and integration with your object model, there are other real-world questions to ask. Do you have diskspace / server / bandwidth limitations? Will you offer only a small number of operations to web users, or might people you don't even know be creating their own queries/edits?

For other, more important, real-world questions, whom will you be working with? What do they already know (or prefer)? And if you don't have domain knowledge yet, maybe you have personal curiosity pushing you in one direction? If you're starting on a personal project, following your own preferences is a better guide to success than worrying over performance before you even start.

If you can answer these and similar questions, even if the answer is "I don't know," you will be able to get much better direction in how to proceed.

JXG
  • 6,953
  • 7
  • 29
  • 60
1

In contract to Marcelo's in depth and well thought out response, I'd say that based on the phrasing of your question "regular web development", my off the cuff response would be to say that you'd be hard pressed to find enough pro's to justify using a Object DB over a traditional relational db, for the simple fact that are more resources/developers/tutorials/etc that are more familiar with the traditional relational model, and how to utilize that to achieve "regular web development".

That said, I think that with some of the modern ORMs you get a little of the best of both worlds, in that your underlying data is stored in a well-understood RDBMS (that is likely stable, supported, etc), but you can still abstract away some of the Object modeling capabilities that can (arguably) be more suited to developing CRUD applications.

I'll admit that I'm not well versed in the current capabilities of modern OODBMSs, however unless you are in a field that is completely suited to achieving a perfect object representation of your domain (and you have the object modeling talent to take advantage), then I'd stick with a RDBMS for your persistent storage.

Hope that helps!

jsuggs
  • 2,592
  • 3
  • 16
  • 17
0

This pretty much explains the pros and cons:

http://en.wikipedia.org/wiki/Object_database

dseibert
  • 1,231
  • 8
  • 17
  • 2
    That article is woeful. Comments like, "OODBMS may in some cases be faster than relational DBMS because data is not stored in relational rows and columns but as objects," betray complete ignorance of what the relational model is about. It would make sense if "relational" were replaced with "SQL". Also, the closing *argument* has no place in an encyclopedia. – Marcelo Cantos Mar 18 '11 at 21:55
  • @Marcelo: "*may in some cases be faster* and (my addition) may in many more cases be slower". It may be raining in Nevada tomorrow or it may be not :) – ypercubeᵀᴹ Mar 18 '11 at 22:26
  • 1
    @ypercube: It wasn't the "may be faster" bit that bothered me. It was "relational rows and columns" that got to me. Anyone with more than a passing knowledge of the relational model knows that it has absolutely nothing to do with rows and columns, and that the rows and columns of SQL tables are a poor mimicry of the theory's relations, tuples and attributes. – Marcelo Cantos Mar 19 '11 at 02:12