4

I come from a SQL world where lookups are done by several object properties (published = TRUE or user_id = X) and there are no joins anywhere (because of the 1:1 cache layer). It seems that a document database would be a good fit for my data.

I am trying to figure out if there is a way to pass one (or more) object properties to a CouchDB map/reduce function to find matching documents in a database without creating dozens of views for each document type.

Is it possible to pass the desired document property key(s) to match at run-time to CouchDB and have it return the objects that match (or the count of objects that match, for pagination)?

For example, on one page I want all posts with a doc.user_id of X that are doc.published. On another page I might want all documents with doc.tags[] with the tag "sport".

Xeoncross

3 Answers

6

You could build a view that iterates over the properties of each document and emits a key of [propertyName, propertyValue]. That way you're building a single index with EVERY prop/value pair in it. It would be massive, and I have no idea how long it would take to build or how much disk it would use (probably a lot).

Map function would look something like:

// note - totally untested, my CouchDB fu is rusty
function(doc) {
  // emit one [name, value] key per property of the document
  for (var prop in doc) {
    emit([prop, doc[prop]], null);
  }
}

Works for the basic case of simple properties, and can be extended to be smart about arrays, and emit a prop/value pair for each item in the array. That would let you handle the tags.

To query on it, pass [prop, value] as the key when you query the view, e.g. key=["user_id", X].
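To get a feel for what such an index would contain, here is a small Python model of the emit logic. It is only a sketch: CouchDB would run the JavaScript map function, not this code. The names `emit_pairs`, `build_index` and `lookup` are made up for illustration, the sample documents are invented, and skipping `_id`/`_rev` is my own choice (the map function above would emit those too).

```python
def emit_pairs(doc):
    """Yield ([prop, value], doc_id) rows: one per property, one per list item."""
    for prop, value in doc.items():
        if prop.startswith("_"):
            continue  # assumption: leave CouchDB metadata (_id, _rev) out of the index
        values = value if isinstance(value, list) else [value]
        for item in values:
            yield [prop, item], doc["_id"]


def build_index(docs):
    """Collect every emitted row, mimicking a single prop/value view."""
    return [row for doc in docs for row in emit_pairs(doc)]


def lookup(index, prop, value):
    """Return ids of documents that emitted exactly the key [prop, value]."""
    return [doc_id for key, doc_id in index if key == [prop, value]]


# Invented sample documents
docs = [
    {"_id": "a", "user_id": 7, "published": True, "tags": ["sport", "news"]},
    {"_id": "b", "user_id": 7, "published": False, "tags": ["music"]},
    {"_id": "c", "user_id": 9, "published": True, "tags": ["sport"]},
]
index = build_index(docs)

print(lookup(index, "tags", "sport"))  # ['a', 'c']
print(lookup(index, "user_id", 7))     # ['a', 'b']
```

Note that each key covers one property only. Something like "user_id is 7 AND published" would mean two lookups and intersecting the ids yourself, e.g. `set(lookup(index, "user_id", 7)) & set(lookup(index, "published", True))`.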

madlep
  • So it sounds like it is a bad idea. It's possible - but a bad idea. Bummer, thanks to my ORM I've been free from crafting SQL queries for so long that I wasn't looking forward to writing model methods (or views in CouchDB) for all the different ways I look up data. Then again, at least all the queries are listed in the _design document for anyone new to the project. – Xeoncross Jun 28 '11 at 15:51
2

Basically, no.

The key difference between something like Couch and a SQL DB is that the only way to query in CouchDB is essentially through the views/indexes. Indexes in SQL are optional. They exist (mostly) to boost performance. For example, if you have a small DB, your app will run just fine on SQL with 0 indexes. (Might be some issue with unique constraints, but that's a detail.)

The overall point is that the query processor in a SQL database includes other methods of data access beyond simply indexes, notably table scans, merge joins, etc.

Couch has no query processor. It has views (defined by JS) used to define B-Tree indexes.

And, that's it. That's the hammer of Couch. It's a good hammer. It's been serving the data processing world for basically 40 years.

Indexes are somewhat expensive to create in Couch (based on data volume) which is why "temporary views" are frowned upon. And they have a cost in maintenance as well, so views need to be a conscious design element in your database. At the same time, they're a bit more powerful than normal SQL indexes as well.

You can readily add your own query processing on top of Couch, but that will be more work for you. You can create a few select views, on your most popular or selective criteria, and then filter the resulting documents by other criteria in your own code. Yes, you have to do it, so you have to question whether the effort involved is worth more than whatever benefits you feel Couch is offering you (HTTP API, replication, safe, always consistent datastore, etc.) over a SQL solution.
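As a rough sketch of that "filter in your own code" step, assuming you have already fetched the documents from one selective view (say, a hypothetical "posts by user_id" view), the remaining criteria could be applied in Python like this. `filter_docs` and the sample data are invented for illustration and are not part of any CouchDB client library.

```python
def filter_docs(docs, **criteria):
    """Keep documents that match every criterion.

    A list-valued property (e.g. tags) matches when it contains the wanted value.
    """
    def matches(doc):
        for prop, wanted in criteria.items():
            actual = doc.get(prop)
            if isinstance(actual, list):
                if wanted not in actual:
                    return False
            elif actual != wanted:
                return False
        return True

    return [doc for doc in docs if matches(doc)]


# Stand-in for the documents a "posts by user_id" view returned for one user
posts_for_user = [
    {"_id": "p1", "user_id": 7, "published": True, "tags": ["sport"]},
    {"_id": "p2", "user_id": 7, "published": False, "tags": ["sport"]},
    {"_id": "p3", "user_id": 7, "published": True, "tags": ["music"]},
]

published = filter_docs(posts_for_user, published=True)
published_sport = filter_docs(posts_for_user, published=True, tags="sport")

print(len(published))                        # 2
print([d["_id"] for d in published_sport])   # ['p1']
```

The count for pagination is then just `len(...)` of the filtered list. That is only reasonable when the view has already narrowed things down to a small set of documents.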

Will Hartung
  • "always consistent datastore" - isn't CouchDB eventually consistent which means it can't be always consistent because otherwise it won't be highly available in terms of CAP theorem? – yojimbo87 Jun 28 '11 at 10:32
  • If you're talking about replication and clustering, then yes. What I meant by always consistent is that with regard to a single instance of Couch, it is "crash proof". The database is always in a consistent, usable state, even if the system or server crashes. Upon restart the DB is immediately consistent and usable. – Will Hartung Jun 28 '11 at 15:28
0

I ran into a similar issue, and built a quick workaround using CouchDB-Python (which is a great library). It's not a pretty solution (goes against the principles of CouchDB), but it works.

CouchDB-Python gives you the method "query", which allows you to "execute an ad-hoc temporary view against the database". You can read about it here

What I do is store the JavaScript function as a string in Python, and then concatenate it with the values of variables that I define in Python.

In some_function.py

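# NOTE: variable must be a string holding a valid JavaScript literal,
# because it is pasted straight into the function source below
variable = value

# Map function (in JavaScript)
map_fn = """function(doc) {
     <javascript code>
     var survey_match = """ + variable + """;
     <javascript code>
}"""

# Iterates through the rows of the temporary view
for row in db.query(map_fn):
     <python code>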

It sure isn't pretty, and probably breaks a bunch of CouchDB philosophies, but it works.
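If you go this route, one way to make the string-building a bit safer is to let `json.dumps` turn the Python values into JavaScript literals instead of concatenating them raw. This is a minimal sketch: `build_map_fn` is a name I made up, and the equality check inside the generated function is only an example of what the JavaScript part might do.

```python
import json


def build_map_fn(prop, value):
    """Return JavaScript map-function source that emits docs where doc[prop] equals value."""
    return (
        "function(doc) {\n"
        "  if (doc[%s] === %s) {\n"
        "    emit(doc._id, null);\n"
        "  }\n"
        "}" % (json.dumps(prop), json.dumps(value))
    )


# json.dumps quotes strings and renders True as the JavaScript literal true
map_fn = build_map_fn("user_id", 42)
print(map_fn)
```

The resulting string would then be passed to `db.query(map_fn)` exactly as in the snippet above.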

Daniel