CouchDB indexes to connect the dots between documents

Question

I have the following documents:

{ _id: "123", type: "project", worksite_id: "worksite_1" }
{ _id: "456", type: "document", project_id: "123" }
{ _id: "789", type: "signature", document_id: "456" }

My goal is to run a query and to inevitably do a filtered replication of all documents that have a connection with worksite_id: worksite_1.

Example:

Because this project has the worksite I am looking for
document has that project
signature has that document

I should be able to retrieve all of these documents if I want everything from that worksite.

Normally I would just add a worksite_id to my type:document and type:signature. However, worksite's can change in a project for various reasons.

I was wondering if there is a way to create an index or do something I am not thinking about to show these resemblances.

This feels like it is on the right path but the explanation puts documents inside other documents where I just want them to be separate.

Have you considered [relational-pouch](https://github.com/pouchdb-community/relational-pouch)? — Martin Bramwell, Dec 12 '18 at 22:56
@MartinBramwell I believe I need to create a view/filter in order to do what I need to do with replication — bryan, Dec 12 '18 at 23:01

score 2 · Answer 1 · answered Dec 19 '18 at 08:46

A map function only considers one document at a time, so unless that document knows about other documents, you can't link them together. Your structure implies a three-table join in SQL terms.

With your structure, the best you can hope for is a two-request solution. You can create a view that shows signed documents only:

function (doc) {
  if (doc && doc.type && doc.type === "signature" && doc.document_id) {
    emit(doc.document_id, {_id: doc.document_id})
  }
}

and using the same technique, link projects to documents -- but you can't get all three linked.

score 0 · Answer 2 · answered Dec 21 '18 at 17:32

I think I have what you are looking for.

Here's some data:

{
  "docs": [
    {
        "_id": "123",
        "type": "project",
        "code": "p001"
    },
    {
        "_id": "1234",
        "type": "worksitelog",
        "documents": [
          {
            "timestamp": "20180921091501",
            "project_id": "123",
            "document_id": "457",
            "signature_id": "789"
          },
          {
            "timestamp": "20180921091502",
            "project_id": "123",
            "document_id": "457",
            "signature_id": "791"
          },
          {
            "timestamp": "20180921091502",
            "project_id": "123",
            "document_id": "458",
            "signature_id": "791"
          },
          {
            "timestamp": "20180921091502",
            "project_id": "123",
            "document_id": "456",
            "signature_id": "790"
          }
        ],
        "worksite_id": "worksite_2"
    },
    {
        "_id": "1235",
        "type": "worksitelog",
        "documents": [
          {
            "timestamp": "20180913101502",
            "project_id": "125",
            "document_id": "459",
            "signature_id": "790"
          }
        ],
        "worksite_id": "worksite_1"
    },
    {
        "_id": "124",
        "type": "project",
        "code": "p002"
    },
    {
        "_id": "125",
        "type": "project",
        "code": "p003"
    },
    {
        "_id": "456",
        "type": "document",
        "code": "d001",
        "project_id": "123",
        "worksite_id": "worksite_2"
    },
    {
        "_id": "457",
        "type": "document",
        "code": "d002",
        "project_id": "123",
        "worksite_id": "worksite_2"
    },
    {
        "_id": "458",
        "type": "document",
        "code": "d003",
        "project_id": "123",
        "worksite_id": "worksite_2"
    },
    {
        "_id": "459",
        "type": "document",
        "code": "d001",
        "project_id": "125",
        "worksite_id": "worksite_1"
    },
    {
        "_id": "789",
        "type": "signature",
        "user": "alice",
        "pubkey": "65ab64c64ed64ef41a1bvc7d1b",
        "code": "s001"
    },
    {
        "_id": "790",
        "type": "signature",
        "user": "carol",
        "pubkey": "tlmg90834kmn90845kjndf98734",
        "code": "s002"
    },
    {
        "_id": "791",
        "type": "signature",
        "user": "bob",
        "pubkey": "asdf654asdf6854awer654awer654eqr654wra6354f",
        "code": "s003"
    },
    {
        "_id": "_design/projDocs",
        "views": {
          "docsPerWorkSite": {
            "map": "function (doc) {\n  if (doc.type && ['worksitelog', 'document', 'project', 'signature'].indexOf(doc.type) > -1) {\n    if (doc.type == 'worksitelog') {\n      emit([doc.worksite_id, 0], null);\n      for (var i in doc.documents) {\n        emit([doc.worksite_id, Number(i)+1, 'p'], {_id: doc.documents[i].project_id});\n        emit([doc.worksite_id, Number(i)+1, 'd'], {_id: doc.documents[i].document_id});\n        emit([doc.worksite_id, Number(i)+1, 's'], {_id: doc.documents[i].signature_id});\n      }\n    }\n  }\n}"
          }
        },
        "language": "javascript"
    }
  ]
}

Save that data to disk as stackoverflow_53752001.json.

Use Fauxton to create a database called stackoverflow_53752001.

Here's a bash script to load the data from the file stackoverflow_53752001.json into the databasestackoverflow_53752001`. You'll need to edit the first three parameters, obviously. Fix it, then paste it into a (Unix) terminal window:

USRID="you";
USRPWD="yourpwd";
HOST="yourdb.yourpublic.work";

COUCH_DATABASE="stackoverflow_53752001";
FILE="stackoverflow_53752001.json";
#
COUCH_URL="https://${USRID}:${USRPWD}@${HOST}";
FULL_URL="${COUCH_URL}/${COUCH_DATABASE}";
curl -H 'Content-type: application/json' -X POST "${FULL_URL}/_bulk_docs"  -d @${FILE};

In Fauxton, select database stackoverflow_53752001 and then, in the left-hand menu select "Design Documents" >> "projDocs" >> "Views" >> "docsPerWorkSite".

You'll see data like this:

{"total_rows":17,"offset":0,"rows":[
  {"id":"1235","key":["worksite_1",0],"value":null},
  {"id":"1235","key":["worksite_1",1,"d"],"value":{"_id":"459"}},
          :                     :
          :                     :
  {"id":"1234","key":["worksite_2",4,"p"],"value":{"_id":"123"}},
  {"id":"1234","key":["worksite_2",4,"s"],"value":{"_id":"790"}}
]}

If you then click on the "Options" button, in the top right, you'll get an option sheet for modifying the raw query. Pick:

"Include Docs"
"Between Keys"
- "Start key" : ["worksite_1", 0]
- "End key" : ["worksite_1", 9999]

Hit "Run Query", and you should see:

{"total_rows":17,"offset":0,"rows":[
  {"id":"1235","key":["worksite_1",0],"value":null,"doc":{"_id":"1235","_rev":"1-de2b919591c70f643ce1005c18da1c54","type":"worksitelog","documents":[{"timestamp":"20180913101502","project_id":"125","document_id":"459","signature_id":"790"}],"worksite_id":"worksite_1"}},
  {"id":"1235","key":["worksite_1",1,"d"],"value":{"_id":"459"},"doc":{"_id":"459","_rev":"1-5422628e475bab0c14e5722a1340f561","type":"document","code":"d001","project_id":"125","worksite_id":"worksite_1"}},
  {"id":"1235","key":["worksite_1",1,"p"],"value":{"_id":"125"},"doc":{"_id":"125","_rev":"1-312dd8a9dd432168d8608b7cd9eb92cd","type":"project","code":"p003"}},
  {"id":"1235","key":["worksite_1",1,"s"],"value":{"_id":"790"},"doc":{"_id":"790","_rev":"1-be018df4ecdf2e6add68a2758b9bd12a","type":"signature","user":"carol","pubkey":"tlmg90834kmn90845kjndf98734","code":"s002"}}
]}

If you then change the start and end keys to ["worksite_2", 0] and ["worksite_2", 9999] you will see the data for the second work site.

For this to work, each time you have written a new document and signature to the database, you'll need to:

prepare an object { "timestamp": "20180921091502", "project_id": "123", "document_id": "457", "signature_id": "791" }
get the corresponding work site log record
append the object to the documents array
put back the altered work site log record

I assumed there are multiple signatures per document, so you'll have to write a log record for each and every one of them. If that grows too big you can change worksite_id to something like worksite_1_201812, which would give one log per work site per month with out breaking the query logic, I think.

CouchDB indexes to connect the dots between documents

2 Answers2