Why do my MongoDB simultaneous $push updates fail?

Question

I'm performing a number of updates of the form

update(
  { "uuid": someUuid, "revision.versionNumber": someVersionNumber},
  { "$set": { "meta.someId": someId }, "$push": { "meta.someMessages": someMessage } }
)

occasionally I see when this is called for the same uuid,versionNumber, & someId with a different someMessage the first update will succeed but the second will fail silently.

I see the following in the mongo logs so I know the updates are making it to the database, notice that the first update has the same query as the third one but the first has nupdated: 1 while the third has nupdated: 0

Wed Aug 28 14:50:24 [conn18] update some-db.some_collection query: { uuid: "b841f303-a054-4eb9-8885-9d3ebf9906a1", revision.versionNumber: 9 } update: { $set: { meta.someId: "521e6fe4036420f90371a922" }, $push: { meta.someMessages: { event: "instance.complete", timestamp: new Date(1377726624985) } } } nscanned:2507 nmoved:1 nupdated:1 keyUpdates:0 numYields: 19 locks(micros) w:6010 9ms
Wed Aug 28 14:50:24 [conn18] run command some-db.$cmd { getlasterror: 1, fsync: true }
Wed Aug 28 14:50:24 [conn14] update some-db.some_collection query: { uuid: "843f424d-8a62-4a8b-853f-dc2e9c42b309", revision.versionNumber: { $lt: 10 }, meta.deleted: true } update: { $set: { meta.deleted: false } } nscanned:3243 nupdated:0 keyUpdates:0 numYields: 23 locks(micros) w:8431 11ms
Wed Aug 28 14:50:24 [conn14] run command some-db.$cmd { getlasterror: 1, fsync: true }
Wed Aug 28 14:50:24 [conn5] update some-db.some_collection query: { uuid: "b841f303-a054-4eb9-8885-9d3ebf9906a1", revision.versionNumber: 9 } update: { $set: { meta.someId: "521e6fe4036420f90371a922" }, $push: { meta.someMessages: { event: "instance.complete.success", timestamp: new Date(1377726624985) } } } nscanned:3242 nupdated:0 keyUpdates:0 numYields: 20 locks(micros) w:5684 9ms

Also here is the output from mongosniff

update  flags:0 q:{ uuid: "85700d8c-8946-4b09-968b-968f76d31028", revision.versionNumber: 13 } o:{ $set: { meta.someId: "521e7b12036420f90371b515" }, $push: { meta.someMessages: { event: "instance.complete", timestamp: new Date(1377729439093) } } }
319 some-db.some_collection

    update  flags:0 q:{ uuid: "a460019d-443b-4b59-b23e-1eae19e26c31", revision.versionNumber: 14 } o:{ $set: { meta.someId: "521e7b2f036420f90371b579" }, $push: { meta.someMessages: { event: "task.start", timestamp: new Date(1377729439093) } } }
123 some-db.some_collection

    query: { uuid: "a2558f5c-d825-4ec4-bbc4-7e48b1cb3c60", isLatest: true }  ntoreturn: -1 ntoskip: 0
302 some-db.some_collection

    update  flags:0 q:{ uuid: "85700d8c-8946-4b09-968b-968f76d31028", revision.versionNumber: 13 } o:{ $set: { meta.someId: "521e7b12036420f90371b515" }, $push: { meta.someMessages: { event: "instance.complete.success", timestamp: new Date(1377729439093) } } }
173 some-db.some_collection

@AsyaKamsky there is not. Would that make a difference and why? — soldier.moth, Aug 30 '13 at 17:19
I think it would - notice that the update that "got" there first grew the document which caused it to be moved on disk "nmoved:1" which means that depending on how the other update was scanning the collection it's possible it "missed" the document (both processes yield periodically, which means the state of the world could change: numYields: 20) The index would also help with slowness of the update - you are scanning over 2500 document to find 1 to update, with index the nscanned will be much lower and both updates will be guaranteed to "traverse" the index in the same order. — Asya Kamsky, Aug 30 '13 at 17:58
couldn't it be scanning on _id? A move shouldn't effect it. What is the result of explain()? What driver are you using and are you doing asynchronous calls? — Dylan Tong, Aug 31 '13 at 07:11
@DylanTong no, it would scan in natural order, _id index would only be used if you are using _id field in the query part. — Asya Kamsky, Sep 01 '13 at 17:03
@soldier.moth what does revision represent? shouldn't it be incremented when you update the document if that's what it means, that document was changed? — Asya Kamsky, Sep 01 '13 at 17:04
@AsyaKamsky That's not what it represents, once a document w/ a uuid & revision is saved those values don't change — soldier.moth, Sep 01 '13 at 20:43
thanks. i'd be curious to know if adding an index on uuid (or more likely you would want a compound index on uuid, revision.version for best selectivity/performance) makes this occurrence not happen. I wasn't able to reproduce this with trivially small example, but if this only happens occassionally there could be multiple states that have to align for this to happen. — Asya Kamsky, Sep 01 '13 at 21:03
writeConcern is not relevant here - the logs clearly show the operations are being executed. — Asya Kamsky, Dec 20 '13 at 03:56
@AsyaKamsky are you saying that updates with nmoved>0 could potentially "hide" documents from other simultaneous updates that yield? — Avish, Jan 28 '14 at 07:11
it's a data race situation in very specific set of circumstances but yes, that's exactly what I'm saying. — Asya Kamsky, Jan 28 '14 at 08:43
Are you sharding the data or using replicas? We have seen a lot of data inconsistency issues with sharding. I believe @AsyaKamsky is on to something with the "nmoved". It could have been moved due to sharding or due to it's size exceeding the pre-determined limit in mongod and then the next request missed it due to some bug in mongod. Even on a single mongod with no replicas or sharding we have found that even setting j and w do not guarantee that a write will be visible 100% of the time. — bridiver, Feb 09 '14 at 17:20
@bridiver I'm not sure what you mean by "data inconsistency issues" with sharding - on this example clearly both updates ran on the same 'mongod' since they are in the same log. nmoved:1 never has anything to do with sharding. As far as setting w and j and writes not being visible, it sounds like maybe you're doing something wrong. In this case the write did not happen. Not that it happened and then disappeared. Feel free to ping me off-SO if you'd like to get to the bottom of the issues you were seeing, but it sounds like they were quite different than what this question describes. — Asya Kamsky, Feb 09 '14 at 22:54
@AsyaKamsky, we have seen different results on the console vs the API for the same query and no, we are not doing anything wrong with w and j. We have a lot of experience with Mongo, we've filed numerous bugs and been to the 10gen office in NY to try to resolve our issues. Our experience has been that the product has a lot of bugs and I believe the issue that soldier.moth is seeing is also likely a bug. — bridiver, Feb 10 '14 at 13:08
@bridiver you can call it a design bug, but the behavior of an unindexed scan does not represent a point in time and therefore is not guaranteed to visit every document - in particular if other threads are changing those documents. In this case it's that's what's happening according to the logs. Can you drop me an email with the bugs that you ran into (that are filed in jira?) — Asya Kamsky, Feb 11 '14 at 02:23

score 2 · Answer 1 · answered Feb 26 '14 at 04:51

As a workaround for this bug, I recommend using findAndModify and the checking the results to make sure your update happened.

    dbCollection.findAndModify(
{ "uuid": someUuid, "revision.versionNumber": someVersionNumber},
[], { "$set": { "meta.someId": someId }, "$push": { "meta.someMessages": someMessage } }, {safe: true, 'new' : true}, function(err, updated){
     if(err){
     //handle the error
    }
    if(updated.meta.someMessages doesn't contain your message) {
     //try it again or report it to the client
    }     
    });

Why do my MongoDB simultaneous $push updates fail?

1 Answers1