I am having trouble with an aggregation involving an ObjectId. These are the pipeline stages:

{'$match' : {'likes.id' : ObjectId('50e99acfb35de75402002023')}}
{'$project' : {'likes.id' : 1, '_id' : 0}}
{'$unwind' : '$likes'}
{'$group' : {'_id' : '$likes.id', 'count' : {'$sum':1}}}
{'$sort' : {'_id' : 1}}

My attempt to write it in R using rmongodb is:

pipe_1 <- mongo.bson.from.JSON('{"$match" : {"likes.id" : { "$oid" : "50e99acfb35de75402002023" }}}')
pipe_2 <- mongo.bson.from.JSON('{"$project" : {"likes.id" : 1, "_id" : 0}}')
pipe_3 <- mongo.bson.from.JSON('{"$unwind" : "$likes"}')
pipe_4 <- mongo.bson.from.JSON('{"$group" : {"_id" : "$likes.id", "count" : {"$sum":1}}}')
pipe_5 <- mongo.bson.from.JSON('{"$sort" : {"count" : 1}}')
pipes <- list(pipe_1,pipe_2,pipe_3,pipe_4,pipe_5)
result <- mongo.aggregation(mongo, ns = "analytics.analytics_profiles", pipeline =pipes)

Which returns

mongoDB error: 10

which corresponds to the BSON invalid error code.

I think the problem is with the match by ObjectId: the first pipeline stage on its own gives the same error.

How can I fix this?

Extra: how can this be done using mongolite instead?

ami232

1 Answer

You really should not use "dot notation" on an "array" key in the aggregation pipeline, although what you are doing is still perfectly valid. You can, however, reduce the array elements to just the "id" values with $project.

It also looks like you might need to construct the BSON for matching the ObjectId separately:

oid <- mongo.oid.from.string("50e99acfb35de75402002023")
pipe_1 <- mongo.bson.from.list(list('$match' = list('likes.id' =  oid)))
pipe_2 <- mongo.bson.from.JSON('{"$project" : {"likes" : "$likes.id", "_id" : 0}}')
pipe_3 <- mongo.bson.from.JSON('{"$unwind" : "$likes"}')
pipe_4 <- mongo.bson.from.list(list('$match' = list('likes' =  oid)))
pipe_5 <- mongo.bson.from.JSON('{"$group" : {"_id" : "$likes", "count" : {"$sum":1}}}')
pipe_6 <- mongo.bson.from.JSON('{"$sort" : {"count" : 1}}')

That now makes "likes" an array of just values and not a "key/value" pair. So you don't need "$likes.id" in later stages. Just reference by "$likes".

--

For the record, I went through this with a sample document in a collection like the one you seem to have defined:

{
    "_id" : ObjectId("50e99acfb35de75402002023"),
    "likes" : [
            {
                    "id" : ObjectId("50e99acfb35de75402002023")
            },
            {
                    "id" : ObjectId("50e99acfb35de75402002023")
            },
            {
                    "id" : ObjectId("50e99acfb35de75402002023")
            },
            {
                    "id" : ObjectId("50e99acfb35de75402002023")
            }
    ]
}

Then I actually defined the pipeline in R using the bson.from.list constructors like so:

pipeline <- list(
    mongo.bson.from.list(list(
        '$match' =  list( 
           'likes.id' = mongo.oid.from.string("50e99acfb35de75402002023")
         )
    )),
    mongo.bson.from.list(list(
        '$project' = list(
            '_id' = 0,
            'likes' = '$likes.id'
        )
    )),
    mongo.bson.from.list(list(
        '$unwind' = '$likes'
    )),
    mongo.bson.from.list(list(
        '$match' =  list( 
           'likes' = mongo.oid.from.string("50e99acfb35de75402002023")
         )
    )),
    mongo.bson.from.list(list(
        '$group' = list(
            '_id' = '$likes',
            'count' = list( '$sum' = 1 )
        )
    )),
    mongo.bson.from.list(list(
        '$sort' = list( 'count' = 1 )
    ))
)

mongo.aggregation(mongo, "test.posts", pipeline)

And for me that correctly counts all matching entries within the array.

Also "note" the additional match stage here after $unwind. The first $match in aggregation matches the "document", but this does nothing to "filter" the array content, so items in the array still contain things that do not match the "id" value you asked for.

So after processing $unwind you need to "filter" with $match again once the array has been denormalized. There are actually more efficient ways of doing this, and they are well documented on this site: Retrieve only the queried element in an object array in MongoDB collection

But you should also really be using bson.from.list and the general list() constructors for the structure rather than converting from JSON.
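On the "Extra" part of the question: as far as I know, mongolite takes the whole pipeline as a single JSON array string and accepts the extended-JSON {"$oid": ...} form for an ObjectId, so no separate oid object should be needed. The following is only a sketch under those assumptions. The connection URL and the helper name run_likes_aggregation are hypothetical, the database and collection names are taken from the namespace in the question, and the stages are the ones from the question (so no second $match).

```r
# Sketch only: assumes the mongolite package is installed and that a
# server is reachable at the (hypothetical) URL passed to the helper.
like_id <- "50e99acfb35de75402002023"

# mongolite expects the pipeline as one JSON array string;
# {"$oid": "..."} is the extended-JSON spelling of an ObjectId.
pipeline <- sprintf('[
  {"$match":   {"likes.id": {"$oid": "%s"}}},
  {"$project": {"likes.id": 1, "_id": 0}},
  {"$unwind":  "$likes"},
  {"$group":   {"_id": "$likes.id", "count": {"$sum": 1}}},
  {"$sort":    {"count": 1}}
]', like_id)

# Hypothetical helper: opens a connection and runs the pipeline.
# Nothing touches the database until this function is called.
run_likes_aggregation <- function(url = "mongodb://localhost:27017") {
  con <- mongolite::mongo(collection = "analytics_profiles",
                          db = "analytics", url = url)
  con$aggregate(pipeline)
}
```

Calling run_likes_aggregation() should then return the grouped counts as a data frame, provided the assumptions above hold for your setup.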

Blakes Seven
  • That's great improvement. However, I keep getting the same error. I think the problem is mainly with the first pipeline. –  Jul 17 '15 at 09:21
  • @ALBERTOMARTINIZQUIERDO It wasn't the only problem. The JSON parser does not accept extended JSON syntax? I find that hard to believe. But there is likely another way around that. – Blakes Seven Jul 17 '15 at 09:23
  • @ALBERTOMARTINIZQUIERDO You might have to construct the list as a BSON object rather than use JSON. See `mongo.bson.from.list` as shown. – Blakes Seven Jul 17 '15 at 09:45
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/83528/discussion-between-blakes-seven-and-leonid-beschastny). – Blakes Seven Jul 17 '15 at 09:53
  • `mongo.aggregation` requires the pipeline argument to be a list. – ami232 Jul 17 '15 at 10:33
  • @AlbertoMartínIzquierdo Ah? So what were you doing? Just throwing in all the "pipe" stages as arguments? I suppose I should have asked that. Also what was the finding? If a list then does the `.from.JSON` use the extended syntax? Worth posting as an answer if you have the result. – Blakes Seven Jul 17 '15 at 10:35
  • I'm giving a list of pipelines, each one is a BSON: `pipes – ami232 Jul 17 '15 at 10:38
  • @AlbertoMartínIzquierdo Okay then. What I just alluded to was your actual execution is not part of your question. Do you have a result or not? Does the answer here help or not? If not, then what more is needed? – Blakes Seven Jul 17 '15 at 10:41
  • Maybe I didn't explain myself clearly in the first comment: the pipeline with the oid isn't working, neither your version nor mine. I keep getting the same error. – ami232 Jul 17 '15 at 10:44
  • @AlbertoMartínIzquierdo I'd love to chat but you lack the rep score. Let's be very to the point. Which parts do "not work"? Can you define an "oid" as I suggest? What does the "first" therefore `$match` stage serialize as the way I have presented? Take the time to check and research and then respond. Space is at a premium here, as well as time. So get the information and respond back with that information. – Blakes Seven Jul 17 '15 at 10:56
  • @Blakes Seven, your version should work. In every post here at SO I suggest to use `mongo.bson.from.list` and explain why it's better, but most of the rmongodb users still use `mongo.bson.from.JSON` :-( – Dmitriy Selivanov Jul 18 '15 at 06:50
  • @AlbertoMartínIzquierdo There was a typing mistake I did not notice. That is corrected along with a listing I know works in full and more information for you to look at. – Blakes Seven Jul 19 '15 at 09:24
  • @Blakes Your version using `bson.from.list` actually works; that solves my problem. By the way, I actually want the output without the second `$match` you added. – ami232 Jul 20 '15 at 09:16