Get count of Array Sizes by Ranges

Question

I have a collection (let's call it 'AwesomeCollection') that contains documents that looks like this:

{
  "_id" : "someID",
  "DateAdded" : ISODate("2017-06-08T22:35:43.517Z"),
  "Info" : {
    "size" : NumberLong(32530454),
    "filtype" : "APK"
  },
  "ARRAY_NODE" : [{
      "key1" : "val1"
    }, {
      "key2" : "val2"
    }, {
      "key3" : "val3"
    }]
}

ARRAY_NODE is an array field which can contain anywhere from 1 to 200 items.

I want to write a query that returns count of documents in AwesomeCollection for each of these categories:

Where ARRAY_NODE size < 50
Where ARRAY_NODE 50 =< size < 100
Where ARRAY_NODE 100 =< size < 150
Where ARRAY_NODE 150 =< size < 200

Neil Lunn · Answer 1 · 2017-06-30T02:26:59.747

You want the $switch statement in an aggregation pipeline:

db.collection.aggregate([
  { "$group": {
    "_id": {
      "$let": {
        "vars": {
          "length": { 
            "$cond": {
              "if": { "$isArray": "$ARRAY_NODE" },
              "then": { "$size": "$ARRAY_NODE" },
              "else": 0
            }
          }
        },
        "in": {
          "$switch": {
            "branches": [
              { "case": { "$lt": [ "$$length", 50 ] }, "then": "< 50" },
              { "case": { "$lt": [ "$$length", 100 ] }, "then": ">= 50 < 100" },
              { "case": { "$lt": [ "$$length", 150 ] }, "then": ">= 100 < 150" },
              { "case": { "$lt": [ "$$length", 200 ] }, "then": ">= 150 < 200" },
            ],
            "default": "> 200"
          }
        }
      }
    },
    "count": { "$sum": 1 }
  }}
])

As shown we shorten the syntax by declaring first in $let, in order to get the $size of the array for each count.

There is a $bucket aggregation pipeline stage that is basically a "shortcut" for creating a similar statement, but it's usage is a "convenience" so it's not exactly the same output:

db.collection.aggregate([
  { "$bucket": {
    "groupBy": { 
      "$cond": {
        "if": { "$isArray": "$ARRAY_NODE" },
        "then": { "$size": "$ARRAY_NODE" },
        "else": 0
      }
    },
    "boundaries": [0, 49, 99, 149, 199],
    "default": "> 200"
  }}
])

If you really need the full labels, then use the full form with $switch. Also note that the "ranges" are of course also an convenience function and therefore follow their own strict logic. When your logic differs it is again better to write the $switch in full.

Also note that where no such field is actually present in the document, then it is necessary to use $isArray or $ifNull in earlier versions ( as shown later ) in order to do a logical test to return a default value of 0 to indicate the "length". Otherwise the $size operator, which is expecting an array will produce an error:

The argument to $size must be an Array, but was of type: EOO

Either handle logically or return an empty array to $size as is demonstrated on the answer to : The argument to $size must be an Array, but was of type: EOO

This was always and still is possible using the $cond operator as well, but it's just that the syntax as a "ternary" operator, is nested rather than the cleaner form with $switch

db.collection.aggregate([
  { "$group": {
    "_id": {
      "$let": {
        "vars": {
          "length": { 
            "$cond": {
              "if": { "$ifNull": [ "$ARRAY_NODE", false ] },
              "then": { "$size": "$ARRAY_NODE" },
              "else": 0
            }
          }
        },
        "in": {
          "$cond": {
            "if": { "$lt": [ "$$length", 50 ] },
            "then": "< 50",
            "else": {
              "$cond": {
                "if": { "$lt": [ "$$length", 100 ] },
                "then": "> 50 < 100",
                "else": {
                  "$cond": {
                    "if": { "$lt": [ "$$length", 150 ] },
                    "then": ">100 < 150",
                    "else": {
                      "$cond": {
                        "if": { "$lt": [ "$$length", 200 ] },
                        "then": "> 150 < 200",
                        "else": "> 200"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
])

As a demonstration. Insert some documents to a collection with different array sizes:

// Insert documents with arrays of given lengths
db.collection.insertMany(
  [
    5,40,      //        < 50  count 2
    70,        //  >= 50 < 100 count 1
    120,130,   // >= 100 < 150 count 2
    170,       // >= 150 < 200 count 1
    210        // > 200        count 1
  ].map( n => 
    ({ "ARRAY_NODE": Array.apply(null,Array(n)).map(() => ({})) }) )
)

Then run any of the aggregation statements to produce the results:

{  "_id" : "< 50", "count" : 2  }
{  "_id" : ">= 50 < 100", "count" : 1 }
{  "_id" : ">= 100 < 150", "count" : 2 }
{  "_id" : ">= 150 < 200", "count" : 1 }
{  "_id" : "> 200", "count" : 1 }

OK what if I want to check for more conditions? e.g. DateAdded: {$gt: ISODate('2017-06-20')}. Where in the aggregation would that be inserted? — IsaacBok, Jun 29 '17 at 23:19
@IsaacBok If you have another question then [Ask a new Question](https://stackoverflow.com/questions/ask) and make it clear and self contained. General query selections can be added as an initial pipeline stage with `$match`. — Neil Lunn, Jun 29 '17 at 23:22
I get { $err: \"The argument to $size must be an Array, but was of type: EOO\", code: 17124} even though ARRAY_NODE is an array... — IsaacBok, Jun 29 '17 at 23:36
@IsaacBok Actually it means that is not an array, and therefore some of your documents likely have no such array in them, we can filter that out of course. See the additional usage of `$ifNull` and `$cond` to replace missing arrays with `0` for size. — Neil Lunn, Jun 29 '17 at 23:38
@IsaacBok You are being very silent. Your profile here shows a bit of a bad history of unaccepted answers so I feel you need some prodding here. You have been given an answer that meets the specifications of the "question you actually asked". The document in the question shows an array, and I have also shown you how to deal with documents that are missing this array. You have two versions of the statement, being one for a latest release MongoDB and one compatible with older versions. Both produce the result asked for. — Neil Lunn, Jun 30 '17 at 00:06

Get count of Array Sizes by Ranges

1 Answers1