Using the MongoDB aggregation pipeline

I find myself using the aggregation pipeline when I need to operate on the contents of documents. This might mean changing, adding, modifying, or extracting. I also use it anytime the commands are complex, since it helps to organize things by breaking them up into pieces. I use mongodb aggregation for quering the data used to build my data visuals that you can find here: Map Dashboards.

The Aggregation pipeline is composed of stages and operators

Each stage has a different general purpose, and uses operators within the stage that determine specifically what it does. The operators can be stage specific or act differently in a particular stage.

Stage Operators access field paths with the $ symbol

A pipeline stage can access fields directly, like with match:

{"$match": { "path.to.Date":  { $eq: ISODate('2003-01-31') } } }  

But a pipeline operator needs field paths prefaced with $
This example would be creating a new month field using the $month operator acting on an ISODate field. It needs “$fieldpath” or it will not be able to access it.

"month": { $month: "$path.to.Date" },          

Variables are accessed with $$ symbol

Some pipeline operators allow you to set a field path as the variable to work on. They are then accessed with $$name you gave to the path within the operator.

You can compose aggregation commands

The mongo command line uses JS constructs so you can break up stages into separate objects and then pass those to an aggregation command. You don’t need to, but with large commands it is easier to read, and
can provide re-usable pieces.
selectdocs = {$match: {‘NAME’: ‘something’}};
selectRange = { $match: {$and: [ { year: { $gt: 2006, $lt: 2011 }}, { month {$gt: 3, $lt: 8 } }] }}; selectFields = { $project: { varname: 1, obs: { $size: “$allvals” } } }; Put the stages together into an aggregation command
db.monthly_data.aggregate([
justNYdocs,
selectFields,
selectRange
]).pretty()

How some specific stages work

Here are a few more details on some specific aggregation pipeline stages that I have found useful.

$match stages selects documents

The aggregation pipeline gets all documents from the collection to start with. $match allows selecting certain documents much like the find() command.
Find and match statements should be the same
Be aware that because it is an aggregate pipeline stage there could be a difference between find/match, though I have not run into any.

{"$match": { "Date":  { $eq: ISODate('2003-01-31') } } }  
.find( {"Date" : { $eq: ISODate("2000-01-31") } }  

A more complex match command

If you are familiar with find, match should be no different. This command looks for documents with price above and below thresholds for different metrics.

matchrange = { $match:  
  {$and: [  
    { $or: [ {"price.total": {$lt: 40000}}, {"price.total": {$gt: 600000}} ] },  
    { $or: [ {"price.unit": {$lt: 30}}, {"price.unit": {$gt: 800}} ] },  
  ]}
}

$project stage determines what to output

It can select fields, or create new ones. When I want to create a new field however I typically use the $addFields stage because if you add a single field with $project it excludes all the others by default.
Select fields to output

project = {"$project": { GEO_ID: 1, NAME: 1, "pricedata.ZHVI_AllHomes": 1 } }  

Project a new output field created by filtering an array The path.to.pricedata path would have to be an array, and in this example would be filtered by using pricedata.unit. Note that with a long input path, using ‘as’, renaming, and then referring to that variable with $$ is a useful feature.

 {$project: {  
   'filterprice': {   
        $filter: {  
            input: '$path.to.pricedata',  
            as: 'data',  
            cond: {$ne: ["$$data.unit", null]}  
      }},
   }},

Count the number of values in an array using project/size -

{"$project": { obs: { $size: "$allvals" } } }  

Exclude field -
Use 0 instead of 1 for the argument. The project stage will exclude the id field but include all others.

 {"$project": { val: 0}} 

$addFields is specifically for adding fields

addFields allows you to add fields into documents without implying that you are selecting those fields like $project. This example uses the $month and $year operators to extract those value from a date, and will add them into the document.

{"$addFields": {    
   "month": { $month: "$Date" },          
   "year": { $year: "$Date" },   
    }
}

$unwind -

It is for breaking down an array of something such that each value in the array is now individual document. I usually use this to take a large object, decompose it into an array, and then turn each key/value pair in the array into documents so that there are much simpler documents to work with.
$project decomposes the object with $objectToArray -

  {"$project":    {"arrayofkeyvalue":   {"$objectToArray":"$pricedata"}   }},    

Then $unwind would take that array and transform it into objects-

  {"$unwind":"$arrayofkeyvalue"},      

$group -

Allows summarizing based on grouping by field.
The _id argument is really important to how $group operates -
It determines how it groups, and also how accumulators/operators act.
null - If _id is set to null, lumps everything together. If a field path is specified, it groups based on the values in that.
Using it to get unique variables names -
First I use the unwind process from above to get object with a single key. Then $addToSet adds all the keys, but not duplicates, resulting in a list of unique keys.

 {"$group":  
    {"_id":null,  
      "allkeys":{"$addToSet":"$arrayofkeyvalue.k"}  
  }},

Getting all the values for specific keys-
Take a large object, decompose into array and then single key objects

 {"$project":    {"arrayofkeyvalue":   {"$objectToArray":"$pricedata"}   }},    
 {"$unwind":"$arrayofkeyvalue"},    

Then use $group with the object keys as argument to _id -
It sorts values by the keys, then for each key takes the first key ($first) to set the name, and all the matching values, even duplicates ($push).

{"$group":         
   {"_id":"$arrayofkeyvalue.k",   
    "varname":{"$first": "$arrayofkeyvalue.k"},     
    "allvals":{"$push":"$arrayofkeyvalue.v"}     
  }},

Some generally useful aggregation operators

There are many operators for many purposes, but a couple come to mind that might see frequent use.

map()/filter() -
Two of the standard array operators to loop over and select or perform operations on each member. There are other array operators like $in, $reduce, $size

Date operators -
If you need time/date operations, as long as you have the info in ISODate() format these operators ($year, $dayOfMonth, $hour, $second, …) should be able to do most of the manipulations needed.

Type conversion -
Operators are available for numeric, string, date conversion and checking types.