MongoDB MQL, split list in two and get only unique values

I’m having a books collection in MongoDB.

Each has a categories list up to 2 entries, example :

categories: [
   'Thriller'
]

or

categories: [
   'Adventure',
   'Action'
]

Using MongoDB MQL I need to get two categories lists filtered to project :

  • first list with uniques first values of categories
  • second list with uniques second values of categories

I can only use $group, $addToSet and $arrayElemAt (no $unwind).

Here where I am and I can’t find the way to do it :

collection.aggregate(
    [
        {"$group":
            {
                "_id": "$_id",
                "categories1" : { "$addToSet": { "$arrayElemAt": [ "$categories", 0 ] } },
                "categories2" : { "$addToSet": { "$arrayElemAt": [ "$categories", 1 ] } }
            }
        }
    ]
)

Example of entry in books collection :

{
    _id: 2,
    title: 'HARRY POTTER A L'ECOLE DES SORCIERS - ILLUSTRE PAR MINALIMA',
    isbn: '2075145938',
    pageCount: 368,
    publishedDate: ISODate('2020-10-22T08:00:00.000Z'),
    shortDescription: 'Découvrez ou redécouvrez le texte intégral...',
    status: 'PUBLISH',
    authors: [
        'J.K. Rowling',
        'Minalima'
    ],
    categories: [
        'Youth',
        'Adventure'
    ]
}

Expected output :

{
    categories1 : [
        'Youth',
        'Thriller',
        'Newspaper'],
    categories2 : [
        'Adventure',
        'Newspaper',
        'Essai'],

}

categories1 include only unique values from categories with 0 index (first value) and categories2 include only unique values from categories but this time in second position (index 1).

Any idea ?

Thanks!

  • 1

    Can you post a full sample json document and expected output instead of the current fragmented fields?

    – 

  • Wrt “second list with uniques second values of categories” – so what happens for the book with categories: [ 'Thriller' ] ? Should ‘Thriller’ appear in both lists (as unique 2nd) or only in the first, if not book has it as a second category?

    – 




  • 1

    I edited the question. @aneroid ‘Thriller’ must appears in the categories1 list, but if any book has ‘Thriller’ has a second category this value will also be present into categories2 list, but for both lists only one time (unique).

    – 

Your query is almost correct. You only need to change the _id for $group. Since you want the unique lists across all books, don’t group on $_id – that would treat each object/book individually. Use null to group across all books in the collection:

db.collection.aggregate(
    [
        {"$group":
            {
                "_id": null,  // this is the only line I changed
                "categories1" : { "$addToSet": { "$arrayElemAt": [ "$categories", 0 ] } },
                "categories2" : { "$addToSet": { "$arrayElemAt": [ "$categories", 1 ] } }
            }
        },
        { "$project": { "_id": 0 } }
    ]
)

And since you don’t want _id: null in the result, I’ve added a second stage $project in the pipeline.

Btw, if you’re putting that aggregation pipeline directly in Python, change the "_id": null to "_id": None

Mongo Playground

example data:

[
  {
    title: "book one title",
    categories: [ "Thriller" ]
  },
  {
    title: "book TWO title",
    categories: [ "Adventure", "Action" ]
  },
  {
    title: "book 3 title",
    categories: [ "Action", "Musical", "Animated" ]
  },
  {
    title: "fourth book",
    categories: [ "Thriller", "Musical" ] // both of these are duplicate categories
  }
]

Output:

[
  {
    "categories1": [ "Thriller", "Adventure", "Action" ],
    "categories2": [ "Musical", "Action" ]
  }
]

Leave a Comment