I’m having a books collection in MongoDB.
Each has a categories list up to 2 entries, example :
categories: [
'Thriller'
]
or
categories: [
'Adventure',
'Action'
]
Using MongoDB MQL I need to get two categories lists filtered to project :
- first list with uniques first values of categories
- second list with uniques second values of categories
I can only use $group, $addToSet and $arrayElemAt (no $unwind).
Here where I am and I can’t find the way to do it :
collection.aggregate(
[
{"$group":
{
"_id": "$_id",
"categories1" : { "$addToSet": { "$arrayElemAt": [ "$categories", 0 ] } },
"categories2" : { "$addToSet": { "$arrayElemAt": [ "$categories", 1 ] } }
}
}
]
)
Example of entry in books collection :
{
_id: 2,
title: 'HARRY POTTER A L'ECOLE DES SORCIERS - ILLUSTRE PAR MINALIMA',
isbn: '2075145938',
pageCount: 368,
publishedDate: ISODate('2020-10-22T08:00:00.000Z'),
shortDescription: 'Découvrez ou redécouvrez le texte intégral...',
status: 'PUBLISH',
authors: [
'J.K. Rowling',
'Minalima'
],
categories: [
'Youth',
'Adventure'
]
}
Expected output :
{
categories1 : [
'Youth',
'Thriller',
'Newspaper'],
categories2 : [
'Adventure',
'Newspaper',
'Essai'],
}
categories1 include only unique values from categories with 0 index (first value) and categories2 include only unique values from categories but this time in second position (index 1).
Any idea ?
Thanks!
Your query is almost correct. You only need to change the _id
for $group
. Since you want the unique lists across all books, don’t group on $_id
– that would treat each object/book individually. Use null
to group across all books in the collection:
db.collection.aggregate(
[
{"$group":
{
"_id": null, // this is the only line I changed
"categories1" : { "$addToSet": { "$arrayElemAt": [ "$categories", 0 ] } },
"categories2" : { "$addToSet": { "$arrayElemAt": [ "$categories", 1 ] } }
}
},
{ "$project": { "_id": 0 } }
]
)
And since you don’t want _id: null
in the result, I’ve added a second stage $project
in the pipeline.
Btw, if you’re putting that aggregation pipeline directly in Python, change the "_id": null
to "_id": None
example data:
[
{
title: "book one title",
categories: [ "Thriller" ]
},
{
title: "book TWO title",
categories: [ "Adventure", "Action" ]
},
{
title: "book 3 title",
categories: [ "Action", "Musical", "Animated" ]
},
{
title: "fourth book",
categories: [ "Thriller", "Musical" ] // both of these are duplicate categories
}
]
Output:
[
{
"categories1": [ "Thriller", "Adventure", "Action" ],
"categories2": [ "Musical", "Action" ]
}
]
Can you post a full sample json document and expected output instead of the current fragmented fields?
Wrt “second list with uniques second values of categories” – so what happens for the book with
categories: [ 'Thriller' ]
? Should ‘Thriller’ appear in both lists (as unique 2nd) or only in the first, if not book has it as a second category?I edited the question. @aneroid ‘Thriller’ must appears in the categories1 list, but if any book has ‘Thriller’ has a second category this value will also be present into categories2 list, but for both lists only one time (unique).