Context: I have created a service that deals with users data. I can not modify source and order of data in upstream system (usually it is a partial snapshot of a user or a delta (e.g. user’s new post). I also must deal with duplicated information. I modeled my persistent layer around document db (MongoDB) using a single ‘users’ collection because I deal with independent users that do not have any relation, I don’t need to update multiple tables and my read model is similar to my document’s schema. I can always identify a user by his/her _id and login so whenever I need to update user’s info I can update just a single document. Moreover usually I don’t want to override user’s data, but rather append it so a document creates a user’s log. I just update it in the very rare case (when the data is nullable but will be updated only once (details) or I don’t care about intermediate state (details.verified).
Problem: Whenever I receive an info about a given user, after matching it I want to check if the user should be modified and update it in a single atomic query/update. The problem is I may have several fields or array entries to check.
In a given example I want to:
- match a user with _id 123
- add location entry to locations array only if in the last one location differs than the the one in the query.
- A profile is very complex, thus I introduced a hash. I want to add a new profile entry only if the last hash differs than the one provided in the query.
- Add a new post to posts array only if it does not contain a post with given postId.
Another problem is that I can have e.g. just an information about a new post (that can also be a duplicate) for a given user, but it doesn’t exist yet thus I need to upsert it.
So I want to cover few use cases that will update user’s data in a single query, e.g.:
- a user not exist and we add it’s details
- we append a new post that is not yet persisted
- we are trying to append a post that was already saved
- we append a new post and check if location hasn’t changed (and update it if needed)
- we check if latest user’s profile has changed by comparing hash and update it if needed and at the same time add info to user’s details.
Updates based on a single criteria seems to be strait forward. Without aggregation I can update a document with only one match criteria. Aggregation pipeline can contain several stages, but I am not certain I can use push stage to append to an array if certain criteria is met without a group stage. Upsert with conditional push is also problematic. I end up by ignoring DuplicateKeyException (so far without aggregation, because my service’s logic relay on information if we deal with a duplicate). Maybe you know a better solution.
https://www.mongodb.com/docs/manual/reference/operator/aggregation/push/
A question if this approach is doable? I want to avoid replacing a document. I would be appreciate for some guideline and/or example queries.
A simplified document:
[
{
"_id": 123,
"login": "john",
"details": {
"a": "abc",
"verified": true
},
"locations": [
{
"location": {
"country": "UK",
"region": "foo",
"city": "bar"
},
"createdAt": ISODate("2020-01-22T21:21:41.052Z")
}
],
"profiles": [
{
"preference": {
"a": "foo",
"b": "bar"
},
"details": {
"a": "foo",
"b": "bar"
},
"hash": "hash123",
"createdAt": ISODate("2020-01-22T21:21:41.052Z")
}
],
"posts": [
{
"postId": 234,
"message": "foo bar",
"createdAt": ISODate("2020-01-22T21:21:41.052Z")
},
{
"postId": 345,
"message": "bar foo",
"createdAt": ISODate("2020-01-22T21:21:41.052Z")
}
]
}
]
Your 3 updates of locations
, profiles
, and posts
shared the same update pattern and can be solved with the below update:
- assign a variable of your input with
$let
- perform conditional checking with
$cond
; update or leave the current array as-is based on the result. If update is needed,$concatArrays
with current array and a single-sized array which holds your input variable.
It may seem abstract so here is a concrete example of locations
{
"$set": {
"locations": {
"$let": {
"vars": {
// your location input here
"locationInput": {
"country": "TEST",
"region": "test",
"city": "test"
}
},
"in": {
"$cond": {
"if": {
$ne: [
"$$locationInput",
// last location
{
$last: "$locations.location"
}
]
},
"then": {
"$concatArrays": [
"$locations",
[
"$$locationInput"
]
]
},
"else": "$locations"
}
}
}
}
}
}
Then just repeat the $set
3 times for 3 cases. In the playground, they are separated for clarity / readability. You can put them into a single $set
if preferred.