I am trying to write a resolver which gets the accounts belonging to a customer. The parent.accounts
attribute is an array of account_id
s. In the code below, I attempt to get an array of accounts for each customer.
import Account from "../models/accounts.js";
import Customer from "../models/customers.js";
import { AccountInterface } from "../types/accounts.js";
import { CustomerInterface } from "../types/customers.js";
const resolvers = {
Query: {
// some other code
customers: async () => await Customer.find(),
},
Customer: {
accounts: async (parent: CustomerInterface) => {
const response = await Account.find()
.where("account_id")
.in(parent.accounts);
return response;
},
},
};
export default resolvers;
The code runs and returns the expected output. The problem is that it is too slow since the resolver queries the database for each element in the parents.accounts
array and it becomes increasingly slow when performing ‘get all’ requests from the db.
Is there a better way to write resolvers for nested queries? and if yes, how?
The Problem:
It seems like a scaling issue to me. This could happen with any technology where you attempt to load this much data at once without limitations. If you allow the user to find customers and then accounts without adding any sort of limiting/max-complexity to your queries the user could ask for huge amounts of data from your server, and it will likely have to query over much more than it returns, one problem with this as you have noticed is likely performance the other issue is that of network bandwidth and that could be costing you more than just time.
Say as an estimate all you have is 10,000 customers and each customer has 3 accounts. Then you have 10,000 * 3 = 30,000 accounts in your accounts table. This makes your get all query search over 10,000 customers then for each of those 10,000 customers it will search over the 30,000 accounts (unless you index on your fields then it will likely be far better, more on this in the indexing solution). That means you just searched over 10,000 * 30,000 = 300,000,000 scanned entries and you would end up returning 30,000 account entries. The problem would be much worse when either of these is any order of magnitude larger (a.k.a if you added one or more 0s to the end of either accounts or customers).
If you instead could limit the number of customers to 100 at a time you would only search over 100 customers * 30,000 accounts = 3,000,000 scanned items total which is much better (again the number of accounts scanned is determined by whether you index your fields, so in practice with indexes this number would be more like in the 100s or 1000s of scanned items.) It would also only return 100 customers * 3 accounts per customer = 300 accounts.
The Solutions
There are a couple different solutions I would suggest for what would help you.
Query Limiting
There’s a choice between two solutions people generally pick to limit the query without erroring/failing outright, and which one you pick is dependent on your use case. With this you would likely want to limit the number of any collection/list that you are returning from the db in your current case both accounts and users.
- Paginate with skip & limit using mongoose. This ones a lot simpler then the alternative though not as robust.
- Paginate with cursors. Implementing relay connections is probably your best bet here as this is the most standard way to implement cursor pagination even if you don’t use react-relay on your front end. For help with understanding paginating with cursors and relay’s section on how it expects connections to be done on the server. The relay connections specification is a great way to learn to implement relay connections. Last of all if you don’t want to implement this yourself you could always check out my own mongoose library which implements relay connections I wrote which is MIT licensed, or just look at it to get some ideas.
Adding Max Query Complexity
I would also suggest on top of query limiting you error out when over a max query complexity is reached in your graphql schema. There are likely helper libraries in your graphql framework of choice to help you do this once you have limiting in place. These libraries will take in your query and before it is even run decide if it meets the max complexity requirements. The purpose of this is not to limit what is returned, but to error out early if the client tries to request too much data (think for example like denial of service attacks where the client may be a bad actor.)
Indexing your Collection
This can vastly improve your search speed. Think instead of O(n) search times it will cut them down to O(log n) search times. This is a huge improvement over non-indexed queries. So, if the other two problems don’t do enough to fix query times this may be a pretty vital option. You usually only use a index on any field you are searching for its value. For example your account’s account_id
field is where you would want.