GraphQL N+1 Problem

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments

How can I help this using while Prisma? I've found using fragments seems to speed things up quite a bit. Is there anything else I can do?

👍︎︎ 3 👤︎︎ u/sevenyearoldkid 📅︎︎ Dec 16 2019 🗫︎ replies

Helpful!

👍︎︎ 2 👤︎︎ u/suyashgulati 📅︎︎ Dec 16 2019 🗫︎ replies
Captions
so when you're building a graph QL server there is a common issue that comes up called the n plus one problem in this video we're gonna be taking a look at it and how you can solve it so this problem usually comes up when you have a relationship in your schema and you try fetching the data for that so for example here I have a schema with two types type of book and type author now you notice on my type book I have a field here and the author field actually is of the type author and so a lot of times the book and the author are stored in two different database tables or they may be stored in two different databases altogether and so you may fetch them in different ways and so when I fetch books here the query I'm fetching multiple books and I also may need to fetch the author for the books so depending on how I set up and resolve the data for this I can have or run into the n plus one problem and I wanted to make a quick note here I'm using the Apollo syntax to actually create my schema but it doesn't matter if I choose to use something else like type graphic ul to create the actual schema usually the problem stems from the relationship itself when I'm fetching multiple books and I also need to fetch other relationships for example the author so let's take a look at how I resolve the data for this relationship so down here I have the books query and you'll notice I am using next to fetch the books now this doesn't really matter if you're using sequel or all that really matters is we're doing some kind of request to get the books and your notice we are not getting the authors here so we fetched the books and we returned them here and then we actually take use of a field resolver here to load in the author and so we have the book field or book type and inside that we resolve the author field with this function now inside of here what we're doing is we take the parent which is going to be a book and we get the author ID and we fetch the author based on that and if we go over to graph you'll play around and run the query this does work it will fetch the correct data so I'm going to just come over here and I'm going to run this and looks like we had it run anyway but you'll notice I'm fetching the ID title for books and I'm also fetching the ID and name for author and I get exactly that but the problem is is if I log the commands that were run or the sequel ever run or the number of database requests that happened I'm gonna be doing quite a few so if I open up my log here I'm just going to clear it and run it again you'll notice what happens is we do a single sequel query to fetch the books and then for each book we fetch the author so you'll notice they created five sequel requests because I have five books now if I were to limit this to ten books give this a save and I were to rerun this you'll notice we fetched the books and then we fetched the users and now we have ten sequel requests and so this is why it's called the n plus one problem because for every book n is the number of books that is the number of sequel requests or database requests we are making so as we fetch more data our query gets extremely slow and one thing I want you to note about this this only happens when we fetch the author field right here because of how graph kill works the field resolver or the author resolver only runs when we fetch that field so for example I can remove that and just fetch books with the ID and title so now if I clear my log over here and I run this and I fetched just the books and we only run a single sequel query here to fetch it so one of the most popular ways to fix this problem is to use a tool called data loader so we're going to look at that first now we're going to look at the resolver to start off with so the notice that the book resolver looks exactly the same we are fetching the books but what changes is our field resolver right here so we are getting access to the context the graphical context and we're accessing this thing called an author loader and we call dot load in the author ID and so the whole idea behind data loader is we take all the author IDs we batch them together instead of making a query for every single book we batch them together make a single request and fetch all of the author IDs or all the users at once so this is what you do in your resolver to actually batch and make that batch request what we do is we create a loader and so it looks something like this so here's our author loader that we are creating and we're passing in our context and when we have a function that we're passing into the context here with Apollo server what's actually gonna happen is a new data loader is created every single request so data loader not only batches things but it caches things now usually you only want the cache to happen per request that way we are not getting stale data ok enter data loader here what we do is we pass a function in and this function is how we handle the batching and how we do a single request so here we take a parameter called keys keys is where we're gonna store or where we're gonna get the array of author IDs so we take the author IDs and we fetch them so here you can see I'm saying next dot select where and the ID is one of these keys and so here with a single database request I was able to get all the authors and then here's a just a little bit of plumbing that you need to do with data loader to return the authors in the correct order and so if we see this in action now I'm going to just restart mine and run the data loader server that I have I'm going to clear this here and now we're going to run it and we're going to see what happens so if I run this just with our books query without the author field same thing happens we just fetch a single request like this but now if I say author ID name we fetch the author here and if we see in our log what happens we fetch a single or all of our books and a single query and then after that we fetch all of our users or all the authors for the books and you'll notice here we are fetching five of them so now if I come back up to my query and let's say I change this to 20 save this clear the log and rerun it so even though we have 20 users that we are fetching we are only doing a single request and so that is the power of data loader we have batted it all into a single one so this is a lot more performant than doing a request per book now one of the great things about data loader is because it is split between two requests getting the books getting the users the books and users don't even have to be in the same database so for example maybe I store my books in a database and I store my users in a sequel database like Postgres I can do two different requests and I can do a data loader batch to request to Postgres in the books I fetch from and so you can split things up between different services or different data tiers if you need to now there are two things I want you to keep in mind now I don't necessarily want to call these cons but they're not exactly advantages either so the first thing is we have a single relationship here if we have more relationships we need to create a new loader per relationship so I created a single author loader here if we have say we also want to fetch not only the author but then that user has a profile and we need to fetch the profile that would be another data loader that we would create so you actually create a data loader per relationship so there is a little complexity in creating these data loaders per relationship the second thing to keep in mind is we're doing two requests right we are doing a books and then a user's requests some of you that are a very adapt at databases know we can actually fetch all this data in a single request and so that's what we're gonna look at next so data loader is one way to solve this problem but it's not the only way so let's look at another solution so I'm calling this the join example because we are youing a database join to fetch of the data in a single request now again this is only possible if all your data is in the same database but I would say a lot of times that is where your data is stored in a single place and you can do something like this so for example I'm doing next select books and then I'm doing a left join to get the users now I'm have this function here called hydrate all that does is it Maps the requests that we have here because the data is going to come back from the database and in a different shape than we want it so I'm calling hydrate that will just loops through and maps it into the shape that I need it to return and so you'll notice this is actually a little bit simpler I didn't have to set up a field resolver and all I have here is a database request a single one and it gets all the data all at once so let's look at this in action so I'm gonna run that then I'm going to clear the log come back over here we're fetching both the author and the books you'll see the data looks the same on the right over here but now we have a single database request so one of the advantages of this is it's going to be faster because we're doing a single request now when I say this is faster a lot of people say sometimes it's faster but other times it's faster not to do a database join and to do multiple queries but my main point here isn't how you do the sequel request and that database joins are always faster my point is we can write a single database request maybe this contains and join maybe this contains a sub query maybe it's a recursive a CTE or whatever they're called but my point is this is gonna be faster because it's making one round-trip to the database compared to two round trips to the day base but what I have here does have a flaw or a downside so you'll notice we are always doing a join here now if I come back and query I don't have to always fetch the author so for example here I have a query that only needs the ID and title so I run this and we just get these back but you'll notice in our log over here we still do the join so in this case we you are unnecessarily making our query complex or making our query slower by doing a join so in this case we are doing more work than we need to to get the data that is the main downside to this approach if you don't fetch all the fields you may be over fetching some cases this may not matter though it may be a small performance optimization that really doesn't matter to get rid of the left join that you're doing or the extra part that you're doing in every request or you may just always need the author like some graph QL queries that I have in my schema I always need to fetch the relationship but there is a way where we can get the best of both worlds where we can do a single request and get all the data we want like this when we need the relationship and when the relationship is not there we don't have to do the left joint and so that's the next example we're going to take a look at so in this what I do is I take advantage of this fourth parameter that graph kill gives us in our resolver it's called the info object and this contains attributes of what the user selected here so in our graph QL query we can actually look at the info object and see if they selected the books and if they selected the author so I can know when they have selected the author field and when they haven't so then I can do a conditional join or sub query if I need to so take a look at this I made this function called does path exist and it takes a look at the field nodes in the info and then it tells me if this path exists did the user select books and then the author now I'm not going to go too much into this function but I'm going to put the code on github if you want to check it out basically what it does is it recursos through the list of nodes and figures out if the path exists so I have a value here called should join off table which actually now I'm cooking and I'm not sure why I call it off the table really should be called the user table but anyway I use the next query builder here and I start my query and I say if should join user table if I should I add the left join otherwise I don't so this is a really nice place where you can use query builder and you can conditionally create a query very easily and then down here I just await the request and then again I'm calling my hydrate to format my books and so let's see this in action if I do yarn conditional join just going to clear this come over here I'm going to fetch my author and you'll notice this has a left joint in it and it happens in the single request and then if I get rid of my author and I run this you'll notice I have my data back and we do a single request here that does not have the left join so now we kind of have the best of both worlds we have a single request and we're getting all the data that that user needs no matter what fields they are selecting we can even take this a step further if you wanted to and only select the fields that the user wants so for example here you can see I'm saying select all from books but let's say I don't want the title right and I only want the ID of the book I'm still fetching that from the database so there is levels to the optimization we can do and we can even take this a step further in my opinion that optimization is not worth a lot of times and the complexity of actually getting it set up is not really worth the performance gains you're gonna get now what I will say is this is example of just a single relationship we are fetching so this will only get more complex as you add more relationships in there so there's going to be some cases where it may be simpler to create data loaders then is to try to do a single request and join all these things together so when you run to n plus 1 problem keep both of these techniques in mind and I think in a lot of cases you can just get away with doing a single request like this and not worrying about conditionally joining anything or conditionally creating a query at all one library that I wanted to mention was join monster because I know at least one person is gonna ask about it I haven't used it in a while but is the library that takes the graph gel query and does the conditional joining that I was talking about automatically for you but taking a look at it the commits have been pretty sparse lately and the project kind of looks like it's dead so I'm not really sure I can recommend using this you may want to start bill kind of your own abstractions over creating data loaders or creating conditional statements yourself so that is the n plus one problem and the different techniques you can use to solve it if you liked this video make sure to give it a thumbs up and don't forget to subscribe you
Info
Channel: Ben Awad
Views: 80,080
Rating: 4.9602466 out of 5
Keywords: graphql, n+1 problem, GraphQL N+1
Id: uCbFMZYQbxE
Channel Id: undefined
Length: 16min 14sec (974 seconds)
Published: Mon Dec 16 2019
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.