How to diagnose and fix slow queries with Django Debug Toolbar

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments

Source code from the video is on GitHub

👍︎︎ 1 👤︎︎ u/The_Amp_Walrus 📅︎︎ May 09 2020 🗫︎ replies
Captions
today I'm gonna show you how to find and fix slow database queries in Django with the Django debug toolbar so let's get started the first thing you need to do is install the Django debug toolbar so I don't think I can do any better than the official documentation if you go to Django debug toolbar their website and just look at the installation page follow the steps and get that done yeah I don't think I can explain it any better than this so that's how you do that and I'm gonna move on to an example that I've got so this is a website that I've set up where there's a bunch of slow database queries that we're going to optimize using the Django debug toolbar so I just take you for a quick tour around this website it's like a forum there are a bunch of threads written by users in each thread there are many comments all of these comments are made by separate users just like Reddit or something and there's also a set of clubs each Club has a one or more users and a one user can join multiple clubs so that's our website not too much going on but just enough so that there's some stuff for us to do and while we're here let's have a look at this Django debug tool though so once you've followed the installation instructions anywhere you go where debug is true on your Django website which is in development you'll find this little thing poking around and you can grab onto it drag it around but what you really want to do is click on it and in this there'll be a bunch of cat panels which tell you very useful things about your application you can see things like what are your current settings information about the request really good stuff for debugging and what we're interested in today is the SQL section so this will actually give you a list of all of the database queries that are being run by the Django ORM relational mapper when rendering this view so as you can see there's a lot of database action happening and we'll get back to this later but that's just a general overview of the debug toolbar and the SQL section cool so I guess in order for us to really understand what these queries are doing we need to know what models are making up this django app and so that's what we're going to go over now so this application has four models it has a person and that's just you know a user who uses the website and they have a name that's cool so they have a names just a character field and that's all we need to know about people so people can create threads in a thread is has a title and a creator and so here we have a foreign key where the thread refers to a specific person so there's now a link between our thread model and our personal world and we'll go over this in more detail later we then have a comment model so a comment is you know just looked one of these so on the thread a person can create a comment and a comment needs to know what thread it's for it needs to know what person made the comment and it needs to have some text which actually makes up the comment and you can see here that we've got some text on our comment there's a reference to a person which I've sort of named user which is actually like a bad idea that this is just a foreign key to the person model here and a thread which is a foreign key to the threat model so a comments sort of needs to know what user it's for person and what thread it is for and finally we have this Club model which is for the club's page where club has a name which I just called text and also a club may have many users now this needs to be a many-to-many field because one person may join many clubs and one club may have many people associated with it so those are our four models now we're about to do some sort of query optimization in Django and I think it's good to have a mental model of how these models are actually laid out in the database before we get started so this is just a kind of demo django app and the database is just an SQLite database and we're going to give it a look by using this DB browser for SQLite app you can get it on Windows Linux it's pretty ambiguous so you just find this and download it if you want to give it a try and um yeah it's got all the database tables in here so these are a bunch of database tables that Django has put in there Django migrations sessions or the author stuff and this is what we're interested in here is called forum and so let's start off with person so I've made a whole bunch of people in my database and they all have names they have like primary key IDs nothing too exciting there if we look at forum Club the surprisingly little data we just have the ID of the club and the text of the club for the thread surprisingly little data as well there's just the ID of the thread and the title of each thread so where it gets interesting is when we look at the comment so if we look back here you'll note that the comment knows about it has some text and it also knows about the user and the thread and these are foreign keys these foreign keys are present in the database here whereas thread ID and user ID so on disk each row each comment has the text and then the idea of the thread that it belongs to and the user who made the comment finally we've got this many-to-many relationship and this is a many-to-many field relating clubs to users and this is kind of weird you might have noticed this that a the club doesn't have a user ID associated with it and the person / user field doesn't have any Club information associated with it so you might be wondering well where the does this information go and the answer is that when you create a mini too many field Django actually goes creates a whole new table that map's clubs to users in this case so every row is just a relation between a club and a person so this is all for club 17 and all the people who all the IDS of the people who are part of Club 17 and then the same for the club 18 and these are all the people that article of 18 and this is how many to many relationship is represented cool so that's that that's how everything's laid out in the database and hopefully this gives you some sort of mental model that we can use later and finally before we get started with the actual debugging it's just good to go over how these views are working so I've made three views home view a threat view into clubs view in the home view it's just this one here it just shows us a list of threads the thread view is for a specific thread and shows you all the comments written by the users and the club's view is a list of all of the clubs and all the users that want about those clubs so that's what we'll be looking at today and that's what we'll be optimizing looking at the home view for all of these I've written sort of a slow version of the query and then faster versions but basically what we're doing is in this section we're pulling the threads out of the database that we want to show it for our forum and then we're rendering them to a template and so for the home view we're just looping over all of the threads in the templating engine and then rendering the thread title and the name of the threads created creator in the thread view we're getting a single thread given a certain primary key so we're just pulling one thread out of the database and then rendering that using the templating engine and in the thread view we're going through all of the comments that are related to that thread looping over those and for each comment we're getting the name of the user who made that comment and the text of that comment and finally the club's and we'll come back to all of this in a second we're just pulling all of the clubs out of the database and then rendering those here and so we're looping over all the clubs and then we're looping over all the people in the clubs and for each person with printing their name and that's it that's the whole app cool so let's get started with the home view so um you know like a use case or a sort of typical case where something like this would happen where you're inspecting the queries review is when you load a website in production or a testing server and it seems kind of slow and maybe your users are complaining at your stuff complaining like this websites too slow you need to make it faster well your bus is telling you that it's too slow and it can sometimes be hard to ahead of time know why it's so slow and this is why this sequel debug toolbar panel is so useful so if we load the page we have a looking here we can see what queries were making and I think the first thing you should notice is that it's a shit-ton it's a 27 queries now I just want to point out that these queries are running very fast 4.8 for milliseconds that is an amount of time that anyone is happy with that's blazing fast and so the first thing I want to note is that when you're debugging django queries locally these database queries are going to run pretty fast it's just reading and writing to a file that's on your computer so the time isn't really a good representation of how fast this is running really and another thing you'll notice I'll see if I can replicate it here this is saying 27 queries in 4.3 6 milliseconds I run this a couple of times it's gonna jump all over the place it's five point seven milliseconds four point ten milliseconds it might be sometimes one millisecond I mean it also depends on the type of database you're using and that could be due to caching or whatever it's it's very variable found and to actually get an accurate like repress read reproducible number locally when running his queries isn't very easy and what you should really do is focus on the number of queries and there's another reason the number of queries is important and that's that typically when you're looking at performance when you're getting the same amount of data the one thing that you can fine-tune and the easiest thing to fine-tune is the number of queries so think about it this way every single time a query is run Django it has to send a request to the database server and that might be reading the SQLite file or it might be sending a request over a network to another machine and every time there's some latency involved and so if you're making 27 queries there is a latency overhead every time and so you're paying that price every single query even if each individual query is quite fast and you can see here they're all kind of fast this one's slow you know a lot of these are teeny-tiny what is that point eight of the millisecond it's blazing fast but there are so many of these that slows the whole thing down so what you want to try and do is take all of these little queries and bundle them up into one big query so in general in this guide we're going to be optimizing the number of queries rather than just looking at the wall clock time even though the thing ultimately that we want to make smaller is the amount of time that it takes and another thing as well is the number of queries gives us a good indication of how much our performance will slow down as the number of objects in the database increases so what I'm trying to say is like he was showing I was at ten twelve threads but if we're showing twenty threads or a hundred threads how is the number of queries going to grow and that's another thing we'll be looking at all right so uh this seems like a lot of queries let's get stuck into it and try and reduce the number so the first thing that I think you should do when using Django debug toolbar is to just look at what the database is doing so you can see here if you open this up that we're selecting forum thread dot ID from the forum thread and if that doesn't mean anything to you you can just go and run this you know SQL thing run that see what it does so it's going and grabbing all the IDS which is cool a little bit weird but cool and then it's selecting a thread so it's from the thread table where an ID is 30 25 etcetera so it's going and selecting every single thread and then for each thread Furi through this it's selecting every single person one at a time so it's literally walking through a loop and selecting this thread and then selecting this person and then selecting this thread and selecting this person and this is like the worst possible way to do it and importantly as the number of threads grows the number of queries will grow proportionally so we have a look at the code we can see why this stupid thing is happening and it's because I wrote the stupidest possible code to do this it's kind of convoluted you'd have to go out of your way to write this you can see here that I'm getting all the thread primary keys using values list and then looping over them and using it and that's why we're having 27 quick queries and so the fix to this is to just not do this dumb thing and to use just the Django or and as it should be used and just used the thread dot objects to all so this was just an example of kind of the worst case but this actually isn't as fast as it could be still so let's give this a try I'm sort of simple version where you're just getting all the threads and then rendering them so we had 27 queries before let's give this another good and you can see now we have 15 now the time did go down but we're not focusing on that we're focusing on the number of queries so that's good 15 queries let's see what happened so it just kind of looks faster as well there's less stuff on the page so you can see it is selecting all the threads and you'll note it's not just selecting the ID now it's selecting everything in one go and you can confirm that just by running this if you're interested you don't have to do this last part you can see it's grabbing all the threads in the title so this is the data we need after doing that it's still going and selecting all of the people one at a time which is kind of shitty I mean it means it still means that for every thread that you add to this page a new person will be selected so the number of queries grows proportionally to the number of items on the page which isn't what you want which might lead you to ask well why is this happening and it's because of the structure of the database if we go back and look here if you look at forum thread there isn't the user name in this table and so when Jango goes to look for a threads a thread users name it has to get the thread pull that out of the database which is what it did here you've got the thread title and the creator ID which is the person's ID then with this Creator ID it went and did a lookup in the person table to get their name so this means that for every username that wants to render it goes and does another query for this and that's because the Django ORM although it's pretty good isn't hyper intelligent it's not gonna kind of figure this out for you it's up to you to tell it what to do so this happens anytime you have a like a foreign key which I'm gonna call like a one-to-one foreign key which is to say for every thread there will be one creator right so this is a direct foreign key lookup where we're going from the thread to its foreign key relation and when you have that that's when you use select related so I'm I'm going to link the github of this and you can look at these dogs later if you want but basically select related is for when you want to tell Django to go and get something that's a foreign key so let's get that ago so you'll know this is just thread to objects to old but before we put in the old query we're telling it to select related so let's give that a guy okay so now we're doing three queries in less than one millisecond and once again I wouldn't fix that too much on this time it's probably going to be more variable than ever it's jumping all over the place so don't worry too much about it but three queries is a lot less than 27 queries so what we can see here is that there's basically a one query that we don't care about it's like some Django or thing another thing we don't care that another Django Wolf thing and then there's one query that's doing everything at once and so this is select related at work if you're familiar with sequel this is equivalent to an inner join on the person table and so once again if you're interested you can copy this in run it and you can see now it's um selecting both the title and the name in one query and that's what we want so when you can use select related because it allows you to do everything in one query and you can use select related when there's a foreign key from the model that you're querying to another model cool so this is as good as it gets I think this is I think as fast as you can get this home view and we've done a good job here and so just to recap we've gone from 27 queries to 3 queries which is pretty good alright so we've optimized this page we've gone from 27 to 3 queries and the threats page now let's take a look at the club's page so we have many clubs and one user can also be a member of many clubs a club may have many users hence the many-to-many field and this is 13 queries which given the amount of data we're displaying is like surprisingly little but we can still do better than this we have a look at our 13 queries we can see it's wonderful each person so we must be rendering 13 people in this page and each person needs their inquiry we're only selecting no so we're getting all of the clubs and so that's done in one query and you can check that out if you want this is all the clubs you can do but then for each what's this one and it's going and getting so it's going in getting for each club it's going getting all the people who belong to that club which is interesting I didn't know it would do that so uh the ORM it's not like super smart but it's not super stupid either and so we can see this here where we're iterating over the clubs in the template and then for each Club we're iterating over all the people and then rendering their name so just to go over the model again one club has a many-to-many relations with users I don't think it really matters which model has this relation because if you remember in the database we end up with a structure like this we have the club it just has its ID and text the user or the person just has the ID and name and then there's a separate table that relates each club to the user which allows a many-to-many relationship so we've already looked at using select related for cases where there's a foreign key when you have a many-to-many key you need to use prefetch related so prefetch related is so that the second thing you should reach for when trying to optimize django queries and it's good for these many-to-many relations so what we're saying here is Django when you go and get all of the clubs could you please also go and get all of the users associated with the clubs that you grab and so Django won't be able to do this in one query but it will be able to do it in two and we'll see how that works so that's how change we're just gonna use prefetch related because of the many too many field we give this to try now instead of 13 queries we should see like four or something so there you go we've cut down from 13 to four queries and that was one query for each Club it seems so as you add another club before we would get another query and now we have a look you can say that we have one query to get all of the clubs right yeah I think we've already run this but just shit's and gigs you can see it's grabbing all of the clubs and their IDs and then we're assuming the second query to grab all of the club uses and so Django has actually done this query gotten all the club IDs and then put them here and then run another query that selects all of the users that match those Club IDs so it's aren't they going and getting what it needs it's not dragging up the whole database table no syntax error and you can see here it's getting all the usernames living one you might know it's actually probably getting the same username twice I think I spoke Devi in there like a hundred times so yeah Debbie Debbie so that's interesting I didn't know it would do that on it's not hyper efficient but it's certainly better than what was it how many queries it was before once again this is fetching redundant data but it's probably better because it's a fewer queries and fewer queries means less latency and most of the time it's latency that's the killer not the amount of data you're dragging up so there you go we've used prefetch related on a mini to mini field to cut down from 13 queries to four queries and I'm also confident that if you add another Club the number of queries won't grow which is good it's a good especially when you expect there to be more data in the future cool so we've done the threads page we've done the club page let's finish off with the individual thread page so this is a single thread I think I've gone and messed around with this already I suspect it how it was show you what it should look like right so we have a thread it's a single thread identified by this sort of thing in the URL is picking the URL and this thread has many comments each comment has some text and has a user as a name so we have a look at that in the database you'll see that we've got one thread and we have comments that know about what thread they belong to and the comment also knows what users belong to that comment and then the user has a name there's actually three models here and the relation goes that the thread is going to ask for all the comments that belong to it and then each comment is going to ask for the user that belongs to it so there's sort of a double hop here if we look at how this is queried you'll see we're taking the primary key that comes in through the URL we're grabbing one thread using get and then using a render to sort of use that thread to render the page if we have a look at the HTML you can see that we're rendering the thread title which is probably free and then looping over all of the comments in the thread and for each comment we're going to render the comment text and the comment uses name so here we'll actually making a new query here every time because we're asking for the comment user so let's have a look here at the sequel pane to see what's going on so you can see we are selecting all of the forum threads and if you don't believe me I will execute that and sequel know certainly I was wrong it's selecting one forum thread so it's using way ID equals 19 - just get the title and the the idea that created which we don't care about in this case and then it's actually being relatively smart it's going and actually getting all of the comments in one go so we have a look at that it's going to the comment table and it's getting all the comments at once so it's actually not that bad and this is because we used comments set up all this is a reverse foreign key lookout to look at all the comments that pointing at this one thread and we just asked for all of them and Django being not that dumb has gone and done this in one query to federal for comments and so you would think like well what else is there to do problem is for each comment then we're asking for a user and the user is in a different table so if we have a look at that if we go to the comment you'll see it has a user ID it doesn't have the user name so even though Django is going in fetch all the comments we need it hasn't fetched all the usernames we need because they're in this table here so this results in a new query for every username which um means that for every comment on this thread there'll be one query and we can see that here where it's selecting from the person table a bunch of times and you can see it's selecting just for a particular ID every time so it's going and doing a query for every person and this is the sort of thing we want prevent because it's like sort of pointless has a lot of latency and overhead and we want to bundle all those up into one query so just as an experiment let's get rid of all these references to users and see what happens we just get rid of this reference we user ID and instead of having user name we should call all our users balance such as Alice's forum refresh the page you can see now this is full queries which is more what we're aiming for and so this sort of just twiddling proves to us that it's the selecting of the user that's doing the extra overhead let's change this back so this one's a bit more tricky as the trickiest query bitmap that query optimization that will do how we're gonna get from 29 queries to like four or five so we've already seen how to use prefetch related for a many-to-many field this is a little bit different we're gonna do prefetch related to go over how to put this this is how to describe it we use prefetch related to tell Jango to select a to select over a hop I'm not sure how to really say it but basically we want to tell Jango when it goes and gets all the comments that it should also go and get all of the users so it's going from thread to comments to users or person and so the way we tell it to do that is with prefetch related here and this is a like this zany Jango syntax so you'll note that the thread has a comment step and we want to prefetch that but that's not enough we also need to then go tell it to look even further and look at the user relation on the comment set now this syntax is like if you're reading this in segment what the is this that's pretty normal this is very Sango specific syntax it takes a while to get used to and I recommend that you have a look at the prefetch related documentation on this but basically you feed it a string that tells it to look at all the comments and to prefetch those and then to also prefetch all of the users that are attached to those comments and this gets so complicated and hard to visualize that it's not always obvious how this will work or whether it'll work and that's why Jango debug tool are so valuable because you can check very quickly in development like did my query work did it actually reduce the number of queries for example we were just to do this and maybe didn't think at two hard which I did it earlier maybe you think that this would fix the problem and reduce the number of queries but when you go and try it's it's actually done an extra query so we've made more queries Nautilus and that's because we need to tell it to go and get the users as well so prefetch related it gets kind of hairy especially when you're doing jumps through objects to other objects and you can do some really crazy with this and I guess I just want to tell you don't beat yourself up if you struggle with this and need to experiment a lot and that's what Jango debug toolbar is really good for experimenting with this stuff so now that we found a query that we think works let's give it a try and we do it in six queries which is cool let's have a look at what we're doing so two of these queries are to do with authentication which we don't care about we are selecting let's see the thread it's actually going selecting the thread a second time it's actually doing the same query twice which I don't know why it's doing that it's actually doing that in the prefetch related and the get which is interesting whatever so assuming some weird duplication it's then going getting all the comments here and then it's doing this prefetch sort of query where it's getting all the people with IDs that match these IDs and these would be the IDS of comments that we just wrapped here so that's how that works um if this doesn't make sense to you fair enough it's a little bit of crazy I recommend you download this git repo boot it up and give it a go so I'm just sort of experiment around with this and have a look at what queries it's making have a look at the database structure to read the Django Doc's and hopefully you'll uh you'll get there at some point cool so I think that's maybe that's not perfect but it's a lot better and we can also expect that as we add more comments we won't get more queries I think the wall clock time is fast although that's not the most important thing development there you go that's how you use prefetch related to do I guess queries through objects okay I hope you enjoyed the video just to recap on what we've learned how to do we've learned how to use Django debug toolbar to see what queries are running in our views we've learned how to use select related for cases where there's a direct foreign key relation we've learned how to use prefetch related for cases where there's a many-to-many field we've also used how to learn pretty fetch related to do more crazy things like reach through objects to grab other objects like in the case of going through a comment to get added to use it so yeah hope you enjoyed the video go out and give it a try on your own code base and still Django debug toolbar and see if you can optimize some of the queries that you're running in your views there's also a github that I'll link in the comments and on the blog post which you might be interested in poking around the code that I've been looking at today and also if you're interested in employing Django I've done a video series of on that which takes you step-by-step through how to deploy django to a Linux web server would show if you like this you might like that as well thanks right
Info
Channel: Matt Segal
Views: 2,366
Rating: undefined out of 5
Keywords:
Id: 9uoI6pvuvYs
Channel Id: undefined
Length: 34min 9sec (2049 seconds)
Published: Sat May 09 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.