How to Think Like the SQL Server Engine Part 1

Captions
Good morning, party people! Howdy. It is, what, 6 a.m.? 6 a.m. here in San Diego. Thanks for the subscription there, Francisco Camacho; I appreciate it. You know, it's funny, I still have to figure out how to get all the little titles to work down around the bottom, the most recent follower or most recent subscriber. I commissioned Visuals by Impulse to make me a set of custom stream overlays, but I haven't heard anything back from them yet in terms of what the schedule looks like on that and how long it takes.

So this morning, what I'm going to be talking through is my How to Think Like the SQL Server Engine series. I'm going to open up SQL Server Management Studio and just type in demo queries cold. I'm going to be going off of your questions and the things that y'all get into, and I'll walk through them. I've got a set of notes up here where I'm going to try to route you through my entire How to Think Like the Engine class, and there are no slides here whatsoever; it's a hundred percent demos. This also means that I can go off-topic: anything that y'all find interesting, we can go off into and go from there. I originally scheduled the stream at like an hour long; I wouldn't be surprised if we went for two hours, depending on what kind of questions y'all run into.

All right, well, let's go ahead and get started. We'll go straight into SQL Server Management Studio. Now, I'm going to be using SQL Server 2019, but everything that I'm going to show you is exactly the same going back through, like, the dawn of time; nothing in SQL Server's behavior has really changed here. Then Olio says, what are you drinking? I am drinking espresso this morning. I love the taste of espresso, which is freaking fantastic; if I could drink espresso continuously, I would be totally all over that, I would probably never stop. Oh, I don't have the chat box turned on; let me make sure that's turned on so that your chats pop up on the overlay.

RedBarrett says, hey from Amsterdam. Yo! I feel bad, because I actually did a training class yesterday and I wore an Amsterdam t-shirt, and now today I'm wearing an Amsterdam sweatshirt, so it's kind of weird how that works. Foodie for reals? Yes. You know, I should move that chat box up just a little bit. Not the background, dummy; move the chat box up a little bit. Oh, do I have it locked? I have it locked; that's why I can't do that. Let's move that up just a little. There we go, that'll make more sense.

Okay, so, on the foodie thing: I like weird, bitter-tasting stuff. I like Guinness, for example; most people don't like really bitter beers, or chocolates, but I like dark chocolate. I could just drink straight espresso 24/7 if it didn't, of course, cause things like heart palpitations. And I know y'all probably think, oh, he's this excited because he drinks espresso. No, I'm always like this. I wake up at 2:00 or 3:00 in the morning and I'm like, let's do this. I haven't had an alarm for years. Then, of course, on the flip side, by 2:00 or 3:00 p.m.
I'm like, I think I'm just going to go sit over there and have a nap. Penny says, good afternoon, Brent, good to see you again; we've got a regular starting around here.

So here's what we'll be doing: I'll be using SQL Server 2019, and I'm using the Stack Overflow 2010 database. Now, you could use any version of the database; I'm just going to use this small one because a lot of people use Stack Overflow 2010, since it's small, only like a gig in terms of downloading, so it's pretty quick and easy. The Stack Overflow database is provided by stackoverflow.com, totally open source; they're just wonderful in terms of the way that they give back to the community and make it really easy for people to learn. Howdy, Mamaowl, good to see you again. So now I'll start running queries against it. I've got the links up there on the page too, and I'll go ahead and dump them into the chat just so that you all have them, in case you want to follow along with the demo queries.

So let's get started. I'm going to say SELECT * FROM dbo.Users; I'm just going to show everything in the Users table. The Users table at Stack Overflow holds exactly what you think it holds: a list of everyone who's ever logged into the Stack Overflow site, left a comment, posted a question, whatever. It's a fairly simple, straightforward table, and it has things like your display name, your location (where you live in the world), your website URL, and how many reputation points you have, and the contents of it are up here on the screen. I want you to think of the stuff that's up here on the screen as a printed-out piece of paper. In a perfect world, when you do this class, you actually print out the PDFs. If you go to BrentOzar.com/go/engine (I'll put that into the chat as well), you'll be able to get the exact printed-out pieces of paper that I use.

Everything in SQL Server (it actually changes a little when we get to things like columnstore indexes, but for the basics, everything) is stored in 8K pages, and I want you to think of them as just like this, printed out on pieces of paper. When SQL Server runs a query, when you want to seek out users and show someone's specific information, your database engine, your MDF file, is just a stack of these 8K pieces of paper all lined up in a row.

If I want to see how much work it is to run a query like the one I just ran, SELECT * FROM dbo.Users, I can turn on an option in SQL Server Management Studio called SET STATISTICS IO ON. When I turn this on (it's totally safe, you can do it in production, it only affects your own session from the time that I turn it on), I get something interesting over in the Messages tab in Management Studio: for every table that I read, it tells me how many 8K pages I read. That's what "logical reads" means: the number of 8K pages that we read in order to accomplish this query. The more you have to read, the more work it's going to be; generally, query tuning means doing as few reads as we practically can. You can also see the execution plan, but we're going to get to that in a minute.

7,000 pages is about 15 reams of paper. Remember those dark old days when you had to go into the office, and over in the office there was a copier or a printer, and you'd see people loading in these packs of 500 pieces of paper? That's a ream. This table is the smallest table in the smallest version of Stack Overflow, and it's already 15 reams of paper. So as we're executing queries, I'm going to have you come up with an execution plan in plain English, and I want you to think of yourself as a human being with 15 reams of paper over in the copier room, with all this user data that looks exactly like what we have here.
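If you're following along at home, the demo so far is just two statements; a sketch, assuming the Stack Overflow 2010 database is attached, with the logical-read count being the approximate number quoted in this stream (it varies by database version):

```sql
-- Session-scoped and safe in production: reports 8K page reads in the Messages tab
SET STATISTICS IO ON;

-- Read every row in the Users table
SELECT * FROM dbo.Users;

-- Messages tab output looks roughly like:
-- Table 'Users'. Scan count 1, logical reads 7405, physical reads 0, ...
```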
Now, how are these reams of paper organized? In the Stack Overflow database, they chose to put the clustering key, the clustered index, on Id. That's this Id column: it starts at one and goes up to a bajillion; it's an identity field. You've probably done something similar with your own tables: you've created an identity Id column and made it the clustered primary key. Sometimes people will ask what column the clustered index is on; in our case it's sorted by Id, but the clustered index has all of the columns in the entire table. It's not like it just has Id; it actually has everything. Well, kind of everything.

Over on the far right-hand side, for those of you who printed out the pieces of paper, there's a little AboutMe column. AboutMe is an NVARCHAR(MAX): Stack Overflow lets people write a long letter to Grandma in their AboutMe column if they want; you can put your entire resume, everything you've ever done. Well, that could be larger than an 8K page, so if you get really wordy with your profile, what SQL Server will do is store your AboutMe on other 8K pages, and it'll just include a link, a pointer, up on the page, over to those other 8K pages where the rest of your data lives. This is called off-row data, and we're not going to dig into off-row data inside the span of this course, but I want you to get that when you use a big old NVARCHAR(MAX), for example, there's a cost to that; there's a price that you pay whenever you have to go do those additional lookups to get those other pages.

So if I asked you to SELECT * FROM dbo.Users, what would your execution plan look like? If I asked you to go find all of the users and read the results back to me, well, if you're a human being, you would go over to that pile of paper in the copier room, you would grab the first piece of paper, and you would start yelling out the answers. Dave says, a table scan; that's true, but think of it first in terms of what we're going to do as a human being: we're going to go to the top of that stack of copier paper, just grab pages, and start reading out the rows. The order doesn't matter; we could start at either end of the stack. Our goal is just to rip through all of them.

Let's go see what SQL Server does. To do that, I'm going to turn on actual execution plans. We'll talk a little later about the differences between actual and estimated execution plans, but I'm going to click that button and run it again, and you'll notice that I get another tab when it finishes over in SQL Server Management Studio. I'm going to zoom in here so that we can see this a little better.

The way that we read execution plans is from right to left, normally. You can read them in other ways; sometimes during performance tuning problems you may have to read them in the other direction, but normally we read from right to left, in the direction that the arrows point. So the first thing that SQL Server did was what's called a clustered index scan, because this is the only copy of the table that we have right now (we'll have different copies of the table here in a while). Every one of these icons is like a mini program; think of it like an app that you installed on your phone. This mini program has inputs and outputs, and it works completely in isolation; it's almost like microservices. You can see the output of this program if you hover your mouse over the plan operator (these things are called operators): here's the Output List, the list of columns that we're going to push up to the next operator, and in this case the next one is just the SELECT, where we're dumping the data out to the end user, and that's the end of it.

SQL Server puts a lot of interesting stuff in here, for example, how many rows did I read? Right now I'm dealing with the smallest copy of Stack Overflow, so we're talking about 300,000 rows; you'll hear that number come up repeatedly throughout the next hour. 300,000 rows is how many we read, and that's actually how many we
output, too. Number of rows read: 300,000; number of rows for all executions: also 300,000. We read the whole table; we didn't have a WHERE clause on this particular query. Now, SQL Server also guessed how many rows it was going to read. We'll come back to that in a while, but this estimate becomes really important when SQL Server is trying to figure out, for example, what it has to do and in which order.

So the first mini program goes through and has this output; it's going to output all these columns because we asked for SELECT *, and if you hover your mouse over the arrow in current versions of Management Studio, it tells you the number of rows that are moving through there. I kind of think of it as a data pipeline that moves from one operator to another.

So how expensive was this query? How hard was it to scan this table? Well, SQL Server makes a guess right at the beginning. If I hover my mouse over the SELECT, SQL Server says my estimated subtree cost is 5.7. 5.7 what? Well, we call them Query Bucks. Kendra Little originally coined the term Query Bucks to represent the amount of work involved in a query; it's a combination of CPU and I/O work, but it's just an estimate made before the query started. What's that cost made up of? Well, hover back over the prior operator (we only have one in this plan, so it's pretty simple), and you see up here it says estimated CPU cost and estimated I/O cost. When you go to the copier room and scan through thousands of pieces of paper ("scan" is probably a bad word to use, since there's probably also a scanner in the copy room), part of that work is you pulling those pages: that's I/O. Part of that work is you looking at the pages and deciphering the rows and columns: that's CPU cost. CPU cost is really low in this case: imagine if I sent you to the office supply room and made you read those 15 reams of paper, 7,000 pages. You wouldn't be thinking that hard; you would just be doing a lot of reading. So in this case, most of the cost is the I/O. Sometimes when we're tuning things we have to tune for CPU, and sometimes we tune to reduce I/O, but one thing to keep in mind is that this estimated subtree cost is a blended number; it just gives you a rough idea of how much work SQL Server is doing to accomplish this query. A long time ago, back in the late 1990s, this represented the number of seconds it would take to run the query on one guy's machine in Redmond. These days those numbers are totally disconnected; it's just a rough, hard-coded estimate of how much CPU work and I/O work SQL Server thinks is going to be required. This will come in useful when we're doing query tuning, but it's also very misleading, which I'll talk about in a while.

All right, well, it's really not that much fun to run a SELECT * across an entire table; that's not really what query tuning is, and if our end users ran queries like that, we would probably shoot them in the face. Leave it to the American to say something like that. So instead, let's clean up this query a little: instead of getting all the freaking columns, let's just get their Ids, and then let's put in a WHERE clause: WHERE LastAccessDate is greater than 2014-07-01. LastAccessDate is exactly what you think it is; thanks, AnkX. Every time someone logs into Stack Overflow, posts a question, or leaves an answer, Stack Overflow updates their LastAccessDate. So what I'm saying here is: SQL Server, go show me all of the rows for people who've accessed the system since July the 1st.

Now, as a human being faced with a copier room with 15 reams of paper, like seven thousand pieces of paper, in plain English, how are you going to accomplish this query? I want you to type your answer into Twitch, but what I don't want to hear is: I would build a B-tree, I would build a doubly-linked list, I would build a sorted table, I would build a hash.
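For anyone typing along, the trimmed-down query at this point looks something like this; a sketch, using the date value quoted in the stream (in SSMS, Query > Include Actual Execution Plan toggles the plan tab):

```sql
-- Only return Ids for users who've accessed the system since July 1, 2014.
-- With no index on LastAccessDate, this still scans the whole table.
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014-07-01';
```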
No, no: you're a human being with 7,000 pieces of paper in the copier closet. How do you do this, keeping in mind that the pieces of paper look exactly like what's on your screen here? This is the data that you have; you have seven thousand four hundred pieces of paper that look just like this. How do you execute this query? Just answer in plain English, and let's see what you come up with.

You've already answered. Dave says, I'm going to go through each row and keep the ones that we want, then discard the ones we don't like. Gringo wants to outsource it; he's like, I'd like to get someone else, maybe I could post it on Fiverr. Imran says, I'm probably going to read them all. The thing to keep in mind is: think like a human being. If I said go get me all of the ones that match, you wouldn't want to keep them. Keeping them would mean writing them down somewhere, like getting a little piece of paper, a Post-it note, and writing them down. You don't want that; you want to get rid of them like hot potatoes, as quickly as you can. So what you would actually do is go to the copier room and yell from there. You'd be like: Brent, heads up, I've got Id number 500... 542... 546. You would just be yelling them out (it would probably get a little deafening over the webcast) as fast as you can, and that's exactly what SQL Server does.

So let's run this query and see what SQL Server does in terms of the execution plan. Now, there's a first for us here, and I like how you say this, so I'm going to call you H; welcome back, it's good to see you again. What you would say is you would put a ruler down through here. Although, to be true to SQL Server, it doesn't store things nicely organized like this, because it doesn't want to leave any extra white space on the page, so the rows would actually be jammed in as tightly as they could fit. You would have to figure out where each row starts, but that's not much work for SQL Server at all; it's finely tuned to do this kind of thing.

So each of these operators is its own little application, and the first application that runs is this clustered index scan: the first thing we're going to do is go through the closet and read through all of these rows, yelling out the answers. What's that next one? Oh, this is kind of cool: SQL Server will automatically parallelize stuff. This is just like Gringo said, I'm going to get other people to help; that's exactly what SQL Server does. Depending on how expensive the query is, SQL Server will corral in other workers and say, all right everybody, we're all going to the copier closet together, and then SQL Server divides out those pages. Remember how we had like 7,000 pages in this table? SQL Server passes them out by thread, and you can see this if I right-click on this operator and go into Properties. I get a whole lot more stuff that I don't have to go into very often during query tuning (most of my work is actually out on the plan over here instead), but if you want to see every single detail about what every operator was doing, you can start spelunking through here, like the number of pages that each thread read. SQL Server brought four people together into the office closet and said, you take these pages, you take these pages, and it divided the work just like you would. You wouldn't be scientifically accurate; you wouldn't give exactly three hundred and forty-six point two pages to each person. SQL Server just grabs paper and slings it out, roughly balancing the work. You can see that the reads aren't even across all threads: one of the threads read 1,300 pages, another read 2,200 pages. That's okay; it doesn't have to be exact. Something else that I find really intriguing is that SQL Server can't predict who's going to
find rows that match, because it doesn't know which pages hold an evenly balanced number of people who've accessed the system since July the 1st of 2014. I can kind of see that if I start going into things like the number of rows: if I look a little further here, you can see how many rows each of the threads brought back. Not only were the input pages not balanced, but the output rows weren't balanced either, because how do you know, until you open up all those pages from the office supply closet, which pages have matching rows? So what SQL Server did was take four people into the supply closet and come back out, and then there's this Parallelism operator. I just love these icons, because they give you a rough idea of what's happening here: the Parallelism icon is pulling all the results together from all these different threads, so that we can put them into one stream of output and shove it out the door to the end user who asked for the data. Yes! H says MAXDOP is the maximum number of persons deployed into the closet. I've actually never thought of it that way before; I can appreciate that quite a bit, and I'd never heard the closet term before either, so that might actually work quite well. Welcome, Ritchie; Ritchie's in this morning too. So this pulls all these threads together into just one stream; it's a standalone operator, and as Dave says, it is indeed an operator. Interesting: there are logical reads on thread zero. I just got an idea for a blog post; thread zero, as the coordinating thread, has to figure out which ranges of pages go to which threads. So cool. Then, on the next arrow, all the rows go out the door: it turns out that there are about a hundred and forty-nine thousand rows that matched, and those 149,000 rows go out the front door to the person who asked the question.
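Since MAXDOP came up in chat: one way to see the serial version of this plan for yourself, offered here as a demo technique rather than a tuning recommendation, is a per-query MAXDOP hint:

```sql
-- Force this one query to stay serial: only one person goes to the closet,
-- so the Parallelism operator disappears from the plan
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014-07-01'
OPTION (MAXDOP 1);
```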
So how much more or less work was this than the prior query? One way to tell is to put the prior query, SELECT * FROM dbo.Users with no WHERE clause, back to back with this one, run both of them, and compare the number of reads they did. If I go over to the Messages tab, SQL Server says that the first one read 7,405 pages. The second one actually read more, due to the overhead of parallelism: when you break up the work and put lots of people inside the closet, some of the pages are going to be read twice. That's okay; the results are still accurate, and I don't want you thinking you got duplicated results. The first lesson in this particular query is that if you don't have something to make your query easier, in the form of an index, and you ask for some rando column that SQL Server doesn't know how to get to quickly (like here, where I'm saying WHERE LastAccessDate is greater than 2014-07-01), you're still reading the whole freaking table. That's where indexes come in, and we'll talk about those in a while.

So that's one way to think of it: looking at the number of pages that were read will tell you roughly how much work was done. Another way is to look at the two execution plans. If I zoom in here on the top one, hover my mouse over it, and look at the estimated subtree cost, it's 5.79. The estimated subtree cost on the bottom one is 5.87; it's actually a little higher, because of the overhead involved with building parallelism, with sending multiple people into the closet. Sukur says, covering index all day. Sukur, I can tell that you're a developer, because you want to index everything, and unfortunately that's why you're here in the class: your DBA sent you because they're sick and tired of hearing you ask for an index when you really need to tune some of your queries. We'll talk about covering indexes throughout the course. So you might compare these two and think that one is more expensive than the other, or one is cheaper than the other.
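The back-to-back comparison described above is easy to reproduce; the read counts in the comments are the approximate numbers quoted in this stream:

```sql
SET STATISTICS IO ON;

-- Query 1: no WHERE clause, reads ~7,405 pages
SELECT * FROM dbo.Users;

-- Query 2: filtered, but with no helpful index it still scans the table,
-- and it reads slightly MORE pages because parallelism's coordination
-- overhead causes some pages to be read twice
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014-07-01';
```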
But remember, those estimated subtree costs are just the beginning; that's before SQL Server actually executed the query. Sammy asks, should actual rows be the same as estimated rows? In a perfect world, yes, but in the world you and I live in, not quite so much, and through the course of the next hour or two we'll talk about what you do when those numbers start to diverge, when estimated versus actual is way off, and how you go about troubleshooting it.

Okay, so notice here that we've got this missing index request. Let's take a look at just this one query by itself. SQL Server has now piped up, and I always think of these as like Clippy, the Microsoft Office assistant. I don't know if you remember, but there was this dancing paper clip that used to be in Microsoft Word, and he was like, hey buddy, looks like you're trying to write a resume, maybe I can help; even when you weren't trying to write a resume, you were trying to build a shopping list. Well, Clippy got fired from the Microsoft Word team, reassigned, and now he works over on the SQL Server team, and he's all, hey buddy, looks like you're trying to query by LastAccessDate, maybe I could help. Clippy is razor-sharp focused just on things like the WHERE clause; he's not really as concerned with, say, GROUP BY or ORDER BY. More on that later.

But if you're following along, yours may not have looked like this. Yours may not have said go build me an index (thank you, welcome to the club); yours may not have said parallelism. Let's see why those things may not show up. What I'm going to do is right-click on my SQL Server, go into Properties, then into Advanced, and there's a setting over here that I can change: cost threshold for parallelism. What this means is that the query must cost this much in order to ride the parallelism train. Remember how, when we were looking at estimated subtree cost, I said these queries were costing like 5.79 or 5.87? What a lot of people will do is change this setting, cost threshold for parallelism, to a higher number.
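The stream makes this change through the SSMS server properties GUI; the same change can be scripted, roughly like this (50 is the value suggested in the stream; treat it as a starting point, not gospel, for your own environment):

```sql
-- 'cost threshold for parallelism' is an advanced option, so expose it first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Queries must be estimated at this many Query Bucks or more to go parallel
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;
```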
Let me show you what happens when you change it to a higher number, like say 50, which is a common setting that I recommend when I'm managing SQL Servers. I'm going to change it to 50 and say OK, then run the exact same query again, and now my execution plan looks totally different. I never thought of it that way; that's pretty good, especially when we get into things like table variables.

So why not make all queries go parallel? Because sometimes you don't want it. For example, when we were running this query just now, it finishes in like a second; why bother throwing multiple cores at it? If I don't throw multiple cores at it, that leaves other CPU cores free for other queries, and I always want to balance that line. I don't want one bad SQL query to wreck the whole server; I want to walk a balance between letting people get enough resources and not letting them wreck the whole server every time they run a query. Because you know that one person in the office who writes really terrible queries, and you know they have access with SSMS. So the query looks different here; I'm going to call him Leroy. Ha, yes, and now we know exactly who that was.

And why doesn't SQL Server manage this cost automatically? I really love that question. Why doesn't SQL Server manage this for you automatically? Because it would be hard, and you haven't asked for it. Microsoft is just like you at your day job. You know how, when you build something, your manager is like, I need a proof of concept of this by Friday, and you duct-tape something together, and it's made out of Pringles and duct tape and chewing gum, and you're like, here, but this is just a proof of concept, please don't put this out into production. And you know what happens next: they're like, thank you, we're just going to go deploy that now, we're going to put it in all of our shops, later, bye. Then it goes into production, and you're like, I really wish I could have done a better job coding that. Later you want to go back and do a better job, but what does the project manager say? Hey, I've got this new feature here; I want you to work on this new feature. No, no, forget fixing your crappy code; let's work on this new thing. That's exactly how it works over at Microsoft.

Now, y'all are asking a lot of interesting questions around parallelism and MAXDOP. This is where I'll stop for a second to take a break and say: this is really just the start of your learning journey. If you go to BrentOzar.com/training, and I'll put that link over there in the Slack, I'm really just starting your journey; this is literally the first course that you take with me. I put it right up at the top left because it's really just starting your journey, and you can watch this again later; you can see blog post versions of it too. Then we get into things like the Fundamentals of Performance Tuning, where I start talking about index tuning, query tuning, and parameter sniffing, and server tuning, where I dig way deeper into stuff like cost threshold for parallelism, MAXDOP, what happens when work isn't balanced evenly across all cores, and much more. However, I'm going to stop there and put a placemark in it: if you want to learn more about cost threshold, MAXDOP, and so forth, how you tune for unbalanced parallelism, what the symptoms look like, that would be my Fundamentals of Server Tuning class. But I'll stop there, and now we'll come back to How to Think Like the Engine.

So, I said your query may not have looked like this; your query may have looked more like this, where you just had a clustered index scan and no missing index hint. Holy cow, there are so many things that will make your query look different: the number of cores that you have, the amount of RAM that you have, your
data distribution, your trace flags, your sp_configure settings, your SQL Server version number, your build number; even every time you apply a patch, it can change the way execution plans behave. So your next lesson there is that when you're trying to tune queries, you're going to get dramatically different behavior between production and development, between production and QA, even between production and DR. That's normal. It's normal to have different query plans throughout these environments, because there are just very few shops that can afford to have exactly the same horsepower in production and then put that same horsepower on every dev box. But I do want to throw that notice in there: sometimes, as you work on query tuning across different servers, you're going to see dramatically different plans.

Here I'm going to go back to the original default cost threshold for parallelism, back down to just a lonely little 5, so that as I run this query, it more typically represents the kinds of query plans that you're going to see out in the real world. So here SQL Server said, oh my god, I've got to scan this whole table, it's going to be all kinds of work. I wish I could use a guy's voice for SQL Server, because he's dumb and stubborn and refuses to ask for directions; he's all, trust me, I got this, when he doesn't usually got this. So SQL Server goes, oh my gosh, there's going to be so much work, it's higher than my cost threshold for parallelism; guys, let's all go into the closet together and hold hands, or whatever it is that we do in there. SQL Server parallelized it across multiple cores, and that's when Clippy kicked in. Clippy popped up and goes, oh man, looks like you're taking a lot of people into that closet there; you know what sure would help? If you had an index on LastAccessDate, you could do less work. Clippy's right, and we'll come back to that when we start tuning. Right now, unfortunately, I can't do Jessica Rabbit's
voice; I don't even know that I could pick her out of a police lineup, because it's been a while since I've seen Who Framed Roger Rabbit, and I should really go do that this weekend. That's a really interesting way to do it.

All right, so we've got this SELECT Id going on inside here; let's add something else to the query that I bet you do all the time. I bet you do ORDER BY. So instead of just selecting from Users, I'm also going to tack on ORDER BY LastAccessDate: show me all the users who accessed the system since July the 1st, and then order them by LastAccessDate. Now, as a reminder, I'm going to show you the stuff that's inside the table just so that you can build your execution plan. You have 7,400 pieces of paper that look exactly like this. What's your execution plan for this query? And Leo, that's a great question, but I'm not going to go deeper into that; that'll be in my Fundamentals of Index Tuning class. So here, given this query up at the top of the screen, and I'll make it even bigger just to make it easier for folks to see (we'll pop it all the way up to 400% and put a line break in there), if I say go show me all of the Ids where the LastAccessDate is greater than July the 1st, how do you execute this query? Just in plain English, let's hear what your execution plan is. Remember, 7,400 pieces of paper over in the closet; you can't change those pieces of paper, you can't write on them. You can hold on to them for a second, but you can't write on them. Yeah, so you get all that data; yeah, I get that, but how are you going to do it? You're a human being; be more specific than that. Thanks, Jason, and thanks, A Tech Guy. I'll give y'all some time here. How are you going to get them all sorted by LastAccessDate? It'll help to look at the pieces of paper, to look at the table, because there are multiple users on the same page.
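The query with the ORDER BY tacked on, as described above, looks like this:

```sql
-- Same filter as before, but now the results must come back sorted,
-- which adds a Sort operator to the execution plan
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014-07-01'
ORDER BY LastAccessDate;
```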
Now Red nails it: read all of the pages, write down all of the matching records, and then sort them afterwards. Yes. NightNight gets even closer, and ICant says to just write down the user Id and LastAccessDate. You're going to take your friends over into the office supply closet, you're going to hand each of them a stack of paper, and then you're also going to hand them empty pieces of paper; maybe you just steal those from the office supply closet while you're in there. As you're reading through the 7,400 pieces of paper, you and your friends are going to write down the Id and LastAccessDate of every row that you find. Now, some of us in here are developers and might say, well, I'm going to sort as I go. That might make sense in some situations, but not in a database server, where we're dealing with everything as standalone mini-operations. So let's go see how SQL Server does it. Let's execute it, and let's go look at the plan. The first thing SQL Server did was scan all these rows, and as he was scanning them, he only output the rows that matched. Remember, this whole table has 300,000 rows in it, but he only output like a hundred and fifty thousand, because sending your friends into the office closet is one standalone operation. If I hover my mouse over that, you see how there's a predicate down here? The predicate says: as you're going through the office closet, these are the only rows you're looking for, I only want you to return these rows, and here's what I want you to output. SQL Server has to do things like estimate how many rows are going to come back so that he can plan the rest of his work. Then this little arrow right here, what this means is that it's pushing a hundred and fifty thousand rows up the stream. Then each of your friends is also going to do a sort. So the first mini-program just yells out a hundred and fifty thousand matches; the second mini-program goes back through and sorts them by LastAccessDate, because that's what our
ORDER BY was on. Then finally, the data gets coordinated across all of your buddies who went into the supply closet. Joe asks: is a predicate the same as WHERE? Yes, a predicate basically means: here's what I'm looking for. I will say you can get predicates even without a WHERE clause; we won't talk about that in this class, we talk about that in Mastering Index Tuning. Okay, so how much more or less expensive was this one than the last one? We've got two ways we can measure queries. Remember, we said we've got reads, so how many pages did it read? To see that, we'll go look over here, and it says we read 7,800 pages, give or take, so we read the whole table. Another way you can see that is in the plan: if I hover my mouse over the Clustered Index Scan, there's a thing up there. Man, when Microsoft paints stuff on a tooltip, they seem to use one of two approaches. They either put it in alphabetical order, which doesn't really make any sense for query tuning, especially when some of these say "estimated" and some don't (it's not like there's any consistency in here), or else they load up the data cannon and fire it at the screen, and wherever that data happens to randomly spread, that's where it lands. This tooltip is an instance of the data cannon, where the data is just scattered all over in a way that makes no sense whatsoever. But the thing I'm going to look for, in order to see whether or not SQL Server read the whole table, is up near the top: Number of Rows Read. Thank you for following, Epic Fail Drone. I love that name, that's just fantastic; Epic Fail Drone makes me think you're going to have good videos in there. So, Number of Rows Read: 300,000. That means we read the whole entire table. So in terms of reads, we spent about the same time in the office supply closet in this operator, but not the next operator. The next
operator is brand new: that's the Sort, and look at its cost. Who's loving the data cannon? Of course someone with a name like Epic Fail Drone would love the data cannon. So here the Sort has a cost; see how it says the cost is 99 percent? It's like 99 percent of the plan, because writing down all these freaking user Ids and LastAccessDates and then sorting them is going to suck rocks. This is a big huge giant new piece of cost. If you look at the CPU and I/O costs up here, you can see that they're so much higher than the costs we had before, because sorting is hard work, hard work in terms of CPU. So not only is this query's cost higher, but if I hover my mouse over the SELECT, you can see that the estimated subtree cost is now more than twice as high just by adding an ORDER BY: we went from like six Query Bucks to twelve Query Bucks. But another thing that's different is the amount of CPU work required to do this. Now, Sukhveer, that is a really good question, and I'm going to hit it as a combination of here and what I'm talking about over the next sixty seconds. It's a great question, I absolutely love it, and it's beautiful because it really impacts the way SQL Server builds execution plans too. So over here in the statistics I get reads, but I don't get CPU time. I can add that, though: I can say SET STATISTICS TIME ON. I don't do this often, because time is so changeable, and I don't just mean from a metaphysical perspective; I mean that sometimes queries take a little more or a little less time, so it's not as predictable as logical reads, and it's not usually something I tune by. But in this case it's going to tell a story. Let's start with just the SELECT without the ORDER BY. With no ORDER BY, if I scroll down through this list, SQL Server spent a hundred and forty milliseconds of CPU time executing this query
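The comparison being run here can be sketched like this (the CPU numbers are illustrative, taken from this stream's hardware, not something you should expect to reproduce exactly):

```sql
SET STATISTICS TIME ON;  -- adds CPU time and elapsed time to the Messages tab

-- Baseline: the filter alone, no ORDER BY.
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014-07-01';
-- Messages: CPU time = 140 ms, give or take

-- Same query with the ORDER BY tacked on: the Sort burns extra CPU.
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014-07-01'
ORDER BY LastAccessDate;
-- Messages: CPU time roughly 70 ms higher than the baseline
```

Unlike logical reads, these timings wobble from run to run, which is why they're a storytelling tool here rather than the primary tuning metric.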
So I'm going to write that over there in chat just so we've got it, and then I'm going to run the query again with the ORDER BY. Let's tack on the ORDER BY, hit execute, look at our messages again, and see how much CPU time was used. Huh, it went up. Now, do I worry about like 70 milliseconds' worth of difference? Not usually. I mean, I've worked with companies who do; take Stack Overflow: Stack Overflow cares when you add 70 milliseconds of CPU time to a query. Most shops really don't. Gringo, that's a good question: what's the difference between CPU time and elapsed time? CPU time is how hard you worked, and elapsed time is how long it took. You can get cases where CPU time is higher than elapsed time, because queries go parallel across multiple CPU cores: if you use four CPU cores for one second, for example, you'll see four seconds' worth of CPU time with only one second of elapsed time. The reason why elapsed time is so high here is that SQL Server Management Studio sucks at painting a hundred and fifty thousand rows into a grid. That's not what Management Studio was designed to do; it's just not a fast client, so if you do this kind of thing inside an app, it'll end up going way faster. So what we've seen here is that this query has higher CPU time in order to do its work, because the ORDER BY is so much harder on the CPU. This is where Sukhveer... and your name is so cool, now I've got to go see if I click on it... I'm just going to call you Sukhveer; if you want, put in the pronunciation you'd like, because I've seen you a couple of times and I want to make sure I get your name right. I suck really bad at names, but I can remember pronunciations, so I'll give myself points for that. Oh yeah, I got it right. Oh, I got lucky. [Music] So, Sukhveer, your question was so good. Because imagine that you sent your friends into the supply closet, and I'll come back to the execution plan, imagine that you told them
hey, go scan through all of the rows, and I want you to write down the rows that match. Well, you wouldn't want them all to hand unsorted pieces of paper to you; that would suck. You want to delegate that work. So what you have them do is run these two mini-programs: first do the work of writing it down, then go sort your own rows. Then, when you turn your work in to me (that's the parallelism icon, where it shows all the work coming together), as you turn your work in, yell out your results in order, and I'll combine the results from each of you so that we're streaming them all together into one sorted list. I don't have to sort them all again; I just have to merge them. Sukhveer says this sounds quite close to Timsort. I don't know who Tim is or what kind of sorting he does. That's not true, I know a bunch of guys named Tim, I just don't know how they do their sorting. This is also why... and I should stop for a second and say that, in terms of my own personal background, I didn't get started with the conventional go-to-a-computer-science-degree type stuff. I just learned on the fly doing database work, and went to all kinds of training classes around the world as I did it. But there are so many computer science terms that I feel like I missed out on, like a doubly-linked list, all kinds of stuff like that, where I'm like, okay, I kind of sort of know what that is, but the rest of the time I'm winging it. Talk about impostor syndrome; I totally have impostor syndrome out the wazoo. All right, so back over here on our demo. The way that you tell how the work was broken out is to look at the parallelism Gather Streams operator. Everything downstream from the parallelism operator (I was pointing the wrong way) was done across multiple threads, and you can even follow it up the tree if you look at the work done by each operator. If you right-click on the Sort and you look at the number
of rows, for example, you can see the number of rows that each thread sorted, so you can follow that work up the tree. In a perfect world, SQL Server balances it out; in the real world, sometimes the stuff gets unbalanced, and for those of you who are production DBAs, you might have heard the term CXPACKET, for example, which involves unbalanced parallelism. This is one of those cases where it can happen. So the ORDER BY made it way more expensive: the ORDER BY made this thing more than twice as expensive in terms of Query Buck cost, and about 50% higher in terms of CPU time. But when we were executing this query, I specifically asked what you had to write down, and you said you were only going to... yes, Gringo asks: is thread 0 the manager? Yes, it absolutely is, and you know how you can tell it's the manager? Because it does less work than the rest. [Applause] Like that? I picked that up on the fly. Also, my employee is in here; that's Richie, and Richie will be glad to tell you that I do less work than he does. I love that camera transition. It feels like I'm watching SQL Server the game show: next up, who wants to lose their Query Bucks? So we've got this query cost now that we've added in an ORDER BY. What we'll do is we'll stop here and take a bio break, say about five minutes, and when we come back, I'm going to change one thing about the query: I'm going to change that SELECT Id to SELECT *, and I want you to tell me how that query is going to behave differently. So I'll see y'all back in here in five minutes.
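As a teaser for the tuning discussion that keeps getting deferred: Clippy's suggested index on LastAccessDate would look something like this (a sketch; the index name is my own invention, and this assumes Id is the table's clustering key, as it is in the Stack Overflow database):

```sql
-- A nonclustered index keyed on LastAccessDate. Because Id is the
-- clustering key, it rides along in the index automatically, so for
-- SELECT Id ... WHERE LastAccessDate > ... ORDER BY LastAccessDate,
-- SQL Server can seek straight to the matching rows and read them
-- back already in order, skipping both the table scan and the Sort.
CREATE INDEX IX_LastAccessDate ON dbo.Users (LastAccessDate);
```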
Info
Channel: Brent Ozar Unlimited
Views: 21,070
Rating: 4.9904532 out of 5
Keywords: sqlserver, brentozarunlimited, brent ozar, mssql
Id: oovA_9fiXVE
Channel Id: undefined
Length: 50min 31sec (3031 seconds)
Published: Sat May 09 2020