Brent Ozar_Watch Brent Tune Queries

Captions
All right everybody, we'll go ahead and get started. My name's Brent Ozar, and I'm going to be talking about a pretty self-explanatory session this time around: Watch Brent Tune Queries. The title says everything; if you like the title, you're going to like what you see over the next hour. That's a really large camera that you've got there, that iPad, I see you. Yes.

So, a little bit about myself in case you haven't heard of me before, and I wouldn't blame you. I'm the kind of person who does a whole lot of work with Microsoft SQL Server. I hold the MCM (Microsoft Certified Master), and I'm a Microsoft Most Valuable Professional, which just means that Microsoft thinks I talk a lot about SQL Server and that I know a lot about SQL Server. My wife will be the first one to tell you that only one of those two is true: I love to talk a whole lot, but I don't really know very much. At least that's what my wife says. I also started DBAreactions.com, which is just a fun place where I like to joke about all the horrible things that happen to me as a database administrator.

Today I'm going to be talking about my query tuning process. Everything that I'm going to show you is all there on that link, so you don't have to take any notes down for anything that's inside the demo scripts that I'm going to be showing you, or in the slides; that stuff's already up there and you can download it whenever you want. That includes everything that I'm going to talk about here, except when you ask questions. If you ask questions and you hear something that's unique and interesting, you may want to write that down, because that's probably not on the site.

Now, my query tuning process revolves around something that I call BE CREEPY. I had to come up with a list of terms and an acronym to hold it all. I'm not really a creepy kind of person, I'm kind of goofy, but this lays out everything that I think about when I go off and tune queries. I am NOT going to do all of this stuff here today, because sometimes you already know
something about the server that you're trying to tune. Let's step through these in order, first starting out with the B. Whenever I'm looking at a SQL Server that I've never seen before, I'm going to go off and run sp_Blitz and sp_BlitzIndex on it, because a lot of times the performance problem on the server doesn't have anything to do with the query; it has to do with somebody completely screwing up the SQL Server configuration. So I like to start there, because I don't want to take anything for granted. Same thing with running sp_BlitzIndex: I just want to get a quick health check across all of my indexes, because maybe there's a screamingly obvious missing index that, if I put it in, is going to fix all kinds of queries, not just this one. You won't be running this every time you do a query tuning event, but you want to at least know that your SQL Servers are healthy before you start working on a specific query. I am also going to look to make sure this query doesn't do any writes; if it does any inserts, updates, or deletes, I'm going to be tuning it over in a development or staging environment instead. Today I'm going to do it live; I'm going to do it all against my one SQL Server.

The second thing that I do is gather end user requirements. I need the users to tell me what is fast enough. Of course, end users want to say things like "as fast as you can make it." Well, I can spend the rest of my life performance tuning a single query; give me an idea of how much time you want me to spend before I give up on this thing. For example, is it okay if it runs in five seconds? Five minutes? Or do you want me to take an hour and just make it go as fast as I possibly can during the course of that hour? Sometimes even just tweaking one tiny thing about the query will suddenly take me across the finish line. I also want to know why the end user is bringing me this query, or why the developer is bringing me this query. Sometimes I have to make it run faster; sometimes I have to get it out of
the way of other queries, like if it's doing blocking on a particular query. Then I also want to ask things like: do we even have to run it on our production SQL Servers, or is there somewhere else that I could run this query, like a reporting server? I want to know what the end user's purpose is for the query output, because sometimes they do stupid things like querying every row in sales detail when I already have a report table that gives them exactly what they want. I just want to look at the query and say: tell me exactly what you're doing with this here, and is there another way to get that data?

Next up, I'm going to go capture the query's metrics. I'm going to show you the things that I turn on in SQL Server Management Studio to tell me whether or not this query is hot, so to speak. I'm going to make sure it's not a parameter sniffing problem, and I'm going to pop open a couple of different windows in SSMS so that I can compare my query before and the query after, because sometimes I'm going to be changing the database along the way and I won't be able to quickly get back to the original query plan as I'm tuning this thing. Oops, went too far. Next up, I'm going to run the query and look at the metrics that come out of SQL Server: how much CPU work is involved, how much time is elapsing on the clock, and does the work that it's doing actually match up with the query results? Is it taking ten minutes to generate one row, maybe indicating that I have something else going wrong on the SQL Server at the time?

Next, I'm going to experiment with the execution plan. I'm going to change things about the query that may suddenly make it faster. One of my favorite things to do is look to see if there's an ORDER BY in the query and simply comment that out. If suddenly I cross the finish line and it runs fast enough, I want to start asking tough questions of the users: why don't you sort this in the application instead? SQL Server CPU cycles are the most expensive software that we
have in the house, usually, unless we're running Oracle. Well, right, you know we have to make a joke at somebody's expense. In the US it's seven thousand dollars per core for SQL Server Enterprise Edition, and that's a lot of money; that buys a lot of application servers. I'd rather farm that CPU work out somewhere else. I'll also change the SELECT statement, which sounds kind of goofy: I will just change the SELECT to have the number one and not have any of the fields in the query, because this way I might be able to use covering indexes that don't include all these big crazy varchar(max) or text fields that I don't really want to drag back. If suddenly that takes me across the finish line, then I turn to the end user and say: all right Bob, tell me more about every field that you're getting back from here. Do we really need all of them, or is this an ORM that just does a SELECT * every single time it runs a query?

Next up, I'm going to review the execution plan a little bit more in depth, and I'm going to start reading the execution plan from the top right. You're going to see me do this a couple of times throughout today. I want to start there because that's usually the first thing that SQL Server does incorrectly. The whole reason you're holding this query is because something went wrong: SQL Server's estimates didn't exactly match up to what it actually got back, so we're going to have to figure out where things went horribly awry in SQL Server. Then, if I can't get it to go faster, I'm going to look at parallelism: can I divide this work out across more cores? This starts to get much more difficult, and we won't be going into that today. And then last but not first, I'm going to look at changing the database to match my query. Normally I want to fix my broken T-SQL first; I don't want to go about wrecking the database every single time I write a bad query. I know a lot of folks jump off to the Database Tuning Advisor every time we have
something running slow; I just want to try to fix the query first, and then make the database changes. The last thing that I consider in this whole process, the BE CREEPY process, is how I go about tuning queries. I'm going to skip a lot of this here today in the one-hour session, but in the full scripts that you can go download, you'll actually see me stepping through all of this in the code comments and talking about the order in which I do it.

So now let's jump in and actually take a look at SQL Server. Let's see if I can remember which one I'm on; maybe it's six. Fancy! Yes, small victories. All right, so over here I have SQL Server Management Studio, and what I use for demos is the Stack Overflow database. How many of you know stackoverflow.com? It's where you post questions about the hard things you have to do at work, someone else answers them, and you get to take all the credit, right? That's how my job works. The cool thing about Stack Overflow is that they put the whole entire database out there, available for the public to download, and I give you instructions on how to go get it so that you can play around with it for demos on your machines. I hate AdventureWorks. I never want to see another bicycle sales company demo again as long as I live; that company sucks. Stack Overflow is much more fun to query. Stack Overflow has things like Comments, which were given on posts; it has Posts, which include all questions and answers; and Users, which is anyone who's ever asked a question over on Stack Overflow. It's a pretty simple, straightforward database structure.

Now, where am I going to get a crappy query to tune? Well, I'm going to go over to something called data.stackexchange.com. At data.stackexchange.com, whoops, you can go off and run your own live queries against the Stack Overflow database. It's essentially SSMS on the web. So I'm going to click on Stack Overflow, and I can click on other queries that people have written, such as Sam Saffron, one of the developers over at
Stack Overflow, who says: find all of the interesting but unanswered questions. When I click on this, presto, here is one of Sam's actual queries, and he does things like create a temp table, populate data into it, and get data back out. Whenever you want to find ugly queries to practice your tuning capabilities, this is an excellent place to get them, because these are real-world developers writing real-world ugly queries, myself included; I have ugly queries on there as well. Now, what I can do is actually put in parameters here and run the query, and the thing that I find interesting is that it's actually running the query against a read-only copy of the Stack Overflow database; they host the databases on SQL Server 2014. There's this check mark here for "include execution plan." This is just like SQL Server Management Studio: I see the results of my query over here on the left-hand side, I get messages just like you would get in SQL Server Management Studio, you can even play with things like SET STATISTICS IO ON, and I get the execution plan right here in my own web browser. It looks a little bit different, but all the basic ideas are the same. I can hover my mouse over individual operators in the plan and I get the same pop-up tooltips that you would get in SQL Server Management Studio. If you want to download the plan and play with it locally, you even get a Download XML button, so you can open up the plan in SQL Server Management Studio and drill into more details.

Now, for today I am going to be using one of Sam Saffron's queries. This demo script that you see here is what you can go grab off of BrentOzar.com/go/tunequeries, and I've got all the instructions in there about how to download Stack Overflow's database. I also have links to the queries that I am going to tune. This particular one is by one of Stack Overflow's original developers, Sam Saffron, and it goes out and finds posts on
Stack Overflow that are very controversial. I'm going to turn on two things in SQL Server Management Studio. One, I am going to click Query, Include Actual Execution Plan; let me go back up there and highlight that again. So I'm going to include the actual execution plan, because estimated plans are actually pretty crappy: if the estimates were right, your query wouldn't be running slow. In many cases the estimates are wrong, and that's why you're getting slow performance. Now, this isn't perfect, because sometimes queries take several minutes to run, but I really want to get that data. The other thing that I'm going to turn on is SET STATISTICS IO ON, and I'm going to show you what that does here in a minute. So we'll go ahead and execute those pieces, and now I'm going to execute Sam's actual query.

While it runs, because it's going to take a minute for this query to run, I'm going to walk you through a little bit of what this thing is doing. Sam is saying: go create a table full of vote statistics, and put in there all of the votes that are either upvotes or downvotes for questions. All upvotes have a VoteTypeId of 2; all downvotes have a VoteTypeId of 3. After I've aggregated all of the votes for every post in the system, go find me all of the posts that have had more than half as many downvotes as they've had upvotes: like people have upvoted it a hundred times and they've downvoted it at least 50 times. Now, it makes sense, I get what he's doing, but this is not fast. You'll notice it took 48 seconds to run. Let's say that Sam's come to me and said: I need this to finish in five seconds or less, because we're going to show it on the front page of Stack Overflow, or whatever else; I need it to be very quick. Well, all right, let's go see what's going on with this query and why it sucks so bad. It can be tempting to jump over into the execution plan and start looking around, and of course there is a big missing index warning.
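As described, Sam's query has roughly this shape (a sketch reconstructed from the description above; the table variable name, column names, and the exact controversial-post filter are my assumptions, not Sam's exact code):

```sql
SET STATISTICS IO ON;  -- report logical reads per table in the Messages tab

-- Aggregate up/down votes per post into a table variable
DECLARE @VoteStats TABLE (PostId INT, up INT, down INT);

INSERT INTO @VoteStats (PostId, up, down)
SELECT PostId,
       SUM(CASE WHEN VoteTypeId = 2 THEN 1 ELSE 0 END),  -- upvotes
       SUM(CASE WHEN VoteTypeId = 3 THEN 1 ELSE 0 END)   -- downvotes
FROM dbo.Votes
WHERE VoteTypeId IN (2, 3)
GROUP BY PostId;

-- Controversial: more than half as many downvotes as upvotes
SELECT p.Id, p.Title, vs.up, vs.down
FROM @VoteStats vs
INNER JOIN dbo.Posts p ON p.Id = vs.PostId
WHERE vs.down > vs.up * 0.5
ORDER BY vs.up DESC;
```

The two statements here, the INSERT and the SELECT, are the two parts of the plan discussed below.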
But I don't want to hit the missing index first; that's not the first thing that I'm going to go do. I want to change my query to match the database, not the other way around. When I want to know how much work a query has done, one of the things I can do is hover my mouse over the root node of the query and look at the line that says estimated subtree cost. A long time ago, in a galaxy far, far away, this meant the number of seconds that it would take for that query to run on one machine in Building 35 at Microsoft. Those days are long gone; that was the late 1990s. Now it's just a general measure of CPU and I/O costs. It doesn't map to seconds, although you can kind of think of it as "this part of the query would have taken 350 seconds back in the late 1990s." This is an estimate, and queries with an estimated cost in the hundreds are not going to be quick in most cases. So notice that it's got two parts of the plan here. The INSERT is the first part; it has an estimated cost of 356. The SELECT down at the bottom, if I hover my mouse over it, has a very tiny estimated cost of 0.01. So I could add both of those up and say that this query's cost is about 356, but those are estimates, and they're totally wrong.

Check this out: if I instead switch over to the Messages tab of SQL Server Management Studio, I get the number of page reads that were done for every part of the query; this is what gets turned on by SET STATISTICS IO ON. One logical read equals reading one 8KB page. Everything in SQL Server is stored on disk in 8KB pages, and I had to read hundreds of thousands of them. If I look at the Posts line down here: at the beginning I read about seventy-five thousand pages out of some worktable, 78,000 out of the Votes table, and then three million, almost four million, out of the Posts table. I suck at math; that is why I'm not a developer anymore. I also suck at debugging. There's a long list of things that I suck at, and my wife will gladly tell you about all of
those as well. So rather than trying to add those up, what I'm going to do is highlight all of them, right-click, and copy. Then I'm going to switch over to a different web browser, going to something called statisticsparser.com. statisticsparser.com is this really cool website that was written by a guy named Richie Rump; that is actually his name. He wrote this while he was sitting in one of our training classes. See how all you people are sitting around not doing anything? You're lazy. Richie Rump is hardworking, and now you get to benefit from Richie's hard work. All I have to do is right-click in this window and click paste, and when I click on the Parse button, I get this nice little GUI that shows me totals for all of my logical and physical reads. Pretty nice. No matter how many queries you have in these big long stored procedures, you get this nice GUI that you can sort by and quickly add up your totals.

Now, I mentioned that a logical read is reading one 8KB page. Some folks will say, well, I want to focus on physical reads, which is when I read things from disk. But you can never predict what's going to be in SQL Server's cache at any given time. Your goal doing performance tuning is to reduce logical reads, just the number of pages that I read, and that will improve performance. That is not the only measure of performance; there's also CPU time and the run time of the query. But when I'm first getting started with query performance tuning on one query, this is the metric that I'm going to drive down. So this query is currently reading 4.1 million 8KB pages. That is what in America we like to call "a lot," because if I had four million dollars I would have a lot of money. I don't. So I am going to leave that window open over in Statistics Parser, and I may call it "original query," for example. This way I can tab over to a second window of Statistics Parser and jump back and forth.
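To put that 4.1 million figure in perspective, the page math is simple (a back-of-the-envelope worked example, not from the session itself):

```sql
-- One logical read = one 8KB page, so:
--   4,100,000 reads * 8 KB = 32,800,000 KB, roughly 31 GB of page accesses
SELECT 4100000 * 8 / 1024 / 1024 AS approx_gb_read;  -- roughly 31
```

Driving logical reads down means driving that volume of page touches down, whether the pages come from cache or from disk.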
That way I can remember what my originals were. Over in SQL Server Management Studio, I now kind of have an idea of the work that I need to do: I need to drive those reads down. Let's look at something. In the top query, the first one that ran, I read about a hundred and sixty thousand pages; in the bottom, I read four million pages. But when I look at that execution plan (I'll zoom out a little to make this a bit more obvious), here are the two parts of the query: for the first one, SQL Server says the query cost relative to the batch was a hundred percent, whereas the second one, oh, that hardly cost anything at all, that's zero percent. Yet we saw that it read four million logical pages in the second one. Here's the thing: even when you're looking at actual plans, that number is still the estimate. So in order to figure out what's really going on, what you have to do is start hovering your mouse over the top right-hand thing in the query. We like to say that SQL Server execution plans suck: they suck the data from right to left, over here from this table scan, over into the SELECT statement, and the top right thing is the first thing that it did. So if I hover my mouse over that table scan (though I could point over here, I suppose; I always forget about this one behind me), what SQL Server says is: estimated number of rows, one; actual number of rows, a little bit more than one. This is why SQL Server thought that second query wasn't going to take anything at all: it only expected one row to come back. That's why our estimates are wildly incorrect.

This is a known thing with something in our query: the table variable. If I switch back over and look at my query, my query had this INSERT INTO the vote stats table, that vote stats table variable. I've got some bad news: whenever SQL Server sees a table variable, it always estimates that only one row is going to come back. What?! So it's going to make exceedingly bad estimates about the rest of the query plan. That wouldn't be a big deal if I
was only selecting from the table variable, but I'm joining to the Posts table in here, so now SQL Server has made extremely bad decisions. When I go back over and look at that execution plan, it was wrong enough at the beginning; when I hover my mouse over the vote stats table, its estimates were way off, but that also influences everything else it does in the plan. See, it estimated that it was only going to have to do this clustered index seek once, because it thought only one row was going to come back from vote stats, but the actual number of rows was a little bit higher than that, as was the actual number of executions. So when I'm doing T-SQL tuning and I see a table variable, the first thing that I'm going to do is just rip that out, throw it away, and replace it with a temp table; no other changes to the query, just that. The measures that I'm going to look at to determine whether or not it worked are: how many reads did I do, am I reading more or less data, and did my estimated costs go up or down? Right now we know that our estimated cost on the first query is like 356; the estimated cost on the second query was like 0.01. So let's go rewrite this query a little bit and see if we can make it better.

Now, because I already have this execution plan in here and I don't want to lose it, what I'm going to do is copy out the rest of this query and paste it into a new window in SSMS, because I want to always be able to jump back and forth to look at my original query execution plan; if I change the database, I may not be able to get that plan again. Now, when I'm working at home on my big ginormous monitors (I don't want to brag, but that's like my one luxury that I like to have), I will also right-click on here and say New Vertical Tab Group, so that over on the right-hand side I have my original query and my original execution plans, but over on the left-hand side I have the new one that I'm working on.
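The one change described here looks roughly like this (a sketch; only the table variable is swapped for a temp table, using the same assumed names as earlier):

```sql
-- Before: table variable (the optimizer assumes 1 row will come back)
-- DECLARE @VoteStats TABLE (PostId INT, up INT, down INT);

-- After: temp table (gets real statistics, so row estimates improve)
CREATE TABLE #VoteStats (PostId INT, up INT, down INT);

INSERT INTO #VoteStats (PostId, up, down)
SELECT PostId,
       SUM(CASE WHEN VoteTypeId = 2 THEN 1 ELSE 0 END),
       SUM(CASE WHEN VoteTypeId = 3 THEN 1 ELSE 0 END)
FROM dbo.Votes
WHERE VoteTypeId IN (2, 3)
GROUP BY PostId;
-- The SELECT joining to dbo.Posts stays exactly the same,
-- just reading FROM #VoteStats instead of @VoteStats.
```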
Makes sense? We're not going to keep that open here, because I only have so much screen real estate; I'm going to move this guy right back into the same tab group so that he's all piled in here. Nice. So now I'm going to make one change to this query, and if you're following along at home, if you decide to go walk through the entire demo yourself, you'll see that I actually put my thoughts in there; I type out the query as I'm tuning it. That's just for you to follow along and have fun. I'm going to execute this, and I'm going to explain what the difference is in this query while it runs. So I'm going to execute him, and all that I did to change this query was switch it from a table variable to a temp table; I didn't make any other changes. It's still not finishing quickly, so I clearly haven't crossed the finish line yet.

If I want to look at what the server's doing while it runs, I can run this stored procedure, sp_AskBrent. See, I wrote this because I got sick and tired of people coming and asking me, "is the server slow right now or not?" We'll just go look. So here we can see... oh, the other query already finished; we'll come back and hit this when another one of the long queries is running. But it tells you things like what wait stats are hitting the SQL Server right now, and which data files and log files have actually done reads and writes. We'll come back to that guy. Now, this query that we changed to use our temp table, it still took 48 seconds to run, so when I'm measuring in terms of time, this did not get me across the finish line. I'll go over to the Messages tab, copy and paste all of this out, jump over into Statistics Parser, parse it in, and see if that made any kind of difference. It did not; I still have... that's actually not right. Did I run the right query? Usually it does make a difference. Hold on a second, let's make sure I ran the right one. Oh, I'm not even on the
right window, hold on a second. Here's the one, this is the one that we ran. He took... yeah, that looks a little different. He ran now in half as long; he ran in 22 seconds. This is the fun of doing demos live on the wire; you never know how this stuff's going to work. 22 seconds, which is better. I didn't get the actual execution plan; I need to turn that back on. But what I'm going to do is copy out... oh, I didn't even turn on SET STATISTICS IO ON. SET "repentance as a presenter" ON. So let's go ahead and rerun this, and while it reruns, let's look at sp_AskBrent. What it does is take a five-second sample of your perfmon counters, wait stats, those kinds of things, and it tells me: aha, we're dealing with CXPACKET waits right now; in this five-second sample I'm parallelizing things like crazy. Notice that I'm not doing any reads and writes against user data files like the Stack Overflow database; that's because the stuff that I'm querying fits in RAM, although I am having to dump data out to tempdb, and we know why: because we're creating a temp table.

So if I go over here to my query that finished, he again finished in around 25 seconds. Let's go look at how many reads he did. If I copy all this stuff out from SET STATISTICS IO and go look in Richie Rump's statisticsparser.com, now when I look at how many reads this query did: we went down from 4 million to 170,000. Time has gotten better too; it takes half as long as it used to, but that's still not enough for my end users; they want it to go even faster. But if my concern was blocking on the Posts table, I might have just crossed the finish line already, because look at the number of logical reads that I did on the Posts table: it went down from 4 million to just a thousand. I dropped logical reads by a ton, and I can see why if I go over and look at the execution plan on this query. Yeah, sure, I still get that nasty old create nonclustered index warning.
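For reference, calling it looks something like this (a sketch; sp_AskBrent has since been renamed sp_BlitzFirst in the First Responder Kit, and the @Seconds parameter shown is my assumption about how the sampling interval is specified):

```sql
-- Take a five-second sample of wait stats, perfmon counters,
-- and per-file I/O, then report what the server is doing right now.
EXEC dbo.sp_AskBrent @Seconds = 5;
```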
But if I jump over here and look at what's going on with my temp table: before, when I hovered my mouse over that table variable, SQL Server gave me an absolutely smoking-crack estimated number of rows. Now I get something that's kind of better: it's estimating three million rows are going to come back, and actually 1.3 million rows came back. So I'm still way off, but at least I'm reading fewer pages; things are getting a little bit better. And what about my query costs? If I hover my mouse over the INSERT: before, my estimated subtree cost on this was 350-something. Also, before, that smaller query down there used to be 0.01; now it's a hundred and ninety-one. That sounds awful, right? If I were gauging by subtree costs, this query looks like it got worse. But is that really what happened? No. What happened is our estimates are now more accurate. Now SQL Server says: oh, you are going to have a lot of data in that temp table when I go off and query it, so I'm going to estimate the cost more accurately. So my cost numbers go up, but that doesn't really mean anything, because my query is finishing faster and doing fewer reads. This is why we can't trust estimated execution plans when we're doing query tuning. Yes, if you do have one of those queries that takes an hour to run, you're not going to be able to go get the actual plan, but otherwise you need to go get the actual plan, because these whacko statistics parts, the only way you can detect them is with the actual plan.

So let's go take a look at that query and see: is there another way that we could make this go faster? If I look at this... oops, let me jump up and show you the first one. What Sam did in his original query is he said: I want to go create this list of votes, and I want to dump in all of the votes for every post, regardless of whether or not it was controversial. But look at the WHERE clause in the second query. The WHERE clause in the second query says I only want you to show me some of the
posts, specifically the ones that have more than half as many downvotes as they have upvotes. Why am I inserting all of the votes into a temp table if I only want some of them? I know if I were an attendee in this session I'd say, "Brent, you made up that horrible query," but I know you've seen queries just like that, and this is one that one of the developers at Stack Overflow wrote. These are not stupid people; they are very smart people, but sometimes they're just pressed for time, right? I know I write crappy queries. Any of you who have looked at sp_Blitz, you probably had to wash your hands after looking at it, because the code in there is horrific, but it gets the job done. So what if we moved that filter up to the top of the query? I'm going to go ahead and execute this, and I'm going to explain while it runs.

So while this guy runs, I'll slide down my little divider bar so we can see things a bit better. What I've done here is move that WHERE clause kind of stuff up into the top query, so now I don't have to worry about any of the upvotes-and-downvotes stuff in the second part. Now I'm saying: only put things into this temp table where they actually have more than half as many downvotes as upvotes. Man, rewriting queries like this kind of sucks; this is not readable. Sam's version was so much more readable, because he just said things like up versus down. The faster you want queries to go, the less legible they usually are. I tell people to write for readability first: make the query easy to understand, and then worry about performance after you get it to compile and you get the results that you want, because after all, three months from now you're going to have to go back and read this query and figure out what you were doing back then, and whether or not you were sober. Now, here we have... so this guy went ahead and ran, and you'll notice it still takes about 23 seconds. Dang, that means I haven't crossed the finish line yet.
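Pushing the filter up into the aggregation step looks roughly like this (a sketch under the same assumed names; HAVING is the natural place for a filter on aggregates, which is part of why the rewrite reads worse than Sam's original):

```sql
-- Only insert the controversial posts in the first place,
-- instead of inserting every post and filtering afterward.
INSERT INTO #VoteStats (PostId, up, down)
SELECT PostId,
       SUM(CASE WHEN VoteTypeId = 2 THEN 1 ELSE 0 END),
       SUM(CASE WHEN VoteTypeId = 3 THEN 1 ELSE 0 END)
FROM dbo.Votes
WHERE VoteTypeId IN (2, 3)
GROUP BY PostId
HAVING SUM(CASE WHEN VoteTypeId = 3 THEN 1 ELSE 0 END)
     > 0.5 * SUM(CASE WHEN VoteTypeId = 2 THEN 1 ELSE 0 END);
```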
Let's go over here and look at our messages out of SET STATISTICS IO. I'm going to copy all of this out to see whether or not I'm doing more or fewer logical reads. Before, I was doing a hundred and sixty-nine thousand logical reads; let's see if this one is any better or worse. Copy all that out and parse. Now, if I go down to the bottom: a hundred and fifty-five thousand logical reads. It doesn't seem like much, but query tuning is a game of inches. I'm going to keep making these small, inch-by-inch improvements as I go through and tune this thing: ten percent plus ten percent plus ten percent, and pretty soon we're talking about real money. So let's go see what else we can change about this thing. If I zoom back out of here... and I still haven't touched indexing on the table either; I just don't want to jump there if I can avoid it.

What if I used a CTE? What if I didn't even put things into tempdb at all? What if I just let SQL Server figure out where it needs to store the data? Let's go see what happens. I'm going to execute this, and what I've done in here is, basically, instead of creating a temp table with that first insert statement, I'm just telling SQL Server: hey, build this and call it vote stats, then out of those resulting vote stats, go ahead and do this query. Now, clearly, you don't have to be a rocket scientist to see we still didn't cross the finish line, but things are a little bit better. Usually, any time I can avoid the public toilet of SQL Server, aka tempdb, I'm gonna do it. There are all kinds of nasty, awful things happening inside there, dirty things that you don't want anything to do with, so if I can reduce the reliance on that, great. Let's go see what... oh, and now look: before, we had two sets of operations in here, one set at the beginning when we populated our temp table, and then a second set down below; now we have just one. If I had any brains at all, I could probably just add those up, but I don't, so I'm going to go back over to Richie Rump's site again.
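The CTE version collapses the two statements into one (a sketch with the same assumed names; the optimizer can now inline the aggregation instead of materializing it in tempdb):

```sql
-- No temp table at all: let SQL Server decide where (or whether)
-- to materialize the intermediate vote stats.
WITH VoteStats AS (
    SELECT PostId,
           SUM(CASE WHEN VoteTypeId = 2 THEN 1 ELSE 0 END) AS up,
           SUM(CASE WHEN VoteTypeId = 3 THEN 1 ELSE 0 END) AS down
    FROM dbo.Votes
    WHERE VoteTypeId IN (2, 3)
    GROUP BY PostId
)
SELECT p.Id, p.Title, vs.up, vs.down
FROM VoteStats vs
INNER JOIN dbo.Posts p ON p.Id = vs.PostId
WHERE vs.down > vs.up * 0.5
ORDER BY vs.up DESC;
```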
were looking at 155,734 logical reads. I also like this little Clear Results button, because sometimes I'm not sure whether I pasted the right stuff. I'm going to paste that in here and click Parse. Now if I go down: 138,000 logical reads. It's these small, gradual improvements over and over, but I still haven't crossed the finish line in terms of time, so let's finally take a look at the indexing on this particular table. If I scroll down and look, this is the index that SQL Server keeps recommending in the plan, this top one here. We'll go ahead and look at that execution plan too, just so you can see it: if I go down here into the execution plan, SQL Server is telling me that if I added this index, this query would be 35% faster; that's basically what that impact number means. I don't usually like to jump to indexing, but let's go see what that table has in terms of nonclustered indexes. The one it wants is on the Votes table, so let's go look at the Votes table in Object Explorer, and what do you know, I only have the one clustered index. Generally, my rule of thumb is that I want to aim for five or fewer nonclustered indexes per table, with five or fewer fields in them. I'm okay with adding more nonclustered indexes if I absolutely need them, but that is my last resort; I don't want to jump to that first. This table hasn't had the tender loving touch of a database administrator, though, so let's go ahead and give it a little of that gentle touch. Maybe. It's usually a mean touch, actually; I slap them around a lot. I'm not big on database violence, but there's probably a joke in there. This is why I can't go off the cuff; I get myself in trouble. So SQL Server tells me that it wants me to create the index under a name like missing_index_sysname. Instead, I'm going to create my index with a name built from the fields that it's on; I like calling it
by the fields that it's on. Here I'm going to name it VoteTypeId_Includes. I don't name out the specific include fields; I just want to tell the person reading the execution plan that this index has additional includes on it. In a second this thing is going to finish; it takes like 30 seconds to create the index, because that's how long it takes to read this part of the table. If I wanted to, I could go over and look at sp_AskBrent, and it's just like checking to see if your dinner is done as soon as you get up and go to the bathroom. See, it happens every time: as soon as you go to the bathroom in the restaurant, the waiter immediately brings the food. Whenever I run sp_AskBrent, other processes finish, and it's not because I'm killing them in sp_AskBrent, although that's an excellent idea. So what it did was go ahead and create that index, and I'm going to run the exact same CTE query again and see how long it takes. Before, we were up at about 20 seconds, although I do know that everything was in RAM when I was running this; that may not be the case since I just built this nonclustered index. Hey, so we got it down by a little bit more; now we're down at around 6 seconds. But let's go see what that means in terms of logical reads. If I copy my logical reads back out and jump over into Statistics Parser: before we created the index, the best we pulled off was about 138,000. Let's see whether we did better than 138,000. Clear results, delete, and then paste that stuff back in. Now when I come down and look: whoa, it's actually worse. This isn't unusual; I'm going to keep digging as I slice and dice queries, to make sure I don't now have another index or something like that that I have to go shred. Let's go look at the execution plan. Oops. So if I look at this
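(The naming convention just described, as a hypothetical sketch: the index is named after its key field, with "Includes" signalling that there are include columns without listing them. The include list here is a guess.)

```sql
-- Named after the key column; "_Includes" tells the plan reader there are
-- include columns without spelling them all out in the name.
CREATE NONCLUSTERED INDEX IX_VoteTypeId_Includes
    ON dbo.Votes (VoteTypeId)
    INCLUDE (PostId);
```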
execution plan, SQL Server has now given up. Before, I had a missing index warning; now SQL Server is kind of waving the flag and saying: from here on out, it's up to you, I don't have anything easy in the DMVs for you to accomplish. I can slide across here and see it did indeed use that brand-new index that I just created, so maybe it's better, but it's still not so good. When you start playing around with SQL Server execution plans, you'll also notice there are helpful little yellow bangs in there, on some of the sorts for example. This is one of the cool new features in SQL Server Management Studio 2012, and one of the reasons I'd recommend everyone always run the latest version of SQL Server Management Studio. Microsoft gives it to you for free; even if your company pirated SQL Server 2005, anyone is allowed to use the newer SQL Server Management Studios. I don't have this in the links, but I'm going to throw it on the screen here real fast: if you go to brentozar.com/go/getssms, that is the link to download the latest version of SQL Server Management Studio for free. That way you can run SSMS 2014 on your desktop and take advantage of all kinds of cool things in the execution plans there, like if I hover my mouse over that sort down at the bottom, it says: warning, operator used tempdb to spill data. See, here's the thing: when you run a query, SQL Server does the best job it can of guessing how much memory your query will need. That number is calculated, and it's set in stone, and it does not change, even if you bring back more data, even if no one else is using the server right now, because SQL Server always assumes that other people will start using the server at any moment. So even if you've got 64 gigs of RAM, you don't get all of that memory to handle your query. You can see it in there: if I right-click on the plan, on the SELECT statement, and I click Properties, you get this little pane
over on the right-hand side that you always close, because your boss bought you two really cheap monitors and you really don't have any space. I know how it is. In it I get this section called Memory Grant Info, which is how much memory I got, in kilobytes, to execute my query. I am NOT saying you should go off and troubleshoot this to try and get more memory for your query, because if you give every query more memory, you're going to have less memory available to cache data, and you can run into a problem where SQL Server won't even let more queries start, because it doesn't have enough free memory to run them. A good friend of mine was looking at a SQL Server running into these sort warnings, and he said: man, you know, there's probably an easy way to fix this. I bet if I go into Server Properties, and I go into Advanced, I bet there's something called minimum memory per query. Oh yeah, this sounds great: I could say that every query gets at least 10 gigs of memory. And of course you can guess what happened next. Every time a monitoring system connected to his machine, every time even the simplest query ran, he quickly ran out of RAM and could never cache any pages in memory. Let that be a warning to you: don't be a friend of mine. All my friends are like that. Jeez, I don't understand what's going on. So if I'm going to make this thing go faster, one of the things I want to do when I'm looking at this execution plan is ask what the sizes of these bars really mean. If I hover my mouse over one, it's kind of easy to see how many rows are involved and how much data is coming back, but SSMS really hasn't gotten any better about this since SQL Server 2005. There's another really cool free tool: if you right-click on the plan and click View with SQL Sentry Plan Explorer... you don't have this in yours, most likely. Let's see a show of hands: how many of you have this in yours? How many of you don't have it in
yours? Oh, don't feel bad; it's not because you're not bright, you just weren't hanging with the right people. You are now. First you have to go download it; it's totally free. They have a paid version as well, but you don't even need the paid version, just start with the free one. As long as you start with the free one, watch what happens when I click on View with SQL Sentry Plan Explorer: it launches SQL Sentry Plan Explorer on my machine, and with my luck, there it comes in the background. Now, this is kind of a neat way of viewing execution plans. I'm going to zoom in and out here to show things a little differently. Up at the top, I have the query that I'm dealing with; if you have something like a stored procedure with a hundred statements in it, you'll have a hundred lines up in that top part, so you can jump to each part of the stored procedure. In the middle, I have the exact text of the query that I'm dealing with. Down at the bottom, I have my execution plan. Notice it crams in a little better, so that you can see it on smaller monitors, and it shows right over the tooltips exactly how many rows we're dealing with. I don't usually tune in terms of rows; I want to know data size. So what I do is right-click anywhere here on this plan and show line widths by data size, and now I can see exactly how much data SQL Server is juggling around. Well, sure enough, it's getting half a gig's worth of stuff out of that index. Eek. It's kind of goofy the way index seeks work: seek refers to the point at which you start reading, not the point at which you stop. As human beings, we think that a seek means we're just going to go grab a couple of little short things. No: seek, to SQL Server, means I'm going to start at this one point and then read for a while, and who knows when I'm going to be done. So in this case, it's grabbing all kinds of data out of that index. It sounds good; just don't equate seek with awesome. Seek doesn't
equal awesome, and scan doesn't equal bad; things get to be a little tougher than that when we start troubleshooting. As I look at this, one of the other cool things you can do with SQL Sentry Plan Explorer is right-click on here and say I want to see costs by either I/O, or CPU, or CPU and I/O. Normally, when I'm tuning a query and it's not running at a hundred percent CPU, I'm going to switch over and look at the I/O work that's involved instead, and now I can see really clearly where the problem is on this execution plan: I'm doing 99% of the work over there in that Votes index, and I'm getting back 36 million votes every time, because I have 36 million votes as of this database export of Stack Overflow. If I'm going to ask SQL Server to do a report against 36 million of something, 20 seconds on a laptop is actually pretty good; that's not bad at all. So what I really need to start thinking about is pre-baking some of this data. How can I pre-bake this data so that it's already calculated? Go look at the query that's running; I'm going to jump out of SQL Sentry here. This query says that every time it runs, I want you to aggregate all of the votes, and we now know we've got like 36 million of them. I want to pre-bake this in a structure that's already ready to query, and I'm going to build something called an indexed view. Jump down a few here. I'm going to execute it and explain what it's doing as it runs. What I'm doing is saying: go create me a view on top of those Votes and Posts tables that has the same WHERE clause, only show me vote type IDs 2 and 3 and rows where all these filters exactly match my query, and even go count the upvotes and downvotes for me, grouped together by post ID. That view doesn't really do anything until I say WITH SCHEMABINDING. When I say WITH SCHEMABINDING, I am saying that no one... I can't even keep a straight
face, because this is really dangerous. What I'm saying is that no one can change any of the tables that are underneath my view. No one can change Votes or Posts: they can add indexes and drop indexes, but they can't add fields or drop fields in those tables, and they also can't drop the tables themselves. Once I do that, then I can also layer a unique clustered index on top of there. I am actually putting an index on the view. In Oracle we call this materialized views, where we actually write the view to disk, and it automatically updates whenever you insert, update, or delete a record in any of those tables. It's like having an index across multiple tables. Oh yes, you can hear the screams of pain of extremely experienced database administrators in the distance. This is dangerous, because now whenever I do inserts, updates, and deletes, I have to go recalculate parts of this view as well, so there can be performance drawbacks. Ask your doctor if indexed views are right for you. Now I'm going to blow away the plan cache. Normally this is not something that you do in production, but I'm just going to do it to make sure that I get a fresh plan for that particular query. I'm going to hit Execute, and then I'm going to go run the exact same query again: highlight my CTE and hit Execute. The extremely perceptive of you in here will notice it's not going quickly. In theory, SQL Server will automatically use an indexed view; it'll look at your T-SQL and go, oh, I have an indexed view that matches that perfectly, just like it does with table indexes. You don't have to tell SQL Server to use an index; you can force it with hints, but normally SQL Server is smart enough to figure those things out on its own. In theory, indexed views are that smart. In practice, not so much. Sometimes they're going to work, but sometimes, when I go off and look at the execution plan, I am still looking at the underlying tables, Votes and Posts. There are whole sessions out there
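(A sketch of an indexed view along these lines. The demo's real view also touches Posts and its extra filters; this sketch keeps just the Votes aggregation. SQL Server's documented restrictions apply: SCHEMABINDING, two-part table names, and a COUNT_BIG(*) column are all required before the unique clustered index can be created.)

```sql
CREATE VIEW dbo.VoteStats
WITH SCHEMABINDING            -- locks the schema of the tables underneath
AS
SELECT   v.PostId,
         SUM(CASE WHEN v.VoteTypeId = 2 THEN 1 ELSE 0 END) AS UpVotes,
         SUM(CASE WHEN v.VoteTypeId = 3 THEN 1 ELSE 0 END) AS DownVotes,
         COUNT_BIG(*) AS RecordCount  -- required when the view has GROUP BY
FROM     dbo.Votes v
WHERE    v.VoteTypeId IN (2, 3)
GROUP BY v.PostId;
GO
-- Materialize it: the unique clustered index is what writes the view to disk.
CREATE UNIQUE CLUSTERED INDEX CX_VoteStats ON dbo.VoteStats (PostId);
```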
on the web; if you go to brentozar.com and search for indexed views, we've got a video just on how to make those work. Normally, with some tweaking of the T-SQL, I can get my original query to use the view, but in the interest of time, screw it, who has time for that? Instead, we're just going to query the view directly. I'm going to zoom way in so you guys can see: here is my new simplified query. I am going directly against the view; I'm not using any kind of index hints or anything. I am just saying: go get me the data from that view where the downvotes are more than half the number of upvotes. So when I run that, now I'm across the finish line. Execute again, just so you can see that happening: it executes in less than a second, every single time. When I look at my execution plan and zoom in on it, it is doing a scan of my indexed view. This only works automatically on Enterprise Edition; if you want to do it in Standard Edition, you have to add a little table hint, NOEXPAND, to say: listen, don't use those Votes and Posts tables, I really want you to use this indexed view. To see how powerful it really is, though, let's go look at the number of logical reads we're doing. Hey, this time I don't even need Richie's Statistics Parser: I can see that I'm now down to 34,000 logical reads, which is good, but we can do better. If I look back at that execution plan and at what I'm doing there, I am scanning that whole entire view every time it runs. See, the clustered index you have to put on the indexed view makes it unique, but it doesn't really make your query go fast. So now we're going to go really crazy: we are going to add an index on top of the indexed view. Yo dawg, I heard you like indexes, so I put indexes on your indexed view. I'm going to create that index; the view is pretty small, so it doesn't take too long for this to run, and now I'm going to run my query
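(Roughly what querying the view directly might look like, with the two steps just described. NOEXPAND is the documented table hint for forcing indexed-view matching; the nonclustered index definition is an illustrative guess.)

```sql
-- Query the indexed view directly; on Standard Edition, the NOEXPAND hint
-- tells SQL Server to use the view's own indexes rather than expanding it
-- back out to Votes and Posts.
SELECT PostId, UpVotes, DownVotes
FROM   dbo.VoteStats WITH (NOEXPAND)
WHERE  DownVotes > UpVotes / 2;
GO
-- "Indexes on your indexed view": a narrower nonclustered index on the view,
-- so the scan touches far fewer pages than the clustered index does.
CREATE NONCLUSTERED INDEX IX_VoteStats_DownVotes
    ON dbo.VoteStats (DownVotes)
    INCLUDE (UpVotes);
```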
one last time. Of course it still seems to run very quickly, but look over in the Messages tab: now I'm down to just 318 logical reads. I am reading only 318 8KB pages. If I look over at the execution plan, I am still doing a scan, but I'm doing a scan of one particular index on the indexed view, which starts to get really crazy fast. Now, should I really have gone that far? Here's what I end up telling the business; I tell them exactly what I've got written down in here, my three options, all the stuff that's in this little section. If I wanted to just reduce the number of logical reads, I could switch over to a temp table and cut the amount of reads that I do by like 95%, but the query still takes a long time. If I want to get even further, what I can do is lay in an indexed view and really make this thing scream, but there are performance issues that come along with that. What I probably want to do, in a perfect world, is go build a separate reporting table that I update maybe once a day, but if the developers won't give me that option, the indexed view is something else I could do to get across the finish line. Which one I take is really up to the business. There are problems with using indexes on top of indexed views, but if you want to take things that far, you certainly can. So, any questions about the first query that we went through and tuned? Yes... this is kind of beyond the scope of what I can cover here, and you may want to try it yourself to see how it works, but it sounds like you want to move this HAVING up into the WHERE clause. Oh, so you're saying have a UNION across two of these, one for upvotes and one for downvotes, and depending on how I index it, I'm going to end up doing two scans or two seeks across it, but it may be faster. Absolutely, that's a valid way to try it too, and it may end up being even faster depending on how you index it, of course. Yes? Yes, Erland? Oh wow, hey, oh man, now I'm screwed, ladies and gentlemen. Okay, so I should
explain that the man who's about to ask questions pretty much wrote the book on tuning T-SQL, one of many. He has an excellent blog; I really like it a lot. So what Erland is saying is: if I go over and put OPTION (RECOMPILE)... now, do you want me to leave the indexes on there or drop the indexes? Doesn't matter? Yes it does; I put hard work into those indexes, don't give me that "doesn't matter." All right, let me grab the original query. So, the original query up here, and I guess I can trash this execution plan. What Erland is saying is: in my insert, let me zoom in so you guys can see this, in the top section I do an insert into a table variable. If I say, down here on the second query, OPTION (RECOMPILE), SQL Server builds a new plan for this query given what it knows about that table variable. Now, I'm also going to drop my indexes, and I have, just like every good Microsoft Certified Master, a stored procedure that drops all of the indexes in my database. Right? That's how I roll. So now I'm going to execute the one that Erland suggested. It's still going to take a long time, but SQL Server may choose a different plan, because now it's going to come up with accurate estimates about the number of rows coming out of that table. We can see here we're still in the twenty-plus-second mark, and it should end up roughly equivalent to what using a temp table would do, or worse. Now, with the indexes it might have been okay. We'll let this continue to run just to show what's going on. The real problem is, well, there were a lot of real problems in here. Now that it's finished: I probably also don't want to do the filtering in the second pass, I want to do it in the first pass as well, to reduce the number of inserts I do into the temp table. There is a diabolical drawback, a few diabolical drawbacks actually, to using OPTION
(RECOMPILE). When I use it, if I look at my execution plan, of course SQL Server is going to complain about those missing indexes, but if I hover my mouse over that table scan of VoteStats, now SQL Server says the estimated number of rows is 3 million and the actual number of rows is 1.3 million. These are the same estimates that we got when we used a temp table; using OPTION (RECOMPILE) gets you better estimates against a table variable. The reason it's not a great idea is that now you're also telling SQL Server: I need you to build a new plan every time this query runs. That is additional CPU overhead, and I'm going to lose my ability to see this query in the plan cache. I want to see repeated executions of this query in the plan cache to know how much overhead it had, and I want to do that by using tools like sp_BlitzCache. This is one of my favorite little tools; Jeremiah Peschka wrote this one. Show me the top ten queries in my plan cache, ordered by how many logical reads they did. What it does is go through and look at my plan cache and start giving me these warnings. If I scroll across to the right-hand side, it shows me, by reads, which queries read the most data, how long they ran on average, how many times they were executed, how much CPU time they burned, all kinds of interesting stuff. Plus, if you scroll all the way across to the right, it has the query plans as well. These are still only the estimated query plans; when I click on this, I'm going to get some data, but I only get estimates, not actuals. So this is a great tool to help you find which queries you need to tune, but you still need to run the actual plan in order to get those tuning metrics. More questions? Yes: have I considered a filtered index? Oh, I love that question. So the question is, when I go through and have that WHERE clause in the database... let's go back and look at our query, our original WHERE clause. When we go and do this filter, we're saying we're
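(Erland's suggestion and the follow-up tooling, sketched. OPTION (RECOMPILE) goes on the statement that reads the table variable, so the plan is compiled with the variable's actual row count. The sp_BlitzCache parameters are from the public First Responder Kit; treat the exact values as illustrative.)

```sql
DECLARE @VoteStats TABLE (PostId INT, UpVotes INT, DownVotes INT);

INSERT INTO @VoteStats (PostId, UpVotes, DownVotes)
SELECT   v.PostId,
         SUM(CASE WHEN v.VoteTypeId = 2 THEN 1 ELSE 0 END),
         SUM(CASE WHEN v.VoteTypeId = 3 THEN 1 ELSE 0 END)
FROM     dbo.Votes v
GROUP BY v.PostId;

SELECT vs.PostId, vs.UpVotes, vs.DownVotes
FROM   @VoteStats vs
WHERE  vs.DownVotes > vs.UpVotes / 2
OPTION (RECOMPILE);  -- accurate row estimates for the table variable, at the
                     -- cost of a compile every run and no plan-cache entry

-- And finding the heavy hitters afterward:
EXEC dbo.sp_BlitzCache @SortOrder = 'reads', @Top = 10;
```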
filtering on VoteTypeId IN (2, 3). What if those two vote types were a minority of the entire table? As it happens here, there are three main kinds of votes: flagging for moderation, and then upvotes and downvotes, and the up and down votes are basically the whole table, so a filtered index wouldn't save me at all. I love how you're thinking there, though. Yes... I see people getting up and leaving; that must mean my time is up. How am I doing on time? I don't even remember when I started this thing. When am I supposed to leave, 3:15? You guys probably know. You want me to keep going 15 minutes? Oh, all right, we'll keep going; we'll take like two more questions and then we'll let you guys out for breaks. Yes: would there be any difference if, instead of doing SELECT TOP 100... what would you want me to do? So you want me to just get the top 100 up here. Unfortunately, I would also have to filter for the upvotes and downvotes and community-owned dates and these other things as well; those things are not on the Votes table, they're on the Posts table. Now, when I do the whole thing in one single CTE, let me jump down to it: when I do it in a CTE down here, notice that I'm only joining to Posts down here. I could probably do the whole thing in one giant join; I just may end up doing a whole lot more reads. But it's totally a valid experiment to try out. Yes? All right, we'll take one more question, or none, you're all good. All right, thanks everybody for joining me this afternoon, and have fun at your next session here. We're out.
Info
Channel: SQLugSWE
Views: 63,040
Keywords: sqlrallynordic2015, sqlrally, pass, microsoft
Id: uwGCPtga06U
Length: 66min 57sec (4017 seconds)
Published: Tue Mar 17 2015