Exploring Microsoft SQL Server Query Execution Plans - Drew Furgiuele

Captions
Beautiful. All right, welcome to SQL Server Execution Plans. My name is Drew, and I'm a senior DBA at a little company called IGS in Dublin; you may have heard of us, and hopefully you're all customers. A little bit about me before we get going: I've been using the Microsoft data platform for about 10 years now, and I've really fallen in love with it. I got introduced to it when I was a developer way back when, and I was a really terrible developer, so I decided to get into the DBA game instead. At IGS I also do things around release management and DevOps; those are really good buzzwords, so I have to include them on a slide. My contact information is up there. Don't worry about writing down my email address; my last name is a burden, so if you really want to contact me by email, I have business cards, just see me afterwards. My Twitter handle is @pittfurg; please feel free to reach out and tweet rude things at me, that's totally cool. I also have a website where I blog about SQL Server stuff, probably not enough: some performance tuning things, and a lot of automation around SQL Server, if you're interested in how you can automate your deployments and release management. If you manage to get bored in this presentation, which is a real possibility because we're going to be talking about databases and nobody likes databases but me, you can head over to genesface.com, where we take pictures of my boss without his permission and post them. I think he would be not okay with it if he wasn't so good looking.

All right, I need to look at this monitor, not that one, because that's really hard. What are we going to talk about today? We're going to talk about SQL Server execution plans and why you should care about them, even if you're not a database administrator, or even a SQL developer for that matter. Why are these things important to us? Well, they're important because every time you run a query, you get a query plan, and if you get a bad query plan, you're going to get bad performance.
So what I want to spend some time talking about today is some common operators you will see in execution plans, and then towards the end we'll do some demos of what it looks like when you write queries that get bad plans, and what you can do to fix them. I need to stay back here.

All right, so query plans are basically a set of instructions that get used by SQL Server every time you run a query. Now, SQL Server does what's called cost-based optimization for your SQL queries. What does that mean? It means that every time you run a query, SQL Server looks at what you're trying to do and assigns it a number, and that number is a cost. What actually happens is that SQL Server tries to create a bunch of plans for your query, and it chooses the one that is the least terrible in terms of cost. That cost is some number greater than zero; it can be less than one, but it's never zero or less. Every time you run a query for the first time, SQL Server takes that plan and caches it. That plan cache is where all these plans get stored until they're eventually evicted, maybe because other plans need to go in the cache, or because you tell it, "get out of my cache, because you're terrible."

When these plans get created, the plan is basically going to detail every step that the storage engine, not the relational engine, is going to take to retrieve your data. So here's what happens. I've got a query in Management Studio, or in Entity Framework, or in my code where I'm executing a stored procedure, and I kick that query off against the server. When that happens, behind the scenes SQL Server takes that query, and the first thing it does is make sure it's syntactically correct. Can I parse it? Are all the commas in the right place? Did you type SELCT instead of SELECT, that type of stuff? Did I misuse any keywords? That's the parsing step.
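As a quick aside not shown in the talk, the plan cache described above can be inspected directly with the standard plan-related DMVs. This is a minimal sketch; the column choices are mine:

```sql
-- Peek at the plan cache: each row is a cached plan, joined to the
-- SQL text it was compiled from and the plan XML itself.
-- (Requires VIEW SERVER STATE permission.)
SELECT TOP (10)
       cp.usecounts,      -- times this cached plan has been reused
       cp.objtype,        -- Adhoc, Prepared, Proc, ...
       st.text       AS query_text,
       qp.query_plan AS plan_xml
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)   AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
ORDER BY cp.usecounts DESC;
```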
Once the query has been successfully parsed, it moves over to what's called the algebrizer, and the algebrizer is what actually looks at the individual objects in your query. OK, he's trying to use a customer table, or he's trying to use this view, or he's trying to execute a stored procedure with this name; SQL Server goes into its engine and checks, yep, those are all objects that I can use. Then, when that's all done, SQL Server generates your plan, or actually a bunch of plans, and chooses the least awful one. It looks at every single step it needs to take to go get your data: do I need to scan a table, do I need to do some kind of computation, do I need to do some type of sort, am I joining tables together? All of those become different operations in your query plan. Then, when that's finally done, it takes that execution plan and hands it over to the storage engine. The storage engine is what eventually thrashes your spinning-rust disks, reads and writes to memory, schedules time on your CPUs, all that type of thing.

All query tuning that you're going to do in SQL Server starts here: if SQL Server appears slow, the first thing we do is look at the query plan. We don't panic and say we need more resources, or we need a flash array, or we should go to NoSQL, because those are all terrible ideas. Start looking at your query plans. Don't jump to any really crazy conclusions until you start looking at query plans, because chances are you can find ways to tune the query inside the plan. Now, like I said, SQL Server is going to make its best-effort guess about how a query should be run, and sometimes it gets it wrong. You can override how SQL Server runs your queries with query hints. I don't recommend that. There are maybe five people in the world, and all of them are either current or former Microsoft engineers, who can actually make SQL Server do things exactly the way they should be done with query hints. I'm not one of them; maybe someone in here is, I don't know.
Not to insult anybody, but by and large you shouldn't try to outsmart the optimizer.

So when we read execution plans, and I'll show you a sample plan here in a second, we always look at each operator, and we read right to left. Why do we do that? Because we start as far to the right as we can in our execution plan, and we just follow the arrows through the plan to read it. Every operation has a cost, and these costs are cumulative, so even though you may have a really simple query, if it has a terrible sort in it, that sort is going to bubble its cost up into the total operation, and that's why that query is terrible. Each operation also has an arrow that connects it to its neighboring operation, and that arrow will show you how data is flowing between your operations: what columns are moving where, what things are being computed, what's the size of the data moving between operations. It can also show you if your statistics are out of date. Now, one thing about SQL Server is that it maintains column statistics about your indexes, your tables, everything, behind the scenes; it knows how often a particular value shows up in a particular index. It needs to know that so that when it comes up with its execution plan, it knows the best way to drive that plan to get your data. You can get all this information from tooltip pop-ups in your query plans, but if you really want to go deep, there's also a Properties window inside SSMS to view this data.

All right, so let's look at a real simple sample plan. I have a SELECT statement where I'm going to select some columns from a HumanResources table, because that's where the juicy data is, and I want to join that to my Person table on a BusinessEntityID; these are the primary key columns of both tables that I'm joining on. If I were to run this in code, or if I were to run this in Management Studio, what's going to happen the minute I hit F5?
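Something like the following is what's being described; the talk only says "some columns," so the column list here is a guess:

```sql
-- Join the employee table to the person table on their shared
-- primary key, BusinessEntityID.
SELECT e.BusinessEntityID,
       e.JobTitle,      -- illustrative column choices
       p.FirstName,
       p.LastName
FROM HumanResources.Employee AS e
JOIN Person.Person AS p
  ON p.BusinessEntityID = e.BusinessEntityID;
```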
Depending on which execution plan option I have enabled in Management Studio, I'm going to get a result that looks something like this. If we read right to left, we start at the top, and we see I'm doing a clustered index scan, which we'll talk more about in a minute, on my Employee table, which then has to be joined via a nested loop join. I follow my nested loop to my next node to see that I'm doing a clustered index seek on my Person table, and then those get joined together and eventually returned back to the client, or the person, or whoever requested this data.

Now, there are two types of execution plans in SQL Server: the first is an estimated plan, and the second is an actual plan, represented by the icons I have up here. The one on the right is for an estimated plan; if I want an actual plan, I click the left-hand button. What's the difference? An estimated plan is just that: it takes your query, parses it, algebrizes it, and compiles it, but doesn't run it. SQL Server looks at the query and decides, here's how I think I'm going to run this. A lot of times it's exactly the same as the actual plan, but what SQL Server doesn't know is how much data is actually going to flow through each operation; it makes a guess, and that guess is what drives the plan overall. The opposite side of that is your actual plan. If you're doing real query tuning, you should be using the actual plan, because you're actually going to execute the query, and you're actually going to get statistics back about your data and what actually happened. The downside, of course, is that you actually have to run your query, so if you've got some terrible query that's sucking a lot of wind and takes five minutes, or ten minutes, or half an hour, or some batch process that runs all night, it's going to take a while for that execution plan to come back.
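Besides the toolbar buttons, both plan types can also be requested in T-SQL; a quick sketch:

```sql
-- Estimated plan: compile the statement but do not run it.
SET SHOWPLAN_XML ON;
GO
SELECT FirstName, LastName FROM Person.Person;
GO
SET SHOWPLAN_XML OFF;
GO

-- Actual plan: run the statement and return runtime row counts
-- and statistics along with the results.
SET STATISTICS XML ON;
GO
SELECT FirstName, LastName FROM Person.Person;
GO
SET STATISTICS XML OFF;
GO
```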
You kind of need to know the difference: the estimated plan is fine for getting a feel for any general things you can do in your tuning, but as you start making changes, if you really want to see improvement when you tune queries, you're going to want to use your actual plan.

All right, so we've talked about execution plans: what they are, how they get generated, and how we read them. Now I want to spend a little bit of time talking about some of the more common operations you're going to see in query plans and what they mean. In order to do that, we're going to try to Alt-Tab out of this guy, which is always an adventure, because, you know, PowerPoint.

OK, so for my demos today, guys, I'm using SQL Server 2016. I'm really excited about SQL Server 2016. Real quick show of hands, because I'm always genuinely curious about this: who's using SQL Server 2014 in production? All right, that's great. What about SQL Server 2012? More hands, all right. This is where it gets a little dicey: what about SQL Server 2008 R2? OK, I know, I get it, I still have some too. What about SQL Server 2005? We've got a few; guys, I'm sorry, hang in there. I'm not even going to ask about SQL Server 2000, because I'll crawl under this table. SQL Server 2016 is going to be awesome, and if I can squeeze it in here, although it's hard to work into the demos, I'm going to show you some really cool stuff that's coming in SQL Server 2016.

I'm also using the AdventureWorks database. It's Microsoft's publicly available sample database, really great for demos, because they do everything in it; since it's a product demo, it's got foreign keys, XML indexes, triggers, and all the things that make databases great. We'll be using that data for the most part. I also have some Stack Overflow data, so we can see how things scale when we go from like 20,000 rows to like 28 million rows.

OK, so let's start with some simple SELECTs here. Can you all see that in the back? All right, you're about to not see it really well. I'm going to run some SELECTs here.
Now, up here at the top, you'll notice, let me zoom in a little bit, whoo, Windows 10, I have my actual execution plan option selected. If I wanted the estimated plan, I would click over here instead. Remember: if I click the estimated plan button, I get a plan immediately; if I ask for the actual plan, I get it after my query runs. So I did that, and I actually got query results; I got a lot of query results, because I ran like five or six different queries here and got a different result set for each. So let's look at the execution plan tab, which is over here, and I'm going to make this a little bit bigger and zoom in. Can everyone read that? All right.

So, my first query: good old SELECT *. You're going to hear about how much I hate SELECT * for a little bit. SELECT * FROM Production.Product: give me everything out of the Production.Product table. SQL 101, right? Let's just use SELECT *, because it gives me everything. So what's happening in this particular query? Well, this is the most basic query you could ever see in SQL Server. What it's telling me is: go to this table, which is the Product table, scan the clustered index, and return all the data. When I mouse over this operator, I get some information about this particular operation. I can see how many rows SQL Server read, and I can see how many rows SQL Server thought it was going to read; that's what the estimated number of rows is. Keep that in the back of your head, because it's going to come back to haunt us later. We can also see how expensive the query is: the estimated operator cost. I've heard that number described as "query bucks"; I've heard it described as a Microsoft reference machine number. It has no real-world value. In fact, if I were to give you this query and this database and you ran it on your laptop, you might even get a different cost than me, because maybe I have more memory than you, or more CPUs than you, or maybe the data is already in memory.
It's hard to say, but that number is a reference for how expensive that query is, and when you run multiple queries in the same batch, I can see here that this query was roughly zero percent of my total batch cost. I didn't have any GOs or semicolons in the way, and if I start scrolling down, I can see where the more expensive queries were in my plan. This is really important when you're writing things like stored procedures that have multi-step operations: you can see where your more expensive steps are.

So why is this a clustered index scan, and what does that mean? Well, a clustered index scan is one of the most basic operations in SQL Server. Remember that if I have a table and I create a primary key on it, that is how my data gets arranged and stored in SQL Server. So a clustered index scan means I'm scanning the entire table and returning every single row. If I mouse over my little arrow here, I also get some intelligence about how much data is flowing back to the client.

Oh man, that's terrible: who's parked in the Bed Bath & Beyond parking lot? Yeah, you're going to want to go move your car; they're going to tow it. That sucks, sorry. By the way, you guys should all tweet this at the Cinemark company, because this is their doing, not ours, so really give them some hate tweets. Sorry, guys. I'm assuming I'm OK; I think that's where I am, so if my car is not there, I'm going to be pretty mad, because I really can't go move mine right now. All right, sorry, guys; again, tweet your hate to the Rave company.

All right, so that's what that clustered index scan means. Now, the operation below it is a table scan. Why is that different? Well, all I did here is create another table in a different schema, called ProductArchive, but I didn't put a primary key on it. If you see a table scan, that means SQL Server is reading data from a table that doesn't have a primary key. Is that a good thing or a bad thing, Drew? Well, it depends, my favorite answer to give as a DBA.
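A sketch of how a heap like that ProductArchive table comes about; the schema name here is a guess, and SELECT ... INTO never copies keys or indexes, which is exactly why reads against the copy show up as Table Scans:

```sql
-- SELECT ... INTO creates a brand-new table with no primary key and
-- no clustered index: a heap.
SELECT *
INTO Archive.ProductArchive     -- assumes the Archive schema exists
FROM Production.Product;

SELECT * FROM Archive.ProductArchive;  -- plan shows a Table Scan
SELECT * FROM Production.Product;      -- plan shows a Clustered Index Scan
```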
Table scans are not by themselves terrible. If you're reading every row from the table, it's just sequential reads off disk; whatever order it's in on the disk, we don't care, we're not ordering it, we're not filtering it, we're not applying any predicates to it; they're great. But if you're going to be joining to that table, you probably don't want table scans, because that gets to be really terrible when SQL Server tries to go get that data later.

Now, my next query here is SELECT * FROM Sales.Customer, and its plan doesn't look so simple. But why? It's not syntactically different from my other query, and I didn't write anything that looks like it computes scalars. How can I find out what's going on? Let's start right to left again. I have a clustered index scan on my Customer table, which then pipes about 20,000 rows over to a Compute Scalar. What's a Compute Scalar? It's exactly what it sounds like: it needs to do something to every row in my result set before returning it to the client.

Oh, apparently if you're parked at Roosters, you need to move too. I love going first, by the way; we can work all this stuff out as we go, it's all right. Nobody wants to get the car towed; that would be terrible. Don't tweet bad things at Roosters, though, because they're really good.

All right, so I'm doing a Compute Scalar, but what am I computing? In order to really dive into this, we need to look at the Properties window, because I certainly didn't write anything that computes data in my result set. Over here in the Properties window, I can get more information about what the actual operations are; this is pretty much everything that's in the tooltip, but with more detail. If I want to dig into this particular Compute Scalar, I can start expanding these nodes, and I can see what the scalar operator actually is.
What it's doing is evaluating some sort of function in my query. Where the heck did that come from? Well, I have to go look at the actual table this query runs against. If I go into my Object Explorer, expand Databases, AdventureWorks, and look at the table Sales.Customer, I have a computed column in my table, and it's not persisted. What does that mean? It means that every time I select from this table, I need to spend CPU cycles to calculate an account number. We can see that represented as a cost in our query down here: 97% of the total work we're doing for this query is actually getting the data off the disk, but roughly 4% of my total query is being done in Compute Scalars. And by the way, the reason there are two Compute Scalars there: if I script out this table real fast and look at the definition, I can see that my AccountNumber is indeed a computed column, but it actually involves two functions. The first is ISNULL, which returns the computed value if it isn't NULL, or an empty value if it is; the other concatenates the letters "AW" and some leading zeros onto the customer number. So that's the Compute Scalar happening there; it's two operations because of how the computed column is defined.

OK, back to this. Let's say I want to start filtering; what if I start using predicates, a WHERE clause in my query? What does that look like? Let's scroll down a little bit here: SELECT * FROM Person.Person WHERE the person type is employee. What does that plan look like? You would think it's going to be different, but it's not; I still get a clustered index scan. Why? "But Drew, I thought indexes are great, and that's why we do WHERE clauses." Yeah, but I don't have an index on person type on this table, so SQL Server can't do anything with the predicate. It still says, hey, guy, I'm still going to go to my clustered index, return every row, and then filter out the data, and that's really expensive.
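The computed-column setup described a moment ago can be sketched like this. The real AdventureWorks definition uses a helper function for the leading zeros; RIGHT with a pad string is a stand-in, and the table name is made up:

```sql
-- A non-persisted computed column is recalculated on every read,
-- which is what surfaces as Compute Scalar operators in the plan.
CREATE TABLE dbo.CustomerDemo        -- illustrative table name
(
    CustomerID    int IDENTITY PRIMARY KEY,
    AccountNumber AS ISNULL(
        'AW' + RIGHT('00000000' + CONVERT(varchar(10), CustomerID), 8),
        '')
    -- appending PERSISTED here would store the value on disk instead,
    -- trading write-time cost for cheaper reads
);
```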
This is the most expensive query in our batch, even though I'm using a WHERE clause. That doesn't seem to make any sense, but it kind of does: because I don't have an index, I still need to thrash my disk to pull everything out. And by the way, look at the estimated row size: each row is roughly 8K, and I'm returning, I don't know, roughly 2.6 MB worth of data back to the client, but I need to read everything off disk first, filter out the rows I don't want using memory and CPU, and then pipe the rest back to the user. That's why this one is so expensive.

OK, but how can we make it less terrible? Step one: stop using SELECT *. If you don't need every column, don't select it; less data back to the client, less time spent reading from disk. So in my next example, I change up the query a little bit: I'm going to select only the first name and last name from that table, and I'm going to say WHERE my last name is equal to some value; you can see it's parameterized here. This time I get an index seek operation. A seek is different from a scan in that I can go directly to only the rows I need, pull them out, and return them to the client. The reason I can do that is because I have an index on that table, and we can see it's actually using the index: if I look at the object tooltip here, I can see it's using an index called Person_LastName_FirstName_MiddleName, just a naming convention Microsoft came up with for this particular index, to say it's an index on the Person table ordered by last name, then first name, then middle name. That index order is important, and we'll see why in a second.

But I'm not done yet: how do we quantify how much better a seek is than a scan? Well, when you're query tuning, one thing you can do is turn on statistics for your IO. IO is generally where most queries start to fall down: the more data I need to read, how it's ordered on disk, how I'm joining to it, how I'm filtering it, that's where most of your cost is going to be.
CPU is a close second, but most of your cost is going to come from IO. So if we want to see improvement in how we're writing these queries, we can run this SET STATISTICS IO ON statement. What does that do? Well, let's run my same queries again: I'm going to SELECT * FROM Person.Person WHERE the person type is employee, and then I'm also going to SELECT * FROM Person.Person WHERE my last name is Smith. So, pop quiz, hotshots: what's going to happen in my first query, am I going to get an index scan or an index seek? Great job: I'm probably going to get a scan in my first query, right, because I'm doing SELECT * and I already know I don't have an index on person type. But my second query is SELECT * where my last name is equal to Smith; do you think I'm going to see a seek or not there? Why? Boom: SELECT * is terrible. Remember that. If you remember nothing else today, there are two things I want you to remember: SELECT * is terrible, and cursors are bad.

All right, so, I did get a seek, technically, but I didn't get just a seek. Here are my two query plans, which we'll look at in a minute, but first we can quantify how much better one query is than the other: my first execution here did one scan, meaning it scanned one object, and it had to read 3,838 pages out of memory to get my result. Why out of memory, not from disk? Well, SQL Server does cache things, in what's called the buffer pool; as data comes off disk, it goes into memory and stays there until there's memory pressure, old pages come out, new pages go in. But I had to read roughly four thousand pages. My second query, even though I didn't get just an index seek, improved by almost a factor of ten; I had to do far less IO. Why? Let's look at the plan. We know what the first plan is: SELECT * FROM Person.Person on a column with no index, and that gets the scan we expect.
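The batch being run looks roughly like this, assuming 'EM' is the employee person-type code; the read counts land in the Messages tab:

```sql
SET STATISTICS IO ON;

-- No index on PersonType: clustered index scan, thousands of reads.
SELECT * FROM Person.Person WHERE PersonType = 'EM';

-- Index on LastName, but SELECT * still forces extra work per row.
SELECT * FROM Person.Person WHERE LastName = 'Smith';

SET STATISTICS IO OFF;

-- Messages tab output looks like:
--   Table 'Person'. Scan count 1, logical reads 3838, ...
```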
My second query isn't actually terrible, but it could be better. I did get an index seek: I selected from the Person.Person table, and it used my index to find rows by last name, but then it had to do something else, called a Key Lookup. In SQL Server, when you have an index and you see a seek or a scan on it, it's actually reading a secondary copy of your data. Every index you create on a table is built from the same data as your table, but contains only certain columns, and it orders the rows by those columns; this particular index is ordered first by last name, then first name, then middle name. So basically, think of it this way: I've got my main table, with a clustered index, that has all my columns and all my data, ordered by the primary key; and I also have an index, side by side, row for row, that has the last names, then the first names, then the middle names. That's how it's stored on disk. So SQL Server is able to go to that index and quickly seek, because it knows statistics about my columns and my index, and pull back the rows I need; but then it's effectively doing a join, even though we don't have a join between two tables in our query. Why? Remember that we're doing SELECT *, and not every column is in this index. So what does SQL Server have to do? It takes all the results from my seek, which also include the primary key (every nonclustered index includes your primary key, whether you tell it to or not), and then it has to go back to the table and do a lookup for every single row. And we can see that this is way more expensive: this particular query took up roughly 89% of the overall cost of this batch, where the other one only took 10%. Again, SELECT * is terrible. How can we make it not terrible? Well, if I run the other query now, just first name and last name where the last name is Smith, I just get my seek.
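If changing the query isn't an option, the other common way to get rid of a Key Lookup is a covering index. A sketch; the index name and INCLUDE list are mine, and remember that every added index carries a write-time cost:

```sql
-- Key columns drive the seek; INCLUDEd columns ride along at the leaf
-- level, so the plan no longer needs a lookup back to the base table
-- for them.
CREATE NONCLUSTERED INDEX IX_Person_LastName_Covering
ON Person.Person (LastName, FirstName, MiddleName)
INCLUDE (PersonType, EmailPromotion);
```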
I only had to do four reads. Way better. We basically improved by a factor of 100; every time I say something like that, it may or may not be true, because I'm really bad at math, but you can see that the number keeps going down, and that's a good thing. We're reducing our IO cost, which makes for faster queries.

Now, one more little quiz. I have the same query, but this time I want to filter by first name, so I'm going to look for where the first name equals Andrew. Do you think I'm going to see a seek? Nope. SQL Server still used that index, but it did a scan, and here's why: the index is ordered by last name. It went to a scan because it can still read just the index instead of the whole honking table, but because the data is ordered first by last name and then by first name, it has to scan the entire index and only return the rows where the first name is Andrew.

That green text right there is saying, "hey, buddy, I can make this a lot better if you create an index with first name in it." That's good old Database Tuning Advisor, and nobody invited him. Don't always go by these recommendations. I wouldn't necessarily add this index unless your application was always searching by first name for some reason; you shouldn't just blindly add these indexes. And how much of a gain would we actually see, assuming we get similar performance to the other query, only four logical reads? It's probably not going to be that low; it could be more, it could be less, depending on how many times Andrew shows up in my index. This query does 117 reads: still better than our other query with the SELECT *, because I don't need to read as much data.
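The quiz pair above comes down to index key order; sketched out:

```sql
-- The index is keyed (LastName, FirstName, MiddleName).

-- Predicate on the leading key column: Index Seek.
SELECT FirstName, LastName
FROM Person.Person
WHERE LastName = 'Smith';

-- Predicate on a non-leading column: the (smaller) index can still be
-- read instead of the whole table, but it must be scanned end to end.
SELECT FirstName, LastName
FROM Person.Person
WHERE FirstName = 'Andrew';
```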
All right, next thing I want to talk about, and I'm going to turn statistics off now, because we don't care about them anymore: let's talk about sorts. Good old SELECT * again; stop hitting yourself. SELECT * FROM Person.Person, and this time I'm going to order by the PersonType column. What does that plan look like? Again, with my actual plan turned on, I go to my execution plan tab. Now we can see the costs are starting to be split between CPU and disk. I did a clustered index scan; I don't have any indexes up my sleeve here, I'm just pulling back every row from Person.Person and ordering by PersonType. I look at this operator and get some information about it: estimated operator cost, this thing costs 2.8 SQL Server reference units, 68% of the query. My Sort, though, is no slouch; it has an operator cost as well, 1.3, which adds up to whatever this total is, 4.17, and that's what the query costs.

I will tell you that, as a DBA, if a developer comes to me and says a query is slow, the first thing I look for are sorts. Sorting in SQL Server is a very expensive operation for a couple of reasons: one, it takes a lot of CPU, and a lot of times, if you have a really large data set you're trying to sort, SQL Server is also going to start thrashing disk in its temporary database, building working tables to do its sort, before it eventually returns the data. We can make this query a lot better if we can eliminate the sort. How do we do that? The first thing I tell people is: can you sort in your code? That's generally going to be faster. I'll freely admit that sometimes we can't do that: if we have an application that's slow, we can't always sit there, come up with a fix on the spot, and release some code. So what other options do we have? You can deal with sorts in a couple of different creative ways in SQL, but my favorite way is with indexes. Sorts benefit from indexes: remember that when I have an index, my data is sorted by whatever the indexed columns are. So in this particular query, I'm going to select first name and last name from Person.Person, but this time I'm going to order by first name, which is not the leading index column. When I run it, what does my plan look like? I still have a Sort.
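The before-and-after pair being demonstrated here, sketched out:

```sql
-- ORDER BY a non-leading index column: index scan plus an explicit
-- Sort operator.
SELECT FirstName, LastName
FROM Person.Person
ORDER BY FirstName;

-- ORDER BY the leading index column: rows come back from the index
-- already in order, and the Sort operator disappears.
SELECT FirstName, LastName
FROM Person.Person
ORDER BY LastName;
```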
That's because first name is not the leading column of the index: the first column is last name, then first name, then middle name. So I still had to do an index scan, but only 8% of my query cost is in that index scan, because I'm reading way less data; now most of my cost is in my Sort. But we can eliminate that sort altogether with indexes. Let's say we actually use the index as it's designed: select first name and last name from Person.Person, and this time I'm going to order by last name. When I run it, my Sort is gone, but you'll notice I'm still getting an index scan. Even though I'm ordering by last name, shouldn't I be getting a seek? No: I didn't use a WHERE clause. Seeks are reserved for filtering data with predicates, and I don't have one here. But the good news is my Sort is gone. I still return every row from Person.Person, but I didn't have to spend time doing a sort, because I utilized the index that was there: I selected only the columns I needed from it, and I ordered by the column that was already indexed. We completely eliminated that sort operation; this query gets infinitely faster right away, and I don't have to release code. I just need to buy my DBA a beer.

All right, so now we're going to talk about the next piece, and this is where everybody starts to ask me a lot of questions; if you have questions, in the interest of time, I'll talk to you afterwards. We're going to talk about table joins, the joy of relational databases. What good is a relational database if we can't do joins? There are three main join types you're probably going to see in SQL Server. So let's look at my query: select last name, first name, and email address from Person.Person joined to the Person.EmailAddress table on the primary key of both tables. The first type of join we're going to talk about is the merge join.
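The join query just described, roughly:

```sql
-- Last name, first name, and email address: Person joined to
-- EmailAddress on the shared primary key column.
SELECT p.LastName, p.FirstName, e.EmailAddress
FROM Person.Person AS p
JOIN Person.EmailAddress AS e
  ON e.BusinessEntityID = p.BusinessEntityID;
```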
The first join type we're going to talk about is the merge join, and you can think of a merge join almost like a riffle shuffle when you're putting a deck of cards back together: I take the two halves and shuffle them together, and they just merge. It can be a very good join if, and only if, the two sides of the join are ordered the same way. Merge joins rely on the fact that I have an outer and an inner set of data and they're both sorted the same way, so the merge can just instantly overlay one on top of the other. Again, very efficient, but SQL Server needs the data in that order, and here it's not, so here we go again with another sort operator. I did an index seek on LastName. Why? Did I actually have a predicate here? Let me scroll up a little bit... yes, I actually did: WHERE LastName = 'Smith'. So I'm using my index; I did a seek, a really inexpensive, efficient operation. But then I need to sort it, because remember, that index isn't sorted by BusinessEntityID, it's sorted by LastName. But BusinessEntityID, which is the primary key of my table, is in that index, so it just has to reorder that output. It then does a clustered index scan on the EmailAddress table, finds the rows that match, and overlays them. And I can see that here if I scroll down, so I can actually see the tooltip... come on, Drew, you can do it... come on. Well, let's go to the properties window, that's what it's there for. Merge join: yep, there are the joined columns down here, and the merge join residual is BusinessEntityID = BusinessEntityID. And this is where we start talking about why SQL Server did this. If we know sorts are bad, why did SQL Server spend its time doing one? Wouldn't we rewrite this so we didn't get a sort? Yeah, we could write a different index to try to eliminate that sort operation, but then we probably wouldn't get a merge join, unless we wrote an index that put BusinessEntityID first, and that might make our other operators terrible.
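The merge-join demo query is roughly this shape (AdventureWorks names assumed):

```sql
SELECT p.LastName, p.FirstName, e.EmailAddress
FROM   Person.Person AS p
JOIN   Person.EmailAddress AS e
       ON e.BusinessEntityID = p.BusinessEntityID   -- primary key on both sides
WHERE  p.LastName = N'Smith';
-- A seek on IX_Person_LastName_FirstName_MiddleName satisfies the predicate,
-- but its output is ordered by LastName, so the plan adds a Sort on
-- BusinessEntityID before the merge join can overlay the two ordered streams.
```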
Remember when I said that SQL Server actually evaluates a lot of different plans for every query? This is the least terrible one. It said: hey, I'm going to do it this way, because I can use a merge join, and even though I have to eat a little bit of the cost of the sort, this is still a better join for the query you're trying to write to get the data out. So merge joins just riffle together; I can't wait to hear that on the playback. All right, next: hash match joins. Hash match joins are usually where people start to see problems in their queries. Where does a hash match come from? Raise your hand if you know what a hash table is... okay, everybody raised their hand, that's good. When you have very large data sets joining to very large data sets, or one extremely large data set joining to one other not-so-large data set, you get a hash match join. A hash match join takes the smaller data set and creates a hash table of values; then, for every row in the larger table, it creates a hash for the row, finds the bucket that row matches in the hash table, and joins them together that way. It can be inefficient. We can see here that SQL Server is trying to be helpful again and recommending an index. If you see a hash match, sometimes that's just what you're stuck with; if you have a lot of data, that's the way life is. Consider: do I need that much data? Can I write my query differently? Can I change my logic? Do I need everything that I'm getting? But if you've done all that and you're still seeing hash match joins, that's usually a cue to start thinking about whether you can index your data differently to reduce the reliance on hash match joins, because all of this is going on inside SQL Server's temporary database: tempdb, the toilet bowl of SQL Server (trademark Kendra Little; we should all follow her on Twitter).
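For a feel for when a hash match shows up, a query like this one tends to produce it, because the probe side isn't ordered by the join key and both inputs are sizable (AdventureWorks assumed; the actual join choice always depends on your indexes and statistics):

```sql
SELECT p.Name, sod.OrderQty
FROM   Production.Product AS p
JOIN   Sales.SalesOrderDetail AS sod
       ON sod.ProductID = p.ProductID;  -- detail rows are not ordered by ProductID
-- Typical shape: build a hash table on the smaller input (Product), then
-- probe it once per SalesOrderDetail row.
```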
She calls it the toilet bowl of SQL Server, and this is a perfect example of why, because all of this hashing actually gets written to disk, then read from disk, and then joined back together. We can see down here that SQL Server is also being helpful and doing another merge join, based on how I wrote my query. The last type of join we're going to talk about is the nested loop join, which you might think is nothing more than a terrible cursor in join form, but it's not. I will freely admit that I hate cursors and have threatened to cut developers who write them; true story. But this join is actually really efficient if, and only if, you have a very small data set compared to a very large data set. All it does is take that smaller data set and loop over the larger data set, comparing it to the smaller one, looking for matches. Nested loop join: I think everybody here knows what a loop is, right? It's a developer conference. Nested loops are great, but they have one major drawback: your statistics need to be on point, because if SQL Server makes a bad decision and tries to use a nested loop when more data comes back than it was expecting, that's when you run into bad times and you'll want to queue up sadtrombone.com. What are our query estimates? Let's take a moment here and look. Because I ran my actual execution plan, I can see that SQL Server said, hey, I'm expecting 290 rows back, and it actually returned 290 rows. But there's a problem here: actual number of rows, 290; estimated number of rows, 1. Spot the difference. Again, SQL Server has to make assumptions about your data before it executes the query, so in this particular case it said: based on how you're doing this join, I estimate you're only going to get one row, so happy day, I only need to do the nested loop over one row. SQL Server said this was the best way to do this query, but it was wrong: it actually returned 290 rows.
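To see the estimate-versus-actual comparison yourself, run something like this with the actual execution plan on and check both row counts in the operator tooltip (illustrative query, not the exact demo script; the CustomerID value is an assumption):

```sql
SELECT soh.SalesOrderID, sod.ProductID, sod.OrderQty
FROM   Sales.SalesOrderHeader AS soh
JOIN   Sales.SalesOrderDetail AS sod
       ON sod.SalesOrderID = soh.SalesOrderID
WHERE  soh.CustomerID = 11000;           -- a selective outer input
-- A small outer set driving seeks on the inner table is the classic
-- nested-loops shape. If the estimate is badly low, loops can get chosen
-- for inputs that are really far too big for them.
```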
So you have a 290-row nested loop comparing against a 290-row table. That's still not terrible, because my result sets are small, and I wouldn't fret too much about it; but if you had extremely large tables and you were getting nested loop joins every time, you'd really be thrashing disk, memory, and CPU when you shouldn't be. So nested loop joins are good if, and only if, your data sets are aligned the right way, and we're going to see why that matters here in a minute. Okay, so those are joins. We're all tabbing again, Drew, good look. All right, we've looked at some of the more common query plan operations: seeks, scans, joins, sorts, and row lookups. Those are some really common ones you're going to see in your execution plans. Now I want to spend the remaining time, which is roughly 25 minutes-ish, looking at some bad queries and how we can identify them. What we were just talking about, with that nested loops lookup, is subject to something called parameter sniffing. Who here has heard that term before? It's terrible, right? At the beginning of the presentation I talked about how query plans get cached. So if I have a particular query that I run against a table, SQL Server first looks at the plan cache and asks: do I need to recompile this query? Because we don't want to spend time recompiling queries over and over again; that time, depending on how busy your system is, goes to CPU scheduling and takes resources away from your other queries. If the plan is in memory, it uses it; if not, it compiles it and puts it there, and the next time the query runs it uses that plan, and the next time, and the next, until the plan gets invalidated out of the cache. But that can lead to some interesting scenarios. Using AdventureWorks again, I have three queries here; they're identical except for the predicates. I'm selecting from an address table and passing a different StateProvinceID to each one.
When I run it and I look at my execution plan, I get two very similar plans for my first two queries. SQL Server looked at the data and said: I'm going to do an index seek; you're selecting some columns that aren't in the index, so I need to do a key lookup. We could fix this with an INCLUDE, by the way. One of the easiest, lowest-hanging fruits in any query you tune is a key lookup: if you see one, it probably means you have an index that doesn't have enough columns in it. You don't necessarily need to add them to the key columns of the index; you just need to add them to the included columns. Again, a very easy one to fix; you can reduce your I/O cost just by tuning your index a little bit. So for this particular query, SQL Server said: I'm going to do the seek, and I'm going to do a nested loop join, because you're only returning 16 rows here and 16 rows here; the nested loop join saves the day. But when I scroll down a little bit, for that other value I passed, it actually did an index scan, something way more expensive than what we had before, because SQL Server, again, knows about our column statistics and said: it's just easier for me to scan the table than to try to seek and do a lookup on each one of those rows, because that value shows up way more than everything else. For this query, that's fine; I don't really care, because I didn't parameterize it. Where things start to go south, though, is if we actually create a stored procedure. Let's say I wanted to create a stored procedure to do this work: very simple, it only takes in that StateProvinceID parameter and returns data, super simple stuff. Let's create it, and now let's run it, except this time I'm going to run it with the bad plan, the one that gave us that table scan. So I'm going to run it for StateProvinceID number 9.
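Before we go on, the key-lookup fix mentioned a moment ago looks something like this; the index name is made up, and the INCLUDE list should match whatever non-key columns the query actually selects:

```sql
CREATE NONCLUSTERED INDEX IX_Address_StateProvinceID_Covering  -- hypothetical name
ON Person.Address (StateProvinceID)
INCLUDE (AddressLine1, City, PostalCode);  -- covers the SELECT list, so no Key Lookup
```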
I look at my execution plan, and I'm in index scan country: I'm actually scanning my entire index right now, and that plan has been cached. So the next time I run this query, it doesn't matter what value I give it. Let's use one of the smaller ones, where we saw our index seek: I'm still getting a scan. Why? Remember that I have a parameterized query, so the only thing that's stored in the plan cache is that parameterized query. So every time I run this stored proc from now on, until that plan ages out of memory, out of the plan cache, I'm going to get this bad plan. That's parameter sniffing, and it's terrible, and it happens a lot; but it happens by design. Like I said, we don't want to be compiling queries every time if we don't have to; that's time we can't get back. CPU is expensive; memory, not so expensive; but it's one of those things we want to try to avoid. Now, there are a couple of different ways we can deal with this problem; some of them are terrible, some of them are not. The first way is to modify your stored procedure to say WITH RECOMPILE. This is a knee-jerk reaction; it is a nuclear option, because if you do this, it means every time you run this stored procedure, SQL Server will recompile it and make a new plan. This can be bad. It'll fix your immediate problem, and maybe you modify the proc once to force the recompile and then remove the recompile once it gets a better plan; but it can be really bad, because if it's a really busy stored procedure, and let's say it takes fifteen milliseconds to compile the plan every time, and I'm executing the stored procedure thousands of times per second, that's a lot of CPU scheduling taking resources away from my other queries. Not great, but it can fix your problem immediately; I don't recommend it. If I add this to my stored procedure and then execute it down here, I will now get a different plan for each execution, because every time it runs, we told SQL Server to recompile it.
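The demo procedure with the nuclear option added would look roughly like this (the procedure name and column list are assumptions, not the session's exact script):

```sql
CREATE PROCEDURE dbo.GetAddressesByStateProvince
    @StateProvinceID int
WITH RECOMPILE        -- a brand-new plan is compiled on EVERY execution
AS
BEGIN
    SELECT AddressLine1, City, PostalCode
    FROM   Person.Address
    WHERE  StateProvinceID = @StateProvinceID;
END;
```

A statement-level OPTION (RECOMPILE) on just the affected query inside the procedure is a narrower version of the same idea.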
So I still got what SQL Server considers to be the best plan for my first query; even though it's a scan, it still thinks that's less expensive than the alternative. And I got my better plan for the StateProvinceID that actually could benefit from an index seek. Now, if you don't want to do that, another trick, and this is us trying to be smarter than the optimizer, so your mileage may vary, is that we can tell SQL Server to optimize the query for a particular value. Look at this statement real quick: same query, no more recompile, but I'm basically telling it, I want you to optimize this query as if I had passed the value 55. Optimize for this particular value: it doesn't matter what value I actually give it, always build the plan as if it were 55. We'll modify our stored proc and run it again, and now I get the same plan for both; roughly 50% of the cost goes to each section; it basically said each one of these plans costs exactly the same. Now, is that good? Well, it's good where you have a lot of data that's skewed a particular way. You kind of have to know that, right? You need to know your data before you can do things like this; but if you know you have a significant data skew, you can optimize for that value and get better plans out of your stored procedures. Okay, that's parameter sniffing. Now, your other option, and your DBA will hate you if you try this, is dumping the plan cache. Don't run that, because it dumps every single plan out of memory and they all have to recompile. Don't ever do that.
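The OPTIMIZE FOR variant, as a sketch (same hypothetical procedure name as before; 55 stands in for whatever skewed value you know dominates your data):

```sql
ALTER PROCEDURE dbo.GetAddressesByStateProvince
    @StateProvinceID int
AS
BEGIN
    SELECT AddressLine1, City, PostalCode
    FROM   Person.Address
    WHERE  StateProvinceID = @StateProvinceID
    OPTION (OPTIMIZE FOR (@StateProvinceID = 55));  -- plan is always built as if 55
END;
```

There is also OPTIMIZE FOR UNKNOWN, which builds the plan from average density statistics instead of any sniffed value.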
Okay, the next thing that can be terrible in SQL Server: table variables. Who here has used a table variable? Table variables are one of those things a lot of people stumble upon and think, wow, this is great: I have a variable that can be a table, and I can store it in memory and reuse it and write queries against it and do all kinds of things. Table variables have one very significant drawback: you can't get column statistics on them. They're great if you only have a little bit of data in your table variable, but the minute you start to scale up, it gets bad in a hurry, and we'll see an example of that. The other myth about table variables is that they don't get written to disk. That is false: they will get written to tempdb if they get big enough, so just be wary of that. So what does that look like? Let's say I declare a table variable called @Customers with these columns in it; then I insert one record into it, and then I do a SELECT from it with a join and get a result. Cool. Here's what my query plan thinks it looks like; this is a multi-statement batch, so I need to scroll down to where I actually have the query. It's a little more complex; there are a lot of things going on in this query. First things first, though: I have a table scan on my @Customers table variable. Actual number of rows returned, one; estimated number of rows, one. Perfect: it came up with a great plan, did a nested loop join to this index seek, and everything's happy. But let's start scaling this query up. I'm going to declare the same table variable again, but this time I'm going to put 5,000 records in it and select from it again. What does that query look like? Well, it looks the same, but now I have a problem: is a nested loop the best thing we can do with this particular query? I mouse over this operator: actual number of rows, 5,000; estimated number of rows, 1. This isn't a great scaling example, but I just want to highlight that every time you have a table variable, SQL Server thinks you're only ever going to get one row back, and what happens when we get mismatched estimates? We don't always get the best plan. So it throws it over here to this nested loop join; look at the estimated number of rows, it thought it was only going to get 1.6 rows.
Which, again, is an estimate; you can't have less than one row, right? And it's actually saying we're going to get four thousand four hundred and eighty rows back. SQL Server may not have chosen the best type of join for this, because it thought it was only going to get one row. That's not great. Your alternative is to create an actual temp table: CREATE TABLE #Customers. We're going to create a table, on disk, for real this time, and then I'm going to select from it. Same exact thing, same columns, everything is the same, but I got a vastly different plan this time. I was able to use a merge join: the data was already in the same order, so it just put that together. Then it took everything up here from this table scan and did a hash match to it, because it said, I have this many rows and a pretty large data set; I'm actually pulling back one hundred and twenty-one thousand rows from this side, and then hash matching it to that other five thousand. Is a hash match the best join we can get here? Maybe we can tune it a little bit, but you can see how the plan is vastly different, and all we did was change the table type; SQL Server said this is a better query than the other one. That's why table variables can get you into trouble. If you have stored procs, or other things written in SQL today, that utilize table variables, take a minute in your dev or int or QA environments to try changing them to temp tables and see what your performance looks like. Because if it's an actual temp table, you can get column statistics on it, and you can also index it. So if you have some large batch process that you need to run (I've got 10 minutes left), you can actually index temp tables or create primary keys on them and treat them like any other table, and then as soon as your session closes, they go away.
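The table-variable-versus-temp-table comparison can be sketched like this (Sales.Customer used as a convenient row source; adjust to taste):

```sql
-- Table variable: no column statistics, so the optimizer assumes ~1 row.
DECLARE @Customers table (CustomerID int PRIMARY KEY);
INSERT @Customers (CustomerID)
SELECT TOP (5000) CustomerID FROM Sales.Customer;

-- Temp table: real statistics, and it can be indexed like any other table.
CREATE TABLE #Customers (CustomerID int PRIMARY KEY);
INSERT #Customers (CustomerID)
SELECT TOP (5000) CustomerID FROM Sales.Customer;
-- Joins against #Customers now get row estimates that reflect reality,
-- so the optimizer can pick merge or hash joins where loops would be wrong.
DROP TABLE #Customers;  -- or just let it go away when the session closes
```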
Okay, so table variables: not always great. Next thing: now I want to talk about my favorite topic, cursors. Yuck. Cursors are bad. I keep a knife at my desk, and I want to say it's for protection, but it's just for protection against cursors. There are a lot of people who work at IGS nodding their heads right now. But I kid; cursors do have a time and a place, and I can't always tell you when the right time to use them is. If you have any sort of OLTP workload, though, try to avoid cursors if you can. I'm going to show you what a plan looks like when we use them, but spoiler alert: we're going to kill it with fire, because it's just terrible. All right, what do cursors look like in query plans? Let's say we screwed up. We're at work, we're a company that sends things out to people, and we made a mistake and misspelled somebody's name on a product, which may or may not have happened to me very recently. I want to begin a transaction, and I'm going to declare a bunch of variables for my cursor. Who here has written cursors in T-SQL before? More of you than I was hoping. So some of this code probably looks familiar, but the basic thing about a cursor is that we write a SELECT statement and put its results into a cursor, which is basically just a big iterative list, and then we open that thing and keep fetching each row until we can't fetch anymore. Now, this is the cursor 101 that I see developers write: I've got two tables that are related to each other; what I need to do is insert a record into one, find the value of the primary key I inserted, and then use that value to insert into the other table, because we have some kind of foreign key constraint that we need to keep track of. So what do I do? I use my old friend SCOPE_IDENTITY() to pull out the value of the row I just inserted, and then I insert it into the next table. What does that do? Well, let's actually run this.
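The cursor-101 pattern just described, roughly sketched (table and column lists trimmed to the shape of the idea; the staging source table is hypothetical):

```sql
DECLARE @CustomerID int, @NewOrderID int;

DECLARE fix_cursor CURSOR FOR
    SELECT CustomerID FROM dbo.OrdersToFix;       -- hypothetical source table

OPEN fix_cursor;
FETCH NEXT FROM fix_cursor INTO @CustomerID;
WHILE @@FETCH_STATUS = 0
BEGIN
    INSERT INTO Sales.SalesOrderHeader (CustomerID /* , ... */)
    VALUES (@CustomerID /* , ... */);

    SET @NewOrderID = SCOPE_IDENTITY();           -- grab the new primary key

    INSERT INTO Sales.SalesOrderDetail (SalesOrderID /* , ... */)
    VALUES (@NewOrderID /* , ... */);

    FETCH NEXT FROM fix_cursor INTO @CustomerID;  -- row by agonizing row
END
CLOSE fix_cursor;
DEALLOCATE fix_cursor;
```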
I was going to turn off my actual execution plan... no, no, leave it on, what the hell. Let's run that, and we're just going to watch this little timer down here; I have no idea how many rows this is going to update. Are you not entertained? All right, let's stop this. What's happening? Well, if I look at this execution plan, I can see behind the scenes that I actually have a lot of execution plans, because I'm executing a query over and over and over again; SQL Server had to generate a plan for each one of these and do each insert individually. We call that RBAR, and the reason we call it RBAR is that it's a row-by-agonizing-row operation. It also means that I want to hit you with a piece of rebar. Let's roll that back. How do we get around cursors? Well, in this particular case, again, cursors 101, let's do something a little bit different. The first thing I want to do to get rid of my cursor is figure out how I can capture the values coming out of that insert and then reuse them. Let's start by beginning a transaction, and I'm going to create a temporary table called #NewSales, for every new sale I'm entering. Then I'm going to insert into SalesOrderHeader, and the magic is going to happen on this particular statement: OUTPUT. OUTPUT is a clause you can use in SQL Server that captures the output of what you just inserted; try to wrap your head around that one. Every time you do an INSERT, UPDATE, or DELETE operation in SQL Server, there are secret little tables behind the scenes for your transaction called inserted and deleted. If you're inserting a row, the inserted table captures the values you inserted before they're committed to the table. We can leverage that secret inserted table to find out the new IDs of what we just inserted. Then... well, before we get to the "then," let's just run this part: let's create my table and actually do the insert. That was pretty fast, and if I come down here, I've captured all of that output to my #NewSales table.
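The OUTPUT pattern, sketched (column lists trimmed; the staging source table is hypothetical):

```sql
CREATE TABLE #NewSales (SalesOrderID int NOT NULL);

INSERT INTO Sales.SalesOrderHeader (CustomerID /* , ... */)
OUTPUT inserted.SalesOrderID INTO #NewSales (SalesOrderID)
SELECT CustomerID /* , ... */
FROM   dbo.OrdersToFix;                -- hypothetical source table

-- Then one set-based insert into the child table, no cursor required:
INSERT INTO Sales.SalesOrderDetail (SalesOrderID /* , ... */)
SELECT ns.SalesOrderID /* , ... */
FROM   #NewSales AS ns;
```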
So here are all the values I inserted into that sales table. Then it just becomes another set-based operation to say INSERT INTO SalesOrderDetail, using a select statement with those captured values. Still not a great insert, right? And we're going to see why in a second. That took six seconds, which, compared to the "never" of my other query, I'll consider a good improvement. What did that look like? Well, the reason it still took a little while, and we're not seeing it here, and this is why estimated plans aren't always great, is that there are a ton of triggers (my other favorite thing to talk about) and a bunch of foreign key constraints on this table that SQL Server has to check for every insert. We could make that a set-based operation as well, and that type of thing won't necessarily show up on an estimated plan; but that's why the insert still took six seconds. Six seconds is better than never, though. So, cursors: yuck. Destroy them; stop using them. Okay, in my five minutes left, let's go ahead and wrap this baby up. A couple of things I want to quickly mention that I didn't have a lot of time to show you. SQL Server 2016... actually, I do have time to show you this. SQL Server 2016 has a really cool new feature called live query execution. Before, we've seen actual plans and we've seen estimated plans, right? That's actually not going to work; let's go down here to the bottom. I'm going to use my Stack Overflow database real quick, and I'm actually going to dump my plan cache and dump my memory; do as I say, not as I do. I'm going to run this select statement, and in SQL Server 2016 there's a fancy new button with a little arrow: Include Live Query Statistics. This is exciting, I promise. We click this guy, and that means that any time we run a query now, we get something that looks like this. Oh, red text; I almost made it without red text, damn it. All right, let's try that again.
I get live query statistics of what's actually happening on each step. That's pretty cool; that's actually really freaking cool. Now, good news: you can actually get this on SQL Server 2014. I want to say it was one of the more recent SQL Server 2014 service packs that introduced this functionality. So for those of you fortunate enough to be using SQL Server 2014 in production today, get the latest version of SQL Server Management Studio and that button should be there. All it's doing is querying certain new dynamic management views behind the scenes that can observe these different operations as they're happening, and then returning how long each step takes. Pretty cool stuff; really, really handy. The other thing coming in SQL Server 2016 is a feature called the Query Store. The Query Store is Microsoft's new way of offering you ways to deal with parameter sniffing: it keeps track of all the different query plans for a given query, whether or not they're in memory. So in that parameter sniffing example, I can actually have that query go into the cache, and then, whether or not we make a change to the proc, SQL Server will keep statistics about those different plans, how they're changing over time, and whether they're regressing. I don't have a good demo for that, because I just don't have the time to really go into it, but it lets you look at different plans as they change, and that can give your DBAs, or you, a clue to say: man, I really need to look at some of these queries and rewrite them. The other thing I would say is, if you're going to spend time working with query plans, I always recommend everybody download (and I don't work for them, I just really like them a lot) SQL Sentry's Plan Explorer. It can take the output of your execution plans and render them, color-code them, show you where the expensive operations are, and give you a better way to look at them.
Then, as you tune your queries, it'll keep stats about each time you run them and give you some more information, so it's pretty neat. So, SQL Server execution plans: hopefully you're going to leave here today with an understanding of what they are, why they're useful, and how to read them. And again, as you're making changes, try to keep track of things like STATISTICS IO and query execution times; look for ways to make things better, measure incrementally, and work with your DBA. I love working with developers, I love tuning SQL; it's part of the reason I'm here. Buy your DBA a beer; always stay on their good side. If you're interested in this, I mean, I only have an hour, and there's only so much I can do; you can spend days on this stuff and really get deep into the engine. To learn more, the plan operator reference on MSDN lists every single possible operator you could see in a query plan, with a description; definitely check that out. If you see a plan operator and you don't know what it means, that's where you can find out more. There are people who are way better at query plans than I am, and if they ever see this presentation they'll probably yell at me, because I probably gave you a bunch of bad information. The first is Brent Ozar of Brent Ozar Unlimited; he does a ton of stuff with performance tuning in SQL Server; check him out. The other is Grant Fritchey, also known as The Scary DBA; he's really good, and he's actually written physical, paper, tree-killing books, and e-books, on execution plans, and I recommend them. I'd also be remiss not to say: if you use SQL Server a lot, please consider coming out to SQLSaturday Columbus. It's part of the PASS organization, free to all; it doesn't cost anything to get in; it's a free full day of SQL Server training. I'm on the board, full disclosure. So again, my contact information is here; I'm about out of time. I really hope this was helpful for you; please stop by and let me know how I did, and enjoy the rest of your day.
Info
Channel: Stir Trek
Views: 23,822
Id: 3LTcCOLFurc
Length: 60min 40sec (3640 seconds)
Published: Tue Jun 07 2016