SQL Server Query Tuning Best Practices - Part 1

Video Statistics and Information

Captions
Hi everyone, this is Kevin Kline speaking, with SQL Sentry. I serve as the Director of Engineering Services at SQL Sentry, and I've been here for exactly one year now, so that's kind of exciting for me. With me today is my good friend and fellow Microsoft MVP, Aaron Bertrand. Hi, Aaron. "Good morning, Kevin, how are you?" I'm doing okay. Now, how long have you been with SQL Sentry? "I started in October of 2010." All right, so two and a half, three years now. "Yeah, boy, it has gone fast." It does move very quickly, but it's been good.

Folks, thanks so much for joining us for today's SQL Server Query Tuning Best Practices webcast. We recognize that you have a lot of different ways you could be spending your time, so we're grateful that you've carved out a bit of your day to spend it with us, and we hope to give you some value back by pointing out tips and tricks that will make you more effective at writing your SQL queries.

We've got some cool things going on, so let me take a minute and tell you about them. First, in case you didn't know, we have a website called SQLPerformance.com. Aaron is the editor-in-chief there, and the quality of the articles is just amazing, really outstanding stuff. One of the things we decided to do is turn many of the best articles into an e-book. Our very first e-book from the website, High Performance Techniques for Microsoft SQL Server, was written by Aaron and many of the top experts in the industry: folks like Kimberly Tripp and Paul Randal, one of my favorite query tuning experts Paul White, Jonathan Kehayias, Joe Sack, Glenn Berry, as well as a lot of fantastic material from Aaron himself. It's almost 300 pages, chock-full of outstanding Transact-SQL tips and tricks. Just by attending today, or just by registering for that matter, you're going to get a promo code that lets you download the book for free. It will also be available on the Amazon Kindle bookstore, so people can buy it if they want to, but because you're a valued customer and you're taking part in our community, we want you to have it for free. In the future we'll be putting out more of these as we build up more content on SQLPerformance.com. So, just a quick plug for Aaron and his group: spend some time and read the blog posts there, it's a great website. One of my favorite parts of it is http://answers.sqlperformance.com, a discussion forum with a lot of really cool material about how to tune your SQL statements, particularly using execution plans.

One other thing we're doing that's also really neat, but is a source of infinite disappointment to Aaron and myself because we are not eligible to participate: we're just so excited that PASS, the Professional Association for SQL Server, is bringing the big PASS Summit, the biggest SQL Server event of the year, to Charlotte in October. Charlotte is both our hometown, our headquarters at SQL Sentry, and the headquarters of NASCAR, so we thought, what a great way to bring those two together. Everyone who attends today will get a raffle entry into our big giveaway.
That giveaway will take place at the PASS conference: three driving experiences, three rookie experience packages from the Richard Petty Driving School, and three ride-along packages. So three folks will win the opportunity to drive one of these race cars at around 220 miles an hour, and three folks will get to ride along in stark, abject terror with a professional driver. Just some really neat stuff happening there, and we hope you'll spread the word for us too, because if you're going to come to the PASS event, wow, what a great way to spend that time. I've personally been out on the Autobahn at pretty high speeds, but nothing like what you can do in one of these performance NASCAR cars. Aaron, have you ever been out on the Autobahn? "I haven't been on the Autobahn, but I've driven those speeds before, just heading out to the Target from home. That was mostly in my twenties, and I don't do that kind of thing anymore, but I've definitely been in the 130 to 140 mile an hour range." Wow, you're a braver man than I.

Well, you've heard a little bit about Aaron and myself. We're old friends, and we're both very long-time SQL Server MVPs. I just got my ten-year award; Aaron was in one of the very first crops of MVPs. I think you were a VB MVP at first, weren't you, starting in '97? So we're long-time SQL Server people, and just really proud of what we're doing here at SQL Sentry: some really neat products and tools, and we'll put some more attention on those toward the very end of our presentation today.

Here's what we're going to cover. Right off the bat I'm going to spend a little time on how to approach your SQL query tuning methodically, from a methodology and process standpoint, and I'll also go a little into what it takes to analyze the different touch points along the way as you look at a set of SQL queries. Then Aaron is going to jump into some really interesting patterns and anti-patterns. If you've been joining us for the other monthly presentations we've been doing on query tuning patterns and practices, some of this will look familiar, but we're doing the whole presentation series from start to finish again with some new tips and tricks throughout, so hopefully you'll also pick up some new information.

So with that, let's talk about our methodology, or in this case our "meathead-ology." My friend and colleague Lori Edwards is shaking her head in shame at the terribleness of that pun. The first thing you have to do is configure your test environment so that it's stable and predictable: you want to know what normal looks like for that environment, what it looks like at rest, so you take some initial measurements. The next step sounds simplistic, you just run the tests, but there's actually a lot going on behind the scenes there: you need to take measurements that let you compare the same metrics, so the tests you're running are apples to apples, not apples to oranges, and you're not actually looking at different things. Finally, you take a stroll through the different bits of information you've collected, and you have to know what to look for in that information so you can make some strong determinations about what is working well and what isn't in your particular SQL statements and the alternatives you try along the way.
So when I get started, that very first step is setting up the test environment, and the idea is to ensure stability and consistency. You do that with, well, folks use different terms for it: a test harness, or maybe a test jig, or a testing script. I've heard all kinds of terms for it. In my case, let me tell you a little story. When I was first getting started, I had been given a whole set of SQL stored procedures that I needed to tune, and when I started work on the different SQL statements I ran them to see what kind of results they returned, so I could make sure the result sets were identical between the different versions of the queries I was writing. I made some changes and ran them, made some more changes and ran them again, and it was just enormously faster. I was thinking, wow, I am a Transact-SQL god, I am so good at this, the changes I made make it perform so much faster. Then one of the more senior fellows who worked there, kind of my mentor, walked by and said, "Kevin, tell me, how did you handle the caching of the data?" And I said, "I beg your pardon, what is this caching that you speak of?" He said, "All the data has been cached in the buffer cache, so the second time you ran the query it didn't have to load anything into the buffer." And I thought, oh crap, all that work I did, and I have no idea whether it was actually the improvements I made or the fact that the data was already loaded into the data cache. As you can probably guess, that was what made it faster; it wasn't that I was that much better.

For a long time in SQL Server you could only clear the caches by cycling SQL Server: you actually had to stop and restart it, and boy, what a horrible way to try to manage your cache. As it turns out, in SQL Server pretty much from 2005 and later we can use some DBCC commands instead. We can say DBCC FREEPROCCACHE or DBCC FREESYSTEMCACHE to clear plans out of the plan cache, or we can use DBCC FLUSHPROCINDB. That's an interesting one: if you're in a situation where you have a dev environment with a lot of databases on the same box, you can give one database to one group of developers and another database to a different group, and each group can clear the plan cache for just their database without impacting anybody else, rather than clearing the whole procedure cache at once. Then there's the DBCC DROPCLEANBUFFERS command, which is really useful for clearing out the buffer cache. Again, don't do that in production; that's going to really hurt anybody who's depending on cached data for good performance. But if you're in a dev or test environment, or maybe working on your own local workstation, then do make sure you clear that out each time, or use the alternative I'll describe in a moment.
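For reference, a minimal sketch of what those cache-clearing commands look like, assuming a dev or test instance only (the database ID in the FLUSHPROCINDB line is a hypothetical placeholder; look yours up with DB_ID()):

    -- Dev/test only: clearing caches hurts anyone else sharing the instance.
    DBCC FREEPROCCACHE;                 -- empty the entire plan cache
    DBCC FREESYSTEMCACHE ('SQL Plans'); -- or clear a single system cache store by name
    DBCC FLUSHPROCINDB (5);             -- undocumented: flush cached plans for one database only (5 = hypothetical database ID)
    CHECKPOINT;                         -- write dirty pages first...
    DBCC DROPCLEANBUFFERS;              -- ...then empty the buffer cache for cold-cache testing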
What I'm talking about in those first couple of bullets, code to clear out your caches, is called cold-cache testing. You want to make sure your cache is cold every time you run the query: there's nothing in it already, it's not warmed up, so you get an idea of the full amount of time it might take to run that query, to actually pull the data from disk (or SSD, or whatever the storage subsystem is), load it into RAM, into the buffer cache, and return it to the user. An alternative, maybe if you're on an older version of SQL Server such as SQL Server 2000, is what we call warm-cache testing. Instead of issuing those commands, you run your query two or three times just to make sure all of the data is loaded into the cache, and then take measurements on that second or third run, so you're comparing each variation of your query tuning against warm-cache numbers; everything starts in the data cache consistently. Either way, when we take the measurements we know they'll be comparable for each variation of the query we try. So it's very important to prepare your environment.

The next step is to take some measurements, and different people have different favorites. In my case I really like to use SET STATISTICS TIME and SET STATISTICS IO; those are both kind of essential for me. In addition, you could also use something like SET SHOWPLAN_TEXT or SET SHOWPLAN_XML and turn those on. You don't have to issue any of these statements to get this information: you can flip a switch in the options panel of the Management Studio GUI to turn them on, or, as Aaron points out, you could use Plan Explorer, the free tool from SQL Sentry, which takes care of all of that for you automatically, and you'll see that as we run through our demos in just a minute.

The last thing I like to do when doing this kind of query testing is to check the various DMVs, so you can look at different metrics to know how the queries are actually impacting performance: what resources they're consuming, what they're doing. SET STATISTICS IO will show you the I/O consumption of a query, but you get a lot more information out of a variety of DMVs. I mentioned two on the slide, sys.dm_os_performance_counters and sys.dm_os_wait_stats; those are both excellent, and you'll want to get to know them, but there are others as well. And just as Aaron said, Plan Explorer, and Plan Explorer PRO in particular, also keeps track of what those wait stats are for a given SQL query, so you can see that in the tool, or you can go after this information yourself if you're rolling your own query-tuning solution.
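A minimal sketch of that kind of test harness, combining the cache-clearing commands with SET STATISTICS; the table name is just a stand-in for whatever query you're tuning:

    -- Cold-cache test harness (dev/test only)
    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;

    CHECKPOINT;
    DBCC DROPCLEANBUFFERS;   -- cold cache: force the next run to read from disk
    DBCC FREEPROCCACHE;      -- also force a fresh compilation

    SELECT COUNT(*) FROM Sales.SalesOrderHeader;   -- the query under test (placeholder)

    -- Warm-cache variation: skip the DBCC commands, run the query two or three
    -- times, and take measurements from the later executions instead.
    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;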
Now, this is the Periodic Table of SQL Server Dynamic Management Objects, written by a friend of ours named Tim Ford. Tim runs a really fun training event called SQL Cruise, and Aaron, you just got back from a SQL Cruise? "I sure did. I'm still on Alaska time, actually; I just got back on Saturday. It was a great event, as it always is." So, just for a brief digression, you did a lot of training? "That's right, a lot of training, and there's also some downtime. On the Alaska cruise there are four ports; we stopped in Ketchikan, Skagway, Juneau, and Victoria. We have to make port in one international entry, which is just Canada I guess, for the cruise to operate. We have half a day, give or take, at each of those ports, where you get downtime and you get to go on excursions and see the scenery and a lot of wildlife, that kind of thing. The rest of the time is spent with the other cruisers, either doing training or career-oriented events, plus downtime where we basically talk about our professions, or sometimes not our professions, as opposed to doing specific SQL Server training. It's a pretty well balanced event." That's called SQL Cruise, folks, and I encourage you to take a look at it: sqlcruise.com.

So, back to Tim Ford's chart. It's laid out like the periodic table in chemistry, and you can see there are a lot of different dynamic management views and dynamic management functions. They're SQL Server's way of revealing its internal instrumentation and its behavior behind the scenes. Not all of them apply when we're doing query tuning; in fact, unless you're doing performance tuning in general you might not spend much time with the DMVs at all. But for query tuning there are five I'm going to point you to in particular.

The first is sys.dm_os_wait_stats. In SQL Server, basically from 2005 and later, every time SQL Server has to wait on any kind of resource to complete a particular operation, that wait is categorized; those categories are called wait stats, or wait statistics. You can tell very quickly what your major bottlenecks are by looking at your wait stats. This is really useful because you can see whether the problem is related to I/O, to CPU, or to locking and blocking (maybe somebody wrote an UPDATE statement that just locks a whole lot of stuff). So it's very valuable for determining, when your SQL statements aren't performing well, what the general category of bottleneck is. There is a caveat about wait stats, though: they're not always easy to understand or entirely sensible, so you need to study a little about what the different wait types mean. Some of them are completely natural and occur all the time, ticking upward as long as the server is up and running, natural organic things like a heartbeat or respiration, so you might get alarmed because you see one increasing dramatically when it's actually normal. You do need to learn a bit about wait stats to use them effectively. The other caveat, and I know this is something Aaron and I have talked about a lot, is that wait stats don't show you the full picture; there are things inside SQL Server that aren't fully described by what it's waiting on. One example is a famous wait stat you'll have seen if you read blogs a lot, CXPACKET, and people say, "oh my gosh, that means you've got parallelism in a bad way." That's not necessarily true; there's a great big asterisk next to that. So be careful about making a full assessment of your entire environment based on just one or two wait stats, but they are extremely good for getting you into the general category, the bucket, of what your bottleneck is.
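A simplified sketch of that kind of wait-stats query; the handful of excluded wait types here is illustrative only, and Glenn Berry's scripts (mentioned a bit later) filter out a much longer list of benign waits:

    SELECT TOP (10)
           wait_type,
           waiting_tasks_count,
           wait_time_ms,
           signal_wait_time_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'BROKER_TASK_STOP',
                            N'REQUEST_FOR_DEADLOCK_SEARCH', N'XE_TIMER_EVENT')  -- benign "heartbeat" waits
    ORDER BY wait_time_ms DESC;   -- biggest cumulative waits first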
Another one that's really good is sys.dm_exec_query_stats. This gives us query-related performance information for each of the queries that are running; it tells us a lot about system resource consumption on a per-query basis: CPU, memory, I/O. I'll show you an example of that in just a minute.

Another key point, and Aaron's going to show a little about this in his demos shortly, is that few things impact query performance as significantly as indexes, so knowing a lot about your indexes is key to being a top-flight SQL coder. The way you find out about SQL Server index usage is the DMV called sys.dm_db_index_usage_stats. It's a really good DMV to show you how many index seeks are happening, how many index scans, system scans, and user updates; you can see very granularly what is happening on each index, both from a system point of view and from the end-user point of view.

There are just two more I wanted to mention. The first is sys.dm_io_virtual_file_stats. This is important because the speed of your I/O subsystem tends to be, we find, probably the single most frequent hardware bottleneck for most computer systems. Sometimes it's the network, or CPU, or memory, but if I had to bet, knowing nothing about your system, and you told me it was performing poorly, I'd say it's probably your disks. The sys.dm_io_virtual_file_stats DMF tells us a lot about all the SQL Server data and transaction log files: how they perform, how big they are, all that sort of thing. So it can tell us whether a slow query in a particular database correlates all the way down to the disks where that database and its transaction log live. And finally there's the sys.dm_exec_requests DMV, which shows what your users are doing. You typically have to combine it with other information to get what you want, but it tells you what the users are up to at a particular point in time.

So let's take a quick look at each of these; I want to give you a little bit of an idea of how to look at this data. These examples are all from Tim Ford. Tim is the co-author, along with a friend of mine named Louis Davidson, of a really nice book from Simple Talk called Performance Tuning with SQL Server Dynamic Management Views, or something very close to that (I can't see my bookshelf from here). Look on simple-talk.com for Tim Ford, and he has a variety of these queries that you can reuse; they're also in our slide deck, so you can grab them from the slides as well.
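The first demo uses sys.dm_exec_query_stats. A sketch along those lines (not Tim Ford's exact query, just the general shape of a "most expensive statements" query) looks like this:

    SELECT TOP (10)
           qs.execution_count,
           qs.total_worker_time  / qs.execution_count AS avg_cpu_microseconds,
           qs.total_elapsed_time / qs.execution_count AS avg_elapsed_microseconds,
           qs.total_logical_reads / qs.execution_count AS avg_logical_reads,   -- in 8 KB pages
           SUBSTRING(st.[text], (qs.statement_start_offset / 2) + 1,
                     ((CASE qs.statement_end_offset
                            WHEN -1 THEN DATALENGTH(st.[text])
                            ELSE qs.statement_end_offset END
                       - qs.statement_start_offset) / 2) + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY avg_cpu_microseconds DESC;   -- or sort by reads, duration, etc.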
Here we can see the objects we're interested in; these two are actually dynamic management functions. The difference between a DMV, a view, and a DMF, a function, is that with the functions you have to pass in parameters of various sorts. Here I'm going to take a look at the SQL Sentry database, and if I execute this it gives me information in the columns below: the average CPU, the average total time in milliseconds, and the average number of reads and writes, where those reads and writes are in 8 KB data pages. If we expand the request out you can even see what the different SQL statements look like, and of course you could sort by whatever column you like; maybe you want to look at these by CPU, so we order by average CPU time. This is a great way to see, if I ran a stored procedure that had five statements in it, which of those statements are consuming the most resources, and then use that information to decide which of the SELECTs or INSERTs or UPDATEs or DELETEs I should focus my attention on first. It helps me prioritize.

I'm going to skip down to the one on index usage stats. Again, you can do different things here, like filter on what you're interested in. Aaron and I were just chatting about this: the index ID. Tell me again what index ID zero means, Aaron? "Index ID of zero is a heap; index ID of one is a clustered index." Okay, so let's just have it tell us about all of the indexes that exist right now in this SQL Sentry database. Looking at that, I can see how many user reads and user writes there are; this is telling me about end-user activity against these different indexes, and you can see it's a really long list, there are quite a few in there. If I say, let's look at just the heaps, well, a well-designed database seldom has a lot of heaps, and in this case we've got just one table, and I bet it has somewhere between one and five rows in it, something like a license key. "It's probably just a single row." Probably just a single row. If we go with the clustered indexes, we can change the index ID to one, and then you can see exactly what the table name is and how many reads and writes are happening against the different clustered indexes in this particular database. So this is telling us how busy these different indexes are, but as I mentioned on the slides, there's other information in there too. You might want to know how many index seeks are happening, because if you have an index that isn't getting any seeks, that's a bad indicator: it's not being used the way it was presumably intended to be used.
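A sketch of the kind of index-usage query being demonstrated, joined to sys.indexes to pick up the index names (treat the exact column choices as illustrative):

    SELECT OBJECT_NAME(ius.[object_id]) AS table_name,
           i.[name]      AS index_name,
           ius.index_id,                    -- 0 = heap, 1 = clustered index, >1 = nonclustered
           ius.user_seeks,
           ius.user_scans,
           ius.user_lookups,
           ius.user_updates
    FROM sys.dm_db_index_usage_stats AS ius
    INNER JOIN sys.indexes AS i
            ON i.[object_id] = ius.[object_id]
           AND i.index_id    = ius.index_id
    WHERE ius.database_id = DB_ID()          -- current database only
    ORDER BY ius.user_seeks + ius.user_scans + ius.user_lookups DESC;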
Here's another really useful query, this one against sys.dm_io_virtual_file_stats, and this is where you pass in NULL for both parameters; that just says "tell me about everything," not a specific database or file. In the results down below you can see it's retrieving information for all of the databases, for their MDF and LDF files, and it's telling you the number of reads and writes, the amount of time stalled, and the amount of data by volume: the size on disk, the number of reads and bytes. Again, how busy are my files?

Just one additional page to show you as well: there's no need to write your own query about wait stats, at least to get the basics. There's a blogger who also writes at SQLPerformance.com and appears in our new e-book, Glenn Berry, and he has some great scripts called the SQL Server Diagnostic Queries. One of the things he does for you is include, in a NOT IN clause, all of the wait types that are just natural parts of SQL Server's behavior, organic things we don't care about, like REQUEST_FOR_DEADLOCK_SEARCH. Use his queries and you'll see your wait stats, but only the ones that are actually significant in terms of what's accumulating. In my case, WRITELOG is probably the most significant wait for this particular server: I'm spending a lot of time waiting for my transaction log to deliver the write throughput I need.

So those are some of the things I wanted to show on the point about the test harness. I gave you the code for it in the slides, but you can also go into Query Options in Management Studio and set these under the Advanced tab: you can turn on SET STATISTICS TIME with a checkbox there, SET STATISTICS IO, and so forth. You can set them graphically if you don't want to do it in T-SQL, but I do encourage you to use the Transact-SQL.

All right, let's flip back to the slide deck; I promise I'm not going to be talking too much longer. When you run SET SHOWPLAN_TEXT, or flip that checkbox, or look at the graphical execution plan (hopefully in the free Plan Explorer tool we provide), you'll notice certain things highlighted in red. We take the guesswork out of it and flag some of these things for you, but even if you're not using Plan Explorer, you want to look for certain query operators in the execution plan. Lookups: in older versions of SQL Server you might see something called a bookmark lookup, and in newer versions it will say key lookup or RID lookup; those are often indicators of an indexing opportunity that could be taken advantage of, not always, but in some cases. Scans can be bad: if you have a really big table with a clustered index on it and the plan says clustered index scan, well, I remember the first time I saw that, decades ago, I thought it couldn't be too bad because it was using an index, and then I realized, wait a minute, every single record of the table is in the clustered index. So scans can be bad, but there are exceptions to that rule. I'm always on the lookout for spooling operations, like a lazy spool or an eager spool, and certain kinds of parallelism operators are also somewhat alarming. Missing statistics warnings don't necessarily show up in the execution plan, but those are important, and missing index alerts in later versions of SQL Server are also something to pay attention to.
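Returning to the sys.dm_io_virtual_file_stats demo above, a minimal sketch of that sort of query, joined to sys.master_files so the physical file names show up, might look like this:

    SELECT DB_NAME(vfs.database_id)                       AS database_name,
           mf.physical_name,
           vfs.num_of_reads,
           vfs.num_of_writes,
           vfs.io_stall_read_ms + vfs.io_stall_write_ms   AS total_io_stall_ms,
           vfs.size_on_disk_bytes
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs   -- NULL, NULL = all databases, all files
    INNER JOIN sys.master_files AS mf
            ON mf.database_id = vfs.database_id
           AND mf.[file_id]   = vfs.[file_id]
    ORDER BY total_io_stall_ms DESC;   -- files waiting longest on I/O first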
One thing I think is not widely known is that SQL Server, using the statistics it keeps on the various indexes and so forth, makes a guess in its execution plan; that's called the estimated execution plan, and then there's the actual execution plan. One of the reasons I always like to pull both the estimated and the actual execution plan is that any time I see very dissimilar numbers between the estimate and the actual for the number of records a particular step produces, that's a big waving red flag telling me the statistics are somehow wrong: they're out of date, they're very stale, something like that. You're going to see this in action in just a minute.

What we're going to do now is have Aaron take you through some patterns and anti-patterns. We have a lot that we'll talk about over the course of these presentations, but today Aaron is going to cover a few in particular; these are kind of teasers for what you'll be seeing not only today but in the future. So Aaron, why don't you take it away and tell us about some of the patterns and anti-patterns.

Sure. I see a lot of common anti-patterns in SQL Server, things that people do a lot, that become habits and tend to proliferate through their code even though in many cases they do bad things. And this isn't just code I review for customers, but also questions I answer on Stack Overflow and dba.stackexchange.com, and it goes way back to the Microsoft newsgroups; some of these core problems and situations come up over and over again, even when the details change. So we have this list of patterns and anti-patterns on the slide. We're going to cover a few today, including a couple that aren't on the list; the list is much bigger than what you see on the screen right now. Today we're going to deal with three.

The first one, one that never goes away, is people's tendency to use cursors even when they aren't necessary. They seem to be a knee-jerk reaction for folks who think procedurally, in terms of acting on each row instead of on the set as a whole. On the next slide, Kevin. So, cursors are usually unnecessary, but I realize they have their place, and I could list many examples where they're perfectly valid, or where the effort to convert them to set-based code would not have a big enough payoff. And yes, a WHILE loop is a cursor. A lot of people believe that if they change their cursor to a WHILE loop, that magically gets rid of the cursor; that's not the case. A WHILE loop is still a cursor under the covers, it just doesn't explicitly say "cursor." And no, a WHILE loop isn't any faster than a cursor, which is a pretty common myth. Just to get those out of the way: I don't expect anybody to go out and change all of your cursors to set-based code just because I said so, but I'm going to show you a much simpler change that you can implement very quickly to make the cursors you are using much less painful, even if you don't have time right now to convert them to set-based code. The defaults for cursors in SQL Server are that they are global, dynamic, updatable, and scrollable. These are all very resource-intensive options that are rarely required; almost every single cursor I've ever seen in the wild has not needed any of them.
Typically what people are doing is using a cursor to loop through a result set exactly once, going forward, and not using the cursor at any different scope than the current one. Usually they're just obtaining individual values from each row and firing a stored procedure or otherwise acting on that value, without directly updating the cursor using WHERE CURRENT OF, and without any need to go backwards and forwards within the cursor. It's basically a fire hose: shoot through the list once and you're done. There's a blog post on this slide where I've talked about this pretty extensively, but I'll show a quick demo of it in a couple of minutes as well.

The next slide is about correlated subqueries. People often want to select from one table and also pull some aggregate or other data from another table, but they don't want to write a join or add a GROUP BY or any of these other query constructs, so they include subqueries in the SELECT list to pull the data, not realizing it can lead to a much more expensive query. Sometimes it's the same or no worse; if you're only pulling one result, for example, it can be equivalent to a join. But a lot of times I see this pattern stack up: they need to pull a second value, so they add a second subquery, and then a third value, so they add a third subquery. I'll show an example of this, a couple of potential rewrites, and a very tempting rewrite that people might use.

The third demo I'm going to show is about the NOT IN pattern. This is where users want to find, say, all the patients who haven't had a flu shot, or all the customers who have never placed an order. All kinds of things can go wrong here. For example, the checked column inside the NOT IN clause could be nullable, and in that case the query is basically asking "is this not unknown?", and the answer, of course, is unknown, because unknown is unknown. What happens is that no rows are included in the result. The reasons are a lot more technical and I don't want to go too far into them, but the end result is that when you say NOT IN and the checked column is nullable, you end up excluding rows you meant to keep, which isn't what you intended when you asked "is this value not in this set of values?" A lot of people will rewrite that NOT IN as a LEFT OUTER JOIN and check that the right-hand column IS NULL, but similar problems can happen there unless you're very careful about which columns you check for NULL. You can also use EXCEPT, but there are a couple of side effects. One is that the shape of the result set must be the same on both sides, so it's tough to say SELECT FirstName, LastName FROM Customers EXCEPT SELECT FirstName, LastName FROM Orders when FirstName and LastName don't exist in the Orders table. What you'd have to do in that case is use EXCEPT with a more complex derived table of some kind that pulls out only the columns you need, and then you still have to perform a join to get the extra columns that aren't common to both tables. EXCEPT also implements an implicit DISTINCT; the DISTINCT happens and you don't really know about it, so it may not always produce the same results as the NOT EXISTS or LEFT OUTER JOIN equivalents. I've blogged about this one pretty extensively as well, but I'll also show a small sample demonstrating it. So I'm going to take control; hopefully you can see my screen. "Sure can."
The first example: I'm going to start with a cursor, and I'm going to actually run it and then explain it, because it takes a few seconds to run. Essentially, as Kevin talked about, I'm dropping clean buffers so that a warm cache doesn't influence my results in any way. Next I'm declaring a couple of variables, one to keep track of my time and one just as a placeholder. Then I've got a cursor that I'm declaring with the default options, DECLARE c CURSOR FOR and then a query, and it's very simple: I'm selecting the top 500,000 rows from a big sales order header table. I open the cursor, fetch the value, and in the cursor loop all I'm doing is playing with that variable a little bit. I'm not storing it, I'm not printing it, I'm not producing any output, nothing else; it's just a very simple loop. Then I close and deallocate the cursor and report how long it took. If we look at the results, the default cursor took about ten seconds. Pretty simple, but that's a long time to go through that loop, knowing that I haven't really done anything with the data. I haven't even presented the data; I've just processed the cursor.

Now let's see what happens when we add a couple of keywords to the cursor declaration. Instead of global, I'm going to make the cursor LOCAL, and that just prevents the cursor from being exposed to outer scopes. If this were in a stored procedure, the caller would be able to reference the cursor after the procedure finished if I didn't use the LOCAL keyword; the LOCAL keyword essentially means the cursor goes away as soon as it's been closed and deallocated, instead of being left open and referenceable by outer scopes. The other keyword I'm adding is FAST_FORWARD. This makes the cursor read-only and not scrollable; it combines READ_ONLY and FORWARD_ONLY into one keyword. Everything else is the same, so let's run this and see how long it takes. That took five seconds. By adding those two keywords, without changing anything else about the query or any of the semantics of what I'm doing (the cursor is the same, the loop is the same, it's just a LOCAL FAST_FORWARD cursor, which as I said before matches at least 99% of the cursors out there today), I cut my execution time in half. "That's a pretty substantial saving. Wow."
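A sketch of the two cursor variants described in that demo; the table name and row count follow the description, but treat them as placeholders for whatever large table you have handy:

    DECLARE @i int, @start datetime2 = SYSDATETIME();

    -- Default declaration: effectively global, dynamic, scrollable, updatable
    DECLARE c CURSOR FOR
        SELECT TOP (500000) SalesOrderID
        FROM Sales.SalesOrderHeaderEnlarged;   -- placeholder "big" table

    OPEN c;
    FETCH NEXT FROM c INTO @i;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @i = @i + 0;                       -- trivial work with the value, nothing else
        FETCH NEXT FROM c INTO @i;
    END
    CLOSE c;
    DEALLOCATE c;

    SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS cursor_ms;

    -- The cheaper variant keeps the same loop; only the declaration changes:
    -- DECLARE c CURSOR LOCAL FAST_FORWARD FOR ...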
Okay, on to subqueries. This is a pattern I see quite a bit. I'm pulling customers, let's say, from the Customer and Person tables; I'm not going to make comments about the funny schema in AdventureWorks, I'll leave that to Stacia to explain. Stacia Misner is one of the authors of the AdventureWorks schema, though I don't know if she'll readily admit that to a lot of people. The type of pattern I see is: we're getting this information from the customer tables, and then we're pulling in metrics about their order history from the sales order header table, in this case SalesOrderHeaderEnlarged, and I'll talk about that at the end. Basically, what I want to know is how much each customer has spent, total, for all time, and when they placed their last order. These are metrics you might feasibly want when you're trying to understand your customer base, and I've tried to make this query as realistic as possible. The pattern a lot of people use, instead of adding a join to the SalesOrderHeaderEnlarged table and then having to group by all of these columns, is to put a correlated subquery in the SELECT list that goes out to SalesOrderHeaderEnlarged and grabs the data for each customer in the result; they find that a lot simpler.

Now, the way some people would write this, if they're thinking a little more carefully about how these subqueries can stack up, is as a join: we add the SalesOrderHeaderEnlarged table to the join, take the SUM and the MAX without all the correlated subquery stuff, and then add the GROUP BY at the bottom.

One of the ways I would typically write this query is to pre-aggregate using a CTE, so I don't have to group by all of those columns, because grouping by first name and last name doesn't really make sense; it's not really what you're grouping by, you're just forced to include those columns because of the aggregation you're doing. So I pull the aggregation out, using strictly the columns that are necessary, in this case CustomerID and PersonID (PersonID is required because that's how we join to the Person table). I'm trying to restrict the amount of reads I'm doing to perform the aggregations and restrict the number of columns I have to repeat in my GROUP BY clause. Inside the CTE I take the SUM and the MAX just like before, but against just those two tables, and then outside the CTE I join to the Person table. When I join to the Person table I know it's a one-to-one match, so I don't need any grouping or aggregation anymore; I've already done the aggregation inside the CTE and now it's just a straight join. I no longer have to group by first name and last name like I did in the join version. It's kind of both less and more elegant: a little less elegant because you now have the separation of clauses and you have to mix things up a bit, but a little more elegant because you don't have those unexplained groupings. You could have two customers named John Smith, and because CustomerID is already unique, grouping by those additional John Smith columns, even though two of them are the same, means you're not really grouping by them at all.
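A sketch of the three versions being compared; the table and column names follow the AdventureWorks-style schema described in the demo, so treat them as approximations, and note that as written the first variant would also return customers with no orders, so it's illustrative rather than strictly equivalent:

    -- 1) Correlated subqueries stacked in the SELECT list
    SELECT c.CustomerID, p.FirstName, p.LastName,
           (SELECT SUM(soh.TotalDue) FROM Sales.SalesOrderHeaderEnlarged AS soh
             WHERE soh.CustomerID = c.CustomerID) AS LifetimeSales,
           (SELECT MAX(soh.OrderDate) FROM Sales.SalesOrderHeaderEnlarged AS soh
             WHERE soh.CustomerID = c.CustomerID) AS LastOrderDate
    FROM Sales.Customer AS c
    JOIN Person.Person AS p ON p.BusinessEntityID = c.PersonID;

    -- 2) A join, grouping by every non-aggregated column
    SELECT c.CustomerID, p.FirstName, p.LastName,
           SUM(soh.TotalDue) AS LifetimeSales,
           MAX(soh.OrderDate) AS LastOrderDate
    FROM Sales.Customer AS c
    JOIN Person.Person AS p ON p.BusinessEntityID = c.PersonID
    JOIN Sales.SalesOrderHeaderEnlarged AS soh ON soh.CustomerID = c.CustomerID
    GROUP BY c.CustomerID, p.FirstName, p.LastName;

    -- 3) Pre-aggregate in a CTE, then join once, with no outer GROUP BY
    ;WITH agg AS
    (
        SELECT c.CustomerID, c.PersonID,
               SUM(soh.TotalDue) AS LifetimeSales,
               MAX(soh.OrderDate) AS LastOrderDate
        FROM Sales.Customer AS c
        JOIN Sales.SalesOrderHeaderEnlarged AS soh ON soh.CustomerID = c.CustomerID
        GROUP BY c.CustomerID, c.PersonID
    )
    SELECT agg.CustomerID, p.FirstName, p.LastName, agg.LifetimeSales, agg.LastOrderDate
    FROM agg
    JOIN Person.Person AS p ON p.BusinessEntityID = agg.PersonID;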
Kevin: Just as a little aside, you might address this too. A lot of people seem to be curious about CTEs in general, and sometimes they approach CTEs as if they're going to be a much higher-performance construct than, say, a view or a general SQL SELECT statement. Do you want to give a quick thought on that?

Sure. There are a couple of things that happen with CTEs, one in each direction. One is that people think a CTE is like a temp table, so that when they build some query inside a CTE it will only ever be executed once. What actually happens is that the query inside the CTE will almost certainly be evaluated every single time you reference that CTE. So if you stack your CTEs and reference the previous CTE in multiple CTEs, or in the query down below, or you have a correlated subquery that also references the CTE, the CTE will actually be executed that number of times, however many times you referenced it. It's a lot like a view; a CTE is basically an inline view. On the flip side, a lot of people will say, "well, I wrote this CTE once and it killed my performance," and then they take this fear-mongering attitude of "don't ever use a CTE because it's terrible for performance." What actually happened is usually something that had nothing to do with the CTE: if they had written it as a derived table it probably would have done the same thing, or perhaps they wrote a recursive CTE whose recursion went much deeper than they expected; all kinds of things can happen. The problem is that when someone has one bad experience they tend to extrapolate it and say "this is always bad," and I don't want anybody to be scared of CTEs because one person had a bad experience or maybe improperly coded one. "Hmm, okay, that's a lot like me and circus clowns." Yes, I would agree.

So, speaking of CTEs, another way that might be tempting is to use window functions instead of grouping. In this example I've got a CTE that selects three of the four columns I'm interested in, and then to calculate lifetime sales I do a SUM OVER: a sum of TotalDue, partitioned by CustomerID. That gives me a sum for each customer, but notice there's no GROUP BY; the window function lets me perform that sum without actually grouping, so I don't have to group by all of these columns when my grouping really only involves CustomerID. Then we have the order date, and we can also use ROW_NUMBER, partitioned by CustomerID and ordered by the order date descending, and that tells me which row in the set represents the latest order date. So we calculate all of this information using the window functions inside the CTE, and then outside we filter: we select all of these columns where that row number is one, the newest order date for each customer.
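A sketch of that window-function rewrite, with the same caveat that the schema names are approximations of the demo:

    ;WITH x AS
    (
        SELECT c.CustomerID, p.FirstName, p.LastName, soh.OrderDate,
               SUM(soh.TotalDue) OVER (PARTITION BY c.CustomerID) AS LifetimeSales,
               ROW_NUMBER() OVER (PARTITION BY c.CustomerID
                                  ORDER BY soh.OrderDate DESC) AS rn
        FROM Sales.Customer AS c
        JOIN Person.Person AS p ON p.BusinessEntityID = c.PersonID
        JOIN Sales.SalesOrderHeaderEnlarged AS soh ON soh.CustomerID = c.CustomerID
    )
    SELECT CustomerID, FirstName, LastName, LifetimeSales,
           OrderDate AS LastOrderDate
    FROM x
    WHERE rn = 1;    -- keep only the newest order row per customer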
Let's flip over to Plan Explorer and take a look at how these work. All of these queries produce the same result set, but they'll probably have very different execution plans. "It's possible they'll have different execution plans, yes; they produce the exact same results." We'll get the actual execution plans; it should take about six seconds, the computer is getting slow. Okay, here we go.

This first version is the one with the two subqueries stacked in the SELECT list, and we can see that the query performed two clustered index scans. That makes sense: if we had added a third subquery that had to be aggregated, we'd perform another clustered index scan. The growth of this query, if you will, is linear; we perform an additional clustered index scan for every subquery in there. The problem is that with much bigger tables, and a variety of tables you're doing this against, it not only makes the query more complex, it can also really increase the cost without you realizing it until you've run the query. If we look at the join and CTE methods, they're roughly equivalent: we've got half the number of reads, because we only performed one clustered index scan, and the duration is about half as well, slightly higher than half, and I bet if you flipped a coin and ran these a hundred times, fifty times the CTE would come out faster than the join. These are very small numbers in the grand scheme.

Now if we look at the CTE that had all those fancy window-function tricks in it, this is a much more complex plan. There's a lot going on here: we're introducing table spools, really expensive sorts, hash matches, and two clustered index scans that don't cost all that much individually, but there's a lot more happening. Window functions can sometimes make your query structure a little simpler; I didn't have to add a GROUP BY to get all of these aggregates, but it came at a significant cost. I don't pay too much attention to the estimated-cost percentage numbers, but look at the duration: it's almost ten times longer, CPU is a lot higher as well, and look at the reads, two or almost three million reads versus 60,000 and 30,000 for the other two approaches. So you can do things that make you seem clever, with a slightly more compact statement that's less to type but more complex to look at, and it doesn't always produce a better result. That's just one example where some fun things happen.

Kevin: A related point, Aaron, is that some of these windowing functions are new in SQL Server 2012, and one of the things Aaron and I both keep saying over and over is: don't just automatically switch to the newest, shiniest thing just because it's new. Know what business cases it's intended to solve and apply it accordingly; just because it's new doesn't mean it's the best way to go. Right, the original windowing functions like ROW_NUMBER and DENSE_RANK were introduced in 2005; 2012 introduced a whole bunch of new ones, thanks to Itzik, and a lot of them are very useful for getting the answers you need without performing multiple queries or really complex queries. But as Kevin said, in some cases they're not better. I have a blog post on SQLPerformance.com about medians, and the new percentile functions included in SQL Server 2012 are actually much worse at calculating medians than existing methods. So just because it's shiny, just because it's slightly less code, make sure you test it against the method you're using now to be sure it's actually a better option. It won't always be; they don't improve the performance of every single use case.

We've got about five minutes to go; tell us about the NOT IN anti-pattern. Okay. NOT IN is a pattern people use a lot, and one of the problems is that you can get incorrect results if the column inside the NOT IN clause is nullable. Thankfully, in this case it's not: CustomerID in the sales order header table is not nullable. But I still find that a better approach is to use NOT EXISTS.
This is essentially the same query, the same query pattern, and it's going to produce the exact same results. The thing I like about using EXISTS or NOT EXISTS compared to IN or NOT IN is that the result inside the clause can short-circuit: all it has to return is true or not true, whereas IN and NOT IN have to materialize the entire result set. So at the very worst, NOT EXISTS is going to perform the same as NOT IN, and in some cases it will perform a bit better.

I also talked about EXCEPT. In this case we're only looking at CustomerID, so we can do this (in a lot of cases you won't be able to): SELECT CustomerID FROM Sales.Customer EXCEPT SELECT CustomerID FROM Sales.SalesOrderHeaderEnlarged. Again, the same results, just a slightly different query pattern, where we're saying "give me the customers that aren't in SalesOrderHeaderEnlarged."

Then we've got various ways to do a LEFT OUTER JOIN, and depending on whether the column you're checking in the right-hand table is covered by an index, or is nullable, or is perhaps a LOB column, that can really impact the results, and I'll show that by switching to Plan Explorer rather than belaboring it too much. We just looked at all of these queries: here's the NOT IN, the NOT EXISTS, the EXCEPT, and then four different LEFT OUTER JOINs. One is on CustomerID, which is a key and covered by the index; one is on SalesOrderID; one is on TotalDue, which is not nullable but is not part of the index, and we'll see how that affects things; and one is on Comment, which actually is nullable. If we look at the results (I'm not going to run it again, in the interest of time) and go to the plan diagrams: the NOT IN and the NOT EXISTS produce identical plans in this case, and they also produce pretty much identical timings, and the EXCEPT is the same story, a very similar plan and a similar duration. Now when we get into the left joins, no matter which one we use, the performance is quite different: as soon as we start using LEFT OUTER JOINs instead of the first three patterns we see an escalation in duration, and we also see lower reads. This is interesting, because on SQL Cruise last week I had a conversation with someone who said, "I always look at reads when I'm comparing different queries, to decide which one is probably going to be my worst performer." Well, in this case we've got reads of 66,000, 3,000, and 30,000, but then look at the corresponding durations. I tend to look at what my end users are going to notice first, and end users don't really care what the read count is if the duration is ten or a hundred times longer. So that's a counter-example showing that a high number of reads doesn't necessarily mean low performance.

One last thing I'll point out: when we used a nullable column in the LEFT OUTER JOIN and checked the right-hand column for NULL, we got the wrong results. All of these queries return 701 rows except the one where we used the nullable column as the check column; that one also included rows where the column matched but the value itself was NULL, so you can't tell whether the column was NULL because there was no match or because it matched and the value was genuinely NULL. For these reasons I always stay away from the LEFT OUTER JOIN patterns, I prefer NOT EXISTS over NOT IN, and for simplicity, when I can, I'll use EXCEPT. I think that's about all I have in the way of demos. Kevin, would you like to take control back?
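A sketch of the three patterns preferred over the LEFT JOIN approach, again with AdventureWorks-style names as placeholders, for "customers who have never placed an order":

    -- NOT IN: correct here only because CustomerID in the orders table is not nullable
    SELECT c.CustomerID
    FROM Sales.Customer AS c
    WHERE c.CustomerID NOT IN
          (SELECT soh.CustomerID FROM Sales.SalesOrderHeaderEnlarged AS soh);

    -- NOT EXISTS: can short-circuit, and is safe even when the checked column is nullable
    SELECT c.CustomerID
    FROM Sales.Customer AS c
    WHERE NOT EXISTS
          (SELECT 1 FROM Sales.SalesOrderHeaderEnlarged AS soh
            WHERE soh.CustomerID = c.CustomerID);

    -- EXCEPT: concise when both sides have the same shape; note the implicit DISTINCT
    SELECT CustomerID FROM Sales.Customer
    EXCEPT
    SELECT CustomerID FROM Sales.SalesOrderHeaderEnlarged;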
Yeah, and just to add on to what you were talking about there: make sure you do look at all those different factors. Just as you said, if you're tuning an UPDATE statement, that's possibly a read-heavy operation, but it's also a write-heavy operation, so measure what is significant to the end users. So much of the time we think of what we do as purely technology related, but at the end of the day what we're really being employed to do is solve business problems, and we're just using technology to do that; sometimes we don't need to over-engineer a technological solution for a business problem that might have a very simple answer. Great demos, Aaron, thanks so much; that's really useful stuff. And folks, just as we have here on the screen, read this blog post. I don't mean to embarrass Aaron, but this is one of my favorite blog posts in a long time: very good, very informative, and a very systematic proof of these different techniques and which works best.

All right, we're right at the top of the hour. Thanks, everybody, for sticking with us throughout; we're very pleased to have you with us. Just a couple of quick follow-ups: if you're active on Facebook or Twitter, please follow us on those networks, and on LinkedIn as well. We've got a website at sqlsentry.net, and as we mentioned before there's SQLPerformance.com, as well as the answers component of SQLPerformance.com. One of the neat things you can do with our free Plan Explorer tool is upload your execution plans to the answers.sqlperformance.com website, and incredibly brilliant people like Aaron and other MVPs will give you feedback; it's almost like free consulting. We also have a variety of other tools for monitoring, event alerting and alarming, and things like fragmentation management and disk space management for your SQL Server databases. Those are all available as entirely free trial versions; they've got all the features, they're just time-limited. So we'd love for you to download some of them and give them a try, and if you're interested, schedule a demo with me and my team; we'll be happy to spend some time with you and show you how to use the tools and make the most of them. In particular, I'm going to put Lori Edwards to work on endless demos for you; she's shaking her head "no" at me across the table. Okay.

Great stuff. Thanks so much for your time, Aaron; that was a really good set of information. On Aaron's behalf, along with everyone at SQL Sentry, thank you so much for your time as our attendees; we appreciate you joining us. We've got another one of these strictly query-tuning sessions every week, pretty much, for the next two months, with the exception of a couple of U.S. holiday weekends coming up this summer, so we hope you'll join us for more of these throughout June and July. Please tell your friends and encourage them to come along too. Thanks very much, bye-bye.
Info
Channel: SentryOne
Views: 125,524
Rating: 4.8244805 out of 5
Keywords: query tuning, sql server query tuning, sql server query
Id: MFuUCRjnTEs
Length: 63min 44sec (3824 seconds)
Published: Tue Jan 28 2014