6 Query Tuning Techniques to Solve 75% of Performance Problems — Amit Bansal

Captions
Hey, well, good morning, good afternoon, good evening everybody, depending on where you have joined from. Thank you to the GroupBy organizers and founders for giving me this opportunity to deliver a session for the worldwide community. This is my second session with GroupBy, and I'm quite glad and grateful that I get to talk about a session that is very close to my heart: six query tuning techniques that will solve 75% of your performance problems. Not *your* performance problems; your SQL Server's performance problems, basically.

The motivation behind this session title is quite simple. When we work with SQL Server in trainings and consulting engagements, we go from customer to customer, and there are common anti-patterns that we see, common problems that they are dealing with. Whether it is 70% or 75% really doesn't matter; what I try to do is learn those things at customer locations, sitting with the people, the developers and DBAs, and I have created a few academic examples of those common problems. I have seen practically that if I just take this toolkit to a customer and say "here could be the potential problems", it really covers close to 65, 70, 75, and in some cases maybe even 80 percent of performance issues. Those are the common ones I'm going to talk about. And because they are common ones, you might know some of them, which is absolutely fine; it could be a good refresher, or you might learn something new in a demo that I'm going to present. But there could also be a couple of examples that are new to you, that you have never heard about, or that you have heard of but have little experience with. That's the whole idea of this session.

With 60 minutes, this is going to be a fast-paced session. I would request, and will be grateful, if you can ask all your questions at the end. You can keep punching your questions into the Q&A panel in Zoom, and Daniel, I, and the other moderators can help take them up at the end of the session.

My name is Amit Bansal. I am an MVP and MCM, working with SQL for many years now. I am on Twitter; A_Bansal is the Twitter handle. I work for a company called SQLMaestros.com, and you can go there; there is a lot of great content and videos out there. I've just posted something in the chat window so that you can grab all the necessary links. If you like my session and would like to subscribe, use the bit.ly "connect with AB" link, and if you want the scripts and demos from this session, just drop an email to the connect-with-AB address at SQLMaestros.com. I will shamelessly subscribe you to my list, but then you can unsubscribe; either way you will have the demos with you. And yes, if you like the session, you can tweet the GroupBy conference and A_Bansal; I hope I've got that Twitter handle right.

So let's get started with the demos. Six query tuning techniques means six demos, starting with the most basic ones, and then we'll move on to slightly more advanced stuff. Basic, intermediate, and advanced really all depend on your experience and expertise with SQL Server. Let's start with the first one, which is non-sargability.

OK, so I see a chat message: "I can only see a part of the screen." Is that the case with all participants? Daniel, is the screen fully visible? "I can see all of it: Object Explorer, Solution Explorer, everything." Right, and if the attendees can just quickly confirm that you can see the full screen, that will really help.
OK, all good. All right, great. Someone is saying that part of Solution Explorer is cut off, so let me try this; maybe they can try to resize their Zoom window. I did share the entire screen now, so I hope that will be better. "I think you're good." Perfect, let's get started; we can't wait anymore for the demos.

The first one: non-sargability. This is the most basic one, the most common one, and the real pain point here, the setback, is that you have the right index in place and you are not seeking on the index. For the beginners, the people who are new to SQL Server or to indexing structures, the idea is quite simple: an index is a B-tree structure, and it helps you get to the data faster. But remember, merely creating an index does not improve performance; you've got to seek. When you have an index, it is a B-tree structure, and you traverse from the root through the intermediate level down to the leaf level. If you have an index and the optimizer just scans the leaf level of the index, that's not all that great; it's back to scanning. In most cases you want the optimizer to seek, and as I said, the biggest pain here is that you have the index but the optimizer is not seeking, and non-sargability is a common reason for that. Now, "sargability" is not really an official English word; it's a made-up term which means "search ARGument ability": the ability of the optimizer to use the search argument to seek on the index.

Let's use AdventureWorks2016 for this demo. Some people don't like AdventureWorks, but it's not that bad a database, and even if I had used any other database the learnings would have been the same. We're creating an index on the Sales.SalesOrderDetail table, and the column is ModifiedDate; that is my index key column, and I'm including SalesOrderID. The point to note here is that I would like to seek on this index, and I'm going to search on the column ModifiedDate.

Now, SET STATISTICS TIME and SET STATISTICS IO: these two statements are quite handy, quite powerful when you're doing performance tuning and query tuning work. TIME ON will give me the elapsed time and the CPU time, how much time the execution engine has spent, and IO ON will give me a lot of information about logical reads and physical reads. In this demo they are not very important, because this first demo is quite elementary, but let's turn them on anyway.

Now look at the query itself. We are fetching a few columns and joining two tables, Sales.SalesOrderHeader and Sales.SalesOrderDetail, but look at this portion of the WHERE clause: we're doing some conversion on ModifiedDate. The moment you look at this query, I'm sure many of you know where the problem is: you're doing some mathematics on the left-hand side of the equality operator. The best practice here is to keep your columns, your attributes, as neat and clean as possible. The problem for the optimizer is that when you take an attribute and do mathematics on it, converting or casting or anything like that, it becomes an expression. The optimizer is then just comparing one expression with another expression, and it loses the seeking capability. So even though you have an index on ModifiedDate, SQL Server will not seek; instead it will end up scanning.
This is a very common anti-pattern that we see with customers when their developers write queries. Let's turn on the actual execution plan and execute this. Jump into the execution plan and you will see that you are scanning, which is what you didn't want. Take the cursor over the scan and look at the object: it is the object we just created, idx_sod_modified_date, and there is a scan happening on it. Now go to the Messages tab, because I did turn on SET STATISTICS TIME and IO. The interesting thing to note here is the I/O factor: look at SalesOrderDetail and the number of logical reads; 337 pages were read.

The fix is quite simple: make sure that your columns are clean and you're not doing any arithmetic on the column on the left-hand side of the equality operator. Here is the equivalent of the same statement; you're just rewriting the T-SQL, and you have to of course ensure that both T-SQL statements, the old one and the revised one, are functionally equivalent. If I execute the rewritten one and go to the Messages tab, you can see that SalesOrderDetail logical reads is just three. So 337 logical reads have come down to three, and if you look into the execution plan, you are now seeking, which is exactly what you wanted. Take the cursor over the operator and you can see the object is idx_sod_modified_date and you're seeking on it. This is the most elementary, most basic stuff, and I'll tell you, it is very, very common: out of 10 customers, eight to nine will have many queries where this problem exists.
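To make the pattern concrete, here is a minimal sketch of the two forms against the idx_sod_modified_date index just described; the column list and date literals are illustrative, not the demo's exact script:

```sql
-- Non-sargable: wrapping ModifiedDate in CONVERT() turns the column
-- into an expression, so the optimizer can only scan the index.
SELECT sod.SalesOrderID, sod.ModifiedDate
FROM Sales.SalesOrderDetail AS sod
WHERE CONVERT(date, sod.ModifiedDate) = '2014-06-08';

-- Sargable rewrite: leave the column clean and express the same
-- filter as a range, so the optimizer can seek on the index.
SELECT sod.SalesOrderID, sod.ModifiedDate
FROM Sales.SalesOrderDetail AS sod
WHERE sod.ModifiedDate >= '2014-06-08'
  AND sod.ModifiedDate <  '2014-06-09';
```

Both statements return the same rows; only the second one keeps the column untouched so the index seek is possible.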
Now, this demo does not end here. As it turns out, this battle between scan and seek goes beyond sargability, so let's jump to the second demo, which is kind of an extension of this one, but not really about sargability. This is a classic case where you have the right index and there is no sargability problem, yet you're still not able to seek, and this is interesting.

Let's use AdventureWorks2016 again and turn on the actual execution plan. The query says SELECT * FROM Person.Person WHERE FirstName = 'Ken'. Quite simple, and I know that there is an index in which FirstName is an attribute; it is one of the index columns. Let's execute this and see what's happening behind the scenes. You get the data, but if you jump into the execution plan, you again see that a scan is happening, not a seek, and this is something you may want to fix. Take the cursor over the scan and look at the object: the index name is IX_Person_LastName_FirstName_MiddleName. The name of course rings some bells: this is an index where LastName is the first column, FirstName is the second column, and MiddleName is the third column, which means this is a multi-column index.

When you have a multi-column index, things get a bit tricky. In the real world, single-column indexes are quite easy to deal with, but multi-column indexes are a lot trickier. In my query I have WHERE FirstName = 'Ken', and here is the rule of the optimizer: in a multi-column index, if you are not seeking on the first column, you cannot seek on the second column, and if you're not seeking on the first and second columns, you cannot seek on the third column. That's because in a multi-column index the order of columns matters, but the order of columns in your query does not matter. I want to repeat this statement, because there's a lot of misconception here, and when I show these demos developers scramble around and try to change the order of columns in their query. Well, it doesn't matter: the query gets submitted to the optimizer, the parser parses it, and internally a query tree, or parse tree, is formed, which is an internal form that the optimizer understands. So it really doesn't matter in which order you put your columns in the WHERE clause, in the predicate, but the order of columns in the index does matter. If the index is LastName, FirstName, MiddleName, remember it is a left-based subset; the search happens from left to right. The optimizer wants to first seek on LastName, but LastName is not in your query, so it cannot seek on FirstName either. It goes to the leaf level and starts scanning the entire leaf data, which means all the pages: it is a scan.

Let me prove this to you. You can also see that there is a missing index hint here, which again is something you see every now and then, so let's right-click and view the missing index details. Missing index hints have a bad reputation; you have read and learned so much about them. I think they're not very bad. They may not be extremely good either, but remember, a hint is a hint: the idea is not to follow them blindly, but to take this guess from the optimizer, from SQL Server, and see if it makes any sense. When I look at this hint, it tells me to just go ahead and create an index on FirstName, which is quite obvious; I think it almost always recommends creating a covering index. I'm not going to follow this blindly, but I'm going to keep it in mind.

Now let's get back to the query. Here is another version of the same query, but this time I have included LastName. As you can expect, a seek should happen, because now we are seeking on LastName and FirstName. We don't have MiddleName, but it doesn't matter, because MiddleName is the last column, the third one; searching happens from left to right, so the optimizer during execution can seek on LastName and then on FirstName. Let's execute: you get the data; jump into the execution plan, and there you go, you are now seeking. Take the cursor over the operator and you can see it's the same object, LastName, FirstName, MiddleName, and you have your seek predicates.

As it turns out, this is another common thing that we see with customers: they have the right indexes in place, but the indexes are not being seeked upon, and this is the common issue I see with multi-column indexes. So always keep this rule of the optimizer in mind: it is a left-based subset, and you have to seek on the preceding columns if you want to seek on the subsequent columns; otherwise execution will resort to scanning. That's the second demo done, and I hope I'm doing well on time.
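As a sketch, assuming the stock AdventureWorks index IX_Person_LastName_FirstName_MiddleName, the two queries from the demo look like this (the LastName value is illustrative):

```sql
-- Scans: FirstName is the second key column, and the leading key
-- column (LastName) is not in the predicate, so no seek is possible.
SELECT *
FROM Person.Person
WHERE FirstName = N'Ken';

-- Seeks: the predicate now includes the leading column, so the
-- optimizer can seek on (LastName, FirstName); MiddleName, the last
-- key column, can be omitted without losing the seek.
SELECT *
FROM Person.Person
WHERE LastName = N'Myer'
  AND FirstName = N'Ken';
```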
Let's move on to the third demo: implicit conversion. Keep your questions coming into the Q&A panel and I will take all of them at the end of the session. Quick recap: the first demo was about sargability; the second was about having the right index but seek not happening. Third one: implicit conversions. This demo starts with something very basic, but then I'll go advanced. I'm not only trying to show you that implicit conversions are bad for SQL Server, which is something all of you probably already know; I am going to do level zero, then level one and level two.

Let's quickly be done with level zero. Here is a query; let's turn on SET STATISTICS TIME because we want to observe the performance. We are doing a varchar-to-nvarchar conversion in this query: a SELECT statement fetching a few columns and joining two tables, and now look at the interesting thing, WHERE AccountNumber equals some literal. On the right-hand side of the equality operator, I have put an N prefix. With the N prefix you're telling SQL Server that this is Unicode data, which means the literal's data type is nvarchar. If you take the cursor over AccountNumber, and of course you would know your database design and model, you can see AccountNumber is varchar. So you have varchar on the left-hand side of the equality operator and nvarchar on the right-hand side, and SQL Server will attempt an implicit conversion. When you leave this decision to SQL Server, deciding which data type to convert from and to, which is the source and which is the target, SQL Server is going to spend some time deciding that, looking into permutations and combinations.

Let's turn on the actual execution plan and execute the query. You get the output; now look into the execution plan. The moment you look into a plan, you know what to look at, right? The operators, the expensive ones, the thickness of the arrows, the data flowing. But look at the SELECT operator: there is a warning symbol there, and if you take the cursor over it, the warning clearly says that the type conversion in the expression may affect the seek plan and the query plan choice. So this is happening. If you go to the Messages tab and scroll down a bit, you can see that SQL Server has spent about 250 milliseconds of CPU time, and the elapsed time was about half a second. This is the important part: 250 milliseconds of CPU cycles were spent on this conversion business.

As I said, this first part of the demo is quite elementary, and the fix is quite simple: ensure that you have the same data type on both sides of the equality operator, so I will just remove the N prefix. Watch this: all I've done is remove the prefix, and now that you have varchar data on both sides of the equality operator, let's execute again. What you're going to see is that the execution plan is all very neat, no warning symbols anymore, and if you go to the Messages tab you can see that CPU time is zero milliseconds and the elapsed time has also been cut in half.
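A minimal sketch of level zero, assuming AccountNumber is a varchar column as in the demo (the table and the literal value are illustrative):

```sql
-- Implicit conversion: the N prefix makes the literal nvarchar while
-- AccountNumber is varchar; the plan shows a type-conversion warning
-- on the SELECT operator and CPU time goes up.
SELECT CustomerID, AccountNumber
FROM Sales.Customer
WHERE AccountNumber = N'AW00000017';

-- Same data type on both sides: no conversion, no warning.
SELECT CustomerID, AccountNumber
FROM Sales.Customer
WHERE AccountNumber = 'AW00000017';
```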
Now comes the interesting part. In some other conferences and sessions, early on when I was showing these little pieces of demos, developers and DBAs had this question: both queries ran in less than half a second, so does it really matter to save every millisecond? I say yes, it does matter if you end up saving a few milliseconds, because it's not just about this single query. This is a demo environment: I'm the only user, I'm running a single query, and everything runs in less than half a second. But think about a typical, classic OLTP query which comes from so many applications; so many users will hit SQL Server with these reporting or application queries, and if you multiply it out, look at the magnitude of how many milliseconds, how much CPU time, you are going to save.

The idea is very simple. Let's go back and put that N prefix in again, let the implicit conversion happen, and run this, say, 30 times. When I say GO 30, I'm going to send this query to SQL Server 30 times, back and forth between the client and the server, and I'm going to turn off the actual execution plan, because every execution would generate a plan, which I don't want. All I want is the total execution time of these 30 iterations. Let's execute, and you will see that every execution is a round trip between the server and the client. Execution is going to be a little faster here because the server and the client are on the same box; in the real world you know that's not the case. The total execution has taken about 14 seconds; you can see it in the status bar. Now let's fix it, which means we remove the prefix again; I'm just trying to show you the scale of it in a very simplified manner. Run it again, and now the same 30 executions of this query run with no implicit conversions. Earlier it took 14 seconds, and now, friends, it is done and has taken only 5 seconds. So if half a second or 200 milliseconds was not convincing, 14 seconds down to five seconds should be convincing.

Now comes the third part of this demo, and the most interesting thing, which is not very well known, though again it depends on whether you have come across it; I learned it the hard way, trying things and playing around with all this stuff. One option is to leave the conversion to SQL Server; the other is to do the conversion yourself. Can I do an explicit conversion? I can tell SQL Server: this is the data I'm comparing with, and I want to compare this nvarchar literal to varchar, doing an explicit conversion using the CONVERT function, telling SQL Server exactly which conversion I want. This is going to be equally fast. The idea is that implicit conversion should not happen: either ensure that the data types are the same, or do an explicit conversion.

Let's try this. First turn on the actual execution plan; we're doing the explicit conversion; execute a single execution, and it is done. The execution plan is neat, as expected, with no warning symbol on the SELECT operator. Now the important thing: go to the Messages tab, and oops, only 15 milliseconds is what this has taken, maybe slightly more, but much less than 250 milliseconds. Now the fun part: let's do it 30 times, since this is what we wanted to do; turn off the actual execution plan.
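What we are about to run looks roughly like this; a sketch where CONVERT's target type assumes the varchar(10) column from the demo and the literal is illustrative:

```sql
SET STATISTICS TIME ON;
GO

-- Explicit conversion: tell SQL Server exactly which conversion you
-- want, instead of leaving the implicit-conversion decision to it.
SELECT CustomerID, AccountNumber
FROM Sales.Customer
WHERE AccountNumber = CONVERT(varchar(10), N'AW00000017');
GO 30  -- SSMS repeats the batch 30 times to show the cost at scale
```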
Let's run this query 30 times. With implicit conversion it took 14 seconds; with the same data type on both sides it took 5 seconds. Now, with explicit conversion, is it better or worse? Let's see. Friends, that has taken just five seconds, very close to having the same data type. So explicit conversions are better than letting SQL Server do implicit conversions. That's the third demo.

Now, if time permits, I'm going to show you something based on questions that usually come up; I was doing this session at DataWeekender the other day and the question came: these implicit conversions and warnings are happening in SQL Server all the time, so how do you identify them in your workload? How do you find them? You might guess it: the answer is Extended Events. Extended Events is there to help, and you can create a session where you choose the relevant event; I don't remember the exact name of the event, something like "convert implicit", but you can choose that event, create a session, and track these occurrences. Of course you should put the right filters on, such as which database you want. If time permits, I will do that at the end of the session. OK, let's close this one; that's the end of the third demo. Six query tuning techniques that will solve 75% of performance problems: three are done.

Let's go to the fourth one: dynamic SQL and the plan cache. This demo is quite long, so I will do a shorter version of it to ensure that I don't eat into the next speaker's time. This is about developers writing queries and not parameterizing them explicitly. Parameterization is an important aspect of SQL Server, and I'm sure all of you have heard about explicit parameterization, parameter sniffing, dynamic SQL with its good and bad sides, and parameter sniffing with stored procedures. It's quite a huge topic; I'm only focusing on what I really see with customers.

Let's look at a query here; let me scroll down and first take you to the query, and then of course I'll run some DMVs. Here is a simple query: the developer wants to write a dynamic SQL statement which selects some employee information, EmployeeID, NationalIDNumber, and so on, from AdventureWorks, which means I should use AdventureWorks; let's do that. And it says WHERE ManagerID equals... now this is interesting. The SELECT query is encapsulated in a local variable, @SQLString; you can see @SQLString has been declared as nvarchar(500), and we're doing a CAST. Where is the manager ID coming from? It comes from a variable called @cnt. Again, this is a dummy example, but the real world is similar. We're running this in a loop, and we are executing this query for 100 managers, so to say; just assume manager IDs 1 to 100. Then we use sp_executesql, which is used to run queries in a dynamic SQL way, and we pass in the string @SQLString. When you execute this, what's going to happen? SQL Server internally is going to create one plan for each execution of the query. That's going to be very bad, because you will end up with 100 plans. SQL Server is not automatically going to parameterize this; it is a dynamic SQL execution. Dynamic SQL is quite powerful, but this is the bad way of using it.
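A minimal sketch of the anti-pattern, assuming a dbo.Employee table with a ManagerID column as in the demo (the table and column names here simply follow the description; older AdventureWorks versions had such a column):

```sql
DECLARE @cnt int = 1,
        @SQLString nvarchar(500);

WHILE @cnt <= 100
BEGIN
    -- The manager ID is concatenated into the text, so each of the
    -- 100 statements is textually distinct and gets its own ad hoc
    -- plan in the plan cache.
    SET @SQLString = N'SELECT EmployeeID, NationalIDNumber, Title '
                   + N'FROM dbo.Employee '
                   + N'WHERE ManagerID = ' + CAST(@cnt AS nvarchar(10)) + N';';

    EXEC sp_executesql @SQLString;

    SET @cnt += 1;
END;
```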
Let me show you the impact of this by freeing the proc cache first. And remember, friends: DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS are statements that are not supposed to be run in production environments; this is just a demo, so I'm doing it. Before I run the query, I am going to look at the plan cache size, so let me take these two helper queries and put them in a new window. First I'm going to see how many plans are in the plan cache, and then I'm going to look at the total plan cache size in kilobytes. Let's free the proc cache once more before running the query, just to be a little realistic, and now execute: you can see that there are two plans in the plan cache and the total size is 192 kilobytes. Good enough; a lot of internal stuff would have run in just a second or two. Execute again; we're still there, no problem: two plans, and 192 kilobytes is the size of the plan cache.

Now let's scroll down to the query. Time to run this bad SQL statement, so let's execute. This is going to run 100 times, for 100 managers, with the loop, and the variable @cnt gets updated with each iteration. What's happening behind the scenes? It's taking a bit of time, as you can see. We're done; the total execution took about 15 seconds. Let's look at the plan cache now, and you will see: wow, 103 plans in the plan cache, and the size has gone from 192 kilobytes to about 1,800 kilobytes. I can tell you that 100 of those plans are the ones that just got created by virtue of that execution.

How can I prove that to you? There is an interesting query here which fetches all this information from the query stats and the query plan, doing a CROSS APPLY with the SQL text (you will have all the scripts, no worries). The interesting thing to note is that we are grouping by query_hash. The query hash is the signature of the query, and all 100 executions have the same signature, so to say, because the only differentiating factor is the parameter value: 1, 2, 3, and so on. When you group by query_hash (in the GroupBy conference, in this session!) and execute, you will see 100 entries here, and you can see the query: it's the one we were running, SELECT EmployeeID, NationalIDNumber, Title, and so on. This proves the point, and you can see the size has increased from 192 kilobytes to 1,800 kilobytes. Friends, this is happening in your environment, so why not take this query and run it on your production box? You may get a lot of surprises.
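The helper queries are along these lines; a sketch against the standard DMVs, not the demo's exact script:

```sql
-- How many plans are cached, and how big is the plan cache in KB?
SELECT COUNT(*)                   AS cached_plans,
       SUM(size_in_bytes) / 1024  AS total_size_kb
FROM sys.dm_exec_cached_plans;

-- Statements that differ only in their literals share a query_hash
-- (the query "signature"), so ad hoc bloat shows up as many cached
-- entries for the same hash.
SELECT qs.query_hash,
       COUNT(*)     AS cached_entries,
       MIN(st.text) AS sample_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
GROUP BY qs.query_hash
ORDER BY cached_entries DESC;
```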
Now the important thing: how do you fix it? Explicitly parameterize your query. I'm zooming out a bit just to make sure that the query fits the entire screen. Look at this query now; a couple of things are similar to the previous one. What's new is that we are adding another variable called @ParmDefinition (put any name you want), nvarchar(500); this is going to contain all my parameters. My SELECT statement looks quite similar and is assigned to @SQLString, but it closes without any concatenation; remember, in the previous one we appended "ManagerID = " plus the value. We're not doing that. Instead I am declaring a parameter (let me correct myself: a parameter, not a variable), @ManagerID, of type tinyint, because I know my manager ID is not going to be more than 100 or 200, so why take int or bigint when tinyint will do. I declare the parameter @ManagerID and assign it to @ParmDefinition.

Now comes the interesting part. My SELECT statement is done, my parameter is declared and assigned to @ParmDefinition; the only thing left is to combine them. We call sp_executesql for the dynamic execution, and I tell SQL Server: the statement is inside @SQLString, my parameters are inside @ParmDefinition, and the last part is where the value comes from: @ManagerID = @cnt. If you're seeing or doing this for the first time it might look a little tricky, but it's just a syntactical thing to understand: you pass the variable which contains the SQL string, then the variable which contains all your parameters separated by commas. A question that might be coming to your mind: what if your statement has multiple parameters? Just separate them with commas in the definition, each with a name and a data type, and then supply the values the same way. The @ManagerID here must match what you declared inside the quotes, so @ManagerID = @cnt, comma, next parameter equals some local variable, comma, next parameter equals some other local variable, and so on. It's as simple as that.

Let's execute this one. First DBCC FREEPROCCACHE, check the helper queries again (two plans, 192 kilobytes), and now run the parameterized version; I hope it does well. The execution is done, and you remember the bad version took about 15 seconds; the total execution here takes about 15 seconds again, so the difference is really about the CPU time and the cache, not the elapsed time. Let's look into the plan cache: there you go, now you have four plans and 264 kilobytes. This is quite good.

Where is the performance aspect here? Remember, friends, these are just academic examples, short queries playing around with a few milliseconds; it doesn't make sense for me to write very complicated queries that take minutes to run, because you would get bored, quit this session, and go to some other one. That's why I keep it very short and crisp and just get the learnings and the message across. Each plan that gets created, as in the previous case, goes to the optimizer and has to go through multiple phases: parsing, compilation, optimization. Optimization itself has multiple phases (phase zero, phase one, phase two) and multiple runs, multiple permutations and combinations; theoretically there could be hundreds and thousands. It is a CPU-intensive process, and that is where SQL Server's concept of plan reuse comes into the picture. In this case that's exactly what is happening: there's only one plan in the plan cache for the query, a parameterized plan, whereas in the previous case there were all ad hoc plans with their specific manager IDs. Plans are being reused and CPU cycle consumption is much lower.
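A sketch of the parameterized version, using the same hypothetical dbo.Employee table as above:

```sql
DECLARE @cnt int = 1,
        @SQLString nvarchar(500),
        @ParmDefinition nvarchar(500);

-- The statement references a parameter instead of a concatenated
-- literal, so its text is identical on every execution.
SET @SQLString      = N'SELECT EmployeeID, NationalIDNumber, Title '
                    + N'FROM dbo.Employee '
                    + N'WHERE ManagerID = @ManagerID;';
SET @ParmDefinition = N'@ManagerID tinyint';

WHILE @cnt <= 100
BEGIN
    -- All 100 executions reuse one parameterized plan from the cache.
    EXEC sp_executesql @SQLString,
                       @ParmDefinition,
                       @ManagerID = @cnt;
    SET @cnt += 1;
END;
```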
And then there is plan cache bloat. The previous case was plan cache bloating: hundreds and thousands of versions of plans for essentially the same query. Now, if you see, it's only 264 kilobytes, and that also matters, because even though hardware is big today, you have powerful CPUs and servers with a lot of memory, that doesn't mean you should just squander all the memory you have. The plan cache gets its share of the main memory; the main memory is given to the buffer pool, to the plan cache, to all these different memory clerks and consumers. And if the plan cache hits its limit, plans will get evicted from it, which again is not a good thing, because if the same queries come again they will have to be recompiled, and so on. So it's always good to ensure that no plan cache bloating is happening. Friends, let's close this; that was demo number four.

Now let's go to the fifth demo, which is bookmark lookups. Wow, it looks like a roller coaster, right? Fifth demo of six query tuning techniques. This one is about bookmark lookups, so, lookups; let's jump to the demo straight away. I'm using AdventureWorks2016 here, and thanks to Adam Machanic, I've used Adam's "big" AdventureWorks scripts, because you sometimes need big data to show those numbers of seconds. Adam is a great query tuning guy; he's not very active in the community nowadays, but some of his content was really awesome. So let's use AdventureWorks2016 and the big tables that I created using Adam's script: dbo.bigProduct, which has a few thousand records, and the dbo.Transactions table, which has a few million records.

Let's execute this query; I'm turning on the actual execution plan. What is this query really doing? There is an outer expression here doing a CROSS APPLY with an inner expression; the inner expression talks to the Transactions table and joins with the product table on product ID. So your inner expression works with the Transactions table, and the entire query is pumping data into a temporary table called #x, with some filter provided. Let's execute and see how much time this takes; in my previous run this took about six to seven seconds, so let's hope it is around that. OK, it takes about seven seconds, and you can see about 3.2 million rows get pushed into the temporary table. All good. Look at the execution plan: it is a parallel plan, everything seems to be good, no issues as such, no warnings, no spilling, nothing at all, and there is no lookup. Since this demo is about lookups, please note: there are seeks, but there's no lookup here. All good.

Now please follow this. What happens is, for some reporting requirement, for some other application requirement (just a made-up case), there is a need to add an additional column to the Transactions table. Someone says, OK, let's add a CustomerID column to the Transactions table. This has nothing to do with your query; it is some other requirement, but a DBA just goes and adds the CustomerID column to the Transactions table. You might wonder whether this is how it happens in the real world; trust me, it happens quite often. Anyway, we add this CustomerID column (Daniel is laughing pretty loudly there), and we drop the temporary table that we created.
To be fair, let's also drop the clean buffers, because we got the data from disk into memory, so it is cached. Now let's execute this query again. What's the difference now? Remember, the Transactions table has this additional CustomerID column, so maybe this query will have to deal with it. The actual execution plan is turned on; let's execute. While this is executing, I will have some water. You know there is a saying that the query is running so slowly that I can go to the cafeteria and have a cup of coffee; this is a real metaphor here. The query is so slow that I can comfortably open the bottle, drink, talk to you guys, say hi, smile (Daniel, hope you're doing good), go out for a walk, and the query is still running. It generally completes in about 30 to 40 seconds, but it's taking a bit longer, so I was about to copy it into a new window and show you the estimated plan instead; but while I was doing that, the execution completed, so let me close that. How much time has this taken? 59 seconds. Something that completed in seven seconds earlier has now taken about 59 seconds. What's going wrong?

Go into the execution plan, which is what many of you will do with a slow-running query: look into the plan and try to do some educated guesswork about what's going on. This plan is relatively simple, but two things will immediately catch your attention. One is the sort operator: there is an exclamation mark, a warning, on it. The other is the key lookup. Why does the lookup catch your attention? Because of the cost factor: 80%. As you know, the basic guideline for execution plans is to look at all these iterators, or operators, and check their cost percentages, and this key lookup is very expensive.

Sometimes we might be inclined to look into the sort operator first, because it carries a warning, and I can tell you this warning is about spilling: hover over it and you can see that a spill happened, spill level 1 and level 2, spilled threads, and so on; there wasn't enough memory and data had to be written back to disk. What many developers and DBAs would do is go back to the query and try to fiddle around with the sort. But for the sake of brevity: the sort is not the problem here. The problem is the key lookup; compare the cost factors: seven percent for the sort versus 80% for the lookup. I'm not saying I wouldn't look at the sort operator myself, but the key lookup is the problem here. Why is it so expensive? Because it is being executed 3.9 million times.
What is happening? I will show you the details, but here's the summary. There was a covering index which covered all the columns of the Transactions table that the query needed. Remember, the CROSS APPLY inner expression was working with the Transactions table and it said SELECT *, which means get all the columns, and there was an index which had all those columns. When you added CustomerID, it was not part of that index, which means the index is no longer covering. So to fetch the data for CustomerID, the optimizer had to go and look up the base table, and it did that 3.9 million times, because that is the number of records inside the Transactions table, and that is why it turned out to be so expensive.

If you right-click and look at the missing index hint here (I'm showing this a second time in this session, because hints are good and these missing index suggestions are not so bad), it is telling me to add CustomerID as a column and make the entire thing a covering index. But I know, by virtue of knowing the database model and the design of this table, that there is already an index, something like this: it has ProductID and TransactionDate as keys (TransactionID is the clustered index key, so it is automatically included in all the non-clustered indexes, which is why it isn't mentioned explicitly), with Quantity and ActualCost included. CustomerID is not in that index. That is where missing index hints are bad: they ignore what you already have and just blindly recommend creating covering indexes.

So what is the solution? Well, the worst thing you could do is go ahead and create the index that the missing index hint suggests: another covering index which would be almost completely redundant. That is a solution, but it's probably the worst one. The second thing you could do is modify the existing index, and why not show it to you: go to the AdventureWorks database, Tables, expand dbo.Transactions (let me refresh; you're right, Daniel, I was in the wrong database, thank you), then Indexes, and there it is: ProductID, TransactionDate. Let's script it out; the fonts are quite small there, and wow, you generally get a dialog box like this in the process. OK, there you go: you can see this index on ProductID and TransactionDate (as I said, TransactionID is the clustered key), with Quantity and ActualCost included. What's not there is CustomerID. So your second solution could be to modify this: drop it, recreate it, whatever, and get that done.

But you know, friends, here is the real crux, number three: you are violating a best practice, which is that you have written SELECT *. Why would you do that? If you want a certain set of columns, just put those column names. If this query had only the column names that you wanted, which were ActualCost, ProductID, TransactionDate, and so on, it would never have run into this problem. Even though so many SQL folks in the community keep shouting at the top of their voices to avoid SELECT *, we still do it. That is the culprit here; that is the demon; and that is the fix. So let me scroll down a bit: drop the temp table (the CustomerID column is still there on the base table), DBCC FREEPROCCACHE and DROPCLEANBUFFERS, and look at another version of the query.
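Sketches of solutions two and three, with index and column names following the demo's description (the index name is hypothetical, and DROP_EXISTING assumes you keep the existing index's name):

```sql
-- Solution 2 (works, but heavier): rebuild the existing index so it
-- covers the new column again.
CREATE NONCLUSTERED INDEX ix_transactions_pid_tdate
ON dbo.Transactions (ProductID, TransactionDate)
INCLUDE (Quantity, ActualCost, CustomerID)
WITH (DROP_EXISTING = ON);

-- Solution 3 (the real fix): name only the columns you need, so the
-- existing index still covers the query and no key lookup is needed.
SELECT ProductID, TransactionDate, Quantity, ActualCost
FROM dbo.Transactions
WHERE ProductID = 1001;   -- illustrative filter
```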
This is what I'm saying: you have the column names spelled out, and when you run this, it executes in the same six, seven seconds, because now it's not dealing with the CustomerID column. That's the real thing to do.

There are quite a few other learnings from this very demo. Of course, bookmark lookups are actually good; they are not bad. Let me give a real perspective: nothing is always bad and nothing is always good. You just can't go ahead and create covering indexes for every query; there can't be one covering index per query, that's not the solution. There will be bookmark lookups, because you will have some columns that are not part of the index and they have to be fetched from the base table, so it's a good thing. But as I said, in the real world bookmark lookups can sometimes really create havoc; the problem here, of course, was SELECT *, and you can see that the query runs in seven seconds now because you fixed it by removing the star and putting in the right columns.

The other learning is about testing your queries with big tables. I see that when developers write these queries, they test them against tables that have relatively small amounts of data, and when the same query goes to the production environment, which has real-world, big data, it runs very slowly. That's the problem. I've seen developers testing their queries with empty tables, too; those queries run fine all the time. That is where the problem lies.

I'm also trying to answer a question that came from another session: a bookmark lookup happens when the optimizer is looking up a base table which has a clustered index. Someone asked me whether this is similar to a RID lookup. A RID lookup happens when the optimizer does a lookup on a base table which has no clustered index, so the base table is a heap. The operation is the same, the logic and the concept are the same; it is just called a RID lookup because if you do not have a clustered index, every row is identified internally inside SQL Server by a row identifier (RID). So that's bookmark lookups, and demo number five is done.

Let's jump to the last demo of the session. Six query tuning techniques; five have been shown, and this is the sixth one. It is a popular one, and I put it last so that I can relax a bit into it, because I'm assuming that many of you know about parameter sniffing; it has been talked about a lot and there are so many resources around it. So I thought that while the session is ending, I'll calmly discuss it and show you a ready-made solution with a hint.
Let's close this one. All right: AdventureWorks2016, not a bad database. Let's create this stored procedure called GetCustomerShipDates. It has two parameters, @ShipDateStart and @ShipDateEnd, and if you look at the query itself, it selects a couple of columns, and the criterion is ShipDate between start and end; a very simple stored procedure. Let's go ahead and create it, and then we create an index on ShipDate to support the query; this is required so that I can show you the difference that parameter sniffing makes.

Now comes the interesting part. This is a stored procedure, and stored procedures are generally good; one of their advantages is that they encourage plan reuse. This is a compiled object inside SQL Server: on its first execution the optimizer creates a plan, the plan stays in the plan cache, and subsequent executions reuse it. That's the good thing. Let's free the proc cache. What I have here are two executions of the stored procedure: the first one fetches, let's say, all the data, from 2005 to 2020, and the second one fetches just 10 days of data, from 10 July 2005 to 20 July 2005. That's the difference.

Let's see what happens when I execute the first one, fetching all the data, with the actual execution plan turned on. You will see that the optimizer decides to scan; there is a clustered index scan happening there. Now I execute the stored procedure a second time. Remember, the scan plan has already been injected into the plan cache, and the second execution comes with different parameter values; that's the point to note. The first execution was low-selectivity, meaning it fetched a large amount of data; the second execution, with different parameter values, is highly selective, fetching only 10 days of data, a small set. When you execute the second time, because the plan is already in the plan cache, the optimizer reuses it and you again get a scan. This is expected; no issues there.

Now let's free the proc cache and execute the stored procedure in the reverse order. Execute the highly selective one first, the one which gives you a small set of data, and when you execute it you will see (not really a surprise) that for these parameter values, with a small data set, a seek, in fact a seek with a key lookup, was the better choice for the optimizer. I freed the proc cache, so there was no plan in the cache, and this first execution gives me the seek plan, which gets injected into the plan cache. Now comes the other execution of the stored procedure, with different parameter values, returning a large amount of data. A scan would have been better, but because a seek plan with a key lookup is already in the plan cache, this execution resorts to using the seek with key lookups; you can see it in the execution plan. It doesn't take long on this box because the data set is small, but if this were doing the same 3.9 million records, it would have taken more than 30 seconds to run. If you look into the key lookup, you can again see the number of executions: 31,000 here, but you can imagine what the scenario would be if this were 31 million. It's a coincidence that a key lookup comes up again here; you may try to connect it with the bookmark lookup in the previous example, but it has nothing to do with that. Anyway, you understand that this could have been bad.

So what's happening? This is parameter sniffing: the optimizer sniffs the parameter values and generates an optimized execution plan based on them. Actually a good thing, not a bad thing, but as I said, the coin has two sides, the good and the bad, and sometimes it can turn out to be very bad; I just showed you an example of that.
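A minimal sketch of the demo's setup, assuming Sales.SalesOrderHeader as the underlying table (the procedure name follows the demo; the column list and dates are illustrative):

```sql
CREATE OR ALTER PROCEDURE dbo.GetCustomerShipDates
    @ShipDateStart datetime,
    @ShipDateEnd   datetime
AS
BEGIN
    SELECT CustomerID, SalesOrderID, ShipDate
    FROM Sales.SalesOrderHeader
    WHERE ShipDate BETWEEN @ShipDateStart AND @ShipDateEnd;
END;
GO

-- Low selectivity first: a scan plan is compiled, cached, and then
-- reused even by the highly selective second call.
EXEC dbo.GetCustomerShipDates '2005-01-01', '2020-12-31';
EXEC dbo.GetCustomerShipDates '2005-07-10', '2005-07-20';
```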
Parameter sniffing is generally good, but these things are happening in your environment every now and then. If the scan plan is in the plan cache and all subsequent executions with different parameter values reuse it, sometimes it's all good. But if the seek plan with a bookmark lookup gets into the plan cache, and then a variety of parameter values over a skewed data distribution come in, resulting in ineffective reuse of that plan, the query can run very slowly. That is the reason we get all these complaints from business users: "sometimes my report runs absolutely fine, sometimes it runs slow." You haven't done anything, but sometimes it starts running fine again and you pat yourself on the back that you did everything; all that stuff keeps happening in production all the time.

What is the fix? Well, there are many fixes. There is Query Store nowadays, in 2016, 2017, and 2019, which can help you identify parameter sniffing quite easily; you can just right-click and force a specific plan, all that good stuff. You can use hints like OPTIMIZE FOR UNKNOWN, or you can use local variables to circumvent parameter sniffing. But I'll show you a quick and easy way, which is using OPTION (RECOMPILE). And not recompiling the entire stored procedure: that was the SQL Server 2000 era, when we used to recompile whole procedures. Starting from 2005 we have statement-level recompile, where I can tell SQL Server: only this statement, please, recompile it every time it comes and hits SQL Server. So let me drop the stored procedure and create it again; observe that the statement now carries the hint OPTION (RECOMPILE). It's a quick fix, I would say, not the best fix.

Now, if you free the proc cache and execute the first one, what do you expect? You expect a scan; there you go, you get a scan. I am not freeing the proc cache, and I do the second execution: earlier, without OPTION (RECOMPILE), the query would be forced to use the scan plan, but now, executing this again means the statement is compiled again and a plan is generated again, so you get a seek. Go back to the first one, and you get a scan again. Every execution generates a new plan; OPTION (RECOMPILE) is effectively telling SQL Server, don't store the plan in the plan cache, because I'm going to come again and you've got to recompile anyway.

You might be thinking: that's bad, right? In contrast to what I was showing earlier, because compiling a plan takes extra CPU cycles, optimization, and so on. Well, it's a trade-off; there is no magic bullet here. You decide what works best for your environment. Sometimes, as I said earlier, there is a lot of CPU power not being used; those days are gone when we used to choke at 99% all the time, so there are CPU cycles available to be consumed. If the trade-off is good, if you're really saving a lot of time and the business users and the client are happy, that's what's important, and with a little bit of extra CPU cycles you can go for it. But if the trade-off is not good, maybe you've got to think about other solutions.
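The statement-level fix looks like this; the same sketch as above, with the hint added:

```sql
CREATE OR ALTER PROCEDURE dbo.GetCustomerShipDates
    @ShipDateStart datetime,
    @ShipDateEnd   datetime
AS
BEGIN
    -- Only this statement is recompiled on every execution, so each
    -- set of parameter values gets a plan suited to its selectivity,
    -- at the price of extra compilation CPU.
    SELECT CustomerID, SalesOrderID, ShipDate
    FROM Sales.SalesOrderHeader
    WHERE ShipDate BETWEEN @ShipDateStart AND @ShipDateEnd
    OPTION (RECOMPILE);
END;
```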
Well, with this we come to the end of the six query tuning techniques, and this has been a non-stop demo session. I hope you enjoyed all the demos, and yes, all the demos worked; that's the big thing. Friends, thank you for your time, and I'm ready to take questions. If you liked this session, do tweet; appreciation is always good. Here is my Twitter handle, A_Bansal, and Daniel can confirm that the GroupBy conference handle is right there too. Even if the session wasn't good for you, you can still tweet, no problem; there is always scope for improvement. Visit these links; if Daniel allows me, I can paste them again in the chat window so you can stay connected with me. And now, happy to take questions.

[Daniel] All right, so there was mostly agreement and not all that many questions. There was a discussion in Slack where Mark Freeman says Entity Framework defaults string parameters to nvarchar(4000) and datetimes to datetime2(7); devs have to watch out for that. Magnus says: we had one of those implicit conversions in a very busy database; that simple single query used as much CPU time as all other queries combined, and it was hard to fix because it was an ORM doing the conversion; we solved it in the end by fixing the mapping for the ORM. And Tony adds: it's not so much Entity Framework but rather .NET, which uses Unicode strings. So I think you really hit the nail on the head with the implicit conversions and the nvarchar thing; that seems to be a major issue.

[Amit] Right, and I think I promised the audience that if time permits, and I think I have a few minutes, Daniel, with your permission, I can show them.

[Daniel] Yeah, lots of time; you actually have another 30 minutes available in the slot, so just use whatever you need.

[Amit] Perfect, thank you. So, friends, if you are still watching the screen: in Object Explorer, under Management, you see Extended Events. Right-click, New Session; I can take the wizard or the New Session dialog, either way is fine, so let's take the wizard just to get this done very simply. It is very similar to Profiler, and I'm assuming all of you have been using that. Let's name the session "TrackConversions", move next, and I'm not going to use any template; I'll just quickly add the event. You know what I do: there are so many events, just like there are so many wait types, so I just play around with the search. I typed "convert" into the finder, and there you can see the event whose name I was not able to remember correctly earlier: plan_affecting_convert. Interesting names; who chooses these names, by the way? Anyway, plan_affecting_convert; add that, and it comes with its default payload and a few columns. You can add additional columns, which in the Extended Events world are called actions, or global fields. Let's take, say, sql_text, and let's say session_id, and move next. Then you should of course filter; one of the Extended Events best practices is: don't collect anything and everything, be very choosy and selective, because even though Extended Events is lightweight, it does add a minimal overhead, and why would you want that when you're so focused on tuning? We start the event session immediately and watch the live data. Because I have not set any filter, it will just pull in a few things that are already running behind the scenes. Now let's open the implicit conversion demo again (this was the implicit conversion, wasn't it), use AdventureWorks2016, and run it, because what I want to show you is how you track this.
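A T-SQL equivalent of those wizard steps might look like this; the event and action names are the real ones, while the session name, filter, and target are illustrative:

```sql
CREATE EVENT SESSION TrackConversions ON SERVER
ADD EVENT sqlserver.plan_affecting_convert
(
    ACTION (sqlserver.sql_text, sqlserver.session_id)
    -- Best practice: filter, don't collect everything.
    WHERE (sqlserver.database_name = N'AdventureWorks2016')
)
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION TrackConversions ON SERVER STATE = START;
```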
Now let's go and open the implicit conversion demo — this was the implicit conversion query, wasn't it — use AdventureWorks2016, and run it; what I want to show you is how you track this. So an implicit conversion has just happened — there you go. Let's stop the Watch Live Data window, and here you are: the event got captured. If you zoom in you can observe the fields: session_id was 81, convert_issue shows "Seek Plan" — a plan-affecting convert — the expression is available, and you get your sql_text right there. Of course you can also capture database id, database name and so on.

There could be other ways too: you can write a query that shreds the plan cache XML and searches within it. You can do that, but I'm not a big fan of it, because that might be an expensive query in itself on a production server — you might have hundreds of thousands of plans in the plan cache, and shredding the XML of all of them and searching within the strings could be costly. The Extended Events session is likely a little more lightweight.

Daniel: I thought the convert_implicit detail in the Extended Events session was really helpful — it shows you exactly which column converts to which data type. Amit: Yes, yes indeed.

Okay, I see a question in the chat window — friends, if you can use the Q&A panel that would be great, but I'll take this one from Forrest: "Amit, how do you feel about plan guides when trying to parameterize ad hoc queries?" Plan guides are a good way; they implement this plan-freezing concept. I started using plan guides when the concept was introduced — I think it was SQL Server 2008 or R2 — but for whatever reason, and maybe the SQL community can give you a better answer, I haven't seen huge practical usage of them. Maybe the implementation was a bit tedious: remember, it was all T-SQL, you had to create multiple objects and then enforce them. Now, if you use Query Store — and I think behind the scenes of Query Store, plan guides are effectively what is being implemented — when you right-click a specific plan in Query Store and force it, you're telling SQL Server: every time this query fragment comes in, please use this plan. A plan guide is essentially happening behind the scenes. But look at the difference: plan guides were a lot of T-SQL work and manual implementation, whereas Query Store is just so quick — enable capture, click, and you are done. So you're right that whether we use plan guides or Query Store, we are actually freezing the plan. Not sure if I answered your question, Forrest.
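For reference, that Query Store right-click in SSMS maps to plain T-SQL; a minimal sketch, where the search string and the IDs are placeholders you would look up first:

    -- Find the query and its plans in the Query Store catalog views
    SELECT q.query_id, p.plan_id, qt.query_sql_text
    FROM sys.query_store_query AS q
    JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
    JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
    WHERE qt.query_sql_text LIKE N'%SalesOrderHeader%';  -- placeholder search

    -- Force the chosen plan (42 and 7 are placeholder IDs from the query above)
    EXEC sys.sp_query_store_force_plan @query_id = 42, @plan_id = 7;

    -- And to release the freeze later
    EXEC sys.sp_query_store_unforce_plan @query_id = 42, @plan_id = 7;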
Ashish has a question: "Can you please provide more insights on sargability — in which contexts should we pay more attention?" Yes indeed — sargability is a huge subject, and I only took a very elementary example, a seek not happening because of arithmetic on the column. But remember those queries where you have an index on FirstName and you write WHERE FirstName LIKE '%amit'? You've put a wildcard character at the start, and sargability is lost, because index searching happens from left to right: if the optimizer does not know the starting characters — because the first character is a wildcard — it has to resort to scanning. So 'amit%' is fine, because the starting point is known, but '%amit' is not. There are so many examples — Daniel, does anything come to your mind?

Daniel: Using functions on the left side, like the absolute-value function ABS. Most commonly it's dates and strings — those are the ones I see all the time. Amit: Right, and what I most commonly see is T-SQL functions applied to the column on the left-hand side of the equality operator; those are the most frequent offenders — wildcards not so much. Daniel: And even when the data types are correct, such predicates are not sargable, because the function has to be evaluated for every single row that passes through, which slows things down — in a WHERE clause that's just terrible. Amit: Okay — thanks, Ashish.
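A quick illustration of both patterns, using AdventureWorks-style objects (the existence of suitable indexes on OrderDate and FirstName is an assumption here):

    -- Not sargable: the function hides the indexed column, forcing a scan
    SELECT SalesOrderID
    FROM Sales.SalesOrderHeader
    WHERE YEAR(OrderDate) = 2014;

    -- Sargable rewrite: an open-ended range on the bare column allows a seek
    SELECT SalesOrderID
    FROM Sales.SalesOrderHeader
    WHERE OrderDate >= '20140101' AND OrderDate < '20150101';

    -- Same idea with strings: a leading wildcard cannot seek, a trailing one can
    SELECT BusinessEntityID FROM Person.Person WHERE FirstName LIKE '%amit';  -- scan
    SELECT BusinessEntityID FROM Person.Person WHERE FirstName LIKE 'amit%';  -- seek possible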
Daniel: Thanks, Ashish — I think we've done all the questions. I thought that was a great session, and you finished way ahead of time as well, which is super cool. These six issues pretty much summarize what I do as performance tuning — it's what I usually see. I would maybe go into some nested loop joins as well, but this is pretty much what you see in performance tuning. An anonymous attendee says: "Thank you for this wonderful session — it's really helpful when you're new to performance tuning. How do you manage lengthy stored procedures with complex execution plans?"

Amit: Lengthy stored procedures with complex execution plans — I don't have a ready-made answer, but I'll tell you how I do it: break up the stored procedure into the individual statements that are running, take them one at a time, and look for problems in each execution plan. SQL Sentry Plan Explorer is quite helpful there, because I get much better information out of the plan — cardinality estimations that are going wrong, and so on. There's no predefined way; it's about figuring out in which area of the stored procedure the performance issue lies. I have also used Extended Events with causality tracking: I debug the stored procedure from a performance perspective in multiple steps, so I know how long each portion took to execute. With causality tracking I can use the activity ID to find out how much time a particular portion of the stored procedure has taken, and so find the problematic part. You may have a long stored procedure with a hundred statements, and not all hundred are causing performance issues — there could be just one or two, so figure out which ones, isolate them, and troubleshoot them. One caution: when you use the Extended Events causality tracking feature, be careful, because it also adds a bit of overhead on a production server — but it is very, very powerful.

Daniel: That was a nice segue to our next speaker as well, Andy from SentryOne, who is going to discuss — I'm drawing a blank on the exact title — the query analyzer product you talked about, Plan Explorer. Amit: Plan Explorer, yes — that's a golden tool; it's absolutely amazing. Daniel: Another thing I use often is SET STATISTICS — it prints to the Messages tab, and you can copy and paste that into a site called statisticsparser.com, which turns it into pretty tables so it becomes human-readable. That helps me a lot.

There's one more question: "Can I get more videos on performance tuning?" Well, you should go to the GroupBy YouTube channel — there are a lot of videos out there — and you can go to the SQLMaestros YouTube channel as well; whenever I have time I record a lot of videos there. And off the top of my head, Brent Ozar does a lot of videos full of performance stuff on his channel too.

Host: Since you have a little extra time, Amit — do you have any free tools you'd recommend related to your session, for people in the community to get their hands on? Amit: Yes, there are a lot of free tools we can use. sp_WhoIsActive is a great one — I've been using it for many years to track down what's going on inside SQL Server and find problematic queries. Brent Ozar's collection of scripts is very popular. Early on I used the PAL tool — Performance Analysis of Logs — when collecting Performance Monitor counter data, and tried to correlate that with DMV data; PAL was helpful. I've used SQLdiag as well, which was helpful years back, and I used SQL Nexus, but not anymore — these days DMVs do enough. So: sp_WhoIsActive, DMVs, and a combination of Extended Events — those are the free community tools I use. I haven't used much of dbatools; I know it's very popular from an administration perspective, but maybe some of you can enlighten me on whether it has performance-tuning features. Daniel: I don't think it does much for performance tuning — it's very much for administration and practical things. Amit: I thought so. I use many of those, and of course in the paid world there are many, which it wouldn't be right to mention here. It's a little bit of hard work, but creating baselines does the job: I have a complete baseline framework that I use for work, capturing all the DMV data into baseline tables periodically — a mini framework in itself, which gives me a lot of good information in combination with DMVs, Extended Events and PerfMon.

Host: Perfect — beyond awesome, thank you. Everyone's budget is frozen this year, as we keep saying, and with everything going on, nothing can beat free right now; taking the worry over price away from people is definitely great. All right, we do have some extra time, so I'll leave it with you both — Daniel, let me know if you need me. Daniel: Do we have Andy in the house?
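As a flavor of that baseline idea, a minimal sketch of one snapshot table (the table name and the choice of sys.dm_os_wait_stats are assumptions; a real framework would cover several DMVs and run from an Agent job):

    -- One-time setup: a table to accumulate periodic wait-stats snapshots
    CREATE TABLE dbo.WaitStatsBaseline
    (
        capture_time        datetime2     NOT NULL DEFAULT SYSDATETIME(),
        wait_type           nvarchar(60)  NOT NULL,
        waiting_tasks_count bigint        NOT NULL,
        wait_time_ms        bigint        NOT NULL,
        signal_wait_time_ms bigint        NOT NULL
    );

    -- Scheduled step: snapshot the DMV; deltas between capture_times form the baseline
    INSERT INTO dbo.WaitStatsBaseline
        (wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms)
    SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
    FROM sys.dm_os_wait_stats;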
There is another question from Ashish: "If we have nested SPs, how do we figure out which SP is consuming more time — any tips or tricks?" Ashish, the answer is similar to the previous one, because one SP inside another SP is much like a statement inside an SP, so the idea is the same: I can use the same techniques to isolate which part is causing the problem. Remember the old techniques — we used print statements and things like that to keep track of where the performance hit was happening. But now, with wait stats and Extended Events, as I said, it's quite powerful and relatively easy. The guiding principles are the same, whether it is a single large stored procedure or multiple SPs nested within each other.
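For completeness, the causality-tracking technique mentioned in both answers boils down to one session option; a minimal sketch (session name, target file name, and the session_id filter are placeholders):

    CREATE EVENT SESSION [trace_proc_statements] ON SERVER
    ADD EVENT sqlserver.sp_statement_completed
    (
        ACTION (sqlserver.sql_text, sqlserver.session_id)
        WHERE (sqlserver.session_id = 81)  -- filter to the session under test
    )
    ADD TARGET package0.event_file (SET filename = N'proc_statements')
    WITH (TRACK_CAUSALITY = ON);
    -- With TRACK_CAUSALITY on, every event carries an activity GUID plus a sequence
    -- number, so all statements from one outer call, including nested procedures,
    -- can be grouped together and their durations compared.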
Info
Channel: GroupBy
Views: 587
Rating: 5 out of 5
Keywords: sqlserver, groupby, SQL Server, SQL, Performance, Performance Tuning, Query, Query Tuning, Amit Bansal
Id: JobHumSyRXg
Length: 73min 27sec (4407 seconds)
Published: Tue Nov 17 2020