Tuning SQL Server: Identifying and Fixing the CPU Issue

Captions
Hi, welcome back to the series on tuning SQL Server. This is the third video in the series. In the previous videos we looked at the data and then simulated a performance issue, so in this video, now that we know how to simulate performance issues, we're going to use some methods to identify problem queries.

To recap what we did previously: we used the ostress utility to call a procedure named SearchFlights, ten requests on each of 50 connections, 500 executions in total, and while that was running we saw high CPU utilization. I'm going to run it again so you can see the behavior, and while that's happening I'll bring up Activity Monitor so you can watch how running that query causes the CPU to blow up. Again, this series assumes you don't have much background in performance tuning, and that you're a developer who doesn't usually look into the details of how the server itself behaves and is more concerned with functionality.

You'll see immediately that the CPU spikes to close to 80%, and we know that's going to be a problem, because this is just one procedure. What if we had hundreds of other procedures and hundreds of other databases on the same server? So we need to optimize and tune the query, but we're going to pretend we don't know which query it is and identify what's causing the issue.

When you're faced with a situation like this, where your DBA comes back to you and says something is driving the CPU up and needs you to identify it, or maybe you're doing load testing on a pre-prod environment and the testing team reports that everything seems to work but your resource consumption is pretty high, it's always a good idea to come back and look at these queries, so you can identify the root cause and see if there's something you can do to fix it. As you can see, we're at about 70% CPU, which isn't good given that this is just the one procedure, so we need to identify which query is causing the problem.
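For reference, the ostress invocation described above would look roughly like this. This is a sketch: the server name, database name, and exact procedure call are assumptions, since they aren't spelled out in the transcript; only the connection and iteration counts (50 and 10) come from the video.

    :: 50 connections (-n50), each repeating the query 10 times (-r10) = 500 executions
    :: -E uses Windows authentication; -q suppresses result-set output on the client
    ostress -S. -E -dFlightsDB -Q"EXEC dbo.SearchFlights" -n50 -r10 -q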
I've already written the diagnostic query here to save some time. Essentially, it identifies the top 10 queries from sys.dm_exec_query_stats, a DMV that captures statistics about the different query executions. Note that this data is reset when the server restarts, so make sure you capture it before any restart. The DMV doesn't give you the text of the query directly; it gives you a plan handle and a SQL handle that we can use. We also get the creation time (which is today, since this is running on my laptop and I've just restarted it), the last execution time, the number of executions, and the total worker time. Total worker time is what we're interested in here, because from Activity Monitor we know CPU utilization is our problem. You'll also see the last worker time and the minimum worker time, but what I normally prefer to do is take the total worker time divided by the execution count, so that I surface the queries that have a high CPU cost on average.

You'll notice at this point that the CPU has actually dipped back down, but that's because the stress run has finished executing: it took about a minute and 58 seconds, close to two minutes, to complete the 500 executions. What we need to do is see if we can bring that time down, so that we can run more queries simultaneously rather than restricting ourselves to 500 executions before the CPU blows up to 80%.

When we run the diagnostic query, we get a result set showing the queries that caused the most CPU utilization. In my case, the first row is a single execution with a very long worker time, but that's actually an Extended Events session, which I'm not going to touch because it isn't related to the code we wrote. On the other hand, because I've already executed the stress test a couple of times, you'll see about a thousand executions of our procedure, and the procedure itself seems to be taking a large amount of CPU time. If you scroll to the very end of the result set, you'll see the code of the stored procedure, because we're using the sys.dm_exec_sql_text function to interpret the data behind the SQL handle, basically converting it into text that we can read. With that, we know which procedure is causing the CPU utilization. You'll also see that my CPU went high and then dropped after the run finished; in this case it took about three minutes to execute, and once it stopped, pretty much everything went back to normal.
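The diagnostic query itself isn't shown on screen in text form, but a minimal version of what's being described here — top CPU consumers from sys.dm_exec_query_stats, averaged per execution and joined to the statement text — would look something like this (the column list and aliases are my choices):

    SELECT TOP 10
        qs.execution_count,
        qs.total_worker_time,                                     -- CPU time, in microseconds
        qs.total_worker_time / qs.execution_count AS avg_worker_time,
        qs.creation_time,
        qs.last_execution_time,
        st.text AS query_text                                     -- the statement behind the sql_handle
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_worker_time / qs.execution_count DESC;

Remember that, as noted above, this DMV is cleared when the server restarts, so capture the results first.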
Now that I know which procedure seems to be causing the problem, the next step is to identify the execution plan of the procedure. To do that, I'm going to run the procedure with the actual execution plan enabled and see what's going on inside it. You can see that this procedure fetches two rows and takes two seconds, which is obviously pretty bad from an execution standpoint, and we need to fix that.

Once I go into the execution plan, it looks a little odd, because I'm doing some things inside the procedure to simulate random behavior: you'll see a TOP operator here, another TOP there, and a Sort. The reason for those is just to randomly generate some source and destination airports and some random dates, which we'll see when we look at the code. What matters in the plan is this: there's a table scan here consuming 33% of the cost, and if I scroll further there's another table scan on the Schedule table, again consuming 33% of the total execution effort, and then a Hash Match join taking 26%. Between the three of them, they account for most of the resource consumption, and just by looking at it, it's pretty clear they're doing table scans because they don't have good indexes. One of the first things I check whenever I troubleshoot is whether all the tables have the right indexes, which is not the case here for either the Roster or the Schedule table. Because the indexes aren't there, you'll see a very thick arrow moving from the table to the next operator, which indicates a table scan, and as far as that scan is concerned, we're fetching about a million rows as part of the operation. While that's happening, performance will obviously degrade, because the amount of memory required and other factors come into play, which we'll talk about in later videos.

The first thing I need to fix is to help SQL Server identify the rows that qualify for this query without having to fetch all the data. To do that, I need to actually see the query and what it's doing, so that I know which columns are being used and can create the appropriate indexes based on those columns. So I'm going to open up the procedure. As far as the procedure is concerned, we've got the Roster table and the Schedule table in the main query, and these two tables are the reason we're facing the issue. You'll see we've got the flight date, the origin airport ID, and the destination airport ID being used to filter the Roster table; between the three of them, they help me filter out specific rows from Roster, which I can then join with the Schedule table based on flight date and flight number. So these are the columns I need to use.

The next question is whether there's any specific order to keep in mind when indexing these columns. Just to give you an idea, I'll write them down: in the Roster table I've got the flight date, the origin airport ID, the destination airport ID, and the CRS departure time (the scheduled departure time, basically), and we're also joining with the Schedule table based on flight number. These are the columns I'll be using to do my filtering. The order you put them in is based on the cardinality of the data. A simple check: instead of SELECT * FROM Roster, I run SELECT DISTINCT FlightDate FROM Roster, and you'll see I've actually got 59 rows, which means I have 59 unique dates I can filter on. In terms of cardinality, if I take the total row count and divide it by 59, one date could on average map to about 16,597 rows, which doesn't give me the kind of selectivity I want, because I essentially need to eliminate as many rows as possible at the very beginning so that parallelism doesn't kick in. So I'm not going to lead with the flight date; instead I'll check whether the origin airport ID is selective enough. SELECT DISTINCT OriginAirportID gives me 307 rows, and dividing by 307 gives a better average: filtering on one airport ID could potentially bring the qualifying rows down to about 3,189, which seems to be a very good ratio, at least for the data set we're working with right now.
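These cardinality checks can also be condensed into a single pass over the table, something like the sketch below. The table and column names (dbo.Roster, FlightDate, OriginAirportID) are spelled the way the transcript describes them, so treat the exact identifiers as assumptions:

    SELECT
        COUNT(*)                        AS total_rows,        -- roughly a million in the video
        COUNT(DISTINCT FlightDate)      AS distinct_dates,    -- 59 in the video
        COUNT(DISTINCT OriginAirportID) AS distinct_origins   -- 307 in the video
    FROM dbo.Roster;

Dividing total_rows by each distinct count gives the average rows per value, which is exactly the ratio used above to pick the leading index column.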
The next candidate is the flight number, and SELECT DISTINCT FlightNumber gives about 6,502 rows that could be used for the join with the Schedule table. But if you look at our query, we're not actually doing any filtering based on flight number: it never appears in the WHERE clause to eliminate rows. Instead, we're eliminating rows based on the flight date, the origin airport ID, and the destination airport ID. So what I want to do is come back to the Roster table and create an index. I'm going to create a non-clustered index for the time being, keyed on the origin airport ID, followed by the destination airport ID, and then the flight date. The reason for this order is that I want the most selective columns at the top and the least selective at the bottom, so that as soon as I enter the index I can eliminate unnecessary rows. For more details on how indexes work, I've got another video posted already; I'll be putting a link at the end of this one so you can have a look.

Once this is done, I know how to navigate the index and get to the data I'm looking for. However, there are certain other columns the query uses that aren't part of the filtering criteria: the flight number and the CRS departure time. Because they're not being used to filter the data, I'm going to put them in the included columns, so the index will include the flight number as well as the CRS departure time. There is an interesting question somebody asked me during one of my trainings: couldn't we just put a columnstore index on this table, since we're basically fetching every column anyway? I'll come back to that when we talk about deadlocks, so let's leave it as it is for now. I'm going to execute this with the defaults; it shouldn't take too long.

OK, so at this point we've fixed the Roster table by creating the index we think is the best fit for the query we're troubleshooting. Now let's also look at the Schedule table, so we can fix both of them together and see their combined effect on the data set. In the Schedule table, all we're really using is the flight number, which we use to join with the Roster table, and the flight date; both also appear in the output of the query. So the Schedule table is fairly simple as far as our query is concerned: all we need to do is right-click, New Index, Non-Clustered Index, and the two columns we'll use are the flight number and the flight date. As you'll remember from earlier, the flight number is the more selective of the two, so we'll use it as the leading column, the entry point into the index.
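In T-SQL, the two indexes described above would look roughly like this. The video creates them through the SSMS dialog, so the index names here are placeholders:

    -- Roster: most selective key columns first, remaining query columns included
    CREATE NONCLUSTERED INDEX IX_Roster_Origin_Dest_FlightDate
        ON dbo.Roster (OriginAirportID, DestinationAirportID, FlightDate)
        INCLUDE (FlightNumber, CRSDepartureTime);

    -- Schedule: flight number is the more selective column, so it leads
    CREATE NONCLUSTERED INDEX IX_Schedule_FlightNumber_FlightDate
        ON dbo.Schedule (FlightNumber, FlightDate);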
With that, our indexes are created, and what we'll do now is test their impact on the overall query execution. I'm going to run the stress test again: the previous run of SearchFlights took close to three minutes, so let's see how long it takes now. I'll leave Activity Monitor open and execute, and there we go. You can see immediately that it's a lot faster than before. Coming back to Activity Monitor, let's see what the CPU utilization and resource consumption look like now. We know it used to take about three minutes for 500 executions to complete, and we want to see whether the new indexes have actually improved that. As you can see, the effect is immediate and drastic: it previously took three minutes to complete, and now it takes nine seconds, and at the same time our CPU utilization is well below the 80% it used to hit.

This is essentially what you want to do every time you do performance tuning: identify the root cause, make sure you've identified the right query, and then investigate that query to see where its problems are. In this particular case, we saw that the query's execution time was significantly high, and one of the reasons was that it was waiting on CPU while CPU utilization was high; we needed to bring CPU utilization down to improve the query's execution time as well as reduce the load on the server. I'll go ahead and run it once more to revalidate. It's always a good idea to do multiple runs so you can take an average and rule out any anomalies or spikes that might be one-offs. So far it looks good: we've got an execution time of about seven seconds, and that's seven seconds for 500 simultaneous queries, which seems to have fixed the performance issue we were talking about. The next question you obviously need to ask yourself is whether this is enough or whether you should tune further, and we'll talk more about that in another video. I hope you enjoyed this video, and thank you for watching.
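One follow-up check that isn't shown in the video but fits the revalidation step: after the re-run, confirm that the new indexes are actually being used for seeks rather than scans. A sketch, using the placeholder index names from above:

    SELECT
        OBJECT_NAME(ius.object_id) AS table_name,
        i.name                     AS index_name,
        ius.user_seeks,            -- should be climbing after the re-run
        ius.user_scans             -- should stay near zero for these indexes
    FROM sys.dm_db_index_usage_stats AS ius
    JOIN sys.indexes AS i
        ON  i.object_id = ius.object_id
        AND i.index_id  = ius.index_id
    WHERE ius.database_id = DB_ID()
      AND OBJECT_NAME(ius.object_id) IN ('Roster', 'Schedule');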
Info
Channel: Jayanth Kurup
Views: 8,631
Rating: 4.9080458 out of 5
Keywords: administration, AWS, Azure, Bangalore, Bengaluru, BigData, BLR, Business Intelligence, Cloud, Consultant, Consulting, Corporate, database, development, Enabled Business Solutions, expert, how to, India, Jayanth, Kurup, Migration, ms sql server, MSBI, MSSQL, power pivot, Power Query, PowerApps, PowerBI, Python, RDBMS, restore, security, SQL 2008, sql 2008 r2, sql 2012, SQL 2014, SQL 2016, SQL 2017, SQl 2019, SSAS, SSIS, SSRS, trainer, tuning, Performance, Training, PaaS, IaaS
Id: 7bYqReGx1hM
Length: 17min 44sec (1064 seconds)
Published: Tue Nov 27 2018