Troubleshooting SQL Server IO Performance/Bottleneck - Part 1 (sys.dm_io_virtual_file_stats) by Amit

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hi welcome to another video from sequel mistress in this video I am focusing on troubleshooting sequel server IO now sequel server IU concepts are huge there are many different ways on how you can troubleshoot sequel server IO and in this part one I am going to show you the usage of a DMV called DMI / qu awhile stats I intend to record multiple videos multiple parts so that I can show you all the different techniques on troubleshooting sequel server IO performance so as usual no slides let's jump straight into action so I am switching on to my DMV or the VM here now my focus in this demo is on this d MV d DM IO virtual file stats and I am getting all the data so I just put null to the parameters when I execute this DM v dynamic management view there are multiple columns the output so let me explain you some of these columns here I of course get the database ID and then using the DB name function I get the name of the database now each database can have multiple files it can have multiple log files it can have multiple data files file ID one will always be the first log file and then you will have multiple data files starting from file ID 2 onwards so you can see one of the databases here which I am going to use in my example the tuning database which has file ID 2 and then let's look at other column and understand the data now here are three important columns number of reads I use tall read MS and number of bytes read so number of phrase as the column name says this is the number of reads that have taken place on this file ID 2 of the database tuning I used told read MS in milliseconds this means that 456 milliseconds is the total amount of time I your requests had to be I had to wait before they were accepted or they were catered to so this is what we call as IO stall read in milliseconds and of course this is the number of bytes read from the tuning database now these numbers are of course very very low and just minuscule because I have just started or rather restarted sequel server instance so you are seeing this really like fresh new data and now when I simulate workloads you will see these numbers getting higher up the way you have this combination of number of reads and this and in milliseconds the stall for read and the number of bytes read likewise you have a combination of these columns for number of writes number of bytes written and I use tall write again in milliseconds how much time the the i/o requests had to wait there is this column I or stall which is nothing but a summation of I you stall read and i/o stalled right so if you let's say add five eighty five to five you get 590 so you see with reads statistics and write statistics and the total you have all the different mathematics that you can do to get more meaningful information now much like many other DMVs just simply select start from a DMV is seldom useful so you need to really do some mathematics I am NOT going to show you very complicated mathematics here I will simply just try to demonstrate on how you can use this data to understand if your i/o is slow the focus here is is not why it is slow but to first investigate if you really have a slow io problem now there is this concept of latency which is the which you can actually figure out from i/o stall read M s now this is the total cumulative value so if you divide this data by number of reads you would get that latency factor pull read so I have done some small mathematics here which is I used to read M s by number of reads and you get what we call as average read latency you could do this more right as well or for the total but in this demonstration I just want to get the concepts clear so let me uncomment this portion and when I execute this now I have my average read latency so what you can see is for the tuning database and sorry I also want to show you that I am ordering by I used tall descending it is meaningful to actually order this by I use tall read MS because I am going to focus on read workload and I will see the tuning database now is that fourth position because I am ordering by the total read M as in descending order so which means now when I run the workloads you will observe that yuning database will actually come on top here so what you can observe is the average read latency in milliseconds is like 5 8 18 so on and so forth and for our tuning which is our let's assume user production database it is 6.6 now this number this in milliseconds the read latency is very debatable is 6 good is 10 good is 15 good 20 good you know single digit number should you have single digit numbers or double digit numbers and sometimes in production servers you might see even triple digit numbers going all the way 200 milliseconds sometimes if you have really slow IO now Microsoft at some point published general guidelines as to what these numbers should be and they said that anything below 20 milliseconds is is good is satisfactory anything less than like 8 milliseconds is awesome it is superb so it really is how powerful your hardware is and it also depends on number of factors most importantly the i/o path your request goes all the way from server to the sand and there are a lot of parts in between the server and the sand you have you have all the wires you have the should be HBA the adapters you have the network cards and there are a lot of these configuration settings and then you have the lat you have the San switch and you have the network switch so all these components they form this I of path and there could be Layton sees and bottle legs in in any area which will really increase this value so what is the right value really talk to your vendor understand what kind of disks and what kind of configuration you have and you will have a better number but for the purpose of demo let's really go by what sequel server gives me here and I'm going to give you a very practical example so I'm running this on a laptop with SSDs so you're going to see good numbers because this is more like a direct attached storage my disk is on the computer where the data resides in real-world production environment data resides in San so so the data really has to travel over the wire nonetheless I'm going to simulate slow disk and I'm going to show you this demo and what are the numbers you should care about so let's go and run the workload so when I execute this now you can see everything is all good static now let's go and run some workload so I run this command and I am going to run some PowerShell scripts let me create a few more instances so now I have about 6 instances running and am I seeing some data yes you can you can see that there is some data coming in there from the PowerShell window which is which means my workloads are properly running now let's go back and look into the DMV output if I execute this and now you will see suddenly tuning database has jumped on the top because it's read latency or is highest right now because this is the only database I am running the workload against and you can see that I use tall and please focus on these numbers now maybe number of reads and I used all read M is an average read latency if you keep your eye on these you will see that number of reads is increasing executing again and again and you will see I used all read em s increasing that number is increasing and now the most important look at average read latency in in in summary number of reads and IO stalled read em as the cumulative value is not going to make much sense average read latency is what is the one data item that you really need to focus on and you can see the numbers is changing now it's five point six three now it is five point seven and let's again execute a few more times and then you will see again it's sticking at five point seven now it's back to five point six so you you can see it's playing pretty well between five point six five point seven or rather I should say less than six which is good it's good storage things are running fine all good now what's bad so the bad is if I go and make the disk a bit slow so let me go and select execute make disk slow and if I execute this command completed successfully and if I go back now and I execute this now you will see the average read latency has jumped to seven point nine now it's 8.08 point six now it's going up so I've engineered something which is making my disk slower and the latency factor is increasing now the number of reads are are happening the way they were happening before but the wait time for those reads have increased now remember the moment I stocked about wait times that's another way on how you can troubleshoot a slow IO so you can use perfmon counter you have a number of different wait types that you can focus on you have these DMVs that you can focus on as I said before i/o is very very critical for a high performing sequel server and and the focus really in this Holy Trinity of CPU IO and memory I do tend to focus a lot more on the IO now now you see the average read latency has gone all the way beyond 20 and it's not touching 30 so now really I'm crossing that general guideline and my latency is going pretty bad so well with this demo you have understood on what is that one data item that you should focus on is this latency fattens very easy to compute this and of course you can do more mathematics around this DMV to get more meaningful data out and before I close my demo I should make my disk and of course I can I just have make disk ready so I can run this and it will go really really bad but I don't want to do that I hope the message is clear so let me go and break it back to normal and when you see I'm trying to do that it does take some time and when you execute this back and you can see latency has gone really bad so in this demo you have seen how IO latency as one of the data items can be used to figure out if you have a slow IO DMV is one you have perfmon you can use extended events you can use weight statistics or in real world combination all of them to first understand whether you really have slow IO the identification is part one first thing and diagnosing and fixing is another thing so in this demo I was just trying to show you the concept of what IO path is and how you can have energy to eight IO paths and how you can troubleshoot it using DMVs using the latency factor hope this demo was worth your time I work for a dominant systems and here are some of the brands that I am associated with data platform geeks equal measures expand people with India Red Gate also I am associated with and every year we organize Data Platform summit a very large conference in Bangalore that focuses on Microsoft data platform and open source stack if you're interested you can always visit DPS 10.com and that website will host the conference for each year this is an in-person event my name is Amit Bunsen and I have been working with sequel server for many now I am a Microsoft Certified Master of sequel server and MVP also honored by Microsoft as Regional Director I speak at global conferences whenever I get time but my day-to-day job is to consult and train on sequel server and Microsoft data platform that's the URL sequel maestro's comm /a meth - Bunsen if you want to learn more about me please be connected with us on sequel maestro's here are all the links and you can connect with me on a mid Munsell dotnet that's my twitter a underscore Munsell i am also available on linkedin and facebook you can pause the slide and check out all these URLs or you can look at the YouTube video description also I want to bring your attention to some of the trainings that we offer in sequel vegetables but more particularly hands-on labs I believe the best way to learn is to do it yourself watching videos is nice it's always fun but trying it out is is the real fun so we have created this learning platform called hands-on labs you can go ahead and create your free account or just drop an email to actual sales at sequel maestro's comm in this platform this solution you have more than 100 labs on various technologies under Microsoft data stack and of course many of them are on sequel server but you have all stuff like as your sequel database as your data factory machine learning artificial intelligence lot of labs related to DBA and developer go ahead and try them out each lab has multiple exercises and instructions on how to execute so really fun way of learning by doing it yourself also join Data Platform geeks comm where you have lot of videos webinars event resources for you absolutely free you can go ahead and create your free account on Data Platform geeks com do join our Facebook groups and LinkedIn groups in case you have more questions would love to answer them with this thank you very much for your time watching this video hope it was worth your time and hope you learned something new I am available on Twitter at www.att.com/biz to fall we have a good day
Info
Channel: SQLMaestros
Views: 6,102
Rating: 4.6428571 out of 5
Keywords: sql server, microsoft sql server, SQL Server IO, SQL Server IO path, IO Troubleshooting in SQL Server, SQL Server slow disk
Id: NapCyVGwBHI
Channel Id: undefined
Length: 15min 39sec (939 seconds)
Published: Fri Apr 20 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.