Monitoring & Troubleshooting Host and VM Performance in VMware vSphere

Captions
Welcome to TrainSignal. You are watching Monitoring and Troubleshooting Host and VM Performance. So let's talk a little bit about tools for performance monitoring. This is a fairly simple lesson, mainly because we've already talked about esxtop in the past, but I just want to give you some specific counters to look at and things like that.

We'll start with esxtop, or resxtop, the remote version, like we've talked about, and then we'll look at vCenter performance charts, which a lot of people are already familiar with. If you've been doing any vSphere administration, you're probably pretty familiar with those, but we'll hit on the advanced features. The two tools sometimes show similar information in a different format: where one may show a percentage, vCenter will often show a total count. With errors, for example, esxtop will give you an error percentage for network transmits, whereas vCenter will give you a running count of errors in frames. It's just something to keep in mind when you look at these: sometimes you can't really do a direct comparison of the numbers.

We already hit on esxtop earlier in the course, but it's important to understand. esxtop is a tool similar to UNIX top, and by similar I mean they have the same look and feel and some of the same information. It's used to monitor performance metrics. A lot of times we'll do it in real time, which is how we'll do it in the lab in a minute, and it's a great way to sit there and look at something: if you're seeing a problem, you can watch things like latency, errors, usage, or utilization and see what's going on.

You also have the option to run it in batch mode, save the data in CSV format, and then pull it into another tool. That's handy if you want to capture it over a length of time and then go back and review your findings. To do that, again, you'll use a delay, which says how often, so like a delay of -d 5 there
will say every five seconds, and iterations, or the number of times. So it's five seconds times 200, so you'll be doing this for a thousand seconds, and then we're outputting it into that metric capture .csv file. So you can pick and choose how often you want to grab the metrics and how long you want to run it. You of course don't want to get too aggressive, because it does put load on the system; usually every few seconds is a good snapshot. And again, we talked about these: some common tools are Windows Perfmon; Microsoft Excel, where you can import CSV files straight in; and the esxplot tool. The latest version of esxplot is available there.

So let's take a look at a couple of common metrics. One is %RDY. This is the percent of time a VM is waiting to be scheduled, and it's a common one. This is what happens when a VM is sitting there ready to have a thread executed on a physical processor and it can't, because the vSphere scheduler says, hold on, I've got other stuff going on, I'll get to you in just a split second. Often we'll see this with VMs with multiple vCPUs, because the scheduler is trying to place them as close as it can on the same CPUs they had before, for cache locality and things like that, and it's having issues scheduling them. So it's often due to the overuse of vCPUs. If you need the vCPUs, great, but if you don't, you're causing scheduling issues and you'll hit high ready time.

So now to my little anecdote. A good friend brought me in to look at their dev/test environment, and they were having terrible performance. They were running really high consolidation ratios, because these servers had a terabyte of RAM each, and there were a couple of them, and they were running about two to three hundred VMs per host. That's a lot, but it's not unheard of. But their performance was bad. You'd look at the CPU use and it wasn't terrible. These were, I think, 40-core servers, 4 by 10 cores, and so
it's a lot of cores. You're looking at CPU use and it isn't terrible, but their ready time was crazy. It was like twenty to thirty percent, so twenty to thirty percent of the time when a VM was ready to have a thread executed, it couldn't. So they were getting just abysmal performance. I want to see ready time under ten percent. I really want to see it under five percent, but under 10 is pretty acceptable to most people. If you want your best performance, you really want it under five, so keep it under that.

%CSTP: this is excessive use of virtual SMP, which is causing co-scheduling problems. Basically it's having to do switches; it's having to schedule multiple vCPUs. You want to see this under three percent; that's a good rule of thumb here.

Then you have %SYS: the percent of time spent by the system on behalf of the world, or basically the VM. It's usually caused by high I/O. The system is having to do a bunch of background stuff, a bunch of interrupts, and it's chewing up a lot of time. So you want to see this under 20%. If it's more, again, it's usually caused by high I/O for some reason, and there's a lot of CPU work being done to move that data in and out. There's not a lot you can do, but at least take a look at it. Make sure you need to do that I/O, and make sure the I/O is being done efficiently: the right drivers, the right devices for the guests, good NICs and HBAs in the servers, things like that.

And then the last one, %MLMTD: the vCPU was ready to run but was artificially limited. Basically, you've got a CPU limit, so I really want to see this at zero. I won't say I'm not a big fan of CPU limits; just use them sparingly. If you see this above zero, it usually means that you've got a limit that you may have forgotten about or that someone else set, so take a look and dig into that.

Some common memory metrics. First, MCTLSZ: basically the balloon
driver in the VM is being used to reclaim memory. So if you see this at 1 or higher, it means ballooning is going on and your memory is being constrained. Some ballooning is fine, but if it's a chronic issue or it's causing performance problems, it at least gives you a place to look and says go look at memory.

The next one, SWCUR, says the host has swapped memory to disk at some point. This is kind of like a running total. Any time a host swaps to disk, it's bad; performance goes through the floor. It's not as efficient as a guest swapping, like Windows swapping or Linux swapping. At least the guest knows what it's swapping; it knows it's memory it's probably not going to need. When vSphere does it, it's just swapping, and it doesn't have that level of understanding, so your performance just goes through the floor. Then you have swap read and swap write, which mean it's actively swapping to disk, either reading or writing. Again, bad, bad, bad.

CACHEUSD says the host has compressed memory at some point. I want to see this at zero. If you remember, we do transparent page sharing, we do ballooning, we do compression of memory, and then we swap. So compression is that last stand before we start swapping to disk, so you really want this to be zero. ZIP/s means the host is compressing memory right now and is about to start swapping, so again you want that at zero. UNZIP/s says the host is accessing compressed memory, so it's uncompressing it. If it's above zero, that means at some point it compressed memory and now it's uncompressing it. So again it's moving memory in and out, that takes CPU, and we're right on the brink of swapping to disk.

And then N%L, which is about local NUMA access: the percent of time that memory access is local. Remember, NUMA lets us access memory directly from the CPU. If it's not local, we have to go through another CPU to get to the memory, and that adds latency. The threshold you're looking at is eighty percent of the time you want this to
be local. So if it's less than that, if this is sixty, fifty, thirty, twenty, it means there are some issues. You want to look at how your VMs are configured, you want to look at your memory config in the server, and you want to see if you can tweak this to make it work a little bit better.

So now a little bit on vCenter performance charts. Many are already familiar with these charts, but for the exam I want you to know how to create new default charts, configure proper metric time windows, and confirm vCenter performance logging levels: basically the advanced side of the performance charts and the advanced settings. Learn how to go through there. I highly suggest you spend a few minutes at some point going through host and VM performance charts, looking at the advanced options, looking at the different subgroups, and seeing what your options are. The reason is that if you get a question on that, I don't want you to have to go through and start digging and remembering where these things are. I'm all about time on this exam, and I want you to at least be able to go quickly to that point. So become familiar with it and understand what you can and can't see in those charts.

So now it's time to hit the lab. We're going to do a quick lab on esxtop, where we'll look at those key performance metrics that we just talked about, and then the vCenter performance chart configuration and usage. So with that, let's jump to the lab.

Let's go ahead and start with esxtop. Jump on Optimus again. We'll expand this out; I like more room with esxtop. By default it's going to show mostly the CPU stuff. So, a couple of things we want to look at. %RDY: we can see CPU ready, and I really don't have anything too busy. I've got a bunch, probably 15 VMs, on this box, and most of the time it's the idle process, so I definitely don't have too many vCPUs trying to vie for scheduling. But there you go, there's
%CSTP, which is your excessive use of virtual SMP. Again, that's all zeroes, nice and happy. Let's see, %SYS, here. We talked a minute ago about how I want to see this somewhere under 20: percent of time spent on services, usually high I/O. I don't have anything with that high I/O. The only one that's up here is my Untangle VM, and it's at 0.33. That's my network firewall, so it is doing some I/O, but obviously it's less than half a percent; I wouldn't really call that anything to worry about. And then, let's see, anything artificially limited? %MLMTD, here it is. I don't have any artificial CPU limits, so that's all zero. My lab's fairly boring, but it at least shows you where things are.

So let's hit m and switch to memory and see what we can find. Let's see, memory, ballooning, MCTLSZ. I may have to do some fields there. Yeah, J is that field. Hit Enter. MCTLSZ is right about here. There we go, and again, zero. I don't have any memory ballooning; not too much overcommit on memory there.

Let's see, swapping. SWCUR, here: we have swapped some, so we can see that a little bit in the past, but nothing too much. Same here, some swapping. Let's see, anything currently swapping? SWCUR... let's do my fields. I guess I've already got them; I need to fix the order. So uppercase moves it left; I'll move those over a bit. Memory control, there we go. Move these over a little bit. Swap read, swap write: nothing actively swapping right now that I see, so no problems there.

CACHEUSD: again, let's take a look at my field order. Hit o and let's see, where's that? Could it be memory allocations? NUMA stats; I'm going to need NUMA here in a second, so I'll go ahead and hit f. We haven't turned that on, so we'll turn that on, and I'll turn on NUMA, which was G. Let's see if those have showed up or if I need to move them again. Oh, I need to move them again. o for order, and compression, move this way back. ZIP and UNZIP, see, right here,
all zeros, so we're not actively doing anything. CACHEUSD, oh, there it is; I was like, I knew I forgot one. Right here, a little bit of compression. It's possible we're seeing a little bit from the past, for, like, vShield here. At some point I have put some of these hosts in maintenance mode and moved stuff around, so again it's a running total, but my ZIP/s and UNZIP/s show it's not doing anything right now.

And then NUMA access. Let's see, that's not here, so go back, o for order. NUMA is G, so let me move it to the left again, and let's see: ZIP, UNZIP, and N%L. Nothing going on there. Actually this is a single-socket system, so obviously everything is local, but you would normally see that here.

And now, hopefully that was as fascinating for you as it normally is to look through esxtop. One thing to keep in mind, and we talked about this in the other lesson: o is your order, f is your field enable. As you can see, I unfortunately have to do these recordings at 1024 by 768. I normally have a much higher-resolution screen so I can fit more on it, but if you're like me and shuffling fields around, keep the o and f commands in mind.

So let me quit out of this, and now, let's see, there's my pointer, and we'll go to vCenter. We'll go to Hosts and Clusters. A couple of things here. First of all, Administration, vCenter Server Settings, Statistics. Keep in mind what your statistics settings are and what kind of information you want to retain, and make sure to make use of your database sizing tool. I normally have three hosts, but one of them is having a boot issue. I don't have anywhere near that many, so we'll say 30. So with my current settings I'm at less than a gig. But you can go through and set these, like the 5-minute interval duration: we collect a sample every 5 minutes, and you can actually set this to 1 minute if you want to get even more granular. You set how long you want to keep those for and at what level. The higher the level, the more information, so 4 is all metrics. So if you're being told to do all
metrics, you want to do 4. Summary information or basic metrics would be 1. For the 5-minute interval I usually keep a fairly good bit of stuff. And then you can pick and choose how long you want to save these for. Again, this is kind of basic information, kind of VCP-level stuff, but it's worth repeating.

Then I want to look at performance charts, so we come to Performance. A little FYI, a little troubleshooting: if you ever get to a point where, say, your historical time range reports do not work, they will not load, but if you go to real time they do, normally that means you've got a time mismatch between your vSphere hosts and your vCenter Server. So check your time sync and all that.

These are pretty simple. You can view this, you can view virtual machines on the host, and you can go to different virtual machines, like my domain controller, again Performance, and it shows you what you need to know. These are good overview charts. What I'm more worried about for the exam are things like the advanced charts. So you click Advanced, and it has some pre-built stuff here. You can go over here and switch to certain types for things like disk, network, and memory, and then you can click Chart Options. The information in here will be different depending on whether it's a host or a VM, which is why I said to go through here and look at this.

Then you can go through and select some things. If you want to look at a chart for memory for the past week, showing how much is granted, active, all that, you can select Past week, do that, hit OK, and it'll draw that out for you. You can do it for the past month or the past year. You can do Custom, which says show me the last hour, day, week, month, or the last three months if you have the data, things like that. Or, if you have the data, you can do a specific date range right here, if you want to see stats from a different date range. If you get something you like, for example, I don't know, let's see, custom on CPU, CPU usage for the last month, and you're like, this is a chart that
I use fairly often, you can save it. You can go right here and say Save Chart Settings, "CPU last month", and it will save it here. You can even tell it to always load this at startup instead of the default. We hit OK and it will show you that. So it shows it's a custom CPU chart and it gives you what you want: that's CPU use on this VM over the last month. If you want to delete these, like you created one and want to delete it, go to Manage Chart Settings, pick that one, and hit Delete.

So again, come in here, get used to these things, and see what you can look at. Also, some things are different between real time and custom or past ranges. Take a look at those and see what you need to do here for selection. Get used to these, save your settings, and understand how to make these the default. There's not a lot to know about performance charts or pulling information up, but it's important to know exactly how to get there, and to get there fairly quickly.

So that's it for the lab. Let's go back to the slides, and that's it for the lesson. It's not a real deep lesson, but it's important to understand. This is again one of those lessons that are kind of quick hits. Make sure you understand the specified tasks, and make sure you know where to go for the information and how to get it quickly. This is stuff that I want you to be able to do very fast during the exam. So that's it for this one. I look forward again to seeing you on the next lesson.
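The rules of thumb quoted in the lesson can be collected into a small checker. The following is a minimal Python sketch, not part of esxtop or any VMware tooling: the counter names and thresholds are the ones mentioned in the captions, while the `RULES` table, the function names, and the sample values are hypothetical and exist only for illustration. It also reproduces the batch-mode arithmetic from the lesson (a 5-second delay times 200 iterations).

```python
from typing import Callable, Dict, List, Tuple

# Thresholds as quoted in the lesson. The table layout and names below are
# illustrative assumptions, not an esxtop API.
RULES: Dict[str, Tuple[Callable[[float], bool], str]] = {
    "%RDY": (lambda v: v >= 10, "ready time is not under 10% (aim for under 5%)"),
    "%CSTP": (lambda v: v >= 3, "co-scheduling overhead is not under 3%"),
    "%SYS": (lambda v: v >= 20, "system time is not under 20%, often high I/O"),
    "%MLMTD": (lambda v: v > 0, "a CPU limit is holding the vCPU back"),
    "MCTLSZ": (lambda v: v >= 1, "ballooning is reclaiming memory"),
    "SWCUR": (lambda v: v > 0, "the host has swapped memory to disk"),
    "CACHEUSD": (lambda v: v > 0, "the host has compressed memory"),
    "ZIP/s": (lambda v: v > 0, "the host is compressing memory right now"),
    "UNZIP/s": (lambda v: v > 0, "the host is reading compressed memory"),
    "N%L": (lambda v: v < 80, "less than 80% of memory access is NUMA-local"),
}


def check_counters(sample: Dict[str, float]) -> List[str]:
    """Return one finding per counter in the sample that breaks its rule."""
    findings = []
    for name, (is_bad, message) in RULES.items():
        if name in sample and is_bad(sample[name]):
            findings.append(f"{name}={sample[name]}: {message}")
    return findings


def batch_capture_seconds(delay_s: int, iterations: int) -> int:
    """Total wall-clock length of a batch capture: delay times iterations."""
    return delay_s * iterations


if __name__ == "__main__":
    # Hypothetical values resembling the anecdote: high ready time, rest fine.
    vm = {"%RDY": 25.0, "%CSTP": 0.0, "%SYS": 0.33, "%MLMTD": 0.0, "N%L": 100.0}
    for line in check_counters(vm):
        print(line)
    print(batch_capture_seconds(5, 200), "seconds of capture")
```

Counters missing from a sample are simply skipped, so the same function can be fed CPU-only or memory-only rows from a CSV capture.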
Info
Channel: VMworld 2018
Views: 21,252
Keywords: Monitoring, and, Troubleshooting, vmware, vsphere, Monitoring and Troubleshooting Host and VM Performance, how to use vmware, vmware training video, kernel training, vmware horizon, How-to, Virtualization, vmware tutorial, vSphere 5.5, Datacenter
Id: 0PtaSPkZ2js
Length: 19min 1sec (1141 seconds)
Published: Mon Feb 06 2017