Windows Performance Deep Dive Troubleshooting

Captions
All right, welcome everybody, day three, I think, of TechEd 2014. Today we will talk about Windows performance deep dive troubleshooting, so we will drill into the core of Windows and why performance is so important for us. Is there anybody in the room who doesn't have, or never had, Windows performance issues at one stage or another? If I'm honest, I had. But we will give you the tools and the capabilities in this session so that you're able to resolve those performance issues.

The agenda for today's session: we'll first talk about the Windows performance lifecycle, so how Microsoft is measuring the success of performance for Windows and the applications we are releasing. Then we look into Windows internals, specifically how Windows works under the hood, so that you're able to understand later the information you're getting in performance analysis tools. Jung will talk about Windows Performance Toolkit, and I will talk about Windows Assessment Services. We are both Premier Field Engineers at Microsoft. Jung, what's your background?

So for 10 years, from 2000 to 2010, I was a senior support escalation engineer out of Charlotte, North Carolina, so if you guys ever opened a case and that case was open for more than four hours and it went on for days or weeks, I probably have worked with you guys at some point. And for the last four years I've been a Premier Field Engineer out of Southern California.

Exactly. Myself, I'm based out of Dubai. It's a bit more sunny than Seattle, that's why I chose that one. It's like 50 degrees in summer, so we are not usually going out; that's why I'm a little bit heavier than him, sorry about that. OK, and what we will do is really showcase to you, in a real deep dive, what's going on.

Let's start with the Windows performance lifecycle. So why should we really care about performance? First of all, we want to make our end users happy. If the end users are happy, they will keep using the application. And if I look into the recent case studies Apigee made, they found that up to 80%
of the users delete or uninstall applications because the application freezes or is slow to respond.

At Microsoft we developed a performance lifecycle which we are using internally and which we highly encourage everybody to adopt as well. Basically it is a concept of five different phases. It starts with designing for performance, with learning, so you learn about the performance behavior of your application. You plan the development, and then it's all about delivering a consistent experience for the user: the instrumenting, measuring, and analysis part. The measuring and analysis part is a consistent process, right? Every time we ship a new operating system, every time we work on a new application, we measure the performance, do some analysis, and then we keep doing it for every major release and for any update we are doing. And we are doing this not only within Windows; we do it for MOD, so Microsoft Office, Internet Explorer, all the products. And we are working strongly to get it over to msdn.microsoft.com and to encourage developers to adopt this model as well.

Why is it important? If you look into the performance fundamentals, they are based on three pillars. Users want a fast experience; they want snappy user interaction, right? They don't want it to take long for an application to start. It should be fluid, right? It should be smooth as they move from one click to another. And it must be efficient, right? It shouldn't drain battery life and it should not take a lot of disk footprint. So those are the performance fundamentals when we talk about how we make sure that an application, a process, or a driver is fast, is fluid, and is efficient.

If you look into where this data is coming from, what is important to understand first is some of the basic elements of Windows internals and how it works under the hood. We will not have the chance to cover Windows internals in depth; I think it's known that Mark
wrote two parts of his book for that, Windows Internals Part 1 and Windows Internals Part 2, so it's pretty heavy. But let us share with you some information around the key aspects, the key principles which you should know about.

Let's take a second first and talk about user and kernel mode. In user mode we are running system processes and the applications. Any application you are starting on your computer, like Outlook, Internet Explorer, Firefox, whatsoever, you are running in the user context, right? So that's all in user mode. Then we have kernel mode, where the Windows kernel is running, right? As the name suggests, there's a kernel in there, and we've got the Windows executive and the hardware abstraction layer to separate the application layer and the hardware so we can run them independently. And in between we have the so-called ntdll.dll, which is kind of like a library. Every time your application says "Oh, I need to create a new file", it goes and says "Hey, ntdll, somewhere in your library there's a call named create file; do that for me in the kernel." So that's the connection between user and kernel mode.

It is important to understand some of these fundamentals because later, when we look into Windows Performance Toolkit and Windows Assessment Services, there will be no explanation area; the tools assume that you know the basics around what is user mode and what is kernel mode.

One more important aspect is how a thread actually gets CPU time. Let's try to make it as basic as possible. We have a queue, right? The threads are waiting. We have a server called the CPU scheduler, which tracks the time and says, "OK, what is in my queue? Do I let him in, yes or no?" The CPU scheduler says, "Oh, I've got thread 1, 2, and 3: thread 1 with high priority, thread 2 with medium priority, and thread 3 with low priority. Let me move them, based on their priority levels, to the next queue, the ready queue."
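The queue-and-priority picture can be sketched as a toy model. This only illustrates the analogy used in the talk; it is not how the Windows scheduler is implemented. The `Thread` class, the `admit_by_priority` function, the numeric priority scale, and the `slots` limit are all assumptions made up for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Thread:
    name: str
    priority: int  # hypothetical scale: larger number = higher priority


def admit_by_priority(waiting, slots):
    """Move up to `slots` waiting threads into a ready queue, highest priority first."""
    ordered = sorted(waiting, key=lambda t: t.priority, reverse=True)
    return deque(ordered[:slots])


# Three waiting threads, deliberately listed out of priority order.
waiting = [
    Thread("thread 3", priority=1),  # low
    Thread("thread 1", priority=3),  # high
    Thread("thread 2", priority=2),  # medium
]

ready_queue = admit_by_priority(waiting, slots=2)
next_to_run = ready_queue[0]  # the front of the ready queue gets the CPU first
print([t.name for t in ready_queue], "->", next_to_run.name)
```

In this sketch the low-priority thread simply stays behind in the waiting list; a real scheduler also deals with time slices, preemption, and multiple processors, none of which is modeled here.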
Here you can see in the example that thread 1 and thread 2 are moving to the ready queue; thread 1 was high and thread 2 medium level. And then there's a dispatcher. His responsibility is to execute immediately everything he finds in the ready queue. So he says, "Oh, there's thread 1, I need to put him on the CPU; he needs to get some CPU cycles, some time on the CPU." So then he moves that thread to the CPU.

The last thing I want to briefly mention is Windows boot and how it works, because that's another important principle you need to be aware of. A lot of the performance cases Jung and I are handling, or anybody in our team, are around slow boot, right? We saw a lot of cases with 10 minutes, 15 minutes, and then we need to know in which phase what is happening. So we have the BIOS initialization phase, then the OS loader, where everything around the operating system starts, and then OS initialization. There's a little breakdown of it here, and most of the support cases we see are in between OS loader and Explorer initialization.

First, obviously, kernel initialization: the kernel starts, so we load the Windows registry and we load system drivers. Session initialization is where the user sees the user interface for the first time, right? You see it waiting, waiting, waiting, hopefully not too long after this session. Winlogon is the point where the user is able to enter his credentials, so he hits Ctrl+Alt+Delete and types in his credentials. It's the same time when we load group policies, the computer group policies. In between Winlogon and Explorer initialization the user is entering his password and confirms it, and that's the moment, with Explorer initialization, where we start explorer.exe, and user group policies are getting loaded as well. Lastly, we've got post boot. That's everything which is not system critical, which we didn't have to do during boot time itself, for example delayed Windows services, applications which are scheduled to start, and some of the scheduled
tasks as well.

Jung, I know we have lots of different tools, right, for support analysis and problem analysis. How do you compare all those different tools?

Right, so there is a vast set of tools available to you guys, and you want to have a big tool belt so you can use all these different tool sets. Everybody is very familiar with Perfmon to gather a big 10,000-foot-level overview of what's happening on your system, client or server. And from time to time, if you guys end up calling Microsoft support, we'll ask you for a memory dump: either a user mode application dump, for high CPU utilization, a memory leak, high disk latency, or high network latency; and the other type of memory dump that we have is the blue screen of death, also known as a kernel dump. For the developers we have tools in Visual Studio to live debug the problems that they have. But the problem with Visual Studio is that if an app or driver goes into production, you're not able to go ahead and attach Visual Studio on a production system, right? It just doesn't scale that well.

And that's where the Windows Performance Toolkit comes into play. It's very light; it takes up at most about 2% CPU on the system. It works across the Windows ecosystem, so we're talking about Windows RT, Windows client, and Windows Server, and it doesn't matter what products you're running on top of it. So if you're running SharePoint, IIS, a file server, or if you're running SQL or Exchange, we're able to take a holistic view not only at the user mode, which is the applications that you guys are running on top, but at the kernel level. Think about a house: the kernel is like the foundation of the house. If anything goes wrong in the foundation, right, the doors end up warping and the windows don't open properly, and your applications are affected. So this toolkit lets you really dig into the issues that you guys are running into. That
doesn't mean that tools like Process Explorer, which you guys are familiar with, are not helpful. Process Explorer is great at capturing data on your systems. The only caveat is this: let's say that you want to engage the independent software vendor that sold you the software, or you want to engage your NIC driver or storage driver vendor; you can only take a snapshot, a screenshot, of the problem that's appearing.

So for example, I'm going to go ahead and start an application that just consumes CPU here, and we'll see that it is consuming CPU right here. A lot of admins and end users just come over here and kill the process that's consuming a high amount of CPU. But instead of killing that process, Process Explorer lets you go into Properties, and then you can look at different information like this. The one that a lot of administrators don't end up using is this Threads tab. The Threads tab gives you the functions that are running inside of that executable, so the work that is happening inside of the application. In this example it's a single-threaded application, and you can see that it's EatCPU.exe plus this huge offset. By default applications don't ship with symbols, so you have to go to Options, and you see here where it says Configure Symbols; we have to point it to the symbols for that application. In this example we have this PDB file here, so I'm going to go ahead and copy that path, paste it right here, and click OK.

I'm going to go back to EatCPU.exe and click on Properties. Everybody knows about the process ID in Task Manager; inside of the process there is a thread ID that identifies which thread is using up the CPU. In this example we're using 12.5% of CPU, and we're able to click on Stack over here, and what you'll see, as soon as the UI becomes responsive, is the work that executable is doing. The nice thing about this is that you can take a screenshot of this and send it to the
developer of the application or driver, and they may be able to look at the line of code where that function is being called and tell something about what's happening. Now, the problem with that is, if that developer wants to see whether there's anything else happening on the system, this might not give the whole view. Do you guys see this? According to this tool, right now it goes into this main function, ready to start up, and then it calls this function one. But this is a single snapshot in time that we're taking. If I go back, we may end up seeing a different stack. And look at this, you see that? You have to keep on refreshing, and depending on where the clock cycle is, or where that thread is when it gets the CPU, it will show you different functions.

It's the same thing with user mode dumps. When you guys are asked to get a user mode dump, the reason we ask you to get three to five dumps is that one snapshot may not give us the whole view of what's going on in the application that's causing problems; thus multiple dumps.

Now comes this new tool called Windows Performance Toolkit. We're able to capture the data using the Windows Performance Recorder, which ships with the Windows 8.1 SDK, and it works for Windows Server 2012 R2, Windows 8.1, Windows Server 2012, Windows 8, Windows 7, and 2008 R2. If you guys still have some 2003 servers, Windows Vista, or Windows 2008 systems, you need to download the Windows 7 SDK for .NET Framework 4.0 to be able to utilize this toolkit.

With that said, you can see that the most common types of issues that we end up seeing in the real world are listed here: CPU, disk I/O, file I/O, registry, networking, and heap usage, which is better known as private bytes and is the number one type of memory leak that happens in applications. OK, so that's a great tool to have in your toolbox.

Now, to collect the data, it's really simple. All we have to do is
check this First level triage box, and because we're troubleshooting CPU, CPU is the issue, we just have to check CPU usage. As soon as we do that we can click on Start over here. This is running in memory, so you can see that it's buffering about 4 gigs of memory out of my 28-gig system, and it's recording. So if I have this high CPU utilization, I'm going to be able to record what's going on. I have this 10 or 12.5% CPU utilization, and all I have to do is take a trace for less than a minute, so 30 seconds will work for me. I can type "High CPU EatCPU.exe", click on Save, and as soon as this is done saving we're going to open it up in the Windows Performance Analyzer. This could take a little bit of time.

Just to let you guys know, this tool is capable of collecting twenty thousand samples per second, depending on your CPU speed. So if it's a two gigahertz CPU it's going to collect less; if it's a three gigahertz system it's going to collect more data samples per second. So we're able to look not only at minutes, seconds, and milliseconds, but at microseconds. For device developers, if you're having problems with drivers like NIC drivers or storage drivers, let's say you've got a SAN and you're having problems with an HBA driver, we're able to go down to the ten microseconds that it is supposed to take to handle an interrupt, or the hundred microseconds that it needs to take care of this thing called a DPC, a deferred procedure call.

So now that it's done collecting the data set, we're going to click on Open in WPA. One of the things that you guys will see is that there is this thing called Graph Explorer, and we're able to see different graphs, right: System Activity, Computation, Storage, Memory. There's a lot of information that is collected. One of the first things that I usually do is select, under System Activity, this thing called Window in Focus, to see when
that console application called EatCPU was running. So this conhost was up for about 8.5 seconds; you can see the duration. But what I'm really interested in is the amount of CPU. Under Computation we have this graph called CPU Usage (Sampled), and we can go ahead and drag and drop this graph. One of the first things that you can do here is just hover over that high CPU utilization, and right away it tells you what process it is: it's EatCPU.exe, and in parentheses we see the process ID, 4404. So I know that I'm on the right track. And you guys can see that as soon as I clicked on EatCPU, it highlighted the area where EatCPU.exe was running, so it's very intuitive. Based on the number of samples taken, the CPU utilization was at 9.49 percent.

I can expand this, but there's not much information here by default. For those of you that have played with it, you can right-click on any of the columns, and there are more options that you can add. The first item that we'll be adding is thread ID. Again, a multi-threaded application like SQL could have 1,000 worker threads running, right? So you want to see which thread was the one using up CPU time. In this case it was thread ID 8624, very similar to what we saw in Process Explorer.

Now that we have this information, we can add the stack, which will give us the binary name and the function name. The nice thing about this tool is that it's just like an Excel pivot table: anything to the left of the golden bar, you guys see this, we're able to pivot the data around. So here I moved the stack to the right of the thread ID, and we're able to expand this, and you'll see Root and you'll see these DLLs that are being called by EatCPU. Now, to be able to load the symbols, we have to go to Trace, Configure Symbol Paths, and we have all
this information here. This is the Microsoft symbol server, the public symbol server. There are other companies out there that also have symbols for their applications. What I'm going to do is add the symbols for EatCPU, so you guys are able to see the functions that have been called here. Depending on your network speed, it may take a little bit of time for the symbols to get downloaded locally to your system.

This tool, again, let me give some examples of what we use it for. We have troubleshot high CPU on SQL servers and memory leaks on SQL servers. We have troubleshot slow logons on terminal servers and Citrix servers. We have troubleshot issues with Internet Explorer, for example where it was consuming a high amount of memory. We have used it to troubleshoot group policies, so if you guys ever wonder why it is taking so long after you enter your username and credentials, we're able to use this utility for that. For those of you that have domain controllers with a high amount of CPU usage in LSASS, we're able to tell, once we get some information, whether it's LDAP queries coming from the clients. We're able to trace that in conjunction with network monitoring tools like Netmon or Wireshark and see what's happening on the wire and what's happening locally on the system. So this is really powerful.

With that said, we have our symbols loaded here, and just like we saw in Process Explorer, we are able to see information such as main, and then function one was called. I'm going to look at this summary table and expand here, and you can see that function one ended up calling function two, function two called function three, and function three called functions four, five, six, and seven. And that's the thing: this gives you a big overview of what's going on. With Process Explorer, again, depending on when you click
on the stack, you may see just function one, you may see functions one through four, or you may see only function seven. But this gives you the holistic view of what's happening on your systems.

With that, there are other things that we can do with this tool. On servers, if you guys have 32 gigs of memory or more, you have to add this custom measurement called General Profile for Large Servers so that it does not use up a lot of memory on your servers. If you guys have .NET applications, we have a custom profile that we can share with you so you can troubleshoot your .NET 2.0 apps; by default this toolkit is able to troubleshoot .NET Framework 4.0 applications.

So let me show you guys an example of a slow boot and slow logon. How many of you are on Windows 7 today? Wow, OK. And on XP? Just two, OK, so not bad.

Let's take a look at this boot trace. The way that you look at a boot trace is by going to System Activity, and the first graph that we're going to start out with is Boot Phases. Boot Phases shows us that this section, Pre Session Init, is taking up 95 seconds. Folks, 95 seconds, that's a long time. To look at what's happening during the boot phase we can look at different items such as CPU or disk. Look at this: during the initial period there's a high amount of CPU, followed by a high amount of disk I/O. So we're going to start out by looking at the CPU, and again we start with CPU Usage (Sampled). Here we are at that point where, for 95 seconds, the system was slow. That Pre Session Init section should take at most about 10 seconds; it took 95 seconds. We're able to zoom in to this area by right-clicking on the graph and choosing Zoom, and we can see that most of the CPU time is spent by System, the system process. Just like before, we're able to load symbols, so Trace, Load Symbols,
and while the symbols are loading we're going to add the stack, just to do a quick overview of what might be happening. So here we are with the stack, and the first thing that we see is that we're in the kernel, at the start of a system thread. I can expand this by pressing the right arrow key on the keyboard. Now that I have the area where my high CPU utilization is, I want to look at the summary table a little bit, so I'm going to click on Display Table Only, expand this, and keep on expanding. One of the things that we see is that inside the kernel we're in the NTFS file system, and we can keep on going, and it says NTFS device I/O control. So we're doing something with a driver, a storage driver, and we're in the kernel again, and so on. And then NTFS calls a driver, and look at what it is, folks: volsnap. Does anybody know what volsnap is used for? It's for the Volume Shadow Copy, right? So if your end user wants to restore a document, this is the driver that takes a snapshot and keeps a copy of it, depending on the settings that you guys have. On this system somebody had created a whole bunch of snapshots of documents, and it's reading that information at the beginning. So for performance reasons, for enterprise customers, we highly recommend you guys disable Volume Shadow Copy, so that your end users store documents on a file server and that file server is backed up instead, to avoid performance issues like this.

So it gives us a nice overview of what's happening with the CPU. The other thing is, we still saw that there was a lot of disk I/O churn, so we're going to go back and see what's happening with the disk I/O. The nice thing about this tool is that it gives us a lot of detail. I'm going to go to Storage, go to Disk Usage, and drag and drop the disk usage graph. The first thing that I would
notice is that at this point in time something happens where my disk I/O is at a hundred percent, right? So my disk is a hundred percent utilized. I'm going to zoom in, and this is the beauty of this, it gives you this type of information. Do you guys see this, the disk I/O priority? Starting with Windows Vista and Windows 7 we have different disk I/O priorities. For those of you that run servers: for SQL you're supposed to turn that off so that it never goes to Very Low. The same goes for Exchange, and the same goes if you're running an Oracle database. There's a registry key with which you can turn this off. But on workstations, obviously, you want to leave it the way it is by default.

We're also able to see the I/O type, whether it's reads or writes. So under Normal there are reads, writes, and flushes to the disk, and we're able to look at what processes are either reading or writing. In this case we're able to see that the disk is spending most of its time in reads, and this is the process that is using up the most time: the System process, which is the kernel. Now, the nice thing about this tool is that I'm also able to see the path name. Under the System process I'm able to look at the different paths that are being accessed, and you can see that there's one called Unknown, then the bitmap, the MFT, and system32\drivers. You can see every single file that is touched while we're taking the trace, whether it's a boot trace or you're troubleshooting a slow disk: a local disk, a USB disk, a SAN disk, or even an iSCSI disk. You're also able to sort it by size if you want to, so how much disk I/O is being done by these different processes. And just to let you guys know, Unknown during boot is because of the caching that happens by the prefetcher and SuperFetch; all of that is loaded into memory during the boot process. So let's see what's happening with SuperFetch: is it working or is it busted on
the system? So you see this, hit versus miss: most of our count is a miss instead of a hit. The hits are good, misses are bad. What we're seeing here is that the system, for some reason, is not loading the items on disk into memory.

I don't know if you guys knew this, but the number one reason for your Windows 7 or Windows 8 systems to be slow is that they are bottlenecked on disk. That's the number one reason for slow performance. There's a huge difference, as you guys have probably noticed, between running an SSD in your workstation or laptop versus running a regular SATA drive, either a 5,400 or a 7,200 rpm disk.

With that said, SuperFetch on this system is broken. The nice thing is that we do have a hotfix that solves a majority of Windows 7 performance issues. Do you guys know which hotfix that is? It starts with two seven seven: 2775511. If you guys haven't deployed that into your environments, it has about ninety hotfixes in one package, so it fixes a good chunk of the issues that you see here. Not only disk performance problems; it also fixes issues with group policies, network problems, and file copy problems. So this is a high-level overview of what this tool is capable of doing.

With that, I'm going to show you guys that this tool also works with Windows Phone. Here's Visual Studio; under Tools, you guys see this, Windows Phone 8.1 Developer Power Tools. The same exact UI is there, under the performance recorder, to troubleshoot performance issues in your in-house Windows Phone applications. So if you guys have in-house applications that are running on Windows Phone, you're able to use this same exact toolkit to take traces of your systems. Let me go ahead and cancel over here, and let me show you some additional things that this tool is capable of doing.

How many of you guys have troubleshot slow disk I/O to your SAN or iSCSI NAS device? Both of them? OK, a few of you guys. Awesome. The
cool thing about this tool is you can create these custom measurements. Let me go to my tech demo, go to these profiles, and look at this. Do you guys see this, Storport tracing? Storport tracing is at the lowest layer where you're troubleshooting performance issues on your server. There is a Storport trace for 2008 and 2008 R2; in this example I have a custom Storport WPRP file to measure Windows Server 2012 systems. Let me show you guys what layer this tracing is at. If you look at this, this is where the port drivers and miniports, such as Storport or SCSIport, end up sitting, and there's nothing between that and your hardware. So if the disk is slow over here, more than likely it's a problem with your multipathing software, it's a problem with your HBA not being tuned properly or the drivers being old, or you've got a problem somewhere between your HBA and the SAN, or something on the storage. It has been agreed with all the storage vendors that if the Storport miniport tracing shows this latency, so anything more than 25 milliseconds, then more than likely it's an issue with the hardware or your SAN. So this is a quick and easy way of figuring out whether the issue is on the server or at the SAN. And because iSCSI just plugs into Storport, you can use this same tracing in addition to the network tracing that you would normally collect to see if there are resets and different things like that, or a Fibre Channel trace if you're working with your SAN vendor.

So there are a lot of customizations you can do. Not only these: if you run this command right here, xperf -providers, you'll see that pretty much every single component that ships with the OS is instrumented. So you're able to add information such as group policy, you're able to add information such as network connectivity to your systems, and so on. So there's a lot of nice
things you can do.

Now, the next thing I'm going to show you guys is a quick demo of memory leaks. For memory leaks, all I have to do is check this box for Heap usage, do you see this? And I'm going to start a tool called VirtMemTest that leaks memory. This is a tool that is freely available for download from blogs.technet.com; just look for Aaron Margosis. This tool has this nice UI where you can do different things like leak memory. The only thing that you have to do is go into the registry, and let me show you guys, set this setting called TracingFlags. So here's my VirtMemTest64. It's under HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Image File Execution Options, and I created a new key called VirtMemTest64. Inside of it there's a DWORD value called TracingFlags that we have to set to 1. As soon as we do that, we're able to launch the application and start collecting memory information, and I can start leaking memory in this app. Obviously your app would be leaking by itself, but for demo purposes I'm leaking it in this demo app. All we have to do is collect for a very short time, and here we go, we're going to save this information, and again we're going to open it in WPA.

In here, the first thing we need to do is make sure that our symbol paths are set up. So I have the public symbol server, and I'm going to add the symbols for this application right here, paste the path, and click OK. Here we have a section called Memory, and in Memory we're able to see that there is a graph called Heap Allocations. The nice thing about this is that all I have to do is drag and drop like that, and it shows me that this process has the highest memory usage, at four gigs. Do you guys see this? It's that simple: drag and drop.

How many of you have used DebugDiag to troubleshoot issues? All right. So you have to collect
a dump, and then, you know, if you're lucky, the built-in analysis will give you information. This is a lot simpler to use, folks. So I'm going to right-click on the column and add the threads to see how many threads were causing my memory leak, and I can see that it's just one, all right, single-threaded. The other thing that I'm going to do is add the stack. So here's my stack, and I can move Stack to the left like this, and I can see all the DLLs, right? Now that I'm able to see the DLLs, I'm going to go to Trace, Load Symbols, and again, depending on the speed of your internet, it may be as fast as 10 minutes; at my own Microsoft office I've seen it take an hour and a half. So it might take a little bit of time to look at this data set, but in the end we're going to be able to tell what function in the application was causing the memory leak. That's the nice part about it: something that, if you had to look at a user-mode memory dump, would take you probably at least thirty minutes, if not an hour or two, is done in a mere 10 minutes or so. So it's really, really useful.

The thing that we're going to do here is zoom into this area where the memory leak is happening. Now that we're zoomed in, we don't need to look at the graph; we're going to go back to the summary table and keep an eye on the memory allocation here on the right side. So having a 24-inch monitor at your office is definitely nice; if you guys don't have one, just make sure you tell your management that you guys need one now. All right, okay. With that said, we're going to go down here and look at this. It gives me this type of information, and by the way, I need to make sure that I'm sorted by size, because I think I just clicked on the wrong thing here. So you can see information that is useful, like this on-button-clicked function, new byte, right, new byte. So memory is being allocated there,
and that should call into a malloc. For those of you that are familiar with C, you'll see malloc right here, and it in turn caused an internal API call, RtlAllocateHeap. So that's where the memory is getting allocated by this function right here. And if I had source code, let's say that I was the developer, I'm able to go to that line of code and see where I am allocating the memory but not freeing it up. So that's really useful. Again, it only took me, what, about 10 minutes; it probably took longer to load the symbols than for me to find where the memory leak was occurring. So with that, I'll pass on to you, Milad.

Sure thing. All right, okay. So basically what Yong showed is all about Windows Performance Toolkit, right? So where do you get it, how do you get started with it, what is the tool? Here's the Windows Performance Recorder to do the recording, and the Analyzer, where he spent the majority of the time, to really analyze the data. You get it from the Windows SDK, or if you go with the bigger ADK, the Windows Assessment and Deployment Kit, where you get even further applications. And just let me jump into a specific area, that's going to work. Yeah, the mouse, the mouse is terrible. Come on, where's the mouse? You know the shortcut? Sorry about that. Okay, so that's Windows Performance Toolkit.

Well, there's one more tool which we recently introduced for performance analysis at scale. Because it's nice what you can do with Windows Performance Toolkit, but if you're part of the image engineering team, for example, you would have some challenges doing these deep-dive analyses over and over for every build you're running or about to release. So the engineering team in the Windows performance team thought about that and introduced the Windows Assessment Services. The idea of WAS is to provide IT professionals working in image engineering a test framework to measure performance, reliability, and the functionality of the operating system, and the behavior of programs like Internet Explorer,
the behavior of drivers, at scale. How does it look? You have a server where you are running the Windows Assessment Services server, the server role, let's say, and then you have the clients you want to test. These could be bare-metal machines or virtual machines. You run an automated image deployment through that framework, so we use WDS and Windows PE in order to do the testing. We will collect basically the same information Yong showed, so event trace logs, over there, and do automated analysis. Let me show you how that looks.

So that's Windows Assessment Services. It's a free tool, by the way; like any tool we are showing, it's completely free. You just go download the ADK or SDK and install it, and once you install it you're getting specific performance measurement scenarios, for example driver verification, browser experience, Windows Store apps performance, startup and shutdown experience. If you click into one of them you will get more details. So for example the startup and shutdown experience will measure the system boot, shutdown, hibernate, and standby of that computer. The run of that job takes two hours and thirty minutes. It's working for x86 as well as x64, and it supports Windows 8 and onwards. If you click here on More Information, it will go to microsoft.com and provide you even more information. So here it will first run job number one, which is boot performance, so fast startup, and then standby. Obviously I won't click now on Run, because I don't want to shut down my demo machine, but I prepared some of those scenarios for you. So here, for example, you see a report for driver verification which I just took this morning from my own machine, at 8:00 a.m.
So it will give me information about my hardware itself first: the CPU, the job name, when I did it, the machine ID, so the computer name, and whether I used a template, yes or no. So I drill into it. It's all about driver verification here, and it found a couple of issues. It identified, for example, some unnecessary drivers, so I could go now to driver management and look into, okay, who's the vendor, is it the OEM that was shipping that, or what is the driver really about, and analyze and see if there's an updated version of that driver. I see multiple drivers here; again, devices with multiple grouped drivers. So here I see, okay, that's a device name, that's grouped by oem15, it's coming from Intel, and you see the current driver version.

One other example is Internet Explorer startup time. So here basically that job takes around 10 minutes, and it starts Internet Explorer around five times and measures the time it takes to load the first page, right? And then it will give me certain warnings or issues. So here, for example, "create tab of IE launch assessment exceeded its threshold". So I could click on it now and look into the recommendations right here. It says, okay, iteration one took that long, two, three, and so on, and I immediately get the recommendations for how I can mitigate it. And remember, I was saying WAS is more lightweight, for image engineering, and Windows Performance Toolkit is more in-depth; that's basically what it says as well, right: for more extensive analysis, use WPA to investigate further. But it will give you a general idea where to drill into. The same goes for Windows UI performance; again I'm getting deep-dive information here on what's going on. Boot performance: that's my own machine, and it noticed that there are some optimizations which could be done by my OEM. So here: setup specialize was not completed by the manufacturer, right, and then the recommendation for it: the manufacturer should run the setup specialize pass before the end user receives this computer. So in general Windows Assessment Services will
give you a lot of information at a high level on how to optimize your image before you roll it into the production environment. So if I just switch over here for a second: what are the requirements in order to get Windows Assessment Services? We quite recently announced it, when we shipped the Windows Server 2012 and Windows 8 release, so it requires Windows Server 2012. It is recommended to have at least a 1-gigabit NIC, and you should have disk space, because as soon as you start doing lots of runs through Windows Assessment Services it will obviously take a lot of disk space, because it's generating a lot of traffic and it's collecting all this data on the server so that it can analyze it. If you look at the test computers themselves, they must be Windows 8 or onwards, right, so Windows 8, 8.1, Windows 8.1 Update 1 as well, I believe (hopefully we will change the naming one day, it's getting to be a challenge), and in particular they need support for USB and PXE boot if you want to do bare-metal deployment tests. So you can say WAS should start a bare-metal deployment, and it will measure the entire boot and deployment process and will tell you, okay, at this phase I would recommend you do this instead of what you're doing right now. You need to have a DHCP server if you're doing the bare-metal deployment, right; WAS needs to know where those clients are.

Some tips and tricks around WAS: you need to install it on a server computer, and then you have the Windows Assessment Services client, which I just showed you, on those client machines if you're doing analysis after the boot time. Alternatively, you can also have a standalone version, everything in one box, which I have on one of my other machines, so you can have server and client on one machine if you want to drill down into your own machine. Around the setup itself, you need to add certain drivers, because if we are talking about bare-metal deployment, well, guess what: if you want to
do bare-metal deployment analysis, WAS needs to know what the network driver is, what network card you are using, so that it can communicate via PXE with the machines and do this analysis for you. So you need to add the NIC driver, and then you need to prepare the Windows PE USB drive for the test computer inventory. You can add lots more drivers, so you can really make it look like your own image, right? You can add your out-of-box drivers, you can add your applications to that image; that's all possible. We see more and more enterprise customers getting interested in it, because it just saves time compared with doing the usual "hey, let's start WPT, let's do a trace, okay, I found this issue, let me drill into it". Most of the time, the first things you will see in WAS are the current roadblocks you have in your image engineering process. So this will quickly give you, at a high level, the information needed in order to do the Windows deployment. If you cannot solve it with WAS, with Windows Assessment Services, then what I really recommend is using, like Yong showed, Windows Performance Toolkit, because there you can really drill into the material and see, okay, where is this thread, which function is it calling, why is it happening. So that's the WPT scenario. Let me just jump into here one more time.

All right. One of the other things I really want to show you around WAS is that you can also run individual assessments, and you can combine them, right? Because we have certain scenarios in place, like browsing experience, which combines IE, security software impact, and minifilter diagnostics over here, first one, second one. But with the tools we give you the freedom to choose and say, hey, I want to have my individual assessment. So what you need to do is click on Run Individual Assessment, and then you can just select what you want. Maybe you want to check file handling, so it measures the duration of common functions like copy, move, delete, and zip. You can
analyze the memory footprint, which takes an hour, or get lots of meaningful information: you can even measure how long it takes to open, view, or search photos, right? So some of the most common tasks end users will do. So before the end user calls you and says, "hey, every time I open my application it's so slow", you can have a KPI and then say, "no, no, actually with the corporate image you have it will take you, I don't know, 1 to 1.2 seconds to open this picture". The same goes for media player performance. So here what it does, basically, as soon as I launch it, is start Windows Media Player over and over and play sample videos, right, and then it will tell you, okay, how long did it take to decode that video file, what was the quality, was it really full HD you're seeing, or maybe 4K. So you get all of this information.

Some of the lessons we have learned so far from WAS since its launch: right now it can assess one computer at a time, so if you're running multiple tests for, let's say, 10 machines, it will start with one machine, finish that job, and move to the next machine. It doesn't yet support Windows RT, so the Windows edition that runs on ARM processors like the Surface 2. And really, when you look at WAS, it is a really high-level tool to get started with: okay, what does the image look like, is it fast enough, is it fluent enough, and what could be the potential user experience if you look at video processing, picture processing. If you really want to drill into it, then you need to have WPT, right, Yong?

Yes, and we could, you know, show them hang analysis. So how many of you guys have ever had Outlook hang, or IE, or your in-house application? Has anybody had apps hang? All right. Now, besides killing the app, what have you guys tried doing? Just killing it, is that what everybody does? Is that a common thing? I would say the nice thing about the Windows Performance Toolkit is that we're able to go ahead and tell you guys what's going on. So how
many of you have ended up using the Windows Resource Monitor? The Resource Monitor, a lot of folks, right? So let me ask you guys this: have you guys ever noticed this thing called Analyze Wait Chain? Oops, okay, let me try this again. Right-click. Have you guys ever noticed this? Analyze Wait Chain is built right into Resource Monitor. Now, the thing is that a lot of times this will only tell you what thread it might be hung on; it will not tell you what that thread is doing for a living. Is it waiting on network, is it waiting on disk, is it waiting on CPU? And the nice thing about the toolkit is it will tell you exactly what it's waiting on. So here we go back to the application called VirtMemTest, and I'm going to hang this for about 30 seconds. Do you guys see this? It says Hung UI, and I just punched in 30 seconds. So what I'm going to do is bring up the Windows Performance Recorder, and all we have to do is check the box for CPU usage and click Start. It's recording, and I'm going to hang this UI, so I can't move it now. And if I go back to Resource Monitor and click Analyze Wait Chain, now it says it's not responding, right? That's all it's able to tell me. Because we're running the Windows Performance Recorder, we'll be able to get you guys the root cause of what happened to that application: why did it go out to lunch for 30 seconds and come back, or why might it have gone to lunch and never come back. So with that, the hang is complete, and I'm just going to save this trace again. And again, depending on how much data you're collecting, it may be faster or slower. And these files, let me mention this: don't run it for an hour. If you run this for an hour you may run out of disk space; these files can end up getting to five gigs. I think the biggest that I've seen was 32 gigs, folks. So be careful; run it for a very short period of time. So here we go, we're opening up the Windows Performance Analyzer, let me bring it to the screen, and
when you're troubleshooting a hung application, and by the way, this also works if your machine just hangs for like five minutes and comes back: you look at perfmon and there's no data, right, it's just a blank line, but this tool will still collect information and tell you exactly what driver caused your machine to hang. It could be server, client, Windows Phone, or Windows RT systems. With that said, the first thing that I need to do, because it could take a long time to load the symbols, is make sure that I point it to the symbols for this particular application. So let's say that dev is in-house and gave me the symbols. I click OK and start loading the symbols before I do anything, and I'm going to go to System Activity. The nice thing about System Activity, as you guys can see, is it has this graph called UI Delays, so it makes life easy. Here we are: for 30 seconds this application was delayed, right, and here we have 30 seconds under MsgCheck Delay. So all I have to do now is zoom into this area, now that I know where my application was hung for 30 seconds. Minimize this, and instead of going to Computation, CPU Usage (Sampled), we're going to go to this new graph called CPU Usage (Precise), folks. Okay, and we're going to go ahead and drag and drop CPU Usage (Precise). And for those of you that are going like, "man, I wish I had those columns set up the way that he does": Andrew Richards has a link that we'll be providing to you guys in the deck that has these templates, where it sets up all these columns properly, so that all you guys have to do is come to Profiles, Apply, Browse, and then just apply the templates. So here's an example: I have one for high CPU, high disk I/O, high DPC/interrupts (this is if you're having a driver problem), slow boot, slow logon (for those of you that are doing Windows clients), and wait analysis, which is the, you know, hang analysis. I'll show it to you guys in a little bit. But going back, for the columns that we need for hang analysis, I have to right-click, and we
have this column called New Thread Stack, this other column called Ready Thread Stack, and we're done. Now, because it has to reload the symbols, the UI will be unresponsive for a little bit while it's doing things in the background. That's okay, so we'll give it time. But in the past, what you had to do if your application or your system hung... do you guys know that if your system hangs, instead of having to hit the power button, you could blue-screen the system and get data out of it? How many of you guys knew that, servers and clients? Okay. You could blue-screen the system and actually get data and find the root cause. Obviously analyzing the dump is really difficult, right? That's why we have ex-developers where all they do for a living is look at memory dumps daily. And the same thing goes for application hangs: we have guys that developed for a long time that end up looking at those user-mode dumps, those Windows Error Reporting dumps that you get when you have application crashes or when you have application hangs and the little window comes up. We're able to get you guys root cause analysis, but obviously it takes time and a lot of patience. But with this tool, life is easy. All I have to do is move this, do you guys see this, where it says Waits Max; all I have to do is move this to the right of the gold bar and then sort by that. And I need to look for my application that was hung, so the first thing that I need to do is remember what the process ID was: 8068, just in case I had multiple instances of the application. So I'm going to go ahead and look for VirtMem, click Next, here we are. And the next thing I need to do is find out what thread it was. Do you guys see this? It was thread 6588. So I'm going to go back, excuse me, to CPU Usage (Precise), and we've got to look for 6588, and here it is. Because I had my symbols loading at the beginning to save us time, I'm going to go back to the summary table, and look at this beauty, folks: I'm going to be able to tell
why it hung. So keep an eye on the right side: it was hung for 26 seconds, and it breaks it down to microseconds, milliseconds, and seconds, so it's very detailed. So if you're a financial company doing trades in milliseconds, this is the tool to use to find out how to make it run faster. Or even if you have a SharePoint site you want to make run faster, use this tool, guys, and find out where in your code the bottleneck is. So with that said, I'm going to keep on going, and here we have this function that says on button click, go and hang, and look what's being called right underneath it: SleepEx, NtDelayExecution, folks. So that function right there, called SleepEx, made the app hang because it was sleeping, right? Believe it or not, we had a customer that had a third-party application where during logon Explorer would not come up for five minutes. The third-party code was calling SleepEx 10,000 times for five seconds. Ten thousand times for five seconds, right? So we're able to tell this type of detail with this tool, which in the past you would have had to get a memory dump for, and that's super difficult for folks that, you know, don't look at code, like IT pros, like ourselves. So this tool makes life a lot easier to get to the bottom of the issues. With that said, that's the end of this demo.

Yes, well, with that said, we are nearing the end, and we will actually be open for Q&A now. If you don't have any questions, thank you so much for attending. Thank you.

thread 1 with high, thread 2 with medium level. And then there's a dispatcher. He's responsible for executing immediately everything he finds in the ready queue. So he says, "oh, there's thread 1, I need to put him on the CPU; he needs to get some CPU cycles, some time on the CPU", so then he moves that thread to the CPU.

The last thing I want to briefly mention is around Windows boot and how it works, because that's another important principle you need to be aware of, because
a lot of the performance cases Yong and I, or anybody in our team, are handling are around slow boot, right? We saw a lot of cases, 10 minutes, 15 minutes, and then we need to know, okay, in which phase what is happening. So we have the BIOS initialization phase, we've got the OS loader, where everything around the operating system starts, and OS initialization. There's a little breakdown of it, and most of the support cases we see are in between OS loader and Explorer initialization. Kernel initialization, obviously, is where the kernel starts, so we load the Windows registry, we load system drivers. Session initialization, that's where the user sees the user interface for the first time, right? You see it waiting, waiting, waiting, hopefully not too long after this session. Winlogon is the time where the user is able to enter his credentials, so he hits Ctrl+Alt+Delete and types in his credentials. It's the same time where we load Group Policy, computer Group Policy

all the products, and we are working strongly to get it over to msdn.microsoft.com and to encourage developers also to adopt this model. It is important. Why is it important? If you look into the performance fundamentals, it's based on three pillars. Users want a fast experience; they want this snappy user interaction, right? They don't want it to take so long for an application to start. It should be fluent, right? It should be smooth as they move from one click to another. And it must be efficient, right? It shouldn't drain battery life, and it should not take a lot of disk footprint. So those are the performance fundamentals when we talk about how we make sure that the application or a process or a driver is fast, is fluent, and is efficient.

If you look into where this data is coming from, what is important to understand first is some of the basic elements of Windows internals and how it works under the hood. It is very important. We will not have the chance to cover Windows internals in depth; I think, you know, Mark wrote two parts of his book for that, Windows
Internals Part 1 and Windows Internals Part 2, so it's pretty heavy. But let us share with you some information around the key aspects, the key principles which you should know about, the key areas.

Let's first take a second and talk about user and kernel mode. If you look into user mode, we are running system processes in there, and the applications, any application you are starting on your computer, like Outlook, Internet Explorer, Firefox, whatsoever; you are running that in the user context, right? So that's all in user mode. We have a kernel mode where the Windows kernel is running, right, so as the name suggests there's a kernel in there, and we've got the Windows executive and the hardware abstraction layer to separate the application layer and the hardware so we can run it independently. And in between we have the so-called ntdll.dll, which is kind of like a library. Every time your application says, "oh, I need to create a new file", it goes and says, "hey, ntdll, there is somewhere in your library a call, an execution, called create file; do that for me in the kernel". So that's the connection between user and kernel mode. It is important to understand some of these fundamentals, because later, when we look into Windows Performance Toolkit and Windows Assessment Services, there will be no explanation whatsoever, because the tool assumes that you know the basics around what is user mode and what is kernel mode.

One more important aspect is how a thread really gets CPU time. If you look into that, and let's try to make it as basic as possible, we have a queue, right, where the threads are waiting. We have a circle called the CPU scheduler, which tracks the time and says, okay, what is in my queue, do I let him in, yes or no? The CPU scheduler says, "oh, I've got threads 1, 2, and 3: thread 1 with high priority, thread 2 with medium priority, and thread 3 with low priority; let me put them, based on the priority levels, into the next queue, the ready queue". So here you can see in the example thread 1
and thread 2 are moving to the ready queue.

Let's start with the Windows performance lifecycle. So why should we really care about performance? First of all, we want to make our end users happy. If the end users are happy, they will keep using the applications. And if I look at the recent case studies Apigee made, they realized that up to 80% of users are deleting or uninstalling applications because the application freezes or is slow to respond. At Microsoft we developed a performance lifecycle which we are using internally and which we highly encourage everybody to adopt as well. Basically it is a concept of five different phases. It starts with design for performance, along with learning, so you learn about performance, the behavior of your application; you plan the development; and then it's all about delivering a consistent experience for the user, so the instrumenting, measuring, and analysis part. The measuring and analysis part is a continuous process, right? So every time we ship a new operating system, every time we work on a new application, we measure the performance, do some analysis, and then we keep doing it for every major release, for any update we are doing. And this is what we are doing not only within Windows; we do it for MOD, so Microsoft Office, Internet Explorer,
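
As a rough illustration of the measure-and-analyze loop described in the talk (for instance the Internet Explorer startup assessment, which launches the browser around five times and flags iterations that exceed a threshold), the idea might be sketched in Python as below. This is only a sketch under stated assumptions: `fake_launch`, the function names, and the 1.5-second threshold are hypothetical and are not part of Windows Assessment Services or the Windows Performance Toolkit.

```python
import statistics
import time
from typing import Callable, List


def measure_iterations(action: Callable[[], None], iterations: int = 5) -> List[float]:
    """Run `action` several times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return durations


def assess(durations: List[float], threshold_s: float) -> dict:
    """Summarize the durations and list the 1-based iterations over the threshold."""
    slow = [i for i, d in enumerate(durations, start=1) if d > threshold_s]
    return {
        "median_s": statistics.median(durations),
        "worst_s": max(durations),
        "slow_iterations": slow,
        "exceeded": bool(slow),
    }


def fake_launch() -> None:
    # Hypothetical stand-in for "start the application and wait for the first page".
    time.sleep(0.01)


if __name__ == "__main__":
    # The threshold is an assumed KPI, not a value taken from any Microsoft tool.
    report = assess(measure_iterations(fake_launch), threshold_s=1.5)
    print(report)
```

Keeping the timing (`measure_iterations`) separate from the judgment (`assess`) means the same check can be rerun against recorded durations for every new build or image, which is the repeat-for-every-release habit the lifecycle calls for.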
Info
Channel: Fidela Aretha
Views: 19,970
Keywords: Microsoft, Build, Windows Operating System, Milad Aslaner, Yong Rhee
Id: 6IXx7xx8t2Y
Length: 78min 58sec (4738 seconds)
Published: Thu Mar 29 2018