The One About The Windows Event Log

Captions
On today's episode we're gonna learn a little bit more about keeping an eye on Windows through the Windows Event Log, and also how to actually figure out what caused that blue screen using some software development tools.

Okay, we've got our test machine up and running here. We've actually been doing some benchmarks on this. It's got an unusual problem that I think is because the CPU is not quite perfectly seated in its socket. It's got a problem that we've seen on every motherboard, every X99 motherboard that we've tested, that has an X99 chipset, so from every vendor. This particular problem is not brand-specific; you're going to run into this problem sooner or later.

And the machine's booted up, everything is working correctly, right? Well, not necessarily. Most people don't even look at it, but there's something called the Windows Event Viewer. We're gonna right-click on the Start menu and go to Computer Management. Now, Computer Management has been there since Windows 2000, so this is not really anything special. We're gonna go down to Event Viewer and we're gonna go to Windows Logs, and this is sort of the main area. Now, if you're on Server you've also got this Applications and Services Logs; there can be other stuff in there. If you click on the folder it's gonna give you this summary view, and so there's no events. If you were to expand these folders and look at this, it's like, holy crap, there's a ton of folders in here. Well, there's nothing in any of them; this is showing you that right here. In the Windows Event Viewer, most of the ordinary Windows desktop stuff is in here.

Now, if you're on Server, one other cool, handy thing that you can do that I'll mention is you can forward these logs to a server or another system to keep an eye on them, which is really handy, and as a system administrator you totally ought to do that, because you can keep an eye on things.

All right, so I've opened the System log here, and it just so happens that at this exact moment
there's a lot of these errors in here that say WHEA-Logger. That's really weird. Now, I just rebooted; nothing has changed with this system. I just rebooted at about, oh, I guess it was 6:34, and the logs here... now look before then. Hmm, there's not really much going on. There's a little bit from WHEA, but this is from a prior boot. So this was a boot, and then I rebooted, and then no more WHEA errors. Then I've got some other errors in here.

So I'm in the System log right now. There's also an Application log and a Security log and some other stuff, but we're going to focus on System for right now. And so this is a common error that we've seen on X99 systems. Sometimes it's to do with PCI Express 3.0, sometimes it's to do with your graphics card drivers, sometimes it's to do with just the alignment of the sun and the moon and the planets. Sometimes it is actually a hardware problem: you can actually have a hardware problem with your motherboard or your video card or something else. Anecdotally, it seems like this particular problem shows up on motherboards that have PLX bridges or PLX chips for their PCI Express. Don't know what that's about. I've got a couple of support tickets open with some various companies, not through any official channels, sort of through unofficial channels, to try to troubleshoot this. This has also shown up some on the Linux kernel mailing list, although most people sort of chalk it up to "it's supposed to do that", or "disable error reporting", or, you know, "just ignore it". But I don't think it's really completely safe to ignore this error.

And so we're gonna take a look at sort of diagnosing and making some changes and doing some stuff to hopefully eliminate the error, but also use it as a vehicle to teach you about the Windows event system. And so before we get to this error, let's look at some of the other errors that are in here, or some of the other informational messages that are in here. So these are warnings; it says there on the
level. So let's just scroll down. All right, so other than, you know, those warnings, what do we got here? It's like, oh, network link is disconnected. Big deal, who cares; there's not a cable plugged into that, but it's generating a warning in the event log. Information here: oh, link is up at one gigabit, full duplex.

Now, for those of you guys who have not installed Windows in, like, the last week, you should skip to the end of your log, and I think you'll be surprised: it probably goes back to the day that you installed Windows. And so as a forensic tool, in terms of figuring out stuff that's happened with your system, when you've booted it up, when it's been on, when it's been doing things, there's a lot of information in this event log. So you could figure out a lot. That's not like browser history or anything like that, but application crashes and changes to the system and when Windows updates have been installed, that's all in here. It really tells you a lot about someone's usage pattern; there's a lot of sort of meta information in there.

All right, so here we've got a kernel plug-and-play event. It's like, oh, the driver for something failed to load for something else. Not especially useful, but we could Google it and figure out more information about it. All right, here's an error: our Ethernet is disconnected. This is the 10 gig adapters. All right, e1rexpress, yeah, no one cares. Kernel-Power: okay, so this is just some informational messages saying, hey, these power states are supported. All right, NTFS: volume is healthy, no action is needed. Informational message.

This is interesting: Application Popup, event ID 56, "application popup cannot be found". This is an event message that's literally saying an application tried to pop up a message, could not actually find the message to pop up, and so just popped up a generic error. Usually that means there's not a translation into English for what the error is. You don't really see this a lot in English-language
applications, because a lot of applications, not most but a lot, are written in English first and then translated. So in this case this is probably an application that was originally in Chinese or Japanese or something that was incompletely translated into English, although sometimes it's just an error that no actual translation was written for. Doesn't really matter.

All right, so here's some boot messages. It's like Kernel-Boot, Kernel-General, EventLog. This is probably the event log system: uptime seven seconds. So yeah, there's the reboot, 6:43, and then blam, you know, there's the Windows hardware error log. Now let's refresh this, because I bet you we've got some more errors from the Windows hardware... yeah. All right, so look at this: 6:48 and 12 seconds, 6:47, 6:46, 6:45. So it's doing this two or three times a minute, which is kind of a lot.

Now, it is saying a hardware error occurred. These errors are actually corrected errors. I'm not sure if it actually says here... oh yeah, "A corrected hardware error has occurred." In Linux these are reported in a very similar way, and Linux is a little more verbose than Windows, so I've sort of been having some fun with this. And so if you guys are on X99 you may be able to open up your Event Viewer and see these kinds of things right now.

Now, what exactly the issue is: the graphics card could be loose in a slot; it may not even be the graphics card. This will tell you more information about which PCI slot it's plugged into, so you can move your graphics card to a different PCI Express slot and see if that makes a difference. In our case this is actually an error reported by the root hub, which means that the root hub, to which all of the other devices are connected, is actually the device reporting the error. Now, in the case of a PLX, the error may have actually been generated on the PLX, but the root bridge is the thing that's
actually reporting the error, or the graphics card might have generated the error but the PCI Express root bridge is the thing that's actually detecting the error. So it's a little squirrely sometimes.

Okay, another type of error that you can encounter in this log is, oh, the previous shutdown at, you know, 3:31 p.m. was unexpected. Basically that just means the computer was turned off, and so the next time you turn it on an event will be logged in the Event Viewer that says, hey, the last time I was turned off it was unexpected.

You will also see errors for when you have a hard drive that's going out. If you've got a mechanical hard drive and it's got a bad sector on it, your machine will be humming along fine and then you'll hit that bad sector, and the system will retry reading that bad sector, but that interrupts everything else that's going on with your system. It will sit there and retry reading that sector for, you know, a second or two, and that second or two may register as a hitch. Like if you're playing a game and then the computer just stutters for no reason for a second or two, it may have encountered a bad sector. And so you can open up your event log and see if you see any errors, and it may say, oh, a hardware error was encountered on device whatever, or, oh, a bad sector was encountered, or there may be a clue as to why that hitch occurred in your Event Viewer. It's not always the case; I mean, when you're playing a game and something hitches it could be a million different things. But this is why I'm showing you the Event Viewer, because if there's something that's logged, it's gonna be in here. I mean, there's all kinds of crap in here that doesn't even matter, but there's actually good stuff in here that is useful to be aware of.

So we've got, you know, some other stuff here. It's like, a system started here, another reboot, more PCI Express errors. Now, you notice that
even though we've got pages and pages of warnings, there's no actual errors, no uncorrectable errors. Uncorrectable errors are the most egregious, because an uncorrectable error means that there was an error and it couldn't be corrected, and because of that there's bad data floating around there somewhere. It stands to reason that if you have a whole bunch of... now look at this one. Like, through here it's happening multiple times per second, and so this is, you know, us fiddling with it, and this is us sort of pissing off the badger and having everything go nuts. This is just trying some different parameters in the UEFI and trying some other different things. We've got one informational message in here: it's like, oh, Background Intelligent Transfer Service. Totally not related to anything, just a random informational message saying, hey, this service that's on Windows started. Doesn't mean anything.

All right, so when the system was first set up, when the system was first installed, look at this: it's completely insane. We're getting hundreds or thousands of errors per second. This was actually owing to not having the correct chipset drivers installed. Oops. Now, these were corrected errors, but not having the correct chipset drivers installed seemed to be causing some kind of a problem with PCI Express 3.0. But again, there were no uncorrected errors, and so you never really felt it on the desktop. Like, you know, it boots up and seems like everything's running fine, but if you actually go to the Windows Event Viewer there's all this crap, and the Windows Event Viewer is saying, hey, stuff's all over the place, I don't even know what's going on. And this is all down to a driver that was not installed correctly, in this case because there were no chipset drivers at all, but that's why we were getting thousands of errors per second, which is kind of a lot of errors.

Logging these errors in the background can also chew up system resources. The Windows event logging process
is not the most elegant one, and it's not the most efficient one. And so if your system is actually generating hundreds or thousands of entries per second in the Windows Event Viewer, even if you've got a multi-core system, just because of the locking involved in the thread synchronization there, it is going to be a fairly significant performance hit in some cases. It's also true that the Windows Event Viewer tries to do a disk flush every time it's logging something, because it doesn't know, and so if the computer is about to crash it's trying to get as much stuff on disk as possible, so that if it does crash at least that stuff gets written out, and maybe the blue screen can finish handling whatever was in the queue to be written to the event log, and that sort of thing, so that when the computer starts back up the person that's looking at the event log can maybe troubleshoot it.

So you'll see errors sometimes too, like DistributedCOM, like "the server something did not register with DCOM within the required timeout". In the context of this error message a server does not mean a server on the internet. Server means that it was a process or a program on this computer that registers an interface through which other programs can talk to that program. And so something went wrong here: some program was trying to communicate with another program and the other program just wasn't communicating, or isn't actually running. DCOM is sort of an old architectural feature of Windows for inter-program communication, and you will see stuff like this in the background. Whether or not it's actually anything serious, or just Windows being Windows, that is up to you to figure out in your troubleshooting process.

So right now the task at hand is: what the heck is this WHEA error and what can we do about it? Well, let's start by checking Device Manager and seeing if we have any unidentified devices. I've got one: 690LC. All right, how do we, you
know, what the heck is this? All right, let's go to Details, let's go to Hardware IDs. All right, so that's a USB device. We don't care; USB devices will not really cause these types of errors. Tell you what, let's reboot and let's go into the BIOS, the UEFI, whatever you want to call it.

Okay, we're in the UEFI. Now, there's no overclock. Overclocking can also create a situation where, you know... even though we don't have error-correcting memory or an error-correcting CPU, we were seeing PCI Express errors being corrected by the system. That's owing to the fact that the new platform is so fast, it's crazy; it's gotta have error correction to deal with this, that and the other. It's also true that if you are overclocking, it would probably be a good idea to disable overclocking and see if you are still getting error messages in the Windows Event Log. If you are overclocking and you were getting messages in the event log, and you turn off the overclock and you're not getting the messages anymore, chances are that's not a stable overclock, because eventually you'll produce an uncorrectable error and eventually you'll get a blue screen, Windows will crash, whatever. So that's something to keep in mind.

All right, so what we're gonna do is look through the UEFI, and the first thing we're gonna do is look for PCI Express 3.0. Okay, so we actually have several options here that are maybe relevant to what we've got going on. PCIe ASPM is like PCI Express power management; that can also play in here. The PCI Express power management can show up as if it's an error to the Windows hardware error reporting just because the link speed is changing. It's like, oh, it's a fast link speed; oh, it's a slow link speed. So we may need to play with these settings as well, which might make the problem worse or may cure it. For now we're gonna set everything for Gen 2, everything in the whole system, just because we can. That's all we're gonna do. We're gonna reboot, and we're rebooting. All right, we're back
in Windows, almost. Let's go to Computer Management, oops, Event Viewer. Looks like I got a couple of corrected errors, but otherwise it seems like it's pretty stable. Let's run Heaven real quick, just because that's gonna generate a lot of chatter, and we'll see if that generates any PCI Express bus errors. All right, so Heaven's been running in the background. Let's go over here and let's refresh. Wow, look at that: we're still getting a few Windows hardware error event log issues. Not nearly as many as we were, but we're still getting a few. Yeah, it's pretty steady. I mean, it's maybe not as crazy as it was.

All right, let's reboot and change the power management settings. Now we've disabled ASPM. This is the kind of power management for PCI Express. On laptops this is actually a good thing; this will save you power on your laptop. On a desktop it's a little less important. I mean, it's nice to be able to lower the overall amount of power that the system is gonna use, but in this case it may be causing some of our WHEA errors.

Now, for the rate of errors that we're getting now, where it's just a small number of intermittent errors, none of those errors have actually led to any crashes. What sort of precipitated this was when we were first getting started: the chipset drivers meant that, whatever was going on underneath the hood, those errors were being generated at a dramatic pace, and so running the thing overnight, like running a burn-in test, running some other stuff, it would crash. But this is also a problem that we've seen on many, many other brands of motherboard, and it's really interesting because we first encountered it in Linux with Eric S. Raymond's system. And so it's really been interesting to troubleshoot different CPUs, different processors. You know, he's using a Xeon with error-correcting memory; this is an i7 X-series with non-error-correcting memory. And it seems to be related to the PCI Express generation and also the power management, at least as far as WHEA goes on this
particular platform. But we've also seen, like, server machines and workstations and hardware that was legitimately malfunctioning, as opposed to just configuration issues, generate the same types of errors, and so we thought it would be interesting for you guys to take a look.

All right, so let's go check our Windows Event Viewer now that we've disabled ASPM. Looks like it's back; looks like it's back in full force. Okay, if we dig a little deeper on this particular error, this device ID is actually the Gigabit Ethernet hardware device ID. This is the Intel [model unclear] GbE device, and so for some reason the Intel Gigabit Ethernet interface is throwing all of these errors. So let's go back into the UEFI and disable one of the NICs.

Now, keep in mind that these WHEA errors can actually be legitimate hardware problems, and so it may be that you need to RMA your motherboard. But, you know, this problem really goes back to X79, and it was really never fully fixed on X79 on all motherboards. Most of the time it was able to be corrected with a UEFI update, but it can still be a little squirrely even on modern hardware.

I'm going to go ahead and disable MCTP as well, PCIe Management Component Transport Protocol, just to see if that makes a difference. We can also drop these down to Gen 1, but I don't want to do that just yet. So we've got our two X540 LANs, that's the 10 Gigabit Ethernet. Let's go ahead and disable the Gigabit Ethernet as well while we're at it. We'll save and exit.

Okay, let's take a look at the Event Viewer. Looking better. VEN_8086, 2F08. VEN_8086 is Intel; 2F0-something, I'm not sure what that device is. 2F02 and 2F... wait, wait, 2F02 is the Gigabit Ethernet adapter again. Now, depending on your card and the particular hardware that you're running, it may be the case that there is no fix yet, but this is something to keep an eye on. Sometimes it actually is a hardware problem, but sometimes it's actually a problem with just the UEFI and the configuration on the card. So it looks like the
last set of changes were pretty stable. Let's run Heaven and see what happens. Looks pretty good. That's actually a pretty good result considering the versions of things that we're running on. So even though this is running in the background, we're not generating errors at nearly the rate that we were, so this is probably good enough for now. This is sort of an edge case and sort of a weird situation, so stay tuned. Different motherboards have different levels of support, but all motherboards from all vendors exhibit this issue.

This is also happening not just with the X99 chipset but with, like, the C602 chipset, the server-grade chipset that supports multiple CPUs. It's really sort of interesting. I can't say that it's happening nearly as much on the C602 chipsets; it's actually not happening at all unless you are dealing with a high-performance graphics card. As far as RAID controllers, 10 Gigabit Ethernet controllers, that kind of thing, those generally are not generating these kinds of errors. When you're using, like, a Titan or a GTX 980 or even older graphics cards in a server configuration, like a dual-socket Xeon, a dual socket 2011 Xeon, then you can see these same kinds of errors.

Now, if you let this thing run and do your burn-in test and it's completely stable, it's probably okay that you have some errors in the WHEA log. You know, it's not going to be perfect, and some of them are caused by software issues at a low level or, you know, handshake issues with the PCI Express, like the PCI Express handshaking thing. So even with all that, we got two WHEA errors. Not bad; that's not going to cause any really egregious problems. It's something to keep an eye on. If, however, you have a crash or other situation, you can look in here and see if you see any uncorrected WHEA errors. So instead of "A corrected hardware error has occurred" it will say "An uncorrected hardware error has occurred". You might even get a blue screen, and so let's talk a little bit
about how you diagnose what causes a blue screen. Now, I mentioned before that we were doing some heavy testing without having the chipset drivers completely installed, and we actually did get a blue screen, but this was owing to a completely wacky setup. It was our fault in the Windows setup, but that's good, because we've actually got a blue screen. Well, what do you do with the blue screen? You just get the blue screen and that's it? Well, there's some information about the blue screen in the Windows Event Log, but the best thing is a minidump file. And so whenever you get the blue screen, Windows will generate a dump.

Let me show you something to check to make sure that you actually have it set up to do that. Some OEMs disable this and I don't know why. So when you get a blue screen, it's good if it dumps debugging information. So if you go here and you go to Startup and Recovery, and Settings, you want to do this: write an event to the event log on system failure, automatically restart, and then automatic memory dump. So this is probably what you want. You can also do a small memory dump (256 KB) if you want, but automatic memory dump is fine. And it's gonna tell you where it's gonna put it: so %SystemRoot%, which is usually C:\Windows, and then MEMORY.DMP, and you can overwrite any existing file. And so you want to be sure that this is set, to ensure that you have a dump file to work with should something happen.

There's also a folder on your C drive, C:\Windows\Minidump, and so in C:\Windows\Minidump you have these DMP files here that will be created when your system blue-screens, basically. And so how do you look at those? What do you do with those? Well, it's called WinDbg, and WinDbg... oh yeah, I disabled the internet connection, huh. WinDbg is part of the Windows SDK, and so when you install it you should have one of these icons. I'm gonna just run this as administrator. You download
these from Microsoft; it's part of, like, the Windows 8 Software Development Kit, but there's another download that lets you just download the tools. Just grab that one.

All right, so I've run it as administrator. I'm gonna go to Open Crash Dump and then I'm just gonna pick our minidump file. All right, and so we're gonna get some messages in here. It's like, we don't have debugging symbols installed. So this would actually let you take apart the executables that were involved, if you had the debugging symbols and the debugging information installed, so that you could actually figure out what caused the crash. But in our case it doesn't really matter too much; most of the time the minidump thing can figure out what actually caused the crash. So once you've opened it, we just skip to the end, and this is "Probably caused by: ntoskrnl.exe".

Well, in that case that's actually the Windows NT kernel, which is sort of the core of Windows and the thing that manages programs and access to the hardware, the interface of Windows to the raw hardware. And so for the crash to occur there is really sort of unusual. I mean, Windows NT is fairly well hardened. This is almost always a device driver issue: a device driver doing something that it shouldn't, a device driver corrupting memory space. It could be bad memory, like memory that is throwing errors and that kind of thing. And so when you see errors or blue screens that are caused by ntoskrnl.exe, it is almost always down to a driver for your hardware, bad hardware, or bad memory. And so this is just another piece of evidence that something went horribly wrong on the WHEA side of things and we got an uncorrectable error. And indeed that's what actually happened when we were first setting up the machine: installing all drivers, installing incorrect drivers, not having the correct chipset drivers for the platform. And there was plenty of warning for that in the Windows hardware event log.
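
The pattern described in the video (a steady trickle of corrected WHEA warnings, with the real concern being an uncorrected one) can be checked mechanically once the System log has been exported. What follows is a minimal Python sketch and not something shown in the video: the sample rows, the tuple layout, the dates, and the function names are all hypothetical, and the message wording follows what is read aloud on screen rather than a verified export.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows in the form (timestamp, level, source, message).
# A real export from Event Viewer would need its own parsing step.
CORRECTED = "A corrected hardware error has occurred."
SAMPLE_EVENTS = [
    ("2015-01-01 06:45:10", "Warning", "WHEA-Logger", CORRECTED),
    ("2015-01-01 06:45:40", "Warning", "WHEA-Logger", CORRECTED),
    ("2015-01-01 06:46:05", "Warning", "WHEA-Logger", CORRECTED),
    ("2015-01-01 06:46:20", "Warning", "SomeNicDriver", "Network link is disconnected."),
    ("2015-01-01 06:47:12", "Warning", "WHEA-Logger", CORRECTED),
    ("2015-01-01 06:47:30", "Warning", "WHEA-Logger", CORRECTED),
    ("2015-01-01 06:47:55", "Warning", "WHEA-Logger", CORRECTED),
]


def whea_events_per_minute(events):
    """Count WHEA-Logger entries in each calendar minute."""
    counts = Counter()
    for stamp, _level, source, _message in events:
        if source != "WHEA-Logger":
            continue  # ignore unrelated sources such as NIC link warnings
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        counts[when.strftime("%Y-%m-%d %H:%M")] += 1
    return dict(counts)


def uncorrected_whea_events(events):
    """Return WHEA-Logger entries whose message mentions an uncorrected error.

    "uncorrected" contains the word "corrected", so the longer word is
    matched explicitly instead of testing for the absence of "corrected".
    """
    return [
        event for event in events
        if event[2] == "WHEA-Logger" and "uncorrected" in event[3].lower()
    ]


if __name__ == "__main__":
    for minute, count in sorted(whea_events_per_minute(SAMPLE_EVENTS).items()):
        print(minute, count)
    print("uncorrected events:", len(uncorrected_whea_events(SAMPLE_EVENTS)))
```

A rate of a few per minute with zero uncorrected entries matches the "keep an eye on it" situation from the video; any uncorrected entry is the case worth investigating first.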
And so hopefully, with this sort of quick overview of the Windows hardware event log, or the Windows Event Log in general, and how it interfaces with your hardware and the programs on your system, you will have leveled up in your understanding of how Windows deals with errors and where that's logged, so you can go look at it, and how to troubleshoot blue screens even after the blue screen is over. It's like, oh, I hope I get another blue screen so I can write down the error. No, that's not necessary. Just go load the minidump and see what it was; go to the Event Viewer and see what the actual crash was. The next time the system starts up, it's gonna have that in there.

Another thing that you can do, if your system is down and has crashed: you can actually open your hard drive on another machine. You can pull your hard drive out and put it in another computer, go get the minidump, and look at it that way. And the Windows system event log is also stored similarly: it's in Windows, System32, and then winevt, and then Logs. And so in here are all of your log files for the whole system. So if the system is so badly knackered that you can't even get into Windows, you can put the drive in another machine (well, assuming that you don't have BitLocker installed; you have to jump through some hoops to decrypt your hard drive if you do), and you can browse the Logs folder, and you can browse the Windows folder to get the minidump and to grab the logs. And then on another Windows computer, to open these logs, all you've got to do is double-click on the file. It'll open the Event Viewer and then you can take a look.

And so, you know, here's our Application log. If we come back over here and we do System... System's by and large the one that you want most of the time. We open up System, bam, there we are. And so now I'm in Saved Logs as opposed to System logs, because Windows thinks it's opened, you know, another copy of an event log from another system or whatever, you know, [unclear]
for System32. It's not very smart; [unclear].

Well, hopefully that's been enough of a crash course in the Windows logging facility, and how to make sense of the Windows logging facility, that you will be able to troubleshoot the next time you have a problem with Windows, or a problem with Windows being unstable, or a graphics problem, or a problem with a new platform. Or the next time you have a blue screen, like one of your friends is having a blue screen, it's like, hey, I know how to deal with that; I can probably get some more information about it. So there you go. Until next time, see you on the forums.

[Music]
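
The offline approach described in the video (pull the drive, attach it to another machine, and grab the dumps and logs) can be sketched in a few lines of Python. This is a sketch under assumptions rather than a tool from the video: the function name and the example mount point are hypothetical, and the folder names assume the usual layout the video describes, with dumps in Minidump or MEMORY.DMP under the Windows folder and logs stored as .evtx files under System32\winevt\Logs. It only lists files and does not parse them, and it will find nothing on a BitLocker-encrypted volume that has not been unlocked.

```python
from pathlib import Path


def find_crash_evidence(windows_dir):
    """List crash dumps and event log files under a Windows directory.

    windows_dir is the Windows folder of the drive being examined, for
    example the Windows folder of a disk attached to another machine.
    Returns a (dumps, logs) pair of lists of Path objects.
    """
    root = Path(windows_dir)

    dumps = []
    full_dump = root / "MEMORY.DMP"  # default location named in the video
    if full_dump.is_file():
        dumps.append(full_dump)
    minidump_dir = root / "Minidump"
    if minidump_dir.is_dir():
        dumps.extend(sorted(minidump_dir.glob("*.dmp")))

    logs_dir = root / "System32" / "winevt" / "Logs"
    logs = sorted(logs_dir.glob("*.evtx")) if logs_dir.is_dir() else []
    return dumps, logs


if __name__ == "__main__":
    # Hypothetical mount point for the drive taken from the crashed machine.
    dumps, logs = find_crash_evidence("E:/Windows")
    print("dump files:", [path.name for path in dumps])
    print("event logs:", [path.name for path in logs])
```

On a case-sensitive filesystem the folder and extension names must match exactly, so treat the spellings above as assumptions to adjust for the drive at hand.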
Info
Channel: Tek Syndicate
Views: 118,187
Keywords: tek syndicate, Logan
Id: KAbuZ_LQJCo
Length: 28min 5sec (1685 seconds)
Published: Wed Apr 29 2015