Thread dump analysis - HotSpot JVM - Java heap depletion

Video Statistics and Information

Video
Captions
Hello everyone, my name is Pierre-Hugues Charbonneau. I'm a senior IT consultant and the author of the Java EE Support Patterns blog (blogspot.com). Today's session is the first of a series that will teach you how to perform thread dump analysis, which is a crucial skill for anyone involved in production support, or even in delivery and load testing initiatives. Today's session will teach you how to analyze a thread dump in the HotSpot JVM format, typically version 1.6 and later. I will first give you an overview of the high-level analysis, and the thread dump we review today will show you how to detect Java heap depletion. This type of problem will sometimes generate a thread surge, and by the end of this session you should be able to detect it in your own environment as well.

Here is the thread dump we will review today. As you may have seen from my past articles, a thread dump is a very crucial piece of data, because it is basically a snapshot of all the threads executing in the JVM, and this is exactly the type of dump you will get. Typically, if you are running in a UNIX environment, you generate it with kill -3 <PID>; you can use jstack as well, or, if you are running on WebLogic for example, you can generate the dump directly from the console. So there are a few ways to generate it, but a native thread dump, e.g. via kill -3, is preferred so you get access to the native information as well. If you are working with a HotSpot JVM, this is the typical format you will get. In our example, the thread dump was generated from an Oracle Service Bus environment, which is why you see WebLogic in the actual threads.

Let me explain how you typically do the analysis. The HotSpot thread dump is fairly straightforward: you get a listing of all the created and running threads in the JVM, and at the very bottom of the dump you also get the detail of the Java heap. That data has been available since HotSpot 1.6, which is very useful because in a single snapshot you get all the running threads and the utilization of your Java heap, a value we will look at today. This is very crucial, because you can draw a straight correlation depending on what type of problem you are dealing with.

As you can see, I don't use any tool for this. There are tools out there, like TDA (Thread Dump Analyzer) and a few others, and they are good at formatting the thread dump and highlighting some of the blocked threads and so on, but I still recommend that you learn to analyze the raw data. Personally, I have been using thread dumps for the last ten years, and I really prefer to analyze them with a simple editor and the raw data.
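As a side note, not shown in the video, the same kind of snapshot can also be captured by the JVM itself through the standard java.lang.management API, which can be handy for quick scripting. The listing below is a minimal sketch assuming a HotSpot 1.6+ JVM; the class name ThreadDumpSnapshot is made up for this example.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSnapshot {
    public static void main(String[] args) {
        // Snapshot of all live threads, including locked monitors and
        // locked ownable synchronizers (both flags require Java 1.6+).
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            // Note: ThreadInfo.toString() truncates the stack trace after a
            // few frames; a kill -3 / jstack dump remains the full picture.
            System.out.println(info);
        }
    }
}

Because ThreadInfo.toString() only prints the first few stack frames, a native kill -3 or jstack dump remains the preferred source for full stack traces, in line with the recommendation above.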
So the first thing is to be able to quickly differentiate healthy threads from non-healthy threads. What does this mean? A healthy thread is a thread which is simply waiting to execute a request, basically a thread that has nothing to do for now. A non-healthy, or stuck, thread is a thread that you see actually executing a transaction, like this one, and this is where you have to start to worry. As a starting point you have to be able to differentiate these threads, because you don't want to spend two minutes analyzing each single thread, wondering what it does. You need to quickly rule out the idle threads so you can focus on the threads doing an actual task, then start analyzing a few of those threads, and the goal is to establish or identify a pattern. This is really key: you don't want to spend two hours analyzing the thread dump. You want to analyze a few threads, determine the pattern, and once you figure out the pattern you can move quickly through the rest of the thread dump and just confirm that you are on the right track. This is how you pinpoint the root cause. The tricky part is that anybody can eventually read through a thread dump, but the key experience comes from being able to quickly detect the actual patterns.

So let's do the analysis in this case. Let's assume you are in charge and you are dealing with an Oracle Service Bus environment going through a major thread surge. We can see a lot of threads doing a lot of tasks: stuck threads all over the place. Obviously you start from the top, and the key analysis will be done from the stack trace. If you are using WebLogic, it will also give you the state of the thread: WebLogic will show you STANDBY, a running thread, or STUCK. STUCK means that WebLogic detected that the thread has been taking more than a few minutes to complete, as per your configuration, so this is typically a thread which has been stuck in this condition for a few minutes. This is really not good, because it is exactly what you want to avoid. The rest of the header is quite typical for WebLogic, with the state as I said, and the remaining information is native information for the thread. That is crucial data which you want to correlate when threads are using too much CPU, but we won't cover this today; today we will just focus on the high-level analysis.

The key analysis aspect is the stack trace. Similar to what you are used to when reading logs from your application, or a stack trace from an exception, you always have to read a stack trace from the bottom up. The stack trace is pretty much the execution path of that thread, or following an error, at a given time. In the context of a thread dump, the stack trace is the actual execution path of the thread up to the first line, which represents the current task that the thread is attempting to complete. So you always read from the bottom up and then, as I said, establish a pattern.

Let's do the analysis for this first thread. We can see it's a WebLogic thread, the thread is executing a request, and it's trying to execute a timer (timer expired). This is basically Oracle Service Bus internal monitoring, and that's why you see this type of thread on Service Bus: there is some environment monitoring or collector going on. But then, if we look higher up to see where the thread is now waiting, you see the logging service. This means that WebLogic, or Service Bus internally, is just trying to collect some information, and look at what's happening: it's waiting to acquire a lock. Right now this thread is blocked, attempting to acquire a lock on this particular object; we can see that here. So far so good; keep that in mind.
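To make that "waiting to acquire a lock" state concrete, here is a minimal, hypothetical reproduction of the pattern in plain Java. This is not the actual WebLogic or Service Bus logging code, just an illustration of how several threads end up blocked on the monitor of one shared object.

// Hypothetical reproduction of the observed pattern: several threads
// blocked on the monitor of a single shared logging object.
public class LoggerContentionDemo {

    private static final Object LOG_LOCK = new Object();

    static void logInfo(String msg) {
        synchronized (LOG_LOCK) {      // waiting threads show "waiting to lock <0x...>"
            slowOperation(msg);        // the current owner is stuck inside this call
        }
    }

    static void slowOperation(String msg) {
        try {
            Thread.sleep(60000L);      // stands in for whatever is really slow (GC pauses, I/O, ...)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            final int id = i;
            new Thread(new Runnable() {
                public void run() {
                    logInfo("collector sample " + id);
                }
            }, "Collector-" + id).start();
        }
    }
}

A thread dump taken while this runs shows one thread owning the monitor inside slowOperation() and the other threads in BLOCKED state, each waiting to lock the same object ID, which is the shape of the pattern described above.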
A lock like this can be a source of contention issues, but as I said the goal is to try to identify a pattern, and so far we know we have one collector thread which, for some reason, is unable to get a lock from the logger. This is a bit surprising, because that type of logging processing should be extremely fast. So let's move on.

Now this next thread, from an analysis perspective, is not doing anything. How do I know that? Well, this is what you have to learn, depending on which container you are using. If you are using WebLogic, you see waitForRequest; that's the method in the WebLogic stack that tells you this thread isn't doing anything. It's sitting there waiting for the WebLogic kernel to assign it a request. So this thread is what I call healthy; it's not doing anything, so you don't even bother looking at it. As soon as you see that, you skip it.

Now this one, thread '49', look at this: we quickly move right to the pattern. We see the same pattern: collector, logger, waiting to acquire the lock. So far we have two threads waiting on the same object lock; we can see that here. Good, so we start to see a pattern. Look at this fourth one: same thing, it's trying to acquire the same lock, again down into the internal logging, the same stack trace. So we have quite a few of them, and this is already suspicious; it's giving you a hint.

Once you figure out a pattern like this, you can actually search for the object ID, and then you will be able to see that we have quite a few threads "waiting to lock" this object; that's major. Then what you can simply do is search for the thread that is owning the lock. Just by looking at the dump, you see what you typically get from HotSpot for these locks: the actual thread waiting for a particular object ID on the object monitor; that is an object monitor lock, or synchronized if you prefer, in the Java code. Then you search for the culprit, we find it here, and you analyze the stack trace of that one.

So let's look at this thread. You can see it is basically just trying to log some info on the structure; this is just some internal logging that WebLogic is trying to do, and it has acquired the lock on this object, and we already saw we have many threads waiting for that object. Then, if we look at what the thread is doing next, it's just trying to log some information. Now the question is what this thread is actually doing: as you can see, again this is some internal logging request, then the thread is trying to load some sort of resource bundle, and then it's trying to do a class loading call. This is very surprising. We have a thread chain because of the WebLogic logger, and the culprit thread, which is also stuck, is simply trying to load a class in a class loader. Well, this is a key finding, because class loading is typically extremely fast. What you need to understand is that as soon as you see something like this, when you start to see slowdowns even for class loader calls, it typically means you are dealing with overall contention within the JVM.
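The "search for the thread that owns the lock" step described above can also be done programmatically with the ThreadMXBean/ThreadInfo API available since HotSpot 1.6. The listing below is an illustration only; the class name LockOwnerFinder is made up.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class LockOwnerFinder {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.dumpAllThreads(true, false)) {
            // For a BLOCKED thread, getLockName() is the object it is
            // "waiting to lock" and getLockOwnerName() is the culprit thread
            // currently holding that object monitor.
            if (info.getThreadState() == Thread.State.BLOCKED) {
                System.out.println(info.getThreadName()
                        + " is waiting to lock " + info.getLockName()
                        + " owned by " + info.getLockOwnerName());
            }
        }
    }
}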
So what can trigger that type of contention? What you will learn with more experience is that it is typically associated with a JVM or Java heap problem, or it could be very high CPU on your server, or a few threads spinning, using too much CPU and causing overall contention. That's a key finding. Once you find it, and as you can see we only spent a few minutes, you move on, scroll through the rest of the thread dump, and check whether you see other patterns like this: still around the logger, and here you see some stuck threads just trying to initialize some object in memory. That is giving you the hint that there is contention.

At this point you should stop your thread dump analysis right here and do a quick assessment of your Java heap, because typically the thread dump will show you the culprit, but it's not the only data point required; you need to consolidate other data points as well. So once you see something like this, do a quick assessment, at least to rule out whether you are dealing with a Java heap problem, before you do too much of a deep dive into the thread dump. The good news is, as I explained, the thread dump in HotSpot 1.6 has all the detail, so from one snapshot you also get the story from the Java heap point of view. Now, since we found contention within class loading and logging, which is typically the type of processing that suffers first when dealing with JVM issues, my question is: are we dealing with a Java heap problem? Let's do the assessment now.

This environment is running the parallel collector, which is why you see the parallel GC spaces here: the YoungGen space, the OldGen space, and also the PermGen, which applies to JDK 1.6. If you look at the YoungGen space, this one is not too much of a concern: we are only using a portion of the roughly 160 MB of capacity, so the YoungGen is not really the issue in this case. Now let's move to the OldGen. The total capacity is about 1.4 GB, and look how much is used: almost 1.4 GB, 99 percent used. So we found exactly the problem.

At the time of the snapshot, if we go back over the facts: we have major contention with the internal WebLogic logger, which is very suspicious, and we have a culprit thread showing contention while trying to do a class loading call, which, as I said, is typically a symptom of contention within the JVM, high CPU and so on. This is where you have to start to worry, so you do the quick assessment on the JVM, and we have a perfect match here. Why do you see that? The reason is that as soon as the OldGen reaches 99%, at some point the JVM will go into thrashing mode, which means you will see excessive garbage collection: the GC will be firing over and over, causing a lot of JVM pause time. Since the JVM is struggling with all that pause time, the actual threads get interrupted, and this is why you start to see slowdowns or contention within computing tasks that normally complete in no time. That's why you see patterns which don't make sense, like this one. This is a key problem pattern: when you start to deal with a Java heap problem, you will see processing like class loading, XML parsing, and logging showing up as stuck in the thread dump, and this is why it's very important that you do an assessment of your Java heap.
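As a complement to reading the heap summary at the bottom of the thread dump, the same quick young/old generation assessment can be scripted against the standard MemoryPoolMXBean API. The listing below is a minimal sketch; the class name HeapQuickCheck is made up, and pool names such as "PS Old Gen" depend on the collector in use.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class HeapQuickCheck {
    public static void main(String[] args) {
        // Print one line per heap space (e.g. eden, old gen) with used, max and percent used.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;   // skip non-heap spaces such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            long usedMb = usage.getUsed() / (1024 * 1024);
            long maxMb = usage.getMax() / (1024 * 1024);
            long pct = usage.getMax() > 0 ? (usage.getUsed() * 100) / usage.getMax() : -1;
            System.out.println(pool.getName() + ": " + usedMb + " MB used / "
                    + maxMb + " MB max (" + pct + "%)");
        }
    }
}

An old generation pool reporting close to 99% of its maximum, as in the dump reviewed here, is the signal to switch from thread dump analysis to heap analysis.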
So again, the thread dump analysis will bring you to that point. You can also do it the other way around: you may detect a Java heap problem first and then analyze the thread dump to understand the impact, but in this case we went from the thread dump to the heap. So, to give you some background: when you do thread dump analysis, make sure you also do a health assessment of your Java heap. The heap summary in the dump is one way to do it; enabling verbose GC, or using a monitoring tool like JVisualVM or a commercial tool, will do as well. I hope you appreciated this thread dump analysis on the HotSpot JVM. As I said, this is just the first video of a series; this problem pattern was about Java heap depletion and how to detect it from the thread dump. In the next sessions we will be covering many more problem cases. I hope you did enjoy it, and have a good day.
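Following up on the verbose GC recommendation above (for example -verbose:gc and -XX:+PrintGCDetails on HotSpot 1.6), here is one more minimal sketch, using the standard GarbageCollectorMXBean API, of how collection counts and accumulated collection time could be sampled; a rapidly climbing old-generation collection time is the GC thrashing symptom described in the session. The class name GcHealthSampler is made up.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class GcHealthSampler {
    public static void main(String[] args) throws InterruptedException {
        List<GarbageCollectorMXBean> gcs = ManagementFactory.getGarbageCollectorMXBeans();
        // Sample every 10 seconds; collection counts and accumulated time that
        // keep climbing quickly on the old-generation collector indicate
        // excessive GC (thrashing) as described above.
        while (true) {
            for (GarbageCollectorMXBean gc : gcs) {
                System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                        + " collections, " + gc.getCollectionTime() + " ms total");
            }
            Thread.sleep(10000L);
        }
    }
}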
Info
Channel: Pierre-Hugues Charbonneau
Views: 72,355
Keywords: JVM, Thread Dump, Deadlock, Java (programming Language), Threads, Hosting, Software, Technology, Computer, Tutorial, Web Application Monitoring, Network Monitoring, Java Monitoring
Id: 3dKufRRT_3E
Length: 16min 44sec (1004 seconds)
Published: Thu Feb 14 2013