Am I Reading Garbage Collection Logs Correctly?

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
I would like to welcome everyone to this am i reading the GC logs session cut correctly so before I get started with this session and session I recently read an article so I want to share that the what I read in that article with you was saying that in your hand there are several acupuncture pressure points so when you clap all those pressure points gets activated and the person who is clapping will feel very relaxed so all through the session feel free to clap whenever you want to we can clap now once okay good so few words about me I'm Ram Lakshman so I have built several large scale applications like one of the largest world's so application I built a brute largest travel application of architected or very massive applications and now I will come out and I am doing this by independently I am building the tools GCE C which is a universal garbage collection log analyzer and in a fast read which is a javathread dump analyzer we will get started so as far as agenda goes first one is we will try to learn to read one single GC logline next item is let us learn to read one more GC logline and after that I have kept it as a suspense what are the next agenda items I'll tell you as Stan comes see I was checking there is around 10 to 15 garbage collection sessions in this JavaOne conference itself how this session is different from rest of them is all of those sessions talk about G 1 G C of you to net the pros and cons but this session talks about how we can interpret the G C log file how you can troubleshoot the problems in the G C log file can you better understand the G C log file so that is what the session is about reading the G C log file okay before that we want to see how we can enable a GC logging in your JVMs it is very simple you want to just pass these parameters print GC details when GC date/time stamp and then X log GC where you need to give the file path where the GC log frame is to be printer starting from Java 6 update 34 you can also rotate the GC log file like if you are if you JVM is a long-running for several days then you can rotate the GC log for you just like how can rotate your application log file this is the parameters to do that and one best practice that I would recommend to everyone is always keep the GC log file turned on on all your JVMs because it adds close to negligible ordered or unnoticeable over it because we don't know where from which JVM is going to suffer from the problem so it is better to always turn it on it will become Andy to diagnose ok okay so now our first agenda item how many of you work with GC log files closely okay so this is one single line from a GC log file it is it is one single line because in the PowerPoint that has been folded down into three four lines but it is one single line how does this look it is all numbers right so let us try to break it down and to see what what they mean first one is in a GC log file the very first thing we are seeing it basically shows the timestamp at which this GC event time whenever any GC went runs it is being logged and this shows the timestamp at the at which this particular GC one time this GC event ran on march 21st 2016 at 13:00 46:58 right and the next element it shows the number of seconds since the application has started this event was printed like this application was started like a finite milliseconds finite seconds before after finite second we are seeing this GC went to run the third one it shows what type of GC event it is whether is it a full GC whether is it a minor GC or whether it is a major GC it prints it this is what this section says and the fourth element it shows like we know that most of you know that the Java memory model it has egg generation whole generation and then prom gen and the older versions and in the meta space in the newer versions so it shows what was the N generations utilization what are the in generations capacity you can see see this is basically arrow mark the egg generation size so what was before the event was this is the value it was 1gb before this event ran the the young generation size was 1gb but once the event completed the g1 completed it became 2-0 GB that is from 1gb it dropped to zero GB the GAR the objects have been garbage collected in the egg generation and what you see within this parentheses is the allocated size of your rank generation what was the size that was allocated for n generation and the next section it shows what is the old generation size it was the same thing the old generation size was 1.99 gb and after this event ran it dropped down to 845 MB and the overall allocated size for this whole generation is 2 GB and the next section source the overall EEP size both the N generation whole generation put together what was the size before like it was 2.99 gb from that it dropped all the way to 845 m b that is it is cumulative both egg and the world together this is what it indicates so this GC went has done a pretty good job it has garbage collected like a significant amount of garbage from the memory and the next section shows the poem generation size what was it before and what was it after in the poem generation there wasn't a significant change fairly remain constant and this section the times it shows how much time that this the GC even took you can see it's saying it took one point eight four seconds for the overall for this GC event to complete from start to end it took one point eight four seconds okay also here you do see other values what is this user what is this sis what does it mean and when this is also real okay to answer that question we will look into this UNIX command time how many of you have used this command time okay good see what is this UNIX command does like say it gives you the overall time taken to execute a particular UNIX command like say here I am saying time LS when SEO LS it is going to list all the files in the directory in that folder which is what we are seeing here when you say time it also displays the time that is taken to execute this complete this LS command that is what the time command does so here also you see the same real user and sis the same principle that we are seeing in the Java GC also what does it mean see the real is basically the wall clock time that is what you see in your clock and the SIS is basically that time you spend in the kernel for executing like there is also if you see there is a user mode and there is our system mode what is the time you spend in the kernel to complete that command that is what this is and the user is what is the time you spend outside the kernel for executing that command that is what this is so typically user plus sis gives you the total amount of CPU time you spend it is a grameen on CPU time you spend for executing that particular command so so typically if you are running in a serial GC mode if you are using serial GC see here the serial the user insists together it is 3.9 seconds and that is what you end up seeing as a three point ten seconds in the reload but let's say you are using a parallel GC concurrent mark and GC or g1 GC in that case you have multiple GC threads working on your process and garbage collection then this time this say let us take this example it roughly comes to thirteen seconds added together user and sis but since you have multiple work at a Jew is working parallely and there is also CPU is concurrent it has the actual time you spend is only 1.3 seconds so for all for most of our practical world computations we can go with the real time okay so now let us move on to the next item let us lead one more GC log line say I want to see this is basically the g1 GC the junii see has become the default algorithms since Java 9 so this is what you want be more familiar with see this format is slightly different from the earlier not slightly different significantly different from the format that we saw earlier just a minute back it is not one single line it has multiple lines I have edited some of them which are not which are not very relevant right now when we talk about but so I edited some of them so let us look at this what are the significant important points in the GC log file of the g1 GC log the first element one sign shows the time the timestamp in which this GC event ran 2015 914 at the 12 32 24 the next is this is a g1g see as multiple faces and it shows which faces is it it says it is an evacuation phase the evacuation phase in which you are copying the objects from one region to another region it says what is the sub phase of it the next is it shows what kind of event this it is an N generation event it is an it's an NGC then here you see this number of worker threads like number of parallel threads that are configured for this in this case I configured 8gc worker threads the next section shows what is Eden is basically the N generation the Eng generation size before the event was 12 MB and after the event it became 0 MB but now here also you are going to see something within the parentheses 12 MB and here you also see a 14 MB if you see in your earlier log line we saw only one size this is basically allocated size of your young generation it is basically so on it it is 12 MB and now it is showing after the event the allocated size became 14 MB whereas let us look let us go back and look look at the earlier example this earlier example we have only one parentheses one size allocated which is close to 1 GB we do not see to two values within the parentheses the reason is G 1 GC allocates the memory at runtime dynamically so what it says is before this event was ran it was 12 MB the allocated size for n generation was 12 MB but now I made adjustments it run time delicates n it scales now it has become 14 MB does dynamically scaled the Eden size and next is it shows the survivor size and it went from zero bytes to 2048 kilobytes and then when the heap size it was 12 MB before from pearl MB dropped all the way to 8 MB but the allocated size for overall EEP remains constant it is 252 MB and similarly here also you see this times how much time it took to execute this GC went it's only 0.02 seconds so the g1 GC log format is slightly different but the information is same the data is same okay now let us get to the suspense part ok ready so if I showed this before probably everyone would have left it or been a very boring session because it is a same data repeated again and again and again but that is not true see there are so many different types of GC log lines which we will see now actually this is the agenda point all many more GC log lines should I learn because we just learnt only two or many more GC like log line formats we have to learn see the GC log format it varies by who is your java vendors whether it's HP IBM Oracle be a layer or even Azul it varies by what kind of algorithm you are using where it's serial CMS or g1 GC or parallel and in some in some Jamie in progress it varies by the version also we're using 1.5 1.6 1.7 and also it varies by what kind of system properties you pass the timestamp and all so there is if you're going to ask me how many more GC log lines should I learn answer is I really do not know but roughly I would say there is around forty to fifty different formats of GC logs are available and before busy this is what we are seeing is only a raw data from which we can analyze whether your application is suffering from a memory leak we can obviou wear applique is it suffering from consecutive GC pauses first you need to understand the data then when you can actually understand the data then you can analyze what is going on so so let us take a look at few few other GC log formats what I am talking see this is an another version of G 1 GC log format it is one single line like earlier we saw it was multiple line but here it is one single line with different format and different structure so if you are going to use a CMS log CMS algorithm then this format structure is going to look different totally different from what we were seeing earlier see this is an IBM's GC format GC log format they have a structured way which is an XML structure but it's it is totally different ten use over lower all the stuff and if you come to the most recent version of IBM their GC log format changes once again it is not the same it is XML but the structure is - but the structure and the data is different so this is a very painful process to understand and to debug to troubleshoot any of these GC so as I mentioned earlier so to address this problem I have created a tool called as GCE C which is basically a universal garbage collection log analyzer so to this tool you can upload I told the 42 tea formats in which I sub I process B process around like a more than 90 percentage of GC law formats today most of the widely used or even very minutely minorly used GC law family support so to this tool we can upload the GC log format the tool can parse whatever the the raw data and then can present you with real rich information not only with real rich information it can also diagnose it has patterns single intelligence patterns to diagnose and tell what is the problem going on in your JVM so let us do a quick example of this so let us look at a 1 GC log file this is the GC log for 1 1 GC log file see whatever we saw just one line there is going to multiple lines like this so now let's upload this log file to this - and then let's see what what what happens so this is a free tool free service no registration nothing is required to this you can just upload the GC log file just uploading the file okay so it upload analyzed parse and it is telling the information like see earlier it shows it shows you the breakdown of what is the size of your eat what is the angulation size what is a whole generation size what is the form generation size what is the total size in this case this JVM has a 12.5 gig of memory and it reports several other statistics as well like how long what are the duration it ran what is the throughput or is it rated by its promoted by its allocation rate promotion rate let us look at some of the graphs it this shows ours the heap is behaving during a run time like it shows like this was the the red triangles whatever you see they are basically the Fugees events in the Fugees even plan you can see a memory to be dropping how much it was going on there is few very key important things that you might want to look when you are analyzing the garbage collection log one of them is basically their latency they say the latency how long did it 1 GC went to complete like here you can see this graph shows what how long your GC events took to complete see some GC ones took like a 13 seconds another GC even took 14 seconds especially when a full GC runs your JVM pauses so if an EJB imposter customer transactions will have to wait so you might want to focus attention on this one on this particular graph so this tool is analyzed it and it is giving you some kind is kind of a problem statements saying our analysis tells that your application is suffering from long GCS six GC events took more than ten seconds so that means whether is it acceptable like for some applications even one second is not tolerate about one second of a positive is not tolerate able whereas for few applications like a several seconds is taller a table like here I am going to show you some of the GC log files where the past time has been for 14 or 15 minutes there are applications which are passing the GC the garbage legend running for 14 or 15 minutes okay and besides that here you can also you want to see you can want to see the distribution of how many is out of this it says for this JVM it ran for 120 hours 119 hours and 39 minutes in this total GC 4040 6224 times GC even prime during this window so pretty significant number out of this the minor GC is 46 thousand 150 times and the major GC ran for 74 times and what is the total time it spent on the GC total it's one hour and 43 minutes how much time you are spending on full GC how much time it gives you these kind of detail breakdown okay so now I want to see forty six thousand two and twenty four times my GC Ron what triggered these many GC is to run what is the breakdown of it for those things here's the breakdown it shows because of allocation failure forty six thousand and sixty-six times is the GC mannan allocation failure it happens when there is not enough in generation space is available that means maybe you might want widen your in generation space when you do that probably may not see this many GC events to run and see there are explicitly system dot GC calls are being made by this JVM by this application five times it has called system dot GC see it may be that our code from our code we may not be calling system dot GC but we use lot of third-party libraries frameworks like given application servers they explicitly call system dot GC without our knowledge it will get be invoked those things we will get into visibility by these kind of analysis like here maybe I don't know whether the application really by intention they are seeing this very explicitly calling the system dot GC calls okay see this as when you do a GC analysis it has overwhelming amount of information I am just going to give you three key performance indicators that maybe you want to take a look at it and take away these are only three things primarily you want to look when you analyze the GC for a performance the first thing is throughput yet it says 98.5 six percent so what is throughput throughput is amount of useful work your application is doing instead of spending what are the time it spends on garbage collection and what is the time it's been on doing this you're real useful work that is servicing your customer so in this case it's ninety eight point five six percent ninety eight point five six percent over time has been spent on servicing your customer transactions there is a remainder time they spent on doing this garbage collection work so you need to always shoot for a higher throughput try you want to keep increasing your throughput because you want to have less and less of garbage collection time so they are servicing more of for your customer so this is the number one key performance indicator that you want to look and the second one is this GC duration time this is basically a latency how much time does it take for garbage collection events to run and complete and the third one is the footprint which is basically you want to based on what GC algorithm you use how frequently your GC runs your CPU consumption will go i or go down so that is another thing that you want to look what is the CPU consumption but the CPU consumption is not reported in the garbage collection log you want to use some application performance monitoring tools or any top or those kind of tools to see what is the CPU consumption so I would say these are the primary things that you want to look when analyzing the garbage collection log file okay so now let us take a look at some of the troubled garbage collection logs so that you can get a feel like how to troubleshoot some of them okay so now uploaded a different garbage collection log file now the two last inbuilt patterns and it has identified that this application is suffering from a memory leak let us see why the tool says this application is suffering from memory leak if you see here this is how the graph the EP usage graph looks like the EP says has been going through fine it has been after this point it is started to climb up and from here if you see the application is only during a full GCS full GC is keep on running and let's even zoom in and see see it is only doing the full GCS but one thing if you notice even though it does the full GC the memory is not reclaiming it is not coming down it still remains at the the same point to you there is this graph which shows the reclaim bytes how much bytes are getting reclaimed on every GC event that you can see here earlier when the GC ran it is able to reclaim fair amount of write like a 2.5 gig 2 point when it was able to reclaim but after us after this point it is not able to reclaim any more memory your reclamation starts to go down go down go down it has at the rock bottom that means all the objects are having active references and your JVM is is firing a garbage collection but the garbage collection is not able to reclaim any memory so your memory still remains at the same state so once again the garbage collection is triggered so garbage runs again is not able to reclaim any more memory then once again then okay now the James is oh no my memory is not reclaimed and it is going to fire carbon so that means at this time your JVM is only doing the garbage collection nothing else so this is a classic indication of a memory leak right any questions okay so let us take a look at a another example of a GC log which is also suffering from a memory leak this is a 3.24 job at a JVM here you can see this JVM has been running for 14 days and in this 14 days it has been spending around 3 hours only for doing the Fuji C's let us look at the pattern here it was going through find like anything they started in August 20th it has been going fine then the Fugees happened the memory reclaims reclaims boom at this time you can see the red triangles the Fuji sees are happening repeatedly and let us look at the reclaimed bytes of them and see then you do not see any bytes to be reclaimed this is an indication that this application is suffering from a memory leak once you are in fact ok this memory leak has happened then you want to take a heap dump of your of your JVM and then do an EP dump analysis to see which objects are accumulating and what they are and then and then you want to fix it okay okay so now let us analyze this garbage collection log this is a very very interesting garbage collection log file do you see this number like it has half a terabyte of data of memory 483 gig JVM business very large implementation and it is running for 26 hours if you see this guy's throughput is only 89 percentage which is pretty it's not very optimal throughput it's I would say it so bad throughput to be targeted now let us look at the garbage collection log file it is going fine fine fine we do not see any memory leak pattern that we saw earlier but now let us look at the the GC duration C this is basically seconds not milliseconds that is few GC events is running for one GC this event is running for 885 seconds which is 14 minutes so for 15 minutes we can even go to we can take a big break and come back to whatever you want but the garbage collection is going on here see ya but the tool it has this thing it says okay we are analyzed in say 17 GCS took more than 10 seconds long running Jesus or unfavorable performance it points out these kind of problems to you that what it finds in the garbage collection log file yeah see that's a good point see that 10 seconds is a number I picked because for several applications maybe even one second is not even is not even on top it is not at all a table limit but for several applications this terms a passing for 20-30 seconds easily and they are comfortable with that so this has to be spit set specific to your application but if I keep it as a tool s of value then it will trigger alert for several people so it has to be customized and set for your application for your organization based on your need okay see now let us look at 1 G 1 G C log file see this is how a G 1 G C log file looks there's so much information let's upload this log file into the system into the tool and let's see what happens once it provides you this information what is your n generation what is a meta space size all the same statistics that you are so earlier for the other algorithms the graphs see you gon GC as several phases what is a cleanup phase mixed remark string duplication NGC concurrent mark initial mer so it gives you the breakdown the split up by each and every single phase as well all these consolidated summary you do not have to dwell and get it from the GC log file it is available and visible for you right there okay one more thing that I want to show this there is a REST API is also built to analyze your garbage collection log file see like via our JVM is running on multiple servers tens or hundreds of servers the thousands of servers based on your implementation we cannot go and analyze every single GC log file because it's tedious and it's impossible so what we have developed is and REST API to this REST API you can take your GC log file and then you can publish you can just make a rest call with your GC a with your GC log file it's going to pass and then it is going to give you the response back so let us let us do a quick demo of that see it is very simple here you need to give the URL you need to you need to pass a API key in your HTTP header this you will get when you register with us and then you need to be pass then here you can you can paste your GC log file once you do that when you do a post you are going to get immediately get back a response back so this is the response I form formatted which is which you are seeing here here whatever the visually you saw on the web version all those information is present here in this GC but one very important element maybe you want to look is this element is problem so this value will be true if your application is suffering from any problem like a long GC or whether is it suffering from memory leak or is suffering from the consecutive GCS or any of the problem it detects it applies the pattern detects and reports you whether it's true or false so maybe you can this has been published in its reasoning huge amount of traffic every day several hundreds of thousands of people are using this API and they are pushing that DC log front so this can be used to analyze all your production JVMs plus also it can be integrated as part of your CI process in CI continuous integration I am seeing people are doing performance tests also so maybe once you do it if you have a garbage collection there you can just stream it into this API this API is going to revert you back whether is it successful whether it's a that it's doing good or whether you are seeing any problem so those kind of things you can do with this tool yeah correct no no right now no yes that is something as when they thought about yes okay see I am going to show one more pattern see this one this pattern this we are calling as a consecutive full GCS what does this mean let us look at this graph you see the full G C's have been running it's been happening if you see there is here you see a cluster of full G C's to be running repeatedly zoom in the cluster of Fugees have been running running running and it is not able to recover from that but after a point it starts to recover unlike the earlier memory leak only the full G C's runs repeatedly in it it won't be able to recover from that whereas here it is able to run and then it is able to recover from it let us look at the recline Bynes if you see it is starting to drop the recline bites is starting to drop but after a point it starts to pick up this typically indicates that you're you need to allocate little bit more memory to your JVM because maybe at this point you are experiencing AI traffic volume more than what you are expecting and then the JVM was not able to handle that but once a traffic volume goes down or your research line goes down then it is able to recover back this is a symptom to indicate that you need to allocate a little bit more memory application so these kind of several patterns are there in the system which you can which it applies and tells you what is the problem going on so any questions from anyone yeah yes yes you are asking whether that API exists to my knowledge no it might be but to my knowledge I do not know but but but I also thought about that several times maybe I can explicitly even we can take it out of the rotation and we can call system GC and then after some time we can put it back into the go yes okay so add a one more and yeah see this is a free service but the to which I am going to open source it and get up shortly yes yeah see right now it's a JVM like you can Jing we can run any language like a Java Scala JRuby jet on anything this will work but right now it's only for a JVM it operates at the gym but dotnet and all I don't think yes it works for any language which runs on in fact suprising I learned that there are 52 languages that runs on JVM today it's not just Java but this operates at the JVM level so it works across all the languages okay any other questions from anyone okay if not then that's fine I think that's it we'll wrap the us meeting okay thank you everyone
Info
Channel: Oracle Developers
Views: 7,266
Rating: undefined out of 5
Keywords: Java
Id: 4HTPzqaYuUQ
Channel Id: undefined
Length: 46min 44sec (2804 seconds)
Published: Wed Sep 21 2016
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.