Using FlameGraphs To Illuminate The JVM by Nitsan Wakart

Captions
My name is Nitsan, and I'm going to talk to you today about flame graphs and how we can use them to gain a bit of a different perspective on our code and on the JVM. There are two slides: there's the one where everything works out great, and there's this one where things work out differently for us. Hopefully we get the first one.

Right, I'm a programmer, I work on performance. I'm currently working on Aeron with Martin Thompson. I used to work for Azul, and I still love Azul; we broke up amicably. They make a very good JVM. Has anybody heard of Zing? Awesome — it used to be like three people, now it's half the room, great. I have a blog; does anybody read the blog? A lot fewer people. Excellent — the rest of you should read my blog, it's excellent. And I contribute to a bunch of open source projects; the only ones that are relevant to this talk are Honest Profiler and perf-map-agent, so we'll mention them just now. I'm also a JUG organizer, so if any of you are ever in Cape Town, and you feel like you've had enough of your holiday and you've drunk enough wine and you want to talk to the local community, give me a shout and we can sort something out, or you can come join us; it's always nice.

Right, what is the root of all evil? I feel that as an industry we're fortunate to have an answer to this question. I think a lot of other people are very confused: they think it's Marxism, they think it's violence, or capitalism, alternatively. But we know there's a truth, there's a real root of all evil, and this man has told us what it is. Basically what he's saying is that in the age of writing assembly, and even in this day and age, when we write a system we write a lot of code, and most of the time when we write that code we should forget about small efficiencies. It doesn't matter if you concatenate strings or allocate a lot when you're dealing with code that might never run, or might run once and then be forgotten about. There's important code and there's less important code, and prematurely optimizing the wrong code is obviously the wrong thing, because premature, as we all know, is a bad thing. But where are the big efficiencies? How can we find the right time to optimize? How do we know what to optimize?

Right, and this boils down to this lovely, confusing phrase: in performance, often the problem is not the problem. The problem as phrased by your clients, who come back and say "this page is terribly slow", "this application sucks", "every time I use your code I want to hang myself" — that is obviously not helpful. You want to know where the problem is, so you have to use some tools, and they present a bunch of symptoms, and from those symptoms you need to find out if it's a configuration problem, or a problem with your code, or actually maybe there's no problem at all — it's all perception, and your client is just having a bad day.

All right, so what do we do? If anybody's been to Martin's talk, he mentioned some of this; this is a brief summary of my view on it. First of all, you have to have requirements. If someone comes to you and says "I want a fast application", that doesn't mean anything — or rather, it means many things. You want to know what they mean when they say fast: do they mean throughput? Do they mean low latency? Do they mean a thousand requests with hundred-millisecond latencies, or a million quotes per second and sub-microsecond latencies?
Okay, so you get requirements, and then you measure at the level of those requirements. Measuring, as some people do — we won't point fingers — the internal latency of a particular bit of code in your application is not so much useless as it often does not reflect what your clients see. So you need to measure at a level that is meaningful to the requirements you've collected. And once we've done the measurements, once we've established that the problem we have is one that can be fixed by improving CPU utilization in particular, then we profile, and we use a profiler. Has anyone in this room never used a profiler before? Don't be shy. Everybody's used a profiler. Profilers are not part of the syllabus at university, apparently, and many people haven't used one, but all of you have, which is great — or maybe you're just lying; that happens too.

Right, you've profiled, you've improved the code, you've done your bit; now you measure again. The reason you measure is that if you've improved the profile and it didn't fix the problem, then you're not helping anyone. So there's a cycle here: you change something, you see the effect, you hopefully find the right spot to improve, you improve it, then you measure again. If you don't measure again, then you're just doing it for your own enjoyment, which is bad, obviously.

Right, everybody here knows about Java, right? We're not going to have to explain Java, great. Anybody here used flame graphs, knows what flame graphs are? Right, everybody knows a little bit. Anybody here used perf before? Even fewer people; those of you who have, good on you.

Right, this story starts with Brendan Gregg. Brendan is the performance Superman for Netflix, and he looks at huge clusters of machines, lots of applications, and when things go wrong, they go to Mr. Brendan.
He invented and developed flame graphs — not just at Netflix, also elsewhere — and this is a tool he uses, and I think it's important to understand his motivation here, as the man behind the tool. This is how he describes it: "Flame graphs are a visualization of profiled software, allowing the most frequent code-paths to be identified quickly and accurately." This is a man who looks at applications that he didn't write. He needs to make sense of complex applications — not a benchmark where you have twenty lines of code, but a genuine web app with different kinds of requests and different code paths. He doesn't know the application, and he wants to make decisions quickly and find out what the hell is going on. So this is his focus when he looks at flame graphs.

And what flame graphs look like is something like this. If you've never seen a flame graph, you can admire the colors and the fact that it does sort of, kind of, look like a flame. We're going to look at flame graphs, what they mean, and how to interpret them, in particular in the context of Java.

How do we get flame graphs? There are different inputs you can feed in; the one input we care about is sampling profilers, and sampling profilers all work roughly along these lines. You collect stacks, which means it's as if you put a thread stack-trace dump everywhere in your code, and whatever you collect, you throw into a file, and later on you process it. You'll collect the stacks every once in a while — maybe ten times a second, maybe a hundred times a second, maybe once a minute; it doesn't really matter — you'll collect lots and lots and lots of stacks. Okay, you have all this data; if you were to read through it visually, it would take a very long time, so you want to present it in a concise way. By the way, some people do this and have success with it: they look at an application, say "it's very busy, I'm going to call jstack" — which is a command-line tool to dump out all the stacks for the application — and they'll do that two or three times and say "aha, this is where my application is spending its time". Sometimes you're right, and sometimes three samples are really not enough. So for broad observations it can work out; I wouldn't recommend it as common practice.

Okay, how do we present the data? The data is presented in one of three ways — there are probably other presentations, but three is a nice number for everybody to think about. There's the flat view, which is usually the list of hot methods in your application. You see a list of methods, and you don't bother drilling into where they came from; maybe there's one method at the top of the list — maybe you're spending all your time in HashMap.get, and then you think "I need a new hash map" — or you're spending all your time in a method you know is inefficient, and then you go and optimize that method. Often that list is lacking in context, because spending all your time in HashMap.get doesn't mean there's one call to HashMap.get that is the problem: your application is probably calling HashMap.get from a bazillion different places. So where is it that you care, and where don't you care? You don't need to replace all the hash maps in your application; maybe there's just one that you care about, so getting that context is important.

The other traditional view is the tree view. Those of you who've used profilers know that clickety-click experience in the tree view, where you go down and down and down the stack and drill through all of it.
It's all about the context: you start from the top and you drill all the way in to find that one hot method at the top. And then we have flame graphs.

This is what the flat view looks like — here we have some default headers at the top; this is Java Mission Control, and since everybody said they use a profiler, you all know this view. And this is the tree view, and this is a demonstration of why it sucks, because I opened one node here and the rest of my application disappeared from my screen, because I clicked, clicked, clicked and my screen is full. Also, method names and class names fill up the screen very quickly, so I have to scroll sideways, I have to scroll up and down, I have to open different nodes and scroll around, and it becomes a very frustrating experience. If you've ever done this, then you know what I mean — especially, again, in real-life applications.

Okay, and then we have this, and you probably think: how can I get one? How can I get one of those flamey things? You use a profiler and collect all the stacks you can — you can use VisualVM, you can use JFR — you collect a file full of stacks. You take those stacks and convert them to a unified format, which is the collapsed stacks format. It's just a text file; we'll have a look at the format just now. You take that and you can hack on it: you can improve it, you can grep through it, you can do whatever you like. At this point we're looking at a text file, and I think most of us who have ever had to scrape through logs can do some nifty things with it — think of it as a log. We take the collapsed stacks and we shove them through a Perl script. Anybody here know Perl? Good on you. I had to spend an afternoon learning it; it's a scripting language, it's not the end of the world, and it's a very short script. That produces an SVG, which is an interactive graphic — the flame graph. Again, this is sort of a text transformation, from lots of frames to lots of squares on the screen, but you can just jump in there, because it's just 500 lines of code. It's Perl — not the most accessible language in the world — but we're not talking about a massive dive into something like JVisualVM, which is a complex project with lots and lots of code, where you would need to set up your IDE and a whole development process. Nobody's going to dive into JVisualVM and add a feature — or very few people — whereas if you spend 30 minutes, you can maybe add a feature you're interested in to flame graphs. (I'll sketch what this tool chain looks like at the end of this passage.)

Okay, so this is a flame graph; we've seen it three times. What does it mean? This young fella here is thinking the same thing. Right, this is the dry explanation of what it means. The y-axis is the stack: you have one method calling into the other, and the flames grow higher as your stack goes deeper — or longer, whichever way you want to look at a stack. The x-axis is a representation of the number of samples; it is not a representation of time. So when you're looking along the x-axis, it's not the progression of time: the wider the frame, the more samples fell into that frame. And the frames are sorted alphabetically. The one thing left to explain is the colors: the colors we've seen so far are basically random colors from a palette of reds. That's Brendan's choice of user experience — I'm not sure it's everybody's idea of a usable interface, but it helps distinguish the frames, which is what he was aiming for.
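A minimal sketch of that tool chain, using the poor man's jstack collection mentioned earlier; both Perl scripts live in Brendan Gregg's FlameGraph repository (https://github.com/brendangregg/FlameGraph), and the file names here are illustrative:

```
# accumulate samples by dumping all thread stacks repeatedly
jstack $PID >> stacks.txt        # run this a number of times

# convert to the unified collapsed stacks format, then render
./stackcollapse-jstack.pl stacks.txt > collapsed.txt
./flamegraph.pl collapsed.txt > flames.svg   # interactive SVG flame graph
```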
When we're looking at the Java flame graphs, we'll see different colored frames, and the different colors will mean something — but still within a shade: all the green frames, for instance, will have the same kind of meaning. Okay, now I'm going to switch to my browser, because none of the presentation tools I've tried actually work for SVGs. So here we go.

Right, I mentioned the collapsed stacks format; this is what it looks like, and it's very simple. We have a semicolon-delimited string, and each part of it is a method: we have main calling into one, and so on, and at the end there's a space and the number of samples for this collapsed stack. (There's a hand-made sample at the end of this passage.) This is the simplest case of the format, and this is what the flame graph we produce from it would look like: there's only one sample, so it takes up the whole width of the screen — 100% of the time is spent in that stack. We could click through, but it wouldn't actually do anything, because there's just one sample, one stack. Okay, one sample is not a very scientific thing; we want lots of samples. What would this look like with a million samples? It would look exactly the same, because it's the same stack — just a million times the same thing. It's the same data: a hundred percent of the time spent in the same place. There's nothing to drill into, and it's not particularly interesting, but since a hundred percent of the weight is in those frames, we get the same picture.

Right, this is a step forward: all the stacks here start from the five method. Everybody goes through five, which then calls something; we have some sub-methods with some time spent in them — some samples collected in them — and this is going to look slightly different. This is how it looks. I've zoomed into five; unhelpful though it may seem, this is the full view, and zooming into five looks exactly the same. I can zoom into seven, and then seven will expand to fill up my view, and I can see that seven takes up eighty percent of the samples, while five appears in one hundred percent of the samples. This is a good time to realize that the number of samples in the collapsed stacks file is the self time — the self samples — whereas the number of samples we report in this view is the total time. If you've used a profiler before, you know this distinction: the self time is the amount of time spent in the method itself; the total time is the amount of time spent in the method itself plus all the methods it called. So here, a frame's width reflects its total time.

And then we'll go to one last example. Again, the colors mean nothing here, but the important thing to see is that things are sorted alphabetically, so a is to the left and b is to the right. We have the method seven appearing in different places, and we may think: I want to know how much time I'm spending in seven. So I can search — I'll look for seven — and I find I'm spending 41% of my time in seven, which is kind of handy; we'll see this in use just now. And I can drill into six and see that it's got three callees, and so on. Okay, that was a nice, gentle introduction to the topic.
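To make the format concrete, here is a hand-made collapsed stacks file in the spirit of the example above; the method names and counts are invented, and the trailing number on each line is the sample count for that exact stack:

```
Main.main;Main.five;Main.six 2
Main.main;Main.five;Main.seven 8
```

Rendered as a flame graph, five would span the full width (it appears in all ten samples — total time), while seven would span eighty percent of it.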
So, we know all about flame graphs; what can we feed this presentation method? Well, these are the tools we use for profiling, apparently. This is a survey done by the JRebel folks a couple of years ago — they do a bunch of these surveys every once in a while and publish a nice report. Anybody here used JVisualVM? Okay. Anybody used any tool on this list except Java Mission Control? Okay. Anybody used Java Mission Control? Fewer people. Java Mission Control has been with us for a bit — since Java 7 and a bit — and it's worth trying out; it's definitely a better class of profiler than JVisualVM. The chart is misleading, though, because all the profilers on it — I don't know about the custom in-house tools, but I'm guessing — except Java Mission Control are using the same API, and all of them fall under the same category.

We'll rush through this, because I can't spend the whole time talking about safepoints. Who here knows what a safepoint is? Right: safepoints are places, thrown around your code, where the JVM can stop and safely collect the garbage, safely inject new code, safely deflate monitors — all kinds of things happen at a safepoint. The JVM can't just stop your code wherever it is and do its thing; it has to ask your code to voluntarily stop, so your code has lots of little safepoint polls, and because we wouldn't want to spend all our time on safepoint polls, they are few and far between. So if I'm only sampling at safepoints, I can only see you at a safepoint, and any code that is between two safepoints you're not going to be able to see in the profile. (A sketch of such a blind spot follows this passage.) Now, this is pretty bad by itself, and it doesn't help that every time I sample, I effectively have to safepoint the JVM, which is a stop-the-world operation — which means I'm profiling your application when it's not working. I've stopped it; it's no longer working; I'm taking a picture. It's as if I had to take a picture of this room with everybody not blinking: we'd all have to sit around and wait for everybody to not blink, I take the picture, and then you can go about your lives. And the last thing is that each sample collects from all the threads, which makes this a stop-the-world, unbounded operation. I say unbounded because the more threads you have, the longer it's going to take, and the deeper your stacks, the longer it's going to take — multiply those two factors. And it's all your threads: the threads you started and blocked, your thread pool with a hundred threads in it — you're going to go through those hundred threads every time and collect all of those samples. But hey, it works on every JVM, because it's part of JVMTI, your guaranteed API for every JVM, and it's been there forever. So all the commercial tools use this, because it's the lowest common denominator. You can produce flame graphs from this data; I wouldn't necessarily recommend it, but sometimes that's what you have, and that's what you can use.

The next level up is Honest Profiler and Java Mission Control. They both use a very similar internal API that allows you to sample the stack at an interrupt: you basically interrupt the thread wherever it is and you collect the stack. This is great because you don't have the safepoint bias, but there are certain places where the collection will fail using this API; in particular, there are a lot of runtime stubs that you can't see using these tools, and we'll see what we're missing in a bit. All of these are also blind to GC threads — because we only care about the Java profile here — and to compiler threads, native threads, etc. These work so that you can have Honest Profiler from Java 6, you have Java Mission Control since Java 7, and you can generate flame graphs from this data.
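A hedged illustration of the safepoint-bias blind spot mentioned above — this is not one of the talk's own demos. HotSpot typically emits no safepoint poll inside a counted int loop, so a safepoint-sampling profiler cannot attribute samples to the loop body:

```java
public class SafepointBiasDemo {
    // All the real work happens in a counted loop with no poll inside it,
    // so safepoint-based samplers attribute the time to the surrounding
    // poll sites rather than to the loop itself.
    static long work(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) { // counted loop: no poll in the body
            sum += data[i] * 31L;
        }
        return sum; // samples pile up at the next poll, after the real work
    }

    public static void main(String[] args) {
        int[] data = new int[1 << 24];
        long blackhole = 0;
        while (true) { // keep the method hot so it gets compiled
            blackhole += work(data);
            if (blackhole == 42) System.out.println(blackhole); // defeat dead-code elimination
        }
    }
}
```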
Again, if this is where you are, this is what you can do; it's definitely a better place to be than the JVisualVM family. So, Java profilers work — most of the time. When I say they work most of the time, it sounds pessimistic; we've used these tools for ages, they do work, they do highlight some hotspots, and they do help us find places in our code that can be improved. It's not that they're completely useless, but we can do better.

Before we dive into what it means to do better, we need to remember what the scope is when an application doesn't do well. There's the OS we're using: maybe there's an OS configuration issue, maybe we're in OS code somewhere in the kernel. Maybe the JVM has eaten all of the CPU time: we're running back-to-back GCs, we're compiling, maybe there's a compiler issue and the compiler is spinning. Maybe we've called into some native library where we're spending all our time, or it allocated some background thread that's consuming a CPU — this can happen with messaging libraries, for instance. And maybe it's in your code; that does happen.

The problem is, the JVM doesn't run your code. The JVM doesn't run Java, it doesn't run Scala; it runs bytecode, and that bytecode goes through the magic that is the JVM to actually run native instructions on your CPU. Your bytecode starts in the interpreter, and that's where things are slow: the interpreter steps through your bytecode; it's a fancy state machine that's as fast as it can be, but it's still an interpreter, and being interpreted is not that great for speed. Once we've determined that your code is really hot, really important, run all the time, we're going to compile it. The world is slightly more complex these days than it used to be: it used to be that you just had the client compiler and the server compiler; now you have tiered compilation, so you have code going through different levels of compilation. But once it's been compiled, it's native code — it's assembly — and it is far more efficient than it can be in the interpreter. And finally we have inlined compiled code. Once we hit inlining, things get a bit mushy in the real world, because if you have a method that calls three other methods, and I've compiled all of them into one code blob — which is an amalgamation of all this code — and reordered it to be efficient, it's kind of hard to say which method is which. So when profiling inlined code we can have quite a lot of inaccuracy, and there's really no great way to fix that, but it's worthwhile being aware of these limitations. Failure to inline code is also a big thing, so it would be good to know about inlining and to see which code is inlined and which isn't.

Enter perf, the Linux profiler. It's a system profiler: it can see user code, it can see kernel code, it can see everything, and it's truly a multi-tool of many, many uses. It's not installed by default on every machine, but it's a reasonable request to ask your ops person to install it; it's not beyond the wit of anyone, and it's considered a safe, reasonable tool to have. And it now works with Java. It didn't use to work with Java — if you looked it up on Stack Overflow, "I want to make perf work with Java", it was really complicated. It's not complicated anymore. To make perf work with Java, a lot of people contributed — I contributed a tiny bit — but the big contribution here came from two main forces: there's perf-map-agent, which was developed by Johannes Rudolph, and Brendan Gregg, who poked the OpenJDK
people until they decided to allow us to walk the stack in a native profiler. They implemented a feature that's been around for C for many years, which is keeping the frame pointer around so you can walk the stack. So OpenJDK took his patch and productized it; I contributed a tiny bit to perf-map-agent to make it deal with inlining; and T Jake made the visualization work in flame graphs. And now we can have Java flame graphs.

How do you make this work? There's a long list of requirements, but basically you need Linux, you need a recent JDK, and you need to enable some options — only -XX:+PreserveFramePointer is really required — and you will need some permissions to profile; you'll need perf installed, and so on. There's a blog post to help you make this work — there are a bunch of them, actually — and it's quite straightforward. (The setup is sketched right after this passage.)

What do we get for all of this effort? We can see Java, we can see native code, we can see the kernel. perf, like I said, is a multi-tool: we'll only be looking at cycle-type profiles, but you can use it to look at page faults, at cache misses — at anything, really, that you can suck out of the counters available to you. It's low overhead, and it doesn't have a safepoint bias. What do we lose? We lose a register, and we have increased overhead: when we call methods, we now have some prologue and epilogue work to manage that register. It shouldn't make a difference if your methods are reasonable, but if you've got a stack a thousand methods deep of tiny, tiny methods that you climb up and down manically, you're going to see an issue. Normally we're talking about between zero and three percent loss; I've usually seen it close to zero, but in edge cases it can be worse than that. We can't see interpreter frames in perf, because the interpreter all looks like one big blob to perf. We get broken stacks — they might be fine within Java, or they might not show up in Java at all, but they're a reality, and as the tool evolves, it will probably improve on that. We have limited stack depth on older perf versions; on newer Linux kernels, which not everybody has, you can have a larger stack depth, and that's okay. Finally, we have a lot of new information, and in the 20 minutes remaining we will try to understand what that information is.
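A hedged sketch of that setup, assuming perf, perf-map-agent and the FlameGraph scripts are already installed; the jar name and script paths are placeholders:

```
# keep the frame pointer so perf can walk Java stacks;
# DebugNonSafepoints improves the debug info used to attribute inlined code
java -XX:+PreserveFramePointer \
     -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints \
     -jar myapp.jar &

perf record -F 99 -g -p $PID -- sleep 30   # sample the JVM at 99 Hz for 30 s
./create-java-perf-map.sh $PID             # perf-map-agent: symbol map for JIT-compiled code
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > java-flames.svg
```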
Okay, now the fun begins. What is this? This is a lot of Java code — can people read this font, or do I need to zoom in further? Can people at the back read this? Is this better? Okay, cool, lovely, stop complaining.

Right, this is what the profile would look like if we were using an AsyncGetCallTrace kind of profiler. The main point here is that we can now see inlining. What do I mean, we can see inlining? We need to understand the color scheme here: the yellow frames are C++ calls; the red ones are native — the interpreter and a bunch of unknown frames here; the green ones are real frames, actual code blobs, actual methods on the stack; and the bluish frames are completely a figment of our imagination — they are inlined methods. This method and this method have basically disappeared so that we can call this method. And we can now see some interesting things, even without the extra data we'll get to in a minute. We can see, for instance, that this method looks like it's got a lot of self time, and we can see that it has already inlined quite a few methods. For instance, it has inlined charAt. If you've ever used charAt, it's a good thing that it gets inlined, because if it didn't, you would have to reload the string — or the char array — every time you call it; but because it gets inlined, your char array can be hoisted out of the loop it's presumably in, and so on. So it's great that we can see inlining, but how is that helpful?

If we look at this method — I'm going to zoom out just for a second here — we can see that this method got inlined into this method, but the callees didn't get inlined. Now, this is the hot method, obviously; why didn't we inline the callees? What we can see here is that there's this write method, and this write method, which is another type adapter, and this write method: we have three different types here, all implementing the same write method. In Java, wherever you have a method call, it's a virtual method call — all the method calls in Java are virtual — and this used to be a big heckling point from C++ and C developers, because virtual method calls are bad. Now, the JVM goes to great lengths to convert virtual method calls into static call sites, so we know who we're calling, and once we know who we're calling, we can inline. We can do that through interfaces, and we can do it for plain class calls: even if the class isn't final or the method isn't final, maybe the compiler knows there are no other implementations of that method and can then inline it. And inlining is really a great improvement, because once we've inlined, we can mangle the code and eliminate most of it, maybe, or just optimize around it in ways we can't if we have to actually call into that code.

It didn't happen here, and my guess is it's because we have three different types implementing the same method, all getting called, which means the call site is what we call megamorphic — mega as in lots. If it was monomorphic, it would get inlined; if it was bimorphic, it would have a little if and would still get inlined; but in the JVM's eyes, three is a case of one too many. Two will get inlined; three, no — there's some cutoff point at which inlining stops. Now, maybe there's an optimization opportunity here, maybe there isn't; it's kind of hard to say, but hey, at least we have some new information. (A sketch of such a call site follows this passage.) And if we were to look at this profile, and this was all we had, we would probably think: okay, this JSON writeString method is going to need some work; I'm going to have a look at that and see if I can improve my code. Now, that would be stupid. Let's not do that.
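A hedged sketch of the mono/bi/megamorphic distinction described above; the types, the call site and the counts are all invented for illustration, and you could watch HotSpot's decisions for code like this with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining:

```java
interface Writer { int write(String s); }
final class JsonWriter  implements Writer { public int write(String s) { return s.length();     } }
final class XmlWriter   implements Writer { public int write(String s) { return s.length() + 1; } }
final class PlainWriter implements Writer { public int write(String s) { return s.length() + 2; } }

public class CallSiteDemo {
    // One call site: with a single observed receiver type it can be inlined;
    // with two, a guarded "little if" still inlines both; with three it
    // typically stays a dispatched (itable/vtable) call.
    static int drive(Writer w, String s) { return w.write(s); }

    public static void main(String[] args) {
        Writer[] writers = { new JsonWriter(), new XmlWriter(), new PlainWriter() };
        long sum = 0;
        for (int i = 0; i < 100_000_000; i++) {
            sum += drive(writers[i % 3], "hello"); // three receiver types: megamorphic
        }
        System.out.println(sum);
    }
}
```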
Okay, let's look at what we're missing when we use the AsyncGetCallTrace profilers. This is the slightly wider picture, because now we can see runtime stubs. Runtime stubs are a funny thing: they're generated code, native code that we call into from Java, but we don't know that we're doing it. When you call System.arraycopy, you'll end up in a runtime stub, but also when you call String.indexOf — a lot of the string methods end up in some sort of runtime stub, and some of the zip methods do too. There are a lot of places in the JDK where the team went to a lot of effort to optimize: obviously these were hotspots that used to be a problem, and once they were improved, they also became invisible to anybody profiling a Java application, who could no longer see them.

One of the things we can see, and should see, is array copies. If we look at this code, which is all about strings — allocating strings and concatenating them and forming JSON, because JSON is all the rage — and we look for arraycopy, now that we can see it, we'll see here at the bottom that it's 17% of the profile that we've been missing. That's 17% that's been swept under the rug, that you couldn't see before. And it is phenomenal that you can see it, because array copies are a huge thing: if you look at any optimization manual — the Intel optimization manual practically has a short essay on the topic of array copies and memset — every compiler optimizes heavily for memcpy and memset. You should see them; otherwise they wouldn't be important. If your code never did this, why would people invest their time in improving it? Your code is doing it; you just couldn't see it. So try this new tool — you can see it now. (A sketch of the kind of code that hides this cost follows this passage.)

Another thing we couldn't see before is this little friend here — this is the same bit of code we were looking at before; I've gone back to a really, really tiny font, I'm really sorry. This is the write method calling into three implementations of the write method, and I see some new friends here. One is Mr. itable stub — the interface table stub, which looks up which interface implementer is which — and the other one is the vtable stub. Put together, these are like five percent. This is five percent I'm paying that I didn't know about before; this is five percent I'm paying for not inlining, in this particular case, even before I got any benefit out of inlining as an optimization — it's just the price I pay for a megamorphic call site. So, is there more of that? Now that we've seen it, let's go back to the full graph and search for "table", and we'll see there's like seven percent here. So put together, that's 25 percent of the profile that was just invisible to us up until the point we rediscovered it — that's quite a lot.

One other thing here, which is perhaps telling, is that we can see an allocation. You can't see allocations normally; the reason you can't see them, without looking at assembly code, is that there's no debug information for them and they get inlined. Once they've gone through that amount of processing, they're just invisible — part of a larger code blob with no metadata to tell you they're there. This is where the idea that allocation is free comes from: you can't see it, so how would you ever know how much it costs? It's actually very hard to put a finger on how much allocation costs in your application — how much time you're spending allocating and zeroing memory. But here we can see this guy, new_array_nozero. How come we can see it? The reason is that it ended up in some GC code, and the reason it ended up in GC code is that allocations sometimes exhaust the heap, and when we've exhausted the heap, we have to ask the GC to clean up so that we have some new heap to trash. That's what happened here — which sort of begs the question: where is all the GC CPU? It is elsewhere, and we'll get to it in a tiny bit.
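A hedged sketch of the kind of string-building code whose cost hides in arraycopy runtime stubs; the method and its JSON shape are invented for illustration:

```java
public class HiddenCopies {
    // Each + concatenation compiles to StringBuilder appends; the appends
    // grow and copy char arrays through arraycopy runtime stubs, which an
    // AsyncGetCallTrace-based profiler cannot attribute to this method.
    static String buildJson(String key, String value) {
        return "{\"" + key + "\":\"" + value + "\"}";
    }

    public static void main(String[] args) {
        long chars = 0;
        for (int i = 0; i < 50_000_000; i++) {
            chars += buildJson("name", "value-" + i).length();
        }
        System.out.println(chars); // keep the result alive
    }
}
```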
Before we move on to the larger scope: this is a very confusing picture. This is Netty code — in particular, this is the Netty event loop. Anybody here use Netty? You're using it even if you don't know it; everybody's using Netty, because pretty much every framework uses it: if you're using Spring, it's using Netty; if you're using Cassandra, you're using Netty — it's pretty much everywhere. The Netty code calls into native code, and this is where we can see all these orange frames of kernel code. Now, I think for most of us, looking at kernel stacks is not hugely intuitive or informative, but for someone like Norman Maurer, who's the lead developer of Netty, looking at profile data for kernel code can lead him to, perhaps, decide to reimplement a bit of NIO so that he can use different OS calls — which he has. So if you're using Netty and you're on Linux, you can now use the native epoll implementation, and you'll get better performance for it. Again, this is a tool that enables Norman to make those sorts of decisions, and perhaps you require that sort of insight as well; most people don't, which is fine. Another thing we see here is the broken stacks issue: we see all these stacks that don't really start in Java — they start in some unknown frame — and this is part of the shortcomings of the tool. It's something you'll have to contend with if you're using this data; it's not impossible to live with, but it's there.

Finally, there's this bit of code — right, this is a Java profile of a Cassandra server, and what I wanted to show here is that with a bit of search-and-replace, I took a flame graph that looked very much like the Netty one, with a lot of confusing stuff, and cleaned it up: you go into the collapsed stacks file, you look for all the interpreter frames and you collapse them, because you don't really care where the interpreter spends its time; you bunch together all the broken stacks and put them in one place; etc., etc. You end up with something that is a lot more usable. (A sketch of this kind of clean-up follows this passage.) In particular, you can see here a familiar friend to those of us who use profilers, which is Unsafe.park. If you've used JVisualVM or jstack or whatever, parked threads are in the state RUNNABLE, which is very confusing, because they're parked — so what are they running? Parking, or rather unparking, a thread costs you something: it costs you CPU cycles to do that operation, and what you see here is that cost expressed. When we look at this profile, one thing that jumps out — it's a big profile, it takes a while — is that Cassandra is spending an awful lot of time parking threads. Maybe there are too many worker threads in the worker pool; maybe there's a bit of thrashing, and the tasks are not getting distributed such that you wouldn't have to put threads to sleep and wake them up all the time. If you see that sort of thing, consider rebalancing your thread pools, or find a more efficient implementation.

Okay, so we looked at the Java portion of a profile. We looked at inlining — data that's not really available to you unless you're looking at inlining logs from the JVM, so most people don't look at it, and it's valuable. We looked at runtime stubs, which in most applications amount to quite a lot of code that you usually can't see, and which is close to native. And there's the park cost element. I would really like to encourage people to dive into the collapsed stacks processing opportunity and make up their own tools — we can really gain something from this. We can trim the data to only the bits we want, we can collect the broken frames automatically, and finally, if anybody wants to spend an afternoon on flame graphs, I'm pretty sure they can produce a color mapping scheme that would color the JDK classes differently and different frameworks differently, so that when we look at profiles these things jump out at us and we can identify more opportunities.
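A hedged sketch of that kind of collapsed-file processing; the frame names to fold are assumptions and will vary from profile to profile:

```
# fold everything called from the interpreter into a single frame
sed 's/Interpreter;[^ ]*/Interpreter/' collapsed.txt > folded.txt

# bucket stacks that start in an unresolved frame under one name
sed 's/^\[unknown\];/broken_stacks;/' folded.txt > cleaned.txt

./flamegraph.pl cleaned.txt > cleaned.svg
```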
This is not something that is readily easy to do with other tool chains, shall we say, and I really think we can and should do something towards this.

Now, last but not least: we were looking at this benchmark a short time ago, but we were only looking at the Java cost, and as you can see at the bottom here — it's hidden by the desk and it's really small — that's 62 percent of everything; we were only looking at 62%. If we wake up from the dream that Java is everything that happens, we'll see that in this particular benchmark, looking at the Java code alone is highly misleading, because 40% of the time is actually spent in GC. I say 40% of the time; perhaps I exaggerate — I can look at the GC task and see it's 26%, and there's this other stuff happening. Looking at the wider picture can give you an idea of the CPU utilization that goes to the JVM, in particular to GC. A lot of people, when they talk about Java performance, talk about GC costs, but it's very hard to put a figure on it, and this gives you a profile that enables that observation. It also enables you to look into what the GC is doing: maybe you can learn from that, maybe you can look at particular bits of code that look suspicious to you and find out exactly the activity you should be tuning for. One thing that is confusing when you're looking at GC here is that this is a young GC, and it's stop-the-world: these four threads used up 25% between them, but they run concurrently with each other, so they probably ran for just 5% of the wall time — and in that 5% of the time, they were blocking my code from executing. So it's kind of hard to do the maths when you're looking at multi-threading; I don't have a really good solution. I think with a bit of tooling we can make utilization problems a bit easier to solve, but at the very least you have the data to start playing with.

The next thing I wanted to look at is what happens when you use a different profiler — this is sort of a meta exercise of profiling the profilers. We have this benchmark we just looked at, and — this is one of the nice things about flame graphs — you can see immediately that this is a different profile: it's the same application, but something has changed. If we look here — again, I apologize to the people in the back about the font size — we can see that this is the compiler thread jumping in and sucking up some of the CPU. We can also see that our code looks very, very different when you enable an instrumenting profiler: it changes your code, it injects a lot of timestamps everywhere, and two things happen. One, all your code goes back to being re-interpreted and recompiled — being slow and getting fast again; that whole cycle starts from scratch, and the compiler has to do a lot of work to get it back to being fast. Two, your code is no longer your code: it's got a lot of new code injected into it, so what you're measuring has now changed significantly, and you're spending quite a lot of time calling clock_gettime everywhere, because of all those injected timestamps. So instead of running your code, you're now paying a significant percentage just to measure it. It's the classic management problem of actually doing work versus reporting the work that you've done — this is just a machine example of it. So this is what happens when you use an instrumenting profiler.
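A hedged sketch of what that injection effectively does to a method; the names are invented, and real instrumenting profilers rewrite bytecode rather than source:

```java
public class Instrumented {
    static long recordedNanos; // where the profiler accumulates its measurements

    static int realWork(int x) { return x * x + 1; }

    // Roughly the shape of the rewritten method: every call now pays for
    // two clock reads (clock_gettime under the hood) plus the bookkeeping,
    // and the injected code changes what the JIT can do with the method.
    static int instrumentedWork(int x) {
        long t0 = System.nanoTime(); // injected
        try {
            return realWork(x);
        } finally {
            recordedNanos += System.nanoTime() - t0; // injected
        }
    }
}
```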
And this is what happens when you profile with JVisualVM: again, this profile looks different. One of the things that is perhaps hard to perceive — but if you have a good visual memory you will notice it — is that this hump on the right looks very similar to the benchmark we were seeing before. The code hasn't changed, its shape hasn't changed, but it's getting a lot less CPU — a lot fewer samples. Collecting the stacks doesn't actually take that much time; if we dig in and find that method call, we'll see that it's marginal — but it stops your code every time it happens, and that's what we're seeing here.

Right, I'm nearly out of time. So, that was the meta-profile, the larger profile view: you can get CPU utilization info on JVM threads, and you can get an idea of what's happening internally in the JVM without resorting to obscure other logs that lack the context of where things fit in your application. There is multi-threading, and there is confusing blocking behavior to contend with, but that's part of using a new tool, and perhaps you'll get used to it. There's a lot more you can do with perf, and a lot more you can do with this tool chain: in particular, you can take these text files and concatenate them and get a cluster-wide profile; you can profile the whole machine with perf and get a view of what other applications are doing, or how multiple applications behave together; and you can use all the other perf features to enable your investigation.

Finally — I know I'm out of time — I would really like to encourage people to jump in and add whatever features come to their minds; these are just a few suggestions. All of these should be relatively small projects that require very little upfront knowledge: we're just doing some text processing, and we get new insights out of it.

In summary, this is a new set of tools for you to use. They're not particularly hard to pick up, and they give you a lot of new insights. You can tweak and hack and share, and I think that's a great thing to have. You can gain both a wider understanding and perhaps even solve some problems using these tools. Have fun, and thank you for listening. We don't have time for questions, but if you have questions, I'm sure the people outside would love to wait while you ask them, so go ahead. Thank you.
Info
Channel: Devoxx
Keywords: DVUK17
Id: ugRrFdda_JQ
Length: 52min 11sec (3131 seconds)
Published: Wed May 17 2017