Extreme Profiling: Digging Into Hotspots by Nitsan Wakart

Captions
Okay, my name is Nitsan, and I'm going to talk to you about extreme profiling. Extreme profiling is not dangerous: it's something you can do safely if you're pregnant or suffer from high blood pressure, and you don't need any sort of protective measures. Perhaps pad your desk, so that when you slam your head into it you don't harm yourself, but other than that it is perfectly safe. We're going to talk about tools that most developers don't use, so don't feel bad if you've never tried them; the whole point is that when you leave this talk you might go home and try something new, in particular a profiler. As the picture might suggest, you can wear bunny ears while you profile, but you don't have to; if you work from home, you can do that. Thank you for coming, and thanks to the organizers for setting this up and giving us food and drink and the rest of it.

I work for Azul Systems; we have the Duke you can see in the upper left there. We make Zing, which is the most wonderful JVM on earth (an unbiased, professional view). It only works on Linux, only on x86, so it's naturally aimed at server-side systems, not so much at your desktop application or your laptop; it's geared towards large, beefy machines, or large enough anyway. We're focused on responsiveness-sensitive systems: low-latency systems in finance, but also commercial websites and so on. We have our own GC algorithm which is fully concurrent (the young generation is concurrent, the old generation is concurrent); typical pauses are sub-millisecond, as opposed to many milliseconds in the competing JVMs. And we have ReadyNow, a relatively new but by now quite mature feature, which lets you persist profile data from a previous run and use it in the next run so that your warmup period is much, much shorter: all the data the profiler gathered in the previous run gets loaded up, and the compiler can do all the compilation at startup. Apart from that, I do a bunch of blogging, open source development and so on, and I organize the Cape Town Java meetup; I live in Cape Town.

Right, which of you use JVisualVM, or have used JVisualVM in the past? That's good. Anybody here who uses Java Mission Control? Okay. All of you who use JVisualVM should definitely try Java Mission Control if you're running on a post-Java-7 JVM; it's a great profiler. And there are other profilers here (this is a RebelLabs survey) and we will be talking about exactly none of them. When I say extreme profiling, we're going to talk about the one percent that doesn't use any of these tools, and encourage you to join them.

So what's wrong with these profilers? Why don't I use them on a day-to-day basis, and why should you perhaps consider using different profilers on occasion? For one, there's JVisualVM and similar profilers, plus a host of commercial profilers that I won't name in particular, which rely on the JVMTI interface to profile the JVM. That sadly means they can only sample at safepoints, because that's what the API provides, which means they suffer from two main issues. One: every time they sample, your application stops working. All the threads stop, there's a stop-the-world pause, it takes a sample, and then you can move on with your life. That's terrible when you think about it: it's profiling your application (a) when it's not doing anything, and (b) if you have a lot of threads, they all have to stop and all have to start again. The overheads are quite crippling, which is why a lot of people don't profile in production with these profilers; they can't afford to degrade performance to that point, which means they never see a profile of their application in the production environment, which is itself problematic. (Is it raining? All right.) Also, each sample they take includes all the threads, so you see sleeping threads, you see running threads, and so on. And in particular (that's the first point, which is perhaps not that clear) they suffer from safepoint bias, which means they only sample your program at particular points, and there are a great many points that don't get sampled, which means your profile may not indicate what you think it indicates. I won't go deeper into that, because those are not the profilers we'll talk about.

The next level up is Java Mission Control and Honest Profiler, which are the one-eyed kings in the land of the blind. They do a much better job, and you have to give them credit for that; I really do recommend you try Java Mission Control for that reason. The limitations with Java Mission Control are really two: one is that you need a fairly recent JVM, and the other is that if you use it in production you need to pay a license, which involves something along the lines of selling your firstborn, so I can't recommend that; but, you know, some people are less attached to their children. These profilers don't suffer from safepoint bias, but they only sample the Java stack. If you think your application is only about the Java code, we'll be talking about that in a second; but no, you're not only running Java, the JVM is a process with other stuff happening. (This is more nature than I signed up for. Okay. We could be having this talk outside.) Not only that, they're limited in granularity, which perhaps doesn't seem like much of a limitation up front, to the line of code: whichever line of code they perceive as hot is what they'll report. Sometimes that's good enough; sometimes you want more granularity than that. And they have their blind spots, because they rely on an internal API of the JVM called AsyncGetCallTrace. It's not an official API, which is why all the commercial tools don't use it; only JMC, being a proprietary Oracle tool, can.
Honest Profiler just joins for the ride, but, for instance, it wouldn't work on J9, and up until recently it wouldn't work on Zing either. We implemented that API, but it's not an official one; you can be a conformant JVM without implementing it. That API, however, does suffer from a couple of blind spots: if you try to take a sample while a GC is going on, or a deopt is going on, or some runtime stub is running, your sample will fail, and what happens when a sample fails varies from one profiler to the next. We recently updated Honest Profiler (I contribute to Honest Profiler, so it was a patch I contributed) to report those failures as method frames, so you can see them in your profile; but Java Mission Control, for instance, just drops them on the floor and you never hear about them. So if your application spends a lot of time in, say, System.arraycopy, you'll never see System.arraycopy in your profile, because every time the profiler catches it, the sample fails. Similarly, well, not similarly, this is a different problem we'll get to later: Java Mission Control, Honest Profiler and most profilers suffer from a problem called skid, and in particular skid and inlining are a bad combination; we'll get to that in a second.

Before we dive in, let's ask ourselves: why do we profile? I really love this quote. It's not about profiling, but it is about this illusion, and on occasion I've had this conversation with developers who think they've done something and improved the system immensely, who have metaphorically taken the victory lap, and it turns out that hasn't happened. They were relying on a profiler that told them one thing was the problem, and they eliminated the problem in the profile, but the application was just as fast, or slow, as it was before. On the one hand, when you optimize, you want to optimize a bottleneck, where it matters; you don't want to be the premature-optimization victim who optimizes the wrong thing, so you use a profiler to find the bottleneck. But if the profiler gives you the wrong bottleneck, you're optimizing in the wrong place.

Moving right along, to profilers that care about more than Java profilers care about. Java profilers only cover this area; we want something else, because we want to cover the OS and we want to cover the JVM runtime. If all the CPU is going to GC, your CPU profiler should tell you that; it shouldn't just say "well, I don't know what happened, I couldn't run Java, that's all I can tell you." Beyond that, who knows: the compiler might be running, which is valuable information, or you might be in some runtime stub or some VM activity. And finally there's your code, but your code doesn't actually run. What I mean by that is that no JVM actually runs Java. You compile your Java into a class file, you have bytecode, and you give that to the JVM; type erasure happens, for instance, and all sorts of things happen in that first step of compilation. The JVM doesn't run Java, it runs bytecode. The bytecode is run by the interpreter, but if your code is running in the interpreter, it's not very important code: it hasn't run enough times to get compiled. When your code is important, when it runs all the time, it gets compiled, so the compiled code is the important part of your application. And after that we'll be talking about inlined compiled code. Inlining is considered the mother of all optimizations in a compiler: you take the methods you call and suck them into the method that calls them, and then you can make assumptions about what goes on inside them. Maybe you drop half the code in the method you just inlined, or stop loading all the members of the class again and again and again, and instead use them as if they were all in one big method block.
So, we're going to be talking about native profilers. Native profilers and JIT runtimes traditionally don't get along, and that's because JITs had one or all of these three problems. Before we move along: has anybody here used perf? Anyone used perf at least once? Okay, and on we go. When you go home, try perf.

This is what perf top looks like for a native application. This is Skype on my machine; as you can see it's doing nothing, just locking and unlocking, and there's nothing really happening here, but you can see methods, you can see good stuff happening. If I knew anything about how Skype works, maybe I could offer the developers some insight into how they could improve it. Right, and this is what happens when you run perf top on a Java process. Communicating with these sorts of addresses is hard, and even worse, in JITted languages those addresses will stop meaning anything when I restart my application. The challenge with JIT-compiled code is that, as far as perf is concerned, the methods don't exist up front; they only exist at runtime, and perf doesn't know about your methods.

So what can we do about this? The way perf works, it interrupts your process with a signal, so whatever is on-CPU gets halted and runs the perf signal handler, which collects the PC, the program counter, which tells it where it is: that address in memory we just saw. Then, when it gets to the reporting stage, perf tries to find out which method we're talking about. A normal application would have a static object file containing the data perf needs to say which method is at that offset; for JIT-compiled code there are no .so files, so you need a map file. Perf's support for JITted languages is a sort of lookup table. The JVM doesn't produce that file, so somebody has to, and that somebody is perf-map-agent. This is an open source project by Johannes Rudolph (I can't remember where he works; anyway, look him up on GitHub, look him up on Twitter). It produces a perf-&lt;pid&gt;.map file, and it's a JVMTI agent, so in particular it can be attached to a running process: you don't have to add it to all your processes up front, and if you forgot, you can just come along and use it whenever. Importantly, it takes a snapshot when you load the agent: it looks at the state of your application at load time and records that in the map file. That is not the full reality of the JVM. The reality of the JVM is even stranger to perf than usual, because on the JVM you compile a method, then you change your mind, then you compile it again, so you would need some notion of a mapping over time, and that simply wasn't supported in perf as it was. It is getting support going forward; there's an effort from Google to provide better perf support for JITted languages, but we're not going to go into that, because it's fairly recent and not everybody has it.

Okay, so the map file looks like this. It might look horrifying, but it actually only has three pieces of information, three columns: the address at which we'll find the method, the size of the method in hex, and then the name, in a rather ugly format, but it's there. You could scan through this, and if we were masochistic we would take the address at the top of perf top and manually search for it, but we don't have to: once we have that file, perf just works, and we can see the methods. What we see here is the reality of the process, or an interpretation thereof: these are the real frames the JVM is running. We can also see some interesting kernel methods, like update_blocked_averages, and all sorts of JVM methods. If we had a problem in any of these places, it would show up in this profile.
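The three columns described above can be parsed mechanically; that is essentially what perf's symbol resolution does with the map file. Here is a minimal sketch in Java. The sample line in the test is invented for illustration, not taken from a real map file.

```java
// Sketch: parse one line of a /tmp/perf-<pid>.map file.
// Format, as described above: start address (hex), size (hex), symbol name.
public class PerfMapLine {
    public final long start;   // address where the compiled code blob begins
    public final long size;    // length of the blob in bytes
    public final String name;  // JIT-compiled method name

    public PerfMapLine(String line) {
        String[] parts = line.split("\\s+", 3);
        this.start = Long.parseUnsignedLong(parts[0], 16);
        this.size = Long.parseUnsignedLong(parts[1], 16);
        this.name = parts[2];
    }

    // True if a sampled program counter falls inside this blob.
    public boolean contains(long pc) {
        return pc >= start && pc < start + size;
    }

    public static void main(String[] args) {
        PerfMapLine m = new PerfMapLine("7f6a8c012345 1a0 Ljava/util/HashMap;::get");
        System.out.println(m.name + " covers the sampled PC: " + m.contains(0x7f6a8c012400L));
    }
}
```

This is also why the agent's snapshot semantics matter: a recompiled method gets a new blob at a new address, and a stale map file simply has no line whose range contains the new PC.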
Already we're seeing a wider picture than we did with a Java profiler. What's nice about perf is that you can use it for a single process, but you can also use it machine-wide: you can record everything that's happening and get an even wider view. Arguably, sometimes you want that, sometimes you don't. Another benefit of integrating with perf is that perf supports profiling with hardware counters, it supports probes, it supports OS events, all sorts of new and interesting ways to measure, and you can use them with the rest of the tooling. I'm not going to touch on doing exotic things with perf, but it's good to know it has these capabilities.

perf-map-agent supports several levels of granularity for its mapping. The default is real methods, and when I say real methods I mean the compiled code blobs. That means that if you call into HashMap.get, and HashMap.get was entirely inlined into your method, HashMap.get will not show up in the profile. You won't see it, because it's not a real method any more: it's been inlined, it's no longer on the stack, there's no frame calling into it. Inside the JVM these are referred to as virtual frames, inlined methods. You can use unfold simple, which gives you a view of inlined methods as if they were real methods, and you can use unfold all, which gives you a view of the inline stack; they're both interesting for different things. With unfold simple, say you have a data structure with a small method, and it gets inlined everywhere: that's valuable information if the method is hot. It's very small, it gets inlined everywhere; you want to know that, you want to see it as one of the top methods in your process, so having it represented as a real method is important. But something a bit funny happens when you do this: each segment of the top-level method gets represented as its own method, which is only fair, because the perf map format doesn't allow you to say "this method is at this address, plus these five segments are the same method". Here we can see HashMap.put, sorry, HashMap.putVal in particular, with different attributions: these are different segments of the same method. Ideally you would sum up all the segments; sadly that doesn't happen here, but there are tools that do it, and we'll have a look in a bit.

Then you can use unfold all, which uses this arrow notation, and what that gives you is the context for a method in the place it was used. We can see the comparison key over there: putVal calls into equals, so the actual method in the Java code we're looking at is the equals method, but it's the version that got inlined into putVal, which might be different from the version it would get elsewhere. These are different segments within the same method: say I have putVal and it inlines some methods. The first segment, where nothing is inlined, is putVal, then there's some other method, then it's putVal again after that, and so on. It can be slightly confusing; we'll look at a tool that makes it all clear in a second.

perf-java-top is one of the scripts that come with perf-map-agent. When I say "when you go home, try it out": this is a really, really simple installation. Download from GitHub, build, and you're good to go, so it's pretty newbie-friendly. Just attach to a process you're running and see what's happening; attach to Eclipse. It covers your code, it covers the JVM and the OS; it gives you what perf gives you, essentially. For inlined methods, you need to remember to enable -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints (it's the flag on one of the previous slides). If you don't enable this, the JVM will not generate much debug information between safepoints, which means the inlining data you get will be slightly skewed.
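The "sum up all the segments" step that those tools perform is a simple aggregation; a sketch of it in a few lines of Java, with invented method names and sample counts:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a perf profile can attribute samples to several segments of the
// same JIT-compiled method. Summing by method name recovers the per-method
// total a human actually wants. Sample data below is invented.
public class SegmentSum {
    public static Map<String, Integer> byMethod(String[][] samples) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String[] s : samples) {
            // s[0] = method name, s[1] = samples attributed to this segment
            totals.merge(s[0], Integer.parseInt(s[1]), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String[][] segments = {
            {"HashMap::putVal", "12"},  // first segment of putVal
            {"HashMap::hash",   "3"},
            {"HashMap::putVal", "7"},   // putVal again, after an inlined callee
        };
        System.out.println(byMethod(segments)); // putVal's segments sum to 19
    }
}
```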
I don't know of any bad side effect of enabling this flag, apart from the JVM probably consuming slightly more native memory, so I don't see why you shouldn't enable it pretty much everywhere; but if you don't, you'll see a slightly confusing profile. And again I'll mention skid, but we won't actually go into it right now.

Moving right along: some men just want to burn things, and in particular this man likes flames. This is Brendan Gregg; he's from Netflix, and before Netflix he worked for Sun and Joyent. He's a brilliant guy, he wrote a ton of books about system performance, and he came up with a really great visualization for code called flame graphs. Flame graphs, when you use them with Java and perf-map-agent, look like this. Now, that is really unhelpful, and the reason is that you can't see any of the Java: the red frames are the native frames, there are lots of those, and then you have the little yellow ones, which are Java. Okay, so that's unhelpful. Why doesn't this work? It doesn't work because perf can't walk the Java stack. Anybody here done an OS course at university? Remember anything? Operating systems, funny things, they run on your laptop, etc.? If you've done that course you might remember that there is a stack, and if you remember even more detail you might remember how the stack works. The JIT compiler doesn't keep the frame pointer in place, and perf relies on the frame pointer: it points to where the frame starts, and from it you can find where the previous frame is. The JIT compiler uses that register as an extra general-purpose register, which is a win for it, but then perf's stack walking doesn't work. That's a bit of a sad thing: no frame pointer, broken stacks. So Brendan Gregg contributed a patch to OpenJDK, and it was actually productized by one of the OpenJDK developers and is now part of the Oracle JVM: -XX:+PreserveFramePointer. If you're using Java 8, use the latest update, and if you're using the latest update, this is something you have now. Here it's more of a grey area whether or not this has a bad side effect: you can expect to see a performance difference in some areas. Usually it's marginal, because what's stopping your JVM from performing is not a shortage of registers, it's all sorts of other issues. On occasion, if you run code that is very mathematical, very computationally intensive, you might need all the registers you can get, and if that's where your hotspot is, enabling this option might give you a bit of degraded performance. So have a good idea of the before and after when you try this, but it should be relatively safe.

When you do that, you get something like this. Slides don't really do justice to flame graphs, so we're going to switch here, and over here we see a flame graph that I didn't take; because I didn't take it, this is a flame graph produced by the nightly runs of the Cassandra benchmarks. They produce their own graphs, and they also collect these flame graphs. The first thing I want to highlight: you get a nice tooltip for everything; it's not as bleak as the slides may suggest, where you can't click on anything. We'll click on something in a second. The great thing about this: I was talking to one of the Cassandra maintainers, and he said, "I ran this benchmark and this is what happened, and this is the profile I collected," and he showed me this picture. You can say a lot just by looking at one picture; it has a lot of data in it. In this particular case it was a bit worrying: even here you can see these methods, which means he is recording during the warm-up period. You can just look at it and say, hey, you're recording a profile while you're still compiling.
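As an aside on how the picture gets built: the flame graph scripts consume "folded" stacks, one line per distinct stack with a count, which the collapse step produces from the raw samples. A minimal sketch of that folding (the sampled stacks below are invented):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the "stack collapse" step that feeds a flame graph: each
// sampled stack becomes "root;caller;leaf", identical stacks are merged,
// and the output is one "stack count" line per distinct stack.
public class FoldStacks {
    public static Map<String, Integer> fold(String[][] stacks) {
        Map<String, Integer> folded = new LinkedHashMap<>();
        for (String[] stack : stacks) {
            folded.merge(String.join(";", stack), 1, Integer::sum);
        }
        return folded;
    }

    public static void main(String[] args) {
        String[][] samples = {
            {"main", "HashMap::get"},
            {"main", "HashMap::get"},
            {"main", "System::arraycopy"},
        };
        // Each printed line is one horizontal bar-to-be in the flame graph,
        // its width proportional to the count.
        fold(samples).forEach((s, n) -> System.out.println(s + " " + n));
    }
}
```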
So a lot of the measurement here is unstable, and if you profiled, say, a minute later, the profile would look slightly different. That might be the reason we have some unknown frames here: perf-map-agent probably took its snapshot, the compiler generated a new method later on, and then perf had no idea what we were on about, so it couldn't actually tell us what was going on. And we can see all the GC activity. When you profile a Java application, you usually look at the CPU profile, and separately you look at the GC profile; and if you look at a GC log, it's really hard to say how much CPU you're spending on GC. You know GC happens, you know when it happens, you can enable extra logging and find out how long it stopped your application, but how much of the overall available CPU was taken by GC is not something you get. That matters if you're trying to get maximum throughput out of your machine: you want to use all your CPUs, and if the GC is using, say, 20% of them, you're missing out. What we don't see here is inline frames: these are all real frames in this diagram. And I wanted to show you: you can click here, and it zooms in, and you can see more detail, so you don't have to stay with three letters out of a method name. There's an even nicer UI for it coming along, which does everything animated and will let you drill into particular threads, profiles, and so on.

So where are the inline frames? perf walks the real stack; it doesn't walk the imaginary stack Java has, so it only sees real methods, and we're back to perf top with no view of our little inlined methods. But if you use perf-map-agent with unfold all, and flame graphs with one of the latest patches, you get a visualization including all the inline frames, which we'll look at in a second. And this, to me (I really like open source), is a great story of collaboration: you have T Jake, who's one of the Cassandra maintainers, you have Johannes (virtualvoid), and you have Brendan Gregg, and they all worked on different parts, and what we got is this wonderful diagram. Again, I'll switch to the browser here, and you can see the flames burn high, but we also get a new feature here that you don't get with any of the Java profilers: a distinction between inlined frames and real frames. That's very valuable to me, and I think it should be quite interesting to all of us. The part where it gets interesting is where you have a big method that inlines a lot of other methods into it. I just came back from a visit to the Azul HQ (we actually have two compiler teams), and people who write Java compilers, or compilers in general, worry about inlining heuristics: they want to know what normal applications do, and they can have a lot of theory, and it's hard to discuss. But here is a picture of real Java code and its inlining behavior: what got inlined, what didn't get inlined. On the more developer-ish side of things: methods that are hot and don't get inlined are a problem. If you have a hot method and it's too big, it won't get inlined into another method. Maybe there's something you can do; maybe you can split it into two methods so that you get some inlining happening, and that will improve your performance. So it is a valuable piece of information if you're trying to improve performance. And you'll notice we can see the Java code calling into native code, which is really nice; we have a bunch of Unsafe code somewhere here. Anyway, take it home, play with it; it's going to be great, you're going to love it.

Right, so flame graphs: it's a great visualization. If you go to Brendan's website, he uses it for loads of stuff: CPU profiling, but also off-CPU profiling, and you can use it with any hardware counter you like.
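The "split a hot method so it inlines" advice above can be made concrete. A hedged sketch: the digit-parsing example and the idea of keeping the rare path out of line are mine, not from the talk, and HotSpot's actual size limits are tunable via flags such as -XX:MaxInlineSize and -XX:FreqInlineSize.

```java
// Sketch: a hot method that is too big to inline can sometimes be split so
// that the hot fast path becomes tiny enough for the JIT to inline at every
// call site, while the rare slow path stays behind a separate call.
public class SplitForInlining {

    // Hot path: small enough to be an inlining candidate everywhere.
    public static int parseDigit(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        return parseDigitSlow(c);   // rare case kept out of line
    }

    // Cold path: the bulky error handling lives in its own method, so it
    // doesn't count against the fast path's inlining budget.
    private static int parseDigitSlow(char c) {
        throw new IllegalArgumentException("not a digit: " + c);
    }

    public static void main(String[] args) {
        int sum = 0;
        for (char c : "2017".toCharArray()) sum += parseDigit(c);
        System.out.println(sum); // 2 + 0 + 1 + 7 = 10
    }
}
```

Whether the split actually pays off is exactly the kind of thing the inline-frame flame graph lets you verify: after the change, the small method should show up inlined into its callers rather than as a real frame.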
So you can see where all the page faults in your application come from, or whether a particular method suffers from huge numbers of cache misses: you can generate new and exciting profiles that you couldn't before, certainly not with any of the Java-only tools. perf gives you great range, but this only works on the more recent Oracle releases. I'm really sad that it doesn't work with Zing at the moment; we're getting there, it should happen in one of the next releases, and it is something we're actively working towards. If you're on a JVM that doesn't support the stack walking, you can still use perf-map-agent, which is better than nothing, and you can generate flame graphs from Honest Profiler or Java Mission Control profiles (I think one of the scripts will let you do that), but then you're back to the whole safepoint-bias story. So on other JVMs you can still enjoy flame graphs to a certain extent, but you won't get perfect flame graphs.

I mentioned inlining. This is from Portal: "the cake is a lie"; I really like that. Like I said, the JVM doesn't run your code: it runs bytecodes, and even those get compiled, so there's no bytecode index to speak of; you only have instructions. When you sample a process, you only get that program counter, but as it turns out, a funny thing happens with program counters: they're not entirely accurate, or not perfectly accurate, or not accurate. The reason is that, as you may have noticed, CPUs these days are superscalar: they do more than one operation at a time, so really you should have more than one program counter (you should have, like, four), because more than one instruction can be in flight. Instructions get translated into micro-ops, micro-ops can get fused; there's effectively a little compiler inside your CPU. Without spending everybody's time on it: compilers are really complicated, and trying to describe what the CPU does with just one instruction is no longer possible. CPUs do speculative execution as well, which is even more confusing. Say I load a value, then I branch on that value, then I do some computation: the CPU is going to guess which value is going to come back from memory, then go ahead and do the computation, and maybe eventually it hits a point where it has to actually use the value it pretended to load. Only then will it stop and wait for that load; up until that point it keeps running. So the expensive load is not going to get blamed, because that's not where the CPU stopped; somewhere down the line gets all the blame. Different instructions react differently to this situation, and we'll look at a concrete case just now. You have signal latency as well: depending on the hardware counter, the point at which you try to profile and the point at which you actually profile are slightly apart. This is not a big deal; well, you can describe it in tragic terms, but actually people use these profiles all the time and manage to find valuable stuff. It's not a big deal when you're looking at assembly, but we're not looking at assembly. If we were looking at Java lines of code, and I could say "look at the previous line, that's where all the blame lies", we'd be fine. The problem is we start with this fuzzy definition (we start from the program counter, and that's only sort of where we are), and then there isn't a bytecode index for every instruction: some instructions have nothing to do with Java at all, they're the JVM's accounting code, so they don't really relate to any line of code. Then you find the closest BCI you can, because you have to blame someone, right? And not every bytecode index has a line of code, so again you look for the closest thing. The problem is that this is a very weakly defined thing once you start looking at inlining and reordering.
so before the JIT compiler ate your code it looked something like this you called all these methods in order and it all seemed quite reasonable in the code base and maybe you could make sense of it right after it's done with it some of the methods are gone because the JIT just decided that you're never going to call that maybe you're checking on our own or something that's provably never going to be true so the the JIT compiler is gonna say well you know I'm not going to have that here it can reorder different blocks of car this is particularly true if there are no memory barriers in the way of that reordering so it can take a method that was the last method you called and can make it the first method you called it also does you know the the compiler does interesting things like if let's say I'm doing you know classic defensive programming and I'm checking the arguments in each method you call and you call three of these in a row when I inline them it turns out I'm repeating the spiff statement three times I don't I don't need to check that three times I can check it once I can load that field once and check what the value is so the first method will get blamed and when I say first whichever one was reordered to be the first and the other ones won't have to do the work so which line of code that that if come from when I was discussing the sort of reporting in line methods with guys in the compiler team that they were sort of how can you do that that's not actually how things work we just jump a little all up so it's a confusing world after after you in line but 60% of the time it works every time so it's as good as it gets you get valuable information it's fuzzy but it sort of works so mostly it's good enough sometimes it can be really confusing and this is a a convoluted benchmark I came up with we have a hobbit that extends atomic long it is trixie we call x add or X add just cause get an increment down here and Inc not only increments counter but also cause set 
It's a stupid thing to do, but it does it anyway. When we start by measuring this code, we see that inc takes 4.7 nanoseconds and xadd actually costs more, 6.1, and when we run them together the cost is roughly the sum of those two, plus or minus. So performance isn't additive; that's one lesson here. The other is what the profile looks like: we would expect to see the xadd method, and the getAndIncrement inside it, as 50% or more of the profile. But when we profile it with Java Mission Control, getAndIncrement is something like 4%, 22% is spent in HashMap.putVal, and the rest is spent in inc. How can that be? Why would that happen? If we look at the perf-map-agent output we also get a confusing profile; getAndIncrement is even less prominent there. Why does that happen? It happens because of skid, and because we inline methods, which means "the previous line" is a confusing thing to look at: if you look at the call tree, those lines of code are nowhere near each other.

So how would we actually tell that we have a problem? One option is to use JMH's perfasm. Anybody here using JMH? JMH is awesome; if you're doing any microbenchmarking, or benchmarking at all, you should be using JMH. It's the official recommended way to do a bad thing: if you read articles on the web they say benchmarking is really bad, don't do it, but if you're going to do it, you should use JMH. I would also recommend that you benchmark; it's educational, and having concrete measurements is better than having no measurements, in my opinion. JMH supports three perf profilers. First there's plain perf: you run with -prof perf and you get output similar to perf stat's. The nice thing about the JMH integration is that JMH only measures the measured iterations, so it drops the warm-up for you. Then there's perfnorm, which is even nicer.
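That separation of warm-up from measured iterations is exactly the bookkeeping people get wrong when they benchmark by hand. A crude hand-rolled version of the idea (just an illustration of the concept, not how JMH works internally, and subject to all the pitfalls JMH exists to avoid) might look like this:

```java
import java.util.concurrent.atomic.AtomicLong;

// Crude illustration of warm-up vs. measured iterations: run the code
// under test for a while so the JIT compiles it, then time only the
// "measured" portion and report nanoseconds per operation.
class HandRolledHarness {
    static long nsPerOp(Runnable op, int warmupOps, int measuredOps) {
        for (int i = 0; i < warmupOps; i++) {
            op.run(); // warm-up: results discarded, lets the JIT kick in
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredOps; i++) {
            op.run(); // measured iterations: only these are reported
        }
        return (System.nanoTime() - start) / measuredOps;
    }

    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong();
        long ns = nsPerOp(counter::getAndIncrement, 1_000_000, 1_000_000);
        System.out.println(ns + " ns/op"); // don't trust this: use JMH
    }
}
```

This still gets dead-code elimination, on-stack replacement, and a dozen other things wrong, which is precisely why the advice is to use JMH rather than roll your own.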
Perfnorm takes all the sampled data and normalizes it to per-operation measurements. That's important, because when you compare a bit of code that runs slower against another bit of code, comparing the raw stats is quite confusing; you need to normalize by the work done. If I have 11 million cache misses in one benchmark and only 9 million in another, but the second one did only half the operations, that's not really a piece of information I can use. If I normalize to the number of operations, then I can say: when I run it like this I have five cache misses per operation, and when I run it like that I have eleven. It's a much better way to look at your profile.

Finally we have perfasm, which we'll have a look at in a second. If you've used perf (which none of you have), it's like perf annotate: it annotates your assembly code with the profile and shows you where the hotspots are, but only for the hot regions of the assembly. So if at some sinful point in your past you used to print out all the assembly and try to figure out what the hell the compiler did to your code, perfasm will dump out just the hot regions with their profile, and you get a far more focused view. It's a handcrafted, lovingly mixed concoction of PrintAssembly and perf record, and it supports hardware counters, like anything perf-based. When we run perfasm on this benchmark, we can see that an innocuous mov is where we're spending 45% of the time. Now, movs don't take that long, so we look one operation before it and we see the lock addq; that's an expensive instruction, and that's where all the money is. But we can also see that the closest line information to the mov comes from code inlined from the benchmark's inc method, and that's why all the blame goes to the benchmark method.
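The normalization perfnorm does is simple arithmetic: divide the raw hardware-counter delta by the number of operations performed. With numbers in the spirit of the example above (the operation counts are mine, for illustration):

```java
// Per-operation normalization, as perfnorm does it: a raw counter
// value divided by the number of benchmark operations performed.
class Perfnorm {
    static double perOp(long counterValue, long operations) {
        return (double) counterValue / operations;
    }

    public static void main(String[] args) {
        // Run A: 11 million cache misses over 1 million operations.
        // Run B: 9 million cache misses, but only half the operations.
        double a = perOp(11_000_000L, 1_000_000L); // 11.0 misses/op
        double b = perOp(9_000_000L,    500_000L); // 18.0 misses/op
        // Raw counters made run B look better; normalized, it's worse.
        System.out.printf("A: %.1f misses/op, B: %.1f misses/op%n", a, b);
    }
}
```

Comparing the per-operation figures tells you something; comparing the raw totals across runs that did different amounts of work tells you nothing.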
If you use perfasm in conjunction with another profiler, you'll notice differences in how inlining is reported and in which instruction gets blamed for which work. They're all right and they're all wrong; they're just different views on the same piece of data. Perfasm uses PrintAssembly, which is really handy for correlating the assembly with your code. Perf-map-agent relies on some JVMTI metadata. Java Mission Control and Honest Profiler rely on AsyncGetCallTrace, which uses mysterious internal data structures to get this information. Who's right? Nobody's right.

We're sort of running out of time, so we'll go a bit quicker. The Solaris Studio analyzer is the last tool in this presentation. It works on Linux; the name is confusing, it doesn't just work on Solaris. You can't attach to a running process, but you can launch a process with it. You get mixed profiling, Java and native, in user mode, and it has three modes: user mode, expert mode, and machine mode. You get a call tree view, and in machine mode you can drill into the assembly. The user-mode view of the same benchmark looks very similar to what we saw in Java Mission Control: the getAndIncrement is nowhere to be seen. But if we look at machine mode, we see that all those methods don't actually exist; they're a figment of our imagination. What actually happens is that there's only one method on the stack, and if we right-click inside that method and look at the assembly, we see the annotated JIT-compiled assembly, and in there we can see the lock addq. So here we get the same view we would get from JMH's perfasm. You can't frame everything as a benchmark, so sometimes you'd use JMH and get perfasm, and sometimes you can't do that; this is another option for you that's reasonably easy and usable. To work with Solaris Studio, first of all you need to install it, which is a bit of a mystery mission because the Oracle documentation on it is a bit lacking, but you can find instructions online. You can set
up a handy alias, which somebody posted once; I use it and it's quite handy. Then you launch your process with the alias and you get the collected experiment data. Importantly, you need to filter out the warm-up data yourself, because you have to launch the application with it, so there's going to be a bunch of data in your profile that is not what you want. Dig in, enjoy, have fun.

Final word; I have one minute, exactly, without leaving time for questions. This is the suggested workflow. First of all, measure at the application level, something you actually care about. Nobody cares about your profile: you can't call up the business and say "I saved half the instructions"; they'll be like, what are you on about? Once you have a measurement you care about, do some application-level analysis with flame graphs and identify your bottlenecks. Don't drop into assembly straight away, because it's unpleasant; try to find the problem at the Java level. You know Java, everybody's more comfortable with Java, and maybe you're doing something evidently stupid at the application level; in most cases, that's what's happening. Next up, try to capture the code, the real frame, not the inlined frame, in a JMH benchmark, iterate over it, and try to improve the performance. And when you're done with that, importantly, re-measure the application: just because you've improved one bit doesn't mean the application is better. This is especially true for multi-threaded applications, where the bottleneck might be in a thread that is feeding other threads; improving the performance of those downstream threads is going to do absolutely nothing, so they might be really big in the profile while the actual problem is elsewhere. That's it. I don't have a nice Q&A slide because I forgot to make one, so: any questions?

[Question: how does it profile lambdas?] I'm not sure, to be honest; I haven't looked in there.
Lambdas have generated classes, so I would expect those lambda classes to show up, notionally the same way lambdas get profiled anywhere: the code has some class and some method attributed to it. [Question: does it work when the JVM is stalling or hung?] Yes, it does; it's completely independent of JVM mechanisms. We're over time, so come to me with any questions. Thank you very much for coming.
Info
Channel: Devoxx UK
Views: 3,394
Rating: 5 out of 5
Keywords: DevoxxUK2016
Id: 7PkkxDaFDj8
Length: 52min 22sec (3142 seconds)
Published: Thu Jun 16 2016