Everything I Ever Learned About JVM Performance Tuning at Twitter (Attila Szegedi, Hungary)

Video Statistics and Information

Captions
All right, hello everybody. My name is Attila, and I will be talking to you about my various war stories and experiences from back in the day when I worked at Twitter. I was part of Twitter's runtime systems group, and we were in charge of basically making sure that all the JVMs at Twitter run without hiccups. In the course of that I learned quite a lot about tuning JVMs for performance — I usually say I learned quite a bit more than I ever wanted to, and there have been some hard-won lessons. Typically what happens is that your developers write their code, and when they're done they start running it, and they figure out that, oh my god, the performance characteristics of this thing are not nearly as nice as we would want them to be. That was when we came into play: they would come to us, we would sit down together and try to make sure their stuff actually performs. We used to joke internally that there is this period between when developers finish the code and when it's production-ready — and that period was basically us.

I don't work at Twitter anymore; this was some years ago. I spent more than two years at Twitter, mostly in the JVM trenches; I also optimized the garbage collector of the Ruby runtime while I was there. After that I left for Oracle, where I worked on Nashorn, the JavaScript runtime on the JVM — I spoke about it last year. And as life has it, I'm not at Oracle anymore either: I circled back almost to Twitter, and I'm now with a little startup called FaunaDB, founded by some ex-Twitter colleagues. It's a small company — about half of us are ex-Twitter employees — and we are working on creating the operational database to end all operational databases. You'll hear about us in due time, I guess.

So, with that: Twitter's — and in general any web service's — biggest enemy is nothing else than latency. People don't like slow websites; they will navigate away, they won't want to use it, it will feel sluggish. Latency manifests itself at render time in the browser, it manifests itself in network connections, and it manifests itself all the way down in your servers, and there are separate groups that deal with minimizing network latency, with speeding up rendering in the web browser, and so on. That's not my table; I deal with the server side. By far the biggest latency contributor on a service that's based on the JVM — and Twitter was typically like that; there was some Ruby, but it was being phased out in favor of mostly Scala- and Java-based services over time — is the garbage collector. Therefore memory tuning is crucial if you want to have responsive applications. Aside from that, you can have in-process locking, thread scheduling, I/O, and algorithmic inefficiencies in the application. We mostly won't talk about those, because they are not in the domain of tuning — application inefficiencies are solved by optimization, which is a different thing — though I will talk a little bit about locking and thread scheduling, maybe. As far as memory tuning itself goes, you can tune for different goals, and you can have different approaches to it.
You can tune for your overall memory footprint, you can tune for your allocation rate, and you can tune for the amount of garbage collection your system is doing — sorry, we're not there yet; I skipped a slide ahead, so just keep that one in mind. You can tune for all of these, but we will concentrate on memory, and maybe a little bit on lock contention — I have it in the talk, we'll see how it fares with time.

As far as tuning your memory footprint goes: why would you want to tune it? Well, the less memory your service needs, the less pressure there will normally be on the garbage collector — the fastest garbage collection is the garbage collection that doesn't need to happen, right? So tuning for memory footprint is worthwhile as long as it doesn't conflict with your other goals; sometimes it does, and we'll get to that. A very blatant sign that your application is using too much memory is, obviously, getting an OutOfMemoryError. I know it sounds silly and very obvious; it's also one of those things you don't want in production, so you would be surprised how often it does happen. The reasons you might get an OutOfMemoryError — or maybe you're not getting one, but your application is dog slow because it just doesn't have enough free memory headroom to work with — are these: you might just have too much data, quite obviously; your data representation might be inefficient (back when I worked at Twitter I would typically use the word "fat" to refer to inefficient memory representation — it's a great way to evoke yo-mama jokes, which is a great thing when you work at a company mostly populated by twenty-somethings); or you can have a genuine memory leak. We won't discuss memory leaks: a memory leak is an application bug, something you need to track down and fix.

So how much memory does your application actually use? There's a really easy way to figure that out. I will have various GC flags interspersed in this talk — at times I can't use the laser pointer because there's also a side screen, so I will just point. If you run with -verbose:gc you will get nice log output, and you can observe the numbers in the full GC messages. It typically looks something like this: when a full GC occurs, it tells you this much memory was used before, this much memory is used after the collection, and this is the total allocated by the JVM. The JVM typically allocates more than what's used and doesn't release it back to the operating system immediately, because if it did, it would have to reallocate and release on every GC cycle, and that's plainly inefficient — so once it allocates additional memory, it tends to keep it. Sometimes it does release it, though; I'll show you how it does that. The number you're interested in is the one after the GC: the amount that remains is the live set of your program. If it's a lot — specifically, if it's very close to the total the JVM has allocated, or very close to the physical capacity of your machine — then the obvious question is: can you give the JVM more memory? More is always better. As you know, performance problems are resource problems, and most performance problems — even throughput and latency ones — can quite often be fixed by throwing more memory at them. If the answer is "yes, I can give my JVM more memory," that's great: give it more memory.

All right — for the record, this stand doesn't have a lip, so this could happen; I'll just put this poor guy here and hope that it lasts until the end of the presentation. Oh, we lost the screen. Cool, we're in business again. If this isn't an advertisement for Echo Harbor, I don't know what is.
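To make the "live set" idea concrete: here is a minimal sketch, not from the talk, of approximating the same measurement in-process with the standard Runtime API. System.gc() is only a hint to the JVM, so treat the number as rough; the authoritative source remains the post-full-GC figure in the -verbose:gc / -XX:+PrintGCDetails log output described above.

```java
// A minimal sketch, not from the talk: approximate the live set in-process.
// System.gc() is merely a hint, so the result is a rough estimate; the
// post-full-GC number in the -verbose:gc log is the authoritative figure.
public final class LiveSetEstimate {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // request a full collection; the JVM may ignore this
        long used  = rt.totalMemory() - rt.freeMemory(); // bytes still reachable
        long total = rt.totalMemory();                   // what the JVM has allocated
        System.out.printf("~live set: %d MB of %d MB allocated%n",
                used >> 20, total >> 20);
    }
}
```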
All right. So the other question is: do you need all that data in your memory? Maybe you don't. There are techniques for making sure that you don't keep everything in memory: you can use LRU caches, or you can use soft references. There's an asterisk on that one — I should probably make it bigger — and I will talk about it: when I joined Twitter I said soft references are great, and since then, both at Twitter and at Oracle, I have encountered situations where soft references bit me really hard. Basically, what happens here is: if you have data that can be recomputed or reloaded from external sources, then you don't need to keep it in memory all the time. Even if you loaded something into memory, you don't necessarily have to strongly reference it all the time; if it's cheap to reload or recompute, it sometimes makes sense to just keep some of it in a least-recently-used cache and make sure the rest expires. You can manage your memory usage like that.
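As an illustration of the LRU-cache technique mentioned here (my sketch, not Twitter code), the JDK's own LinkedHashMap can serve as a small bounded LRU cache when constructed in access order with removeEldestEntry overridden:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal LRU cache sketch using only the JDK. The capacity is an
// arbitrary illustrative value; anything evicted is assumed to be cheap
// to reload or recompute from an external source, as described above.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, /* accessOrder = */ true); // least-recently-used first
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry
    }
}
```

Note this class is not thread-safe; for concurrent use you would synchronize around it or reach for a purpose-built cache, and soft references remain the alternative — with the caveats the talk warns about.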
So, "fat data." Normally this is not an issue these days: you just structure your code into classes and subclasses, objects have pointers to other objects, and so on. Most of the time it's all good — keep to your nice object-oriented principles and structure your code as you normally would. But again, when you work at places like Twitter, things happen: the research team comes around and tells you, "Hey, we need to load the full Twitter social graph in a single JVM — we've been giving this guy 60 gigs of RAM and it still doesn't fit." Or: "load all user metadata in a single JVM." At that point, actually thinking about how to make your internal data representation smaller starts to make sense. It works at these economies of scale: back in the day — I don't remember exactly, but this was probably several tens of millions of users — even if you don't do the full social graph and just do the active users, you still end up with a lot. And there are always problems like that.

So, normally — and this is all specific to the HotSpot JVM — on HotSpot, whenever you allocate an object, it will have a header, and that's two machine words. A machine word is whatever the native word size is on the machine, and these days that's 64 bits, so whenever you allocate an object, it's at least 16 bytes — if you want to make it sound scary, you can say 128 bits. That's a lot, right? Just doing a new java.lang.Object takes 16 bytes. One word is the class pointer; the other word is typically used for locking and some other purposes — for example, if you take the identity hash code of an object, it needs to be permanent, and it's stored there, etc. If you allocate an array, it's even worse: the smallest array you can allocate is a zero-length array of bytes, and that thing will still cost you 24 bytes, because it's an object — you have 16 bytes for the header, 4 bytes for the length of the array, and since we are on a 64-bit architecture, 4 bytes of padding up to the next multiple of 8.

The same goes for padding in subclasses. If you have a simple class with only one byte field — the smallest thing you can have — it costs 24 bytes: 16 bytes for the object header, one byte for the field, and 7 bytes of padding up to the next multiple of 8. If you subclass this guy, you would think there's plenty of space in that padding, so it should still fit in 24, right? 16 for the header, 1 byte for x, 1 byte for y, 6 bytes of padding. Nope. The thing is, padding happens on a subclass-by-subclass level, so the in-memory layout of the subclass will be: 16 bytes for the header, 1 byte for field x, 7 bytes of padding, 1 byte for y, 7 bytes of padding. So if you're conscious about memory, I guess: go easy on subclasses. This is terrible advice to give — quite often, whenever you're talking about performance, sooner or later you find yourself at odds with nice design. So I would say: keep the subclasses, keep those abstractions, they're nice — if you can. But if you find yourself in a situation where it's biting you performance-wise, then consider flattening the hierarchy. Of course, if you have a class that has a reference to another object, doing something like this immediately takes 40 bytes, because you have one object of class C — which is 24 bytes: header plus 8 bytes for the pointer — and the referenced object, which is at least another 16. And similarly for array elements: there is no such thing as inline array elements, no such thing as structs, in Java yet. There's this Project Valhalla which is adding value types to Java, and it will be a huge boon once it ships. I'm not sure when it ships — probably not Java 9, certainly not — but with Java 10, I guess, we will hopefully have actual value types in Java, so you could have a situation where one object is embedded — well, not as an object, but as a struct — in another. C-programming-language-style structs are coming to the JVM, but until they're here, we'll have to cope.

You can take slimming to the extreme. Remember that full follower graph in-memory research project I was referring to? We ended up representing each vertex — which is a user — and its edges as int arrays, because we just couldn't afford the pointers; it was that big. If it grew even larger — since you're just linearly scanning those arrays anyway — I even suggested we could do some variable-length differential encoding in byte arrays, for when you need to represent a huge number of connections (people following Justin Bieber is the typical example). I actually suggested that sort of as a joke, but then, being data scientists, they immediately liked the idea. Anyway, as I say: don't do this at home. This is something you really resort to only when things are going bad.
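If you'd rather verify the header-and-padding arithmetic above on your own JVM than take it on faith, OpenJDK's JOL ("Java Object Layout") tool prints the actual field layout. A minimal sketch, assuming the org.openjdk.jol:jol-core dependency is on the classpath:

```java
import org.openjdk.jol.info.ClassLayout;

// A small sketch using OpenJDK's JOL tool to inspect object layout.
// The two classes mirror the talk's example: a one-byte-field class and
// its subclass, where padding is applied per class in the hierarchy.
public class LayoutDemo {
    static class A { byte x; }
    static class B extends A { byte y; }

    public static void main(String[] args) {
        // Prints header size, field offsets, and alignment padding; exact
        // numbers depend on JVM version and compressed-oops settings.
        System.out.println(ClassLayout.parseClass(A.class).toPrintable());
        System.out.println(ClassLayout.parseClass(B.class).toPrintable());
    }
}
```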
There is such a thing as compressed object pointers, which the JVM typically uses. The idea is that since we're padding every object to a multiple of eight, the lowest three bits of every pointer are zero anyway — so why not right-shift it by three and use 32 bits? Now, 32 bits can normally address four gigs of memory, but if you treat every value as a count of eights, suddenly you can address 32 gigs. The JVM actually does this quite nicely. For some reason, if you set your heap to exactly 32 gigs it will not use compressed pointers, even though it theoretically could; it uses them automatically up until about 30 gigs. And if you need more than 32 gigs of heap, then you must use uncompressed pointers, because with compressed pointers the maximum you can address is 32 gigs. This also means there exists this uncanny performance valley that you can't do much about. There was an Oracle study on a very large corpus of JVMs which said compressed pointers reduce the memory load by anything between 25 and 33 percent — it obviously depends on the mixture of objects in your running JVM, on how many pointers you have. But it also means that if you need more than 32 gigs of RAM and you go to, say, 33, then because your pointers have now inflated, your JVM will actually be able to store many fewer objects in memory. So if you need to go above 32, you need to go to at least something like 42 to 48 gigs to store the same number of objects. Between 32 and 48 gigs you don't gain anything — you actually lose, because of the pointer inflation. So take care: if you see that you need more than 32 gigs of RAM, don't go to 36, because you will actually end up worse off. There's a handy chart showing how the various sizes look with uncompressed and compressed pointers — and also on 32-bit JVMs, but nobody uses those anymore.

There are some other things you can do, and this one is old, but it still applies: avoid using instances of primitive wrappers. This is such an obvious and blatant thing that I'm almost ashamed of pointing it out, but sometimes you don't notice it. When I joined Twitter in 2010 — and this is dating it — we still used Scala 2.7.7, and it turned out that a Seq[Int] would use boxed integers, while an Array[Int] would actually store primitives. The first one needs many times more bytes — notice the multiplier on the slide — and the second obviously needs much less. You don't really notice this, except when you look at your program with a profiler and see: my god, where do all these boxed integers come from? Then you start digging, and eventually you figure it out. This was fixed in Scala 2.8, but that's not the moral of the story — I could have used any library as an illustration of the concept. The moral of the story is that you don't know the performance characteristics of your libraries. The only way to figure out the performance characteristics of your libraries is to run your application under a profiler. If you want to optimize and tune the JVM, you will need to familiarize yourself with profilers; there's really no way around it. Either that, or read and understand the source code of every library you rely on — your pick. Personally, I would be too dumb to understand all the libraries I rely on, so for me, using a profiler seems like the easier way.

Speaking of libraries, there was also this funny thing with map footprints. We used to use Google Guava — it's a great library — and it has this innocuous-looking, all-defaults "new MapMaker().makeMap()". What could be easier than that? It turns out this creates a map that eats more than two kilobytes of RAM. Again, not a big deal — unless you have a problem where you need to create several millions of them in memory, and obviously we had that. These calls create concurrent maps, so one funny hack I did on one particular problem was to reduce the concurrency level to 1 when making the map, and suddenly the same thing took only 352 bytes, and we could immediately store about eight times as many of them in memory. It was great. Now, what's the difference? Concurrent maps are typically implemented by striping — striping to the number of, usually, CPUs or cores on your machine — so a map can end up having sixteen different stripes to increase the concurrency of its reads and writes. If you set the concurrency level to one, you just have one stripe. Okay, but why is it concurrent then? What sense does that make? Actually, it makes a lot of sense, because you still get concurrent reads. Compare this with a synchronized map: a synchronized map can still only have one reader or one writer in it at a time, while a concurrent map with a concurrency level of one can still have multiple readers. It can only have one writer, and the writer will also lock out the readers — but that's still better performance, and it's also not sensitive to concurrent modification: the synchronized one might throw a ConcurrentModificationException, this one doesn't.
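For reference, this is roughly what that hack looks like with Guava's MapMaker (a sketch from my side; the 2 KB and 352 byte figures are the talk's measurements for the Guava version in use then, not numbers to rely on today):

```java
import java.util.concurrent.ConcurrentMap;
import com.google.common.collect.MapMaker;

public class MapFootprint {
    public static void main(String[] args) {
        // All defaults: striped for concurrent writers, which the talk
        // measured at over 2 KB per empty map in their Guava version.
        ConcurrentMap<String, String> fat = new MapMaker().makeMap();

        // One stripe: still safe for many concurrent readers, but only
        // one writer at a time; measured at ~352 bytes per empty map.
        ConcurrentMap<String, String> slim =
                new MapMaker().concurrencyLevel(1).makeMap();

        fat.put("k", "v");
        slim.put("k", "v");
    }
}
```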
So, we again had this thing at Twitter: a service where you fan out tweets. Twitter, even back then, was mostly using a push model — somebody tweets, and somewhere there needs to be a machine that has all the followers of that person in a map and pushes the tweet out to all connected clients. If a very popular person tweets something, then somewhere you will have a large map; most of those maps are actually really small, but you also need to have a lot of them in memory. All right, looks good? Yep, okay.

Another thing I found — and again, just as my earlier point was not about Scala, what I'm telling you now is not about Thrift. Thrift is a wire format that was developed by Facebook and that Twitter uses internally a lot: you have a bunch of microservices that need to send data to each other, and Twitter's chosen wire format was Thrift. Thrift is pretty great — well, the things that I love about Thrift are also the things that I hate about Thrift. Thrift is a wire format that has no self-description in it. It's very compact, but it also means you need generated code that can read it from the wire: because the format doesn't describe itself, you always need encoding and decoding classes for it. The problem is, some people were using those generated classes as domain objects, which is really dumb. Here's what happens. Say you have a Person class in your wire format, and your Thrift code generator generates the Person class: it has a string first name, string last name, date of birth, and so on. It looks great — "this is exactly what I need, I'll just use this." No, you don't want to, and here's why. Every Thrift-generated class that has a primitive field will also have a bit set. It has a bit set because Thrift is really tricky on the wire: if you have, say, a boolean field that you never set to anything, the class will remember in this bit set that you never set it, and it will not send it over the wire. Now, just having this innocuous bit set adds something like 72 bytes of overhead to your object. Here's why: you have your Thrift-generated class, that's at least 16 bytes; there's a pointer to the bit set, 8 bytes; the BitSet itself is 16 bytes for its header, plus another pointer to its word array, 8 bytes, plus 4 bytes for the number of words in use; then there's the actual long array, which is another 24 bytes of array overhead; and then a single long element, another 8 bytes — oh yeah, that holds the one bit you actually use.

And you know what? That's actually fine. If you have a data transfer object, you don't care about this, because it only lives during a request-response cycle and is then thrown away and garbage collected. But we also had some people at the company who just kept those things in memory: "I'll just keep whatever the Thrift proxy deserialized, all of it, in memory." Then you're stuck with your 72 bytes of overhead per object, and of course they come to you and say, "Well, I have this 64-gig JVM and my data doesn't fit — Java sucks." So — yep, the screen is dead again — anyway, what I did there was go in and refactor the code, because it had opportunities for various optimizations.
We had a class for location, which had a city, region, country code, metropolitan area, latitude, longitude, and so on — for Twitter's geolocation stuff — and they just had it generated from Thrift. It had so many opportunities for optimization. (Actually, you can look at the side screen — the side screen still works.) This is what Thrift generated, and when you want to send that location over the wire, it's fine; but if you want it in memory, you can extract the shared parts into a separate object and have each record hold just a pointer to a shared location. Those records are already much smaller; country codes can be interned — cities typically belong to a single country at a time, except in unfortunate circumstances — and so on. And suddenly 60 gigs of RAM went down to 20, and, you know, Java doesn't suck anymore. As I said, though: it's not about Thrift.
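A rough sketch of the shape of that refactoring — the class and field names here are my guesses for illustration, not the real Twitter code: keep the Thrift-generated class at the wire boundary, and map it onto a slim domain object that shares and interns the common parts.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical domain-side classes illustrating the refactoring described:
// the Thrift-generated DTO stays at the wire boundary, while long-lived
// in-memory data uses a slim, shared representation.
final class SharedLocation {
    final String city;         // interned: few distinct values, many users
    final String region;
    final String countryCode;

    private static final Map<String, SharedLocation> CANON = new ConcurrentHashMap<>();

    private SharedLocation(String city, String region, String countryCode) {
        this.city = city.intern();
        this.region = region.intern();
        this.countryCode = countryCode.intern();
    }

    // Canonicalize so that all users in the same place share one instance.
    static SharedLocation of(String city, String region, String countryCode) {
        return CANON.computeIfAbsent(city + '|' + region + '|' + countryCode,
                k -> new SharedLocation(city, region, countryCode));
    }
}

final class UserLocation {
    final SharedLocation place; // one pointer instead of several strings
    final double latitude;      // per-user primitives stay inline
    final double longitude;

    UserLocation(SharedLocation place, double latitude, double longitude) {
        this.place = place;
        this.latitude = latitude;
        this.longitude = longitude;
    }
}
```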
Thread locals — you know what, I'll skip this. Let's talk about fighting latency. What I was telling you about until now were various techniques for restructuring a program; now let's look at actual latency fighting. If you studied computer science at university, you were probably presented with the idea that there's a trade-off between memory and time. There is, and it's true — it's also overly simplified. I typically think about it as a triangle (very novel, right?), because you still have memory, but "time" is a concept that splits into either latency or throughput. Now, you typically want to increase throughput while decreasing memory use and decreasing latency, and it's mentally a little confusing to think "increase one, decrease the others." That's why, for my own internal use, I sort of invented different terms: instead of memory usage, compactness, which is higher the less memory you use; and instead of latency, responsiveness, which is higher the lower the latency is. What happens is that for any given combination of hardware, operating system, and application, you actually have some fixed constant that is a kind of weighted product of these three, and when you are tuning, that constant is fixed and you are trying to increase one of them while usually decreasing the other two. You can also optimize your code or your system, which is a means of increasing the overall product — fixing algorithmic inefficiencies, throwing more hardware at it, switching to a better operating system, and so on — but that's not tuning. That's the difference between tuning and optimization.

Normally, responsiveness and throughput are at odds with each other. Imagine a time diagram of an application responding to requests over time. If you optimize for throughput, the requests are more densely packed — you process more of them — but every now and then one of them suffers a catastrophic GC pause and has to wait before it can go further. If you optimize for latency, you typically end up with smaller throughput, and each request incurs its own little overhead instead. Sometimes you have bulk applications and you want the throughput-oriented behavior, but the latency-oriented one is actually preferable, especially for web applications — it's also easier to plan for: if you throw more hardware at the latency-oriented system, you know the latency will go down. If you throw more hardware at the throughput-oriented system, it's not really clear what will happen — it should buy you more throughput, but it's not clear-cut.

The biggest threat to responsiveness on the JVM is the garbage collector. Here's how the memory layout in a HotSpot JVM typically looks: you have your heap, which is split into the new (young) part — Eden and the survivor spaces — and the old part. You also have some other things, like the permanent generation, which went away with Java 8, and the code cache; we're not talking about those at all. How does the young generation work? Every new allocation happens in Eden. It's super cheap: there's a pointer that starts at the beginning, and you allocate an object, and allocate another object, and it's always just a pointer increment, so it's really fast. When Eden fills up, a stop-the-world copying collection happens: all the live objects are traversed from the GC roots — which is everything on the stack, everything in CPU registers, and so on — and get copied into, for the sake of argument, survivor space 1; at that point the Eden pointer is just reset to zero and everything starts again. When Eden fills up again, it is collected again, together with survivor space 1, and everything that survives is copied into survivor space 2 — so one survivor space is always empty while the other is in use — and eventually things from the survivor spaces get tenured into the old generation. What's important to note here is that your dead objects are free to collect. That's why allocation is not a big deal in Java: you pay the constructor price, but other than that, if objects die, they are not even traversed — only the live objects are ever copied out of Eden into a survivor space, the pointer is reset, and the dead objects are summarily thrown away.

After several collections, your survivors get tenured into the old generation. Your young generation size should be big enough to hold more than one full set of your concurrent request-response cycle objects. So if you have an HTTP server with, let's suppose, 50 threads, and each of them needs about 10 megs of RAM to process a request, then you will need at least 500 megs of young size — that's actually ridiculously low, but just for the sake of argument — so that all the garbage created in at least one full set of concurrent request-response cycles can fit in it. And your survivor spaces should be big enough to hold the active request objects plus those that are tenuring. "Active" meaning: maybe your HTTP request in progress has generated 4 megs of garbage and will generate another 6, but at this point it has 1 meg of live objects — that should fit in there.

You have several collectors in the JVM: throughput-oriented ones and low-pause ones. Among the low-pause ones, concurrent mark-and-sweep is a good choice, which you have on Java 7 and Java 8 as well; newer JVMs use G1 GC by default — I can't really talk about it because I don't know much about it. What's really great is that the throughput collectors can actually tune themselves automatically: you can tell them things like "this is the maximum GC pause I want," "this is my throughput goal," and "this is the ratio of time — like 1 to 19 — that I want to spend in GC," and they will try to size the memory regions so they actually meet those goals. Memory tuning is typically like the rings of hell: there are many of them, and you don't want to go too deep. This is the first ring — if you can manage it here, that's great. Try adaptive sizing first.
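As a concrete illustration (my sketch; the values are placeholders, not Twitter's settings), those self-tuning goals map onto standard HotSpot flags for the parallel throughput collector like this:

```
# A hedged example: the flag names are standard HotSpot flags, the values
# are illustrative only — pick goals that match your own service.
#   MaxGCPauseMillis      - maximum pause-time goal
#   GCTimeRatio=19        - spend at most 1/(1+19) = 5% of total time in GC
#   UseAdaptiveSizePolicy - let the JVM resize generations to meet the goals
java -XX:+UseParallelGC -XX:+UseAdaptiveSizePolicy \
     -XX:MaxGCPauseMillis=100 -XX:GCTimeRatio=19 \
     -Xmx4g MyService
```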
One place where you really want this — here is one of Twitter's internal queuing systems running with the adaptive sizing policy. You can see the heap starts out pretty low; then a lot of load gets thrown at it, and the adaptive sizing policy increases the memory size, increases it even more at another peak, but then it comes back down. The ideal operation of a queue is being empty: message comes in, message goes out, message comes in, message goes out — you want them to keep flowing. Sometimes, by the nature of it, messages will pile up, and that's when you need to adapt. So for a queuing system, the throughput policy with adaptive sizing is pretty great: if you run a queue-like service, use the throughput collector with adaptive sizing. For everything else, the throughput collectors with the adaptive sizing policy are a good first try, and if that doesn't work, then you need to use concurrent mark-and-sweep.

Whatever you do, you need to tune your young generation first — that's crucial. If you enable those flags, you will see a lot of interesting information in the log. You need to determine your desired survivor size; there's no such thing as a desired Eden size — the more memory you can throw into Eden, the better (there are some responsiveness caveats with that, but this is the general advice). After every minor — young-generation — GC, the log will show that the Eden space is at zero: of course it is, we just collected it; it's always zero, it's the most useless piece of information ever. One survivor space is always empty, and the other will hold some percentage of objects. What's the ideal value there? Anything below one hundred — because if it hits one hundred, it means not all of your surviving young objects fit into the survivor space, and they have to go somewhere: they get forcibly tenured into the old generation, and ideally you don't want things going into the old generation prematurely. There's also this interesting output that shows you how many collections objects have survived in a survivor space. You want that distribution to be decaying — really strongly decaying. If it's not strongly decaying, your memory load is increasing, and that means one of two things: (a) your application is starting up and creating all its long-lived objects, buffers, etc., or (b) you have a memory leak. So if the distribution stays constant, you have a problem.
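The flags I would reach for to observe exactly this — a hedged sketch; these are standard HotSpot flags in the Java 8-era spelling (unified GC logging replaced them in Java 9+), and the sizes shown are placeholders, not recommendations:

```
# Logging flags for watching young-generation behavior:
#   -verbose:gc -XX:+PrintGCDetails   - per-collection region sizes
#   -XX:+PrintTenuringDistribution    - the age histogram that should decay
# Sizing knobs (placeholder values):
#   -Xmn1g              - explicit young generation size
#   -XX:SurvivorRatio=8 - Eden is 8x the size of one survivor space
java -verbose:gc -XX:+PrintGCDetails -XX:+PrintTenuringDistribution \
     -Xmn1g -XX:SurvivorRatio=8 MyService
```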
Tuning the CMS old generation: again, give it as much memory as possible. The thing with CMS — concurrent mark-and-sweep — is that it's speculative. Your application is running and allocating memory, and CMS runs concurrently alongside it; ideally, it can clean up memory faster than your application allocates it, and as long as that holds, you're fine. Again, first just try using CMS without tuning: run with -verbose:gc and -XX:+PrintGCDetails and see whether you observe any full GC messages. If you don't, you're done — don't descend to the next ring of hell. If you do, then first go back and tune the young generation, then come back to tuning the CMS old generation. You need to keep fragmentation low — CMS does not compact; that's one of the things it normally doesn't do — and you need to avoid full GC stops. Normally in computer science, when you have two goals, they conflict with each other; this is probably the only time in my profession where I found two goals that don't conflict. It was wonderful. What you need to do for CMS is find your minimum and maximum working set sizes, and there's a trick to doing that: you switch to a throughput collector, which only collects when the memory fills up, and you see how much it cleans each time — so you observe your actual maximum working set. Then you over-provision that number by 25 to 33 percent. If you see that your application in steady state is always using, say, ten gigs of RAM, give it 13, or 15, or more — as much as you can — because, as I said, CMS needs space: while it's cleaning, your application is still allocating, so you want extra free space for the application to allocate into while CMS is trying to catch up. People will usually say, "well, this memory is just sitting around doing nothing." It's not doing nothing — it's a safety cushion. Because if your application outruns the CMS and allocates all the memory, you know what happens: a stop-the-world full GC. And since CMS does not compact normally, this is when it will compact — and that's when you get two-minute pauses on a large heap. In a distributed system, a two-minute pause is indistinguishable from a host that went down. It's that simple.

You can also do other things. CMS doesn't run all the time: it kicks in when something like 75 to 80 percent of the memory is allocated — it tries to save CPU. There's actually a flag for that, and you can lower it; if you have spare CPU, you can even lower it to zero, which means CMS will basically just run in a cycle all the time in the background of your application. If you have spare CPU cycles, that's a good thing — most applications these days are I/O-bound, so you will typically have spare CPU cycles.
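In flag form, a hedged sketch of that setup — these are real HotSpot flags from the Java 8 era (CMS was removed from later JDKs), and the occupancy value shown is the aggressive "run all the time" variant described above, only sensible if you really do have CPU to spare:

```
# CMS with an explicit initiating occupancy. Occupancy 0 = start concurrent
# cycles immediately, i.e. the background "run all the time" mode; raise it
# (e.g. to 70) for a more conventional trigger point.
java -XX:+UseConcMarkSweepGC \
     -XX:CMSInitiatingOccupancyFraction=0 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -Xmx15g MyService
```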
All right. If your responsiveness is still not good enough — well, this is pretty much the last slide. If you have too many live objects during young-generation GCs, you can try reducing the new size. This is weird, but it sometimes happens: at Twitter we had a situation where collecting the young generation was taking long, because we had a load with a lot of new objects that were surviving. In that case you actually reduce the new size, and you even reduce the survivor spaces, contrary to conventional wisdom: you want those objects to tenure into the old generation, where CMS can hopefully handle them better. And it does work — but it's weird, and it always depends; again, you need to profile, you need to watch it. And finally, you can have too many threads. We had a service with thousands of threads; thread stacks need to be scanned during GC because they are live roots, so you need to find the minimum concurrency level that works for you. If everything else fails, you break the service into several JVMs. That's the ultimate performance tuning advice: use more processes, potentially more boxes — that's literally thinking outside of the box. And I think we're at time — is that right? Okay, that's great, because I would have had a small part three, but that would have been another three slides of unrelated advice, and I think at this point this talk is pretty round as it is. So thank you for your attention. All right — any questions?

First question: can you please tell me, have you used off-heap storage for your purposes? It seems like it could be a valid proposition. — So the question is: have we used off-heap storage? Yes, actually. While I was at Twitter, we had a service that was adopting Cassandra — we had some Cassandra committers as Twitter employees at the time — and they were tuning it a lot and experimenting with off-heap buffers. They work, but the important thing to realize is that all you can store there is raw bytes, so you will need to take care of encoding and decoding things in your byte buffers yourself, and you also don't get any of the benefits of garbage collection — you have to manage those buffers on your own. In Cassandra's case this was workable, because they were using those byte buffers as serialization buffers before flushing to disk, so there was a linearity to it: you would linearly fill up a buffer, and when you flushed it out, it was clean. But we also had some people experiment with off-heap storage for their own byte-encoded data — when they couldn't make the JVM perform as well as they wanted — and that was horrible, because they ended up writing their own free lists inside the byte buffers, and so on. So yes, you can use off-heap storage, but be wise about it: if you have really good liveness boundaries, and preferably linear ones, then it's actually a pretty good proposition. But again, you're back to raw byte arrays and encoding your data over bytes, so it's not terribly object-oriented.
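The linear fill-then-flush pattern described in that answer maps naturally onto NIO direct buffers. A minimal sketch of my own — the record layout and names are hypothetical, and this is not Cassandra's actual code:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// A minimal sketch of the linear off-heap pattern: fill a direct (off-heap)
// buffer with hand-encoded records, flush it to disk when full, then reuse
// it. The payload adds no GC pressure, but all encoding is manual — raw
// bytes, not objects, exactly the trade-off described above.
public final class OffHeapWriter implements AutoCloseable {
    private final ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MB
    private final FileChannel out;

    OffHeapWriter(Path path) throws IOException {
        out = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    void write(long id, long value) throws IOException {
        if (buf.remaining() < Long.BYTES * 2) flush(); // keep fills linear
        buf.putLong(id).putLong(value);                // manual encoding
    }

    void flush() throws IOException {
        buf.flip();                 // switch from filling to draining
        while (buf.hasRemaining()) out.write(buf);
        buf.clear();                // buffer is "clean" again, as in the talk
    }

    @Override public void close() throws IOException { flush(); out.close(); }
}
```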
Thank you for the talk. From your experience in your career, what was the craziest reason for an out-of-memory error? — The craziest out-of-memory error? Well, before Java 8 you had the permanent generation in the JVM, and it can throw an out-of-memory error as well. What's funny about it is that the things kept in the permanent generation are class data, and I had a system where a lot of classes were loaded on the fly — it was something like OSGi, a versioning thing that could load different versions of the code. What happened was that the class loaders were held by soft references, so that the GC could clean them out whenever it needed to. But the regular heap never filled up, so they were never cleaned — while the permanent generation, on the other hand, did fill up. So we had this situation where we were getting out-of-memory errors in the permanent generation while we still had plenty of heap. The idea had been that since we were softly referencing those class loaders, they would get garbage collected whenever they needed to be — except they weren't, because the collection of soft references was governed by pressure on the ordinary heap, and instead we got an out-of-memory error in the permanent generation. It took a while to figure out and track down: it's an out-of-memory error, but we have plenty of memory — what's going on? Yeah, I think that was the craziest out-of-memory error I ever had, and it was actually even pre-Twitter.

All right, any more questions? Oh, there's one. — Thank you for the speech. I have one question regarding the old generation and the new generation. You said that sometimes we could push our objects into the old generation. Is it a good idea to push objects to the old generation, or is it better to avoid that normally? — Okay, so the question, if I understand correctly, is: is it a good idea to force objects into the old generation? Normally, no. Most objects die young — that's the so-called young generation hypothesis — and it's been observed that for the majority of programs out there, most of the objects you create will die while still in the young generation. Ideally you want them to die while they are young, and only a few should tenure, so that in your old generation you only have the objects that are the infrastructure of your program, while anything transient — request-response cycle data and so on — should ideally just live and die young. What I was saying earlier is that if, for some reason, your objects are not dying young — your application logic is just like that — then it makes sense to make the young generation small and shove them through. Because even if you have concurrent mark-and-sweep, CMS only works on the old generation; the new generation is always stop-the-world copy-collected, and that's faster the fewer live objects there are — if most of the objects in Eden survive, it can be slow. In that case you might be better off with a small new size, making sure everything tenures, and then letting CMS take care of it. But that's unusual, that's an anomaly: if you profile your application and you know it's due to this, then it makes sense, but it runs counter to the normal advice, which is to give as much memory as you can to the new generation. The bigger the new generation, the higher the likelihood that objects die young — with a bigger new generation, the definition of "young" becomes longer, so objects stay young longer. So as I said, this only applies if your latency problem is due to the minor GCs in the new generation. We did have that: we had an application that was tuned to the bone and still had some undesired latency, and it turned out that all the latency that remained was new-generation collection latency, so we had to do something about it. It's counterintuitive — as I said, rings of hell, and there we were pretty much near the bottom.

Okay, all right — any more questions? All right then. Thank you again.
Info
Channel: jeeconf
Views: 43,759
Rating: 4.8888888 out of 5
Keywords: jeeconf, JVM, performance, java, performance tuning
Id: 8wHx31mvSLY
Length: 50min 57sec (3057 seconds)
Published: Tue Jul 12 2016