Back to basics: threads - Adam Dubiel

Captions
So hello everyone. My name is Adam Dubiel, and I would like to start this talk with a short note about the abstractions and frameworks that we use daily. They are great: they allow us to deliver business value faster, and they shield us from repeating the same lines of code over and over, from all the mundane tasks. But there is a dark side to them, because they hide details from us. These are not just unimportant implementation details; they are details that are crucial for the performance and stability of the applications and services we build. Because of this we sometimes become uncertain and vague when talking about lower-level things like threads: we are not sure how they work or what happens in our application. As Dijkstra once said, abstraction should not be a way to be vague; abstractions are here to empower us, to make our language absolutely precise. So during this talk I would like to take a few steps back from all those frameworks and abstractions that we use and bring back the ability to talk about threads in a very precise manner, so we are sure how they work and what the consequences are of the way they work in JVM applications.

But first, a short introduction: I am a technical team leader at Allegro, and I work in a team responsible for building the layer between microservices and infrastructure. We have a lot of microservices and a lot of people working on them, and my team is there to make the life of developers easier.

So let's get to the very beginning of things. Whenever you start a new application, a new process, it's like a big bang: out of nothing comes something, and in those first few moments there is a lot going on. A lot of threads are created, and there is a lot of activity on the CPU.
After a while, though, it all quiets down: our application reaches its stable state, it is processing traffic, and not much happens. Maybe from time to time there is a supernova, a new thread created somewhere, but it does not happen often; everything starts at the beginning. To give you a better understanding of what happens when you start a new application, I created a test application based on Spring Boot and plotted a timeline of thread creation, so you can see the first 25 seconds in the life of this application and watch the threads as they are born. Each of those blue dots is one thread. To show how many threads there are, here is the same data plotted in a slightly different way: in the very first few milliseconds around 20 new threads are created, a while later around 30 more appear near the 7 second mark, and then, much later, around the 25 second mark, about 80 new threads are created. Why? What happens at those moments? We have this big bang: the JVM starts, and this zero point is the beginning of our application's universe. The first 20 or so threads are various JVM threads, and we are going to talk about them a bit later. Then, around 6 to 7 seconds into the application lifetime, the Spring context boots up and adds another 15 threads: HTTP server threads, database connector threads, things like this. And then the traffic starts flowing and we get the huge surge of new threads, around 80 of them. Just for the purpose of this talk, the delay between Spring booting up and traffic starting to flow is a bit longer than usual; I made it like this so you can see the surge clearly. Usually the traffic starts flowing as soon as your application boots up, so those threads would be created somewhere
probably around the 10 second mark. So what happens at the very beginning, whenever the process is born? I am going to talk exclusively about Linux systems. In Linux all processes are gathered into a tree structure: there is a process tree, and you can trace any single process back to the root of all processes, which on any modern Ubuntu system will be systemd, but which is always the process with PID (process ID) number 1. The whole tree stems from it. On this slide you can see just a small chunk of a process tree: I had an SSH daemon running, I SSHed to the machine, and that created a new interactive bash shell for my session. What would happen if I ran a simple command in this scenario? Whenever I run java -version, I simply get a new process, a child of my bash shell, which is Java. Of course you might have a setup where systemd itself is responsible for managing your Java process, but that does not change much for us. But what is a process? I have probably said this word 20 times by now. You can think of a process as nothing more than a collection of threads which all operate on shared memory: just a bunch of threads plus some memory, and they can all share that memory, mutate it, and read from it, all in the same address space. There is nothing more to it, nothing sophisticated; a process is just a set of threads. OK, but this is how processes look from the perspective of our operating system, so what does it have to do with threads in the JVM? We are going to find out. I used the pstree command to plot this process tree, and I hid a very important detail from you on the earlier slides: this is the full process tree whenever I run a Java application.
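As an aside, the position in the process tree is also visible from inside the JVM: since Java 9 the standard ProcessHandle API exposes our PID and our parent. A minimal sketch (the class name is mine):

```java
public class ProcessTreeDemo {
    // From inside the JVM, ProcessHandle (Java 9+) shows our position in
    // the same process tree that pstree displays from the outside.
    public static void main(String[] args) {
        ProcessHandle self = ProcessHandle.current();
        System.out.println("pid: " + self.pid());
        // The parent is, for example, the bash shell we were launched from.
        self.parent().ifPresent(p ->
                System.out.println("parent pid: " + p.pid()));
    }
}
```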
You can see the same Java process here, but next to it there is a new branch of entries that look different, because their names are in curly braces. The numbers are still in the same number space, but they are a bit different. What are those? It turns out that JVM threads are actually system threads, so what we see here are the threads that our Java process has spawned. If you don't believe me, you can go to the Java source code (this is the Java 10 source code): whenever you create a new thread, the JVM calls the pthreads library's pthread_create function. It goes to the system, asks it to create a new thread, and we receive a new thread for this process. What does this mean for us? As you could see, I could observe Java threads using system tools, and that means I gain a lot of new tools and ways to observe my Java application: not only Java-specific things, but also system tools, and there are a lot of them. If they are system threads, it also means that it is not the JVM that is responsible for scheduling CPU time for them; the JVM has no say in this process. The Linux kernel scheduler is responsible for giving them slots of time on the CPU, not Java. So if they are observable using system tools, let's try to make some use of that. You know the /proc filesystem: all the information about running processes lives there, and since threads come from the system, you can find information about them in /proc too. If you list /proc/<pid>/task, you will see all the threads running inside that process. It is also possible to look at them with higher-level tools like ps.
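As a side note, the same thread list is also visible from inside the JVM through the standard ThreadMXBean API, which can be handy when you cannot shell into the box. A minimal sketch (the class name is mine):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class ThreadLister {
    // Return the names of all live threads, roughly the same set that
    // ls /proc/<pid>/task exposes from the outside.
    public static List<String> liveThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> names = new ArrayList<>();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info != null) {          // a thread may have died meanwhile
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        liveThreadNames().forEach(System.out::println);
    }
}
```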
Or, if you do not like ps, you can use top: the threads are still visible there too, and the -H argument shows all the threads for a given process. But how can we make use of this? In the ps command I wanted to output specific columns, so I am printing the PID, the TID (I will talk about those in a moment), memory, CPU, time, and command. So what are the PID and TID? PID is the process ID (and PPID simply the ID of the parent process); TID is the thread ID, the native system thread ID, the number of the thread. But how do we use this in practice? You have probably seen situations like this: all the CPUs are utilized at one hundred percent and nobody knows what is happening. How can we quickly tell which threads are using up the CPU time? We can use top or ps to see all the threads and how much CPU each is using at a given moment. After running this command I see this output, and you might already notice that there are four threads here which each utilize a lot of CPU, around 30% each, and we also have their IDs: these are the TIDs of those threads. If only there were a way to match system thread IDs with JVM threads, because as useful as it is to know that only four threads take up all the CPU time, I still do not know what those threads are doing or which ones they are. In the Java world, whenever you want to look at threads, you start with jstack: to see the threads currently running in your JVM, you run jstack with the PID (you could also use jcmd with Thread.print and get the same output). For each thread in the output you will see a line like this one, and there are a few important things in it that I am going to zoom in on. The first column is the thread name, so if we took care to
actually name our threads correctly, we might get some usable information from there. The next field is tid, which is a thread ID, and then there is nid, the native ID. It gets a bit complicated here: we have some strange tid which looks totally different from the TID we observed before. This is because the tid in jstack output is actually the hexadecimal representation of the pthreads ID, which is not really useful for us in this context; I just wanted to show you that the names clash. The important bit is nid, the native ID: this is the hexadecimal representation of the system (OS) thread ID. So we take the thread ID, convert it to hex, and it appears here; there is a link between those two numbers, the system thread ID and the JVM nid. Let's try it out. Take this one thread, which is taking up 40% of our CPU time; its thread ID is 8521. What I do next is translate 8521 into a hexadecimal number, which is 2149, then run jstack and grep for 2149. And what I see is this: no surprise here, it was a GC thread all along. A Parallel GC thread was eating up all those resources, and I can see it because the native ID, the nid, matches the thread ID from my operating system. You might want to write a script for this, because it is pretty involved, and during an outage or when something bad is happening you are probably too stressed to remember all those steps. But there is good news. In Java 8 there was a method called setNativeThreadName, which is called whenever you use Thread.setName; unfortunately it was not implemented. You could literally see "not implemented yet" in the source. Yet, because with Java 9 came the implementation of this method.
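The decimal-to-hex translation used above (8521 becomes 0x2149) is easy to script; a minimal sketch of the matching step, with class and method names of my own invention:

```java
public class TidToNid {
    // Convert a decimal thread id, as shown by ps or top, into the
    // hexadecimal "nid=0x..." form that jstack prints, so the two
    // outputs can be grepped against each other.
    public static String toNid(long tid) {
        return "0x" + Long.toHexString(tid);
    }

    public static void main(String[] args) {
        System.out.println(toNid(8521)); // prints 0x2149
    }
}
```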
So if you need any more reasons to upgrade, there is another one. What it does is take the first 15 characters of the thread name whenever you set it and broadcast them to pthreads, so your operating system becomes aware of the Java thread names. It is only 15 characters because there is a 16-character limit including the trailing zero. This is a very powerful tool now, because what we see in each of the system tools, like pstree, is the actual names of the threads. It is not java or some cryptic name any more, it is the name of the thread, and it is the same in ps. So if I had to say which thread is taking up all the CPU time, it is easy now: I can see "GC" right in the output of ps, without any of the hexadecimal mumbo-jumbo. Unless you have really long thread names which often clash, this is a great tool to observe what your application is doing. So we already know that JVM threads are actually system threads and how to observe them, and there are a lot of threads in a typical application; I would say that this one, with around 110 or 115 threads, is actually pretty small, and you can see JVM applications with 500 or even thousands of threads. So let's take a moment here and wonder how many threads you can actually spawn. How expensive is it to spawn a new thread, and why are we able to spawn thousands of them while our applications keep running? There are different types of costs when it comes to threads. There is the explicit cost: we need some memory to allocate the stacks, 1 megabyte per thread by default, and it is explicit in the sense that you can actually calculate it. And there are the implicit costs: the cost of context switching, the cost of safepointing, and GC roots. Each thread you spawn becomes a GC root, so whenever a GC kicks in it has to iterate over all of them.
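To illustrate the 15-character limit mentioned above: a small sketch (the class and method names are mine; the truncation rule assumes Java 9+ on Linux, where the kernel's comm field is 16 bytes including the trailing NUL):

```java
public class NamedThreads {
    // At most the first 15 characters of a Java thread name reach the
    // kernel, so this is what ps, top, and pstree will display.
    public static String visibleToKernel(String threadName) {
        return threadName.substring(0, Math.min(15, threadName.length()));
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {}, "http-worker-pool-7");
        t.start();
        t.join();
        System.out.println(visibleToKernel(t.getName())); // http-worker-poo
    }
}
```

So names like "http-worker-pool-7" get silently truncated; if you want distinguishable names in system tools, put the distinguishing part in the first 15 characters.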
It has to see whether they reference any other objects. The same goes for safepointing: before the garbage collector starts, all threads need to stop at a safepoint, which means waiting for all of them, so the more threads you have, the more time you may spend waiting. OK, but let's talk about the explicit costs first: how big is a thread stack? Before we jump into that question, let's settle one thing. As you all know, Java applications have two regions of memory. There is the heap, the kingdom of garbage collectors, of managed memory, and there is the off-heap: all the rest, the system memory that belongs to the Java process, which we usually do not touch directly. Threads live in the off-heap, so setting the maximum heap size does not affect the number of threads you can spawn. And there is a neat little tool in the JVM called Native Memory Tracking, which can tell us how much memory we are actually getting from the operating system: whenever the JVM goes to the system and asks for more memory, the request is logged by this tool. There are different categories of allocations, and one of them is thread allocation. In the output of this tool, after just running java -version and printing the memory used, we can see that we created 18 threads (that is the thread count for a bare java -version) and actually asked the system for 18 megabytes of memory. After doing some maths you can see that I was telling the truth: the default stack size is one megabyte, so one thread costs one megabyte. So if we wanted to calculate the total memory used by an application, we would take the heap size, then take the number of threads and multiply it by
the stack size. There are some other factors we should really include, but this will do for now, and based on this simple formula we can start reasoning about how many threads we can spawn before we overflow the memory. Let's say I get a machine with 2 gigabytes of RAM; there is the heap, and there is the off-heap, all the other memory in the system. The heap would be, say, 500 megabytes, but to keep the numbers round let's make it 512 megabytes, and in the off-heap I will allocate around 300 megabytes for some other stuff we do not care about here. That leaves us with around 1.2 gigabytes of memory just for thread stacks. So once again, complicated maths: 1.2 gigabytes at 1 megabyte per thread should give us the ability to spawn about 1,200 threads, am I right? I think I am, so let's check it. I have a small virtual machine running here with exactly 2 gigabytes of memory, and there is a simple application on it called thread spawner: all it does is spawn threads and print how many of them it has spawned. So I run it, and... didn't I say that I would be able to create only about 1,200 threads? Well, I was able to create around 5,000, and just to show that I am not lying, you can see in the table that I really have 2 gigabytes of memory. How is this possible? Well, you cannot overflow memory if it is not real, so let's think about what that means. There are actually two ways to measure the amount of memory a process is using: the virtual size and the resident size.
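The back-of-the-envelope budget from above, as code (a sketch; the class name is mine and the numbers are the talk's example values, not universal constants):

```java
public class ThreadBudget {
    // Rough upper bound from the talk's formula:
    // threads = (total RAM - heap - other off-heap) / stack size.
    public static long maxThreads(long totalMb, long heapMb,
                                  long otherOffHeapMb, long stackMb) {
        return (totalMb - heapMb - otherOffHeapMb) / stackMb;
    }

    public static void main(String[] args) {
        // 2 GiB box, 512 MiB heap, ~300 MiB other off-heap, 1 MiB stacks
        System.out.println(maxThreads(2048, 512, 300, 1)); // 1236
    }
}
```

Roughly 1,200 threads, which is the number the demo then proceeds to contradict.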
The virtual size is the virtual memory size: it is not real, it is just an address space that my process can access. Whenever I ask the system for some memory, it can say: sure, here you go, you have 10 gigs of it. That does not mean it is real; as long as I do not touch it, it might be in swap or simply non-existent. This is opposed to the resident set size, which is the actual memory my process is using, the hard memory I really have in RAM at this moment. This is the only metric that can show you how much memory your process is consuming at any given moment, and it contains everything: for a Java application it is the sum of the heap, all the thread stacks, the metaspace, everything the process has allocated is in RSS. Let's see how it works. Once again, back to my machine: I run top here, then I run this thread spawner again, and you can see the virtual size is around seven gigabytes or so, while the resident size is still only around 70 megabytes. Why? Because the threads that I spawn do nothing, they just go to sleep, so there is no need to actually allocate one megabyte of stack for a thread that does nothing; the system just does not do it. It promises my JVM process that it will give me the memory if I really need it, but it does not have to deliver up front. If I actually tried to use all this memory, say run a Spring application with huge stacks, I would be in a world of pain: I would be out of memory very soon, and the out-of-memory killer would probably come and kill my process. So, as you saw, I was able to spawn around five thousand threads. If it is not memory that limits me, what is the limit? Here we come to the notion of operating system limits. There are a lot of those, and they usually show up when you least expect them. The one we are interested in is the max user processes limit:
it tells us that a single user cannot spawn more than around 8,000 processes, and as I said, threads and processes are not so different in the Linux world; they are very similar, so new threads count against this 8,000 limit. That is why I was able to spawn so many threads: they had not yet hit this limit. It is not memory, it is a system limit. Of course you can change it if you want, but this is the default on Ubuntu. OK, so you have seen that I can create a lot of threads, but should I? If you have been following the newest trends in frameworks, you can see that we actually try to limit the number of threads: along comes WebFlux, which, instead of spawning 200 threads, spawns just a handful of them. But why do we strive to keep the number of threads low? Well, we have only so many physical CPUs; in this case let's say we have only four of them, while my application spawned around 110 threads, so it is physically impossible for all of them to run at once, because we have only so many CPUs. So what are they doing? Mostly they are just waiting, doing nothing. And let's get back to the cost of threads: I have already told you about the explicit cost, the memory footprint, but I would also like to talk about the more implicit one, which is context switching. What is it all about, and why is it bad? If we take a very simplified picture of a processor architecture, you have the processor core and then some levels of memory caches: L1, L2 and L3. The L1 and L2 caches are usually exclusive to the core itself, while the L3 cache is usually shared between cores. And why are there so many caches? Because you can only fit so many bytes of memory close to the
processor at a given cost and a given access time. The closer the memory is, the less of it we have; you can see the access times here. For the L1 cache, on the happy path, with a cache hit, access takes about a nanosecond; it is blazing fast, but we have very little of this memory, on the order of kilobytes. Then there is the L2 cache, a bit slower but still only kilobytes, and then the L3 cache, already megabytes in size but much slower, around 10 to 40 nanoseconds. And then we have RAM: you can have terabytes of it, but on the timescale of a CPU, access to RAM is really slow, about two orders of magnitude slower than accessing the L1 cache. But what does this mean for our threads and context switches? Say a thread is running on a processor. Because it is running, it probably needs some data to operate on, and this data is called the working set. It is usually a limited amount of data, but you would very rarely see a program whose entire working set fits in the L1 cache, so usually the data flows from L2 to L1, sometimes from L3, sometimes from RAM, and we keep all the data close. Then another thread arrives, and the system scheduler says: OK, now you are going to run. And you know what? All this data that we have been fetching from RAM and keeping close is probably useless now. It is not stale, it is still valid, you could still use it, but the new thread probably does not need it, because threads in Java applications tend to operate on different sets of data: it is probably a different HTTP request coming in, with totally different data, using totally different data from the database. So this thread, whenever it gets scheduled, has to fetch all its data from RAM
once again, just to keep it close. So the real cost of context switching is not really the switch itself: it is the missed opportunity of using the data you had very close to you, in the caches. You could say the worst-case cost is maybe 30 microseconds per context switch, when none of the cached data is reusable, but it is not easy to reason about, because it depends on the data: if the data in the caches is usable by different threads, the cost is smaller. You can actually observe this using system tools like perf: with perf you can see the number of context switches, but also the cache hits and cache misses, so you can tell whether your threads are actually reusing the same data, or whether there are so few threads that there is no need for context switches at all. This is the implicit cost, and this is why we try to keep thread counts down: fewer threads means fewer context switches, profit for all of us, for our performance. But let's get back to our application. We have been talking about low-level details; let's return to the application itself. As I said, first the VM threads come to life, so let's see what those VM threads actually are. This is a zoom-in on the first hundred milliseconds of the application lifetime, and guess which threads come to life first: the garbage collector threads. Before we start mutating anything, we need somebody to clean up after us, so the GC threads are born first. Then comes the VM thread, which is a very specific thread: whenever you need something from the virtual machine, you put a task on a queue, and this thread reads that queue all the time and executes the tasks. For example, if you want to reach a safepoint, you ask this thread to do it. Then we get the JIT compiler threads, C1
and C2, and then some other, less important ones. OK, what comes next? The Spring context boots, and traffic starts flowing. So why is there so much space between the Spring context booting up and the traffic starting to flow? Why are all those threads created not here, but here, and how can we manage them and utilize them to their full potential, so that the application is performant and stable? Let's talk about managing threads. Basically there are two ways of creating and using threads: you can either do new Thread, meaning just create a thread, run something, and throw it away, or you can use thread pools. You have probably seen this construct, the Executors factory method, Executors.newFixedThreadPool, which creates a thread pool of a fixed size. As you might suspect, and as I already told you, creating a new thread has its cost, so we probably do not want to create them ad hoc; as with all resources that are costly to create, we would like to pool them, keep them in a pool, and cache them for later. And a small notice: be careful whom you trust. If you use Spring at the moment and you use annotations like @Async, or you use AsyncRestTemplate, what you get under the hood is a thing called SimpleAsyncTaskExecutor, and what it does is fire up a new thread for every task that comes in. The number of threads is unlimited, and it does not reuse threads, so it is just like calling new Thread. Now, there are some pretty great people working at Pivotal, and the first time I gave this presentation one of them came up to me and said: Adam, if you do not like it, let's change it, if you have reasons for it to change. So I hope we will be able to come up with some better solution, and it is up to me now; I feel bad showing this slide, but I have to, because this is how it works for now.
We are going to create a new issue, and I am working on a set of examples showing why it is bad, how you can avoid it, and why it is actually quite easy to kill your application with it. Anyway, let's get back to thread pools. A thread pool is a very simple concept: you have some resources and you just want to cache them. But what happens when the thread pool becomes full? Say you have six threads here, all of them busy, and a new task comes in. As with most pools, there is some kind of a queue; the only question is how big this queue is, how many tasks you can actually store in it. And the sad answer is that by default it is infinite, and by infinite I mean it will overflow your memory and your app will die; this is what infinity means in practice. To actually be able to change it, you have to stop using the Executors factory methods and use the underlying thread pool implementation in Java, which is ThreadPoolExecutor. It has some interesting constructor arguments, and one of them is the task queue implementation: you can pass any task queue here, including a bounded one, so you can set the queue size to 100 or 1,000. It is up to you to measure and calculate how big it should be, but it is not infinite, so it is not going to crash your application. And with this comes another important property, the rejection policy: a rejection policy is a piece of code that runs whenever your queue fills up. Let's take a closer look at it. The happy path for us is that some thread submits tasks to the thread pool, each task is put on the queue, everything is fine, and the tasks just get executed on the pool's threads.
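The "infinite by default" claim is easy to verify: Executors.newFixedThreadPool is backed by an unbounded LinkedBlockingQueue, whose remaining capacity reports as Integer.MAX_VALUE. A minimal check (the class name is mine):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class QueueSizeDemo {
    // The "infinite" queue from the talk: newFixedThreadPool backs its
    // pool with an unbounded LinkedBlockingQueue.
    public static int remainingCapacity() {
        ThreadPoolExecutor pool =
                (ThreadPoolExecutor) Executors.newFixedThreadPool(2);
        int capacity = pool.getQueue().remainingCapacity();
        pool.shutdown();
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(remainingCapacity()); // 2147483647
    }
}
```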
With the default settings, though, we get the unhappy path: everything is clogged up, we have no resources, and whenever a thread puts a new task on the queue, it gets no feedback. Everything seems fine, but the queue keeps growing, because, as I said, its size is infinite. This is a bad situation for us, because memory, unfortunately, is finite. If we use a bounded task queue, we can also take advantage of different rejection policies, and a few of them come pre-implemented in the JDK. One of them is the caller-runs policy (CallerRunsPolicy): whenever the queue is full, instead of handing the task over to the pool, the thread which submitted the task runs it itself. So it blocks more; it might not be ideal for a lot of cases, because we actually wanted to offload the work, but it exists. The next one, which is actually the default, is the abort policy (AbortPolicy): whenever the queue is full, you just get an exception. That is great: we fail fast, we know something wrong is happening, and we can handle the situation; quick feedback is really valuable. There are also some other policies, like discard (DiscardPolicy): if you put a task onto a full queue, it just goes to /dev/null, it vanishes, which might be useful in some cases. And the other one, discard-oldest (DiscardOldestPolicy), is very similar, but when you put something onto a full queue, instead of discarding your task it discards the oldest one, the first one waiting for execution; if you have tasks that become stale very quickly, you might want to use it. And of course this is a very simple functional interface with only one method, so it is easy to implement your own rejection policy and do things like measuring how many tasks you are actually losing.
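A minimal, self-contained sketch of the bounded-queue-plus-AbortPolicy setup described above (class and method names are mine): with one worker and a one-slot queue, the third task is rejected immediately.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolDemo {
    // One worker thread, a one-slot queue, and the (default) AbortPolicy:
    // once both are occupied, submission fails fast instead of piling up.
    public static boolean overflowRejected() {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),              // bounded task queue
                new ThreadPoolExecutor.AbortPolicy());    // fail fast on overflow
        Runnable blocker = () -> {
            try { release.await(); } catch (InterruptedException ignored) {}
        };
        pool.execute(blocker);   // occupies the single worker thread
        pool.execute(blocker);   // fills the single queue slot
        boolean rejected = false;
        try {
            pool.execute(blocker); // no room left: rejected
        } catch (RejectedExecutionException e) {
            rejected = true;
        }
        release.countDown();
        pool.shutdown();
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println(overflowRejected()); // true
    }
}
```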
You can also put the rejected tasks somewhere in memory, or in a database; it depends on your needs. So this is a very useful tool, and if you have bounded queues you should use it; if you have unbounded queues, well, it will never run. OK, so we created a new thread pool, and it still does not answer the question of why all those threads are created so late. My context starts here, so all those threads should be created here, right, because the thread pool is created here? Well, not really. By default, whenever we create a new thread pool, it is actually empty: there are no threads in it, and only as tasks come in does the pool fill up with new threads. Whenever a thread is done, it is not destroyed, it is cached, it is still there; but threads are not created eagerly, they are created lazily. If for some reason we wanted all the threads ready the moment the pool is created, we could use the ThreadPoolExecutor method called prestartAllCoreThreads: if we run it, we get all 80 threads at the very beginning, when the thread pool is created. One more important question: this is a nice graph, but it is a graph for some dummy application, so how can we see how my application behaves in production, and what tools did I use to create it? There is another cool thing that comes with Java 9, called JVM unified logging: with it you can easily log the thread lifecycle into a file, virtually for free, so you can dump the information about each thread being created and each thread dying, and based on this output you can create graphs like these.
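Circling back to the lazy initialization point: a small sketch (the class and method names are mine) showing that a fresh ThreadPoolExecutor has zero threads until prestartAllCoreThreads is called.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PrestartDemo {
    // By default a ThreadPoolExecutor starts empty and fills up lazily;
    // prestartAllCoreThreads creates all core threads eagerly, up front.
    public static int[] poolSizes() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        int before = pool.getPoolSize();   // 0: lazy by default
        pool.prestartAllCoreThreads();
        int after = pool.getPoolSize();    // 4: all core threads alive
        pool.shutdown();
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] sizes = poolSizes();
        System.out.println(sizes[0] + " -> " + sizes[1]); // 0 -> 4
    }
}
```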
So the key takeaways from my talk: JVM threads are actually system threads, and this means we have amazing tools to reason about what is happening inside our JVM; there are a lot of system tools that can help us visualize and diagnose it. We are moving towards things like WebFlux, which try to spawn fewer threads, because less is more: each thread has a cost, and the cost of operating many threads can be huge. And whenever you work with a thread pool, whenever you think about threads, try to tune your thread pools, because with the defaults you might run into some weird corner cases. That is all from me for today. If you would like to ask me some questions, follow me on Twitter, or work with me, here are the details. I am happy to take any questions. Thank you.
Info
Channel: Devoxx
Views: 3,489
Rating: 4.9649124 out of 5
Keywords: DVXPL18
Id: IYHYk3rgfGI
Channel Id: undefined
Length: 45min 25sec (2725 seconds)
Published: Sun Jul 15 2018