My name is Irina Guberman. I work for a small startup where we are creating an internet for things: an IP overlay network just for IoT devices. But my talk is going to be very generic, not about what my company does. The purpose of this talk is to show some very significant differences between the BEAM, which is the Erlang VM, and the other VMs out there. The one that is very big and popular is the JVM, which is what I used to work with, so I'll be comparing the two as we go, but mostly focusing on the extremely important and powerful features of the BEAM. I'm going to have a few demos along the way; I'm praying to the demo gods, and I apologize ahead of time if something doesn't go quite right. There is also going to be quite a bit of code, so hopefully you don't get lost in it, but I am trying to focus on one thing here. For those of you who are new to Erlang, it's okay: the code, especially at the beginning, will be very small and very detailed, so it's fine if you are a beginner. If you are a more advanced Erlang developer, this is more of a marketing tool, something to show people who don't understand, or don't quite appreciate, the benefits of Erlang. That one thing is what I would recommend focusing on. So let's get started.

The BEAM, the Erlang VM, implements the actor model in a very proper way. It allows creating hundreds of thousands and millions of processes (the system limit is roughly 134 million processes), and they are not in any way tied to OS threads; whatever the underlying threads are is up to the implementation of the BEAM, so we don't have to worry about how these processes are created or how it is even possible to have so many. That is not our worry. Erlang is a higher-level language, and I call it a process-oriented language, because processes are the main building block. They have share-nothing semantics, they communicate through message passing, and all variables are immutable. Most people here know all of that, I'm sure.

Next I'll show a really simple demo server. It's a hypothetical server; it could be a web server, just any server that accepts a lot of requests very fast. There is a GitHub repo that has all of these demos with the code, so if you don't quite follow me today and it piques your interest, you can always pull it, compile it, and take a look later. This demo server has a very simple accept loop that uses very basic Erlang constructs, just for these initial demos. The acceptor just receives a request (I'm saying the requests are coming from outer space, because it doesn't matter where they come from) and dispatches it; we don't spend a lot of time in that receive loop, we just move on to accept the next one. That is roughly how it would be in production. For the purpose of this demo, the acceptor accepts a specific number of requests, and I'm just using lists:seq to control that, purely for demo purposes.
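Roughly, such an accept loop might look like the sketch below; the function names here are illustrative, not necessarily the ones used in the demo repo.

    %% Minimal sketch of an accept loop: receive a request, dispatch it to a
    %% freshly spawned process, and immediately go back to accepting.
    accept_loop(Stats) ->
        receive
            {request, ReqId, Payload} ->
                dispatch(ReqId, Payload),    %% spawns a controller; never blocks here
                accept_loop(Stats);
            {finished, ReqId, ElapsedMs} ->
                accept_loop(record_stat(ReqId, ElapsedMs, Stats))
        end.

The point is that the only work done inline is the dispatch; everything expensive happens in the spawned processes.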
So dispatch kicks off a process which I call the request controller. It's not quite the process that will be handling the request; there will be two processes per request. That is just the architecture of this particular demo, it doesn't have to be this way, but it was easier for me to implement it this way to collect the stats. I didn't want to pull in a stats library like exometer or the other things you would be using in production; I tried to keep it as small and simple as possible. So I ended up with two processes per request: one is the controller and one is the handler. The controller is spawned by the acceptor, and the controller itself kicks off the request handler, which is the process that actually handles the request. Notice that the acceptor loop has to be super fast, and spawning a process in Erlang is extremely fast (on the order of five to ten microseconds), so it does not hold up the request acceptor; the actual handling is taken care of by the controller and the handler.

The controller does one thing: it measures the execution time of the handler and then returns the response back to the acceptor, telling it that this guy has finished and how long it took, so we can collect some statistics about what's going on. It also has a receive loop: it waits for the response from the request handler, and once the handler is done it measures the execution time and sends the response back to the acceptor, so the acceptor can account for that one request. The request handler is a made-up thing here; it just does some busy work.

Let's run this real quick (sorry, I have to jump back and forth between the keynote and the shell). I'm simulating sending 10,000 requests to my acceptor loop, and these are all the requests running simultaneously. All 10,000 finish pretty quickly, even though there was quite a bit of busy work in there; each one churned through about a thousand records, if you remember the code. So that's pretty cool: we can accept requests, spawn a process for every request, and they all get executed; not literally all at the same time, but the Erlang schedulers make sure they all get executed. Okay, so that's very simple; there is nothing much to it.
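For reference, the controller described above boils down to something like this sketch; again the names are illustrative.

    %% Illustrative controller: spawn the handler, time it, report to the acceptor.
    controller(AcceptorPid, ReqId, Payload) ->
        Start = erlang:monotonic_time(millisecond),
        Self = self(),
        spawn(fun() -> Self ! {done, ReqId, handle_request(Payload)} end),
        receive
            {done, ReqId, _Result} ->
                Elapsed = erlang:monotonic_time(millisecond) - Start,
                AcceptorPid ! {finished, ReqId, Elapsed}
        end.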
Now things are going to get more interesting: we are going to introduce a bug into our system. Every tenth request somehow results in an infinite loop. We didn't quite get to test this particular scenario, something went wrong; it's an obscure bug that happens only every so often, and I'm obviously simulating it here. So now every tenth request is affected by this bug, and I'm running the same 10,000 requests, but this time you'll see the infinite-loop requests showing up on the display. Every tenth request results in an infinite loop, and look at this: the BEAM is taking up seven cores of my CPU. Seven, pretty much because I started Erlang with seven schedulers instead of eight, so I can still continue with my presentation while these infinite loops are taking up all the CPU. That was very clever of me, because I knew there would be a bug; if I hadn't, I don't know what would have happened.

Anyway, let's see what happened: 9,000 requests finished, in roughly a thousand milliseconds now, and the other thousand are still running. If I run this again, it gets a little slower and we'll have even more infinite loops, but our system is still up and running. We still accept requests; there is no real disaster, it's just slower now, because those infinite-loop requests are sitting there taking up resources. If this happens in the middle of the night, you don't have to wake up your entire team to jump on the bug, because your requests are still executing; it's not a total disaster. To introduce the bug, I basically said: if the request is the tenth one, go into an infinite loop, and an infinite loop here is a recursive call that never finishes.

Let's talk a little about the BEAM scheduler now that we've seen these couple of demos. There are two types of schedulers: cooperative and pre-emptive. Cooperative schedulers are simpler and probably more performant than pre-emptive ones: they only switch to a different process or thread when the running thread yields or finishes, and there is no context switching, so they are much simpler. A pre-emptive scheduler will suspend a process based on certain criteria. Your OS, for instance, has pre-emptive scheduling, so that among all the applications running on it, a rogue application cannot take up the entire CPU; the OS will pre-empt any process that runs on it. Pre-emptive schedulers are a little more expensive and a little more complex, and there will be context switching: if you chop a process off at a point where its state is very rich, it will be expensive to save and restore that state. But it is definitely very reliable, because nothing can take up all of your system resources.

So, based on what we saw, is the Erlang scheduler cooperative or pre-emptive? What do you think? Any more opinions? Okay, pre-emptive is a correct answer, but it is also cooperative: cooperative at the C level. That's a good thing, because if there is any complex state that the underlying C code is working with, we don't end up with expensive context switching. I'm not going to go into the details of exactly how that logic works, but it can be done in Erlang because of how Erlang is designed. And it is pre-emptive at the Erlang level, by means of reduction counting: a reduction is roughly a function call, and after 2,000 reductions a process is taken off the CPU and the scheduler gives way to another process. That's how we were able to run 10,000 processes, out of which a thousand were infinite loops, and the other 9,000 still finished; a little slower, of course, but they were able to complete without a problem.
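The simulated bug is nothing more than a recursive call that never returns, along these lines (illustrative names). Note that even this is still pre-emptible, because each recursive call is a function call and therefore costs reductions.

    %% Illustrative buggy handler: every tenth request never terminates.
    handle_request(ReqId, _Payload) when ReqId rem 10 =:= 0 ->
        infinite_loop();
    handle_request(_ReqId, Payload) ->
        busy_work(Payload).    %% the normal case: a bounded amount of busy work

    %% Each call counts toward the reduction budget, so the scheduler can
    %% still take this process off the CPU every ~2,000 reductions.
    infinite_loop() ->
        infinite_loop().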
I didn't restart my Erlang node, which I would have had to do by now if this were a Java node. I didn't have to restart; I can actually go back now, rerun this whole experiment, and still be fine, just a bit slower, with those infinite loops still running in there.

A reduction is a function call, and you might be thinking: what if my function contains an infinite loop with no function calls? That is impossible in Erlang, because Erlang has no for loop and no while loop, and anything that resembles a for or while loop always involves function calls. If you use something like lists:seq to mimic a for loop, it will be calling a function internally, so there will always be a function call to count. The only place we can introduce a real problem for the BEAM schedulers is in a NIF: if a NIF contains an infinite for loop, we have the same problem we would have in C code, and that's where the Erlang scheduler is powerless, so you have to be very careful with how you implement those. But introducing an infinite loop at the Erlang function level is impossible.

Let's also talk a little about the BEAM memory model. Every Erlang process is very small. It consists of the process control block, which is only about 300 bytes; tiny, compared to, for instance, a Linux process control block, which is around 1,024 bytes. Then we have the heap and the stack. They live in the same memory block, to save on memory and pointers: the stack starts from the top addresses, the heap starts from the bottom addresses, and they grow toward each other. Once they meet, garbage collection kicks in and a new heap gets allocated. The initial default heap of an Erlang process is tiny, only 233 words. To analyze the memory layout of an Erlang process a little further, we can use the HiPE BIFs, which are pretty cool if you have HiPE enabled; one interesting one is show_pcb. I'll skip that step so as not to take up too much time, since I'm not sure I have time for the rest, but it would basically show the 233 words that the heap is allocated.

So when we ran our demo, our system roughly looked like this: all these tiny processes with tiny heaps, completely isolated, completely separate. That is very important, and we'll get to why in just a little while. Compare that to the JVM, where every thread points into one large shared heap. That possibly has certain performance advantages, but it causes more problems than advantages, especially if you are working on something like a web server where every request is completely independent; using the JVM there, I think you are shooting yourself in the foot. The two demos I just showed you cannot really be done on the JVM: after about a thousand requests with that many infinite loops, your server is just going to go down, and you won't be able to continue with the remaining requests like we just did. This is only possible with the memory model that the BEAM provides; it is not possible with the JVM's memory model.
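Incidentally, even without a HiPE-enabled build you can look at these per-process numbers with the standard process_info/2 BIF, for example from the shell:

    %% Inspect a freshly spawned process; heap_size and total_heap_size are in words.
    Pid = spawn(fun() -> receive stop -> ok end end),
    erlang:process_info(Pid, [heap_size, stack_size, total_heap_size, memory]).
    %% A new process typically reports a heap_size of 233 words.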
But we'll be doing much cooler things; that was just the beginning. When you have a server handling all these requests, think about it: why would you want those infinite loops running in your system? If a request did not complete in a certain amount of time, it no longer matters. If it's a financial application, maybe you allow as much as ten minutes to talk to databases and do a ton of expensive things, but there will be some reasonable timeout: ten minutes, thirty minutes; it cannot run forever. If it's a web server, a request could reasonably be expected to finish within, say, 200 milliseconds, or maybe 10 seconds. There has to be a business limit on how long a request should take to be handled. Those infinite loops are still running on my node and still taking up the CPU, because I never stopped them, but there really is no reason for them to be running. Who is waiting on those processes? Nobody. So it would be ideal, if a process did not complete within a reasonable amount of time, to just kill it.

I'm going to move on to my next demo, and that demo is a real killer: here we are going to look at the killer juggler. This is a slightly modified version of my original server. The receive loop is a slightly different construct: it is not going to wait forever for the handler to complete. After a certain timeout (I just set it to 5,000 milliseconds, which is the actual default; you can configure it to whatever you want) it doesn't matter what that handler is doing; it is no longer relevant, it is taking up resources on my system, and nobody is waiting for this request to complete. So what we do is just kill it, with a simple Erlang call: exit. We kill the handler, give it the reason, which is timeout, and we tell the acceptor, which accounts for everything that was started. That's great, because now, as soon as a request handler is no longer relevant, we just kill it. It may not even be a bug; maybe the system is just too slow, doing some expensive operations that take too long, and if it's taking too long we kill it, so it stops using up resources. It doesn't have to be a disastrous bug; you just keep your system doing only as much as it should be doing, nothing useless. If the request didn't complete in time, you kill the request handler.

Sorry, I have to keep switching back and forth. Now let's look at the killer demo. I want to get rid of all the infinite loops first and start clean. It's a much better server now. We'll be running the killer juggler, which is pretty much the same thing as the juggler, except it now has this little ability to kill the handlers after a certain timeout. What happens is that the processes that are not finishing fast enough start getting killed; you see that little "k" there. I didn't want to make it into a graph, I just wanted it to be more visual. Somebody got stuck... anyway, what happens here is that my CPU is not taken up by those processes anymore: they were all killed as soon as they timed out. So that was it: I used very basic Erlang functionality to run this server.
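The core of the killer version is just a receive with an after clause; a minimal sketch, with illustrative names:

    %% Illustrative controller with a timeout: if the handler doesn't answer
    %% within 5,000 ms, kill it and tell the acceptor the request timed out.
    wait_for_handler(AcceptorPid, ReqId, HandlerPid) ->
        receive
            {done, ReqId, Result} ->
                AcceptorPid ! {finished, ReqId, Result}
        after 5000 ->
            exit(HandlerPid, timeout),            %% the handler is no longer relevant
            AcceptorPid ! {killed, ReqId, timeout}
        end.

This assumes the handler is not trapping exits; if it were, you would send exit(HandlerPid, kill) instead to make sure it goes away.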
You would not be doing this kind of thing in production, though. In production you would be using, and this is what I would highly recommend, OTP, the Open Telecom Platform. So, from what we've seen so far, we're moving on. You've all seen this joke before: how to draw an owl. So far we drew the two circles, and now we're going to draw the rest of the owl, with OTP. OTP was created by Ericsson in 1995; they put a lot of resources into creating these libraries. There is the Mnesia database and more, but the main thing is the system architecture support libraries: a framework for writing robust and fault-tolerant applications. That is something I would strongly recommend using in your production systems.

Now I'm a little overwhelmed about how I'm going to tell you how to draw the rest of the owl. The two circles were very easy for me to explain; now we have about ten minutes left and I'm supposed to explain the rest. Actually, not quite: the code I wrote is available in the repo, and I'm not going to show it at that level of detail here; we'll just look at a couple of things. There are the OTP behaviours, which are design patterns that have been battle-proven in many, many production environments and refined over many years; most people here probably know the supervision principle and gen_server. You're probably better off using these. There is application, there is supervisor, and there is gen_server, my favorite; I would use it over gen_fsm, gen_statem, or gen_event for almost everything.

So, drawing the rest of the damn owl: I used rebar to generate the OTP version of this juggler application, which is a bit more complex and a bit more appropriate for production. Bear with me, because this is quite a lot of code. We generated the application with rebar, as you saw, and the top-level supervisor creates a separate supervisor for controllers and a separate supervisor for handlers. These will not be long-running processes; each one exists just to handle one request. You can argue that this is not the best way to do it; it's just for simplicity. In production you might have one controller server controlling numerous request handlers of a specific category. But Erlang processes are so cheap that it's totally okay to do this, even if it's a gen_server. You have to realize that a gen_server is just as cheap as any process, as cheap as the plain spawned processes I was showing before. "gen_server" sounds like a heavyweight server, but it is very cheap, so it's totally fine to replace those simple processes with gen_servers.
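A rough sketch of what one of those per-request supervisors might look like; the module and child names are illustrative, and the real code in the repo may differ.

    %% Illustrative handler supervisor: simple_one_for_one, so a new handler
    %% child is started dynamically for every incoming request.
    -module(handler_sup).
    -behaviour(supervisor).
    -export([start_link/0, start_handler/1, init/1]).

    start_link() ->
        supervisor:start_link({local, ?MODULE}, ?MODULE, []).

    start_handler(Request) ->
        supervisor:start_child(?MODULE, [Request]).

    init([]) ->
        SupFlags = #{strategy => simple_one_for_one, intensity => 10, period => 1},
        Handler = #{id => handler,
                    start => {handler_server, start_link, []},   %% illustrative gen_server
                    restart => temporary,
                    shutdown => brutal_kill,
                    type => worker},
        {ok, {SupFlags, [Handler]}}.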
So there is a controller gen_server and there is a handler gen_server, and the cool thing about them being gen_servers is that I can actually talk to them. Especially in production there might be a reason why I want to talk to them while they're doing their thing, so it may not be as simple a process as it is for this demo. You can send requests to its mailbox, it may do other things while it's handling the request, it might be talking to some remote services and those remote services can get back to it asynchronously; using a gen_server allows all of that.

Now I'm basically just showing you the code, and it's too much for one presentation, so feel free to look at the repo. The supervisors for the controllers and the handlers are simple_one_for_one, so their children can be created dynamically. When the acceptor gets a request, it spawns a controller, and the controller in turn uses the handler supervisor to spawn a handler. What the controller does is create the handler and then make a gen_server:call to the handler's pid; that is a blocking call, and it is given a request timeout. If the handler does not finish within the specified timeout, the controller will die. That's basically how the OTP gen_server is implemented: if the gen_server:call does not return within the specified timeout, which is configurable in this application, the caller dies. And in its terminate callback I made the controller also kill the handler. So that's a more production-level way of doing things.

I'll just do the demo right now for the OTP version, because walking through all of this code is a little complicated; if anybody is interested, you can look at it later. We run our test console, which starts the OTP juggler application, and we'll be testing with a thousand requests. I have the same problem in this code as before: every tenth request hits the handler bug that causes an infinite loop. What happens now is pretty much the same thing as we saw in the earlier demos, but done the OTP way: we have supervisors and we have the gen_servers doing the work. You see all these crash reports. I ran a thousand requests, and out of the thousand every tenth has an infinite loop, just like before, so now we have a hundred requests in the infinite-loop state, and those crashed their controllers, because the gen_server call did not return within the timeout; a hundred out of the thousand I started. You'll see those crashing, and they take the handlers down with them. If you look at the CPU here, again it's clean; nothing is running anymore. If we look at the statistics of what has been executed, we have a hundred failed requests and nine hundred successful ones, and we were able to report that back: the controller reports back to the acceptor every time it succeeds or fails, and it knows a request failed on the infinite loop because the controller got terminated when the gen_server:call timed out. Hope that makes sense. That's pretty much it for this demo; I could make it larger and run it with ten thousand, but it takes a little too long printing out all these crash reports, so I don't want to do that to you.
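The essential part of that controller boils down to something like the sketch below; the names and state shape are illustrative, and the actual code is in the repo.

    %% Illustrative controller callbacks. If the handler doesn't reply within
    %% Timeout ms, gen_server:call/3 exits, the controller crashes (producing
    %% the crash report you see), and terminate/2 takes the handler down too.
    handle_cast({run, Request}, State = #{handler := HandlerPid, timeout := Timeout}) ->
        Reply = gen_server:call(HandlerPid, {handle, Request}, Timeout),
        {stop, normal, State#{result => Reply}}.

    terminate(_Reason, #{handler := HandlerPid}) ->
        exit(HandlerPid, kill),   %% make sure a stuck handler dies with the controller
        ok.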
To prepare for this presentation I spent a lot of time reading through a really nice resource, the BEAM Book. It is still a work in progress, but it tells you a lot about the BEAM memory model and pretty much everything you might want to know about the BEAM in depth. I didn't make it my goal to reiterate the book here; I just wanted to focus on the one thing you can do with Erlang that you cannot do with any other language: spawn millions of processes and be able to kill them, no matter what. I think this is a highly underestimated feature. A lot of times when I talk to JVM people, they are quite surprised that this cannot be done on the JVM; I've challenged people on the JVM core team about this, and they were a little surprised, and I was quite surprised that people who are really knowledgeable about the JVM don't realize it cannot be done. It's kind of funny, because it's so fundamental. That's why, when you talk to people outside the Erlang community and try to explain how amazing Erlang is, they always find a "but": oh, you have pattern matching, but Scala and Akka have pattern matching too; oh, but we have rich libraries. There's always a "but". This is the one crucially important thing Erlang does that is just impossible in other languages. I mean, not completely impossible on the JVM: if you instrumented every for loop and while loop, did some weird things and instrumented your entire code base, you could still achieve it in a really convoluted way, but it is not part of the VM. The main culprit is that big shared heap; it is central to the design and architecture of the JVM, and there is nothing you can do about it. And having small isolated processes is central to the architecture of Erlang and the BEAM: when Erlang was created, before multi-core CPUs were even available to anybody, they already saw it coming and designed it with this future in mind. So this is an extremely important point, even though I keep talking about just this one thing in this presentation. I think this is why people fail when they try to sell Erlang: they start showing all the good things, and there are so many good things that people lose sight of the one thing that is unique, important, and not possible in most other languages. In C, of course, you can do anything you want, but it's going to be extremely difficult. Hopefully that makes sense. Questions?

[Audience] When you were talking about how the heap and the stack are allocated together, starting out at something like 233 words: what is the most memory you have ever seen allocated for a single process's heap and stack? What's the biggest you've seen?

[Irina] I don't know exactly how big it can get; it can get big. Does anyone know how big it can get? It's not about my personal record...