Robert Virding - Hitchhiker's Tour of the BEAM

Captions
Before we get going: that, of course, is Mars, from Curiosity. We haven't got Erlang there yet, but I have a goal; I might even have to carry it up there myself, but that's my goal. Okay. Originally the plan was that Kresten Krab would talk about Erjang, but he couldn't make it, so you'll have to put up with me talking a bit more. (Some banter about the rover, and about other planets, further away, bigger, more exciting.) Anyway, since Kresten couldn't make it, I'll talk about a slightly higher-level view of the BEAM. Unfortunately the talks came in slightly the wrong order: we started off at the deep end with the most difficult bits, and now we're going up to a more general overview of the system.

So: a hitchhiker's tour of the BEAM. What is the BEAM? Well, it's a couple of things. It's a virtual machine for running Erlang, of course. It's also the interfaces to the outside world: how do you talk to the outside world? There are basically two mechanisms, ports and NIFs. From the Erlang point of view, a port makes the outside world look like a process, while a NIF makes it look like a function call, and they both have their uses. And it's a large bunch of built-in functions, typically BIFs: the things in the module erlang, and a few other things written in C as well. So there are BIFs in the language, and some functions are handled in C just for efficiency; lists:reverse, for example, is actually written in C.
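To make the port idea concrete, here's a minimal sketch, not from the talk: the external OS command (cat, chosen just as an example) ends up looking like a process we exchange ordinary messages with.

    %% open_port/2 and port_command/2 are the real BIFs; `cat` simply
    %% echoes what we send it, and the reply arrives as a message.
    Port = open_port({spawn, "cat"}, [binary]),
    port_command(Port, <<"hello\n">>),
    receive
        {Port, {data, Data}} ->
            io:format("echoed: ~p~n", [Data])
    after 1000 ->
        timeout
    end.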
So if you're going to implement an Erlang system, what are the specific properties of Erlang that you need to be aware of? If you've used the language you've seen all these things; there's nothing strange here, I just want to get them down. It's lightweight, massive concurrency, which sets it apart from most other systems. Communication is asynchronous all the way down. We have process isolation, of course, so things can happily crash without ruining things for everyone else. We have primitives for building error handling: as with the concurrency model, there are primitives in there and you build more complex, fault-tolerant systems yourself. There's continuous evolution of the system. And it's soft real-time, which in this case means yes, there are timing constraints in the problem, but if you overrun them occasionally it doesn't make that much of a difference. Hard real-time people don't call this real-time at all; they say if you run over, that's an error, bang.

Then there are properties of the language you have to be aware of as well: for example, immutable data, heavy use of pattern matching, and the fact that it's a functional language. These aren't specific to Erlang; if you look at most functional languages you'll find they have the same features, they might just look slightly different or work in a different way, but all the basic stuff is there. So it's the system side that's in a sense more important, but you can't ignore these things either: if pattern matching weren't done efficiently it would slow the system down noticeably, so you need efficient pattern matching, for example. That's nothing arcane; there are lots of ways of doing it, and you can read papers and books describing how. The point is, if Erlang is to run on the BEAM, the BEAM needs to be able to do all these things.

We're not going to look at everything. I'm not going to look at the language features; I can give you a number of books to read if you want. And we won't look at some things which have already been mentioned: both Lukas and Erik Stenman took up a number of these features. I'll just take a slightly more overview look at the whole thing: schedulers, processes, memory management, message passing, multi-core, and a few other things. Unfortunately it's very difficult to give a logical straight-through presentation of these, because they all interact with each other, so we'll be jumping around a little bit.

The basis for the whole SMP side, or multi-core, however you want to call it, is that the SMP handling should be transparent to the programmer. You shouldn't be forced to go in and work out how many cores you're running on and restructure your system to use that number of cores, et cetera; the system should do it for you. Though sometimes you may actually want to: there are times you don't want to leave it at the defaults, because by default it might use your resources in the wrong way. And the basis for the whole thing is that the BEAM uses things called schedulers. When you start up a BEAM, an Erlang virtual machine, one operating system process, it will start up a number of schedulers. What's a scheduler? A scheduler in this case is what I call a semi-autonomous BEAM virtual machine: it has most of the things needed to run Erlang in it, with ports, with handling NIFs, et cetera. Each one is more or less autonomous, but of course they're running in the same system, so they have to be able to communicate with each other; otherwise you'd just get lots of little Erlangs. By default you run one scheduler per VM thread; that's how they can run concurrently, or in parallel, and by default when you start the system up you get one thread, one scheduler, per processor core. They are quite autonomous, or they try to be: for example, each one has its own run queue (actually a number of run queues for different things) which it tries to manage as independently as possible, and they try to run as separately as possible, so you can avoid the nasty things like locks and synchronization.
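You can see the scheduler setup from the shell; these are real erlang:system_info/1 and system_flag/2 calls, and the printed values are just examples from an eight-core machine:

    1> erlang:system_info(schedulers).           %% scheduler threads started
    8
    2> erlang:system_info(schedulers_online).    %% how many are currently used
    8
    3> erlang:system_flag(schedulers_online, 4). %% returns the old value
    8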
No matter how smart you are, locks and synchronization always cost you, so the fewer locks and less synchronization you have, the better the system is going to run; that's a fact of life. For example, as Lukas was talking about, each scheduler tries to do as much of its own memory management as possible. Yes, you can call the operating system and do a malloc or something like that, and yes, that will be thread safe, but there's an awful lot of work going on in the operating system to make sure it is safe, and if you can avoid that, fine.

One of the other interesting things the schedulers do is load balancing. If I run something on a number of schedulers, the system will try to balance the load on those cores as much as possible, by itself; this is just built into the system. Erik Stenman mentioned this; we won't get into all the details, but this is the basic principle. If you've got eight cores, you don't want one core running at 150% and all the rest doing nothing, because that's just a very bad usage of resources, so the system tries to spread the load as much as possible. But at the same time it wants to compact things, because it's actually beneficial to reduce the number of cores in use: it's better for memory locality if fewer cores are doing the work, especially with hyper-threading. So we've got two opposite goals in the system: spread things out as much as possible, to even the load and make sure nothing overloads, and compact things as much as possible, to get better performance. These two goals conflict with each other.

So how does it do it? There are a number of mechanisms for load balancing. The first and simplest is something called process stealing; it actually steals more things than processes, but that's the basis of it. This is the primary mechanism used to balance and spread processes, and it's purely local. How it works is that if a scheduler finds it has nothing to do, meaning it has no runnable processes, nothing in its run queue, and none of the processes it owns have anything to do, it will try to go out and steal processes from another scheduler. Basically it looks at a scheduler beside it and asks: does this scheduler have anything in its run queue? If it does, it will try to steal a process, since the other scheduler will be busy doing something else anyway. If the next one along doesn't have anything, it looks at the next one along the line, to see if it can find a process to steal and get work. So in the simple case here with four schedulers, scheduler 3 has nothing to do, nothing in its run queue, so it goes to scheduler 4, tries to pick one of the processes in its run queue, and takes it over. It only steals from the run queue; it never steals running or suspended processes. This is work, it's not free; it takes a bit of effort to move a process from one scheduler to another.
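The run queues this stealing works on can be inspected; erlang:statistics(run_queue) is a real call that returns the total length of all the schedulers' run queues, a rough measure of how much runnable work is waiting:

    1> erlang:statistics(run_queue).
    0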
Process stealing is a very simple mechanism for load balancing, and it's done at a purely local level: all the schedulers do this, and if one finds it has nothing to do, it goes out and tries to find another process to run. That's the main mechanism for spreading the load.

But there's another level of load balancing as well, and Erik went into this a bit. Once every period (I haven't got the exact figure; 40,000 reductions? He was saying four million reductions; I don't know exactly, someone might know better), the principle is that all the schedulers are running, and when one of them reaches this many reductions it says: I'm now the scheduler manager, the master scheduler. It's basically the first one to reach that count. At this stage nothing stops; the machine is still running, we're not pausing anything here. What this master does is look at all the schedulers and find out how much work they are doing and have been doing, look at their run queues and so on, and it tries to balance and optimize. It's at this level it can detect that, say, we have eight cores but we're actually only running at about two hundred percent load, so having all eight cores active is a big waste, and we could quite nicely shut down three or four of them and spread the existing load across the rest. It doesn't move things itself, because then you'd have to synchronize, but it can tell other schedulers to move processes across, and they do it and carry out the balancing. And when it detects a scheduler is no longer being used, it can shut that core down, put it on suspend or whatever you want to call it, which is of course very nice because it saves energy, and it doesn't make anything slower because we're still not overloading our remaining schedulers. When it's done, it stops being master, everything keeps going, and all the schedulers run until the next time someone reaches the count and becomes the new master and does basically the same thing. So this is optimizing usage at a more global level, doing things like compacting the load onto fewer schedulers, which you can't do locally: a scheduler can't just look at itself, see it has nothing to do and put itself to sleep, because maybe all the others have long run queues. That can only be decided at the global level. And as the other guys mentioned, there's a large number of +s flags which let you control these things: how hard it should try to compact, or, if you've sold your system as keeping the load explicitly as balanced as possible over everything, to not compact at all. There are a lot of flags for controlling this.

By the way, "reductions": we use that word quite happily, and it's basically function calls. It's from the bad old days when Erlang was written in Prolog, with a Prolog interpreter, and Prolog doesn't do function calls, it does reductions. So we used the word reductions for counting how much work we did, and that stayed even when we became functional and left Prolog; it's still called reductions.
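Both reduction counters are visible from the shell; the calls are real, the printed values are just examples:

    1> erlang:statistics(reductions).           %% {total, since_last_call}
    {2046,2046}
    2> erlang:process_info(self(), reductions). %% work done by one process
    {reductions,2584}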
As the other guys mentioned, reductions aren't just pure function calls; other things doing work can say: we've now done the equivalent of five function calls, or whatever it might be, and bump the count accordingly.

Okay, a little bit about scheduling processes. By default, when a process spawns another process, the new process ends up on the same scheduler as the original one; nothing strange there. And a process can be in multiple states. It can be running, when it's actually being run. It can be runnable: it's in the run queue. It can be waiting, sitting there waiting for, say, a message which hasn't arrived yet. It can be exiting: when the process decides it's going to die, or it's been killed, it goes into a state where it cleans up after itself, for example freeing memory and sending signals down all its links, so it's in the exiting state. It can be garbage collecting: in the BEAM we garbage collect one process at a time, each process heap separately (more on that in a bit), and this has the benefit that garbage collection times are generally very short; it's very rare to get a case where they're not short enough for our soft real-time requirements. And it can be suspended: you can actually suspend a process; it's still there, it's still alive, but it will never run until you resume it afterwards. I don't really know why you'd want it, but it's there.

Processes suspend when waiting for messages, and the important thing here is to realize this is not a busy wait. The process isn't out polling: do I have any messages, do I have any messages? It's just sitting there, suspended, and when a message is sent to the process, the process is put on the run queue. That means it doesn't cost anything to have a lot of processes sitting there waiting, suspended; no execution time (memory, yes, but that's it). It's therefore quite possible to have a system with, say, 100,000 processes where only a few thousand are actually doing something at any one time; the others just sit there suspended, waiting for messages, and cost nothing. When a message arrives, the process becomes runnable. The other important thing is that a running process will be scheduled out for two main reasons. One is when it goes into a receive and waits for a message: it goes in, doesn't find any message which matches, and is suspended. The other is that there's a pre-emptive scheduler in the system which reschedules processes after 2,000 reductions: irrespective of whether it's in the middle of a lot of work, after 2,000 reductions the process is rescheduled and put on the end of the run queue. This means no process can block the system, or even block a scheduler; it will always be scheduled out. And this gets back to the original requirements, for example latency: the system should never block, it should always be ready to take on new things, and when I'm writing my program I shouldn't have to worry about scheduling things out and thinking about all that; the system does it for me.
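A small sketch of those states using the real process_info/2, suspend_process/1 and resume_process/1 (the latter two are intended for debugging, which fits the "I don't really know why you'd want it" above):

    Pid = spawn(fun() -> receive stop -> ok end end),
    timer:sleep(10),                    %% give it time to reach the receive
    {status, waiting} = erlang:process_info(Pid, status),
    true = erlang:suspend_process(Pid),
    {status, suspended} = erlang:process_info(Pid, status),
    true = erlang:resume_process(Pid),  %% back to waiting in the receive
    Pid ! stop.                         %% makes it runnable; it then exits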
Preemptive scheduling is nothing strange; most real operating systems, even Windows, do this type of thing for you, so you can't block them. But that gets back to something we found quite early on: when you start to think about Erlang systems, you find they're very operating-system-like. It's not so much programming in a programming language as building a whole system, very operating-system-like, with the processes, the communication, et cetera, and a lot of the features in the language support that view, as do a large number of the libraries. I won't go into that here.

So yes, we have lightweight processes, and nothing blocks. This is less important today when you've got multi-core, but back in the old days when you had only one thread it was very critical. We have ports as well, of course; they're one of the ways to reach the outside world, and they're also managed by the schedulers. Again, when you create a port, it's created on the same scheduler, and port activities are scheduled. It's not pre-emptive in the same sense, though: the actual port implementation is written in C, and when the system calls it, it gives up control to that code, so it won't automatically preempt it. So when you're writing linked-in drivers, which are a way of writing code in C that from the Erlang point of view looks like a port you send messages backwards and forwards to, you have to be careful: if you're not, you can block the system, or block your core. It's exactly the same problem as with NIFs. But ports are also scheduled and managed, and, I haven't quite grasped this, but I think schedulers can steal ports as well; someone who knows will have to wave a hand and say yes or no. So there's a lot of stealing going on: we kill things and we steal things, so we're pretty criminal people actually; the system is pretty criminal when you think about it. Don't quote me on that, these quotes tend to annoy people; I'm not going to get into the "let it die" and "kill your children" t-shirts.

Okay, so: memory management. Lukas went into a lot of the details of some of these features; I'll just give a slightly more overview picture. The basic idea is that there are lots of different memory areas handling different types of memory in the system, for different requirements; they have different properties and different needs. This is nothing strange; the system gets complex, and either you have a naive memory management system which is slow, or a more complex one which is fast; you have to take your pick. As I said, you could replace all of this and just call malloc and free for everything; it would work, but it wouldn't be very fast, actually quite slow. So we have things like: each process has its own heap; ETS tables are in a separate memory area; the atom table is a completely separate block of data; there's something I call the large binary space, where you put big binaries, big being bigger than 64 bytes; code goes somewhere; there are timers.
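These areas show up directly in erlang:memory/0, which is a real call; the numbers below are just example output:

    1> erlang:memory().
    [{total,16443848},{processes,4302784},{processes_used,4300392},
     {system,12141064},{atom,256337},{atom_used,234153},
     {binary,178904},{code,5692897},{ets,474321}]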
I'm not going to go through all of these, but there are lots of different memory areas in the system, kept separate because they have different requirements, different needs.

The easiest one is the atom table. All atoms in the system are interned in one big global table, which is very nice, because if I've got an atom foo in one part of the program, it's the same atom foo in another part of the program, which means all atom comparison is very fast; I'm literally just comparing an index into the table. So you never need to use integers as tags; you can quite happily use atoms as tags, making the code both readable and fast. The trouble is, atoms are never deleted from the table. Once you put an atom in the table, it's there until the system goes down. That's a consequence of how memory is managed in the system: there's really no central, global garbage collector for everything, so it would be very difficult to keep track of when you could actually free an atom. It could be done, but it would be very difficult. That means you should be careful about creating new atoms on the fly, because if you do it wrongly you will overflow the atom table, and that is one of the classic ways of crashing the system. So avoid programs that rampantly create atoms in an uncontrolled fashion. I think the default atom table size is somewhere around a million atoms, and you will never write that many atoms in your code; a million is a lot of words. But if you create atoms dynamically, for example trying to make unique things or tags or whatever, then you can quite quickly overflow that and crash the system. There are cases where it's fine: I might dynamically create atoms from, say, one to two thousand, again and again; that's really no problem, because I still only get a couple of thousand of them. It's the uncontrolled case that's dangerous, and the atom table size is fixed; you can't increase it while the system is running, so when you fill it up, you crash. So atoms are great things, but you have to be slightly careful about how and when you create them. I can also add: creating an atom because you want something unique doesn't work. I cannot create a guaranteed unique atom; I can give it a long, weird, funny name, but someone else somewhere else in the program can create an atom with exactly the same long, weird, funny name, and they're the same atom. I cannot guarantee the uniqueness of atoms.
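A common defensive pattern for exactly this: list_to_existing_atom/1 fails with badarg instead of minting a new atom, so untrusted input can't grow the table. (risky/1 and safer/1 are hypothetical names for illustration; the table size can be raised at VM startup with the +t emulator flag.)

    risky(Name) ->
        list_to_atom(Name).           %% each new string grows the atom table
    safer(Name) ->
        list_to_existing_atom(Name).  %% badarg instead of growing the table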
Now, about the large binary space. The idea is that large binaries, which in this case means greater than 64 bytes, are not stored in process heaps; they're stored in a separate area, and that has a lot of benefits. If I'm sending a binary from one process to another and it's large in that sense, I don't copy it: the binary was created somewhere in a separate memory area, and I'm basically just sending a reference to it around. That means I can do things like streaming large amounts of data through the system without copying it; that's how systems actually do streaming, streaming video and so on: they're not copying the data, and they save a lot of memory. I'll also point out, since a question arose about this recently: if I create a binary in a process and then store it in an ETS table, I'm not copying the binary. The binary is still in the large binary space; I'm just adding another reference to it from the ETS table. So if I've got large binaries and I'm moving them back and forth between processes, or between processes and ETS tables, I'm not copying them; I'm sharing them that way.

The problem with the binary space is that it can take a long time to reclaim a binary. What can happen is that I've put a large binary together in one process, and I send it through a chain of processes until it gets to the other end, is used, and disappears. The trouble is reclaiming it: it gets reference-counted up in each process it arrives in; each process holds its own reference to the binary, and the binary can't be removed until all the processes along the path have been garbage collected and each one has said: no, I no longer have a reference to this binary. It's literally a reference counting scheme: every process it reaches bumps the reference count by one, and it's only on garbage collection of a process that the count is decremented. That can be a serious problem: you can overflow the system because it takes such a long time for some of your binaries to be reclaimed. You might have processes along the path which do very little work and create very little data, so they garbage collect very seldom, and they keep their references, so the binary memory takes a long time to be reclaimed. That has crashed systems before and will probably keep doing it; it's just something to be aware of. A lot of these things are some form of heuristic being optimized for: for some people it's a big win, for others it will be very bad; unfortunately there's not much you can do about that. The way around it, if you hit this problem, is that you can explicitly garbage collect processes to get rid of the references, so the binaries are reclaimed much faster. That again costs; garbage collection isn't free, so again it's a trade-off.

So, ETS tables. Again, they're separate from processes. There were originally two basic reasons for implementing ETS tables: one was to get a fast, preferably constant-time, lookup, which is what we have, and the second was to be able to store large amounts of data outside the process heaps, because if you put gigabytes of data in your process heap, garbage collecting that process is going to take a noticeable amount of time, which you want to avoid; so you put it outside. The thing to remember here is that the memory in a process heap is never shared with the data in an ETS table. A process always works on its own data. When you read from an ETS table, you copy that element from the table into your own process heap and then work with it; when you write something into an ETS table, you build a tuple on your own heap, and then it's copied over into the ETS area. So you're copying the full data backwards and forwards; the data is never shared.
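A small sketch of that copy-in/copy-out behaviour with the real ets API:

    T = ets:new(demo, [set]),
    ets:insert(T, {config, [1,2,3]}),       %% the tuple is copied into the table
    [{config, L}] = ets:lookup(T, config),  %% and copied back out here
    %% L is this process's own copy; later writes to the table won't change it.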
So Erlang actually doesn't have shared data: lots of processes aren't writing to and reading from the same memory; when they read from an ETS table they make their own copies and work with those. ETS tables are really very process-like; it's just that there's no explicit message-passing access to them, but they behave in a very process-like way. Of course, copying is not good, it's something you want to avoid, and that's why you have things like match and select: they let you do a lot of the work of picking out exactly which elements you want before you actually copy anything into your process. Otherwise you could just do first and then next, next, next, stepping over each element, copying it and inspecting it; match and select do that filtering work while the data is still in the ETS table. That's one reason for them. But the big thing here is that I can store large amounts of data in ETS tables: gigabytes is not uncommon, and if you go and talk to Klarna, they have terabytes of data in their tables. So you can have a lot of data in there.

They're not implicitly garbage collected. Of course the memory is reclaimed: if I delete an element from a table, that memory is reclaimed, and if I delete a table, it's reclaimed; otherwise the whole thing would be useless. What it means is that an ETS table won't go away by itself just because no one references it anymore. It's like a process: a process won't die just because no one holds a reference to it; it's there until you kill it. Same with an ETS table: even if no one references it, the data is still there. And ETS tables are linked to the process that created them, which is another way they're process-like: if that process dies, the ETS table goes away. Which is great fun if you're sitting experimenting in the shell: I've created a table, filled it with stuff, I'm doing work, then I make an error, the shell process dies, and my table's gone. Yes, done that, been there. But for what they were designed for, they're a very nice thing. And I'm not going to get into the discussion of when to use ETS tables and when not to; that's a long discussion and there's no single answer.
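A hedged sketch of select doing the filtering inside the table, so only the matching rows are copied into our heap (the table and its data are made up; the match-spec form is the real one):

    T = ets:new(scores, [set]),
    ets:insert(T, [{a, 50}, {b, 150}, {c, 200}]),
    %% head {'$1','$2'}, guard: value > 100, result: just the key
    Ms = [{{'$1', '$2'}, [{'>', '$2', 100}], ['$1']}],
    [b, c] = lists:sort(ets:select(T, Ms)).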
Okay, process heaps. As I mentioned before, each process has a separate heap, and all process data is local to the process itself. That's not just the things on the heap: as Erik was mentioning, there are other things which are local to the process too; each process has its own copy of whatever it's working on. When you create a process, by default it starts off with a very small heap; that's why you see figures like the minimum process size being two or three hundred words. It starts off with a tiny heap and stack, and it grows: the more data you put in, the more it grows, and it grows when it runs out of memory, by doing a garbage collection and deciding it needs more, which increases the heap, again and again. I can set the minimum process heap size, so I can say: start this process off with one megabyte of memory, because I know it's going to grow big. Separate heaps also mean that sending a message is done by copying data: when I send a message, I copy it from one process to the other. Interestingly, this is not actually required by Erlang the language; you might call it an implementation detail. All the language semantics specify is that the data in one process is isolated from the data in another process, so that if one process crashes, it can't affect the data of another. And since data in the system is immutable, you could actually share data between processes; we don't need separate heaps for the isolation.

Then of course you ask: isn't all this data copying terribly inefficient? I'm happily sending messages back and forth, copying data back and forth; wouldn't it be better to just share the data and send a reference? Well, yes, sort of, maybe, but, and it's a very big but, as with all these things, it's trade-offs, it's all trade-offs, and there are a lot of benefits to having separate process heaps. Yes, I pay by copying data between them, I pay by using more memory, but as I mentioned before, separate process heaps mean I can garbage collect each process separately. If I'm sharing data, garbage collecting processes suddenly becomes much more complex. As it is, garbage collection times are generally so short you don't notice them, and the garbage collector becomes much more efficient: I don't have to implement a real-time collector, I can just stop and collect, because the stop times are so short, and that, trust me, is a big win. The garbage collector also becomes simpler, and that might sound like a trivial thing, but I can assure you implementing garbage collectors is a right pain, because when you get an error in there, you don't detect it at the time; you detect it a couple of garbage collections later, when some pointer is pointing into completely the wrong area, and going back to find out what went wrong is difficult, not trivial. So keeping the garbage collector simple is a big win. It also makes it easier to use better garbage collection algorithms: I'm just talking about within the process heaps now, but even there you can use more complex and better mechanisms. And it keeps the synchronization down: if I had processes sharing memory, especially on separate threads, I'd need to synchronize when garbage collecting them, because one thread might be changing the memory under the feet of another; avoiding that, trust me, is a big win. I've got an example at the end that shows some of this, and the more cores you have, the bigger the win.

So, the garbage collector. It's a copying collector, nothing strange here, it's standard for this type of language: a copying collector means you copy the data which is actually live and leave behind everything which is no longer used. That's generally the best type of algorithm for this type of system, where we create lots of temporary data. It's also a generational collector, so we have multiple generations in the garbage collector, based on the heuristic that most data you create dies young. Some will survive, and data which survives a long time usually becomes very old. So if I garbage collect just the newly created data, the chances are I'll find most of the garbage there.
So that's what we do: we have separate generations, an old and a new generation, and when we run out of memory we garbage collect just the new data. Some of it is going to survive; I can garbage collect it a couple of times, and eventually, when it's been around long enough, I move it into the old data and say: this is old now. Generally speaking, not much data ends up in the old heap; that's the whole goal. Most of the work happens in the new heap, only a little ends up in the old heap, so most of the garbage collections I do are only of the new heap, which is good. But eventually the old heap fills up, and then I have to garbage collect the whole process. That, by the way, is when I can detect that I no longer have references to large binaries: on one of these full collections. I can't be certain until I've collected everything; that's the trouble with it, and just guessing and hoping is not good enough.

And I can tune these things. Here are just a couple of simple knobs; there are lots of ways of doing this. As mentioned before, I can set the minimum process heap size, so a process doesn't start off as small as possible but bigger, because I know it's going to grow, and that saves me a bit of CPU time while the process is growing, because I'll be doing less garbage collection while filling it up. The downside is that the process will never get smaller than this, so it takes more memory. I can set this both at a per-process level and at a system level, so I can say every process in the system has a minimum size of one megabyte, which is quite a lot more than the default three or four hundred words. I can also set how often to do full sweeps: how often, when garbage collecting a process, to collect the whole heap and not just the new generation. The more often I do a full sweep, the more memory in total I reclaim, so my heap uses less memory, but it costs more CPU time. So I can say: every time you do a garbage collection, do a full sweep; that uses less memory and reclaims large binaries faster, but is less efficient because you're doing more work. Again, a trade-off. I can set these per process or at the system level, and there are a lot of other options beyond these; you saw more of them here this morning.
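Both knobs can be set per process with the real spawn_opt/2 options below; build_big_structure/0 is a hypothetical workload, and the numbers are just examples. (System-wide defaults can be set with emulator flags such as +hms.)

    Pid = spawn_opt(fun() -> build_big_structure() end,
                    [{min_heap_size, 100000},  %% in words, not bytes
                     {fullsweep_after, 10}]).  %% full GC after 10 minor ones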
Then we've got the async thread pool, which was also mentioned. The trouble is that file I/O, or other I/O, is generally very slow compared to what the rest of the machine can do: reading from a file compared to running instructions is very slow. So if I have one thread running on one core and it goes out, does a file I/O and sits and waits for a millisecond, that's an awful lot of work I'm not doing while it's hanging around. I don't want to be forced to write non-blocking I/O by hand, because that's a right pain; if you don't believe me, just try writing a bit of Node. I want to do a read on the file, sit and wait for the reply, then come back and keep going; that's a very nice way of writing things, but I don't want the wait to block the system. So there's the async thread pool: a set of extra threads which are used for these types of file operations. When I do a file I/O, one of the async threads is chosen, the read operation is passed over to it and done there, and if it has to sit and wait, it's that thread that sits and waits; my scheduler thread keeps going. Yes, my process gets rescheduled, but the scheduler itself keeps going and does other stuff, and when the operation is done, my process is notified and rescheduled and keeps working through. It's a very nice mechanism. I think by default you get ten async threads now; I think two versions ago it was zero, but now it's ten, and there's the +A flag for setting how many async threads you want in the system, depending on how much I/O you do. I think someone told me Basho runs an awful lot of them, because they're doing a lot of I/O, so for them it's worth it. Linked-in port drivers can use them too, if they exist and you write your driver that way; the file driver does, but for a driver that never blocks it's not worth the effort.

And we're nearing the end now: avoid long-running NIFs. We've all heard this, and it's true. There are lots of reasons for it. Memory is not freed while they're running: while a NIF is running, the scheduler has given up control to your C code, so it has no idea what's going on, no idea of time or anything like that. It also delays functionality that the thread needs to take part in: if you're loading code, the thread can't participate while it's stuck running your NIF, because it needs to be in control. This is no problem when running Erlang code, because on every function call the system can go out, check these things and do whatever is necessary; but here control has been handed over and it can't. It also skews the scheduler's view of how much work it's doing: your scheduler can be running at full blast while the Erlang system thinks it's doing nothing, because the NIF hasn't bumped the reduction count. It has also happened that schedulers get put to sleep and don't wake up when necessary. So this is a serious problem, but honestly, it's not difficult to work around: native code should only do small bits of work at a time. If you call a NIF, it should come back to the Erlang system pretty quickly; then you call back in and keep working through. I think the limit is somewhere around one to four milliseconds. And if a NIF is doing a reasonable chunk of work, it can tell the system by calling the enif_consume_timeslice function, which says: I've done this much work now, and the system can use that when balancing. There are also dirty schedulers which, if I've got the whole thing correctly, move these dirty jobs onto separate scheduler threads so they don't block the others; and it's said, quite euphemistically perhaps, that if you have control of your native code you shouldn't need them: I can rewrite my NIFs so they don't block, and I can have my NIF run things in separate threads as well, and do quite a lot of complex stuff in there without blocking the system.
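Both pools can be inspected with real system_info/1 items; the values are examples, and the dirty scheduler items only exist from OTP 17 onwards, where dirty schedulers were still experimental:

    1> erlang:system_info(thread_pool_size).      %% async threads, set with +A
    10
    2> erlang:system_info(dirty_cpu_schedulers).  %% OTP 17+
    8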
Sometimes, though, you're running other bits of code which you don't have control over, and then you might have to use the dirty schedulers; but if you can avoid them, don't.

We'll end with one more thing: we've had a crashing theme, so here's how to crash the VM. I mentioned filling the atom table; that's a good one: I just create atoms until the system goes down, and I get a very big crash dump telling me about all the atoms in the atom table. Overflowing the binary space. Uncontrolled process heap growth is a good one: I just create more and more data, infinite recursion. Putting messages onto a message queue faster than the process can handle them: the queue just builds up more and more memory, and eventually the system runs out of memory. The message queue one is quite common; you typically see it in, for example, error loggers: you're logging too much information, the logger can't process it fast enough, the message queue builds up over time, and the system goes down. And errors in NIFs and linked-in port drivers: NIFs and linked-in drivers are written in C, and there you're outside the control of the system, so if you've got a bug in your code there, you'll crash the Erlang system. If you do a divide by zero in a NIF, the whole system goes down and there's no way to stop it. That's another reason to avoid NIFs, by the way.
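A hypothetical watchdog check for the runaway-mailbox failure above; check_mailbox/1 and the threshold are made up, process_info/2 is real:

    check_mailbox(Pid) ->
        case erlang:process_info(Pid, message_queue_len) of
            {message_queue_len, N} when N > 100000 ->
                {overloaded, Pid, N};
            {message_queue_len, _} ->
                ok;
            undefined ->              %% the process is already dead
                ok
        end.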
I just want to show one more thing here, a lock example: one simple example of how costly locks and synchronization can be, because on the face of it, what's the problem? Here we have an Erlang program that starts thousands of Erlang processes, all running separately, so this is actually a very, very concurrent, very parallel application. And if you look here, you'll find that the more schedulers you put on it, the slower it goes, which is not quite what I expected. I'm running a parallel application; if I flip the data the other way and plot the speed-up, it gets slower. What's going on? That shouldn't be right; this is not what I'm paying for. So how does it work? It spawns a lot of processes, I think it defaults to two or three thousand, so there's no chance of schedulers having nothing to do, and they all build a long list of timestamps and then send the result to a parent process that's waiting. And it uses erlang:now() to create the timestamps. Now, erlang:now() is nice if you haven't used it: it returns the time since the first of January 1970, down to microseconds, as a tuple of three elements. But it has one property: it is guaranteed that every call you make returns a strictly larger value, monotonically increasing over the whole Erlang node. So if different processes are calling it, each call gets a bigger value, and that's what causes the problem I'm after here: it has to synchronize. When I call erlang:now(), it has to take a lock, go in, get the value, bump the value, release the lock, and give me my value; every caller has to take that lock. It was a nice idea, and it worked fine on a single processor, but, typically, it doesn't scale. This is an example of why locks and synchronization are actually very costly, and why a lot of effort has been put into the BEAM to avoid them: the more parallelism you have in your system, the more costly they become. The alternative is to use os:timestamp(), which returns more or less exactly the same thing, but without the monotonicity guarantee, so it needs no locking. So this is one of the things we just got wrong: it's a nice feature, because it can give you absolute time orderings of things, but it's very costly. I will happily admit this is a pathological case, but it's an example of the problem (there's a sketch of this benchmark after the questions below).

So, that's it. Questions? By the way, I managed to not run over time too much, so I'm very pleased with myself; that's an accomplishment. Any questions? Nope, clear as mud, okay.

[Question: what's the oldest still-running Erlang system?] I have absolutely no idea, do you know? No, I have no idea who's had a system running the longest; I quite happily crash things, I don't worry about that.

[Question: do shared binary reads use locks or synchronization?] They must use some, for the reference counting; you can't get around that. But for a binary that's used by many processes, you only have to lock when you're actually doing something with it: the data is just there, and you need locking only when multiple threads can be modifying something; reading is safe when no one is actually changing it. And there's a lot of optimization that has gone into smart usage of binaries. If you do something like a binary comprehension, where you're appending new values to a binary the whole time, you're actually not creating a new binary every time; writing it by hand you might be, but the comprehension doesn't: it creates a binary that's a bit too big and keeps filling more data into it. And a process doesn't see the whole block of data: it sees a little binary thing which is a reference to the data, saying where it starts and how big it is. So I can have one big binary where different processes see specific chunks of it, as if they were separate binaries, but the data itself is not affected.

Okay, thank you.
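A minimal sketch of the pathological benchmark described above, written from the description rather than taken from the talk: NProcs processes each make 100,000 timestamp calls in parallel; with erlang:now/0 they all serialise on one lock, while os:timestamp/0 needs none.

    bench(F, NProcs) ->
        Parent = self(),
        Pids = [spawn(fun() ->
                          _ = [F() || _ <- lists:seq(1, 100000)],
                          Parent ! done
                      end)
                || _ <- lists:seq(1, NProcs)],
        [receive done -> ok end || _ <- Pids],
        ok.

    %% timer:tc(fun() -> bench(fun erlang:now/0, 2000) end).
    %% timer:tc(fun() -> bench(fun os:timestamp/0, 2000) end).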
Info
Channel: Erlang Solutions
Views: 10,892
Rating: 4.9 out of 5
Keywords: Erlang, Erlang User Conference, Robert Virding, BEAM
Id: _Pwlvy3zz9M
Length: 52min 53sec (3173 seconds)
Published: Thu Jun 19 2014