"Systems that run forever self-heal and scale" by Joe Armstrong (2013)

Thank you for inviting me to this; it was great fun to be invited. I've never been in Chicago before, so it's quite an experience to be here. It's quite unusual to find it colder indoors than outdoors; in Stockholm it's the other way around.

So, when I was invited to give this talk... what happens when people invite me to give talks is that I usually say no, and then I say maybe, and then I write a title that I think looks okay. I wrote a title, said I'd accepted, and tweeted the title (could you turn the volume up a tiny bit more, please?). So I tweeted this title, "Systems that run forever, self-heal and scale", and immediately somebody tweeted back, "Well, that's impossible." And forever is actually quite a long time, so I'm not going to make systems that run forever. They'll just run for a few years, or a few hundred years or something like that, but not forever, because the Earth's not going to be here forever anyway. Self-heal, that's pretty obvious: if they're going to run for a very long time, they're going to get bugs in them, so they're going to crash, and we need to fix them as they're going along. And we want to scale them up to planetary-size things. So that's the theme of this talk, and what I want to do is set the scene for it, because I've been working with systems like that for quite a long time. Some parts of making systems like that are pretty difficult, and other parts are pretty easy, so I want to give you a feeling for the bits which I think are difficult and the bits that I think are easy, and how you go about doing it. And what has sailing got to do with all of this? So the plan for this talk is: I'll introduce the problem domain, this highly concurrent, highly connected world, talk about some algorithms and architectures, and then go into what I call six rules for making systems.

I'm kind of not so into... I mean, this might sound funny coming from me, but I'm not really so interested in programming languages; I'm interested in solving problems. Erlang is one of those languages that was developed to solve a specific problem, and it turns out to be very good at solving that problem. Surprisingly, it's also good at solving other problems, but that's not by design, that's more by accident, and I'll talk about the reasons for that. Given those six rules, you could actually implement them in any language you like, if you constructed the language to have certain properties. You can bolt these properties on top of other languages that weren't explicitly constructed that way; you'll get the feeling for it, but not the behaviour that you get in Erlang. I'll show you how they're programmed in Erlang.

Right, so what's the problem domain? Well, I want you to think about this: somehow we've got to go from this. That's a sequential program; that's what all sequential programming languages look like to me. There's just one thing, and you've got to do everything in it. That's all of sequential programming. And here's Erlang, you see, and it doesn't look like that; it's a completely different world. You're not talking about one process, one computation, that's going to do everything; you're talking about thousands, possibly millions, of computations that are going to do things, and they're all isolated, all doing their own little things, and all talking to each other. So we have to understand this sort of world. I don't want to understand that other world; that other world is a very strange world. I mean, if you're programming things in the real world, they're parallel. The real world is parallel. There are hundreds of people in this room; it would be extraordinarily difficult to try and describe that as a single process. How would I describe it? "You're thinking that", and then I sort of interleave, "and then you're thinking this, and then you're thinking that", and I go around you all and then we get back to you. How would you program that? I couldn't program it that way, but I could program it as, you know, 300 parallel processes.

I always like to think of Erlang processes as little people, because we've all got private memory and we all communicate by message passing. I've got my memory, you've got your memory, and I speak to you and the words go across, and this message-passing channel is imperfect. You might not be listening when I speak, or a jet might go by and obscure the noise, so you don't hear it. That's really how my little processes work: they've each got private memory, and they just send messages to each other. We don't know if the messages get there or not; if you want to know that a message got there, you've got to send a message back, and then you'll know. So we're not really making any big assumptions about how the world behaves. I used to be a physicist, and this is actually how a physicist would view the world: there's no such thing as simultaneity. We don't know what's happening now; we know how things were at the time it takes a ray of light to go from A to B. I don't see you now; I see how you were a microsecond or a picosecond earlier, because of the time it takes light to get from me to you. It's the same with software: when we send messages to each other, we don't actually know how things are now, we know how they were in the past, and we have to live with that. We can't change the laws of physics.

So this world has got lots and lots of computers. "Lots" is a rather vague term; I wouldn't call two "lots", but it's certainly more than one. "Lots" probably starts at about 10 and goes up to a few tens of thousands, possibly millions, of computers. We don't actually know how to program millions of computers in Erlang at the moment, because we haven't got millions of
computers, but there are research projects figuring out how to do that; million-core CPU projects are appearing on the research horizon now. Of course, if you've got this world with lots and lots of computers, they're all executing things in parallel, and it's distributed. It's concurrent, because they're all running at the same time, and faults are going to happen. So it's not a question of saying "let's use type systems", or "let's use something so that no faults can occur"; it's more the case that faults are going to occur, and then you're going to have to detect them and do something about it. You're going to have to worry about consistency, or rather the lack of consistency: if you've got thousands of computers, it's just unrealistic to think you can put them into a consistent state, so you're going to have to live with inconsistency. And if you want the system to run forever, you're going to have to change the code in it as it's executing. There's no such thing as an atomic upgrade: stop it, change the code, restart it. If it's every single computer on the planet, that's just not going to work. So you're going to have to have this notion of things partially upgrading themselves, of an upgrade taking time, and of living with different versions at the same time.

One of the things, actually the primary thing, that Erlang was designed to do was fault-tolerant computation. Let's just think about that for a moment. There are really two ways to design things: you can either start with small things and scale them up, or you can start with big things and scale them down. Suppose you're building a system and you want it to serve, say, 10,000 people simultaneously. One way would be to start with the system designed for 10 people, test it like that, and scale it up to 10,000. The other way would be to design it for, say, 100 million people, do the design for that, and then scale it down to 10,000. You might not get the same architecture; you might get a completely different architecture. In fact, you would get a different architecture, and I think in general it's a really bad idea to start with a design that works for 10 or 100 things and scale it up. It's better to start with an architecture that you know will work for a few trillion things and scale it down. It will actually be less efficient at your 10,000 than one you scaled up, but you'll know that you'll be able to scale it up later.

So rather than ask how we get to five nines, let's make it more interesting: let's start at 9,999 nines of reliability and scale down. Okay, so let's make a system that's got 9,999 nines of reliability. Well, that's easy. Well, it's not quite easy; there's a tricky bit. The easy part: if things are independent, these are just probabilities. If the probability that one thing fails is ten to the minus three, just get 3,333 independent machines, run them all in parallel, and the probability that they all fail at the same time is ten to the minus 9,999. So you've got, not four nines of reliability, but 9,999 nines. That's pretty easy. The difficult bit is making them independent.

And does this work? Yeah. I've just put in a few slides to show that we believe this sort of technology works. This is what I do when I'm not here: I work for Ericsson, and this thing here is a connection point. This stuff down here is your 4G, LTE, wideband CDMA, 3G; out here is the IP network, the internet; and we put a box in between. So the software abstraction of that is this, okay. [Laughter]
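The nines arithmetic from a couple of paragraphs back can be checked in a few lines. This is an illustrative sketch in Python (the talk contains no code, and Erlang is its subject language); the function name is my own:

```python
import math

def nines(p_fail: float, n_machines: int) -> float:
    """Nines of reliability for n independent machines run in parallel:
    all of them are down at once with probability p_fail ** n_machines,
    which is -n_machines * log10(p_fail) nines."""
    return -n_machines * math.log10(p_fail)

# One machine that fails with probability 10^-3 gives three nines;
# 3,333 independent machines give 3 * 3,333 = 9,999 nines.
print(nines(1e-3, 1))      # about 3.0
print(nines(1e-3, 3333))   # about 9999.0
```

The multiplication is the easy part; as the talk says, the whole difficulty is making the failures genuinely independent, which no formula gives you.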
Well, software people do bubbles and arrows and things, you see. So this is millions of smartphones, that's Erlang, and that's the internet, and that box in the middle connects smartphones to the internet. It actually carries about half the smartphone traffic in the world. So if you've got a smartphone and you're using data traffic, there's about a 50 percent chance you're going through an Erlang thing; that is, if your operator has purchased its base stations from Ericsson. If they've purchased them from our competitors, it won't be Erlang, it'll be in C or something like that. So about half the world's smartphone traffic goes through these things. And we built a thing here; this is Ericsson, I've just taken some material. The first picture was the architecture, the second was the software abstraction, and this is the sales abstraction, which you show to customers. That thing is a box, a big sort of box, and it's not really very exciting, but it's one of those boxes; we don't sell many of them, actually. The goals are six nines of reliability, 18 million simultaneously connected subscribers, and low power consumption. It's all programmed in Erlang, and it works quite nicely, thank you very much.

This is pretty similar to internet high availability: there you want thousands to millions of users, it's soft real time, and you upgrade the systems as they go along. You don't really stop these systems. Google doesn't go offline to do a code upgrade at midnight; there isn't really a midnight, because when it's midnight here it's not midnight in Sweden, and so on. So you can't say "stop it at midnight in America, change all the code, and restart it". That's what the banks do: they turn their systems off at four o'clock in the morning and leave them off. It takes about an hour or something, and then they come back online again. Well, we don't do that in telephony, and Google don't do it. We want to make these things very scalable.

So, distributed programming is in a sense hard; it's also difficult; it's also easy. I searched on the internet a little bit for why distributed programming is hard, and I found a blog. I don't know who the guy is, but he'd written down five reasons why distributed programming was hard, and I thought, yes, I agree with that. One: the difficulty of identifying and dealing with failures. When he wrote his blog I disagreed with that one; I actually think that bit's quite easy, and I'll tell you why. Two: achieving consistency of data across processes is difficult, and I'm going to go into that in a little bit of detail, just to give you some feeling for why I think it's difficult. Three: these big systems are not homogeneous. We don't build systems in one language, even if we'd like to; you're going to do it with several different languages and try to connect them together. Four: testing them is pretty difficult, because they're not reproducible; you can't put them in the same state every time. And often these big distributed systems require an awful lot of special-purpose hardware and kit. In order to test some of the software we were writing, you had to book a test facility, and you could get, say, two days in three weeks, and that wasn't enough. You'd go into a room half this size, full of equipment, and they'd say, "okay, here's your test facility", and you'd have no clue what to do; you'd need a whole lot of people who run around, and you tell them what you want, and you've got to rig it all up. And that's what actually makes
testing, you know, continuous development, all this continuous stuff, very hard. You can't do it if you need a whole pile of hardware that's really difficult and really expensive, where you move in for two days and then move out again; you can't just rerun the tests every time you commit your program. So it actually is difficult to test. And the technologies in distributed systems are not easy to understand. I'll give you some examples of that; I'll just frighten you a little bit, but they are kind of tricky.

So let's look at data, highly available data. I'm going to say that data is sacred and computation isn't. You really need to look after your data and make sure you never lose it. The computation, that's just stuff that transforms data; if a program crashes, just rerun it, that's not a problem. You can perform the computation anywhere, provided you can get hold of the data. I'm going to look at just one sub-problem: how to make highly available data.

Okay, so let's look at the "where is my data?" problem. I like this one. You've got 10 million computers, and Joe's data is on three or four of them. So the first question: in order to access my data, I need to know which computers I've put my data on. How can I find that out? Oh, that's pretty easy; we know how to do that. Most of the algorithms for this descend from the Chord algorithm, one of my favourites. It tells you how to find the computers that my data is on, given millions of computers. For those of you who don't know it, I'll go through it rapidly. What you do is map the hashes of the IP addresses and the hashes of the keys into the same namespace. I'll give you a simple example, using MD5 as the hash function. Suppose you've got some machines, more than three, but the first three have IP addresses like 235.23.34.12 and so on. The first thing I do is compute an MD5 checksum of each address, and then sort those MD5 checksums; the next list here is the sorted checksums. Then suppose I've got a particular key, something like "mail number 2". If I hash that with MD5, I also get a checksum, and I can compare it with the sorted list. Say it falls between the second and third entries: the first entry is the hash of machine S2, because I've sorted them, and the second is the hash of machine S3, and so on. The convention is to store the data on the machine whose hash bounds that interval, so I'll store it on machine S3. So by hashing keys into the same namespace as the hashes of the IP addresses, we can figure out which machine to store the data on. And if you want a replica, you hash again, so you get a second machine, and you store the second copy there. It might hash to the same machine; if it does, just hash again. And that's the Chord algorithm. It's actually the basis of all distributed hash tables; most of them work like that. So it's quite easy to understand, a very pretty algorithm.

Okay, so let's look at these replicas. Finding the data is not a problem, that's pretty easy, but I'm trying to replicate the data in case machines crash, so we'll store it in two or more places, and now life becomes a bit more difficult. The first thought people have about replicas is: I know what we'll do, we'll keep an odd number of copies and we'll vote.
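The MD5-based placement described above can be sketched in a few lines. This is an illustrative Python sketch, not Erlang, and not real Chord (which adds finger tables, node joins, and failure handling); all names and addresses here are my own inventions:

```python
import hashlib
from bisect import bisect_right

def h(s: str) -> int:
    # Map an IP address or a key into the same namespace via MD5.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def place(key: str, nodes: list[str], copies: int = 2) -> list[str]:
    """Choose `copies` distinct machines for `key`: sort the node
    hashes, find where the key's hash falls among them (wrapping
    around the ring), and re-hash to place each further replica,
    as described in the talk."""
    ring = sorted((h(n), n) for n in nodes)
    hashes = [hv for hv, _ in ring]
    chosen, digest = [], h(key)
    while len(chosen) < min(copies, len(nodes)):
        node = ring[bisect_right(hashes, digest) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        digest = h(str(digest))   # hash again for the next copy
    return chosen

machines = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
print(place("mail:2", machines))  # two distinct machines from the list
```

Because both keys and machine addresses live in the same hash space, adding or removing a machine only moves the keys in one interval of the ring, which is what makes this scheme attractive at scale.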
You send out, say, five copies, and when three of the machines have said "yes, we agree", then you say it's okay. What about the two that didn't agree? We'll forget about them for the moment. That "we'll forget about them for the moment" is going to come back and bite us later, but never mind; it sort of looks okay, the majority wins. And what about the time? Oh, that's brilliant, look at that: we've got some master machine here, and it broadcasts to five machines and gets five answers, but all this takes place in parallel. So in wave one you send out five messages, and then you get five replies. That's great, fantastic. As soon as you've got three or more replies, you can say yes. You could have 17 replicas, or 9,999 replicas, and wait until more than half of them have replied, and you're okay. And it's easy: it's constant in the number of waves, no matter how many replicas. Easy to find the machine, everything's simple. So at this point you're thinking, ah, distributed data is really easy. Making replicas is pretty easy, isn't it? If we can make 9,999 replicas, the world's going to be easy.

So I thought, let's look at two. I like to understand things, and I don't really understand 9,999 replicas, so let's try to understand what's involved in making two replicas, because that's a much easier problem. Here's how we do two replicas. Suppose we've got a master machine A and a replica machine B, and we send A a message, "set x to 10". A sends a message to the B machine, "set x to 10", and you get an ack from the B machine, and then an ack from the A machine. So at this stage, what we know about the system is asymmetric: A knows that x is replicated on B, but B does not know that x is replicated on A, because somebody might have told B
to change the value, and A hasn't yet got the signal. So it's asymmetric, and that's going to be very nasty. Now, it's not actually going to work quite like that, because you've got two machines, A and B, and you've got a client, and the client can't talk to A directly, because it doesn't know where A is. So normally you have a load balancer, or a master machine, that knows where A and B are; the client talks to that master machine, and the master always talks to A. So this is what happens if everything works: the user sends a message to the manager that knows where A is (the manager is a program executing the Chord algorithm), it sends "set x" to A, you get all these acks back, and it all works beautifully. We've set x equal to 10, and then we set it equal to 20, and everything's fine.

Right, so hang on: what happens if A crashes, and then the client asks for the value of x? So the user set x equal to 10, and all of that worked; then he set x equal to 20, and some of that worked; and then A crashes at this point here. The master knows that A has crashed, and now somebody asks for x. Hmm. The master knows that A has crashed, so he asks B, and he gets back a value from B, and it's 10. So what can you reply? Oh dear. This is an unsafe value of x: I know it's 10, but it might not be right, because it was set to 20 over here; and the master can't ask A, because A is dead, and he knows that A died. So we've got this unsafe value of x. Now, what do you want in your program? You do a get and a put to a database, and you ask: "What's the value of x?" "10." "What's the value of x?" "10." Oops. "What's the value of x?" "Well, I think it might be 10."
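The crash scenario above can be reproduced as a toy simulation. This is a Python sketch of the failure mode only (made-up classes, no networking, not how any real replication layer is written):

```python
class Node:
    """A machine holding a key-value store; `alive` models crashes."""
    def __init__(self):
        self.data = {}
        self.alive = True

a, b = Node(), Node()   # master A and replica B

def write(key, value, replication_ok=True):
    a.data[key] = value            # A applies the write
    if replication_ok:
        b.data[key] = value        # the copy reaches B

def read(key):
    # The manager falls back to B when A is dead, but it has no way
    # to know whether B saw every write that A acknowledged.
    node = a if a.alive else b
    return node.data.get(key)

write("x", 10)                       # replicated to both: A=10, B=10
write("x", 20, replication_ok=False) # A=20, but the copy to B is lost
a.alive = False                      # ...and then A crashes
print(read("x"))                     # prints 10: an unsafe, stale value
```

The point of the toy is exactly the asymmetry from the talk: B answers honestly with the last value it saw, and nothing in the system can tell the manager whether that value is current.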
Programmers don't want to know this. [Music] So other programmers put abstractions on top, like two-phase commit, so that even though underneath it's "well, I'm unsure about this, really", they say, "oh, I'm sure, I'm sure it's ten". Right, so now: what happens if the master dies? Wait a minute, hang on. This was already kind of tricky: A has died, but the master process hasn't died, it's still living, and it's beginning to get a tiny bit complicated. So now we ask: what happens if the master process itself dies? This is stuff you don't want to know, but I'm going to tell you a little bit. Life gets a tad tricky. Have any of you heard of the Paxos algorithm? Right, so you know life is a tad tricky. The Paxos algorithm is distributed leader election: you have a leader, it dies, and then you've got to figure out, in a distributed way, who the new leader is. You vote and take the majority, but as you're voting, you might lose messages, and voters might die. If you look it up, say on Wikipedia, you'll find these signal diagrams; they look pretty much like the ones I've been drawing, only there are about 15 to 20 of them, and that one is one of the simpler ones that will fit on a slide.

In making this lecture I did a little bit of research, and I found a quote I rather like, from a paper called "In Search of an Understandable Consensus Algorithm". I like that: an understandable consensus algorithm. To read a bit of it: "Unfortunately, Paxos has two significant drawbacks. The first drawback is that Paxos is exceptionally difficult to understand. The full explanation is notoriously opaque; few people succeed in understanding it, and only with great effort. As a result, there have been several attempts to explain Paxos in simpler terms. These explanations focus on the single-decree subset, yet they are still challenging. In an informal survey of attendees at NSDI 2012, we found few people who were comfortable with Paxos, even among seasoned researchers. We struggled with Paxos ourselves; we were not able to understand the complete protocol until after reading several simplified explanations and designing our own alternative protocol." And look who wrote it: one of the authors is John Ousterhout. Do you remember Tcl? Very nice. So John Ousterhout, I mean, he's not a dumb guy, right? He designs programming languages, and programming language designers are a bit anal about understanding stuff; they try to figure out how things work. So if John Ousterhout says he couldn't understand it, and it's kind of tricky stuff, that's good enough for me.

So that's the bad news. What's the good news? Well, the good news is that it's been "solved", in quotes, in the Erlang libraries. There's a gen_leader thing in the Erlang libraries. This problem was so difficult that we had a research project to do it: the project was to write the leader election algorithm, and the research bit was to prove that it was correct. And they proved it correct, and it ran for several years, and then, I can't remember exactly when, they got a bug report that the leader election was actually electing two different leaders, and it was reproducible. And this was a bit of a problem, because the algorithm had been proved correct. So it went back to the people who proved it, and they said, "oh well, we had to make a few simplifying assumptions in order to do the proof". So it's believed to be correct at the moment. And if you don't want our libraries, there are things like Riak, a database, which implement these kinds of
algorithms. Riak is actually quite nice, because most systems hide the voting structure. If you've got five replicas and you really want to be sure it worked, you'd really like to know that you've got five commits, not three. Most databases don't let you see this; Riak has a layer of fine control where you can say, "I really do want to know exactly what's going on", and you can go in at different levels, which is very nice.

So that was a little introduction to some of the problems you get in distributed programming. I don't think you need a lot of complicated, fancy algorithms to do distributed programming, but certain ones, like replicating data, you do need, and you certainly don't want to implement them yourself. You want rock-solid, industrial-quality implementations for that stuff; don't try to build it yourself, unless you're a research group and want to spend ten years doing it. If you just want to build something, take these libraries and use them. You don't actually need that many. There are libraries like Riak; AMQP and RabbitMQ give you a persistent message queue, and if you think about it, you can implement a heck of a lot of things just using persistent message queues. It's nice to know that somebody else has done all the work of putting that into a library. These are the libraries you want. You don't need libraries for web services; any fool can write those. But this stuff is difficult.

Okay, right, so let's look at some rules. These are the properties you need if you want to make fault-tolerant systems. I don't like slides with bullets; well, I've got slides with bullets, but the trouble with bulleted slides is that all the bullet points are the same size, so you don't know which is most important. Really, I'd like a convention where big bullets mean it's important and small bullets mean it's not. If you put all the important things on a bulleted list, the biggest bullet of all is isolation: all computations must be isolated from each other. That means one process can't crash another one. Suppose I've proved my process correct, and tested it, and it's wonderful, and then some other bloody idiot comes along and just crashes it, pulls the plug out. "I've proved your program to be correct!" "I've built a fault-tolerant system!" Well, you've built it on one computer, and some guy comes along and pulls out the plug; it's not fault-tolerant, is it? So this isolation property is very important. I know, because this computer is isolated, that if I just walk into the audience... can I walk into the audience? Do I have a volunteer who will give me one of these beautiful computers? I'll smash it and show that my slides are still working. Do we have a volunteer, or do you believe me? Sir, can I take your computer and smash it, or do you believe me? You believe me. Right, that's great. I'd be really surprised if I went and smashed a computer here and the slideshow stopped, or if somebody smashed a computer in Sweden and it stopped. That's isolation, and we want that for our software.

So once you've got isolation, what does it give you? You get a heck of a lot of nice things. You get fault tolerance, because if I smash this machine, it's not going to cause what's going on in the other machines to stop. It gives you scalability, because if I have organised my computation so that it runs on one machine, isolated from the others, then if I want twice the power, or ten times the power, I just get ten computers and run ten of these things. So designing my system for isolation, making sure the bits are isolated, gives me scalability, and it gives me reliability, by pretty much the same argument. It also gives me testability: because I've isolated the bits, I can test them in isolation. If it's not isolated, if it's lots of bits all glued together, how the heck do you test it? The more I can isolate the bits, the better I can test them. It gives me comprehensibility, because this act of isolation means breaking things into smaller components that talk to each other through protocols. And it gives me the ability to upgrade the code, because once I've isolated all of these pieces, I can go around the room turning them off one at a time, changing the code, and putting them back online, then going on to the next one. So I get all of these things for free, provided I have made an architecture with isolated components.

Once they're isolated, the next property I need is concurrency: can I describe a computation in terms of concurrent processes? Because once I've isolated them, they are concurrent; I can't do anything about that. Does my system, whether it's an operating system or a programming language, enable the description of concurrency in a rather natural way? Well, why do I want concurrency? The world just happens to be concurrent and parallel. If you don't believe it's concurrent, just walk over the street outside, with all the cars whizzing by, all this stuff happening. If you tried to describe that as a sequential process, how would you do it? I don't know how; you can't. The world actually is concurrent, so let's try to describe it in a concurrent language, or a concurrent way. That doesn't mean it will execute in parallel; it means the description is of a set of parallel
the concurrent activities you see i can't if if i run it on a sequential processor one core computer it'll never be parallel it will be concurrent however i get the illusion of parallelism by interleaving the cpu if i've got like a thousand computers i've got a 10 core cpu i can run 10 parallel processors on it i can't run 11. i can run a million concurrent processors only into leaving on to the 10 processor we have to distinguish these two two ideas um we also need you know at least two computers to um to make a non-stop system i always thought that was obvious until um you see i've been working for this so long i didn't realize that people thought oh you actually you know you can make a fault tolerance system on one machine i said what happens if you smash the whole machine oh i didn't think of that so um so erickson you see we we um we we do this sort of mike mike williams who worked with him a few years and co-developer like he said he was erickson built military systems and they were they had a bit that built build things for generals and people with funny hats on and um and uh so this general came in and said you've got to talk about you know two machines you're not one out it carries on on the other one and um and and and he was very pleased about this because he said to my always wonder what will happen with a new kit one of the exchanges so you're telling me if a new kit won't exchange it will still carry on going well sort of my mental model of this was like two computers sort of in the same building it stood up with a wire in between and duplicated power supplies so you wouldn't have sort of mental models of being like 400 kilometers between them to guard from nuclear weapons and things but a little bit of thought said yes it would work so general was happy uh yeah so you know concurrency it kind of goes together with this distribution well i mean if it's distributed it is concurrent you've got to live with it you can't so so here's a funny thing you see 
If you want to write fault-tolerant software, you have to do concurrent and distributed programming, whether you like it or not. Not understanding distributed programming is not an option if you want to build fault-tolerant systems, because you need two computers; they are parallel, they are distributed — you can't get away from that. Right, the next thing you want to do is detect failures. Machines are going to die, and if you can't detect the fact that they've died, you won't be able to correct any errors that occur when they die. Pretty obvious: failure detection — if you can't detect failure, you can't fix it. And this has to work transparently across machine boundaries, which actually implies non-shared data structures. You have to copy: you can't have dangling pointers over machine boundaries, you can't lazily transfer data and have half the data on one machine and half on another, because if one of the machines crashes, you've lost half your data. So it implies this kind of pure message passing where you copy everything. The real reason for that is fault tolerance — it's not to do with anything else, it's not to do with efficiency. First you make systems right, and that means fault tolerant, and then you make them fast. A consequence of making them right is that you can't have lazy data structures with dangling pointers across machines — you have to copy everything, otherwise you can't make it fault tolerant. And I also want asynchronous message passing, because you can't do synchronous messaging when the world is asynchronous — light moves from A to B and takes a certain amount of time, and you can't change the laws of physics. In Erlang, we build fault-tolerant things into trees of processes that watch each other, and these work transparently over machine boundaries.
This is a kind of logical picture — they could all be processes on the same machine, or processes on different machines; it doesn't really matter. You also need fault identification: it's not enough to know that something's crashed, you really want to know why it crashed. At the lowest level, Erlang gives you something simple — it crashed because you divided by zero, or because something was wrong — but you can tune that: you can send entire databases along with it when it crashes, you can send the entire state of the system when it crashes if you want to. Then you want live code upgrade — pretty obvious, I won't say much about that. You want zero downtime: there's no stop button on your software; you press the start button, and that's it — go home, it runs forever. And you want stable storage. That's one of the tricky things to do — it's not a property of a programming language alone, it's a property of programming languages and algorithms and operating systems and all sorts of stuff — but it's pretty good now, it's kind of off-the-shelf. And it's quite nice: if you've got proper stable storage, you don't need backups anymore. What the hell have you got backups for? A backup is something you want in case your machine crashes, but if it never crashes, you don't need a backup, do you? You do need snapshots, because you overwrite your data, but you don't need backups. And you need crash reports and things like that — when it crashes, it's nice if you can come back later and find, despite it all crashing, why it crashed still sitting in the storage when you restart. I wrote down a few quotes from various people about fault-tolerant systems. Jim Gray — that's actually from the paper "Why Do Computers Stop and What Can Be Done About It?" Does anybody remember the Tandem computer? Oh, excellent, good — I'm not the only one with gray hairs. Erlang is, in one sense, the Tandem computer in software; its architecture is very much like the Tandem's. I read Gray's report and mailed him — he'd postulated that if you did all this in software, it would work pretty much like the hardware — and I said, yeah, well, we've done it in software, and it does work pretty much like the hardware. He mailed back and said, yeah, I knew it would. He also said "welcome to the church of concurrency" in his mail, which I thought was rather nice. I've highlighted a bit of what he said: a process achieves fault containment by sharing no state with other processes; its only contact with other processes is via messages carried by the kernel message system. So that was the only way data could be exchanged in the Tandem. And he said the key to software fault tolerance is to hierarchically decompose large systems into modules, each module being a unit of service and a unit of failure, and the failure of a module should not propagate beyond the module. That seems to me straightforward, actually. I haven't got it on this slide, but Postel's law is a making-matters-worse law — be tolerant in what you accept. That's an absolute nightmare, because you define a protocol, and then somebody sends a message that's in violation of the protocol, and you say, oh well, that's okay, we'll let it in anyway. It's horrible; you shouldn't do that — you should crash immediately. And then Gray again: it's pretty much the same idea — a transaction mechanism, and software modularity through processes and messages. And there's the notion of fail-fast: the process approach to fault isolation advocates that process software should be fail-fast — it should either function correctly or detect the fault and stop operating. As soon as you've got a fault, you stop; don't make matters worse by continuing. And then fail early — this is somebody else: a fault in a software system can cause one or more errors, and the latency time, the interval between the existence of the fault and the occurrence of the error, can be very high, which complicates the backward analysis of the error. This is from a paper on business systems: if an error occurs now and you carry on and carry on, and then you get the symptom, it's very difficult going backwards to find out where it occurred. And here we go — Alan Kay. This is from a nice mail of his, a reminder that he took some pains at the last OOPSLA to explain that Smalltalk is not only not its syntax or class library, it's not even about classes: "I'm sorry that I long ago coined the term 'objects' for this topic, because it gets many people to focus on the lesser idea. The big idea is messaging." So the big idea of object-oriented programming is messaging — it's not objects, it's not classes, and it's not methods or how they're organized; that's abstract data types. The big idea in object-oriented programming is messaging — polymorphic messaging. And of course all of this is in Erlang. And Schneider — it's the same sort of stuff. Anyway, I summed all this up in Joe's quote: let it crash. Just let your software crash. Now, crashing is a big deal in sequential programming languages, because you've only got one process, so if you crash, you're screwed — well, what else can you do? So wait a minute — the corollary to this: all sequential programming languages get error handling wrong. Every single one. It doesn't matter what they are — Java, Haskell, Smalltalk, all of them — they're all wrong, because they all have this notion of one sequential computation.
You've got to do everything inside that one computation, so you have to keep it alive no matter what happens. So what happens when you're writing a program and you get to the point where you cannot return a value, where it doesn't make any sense to continue, because what's happening doesn't agree with what should be — you've got a violation of your internal logic? What can you do? You have no alternative but to crash; you can't logically continue. Suppose you've got a divide by X, and X turns out to be zero. What do you do? You can't continue. Throw NaN or something? Come on — just crash, for goodness' sake. It's not a problem in Erlang, because you've got millions of processes all watching it; who cares if a few processes crash? Right, so those were the six rules. How are we going to program them? Well, what is Erlang? Erlang is kind of three things — when you unwrap the parcel, you find three things inside. It's a functional programming language — although, with respect to Haskell, it shouldn't be a big F, it's a little f: it's got a few non-functional things in it for pragmatic reasons; you can program in a very functional way if you want to, but you don't have to. It's a set of design principles — they go under the name of OTP, a very self-evident name: Open Telecom Platform. "Oh, we're not doing telecoms, it's useless" — right. But it's middleware, and it's all about fault tolerance and setting these things up. And it's a virtual machine called the BEAM. The B stands for — well, it was designed by Bogdan Hausman, that's the B in it: Bogdan's Erlang Abstract Machine. But then Bogdan left and Björn took over — Björn Gustavsson
now supports it, so it's Bogdan's and Björn's Erlang Abstract Machine. Right, so how do we program the six rules? Well, we use libraries, or we use things in Erlang that were designed for this. Let's go through the isolation rule. Erlang processes just are isolated: one process cannot damage another by accident. It can damage it deliberately, by sending an exit signal to it, but that's not accidental — you can't accidentally damage other processes. And you can't always kill a process, either: it might set a flag saying "no, you're not allowed to kill me" — though there is a special signal that will kill a process even if it has said you're not allowed to kill it. An Erlang node can have millions of processes. They take about 300 bytes per process, so you get about three processes per kilobyte, three thousand processes per megabyte, and three million processes per gigabyte — that's before they've done anything, but with a few gigabytes of memory you can run a few tens of millions of processes without any problems. Try it at home; it's quite easy. They've got no shared memory, and they're very lightweight — much lighter weight than threads. Operating system threads are very, very heavy compared to Erlang processes, and threads are evil anyway, because they share resources. Here's the crazy thing: operating systems give you these nice things — processes — which actually are isolated, so one process can't corrupt another process's internal data structures. But threads are evil, because the difference between a thread and a process is that threads can corrupt each other's internal data structures. So they're exactly the thing you don't want to program with. Right — concurrency. Well, Erlang processes are concurrent. They all run in parallel in theory, and on a multicore they
really do run in parallel — but you can't get more parallelism out than the number of cores. If you've got a four-core machine, you're not actually running more than four of these processes at a time. They're preemptively scheduled, and within a process they block for I/O, but they're non-blocking with respect to other processes. So a process will go read, write, read, write — and when it says read, it waits, but it doesn't affect anybody else; they all carry on. So it's not one of those funny languages that says "issue a read request, and on read-complete call this function over there somewhere" — and then the read-complete handler issues another read, and on that read-complete goes somewhere else and does something. It's not like that, so it's actually easy to understand. There are just three little things in Erlang: spawn creates a parallel process, Pid ! Message ("Pid bang Message") sends a message, and receive, with pattern-oriented syntax, receives one. Sending doesn't actually deliver straight into the process: each process has a mailbox, and you send to the mailbox, just like regular mail. Pid ! Message tells the VM — which is the postman — "deliver this message to this guy". It doesn't need to know where the process is; this is location transparent, and the message will find its way over the network. It's just like a letter: if I put my home address in Stockholm on an envelope and pop it in the post, it'll turn up in my mailbox in Sweden. It's the same in Erlang: once you know the address, Pid ! Message just sends it off, and it ends up in the mailbox. And the receive operation says: go look in the mailbox and see if you've got any mails matching these patterns — it takes those out, and any other mails that are there it won't lift out. So there's a pattern-matching operation that lifts things out of the mailbox. Right, that's the concurrency bit.
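Those three primitives can be sketched in a few lines — a minimal counter process (the module and function names here are my own, for illustration; they aren't from the talk):

```erlang
%% A tiny counter process: spawn starts it, Pid ! Msg posts to its
%% mailbox, and receive pattern-matches messages out of the mailbox.
-module(counter).
-export([start/0, bump/1]).

start() ->
    spawn(fun() -> loop(0) end).          % create an isolated process

bump(Pid) ->
    Pid ! {bump, self()},                 % asynchronous send to its mailbox
    receive
        {count, N} -> N                   % selective receive: only this pattern
    end.

loop(N) ->
    receive
        {bump, From} -> From ! {count, N + 1}, loop(N + 1);
        stop         -> ok                % any other message stays in the mailbox
    end.
```

So `Pid = counter:start(), counter:bump(Pid)` returns 1, a second bump returns 2, and any message that doesn't match a pattern just sits in the mailbox until some receive asks for it.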
The failure detection bit just says: you can set up a link between processes, and if a process dies, it sends you a message that ends up in your mailbox. Technically they're signals, not messages, and there are two kinds of exit. What really happens is that if you don't do anything special, and you're linked to another process and it dies — and the reason it dies is an error, not a normal exit — then you yourself die, and you send out exit signals to the processes you're linked to. The reason for that is a classic problem in distributed programming: you can set up lots of parallel processes that collaborate to do something, but what happens if any one of them goes wrong? At that stage, you want to make sure that everything involved dies — every single process. You might have 15 processes all spread out doing stuff, and you want the invariant: if any one dies, we all die. That's what the link does. Just set the links up; it doesn't matter which one starts it, they'll all die, and it's an invariant. That's why we don't do defensive programming. It's like the file system on Unix: a process opens a file and starts reading and writing to it, and if it crashes, the operating system closes the file in a nice sort of way, so the next program that comes along can open that file — the file isn't broken forever just because the process using it crashed. So when things can go wrong, all we do is assume they're going to work, program them as if they're going to work, and then set up some processes that monitor them; if they fail, we put back the invariants — pick up the pieces and put them back together again. That's failure detection. And we're pretty religious about fixing errors remotely — you can't fix your own errors. If I fell over and had a heart attack and needed open-heart surgery, I'm not going to do it myself — where's the knife? No, some medics come in and some other guy fixes me up when I'm dying.
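That "if anyone dies, we all die" invariant, plus a monitoring process that puts things back together, can be sketched as a hand-rolled mini-supervisor (not the OTP supervisor; the module and names are mine, as an illustration):

```erlang
%% spawn_link creates a process and links to it in one atomic step.
%% Because this parent traps exits, a worker's exit signal arrives as
%% an {'EXIT', Pid, Reason} message instead of killing the parent,
%% so the parent can restart the worker and restore the invariant.
-module(mini_sup).
-export([supervise/1]).

supervise(WorkFun) ->
    process_flag(trap_exit, true),        % convert exit signals to messages
    Pid = spawn_link(WorkFun),
    receive
        {'EXIT', Pid, normal} ->
            ok;                           % worker finished normally
        {'EXIT', Pid, Reason} ->
            io:format("worker died: ~p, restarting~n", [Reason]),
            supervise(WorkFun)            % pick up the pieces, start again
    end.
```

Without `trap_exit`, the same link would make the parent die along with the worker — which is exactly the whole-group shared-fate behavior described above.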
We do that with processes: think of them as little ambulances that run in and fix up things that go wrong. It gives you a very nice separation of concerns: you've got processes whose only job is to do stuff, other processes whose only job is to repair what's gone wrong when the processes that do stuff fail, and then managers whose only job is to start and stop them and manage them. It makes things actually pretty nice. Here are one or two little extras we bunged in. spawn_link: spawn created a process; spawn_link creates a process with a link, so that if the process dies, you get a message. If you don't do that, your process dies when the other process dies; if you do, you get an error message which you can receive just like anything else. And we've got live code upgrade. This is quite convenient — a call actually means something different depending on how you write it. If you just call bar inside a module called foo, it will always call the local version, no matter what happens. But if you say foo:bar — module colon function — and you upgrade the code for the module while it's running, that call will hop into the new version of the module and carry on there, while the old one carries on in the old version. So without changing the names of any of your modules, you can actually have two completely, logically different versions running at the same time, all with the same names. Keeping only one old version is actually a mistake in the language — when Erlang was designed, we didn't have much memory; that was '86 or something. Today I would do it differently: I would have multiple versions within the same namespace, and garbage-collect code. Because if you're performing a computation — suppose we spawn 500 processes to do the same thing, and some of them are long-lived, lasting months, and others are short-lived, and then a second later we go again and do it again — we might change the code while we're doing it, and we don't really want to change the code out from under the running processes. You may want them to run to completion on the old code, and then give them the new code the next time they start; but we don't want to kill off the old ones using the old code just because we've loaded new code. So it's like having regular modules or classes or namespaces, except that you can actually run the old code and the new code at the same time. Imagine Java: you've got a class in the JVM, and you change all the definitions and just bung it in again, and bung it in again — but the old instances that are still running are still running the old code, the new ones are running the new code, and it's all hunky-dory and works fine. Then we've got stuff for stable storage: Mnesia is a real-time database that you can configure for multiple nodes; it replicates, runs in RAM, and does things like that — it's all very nice. I'm running out of time. So once you've done all this — once you're thinking in terms of isolation — the isolation forces you, when you design your system, to break it up into separated components. And that's rather nice, you see: in order to make things fault tolerant, you have to design it so that the components are isolated, and so you've got your scalability for free — you didn't have to think about how to scale it, because you designed it correctly. Hey, that's great, it comes for free. But it works the other way around too: suppose another guy says, I want to make a scalable system, so you design it in isolated components,
and it becomes fault tolerant for free. Well, hang on, wait a minute — if it's isolated and it's scalable, maybe we can run it on a multicore, and then it should go faster. This is probably why Erlang took off: back around 2004 the multicores started coming along, and we ran a lot of programs on multicores and they just went faster. We didn't have to do anything to them at all — just bung them on the multicores and they went faster. So the goal was: if you go to a quad-core, take your old program, bung it on the quad-core, and it should go four times faster — we hope. Some programs actually do. Our goal is to get about 75% of the number of CPUs in speedup, so if you've got 12 cores, we hope to get about a factor of 8 speedup without doing anything. Of course, it depends how you've written your programs: if you've written them as one big process, you get no gain, but if you've written them with lots of little processes, you should get these gains. It's very unusual to get no gain at all. And then as you start scaling up, funny things can happen — I'm going to show you a little bit about that in a while. It's kind of fun, because everybody else is asking, how do we take a sequential program and parallelize it so we can run it on a multicore, and the Erlang people are saying, well, it already runs on a multicore — how do we find the bottlenecks? We've already figured out the first part: every time you write spawn in your program, you're saying, I've decided that can be done in a parallel process, and then you just bang it on a multicore and it goes faster. We're looking for the bottlenecks now, and I'll show you that, because it's quite fun. So what have we got with all of this? Here's the kind of summary: lightweight processes, fast message passing, total separation between processes — very nice stuff.
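The "write spawn and it just goes faster on a multicore" idea is often illustrated with a parallel map — a common Erlang idiom, sketched here as an illustration rather than taken from the talk:

```erlang
%% Parallel map: spawn one process per list element, then gather the
%% results. On an N-core machine the VM schedules these processes over
%% all the cores, so CPU-bound work speeds up with no code changes.
-module(par).
-export([pmap/2]).

pmap(F, List) ->
    Parent = self(),
    %% one process per element; each posts {Pid, Result} back to us
    Pids = [spawn(fun() -> Parent ! {self(), F(X)} end) || X <- List],
    %% selective receive on each Pid keeps the results in order
    [receive {Pid, Result} -> Result end || Pid <- Pids].
```

So `par:pmap(fun(X) -> X * X end, [1, 2, 3])` gives `[1, 4, 9]` — in order, because the gathering receive matches on each specific Pid even if the workers finish in a different order.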
Actually, a lot of this is what you would normally associate with operating systems. You don't really need an operating system under Erlang: when Erlang starts, it just says to the operating system, "give me a heck of a lot of memory, please", and that's about the only system call it makes — then it does everything itself. It schedules itself; it doesn't use the operating system's scheduler or anything like that. Right, let's have a look — I'm going to hop over a bit of this, it's taking too long. I've run through a few projects just to show it's not just Ericsson that uses Erlang. CouchDB was one of the first projects that spread a bit, and quite a lot of people use it. Then Riak — there are some Riak people here, so go and ask them about Riak and they'll tell you how wonderful it is. Chef 11, in case you don't know, is a management system that monitors machines; the Chef 11 servers are written in Erlang now — a client-server architecture, servers written in Erlang, clients written in Ruby — and you can monitor loads of machines: one Erlang node can monitor about, I don't know, 100,000 machines or something like that. That's being rolled into Facebook, so Erlang will be controlling every single Facebook node on the planet. It's good for that kind of thing — monitoring thousands of machines; you can get about three to five million sockets running on one reasonably sized machine in Erlang. MochiWeb — Facebook Chat ran using that for a while; it's not using it anymore, it's been replaced by, I think, a web server called Cowboy. Scalaris was a distributed peer-to-peer thing — a research group in Germany won the prize for the best scalable system: they implemented the entire Wikipedia — not the real Wikipedia — using something like one-tenth of the number of machines that Wikipedia really uses. But they never managed to sell it to Wikipedia, I think. This one is quite interesting: ejabberd. It's an XMPP server — the most widely used XMPP server in the world — and it's that code that got modified into a lot of other things. WhatsApp — you know, the WhatsApp application, the mobile thing, which switches, I think they were saying, 10 billion messages a day or something — was actually based on that XMPP code, by a student who'd learnt Erlang. I reckon they'll sell themselves to Google or Facebook for a few billion dollars. That's what you can do if you just learn a language — nobody's commercially interested in programming languages per se; you make money by building applications that people want to use. RabbitMQ — the first compliant implementation of the Advanced Message Queuing Protocol. RabbitMQ did that, and then they were acquired by — well, I can't remember the name of the company — which was then acquired by VMware. So the VMware distributed systems are actually using Erlang, hidden in the bottom layer, using RabbitMQ to manage them. Quite fun. And a lot of companies: Ericsson's pretty big and using Erlang a lot, and there's Basho, Tail-f, Klarna — you probably don't know much about Klarna, a Swedish company, but at the current rate of expansion it's going to be bigger than Google in about six years. It's a financial company — the first pure Erlang company valued at over a billion dollars, with something like 800 employees after five years, and it's doing crazily well. And Tail-f is managing a lot of Cisco routers and things.
You know, there's a load of books. This one is actually a really good write — I don't know if it's a good read, but it was quite fun writing it — the second edition of the Erlang book, which has all the new things in Erlang that none of the other books have got in yet. Right, so a couple of slides on some fun stuff — what's in the future? I'm going to show you two slides. I think Elixir is really fun. It's like a son of Erlang, or a daughter of Erlang. Remember I said earlier that a language is three things: a language, a set of libraries, and a virtual machine. Elixir runs on top of the Erlang virtual machine, and all these nice things I talked about — the links, the process isolation — all of that is in the Erlang virtual machine, not in Erlang itself. So Elixir gets all that stuff, and it's got a Ruby-like syntax, and it's got some fancy metaprogramming that you can't do in Erlang. It's really cool. And it's taking shape in a completely different way than Erlang did. When Erlang was developed — this was before the World Wide Web; we were talking about this in 1985, 1986, Robert Virding and I and Mike — the number of people who could take part in the debate about how the programming language should be was limited to the number of people in the coffee room. We talked about it every day. But this debate is taking place on the net: there's a mailing list, I'm mailing José Valim and asking "why do you do this?", and a lot of people who've never met are contributing ideas — people in the Soviet Union, people in America. And with Erlang, I think the gap between Erlang and the first book was something like 14 years. Here the gap is about a year, because Dave Thomas — the guy who introduced Ruby to everybody — has written "Programming Elixir", and the first drafts of the beta book are out; you can buy it and read it now. And there's an O'Reilly book that's out; I've read both drafts. So the documentation — the books, which you need — is being co-developed with the language. We've never seen this before; there's always been this long gap. Even Ruby was developed first, and then Dave wrote the books. So I think this synergy effect is going to be very good, and it's a very exciting experiment, because the OTP people are really keen on this and I'm very keen on it: we can combine youthful enthusiasm with this old-style engineering practice. There's probably hundreds and hundreds of man-years of work in the virtual machine and in the libraries, and we can just glue it all together. This is very exciting. The other stuff — this is cool, I'm just going to show you a little video, if I can make it work. This is Concurix. Somebody earlier asked about finding bottlenecks and things like that, so remember that picture I showed with lots of little dots — how do we understand a system like that? Well, let's see if this works. Where's my thing to press? Oh, heck — oh, there. Let's hope this works. I shouldn't be so violent. Any sound, please? Thank you, Joe, and hello Lambda Jam. My name is Alex Gounares, and I'm the CEO of Concurix Corporation. Today I'd like to show you a few of the tools we've been working on for Erlang performance. Our flagship product is a real-time visual profiler. It consists of two parts: a tracer, which hooks into the Erlang virtual machine, and a set of HTML5 visualizations. Installation is easy — simply add it to your rebar config and away you go. The code here is for a website that generates Mandelbrot sets.
It's a simple program written in the excellent Chicago Boss framework. Now let's see it in action — here it is, running live in the Concurix visualization tools. Every circle represents an Erlang process; the size of the circle is proportional to the amount of computation; lines represent communication patterns; and the yellow dots represent the actual messages being sent. So far it's pretty stable, not much happening — let's add some load and see what happens. As we add load, you can see hot spots develop as the program struggles to keep up. The beauty of these visualizations is that they make it super easy to see exactly where your program has bottlenecks. Here we're zooming in on the bottleneck. The bottleneck we found in this particular program is a common one in Erlang, where many worker processes communicate with a central singleton process, and that singleton process becomes a choke point for scalability. To figure out what it is, simply put your mouse over it, and you'll be told the module name, function, and process id of the offending bottleneck. Well, that's it for our whirlwind demo of a real-time visual profiler — we hope you found it interesting and useful, and we look forward to your feedback. Thank you all very much. So that's what you can see. It's like a gas — I'm wondering if it's going to obey Boltzmann's equations or something, or Boyle's law, PV = nRT: if you compress this, the processes get hotter and buzz around faster and the CPU gets warmer. I want to see if we get to something like statistical mechanics for understanding how these systems work, because I think we need to go — and I'll show you this picture — from this world where we think we program using single processes, to a world where we program with thousands and thousands, or millions, of processes that are communicating with each other. So if you take away one message from this talk, I think that's what we need to be thinking about: not how single processes behave, but how aggregates of millions of processes behave, and how to build systems out of that. I think that's about where I get to. Thank you.
Info
Channel: Strange Loop Conference
Views: 18,123
Rating: 4.9490447 out of 5
Keywords: Erlang, OTP, distributed systems, concurrent programming
Id: cNICGEwmXLU
Channel Id: undefined
Length: 70min 22sec (4222 seconds)
Published: Fri Mar 26 2021