The Do's and Don'ts of Error Handling • Joe Armstrong • GOTO 2018

Okay, so this talk is about error handling and how we build fault-tolerant systems. So what's a fault-tolerant system? A system is fault-tolerant if it carries on working even if things are wrong. It might not do everything it's supposed to do, but it won't just fall in a horrible heap on the floor and stop working. I've been doing this for quite a long time. Work like this is never finished; it's always in progress, and we're always finding better ways to do it. I've been doing it for about 35 years, and we still see new ways of attacking the problem.

Hardware can fail, but that is relatively uncommon. Hardware tends to be pretty reliable: once it's working, it works. Quite often, if it fails, it fails catastrophically and you can't put it back into service. Software, on the other hand, fails an awful lot. There are far more software errors than hardware failures, so if you're going to build a fault-tolerant system you need to take account of software errors rather than hardware errors. Hardware errors you can more or less eliminate with replication. You just replicate the hardware several times, and the chance that the replicas all fail at the same time is very low. If you've got some hardware and the chance that it fails in a given time interval is 10^-3, and you get two lots of it and they're independent, the probability that they'll both fail at the same time is 10^-6. If you get a hundred lots of hardware, the chance that they all fail at the same time is 10^-300. It's pretty low. So basically we can make systems as reliable as we want, provided we can replicate them and provided the replicas are independent.

Okay, so I'm going to give just a little overview of where I'm coming from and what I think about this stuff. I take the view that fault tolerance cannot be achieved by a single computer. So I'm not going to argue the
case about type systems, proving systems to be correct and all that kind of stuff; that's a completely different subject. I'm going to argue the case that computers can fail because they're hit by lightning or something like that. I was on a panel debate with Tony Hoare, and he said, "You haven't got a type system; you can't prove the correctness of this." I said, "Yeah, but your theorems aren't going to help you if the computer's hit by lightning." So I'm talking about systems that can be hit by lightning. I'm talking about how we build fault-tolerant systems from components that can fail.

My view of life is that we will never, ever be able to eliminate failure. Failure is always going to happen. We can never make a system consistent; there will always be inconsistencies in the system, all the time, and we've just got to live with this. It's going to be like cancer or something like that: parts of the system will not work, but it's still got to work to a certain extent despite the fact that individual components are failing.

So the first self-evident truth is that you cannot make a fault-tolerant system from a single computer. The reason is obvious: that computer might fail. If you've only got one computer and it fails, you're screwed; there's nothing you can do about that. Therefore you need several computers. Of course, if you've got two computers and they both fail, you're screwed. If you've got three computers and they all fail at the same time, you're screwed. But the probability of that goes down if they're independent.

That very statement, that you have to have several computers, implies an awful lot of things. It implies that your programs will be concurrent: because you've got several computers, the programs are executing at the same time, and that's what we call concurrency. The programs will execute in parallel. The programs you write are distributed programs. These programs obey the laws of physics: we have to take into account the amount of time it
takes for messages to pass between the machines. They also obey certain logical laws. For example, in a distributed system, if I send a message to a remote machine and I don't get an answer back, I do not know why. One of two things can have happened: the remote computer can have failed, or the communication network can have failed, and I will never know which unless I've got more evidence about what's going on. We cannot violate these physical laws; they are intrinsic to the problem. If we try to violate them and pay no regard to the laws of physics, we will build systems that just don't work.

As I said this morning in my keynote, remember that in a distributed system you never know how things are on remote nodes; you know how they were the last time they told you. If I ask you how much money is in your bank account and you tell me you've got $200, all I know is that the last time you told me, it was $200. I cannot assume that you've still got $200 a minute later, because you might have transferred some money out of that account. Provided we bear that in mind while we write our programs, we're not going to get into any logical conundrums or faults.

In such systems message passing is inevitable. It's inevitable because we've got two things, and how on earth can they talk to each other without sending messages? Message passing is actually the basis of object-oriented programming. Alan Kay said that the big story in object-oriented programming is the message passing. All these classes and methods, that's abstract data types; that's not the basis of object-oriented programming.

Programming languages should make this, and by this I mean concurrency, parallel programming and distributed programming, easy. I crossed out the word "easy" and replaced it with "doable". It's not easy, but it's doable.

Right, so how individual computers work is part of what I call the small problem. How computers which
are interconnected work together, and the protocols that they use, is what I call the big problem, and that's the interesting problem. I don't really care how the individual things work, but I do care how the system as a whole behaves.

Actually, I want the same way to program large systems as small systems. On a shared memory system you don't tend to program with message passing. But when I program a thing at planetary scale, the individual processes send messages to each other; there's no other way to do it. If I have an application with one node in Sweden and another in America, they are of course sending messages to each other. So if I program that way at large scale, why should I not program the same way at small scale, when everything is on the same processor? In fact, if you use shared memory programming and you try to scale out, you can't, because the programming model changes. I don't like that. I don't like two different ways of programming; I like one way of programming, based on message passing.

I was quoting from Alan Kay, who invented the term "object-oriented programming". In a message to the Squeak mailing list he said, "The big idea is messaging." And of course messaging in object-oriented languages has traditionally been done incorrectly. I would say Erlang is the only object-oriented language in the world, because it's the only one that does messaging correctly. Messaging in Smalltalk, for example, is not messaging at all; it's a disguised method call, and the calls are actually synchronous. That's not very good. The other inspiration was Tony Hoare's Communicating Sequential Processes.

So Erlang, which is a language I invented, is derived from Smalltalk and Prolog and influenced by ideas from CSP. It unifies ideas from concurrent programming with object-oriented programming and functional programming, and kind of unifies these things into a framework that makes it easy to understand, I think. And it follows the
laws of physics: it's got asynchronous message passing. I disagree with people like Tony Hoare when they say we can do synchronous message passing. How can you do synchronous message passing? Light does not travel infinitely fast; there is a propagation delay. The two systems become inconsistent while the message is in transit. After the message has got there, maybe they can be consistent, but even there there are a lot of theorems that say they can't.

Erlang was designed for programming fault-tolerant systems. And I should say now that I'm an applications guy. I invented Erlang to solve a problem. I didn't start by inventing a programming language and then find a problem to solve with it; it was the other way around. I had a problem to solve, which was to make a fault-tolerant system for Ericsson, so I invented this language called Erlang, and it kind of spread. It controls about half the world's networks. WhatsApp is written in Erlang. A lot of things you don't know about are actually written in Erlang, and in all the Ericsson networks the smartphone data setup is all programmed in Erlang. So it kind of works, and you can program all sorts of things in it.

Building fault-tolerant software boils down to detecting errors and doing something when you get an error. But of course you don't want to do it locally. Remember, I said the whole processor might crash, so really when you detect an error you want to fix it somewhere else. It's like if I have a heart attack and fall down here: I can't run out and get a defibrillator and try to restart my heart. Somebody in the front row has got to do that for me. We can't fix our own errors when we're fatally ill; something else has got to help us. This idea is really fundamental to Erlang.

What about errors? Well, there are errors we can detect at compile time; that's great, that's really nice. There are errors we can detect at runtime;
that's nice. There are errors we can infer from the behavior of the system: we can't actually detect them, but there's an invariant we've broken. You know, this thing should always be green and it turns out to be red, so we know something has gone wrong. There are reproducible errors; it's really nice if you've got reproducible errors. And there are non-reproducible errors; these are heisenbugs and things like that, which show up and never occur again. What matters, in a sense, is that they're non-reproducible. So what do we do? Nothing; you can't reproduce them. The important thing is that the system itself repairs itself and carries on working. It's very nice if we can reproduce an error and fix it, but it's more important that the system doesn't grind to a halt and stop. We're trying to make systems that, once started, will run forever and will never be stopped.

So the philosophy is: find methods to prove software correct at compile time if we can. That's what all the type people do; you know, prove the program to be correct. You can program in Rust and things like that, and maybe you can prove it to be correct. But also assume that the software is incorrect and inconsistent, and that these errors will manifest themselves at runtime. So we have to do something about errors in software that do manifest themselves. And there's a lot of evidence for this: software failure is all around us every day. We have built systems that are so bloody complicated that nobody understands how they work, and they fail all the time and nobody understands why. They've become like biological systems, and the best we can hope is that once the failure has been detected we can reboot the systems and they'll start again. So proving the self-consistency of small programs will not help.

Okay, so what do tests do? Tests prove that the system is self-consistent. Do they prove that the system is right? No. I have often found programs where the test cases are incorrect and the program is also incorrect, because it agreed with the test cases. Basically, a type system is proving that there are no internal inconsistencies in the system itself. It's not proving that the program solves any real problem; that's completely unrelated. And proving things is difficult. I mentioned this this morning, so I'll hop over that.

Some small things can be proved to be self-consistent, and of course if we can do that, it is good; I'm not against doing that. But large assemblies of small things behave in extremely complicated ways that we do not understand. The best thing we can do is put down invariants, check that the invariants are true, and if we violate those invariants, just give up and start again.

So I have a timeline of my involvement in all this kind of stuff, and it goes back to 1980, when I first started looking at fault-tolerant systems, because I got a job programming Viking, which was Sweden's first satellite. It's the first time I learned about fault tolerance. I asked the guys, "So what happens if there are any faults in the software?" Well, the mission's over. We can't change it; it's a satellite, it's flying around the earth. If the software is wrong, you can't do anything.

In 1985 I moved to Ericsson and started working on something that was a replacement for PLEX. PLEX was a programming language that Ericsson programmed all its products in. PLEX was based on several ideas. It was a message-passing programming language, actually object-oriented, running on object-oriented hardware; it was one of the first true object-oriented hardwares. The thing about all of that was that PLEX was a proprietary language which never escaped outside Ericsson, and the hardware it executed on was proprietary hardware which never escaped outside the telecoms world. But it was highly reliable. It had duplicated processors, and it was specified, when I joined in 1985, to have a downtime of three minutes per year, with a penalty clause of something like $50,000 if you were down for more than three
minutes a year. So we were doing that back in the 80s, and that was commonplace.

In 1986 I'd kind of unified object-oriented programming with functional programming. I didn't actually realize I'd done that at the time. Somebody pointed this out a few years later and I thought, did I? Yes, I suppose I did. And we built several products in Erlang inside Ericsson.

In 1998 a fun thing happened: Erlang got banned. That was a bit of a worry at the time; we were annoyed about that. But a consequence was that we managed to persuade the Ericsson management: well, if you don't want to use it, if you're going to ban it, we might as well make it open source. So the reason Erlang became open source was that it had been banned. And by a strange coincidence, four days after it was made open source, all of the computer science lab quit the company, which is just one of those coincidences that people will never be able to explain.

So we started Bluetail, this company, and that lasted for about three years and was bought up by Alteon WebSystems for $150,000,000, and I moved to Alteon WebSystems. Alteon WebSystems was itself bought up six days after they had bought us; it was bought by Nortel Networks. This was at the height of the telecoms boom, just before the crash. Nortel Networks was a company that was buying up little companies like crazy. I mean, it bought Alteon WebSystems for eight billion dollars, which is absolutely ludicrous. And then Nortel Networks withdrew their financial statements, and then Nortel Networks crashed, causing the telecoms crisis, and I got fired, which was quite fun. I moved to SICS, the Swedish Institute of Computer Science, and I got a PhD, and I went back to Ericsson.

During this time several things happened. In the beginning, this Erlang model of computation, this no-shared-memory idea of building systems, was
generally rejected. People said it won't be efficient enough; it was mainly efficiency: we can't build systems like that. At the end of that period, this Erlang model of computation was widely adopted and accepted in many different programming languages. It's the basis of Scala's actor model; the things in Akka are based directly on the OTP stuff. These all come from the original Erlang work, and the derivations are there.

So here was Viking, the first thing. Lovely little satellite, beautiful.

Right, so types of systems. Fault tolerance has different interpretations; I mean, a different amount of engineering goes into it. Are we going to build highly reliable systems? I mean the systems that control nuclear power plants, the weapons control systems for nuclear weapons, things like that; they're highly reliable. Then we've got reliable systems, maybe driverless cars and things. It doesn't matter so much if they kill a few people; they're not going to kill billions or millions of people if they go wrong. I mean, the distinction is: how many million people do we kill if it goes wrong? Or how many billions of dollars does it take to replace the satellite? What does a satellite launch cost, 500 million dollars or something like that? There's a lot of engineering in it. So it's the cost, either in human life or in replacement, that determines the amount of engineering you put into it. Basically, we can make very reliable systems if we put a lot of engineering into them. Then we've got driverless cars: they kill a few people, nobody cares. Well, apart from the lawyers, and the lawyers will have fun. Then we've got the kind-of-reliable things; it annoys people if they fail. And we've got dodgy things, where we just shrug our shoulders: you know, stuff failing all the time. If I'm streaming music from Spotify and I listen to music for an hour and it just stops for 30 seconds, I don't really notice. I don't pick up the phone and scream that it's awful. Do I scream at Spotify, or do I scream
at my ISP? Nobody knows. Well, nobody knows because it's not instrumented so that we can know. If we were to instrument every step, we could say exactly which component had failed. In fact, what I would see as the next generation of computing, which you guys are going to build, is that you instrument all these systems as a legal requirement, so that when stuff doesn't work we not only know why it doesn't work, we know exactly who's to blame. Because if you connect 20 things together and they all have to work for it to work, and something goes wrong, then if we don't know which of those 20 things has gone wrong, we don't know who to blame. And that's the basis of a contract.

So we need to get this idea of reliability of service into consumer products, because what we've got today are systems which, if they work, are wonderful. My TV is much better than the TV we had in the 60s or 70s, the analog TVs, but the reliability is far worse. If it works, it's great; if it doesn't work, nobody knows why and we reboot it. That's not good. As computer scientists, computer people, we want to fix that. That's the next big thing we need to fix.

How can we make software that works reasonably well even if there are errors in the software? The answer is: read my thesis. So that's quite easy, really. To save you reading my thesis, there are six rules, six requirements on the basic properties that the system has to have. It has to support concurrency, because things are happening at the same time. It has to encapsulate errors: we don't want errors propagating around the system, and we don't want the fact that one system has crashed to crash another system. We need to detect the fault, and we need to identify what it is. We need to change the code as the system is running. And we need stable storage, so that if the whole thing crashes, when we reboot it we can go back into the stable storage and see why it crashed.

So there's a method: detect all errors and just crash immediately. If you can't do something,
try to do something simpler. So we build computations into trees, hierarchies, where at the highest level, if the system is working perfectly, it will do exactly what it's supposed to; it'll do all the things it should do. If something goes wrong, we drop that requirement of making everything work, relax our requirements, and make it do something slightly simpler. If that doesn't work, we make it do something even simpler, and so on. Hopefully the system will still work, maybe with reduced functionality, but it won't fall over completely. So we build trees of computations.

And we should identify the error kernel. The error kernel of a system is that part which must be correct. All the other code can be incorrect; it doesn't matter. But the error kernel must be correct; if it's incorrect, all bets are off. The error kernel in Erlang is pretty small. It's like 200 lines of code, that's it. It's in ring zero. I wrote it years ago and I don't think it's ever been changed. You just make sure it's right. I mean, it's so simple it must be right. Famous last words.

And we build things into supervision trees. Basically, if the stuff at the bottom fails, the things higher up will trap those errors and do something. This is built into things like Akka, which has the same supervision trees, and into the Erlang system itself.

Okay, so all this stuff works. It's used by Ericsson for smartphone data setup. It's used by WhatsApp. It's used by CERN: CERN found the Higgs boson using a database, CouchDB. Cisco's NETCONF is run by Erlang. This one, called Spine 2, is the National Health Service's database in Britain; they chucked out Oracle and replaced it with an open-source database written in Erlang, saving lots of money. Things like RabbitMQ are written in Erlang.

Right, so what is an error? How do we discover
an error? What do we do if we get an error?

So what is an error? Well, it's an undesirable property of a program. It's something that crashes the program, a value that's incorrect, a deviation between your desired and observed behavior. Who finds the error? The program can find the error. The programmer can find the error when they're writing their code. The compiler can find the error. What should you do when the runtime system finds an error? Arithmetic errors we can find: divide by zero, not a number; pointers that are incorrect; values that are nonsense; switch statements where no case applies. But what do we do in the code?

So what should we do when the runtime finds an error? You could ignore it; that's not a good idea. You could try to fix it. All right, I changed the slides at some point. Right, yes, I'm on the right one now. The problem with trying to fix errors is that you make matters worse; you don't make matters better. Don't make matters worse: that's the rule. Don't try to fix stuff. Crash immediately, and assume in the architecture that it will be somebody else's responsibility to fix it. That's why we build a hierarchical architecture. Our assumption is that our software is correct. If it's not correct, it's going to crash, and somebody in the supervision tree that's higher up will have the job of fixing that error. It's not our responsibility to correct it, so we crash immediately.

This is basically rebooting the system. When a system fails we reboot it, but when we're doing it in Erlang that reboot takes a microsecond. It's not taking two or three seconds, or 30 seconds, or minutes, as in a normal operating system. Rebooting the system means detecting that a process has failed and restarting it, and that will happen at sub-microsecond speeds, so it's pretty fast.

Right, so what should the programmer do? What should a programmer do when they don't know what to do? Ignore it?
No, they shouldn't. Log it? Yeah, right: write it in a log file. That's very good. And crash immediately after you've written in the log file. And where's the log file, if you've been paying attention? It's in stable storage. Very good, thank you; you've been listening.

Right, so in sequential programming languages, crashing the single thread is not widely practiced, because if you've only got one thread and it crashes, you've lost everything. Bad news.

So what's the big deal about concurrency, since I keep rabbiting on about concurrency? There are thousands of programming languages and they're all sequential. And there are three programming languages in practical use that are parallel: there's Erlang; there's Elixir, which is basically just Erlang with a different name and a different front end; and there's Pony. Pony's pretty cool. And then there's a load of academic languages that nobody ever uses. Sorry.

Okay, so this is a sequential program: one process. This is a dead sequential program, a sequential program that's crashed. You've got nothing. This is a parallel program. Now watch carefully: this is a parallel program when one of these processes crashes. Wait a moment. Did you see it? One of those processes... if I go back, you can see it, look, there. But when I go forward, you don't notice. There you go, that's the one that crashed.

So Erlang has this concept of linked processes. What's a link? Links are drawn as lines when you show them graphically. What does that mean? A link means that if this process dies, these other processes are going to get sent a message telling them that the process has died. So you can imagine that if the red process fails, the blue processes are going to get error messages. That's it. The link mechanism was invented by Mike Williams; I didn't invent that.

So why concurrent? Well, I said that fault tolerance is impossible on one computer. And scalable, and then this
is kind of funny. Okay, so we just said fault tolerance is impossible on one computer. And, well, scalability is impossible on one computer. Well, that's sort of partially true: you can scale up a little bit, until you've filled up that computer. Once you reach the boundaries of that computer you can't scale up any more; you've got to move to two or three or four, five, six, seven, ten computers. So scalability is impossible on one computer.

And something else: security is very... "impossible" is too strong; security is very difficult on one computer. If somebody gets a virus into that computer, they don't get a virus into this computer. The reason is that they are isolated. So we can make reliable systems by isolating the components. I've got a box from my bank that I stick a card in. I go to the website and it sends a challenge code. I type the challenge code into my little box with the card stuck in it, it gives me another number, and I type that in. Guess what: for that little box the bank gave me, they have never sent out a software fix. Unlike Apple, unlike Google, they do not have to send me bug fixes for security every ten days because the last one was wrong. So why is this software reliable? Probably just because it's a tiny little thing and nobody can talk to it. It's not connected to the Internet. And how does it communicate with my other computer? Through an out-of-band method: through my eyes and my fingers. A little number comes up on the screen and I type it into my computer. That's actually quite difficult to hack. That needs a drone; you know, you'd have to make tiny little drones about the size of a mosquito that sit on my shoulder and watch what I'm doing, and then you could hack it. But it's difficult. So security is very difficult on one computer.

And right, I actually want the same way to program. I don't want two different ways to program. I don't want one way to program local systems and another way to program distributed systems. I just want
one way to do it all. So this is my case for concurrency, because you've got all that. And I should say, well, you know, the world is concurrent. We are a room with a hundred people in it or something, and we're all separate. We're not one sort of joined brain, a sort of Siamese twin with a hundred brains connected together. We send messages; the message is a sound. We find out whether the message was received or not by asking. Did you hear my message? What's two plus two? Yeah, you see, now I know he got the message, because I sent him a challenge and he replied quickly. Good.

So we can detect errors. How do we detect errors? Oh, in all sorts of ways. How are we doing for time? When do I have to finish? Oh, including questions. Right, shall I rabbit on for ten minutes with no questions, or more like five minutes? We'll see.

I'm just going to talk a little bit about arithmetic. There are silent and deadly errors: errors where the program does not crash but delivers an incorrect result. These are horrible. And there are noisy errors: errors which cause the program to crash. We love noisy errors; we hate silent and deadly errors. Silent errors: things called quiet not-a-numbers. Oh, good grief, these are fantastic things. Have you ever seen these?

Wait a moment. I've got a movie. Let's see what happens; it's supposed to play the movie, isn't it? Yes, something's happening, but there's no sound. Oh, the HDMI turned the sound off. This is a rocket. Guess what's going to happen. Okay, I've only got five minutes. Oh, it's better with the sound. It was an error in an Ada program. The red line is the error. It crashes.

Silent programming errors: why are they silent? Because the program doesn't know there's an error. So I'd point out that arithmetic is a nice source of silent errors. I came across this example from Rump, which is very nice. This is a formula: compute 333.75
times y to the power six, plus some stuff. And Rump, this guy called Rump, that's the name of the author, wrote a paper about this and showed that if you're doing it in 32-bit precision you get 1.172, and if you do it in 64-bit precision you get that, and if you do it in 128-bit precision you get this. The correct answer is negative. So basically, more precision didn't help.

So to ask you the question here: if you were an application programmer, programming in your favorite language, it might be Erlang, it might be Java, it might be Haskell, it might be anything, and you're given a formula like that, probably you'd just sit down and write, you know, x equals 333.75 times y to the power 6 and so on, and you'd just assume that it computed the right result. You'd be using floating point. And if the answer matters, if it's a health care system, if it's a nuclear power plant or something like that, you'd better not use floating point; if you do, you'd better prove that it's right. Floating point is really, really difficult. For a very good lecture, look up John Gustafson; he's got some stuff on YouTube. Very entertaining. Yeah, this is a very good lecture, thoroughly recommended.

So arithmetic is very difficult to get right. Getting the same answer in single precision and double precision does not mean that the answer is right. Goodness, no. And if it matters, you must prove everything to be right.

So what do most programmers believe? We should have a quiz; I should have shown this on two slides. How many people believe that a + (b + c) is the same as (a + b) + c? Well, I'm showing you the answer: they're different. It depends where you put the parentheses; this has got rounding errors. Most programmers think that these deliver the same result. In most programming languages, no, they don't, and compiler writers know they don't.

What about value errors? The program does not crash, but the values computed are incorrect or inaccurate.
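Two of the arithmetic claims above can be checked in a few lines. What follows is a minimal sketch in Python, not something shown in the talk. The first part compares the two groupings of a three-term sum. The second evaluates Rump's expression once in ordinary double precision and once in exact rational arithmetic. The talk only quotes the first term of the formula, so the full expression and the inputs 77617 and 33096 are taken from the standard published statement of Rump's example; the helper name `rump` and the variable names are assumptions made for illustration.

```python
from fractions import Fraction

# Part 1: floating-point addition is not associative.
# The grouping decides where rounding happens.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False on IEEE 754 doubles


# Part 2: Rump's example (hypothetical helper, standard published inputs).
def rump(num):
    """Evaluate Rump's expression using the number type `num`.

    `num` is a constructor such as float or Fraction, so the same
    expression can be evaluated approximately or exactly.
    """
    x, y = num(77617), num(33096)
    c1 = num(1335) / num(4)  # 333.75, exact in both number types
    c2 = num(11) / num(2)    # 5.5, exact in both number types
    return (
        c1 * y**6
        + x**2 * (11 * x**2 * y**2 - y**6 - 121 * y**4 - 2)
        + c2 * y**8
        + x / (2 * y)
    )


approx = rump(float)     # ordinary double precision
exact = rump(Fraction)   # exact rational arithmetic, no rounding at all

print("double precision:", approx)
print("exact:           ", exact, "=", float(exact))
```

The exact value is the negative fraction -54767/66192, roughly -0.827. The double-precision result is nowhere near it, and nothing crashes along the way, which is exactly the silent kind of error being described.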
How do we know it's correct when it's got numerical calculations in? Well, if you stick to integers you're pretty okay, but if you're outside integers it's pretty difficult. Does it matter? Well, it depends on what the application's being used for. Not-a-number is pretty okay. Now I'm looking for slides. You know, [unclear], somebody sent me that, I thought that was nice. Right, yeah: the programmer doesn't know what to do, just crash. I call this the let-it-crash philosophy. And what do you do when you receive an error? Maintain an invariant and try and do something else. And we've got longer? Okay. So is that all? Guess not. No, no, it's not. Okay.

So what's the message? What's in a message? Okay, so when we build systems, we build them out of black boxes which send messages to each other. Now what's inside a black box is a program, and it's written in a programming language, and there are thousands of frigging programming languages. What's important is not the language they're written in; what's important is the behavior at the interfaces. Okay, so Tony Hoare introduced this idea of observational equivalence: two systems are equivalent if they are observationally equivalent. That is to say, if we observe the same patterns of inputs and outputs in two black boxes, we can assume them to be the same. That's called observational equivalence, and it doesn't matter what language we write the inside of a black box in, but it does matter what comes in and what goes out. So why are we all looking at what's going on inside the black boxes? This is completely irrelevant. What we need to do is pin down what's happening at the boundaries, the boundaries of the program.

Right, so the interaction between components involves message passing, and there are very few ways to describe the messages: JSON, XML, ASN.1, things like that, and more recently things called session types. Now quite often we just describe the format of the messages, but we say nothing about the sequencing of such messages. When we write APIs, I
mean, the API for a file system, we say: when we open a file we get a handle, when we close the file we destroy the handle, when we read a file we give it a handle and we get data back. The type system does not tell us that the following program is illegal: open a file, close the file, read the file. The type system does not tell us that. So we need a way to describe the protocols. Protocols are contracts, and contracts assign blame. And we need an architecture where we don't just have a client and a server; we have something that describes what we see on the wire, which I call a contract. And I think the correct architecture for any system has three things in it: it has a client and a server and the contract checker in the middle. This will incidentally solve my first problem of assigning blame, because when we connect lots of things together with contract checkers, the contract checker will say what went wrong and who went wrong. So the checker is a little cat-and-mouse thing: it's watching the protocol, assigning blame when things go wrong.

So the question I would leave open at the end of this talk, and this is my last slide, is: how do we describe contracts? This is something you can all be thinking about, and this is going to take another 10 to 20 years to get sorted out. Thank you very much.

And we've got minus two minutes for questions. We've got three minutes for questions. Yes, that's okay, don't worry. Well, we don't need that anymore. We've got three questions from the audience in the app. Three questions? Three questions, they came in at the last minute there. Right, the first one is: how about the idea of an immune system, the ability for a system to detect internal errors and at least attempt to self-heal? Yeah, well, that's what this is doing. It is. I mean, maybe we can build it, we can program an immune system; I mean, that's what we do. Yeah, very good. Yes, so the questions are going very quickly. The next one is: how to identify the error kernel? I'm not sure. Yeah, this is a tricky question that
you have to think about. You know, it is very [unclear]: if that were to crash, what would happen? Is it fatal? Get it down to a very small part. Basically the error kernel must be that part of the system that's responsible for logging errors and starting and restarting things. That's the simplest thing. I mean, if you can't detect errors, log them and restart things that have failed, you're screwed. So the error kernel is going to do things like that. And also, when you make a diverse programming team, it's a good thing to put your most experienced programmers on writing the error kernel, not those guys who've just walked in off the street and don't know what the language is. So you structure your programming team such that your most experienced programmers do the trickiest part, and you try and make it so that there is a small amount of error kernel. I mean, the error kernel in the AXD system was like 0.2 percent of all the code; it's very, very small. There were thousands of modules that could have errors in, and two or three modules which, if they went wrong, we were screwed.

Got more questions coming in. Hold your horses, people. Distinction between an error and a failure? Pass. Pass. Kubernetes enforces a lot of these practices; what is it missing? Sorry? Kubernetes enforces... Shall we pass on this one too? If you don't know Kubernetes... I don't know anything. Okay, fair enough. And, oh, that was posted four times by somebody. Right, click once. Yeah, I can give four different answers.

So that was the end of the questions. I suppose one question that I'm thinking about myself is, with you talking about physics, and yes, physical systems, I guess causality is a big one. Yes. So do the systems here enforce a particular order in the asynchronous messaging? Is that an essential part of it as well? Absolutely. Okay. Yes, I mean, you can't do stuff until you know stuff. I've got to have got a message from you. We're driven by message passing, basically. I mean, I don't know stuff about the
world until I've got a message. It's, of course, causality. Yes. I can't assume anything until I've got a message; that's causality. Right, okay.

So, another one coming in, but we could also turn it to people in the audience, the less shy ones amongst you. But there's another one in the app, which is: with respect to contracts, would you consider protobufs and the use of RPCs sufficient? No. On to the next question. I mean, it's sort of going in the right direction, but it's not adequate. I suppose the follow-on question there: what is it missing? A language for specifying the sequencing of messages. Let me see, is there a system... this is called session types in the theory, and I wrote a dynamic session type checker years ago, but it's in my PhD thesis. It's in his PhD thesis. Yeah, it's very readable, you know, there's not much Greek in it. I wrote a thesis without much Greek in it.

Okay, so the last question that came in here is: is there a system today, not in telecommunications, which has the properties you covered? I don't know. There might be. I mean, it's possible there are. But I'm in high-frequency trading myself; there are a couple of... I know. Sorry, I didn't tell you that before. I did physics, mind you. Oh, totally good. Yeah, redeeming factor maybe. Thank you very much, Joe. [Applause]
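The file-system example from the talk (open a file, close it, then read it) and the idea of a contract checker sitting between client and server can be sketched in Python as a small state machine over the messages seen on the wire. This is a minimal sketch under stated assumptions, not the session-type checker mentioned in the Q&A. The state names, message names, and the `CONTRACT` and `check` identifiers are all hypothetical, and blame is simply assigned to whoever sent the first out-of-sequence message.

```python
# Hypothetical contract for a toy file protocol:
# (current state, sender, message) -> next state.
CONTRACT = {
    ("closed", "client", "open"): "opening",
    ("opening", "server", "handle"): "open",
    ("open", "client", "read"): "reading",
    ("reading", "server", "data"): "open",
    ("open", "client", "close"): "closed",
}


def check(trace, start="closed"):
    """Replay a list of (sender, message) pairs against CONTRACT.

    Returns None if every message is allowed in sequence, otherwise a
    dict describing the first violation and which party is blamed.
    """
    state = start
    for index, (sender, message) in enumerate(trace):
        next_state = CONTRACT.get((state, sender, message))
        if next_state is None:
            return {"index": index, "blame": sender,
                    "message": message, "state": state}
        state = next_state
    return None


# A well-behaved session.
good = [("client", "open"), ("server", "handle"),
        ("client", "read"), ("server", "data"), ("client", "close")]

# Open, close, then read: each message is well-formed, the sequence is not.
bad_client = [("client", "open"), ("server", "handle"),
              ("client", "close"), ("client", "read")]

# The server answers an open request with data instead of a handle.
bad_server = [("client", "open"), ("server", "data")]

for name, trace in [("good", good), ("bad_client", bad_client),
                    ("bad_server", bad_server)]:
    print(name, "->", check(trace))
```

Describing only the message formats would accept all three traces. Recording the allowed sequences is what lets the checker reject the last two and name the party that broke the protocol.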
Info
Channel: GOTO Conferences
Views: 48,642
Keywords: GOTO, GOTOcon, GOTO Conference, GOTO (Software Conference), Videos for Developers, Computer Science, GOTOchgo, GOTO Chicago, Joe Armstrong, Error Handling, Fault Tolerant, Fault Tolerance, Programming Languages, Inventor of Erlang, Erlang
Id: TTM_b7EJg5E
Length: 45min 31sec (2731 seconds)
Published: Fri Nov 02 2018