Gig City Elixir 2019 Speaker Talks: Anna Neyzburg: Go vs. Elixir

Video Statistics and Information

Captions
[Music] So here we have Anna from Carbon Five. You might remember her from last year if you were here; she was so good we decided we had to ask her back. She's one of the foremost evangelists for Elixir, running ElixirBridge among other things, with a great curriculum there. And if you don't recognize her, you might recognize her voice, because she's also one of the co-hosts of Elixir Outlaws. So give her a round of applause, everybody.

You need a microphone? I'm gonna need that. Thanks everyone, how's it going so far? Wait, a little bit better than that, come on, how's it going so far? So yeah, thanks Brett. My name is Anna, I'm a developer at a consultancy called Carbon Five. We build custom software for all kinds of companies, everything from super early-stage startups through enterprise. We're really a full-service agency, we have product design and development in-house, and we have an office in this building, one in San Francisco, New York, and Santa Monica. If anybody's curious, come talk to me; I think there are some other Carbon Fivers in the audience, so you could find them too.

That's not what I'm talking about today, though. What I'm talking about is looking at Go versus Elixir, and specifically looking at the concurrency paradigms of both languages. I've actually given a longer version of this talk with my coworker Hannah, and the inspiration for it really came from conversations we were having with each other and with other people. We were talking about applications and dealing with concurrency, and oftentimes people would throw out Go and Elixir in those conversations, but they're not the same, right? As with everything, we know that everything has trade-offs. And really, when we're comparing the concurrency paradigms in Go and Elixir, when talking about Erlang and Elixir we're talking about the actor model, and when talking about Go we're talking about this thing called CSP, or communicating sequential processes.

But before we actually dive in and look at both, and what's the same and what's different, I want to take a step back and think about how we might define concurrency, and before that, where did this all really start? We know that in the beginning computers didn't really have operating systems; they were designed to run one program from beginning to end, and that program had access to all of the machine's resources. Thankfully, over time that's changed. On multiprocessor systems multiple processes can be executed in parallel, and I feel like at least once a week, probably every day, my fan starts burning on my machine with all these things running, and I'm like, why are all of these things running? But how is this possible? The interesting thing is it's not, really; we fake it. We know that at any given instant in time, if you have a machine with a single CPU, only one process can be running in that instant, but the executable code for several processes can be loaded into memory, and the processor essentially switches rapidly between jobs to give the illusion that things are running in parallel.

At the risk of oversimplifying, we know that a process can really only have one of three states: it's running, meaning the CPU is executing statements in that process; it's ready, so it's loaded into memory but not yet running; or it's blocked. Another thing to set the stage: we also have these things called threads, and really the only thing you want to keep in mind about them is that they're
subsets of a process, and they share memory.

So how is this all relevant, why do we care, and why are we starting here? Well, we now have these more robust machines and we can do more with them, so how do we leverage that in our applications? This brings me back to how we want to think about, and how we talk about, concurrency. I did not come up with this analogy, I stole it, or I'm borrowing it, from somebody who's much smarter than me on the internet, but I liked it so I want to share it with you. Let's imagine you had five people. They all went to IKEA, they liked the same bed, they each buy that bed, they bring it home, and they build it, and at some point later in the future you have five beds. Now think about what the instruction set for building those beds would look like. To me it seems pretty straightforward: they would each get the same set of instructions. Okay, well, what if you had five people and they all went to IKEA and they bought one bed, then they came home and started building it together? What do the instructions look like so that each person can make progress on their task without actually blocking anybody else, and at some point in the future you have a fully functioning bed? I don't know about you, but to me that sounds a lot more complex.

So when we're talking about concurrency, we can define it as the composition of independently executing processes, while parallelism is the simultaneous execution of computations. Now, if that sounds like a lot of words, which it kind of does, what we're really saying is: parallelism is about executing many things at once, and its focus is execution; concurrency is about dealing with many things at once, and its focus is structure. That requires coordination, and this is really where we introduce complexity. How do we coordinate between these tasks working together?

I'm gonna shift gears a little bit and talk about how we've traditionally dealt with concurrency: how do we organize work between multiple, potentially parallel tasks that are working together? We have these processors that can get more done, so how do we take advantage of that? Well, we know the problem with having no coordination and just relying on parallelism is that it actually doesn't get things done faster. Is anybody familiar with this? We talked about concurrency, we talked about keeping things unblocked, moving faster, potentially in parallel, and as we see here in this NCIS image, two sets of hands typing on one keyboard actually doesn't type faster, even though we want it to.

So how do you coordinate? Well, the original solution is to use this concept of threads. Threads were this built-in concept in operating systems: they're independently executing subsets of processes, and they share memory, and because they share memory they actually share data. So the strategy that's traditionally been used is to communicate by mutating these shared data sets. The metaphor I like to think of is, let's say you have a painting hanging in a room, and you have two people working on that painting. One person comes in, does some work, and then leaves; then the other person comes in, sees the work that the first person did, does some more work, and then leaves; and then the first person comes back. They never actually directly communicate with each other, they only see the changes that the other person has made. The problem with this is that it's actually really easy to overwrite each other's work. So to prevent that, we introduce these kinds of
locking mechanisms that we see in a lot of traditional programming languages, like mutexes or semaphores, so we can serialize access to that data. And threads really are the original basis for writing concurrent programs, because they're already built into operating systems; you kind of get this communication through shared data for free, aside from using a couple of locking mechanisms. This is why a majority of traditional programming languages support threads: they provide the basic mechanisms for communicating with this data.

But as you can imagine, initially it sounds easy and it sounds simple, we can coordinate all this work and it's awesome, but very quickly it becomes problematic; we very quickly run into some challenges. One, we're dealing with lots of this shared mutable state, and I think I'm at the right conference where I don't have to convince people that constantly dealing with large amounts of shared mutable state is not a good idea. And because of that, we have to rely on the programmer to remember to lock access, to serialize and block access to this data so they can do some work, and then remember to unlock it. But what happens if they forget? Anybody else trying to access that data can't, because the data is locked. So not only do you have unpredictable problems with managing the expectations of programmers, like remembering to do the right thing, but as you introduce more threads and introduce more state, the complexity increases exponentially rather than linearly.
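A minimal Go sketch of the traditional shared-memory style just described, with two goroutines mutating one shared counter and a mutex serializing access (illustrative only, not code from the talk):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var (
		mu      sync.Mutex // guards counter
		counter int        // shared mutable state
		wg      sync.WaitGroup
	)

	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				mu.Lock() // forget this (or the Unlock) and things go wrong
				counter++
				mu.Unlock()
			}
		}()
	}

	wg.Wait()
	fmt.Println("counter:", counter) // 2000 only because access was serialized
}
```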
So as we start digging into Go and Elixir, we see that the concurrency models they're leveraging are approaches to try to deal with these problems in a different way. First we're going to talk about the actor model. It's probably very familiar to a lot of you who are familiar with Elixir and/or Erlang. How many of you are new to Elixir? No? Really? Cool, that's awesome. So for some of you this will hopefully be helpful; for a lot of you this might be a review. It's going to set the stage for the conversation we're gonna have in relation to Go.

So, the actor model. Fun fact: it was invented by Carl Hewitt in 1973, and Erlang was created in 1986. I read this and I think it's true, but tell me if I'm wrong: supposedly the creators of Erlang did not actually know about the actor model when they wrote Erlang, which I think really speaks to the promise of the design pattern. When we talk about the actor model, we're talking about a conceptual model for dealing with concurrent computation; we're defining some general rules for how the system's components should behave and interact with each other.

So this is an example of what it might look like. What is going on here? Each of these circles has a mailbox and a little box that says internal state, and each of these is called an actor, or in Elixir and Erlang land, a process. These are not operating system processes; they're super lightweight. They communicate by sending messages to each other, and they never share memory, which means they never directly share data, so each of these processes maintains an internal state that can't be directly accessed by another process. And I kind of like this analogy when we're talking about the actor model: one ant is no ant, one ant cannot survive alone. Same idea for actors, they come in systems. In this model everything is an actor, and each actor, or in our case process, has a unique ID so they can communicate with each other, and they communicate by sending messages; imagine sending an email.

If we take a closer look at the sending of messages: we mentioned that each process has a unique ID, the messages are sent asynchronously, and they're stored in the mailbox. When a process receives a message, it can do one of three things: it can spin up more processes, it can send a message, or it can update its internal state based on the message it's received, and that state will then be acted upon by the next message it receives. So the things we want to remember about processes, at least for the context of this talk: they're not operating system processes, they're lightweight, they do not share memory, and they have a unique ID.

When we were putting together this talk, the interesting thing that came to mind, especially in the context of Erlang, is that in this case the actor model is physically based. And what do I mean by that? Who still has a landline? Anybody? Does anybody still use their landline? I love it. And when was the last time your call dropped? Do you remember? Like never, exactly. So an interesting thing we were thinking about is that this language was built to model the real-world constraints of physical systems over long distances. So when we think about the functionality we can leverage in the Elixir or Erlang ecosystem, it was necessitated by the problems these language creators were trying to solve, an example of this being distributed Elixir. And I'm gonna qualify this a little bit, because things are inherently more complicated at scale, and the tools we have at our disposal in this ecosystem do not solve all of the problems presented by distributed systems, but they do make it a little bit easier to get started. For example, each process has a unique ID, so a process can communicate directly with another process, whether it's on the same node or a node halfway around the world. And given that a node can go down at any time, we would not want the sending or receiving of a message to be blocking in any way, so it makes sense that the messages are also not time-bound in any sense.

And we've all heard this phrase within this community: let it crash. We are allowing ourselves to essentially build, in this Elixir and Erlang ecosystem, these kinds of self-healing systems; we have processes that watch other processes. And this goes back to what I was talking about with regards to coordination: we're building these self-healing systems, and they allow us to better coordinate our work, so if something goes down the work still keeps moving forward, and theoretically we build systems that don't die. We try to, at least. But none of this runs in isolation; these processes don't run in a vacuum. We know they run on the BEAM, the virtual machine, and the one point we really want to take away, at least for the context of this talk, is this idea of preemptive scheduling. We have this scheduler that essentially coordinates all these processes and all this work, and it can interrupt a task and allow another task to run without any permission from that running process. Again, this is another pattern that lets us take advantage of coordination and run our applications more efficiently.

And if we think about
the priorities of the language, it kind of helps us understand some of the choices that have been made. These are probably really Erlang priorities, not just Elixir's: the fact that they needed it to be scalable, taking on lots of load; the fact that they had these systems distributed over long distances, so they needed to be fault tolerant; and fast.

So let's switch gears for a minute and take a look at the other language we've been talking about: Go, and CSP. As I mentioned, Go's concurrency model is based on this thing called communicating sequential processes. It's a description for coordinating work among independent processes, and it was first coined by this guy Tony Hoare in 1978. Apparently the 70s were a really good decade for distributed systems papers. Again, in the context of CSP, a process just refers to the smallest independent unit of work, essentially something that has a start and an end. In this context I kind of like to think about it as a sprinter: someone who has a defined start, runs for as long as they can, and then stops. And similar to what we talked about with the actor model, most implementations of CSP implement their version of a process on top of the operating system, so these are not operating system processes.

So how do we coordinate work among these types of processes? Well, we're gonna go on with this sprinter analogy and imagine we have a relay race: sprinters running, and they have the baton and hand it off to the next person, and the next sprinter waits until they actually get the baton before they take off. The baton in our case is called a channel, and it's the basic mechanism through which processes communicate and tell a process to start doing its work. One important difference from traditional relay races is that processes actually don't know about each other in CSP; they only know about channels. So you can imagine the sprinter hands off the baton, but they actually have no idea who they are handing the baton to; they just hand it off to the first person who's there. Not only that, but the handing off of the baton, the passing of it, is actually synchronous. Imagine that I'm running in this relay race and I have a baton and I get to the end and there's nobody there. I'm stuck; I have to wait until someone's there to actually take the baton off my hands. So if I write something to a channel, I have to wait for somebody to read it before I can move on and do something else.

The last concept that's essential to the CSP model we're talking about is this idea of choice. We're now approaching a pretty theoretical relay race, but stay with me. Imagine you're running and you're waiting for a baton, but you have a runner that's coming with a gold baton and a runner that's coming with a silver baton, and you have no idea which baton you're gonna get, but depending on the baton you're gonna go in a completely different direction. And if they both come at the same time, you're kind of gonna pick one at random.
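In Go, this "choice" idea corresponds to the select statement, which waits on several channels and, if more than one is ready, picks one case pseudo-randomly. A minimal sketch of the gold/silver baton example (illustrative only, not code from the talk):

```go
package main

import "fmt"

func main() {
	gold := make(chan string)
	silver := make(chan string)

	// Two "runners", each delivering a baton on their own channel.
	go func() { gold <- "gold baton" }()
	go func() { silver <- "silver baton" }()

	// select waits on both channels; if both are ready at the same
	// time, one case is chosen pseudo-randomly. The other runner is
	// simply abandoned when main returns.
	select {
	case b := <-gold:
		fmt.Println("took the", b, "- going one direction")
	case b := <-silver:
		fmt.Println("took the", b, "- going the other direction")
	}
}
```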
Okay, enough with the batons and relay races; let's take a look at how CSP actually works in practice, using Go as our implementation. Any Go programmers? Some people, huh. No Go? I know there are Carbon Fivers in the room, so they know Go. Anyway, don't worry if you don't know Go. Just to suspend your disbelief, we're gonna look at a simple Go program, and our goal really is just to launch two processes: we're gonna have one write to a channel, we're gonna have one read from the channel, and they're both going to notify their parent that they're done so they can terminate.

So this is our read function. It takes a channel it's going to read a string from (notice the channels are typed), and it's gonna notify its parent when it's done. Don't worry about that empty struct thing; we don't want to transmit any data, we just want to say something happened, which is why we're using it. Here's our write function; you can see it's just going to write to the channel and then notify its parent that it's done. And here's our main function. Go is actually a pretty low-level language, so we need to initialize our channels by telling Go to allocate them, but it is garbage collected, so we don't need to worry about cleaning them back up. Then we're gonna kick off the read and write in parallel. The CSP equivalent of processes in Go are called goroutines, and the way you make something a goroutine is by putting the word go in front of the function. Why are you laughing? So we're going to kick off both of them in parallel. We're gonna put the read function before the write function to prove that they're happening in parallel, and then wait for both of them to be done. We don't actually need to wait for both of them to terminate, but then we wouldn't be able to guarantee that this print line that says finished is the last thing to happen.

So again, I think I've already said this, but every read and write to a channel is synchronous: if you write to a channel, it will block until somebody else reads. And the interesting thing about this, as you start to see in the example, is that you have pretty fine-grained control over the work being done, so you can use the channels to get pretty predictable behavior if that's what you're after.
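The slides for this program aren't in the captions; the following is a minimal sketch of what the described program plausibly looks like, assuming a string channel for the data and an empty-struct channel for the done notifications (names are illustrative):

```go
package main

import "fmt"

// read blocks until it can receive a string from the channel,
// then signals its parent on done.
func read(c chan string, done chan struct{}) {
	msg := <-c
	fmt.Println("read:", msg)
	done <- struct{}{} // empty struct: no data, just "something happened"
}

// write sends a string on the channel, then signals its parent.
func write(c chan string, done chan struct{}) {
	c <- "hello"
	done <- struct{}{}
}

func main() {
	// Channels have to be allocated with make before use.
	c := make(chan string)
	done := make(chan struct{})

	// Kick off both goroutines; read is started first to show
	// that the two really do run concurrently.
	go read(c, done)
	go write(c, done)

	// Wait for both to signal completion.
	<-done
	<-done
	fmt.Println("finished")
}
```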
So let's take a look at how Go actually implements CSP. If you look at the documentation and dig into how Go defines goroutines, they say a goroutine is a lightweight thread managed by the Go runtime. Anybody think it's funny that they reference the word thread? We're talking about processes. It's not a mistake, because goroutines share memory, and they have access to the same data. That's a little weird, right? But there's more: there's a library in Go called sync, and what you get with sync is all of the basic mechanisms you would get with any other threaded programming language. So you end up with kind of a choose-your-own-adventure with Go, where you could use the more regimented CSP-based model for coordinating work, or you could just use a mutex. And the docs unfortunately aren't as clear as you'd want them to be, because they initially say don't communicate by sharing memory, share memory by communicating, except in the very next line they're kind of like, well, but if it's hard and it's not gonna work and it's gonna be more confusing, then just use the mutex.

One more fun fact, and the thing that's important to know in contrast to what we talked about with the actor model and the Erlang VM, is that goroutines are scheduled cooperatively. What that means is your goroutine basically has to hit certain statements while it's running in order for Go to interrupt it and allow another goroutine to run. So if you create a goroutine that doesn't actually do any of that, you can essentially end up with this really confusing locking behavior.

Some of you might be thinking, like, WTF, what's going on, this seems a little weird. But before we get all judgey (some of you are already judgey, I know), let's look at a few other aspects of Go's implementation that might help us get a fuller picture of why the decisions that were made were made. A couple things to know about Go: it compiles down to actual machine code; the Go runtime is really small, a Hello World executable in Go is about two megabytes, which is about twice as large as the C program; and underlying all of this, it's really trying to be a systems programming language. It is garbage collected, so you wouldn't really want to write an operating system in Go, but it's designed to be a replacement for a lot of the other things you would like to do, maybe like native app programming. But it also wants to be flexible, which is why you get this kitchen-sink, choose-your-own-adventure way of doing things, because you also want to be able to write a web server if you want to.

And I think this is made clear when we talk about the priorities of Go. It seems like the first priority is ease of adoption, especially for C programmers: you can learn the basic syntax and concepts pretty quickly, it's a pretty flexible language, there's not a ton of convention, it doesn't have a lot of the higher-level concurrency primitives that you see in Erlang and Elixir (there are no GenServers or supervisors), and it has concepts that'll be really familiar to folks who are familiar with threaded programming. Two is that it's fast; it's supposed to be a replacement for C. And lastly, it's designed to be flexible in a number of use cases, hence the choose-your-own-adventure approach, whether you want to write a command-line program or a web server.

So we've talked a little bit about each of these models, but what's really different, or are they more the same than they are different, and when would you use either? First of all, I think it needs to be said that they're far more similar than they are different, especially when compared against traditional threaded programming. Both Go and Elixir manage their own concurrent code; they both have an abstract concept of executing a piece of code separate from the operating system. This means you don't need to spend a whole lot of time worrying about scheduling pieces of code to run, or about the memory management of having spun off thousands of Elixir processes or goroutines. And while Go is certainly less rigid in its implementation of this than what we see in Elixir and Erlang, they both emphasize message passing, sharing data by communicating, as the basic primitive for synchronization. I think in this room a lot of us would agree that maybe message passing is better. And it's no coincidence: Joe Armstrong himself has said that, even though the actor model doesn't borrow the concept directly, Erlang was heavily influenced by CSP.

So what are the differences? One of the key differences is identity.
We talked about how in the actor model processes have unique identities, and CSP processes are largely anonymous: once you create a goroutine, there's no way to directly refer to it. And this kind of explains the communication patterns we see. Because these processes are anonymous and have no identity, they need a way to be able to communicate, and this happens through an intermediary, this thing we're referring to as channels. So there's a layer of indirection that we don't necessarily see in the Elixir and Erlang world. It also explains the synchronous versus asynchronous message passing, and this might be one of the most important things: CSP messages are synchronous, so as soon as you send a message, you have to wait for somebody else to read it, which really changes how you work with the language. You really wouldn't want synchronous message passing over a network, necessarily.

But are they really that different? This is maybe pushing it, but not pushing it too much: you could give CSP processes an identity. If you have one channel and one process reading from that channel, that channel kind of becomes the de facto identity. And can we fake asynchronous message passing, or not fake it, but actually implement it? This probably would not work for a distributed system, but if you were doing it locally, we can have this message queue process where you have a channel on which you can send messages, and that process will just read all those messages and put them into some kind of buffer, and then you have another channel that's gonna deliver the messages, and anyone receiving on that channel will just get those messages when they come: asynchronous message passing. It's actually not quite that simple, but there's a canonical blog post for doing this in Go; there are a couple more steps, but it's really good, so if you're interested I encourage you to take a look at it.
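A minimal sketch of the buffering idea described here (not the blog post's code): one goroutine sits between an input channel and an output channel, holding pending messages in a slice so that senders don't have to wait for a final reader:

```go
package main

import "fmt"

// asyncQueue starts a buffering goroutine between in and out:
// messages sent on in are held in an internal slice until
// somebody receives them from out.
func asyncQueue() (chan<- string, <-chan string) {
	in := make(chan string)
	out := make(chan string)
	go func() {
		var pending []string
		for {
			// Only offer the head of the buffer when there is one;
			// sending on a nil channel blocks forever, which
			// disables that select case.
			var sendCh chan string
			var head string
			if len(pending) > 0 {
				sendCh = out
				head = pending[0]
			}
			select {
			case msg := <-in:
				pending = append(pending, msg)
			case sendCh <- head:
				pending = pending[1:]
			}
		}
	}()
	return in, out
}

func main() {
	in, out := asyncQueue()
	// The sender does not wait for anyone to read from out.
	in <- "hello"
	in <- "world"
	fmt.Println(<-out, <-out)
}
```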
So we've talked a little bit about Go, a little bit about Elixir, their design paradigms, and what the advantages and disadvantages of both of them are. And the interesting thought I was having when I was redoing this talk for this particular conference is this, and I promise this is not a tangent, there is a point to this story, so bear with me. A couple of weeks ago I was rock climbing with a friend of mine in Yosemite Valley. We were doing this multi-pitch trad climb, the kind where you put in the gear and you clip in your rope, and when you get to the top you've got two ropes and you tie them together to rappel. Any climbers in the room? So rappelling is the most dangerous part of climbing, because you're so dependent upon the rope and the anchors. My friend John has been climbing a lot longer than I have, like 20 years, and he was like, you can tie this one type of knot, which we normally do for rappelling, or you can tie this other knot, and I didn't know that that was maybe not the correct knot to tie. Anyway, we tied this other knot; you leave a little bit of tail just in case the knot rolls, so it doesn't come apart. And we're about to rappel, we're like 500 feet up, and the heavier person generally goes first, so he started rappelling. He gets about 15 feet down, and luckily I'm watching, because the knot starts rolling, and there's only about this much tail left. John stopped, and luckily there was a ledge he could stand on, so we pulled up the knot and retied it into the other one, which is the one you're actually supposed to use for rappelling, and then we're both here. So it ended up fine.

My point with that story, and it's not to be dramatic, is that really understanding why the decisions were made for a particular thing, and when to use that thing, can save you a lot of pain further down the road, especially in situations that might be unexpected. So understanding the design decisions initially can be hugely helpful. And that's all I got. Thanks for listening. [Applause]
Info
Channel: Groxio
Views: 7,020
Rating: 4.6319017 out of 5
Keywords: Gig City Elixir 19, Functional Programming, Golang, Elixir, Myelixirstatus, Programmers, developers, coding, conference talks, programming, tech
Id: UAlkWtJO8AM
Length: 29min 46sec (1786 seconds)
Published: Wed Dec 18 2019