ElixirConf 2021 - Mat Trudel - Bandit on the Loose! Networking in Elixir Demystified

Captions
All right, hopefully everybody can hear me. Before I get started, I just wanted to take a moment to thank Jim and Anna and Andrea and the whole rest of the ElixirConf team for putting on such a great show, and also to thank all of you. It really means a lot to me to have people take the time to be here and listen.

My name is Mat Trudel, I'm an Elixir hacker here in Toronto, and the title of my talk today is "Bandit on the Loose: Networking in Elixir Demystified". My main goal with this talk is to take the wraps off a pair of libraries that I've been writing over the past year or so, named Thousand Island and Bandit. They're ground-up reimaginings of the Ranch and Cowboy projects respectively. Both of these libraries represent pretty significant advancements in the state of the art of networking in Elixir, from an ergonomic perspective but also from a performance perspective. I've had an absolute blast working on them, and I'm so excited to finally show them off in public today.

Another primary goal of this talk is described in its subtitle, and that is to demystify lower-level networking in Elixir. All too often, people consider networking code and server implementations to be unapproachable and intimidating, but really, nothing could be further from the truth. If anyone caught Desmond's talk yesterday, he had a great statement where he said all problems are solvable, and networking code is no different. I'd actually go so far as to say that lower-level networking code showcases some of the best things the BEAM and OTP have to offer. The patterns and the features that set the BEAM apart really shine when you get down to these lower levels, and I'm super jazzed to take all of you on a tour through it today. I'll be introducing Thousand Island and Bandit in due course during the tour.
In the meantime, I've organized this talk along the lines that most people think about networking: as a stack. Specifically, we'll be looking at the stack that holds up Phoenix, from the lowest to the highest level, which is how we're going to explore things today. We start with TCP/IP, then sockets, then HTTP as a protocol, Plug as a library, and finally Phoenix sitting on top of it all. There's an awful lot to cover here today, so we're going to jump right in, starting with TCP, and more specifically, starting with IP.

IP's most fundamental job is to provide every host on the network with an IP address. That's definitional to what it means to be on the internet in the first place: to have an IP yourself, and to be able to reach any other host on the network via their IP. The important thing to realize, though, is that all IP does is deliver packets to hosts. It doesn't know anything about applications, processes, or users; all it does is deliver packets from one machine to another. The job of identifying individual applications and users belongs to higher-level protocols that get encapsulated inside IP, and the most common of those, the one we'll be talking about today, is TCP.

TCP provides application-level addressability on top of IP. What that really means is that you can take an IP address and a port number and address a specific application running on a remote machine. Once you make a connection to that machine, the connection comes in the form of a pair of reliable streams: binary, one-way streams, one running in each direction, where data goes in one end and comes out the other. The directions are totally independent; they can be shut down independently and used simultaneously. They're most commonly accessed via an API
colloquially known as the Berkeley sockets API, which ships as part of just about every operating system today. And that's really about as far as we'll go on TCP. This was probably the shortest of the five sections, and frankly probably review for most people, but it's foundational just the same.

Moving up to sockets: this is where we actually start talking between remote processes. Be forewarned, this is going to be the longest of the five sections. It's also probably the most interesting for most people, as this is, in my experience, where people's understanding of how this stuff works is the blurriest. So let's jump in.

As I just mentioned, the TCP stack is accessed via the Berkeley sockets API, and the core of that API consists of about a half dozen functions. Clients use the connect function to connect to an IP/port pair. Servers use the bind and listen functions to bind to a specific port number. Servers then call the accept function in a loop, and every time they call accept, they service a request from an individual client. Once a connection is accepted, a TCP session gets created between the client and the server. The two peers can then use the send and receive functions to pass data back and forth, and at any time either peer can close the connection.

In Elixir, we access all of this via the functions on Erlang's :gen_tcp module, which map just about one-to-one onto the standard Berkeley calls we just saw. SSL is accessed in a very similar manner, using just about identical functions. The core abstraction of secure sockets is the same socket abstraction as TCP, and for the most part you can usually just replace :gen_tcp with :ssl, plus a few extra steps for handshaking. In code,
that looks like this. This is a very simple :gen_tcp server; we'll walk through it line by line. The first thing it does is listen on port 4000, which returns a listen socket to us. We then call an accept-and-handle function in a loop, using that listen socket to wait for a specific client connection. This call blocks until a client connects to the server. When a client does connect, we have returned to us the socket that represents the connection to that particular client. We then send the string "Hello, World" to that client, close the connection, and call accept-and-handle again to process the next request from the next client. The client side is a little simpler: we use :gen_tcp.connect to connect to an IP/port pair, localhost and 4000 in this case, and then wait to receive data from the server, in this case the string "Hello, World". Once we've received that string, we close the connection and we're on our way.

Graphically, the stack looks like this, at least at this point. At the lowest level we have the operating system, which exposes Berkeley sockets via the socket.h header and libc. The Erlang runtime then has a port driver implementation that layers on top of Berkeley sockets and provides a :gen_tcp interface up into Erlang userland for us to call into. This is really all you need to implement a server, but working at this low a level is super tedious, and doing it performantly is not trivial at all; you want to make sure that a long-running client doesn't block you from accepting requests from other clients. So this is usually abstracted into a pattern called a socket server.
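The slide code being walked through isn't reproduced in the transcript; a minimal sketch of that server and client, with the port and greeting as illustrative values, might look like this:

```elixir
# A minimal :gen_tcp server: accept one client at a time, greet it, loop.
defmodule HelloServer do
  def start do
    # Bind and listen on port 4000; binary mode, blocking (passive) receives
    {:ok, listen_socket} =
      :gen_tcp.listen(4000, [:binary, active: false, reuseaddr: true])

    accept_and_handle(listen_socket)
  end

  defp accept_and_handle(listen_socket) do
    # Blocks until a client connects
    {:ok, socket} = :gen_tcp.accept(listen_socket)
    :ok = :gen_tcp.send(socket, "Hello, World")
    :ok = :gen_tcp.close(socket)
    # Go back and serve the next client
    accept_and_handle(listen_socket)
  end
end

# The client: connect, read one message, close.
defmodule HelloClient do
  def run do
    {:ok, socket} = :gen_tcp.connect(~c"localhost", 4000, [:binary, active: false])
    {:ok, data} = :gen_tcp.recv(socket, 0)
    :ok = :gen_tcp.close(socket)
    data
  end
end
```

Note that with `active: false` the client reads with a blocking `recv/2` call, matching the "wait to receive data" description above.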
A socket server's main goal is what I just described: it provides an abstraction on top of raw sockets, and it usually sits underneath a higher-level application such as a web server. It listens for connections over whichever type of socket it's configured for, and it hands those individual connections, when they're received, up to that protocol layer for further processing. It also handles secondary bookkeeping concerns such as certificate negotiation for SSL and shutting the server down cleanly, and of course it does all of this efficiently and scalably.

The go-to socket server these days in the BEAM world is Ranch, written by Loïc, the same person who wrote Cowboy. I've written a new one called Thousand Island, written entirely in Elixir and inspired pretty heavily by Ranch, but with a number of improvements. Just for starters, one of the main points is that it's no slower: performance is basically equal to Ranch, which is unsurprising considering that they fundamentally do the same thing in very similar ways, and that's about as fast as this kind of model can get while remaining both performant and scalable. It's about half the size of Ranch from a line-count perspective while still having feature parity. It's comprehensively documented, with lots of examples, and very well tested. It supports all of the standard socket types you'd ever need, and it fully supports all the configuration options that the Erlang layer underneath it supports; we just proxy the options right through. It's fully wired for telemetry, including socket-level tracing, which is really cool: you can use telemetry to jack into a particular socket connection and actually see the bytes coming and going on the wire, which is tremendous for debugging. But probably most importantly, it has a very simple, powerful, and really expressive handler behaviour for higher-level applications, and that's what we're going to look at right now.
This is an implementation of the daytime protocol, which is a really simple one: when a client connects, the server sends it the current time in some unspecified format and then closes the connection. We implement this by implementing the ThousandIsland.Handler behaviour, specifically its handle_connection callback. This function is called by Thousand Island whenever a connection is received from a client, and it passes the socket for that connection in to us. We then format the time as a string, use ThousandIsland.Socket.send to send that string over the socket, and return :close, which tells Thousand Island to close the connection.

A somewhat more refined example is the echo protocol. This protocol specifies that the server should wait for data to be received from the client, echo that data verbatim back to the client, and continue doing so until the client closes the connection. Once again we implement the ThousandIsland.Handler behaviour, this time via the handle_data callback, which is called by Thousand Island whenever data is received from a client, with the data passed in as the first argument. We take that data, turn it around, and pass it back to the client using the Socket.send function, and we return :continue, which tells Thousand Island to keep the connection open and keep waiting for more data.

These two functions are really the core of the handler process, and there's one such process per client connection. The ThousandIsland.Handler module we just saw is really just a GenServer inside that process. The cool thing about this is that your process can then do all the other GenServer things it normally can: in addition to handle_connection, handle_data, and the other half dozen or so callbacks defined in the behaviour, you can handle calls and casts, run things on your process, and do whatever you'd like.
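The two handlers just described, reconstructed as code. This mirrors Thousand Island's documented handler API; the module names and time format are illustrative:

```elixir
# Daytime protocol: on connect, send the current time and close.
defmodule DaytimeHandler do
  use ThousandIsland.Handler

  @impl ThousandIsland.Handler
  def handle_connection(socket, state) do
    # Format the current time as a string and send it to the client
    ThousandIsland.Socket.send(socket, to_string(DateTime.utc_now()))
    # Tell Thousand Island to close this connection
    {:close, state}
  end
end

# Echo protocol: send every received chunk straight back.
defmodule EchoHandler do
  use ThousandIsland.Handler

  @impl ThousandIsland.Handler
  def handle_data(data, socket, state) do
    # Turn the data around verbatim
    ThousandIsland.Socket.send(socket, data)
    # Keep the connection open and wait for more data
    {:continue, state}
  end
end
```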
Receives from the client are asynchronous by default, using the handle_data function we just saw, but you're also free to write traditional blocking network code using send and receive calls anywhere you want: in Thousand Island callbacks, in GenServer handle_call implementations, any place you'd like in that process. And if all of this is too restrictive, there's also a start_link-based escape hatch if you want to plug into Thousand Island at an even lower level.

These individual processes are hosted inside a process tree that's rooted in the ThousandIsland.start_link call your application uses to start the Thousand Island server, and that process tree is entirely self-contained; there's no backing application. It's entirely supported for an individual host to run any number of Thousand Island instances, each on different ports. Internally, those process trees are a multi-level design intended to minimize contention at every step of the connection life cycle, so that they remain performant under any type of load. It's really a textbook example of how powerful OTP design patterns can be; to be able to do this as simply and expressively as we have is a testament to how awesome OTP is. The project README, at the URL at the bottom of the slide, has tons of examples and lots of description of how this works.

So that's Thousand Island. To summarize the current state of the project: it's stable and suitable for general use, with essentially complete feature parity with Ranch. We're currently on the 0.5 release series, which is likely to be the last before an eventual 1.0. We have big ideas for the future post-1.0, largely around multi-node setups and managing clusters of Thousand Island nodes transparently, but the foundation you've seen here is, and should remain, solid.
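Hosting an instance looks roughly like this. Since the tree is self-contained, several servers can sit side by side under an ordinary supervisor; the handler module names here are hypothetical stand-ins for handlers like the daytime and echo examples described earlier:

```elixir
# Two independent Thousand Island instances under one supervisor.
# EchoHandler and DaytimeHandler are hypothetical modules implementing
# the ThousandIsland.Handler behaviour.
children = [
  {ThousandIsland, port: 5555, handler_module: EchoHandler},
  {ThousandIsland, port: 5556, handler_module: DaytimeHandler}
]

Supervisor.start_link(children, strategy: :one_for_one)
```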
So this is what the stack looks like so far. On the left-hand side we've got the :gen_tcp interface we saw before, with Ranch layered on top, exposing the ranch_protocol behaviour up to higher-level applications. On the right-hand side we have Thousand Island layered on top of :gen_tcp, exposing the ThousandIsland.Handler interface up to higher-level applications.

We're going to move up the stack now and talk a little bit about HTTP. HTTP, at least HTTP/1.1, is really just plain text over sockets, and it looks like this. Many of you have probably seen this, or maybe even done it over telnet while debugging a server. The client makes a request, in this case a request to GET the resource /things/12, and passes some headers that describe the parameters of the request, possibly along with a request body. The server then determines what to send back; in this case it sends back a 200 OK response code, some headers that describe the response, and the body of the response. I'll note that over SSL it's the exact same structure, simply run over SSL sockets instead of TCP; the vast, vast majority of the HTTP protocol is unchanged in both cases.

This is what a simple HTTP server would look like in Thousand Island. Once again we use the ThousandIsland.Handler behaviour and implement the handle_data function, except this time we ignore the data that's passed in, because we're just going to send back a static 200 response code with a static body of "Hello, World" and then close the connection. This is a valid server: if you load it up in Chrome, you will see "Hello, World", and Chrome will not complain. But it's obviously not super useful. When you think about what goes into making a useful server, though, it's not really that much more complicated. Fundamentally, we're parsing a request from a client, possibly reading the body in, figuring out what to send back, and then sending that back to the client conformantly. That's what the core of any implementation does.
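The "valid but not super useful" server described above might be sketched like this; the headers and body are illustrative, and a conformant server would of course actually parse the request:

```elixir
# Ignore whatever the client sent; always answer 200 with a fixed body.
defmodule StaticHTTPHandler do
  use ThousandIsland.Handler

  @impl ThousandIsland.Handler
  def handle_data(_request, socket, state) do
    body = "Hello, World"

    # A minimal, static HTTP/1.1 response
    response =
      "HTTP/1.1 200 OK\r\n" <>
        "content-length: #{byte_size(body)}\r\n" <>
        "\r\n" <>
        body

    ThousandIsland.Socket.send(socket, response)
    {:close, state}
  end
end
```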
The go-to implementation of this in the BEAM world today is Cowboy, again written by Loïc, the same person who wrote Ranch. It's a complete implementation of HTTP/1.1 as we just saw, as well as HTTP/2 and WebSockets. I've written Bandit, a new pure-Elixir HTTP server written specifically to host Plug applications, and I'm going to tell you a little bit about it right now. Bandit is written 100% in Elixir, and it is Plug-native: its entire job is to serve plugs, and that's all it does. As a consequence, it also has really robust HTTP/1.1 and HTTP/2 conformance, and it's written from the ground up to be correct, to be performant, and to be really easy to understand. Again, part of my goal with these projects is to demystify networking code for people, so clarity and readability are really important here. This simplicity also drives some pretty incredible performance numbers that we'll see in just a little bit.

Turning back to how you actually make an HTTP server do useful things: you need to return content, and that content is going to come from your application. The question is how to hand off from an HTTP server to an application server, and in the Elixir world that's the job of the Plug library. So we're going to move up the stack now, to Plug.

Plug is an abstraction over HTTP request/response pairs, and if anybody's seen Rack in the Ruby world, it's a very similar pattern. The core of it is really the call function. This is called by your web server whenever a request comes in from a client. The first argument is a Plug.Conn structure that describes the parameters of that request. Your application then uses functions on the Plug.Conn module to describe the response it would like to send back.
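The call-function contract just described, as a minimal module plug; this is a sketch matching the hello-world example on the slide:

```elixir
defmodule HelloPlug do
  @behaviour Plug

  @impl Plug
  def init(opts), do: opts

  # Called by the web server for every request: conn describes the
  # request, and we use Plug.Conn functions to describe the response.
  @impl Plug
  def call(conn, _opts) do
    Plug.Conn.send_resp(conn, 200, "Hello, World")
  end
end
```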
In this case it returns a 200 response code with the body "Hello, World". So plugs are pretty simple, but they're actually really, really powerful, largely because the Plug library comes with a bunch of useful parts built in. It comes with plugs to create pipelines, to compose plugs together, to route based on various patterns (as we'll see in a moment), and to do a bunch of other useful things that come up in the common course of running an HTTP server. This is actually enough for an awful lot of applications; in fact, the original project that spurred the development of Bandit in the first place runs on these basic building blocks to this day. I'm going to take a quick sidebar to highlight that, as I think it shows off some useful tools you might not be aware of.

Bandit was originally written to support HAP, a HomeKit Accessory Protocol library I've written that allows Nerves devices to be controlled from an iOS device and participate in the HomeKit ecosystem. The HAP protocol runs over HTTP, but with a bit of a twist: it has a kind of one-off custom encryption that runs on bare TCP. I started trying to do this in Ranch and gave up pretty quickly, and that's what actually spurred the development of Bandit and Thousand Island, so that I could get lower level and do these things myself. But the HTTP that runs on top of this is actually really simple, and in the case of HAP it runs entirely on the Plug.Router module. Plug.Router provides a really simple DSL that allows you to match HTTP requests against patterns, posting to /pairings or getting /accessories in these cases, and to craft responses in a really lightweight manner. Reformed Rubyists might recognize this as being very similar to Sinatra, and it is. So if you ever find yourself thinking that Phoenix is a bit too heavy for a particular job, I'd suggest looking at Plug.Router, as it might really fit your needs quite well.
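A Plug.Router sketch in the spirit of the HAP routes mentioned above; the route paths and response bodies are illustrative:

```elixir
defmodule MyAPI do
  use Plug.Router

  # :match finds the matching route; :dispatch runs it
  plug :match
  plug :dispatch

  get "/accessories" do
    send_resp(conn, 200, ~s({"accessories": []}))
  end

  post "/pairings" do
    send_resp(conn, 204, "")
  end

  # Fallback for anything that didn't match
  match _ do
    send_resp(conn, 404, "not found")
  end
end
```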
Turning back to plugs after that little sidebar, let's look at how web servers actually call into them. In the Cowboy world, this is implemented via the Plug.Cowboy adapter, written by José, which adapts Cowboy's internal handler behaviour into the Plug pattern. Bandit has a bit of a leg up here, because we are Plug-native. From a stack perspective it looks like this: on the left-hand side you've got Cowboy sitting on top of Ranch, with the Plug.Cowboy adapter stuck in there, serving plugs up to your application; on the right-hand side we've got Bandit, which sits directly on top of Thousand Island and serves plugs up to your application natively.

That's about everything we have to say about Plug, so we're finally going to turn our attention to Phoenix. It shouldn't surprise many people at this point, but Phoenix, at least the HTTP part of it, is just a plug itself, sitting on top of the stacks we just saw. The WebSocket support in Phoenix is a thing apart; I'm going to talk about that in a moment. But this is really about as far up the stack as we're going to go today. Graphically, we're at this point: we've now layered Phoenix on top of the respective Plug APIs on the Cowboy and Bandit sides.

Now that we've described the full stack that sits underneath Phoenix, I'm going to take the rest of the talk to explain why I think Bandit is a really compelling alternative in that stack compared to Cowboy. Just to review: Bandit is an HTTP server written specifically to serve Plug applications, built on top of Thousand Island, with full support for HTTP/1.1 and HTTP/2.
We don't yet support WebSockets; I'm coming to that in a moment. We score 100% on the h2spec conformance suite, a Go application that's kind of the gold standard conformance suite for HTTP/2 servers, and we run it as part of our CI suite in addition to the usual battery of ExUnit tests. And the whole thing is written top to bottom to be really clear, approachable, and unsurprising: idiomatic Elixir the whole way through.

One of the things I want to focus on about Bandit being a Plug-first web server is that there are no impedance mismatches. Every bit of code in the project is there with the sole purpose of getting from web requests to plug calls and back again. This means that not only is the code clearer, but there's also less of it. At the current time we're about a third the line count of Cowboy. Now, this isn't an entirely fair comparison, because Cowboy does support WebSockets; I suspect that by the time we've reached full feature parity with Cowboy we'll be about half their line count, but that's still much, much smaller. And this simplicity gives us some really incredible performance and memory numbers, which I'm going to talk about now.

In a recent benchmark that I've run, Bandit is up to five times faster than Cowboy, depending on your level of concurrency and whether you're serving HTTP/1.1 or HTTP/2.
On the low end we're 1.5 times faster; on the high end we're up to five times faster. And I want to be clear here that I'm not juicing these numbers at all; this is as good-faith a comparison as I'm capable of doing as a non-professional benchmarker. In fact, in several cases I actually had to give an advantage to Cowboy, because it couldn't complete some of the work in the more highly concurrent tests without crashing. Unless I've missed something stupid, I really do believe these perf numbers are this good. In both cases these are totally out-of-the-box configurations for both Bandit and Cowboy, and the process behind these benchmarks is at the URL at the bottom of the slide if anybody wants to review it.

These performance numbers, I think, really come from how streamlined our approach is in building Bandit, and we're going to take a look at that right now. HTTP/1.1 in Bandit uses a single process per connection, in contrast with Cowboy, which uses two processes for every HTTP/1.1 connection. There's just about a direct mapping between the handle_data calls we saw earlier in the handler behaviour and Plug call invocations; the code to get from a handle_data function to a plug call is very straightforward and linear. There's not really a whole lot of "there" there, which is why I think we're able to be as fast as we are.

The HTTP/2 side is quite a bit more complicated, largely because HTTP/2 is more complicated. The primary design goal of HTTP/2 was to allow a single connection between a client and a server to service multiple HTTP requests simultaneously, and it does this by multiplexing those requests: breaking them up into small binary frames that each correspond to an individual part of an individual request. The HTTP/2 protocol calls these multiplexed requests streams.
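Frames like these are easy to pick apart on the BEAM thanks to binary pattern matching. Per RFC 7540, every HTTP/2 frame starts with a fixed 9-byte header; a sketch of splitting one frame off a receive buffer (not Bandit's actual code) could look like this:

```elixir
defmodule FrameSplitter do
  # HTTP/2 frame header: 24-bit payload length, 8-bit type, 8-bit flags,
  # 1 reserved bit, 31-bit stream identifier, then `length` bytes of payload.
  def split(<<length::24, type::8, flags::8, _reserved::1, stream_id::31,
              payload::binary-size(length), rest::binary>>) do
    {:ok, %{type: type, flags: flags, stream_id: stream_id, payload: payload}, rest}
  end

  # Not a complete frame yet; keep buffering bytes from the socket
  def split(buffer) when is_binary(buffer), do: {:more, buffer}
end
```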
At any given time, the client may be sending headers for one stream and the body of another, while the server is returning the body for a completely different stream, and you can see that in this diagram. This is all quite a bit to handle, and in Bandit we handle it like this: we have a single process per connection, plus a single process for each stream, that is, for each separate client request within that connection. The connection process implements the ThousandIsland.Handler behaviour we saw earlier; it faces the client and manages everything to do with the network connection. The stream processes are the ones that actually make the plug calls, because recall that streams correspond exactly to HTTP requests; they face the Plug API. The majority of the implementation then just involves coordinating those processes with one another. In addition to this, we also support all the other goodies in HTTP/2: flow control, push promises, continuation frames, pings, goaways; all the goodies in the protocol are there. We use IO data structures internally everywhere we're able to, and that gives us another big boost in speed and memory awareness.

From a status perspective, the HTTP/2 implementation in Bandit is complete and exhaustively tested; that was the main goal of the 0.3 release stream, which landed last month. We're now turning our attention back to HTTP/1.1. The current implementation dates from the earliest 0.1 days of the project, and we're more or less going over it and tightening the screws, ensuring it has the same comprehensive test coverage that the HTTP/2 implementation does. That's the purpose of the 0.4 release stream that we're in right now. At this point I think the project is suitable for non-production use. We don't support Phoenix yet; I'm going to talk about that in a moment. But for the places where we are suitable, we are basically a drop-in replacement for Cowboy.
In most cases it's literally a matter of changing the word Cowboy to Bandit in a few places; the startup parameters are the exact same, and the entire job of Plug is to abstract the server away from you, so for the most part a drop-in replacement is all you need.

Looking to the future, the biggest thing we're excited about is Phoenix support. The biggest impediment there is landing WebSocket support, which is coming in the 0.5 release stream of Bandit. I will note that WebSocket support is outside the scope of Plug; Phoenix actually realizes WebSockets right now by reaching into Cowboy's internal implementation of them. In addition to this, we also have plans to integrate with Phoenix more broadly in terms of startup, configuration, and general process hosting. Beyond this, there are plans to continue driving performance improvements, to improve our story about how to migrate from Cowboy over to Bandit where there are issues with that, and to drive a PR campaign to let people know about the project: more users means we fix bugs faster, which means we get to a 1.0 faster. Eventually, long term, we want to support things like HTTP/3 and QUIC.

All of this is kind of table stakes for the actual long-term goal, and that's to become the de facto networking stack for Elixir and the BEAM. I'd like to one day see Bandit as the default choice for new Phoenix installs, and I'd love for this to be the stack people reach for, for all manner of networking on the BEAM, along with Thousand Island of course. Judging by the responses I've received from dry runs of this talk, and from other evangelism I've done, particularly with Thousand Island, there's a huge amount of excitement from people about being able to start writing low-level networking applications without the intimidation of having to jump into Erlang, which is unfamiliar to a lot of us.
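To make the drop-in claim concrete, a sketch of the swap in an application's supervision tree. The option shapes here follow Plug.Cowboy's, and MyApp.Router is a hypothetical plug module; check Bandit's README for the exact options in the current release:

```elixir
# Before: serving MyApp.Router with Cowboy via the Plug.Cowboy adapter
children = [
  {Plug.Cowboy, scheme: :http, plug: MyApp.Router, options: [port: 4000]}
]

# After: the same plug served by Bandit
children = [
  {Bandit, scheme: :http, plug: MyApp.Router, options: [port: 4000]}
]

Supervisor.start_link(children, strategy: :one_for_one)
```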
So I'm really, really excited to see where this goes. Obviously that's an ambitious goal, and getting there is about collaboration. The biggest thing in the near future is working with Phoenix to add Bandit support. Depending on the reception from the Phoenix team, I'd love to look at generalizing their WebSocket interface into something like "Plug for WebSockets", or something along those lines. Andrea and Eric from the Mint HTTP client project were kind enough to abstract their HTTP/2 header compression library out of Mint into a library called HPAX, which I'm now using within Bandit, so thanks very much for that, you two; it's great to see code sharing like that between clients and servers. There are also some other performance improvements I have in mind, and some other more minor upstream improvements, mostly to do with Plug.

While we're speaking about our friends, I want to be perfectly clear here that I'm not throwing shade at Cowboy and Ranch at all. They are fantastic projects, Loïc does tremendous work on them, and they are, and will continue to be, probably the right choice for most people, at least for the near future. Obviously these are both open projects. This is really fun, foundational work; personally, this is the type of work that I enjoy most and am best at, and I would love to do this kind of stuff all day. Infrastructure code really doesn't deserve the reputation it has of being unapproachable and mysterious, and I think if you jump into these projects you'll see how approachable this stuff really is. They are grassroots projects, stuff that I'm doing entirely after hours, so while resources are thin in that respect, there's also a huge opportunity for any of you to jump in and leave as big a mark as you'd like on either of the projects.
We've got a number of things that are top of mind right now in terms of places we'd love help: looking at the HTTP/1.1 implementation and the WebSockets work coming up, improving our property-testing story, white hats to do an audit of the project and find any holes we can fix, and really anything else other people have in mind.

That brings us to the end of our talk today. Hopefully you come away from this with some new nugget of networking knowledge. Again, I'd really encourage anyone with even a passing interest in lower-level code to take a look at Bandit and Thousand Island: try to understand them, maybe even contribute. A big part of their existence is to demystify this stuff and get people to be less terrified of networking and infrastructure code, so don't be shy. We do have about a minute or two left for questions; if anybody has any that we don't get to, or that they just don't feel comfortable asking in public, do feel free to reach out. My Twitter, my Slack, and my email are here, and I'll be on Toucan right after this talk. I guess at this point we'll open it up for questions. Thanks everyone very much for listening, and again, thanks Jim and Anna and the whole ElixirConf team for putting on a great show. I hope to talk to all of you soon.

Thank you so much, Mat, that was super interesting. We have a comment and a question; we can quickly go through them. The comment is from Travis Griggs. He says that he writes APIs but not so much web apps, so using Phoenix to do an API sometimes feels like driving a semi-truck to work, and that he would love to see some documentation along the lines of a link he's sending that shows barebones security. I'm not familiar with the link, or the story, that he's passed there.
The answer to that: the first place I'd recommend looking is the Plug.Router module I spoke about, the one I use in the HAP project for the HomeKit stuff. It's essentially the simplest possible thing for matching on patterns of URLs, and you can even extract IDs and such out of those patterns as part of how you match against them. Being able to, for example, parse JSON and throw JSON back at a client is literally three lines inside one of those DSLs. I might suggest looking at the HAP project for that, at mtrudel/hap on GitHub; there are a couple of examples of how we use the Plug.Router module in that project. You can also look at the Plug API documentation itself; it's fantastic, and it talks pretty directly about Plug.Router and Plug pipelines as well.

Thank you so much. We have time for maybe another two. This one is from Ben, and he's asking... okay: I haven't gotten around to doing it yet, mostly because, despite all of this work, I've actually written like 20 minutes of Phoenix in my life; I'm a very, very low-level developer by preference. So I haven't had a whole lot of exposure to try to do that. In theory, yes: exclusive of any of the WebSocket stuff and the process startup stuff within Phoenix, there's nothing that says we shouldn't work. We completely support the entirety of the Plug API in Bandit, which in theory is all that Phoenix knows about with respect to the HTTP stack, so it should work. The longer answer is that I'm going to be digging into this pretty extensively as part of the 0.5 work, which I hope to start in a month or so, so we'll definitely have more answers there. But in theory, yes: other than WebSockets, we should support Phoenix today.
Info
Channel: ElixirConf
Views: 964
Keywords: elixir
Id: ZLjWyanLHuk
Length: 33min 19sec (1999 seconds)
Published: Sun Oct 24 2021