Tyler McMullen - Lucet: Safe WebAssembly Outside the Browser | Code Mesh LDN 19

Captions
I guess, first off, my stats: I'm Tyler McMullen, I'm the CTO of Fastly. You can find me on that awful bird site — I don't actually tweet much anymore, but you can find me if you want — and I'm on GitHub too. And by the way, my cat's name is Pluto. That's my cat. He's about 20 weeks old, and that is usually what he looks like: standing on top of something, where I don't know how he got there, screaming at me. All right, so now that we got through the important stuff, I'm going to talk about the motivating things that happened that led to us creating this thing called Lucet. Back in December 2016 I had been working on two separate problems: one was the thing I wanted to be doing, and one was just an annoying problem. The thing I really wanted to be doing was working out how to use CRDTs and other mechanisms for distributed systems that would let our users build much more interesting, complex applications on top of Fastly. If you don't know, Fastly is a large edge cloud provider. We're most known for our CDN: we have thousands of servers spread around the world, and companies pay us to make sure that their websites are fast. But they also pay us because it's not just about delivery, it's about the logic — about being able to move some of the logic from their applications out to the edge of the network. I really wanted to find ways to make that more powerful for people. So I was working on this CRDT-related problem, and the conclusion I ended up coming to was: yeah, I could do CRDTs and other things with this, but without really powerful programming languages at the edge it didn't actually make any sense. No one is going to write complex applications in the DSL that we have at the edge.
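As an aside, the CRDT idea mentioned above can be shown with a minimal sketch. This is an illustrative toy of my own, not anything from Fastly's systems: a grow-only counter (G-Counter), where each edge node increments its own slot and replicas merge by taking a pointwise maximum, so they converge without any coordination.

```python
# Minimal G-Counter CRDT sketch (illustrative only, not Fastly code).
# Each node increments its own slot; merge takes the per-node maximum,
# so replicas converge regardless of the order updates arrive in.

class GCounter:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> highest count observed from that node

    def increment(self, n=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other):
        # Pointwise max is commutative, associative, and idempotent,
        # which is what makes it safe to gossip between edge nodes.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())

# Two edge nodes count requests independently, then sync both ways.
a, b = GCounter("edge-a"), GCounter("edge-b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5
```

The point of the sketch is the merge rule: because it is order-insensitive, nodes at the edge can accept writes locally and reconcile later — but expressing logic like this is exactly what a loop-free DSL can't do, which is the motivation for what follows.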
Right, so that was the problem I wanted to be working on. The annoying problem was this: we have this DSL, it's called VCL — Varnish Configuration Language — that we've always let people program the edge with. The thing with VCL is that it's pretty limited. It doesn't have loops, it doesn't have pointers, and a bunch of other things, and it has those limitations specifically for safety. So we had this issue where the design of the language means it's safe, but a bug in our compiler for that language would lead to unsafety. I also needed to figure out a way to prevent that from becoming a problem, because a bug in that compiler could cause a global outage for us, which would be nightmarish — existential, yada yada. So I was looking at these two problems, going: all right, I need a more powerful language, but also this language is too powerful — what can I do? And I realized that what we were looking for was actually a form of isolation. There are a bunch of different things people mean when they say isolation: we might refer to resource isolation, or process isolation, or fault isolation, or software fault isolation. And there are a bunch of different ways we do this as programmers — it might look like processes, or containers, or virtual machines, and so on. But it seems to me that at its core, what all these things are trying to accomplish is just this: being able to run some code without affecting other code in any unexpected ways. That's my very scientific definition of what isolation is. Resource isolation is part of that. A lot of the time people think of resource isolation as just a way
to use a computer more efficiently, but I think it's actually quite a bit more than that: it's about being able to take the resources you have and divide them up such that there's no overlap between the different things that are running. The other part, to my mind, sounds a lot like something called FDIR — is anyone familiar with FDIR? It's a concept from control theory, and it stands for fault detection, isolation, and recovery. Usually it refers to monitoring things like thermal signatures and vibrations and oscillations in machines, so it doesn't apply directly, but I like it as a metaphor, because it separates the problem into the three stages that you need. Detection implies being able to tell when a fault has occurred, which is not as trivial as people may imagine. Isolation, in our world, means being able to prove that when a fault does occur, we know exactly what the blast radius of that fault is. And recovery, for us anyway, means being able to ensure that execution can be restarted and the system continues, totally oblivious to the fact that anything has happened. When it comes to software, to my mind this looks like two different things: control flow integrity and memory safety. When I say control flow, I'm talking about the analysis of the set of possible paths a program can take. When I say memory safety, I'm referring to the fact that we can divide up our memory and say: this program definitively can write to this area, and definitively cannot write to this other area. To my mind, if you can work out those two problems, you can create your own sandbox. Those are the two main issues. So I think
when you examine what guarantees, if any, you have for control flow integrity and memory safety, what you get out is this thing called a fault domain — and a fault domain is the answer to the question: what happens if this program blows up? There you go, scientific definitions again. So we were working through this problem, trying to figure out exactly how we could provide this isolation for our customers so they could run more complex code at our edge. The most common way people do this is either processes or containers, which are essentially the same thing. The upsides: it's well known — this concept has been around for something like 40 years — well tested, a well-trodden path, and it's operating-system-level. And, not that this was a concern back in December 2016, you also get decent Spectre protection with processes — Spectre definitely threw a wrench into this project later. The downside is that we didn't just want to provide isolation; we really wanted to provide isolation on a per-request level, which at a scale like ours means that on each individual machine I need something like 20,000 sandboxes living at the same time, spawning every second or so. If I tried to do that with processes, it would take us back to before the C10K days. Does anyone remember C10K? Yeah, exactly — we don't want to go back to that. It's bad news. OK, so processes are out. We started looking around for other things that could potentially work, and one of the first that came up was this thing called Native Client. Is anyone familiar with Native Client? Native Client was a thing built into Google Chrome back in the day — I don't know if it's still there, it may be — and the whole idea with
this was that they had a modified version of GCC that would produce native code that could be sandboxed within a particular memory region. Good luck with the GCC part, but OK, this was interesting: it had really fast startup time, which is great, and it was very close to native performance. Two really cool things. Unfortunately, the downsides — there are two, both of which are deal-breakers here. Each one of those Native Client sandboxes takes 84 gigabytes of virtual memory space. The reasons for that are actually kind of fascinating: you get four gigabytes of usable memory in the middle, but because of the way x86-64 memory operations work, they need to block off 40 gigabytes on either side so that you can't jump out of it. Virtual memory is pretty cheap, but 84 gigs is quite a lot. The other deal-breaker is that it's also end-of-life — it went end-of-life about a week after I started looking at it, which was great. We went down a bunch of other paths. This was my personal favorite: a long series of academic projects written by grad students. No offense intended, but there were a lot of things where I was like, oh cool, if I just install OCaml and Perl and C and C++, then I have a working sandbox — that hasn't been maintained for six years. So that wasn't a great start either. Then we landed on V8 for a while. This one is actually pretty common: V8 is probably the thing people use for sandboxing relatively complex code in places that are relatively resource-constrained. It has wide support, it's well known, decent sandboxing. But we decided against it for a couple of reasons. One of them is that it's single-language —
eventually they got WebAssembly support in there, but it's still WebAssembly via JavaScript — and it's still kind of too heavyweight: we're still talking milliseconds to start one of these, and we need less than that. I'm going to go on a mild rant for a second. We're talking about HTTP requests and connections and so on, and I think this is ultimately, fundamentally the wrong model to use, because if you're going to use one of these, you need it to last for more than a single request as it flows through the system. When you see Lambda and others do this, what they end up doing is starting one and letting it live for quite a while, which has some unintended side effects. For instance, it has this side effect of accidental statefulness: if you have multiple requests running through, one after another, a request can see state from the previous ones, which you may then think is state that's going to stick around — but it can disappear at any moment. That can be a tripping block for users. To me, though, the bigger problem is accidental data leakage between your end users. When we were approaching this problem, I really wanted to find a way to make it so you couldn't accidentally leak data from one of your end users to another. OK, so V8 is out. Eventually we landed on WebAssembly, which, if you were in the previous talk, you probably already know something about. To many people that could be pretty surprising, because, as mentioned, it's WebAssembly: it's not really web and it's not really assembly, so what the hell are we using it for here? But to me, WebAssembly, despite the name, means something very different: WebAssembly is actually the first fast, language-agnostic, retargetable, and safe intermediate representation that we have ever
actually agreed upon, to any extent, as a computing industry and community. The closest we ever really came before was the JVM, and I don't think anybody was terribly happy with that one. So this is actually super exciting to me. There are a lot of good things about WebAssembly. For instance: that is the entire syntax of WebAssembly — it fits on one slide really easily. That is almost the entire type system for WebAssembly. It has a sound type system, meaning you essentially can't fool a well-written WebAssembly implementation into accepting an incorrectly typed program. It even has a small-step semantics that comes along with it, meaning we have the means to prove the safety of the language from its specification. What this really tells you, though, is that the language is small enough that someone bothered to do that, which is cool. OK, so we're using WebAssembly. What else is good about it? As I mentioned, there is actually community support behind this at this point. We have a bunch of different languages targeting it now: we have TypeScript, we have preliminary support in Go — the Go support is kind of gnarly at the moment — Rust is actually quite good at this, and basically any language targeting LLVM can also use it. Does anyone know what these four have in common? What's interesting about these four in particular? Well, it's the fact that they can be arranged into a gopher riding a unicycle being chased by a dragon. Too quiet? OK. So, our contribution to this ecosystem is something called Lucet, and Lucet is using something called Cranelift — I'll get into what exactly these are shortly. Lucet is a compiler and runtime for WebAssembly that's not meant to run in a browser; it's made for fast and very high-concurrency execution of WebAssembly
programs. Again, as I mentioned, one of our servers, in a single process, might have 20,000 of these running all at once, and it can spin them up that quickly as well. Interestingly, the engine behind Lucet is something called Cranelift. Has anyone looked at Cranelift, or heard about it? Cranelift is from Mozilla, and Fastly was actually one of the early collaborators with them on it; we've been working together on Cranelift for several years now. Basically, the same engine that powers Lucet is the engine powering Firefox's WebAssembly as well, which is really cool. We think this is important, because in order for a model like this to truly exist it has to be widespread. We don't really want a dozen copies of a WebAssembly compiler spread around; it's useful to work together on these things, even though it's hard. OK, so I'm going to jump into what exactly Lucet is and how exactly it works, and this is where it's going to start, very shortly, turning into kind of an intense compiler talk. So we have these three languages, let's say, and the way this whole process starts is that all of them are able to target WebAssembly, and they each have their own way of doing it: TypeScript has its AssemblyScript compiler, Rust has rustc, which now has stable support built in, and LLVM also exists for C. By the way, there's also a new thing coming out for WebAssembly shortly that we've been working on, called WASI, which addresses some of the problems people have had trying to work out what exactly the interface looks like between WebAssembly and the browser — or, in our case, WebAssembly and the server. WASI is the WebAssembly System Interface: basically a
syscall layer — a standard syscall layer — for WebAssembly. So this is cool. Say we wanted to run these on a client or a server. Up till now the answer for the client has basically been all these different browsers, which is actually astounding: if you were around back in the browser wars days, the fact that four different browsers agreed on a single spec to implement is really cool. Our answer to the server part is Lucet, of course. Lucet is an ahead-of-time compiler: it takes a WebAssembly module, you plug it into Lucet, and out the other end you get an ELF object file, with relatively standard System V calling convention and so on. We're not doing any sort of JITing or anything like that. OK, so how does Lucet actually work? This is where it turns into a talk about how modern compilers work. We start with the parsing side of things. We have this parser which walks through the structure of the WebAssembly module, building up internal data structures as it goes. WebAssembly is composed of a bunch of different sections — your code section, your import section, your export section, and so on — and our parser just walks through these, developing its internal data structures. That all flows into the verifier, which is a pretty crucial part of the entire system, because this is where that sound type system comes into play. If we want to know for certain that this WebAssembly module is not going to do anything evil, one of the things we need to do is verify it very carefully — that's how we get the properties that are built into the language spec. Assuming all is well, this flows into our translator,
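The section-walking the parser does can be sketched in a few lines. This is a toy illustration of the idea, not Lucet's actual parser: it checks the module header, then steps over each top-level section by reading its id byte and LEB128-encoded size.

```python
# Toy sketch of the first thing a Wasm compiler's parser does:
# validate the header, then walk the top-level sections.
# Illustrative only — not Lucet's real parser.

SECTION_NAMES = {
    0: "custom", 1: "type", 2: "import", 3: "function", 4: "table",
    5: "memory", 6: "global", 7: "export", 8: "start", 9: "element",
    10: "code", 11: "data",
}

def read_leb128_u32(buf, pos):
    """Decode an unsigned LEB128 integer (how Wasm encodes sizes)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def walk_sections(module):
    # Every Wasm binary starts with the magic "\0asm" and version 1.
    assert module[:4] == b"\x00asm", "not a wasm module"
    assert module[4:8] == b"\x01\x00\x00\x00", "unsupported version"
    pos, sections = 8, []
    while pos < len(module):
        section_id = module[pos]
        size, pos = read_leb128_u32(module, pos + 1)
        sections.append(SECTION_NAMES.get(section_id, "unknown"))
        pos += size  # skip the payload; a real parser decodes it
    return sections

# The smallest valid module: just the 8-byte header, no sections.
assert walk_sections(b"\x00asm\x01\x00\x00\x00") == []
```

A real parser would, of course, decode each section's payload into the internal data structures described above, and the verifier would then type-check the function bodies it finds in the code section.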
and what the translator does is take that WebAssembly — as was mentioned before, WebAssembly is a stack-based language, and there are a bunch of interesting characteristics about it — and turn it into a more standard IR. We translate it into the Cranelift intermediate representation, which ends up looking a lot like LLVM IR, but slightly lower level. That flows into the code generator, which we'll talk about momentarily, and what we get out of the code generator is native machine code. We put that into an artifact, and that artifact is essentially the machine code, all laid out, plus a contract — we'll talk about the contract in a little bit. Essentially the contract says: these are the requirements of this particular module if you want to run it safely. OK, so, Cranelift. Cranelift also has a verifier, and once a module is verified, we start with the first set of optimization passes inside Cranelift. So we have our Cranelift IR, and a lot of compilers have this idea of a pre-optimizer versus a post-optimizer. The idea is that the pre-optimizer works on the generic IR: as we generate this IR, we don't yet know anything about the machine on which it's going to run, so the optimizations we run at this stage are the ones that can work on the IR itself — they're machine-independent. These are things like arithmetic simplification, branch collapsing, and so on: the basics of optimization. After that we flow into the legalizer, which sounds super boring but is actually really interesting. The legalizer is the step at which native machine instructions are actually chosen: this is the stage at which we say, OK, we have all of these IR instructions; now we're going to
map all of those IR instructions to individual machine code operations. At this point it is now specific to an individual machine — in our case x86-64 — so there's a one-to-one mapping going on there. After that we flow into the post-optimizer. This is where machine-specific optimizations start to happen within the compiler. For instance, I think one of the more interesting ones is that x86-64 has a bunch of different ways of loading and storing memory, while the IR's instructions for loading and storing memory are really simple, really trivial. A lot of the time, if you have a few of these in a row, we can combine them into one instruction. That's, again, a machine-specific, processor-specific optimization that has to happen here. All right, we're almost there. Once we're satisfied with that, we go to the register allocation phase, which essentially takes all those symbolic values we've been working with throughout the IR and turns each of them into either a load and store from a register, or a spill and fill from the stack. And finally we roll into the branch relaxer, which is a weird name, but essentially what the branch relaxer does is lay out the machine code: it figures out the actual memory layout — OK, this function is going to go first, and then this specific jump needs to jump n bytes forward to get to this other function, and so on. That's what the branch relaxation phase does: it lays everything out in memory. So when we put it all together, it ends up looking a little bit like this. Whenever anyone tells you that a compiler is made up of lexing, parsing, and code generation, you can point them at this — modern compilers are actually
really complicated pieces of software. So, Lucet is both a compiler and a runtime. The runtime is where most of my team spends its time, and it's critical, because this is how we get the performance out of this system. One of our primary goals was to have ridiculously fast cold start times for the sandboxes we're spinning up. I'll get into this a little later, but basically our spin-up time is something like 35 microseconds right now for an individual sandbox, which is orders of magnitude faster than anything else, at least that I'm aware of. The artifact that Lucet produces doesn't just contain code; it contains a bunch of metadata that forms a contract about the code. What that contract says is essentially: I, the compiler, claim that this code is safe for you to run, as long as you, the runtime, are configured such that the environment meets these specifications. And most of what those specifications describe is memory layout. The compiled code references memory directly — well, relatively directly. The first argument to any function in the compiled code is a pointer to a fixed spot, right there between linear memory and globals, and everything is essentially an offset either forward or backward from that. That's one of the ways we manage to make this really fast. The whole idea is that you need to know very specifically where everything is laid out if you're going to get the performance benefits, because the alternative is going through abstraction layers to figure out where things are. Does that make sense? You either have to say, I'm going to call a function that will tell me where to find the heap for this program, or
you can access the heap directly — but if it's not exactly where you think it is, things are going to go real haywire, real quick. So we have to be very specific and very conscious about this contract, to make sure it's accurate. The other part that makes this complicated is that it's not about just one of these; it's about thousands and thousands of them, all running simultaneously. The way I think about it is that the Lucet runtime's job ends up looking a lot like an operating system kernel: it's like a micro operating system running inside a single process. This all brings us to how we get this ridiculous cold start time for one of these sandboxes: essentially, by doing nothing. The code is compiled so far ahead of time that effectively all we have to do is load the module, do a stack swap — a context switch onto a different stack — and jump into the code. That's about it. That's kind of the novel thing about this. I've found it hard sometimes — for me and others, for that matter — to really grasp how fast 35 microseconds is, so I made a demo for you. This is roughly how long it takes for a container to start... there you go, OK, we're done. Then we have V8 — much faster, very nice. And Lucet is... that. So that's a visual, time-based representation of the kind of difference we're talking about: orders of magnitude. But to me, the interesting thing isn't just that things are fast — I love it when things are fast, it's kind of my jam — it's not actually about fast in this case, because no one really cares about the difference between five milliseconds and 35 microseconds. No user is noticing the difference between those two. What this actually does is allow us to have multi-tenant
systems that are much more granular. The real point is that the speed allows us to spin one of these up fast enough that you can sandbox essentially any individual thing inside a system. To me that's a fundamental change. For instance, you could sandbox individual connections inside a load balancer that is running at near wire speed. You could do queries in a database — this is one that's come up a bunch of times already — essentially, you could have a WebAssembly sandbox per query. You could individually sandbox users inside your API. In our case, it's HTTP requests in an edge network. And — this is another interesting one — modules in essentially any system. There's an interesting thing happening in the WebAssembly community group at the moment, where we're trying to standardize a concept called interface types. The idea is that it allows you to have what's called shared-nothing linkage between different modules. I assume you're all aware of things like static linkage versus dynamic linkage — essentially, linking at compile time versus at runtime. The whole idea with shared-nothing linkage is that you have multiple modules, all running independently, that are capable of being linked to each other at runtime without sharing anything other than exactly what they have decided to share. In our case this could look like: I have a library that I don't want to share the code for, but I can run it on a machine, and another WebAssembly program, regardless of what language it's written in, would be able to link against it as it starts to run. That's the idea here. Again, this is all enabled by the fact that we can start these really, really fast, because you
can start many of them. OK, I guess I'll talk about edge computing a little bit; this ends up tying into the same thing. A few years ago I described edge computing as kind of a misnomer, because the whole concept of the edge of the network is a nonsense idea when you think about how the internet is actually structured. But look at it from the perspective of an individual application — and this is getting into Fastly stuff a little bit, but this is at least the way we think about it. From the perspective of an individual application, you might think of your origin servers — your AWS servers that you have spun up somewhere, or your data center itself — as the root of the tree, or the trunk of the tree, and all of your clients as the leaves. The whole idea with edge computing is that it's not actually about moving things to the edge; it's about moving things into those branches. It's about in-network computing. So it ends up looking like this: we previously had this concept of your origin and your clients, and we're basically just growing the tree so that there are more nodes — more hops — inside it on which you can do computation. This model is interesting to me because it implies a bunch of different ways that interaction could happen. For instance, there are lots of things that could happen without an origin at all; or your origin could move entirely; or you could have multiple origins that just transparently work. Likewise, you could have things happening within that network that don't even require an origin — things local to specific regions, for instance. So this is the problem I was trying to solve
when we started this whole thing: I really wanted to make it so that people could move computation to where it was most efficient to run, because in a lot of ways, right now, you have two options — you either run it on the client or you run it on the server, and that's about it. There are lots of pieces of problems that actually make sense to run at different places within the network. The problem I ran into when we were trying to do this is that I couldn't provide a consistent developer and deployment experience across all of these places. If I'm trying to convince people that edge computing is really cool, but also that they have to use this totally different thing to do it, people don't buy into that. So the whole idea I'm trying to approach here is: how do I let people write programs in a language they know, with an experience they understand, that can actually work across all these different places? Lucet is our idea of how one might be able to do that. It works for Fastly, but we're also hoping that other people will adopt it and use it for themselves. I'm not going to talk about this part. If you're interested, Lucet is open source — we're always looking for new contributors and so on — so you can find it there if you want to try it out. There's a demo up called Terrarium, at Fastly Labs, and just yesterday — I had to add this to my talk today — we announced our actual product built on top of this, called Compute@Edge. So again, that's the whole idea. OK, thanks. Any questions? [Audience] Thank you very much, this is awesome. Two questions. The first one is: how does this compare to isolates from Cloudflare, and their use of V8? And the second question is: as a product, if I want to integrate
this, what's the order of magnitude of work I need to do to use it? The order of magnitude for what? Like, if I have a database and I'm trying to isolate user queries, what would it take currently? That was one of your examples. Oh, sorry, is this if you were using V8, you said? No, if I want to use Lucet. One of your examples is that you can run queries in the database itself. Totally. How would the database go about implementing it? Totally, OK. So the first question was about the Cloudflare side. I have to be especially careful, especially now that we are a public company; I can't say anything about that. But if I assume that it's based on V8, which I believe they have said, then yeah, we're talking about two or three orders of magnitude of difference in cold-start time, which means that we can, again, do things much more granularly. The other question was: how would it work inside of a database? I don't know, I haven't actually implemented this, but the whole idea is that you have your database program, and you need to define an interface that the WebAssembly program can operate against. This has been a problem for a bunch of different people working on this, and WASI is our attempt to solve it generally, but you can also make it very specific. For instance, I could have a WebAssembly program that has access to a set of functions like, I don't know, I've never actually written a database: look this up in an index, combine this with that, give me a reference to that data back. Then the query itself could end up being written in whatever language you actually want. Really, what you have to do is figure out what the efficient interface looks like between your program and the WebAssembly program. So, next question: can you talk a
little bit more about WASI? Is that so the little programs can access the file system? What exactly is it doing? Yeah, so the initial version of WASI basically looks like POSIX. The whole idea is: let's stop having these poorly defined interfaces. For instance, Emscripten, which is kind of the original WebAssembly compiler back in the day, has its own interface that it assumes will exist on the other side of the WebAssembly program. And what we have seen is that a bunch of things, for instance Go, decided, OK, we're just going to use Emscripten's same interface. But that interface is essentially things like `__syscall16`, so it's not a really user-friendly sort of thing. WASI is our attempt at defining a much more well-structured API that can be targeted by multiple different compilers. It starts out looking like POSIX, with file systems and network access and so on, but we're also planning to grow it into other things. One of the proposals right now is for what an HTTP client and server interface would look like; this is something that could exist both in the browser and on the server. The whole idea here is that if your compiler targets that interface, you can actually run your WebAssembly program across different platforms without ever having to care about it. That's the idea. Sorry, I'll repeat that real quick: the comment was that WASI also has the idea of a capability-based permission model, so you can define exactly which APIs can be called on which objects in your system. Yep. Whereas with a traditional POSIX API, if a program has access to the read syscall, it can read whatever it wants. Exactly, yeah. So that's the other interesting part about WASI:
that, for instance, there is no open syscall; there is only an openat syscall. So the only thing that you can open is basically something that has already been granted to you, or a parent of whatever you're trying to access has to have been granted to you. You can't open things anywhere except in directories that were handed to you at the beginning of the program's execution. It's actually a pretty cool security model. Yeah. Anyone else? All right, give them a hand. [Applause]
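The preopened-directory model described above can be sketched in a few lines of Rust. This is a hypothetical illustration of the concept, not the actual WASI or Lucet API (the struct and method names here are invented): the guest only ever resolves paths relative to a directory capability it was granted up front, and anything absolute, or anything that climbs out via `..`, is refused.

```rust
use std::path::{Component, Path, PathBuf};

/// A directory handle granted to the sandboxed program at startup.
/// All file access must go through it, openat-style; there is no
/// ambient "open any path" authority.
struct DirCapability {
    root: PathBuf,
}

impl DirCapability {
    fn new(root: impl Into<PathBuf>) -> Self {
        Self { root: root.into() }
    }

    /// Resolve `path` relative to the granted directory, refusing
    /// absolute paths and any `..` that would escape the sandbox.
    fn resolve(&self, path: &str) -> Option<PathBuf> {
        let path = Path::new(path);
        if path.is_absolute() {
            return None; // absolute opens are simply not expressible
        }
        let mut depth: i32 = 0;
        for comp in path.components() {
            match comp {
                Component::Normal(_) => depth += 1,
                Component::ParentDir => {
                    depth -= 1;
                    if depth < 0 {
                        return None; // would climb out of the preopen
                    }
                }
                Component::CurDir => {}
                _ => return None, // prefixes / root dirs are rejected
            }
        }
        Some(self.root.join(path))
    }
}

fn main() {
    let cap = DirCapability::new("/sandbox/data");
    // Inside the granted directory: allowed.
    assert_eq!(
        cap.resolve("logs/app.txt"),
        Some(PathBuf::from("/sandbox/data/logs/app.txt"))
    );
    // Escape attempts: rejected, no matter how they are spelled.
    assert_eq!(cap.resolve("../etc/passwd"), None);
    assert_eq!(cap.resolve("a/../../b"), None);
    assert_eq!(cap.resolve("/etc/passwd"), None);
    println!("all capability checks passed");
}
```

A real WASI implementation does this resolution in the host (plus symlink handling, which this sketch omits), so the guest cannot even name a file outside the directories it was handed.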
Info
Channel: Code Sync
Views: 4,329
Keywords: Lucet, WebAssembly, Infrastructure, Open source, Edge computing, Tyler McMullen, Code Mesh LDN, Fastly
Id: QdWaQOgvd-g
Length: 36min 30sec (2190 seconds)
Published: Wed Feb 05 2020