Bringing WebAssembly outside the web with WASI by Lin Clark

Captions
Hi, I'm Lin Clark, and I make code cartoons. I also work on developer technologies at Mozilla. If you aren't familiar with our team, we work on the Rust core language, on WebAssembly, and on the Rust-to-WebAssembly toolchain. And if you aren't familiar with WebAssembly, it started as a way to run programming languages other than JavaScript on the web, so languages like Rust and C++, and C as well. But lately our team has been pioneering a lot of the work on a different application of WebAssembly: WebAssembly outside-the-browser use cases.

So what is that? Let me set the stage a little bit here and explain what WebAssembly outside the browser means. With WebAssembly outside of the browser, we're taking WebAssembly beyond just interoperability with the web platform and on to interoperability with all of the things. So, for example, you could run a WebAssembly module using rich APIs with high-level types like strings and objects when running your code in Python or Ruby or PHP, scripting languages that are running in their own runtimes. Then you could turn around and take the very same module and use it to talk directly to the host or the operating system using those same high-level types, even though the types that are used on the operating system are different from the types in something like the Python runtime. And then you could use those same high-level types when talking to a WebAssembly module written in a different source language that has different types, so, for example, having a WebAssembly module written in Rust talking to one written in Go.

Now why would you want to do this? Why would you want to use a WebAssembly module in all these places? A few reasons. If your app is in a scripting language like Python, then WebAssembly could be much faster. You could get near-native performance without the hassle of compiling a native extension. If your app is in a low-level language like
C++, then WebAssembly can give you lightweight sandboxing. The module can't access memory or resources unless they've been given directly to it, so this can help make reusing code a lot more secure. And for both scripting languages and low-level languages, being able to reuse code from any language ecosystem without having to rewrite it in your language can help you work faster and also make your maintenance costs a lot lower. If there were a way to run the same WebAssembly module across all of these different environments, that would unlock a lot of wins.

So that's something that we've been working on with something called the interface types proposal. I wrote a little bit more about this in a blog post on the Mozilla Hacks blog a few weeks ago, so if you want to dive into the details you can read that post. But right now I just want to show it to you in action.

So now I'm going to do a demo. But I have to say, I have a healthy fear of the demo gods, so I've actually pre-recorded everything here. Now we need a scenario, and the scenario I'm going to use is this: let's say that we want to build a tool, a build tool for taking Markdown files and turning them into static websites. For that I need a Markdown parser. And to show this off, I need a module that actually uses interface types, so we're going to go ahead and create one.

I'll find a Markdown parser written in a language that can be compiled to WebAssembly, so something like Rust. Now, since I'm not the author of this module, I can't add the interface types directly to the module, so I'm going to wrap it. But if you were actually coding your own module, you could just do this in your module's code. So I'll create a render function which uses the string types, and a wasm-bindgen annotation above it. This does all of the magic. It knows the various Rust string types and knows that they should map to WebAssembly's high-level string type. Now, that's not part of the WebAssembly spec; that's part of the interface types
proposal. Then I'm going to compile it, and I'll do this using a tool called wasm-pack. I'll use the wasm interface types flag; because this is still an experimental feature, we need to do this to let the compiler know that we're using this feature. This gives us a single wasm file to use in all of these different environments.

Now, for our first environment, let's go with pure WebAssembly. For this we need a WebAssembly runtime that comes without JavaScript, so one that can easily be run outside of the browser, and that's what our Wasmtime runtime is. So we'll download Wasmtime from wasmtime.dev, and then we can run this module and pass in the Markdown string. And you see the WebAssembly module took the Markdown string and returned the HTML string, even though the runtime doesn't know anything about how Rust strings work. They were able to communicate with each other using this high-level type.

So that was easy and straightforward. But what about Python? Can we use this Markdown parser there? Yes, and we may want to for speed. To do this, we download the wasmtime extension. This makes it possible for Python modules to call WebAssembly functions. Now all I need to do is import the extension, and then I can import the Markdown module, and then I can call the render function. Now we run this, and again it works. The types are different this time, since we're passing in Python values, but it still just works because of the magic of interface types. The same file runs in the same way.

We can also use the same WebAssembly module in Rust. Now, one reason you'd want to use it here is for that lightweight sandboxing that I was talking about before, which isolates this third-party module from the rest of your application. So let's talk about how this works. We add wasmtime-rust as a dependency, and this does the same thing that the Python extension did before: it gives us something that we can run WebAssembly in. Then in the main file we add the wasmtime-rust macro and a trait and a render method, but we aren't going to add an implementation of this render method. That's actually what the WebAssembly module is. Instead, what this tool does is wire this up for us so that Rust knows that the implementation is in the WebAssembly module. It also adds other methods on that trait, like load file. Load file means that we can instantiate this WebAssembly module in our code, and then we can call render. And something that's important to note here: the result is strongly typed. It can be used exactly the same way as a natively compiled version of this same functionality. So now let's use cargo to build and run it. Again, it just works, except with a different environment using different types.

Now, this might not seem impressive because we compiled the original module from Rust, but it will work just as seamlessly if the wasm module were compiled from C++ or Go, as long as that module were using interface types.

Where else can we make this work? Well, I don't have enough time to show you, but this also already works in Node and on the web through wasm-bindgen. So there's the same module, compiled to WebAssembly and running in five completely different languages and runtimes. And those are just a few examples. There's no reason why this can't be supported in a whole bunch more language runtimes. And beyond that, there's no reason why this can't work for talking directly to operating systems, and this is where we get to what I really came to talk about.

So would we want to have WebAssembly modules talking directly to the operating system? Well, it turns out the answer to this question is yes, as we figured out this spring when we announced the standardization of a new system interface, WASI, for doing just that. There's been a huge uptick in interest since then, with people like the co-founder of Docker, Solomon Hykes, saying that if this tech had been around 10 years ago, they wouldn't have needed to create Docker. So
let's dig into this more. Why would you want to have WebAssembly modules talking directly to the operating system? Well, I already talked about a couple of reasons. There's the portability, which I demonstrated with the same module running across all of these different environments; I'll be talking about that more throughout the talk. I'll also be talking more about the security, that lightweight sandboxing that I mentioned before. So WebAssembly is portable and secure, and we'll talk about why these two things are essential in the rest of the talk.

But I also mentioned the speed, and that's another big reason why you'd want to use WebAssembly. WebAssembly is fast. If you want to run code in a safe, portable way, it would be hard to find anything faster. It gives you the ability to run code at near-native speeds. And it's not just that the code runs fast. WebAssembly code can also give you fast startup times, and WebAssembly VMs can start up more quickly than other VMs, like JavaScript VMs. So, for example, Fastly is using WebAssembly to handle requests for their serverless platform, and they do this by spinning up a new virtual machine instance on every single request, which gives them a lot of security. They can instantiate WebAssembly modules in under 60 microseconds. For comparison, if you're instantiating a JS module in a runtime like V8, it takes about five milliseconds.

WebAssembly is also scalable. It takes a lot less memory to run a WebAssembly VM than it does a JavaScript VM. I mentioned Fastly before: their WebAssembly runtime only requires a few kilobytes of memory overhead, versus tens of megabytes with something like V8. This means that they can fit tens of thousands of simultaneously running programs in the same process, as opposed to hundreds with V8. So that's why you would want to run WebAssembly outside of the browser.

Now let's look at what exactly WASI is. When I was introducing WebAssembly, I talked about how it was an assembly
language for a conceptual machine, not for an actual physical machine. This is why it can run across a bunch of different machine architectures. WebAssembly is very close to the assembly language that most machines use, so the runtime only needs to do a small translation from WebAssembly to the actual assembly language specific to the machine that the code is running on. It goes from the assembly language for the conceptual machine to the assembly language for the actual machine. Just as WebAssembly is an assembly language for a conceptual machine, WebAssembly needs a system interface for a conceptual operating system, not for any single operating system. This way it can be run across a bunch of different kinds of operating systems. And this is what WASI is: the system interface for the WebAssembly platform.

Now, who's working on this? WASI is being standardized by the WebAssembly Community Group. We've chartered a subgroup to work on it, and you can find out more in the WebAssembly/WASI repo. So if you want to get involved in this, you have a few options. You can join the CG and work on standardization, or you can start contributing to one of the implementations, such as Wasmtime, the WebAssembly runtime that we're working on at Mozilla. Or, if you have an existing project that can use WASI, feel free to get in touch and we can talk about that. So, for example, we've been talking with the Node folks about how they could possibly use WASI to make native module development a lot easier and safer.

So now that we've covered all of that, let's talk about WASI in more depth. Some of you may already know how operating systems work, but I like to cover things from the ground up to make sure that we're all on the same page, so I'm going to give you a few operating system basics. Many people talk about languages like C giving you direct access to system resources, but that's not quite true. These languages don't have direct access to do things like open or create files on
most systems. Why not? Because these system resources, such as files and memory and network connections, are too important for the security and the stability of your code. If one program unintentionally messes up the resources of another, then it could crash the program. And even worse, if a program or user intentionally messes with the resources of another, it could steal sensitive data. So we need a way to control which programs and which users can access which resources.

People figured this out pretty early on, and they came up with a way to provide this control, which is protection ring security. With protection ring security, the operating system basically puts a protective barrier around the system's resources, and this is the kernel. The kernel is the only thing that gets to do things like creating a new file or opening a file or opening a network connection. The user's programs run outside of this kernel, in something called user mode. If a program wants to do anything like open a file, it has to ask the kernel to do it for it.

This is where the concept of the system call comes in. When a program needs to ask the kernel to do one of these things, it asks using a system call. This gives the kernel a chance to figure out which user is actually asking, and then it can see if that user has access to the file before it opens it. On most devices, this is the only way that your code can access the system's resources: through these system calls.

It's the operating system that makes these system calls available. But if each operating system has its own system calls, wouldn't you need a different version of the code for each operating system? Well, fortunately you don't. How is this problem solved? Abstraction. Most languages provide a standard library. While coding, the programmer doesn't need to know what system they are targeting; they just use the interface. Then, when compiling, your toolchain picks which implementation of the interface to use based on which system you're
targeting. This implementation uses functions from the operating system's API, so functions that are specific to the system, and this is where that system interface comes in. So, for example, if printf is being compiled for a Windows machine, it can use the Windows API to interact with the machine, and if it's being compiled for Mac or Linux, it will use POSIX instead.

This poses a problem for WebAssembly, though. With WebAssembly, you don't know what kind of operating system you're targeting, even when you're compiling, so you can't use any single operating system's system interface inside of the WebAssembly implementation of the standard library. As I mentioned before, this is why WebAssembly needs that conceptual operating system, not one single operating system.

But there are already runtimes that can run WebAssembly outside of the browser, even without having this system interface in place. How do they do it? Well, let's take a look. The first tool for producing WebAssembly was Emscripten. It emulates a particular operating system's system interface, POSIX, but for the web. This means that the programmer can use functions from the C standard library, libc, in their code. To do this, Emscripten created their own implementation of libc for the web platform. This implementation was split in two: part of it was compiled into the WebAssembly module, and the other part was implemented in JS glue code that the WebAssembly module talked to. The JS glue code would then call into the browser, and the browser would call into the operating system.

Most of the early WebAssembly code was compiled with Emscripten, so when people started wanting to run WebAssembly without a browser, they started by making Emscripten-compiled code run. So these runtimes that were running this code needed to create their own implementations for all of these functions that would have been in the JS glue code. There's a
problem here, though. The interface provided by this JS glue code wasn't designed to be a standard, or even a public-facing interface at all, because that wasn't the problem it was solving. So, for example, for a function that would be called something like read in an API that was designed to be a public interface, the JS glue code instead used _system3, with the parameters which and varargs. The first parameter, which, is an integer which is always the same as the number that's in the function name, so 3 in this case. The second parameter, varargs, is the arguments to use, and it's called varargs because you can have a variable number of them. But WebAssembly doesn't actually provide a way to pass in a variable number of arguments to a function, so instead the arguments are passed in via linear memory, so basically via the heap. Now, this isn't type safe, and it's also slower than it would be if the arguments could be passed in using registers.

That was fine for Emscripten running in the browser, but now runtimes are treating this as a de facto standard, implementing their own versions of this JS glue code. So they're emulating an internal detail of an emulation layer of POSIX. This means that they are reimplementing choices, like passing arguments in as heap values, that made sense based on Emscripten's constraints, even though these constraints don't apply in their environments. If we're going to build a WebAssembly ecosystem that lasts for decades, we need a solid foundation. This means our de facto standard can't be an emulation of an emulation.

But what principles should we apply? This is where we come back to these two important principles that I've talked about before and which are baked into WebAssembly: portability and security. We need to maintain these key principles as we move to outside-the-browser use cases. As it is, POSIX and Unix's access control approach to security don't quite get us there, so let's look at where they fall short. We'll start with
portability. POSIX provides source code portability. That's what the P in POSIX is for: portable. You can compile the same source code with different versions of libc to target different systems. But WebAssembly needs to go a step beyond this. We need to be able to compile once and run across a whole bunch of different machines, so we need portable binaries. This kind of portability makes it much easier to distribute code to users.

Now let's look at security. When a line of code asks the operating system to do some input or output, the operating system needs to determine if it's safe to do what the code asks. Operating systems typically handle this with access control that's based on ownership and groups. So, for example, the program might ask the operating system to open a file, and it'll ask on behalf of the user who started the program. That user has a certain set of files that they have access to, either because they own them or because they're part of a group that has access. So when the user starts the program, the program runs on behalf of that user, and if the user has access to the file, then the program has access to that file.

This protects users from each other, and that made a lot of sense when early operating systems were developed. Systems were often multi-user, and administrators controlled what software was being installed, so the biggest threat you had was that other users would take a peek at your files. That's changed, though. Systems now are usually single-user, but they're running code that pulls in lots of other third-party code of unknown trustworthiness. So now the biggest threat is that the code that you yourself are running is going to turn against you. For example, let's say that you have a library that you're using in your application, and it gets a new maintainer, as often happens in open source. That maintainer might have your best interest at heart, or they might be one of the bad guys. And if they have access to do anything on your system, for
example, open any of your files and send them over the network, then their code can do a lot of damage. This is why using third-party libraries that can talk directly to the system can be so dangerous.

WebAssembly's way of doing security is different. WebAssembly is sandboxed by default. This means that code can't talk directly to the operating system. But then how does it do anything with system resources? Well, the host, which may be the browser or may be a runtime like Wasmtime, puts functions into that sandbox that the code can use. This means that the runtime can limit what a program can do on a program-by-program basis. It doesn't just let the program act on behalf of the user, calling any system calls with all of the user's permissions. Now, just having the mechanism for sandboxing doesn't make a system secure in and of itself. Hosts can still choose to put all of those syscalls into the sandbox, in which case we're no better off than we were before. But it at least gives hosts the option of creating a more secure system.

In any system interface that we design, we need to uphold these two principles. Portability makes it easier to develop and distribute software, and providing the tools for hosts to secure themselves or their users is an absolute must.

Now, given those two key principles, what should the design of the WebAssembly system interface look like? Well, that's what we're figuring out through the standardization process. But we did start it with a proposal, and that is to create a modular set of standard interfaces and start with standardizing the most fundamental ones. So what will be in these most fundamental WASI core modules? Well, in our first proposal we said that there would be this WASI core module that would contain the basics that all programs need. Now, this has changed a little bit. We've actually whittled that down considerably; I'll get to that in a moment. But fundamentally the early WASI modules will cover
much of the same ground as POSIX, including things such as files, network connections, clocks, and random numbers. And WASI will take a very similar approach to POSIX for many of these things. So, for example, we'll use POSIX's file-oriented approach, where you have things like open, close, read, and write, and the rest of the system calls are built as augmentations on top of those. But WASI core, these early WASI modules, won't cover everything that POSIX does. So, for example, the process concept doesn't map very easily onto WebAssembly. And beyond that, it doesn't make sense to say that every WebAssembly engine needs to support process operations like fork. But we also want to make it possible to standardize fork. This is where the modular approach comes in. This way we can get good standardization coverage while still allowing niche platforms to use only the parts of WASI that make sense for them.

And we're actually making it even more modular than we talked about in the blog post. The syscalls for each of these things, so random numbers, clocks, file system, will all be broken out into their own modules, and WASI core will just be the mechanism for loading these modules and a few other infrastructure things. This way your runtime can really pick and choose what it wants to use. So an IoT device without a file system could implement the random number and clock syscalls without having to worry about the file system syscalls.

Now, how will module developers use WASI? Languages like Rust will use WASI core and these other WASI modules directly in their standard libraries. So, for example, Rust's open is implemented by calling WASI's path_open when it's compiled to WebAssembly. This means that the module developer doesn't have to know any WASI syscalls. They just use the standard library and it gets compiled in for them. For C and C++, we've created a WASI sysroot that implements libc in terms of these WASI functions, and we expect compilers like Clang
to be able to use this, and complete toolchains like Rust and Emscripten to use WASI as part of their system implementations. Rust actually already does have this support.

How does the compiled code then actually call these WASI functions? Well, the runtime that's running the code passes the WASI core functions in as imports. This gives us portability, because each host can have their own implementation of WASI core for their platform, from runtimes like Wasmtime to Node or even the browser. It also gives us sandboxing, because the host can choose which WASI core functions to pass in, so which system calls to allow, on a program-by-program basis. And this preserves that security.

But WASI actually gives us a way to push the security even further. It brings in more concepts from capability-based security. Traditionally, if code needs to open a file, it calls open with a string, which is the path name of the file it wants to open, and then the operating system does the check to see if the code actually has permission, based on the user who's running it, to open the file that this string identifies. With WASI, if you're calling a function that needs access to a file, you actually have to pass in a file descriptor, which is basically an object that has permissions attached to it. This could be a file descriptor for the file itself or for a directory that contains that file. This way you can't have code that randomly asks to open /etc/passwd. Instead, the code can only operate on the directories that are directly passed in to it.

This makes it possible to safely give sandboxed code more access to different system calls, because what those system calls can do can be limited. And this happens on a module-by-module basis. By default, a module doesn't have any access to file descriptors. But if code in one module has a file descriptor, it can choose to pass that file descriptor to functions it calls in other modules, or it can create more limited versions of the file
descriptor to pass in to other functions. So the runtime passes the file descriptors that the application is allowed to use in at the top level, and then that gets propagated down through the rest of the code on an as-needed basis. This gets WebAssembly closer to the principle of least authority, where a module can only access the resources that it really needs in order to do its job.

These concepts come from capability-oriented systems like CloudABI and Capsicum. One problem with capability-oriented systems is that it's often hard to port code to them, but we think that this problem can be solved. If code already uses system calls like openat with relative file paths, compiling the code will just work. If code uses open, and migrating to the openat style is too much of an up-front investment, WASI can actually provide an incremental solution. With libpreopen, you can create a list of file paths that the application legitimately needs access to, and then you can use open, but just with those file paths.

So what's next? Well, we think that WASI core is a really good start. It preserves WebAssembly's portability and security, providing a solid foundation for an ecosystem. But there are still questions that we're going to need to answer as we develop this standard. Those questions include things like asynchronous I/O, file watching, and file locking. We also need a really solid runtime implementing these foundations in the right way, something that can serve as the base for many applications and many hosts. For this we have Wasmtime, which is what you saw in the demonstrations earlier. It's a standalone WebAssembly runtime, and it's intended to be very pluggable so that you can build custom-tailored hosts on top of it. We think it can be the best-in-class runtime for things from IoT devices all the way up to the cloud, and we're looking forward to talking to people who think that it can help with their use cases. So if you think it might help with something that you're
working on, please come and find me; I'm happy to talk more about all of this. I want to say thank you to all of the people on the team who are driving these exciting developments, both with interface types and with WASI and Wasmtime. And I want to say thank you for listening. [Applause] [Music] [Applause]
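
The capability model described in the talk, where code receives a directory handle instead of naming arbitrary paths, can be sketched in a few lines of Python. This is only an illustration of the idea on a POSIX system, not how WASI or Wasmtime implement it. The PreopenedDir class and the render_title function are hypothetical names invented for this sketch, and the path check ignores symlinks, which a real runtime would also have to handle.

```python
import os
import tempfile


class PreopenedDir:
    """Hypothetical capability handle: grants access only beneath one directory."""

    def __init__(self, path):
        # The host opens the directory once; the descriptor is the capability.
        self._fd = os.open(path, os.O_RDONLY)

    def open_text(self, relative_path):
        # Reject absolute paths and ".." components so the handle cannot be
        # used to climb out of the granted directory. Symlinks are NOT
        # checked here; this is an assumption-laden sketch, not a sandbox.
        parts = relative_path.replace("\\", "/").split("/")
        if os.path.isabs(relative_path) or ".." in parts:
            raise PermissionError(f"outside granted directory: {relative_path}")
        # openat-style call: the path is resolved relative to the descriptor.
        fd = os.open(relative_path, os.O_RDONLY, dir_fd=self._fd)
        with os.fdopen(fd, "r") as f:
            return f.read()

    def close(self):
        os.close(self._fd)


def render_title(docs, name):
    """Stand-in for sandboxed code: it can only use the handle it is given."""
    return docs.open_text(name).strip().upper()
```

Here render_title plays the role of the sandboxed module. It never gets to name a location on its own, only a path relative to the handle the host chose to pass in, so a request for /etc/passwd or ../secret is refused before it reaches the operating system.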
Info
Channel: Codegram
Views: 67,001
Rating: 4.9025445 out of 5
Id: fh9WXPu0hw8
Length: 31min 34sec (1894 seconds)
Published: Fri Sep 06 2019