"What can't WebAssembly do?" - Katie Bell (PyCon AU 2023)

Captions
So today we're going to talk about WebAssembly, and in particular about CPython in WebAssembly. Who uses CPython here? Yeah. Who uses a different implementation of Python? Yeah, so mostly CPython people; most of us use CPython. But in order to really understand what it means to run CPython in WebAssembly, we're going to dig a little bit more into CPython itself and how it works, and in particular: how can I write code that runs on any machine?

I have here an M1 Mac laptop, so it has a different processor to the Windows and Linux machine that I run at home, and when I run stuff in the cloud I usually run it on Linux. So I'm dealing with different operating systems and different CPU architectures, and there are two major approaches to making code work across platforms.

The first approach, which you are all familiar with, is to use some kind of interpreter or engine, like the Python interpreter. If I'm writing code in Python, I can write it once, and as long as I'm running it on a machine that has Python installed, I can run the same code anywhere and it should just work. So take a little program like this. It takes the current local time, formats that time, writes a little message into a file called time_message.txt, and prints to the console. At each of these steps it's interacting with the operating system: it needs to go through the operating system to get the current time, to write to a file, and to write to standard out. We can write this code in Python and Python handles that for us, so we can write the code once, and even though the operating systems are different, the file systems are different, and you're running it on a different machine, this code will work in all of those places. But that is because there is a
different version of Python for each one of these different platforms. There's a Python compiled for Windows and a Python compiled for macOS; in fact the macOS one has two Pythons, because there are two different CPU architectures to account for.

So CPython itself takes the other approach. Approach two is to have some code and compile a different version of that code to run on each different CPU and operating system. We have C code, which is CPython itself, and we compile it to different executables for these different machines, which means we need to have several different versions of Python running around out there. The way that Python does this is mostly by using the C standard library. You can write C code against this standard library, and as long as there's an implementation of that standard library for each operating system, you can compile CPython. So there's one set of code for CPython, and then it's compiled into different versions for different operating systems and platforms.

Both of these approaches have some drawbacks. Either you need to have your language-specific interpreter or engine on the machine that you're trying to run on (Python code on a mobile device probably doesn't have Python installed, so you'd need to bundle the interpreter with it, and then you'd need a different version for different platforms and operating systems, because you'd have a different compilation of the Python interpreter), or you're writing code in C and compiling it for different platforms, and even then the C standard library and the different operating systems are quite different anyway. So you need to write code like this. This is the CPython open-file implementation, and it has a bunch of different cases: oh, if it's Windows I have to do this other stuff; oh, if this particular function is there I do this, otherwise I don't. And so both of
these approaches take a bit of work to make them work across platforms.

So this talk is largely aspirational. I have a dream. One day in the future I want to be able to write in any language, not just Python, whatever programming language I want. I want to be able to build it into one thing, one bundle, one package, and have that package work on any machine. Who would like this to be the case? Yes, right? Yes, computer Esperanto, that's a good way to put it. I want to be able to choose my language and not have to worry: if I write Java, I have to make sure there's a JVM there; if I write JavaScript, I need to have a JavaScript engine; if I write in Rust, I need to compile different versions for every platform. I don't want to have to deal with any of that. I want it all to just work, which would be great.

I actually have a second dream as well; this is a very aspirational talk. To explain my other dream I need to ask you all a question. We're going to do a show of hands, and I need you to answer honestly: do you ever run code locally that you don't really trust? Yes, okay, that's a lot of hands. I also do this. I might clone some git repo from some random internet stranger and then build and run that code. I might pip install some package I've never heard of before. What was that? Oh yeah, I forgot sudo. There's a limit to how much damage this can do if you don't use sudo; if you do use sudo, all bets are off. The bash one, running an install script, is from the Homebrew installation instructions: oh yeah, let's just run this script in my terminal.

I have done all of these things, and this is risky behavior that we all engage in. These programs don't even need sudo access to, say, read the folder where your SSH
keys are, or maybe your AWS credentials are sitting in a credentials file there, or maybe you have API access tokens in your environment variables. You're running programs all the time where, if there was malicious code in there, it could do some damage.

So my second dream is that I want to be able to easily limit a program's access to just what it needs and nothing else. Very fine-grained permissions: yes, you can write files, but only to this folder; or yes, you can connect to the internet, but only on this particular port or to this particular address.

We kind of already have this, a little bit. You already run untrusted code in a reasonably secure way every day, and I'm not talking about people here who are very security conscious and run everything in a virtual machine. You all run untrusted code securely every day. How do you do that? Browsers, yes, exactly. Web browsers are great; they're basically untrusted-code-running machines. You go to some random URL and the browser pulls in all this code and just runs it. It is sandboxed, it is secure, it is built in a way that that code can't wreak havoc on your machine, and that's really cool. "But not all apps are web apps, Katie." Yes, not all apps are web apps. We want to be able to run other things but still get those same kinds of advantages.

So cast your minds back to 2013. The Harlem Shake was a thing, if you remember that, and Mozilla introduced, not WebAssembly, but asm.js. Who's heard of asm.js? Hey, lots of people, that's great. It's essentially a weird subset of JavaScript. You could take JavaScript and label certain functions with "use asm", and that would signal to the browser that this is not normal JavaScript, this is heavily optimizable JavaScript. It doesn't use normal heap variables, it stores the heap in a byte array of its own, and it enforces the types on various things. Essentially it's changing
JavaScript into a set of functions that can be optimized really, really efficiently for the common features that most CPUs have. Dealing with 32-bit integers is a lot easier than dealing with JavaScript numbers, which might be an integer or might be a double; we don't really know until we run it. So this was able to run extremely fast in the browser.

Now, this is really ugly code. You do not want to be writing this code yourself; it was intended to be a compilation target. We could write code in C or some other language and compile it into JavaScript, and it would run slower than it normally would on the machine, but actually reasonably fast. And this is how the first version of CPython in a browser worked. You could compile CPython and run it as JavaScript code in the browser, and it was maybe two or three times slower than running it normally, but it was actually usable. So if you think running Python in the browser is a new, exciting thing, it's actually been done for 10 years now.

But this had some downsides. In particular, you were trying to write this heavily optimizable JavaScript while still maintaining backwards compatibility: just in case the browser didn't have the optimizations built in, it still had to work. Trying to make it fast and also compatible with JavaScript is quite difficult. But the bigger problem was the really, really huge piles of JavaScript that you would have to download to be able to run it. So CPython wasn't super usable in this form, because you had this huge download size just to run CPython in the browser.

WebAssembly was invented as they started to discuss whether we could create some kind of compact binary format for this JavaScript code, and once you're breaking compatibility with JavaScript you might as well invent a whole new system anyway. So the first version of WebAssembly was built on what was learned
from asm.js. If you hear me say Wasm and WebAssembly, those are interchangeable; it's the same thing. It's not an acronym, Wasm is just short for WebAssembly.

So we can take code in these various different languages and compile it to a little .wasm binary file, which is the compact format. It's very fast to read and load, and it has a very fast startup time. This code will run in a browser, in the browser's WebAssembly engine. This is a standard across all browsers, all browsers support it now, and the binary files are way smaller than they were for asm.js, which is great.

WebAssembly has a text format as well. You don't usually write this, because again you just compile other code to WebAssembly, but it's used for debugging, for learning, and to explain what WebAssembly is actually doing under the hood. Now, this is not actually an assembly language. It is a stack-based virtual machine. It is its own kind of interpreter, but one that is intended to be incredibly optimizable for what most CPUs are capable of, and it's getting faster all the time as they add more functionality for SIMD and different CPU instructions that can be used to make things run faster and faster.

This example is just a little function that takes two numbers and adds them together, but you can see it's calling the console.log function. This is because in a browser, JavaScript can call WebAssembly functions, and WebAssembly can call JavaScript functions if it was built to have those functions imported into WebAssembly. So they can call functions back and forth, but they are separate instances: they have separate stacks and separate variables, and passing large amounts of information between the two means copying it from one to the other.

Okay, so back to Python. We want to make CPython work in WebAssembly, and I'm going to preface this: part of the point of this talk is that there
is more than one way to compile CPython to WebAssembly, and it means very different things in different circumstances. So we're going to go through two different ways to compile CPython to WebAssembly that are quite different to each other.

The first one is compiling CPython with Emscripten. If you're used to compiling things in C, you have your C file, you run gcc or something, and it gives you a binary file. Emscripten is designed to drop in there: you just say emcc instead of gcc, and it will spit out a JavaScript file and a WebAssembly file. We're going to use this in a live demo, and this is the Python code that we're going to run. It's the same as before: it reads the local time and writes it to a file, and the file is called time_message.txt.

Okay, live demo time. This is CPython built with Emscripten, running in the browser. The parts here that are built by Emscripten: you take CPython and it spits out this python.js file, which is just a JavaScript wrapper; that's the thing that has the API you can call to start running Python. There is the python.wasm file, which is where most of the C code actually goes. And this is python.data, which is essentially all the files that Python needs to be able to run in the browser; library files and the Python files that are part of the standard library go into that data package. Now, python.worker.js and python.html I largely wrote myself. I wrote the first version and someone else added to it; it's in the CPython repo as a way to test out your Emscripten Python.

So this is our Python code that I pre-loaded in there, which reads the current time and writes the file. If I run the REPL here, I can just use the REPL, like print("hi"), and it works as a Python REPL would. The terminal itself and the text box are all just normal JavaScript web app stuff, and then it's calling into the WebAssembly to
run Python. Okay, so I can stop that running. Python in this case is running in a web worker, so it's actually running in a separate thread.

If I clear this and run the Python code, it just says "message written to file". Okay, we're in a browser. Where is that file? What file did it just write? What is going on here? We can see a little of what's going on if we delete this and say import os, and then os.listdir, if I'm typing this correctly. Oh, I need to print that, that would make sense. Run. Okay, we do have a file system. There are some folders there, there's tmp and home, there's also a main.py, which was the code I was running before, and time_message.txt is there. So we could say f = open of time_message.txt (it's just at the top of my root directory, I guess), then print f.read(), and we run that. It says the current time is 10:51. So you can tell this is a live demo; well, it was 10:51 when I wrote the file. The string-format-time part isn't very consistent across platforms, so the AM/PM part didn't work, but hey, we have written a file.

So what is going on with this file that we wrote? Cast your minds back to when we were talking about C. CPython calls functions in the C standard library, and that C standard library works differently on different operating systems. Emscripten provides a JavaScript implementation of the C standard library, among other things, and that implementation by default uses an in-memory file system in the browser. So the file system that I'm reading and writing to here is just in memory in this browser tab; if I refresh the tab, everything in it will be gone.

So what can't WebAssembly do, at least with Emscripten running in the browser? Now, we can
get random numbers, we can get the system clock time, we can do quite a lot of different operating-system-y things. Reading and writing the local file system: well, it's got this in-memory file system, so you can kind of simulate that, but it's not actually the files on my local machine, because browsers can't do that. Now, there is a new web API that lets you access local file systems. It pops up a prompt and you give it permission to a particular directory. You can use that from Emscripten, but it has an async-only API, because that JavaScript API is async, so you can't do synchronous file system reads and writes to it. So it is possible to access the local file system, but it's quite difficult, and from CPython you currently can't. Also, sockets and networking generally aren't supported in a browser anyway. So these turn out to be mostly limitations of the browser, and of what you can and can't do in a browser, rather than limitations of what you can and can't do with WebAssembly itself.

The cool thing, though, is that Emscripten does support SDL2, so you can do graphical interfaces, audio, joystick, keyboard and mouse inputs, and have that all work. pmp-p on GitHub has done amazing work to support pygame in the browser, so you can take pygame games and run them in the browser, because Emscripten supports SDL2, which happens to be what pygame uses. If there's a different game engine that doesn't use SDL, yeah, good luck, but pygame happens to work. So I highly recommend checking out pygame-web.github.io for a bunch of different pygame games running in the browser.

The other shout-out goes to Pyodide, which is much easier to use than trying to compile CPython with Emscripten yourself. It's packaged up, and it's very easy to integrate Python into a web app using Pyodide. It actually supports installing a bunch of pip packages as well, including packages with native code. pandas, NumPy, lots of the data science packages
will just work in a browser using Pyodide. People have made Jupyter notebooks that run entirely in the browser without needing a server. It's very cool.

Okay, going back to the two dreams that I had. I want to be able to write in any language, compile it once, and have it work on any machine or operating system: sort of yes, I guess, if it's in a web browser it works. I want to easily limit a program's access to just what it needs and nothing else: well, also sort of. It's limited, it's safe, it's secure, but I also can't access a whole bunch of things that I might actually want my code to access. So it's kind of halfway there, not quite there yet.

WebAssembly was built as a web standard for browsers to use. But once it was built, once it was standardized, once people started using it, I started to think: well, WebAssembly is standardized. It's language independent; you can run basically anything on it, because you can compile to it. It is cross-platform, and we have implementations of it for several different platforms already. It's sandboxed and secure, and it's very close to native speeds, depending on what you're doing. What if we had a standalone WebAssembly engine that we could run anywhere, including outside of a browser? Wouldn't that be cool? We could just build stuff for WebAssembly and run it anywhere.

And these things exist, but you can't just run WebAssembly by itself if it can't do anything like read and write the file system or access the network. It still needs to be able to do those things, and if we're going to run it everywhere we need some kind of standardized API for doing them. Hence there is a new standard, WASI, the WebAssembly System Interface. This is a standard that is in active development. There's currently preview zero and preview one, and they're working on preview two, but people are already using it. So I could have a WebAssembly program, and I have this API that lets me access common
operating system things. It's not the same as the POSIX API, because we don't want to deal with all the legacy things, but it kind of follows that a little bit as well. So I can read files, write files, get the system time, and access the network, all going through this WASI API. And there's an implementation of the C standard library on top of the WASI API, so I can use the C standard library in my normal C code, and when I build it for WebAssembly it's built to use the WASI API underneath.

Wasmtime is a standalone WebAssembly engine that supports the WASI interface. So in theory I could just say wasmtime python.wasm and I'd be able to run Python like this, and run whatever Python code I wanted. And this python.wasm would work on any machine, as long as I have Wasmtime or a different WebAssembly engine installed on that machine, which would be cool. But it doesn't work quite as easily as this.

If I want to build Python for WASI, it's a little more complicated than building for Emscripten. It hasn't had as much work put into making it just work out of the box; it's very new. So you have to build with Clang, which supports WebAssembly and WASI out of the box, but you also need to provide it with the WASI implementation of the C library, and any other C libraries that you want to use have to be compiled to WebAssembly and then also compiled in. But what you get out is just a Wasm file, just python.wasm, which you can then run on the command line with Wasmtime.

But if I actually run python.wasm with Wasmtime like this, I get this whole giant error message coming out. Can anyone tell me what's going on here? Can anyone debug this for me? "I don't have my libraries." Exactly, perfect. In this example I do actually have the
libraries. The libraries are there in a folder, but python.wasm cannot access them. And this is because (oh, I changed the slide, it doesn't work) the WASI API is built with security and sandboxing in mind from the beginning. When you're using the WASI API there's a set of capabilities that you can grant to a WebAssembly program when you run it. If it needs to read the file system, you have to tell it: hey, you have access to this directory, or this specific file, and nothing else. If you want to read or write a particular file, you grant it access. If you want to access a network, you have to grant that as a permission to the WebAssembly program that you're running.

So if we do this, we tell it where the Python path is, and this mapdir is saying that the current directory, which is where all of my CPython code and all of my libraries are, should be at the root directory when I'm running in Wasm. Then I can grant Python access to all the libraries that it actually needs. If you have a program that doesn't need access to a bunch of files, this is much easier and you don't have to do this, but Python needs access to the library files or it can't run. But if I have the folder with the library files and my python.wasm build of Python, I can take that and run it on any machine that has Wasmtime, or a different WebAssembly engine, installed.

So I kind of have this, right? It's early days, the standard is still under development, and there are lots of things that it can't do yet. But I can write in any language, I can compile it to a Wasm file (there are also Wasm Docker containers as well, which would then bundle the files in), and then I can run that in a way that's secure and sandboxed and limited to what it needs to be able to do on my machine and nothing else. So maybe once this becomes a
little bit more streamlined, a little bit easier to use, I will be able to fulfill both of my aspirational dreams, which is very cool.

So where are we at the moment? What can't WebAssembly do? Emscripten has a bunch of limitations, but WASI can read and write the local file system, and it can do sockets and networking. I forgot to mention before: in Emscripten you can kind of do sockets and networking, but it needs a proxy server that it opens a WebSocket to, and then the proxy server actually handles all the network stuff. I've not managed to get this to work for myself, even with an example; I sort of got it to work a little bit, but not really. It doesn't work in CPython; that part hasn't been accomplished yet. You would need a proxy server anyway for it to run in a browser. But WASI does support sockets and networking.

If you're using other C libraries, especially if they're operating-system-specific things, they're unlikely to work in WASI. SDL2 doesn't work in WASI. If you have some kind of C program that you're trying to make work in WebAssembly, have a close look at exactly what the WASI API supports and whether you need anything else. Sometimes it just works: when we were trying to add support for zlib and checking whether that worked in Python with WASI, because Python depends on zlib to do zipping, you could just compile it for WASI and link it in, and it just kind of worked, which was amazing. That's because it doesn't need any operating system access. But if you do need to do more operating system things, then the libraries probably have to explicitly support WASI, at least until the WASI API develops support for them.

Okay, where can we find WebAssembly? It's currently in use in a bunch of different places. In particular it's used on the web; you've probably used it in websites you've visited. Figma uses it, and the new Photoshop running in the
browser makes quite extensive use of WebAssembly to port some of their Photoshop stuff to the browser as well. There's this really cool example where someone has built an entirely emulated 32-bit x86 machine that runs in the browser in WebAssembly, and you can load whatever operating system image you want onto it. So this is Windows 2000 running in my browser, and that is my own beautiful artwork right there. So lots of WebAssembly is happening on the web.

It's also starting to make headway into the cloud. It has potential; it's not quite in a lot of use yet. Cloudflare supports WebAssembly workers. There's the WasmEdge project; they claim that it will be potentially 100 times faster at startup and 20% faster at runtime than Linux containers. I'm not entirely sure how they're making that comparison. But yes, you get smaller containers, because it's just your WebAssembly binary that you need, not an entire Docker container full of Linux files. So there is potential for this to be huge in cloud computing, because you get this sandboxing without having to run small lightweight virtual machines; the sandboxing is built into the WebAssembly engine.

The third one, which I'm super excited about, is replacing native modules in other runtimes. As an M1 user, I often find that Python packages don't have great support for their native compiled modules. Even though it's Python, there's some C code or something else in there that's compiled, which needs to support every different platform and operating system, and if it doesn't support M1 Macs, because they're relatively new, then I'm kind of stuck. This is why I'm super excited about the next talk, coming up right now, which Jim is giving. Essentially, you don't need to create 27 different wheels for different platforms to have native code in a Python package; you could include WebAssembly instead and just have one package that
works everywhere. So I'm excited about that, and you should stick around for the next talk. Thank you very much. [Applause]
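
The demo script shown on the slides is not captured in the captions. A minimal sketch of what the talk describes might look like the following. The file name time_message.txt and the console message come from the talk; the exact message text and the "%I:%M %p" format string are assumptions made for illustration. The last few lines mirror the REPL steps from the live demo (listing the directory and reading the file back).

```python
import os
import time

# Ask the operating system for the current local time.
now = time.localtime()

# Format the time. The format string is an assumption; the talk only says
# that the AM/PM part was not rendered consistently in the browser build.
message = "The current time is " + time.strftime("%I:%M %p", now)

# Write the message to a file. Under Emscripten in a browser this lands in
# an in-memory file system; natively it lands in the current directory.
with open("time_message.txt", "w") as f:
    f.write(message)

print("Message written to file")

# The follow-up steps from the demo: list the directory, read the file back.
print(os.listdir("."))
with open("time_message.txt") as f:
    contents = f.read()
print(contents)
```

Run natively, this writes to the real current directory. Under the Emscripten build described in the talk, the same calls would go to the in-memory file system, and under WASI they would only succeed if the runtime has been granted access to the directory.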
Info
Channel: PyCon AU
Views: 60,725
Keywords: KatieBell, pyconau, pyconau_2023
Id: JbZAsSzzk0E
Length: 29min 3sec (1743 seconds)
Published: Thu Aug 24 2023