Rust & Zig Combined • Richard Feldman • GOTO 2023

Captions
All right, this is Rust and Zig together. I'm Richard Feldman. I've been working for several years on a programming language called Roc. This is not a talk about Roc, but it sets the stage, because something I didn't know about GitHub is that it actually has analytics for your pages. I found this out because we had a list of analytics for our repository visitors, and one day we got a big spike out of nowhere. Somebody told me about it, and I was like: first of all, I didn't know GitHub had analytics, but now that I do know, why is this happening? Where is this coming from? It turned out that somebody had written a blog post that went viral, and it linked to one of our FAQ entries, which is "Why does Roc use both Rust and Zig?" Apparently a lot of people had questions about this.

How many people here are familiar with both Rust and Zig? Okay, so some, but less than half. One thing these languages have in common is that they're both very fast. If you want to build a program that runs really fast, either one of these languages is a good choice. But they have quite a few differences.

Rust is a very complex language. Its marquee language feature is the borrow checker; you may have heard of this. It's basically a way to help Rust programs have more guarantees, especially around memory safety and certain things around concurrency. It's both the source of a big learning curve and the source of a lot of the benefits of the language.

Zig, by contrast, is a very simple language, which appeals to me. Its marquee feature is called comptime, which is basically where you can write Zig code and have it execute at compile time rather than at runtime. That lets you precompute a lot of things, which is a really nice feature in a high-performance language. It's really simple, and it doesn't have as much of a learning curve as, say, a macro system or a
distinction between const and non-const functions. Zig also has a really nice toolchain, which I'm going to talk a little more about later. For example, you can say: I want to build my Zig program, and I'm on a Mac, but I want to build an executable that runs on Windows, or one that runs on Linux; or maybe I'm on ARM, but I want it to run on an Intel machine. You can do all of that out of the box with Zig. It's really, really good at those things.

Now the question is: why would you actually use these two things together? It seems like they're different ways to approach the same thing. That is exactly why so many people were coming to our FAQ entry; they were wondering the same thing. In a lot of the discussions on this blog post and others comparing Rust and Zig, I saw a lot of things where I thought, "Oh yeah, that's right," and a lot of things where I thought, "That's kind of missing some nuance, or missing the picture." I realized that there's an interesting question here, which is why we do this and why somebody else might want to do this. Along the way, there are also a lot of interesting questions around memory safety that I think deserve a more nuanced discussion than what I've been seeing.

So let's talk about these things. We're going to start by answering the question of why we mixed Rust and Zig in the first place. Then I'm going to talk a little bit about memory safety in practice and some of the nuance that I think is missing from some of these discussions. Finally, I'll conclude with where to draw the line: if you're going to be mixing these two things, where would you want to draw that line?

Okay, so I have to start by telling a little bit of a story. This is called the storekeeper bug. Back in the day, when I got into
programming, I started off with BASIC, then moved on to Visual Basic, and I started making games in Visual Basic. This was around middle school, in the 1990s. At some point somebody told me, "If you're going to make games, what all the professionals use is C++." Not really knowing what C++ was, I was like, "Well, that's what I'm going to use." Then someone said, "If you're going to learn C++, you should really learn C along the way." So I went and learned C, and then kind of learned C++, but honestly the way I was using C++ was mainly just using the C++ compiler in a C style.

So I wrote this role-playing game in quote-unquote C++, but really kind of C. At some point I was walking into town, as I had done many times before in the game, and the storekeeper greeted me as he always did: "Hello traveler," the text being displayed across the screen. And then one time I walked into the store, and instead of saying "Hello traveler," he said "Hello traveler" and a bunch of memory garbage appeared on the screen. I was like, what do I do about this? I know I didn't write that string anywhere in my program. What's happening here?

This is a very scary bug to encounter, especially for a middle schooler, and I was really at a loss as to how I could possibly debug it. To this day I don't remember exactly how I fixed it. I did eventually fix the bug, but as I recall, I gave up on fixing it properly and just went around refactoring other things and cleaning them up, and eventually one of those changes somehow fixed it. But this is an example of a memory safety bug, and these are really scary things. So when I got into industry, I gravitated towards automatic-memory-managed languages, garbage collection and things like that, in part because I didn't want to have to ever deal with that sort of thing
ever again. So, fast-forward a couple of decades: I'm creating this language called Roc. Again, this is not really a talk about Roc, but it is an automatic-memory-managed language, so it doesn't have that category of problems, at least in the language itself. This Rust and Zig question came up for me because Roc's tagline is that it's a fast, friendly, functional language, and that "fast" part applies not just to Roc programs; the Roc compiler itself should also run fast.

If we're building a fast compiler, which is the goal here, I really didn't want to get stuck with a performance ceiling. Something that can happen is that you choose a particular language for a task like building a compiler, you get partway through, and you realize: I can't squeeze any more performance out of this language, because there's a garbage collector, or something else in my way that I just can't push through. The only way to go faster is to write in a language that has a higher performance ceiling. I didn't want to hit any performance ceiling here. I wanted to say: from the get-go, I'm going to make this compiler in a language that has the maximum performance ceiling I can get.

At the same time, I also didn't want to worry about these memory unsafety bugs, this storekeeper "Hello traveler" memory garbage stuff. I didn't want to have that experience, because I still remembered it from middle school. So, knowing that I wanted maximum performance and also didn't want to hit memory unsafety bugs, Rust was the obvious choice. I'd heard that Rust was a language that lets you write code that runs really fast, and that part of the reason is that it has no garbage collector, unlike other languages I was familiar with. I'd also heard that it was good with concurrency,
and that was about the sum of my knowledge of why Rust was considered fast: I don't know, it doesn't have a garbage collector and it's good at concurrency. Little did I know this was just the tip of the iceberg. I had no idea about arena allocation, or CPU memory caches, or branch prediction, or SIMD instructions, or a long, long list of things that I did not know at the time. Nevertheless, this was what led me to pick Rust. I wrote the first lines of the Roc compiler in January 2019, and this was the reason.

So how did it go? Well, if you're not familiar with Rust but you are familiar with Haskell, this reminded me a lot of learning Haskell. In one way, because there was this brain-exploding learning curve. There were all these concepts that I ran into, and I'm like: I know more than a dozen programming languages, and I've never encountered anything that resembles a borrow checker, or lifetime annotations, or anything like this. This was all brand-new stuff that I'd never seen anything like anywhere, and I remember the same kind of feeling when I was learning Haskell. In terms of language complexity, there's just a lot there. Separate from the fact that there are a lot of new concepts to learn, there are also just a lot of concepts, period. It's a big, big language.

But at the end of the day, also like Haskell, once I did get my code working, I had a lot of confidence. I was like: okay, this is not going to break; this is not going to give me the storekeeper bug. Even compared to the garbage-collected languages I'd used, I felt a lot of confidence in refactoring. It reminded me of Elm in that way, in a positive way. This confidence was really the thing that made me say: okay, cool, the learning curve and the language
complexity are worth it. I'm happy with this choice, I'm glad I chose Rust for this compiler, and this is something I want to keep building on.

Now, there's one particular use case inside our compiler called built-ins. Built-ins are basically language primitives: things like numbers, strings, and lists that are baked into the language. They're partly implemented in Roc. We'll have a function that does something with a string, and some of it will be Roc code, but once you get to a certain point there's an irreducible core where we need some lower-level primitive that really shouldn't be exposed in user space, because it's unsafe or something like that. So these are partly implemented in Roc, and partly they're primitives built into the compiler itself, hence the term "built-in."

At first, when we were building these built-ins, we implemented them with manual LLVM calls. How many people know what LLVM is? Okay, so a mix. For those who don't: LLVM is basically a way that you, as a compiler author, can take your code and translate it into machine code while having some optimizations done on it. You call functions in LLVM that say, "Hey, I want a conditional here," or "I want you to make an add instruction," or something like that, and LLVM will say, "Okay, cool, I can turn that into x86 code or ARM code, for Linux, Mac, Windows, whatever." Along the way it can do a whole bunch of optimizations. It's really, really good at optimizations.

Unfortunately, writing manual LLVM calls is kind of like writing assembly language, but with much more ceremony. You get all of the really low-level fiddliness of assembly, but instead of the conciseness of assembly you get all the ceremony of... I don't know. Let me just show you an example. So let's
say I wrote this Rust code that says: if len1 == len2, then do something in the then block, else do something in the else block. Really straightforward, simple code: a little conditional comparing two numbers, and then we have a then and an else.

Here's the equivalent in manual LLVM calls. This is also Rust code; it's the Rust code that generates the LLVM calls, and it's something I actually took out of the Roc compiler. We have let then_block = context.append_basic_block(parent, "then"), and the same for else_block. You can see where the verbosity and the ceremony are coming from. Then env.builder.build_conditional_branch, which is our if; build_int_compare, so we have to say we're going to compare these two for equality, with IntPredicate::EQ, len1, and len2, which is the double equals; and then we have the then block and the else block. Later on, both of these get mutated with the contents of whatever goes in them.

So that was if-then-else in manual LLVM calls. What if we wanted to do something slightly more complicated, like comparing two arrays, where you compare each element to see if they're equal? Well, that's a little bit longer than the if-then-else. It's this, and also this, and then also this. That's what array1 == array2 looks like in manual LLVM calls. Now imagine saying, "Cool, now let's build a hash map in manual LLVM calls." No, thank you.

Once we got to a certain level of complexity in our built-ins, we were like: okay, we want to get higher-level with this. We don't want to keep writing this really low-level manual stuff with all this ceremony. It's getting really painful, and also difficult to maintain. How can we get something a little higher-level going for our built-ins? Fortunately, some languages can compile to LLVM bitcode, and basically that gives us
something where we can load this .bc file off the disk and then mix it in with our Roc compiler output, which is dynamically generating LLVM stuff. Putting those two together, we can emit the same thing as if we'd handwritten it. Languages that can do this include C, C++, Zig, and Rust. This would mean that we could literally write this Rust code in our built-ins, instead of the expanded version that we were previously writing by hand.

Obviously the first choice for this is: let's use Rust, because we're already using Rust for the compiler. Why would we want to introduce a new language if we didn't have to? Let's just use Rust as our higher-level language for the built-ins.

We pretty quickly discovered that we needed the unsafe keyword all over the place. Rust has a special keyword for when you need to do something that's memory-unsafe and step outside some of Rust's guarantees. It was necessary all over the place, in part because we're generating this arbitrary LLVM code that can do basically machine-code stuff. It's completely unsafe, 100%. The whole point was that we were generating code that was going to interface with this arbitrarily unsafe stuff, and Rust correctly says, "Hey, this is unsafe; you need the unsafe keyword." That was just all over the place. It was annoying, and it also meant we weren't really getting a lot of Rust's guarantees, which are one of Rust's big selling points. It turned something that is normally a selling point into an annoying downside.

Also, in some cases when you use the unsafe keyword in Rust, there's a tool called Miri that can help you discover certain potential sources of problems. Unfortunately, with foreign function interfaces, namely when you're combining your Rust code with another language, which is what we're
doing here, you're kind of on your own to figure out how to navigate that. You don't get help from Miri on that.

Also, I don't remember exactly what the problems were, but Rust-generated LLVM was causing some problems at the time. I talked to some other people who remember the era when we were making this decision, and they couldn't remember either. Suffice it to say, we tried to do it despite these problems and ran into other concrete problems with the actual implementation, including some basic tooling and development build difficulties. I also vaguely remember there was a WebAssembly-specific thing. (If you want to see how we're using WebAssembly in Roc, by the way, check out Brian Carroll's talk from day one. It's really cool; he showed an online REPL that we built in Roc.) Putting all these things together, we were like: okay, the obvious first choice of Rust is not working out well for this use case. It's working out great for the compiler, but not for the built-ins.

So the next question we asked was: hey, why not C? Why don't we just try a different language that's a lot less verbose than handwriting this LLVM, and maybe it'll go better for various reasons? Well, of course, this leads me back to remembering what it was like when I was doing C, and I had had a lot of bad experiences with it. But because I'd spent so much time with Rust at this point, when I went back and revisited my past experiences with C and C++, I felt a lot better equipped to understand these types of things.

So I actually want to go through this bug real quick. This code is lost to time, and I don't know exactly what the bug was, but I have a guess as to what it could have been, or an example of it. I think this is actually an interesting way to
illustrate some memory safety in practice, so let's talk through it real quick.

Remember that what it's supposed to say is "Hello traveler," but instead what it says has all this gibberish around it. Let's zoom in on the "Hello traveler" part. In memory, the actual bytes would be these. These are ASCII bytes. Back then it would have been literally ASCII; today you would most likely see these same bytes because it's UTF-8, which happens to use the same numbers in memory for this particular set of characters as ASCII would have. Basically, these are just the numbers associated with the particular characters.

Now let's suppose that this was a C-style, so-called null-terminated string, which is literally what I was using at the time. This is where you have a zero byte at the end of the string that marks the end of the string. In some languages the length is stored separately: you say, here's the string, and then here's a number which is the length of the string. But in C, and also in some cases in C++, a really common way to do it is to have this zero at the end of the string that marks the end, and not store the length. The end marker is just part of the string itself.

Now let's imagine that I had some algorithm where I wanted to change the greeting from "Hello traveler" with a dot to "Hello traveler" with an exclamation point, if the shopkeeper was really happy to see you because you're a return customer or something like that. This would be almost exactly the same bytes. The only difference is that the dot byte would be 46 and the exclamation point would be 33. If I were writing some C code that wanted to change the last byte of the string from a dot to an exclamation point, I might write it like this: str square
bracket strlen minus one. Let's assume we'd already calculated the string length ahead of time. If I set str square bracket something equal to this, C is basically going to write a new byte there, and the exclamation point is going to be 33. You can imagine that I might have literally written this code if I had been doing this translation between the period and the exclamation point.

You might also imagine that a mistake I might have made would be a classic computer science one: the off-by-one error. Let's suppose that instead of writing the correct code, I forgot the minus one. That's certainly something I did a lot when I was newer to programming. I've now gotten burned by it so many times that I don't do it as often, but I can't say it's zero. Let's say that I wrote this code, and now the 33, rather than being written here, ends up being written here.

There are two consequences. One is that we still have the 46 for the period, because I didn't write to that location. The other is that I overwrote the zero, which means we no longer have a zero marking the end of the string at all. So where is the end of the string? Well, it's wherever we happen to encounter a zero in memory next, and whatever happens to be in between, who knows? Maybe it looks like this.

So this is actually a plausible explanation for how I could have gotten this bug: I wrote over the zero that was terminating this string. I could have done that in this logic, or maybe somewhere completely different in the program, but the point is that I probably had an off-by-one error somewhere and overwrote the zero. All of this gibberish was then just other stuff that happened to be in memory at that location, getting printed out because C thought the string didn't end until, by sheer coincidence, it finally happened to hit a zero byte after all this gibberish. Now, this stuff might have been
strings, it might have been numbers; it's just ones and zeros in memory, and C is just choosing to interpret them as ASCII because it doesn't know any better. But keep in mind: if it's just a game, it's kind of funny when you have a bug like this, but this could have been a secret. There could have been a password in that memory. There could have been all sorts of sensitive information in those bytes that's now getting printed out to the screen in plain text. And instead of a middle schooler's game, this could have been a production server, and instead of printing it to the screen, it might have been sending it over the network or writing it to a database. All sorts of really bad things could have happened here that were much more serious than my silly little bug.

This is one of the reasons people call this memory safety. This is a memory safety bug right here; it's an example of memory unsafety and of one of the bad things that can happen when you have it.

So this particular line was the bug, where I wrote str bracket strlen equals exclamation point, and different languages handle this in different ways. We saw that in C, it will just silently overwrite whatever is there; whatever that strlen byte is, it's going to overwrite it. It's the same thing in C++, because one of the early selling points of C++ was backwards compatibility with C. They made square brackets work the same way as they do in C, so you could copy-paste your C code over and, in a lot of cases, have it just work.

Now, an example of Rust's improved memory safety compared to C and C++ is that if you write the same syntax in Rust, what you're going to get is an error, "index out of bounds," like you would in a lot of automatic-memory-managed languages. So this
is because in Rust, what we have here in this str is represented differently. In C and C++ this is just a memory address, but in Rust it's the memory address of the start of the string and also a length. Rust actually knows what the length of the string is, so it can check at runtime and say: "Hey, I noticed that this index is bigger than the length you gave me, or equal to it in this case. This is not going to work; it's going to cause memory unsafety. So instead of doing that, I'm going to give you an error at runtime."

This is also what Zig does. Zig also has square brackets, and it has a data structure called a slice. Depending on your compiler settings, it will give you an index-out-of-bounds error just like Rust would. So this is an example of how both Rust and Zig address a really common memory safety problem known as a buffer overrun.

You might think, "Well, how often does this actually come up? You made that mistake in middle school." It turns out this is actually a big deal. This type of memory unsafety error comes up all the time in practice. This is a Microsoft report from a few years ago. They looked at critical vulnerabilities reported from about 2006 to 2018 across a whole bunch of different codebases, and they found that 70% of these critical vulnerabilities were due to memory unsafety in C or C++. Some of these CVEs are buffer overruns just like what we saw a second ago, a buffer overrun being where you access memory outside the range of an array or a string or something like that, memory you weren't supposed to access.

Bounds checks, like the ones we saw Zig and Rust doing, prevent buffer overrun vulnerabilities, but these are not the only types of memory unsafety. Unfortunately, Microsoft didn't break down what percentage of those critical vulnerabilities were memory unsafety due to buffer overruns
versus other kinds. But let's briefly talk about two of the other types of memory unsafety. There's kind of a big three, and these are the other two.

Here's another really common line of code that I would have written all the time in C: array equals malloc 321. You don't need to worry about exactly what this is doing, but basically malloc is short for "memory allocate." When you want to make an array and you don't know what size you want at compile time, which is really common (usually you don't know until you're actually running the program), you pass some argument to malloc saying how many elements you want in the array, or really how many bytes you want. Then malloc says, "Cool, I will allocate that much memory for you and give you back the address of that memory," and that's what the array is going to store.

Then I do a bunch of stuff in my program, and later on I have to call this function called free, passing in the array, which is the address of that memory. That basically says, "Hey, I'm done with this now; you can deallocate it, take it back, and reuse that memory later." This is important because if I don't call free, I'm going to have a memory leak. I'm going to allocate this memory, and since C does not have a garbage collector, it's just going to stay in memory forever. Sometimes that's what you want. There are certain categories of programs where that makes sense: if you have a script that just runs from start to finish, maybe you don't want to bother deallocating, and that's fine.

But most programs do, at some point in their execution, want to actually free something, at which point you've introduced the possibility of a use-after-free bug. This is where you call free on the array and then, later on, you forget that you called free earlier. Or maybe there was a conditional branch where sometimes you
called free and sometimes you didn't, and you didn't realize that there was a certain set of circumstances in which you might actually be using this thing after you'd called free on it. The problem is that as soon as you call free, if somebody else calls malloc somewhere else in your program, malloc might say, "Oh, cool, that memory is free; I'll just go ahead and give you that." Now you have two different variables referring to the same memory, and you had no idea. All sorts of terrible, terrible things can happen as a result. So this is also considered a memory safety bug: a use-after-free.

There's a variation of that where you call free on something, do a bunch of other things, and then call free on it again. This is called a double-free bug, and it ends up having basically the same kind of symptoms as a use-after-free. The reason is that if I call free and then forget and call free on it again, then by that second time, maybe somebody else was assigned that same address, because malloc was called again and said, "Oh, this memory is free; I'll assign you that address." Now I have freed something that was still in use by some other variable, so it ends up having the same symptoms as a use-after-free. Again, you don't need to know this in super depth. The point is that use-after-free and double-free are the other two major types of memory unsafety that people encounter, aside from buffer overruns.

Okay. Now, Zig does have something to help reduce the odds of this happening, and it's called the defer keyword. Basically, it's a way to say: I'm going to write my free immediately after my malloc, on the very next line, but of course I don't want it to run yet. What defer does is say: cool, you can do whatever the program wants to do in the rest of this function.
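A rough sketch of the defer idea, written in Rust for illustration rather than Zig. Zig's `defer` is a language keyword; this sketch only imitates it with a small hypothetical `Defer` guard whose `Drop` implementation runs a closure when the enclosing scope exits. The names `Defer` and `process`, and the event log that stands in for a real allocation and free, are assumptions made up for this example and are not from the talk.

```rust
use std::cell::RefCell;

/// Hypothetical helper (not a standard library type): runs the stored
/// closure when the guard goes out of scope, imitating a deferred call.
struct Defer<F: FnMut()>(F);

impl<F: FnMut()> Drop for Defer<F> {
    fn drop(&mut self) {
        (self.0)();
    }
}

/// Pretends to allocate, work, and free, recording each step in `log`
/// so the order of events can be inspected afterwards.
fn process(input: i32, log: &RefCell<Vec<&'static str>>) -> i32 {
    log.borrow_mut().push("allocate");
    // The cleanup is written right after the "allocation", but it only
    // runs when `process` returns, on whichever path that happens.
    let _cleanup = Defer(|| log.borrow_mut().push("free"));

    if input < 0 {
        log.borrow_mut().push("early return");
        return -1; // `_cleanup` is dropped here, so "free" is still recorded
    }

    log.borrow_mut().push("work");
    input * 2
}

fn main() {
    let log = RefCell::new(Vec::new());
    let result = process(-5, &log);
    // Prints: -1 ["allocate", "early return", "free"]
    println!("{result} {:?}", log.borrow());
}
```

The cleanup is recorded exactly once on both the early-return path and the normal path, which is the property that makes this style harder to get wrong than calling free by hand at every exit point.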
Defer is not going to actually call this free until the function is about to return. That includes the case where you have a bunch of conditionals in here, maybe an early return here and there, some loops, maybe some complicated things going on. It doesn't matter. Defer says: whatever is going on there, I'm not going to call this free until all of that is done.

Now, this doesn't work all the time. There are some cases where you want to allocate the memory and then return it from the function, and you don't want to free it yet; you want somebody else in the program to free it. In that case defer is not going to be that helpful. But the point is that it can make you much less likely to have a use-after-free or a double-free. There's a whole category of things where, if you write them in this style, you're never going to write free manually elsewhere, and you're not going to do a use-after-free, because defer makes sure you're freeing the memory only when it goes out of scope. It's not a possibility that you would have a use-after-free there. So this is not bulletproof, but it does help you out a lot in avoiding these particular categories of memory unsafety.

Part of the reason Rust has a really good reputation for memory safety is that it has even stronger guarantees around this. I'm definitely not going to get into how Rust does this in the type system, because I don't have another six hours to talk about it. Suffice it to say that at the end of the day, Rust will automatically make sure that this array gets freed whenever you're done with it, much like a garbage-collected language would, except without running a garbage collector. Instead, it's going to do it by
tracking certain things in the type system and figuring out exactly where it needs to insert certain calls so that it gets deallocated only once, and at exactly the earliest time when it possibly could. So in Rust, when you're using this system, you cannot have a use-after-free and you cannot have a double free. Those categories of memory unsafety, much like the buffer overrun we talked about earlier, are just not things you have to worry about in Rust. With a little bit of an asterisk, which is: unless you use the unsafe keyword, in which case you can totally get all these unsafety things. But that's kind of the point of the unsafe keyword. The idea is to minimize the amount of code that you need to think about and check for potential memory unsafety in Rust. It's not that you can't do it, because if Rust said there's no memory unsafety allowed whatsoever, then it couldn't meet its goal of maximally using your hardware. There would have to be some performance ceiling. Bounds checks, for example, do have a runtime cost. So if they wanted to say, "We guarantee memory safety 100% across the entire language, no questions asked," then there would be certain things you just couldn't do in Rust that you could do in a Zig or a C that would improve your performance.

Okay, so a comment that I saw, not just on this blog post but around the internet as a pretty common thing, is people saying, "I don't understand why in this day and age anyone would use a memory-unsafe language." Well, as someone who's using a memory-unsafe language, I understand. But let me explain briefly what this term "memory-unsafe language" means, because it's a little bit different from the memory unsafety that we've been talking about. Here I'm going to talk about some different languages in terms of their potential sources of memory unsafety, both in your application code, which is what you're writing, and in your dependencies.

I think a really easy example to compare these is old-school JavaScript. I'm talking about JavaScript from the 1990s, which is why I'm using the old-school logo it had back then, before they got the yellow one. This is back before WebAssembly, which kind of muddies the waters here a little bit, and also before Node.js existed. So this is when JavaScript was only running in the browser and did not have WebAssembly. I'm going to compare that version of JavaScript to Node.js.

When we're talking about memory unsafety in application code, if you were writing .js files in the browser back in the day, there was no possibility of memory unsafety. And this was a selling point of JavaScript in the browser: there were other things like ActiveX and Flash which could have these memory vulnerabilities, could crash the process, and could do all sorts of bad things, but JavaScript didn't. It was really, really locked down. There was no possibility of memory unsafety in the whole language, no matter what code you put in that .js file.

Now in contrast, Node.js introduced FFI, which is a foreign function interface. In Node.js you can actually call C, which you could not do in browser JavaScript, and this is a source of memory unsafety, because as we've seen, C is full of memory unsafety. This means that if you're writing your Node.js code and you use that FFI to bring in some C code, like a C library, now you have the potential for memory unsafety in your Node.js app, and this is not a possibility in the old browser-based JavaScript.

What about your dependencies? Here we have to get into a little bit of "it kind of depends on how you define it." If you're talking about dependencies, well, JavaScript runs in a virtual machine, and the virtual machine can have bugs. It is true that if there is a bug in the implementation of the virtual machine, then your JavaScript code could exhibit memory unsafety. That could totally happen. And then you could also say the operating system could have a bug that could cause memory unsafety. Okay, fair enough. So let's just agree that there is some constant number of dependencies that could have memory unsafety for JavaScript, starting at the VM and going down. Those are potential sources of memory unsafety, as they are for absolutely every program. But it's not something that varies by how many dependencies you have. When you're writing old-school browser-based JavaScript, it does not matter how many JS files you're importing; none of them are going to introduce memory unsafety into your program.

Not true with Node.js. In Node.js, every single dependency you have could potentially introduce memory unsafety to your program. And as we've seen before, one of the really nasty things about memory unsafety bugs is that they can affect other parts of your program. In fact, they can impact any part of your program, because your entire process is using the same memory space. So if you have something with memory
unsafety in one of your dependencies, and it overwrites one of your bytes to turn a zero into a nonzero, and now you have this string overrun, that can happen anywhere, and it can happen across any amount of code boundaries that you set up in terms of modules and anything else. So unfortunately, the fact that Node.js introduced this FFI means that you now have to worry about memory unsafety not only in your application code but also in every single dependency that you import. That's a trade-off for getting to access C code.

Now, there's a whole bunch of languages that do this. Node.js is far from alone in this way. Python, Clojure, Ruby, Haskell: these are just examples, lots of languages do this. By default they are memory safe, but then they have an FFI that allows you to access a memory-unsafe language, and that includes in all their dependencies.

There's another category of languages that take it a step further. In this case I'm going to give the examples of Java, C#, Scala, Swift, and Go. These are languages that have not only FFI but also a concept of unsafe, so there's a first-class way in the language to write memory-unsafe code. For example, in Java it's sun.misc.Unsafe. There's just a package, and I can do memory-unsafe things directly in Java code. I don't even need to use FFI and get C into the equation. That means I need to expand the space of things that I have to consider as potential sources of memory unsafety. It's not just that I have to ask, "Am I using C FFI? Do I have C files or C libraries in my dependencies?" No, in these languages I also need to go and audit: do I have sun.misc.Unsafe anywhere in my code, or in any of my dependencies for that matter? Because if so, those are potential sources of memory unsafety, and I probably want to be more careful when reviewing code that involves those things, for example.

And then finally we have the last category of languages, like assembly, C, C++, and Objective-C. These are languages where basically anything could potentially be unsafe. They have first-class pointers and addresses, and unlike in the previous category of languages, these are not off in a corner with their use really culturally discouraged. They're just used all over the place, people can use them as much as they want, and that also includes every dependency.

If you're curious, by the way, Roc is actually the second language that's in this top category. We don't have an arbitrary FFI that can infect every dependency. There's basically exactly one of your dependencies that you build everything else on; we call it the platform. This is not a talk about Roc, so I'm not going to get into details there, but that was an intentional design decision, as part of trying to keep the amount of stuff that you need to worry about in terms of memory unsafety as minimal as possible.

Okay, so I'm going to draw this box here called "memory-safe languages." When people say "memory-safe languages," this is what they're talking about. This is where that term comes from. Now, you can see that there's already some nuance here. It's not like you have memory safety guaranteed 100% in any of these languages, except arguably JavaScript and Roc, but even then, JavaScript is running in a virtual machine that can have memory safety bugs that potentially can be exploited through
JS code, and Roc does run on the platform. So everywhere you look there is some potential source of memory unsafety. It's really just a question of how big the surface area is: what parts of the language in your code base you need to consider as potentially memory unsafe and be concerned about, versus what parts you can say, "Okay, as long as I've audited those and I'm really confident those aren't doing anything bad, the rest of my code base that's built on those I can assume is fine."

And this really is the big pitch of Rust, and why Rust goes in this box: it's the same thing with Rust. As long as you've really carefully checked the few parts of the Rust code base that are using the unsafe keyword and things like that, and FFI, because Rust can do FFI as well, then you don't have to worry about the rest of it, because the guarantees of the language protect you. That's definitely a valuable thing.

And Zig, of course, goes in this other box, where it's much like C and C++. It doesn't have this small subset that's guaranteed to be safe, but rather it has tools that improve your ability to detect and deal with memory unsafety. This is an important distinction, which we'll talk about in a second.

So basically, I think it's important to note: whenever I hear someone say this, the first thing I think is, yeah, okay, but "memory-safe language" is not exactly the same thing as "memory unsafety cannot happen here." If it were, nobody would use Rust, because it wouldn't be able to reach that maximum performance ceiling. And along the same lines, "memory-unsafe language" doesn't mean that everything will definitely explode all the time. We all use the Postgres database, we all use operating
systems. All these things do have vulnerabilities from time to time, for sure, but they're not constantly blowing up due to memory unsafety bugs and unusable. I consider Postgres and SQLite to be some of the most reliable code bases I've used overall, even though they do occasionally have memory safety problems and bugs reported. So really, this term "memory-unsafe language" is about where within the language potential memory unsafety can be found. That's the critical distinction.

So why would somebody choose to use a memory-unsafe language? Well, one reason is what we ran into in Roc. If we can't have memory safety in this code, in this built-ins use case, because we're just going to be using the unsafe keyword everywhere in Rust and we can't avoid it, because that's the thing we're building and if we want to build it in this way we have no choice, then why not optimize for other things than memory safety? We can't have it anyway, so why bother? In this particular use case there's really no benefit to having a borrow checker if we're just going to have to circumvent its guarantees anyway with unsafe.

Similarly, you can have the opposite end of the spectrum, where you say, "Look, we're not going to have memory unsafety anyway." I'm going to use the example of TigerBeetle, which is a database written in Zig, and there are lots of cool talks about it. This is an unusual database in that it never deallocates anything. So use-after-free bugs: not a thing. Double-free bugs: not a thing. And also, as we saw, Zig has bounds checks on slices. So they don't necessarily need to worry about any of the big three that we talked about. Use-after-free and double free don't come up if you never deallocate, and if they're using Zig's slices and keeping those checks on, they don't have to worry about buffer overruns
either. So at that point, what's the concern? What are the sources of memory unsafety remaining for TigerBeetle if they're in this situation? Whereas in our case we have memory unsafety to worry about no matter what, in TigerBeetle's case it's more like, "We kind of don't have to worry about memory unsafety no matter what." So why not use Zig, if we like the ergonomics better?

I mentioned earlier there's this really important question: let's say I'm in the memory-unsafe world. What help do I get from the language to avoid and prevent memory unsafety bugs? We had this example of the buffer overrun, and we saw that C and C++ don't really give me any help here. They just silently overwrite whatever's in that memory, which can lead to buffer overruns. Rust will give me an index out of bounds. Zig can give me an index out of bounds.

Bounds checks, though, are only one example of how a language can help us with these things. For use-after-frees and double frees especially, there are all sorts of different tools that languages can help us with. The Drop trait in Rust is a good example of this. There's the defer keyword in Zig that we talked about earlier. C++ has a thing called RAII that can help with this. Objective-C, Swift, and other languages have automatic reference counting. The list goes on: tracing garbage collection, Zig's really cool testing allocators, AddressSanitizer, UBSan, Miri in non-FFI Rust. There are just lots of different tools that we get from different languages, and these all have different trade-offs.

At the end of the day, that's the thing that I've learned. If there's one thing that you take away from all this, it's that memory safety is not all or nothing. Rust's borrow checker is a useful tool, Rust's unsafe is a useful tool, Zig's defer is a useful tool, Zig's testing allocator is
a useful tool, and all these things have trade-offs. Sometimes you might say, "I want really strong guarantees, I want what Rust gives me." Other times you might say, "I don't need those strong guarantees," or there are other trade-offs that might lead you to choose Zig, even being fully aware of the trade-offs of all these things.

Okay, so I asked this question earlier of "why not C?" We tried Rust for our built-ins, it didn't go well, and we said, "Okay, why not C?" But when we learned a little bit more about Zig, we were kind of like, well, if Zig is an option, there are lots of reasons why not C. For example, C compared to Zig is more prone to memory unsafety; it doesn't have Zig's defer to help us out with that. It has more gotchas and footguns from being several decades older than Zig, like silent conversions and things that Zig just doesn't do; Zig has you do explicit casts for things. C has fewer ergonomic features; it doesn't have this really nice comptime feature that Zig has. And also, what I found is that the Zig community is really helpful and beginner friendly, whereas the C community is so old that it just doesn't value beginners to the same degree that I've experienced in Zig.

I should also mention that Zig's tooling has been really, really nice, not just on this project. Based on my experiences here, I learned about it and ended up using it at work. zig cc makes cross-compiling C code really easy. It's basically a way you can get some subset of Zig's toolchain features in a C compiler. Zig actually ships with a professional C compiler called Clang, and then it wraps that up in a really nice toolchain for doing things like cross-compilation. At work we use this for Node.js and Roc interop. Basically, at work we have this really big TypeScript back end
that runs on Node.js and has been around for a long time, and the goal is to migrate it to Roc in order to get a bunch of benefits that, again, are out of scope for this talk. What we're using to cross-compile that is Zig. We do a lot of AWS Lambda stuff, and basically I want to be able to build on my machine and output a Roc binary for the target that we're going to deploy it to, which is Linux on Intel, whereas my Mac is an ARM machine. zig cc has been really helpful for doing that, and I only found out about it because we ended up considering Zig for Roc's compiler. By the way, if you're interested in working at a company that's doing cool stuff like this, check out vendr.com/careers, because we have all sorts of nice positions open.

Cool. So we talked earlier about how the motivation for all this was having these manual LLVM calls that were super verbose, and wanting a higher-level language so we didn't have to write all this for these built-ins. We talked about how some languages can compile to LLVM bitcode, which includes C, C++, Zig, and Rust, and so that can get us this much more concise thing. When we ended up using Zig for this use case, the way that we were combining Zig and Rust is exactly what is on this slide, which is to say: compiling our built-ins to LLVM bitcode, which is actually a .bc file, and then we have the Rust compiler ingest that, sort of slurp it in, and then the Rust compiler blends the output of that Zig compilation in with the LLVM that we're generating on the fly in the Roc compiler, and then outputs the same stuff. So it's as if we had just written all those things at once. But this is not Zig and Rust calling each other directly; rather, Zig is used at build time and then Rust is
exclusively used at runtime. That's how we've done it. I talked about how our first choice of Rust didn't work out for various reasons, and that's how we ended up with Rust and Zig together: using Zig at build time to generate these bitcode files, and then importing them into Rust for use at runtime.

But there are other ways that you could use Rust and Zig together, such as calling between them. Both of these languages can compile to C-compatible binary libraries, and both of these languages can import and call C-compatible binary libraries. So you can use that intermediate format as a very straightforward way to build some Rust code into an intermediate library and call it from your Zig code. Or, if you've got a big Rust code base and you want to introduce some Zig into it, you can do the same thing: compile your Zig code to a C-compatible binary library and then import it from your Rust code. Basically, this is the same thing as using any C library, in terms of overhead and also how you do it. libssl is a very famous library that's used in a lot of projects. It's the same thing, except you're building your own. You wouldn't call it libssl; you'd call it lib-whatever-I'm-doing, probably. But this is a way that you can combine the two at runtime rather than at build time, like we have in our project.

It's worth noting that if you're going to do this, you're going to end up needing some way to share type definitions between the two. Either you're going to have to duplicate code and write, "Here's the type on the Rust side, written in Rust code, and here's the type on the Zig side, written in Zig code," or else use code generation to generate one from the other.

This leads to the question: knowing that we could do this, that we could compile them both to binary C libraries,
why didn't we consider using Zig in Roc's compiler alongside the Rust code there, in some cases where we might think that Zig is a better fit? Well, actually, we did talk about this. Honestly, for me personally, the main appeal of doing this is compile times. I really like a lot of things about Rust, but compile time is one of the things that I don't like. Zig has done a really good job of speeding up the compiler. Probably this is because Zig has a simpler compiler, but it's also because Andrew Kelley, who created it, has spent a ton of time optimizing it. He gave a really cool talk a few years ago at Handmade Seattle about all the different techniques that he used to speed up Zig's compiler, and I have not seen those same kinds of optimizations land in the Rust compiler. This would be something that I would really appreciate, especially when there are parts of the code base that we iterate on very often; we're always feeling the pain of those compile times.

Also, Zig's allocators are a natural fit for how we like to structure the compiler. Without getting into too much detail, this is the idea that you might want different memory allocation strategies in different parts of your code base, and arena allocation is one that we use as much as we can in the Roc compiler. This is not as ergonomic in Rust as I would say it is in Zig.

And finally, there are definitely some parts of our code base that could be simplified if they were written in Zig instead of Rust. In some cases we would have to give up some significant guarantees for that, but those are the cases where just the nature of what we're doing means that we really wouldn't be giving up that much in terms of guarantees, or at least I don't think so.

Having said that, it's also the case that we already have 300,000 lines of Rust code, and introducing the two would complicate the build even more than it already is. We've got a lot of stuff going on
with these built-ins and bitcode files and whatnot. Having to do shared type definitions across the two, especially if we're generating them, carries a significant overhead, and we didn't think that was necessarily worth it in our case.

Not to mention, I've talked about how unsafe is this thing that turns off certain guarantees in Rust, but it is very nice, not just internally, for our own code base, at isolating the parts of the code base that we need to worry the most about and be the most careful with. It's especially nice for new contributors, because we have a lot of people contributing to the Roc compiler for whom this is the first time they've used Rust, and maybe the first time they've ever worked on a compiler. Both of those things were true of me when I was starting the compiler. But now that we have 300,000 lines of code and a bunch of contributors, it's really nice, when somebody's making their first contribution to the project, to be able to review it and say, "Oh, they didn't use unsafe at all, so I don't need to worry about auditing anything extra carefully," or, "They only used it in this one place, so let me really carefully check that one place." Otherwise we can have a lower bar for how carefully we need to check these things, which reduces the amount of time we need to spend checking new contributions, which is really nice. So overall, we did consider it, we did talk about it, but for that part of the project it just didn't seem like a nice fit.

So finally, I want to talk very briefly about why else someone might want to mix Rust and Zig, because depending on what situation you're in and what project you're working on, you might find, "You know what, actually I think this does make sense for me, and I'd never really considered it before." That's certainly
something that's happened to me a lot in my career: I look back and think, "I really wish that I'd realized that X was an option back then. I didn't even consider it, it wasn't even on my radar, and if I had known about it, maybe that project could have gone better." So I'm just going to speculate here about some ideas for why you might consider mixing Rust and Zig. I'm not saying everyone should run out and do it. Probably it's correct that only a small minority of projects should be doing it. But I think you should at least be thinking about it, because you never know when opportunities like this are going to come up that actually might make sense for you.

Let's say, for example, you had a large code base with lots of mandatory unsafe code, like we do. I've never worked on an operating system, but I could imagine an OS kernel would have a lot of stuff like that for device drivers and things. Then maybe you also have a code base with lots of tricky lifetimes to get right, where things are getting passed around a lot and it's tricky to remember when things need to get allocated and deallocated; Rust could help with that. Or maybe you want access to the Zig toolchain but you also want access to the Rust crates ecosystem. Maybe somebody's implemented something really nice in a Rust crate; Rust's ecosystem is, at least today, much bigger than the Zig ecosystem. That might be another reason you want to mix them. Or maybe you want certain marquee features that each language has, like Rust's concurrency checking via the borrow checker, but you also want access to Zig's comptime. Maybe those would be useful in different parts of your code base, and that could also be a reason to consider mixing the two.

At any rate, whatever the reason, whatever your combination of use cases, I hope that over the course of this you've found something to take away from our
experiences using Rust and Zig together. Thanks very much.
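As a rough illustration of the "freed exactly once, automatically" behaviour the talk attributes to Rust, and compares with writing `defer` next to an allocation in Zig, here is a minimal sketch in Rust. The `Buffer` type, the `frees` counter, and the `first_byte` function are hypothetical names made up for this example; nothing here is taken from Roc's code base. The counter exists only to make the otherwise invisible cleanup observable.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a heap buffer. It records how many times it
/// has been released so the example can observe the cleanup.
struct Buffer<'a> {
    bytes: Vec<u8>,
    frees: &'a Cell<u32>,
}

impl Drop for Buffer<'_> {
    fn drop(&mut self) {
        // The compiler inserts this call when the value goes out of scope.
        // It runs once per value, whichever path leaves the scope.
        self.frees.set(self.frees.get() + 1);
    }
}

/// Returns the first byte of a freshly allocated buffer, or `None` if the
/// buffer is empty. No cleanup code is written by hand on either path.
fn first_byte(len: usize, frees: &Cell<u32>) -> Option<u8> {
    let buf = Buffer { bytes: vec![7; len], frees };
    if buf.bytes.is_empty() {
        return None; // early return: `buf` is still dropped here
    }
    // `first` is a checked lookup: it yields `None` instead of reading
    // past the end of the allocation.
    let first = buf.bytes.first().copied();
    first
    // normal return: `buf` is dropped here
}

fn main() {
    let frees = Cell::new(0);
    assert_eq!(first_byte(0, &frees), None); // early-return path
    assert_eq!(frees.get(), 1);
    assert_eq!(first_byte(3, &frees), Some(7)); // normal path
    assert_eq!(frees.get(), 2);
    println!("buffers released: {}", frees.get());
}
```

In safe Rust there is no way to release `buf` a second time or to read it after it is gone: moving it into `drop(buf)` and then touching `buf` again is rejected at compile time. That matches the talk's point that these guarantees hold only as long as the code stays outside `unsafe`.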
Info
Channel: GOTO Conferences
Views: 69,441
Keywords: GOTO, GOTOcon, GOTO Conference, GOTO (Software Conference), Videos for Developers, Computer Science, Programming, Software Engineering, GOTOpia, Tech, Software Development, Tech Channel, Tech Conference, GOTOcph, GOTO Copenhagen, Rust, Zig, Roc, Richard Feldman, Rustlang, Ziglang, Roclang, Programming Languages, Functional Programming, Borrow Checker, Compiler
Id: jIZpKpLCOiU
Length: 45min 33sec (2733 seconds)
Published: Fri May 17 2024