Diving Into Rust For The First Time

Captions
So I'm here to talk today about Rust. As Jackie just said, I've been working on Rust for about six years, but what she didn't mention, because I didn't tell her beforehand, is that I've been a pretty big fan of C++ for a long time. I remember asking my parents for this book, it must have been middle school, as a Christmas present, and I pored over the whole thing. I couldn't find a picture of me reading it, but I did find this one of me reading the Zen of Graphics Programming with my dad, which I thought was pretty cool.

But I'm here today to talk about Rust, and I know not everybody here is familiar with Rust, or has even heard of it, at least based on some of the conversations I've had so far, so I thought I'd give a really quick intro to what Rust is at a very high level. The basic idea is that we want to take all the nice safety guarantees you get if you write a program in Java or C# or another language like that, so that you're not going to have double frees and segmentation faults. In fact we want to go further and say you also won't have data races, so your concurrent programs will be more predictable, and we want a type system that's more expressive, one that can help you get correctness beyond those basic guarantees. And we want all of that without a runtime or a garbage collector or really any undue performance overhead. I've heard Bjarne Stroustrup say that C++ tries to leave no room for a language between it and assembly; Rust has a similar ambition, and there are people writing kernels in Rust and so forth. Overall our big goal is productive systems programming, and I mean productive in a global sense: how fast can you get your program up and running, how fast can you meet your performance requirements, but also how well can you maintain it over time, because that can be a big productivity drag.

Today is a good day for a talk on Rust because it's almost our two-year birthday, so to speak: we had our 1.0 release two years ago, on May 15th, so yesterday. In that time Rust has been growing, and we've got a lot of people using it in production now. I have a partial list here of the biggest names, but you can see a more complete list on our webpage. Of course Firefox has started to integrate it; I say "of course" because Mozilla sponsors Rust, but it actually wasn't a given that it would (a) succeed or (b) make it into Firefox; we had to prove our way there as well. Dropbox has been using it on their servers and recently also on the client side, npm has integrated it, and GNOME hasn't put anything into production yet, but we've been working a lot with the GNOME team and we're excited to see where that might go; some GNOME libraries are starting to experiment with Rust.

In honor of Rust being two years old, I thought I would start with something two-year-olds love: I'm going to tell you a story. It's a true story, although not exactly as I present it here, and it's a story about using Rust. It starts at a working place in this office center, where Lady Ada comes every day for her job as a programmer... except something seems dated. Okay, this is better: it starts at this cafe, where every day Lady Ada comes and works on her program, and her program is a compiler, so I told you it was a true story from my life.
She's trying to make the compiler go faster, and she notices that there's one main thread doing most of the work. Every once in a while an event occurs, and when it does, we have to update a data structure; sometimes that takes more time, sometimes less. The key thing is that we don't need the result from this data structure until the very end. This gives her an idea: I could probably make this go faster if I introduced another thread. When the event occurs I'll just do some lightweight processing to send it off and let the other thread update the data structure, which I don't need to use yet, and at the very end I'll get the result back. If all goes to plan, everything should be faster, because the two threads each have less work and we have more than one core. So she puts this plan in place, builds the system, and it works, and she feels pretty good.

The thing is, six months pass. This has been working in production, the compiler is somewhat faster than it used to be, and then she's doing something else entirely; to be honest, she's completely forgotten that there were threads in the first place. She's working on some part of the compiler that happens to be modifying that same struct. This, by the way, is Rust syntax, but I figure you can follow it: this defines a struct, and this event that was being processed previously just had a few integers in it, but now she has to add something new, a name, which is an intern string.

This is where the story gets a little scary. If you've done multi-core or parallel programming, at least if you're like me, your heart starts to race when you think about all the things that could be happening now. What Ada doesn't know about that InternString, because she's never had to look at that part of the compiler in detail, is how it's implemented: it's an Rc of String, roughly speaking a reference-counted string, kind of like a shared pointer, except that Rc is not atomically reference counted. That means that if two threads come along and try to modify the ref count in parallel, it could end up wrong. And in particular, we did just introduce this thread that was going to send these events, and hence send intern strings, across threads, so there's a pretty good chance we're going to be modifying this ref count in parallel, and one of the counts will probably end up slightly off. It probably won't cause a problem right away, especially because these are intern strings, so there's always one big dictionary holding at least one ref, but sooner or later it's going to cause a crash, or maybe just gibberish output, and it's going to be a big pain and waste a lot of time to debug.

But actually none of that happened, because she was using Rust. What actually happened was that she got a compilation error, and she said: what? "The type InternString does not implement Send." That's rough speak, we'll get to the details later, for saying this is not a thread-safe type. And she said: why would it have to be thread-safe? Oh right, I have a thread, I forgot about that. Then she went in and changed it, and everything was fine. This is the experience we're aiming for: you can do things very easily to start.
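To make that concrete, here is a minimal sketch of the situation, not the actual compiler code from the story; the Event struct and the InternString alias are simplified stand-ins:

```rust
use std::rc::Rc;
use std::thread;

// Hypothetical stand-in for the compiler's interned string type:
// internally just a non-atomically reference-counted string.
type InternString = Rc<String>;

struct Event {
    id: u32,
    kind: u32,
    name: InternString, // the newly added field
}

fn main() {
    let event = Event { id: 1, kind: 2, name: Rc::new(String::from("foo")) };

    // Sending the event to the worker thread no longer compiles; the error
    // is something like:
    //   error[E0277]: `Rc<String>` cannot be sent between threads safely
    //   (the trait `Send` is not implemented for `Rc<String>`)
    thread::spawn(move || {
        println!("{}", event.name);
    });
}
```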
You'll notice it took her only one day to put those threads in; well, really the story is about me, so I'll switch to the first person now. It took me about one day to put those threads in in the first place, but that's only part of the cost. The rest of the cost comes later, in the maintenance period, and where Rust's type system is really great is when it can deflect these things that would have been very painful, and instead take only five minutes of your day to fix before the rest of the system gets going again. So there are these two parts, the productivity in the beginning and the productivity at the end, and those are the things I want to drill into.

I think the productivity in the beginning comes in large part from the zero-cost abstraction idea that Rust learned from C++: we've got a lot of libraries that let you write code that's very high level and that compiles down to something very efficient. This first example is a function that tests whether a string slice, as we call it, is all whitespace. We can do that by calling text.chars(), which gives us an iterator over characters, kind of like a C++ range; we call .all(), which invokes a closure on every character, and this is the closure syntax in Rust, so we're saying: for each character c, is c whitespace? That compiles down to just a little loop with a pointer bumping along the UTF-8 string. This was actually a real-life example from some people who were porting Ruby code; they had replaced pages of C code that was doing the same thing with this two-line loop, it performed the same, and they were very excited. The load_images function is another example, this time doing parallel processing using a third-party library called rayon that I'll come back to a few times; it's something I wrote, for doing fork/join parallelism. You can see it also uses iterators, the same idea, except you write par_iter and now you get a parallel iterator that executes in parallel while also guaranteeing data-race freedom, as we'll see. And that goes for all of these libraries: Rust is a language geared around writing cool libraries that let you write high-level things that perform well.
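As a rough sketch of those two examples (the Image type and the load_image helper here are hypothetical placeholders, and the third-party rayon crate is assumed as a dependency):

```rust
use rayon::prelude::*; // assumes the third-party `rayon` crate as a dependency

// Hypothetical image type and loader, just to make the sketch self-contained.
struct Image { bytes: Vec<u8> }
fn load_image(path: &str) -> Image {
    Image { bytes: std::fs::read(path).unwrap_or_default() }
}

// High-level iterator code that compiles down to a tight loop walking the
// UTF-8 bytes of the string slice.
fn is_whitespace(text: &str) -> bool {
    text.chars().all(|c| c.is_whitespace())
}

// The same style, but parallel: `par_iter` gives a parallel iterator that
// runs the closure on a thread pool while preserving data-race freedom.
fn load_images(paths: &[String]) -> Vec<Image> {
    paths.par_iter()
         .map(|path| load_image(path))
         .collect()
}
```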
But the best libraries in the world aren't much use if you can't easily use them, so we also have a nice package manager, which lets you find the libraries, add a few lines, and then the compiler will download and build them for you. That's all part of the productivity story, but it's only the first part, writing the code to begin with. Then comes the safety, and the freedom that lets you maintain the code over time, and that's more what I'm going to focus on in this talk, because that's the thing that comes on top of C++. If you put these two together, that's when you really get to this goal of productive systems programming. So here's the structure of the talk: I'm going to start with a fairly technical part, going through some parts of the language and how they work, focusing on safety, the generic programming aspects, and a little bit about parallelism (don't worry about the unsafe item, we'll get to that), and then I'll talk about what it feels like to use Rust outside of the technical parts: what it was like to get Rust into Firefox, and some of the community processes Rust uses to grow and develop.

So let's start with memory safety. This was our goal, and the interesting thing about the top two bullet points, zero-cost abstractions on one side and memory safety plus data-race freedom on the other, is that they're actually in tension; there's a reason a lot of languages that have safety don't also have zero-cost abstractions. I'm going to start by looking at some C++ code to show you what I mean. In these few lines we already see some of the features that let you build zero-cost abstractions effectively. We can make full use of the stack in C++: the fields of this vector of strings are inline on the stack, and similarly the fields of each string are inlined into the vector, so all in all, when I want to get to my character data, it only costs me two pointer dereferences. That matters a lot if I'm going to put this vector inside a hashmap, and an iterator on top of the hashmap, and something else on top of that; all those layers start to add up if you have to add a pointer at each level. But once you have the idea of a string being embedded in a vector, or what have you, you also need the ability to have a reference right into the middle of other data structures, a reference that takes the address of a field or of a particular array element, which of course C++ has. And once you have that, and you then layer deterministic destruction on top of it, meaning we free the memory of this vector as soon as we exit the function, this is where things start to get complicated; it is crucial, though, to maintaining your memory use over time and keeping it from ballooning out of control.

If I extend the example just a little bit, we can start to see the safety problems. This is an extended version of the same code: I have my element reference pointing into my vector, but now I'm going to call vector.push_back, and at this point, if the vector is at capacity, we could have a problem, because we'll copy over to new memory, put in the new value, then free the old memory and adjust the data pointer. From the point of view of the vector, everything's fine: all its internal references are up to date, everything is consistent. But out here there's this dangling reference, element, now pointing into freed memory, and if I go on and use it I can get into trouble. Even in this simple example we see the outline of what the issues boil down to, and there are really two key ingredients that come together. The first is that you have to have some sort of aliasing: more than one path to reach the same memory. In this case we had the elem reference and the vector's data pointer. The second is that you have to have some mutation, because mutation is what frees memory, changes types (as in a union), and otherwise invalidates references. The reason these two combine is that if you only have one path to a given piece of memory, it's pretty easy to keep it up to date when you make a change: the vector, for example, kept its own data pointer up to date, but it can't keep track of all the references that are out there in the world.
So what Rust tries to do is layer on something we call the ownership and borrowing system, and the key idea is: if aliasing and mutation together are the problem, then let's just not have them happen at the same time. We do this by having three main ways to access data, which I'm going to go through one by one.

The first is ownership. This is the most common case, where you simply own some data; in that case you don't have any aliasing, because you're the only one with access to it, but you are allowed to mutate it. We chose the word ownership partly because it was already in use, but also because you have a common-sense feeling for what it means: in life, if I own a book, it's my book, it's on my bookshelf, and I can do a lot of things to it. I can write in the margins, I can read it, and when I'm done with it I can give it to somebody else, and now it's their book, it's not mine anymore, and I have nothing more to do with it. That's how data structures work in Rust.

Here's some Rust code, and it starts by creating a vector. This is our first glance at Rust syntax, so let me point out a few things. Functions begin with fn; we come from a company that's very familiar with JavaScript, where the keyword is function and it's very long, so we made this one very short, since you type it a lot. Inside a function you declare local variables with let, and this Vec::new is invoking what is essentially a static method: new is not a keyword, just a method that by convention creates an instance of the type it's attached to and returns it. The representation at runtime is exactly the same as C++: we have our fields on the stack. I also want to highlight the mut keyword: in Rust most things are not mutable unless you say so, and here we're declaring that we are allowed to write to this book and the data it owns, which is helpful when you're reading code because you can see pretty quickly what's changing and what's not. I'll indicate that with the little pencil. When I call book.push, I get data living in the heap, some strings or what have you; so far it's pretty much what we saw before.

Where things get interesting is when I call publish: you can see it has an argument of type Vec<String>. If you have a function that takes a Vec<String> in Rust, it's saying: I need ownership of this argument, give me a vector all of my very own. On the other side, the default is to say: I have a vector, you want a vector, I will give it to you; it's not mine anymore. At runtime, that means we copy over the fields that are on the stack, the heap data transfers along with them, ownership transfers along, and then we forget about those fields as if we never had them. Now publish can execute; it owns the book, it has the only access to it, it can read it, and when it's done, everything it owns it can free: it runs the destructor. When we come back is where Rust starts to diverge: we gave away our book, but here we're trying to use it again, and what happens is a compilation error, because the compiler is tracking what you own and what you've given away at each point in the program, and it won't let you use the things you've given away.
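Here is a minimal sketch of that ownership transfer, following the book example from the slides (the strings themselves are made up):

```rust
fn publish(book: Vec<String>) {
    // `publish` owns `book` now; when it returns, the vector and its
    // strings are freed here.
    for line in &book {
        println!("{}", line);
    }
}

fn main() {
    let mut book = Vec::new();
    book.push(String::from("My Amazing Book, chapter 1"));

    publish(book); // ownership moves into `publish`

    // publish(book); // error[E0382]: use of moved value: `book`
}
```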
We track this at a pretty fine granularity, so you can, for example, give away one field of a struct, or give away some fields and then re-initialize them, but usually it's a whole variable at a time. This is somewhat different from copy constructors: C++ has a very similar notion of ownership, but the defaults are switched. If I take that same example transliterated into C++, it starts out the same, the memory layout is the same, and when I call publish, the signature is in a sense the same: give me a vector that I'm going to own. What happens on the other side, however, is that instead of giving away the vector we have, we invoke the copy constructor and do a deep clone of the data. From publish's point of view it's all the same: it owns this book, it can read it, it will free it when it's done. But when we come back, we're in a very different state: we haven't given away our book, we still have it, and we could call publish again and again. You can do the same thing in Rust, of course: if you don't want to give away the thing you have, you can clone it, by calling clone, as in this Rust code. Like I said, we simply switched the defaults around: in Rust a deep clone is explicit, and the default, if you have something that owns a lot of memory, is to give ownership away, which is the more efficient thing. We do have a special case for things like integers and floating-point numbers: those are what we call copy types, which is kind of like plain old data, and as soon as you touch them they are implicitly copied, because there are no real memory semantics associated with them.

So they're not quite copy constructors; moves are somewhat similar to rvalue references. At this point we're getting past the C++ I learned in high school, so if I make a mistake, feel free to correct me, but I think I have the basic gist. If we change this program to use std::move, that creates an rvalue reference to the book, and the net effect, at least in terms of what happens at runtime, is very similar to Rust: we copy over the stack fields but take ownership of the heap memory, and again publish has its own vector which it can use. However (sorry, this is the big danger of the animations), we are in a different state when we get back. The main difference is that the variable book is still a usable variable, just left in an unspecified state, and in particular its destructor will still run when we exit the function. In Rust that's not the case: when you give ownership away, it's as if you never had it. The destructor doesn't run, and it's all tracked at compilation time. That makes a big difference, not only because you find errors faster, but because when you're building APIs it lets you use zero-sized or phantom values as tokens that you can give away and then lose access to forever; you can layer that to build more complex invariants into your APIs. So that's ownership: it's fairly straightforward, and it's a very common way to access data.
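And here is a small sketch of the explicit-clone version, plus a copy type; again the book contents are invented:

```rust
fn publish(book: Vec<String>) {
    println!("published {} lines", book.len()); // owns `book`, frees it when done
}

fn main() {
    let book = vec![String::from("chapter 1")];

    // A deep copy is explicit in Rust: clone the vector and its strings,
    // give the clone away, and keep the original.
    publish(book.clone());
    publish(book.clone());
    publish(book); // finally give the original away too

    // Plain-old-data "copy types" like integers never move; they are
    // implicitly copied, so both uses below are fine.
    let x: u32 = 22;
    let y = x + 1;
    println!("{} {}", x, y);
}
```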
But in real life, usually if I give my book to someone, I'm not saying take it forever; I'm saying, here, why don't you read it and give it back to me when you're done. That's usually what we want to do with data structures too, and that's where borrowing comes in. There are two kinds of borrows in Rust. The first and most common is what we call a shared reference. It's written &T, and as the name suggests it allows sharing, but we'll see that it does not allow mutation, because we said we want to keep those two things separate. So let's go back to our example, but with the type of publish updated: instead of taking a Vec<String>, it takes an &Vec<String>, a reference to a vector. On the other side, instead of giving publish the book, I use the ampersand expression: I'm sort of taking the address of book, but in Rust terms I'm borrowing the book to make a reference, which I can then hand over. At runtime this is just a pointer; that's how an &Vec<String> is represented. And it disallows mutation, which I'll go into in more detail after this slide; for now I'll put the little slash there to show what I mean. Now publish can execute with a borrowed copy of the book, a pointer to the book. It can use it, and when it's finished, no memory gets freed, because it doesn't own anything; it only borrowed it. So we can call it again and again, because the book has not been invalidated.

Now, about this mutation thing: I have a theory that everything you need to know about Rust you learned in kindergarten, which here amounts to: don't break your friends' toys. When you're borrowing someone else's thing, you really shouldn't be breaking it or writing in it; you should be nice to it. That's what happens with shared references. When we first create the book (this is the combined snippet in one function), it's in a mutable state and we own it, so we can do things like call book.push, and that's fine. But when we enter this block (I'm just using the block to scope the variable r, it has no other purpose) and we make this reference, we put the book into a borrowed state in the compiler's mind. That means that during the period where the reference is in use, we can't mutate the book: we couldn't call book.push, that wouldn't be allowed, and we similarly can't call r.push through the reference; both of those would effectively do the same thing if we allowed them. You can see that we have two paths to the same book: you can go through the original variable book, or through r, and you could also copy r and make many references, and none of them allow mutation. You also see that mutation in Rust is not a binary thing; it's not that data is immutable forever or mutable forever, it's really about periods: during this period it's mutable, during that one it's not. That shows up here because when we exit the scope, the variable r goes out of scope, the borrow ends, and we can again call book.push; that's perfectly legal at this point, since there are no references to it anymore. I should add one caveat: I've been saying immutable and mutable this whole time, and that's actually a little strict. There are ways to mutate shared, aliased memory, but you have to go through special APIs, which essentially have to use unsafe code internally; I'll talk about that later.
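Here is roughly what that scoped shared borrow looks like, with the disallowed call left in as a comment:

```rust
fn main() {
    let mut book = Vec::new();
    book.push(String::from("chapter 1")); // ok: we own it and it's `mut`

    {
        let r = &book;        // shared borrow: `book` is now "frozen"
        // book.push(String::from("chapter 2"));
        //     error[E0502]: cannot borrow `book` as mutable because it is
        //     also borrowed as immutable
        println!("{} chapters", r.len()); // reading through `r` is fine
    } // `r` goes out of scope here, so the borrow ends

    book.push(String::from("chapter 2")); // ok again
}
```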
An example of where you would want that is a mutex protecting some data: if you couldn't share it, you would just have the mutex in one thread, which is kind of pointless. You want to share it among many threads and have each of them acquire the lock to gain access; that's an example of a special API that enables mutation. But the default is: if it's shared, you can't mutate it.

If we go back to our first example, here is the Rust version of the vector we were going to invalidate while holding a reference to its zeroth element, and we can see that the Rust compiler rules this out. When we take a reference to the zeroth element of the vector, that freezes the zeroth element, but also the vector itself, because it's the whole container (the Rust compiler doesn't reason about individual indices in a very fine-grained way), and then we can't push onto the vector anymore; that isn't allowed while this element reference is out there.

Okay, so that's a shared reference. The other kind of reference is a mutable reference, and as you might expect if you've been following the table, a mutable reference allows mutation but does not allow sharing. Let me show you what I mean. In this code example I have a mutable reference to a vector; it looks a lot like the shared reference, but it has the mut keyword in it, signaling that we're going to mutate, and on the other side, instead of writing &book, I write &mut book. So again, if you're reading through the code, you can see where the mutations happen. At runtime this is exactly the same as a shared reference, just a pointer, but in the compiler's mind it has a different significance: instead of putting the variable into an immutable state, it goes into a kind of locked state, where we don't allow access through the original variable at all. Now we can run our function; it can make changes through this reference, and they affect the original variable, so we've pushed onto the book. And we could call it again: we haven't given ownership away, and all of this happens while we remain the owner. So what does it mean to block out all access? This is the same kind of diagram, and we can walk through it: initially the book is mutable, just like before, but when we make a mutable reference, we're borrowing it mutably, and now it's a little different from before. Previously you couldn't do any mutations, but through either path you could read; now, if we try to use the original path, the book, even just to read it, just to get its length, we get an error, while through the reference we can do anything we want: read it, write to it, full access. And all of this is temporary, just like before: when the borrow ends, we return to our original state, where we can mutate the book. There are a lot of reasons we set it up this way; we found it scales better in various ways, but one reason that's easy to see is that when the mutable reference has sole, complete access, you can do things like send it to another thread and be sure that no one in the original thread is reading from the data.
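A sketch of the mutable-borrow rules just described, including the earlier vector-invalidation case; the error text in the comments is approximate:

```rust
fn publish(book: &mut Vec<String>) {
    book.push(String::from("chapter 2")); // mutate through the reference
}

fn main() {
    let mut book = vec![String::from("chapter 1")];

    {
        let r = &mut book;          // mutable borrow: `book` is locked out
        // println!("{}", book.len());
        //     error[E0502]: cannot borrow `book` as immutable because it is
        //     also borrowed as mutable
        publish(r);                  // full read/write access through `r`
    } // the mutable borrow ends here

    println!("{} chapters", book.len()); // we still own the book

    // The dangling-reference example from earlier is rejected the same way:
    let elem = &book[0];
    // book.push(String::from("chapter 3")); // error: `book` is borrowed
    println!("{}", elem);
}
```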
In other words, I can't be reading the book through the original variable while another thread writes to it through the reference; that would induce a data race, and we'll see that later on.

There's one key idea here that has been implicit, and I want to make it explicit. I've been talking about how, when you make a reference, it freezes the variable for a span of code: in this case, if I make a reference r to book[0], I freeze the book from the let until the end of this block. That concept, this span of code, is what we call a lifetime, and in particular it's the lifetime of this reference. We give lifetimes names, usually not within a function, but across functions, as we'll see, you sometimes want to name them. The lifetime is actually part of the type of the reference, if you go into the full details: where I wrote &String, that's shorthand for "a reference to String, and the compiler figures out the lifetime". If we wrote it explicitly, it might look like &'l String; the tick represents a lifetime, so this says there's a lifetime named 'l, which is the lifetime of this reference, and it refers to this span of code. We're saying how long the reference is valid, how long it can be used, and the compiler won't allow you to use the reference outside of that lifetime. You don't have to worry about this within functions, it's all implicit (in fact we don't even have a syntax for naming a lifetime within a function, although I'd like to add one), but across functions it's very useful. Here's a function that returns a reference to the zeroth element of the vector you give it, and you see these annotations: what they do is tell the caller, who doesn't know the function's body, where the returned reference comes from. By tagging both references with the same lifetime, we're saying: I'm returning to you a reference to a String that I got from this argument v. That seems pretty obvious here, because there's only one reference as input, so there's nowhere else it could have come from, and in fact Rust doesn't require you to write it explicitly in this case: we have elision rules that let you leave those annotations out. But sometimes, when you have many references, it can be important to clarify where the value comes from and where it doesn't. And this lets us track borrows across functions: if we have this function, first, and I borrow book, but instead of assigning the reference directly to a variable, where the compiler could easily see what's happening, I pass it to a function that returns something to us, the compiler can use these annotations to track what happens: some sub-part of this reference is stored into r, and r is used in this block, so book is borrowed until the reference r goes out of scope. That means in particular that we won't allow you to, say, modify book in the meantime, even though there's no direct reference to book anymore; it got passed away. So that's the heart of Rust, essentially: what I just covered, the ownership and borrowing system, is one of its key parts.
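Written out, the first function and its caller might look roughly like this:

```rust
// The explicit version of `first`: the returned reference is tagged with the
// same lifetime `'l` as the input, telling the caller that the result
// borrows from `v`. (The elision rules would let us leave `'l` out here.)
fn first<'l>(v: &'l Vec<String>) -> &'l String {
    &v[0]
}

fn main() {
    let mut book = vec![String::from("chapter 1")];

    let r = first(&book);   // `book` is now considered borrowed...
    // book.push(String::from("chapter 2")); // ...so this would be an error
    println!("{}", r);      // ...until the last use of `r`

    book.push(String::from("chapter 2")); // fine once the borrow has ended
}
```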
The other key part is our approach to generic programming. I think this is another place where we took a lot from C++ and tried to add some other influences on top, in particular some from Haskell. I've called this section traits; traits are the cornerstone of generic programming in Rust, and they're somewhat analogous to a concept, although I'm sure the details are very different. Unfortunately the display is clipping part of this slide, but what it should say is trait Clone. We've actually seen this already: earlier I said that if you want to clone a book you can call book.clone(), and what that invokes is a generic method called clone, defined in this trait. A trait is like an interface that you implement for a given type; traits can have associated types and different kinds of members, but the most common is a function. Here I'm saying Clone is an interface with one method, and giving that method's signature. You see two different references to self. The first, &self with a lowercase s, declares the receiver; the this keyword is implicit in C++ but explicit in Rust, in part because there are different modes for functions. In this version I'm saying that when you call clone on a vector, I don't need to take ownership of the vector, I just need a reference to it, because I only have to be able to read it in order to clone it. One nice byproduct is that when you read an interface declaration, you can see which functions will mutate and which won't, a bit like having a const receiver. The other use, Self with a capital S, refers to the type, like a generic type parameter naming the type that implements this interface: whatever type you called clone on, that's the type you get back.

Then we have an implementation, which is how you actually provide a version of the interface for some specific type; here I'm defining it for vectors. The first keyword that's cut off should be impl, for implement, so we're implementing Clone for vectors. The basic model is very similar to C++, in the sense that for every different kind of vector, Vec<u32>, Vec<String>, whatever, we generate a custom copy of this code, tailored to and fully optimized for that type. But how we handle the type checking is very different: we do that checking once, on this generic definition, not on each individual instantiation. That's where these requirements come in: the impl says, here's how you clone a vector for any type T, provided that T itself is cloneable, because when you clone a vector you also clone all of its contents, so they have to be cloneable. In the actual implementation, we create a new vector, like the one we just saw, only I'm calling it v, and then we iterate. I don't have time to go into exactly how this works, but basically we create an iterator, and elem is a reference to each element of the vector in turn, so its type is &T; on each of those references I call the clone method, getting back a value of type T that I push into the new vector, and when I'm done I return the completed vector. So that's a very simple way to clone a vector.
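Here is a paraphrase of that trait and impl; the trait is renamed so the sketch compiles outside the standard library (coherence forbids re-implementing the real Clone for Vec<T> in another crate):

```rust
// A paraphrase of the standard library's `Clone` trait and its `Vec`
// implementation, under a hypothetical name.
trait MyClone {
    fn my_clone(&self) -> Self;
}

// One generic impl, type-checked once; the compiler then generates a
// specialized copy for each concrete `T` it is used with.
impl<T: MyClone> MyClone for Vec<T> {
    fn my_clone(&self) -> Vec<T> {
        let mut v = Vec::new();
        for elem in self {           // `elem: &T`
            v.push(elem.my_clone()); // clone each element in turn
        }
        v
    }
}

impl MyClone for String {
    fn my_clone(&self) -> String {
        self.to_string()
    }
}

fn main() {
    let book = vec![String::from("chapter 1")];
    let copy = book.my_clone();
    println!("{}", copy[0]);
}
```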
Most of the time, when you're doing generic programming, you're using traits and their methods, but we also have some traits that don't have any members at all; we call them marker traits. These express properties of a type: they don't have operations themselves, but they enable other things. Some are built into the language, some are not. Here are two examples, both of which have come up briefly in this talk. The first, Send, is not really built into the language, but it distinguishes types that are thread-safe from types that are not. Anything that owns its data is considered Send: it gets an implementation of Send, using various language features, so that this is basically automatic; we have a conservative approximation of which things are thread-safe, and such types are automatically considered Send. So strings, integers and so on are all safe to send between threads, because when you send them you also transfer full ownership of their memory. But we saw earlier, in our very first story, something like Rc<String>, where you implicitly have a shared ref count that is manipulated non-atomically; that is not safe, and it does not implement Send. We also have an atomic version, which we call Arc, for atomic reference count, and that one is Send. It's an interesting example: I mentioned that Send is sort of automatic, so String and u32 don't need any extra annotations, and Rc<String> is ruled out by the conservative check, but Arc<String> is not automatically considered thread-safe either. It uses unsafe code (which we'll get to later) to opt back in and say: I know this looks dangerous, I'm manipulating shared state, but I'm doing it in a thread-safe way, so trust me on that. Then we saw Copy: I mentioned that for integers and such there's no ownership tracking. Copy is a marker trait that says any type implementing it can be memcpy'd from place to place, which works for floating-point numbers, integers and so on. This one is completely built into the language, because it's so core to the type checking. Things that have destructors, like String and Rc<String>, and also things that might have destructors in the future, like types you define yourself, are not automatically Copy; you have to opt in to say a type may be treated as Copy.

(Question: Rc<String> does not implement Send, so what would happen if I tried to add that myself?) The answer is that you can't, because of rules called coherence rules, which define who is allowed to implement which traits for whom. If you define a type, you are the only one allowed to implement traits for it, so to speak. Send is in the standard library, and String and Rc are also in the standard library, so only the standard library can implement, or choose not to implement, Send for them. But if you define your own type, then yes, you can choose whether to implement Send for it, and similarly, if you define your own trait, you can implement it for String. You have to have defined one of the two; that's how we prevent conflicting implementations for a given type. (Question: if there are two libraries, one of which defines the trait and the other the type, do they have to know about each other to a certain extent?) Yes, at the moment they do. We have features for that in the package management system, so one of them can have an optional dependency on the other, and people can say: if you happen to be using both of these, then please do provide this impl. It's an interesting challenge, one we've been working on for a while, because on the one hand it would be nice to allow anyone to implement any trait for anything; on the other hand, you can then imagine having two different ways to hash a particular type and two different hash tables using two different hashing techniques, and if they get confused about which one is in use where, everything goes crazy. So we've been trying to find a nice compromise; I think we would like to enable it in some way, but we're not sure of the best way yet. (Question about Haskell.) Haskell supports incoherent instances if you opt in, I think, but we don't allow you to opt into that at the moment. (Question: to solve this problem, could you create a strong typedef, a struct that wraps the type with a single field, and implement the trait for that?) Yes, you can do that, and it's a common workaround; sometimes it's not what you want and can be inconvenient, but one part of the solution we've been thinking about is just to make that easier to do, since it's often good enough.
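A minimal sketch of that newtype workaround, using Display and Vec<String> purely as stand-ins for "a foreign trait" and "a foreign type":

```rust
use std::fmt;

// Coherence forbids `impl fmt::Display for Vec<String>` here, because both
// the trait (Display, in std) and the type (Vec<String>, in std) are foreign.
// The common workaround: a local "newtype" wrapper with a single field.
struct NameList(Vec<String>);

impl fmt::Display for NameList {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0.join(", "))
    }
}

fn main() {
    let names = NameList(vec!["ada".to_string(), "grace".to_string()]);
    println!("{}", names); // ada, grace
}
```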
Okay, so I want to talk a little bit about parallelism, and I'm going to lean here on the last two sections, ownership and borrowing and generic programming. The reason is that Rust actually took a big turn in its development when it comes to parallelism. We always wanted a language with strong support for parallelism, and initially the idea was to build it into the language, probably taking an Erlang-like approach, because that seemed like the only thing we could think of that would work: each thread owns some memory, and threads send messages to one another. But once the ownership and borrowing system and the trait system evolved, we realized we could move all of this into libraries, which can leverage those two systems to avoid data races and give all the same guarantees we had thought the language had to provide. In fact Rust the language knows basically nothing, or let's say very little, about threads: it has a few intrinsics for atomic operations and the like, but other than that it all comes in at the library level. It turns out that ownership and borrowing, even though we developed them for sequential programming, are a really good tool for avoiding data races, and if you look at the definition of a data race, that makes sense. A data race is the very common error where two threads share access to one piece of memory, at least one of them is writing to it, maybe both, and they're not coordinating with one another, not using any kind of ordering. That's basically the first half of this talk: it's all about avoiding sharing combined with mutation. If we avoid shared mutation by default, we don't have data races at all, and the only places you have to be careful are where you introduce shared mutation back in, say with a mutex, and that's when your API has to make sure ordering is enforced.
Here's a cool example (the slide isn't cut off too badly): a parallel quicksort using rayon, the library I mentioned earlier for fork/join parallelism. It uses a few Rust features we haven't seen, so let me walk you through it. The input is a mutable reference to a slice of integers. A slice is kind of like an array: it has some number of i32s, 32-bit integers, sequential in memory, and it carries a length, so we can do bounds checking. We do the usual quicksort thing: if the length is at most one, it's already sorted; otherwise we pick a random element and partition around it, so the slice is partially sorted. Then we call split_at_mut, a library method that takes a slice and a midpoint and gives you back two views onto the same memory, split at that midpoint; less and greater now refer to the left and right halves of the input. It might seem like we're violating the prime directive here by having aliasing, since we now have a path through the original slice and paths through less and greater that can reach the same elements, but that doesn't happen, because of the lifetime features I was talking about earlier. The function declares, when it returns its references, where they came from: we know less and greater were derived from the input because they're both annotated with the same lifetime, so while less and greater are in use, the input is considered borrowed. When you call split_at_mut you're effectively trading in a reference to the whole array and getting back two references, one to each half, while losing access to the original.

(Question: do you also specify that the ranges don't overlap?) Good question, and I'll get into it a little on a later slide, but in this case they can't overlap by construction: we gave one midpoint and got back zero-to-mid and mid-onward, which don't overlap; we didn't give two independent ranges. Of course, we might have a bug in the implementation of split_at_mut itself; that's a perfect lead-in to a later slide. So let's assume for now that split_at_mut is correctly implemented and we have two disjoint views onto the original array. Then we call rayon::join. join is a function defined in the rayon crate, which is the library, and it takes two closures as arguments; those are the parallel parts. Conceptually it starts a thread for each closure, lets them run, waits for them to finish, and then returns to you; under the hood it actually uses a work-stealing runtime to do this more efficiently. The key point is that the compiler doesn't know the closures are going to run in parallel. All it knows is that whenever two closures exist at the same time, they can't both have access to the same mutable state, because that would violate the rules we've been discussing. So the compiler will already check that the set of variables each closure mutates is disjoint: here we have less in one and greater in the other.
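Here is a compilable sketch along those lines; the partition helper is my own simplified stand-in (it always pivots on the first element rather than a random one), not the code from the slide, and the rayon crate is assumed as a dependency:

```rust
fn quick_sort(v: &mut [i32]) {
    if v.len() <= 1 {
        return; // already sorted
    }

    // Partition around the first element; `mid` is its final position.
    let mid = partition(v);

    // Two non-overlapping mutable views into the same slice.
    let (less, greater) = v.split_at_mut(mid);
    let greater = &mut greater[1..]; // skip the pivot itself

    // Conceptually: run each closure on its own thread and wait for both.
    // The compiler checks that the two closures touch disjoint data.
    rayon::join(|| quick_sort(less), || quick_sort(greater));
}

// Simplified Lomuto-style partition using the first element as the pivot.
fn partition(v: &mut [i32]) -> usize {
    let pivot = v[0];
    let mut store = 1;
    for i in 1..v.len() {
        if v[i] < pivot {
            v.swap(i, store);
            store += 1;
        }
    }
    v.swap(0, store - 1);
    store - 1
}

fn main() {
    let mut data = vec![3, 1, 4, 1, 5, 9, 2, 6];
    quick_sort(&mut data);
    println!("{:?}", data); // [1, 1, 2, 3, 4, 5, 6, 9]
}
```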
If I made a mistake, say I used less in both closures, or the original slice in one of them, the compiler would flag it as an error, and all of that happens through the base type system; rayon doesn't have to specify anything about it. That might sound like an unusual bug, but I've actually made that mistake several times when copying and pasting code.

(Question: are the closures implicitly borrowing by mutable reference?) Yes, effectively. We analyze the closure body, and for each variable we figure out a mode: by value, by shared reference, or by mutable reference, based on how you use it. If they were only reading from less and greater, it would be a shared reference; in other cases a closure might take ownership, like if you take some vector and send it to another thread inside the closure. (Question: why don't you have to write &mut when quick_sort is called inside the closure?) Because if you have an &mut reference and you give it to a function that needs an &mut reference, the types already match. (Question: does the inference take into account what you will do with the closure?) Yes, sort of. The closures themselves are callable, and there are traits for being callable; the model is very similar to the C++ one: a closure is essentially an anonymous struct with the things it touches as its members, and it may implement one or more of three callable traits, one for calling by ownership, one by shared reference, and one by mutable reference. When I declare join, I'm saying I need two closures, and I pick which of these traits to require based on how I'm going to use them; in this case I take the ownership version, because I'm going to take these closures and send them to different threads, and they're only going to be executed once. So there are two parts: the one who receives the closure declares the conditions under which they will call it (exactly once or at most once, sequentially but many times, or in parallel with itself), and on the other side, based on what you touch and how you use it, we pick the maximum the closure can support, and the two have to match or you get an error. (One more question in the back: could you figure out automatically when you need &mut and when you don't, for the captured variables?) We could in some cases; if you get into the details it can get quite tricky, as type inference does. We're looking into some changes in that direction, though certainly not for mutable references, maybe for shared references, but it's unclear. The thing is, it's actually really useful when reading Rust code to have a visual indicator not only of where mutations happen but also of ownership and borrowing, of where you're giving ownership away; it lets you predict compilation errors better. We had versions of this in the past and took them out, because we found they were actively confusing: you would expect a compilation error and not get one, or vice versa.
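A small sketch of those three callable traits and the capture modes they imply; the helper functions here are hypothetical, not part of any library:

```rust
// Three hypothetical helpers, one per callable trait, to show how the
// consumer declares how it will call the closure.
fn call_once<F: FnOnce()>(f: F) { f(); }          // called exactly once
fn call_many<F: FnMut()>(mut f: F) { f(); f(); }  // called sequentially, many times
fn call_shared<F: Fn()>(f: F) { f(); f(); }       // callable through a shared reference

fn main() {
    let book = vec![String::from("chapter 1")];
    let mut count = 0;

    call_shared(|| println!("{}", book[0])); // reads only: captures `&book`
    call_many(|| count += 1);                // mutates: captures `&mut count`
    call_once(move || drop(book));           // `move`: takes ownership of `book`

    println!("count = {}", count);
}
```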
(Question: how can you see that you're getting mutable references back?) Well, you see it from the signature, in this case: that's the reason it's called split_at_mut, you're declaring that you're getting back two mutable references. There's also a split_at that doesn't have that characteristic. The thing I want to emphasize, and I brought it up earlier, is that the whole reason this works at all, the reason we can send mutable references to other threads just because they're part of a closure, is the rule that says when you have a mutable reference there is no aliasing: there are no other threads reading from the data, because even if the other thread were only reading while the new thread writes, that would produce a data race. More questions, or are we happy? Okay, one more: do we plan on specifying the type of the capture? I left out one thing, actually: there is a single flag you can use. You can write move in front of the closure, and in that case you're declaring that you take ownership of everything you touch. You do that specifically when the closure is going to be returned out of the function or in some other way escape the current stack frame, because then you need to force there to be no connection, even if it would have been safe. No, we don't do it implicitly, because it actually affects the semantics in some cases: if you have a single variable and you increment it, are you incrementing the one on the stack or your own copy? Which one are you changing? So move declares whether you want your own copy of the variables or the copy from the stack, and using move you can actually emulate a full capture clause, because you can move a reference in, and then you've effectively declared how you want each variable captured.

Okay, so I mentioned that Rust has a lot of different parallelism paradigms; I don't have time to go into all of them. We just saw how fork/join programming works and how it builds on borrowing; it also builds on traits. I got distracted by all the questions, but I should say the one thing rayon's join does declare: it says, I take two closures. It doesn't tell the compiler they run in parallel, but it does require that everything they touch is sendable, and we talked about Send earlier; it requires that because it's going to send the closures to other threads, so you'd better not be using an Rc or something else that's not thread-safe in there. So it leans on borrowing and on traits. We also have other things, like message passing, which is built into the standard library. You can think of message passing as just ownership transfer: if I have some data and I want another thread to have it, I give ownership away, and the only requirement is that the data is Send, which was the thing Ada ran into at the very beginning. We support locking: there are read-write locks and normal mutexes, and the APIs are set up so that if some data is protected by a lock, the lock owns that data, and hence if you want access you have to go through the lock, meaning you have to acquire it; you can't forget to acquire the lock. And we're now actively doing a lot of work on lock-free data structures and futures-style programming for asynchronous I/O, and so on.
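Here is a sketch of those two styles using the standard library's channels and Mutex:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

fn main() {
    // Message passing as ownership transfer: the vector is *given* to the
    // other thread through the channel, so only one thread can touch it.
    let (tx, rx) = mpsc::channel();
    let data = vec![1, 2, 3];
    tx.send(data).unwrap(); // `data` is moved; using it afterwards would be an error
    thread::spawn(move || {
        let data = rx.recv().unwrap();
        println!("got {:?}", data);
    }).join().unwrap();

    // Locking: the Mutex owns the data it protects, so the only way to
    // reach the vector is to acquire the lock first.
    let shared = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec<_> = (0..4).map(|i| {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            shared.lock().unwrap().push(i); // access only through the guard
        })
    }).collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("{:?}", shared.lock().unwrap());
}
```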
Okay, so let's talk about unsafe programming. There was a question earlier about split_at_mut: how do we know that it's right? Basically, the Rust type system has been designed to not be that smart. It has these relatively simple patterns of ownership, shared reference, and mutable reference, which are enough most of the time but don't cover everything you want to do. We've already seen some exceptions, like Rc, and for those things we have the ability to opt out and go into a superset of Rust we call unsafe Rust: you write a block of code using the unsafe keyword, and then you can break the rules; you can have what are essentially C pointers that point anywhere and can be shared or mutable however you choose. The idea is not that you take these unsafe blocks and sprinkle them anywhere you happen to be getting a type error; you can do that, and some people do, but it wouldn't be best practice. Instead, the idea is that you build an abstraction around the unsafe code that is safe, and split_at_mut is an example of such an abstraction. There is some responsibility there: the question earlier was how we know the two halves are disjoint, and the compiler doesn't know; it trusts us. We could design the API in such a way that we have a bug and accidentally hand out overlapping ranges. In this case the logic is fairly straightforward and we're pretty confident it's right, but it can be subtle sometimes. We know we're not going to be able to prove everything that humans can do; that's not a goal of ours. The goal is that most of your code should not be using unsafe, and the nice part is that when you do have a problem, say a crash, you have a pretty good idea where to go looking: those unsafe blocks and unsafe abstractions, where maybe you made a mistake.

I was trying to get some good statistics on how often unsafe code comes up (I know you can't read this slide, don't worry). I was working with Diane Hosfelt, who works on Servo, which I haven't talked about yet but will in a bit; she has a tool for measuring how much unsafe code there is relative to other code. It turns out this is actually a hard thing to measure, but we did our best. For the Rust compiler, for example, we came up with this number: the compiler is 350 thousand lines of code, more or less (excluding some parts), and approximately 4% of that is unsafe, meaning there are unsafe blocks in its vicinity. A lot of that is because we call into LLVM; if you think about it, calling into C code is inherently unsafe, since it can do whatever it wants, so there's not much we can do about that. Bindings to C and C++ libraries tend to be a really heavy source of unsafe code. But that gives you a feeling. In my daily usage I very rarely write unsafe code, unless I'm working on rayon, the work-stealing library, where, yes, the core is written in unsafe code. But even there we've factored the library into two parts.
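For a flavor of what such a safe wrapper around unsafe code looks like, here is a rough version of a split_at_mut-style function; the real standard-library implementation differs in detail:

```rust
// A small unsafe core wrapped in a safe signature. The *author* must
// guarantee the two halves don't overlap; callers never need `unsafe`.
fn split_at_mut<T>(slice: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = slice.len();
    assert!(mid <= len);
    let ptr = slice.as_mut_ptr();
    unsafe {
        (
            std::slice::from_raw_parts_mut(ptr, mid),
            std::slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut data = [1, 2, 3, 4];
    let (a, b) = split_at_mut(&mut data, 2);
    a[0] = 10;
    b[0] = 30;
    println!("{:?}", data); // [10, 2, 30, 4]
}
```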
So, all right, I wanted to turn here and talk some about putting Rust in production and what that experience has been like. We've recently shipped Rust in Firefox, and as I said, it was not a completely no-brainer decision. For one thing, Firefox is a very big C++ code base, and we are not going to rewrite the whole thing at once. Actually, before I go into this: what we tried to do, in parallel with developing Rust, was a project called Servo. Servo is essentially a reimagining of what you would build if you did rewrite everything from scratch; if you said, okay, instead of taking Gecko, which was built in the 90s for unicore processors without GPUs, how would we render a web page now, with today's computers, what would that look like? That was a big task, and we had a lot of early success, which I'll talk about later, and all of that informed the advanced features that we might want to bring into Firefox eventually.

But when it came time to really put Rust in, we went looking for something smaller, basically the smallest amount of Rust code that we could actually put in, because we would then have to do all the other work: the makefiles, testing that we support all the different platforms that Firefox targets (for example, Rust did not support Windows XP at that time), and all these other things. The first choice turned out to be an mp4 demuxer. It takes in an mp4 video, parses the header, and splits it out into audio and video. This code was already written in C++, and it's actually a fairly simple operation, but this sort of thing is a common source of security vulnerabilities, because it's all this indexing, and there are mean people out there who will write mp4 files specifically aimed at your code and the bugs in it, trying to take over your machine. So Rust seemed like a pretty good choice for that.

This is now shipping; I think it shipped in March 2016, it says so on the slide although you probably can't see it. What you're looking at here, this big blue circle, is Firefox telemetry: for some percentage of users who opt in, we send data back on different measurements, and one of the measurements is this. We ran this Rust code in parallel with the C++ code, because we didn't trust it fully yet, and we checked that they do exactly the same thing, and this number is how many times that check was successful. It's one hundred percent, so that was pretty cool. This code has now run about 1.2 billion times out in the world, and so this was kind of our first foray.
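One reason parsing code like this is a natural first target is that in safe Rust an out-of-range read is either a panic or an explicit error, not silent memory corruption. Here is a toy illustration only; the "box header" layout and the function name read_box_header are invented for the example and have nothing to do with the real demuxer code.

```rust
// Toy illustration: a made-up fixed-size "box header" read. The point is
// that a truncated or malicious input surfaces as a handled `None` rather
// than a read past the end of the buffer.
fn read_box_header(data: &[u8]) -> Option<(u32, [u8; 4])> {
    // Fallible slicing: `get` returns None instead of walking off the end.
    let header = data.get(..8)?;
    let size = u32::from_be_bytes([header[0], header[1], header[2], header[3]]);
    let kind = [header[4], header[5], header[6], header[7]];
    Some((size, kind))
}

fn main() {
    let good = [0x00, 0x00, 0x00, 0x18, b'f', b't', b'y', b'p'];
    assert_eq!(read_box_header(&good), Some((24, *b"ftyp")));

    // A truncated input just takes the error path; it is not a memory-safety bug.
    assert_eq!(read_box_header(&good[..5]), None);
}
```

A malformed file simply takes the None path, which is the kind of property that makes this code much easier to review than the equivalent pointer arithmetic in C++.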
We have a couple of other similar things like this planned; I don't know if they'll make it or not. One example is the URL parser, another thing which, incidentally, back when I was working at that company DataPower, was the job you gave someone if you were mad at them: all right, it's your job to work on the URL parser now. It seems so simple, but it's surprisingly hard, especially when you get into data URLs and all this other stuff, and it's another place where people can craft inputs and try to attack your machine. So these kinds of targeted areas are good places to introduce Rust in a narrow way.

One of the things I wanted to mention here is that Rust makes this pretty easy: we support the C ABI natively, and our structure layout can be made completely compatible with C structures, so you can very easily call into Rust from C and into C from Rust, back and forth. C++ is a little bit trickier because it has such a rich set of features, and we don't support all of them, so we're working on tools there. The people who did this work, who have also done a lot of the work I'll get to in a second of integrating with the C++ code in Firefox, have had to support templates and classes and the different data layouts and so on. We have some tools for doing that, but it's still a work in progress.
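To make that ABI point concrete, here is a minimal sketch of exposing a Rust function to C. The names Dimensions and area are invented for illustration; they aren't from Firefox or any real binding. With #[repr(C)] the struct layout matches what a C compiler would produce, and extern "C" plus #[no_mangle] give the function the C calling convention and a predictable symbol name.

```rust
// A minimal sketch of exposing Rust to C, built as a cdylib or staticlib.
// The struct and function names here are made up for illustration.

// `#[repr(C)]` asks for the same field layout a C compiler would use, so
// this struct can be passed across the boundary by value or by pointer.
#[repr(C)]
pub struct Dimensions {
    pub width: u32,
    pub height: u32,
}

// `extern "C"` gives the function the C calling convention, and
// `#[no_mangle]` keeps its symbol name predictable for the linker.
#[no_mangle]
pub extern "C" fn area(dim: Dimensions) -> u64 {
    u64::from(dim.width) * u64::from(dim.height)
}

// The corresponding C declaration would be roughly:
//   typedef struct { uint32_t width; uint32_t height; } Dimensions;
//   uint64_t area(Dimensions dim);
```

Going the other direction, calling a C function from Rust, is a matter of declaring it in an extern "C" block and invoking it inside unsafe, which is why bindings show up so heavily in the unsafe statistics from earlier.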
So what are the next steps beyond mp4 headers and URL parsers? I mentioned the Servo project, which was looking at how we could do the rendering engine differently, and there are two main things now coming from that project into Firefox itself. One of them is something called Stylo, and this is kind of an interesting story. Stylo does CSS selector matching. I don't know how much you know about CSS, but basically there's an HTML DOM, which is like a big tree, and you walk down it and, based on some predicates, you assign different styles, and it's kind of an embarrassingly parallel problem. And yet when we tried in the past to parallelize it, we failed. There were a variety of reasons for that; some were things like Microsoft's compiler at the time not being compatible with something else, a lot of annoying issues of that kind, but also stamping out all the data races and gaining confidence that they were gone was very hard, because this was basically manipulating all these DOM data structures that had been designed from the beginning to be single-threaded. People were afraid that they might never get it to work, and that even if they did, maintaining it would be a real headache, so it never landed. But now we've reimplemented it in Rust, so the main logic happens in Rust code that uses FFI bindings over to those C++ data structures, and that makes the problem easier: you know the Rust code is already data-race-free, and you just have to validate that the layouts and so on are correct. This has actually been successful and is showing speedups, which is pretty cool.

But one of the things we noticed along the way of doing all this work for multi-core is that it's already a little bit dated: your machine probably has four to eight cores, but it also has this big GPU sitting there with a lot more parallelism available. So one of the other big projects we're trying to do is to move rendering off of the cores and onto the GPU as much as possible, following the model of game engines. This is actually a movie, let me see if it works. It's showing different engines: that was fifteen frames per second, nine frames per second, five frames per second, and then when we get to Servo we come to sixty frames per second with very low CPU usage. So you can see there's definitely some benefit from doing that, and we hope to be moving that into Firefox at some point as well. To the question: yes, that was done in Rust. I assume, though I haven't looked that closely, that there's a certain amount of the usual GPU programs people write in OpenGL and so on to execute on the GPU itself; beyond that, not that I know of, but I'm actually not sure, it's a good question. Even though it sounds like it's all GPU programming, there's a lot of support code that goes into it, and all the code around it is in Rust.

In short, the next thing I wanted to talk about is what the Rust development process is like, and the Rust community itself. One of the things we've worked really hard on, and probably one of the hardest things, is to be as open and as welcoming as we can to people from all backgrounds, both technically and otherwise. I think what we've been a little surprised by is this: we expected Rust to be a tool for C++ programmers, and it is to a certain extent; there are a lot of C++ programmers using it, or I should say systems programmers in general, people who write C, C++, whatever. But there are also a lot of people who wanted to write that sort of code but were concerned about the maintenance cost, about the learning curve, and who thought that if they shipped a product to their customers and it crashed, that could be a lost customer, and they didn't want to take that chance. A lot of them had been working in, say, Ruby on Rails or JavaScript and Node.js, and they've been experimenting with Rust, and we're pretty excited about getting people from both directions. That's helpful both for Rust adoption and for getting a diversity of ideas about how to improve Rust, because a lot of those languages come with very different expectations about what the interface between the language and the user should be.

If you followed Rust at all prior to the 1.0 release, you probably noticed at least one thing about it, which is that we changed it a lot. I've been working on Rust almost six years, and we've had two years of stability, and I can tell you that before that, if I had some code that was six weeks old, I would just erase it: it was not going to compile, it was going to be a lot of work, I might as well rewrite it. That's exaggerating a little, but it's true that it definitely wouldn't have built. We had to do that because we were doing a lot of experimentation, finding out how the system should work, and that took us a while. So when we declared the 1.0 release, we definitely wanted an end to that; we were tired of it. Our goal is basically to say not that we will never change Rust, but that you will never be afraid to upgrade, because you won't have a hard time getting your code to compile, and we'll maintain that compatibility in perpetuity. But the danger is, okay, if we're going to maintain
compatibility, can we keep innovating, can we keep improving Rust, and how can we do that? So we've been working on a system that lets us do that in a very flexible way, which we call our feature pipeline. If you want to add something to Rust, it starts out in the RFC phase, and this is true whether you are on the Rust core team or whether you are somebody who has never worked on Rust before: if you have some cool idea you want to put into Rust, you write up a description of how it should work and you put it up on our RFC repo, and you will get a lot of feedback. Actually, a better starting point before you post the description is probably to put it on the forums and iterate a little there. Regardless, what we've found is that every RFC comes out better than the way it went in, even when we think we have an idea that's guaranteed perfect and all set up. This is where it's really great that we have such a diverse community, and also where we work really hard to keep the conversations pleasant, because for a lot of people, including myself, if you post up some idea you had and get a whole lot of blistering negative feedback, it's really discouraging. Not only is it discouraging and you may walk away, but we may never find the better solution. So we're always looking for ways to get the best of both worlds, if it's possible to overcome the trade-off.

Once we get through the RFC process, we move to the nightly release. The Rust compiler is basically taking a model from browsers here, in that we release the current master branch more or less every night (sometimes the builds don't succeed for one reason or another), and you can use that, and people do; then you have the bleeding edge. That's where new features get implemented first, and if you want to use these unstable features that are still in iteration, you have to opt into them in your code: you declare, okay, I'm using an unstable feature here, I know my code may not build tomorrow, and I accept that. That's how you can give feedback and influence the direction, because just because an RFC has been accepted doesn't mean the final shape of the feature is known; we keep working on it over time. Then every six weeks we cut a release and say, okay, this contains only the set of features that we're happy with; all the others that we're still iterating on aren't usable from this stable release. So if you don't want to risk your code breaking, you can just stick with stable and you're all set, and every once in a while, when we feel a feature is done, we declare it stable and it goes into the stable release. This pipeline has served us pretty well for letting things be actively developed while keeping a core set stable. We also do a lot of testing: I mentioned the crates.io repository, and for example we run the compiler against all of it on a regular basis, checking to make sure everything still works the same as it used to.
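For what that nightly opt-in looks like in practice, here is a sketch. The feature gate name below is a placeholder, not a real feature; you would substitute the gate for whatever unstable feature you're experimenting with. On a stable compiler, any #![feature(...)] attribute is rejected outright, which is exactly what keeps in-progress features from leaking into stable code.

```rust
// A sketch of opting in to an unstable feature on a nightly toolchain.
// `some_unstable_feature` is a placeholder gate name, not a real feature;
// on the stable compiler this attribute is a hard error, and on nightly
// you would replace it with the gate named in the feature's tracking issue.
#![feature(some_unstable_feature)]

fn main() {
    // ...code exercising the unstable feature would go here...
}
```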
So that brings me roughly to the end of my talk, and I want to leave you with a few things. First off, there are a bunch of resources if you're interested in Rust. We've recently completed a new book, which I think is very exciting; it has a really good direction for learning Rust and covers all of those various features. We have a users mailing list, we have an IRC channel, and I think there are probably Slack channels and other things as well, although those aren't official, and as I said, we work really hard to make sure these are friendly places to come and ask questions. The last thing I'll point you at is a collection of screencasts I have, called intorust.com, which take a similar style to what I did here and teach you the different aspects of Rust. One thing I should mention, because I always forget: I have a lot of stickers, so if you like stickers, please come and find me, and also come find me if you have any questions about Rust.
Info
Channel: Coding Tech
Views: 79,910
Rating: 4.8983297 out of 5
Keywords: rust, rust programming, rust tutorial, rust programming language
Id: _jMSrMex6R0
Length: 70min 34sec (4234 seconds)
Published: Sun Dec 10 2017