Crust of Rust: Dispatch and Fat Pointers

Reddit Comments

Watching all Jon Gjengset videos > watching the LOTR Extended Edition.

This guy is an absolute legend. How awesome is it to live in a time where you can have free, full masterclasses on Rust from an MIT PhD.

👍︎︎ 73 👤︎︎ u/avwie 📅︎︎ May 01 2021

Wow, great video. This guy is super clear in his explanations.

Do we know what amazing vim setup he's using? Thanks!

👍︎︎ 6 👤︎︎ u/lefuet 📅︎︎ May 01 2021

awesome video man!

I'm still bound to Go due to past decisions and this video definitely made me regret that even more. :)

👍︎︎ 26 👤︎︎ u/chance-- 📅︎︎ Apr 30 2021

At the end of the video, when asked about the difference between dyn Fn and fn, you mentioned that the fn cannot be a closure. That's not completely true: a non-capturing closure can also be coerced to an fn. You could pass the closure || println!("Hello world!") to your bar function.

👍︎︎ 6 👤︎︎ u/MEaster 📅︎︎ May 01 2021
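The commenter's point can be checked in a few lines. This is a minimal sketch, and the function names are made up for illustration. The closures return a String instead of printing, so the results are easy to inspect. A named function and a non-capturing closure both coerce to a plain fn pointer. A capturing closure does not coerce, but it can still be passed as a &dyn Fn.

```rust
/// Accepts only a plain function pointer: there is no room for captured state.
fn call_fn_ptr(f: fn() -> String) -> String {
    f()
}

/// Accepts any closure behind a reference, capturing or not (dynamic dispatch).
fn call_dyn_fn(f: &dyn Fn() -> String) -> String {
    f()
}

fn greeting() -> String {
    String::from("Hello world!")
}

/// Runs the three cases from the comment and returns their results.
pub fn fn_vs_dyn_fn() -> [String; 3] {
    // A named function coerces to `fn() -> String`.
    let named = call_fn_ptr(greeting);

    // A non-capturing closure also coerces to a function pointer.
    let non_capturing = call_fn_ptr(|| String::from("Hello world!"));

    // A capturing closure does NOT coerce to `fn`, so passing it to
    // `call_fn_ptr` would be rejected. It still works as a `&dyn Fn`.
    let name = String::from("Jon");
    let capturing = call_dyn_fn(&|| format!("Hello {}!", name));

    [named, non_capturing, capturing]
}
```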

Lol what an intro!

Saving this for later.

👍︎︎ 2 👤︎︎ u/elingeniero 📅︎︎ May 01 2021

I don't understand why the "static method" fn weird() {} couldn't be in the vtable and simply be called without passing in the receiver half of the fat pointer.
s.vtable.weird() seems fine to me?

👍︎︎ 2 👤︎︎ u/qm3ster 📅︎︎ May 01 2021
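This sketch shows how current Rust handles a function like weird. It does not settle the design question being asked. A receiver-less associated function makes a trait unusable as dyn Trait unless it is opted out with a where Self: Sized clause. The Hei trait mirrors the one built in the captions, and weird and say_hei are illustrative names.

```rust
trait Hei {
    /// A method with a `&self` receiver: callable through `dyn Hei`.
    fn hei(&self) -> String;

    /// A receiver-less associated function. `where Self: Sized` excludes it
    /// from trait objects; without this clause, `dyn Hei` would be rejected.
    fn weird() -> String
    where
        Self: Sized;
}

impl Hei for &str {
    fn hei(&self) -> String {
        format!("hei {}", self)
    }

    fn weird() -> String {
        String::from("weird")
    }
}

/// Dynamic dispatch: only `hei` is reachable through the trait object.
fn say_hei(h: &dyn Hei) -> String {
    h.hei()
}
```

The function weird can still be called with static dispatch on a concrete type, for example <&str as Hei>::weird(). It simply is not reachable through a &dyn Hei.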

Thanks for posting the new video u/jonhoo! I'm still watching the video, but from the table of contents it doesn't seem to go into polymorphization. There is still little to be found on this topic; might it be an idea for a future deep-dive on monomorphization and polymorphization? :) Thanks again!

👍︎︎ 1 👤︎︎ u/Pointerbender 📅︎︎ May 01 2021

Super cool video. I love the "machine code" level explanation with fat pointers.

Does anyone know what browser setup he's using? How do I put the title and URL bars on the bottom, and how do I do it on Firefox?

👍︎︎ 1 👤︎︎ u/adbugger 📅︎︎ May 11 2021
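You can observe the fat pointers mentioned in the comment directly with std::mem::size_of. The Rust Reference describes pointers to dynamically sized types as twice the size of pointers to sized types: a slice pointer also stores a length, and a trait-object pointer also stores a vtable pointer. In this minimal sketch, Hei is a stand-in trait like the one in the captions, and pointer_widths_in_words is an illustrative helper.

```rust
use std::mem::size_of;

trait Hei {
    fn hei(&self);
}

/// Width of several pointer types in machine words (usize), in the order
/// [&String, &str, &dyn Hei, Box<dyn Hei>].
pub fn pointer_widths_in_words() -> [usize; 4] {
    let word = size_of::<usize>();
    [
        // Thin pointer: just an address, because String is Sized.
        size_of::<&String>() / word,
        // Fat pointer: address + length, because str is unsized.
        size_of::<&str>() / word,
        // Fat pointer: data address + vtable address.
        size_of::<&dyn Hei>() / word,
        // Box<dyn Hei> carries the same extra vtable word as &dyn Hei.
        size_of::<Box<dyn Hei>>() / word,
    ]
}
```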
Captions
Hello folks, welcome back to yet another Crust of Rust. I'm hoping I didn't bite off too much to cover in an hour and a half to two hours; by the time you watch this recording, you'll see whether I succeeded or not. For this one, I wanted to cover a chunk of semi-related topics around traits: dynamic and static dispatch, the Sized trait, wide or fat pointers, and some of the things around vtables and coherence. They're all tied together, some more so than others, and I get a lot of questions about each one, so I figured I should do one stream where we talk through how all of this works and how it fits together. I'm hoping we don't end up too far down any one rat hole. I'll try to cover most of it in workable detail, and then we might do a deep dive on one of them in some later stream. To contain it to the time, you might find that I skip over some of the detail, and hopefully that'll be okay. I actually wanted to start this time with the Rust book. The Rust book has a chapter on generic types, traits, and lifetimes, and there's one subsection in there on the performance of code using generics. If you've read the book (and I highly recommend you do if you haven't already), this section might have struck you as sounding like magic. In some sense, what this stream is all about is deconstructing that magic and figuring out what's inside. In particular, the section talks about monomorphization, which, as the text says, is the process of turning generic code into specific code by filling in the concrete types that are used when compiled. But that's rather abstract, so let's pick a slightly more concrete example and work from there.
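This is a minimal sketch of the definition quoted above, based on the strlen example the stream builds up. It pairs the generic function with hand-written, non-generic stand-ins for the copies the compiler generates. The names strlen_refstr and strlen_string are illustrative only. The compiler's real copies have mangled internal names and never appear in source.

```rust
/// Generic definition: one piece of source code.
pub fn strlen(s: impl AsRef<str>) -> usize {
    s.as_ref().len()
}

// Roughly what monomorphization produces for the two calls in `foo` below.
// Illustrative stand-ins only, not compiler output.
pub fn strlen_refstr(s: &str) -> usize {
    s.len()
}

pub fn strlen_string(s: String) -> usize {
    s.len()
}

/// Calls `strlen` with two different concrete types, so the compiler
/// instantiates it twice: once for &'static str and once for String.
pub fn foo() -> (usize, usize) {
    let a = strlen("hello world");
    let b = strlen(String::from("hei"));
    (a, b)
}
```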
Let's go with a new lib. Naming is hard, and here the name doesn't matter, which makes it even harder because there are no constraints; we're going to go with eksempel, which is "example" written in Norwegian. Let's get rid of this test; we don't need no test. Let's say that you have a function... actually, let's go even simpler. We're going to have a function strlen that takes anything that implements AsRef<str> and gives you back a usize: it returns s.as_ref().len(). Real simple function. Let me make the font a little bit larger. This is a generic function. You could just as well have written it with a type parameter S and an argument s: S, where S: AsRef<str>. I'll make both of these pub to stop the compiler complaining. They're not quite equivalent, but they do the same thing, and both of them are generic functions. They can take in any type that can be turned into a reference to a str, which is what the AsRef<str> bound says, and the impl AsRef<str> syntax says the same thing. Now let's write a function foo that calls strlen("hello world") and also calls strlen(String::from("hei")); "hei" is hello in Norwegian, a handy shortcut for picking examples. To see why these two calls are different: the first passes in a &'static str, and the second passes in a String. This is the point of the function being generic: it doesn't just accept a string reference, it accepts any type that can be turned into a string reference. String, for example, implements AsRef<str>, because if you have a reference to a String you can trivially get a reference to a str, since the internal representation of a String is a str. So here we immediately see
why generics are useful: they enable you to call into a given function in a more flexible or ergonomic manner. Behind the scenes, the compiler actually generates two copies of this function, one for each of these types, and this is what the Rust book means when it talks about monomorphization. At compile time, we end up with something like a strlen_refstr that takes a &str and calls s.len(), and a strlen_string that takes a String and does the same thing. The compiler turns this one generic definition into multiple non-generic implementations; that's the process of monomorphization, and it's key to how generics work in Rust. This process doesn't just happen for functions; it happens for types too. Imagine that you have a HashMap that's generic over the key and the value: you get a full copy of the entire struct and its methods for each type combination it is used with. Notice that the compiler doesn't generate copies for types we didn't invoke it with. It is not "generate for every possible type"; it is "generate on demand", where demand is determined at compile time. If the compiler sees code that needs this function for a given type, it generates it at that point. Types work the same way: a particular instance of the HashMap type is only generated if that type combination actually appears somewhere in the code. This is also one of the reasons why it's a little bit hard to compile Rust code and then ship people a binary that they can use as a library. Imagine you had this function in a Rust library that you wanted to distribute in binary form; think dynamically linked libraries and such. That's actually fairly complicated, because the contract is that Rust
is supposed to generate distinct copies of this function for each type. But a consumer might instantiate it with a type that isn't defined in the originating library where the function is defined, so you need the source in order to generate all those instances. The reason monomorphization is great is that you end up producing potentially much more efficient code. strlen is a bit of a bad example here, so let's imagine something a little more complex; a good example is HashMap. I'm not going to write out the full definition of a HashMap here, but you end up with a copy of the type for each concrete type you use. Imagine that one instance of the HashMap has a String key and another instance has a number as the key. When the compiler generates the code for each of the HashMap's methods, it can inline the definition of, say, the hashing function for each key type. The String version of the HashMap code would have the code for hashing a string put directly into it, and the compiler could optimize that code based on the fact that the type is a String. Similarly, for something like a number, it might be able to skip hashing altogether, because once it monomorphizes all the generics it realizes that the hash is just the value of the number. That might not actually be true, but you get the sense: the compiler gets to see the concrete, written-out code for the particular types that are used, which lets it optimize a lot better. Does this all make sense before we move on to the more complicated aspects? Does the general idea of generics, and why monomorphization is nice, make sense? Someone observed in chat that there's a downside. Yes: one of the downsides of monomorphization is that your binary ends up being larger, because you
need to generate a copy of the type or function for every type it's used with. It's not quite as bad as multiplying by the number of types, though. One reason is that for HashMap, for example, you might only use a subset of the methods for any given type combination. If you have a HashMap with String as the key type, the compiler doesn't generate every HashMap method for that key type; it only generates the methods you actually end up calling. That amortizes the cost a little bit, but it is true that it leads to more code in the final binary. A side effect of that is that your code can be a little less efficient. With one function called strlen, you have one sequence of machine code that the CPU can jump to and keep in cache, which lets it execute more efficiently the next time around. Instead, you end up with two different functions. Some code jumps to one and some code jumps to the other, and they have to be cached separately because they're separate regions of memory, so you get slightly worse instruction-cache efficiency. It's also a little bit expensive in the sense that the compiler has to generate these copies, but usually that's not much of a cost unless you instantiate with a very large number of types. "Is that one of the primary reasons Rust binaries are larger than C ones?" That's slightly different. Rust binaries statically compile in more stuff, which might be part of the reason. Part of it is monomorphization. Part of it, and I forget the exact details, is that I think more of the standard library gets compiled in, and
another really big one is that Rust often builds in debug symbols. If you don't strip them, you end up with a binary that has very little actual machine code but a lot of debug symbols, so you might want to try just stripping the binary. "Will the compiler try to inline generated functions?" Yes, and this is one of the reasons why monomorphization is really good. It doesn't just apply to String versus str. Say you have a function bool_then that is generic over T and F: it takes a boolean b and an f, which is a function or closure returning T, and it returns an Option<T>. The implementation is something like: if b, then Some(f()), else None. This function actually just landed in the standard library, although as a method on bool. If you ever called this function with a particular closure or function, the compiler would generate a copy of bool_then with the code for the closure you passed in available directly inside it, so it could be optimized accordingly. "Inlining" is actually the wrong word for what monomorphization does. Monomorphization gives the compiler the option of inlining and optimizing based on that, but it doesn't have to. Because the function gets monomorphized to the particular type, the compiler knows exactly what that type is and can choose to inline if it wishes, which is an important distinction. "A big reason for the larger size is the standard library." That's not quite true, because only the parts of the standard library you actually use end up in your binary. It is true, though, that if you're using a lot of generics from the standard library, that grows the size. "How big of a cost is the duplication of methods?" Basically none, I think. The actual generation of multiple functions isn't that important. What does matter is that if you generate many, many copies of a function, you have to compile each of them into machine code, which slows down compilation. "Wouldn't strlen_string take &String rather than
just String by value?" It can take either, actually; both will work, which is one of the reasons why taking AsRef<str> is nice. "How do dynamic libraries handle generics?" They don't. This is one of the reasons why it's challenging to do dynamic linking of Rust libraries, or even just to distribute Rust libraries in binary form. I think Swift has been working on stabilizing something like this, but we don't quite know how to use that for Rust yet. And yes, if you want a cdylib, a dynamically linked library in Rust with a bunch of extern functions, those functions can't be generic. All right, now that we have an idea of what monomorphization is, I want to talk a little bit about the actual process of dispatch. Dispatch is the idea of... actually, let me give you a slightly better example that we can work with throughout the stream. Let's define our own trait. We could call it Hello, which is simple enough, or let's go with the Norwegian, Hei. It has a single method, hei, which returns nothing; it just lets you say hello in Norwegian. We can implement that trait for, say, &str, where all the method does is println! with self. Then we call "J".hei() (I can't type today). So what happens behind the scenes when the compiler generates the code for this? It determines the type of the receiver, then looks at which methods are available on the &str type itself. It doesn't find one called hei, so it looks at the traits that are in scope. It finds that there is a hei method on Hei, which is implemented for the receiver type, and so it ends up calling that. That seems fine. Now let's imagine that we have a
generic version of this, bar, where we take something that implements Hei, call it h, and just call h.hei(). This one's trickier: when the compiler generates the machine code for this, it needs to somehow call this method, but it doesn't actually know the type of h. This is where monomorphization comes into play. In reality, what gets generated is a bar_str that takes a &str and calls h.hei(), and now dispatching this, figuring out what to call, is trivial, because we know the concrete type. This is what's known as static dispatch. At compile time, the compiler knows what the actual type is, so when it calls this method, it knows exactly which implementation that is. The call basically becomes an assembly call instruction to a function at some known location in memory, because it's just wherever that method lives. Does that make sense for static dispatch? Figuring out how to compile this code becomes trivial once you can generate the monomorphized method, because that method is really the same as the "J".hei() call from before, and for both of them it's trivial to figure out where the hei method lives. Now that we've looked at static dispatch, you might wonder what the alternative is. What if we don't want to generate a bunch of different copies? This is where we go back to dynamically sized types. Oh, sorry: yes, impl Hei in argument position is just syntactic sugar for generics. Like I showed with strlen before, it is equivalent to a type parameter H: Hei with the argument h: H. So what then is the alternative, if we don't want this monomorphization and don't want multiple copies? There's another reason you might want one. Let's say you have a vector of things that implement Hei: in foo here, I want to have a vector of these, and I want to do, for h
in this vector, call h.hei(). That's fine if they're all the same type. But what if I want a collection, sequence, or set of things that implement Hei, without needing them to be the same type? All I care about is that they implement this trait. More concretely, imagine that I write bar to take a slice of impl Hei. Let's start with this version, which does for h in s, h.hei(). This works fine: I can call bar with a slice of string literals, and that compiles. Similarly, I can turn each of them into a String, and that also works. (Oh right, I hadn't implemented Hei for String; once I add that, it's fine.) But let's imagine that I want to call it with a mix of &str and String values. This should really be fine, but I can't do that. I can't create a vector of things that are different types; that's not something I can make in the first place. In this case I don't actually care what the concrete type is. I don't care that these are strings; all I care about is that they implement the Hei trait. So I should be able to create a vector of things that have this behavior without caring about the concrete type. This is where we get into the area of dynamic dispatch and trait objects, and the book has a chapter on this too. Let me find it... I opened the tabs in the wrong order, which was silly of me; sorry about the jumping around. Here it is. The book has a pretty decent chapter on the ability to treat things that are different concrete types as the same type. The example it uses is a GUI: you might have lots of things that are drawable, meaning they implement a Draw trait, and you might want to have a draw function that
just takes a list or iterator of all the things to draw. Some of those might be buttons, some might be images, and some might be text boxes, but that doesn't matter to the draw function; all it cares about is that it gets an iterator of things that are drawable. That has a similar flavor to our case, where we want to take a slice of things that we can call hei on. What you need is what's known as a trait object, and a trait object is where you see the dyn keyword. This isn't going to compile yet, but it's roughly what I want. Remember that impl is a shortcut for a generic parameter, and here it's useful to write out the types, because that gives a clue to what's going on. bar is generic over one type H, which has to be either a &str or a String. It can't be both, because bar is only generic over one type. You could imagine adding an H2: Hei, but since we're taking a slice (or, say, an iterator), there is nowhere to put H2: there's only the one slice, which only has items of one type. So this won't work. What we really want to say is that each of the things is a Hei; it doesn't have any other concrete type that we care about. If we write a slice of bare Hei, the compiler (let me pull this up on a new screen) says that trait objects without an explicit dyn are deprecated. Let's get rid of that warning first: it says we need to put dyn here, so we do. Now it says the size for values of type dyn Hei cannot be known at compilation time. It points at the slice type and says that it doesn't have a size known at compile time. It also says that the trait Sized is not implemented for dyn Hei, and that slice and array elements must have a Sized type. This is an error you might have come across before. Part of what we're going to talk about for the remainder of the stream is why we get this error, what it means,
what the fix is, what Sized means, and how this ties into dynamic dispatch and trait objects. Let me pause here for a second, because I've just thrown a bunch of stuff at you, and see whether it makes sense. (There's a bunch of discussion of dark mode and light mode in chat.) You can think of what we want as type erasure. I want to take a collection of things where I only care about the fact that they implement this trait; I want an abstract notion of this trait. So what is the compiler pointing out here, and where do we even start? I think we need to talk about Sized first. The Sized trait has no methods. It is a marker trait, which is why it lives in the std::marker module, and the description just says "types with a constant size known at compile time." Most types are Sized: if you create a struct Foo, it's probably Sized, and most types in the standard library are Sized. In fact, so few types aren't Sized that every generic type parameter, even in just a foo<T>, has an implicit bound requiring that the type is Sized. To see why, let's comment this out for a second and go back to our very simple strlen example from the very beginning. It's generic over some S that implements AsRef<str>, takes that S, and returns a usize via s.as_ref().len(). Imagine something wants to call strlen. strlen takes an argument of type S, so it somehow needs to be passed that argument. That argument has to take up space on the stack or be passed in a register, which means the compiler has to make
sure that there's code in the resulting assembly that allocates the required space on the stack for whatever this argument is, and that requires the compiler to know how large the argument is. Think of the concrete instantiation, say the String one. A String is really just two numbers and a pointer: the length of the string, the size of the allocation, and a pointer to the first character of the actual string on the heap. So it's Sized. The compiler knows that the size of this argument is three words (three usizes, if you will), so it can trivially generate the assembly code for this function and for calling it, because it knows how much space the arguments take up. You can imagine something similar if we were generic over the return type: the compiler would need to know how much space to allocate for the return value on the stack. So even though we didn't explicitly say that S is Sized here, it basically always is a requirement, because otherwise the compiler wouldn't know how to generate the required code. All right, so why then is our slice of dyn Hei not compiling? dyn Hei is really saying "anything that implements Hei" without giving a concrete type. Remember, we want this to be a slice of potentially String and &str and whatever other types might implement this trait, and those are all different sizes. A String is two numbers and a pointer, and a &str is one number and a pointer. You could also imagine some type Foo that is a gigabyte-large struct or something insane, which implements Hei and which we might also want to pass in here. So this slice doesn't have a size. A slice
is really just a contiguous piece of memory where each chunk is the same size; it's an array where all the elements are the same size. If we don't know the elements' size, we can't guarantee that they are the same size. Say I gave you one of these slices, where dyn Hei is not Sized because it could be any type that implements Hei, which could be any size. Then I ask you for the fourth element. Normally, because all the elements of an array are the same size, the fourth element is just the pointer to the start plus three times the size of one element; it's pointer arithmetic. But if the elements are different sizes, you can't generate that code. This is why the compiler says slice and array elements must have a Sized type: otherwise the type just doesn't make sense. So how can we fix this? We really want to be able to do this; it would be insane if Rust didn't let you talk about a collection of things that implement a trait. And arrays aren't the only problem here. Say we had a strlen2 that took a bare dyn AsRef<str>: it has the same problem. If I try to compile it (let me get rid of the other message), it says the same thing: the size for values of type dyn AsRef<str> cannot be known at compilation time, and Sized is not implemented for dyn AsRef<str>. It adds that function arguments must have a statically known size and that borrowed types always have a known size. It tries to suggest a fix, but this is again the key insight: the compiler needs to know how large the type of an argument is, otherwise it can't generate the function call code. That these types are Sized is a fundamental requirement of the compiler. Does it make sense so far why this is a problem? "Can you show us implementing Sized?" So, Sized is automatically implemented for
any type that can implement it. If you write a struct Foo with no fields, it is Sized. If you add a String field, String is Sized, so Foo is Sized; if you add another bool field, the type is still Sized. Types are always Sized if they can be. In fact, even if I wrote Foo<T> holding a T, it's still Sized. Remember that every generic type parameter has an implicit T: Sized bound, so T is Sized, all the fields are Sized, and therefore Foo<T> is Sized for any T. You never implement Sized yourself; the compiler implements it automatically wherever it applies. The issue is also not stack size, that is, it's not that the value might not fit on the stack. The issue is that to call a function at the machine-code or assembly layer, you need to know how much space to allocate for that function's stack variables, which may include some of its arguments. On the caller side, you also need to know how much space to allocate for the return value. Fundamentally, that requires knowing how large the space is, because otherwise you can't generate the machine code that sets up that stack frame. There are sneaky ways around this, but to generate efficient code, the compiler basically has to know the size of these things. Let's see which things aren't Sized. We've talked about bare traits. If I write pub fn foo and say it takes an h: Hei, then Hei here is a trait that lots of things implement, and they might all be different sizes, so this argument doesn't have a defined size. If I say impl Hei instead, remember that impl is a shortcut for H: Hei. Then h is Sized, because we get a copy of foo for every concrete type, and for each of those copies the size of the type is known. So if we use
static dispatch through monomorphization, this is fine. But if we try to take the trait itself, saying "this function takes anything that implements the trait", it won't work, because that's not Sized. Writing the bare trait is equivalent to writing dyn Hei, which is also not Sized; we'll talk in a second about how you make something like this Sized. The other example is if you go back to our struct Foo and try to have it hold a str. str is not Sized, because a string can be any size; I mean the string data itself, not a pointer to it. The same goes for a slice without a reference, like [u8]: it is also not Sized, because you don't know how many elements there are, so it doesn't have a well-defined size. Those are the two best examples I know of things that aren't Sized, plus anything that in turn contains those types. (There's a lot of discussion in chat trying to explain what I'm about to explain, how we fix this and why you need references and such. I will get to it, I promise.) Great, I think where we've gotten to so far roughly makes sense. So the question now is how to resolve this. Take our strlen2 here (let's call it strlen_dyn). We want to be able to have a function like this that doesn't get monomorphized, but its argument isn't Sized, so what do we do? The trick is to make the argument a type that is always Sized. We do that by indirecting through some type that is itself Sized for any inner type. An example of this is a reference. A reference is always the same size: it's just the size of a pointer, or in some cases two words (we'll get to that in a second). A &dyn AsRef<str> does have a size. The compiler always knows how large one of these is, no matter what type follows the &, because it's just a pointer. The same
would be true if you have a Box: a Box is also just a pointer, to the heap. So once we use one of these, the size of this argument is known and we're fine. This is in fact how you take a trait itself without monomorphizing: you place it behind some kind of pointer type, like a reference, a Box, or an Arc. In practice, if you look at the definition of Box, there's a bunch of stuff inside it (not magic, exactly), but basically it holds a pointer. That's not quite true, but close enough. Box declares its type parameter as T: ?Sized, where the question mark means "does not have to be". It opts out of the implicit bound that everything has to be Sized, saying that for Box, T does not have to be Sized. That's why you can create a Box of something that is itself not Sized. This works because when you first create the box, you make an allocation the size of whatever you're putting inside it. Let me write a main to show this. At some point I have to do, say, let x = Box::new(String::from("hello")). Then I can say let y: Box<dyn AsRef<str>> = x, and then call strlen_dyn(y). When I created the box, I had to give it something with a concrete type, and at that point it allocated space on the heap for that type. But the box itself still has a known size, because it's just the pointer. When I turn it into a Box<dyn AsRef<str>>, it's still conceptually just a pointer, even though the thing it points to might have an arbitrary size. So the argument we pass to this function does have a size, and therefore we can generate the relevant code. Does it make sense why indirection through a type whose own size is known solves the problem? We'll talk about combinations of
I promise we'll get to multiple traits. "Can't you just put Box<dyn AsRef<str>> directly on x?" Yeah, you can do that too; that's also fine. But even so, you still had to provide a concrete, sized type to Box::new. "When should you use Box over references?" There's no hard and fast rule. In general, the advantage of Box is that it owns its contents (and a Box<dyn Trait> is 'static by default), so you can keep using it even after the caller's stack frame goes away. It's the same as when you'd choose Box over references normally. "Can I also give the function a reference to something on the stack?" Yeah, exactly; let me do that down here. Let's say I have a strlen_dyn2 that takes a &dyn AsRef<str>, and I can call strlen_dyn2 with a reference to the value. Why is this complaining? Chat points out it's because Box has an as_ref method too, which is why the double as_ref is needed. So you see, I can do the same for something that's on the stack: I go through a pointer indirection and pass it to something that takes a &dyn. I thought I could even do this... that's fine, it's not terribly important why that doesn't work. What we just constructed is known as a trait object: an object whose only property is that it represents a trait. That's what trait objects are; they are things that only behave as some underlying trait. Both the Box<dyn AsRef<str>> and the &dyn AsRef<str> are trait objects; it doesn't matter what the wrapping type is. "Is Box kind of a pointer type?" Yep, Box is a pointer type. "Can I convert a trait object back to its original concrete type?" I'm going to go with no, but technically, sometimes, yes. In general, no: you can think of this as type erasure. You can technically go back through unsafe transmutes and such, but in general, the moment you turn something into a trait object, you erase the knowledge about what type it used to be.
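The stack-based variant from the chat question can be sketched the same way; strlen_dyn2 is the talk's name, with a reconstructed body. The Any-based part is an illustration of the "technically yes, sometimes" answer, not something shown in the talk: as_string is a hypothetical helper, and it only works because the trait object is a dyn Any, whose TypeId lets downcast_ref check the original type at run time.

```rust
use std::any::Any;

// Takes a reference to a trait object; the referent can live on the stack.
fn strlen_dyn2(s: &dyn AsRef<str>) -> usize {
    s.as_ref().len()
}

// Hypothetical helper: recover a String from a &dyn Any, if that is what
// it originally was.
fn as_string(v: &dyn Any) -> Option<&String> {
    v.downcast_ref::<String>()
}

fn main() {
    // The String struct is on the stack (its bytes are on the heap).
    let y = String::from("hello");
    println!("{}", strlen_dyn2(&y)); // &String coerces to &dyn AsRef<str>: 5

    // &&str coerces too, since &str: AsRef<str> and &str is Sized.
    let z: &dyn AsRef<str> = &"stack str";
    println!("{}", strlen_dyn2(z));

    // Through a trait object you can only use that trait's methods.
    // Any is the exception that lets you check the original type.
    let erased: &dyn Any = &y;
    println!("{:?}", as_string(erased)); // Some("hello")
    println!("{:?}", as_string(&42_i32)); // None
}
```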
There's some magic around unsafe and some magic around the Any trait that you can use, but in general you should ignore that. Also note that when you do this, you only retain the ability to use this trait; you erase all other knowledge of what the concrete type was. So the only thing you can do on a Box<dyn AsRef<str>> is call as_ref on it, nothing else, and the same goes for the reference; we'll get into why in a second. All right, this raises an interesting question. Let's swap these two functions around so this still compiles. Now let's think about what actually happens when the compiler has to generate the machine code for this function. At some point it has to actually do the compilation and produce assembly code that can in turn be turned into runnable machine code. So s here is some type that implements AsRef. In fact, let's go back to our Hi trait; that's going to be a little bit nicer. Let's say that I want a say_hi, which takes a &dyn Hi, returns nothing, and just calls hi. It takes a trait object for the Hi trait, but when the compiler generates the code here, it doesn't know what type s is; it's just the trait object, really just a pointer. So how does it call the hi method? Remember that in the generic case, we actually get a copy of the function for each concrete type, so really it turns into a say_hi specifically for str, and for that concrete implementation, generating the machine code is trivial, because it's really just an assembly call to line six. It's not actually a line, it's a memory location in the binary, but you get the idea: the compiler knows statically, at compile time, where to go. For the dynamic case, though, what do we call? We don't know whether this is a str or a String, so we don't know whether to generate the address of
this, the address of this, or the address of something else entirely. So what machine code does this turn into? This is where we get into dynamic dispatch and vtables, or virtual dispatch tables. The trick to constructing a trait object is that it's actually not quite true that the reference or the Box is just one pointer wide; it carries a little bit of extra information about the type that is pointed to. Let's go back to the Rust book, which has a section on dynamically sized types and the Sized trait under the Advanced Types chapter. In fact, it doesn't talk about this as much as we want, so let's use the Rust reference instead. The Rust reference on dynamically sized types says that most types have a fixed size that is known at compile time and implement the trait Sized, like we've talked about. A type with a size that is known only at run time is called a dynamically sized type (DST) or, informally, an unsized type; slices and trait objects are two examples of DSTs. The reason the size is only known at run time is that at compile time, we don't know what type this is going to be; only at run time will we know. To make this a little more concrete, imagine I have a main function where, if random() is 4 (I don't have a random number generator here), we call say_hi with this, and otherwise we call say_hi with String::from("world"). Let's imagine that random() was more random than 4; say the value was read from the user. At compile time, you don't know which actual type is going to be used here; it's only determined at run time by some input that can't be predicted at compile time. That's why this is determined at run time. And the reference says that pointer types to dynamically sized types are sized, but have twice the size of pointers to sized types.
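A runnable sketch of say_hi, assuming Hi is implemented for &str and String as in the talk (the method bodies are placeholders). Counting command-line arguments stands in for the random or user-supplied input, so the concrete type really is only decided at run time. The size_of checks illustrate the reference's point that pointers to DSTs are twice as wide; they hold on mainstream targets, where a pointer is one usize.

```rust
use std::mem::size_of;

trait Hi {
    fn hi(&self);
}

impl Hi for &str {
    fn hi(&self) {
        println!("hi {}", self);
    }
}

impl Hi for String {
    fn hi(&self) {
        println!("hi {}", self);
    }
}

// Compiled once; the call to hi is resolved through the vtable at run time.
fn say_hi(s: &dyn Hi) {
    s.hi();
}

fn main() {
    // Stand-in for "random": which type we pass depends on run-time input.
    if std::env::args().count() > 1 {
        say_hi(&"hello");
    } else {
        say_hi(&String::from("world"));
    }

    // Pointers to sized types are one word wide...
    assert_eq!(size_of::<&String>(), size_of::<usize>());
    // ...pointers to DSTs are two: data + vtable, or data + length.
    assert_eq!(size_of::<&dyn Hi>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<Box<dyn Hi>>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
}
```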
So pointers to trait objects also store a pointer to a vtable, and this is where we get into what a vtable is. When you have a trait object, something like &dyn Hi, what actually gets stored in the reference is, one, a pointer to the actual concrete implementing type, and two, a pointer to a vtable for the referenced trait. A vtable, or virtual dispatch table, is a little data structure that has pointers to each of the trait's methods for that type. In the concrete case of the Hi trait, say we have a dyn Hi: you can think of the vtable as a HiVtable struct with only one member, hi, which is a function pointer, an fn that itself takes a pointer to whatever the concrete type is. A different vtable ends up being constructed for each concrete type that gets turned into a trait object. So let's say we start out with a str and want to go to a dyn Hi. What actually gets constructed is: number one becomes a pointer to the string, and number two becomes a pointer to a HiVtable whose hi field is str's implementation of hi. When you make this conversion, the compiler knows that it has to construct this vtable, and it sticks the address of that vtable inside the reference. So this reference contains two things: the pointer to the actual value, and the pointer to the vtable that the compiler generated for this particular conversion. There'll be one such vtable for each type that gets turned into a trait object. So when the compiler generates the code for say_hi, what it actually generates is something like (s.vtable.hi)(s.pointer). If someone passed in a String instead, it would have a different vtable, so s.vtable would be a different pointer. In the str case, this would be a pointer to str's hi, the one on line six. But if
we instead went... I'll stop for questions in a second. If this instead went from a String to a dyn Hi, it would be a pointer to the String, and the vtable entry would be String's implementation of hi, which is this method up here on line 12. In other words, the code that gets generated will work regardless of which type is passed in, because we indirect through the vtable, and each type has its own distinct vtable. All right, that was a lot to throw at you at once, but hopefully it wasn't too bad. Let's see. "Are the vtables themselves built statically at compile time, or are they allocated dynamically?" The vtables are built at compile time; they're built statically. This is what's known as a fat pointer or wide pointer, because it has two pointers in it, not just one. So any time you see a &dyn Hi, it's actually a wide pointer, and in general the second pointer in there will always be known statically, because it's determined by the original construction of the trait object. The reason I say "in general" is that you technically can construct it on the fly: nothing at the call site requires that it was known statically, so you could imagine manually constructing a vtable, and this is used in practice; we'll actually talk about that a little bit later. "A &str also contains the length of the string slice?" Yes. So in general, this is not a pointer to the string data itself; it's a pointer to the &str, which itself holds the length. "Can we debug print that vtable struct somehow?" I don't know of a way to get the compiler to print its vtable struct, but it's basically just what I described: there's a member for each method of the trait that this is a trait object for, and the value of each member is the pointer to the implementation of that method for the concrete type.
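To make the vtable idea concrete, here is a hand-rolled model of what a &dyn Hi carries. Everything in it (HiVtable, DynHi, erase_str, erase_string) is hypothetical and only illustrates the concept: the compiler's real vtables also record the value's size, alignment, and drop function, and their layout is not something to rely on.

```rust
use std::marker::PhantomData;

// Hypothetical vtable for a one-method trait Hi.
struct HiVtable {
    // Receives a type-erased pointer to the concrete value.
    hi: unsafe fn(*const ()) -> String,
}

// Hypothetical model of &dyn Hi: a data pointer plus a vtable pointer.
struct DynHi<'a> {
    data: *const (),
    vtable: &'static HiVtable,
    _borrow: PhantomData<&'a ()>,
}

// "impl Hi for &str": a function plus a static vtable pointing at it.
unsafe fn str_hi(p: *const ()) -> String {
    // SAFETY (caller): p must point to a live &str.
    let s = unsafe { &*(p as *const &str) };
    format!("hi from &str: {}", s)
}
static STR_VTABLE: HiVtable = HiVtable { hi: str_hi };

// "impl Hi for String": a different function and a different vtable.
unsafe fn string_hi(p: *const ()) -> String {
    // SAFETY (caller): p must point to a live String.
    let s = unsafe { &*(p as *const String) };
    format!("hi from String: {}", s)
}
static STRING_VTABLE: HiVtable = HiVtable { hi: string_hi };

// Model of the unsizing coercion: the concrete type is known here,
// so the matching static vtable can be chosen.
fn erase_str<'a>(s: &'a &'a str) -> DynHi<'a> {
    DynHi { data: s as *const &str as *const (), vtable: &STR_VTABLE, _borrow: PhantomData }
}

fn erase_string(s: &String) -> DynHi<'_> {
    DynHi { data: s as *const String as *const (), vtable: &STRING_VTABLE, _borrow: PhantomData }
}

// Roughly what s.hi() on a &dyn Hi compiles to: (s.vtable.hi)(s.data).
fn say_hi(s: &DynHi<'_>) -> String {
    // SAFETY: DynHi values only come from erase_str/erase_string, which pair
    // each data pointer with the vtable for its actual type.
    unsafe { (s.vtable.hi)(s.data) }
}

fn main() {
    let a = "hello";
    let b = String::from("world");
    println!("{}", say_hi(&erase_str(&a))); // hi from &str: hello
    println!("{}", say_hi(&erase_string(&b))); // hi from String: world
}
```

Note that say_hi contains no knowledge of &str or String; the same code works for both because each DynHi brings its own vtable.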
"Then why are trait functions that don't take self not object safe? Couldn't the compiler just generate an fn that doesn't use self?" We'll talk about that in a second. "Does it construct a new vtable every time we create an instance?" No, the vtable is generally constructed statically for the type; it doesn't get constructed dynamically. "Are identical vtables detected?" There's no deduplication you should count on. If you implement a trait for two different types, the implementations, even if they contain the same code, are distinct in the source code and generally end up at different addresses in the binary; the compiler makes no promise either way, so comparing vtable pointers isn't reliable. Trait objects of more than one trait we'll get to in a second. "So Box<dyn Hi> is a thin pointer that points to a wide pointer that points to the object?" No. If you have a Box<dyn Hi>, the box itself is a wide pointer. You might think Box is a special type here, but it's not that special: internally, the box holds what is essentially a raw pointer to the dyn Hi, and the Rust compiler knows that references and pointers to trait objects are actually two words wide. So Box isn't special, but pointer types are, and Box<dyn Hi> is itself wide; you don't end up with an extra indirection. Yeah, we'll talk about nightly APIs for trait objects too. All right, I think we're now at a point where that roughly makes sense. There are a couple of reasons why trait objects are a little more constrained than being fully generic. For example, let's imagine that you wanted a pub fn baz that took an s that is a &dyn (Hi + AsRef<str>). We're going to call s.hi(), and then we're also going to say let s = s.as_ref(), and then call s.hi() again, because we know that strings themselves implement Hi. I guess maybe that's confusing; maybe I should just do len. The compiler here says it's ambiguous. All right, let's get rid of the ambiguity; it's not really relevant to the discussion. It still
won't let me do it. Why not? Do I have to do this for it to not be ambiguous? No. All right, let's just pretend that the syntax worked for a second here. Basically, I want a trait object of two different traits. This won't actually compile, and ignoring the parsing error for a second, there's a bigger reason why it won't work: for this to work, we need to know both where the hi method is and where the as_ref method is, but those are contained in two different vtables, the vtable for Hi for the concrete type and the vtable for AsRef<str> for the concrete type. So what we'd actually need is not a two-wide pointer but a three-wide pointer: a pointer to the data, a pointer to the vtable for Hi, and a pointer to the vtable for AsRef<str>. That's not inherently impossible, but it would mean that as you add more traits, the size of this reference just keeps on growing, which is probably not what you intended. And you might go, well, can't the compiler just generate a vtable for the combination and pass that in? In theory it could, but the Rust compiler doesn't currently do this, and I don't know of a proposal to do it. Part of the reason is that you can get around this yourself by declaring a pub trait HiAsRef that requires the type to be both Hi and AsRef<str>, with no methods of its own. Now I can take a dyn HiAsRef, because there's only one vtable, the vtable for HiAsRef, and when the compiler generates the vtable for this trait, it includes all the methods from both supertraits. So this vtable is going to be larger, because it has both the hi method and the as_ref method. I think part of the reason this hasn't landed in Rust itself is that you can opt into it yourself, and by not automatically turning the + into a larger vtable, the compiler tells the developer that this might not have been what they meant, but if it is, they can do this thing instead.
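A sketch of the workaround: HiAsRef has both traits as supertraits, so a single vtable holds both methods. Two things here are additions not shown in the talk: hi returns a String (instead of printing) so the result can be checked, and a blanket impl makes every type that is both Hi and AsRef<str> a HiAsRef automatically. The + Send shows that auto traits, which have no methods, can still be added on top.

```rust
trait Hi {
    fn hi(&self) -> String;
}

impl Hi for String {
    fn hi(&self) -> String {
        format!("hi {}", self)
    }
}

// No methods of its own: its vtable holds the methods of both supertraits.
trait HiAsRef: Hi + AsRef<str> {}

// Blanket impl for sized types that are both Hi and AsRef<str>.
impl<T: Hi + AsRef<str>> HiAsRef for T {}

// dyn Hi + AsRef<str> is rejected (only auto traits may be added),
// but one combined trait works, and an auto trait like Send can be added.
fn baz(s: &(dyn HiAsRef + Send)) -> (String, usize) {
    let greeting = s.hi(); // looked up in the combined vtable
    let len = s.as_ref().len(); // so is this
    (greeting, len)
}

fn main() {
    let s = String::from("world");
    let (greeting, len) = baz(&s);
    println!("{greeting} ({len} bytes)"); // hi world (5 bytes)
}
```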
I agree the compiler error here could probably have been better (summon Esteban Küber!); maybe we could inform people that they can generate their own trait here. Someone wrote that the syntax is dyn Hi + AsRef<str>; let's see if that's true. Yeah, okay, so that is the way to get the right compiler error, which is "only auto traits can be used as additional traits in a trait object", whereas here you have two different traits, neither of which is an auto trait (we'll talk about those in a second). And here it does actually give the suggestion we talked about: consider creating a new trait with all of these as supertraits and using that trait here instead. It also points out that auto traits like Send and Sync are fine. The idea is that for marker traits, say if I want dyn Hi + Send, Send doesn't have any methods, so its part of the vtable would be empty, and we don't need to store it at all. That's why it's okay to have additional traits with no methods. I don't think you can use your own, though: if I made a trait Foo with no methods and said dyn Hi + Foo, I don't think that'll work. There have been proposals for a marker annotation to declare a trait as never having methods, but that's not something we currently have. So at least for the time being, this is the way you should do it. Great. So that is how this works with combinations of traits, but that's not the only case where trait objects are a little bit limited. For the other one, let's start by saying that we added an associated type to Hi; let's call it Name, and we don't actually use it for anything, we just add one. The code already doesn't
compile. Why doesn't it compile? "The value of the associated..." oh, that's the implementations; let me fix those, so it's type Name = (). I'm just going to use the unit type for both, because we don't actually care about the type. So now it says "the value of the associated type Name (from trait Hi) must be specified". Where we take a dyn Hi, we actually need to specify the associated type; we can't just say we take any Hi regardless of what its associated type is. The reason is that this information can't be captured in the vtable: it's a type, it doesn't have an address in the binary, so there's nothing we could stick in the vtable, and therefore this won't work with a trait object directly. Instead, what you'd have to say is that you take a dyn Hi<Name = ()>, and I think we need to do the same down here, and then that works, because now we have a vtable specifically for anything that implements Hi where the associated type Name is (). So that's fine. Let's undo this, because it's not a very interesting change. The other, perhaps more interesting case: let's say we want a function called weird, which does nothing and has a default body, so no one needs to implement it. You see that this also does not compile if I run cargo check. Let me make this a little bit smaller so it's easier to understand, hopefully still readable. Oh, that's very verbose; let me comment out the combination-trait bit to make the errors a little nicer. All right, so this says the trait Hi cannot be made into an object. When we say dyn Hi, it's basically telling us that the Hi trait is not object safe, that is, it cannot be turned into a trait object. And it says that for a trait to be object safe, it needs to allow building a vtable to allow the call to be resolvable dynamically. So this is what we've been
talking around so far. In particular, here it says the trait cannot be made into an object because the associated function weird has no self parameter. It doesn't really explain why that's necessary, though, so let's try to work through it. Again, let's say that down in our say_hi, we now try to call weird on s. Imagine what the compiler would do if we wrote this code. Well, we can't really write this code, because weird doesn't take self; it doesn't take a pointer to anything, and we can't just randomly construct a pointer, because we'd need a valid instance to have a pointer to it. So this is really just saying <dyn Hi>::weird(), which, I don't have to tell you, seems weird, because which type are we calling weird on here? Nothing here specifies which actual weird implementation we want. In this particular case, because the trait provides a default, you might think it's not a problem. But what if there was an implementation of weird for String that was different from the one on the trait? That's allowed; you can override a default implementation. How would this know whether to call the one for str, the default one from the trait, or some entirely different one? It can't be in the vtable, because there is no s here: the weird method doesn't take self, so it's not associated with a given instance of any type. So this just doesn't work; it's not a meaningful thing to say. But what if you really wanted this? You want the weird function to exist, but you don't care about calling it through a trait object; you only care about calling hi through a trait object. So you want Hi to be object safe, and if you have a trait object for Hi, you're fine with not being able to call weird. You can do this by adding where Self: Sized to weird. Let me go back down here. So what we're saying here is that the weird function
requires that Self, the type that implements the trait, is Sized. And remember that trait objects aren't sized: if you just have dyn Hi, that's not Sized; you only get something sized once it's behind a pointer type. So you can think of this as a way to opt out of the vtable: it says this function shouldn't be placed in the vtable and, crucially, shouldn't be callable through a trait object. Let me get rid of the static versions, because they're just in the way. Now the code compiles just fine: I can call hi, and I can make a trait object for Hi. In fact, I can go even further: say weird did take self, but I still wanted to opt it out. This still compiles and I can still make a trait object, but if I try to call weird through the trait object, the compiler tells me "the weird method cannot be invoked on a trait object", because the method has a Sized requirement. So this is how you opt any given function out of the vtable to make sure the trait remains object safe. You can also say that the entire trait should not be possible to turn into a trait object, by putting where Self: Sized on the trait itself. If you do this, you're disallowing use of this trait as a trait object. It's rare to see this in practice; usually a trait just isn't object safe because of the methods in it, and explicitly opting out of trait objects is rare. Sometimes people do it for backwards-compatibility reasons: if you know you might add non-object-safe methods later, you might add this proactively. But it's pretty rare. All right, do the restrictions so far make sense? "Does the associated type problem also occur with static dispatch?" No, it doesn't. With static dispatch, because the code gets monomorphized, you actually know the concrete type for any given function implementation.
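Both restrictions side by side, as a sketch: the associated type is pinned in the trait object type, and weird is kept out of the vtable with where Self: Sized. Here hi returns a String instead of printing so the result can be checked, and all bodies are placeholders.

```rust
trait Hi {
    // An associated type must be named in the trait object type.
    type Name;

    fn hi(&self) -> String;

    // No self, so it can't be dispatched through a vtable. The
    // where Self: Sized bound opts it out, keeping Hi object safe.
    fn weird() -> &'static str
    where
        Self: Sized,
    {
        "default weird"
    }
}

impl Hi for String {
    type Name = ();
    fn hi(&self) -> String {
        format!("hi {}", self)
    }
    // Overriding the default is allowed, which is why a trait object
    // couldn't know which weird to call.
    fn weird() -> &'static str {
        "String's weird"
    }
}

impl Hi for &str {
    type Name = ();
    fn hi(&self) -> String {
        format!("hi {}", self)
    }
}

// A plain &dyn Hi would be rejected: the associated type must be fixed.
fn say_hi(s: &dyn Hi<Name = ()>) -> String {
    // weird can't be called here: there is no concrete type to pick from.
    s.hi()
}

fn main() {
    println!("{}", say_hi(&String::from("world"))); // hi world
    println!("{}", say_hi(&"there")); // hi there
    // With static dispatch the concrete type is known, so this is fine:
    println!("{}", String::weird()); // String's weird
    println!("{}", <&str as Hi>::weird()); // default weird
}
```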
"s::weird() could be possible?" So this is back when we had fn weird, and let's say we also add an override for one of the types. Let's say we added special syntax like this, saying "for the type of s, call the weird method". You could imagine that weird got included in the vtable for the trait object, and the compiler knows not to call it with the self argument. The challenge here is that this is sort of odd: if you require an instance of the type anyway in order to call it, then why doesn't the method just take self? The traditional example is that weird is actually called new, or something like new; if you already have an instance of the type, you wouldn't need to call that method in the first place, and if you did, you'd just take self. So I agree something like this could maybe be possible; I think in practice it's just not usually very useful, and that's why it hasn't been a priority. There might be a deeper reason it hasn't been added, too; I'm not sure. "Does where Self: Sized also disallow implementing the trait for concrete DSTs?" You know, that's a good question. The question is basically: if the trait has where Self: Sized, can I implement Hi for, say, Box<dyn AsRef<str>>? I believe I can. Using dyn Hi now won't work, because Hi can't be a trait object, but you can still have this implementation. Why does this complain? All right, because it's a Box, we need the double as_ref. So this is fine, because Box<dyn AsRef<str>> is not a dynamically sized type: when you put a DST behind a pointer, the result has a size, so where Self: Sized on the trait doesn't prevent that. The associated type restriction feels weird, because the point of associated types is that only one exists for any given concrete type, so shouldn't the vtable know what the associated type should be? Like, it can't have a pointer to the
type, but the implementation of the trait should know what the type is when it's compiled. Yeah, the problem is that the type is erased; all that remains is the vtable, and you can't tell from the vtable what the concrete type used to be. There is an exception to this, which is the Any trait: Any has a method that returns a descriptor (a TypeId) of the concrete type it used to be. If you didn't follow that, that's fine, ignore it; there are some ways to finagle around this, but basically trait objects are type-erased, and you don't get to keep information about the type they used to be. "Oh, you meant a type such as str?" No, you can't do that: Self has to be Sized when you have where Self: Sized. There's one other restriction you get with trait objects, which is that the methods cannot be generic. For this, rather than keep working on our weird Hi trait, I'm going to go to the standard library and look at the FromIterator trait. FromIterator is implemented by Vec, for example, and this is how collect works. If we look at Iterator::collect, it requires that the thing you collect into implements FromIterator; that's why you can take an iterator and collect it into a Vec. The FromIterator trait is a little weird, though. The A type parameter of the trait is the type of the items of the iterator, but you also see that the from_iter method itself takes a T, and the T here is the type of the iterator. Vec doesn't care what the iterator is: you can create a Vec<T> from any iterator whose item is T. This T arguably should be called I, for iterator, but it's not terribly important here. A is
the sort of thing you'll end up with a Vec of, so if A was bool, you'd end up with a Vec of bools, and T is the type of the iterator, which might be something like a hash_map::IntoIter, or whatever that type ends up being. This poses a problem for a trait object, though. Let's imagine we try to write code against this. I'm going to erase this for a second; in fact, I'm going to erase all of the remainder of this file. Let's write a function that takes a dyn std::iter::FromIterator<bool> and does s.collect()... oh, what did I do? I did something weird there. Great. So this seems like it should be fine, but actually it's weird for a number of reasons. Maybe FromIterator is a bad example here, because from_iter doesn't take self either, so the trait requires Self to be Sized anyway. Extend is a better one; let me dig up Extend here. Great, so we're going to use Extend instead, and it's going to have to take a &mut. We could do let mut v = Vec::new() and call v.extend, but we're not even going to do that: we take a &mut dyn Extend<bool> as v, and we call v.extend(std::iter::once(true)). So we take in anything that can be extended with an iterator of bools, and we try to extend it with an iterator that yields a single bool; we're just going to append true to whatever thing we get. And here too it tells us the trait Extend cannot be made into an object: Extend is not object safe. If we look at Extend, that doesn't immediately make sense from what we've seen so far. Again, it's generic over the type of the items of the iterator, which is fine, and extend takes &mut self, so we
do have something we can stick into the vtable, and given an instance, we sort of know how to call this method. But notice again that it's generic over the type of the iterator, and this is where it gets really weird. You can imagine that inside of Vec there's going to be an impl<T> Extend<T> for Vec<T> with this implementation. I'm going to go ahead and call the iterator parameter I, because I think it's more helpful, and who knows what's going to be in the body. To get rid of that warning, and to be a little more helpful, let me call the type MyVec. This implementation is going to exist somewhere in the standard library, just for Vec instead. But this method is generic, so we know from the monomorphization discussion that at compile time we don't end up with a single extend; we end up with multiple copies of this method, something like an extend_hashmap_intoiter that takes a std::collections::hash_map::IntoIter. In fact, because the whole impl is generic, the method that actually gets generated by the compiler, assuming someone tries to extend using a HashMap iterator of bools, is specific to both the item type and the iterator type. There is no one extend; there is one for every combination of iterator and item types. So what gets put in the vtable? We've nailed down that we want Extend<bool>, so that's good: we don't have to consider the T up here, because we've already chosen it. But for extend itself, the vtable really needs a pointer to extend for one particular iterator type over bool, and that's not expressed in the type of the function. We don't even have a way to express it; you could sort of imagine writing something like "where the implementation of extend is the one for std::iter::Once<bool>", or something super weird like that, but
the way that it's written, nothing here says that we want the vtable to point to the extend implementation for std::iter::Once, and therein lies the problem. When the compiler tries to generate this code, there is no pointer to the appropriate implementation of extend, so the vtable can't be generated. You can't generate a vtable for dyn Extend, because it would need an unbounded number of entries, one for each possible instantiation of extend. So the answer is just that dyn Extend cannot exist: you cannot create a trait object for Extend. All right, does that make sense? That's a somewhat intricate explanation, but hopefully it all ties together. "Could rustc add a monomorphized version of extend, for each T it's called with, to each type that implements Extend?" That's tempting, but it's not always possible. Even just for the standard library: it has a bunch of uses of extend, but crates that depend on it might call extend with even more types. So when you compile the bottom-most crate, essentially the standard library, its vtable for dyn Extend would be different from the vtable for dyn Extend in crates further up, because there are more possible iterator types you might want to use. You'd end up with lots of different vtables for dyn Extend, a combinatorial explosion kind of problem, and it would also mean that you couldn't pass a dyn Extend from the standard library to a function that takes a dyn Extend in a crate further up, because their vtable types are different. All right, so this is the reason why, basically, in order for a trait to be object safe, it needs to not have generic methods, and all of its methods need to have a receiver that includes self.
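Since dyn Extend<bool> can't exist, the usual alternative is to keep the function generic, so each call site gets its own monomorphized copy of extend for a known iterator type. The function name add_true is hypothetical; Extend and std::iter::once are the standard-library items discussed above.

```rust
use std::collections::HashSet;
use std::iter;

// fn add_true(v: &mut dyn Extend<bool>) is rejected: Extend::extend is
// generic over the iterator type, so there is no single function to put
// in a vtable. A generic function sidesteps that via monomorphization.
fn add_true<E: Extend<bool>>(v: &mut E) {
    v.extend(iter::once(true));
}

fn main() {
    let mut v: Vec<bool> = vec![false];
    add_true(&mut v); // uses Vec<bool>'s extend for Once<bool>
    println!("{:?}", v); // [false, true]

    let mut set: HashSet<bool> = HashSet::new();
    add_true(&mut set); // a separate copy for HashSet<bool>
    println!("{}", set.contains(&true)); // true
}
```

The trade-off is the one from earlier in the talk: no vtable is needed, but add_true is compiled once per collection type it's used with.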
There's another requirement, which is that the trait can't have a method that returns Self. If we backtrack a little here, say we want to take a &dyn Clone and just call clone on it, because it's the only thing we can do with a dyn Clone: if you have a dyn Clone, you can literally only do the things the trait lets you do, and the only thing Clone lets you do is clone. This says the trait Clone cannot be made into an object, an error we've now seen a decent amount. The reason you can't do this is that the Clone trait has an fn clone that takes &self and returns Self. So let's say I write let x = v.clone(). What's the size of x? dyn Clone is just any type that is cloneable, and clone is supposed to return Self, not a reference; given a reference, it returns an owned Self. That sort of means it should return a dyn Clone, because that's what Self is here, since we've erased the concrete type. But dyn Clone isn't sized, so the return value of this method wouldn't be sized, and the return type of a function must be sized for us to generate code for it. Therefore this can't work: a trait can't have methods that return Self if it's to be object safe, and that's why you can't have dyn Clone. (In fact, Clone also declares Sized as a supertrait, which on its own rules out dyn Clone.) Now, there are some cases, as we talked about with Hi, where a trait has some methods that are object safe and still useful on their own, but also a bunch of other methods that would make the trait non-object-safe, and you want to include them for types that aren't used through a trait object, just because they're convenient. A good example of this is the Iterator trait. Iterator has the associated type Item, which is fine. It has next, and
Iterator::next itself is object safe, but Iterator also has a bunch of methods like chain, which is generic; enumerate, which takes self by value; by_ref, which returns &mut Self; and collect, which depends on FromIterator, which we already know isn't object safe. So how can Iterator be object safe? Because it is. If I say I want to take a dyn Iterator<Item = bool>, I can't call collect on it: the compiler says this method cannot be invoked on a trait object. But I can call next. I have to take it as &mut, otherwise next doesn't work, and then it works fine. The way the Iterator trait achieves this becomes clear if you look really carefully at its source. count, for example, consumes self, so it can't be called through a reference to an unsized type, and it has where Self: Sized. If you scroll down to last, it's the same: where Self: Sized. If we look at chain, which is generic and so would normally make the trait non-object-safe, it also has where Self: Sized. This is the same thing we did for the weird function: you can use the where Self: Sized bound to say "ignore this function for the purposes of using this trait as a trait object", basically "don't try to put it into the vtable". What that means is that if someone has a trait object, they can't call the method, but if they have the concrete type, like a vec::IntoIter, then they can. So this is a way to have a trait with some methods that are nice to have but wouldn't be object safe, without making the whole trait non-object-safe.
"Can the receiver be anything that includes self, or does it have to be &self or &mut self?" The rules for this are in... where do I have this? I have it open somewhere.
Give me a second here. This is from the Rust language reference on object safety, and by now we've covered most of these things. All supertraits must also be object safe, because we construct the vtable from the union of all the supertraits. Sized must not be a supertrait, so if we require Self: Sized, the trait is not object safe. It must not have any associated constants. This is again because you don't have anywhere to put the constants. In theory you could put them in the vtable, but then the vtable becomes really large, and the values in the vtable are no longer all function pointers; they might be arbitrarily sized things. All associated functions must either be dispatchable from a trait object or be explicitly non-dispatchable. Dispatchable functions must not have type parameters, so they can't be generic. They have to be methods that don't use the Self type, the concrete type, except in the type of the receiver, so we can't have something that returns Self, for example. And the receiver can only have one of the following types: &Self, &mut Self, Box<Self>, Rc<Self>, Arc<Self>, or Pin<P> where P is one of those. So basically anything that's behind a pointer: anything that can turn something that's not sized into something that is sized. And you can see that functions with a where Self: Sized bound are explicitly non-dispatchable, so you can have an object-safe trait even if some of its methods aren't object safe.
"Should library writers always consider adding where Self: Sized to non-object-safe methods, just in case someone downstream wants to use the trait as a trait object?" I think yes, but it depends a little on whether your trait is even usable that way. If your trait isn't usable as a trait object, like Clone, then sure, you could add where Self: Sized to the clone
method so that people could create a dyn Clone trait object, but they wouldn't even be able to call the main function of that trait, so it's probably not worthwhile. In general, though, if you have a trait that is useful even if you can only call its object-safe methods, it might make sense to opt out for the other ones so that the trait overall is object safe.
"Did Iterator::last have the Sized restriction?" Yes, I think it does; you can't see it here because rustdoc hides it. The reason last has where Self: Sized is that its receiver doesn't go behind a reference: it consumes self. Self here would be a dyn Iterator, which is not sized, and function arguments must be sized, so last can't be called through a trait object.
There's one more thing I haven't told you (well, there are many things I haven't told you, but there's at least one) when it comes to trait objects: there's a little bit of a secret. Let's say I have... actually, I can just do this: dyn Drop is actually object safe. You might think that's weird: if all you can do with an object is drop it, is it really that interesting? It turns out this is useful in a couple of situations. I know crossbeam does this, for example, where you want to do garbage collection but you don't want to drop objects immediately; you want to stick them in, say, a linked list and then periodically go and collect all the garbage. Because you want to store lots of potentially different things in one type, you need trait objects, so it just stores dyn Drop values (technically Box<dyn Drop>, I think, but same effect). You can't actually do anything with them, but when v goes out of scope, drop is called through the vtable. This should make you go "wait a second". Let's go back to our say_hi that takes a dyn AsRef<str>, or actually, let's even go with Box<dyn AsRef<str>>; that's
better. What happens when s goes out of scope? Think about this for a second. We get a Box, so we have a memory allocation on the heap. We might call s.as_ref() in here; this could be say_hi, whatever. But then we drop s when this function returns, and when we drop a Box we have to free the memory. Yet this is a trait object, so the only method it has is as_ref. The answer is that every vtable includes drop. You can sort of think of it as an implicit drop method on every trait, but in practice the vtable for any trait object includes a pointer to the drop function for the concrete type, because it's simply necessary. It technically also includes a little bit of extra information: the size and alignment of the concrete type. Those are in there because for something like a Box, where you have to deallocate the memory, that information has to be passed to the allocator to do the deallocation. So the vtable for every trait object includes the methods of the trait, plus drop, plus size, plus alignment. Normally you don't have to think about this, but it's worth knowing while we're on the topic.
To shift gears a little bit: so far we've only really talked about trait objects, but there are other types that aren't sized. We know that dyn Trait is not sized, but [u8] is also not sized, and str is not sized. dyn Trait, when you place it behind a reference or a pointer, gets turned into a pair of a pointer to the data and a pointer to the vtable. [u8], if you put it behind a pointer, also becomes a wide pointer, where one part is a pointer to the data and the other is a usize holding the length of the slice. The same goes for str: a pointer to the data and a usize for the length. This is why, if you try to write a function foo that takes a [u8] by value, the compiler yells at you that the size for values of type [u8] cannot be known at compile time.
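A small sketch for observing these layouts. Greet and Big are hypothetical names; the assertions assume only that a thin pointer is one usize wide. References to [u8], str, and dyn Trait come out at two words, and size_of_val and align_of_val on a trait object report the concrete type's size and alignment, which, as described above, come from the vtable. The vtable's exact layout is a compiler implementation detail, so the sketch checks only observable behavior.

```rust
use std::mem::{align_of, align_of_val, size_of, size_of_val};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter to observe that drop runs through the trait object.
static DROPS: AtomicUsize = AtomicUsize::new(0);

trait Greet {
    fn hi(&self) -> &'static str;
}

// Hypothetical concrete type: 32 bytes of payload, aligned like u64.
struct Big {
    _payload: [u64; 4],
}

impl Greet for Big {
    fn hi(&self) -> &'static str {
        "hi"
    }
}

impl Drop for Big {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

fn main() {
    let word = size_of::<usize>();

    // Thin pointer: just an address.
    assert_eq!(size_of::<&u8>(), word);
    // Wide pointers: address plus metadata (a length or a vtable pointer).
    assert_eq!(size_of::<&[u8]>(), 2 * word);
    assert_eq!(size_of::<&str>(), 2 * word);
    assert_eq!(size_of::<&dyn Greet>(), 2 * word);
    assert_eq!(size_of::<Box<dyn Greet>>(), 2 * word);

    // For a slice, the length metadata determines the size of the pointee.
    let bytes: &[u8] = &[1, 2, 3];
    assert_eq!(size_of_val(bytes), 3);

    // For a trait object, the erased type's size and alignment are
    // available at runtime (from the vtable).
    let b: Box<dyn Greet> = Box::new(Big { _payload: [0; 4] });
    assert_eq!(size_of_val(&*b), 32);
    assert_eq!(align_of_val(&*b), align_of::<u64>());
    assert_eq!(b.hi(), "hi");

    // Dropping the Box runs Big's Drop impl through the trait object,
    // then frees the allocation.
    let before = DROPS.load(Ordering::SeqCst);
    drop(b);
    assert_eq!(DROPS.load(Ordering::SeqCst), before + 1);
}
```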
The type [u8] is just an arbitrarily long list of u8s, which is not sized, so we can't call the function with it as the argument. It's the same if you try to return just a straight-up sequence: if I had a bar that returned a [u8], it also complains. Is it even going to let me do that? There we go, I constructed the type. It says the same thing: the size for values of type [u8] cannot be known at compile time, and the return type of a function must have a statically known size. This is the same problem. And here too, the way we go from something that isn't sized to something that is sized is to place it behind a type that masks the unsizedness, which you can do with a reference or with a Box. So you can also write Box<[u8]>. I think I'm allowed to do this... great. This is fine because it's now sized: it's a wide pointer where one part is the data and the other is the length, and when you access this type the compiler knows to deconstruct the pointer to get the length. The same goes for Box, or raw pointers for that matter, and the same for str. So these are a little bit magical.
There's been some work on trying to standardize this. Currently, dynamically sized types are a little bit special in the compiler, because you need to know whether a pointer is wide or not, so it's fairly hard to deal with unsized types yourself. If you wanted to implement Box yourself and take a type that wasn't sized, it's possible, but it gets pretty annoying once you get into casting and allowing trait objects. But there was recently an RFC that landed, RFC 2580, which proposes a generic API for manipulating the metadata of fat pointers. It lets you express things like what the type of the second part of the fat pointer is. In fact, we can scroll down here a little and look at dyn Trait, which now becomes a Pointee where the
metadata is DynMetadata, where DynMetadata is a pointer to a vtable that holds all the information: the type's size, the type's alignment, a pointer to drop_in_place, and pointers to all the methods of the type's implementation of the trait. So it's exactly what we just talked about. Similarly, you could imagine this Pointee trait being implemented for [u8], the slice without the reference, where the metadata would be a usize holding the length. This is still very much new and experimental, but I do recommend you read through the RFC; it's really fascinating if you want to learn more about this. It has things like the trait alias Thin, which is any type that implements Pointee where the metadata is (), so anything that doesn't carry associated metadata in the pointer: anything that is a thin pointer and not a fat pointer. And it has methods for introspecting the metadata of a pointer, so you can do things like actually look at the vtable. There were some questions about this in chat earlier; this RFC would let you introspect that information and, potentially, construct vtables on the fly, so you could have one that isn't known at compile time but that you build at runtime.
In fact, this already exists in one place in the standard library, as far as I know, which is wakers in the land of async. We're not going to go too much into detail, but let me show you the Waker trait... no, I'm lying: RawWaker isn't a trait at all, which is the point. Let me show you Context instead. Context gives you a Waker. Waker is a struct that has the methods wake, wake_by_ref, will_wake, and from_raw, and you can drop it and clone it. In practice Waker is really generic; it just isn't generic when you look at it. Inside a Waker is a RawWaker.
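The manually constructed vtable described here can be sketched with the stable std::task types RawWaker, RawWakerVTable, and Waker::from_raw. This is a minimal sketch under simplifying assumptions: the waker carries no data (a null data pointer), and the hypothetical static WAKES just counts wake-ups so you can see which vtable entries run. The four functions handed to RawWakerVTable::new are the clone, wake, wake_by_ref, and drop entries.

```rust
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::task::{Context, RawWaker, RawWakerVTable, Waker};

// Hypothetical counter so we can observe which vtable entries run.
static WAKES: AtomicUsize = AtomicUsize::new(0);

// The four entries of the hand-built vtable. Each receives the data
// pointer stored in the RawWaker (null here, and never dereferenced).
unsafe fn clone_waker(data: *const ()) -> RawWaker {
    RawWaker::new(data, &VTABLE)
}

unsafe fn wake(_data: *const ()) {
    WAKES.fetch_add(1, Ordering::SeqCst);
}

unsafe fn wake_by_ref(_data: *const ()) {
    WAKES.fetch_add(1, Ordering::SeqCst);
}

unsafe fn drop_waker(_data: *const ()) {
    // Nothing to free: the data pointer owns no resources.
}

static VTABLE: RawWakerVTable = RawWakerVTable::new(clone_waker, wake, wake_by_ref, drop_waker);

fn counting_waker() -> Waker {
    // SAFETY: the data pointer is null and never dereferenced, and the
    // vtable functions only touch an atomic, so they are thread-safe.
    unsafe { Waker::from_raw(RawWaker::new(ptr::null(), &VTABLE)) }
}

fn main() {
    let before = WAKES.load(Ordering::SeqCst);

    let waker = counting_waker();
    waker.wake_by_ref(); // dispatches to `wake_by_ref` through VTABLE

    let cloned = waker.clone(); // dispatches to `clone_waker`
    cloned.wake(); // consumes `cloned`, dispatches to `wake`

    // A Context just hands out a &Waker; calls still go through VTABLE.
    let cx = Context::from_waker(&waker);
    cx.waker().wake_by_ref();

    assert_eq!(WAKES.load(Ordering::SeqCst) - before, 3);
    // `waker` is dropped at the end of main, which calls `drop_waker`.
}
```

A real executor would typically put something meaningful behind the data pointer, such as a reference-counted task, and the clone and drop entries would then manage that reference count.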
A RawWaker, in turn, contains a data pointer and a vtable pointer, so it really is dynamic dispatch, just in a hidden way. The vtable is a RawWakerVTable, and you construct a RawWakerVTable by giving it the function pointers for clone, wake, wake_by_ref, and drop. So it's basically a manually constructed vtable that gives you dynamic dispatch through a type rather than a trait. I'm not going to go into too much detail, but I wanted to show it as an example of a manually constructed vtable in the standard library. Is there anything more I wanted to talk about from here? I don't think so.
All right, let's do questions, because I've ranted for a little while. "I thought a reference to [u8] had a start pointer and an end pointer, not a length." I think it has the length. The two are basically equivalent; you might be right, I forget. In fact, I wonder whether this RFC says... yes, you see, this RFC also deprecates the TraitObject stuff that existed on nightly, because this is a replacement: the Pointee trait, Thin, metadata, from_raw_parts. Where is the definition... oh, it doesn't actually give it. I was hoping it would say what the actual associated Metadata type is for slices, but I don't see it. I think it's just a length.
"Can we make our own dynamically sized types?" Sort of. You can write a struct Foo whose last field is a [u8], and Foo is now not sized. In fact, you can add more fields, but you can't add a field after that one. If I try to do this, it tells me the size for values of type [u8] cannot be known at compile time: only the last field of a struct may have a dynamically sized type; change the field's type to have a statically known size; borrowed types always have a statically known size. What this is getting at is the argument I made very early on. If you had a non-sized type in the middle here, and I asked you, if I wanted to
look up foo.x and you only have a pointer to foo, how do you get a pointer to x when the type in the middle can have an arbitrary size? The answer is you can't: you don't know the address of x, because t is arbitrarily sized. But if it's at the end, then it's fine: you can statically know the offset of f, the offset of x, and the offset of t, and the only extra information needed to make a pointer to it sized is the length of the last field. So you are allowed to construct a type like this. The type itself is not sized, but a &Foo will be sized, by storing the pointer to the Foo alongside the length of t. So you can create your own dynamically sized types this way.
"Is Box<[u8]> the same thing as Vec<u8>?" That is, a Box of a u8 slice, not behind a reference, versus a Vec<u8>. No, they're not the same. They feel the same, but they're not. A Vec<u8> can grow. First of all, a Vec<u8> is three words: the pointer to the vector's data on the heap, the length of the vector, and the capacity of the vector. This is so that if you push while there's spare capacity, the length just gets incremented, and if you try to push beyond the capacity, the entire vector reallocates itself, copies everything over, and then updates its pointer and capacity. With a Box<[u8]> you can't do that: it will never grow or shrink, and you can't push to it. It really is just a slice. You can turn a Vec<u8> into a Box<[u8]>, and you can also go the other way around, but they're not the same.
"What's the difference between a dyn Fn and an fn?" Basically, I can write a fn foo that takes a &dyn Fn(), and I can write a fn bar that takes an fn(). These are not the same. The fn() has to be a function pointer; it can't be a closure that captures anything. The reason for this is that fn
is really a function pointer: it's just a single pointer that has to be an address. The dyn Fn is a trait object, which means it's a wide pointer where one part is the vtable, which holds the pointer for actually calling the function, and the other part is a data pointer. Crucially, the difference is that if I have fn main, I can call foo with a closure, but I can't do the same with bar. Why is it complaining? Oh, right. I can call foo with the closure because this closure captures x, so it's not just a function pointer: it's a function pointer plus the data the closure captures from its environment. When you call the closure, you also need to supply the address of x, because it's needed by the body of the closure, and that's what's passed in the data part of the wide pointer. bar requires a plain function pointer, which means you can't pass it a capturing closure, because there'd be nowhere to stick the data.
"When would we use dyn Fn over impl Fn?" There's also baz, which takes f: impl Fn(), and I can call baz with a closure. What is it complaining about... right. I can call baz with a closure because impl Fn in argument position is sort of sugar for a generic function over anything that implements Fn. So we actually get a concrete copy of baz for every type of closure that's passed in, and therefore you can trivially pass in the data as well, because it's monomorphized for each individual closure. The question then becomes when you use a dyn Fn and when you use an impl Fn. The basic answer is that impl Fn is more generally usable: you don't have to indirect behind a pointer, for example. But you end up generating a copy of baz for each closure type you pass in, which might
become quite a lot. The other reason is that sometimes you want to take a trait object instead of making the code generic, because otherwise you have to propagate the generic type upward. Imagine I had a struct Wrapper that I wanted to hold a callback internally. I could make it generic over F: Fn(), but that means any user of my library who wants to hold a Wrapper would themselves need to be generic over F or name the type of F. Whereas if it instead stored a Box<dyn Fn()>, Wrapper is no longer generic, so my callers don't have to think about and propagate that generic parameter. Sometimes it cleans up your interface and makes it just nicer to use.
There's also the example of making traits object safe. If I have a trait X with a method fn foo(&self, f: impl Fn()), then X is not object safe; I couldn't write a dyn X. That's because the method is generic: impl Fn here is equivalent to a type parameter F: Fn(), and, like we talked about, that cannot be made object safe. On the other hand, if I made the argument a &dyn Fn(), the trait can be object safe, because there is only one foo. We don't have the problem of needing an infinite number of entries in the vtable: there's only one foo, and it takes a wide pointer, which works for any closure or function that implements Fn(). So there's not a clear-cut answer here.
Nice. Okay, I think that captures everything I wanted to talk about. The one thing we didn't get around to is coherence. I think coherence is different enough that we're not going to cover it this stream; I might do a separate stream on it, especially because we're at the two-hour mark, so this is a good time to stop. Are there any questions toward the end about all the stuff we've talked about so far, meaning
the general difference between static dispatch and dynamic dispatch, the Sized trait, vtables, object safety, and wide and fat pointers? Does all of that roughly make sense now, or is there anything more I can go through to help crystallize it in your mind, to make it clear why we have these, what they're for, and what their limitations are? I'm happy to take some questions toward the end here.
"Could runtime trait detection be implemented in the future using a type's vtable?" If I understand your question correctly, I don't think it can. The idea is: can you take a trait object, a wide pointer, and figure out which traits it implements just by looking at its vtable? I don't think so. Part of the reason is that a different vtable gets generated for each trait object type. It's not like there's one vtable for str or for String; there are many vtables for String, one for each dyn Trait. If you had a find_traits that took an s: &dyn SomeTrait, the only things in the vtable would be whatever methods are on the trait you name there; you wouldn't get a vtable of all possible methods on the provided type. So no, I don't think you can dynamically detect which traits something implements this way.
"Does calling a dyn Fn involve a double dereference, once for the vtable pointer and once for the actual function pointer within, or does that get optimized away?" Say I write a call that takes an f: &dyn Fn() and in here I call f(). That's a good question. I think it will end up being a double dereference, sort of. It's not quite a double dereference, because this is a wide pointer: it's not a pointer to a tuple of a data pointer and a vtable pointer. What really gets passed
in is a pointer to some T (you just don't know what T is) and a pointer to T's Fn vtable. That's really what's passed in here, so this ends up calling something like vtable.call(t). There is a dereference here: this is a pointer to the vtable, so you do have to dereference the vtable to find the address of call, which you then call. So it does end up being two dereferences, but it's not quite pointer chasing either.
It is true, though, that there's one thing that's a little bit funky. Say you have an x that takes an s of, let's say, &dyn AsRef<str>. One thing you could do, and this was mentioned in chat, with the new pointer metadata RFC, is check whether s's vtable is equal to the vtable of String as dyn AsRef<str>, and if so treat s's data pointer as a reference to a String and, if this were a &mut, call push on it. Maybe you could do some really weird, ugly magic like this. I would recommend you just don't do it. But yes, you could imagine comparing vtables this way. I don't know whether it's even a guarantee the compiler will uphold that a type has exactly one vtable for a given trait.
The compiler does do a decent amount of optimization here too. If you do a Box<dyn Fn()>, I think it might not always do a heap allocation, thanks to escape analysis and such. But in general, if you use generics, like an actual generic type parameter or an impl Trait, you will get better optimization, because the compiler gets full insight at compile time into what all the relevant types are and can co-optimize based on the actual
concrete implementations, whereas once you have indirection through dynamic dispatch, the compiler loses some of the information it would otherwise have to optimize with.
"Can you do a quick example of the slice, a Vec of dyns?" Oh yeah. We had dyn Hi, but let's make it dyn AsRef<str>. If you wanted a loop like for s in v { s.as_ref() }, then to actually get this to work you have to write Vec<Box<dyn AsRef<str>>> to make the element type sized, so that all the elements of the vector are the same size. Vectors, like arrays, require that so you can find a given index. Once you do that, this all just works, and you could use a Box or Arc or whatever for the inner type too.
"Different compilation units can lead to different vtables." That sounds about right. This is a little bit anecdotal, but when you compile Rust code, rustc might compile one crate using multiple independent threads, each compiling a different subset of the crate, to speed up compilation. Because you want those separate units to work entirely concurrently without too much synchronization, they might both encounter, let's say, String being used as a dyn AsRef<str>, and they might both generate their own vtables, because they don't want to coordinate on generating vtables. Therefore you can have multiple vtables even for the same type and the same trait. So that's one example of why that invariant might not hold. That's a good point.
There is also Any, which I haven't talked about. I'm not going to demo it, but Any is super magical. Well, not really: Any is just a trait with a method that returns an identifier for a type, guaranteed unique by the compiler. So if you have a trait object over Any, you can call type_id on it, because it gets added to the vtable, to get a unique type identifier for that value, and then you can use that to
downcast from a dyn Any to the concrete type, because we know what the type identifier is. I'm not going to go into Any too much, but Any uses a lot of the stuff we've talked about so far to be able to go from a dyn Trait, as long as that trait includes Any, into a reference to the actual underlying type. It's really cool, and actually surprisingly simple now that you know what you know, so I recommend you give the standard library's std::any module documentation a read; it talks about why this is safe.
All right, I think we're going to end it there. Two hours and fifteen minutes; I'm pretty happy with that. We covered a lot. I hope it was useful, I hope you feel like you learned something, and I hope some of this actually sticks. I've made a little bit of progress on the coding live stream that I really want to happen. I'm hoping it'll happen sooner rather than later, but it does not escape me; I'm still on the case. Thanks, all, for coming out, and I will see you next time. So long, farewell.
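A minimal sketch of that downcasting path using std::any::Any and its downcast_ref method; the describe helper and the sample values are hypothetical. The Vec<Box<dyn Any>> in main also shows the vector-of-boxed-trait-objects pattern from the earlier question.

```rust
use std::any::Any;

// Hypothetical helper: tries a few concrete types in turn.
// `downcast_ref` compares the value's TypeId (obtained by a dynamic call
// through the trait object) with the TypeId of the requested type.
fn describe(value: &dyn Any) -> String {
    if let Some(s) = value.downcast_ref::<String>() {
        format!("a String of length {}", s.len())
    } else if let Some(n) = value.downcast_ref::<i32>() {
        format!("an i32 equal to {}", n)
    } else {
        String::from("something else")
    }
}

fn main() {
    // A Vec of boxed trait objects: every element is a sized wide pointer.
    let mut items: Vec<Box<dyn Any>> = Vec::new();
    items.push(Box::new(String::from("hello")));
    items.push(Box::new(42_i32));
    items.push(Box::new(1.5_f64));

    // `&**b` reborrows the boxed value as `&dyn Any`. Passing `b` itself
    // would coerce the Box into a `dyn Any` whose concrete type is
    // `Box<dyn Any>`, so every downcast in `describe` would fail.
    let described: Vec<String> = items.iter().map(|b| describe(&**b)).collect();

    assert_eq!(described[0], "a String of length 5");
    assert_eq!(described[1], "an i32 equal to 42");
    assert_eq!(described[2], "something else");
}
```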
Info
Channel: Jon Gjengset
Views: 25,946
Rating: 4.9732442 out of 5
Keywords: rust, live-coding, vtable, dynamic dispatch, monomorphization, fat pointer, static dispatch, generics
Id: xcygqF5LVmM
Length: 132min 52sec (7972 seconds)
Published: Fri Apr 30 2021