RustConf 2021 - Move Constructors: Is it Possible? by Miguel Young de la Sota

Captions
Hi, my name is Miguel. Today I'm going to show how move constructors, a technique from C++, can be used to express a new class of types in Rust. This type of programming has frequently been cited as being incompatible with Rust's object model; I'm actually going to show that we can make it work. To get there, we're going to go on a journey from Rust to C++ and then back to Rust again. This talk is a bit dense, so please make sure to ask questions in the chat if you get confused. I'm going to assume an intermediate to advanced level of familiarity with Rust throughout the talk.

To understand why we care about move constructors, we need to understand self-referential types. We want to create a type that can point into itself. We'll start with a struct that contains a value x and a reference to x, which we're going to call ptr_to_x. Very creative. If we write this code, Rust complains, because Rust requires us to give explicit lifetimes to any reference inside of a struct. Now, we don't actually want to add a lifetime to the struct itself. We want something like a "self" lifetime that refers to the lifetime of the struct that contains the reference. Unfortunately, Rust doesn't have a way of expressing this, so we wind up having to use a raw pointer. The borrow checker just doesn't know how to deal with this today.

I'm going to be using these little boxes containing fields and addresses to describe what pointers are doing at any given time. They'll become very important once we start tracking self pointers into different structs. Now that we're using raw pointers, we can make Cycle self-referential after constructing it on the stack by assigning to ptr_to_x a reference to cycle.x. This creates a little cycle inside of the struct. Pay close attention to these arrows, because they're going to change over time as we do different things
with the struct. We can implement a get function that uses the self pointer to access the other field. Although a little bit contrived, this is a real thing in certain data structures: being able to create self pointers is very useful for some classes of optimization and is pretty common in performance-sensitive code.

For get to work, we need to maintain an invariant in Cycle: ptr_to_x must always be either null, such as around initialization, or a pointer into itself and no other Cycle. In other words, these are the only two shapes for a Cycle that are allowed. In particular, we cannot have a Cycle that points into another. This is not allowed, and it breaks assumptions that we want to make about this type.

Now, it's pretty natural to want to factor out this initialization code into a constructor function. The only change from the previous code is that we've moved the "let mut cycle" part into a function and then called that function instead of using the Cycle that we initialized previously. But if we do, Rust will move cycle as it returns out of Cycle::new. Returning by value in Rust is a move, which changes the address of the returned value. But because ptr_to_x is unchanged, it points into Cycle::new's stack frame; it no longer points to x. Not only does this violate our invariant, but it makes get violate memory safety, because reading data from a destroyed stack frame is not allowed. This is pretty bad. It means that we can't write Cycle::new this way while maintaining the other properties we want. Because Rust moves are implemented as bitwise copies, Cycle doesn't get a chance to fix up its invariant. This is a general problem with self-referential types: their invariants break if Rust moves them, and Rust really likes to silently move things.

However, Rust does provide a good way to avoid this pitfall: pinned pointers. Pin<P> wraps up a pointer type so that unsafe code, like our get function, can assume the pointee never moves. Moreover, it stops safe code from accidentally moving its
pointee. Safe code can still do whatever it wants without hurting unsafe assumptions. In general, this means that you can't get a mutable reference to the thing behind the pinned pointer. You can only get immutable references, because you can't swap or replace out of those. Most Rust types are Unpin, meaning that they can nonetheless be moved out of a pin. For example, an integer is Unpin, because integers don't really know about their address. But a self-referential type like Cycle is sensitive to its address and therefore can't be Unpin. You usually disable Unpin by adding a special type that I'll show later.

Most importantly, Pin isn't magic. It's a library type, and it doesn't change Rust's core semantics: you can still move things around as long as you have them by value. This is unlike types like slices and raw pointers, which introduce entirely new language semantics. This will be important later, because we will add additional library guarantees that don't change Rust's semantics at all.

So let's go back to our dangerous implementation of Cycle::new. Due to the move on return, it immediately invalidates the raw pointer we create. If we use Pin, we avoid this entirely: by returning a pinned Box, we ensure that we can't move out of it at any time, and it certainly doesn't move when we return it. We also need to make sure that it's pinned: the PhantomPinned type is a zero-sized type that just disables Unpin. Every type that cares about its address needs to include this type as a field. We can now write our code as before without any issues: cycle.get() will not dereference a dangling pointer. Note that we can't move the value out of the pinned Box like we could with a normal Box. This ensures that safe code can't do anything rude.

Pin lets us write unsafe code that requires that certain values never move. There are a number of reasons you might want to do this. This comes up most commonly when writing asynchronous code in Rust, but there are other
situations in which it's useful, such as advanced data structures. Pin also enforces this in safe code: safe code can't accidentally move things or do anything else that unsafe code doesn't expect. So safe code can do whatever it wants, and unsafe code doesn't need to care.

If you were watching closely, though, I actually needed to add a Box to Cycle::new to allow me to return the pinned data. This is a problem: we need a pointer to pin in the first place. You can't pin an object and return it by value; that doesn't mean anything in the Pin formalism. This leads to two major problems with Pin. Number one, because stack-pinned data is going to be behind a mutable reference, you can't return it, the same as with any reference. Therefore you need to pay the cost of an allocation and box it. Number two, we can't pin data in collections. This is because basically every collection that can resize is going to move things when resizing. When a Vec fills up, it copies everything and treats that as a move. When a HashMap hits its capacity, it needs to do the same thing. Address-sensitive types don't really fit into the language, and using them widely is not free.

Okay, so that's enough about languages that suck. Let's talk about C++. And no, this is not a C++ talk. I don't assume people have encountered it before, and I'm going to be very light on the weird C++ syntax. You can mostly treat it as Rust with a different syntax and no borrow checker.

Let's recall our Cycle invariant first. Remember that when we have a Cycle, we want ptr_to_x to be null or a self-reference. We cannot have this version where it points to something else. Let's implement Cycle in C++. First, let's define fields like we did in Rust. Next, we're going to need to be able to construct it, and in C++ you use constructors to create values. Unlike a Rust constructor function that might be named new, C++ constructors get access to this, which is the C++ equivalent of self. It is as if Cycle::new somehow took a self parameter. We can take the address of this in the constructor, which we could not do in Rust. When we call the constructor, it silently passes the destination of the new object as the this pointer. As you can see, the version in the constructor, which didn't know the concrete address, gets a concrete address when the object is actually constructed.

To make this work, C++ allocates space before calling the constructor. So this byte array is just random untyped data. We then use the special new syntax, which is kind of like a raw constructor, to tell it: please run this constructor with this explicit this pointer. Finally, we cast it to the correct type, and it's business as usual: as before, you can call functions on cycle, and it's a perfectly normal value. In C++, all values get constructed in place; that is, space is allocated before the constructor runs. As a result, C++ values are always implicitly pinned. They always know their address. They're all address-sensitive unless you say otherwise.

So what would this raw constructor operator look like in Rust? On the left side I have desugared the C++ constructors from before; on the right side I have a transliteration of that into Rust. The raw constructor will be some kind of function call that accepts uninitialized data and constructs an object there, which is the same signature as the raw constructor, although the raw constructor does take an additional argument, and we'll see how that mixes in later. Naturally, we want to capture this function signature in a trait, New. A New is a constructor object that can create values of a specific type. But remember that we want to leverage the fact that constructors create pinned values, so we need to make sure to mark the Rust version of this as pinned.

To implement a constructor, we create a new type. NewCycle is a
constructor for Cycle. It takes an integer argument and sets up the invariant, like the C++ constructor. The difference between this and C++ constructors is that the additional type captures the constructor arguments. So instead of taking an integer, New::new takes self, and self contains that integer argument. For other constructors, you might have more or fewer arguments. Note that it is not Cycle that implements New, because there are many ways to construct a value. For example, you could imagine a DefaultCycle type that takes no arguments. Notice that the body of the Rust constructor is almost word-for-word identical to the C++ version. New lets us capture the destination address and store it, just like C++. We can now set up the Cycle invariant without transiting through the null state, again just like in C++. It's almost as if we've ported the feature, bugs and all.

The New trait represents constructor objects. You can build a constructor with some parameters and then run it on some storage somewhere. New::new is Rust's equivalent of the raw constructor syntax, and it is intended to be wrapped in a more ergonomic API, much like how you usually don't need to call it in C++ and can use the syntax sugar of declaring a variable instead.

Even though we can now create Cycles without an intermediate null Cycle, we still haven't actually solved either of the problems we had. Now that Cycle is pinned on construction, we can't return it at all. Maybe that's better, or maybe there's an abstraction we're missing.

When a function returns, it needs a place to put the return value, and this is some place in memory called the return slot. C++ constructors actually get special treatment here. When you return a C++ constructor call, it's not as though you're assigning it to a variable and returning that; what actually happens is that the object is constructed in place in the return slot. This means that this code, which wouldn't work
at all in Rust, works perfectly in C++ and dodges the move-on-return hazard. C++ implements this by adding a hidden out-parameter to every function: the return slot. The this pointer used for the constructor is the return slot, rather than a local variable like it would be in Rust, or at least that's sort of how Rust behaves. That said, Rust does have a return slot, and it does use it a little bit in the C++ way, but not in a way that is accessible to users, and certainly not in a way that lets us run a constructor.

So what if we just ignore the Rust return slot and build our own? That way we can control how things get placed into it. Let's represent the return slot with a sort of reference type. It refers to data somewhere that is uninitialized; we own it, and we're responsible for initializing it. It doesn't really matter where it is, but usually it'll be on a caller's stack. Slots are the ideal location for running a constructor: they're raw memory waiting for a value to appear. We can define a method that accepts a constructor, runs it, and returns a reference to the result, consuming the slot. So emplace takes a slot in the uninitialized state, runs a constructor, and returns the initialized state that the constructor produces.

We can then fake a return slot by passing in a slot argument. Instead of returning by value, we run a constructor in the slot and return the resulting reference. This is actually very similar to the C++ arrangement under the hood, while still leveraging the borrow checker: the slot and the pinned reference share the same lifetime, so you don't have any of the concerns you would have in C++, where the slot might not outlive the result.

Okay, so constructors plus slots actually get us the first of these: we can genuinely return objects on the stack, even if it's not quite as ergonomic as it is for Unpin types. Progress! We still can't do the collections version of this, but C++ seems to make it
work even though all the pointers are pinned. So how do they do it? Let's look at our C++ Cycle type again. When we assign one variable to another, C++ actually makes a copy rather than a move like in Rust, much like clone. By default this is a bitwise copy, which means we can run into the same problem where the self-pointer invariant is broken. Oops. In particular, remember that a bitwise copy means the pointer does not get updated. But like in Rust, we can customize this copying behavior. The key difference is that while clone in Rust simply returns by value, C++ copies are done with a constructor. This gives us a chance to fix up the pointer to the new location before it's available for potential misuse. So the copy constructor looks almost identical to the normal constructor: it sets up the invariant and copies the data. It copies x, but not the pointer.

C++ variables don't move around like in Rust; you always construct new ones whenever you give a value a new name. However, objects are always notified when this happens, so you can fix up any invariants the bitwise copy would otherwise break. This is similar to clone, but it has all of the power of constructors.

Obviously we need to port this to Rust, because it's really cool. Again, let's write down the C++ code and see how we might write it in Rust. On the left side we have a copy; on the right side we have the Rust constructors, exploded out into the version that manually calls the raw constructor operation. Unlike constructors, there's only one copy constructor for each type, so it makes sense to give it a specific trait, which we're going to call CloneNew. We don't need to add an additional type that represents the cloning operation; it's intrinsic to the type being copied. The CloneNew trait looks almost exactly like Clone, except that instead of returning by value, it takes a destination parameter, just like a constructor would. Implementing it is super easy.
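The slides are not part of this transcript, so the following is only a minimal sketch of how the New and CloneNew traits described above might look. The exact signatures, the field names, and the helpers points_to_self and demo_construct_and_clone are assumptions made for illustration; they are not taken from the talk's slides or from any published crate.

```rust
use std::marker::PhantomPinned;
use std::mem::MaybeUninit;
use std::pin::Pin;
use std::ptr;

/// Address-sensitive type: `ptr_to_x` must be null or point at this value's own `x`.
pub struct Cycle {
    x: i32,
    ptr_to_x: *const i32,
    _pinned: PhantomPinned, // disables `Unpin`
}

impl Cycle {
    pub fn get(self: Pin<&Self>) -> i32 {
        // SAFETY: the invariant says `ptr_to_x` targets our own `x`.
        unsafe { *self.ptr_to_x }
    }

    /// Hypothetical helper: checks the self-pointer invariant.
    pub fn points_to_self(self: Pin<&Self>) -> bool {
        self.ptr_to_x == &self.x as *const i32
    }
}

/// Assumed shape of a constructor object: it is run on pinned, uninitialized storage.
pub unsafe trait New {
    type Output;
    /// Safety: `dest` must be uninitialized and must never be moved afterwards.
    unsafe fn new(self, dest: Pin<&mut MaybeUninit<Self::Output>>);
}

/// Constructor for `Cycle`; the struct captures the constructor argument.
pub struct NewCycle(pub i32);

unsafe impl New for NewCycle {
    type Output = Cycle;
    unsafe fn new(self, dest: Pin<&mut MaybeUninit<Cycle>>) {
        unsafe {
            let p: *mut Cycle = dest.get_unchecked_mut().as_mut_ptr();
            // The final address is known up front, so no null state is needed.
            p.write(Cycle { x: self.0, ptr_to_x: ptr::addr_of!((*p).x), _pinned: PhantomPinned });
        }
    }
}

/// Assumed shape of the copy constructor: like `Clone`, but writes into a destination.
pub unsafe trait CloneNew: Sized {
    /// Safety: same requirements on `dest` as `New::new`.
    unsafe fn clone_new(&self, dest: Pin<&mut MaybeUninit<Self>>);
}

unsafe impl CloneNew for Cycle {
    unsafe fn clone_new(&self, dest: Pin<&mut MaybeUninit<Self>>) {
        unsafe {
            let p: *mut Cycle = dest.get_unchecked_mut().as_mut_ptr();
            // Copy `x`, but point the new value at its own field, not at the source's.
            p.write(Cycle { x: self.x, ptr_to_x: ptr::addr_of!((*p).x), _pinned: PhantomPinned });
        }
    }
}

/// Hypothetical demo: construct a `Cycle` in place, then copy-construct a second one.
pub fn demo_construct_and_clone() -> (i32, bool, bool) {
    let mut a = MaybeUninit::<Cycle>::uninit();
    let mut b = MaybeUninit::<Cycle>::uninit();
    // SAFETY: `a` and `b` stay where they are until the function returns, and
    // `Cycle` has no destructor, so never dropping the values is harmless here.
    unsafe {
        NewCycle(5).new(Pin::new_unchecked(&mut a));
        let a_ref: Pin<&Cycle> = Pin::new_unchecked(a.assume_init_ref());
        a_ref.get_ref().clone_new(Pin::new_unchecked(&mut b));
        let b_ref: Pin<&Cycle> = Pin::new_unchecked(b.assume_init_ref());
        (b_ref.get(), a_ref.points_to_self(), b_ref.points_to_self())
    }
}
```

Calling demo_construct_and_clone reads the copied value through the copy's own self pointer and reports whether both values satisfy the invariant. Under these assumptions, both constructor bodies compute the self pointer from the destination address before the value becomes visible, which is the property the talk attributes to C++ constructors.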
We already know how to implement it in C++, and we implement it exactly the same way in Rust, the same as with the other constructor. Now, we implement this on Cycle itself because, as I mentioned before, being copy-constructible is a property of the type, rather than there being many distinct methods for doing so. In C++ you only really get one copy constructor, so it doesn't make sense to try to generalize in Rust. The CloneNew trait is a fusion of Clone and New: it's the cloning constructor.

But in Rust we really like to avoid having to make copies, because move semantics are awesome. So how can we recover them? Let's recall our slot abstraction. When we create a value in a slot, because the slot is a unique reference to some memory somewhere, we're actually the sole owners: no one else will ever access the value, even after the slot disappears. We can introduce a new type, MoveRef, which is like a mutable reference that captures this ownership property. Not all mutable references can be MoveRefs, because some of those might be outlived by other mutable references or other types that are entitled to run the destructor. A MoveRef has to be entitled to run the destructor; that's what ownership means. We can use MoveRef instead of a mutable reference for the return value of emplace, since it produces a brand-new value no one else will ever access. The key point is that MoveRef owns its pointee and not its storage. This means that we can move out of it, like Box, but it's storage-agnostic, like a mutable reference. You can have MoveRefs that point to the stack or the heap.

Now let's recall copy constructors. They're like a constructor, but instead of accepting some special constructor type, they accept a reference to self and write a copy of it to the supplied destination. We can define move constructors analogously by replacing the source with a pinned MoveRef, pinned because that is what emplace returns. Unlike CloneNew, MoveNew consumes the source, and that's why it takes a
MoveRef. However, the implementation is almost identical, and it reveals the true nature of custom moves: they're just copies that consume the source. However, they can also mutate the source to disable its destructor or to properly move an owned resource. This is exactly what Rust's move semantics do. For vanilla types, that is, Unpin types, CloneNew and MoveNew are just the usual clones and moves. There's nothing special about them, and it shows that this concept reduces to something we've already encountered in Rust.

Move-constructible types let us move the immovable. Despite being pinned, we can move data from one location to another. This doesn't violate the pinning invariants, because we control exactly how the value is moved, in a way that unsafe code can be made aware of. The trait is generic, so we can build custom collections that leverage it. Instead of doing a bitwise copy to resize a vector, we can call the move constructor on each and every element as we're resizing it. This is how C++ collections resize, for example, when the contained value is not trivially copyable. Pinned data in collections, so long as it can be move-constructed, is totally something we can do.

So yeah, we can actually do these things. We can make address-sensitive types feel as flexible as normal Rust types. We can return them, we can put them in collections, we can allocate them wherever we want, and we can specify complex moving strategies that the normal Rust object model would not let us. That said, we need to muck around with uninitialized data in order to use any of this, and the ergonomics are really bad. Most people are not going to be writing unsafe code; they're going to be using these types with the unsafe code under the hood, and it would be best if they didn't have to do any of that manually.

Emplacing a constructor only really cares about the constructor value you use; everything around it is boilerplate that can be inferred. However, recall that returning
pinned data is impossible, so we can't use a function to do this. But we can use macros to work around it. For example, the slot macro could be used to allocate some space on the stack and then immediately stick it into a slot, so that it can only be accessed through the slot. Unfortunately, the slot macro can't be an expression, since it needs to allocate on the stack. We can use the emplace operation defined on slots previously to construct values. Unlike the example above, this produces a MoveRef. We can add convenience functions to slot that make it easy to call the clone and move constructors as well. So now we don't need to deal with any unsafe code in order to call constructors: we can just call this macro, allocate some space, and then put the value directly inside it. This is a common enough pattern that we can add a macro to do this even more concisely. The choice of syntax isn't really important; ultimately, we can make constructors feel simpler and easier to use. We could also add functions that let us emplace into Boxes, into Arcs, and similar things that you would want to do with a regular value. Really, it's a matter of what conveniences you want to add. Although constructors are implemented purely as a library, we can use macros to add custom syntax to use them safely. Ultimately, what they look like is up to the constructor library. This only scratches the surface of how macros can make constructors feel more natural and part of the language.

There is still a lot to do on this topic. I've already implemented the core traits in a crate; check out moveit on crates.io. I want to actually build a move-respecting Vec and HashMap like I described earlier. C++ has already implemented such things, so it's just a matter of porting the analogous thing from a C++ standard library. I'm most of the way there with Vec. I don't know to what degree I want to attempt HashMap until
someone really needs it, because hash maps are pretty complicated. This concept is also very useful for mixed Rust and C++ code, but I think it can be useful for certain kinds of pure Rust as well. I'd like to make examples of both more readily available, to better motivate this kind of API. In the process of developing this, I realized that Pin's docs don't express all of the things you can do with it; they only express a certain subset. I'd like to improve the documentation so that it more clearly states the guarantees and what they mean for API design. I have a working version that I'm hoping to propose to the Rust standard library at some point. Finally, there are tons of C++ features that can make working with constructors easier, such as syntax sugar like struct literals, or using constructor operations as an expression rather than as a binding. There is a very long list of these things that I'd like to port someday in order to make constructors easier to use.

Now, none of this would have been possible without the help of the many people I've discussed this feature with, who suggested ways to make it more concise and pointed out flaws in the original design. However, I want to especially thank Tyler Mandry: after he corrected something I asserted about Pin, I realized that the Pin guarantees could be used to express constructors. I'd also like to thank Manish Goregaokar, who encouraged me to publish this idea widely, for the benefit of both Rust/C++ interop and the development of Rust in general. Thanks for listening, and I hope you enjoyed my talk.
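As a recap of the move-constructor idea from the talk, here is a simplified sketch of a collection relocating address-sensitive elements by calling a move constructor on each one instead of copying bytes. It is an illustration under stated assumptions only: raw pointers stand in for the pinned MoveRef and slot types described in the talk, and the names MoveNew, construct_at, get_at, relocate, and demo_relocate are hypothetical rather than the API of any published crate.

```rust
use std::marker::PhantomPinned;
use std::mem::MaybeUninit;
use std::ptr;

/// Address-sensitive type: `ptr_to_x` must be null or point at this value's own `x`.
pub struct Cycle {
    x: i32,
    ptr_to_x: *const i32,
    _pinned: PhantomPinned,
}

impl Cycle {
    /// Constructs a `Cycle` directly at `dest`.
    /// Safety: `dest` must be valid for writes and properly aligned.
    pub unsafe fn construct_at(dest: *mut Cycle, x: i32) {
        unsafe {
            dest.write(Cycle { x, ptr_to_x: ptr::addr_of!((*dest).x), _pinned: PhantomPinned });
        }
    }

    /// Reads `x` through the self pointer.
    /// Safety: `this` must point at an initialized `Cycle` that upholds the invariant.
    pub unsafe fn get_at(this: *const Cycle) -> i32 {
        unsafe { *(*this).ptr_to_x }
    }
}

/// Simplified move constructor (assumption: raw pointers instead of pinned `MoveRef`s).
pub unsafe trait MoveNew: Sized {
    /// Moves `*src` into `dest`; afterwards `*src` must be treated as consumed.
    unsafe fn move_new(src: *mut Self, dest: *mut Self);
}

unsafe impl MoveNew for Cycle {
    unsafe fn move_new(src: *mut Self, dest: *mut Self) {
        unsafe {
            let x = (*src).x;
            // Like the copy constructor: rebuild the self pointer for the new address.
            dest.write(Cycle { x, ptr_to_x: ptr::addr_of!((*dest).x), _pinned: PhantomPinned });
            // Unlike a copy, the source is consumed; null its pointer to mark that.
            (*src).ptr_to_x = ptr::null();
        }
    }
}

/// Relocates `len` elements the way a growing vector could: one move constructor
/// call per element instead of a bitwise copy of the whole buffer.
pub unsafe fn relocate<T: MoveNew>(old: *mut T, new: *mut T, len: usize) {
    for i in 0..len {
        unsafe { T::move_new(old.add(i), new.add(i)) };
    }
}

/// Hypothetical demo: two fixed buffers stand in for a vector's old and new storage.
pub fn demo_relocate() -> (Vec<i32>, bool) {
    let mut old: [MaybeUninit<Cycle>; 2] = [MaybeUninit::uninit(), MaybeUninit::uninit()];
    let mut grown: [MaybeUninit<Cycle>; 2] = [MaybeUninit::uninit(), MaybeUninit::uninit()];
    let old_p = old.as_mut_ptr() as *mut Cycle;
    let new_p = grown.as_mut_ptr() as *mut Cycle;
    // SAFETY: both buffers are large enough, aligned, and outlive every access below.
    unsafe {
        Cycle::construct_at(old_p, 1);
        Cycle::construct_at(old_p.add(1), 2);
        relocate(old_p, new_p, 2);
        let values = vec![Cycle::get_at(new_p), Cycle::get_at(new_p.add(1))];
        let fixed_up = (*new_p).ptr_to_x == ptr::addr_of!((*new_p).x);
        (values, fixed_up)
    }
}
```

In this sketch demo_relocate returns the values read through the relocated self pointers together with a flag saying whether the first element points into the new storage. A real design would express the ownership transfer in the type system, as the talk does with a pinned MoveRef, rather than through raw pointers and documented safety requirements.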
Info
Channel: Rust
Views: 5,436
Keywords: rust-lang, rust, rustlang
Id: UrDhMWISR3w
Length: 24min 8sec (1448 seconds)
Published: Wed Sep 15 2021