Type erasure in Rust C/FFI — structs, generics, trait objects

Captions
Today I'm talking about writing C code that creates and works with Rust objects over FFI. This is pretty easy if your struct is boring, but it gets more interesting once you start adding things like generics. Along the way we'll look at trait objects in detail and do some weird stuff.

Let's start with a simple non-generic example. Here's a struct called MyValue. It's public, which means I'll be able to refer to it externally, but it's not repr(C): we won't try to expose its fields on the C side. It will be an opaque struct that we can only refer to via a pointer. Now that I have it, I can define some FFI functions to work with it. As you can see, my struct contains some value, a u32 in this case, which is chosen on the Rust side. I have a function to allocate one on the heap by putting it in a Box, then giving ownership to C with into_raw. I then have a function to free it again by taking ownership of the memory with from_raw, then dropping it. I also have a third function that prints out the debug representation of the contents of whichever MyValue pointer gets passed in.

So far so good. If I build it with cbindgen configured, it will generate a header file for me, and that header file looks like this. I have my struct definition, which is opaque. I also have my function prototypes. I can now use these from C code, so here's a basic program that includes that header file. It makes a struct, calls the function to print it, and then frees it. I can build this C program; I'm just using a small script to automate the things that I need to do for that, it's not especially fancy. And as you can see, it does pretty much what we expected: it creates one, asks it to print itself, and then frees it.

In all these functions, the key thing we're passing around is a pointer to a struct MyValue. Let's have a closer look at that in the debugger for the C program. We'll set a breakpoint on the main function and run it. We're about to create one. Let's go to the next one, so it
actually gets created. Now we have some value, some pointer, inside the variable mv. Let's just have a quick look at our mappings, if I type it correctly: mappings. Okay, this address range is where our heap allocations currently are, and if I do info locals we can see that mv is a pointer into 8 1 0 0 1 0, so that's right at the start of our current heap memory region. That's pretty much what we expect. Now let's have a look at what's actually contained inside that. This is just reading one four-byte integer from the memory pointed to by mv, and we see that it's the number 4. That's what we would expect: it's a pointer to the struct. And if we extend that out, we've just got some zeros after it, but there's not really anything else interesting going on.

So at a higher level, this is what we've confirmed. The return value from our create function is just a memory pointer, a memory address, and we found the content of that struct laid out in memory at that address: one 32-bit integer.

So far hopefully this makes sense, but there is something important that isn't there. Let's focus just on the print function for a moment. The only thing that we pass in is a memory location. How does it know that the address we passed in actually belongs to a MyValue struct? How does it know that it's going to find a u32 at that memory location? What if we gave it the address of some different kind of struct? How would it know? And the answer is that it wouldn't know. When we say that the argument is a pointer to a MyValue, we're telling it that's what it should be. Then when we convert that pointer into a Rust reference, we have to do that unsafely: we're telling the compiler, treat this memory location as a MyValue struct. We could have 100 kinds of structs, and maybe some of them contain an integer and some of them don't. The type of the struct itself is not stored in memory; it's completely contextual. But because we sent the pointer to this particular function, it's going to assume the
memory is this particular struct, and we end up treating it like one, for better or worse.

Now what if my struct was generic? At the moment this has no type parameters, but we could imagine a program which has a struct where this time our field x is a generic type D, something that implements Debug so that we can print a representation of it to the screen. So now I have to adjust my functions accordingly to deal with this. My create and free functions are going to create a u32 variant explicitly, same as before. However, I've made the print function completely generic. In normal Rust code we could pass it any kind of MyValue, including u32, and it would just do what we expect it to do; the compiler would know how to make that work.

But when I build this one, something else happens. This time I get a strange warning that the print function has to be mangled, and if I go and look in the library for the MyValue functions, we can see that the print function is missing. It hasn't actually been exported; it's not available from C. And if we think about it, what we're trying to do just isn't possible. When we receive a pointer to a MyValue from C, we have no idea what the generic type D is. We have the memory address where the data is stored on the heap, but what format is that data? It could be a u32, it could be a u8, it could be a String. This is essentially an impossible situation.

We can fix this by specifying the types exactly, and here's an example of how you could do that. Now instead of having three functions we have six: we've got create, free and print for u32, and create, free and print for a String type. And because it's a generic type, we can store a String in there if we want, and this works. If you run this through cbindgen, you end up getting all of these functions being created. But it's kind of annoying, because then in your C program you have to specify: oh, I'm using a u32, I want to create the u32
type, and oh, I want to print the u32 type, and I want to free the u32 type. This is annoying. What if I don't know or care what the type is? Maybe sometimes I create one and it gives me one that has a u32 inside, sometimes I create one and it has a String inside, and I don't even know. Is there a way that we could write our Rust code and expose it over FFI to make this possible?

And the answer is yes, we can do this using traits. We can define a trait that doesn't have any type parameters. We can construct a MyValue<u32> or a MyValue<String> like before, and it doesn't matter what type it is so long as it implements Debug. Now for all of those values, any MyValue<D> where D is Debug, we have implemented the trait MyTrait. So now we can represent this entire set of objects: anything that could be a MyValue<D> can be represented as a MyTrait, for all of them. That's all mapped to just one type, which is very interesting from an FFI perspective. The goal now is to make our FFI functions refer to this single trait, rather than having to specify exactly which struct we're using in each case.

As we discussed just a moment ago, the main problem is that when we receive a pointer from C, we don't know what type of struct it is. We've lost the contextual type information; we only know the location of the actual data struct on the heap. A trait object in Rust fixes this problem by tacking on an extra piece of information: it not only has a pointer to the data but also a pointer to a vtable. I don't want to get into what exactly a vtable is, but it's an extra piece of information that effectively identifies what kind of struct it actually is. So if you created a MyValue<u32> and decided to refer to that as a trait object of type MyTrait, that would have a particular vtable. Then if you create another MyValue<u32>, the same type but a different instance, it will have a different data location but the same vtable. And then if you go on to
create a MyValue<String>, it's a different instance again, so it has its own data, but it will also have a different vtable corresponding to the String variant. We'll prove this in the debugger in a little bit.

Hopefully things are making sense, because it's about to get a little bit weird. Let's look more closely at the FFI functions relating to MyTrait. When I create the object, I'm creating it on the heap with Box::new, but instead of storing it as a Box<MyValue> it's a Box<dyn MyTrait>. This variable v actually consists of these two pointers, the data plus the vtable; it's just hidden from you in the Rust source code. I then give up ownership of it to the FFI and return the pointer, but it's not a normal pointer, it's a wide pointer to a dyn MyTrait. Also notice that in this code there's a 50-50 chance that we'll get either a u32 type or a String type. Both of these can be converted to the same type of trait object, and Rust is cool with it.

Having created that, our other functions receive the same wide pointer type as an argument. What's nifty now is that when we want to print it out, the wide pointer, the trait object, by itself contains enough information about what type it is. We can just call v.print(), and behind the scenes that will end up doing the right thing for a u32 one or the right thing for a String one. It's the same old trait object that we know and love from normal Rust code that enables this kind of functionality.

So let's make sure that we can actually build our library like this. Great. Now let's look at the function prototypes that cbindgen created for us. Oh dear, cbindgen hasn't generated them. You see, our create function's return value is no longer a standard memory address, which would be 64 bits on this machine. It's actually two 64-bit things jammed together: the data and the vtable. And to make things worse, this trait object is kind of an internal data type that we're not really supposed to be relying on. But maybe we can do it anyway. Let's
have a quick look at the library again using nm, so we can see that the functions have actually been compiled and exported in the library. The functionality is there; we just don't have the appropriate definitions for our C program. No problem, we can write them ourselves, I guess.

First we need a struct describing the wide pointer that we're expecting to get back from the create function. As we saw, we expect it to be two pointers, one after the other, and this is returned by the create function. Then the same type of wide pointer is passed as a parameter to the print function and to the free function. In the main program body, it's going to create three trait objects (remember there's a 50-50 random chance which type each will be), then it will ask Rust to print the contents of each of the three, and then free them. So let's compile and run that C program a few times to see what it does. Okay, you can see that randomness is taking effect there, and now we have a C program that is actually getting references to different types of generic MyValues and is still able to work with them.

So we've achieved our dream, basically. We've hidden this information; we've erased the generics from the C side. All the necessary magic is actually built into the trait object, which is already part of Rust. But we have to jump through quite a lot of hoops here, specifying a trait object struct and all of our functions manually in C. Is there a way to fix that? Of course there is. It's almost a bit silly, but it works really quite well: you just have to put the box in another box. Now, I can't actually take credit for this idea. I read it a while back somewhere on the Rust users forum, but it's a stroke of genius, honestly.

What do I mean by put the box in another box? Well, in the current code we have our MyValue structs stored on the heap behind a data pointer, and we end up with this wide pointer stored on the stack when we return it from the create function. What we're
going to do instead is create a new 128-bit allocation on the heap to store our trait object, those two pointers. So now that's in heap memory. Then we take the address of that and give that to C. In other words, C has a pointer to a trait object, a pointer to a wide pointer, and C doesn't have to realize that that's what it is at all, because it can just treat it as opaque. But it means that when Rust unpacks this pointer, it has another layer of indirection to go through.

So here's our old print function: it takes a trait object pointer. And here is our new print function: it's the same, except now the trait object parameter is boxed. The wide pointer on the inside is two pointers jammed together, but the pointer on the outside is just a normal 64-bit pointer to some opaque object on the heap, from C's point of view. So in theory cbindgen will create a prototype for it. Let's find out.

Okay, that hasn't actually worked, so there's a bit more dodginess here. The reason that cbindgen is not happy is not because we've done anything wrong. It's just that it's seeing the dyn keyword in these definitions and thinking, oh no, this is something that we can't handle. But we know better than cbindgen, so, whoops, what I'm going to do is simply remove those dyn keywords. This will actually generate a warning on recent versions of Rust, because they're making it so that you have to specify dyn, but for now we'll just take that out and run the build again. There are those warnings. Now let's look at the .h file again. Lovely, that's worked.

So now that we've successfully tricked cbindgen into thinking that everything's fine, it's created an opaque struct type, Box_MyTrait. The only side effect with this technique is that you end up with this Box prefix on everything. You could do your own typedef to fix that, I suppose. But now everything is looking pretty good: we have our three functions that we've defined, and we just need to make our C program use the correct names. So we've
gotten rid of all that boilerplate where we had to define the trait object at all; that is just not there anymore. We create three of them, and now they're pointers, actual pointers as C knows them. We pass those pointers into the print function and then we free them. So let's build and run that program. Right, and that's working perfectly.

Now, just to prove that this is doing what we think it is, let's have a closer look at some of those objects in the debugger. I'm just going to step forward a few instructions, so that mv1, mv2 and mv3 have all been given MyValue values and we have all the pointers. We can hopefully see those; there they are. Now let's actually inspect some of the memory inside these things. Those look like heap addresses, so let's look at what's actually on the heap. You'd expect to see two pointers at that address, and it certainly looks like two pointers. Let's have a look at the other ones as well. Okay, this is looking pretty good: we've got three different data pointers and two different kinds of vtable pointer. So the first MyValue is one type and the two other values are some other type. To prove that, let's step three more times and watch what gets printed. Okay, the first one was the String, and the two that are the same were actually numbers.

Now let's have a quick look at the contents of the data pointer. In theory this is just going to be a number, the number four. There it is. It's probably going to be the same for the other number one. And in the String it's going to be something completely different; it's probably going to be several pointers or something. Yeah: length five, capacity five, and that's probably a pointer to more data. x/16c, okay, and finally we find the contents of our string in some buffer there.

So we've come to the end of the demonstration. We started out with a simple non-generic struct, and we could send its pointers back and forth over FFI with no problems. Then we tried to make it generic, and it just didn't work
very well: the incoming pointer didn't contain enough information to decide how to print it. We ended up having to create different sets of functions so that the correct type could be inferred from context, depending on which function we were calling, which is just not that fun. So then we explored the idea of using traits to paper over the differences between different generic structs. This works great; trait objects are built into Rust to solve exactly this kind of problem. But dealing with wide pointers from C is kind of annoying. Finally we said, well, why don't we just put the wide pointers on the heap? Then we have a normal thin pointer to give to C, and this works mostly pretty great.

I have no idea how you're going to use this information, but I hope you enjoyed it. Here we go.
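The on-screen code is not part of the captions, so what follows is only a rough sketch of the opening non-generic example as the talk describes it. The function names (`myvalue_create`, `myvalue_print`, `myvalue_free`) are assumptions, and `main` stands in for the C program. A real library build would also export each function unmangled, which is left out here to keep the sketch a plain runnable program.

```rust
/// Opaque from C's point of view: there is no `#[repr(C)]`, so C code
/// only ever holds a pointer to this struct and never looks inside.
pub struct MyValue {
    x: u32,
}

// Hypothetical names. In a real cdylib each function below would also
// be marked no-mangle so its symbol is exported under this exact name.

/// Allocate a `MyValue` on the heap and give ownership to the caller.
pub extern "C" fn myvalue_create() -> *mut MyValue {
    Box::into_raw(Box::new(MyValue { x: 4 }))
}

/// Print the debug representation of the contents.
///
/// # Safety
/// `v` must come from `myvalue_create` and must not have been freed.
pub unsafe extern "C" fn myvalue_print(v: *const MyValue) {
    // Nothing stored in memory says "this is a MyValue"; we assert it here.
    let v: &MyValue = unsafe { &*v };
    println!("{:?}", v.x);
}

/// Take ownership back from the caller and drop the allocation.
///
/// # Safety
/// `v` must come from `myvalue_create` and must not be used afterwards.
pub unsafe extern "C" fn myvalue_free(v: *mut MyValue) {
    if !v.is_null() {
        drop(unsafe { Box::from_raw(v) });
    }
}

fn main() {
    // Stand-in for the C program: create, print, free.
    let mv = myvalue_create();
    unsafe {
        myvalue_print(mv);
        myvalue_free(mv);
    }
}
```

The `unsafe` reference conversion in `myvalue_print` is the point the talk dwells on: the function trusts the caller that the address really holds a `MyValue`.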
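The "six functions instead of three" step can be sketched like this. It is an illustration under assumptions rather than the code from the video: the `myvalue_u32_*` and `myvalue_string_*` names, the `describe` helper and the `"hello"` placeholder string are all invented here.

```rust
use std::fmt::Debug;

pub struct MyValue<D: Debug> {
    x: D,
}

/// Generic helper (hypothetical). Rust can call it for any `D`, but it
/// cannot be exported to C: a bare address does not say which `D` it holds.
fn describe<D: Debug>(v: &MyValue<D>) -> String {
    format!("{:?}", v.x)
}

// One concrete set of functions per type, so the function name itself
// carries the type information that the pointer lacks.

pub extern "C" fn myvalue_u32_create() -> *mut MyValue<u32> {
    Box::into_raw(Box::new(MyValue { x: 4 }))
}

/// # Safety
/// `v` must be a live pointer from `myvalue_u32_create`.
pub unsafe extern "C" fn myvalue_u32_print(v: *const MyValue<u32>) {
    println!("{}", describe(unsafe { &*v }));
}

/// # Safety
/// `v` must come from `myvalue_u32_create` and not be used afterwards.
pub unsafe extern "C" fn myvalue_u32_free(v: *mut MyValue<u32>) {
    if !v.is_null() {
        drop(unsafe { Box::from_raw(v) });
    }
}

pub extern "C" fn myvalue_string_create() -> *mut MyValue<String> {
    // Placeholder contents, chosen only for this sketch.
    Box::into_raw(Box::new(MyValue { x: String::from("hello") }))
}

/// # Safety
/// `v` must be a live pointer from `myvalue_string_create`.
pub unsafe extern "C" fn myvalue_string_print(v: *const MyValue<String>) {
    println!("{}", describe(unsafe { &*v }));
}

/// # Safety
/// `v` must come from `myvalue_string_create` and not be used afterwards.
pub unsafe extern "C" fn myvalue_string_free(v: *mut MyValue<String>) {
    if !v.is_null() {
        drop(unsafe { Box::from_raw(v) });
    }
}

fn main() {
    let n = myvalue_u32_create();
    let s = myvalue_string_create();
    unsafe {
        myvalue_u32_print(n);
        myvalue_string_print(s);
        myvalue_u32_free(n);
        myvalue_string_free(s);
    }
}
```

Passing a `MyValue<String>` pointer to `myvalue_u32_print` from C would compile on the C side if the types were cast away, and nothing at runtime would catch it, which is the weakness the trait-based approach addresses.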
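To make the wide-pointer discussion concrete, here is a small sketch of a parameter-free trait with a blanket implementation over every `MyValue<D>`, plus a size check showing that a boxed trait object is two pointers wide while a box around that box is one. The trait method is called `describe` here so it can return a string; the talk's version is a `print` method, so treat the name and signature as assumptions.

```rust
use std::fmt::Debug;
use std::mem::size_of;

pub struct MyValue<D: Debug> {
    x: D,
}

/// A trait with no type parameters (method name is hypothetical).
pub trait MyTrait {
    fn describe(&self) -> String;
}

/// Every `MyValue<D>` with a `Debug` payload implements the trait.
impl<D: Debug> MyTrait for MyValue<D> {
    fn describe(&self) -> String {
        format!("{:?}", self.x)
    }
}

fn main() {
    // Two different concrete types behind one trait object type.
    let values: Vec<Box<dyn MyTrait>> = vec![
        Box::new(MyValue { x: 4u32 }),
        Box::new(MyValue { x: String::from("hello") }),
    ];
    for v in &values {
        // Dynamic dispatch through the vtable picks the right impl.
        println!("{}", v.describe());
    }

    // Wide pointer: data pointer plus vtable pointer.
    println!("Box<dyn MyTrait>: {} bytes", size_of::<Box<dyn MyTrait>>());
    // Thin pointer: just the address of the inner box.
    println!(
        "Box<Box<dyn MyTrait>>: {} bytes",
        size_of::<Box<Box<dyn MyTrait>>>()
    );
}
```

The exact field order inside the wide pointer is not something Rust promises, which matches the talk's caution about hand-writing a matching struct in C.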
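Finally, a sketch of the "box in another box" technique. Several things are assumptions: the `mytrait_*` names, the `describe` method, and the `want_string` flag, which replaces the video's 50-50 random choice so the example needs no external crate. This sketch keeps the `dyn` keyword, which current Rust requires; the video removes it only to get a header out of cbindgen.

```rust
use std::fmt::Debug;

pub struct MyValue<D: Debug> {
    x: D,
}

pub trait MyTrait {
    fn describe(&self) -> String;
}

impl<D: Debug> MyTrait for MyValue<D> {
    fn describe(&self) -> String {
        format!("{:?}", self.x)
    }
}

/// Returns a thin pointer to a heap-allocated wide pointer.
/// `want_string` is a hypothetical stand-in for the random choice.
pub extern "C" fn mytrait_create(want_string: bool) -> *mut Box<dyn MyTrait> {
    let inner: Box<dyn MyTrait> = if want_string {
        Box::new(MyValue { x: String::from("hello") })
    } else {
        Box::new(MyValue { x: 4u32 })
    };
    // The outer box puts the (data, vtable) pair itself on the heap.
    Box::into_raw(Box::new(inner))
}

/// # Safety
/// `v` must be a live pointer from `mytrait_create`.
pub unsafe extern "C" fn mytrait_print(v: *const Box<dyn MyTrait>) {
    // One extra hop: thin pointer -> wide pointer -> data and vtable.
    let v: &Box<dyn MyTrait> = unsafe { &*v };
    println!("{}", v.describe());
}

/// # Safety
/// `v` must come from `mytrait_create` and must not be used afterwards.
pub unsafe extern "C" fn mytrait_free(v: *mut Box<dyn MyTrait>) {
    if !v.is_null() {
        // Dropping the outer box also drops the inner box and its value.
        drop(unsafe { Box::from_raw(v) });
    }
}

fn main() {
    // Mirrors the debugger session: one string value, then two numbers.
    let handles = [
        mytrait_create(true),
        mytrait_create(false),
        mytrait_create(false),
    ];
    for &h in &handles {
        unsafe { mytrait_print(h) };
    }
    for &h in &handles {
        unsafe { mytrait_free(h) };
    }
}
```

From C, each handle is an ordinary pointer to an opaque type, so the same three functions serve every `MyValue<D>` without C ever learning which one it holds.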
Info
Channel: Dodgy Coding
Views: 1,006
Keywords: programming, rust, ffi, traits, generics, coding, lang
Id: UnZVNoBC1RE
Length: 21min 20sec (1280 seconds)
Published: Sun Aug 18 2019