Live-coding a linked hash map in Rust

Captions
It surprises me every time. All right. Hi, everyone. It's time for us to write a HashMap. For those of you who have not gone through this before, I'm Jon, in case you weren't aware. I have a Patreon page and also a Twitter page, where you probably found this in the first place. I do a bunch of these live coding sessions of varying degrees of required Rust expertise. We've looked at everything from orchestration to asynchronous SSH to now a hash map, and I'm thinking of doing some other things in the standard library. If you want to hear about upcoming streams, you can either check Twitter, you can follow my Patreon, or wherever else you think you might find me. The plan for today is basically to implement a HashMap, though not quite one that's as advanced as the one in the standard library. Our goal is not, at least for this stream, to write something that could replace what you would just use from the standard library. We're going to make a very simple sort of linked HashMap. It's gonna have no fancy features, but it's going to show you how you might build a data structure in Rust and also the kind of API you might want to expose. We'll basically try to mirror what's currently in the HashMap in std::collections, including stuff like the Entry API, if we get time for it. I'll be monitoring the chat as well, so if you have questions as we go, feel free to just shout out there and I will try to monitor in the background. So yeah, let's get to it. "Saw your tweet and came running." That's great, I appreciate it. All right, let's see. So a HashMap, for those of you who are not aware of a map in the first place: a map is just a key-value store. So it stores mappings between keys and values. Any given key can only have a single value, and you can use the key to efficiently get access to a value that you've stored under that key previously. The way it does this is it takes the hash of the key and indexes that into an array.
So basically the hash takes any object. It could be a string, it could be a struct of some kind, anything that implements the Hash trait. And the Hash trait, if we look at it... well, Hash is not all that useful on its own, but Hash is basically a way to take a hasher, as you see here: you write a bunch of values into it, it can be a byte stream, it can be integers or whatever, and in the end what you get back when you call finish is just a single number. And so this is a way to turn any object into a single number. Of course, that number may have duplicates, so you might have two objects or two strings or two numbers that hash to the same value, and we still want to be able to fetch those separately. Like if I put something under the key foo and something under the key bar, if foo and bar have the same hash, I still want to be able to get back the values for them separately. Like if I do map.get("foo"), I don't want to get the value for bar back, and vice versa. And so the way you deal with that is you need to deal with collisions. And this is a lot of where the complexity, and also where the interesting alternative designs you can have for HashMaps, come into play. So a HashMap is basically just an array in memory where every entry in that array is indexed by the hash of the key. So say that I'm trying to insert something with the key foo into the map: I would hash foo, that gives me some number, I would take it modulo the length of the array, and that's the bucket, as it's called, that that thing will fall in. But now you can imagine that foo and bar fall into the same bucket. And now there are basically two strategies you can take. You can have what's known as a linked hash map, which is what we'll be building today. In a linked hash map, for every bucket you keep a list of the things that hash to that bucket.
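The hash-then-modulo scheme described above can be sketched in a few lines. This is my own illustration, not code from the stream: the name `bucket_index` and the `n_buckets` parameter are made up, and `DefaultHasher` stands in for whatever hasher a real map would use.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash any `Hash`able key down to a single u64, then reduce it to a
// bucket index with a modulo. `n_buckets` stands in for the length of
// the bucket array in the map we are about to build.
fn bucket_index<K: Hash>(key: &K, n_buckets: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher); // write the key's bytes into the hasher
    (hasher.finish() % n_buckets as u64) as usize
}

fn main() {
    // the index is always in range, and the same key always lands in
    // the same bucket; two different keys may or may not collide
    let i = bucket_index(&"foo", 8);
    assert!(i < 8);
    assert_eq!(bucket_index(&"foo", 8), bucket_index(&"foo", 8));
}
```

With only a handful of buckets and many keys, collisions are guaranteed by the pigeonhole principle, which is why the bucket needs to hold a list rather than a single pair.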
And then you just walk that list in some fashion and compare the actual keys that you store, compare each one until you find the key you're actually looking for. "So like a hotkey to a piece of info in the array?" Yes, sort of. I think these things will be clearer with code, so we'll get to that in a second. But I just want to illustrate the difference between what we're doing and what the standard library HashMap is doing. So for every bucket where there might be collisions, where two keys map to the same bucket, we will just keep a list of all the keys that map to that bucket, and we will walk that list. In the standard library and in many other HashMap schemes, what you do instead is what's known as open addressing, where instead of every entry in that original vector being a vector itself, or a list, you store any values that collide on the hash in other places, in other buckets in the map. And there are many different ways to do this. There's, for example, cuckoo hashing, which takes two hashes of every key and puts it in one bucket or the other, and this allows you to shift elements around. We might get into that in a later live coding stream where we try to make this fancier, but for this one, we're just going to do the very straightforward thing of keeping multiple keys for every bucket and just scanning them. So what does a HashMap look like? The easiest thing for us to do, and I've sort of gone through this in past live coding streams as well, is I like to start any library that I build with an example, with basically how I imagine the users of the library would write code. In this particular instance, we can use the standard library HashMap because it has an example. So we can just take it, copy-paste this example, and we want that example to work with our HashMap, right? So we are going to do... oh, actually, let me make this a little bit larger maybe. That. That's probably better for the video.
All right, so we're gonna make a new lib and it's gonna be called hashmap, very original. Two terminals here. And we are going to make an example. Inside of examples/, we're just gonna take the standard library example, and instead of using the standard library HashMap, this is going to do extern crate hashmap and then use hashmap::HashMap. In theory, we want this example to just work. We might turn this into a documentation test later, but for now I just want it in its own file. If you look at what this does: you create a new HashMap; you can insert, and there's a key and a value, which can be of different types. You can then later ask whether the map contains a given key. You can remove things by key. You can iterate over all of the things in the map, or you can try to get something and get a None if it's not there. You can also insert to replace. And all these things we want to implement; we want our HashMap to do all of those things. So how are we going to do that? Well, we'll add tests later; to start with, we are going to need a struct HashMap. That's where all of this is going to start. And the struct HashMap, it's going to be public, of course. Inside of a HashMap, the main thing we need is the list of buckets. We're going to have a buckets field, and it's going to be a Vec of something; we've got to figure out what that is. It will be a Vec of Bucket, so we're going to have a struct Bucket. What's going to be in a Bucket? Well, a bucket is itself a list of all of the key-value pairs that hash to that bucket. So this is going to be something like items, and it's going to be a Vec of tuples of key and value. Right? So this means that we need a key and value type parameter here. Similarly, we're going to need a key and value type parameter up here so that this can be generic. This is saying that the user gets to specify what the key type is and what the value type is.
In this particular instance, we don't actually need to have any bounds on key and value, because Vec doesn't require its types to have any particular trait implementation, and neither does Bucket. Of course, later we're going to need things like: the key has to be hashable, the key has to be comparable so that we can check whether the key a user gives us is in fact in the map. But for the data structure itself, we don't actually need any of this. There's a really good guide, the Rust API guidelines. This is really neat if you're writing your own library. Most of it is fairly intuitive. But... where's the predictability section? This basically specifies lots of things you should try to make sure your library does and how its API works. And there's in fact one thing that ties into this; I think it's under flexibility. So it says... uh, oh, maybe it's not in here. That's annoying. It should arguably be added. There's been some debate about this, but the basic idea is that you want trait bounds only on the places where you implement the methods that use them, rather than on the data structure itself. So we could say that K here should implement Hash and Eq, but that means you couldn't construct a HashMap without having those bounds, and everywhere that we implement something on the HashMap, we would have to specify the bounds again, which is pretty annoying. There is an RFC that's landed for implied trait bounds like this, but we're just going to ignore it for now. Ooh, two mics. Let's see. Hmm, that's interesting. What about if I do... Is that better or worse? Like if I talk now, does it still sound underwater? "Sounds underwater." I don't think there should be anything... wait, let me see. Still sounds weird. All right, let me try something. So let's do this. How about now, is that any better? All right, well, that's weird. It might be that one of the channels is weird, so what I'll do is when I upload the video, I'll make sure to only extract one channel. "Still sane." That's super weird.
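The "bounds on the impl, not the struct" idea reads roughly like this. A sketch under my own assumptions: the `contains_key` method is just a placeholder to show where a `Hash + Eq` bound actually becomes necessary.

```rust
use std::hash::Hash;

// No bounds here: the type can be named and constructed for any K, V.
pub struct HashMap<K, V> {
    buckets: Vec<Vec<(K, V)>>,
}

// The bounds live only on the impl blocks whose methods need them.
impl<K: Hash + Eq, V> HashMap<K, V> {
    pub fn contains_key(&self, key: &K) -> bool {
        // comparing keys is what forces the Eq bound
        self.buckets.iter().flatten().any(|&(ref k, _)| k == key)
    }
}

fn main() {
    let m: HashMap<&str, i32> = HashMap { buckets: vec![vec![("foo", 42)]] };
    assert!(m.contains_key(&"foo"));
    assert!(!m.contains_key(&"bar"));
}
```

This mirrors what the standard library does: `std::collections::HashMap`'s struct declaration is bound-free, and `K: Eq + Hash` appears on the impl blocks.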
That is super weird. Yeah, this all looks right. Huh. Weird. Is there a way for me to, like, move the bass? How about now? It's not terrible. All right. Well, the real question is whether you can hear me now or whether I just turned off the mic. Looks like it's... oh, that was too low. All right, fine. We'll have to deal with that and I'll try to fix it afterwards, as long as it's not too bad. Okay, let's see. So we have a HashMap. We could specify the trait bounds that we require for the types on the type itself, but usually it is nicer to specify the bounds only on the places where you need them, rather than on the data structure. So it's actually a good question whether the standard HashMap does this. So I think HashMap... yeah, so the struct itself does not have any bounds on K and V, just like ours doesn't. But basically all of the implementation requires that K is Eq and Hash. So they've put only the trait bounds that are required in the appropriate places, and that's what we're going to do too. Now, the first thing you want for any map is a way to make a new one. Right, so we're going to have a pub fn new that's going to give you Self. And when you make a new HashMap, of course, it's going to be a HashMap. The buckets: this is where we start having to make decisions. So here, we need to start out with there being some number of buckets. Imagine that we get this hash of the key that the user is giving us, and now we need to map it to a bucket. That means we need to choose how many buckets there are, because we don't want to dynamically grow this; that would mean that things shift around. So normally what we're going to do... I'm just going to leave this here because it'll be easier to talk about. If you have a key, and I guess a value (I'll get back to why that is in a second), basically, this is going to do something like hash(key) % self.buckets.len().
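Putting the last two paragraphs together: a non-allocating `new` plus the hash-modulo computation. This is a sketch in my own words, not the stream's exact code; `DefaultHasher` is the choice the stream settles on a little later, and the private `bucket` helper only shows up later too.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub struct HashMap<K, V> {
    buckets: Vec<Vec<(K, V)>>,
}

impl<K, V> HashMap<K, V> {
    // `new` allocates nothing: an empty map is free, following the
    // Rust convention of not allocating until you have to.
    pub fn new() -> Self {
        HashMap { buckets: Vec::new() }
    }
}

impl<K: Hash, V> HashMap<K, V> {
    // hash(key) % self.buckets.len(); callers must ensure buckets is
    // non-empty before asking for an index.
    fn bucket(&self, key: &K) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() % self.buckets.len() as u64) as usize
    }
}

fn main() {
    let m: HashMap<&str, i32> = HashMap::new();
    assert!(m.buckets.is_empty()); // nothing allocated yet

    let m = HashMap { buckets: vec![Vec::<(&str, i32)>::new(); 4] };
    assert!(m.bucket(&"foo") < 4); // always a valid index
}
```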
This is going to give us one of the buckets, and this is also the way that we get collisions, because this value can be the same for different keys. But we need to decide on how many buckets we want initially. Usually, the way libraries do this is just powers of two; as long as you stick to powers of two, you're usually fine. There's also generally a trend in Rust to not allocate unless you have to. So creating a new empty map should basically be free, and we're going to do the same thing. We're going to say that if you create a new HashMap, it is initially empty; it's only the first time you insert into it that we're going to allocate. So initially there are no buckets. This of course also means that we're going to need a resize method. Imagine that you have a map with only a single bucket. If you have only one bucket, then every key you have is going to map to the same bucket. So that bucket, because we're chaining through a list, is going to get really, really long. And this means that looking up any key is going to take time proportional to the number of elements, unless it's sorted, which we'll get back to later. So basically, that would be a very inefficient HashMap. Whereas if you have lots of buckets, then basically every key has its own bucket, so a lookup just takes a single operation: you don't need to scan anything, all the lists are at most of length one, but it means you're using lots of memory. So there's a trade-off here. In general, what we often do is: every time you want to insert and you feel like the map is too full, you double its size. This gives you what's known as amortized cost over time, because usually you don't have to resize, and it gets increasingly less likely that you will have to resize as you go. And this is what I think the standard library HashMap does as well.
So our resize method is going to do something like: the target size is going to depend on the number of buckets we currently have. We need a match here, because if the length is zero, then we want some initial target size, probably something like 1,024. But for our example, let's make it one. For efficiency, you should probably make it larger, because otherwise you're gonna do a lot of resizes at the small sizes. The reason I make it one is because for testing this library, we actually want to trigger enough resizes that we uncover any bugs that we might have. We might change this later. In fact, we could easily do something like: match on the number of buckets; for zero buckets, the new size is going to be one for now, and for any other n, the target size is going to be 2n. Now, exactly what a resize does, we're going to get to later. But suffice to say for now: initially, when you make buckets, they're all empty. You can imagine that we're going to have some number of buckets defined by buckets.len(), and each one of them is going to be one of these Bucket objects up here. So let's do insert first, because insert is probably the most interesting function. If you can't insert anything into the map, the map is basically useless. So we already have something that looks sort of okay here. It turns out that there are a lot of ways you can hash things. This is maybe not very surprising. There's everything from cryptographically secure hashing functions to very fast ones that generate a lot of duplicates. In our case, and we're gonna do the same thing as the standard library does, notice that it's parametric over K and V, and this S. So S here is a hasher, and it defaults to this RandomState thing, which we will probably also want to do, just for compatibility. So S is a way to build a thing that can hash things; that's the best way to describe S.
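The doubling rule described above is a two-arm match. A minimal sketch, with the deliberately tiny initial size of 1 the stream chooses for testing (a real map would start much larger, e.g. 1024):

```rust
// An empty map gets some initial bucket count; any other size n
// doubles to 2n, which is what makes resizes rare and insert cost
// amortized constant.
fn target_size(current: usize) -> usize {
    match current {
        0 => 1,
        n => 2 * n,
    }
}

fn main() {
    assert_eq!(target_size(0), 1);
    assert_eq!(target_size(1), 2);
    assert_eq!(target_size(1024), 2048);
}
```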
So if we look further down, you'll see that the requirement here is that S implements BuildHasher. And yeah, so here we want to implement this where S implements BuildHasher. And what BuildHasher does... where is BuildHasher? So BuildHasher, as the name implies, lets us build a hasher. A hasher is this thing we looked at before that lets you hash something, right? And so this is what we're going to have to do to our key: we're going to pass it through one of these hashers and take the resulting value. And a BuildHasher is just a way for the user to specify how to get a hasher. So a BuildHasher could be something that constructs a SHA hash; it could be something that constructs any other type of hash. In our case, all we really need to do here is call this build_hasher method. So this means that our HashMap is also going to have a build hasher field of type S. So this is going to be... ooh, it's a good question, what do they do here? So for new, it gives you a RandomState. What we're saying here is that if you create a new HashMap and just don't specify what S is, we're going to give you some sensible value for S. That's why here we're not actually going to be generic over S; instead, we're always going to give you a RandomState back. RandomState is just a particular type of BuildHasher that comes with the standard library. It's just very convenient to use; it means that the user will often not need to specify it. So our build hasher in this case is going to be a RandomState. Does RandomState take any arguments? RandomState::new. Great. And now, of course, the reason we do this is, as you see in the standard library too, there's a with_hasher where the user can specify their own. We're going to just not provide that for now. In fact, let's make this a lot simpler: let's just get rid of this and say that it's always RandomState. In fact, we could go even further and just say that the hasher is always a DefaultHasher.
We want to make this really straightforward. So notice that this is DefaultHasher. And a DefaultHasher is just something that comes with the standard library that uses some relatively sensible hash function, and we can just use that. That way, the user doesn't need to know about this additional thing, and it makes our code simpler. So for exposition, it makes our lives easier. In fact, then we don't even need to store it; we can just create a new one each time. It might not matter. Yeah, no, we do need to create a new one each time. The reason we need to do that is that hashers are cumulative. So the first time you write something to one, that's going to change the hash. Calling finish is going to give you back the current value of the hash, but you can also keep writing. If you look at finish here... yeah, so it does not reset the hasher's internal state. So if you continue writing, you'll keep changing the hash. If you need a fresh hash value, you need to create a new hasher. The reason why the standard library has this BuildHasher thing is because they actually randomize hashes for any given map. This is to prevent various attacks. If you're interested, you should go read the documentation for RandomState; I don't think that's something we need to cover here. This just makes our lives a lot simpler, so we're going to do it. So here in insert, we're going to need a hasher, and that hasher is going to be a DefaultHasher. Of course, DefaultHasher comes from std::collections::hash_map. All right, so we now have a hasher. Now what do we do with it? Well, that's not what I meant to do. So DefaultHasher gives us one of these, and what we're going to do with the hasher is use the Hash trait. So we're going to require that our key implements Hash; technically that's not needed for new. So here we're going to require that the key implements Hash, and that's std::hash::Hash. So where's... all right, if we search for hash here: the Hash trait.
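The "hashers are cumulative" point is easy to demonstrate. A small sketch (the helper name `fresh_hash` is mine): `finish` reports the running state without resetting it, so hashing the same key again into the same hasher yields a different number, while a brand-new hasher reproduces the original.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash a key with a brand-new hasher each time, since finish() does
// not reset a hasher's internal state.
fn fresh_hash(s: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    s.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let mut hasher = DefaultHasher::new();
    "foo".hash(&mut hasher);
    let first = hasher.finish();

    // keep writing: the running hash keeps changing
    "foo".hash(&mut hasher);
    assert_ne!(first, hasher.finish());

    // a fresh hasher reproduces the original value
    assert_eq!(first, fresh_hash("foo"));
}
```

This is exactly why the map must build a new hasher for every lookup and insert, rather than storing one and reusing it.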
So hash just takes a &self, which is going to be our key, and it takes a state, which is a Hasher. So in order to get the hash, we're going to do key.hash into the hasher. And then we're going to say the bucket is hasher.finish(), which gives us the hash for the given key, modulo the number of buckets we have, and that's going to be our index into buckets. Now, we happen to know that this value bucket is less than... actually, that's not needed; the compiler can figure it out. So this is going to give us some value that's in the range of zero to buckets.len(), which means that the bucket is going to be self.buckets[bucket]. So that's the bucket we're going to insert into. Now here, all we really need to do is take the bucket and push... conceptually, we're just going to push the key-value pair. Now, one reason why this is not okay is: imagine that someone's already inserted something with that exact same key. If we did this, there would now be two key-value pairs for the same key, which is not okay. So what we actually need to do is walk the list in the bucket and see if anyone else has already inserted the key that we're inserting. So here we're going to do something like: let i is bucket dot... let's do bucket.find. Yeah, let's do it this way. find takes a ref mut to each item, so this is going to be an existing key and this is going to be an existing value. So: if let Some... let me write this out and then I'll explain what it does: if let Some of a key ref and value, from find, where the existing key is equal to the key. We'll see when we get to implementing get. "But is that a problem for retrieving when the bucket count changes?" Yeah, so if the bucket count changes, then we actually need to recompute all of these. So a resize is actually really expensive in a HashMap. Now, there are some tricks you can pull.
For example, the standard library HashMap, instead of just doing a modulo, actually uses, let me think, the low bits of the hash. That way, when you resize, you only need to move half of the elements: if there are twice as many buckets, you use one more bit of the hash, so only the things that had a one in that bit position have to move. All the other ones stay in their current place. Whereas in our case, we will have to iterate over all the key-value pairs and recollect all the buckets, which is going to be fairly expensive. That's why you want resizes to be rare, which is why we double the size every time we resize, so that in general we will not need to do it. What this does is: we are iterating through the bucket, trying to find any element that has a key that matches the key the user gave us. In this case, find gives you a... actually, no, it will not give me what I want. It's going to be a little bit annoying, actually. How about we not use find? How about we instead do... yeah, let's make this easier to follow: for each existing key and value in the bucket. Yeah, that's easier to follow. We're going to loop through all the elements in the bucket. If the key that existed in the bucket, if any item's key, is equal to the key the user gave us, then we know that there's an existing value there, and the semantics of insert in that case are to replace the value. There are a couple of ways you can do that. The easiest is to use std::mem: in our case, we want to mem::replace the value that's currently in the map. So mem::replace takes a mutable reference to something and a T. Let's pull it up here: replace. So mem::replace takes a &mut T and a T, and it gives you back the thing that was inside of the reference, and puts the new thing in there. That's the way to think about it. It's quite literally a replace, and it gives you back the old value, which is exactly what we want here.
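The `mem::replace` semantics just described can be seen in isolation. A minimal sketch: swap a new value into a slot and get the old one back, without the slot ever being empty.

```rust
fn main() {
    let mut slot = String::from("old");
    // mem::replace puts the new value in and hands the old one back,
    // which is exactly the shape insert-with-existing-key needs
    let previous = std::mem::replace(&mut slot, String::from("new"));
    assert_eq!(previous, "old");
    assert_eq!(slot, "new");
}
```

This matters in Rust specifically because you cannot just move the old value out of a `&mut` slot and leave a hole behind; `mem::replace` does the move-out and move-in as one operation.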
We want to replace the existing value with the new one, and we're gonna return Some of the old value. This is to tell the user that a value was replaced and what the old value was. If we get through this entire for loop without returning, that means the key does not exist. So we can now just push the key-value pair that we wanted to add, and we can return None, because no value existed for that key. So that's our insert. We could also try to test this at some point. In fact, let's write a simple test, just for our own sanity, and we'll do this: use super. The insert test is gonna create a new map, and it's gonna do map.insert("foo", 42). Notice here that we are doing equality checking on the key. So the key, in addition to implementing Hash, also has to implement Eq. Eq is a trait for equality comparison. Well, there we go. We need to use the trait, so we're going to use the Hasher trait. Remember that in Rust you can only use the methods of a trait if the trait has been imported, which is what we're doing here. So the bucket here, we want it to be a usize, but finish gives you a u64. This is a little bit annoying, actually. So the problem here is you get a hash that's very large, and I think we want to do the modulo in u64 space; that way we know that it won't overflow. As the alternative, we could cast the hash to usize and then do the modulo in usize. The problem is: what if usize is u32, so you're on a 32-bit platform? If you did that, then the hash you got back might just be wrapped, and that might be fine in this case, but I think the nicer thing to do is this, because here there won't be any truncation of the hash. It will just be a straight modulo, and then we cast the result to a usize, which basically has to be possible, because it will be smaller than the length of the buckets, and the length of the buckets is a usize.
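Pulling the pieces of insert together so far. This is my own condensed sketch, not the stream's exact code: it assumes the buckets are already allocated (resizing comes later) and pins the hasher to `DefaultHasher` the way the stream does.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::mem;

pub struct HashMap<K, V> {
    buckets: Vec<Vec<(K, V)>>,
}

impl<K: Hash + Eq, V> HashMap<K, V> {
    fn bucket(&self, key: &K) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() % self.buckets.len() as u64) as usize
    }

    // Walk the bucket looking for an existing entry with this key;
    // replace its value and return the old one, or push a fresh pair
    // and return None.
    pub fn insert(&mut self, key: K, value: V) -> Option<V> {
        let bucket = self.bucket(&key);
        let bucket = &mut self.buckets[bucket];
        for &mut (ref ekey, ref mut evalue) in bucket.iter_mut() {
            if ekey == &key {
                return Some(mem::replace(evalue, value));
            }
        }
        bucket.push((key, value));
        None
    }
}

fn main() {
    let mut map = HashMap { buckets: vec![Vec::new(); 2] };
    assert_eq!(map.insert("foo", 42), None);
    assert_eq!(map.insert("foo", 43), Some(42)); // replaced; old value back
    assert_eq!(map.insert("bar", 1), None);
}
```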
What else does it complain about? It complains that Bucket is not an iterator, which is true. So remember, we have this Bucket type here that wraps a Vec. We could just make this a Vec directly; let's do that instead of having the Bucket type. I don't think we need the Bucket type for now; it might make some things nicer later. Basically, it was complaining that bucket here was of type Bucket, and Bucket does not implement Iterator, or IntoIterator, so you can't actually iterate over it. Whereas we know that a bucket is a Vec. And now that we've changed the type in HashMap, this type is a Vec, which is iterable. And now it's saying "one of seven possible tokens". Oh, right, only traits may use parentheses. It's true. All right. So now it's saying it can't compare &K with K. Yeah, so ekey here is a reference to a key, whereas key is an owned key, so to compare them we either take a reference to key or dereference ekey. I think in this case we might as well do that; they're basically the same. It's also saying "expected mut, found v". Oh, this needs a ref mut. What we're doing here is destructuring the iterator items. When you iterate over, in this case, a mutable reference to a Vec, the items you get back are mutable references to each element of the Vec. So this is why we have this ref mut: we're destructuring that. If we just did this, then x would be of type mutable reference to a tuple. And so we're destructuring that into... I want to get rid of the mutable reference, which is the same as saying dereference it, and then I want to deconstruct the tuple into the key, which I just want a read-only reference to, and the value, which I want a mutable reference to. Like so. "Value moved here." False. I shouldn't need to do that specifically. Maybe I need to do this, but that seems excessive. Hmm, weird. I don't actually know why it won't let me just do this, but why not? All right, so it compiles.
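The `&mut (ref key, ref mut value)` destructuring pattern described above, shown standalone. A small sketch with made-up data: the pattern peels the `&mut` off each item and splits the tuple into a shared borrow of the key and a mutable borrow of the value.

```rust
fn main() {
    let mut pairs = vec![("foo", 1), ("bar", 2)];
    // iter_mut() yields &mut (&str, i32); the pattern dereferences it
    // and binds key by shared reference, value by mutable reference
    for &mut (ref key, ref mut value) in pairs.iter_mut() {
        if *key == "bar" {
            *value += 10; // mutate in place through the &mut borrow
        }
    }
    assert_eq!(pairs, vec![("foo", 1), ("bar", 12)]);
}
```

In modern Rust, match ergonomics let you write `for (key, value) in pairs.iter_mut()` and get the same bindings, but the explicit form shows what is actually happening.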
It probably won't currently work, because remember how we left resize to do? Initially all our buckets are just empty; in fact, there are no buckets, so we can't actually push anything into any bucket. It's probably time we do resize. The way we're going to do this is: in insert, if we detect that the map is more than so-and-so full, then we're going to do a resize. So that means we're going to have to track items, just how many things there are in the map. Because remember, we can't get that easily from the bucket count, since there can be many things in any given bucket. So we actually need to count this explicitly, and it starts out at zero. And we're going to say: if self.buckets.is_empty(), so if we haven't allocated any buckets yet (remember that new doesn't allocate), or if the number of items is more than, say, three quarters of the number of buckets (that's a good number here), then we are going to do self.resize(). And what this is going to do is: as the number of items in the map grows, initially we'll just fill the buckets, and then at some point we'll say, okay, enough, and then we'll double the number of buckets, move all the things around, and then we'll continue like that and expand inside the buckets that we have. Now, this means that resize will be called immediately the first time you do an insert, because we have no buckets. The target size is being computed here, and now we have to do something. First of all, we're going to have to set up the new buckets. new_buckets is going to be a Vec; it's going to initially just hold empty Vecs, so every bucket is going to be empty, and we're going to want target_size of them. Like so. "Consider giving new buckets a type." It shouldn't need that; it's gonna complain at me later, but let's just start off with this because it's clearer. And then, and this is where the expensive part comes in.
We're going to do... well, there are a couple of ways we can do this. I think what we're actually going to do is: for each key and value drained out of self.buckets. So we're going to get rid of all of the things that are in the current Vecs, and we're going to put them into new_buckets. In order to put them into new_buckets, we basically need to do the same thing as we did originally: we have to figure out what bucket they fall into. So here, for every item, we create a new hasher, we hash the key, then we find the bucket in the new number of buckets that we have, and then we do new_buckets[bucket].push and push the key and value in. So we're iterating over all of the old key-value pairs, all the key-value pairs that are currently in the map, and we're putting them into their new buckets, and because there are no duplicates here, we don't need to do the search. And then, now that we're done (let's use std::mem at the top), we simply replace self.buckets with new_buckets. So we're going to take the new buckets, which now hold all the key-value pairs because we've drained them from the old ones, and we're going to swap that in place for the existing self.buckets. And now we have twice as many buckets, and all of the old values have been carried over. And so now, in theory, this should work. Let's see; it probably doesn't compile. Right, so here's something a little bit annoying. This vec! syntax for creating vectors requires that the type you give here is Clone. Now, we happen to know that an empty Vec is trivially cloneable, but all Rust knows is that this is a Vec of (K, V). K and V are not Clone, and so that means that a Vec<(K, V)> is also not Clone. And as you see, that's what it's complaining about here too: it's saying that K and V are not Clone, and therefore I cannot use this syntax to create this many of this type, because the type is not Clone.
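The resize just described, as a self-contained sketch in my own words: build `target_size` fresh buckets, drain every pair out of the old ones, rehash against the new length, and swap the new buckets in. It already uses the `with_capacity`/`extend` construction that the Clone discussion below motivates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::mem;

pub struct HashMap<K, V> {
    buckets: Vec<Vec<(K, V)>>,
}

impl<K: Hash, V> HashMap<K, V> {
    fn resize(&mut self) {
        let target_size = match self.buckets.len() {
            0 => 1,
            n => 2 * n, // double every time, so resizes stay rare
        };

        // allocate all the new (empty) buckets up front
        let mut new_buckets = Vec::with_capacity(target_size);
        new_buckets.extend((0..target_size).map(|_| Vec::new()));

        // drain every (key, value) pair and rehash it into its new
        // bucket; this full pass is what makes resize expensive
        for (key, value) in self.buckets.iter_mut().flat_map(|b| b.drain(..)) {
            let mut hasher = DefaultHasher::new();
            key.hash(&mut hasher);
            let bucket = (hasher.finish() % new_buckets.len() as u64) as usize;
            new_buckets[bucket].push((key, value));
        }

        // swap the populated new buckets in for the old, now-empty ones
        let _ = mem::replace(&mut self.buckets, new_buckets);
    }
}

fn main() {
    let mut map = HashMap { buckets: vec![vec![("foo", 1), ("bar", 2)]] };
    map.resize();
    assert_eq!(map.buckets.len(), 2);
    // both pairs survived the move into the new buckets
    assert_eq!(map.buckets.iter().map(|b| b.len()).sum::<usize>(), 2);
}
```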
Even though we know that in the specific case of Vec::new, it actually is cloneable. "Playing with Rust, that has been super annoying." Which part, that it doesn't know that it's Clone? Well, I mean, it's actually really complicated for it to know that this thing contains no data. So I'm not really blaming Rust here; we just don't have enough machinery in the type system to say that this is okay. So the way we have to work around this is we create it with with_capacity. When you create a Vec with with_capacity, it allocates that much space immediately. And then we do new_buckets.extend over zero to target_size, mapped; we don't care about the loop value, but we create a new Vec for each. So what this is doing is just creating target_size new empty Vecs. Creating each empty Vec doesn't allocate, so this should be fairly efficient, but it does have to walk all the buckets and push each one to the end. None of this should need to allocate, because we did the allocation up here. So this is just a bunch of CPU operations that are basically unnecessary, but we need to have them there in order to make the compiler happy. Let's see if it's happy now. "Expected struct Vec, found tuple." Oh yeah, this gives us buckets. So really what we want is flat... So this gives us a bucket, and we want bucket.drain; we want to drain every bucket. "Why not let mut...?" So your proposal is to do... I don't think that will work either; I could be wrong. Yeah, this also requires Clone; it's basically the same, right? This is the same as that. It still has the requirement, because you're telling it that you want this many of this type, but this type is not Clone, and therefore it isn't able to. "What does dot dot mean?" Yeah, so when you do a drain on a vector, you give it a range, and you say: I want to drain these elements and only these elements.
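The Clone workaround stands alone nicely. A sketch with a made-up non-Clone type (`Opaque` is mine, standing in for arbitrary K and V): the `vec![elem; n]` macro clones its element, so it cannot build `Vec<Vec<(K, V)>>` unless K and V are Clone, but `with_capacity` plus `extend` sidesteps the bound entirely.

```rust
struct Opaque; // deliberately not Clone, like generic K and V

fn make_buckets(target_size: usize) -> Vec<Vec<(Opaque, Opaque)>> {
    // `vec![Vec::new(); target_size]` would not compile here: the
    // macro clones its element, and Vec<(Opaque, Opaque)> is only
    // Clone when its element type is. So allocate once up front and
    // push a fresh empty Vec per bucket instead.
    let mut buckets = Vec::with_capacity(target_size);
    buckets.extend((0..target_size).map(|_| Vec::new()));
    buckets
}

fn main() {
    let buckets = make_buckets(8);
    assert_eq!(buckets.len(), 8);
    assert!(buckets.iter().all(|b| b.is_empty()));
}
```

Each `Vec::new()` here is just a pointer-and-lengths struct with no heap allocation, so the only allocation is the single `with_capacity` call.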
So for example, here I could say like 0 to 3, and that would drain only the first three elements. In this case, I want to drain all of them. So the reason we have to use this double drain here is because we're iterating over all of the buckets. Actually, this could just be... We're walking all of the buckets, and for each bucket, we want to drain all of its elements. We want to take all the key-value pairs. And that's basically what this does. Let's see. Oh, right, parameter. Great. So we create new buckets, we remove all the key-value pairs from the old buckets, put them in the new one, and then finally swap the two buckets. And now if we run cargo test, it's going to work. The map doesn't need to be mutable, that is true. Expected item. What? Oh, it's trying to run the examples. Great. Okay, so our insert works. Well, we don't actually know whether our insert works, but it doesn't crash our program. So that's a start. In order to test whether our insert works, we're also going to have to see whether get works. Get takes a &self and a key. We'll get back to what this type is. It gives you an Option<&V>. This is going to say we're going to do a lookup into self, and if the key exists, then we're going to give you back a reference to the value, otherwise you get None. And the implementation of get is actually really straightforward. Actually, let's do this. Let's have a convenience method for finding the bucket a key belongs to. Sure, let's do this for now. Okay, this is not actually what this type is going to be, but let's use that for now because it's basically right. This is going to give you a usize. I just wanted to encapsulate this code into its own little snippet. Great. So now this can just do bucket is self.bucket. Actually, self.bucket of the key.
Yeah, so for get, we don't want the user to have to give us a K, because imagine the K is like a long String or something; we don't want the user to have to own that string. We don't want them, if they only have a reference to that string, to have to clone it or something. So we only need a reference to it. The reason I'm a little bit hesitant is because imagine you have a map where the keys are Strings, as in capital-S String; you want to be able to index into it with a &str, like a constant string. And this API won't currently let you do that. There's a way to get around that, though, that we'll look at later. `let bucket` two times, won't that give you an already-defined error? No, it actually won't. So Rust does let you shadow names, which is actually really convenient. I like being able to do this. In this particular instance it's not terribly important. But it's sort of nice to be able to remove an old name from consideration, to overwrite it with: from this point on, you should only deal with this type, or deal with this value, and you should ignore the old one. The reason this is kind of nice is because the type system mostly saves you from screwing up, right? So the types of these are very different. So if I meant to use this, but I got this, it is unlikely that this type is just going to work where I was expecting that one, right? It just creates a new variable. The old variable is being shadowed by the new one, so any time you use this name, you're now referring to what the new name points to, rather than what the old name points to. All right, get. Get is very straightforward. We're going to get a bucket by calling self.bucket with the key. Then we are going to essentially search the bucket that the key hashes to. Here we can use find. Find is given a reference to the things we're iterating over. I think this will actually be this.
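The shadowing behavior just described can be seen in a tiny standalone example. The function name and numbers here are made up for illustration:

```rust
// Shadowing as described above: the second `let bucket` rebinds the
// name, often to a different type, and the old binding is no longer
// reachable from this point on.
fn bucket_index(hash: u64, n_buckets: usize) -> usize {
    let bucket = hash % n_buckets as u64; // bucket: u64
    let bucket = bucket as usize;         // shadows it; bucket: usize
    bucket
}
```

Because the two bindings have different types, accidentally using the "wrong" one would almost certainly be a type error, which is why shadowing like this is usually safe in practice.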
It's just given a reference to the things that you... No, actually, no. Find is just given the items. Okay. So, iter gives us references to the contents of the bucket. So, a reference to every item in the bucket. And every item, of course, we know is an existing key and an existing value. We don't care about the value for find. All we care about is, we want to find the thing where the eKey is equal to the key. And what find does is, if it doesn't find anything where the condition is true, then it returns None. Otherwise, it returns Some and this thing, right? But because we want to return just the value and not also the key, because the user already has the key, we can map this. And we ignore the key and we give back just the value. Yeah, so I mean, you mentioned underscore.js. This is sort of the functional programming paradigm, sort of, right? And Rust lets you mix these, so you can choose the one that fits the situation the best. So for insert here, technically we could express this for loop in a purely functional way. I shouldn't use the word pure, but you know what I mean. But in this case, it's just easier to deal with this as a for loop. We might later want to... this for loop is pretty expensive, and same as this find, so we might want to keep the buckets in sorted order, for example, so we could do a binary search, but that's something we can leave for later. Okay, so we have a get. So in theory we should now be able to do this, and then we should assert_eq that map.get is Some(42). Unexpected end of macro invocation. Yes, indeed. So, yeah. So this takes a reference to the key, not the key type itself. So the key type here is &str. And so we need to give a ref to the key type, which is a &&str. Which is a little sad. Hey, it passed! Okay, so we are now able to insert into the map, and then do a get, and we get the thing back. Can you explain ref eKey? Yeah, yeah. Okay, so you want these two lines.
All right, sorry, these two. Okay, so let's look up Iterator, because that's going to help. So iterator, in this case, right, we're getting an iterator over the contents of a given bucket. So remember that every bucket is one of these, right? So every bucket is a vector over tuples of key and value. So when we call iter on a given bucket, what that's going to give us back is an iterator where the item is a &(K, V). So that's what .iter gives us back. On that iterator, we have chosen to call find. So find takes a predicate closure. So the predicate is just a closure that is given a reference to the item and returns a boolean: whether it matches or whether it does not. In this case, it is given, I'm gonna ignore this for a second, it's given an x, where the x is of type reference to this, and then it should return true or false. In every case, the only way we can tell true or false is by comparing the keys. So we need to extract the key from x, right? Because it's a tuple of K and V. So one thing we could do here is we could do x.0. This would do the same thing. However, we don't really want to do that. It's hard to read, right? We would need to remember that the key is the first element, and it's a little awkward what's going on. So the way we can do this is we can destructure. So in all Rust functions, you can destructure arguments directly. So think of this as sort of like doing a match. It's doing a match on x and then using this pattern, right? And it's basically saying that I know that this type matches this pattern. In this case, there would be no other thing that it can be. So what we're doing here is we're telling find that you're going to get items that look like this. And what I want you to do is I want you to bind eKey to be a reference to the K, and I want you to not bind anything to the V. And now we can refer to eKey directly. And we're doing the same thing here in the map.
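The find-then-map shape of get described above can be sketched over a single bucket. This is a simplified, hypothetical helper (the real get first picks the bucket by hash); the function name is made up:

```rust
// Sketch of get over one bucket, assuming buckets of Vec<(K, V)>:
// find the pair whose key matches, then map away the key so only a
// reference to the value is returned.
fn get_from_bucket<'a, K: PartialEq, V>(bucket: &'a [(K, V)], key: &K) -> Option<&'a V> {
    bucket
        .iter()
        .find(|(ekey, _)| ekey == key) // None if no key matches
        .map(|(_, v)| v)               // keep just the value
}
```

Note how the closures destructure their tuple arguments, exactly as discussed, instead of using `.0` and `.1`.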
Does that roughly make sense? Let's see. Rust feels complicated. Yeah, so remember, I could write this fully imperatively, right? So I could do the following: for (eKey, eVal) in the bucket in self.buckets. So these two segments are basically identical, right? So this and this are the same. In fact, if you wanted to make it even more explicit, if you don't want to use the pattern matching, you can take x, compare x.0 to the key, and then take x.1. So with no functional stuff at all and not using pattern matching, you would get this code. This code down here is the same code. It ends up doing the same thing. I happen to find this easier to read, but you can do either. And that's one of the things that's nice about Rust: this pattern matching is really, really powerful, in part because you can do it in so many places. Like, I can do it here. That's just really, really neat. But I find this easier to work with. And in fact, we can do this. Great. Probably unnecessary. Doesn't really matter. Right. So we now have a map where we can do an insert and we can do a get. Is that pattern matching or destructuring? It's destructuring, but destructuring uses pattern matching. You should think of destructuring as pattern matching where there's only one possible pattern. Usually, if you pattern match, it's because there are multiple... Like, if you pattern match over an enum, for example, there are multiple different ways in which you could match. Whereas if you're doing destructuring, you're saying that it's a specific type that I want to destructure in this way, and the compiler will check that that is the only kind of pattern you can even receive. All right, so we're missing remove, so let's add remove in here. Remove takes a &mut self and a reference to a key, and also gives you back an Option, because it might not remove anything, and it's an Option<V>. Now, this is very similar to get, like so.
It is a little bit different, though, in that it should remove the item if it's there. So here, there are a couple of ways we can do this. Let's see how I want to... There are many ways we can write remove. I'm going to do it the following way. Is that better? Yeah, great. We're going to use retain. Retain is also a really useful method on vectors. Let's go up here. On a vector, you have the retain method. So retain will walk the vector and for every item call the closure you give it. And if the closure returns true, then the element is left in the vector. If it returns false, then the element is removed. And the reason you want to do this is because it turns out that if you use retain, you can be slightly more efficient about how you choose to remove things if you remove multiple things. Actually, in this case, we know we're always removing at most once, so let's do this in an even better way. We need to find the index of the key. So we're going to do .position on the bucket's iterator. Position is sort of like find, except that it returns the index rather than the value. So find here returned the value that we found; position returns the index of that value in the iterator. So here we're going to check whether eKey is equal to key. And we can use the question mark here now, right? So the question mark operator is really neat if you haven't used it before. It was initially intended for Result. So if you have something that returns a Result, and then you call a function that returns a Result, then question mark is equivalent to the following. If you have something, question mark, it is the same as match something: if it's Ok(k), give back the k; if it's Err(e), return Err(e). Well, let's ignore the into. It basically does this. So if something is Ok, then it unwraps the value. Otherwise, it returns from the current function with the error. Now, it turns out you can also do this for Option. So it looks like this.
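The desugaring of `?` on Option described above can be written out side by side. This is a toy example with made-up function names, not code from the map itself:

```rust
// The manual desugaring of `?` on Option: unwrap a Some, or return
// None from the enclosing function immediately.
fn first_even_times_ten(xs: &[i32]) -> Option<i32> {
    let i = match xs.iter().position(|x| x % 2 == 0) {
        Some(i) => i,
        None => return None,
    };
    Some(xs[i] * 10)
}

// Exactly equivalent, written with `?`:
fn first_even_times_ten_q(xs: &[i32]) -> Option<i32> {
    let i = xs.iter().position(|x| x % 2 == 0)?;
    Some(xs[i] * 10)
}
```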
And that's what we're doing here. We're saying that if position returns None, then just return None immediately. Otherwise, unwrap the value. And so we can use that very nicely right here. And now what we can do is, because, so if we actually get an i back here, that means we didn't return from the question mark. So the key does in fact exist, and then we can just index into self.buckets[bucket]. So we have a reference to that. Yeah, question mark on an Option? I think it's unstable. We're, I guess, about to find out. And then we can use the swap_remove function. So swap_remove replaces the element you are removing with the last element in the vector. And the reason you want to do that is otherwise you have to shift all the things in between. So if you have a vector that's like A, B, C, and you now remove B, then in order for the vector to not have holes, you would need to shift C to the left, like this. And of course, if this vector is long, and you now remove B, you have to shift C, D, E, and F to the left. What swap_remove does is it swaps B with F, and then it removes B. And now you just truncate the end. And so it's a lot more efficient. It doesn't have to do nearly as many operations, but it does change the order of things in the array. Currently that's okay because our buckets are not ordered. Yep, that's exactly right. So the question mark will return from the function if it gets a None, and if it gets a Some it will unwrap it. So i here is of type usize, whereas position returns an Option<usize>. swap_remove, then .1, because we just want the value. All right, so now we can test this further. We can assert that remove gives us back 42, and we can assert that get again now gives us None. Hey, we have a working hash map with remove. Oh, Rust's Vec has a lot. So the Rust Vec is a very well-documented primitive, in part because they know it'll be used at a really low level.
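The position, `?`, and swap_remove pieces above fit together like this over a single bucket. This is a simplified, hypothetical helper (the real remove first picks the bucket by hash, and also decrements the item count):

```rust
// Sketch of remove over one bucket, assuming Vec<(K, V)>:
// position finds the index, `?` bails out with None if the key is
// absent, and swap_remove is O(1) because it swaps in the last element
// instead of shifting everything after the hole to the left.
fn remove_from_bucket<K: PartialEq, V>(bucket: &mut Vec<(K, V)>, key: &K) -> Option<V> {
    let i = bucket.iter().position(|(k, _)| k == key)?;
    Some(bucket.swap_remove(i).1) // .1: we only want the value back
}
```

swap_remove reorders the bucket, which is fine here precisely because the buckets are unordered.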
So if you see this guarantees section on a vector, it talks about exactly the guarantees that the implementation gives you, so that you can rely a lot on the underlying implementation details. Because you often need this for building more advanced data structures. Okay, so now we have a map. We will want a couple of other things. In particular, we will want things like len, which gives you the number of items. Oh, remove should do self.items -= 1 here. That's good. And we want is_empty, which takes a &self, and is nicely just self.items == 0. And now we can add that to here. So here we want to assert that map.len() is zero, and assert that it is empty. After we insert, of course, it should have one item, it should not be empty. After remove, it should be empty again. Great. We can also run Clippy on this, in theory. Let's cargo install --force clippy on nightly. Run that in the background. So Clippy is a really nice tool for just checking that your code is okay. Let's look at the example now. So the example now does insert, does contains_key. Okay, so contains_key is basically a get, except it returns a bool. So contains_key is just going to be self.get(key).is_some(). Okay, what else does it need? Oh, right, this needs to be a pub fn. Let's see what our example does. Let's see. Okay, so there are a couple of things that the example from the standard library needs from us that our HashMap does not currently provide. Let's deal with them in reverse order. So the first one is, you want to be able to iterate over all the elements in the map. So that's what they do here. They iterate over all the things in the map. Currently, we have no way of doing that, and so we will fix this. We basically want to implement IntoIterator for... oh, sorry. And we want that for any K and V, an into_iter from self into... something, which I haven't quite decided yet. You work for Facepunch? No, I do not work for Facepunch. I also don't know what Facepunch is.
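The small bookkeeping methods just added might look roughly like this. The struct layout and an exhaustive bucket scan in contains_key are simplifications for illustration (the real contains_key only needs to check the one bucket the key hashes to):

```rust
// Sketch of len, is_empty, and contains_key on a map that tracks an
// `items` counter (field and type names are illustrative).
struct Map<K, V> {
    buckets: Vec<Vec<(K, V)>>,
    items: usize,
}

impl<K: PartialEq, V> Map<K, V> {
    pub fn len(&self) -> usize {
        self.items
    }

    pub fn is_empty(&self) -> bool {
        self.items == 0
    }

    // contains_key is essentially get, collapsed to a bool. Simplified
    // here to scan all buckets instead of hashing to the right one.
    pub fn contains_key(&self, key: &K) -> bool {
        self.buckets
            .iter()
            .any(|bucket| bucket.iter().any(|(k, _)| k == key))
    }
}
```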
Oh, the standard library just does self.get(key).is_some()? I feel like this is more efficient, but I might be wrong. I feel like this might optimize better, but unclear. I feel like it could... Yeah, I mean, that's what we can do. We don't really care. Sure. It's even easier. Okay, so what we really want is some way for the user to iterate over all the values in our map. There are many ways you could do that. So you could imagine something like drain, where you remove all the key-value pairs and yield them one at a time. So that's what drain does. And that consumes self. It consumes the HashMap, because you're giving away all the keys and values. The other thing you might want to do is iterate over all the things in the map without getting rid of them. You just want to walk them. And the way to do this, like if you want the user to be able to write code like this, for-in something: for loops in general require the thing after in to implement IntoIterator, which is a trait that just says that this thing can be turned into an iterator of some type. And so if you write it this way, taking the map by value, it usually means drain: you would get back owned copies of the books and the reviews. If you do this instead, you're saying, I want a reference to this thing to implement IntoIterator, which usually implies that the things you get back here are actually references to the key and references to the value. So these would be the types. In this case, if you remove this, then this would usually be like that. In our case, because the example just uses a reference, we will just implement IntoIterator for a reference to our map, which should make this example work. So this has two associated types. It has Item, which in our case is going to be a reference to the key and a reference to the value. And it has an IntoIter, I think it's IntoIter, which is the type of the iterator. So remember that the...
If you call .into_iter on this, you need to get something back. It has to be some kind of type. And so IntoIter is what that type is. We might want to, actually, I don't know if we can use impl Iterator here. Let's see if this works. That would be really nice, but I don't know if it's true. Yeah, so the IntoIterator trait only has a single method, and that is into_iter, and it needs to return a Self::IntoIter. Yeah, it's not stable yet. That's too bad. Okay, so we're going to return like an Iter. So we're going to need some pub struct Iter over K and V. Don't know what's in it yet. And that's what we're going to return from into_iter. So that's just going to be Iter::new. So this Iter::new is going to take a reference to a HashMap over K and V. Let me do it by itself. And then we're going to have to implement Iterator for that type, this type. The type of the Item is going to be the same as here. We're going to give back references to the key and references to the value. And the requirement for Iterator is that you implement next, which returns an Option<Self::Item>. That is, it returns Some of an item if there are more things to iterate over, and None if there are no more elements. Ooh, more or less, so much chat. Let's see... Oh, yeah, no, this is not Rust the game, but Rust the programming language. And I do not work for the makers of that game. Yeah, Twitch is geared a lot towards gaming in general, so finding programming languages that are called the same as games is not super easy. Let's see, why are lifetimes needed here? Yeah, so, okay. I just wrote a lot of code, so let's walk through what I wrote. The reason I wrote all of this in one step without going through it in too much detail is it's useful to have the code so that we can talk about it as a whole. So remember, the general flow here is that the user has a reference to the map, and they want to iterate over it. That's what the IntoIterator trait is for.
It is going to call the into_iter method on the... So it's going to call this method on this thing that the user has, and that's going to give it back some type. And that type is going to be this type, and for that type we're sort of promising, an iterator, that this type implements Iterator where the Item is this. The reason the lifetimes are needed is because the key and value references that we return in the iterator are tied to the lifetime of the map. If we did not do this, so if we just had this, first of all, it wouldn't compile. But second of all, even if this did compile, the reason it would be a problem is because imagine that the user does, like, iter is book_reviews.iter().next(). So this gives them the first item of the iterator. And then they drop book_reviews. But now they still have iter. Because iter was just a reference, but it didn't have a lifetime that was associated with the hash map in any way. So without those lifetimes, this iter would continue to live. It would be pointing to stuff that has already been dropped. And so that's why that's not okay. That's why we're saying here that the items you get back are tied to the lifetime of the reference to the map. So if the map goes away, the items are no longer valid. And similarly, the items cannot outlive the map. So the map needs to continue living for as long as you want this iterator to work. The audio issue mentioned earlier in the stream just suddenly fixed itself. Well, that's good. Oh, that's so weird. It is really humid here, so maybe the microphone got sad. I don't know. It's weird. Yeah, I do have a GitHub. My GitHub is this one, if you're curious about things. That's me with less beard. Let's see. So yeah, does that roughly make sense why we have the lifetimes there? That's true. We could cut the new. That's fine. That's all right. Well, I sort of like the new, but okay. Thank you. Okay, so the iterator is going to need to have a reference to the hash map.
And it's going to have to keep track of what bucket it's currently at, and where inside of that bucket it is, right? So remember, each bucket might contain multiple key-value pairs. And so next is going to, good question, next is going to do the following. The item it's going to yield is going to be self.map.buckets at self.bucket, at self.at, right? So that's basically what it's going to do. What we'll do is we'll use get and then map, and I know this is very functional style. What we're saying is... No, I want this slightly differently. The trick here is, if we reach the end of a bucket, we need to move to the next bucket. If we reach the end of the map, then we need to return None and continue to return None. Let's match on self.map.buckets.get(self.bucket). If this gives Some, then the question is now whether the bucket contains self.at. If that's None, that means we've spilled over the end of the bucket. If that's Some, then we can yield that element. And this, of course, is a key-value pair. So we return it, except we'll have to move along self.at as well. Do you plan what you're writing or are you thinking of it right now? I'm thinking of it as we go. That's partially why the code moves around a lot. But I have implemented hash maps before, so that probably helps. Okay, let's see. Let's deal with the most straightforward case first. This is going to be bucket zero, at zero. If we get an element, then what we really want to do is just return Some of it. However, we also need to bump self.at, right? So we're going to say that we're going to return the current element, but the next time you call next, we want to get the next one. So we're going to increment at. Now, we don't need to do anything about bucket. We don't need to check whether at is in bounds. Because if, the next time we call next, at is out of range for that bucket, then this match is going to return None.
So in this case, here, this indicates that we reached the end of the bucket, and so we will do self.bucket += 1 and self.at = 0. So this is saying we reached the end of one bucket; now start from the beginning of the next one. If, on the other hand, the bucket is out of range, so we have reached a bucket that is not in the list of buckets, then there's just no way that there are more items. And so in this case, we can return None. Now notice that here we sort of need to continue, right? So one thing we could do here is return self.next(). While this would be nice, that's not actually okay, because Rust doesn't have guaranteed tail call elimination. So this might be like self.next calling self.next calling self.next. Imagine you have a very sparse map, where all the buckets are empty except the very last one. You're going to keep calling self.next inside of itself, so your stack is going to keep growing, and then you're just going to run out of memory, because you're walking all the buckets but adding a stack frame each time. With tail call elimination, this would go away. The way we're going to get around it here is we're going to flatten that by making this a loop, and then say that in this case we break with None, in this case we break with Some, and in this case we continue. Loops can now return values, which is really neat. A lot of people don't know this, I think: you can break with a value, and that will be the value yielded by the loop. I think storing outer and inner iterators... Yeah, so Rust doesn't, well, it sometimes does tail calls, but it does not guarantee that it does them. There's been some discussion, I don't know what the current state of it is, of introducing a new kind of return keyword, where instead of return self.next here, you could do become self.next, which would be a compile error if it could not be tail recursive. But for now, we'll just flatten it with a loop. It's fine.
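Putting the pieces together, the borrowed iterator described over the last few paragraphs can be sketched as follows. This is an illustrative reconstruction under assumptions: the `Map` struct with `Vec<Vec<(K, V)>>` buckets stands in for the stream's map, and the field names `bucket` and `at` mirror the discussion:

```rust
// Minimal stand-in for the map being built on stream.
struct Map<K, V> {
    buckets: Vec<Vec<(K, V)>>,
}

// The borrowed iterator: a reference to the map, plus which bucket
// we're in and where inside that bucket we are.
pub struct Iter<'a, K: 'a, V: 'a> {
    map: &'a Map<K, V>,
    bucket: usize,
    at: usize,
}

impl<'a, K, V> Iterator for Iter<'a, K, V> {
    type Item = (&'a K, &'a V);

    fn next(&mut self) -> Option<Self::Item> {
        // Flattened into a loop instead of tail recursion, since Rust
        // does not guarantee tail-call elimination.
        loop {
            match self.map.buckets.get(self.bucket) {
                Some(bucket) => match bucket.get(self.at) {
                    Some((k, v)) => {
                        // Yield this pair, and advance for next time.
                        self.at += 1;
                        break Some((k, v));
                    }
                    // End of this bucket: start at the next one.
                    None => {
                        self.bucket += 1;
                        self.at = 0;
                    }
                },
                // Ran out of buckets entirely.
                None => break None,
            }
        }
    }
}

impl<'a, K, V> IntoIterator for &'a Map<K, V> {
    type Item = (&'a K, &'a V);
    type IntoIter = Iter<'a, K, V>;

    fn into_iter(self) -> Self::IntoIter {
        Iter { map: self, bucket: 0, at: 0 }
    }
}
```

The `'a` lifetime ties the yielded `(&'a K, &'a V)` pairs to the borrow of the map, which is exactly why the iterator cannot outlive the map it points into.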
Yeah, you could probably, instead of storing these usizes, store direct iterators into the buckets. But this code is fairly straightforward anyway. But so this roughly makes sense. So the way we're going to iterate is: we're going to check whether the bucket and the index into the bucket that we're currently pointing at are valid. And if they are, then we increment the index into the bucket, and then we yield that value. Otherwise, we move to the next bucket and try again. Yeah, there's... I should find this detail. This one, this one. Yeah, there's been a lot of discussion on this. Proper tail calls. Sounds nice. Wait, why does this link not work? Oh, there we go. Apparently, that's been closed. Probably just postponed. Yeah, okay, so it's just been postponed for who knows how long. But yeah, there's been discussion of it. Nothing has been settled quite yet. As Ganko pointed out as well, it gets a little bit tricky in Rust because of destructors. Like, when do you call them if you're recursing? It's a little tricky. Even more proper tail calls, that's true. Okay, so we have, in theory, an iterator. It's complaining that K might not live long enough. This is another thing that's going to go away eventually with implied trait bounds. Basically, we're saying that not only is the pointer to the map going to live long enough, but the values in the map also live long enough for the iterator to work. Ooh. Okay. Yeah, because our iterator yields both the key and the value. And so therefore, we don't want to yield just the value. The reason we destructure nonetheless is because otherwise, the item would be this, which is a little bit more annoying for the user to work with. Like if we just did this. Well, I guess that. Then this would be the item type. And this item type is a little bit more annoying for the user to deal with, because they have to destructure and use refs.
So if we instead do this, then we get the nicer yielded item type of two references. All right. And this complains now because it needs the map, which is true. Ooh, so one optimization we could add here, I don't think I care enough, is we could count how many things we've yielded. And if that matches the number of things we know are in the map, we could stop yielding more items. This means you don't end up iterating over the trailing set of buckets, but assuming the hash function is uniform, this should not matter all that much. Yeah, that's a good point. So Ganker just pointed out that if your return type is this, then you require that internally you store tuples of (K, V), because otherwise you couldn't produce references to them. Whereas this is just saying, I have a reference to a K and I have a reference to a V, and you're not guaranteeing that they're stored together in any way. That's a good point. You want the count anyway to provide size_hint. That's true. I'm probably not going to implement size_hint for now, but maybe later. We might get to it. Let's see. Okay, so that works. Well, I guess we don't actually know whether it works. But let's write a simple test for it. Iter. So we're going to insert some values. Foo, bar, baz. What's the last one? Quux? I feel like that's the one. For (k, v) in &map, match k, match *k. If it's foo, then we assert that v is, actually, let's just destructure there, 42. Technically, this test is not correct, because it could be that you never get any of these values, or that you get some of them multiple times, which it technically should check. But what we're going to do instead is just... well, into_iter().count(). So this is a somewhat weak test, but it will basically get the job done. Test just the lib. Okay, so it works.
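The kind of test being written here can be expressed against std's HashMap, which has the same iteration API we're mimicking. The specific values are made up (only foo's 42 appears on stream):

```rust
use std::collections::HashMap;

// Insert four pairs, check each yielded pair by matching on the key,
// and then check the total count, as in the test described above.
fn check_iter() -> usize {
    let mut map = HashMap::new();
    map.insert("foo", 42);
    map.insert("bar", 43);
    map.insert("baz", 142);
    map.insert("quux", 7);
    for (&k, &v) in &map {
        match k {
            "foo" => assert_eq!(v, 42),
            "bar" => assert_eq!(v, 43),
            "baz" => assert_eq!(v, 142),
            "quux" => assert_eq!(v, 7),
            _ => unreachable!(),
        }
    }
    // A weak but useful check: four items came out.
    map.into_iter().count()
}
```

As noted, this is not a complete test: it would still pass if some key were yielded twice and another never, as long as four items come out.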
So at least we know that the iterator indeed generates four items, and we know that for those four items, the key-value mapping is correct, even though it could be that they're all foo. We don't actually test for this. But let's just assume it's right. Oh, yeah, the unreachable! macro is really nice. There's unreachable! and unimplemented!. They all just panic, but they're nice semantic markers. They panic with slightly different messages, I think, is about it. Oh, it's raining. All right, so what are we now missing from the example? Okay, so notice here that they have... Any other neat macros? Probably. The ones I think I use the most are, that's not what I want, macros. Yeah, so the debug_assert! macros are the same as the assert! macros, but they are compiled away in release mode. So those are kind of nice. eprint! and eprintln! print to standard error. There isn't a can-never-happen, but that's basically what unreachable! is, right? One thing I have wanted is incomplete, so something that will only fail to compile if everything else compiles, but it does not exist yet. include_str! is pretty useful. But yeah, I think unimplemented! and unreachable! are probably the ones I use the most. That was nice. All right, so yeah, so the thing here, if we look at the example again, notice that they insert using strings and the values are strings. This means that this HashMap, yeah, as they point out here, this HashMap has a key type of &str and a value of &str. And then they try to do contains_key and give just a &str again. But notice that contains_key takes a reference to the key, which would technically be like a &&str, right? That is the value we're expecting. So this would compile just fine if I did this. Right, we can test it. Yep, so that runs just fine. Examples. cargo r, example std1. Great, so that runs just fine. But we don't really want the user to have to give this extra ref.
And so that's where this really neat and also somewhat confusing trait called Borrow comes in. So Borrow, let's do the crate-level documentation first. So Borrow is a little bit of a... It can be a little bit confusing. They've done a lot of work on the documentation, so it has gotten better. They also have an example that is basically HashMap. So I'll basically be saying what's here, but this also means that you can read it on your own time if you want to. The observation is that contains_key, get, and remove, we don't want them to take a &K. We want them to take a &Q where, I'm gonna just write this out first, and then I will explain what it does. Did I miss the syntax of it? Okay, so we want get to take a reference to anything that's a Q, where K can be, the way to read this is, where K can be borrowed as Q. So borrowed-as means, if you have a reference to one, you can get a reference to the other without any conversion. It's just, if you have a reference to one, that is also a reference to the other. And then we require that Q is Hash + Eq. And because Borrow implies that the reference is to the same thing, if Q is Hash + Eq, that Hash and Eq must behave the same as K's Hash and Eq. So Borrow is basically saying that these two types can both be, if you have a reference to one or the other, they can both be considered the same type of reference. And the thing pointed to has the same semantic meaning. In this case, what this means is sort of that if you have a Q, you can use it as a K. Or rather, if you have a reference to a K, you can treat it as a reference to a Q, and it will be the same. We also say that Q doesn't need to be Sized, because we're just taking a reference to it, so there's no reason for it to necessarily be Sized. I don't think this matters all that much. But it's in the standard library, so we might as well use it. I think it's so that Q can be str, which is not normally Sized.
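The Borrow-based signature just described can be shown on a tiny standalone lookup. This is a hypothetical function (the name `contains` and the slice of keys are made up); the point is the bounds, which let callers pass a `&str` when K is `String` because `String: Borrow<str>`:

```rust
use std::borrow::Borrow;
use std::hash::Hash;

// Sketch of the Borrow-based lookup bounds described above: Q need not
// be Sized (so it can be str), and must hash and compare consistently
// with K, which Borrow implementations are required to guarantee.
fn contains<K, Q>(keys: &[K], key: &Q) -> bool
where
    K: Borrow<Q>,
    Q: Eq + Hash + ?Sized,
{
    keys.iter().any(|k| k.borrow() == key)
}
```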
OK, so in this case, this is going to change a few things, because now bucket needs to do the same — it's just going to take a Q. Oh, and a &Q — that's a good point. Yeah, so this basically guarantees that if you hash the Q, it gives you the same hash value as if you took the corresponding K and hashed it, which is why this should still work. I want to use... Borrow, borrow. I want auto-imports, especially from the standard library. Right, so now the trick here is that the entry's key — remember, the key stored in each entry is a K, so we have a &K, and the key we were given is a &Q. So in order to compare them, we're going to have to borrow the K as a Q, which is what this does. Similarly, we can add the same trait bounds to contains_key, and the same trait bounds to remove, and here as well we will also have to borrow. And lo and behold, the example now runs exactly like it does with the standard library — the exact same examples now work on our HashMap. Our HashMap is worse in a lot of ways, but it has the same API; it provides the same features. So this is pretty nice; this is a good starting point. The next thing I think we want to add is the entry API. Let me commit this first somewhere: "support first HashMap design". So let's set this up. It turns out it's hard to type things when they're off screen. All right, so we're going to make a new repo so that all of you can see this. Just basic "let's build a HashMap". Probably not the most useful thing, but hey. Git add this. So the code should all be pushed now — if you want to see it, you can go browse it here, and it should all work fine. Great. Now let's look at entry. Okay, so the entry API is really neat. If you haven't seen it before, you will be very happy. So the idea with the entry API — oh, that's terrible — is basically that you can get a reference to the place where something will be inserted into the map. So the way to look at this is here.
So entry, entry — why? I want this to not collapse. Okay, so entry takes &mut self and a key, and gives you back an Entry. An Entry is an enum of either Occupied — there already exists something for that key — or Vacant — there does not exist something for this key yet. On an Occupied, you can choose to replace the value, or you can get the key that's currently there. With a Vacant, you can choose to insert. But the crux of this is the or_insert methods, which let you do things like... what's the best example of this? Imagine you're keeping a counter of some sort. You've probably written Python code that looks a little bit like this: for x in xs: if x not in collect — so you have some collect = {} — then collect[x] = this, and then collect[x].append(...), right? This is code that everyone has written, or something similar to it — well, you would append something else, but code like this where you have to initialize the entries in your map somehow. This is what entry is perfect for. So entry is an API that lets you do the following: book_reviews.entry("my book").or_insert_with(...). So this means: if there is no entry for "my book", then put a vector there; if there is, then just give me the vector. This returns you a mutable reference to the value — to the vector — so you can push to it. Right? So if "my book" did not exist before, it will insert a new vector and push 42. If it did exist, it will just push 42. This is a really neat API for working with collections. And so what we're going to do is implement that. We're first going to do the same thing we did initially: take the example from up here, which basically just does a bunch of stuff with that. Let's create the example, use the HashMap, fn main. Great. So the basics of the entry API: we have a pub fn entry that takes &mut self and a key, and returns an Entry over K and V.
And we're going to have to write some code. "How do you know when to create new files?" So in this case, currently, I'm keeping the entire HashMap in one file because it's not all that large. I think the entry API is actually in its own file internally, but I don't remember. Good question — let's check the source. Oh, that's pretty unhelpful. I think entry is in its own file in the standard library. In general, I split things out once I need to scroll a lot in the file. Entries are all inline? OK, yeah. But yeah, I like to try to keep files contained by functionality, mostly — not in the sense of a single function, but by what they do. In this case, HashMap is a fairly self-contained thing, and so I like to keep all of that in one file. Sometimes I will extract out particularly hairy things or things that require unsafe, but it doesn't really matter all that much. OK, so we're going to have an enum called Entry. It is generic over K and V. And just like the real map, we need to be able to track the difference between an Occupied, which is going to hold something, and a Vacant, which is going to hold something else. Specifically, we need a pub struct OccupiedEntry<K, V> and a VacantEntry. The reason we need these to be different types is that you can do different things with an occupied entry versus a vacant entry, right? They just support different operations. So an occupied entry — all an occupied entry really needs is... it's going to be generic over a lifetime, and it's going to have a mutable reference to the entry in the vector. Now, I'll get back to in a second why that's not quite enough. But for now, let's just look at this. So: element. It's going to have a pointer to the currently occupied entry — the thing that's in one of the items of the bucket. A vacant entry, on the other hand — all it really needs is a reference to the bucket that it's going to insert into.
Now, the vacant entry is also going to keep the key around, because when it eventually pushes, it's going to have to push that key. The occupied entry does not need to do that, because its element already contains the key. Yeah, I've noticed that — libcollections usually has big files that are relatively self-contained, except for things that are internals or unsafe. So Occupied is going to hold an OccupiedEntry<'a, K, V>. Of course, all of this is tied to the lifetime of the HashMap, because you can't keep messing with an entry after the map has gone away, for example. Vacant holds a VacantEntry<'a, K, V>. Right. And now the place where all the convenience comes in: if you have an Entry<K, V>, then you can do — let's just do or_insert for now. Is that all the example uses? It uses both, okay. or_insert consumes self and takes a value — the value that you want to insert — and then gives you back a mutable reference to that value. So the way to think about this is: if self is an Occupied entry, or_insert will just give you a reference to the V that's inside the element that already existed. Otherwise, if it's a Vacant, it will put that value into the map and then give you a reference back to that same value. In this case, we're going to match on self. We're going to say that if we are an Entry::Occupied, we have to do one thing, and if we're a Vacant, then we have to do something else. If we're occupied, we don't actually need to do anything with the value — we can just throw it away. Specifically, we can just do &mut e.element.1: give back a reference to the value that's already there. If it is vacant, on the other hand, then we need to insert the entry. So we'll actually destructure — I guess we can just do e.insert. And then we'll implement that, because you can imagine the user wanting finer-grained control and interacting with these types directly.
And so therefore, we'll implement an insert here — and this also needs to be pub. So insert also just consumes self, takes a value v, and returns a &mut V. This is going to do self.bucket.push((self.key, value)), and then bucket.last — is there a last? I don't remember. Does Vec have a last? It has a first... last, great. .map — well, actually .unwrap. "Why do you need to add the 'a in the impl block? Isn't the 'a in the type enough for the compiler?" No. We need to say that we are generic over any lifetime, because if we did this, it wouldn't know which lifetime this 'a is. For example, it could be 'static — that's valid; 'static is not generic. And if I did this, which you should probably never do, a lifetime named static here would not refer to 'static — it refers to some lifetime that is provided to the impl block. That's why we need to declare 'a. "What does the .1 do?" Remember that our buckets keep key-value pairs. We want to return a reference to the value, so the .1 gets the value out of the (key, value) tuple — I'm taking a mutable reference to the value. The reason this unwrap here is safe is that we have just pushed, so we know there is a last element. Great, so that's what or_insert does. or_insert_with is very, very similar, except that it takes an F — like a maker F — where F is an FnOnce that returns a V. So the idea here is: imagine you have something that's expensive to create, say a vector that actually did a bunch of allocation. With the code we've written so far, book_reviews.entry("foo").or_insert(Vec::new()) means you are always allocating a new vector, regardless of whether or not "foo" exists. Whereas with or_insert_with, you would pass Vec::new itself, and Vec::new would only be called for you if "foo" does not exist. So you avoid doing the work of constructing the item unless you need it. And that's what this does — this here would be maker(). So that's or_insert_with.
And then one thing that's — I don't think it's stabilized yet — but really convenient, is or_default, which doesn't take any of these, but requires V to implement Default. And now, if you had a map where the values are Vecs, for example, then you could just do entry("foo").or_default(), and that implies or_insert_with(Vec::new). This is in the standard library on nightly, but I don't think it's stabilized yet. Right. I think this will currently work with the example, maybe — I guess we're about to find out. Oh, this is the buckets — "may not live long enough". We need to guarantee that the keys and the values won't go away either. It's a little sad; means we have to do this. Anyone? Oh, sorry, it says it needs to be here. And now line 39 — this one we need to guarantee lives long enough. Great. Okay, so now for the entry method itself. The entry method here is a little bit finicky. So what we want to do first is get the bucket for the key — that's all fine. And now we want to match — we want to try to see whether we can find the value, right? Remember, we need to return a different kind of Entry depending on whether or not the element exists. So we will do bucket.find — where's the place where we do find? It's here. Yep. This one's going to be mutable this time. If it finds a bucket entry, then we do one thing. If it does not, we have to do something else. Here, my guess is lifetimes are going to bite us, which is a little bit sad, but we might be OK. Yeah, it's going to... we're going to have to... Actually, let me write this out, show you the error, and then tell you how to fix it. So let's see. If we do find an entry that matches here, then we want to return Entry::Occupied, with an OccupiedEntry where the — what did we call it? Item? Element? Element. So the element is going to be that entry. I guess we could call this field entry instead. Entry is good. Right?
If it's None — meaning we didn't find anything with the key for the entry — then we return Entry::Vacant, with a VacantEntry. And the vacant entry has a reference to the bucket and it contains the key, so this gets key and bucket. So now you might think that this is all good, but my guess is the borrow checker will yell at us. Oh, right — this is now called entry; this is now called entry. Anything else? Line 92. Do I really need to do that? "Can't compare K with K" — oh, here we have a key of type K; we don't have the Q trick here, so we don't need to borrow. Line 46: this should be entry — what am I talking about? This should be self.or_insert_with with defaults. "Can't compare" on line 87 — the key. Right? Yeah. So here's the compiler error I thought we were going to get, and we did indeed. It's telling us that bucket is borrowed, but we're trying to move bucket down here. And that is because this find returns an Option that points into bucket. So in the Some case, we do actually have a pointer into bucket, so if we tried to move bucket there, that would not be okay. In the None case, though, we don't have a pointer into bucket — it would be safe to reuse bucket. But the compiler does not realize that. The reason it doesn't realize is that it thinks of borrows in terms of lexical scopes: the thing that's returned from find is considered borrowed for this entire match, not just for the arm that actually gets the reference. There are two ways to fix this. If you're on nightly, then you can turn on the NLL feature — non-lexical lifetimes — and that just fixes this issue: the compiler realizes that in the None branch you don't actually hold on to the pointer, and therefore it's okay for you to use bucket again. Because we want this to work on stable — because why not — we instead do this, which is a little less nice.
But here, the reason this works is that this reference is now only borrowed for the contents of this if let, which means that after the if let, bucket is no longer borrowed, and so we're free to move out of it. I feel like that's false — I have been working too much in NLL land. This is why NLL is nice: it lets you get rid of so many of these contortions. Why am I not allowed to do that? That's super weird. Okay, fine. Buckets, bucket, new. This should definitely work. That's really strange — why is it complaining about that? And of course this goes away. And "cannot compare" because this — I am building a HashMap! Well, we are building a HashMap. What? "Cannot borrow self.buckets as mutable more than once at a time." But I'm not, though! That's super strange. I have no idea why it's not letting me compile this, but in order to not waste time on it, we're going to implement it the other way. We'll do this instead — just the imperative way, because the closure might be why it's confused. But it shouldn't care. Entry: if entry.0 == key, then return the entry. Nope. What on earth? Why is it not letting me do that? Oh, that's so weird. That is so weird. Well, if it's not letting me do this, then we will turn on NLL, because I don't want to fight with this right now. So this is what we had: #![feature(nll)], turn on all the goodness, rustup override set nightly, cargo test. Am I doing something weird inside the...? Oh, that's so strange. Buckets, buckets. Mut, self, buckets, bucket. I don't know why it's not letting me do this. Let's see — entry. And we'll see something obviously wrong here. Because to me, this seems totally right. And also, hmm, why is it complaining about this now? If let Some(that), then return this. Ah, there. Let's see. So it's telling me that I'm borrowing from self.buckets here, and that therefore I'm not allowed to borrow mutably from self.buckets here — which to me is very surprising.
Because that arm would return, and there wouldn't be any borrow of self left at that point — there'd be a borrow of key, but that should be fine. All right. So let's try this again this way. So entry takes &mut self... buckets... okay. If entry.0 == key... hmm. I have no idea why it's not letting me do that. Because entry here should be safe: entry is a pointer into self.buckets; if it matches, then we return it, and returning should be safe. So the question is: why is this borrow of buckets not allowed, when the borrow of buckets up here should be done by that point? And it doesn't even help to scope it? It's very confusing. I think this is a bug in the compiler — well, in the borrow checker specifically — because unless I'm mistaken, this should be perfectly fine. Hmm. Weird. A little bit annoying, because I don't have a good way of getting around this. The bucket — so it's borrowed. "First mutable borrow occurs here. Borrowed value must be valid for the anonymous lifetime defined there." So let's name the lifetimes and see if that helps us. "Borrowed value must be valid for the lifetime 'a." Oh, I wonder. I wonder if it's — oh, it's because the lifetime for this borrow needs to be the same as the lifetime for this borrow, because we're returning an Entry that takes 'a. I wonder what they do in the standard library for this. I guess we're about to find out. Well, that's interesting. Um, the source gave me... what does this into_entry do? Well, that's pretty unhelpful. Oh. Well, I mean, we can technically get around this with unsafe, which I think is okay — but it's probably going to come back to bite me. But sure, let's do some unsafe. Sounds good. Specifically, the reason we can use unsafe here is that we know we will return either this or this, and both of them are tied to the lifetime of self. So this shouldn't actually be unsafe. So this will be &mut *entry.
Specifically, what we're doing here is dereferencing and then re-referencing the entry, which I think should erase the lifetime. So we want entry as a *mut, and then we... so we're basically converting it to a raw pointer and then back out. And so that's the trick we're pulling here. Yeah, I know, it's pretty sad. It shouldn't need to be unsafe. I think it's that the compiler doesn't realize that these two lifetimes are non-overlapping. I'll probably file a bug afterwards, because this should not be necessary. Let's go back for a sec and just checkpoint this: "almost, but needs unsafe — bug?". That way I can file it later, and hopefully Rust folks smarter than me will figure out what's wrong. And then we'll go back to this. Oops. And hopefully eventually we can get rid of that. Okay, so now I think we can run the example. Oh: "attempted to calculate the remainder with a divisor of zero". Ah, okay — imagine that the very first thing you insert into the HashMap is through entry. Remember how in insert we have this trick where, if you do an insert, we do a resize first if necessary? We're going to have to do the same thing here. Specifically, if you decide to insert into a vacant entry, then first of all we need to update the number of items in the map, and second of all we need to resize the map if it's too small. So I think what we're going to do here is have the vacant entry contain the pointer to the map itself, so that we can update things appropriately. And then we'll have it store the bucket as the index of the bucket, because that way, when we do or_insert up here in the Vacant arm, it can do the same kind of logic as our insert did. Specifically, this is now going to be a map, which is going to be a &mut HashMap<K, V>, and a bucket, which is going to be a usize.
And now, when you insert using a vacant entry, if we detect that the map has no buckets, or that the number of items in the map is more than three quarters of the number of buckets — technically we should probably have a helper method for this — but basically, if we detect that either of those is the case, then we resize the map. And then we do self.map.buckets[self.bucket].push((self.key, value)). We do self.map.items += 1, because there is now one more item in the map. And then we return a reference to the thing that we just inserted. This, of course, has to call resize on the map — you can't resize a vacant entry. Oh. That does mean that this now needs to require that K is Hash + Eq — although, in fact, for the entry API you can't even get an entry without K being Hash + Eq, so that should be fine. And now — ooh, that's a lot of complaints. Oh, did I? I don't understand why I keep putting semicolons after the where clauses. "Eq is not implemented" — right. And finally, "no field 1" — why? Oh, buckets. Right. We need to index the specific bucket that we inserted into, then find the entry that we just pushed, and then return a pointer into the bucket. And now it's telling us that we "cannot borrow immutable item self.map". That is because self here is not mutable — sorry, that's because this has to be a mutable reference to the map, because otherwise we can't change it in any way, right? Now what does it complain about? It's complaining that we can't... Trying to divide by zero again, line 94. So that means the bucket index got computed somewhere before we had even resized. That's because we compute the bucket index here. Yeah, so we actually do need to do the resize outside. OK, that's fine. It's a little sad, because it means that if you use entry, get a vacant entry, and don't insert anything, then you still cause the map to allocate — but that's probably fine. So we basically do the resize in entry so that we guarantee that.
In fact, we might have had to do it there anyway, because imagine that the insert caused a resize — that would change which bucket you end up putting things into. All right. And now this is just going to be self, because there's no self.map anymore — we're already on the map. Hey, it didn't print anything, though. That's a little bit sad. But it does mean that the entire example succeeds. So one thing we could do — now we don't need NLL anymore, which also means that we can remove the nightly override. Great. So let's see — what's the last thing we have for HashMap here? Oh, this is just some other example that they have. This other example, I think, is mostly just to check that you can use other types. So if you look at it, it is mostly just testing that the map works with some type that implements Eq and Hash but that is not String or usize or anything — so this should just work for us out of the box. Great. This next thing is a little neat. So let's look at the next example. OK, this example is constructing a map by taking a list and collecting it into a map. So remember that in Rust, in general, you can have an iterator — and on Iterator, where is Iterator? Go to the top: Iterator. The collect method on an iterator takes all the things that you've been iterating over and collects them, usually into a vector. But if your iterator is over tuples of two elements, you can also collect into a HashMap, which is really, really useful. And we sort of want our HashMap to be constructible that way too. And you see that in order to get the collect method, you need B — the thing you're collecting into — to implement FromIterator over the items that the iterator yields. So in our case, that means we have to implement FromIterator for our map. So let's see — we're going to add examples two and three: entry API.
And now for example four, what we want to do is implement FromIterator — let's see, impl<K, V> FromIterator for HashMap. Looks like FromIterator is not in the prelude, so we need to import it. Let's see. So FromIterator is a trait generic over A. Specifically, we want to implement FromIterator over tuples of (key, value): if you have an iterator over (K, V), you can collect into a HashMap with that key and value type. The from_iter method is generic over T, where T is the iterator. We could call it I if we wanted to — I prefer I for iterators, but it's not terribly important. So we can take anything that can be turned into an iterator whose items are (key, value). And there are a couple of different ways we could do this. The easiest way is to say: let mut map = HashMap::new(); then for (k, v) in iter { map.insert(k, v); } and then return the map. Did I mess up the syntax here somewhere? Probably. Oh — use. And of course, this also requires that K implements Hash + Eq. Great. So the idea here is that this is a very straightforward implementation, right? We have a map, and we iterate over the iterator we're given, and for every key-value pair, we insert that value under the given key. And then we return the resulting map. Now, you can be a little bit more efficient here. Iterators, for example, have this size_hint. size_hint tells you how many elements may be coming down the line, and if you know how many items may be coming, you can essentially pre-allocate enough space to fit all of them. So remember, right now we just double the number of buckets whenever we detect that the map is starting to get full. Instead of doing that, we could say in advance: allocate enough space to hold about this many items. That way we might not have to allocate at all in the map while iterating.
In the current state of things, I'm going to keep doubling the number of buckets some number of times — basically log n of the number of items in the iterator. And that's probably fine in our case; this is not a very performant HashMap anyway. But — and I think in the standard library, for example, when they implement FromIterator, they do make use of these iterator size hints. Similarly, while we're at it, the other thing we'll want to implement is — remember how we have this IntoIterator for &HashMap? It's specifically for a reference, so you can iterate over references to all of the items in the map. We might also want a way for the user to just extract all of the things from the map. Think of this as a drain — actually, we can implement Drain too. We want something that just gives you all of the items, and we can do that pretty efficiently, actually, so we might as well, because it's a feature that people use. Let's commit this first, though. So this worked. So we will add the example, std4, FromIterator — so now you can collect into our new map. And finally, let's add IntoIterator for the map itself. This would be for when you want to extract all the key-value pairs not by reference — you just want to take everything from the map. So this is not quite the same as a drain, because a drain leaves the map intact and just removes all the key-value pairs, whereas this will also deconstruct the map. So it's going to be basically the same — well, a little bit different, because... there are a couple of ways we can do this. But we can actually have it be basically the same as what we had up here. So we have an Iter, and then we will have this IntoIter, which has pretty much all of the same fields — but it owns the map. It implements Iterator, but it iterates over K and V directly, rather than over references to them.
And apart from that, instead of doing — oh yeah, let's do get_mut. And this will do a... so we sort of want this to do a remove, right? But remember how I mentioned that if you remove something from the middle of a vector, you have to shift all the things that follow it? So we sort of want to use swap_remove here. But unfortunately, swap_remove returns a T and panics if the index is out of bounds. And remember, we rely on being able to detect when self.at has moved past the end of a bucket, so we can move to the next bucket — swap_remove would not give us that information. So we need to reverse this flow a little and say: if self.at >= bucket.len(), then we've spilled over; else we break with Some of — sorry, no — bucket.swap_remove(at). Now, if you're very perceptive, you might also notice that swap_remove, remember, moves things around. This means we will not actually iterate in bucket order, because when we take element zero of some given bucket and yield it for iteration, we swap the last thing in that bucket into position zero. The next — ooh, actually, this is wrong. at here is always zero, because we use swap_remove. Aha, that's actually kind of nice. We're going to keep removing the zeroth element: we remove the zeroth element, the last element gets placed in the zeroth spot, and the vector gets truncated. So if there are elements left in the bucket, there will always be at least one at index zero. So if the bucket is empty, we move to the next bucket; otherwise, we just take the first element. And so we don't actually need at at all when we're draining. The reason we can't do this in the other iterator is that in the other iterator, we're not removing anything — so you can't take element 0. You could technically swap things around, but there's no reason to do that; you can just walk the bucket instead.
Whereas here, we just want to keep removing things from each bucket until every bucket is empty. And we could test this by doing essentially the same thing we did here — actually extracting the K and V from the map. So this is the old test we had for iterators: items starts at zero, items += 1, and then we assert that we indeed got four items. "Is it mildly more efficient to remove the last item, rather than swap-remove the first one?" So the proposal here is that instead of doing this, we could do bucket.pop(). Yeah, that works too — I think they're basically about the same. But one way we could keep this code similar to what it used to be is: we do pop, which does what was suggested here and removes the last element, and if it's Some(x), then we break with Some(x). Otherwise — if you pop and there's nothing there — that means the bucket was empty, and so we continue. Yeah, that's pretty nice; I like that. So this still ends up doing exactly the same thing, except that it removes from the end rather than swap-removing from the beginning. I think you would struggle to find a performance difference between these, but this is probably better, I would say. Let's see if this works. Also, let's fix this so that it doesn't give us a warning. So now we also have IntoIterator. And now that we have IntoIterator, we could also implement Drain pretty efficiently. And drain — actually, let's not do it; it's fine. So Drain is a similar kind of thing — we'd add an iterator struct — but what it does is basically exactly the same as our IntoIter, except it does not consume the map: it just empties all the buckets. So it would be just like this one, except that it would hold a &mut HashMap instead. Apart from that, the code is exactly the same — it just keeps removing all the things from all the buckets and returns them.
And then at the end, you still have a map. But it's a — oh, actually, can we do this with a Cow? Meh. Unfortunately, you need a different implementation, because one of them needs to take a mutable reference, whereas the other one needs to actually own the HashMap so that the HashMap gets dropped at the end. And so you can't actually have them be quite the same. There's the Cow type, which is really nice — it lets you capture "either owning or having a reference" — but unfortunately, it doesn't capture "either having a mutable reference or owning". That's not something Cow deals with. Okay. I think we're in pretty decent shape with this. There are a lot of things in the HashMap API that we don't support, but we support all of the examples that they give — all of the basic features. There are things like with_hasher, which we talked about at the very beginning, and with_capacity for declaring how many things you could fit in the map before it would need to be resized. Now, capacity works a little bit differently for us, because our map is chained: there is no real capacity — you can always insert more elements; your buckets would just get really long and you would have to search them. Whereas in the Rust standard library HashMap, because it's using open addressing, every bucket holds at most one element, and elements that would collide just get placed at other locations in the map. Because of that, there's an actual capacity: the number of buckets you have bounds the number of elements you can insert. That's not the case for us, and that's why they need more of an API around capacity and reserving additional capacity. We could still approximate it, because we know that we resize whenever the item count exceeds three quarters of the bucket count. There are also other strategies we could use — like "if any bucket has more than five elements, resize" — which would imply a different kind of capacity reservation.
And notice also they have operations like shrink_to_fit. So this is reducing capacity after you've resized to grow. That's something we could do too: if we detect after a remove that there are fewer than some number of elements, then make the map smaller. There are other iterators, like ones that give you only the keys and ones that give you only the values. I don't think those are terribly important to cover. There's clear, which is actually pretty useful: clear empties all the buckets and drops all the values, but it is also not terribly important. Yeah, here you see the same trick we pulled with Borrow, and they have that for remove as well. Remove entry... I don't know what remove_entry does. Oh, so remove just returns the value; remove_entry returns the key and the value. Does that make sense? Retain does an iteration and calls a user-provided closure for each element. If the closure returns true, then you keep the element; otherwise, you remove it. Actually, I should mention this. The reason that an OccupiedEntry just holding a mutable reference to the key-value pair is not really sufficient is because normally there's a function on OccupiedEntry called remove, where you remove an entry that was already there. And in that case, it would actually need a mutable reference to the map, so that we could, say, decrement the number of items in the map. Also, if it just has a mutable reference to the key-value pair, it doesn't get to change the vector, whereas if you remove something, you actually need to do a remove on the vector itself: from the bucket, you need to remove this key-value pair, which you can't do with just that pointer. So that's also something we could add. When you match on a monad like Some, is there a way to bind the entire monad instead of having to wrap it? Yeah, so you can. Let's see if I have a good place where I do that kind of match.
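While a good in-context example is found, here is a minimal standalone illustration of the binding being asked about: with an `@` pattern you match against a shape but bind a name to the whole value. The function and names are made up for illustration:

```rust
// Bind a name to the entire matched value with `@` instead of destructuring.
fn describe(v: Option<i32>) -> String {
    match v {
        // `whole` is bound to the entire `Some(..)`, not just its contents
        whole @ Some(_) => format!("got {:?}", whole),
        None => String::from("empty"),
    }
}
```

Note that with a non-Copy payload you cannot both move the inner value out and bind the outer value, which is the restriction the stream runs into below.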
I guess we did that in Iterator, right? So here, yeah, there are a couple of ways you could do this. You can just do x; you can just match the name and not destructure it. The other thing you can do... well, if you did this, then None would never be hit, so that's not okay. What you can do is this: it will match against the pattern, but bind x to the whole thing. This would mean that you're not allowed to do this, because you can't both move out of the Some and keep a reference to the outer thing, since the inner thing has been destructured. But you could do this. Yeah, exactly. So these kinds of @ matches are pretty useful when you have very large matches. Usually, if you have something like a single channel coming in and you want to multiplex to multiple channels, you often want to match without stealing things and then just forward, and you can do that here. All right, I think that's about all I wanted to cover in this first one. The one thing we could do... what's the time? Yeah, so one thing we do sort of have the time for is that we could have the buckets be sorted. Currently the buckets are in random order; this is why we have to walk the entire bucket to find an element with a given key. Whereas if they were in sorted order, you could just do a binary search on them to find the appropriate key, and that is a lot more efficient for lookups. But it is much slower for inserts, because insertions now need to do a sorted insert: they have to find where to insert, put the element there, and shift all the other elements over, so it makes inserts a lot more expensive. If people are interested in seeing a sorted-bucket version, we would basically just take this and ensure that we maintain that it's sorted in all places.
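The sorted-bucket idea above might be sketched like this, assuming each bucket keeps its entries ordered by key (so K: Ord is required on top of the usual bounds); the helper names are made up for illustration:

```rust
// Lookup in a sorted bucket: binary search, O(log n) instead of a linear walk.
fn bucket_get<'a, K: Ord, V>(bucket: &'a [(K, V)], key: &K) -> Option<&'a V> {
    bucket
        .binary_search_by(|(k, _)| k.cmp(key))
        .ok()
        .map(|i| &bucket[i].1)
}

// Insert into a sorted bucket, returning the old value if the key existed.
fn bucket_insert<K: Ord, V>(bucket: &mut Vec<(K, V)>, key: K, value: V) -> Option<V> {
    match bucket.binary_search_by(|(k, _)| k.cmp(&key)) {
        // key already present: replace the value and hand back the old one
        Ok(i) => Some(std::mem::replace(&mut bucket[i].1, value)),
        // not present: a sorted insert shifts later elements over, which is
        // exactly what makes inserts more expensive than the unsorted version
        Err(i) => {
            bucket.insert(i, (key, value));
            None
        }
    }
}
```

This is the trade-off described above: lookups get cheaper, inserts pick up the cost of Vec::insert shifting the tail of the bucket.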
It's unclear whether that'll be interesting, but if you feel like it will be, then give a shout out and I'll consider it. The other thing I want to do later is maybe try a more intelligent hashing scheme. For example, you could do cuckoo hashing, which is an open addressing scheme where you use two hash functions. So you don't chain any of the buckets, and then we would have capacity and such, like the standard implementation. You compute both hashes for a given key and check whether either slot is empty. If one is, you put the element in the one that's empty. If neither of them is available, you pick one, evict its occupant, and check that occupant's two alternatives; if one of those is empty, you move it there, and now you have an empty space for the original element. And you keep recursing like this until you find a place where you can insert. It's a pretty neat data structure. Pascal, another Rust developer, pointed out that we could also do hopscotch hashing, which is pretty neat. Or of course, I also want to cover other things in the standard library. It would be fun to try to implement a Mutex, which has a bunch more unsafe code. The hash map shouldn't have any unsafe code, but it unfortunately happens to have one spot that hopefully I'll get rid of; I'll file a ticket after this. Yeah, in general, what I do for these streams is I post to Twitter about a week in advance with a poll of what you would like to see. That's how we ended up doing a HashMap now. There were a fair number of votes for continuing on the asynchronous SSH crate that we wrote a few streams ago, which currently only supports reads, to have it support writing to the standard input of a remote process as well, which I think could be pretty cool. It would probably be a little more advanced than this, but if people are interested, I'm happy to do that too.
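The cuckoo insert loop described above can be sketched like this. To keep the sketch self-contained it uses u64 keys and two made-up toy hash functions; real code would take K: Hash with two independently seeded hashers, and would rehash or grow the table on failure instead of dropping the pending pair:

```rust
// A toy sketch of cuckoo hashing: one open-addressed table, two hash
// functions, evicting and relocating an occupant on collision.
struct Cuckoo<V> {
    slots: Vec<Option<(u64, V)>>,
}

impl<V> Cuckoo<V> {
    fn new(n: usize) -> Self {
        Cuckoo { slots: (0..n).map(|_| None).collect() }
    }

    // the key's two candidate slots (toy hash functions for illustration)
    fn h1(&self, k: u64) -> usize {
        (k as usize) % self.slots.len()
    }
    fn h2(&self, k: u64) -> usize {
        (k as usize).wrapping_mul(0x9e37_79b9) % self.slots.len()
    }

    fn insert(&mut self, mut key: u64, mut value: V) -> bool {
        // bound the displacement chain so we terminate instead of cycling
        for _ in 0..self.slots.len() {
            let (i, j) = (self.h1(key), self.h2(key));
            if self.slots[i].is_none() {
                self.slots[i] = Some((key, value));
                return true;
            }
            if self.slots[j].is_none() {
                self.slots[j] = Some((key, value));
                return true;
            }
            // both candidates full: evict the occupant of `i`, take its
            // place, and try to re-home the evicted pair instead
            let (k, v) = self.slots[i].replace((key, value)).unwrap();
            key = k;
            value = v;
        }
        false // a real implementation would rehash/grow here
    }

    fn get(&self, key: u64) -> Option<&V> {
        // a key can only ever live in one of its two candidate slots
        for idx in [self.h1(key), self.h2(key)] {
            if let Some((k, v)) = &self.slots[idx] {
                if *k == key {
                    return Some(v);
                }
            }
        }
        None
    }
}
```

The payoff is in get: at most two probes per lookup, regardless of how contended the table is.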
All right, it doesn't look like people are terribly excited about doing sorted buckets, which I sort of understand. So I think we'll probably call it here then. Thanks for coming out, and thanks for watching. If you have ideas for other things you'd like me to cover, then ping me on Twitter or on Patreon. If you feel like there are things that could be better in the streams, let me know; for example, someone pointed out last time that the font size was too small, and that's why the font size is now larger. Let me know if there are particular things you'd like to learn, or particular things you think should be explained better. Apart from that, enjoy the rest of your day. Thanks for coming out. Bye, everyone.
Info
Channel: Jon Gjengset
Views: 34,310
Keywords: rust, live-coding, livecoding, hashmap, data-structures
Id: k6xR2kf9hlA
Length: 149min 40sec (8980 seconds)
Published: Sat Jun 02 2018