Hash Tables - Data Structures and Algorithms

Video Statistics and Information

Video

Captions Word Cloud

Captions

hey everybody welcome to your video on hash tables now if you're brand new to hash tables this is a little bit confusing but I'm going to do my best to explain it in a simple way when I was first learning this I just had to tell my two brain cells to pay attention and I just had to go through an example after example to get it so hash tables are used in a lot of popular programming languages and it's very similar to an array but instead of just putting values into the array we're actually going to have key value pairs so what in the heck is a key value pair and by the way if you're coming into this because you really want to understand the hashing aspect of this we're going to talk about that in this video so a key value pair you can think of it as an association between two values so it might be a person's name and a person's ID so let's go with that we'll say ID is associated with the person's name and this is an example of a key value pair more specifically you might have the value 3 and the name caleb curry so whereas before you may have just put caleb curry inside of an array you know one of the spots here might have the value caleb curry and then you might have other people's names in here as well you no longer are just going to worry about the value you're also going to have to worry about the key whereas before you would index these just using integers that go up you know you would say this is index 0 this is index 1 and this is index 2 so like let's say we had some other names in here we might have John Smith and subscriber that's you there's also John Cena but you can't see him so before we would grab these using these numbers which is pretty much just the offset so let's say this was a collection called users we would originally just grab these elements using these integers which is just the offset so to grab subscriber here you would say start at users and we're gonna go offset to so the computer knows where to find the data you're looking for because it goes to the users array and then it goes to elements off to grab the piece of data you're looking for so that is how a traditional array works but when we introduce keys so we're not just worrying about values now we have key value pairs instead of passing in that two you're actually going to pass in this person's ID so like let's say your ID was 618 here is how you would grab this element so no longer do we just have Caleb Korea John Smith and subscriber but we actually have the ID as well so three is Caleb curry and we'll go with seventy-five as John Smith and subscriber is a six eighteen so let's talk about the of a hash-table now that you understand a little bit of what it's trying to do so here are the primary purposes of a hash table and its benefit the first one I have here is lookup table and what this means is you can use a hash table to look up data by some key so you can look up this person's name by the key 3 you can look up this person's name by the key 75 so that is the concept of a lookup table it's a common concept in computer science any time you just need to look up some data another example of what a lookup table might be is let's say you have some zip code I'm in the US so I'm not really sure how things work and you view other countries out there but you can eat or just focus here for a minute 1 2 3 4 5 that could be a zip code and you might have a lookup table to see what city this is in so you would have this is the key and this might be associated with I don't know Bellevue or something yeah don't know that spell anyways that's another example of a lookup table anytime you use some value to get extra information the other main benefit of a hash table is extremely fast inserts and retrievals similar to how retrieving data from a normal array is very quick you just take an array name such as data that'll start at the beginning of the array in memory and then you pass in some offset such as 32 well the computer is able to just jump to the correct position thanks to that offset hash tables are a little bit magical because they are still very very fast for retrieval and inserting data but we don't necessarily have to know that offset because we're not using an offset like here we're actually going to use some key such as 6 18 3 or 75 we're in this example this 32 is referring to where is that in memory this number here the key does not so hopefully that's not too confusing this here is a normal array and this here is a hash table this is an offset or an index this is a key this is not directly tell you what position it's at in memory this however the offset or the index does tell you where it's at in memory so you might be wondering if this doesn't say where it's at in memory how is it that we can still have very fast inserts and retrieval similar to an array well the actual location in memory can actually be calculated from this key here so 618 in this case acts as an input to some calculation that determines where to store this inside of memory so you want to visualize that 618 goes through some processing and this processing is used to determine the actual memory location whereas with a normal right that index can just be used immediately to get the location the keys for hash tables just have to go through one extra step to get that memory location but it's still an extremely efficient process as it is constant time so this is an O of one operation it's a constant time operation now it's not perfect and there's some issues that might come up with hash tables that we're going to talk about so we might not always get a constant time for inserts and retrievals however it is still extremely efficient all right so I think we're getting a little bit too cluttered here so what I want to do is I want to clear some of this off and go through a little bit in more detail how this process going from a key to a memory location to grab the data such as Caleb courier John Smith or whatever what is this process and how does it work because this is the magic the secret sauce the secret formula of hash tables so if you can understand this process then you will understand hash tables and set yourself apart from a lot of people who are only able to use hash tables because they're very easy to use but don't understand any idea how they work so if that's what you're interested in then stay tuned because we're gonna get into that and hopefully increase your knowledge and understanding all right so let's say we want to use a hash table in our code it doesn't really matter what programming language you can look up the specific syntax for whatever language you're using but it's probably going to look something like this we're going to have some variable and we'll just call it data and we can initialize that so we will just start it out as an empty hash table the way you might add an element would look something like this data then you pass in a key here so let's say we pass in the value 3 which is my user ID and we assign it some value such as Caleb so if you're in Python the hash table data structure is known as a dictionary you may also hear associative array for various programming languages so those are both common names to describe hash tables but in this situation we are creating an empty hash table and we add one element with the key of 3 so if you went ahead and analyzed data to see what it looked like or if you just printed it out it's going to look like this curly braces to indicate the start of the hash table and then we're going to have is a key and then the value something like that and I'm terrible at drawing these curly braces but those are curly braces and that's a 3 now to get that data back out you just do the same exact thing so as an example you could pass to a function will just say data and use the key 3 so that is how you would pass in the value Caleb if you tested to see if these were equal it would pass as true that's all you need to know on how to actually use a hash table so it's very very simple to use one but it's a little bit more complex behind the scenes so now I want to understand the hashing aspect and how this is actually stored in memory and I don't know a hundred percent exactly how it's done for any particular language but it's going to be something very similar to this for whatever language you're in so we'll just go through a simple example and let's say when we create a hash table in memory it gives us an array of a certain size so let's draw that over here and this might have six spots right one after the other in memory so this works behind the scenes just like any other array where it's just one continuous area of memory and each element gets one spot in this area of memory so this here Q 3 and the value Caleb might be stored let's say right here now to use the hash table the actual location in memory is irrelevant to us but if you want to understand the hashing aspect it's important to understand that this is going to be assigned some specific spot in an array now the actual way it decides which position to store the sat is determined from hashing so the way it goes from 3 to index 0 1 2 3 4 so this is index 4 the way it goes from here to here is done in the hashing algorithm so there is some hash function that does a transformation on this key to get some position in memory and it's got to be in this situation either index 0 1 2 3 4 or 5 so this hash function the way this actually works depends on a lot of things what type the key is probably what language you're in and probably a bunch of other things so a simple exam of how this hash function might work is actually taking the key and modulus it with the size of the array so I'll explain that in a sec but it's pretty much going to look like this three modulus six which is the array size so you can generalize it and think of it as so key modulus size and as a refresher the modulus is going to give us any remainder so any remainder when you do three divided by six well there's actually going to be the entire thing left so this would actually give us the value three and let's say that is the algorithm it uses to decide where to store the data and this is actually going to be stored right here so three Caleb so convenient in that we have the same key and the same position in the array but let's go through a different example let's say we want to add another element in here so we say data index 664 and this one is going to be John well following the same exact algorithm here's what's gonna happen we're gonna have the key 660 for modulus with six so if you were to do 660 for modulus 6 well 6 goes into 660 11 times so there's going to be 4 left over so this will give us the value 4 so key 664 John is going to go index 4 so that is the basic process for how we go from the key to the actual location so there might be some variations in there a lot of prime number examples I've seen online and other ways of doing this process but ultimately you're going to get a number 0 up to the array size minus 1 in this situation it's actually impossible to get 6 because if it went in evenly there would be remainder so for example if you passed in 66 modulus six well this is going to give you zero which is going to go here if instead you did 65 it's going to go into sixty ten times and you're gonna have five left over so you get five so there's no way to go out of the bounds of zero to five so that is why a modulus is often used also keep in mind that we are using an integer for the key here but you can also use strings so there could be an association between someone's email and their name or an association between the person's name and their address or whatever you want it to be as long as this data type is hashable you are good and most data types are hashable the only kinds that are not are things that often change so for example a a list or a dynamic array is probably not hashable but any of the mutable data types should be good to use for a key here so now let's keep going through this example and I want to introduce a problem with hash tables and how this problem is solved I'll give you one example of how it's solved but there's lots of different ways of solving it so let's say we go through an example where we want to add in another key here and we're gonna pass in 64 and this person's name is Helen well if we were following that same algorithm 64 modulus 6 that is going to return the number 4 so let's go ahead and add Helen to Oh what's what do we do John is already here so do we just does it explode do we explode well that's exactly what happens if you have a conflict here you legally become an explosion so exploding is one way to fix this there's also other ones that are probably more ideal and one common way to do this is actually to use a linked list so what's going to happen is this becomes a node that actually points to the next node which is in this case going to be Helen so 64 is Helen so that's why all that linked list garbage is now useful because it actually can be used in these other data structures so if you weren't paying attention for the linked list videos face your consequences so that's one solution for this problem but there are other ones out there so for example you might hit index 4 and there might be already something there so you just go to index 5 so collision resolution is the word used to describe these different options of fixing these collisions the linked list option I told you is called separate chaining the one where you go from index 4 to index 5 is an example of linear probing don't like the word probe it's weird I could probably go into so much more detail for hash tables because they're very in-depth for example at what point is six elements irresponsible and instead it would be better to have 600 elements or 6000 or 60,000 or whatever you know because if we're just going to have six elements here and then a linked list of 22 trillion elements then that defeats the whole purpose of hash tables because if you think about it to find this element we're gonna have to start a John and then go one note at a time searching through this linked list until we hit that data so how that might work is if you try to access data index what was 64 it's going to basically take this key hash it it's gonna get the result of 4 it's gonna realize that's John we don't want John John smells weird so we're gonna go down our linked list - fortunately Helen is right there but imagine if we had a really super long linked list we're going to negate the benefits of the hash table because now we are subject to the operation speed of a linked list which by the way iterating through a linked list is an O of n operation so that's gross now it's not going to be oh of n for the entire data structure here it's just going to be oh of n where n is the size of the linked list so if your linked list is a hundred elements long it could potentially take 100 operations of checking the node going to the next node going to the next node and so forth but anyways I think I'm getting into the weeds if you want to know more of that information you can research that so beyond the different ways to do collision resolution you can look up how these data structures are dynamically resized and when that happens and what is the cost of doing that there's all kinds of different things you can research with hash tables that I'm not going to get into in this video but that's just because this is the intro data structures course but hopefully we'll get into the stuff in more detail in the future now if you want to know more on hashing data structures there's actually a hash set which is similar but different and that's what I'm going to be talking about in the next video so stay tuned and definitely be sure to so or you will get some linear probing no propings gonna happen here I'm sorry I'm I just really suck today

Info

Channel: Caleb Curry

Views: 13,308

Rating: undefined out of 5

Keywords: hash table, hash tables, hashmap, hash map, hashing data structure, hash, hash explained, data structures, algorithms, hash collision, linear probing, quadratic probing, linked lists, associative array, hash set, computer science

Id: riO8Rgunc0o

Channel Id: undefined

Length: 20min 59sec (1259 seconds)

Published: Mon Jul 13 2020