Data Structures in Golang - Hash Tables

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
uh hello today we're going to be talking about hash tables um today's topic is very important not only because it's one of the most basic data structures but also it is really commonly asked in coding interviews so coding interviews can also ask you to implement a hash set or a hashmap without using any built-in libraries so if you're preparing for those kind of interviews you really need to know how hash tables work so in this video we're going to talk about things like what hash tables are and also what is a hash function we also need to talk about collision handling etc and after knowing how hash tables work you'll understand why hash tables can be so fast when it's trying to insert delete or search data okay so um like always i'm going to first explain the basics with um diagrams and animations and then we we can go and code what we learned in go okay so let's dive in so there is this array and we've got a bunch of names stored in it let's say you want to search for randy you'll have to go through the whole array one by one until you find your match and this array is pretty simple but in reality arrays are going to be way much larger than this and using brute force to find randy is going to be pretty inefficient but let's say there's a way to find out which index randi is in and surprisingly we can find that out just with the name randy itself we'll be able to find where randy is right away because we know that it's stored in index 82. this is the main idea of a hash table so the thing that can turn randy into 82 is something we call a hash function we input randy into the hash function and it's going to spit out 82 and we call this output number the hash code and that will be the location of randy this is possible because when we're storing data we put randy in a hash function which will give us 82 so we store it in the location 82 and later on when we're trying to search randy we put randy in the hash function get the hash code which would be 82 and then go to that index 82 to find randy a data structure that has data stored in this way is called a hash table you can see that hash functions play a pretty important role in these kind of data structures let's look into hash functions a bit more i'm going to give an example of a hashing algorithm i'm just gonna give an easy example each letter of the word randy will be converted into his ascii code this just means that we're going to turn each letter into a number and then we sum it up and divide it with 100 and get the remainder so why is it 100 because that is the size of the array that we want to store the name into in this case the resulting hash code will always be something between 0 and 99 so we store randy at index 82 and we're going to put eric into the hash function and get 91 and store it and we're also going to insert butters stan and kyle in the same way and when you're trying to find a word let's say we want to find stan we would put stan into the hash function get the hash code and easily search it in the hash table uh this sounds perfect right this seems like the best data structure ever but there is something we need to talk about when we're dealing with hash tables and that would be something called collisions in the hash function example i just gave there might be a chance that two names can have the same hash code let's say we want to store two names randy and eric and we want to store it in an array of a size 7. if we put them in the hash function they will be giving out the same hash code so if we had stored randy in the array and then after that we try to store eric that index number is already taken by randy so we can't store eric handling these kinds of situations is called collision handling so collision handling would let us find a way to store both randy and eric in the array we'll look into two ways of collision handling open addressing and separate chaining first let's see what open addressing is if the spot for randy is already taken we just store eric in the next index and when we later on try to search eric we first go to its original position where the hash functions tells us to go and if it's not there then we'll look into the next index although we need to add an extra step when we're searching this would still be faster than just searching all the indices one by one from the beginning but with open addressing as the array gets filled more and more the names would get further and further away from where it should be and we will start to lose the benefits of hash tables so this would lead me to explaining the second collision handling method which is called separate chaining a simple and easy way to explain separate chaining would be storing multiple names in one index how by using linked lists so instead of storing the names in the indices of the array each index will hold a pointer that points to the head of a linked list that has a list of names if you don't know what linked lists are there is a video explaining that so i'll put a link in this video link list will be able to grow and shrink dynamically so we don't have much limitations toward the size of the data we want to store and using the combination of an array and a linked list makes separate chaining a very fast and effective method for hash tables so we talked about two ways of handling collisions open addressing and separate chaining in this tutorial we will be coding separate chaining so we need to look into the details of how we're going to insert delete and search the keys so from now on we will call the names to be stored as keys and the linked lists in the array index will be called buckets we call the names keys because hash tables are used for storing key and value pairs for example the name randy can be associated to a value that holds data or other information let's look at how we can insert a key into the hash table we'll first get the hash code by putting the key in a hash function then go to that certain index in the hash table array and insert it right there but because we are doing separate chaining we will have to store it in the bucket each key in a hash table will have to be unique so first we need to search the bucket and check whether that key already exists in the bucket or not and if the key doesn't exist we will then add the node represented by the key into the linked list let's add eric that has the same hash code as randy like we do in a linked list we will make a new node and we will let that new node become the head node and make that node point to the node that used to be the head node before when we're searching for a key we'll first find the bucket that holds the key by putting the key in the hash function and then go to that bucket and search the bucket one by one until we find the key and if we want to delete a key or we go to that slot just like searching and inserting we go to that slot and look for the key in the bucket and then we disconnect it from the bucket we do that by letting the previous node skip that node skip the node that we want to delete and make it point to the next node by using linked lists and arrays we're getting the benefits of an array where we can immediately access a slot and that lets us insert search and delete a key faster compared to other data structures let's talk about how fast it can be by looking into the time complexity of hash tables the time complexity of hash tables for inserting deleting and searching is expected to be constant in an average case comparing that to some other data structures we talked about in the previous videos that is pretty good you can think of hash tables with separate chaining as a combination of the benefits of arrays and linked lists the fact that arrays let us instantly access the data that lets us search insert and delete data very fast but like always there's a worst case scenario if all the keys collide with each other so all the keys that you want to store have the same hash code the hash table would become more like a linked list and it would take more time to insert search and delete so in that case the time complexity would become o of n but that is the worst case scenario in general a hash table is expected to have a constant time complexity in this tutorial we're going to implement a simple hash table with separate chaining in golang there is a type called maps in go and that actually implements the hash table itself but we also need to understand how to implement a hash table without using any built-in libraries or any built-in functions so that's what we're going to do here so let's go and do some coding so the hash table part is going to have the insert search and delete and the linked list bucket part the buckets are going to be linked lists will also have insert search and delete so first of all let's go and write down the things that we need to do so first we need to have like a hash table structure and we also need a bucket structure insert for hash table and we need a search for hash tables a delete so these three are going to be methods for the hash table and for the bucket we also need to implement insert and search and delete okay so we have to build a structure for hash table and bucket these are going to be for the hash table and these are going to be for the bucket and other than this other than the the structure the methods for hash table and the method for buckets we also need um to define a hash function and we're also going to do um init this is also going to be a function that initializes the hash table okay so for the hash table structure the bucket structure needs to be a linked list so that's why we need to have a linked list and also we need um okay um the hash table is going to hold an array so and just call it array the array size is going to be a constant so we have to find a constant for the array size because we don't want to hard code the number 7 inside the hashtable struct and that type is going to be a bucket we didn't define this yet but we will define it right after this so the array size is going to be the size of the array and because we need to use this here and there not only the hash table but when we're doing um like writing the hash function we also need to use that so i'm going to make it as a const and right now i'm just going to put 7 in it so we didn't define this yet so we're going to do this here a bucket is going to be a linked list and like we did in linked lists we're just going to put the address of the head node in the bucket and bucket node we didn't define that yet we're going to do that here and like we did in linked list node each bucket node is just going to hold the key and also it needs to have an address pointing to the next node so we'll just call it key and that is going to be a string like the names like eric randy and then the next will be an address of another node another bucket node um let's go and just uh try to like create some of the structures and just see what they have and i'll just create that okay and if i open the terminal and try to run it so we have an array this we just like printed out the whole thing at hash table the test hash table right now has an array size of seven and the array is empty right now okay each each slot of the array it has to hold a bucket so we're going to make a function that initializes this particular hash table and put a create a bucket into each each slot it's not going to take in any parameters any arguments but it is going to return a hash table an address of a hash table so and create a new hash table we're going to create a new bucket so we have to be creating seven times so we're going to put in put it inside a for loop to do that okay so right now what i did was um i created a for loop that can loop through each of the slots inside the hash table array and inside there i created a bucket and i put it in so here i'm going to put test hash table as an init function in that case this is going to return a hash table where each slot has a new bucket inside it and it's going to return that so let's see what this shows run it and unlike last time last time everything was inside the array was nil but this time in each slot of the array we're going to get an address because in each slot we created a bucket and we put it in the next step uh we're going to actually do the inserting and searching and deleting i'm going to have a pointer receiver of a hash table it's going to be a method of the hash table and it's going to take in a key like randy or eric so we'll call it key and it's going to be a string and it's not going to return anything um and we'll write what's going inside here later on we're just going to copy this and paste it here it's a search because this is the search method we're going to return true if that key is inside the hash table or false if not it's for delete so this is for the inserting searching and deleting of the hash table when we're doing these three things the first thing we need to do is to put this key inside a hash function and get the hash code call it index and we'll go hash and we'll put the key inside it we did not actually create this function yet that is going to be our next step but just we're just going to just write it out first organize so now that we need this hash function we're going to go and implement it here it needs to take in a key a string so we'll just go key string and it's going to return an integer so int okay we're going to just get the ascii code for each character sum it up and divide it by the array size and get the remainder that's what i'm going to do i'll first go sum to get the sum of each ascii um and i'll just say zero to just initialize it and then we need to use a for loop to loop through each character of the key so that's going to be our four i'm going to throw that out at the range of heat and i'm going to just put it into something right now what this is doing is um let's say if i put in eric then it's going to loop through each of the characters in eric in the string so e and then i'm going to turn that e into an integer and i'm going to add it to some sorry i'm going to add it to some the next for loop i'm going to go to r in eric and i'm going to add it to some and that's how i get i get the sum of each of the characters and i'm going to return some get the remainder of the array size let me just do a little test here i'm just going to comment this out oops uh yeah and comment i'm just going to comment everything out okay so we get four for randy which is correct okay so i just wanted to check if the hash function was working all right and for the insert after we put after we get the hash key turn it into a hash code and put it in the index we're going to go to that array slot so in that hash table there's going to be an array right and i'm going to the slot of index of that array and i'm going to insert i'm going to insert the key in this slot each hash table is going to have an array right now having seven slots each slot is going to have one bucket and inside that bucket i'm going to insert this key so that's what we're doing right now i did not define this method yet that is going to be my next step so i'm going to go and make this this method over here this right now is a bucket so it has to be a method of a bucket so i'm going to go and define insert right here so the first step would be to get a key and create a bucket node with that key um i don't like that i'm gonna go like this i'm gonna set the in the parameters k i'm going to set this new node as a head and then i'm going to make the next of this new node point to the head that was there before so right now the new node next will become the current head the head is going to become the new node so let's go and check this over here and i'm going to just create a button with nothing inside it and then okay and i'm going to start to use breakpoints instead of printing everything in the console i'm going to launch the debugging i'm going to step over and that means i'm going to see the results of this so i executed this line and we're going to go to the next so test bucket should now have randy inside it so let's go and check that so inside test bucket this is the address of the bucket and if you go inside here it means we're going to go to that address which is going to be like the actual bucket and inside there the head is going to have this address which is a bucket node and this is an address so this means we're going to go to that address which is going to be a bucket node and we're going to see the contents of that which is the key randy so we found out by debugging that we actually have the key randy stored inside this test bucket okay i'm going to stop by pressing that and so let's first write up the search method uh we're going to jump to each of the nodes inside the bucket inside the linked list and see if it's a match with the parameter k so first we need to put um the head in the current node we're going to keep on looping until we find a match so what we're going to do is we're going to jump to each of the nodes inside the bucket inside the linked list and see if it's a match with the parameter k so first we need to put um the head in the current node current node is going to indicate that we're doing the um we're going to compare that node and it's going to be like a while loop we're going to keep on looping until we find a match so we need to loop until the current node is empty so that's going to be expressed like this and we're going to find uh whether the current node is a match to k so that's going to be like okay if there is a match while we're doing the we're going to find a match for the current node if the current node key is a match with k which is what we're trying to search in that case we're going to return a true right away and if not if it's not a match we're going to update the node to the next node if we keep on doing that and we went to the end of the node and it turns out to be empty we're going to come out of that loop so we're going to come out of here and that's going to mean because we looped everything and was not there so we're going to return a false so that's the end of search uh let's try to search here um and we also search for images so so you can see that we inserted randy so when it searches for randy then it's going to be true and when it searches for eric it's going to be false so now that we have the function search we are going to put an if statement here saying that if okay is there so this means if k doesn't exist in b in that case i want to execute this and else i am going to hmm i'm just going to comment these ones out and insert randy again and let's see if we can get that statement saying randy already exists the last step we're going to implement the delete the linked list will be deleting the current node by skipping the current node and making the previous node point to the next node and we can't do that if we actually go over the previous node so that's why we're going to put the each node into a previous node and express the current node as previous node.next and everything is going to be very similar to the search method but instead of saying current node we're going to put previous node.next you know how in the search we said for current node is not nil we're going to go for previous node dot next this is the current node and while this is not nil we are going to uh loop it through and update it so we're going to put the updating part first previous previous note will be the next previous node which is the current node dot next and this is where we're going to put the if statement if this part over here if current node.key which is previous node dot next dot key is equal to k in that case we're going to delete and how we're going to do delete we're going to go pre we're going to set the previous node next to previous node next dot next but when we're doing the delete we need to be careful about just one more thing because we're putting the head in the previous node when we're missing one case where the the actual matching key is the head so we have to put in an if statement here if the key of the head is actually the match to what we're trying to delete so it's going to be b dot head so we're resetting the head of this bucket to the second node which is b dot head dot next and after we've done that we're going we want to uh come out of the method so we're just going to go return so this is going to just um just end everything and finish the method i'm just going to do a very simple test saying um instead of this i'll just go delete randy and search randy and see if it returns a false so um and so it's it's returning a false because we successfully deleted randy so now that we implemented um delete we implemented search and insert for the buckets now let's go and finish these up which is kind of like the the last step so i'm just going to uncomment everything and here this is going to be and this is going to return a boolean so we'll have to return that turn and in delete this is also right now we're going to the uh the index and then we're going to go to that bucket which is h dot array index and we're going to delete k the key okay so this is actually the end of the uh the coding now we're just gonna go and see if we can just debug it and see if it's and just check if everything is working fine or not i think and i'm going to call this particle put a break point here so i'm gonna launch it and let's see what we have so we are at the first insert we didn't execute it yet before we actually press the step over so the step over is going to let us execute this and go to the next step right now in the hash table we inserted eric we inserted eric so eric is the hash code for eric is for so we can see it over here eric is over here we're going to step over to the next is added and the next is going to be the next node is going to be eric so this is one bucket node the bucket node is going to hold the address of the next where we can follow through here so eric would be the second node um the next over and over and the next is going to be stan which hashcode is 2 so that's going to be in here stand and we're going to step over and over and it's going to be randy which is hatch code 4 again so it's updated over here so the head is going to be randy and you can see all the other nodes i kind of like pushed to the next so now that we added all of them let's start deleting some um table dot delete let's delete and we'll launch it so right now we're here in stan stan was in stan was in hash code number two and when we step over we're going to see this disappear so step over and you can see hash code hash code number two the location number two is now nil so we successfully deleted it uh we can also try to search it um and as we deleted stan it's showing false and we didn't delete kenny so it's showing true if this video was helpful and you want to see more of my videos please subscribe to this channel and thank you so much for watching until the end i don't think many people do okay thank you so much and i'll see you in the next video you
Info
Channel: Junmin Lee
Views: 12,883
Rating: undefined out of 5
Keywords: data structures, golang, leetcode, coding, programming, coding tutorial, programming tutorial, algorithms, computer science, software development, hash table, hash function, collision handling, separate chaining, open addressing, linked lists, hackerrank, hash tables in golang, hash tables in go, hash tables, maps in go
Id: zLnJcAt1aKs
Channel Id: undefined
Length: 31min 40sec (1900 seconds)
Published: Fri Sep 04 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.