Learn Hash Tables in 13 minutes #๏ธโƒฃ

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
all right what's going on everybody hash tables a hash table is a collection of key value pairs each key value pair is known as an entry we have two pieces of data the first is the key and the second is the value in this example let's pretend that we're a teacher and we need to create a hash table of all of our students each student has a name and a unique student id number but these can be of any data type that you would like in this example the key will be an integer and the value will be a string so how do we know at which index to place each of these entries well what we could do is take each key and insert it into a hashcode method the hashcode method will take a key as input plug it into a formula and spit out an integer this integer is known as a hash now if we're finding the hashcode of an integer in java that's actually really easy the formula is the number itself so the hash of 100 is 100 123 is 123. so on and so forth with the other keys so after finding the hash of your key now what can we do these numbers are way too large and the size of our hash table is only 10 elements what we'll do next is take each of these hashes and divide each of them by the capacity the size of our hash table we have 10 elements so take each hash divided by the capacity of our hash table whatever the remainder is we will use the remainder as an index and to find that we will use the modulus operator so 100 divides by 10 evenly the remainder is zero so 100 modulus 10 gives us a remainder of zero so spongebob's entry we will place at index 0 within our hash table so the hash of patrick's key is 123. 123 modulus 10 gives us a remainder of 3. patrick's entry will be inserted at index 3 of our hash table so here's a little shortcut if you have a number modulus 10 you could find the remainder and that is the last digit 321 modulus 10 will give us a remainder of 1. sandy's entry will be placed at index 1. then squidward's entry will be placed at index 5. 555 modulus 10 is 5 and gary's will be at index 7 following the same pattern now here's one situation what if two hashes are calculated to be the same value that is known as a collision and i can best demonstrate that with a separate example in this next example let's say that each key is now a string each entry is a pair of strings we will first need to find the hash of each of these keys so the hash code of a string uses a different formula basically speaking we're going to take the ascii value of each character within the string and plug it into this formula i went ahead and calculated the hash of each of these strings using this formula of the string hashcode method and the next steps are the same as before take each hash divided by the capacity of our hash table and find the remainder so beginning with the first hash 48625 modulus 10 gives us a remainder of 5 spongebob's entry is now at index 5 within our hash table now patrick's will be at index zero now here's sandy's sandy's will also be zero we have a collision both of these entries will be located at the same index so what do we do each of these storage locations is also known as a bucket and the most common resolution for a collision in a hash table is that we will turn each bucket into a linked list if this bucket already has an entry within the first entry we will also add an address to the location of the next entry and keep on adding more if there's more entries within this bucket so in this way this becomes a linked list if we're looking up a key we first go to the index in which it's located if there's more than one entry we will search linearly through this bucket as if it were a linked list until we find the key that we're looking for so that's the most common resolution when there is a collision but ideally you would want each of these entries to be within their own bucket based on the hash of squidward's key squidward's entry has an index of nine and gary gary has an index of five and there's another collision we will add the address of gary's location to the first entry and this bucket becomes a separate linked list this process is known as chaining the less collisions that there are the more efficient that this hash table is going to look up a value ideally you would want each entry to have their own bucket but collisions are possible to reduce the number of collisions you can increase the size of the hash table but then again the hash table is going to use up more memory then so people usually find a balance between the two so yeah those are hash tables in theory let's create our own now all right everybody so let's implement a hash table in java so we will need to declare this hash table and list the data types of our key value pairs if we need to store primitive data types we can use the appropriate wrapper class so let's store integers and strings integer and string the integers will be the key is the strings will be the values we'll map student id numbers and student names and i'll name this hash table just simply table equals new hash table there we go so in java when we create a hash table these have an initial capacity of 11 elements and a load factor of 0.75 so once 75 of our elements are filled this hash table will dynamically expand to accommodate more elements now you can set a different capacity for your hash table instead of 11 let's say 10 to be consistent with our example in the previous part of this lesson and if you would like to change the load factor you can add that as well instead of 75 let's say 50 so we would pass in a floating point number so 0.5 then add an f at the end for floating point numbers but in this example let's just keep the load factor consistent so let's start adding some key value pairs to add an element to your hash table use the put method so table dot put and then we will pass in an integer as the key and a string as the value so our first student is spongebob he has a student id of let's say 100 and let's pass in a string for the value spongebob okay that is our first student so let's add a couple more from the previous example so we have spongebob patrick with an id of one two three sandy with an id of three two one squidward with an id of five five five and gary with an id of 777 now to access one of these values you can use the get method of tables so i'll display this within a print line statement table dot get and i will pass in a key let's get the value at key number 100 so this student is spongebob how can we display all of the key value pairs of a table well we could use a for loop so i'm going to create a for loop and place this within it so to iterate over the keys of our table this is what we can write we can use an enhanced for loop so we are iterating over integers so the data type is integer key colon so to make our hash table iterable we can get all of the keys from our table and put them within a set a set is iterable so we will iterate over table dot key set method this will take all of our keys and return a set and a set is something we can iterate over and within our print line statement let's print each key key plus then maybe i'll add a tab to separate these plus table dot get then pass in whatever our key is okay so after running this this will display all of our key value pairs and if you need to remove an entry well there's a remove method table dot remove then pass in a key let's remove gary so remove the entry with this key 777 and gary is no longer within our table but we'll keep them in i'll turn this line into a comment now just to get a better understanding of where these key value pairs are being placed let's also display each hash code for each of these elements so preceding our key let's display each hash code i'll precede our key with a tab and let's display each key's hash code key dot hashcode method if we're using the hash code of integers this will return the primitive integer value represented by the key that we're passing in if we're using the hashcode method of integers well the hash is going to end up being the same integer so you can see that these numbers are the same to calculate an index we can follow the hash with modulus operator then the size of our table we set this to originally be 10. so we have gary at index seven squidward at index five patrick at three sandy at one spongebob at zero now if our data type was strings we would use a different hashing formula so let's change the data type to string and all of these keys are now strings then let's remove modulus 10 and change the data type of our for loop to strings string key okay these are the new hashes for each of our keys this key has this hash number this key has this hash number so on and so forth so different data types will have different hash code formulas now let's calculate the element in which each of these entries is going to be placed by adding modulus the size of our hash table 10. so here are the elements and we actually have two collisions we have a collision with these two keys they're both within bucket five as well as these two entries so both of these will be placed into bucket zero since there's more than one entry within the same element we will treat this bucket as a linked list and just iterate over it linearly until we reach the key that we're looking for now one way in which we can avoid collisions is to increase the size of our hash table if we set this to the default of 11 and change this to modulus 11 well then these will be placed within different buckets and you can see that they changed however we still have a collision with spongebob and squidward so what if we increase this to 21. do we have any collisions then nope we do not these keys are within their own buckets all right everybody so in conclusion a hash table is a data structure that stores unique keys to values when you declare a hash table you state the data types of what you're storing and these are reference data types each key value pair is known as an entry and a benefit of hash tables is that they have a fast insertion lookup and deletion of key value pairs but they're not ideal for small data sets since there's a lot of overhead but they're great with large data sets hashing in the context of a hash table takes a key and computes an integer it utilizes a formula which will vary based on the key as input and its data type then to calculate an index we follow this formula we take a hash modulus the capacity of our table to calculate an index each index is also known as a bucket it's an indexed storage location for one or more entries they can store more than one entry in the case of a collision and in case that happens we treat each bucket as a linked list each entry knows where the next entry is located and as we discussed a collision is when a hash function generates the same index for more than one key less collisions equals more efficiency and the runtime complexity of a hash table varies if there are no collisions the best case would be a runtime complexity of big o of one it runs in constant time in case there are exclusively collisions as in we place all of our entries within the same bucket it's going to be one giant linked list and the runtime complexity of a linked list is big o of n it runs in linear time on average the runtime complexity of a hash table will be somewhere within this range so yeah everybody those are hash tables if you would like a copy of all my notes here i'll post them in the comments section down below and well yeah those are hash tables in computer science
Info
Channel: Bro Code
Views: 101,508
Rating: undefined out of 5
Keywords: Hash Table, Hashtable, Data structure, algorithm
Id: FsfRsGFHuv4
Channel Id: undefined
Length: 13min 26sec (806 seconds)
Published: Wed Oct 20 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.