Simple Explanation of HashMap, HashSet, ArrayLists and Big O: O(n), O(1)

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
in this video i'm going to describe the hash table or hash map and compare that with a more traditional array or an array list i'm going to use some magnets that i purchased in my travels from around the world and you see i've organized them here into a traditional array list or array if you wish now from here several operations are straightforward if i wanted to print out a manifest of all my magnets easy enough i simply start with the first which is new zealand and i print out new zealand then i move to the next which is peru and i print the name peru and i iterate over each one of these items shake hands with it and i print out its name if i want to add an item to the end of this collection it's also easy i simply make a new bucket and i take my london magnet and i add it to the end easy enough but now if i want to insert this item at the beginning of the collection it's much more complicated because there's already another magnet at the beginning and if i move that down i'm going to run into the same problem there's already a magnet in that space so what i need to do is start at the bottom move that magnet down one take the magnet in position seven move it to eight the magnet in position six move to seven five to six four to five three to four two to three one to two and zero to one and now i can add my new magnet at the top so if you were counting there i had to do one operation for each item in my collection the bigger my collection the more steps i have to take to insert something at the beginning of the collection now what if i want to find a specific magnet in the collection when we're analyzing performance for a find operation we typically assume worst case let's say it's the last magnet we find so let's say i want to find my isle of man magnet why did i go to the isle of man because i can but nonetheless let's look for the magnet so it's not london okay no not new zealand not peru not rio not zurich not queens not coventry not monterey not sydney well here it is it's the isle of man so see i had to have an operation on each item in this collection before i found my isle of man magnet so the more items that are in the collection the more operations i have to do to find a specific item let's consider the performance of finding an item in an array or an array list on our big o graph let's remember on the bottom axis we have n which is number of units in this case number of units we're storing and on the axis on the left we have t which is the amount of time it takes to process and in this case we want to find an item in an array or an array list and when we're considering big o we typically think worst case so this collection has nine items starting with zero and ending with eight if i look for my isle of man magnet i have to go to the first item and say are you my isle of man magnet and it says no so that's one unit of time and one item that's stored now i go to the second item are you my isle of man magnet no so that's two units of time total now for two items stored three not the isle of man magnet or number two actually but our third element our fourth element our fifth element our sixth element our seventh element our eighth element our ninth element you see that the number of elements we have in this array each one we add is one more unit of time that we have to process to find the item we're looking for right now we have nine elements which is nine n and that's nine units of time if i were to add a tenth element represented by number nine here an 11th element represented by number 10 and a 12th element represented by number 11 now if i have to find my isle of man magnet i need to traverse across all 12 of these elements in worst case so once again 12n 12t and that gives us o to the end performance the more in we add the more t it takes to process there's n now let's consider hash codes and how we can use them if you've ever looked at the documentation for java.lang.object which is the grand ancestor of every java class you've seen a method on there called hashcode and a hashcode returns an integer which is a distinct way of telling objects apart it may not be perfectly unique but it should be distinct in the vast majority of cases so that hashcode method is defined on java lying object and it's up to you as a developer to override that and generate a distinct integer for any object of that class if you use lombok annotations it puts a little intelligence in there and it tries to generate a hash code for you one that is documented very well though is the hashcode method on java.lang.string it's a fairly straightforward formula now don't worry if you don't know all of this by hand or this is a bit of a confusing formula but we'll walk through it one at a time keep in mind that in the language of computers letters map to numbers because computers talk in zeros and ones which become numbers then those numbers we can map to letters that mapping can be done by ascii ebcdic or unicode any of those encodings is simply a map of number to letter so we start with the letters we take the first letter and we find its numerical equivalent we take that number we multiply it by 31 to the power of the length of the string minus one then we take the second letter and we multiply that by 31 to the power of the length of the string minus two the third letter we take times 31 to the power of the length of the string minus 3 and so on and so forth as we're working through each of the letters now i've worked this out with rio which only has three letters so it's relatively easy to compute and we see that the letter r is essentially number 82 according to our letter to number mapping multiply that by 31 squared and we come up with the number 7802 now take the letter i which is letter number one zero five for a lower case i multiply that by thirty one to the power of one and we come up with three three thousand two hundred fifty five take o which is number 111 multiply that by 31 to the power of zero any number to the power of zero is one so we simply have one hundred eleven sum that all up and we come up with the number eighty two thousand one hundred sixty and if you take a look at rio 80 2168 there's our hash code easy enough now i've taken this table and i've computed the hash codes for each of the magnets in my magnet collection so you can see the range from london at minus 2 billion all the way up to monterey it's 702 million so this gives us a rough range of hash codes for our magnets now you might say wait a minute how can we have an equation like this with all positive numbers and how do we have so many negative numbers here well the reason is that the hashcode method returns an int and remember that an in has a range from negative 2.1 billion roughly to positive 2.1 billion so if you were to go to 2.2 billion and try to store that into an int it would essentially flip the odometer and you'd end up on the negative side so we do end up with some negative numbers because that flipping flipping the odometer but the nice thing is because this is an int we know that our absolute max range is going to be minus 2.1 billion up to positive 2.1 billion if we store this in a long we'd likely see that all these numbers would be positive but then our range would be much much bigger because a long can store a much larger number than an end so this tends to give us a nice balance of number range buckets and the like let's consider how we can use hash codes to make a more intelligent collection that offers us quicker response for add remove contains and size operations first of all we can compute our hash code on each object and then what we can do is we can give our index numbers some kind of intelligence and say that anything that was is within this range of minus 2 billion to minus 1.8 billion belongs in bucket number zero anything in the next range of minus 1.8 billion to minus 1.625 billion belongs in bucket number one and so on and so forth so we use the hash code to determine where an object should live in our collection now there are two types of collections that use this first of all a hash set a hash set takes an object and invokes the hashcode method on that object and uses that hash code to determine where the object lives in this collection a hash map on the other hand is a key value pair it's computing a hash code on the key which could be something like a driver's license number if it's the bureau of motor vehicles and then that will link us to an object perhaps my bureau of motor vehicles driving record is linked to my driver's license number so that's a hash map where we're mapping from a key to a value and the value is typically an object okay so now we've computed what our different buckets can be and typically with a hash map or a hash set we'll try to leave about 25 of it open so now that we have our buckets computed i can organize my magnet collection by their hash code so we'll start with sydney sydney is minus 1.8 billion and that belongs in bucket number one now let's go to new zealand new zealand is minus 244 million which is going to put us right about in bucket number nine go to london london is minus 2.1 billion so we're going to put that in bucket number zero isle of man isle of man is 449 million on the positive side which corresponds to bucket number 12. monterey mexico monterey is 702 million so we put that in bucket 13. peru peru is 2.4 million so we will put that looks like right bucket number 10. and then we have zurich zurich is minus 1.6 billion so we'll put that in bucket number two queens queens is minus 1.8 billion now this is interesting you see we have two things in bucket number one now and that's possible there are two ways we can handle this first of all just make a collection and put them side by side like an arraylist or take it to the next open position and put it there or we can decide to reallocate our hashmap hash set and maybe give it a bit more capacity at this time which will give us more spaces to put things but nonetheless that's what typically happens when we have two hash codes that end up in a similar bucket let's continue rio rio is 82 168 so that will go looks like right about next to peru and then coventry coventry is minus 287 million and so that one goes to bucket number eight so you see that we're able to store each of my items but they're not stored in the same order that i inserted them instead they're stored by the order of the hash code the nice thing about this is it's easy to find an item let's consider how we could find something specific like the isle of man magnet the isle of man magnet is 449 million about 450 million on the positive side that's the hash code for the isle of man magnet subtract from that the lowest hash code that we have which is 2.1 billion roughly about 2 billion we come up with a difference of about 2.5 billion now take that 2.5 billion and divide it by the range of each of these buckets so each of these buckets has a constant range in other words the difference between the low value and the high value for each of these buckets it's about 200 million if we take that 2.5 billion the difference between the hash code of the isle of man and our very lowest hash code and we divide it by about 200 million we come up with the number 12. and you see we're able to find exactly where the isle of man magnet belongs without having to introduce ourselves to all of the magnets before because we have a simple algorithm that points us directly there regardless of the size of the collection it's going to point us directly to that location so you see that the collection can be this big can be this small no matter what it's always going to be the same algorithm that's going to point us directly to that address let's consider where the hashmap occurs on our big o notation table there are four operations that we want to consider adding an item to a hashmap removing an item from a hashmap seeing if an if a hashmap contains an item and also looking at the size of the hash map luckily all of these formulas can be accomplished with a couple of very simple algorithms to find the increment or the range of each bucket we simply take the highest hash code subtract from it the lowest hash code and divide by the number of buckets so as our hash map grows the only thing that changes is this denominator the number of buckets and really it doesn't take a computer much more time to divide by 10 as it does to divide by 100 or 1000 so that's relatively constant now to find an item in a hash map what we're going to do to approximate where it is is take a hash code of that item subtract from it the minimum hash code of our hash map and then divide by increment so once again dividing by 10 dividing by 100 dividing by 1000 is roughly equally easy for a computer so for the big operations that we have add remove contains size it's constant or o to the one it's simply running through these algorithms so you notice that as the n increase finding an item adding an item removing an item is going to be constant time with a hash map or a hash set where if we look at a traditional array each new item we add is one more unit of time to do an add remove find or size operation
Info
Channel: Brandan Jones
Views: 7,165
Rating: 4.9867988 out of 5
Keywords:
Id: FhNJ6aikTVI
Channel Id: undefined
Length: 14min 58sec (898 seconds)
Published: Sun Sep 20 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.