Data Structures: Hash Tables

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Hi, I'm Gayle Laakmann McDowell, author of Cracking the Coding Interview. Today's topic is one that you really don't want to miss - hash tables. A hash table is possibly the most useful data structure for interview questions. It comes up all the time both in interviews and in real life. In fact one technique I often tell people is just, for any problem, have hash tables at the top of your mind as a possible technique to solve the problem. So let's talk a bit about what a hash table is. At a high level a hash table is a key value look up. So it gives you a way of, given a key, associating a value with it for very very quick lookups. So suppose you had some situation where you needed to associate somebody's name with some set of information about them. A hash table would be the perfect solution for this problem because you can just put this into the hash table and then you can say, okay give me the data associated with Mary and then boom we can get that information immediately. So in a hash table the key as well as the value can be basically any type of data structure. A string is often used but it could be a circle, a square, a person, pretty much anything, as long as you have a hash function. So what does that mean? Well let's turn to the implementation to talk about that. So at a high-level we do want to store the objects in an array. So let's picture that. But how do we actually jump from a string to a particular index in the array? Well that's what a hash function does. So a hash function takes in a string, converts into some sort of integer, and then remaps that integer into an index into that array. And then that's where we can find the person we're looking for. So it's important to understand that the hash code is not actually the index inn this array. We map actually from the key to the hashcode and then over to the index. And one of the reasons for this is that the array that actually stores the data from the hash table might be much much smaller than all the available potential hash codes. And so we don't want to have an array of three billion just because there's three billion potential hash codes. We actually remap it into something smaller. Now note here that two different strings could actually have the same hash code and that's because there are an infinite number of strings out there but a finite number of hash codes. So it's theoretically possible for Alex and Sarah to actually have the same hash code. Additionally since we're remapping the hashcode into an even smaller index, two things with different hash codes could actually wind up mapped to the same index. So what do we do when this happens, which is called the collision? There are different ways of resolving collisions and I really do encourage you to look this up on your own time, but I'll just talk about one of them which is called chaining. And chaining is possibly the most common one and it's very simple. But it basically means is just, hey when there is collisions just store them in a linked list. So rather than this being an array of people it's actually going to be an array of a linked list of people. And so that means though that when you get a call to say hey get the value of Alex, you actually need to look through all the values in that linked list, and pull out the value for Alex. Now as you'll note here this linked list contains not just the actual person objects but the actual original keys as well. And the reason for that is that if you only store the person object you'd see all these people who mapped to this index but you wouldn't know which one they are. And so you actually need to store the keys with them. So when you get a call to say get me the value for Alex, then you actually call this hash function, you get a hash code back, you map over to this index and then you walk through and look for the thing with the key of Alex. So that's the basics of how a hash table operates. So let's walk through this from start to end. We're going to have this hash table class that underneath it has a array that actually holds the data. And the array isn't going to be an array of the actual values it's going to be an array of the linked list of the values. Then when someone says put the value of Alex, put the value Alex mapping to this person, we call this hash code function that gets us back the hashcode. Then we map from this hash code over to an index and that gets us over to this index in the array and then we put it into this linked list. If there's nothing else, great, then we're just going to have a one element linked list. But there could be multiple values in there, in which case we walk through it. Now what if there is already a value of Alex in there? Well then we just fix that value immediately. So let's talk now about the runtime of a hash table. Well it really depends on what we're talking about and what assumptions we we make. In many cases we assume that we have a good hash table and a good hash function that really distributes our values well and so for the purpose of an interview, we often just summarize this and say okay, getting and setting in a hash table is constant time. In reality if something weird happens we have a bad hash function blah blah blah, we could also say it is linear time in the very worst case. But for the purpose of most problems we generally talk about constant time because in the real world we're going to make pretty sure hopefully that we have a good hash table. So this is a bit of a high level view of what a hash table is and how it works. There's a whole lot of complexity you can dive into here about different ways of handling collisions and exactly what happens when a hash table, you start to put a ton of elements in the hash table, how does it grow with new elements? Can it be resized? All that sort of stuff. I really do encourage you to go look at this stuff on your own time. It's a fantastic thing to really understand a lot of the complexity here. But for the purpose of a lot of interview questions we just sort of simplify all this down and summarize things as, let's assume we have a good hash table and so we get this beautiful constant-time insert find and delete. So now that you understand the basics, go ahead and do try to learn these more complex details of hash tables, but also go practice some of these basics on your own time on your own problems. Good luck.
Info
Channel: HackerRank
Views: 1,055,544
Rating: 4.6709404 out of 5
Keywords:
Id: shs0KM3wKv8
Channel Id: undefined
Length: 6min 25sec (385 seconds)
Published: Tue Sep 27 2016
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.