Hi, I'm Gayle Laakmann McDowell, author of Cracking the Coding Interview. Today's topic is one that you really don't want to
miss - hash tables. A hash table is possibly the most useful data structure
for interview questions. It comes up all the time both in interviews and in real life.
In fact one technique I often tell people is just, for any problem, have hash
tables at the top of your mind as a possible technique to solve the problem. So let's talk a bit about what a hash
table is. At a high level a hash table is a key value look up. So it gives you a way of, given a
key, associating a value with it for very very quick lookups. So suppose you had
some situation where you needed to associate somebody's name with some set
of information about them. A hash table would be the perfect solution for this problem
because you can just put this into the hash table and then you can say, okay
give me the data associated with Mary and then boom we can get that
information immediately. So in a hash table the key as well as the value
can be basically any type of data structure. A string is often used but it
could be a circle, a square, a person, pretty much anything, as long as you have
a hash function. So what does that mean? Well let's turn to the implementation to
talk about that. So at a high-level we do want to store the
objects in an array. So let's picture that. But how do we actually jump from
a string to a particular index in the array? Well that's what a hash function does. So
a hash function takes in a string, converts into some sort of integer, and
then remaps that integer into an index into that array. And then that's where we
can find the person we're looking for. So it's important to understand that the
hash code is not actually the index inn this array. We map actually from the key
to the hashcode and then over to the index. And one of the reasons for this is
that the array that actually stores the data from the hash table might be much
much smaller than all the available potential hash codes. And so we don't
want to have an array of three billion just because there's three billion
potential hash codes. We actually remap it into something
smaller. Now note here that two different strings could actually have the same
hash code and that's because there are an infinite number of strings out there
but a finite number of hash codes. So it's theoretically possible for Alex and
Sarah to actually have the same hash code. Additionally since we're remapping the
hashcode into an even smaller index, two things with different hash codes could
actually wind up mapped to the same index. So what do we do when this happens, which is called the
collision? There are different ways of resolving collisions and I really do
encourage you to look this up on your own time, but I'll just talk about one of
them which is called chaining. And chaining is possibly the most common one
and it's very simple. But it basically means is just, hey when there is
collisions just store them in a linked list. So rather than this being an
array of people it's actually going to be an array of a linked list of people.
And so that means though that when you get a call to say hey get the value of
Alex, you actually need to look through all the values in that linked list,
and pull out the value for Alex. Now as you'll note here this linked list contains
not just the actual person objects but the actual original keys as well. And the
reason for that is that if you only store the person object you'd see all
these people who mapped to this index but you wouldn't know which one they are. And
so you actually need to store the keys with them. So when you get a call to say get
me the value for Alex, then you actually call this hash function, you get a hash
code back, you map over to this index and then you walk through and look for the thing
with the key of Alex. So that's the basics of how a hash table operates. So
let's walk through this from start to end. We're going to have this hash table
class that underneath it has a array that actually holds the data. And the array
isn't going to be an array of the actual values it's going to be an array of the
linked list of the values. Then when someone says put the value of Alex, put the
value Alex mapping to this person, we call this hash code function that gets
us back the hashcode. Then we map from this hash code over to an index and that gets us over
to this index in the array and then we put it into this linked list. If there's
nothing else, great, then we're just going to
have a one element linked list. But there could be multiple values in there, in
which case we walk through it. Now what if there is already a value of
Alex in there? Well then we just fix that value
immediately. So let's talk now about the runtime of a hash table. Well it really depends on what we're
talking about and what assumptions we we make. In many cases we assume that we
have a good hash table and a good hash function that really distributes our
values well and so for the purpose of an interview, we often
just summarize this and say okay, getting and setting in a hash table is
constant time. In reality if something weird happens we have a bad hash
function blah blah blah, we could also say it is linear time in the very worst
case. But for the purpose of most problems we generally talk about
constant time because in the real world we're going to make pretty sure
hopefully that we have a good hash table. So this is a bit of a high level view of
what a hash table is and how it works. There's a whole lot of complexity you
can dive into here about different ways of handling collisions and exactly what
happens when a hash table, you start to put a ton of elements in the hash table,
how does it grow with new elements? Can it be resized? All that sort of stuff. I really do encourage you to go look at this stuff on your own
time. It's a fantastic thing to really
understand a lot of the complexity here. But for the purpose of a lot of
interview questions we just sort of simplify all this down and summarize
things as, let's assume we have a good hash table and so we get this beautiful
constant-time insert find and delete. So now that you understand the basics, go ahead and do try to learn these more
complex details of hash tables, but also go practice some of these basics on your
own time on your own problems. Good luck.