Let's look at an introduction to the hypergeometric distribution, another important discrete probability distribution. I'm going to assume that you know the combinations formula, also known as the binomial coefficient,
both its meaning and how to calculate it, because it's going to play a big role in
the hypergeometric distribution. If you don't recognize this, you should look
into it before watching this video. I'm also going to assume that you've
previously been introduced to the binomial distribution, because I'm going to be comparing the
binomial and hypergeometric distributions here. Let's look at an example to start. An urn contains 6 red balls and 14 yellow balls. (These types of urn and balls problems are
classic hypergeometric problems.) Five balls are randomly drawn without replacement. What is the probability that exactly 4 red balls are drawn? An important point to note here is that the sampling is done without replacement, and by that I mean that once a ball is chosen, we look at the color and set it aside, and it cannot be chosen again. That implies that the trials are not independent. Knowing what happens on one trial gives
some information about the probabilities on other trials. Because the trials are not independent, the binomial distribution would not be appropriate here. Before we take a formal look at the hypergeometric distribution, let's calculate this probability by thinking
through the underlying logic. If we are randomly selecting five balls, then any sample of five balls is equally likely. The probability of getting exactly four red balls is the number of samples that result in
exactly four red balls and one yellow ball (since we're picking five balls and four
must be red, one must be yellow), divided by the total number of possible samples of size 5. Recall that there were 6 red
balls and 14 yellow balls, for 20 balls in total. The total number of possible samples then is 20 choose 5. This is the combinations formula, the
number of ways of picking 5 balls from 20. In the numerator we need the number of
ways of getting four red balls and one yellow ball. There are 6 red balls, and from those we must pick 4; there are 14 yellow balls, and from those we must pick 1. So the probability is 6 choose 4 times 14 choose 1, over 20 choose 5. If we use our combinations formula properly here, we'll see that this works out to 15 times 14 over 15,504, or 210 over 15,504. To 5 decimal places this is 0.01354. It's not appropriate to use the binomial distribution here. Since the sampling is done without
replacement, the trials are not independent. The probability of getting a red ball will change from trial to trial depending on what happened in other draws. For example, on the first draw, since there are 6 red balls and 14 yellow, the probability of getting a red ball on that first draw is 6 out of 20 or 0.3. But suppose the first draw is a red ball. Then on the second draw the probability of getting a red ball is now only 5 out of 19. There are 5 red balls left out of 19 total, and that's about 0.263, a little less than 0.3. And so the probability of success on any
individual trial depends on what has happened on the other trials. The trials are not independent and independence is one of the necessary conditions for the binomial distribution to hold. Suppose instead that the sampling had
been done with replacement, meaning that when a ball is chosen we
look at the color and count it, but place it back in the urn so that it might be chosen again. The probability of getting a red ball on
any given trial is simply 6 out of 20, regardless of what happened on the other trials. And since the trials would be independent here, the binomial distribution would be appropriate. I'm not going to go into the details, but the appropriate calculation uses the binomial formula with 5 trials and a success probability of 0.3: 5 choose 4, times 0.3 to the power 4, times 0.7 to the power 1. (You can see my video on the binomial distribution for more information.) To 5 decimal places this works out to 0.02835. Compare that to the probability we found
previously using the hypergeometric distribution, when the sampling was done without replacement. There we found a probability of 0.01354. The probability found from the binomial distribution with replacement is more than twice that. So if we were to mistakenly use the binomial distribution in the without replacement case, our calculated probability would be quite a bit off. In our hypergeometric distribution
example we simply thought through the problem and came up with the probability using some logic. But now let's give a little more formal
introduction to the hypergeometric distribution. Suppose we are randomly sampling n
objects without replacement from a source in which the number of successes is a and the number of failures is capital N - a. There are capital N objects altogether. There are only two types of objects: the successes, and there are a of those, and the failures, and there are N-a of those. And we're going to let the random variable X represent the number of successes in the sample. Then the random variable X has the
hypergeometric distribution, with this probability mass function. The probability that the random variable X takes on the value little x, which I'll sometimes write as p(x), is equal to a choose x, times N-a choose n-x, divided by N choose n. In order to get little x successes from the total number of successes a, we must pick x of them, and from the N-a failures we must choose n-x of those. The denominator is simply the total number of samples, the number of ways of picking little n objects from capital N objects total. What values can X take on here?
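Before answering that, here is a minimal Python sketch of the probability mass function just described, checked against the urn example from earlier. The function name `hypergeom_pmf` and its argument order are my own choices for illustration, not anything from the video; the sketch assumes Python 3.8 or later for `math.comb`.

```python
from math import comb

def hypergeom_pmf(x, N, a, n):
    """P(X = x) when n objects are drawn without replacement from
    N objects, a of which are successes (hypothetical helper)."""
    # A sample of size n cannot contain fewer than 0 or more than n successes.
    if x < 0 or x > n:
        return 0.0
    # comb(a, x) is 0 when x > a, and comb(N - a, n - x) is 0 when
    # n - x > N - a, so other impossible values also get probability 0.
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)

# Urn example: 6 red and 14 yellow balls, 5 drawn, exactly 4 red.
urn = hypergeom_pmf(4, N=20, a=6, n=5)
print(f"{urn:.5f}")  # 0.01354
```

The numerator and denominator here are the same counts used in the urn calculation: 6 choose 4 times 14 choose 1, over 20 choose 5.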
What are the possible numbers of successes? Well, it's the number of successes, so it can only take on whole number values. The minimum and maximum numbers are a little bit messy. The number of successes in the sample can't possibly take on a value bigger than the number of objects we are choosing, so it couldn't possibly be bigger than n, and it also can't take on a value bigger than the number of successes in the population, so the maximum value X can take on is going to be the minimum of a and n. As for the minimum, we know that the number of successes can't possibly be less than 0, but it also can't be less than this quantity. To make that a little easier to see I'm going to write this as little n - (N-a), and this is the number of objects we are sampling minus the total number of failures in the population. Even if every failure in the population ended up in the sample, the rest of the sample would have to be successes, so the number of successes has to be at least that quantity. So the minimum value X can take on is the maximum of 0 and this quantity. The mean of a hypergeometric distribution is equal to n times the number of successes a
over the total number of objects, capital N. In other words, n times the proportion
of successes in the population. Note that this looks a little bit like np, which is the mean of a binomial random variable, and it's pretty much the same thing here. There's also a formula for the variance,
but it's a little ugly and I'm going to leave it out here. If you need it you can easily look it up. Let's look at a different example.
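Before that example, the range of X and the mean described above can be checked numerically. This is only a sketch under my own naming: `hypergeom_pmf` and `support` are hypothetical helpers, and the second set of numbers (drawing 16 balls from the same urn) is an illustration I made up to show a nonzero minimum.

```python
from math import comb

def hypergeom_pmf(x, N, a, n):
    """P(X = x): n draws without replacement, a successes among N objects."""
    if x < 0 or x > n:
        return 0.0
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)

def support(N, a, n):
    """Possible values of X: from max(0, n - (N - a)) up to min(a, n)."""
    return range(max(0, n - (N - a)), min(a, n) + 1)

# Urn from the first example: 6 red, 14 yellow, 5 balls drawn.
N, a, n = 20, 6, 5
xs = list(support(N, a, n))
total = sum(hypergeom_pmf(x, N, a, n) for x in xs)
mean = sum(x * hypergeom_pmf(x, N, a, n) for x in xs)

print(xs)                      # [0, 1, 2, 3, 4, 5]
print(round(total, 10))        # 1.0, the probabilities sum to one
print(round(mean, 10), n * a / N)  # both 1.5, matching n times a over N

# Made-up illustration: drawing 16 balls forces at least 16 - 14 = 2 reds.
print(list(support(20, 6, 16)))  # [2, 3, 4, 5, 6]
```

Summing x times p(x) over the possible values gives the same number as n times a over N, which is the mean formula stated above.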
Suppose a large high school has 1100 female students and 900 male students for 2000 students in total. A random sample of 10 students is drawn and we want to find the probability that exactly seven of the selected students are female. Here, although it doesn't state it explicitly, it's implied that the sampling is done without replacement. If, say, your boss asked you to get a sample of 10 people and you come back with 2 people and tell your boss you sampled one of
them 6 times and the other one 4 times, you'll likely be looking for another job in the very near future. If we let the random variable X
represent the number of female students selected, then we need to find the probability
that the random variable X takes on the value 7. Again, I strongly recommend in this type of problem that you don't try to put the values into the
formula and you simply try to think it through logically. Most people find it easier to find the
correct probabilities that way. Here the denominator is going to be the
total number of possible samples, and we are picking 10 students from 2000, and so the denominator is going to be 2000 choose 10. The numerator is the number of ways of getting
exactly 7 female students, and from those 1100 female students we must pick 7. But we're not done yet. In order to get
exactly seven female students in a sample of 10, we must also pick 3 male students, so from the 900 male students we must choose 3. The probability of getting exactly 7 females is 1100 choose 7 times 900 choose 3 divided by 2000 choose 10. Note that 1100 plus 900 is equal to 2000, and 7 plus 3 is equal to 10. This is not a coincidence, and it
will work out like that if done properly, and so that can be a useful double check on your calculations. To 6 decimal places this works out to 0.166490. If you feel the need to use the formula
for the probability mass function on the previous slide, then capital N represents the total number of objects, or here the total number of students,
and we had 2000 students. Little n represents the number of objects
that we're sampling and that is 10. a is the total number of successes in the population, and since we're counting up the number of females, we're calling getting a female student a success, so a is the total number of female students, 1100. If we put all of these into the formula for the probability mass function from the previous slide, with x equal to 7, we'd get the same expression: 1100 choose 7 times 900 choose 3, divided by 2000 choose 10. It might be informative to try that once, but most people find it easier if we just think it through logically and not rely on the formula. What if we had ignored the fact that
the sampling was done without replacement, and we used the binomial distribution instead? What if we simply said the probability of
getting a female student on any given trial is simply the number of female students we had, 1100, over the total number of students, 2000, and we said that that was 0.55 on each trial, and we ignored the fact that that's changing from trial to trial? If we put this into the binomial formula, 10 choose 7 times 0.55 to the power 7 times 0.45 to the power 3, we would see that this works out to 0.166478. But since the sampling was done without replacement,
that is not the correct probability. Recall that when we used the
hypergeometric distribution on the last page, we found that the correct probability was 0.166490. Wait a minute, these two probabilities are pretty close. The incorrect one calculated with the binomial distribution is pretty darn close to the correct probability for this example. And that leads us to this point: the binomial distribution can sometimes be used to provide a reasonable approximation to the hypergeometric distribution. In most cases it will provide a reasonable approximation if we're not sampling a very large proportion of the population. And as a very rough guideline if we are
not sampling more than 5% of the population, the binomial distribution would provide
a reasonable approximation. Why would we want to use the binomial
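To see that guideline at work, here is a short Python sketch that compares the exact hypergeometric probability with the binomial approximation for both examples so far. The helper names `hypergeom_pmf` and `binom_pmf` are my own, and the 5% cutoff in the code is just the rough guideline mentioned above, not a strict rule.

```python
from math import comb

def hypergeom_pmf(x, N, a, n):
    """Exact probability: n draws without replacement, a successes among N."""
    if x < 0 or x > n:
        return 0.0
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    """Binomial probability: n independent trials, success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# (label, x, N, a, n) for the two examples in this video
examples = [("urn", 4, 20, 6, 5), ("school", 7, 2000, 1100, 10)]

results = {}
for label, x, N, a, n in examples:
    exact = hypergeom_pmf(x, N, a, n)
    approx = binom_pmf(x, n, a / N)   # treat p as fixed at a/N on every draw
    fraction = n / N                  # share of the population being sampled
    results[label] = (exact, approx, fraction)
    within = "yes" if fraction <= 0.05 else "no"
    print(f"{label}: exact={exact:.6f} binomial={approx:.6f} "
          f"sampled={fraction:.1%} within 5% guideline: {within}")
```

In the urn example we sample 25% of the balls and the two answers are far apart (about 0.0135 versus 0.0284). In the school example we sample 0.5% of the students and the two answers agree to about four decimal places.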
distribution as an approximation? Why wouldn't we simply use the
hypergeometric distribution if it's the appropriate distribution? Well it turns out that in some cases the
binomial distribution is easier to work with. In some probability calculations and statistical inference scenarios, the true underlying reality might imply
a hypergeometric distribution, but the binomial distribution might
provide a very good approximation, and might be much easier to work with. If we look back at this example we were
sampling only 10 people out of 2000 total which is 0.5%. The guideline tells us that the binomial
distribution would provide a reasonable approximation here. Why is this? Here 55% of the population is female, so the probability the first student selected is female is 0.55. But as students are selected the probability of selecting a female student
is going to change a little bit, depending on what students were selected before. But since we're only sampling a small proportion of the population, the probability is not going to change very much. For example, suppose the first three students selected were female. Then the probability the next student selected is female is 1097, the number of remaining female students, over 1997, the total number of students remaining. This is about 0.549, a little bit less than 0.55, but still pretty close. So this probability changes only a
little bit and the binomial distribution, which assumes a constant probability of success regardless of what happens on the other trials, provides a very reasonable approximation in this situation. One last thing. These methods can be
extended to more than two groups, and let's take a quick look at that. Suppose that in the US a business employs 12 Democrats, 24 Republicans, and 8 independents. If a random sample of 6 employees is drawn, and suppose without replacement again, what is the probability there are 3 Democrats, 2 Republicans, and 1 independent in the sample? If we didn't rely on the formula in the earlier examples, and we understood the underlying logic, we can extend those methods to this type of situation, where there are three or more groups instead of just two. Here there are 12+24+8 people, or 44 altogether. So when we're calculating our probability,
the total number of possible samples, which we put in the denominator is going to be 44 choose 6, because we're picking 6 people from 44. In the numerator we need the number of
ways of getting 3 Democrats, 2 Republicans, and 1 independent. From the 12 Democrats we must pick 3, and from the 24 Republicans we must pick 2, and from the 8 independents we must pick 1. So the probability is 12 choose 3 times 24 choose 2 times 8 choose 1, divided by 44 choose 6, and all of this works out to 0.0688. So the methods of the hypergeometric distribution can be extended to more than two types of object. This is sometimes called the multivariate hypergeometric distribution, and this example is simply a very quick introduction to that.
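The same counting logic can be sketched in Python for any number of groups. The function name `multivariate_hypergeom_pmf` and its list-based arguments are my own illustrative choices, and the sketch assumes Python 3.8 or later for `math.comb` and `math.prod`.

```python
from math import comb, prod

def multivariate_hypergeom_pmf(counts, group_sizes):
    """Probability of drawing exactly counts[i] objects from group i when
    sum(counts) objects are drawn without replacement (hypothetical helper)."""
    n = sum(counts)        # sample size
    N = sum(group_sizes)   # population size
    # Numerator: ways to pick the required number from each group.
    ways = prod(comb(size, k) for size, k in zip(group_sizes, counts))
    # Denominator: total number of possible samples of size n.
    return ways / comb(N, n)

# 12 Democrats, 24 Republicans, 8 independents; sample of 6 with 3, 2 and 1.
p = multivariate_hypergeom_pmf([3, 2, 1], [12, 24, 8])
print(f"{p:.4f}")  # 0.0688

# With only two groups this reduces to the ordinary hypergeometric case,
# for example the urn with 4 red and 1 yellow out of 6 red and 14 yellow.
two_groups = multivariate_hypergeom_pmf([4, 1], [6, 14])
print(f"{two_groups:.5f}")  # 0.01354
```

As a double check in the spirit of the earlier examples, the group sizes add up to 44 and the counts add up to 6, matching the 44 choose 6 in the denominator.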