Let's look at an introduction to the
binomial distribution, a very important discrete
probability distribution. I'm going to assume that you know
the combinations formula, also known as the binomial coefficient, because it plays a role in the binomial distribution. A coin is flipped 100 times. What is the probability heads comes up
at least 60 times? We'll see that we'll be able to use the
binomial distribution to calculate this probability. Here's another example. You buy a certain type of lottery ticket once
a week for four weeks. What is the probability you win a cash prize
exactly twice? Well, this might sound a little different
from the tossing coins example, but from a probability perspective, it's
really the same type of scenario. And we can again use the binomial
distribution to calculate this probability. Here's how the binomial distribution arises. The number of successes in n independent Bernoulli trials
has a binomial distribution. I look at the Bernoulli distribution
in another video, and other distributions like the
binomial and geometric are built on a base of independent Bernoulli trials. But let's break down what independent
Bernoulli trials means. Suppose there are n independent trials,
n is a fixed number, and by independent
trials we mean that knowing the outcome of one trial gives us no information about
the outcome on another trial. And each trial can result in one of
two possible mutually exclusive outcomes and we'll label those two possibilities
as success and failure. A success simply means the event
that we are counting up and it may very well be a bad thing,
like having a tire blowout. The probability of success on any one trial is p, and this stays constant from trial to trial.
Success and failure are
mutually exclusive and exhaustive, so failure is simply the complement of success, and the probability of failure is simply
1 minus the probability of success, or 1-p. The random variable X represents a count
of the number of successes in those n trials. Then X has a binomial distribution,
with this probability mass function. The probability the random variable X
takes on the value little x, which I'll sometimes write as p(x), is the probability we get little x successes in n trials. That is the probability of success, p, raised to the number of successes x, times (1-p), the probability of failure, raised to the number of failures, which is n-x. We multiply that by the number of ways
we can get x successes in n trials, which is n choose x. But I'm going to break that down in a
little more detail in a moment. It's not a probability distribution
until we list what values X can take on, and here we are counting up the number
of successes in n trials. X can take on only whole number values between 0 and n. The mean or expectation of a
binomial random variable X is equal to np. We can show that mathematically without too much difficulty using the formula for the mean of a
discrete random variable that we've discussed earlier, and a little algebra. But I hope this also makes a little bit of sense. If, say, we had a probability of success
on an individual trial of 0.2, and we had 100 trials, then on
average we'd get 20 successes. The variance of a binomial random variable, sigma squared, is equal to np(1-p). Again, we can show this
using the formula for the variance of a discrete random variable
and a little algebra. And I worked through the derivations of the
mean and variance in another video. Let's look at a simple example. A balanced six-sided die is rolled three times. What is the probability a 5 comes up
exactly twice? You might think this doesn't have a
binomial distribution, since there are six possibilities on
any given roll, any given trial. And that might be a problem
if the question were different. For example, if the question were: what is
the probability a 6 comes up once, a 4 comes up once, a 3 comes up once, and
no other numbers are rolled? Well, this is not a binomial situation,
and we could not answer it with the binomial distribution. We'd have to use a distribution called the
multinomial distribution. But here that's not what the question is asking. We're simply asking about the number of fives. So we will define a success as rolling a 5,
and a failure as rolling anything but a 5. A failure is simply everything that is
not a success. So even though it might not have looked
like a binomial to start off, it reduces to two outcomes on any individual trial. And if we let X represent the number
of fives in three rolls, then X has a binomial distribution with n=3 and p=1/6. The probability of success, 1/6, stays constant from trial to trial. And knowing what happens on one roll gives us no information on what
will happen on another roll, and so the trials are also independent. So the conditions of the binomial distribution are satisfied here, and to find this probability, we can put
this into our binomial formula. The probability the random variable X takes on the value 2 is equal to, our n is 3 and we need 2
successes, so 3 choose 2, times p, which is 1/6, the probability of success, the probability of getting a 5, raised
to the number of successes, which is 2. We multiply that by the probability of failure, 1-p, or 1-1/6, raised to the number of failures, which is 3-2. And if we do this calculation we'd see that this works out to 0.0694,
when rounded to four decimal places. So that's our probability of getting exactly two 5s if we roll a fair die three times. Let's look at the binomial formula in
greater detail. Why does this formula work? p raised to the x times (1-p)
raised to the n-x is the probability of one specific ordering of x successes and n-x failures. But we don't care about the ordering, we
just care about the total number of successes, so we need to multiply that by the
number of possible ways of getting x successes and n-x failures, and that is simply n choose x. Let's take a bit of a closer look at that. Let's look at the die rolling situation
where n was 3. On the first roll we can get either a
success or a failure, and then on the second roll
we can get either a success or a failure, and for each of those four possibilities, on the third roll we can get either a success or a failure, so there are 8 possibilities here.
This one on top gives us success, success, success, so I'm going to write SSS, and this one down here, say, gives us failure, failure, success, so I
would write FFS. Let's fill in the rest of those. Now suppose we wanted to find the probability, as we did in the die example, of exactly two
successes in three trials, or in other words the probability our
random variable X, representing the number of successes, takes on the value 2. Well, if we look at these possibilities,
two successes occurs three times. The individual trials are independent, so to find the probability
of a single one of the sequences, we simply multiply the probabilities on
the individual trials together. Success has a probability of p, and
failure has a probability of 1-p, so each one of those sequences individually has a probability of p^2 times (1-p) since success happens twice and failure happens once. To get the overall probability, we need
to multiply the probability of a specific ordering of x successes and n-x failures by the number of ways x successes happens. So here, the probability of getting
exactly two successes is going to be equal to 3 times p^2 times (1-p), and I'll just put that to the
first power, indicating failure happened once. But if we worked out the binomial
coefficient here, 3 choose 2, we'd see that that is equal to 3.
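If you want to check that counting argument by brute force, here is a minimal Python sketch. It assumes the die example's values (n = 3, p = 1/6), and the names `sequences`, `sequence_probability`, and `two_successes` are just illustrative. It lists every ordering of successes (S) and failures (F), keeps the ones with exactly two successes, and adds up their probabilities.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 6)  # probability of success (rolling a 5) in the die example
n = 3               # number of rolls

# Every possible ordering of success (S) and failure (F) over n trials.
sequences = ["".join(seq) for seq in product("SF", repeat=n)]

def sequence_probability(seq):
    """Multiply the per-trial probabilities; valid because the trials are independent."""
    prob = Fraction(1)
    for outcome in seq:
        prob *= p if outcome == "S" else 1 - p
    return prob

# Keep only the orderings with exactly two successes.
two_successes = [seq for seq in sequences if seq.count("S") == 2]
prob_two = sum(sequence_probability(seq) for seq in two_successes)

print(len(sequences))   # 8
print(two_successes)    # ['SSF', 'SFS', 'FSS']
print(prob_two)         # 5/72
```

Each of the three orderings has probability p^2 times (1-p), so the sum is 3 times p^2 times (1-p), which matches the 0.0694 from the die example.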
n choose x gives us, in the general setting, the number of different ways of getting x successes in n trials. And so in the general case the probability mass function for the
binomial distribution, the probability the random variable X takes on the value little x, is equal to n choose x, times p raised to the number of
successes, x, times (1-p) raised to the number
of failures, n-x. Let's look at another example.
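First, though, here is a rough Python sketch of the formula just stated, in case seeing it as code helps. The function name `binomial_pmf` is my own choice for illustration, and the check at the bottom reuses the die example (n = 3, p = 1/6).

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) when X counts successes in n independent trials with success probability p."""
    if not 0 <= x <= n:
        return 0.0  # X can only take whole-number values from 0 to n
    # (number of orderings) * (probability of any one ordering)
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Die example: probability of exactly two 5s in three rolls.
die_probability = binomial_pmf(2, 3, 1 / 6)
print(round(die_probability, 4))  # 0.0694

# The probabilities over x = 0, 1, ..., n should add up to 1 (up to rounding error).
total = sum(binomial_pmf(x, 3, 1 / 6) for x in range(4))
print(total)
```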
According to Statistics Canada life tables, the probability a randomly selected
ninety-year-old Canadian male survives for at least another year is
approximately 0.82. If twenty ninety-year-old Canadian males
are randomly selected, what is the probability exactly 18
survive for another year? Well, this might appear to be a bit more
complicated than rolling dice, but the premise is essentially the same. Here we will call a success
the man surviving for at least a year, because we're counting up the number that survive. And a failure is that they die within the next year. We have a fixed number of men, 20 of them, and since we're randomly selecting the men it's pretty
reasonable to consider them independent. Although conceivably there could be a wacky
situation like a serial killer focussed on killing ninety-year-old Canadian males, but that's probably not something we need to
worry too much about. So here the binomial distribution is reasonable. We'll let X represent the number of men
that survive for at least one more year. Then X is going to have a binomial
distribution with an n of 20 and a p of 0.82. And you might see that written as X is distributed, or has the binomial distribution, with an n of 20, and a p of 0.82. We want the probability that X takes on the value 18. We're going to use our binomial
probability mass function to calculate that. That's simply going to be equal to our n of 20, choose 18, times the probability of success, 0.82, raised to the number of successes, 18, times 1 minus the probability of
success, 1-0.82, raised to the number of failures, or 20-18. And this, rounded to three decimal places,
works out to 0.173. So the probability that exactly 18
of these randomly selected men survive for at least a year is 0.173. If we worked out the probabilities for
all the possible values of X, and plotted it, it would look like this. This is the binomial distribution when n is 20, and p is 0.82. And the probability we just worked out,
the probability that X takes on the value 18, is over here. Right here was the value
0.173 that we just worked out. If we wanted the mean number of men
who would survive, the mean of this probability distribution, that would be mu, which for the binomial
distribution is np, and in this particular case that's 20 times 0.82, and that works out to 16.4. So on average, 16.4 of the
twenty randomly selected ninety-year-old Canadian males would survive for at least one more year. The variance of a binomial distribution is np(1-p). And here that's 20 times 0.82 times (1-0.82), and that works out to 2.952. And the standard deviation is simply going to be the square root of 2.952, or about 1.718. Suppose we have a different type of question. In the same setting, what is the probability that
at least 18 men survive for another year? We need the probability of at least 18,
so let's shade those values in. We're going to need to add the probabilities of 18, 19, and 20 together. We're going to have to use the binomial
formula three times, on each one of those values individually. So let's go ahead and do that. The probability the random variable X
takes on a value that is at least 18 is the probability equals 18 plus the probability it equals 19
plus the probability it equals 20. n was 20, so X can't possibly take on a value greater than that. And to find each one of those, we simply
use the binomial formula with n=20 and p=0.82, the values given in the problem. So this is the probability of 18, this is the probability of 19, and down here is the probability of 20. And you can verify for yourself that
these work out to these values given here. I'm rounding to 3 decimal places here, but you should carry many decimal places
throughout your calculations. If we add those up, we get our final
answer of 0.275, which again is the correct answer
rounded to three decimal places. The probability that at least 18 of the 20 men survive
for at least another year is 0.275. The binomial distribution is an extremely
important discrete probability distribution, one that has many applications in
probability calculations and statistical inference, so we'll deal with the binomial distribution
very frequently in probability and statistics.
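If you'd like to verify the numbers from the survival example yourself, here is a minimal Python sketch. The function name `binomial_pmf` and the variable names are illustrative choices, and the last few lines apply the same idea to the coin question from the start (at least 60 heads in 100 flips), assuming a fair coin with p = 0.5.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(X = x) for the number of successes in n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Survival example: n = 20 men, p = 0.82 probability each survives at least a year.
n, p = 20, 0.82

p_exactly_18 = binomial_pmf(18, n, p)
# "At least 18" means adding the probabilities of 18, 19, and 20.
p_at_least_18 = sum(binomial_pmf(x, n, p) for x in range(18, n + 1))

mean = n * p                # np
variance = n * p * (1 - p)  # np(1-p)
std_dev = sqrt(variance)

print(round(p_exactly_18, 3))   # 0.173
print(round(p_at_least_18, 3))  # 0.275
print(round(mean, 1), round(variance, 3), round(std_dev, 3))  # 16.4 2.952 1.718

# Coin question, assuming a fair coin: P(at least 60 heads in 100 flips).
p_coin = sum(binomial_pmf(x, 100, 0.5) for x in range(60, 101))
print(round(p_coin, 4))
```

Summing the unrounded probabilities and rounding only at the end follows the advice above about carrying many decimal places through the calculation.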