Let's look at an introduction to the
Poisson distribution, an important discrete probability
distribution. Suppose we are counting the number of occurrences of an event in a given unit of time or distance or
area or volume. For example, we might be counting up the
number of car accidents in a day, in a city like Toronto perhaps. Or the number of dandelions in a square metre plot of land. Then the number of events is going to be a
random variable that may or may not have the Poisson distribution, depending on the specifics of the situation. But a Poisson random variable is a count
of the number of occurrences of an event. I'm going to phrase the
following in terms of time, but the same ideas hold if we are
discussing distance or area or volume etc. Suppose events are occurring
independently. In other words, knowing when one event happens gives absolutely no information about
when another event will occur. And the probability that an event occurs
in a given length of time does not change through time. In other words, the theoretical rate
at which the events are occurring does not change through time. A little more loosely we might say that the events
are occurring randomly and independently. If these conditions hold,
then the random variable X, which represents the number of
events in a fixed unit of time, has the Poisson distribution. Here's the probability mass function for
the Poisson distribution, what we'll use to calculate probabilities. The probability the random variable X
takes on the value little x, which you'll sometimes see written as p(x), is equal to lambda^x times e^(-lambda) over x!. Here e, like pi, is an important mathematical
constant, the base of natural logarithms. It is approximately 2.71828,
but it is an irrational number, so its decimal expansion never terminates or repeats. We've discussed factorials previously,
but as a specific example of this x!, 5! would be 5 times 4 times 3 times 2 times 1, and that would be 120. It's not a probability distribution
until we say what values X can take on. Here the random variable is a count of the
number of events in a given unit of time, and so it can take on any non-negative
whole number value. So this is the probability mass
function that we use to calculate probabilities for any value of x that's 0, 1, 2, off to infinity. There is no upper bound on the
value that X can take on. But depending on the
situation the probabilities eventually get tiny for large values of X. The mean of the Poisson distribution is lambda. So mu, the mean of the random variable X,
is equal to lambda. So we could have used mu
as our parameter, and some sources do that. But we often use lambda for the Poisson distribution. The variance of the Poisson distribution, which we'll label as sigma squared, is
also equal to lambda. For the Poisson distribution, the mean
and the variance are equal. Let's look at an example. Plutonium-239 is an isotope of plutonium
that is used in nuclear weapons and reactors. One nanogram, or 1 billionth of a gram, of
plutonium-239 will have an average of 2.3 radioactive decays per second. And the number of decays in a given
period will follow, to a very close approximation,
a Poisson distribution. Here we'd like to know:
what is the probability that in a randomly selected two second period
there are exactly 3 radioactive decays? We'll let the random variable X represent the number of decays in a
two second period. Lambda is the mean number
of decays in that period. So here we have an average of 2.3
radioactive decays per second, but we're talking about a two second period and so in that period,
the mean number of occurrences, which is going to equal lambda,
is going to be 2.3 times 2, which is 4.6. And so X has a Poisson distribution with lambda equal to 4.6. We want to find the probability that the
random variable X takes on the value 3. And the Poisson probability mass function is lambda^x times e to the
minus lambda, over x!. Here that's going to be lambda, which is 4.6, raised to the third power, times e to the -4.6, divided by 3!. If we worked that out on a calculator or computer we'd see that's equal to 0.163, when rounded to three decimal places. So that is the probability of getting
exactly three radioactive decays in a two second period. If we were to
calculate the probabilities for the different possible values of X and
plot them, we'd get this. This is the probability
distribution of the random variable X in this example, a Poisson distribution
with lambda equal to 4.6. The number we calculated, the probability
that X takes on the value 3, is here. That's what we just calculated to be 0.163. We can see here that X takes on the
possible values 0, 1, 2, on up. I've truncated the plot over here at 15, since the possible values go off to infinity, but the probabilities start getting very
very small. But for a Poisson distribution there is
no upper bound on the values the random variable X can take on. For this distribution, the mean mu is
equal to lambda, and that's 4.6 here. The variance is also equal to lambda, so that's also equal to 4.6. And if we wanted the standard deviation
sigma, we'd simply take the square root of 4.6. If we look closely we can see that
there's a hint of right-skewness in this distribution. The Poisson distribution has some right skewness, but it depends on the value of lambda. When lambda is large, the distribution will
be close to symmetric; when lambda is close to 0, the right
skewness can be pretty strong. Suppose we wanted a different probability, the probability there are no more than three radioactive decays. Here that's the red bit on the plot. We'd need to work out the probability of 0, of 1, of 2, and of 3 using the Poisson probability mass function, and add them together.
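
Here's a minimal sketch in Python of the calculations for this example, using only the standard library. The function name poisson_pmf and the cutoff of 60 terms in the mean and variance check are just illustrative choices, not anything special; the cutoff works here because the probabilities beyond that point are negligible when lambda is 4.6.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

# 2.3 decays per second on average, over a two second period.
lam = 2.3 * 2

# Probability of exactly 3 decays.
p_exactly_3 = poisson_pmf(3, lam)

# Probability of no more than 3 decays: add up P(0), P(1), P(2), P(3).
p_at_most_3 = sum(poisson_pmf(x, lam) for x in range(4))

print(round(p_exactly_3, 3))  # 0.163
print(round(p_at_most_3, 3))  # 0.326

# Numerical check that the mean and variance both equal lambda.
# X has no upper bound, so the sums are truncated at an arbitrary
# cutoff (an assumption that is fine for lambda = 4.6).
cutoff = 60
mean = sum(x * poisson_pmf(x, lam) for x in range(cutoff))
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in range(cutoff))
sd = math.sqrt(var)

print(mean, var, sd)
```

The mean and variance printed at the end should both be very close to 4.6, and the standard deviation is just the square root of that.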
So let's go ahead and do that. Here we need to find the probability
that the random variable X takes on a value less than or equal to 3, which is the sum of the probabilities of 0, 1, 2, and 3. We put these values of x into the Poisson probability mass
function with a lambda of 4.6, and when rounded to three
decimal places, these probabilities work out to these four values, and they sum to 0.326. Working out probabilities like this can
be a bit of a pain if there are a lot of values, so we often rely on software to carry out
the calculations. There is an important relationship that
sometimes helps us determine whether a random variable has a Poisson distribution. The binomial distribution tends toward the Poisson distribution as n tends to infinity, p tends to 0 and np stays constant. For us, at the moment, the important bit
is that the Poisson distribution, with lambda equal to np from the binomial distribution, closely approximates the binomial distribution if n is large and p is small. In fact, this is why the number of radioactive decays of plutonium
has a Poisson distribution. Even for a tiny bit of plutonium, there are a very large number of atoms, and each one has a tiny probability of
experiencing a radioactive decay in a two second period. So in the example we just worked through, it was in its underlying nature a
binomial problem with a very large n and a very small p. And that's why the
number of radioactive decays is very well approximated by the Poisson distribution. I have videos that explore this relationship in greater detail. Like many models in
probability and statistics, the Poisson distribution is typically
used as an approximation to the true underlying reality. In most situations where we use the Poisson, we know that the Poisson distribution
doesn't fit the scenario precisely but we use it as an approximation.
Possibly a very good approximation. But it can be difficult to determine whether a random variable has a Poisson
distribution to a reasonable approximation. So I'm going to look at a few examples and
discuss some considerations in another video.
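
As a rough numerical check of the binomial relationship described earlier, here's a small Python sketch. It holds np fixed at 4.6 while n grows, and compares the binomial probability of exactly 3 with the Poisson probability. The values of n are arbitrary illustrative choices, not the actual number of atoms in a nanogram of plutonium, and the function names are my own. It needs Python 3.8 or later for math.comb.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials, success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

lam = 4.6  # np is held constant at this value

# Illustrative values of n only; p shrinks as n grows so that n * p = lam.
for n in (10, 100, 10_000, 1_000_000):
    p = lam / n
    print(n, binomial_pmf(3, n, p), poisson_pmf(3, lam))
```

As n gets larger and p gets smaller, the binomial probability should move closer and closer to the Poisson probability.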