Let's take a look at an introduction to
discrete random variables. Suppose we're about to roll a die four
times and record the number of sixes. The number of sixes in those 4 rolls is
a random variable that will eventually take on a value. When we carry out the rolling and count it up, we know that we're going to get either 0 sixes, 1 six, 2 sixes, 3 sixes, or 4 sixes. We simply don't know which one
of these five values is going to happen. So the number of sixes is a random variable that will eventually take on one of
these 5 possibilities. And these five possibilities are going
to have different probabilities of occurring. A random variable is a variable that
takes on numerical values according to some sort of chance process, meaning there is some sort of randomness involved. Random variables can be either
discrete or continuous. And this is going to be an
important distinction for us because we're going to have to handle these
situations a little bit differently. Discrete random variables can take on a
countable number of possible values. So if we think back to our die example, we had the possible values 0, 1, 2, 3, and 4. There were 5 possible values, so that's
a countable number of possible values. Another way of thinking about this is
that discrete random variables can take on a value from a set of distinct possible values like these. Now discrete random variables
do not always represent a count. We might have a discrete random variable
that takes on the values 1.5, 2.5, and 3.5. These aren't count data, but we have
these three possible values that our random variable can take on, and as such it's going to be a discrete random variable. Continuous random variables, on the other hand, can take on any value in an interval. So suppose we had some container that
had a maximum capacity of 2 litres, and we went outside and put that in our backyard and we intended on coming out in a couple of weeks
and seeing how much water was in there. Well the amount of water in that
container is going to be a random variable that can take on any value between 0 and 2, so any value in the interval 0 to 2 litres. But it can take on any value in there, so
not just 0, 1, and 2, not just distinct values like the ones up here, but anything in this interval. So there is an infinity of possible values in here: 0, or 0.3259862 litres, or any other value between 0 and 2. So there is
a continuum there, an infinity of possible values between
these two endpoints. Here are a couple of examples of discrete
random variables. The number of lottery tickets
purchased until the first winning ticket. Well, we might get a winning ticket on
our first ticket. Or we might have to wait until our second
ticket, or we might have to wait until our third ticket. And there's no upper bound, so this is
going off to infinity there. But even though this is going off to infinity, this is still a countable number of values.
This is still a discrete random variable. How about the number of courses a randomly
selected university student is taking? Well, the possible values here are 0 (in some situations a person could still
be considered a university student while taking 0 courses, for instance some graduate students), or 1, or 2, or 3. There's some maximum here, but it depends
on the university what that would be. So there's some upper bound, maybe five or six or seven or eight, but that would depend on the specific
university. So that is still a countable number of values here,
so it is still a discrete random variable. Here are a couple of examples of continuous
random variables. The time until a newly-released website
gets its first hit. Well that's got to take on some value
greater than 0, right, so let's just say greater than 0. And the height of a randomly selected
adult Canadian male, again that's just going to take on some value greater than 0. There is some minimum value, we don't
have an adult Canadian male that's less than two centimeters tall, say. But there is no natural lower bound
here, no natural upper bound. So let's just say some value greater than 0, even though in practical cases
there's some more practical range of values. In either one of these cases, there aren't discrete jumps. It's not like a person is 174 cm tall and the next possibility is 175 cm tall. Anything in between is possible; there's an infinity of
values between those two possibilities. So these are continuous random variables. Let's go back to discrete random
variables and look at an example. Approximately three percent of the United States
adult population is under correctional supervision, meaning they're either in jail or on
probation or on parole. Now this 3% is approximately true, but let's pretend that's exactly true
for the purposes of this example. Suppose we randomly sample 2 US adults.
And we let capital X represent the number of adults in our sample that are under correctional supervision. We typically represent random variables
with capital letters near the end of the Roman alphabet, so very often we
have X or possibly Y, or possibly Z. Let's list the possible values
of X and their probabilities of occurring. Well, the possible outcomes, which I'm just
going to label here as the possibilities are that the first person we get is not
under correctional supervision, that's one possibility and the second person is not under
correctional supervision as well. I'm going to let N represent someone who is
not under correctional supervision, and C someone who is.
person we get is not under correctional supervision but the second person we get is under
correctional supervision. Another possibility of course is that
the first person we get is under correctional supervision and the second is not. And the fourth possibility is that both people that we randomly sample are under correctional supervision. The value of X, the value our random
variable takes on in these spots, well, if you recall, X was the number of
people that are under correctional supervision. So here it takes on the value 0, here it would
take on the value 1, here it would take on the value 1, and here
it would take on the value 2. Now let's list the probabilities here. If we are sampling randomly and independently, and the probability that a person is
under correctional supervision is 3%, the probability they're not is 97%. And because we are sampling randomly and independently, we can simply multiply those probabilities together. So the probability of getting N and then N is 0.97 times 0.97. The probability of getting N and then C
is 0.97 times 0.03. The probability of getting C and then N is 0.03 times 0.97. And the probability of getting two C's is 0.03 times 0.03. The first works out to 0.9409. The second works out to 0.0291. And of course the third is the same
thing, 0.0291. And the last works out to 0.0009. Now let's summarize this in terms of the value of X.
X can take on the possible values 0, 1, and 2. As for the probabilities of those values
of our random variable X: 0 has a probability of 0.9409. The value 1 happens in two spots, N then C and C then N, so its probability is those two added
together, and we get 0.0582. And 2 has a probability of 0.0009. So we might say something like: the probability that the random variable X takes on the value 2 is equal to 0.0009. But we might want to just do this in
general, so we sometimes want to write the probability that the random variable X takes on some value, and we call
that value little x. Capital X represents the random variable, whereas little x represents some value of that random variable. So where I have the value of X here, we
could write that as little x, which I can put in brackets. And this probability, the probability that
the random variable X takes on the value little x, we sometimes write as p(x). What we have just done here is create the probability distribution of our random variable X. A probability distribution for a random variable X is a listing of all possible values
of X and their probabilities of occurring. This could be a listing like we had on
the previous slide, or this can be a formula or this can be
some sort of graphic representation. We know that all discrete probability
distributions must satisfy these conditions. p(x) represents a probability.
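Written out in symbols, the two conditions look like this. The "all x" subscript on the sum is just a conventional shorthand, assumed here rather than taken from the slide.

```latex
0 \le p(x) \le 1 \quad \text{for all } x,
\qquad
\sum_{\text{all } x} p(x) = 1
```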
And probabilities have to lie between 0 and 1. So p(x) is between 0 and 1 for all x. And we are listing all possible values of X,
and their probabilities of occurring. So if we sum up the probabilities of all possible values of X, we're going to have to get 1. And if we go back to our discrete
probability distribution here, we would see that the probabilities do
in fact all lie between 0 and 1. And if we added them up, we would see
that they sum to 1. Depending on the discrete
probability distribution, the values of x themselves can be anything. Here we have the value 2, which doesn't lie between 0 and 1. A value might be negative, or it might be a billion,
depending on the situation. The values of x can be anything,
depending on the situation, but the probabilities have to satisfy
those conditions on the previous slide. Now we very often plot this out to see
what this looks like visually. So let's plot these values of x and their
probabilities of occurring. Here's a visual representation of our
probability distribution of X. Now the probability of zero was very
high, and the probability of one was much lower. And 2 you can't quite see
here. At 0.0009, it doesn't quite show up visually for us, but it's there. So plotting out our discrete probability distribution can help us visualize what's happening. Let's contrast what a discrete
probability distribution looks like with what a continuous probability
distribution looks like. Here I have our discrete probability distribution from the last page where we have these discrete jumps between these values. It goes from 0, jumping down to 1, and then jumping to 2 in this case. And we had these three possibilities here. Down below is approximately the
distribution of the heights of Canadian males. Height is a continuous random variable and, as you can see, its distribution is modeled with a
smooth curve. We don't have the distinct jumps that we do for discrete probability
distributions. So when we do eventually talk
about continuous probability distributions, we're going to have to handle them in a slightly different way.
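Going back to the discrete case, here is a rough sketch of how the correctional supervision distribution from earlier could be reproduced in Python. The function name `supervision_distribution` and the constant `P_SUPERVISED` are made up for illustration. As in the example, it treats the 3% figure as exact and assumes the people are sampled independently.

```python
from itertools import product

# Treated as exactly 3% for this example, as in the discussion above.
P_SUPERVISED = 0.03


def supervision_distribution(n_sampled=2, p=P_SUPERVISED):
    """Return {x: p(x)}, where x counts sampled adults under supervision.

    Hypothetical helper: enumerates every outcome (N = not supervised,
    C = supervised) and multiplies probabilities, assuming independence.
    """
    pmf = {}
    for outcome in product("NC", repeat=n_sampled):
        prob = 1.0
        for person in outcome:
            prob *= p if person == "C" else 1 - p
        x = outcome.count("C")  # value of X for this outcome
        pmf[x] = pmf.get(x, 0.0) + prob
    return pmf


pmf = supervision_distribution()

# Condition 1: every p(x) lies between 0 and 1.
assert all(0 <= prob <= 1 for prob in pmf.values())
# Condition 2: the probabilities sum to 1 (up to floating-point rounding).
assert abs(sum(pmf.values()) - 1) < 1e-12

for x in sorted(pmf):
    print(f"p({x}) = {pmf[x]:.4f}")
```

Running this prints 0.9409, 0.0582, and 0.0009 for x = 0, 1, and 2, matching the values worked out by hand. Enumerating outcomes like this is only practical for small samples, but it mirrors the N and C listing used in the example.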