The following content is
provided under a Creative Commons license. Your support will help MIT
OpenCourseWare continue to offer high quality educational
resources for free. To make a donation, or view
additional materials from hundreds of MIT courses, visit
MIT OpenCourseWare at ocw.mit.edu. OK. So let us start. All right. So today we're starting a
new unit in this class. We have covered, so far, the
basics of probability theory-- the main concepts and tools, as
far as just probabilities are concerned. But if that was all that there
is in this subject, the subject would not
be rich enough. What makes probability theory
a lot more interesting and richer is that we can also talk
about random variables, which are ways of assigning
numerical results to the outcomes of an experiment. So we're going to define what
random variables are, and then we're going to describe them
using so-called probability mass functions. Basically some numerical values
are more likely to occur than other numerical
values, and we capture this by assigning probabilities
to them the usual way. And we represent these in
a compact way using the so-called probability
mass functions. We're going to see a couple of
examples of random variables, some of which we have already
seen but with different terminology. And so far, it's going to be
just a couple of definitions and calculations of
the type that you already know how to do. But then we're going to
introduce the one new, big concept of the day. So up to here it's going to be
mostly an exercise in notation and definitions. But then we get to our big concept,
which is the concept of the expected value of a
random variable, which is some kind of average value of
the random variable. And then we're going to also
talk, very briefly, about the spread of the distribution around the
expectation, which is the concept of the variance
of a random variable. OK. So what is a random variable? It's an assignment of a
numerical value to every possible outcome of
the experiment. So here's the picture. The sample space is this class,
and we've got lots of students in here. This is our sample
space, omega. I'm interested in the height
of a random student. So I'm going to use a real line
where I record height, and let's say this is
height in inches. And the experiment happens,
I pick a random student. And I go and measure the height
of that random student, and that gives me a
specific number. So what's a good number
in inches? Let's say 60. OK. Or I pick another student, and
that student has a height of 71 inches, and so on. So this is the experiment. These are the outcomes. These are the numerical values
of the random variable that we call height. OK. So mathematically, what are
we dealing with here? We're basically dealing with a
function from the sample space into the real numbers. That function takes as argument an outcome of the experiment, that is, a typical
student, and produces the value of that function, which
is the height of that particular student. So we think of an abstract
object that we denote by a capital H, which is the random
variable called height. And that random variable is
essentially this particular function that we talked
about here. OK. So there's a distinction that
we're making here-- H is height in the abstract. It's the function. These numbers here are
particular numerical values that this function takes when
you choose one particular outcome of the experiment. Now, when you have a single
probability experiment, you can have multiple random
variables. So perhaps, instead of just
height, I'm also interested in the weight of a typical
student. And so when the experiment
happens, I pick that random student-- this is the height
of the student. But that student would also
have a weight, and I could record it here. And similarly, every student
is going to have their own particular weight. So the weight function is a
different function from the sample space to the real
numbers, and it's a different random variable. So the point I'm making here is
that a single probabilistic experiment may involve several
interesting random variables. I may be interested in the
height of a random student or the weight of the
random student. These are different
random variables that could be of interest. I can also do other things. Suppose I define an object such
as H bar, which is 2.5 times H. What does that correspond to? Well, this is the height
in centimeters. Now, H bar is a function of H
itself, but if you were to draw the picture, the picture
would go this way. 60 gets mapped to 150, 71 gets
mapped to, oh, that's too hard for me. OK, gets mapped to something,
and so on. So H bar is also a
random variable. Why? Once I pick a particular
student, that particular outcome determines completely
the numerical value of H bar, which is the height of that
student but measured in centimeters. What we have here is actually
a random variable, which is defined as a function of another
random variable. And the point that this example
is trying to make is that functions of
random variables are also random variables. The experiment happens, the
experiment determines a numerical value for
this object. And once you have the numerical
value for this object, that determines
also the numerical value for that object. So given an outcome, the
numerical value of this particular object
is determined. So H bar is itself a function
from the sample space, from outcomes to numerical values. And that makes it a random
variable according to the formal definition that
we have here. So the formal definition is that
the random variable is not random, it's not a variable,
it's just a function from the sample space
to the real numbers. That's the abstract, right way
of thinking about them.
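In code, that abstract view is easy to mimic. Here is a minimal Python sketch (the students and their heights are made up for illustration): the random variable H is literally a function on the sample space, and the centimeter version is a function of it.

```python
# A minimal sketch with made-up students and heights: a random
# variable is just a function from the sample space to the reals.
sample_space = ["student_1", "student_2", "student_3"]

heights_in_inches = {"student_1": 60.0, "student_2": 71.0, "student_3": 65.0}

def H(outcome: str) -> float:
    """The random variable 'height in inches': a function on omega."""
    return heights_in_inches[outcome]

def H_bar(outcome: str) -> float:
    """Height in centimeters: a function of a random variable
    is again a random variable."""
    return 2.5 * H(outcome)

print(H("student_1"), H_bar("student_1"))  # 60.0 150.0
```

Now, random variables can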
be of different types. They can be discrete
or continuous. Suppose that I measure the
heights in inches, but I round to the nearest inch. Then the numerical values I'm
going to get here would be just integers. So that would make
it an integer valued random variable. And this is a discrete
random variable. Or maybe I have a scale for
measuring height which is infinitely precise and records
your height to an infinite number of digits of precision. In that case, your height
would be just a general real number. So we would have a random
variable that takes values in the entire set of
real numbers. Well, I guess not really
negative numbers, but the set of non-negative numbers. And that would be a continuous
random variable. It takes values in
a continuous set. So we will be talking about both
discrete and continuous random variables. The first thing we will do
will be to devote a few lectures on discrete random
variables, because discrete is always easier. And then we're going to
repeat everything in the continuous setting. So discrete is easier, and
it's the right place to understand all the concepts,
even those that may appear to be elementary. And then you will be set to
understand what's going on when we go to the
continuous case. So in the continuous case, you
get all the complications of calculus and some extra math
that comes in there. So it's important to have nailed
down all the concepts very well in the easy, discrete case
so that you don't have conceptual hurdles when
you move on to the continuous case. Now, one important remark that
may seem trivial but it's actually very important so that
you don't get tangled up between different types
of concepts-- there's a fundamental
distinction between the random variable itself, and
the numerical values that it takes. Abstractly speaking, or
mathematically speaking, a random variable, X, or H in this
example, is a function. OK. Maybe if you like programming
the words "procedure" or "sub-routine" might be better. So what's the sub-routine
height? Given a student, I take that
student, force them on the scale and measure them. That's the sub-routine that
measures heights. It's really a function that
takes students as input and produces numbers as output. The sub-routine we denoted
by capital H. That's the random variable. But once you plug in a
particular student into that sub-routine, you end up getting
a particular number. This is the numerical output
of that sub-routine or the numerical value of
that function. And that numerical value is an
element of the real numbers. So the numerical value is a
real number, whereas this capital X is a function from
omega to the real numbers. So they are very different
types of objects. And the way that we keep track
of what we're talking about at any given time is by using
capital letters for random variables and lower case
letters for numbers. OK. So now once we have a random
variable at hand, that random variable takes on different
numerical values. And we want to say
something about the relative likelihoods of the different
numerical values that the random variable can take. So here's our sample space,
and here's the real line. And there's a bunch of outcomes
that gave rise to one particular numerical value. There's another numerical value
that arises if we have this outcome. There's another numerical value
that arises if we have this outcome. So our sample space is here. The real numbers are here. And what we want to do is to ask
the question, how likely is that particular numerical
value to occur? So what we're essentially asking
is, how likely is it that we obtain an outcome that
leads to that particular numerical value? We calculate that overall
probability of that numerical value and we represent that
probability using a bar so that we end up generating
a bar graph. So that could be a possible
bar graph associated with this picture. The size of this bar is the
total probability that our random variable took on this
numerical value, which is just the sum of the probabilities of
the different outcomes that led to that numerical value. So the thing that we're plotting
here, the bar graph-- we give a name to it. It's a function, which we denote
by lowercase p, subscript capital X. The subscript capital X indicates which
random variable we're talking about. And it's a function of little
x, which ranges over the values that our random
variable is taking. So in mathematical notation, the
value of the PMF at some particular number, little x,
is the probability that our random variable takes on the
numerical value, little x. And if you want to be precise
about what this means, it's the overall probability of all
outcomes for which the random variable ends up taking
that value, little x. So this is the overall
probability of all omegas that lead to that particular
numerical value, x, of interest. So what do we know about PMFs? Since these are probabilities,
all these entries in the bar graph have to be non-negative. Also, if you exhaust all the
possible values of little x's, you will have exhausted all the
possible outcomes here. Because every outcome leads
to some particular x. So the sum of these
probabilities should be equal to one. This is the second
relation here. So this relation tells
us that some little x is going to happen. They happen with different
probabilities, but when you consider all the possible little
x's together, one of those little x's is going
to be realized. Probabilities need
to add to one.
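Here is a minimal Python sketch of that bookkeeping, with a made-up sample space and made-up probabilities: the PMF value at x is the total probability of the outcomes that map to x, every entry is non-negative, and the entries sum to one.

```python
# A minimal sketch with made-up outcomes and probabilities.
from collections import defaultdict
from fractions import Fraction

outcome_probs = {"a": Fraction(1, 5), "b": Fraction(3, 10),
                 "c": Fraction(2, 5), "d": Fraction(1, 10)}  # P(omega)
X = {"a": 1, "b": 2, "c": 2, "d": 3}                          # X(omega)

# p_X(x) = sum of P(omega) over all omega with X(omega) = x
pmf = defaultdict(Fraction)
for omega, prob in outcome_probs.items():
    pmf[X[omega]] += prob

print(dict(pmf))          # {1: Fraction(1, 5), 2: Fraction(7, 10), 3: Fraction(1, 10)}
print(sum(pmf.values()))  # 1 -- the PMF values add to one
```

OK. So let's get our first example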
of a non-trivial bar graph. Consider the experiment where
I start with a coin and I start flipping it
over and over. And I do this until I obtain
heads for the first time. So what are possible outcomes
of this experiment? One possible outcome is that
I obtain heads at the first toss, and then I stop. In this case, my random variable
takes the value 1. Or it's possible that I obtain
tails and then heads. How many tosses did it take
until heads appeared? This would be X equal to 2. Or more generally, I might
obtain tails for k minus 1 times, and then obtain heads
at the k-th time, in which case, our random variable takes
the value, little k. So that's the experiment. So capital X is a well-defined
random variable. It's the number of tosses it
takes until I see heads for the first time. These are the possible
outcomes. These are elements of
our sample space. And these are the values of X
depending on the outcome. Clearly X is a function
of the outcome. You tell me the outcome, I'm
going to tell you what X is. So what we want to do now is to
calculate the PMF of X. So Px of k is, by definition, the
probability that our random variable takes the value k. For the random variable to take
the value of k, the first head appears at toss number k. The only way that this event
can happen is if we obtain this particular sequence: tails the first k minus 1 times, and then heads at the k-th flip. So this event, that the random
variable is equal to k, is the same as this event, k minus 1
tails followed by 1 head. What's the probability
of that event? We're assuming that the coin
tosses are independent. So to find the probability
of this event, we need to multiply the probability of
tails, times the probability of tails, times the probability
of tails. We multiply (1-p) by itself k minus 1 times,
times the probability of heads, which puts an
extra p at the end. And this is the formula for the
so-called geometric PMF. And why do we call
it geometric? Because if you go and plot the
bar graph of this random variable, X, we start
at 1 with a certain number, which is p. And then at 2 we get p(1-p). At 3 we're going to get
something smaller, it's p times (1-p)-squared. And the bars keep going down
at the rate of geometric progression. Each bar is smaller than the
previous bar, because each time we get an extra factor
of 1-p involved. So the shape of this
PMF is the graph of a geometric sequence. For that reason, we say that
it's the geometric PMF, and we call X also a geometric
random variable. So the number of coin tosses until the first head is a geometric random variable.
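As a minimal Python sketch (p = 0.3 is an arbitrary illustrative value), the geometric PMF p_X(k) = (1-p)^(k-1) p can be tabulated directly, and its values add up to one:

```python
# A minimal sketch of the geometric PMF; p = 0.3 is arbitrary.
p = 0.3

def geometric_pmf(k: int) -> float:
    """Probability that the first head appears at toss number k."""
    return (1 - p) ** (k - 1) * p

for k in range(1, 6):
    print(k, geometric_pmf(k))  # each bar is (1-p) times the previous

print(sum(geometric_pmf(k) for k in range(1, 500)))  # ~1.0
```

So this was an example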
of how to compute the PMF of a random variable. This was an easy example,
because this event could be realized in one and
only one way. So to find the probability of
this, we just needed to find the probability of this
particular outcome. More generally, there's going
to be many outcomes that can lead to the same numerical
value. And we need to keep track
of all of them. For example, in this picture,
if I want to find this value of the PMF, I need to add up the
probabilities of all the outcomes that lead
to that value. So the general procedure
is exactly what this picture suggests. To find this probability, you go
and identify which outcomes lead to this numerical value,
and add their probabilities. So let's do a simple example. I take a tetrahedral die. I toss it twice. And there's lots of random
variables that you can associate with the
same experiment. So the outcome of the first
throw, we can call it F. That's a random variable because
it's determined once you tell me what happens
in the experiment. The outcome of the
second throw is another random variable. The minimum of the two throws
is also a random variable. Once I do the experiment, this
random variable takes on a specific numerical value. So suppose I do the experiment
and I get a 2 and a 3. So this random variable is going
to take the numerical value of 2. This is going to take the
numerical value of 3. This is going to take the
numerical value of 2. And now suppose that I want to
calculate the PMF of this random variable. What I will need to do is to
calculate Px(1), Px(2), Px(3), and Px(4). Let's not do the entire calculation here; let's just calculate one of the
entries of the PMF. So Px(2)-- that's the probability that the
minimum of the two throws gives us a 2. And this can happen
in many ways. There are five ways that
it can happen. Those are all of the outcomes
for which the smallest of the two is equal to 2. That's five outcomes assuming
that the tetrahedral die is fair and the tosses
are independent. Each one of these outcomes
has probability of 1/16. There's five of them, so
we get an answer, 5/16.
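Here is a minimal Python sketch of that count, enumerating all 16 equally likely outcomes of the two throws:

```python
# A minimal sketch: the PMF of the minimum of two independent throws
# of a fair tetrahedral die, by enumerating all 16 outcomes.
from fractions import Fraction
from itertools import product

pmf = {}
for first, second in product(range(1, 5), repeat=2):
    x = min(first, second)
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, 16)

print(pmf)     # {1: Fraction(7, 16), 2: Fraction(5, 16), 3: Fraction(3, 16), 4: Fraction(1, 16)}
print(pmf[2])  # 5/16, from the five outcomes counted above
```

Conceptually, this is just the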
procedure that you use to calculate PMFs the way that
you construct this particular bar graph. You consider all the possible
values of your random variable, and for each one of
those values you find the probability that the
random variable takes on that value by adding the
probabilities of all the possible outcomes that
lead to that particular numerical value. So let's do another, more
interesting one. So let's revisit the
coin tossing problem from last time. Let us fix a number n, and we
decide to flip a coin n consecutive times. Each time the coin tosses
are independent. And each one of the tosses will
have a probability, p, of obtaining heads. Let's consider the random
variable, which is the total number of heads that
have been obtained. Well, that's something that
we dealt with last time. We know the probabilities for
different numbers of heads, but we're just going
to do the same now using today's notation. So let's take, for concreteness,
n equal to 4. Px is the PMF of that random
variable, X. Px(2) is, by definition, the probability that our random variable takes the value of 2. So this is the probability that we have exactly two heads in our four tosses. The event of exactly two heads
can happen in multiple ways. And here I've written
down the different ways that it can happen. It turns out that there's
exactly six ways that it can happen. And each one of these ways,
luckily enough, has the same probability-- p-squared times (1-p)-squared. So that gives us the value for
the PMF evaluated at 2. So here we just counted
explicitly that we have six possible ways that this can
happen, and this gave rise to this factor of 6. But this factor of 6 turns
out to be the same as this 4 choose 2. If you remember the definition from
last time, 4 choose 2 is 4 factorial divided by 2
factorial, divided by 2 factorial, which is
indeed equal to 6. And this is the more general
formula that you would be using. In general, if you have n tosses
and you're interested in the probability of obtaining
k heads, the probability of that event is
given by this formula. So that's the formula that
we derived last time.
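In code, the formula reads as follows (a minimal Python sketch; the value p = 0.5 is an arbitrary choice for illustration):

```python
# A minimal sketch of the binomial PMF,
# p_X(k) = (n choose k) * p**k * (1-p)**(n-k).
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n independent tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p = 0.5                       # arbitrary illustrative value
print(comb(4, 2))             # 6, the "4 choose 2" factor
print(binomial_pmf(2, 4, p))  # 6 * p^2 * (1-p)^2 = 0.375
```

Except that last time we didn't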
use this notation. We just said the probability of
k heads is equal to this. Today we introduce the
extra notation. And also having that notation,
we may be tempted to also plot a bar graph for the Px-- in this case, for the coin
tossing problem. And if you plot that bar graph
as a function of k when n is a fairly large number, what you
will end up obtaining is a bar graph that has a shape of
something like this. So certain values of k are more
likely than others, and the more likely values
are somewhere in the middle of the range. And extreme values-- too few heads or too many
heads, are unlikely. Now, the miraculous thing is
that it turns out that this curve gets a pretty definite
shape, like a so-called bell curve, when n is big. This is a very deep and central
fact from probability theory that we will get to
in a couple of months. For now, take it as just
a curious observation. If you go into MATLAB and put
this formula in and ask MATLAB to plot it for you, you're going
to get an interesting shape of this form.
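The same experiment in Python rather than MATLAB might look like this minimal sketch (n = 100 and p = 0.5 are arbitrary illustrative choices):

```python
# A minimal sketch: plot the binomial PMF for a fairly large n and
# watch the bell shape emerge.
from math import comb
import matplotlib.pyplot as plt

n, p = 100, 0.5
ks = list(range(n + 1))
probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in ks]

plt.bar(ks, probs)
plt.xlabel("k (number of heads)")
plt.ylabel("p_X(k)")
plt.show()
```

And later on we will have to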
sort of understand where this is coming from and whether
there's a nice, simple formula for the asymptotic
form that we get. All right. So, so far I've said essentially
nothing new, just a little bit of notation and
this little conceptual thing that you have to think of random
variables as functions on the sample space. So now it's time to introduce
something new. This is the big concept
of the day. In some sense it's
an easy concept. But it's the most central, most
important concept that we have for dealing with random
variables. It's the concept
of the expected value of a random variable. So the expected value is meant
to be, let's speak loosely, something like an average,
where you interpret probabilities as something
like frequencies. So you play a certain game and
your rewards are going to be-- to use my standard numbers-- your rewards are going
to be one dollar with probability 1/6. It's going to be 2 dollars with
probability 1/2, and four dollars with probability 1/3. So this is a plot of the PMF
of some random variable. If you play that game and you
get so many dollars with this probability, and so on, how much
do you expect to get on the average if you play the
game a zillion times? Well, you can think
as follows-- one sixth of the time I'm
going to get one dollar. One half of the time that
outcome is going to happen and I'm going to get two dollars. And one third of the time the
other outcome happens, and I'm going to get four dollars. And you evaluate that number
and it turns out to be 2.5. OK. So that's a reasonable way of
calculating the average payoff if you think of these
probabilities as the frequencies with which you
obtain the different payoffs. And loosely speaking, it doesn't
hurt to think of probabilities as frequencies
when you try to make sense of various things. So what did we do here? We took the probabilities of the
different outcomes, of the different numerical values, and
multiplied them with the corresponding numerical value. Similarly here, we have
a probability and the corresponding numerical value
and we added up over all x's. So that's what we did. It looks like an interesting
quantity to deal with. So we're going to give a name to
it, and we're going to call it the expected value of
a random variable. So this formula just captures
the calculation that we did.
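As a minimal Python sketch, the definition E[X] = sum over x of x times p_X(x), applied to the payoff example above:

```python
# A minimal sketch of E[X] = sum over x of x * p_X(x),
# using the payoff PMF from the example above.
from fractions import Fraction

pmf = {1: Fraction(1, 6), 2: Fraction(1, 2), 4: Fraction(1, 3)}

expected_value = sum(x * px for x, px in pmf.items())
print(expected_value)  # 5/2, i.e. 2.5 dollars on the average
```

How do we interpret the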
expected value? So the one interpretation
is the one that I used in this example. You can think of it as the
average that you get over a large number of repetitions
of an experiment where you interpret the probabilities as
the frequencies with which the different numerical
values can happen. There's another interpretation
that's a little more visual and that's kind of insightful,
if you remember your freshman physics, this kind of formula
gives you the center of gravity of an object
of this kind. If you take that picture
literally and think of this as a mass of one sixth sitting
here, and the mass of one half sitting here, and one third
sitting there, and you ask me what's the center of gravity
of that structure. This is the formula that gives
you the center of gravity. Now what's the center
of gravity? It's the place where if you put
your pen right underneath, that diagram will stay in place
and will not fall on one side and will not fall
on the other side. So in this thing, by picture,
since the 4 is a little more to the right and a little
heavier, the center of gravity should be somewhere
around here. And that's what
the math gave us. It turns out to be
two and a half. Once you have this
interpretation about centers of gravity, sometimes
you can calculate expectations pretty fast. So here's our new
random variable. It's the uniform random variable
in which each one of the numerical values
is equally likely. Here there's a total of n plus
1 possible numerical values. So each one of them has
probability 1 over (n + 1). Let's calculate the expected
value of this random variable. We can take the formula
literally and consider all possible outcomes, or all
possible numerical values, and weigh them by their
corresponding probability, and do this calculation and
obtain an answer. But I gave you the intuition
of centers of gravity. Can you use that intuition
to guess the answer? What's the center of gravity
of a structure of this kind? We have symmetry. So it should be in the middle. And what's the middle? It's the average of the
two end points. So without having to do the
algebra, you know that the answer is going to
be n over 2. So this is a moral that you
should keep in mind whenever you have a PMF which is symmetric around
a certain point. That certain point is going
to be the expected value associated with this
particular PMF. OK. So having defined the expected
value, what is there that's left for us to do? Well, we want to investigate how
it behaves, what kind of properties it has, and
also how do you calculate expected values of complicated
random variables. So the first complication that
we're going to start with is the case where we deal with a
function of a random variable. OK. So let me redraw this same
picture as before. We have omega. This is our sample space. This is the real line. And we have a random variable
that gives rise to various values for X. So the random
variable is capital X, and every outcome leads to a
particular numerical value x for our random variable X. So
capital X is really the function that maps these points
into the real line. And then I consider a function
of this random variable, call it capital Y, and it's
a function of my previous random variable. And this new random variable Y
takes numerical values that are completely determined once
I know the numerical value of capital X. And perhaps you get
a diagram of this kind. So X is a random variable. Once you have an outcome, this
determines the value of x. Y is also a random variable. Once you have the outcome,
that determines the value of y. Y is completely determined once
you know X. We have a formula for how to calculate
the expected value of X. Suppose that you're interested
in calculating the expected value of Y. How would
you go about it? OK. The only thing you have in your
hands is the definition, so you could start by just
using the definition. And what does this entail? It entails the following: for every particular value of y, collect all the outcomes that lead
to that value of y. Find their probability. Do the same here. For that value, collect
those outcomes. Find their probability
and weight it by y. So this formula does the
addition over this line. We consider the different
outcomes and add things up. There's an alternative way of
doing the same accounting where instead of doing the
addition over those numbers, we do the addition up here. We consider the different
possible values of x, and we think as follows-- for each possible value of x,
that value is going to occur with this probability. And if that value has occurred,
this is how much I'm getting, the g of x. So I'm considering the
probability of this outcome. And in that case, y
takes this value. Then I'm considering the
probabilities of this outcome. And in that case, g of x
takes again that value. Then I consider this particular
x, it happens with this much probability, and in
that case, g of x takes that value, and similarly here. We end up doing exactly the same
arithmetic, it's only a question whether we bundle
things together. That is, if we calculate the
probability of this, then we're bundling these
two cases together. Whereas if we do the addition
up here, we do a separate calculation-- this probability times this
number, and then this probability times that number. So it's just a simple
rearrangement of the way that we do the calculations, but it
does make a big difference in practice if you actually want
to calculate expectations. So the second procedure that I
mentioned, where you do the addition by running
over the x-axis corresponds to this formula. Consider all possibilities for x
and when that x happens, how much money are you getting? That gives you the average money
that you are getting. All right. So I kind of hand waved and
argued that it's just a different way of accounting, of
course one needs to prove this formula. And fortunately it
can be proved. You're going to see that
in recitation. Most people, once they're a
little comfortable with the concepts of probability,
actually believe that this is true by definition. In fact it's not true
by definition. It's called the law of the
unconscious statistician. It's something that you always
do, but it's something that does require justification. All right. So this gives us basically a
shortcut for calculating expected values of functions of
a random variable without having to find the PMF
of that function. We can work with the PMF of the original random variable.
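Here is a minimal Python sketch of the two accountings (the PMF and the function g are made up for illustration): summing g(x) p_X(x) over x gives the same number as first building the PMF of Y = g(X) and then summing y p_Y(y) over y.

```python
# A minimal sketch of the expected value rule, with a made-up PMF.
from collections import defaultdict
from fractions import Fraction

pmf_x = {-1: Fraction(1, 4), 0: Fraction(1, 4), 1: Fraction(1, 2)}

def g(x):
    return x * x

# Accounting 1: run over the x-axis, E[g(X)] = sum of g(x) * p_X(x)
e_gx = sum(g(x) * px for x, px in pmf_x.items())

# Accounting 2: first find the PMF of Y = g(X), then sum y * p_Y(y)
pmf_y = defaultdict(Fraction)
for x, px in pmf_x.items():
    pmf_y[g(x)] += px
e_y = sum(y * py for y, py in pmf_y.items())

print(e_gx, e_y)  # 3/4 3/4 -- the two accountings agree
```

All right. So we're going to use this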
property over and over. Before we start using it, one
general word of caution-- the average of a function of a
random variable, in general, is not the same as the function
of the average. So these two operations of
taking averages and taking functions do not commute. What this inequality tells you
is that, in general, you cannot reason on the average. So we're going to see instances
where this property is not true. You're going to see
lots of them. Let me just throw it here that
it's something that's not true in general, but we will be
interested in the exceptions where a relation like
this is true. But these will be
the exceptions. So in general, expectations
are something like averages. But the function of an average
is not the same as the average of the function. OK. So now let's go to properties
of expectations. Suppose that alpha is a real
number, and I ask you, what's the expected value of
that real number? So for example, if I write
down this expression-- expected value of 2. What is this? Well, we defined random
variables and we defined expectations of random
variables. So for this to make syntactic
sense, this thing inside here should be a random variable. Is 2-- the number 2-- is it
a random variable? In some sense, yes. It's the random variable that
takes, always, the value of 2. So suppose that you have some
experiment and that experiment always outputs 2 whenever
it happens. Then you can say, yes, it's
a random experiment but it always gives me 2. The value of the random
variable is always 2 no matter what. It's kind of a degenerate random
variable that doesn't have any real randomness in it,
but it's still useful to think of it as a special case. So it corresponds to a function
from the sample space to the real line that takes
only one value. No matter what the outcome is,
it always gives me a 2. OK. If you have a random variable
that always gives you a 2, what is the expected
value going to be? The only entry that shows
up in this summation is that number 2. The probability of a 2 is equal
to 1, and the value of that random variable
is equal to 2. So it's the number itself. So the average value in an
experiment that always gives you 2's is 2. All right. So that's nice and simple. Now let's go to our experiment
where H was your height in inches. And I know your height in
inches, but I'm interested in your height measured
in centimeters. How is that going
to be related to your height in inches? Well, if you take your height
in inches and convert it to centimeters, I have another
random variable, which is always, no matter what, two and
a half times bigger than the random variable
I started with. If you take some quantity and
always multiplied by two and a half what happens to the average
of that quantity? It also gets multiplied
by two and a half. So you get a relation like
this, which says that the average height of a student
measured in centimeters is two and a half times the
average height of a student measured in inches. So that makes perfect
intuitive sense. If you generalize it, it gives
us this relation, that if you have a number, you can pull it
outside the expectation and you get the right result. So this is a case where you
can reason on the average. If you take a quantity, such as
height, and multiply it by a certain number, you can
reason on the average. I multiply the numbers
by two, the averages will go up by two. So this is an exception to this
cautionary statement that I had up there. How do we prove that
this fact is true? Well, we can use the expected
value rule here, which tells us that the expected value of
alpha X, this is our g of X, essentially, is going to be
the sum over all x's of my function, g of X, times the
probability of the x's. In our particular case, g of X
is alpha times X. And we have those probabilities. And the alpha goes outside
the summation. So we get alpha, sum over x's,
x Px of x, which is alpha times the expected value of X. So that's how you prove this
relation formally using this rule up here. And the next formula that
I have here also gets proved the same way. What does this formula
tell you? If I take everybody's height
in centimeters-- we already multiplied
by alpha-- and the gods give everyone
a bonus of ten extra centimeters. What's going to happen to the
average height of the class? Well, it will just go up by
an extra ten centimeters. So in this expectation, giving everyone the bonus of beta just adds a beta to the
average height in centimeters, which we also know to be alpha
times the expected value of X, plus beta. So this is a linearity property
of expectations. If you take a linear function
of a single random variable, the expected value of that
linear function is the linear function of the expected
value. So this is our big exception to
this cautionary note: we have equality if g is linear.
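A minimal Python sketch, reusing the payoff PMF from before, checks both the linearity property and the earlier word of caution (the constants alpha and beta are arbitrary):

```python
# A minimal sketch: E[alpha*X + beta] = alpha*E[X] + beta, while a
# nonlinear g generally has E[g(X)] != g(E[X]).
from fractions import Fraction

pmf = {1: Fraction(1, 6), 2: Fraction(1, 2), 4: Fraction(1, 3)}

def E(g):
    return sum(g(x) * px for x, px in pmf.items())

alpha, beta = Fraction(5, 2), 10
print(E(lambda x: alpha * x + beta))  # 65/4
print(alpha * E(lambda x: x) + beta)  # 65/4 -- linearity holds

print(E(lambda x: x * x))             # E[X^2] = 15/2
print(E(lambda x: x) ** 2)            # (E[X])^2 = 25/4 -- not equal
```

OK. All right. So let's get to the last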
concept of the day. What kind of functions
of random variables may be of interest? One possibility might be the
average value of X-squared. Why is it interesting? Well, why not. It's the simplest function
that you can think of. So if you want to calculate
the expected value of X-squared, you would use this
general rule for how you can calculate expected values of
functions of random variables. You consider all the
possible x's. For each x, you see what's the
probability that it occurs. And if that x occurs, you
consider and see how big x-squared is. Now, the more interesting
quantity, a more interesting expectation that you can
calculate has to do not with x-squared, but with the distance
of x from the mean and then squared. So let's try to parse what
we've got up here. Let's look just at the
quantity inside here. What kind of quantity is it? It's a random variable. Why? X is a random variable; the expected value of X is a number. Subtract a number from a random
variable, you get another random variable. Take a random variable and
square it, you get another random variable. So the thing inside here is a
legitimate random variable. What kind of random
variable is it? So suppose that we have our
experiment and we have different x's that can happen. And the mean of X in this
picture might be somewhere around here. I do the experiment. I obtain some numerical
value of x. Let's say I obtain this
numerical value. I look at the distance from
the mean, which is this length, and I take the
square of that. Each time that I do the
experiment, I go and record my distance from the mean
and square it. So I give more emphasis
to big distances. And then I take the average over
all possible outcomes, all possible numerical values. So I'm trying to compute
the average squared distance from the mean. This corresponds to
this formula here. So the picture that I drew
corresponds to that. For every possible numerical
value of x, that numerical value corresponds to a certain
distance from the mean squared, and I weight according
to how likely is that particular value
of x to arise. So this measures the
average squared distance from the mean. Now, because of that expected
value rule, of course, this thing is the same as
that expectation. It's the average value of the
random variable, which is the squared distance
from the mean. With this probability, the
random variable takes on this numerical value, and the squared
distance from the mean ends up taking that particular
numerical value. OK. So why is the variance
interesting? It tells us how far away from
the mean we expect to be on the average. Well, actually we're not
counting distances from the mean, it's distances squared. So it gives more emphasis to the
kind of outliers in here. But it's a measure of
how spread out the distribution is. A big variance means that those
bars go far to the left and to the right, typically. Whereas a small variance would
mean that all those bars are tightly concentrated
around the mean value. It's the average squared
deviation. Small variance means that
we generally have small deviations. Large variances mean that
we generally have large deviations. Now as a practical matter, when
you want to calculate the variance, there's a handy
formula which I'm not proving but you will see it
in recitation. It's just two lines
of algebra. And it allows us to calculate it
in a somewhat simpler way. We need to calculate the
expected value of the random variable and the expected value
of the square of the random variable, and these two are going to give us the variance.
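Here is a minimal Python sketch of that check, again with the payoff PMF from before: the definition and the shortcut formula give the same number.

```python
# A minimal sketch: var(X) = E[(X - E[X])^2] by definition, and
# var(X) = E[X^2] - (E[X])^2 by the handy shortcut formula.
from fractions import Fraction

pmf = {1: Fraction(1, 6), 2: Fraction(1, 2), 4: Fraction(1, 3)}

def E(g):
    return sum(g(x) * px for x, px in pmf.items())

mean = E(lambda x: x)                       # 5/2
var_def = E(lambda x: (x - mean) ** 2)      # the definition
var_short = E(lambda x: x * x) - mean ** 2  # the shortcut
print(var_def, var_short)                   # 5/4 5/4
```

So to summarize what we did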
up here, the variance, by definition, is given
by this formula. It's the expected value of
the squared deviation. But we have the equivalent
formula, which comes from application of the expected
value rule, to the function g of x equal to (x minus the expected value of X)-squared. OK. So this is the definition. This comes from the expected
value rule. What are some properties
of the variance? Of course variances are
always non-negative. Why is it always non-negative? Well, you look at the definition
and you're just adding up non-negative things. We're adding squared
deviations. So when you add non-negative
things, you get something non-negative. The next question is, how do
things scale if you take a linear function of a
random variable? Let's think about the
effects of beta. If I take a random variable and
add the constant to it, how does this affect the amount
of spread that we have? It doesn't affect-- whatever the spread of this
thing is, if I add the constant beta, it just moves
this diagram here, but the spread doesn't grow
or get reduced. The thing is that when I'm
adding a constant to a random variable, all the x's that are
going to appear are further to the right, but the expected
value also moves to the right. And since we're only interested
in distances from the mean, these distances
do not get affected. x gets increased by something. The mean gets increased by
that same something. The difference stays the same. So adding a constant to a random
variable doesn't do anything to its variance. But if I multiply a random
variable by a constant alpha, what is that going to
do to its variance? Because we have a square here,
when I multiply my random variable by a constant, this x
gets multiplied by a constant, the mean gets multiplied by a
constant, the square gets multiplied by the square
of that constant. And because of that reason, we
get this square of alpha showing up here. So that's how variances
transform under linear transformations. You multiply your random
variable by a constant, and the variance goes up by the square of that same constant.
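As a final minimal Python sketch (the constants are arbitrary), shifting by beta leaves the variance alone, while scaling by alpha multiplies it by alpha squared:

```python
# A minimal sketch checking var(alpha*X + beta) = alpha^2 * var(X).
from fractions import Fraction

pmf = {1: Fraction(1, 6), 2: Fraction(1, 2), 4: Fraction(1, 3)}

def E(g):
    return sum(g(x) * px for x, px in pmf.items())

def var(g):
    m = E(g)
    return E(lambda x: (g(x) - m) ** 2)

alpha, beta = 3, 100
print(var(lambda x: x))                 # var(X) = 5/4
print(var(lambda x: alpha * x + beta))  # 9 * 5/4 = 45/4; beta drops out
```

OK. That's it for today. See you on Wednesday.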