Thought we'd start with a little
puzzle I'm trying to solve. Maybe you can help me with the puzzle,
okay? Maybe turn the volume down a little, thanks; I don't need that much echo. All right, so here's the puzzle. I thought this was
a pretty interesting one. And maybe related to what we're gonna do. Okay, so the question is,
what's the next number in the sequence? 0, 1, 2, any ideas? >> 4. >> 4. >> 3. >> 3. Any other suggestions? >> 1.
>> 1. >> 0.
>> Okay, no one has said the same number twice, so we have no consensus at all. >> [LAUGH]
>> We have 0, 1, 4. >> 3.
>> 3. 3, 2,
>> 5. >> 5. 42, that's a good one. 1 over e would be a good guess. >> [LAUGH]
>> Okay, any other suggestions? One-half, one-tenth, okay. Well, I'm not gonna ask for a reason for each of those. Is there any majority opinion on this? >> 3. >> 3, why do you say 3? >> Integers. >> Integers. Okay, 0 is an integer. Why 3? Yeah? >> It's a value that
the Poisson could take. >> It's a value that
the Poisson could take, okay. But I don't have to list
them in order though, I could have listed them out of order. All right, well, okay, a lot of you seem
to like the number 3, but I haven't yet heard a good reason why. I'm kind of interested. >> [CROSSTALK]
>> It's an arithmetic sequence, I hadn't noticed that. So I guess you're pointing out that it's
an arithmetic, hey that's pretty cool. So you go from here,
add one, add one, add- >> [LAUGH] >> Arithmetic sequence, okay, that might have been what
the puzzle writer had in mind. I've been trying to solve this for
the past couple of days, I got stuck. I hadn't thought of 3 yet. The best answer I'd thought of so
far was 720 factorial. >> [LAUGH]
>> We can debate a little bit: is that more correct, or is 3 more correct? Assuming it does go this way, anyone wanna guess the next term?
>> 3. >> 3 [LAUGH]
>> [LAUGH] All right, I think you got it. All right, so let me tell you why
I'm talking about this sequence. Well, I've loved puzzles since I was little. But I hated this kind of puzzle, because there never seems to be a principle for coming up with an answer. You can make up one thing, and the answer is supposed to be something else. Well, what's the measure of which answer is better, right? I mean, maybe the measure is the simplest answer, but will two people always agree on what's simpler? No, right? You need some complexity measure or something to actually make that into a valid, well-defined problem. So in a sense, this is just as good an answer as 3. Some of you said it might be an arithmetic sequence, that's a good suggestion. But I was thinking of 0, well, 0 is 0, then 1 factorial, 2 factorial factorial,
3 factorial factorial factorial, 4 factorial factorial factorial factorial, which is what one of you just suggested, I think. That's 0 with no factorials, 1 with
one factorial, 2 with two factorials, etcetera. Okay, I probably should put parentheses, because this double factorial notation is also used to mean the skip factorial, where you go down by two numbers each time. What I mean here is iterated factorials: take 3 factorial, then the factorial of that, etcetera. All right, so that's one interpretation it could be. So the general question is, if we have a sequence of numbers,
how do we extend it, okay? And I was thinking of this example cuz
I was thinking about factorials and how I wanna explain extending factorials. So, like,
if you ever wondered what's pi factorial. We're gonna talk about
questions like that. If you didn't ever wonder
what pi factorial is, that's okay, you'll see other uses for
this kind of thing. The question is if you plot it, what
does the factorial function look like? Well, 0 factorial is 1, 1 factorial is 1, 2 factorial is 2, 3 factorial is 6, 4 factorial is 24, 5 factorial is 120, 6 factorial is 720, and I can't draw it any higher. It starts out pretty small
and then it grows extremely fast. A very, very beautiful formula that gives you an approximation for the factorial is called Stirling's formula; if you haven't seen it before, everyone should know this formula. It's both beautiful and useful. It says that n factorial is approximately the square root of 2 pi n, times (n over e) to the n. This is actually an extremely
good approximation. Even if you try this out on
a calculator where n is like 12 or 20, it's actually very, very good. Not only that, but
this approximation is so good that if you take the ratio of this,
divided by this, it will converge to 1. As n goes to infinity. So, that gives you some,
that's just a cool looking formula. By the way,
we can prove this using probability and I'm not going to do that today,
but I might do that at some point. We can give this a probability
interpretation by thinking about Poisson's. But anyway, this gives us some sense
of how fast does a factorial grow. This square root thing is not so
important. The main thing that's driving
is this n over e to the n. So basically we have
this n to the n behavior. It's discounted by e to the n. But, n is, if you're letting n go to
infinity, that's dominating over the e. So anyway, that's Sterling's formula. So, that gives a sense
of how this thing grow. But now my question is,
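Just to see how good it is, here's a quick sketch in Python (my own illustration, not something from the lecture) comparing n! to Stirling's approximation:

```python
import math

def stirling(n):
    # Stirling's approximation: n! is about sqrt(2*pi*n) * (n/e)**n
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in [1, 5, 12, 20]:
    ratio = math.factorial(n) / stirling(n)
    print(n, f"ratio = {ratio:.6f}")  # the ratio heads toward 1 as n grows
```

Already at n = 12 the ratio is within about one percent of 1.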
But now my question is, how would you connect the dots? That is, this function was only defined on the non-negative integers, and we want to interpolate. Well, there are many ways you could draw some function that connects those dots, infinitely many ways to do it. The question is whether there's some way that's more natural than the others. So if I ask you what's pi factorial, and you have as many ways as you want to
try to connect the dots between 3 and 4, then what should you do? Well, it turns out that there's one
really standard, famous way to do that. Some of you may have seen it, but I'm not at all assuming that everyone has. It's called the gamma function, so
that's what we need to start with. We need the gamma function because
we're going to be introducing something called the gamma distribution. We're not done with beta,
we were doing beta a lot last time, and hopefully it wasn't too scary,
despite it being Halloween. We'll come back to that; the beta turns out to be extremely
closely connected with the gamma. So to understand the beta, we need to
do the gamma, which we're about to do. The gamma distribution is named
the gamma distribution because it stems from the gamma function which is- Which is one of the most famous
functions in all of mathematics. Like if you had a top ten list,
top ten most famous functions in math, gamma should, I haven't seen anyone
make quite a list like that. I've seen people make lists of
the ten most famous constants, but not the ten most famous functions. I'm sure someone has, and
if they haven't, or if they have and they did a good job, they would put
the gamma function on the list, okay? So it's very, very important in math generally, not just in probability and statistics. It is important in statistics, which is why we need it right now, but it has its own life outside of this subject. All right, so here's the definition
of the gamma function. That's just a capital letter: Gamma of a is defined by the following integral: the integral from 0 to infinity of x to the a, e to the minus x, dx over x. Usually, you'll see it written as an x
to the a minus 1 here, which of course, is the same thing. Cuz you can cancel out one of the xs. But for certain reasons, it's convenient
keeping kind of one x on hold over here. Just a different way to write it,
but the same thing. And this is defined for
any real a greater than 0. [COUGH] If you try letting a equal zero
here or if you let a be negative and you think about how this integral is
gonna behave, it's not gonna behave well. It's not gonna exist. So, for example, if a is zero, this part is gone,
we just have e to the minus x over x. And then like look at what
happens to that near 0, okay? So I'm not going to
rewrite the whole thing. Just pretend that the x to the a is
gone because we let a equal 0. We just have this. Look at what happens when
x gets very close to 0: when x is close to 0, e to the minus x is just close to 1, so it's not doing much, right? E to the 0 is 1, so this part is very
well behaved, it's just approximately 1, nothing much going on. So what's driving the integral is this 1
over x, the integral of 1 over x is log x, log of 0 is minus infinity, and
that causes major problems. Anyway, as long as a is greater than 0,
this is okay. Same argument,
what happens when x is near 0? Well, this e to the minus
x doesn't do very much. But here, we just have x to the a minus 1. And integrating that is gonna be fine. You can just do the anti-derivative and
you'll see it converges. So it's okay. That's when x goes near 0. You also have to consider what happens when x goes to infinity. When x goes to infinity, the e to the minus x is the dominant part: even if a is a million, this would be x to the 999,999, completely irrelevant compared to e to the minus x, which is exponential decay. So that integral is gonna converge. So okay, it exists for any real a greater than 0.
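As a quick sanity check of this definition (my own illustration; it uses scipy, which is just one convenient way to do the numerical integral), you can compare it against Python's built-in gamma function:

```python
import math
from scipy.integrate import quad

def integrand(x, a):
    # the integrand of the definition: x^a * e^(-x) / x = x^(a-1) * e^(-x)
    return x ** (a - 1) * math.exp(-x)

for a in [0.5, 1.0, 2.7, 5.0]:
    value, _ = quad(integrand, 0, math.inf, args=(a,))
    print(a, value, math.gamma(a))  # the two values should agree
```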
This is an extension of the factorial function. It would be a little more convenient if gamma of n were n factorial, but it turns out that gamma of n is n minus 1 factorial. So you just have to be a little bit careful; that's just for historical reasons, pretty much. This is for any positive integer n. So as I said, there are other ways you can extend
factorials to the positive real numbers. But this one turns out to be the one
that has at least found the most different applications in math,
so this is the most natural and best one known in a certain sense. So why is this true? Well, let me write down
another identity for gamma: Gamma of (x + 1) equals x times Gamma of x. So this is a recursive formula for gamma. You don't need to know much about
the gamma function for this course. Depending on what kind of math you do
later, you might need a lot of gamma stuff, but for our purposes, you should
know the definition, you should know that it extends factorials in this way,
you should know this identity. Notice that this is very
closely related to this, right, because let's just look for
a second at gamma of 1. We let a equal 1, x is cancel,
integral of e to the minus x, we've seen several times that
that's just 1, so gamma 1 equals 1. And then if we plug that in here, it says that gamma of 2,
that's gamma of (1 + 1), equals 1 times gamma of 1, which is 1. It would be nice if gamma of 2 were 2 factorial, but according to this, gamma of 2 is still 1 factorial. So that's still correct: gamma of 2 is 1. And now if we look at gamma of 3,
well we do 2 + 1, 2 gamma of 2, and we would get 2, and so on. So, notice that this is the same recursive
formula that the factorial satisfies. So basically this formula immediately implies this one: we have the starting point, right? And then both factorials and gammas satisfy the same recursion, so they must stay the same forever.
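A one-line check of both facts (mine, just for illustration), that gamma of n is (n - 1)! and that gamma of (x + 1) is x times gamma of x:

```python
import math

for n in range(1, 10):
    # gamma(n) should match (n-1)!
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

for x in [0.5, 1.0, 2.5, 7.3]:
    # the recursion gamma(x+1) = x * gamma(x), up to rounding
    assert math.isclose(math.gamma(x + 1), x * math.gamma(x))
print("both identities check out numerically")
```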
Okay, and proving this identity is just integration by parts, which I'll let you do if you feel like it. If you try to do this integral using the most basic integration by parts, like AP calculus, you would not just get an answer for the integral; what you would get is something that
looks exactly like the gamma function again and you would get this identity, just from
a straightforward integration by parts. Okay, so that's the gamma function and how it connects to factorials. One other thing about the gamma
function that's worth knowing is what's gamma of one-half? Just in case you ever wonder what's
the factorial of negative one-half. What does that mean? Well, gamma of one-half,
so when the argument is an integer, we just get factorials, which are integers. Gamma of one-half is the square root of pi. Kind of a curious-looking fact. And where have we seen square roots of pi before? The normalizing constant from the normal. So we proved that the normalizing
constant for the normal is we have 1 over square root of 2 pi,
blah blah blah, for the normal PDF. So if you let a equal one half here,
it doesn't look exactly like the normal. But actually, it's similar to what
you would get if we have this normal, take our normal density, let's say,
e to the minus x squared over 2 dx. Let's say, we were just trying
to do that integral, right? When we did this, we used the cool trick of writing it down twice: we took the integral, couldn't do it, wrote it down a second time, and then we could do it. But if we didn't know that trick, we might
say, well, let's make a substitution. For example, I guess the most obvious thing to do
might be to let u equal x squared over 2. Then du = x dx. So, I mean, that would be
one thing we can try. This would be a natural thing to try. Just because it's kind of annoying having
this x squared up in the exponent. So if I do this then at least it's
just gonna be e to the minus u, that's very nice up in the exponent. But the problem is we don't actually have a x here, right? So you can still do the substitution,
but what you're gonna get is, you'll have an x there, and
x is essentially the square root of u. I mean, there are some other constants going on. So you're gonna get a square root of u,
which is like what you would have here, or you would be dividing
by square root of u. So anyway, I'm just saying. I mean, you can work through
the substitution if you want. But I'm just saying if you do this kind
of substitution on the normal integral. You're gonna reduce it to something
that looks like gamma of one half. So you could either think of
that as: if you knew this fact, it would give you another way to get the normalizing constant of the normal. Or, the way we actually have it: we already know the normalizing constant of the normal from a different method, so then we can prove this fact.
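To spell out the substitution he's describing (my own write-up, not on the board): with u = x²/2, we have x = √(2u) and dx = du/√(2u), so

$$\int_0^\infty e^{-x^2/2}\,dx = \int_0^\infty e^{-u}\,\frac{du}{\sqrt{2u}} = \frac{1}{\sqrt{2}}\int_0^\infty u^{1/2}\,e^{-u}\,\frac{du}{u} = \frac{\Gamma(1/2)}{\sqrt{2}}.$$

And since the normal result says the left side is √(2π)/2 = √(π/2), equating the two gives Γ(1/2) = √π.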
So once we have gamma of one-half, then we know gamma of three-halves, since 3/2 = 1 + 1/2. This formula just says gamma of 3/2 is 1/2 times gamma of 1/2, so that would be (square root of pi) over 2, and so on. So then we could get gamma of
five-halves, gamma of seven-halves, and all the half-integers that way.
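A quick numeric check of those half-integer values (again my own illustration):

```python
import math

print(math.gamma(0.5), math.sqrt(math.pi))                      # gamma(1/2) = sqrt(pi)
print(math.gamma(1.5), math.sqrt(math.pi) / 2)                  # gamma(3/2) = sqrt(pi)/2
print(math.gamma(2.5), (3 / 2) * (1 / 2) * math.sqrt(math.pi))  # gamma(5/2)
```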
All right, so that's just a quick introduction to the gamma function. What's on this board is all you need to know about the gamma function. But now let's talk about the gamma distribution, to connect it back to probability. Okay, so here's kind of a simple, naive way to create a PDF based on the gamma function. We have this integral here; that's just the definition of gamma of a,
okay? And suppose we wanna somehow
create some kind of PDF that's gonna be related to the gamma
function, and we want a valid PDF. And to have a valid PDF, all we need
is something that's non-negative and integrates to one. What would you do? Normalize it, how? How would you normalize this thing? >> [INAUDIBLE]
>> Just divide by that value. So if we divide both sides by gamma of a,
that's a PDF. So 1 = integral 0 to infinity,
1 over Gamma(a), x to the a, e to the minus x, dx over x, right? That's what I call a simple, naive trick: we have that integral, so just divide by Gamma of
a, and now we have a PDF. So this thing here, everything we're integrating, including this 1/x (it's easy to forget the 1/x; you could put it up there as x to the a minus 1, but then it's easy to forget whether that should be a or a minus 1, so I like to write it this way), is called the Gamma(a, 1) PDF. There are two parameters; we only see the a right here
because the other parameter is 1. The gamma is very closely related to the exponential distribution. So the more general one
would be Gamma(a, lambda). And that one is obtained just by, well, remember how we got from Exponential(1) to Exponential(lambda)? It's the exact same idea: we're just putting in a scale, where lambda, or 1 over lambda, is the scale. So to get this one we're just
gonna make a change of variables. Let's say we want Y to be Gamma(a, lambda), and we wanna find its PDF. We're gonna define it the exact same way we did for the exponential: just let Y equal X over lambda. We just defined Gamma(a, 1), that's this PDF; now to get Gamma(a, lambda), all we're gonna do is take a Gamma(a, 1) and divide by lambda, okay? And so now, at this point, this is just a really
simple transformation, right? Just multiplying by a constant, and we've been talking about what happens
with more complicated transformations, non-linear stuff and
all kinds of complicated transformations. Here we're just multiplying by a constant. So if we want the PDF of Y, just to remind you of your change of variables: the PDF of Y is fx(x) dx/dy, right? And in this case we
have y = x over lambda, which we could of course
write as x = lambda y. So dx/dy = lambda. So we just right down the gamma PDF, and we just replace x by, x is lambda y, so I'm just replacing x by lambda y, to the a, e to the- lambda y, 1/x. And x is lambda y. Times lambda, so
that lambda cancels that lambda, and that's what it looks like. I should emphasize the possible values: this one is for x greater than 0, and this one is for y greater than 0. So like the exponential, the gamma is a continuous distribution on the positive real numbers, okay?
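Here's a small simulation sketch of that scaling story (mine, using numpy; note that numpy parameterizes its gamma generator by shape and scale, where scale = 1/lambda):

```python
import numpy as np

rng = np.random.default_rng(0)
a, lam = 3.0, 2.0

x = rng.gamma(shape=a, scale=1.0, size=10**6)        # Gamma(a, 1) draws
y = x / lam                                          # should act like Gamma(a, lam)
z = rng.gamma(shape=a, scale=1.0 / lam, size=10**6)  # direct Gamma(a, lam) draws

print(y.mean(), z.mean())  # both near a / lam
print(y.var(), z.var())    # both near a / lam**2
```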
So that's a cute trick for converting this famous function into a distribution. But that doesn't really say it's gonna be an important distribution, right? That's just a math-trick story; it's not a probabilistic or
statistical story. So I wanna show you now how
the gamma distribution relates to the exponential distribution, okay? That's why we use the letter lambda, and
this is analogous to what we did for the exponential, so
why does that work out? So for that we just need a picture. That's actually related
to the Poisson also. Gamma relates to the normal
distribution, as we'll see later. It relates to the beta distribution, it
relates to the exponential distribution, it relates to the Poisson distribution. And by the way, so we've already
discussed all the famous discrete distributions we need in this course. We're almost done with the continuous distributions as well. Not done studying them, but done with the famous ones that you need to know by name. There are a few more, but all of
the remaining ones are just variations, in some sense, of gammas, possibly combined with normals. And as you'll see later,
Gamma is closely related to normal. So once we have the Gamma and the normal, we can create the rest
of the distributions we'll need. So to talk about the Gamma,
and Exponential connection, we need to think about a Poisson process. Okay, so, we've only talked a little
bit about Poisson processes; if you take the stochastic processes course, you can do a lot more Poisson process stuff. For our purposes we just need
to know a couple of basic facts. First of all, what is a Poisson process? You've seen them before, but just to quickly remind you, we're
imagining emails arriving in an inbox. We count the number of emails in some time interval: let's say N sub t = the number of emails up to time t, and we just have this timeline here. And the Poisson process is called
a Poisson process because we're assuming that that count is Poisson with parameter lambda times t, where lambda is the rate. I stated this as going from zero to t, but we're actually assuming it's true in any time interval of length t: the number of arrivals is gonna be Poisson(lambda t). And there's only one other assumption for
a Poisson process, that is that the number of arrivals in
disjoint intervals are independent. That is the number of emails you get from
this time to this time is independent of the number of emails you get from
this time to this time, for example. So there are only those two assumptions.
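Here's a toy simulation of this setup (my own sketch in numpy; it builds arrivals from exponential gaps, which is the equivalent description he recalls next):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, t = 2.0, 3.0

def count_up_to(t, lam, rng):
    # accumulate i.i.d. Exponential(lam) gaps until we pass time t
    total, count = 0.0, 0
    while True:
        total += rng.exponential(1.0 / lam)  # numpy's scale is 1/lambda
        if total > t:
            return count
        count += 1

counts = [count_up_to(t, lam, rng) for _ in range(10**5)]
print(np.mean(counts), np.var(counts))  # both near lam * t, as Poisson predicts
```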
So a few weeks ago we saw what the connection with the exponential distribution is, and that was before we had even defined the exponential distribution. So I'll remind you of what that was. We're just drawing an X
every time you got an email. That kind of picture,
that is the picture we have in mind. So what we proved a week ago was
that if we call this one T1, T1 is the time of the first email,
that's exponential lambda. And the proof for that is just to say,
Just to remind you, what's the probability
that T1 is greater than t? That's the same thing as saying at time t,
I haven't yet gotten any email. That's the same thing as
saying that N sub t equals 0, cuz that's just the same thing,
no emails up until time t. Well, that's just given by
Poisson e to the minus lambda t. The rest of the Poisson stuff is just 1. So it's just this one line calculation
that shows that, and then say that's just one minus the exponential lambda CDF so
that's true. So this first time is exponential. Using a similar argument we get that
that's the time from here to here. Using the same argument repeatedly,
The time you have to wait from the first email to the second
email is also gonna be an exponential and it's independent of the first exponential. Just use the memoryless property. Memoryless property says,
no matter how long you've waited, waiting for this second email but
starting at this point in time it's just starting over again so
it's just the same problem again. Same method says that here to here
is another exponential lambda, and from here to here is
an exponential lambda. Exponential lambda, and so on. So all of these times between
arrivals are Exponential(lambda). So that's actually an equivalent way to state the Poisson process: rather than starting with the Poisson, we could say that the inter-arrival times, that is, just these distances between Xs on the timeline, are i.i.d. Exponential(lambda). So those are the inter-arrival times. But what if we wanted to
know the actual times? So if we call this T2 and
T3, T4, T5, and so on. Well, that's the actual
time on the timeline, not just the distance between two times. We want to know that distribution. So let's define Tn to be the time of the nth arrival: nth email, nth phone call, or whatever. To get Tn, we
just add up the first waiting time, then the waiting time from the first to the
second, and the second to the third, and so on. So we can just think of that as
the sum of Xj, j equals 1 to n, where these Xj's are these
inter-arrival times, which we just said are exponential lambda,
i.i.d. This is the most important story for
the Gamma distribution, at least for the integer case. So right now I'm assuming n is an integer. Over here,
a did not need to be an integer. But if we assume that it is an integer, then this is gonna be Gamma(n, lambda). We haven't shown this yet, so that's
something we need to prove, that this kind of a sum of i.i.d exponentials will be
Gamma, that's what we're gonna show next. But you can see the analogy, remember
the negative binomial and the geometric? The geometric you're waiting for
one success in discrete time. Negative binomial you're waiting for
however many successes, say seven, right. You can think of the negative binomial as a sum of geometrics: you wait for your first success, then you wait for your second, and then your third, and so on, in discrete time.
time analog of the geometric. So this is the continuous time
analog of the negative binomial. How long do you have to wait for
n successes in continuous time? Where in this case,
success just means getting an email. So that's a pretty weak
definition of success. But anyway, that's what's going on. >> [LAUGH]
>> So success is however you
wanted to define it. But in this case,
we just mean the arrivals.
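Before the proof, a quick empirical sketch (mine, not from the lecture): sum n i.i.d. Exponential(lambda) draws and compare with draws made directly from the gamma distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam = 5, 2.0

# T = X_1 + ... + X_n with the X_j i.i.d. Exponential(lam)
t = rng.exponential(1.0 / lam, size=(10**5, n)).sum(axis=1)
g = rng.gamma(shape=n, scale=1.0 / lam, size=10**5)  # Gamma(n, lam) draws

for q in [0.25, 0.5, 0.75, 0.9]:
    # matching quantiles suggest the two samples share a distribution
    print(q, np.quantile(t, q), np.quantile(g, q))
```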
All right, so let's prove this fact now, because just from what I've written here, I'm saying this is true, but here I said add up exponentials, and over there I said write down this PDF with the gamma function. What does one have to do with the other? Not yet clear, right? So we need to check that it's true, okay? So let's do that. There are two ways to do it, at least two ways that we
know how to do right now. One would be to do a convolution. So, that thing over there is
just a convolution, right? We're adding up i.i.d. exponentials, so we could do a convolution integral. You could do that for practice; it's not that bad, actually, stuff will cancel. You could convolve X1 and X2, then take that thing and convolve it with X3, and then take that and convolve it with X4, but you'd get pretty tired of it pretty quickly. But hopefully before you got tired of it,
you would see a pattern and you could prove it by induction. But I don't wanna do it that way. We have a much better method
available, that is, MGFs, okay? MGFs, right, because we're adding up i.i.d. things.
the MGF of an exponential. It's a nice looking thing, and so
we may as well use that, okay? So that's what we're
gonna prove right now. So, the proof. Let's say T = the sum of Xj, j = 1 to n, with the Xj i.i.d. Exponential(lambda). Let's do Exponential(1) first, and show that T is Gamma(n, 1). As long as we can do this when lambda
is 1, then we immediately get this for all lambda, because you could just
multiply or divide by lambda. Lambda is just a rescaling, and
so just scale everything later. It's not really harder to do it with
general lambda here, it just makes the notation more cluttered, having all
these lambdas floating around everywhere. So I'd rather just set it equal to 1,
and then we can always multiply or divide by lambda later. It's not a big deal. All right, so let's do that using the MGF. So the MGF of XJ, MGF of X1, since they're our IID,
they all have the same MGF; it's the same as the MGF of X2 and all the Xj's. It's just 1 over (1 - t), which we can also think of as a geometric series. We used this before to get all the moments of X1 at once, using the geometric series, and this is valid for t less than 1, okay? So we did that before. Therefore, we can write down the MGF of T. I'll call it Tn,
just as a reminder that there are n of them. The MGF of Tn is easy: just raise this to the nth power, right? Cuz when we add independent things, we multiply the MGFs. So just multiply that by itself a bunch of times, and we have 1 over (1 - t) to the n, for t less than 1, okay? Very little work required; I just put an n there. That's a lot easier than
doing a convolution. On the other hand, how do we know
that this is the MGF we wanted? Well, to answer that question, we need to look at the MGF of
a Gamma distribution, right? I'll make a new notation just so
that I'm not doing circular reasoning. Well, let's just say Y: let Y be Gamma(n, 1), and
let's find its MGF, okay? And we said before that there's no
masquerading distributions that pretend to have a certain MGF and aren't actually
what they appear to be, right? So if, in fact, Y has this as its MGF,
then we're done, right? So that's what we need to show,
so find its MGF. Well, its MGF is the expected value of e to the tY. By LOTUS, I'm just gonna write e to the ty and then write down the gamma PDF, which was, taking out the constant, 1 over Gamma(n) (since a is n here), y to the n, e to the minus y, dy over y. That's just this thing, and
just using LOTUS, okay? You can write that down right away; that's why it's LOTUS. It shouldn't require much thought to write this down, okay? Now, it may take some thought to do this integral. Well, I guess we can
simplify a little bit. Collect these two terms. So this is really just 1 over Gamma of n, integral 0 to infinity,
y to the n, e to the minus (1 - t) times y, collecting these terms together, dy over y. Cuz we have e to the minus y and
we have e to the TY, okay? All right, so
that may look like a difficult integral. But on the other hand,
if you stare at it for a little while,
it should start to look familiar. It kind of looks pretty similar to
the integrals we've just been doing. And it has some extra constant up here,
but this should seem pretty reminiscent
of the Gamma function itself, right? I mean just in general in this course,
it's not really a big deal how good you are at integration by parts and stuff
like that, you'll rarely have to do that. But what you do need to be able to do
is pattern recognition and see well, this looks like a Gamma integral. it's just I have this extra constant here,
that's really no big deal, right? Just make a change of variables just
to make this e to the -x again, right? So we're just gonna let, x = (1-t)y. So that dx = (1-t)dy, Okay? So t is less than 1, so that means that this is
actually still positive. So the limits of integration, so now,
let's just rewrite that integral. So I'm just making a change of variables
just to make that e to the minus x again. It already looks very similar to a gamma; now let's make it extremely similar, that's all I'm doing. So we do 1 over Gamma(n). The limits of integration stay 0 to infinity,
because when Y is 0, X is 0, and one goes to infinity,
the other goes to infinity. And y to the n is just, well, y is x over (1 - t), so we're gonna get (1 - t) to the minus n. That looks pretty promising, right,
what we wanted over there. And then we're gonna have x to the n,
e to the -x. And one convenient thing about
kind of keeping this as dy over y, instead of cancelling it. Is that this multiplicative
thing is gonna cancel, so dy over y is actually
the same thing as dx over x. That actually makes this
kind of thing easier. And that is again gamma of n,
not gamma of a, or just let a equal n. So that's why it's handy,
dy over y is same thing as dx over x, just makes it a little bit
easier with the algebra. Well, that was exactly the gamma function
again, so that's just 1- t to the -n, which is what we wanted. It's the same thing. MGF is the same, so that proves
that this statement is true, okay? So that's nice; that's the connection with the exponential.
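And a numeric sketch of the MGF identity we just proved (my own check): estimate E[e^(tT)] by simulation and compare with 1 over (1 - t) to the n.

```python
import numpy as np

rng = np.random.default_rng(3)
n, t = 4, 0.3  # need t < 1 for the MGF to exist

samples = rng.exponential(1.0, size=(10**6, n)).sum(axis=1)  # sums of Expo(1)
mgf_hat = np.exp(t * samples).mean()  # Monte Carlo estimate of E[e^(tT)]
print(mgf_hat, (1 - t) ** (-n))       # should be close
```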
Well, another thing we need to do is get the moments. Again, that sounds like it could be a nasty problem, but it actually is not bad at all once you see the pattern. Okay, so let's let X ~ Gamma(a, 1). And by the way, this calculation here,
in this last line here, I never assumed that n was an integer. So this shows that this is the correct MGF even if n is any positive real number. To have the interpretation as a sum of exponentials we need an integer, but this is the MGF in general, cuz the calculation didn't rely on n being an integer. All right, so now let's get some moments.
but actually, I think in this case it's even
easier just to directly use LOTUS. So let's say we want to find E(X to the c), where I don't even need c to be an integer; c is just some number. I'm not saying this will exist for all c,
we'll have to see whether it exists, okay, but c is just any constant. In particular,
we can let c = 1 to get the mean, c = 2 to get the second moment,
then we can get the variance of that. Well, it's just LOTUS, so
this should be pretty quick. I want E(X to the c); change big X to little x, just LOTUS, right, x to the c. And then I'm just gonna write down the gamma PDF again, right? So we had 1 over Gamma(a), x to the c, and then we had an x to the a,
e to the minus x, dx over x. Well okay, obviously we can combine these: 1 over Gamma(a), integral from 0 to infinity, x to the (a + c), e to the minus x, dx over x.
very familiar again, right? Just pattern recognition, that's x
to a power, e to the -x dx over x. That looks very very much like
the gamma function again, right? That is the gamma function,
that's just Gamma(a + c) over Gamma(a). That's just the gamma function. Or if you want, you can multiply and
divide by Gamma(a + c) and say, we just integrated
the Gamma(a + c) PDF. Must get 1,
if it's a PDF it has to integrate to 1. So you don't need to do any calculus here,
really, that was just algebra. This times this, and
then recognizing that's a gamma, right? So gamma has very, very nice properties in that sense,
beta has some similar nice properties. All right, so
that's Gamma(a + c) over Gamma(a). And our assumption has to be that a + c is positive, because the gamma function is only defined on positive numbers. If a + c is less than or equal to 0,
then it just doesn't exist. So okay, let's use this to quickly get the mean. E(X) = , now let's just let c = 1, so that's gonna be Gamma(a
+ 1) over Gamma(a). But then remember that identity,
Gamma(a+1) is a Gamma(a). So this is a Gamma(a) over Gamma(a),
which just simplifies to a. A very easy-looking formula for the mean; now let's check whether this
makes sense intuitively. In the case when a is an integer, and we use that interpretation over there,
we already knew the answer, right? Cuz exponential 1 has mean 1,
you add up n of them, you have to get mean n by linearity. So we already knew this is
correct in the integer case, and this says it's true even
if a is not an integer. All right, and
let's get the second moment, E(x squared). Well, that's just Gamma(a
+ 2) over Gamma(a). Gamma(a + 2) is (a + 1) times Gamma(a + 1). I'm just using the same identity again,
right? Just replace a by (a + 1), so
it's (a + 1) Gamma(a + 1). But gamma of (a + 1) is a Gamma(a),
so again, the Gamma(a)'s cancel. And we would just get a squared + a,
and that tells us the variance. Variance is E(X squared)
minus E(X) squared, so we're subtracting a squared, so that
tells us the variance of X also equals a. Again, that's easy to remember. So it says that Gamma(a, 1) has mean a and variance a. Sounds a little like the Poisson,
where the mean equals the variance. But for the Poisson, the mean is always equal to the variance, whereas here it's kind of a special case, just because I let lambda equal 1. So now if we bring back the lambda,
Gamma(a, lambda). Well, we defined that just by rescaling, by dividing by lambda. So Gamma(a, lambda) has
mean a over lambda, but the variance becomes
a over lambda squared. Because remember, when you multiply by a constant and take the variance, the constant comes out squared. So Gamma(a, lambda) has this mean and
this variance. But notice that it was just easier to
do this thing with lambda = 1 first, as long as we don't forget at
the end to bring back the lambda.
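One last numeric sketch (mine), checking E(X to the c) = Gamma(a + c)/Gamma(a) and the rescaled mean and variance:

```python
import math
import numpy as np

rng = np.random.default_rng(4)
a, lam, c = 2.5, 3.0, 1.7

x = rng.gamma(shape=a, scale=1.0, size=10**6)  # Gamma(a, 1)
print((x ** c).mean(), math.gamma(a + c) / math.gamma(a))  # E(X^c)

y = x / lam  # Gamma(a, lam)
print(y.mean(), a / lam)       # mean a / lambda
print(y.var(), a / lam ** 2)   # variance a / lambda^2
```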
All right, so this is just kind of an introduction to the gamma distribution and the gamma function, and how it connects with the exponential. Next time, I'll show you how it
connects to the beta and the normal. Thought we'd finish
a little bit early today, cuz I don't wanna scare you again
with the beta til next time. See you on Friday.