Lecture 24: Gamma distribution and Poisson process | Statistics 110

Captions
Thought we'd start with a little puzzle I'm trying to solve. Maybe you can help me with the puzzle, okay? Maybe a little less volume, thanks, I don't need to echo that much. All right, so here's the puzzle. I thought this was a pretty interesting one, and maybe related to what we're gonna do. Okay, so the question is, what's the next number in the sequence? 0, 1, 2, any ideas? >> 4. >> 4. >> 3. >> 3. Any other suggestions? >> 1. >> 1. >> 0. >> Okay, no one has said the same number twice, so we have no consensus at all. >> [LAUGH] >> We have 0, 1, 4. >> 3. >> 3, 2, >> 5. >> 5. 42, that's a good one. 1 over e would be a good guess. >> [LAUGH] >> Okay, any other suggestions? One-half, one-tenth, okay. Well, I won't ask for a reason for each of those. Is there any majority opinion on this? >> 3. >> 3, that's a, why do you say 3? >> Integers. >> Integers. Okay, 0 is an integer. Why 3? Yeah? >> It's a value that the Poisson could take. >> It's a value that the Poisson could take, okay. But I don't have to list them in order though, I could have listed them out of order. All right, well, okay, a lot of you seem to like the number 3, but I haven't yet heard a good reason why. I'm kind of interested. >> [CROSSTALK] >> It's an arithmetic sequence, I hadn't noticed that. So I guess you're pointing out that it's arithmetic, hey, that's pretty cool. So you go from here, add one, add one, add- >> [LAUGH] >> Arithmetic sequence, okay, that might have been what the puzzle writer had in mind. I've been trying to solve this for the past couple days, I got stuck. I haven't thought of 3 yet. The best answer I thought of so far was 720 factorial. >> [LAUGH] >> We can debate a little bit, is that more correct or is 3 more correct? Anyone, assuming it does go this way, anyone wanna guess the next term? >> 3. >> 3 [LAUGH] >> [LAUGH] All right, I think you got it. All right, so let me tell you why I'm talking about this sequence. Well, I've always loved puzzles, since the time I was little. But I hated this kind of puzzle, because there never seems to be a principle for coming up with an answer. I make up one thing, and the answer is supposed to be something else. Well, what's the measure of which answer is better, right? Maybe it's a measure of which answer is simplest, but will two people always agree on what's simpler? No, right? You need some complexity measure or something to actually make that into a valid, well-defined problem. So in a sense, this is just as good an answer as 3. Some of you said it might be an arithmetic sequence, and that's a good suggestion. But I was thinking of: 0, well, 0 is 0; 1 factorial; 2 factorial factorial; 3 factorial factorial factorial; 4 factorial factorial factorial factorial, which is what one of you just suggested, I think. That's 0 with no factorials, 1 with one factorial, 2 with two factorials, etcetera. Okay, and I probably should put parentheses, because this double-factorial notation is also used to mean the skip factorial, where you go down by two numbers each time. But anyway, here I mean iterating: take 3 factorial, then the factorial of that, and so on. All right, so that's one interpretation it could be. So the general question is, if we have a sequence of numbers, how do we extend it, okay? And I was thinking of this example cuz I was thinking about factorials and how I wanna extend factorials. So, like, if you ever wondered what's pi factorial, we're gonna talk about questions like that.
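The rule he lands on, that the nth term is n with n factorials applied, is easy to play with numerically. Here's a minimal Python sketch (not from the lecture; the helper name term is made up), which also shows why the term after 2 is so absurdly large:

```python
from math import factorial

def term(n):
    # n with n factorials applied: term(2) = (2!)! = 2, term(3) = ((3!)!)! = 720!
    result = n
    for _ in range(n):
        result = factorial(result)
    return result

print([term(n) for n in range(3)])  # [0, 1, 2]
print(len(str(term(3))))            # 720! runs to 1747 digits; term(4) is hopeless
```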
If you didn't ever wonder what pi factorial is, that's okay, you'll see other uses for this kind of thing. The question is, if you plot it, what does the factorial function look like? Well, 0 factorial is 1. 1 factorial is 1. 2 factorial is 2. 3 factorial is 6. 4 factorial is 24. 5 factorial is 120. 6 factorial is 720. I can't draw it anymore; it starts out pretty small and then it grows extremely fast. A very, very beautiful formula that gives an approximation for the factorial is called Stirling's formula, which, if you haven't seen it before, everyone should know. It's both beautiful and useful. It says that n factorial is approximately the square root of 2 pi n, times (n over e) to the n. This is actually an extremely good approximation. Even if you try this out on a calculator where n is like 12 or 20, it's very, very good. Not only that, but this approximation is so good that if you take the ratio of n factorial divided by the approximation, it will converge to 1 as n goes to infinity. So that's just a cool-looking formula. By the way, we can prove this using probability; I'm not going to do that today, but I might do it at some point. We can give it a probability interpretation by thinking about Poissons. But anyway, this gives us some sense of how fast the factorial grows. The square root part is not so important; the main thing driving it is this (n over e) to the n. So basically we have n to the n behavior, discounted by e to the n, but if you're letting n go to infinity, the n to the n dominates. So anyway, that's Stirling's formula, and it gives a sense of how fast this thing grows. But now my question is, how would you connect the dots? That is, this function was only defined on the non-negative integers, and we want to interpolate. Well, there are infinitely many ways you could draw some function that connects those dots. The question is whether there's some way that's more natural. So if I ask you what's pi factorial, and you have as many ways as you want to connect the dots between 3 and 4, then what should you do? Well, it turns out that there's one really standard, famous way to do that, which some of you may have seen, but I'm not at all assuming that everyone has. It's called the gamma function, so that's what we need to start with. We need the gamma function because we're going to be introducing something called the gamma distribution. We're not done with the beta; we were doing the beta a lot last time, and hopefully it wasn't too scary, despite it being Halloween. We'll come back to it; the beta turns out to be extremely closely connected with the gamma. So to understand the beta, we need to do the gamma, which we're about to do. The gamma distribution is named the gamma distribution because it stems from the gamma function, which is one of the most famous functions in all of mathematics. Like if you had a top ten list of the most famous functions in math, the gamma function should be on it. I haven't seen anyone make quite a list like that; I've seen people make lists of the ten most famous constants, but not the ten most famous functions. I'm sure someone has, and if they did a good job, they would put the gamma function on the list, okay? So it's very, very important in math, not just in probability and statistics. It is important in statistics, which is why we need it right now, but it has its own life outside of this subject.
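Stirling's formula is easy to check numerically; here's a quick Python sketch (the sample values of n are arbitrary):

```python
from math import factorial, sqrt, pi, e

# Stirling's formula: n! is approximately sqrt(2*pi*n) * (n/e)**n,
# and the ratio of the exact value to the approximation converges to 1.
for n in (5, 12, 20):
    approx = sqrt(2 * pi * n) * (n / e) ** n
    print(n, factorial(n), factorial(n) / approx)  # ratios roughly 1.017, 1.007, 1.004
```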
All right, so here's the definition of the gamma function; that's just a capital Greek letter gamma. Gamma of a is defined by the following integral: the integral from 0 to infinity of x to the a, e to the minus x, dx over x. Usually you'll see it written with an x to the a minus 1 here, which of course is the same thing, cuz you can cancel out one of the x's. But for certain reasons, it's convenient to keep one x on hold over here. Just a different way to write it, but the same thing. And this is defined for any real a greater than 0. If you try letting a equal 0 here, or if you let a be negative, and you think about how this integral is gonna behave, it's not gonna behave well; it's not gonna exist. So, for example, if a is 0, this part is gone, and we just have e to the minus x over x. Then look at what happens to that near 0, okay? So I'm not going to rewrite the whole thing; just pretend that the x to the a is gone because we let a equal 0. Look at what happens when x gets very close to 0: e to the minus x is just close to 1, so it's not doing much, right? e to the 0 is 1, so this part is very well behaved, it's just approximately 1, nothing much going on. So what's driving the integral is this 1 over x. The integral of 1 over x is log x, log of 0 is minus infinity, and that causes major problems. Anyway, as long as a is greater than 0, this is okay. Same argument: what happens when x is near 0? Well, this e to the minus x doesn't do very much, but here we just have x to the a minus 1, and integrating that is gonna be fine. You can just do the antiderivative and you'll see it converges. So it's okay when x goes near 0. You also have to consider what happens when x goes to infinity. When x goes to infinity, e to the minus x is the dominant part. Even if a is a million, this would be x to the 999,999, completely irrelevant compared to e to the minus x, which is exponential decay. So that integral is gonna converge. So okay, it exists for any real a greater than 0. This is an extension of the factorial function. It would be a little bit more convenient if gamma of n were n factorial; it turns out that gamma of n is (n minus 1) factorial, so you just have to be a little bit careful. That's pretty much just for historical reasons. This is for any n a positive integer. So as I said, there are other ways you can extend factorials to the positive real numbers, but this one turns out to be the one that has found the most different applications in math, so it's the most natural and best-known one in a certain sense. So why is this true? Well, let me write down another identity for gamma: gamma of x plus 1 equals x times gamma of x. So that's a recursive formula for gamma. You don't need to know much about the gamma function for this course. Depending on what kind of math you do later, you might need a lot of gamma stuff, but for our purposes, you should know the definition, you should know that it extends factorials in this way, and you should know this identity. Notice that this identity is very closely related to the factorial fact, right? Because let's just look for a second at gamma of 1. We let a equal 1, the x's cancel, and the integral of e to the minus x, we've seen several times, is just 1, so gamma of 1 equals 1. And then if we plug that in here, it says that gamma of 2, that's 1 plus 1, equals 1 times gamma of 1, which is 1. It would be nice if gamma of 2 were 2 factorial, but according to this, gamma of 2 is still 1 factorial.
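Python's standard library ships this function as math.gamma, so the facts just stated are easy to verify numerically (a quick sketch):

```python
from math import gamma, factorial, isclose

# Gamma extends the factorial: Gamma(n) = (n - 1)! for positive integers n.
print(all(isclose(gamma(n), factorial(n - 1)) for n in range(1, 15)))  # True

# The recursion Gamma(x + 1) = x * Gamma(x) holds for non-integer x too.
x = 4.3
print(isclose(gamma(x + 1), x * gamma(x)))  # True
```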
So that's still correct, gamma of 2 is 1. And now if we look at gamma of 3, well, we do 2 plus 1: that's 2 times gamma of 2, and we would get 2, and so on. So notice that this is the same recursive formula that the factorial satisfies. So basically this formula immediately implies the factorial fact: we have the starting point, right? And then both things, both factorials and gammas, satisfy the same recursion, so they must stay the same forever. Okay, and proving this identity is just integration by parts, which I'll let you do if you feel like it. If you tried to do this integral using just the most basic integration by parts, like AP calculus, you would not get an answer for the integral directly; what you would get is something that looks exactly like the gamma function again, and you would get this identity, just from a straightforward integration by parts. Okay, so that's the gamma function and how it matches up with factorials. One other thing about the gamma function that's worth knowing is: what's gamma of one-half? Just in case you ever wondered what the factorial of negative one-half means. Well, when the argument is an integer, we just get factorials, which are integers. Gamma of one-half is the square root of pi. Kind of a curious-looking fact. And where have we seen square roots of pi before? The normalizing constant from the normal. We proved that the normalizing constant for the normal is 1 over the square root of 2 pi, blah blah blah, for the normal PDF. So if you let a equal one-half here, it doesn't look exactly like the normal. But actually, it's similar to what you would get from our normal density. Say we were just trying to do the integral of e to the minus x squared over 2, dx. When we did this before, we used the cool trick of writing it down twice: we took the integral, couldn't do it, wrote it down a second time, and then we could do it. But if we didn't know that trick, we might say, well, let's make a substitution. I guess the most obvious thing to do might be to let u equal x squared over 2, so du = x dx. That would be one natural thing to try, just because it's kind of annoying having this x squared up in the exponent. So if I do this, then at least it's just gonna be e to the minus u, which is very nice up in the exponent. But the problem is we don't actually have an x here, right? So you can still do the substitution, but what you're gonna get is an x in there, and x is essentially the square root of u, so you're gonna be dividing by the square root of u, which is like what you would have here; there are some other constants going on too. You can work through the substitution if you want. But I'm just saying, if you do this kind of substitution on the normal integral, you're gonna reduce it to something that looks like gamma of one-half. So you could either think of it as: if you knew this fact, it would give you another way to get the normalizing constant of the normal. Or, the way we actually have it, we already know the normalizing constant of the normal from a different method, so we can prove this fact. So once we have gamma of one-half, then we know gamma of three-halves, since three-halves = 1 plus one-half. So this formula just says it's one-half times gamma of one-half, which would be the square root of pi over 2, and so on.
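For reference, here is that substitution written out as a worked equation (a sketch, assuming the known normal constant, the square root of 2 pi):

```latex
\int_{-\infty}^{\infty} e^{-x^2/2}\,dx
  = 2\int_0^{\infty} e^{-x^2/2}\,dx
  \overset{u = x^2/2}{=} 2\int_0^{\infty} e^{-u}\,\frac{du}{\sqrt{2u}}
  = \sqrt{2}\int_0^{\infty} u^{1/2}\,e^{-u}\,\frac{du}{u}
  = \sqrt{2}\,\Gamma\!\left(\tfrac{1}{2}\right)
```

Since the left-hand side is the square root of 2 pi, this forces gamma of one-half to equal the square root of pi.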
So then we could get gamma of five-halves, gamma of seven-halves, and get all the half-integers that way. All right, so that's just a quick introduction to the gamma function. What's on this board is all you need to know about the gamma function. But now let's talk about the gamma distribution, to connect it back to probability. Okay, so here's kind of a simple, naive way to create a PDF based on the gamma function. We have this integral here; that's just the definition of gamma of a, okay? And suppose we wanna somehow cook up some kind of PDF that's gonna be related to the gamma function, and we want a valid PDF. To have a valid PDF, all we need is something that's non-negative and integrates to 1. What would you do? Normalize it, how? How would you normalize this thing? >> [INAUDIBLE] >> Just divide by that value. So if we divide both sides by gamma of a, that's a PDF. So 1 = the integral from 0 to infinity of 1 over gamma of a, times x to the a, e to the minus x, dx over x, right? That's what I call a simple, naive trick: we have that integral, we just divide by gamma of a, and now we have a PDF. So this thing here, everything we're integrating, including this 1 over x, is the PDF; it's easy to forget this 1 over x. You could put it up there as x to the a minus 1, but then it's easy to forget whether that should be a or a minus 1, so I like to write it this way. This is called the Gamma(a, 1) PDF. There are two parameters; we only see the a right here because the other parameter is 1. The gamma is very closely related to the exponential distribution. So the more general one would be Gamma(a, lambda), and that one is obtained just by, remember how we got from Expo(1) to any Expo(lambda)? It's the exact same idea: we're just putting in a scale. Lambda, or 1 over lambda, is a scale. So to get this one, we're just gonna make a change of variables. Let's say we want Y to be Gamma(a, lambda), and we wanna find its PDF. Then we're gonna define it, and this is the exact same thing we did for the exponential, by letting Y equal X over lambda. We just defined Gamma(a, 1), that's this PDF, and now if we want Gamma(a, lambda), all we're gonna do is take a Gamma(a, 1) and divide it by lambda, okay? And at this point, this is just a really simple transformation, right? We've been talking about what happens with more complicated transformations, non-linear stuff and all kinds of complicated transformations; here we're just multiplying by a constant. So if we want the PDF of Y, just to remind you of change of variables, the PDF of Y is fX(x) times dx/dy, right? And in this case we have y = x over lambda, which we could of course write as x = lambda y, so dx/dy = lambda. So we just write down the gamma PDF and replace x by lambda y: (lambda y) to the a, e to the minus lambda y, times 1 over x, and x is lambda y, times lambda, so that lambda cancels that lambda. And that's what it looks like. I should emphasize the possible values: this is for x greater than 0, and this one is for y greater than 0. So like the exponential, the gamma is a continuous distribution on the positive real numbers, okay? All right, so that's a cute trick for converting this famous function into a distribution. But that doesn't really say that it's gonna be an important distribution, right? So far that's just a math trick story; it's not a probabilistic or statistical story.
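A quick numerical sanity check of the Gamma(a, lambda) PDF just derived; note that scipy parameterizes the gamma by shape a and scale 1/lambda, so the two forms should agree (the values of a, lambda, and y below are arbitrary):

```python
from math import gamma, exp
from scipy.stats import gamma as gamma_dist

a, lam = 2.5, 3.0

def gamma_lambda_pdf(y):
    # The form derived in lecture: (1/Gamma(a)) * (lam*y)**a * exp(-lam*y) / y
    return (lam * y) ** a * exp(-lam * y) / (gamma(a) * y)

y = 0.7
print(gamma_lambda_pdf(y), gamma_dist.pdf(y, a, scale=1/lam))  # should match
```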
So I wanna show you now how the gamma distribution relates to the exponential distribution, okay? That's why we use the letter lambda, and this is analogous to what we did for the exponential. So why does that work out? For that we just need a picture, and it's actually related to the Poisson also. The gamma relates to the normal distribution, as we'll see later. It relates to the beta distribution, it relates to the exponential distribution, it relates to the Poisson distribution. And by the way, we've already discussed all the famous discrete distributions we need in this course, and we're almost done with the continuous distributions as well. Not done studying them, but done with the famous ones that you need to know by name. There are a few more, but all of the remaining ones are, in some sense, variations on gammas, possibly together with normals. As you'll see later, the gamma is closely related to the normal, so once we have the gamma and the normal, we can create the rest of the distributions we'll need. So to talk about the gamma-exponential connection, we need to think about a Poisson process. Okay, so we've only talked a little bit about Poisson processes. If you take Stat 171, which is stochastic processes, you can do a lot of Poisson process stuff; for our purposes we just need a couple basic facts. First of all, what is a Poisson process? You've seen them before, but just to quickly remind you, we're imagining emails arriving in an inbox, where N sub t equals the number of emails up to time t; we just have this timeline here. And the Poisson process is called a Poisson process because we're assuming that that count is Poisson with parameter lambda t, where lambda is the rate. I stated this as going from 0 to t, but we're actually assuming it's true in any time interval of length t: the count is gonna be Poisson(lambda t). And there's only one other assumption for a Poisson process, which is that the numbers of arrivals in disjoint intervals are independent. That is, the number of emails you get from this time to this time is independent of the number of emails you get from this time to this time, for example. So there are only those two assumptions. A few weeks ago, we saw what the connection with the exponential distribution is, and that was before we had even defined the exponential distribution, so I'll remind you of what that was. We're just drawing an x every time you get an email; that's the picture we have in mind. So what we proved back then was that if we call this one T1, where T1 is the time of the first email, then T1 is Expo(lambda). And the proof is just to say, just to remind you: what's the probability that T1 is greater than t? That's the same thing as saying that at time t, I haven't yet gotten any email, which is the same thing as saying that N sub t equals 0: no emails up until time t. Well, that's just given by the Poisson PMF: e to the minus lambda t, and the rest of the PMF is just 1. So it's just this one-line calculation, and then you say that's 1 minus the Expo(lambda) CDF, so the claim is true. So this first waiting time is exponential. Using a similar argument, we can handle the time from here to here.
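Those two defining properties are easy to see in a simulation: build arrival times from i.i.d. exponential gaps (this is the equivalent description derived next) and check that the count by time t behaves like Poisson(lambda t). A minimal numpy sketch with arbitrary parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t, trials = 2.0, 3.0, 100_000

# Interarrival gaps are i.i.d. Expo(lam); arrival times are their running
# sums; N_t counts how many arrivals land before time t.
gaps = rng.exponential(scale=1/lam, size=(trials, 60))
counts = (np.cumsum(gaps, axis=1) < t).sum(axis=1)

# For a Poisson(lam * t) count, the mean and variance are both lam * t = 6.
print(counts.mean(), counts.var())
```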
Using the same argument repeatedly, the time you have to wait from the first email to the second email is also gonna be an exponential, and it's independent of the first exponential. Just use the memoryless property: no matter how long you've waited, waiting for this second email, starting at this point in time, the problem just starts over again. The same method says that from here to here is another Expo(lambda), and from here to here is an Expo(lambda), and so on. So all of these times between arrivals are Expo(lambda). That's actually an equivalent way to state the Poisson process: rather than starting with the Poisson, we could say that the interarrival times, that is, these distances between x's on the timeline, are i.i.d. Expo(lambda). So those are the interarrival times. But what if we wanted to know the actual times? So we call these T2, T3, T4, T5, and so on. That's the actual time on the timeline, not just the distance between two times, and we want to know that distribution. So let's define Tn to be the time of the nth arrival: nth email, nth phone call, or whatever. To get Tn, we just add up the first waiting time, then the waiting time from the first to the second, the second to the third, and so on. So we can think of Tn as the sum of Xj, for j equals 1 to n, where these Xj's are the interarrival times, which we just said are i.i.d. Expo(lambda). This is the most important story for the gamma distribution, at least in the integer case. So right now I'm assuming n is an integer; over here, a did not need to be an integer. But if we assume it is an integer, then this sum is gonna be Gamma(n, lambda). We haven't shown this yet, so that's something we need to prove: that this kind of sum of i.i.d. exponentials is gamma. That's what we're gonna show next, but you can already see the analogy. Remember the negative binomial and the geometric? With the geometric, you're waiting for one success in discrete time. With the negative binomial, you're waiting for however many, say seven successes. You can think of the negative binomial as a sum of geometrics: you wait for your first success, then you wait for your second, then your third, and so on, in discrete time. The exponential is the continuous-time analog of the geometric, so this is the continuous-time analog of the negative binomial: how long do you have to wait for n successes in continuous time? Where in this case, success just means getting an email. So that's a pretty weak definition of success. >> [LAUGH] >> So success is however you want to define it, but in this case we just mean the arrivals. All right, so let's prove this fact now, because just from what I've written, I'm claiming it's true, but here I said add up exponentials, and over there I said write down this PDF with the gamma function. What does one have to do with the other? Not yet clear, right? So we need to check that it's true, okay? So let's do that. There are two ways to do it, at least two ways that we know how to do right now. One would be a convolution, since that thing over there is just a convolution, right? It's just adding up i.i.d. exponentials, so we could do a convolution integral; you could do that for practice.
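Before the proof, a quick simulation of the claim (a sketch, with an arbitrary choice of n): sum n i.i.d. Expo(1) draws and compare the sample mean and variance with the Gamma(n, 1) values, which, as derived at the end of the lecture, are both n.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# T_n = X_1 + ... + X_n with the X_j i.i.d. Expo(1); claim: T_n ~ Gamma(n, 1).
t_n = rng.exponential(size=(200_000, n)).sum(axis=1)
print(t_n.mean(), t_n.var())  # both should be close to n = 5
```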
The convolution is actually not that bad, stuff will cancel, and you could convolve X1 and X2, then take that thing and convolve it with X3, then take that thing and convolve it with X4, but you'd get pretty tired of it pretty quickly. Hopefully before you got tired of it, you would see a pattern and could prove it by induction. But I don't wanna do it that way; we have a much better method available, namely MGFs, okay? MGFs, right, because we're adding up i.i.d. things. And in class, we already derived the MGF of an exponential. It's a nice-looking thing, so we may as well use it, okay? So that's what we're gonna prove right now: a proof that T = the sum of Xj, for j = 1 to n, with the Xj i.i.d. exponential, is gamma. Let's do Expo(1) first, and show that T is Gamma(n, 1). As long as we can do this when lambda is 1, we immediately get it for all lambda, because you can just multiply or divide by lambda; lambda is just a rescaling, so we can scale everything later. It's not really harder to do it with general lambda here, it just makes the notation more cluttered, having all these lambdas floating around everywhere. So I'd rather just set it equal to 1, and then we can always multiply or divide by lambda later; it's not a big deal. All right, so let's do that using the MGF. So the MGF of X1, and since they're i.i.d., they all have the same MGF, the same as the MGF of X2 and all the Xj's, is just 1 over (1 - t), which we can also think of as a geometric series. We used this before to get all the moments of X1 at once, using the geometric series. And this is valid for t less than 1, okay? So we did that before. Therefore, we can write down the MGF of T. I'll call it Tn, just as a reminder that there are n of them. The MGF of Tn is easy: just raise this to the nth power, right? Cuz when we're adding independent things, we multiply the MGFs. So just multiply that by itself a bunch of times: we have 1 over (1 - t) to the n, for t less than 1, okay? Very little work required, I just put an n there. That's a lot easier than doing a convolution. On the other hand, how do we know that this is the MGF we wanted? Well, to answer that question, we need to look at the MGF of a gamma distribution, right? I'll use new notation just so that I'm not doing circular reasoning: let Y be Gamma(n, 1), and let's find its MGF, okay? And we said before that there are no masquerading distributions that pretend to have a certain MGF and aren't actually what they appear to be, right? So if, in fact, Y has this as its MGF, then we're done. So that's what we need to show; find its MGF. Well, its MGF is the expected value of e to the tY, which by LOTUS we can write down directly: e to the ty, times the gamma PDF, which was, I'll take out the constant, 1 over gamma of a, y to the a, e to the minus y, and a is n here. So that's 1 over gamma of n, y to the n, e to the minus y, dy over y. That's just this thing, using LOTUS, okay? You can write it down right away; that's why LOTUS is so convenient. It shouldn't require much thought to write this down. Now, it may take some thought how to actually do this integral. Well, I guess we can simplify a little bit and collect these two terms. So this is really just 1 over gamma of n, times the integral from 0 to infinity of y to the n, e to the minus (1 - t) times y, dy over y, cuz we have e to the minus y and we have e to the ty, okay?
All right, so that may look like a difficult integral, but if you stare at it for a little while, it should start to look familiar. It looks pretty similar to the integrals we've just been doing; it has an extra constant up here, but it should seem pretty reminiscent of the gamma function itself, right? In general in this course, it's not really a big deal how good you are at integration by parts and stuff like that; you'll rarely have to do that. But what you do need to be able to do is pattern recognition: see that this looks like a gamma integral. It's just that I have this extra constant here, and that's really no big deal, right? Just make a change of variables to make this e to the minus x again. So we're just gonna let x = (1 - t)y, so that dx = (1 - t)dy, okay? And t is less than 1, so 1 - t is actually still positive. So now let's just rewrite that integral. I'm just making a change of variables to make that e to the minus x again, cuz it already looks very similar to a gamma; now let's make it extremely similar, that's all I'm doing. So we have 1 over gamma of n. The limits of integration stay 0 to infinity, because when y is 0, x is 0, and when one goes to infinity, the other goes to infinity. And y to the n: well, y is x over (1 - t), so we're gonna get (1 - t) to the minus n. That looks pretty promising, right? Cuz that's exactly what we wanted over there. And then we're gonna have x to the n, e to the minus x. And one convenient thing about keeping this as dy over y, instead of cancelling it, is that the multiplicative constant cancels there: dy over y is actually the same thing as dx over x. That's why it's handy; it makes the algebra a little easier. Well, what's left is exactly the gamma function again, gamma of n, so it cancels, and we're left with (1 - t) to the minus n, which is what we wanted. The MGF is the same, so that proves the statement is true, okay? So that's nice; that's the connection with the exponential. Another thing we need to do is get the moments. Again, that sounds like it could be a nasty problem, but it's actually not bad at all, once you see the pattern. Okay, so let's let X ~ Gamma(a, 1). And by the way, in this last calculation, I actually never assumed that n was an integer. So this actually shows that this is the correct MGF even if n is any positive real number. To have the interpretation as a sum of exponentials, we need an integer, but this is the MGF in general, cuz the calculation didn't rely on n being an integer. All right, so now let's get some moments. Of course we could use the MGF, but actually, I think in this case it's even easier to directly use LOTUS. So let's say we want to find E(X to the c), where c doesn't even need to be an integer; c is just some number. I'm not saying this will exist for all c, we'll have to see whether it exists, but c is just any constant. In particular, we can let c = 1 to get the mean, c = 2 to get the second moment, and then we can get the variance from those. Well, it's just LOTUS, so this should be pretty quick. I want E(X to the c): change big X to little x, that's just LOTUS, right, x to the c.
And then I'm just gonna write down the gamma PDF again, right? So we have 1 over gamma of a, x to the c, and then an x to the a, e to the minus x, dx over x. Well, okay, obviously we can combine these: 1 over gamma of a, times the integral from 0 to infinity of x to the (a + c), e to the minus x, dx over x. That integral should look very, very familiar again, right? Just pattern recognition: that's x to a power, e to the minus x, dx over x. That is the gamma function, so the answer is just gamma of (a + c) over gamma of a. Or if you want, you can multiply and divide by gamma of (a + c) and say we just integrated the Gamma(a + c, 1) PDF, which must give 1; if it's a PDF, it has to integrate to 1. So you don't need to do any calculus here, really; that was just algebra, this times this, and then recognizing that it's a gamma. So the gamma has very, very nice properties in that sense, and the beta has some similar nice properties. All right, so that's gamma of (a + c) over gamma of a, and our assumption has to be that a + c is positive, because the gamma function is only defined on positive numbers; if a + c is less than or equal to 0, the moment just doesn't exist. So okay, let's use this to quickly get the mean. E(X) =, now let's just let c = 1, so that's gonna be gamma of (a + 1) over gamma of a. But then remember that identity: gamma of (a + 1) is a times gamma of a. So this is a times gamma of a, over gamma of a, which just simplifies to a. Very easy-looking formula for the mean; now let's check whether it makes sense intuitively. In the case when a is an integer, and we use that interpretation over there, we already knew the answer, right? Cuz Expo(1) has mean 1; you add up n of them, and you have to get mean n by linearity. So we already knew this is correct in the integer case, and this says it's true even if a is not an integer. All right, and let's get the second moment, E(X squared). Well, that's just gamma of (a + 2) over gamma of a. Gamma of (a + 2) is (a + 1) times gamma of (a + 1); I'm just using the same identity again, with a replaced by a + 1. But gamma of (a + 1) is a times gamma of a, so again the gamma of a's cancel, and we just get a squared + a. And that tells us the variance: the variance is E(X squared) minus (E(X)) squared, so we're subtracting a squared, which tells us the variance of X also equals a. Again, that's easy to remember. So it says that Gamma(a, 1) has mean a and variance a. Sounds a little like the Poisson, where the mean equals the variance. But for the Poisson, the mean is always equal to the variance, whereas here it's kind of special, just because I let lambda equal 1. So now let's bring back the lambda: Gamma(a, lambda). To get that, we defined it just by rescaling, by dividing by lambda. So Gamma(a, lambda) has mean a over lambda, but the variance becomes a over lambda squared, because remember, when you multiply by a constant and take the variance, the constant comes out squared. So Gamma(a, lambda) has this mean and this variance. But notice that it was just easier to do the whole thing with lambda = 1 first, as long as we don't forget at the end to bring back the lambda. All right, so this was just kind of an introduction to the gamma distribution and the gamma function, and how they connect with the exponential. Next time, I'll show you how the gamma connects to the beta and the normal.
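All of these moment facts are easy to spot-check by simulation (a sketch; the values of a, lambda, and c below are arbitrary, and note c need not be an integer):

```python
from math import gamma
import numpy as np

rng = np.random.default_rng(2)
a, lam, c = 2.5, 3.0, 1.7

# X ~ Gamma(a, 1): E(X**c) should equal Gamma(a + c) / Gamma(a).
x = rng.gamma(shape=a, scale=1.0, size=1_000_000)
print(np.mean(x ** c), gamma(a + c) / gamma(a))

# Y = X / lam ~ Gamma(a, lam): mean a/lam and variance a/lam**2.
y = x / lam
print(y.mean(), a / lam, y.var(), a / lam ** 2)
```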
Thought we'd finish a little bit early today, cuz I don't wanna scare you again with the beta til next time. See you on Friday.
Info
Channel: Harvard University
Views: 108,358
Rating: 4.889328 out of 5
Keywords: harvard, statistics, stat, math, probability, gamma distribution, poisson processes
Id: Qjeswpm0cWY
Length: 48min 49sec (2929 seconds)
Published: Mon Apr 29 2013