Well let's get started. Thanks for coming despite the rain, but at least we can feel lucky that the sun rose
today cuz we have a lot more to do and it would be hard to do if
the sun stopped rising. So, okay, so
we were talking about MGFs last time. We've done all the theory that we need for
MGFs but I'm not sure that the intuition is clear enough yet
and there are some important examples. So I just want to start with
a few examples of MGFs. We already have all the theorems. Okay, especially,
how do we work with MGFs for some of the most important distributions
such as exponential, normal, and Poisson? Just to show you how
the MGFs are useful for some of those famous distributions, okay. So let's start with the exponential. And [COUGH] we talked before about,
So this is the Expo MGF. We talked before about the fact that if we
have exponential lambda we can always find a constant to multiply by
to make it exponential one. So let's just start with the exponential
one case cuz that's simpler, that is, lambda equals one. Let X be exponential one, and suppose that we wanna find the MGF and
find the moments, okay. Find moments. And this will really show you why it's
called the moment generating function. That doesn't actually, I didn't actually talk about where
did the word moment come from? It comes from physics. Those of you who've done
moment of inertia and stuff, there's actually a pretty strong analogy
between variance and moment of inertia. That doesn't answer the question for where
did the word moment come from in physics, but you can ask the physicist that. But it came into statistics via physics
because of this analogy with moment of inertia. Anyway, so we have an exponential,
okay, and so let's find the MGF. Well by lotus that's
a pretty easy calculation. M(t), remember just by definition it's
just the expected value of (e to the tx). And this is a perfectly
valid thing to write down. This e to the tx, that's just some random variable; we're taking its expectation, and then we're viewing
this as a function of t. And I pointed out last time, t is a dummy
variable, so I could just as well have said M(s) = expected value of e to the sx, or whatever
you wanna call it that doesn't clash. [COUGH] The interpretation
is just that this is a very, very useful bookkeeping device for
keeping track of moments, and it's another way to describe a distribution, rather than a CDF or a PDF. Okay, so let's just compute this thing. Well this is an easy LOTUS problem cuz by LOTUS we can just immediately say this is the integral 0 to infinity,
e to the tx, e to the -x dx. All right,
that's just immediate from LOTUS; combine the two exponentials,
so that's e to the -x(1-t) dx. So that's just an easy integral, right. So that integral, well actually one way
to do it is just to do the integral. Another way to do this integral is to
recognize this as another exponential PDF with a different parameter, up to its normalizing constant. And you'll get 1 over 1- t,
and this is for t less than 1. If t is bigger than 1,
we have some problems here. Cuz if you let t be 2 for example,
then 1 - t is 1 - 2, which is -1. You'd get e to the +x,
which would blow up. But as long as t is less than 1,
this will be okay. Exponential decay, not exponential growth. So we have to assume t is less than 1. But that's okay, cuz we talked last time
about the fact that we wanted to have some interval, I called it -a to
a on which this is finite. So in this case it's finite
everywhere to the left of 1, right. So in particular it could take
some interval like say -1 to 1 open interval on which it's finite. So this is a perfectly valid MGF. Okay, so
now we wanna get the moments, right. So from what I said last time,
we could take this thing 1 over 1-t and start taking derivatives,
so, and plug in 0. So it would be true that M'(0)
would be the mean and M''(0) would be the second moment. And once we have this and
this we could easily get the variance. We already talked about the mean and the
variance of the exponentials, so you could do this and check that it agrees with
what we did earlier through lotus, okay. And then third moment would be the third derivative evaluated at 0,
and so on, right. So we could do that, but that's kind of annoying in the sense that
you have to keep taking derivatives. Now for this function, taking a bunch
of derivatives is not too bad, okay. But it's still a much better way to do
this, Is to recognize the pattern, right. A lot of this is about
pattern recognition, okay. Where have we seen 1 over 1-t before? Geometric series, right. We keep using the Taylor series for
e to the x and the geometric series over and over again. It can go in both directions, right. You can have this closed-form expression and expand it as a geometric series, or you can have a geometric series and
simplify it to this. Anytime you see one over
one minus something, you should be thinking that may have
something to do with the geometric series. That may be a useful interpretation,
it may not, but at least the idea should pop into your
mind just cuz you see this pattern, okay. So if we do that we get 1 over 1-t
equals just a geometric series, a sum of t to the n, n=0 to infinity. And this is valid
for absolute value of t less than 1. That's when this converges. By writing it this way,
we don't actually have to do derivatives. We're just looking at this series,
okay, and then we're just gonna
read off the moments. So the only thing we have to be
careful about is the n factorial. Because I said with the MGF,
you take the Taylor expansion and the moment is whatever is in front
of t to the n over n factorial. I don't see an n factorial here,
but that's no problem, right. We just multiply and divide by n factorial,
cuz we need the n factorial there. So I'll multiply by n factorial
t to the n over n factorial. Now this matches exactly the pattern
that we talked about last time, about whatever's in front of the t to the
n over n factorial, that's the nth moment. So that's the nth moment. So we immediately know now that E(x
to the n)=n factorial for all n. So instead of taking derivatives over and over again, we simultaneously
get all the moments of x, okay. So that's nice, right. Didn't need to take any derivatives. So, by the way, that's kind of like
the coolest thing about MGFs is the fact that if you, just in general,
not necessarily for this example. If you wanna find the moments
of some distribution by lotus, you would think you have to integrate,
right. You want e of x to the n so you're going
to integrate x to the n times the PDF. That may be an incredibly
difficult integral. But the MGFs, once you have the MGF,
we're taking derivatives not integrals. So it's pretty surprising
to me at least that you can do derivatives of the MGF rather
than the integrals of powers of X. Derivatives are much easier usually than
integrals, so that can save a lot of work. So let's just quickly see what
happens if it's exponential lambda, where lambda is not necessarily 1. So now let's let Y be exponential lambda. And then, let's just convert it,
just to see how to apply this. Convert it, well, we talked before about the fact that
if you multiply or divide by lambda, it may be hard to remember, should you
multiply by lambda or divide by lambda? But there's an easy way to see that. Let's just let X = lambda Y. So I need to multiply by lambda
rather than dividing because we know the exponential lambda
has mean 1 over lambda. So if we multiply by lambda
now this has a mean 1. And we show that this is,
in fact, exponential of 1. So we've converted it to this case. In other words, Y = X over lambda,
and we can take nth powers. So now we immediately have the nth moment of Y. Expected value of Y to the n = expected
value of X to the n, which is n!, divided by lambda to the n. Okay, so I didn't do any calculus here. I only used the geometric series. We could have directly done something similar to this for Y, but I think it's easier working with the Exponential(1) case and then converting it back.
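As a quick optional check that isn't part of the lecture: you can read the Expo(1) moments off 1/(1 - t) in a few lines of Python with sympy, both by differentiating and by expanding the series; dividing the results by lambda to the n then gives the Expo(lambda) moments from the conversion above. The variable names are just illustrative.

```python
# Sketch: read the Expo(1) moments off the MGF M(t) = 1/(1 - t), two ways.
import sympy as sp

t = sp.symbols('t')
M = 1 / (1 - t)  # Expo(1) MGF, valid for |t| < 1

for n in range(1, 5):
    via_derivative = sp.diff(M, t, n).subs(t, 0)                    # M^(n)(0)
    via_series = sp.factorial(n) * M.series(t, 0, n + 1).removeO().coeff(t, n)
    print(n, via_derivative, via_series)  # both print n!: 1, 2, 6, 24
```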
Similarly, at the end of last time, we derived the MGF of the standard normal,
okay? Now if you want any
normal mu sigma squared, then you just write it as mu plus sigma Z,
right? Then you can get its MGF very easily. So a lot of times it's easier to
work with the standard normal. Okay, so speaking of standard normal,
let's actually get the normal moments now. We already know the odd moments. So the problem is let Z be standard
normal, and find all its moments. Okay? We already know that
the first moment is 0 and the second moment is 1 cuz
it's mean 0 variance 1. We already know that
the odd moments are all 0. That's just by symmetry, we mentioned
that fact before but you should check for yourself that that makes sense,
to practice using the symmetry. Because if you write down
the integral using LOTUS, you would be integrating an odd
function symmetrically about 0 so the negative area cancels
the positive area. So don't need to do any work to get this,
just use symmetry. Even moments, though,
that seems pretty hard. And we already know E of Z squared, and
we did that by doing some integrations. Now if we want E of Z to the fourth,
if we use LOTUS, you're gonna have to integrate Z to
the fourth times the normal PDF. How do you do that integral? I don't know, I mean, you can try doing
some substitutions, you can try doing integration by parts and you can easily
spend a couple hours doing that integral. And it's possible to do it, but
it's not easy, it'll be a lot of work. And that would just be the fourth moment,
and then you'll say well, what about the sixth moment? What about the eighth moment, right? So that's not a very
efficient way to do things, it's doing a lot of
nasty looking integrals. Okay, so let's use the MGF instead. The MGF that we derived last time is the function M of t = e
to the t squared over 2. So that at least gives us an approach to
getting the moments that doesn't involve having to figure out how to
do these integrals, okay? It's something more straightforward. Like for derivatives, we have the chain
rule, the product rule and so on. There's no chance that you can't do this
derivative if you know your chain rule and product rule, and stuff like that. Whereas for integration, you may just
not know how to do the integral, okay? So we could take the derivative of this,
use the chain rule. And we're gonna get a t that comes out
in front because of the chain rule. And then we take the second derivative, because then there's gonna be a t out
there after the first derivative. Then we will have to use the product rule,
okay. And then we take another derivative,
then we have 2 terms, and then terms start multiplying and we get more and more terms to deal with. And it'll get more and more tedious and ugly, the more derivatives we take. It's still something that you can do. It's pretty mechanical, but it's tedious,
and we wanna avoid tedious stuff, okay. So here's a much better
way to think about it. Over there with the exponential, I emphasized just the pattern
recognition geometric series. Let's apply the same thinking again. Pattern recognition,
this is e to a power, okay? Unlike the geometric series,
the Taylor series for e to the x converges everywhere. So I can immediately just write
down the Taylor series for this, without taking any derivatives. This is just the sum of (t squared over 2) to the n, over n!, right. Because the Taylor series for e to the x
is valid everywhere, so in particular, I can plug in t squared over 2, okay? So this is a much,
much better way to do it, than to start taking derivatives of this. So let's simplify this, this is the sum,
notice that we're only gonna get even powers of t, which makes sense
because this is an even function. So it's gonna be t to the 2n, and there's a 2 to the n in the denominator and there's an n!. Okay, so that's what it is. Now, same as over there,
we just have to read off the moments. The only thing you have to be careful
about is the fact that there's a 2n here in the exponent, and there's an n!, there. So there's kind of a mismatch right now,
okay? We want the 2n moment because 2n
is just an arbitrary even number. Okay, we want the 2n moment,
so for the 2n-th moment, we want the coefficient of t to the 2n over (2n)!. We don't have a (2n)! here. Well, that's okay, just put in a (2n)!. As long as we multiply by that too, it's okay. So I just multiplied and divided by (2n)!, and that immediately tells us the answer. The expected value of Z to the 2n, so that's just an arbitrary even moment (we already have the odd moments), is just the coefficient of t to the 2n over (2n)!. That's everything that's left. That's (2n)! over 2 to the n times n!. And let's just check whether this
makes sense in the cases we know. If n = 1, this is 2! over 2 to the 1 times 1!, so that's 2 divided by (2 times 1), which is 1. So E(Z squared) = 1, and that's what
we expected because the variance is 1. And let's just do a couple more. For n = 2 we get the fourth moment, E(Z to the fourth): 4! is 24, divided by 2 squared times 2!, which is 8, and 24 divided by 8 is 3, so the 4th moment is 3. And the next one, E(Z to the 6th), is gonna be 3 times 5 = 15, which you can write as 1 times 3 times 5. You'll see the pattern: it's 1,
1 times 3, 1 times 3 times 5, 1 times 3 times 5 times 7 and so on. And this is not the first time
that we've seen these numbers, or at least if you've done the strategic
practice problems going way back. That was the number of ways to break 2n people into n partnerships, and there's a story problem there. We could either write it this way,
or as a product of odd numbers. So kind of a surprising fact,
or at least I found it really surprising that the same expression
comes up for even moments of the normal. As it's the same number as breaking
up people into partnerships and counting number of ways to do that. And I thought that was kind of mysterious,
it turns out that it's not a coincidence. But there's this kind of a very
deep combinatorial explanation for that which I can't get into but
there is a reason for that. Anyway, that gives us all the moments
of the normal distribution now without doing any calculus. So that's nice, okay?
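Again purely as an optional check, not something from the lecture: a short sympy sketch comparing the formula (2n)! / (2^n n!) with the LOTUS integrals done symbolically, letting the computer grind through the integrals we avoided doing by hand.

```python
# Sketch: check E(Z^(2n)) = (2n)! / (2^n * n!) against the LOTUS integral.
import sympy as sp

z, n = sp.symbols('z n')
pdf = sp.exp(-z**2 / 2) / sp.sqrt(2 * sp.pi)          # standard normal PDF
formula = sp.factorial(2*n) / (2**n * sp.factorial(n))

for k in (1, 2, 3):
    lotus = sp.integrate(z**(2*k) * pdf, (z, -sp.oo, sp.oo))
    print(2*k, sp.simplify(lotus), formula.subs(n, k))  # 2: 1, 4: 3, 6: 15
```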
So one more MGF problem, one we haven't talked about yet in class: the MGF of the Poisson distribution, so let's do that. So for the Poisson lambda, we know it has mean lambda and variance lambda,
but we haven't computed any other moments. But mainly for the Poisson,
I wanted to show you the other reasons. Like I said last time, there are three reasons why the MGF is important, and those examples illustrate why it's called a moment generating function, cuz we generated all the moments. But for the Poisson I wanna show
you the other important reasons. So let's let x be Poisson lambda and
find its MGF. Again, let's just use LOTUS: the expected value of e to the tX = the sum, so Poisson takes non-negative integer values so I'll just say k equals 0 to infinity, of e to the tk, all right it's just LOTUS, e to the tk times the Poisson PMF, e to the minus lambda,
lambda to the k over k factorial. Okay, looks like a kind of ugly sum. But actually you'll find that this sum is
an example on the math review handout, so I was planning for this in advance. But you don't have to memorize that or
anything, this is just another example of pattern
recognition dealing with a series. It looks a little ugly
when you first see it but this is actually easy once you're
familiar with the pattern, right? So e to the minus lambda comes out cuz that's just a constant; look at what's left inside. We have e to the t to the k and lambda to the k, so together that's (lambda e to the t) to the k, right? So all that's left is the sum of something to the k over k factorial. That's just the Taylor series for
e to the x again. So this is very easy once you've mastered
the Taylor series for e to the x. So we can just immediately write that down; that is the Taylor series for e to the x evaluated at x = lambda e to the t, okay? So, together with the e to the minus lambda out front, we can simplify that a little bit: it's e to the lambda times (e to the t minus 1), and it's valid for all values of t
because the series converges for all t. Okay, so that's the Poisson MGF.
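If you want to convince yourself numerically (this isn't from the lecture), a couple of lines of Python with made-up values of lambda and t show the LOTUS sum collapsing to that closed form:

```python
# Sketch: the LOTUS sum for the Poisson MGF matches exp(lam * (e^t - 1)).
from math import exp, factorial

lam, t = 2.0, 0.7  # arbitrary illustrative values
lotus_sum = sum(exp(t*k) * exp(-lam) * lam**k / factorial(k) for k in range(50))
closed_form = exp(lam * (exp(t) - 1))
print(lotus_sum, closed_form)  # agree to many decimal places
```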
So one thing we could do with it is to start taking derivatives or
whatever to get the moments. But I'm not doing this example
because I wanna do moments; I wanna show you the other applications of MGFs. So now let's let Y be Poisson mu. So we have two Poissons now,
not one Poisson. And suppose that X and Y are independent. And the problem is find
the distribution of X + Y. So we wanna study the sum of
two independent Poissons. Okay, so that's called a convolution, and
we'll come back to convolutions later on in the semester, but
you know in general it can be nasty. But I pointed out last time that for MGFs you can just multiply the MGFs,
that's easy. Whereas, doing a sum or
an integral could be pretty nasty, okay? So all we have to do is multiply the MGFs. That is I'm just going to take
the MGF of X times the MGF of Y. So here's the MGF of X,
e to the lambda times (e to the t minus 1). The MGF of Y is going to be the same thing, except that the parameter is now called mu instead of lambda. So that's gonna be e to the mu times (e to the t minus 1). And let's just simplify the product: that's e to the (lambda + mu) times (e to the t minus 1), factoring that out. That immediately tells us that
X + Y is Poisson lambda + mu. Because of the fact,
we didn't prove this theorem, as I said that's a really
difficult theorem. But that is a theorem that this is
the Poisson lambda plus mu MGF. There's no other distribution
that has the same MGF. So this is, therefore,
the only possibility. By the way, it was obvious that the mean
had to be lambda + mu, by linearity. So this, we already knew. The interesting part is that it's Poisson, the sum of independent
Poissons is still Poisson. Most distributions don't
have such a nice property. Like you'll add independent
versions of them, and usually you get some other family. Here it's still within the Poisson
family of distributions, okay? So that's a very, very nice property of the Poisson.
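Here's a small simulation sketch, not part of the lecture, with arbitrary illustrative values of lambda and mu: the empirical PMF of X + Y lines up with the Poisson(lambda + mu) PMF.

```python
# Sketch: simulate independent Poissons and compare X + Y with Poisson(lam + mu).
from math import exp, factorial
import numpy as np

rng = np.random.default_rng(0)
lam, mu, N = 2.0, 3.0, 10**6   # arbitrary illustrative parameters

x = rng.poisson(lam, N)
y = rng.poisson(mu, N)
s = x + y

for k in range(8):
    pmf = exp(-(lam + mu)) * (lam + mu)**k / factorial(k)  # Poisson(lam+mu) PMF
    print(k, round(float(np.mean(s == k)), 4), round(pmf, 4))  # empirical vs exact
```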
And a common mistake with this is to ignore the assumption that X and Y are independent. To justify just multiplying the MGFs, we need X and Y to be independent. So just to see a quick counterexample,
if they're not independent. If X and Y are dependent,
well the most extreme case of dependence that I can
think of is when X = Y. Okay, so let's just see why
this doesn't work when X = Y. Well obviously if X = Y, then X + Y is 2X. And that's not Poisson. Why is that not Poisson? Yeah,
>> [INAUDIBLE]. >> Okay, so that's a good way
to think of it with the MGF, if we take the MGF of this thing
you're gonna get a 2 in there. And what you're actually gonna have, you're gonna take the expected value of
e to the 2tx, so you've replaced t by 2t. So you'd get 2t up there and
that doesn't look like a Poisson MGF, so that's close to a proof, but it's a little more complicated than I was thinking of. And you would still need to say, like, could
there be some miracle of algebra that would reduce that back down. It's not true right? If you put a 2 there
it's not of this form. But still, what if you just didn't
think of the brilliant algebraic way to simplify it down. Yeah.
>> [INAUDIBLE] >> Yeah that's the simplest way to see it. What she just said was
that this thing is even. So that's one good way to see it. A Poisson has to take on any
possible non-negative integer value. This thing is always an even number,
so it couldn't possibly be a Poisson. That's the simplest way to think about it, is just looking at one
of the possible values. Another way to see it, would be to compute
the variance, the mean and variance. So the expected value of x plus y,
which is 2x, would be 2 lambda. So if it were Poisson, it would have to
be Poisson 2 lambda cuz that's the mean. But the variance of 2x is 4 lambda
cuz the 2 comes out squared. For a Poisson the mean
always equals the variance. For this thing the variance is double. Intuitively that should make sense because
you're adding the same thing to itself. That increases the variance compared
to if you added independent things, then you might expect if one
thing happens to be very large, then the other thing might offset it,
right? But if you're adding the same thing
to itself and it happens to be large, then you're adding the same
large thing twice, okay?
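Just as an optional numerical illustration of that point, not from the lecture:

```python
# Sketch: with Y = X, the sum 2X has mean 2*lam but variance 4*lam,
# and it only takes even values, so it can't be Poisson.
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0
x = rng.poisson(lam, 10**6)
s = 2 * x                      # X + Y when Y = X

print(s.mean(), s.var())       # roughly 4 and 8: variance is double the mean
print(np.mean(s % 2 == 0))     # 1.0: every value is even
```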
I've seen similar mistakes, cuz this is like an easy counterexample, and I've seen more serious versions of it many, many times, where maybe we have something like a sum of X1 plus X2 plus X3, and they're i.i.d., and a student just replaces them all by X, so X plus X plus X, and then gets 3X. But X is not independent of itself, and you'll end up with
the same mistake as this. So I wanna mention that counterexample,
okay? So and there's other ways to see it,
too, but we just talked about three reasons why
this was not Poisson when using the MGF, one by looking at the possible values,
one by looking at the mean and variance. So hopefully you're convinced
now that that's not Poisson. Okay, so next major topic in this
course is joint distributions. That is, something we dealt
with a little bit before but just like bringing it in as its
own topic in its own right. So joint distributions just means, how do we work with the distribution
of more than one random variable, okay? So that's why everything in this
course is cumulative, right? Because if you don't fully
understand the CDF of one random variable then it's going to be really
hard to understand the joint CDF of more than one random variable, okay? So joint distributions: we already talked about independence versus dependence, right? If you have independent random variables, the joint distribution just means multiply the individual CDFs or the individual PDFs, and it's pretty straightforward. Remember the slogan: independent means multiply, okay? But in general we need to have
some tools and notation and so on for
dealing with dependent random variables. Maybe just two of them or
maybe a million of them, okay? So we're gonna talk about
joint distributions. And I think the best way to
start is in the simplest case, where we have two random variables. And let's even say there are two
binary random variables. So we can think of this in
terms of two by two tables. And this may seem really, really simple. I hope it seems pretty simple, cuz then if
you understand this simple case really, really well, it'll give you a lot of
intuition for the more complicated case. Okay, so I'll start with a simple one where x and y are Bernoullis. Possibly dependent, possibly independent,
and possibly the same p, possibly different p's. I'm not saying they're both
Bernoulli-p with the same p. Okay, then we can think of this
in terms of two by two tables. So we could draw an example like this with a table and
values where here is x = 0, x = 1, and y = 0, y = 1, okay? Okay and then to specify the joint
distribution all we have to do is put in four numbers here that
are non-negative and add up to 1, right? Any four numbers you want as long
as they are non-negative and add up to 1 that will be
a valid joint distribution. So remember for your know PMFs to be valid
a PMF just non-negative adds up to 1? Completely analogous it's just now in two
dimensions instead of one dimension okay? So we can just make up four
numbers that add up to one. I guess we can talk about some
of the general definitions here. So this is for this specific case. But let's also talk
about the general case. So if we have x and y, first of all, they're joint CDF. It's completely analogous
to the individual CDF. So the joint CDF is the function
of two variables now. F(x,y) = the probability X less than or
equal to x, Y less than or equal to Y. Similarly we have a joint
PMF in the discrete case. Which would just be the probability that
X equals little x, Y equals little y, all right. So we just add this part. That's the PMF. The joint PMF means we're considering
both of them together, okay? Now, in the case where they're
independent if x and y are independent. That means that this joint PMF is the
product, P of X = x times P of Y equals y. So we need, so
that's called the joint CDF. Joint PMF. And now, so this is when we're
considering them together, right? Because it's comma within the same P. Right, it's considering them jointly. Okay if they're independent, that's equivalent to independence
is you can split this up. Okay, so now there's another concept
that we need called marginal The marginal distribution,
marginal just means take them separately. So the marginal distribution for
x would be a probably x less than or equal x is called the marginal
distribution of x. Similarly marginal PMF would
just be just this part, okay? So therefore in words,
we could say that marginal Independence means that
the joint distribution, the joint CDF is the product
of the marginal CDFs. Okay, and similarly we have, we can continue this over here,
we have the notion of a joint PDF I'm doing kind of discrete and continuous together, because
they're analogous to each other and they're analogous to
the one dimensional case. So a join PDF, which we might
write as little f(x, y) such that, so this would be the continuous
case in two dimensions. What does it mean to be a joint PDF? Just as like in the one dimensional case, the PDF is what you integrate
to get a probability. Two dimensional case, same thing. If we wanna know what's
the probability that x and y are in some set,
let's say x,y is in some set B, where B is some region in the plane. Maybe it's a rectangle,
maybe it's a circle or something. Just imagine some area in the plane. Then what we do is integrate
over that region f of x, ydxdy. So that's the first time that we've
written down a double integral here, but as far as what we're concerned, for
the most part, double integrals, for this course, the double integrals,
we're not gonna need to do a lot of them. And when we do normally we can just
think of it as one single integral and another single integral so
just do two integrals. But the intuition should be clear, right? The PDF is what you integrate
to get a probability. So it's completely analogous. And so independence means that We've already talked about this before,
I'm just using new terminology for it. Independence means that the joint x and
y are independent. If and only if, The joint CDF is the product of the marginal CDFs. So, I'll call that, just for emphasis, it would be confusing to use the same
letter F here without any clarification. This is the marginal CDF of x. This is the marginal cd F of x,
this is the marginal cd F of y, this is the joint cd F, okay? So it says that instead of having to do
some kind of complicated joint thing, I can just find the probability of this
event times the probability of this event. So that's the definition of independence. But we've seen over and over again that
usually it's easier to use PDFs or PMFs. So it's equivalent,
it's not too difficult. It's a little bit tedious but
with some algebra, we can show that it's the same thing as saying the joint
pmf is the product of the marginal pmfs. That's in the discrete case. And in the continuous case,
that the joint PDF, is the product of the marginal PDFs. And I wanna emphasize that this
has to be for all x and y. Not just for x and
y that make this thing positive. You have to pay attention
to the zeros also. We'll see an example like that later. So for all real x and y,
we can't restrict it. All right, so coming back to this
little example, we can make up any four numbers we want as long as they're
non-negative and add up to one. So I made up 4 numbers,
just for the sake of example. Two-sixth, one-sixth,two-sixth, one-sixth. So I made up a simple little example here,
and I could ask the question,
are x and y independent? And to answer that we need to say well two ways we can think about it. One would be so I so I wrote this in terms
of, you know, joint CDFs, joint PMF. We could also write
something like conditional. That is independence means you
don't have the distribution of y given that x equals something. It doesn't actually depend on that x part. So it's the same as
the unconditional distribution. Okay, so, well anyway, so each number in this table is one
of the joint probabilities, right? So two-sixth is the probability that x and
y are both zero, one-sixth is the probability that
they're both one, and so on, okay? So to check that they're independent
from the definition, well, what that means is we first need to
find the marginal distributions and then check that this is true. Okay, now to get from
the marginal to the joint. Here's just quickly how do we
get marginal distributions? Getting marginals is actually pretty
easy from the joint distribution. Because Let's just do
the discrete case first. If we wanna know the marginal
distribution of x as the marginal PMF, then just by the action of probability, all we have to do is add up
the different possibilities for y. So that the sum of all y P of X = x,
Y = y, okay? Because just the axiom
of probability right? That we're adding up just
joint cases the union is this. You can also write it as a conditional. You can also think of this as the law of
probability, and write given Y equals y, times P of Y equals y,
it would be the same thing. Okay, that just says add up, X = x, but
Y could be anything so we sum over Y. That's called marginalizing over Y,
that we're just summing up. We start with this thing that's
a function of x and y, sum over all y, then we just get a function of x. And in the continuous case, let's get
the marginal so that's the discrete case. And the continuous case, let's say we want
the marginal distribution of y, similarly, you can get the marginal
distribution of y, I'm not gonna write the same thing again. If you want the marginal PDF? So this is the marginal PDF of y. Marginal, this means viewing it. On its own, as its own thing, right? Then all we have to do is integrate
completely analogous to this. Integrate the joint density, f of x,y, (x, y), integrate over all x. That's just the continuous analog of that. Here we're summing over all values. I swapped the x and
y here just for variety, here we are summing overall values of y. Here we're integrating overall values
of x, the joint density, okay? So, you can go in that direction, this is getting marginal distributions
from joint distributions. You can't go in the other direction. If we only know the marginal distributions
that doesn't tell us anything about how x and y are related to each other, right? So you can't go any other way. But you can go from the joint
distributions to the marginal distribution. So for this example,
let's get the marginal distributions. So what's the probability that y equals 0? Well, obviously, we're just adding
this case plus this case, right? Cuz those are the two
cases where y equals 0. So we add those two cases,
we get four-sixths. Add these two cases, we get two six's. And for the other way around if we
want the X = 0, just add this case and this case and
you get three six's or one half. This one plus this one, 3 / 6. And by the way,
one thing you have to be careful about, is the terminology in economics and
statistics is very different. And when you take an econ class you
always hear about marginal revenue and marginal cost and things like that. And usually in like, AP Econ,
then they don't want to use calculus, and so they explain everything is incremental,
if you do one more unit of something, then what happens? And then later when you actually
see what's going on with calculus, you realize that in Econ,
marginal means derivative and in statistics, marginal means integrate,
so it's completely opposite meaning and I don't know
where the Econ term came from but you can see here where
the statistics term came from. Cuz it's called marginal cuz we
write these numbers in the margins. So that's a marginal distribution. So, once you understand
this two by two table, you basically have the key
intuition into joint distributions. In this case,
here they are independent in this example. To check that they're independent
There are other ways to do it, but just to check it by the definition, what independence means is that
to compute any of these entries. Let's say 2/6ths Asl I need to do
is find the probability that X = 0 x the probability that y = 0 so
I'd multiple 3/6ths times 4/6ths. Which is 1/2 times 2/3 is 1/3 which
is this, so if you get this number I can multiply this times this and so
on so you check this four numbers. So each of these joint probabilities
is obtained by just multiplying two marginal probabilities. So that means they are independent. Or as you can make up your own examples,
if you just here is kind of an extreme example
it doesn't have to be this extreme. But I can pick whatever numbers I want
as long as they're nonnegative and add up to 1. For example, I just made one up here
where these nonnegative add up to 1. So this is a perfectly
valid joint distribution. But you can see right away that this
0 means that it's not gonna be true, that if you multiply,
you can't obtain it that way cuz if you do the marginal thing again,
1/2, 1/2, and this is 1/4, 3/4, and you multiply 1/4 x 1/2,
you don't get 0. So this one would be dependent. This one is dependent,
you can make up your own examples. It doesn't have to have a 0
in it to make it dependent, that was just an easy,
extreme case to see what's going on. Okay, so this is a simple two dimensional
discrete example to think about. Let's also do one simple
continuous example just to have some intuition on
what this all means. So the simplest way to start is I
think at the uniform distribution. What is uniform in two dimensions mean? So let's consider as an example what if we have uniform on the square that's all x y such that x and y are both between 0 and 1. So we just have this square here. We can draw our coordinates, and
have a square here where this is 1 and this is 1, okay? So we have this square, and we want a distribution that's
uniform over this square, so. Remember, in the one-dimensional case, uniform meant that the PDF was
constant on some interval, okay? So the analogous concept would be, we want
a PDF, which is gonna be a joint PDF, and we want it to be
constant on that square. And 0 outside the square, right? So, that just captures the notion
of being a completely random point. As we're picking a random point, x comma
y, we want a completely random point in the square, so we want the density
to be constant all over that square. So 0 outside, let's find the joint PDF. Well, the joint PDF, therefore from
what I just said is some constant c, if x and y are both between 0 and
1, and 0 otherwise. Now in one dimension, if you integrate
the constant 1 over some interval, you get the length of the interval. In two dimensions if you
integrate the constant one over some region you get
the area of the region so if we integrate this thing we get
the area, so the integral is area, So C = 1/area would normalize it, which = 1 because the area
of that square is 1. So the joint PDF would just be 1 inside the square and 0 outside. And if you want the marginal distributions, just integrate this dx or integrate this dy; you'll get 1, so marginally X and Y are independent Uniform(0,1). Which is pretty intuitive, right, because it just says if you pick a random point in the square, the x coordinate is uniform and the y coordinate is uniform. So that's pretty straightforward; that's an example of independence.
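As an optional simulation check, not from the lecture: sampling a uniform point in the square, the marginals behave like Uniform(0,1), and joint probabilities of rectangles factor into products (the cutoff values below are just illustrative).

```python
# Sketch: uniform on the unit square has Uniform(0,1) marginals and factors.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 10**6)
y = rng.uniform(0, 1, 10**6)

for a in (0.25, 0.5, 0.75):
    print(a, np.mean(x <= a), np.mean(y <= a))   # both close to a

print(np.mean((x <= 0.5) & (y <= 0.5)))          # close to 0.25 = 0.5 * 0.5
```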
But I wanna contrast that with an example of dependence, where instead of a square,
let's use a circle. So, suppose we want uniform in
the circle; I'll say disc for clarity, since a circle might just mean the boundary, and we want everything inside. So on the disc, x squared plus y
squared less than or equal to 1. Okay, so let's see what that looks like so
we just draw a circle. Sorry, it doesn't look like
a very good circle, but pretend that that's a perfect
circle centered at 0 of radius 1. And we wanna be uniform in here, okay? We wanna write down what the joint PDF,
what are the marginal PDF's, okay? So first of all for
the Joint PDF by the same kinda reasoning. It's just because its uniform that that
means another way to say uniform is that the probability of some region must
be proportional to its area, right. So now in one dimension I said
probability is proportional to length for uniform distribution. Here probability is proportional to area,
so because of that the normalizing constant has to be 1 over the area of
the circle, Pi r squared, so that's Pi. So a joint PDF is 1 over pi
inside the circle and 0 outside. And a common mistake with this kind
of thing is to then think that that says that they're independent
because that's just a constant, so it looks like I can factor 1 over pi as a constant times a constant. It's just a constant, but they're not
independent because of this constraint. They're actually very dependent
because for example, if x is 0, then y could be anywhere from -1 to 1. But if x is close to one, then y has to
be in some tiny little interval, right? So, if we fix x to be here, then y
could be between here and here, right? So the values depend on where, that is knowing x constrains
the possible values of y. That says that they're not independent. So here x and y are dependent. And in fact,
we can show that, given that X equals x, we can actually say what Y can be: Y has to be between square root of
1 minus x squared and minus that. Because x squared plus y squared
is less than or equal to 1. So this depends on x,
this is the constraint. So we might guess that Y is
uniform between here and here. That is if X is here,
then we know it's between here and here, but within that range it could be anywhere, right? So a good guess would be uniform, but next time we'll do an integral to show that, for practice. But you can see right now that they're dependent.
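As a last optional sketch, not from the lecture: sampling uniformly in the disc (by rejection from the square) shows the squeeze directly. The window values 0.05 and 0.95 below are just illustrative choices.

```python
# Sketch: uniform on the unit disc.  The spread of Y given X near 0 is wide,
# but given X near 1 it is squeezed toward 0, which is the dependence above.
import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(4 * 10**6, 2))
pts = pts[(pts**2).sum(axis=1) <= 1]          # keep only points inside the disc
x, y = pts[:, 0], pts[:, 1]

print(y[np.abs(x) < 0.05].std())              # large: y ranges over about (-1, 1)
print(y[np.abs(x - 0.95) < 0.05].std())       # small: y is confined near 0
```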
Okay, so see you on Friday.