Lecture 13: Normal distribution | Statistics 110

Captions
Today is a pretty sad day for Stat 110 because I just saw in the news that a judge in the UK has ruled against the use of Bayes' rule in court. So at least it was not in the US, it was in the UK. I posted a link on Twitter. It's a very disturbing case. I haven't read the full legal ruling yet, but I intend to. My impression is that the judge may have had a valid point, in that the so-called expert witness was just estimating some probabilities and throwing them into Bayes' rule, and as with anything else, it's garbage in, garbage out: if you put garbage probabilities into Bayes' rule, you get garbage out. But it sounds like he wrote the opinion in a sweeping way that others could use as a precedent against using Bayes' rule. Normally I don't want to inject politics into this class, especially not British politics, but I strongly oppose that and I think it should be overturned. So if any of you have any connections in the British government, maybe we can do something about that. Sorry I had to bring you that bad news, but at least it's not a US case. So to cheer us up, let's talk more about universality of the uniform. >> [LAUGH] >> We were doing that at the end last time. I proved the theorem, but I did not do an example yet, and it probably looked a little bit mysterious. The math is perfectly correct and you can review what we did last time; hopefully you can follow every step, and if you were confused by the proof, you should review it. But it's a proof, so you can't argue with it; the thing is, that doesn't explain what it really means. So I'm going to talk a little bit more about universality, and a couple of other odds and ends, and then we'll get to the normal distribution, which is the other key continuous distribution that we need, at least before the midterm. Just to remind you of the statement of universality of the uniform, which we proved last time: let capital F be a continuous, strictly increasing CDF. That's a little unusual compared with what we're used to, because we've mostly been starting with a random variable and then finding its CDF. But we did talk about the fact that any function that's right continuous and increasing, and goes to 0 as you go to the left, to minus infinity, and goes to 1 as you go to the right, to infinity, is a valid CDF. So here we're starting with the CDF, not with the random variable. And I strengthened the assumptions a bit to make the theorem easier to prove; the result can be generalized. I assumed F is continuous (remember, in general we just need right continuous), and I assumed it's strictly increasing, which just makes things nicer by not having to deal with flat regions. In general a CDF could be flat, then increase, be flat again, then increase; here I'm just assuming it keeps strictly increasing. Those are the assumptions, and then the statement of the theorem is: if we let X = F inverse of U (that's the definition of X), then X has CDF F, provided U is Uniform(0,1). That's the reason I call it universality: you just start with any Uniform(0,1) random variable, and at least in principle, we can synthetically create a random variable with any distribution we want.
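For reference, the one-line proof from last lecture, under these assumptions, can be written as

    \[
      P(X \le x) = P\bigl(F^{-1}(U) \le x\bigr) = P\bigl(U \le F(x)\bigr) = F(x),
    \]

using the fact that P(U ≤ c) = c for 0 ≤ c ≤ 1 when U is Uniform(0,1); so X = F^{-1}(U) really does have CDF F.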
So this is useful for simulation, right? It's saying that if we're interested in simulating random draws from the distribution F, one approach is to get some uniforms, since typically it's easier to generate uniforms than to generate other distributions in the continuous case, then compute F inverse of U, and then we're done. In practice, in some cases this is very easy; in many cases it's going to be very hard to write down F inverse analytically. But at least in principle, this is saying that from the uniform you can get everything, which is conceptually quite nice. This is also the fundamental reason why, remember, we talked about those three properties of the CDF, but we didn't prove the fact that those three properties are enough to characterize a CDF. If we extend this result a little, it says that if we have any function F satisfying the properties of a CDF, then it is the CDF of some random variable. Okay, so I wanted to do an example and talk a little bit more about the intuition behind this. There is a flip side to this. That's statement one; there's another way to write it which goes the other way around. Here we started with a Uniform(0,1), we computed F inverse of U, and we claimed that has CDF F. What if we want to go the other way, and we already have X? So we start with X, which has CDF capital F, and we don't have a uniform yet. Just to get an idea of what's going on, apply F to both sides, and it would say F(X) = U. So we're going in the opposite direction: if X is distributed according to F, then if you compute F(X), that's going to be distributed Uniform(0,1). And this looks very mysterious to most people the first time they see it, because there's something beautifully and bizarrely self-referential about it. What did we do here? We took a random variable X and plugged it into its own CDF. That sounds like a strange thing to do, so first make sure it makes sense to you: what does that actually mean? It means simply that X is a random variable, and we've talked before about the fact that a function of a random variable is a random variable. F is just a function, so F(X) is a perfectly valid random variable. That doesn't explain why we'd want to plug X into its own CDF, but we can do it if we want, and the theorem says that it's actually going to be Unif(0,1). Notice, of course, that a CDF takes values between 0 and 1, so we know F(X) always takes values between 0 and 1. That doesn't show it's uniform, but at least it shows it's going from 0 to 1, which makes sense. Notice also that there's a notational difficulty you could run into here that I wanted to warn you about. Remember, the CDF is F(x) = P(X less than or equal to little x). So if you just blindly plug in capital X here, you would get F(X) = P(X less than or equal to X). But X less than or equal to X is an event, and it's an event that always happens. You don't need any kind of insider information to know that it's true: big X is less than or equal to big X, I don't have to tell you anything, so that has probability 1. But that would say F of capital X equals 1, which is not what we're trying to say, so that step is actually wrong, even though it looked very natural to just plug in capital X, okay?
So that's why you have to interpret this carefully, and the interpretation is best seen through an example. Let's suppose that F(little x) = 1 - e to the -x, for x greater than 0. This is an important CDF called the exponential distribution, which we will get to later. For x negative, we set it equal to 0. So this is continuous, and it's strictly increasing on the positive side; it's just flat at 0 on the negative side, but the same argument is going to work. Then the interpretation would be F(X) = 1 - e to the -X, right? That's very natural: I just changed little x to big X. This is the intended interpretation. So the way to read the expression F(X) is that we first write F down as a function of little x, and then replace little x by big X; then you won't run into trouble. It's just a notational issue, but it can be confusing if you haven't thought it through carefully. All right, so this result kind of looks like a curiosity at this point, but the reason I'm talking about it is, first of all, that it's just good practice in thinking about the difference between a random variable and a distribution; taking the effort to understand what this really means is worthwhile. Secondly, this result is quite useful in Stat 111 and in statistical inference, and I'm not going to get into that much because I don't want to steal 111's thunder. But the basic idea is simply that X may have some complicated or maybe even unknown distribution, and if we want to be testing a certain model, it may be useful to be able to reduce things to the uniform, which is a very simple, known, standard distribution. If we generate a lot of instances of this and we find that they don't look uniform, then we conclude that maybe there's something wrong with the model, that kind of thing. So it's actually quite useful, but right now I'm just thinking of it more as a conceptually neat thing. Similarly for statement one: I just think that's a beautiful result, because it says that from the uniform you can get anything, which is pretty surprising. But it's also quite useful for simulation. So as an example of how to use it for simulation, let's use the distribution I just introduced over there. Consider that CDF, I'll just rewrite it: F(x) = 1 - e to the -x for x positive. That's called the exponential distribution with parameter 1, which will be an important distribution later, but you don't have to know it right now. And suppose that we have access to a uniform between 0 and 1, and we're interested in simulating from this distribution; we want to simulate X distributed according to F. Well, all we have to do, according to this result, is compute the inverse of this function, F inverse of u. This is just high school algebra, finding the inverse function: set 1 - e to the -x equal to little u and solve for x in terms of u, and we get x = -log(1 - u). So the universality theorem immediately tells us that F inverse of capital U, which is -ln(1 - U), has the distribution F.
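As a concrete sketch (my own illustration, not part of the lecture), here is how that simulation might look in Python, assuming numpy is available and using an arbitrary seed:

    import numpy as np

    rng = np.random.default_rng(110)            # arbitrary seed, just for reproducibility
    u = rng.uniform(0.0, 1.0, size=100_000)     # U ~ Unif(0,1)

    # X = F inverse of U = -log(1 - U), so X should have the Expo(1) CDF F(x) = 1 - e^(-x).
    x = -np.log(1.0 - u)
    print(x.mean(), x.var())        # Expo(1) has mean 1 and variance 1 (shown later in the course)

    # The flip side: plugging X into its own CDF, F(X) = 1 - e^(-X), should look Unif(0,1).
    v = 1.0 - np.exp(-x)
    print(v.mean(), v.var())        # roughly 1/2 and 1/12, the Unif(0,1) mean and variance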
So if we were doing it on a computer and we wanted ten random draws from this distribution, we would just generate ten iid uniforms and then compute this easy function ten times, and then we'd have ten iid random draws from this distribution. That's how it would be useful. While we're talking about this particular function, this is a good time to mention something about 1 - U: 1 - U is also Uniform(0,1). So if we wanted, we could have just done minus log of U rather than of 1 - U. That's an important symmetry of the uniform, and you can check it easily for yourself; it's simple, good practice with CDFs and PDFs. But in terms of a picture: if we're picking U uniformly between 0 and 1, say it lands somewhere in the interval, then U is the distance from 0 to that point, and 1 - U is the distance from that point to 1. There's a symmetry going on: why should we care whether we measure from the left or from the right? It doesn't matter, we just have a random point. That's the intuitive symmetry, but you should check the calculation too. In general, it's going to be true that if we take a + bU, where a and b are constants and U is uniform between 0 and 1, so we're just doing linear things to it, that will still be uniform on some interval, whatever the appropriate interval is. So if we start with Uniform(0,1) and we want Uniform(0,10), just multiply by 10; that's very straightforward. But an important common mistake to be aware of is that that's a linear transformation; if we do something nonlinear, then it's usually not going to be uniform anymore. Nonlinear usually leads to nonuniform. For example, if we square U, you can check that it will not be uniform anymore; to check that, just compute the CDF, and it doesn't look like the CDF of a uniform, so it's not uniform. So you have to be careful: you can't just say, well, U squared is between 0 and 1, so therefore it's uniform. You have to actually check it, and in this case, if you check it, it's not true. So that's what this theorem is doing. In this direction it says how we can simulate, and in the other direction it says how to go back from X to a uniform, so you can convert back and forth between distributions this way. All right, so let's talk a little more about independence, just because I said I would; sorry, these boards are very squeaky in this room. I said I'd say something about independence, so I'll say something about independence. We've talked already about independence of events, and I sent an email about this; I really wanted to elaborate a little bit on the difference between independence of random variables versus independence of events. They're closely related; we just define independence of random variables directly in terms of independence of events. So if we have random variables X1 through Xn, and we want to say what it means for these random variables to be independent, then the definition is that they're independent if we look at the probability that X1 is less than or equal to little x1, blah, blah, blah, Xn is less than or equal to little xn. That's an event, right? It's an intersection of n events. Now intuitively, remember, independence means that to find the probability of an intersection, we just have to multiply. So what this means is that we can just multiply, and notice these are just the CDFs.
So to compute this probability, we just need to multiply the separate CDFs. This thing here that I wrote down is called the joint CDF, because it's the CDF of all the random variables considered jointly as one object. And here we separated it out into the separate ones; these are called the marginal CDFs, and we'll come back to that in a later lecture. But these are just the individual CDFs, and the equation has to hold for all little x1 through little xn. So it actually looks simpler than the definition of independence of events, because remember, for independence of events, if we want to say three events are independent, we talked about the fact that we need the triple intersection condition, but that's not enough: we also need all the pairwise ones. And here I didn't bother writing any pairwise or three-way statements; I just have everything all in one. That would seem strange at first, but the way to resolve it is that this has to hold for all x1 through xn. So even though it looks simpler, I've actually written down uncountably many equations, whereas in the case of independence of events it's a long but finite list of equations. Here we have infinitely many equations; it just looks like one equation because it's for all x1 through xn. All right, so that's the definition of independence in general, but in the discrete case it's usually easier to work with PMFs rather than CDFs, and so then we work with what's called the joint PMF, where I'm just going to write the same equation again, except with less than or equal replaced by equal. So that's just the product of the PMFs, and this thing here is called the joint PMF. In the discrete case, these two conditions are equivalent. Proving they're equivalent isn't difficult, it's just a little tedious to write down, because you have to write the appropriate sums to convert between them, and it works out. And the intuition for why it should work out should be very clear, because what this statement says is that knowing any subcollection of these Xs, knowing their values, tells us nothing about the other ones, and the other statement says the same thing. So this is stronger than just pairwise independence. Pairwise independence says that knowing one random variable gives you no information about any other one random variable. Full independence means that knowing any collection of them tells you nothing about the ones you don't know, no information whatsoever. So that's a stronger statement. Just to give you a quick example where pairwise independence doesn't imply independence, here's the simplest example I know. Let X1 and X2 be iid coin flips, that is, Bernoulli(1/2), fair coins. So if you want, just think of flipping a fair coin twice. This is kind of like an old game that people used to play called matching pennies, where each person pulls out a penny and they see whether the pennies match or not: one person wins if they're the same, both heads or both tails, and the other person wins otherwise. I think that used to be a popular game, but not so popular anymore. However, it suggests another random variable which is a natural one to look at, namely, are they the same or not? So let's let X3 equal 1 if X1 equals X2, and 0 otherwise. That's just saying X3 is 1 if the two pennies match and 0 otherwise; it's an indicator random variable, right?
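To make the claim that comes next concrete, here is a small sketch (my own, not from the lecture) that enumerates the four equally likely outcomes of the two fair coins and checks the factorizations directly:

    from itertools import product
    from fractions import Fraction

    # X1, X2 iid Bernoulli(1/2); X3 = 1 if X1 == X2, else 0 (the "matching" indicator).
    outcomes = [(x1, x2, int(x1 == x2)) for x1, x2 in product([0, 1], repeat=2)]
    p = Fraction(1, 4)   # each (x1, x2) pair has probability 1/4

    def prob(event):
        """P(event), where event is a predicate taking (x1, x2, x3)."""
        return sum((p for o in outcomes if event(*o)), Fraction(0))

    # Pairwise independence: the joint PMF factors for every pair of variables and values.
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        for a, b in product([0, 1], repeat=2):
            joint = prob(lambda *o: o[i] == a and o[j] == b)
            assert joint == prob(lambda *o: o[i] == a) * prob(lambda *o: o[j] == b)

    # But not full independence: P(X1=1, X2=1, X3=1) = 1/4, while the product of marginals is 1/8.
    print(prob(lambda x1, x2, x3: x1 == 1 and x2 == 1 and x3 == 1))
    print(prob(lambda x1, x2, x3: x1 == 1) * prob(lambda x1, x2, x3: x2 == 1)
          * prob(lambda x1, x2, x3: x3 == 1))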
So okay, these are pairwise independent but not independent. It's obvious that they're not independent, because if I know X1 and X2, I immediately know X3; X3 is a function of X1 and X2. So not only does knowing X1 and X2 give some information about X3, it gives total information. However, just knowing X1 tells us nothing about X2, because those, I assumed, are independent. And just knowing X1 tells us nothing about X3, because it's still 50-50: if we know that X1 is 1, then the event that X3 is 1 just reduces to the event that X2 is 1, and that's still 50-50. Similarly, X2 is independent of X3. So they're pairwise independent, but they're not independent. I just said in words why that's true, but you can try checking what happens with the defining equations and why that agrees with this. So pairwise independence isn't enough, in general, to have independence. All right, the other thing we were doing last time was LOTUS, and we're going to talk more about LOTUS, definitely on Wednesday. But I wanted to start on the normal distribution first, which is both important and also a good distribution to have on hand when we're doing more LOTUS examples. Okay, so here's the normal distribution. It's also called the Gaussian, but I don't call it the Gaussian, because first of all, Gauss was one of the greatest mathematicians of all time and he has enough stuff named after him already, and secondly, he was not the first person to use this distribution, so it's not really fair to give him the credit. So we call it the normal distribution. I'm just mentioning that because you may see the term Gaussian, and it's the same thing. The normal distribution is by far the most famous and important distribution in all of statistics, and there are many reasons for that. The most famous reason is what's called the central limit theorem, which we're going to do towards the end of the semester. But I'm going to tell you in words, intuitively, what it is now, just to foreshadow it, so you have a sense of why this is important; then we'll go into the details much, much later in the course. The central limit theorem is possibly the most famous and important theorem in all of probability, so we're definitely going to talk about it later, and it's a very, very surprising result. What it says is that if you add up a bunch of iid random variables, the distribution of the sum is going to look like a normal distribution. So this is just one distribution, or really one family of distributions, because you can also shift it and scale it. Aside from the shift and scale, it says that adding up a large number of iid random variables is always going to look normal, which I think is very shocking, because I didn't say I was adding continuous random variables. They could be continuous, they could be discrete, they could be beautiful, they could be ugly, they could be anything: you add them up, and it's always going to look like the same shape. And that shape is the one you've all seen before, the standard bell-shaped curve. But there are different curves you could draw that look like bell curves, so why does this one particular bell-shaped curve always come up as the only possibility? That's what the theorem says. I'll make this precise much later, when we actually state it as a theorem and prove it, but intuitively: the sum of a lot of iid random variables looks normal.
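As a quick numerical peek ahead (my own sketch, not part of the lecture), summing many iid draws from a decidedly non-normal distribution and standardizing already produces something that looks like the bell curve; this assumes numpy is available:

    import numpy as np

    rng = np.random.default_rng(0)      # arbitrary seed
    n, reps = 100, 50_000               # n coin flips per sum, reps independent sums
    sums = rng.integers(0, 2, size=(reps, n)).sum(axis=1)   # each row: sum of n Bernoulli(1/2) draws

    # Standardize using the Binomial(n, 1/2) mean n/2 and standard deviation sqrt(n)/2, then
    # compare a few empirical quantiles with the standard normal values (about -1.64, 0, 1.64).
    z = (sums - n / 2) / (np.sqrt(n) / 2)
    print(np.quantile(z, [0.05, 0.5, 0.95]))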
And by "looks," I mean that if you look at its distribution, it's going to be approximately a normal distribution. There are even further generalizations of this, going beyond the iid case; you need some technical assumptions, but there are generalizations even beyond this. So that's one reason the normal is so fundamental, and there are others as well. Okay, so let's draw a little picture. The PDF is going to look like this nice, symmetric, bell-shaped curve, but there are many possible PDFs of that general shape. This thing is supposed to be symmetric; it's more or less symmetric the way I drew it, but it should be exactly symmetric. As long as the area is 1, there are millions of different possible PDFs you could come up with that look basically like this. The normal is the specific one given as follows; let's start with what's called the standard normal, which is written N(0,1). This notation means that the mean is 0 and the variance is 1, but we'll prove that later. The normal has two parameters, which are the mean and the variance, and we're going to start with the standard normal, which is mean 0, variance 1, and write down its PDF. The PDF is f(z); it's kind of a tradition to use the letter z for standard normals, not that you have to, but often we'll use z for standard normals, and that's why I'm calling it z rather than x. So it's going to be e to the -z squared over 2. If you plot this function, just using your old calculus techniques, find the derivative and the second derivative and the points of inflection and so on, or better yet, just use a graphing calculator, you'll get something that looks like this. That's not yet a PDF, though, because it doesn't integrate to 1. So I'm just going to put some constant here, c, where c is whatever constant we need; that's called the normalizing constant. It's whatever constant we need such that the area will be 1. So you can plot this thing and see that it looks like this, but that doesn't yet show why we choose this particular function. I mean, it's a nice-looking function, right? We can see that it's symmetric, because if we replace z by -z, nothing happens. We can see that it goes to 0 very, very fast, because exponential decay is already fast, and here the variable is being squared up in the exponent, so it decays to 0 very, very fast as z gets very positive or very negative. So it's a nice enough looking function, but it will be much later when we see why this is the most important PDF. Right now it's just a PDF with an unknown normalizing constant. So before we can get much further, it would be useful to know the value of c. Let's try to get the normalizing constant; that might be a long calculation, so I'll do it over here. In order to make this integrate to 1, we just need to integrate this function. So we want to know the integral, from minus infinity to infinity, of e to the -z squared over 2, dz. That's actually a famous integral in mathematics, partly because of probability and statistics, but partly just in its own right. This seems like something we should be able to integrate. You could try substitution or some other change of variables; it will not work. You could try integration by parts, and there are many ways to try integration by parts, or you could try to split it up in some way; it will not work.
Anything else you could ever think of, say finding an antiderivative and using the fundamental theorem of calculus, the usual way we do integrals, I can guarantee you it won't work. The reason is that there is actually a theorem that says that this integral, as an indefinite integral, that is, without the limits of integration, is impossible to do in closed form. And I think it's pretty amazing that someone was able to prove that. It's not just that no one has thought of a way to do it; someone proved that you can't ever do it, so don't even try. To qualify that a bit, there is one approach that will do something. This is e to a power, and we've been using the Taylor series for e to a power over and over again; that series converges everywhere. So if we want, we could expand this out as a Taylor series. The way to do that would not be to start taking derivatives of this; it would be to take the Taylor series for e to the x and then plug in x = -z squared over 2. We'd get an infinite series, and with some analysis you can justify integrating it term by term: we replace the integrand by an infinite sum and integrate each term, and those are all very easy integrals, because that's just polynomial stuff. But then we just get an infinite series and we wouldn't know what to do with it. So when I say this integral is impossible, it means that it's impossible to do in closed form, that is, as a finite expression in terms of what are called elementary functions. By elementary functions we mean the familiar functions: sine and cosine and exponentials and logs and polynomials and so on. Anything you can write down in terms of those standard functions, you can't do it that way. Okay, so this is impossible as an indefinite integral. But that doesn't rule out all hope that we could do the definite integral; that is, we might be able to find the area under the curve without first finding an antiderivative. We will not be able to find an antiderivative in closed form, because that's impossible, but we can try to find the area. All right, so how do we find the area under the curve? We have this function, which looks kind of like that, and we want this area here, from minus infinity to infinity. We write down this integral and we can't find an antiderivative, and at this point, if I didn't already know how to do this, I would probably be stuck, because that seems very difficult; the way we're used to doing this would be to find an antiderivative, right? So we're basically stuck at this point. And so someone, and I wish I knew who, came up with an incredibly stupid and an incredibly brilliant way to solve this integral. This method does not usually work, but in this problem it does. And the method is: we have this problem that we can't do, so we write it down a second time. [COUGH] Now that may have just looked like banging your head against the wall: you can't do the integral, so you keep writing it down over and over again. That doesn't seem like it would help the situation. Actually, this solves the whole problem. So let me show you why this trick of just writing down the same thing twice solves the problem. Well, let's just change the notation a little bit. This letter z here is what's called a dummy variable; you can change it to whatever you want.
This is just notation for the area under the curve, so just so I can keep track, this is my first integral that I can't do, and this is my second integral that I can't do. I'm going to change the notation to x and y so I can tell them apart. So this one is the integral of e to the -x squared over 2, dx, and this one is the integral of e to the -y squared over 2, dy; it's the exact same thing. Now let's write the product as one integral. This is a double integral, but if you haven't dealt with double integrals much, as I said on the first day of class, it's nothing to worry about; you just do single integrals one after the other. So the product can also be written as the double integral of e to the -(x squared plus y squared) over 2, dx dy, where the interpretation of this double integral is that first we do the inner integral, keeping y held constant, and then do the outer integral. That's the exact same thing, because if you imagine rewriting this as a product again: when we're doing the inner integral, we're holding y constant, so we can pull that factor out, and then what we have left is one of these single integrals, so you can pull that integral out, and it's the exact same thing. So I've written it as a double integral. It still doesn't look much easier, but there's one thing that is saving us here, and that's the fact that we have this x squared plus y squared. Whenever you see an x squared plus y squared, it should remind you of the Pythagorean theorem, right? Sums of squares. So let's draw a simple little picture. Suppose the point (x, y) is up here, just for simplicity; of course, it could be in any quadrant. Here's x, here's y. And if we let r squared = x squared + y squared, I'm using the letter r for radius, that's just the distance from the origin to the point, just the distance formula, basic geometry, and then there's some angle theta. The fact that we have r squared sitting right there is a hint that it may be useful to convert to polar coordinates. Polar coordinates just says: represent points in terms of a radius and an angle rather than in terms of the Cartesian coordinates x and y. So we're going to convert from (x, y) to (r, theta), that is, polar coordinates. Our limits of integration are going to change. First of all, what is that integrand? It's just e to the -r squared over 2. And we're going to integrate dr d theta; we could also do d theta dr. Theta is this angle, so that's going to go from 0 to 2 pi, and r is this length, so r goes from 0 to infinity. And then there's one other thing we need here, which is one of the very few facts from multivariable calculus that we need in this course: when you transform in more than one dimension, you need to multiply by something called the Jacobian. I discussed this in the math review handout, so if you're not familiar with it you can review that handout, and I actually do this particular transformation there, because converting to polar coordinates is a very common transformation. Okay, so if we do that, the Jacobian here works out to r. That's just a little calculation; if you haven't seen it before, you can read about it in the math review handout or in a calculus book. So we don't just replace dx dy by dr d theta; the correct thing is to replace it by r dr d theta. That is what now makes this go from being a very hard problem to a very easy problem, because as soon as we have that r there, it suggests: look, the derivative of r squared over 2 is r.
So we have, up to a sign, the derivative of what's up in the exponent sitting right there, and now it's just an easy substitution integral. So this is the integral from 0 to 2 pi of the inner integral, with a d theta at the end. Let's just do the inner integral. Let u = r squared over 2, so du = r dr, and r dr is exactly what we have there. So the inner integral is just the integral from 0 to infinity of e to the -u, du. Now that's a really easy integral: an antiderivative of e to the -u is -e to the -u, so this integral is just 1. So what are we doing? We are integrating 1 from 0 to 2 pi, and that's 2 pi. And lastly, we just notice, well, that's not the integral we started with; that's the square of the integral we started with. So therefore, for the integral from minus infinity to infinity of e to the -z squared over 2, dz: since we wrote it down twice, we got 2 pi, so written down once, we get the square root of 2 pi. So that's what we need for the normalizing constant. Now we know what c is: c = 1 over the square root of 2 pi. Kind of amazing, first of all, that this trick worked, and secondly, we're integrating an exponential and suddenly we get a square root of pi. Where did the pi come from? Pi makes you think of a circle; where did the circle come from? The circle came from the fact that we're using polar coordinates. All right, so that's nice; now we know the normalizing constant, and that's the standard normal distribution. Let's compute its mean and variance, and then we can talk a little bit about the general normal as opposed to the standard normal. I've already claimed that the mean is 0 and the variance is 1, but we haven't checked that yet, so let's verify it. First of all, let's get the mean. The mean is easy. We're going to let Z be standard normal, and, sorry for the pun, we're going to compute E(Z), which I said is easy. Why is it easy? It's easy because of symmetry. By definition, the mean is the integral of z times the PDF, which we now know: it's 1 over the square root of 2 pi, which I can take out because it's a constant, times the integral of z e to the -z squared over 2, dz, and that equals 0. That was an easy integral. Why is it 0? Well, I just said it's by symmetry, and the general type of symmetry we're using is this: as a general statement, suppose g(x) is an odd function, which, I'll remind you, means it has the symmetry property g(-x) = -g(x). That's the definition of an odd function; an even function would be the same thing without the minus sign. Then if we integrate g symmetrically, say from -a to a, where it has to be from -a to a, not from just any a to b, of course, we'll always get 0. You can show that by splitting the integral into two pieces and checking it. But the easiest way to see it is with a picture: for example, sine is an odd function, and if you have something that looks like that, where it's symmetric, and you integrate from there to there, then the negative area cancels out the positive area. Or you can verify it by splitting it up into two integrals; the positive area cancels the negative area. Well, our integrand, z times e to the -z squared over 2, is an odd function: if I replace z by -z, nothing happens to the exponential part, and the z out front becomes -z.
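Since there is no closed-form antiderivative here, a quick numerical sanity check of both claims, the square root of 2 pi total area and the symmetry argument for the mean, is reassuring. Here is a small sketch (mine, not from the lecture), assuming scipy is available:

    import numpy as np
    from scipy.integrate import quad

    # The full-line integral of exp(-z^2/2) should be sqrt(2*pi), about 2.5066.
    val, _ = quad(lambda z: np.exp(-z**2 / 2), -np.inf, np.inf)
    print(val, np.sqrt(2 * np.pi))

    # The integrand z * exp(-z^2/2) is odd, so its integral over the whole line should be 0.
    val_odd, _ = quad(lambda z: z * np.exp(-z**2 / 2), -np.inf, np.inf)
    print(val_odd)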
So it's an odd function, and just by symmetry you can immediately say the answer is 0, without having to do some nasty calculation. All right, so that's good. But now let's get the variance; the variance is going to take a little more work. Var(Z) = E(Z squared) - (EZ) squared, and that second part we just showed is 0, so it's just E(Z squared). Now is where we start to need LOTUS again. I'll remind you, we haven't proven LOTUS yet, but we'll talk about that on Wednesday; here I just want to show how to use it. LOTUS says that if we want E(Z squared), we do not first need to find the PDF of Z squared; we can work directly in terms of the PDF of Z. That's what we talked about last time. So we know immediately that this is just, I'll take out the 1 over the square root of 2 pi again, the integral from minus infinity to infinity of z squared e to the -z squared over 2, dz. It's exactly the same integral as before, except that z is replaced by z squared. That's why it's called the law of the unconscious statistician, because that's just the obvious thing to do, plug in z squared, and LOTUS says that it's in fact valid. Okay, so now we just have to do this integral. The integrand is now an even function, which is nice, but not as nice as having an odd function. With an even function, if we want, we can integrate from 0 to infinity instead, and I think I'd rather do that. It's not necessary to convert it this way, but I'd rather integrate from 0 to infinity just so that I can avoid thinking about negative numbers for a while. So let's go from 0 to infinity and multiply by 2. Since it's an even function, that's perfectly correct, because we have a positive area and another positive area, and by symmetry it's the same thing twice. So it's twice the integral from 0 to infinity of z squared e to the -z squared over 2, dz. Now here I think we need to resort to integration by parts, which usually I try to avoid, but once in a while we can't avoid it. Just in terms of the strategy for integration by parts: remember, with integration by parts you want to split the integrand into two pieces, one piece that's easy to integrate and one piece that's easy to differentiate. Now z squared is easy to do whatever we want with; the part we should focus on is the e to the -z squared over 2. We don't know how to integrate that in general over some interval; we only know how to integrate it from minus infinity to infinity. But we saw in the calculation we just did that if there were an extra z in there, then it's a really easy integral. So we're going to split the z squared into two z's, z times z. Now we're in good shape, because we can let one piece be u and the other be dv. In other words, we're letting u = z, so du = dz, that's really simple, and we're letting dv = z e to the -z squared over 2, dz, so v = -e to the -z squared over 2. And to check that: if we take the derivative of v by the chain rule, we get z e to the -z squared over 2 back, right? So now we're in good shape to do integration by parts. This is just the usual integration by parts formula: it's 2 over the square root of 2 pi, times uv evaluated from 0 to infinity.
And then minus the integral of v du, but that's minus a minus, because of the minus sign in v, so we get this boundary term plus the integral from 0 to infinity of e to the -z squared over 2, dz. Okay, now we're essentially done with the calculation, because all we have to do is notice that that's the integral we just did; the only difference is that we're going from 0 to infinity instead of from minus infinity to infinity. The uv boundary term is just 0, because near 0 the z factor is 0 while the exponential factor is close to -1, and when z is very large, the exponential factor is exponentially small; so that term is just 0. So we only have to deal with the remaining integral, and that's what we just did: it's one half of the integral we computed, so it's one half of the square root of 2 pi, and when we multiply one half of the square root of 2 pi by the 2 over the square root of 2 pi out front, we get 1. So the whole thing just becomes 1, because it's this times its reciprocal. So what this showed is that the variance is equal to 1, which is what I claimed, so that's good. All right, a couple more very quick things about the normal, and then we'll continue next time. An important piece of notation, which is standard notation in statistics: capital Phi denotes the standard normal CDF. Because this distribution is so important, and yet so hard to deal with, in the sense that it's a lot of work to do that integral from minus infinity up to z, it deserves its own name. So Phi(z) equals 1 over the square root of 2 pi, times the integral from minus infinity up to z, of e to the -t squared over 2, dt. I just changed the dummy variable to t, to avoid clashing with the z in the upper limit. So that's just the CDF, right? Because this function is so important, this CDF is very easy to compute using computers or calculators, and it's very easy to find tables of it. So in a sense, we got around the problem that we couldn't do this integral by just calling it Phi(z) and treating that as a standard function. Now we can do the integral: it's just capital Phi, and that's standard notation for it. One other remark about that: what happens if we look at Phi(-z)? This is something you should check for yourself: Phi(-z) = 1 - Phi(z), by symmetry. Just for practice with the concept of symmetry and with CDFs, you should check this for yourself; draw a picture and see why it's true. It's a useful fact. All right, so that's it for the standard normal, and next time we'll continue with the non-standard normal.
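Closing with one more sketch of my own (not from the lecture): scipy exposes capital Phi as scipy.stats.norm.cdf, which makes it easy to spot-check the Phi(-z) = 1 - Phi(z) symmetry and the claim that the variance is 1, assuming scipy is available:

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    z = 1.3                                  # arbitrary test point
    print(norm.cdf(-z), 1 - norm.cdf(z))     # Phi(-z) and 1 - Phi(z) should agree

    # Variance of the standard normal: integrate z^2 times the PDF over the whole line.
    var, _ = quad(lambda t: t**2 * np.exp(-t**2 / 2) / np.sqrt(2 * np.pi), -np.inf, np.inf)
    print(var)                               # should be about 1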
Info
Channel: Harvard University
Views: 89,112
Rating: 4.9045873 out of 5
Keywords: harvard, statistics, stat, math, probability, normal distribution
Id: 72QjzHnYvL0
Length: 51min 9sec (3069 seconds)
Published: Mon Apr 29 2013