Lecture 12: Discrete vs. Continuous, the Uniform | Statistics 110

Captions

So, the main topic for the next couple of lectures is continuous distributions. We've learned about the binomial, the Poisson, the hypergeometric, and so on, and at this point we've covered all of the famous discrete distributions that we need in this course. So now is a good time to start talking about the continuous distributions. I like to do discrete before continuous, because conceptually it's simpler to think about discrete. But that doesn't mean that continuous is harder, necessarily. Discrete is conceptually easier in a sense, but on the other hand, we have all these nasty sums that come up. We've learned some ways to sometimes avoid the sums using stories, and so on, but sometimes you just have a sum you can't deal with. In the continuous case, we'll be doing integrals instead of sums, and even though this sounds counterintuitive, in general it's easier to do an integral than a sum. Although the same thing could come up: we could be faced with integrals we don't know how to do. So again, we're gonna try to look for more clever, more conceptual ways to avoid having to do lots and lots of integration. But anyway, we'll come to that later. A lot of the ideas are completely analogous. So, at this point, I'm assuming you have a pretty good understanding of what a PMF is, what a discrete distribution really means, and what the expected value of a discrete distribution is, and now we're just gonna move into the continuous case. So, just for having a big picture on this, I think it helps to contrast the two things. So I'm gonna make kind of a dictionary of discrete world and continuous world. So we can put discrete world over here and continuous world over here. So we have a random variable that we're looking at, and usually we've been calling our random variable X in the discrete case, and usually we'll call it X in the continuous case. So, so far it's completely analogous.
So we've got discrete and continuous. Now in the discrete case, as you're very familiar with by now, we have a PMF, which you can just think of as P(X=x), viewed as a function of little x. So if X takes positive integer values, then I would need to specify this for all positive integers x. In the continuous case, P(X=x) = 0 for every x. So in that case we have a PDF instead, which usually we would write as f(x), but you can call it whatever you want. I'll call it f_X(x) just to emphasize that this is the PDF of X. So I'm gonna tell you what a PDF is, but I'm just telling you now that it's analogous to a PMF. The reason we need this is that P(X=x) = 0. Continuous means we're thinking of random variables that could take on any real value, or maybe any real number in some interval. So say we had the interval from zero to one, and X is allowed to take on any real number value between zero and one. Well, I mean we could make up examples where this is not true, but in the continuous case there are uncountably many real numbers between zero and one, and any specific number like pi over four has probability zero. So if we just tried to write down a PMF we would just say it's zero, and that would be useless. So that's why we need something else instead. So, I'll tell you what a PDF is, but that's the analogy. Just to continue this a little more, then I'll start telling you more about what PDFs are. In the discrete case we have a CDF. That's the function F(x) = P(X ≤ x), and sometimes we'll subscript it as F_X, because if we add another random variable Y, we could write F_Y for its CDF, okay? Well, in the continuous case we have a CDF, exactly the same thing. So that's one advantage of the CDF. We've seen that in the discrete case, usually it's easier to deal with the PMF than with the CDF. The CDF in the discrete case is a lot like a step function with all these jumps. It's not so easy to deal with, and the PMF is much more direct. But one virtue of CDFs is that a CDF is completely general.
So every random variable has a CDF, and so we don't need to separate out the theory. Now, let's talk about PDFs. So, on the discrete side this is a PMF. The PDF is the most common way to specify a continuous distribution. PDF stands for probability density function, not portable document format. Okay, so the keyword here is density. The common mistake with PDFs is to think that they're probabilities. A PDF value is not a probability, it's a probability density. So you can think of density, just in an intuitive sense, by thinking of probability as mass. Remember the pebbles, with total mass equal to one? But in the continuous case we can't think of pebbles any more. It's more like we just have this mass of mud that we're smearing around the space. So I think of discrete as pebbles, continuous as mud. The total mass of the mud is one, and density makes you think of mass per volume, mass per area, mass per length, things like that. Okay, so it's probability per something, but not probability. So this is a definition. A random variable X has PDF f(x) if we can find probabilities for X by integrating the PDF. That is, the probability that X is between a and b, that X is in the interval from a to b, must be given by the integral from a to b of f(x)dx, for all a and b. So f(x) is not a probability, it's what you integrate to get a probability. Integrate the density, then you get a probability. So that's the definition, and let's see how this relates to the CDF and other things. Notice, by the way, that if we let a = b, then we're integrating from a to a of f(x)dx. So that's the area under the curve from a to a, which is zero, cuz the interval has zero length. That agrees with what I said there: the probability of any specific point is zero. We need an interval of non-zero length.
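
To make the definition concrete, here is a small numerical sketch in Python. The density f(x) = 2x on (0, 1) and the helper names `example_pdf` and `prob_between` are assumptions chosen for illustration; they do not come from the lecture. The integral is approximated with a plain midpoint rule.

```python
def example_pdf(x):
    """Hypothetical density f(x) = 2x on (0, 1), and 0 elsewhere."""
    return 2.0 * x if 0.0 < x < 1.0 else 0.0


def prob_between(pdf, a, b, n=100_000):
    """Approximate P(a <= X <= b) as the integral of pdf from a to b (midpoint rule)."""
    width = (b - a) / n
    return sum(pdf(a + (i + 0.5) * width) for i in range(n)) * width


p_interval = prob_between(example_pdf, 0.25, 0.5)  # exact value is 0.5**2 - 0.25**2
p_point = prob_between(example_pdf, 0.3, 0.3)      # interval of zero length
print(p_interval, p_point)
```

The first number should be close to 0.5^2 - 0.25^2 = 0.1875, and the second is 0 because the interval from 0.3 to 0.3 has zero length.
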
Okay, so that's called a PDF. And remember, for a PMF, I said that a PMF is valid if its values are non-negative and they sum up to one, right? So by analogy, for a PDF we want the values to be non-negative, and rather than summing to one, it should integrate to one. Okay, so to be valid, f(x) is greater than or equal to 0, and the integral of f(x) from minus infinity to infinity should equal 1. Otherwise, we have not specified a valid PDF. So just to draw a picture of an example PDF, a famous example would be the bell curve type of thing that we'll get to later. But anyway, for the purpose of the picture, I don't really care exactly what the definition of this function is. I'm just drawing some curve from minus infinity to infinity. Now it might be that it's 0 on the negative side and only positive to the right or whatever, but it's some continuous looking curve. And if I shaded the whole area under this curve, I would get 1, right? I drew a symmetric one, but it doesn't have to be symmetric, it could be some nasty looking curve. As long as it's non-negative and the area under the curve is 1, those are the requirements. So to interpret a little more, what does the density really mean? Cuz I said it's not a probability. If we take f(x), let's say, at some point x0, what is that really like? If we take some point x0 here and we say the density is this number, what does that mean? It's possible that this number is greater than 1, for example, because you can have a function that sometimes is greater than 1, but the integral could still be 1, right? So we can't say that's a probability, but this is a density. So if you think of it as probability per unit of length, then if we multiply by some small number, let's say epsilon, then f(x0) times epsilon is approximately the probability that X will fall in an interval of length epsilon around x0.
Let's call this interval, let's say, (x0 - epsilon/2, x0 + epsilon/2). So all I did was take x0. The probability of the random variable exactly equaling x0 is 0, okay? But for epsilon very small, if we take some tiny little interval around x0, and I just wrote down an interval of length epsilon, then the probability of landing in it is approximately the density times the length of that interval. By multiplying by this epsilon here, we're kind of converting it back into a probability scale instead of a density scale. So this is a good intuitive way to think of a density. But I haven't yet shown you why it follows from the mathematical definition that I wrote here. To see why this is true, just by staring at this: if we wanna find the probability, then what do we do? We integrate the PDF from here to here, right? So imagine this integral where you're integrating from x0 - epsilon/2 to x0 + epsilon/2, okay? And then let's think about what that integral would be. Well, I didn't just say epsilon was small, I said epsilon is very small. I could have said very, very small. Now if epsilon is very, very, very small, what that means is that in that tiny little interval, f is not gonna change very much. So over that tiny minuscule interval, we can treat this function as being approximately a constant. And it's easy to integrate a constant: the integral of a constant is just the constant times the length of the interval. And that's all we did. We're treating f as approximately the constant f(x0) on that interval, so the integral is approximately f(x0) times the length of the interval. So, that's why this follows from this. The integral definition is more useful for deriving things, but the approximation gives you some more intuition on what's the difference between a probability density and a probability. Okay, so, let's see how this is related to the CDF. So if X has PDF little f, let's find the CDF.
Well, by definition, the CDF is the probability that X is less than or equal to little x. But I said the definition of a PDF is that it's the thing that you integrate to get probability, right? So if I wanna know the probability that X is in any region, all I do is integrate the PDF over that region. So here, we would simply integrate from minus infinity to x. I could call the integrand f(x)dx, but it's a little bit clearer to change the letter, so f(t)dt. t is just a dummy variable here. I just didn't want it to clash with this x. That is, for any particular number x, we're gonna take this curve. Let's say x is here. If this is the x we're looking at, then we're saying just look at the area under the curve up to this point. That would give us the CDF at that point, all right? Because we wanna know the probability of everything to the left, and probability is given just by taking area under this curve. So it's just the area under the curve from minus infinity up to x. That's all we're doing. So that shows how to get from a PDF to a CDF, okay? Well, what about the other way around? If we have a CDF, how do we get the PDF? So go the other way around: X has CDF capital F, and of course, we're assuming it's a continuous random variable, not a discrete one. By the way, the terminology in the continuous case could be slightly confusing, because when we say we have a continuous distribution, capital F should be continuous. But we don't just want it to be continuous, we want it to be differentiable. So "continuous" refers not so much to F being a continuous function. It refers to the fact that X can take on a whole continuum of values, rather than just discrete values, okay? So if X has CDF F, and X is a continuous random variable, then we want to get from the CDF to the PDF. So f(x) = what? Let's think about that. This integral is the relationship between a CDF and a PDF, okay?
But now I wanna say, if we know this integral, how can we extract out the function inside it? Well, the answer is just take the derivative, right: f(x) = F'(x). And why is that true? By your favorite theorem of calculus, the fundamental theorem of calculus, FTC. Actually, we're gonna need both parts of the fundamental theorem of calculus, so it's nice that it actually is pretty fundamental. At least the way I learned it, part one of the fundamental theorem of calculus says that if you have an integral that looks like this, with a variable upper limit, and you take the derivative of that, then you just get this function back. So that's the first part of the fundamental theorem of calculus. The second part says that if you wanna do a definite integral, you find an antiderivative and then evaluate it at the two endpoints. So okay, this is just saying the derivative of the CDF is the PDF, in the continuous case. So it's a very straightforward relationship between them. And this also kind of confirms something we did earlier. Let's say we wanna know the probability that X is between a and b. In the discrete case it's crucial whether the inequalities are strict or not. In the continuous case, it makes no difference if you write strict or not strict here. So according to the definition of a PDF, if we wanna get the probability that X is in that interval, all we do is integrate the PDF from a to b. But another way to think about this would be to remember your fundamental theorem of calculus, and the notation matches up pretty well too. Because, like in AP calculus, if you have a function little f, usually you'll call its antiderivative capital F, which is exactly what we're doing here. If we wanna do this integral, we take some antiderivative. Well, we already have one, that's the CDF. And then we evaluate it at b and at a and subtract. So that's just F(b) - F(a).
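
For reference, the relationships between the PDF and the CDF described above can be collected in one place. This is only a summary sketch, assuming, as the lecture does, that X is a continuous random variable with PDF f and a differentiable CDF F:

```latex
\begin{align*}
F(x) &= P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \\
f(x) &= F'(x), \\
P(a < X \le b) &= \int_{a}^{b} f(x)\,dx = F(b) - F(a).
\end{align*}
```

Because single points have probability zero, the last line gives the same value for P(a ≤ X ≤ b), P(a < X < b), and P(a ≤ X < b).
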
So that's also true by the fundamental theorem of calculus. And that's similar to a result that we had earlier for CDFs, so it's consistent with the earlier stuff, okay? So we'll do some examples in a little while. But right now this is just the general framework, and making the analogy, okay? So we have a CDF, and I'll just add here, just to have it in this dictionary too, that the PDF is the derivative of the CDF. Question? >> [INAUDIBLE] >> Yeah, and the question is, in this framework, is big F always differentiable? Yeah, we have to assume that it's differentiable. I mean, there are functions that are continuous but not differentiable everywhere. But in that case it would just be a more complicated thing. And when we say continuous random variable in this course, it means we have a CDF which has a derivative. Because if we don't have a PDF then we're not dealing with continuous distributions, and things can be much nastier. So yeah, we're assuming that this derivative exists. Okay, so in general if I ask you to find the distribution of whatever, in the discrete case you can either give the PMF or the CDF. Those are equally valid ways to describe a distribution. In the continuous case, you can give the PDF or the CDF, and those are equally valid ways. Okay, so let's continue this list. In the discrete case, we have the expected value, right? And remember, for the expected value we just take the sum of the values times the probabilities of the values, okay? In the continuous case, this sum would just be 0 because all of these probabilities are 0, so that's not useful. But by analogy, instead of a sum, we'll do an integral. So the definition of the expected value in the continuous case is that we integrate x times the PDF. So it's completely analogous. In general we're gonna integrate from minus infinity to infinity. And sometimes we'll deal with random variables where the only possible values are, say, between 0 and 1.
And in that case we're just integrating 0 outside of that interval. So then we would restrict the integral to the region where the PDF is non-zero. But in general that's the definition, okay? So that's completely analogous. Let's do one more concept that applies in both the discrete and continuous cases, and that's the notion of variance. We've been talking about expected values. But that's just giving a one number summary of the average, right? It's not telling us anything about the spread of the distribution, right? How spread out is it? So for that, we need the idea of variance. Intuitively, variance is just supposed to be a measure of how spread out the distribution is. That is, on average, how far is X from its mean? So we might start by trying to take the expected value of X minus the expected value of X. Here's the mean. This is the difference between X and its mean. But if we just did this, we would always get 0. Because by linearity, that's E of X minus E of E of X. But E of E of X is just E of X, because E of X is a constant. So this would be useless, because it would just be zero. Okay, so then I guess the most obvious thing to me to do to fix that problem is to put absolute value signs. Because then we're making it non-negative, and then it won't be 0 anymore, except if X is a constant. But absolute values are annoying to deal with. For example, the absolute value function is this V shaped thing, right? It has a sharp corner, it's not differentiable there. It is difficult to work with. So the standard way to deal with this is, instead of absolute values, to square it. One reason, as I said, is that the absolute value is just annoying because it's not differentiable. Kind of a deeper reason, though, is that anytime you see squares, it should start reminding you of the Pythagorean theorem, right?
It means that there's a lot of beautiful geometry that goes on with squares, and sums of squares, and right triangles, and Euclidean distance, and things like that. And you lose that geometry if you're using absolute value, and there are other reasons as well. But anyway, the standard definition of variance is Var(X) = E((X - E(X))^2). So this is on average how far X is from its mean, except that we're squaring it. One annoying thing about squaring it, though, is that we changed the units. So if X is a measurement, let's say a distance measured in miles, and we square it, we've got miles squared, okay? And so that's no longer in the same units as what we started with. So because of that, something more interpretable is the standard deviation, which is a familiar term. Standard deviation is defined as just the square root of the variance. So this seems, at first, like a kind of convoluted thing to be doing. First we square everything, then we take the average, then we square root it back again. The reason is that the variance has really nice properties, but on the other hand we changed the units, so we just change them back at the end. So that's the definition of standard deviation. In general, variance is a lot nicer to work with than standard deviation as far as doing the math. But then at the end of the day, when you want to have something interpretable, it's easier to think about what the standard deviation means, because you're back on the original units. Okay, and one nice thing about this letter E notation, E for expectation, is that it's a really good notation. Because I could just write down this one thing, and I didn't assume here that X is continuous or discrete or anything. This is just a general definition, and I didn't need to write a separate definition for the discrete or for the continuous case. So this is a unified definition. Let's just write the other way.
There's another way to write variance, which is more commonly used than this one. This one's the usual definition, but the other way, which I'm about to show you, is usually easier for computing it. Not always, sometimes this one's easier. So, another way to express variance. We want the variance of X. Let's just expand this thing out. I'm just gonna multiply it out, right. So inside the expectation that's X^2 - 2X(EX) + (EX)^2. I'm just squaring this thing. And let's use linearity. The first term is E(X^2), minus. Now for this middle term, the 2 is a constant and constants can come out. The E(X) is also a constant, right? X is a random variable, E(X) is just a number. So 2E(X) is just a number that comes out. So that's 2E(X), and then what's left inside is still an E(X). So we have another E(X) there, and then plus, this last thing is also a constant. So taking its expected value does nothing, because it's just a constant already. So that's + (EX)^2, and so this whole thing just becomes E(X^2) - (EX)^2. And it sounds like what I just said out loud, E of X squared minus E of X squared, was 0, but the parentheses are different. Here we square first, then take the average. Here we take the average, then square it. And we take that difference, okay? So that's usually easier. And that answers the age old question. This question came up for me, I think, in seventh grade science class, where I had to do a bunch of experiments and then I got a bunch of numbers. And for some reason I was squaring the numbers and I wanted the average. I didn't know whether I should square first and then average, or average first and then square. And I think I computed it both ways and I got a slightly different answer. Which one is correct? Well, this doesn't say which one is correct, but it says that this one, E(X^2), will always be bigger than or equal to this one, (EX)^2. And equality holds only in the case when X is a constant.
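
Written out, the expansion just described looks like the following. This is a sketch of the board work, with \mu = E(X) introduced here only as shorthand (the lecture writes E(X) throughout):

```latex
\begin{align*}
\operatorname{Var}(X) &= E\big[(X - \mu)^2\big], \qquad \mu = E(X), \\
&= E\big[X^2 - 2\mu X + \mu^2\big] \\
&= E(X^2) - 2\mu\,E(X) + \mu^2 \\
&= E(X^2) - \mu^2 \;=\; E(X^2) - \big(E(X)\big)^2 .
\end{align*}
```

Since Var(X) is the average of the non-negative quantity (X - \mu)^2, it cannot be negative, which is where the inequality E(X^2) ≥ (E(X))^2 comes from.
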
So if X is a constant, then the variance is 0, cuz X just equals its mean, obviously. If X is not a constant, then what's gonna happen is that you're averaging some numbers, (X - EX)^2, that may sometimes be 0 but are certainly sometimes positive. You can't average non-negative things and get a negative number, and if some of them are positive you can't get 0 either. So the variance would be strictly positive, which means this, E(X^2), is strictly greater than this, (EX)^2, except in the case of a constant. So, okay, that's the variance. And as far as notation, it's standard to write EX^2, without the parentheses, for E(X^2). That's just standard notation. So, if you see EX^2, you should always interpret that as squaring first, and then taking the E. That's just a convention, a pretty standard convention. Writing it with parentheses is a little clearer, to avoid any possible ambiguity. But it's very common to see it written without them. So interpret it as squaring first. Okay, so that's variance, and over here we can continue our little dictionary: Var(X) = E(X^2) - (EX)^2, and then in the continuous case, same thing again. And the one difficulty with this is, we've been talking about how to compute E(X), but how do we actually compute E(X^2)? That's the question that we need to address. So we'll talk about that a little later. But first, we should see at least one example of a continuous distribution. The simplest one to start with is called the uniform. As far as what you'll need before the midterm, there are only two continuous distributions that you need to know by name, and then we'll do more later. One is the uniform, the other is the normal. The uniform is the simplest continuous distribution, so we'll start with that one right now. The normal distribution we'll talk about mostly next week, and the normal distribution is the most famous and important distribution in all of statistics.
And the reasons why it's so important will gradually emerge over the semester. Let's start with the uniform. So here's the uniform distribution on some interval (a,b). So we have some interval from a to b. I'll say here's a, here's b. We wanna pick a "random" point in this interval, and I'll put random in quotes. How do we do that? The question is, what does random mean? Just saying it's random is too vague, because that just means we have some random variable, okay? What if we said completely random? Like, what's the most random that it could be? Again that's a little bit vague, but let's just explore that intuition a little bit and then write down a formula. If it's completely random, I can't just say that any two points have the same probability, because for all real numbers between here and here, every individual number has probability 0. So it's not so interesting to say all the probabilities are the same. So pick some random point, say there, x, but the probability that I got that exact value right there is 0. Okay, so that means we still have to say what it means for it to be completely random. So the intuition now is, suppose we broke this interval into two halves, where this is the midpoint, say. Intuitively, if it's completely random, it should be that this half is equally likely as this half. Cuz if it were not, then it seems like the random variable prefers to be more to the right than to the left. And somehow we want a concept where it doesn't care where it is, right? So in other words, we could say that uniform means that probability is proportional to length. That's a reasonable definition. That is, if we take two intervals of the same length, they should have the same probability. If one interval is twice as long, it seems reasonable that that one should be twice as likely.
So we're just gonna write down a continuous distribution where probability is proportional to length. And to specify this, we can either write down the PDF and derive the CDF, or we can try to figure out what the CDF should be and derive the PDF. Let's start with the PDF here, because we're trying to practice PDFs. So here's the PDF: it's a constant c if x is between a and b, and it's 0 otherwise. Because I want the probability to be 0 outside of this interval. Inside that interval I want the density to be constant, because if the density were higher at one point than another, that doesn't seem very uniform. So of course, we could ask, what's c? Well, it has to be that the integral of the PDF is 1. And I could start out by integrating from minus infinity to infinity, but of course we only need to integrate from a to b, cuz it's zero outside of there. If we integrate this we get c times (b - a), and that has to be 1, therefore c equals 1 over (b - a). So it's just one over the length of the interval. It has to be this way, otherwise this would not be a valid PDF. Now suppose we want the CDF. To get the CDF we just have to integrate this thing from minus infinity up to x. So how do we do that? Again, we don't really have to go all the way from minus infinity, we can just start at a: the integral from a to x of f(t)dt. Then we have to consider some cases. Well, this expression here that I wrote down is already assuming that x is greater than a. If x is less than a, then the probability is 0, so the CDF has to be 0. And let me just write the cases separately. So here's the CDF: if x is less than a it's 0. If x is greater than b it's 1, because we know for sure that X is less than b. Now, the interesting case is what happens in the middle. To get the thing in the middle, all we have to do is integrate a constant, so I plug in f(t) = c here.
That's gonna be c times (x - a). Right, just integrate the constant. It's a very easy integral. So that's just gonna be (x - a)/(b - a), if x is between a and b. And notice this makes sense, because if we let x equal a here it reduces down to 0, and if we let x equal b it reduces to 1. So this is a continuous function. And this is a linear function of x. It's saying that as you increase x, the probability is increasing linearly. Which makes sense, cuz you're accumulating more and more stuff. So let's get the expected value of X. Again, it's just gonna be an easy integral. Because we just have to integrate from a to b x times the PDF, that is, x/(b - a). So I just wrote down x times the PDF. Integrating x is easy, it's just gonna be x squared over 2. So this is x^2/(2(b - a)), and we evaluate this as x goes from a to b. Let's factor out the 1/(2(b - a)), and then it's b^2 - a^2. But b^2 - a^2 is (b - a)(b + a). So we can cancel the b - a and we just get (a + b)/2. Just doing that easy integral. Well, that's a very intuitive answer. That's just the midpoint. It says the average is in the middle, and it would really be weird if that didn't happen, cuz this is supposed to be uniform. Okay, so that was just a check. Now, we have a bit of a quandary, though, for how to deal with the variance. So let's try to get the variance. If we want the variance, then that means we need E(X^2). Because in E(X^2) - (EX)^2, we know this part, (EX)^2, but we don't know this part, E(X^2). How do we get E(X^2)? If we think carefully about this, X^2 is a random variable. Let's call that thing Y. So let Y = X^2. If we take a function of a random variable, it's a random variable. So E(X^2) is E(Y).
And how do we get E(Y)? Well, assuming X is continuous right now, to get E(Y) by the definition we need to know the PDF of Y, and then we integrate y times the PDF of Y, dy. So the question is, do we need the PDF of Y? That sounds kind of annoying, because we don't know the PDF of Y. Now we can get the PDF of Y, and later in the course we will talk about how to get it, but right now that seems like a pretty annoying problem. So let's do this more carelessly instead. Let's just say it's too much hassle to get the PDF of Y. So instead I'm gonna reason by analogy. I'm looking at this formula right now for E(X). But I don't want E(X), I want E(X^2). So I'm just gonna change that x to an x^2. All right, I want X^2, not X, so I'm just gonna put down x^2 there. And then I'll write f(x)dx. That's the PDF of X, that's what I know. I'm too lazy to find the PDF of Y, so I'll just change x to x^2. Well, that doesn't sound very legitimate. What I just did is called the Law of the Unconscious Statistician, which has a nice acronym, LOTUS. It's called that because it seems like something you might do if you're half asleep and not thinking very hard: you want this thing, so you just replace x by x^2. So to state it in general in the continuous case: X is a random variable whose PDF we know, and we want the expected value of a function of X, g(X). The principled approach would be to find the distribution of g(X) and then work with that. The lazy approach would be to still use the distribution of X, but that sounds kind of too good to be true. So the lazy approach here would be, well, I'm gonna take g(X), and I'm gonna change big X to little x.
And then I'm still gonna insist on using the density of X and not convert anything: E(g(X)) is the integral from minus infinity to infinity of g(x) f_X(x) dx. Well, this turns out to be true. So I'll put a box around it. We can talk sometime next week about the proof, why this is true. But even though it sounds too good to be true, it actually is true. So that's called LOTUS. This is the continuous version. Let me write both versions. The continuous LOTUS is that thing I just wrote, the same equation, so you can copy that there. And let me just write the discrete case. Again we want the expected value of some function g of X, so all I'm gonna do is take this sum, which is the definition of the expected value, and change x to g(x). So this is gonna be the sum over x of g(x) times the PMF of X, P(X=x). It says we don't need to convert and get a distribution for g(X). We just do that. This is also valid. We'll talk more about why later, but it's useful to know that right now. So coming back to this problem about the uniform, just for simplicity let's let U be uniform between 0 and 1, and suppose we want the variance. We know the expected value of U is one-half, just the midpoint. And if we want E(U^2), according to LOTUS we don't need to first find the PDF of U^2. We can just directly write down the integral from 0 to 1 of u^2 times the PDF, f_U(u)du. But this PDF is actually equal to a constant, and that constant is 1 in this case. So it's the integral from 0 to 1 of u^2 du, which is u^3/3 evaluated from 0 to 1, which is 1/3. So therefore the variance of U equals E(U^2) - (EU)^2. And that's one-third minus one-quarter, which equals one-twelfth.
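
The Uniform(0, 1) calculation above can be checked numerically. The following is a minimal Python sketch, not part of the lecture: the function names `uniform_pdf` and `lotus_expectation` are hypothetical, and the integral is approximated with a midpoint rule instead of being done by hand. It applies LOTUS directly, integrating g(u) times the PDF of U without ever finding the distribution of U^2.

```python
def uniform_pdf(x, a=0.0, b=1.0):
    """PDF of Uniform(a, b): constant 1/(b - a) inside the interval, 0 outside."""
    return 1.0 / (b - a) if a < x < b else 0.0


def lotus_expectation(g, pdf, lo, hi, n=100_000):
    """Approximate E[g(X)] as the integral of g(x) * pdf(x) over [lo, hi] (midpoint rule)."""
    width = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * width
        total += g(x) * pdf(x)
    return total * width


mean_u = lotus_expectation(lambda u: u, uniform_pdf, 0.0, 1.0)               # E(U)
second_moment_u = lotus_expectation(lambda u: u * u, uniform_pdf, 0.0, 1.0)  # E(U^2) via LOTUS
var_u = second_moment_u - mean_u ** 2                                        # E(U^2) - (EU)^2
print(mean_u, second_moment_u, var_u)
```

The three printed values should be close to 1/2, 1/3, and 1/12, matching the hand calculation.
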
So the variance of a Uniform(0, 1) is one-twelfth, and that was a very easy calculation because we were able to use LOTUS here, which we haven't proven yet, but we will talk more about that later. I'm showing you how to use it right now, then we'll justify it more. So that thing that's too good to be true actually works. So that's LOTUS. One more thing about the uniform distribution. It seems like the uniform is the simplest continuous distribution that you could possibly imagine, because the PDF is just a constant on some interval. One other point about this is that we have to have some bounded interval here. We cannot define a uniform distribution on the entire real line. Sometimes it's a bit annoying that there isn't one. But on the whole real line there would be no way to normalize it; there'd be no way to find a constant that makes it integrate to one. So it sounds like this is an extremely simple distribution. And it is: it's just a constant PDF on some interval. Extremely easy. So start with the Uniform(0, 1). It seems very simple, but actually the Uniform(0, 1) has the property that if you give me one uniform random variable and you're interested in some other distribution, there is a way to convert it and simulate that. That is, from the Uniform(0, 1) you can simulate or generate from any distribution, no matter how complicated it is, at least in principle. As a matter of computation that may be easy or hard, but in principle from the uniform you can get anything, so I call that universality of the uniform. Universality of the uniform means that given a uniform, you can create any distribution that you want. So that's kind of theoretically nice in that it kind of unifies concepts, and says that from this thing that seems very, very simple, just one uniform, you can actually generate something that's as complicated as you want. 
That's kind of cool, but it's also useful in practice, where most computer programs can generate random numbers between 0 and 1 (actually pseudorandom), but they may not know how to generate whatever complicated distribution you're interested in. And this, in many cases, gives you a way to convert from the random uniforms to whatever you want to simulate. So I want to show why that's true. So the statement is that we're gonna start with U, uniform between 0 and 1, and let F be a CDF that we're interested in. Usually we've been saying, here's the random variable, and then we find its CDF. Here we're going the other way, in the sense that we assume that we have some CDF that's of interest to us, but we do not yet have access to a random variable that has that CDF. So let F be a CDF. It's possible to generalize this further, but to make this something that we can do fairly quickly, let's assume that F is strictly increasing, so we don't have to deal with flat regions. And let's also assume that F is continuous as a function, just so that we don't have to think about jumps right now, although you can generalize this. Then the theorem says: define X to be F inverse of U. The inverse function exists in this case because I took something that was continuous and strictly increasing, so it will have an inverse. So we take the inverse and we plug in U. Then the statement is that X is distributed according to F. That is, the CDF of X is F. So what this says is, we have this CDF we're interested in. We take the inverse CDF, plug in the uniform, and then we've constructed a random draw from that distribution we're interested in, capital F. So let's prove this very quickly. And the proof doesn't require anything fancy at all. It doesn't require anything except for understanding what a CDF is. So another reason I like to talk about this is it's just good practice with really understanding what a CDF is. 
Cuz the better you understand CDFs, the easier it is to see why this is true. So to prove this, all we need to do is to compute the CDF of X. This notation means that X has the CDF F, that is, X follows this distribution. So all we have to do is compute the CDF of X, the probability that X is less than or equal to x, but that's actually pretty easy. By definition, X is F inverse of U, so this is the probability that F inverse of U is less than or equal to x; I'm just plugging in what X is. Now let's apply capital F to both sides. So I'm just putting F here and F here. Because I made these nice assumptions about F, that's equivalent to U is less than or equal to F of x. You know, if we didn't have an increasing function, say if I multiply both sides by minus one, then the inequality flips, things like that. But since we have an increasing function, it's preserved. And since it's invertible, this is really the same event, just written in a different way. Now we're basically done, because what's the probability that U is less than or equal to F of x? I'll just draw a simple little picture. U is uniform from 0 to 1, and F of x, remember, that's a probability, so that's just some number between 0 and 1; let's say it's there, F of x. Now, I said that probability is proportional to length for a uniform, and in this case that proportionality constant is just 1, because the length of the whole interval is 1. So for a Uniform(0, 1), the probability of an interval is its length. So we want to know, what's the probability that U is between here and here, between 0 and F of x. That's just the length of the interval, and that's F of x. And that's the end. That's the end of the lecture. So have a good weekend. Thanks.
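To see universality of the uniform in action, here is a minimal simulation sketch (not from the lecture). It assumes, purely for illustration, that the CDF of interest is the logistic CDF, F(x) = 1 / (1 + e^(-x)), which is continuous and strictly increasing, with inverse F^(-1)(u) = log(u / (1 - u)). The function names are hypothetical. The idea is to plug uniform draws into the inverse CDF and then compare the empirical CDF of the results against F.

```python
import math
import random


def logistic_cdf(x):
    # Illustrative choice of F: continuous and strictly increasing on the real line
    return 1.0 / (1.0 + math.exp(-x))


def logistic_inverse_cdf(u):
    # F^{-1}(u) for the logistic CDF, valid for 0 < u < 1
    return math.log(u / (1.0 - u))


def draw_uniform(rng):
    # rng.random() returns a value in [0, 1); redraw on exactly 0 to avoid log(0)
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def empirical_cdf(samples, x):
    # Fraction of the samples that are <= x
    return sum(1 for s in samples if s <= x) / len(samples)


rng = random.Random(12)  # fixed seed so the run is reproducible

# X = F^{-1}(U): each uniform draw is converted into a draw from F
samples = [logistic_inverse_cdf(draw_uniform(rng)) for _ in range(100_000)]

for x in (-2.0, 0.0, 1.0):
    print(f"x = {x:5.1f}  empirical P(X <= x) = {empirical_cdf(samples, x):.4f}"
          f"  F(x) = {logistic_cdf(x):.4f}")
```

If the theorem holds, the two columns should agree up to simulation noise, since P(X <= x) = P(U <= F(x)) = F(x).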

Info

Channel: Harvard University

Views: 78,062

Rating: 4.9087138 out of 5

Keywords: harvard, statistics, stat, math, probability, distributions, probability density functions, standard deviation, uniform distribution

Id: Tci---bVs60

Length: 49min 56sec (2996 seconds)

Published: Mon Apr 29 2013