Lecture 22: Transformations and Convolutions | Statistics 110

Picking up right where we left off last time, we were deriving the variance of a hypergeometric, right? So I just want to quickly recap that and make a few more comments about it. We basically did the calculation last time, just didn't simplify the algebra. So we were doing the variance of the hypergeometric, and we have parameters w, b, n, which you can think of as w white balls and b black balls, and we're taking a sample of size n without replacement. And then we want to study the variance of the number of white balls in the sample. I'll just remind you what we did at the very end last time, and let's make up a little more notation to make this nicer. Let p = w/(w + b); that's a natural quantity to look at, right? It's between 0 and 1; it's just the fraction of white balls in the population. It's also convenient to let w + b = N. That's not a random variable, but capital N is traditional statistics notation for the population size; the sample size is lowercase n. Okay, what we did last time was derive the variance of X. We decomposed X as a sum of indicator random variables, where X_j is the indicator of the jth ball you draw being white. Then, using the stuff we did last time for variance: the variance of the sum is the sum of the variances, plus all the covariance terms. If the X_j were independent, you wouldn't have to worry about the covariances, but in this case they're not independent, so we need the covariance terms. So this is really going to be Var(X_1) plus blah, blah, blah, plus Var(X_n), plus 2 times the sum over all i < j of Cov(X_i, X_j); the 2 is because I'm grouping Cov(X_1, X_2) together with Cov(X_2, X_1). Okay, now that looks like a complicated sum.
But we take advantage of symmetry, like I was doing quickly last time; make sure you see the symmetry in this problem. Any of these X_j, the jth ball, before you draw any balls, is equally likely to be any of the balls. These are not conditional variances; these are the unconditional variances. By symmetry, they're all the same, so the first part is just n times the variance of the first one, n Var(X_1). Well, X_1 is just Bernoulli(p), right? So that's just np(1 - p). Then we need all the covariance terms, and by symmetry they're all the same, again. There are (n choose 2) terms in that sum, so I'm just going to write 2 (n choose 2) times Cov(X_1, X_2); then I don't have to do a sum anymore. We did this quickly last time, but it's important enough to write down again. Cov(X_1, X_2) = E(X_1 X_2) - E(X_1)E(X_2), just to remind you of the definition, or an equivalent of the definition; that's how you get covariance in general, it's always true. Once we have this, it will tell us what to plug in, at least once we think hard enough about what indicator random variables actually mean. Okay, the second part is easy: we already know that marginally these are just Bernoulli(p)'s. They're not independent, but looking at them separately, each is Bernoulli(p), so E(X_1)E(X_2) is just p squared. Now for E(X_1 X_2), as I pointed out last time: if you multiply two indicator random variables, you get the indicator random variable of the intersection. So E(X_1 X_2) is just the probability that the first ball is white and the second ball is white. The first ball is white with probability w/(w + b), which is just p again, times (w - 1)/(w + b - 1) for the second. So Cov(X_1, X_2) = p(w - 1)/(w + b - 1) - p squared, okay. So that looks messy.
If you multiply everything out and simplify the algebra, it actually comes out to something surprisingly nice. So this is just algebra at this point; if you simplify, what you get is Var(X) = (N - n)/(N - 1) times np(1 - p). The np(1 - p) part looks very familiar, right? That's just the variance of a Binomial(n, p). The factor (N - n)/(N - 1) in front is called, in statistics, the finite population correction. And it's really neat that the answer works out to something so simple and similar to the binomial: it looks like the binomial variance, we just need this extra correction factor in front. Let's check this in a couple of simple and extreme cases; I always recommend looking at simple and extreme cases. One extreme case is n = 1. Then the correction factor is (N - 1)/(N - 1) = 1, and we just get the variance of a Bernoulli(p). Well, it had to be that way, right? Because if you're only picking one ball, what difference does it make whether it's with replacement or without replacement? There's only one ball. So that makes sense when n = 1. The other extreme case is when N is much, much larger than little n: say little n is 20 and big N is 100,000. In that case the correction factor is extremely close to 1, which says we're getting something extremely close to the binomial variance, and that should make perfect sense. Because if the sample is so minuscule compared to the population, it's very, very unlikely that you would sample the same individual more than once, right? You're not doing replacement, but what difference does it make, since it's unlikely to get the same person twice anyway in your sample? Okay, so it's going to be close to a binomial whenever the correction factor is close to 1, so that should make intuitive sense.
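The finite population correction is easy to sanity-check numerically. Here is a small simulation sketch; the specific numbers (w = 6 white, b = 4 black, sample size n = 5) are made up for illustration, not from the lecture.

```python
import random

# Hypothetical numbers: w = 6 white, b = 4 black, sample size n = 5.
w, b, n = 6, 4, 5
N = w + b
p = w / N

# Formula from the lecture: Var(X) = ((N - n) / (N - 1)) * n * p * (1 - p)
formula = (N - n) / (N - 1) * n * p * (1 - p)

# Simulate sampling n balls without replacement many times and
# estimate the variance of the number of white balls drawn.
random.seed(0)
balls = [1] * w + [0] * b  # 1 = white, 0 = black
reps = 200_000
total = total_sq = 0
for _ in range(reps):
    x = sum(random.sample(balls, n))  # sample() draws without replacement
    total += x
    total_sq += x * x
mean = total / reps
sim_var = total_sq / reps - mean * mean

print(round(formula, 4), round(sim_var, 4))  # the two should be close
```

With these numbers the formula gives (5/9) * 5 * 0.6 * 0.4, about 0.667, and the simulated variance should land within a percent or two of that.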
All right, so that's the variance of the hypergeometric. Okay, I think we're ready for a change of topic, to change of variables. Change of variables is synonymous with transformations. This is something we've done before via other methods, but not as a topic in its own right. But hopefully the method we're going to write down will look kind of natural, because we've already done some similar stuff. We've already been talking a lot about what happens when you have a function of a random variable. A function of a random variable is a random variable, and we used LOTUS a lot to get its expected value, okay? But LOTUS only gives you the expected value of that transformed random variable. It doesn't give you the whole distribution. A lot of the time you don't just want the mean, or the mean and the variance; you want the entire distribution. Well, how do you do that? Let's state it as a theorem and then do examples. It's more interesting in the continuous case, so I'm going to state this for continuous random variables. Let X be a continuous random variable with PDF f_X, and let Y = g(X), so we're transforming from X to Y by applying some function g. We need to make some assumptions on g; if g is a really nasty function then this may not work out very well. LOTUS will still be true, but that doesn't give us the distribution. So let's assume, to start with, that g is differentiable. In particular it's continuous, but it's stronger than that: we want the derivative of g to exist everywhere, or at least everywhere of interest to us. And let's assume that g is strictly increasing. Okay, then the question is how we get the PDF of Y, and the answer is given by f_Y(y) = f_X(x) dx/dy: we start with the PDF of X and multiply by dx/dy. I just have to explain the notation a little bit.
Here we transformed capital X to capital Y, so a natural thing to do is mirror that notation with the lowercase letters: we're defining it to be true that little y = g(little x), the same transformation. Now this looks a little bit strange, because f_Y is a function of y and f_X(x) is a function of x, and if I ask you for the PDF of Y, I'm hoping you'll give me a function of little y. If you just write this down, you have a function of little x. So the interpretation is that everything is then written in terms of little y. It looks uglier if I write it that way right now. But because I made these assumptions that g is nice enough, g will have an inverse, so we can also write x = g inverse of y. All I'm saying is plug g inverse of y in for x; then it's a function of y. And dx/dy is the derivative of x with respect to y, viewed as a function of y. There are several variations on this. In particular, dx/dy is the reciprocal of dy/dx. This is just intro calculus again, but it's useful to point out. Remember from calculus, these things look like fractions. They're not actually fractions, but the chain rule says they act like fractions. So that says we have a choice: either compute dx/dy directly, or take dy/dx and flip it. And then we just have to remember to write it as a function of y, either way. So there are a couple of choices for how to use this, and you should think first about which one is going to be easier, rather than blindly jumping in without actually thinking about it. Also, make sure to check the assumptions: strictly increasing. A common mistake on this kind of problem would be to blindly plug into this formula for a function like g(x) = x squared.
I'm not saying that's not a nice function; it's a parabola. But it's u-shaped: it goes down and then it goes up. So it's not strictly increasing, and you couldn't apply this. That doesn't mean we can't solve the problem; it just means you have to go back to first principles in that case. This would work with g(x) = x squared if we're dealing with positive random variables, because then the negative side doesn't come into play. But if you're dealing with both negative and positive values and you're squaring, that's not increasing. Okay, so you have to be careful about things like that. Don't just plug into the formula without checking the assumptions. All right, so let's prove this. The proof should be pretty easy, based on calculations similar to ones we've done before. In a sense, this is easier than universality, or things like that. Our proof is just going to be: find the CDF, take the derivative to get the PDF. It doesn't require any great leaps of thought. So let's do that pretty quickly. The CDF of Y is P(Y less than or equal to little y), which equals, just plugging in the definition, P(g(X) less than or equal to little y). If we want, we can write this as P(X less than or equal to g inverse of y), because I'm assuming this function has an inverse, so those two events are equivalent: you can get from one to the other and back. It's just the same event written in a different way. But notice that this is just the CDF of X evaluated at g inverse of y. It's just the definition of a CDF, so hopefully this is very, very familiar by now. Call it F_X(g inverse of y), and just to make the notation a little nicer, that's just F_X(x), since I wrote over there that x is g inverse of y.
So that's an easier way to write it. Basically that says that for the CDF you don't really have to do much of anything. But for the PDF you can't just say f_Y(y) equals f_X(x): there's a derivative that comes up. That's from the chain rule. So now let's take the derivative of both sides: f_Y(y) equals, well, I'm differentiating with respect to y. The chain rule says I can differentiate first with respect to x, which gets the PDF of X, and then we have the correction dx/dy, to correct for the fact that we differentiated both sides with respect to y, whereas going from big F_X to little f_X means differentiating with respect to x. So it's just the chain rule, nothing else to it. All right, so that's what we wanted to show. So let's do an example. Here's a famous one: one of the most widely used distributions in practice is called the log-normal. Let Y = e to the Z, where Z is standard normal. Log-normal does not mean the log of a normal. You can't take the log of a normal random variable, because you can't take the log of a negative number. Log-normal means the log is normal, not the log of the normal. Here Z is standard normal; more generally you could let Z be Normal(mu, sigma squared), but let's just do the standard normal case first. If I take the log of Y, I get Z, which is normal. That's why it's called log-normal: the log is normal. We actually had a homework problem about this before, right, where what you did was use the MGF of the normal to find moments of the log-normal, okay? But that's just moments; right now we want the entire PDF, the distribution. This transformation is increasing and infinitely differentiable, a very, very nice function, so there's no problem with applying that result, and we can immediately write down the PDF. f_Y(y) =, let's do it here.
f_Y(y) =, so I'm just going to write down the standard normal density; X is Z in this example. 1 over root 2 pi, e to the minus z squared over 2, except that I said we have to express it as a function of y, right? So z is log y; instead of writing z squared over 2, I'm going to write (log y) squared over 2. So this is the normal density with log y plugged in for z, and then according to the formula we also have to multiply by dz/dy. Over here, let's just compute the derivative: dy/dz is the derivative of e to the z, which is e to the z, right? But I want that in terms of y, and e to the z in terms of y is just y, so that wasn't too difficult. Then we just need to be careful about whether to multiply by dy/dz or dz/dy. The formula says dx/dy, which here is dz/dy, the reciprocal of this, so we just put in a 1/y. And this is for y > 0. So the PDF is f_Y(y) = (1 over y root 2 pi) e to the minus (log y) squared over 2, for y > 0. By the way, if you ever forget whether it's dx/dy or dy/dx, well, it shouldn't take long to rederive, but a kind of mnemonic is to pretend the dy is over on the other side; then it looks really nice and symmetrical: f_Y(y) dy = f_X(x) dx. That's a handy way to remember it. You may have been taught that it's not okay to separate the dx from the dy, but when you go further in math people start separating them again, and as long as you're careful about what things mean, it is possible to do that if you interpret it correctly. But here I just think of it as a mnemonic. So if I ever write f_Y(y) dy = f_X(x) dx, just think of that as notation that means exactly this. All right, so the proof was pretty short, and the example just used the formula. Pretty straightforward, as long as the conditions apply. But while we're on this topic, let's do the multidimensional version, which looks uglier but is conceptually the same thing. So now we're going to have transformations in n dimensions. Transformations again, but now the multidimensional version.
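As a quick numerical sanity check on the derived density (not something done in lecture, just a sketch), we can verify that f_Y(y) = e^(-(log y)^2 / 2) / (y * sqrt(2 pi)) integrates to about 1 over (0, infinity), using a crude Riemann sum:

```python
import math

# Standard log-normal density derived in the lecture:
# the normal density with log y plugged in, times the Jacobian factor 1/y.
def f_Y(y):
    return math.exp(-(math.log(y)) ** 2 / 2) / (y * math.sqrt(2 * math.pi))

# Crude left-endpoint Riemann sum from y = 0.001 up to y = 200;
# the density is negligible outside that range.
dy = 0.001
total = sum(f_Y(k * dy) * dy for k in range(1, 200_000))
print(round(total, 3))  # should be very close to 1
```

A valid density has to integrate to 1, so this catches the classic mistake of forgetting the 1/y factor: without it, the sum comes out noticeably larger than 1.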
Okay, so now we think of Y and X as random vectors: Y = g(X), where g is a mapping from Rn to Rn. A random vector, just think of that as a list of random variables, so Y is really just Y1 through Yn. It's not really a new concept; it just means we took our n random variables and listed them together as one vector, okay. So we have a mapping from Rn to itself, and we're doing a transformation. And let's assume that X = (x1 through xn), that vector, is continuous. That is, it's a continuous random vector. In other words, we just have some joint PDF, right, since we've been talking about joint PDFs; that's a familiar concept at this point. So we have this joint PDF, we do this transformation, and the question is: what's the joint PDF of Y? It's completely analogous, just higher dimensional. I'm not going to prove that the analog holds, because that's basically an exercise in multivariable calculus, which is not really that relevant for our purposes. It's completely analogous, so depending on how much multivariable calculus you've done, you can either prove it or just accept it as analogous, okay? So we want the joint PDF of Y in terms of the joint PDF of X, and I'm just going to write down the analogous equation: f_Y(y) = f_X(x) times |dx/dy|, where f_Y is notation for the joint PDF. The only problem with this is, how do we interpret the derivative of this vector with respect to that vector? What does that actually mean? And notice we want absolute value symbols around it. By the way, if this function were strictly decreasing, we could do the same thing just by sticking absolute values in here; if we forgot the absolute values, we'd get a negative PDF, which makes no sense. All right, so I just have to tell you what this thing means. This thing is called the Jacobian.
I'm sure some of you have seen it, but I'm not necessarily assuming that you have; it's a standard multivariable calculus thing. The Jacobian, if you haven't seen it before, is just the matrix of all possible partial derivatives. And as I said on the first day of class, if you know how to do an ordinary derivative, you know how to do a partial derivative: you just hold everything else constant. So dx/dy, written out, is a matrix of all possible partial derivatives. What we do is take the first coordinate of x and differentiate it with respect to all the coordinates of y, giving the first row dx1/dy1, dx1/dy2, blah, blah, blah, dx1/dyn. Then we take x2 for the second row, do the same thing, and keep going until we've done all the partial derivatives, down to dxn/dy1, blah, blah, blah, dxn/dyn. So it's just a matrix of all possible partial derivatives. Now, it doesn't make much sense to stick a matrix into that formula, so the absolute value symbols actually mean: take the absolute value of the determinant. We take this Jacobian matrix, take its determinant, and take the absolute value. That's the analog of the one-dimensional formula we wrote down: we have this matrix, we somehow need to compress it down to a number, and it turns out the right way to do that is using a determinant. And as in the other case, we have a choice: we could do dx/dy this way, or we could have done dy/dx, which says do the mapping the other way around, take all the partials of the y's with respect to the x's, and then take the reciprocal, and it would be the same thing. Sometimes one of these two methods is much easier than the other, so you want to think first about which direction to do the transformation in.
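Here is a small sketch of the multidimensional formula in action. Everything in it is an illustrative choice, not from the lecture: a made-up linear map (Y1, Y2) = (X1 + X2, X1 - X2) applied to two iid standard normals. The inverse is x1 = (y1 + y2)/2, x2 = (y1 - y2)/2, so the Jacobian matrix dx/dy is [[1/2, 1/2], [1/2, -1/2]], with determinant -1/2 and absolute value 1/2.

```python
import math

def std_normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def normal_pdf(x, var):
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

# Change of variables: f_Y(y1, y2) = f_X(x1, x2) * |det(dx/dy)|,
# where x1 = (y1 + y2)/2, x2 = (y1 - y2)/2 and |det(dx/dy)| = 1/2.
def f_Y(y1, y2):
    x1, x2 = (y1 + y2) / 2, (y1 - y2) / 2
    return std_normal_pdf(x1) * std_normal_pdf(x2) * 0.5

# Sanity check: for this particular map, Y1 and Y2 happen to be
# independent N(0, 2), so the change-of-variables answer should match
# the product of two N(0, 2) densities exactly.
for y1, y2 in [(0.0, 0.0), (1.0, -0.5), (2.0, 1.5)]:
    assert abs(f_Y(y1, y2) - normal_pdf(y1, 2) * normal_pdf(y2, 2)) < 1e-12
print("Jacobian check passed")
```

The 1/2 in f_Y is exactly the absolute determinant of the Jacobian; dropping it would leave a "density" that integrates to 2 instead of 1.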
A lot of books just write the Jacobian as J, and I like the letter J a lot, but I don't like that notation here because it doesn't tell you which way you're going: from x to y, or from y to x. Written this way, it's very obvious; dx/dy just says take derivatives of the x's with respect to the y's, so it has to be this one, and the other one would be the other way around. You can do it either way, as long as you're careful about whether to take the reciprocal or not. All right, so that's the Jacobian. One other calculus thing we should discuss briefly is convolution. Convolution is something we've done already, just as we've already done some transformations, but I just wanted to mention it as its own topic briefly. Convolution is just the fancy word for sums: that is, we want the distribution of a sum of random variables. Remember, for the binomial we did a convolution of binomials using a story proof, as long as they have the same p. And for Poissons and normals, we used the MGF. All of those are pretty easy calculations, but sometimes you can't find a story that will help you, and the MGF may not exist, or you may not know how to work with it. Sometimes we need a more direct method. So in the discrete case, we've already done calculations like this. Let's let T = X + Y, and we want to know the distribution of T, assuming we know the distribution of X and the distribution of Y; that's called the convolution. In the discrete case, we can immediately write down a formula. It may be a messy formula, and it may or may not be something we can actually evaluate, but at least we have an expression. What's the probability that T = t? Well, that's just a sum over the ways to get a total equal to t: X has to be something, and Y has to be whatever makes that up to t. You can think of this as just conditioning on X.
But I'll write it just using the axioms of probability, breaking this up into disjoint cases. So I sum over all the ways I can make the total equal t: P(T = t) = the sum over x of P(X = x) P(Y = t - x). We're assuming that X and Y are independent here; it's much nastier if they're dependent. We're summing over all x such that the terms are positive. We don't need a separate proof for this. It just says that to get the total equal to t, X has to be something, and Y has to be whatever makes the total t. It has to be that way. And because I assumed independence, I can split the joint probability into two probabilities. So that's the discrete case. Now let's write down something analogous in the continuous case. Now we want the PDF instead, and I'm going to write down something that looks completely analogous: since it's continuous, I replace the PMF by the PDF and the sum by an integral, from minus infinity to infinity. The PMF of Y evaluated at t - x becomes the PDF evaluated at t - x: f_T(t) = the integral of f_X(x) f_Y(t - x) dx. This is true, and the easiest way to remember this result is by analogy with the discrete case. However, that's not a proof; that's just an analogy. And on the new homework, you'll see an example where if you try to reason in an analogous way for a product instead of a sum, well, you'll see what happens. This requires more justification, and there are several ways to justify it. Probably the simplest is to take the CDF, take the derivative, and get the PDF. So what's the CDF? For the CDF, let's use the continuous version of the law of total probability: we're integrating the probability that X + Y is less than or equal to little t, given X = x, times the PDF of X. This is one way to do it; there are other ways to do this calculation, but I like this one. That's just the continuous law of total probability. Now we plug in X = x.
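The discrete convolution sum is easy to try on an example where we already know the answer from the MGF argument: a sum of independent Poissons is Poisson. The rates 2 and 3 below are arbitrary choices for illustration.

```python
import math

def pois_pmf(k, lam):
    # Poisson PMF: e^(-lam) * lam^k / k!
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Convolution sum from the lecture: P(T = t) = sum_x P(X = x) P(Y = t - x),
# here with X ~ Pois(2) and Y ~ Pois(3) independent (hypothetical rates).
def conv_pmf(t, lam1, lam2):
    return sum(pois_pmf(x, lam1) * pois_pmf(t - x, lam2) for x in range(t + 1))

# T = X + Y should be Pois(5), so the convolution should match that PMF.
for t in range(10):
    assert abs(conv_pmf(t, 2, 3) - pois_pmf(t, 5)) < 1e-12
print("convolution matches Pois(5)")
```

Note the sum only runs over x from 0 to t, since those are the only terms where both PMFs are positive, exactly the "sum over all x such that the terms are positive" condition above.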
And once we've plugged in X = x, we can drop the condition, because X and Y are independent. So all we have is the integral of, notice what's left here: just in your mind, replace big X by little x and move it over to that side of the inequality. So it says Y less than or equal to t - x, and that's just the CDF of Y evaluated at t - x. So F_T(t) = the integral of F_Y(t - x) f_X(x) dx. Now take the derivative of both sides of this equation with respect to t, and there's a theorem that says you can swap the derivative and the integral. The derivative of the CDF F_Y is the PDF f_Y, so we get the convolution formula that way; it requires some justification. Usually I would like to avoid doing convolution integrals like this, but sometimes you can't avoid it. If possible, try to use a story or an MGF or one of the other things we've done, but sometimes you need this. Yeah? >> [INAUDIBLE] >> This is capital F, this- >> [INAUDIBLE] >> Yeah, sorry, this is F sub T(t), thank you. That's the CDF of capital T, thanks. Okay, so my favorite thing about statistics is that you can do things that are beautiful and useful. And Jacobians are an extremely useful technical tool, but I've never heard anyone describe Jacobians as beautiful. So to kind of rebalance our beauty quotient for today, let's do something completely different that involves no calculus at all. Something where, you can see whether you agree with me or not, beauty is not really an adequate word for it. So here's an idea. The idea is that you can use probability to prove existence, okay? So we're going to prove existence; what does that mean? We're going to prove the existence of objects with desired properties, using probability. That's a very general idea, so let me just tell you mathematically what the idea is. The idea is, I want to show that an object with a certain property exists.
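Before moving on, here is a sketch of the continuous convolution integral done numerically, for a case with a known answer. The choice of X, Y iid Uniform(0, 1) is hypothetical; their sum has the triangle density f_T(t) = t for 0 <= t <= 1 and 2 - t for 1 <= t <= 2.

```python
# Density of Uniform(0, 1).
def unif_pdf(x):
    return 1.0 if 0 <= x <= 1 else 0.0

# Riemann-sum version of the convolution integral
# f_T(t) = integral of f_X(x) f_Y(t - x) dx over x in [0, 1],
# which is the only region where f_X is nonzero.
def f_T(t, dx=0.0001):
    return sum(unif_pdf(k * dx) * unif_pdf(t - k * dx) * dx
               for k in range(int(1 / dx) + 1))

# Compare against the known triangle density at a few points.
for t, truth in [(0.5, 0.5), (1.0, 1.0), (1.5, 0.5)]:
    assert abs(f_T(t) - truth) < 0.01
print("triangle density recovered")
```

The triangle shape is the same reason totals of two fair dice peak at 7: there are more ways for the two pieces to combine into a middle value.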
One way to show that would be this. Let's say A is the desired property; that is, A is some property, okay? I'm saying this very generally, but we'll do an example. We want to show there's an object with a certain property, and that sounds like it has nothing whatsoever to do with probability and statistics. It's just asking: if you searched everywhere in the universe, mathematically or otherwise, could you ever find this thing? I didn't say anything about randomness or uncertainty, okay? Here's the strategy. We're going to show that P(A) > 0 for a random object. We get to choose how to define random: we have this universe of objects, and we decide on some method for randomly selecting an object. If it's a finite set, the most obvious thing to do is just pick one at random with all objects equally likely, right? If I have a million objects, there's no probability anywhere in the problem, but I say, well, just pick one at random, equally likely, okay? And then let A be the event that the randomly chosen object has the property. Well, is it clear that if the probability is nonzero, then such an object must exist? Well, of course: if it didn't exist, the probability would be 0, so if the probability is positive, it must exist. So if we can show this, we've shown that it exists. So this is true; I don't need to write a proof of that. But it sounds like a very, very wishful-thinking strategy: if we can't even exhibit even one such object, how are we ever going to compute the probability? We can't even find one, but we're going to compute its probability; that's pretty weird. Notice that we don't actually have to compute P(A) exactly; we only need a bound that shows it's greater than 0, okay? We don't need to know exactly P(A), just that it's positive. That's method 1; let's extend this a little bit. Suppose each object has a number associated with it; think of it as a score.
So we have this universe of objects, no probability yet, and each object has a number attached to it, some kind of score. We want to show there exists an object with a good score (I'll say what "good" means in the example), but suppose it's really hard to actually find one with a good score. Well, here's the strategy; you may guess this has something to do with probability. Pick a random object again and look at its score. In other words, what's the average score of a random object? Now, here's the theorem: there is an object whose score is at least the average, E(X), where X is the score of a random object. So we're defining a random variable by taking a random object and finding its score, and we take the average. Well, obviously there must be at least one object whose score is at least the average, right? They can't all be below average; that would make no sense. Now, of course, E(X) may be pretty lousy. But if E(X) is actually pretty good, then we've shown that there exists a good one without actually exhibiting it. Again, that sounds crude, like saying that in a group of people, at least one person has to have at least the average salary of the people in the room. That's an extremely crude statement. Is that ever going to be useful for anything? I think this is a neat idea, but is it actually useful? Well, what I consider one of the most beautiful and useful results of the 20th century was Shannon's theorem. Claude Shannon is the father of information theory, and also the father of modern communication theory. Any time you use a cell phone, that's all based on communication and coding theory that goes back to Shannon's work, so you can thank Shannon for this. Let me just tell you, this is not an information theory course.
It's a really amazing idea that you can quantify information, though. But let me tell you very briefly what one of Shannon's theorems was. He showed that if you're trying to communicate over a noisy channel, so you're trying to send messages from one place to another, but bits get corrupted, there's a lot of noise and interference, or whatever, then there's something called the capacity of the channel, and you can communicate at rates arbitrarily close to the capacity with arbitrarily small chance of error. That is, even if you have a very, very noisy channel, you can make the error probability very, very low. That sounds like a very difficult theorem, and he proved it in 1948; no one else was even close to thinking of that, as far as I know. The way he proved it: he showed that there exists what he called a good code, a good code being one that works well for sending messages across this noisy channel. The way he showed that a good code exists was to pick a random code. And that's like the most daring thing you can imagine; he probably spent months trying to actually find one, couldn't find one, so he picked a random one. And to think that a random one is actually going to do well is kind of unbelievable, and it turns out to be true. It was only 30 or 40 years later that people could actually explicitly write down a good code. Until then, Shannon had shown that they exist because a random one has the right properties, even though you can't actually write down a specific one without a lot of work. That's one of the most amazing results, just mathematically extremely beautiful, but it underlies all of modern communication and information theory. All right, so I'm not going to try to prove Shannon's theorem in ten, or five, minutes, but I am going to do one quick example along these lines. I just made up a simple example to illustrate this idea. So here's the problem. Suppose we have.
100 people. I just made up some numbers, just so we can do something reasonably concrete and simple, to show you how this idea would work in a small example. Okay, so there are 100 people, and those people form committees. One person can be on more than one committee. Let's say there are 15 committees of 20, that is, each committee has 20 people. I made up numbers last night where it works out nicely; you could try something more general with m and n and whatever. I chose these numbers so that 15 times 20 is 300, which means that if everyone is on the same number of committees, each person is on 3 committees. You can generalize this to cases where different people are on different numbers of committees, but for simplicity, let's assume each person is on 3 committees. No probability yet so far, right? You could think of this as a counting problem, how many ways are there to form such an assignment; there's some vast number of possibilities. Okay, now here's the problem. The problem is to show that there exist two committees whose overlap is at least 3. So I can find two committees, or there exist two committees, where a group of three people is on both. So: show there exist two committees with overlap greater than or equal to 3. All right, so clearly the way to solve this is not to write down every possible [INAUDIBLE] of committees and then search through all the overlaps, all the intersections, on a computer; that would be a nightmare. So we're going to use this idea and prove existence. This is an existence problem, and we're going to prove existence just by computing the average. So the idea is: find the average intersection. I said average, and that involves probability.
We didn't have any probability yet; we introduce our own probability structure by just saying: let's choose two random committees. So, find the average overlap of two random committees. All right, so hopefully we can do that quickly. I'll just write E; you could make up some fancy notation, but we're just picking two committees. We're assuming that we have this fixed assignment of who's on what, with specific, named people: so-and-so is on this committee, so-and-so is on that committee, and so on; that's not random. Our randomness comes from choosing two random committees. Okay, and we want the expected overlap of those two committees. So how do we do that? Indicator random variables. We create an indicator random variable for each person. There are 100 people, so I'm not going to write out all the indicator random variables, because this should be familiar by now: we have 100 people, we create an indicator for each person, and we use linearity. So it's going to be 100 times, and by the fundamental bridge, all we need to do is write down the probability that, say, person number one is on both of those random committees. So what's the probability that person number one is on both of those randomly chosen committees? You could think of that as a hypergeometric, but you don't have to; let's just think about it directly. I chose two random committees out of the 15, so the denominator is 15 choose 2, and the naive definition of probability applies because any pair of committees is equally likely. Then for the numerator: person number one is on three committees, so both chosen committees must come from those three, which is 3 choose 2 ways, and 3 choose 2 is 3. So the answer is 100 times 3 over 15 choose 2, that's 300 over 15 choose 2, and 15 choose 2 is 15 times 14 divided by 2.
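That calculation can be reproduced exactly with rational arithmetic (a sketch under the lecture's setup; the names are mine):

```python
from fractions import Fraction
from math import comb

# P(a fixed person is on both randomly chosen committees):
# the person sits on 3 of the 15 committees, so both chosen
# committees must come from those 3.
p_both = Fraction(comb(3, 2), comb(15, 2))   # 3 / 105

# Fundamental bridge + linearity over the 100 indicator variables:
expected_overlap = 100 * p_both
print(expected_overlap)   # 20/7
```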
300 divided by 15 is 20, and the 2 comes up top, so it's 40 over 14, which we can simplify to 20 over 7. If I did the arithmetic correctly, it looks like we came up a little bit short. It's almost good enough, because we wanted at least 3, and we only have 20 over 7; if only it were 21 over 7, then we'd be so happy. But here's the idea. The average is 20/7, and that implies that there exists a pair of committees with overlap at least 20/7, since the maximum is at least the average. Now, there's no way that two committees can have an overlap exactly equal to 20/7, because the overlap is an integer; and an overlap of only 2 would not be good enough, since if every pair overlapped in at most 2 people, the average couldn't be 20/7. So we get to round up to the next integer, which means some pair of committees has overlap at least 3. That means we have proven it: such a pair exists. We ran out of time, so have a good weekend.
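The whole argument can be verified by brute force on a random assignment (a sketch, not from the lecture: the construction below puts each person on exactly 3 of the 15 committees, so committee sizes come out random rather than exactly 20, but the average-overlap identity only needs the "3 committees per person" condition):

```python
import random
from fractions import Fraction
from itertools import combinations

# Put each of 100 people on 3 of the 15 committees, chosen at random.
committees = [set() for _ in range(15)]
for person in range(100):
    for c in random.sample(range(15), 3):
        committees[c].add(person)

# Overlap of every pair of committees.
overlaps = [len(a & b) for a, b in combinations(committees, 2)]

# The average is exactly 20/7 for ANY such assignment (double counting:
# each person contributes (3 choose 2) = 3 to the total, giving 300/105),
# so some pair must overlap in at least 3 people.
print(Fraction(sum(overlaps), len(overlaps)))  # 20/7
print(max(overlaps) >= 3)                      # True
```

Because the average is forced, no search is needed: any assignment whatsoever must contain a pair of committees overlapping in at least 3 people, which is exactly the existence argument from the lecture.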
Info
Channel: Harvard University
Keywords: harvard, statistics, stat, math, probability, lognormal distribution, convolutions
Id: yXwPUAIvFyg
Length: 47min 46sec (2866 seconds)
Published: Mon Apr 29 2013