Lecture 26: Conditional Expectation Continued | Statistics 110

Captions
Last time I left you with a cliffhanger, so we'd better try to resolve it: the two-envelope problem. Let me remind you what the problem is. It's very simple to state, but not so easy to resolve. We have two envelopes, envelope one and envelope two. They look identical. Suppose there are X dollars in one and Y dollars in the other. You don't know anything at all about X and Y; they're just random variables for now. All you know is that one envelope has twice as much money as the other: each contains a check for some amount, and one amount is double the other. You have no information beyond that.

The argument from last time was: say you get to pick one, and you pick the envelope with X. You don't know what X is unless you open it, but you do know the other one contains either 2X or X/2. If you average 2X and X/2, you get a number bigger than X, which suggests the other envelope is better. But you could apply the same argument to that one: it contains Y, the other one contains 2Y or Y/2, and if you average 2Y and Y/2 you get a number bigger than Y.

So let's write this out formally as two competing arguments.

Argument 1 is simply that E(Y) = E(X), by symmetry. If you were given some extra piece of information, say that the person who put the money in the envelopes is left-handed, and left-handed people subconsciously want to put more money on the left, then you'd have an asymmetry in the problem. But no asymmetry was given. As stated, there is no reason the left envelope should have more money than the right one; it's a symmetric situation. It's hard to argue against symmetry here: there is no asymmetry in the statement of the problem, so if the envelope on the left were somehow better, where would that have come from? That's a pretty strong argument.

But argument 2 also seems pretty strong. Argument 2 is to condition. How do we compute an expectation? Just as with a probability we don't know, we condition on the thing we wish we knew. This is the expectation version of the law of total probability: we condition on whether Y = 2X or Y = X/2, and write

E(Y) = E(Y | Y = 2X) P(Y = 2X) + E(Y | Y = X/2) P(Y = X/2).

That's just a fact about expectation; the proof is essentially the law of total probability, so it's hard to argue with. Now let's compute it. The argument goes: given Y = 2X, we know Y = 2X, so replace E(Y | Y = 2X) by E(2X). By symmetry it's equally likely that Y = 2X or Y = X/2, so each of those probabilities is 1/2. In the second case Y = X/2, so we get E(X/2), again times 1/2. Simplify (take out the 2 and the 1/2) and you get (5/4) E(X).

So the question is how to resolve this. First of all, is there any case in which both statements could be true? Someone said 0: yes, if both envelopes contain $0, that would be consistent. So let's assume there is a positive, nonzero amount of money in both envelopes. Is there any other case?
Infinity: infinity equals infinity, and infinity equals five-fourths of infinity. So one way out would be to say that, on average, there's an infinite amount of money, and then you're pretty happy. That's kind of like the St. Petersburg paradox, where the expected value is infinite. But there are not many real-world scenarios where your expected winnings are infinity dollars. So let's assume these expected values are neither 0 nor infinity. In that case this is a direct contradiction, meaning one of the two arguments must be wrong.

Well, I can't find any way to argue against the symmetry, so argument 1 has to be right; symmetry takes precedence. So let's see what's wrong with argument 2. This is actually one of the most common and troublesome mistakes with conditioning: using the information and then forgetting about it. Look at what we did. We plugged in the information Y = 2X, but then the conditioning went away. What's the justification for that? There is none; it just looked plausible. So that step is wrong; those two quantities are not equal. Here's the corrected version. It's perfectly valid to plug in the 2X, but that doesn't mean we can forget that we know Y = 2X. The first term should be E(2X | Y = 2X), and similarly in the second term we can plug in X/2 but we are still conditioning on Y = X/2. We then still have to evaluate those conditional expectations; we can't just drop the conditioning. In fact, there's no more justification for dropping the conditioning there than there would have been in the original expression. What we just showed, essentially, is that E(Y | Y = 2X) is not equal to E(2X).

I like to think of this in terms of indicator random variables. Let I be the indicator of which envelope has more money: I = 1 if Y = 2X, and I = 0 otherwise (we could also define it the other way around). That's the indicator of the event that the envelope on the right has twice as much money as the one on the left. Then what we just showed is essentially that X and I are dependent (equivalently, Y and I are dependent). That seems surprising at first, so let's think about what it means: if you got to observe X, that would somehow give you information about I. So consider what happens if you actually get to open the envelope on the left and you see $100 there. You know the other one contains either $50 or $200; the question is whether seeing the $100 changes your probabilities for I. That is, is it no longer 50-50 whether the other one is $200 or $50?
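The lecture does not assume any particular prior on the amounts, but to see this dependence concretely you can pick one and simulate. A minimal sketch, assuming NumPy; the Exponential(1) prior for the smaller check is an arbitrary illustrative choice, not part of the problem:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

# Hypothetical prior (not from the lecture): the smaller check is Exponential(1),
# and a fair coin decides which envelope gets the doubled amount.
small = rng.exponential(1.0, n)
I = rng.integers(0, 2, n)                 # I = 1 means Y = 2X
X = np.where(I == 1, small, 2 * small)    # amount in the left envelope
Y = np.where(I == 1, 2 * small, small)    # amount in the right envelope

print(X.mean(), Y.mean())                 # argument 1: both close to 1.5, so E(X) = E(Y)

# But observing X shifts the odds that the other envelope is the bigger one:
for lo, hi in [(0.0, 0.5), (2.0, 3.0)]:
    sel = (X > lo) & (X < hi)
    print((lo, hi), I[sel].mean())        # roughly 0.64 for small X, 0.37 for large X: not 1/2
```

With this prior, a small observed X makes the other envelope more likely to be the bigger one and a large observed X makes it less likely, which is exactly the dependence between X and I just described.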
This dependence seems a little strange, because you weren't given any information about the scale of the problem. Say you open the envelope on the left and there's a trillion dollars there. You're probably really happy, but is a trillion a big number or a small number? It's a lot of money, but in the grand scheme of things, compared to the entire real line from zero to infinity, a trillion is minuscule. A trillion is nothing compared to other numbers I could name if I wanted to. So when you see that trillion dollars, does that give you information that makes you think the other envelope is probably only half a trillion, because a trillion is big? What's a trillion? It's nothing compared to 2 to the 2 to the 2 to the trillion, or something like that; and if you observed that, it would be nothing compared to still other numbers I could name. So it seems surprising that they're dependent, but essentially we just proved that they have to be dependent if the expected values are finite.

There's a strategic practice problem related to this too; you can look it up if you haven't already, but let me tell you the result, which is also surprising. It's also a two-envelope problem, but there it's only assumed that the envelopes contain two distinct positive amounts of money; it's not assumed that one is double the other. The problem is to come up with a strategy guaranteed to give you better than a 50% chance of getting the envelope with more money. You get to observe one amount and then choose whether to switch, and you can guarantee that your probability of success is strictly greater than one-half. Again, at first that sounds impossible: if you see a trillion dollars, should you switch or not? But in a certain sense you can make a measure of whether a trillion is a big number or a small number. The strategy in that problem is to generate your own random threshold: generate some value T, say from an exponential distribution (it doesn't have to be exponential; you can pick some other distribution). You keep the envelope if you got more than T and switch if you got less than T, and that gives you better than a 50% chance of success.

Anyway, that's the two-envelope problem. There are many articles and debates about it, and some people take Bayesian approaches to resolving it and so on, but I think that's fairly unnecessary; the key blunder is in that one conditioning step. This is kind of a strange problem, but the mistake comes up in a lot of other contexts as well, so it's worth thinking carefully about. We plugged in the information, but that doesn't mean we can then get rid of the information. The only time we can drop what we're conditioning on is when we know we have independence. Here there was no justification for independence, and in fact X and I (or Y and I) can't be independent. That's not obvious, but on the other hand you can't just assert independence without proving it, and if you try to prove it here you'll find they are not independent. So that's what's going on with that problem.
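Here is a quick sketch of that random-threshold strategy, assuming NumPy; the two amounts a and b and the Exponential(1) threshold are arbitrary choices for illustration, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10**6
a, b = 1.0, 2.0                                   # any two distinct positive amounts

picked_big = rng.integers(0, 2, n).astype(bool)   # which envelope you happen to grab
seen = np.where(picked_big, b, a)                 # the amount you observe
T = rng.exponential(1.0, n)                       # your own random threshold
keep = seen >= T                                  # keep if it beats the threshold, else switch
end_with_big = np.where(keep, picked_big, ~picked_big)
print(end_with_big.mean())                        # about 0.62 here, and always > 1/2
```

The success probability works out to (1/2)[1 + P(a < T <= b)], which is strictly greater than 1/2 whenever the threshold has a positive chance of landing between the two amounts.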
All right, let's do another example of conditional expectation: patterns in coin flips. Assume we have a fair coin (you have a related homework problem where the coin may be biased, but right now it's fair), and we do repeated fair coin flips, waiting for a certain pattern. First question: how many flips until we observe the pattern heads followed by tails? I'll call that pattern HT. That is, keep flipping the coin; eventually you will observe a heads immediately followed by a tails, and we want to know how many flips that takes, counting the H and the T themselves. Similarly, we can ask how many flips until HH, that is, until the coin lands heads twice in a row.

So let's find the averages. Call the waiting times W_HT and W_HH: by W_HT I just mean the random variable representing how far into the long sequence of flips you have to go until you see HT for the first time, and likewise W_HH for HH. Those are both random variables; right now we want their expectations. So the problem is to find E(W_HT) and E(W_HH), with W for waiting time.

We'll solve both, but before actually solving them, let's think qualitatively about whether there's an inequality or an equality here. There are only three possibilities: E(W_HT) is greater than, equal to, or less than E(W_HH). Take a few seconds to think intuitively about which one you believe, and then we'll vote. Okay: how many of you think E(W_HT) is greater? How many think they're equal? How many think E(W_HH) is greater? The dominant answer is equality, with roughly equal numbers voting for each of the two inequalities. Well, the answer is that E(W_HT) = 4 and E(W_HH) = 6, fifty percent bigger.

You might say: by symmetry, don't they have to be the same? That would be a false use of symmetry. Symmetry applies when we ask the same question with heads and tails relabeled. Symmetry does tell us that E(W_TT) = E(W_HH): the coin says heads on one side and tails on the other, but you could relabel them and it's the same problem with different labels, so interchanging heads and tails gives that equality. Similarly, E(W_HT) = E(W_TH). But neither of these facts tells us that E(W_HT) = E(W_HH); it doesn't tell us they're unequal either. You just can't say "by symmetry," because you'd have to swap heads and tails everywhere, and HT and HH are genuinely different patterns.
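Before deriving the 4 and the 6, here is a simulation sketch you can run to check them, assuming NumPy; the trial count is an arbitrary choice:

```python
import numpy as np

def wait_for(pattern, rng):
    """Flip a fair coin until `pattern` ('HT' or 'HH') first appears; return the number of flips."""
    last, flips = '', 0
    while True:
        c = 'H' if rng.random() < 0.5 else 'T'
        flips += 1
        if last + c == pattern:
            return flips
        last = c

rng = np.random.default_rng(2)
trials = 10**5
print(np.mean([wait_for('HT', rng) for _ in range(trials)]))   # close to 4
print(np.mean([wait_for('HH', rng) for _ in range(trials)]))   # close to 6
```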
All right, let's do the calculation; then I'll say a little more about the intuition for why E(W_HH) is bigger. First, E(W_HT). We don't actually need conditional expectation for this one. A nice way to think about it is to imagine a sequence of tosses. Maybe it starts out tails, tails, tails, tails; eventually the coin lands heads. Once that happens, we've made partial progress: if the next flip is tails, we're done, and if not, that's okay; heads, heads, heads, eventually it'll be tails, and then we're done. Draw a few little sequences like that to make it concrete and you'll see what's going on: all we have to do is wait for the first time the coin lands heads (call that W_1; it could be the first flip, or however long it takes to get the first heads), and then wait however long it takes after that point for the first tails (call that W_2). No matter what happens, the waiting time splits up as W_HT = W_1 + W_2: time to the first heads, plus time to the first tails after the first heads. Those two are independent of each other because the coin is memoryless, but even if they were not, we could still apply linearity: E(W_HT) = E(W_1) + E(W_2) = 2 + 2 = 4. Each W_j is basically a geometric; the only thing to be careful about is that we defined the geometric distribution to not include the success, so with our convention W_j - 1 is Geometric(1/2). It's just the waiting time for a success, where success means heads for W_1 and tails for W_2. A Geometric(1/2) has expected value 1, so E(W_j) = 1 + 1 = 2: on average two flips to get the first heads, two more to get the following tails, and 2 + 2 = 4 by linearity.

Now let's try the same thing with HH and see why it doesn't give the same answer. We want E(W_HH), and again it really helps to draw some little examples. Maybe the sequence starts out with a few tails, like before, and eventually the coin lands heads. At that point either of two things can happen: if the next flip is heads, we're done; but if the next toss is tails, we have no partial progress anymore. In that scenario all the tosses so far were wasted; they do nothing for us, and the problem just restarts. It may have taken six tosses to get to that point, but we're starting over with the exact same problem we had, no partial progress. That's the key distinction from HT: there, once you get the first heads, you're essentially halfway there no matter what. So here we have to be more careful, because in the bad case all the earlier tosses are gone and we start over.

So now we'll use conditional expectation to compute this. It's like the gambler's-ruin type of argument where we condition on the first toss:

E(W_HH) = E(W_HH | first toss H) (1/2) + E(W_HH | first toss T) (1/2),

just conditioning on whether the first toss is heads or tails (you could make up notation for "first toss is heads," but I'll just write it out). Now let's expand further; the second term is actually the easier one. Given that the first toss is tails, what does that say? It cost us one toss: we tossed the coin once, it was tails, and then it's the exact same problem again.
So the second term is just 1 + E(W_HH): one wasted toss, and then E(W_HH) again. We're going to solve for E(W_HH) in terms of itself, which sounds circular, but it gives us an equation we can solve. Now for the first term, where the first toss is heads, we have to subdivide further based on the second toss. We're working within the world where we know the first toss was heads. If the second toss is also heads, then the first two tosses were HH, we're done, and it took only two flips; that happens with probability 1/2. But with probability 1/2, still within this case, the second toss is tails, and then the sequence started HT, which does nothing for us: two tosses wasted, and then it resets to the same problem again. So

E(W_HH) = (1/2)[(1/2)(2) + (1/2)(2 + E(W_HH))] + (1/2)(1 + E(W_HH)).

Now we have an equation for E(W_HH) in terms of itself. Multiply out (the 2 times 1/2 gives 1), move things around, and you get E(W_HH) = 6. You can also check by plugging in 6: the right side becomes 5/2 + 7/2 = 12/2 = 6, so that works.

That may seem a little strange, so here is some of the intuition. Imagine writing out a long sequence of coin tosses and looking at where the HHs appear and where the HTs appear. Maybe it looks like TTHHHT..., and then somewhere later on there's an HHHTT...; make up your own sequence. I'm just trying to illustrate the big picture of why this is true. In other words, why doesn't this contradict the fact that, for a fair coin flipped twice, all four outcomes are equally likely? If I look at any two particular positions, it's equally likely to see HH there as HT, each with probability one-fourth, and I think that's where the intuition comes from that the answers should be the same. But we're not just looking at two positions; we're looking at the entire sequence. What happens is that HH overlaps with itself: sometimes you'll get three heads in a row, sometimes five. Three in a row means there's an HH here and an HH there, nested. Five Hs in a row, which happens more often than most people expect because coincidences happen a lot, contains four HHs. Try to do the same thing with HT and you can't; it doesn't nest inside itself in the same way. So the HHs come in clumps. On the other hand, the two patterns have the same expected total number of appearances. Since the HHs appear in clumps, the clumps must be further apart. That's what's going on.
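A small sketch of that clumping, assuming NumPy: in a long simulated string of fair flips, HH and HT show up about equally often and the average gap between occurrences is about 4 for both, but the HH gaps are far more spread out, with lots of overlapping back-to-back hits balanced by long droughts.

```python
import numpy as np

rng = np.random.default_rng(3)
s = ''.join(rng.choice(['H', 'T'], size=10**6))

for pat in ['HH', 'HT']:
    pos = np.array([i for i in range(len(s) - 1) if s[i:i + 2] == pat])
    gaps = np.diff(pos)
    # counts nearly equal, mean gap about 4 for both, but the HH gaps have a much larger spread
    print(pat, len(pos), round(gaps.mean(), 2), round(gaps.std(), 2))
```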
If you think this is just a curiosity about coin flipping, this kind of problem actually has important applications in genetics. There we're not looking at sequences of heads and tails but at a DNA sequence, a sequence drawn from the alphabet ACTG. A lot of the time in genetics these days you study certain patterns, called motifs, and ask where the pattern appears in the DNA sequence, and you get similar problems. While I'm mentioning this, I'll recommend a TED Talk: Peter Donnelly is a statistician who works on genetics, and if you go to ted.com and look for Peter Donnelly's talk, he mentions some interesting courtroom examples, statistics and probability in the courtroom, and he also talks about an example similar to this one. It's quite a nice talk.

Okay, so that's waiting for certain patterns. You can extend this in various directions. This was already a fair amount of work, though not too bad, and you could ask more complicated questions, about longer patterns and things like that; there are a lot of other methods that can be used for more complicated patterns. But the basic method is just conditioning. Just as in the gambler's ruin, conditional expectation let us simplify a complicated problem by breaking it into simpler pieces, conditioning on the first toss. Essentially, what we did was condition on the first two tosses, done in two steps; we could also have broken it into four cases based on the first two tosses right at the start, which might have been even easier.

So that's the basic idea of conditional expectation. Now I want to state some of the general properties of conditional expectation, which will be useful, and lead into conditioning on a random variable; this whole week is about conditional expectation. So far we've been doing conditional expectation where we condition on an event. What we're going to get to is what it means to condition on a random variable. The two are very closely related, but if you mix them up, it can be bad.

So we have this object E(Y | X = x), and I want to make sure everyone is completely clear on what it means, both intuitively and mathematically. I've said over and over that "X = x" is an event: capital X and Y are random variables, little x is a number, and we're conditioning on that event. All that means is: use the same definition of expectation, except make it conditional. That's why it's called conditional expectation. At this point you should all be able to write down the definition. For Y discrete, to get an expectation we sum the values times the PMF: the sum over y of y times P(Y = y) would just be E(Y), and the only difference is that we make the PMF conditional, P(Y = y | X = x). That's all this means: we learned that X was equal to x, and we condition on that information. It's our best prediction of the value of Y given this information, best in the sense of minimizing the square of how far off we are on average, because it minimizes the expected squared error. That's the discrete case. Let's also write down the continuous case: if Y is continuous, the basic definition of expectation is to integrate y times the PDF of Y.
In the conditional case, we just use a conditional PDF instead: the integral of y times f_{Y|X}(y|x) dy. If X is also continuous, recall how we get this conditional PDF. The definition is completely analogous to conditional probability, with densities instead of probabilities: the conditional PDF of Y given X is the joint PDF of X and Y divided by the marginal PDF of X, that is, f_{Y|X}(y|x) = f_{X,Y}(x,y) / f_X(x). Equivalently, to get the joint PDF we can take the marginal PDF of X times the conditional PDF of Y given X, so we could also write it that way. Notice that the marginal f_X(x) doesn't depend on y, so we could pull it out of the integral if we wanted, or leave it there; same thing. So this is just the analog of the discrete case, using the conditional PDF, which is defined analogously.

That tells us how to compute these things, with a sum or an integral, but it doesn't yet say what properties they satisfy. So let's call this thing g(x): that is, g(x) = E(Y | X = x), with little x. I'm writing it as g(x) to emphasize that when you compute it, the answer is a function of little x. I can't even count the number of mistakes I've seen where I give a problem like this on an exam and get back an answer involving capital X or capital Y, which you should immediately recognize as completely wrong. This is the expected value of Y; it can't depend on capital Y, because you're averaging over Y. It's your prediction of Y, so you can't use Y itself. And given the information X = x, we're conditioning on that event, so it can't involve capital X either: it's just a function of little x. It might be a constant: if X and Y are independent, then conditioning on X gives no information about Y, and this reduces to E(Y), a constant function. But it has to be a function of x, possibly a constant function.

That also suggests how to define conditional expectation given a random variable: we define E(Y | X), with capital X now, as g(X). That doesn't look like a complicated equation, but conceptually it takes a lot of thought, and I'll explain it more in the next lecture. You really have to take some time to think about what these things mean, because there are so many mistakes you can fall into if you don't fully understand the notation. What it means is: we have this function of little x, and we replace little x by big X after evaluating the function, not before. One trap you could fall into is to say: this is g of little x, so to get g of capital X I should plug in big X everywhere I see a little x. But then you would be writing E(Y | X = X), and you'd say, well, I already knew X = X, so that's irrelevant information, cross it out, and get E(Y). That is not what this means. What it means is: think of g as some function of little x that you've actually computed.
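Collecting the definitions and notation just described in one display:

```latex
\begin{align*}
E(Y \mid X = x) &= \sum_y y \, P(Y = y \mid X = x) && \text{($Y$ discrete)} \\
E(Y \mid X = x) &= \int_{-\infty}^{\infty} y \, f_{Y \mid X}(y \mid x) \, dy,
  \quad f_{Y \mid X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)} && \text{($X, Y$ continuous)} \\
g(x) &= E(Y \mid X = x), \qquad E(Y \mid X) = g(X) && \text{(conditioning on the r.v.\ $X$)}
\end{align*}
```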
I'll write this as an example: if g(x) = x², then g(X) = X². That's all it means. It's some function of x, and you replace little x by big X after we actually have our hands on that function. It's really just a notational trap to think of plugging capital X into the conditioning. Notice that g(X) is a random variable; in particular, E(Y | X) is a random variable, and it's a function of X, not a function of Y.

The intuition is this (the equation above is the definition, but here's the intuitive interpretation): X is a random variable, but E(Y | X) says, let's pretend we observed X and know what it is, and treat X as if it's a known constant. Assuming we get to pretend X is known, what's our best prediction of Y? That prediction is allowed to be a function of capital X. It's not really that different from conditioning on an event; in fact, when you have a problem like this, you can always, at least in principle, translate it back into conditioning on the event X = x. It's just that the notation can get more unwieldy, and we'll see examples where it's a lot more compact and convenient to write things this way than always reverting back. But if you start getting confused about what this means, you probably should revert to the event notation.

All right, let's do a couple of examples, starting with Poissons. Let X and Y be i.i.d. Poisson(lambda), and suppose we want to find E(X + Y | X). As I said, we could first find E(X + Y | X = x) and then change little x to big X, and we'd get the same thing; but just for practice, let's do it directly using the interpretation I mentioned, that we get to pretend we know X. Linearity still holds, so this is E(X | X) + E(Y | X). Why can we still use linearity? Because conditional probabilities are probabilities: they satisfy all the same rules and properties as probability. And since conditional expectation is defined in terms of conditional probability, conditional expectations are expectations, and they satisfy the same properties, such as linearity. So linearity is still true.

Now, what do these pieces actually mean? E(X | X) is easy: it's X. What else could it be? I know X and I want to predict X, so I'm going to use X as my prediction; it has to be that. Now consider E(Y | X). X and Y are i.i.d., so they're independent, which means knowing X is of no use at all in predicting Y; so E(Y | X) is just E(Y), by independence. In general (this is true for any problem, not just the Poisson, and we'll come back to it later), for any function h we have E(h(X) | X) = h(X): if we get to know X, then we know h(X), so we can just compute it and there's no uncertainty about it. So E(X | X) = X and E(Y | X) = E(Y) = lambda, and the answer is X + lambda for the Poisson case. Actually, up to this point I haven't used anything particular about the Poisson, only the fact that X and Y are independent. So this example shows three things: we can still use linearity; if we have independence, we can drop the stuff we're conditioning on; and if something is completely known given the conditioning, we can just take it out.
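A quick numerical check of E(X + Y | X) = X + lambda, assuming NumPy; lambda = 3 and the conditioning values are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)
lam, n = 3.0, 10**6
X = rng.poisson(lam, n)
Y = rng.poisson(lam, n)

for x in [1, 3, 6]:                                # a few observed values of X
    sel = X == x
    print(x, (X[sel] + Y[sel]).mean(), x + lam)    # E(X + Y | X = x) is close to x + lambda
```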
All right, now let's do the reverse problem: suppose we were asked to find E(X | X + Y). There's no form of linearity that says this equals E(X | X) plus E(X | Y); I've seen that mistake several times, maybe in time-pressure panic. Linearity doesn't work that way, and we can't do that here. We have to think about what this means, and I'll show you two ways to do it.

One way is straight from the definition. Let T = X + Y, and let's find the conditional distribution, because what E(X | T) says is that we get to know the value of T and should use the conditional distribution rather than the unconditional one. Without the conditioning it would just be E(X), which we know is lambda; but given this information, treating T as known, we need the conditional PMF. This is similar to other problems we've seen, but let's compute it for practice. We want P(X = k | T = n), where k and n are just dummy variables; that is, the distribution of X given that we know T. By Bayes' rule, that's P(T = n | X = k) P(X = k) / P(T = n). And that's actually an easy calculation, because this is another instance of the plugging-in idea from the two-envelope problem: T is X + Y, so given X = k we plug in and we know Y = n - k. And in this case we do get to cross out the conditioning at that point, because X and Y are independent (this step would fail if they were not). So the numerator is just P(Y = n - k) P(X = k). Reading right to left: P(X = k) is straight from the Poisson, e^(-lambda) lambda^k / k!, and P(Y = n - k) is e^(-lambda) lambda^(n-k) / (n - k)!. In the denominator, the sum of independent Poissons is Poisson, so T is Poisson(2 lambda) and P(T = n) = e^(-2 lambda) (2 lambda)^n / n!. Simplify and we get a very familiar-looking distribution: binomial. The e^(-2 lambda) factors cancel, the powers of lambda cancel (lambda^n over lambda^n), leaving a (1/2)^n, and the factorials, with n! moved to the top, are just n choose k. So it simplifies to (n choose k)(1/2)^n, a binomial PMF. What we just showed with Bayes' rule is that X | T = n is Binomial(n, 1/2).
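Written out as a single display, that Bayes' rule computation is:

```latex
\begin{align*}
P(X = k \mid T = n)
  &= \frac{P(T = n \mid X = k)\, P(X = k)}{P(T = n)}
   = \frac{P(Y = n - k)\, P(X = k)}{P(T = n)} \\
  &= \frac{e^{-\lambda} \frac{\lambda^{\,n-k}}{(n-k)!} \cdot e^{-\lambda} \frac{\lambda^{\,k}}{k!}}
          {e^{-2\lambda} \frac{(2\lambda)^{n}}{n!}}
   = \binom{n}{k} \Bigl(\tfrac{1}{2}\Bigr)^{\!n},
\qquad \text{so } X \mid T = n \sim \mathrm{Bin}\bigl(n, \tfrac{1}{2}\bigr).
\end{align*}
```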
So now let's use this to get the conditional expectation. We want E(X | X + Y), but let's first do the case of conditioning on an event. Given T = n, X is no longer Poisson; it's Binomial(n, 1/2), and the expected value of that binomial is n/2. So E(X | T = n) = n/2; that's conditioning on an event. If we want to condition on T instead, E(X | T), then all we have to do is replace n by T, according to that definition: E(X | T) = T/2. It says that if we get to know the total, our best prediction of X is half the total. That's actually a pretty intuitive result: we know the total of X and Y, they're i.i.d., and we want to predict one of them. If I told you the total was 100 and they're i.i.d., you'd probably guess 50 for each, which seems reasonable; this is a mathematical proof that it's essentially the correct guess.

That's one way to prove it; here's a way I like even more, using symmetry. E(X | X + Y) = E(Y | X + Y) is true by symmetry, because X and Y are i.i.d. This is always true for i.i.d. random variables and assumes nothing about the Poisson. Now add the two: E(X | X + Y) + E(Y | X + Y) = E(X + Y | X + Y) by linearity, and E(X + Y | X + Y) is just X + Y, which is T. So I added something to itself and got T, which immediately implies the same result, E(X | T) = T/2. And since this didn't use anything about the Poisson, it's actually a more general result.

Lastly, let me mention one key property that this notation gives us; we'll prove it next time. It's called iterated expectation, or Adam's law, for reasons I'll tell you next time, and it's the single most important property of conditional expectation, so it's good to be aware of it now. As I said, E(Y | X) is a random variable, so we might want to know the expected value of that random variable, and the answer is E(E(Y | X)) = E(Y). This is closely related to the law of total probability; in a sense it's a very compact way of writing the law of total probability, and it's an extremely useful fact. All right, so I'll see you all on Wednesday.
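A last numerical sketch of the two closing facts, E(X | T) = T/2 in the Poisson example and the iterated-expectation identity, again assuming NumPy with arbitrary lambda and conditioning values:

```python
import numpy as np

rng = np.random.default_rng(5)
lam, n = 3.0, 10**6
X = rng.poisson(lam, n)
Y = rng.poisson(lam, n)
T = X + Y

for t in [2, 6, 10]:
    print(t, X[T == t].mean())      # close to t / 2

# Adam's law: averaging the conditional expectation E(X | T) = T/2 recovers E(X) = lambda
print((T / 2).mean(), lam)
```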
Info
Channel: Harvard University
Views: 41,822
Rating: 4.9396229 out of 5
Keywords: harvard, statistics, stat, math, probability, two envelope paradox
Id: PgawcWisb0I
Length: 49min 52sec (2992 seconds)
Published: Mon Apr 29 2013