Lecture 9: Expectation, Indicator Random Variables, Linearity | Statistics 110

We've been talking about random variables and their distributions, and the main topic for today is averages: what does an average mean, and how do you compute the average of a random variable? Before that, though, I want to say a little more about CDFs, because the CDF can seem mysterious the first time you see it: you have this random variable, and it has a function associated with it. What does the CDF really mean?

I drew a picture last time of what a CDF might look like; let me draw a quick one again. We've mainly been talking about discrete random variables so far, but a CDF makes sense for any random variable; it doesn't have to be discrete. That is, we have a random variable X, and we let F(x) = P(X ≤ x), as a function of the real number x. Even if X only takes integer values, this function is well defined for all real x. That's the function called the CDF, but I want to talk more about its properties and how you actually use it.

So picture a discrete example. By the way, make sure you spell "discrete" d-i-s-c-r-e-t-e; it's a different word from "discreet," and confusing the two could be a little embarrassing. For a discrete random variable, the CDF looks like a step function: it jumps. Let's draw one whose possible values are 0, 1, 2, or 3. We could have a discrete random variable that takes any nonnegative integer value, in which case the steps would go off to infinity, but right now I'm looking at one with exactly four possible values. On the negative side the CDF is just zero, because X can't be negative. The horizontal axis is x and the vertical axis is F(x), and as soon as you hit zero, the function jumps. Say it jumps to some level and then stays flat. X is also allowed to equal 1 in this example, so the function jumps again at 1, and I'm putting a closed circle at the upper value and an open circle at the lower value to indicate that at the jump, the function takes the upper value, not the lower one. Then at 2 it jumps again, and at 3 it jumps one more time, and that last jump takes it all the way up to 1. Once it reaches 1 it obviously can't go any higher, so it stays at 1 forever.

That's one example. If instead the random variable were unbounded, allowed to go off to infinity, the CDF would keep jumping, but the jumps would get smaller and smaller: it would approach 1 without ever hitting it. This one actually reaches 1 and stays there forever.

One thing to notice, just to connect the CDF with the PMF we talked about before: geometrically, the PMF is just the jump sizes. To the left of zero the CDF says X can't be negative, but with some probability X equals 0, and that probability is exactly the height of the jump at 0. Likewise the jump at 1 is P(X = 1), the jump at 2 is P(X = 2), and the jump at 3 is P(X = 3). So the jump sizes are the PMF. And what did we say about PMFs? The values have to be nonnegative and sum to one, and adding up all these jumps is just saying we climb from zero up to one. So from the CDF we can recover the PMF, and from the PMF we can recover the CDF, just by summing things up.
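As a quick sketch of the PMF-to-CDF bookkeeping just described, here is a small Python example with a made-up PMF on {0, 1, 2, 3} (the particular probabilities are invented purely for illustration):

```python
from fractions import Fraction as F

# Hypothetical PMF for a random variable with possible values 0, 1, 2, 3
# (these particular probabilities are made up for illustration).
pmf = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}

def cdf(x):
    """F(x) = P(X <= x): sum the PMF over all values <= x."""
    return sum(p for v, p in pmf.items() if v <= x)

# The CDF is a step function: flat between the possible values,
# jumping at each possible value by exactly the PMF there.
assert cdf(-1) == 0          # zero to the left of the support
assert cdf(3) == 1           # reaches 1 at the largest value
assert cdf(2.5) == cdf(2)    # flat between jumps
# The jump size at each value recovers the PMF:
for v, p in pmf.items():
    assert cdf(v) - cdf(v - 0.5) == p
```

Summing the PMF up to x gives the CDF; reading off jump heights goes back the other way, which is the two-way recovery described above.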
I'll say a little more about that, but first an example. Suppose we want the probability that X is between 1 and 3. I just made up those numbers; you could make them any a and b. Suppose we want to find this in terms of the CDF. I'm doing this as an example to show that the CDF gives you the entire distribution: from the CDF you can compute any probability you want for X.

So consider P(X ≤ 3). There are basically two cases: either X ≤ 1, or X is between 1 and 3. That is,

P(X ≤ 3) = P(X ≤ 1) + P(1 < X ≤ 3),

which is true just by the axioms of probability: I broke the event into two disjoint cases. The second term is the thing I was looking for, so immediately

P(1 < X ≤ 3) = F(3) − F(1).

I could have used a and b instead of 1 and 3; I did it this way for concreteness. More generally, this gives us the probability of any interval:

P(a < X ≤ b) = F(b) − F(a).

Now, if X is a discrete random variable, we have to be very careful about whether the inequalities are strict or not, because it matters; for a continuous random variable, which we'll get to next week, it doesn't matter whether you write < or ≤. But this calculation is completely general: discrete, continuous, or anything else, as long as we're careful about which inequality is strict. That's just an example showing that from the CDF we can compute the probability of an interval.

Now, in general, CDFs have three important properties, and we can see all three from the picture. First, a CDF is increasing. I don't mean strictly increasing: it's allowed to be flat, but it can only go up, never down. To prove that, just look at the definition: if you increase x, the event X ≤ x only becomes more likely, so F can't decrease.

Second, a CDF is right continuous. I'm not sure everyone has heard that term, so briefly: first of all, a CDF could be continuous (I'm assuming you know what a continuous function is). This one is not continuous, but it is right continuous, which means that if you take any point and approach it from the right, the value of the function converges to the value at that point. It's not continuous from the left: coming from the left toward 2, say, the function jumps when you reach 2. But coming from the right it's continuous. All this says is that the jumps look the way I drew them, with a closed circle at the top and an open circle below; and if there are no jumps, even better, the function is simply continuous.

Third, what happens as x goes to plus or minus infinity? As x goes to minus infinity, a CDF has to go to 0. This one actually hits 0, but it's also possible for a CDF to approach 0 without ever reaching it, just getting closer and closer. Going the other way, as x goes to infinity, the CDF has to get closer and closer to 1; this one actually hits 1, but in general it only has to approach 1.

So those are the three properties of a CDF, all pretty intuitive if you keep a picture like this in mind. And this turns out to be an if and only if, in a sense we'll discuss later: if you have any function F satisfying these three properties, then F is a valid CDF, meaning we can find some random variable with that CDF. Right now we're going the other direction: we started with a random variable and asked what its CDF looks like. In the discrete case it looks something like this step function; in the continuous case it will be a continuous curve; but in any case it satisfies these three properties.

All right, those are the properties of CDFs. One more definition: independence. We already know what independence of events means, and we've been talking about independence of random variables somewhat intuitively, but here is the precise definition. We say that random variables X and Y are independent if

P(X ≤ x, Y ≤ y) = P(X ≤ x) P(Y ≤ y) for all x, y.

The idea is simply to relate random variables back to events. Remember the slogan: independence means multiply. The left-hand side is called the joint CDF, which we'll study much later in the course; it looks at X and Y simultaneously, and independence says it factors into the product of the individual CDFs, for all x and y. Remember that X ≤ x is an event.
So the definition is saying that the event X ≤ x is independent of the event Y ≤ y: we've reduced independence of random variables back to independence of events.

Now, this definition looks a little unwieldy, so in the discrete case it's usually easier to work with the PMF, rather than those jumpy step functions I drew (I'm glad that board is covered up now; I don't really want to deal with the jumpy functions). In the discrete case, the equation above is equivalent to saying that the joint PMF factors:

P(X = x, Y = y) = P(X = x) P(Y = y) for all x, y.

You can show the two are equivalent; it's kind of tedious, but nothing about it is difficult. I hope this equation makes intuitive sense as a statement of what independence means: knowing the value of X tells us nothing about the value of Y, so the event X = x should be independent of the event Y = y. This PMF version won't work in the continuous case, which we'll get to later, because there it would just say zero equals zero; but in the discrete case it's a lot easier to work with. It just says we can factor the joint PMF into the individual PMFs. So that's what it means for random variables to be independent.

In general, as I said, in the discrete case it's usually easier to work with the PMF than with the CDF. For example, if you want the probability of any event, just sum the PMF over the relevant values, and that's it.

Okay, so that's independence. Now we're ready for averages. I'll start with something very simple and familiar, then generalize and extend it, and see how we really work with averages of random variables.
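The discrete factorization criterion can be checked mechanically; here is a minimal sketch, with two made-up joint PMFs (one independent, one perfectly dependent):

```python
from fractions import Fraction as F
from itertools import product

def is_independent(joint, xs, ys):
    """Discrete criterion: P(X=x, Y=y) == P(X=x) * P(Y=y) for all x, y."""
    px = {x: sum(joint[(x, y)] for y in ys) for x in xs}   # marginal PMF of X
    py = {y: sum(joint[(x, y)] for x in xs) for y in ys}   # marginal PMF of Y
    return all(joint[(x, y)] == px[x] * py[y] for x, y in product(xs, ys))

# Two independent fair coin flips (0 = tails, 1 = heads): every pair 1/4.
indep = {(x, y): F(1, 4) for x, y in product((0, 1), repeat=2)}
assert is_independent(indep, (0, 1), (0, 1))

# Y forced to equal X (extreme dependence): knowing X pins down Y exactly.
dep = {(0, 0): F(1, 2), (0, 1): F(0), (1, 0): F(0), (1, 1): F(1, 2)}
assert not is_independent(dep, (0, 1), (0, 1))
```

In the dependent case the marginals are still fair coins, but the joint PMF puts all its mass on the diagonal, so it cannot factor.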
Averages. There's more than one way to average a bunch of numbers: mean, median, mode, various weighted averages, and so on. But if you just say the word "average" without any clarification, usually what you mean is the mean: the ordinary average, where you add up the numbers and divide by how many there are. It's also called the expected value. All of these terms mean the same thing, and I'll use them interchangeably.

So we want to know how to take the average of a random variable, and that's a very important concept. First of all, remember that a random variable is this thing whose value we don't know before we do the experiment. Afterwards we may get to observe its value, but beforehand we may want to make predictions: to say what will happen on average. That's one reason the average is such a familiar and important concept. But its importance goes beyond that. You could say the average is just a one-number summary of the center of the distribution, in some sense, and that's important, but it's still only one number; you might have a very complicated distribution, and one number won't be enough. Later we'll talk about variance, standard deviation, and other measures of variability, and it turns out that to measure how spread out a distribution is, you still need this concept anyway. So we'll keep using it over and over, not just for the simple average but for averages of various functions of random variables.

To start with something very simple: if we just have the numbers 1, 2, 3, 4, 5, 6, then to average them we add them up and divide:

(1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.

That's the starting point: a very simple average. By the way, does everyone know that you don't actually have to add up all the numbers? You can just average the first one and the last one: the average of 1 and 6 is 3.5. That's true for any arithmetic sequence; it's something I should have put on the math review.

How many of you know the story of Gauss, at around age ten, adding the numbers from 1 to 100? A lot of you have seen it, so I'll tell it very quickly for those who haven't. Gauss was one of the greatest mathematicians of all time, and when he was a little kid, his teacher was annoyed with the class and told them all to add up the numbers from 1 to 100, just to keep them busy so he could do something else; these were little kids, and he figured that would keep them busy the whole day. Gauss looked at the numbers and immediately saw the trick: pair the 1 with the 100, the 2 with the 99, the 3 with the 98. Each pair sums to 101: 1 + 100 = 101, 2 + 99 = 101, 3 + 98 = 101. There are 50 pairs, so he immediately wrote down 101 times 50 = 5050.

That's an example of what I'm saying about arithmetic series, and it's useful in general, not just in this class. Everyone should know that if you average the numbers from 1 to n, you get

(1 + 2 + ... + n)/n = (n + 1)/2.
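Gauss's pairing trick and the first-plus-last shortcut are easy to verify directly:

```python
n = 100

# Gauss's pairing: 1+100, 2+99, ... each pair sums to n+1, and there
# are n/2 pairs, so the total is n(n+1)/2.
total = n * (n + 1) // 2
assert total == sum(range(1, n + 1)) == 5050

# The average of 1..n equals the average of the first and last terms,
# (n+1)/2, as it does for any arithmetic sequence.
avg = (n + 1) / 2
assert avg == sum(range(1, n + 1)) / n == 50.5

# Same shortcut on the 1..6 example from above: the average is 3.5.
assert sum(range(1, 7)) / 6 == (1 + 6) / 2 == 3.5
```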
That is just averaging the first number and the last, which is what I said a moment ago. To prove it, you could do something similar to what Gauss did; you could also multiply through by n to get the sum n(n + 1)/2 itself, but I've written it as an average. Summing the numbers from 1 to n is called an arithmetic series, as opposed to the geometric series we've seen: in a geometric series you multiply by a constant from each term to the next, while in an arithmetic series you add a constant each time. Here we're adding 1 each time, but as long as you're adding a constant each time, what I said holds: just average the first and last terms.

Okay, that's really easy. Now let's do something that's not really a harder problem, but a little different: what happens if there's some repetition? Suppose we have five 1s, two 3s, and one 5, and we want to average these eight numbers. (I've sorted them, but the average doesn't depend on what order I write them in.) Notice there are two ways to do it. This is really simple arithmetic, but it's also a key concept that will help us when we get to random variables. One way is the same thing I just did: add up the eight numbers and divide by eight. But we can also do it a different way: group the 1s together, group the 3s together, and let the 5 be its own group, and then average the group values. When I average them this way, I'm obviously not going to give the groups equal weight: the 1s should get more weight, since there are five times as many 1s as 5s. So I do a weighted average, weighting each value by how many times it appears.
The 1s get weight five-eighths, because five of the eight numbers are 1s; the 3s get weight two-eighths; and the 5 gets weight one-eighth:

(5/8)(1) + (2/8)(3) + (1/8)(5).

That's the same answer; it's just simple arithmetic. But the key point is that I can take the ungrouped average, or I can group the numbers by common values and average those, and as long as I use the correct weights, it has to be the same. I don't even have to do the arithmetic; does anyone doubt that those are the same? That's a pretty simple idea, but it's very crucial for understanding what's going on. So that's a weighted average, where those fractions are the weights, and notice the weights are nonnegative and add up to one: 5/8 + 2/8 + 1/8 = 1. By contrast, the earlier calculation is an unweighted average: each number gets the same weight, 1/n.

Now let's extend this to random variables. We want the average of a discrete random variable X; everything will be analogous for the continuous case, which we'll get to later, but right now we're doing the discrete case. The standard notation in statistics is E(X), where E stands for expected value. I'm going to use the same idea of summing values times weights, and the obvious weights to use are the probabilities: I'll give higher weight to values of X that are more likely and lower weight to values that are unlikely. That's all we're doing. So we just write the sum, over all x, of x times the
probability that X = x, which is the PMF:

E(X) = Σ x P(X = x),

summing over x. That's the easiest way to write it. Strictly speaking we're not summing over all real x: we sum over all x with P(X = x) > 0, so we only ever need to sum over a finite or countably infinite list, never an uncountable one (for those of you who know what uncountable means). For example, if X takes positive integer values, we just sum, over all positive integers, the integer times its probability. So this is completely analogous to the weighted average.

All right, let's do a couple of simple examples to start with, and then some harder ones. The simplest thing we could start with is a Bernoulli. Let X be Bernoulli(p); that means X can only equal 0 or 1, which makes it very easy to deal with. Just write down the definition:

E(X) = 1 · P(X = 1) + 0 · P(X = 0) = P(X = 1) = p.

The second term is 0 anyway, so a Bernoulli(p) has expected value p: a very easy calculation. But even though this is so simple, there's something deeper going on. Let me give you a specific example of a Bernoulli random variable. Suppose we have some event A that we're interested in, and we let X = 1 if A occurs and X = 0 otherwise. That's called an indicator random variable, and we'll be using those a lot: an indicator random variable is 1 if the event occurs and 0 otherwise. By definition that's Bernoulli, because it's either 1 or 0, and what we've just shown applies, since the probability that the indicator equals 1 is just the probability that A occurs.
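These definitional computations are easy to check numerically; here is a small sketch with made-up probabilities:

```python
from fractions import Fraction as F

def expectation(pmf):
    """E(X) = sum over x of x * P(X = x), for a discrete PMF."""
    return sum(x * p for x, p in pmf.items())

# A fair die: E(X) = (1 + ... + 6)/6 = 3.5, the plain average from before
# with equal weights 1/6.
die = {x: F(1, 6) for x in range(1, 7)}
assert expectation(die) == F(7, 2)

# Bernoulli(p): 1 with probability p, 0 otherwise, so E(X) = p.
p = F(3, 10)                       # an arbitrary p for illustration
bern = {1: p, 0: 1 - p}
assert expectation(bern) == p

# Indicator of A = "a fair die roll is even": it is Bernoulli with
# P(X = 1) = P(A) = 1/2, so E(indicator) = P(A).
indicator_pmf = {1: F(3, 6), 0: F(3, 6)}
assert expectation(indicator_pmf) == F(1, 2)
```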
Therefore E(X) = P(A): a very simple-looking equation. It's the same computation as before, with the expected value being p and p being the probability of A. Very simple, but I actually think this equation is pretty fundamental. I call it the fundamental bridge, which is kind of an overly grandiose name for such a simple little equation, but the reason is that it bridges between expected values and probabilities. It says that for any event A, if you want P(A), you can always reinterpret it as the expected value of an indicator. So the bridge runs between E and P. In a sense, we could have started with expected value on the very first day of this class and derived probabilities from expected values; we can go between P and E in either direction. That's very useful.

All right, let's do a harder example than the Bernoulli. The obvious next one to try is the binomial. First I'll write down the definition and work it out, but then I'll show you a much better way to do this problem; it's good to see more than one way. We want the expected value of a Binomial(n, p). Again, by definition, I sum value times probability of value. The binomial takes values k = 0 to n, and we multiply each value k by the PMF, which by now you know:

E(X) = Σ (k = 0 to n) k (n choose k) p^k q^(n − k).

We have to do this sum somehow. Without the k, it's just the binomial theorem, and we know the PMF sums to 1. With the k there, though, the k is kind of in the way, and we have to deal with it somehow. Luckily, k times (n choose k) is one of the stories we talked about when we were doing counting.
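Before doing the algebra, the sum can be checked numerically for an arbitrary, made-up n and p; the derivation that follows says it should collapse to np:

```python
from fractions import Fraction as F
from math import comb

# The binomial expectation, computed directly from the definition
# E(X) = sum over k of k * C(n, k) * p^k * q^(n-k),
# for an arbitrary made-up n and p (exact arithmetic via fractions).
n, p = 9, F(2, 7)
q = 1 - p

e_x = sum(k * comb(n, k) * p**k * q**(n - k) for k in range(n + 1))
assert e_x == n * p    # matches the closed form n*p
```

Changing n and p changes nothing: the sum is identically np, which is what the committee-and-president argument below establishes in general.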
You could also just grind through the algebra with this, but we talked about the fact that

k (n choose k) = n (n − 1 choose k − 1),

because that was the story where we have n people and choose a committee of size k with one person as president: you can either choose the president first and then the rest of the committee, or choose the committee first and then the president from someone on the committee, so the two counts are the same. The reason I want to rewrite it this way is that the factor k depends on k, which is annoying, while n does not depend on k: it's just a constant that comes out of the sum. So take out the n, and let's also take out one of the p's, so that the exponent k − 1 will match the k − 1 in the binomial coefficient. Let's also start the sum at k = 1 rather than 0; that changes nothing, because the k = 0 term is 0 anyway. So we have

E(X) = np Σ (k = 1 to n) (n − 1 choose k − 1) p^(k − 1) q^(n − k).

Now it looks a lot more like the binomial theorem; in fact, it is exactly the binomial theorem after a change of variables. Let j = k − 1, so k = j + 1 and j goes from 0 to n − 1. Then k − 1 becomes j, and the exponent on q becomes n − k = n − (j + 1) = n − 1 − j (I have to be careful with the signs there: since k is j plus 1, both terms end up with minuses):

E(X) = np Σ (j = 0 to n − 1) (n − 1 choose j) p^j q^(n − 1 − j).

That looks right: the exponents j and n − 1 − j add up to n − 1, so the sum is exactly the binomial expansion of (p + q)^(n − 1) = 1. So the whole thing is just np. That's why I didn't want to pull out a q as well, even though we could have: I knew this was going to work out to np. We took out np, and whatever is left equals 1 by the binomial theorem.

Okay, so that's kind of an annoying calculation. Here's a better way, using a key idea: probably, actually not just probably, definitely the single most important property of expectation, which is called linearity. We're going to use linearity over and over and over again; it's the single most useful thing about expectation. Linearity says that

E(X + Y) = E(X) + E(Y),

and this is always true, even if X and Y are dependent. This seems fairly obvious, intuitively, when X and Y are independent; but it seemed surprising, at least to me the first time I saw it, that it's still true even when X and Y are dependent. We'll prove it next time, but for now we can feel free to start using it; I want to show you how useful it is, and then we'll come back to the proof. One other thing about linearity: it's normally stated as two things. One is the equation above; the other is that we can take out constants, E(cX) = cE(X) for a constant c. That one is more obvious, but also useful. The most interesting part is that the expected value of a sum is the sum of the expected values.

All right, now let's redo the binomial, because it was pretty annoying to go through that whole calculation, and I even had a head start: I knew that particular identity would be useful. Without it, you'd be expanding out factorials and messing around.
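Here is linearity at work on a toy example, with the dependence made as extreme as possible (Y forced to equal X):

```python
from fractions import Fraction as F
from itertools import product

# E(X + Y) = E(X) + E(Y) even when X and Y are dependent. Take two fair
# dice where Y is forced to equal X (extreme dependence), versus two
# independent dice: either way, E(X + Y) = 3.5 + 3.5 = 7.
sixth = F(1, 6)

# Dependent case: the joint PMF puts mass only on the diagonal (Y = X).
dep = {(x, x): sixth for x in range(1, 7)}
# Independent case: uniform over all 36 ordered pairs.
ind = {(x, y): sixth * sixth for x, y in product(range(1, 7), repeat=2)}

def e_sum(joint):
    """E(X + Y) computed directly from a joint PMF."""
    return sum((x + y) * prob for (x, y), prob in joint.items())

assert e_sum(dep) == e_sum(ind) == 7
```

The two joint distributions are wildly different, but the expectation of the sum is the same, which is exactly what makes linearity so useful.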
Hopefully you'd eventually get to the same point, but it would be a lot of work. Let's redo it a better way. I didn't leave much space, but we don't need much space, because all we have to do is think about linearity. A Binomial(n, p) can be thought of as a sum of n i.i.d. Bernoulli(p)'s:

X = X_1 + X_2 + ... + X_n, where the X_j are Bernoulli(p).

Each Bernoulli(p) has expected value p, and there are n of them, so by linearity E(X) = np. I don't know about you, but I prefer this method to the previous one; it's a calculation you can do in your head. For the binomial we can actually take these Bernoullis to be independent, but linearity says that even if they were dependent, the answer would still be np. So by linearity this becomes extremely easy.

Let's do another example. Another distribution we talked about last time was the hypergeometric. To take a concrete case again, the example from last time: deal a five-card hand from a deck of cards, and let X be the number of aces. I can phrase it this way, or in terms of the elk, or in terms of marbles; there are many ways to phrase it, but it's the hypergeometric problem, and we want E(X). The hypergeometric has that PMF involving binomial coefficients, that complicated-looking thing, and it would be kind of a pain to grind through it. The hypergeometric looks pretty complicated, so instead let's think in terms of indicator random variables: let X_j be the indicator of the j-th card being an ace. Now, I didn't say the cards are in any particular order, but we can assume there is some order: the cards get dealt to us one at a time, or you're holding the five cards and there's a leftmost one. Implicitly there's an order; it doesn't matter what order they're in, but it helps to think of them as being in some particular order, so it makes sense to talk about the first card, the second card, and so on, and I can concretely define indicators for each of the five cards, j = 1 to 5.

So we can write X as the sum of these indicators; that's just how counting works. If you want to count to 3, you go 1, 2, 3, adding 1 each time; the indicators increment the count for each ace. Then:

E(X) = E(X_1 + ... + X_5)   (indicator random variables)
     = E(X_1) + ... + E(X_5)   (linearity)
     = 5 E(X_1)   (symmetry)
     = 5 P(first card is an ace)   (fundamental bridge)
     = 5 · (4/52) = 5/13.

The symmetry step says there's no reason to think the second card has a different distribution from the fifth card; it's completely symmetric, so the five terms are equal. And E(X_1) equals the probability that the first card is an ace, which is 4/52 = 1/13, so the answer is 5/13. I wrote this out in a lot of detail so you can see exactly which steps are being used, but this is also one you could do in your head: five indicator random variables, each with expectation 1/13, so 5/13. It's completely analogous to the binomial calculation, something you can immediately do in your head. The difference is just that here the X_j are dependent: for example, if the first four cards are all aces, the fifth one can't be an ace.
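The indicator argument can be checked by brute force on a small, made-up analog of the deck (10 cards, 2 of which play the role of aces), plus the exact full-deck arithmetic:

```python
from fractions import Fraction as F
from itertools import combinations

# A small hypergeometric analog of the ace example: a 10-card deck with
# 2 "aces", draw a 3-card hand. Indicators + symmetry + the fundamental
# bridge predict E(# aces) = 3 * (2/10) = 3/5. Check by enumerating all
# C(10, 3) = 120 hands.
deck = range(10)          # cards 0 and 1 play the role of aces
total_aces = 0
num_hands = 0
for hand in combinations(deck, 3):
    total_aces += sum(1 for c in hand if c < 2)   # aces in this hand
    num_hands += 1

assert F(total_aces, num_hands) == F(3, 5)

# The full-deck version works the same way: 5 cards, each an ace with
# probability 4/52 = 1/13, so E(# aces) = 5/13.
assert 5 * F(4, 52) == F(5, 13)
```

Even though the draws are dependent (the hands overlap in their ace counts), the enumeration matches the linearity answer exactly.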
But linearity still applies even when the terms are dependent, so the answer still holds. Using these tools together (indicators, linearity, symmetry, the fundamental bridge) is often a very powerful strategy, and you can solve a lot of problems with it. I did this as one concrete example, but in general the same argument gives you the expected value of any hypergeometric you want. Remember the hypergeometric from last time, the marbles or the elk: it's the same calculation. Even though the trials in the hypergeometric are dependent, this says that for the expected value it still looks as if it were binomial, even though it's not. For other quantities we'll compute later, we'll see differences from the hypergeometric, but the expected value is just n times the probability that an individual draw has whatever property you're counting. Okay, so we'll do more examples with indicators next time, but the last thing for today is one more famous distribution, called the geometric, not to be confused with the hypergeometric: there's very little in common between the geometric and the hypergeometric. So this is our next famous distribution, the geometric distribution with parameter p. As with the binomial, we assume independent Bernoulli trials, for example flipping a coin, or anything where you repeat the same experiment over and over, each trial having the same probability of success. The story of the geometric is that it counts the number of failures before the first success. You have to be a little careful with this, because some books count the success as part of it and some exclude it; I slightly prefer the convention of counting the failures before the success and not including the success itself.
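The general claim, that the hypergeometric's mean is n times the per-draw probability, can be checked directly against the hypergeometric PMF; a small sketch, with a helper name of my own:

```python
from math import comb

def hypergeom_mean(N, K, n):
    """E(X) for X hypergeometric: n draws without replacement from a
    population of N objects, K of which are 'special'; X counts the
    special ones drawn."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    return sum(k * comb(K, k) * comb(N - K, n - k) / comb(N, n)
               for k in range(lo, hi + 1))

# Aces in a 5-card hand: the mean matches n*K/N = 5*4/52 = 5/13,
# as if the draws were independent, even though they are not.
print(hypergeom_mean(52, 4, 5), 5 * 4 / 52)
```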
You just have to be consistent and careful, because depending on where you look you'll see different conventions: here, it's the number of failures before the first success. So let's write down the PMF and then find the expected value. Let X be Geometric(p); the possible values are 0, 1, 2, 3, and so on. To repeat the interpretation: unlike the binomial, where there's a fixed number of trials n, with the geometric you just keep going. If at first you don't succeed, try, try again until eventually you succeed, and then count how many failures you had up to that point. As usual, let q = 1 - p. To draw a little illustrative picture, write F for failure and S for success, and suppose we got five failures and then a success: F F F F F S. In that case X = 5, since I'm counting the number of failures before the first success, and this sequence is the only way X = 5 can happen. The probability of this sequence is q^5 times p: five failures, then a success. So in general the PMF is P(X = k) = q^k p, for k = 0, 1, 2, 3, and so on. Let's check that this is a valid PMF. The terms are clearly nonnegative, so we just have to add them up and see if we get 1. Summing p q^k for k from 0 to infinity, we can take out the p since it's a constant, and the sum of q^k is a geometric series, which sums to 1/(1 - q). Since 1 - q = p, the whole thing is 1, which is what we wanted.
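A quick numerical sketch of the PMF and the validity check, truncating the infinite series at an arbitrary large cutoff of my choosing:

```python
def geom_pmf(k, p):
    """P(X = k) for X ~ Geom(p), counting failures before the first success."""
    q = 1 - p
    return q ** k * p

p = 0.3
# Partial sums of p*q^k approach 1, since sum of q^k = 1/(1-q) = 1/p;
# the tail beyond k=200 is astronomically small for any fixed p > 0.
total = sum(geom_pmf(k, p) for k in range(200))
print(total)
```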
That's why this is called the geometric distribution: the PMF sums to 1 via a geometric series, just as for the binomial distribution the binomial theorem shows the PMF is valid. All right, so now let's do the expected value; again there's more than one way to do this. Let X be Geometric(p) and compute E(X) first from the definition. The definition says to write down the sum of values times probabilities: the sum over k from 0 to infinity of k p q^k. We can take out the constant p and start the sum at k = 1, since the k = 0 term is 0, leaving p times the sum of k q^k. Without the k in front that would just be a geometric series, very easy; with the k in front it's kind of annoying. So here's the strategy on our scratch paper: start with things we know and try to reduce the problem to them. The only thing I know that's related to this is the geometric series, which handles the sum without the k; that's what I just did. Somehow I need to get from q^k to a k in front, and one idea is to take the derivative, because differentiating q^k with respect to q gives k q^(k-1), which has a k in front. So take the derivative of both sides of the geometric series identity. On the left that gives the sum of k q^(k-1), starting the sum at 1 since the k = 0 term is 0 (I exchanged the derivative and the sum, but it's okay to do that here). On the right, the derivative of 1/(1 - q) is -1/(1 - q)^2 times the derivative of (1 - q), and that second factor contributes another minus from the chain rule, so the minuses cancel and we get just 1/(1 - q)^2.
That now looks very similar to what we want; the only problem is that it has q^(k-1) where we want q^k, and that's easy to fix: just multiply both sides by q. So the sum of k q^k equals q/(1 - q)^2, and since 1 - q = p, the denominator is just p^2, giving q/p^2. Then E(X) = p times q/p^2 = q/p. So that's a trick for doing a sum like that: start with something you know, then try derivatives and things like that. All right, we only have two minutes left, but that derivation required some calculus and a clever trick, so I want to show you the story proof method. Let c = E(X); I'm just giving it a more familiar-looking notation, and we're trying to solve for c. Think in terms of flipping a coin with probability p of heads over and over until the coin lands heads for the first time, counting the number of failures. There are two cases; this is very similar to first-step analysis, like the gambler's ruin. Either the first flip is heads, in which case X = 0 because we had no failures, and that case happens with probability p; or the first flip is a failure, which happens with probability q. If it's a failure the first time, we have that one failure, but then notice it's the same problem again: the coin is memoryless, the coin is not out to get you, the coin is not trying to help you, it's just a coin. One failure, then the problem restarts, so we add c again. So c = p times 0 plus q times (1 + c), which is q + qc. Now solve for c: c(1 - q) = q, and 1 - q = p.
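The derivative-trick identity, sum of k q^k = q/p^2, and the resulting E(X) = q/p can both be checked numerically; a minimal sketch, with the truncation point chosen arbitrarily by me:

```python
def truncated_mean(p, terms=2000):
    """Sum k * p * q^k for k below a cutoff: approximates E(X), X ~ Geom(p)."""
    q = 1 - p
    return sum(k * p * q ** k for k in range(terms))

p = 0.25
q = 1 - p
# Derivative trick: sum of k*q^k = q/p^2, so E(X) = p * q/p^2 = q/p = 3 here.
print(truncated_mean(p), q / p)
```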
Solving that equation, we get c = q/p. I like this way more, because we don't need any calculus: we just wrote down what the geometric means in terms of its story. The first way used a little calculus, but either way we get q/p. Okay, so that's all for today; have a good weekend.
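As a postscript, both derivations give E(X) = q/p; a simulation sketch to confirm it matches the story definition, with names, parameters, and seed of my own choosing:

```python
import random

def simulate_geom_mean(p, trials=100_000, seed=0):
    """Average number of failures before the first success, by simulation."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        failures = 0
        while rng.random() >= p:  # probability q = 1-p of a failure
            failures += 1
        total += failures
    return total / trials

p = 0.5
# Story and calculus both predict q/p = 1 here.
print(simulate_geom_mean(p), (1 - p) / p)
```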
Info
Channel: Harvard University
Views: 114,462
Rating: 4.9187226 out of 5
Keywords: harvard, statistics, stat, math, probability, indicator rvs, linearity, symmetry, geometric distribution
Id: LX2q356N2rU
Length: 50min 22sec (3022 seconds)
Published: Mon Apr 29 2013