Lecture 8: Random Variables and Their Distributions | Statistics 110

Captions
Okay, so picking up exactly where we left off last time: we were starting on the binomial and Bernoulli distributions, and random variables in general. I want to review the binomial a little, go further than we got last time, and in parallel discuss more about random variables.

So here's the binomial, just to remind you. The binomial distribution is one of the most famous distributions, and one of the most useful ones in all of statistics. We write it as Bin(n, p) for shorthand. It has two parameters, n and p. That's what they're usually called; you can call them whatever you want, but the default choice is n and p. Those are called parameters: if you change the parameters, you have a different distribution, but it's still called a binomial distribution. So strictly speaking there is not just one binomial distribution, there's a whole family of binomial distributions: n is any positive integer, and p is any real number between 0 and 1. For any such n and p, we have a Bin(n, p) distribution.

What is that distribution? There are actually three ways to think of it, and all three are important as far as we're concerned in this course. The first one I think is the most important, because that's the story, and the story tells us why we care about the binomial. If we didn't have a useful story for it, there would be no point in looking at it. The story, as I mentioned last time (I'm just reviewing), is that we have n independent trials, each trial results in success or failure, and this is the distribution of the number of successes. In notation we write X ~ Bin(n, p); this notation means that X is a random variable that has this distribution. What does it mean to say that it has this distribution? It means that we can interpret it in any of the three ways that we're about to go through. So
the first one, which I just mentioned, is to think of X as the number of successes in n independent Bernoulli(p) trials. That the trials are independent is a crucial fact. Bernoulli(p) trials are what I just described: each one is a success or a failure, where p is the probability of success. You can define success however you want. You can define success to be failure and failure to be success if you want; everything will still work. The key is that every trial results in success or failure, but not both. Other than that, "success" is just a generic word. For example, the most famous example of a binomial is flipping a coin n times. You could define success to be the coin landing heads and failure to be the coin landing tails, or define it the other way around, but either way this is counting successes, however you define success. A lot of times you'll see binomials explained in terms of coin flips because it's easy to talk about. We'll see many examples later in the course, but I think you can already see that this is a very general setting: you have n independent trials, you count the number of successes, and you can define success to be whatever you want. So this is a very general, useful distribution.

The second way to think of a binomial is in terms of what are called indicator random variables: a sum of indicator random variables. I didn't write this out last time, but it's immediate from the story: we can think of X as X_1 + X_2 + ... + X_n, where X_j is 1 if the jth trial is a success and 0 otherwise. That's called an indicator random variable, and we'll be using them a lot. It's called an indicator because it just indicates whether the jth trial was a success or a failure: 1 indicates success, 0 indicates failure, right?
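The indicator view can be tried out with a small simulation. This is only a sketch, not something from the lecture: the function names `bernoulli_indicator` and `binomial_draw`, the seed, and the values n = 10 and p = 0.3 are all illustrative assumptions.

```python
import random

def bernoulli_indicator(p, rng):
    """Return 1 (success) with probability p, else 0 (failure)."""
    return 1 if rng.random() < p else 0

def binomial_draw(n, p, rng):
    """One draw of X = X_1 + ... + X_n with independent Bernoulli(p) indicators."""
    return sum(bernoulli_indicator(p, rng) for _ in range(n))

rng = random.Random(110)  # fixed seed so the sketch is reproducible
n, p = 10, 0.3            # illustrative values (assumption)
draws = [binomial_draw(n, p, rng) for _ in range(20000)]

# Empirical frequency of each possible number of successes, 0 through n.
freq = {k: draws.count(k) / len(draws) for k in range(n + 1)}
print(freq)
```

Every draw is a count between 0 and n, obtained by adding 1 for each success and 0 for each failure, which is exactly the counting described above.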
So it's just a very simple encoding: success is 1, failure is 0. If you think about it, what I wrote here as an equation is exactly the same thing as what I wrote in words, because it just says add 1 every time there's a success and add 0 if there's a failure. That's just how we count, right? If I want to count to five, I go one, two, three, four, five: I added 1 five times. So this is very simple; I'm just doing 1 + 1 + 1, once for each success. But it can actually be subtle, and it's very useful to think of it this way, because what we've done is break down a somewhat complicated distribution into a sum of very simple things that are just 0 or 1.

And the X_j's are independent. I'll introduce one more acronym now: X_1 through X_n are i.i.d. Bernoulli(p). The acronym i.i.d. is used a lot in statistics, so we may as well define it now: i.i.d. means independent and identically distributed. We assumed that the trials are independent, and these are the indicators of success for each trial, so they should be independent; that's what "independent" means here. "Identically distributed" means all of these X's have the same distribution; in other words, they're all Bernoulli(p). Remember from last time that Bernoulli(p) just means 1 with probability p and 0 with probability 1 - p.

A very common confusion is to confuse random variables with distributions. Mathematically, a random variable is a function, like we defined last time, but intuitively X_1 is just 1 if the first trial is a success and 0 otherwise, so it depends on the first trial. The distribution says what the probabilities are that X will behave in different ways. So you can have lots and lots of random variables that all have the same distribution, because the distribution is
saying what's the probability that it will do this, what's the probability that it will do that. But they're not the same random variable: they're independent, but they all have the same distribution, Bernoulli(p). So that's what i.i.d. means. All this is saying is that we decompose the number of successes as: add 1 every time you have a success.

Then the third way, which we also mentioned briefly last time, is to write down the PMF. I'll talk more about PMFs today, but I introduced it briefly last time: that's the probability mass function, and all it says is what the probability is that X takes on any particular value. Last time we showed that for the binomial, P(X = k) = (n choose k) p^k q^(n-k), where q = 1 - p. If we have n trials with exactly k successes, then p^k q^(n-k) is the probability for one specific way to do that, and (n choose k) is the number of ways to choose where the successes are. So we can immediately derive the PMF.

But let me go over here and talk a little more generally: what's a PMF, what's a distribution? We usually abbreviate random variable to r.v.; because we use them so much, it's nice to have a shorthand. Just to remind you from last time: we have our sample space S, and if we think of it in the pebble world interpretation, in the case where there are finitely many outcomes, there are some pebbles. I draw them as open circles in case I feel like writing numbers inside them. This is our sample space; the pebbles are the different possible outcomes. A random variable is a function that assigns a number to each pebble. I'm just making up some numbers: 7, 7, 5, 5, 3, 3, 3, but use whatever numbers you want. I lined them up so that within each column they have the same number; it doesn't have to be that way at all.
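The pebble picture can be sketched in Python. This is only an illustration: the pebble names and the assumption that all seven pebbles are equally likely are made up here (the lecture only assigns the numbers 7, 7, 5, 5, 3, 3, 3), and `event` and `prob` are hypothetical helper names.

```python
from fractions import Fraction

# Sample space S: seven pebbles. The names are arbitrary labels (assumption).
S = ["a", "b", "c", "d", "e", "f", "g"]

# A random variable is a function from pebbles to numbers.
X = {"a": 7, "b": 7, "c": 5, "d": 5, "e": 3, "f": 3, "g": 3}

def event(value):
    """The set of pebbles that X maps to the given value."""
    return {s for s in S if X[s] == value}

def prob(subset):
    """Naive probability, assuming every pebble is equally likely."""
    return Fraction(len(subset), len(S))

print(event(7))        # the two pebbles labeled 7
print(prob(event(7)))  # 2/7 under the equal-likelihood assumption

# PMF of X: the probability of each possible value.
pmf = {x: prob(event(x)) for x in sorted(set(X.values()))}
print(pmf)
```

The dictionary `X` is the random variable itself, while `pmf` is its distribution. A different labeling of the pebbles could give a different function with the same PMF, which is the distinction between a random variable and its distribution.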
I drew a very simple example because that's something I can draw easily on the board. The sample space could be an incredibly complicated, possibly infinite space that you could never draw; it could be very high dimensional, infinite, whatever. But the picture is that each pebble gets assigned a number. So we're starting with this abstract space of possible outcomes, and then we're assigning a numerical value to each one. That's what a random variable means.

So if we talk about X = 7, for example, where X is a random variable, we have to think about what that means. That's an event. Remember, an event is a subset of the sample space, so in my picture that event consists of the two pebbles that I labeled 7. I don't know if the numbers are easy to see, but I labeled these two 7, these are 5s, and these are 3s; you can put whatever numbers you want. X = 7: what does that mean? It's not an equation that you solve in some way. It looks strange the first time you see it, because X is a function and 7 is a number. We're not saying try to solve that equation, and we're not saying X is a constant function. What we're saying is that this is just notation for an event. What event is it? It's the event that X equals 7, which is those two pebbles in this picture.

So that's an event, and therefore it makes sense to write down things like the following; we're going to define something called a CDF. X = 7, or X = x for any little x, is an event. X ≤ x is an event, so it makes sense to talk about its probability. If we let F(x) = P(X ≤ x), then this function capital F is called the CDF of X. That stands for cumulative distribution function; that's too many letters to write, so that's why we call it
the CDF. Let me explain what this function means and why it's important. I'm letting X be any random variable right now, so the sample space could be infinite, could be much more complicated than this. You can use that picture for intuition, but we could have some incredibly complicated sample space. Whatever the outcome of the experiment is, the random variable assigns it some numerical value. So X ≤ x is an event. Before you do the experiment, you don't know what X is. After you do the experiment, maybe you observe that X happened to equal 7, and if little x happened to equal 9, then we would say this event occurred, because 7 is less than 9. That's all it means. So the CDF is the probability of this event, as a function of little x, and that's one way to describe the distribution. It's not the only way; there are other ways to describe a distribution. But this is one way that, in principle, determines all possible probabilities about X. If later we wanted to know the probability that X is between 1 and 3, or between 5 and 9 (we'll do some examples like that next time), the idea is that as long as we know this function capital F, we could answer questions like that. What's the probability that X does this, what's the probability that X does that: all of those questions can be answered in terms of F. So the CDF is a way to describe the distribution, because it tells us the probabilities of the different possible values of X.

Let's talk more about the PMF, the probability mass function. This is only for discrete random variables, so I have to tell you what the difference is between a discrete random variable and a continuous random variable. Discrete, for our purposes, we can usually think of as meaning that it takes integer values. A binomial is discrete because the
possible values are 0, 1, 2, 3, up to n: integers. But in general, the possible values don't actually have to be integers; they have to be something you could list, maybe a finite list, maybe an infinite list: a_1, a_2, a_3, and so on. This list might end with a_n, or it could go on forever; I'll write both of those cases: a_1, a_2, ..., a_n, or a_1, a_2, and so on forever. So you can list out the possible values of the random variable. A continuous random variable (there's a little more to it than this) would be the case where X could take on any real number, or any real number in some interval, that kind of thing. There are random variables that are neither discrete nor continuous; you can have a hybrid of discrete and continuous. But if we understand discrete and continuous, then we can handle those hybrids as well. So we'll mainly be looking at discrete and continuous random variables. We'll start out mostly doing discrete, and later in the semester we'll be doing more continuous, but we'll still be using discrete as well.

So that's what a discrete random variable is, and once we have that, we can say what the PMF is. The PMF is just P(X = a_j), for all j. For example, if a_1 through a_n are just the integers 1 to n, then to say what the PMF is, we have to say what the probability is that X equals 1, what the probability is that X equals 2, and so on. We have to specify all of those probabilities. So clearly that has to satisfy two things. Let's call this p_j, so p_j is the probability that X equals a_j. That's why last time I called this a blueprint for X: it says what the probabilities are that X will take on certain values. That's describing the
randomness of X. So what does p_j have to satisfy? Of course p_j has to be greater than or equal to 0, because it's a probability. The other condition we need is that the sum over all j of p_j equals 1. If the sum were greater than 1, that wouldn't make sense; if the sum were less than 1, it would mean we haven't listed all the values. X has to do something, and I'm assuming that this is a complete list of the possibilities, so this just says X has to equal something, so the sum has to equal 1. These are the two conditions you need for a PMF. And if you want to go the other way around, you could pick any numbers p_j satisfying this and this, and that would define a valid PMF. So these are the conditions for when a PMF is actually valid.

For discrete random variables, it's usually much easier to use the PMF than the CDF. The reason we need the CDF is that it's more general: that definition works for any random variable, while the PMF only helps us in the discrete case. Right now we're focusing on the discrete case, so we'll mostly be doing PMFs. If you had a problem where I said "find the distribution", what that means is: either give the CDF or, if it's discrete, give the PMF. Usually the PMF is going to be easier, but those are equally valid ways to describe the distribution.

Coming back to the binomial, let's check that what I wrote down is a valid PMF. I should have written that this is for k an integer between 0 and n, and the PMF is 0 otherwise, because those are the only possible values. So is this valid? First of all, it's greater than or equal to 0; that's obvious. So the only thing we have to check is that it adds up to 1. If we add up the sum from k = 0 to n of (n choose k) p^k q^(n-k), how do we know that's 1? What does this sum remind you of? Hopefully it reminds you of the binomial theorem; this looks exactly like the binomial
theorem. By the binomial theorem, that's just (p + q)^n, but p + q is 1 because q is 1 - p, so that's 1^n = 1. So by the binomial theorem, the binomial PMF sums to 1. That's why it's called the binomial distribution: it's connected to the binomial theorem in this way. So that was easy to check. If somehow this sum were not equal to 1, there would be something seriously messed up, because X has to equal something, so the PMF has to add up to 1. That would mean the formula is wrong. But it's a helpful check, and it's comforting that it adds up to 1. So that's the binomial.

Now let me come back to the thing I did at the end last time, with the sum of two binomials, and let's see why it's true from all three of these perspectives. The first one I already did last time, but I'll remind you; it's very quick. X is Bin(n, p), Y is Bin(m, p), and they're independent, and we want to show that the sum X + Y is Bin(n + m, p). That's what we did at the very end last time, but I want to show you different ways of seeing it.

Before we say more, I should make sure everyone is clear on what X + Y actually means. Mathematically speaking, we're adding two functions. To add two functions, they have to have the same domain, and in this case the domain of both of them is pebble world. If you have two functions with the same domain and you want to add them, what do you do? You compute both functions and add the values, and that's your new function. So the sum of two functions is defined as: compute both functions and add them. Therefore this makes perfect sense. As long as all our random variables are on the same sample
space S, it makes perfect sense to add them, multiply them, square them, cube them, whatever we want. We could take e to the power of this thing cubed if we want, for whatever reason; that's a random variable. How would you compute that random variable? Before you do the experiment, you don't know what it's going to be. After you do the experiment, X and Y take on certain values, and then you just compute the function, and so you have a new random variable. That's all we're doing.

Intuitively, X is the number of successes in n trials and Y is the number of successes in m trials, and these are separate sets of trials, because I said they're independent. So this could be: flip the coin n times, and then flip the coin m additional times, so we have a total of n + m coin flips, or trials. Notice it's the same p for both. This will not work if this one is 1/2 and this one is 1/3. But I assumed p is the same for both, so we have n + m trials, each with probability of success p. What's the number of successes? I just add up this number of successes plus this number of successes; that's the total number of successes. So X + Y is Bin(n + m, p). That's what I did at the end of last time. It's immediate from the story; we don't have to write any algebra or anything like that.

But I think it's helpful to also see how this works from the second perspective and from the third perspective, which we didn't do last time. From the second point of view, let's write X as X_1 + ... + X_n and Y as Y_1 + ... + Y_m, where all of the X_j's and Y_j's are independent Bernoulli(p) random variables. Then X + Y is the sum of the X's plus the sum of the Y's; that's all. I didn't do very much with that, I just wrote this plus this. Now all we have to do is recognize what form this is in: this is just
a sum of n + m independent Bernoulli(p)'s, right? So that's the sum of n + m i.i.d. (independent, identically distributed) Bernoulli(p)'s. According to the second way, if we take a sum of n i.i.d. Bernoulli(p)'s, that's Bin(n, p), and here there are n + m of them, so therefore we have Bin(n + m, p). So this method is also easy, because we're just adding up i.i.d. Bernoullis with the same p, so it's binomial. There just isn't anything more to it than that, because they're all independent. It would get more complicated if they were not independent.

Now for the third way we actually have to do a calculation, but it's a useful calculation to see. The third way is to use the PMF. I want to show that X + Y is binomial by computing its PMF, so what I need to do is compute P(X + Y = k). If this is of the same binomial form, then we'll say okay, it's a binomial, and if not, then something's wrong somewhere, because we'd have a contradiction. So how do we do that? With X + Y, it doesn't seem obvious how to deal with it unless you use the first way or the second way; both of those make it easy. But if you're not thinking in those terms, it just sounds like we've added two random variables, and that sounds like it could be something complicated. In fact, in statistics this is called a convolution, and I don't think it's a coincidence that "convolution" and "convoluted" sound very similar. So we have to do this convolution. We've never studied convolutions in this class; we will come back to convolutions later in the semester, but at this point it's just: convolution, what's that?

Well, it's the same strategy, though: wishful thinking. What do we wish that we knew? I think this would be easier if I knew the value of X. If you want, you can instead assume you know the value of Y; either way. So let's condition on X; you can condition
on Y if you like that more. Let's condition on X, because this will be a lot easier if we know the value of X. That suggests using the law of total probability, conditioning on X. So we know that P(X + Y = k) is the sum over j of P(X + Y = k | X = j) P(X = j). Here j goes from 0 to n, but actually let's just sum up to k, because if the sum is equal to k, there's no way that X on its own could be greater than k: you're adding two nonnegative things. So we can sum up to k.

Now let's just compute this. This is the sum from j = 0 to k of the following. X + Y = k given X = j: that's useful information, so we can plug in X = j and rewrite that in terms of Y. It says Y = k - j. Notice it's still given X = j, so we have P(Y = k - j | X = j), and we'll have to deal with the conditioning. That's times P(X = j), which is immediate because X is binomial, so I'm just going to write down the binomial PMF: it's (n choose j) p^j q^(n-j).

Now a key fact here is that X and Y are independent. I haven't written out the formal definition of independence for random variables yet, but if you understand independence for events, then you understand it for random variables. Independent means that if we know X, it gives us no information whatsoever about Y. So if we know that X = j, that tells us nothing about Y. So independence means we can just cross out the conditioning: by independence, it's just P(Y = k - j), because the definition of independence is that conditioning on X gives us no information about Y.

So now this thing in front is just the binomial PMF again, with Y = k - j. There were m trials for Y, so that's (m choose k - j) p^(k-j) q^(m-k+j), times the stuff we already had there, (n choose j) p^j
q^(n-j). Well, that looks ugly, but we can simplify it at least a little bit. We can collect the powers of p: that's p^k. p^k does not depend on j, so we can take it out of the sum; it's a constant. Collect the q's: the exponents are m - k + j and n - j, so we have m + n - k. The j's cancel, so that's also a constant that comes out: q^(m+n-k). Whatever is left over is the sum from j = 0 to k of (m choose k - j)(n choose j).

That looks pretty ugly, but does this sum look familiar to anyone? Yeah, Vandermonde, very good. This is what we called the Vandermonde identity. You don't have to memorize that it's called Vandermonde, but when we were doing story proofs, this is exactly one that we looked at. It looks like a complicated sum, but using a story it's actually easy to evaluate. The Vandermonde identity says that this sum equals (m + n choose k). (Sorry, I'm writing right to left here.) We didn't do it last time, but we did it a while ago, using a story proof. So that's (m + n choose k) by Vandermonde, and that means we have (m + n choose k) p^k q^(m+n-k), which looks exactly like the Bin(n + m, p) PMF. So that's true.

Obviously this was a much more complicated and difficult way to do it. Especially if we didn't know or didn't remember how to do this sum, we'd be stuck at this point. Luckily we already did Vandermonde earlier, so I can just quote that result. But even with that, it was still a lot more work, and without it we would just be left with this hideous sum. But it still worked; otherwise we would have a contradiction. Another point of view on what we just did is that we actually just proved Vandermonde again, because if
this were not equal, we'd have a contradiction. Therefore this sum has to equal (m + n choose k); otherwise you have a contradiction. So that's our second proof of Vandermonde's identity.

All right, so that's the binomial distribution; we'll be seeing a lot more of it. I want to contrast that with a common mistake, which is thinking that things are binomial when they're not. So I want to give a simple example to show that you should be careful about that. The key assumption (sorry, these boards are so squeaky) is that the trials are independent and they all have the same probability of success. If the probabilities of success are different, we can't say it's binomial, and if the trials are not independent, we can't say it's binomial. So let's do an example that's not a binomial, yet a common mistake would be to somehow think that it is.

Here's a simple example to think about, with cards. Suppose we have a random five-card hand from a standard 52-card deck, and we want to find the distribution of the number of aces in the hand. So we pick a random subset of five cards out of 52, with all subsets of size five equally likely. There's some number of aces in that hand, possibly zero, and we want to know its distribution. As I said before, when we say "find the distribution", we could find the CDF, but it's going to be easier to work with the PMF. This is certainly discrete: the number of aces is either 0, 1, 2, 3, or 4. I'll say "PMF or CDF" because finding the CDF would be equally valid; it would just be more complicated. So let's just do the PMF.

How do we do that? Let X be the number of aces; that's our random variable. On the one hand, this kind of notation is very intuitive, right? I'm
just saying let's look at the number of aces, so it's very intuitive. On the other hand, students sometimes struggle with how to interpret this as a function. Most students are used to writing something like f(x) = x^3; that's a function. So it's worth thinking through why this is actually a function. It's a function from the sample space to the integers between 0 and 4. It's a lot easier to describe it in words this way than to write out some other type of equation for it, but it is still a function.

So let's find this PMF. We need to find P(X = k). First of all, this is obviously 0 unless k is 0, 1, 2, 3, or 4. You're not going to observe two and a half aces, or five aces if it's a standard deck; it's not possible. So those are the only possibilities. For a lot of these problems, it's helpful to start by listing out or describing the possible values, because a common mistake in probability is to write down some PMF that either doesn't sum to 1 or involves impossible values, things like that. So those are the possible values; that's obvious, since there are four aces in the deck.

We can actually immediately conclude that the distribution is not binomial. We can think of each card as being a trial, but those trials are not independent: if the first card is an ace, it's less likely that the second card is an ace, and the more aces you have in the earlier cards you're dealt, the less likely you are to get more. In an extreme case, if the first four cards you get dealt are aces, then the fifth card is definitely not an ace. So the trials are not independent, and it's not binomial.

But let's find the PMF directly, just by thinking about it. What's
the probability that X equals k? Not because we memorized anything, but just by thinking about it. We can go back to the naive definition of probability, because I said all five-card hands are equally likely. There are (52 choose 5) possible hands, equally likely, so we're using the naive definition. Now we want the probability that the number of aces is equal to k. There are four aces in the deck, and we need to choose k of those aces. There are 48 non-aces, and if we have k aces in five cards, we have 5 - k non-aces to choose. So P(X = k) = (4 choose k)(48 choose 5 - k) / (52 choose 5), and that's for k between 0 and 4. That's just the multiplication rule and the naive definition. It has a neat pattern, which is that 4 + 48 is 52 and k + (5 - k) is 5. It's kind of nice looking.

All right, so that's the answer, but that's not the end of the problem, because I want to check whether this actually makes sense and see how it relates to other stuff. First of all, does this probability remind anyone of anything we've seen before? Yeah, it reminds you of Vandermonde. Yep, it's very reminiscent of Vandermonde: this kind of looks like this, and this kind of looks like that. Okay, we'll come back to that point. Anything else it reminds you of, from the homework? Yeah, the elk problem. This is like the elk problem. Why do you say it's like the elk problem? Yeah, right. Okay, that's a great observation, that this looks exactly like the elk problem. It's not just some coincidence, as if you had somehow memorized the answer to the elk problem and saw that this kind of looks like it. You saw that there's actually a connection, which is that you have these two groups. I'm not expecting you to memorize the answer to the elk problem, but you should remember the story of the elk problem, which was that there's a population of elk, some
of them are tagged, some of them are untagged. You collect a sample, and you want to know, what's the probability that that sample has exactly k tagged elk? That was the problem. OK, you don't have to memorize the answer to that, but the problem is a useful one to think about. This is exactly the same. Instead of tagging elk, we're tagging cards as aces. Now what does it mean to tag a card as an ace? Well, it means it has an ace written on it. The cards are tagged already, right? Four of the cards are tagged as aces, the other 48 are not tagged as aces. So we have 4 tagged cards and 48 untagged cards, where tagged means an ace. OK, so it's the exact same thing. It's not just like the elk problem, it's the same as the elk problem. So that's a key thing that you should be thinking about in this course: when we do one problem, trying to see how it relates to other problems. It's exactly the same.

OK, now coming back to the comment that this reminds you of Vandermonde, let's actually write this out in a more general version. Suppose that we have, you know, a jar full of marbles, and let's say b of them are black and w of them are white. You pick a simple random sample of size, let's say, n. Simple random sample means that all subsets of that size are equally likely. So then the question is, what's the distribution of the number of white marbles in the sample? Notice again that's exactly the same as the elk problem and the ace problem: instead of thinking of tagged and untagged, we're thinking of white and black, but it's the same problem. OK, so we can immediately write down the answer. Let's call this X; that's our random variable. So we want to find the probability that X equals some number k, and we can immediately write down the answer because it's the same thing: we have a sample
that we draw: we have b plus w marbles, let's say w + b marbles, of which we choose n. And now how is it possible that exactly k are white? Well, we must select exactly k that are white, so that means we need to choose k out of the white marbles. And however many are left: if there are k white and we have a sample of size n, there must be n minus k black marbles, so that's b choose n minus k here. [On the board: P(X = k) = (w choose k)(b choose n - k) / (w + b choose n).]

And just for emphasis, let's write down where this is nonzero, what the constraints are here. First of all, it must be true that 0 is less than or equal to k, which is less than or equal to w, because we couldn't possibly have more white marbles than there exist white marbles. Similarly, we must have 0 less than or equal to n minus k, which is less than or equal to b. But remember, we did take the convention that if k is greater than w we take w choose k to be 0, so this is just for emphasis.

OK, this distribution is called the hypergeometric. So if we say hypergeometric distribution, you should immediately think back to the elk problem, and you should be thinking of this. The hypergeometric distribution is defined by this story, or equivalently by the elk story. That's the name of the distribution, and this is its PMF. But if you just memorize the PMF, that's not going to help you recognize when you should apply this distribution. So the key thing is to understand the story of the hypergeometric, which we now have three versions of, and we'll see more later.

Now, that's not a binomial. I was imagining grabbing the marbles all at once, but if you want you can imagine picking one at a time; the key is that it's without replacement, right? You're sampling without replacement here. If you picked a marble and put it back, and picked and put it back, and then you wanted the distribution of the number of white marbles, that would be binomial, because if you replace it each time then you just reset things and it's independent. But if you're sampling without
replacement, then the trials are not independent, so we do not get a binomial. So that's a key distinction between hypergeometric and binomial. And already from that we have an intuitive connection between the hypergeometric and the binomial: binomial, as I said, if you put the marble back each time, pick one, put it back, pick one, put it back, that will be binomial for the distribution of the number of white marbles; hypergeometric if you don't do replacement. That tells us something important. Suppose that the number of marbles is like a billion, and suppose that our sample is very small compared to a billion, let's say ten. Now if we're picking ten marbles out of a billion, it's extremely unlikely that we would pick the same marble more than once, so it must be that sampling with replacement and without replacement behave very similarly there. Mathematically we can derive those kinds of things, but intuitively, under conditions like that, where with replacement and without replacement don't differ much, the hypergeometric should be approximately binomial.

All right, now there's one other thing we need to do with the hypergeometric, which is to check that this is a valid PMF. So first of all, this is nonnegative. Secondly, we need to show that this sums to 1. So if we sum this up over all possible k, let's say we sum from k equals 0 to w of w choose k times b choose n minus k, divided by w plus b choose n. The w plus b choose n is a constant, so that just comes out in the denominator: I'll write 1 over it out front, so now I have a denominator to stick it in. Now what's left is this sum here. Well, this sum should look very familiar; that's Vandermonde again. And in fact, by Vandermonde, this is w
plus b choose n, which is what we took out. Therefore we immediately get 1, again by Vandermonde. So on the one hand, that's consistent; on the other hand, we can think of that as a proof of Vandermonde, because if that didn't work then we would have a contradiction. Therefore we proved Vandermonde for the third time, right? So that's the hypergeometric distribution.

OK, one last thing, just to talk a little bit more about CDFs. I just want to draw a picture of what a CDF might look like. So the CDF, remember, is the probability that capital X is less than or equal to little x. I'm just going to draw a continuous one and a discrete one, and we'll talk more about these next time, but just to have a picture in mind. A continuous one might look like this. Notice that if little x is like, you know, negative a billion, if you let little x be more and more negative, it gets less and less likely that capital X is less than or equal to it, right? So it's going to approach 0 this way. So imagine you have a function where here's 1, here's 0, and the function is increasing, because as you increase little x it's more and more likely that this event occurs. This is just an example, but it helps to have a picture in mind of what a CDF looks like. So I'm drawing something that is continuous, and it approaches 1 as you go this way, it approaches 0 if you go this way, and it's increasing like that. OK, so we'll see a lot more like that later.

But just to have a quick picture of a discrete one: it's going to have jumps. For example, let's assume X takes possible values 0, 1, 2, or 3, or something like that. Well then at 0 it's going to jump like that, and then, here's 1, 2, 3, at 1 it's going to jump again, so maybe it jumps to, say, there, and it's going to be flat, and then it's going to jump, and then it's going to jump. I'll
say it jumped like that, and then it stays at 1 forever. And I'm drawing open circles there because it takes the higher value at each jump, because we define this as less than or equal. So this is 1. Again, this would be a CDF of a random variable that has values 0, 1, 2, 3, and... oh, let's see, it already jumped to 1 at 2, so for this one I would actually need one more jump. Anyway, this one is actually 0, 1, or 2, and then the probability is 1 that it's less than or equal to 2. So in the discrete case you have this complicated jumpy function, and it's easier to use the PMF; in the continuous case it's often useful to use the CDF. All right, so that's all for today.
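
[Editor's sketch, not part of the lecture.] The hypergeometric story above can be checked numerically. The Python below is a minimal illustration under stated assumptions: the function names (hypergeom_pmf, hypergeom_cdf, binom_pmf) and the 40 percent white-marble proportion in the billion-marble example are made up for this sketch and do not come from the lecture. It computes the PMF for the aces example, confirms that it sums to 1 (the Vandermonde check), evaluates the step-function CDF, and compares the hypergeometric with the binomial when the sample is tiny relative to the population.

```python
import math
from fractions import Fraction


def hypergeom_pmf(k, w, b, n):
    """P(X = k), where X is the number of white marbles in a sample of size n
    drawn without replacement from w white and b black marbles.
    Assumes 0 <= n <= w + b (otherwise the denominator would be 0)."""
    if k < 0 or k > n:
        return Fraction(0)
    # math.comb(w, k) is 0 when k > w, matching the lecture's convention.
    return Fraction(math.comb(w, k) * math.comb(b, n - k), math.comb(w + b, n))


def hypergeom_cdf(x, w, b, n):
    """P(X <= x): a step function that jumps at each possible value of X."""
    top = min(math.floor(x), n)
    return sum((hypergeom_pmf(k, w, b, n) for k in range(top + 1)), Fraction(0))


def binom_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p,
    i.e. sampling with replacement."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Aces in a five-card hand: 4 "tagged" cards, 48 "untagged", sample of 5.
ace_pmf = [hypergeom_pmf(k, 4, 48, 5) for k in range(6)]
ace_total = sum(ace_pmf)  # exact arithmetic, so this can be compared with 1

# A billion marbles (hypothetical split: 40% white), sample of size 10.
big_w, big_b, sample = 400_000_000, 600_000_000, 10
max_gap = max(
    abs(float(hypergeom_pmf(k, big_w, big_b, sample)) - binom_pmf(k, sample, 0.4))
    for k in range(sample + 1)
)

print("PMF sums to:", ace_total)
print("P(at most 1 ace):", float(hypergeom_cdf(1, 4, 48, 5)))
print("largest hypergeometric vs binomial gap:", max_gap)
```

Exact fractions are used so that the sum-to-1 check is an equality rather than a floating-point approximation; the binomial comparison uses floats because it is only meant to show that the two distributions are close.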
Info
Channel: Harvard University
Views: 141,271
Rating: 4.8981233 out of 5
Keywords: harvard, statistics, stat, math, probability, variables, Hypergeometric distribution, probability mass functions
Id: k2BB0p8byGA
Length: 50min 24sec (3024 seconds)
Published: Mon Apr 29 2013