Binomial Vs Hypergeometric

Captions
Hi. In this video I want to talk about the difference between binomial random variables and hypergeometric random variables. In short, if you just came for a quick answer: with hypergeometric random variables the outcomes are not independent, because you sample without replacement. With binomial random variables the outcomes are independent, so you have sampling with replacement. I'm going to do an example of each of those scenarios, and I'm also going to talk about the binomial approximation to the hypergeometric and why that makes sense.

First, let me zoom in and look at the criteria for a binomial experiment. In a binomial experiment, very importantly, you have a set number of trials. In this case we have n trials, and each trial is independent and identical. The next criterion is that there must be two possible outcomes: every trial is either a success or a failure. Next, the probability of success is the same for each trial, and our random variable is the count of the number of successes.

This is the binomial probability distribution, or probability mass function. Sometimes you'll see it written as P(X = x), sometimes as f(x), and sometimes as b(x; n, p). They're all the same thing: the binomial probability mass function.

Let's do an example. The classic example for the binomial distribution is flipping a coin. We're going to flip it five times. Since we flip it five times, there are five trials, so n = 5. Now, are these trials independent? When I flip it, what happens on one
flip has no influence on what happens on the next flip, so yes, they're independent trials. They are identical as well, because it's the same coin, so I should have identical probabilities of the outcomes each time. So the first criterion of the binomial experiment is met.

We're going to count the number of heads. There are two possible outcomes, either a head or a tail, so that meets the criterion of having two possible outcomes. Our random variable X is the count of the number of successes, so let me define X as the count of the number of heads. In this case a head is what is called a success.

Next, we're going to find the probability of getting four heads in five flips. One thing we're missing here: we need n, x, and p. Remember what p is: it's the probability of success, and it needs to be the same for each trial. Since the trials are identical, it will be. So what's the probability of success in this case? It's the probability of getting a head, which is 0.5, because head and tail are equally likely.

Now let's write down our probability distribution function: n choose x, times p to the x, times q to the n minus x. So n is 5. Oops, what am I trying to find the probability of? I got ahead of myself. We're trying to find the probability that X = 4, the probability of getting four heads in five flips. So n is 5, x is 4, and p, the probability of success, is 0.5, raised to the power of x, which is 4. What's q? q is the probability of failing, which in this case means the probability of getting a tail. Well, what's the
probability of getting a tail? That's 0.5, so in this case q is also 0.5, and n minus x is 5 minus 4. Go ahead and solve that out and you find the probability of getting four heads out of five flips.

That's a very simple example of the binomial distribution, but the most important things you saw here are that we defined the number of trials ahead of time, that there were two possible outcomes, that each trial was independent of the previous trial, and that the probability of a success was the same every time we flipped the coin, because we used the same coin.

Let's move on to the hypergeometric distribution. Here we still have n, the same little n that you saw in the binomial distribution, except now n is the number of items selected without replacement from a population of capital N items. Before, n was the number of trials, and there was no capital N in the binomial distribution. Now there is a capital N.

Each item can be either a success or a failure. That's exactly the same as the binomial distribution, where we had two possible outcomes, and it's the main thing that ties the binomial and the hypergeometric together. X is the count of the number of successes in the n selected items, which is also very similar to the binomial distribution. So the random variable is very similar. The main difference is this top piece: the n items are selected without replacement, so the trials are not independent of each other. They depend on each other because we do not replace.

And then this is our probability distribution function. Sometimes you'll see it written as f(x) again, or as an h with x, little n, capital N, and K in it. So let's do an example.
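The formulas shown on screen are not captured in the captions. Written out with the symbols the video uses (n trials or draws, x successes, p and q = 1 - p, population size N, K successes in the population), they would presumably read as below. The argument order in h(x; N, n, K) is an assumption, and the last line evaluates the coin example from earlier.

```latex
\[ b(x; n, p) = \binom{n}{x}\, p^{x} q^{\,n-x}, \qquad q = 1 - p \]
\[ h(x; N, n, K) = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}} \]
\[ P(X = 4) = \binom{5}{4} (0.5)^{4} (0.5)^{1} = \frac{5}{32} \approx 0.156 \]
```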
We'll talk about how to use this guy. The most classic example for the hypergeometric distribution is a deck of cards. I have a deck of cards here, so this is my population. I'm going to select from this deck without replacement, which tends to be the way we select from decks. If I'm playing a card game and I'm handing you cards, I give you 1, 2, 3, 4, 5, and that's your hand. That's how we usually play cards: we select from the deck without replacement, unless you're doing some sort of magic trick. Without replacement means I give a card to you and I don't put it back before I give you the next one. That dependent relationship is what makes this an example of a hypergeometric distribution rather than a binomial: the fact that I am not putting the card back in the deck before I give you another one.

So we have 10 cards that we're going to select from the deck without replacement, and we want to find the probability of selecting three kings. Just like in the binomial distribution, we have a success and a failure: either you get a king or you don't. The main difference is that when I give you one card after another, I'm randomly selecting cards from the deck but not putting the previous card back before I give you the next one. If I were putting each card back into the deck and then giving you another one, I'd be back to the binomial, because the probability of getting a king would be the same each time I randomly pick a card and put it back. Sampling with replacement would make this a binomial distribution. But because I'm
handing you the cards and not putting them back in the deck, finding the probability of getting three kings is a hypergeometric problem.

So this is a hypergeometric distribution. I am selecting little n = 10 cards without replacement (very important: without replacement) from a population of N = 52 cards. Each item is a success or a failure, either a king or not a king, and X is the count of the number of kings in the ten cards. X could be zero kings, one king, two kings, three kings, or up to four kings. But I can't get ten kings. Why not? There are only four kings in the deck, so the most X could possibly be is 4 in this case. The number of kings in the deck is what we call K: the number of successes in the population. There are four kings in the deck, so K = 4. K is related to capital N, so let me put it next to it: both come from the population. X is related to the sample: it's the count of the number of kings in the sample, the 10 cards you get.

What I want to know is the probability that X equals 3. Remember, X could equal 0, 1, 2, 3, or 4, and we want the probability of getting 3 kings when we've been given 10 cards from the deck. First let me write down the distribution so we can talk about how to use it. The distribution is K choose x, times N minus K choose n minus x, all over N choose n. Now you plug in what we know and solve for this probability. If I want the probability that X = 3, I would do K, which is 4, choose 3, times N minus K, so 52 minus 4
is 48, choose n minus x: little n is 10, minus x, which is 3, so that's 7. Then all of that is divided by N choose n, which is 52 choose 10. What's going on here? The first term is the number of ways you could have chosen the 3 kings you got from the four possible. The second is the number of ways you could have chosen seven cards that are not kings from the remaining 48 cards. And that is divided by the total number of ways you could select ten cards from your entire population. So that's the hypergeometric experiment.

Next I want to talk about the binomial approximation to the hypergeometric, so let's zoom in and look at that. We can use the binomial distribution to approximate the hypergeometric distribution if the objects selected without replacement come from a large population. Basically, if your population is big enough, whether or not you put the card back doesn't really matter. The binomial distribution is slightly easier to calculate than the hypergeometric, which is why we like this approximation, and we would use it only for large populations. What's considered a large population? A common rule of thumb is that if the sample size is less than five percent of the population, the population is large enough.

Let me do an example of how to use this approximation. Now I have three decks of cards, so my population is not just one deck of 52 cards but three decks, 156 cards. The population just got bigger. The cue for deciding between the hypergeometric and the binomial approximation to the hypergeometric is to ask: is my population size really big? So once you
think, "Is my population size really big? This is hypergeometric; can I use the binomial approximation?", that's when you need to actually calculate whether your sample size is less than 5% of your population.

So now I have a big population, all 156 of these cards, and I want to select five cards from it. I'm selecting only five, a very small number of cards, from this big population, and I want to find the probability of selecting four kings. Again there's either a success or a failure: getting a king or not getting a king. If it said with replacement, this would be binomial: I pick a card, see whether it's a king, and put it back. Each trial would then be independent, and the probability of getting a king would be the same each time because I'm putting the card back. But I'm not putting it back. I'm going to select five cards randomly from my deck without returning any of them.

The reason that doesn't really matter, and the reason I can basically use the binomial distribution even though I'm not replacing, is that this population is so big that whether or not I put a card back has very little influence on the probability of success, the probability of getting a king. Actually, what is the probability of getting a king? We're going to need that eventually. The first time I draw, how many kings are there? There are three decks and each deck has four kings, so there are twelve kings out of a hundred and fifty-six cards. That's the first draw. The next time I draw, if I don't replace, how many cards are in my denominator? 155. And it also depends on
whether I got a king the first time; that influences the probability of this next card being a king. If the first one was a king, I would have only eleven kings left in the deck. So there's a dependent relationship here as I pick. But the point is that as the population size gets larger, the effect of taking one card off of 156 is so minor that it doesn't have much influence on your probability of getting a king. 12 divided by 156 is quite similar to 12 divided by 155, and you can imagine that as your population size increases, 12 divided by a thousand is really similar to 12 divided by 999. So with really large population sizes we're able to basically forget the fact that we didn't put the card back in and pretend we used replacement even though we didn't. That's the binomial approximation to the hypergeometric distribution.

Let's talk about how to use it. Even though this is a hypergeometric distribution, we're going to pretend it's a binomial distribution, as if there were replacement, even though there wasn't. Remember that in a binomial distribution X is the count of the number of successes, so here the count of the number of kings. p, the probability of success, is the probability of getting a king, which is 12 over 156. The number of trials n is five, because we're selecting five cards. Remember our binomial distribution: n choose x, times p to the x, times q to the n minus x. We're trying to find the probability that X = 4, that we get four kings. So that's 5 choose 4, p is 12 over 156 to the power of 4, and then q. What's q? Remember, q is just the probability of not getting a king, the probability of failing, so it's 1 minus whatever p is. I'm going to write it like that: 1 minus 12 over 156. Then you
would raise that to the power of n minus x, which is 5 minus 4. Then you just plug that into a calculator and find the probability of getting four kings.

If you're interested in knowing how good an approximation this is, you could always use the hypergeometric distribution; you're just going to get really big numbers when you plug it in. The entire population, capital N, is 156; little n is 5; the x we're finding the probability for is 4; and K, the number of successes in the population, is 12. You can plug this into the hypergeometric distribution, solve it, and see how similar the answers are. They should be quite similar.
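The three calculations walked through in the video can be checked with a short script. This is a minimal sketch rather than anything shown in the video: the function names `binomial_pmf` and `hypergeometric_pmf` and their argument order are assumptions chosen for illustration, and only Python's standard `math.comb` is used.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials, each with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def hypergeometric_pmf(x, N, K, n):
    """P(X = x) when n items are drawn without replacement from a
    population of N items, K of which are successes."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


# Coin example: four heads in five flips of a fair coin.
coin = binomial_pmf(4, 5, 0.5)

# One deck: three kings among 10 cards dealt from 52 (K = 4 kings).
one_deck = hypergeometric_pmf(3, 52, 4, 10)

# Three decks: four kings among 5 cards dealt from 156 (K = 12 kings).
sample_fraction = 5 / 156  # rule of thumb asks for less than 0.05
exact = hypergeometric_pmf(4, 156, 12, 5)
approx = binomial_pmf(4, 5, 12 / 156)

print(f"coin, P(X=4):                   {coin:.5f}")
print(f"one deck, P(X=3):               {one_deck:.5f}")
print(f"sample fraction:                {sample_fraction:.4f}")
print(f"three decks, hypergeometric:    {exact:.3e}")
print(f"three decks, binomial approx.:  {approx:.3e}")
```

Under these definitions the binomial approximation for the three-deck example comes out near 1.6e-4 and the exact hypergeometric value near 9.9e-5. For this rare event the two agree in order of magnitude rather than digit for digit, even though the sample is only about 3% of the population.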
Info
Channel: Michelle Lesh
Views: 11,934
Id: pUQfQC6hTPA
Length: 20min 56sec (1256 seconds)
Published: Fri Mar 17 2017