Hypergeometric Distribution EXPLAINED!

Video Statistics and Information

Captions
[Music] Welcome to the third video in this series on probability distributions. This one today is the hypergeometric distribution, and I've kind of got a feeling this one might stir up a little bit more interest than the others, perhaps due to its relevance to card playing and, in particular, poker. The classic example of a hypergeometric distribution is a five-card poker hand, and here I've got the probability distribution of the number of spades in a five-card poker hand. So out of five cards you can have zero, one, two, three, four or five spades, and that collectively is a hypergeometric distribution. But let's find out why.

Now for our quick rundown. The first thing we can note is that it's a discrete distribution, much like the binomial and Poisson distributions, so you can only get discrete, whole-number outcomes from your distribution. You can't, for example, get 1.5 spades in your poker hand. It's equivalent to the binomial distribution but without replacement, and this is why a card-playing analogy really works: when you've selected one card, the probabilities of getting a spade or an ace or anything like that change for the second draw from the pack, because you've already got one card in your hand. If you'd actually put that first card back in the deck and taken another sample from the deck of cards, then that would be a binomial distribution. But not here. As I said, the probability of success changes with each draw, and that depends on what you've actually selected in that first draw as well.

Now this one's defined by three parameters: capital N, capital A and lowercase n, which means that for a set combination of these three values you've got your particular hypergeometric distribution. So what do these three parameters represent? Capital N is the total population size, so in the case of a deck of cards, N is 52. Capital A is the total number of items of interest in the population, and if we're concerning ourselves with the number of spades in a deck of cards then that's going to be 13, because there are 13 spades. And lowercase n is the sample size, and if we're dealing with a poker hand, n is going to be five, because you have five cards that you're selecting out of the total 52.

All right, so what's the probability mass function for the hypergeometric distribution? Well, it gets given this fairly horrid-looking formula, but all these are are just combination functions. You see these sort of elongated brackets with A and x here: that's the same as writing A, then a C, then x, which you might have seen before, meaning the number of ways you can choose x items out of a population of A. We've got three of those combination functions here; nonetheless, you can use your calculator to find them. If, say, we were interested in finding the probability of getting two spades in our five-card poker hand, we'd be after just the height of this bar here, the probability of getting exactly two. To get that we first need to figure out what each of these values represents: capital N being 52, capital A being 13, little n being 5 and x being 2. x here is the number of items of interest in our sample. If you sub all that into the formula you'll get 0.274, so there's a 27.4 percent chance of getting two spades in a five-card poker hand.

Now you don't need to use that formula and a calculator to do this. Excel has a handy suite of functions that'll help us with these distribution-type questions. If you go =HYPGEOM.DIST, it'll provide the probability mass function for you, so long as you put in the correct pieces of information, and you can see there's quite a bit of information you need to include in this formula. The first piece is the value of x, which is the number of items of interest in our sample, or the number of successes in our sample that we are considering: in this case 2. The next piece of information is lowercase n, then you've got capital A, and then you've got uppercase N, so it kind of goes in reverse from the way I've written it up here. Just be careful: they have to be in that exact order. The final thing to note is that we have to write FALSE, because this final argument determines whether it's a cumulative distribution that we're interested in or not, and of course in this case we are not. We want just the probability mass function, which is the height of that one discrete outcome, and that's 0.274.

So how do we find the cumulative distribution function for the hypergeometric distribution? Well, that has a big, huge, ugly formula, so we're not really going to do it by hand. However, if we wanted to find the cumulative distribution at 2, meaning the probability of getting two, one and zero combined, we can just do the same thing but make sure we write TRUE in that final argument, and that'll add up those three columns for us to give 0.907. So that's the combination of those three there.

You can also find the expected value of the number of successes in your sample, or the number of spades in our sample here, which is just little n (five in our example, five cards) multiplied by A on N. That's quite intuitive, really: if you know that there are 13 spades out of 52, A on N is going to be 1/4, so there's a one-in-four chance of an individual card being a spade. Out of five cards you're expecting 5 times 1/4, which is five on four, so 1.25 spades is the expected number from five cards. The variance is some big long formula here, which you can find manually if you'd like, but that's rarely particularly relevant.

So here's a question for you. It involves a game of Texas Hold'em, and just as background I've provided a few sentences here to explain the important parts of Texas Hold'em for those who might not be so familiar. It's a variant of poker where each player holds two cards in their hand. Five additional cards are then dealt as common cards on the table for all the players, so each player can see a total of seven cards: two in their hand and five on the table. A flush occurs when a player can see a set of five cards from the same suit, so if five of the seven cards you can see are the same suit, you can call that a flush.

Okay, so Pause is dealt the following hand in a game of Texas Hold'em against his longtime rival Puck: he's got a queen and a jack of diamonds. What's the probability that Pause scores a flush, that is, after the five cards are dealt on the table? I'd suggest just pausing this and giving it a go yourself to see if you get the correct answer, but I'm going to jump straight to it.

To get a flush, Pause needs an additional three diamonds from the five common cards, so this is definitely a hypergeometric distribution with the following properties. He's already got two cards in his hand, so realistically we've got a population of 50 unknown cards; that's why N is 50. Out of those 50 cards we know that 11 are diamonds: there's a total of 13, but he already knows that two of them are in his hand, so A is 11. There are about to be five cards dealt from that 50, so n is 5, and we're after the probability of three, four or five diamonds being dealt into those five cards. If we were to draw up the distribution, you can see that it's going to be three, four and five summed up, all these yellow columns here. Five, of course, is going to be a very, very unlikely occurrence, but nonetheless a real probability.

If you're going to do this using Excel, which is probably advisable, you can go equals one minus the hypergeometric distribution function, where I've put in 2 as my number of successes of interest. The reason I've put in 2 is that I'm going to use the cumulative distribution, and the cumulative distribution goes to 2 and below. So even though I want this yellow region up here, I'm going to go one minus this green region down here, one minus the probability of two, one and zero, and that will get me the remaining yellow region up here, because we know that the total of all these bars must sum to one: it's a probability distribution. If you just use Excel you'll get 0.063998.

Now I'm sure a few of you are thinking, wait a minute, Pause can also get a flush if all five common cards are of the same suit. Even though he has two diamonds in his hand, so he's got an advantage in diamonds, he can still get a flush in hearts if, just by chance, all five of those cards are hearts, right? So the total probability of him getting a flush involves the probability of a flush in diamonds, which is what we just calculated, plus the probability of a flush in hearts. If you're going to do that, this would be the formula you'd put in: we need 5 hearts out of a total of 5 cards, where 13 hearts exist in the deck, with a total population of 50, and we make sure we put in FALSE for that final argument, because that tells us it's going to be the PMF, the probability mass function, at that point. If you do that you get a very small probability, but nonetheless there is a probability of just getting five hearts dealt out in a row. That would be the same for spades and clubs as well, because obviously the suit doesn't make a difference. If you tally all that up, you have a total probability of 6.58 percent of Pause getting a flush, which is not that high, is it?

Now that's about it, but I've left you with a bonus question, because I was keen on seeing whether that's significantly higher than, say, another hand where you have off-suit cards. Let's just say that Puck is dealt the following hand: a three of clubs and an eight of hearts. What's the probability that Puck scores a flush? Let's just assume that Puck can't see Pause's cards, so this is the probability of Puck scoring a flush as it appears to Puck. I'll put my answer at the bottom of the description of this video so you can check to see if we've got the same answer.
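As a cross-check on the spades example, here is a minimal Python sketch of the same three quantities: the probability of exactly two spades, the cumulative probability of two or fewer, and the expected number of spades. It assumes the standard form of the probability mass function, P(X = x) = C(A, x) · C(N − A, n − x) / C(N, n), which is the three-combination formula the captions describe but do not spell out. The function names are hypothetical, and the argument order (x, n, A, N) is chosen only to mirror the Excel order mentioned above.

```python
from math import comb  # comb(a, b) is "a choose b" (Python 3.8+)


def hypergeom_pmf(x, n, A, N):
    """P(X = x): exactly x items of interest in a sample of n,
    drawn without replacement from N items of which A are of interest."""
    return comb(A, x) * comb(N - A, n - x) / comb(N, n)


def hypergeom_cdf(x, n, A, N):
    """P(X <= x): sum of the PMF bars from 0 up to and including x."""
    return sum(hypergeom_pmf(k, n, A, N) for k in range(x + 1))


def hypergeom_mean(n, A, N):
    """Expected number of items of interest in the sample: n * A / N."""
    return n * A / N


# Spades in a five-card hand: N = 52 cards, A = 13 spades, n = 5 cards drawn.
N, A, n = 52, 13, 5

p_two = hypergeom_pmf(2, n, A, N)        # exactly two spades
p_up_to_two = hypergeom_cdf(2, n, A, N)  # zero, one or two spades
expected = hypergeom_mean(n, A, N)       # expected number of spades

print(round(p_two, 3), round(p_up_to_two, 3), expected)
# prints: 0.274 0.907 1.25
```

The two probabilities should correspond to `=HYPGEOM.DIST(2, 5, 13, 52, FALSE)` and `=HYPGEOM.DIST(2, 5, 13, 52, TRUE)` in Excel.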
But that's it. Thanks for watching! Here are some links if you want to keep in touch or suggest other videos you might want to see.
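The Texas Hold'em flush calculation from the video can be reproduced the same way. This is a sketch under the assumptions stated in the captions: 50 unseen cards, 11 remaining diamonds, 13 cards in each of the other three suits, and five common cards. The helper name and variable names are hypothetical. The diamond flush and the three "all five common cards in one other suit" cases cannot happen together, so their probabilities are simply added.

```python
from math import comb  # comb(a, b) is "a choose b" (Python 3.8+)


def hypergeom_pmf(x, n, A, N):
    """P(X = x) for a sample of n drawn without replacement
    from N items, A of which are items of interest."""
    return comb(A, x) * comb(N - A, n - x) / comb(N, n)


N, n = 50, 5          # 50 unseen cards, 5 common cards dealt
diamonds_left = 11    # 13 diamonds minus the two already held

# Flush in diamonds: 3, 4 or 5 diamonds among the common cards,
# computed as 1 minus the cumulative probability of 0, 1 or 2.
p_diamond_flush = 1 - sum(hypergeom_pmf(k, n, diamonds_left, N) for k in range(3))

# Flush in another suit: all 5 common cards from one suit of 13.
p_other_suit = hypergeom_pmf(5, n, 13, N)

# Hearts, spades and clubs are equally likely, hence the factor of 3.
p_flush = p_diamond_flush + 3 * p_other_suit

print(round(p_diamond_flush, 6), round(p_flush, 4))
# prints: 0.063998 0.0658
```

The first number matches the 0.063998 quoted from Excel, and the second matches the 6.58 percent total.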
Info
Channel: zedstatistics
Views: 43,300
Keywords: what is the hypergeometric distrbution, hypergeometric distribution explained, zedstatistics, zstatistics, justin zeltzer, hypergeometric
Id: upVJ4YqTlC4
Length: 12min 12sec (732 seconds)
Published: Tue Apr 18 2017