Poisson Distribution EXPLAINED!

Video Statistics and Information

Captions
[Music] All right folks, it's the Poisson distribution for today, named after a French dude that did a whole bunch of stuff in stats. For our purposes we only care about his work as it relates to probability distributions, so here's a quick rundown.

Firstly, we know it's a discrete distribution, meaning there's only a discrete set of values that this distribution can take. But what does this distribution actually describe? Well, it describes the number of events occurring in a fixed time interval or region of opportunity. So the classic example is, you know, how many customers does a bank teller get every hour? In that instance there's a fixed time interval of one hour, and this distribution might describe the number of events that occur within that hour. But I'm not too sure that that many people are going to bank tellers these days. Perhaps a better modern-day equivalent might be the people arriving at an Apple Genius Bar because their freaking iPhone 7 headphones don't work.

Back on topic. The next feature of the distribution is that it requires only one parameter, which is the expected number of events per time interval, lambda. So in this case I've put a distribution here with lambda equal to three, so maybe that's three customers every hour or whatever. But the other thing to note is that it's bounded by zero and infinity. So unlike the binomial distribution, if you've been watching this series, the Poisson actually continues on forever. My graph here stops at ten, but there's theoretically a very small probability of there being ten events in this time interval, and there's also a smaller probability of there being 11 and 12 and 13, etc. It's just that I haven't included them on the graph because they become negligible. Nonetheless, it's a theoretical difference between this and the binomial distribution which needs to be appreciated.

Now what are the assumptions underlying the Poisson distribution? Firstly, the rate at which events occur must be constant. Another way of saying this is that the
probability of an event occurring in a certain time interval should be exactly the same for every other time interval of that same length. The other assumption is that the occurrence of one event does not affect the occurrence of a subsequent event, i.e. the events are independent. So it shouldn't matter if an event just happened; that shouldn't influence the time interval till the next event. Now these assumptions won't necessarily hold in reality, so when you are using the Poisson distribution it's often good to appreciate just how relevant it is to the question at hand, and we'll see with some examples here whether these assumptions are going to break down.

So the next thing we can learn about the Poisson distribution is the PMF, or the probability mass function. All that means is the height of each of these discrete outcomes, or the probability of getting each of these discrete outcomes, when the mean is 3 in this case. So for example, if I wanted to find the probability of getting, say, 5 events happening in this time interval, I can use the formula, subbing in the value 5 for x and the value 3 for lambda, because don't forget 3 is still our mean or expected number of events per time period. So I can actually do this by hand, and in that case I get 0.101.

But of course it's possible to use the wonders of Excel to do exactly the same thing, if you use the POISSON.DIST function. Now these are the new statistical functions that were introduced to Excel in, I think, the 2013 version, but they're all standardized now, which makes things really easy. You might find other sources giving you the old formula here, and that'll work too, but I think it's good to start using these .DIST functions because you'll see they're all exactly the same once you get a handle on them. So POISSON.DIST requires three different arguments, the first of which is the value we're seeking, the number of events for which we're seeking the probability. So if we want 5, in that case we'll
put 5 as our first argument. The second argument requires the mean for the Poisson distribution, in this case that's three. And the third argument requires you to tell it whether you want the cumulative distribution, which is called the CDF, or whether you want the probability mass function, the PMF. Of course we want the latter, so we write FALSE; that tells it that we don't want the cumulative distribution, we want the PMF. So it'll give us 0.101 as well. So there's a roughly 10% chance, a 10.1 percent chance, of getting five events occurring in this time interval.

All right, so let's talk about the CDF now, the cumulative distribution function. Now that's not the height of a certain individual discrete outcome; that is the cumulative distribution, so all of the heights put together up until that point. There is a formula for that involving a gamma distribution and all this stuff, which, look, you're not going to really need to know unless you're doing higher-order statistical stuff. But for this purpose we can use POISSON.DIST again, making sure we write TRUE as that third argument, and in that case it's going to sum up for us all of these bars up to and including the outcome where there are five events. So it'll sum up all of those up to five, and we get 0.916.

The other unique thing about the Poisson distribution is that its expected value, which we kind of need to be told, is actually equal to its variance. So lambda is also the variance of the distribution. Here we've got a mean of three, and we'd also have a standard deviation of the square root of three, remembering that the standard deviation is the square root of the variance.

And what I've done now is just provided for you a couple of Poisson distributions for differing values of lambda, just so you get a sense of what they look like. So this is a Poisson distribution with lambda, that's the expected value, being one, and this one is for where lambda is two. Here's lambda being three, and four, and five. Now
of course in each of these circumstances the distribution continues beyond ten, but I've just left the scale constant so you can kind of get a sense of how these flow from one, two, three, four and five. But be aware too that the mean, lambda, doesn't need to be an integer value. You can also have a Poisson distribution with a mean of 3.61, for example, or even a mean of 0.5, so there's no requirement for lambda to be a whole number.

All right, so it's time for you to do a question. Let's give this a read. Exclusive Vines import Argentinian wine into Australia, and they've begun advertising on Facebook to direct traffic to their website, where customers can order wine online. The number of click-through sales from the ad is Poisson distributed with a mean of 12 click-through sales per day. Okay, so we've got a mean of 12; that will be our lambda value.

Now I've got three questions for you, and I'm hoping that you can pause the video here and give these a go and then see if we get the same answer. We're after the probability of getting exactly 10 click-through sales in the first day, at least 10 click-through sales in the first day, and then more than one sale in the first hour. So see how you go with those. And I'm also going to give you a bonus question: do you think the Poisson distribution is actually appropriate for this scenario in reality? So hopefully you can think about those assumptions and figure out whether they would hold in this case.

So here's the answer to Part A. The probability of 10 click-through sales in the first day is equivalent to the height of this bar here. Now this is a Poisson distribution where the mean is 12, lambda is 12. Of course it goes a little beyond in this direction, down to zero, and in this direction it goes up to infinity. But how do we find the probability of that bar? We can use the formula for the PMF and just sub in those values, 12 being lambda and 10 being x, and we get a value of 0.105, which would be the same result if you've
used =POISSON.DIST and subbed in 10, 12 and FALSE. So there's a 10.5 percent chance of getting 10 click-through sales in that first day. So there's the probability illustrated on the plot.

All right, Part B: what's the probability of at least 10 click-through sales on the first day? So how do we find the probability of X being greater than or equal to 10? Now that's equivalent to this whole yellow area over here, if we sum up all of those together going from 10 up until infinity. Well, unfortunately in Excel there's no way of finding the probability of getting a value or higher, so we're gonna have to use the CDF, which is the probability of getting a value or lower, and subtract it from 1. Now here's the trick: we're actually gonna go 1 minus POISSON.DIST, and 9 is the value of interest we're gonna use here. Because if you think about it, here are those green bars; we're going to go 1 minus the probability of all of these green bars put together, which is 9 and below. So you have to use your brain a little bit with some of this stuff. Knowing that it said at least 10 click-through sales, we know we were after 10 and above, which is 1 minus the probability of 9 and below. By putting in the appropriate formula here we get 0.758, and hopefully you got that exact answer.

What about the probability that we have more than one click-through sale in the first hour? Well, this is where the properties of a Poisson distribution show themselves. If we know there's an average of 12 click-through sales in a day, and it's truly a Poisson distribution, the mean number of sales per hour will be 0.5, because all we do is just divide by the total number of hours in the day. So this becomes our new value of lambda. So this is the distribution now where we've got a lambda value of 0.5, so most of the distribution is going to be down here at 0 and 1, because we're only expecting 0.5 sales per hour. So we're most likely to get 0 sales in a given hour, potentially we can get 1, and it becomes less likely to get 2 or 3 and
very unlikely to get 4, 5 and 6 and beyond. So really we're after this shaded yellow region, which will include all of those values from 2 and beyond. To get that, it's going to be very much like the last example: we're gonna go 1 minus POISSON.DIST, where we're taking the cumulative distribution, so putting TRUE in that third argument, but we're doing it from 1, where the mean is 0.5. So we're going to be subtracting from 1 these two bars here, which leaves 0.090. So there's only a 9% probability of getting more than one click-through sale in the first hour. And I'll just reiterate that it's very important to read strictly what was written in the question here, because it says more than one click-through sale. If it said one or more, we'd actually get a different answer, because we'd be after this probability as well, so this would also be yellow, where X is one.

All right, so how did you go? Hopefully you got those same answers. Question D, the bonus question, asked you to think about what a Poisson distribution kind of is and whether it's appropriate for this scenario in reality. Now I'll return you to our assumption where it said that the rate at which events occur must be constant. So in other words, no interval can be more likely to have an event than any other interval of the same size. Now if you're dealing with the clicks on Facebook over the course of a day, it's very unlikely for it to be a constant rate. How many people do you think are going to be clicking on ads for wine at about 2:00 a.m. in the morning? Well, actually that's probably quite a few. How many people are gonna be looking at it at, say, 6 a.m.
in the morning might be a better question. Not very many, right? So irrespective of when in the day people would be likely to click on this, you can get the sense that people's usage of Facebook would differ throughout the hours of the day, and also their likelihood to be attracted by an ad for wine. So in reality it would not really be a Poisson distribution, but nonetheless it is often good to use these types of distributions, as they can still give us a decent picture of what's going on.

Feel free to click through to the next video in the series, where I deal with the hypergeometric distribution. Any poker players might be interested in that one; that is the classic poker hand distribution. But yep, I'll just leave these up here, and if you want to keep in touch you can do so through these links.
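
The worked answers quoted in the captions can be checked outside Excel. The following is a minimal sketch in Python using only the standard library. It assumes the standard Poisson PMF, lambda^x * e^(-lambda) / x!, which the captions refer to as "the formula" without spelling it out. The function names poisson_pmf and poisson_cdf are illustrative and do not come from the video.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson variable with mean lam: lam**x * exp(-lam) / x!"""
    return lam ** x * math.exp(-lam) / math.factorial(x)


def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x): the sum of the PMF bars from 0 up to and including x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


# Introductory example with lambda = 3
p_five = poisson_pmf(5, 3)        # plays the role of POISSON.DIST(5, 3, FALSE)
p_up_to_five = poisson_cdf(5, 3)  # plays the role of POISSON.DIST(5, 3, TRUE)

# Wine question with lambda = 12 click-through sales per day
part_a = poisson_pmf(10, 12)          # exactly 10 sales in a day
part_b = 1 - poisson_cdf(9, 12)       # at least 10 = 1 - P(9 or fewer)
part_c = 1 - poisson_cdf(1, 12 / 24)  # more than 1 in an hour, lambda = 0.5

results = {
    "P(X = 5), lambda = 3": p_five,
    "P(X <= 5), lambda = 3": p_up_to_five,
    "Part A": part_a,
    "Part B": part_b,
    "Part C": part_c,
}
for label, value in results.items():
    print(f"{label}: {value:.3f}")
```

Rounded to three decimals, these should agree with the figures quoted in the captions (0.101, 0.916, 0.105, 0.758 and 0.090). Summing the PMF directly is fine for small counts like these; for large lambda or x a statistics library would be a more robust choice.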
Info
Channel: zedstatistics
Views: 109,214
Rating: 4.9216785 out of 5
Keywords: what is a poisson distribution, poisson distribution explained, zedstatistics, zstatistics, justin zeltzer, poisson distribution
Id: cPOChr_kuQs
Length: 14min 23sec (863 seconds)
Published: Tue Apr 18 2017