Poisson or Not? (When does a random variable have a Poisson distribution?)

Video Statistics and Information

Captions
Let's take a closer look at the question: when does a random variable have a Poisson distribution? This can be a trickier question to answer than for some other distributions, like the binomial and geometric, so let's look at a few examples and some things to consider when we're trying to address this question. I do not do any calculations in this video; this is a concept video.

I'm going to assume that you've already been introduced to the Poisson distribution, but let's do a quick review. Recall that a Poisson random variable is a count of the number of occurrences of an event in a given unit of time, distance, area, volume, etc. That sounds simple enough, but other conditions need to hold in order for this count to truly have a Poisson distribution. I'm going to phrase the following in terms of time, but the same ideas hold if we were discussing distance or area or volume, etc.

Suppose events are occurring independently (more specifically, knowing when one event happens gives no information about when another event will occur), and the probability that an event occurs in a given length of time does not change through time. In other words, the theoretical rate at which the events are occurring does not change through time. A little more loosely, we sometimes say that the events are occurring randomly and independently. If these conditions hold, then the random variable X, which represents the number of events in a fixed unit of time, has the Poisson distribution.

And here's the probability mass function for the Poisson distribution, which is what we use to calculate Poisson probabilities. Note that the random variable X can take on the values 0, 1, 2, ... off to infinity; there is no upper bound on the values X can take on. Thinking about the possible values the random variable can take on can sometimes help when we are trying to determine what the appropriate distribution is.

Here's an important relationship: the Poisson distribution closely approximates the binomial distribution if n is
large and p is small. This notion can sometimes help us when we're trying to determine whether a Poisson distribution might be reasonable.

Let's look at a few examples. Would the number of chocolate chips in a randomly selected scoop of cookie dough have a Poisson distribution? Well, here we're counting up the number of events (the number of chocolate chips) in a volume of cookie dough, so that condition of the Poisson is satisfied. And if the cookie dough is mixed up really well, so that the chocolate chips are distributed randomly and independently throughout the cookie dough, then the Poisson model would probably be pretty good. So here, depending on the specifics of the situation, the Poisson model might be very reasonable.

How about the average weight of customers arriving at a store in a 10-minute period? Well, here I'm just being a little silly. The average weight is a random variable, but it's not a count of a number of events, so the average weight would definitely not have a Poisson distribution.

How about the number of deaths from horse kicks in the Prussian army in a year? Here we're dealing with a count again, the number of deaths in a year, and this one's a bit of a classic Poisson example, with a classic data set associated with it from the 1800s. If these deaths are occurring randomly and independently, then it'd be fine to model this with a Poisson distribution, and in fact the actual data set showed that the Poisson distribution fit the data quite well. But if we didn't have any data to back us up, it would be hard to say exactly. The independence assumption could easily be violated in certain ways; for example, perhaps horses go crazy every now and then and take out as many people as they can.

So the first and third situations here are probably pretty well modeled by a Poisson distribution, but it depends on the specifics of the situation. For the rest of this video I'm going to try to help us visualize what we should be thinking about when trying to determine if a Poisson
model is reasonable.

Suppose we have a situation like this, in which these dots are randomly scattered throughout this area. I've done this in such a way that they are randomly and independently scattered throughout the grid, and the true theoretical rate of dots is constant across the grid. We might see a pattern like this if we put a piece of paper on the ground and saw where the raindrops fell when it was raining. Suppose we were to randomly select a square meter of this area and count up the number of events. This green square represents a randomly selected square meter, and in this square we have four events. We could randomly select another square meter, and this time there are two events. If we did it again, this time we got one. Would the Poisson distribution provide a reasonable approximation to the distribution of the number of events in a randomly selected square meter? Yes. Here the conditions of the Poisson are, for all intents and purposes, perfectly satisfied: we have randomly scattered dots, distributed independently of one another, and the theoretical rate of occurrences is the same across the field.

Here's a bit of a different situation. Here the events are distributed evenly across the plot. This is the same mean number of events as in the last plot, but with a very different distribution. We might see something like this as the distribution of lights in a large parking lot, say. Here, if we were to randomly select a square, this time we got two events. If we did it again, this time we got two events again, and if we did it again, this time we got two events again. We wouldn't get exactly the same number of events every time, but there isn't going to be a lot of variability, certainly not nearly as much as when we had random scattering of points on the last slide. Would a Poisson distribution be a reasonable approximation to the number of events in a randomly selected green square? No, definitely not. The events are most definitely not distributed randomly and independently with the
same theoretical rate across the plot.

Here's a bit of a different situation, in which the dots appear to be grouped together in four groups. You might see something like this if the events were not occurring independently and had a tendency to group together. For example, cows in a field often look a little like this, as cows have a tendency to group together. If we randomly selected a square here, this time we got zero events, and we get zero quite frequently, since there is so much blank space with no events. If we randomly selected another square, this time we get 15 events, a very large number of events. If the events were distributed randomly and independently, and at the same theoretical rate across the entire plot, then it would be incredibly unlikely to get 15 events. If we randomly selected another square, this time we get a single event. Would the Poisson distribution be a reasonable approximation to the number of events in a randomly selected green square? Definitely not. The conditions necessary for the Poisson distribution to be reasonable are not met here.

To summarize, here we have the three situations. In each one of these three situations, the theoretical mean number of events in a randomly selected green square is about 1.4, but only the top situation has a Poisson distribution. If we use the Poisson distribution to calculate probabilities in these other two situations below, we could be very far off and misled.

Suppose we were faced with this question: during the week, students enter a university center at an average rate of 20 per minute. What is the probability that, in a randomly selected five-minute period, exactly ninety arrive? It might be tempting to jump right into the Poisson distribution here. We are counting up the number of times an event occurs (a student arrives), and we could easily say that for a five-minute period lambda is equal to the five minutes times the rate of 20 per minute, which equals 100. Then we could calculate a probability using the Poisson
probability mass function. It's easy enough to force numbers through the Poisson formula, but whether it's correct to do so or not is another question, and here we shouldn't be too quick to assume the Poisson distribution. First of all, the rate at which students arrive would vary dramatically during the day: between classes and at lunchtime the rate is very high, while at night or during classes the rate is much lower. But even if the rate was the same at different times, or we restricted ourselves to a specific time, like between 2:10 and 2:15 on Tuesday afternoon, say, there's still a problem: students will not be arriving independently of one another.

To illustrate that notion, I've created a few simulations. Here each happy face represents a student, and the space on the right represents the door to the university center. In this first case, I've simulated a situation where the number of arrivals in a given time frame would follow a Poisson distribution. I set this up such that students are arriving randomly and independently, with a constant theoretical rate of arrivals, but to speed things up for us I'm going to have them arriving at a faster rate than what was given in the question. And that looks like this. Here the arrivals are following what we call a Poisson process, and the number of arrivals in a given time frame would follow a Poisson distribution.

Here's another situation, where the students space themselves apart evenly. There isn't randomness in the arrival times; the students are arriving at evenly spaced intervals. Students are not arriving according to a Poisson process, and the number of arrivals in a given time frame would not have a Poisson distribution. But it would be a rare case where students would space themselves apart like this.

Student arrivals would look more like this simulation. I set this one up so that there is some randomness, and many students arrive as individuals, so it might look a little like the Poisson process in that first simulation, but I've
tweaked this one in such a way that the students have a tendency to group together. That's not to say that the students can't arrive on their own, but here they are more likely to arrive in pairs or groups than would be predicted by a model based on random and independent arrivals. We still sometimes use the Poisson distribution to model scenarios like this, and it may provide a reasonable approximation in certain settings. But it won't handle a situation like this very well, where a large group of students groups together for some reason, which we might see when a bus arrives or when a certain class lets out. A large group like this would not be properly accounted for in the Poisson model.

Let's look at one last pair of situations. Would the number of fatal commercial plane crashes have a Poisson distribution? We're counting up the number of times an event occurs in a given time frame, so that might lead us to think about the Poisson. And there are a very large number (millions) of flights in each year, each one of them having a very tiny probability of experiencing a fatal crash, so that should be guiding us towards a Poisson distribution because of the Poisson approximation to the binomial. From a bit of a different perspective, it's reasonable to think that these crashes are occurring randomly and independently of one another, at least to a reasonable approximation. But it would be pretty tough to pin down the value of lambda, the mean number of crashes in the time period, even if we had a lot of historical data. The probability of crashing has been decreasing through time with better safety standards and technology, but the number of flights has been increasing over the years, which would tend to increase the number of crashes. So it might be a little tough to estimate lambda, but overall the Poisson distribution would likely provide a pretty good approximation to the number of fatal crashes in a time period.

How about this very different but related question: would the number of fatalities in
commercial airline crashes follow a Poisson distribution? Here again we're counting up the number of times an event occurs in a given time frame, so that might lead us to think about the Poisson. And there are a very large number of people flying, and each one of them has a tiny chance of dying on any given flight, so that again might lead us to think the Poisson model might be reasonable. But it's not. Here the events are most definitely not occurring independently of one another. If you are sitting on the tarmac waiting to take off, and you somehow were given the information that the plane that is two ahead of you on the runway will have a fatal accident that day, that doesn't affect your chances of having a fatal accident that day too much. However, if somehow you were given the information that the person two seats away from you is going to die in a fatal plane crash that day, that is very, very, very bad news for you. So that's my argument as to why the number of crashes might have approximately a Poisson distribution, but the number of fatalities most definitely would not.

Let's see if the data bears that out. Here are plots of the number of fatal crashes and the number of fatalities for US commercial airlines between 1982 and 2012, courtesy of data from the National Transportation Safety Board. These plots might look similar, but take a close look at the axes. In the plot of the number of crashes, the events range from zero, which happened nine times, to five, which happened three times. The number of fatalities is zero in those same five years, of course, but it has massive spikes to over 400 in two of the years. Some crashes result in hundreds of deaths, and those deaths are most definitely not occurring independently. A Poisson model cannot handle situations where zeros are very likely but several hundred events are also reasonably likely. In a Poisson model, seeing numbers like this is nearly impossible. So the number of fatalities most definitely does not follow a Poisson
distribution, but the number of crashes in a given time period likely follows approximately a Poisson distribution, although it might be tough to give a good estimate of lambda, especially if we are trying to predict the future. There were three years in a row with zero crashes. Is that a real effect due to better safety standards, or is that just a fortunate run, or some combination of those two things? It's awfully tough to say.

To sum up, it can be difficult to determine if a random variable has a Poisson distribution, and it's definitely trickier than some sources make it out to be.
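
The video itself does no calculations, so the following is only a rough sketch of the ideas, not the simulations used in the video. It evaluates the standard Poisson probability mass function, P(X = x) = λ^x e^(−λ) / x!, for the student question (λ = 5 × 20 = 100, x = 90), and then mimics the three scattering situations on a timeline: random and independent arrivals, evenly spaced arrivals, and arrivals in groups. All function names, the fixed group size of 5, the number of windows, and the random seed are assumptions chosen purely for illustration; the mean of 1.4 events per window is taken from the summary of the three plots. For a Poisson count the variance equals the mean, so the variance-to-mean ratio of the window counts gives a quick check on whether a Poisson model is plausible.

```python
import math
import random


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed on the log scale to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_process(rate, horizon, rng):
    """Event times on [0, horizon) with independent exponential gaps (a Poisson process)."""
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def evenly_spaced(rate, horizon):
    """Event times at fixed gaps of 1/rate: same mean rate, no randomness."""
    return [i / rate for i in range(int(rate * horizon))]


def clustered(rate, horizon, group_size, rng):
    """Groups arrive as a Poisson process; every member of a group arrives together.

    The fixed group size is an illustrative assumption, not something from the video.
    """
    times = []
    for t in poisson_process(rate / group_size, horizon, rng):
        times.extend([t] * group_size)
    return times


def counts_per_window(times, n_windows):
    """Number of events in each unit-length window [0, 1), [1, 2), ..."""
    counts = [0] * n_windows
    for t in times:
        i = int(t)
        if 0 <= i < n_windows:
            counts[i] += 1
    return counts


def dispersion(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson counts."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return var / mean


rng = random.Random(0)             # fixed seed so the sketch is repeatable
RATE, WINDOWS, GROUP = 1.4, 5000, 5  # about 1.4 events per window, as in the summary

results = {
    "random": dispersion(counts_per_window(poisson_process(RATE, WINDOWS, rng), WINDOWS)),
    "even": dispersion(counts_per_window(evenly_spaced(RATE, WINDOWS), WINDOWS)),
    "clustered": dispersion(counts_per_window(clustered(RATE, WINDOWS, GROUP, rng), WINDOWS)),
}

# The "force the numbers through the formula" calculation: lambda = 5 min * 20 per min.
p_90 = poisson_pmf(90, 5 * 20)

for name, ratio in results.items():
    print(f"{name:>9}: variance / mean = {ratio:.2f}")
print(f"P(X = 90 | lambda = 100) = {p_90:.4f}")
```

All three patterns have the same mean count per window, but only the random one should give a ratio near 1. The evenly spaced pattern should come out well below 1 and the clustered pattern well above it, which is the sense in which the Poisson formula can be far off even when the mean is right. The last line shows that the formula returns a number for the student question regardless of whether the Poisson conditions actually hold.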
Info
Channel: jbstatistics
Views: 202,392
Rating: 4.9476361 out of 5
Keywords: Poisson, Poisson distribution, Poisson process, random variables, Poisson approximation to the binomial, probability, discrete probability distributions, binomial example, probability examples, jbstatistics, jb statistics, statistics, introductory statistics, intro stats videos, intro stats help, stats help, stats tutor
Id: sv_KXSiorFk
Length: 14min 40sec (880 seconds)
Published: Thu Jan 30 2014