How to Learn Probability Distributions

Captions
Welcome to Mutual Information. My name is DJ. In this video I'm going to show you what I think is the best approach to learning about probability distributions.

I'll start with an analogy. Let's say you had a full minute to memorize this mess. After that minute, you have to redraw it from memory perfectly: every edge, every node, no mistakes. Do you think you could do it? Well, if you have a normal human brain like me, then no; there are just too many details to get wrong. But what if I were to tell you about the pattern behind this complexity? That is: there are 25 nodes, which could be numbered like this. Consider only the nodes that are not prime, then draw an edge between any two of them, but only if their difference is prime. Now I'll ask again: could you redraw this pattern, knowing this fact? I mean, yeah, almost certainly. It might be tedious and annoying, so please don't do it, but yes, it could easily be done, all from memory.

The lesson here is: if you want to hold something big and complicated in your head, it's best to find a simple, compressed pattern that explains all or most of that complexity, and then learn only that pattern. Then, when it's time to produce that complexity, just reapply the pattern. Now, this lesson isn't always easily applied; how to explain a complicated thing with a simple set of rules isn't typically obvious, or even possible. Fortunately, it happens quite frequently in mathematics, and in this case, statistics. So, to state it directly, my argument is: if you want to understand a bunch of probability distributions, it's best to focus on the stories that relate them rather than their individual definitions. Now, because the world of probability distributions, their properties, and their relationships is huge, the compressed pattern that relates them is still quite big. Compression helps, but to own such a huge, valuable bag of information, you still must pay with focus, time, and patience. And if you're curious about this graphic, more on it at the end. But for now, I'll start by
telling you one of these stories, one that relates six of these distributions. We'll start with the simplest of random variables: the Bernoulli. One sample of this guy yields one of two outcomes, either yellow or blue. It has only one parameter, the probability of a blue box, which I'm writing here as p and have chosen specifically to be 0.3. Now let's measure these boxes to produce another random variable. Specifically, let's draw lines above each blue box and count the yellow boxes between them. The punch line is that this list of numbers would have a geometric distribution, and it carries the same parameterization as our Bernoulli, which is that number p. Not too bad so far. Next, let's apply a different criterion for those lines: let's draw one over every other blue box and count the yellow boxes between those. Well, that would give us the negative binomial distribution with its r parameter equal to two. If we picked every third blue box, the r parameter would be three. All right, still not too bad. Let's try another: let's draw lines separating groups of six boxes and count the blue boxes between those. Well, that would give us the binomial distribution with its n parameter equal to six. Hopefully you're getting a sense for how these distributions are similar: they're all measuring the same thing, but with a different criterion for counting. So let's step it up a bit and try to make a continuous version of these boxes. One way to do that is to divide the probability of a blue box by some large positive number c and make each box count for 1/c instead of 1. As you can probably guess, I'd like you to pretend this number c approaches infinity, but to keep things tame on screen, I'll go with c equal to 5.
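These counting relationships are easy to check by simulation. Here's a minimal sketch in Python, assuming NumPy; the variable names and sample size are my choices, not from the video. Note it uses the "failures before a success" convention for the geometric, so the gap counts have mean (1 - p)/p:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3                            # probability of a blue box, as in the video
boxes = rng.random(300_000) < p    # True = blue box, False = yellow box

blue = np.flatnonzero(boxes)       # positions of the blue boxes

# Yellow boxes between consecutive blue boxes -> Geometric(p),
# mean (1 - p) / p under the failures-before-a-success convention.
geo_gaps = np.diff(blue) - 1

# Yellow boxes between every OTHER blue box -> NegativeBinomial(r=2, p);
# subtract 2 for the intermediate blue box and the endpoint.
nb_gaps = np.diff(blue[::2]) - 2

# Blue boxes inside consecutive groups of 6 boxes -> Binomial(n=6, p).
bin_counts = boxes[: len(boxes) // 6 * 6].reshape(-1, 6).sum(axis=1)

print(geo_gaps.mean())    # ≈ (1 - p) / p ≈ 2.33
print(nb_gaps.mean())     # ≈ 2 (1 - p) / p ≈ 4.67
print(bin_counts.mean())  # ≈ 6 p = 1.8
```

The same array of Bernoulli draws produces all three distributions, which is exactly the point of the story: one source of randomness, three counting criteria.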
With that, let's reapply our previous criteria and see what we get. If we were to sum the boxes between all blue boxes, keeping in mind that each box now counts for 1/c, then those numbers would be exponentially distributed. The rate parameter in this case would be p. I should note the rate parameter is not p/c, which in the limit would be 0, but rather p, the probability of a blue box in the discrete case we just came from. Moving on: if you were to sum the boxes between every other blue box, then you would get a gamma distribution with the shape parameter equal to two and the scale parameter equal to 1/p. As you can see, the gamma distribution can be viewed as a continuous version, and in fact a generalization, of the negative binomial. All right, last one: if you were to count the blue boxes within each block of boxes summing to six, then you would have a Poisson distribution with a rate parameter equal to 6p. Who saw that coming? Okay, let's take a step back. My argument is: if you want to understand these six distributions, you should focus on this short story. It's just good bang for your buck; it tells you a lot, and it's not that hard to remember. In fact, I think I could cram it into this corner. That said, I still need to follow through on my analogy: reasoning about this story should yield patterns that would otherwise feel like extra free-floating flashcards needing to be memorized. The first pattern is virtually announced in this story, and that is the discrete-continuous analogies: the exponential is the continuous version of the geometric, the gamma is the continuous version of the negative binomial, and the Poisson is the continuous version of the binomial. (I'm putting the Poisson in the continuous section because it measures continuous things.) With that said, these analogies are useful. In my experience, continuous distributions can be hard to reason about; their parameters just aren't interpretable. So we can reason with their discrete versions in their place for
example: I think of the exponential distribution as a geometric distribution with thin boxes. That's useful when trying to remember things like the mean of the exponential. The mean of the geometric is easy: it's 1/p, meaning if a blue box shows up ten percent of the time, then on average I have to observe ten boxes before I see my first blue box. Okay, what about the exponential? Well, by analogy, it must also be 1/p, or, after you do the parameter translation, 1/lambda. See how this can be helpful? And clearly this sort of moment translation doesn't stop with the mean of the exponential, though that may be the simplest one. The next pattern to notice is the summation relationships. That is, if you sum samples of the geometric, you get the negative binomial, and by analogy, if you sum up the exponential, you get the gamma. To see that, let's revisit those separating lines of the geometric. Now, if you wanted to show the sum of two geometrics with this view, how might you do that? Well, since we know the geometric is just the number of yellow boxes between lines, we could drop every other line and then count between the remaining lines. But notice: these are exactly the separating lines of the negative binomial with r equal to 2.
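Both patterns, the thin-box limit and the summation relationship, can be sketched numerically. A minimal example, assuming NumPy; the value of the shrink factor C and the sample sizes are my choices:

```python
import numpy as np

rng = np.random.default_rng(1)
p, C, n = 0.3, 1000, 100_000

# Thin-box limit: a Geometric wait with success probability p/C, where each
# box has width 1/C, approaches an Exponential with rate p as C grows.
waits = rng.geometric(p / C, size=n) / C
print(waits.mean())      # ≈ 1 / p ≈ 3.33, the exponential mean

# Summation: two Geometric(p) failure counts add up to a
# NegativeBinomial(r=2, p) sample ...
geo_sum = (rng.geometric(p, size=(n, 2)) - 1).sum(axis=1)
print(geo_sum.mean())    # ≈ 2 (1 - p) / p ≈ 4.67

# ... and, by analogy, two Exponential(rate p) waits add up to a
# Gamma(shape=2, scale=1/p) sample.
gamma_sum = rng.exponential(1 / p, size=(n, 2)).sum(axis=1)
print(gamma_sum.mean())  # ≈ 2 / p ≈ 6.67
```

The means line up with the parameter translations described above: the exponential inherits mean 1/p from the geometric, and the gamma's mean is shape times scale, i.e. 2/p.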
So we see summing up geometrics gives us a negative binomial, and by analogy, the same argument shows that the gamma is a sum of exponentials. Now that is insight. And in fact, you could use a similar argument to show that summing up binomials yields another binomial, and the same is true for the Poisson; it would be pretty easy to figure out their parameters too. And if you smuggled in the central limit theorem, that tells you why four of these distributions approach the normal distribution as certain parameters become large. More insight! So hopefully that convinces you: it's easier to understand the crowd of distributions if you remember the stories that relate them rather than their individual definitions. And that's because those stories offer additional properties virtually for free, and those crystallize the whole picture and burn it into your brain. But, as you may have noticed, that hardly covers everything. Well, yes, the compression ratio, at least from what I can see, isn't as dramatic as we may hope. There is some fragmentation into stories, each of which relates only a handful of distributions. Still, those are worth remembering. That said, to demonstrate some of the range of these stories, I'd like to quickly fire off a few of my favorites, just as a bonus. Ready? I'm about to throw a lot at you. Okay: say v is just some fixed positive number, and you have a gamma distribution with both its parameters set to half of v. Then you sample from that gamma some value, which I'll call z_i. Then you use 1/z_i as the variance in a mean-zero normal distribution, and then you sample a value from that, which I'll call w_i. Well, if you repeated that process many times, you'd discover that w_i has a Student t distribution with v degrees of freedom. I'm not sure why, but it was only when I learned that fact that I started to get comfortable with the Student t distribution: the Student t is just a bunch of normal distributions mixed together, with different variances that bounce around according to an inverse
gamma distribution. Okay, another: let's say we have an exponential distribution with some fixed lambda parameter. We'll draw two samples from that exponential, z_i and z_j, take their difference, and call that difference w_k. Well, in this case, w_k would have a Laplace distribution. If you don't know about the Laplace, it shows up a lot in machine learning for generating sparse solutions, and again, I felt it was pretty unintuitive until I saw this angle. Okay, last one: let's say we sample two values from the standard normal distribution again, which we'll call z_i and z_j, and let's say w_k is their ratio. Well, in this case, w_k has a Cauchy distribution. Excuse my redundancy, but again, this demystified the Cauchy distribution for me a lot. The Cauchy distribution is weird; it has an undefined mean and variance, which I felt made no sense, but it made a little sense after I realized it's a ratio and the denominator has an expected value of zero. Okay, now I kind of get it. Now, everything I've told you remains a small piece of the full story. A much bigger view is given by this epic graph, which I did not and could not create; in fact, it took many contributors to put this piece together. They even have a website dedicated to it, which I'll link to in the description along with my other sources. If you take my recommendation, which is to learn these stories of relation, then exploring this graph is a great place to start. And finally, thank you for your focus. If you enjoyed this video and would like to continue learning about statistics and machine learning, please like and subscribe. Content like this is the content I'll continue to make, especially if I can get your support.
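All three of these bonus stories can be verified in a few lines of simulation. A sketch, assuming NumPy and SciPy; the degrees of freedom v, the rate lambda, and the sample size are my choices, not from the video. Each sampling recipe is compared against the corresponding closed-form distribution with a Kolmogorov-Smirnov statistic, which should be near zero if the recipe is right:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 100_000

# Story 1: Student t as a scale mixture of normals.
# z ~ Gamma(v/2, rate v/2), then w | z ~ Normal(0, 1/z)  =>  w ~ t_v.
v = 5.0
z = rng.gamma(shape=v / 2, scale=2 / v, size=n)  # NumPy uses scale = 1/rate
w_t = rng.normal(0.0, np.sqrt(1.0 / z))

# Story 2: Laplace as a difference of two iid Exponential(rate lam) draws.
lam = 1.5
w_lap = rng.exponential(1 / lam, size=n) - rng.exponential(1 / lam, size=n)

# Story 3: Cauchy as a ratio of two independent standard normals.
w_cau = rng.standard_normal(n) / rng.standard_normal(n)

# KS statistics against the closed-form targets; all should be tiny.
print(stats.kstest(w_t, stats.t(df=v).cdf).statistic)
print(stats.kstest(w_lap, stats.laplace(scale=1 / lam).cdf).statistic)
print(stats.kstest(w_cau, stats.cauchy().cdf).statistic)
```

The Laplace scale works out to 1/lambda because the difference of two rate-lambda exponentials has density (lambda/2) exp(-lambda |x|).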
Info
Channel: Mutual Information
Views: 1,788
Rating: 4.9794869 out of 5
Id: mBCiKUzwdMs
Length: 10min 54sec (654 seconds)
Published: Tue Apr 13 2021