The Central Limit Theorem, Clearly Explained!!!

Video Statistics and Information

Captions
Even if you're not normal, the average is normal.

Hello, I'm Josh Starmer. Welcome to StatQuest. Today we're going to talk about the central limit theorem, and it's gonna be clearly explained.

Note: for this StatQuest to make any sense at all, you should be familiar with the normal distribution. If not, check out "The Normal Distribution, Clearly Explained". It would also be helpful if you were familiar with the concept of sampling from a statistical distribution. If not, check out "Sampling from a Statistical Distribution, Clearly Explained".

The central limit theorem is the basis for a lot of statistics, and the good news is that it's a pretty simple concept. In this StatQuest, I'll explain what the central limit theorem is and why it's important.

Like most things in statistics, I think the central limit theorem is easiest to understand if we look at some examples. So let's start with a uniform distribution. This one goes from zero to one. It's called the uniform distribution because there is an equal probability of selecting values between zero and one. The probabilities are all equal and thus are uniform.

We can collect 20 random samples from this uniform distribution and then calculate the mean of the samples, and on the right we can draw a histogram of the mean value. Since we only have one mean value, the histogram isn't very interesting. But after we collect ten more samples and calculate ten more means, the histogram starts to look a little more interesting. Here's the histogram after collecting 20 samples and calculating 20 means, 30 means, 40 means, 50 means, 60 means, 70 means, 80 means, 90 means, and 100 means.

After adding 100 means to the histogram, it's pretty easy to see that these means are normally distributed. However, to make it easy to see that the means are normally distributed, we can overlay a normal distribution. You might have noticed that in the last two slides I put "means are normally distributed" in bold. I did this because this is what the central limit theorem is all about. Even though these means were calculated using data from a uniform distribution, the means themselves are not uniformly distributed. Instead, the means are normally distributed. BAM!

Here's another example. This time we'll start with an exponential distribution. Just like before, we can collect 20 random samples from this exponential distribution, and just like before, we can calculate the mean of the 20 samples. Lastly, we can draw a histogram of that mean over here on the right. After we collect ten samples and calculate 10 means, the histogram starts to look a little more interesting. Here's the histogram after 20 means, 30 means, 40 means, 50 means, 60 means, 70 means, 80 means, 90 means, and 100 means.

After adding 100 means to the histogram, we can see that they are normally distributed. Even though these means were calculated using data from an exponential distribution, the means themselves are not exponentially distributed. Instead, the means are normally distributed. BAM!

So far we have seen that the means calculated from samples taken from a uniform distribution are normally distributed, and means calculated from samples taken from an exponential distribution are also normally distributed. Well, it turns out that it doesn't matter what distribution you start with. If you collect samples from those distributions, the means will be normally distributed. Yes, there's a little asterisk here. That means there's some fine print that will come later. For now, just know it's really fine print and not worth spending too much time worrying about. Double BAM!

Cool, but what are the practical implications of knowing that the means are normally distributed? When we do an experiment, we don't always know what distribution our data comes from. To this, the central limit theorem says, "Who cares?" The sample means will be normally distributed. Because we know that the sample means are normally distributed, we don't need to worry too much about the distribution that the samples came from. We can use the mean's normal distribution to make confidence intervals, do t-tests, where we ask if there's a difference between the means from two samples, and ANOVA, where we ask if there is a difference among the means from three or more samples, and pretty much any statistical test that uses the sample mean. Triple BAM!

Note: out there in the wild, some folks say that in order for the central limit theorem to be true, the sample size must be at least 30. This is just a rule of thumb and generally considered safe. However, as you can see in the examples here, where I use a sample size of 20, the rule was meant to be broken.

Here's the fine print: in order for the central limit theorem to work at all, you have to be able to calculate a mean from your sample. Off the top of my head, I can think of only one distribution, the Cauchy distribution, that doesn't have a sample mean, and after doing biostatistics for 20 years, I've never come across it in practice. That said, if you know of distributions that don't have means, put them in the comments below and tell us what they're used for. I'm curious about how commonly this occurs.

Hooray! We've made it to the end of another exciting StatQuest. If you like this StatQuest and want to see more of them, please subscribe. And if you want to support StatQuest, well, consider buying one or two of my original songs. Alright, until next time, quest on!
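The experiment described in the captions can be sketched in a few lines of Python. This is only an illustration, not code from the video: the function names, the seed, the exponential rate of 1, and the two summary checks (the fraction of means within one standard deviation, and the interquartile range) are all assumptions made here. The sample size of 20 and the default of 100 means come from the transcript. The standard Cauchy distribution is included as the fine-print case, since its mean is undefined.

```python
import math
import random
import statistics


def draw_uniform(rng):
    """One value from the uniform distribution on [0, 1)."""
    return rng.random()


def draw_exponential(rng):
    """One value from an exponential distribution (rate 1 is an assumption)."""
    return rng.expovariate(1.0)


def draw_cauchy(rng):
    """One value from a standard Cauchy distribution, via its inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


def sample_means(draw, sample_size=20, num_means=100, seed=0):
    """Collect num_means samples of sample_size values; return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(num_means)
    ]


def fraction_within_one_sd(values):
    """Share of values within one standard deviation of their own mean."""
    center = statistics.fmean(values)
    spread = statistics.stdev(values)
    return sum(abs(v - center) <= spread for v in values) / len(values)


def iqr(values):
    """Interquartile range: distance between the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


if __name__ == "__main__":
    # 10,000 means (instead of the video's 100) just to make the summaries stable.
    for name, draw in [
        ("uniform", draw_uniform),
        ("exponential", draw_exponential),
        ("cauchy", draw_cauchy),
    ]:
        means = sample_means(draw, num_means=10_000)
        print(
            f"{name:12s} mean of means={statistics.fmean(means):9.3f}  "
            f"within 1 SD={fraction_within_one_sd(means):.3f}  "
            f"IQR={iqr(means):.3f}"
        )
```

For a normal distribution, roughly 68% of values fall within one standard deviation of the mean, so the uniform and exponential rows should land near that figure. For the uniform case the standard deviation of the means should also be close to sqrt(1/(12 * 20)), about 0.0645. The Cauchy row behaves differently: the mean of 20 standard Cauchy values is itself standard Cauchy, so the interquartile range of the means stays near 2 no matter how many values are averaged.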
Info
Channel: StatQuest with Josh Starmer
Views: 227,503
Rating: 4.9433823 out of 5
Keywords: Joshua Starmer, StatQuest, Central Limit Theorem, Statistics, Clearly Explained, CLT, The normal distribution
Id: YAlJCEDH2uY
Length: 7min 35sec (455 seconds)
Published: Mon Sep 03 2018