What is a Sampling Distribution? | Puppet Master of Statistics

Captions
[Music]

Hi, I'm Mike Marin. Today we'll discuss the concept of a sampling distribution of the mean. In many fields of study we collect data. These data are used to calculate an estimate, which we use to understand our world. Usually we collect only one random sample and then use it to estimate something, say the mean of a much larger population. But what if we had collected another random sample? Wouldn't the mean be different? Wouldn't our estimate of the population mean change? So how can we know that our one sample can be trusted to represent a much larger population, and how can we know that our work as statisticians is precise? The concept of a sampling distribution helps us to understand how we can use one sample to make statements about the mean of an entire population.

A sampling distribution is an important concept in the statistical sciences. It's foundational, and it's crucial that you learn it. But what is it, really? A sampling distribution is the probability distribution of a given statistic based on a random sample. It describes the what-if of all the possible estimates we could have ended up with. Still not sure what I'm talking about? Stick with me.

Today, to help us understand these concepts, we will be estimating the average length, or mean length, of fish that live in this lake. The thing we are interested in estimating today, which we will call a parameter, is the population mean. Our variable of interest is the length of a fish. We have our random sample of 25 fish from this lake here, and we will begin to measure the length of each individual fish. Now, in order to get the true population mean of all the fish in this lake, we'd have to measure every single fish, and that could be thousands, and we don't have that kind of time today. So today I'm going to enlist the skills of a special friend of mine.

[Music]

Neptune, can you please help us calculate the length of every single fish in this lake?

The important thing to learn here is that we must understand how samples behave when we know, well, everything: the truth. We can do this using probability theory. First, we're going to build up an understanding of how samples behave when we know the true values for the entire population. Then we'll be able to learn how to use a sample to try and make statements about the entire population in the real world, when we don't actually know the truth, without help from Neptune. We do this using statistical inference.

[Music]

So we now know the mean length and the standard deviation of the length, and this probability distribution shows the length of all fish in this lake, every single fish. So we know that this population has a mean length of 40 centimeters and a standard deviation of 10 centimeters. Thanks for your help, Neptune.

We now know what we can do when we have the power of Neptune at our disposal. But what if we don't? What if we're back to being regular statisticians? What if we can't count each and every fish? We will now collect a random sample of 25 fish, measure their length, and calculate the mean of the 25 lengths. We will call this our sample mean, or estimate of the population. It's important to remember that this sample mean is just one of many we could have gotten. Just by chance, we would have a different sample mean each time we collect data, because each set of fish is slightly different from one another.

Look at this histogram. It's starting to show not one, not two, not three, not four, but all of the possible sample means we could have ended up with. Wouldn't it be great if we could see a histogram of all the possible sample means without having to collect sets of 25 fish over and over again, forever? What would happen if we ended up with a different sample of 25 fish? You'll be pleased to hear that there is a way to know approximately what the shape of this distribution would look like without repeating the sampling process thousands of times.

Thanks to Neptune, we now know that the true mean of fish in the lake is 40 centimeters and the standard deviation is 10 centimeters, and we know that the distribution of the lengths of all fish looks like this. Then what's going to happen is we're going to go out with our fisherman and we're going to take a random sample of 25 fish from this population. When we take that sample of 25 fish, we expect our sample mean to be equal to 40 centimeters, the true population mean, although we know that it's not going to be exactly equal to 40, due to what we call sampling variability.

Now imagine all the thousands of random samples we could have taken. Each mean deviates slightly, because not all samples have a mean that equals Neptune's true mean of 40. The standard deviation of all possible means is called the standard error, and we can work that out. The standard error is the standard deviation of the lengths of individual fish divided by the square root of the sample size, so our standard error is 10 divided by the square root of 25, which is 2 centimeters. This tells us typically how far the mean length of a sample of 25 fish deviates from the true mean. So typically we're going to end up with a sample mean about 2 centimeters away from the true mean of 40. Even without Neptune, we have an idea of how close our estimate will be to the true value.

The sampling distribution of the mean will be approximately normally distributed under certain conditions. In fact, an old rule of thumb tells us that approximately 95% of the sample means we could get will stay within about 2 standard errors of the true mean and be between 36 and 44 centimeters. That's most of them. Not bad for a simple guy with only a paper cutout of a fisherman to help him out.

Now check this out: the sampling distribution, the distribution of all the possible means we could get, is approximately normal, or bell-shaped. So we think of the sample mean that we're going to get as one of many possible sample means we could get, and we can think of pulling it out of this distribution here. Even if this here, the population of all individual fish lengths, is not normally distributed, like in a different lake with an unusual proportion of large sea creatures, this here, the set of all possible means, would still be normal and bell-shaped, as long as our sample size is large. Now that's pretty cool.

Now you may be thinking to yourself, why is Mike spending so much time pretending and talking about what-ifs today? Well, the reason is that in everyday life we generally don't know the true values for an entire population, and we must use a sample to try and make statements about the population we're sampling from. Learning how samples behave when we do know the true values for the entire population allows us to build the theory necessary to make statements about a population when we do not know the true values for the population.

In the videos to follow, we will discuss how the sampling distribution of the mean can be used to help us make statements about a population, as we learn how to construct a confidence interval and test claims using hypothesis tests. Don't forget to check out the statistics visualizations that accompany this video. You can find a link in the description. Thanks for watching.

[Music] I know, I know, I'll come back home. I know, I know, together. [Music]
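
The numbers quoted in the video (population mean 40 cm, standard deviation 10 cm, samples of 25 fish) can be checked with a small simulation. The sketch below is not from the video. It assumes fish lengths follow a normal distribution, and the number of repeated samples, the seed, and the function names are all illustrative choices. It computes the standard error from the formula given in the captions, then draws many samples of 25 and looks at how their means are spread.

```python
import math
import random
import statistics

POP_MEAN = 40.0       # true mean fish length in cm (from the video)
POP_SD = 10.0         # true standard deviation in cm (from the video)
SAMPLE_SIZE = 25      # fish per sample (from the video)
NUM_SAMPLES = 10_000  # assumption: how many repeated samples to simulate


def standard_error(pop_sd, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return pop_sd / math.sqrt(n)


def simulate_sample_means(num_samples, n, mean, sd, seed=0):
    """Draw num_samples random samples of size n and return their means.

    Assumption: individual fish lengths are modelled as normal(mean, sd).
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mean, sd) for _ in range(n))
        for _ in range(num_samples)
    ]


se = standard_error(POP_SD, SAMPLE_SIZE)
sample_means = simulate_sample_means(NUM_SAMPLES, SAMPLE_SIZE, POP_MEAN, POP_SD)

# Rule of thumb from the video: about 95% of sample means fall within
# 2 standard errors of the true mean.
low, high = POP_MEAN - 2 * se, POP_MEAN + 2 * se
share_inside = sum(low <= m <= high for m in sample_means) / len(sample_means)

print(f"standard error: {se} cm")
print(f"mean of sample means: {statistics.fmean(sample_means):.2f} cm")
print(f"sd of sample means: {statistics.stdev(sample_means):.2f} cm")
print(f"share of sample means within {low}-{high} cm: {share_inside:.3f}")
```

The spread of the simulated means should come out close to the 2 cm standard error, and roughly 95% of them should land between 36 and 44 cm. Swapping `rng.gauss` for a skewed distribution would be one way to explore the video's claim that the sample means stay roughly bell-shaped when the sample size is large.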
Info
Channel: MarinStatsLectures-R Programming & Statistics
Views: 60,583
Rating: 4.847826 out of 5
Keywords: Sampling distribution, sampling distribution of the mean, sampling distribution definition, sampling distribution statistics, sampling distribution in probability and statistics, Hypothesis testing, hypothesis testing statistics examples, hypothesis testing p value, what is a hypothesis, statistics for data science, statistics course, statistics 101, statistics crash course, Marinstatslectures, marin stats r, statistics videos, marinhamadani, नमूने का वितरण, Stichprobenverteilung
Id: olK80ngCbXc
Length: 9min 14sec (554 seconds)
Published: Sun Jan 22 2017