Sampling Distributions: Introduction to the Concept

Video Statistics and Information

Captions
Let's take a look at an introduction to the concept of sampling distributions. To a great extent, statistical inference techniques are based on the concept of the sampling distribution of a statistic. Later on we're going to be discussing statistical inference, and so it is important that we get this notion of a sampling distribution down. The sampling distribution of a statistic is the probability distribution of that statistic. In other words, it is the distribution of the statistic if we were to repeatedly draw samples from the population. So if we were to get a sample, get a value of a statistic, and then draw a different sample of the same sample size and get a value of that statistic, the statistic is going to vary from sample to sample according to the sampling distribution of that statistic.
Let's look at a simple example to illustrate. Suppose a university class has 16 students, and the professor wants to know the average age of the 16 students in the class. Since the professor is interested in these specific 16 students, these 16 students represent the population of interest, and their average age is a parameter. And I'm going to call that mu. Perhaps the professor would have access to this information in their records, but I'm going to assume here that they do not have access to this information, and so mu is an unknown quantity to the professor. I'm also going to assume in this bit of a contrived example that the professor can take a sample of three students and find out their ages. So perhaps it's something like the professor has a friend in the Registrar's Office who'll look up the ages of three students for them. Or something to that effect.
Unknown to the professor, this is the reality of the situation: these are the true ages for the 16 students in the class. We can calculate the true population mean mu. If we take the average of those 16 values, we would see that it is 239.8125 months.
But that is an unknown value to the professor. To the professor, the reality of the situation looks something like this. There are 16 students with unknown ages. I'm going to number them so we can keep track of them. The professor is allowed to draw a random sample of three students and find out their ages. So let's randomly select three students. The red dots represent our randomly selected students, and we can find out their ages in months. We get ages of 233, 227, and 238. And we can calculate the sample mean of those three values simply by adding up those values and dividing by 3. And we get a value of the sample mean of 232.67, when rounded to two decimal places.
We're going to use this value of the sample mean to estimate mu, which is an unknown quantity to the professor. In addition to this single value, this point estimate, that estimates mu, we would like to give some measure of the uncertainty associated with that value. How close is that value likely to be to the true value of mu? To answer that question we use mathematical arguments based on the sampling distribution of X bar. Related to that is the idea that if we were to draw another sample, we would be very, very unlikely to get this sample mean again. The sample mean is going to vary from sample to sample.
Let's take a look at an example of that to illustrate. Here are our 16 students again, and let's draw a random sample of size 3. We get these three students, and they have ages of 251, 238, and 276. And we can again calculate the sample mean of those values by simply adding them up and dividing by 3. And this time we get a sample mean of 255. And had we got this sample, we would use this value to estimate the unknown mu. Note that the sample mean we got here was different from the sample mean we got in our first sample. In repeated sampling the value of the sample mean would vary from sample to sample. The values of statistics vary from sample to sample.
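The arithmetic for the two samples above can be checked with a few lines of Python. This is only an illustrative sketch: the ages are the ones quoted in the captions, while the helper name `sample_mean` and the variable names are assumptions made for this example.

```python
# Ages in months quoted in the captions for the two random samples of size 3.
first_sample = [233, 227, 238]
second_sample = [251, 238, 276]


def sample_mean(values):
    """Return the sample mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


x_bar_1 = sample_mean(first_sample)   # 698 / 3
x_bar_2 = sample_mean(second_sample)  # 765 / 3

print(round(x_bar_1, 2))  # 232.67
print(round(x_bar_2, 2))  # 255.0
```

Two samples of the same size from the same population give two different estimates of mu, which is the sample-to-sample variation the sampling distribution describes.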
If we sampled many times (we did it twice here, but I've sped up the process using a computer and done it a million times) and plotted those sample means in a histogram, it would look something like this. And because I've repeatedly sampled so many times, this histogram of sample means will very closely resemble the true sampling distribution of the sample mean in this scenario. For a little perspective I'm going to put in the population mean mu with a red line. That's what this red line represents, our value of mu, which is about 240. We can note that the sample mean will be distributed about the population mean in some way. As we'll learn later on, very often the sample mean has a distribution that is approximately normal. It doesn't look like that here, but in many situations the sample mean does have a distribution that is approximately normal.
Here we sampled 3 people from 16, and thus there were 16 choose 3, or 560, possible samples. So another perspective on the sampling distribution here is that in this scenario, the sampling distribution of the sample mean is the distribution of the sample mean in all possible samples of size 3 from this population. Going back to our histogram of sample means, we didn't have to actually repeatedly sample from the population. We had 560 possible samples, and so we could have worked out the exact sampling distribution of the sample mean in this scenario. But I wanted to illustrate the repeated sampling argument. And since we repeatedly sampled so many times, this histogram will very, very closely resemble the true sampling distribution of X bar in this scenario.
Note that in practice we don't repeatedly sample from the population, and we typically draw only one sample. But the concept of a sampling distribution is an important one. The value of a statistic that we see in our sample will be a random sample from that statistic's sampling distribution. Why are we even talking about this slightly abstract concept?
Well, we will use mathematical arguments based on the statistic's sampling distribution to make statements about population parameters. So this is going to play an important role in statistical inference. When all is said and done, we'll end up making statements like "we are 95% confident the sample mean lies within 22.1 units of mu." And we're going to be allowed to say things like 95% and 22.1 based on mathematical arguments related to the sampling distribution of the sample mean.
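The two views of the sampling distribution described in the captions (all 560 possible samples of size 3, and repeated random sampling) can be sketched in Python as follows. The 16 ages below are hypothetical placeholders, because the captions do not list the true ages, so the numbers printed will not match the video. The variable names are also assumptions, and 20,000 random draws stand in for the million used in the video to keep the run short.

```python
import random
from itertools import combinations
from math import comb, isclose

# HYPOTHETICAL ages in months for the 16 students (not the data from the video).
population = [221, 225, 227, 229, 231, 233, 235, 236,
              238, 240, 243, 247, 251, 256, 264, 276]
n = 3  # sample size used in the example

mu = sum(population) / len(population)  # population mean of the placeholder data

# Exact sampling distribution: the sample mean of every possible sample of size 3.
all_sample_means = [sum(s) / n for s in combinations(population, n)]
print(len(all_sample_means), comb(16, 3))  # 560 560

# The average of all possible sample means equals the population mean.
mean_of_means = sum(all_sample_means) / len(all_sample_means)
print(isclose(mean_of_means, mu))  # True

# Repeated-sampling approximation: many random samples drawn without replacement.
rng = random.Random(0)  # fixed seed so the run is reproducible
simulated_means = [sum(rng.sample(population, n)) / n for _ in range(20_000)]
print(mu, sum(simulated_means) / len(simulated_means))
```

A histogram of `simulated_means` would approximate the distribution given exactly by `all_sample_means`, which is the point the captions make about repeated sampling versus enumerating every possible sample.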
Info
Channel: jbstatistics
Views: 258,331
Rating: 4.8850856 out of 5
Keywords: Sampling distribution, sampling distributions, introductory statistics, introductory, statistics, jbstatistics, jb statistics, 8msl, 8 minute stats lectures, intro stats videos, intro stats help, stats help, stats tutor, jeremy balka, AP statistics, p value, p-value
Id: Zbw-YvELsaM
Length: 7min 51sec (471 seconds)
Published: Fri Dec 28 2012