Introduction to the t Distribution (non-technical)

Captions
Let's look at an introduction to Student's t distribution, often shortened to simply the t distribution. This video is a little light on mathematical details, so if you're looking for how the t distribution arises mathematically, or its pdf, I go through that in another video.
Suppose we are about to draw a random sample of n observations from a normally distributed population. We've previously learned that the quantity X bar minus mu, over sigma over the square root of n, has the standard normal distribution, and we typically label that with the letter Z: Z = (X̄ − μ)/(σ/√n). Previously, we've used this notion to construct a confidence interval for the population mean mu.
But in practice we encounter a problem, and that problem is that we don't know the value of the population standard deviation sigma. Sigma is a parameter, the standard deviation for the entire population, and we don't typically know its value, so we can't use that value in a formula. So we do the next best thing: instead of using the population standard deviation, we use our sample standard deviation to estimate it, and then we have the statistic X bar minus mu, over s over the square root of n, where s is our sample standard deviation.
But something very fundamental has changed here. Sigma is a constant, but we don't know its value, so we use s, which is a statistic. This statistic s has a sampling distribution, and it would vary from sample to sample. And so this new quantity no longer has the standard normal distribution. We label this quantity t because it has a t distribution: t = (X̄ − μ)/(s/√n).
When we are sampling from a normally distributed population, the quantity X bar minus mu, over s over the square root of n, has the t distribution with n-1 degrees of freedom. The concept of degrees of freedom can be a bit of a tricky one, so I'm not going to get into the details here.
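The change from sigma to s can be seen in a small simulation. The sketch below is only an illustration, not part of the video: the sample size n = 5, the population values mu = 10 and sigma = 2, the seed, and the function name `simulate` are all assumptions chosen for the example. It draws many samples from a normal population, computes both statistics for each sample, and counts how often each one lands beyond 1.96 in absolute value.

```python
import random
import statistics

def simulate(n=5, reps=20000, mu=10.0, sigma=2.0, seed=1):
    """Return the fractions of |Z| and |t| statistics that exceed 1.96.

    All default values are arbitrary choices for illustration.
    """
    rng = random.Random(seed)
    z_exceed = 0
    t_exceed = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        s = statistics.stdev(sample)           # sample SD, divides by n - 1
        z = (xbar - mu) / (sigma / n ** 0.5)   # uses the known sigma
        t = (xbar - mu) / (s / n ** 0.5)       # uses the estimate s instead
        z_exceed += abs(z) > 1.96
        t_exceed += abs(t) > 1.96
    return z_exceed / reps, t_exceed / reps

frac_z, frac_t = simulate()
print(f"Fraction with |Z| > 1.96: {frac_z:.3f}")
print(f"Fraction with |t| > 1.96: {frac_t:.3f}")
```

The Z fraction should come out near 0.05, while the t fraction should be noticeably larger, because the statistic built from s varies more from sample to sample.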
But the degrees of freedom for the t distribution and the n-1 we divided by when we calculated our sample variance s squared are very much tied together.
What does the t distribution look like? We'll look at that in a moment, but if we look at this statistic, it looks very much like our Z statistic, which has the standard normal distribution, except we've replaced the population standard deviation with the sample standard deviation. We are estimating a parameter with a statistic, so there is greater variability. So our t distribution is going to look a lot like the standard normal distribution, except with greater variance.
Here's a plot of the standard normal distribution in white and a t distribution with one degree of freedom in red. We can see that both distributions are symmetric about zero and bell-shaped, but the t distribution has heavier tails and a lower peak. The exact shape of the t distribution depends on the degrees of freedom. A very fundamental point here is that as the degrees of freedom increase, the t distribution tends toward the standard normal distribution.
So I'm going to let the degrees of freedom increase and let's see what happens. As the degrees of freedom increase, we see the red curve getting closer and closer to the white curve. In other words, as the degrees of freedom increase, the t distribution is tending towards the standard normal distribution. I've stopped it here at 20 degrees of freedom, and the curves might look close, but if we look very closely we would see that the t distribution still has slightly heavier tails and a slightly lower peak. But if I let those degrees of freedom continue to increase, the t distribution is going to get closer and closer to the standard normal distribution.
This has some implications for us in statistical inference. Here I'm going to look at constructing a 95% confidence interval, but the same notion would hold in many other situations as well.
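The shape comparison described above (lower peak, heavier tails, and the two curves closing in as the degrees of freedom grow) can also be checked numerically. The video defers the pdf to another lecture, so the sketch below is an addition: it uses the standard textbook density of the t distribution, and the function names `normal_pdf` and `t_pdf` and the evaluation points 0 and 3 are assumptions chosen for the example.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom at x.

    The constant is computed on the log scale so large df does not overflow.
    """
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

print(f"{'df':>8} {'height at 0':>12} {'height at 3':>12}")
for df in (1, 5, 20, 100):
    print(f"{df:>8} {t_pdf(0, df):>12.4f} {t_pdf(3, df):>12.4f}")
print(f"{'normal':>8} {normal_pdf(0):>12.4f} {normal_pdf(3):>12.4f}")
```

The height at 0 plays the role of the peak and the height at 3 stands in for the tail. Each t row should show a lower peak and a heavier tail than the normal row, with the gap shrinking as the degrees of freedom increase.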
If we are sampling from a normally distributed population, and we happen to know the value of the population standard deviation sigma, then we've discussed previously that the appropriate formula for our confidence interval is X bar plus or minus 1.96 times sigma over the square root of n. This 1.96 comes from the standard normal distribution, and I've drawn in the standard normal distribution down here. If we want a 95% confidence interval then we put an area of 0.95 in the middle, and we divide up the remaining area of 0.05 evenly into the two tails, putting 0.025 in the right tail and 0.025 in the left tail. We call the value here with an area to the right of 0.025 z_.025, and that value is 1.96, which we've encountered previously and can find from the standard normal table or software.
But if sigma is not known, then we can't use it in our confidence interval formula, and we would have to replace it with the sample standard deviation. But then we should no longer use 1.96. We shouldn't use a value based on the standard normal distribution; we need to use a value based on the t distribution. So down here I've drawn in a t distribution, and we use the same logic in that we want to put 95% of the area in the middle and split up the remaining area evenly into the two tails. And so what we want to find from this t distribution is the t value that gives an area to the right of 0.025.
Because the t distribution has greater area in the tails and greater variability than the standard normal distribution, this t value is going to be greater than the z value of 1.96. How much greater? Well, that depends on the degrees of freedom, because the shape of the t distribution depends on the degrees of freedom.
But let's look at a few values. Here I have a table with the appropriate t value for various degrees of freedom. The first column has the sample size n. The second column has the degrees of freedom, which are n-1 for the case we're discussing today. The third column has the appropriate t value for a 95% confidence interval. This can be found from a t table or software.
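As a rough illustration of the "or software" route, the t value with an area of 0.025 to its right can be found numerically. The sketch below is one possible approach using only the Python standard library, not how any particular statistics package does it: it integrates the textbook t density with Simpson's rule and then bisects for the cutoff. The function names, the step count, and the search range of 0 to 20 are assumptions chosen for the example.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom at x."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(x, df, steps=400):
    """Area to the right of x (x >= 0): 0.5 minus the Simpson's-rule area on [0, x]."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(df, tail=0.025):
    """Bisect for the t value that leaves an area of `tail` to its right."""
    lo, hi = 0.0, 20.0  # assumed wide enough for the df values used here
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > tail:
            lo = mid    # too much area in the tail: move right
        else:
            hi = mid
    return (lo + hi) / 2

crit = {df: t_critical(df) for df in (5, 30, 100)}
for df, value in crit.items():
    print(f"df = {df:>3}: t value = {value:.3f}")
z_value = NormalDist().inv_cdf(0.975)
print(f"standard normal: z value = {z_value:.3f}")
```

The printed t values should agree with a standard t table to three decimal places, and each should sit above the standard normal value, with the gap narrowing as the degrees of freedom increase.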
Take note that at infinite degrees of freedom we get our z value of 1.96, that is, our z_.025 value, and that's because a t distribution with infinite degrees of freedom is the same as the standard normal distribution. But if we look up here at five degrees of freedom, we see that the t value is 2.571, which is quite a bit bigger than the 1.96 value from the standard normal distribution. As the degrees of freedom increase, the t distribution is getting closer and closer to the standard normal distribution, so these t values are getting closer and closer to 1.96, the value from the standard normal distribution.
Some sources go so far as to say that if the sample size is greater than 30, just forget all about the t distribution and use the standard normal distribution. But if you take statistics from me, forget you ever heard such a notion. If we look here at 30 degrees of freedom we see that the t value is 2.042, which to me at least is quite a bit bigger than the z value of 1.96. Even at 100 degrees of freedom the t value is still a little bit different from 1.96. And so if we use this z value when we should be using the t value, our calculated margin of error will be smaller than it should be.
If we are sampling from a normally distributed population and we are using a standard deviation that is based on our sample's data, then we should be using values from the t distribution and not the standard normal distribution, regardless of the sample size.
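To make the margin-of-error point concrete, here is a small worked example. The six data values are made up purely for illustration and do not come from the video. The t value of 2.571 is the one quoted above for five degrees of freedom, which matches a sample of size six.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample of n = 6 observations (made-up numbers).
sample = [9.8, 10.4, 11.1, 9.5, 10.9, 10.3]

n = len(sample)
xbar = statistics.fmean(sample)
s = statistics.stdev(sample)           # sample SD, divides by n - 1
standard_error = s / math.sqrt(n)

z_value = NormalDist().inv_cdf(0.975)  # about 1.96
t_value = 2.571                        # quoted t value for n - 1 = 5 degrees of freedom

margin_z = z_value * standard_error    # too small when sigma is estimated by s
margin_t = t_value * standard_error    # appropriate margin in this setting

print(f"sample mean = {xbar:.3f}, s = {s:.3f}")
print(f"z-based interval: {xbar - margin_z:.3f} to {xbar + margin_z:.3f}")
print(f"t-based interval: {xbar - margin_t:.3f} to {xbar + margin_t:.3f}")
print(f"t margin / z margin = {margin_t / margin_z:.3f}")
```

Both intervals are centred on the same sample mean, so the only difference is the multiplier. With five degrees of freedom the t-based margin is roughly 30% wider than the z-based one, which is the amount by which the z-based interval understates the uncertainty.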
Info
Channel: jbstatistics
Views: 401,958
Rating: 4.9331102 out of 5
Keywords: t distribution, standard normal distribution, confidence interval for mu, sigma known, sigma unknown, confidence intervals, jbstatistics, jb statistics, introductory statistics, 8msl, confidence, interval, 8 minute stats lectures, intro stats videos, intro stats help, stats help, stats tutor, AP statistics
Id: Uv6nGIgZMVw
Length: 8min 54sec (534 seconds)
Published: Sat May 04 2013