Let's look at t tests for the
population mean mu. These tests are appropriate when we
are sampling from a normally distributed population. Is there strong evidence that the population mean mu is different from some hypothesized value that is of interest to us? Here we'll be testing the null hypothesis
that the population mean mu is equal to some hypothesized value mu_0. The alternative hypothesis is that the
null hypothesis is wrong in some way, and we choose the alternative
from one of three possibilities: mu is less than mu_0, mu is greater than mu_0, or mu is not equal to mu_0. The first two are called one-sided alternatives, and the third is a two-sided alternative hypothesis. We should choose the two-sided alternative unless we have a strong reason to be
interested in only one of those sides. Suppose we are drawing a simple random sample of n observations from a normally distributed population. To test the null hypothesis that mu is equal to mu_0, we're going to use either a Z test or a t test. The choice of appropriate test statistic
boils down to whether the population standard deviation sigma is known or not. If sigma is known, which is very rare in reality, then we would use a Z test. The quantity sigma over the square root of n is the true standard deviation of the sampling distribution of X bar, and if the null hypothesis is true, the Z statistic
has the standard normal distribution. In practice, sigma is almost never known, and so we will typically use a t test. If sigma is not known, we estimate it with the sample standard deviation s. We call s divided by the square root of n the
standard error of the sample mean X bar, and the standard error of X bar is the
estimated standard deviation of the sampling distribution of the sample mean. When we replace sigma with s, something fundamental happens to the
underlying mathematics, and the resulting test statistic no longer has the standard normal distribution. If the null hypothesis is true, this test statistic
has a t distribution with n-1 degrees of freedom, and so in t tests, p-values and rejection regions are found from the t distribution with n-1 degrees of freedom.
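Written out, the two test statistics just described are

$$ Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \qquad \text{and} \qquad t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}, $$

where under the null hypothesis Z has the standard normal distribution and t has a t distribution with n-1 degrees of freedom.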
Here I've plotted the standard normal distribution in white and a t distribution with 5 degrees of freedom in red. The t distribution is very similar to
the standard normal distribution, except that it has heavier tails and a lower peak. The exact shape depends on the degrees of freedom. Here we had five degrees of freedom,
but as the degrees of freedom increase, the t distribution tends toward the standard normal. So as the sample size increases, the t distribution gets closer and closer to the standard normal.
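To see that convergence numerically, here is a small Python sketch (using scipy, which isn't part of the original lesson) comparing the upper 2.5% point of the t distribution with the corresponding standard normal value as the degrees of freedom grow:

```python
from scipy import stats

# Upper 2.5% critical value of the t distribution for increasing degrees of freedom,
# compared with the standard normal value of roughly 1.96.
for df in (5, 10, 30, 100, 1000):
    print(df, round(stats.t.ppf(0.975, df), 3))
print("normal", round(stats.norm.ppf(0.975), 3))
```

The t critical value starts at about 2.571 for 5 degrees of freedom and shrinks toward 1.96 as the degrees of freedom increase.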
After we construct the appropriate test and get our test statistic, we're going to want to find a p-value. So let's go ahead and
look at a few examples of that. If we're testing the null hypothesis
that mu is equal to mu_0 using this test statistic and we're
sampling from a normally distributed population, then, if the null hypothesis is true, this test statistic has a t
distribution with n-1 degrees of freedom. The exact shape of the distribution
depends on the degrees of freedom, but I've plotted a t distribution here. Suppose we carry out our test and end up getting a
value of our test statistic of -1.31. So -1.31 is around there somewhere; that is the observed value of the test statistic. Since in this case our alternative hypothesis
is that mu is less than mu_0, values in the left tail of the distribution
give evidence against the null hypothesis in favor of that alternative, and so our p-value is going to be the
probability, under the null hypothesis, of getting what we observed,
or something even farther to the left. Or in other words, the area under the curve to the left of our test statistic is our p-value
for this alternative hypothesis. If on the other hand our alternative
hypothesis is that mu is greater than mu_0, then values in the right tail give us evidence against the null
in favor of that alternative. So suppose we again got a test statistic of -1.31 in our sample. Well, -1.31 is still over here somewhere, and since values in the right tail give
evidence against the null hypothesis, the p-value is the probability, under
the null hypothesis, of getting this value or something even
farther to the right. Or in other words, the area under the curve to the right
of our test statistic is our p-value. Suppose instead we had a two-sided
alternative hypothesis, and we get our sample data and end up finding a test statistic that is again -1.31. Well, -1.31 is over here somewhere, and so we are interested in the probability
of getting this value or something even more extreme. So this area is of interest to us, but we would have considered it just as much evidence
against the null hypothesis had we gotten a value on the other side, at +1.31, so that area is also of interest. And so the p-value for this two-sided alternative is going to be the area in the tail,
beyond the test statistic, doubled.
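As a rough sketch of those three p-value calculations in Python (the degrees of freedom below are an assumed value, since the sample size isn't specified in this illustration):

```python
from scipy import stats

t_obs = -1.31  # observed value of the test statistic
df = 15        # assumed degrees of freedom (n - 1 for a hypothetical sample of 16)

p_left = stats.t.cdf(t_obs, df)            # Ha: mu < mu_0 -- area to the left
p_right = stats.t.sf(t_obs, df)            # Ha: mu > mu_0 -- area to the right
p_two = 2 * stats.t.sf(abs(t_obs), df)     # Ha: mu != mu_0 -- tail area, doubled
print(p_left, p_right, p_two)
```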
Let's look at an example to illustrate. Does mild dehydration affect reaction times? Suppose it is known from a large body of past experience that young women have an average reaction time
of 0.95 seconds on a certain type of test. Twenty-five dehydrated young women take the test; they have an average reaction time of 1.00
seconds, with a standard deviation of 0.18 seconds. Does this yield strong evidence that the
true mean reaction time for dehydrated young women is different from 0.95 seconds? Here we might be interested in testing
the null hypothesis that the true mean reaction time for
dehydrated young women, mu, is equal to 0.95 seconds, against the alternative hypothesis that
it is different from 0.95. One could make the argument that
dehydration is likely to slow reaction times (pushing the mean above 0.95), but here let's play it safe and use a
two-sided alternative hypothesis. And suppose for the sake of argument that we feel an appropriate significance level here is 0.05.
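In symbols, the test being set up is

$$ H_0: \mu = 0.95 \text{ seconds} \qquad \text{versus} \qquad H_a: \mu \neq 0.95 \text{ seconds}, $$

carried out at the significance level alpha = 0.05.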
Before carrying out any of our
statistical inference procedures, we should plot our data and have a look. Here's a box plot of the reaction times for the 25 women. I've put in a red line representing
the hypothesized value of mu, 0.95 seconds. The observed mean in the sample, X bar, was 1.00, and we're going to carry out a test to
see whether this difference, between what was observed in the sample
and the hypothesized value, is a significant difference. Visually it doesn't seem like there's
much of a difference there, but let's see what the test says. The t test assumes that we are sampling
from a normally distributed population, and that assumption should be investigated. Here I've plotted a normal quantile-quantile plot, which
will be approximately a straight line if the data is approximately normally distributed. Here that's a pretty straight line, I'd say, so I'm going to give it the check mark, call it a reasonable normal quantile-quantile plot, and say it's okay to go ahead with the t test.
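If you want to draw that kind of plot yourself, here is a minimal Python sketch; the reaction times below are simulated placeholder values, not the study's actual 25 measurements, which aren't listed here:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Simulated placeholder data standing in for the 25 reaction times (not the real measurements).
rng = np.random.default_rng(1)
times = rng.normal(loc=1.00, scale=0.18, size=25)

# Normal quantile-quantile plot: an approximately straight line suggests approximate normality.
stats.probplot(times, dist="norm", plot=plt)
plt.show()
```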
Recall that we are testing the null hypothesis that the population mean is 0.95 seconds, against the alternative hypothesis that
it's different from 0.95 seconds, at an alpha level of 0.05. The standard deviation s was calculated from the 25 observations in the sample, and since we are using the sample
standard deviation, we use a t test and not a Z test. The t test statistic is going to equal
the observed value of the sample mean, 1.00, minus the hypothesized value of 0.95 from the null hypothesis, divided by the standard error, s over the square root of n, which is 0.18 over the square root of 25. Rounded to three decimal places, this is equal to 1.389. Now we need to find the p-value, and the p-value is going to come from a t distribution with degrees of freedom equal to n-1,
which is going to be 24 in this case. Here's a t distribution with 24 degrees of freedom, and the observed value of our test statistic, 1.389,
is right around there somewhere. Our alternative hypothesis is two-sided, and so the p-value is this area in the tail,
beyond the test statistic, doubled. We can get that area using software or a t table. If we were to use software,
we could see that this area is 0.0888. If we didn't have access to software and had to use a t table,
we could only get an interval of values and say that the area is less than
0.10 but greater than 0.05. Using software, we would double this area and say that
the p-value is equal to 0.1776. Using a t table, we would
have to double the endpoints of that interval and say that the p-value lies between 0.10 and 0.20.
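Here is a short Python sketch (scipy again, not part of the original calculation) that reproduces these numbers from the summary statistics:

```python
from math import sqrt
from scipy import stats

xbar, mu0, s, n = 1.00, 0.95, 0.18, 25

t_stat = (xbar - mu0) / (s / sqrt(n))        # about 1.389
one_tail = stats.t.sf(t_stat, df=n - 1)      # upper-tail area, about 0.0888
p_value = 2 * one_tail                       # two-sided p-value, about 0.1776
print(t_stat, one_tail, p_value)
```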
If you recall, the smaller the p-value, the greater the
evidence against the null hypothesis. And the alpha level can be
considered a cut-off for significance. Since our p-value is greater than that alpha level, we'd say that the evidence against the null hypothesis is not significant at an alpha level of 0.05. But even if we did not have an alpha level given to us, we should still be able to come up with
a reasonable conclusion based on the p-value. A p-value of 0.1776 is not considered very small and does not give strong evidence against the null hypothesis. Now let's tie that into the problem at hand: there is not strong evidence (with a p-value of approximately 0.18) that the true mean reaction time
for dehydrated young women differs from 0.95 seconds. A few notes to finish. The young women
in this study were not a random sample. They were healthy young women who
volunteered for the study at a university. Anytime we don't have a random sample
from the population of interest, there could be strong biases introduced. And so drawing conclusions about
young women in general here is dubious at best. Also, to properly investigate the
effect of dehydration on reaction times, it would be far better to carry out a
comparative experiment. And in fact the authors of the study
that inspired this example did just that: they compared the reaction times of the
women when they were dehydrated to the reaction times of those same
women when they were not. On a related note, the one-sample t-test on means carried
out here is not typically as interesting as the two-sample t-test on means. t tests on means are most useful in comparing two groups and we will look in much greater detail
at the two-sample t-test a little later on.