Z Tests for One Mean: Introduction

Captions
Let's look into hypothesis tests for the population mean mu when sampling from a normally distributed population. There are two different scenarios here, and we use two slightly different methods. If sigma, the population standard deviation, is known, we're going to use a Z test. This, however, is going to be rare. Usually sigma is not known, and then we're going to use a closely related but slightly different method, the t test. This video is about the situation where sigma is known, and so we're using a Z test. I have a separate video for the t test.

Here the question is: is there strong evidence that the population mean mu is different from some value that is of interest to us? Is it different from some hypothesized value? Here we're going to be testing the null hypothesis that the population mean mu is equal to a hypothesized value mu_0. So for example, we might be interested in a given setting to test the null hypothesis that the population mean is equal to 10, in which case mu_0 is this value 10.

We're going to test that null hypothesis against one of three alternative hypotheses: that mu is less than mu_0, that mu is greater than mu_0, or that mu is different from mu_0. We call the first two one-sided alternatives, and they lead to one-tailed tests. We call the third a two-sided alternative, and it leads to a two-tailed test. The appropriate choice of alternative will become a little bit clearer as we work through problems, but it can at times be subject to debate. As a rule of thumb, choose a two-sided alternative unless you have a very strong reason to be interested in only one particular side. The appropriate choice of alternative hypothesis depends on the problem at hand, and should not be based on the current sample's data.
You should be able to construct appropriate hypotheses without ever looking at your sample's data. And as a general rule of thumb in statistics, you should not use the same data that suggests a hypothesis to test that hypothesis. That's considered cheating a little bit.

Suppose we have a simple random sample of n observations from a normally distributed population where sigma is known. These conditions are sometimes called the assumptions of this procedure. The normality assumption, the assumption that we are sampling from a normally distributed population, is very important when we have a small sample size, but as the sample size gets larger and larger that normality assumption is less and less important, due to the central limit theorem.

To test the null hypothesis that the population mean is equal to the hypothesized value, we are going to use the test statistic Z equals X bar minus mu_0, all divided by sigma X bar. Sigma X bar is the true standard deviation of the sampling distribution of X bar, and it's equal to sigma over the square root of n. So we sometimes see this test statistic written as X bar minus the hypothesized value, all divided by sigma over the square root of n, that is, Z = (X bar - mu_0) / (sigma / sqrt(n)). An important concept here is that if the null hypothesis is true, that is, if mu is equal to mu_0, then this test statistic will have the standard normal distribution.

Let's look at an example. Suppose a supplier to a sushi restaurant claims their bluefin tuna contains no more than 0.4 parts per million of mercury on average. The owner of the restaurant fears the supplier's claim is incorrect, and that the average mercury level is higher. We're going to let the parameter mu represent the true mean mercury content in bluefin tuna from this supplier. The owner of the restaurant might wish to test the null hypothesis that mu is equal to 0.4 parts per million, which would mean that the supplier's claim is correct, against the alternative hypothesis that the mean is actually greater than 0.40.
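As an aside before continuing with the example, the claim that the test statistic has the standard normal distribution when the null hypothesis is true can be checked with a small simulation. This is a minimal sketch, not something from the video: the function names `z_statistic` and `simulate_null_z`, and the values mu_0 = 10, sigma = 2, n = 16, are hypothetical and chosen only for illustration.

```python
import random
import statistics
from math import sqrt


def z_statistic(xbar, mu0, sigma, n):
    """Z = (xbar - mu0) / (sigma / sqrt(n)), with sigma assumed known."""
    return (xbar - mu0) / (sigma / sqrt(n))


def simulate_null_z(mu0, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from a normal population whose true
    mean equals mu0 (so the null hypothesis holds) and return the Z
    statistic computed from each sample."""
    rng = random.Random(seed)
    zs = []
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        zs.append(z_statistic(statistics.fmean(sample), mu0, sigma, n))
    return zs


# Hypothetical values, for illustration only.
zs = simulate_null_z(mu0=10.0, sigma=2.0, n=16, reps=5000)
z_mean = statistics.fmean(zs)
z_sd = statistics.stdev(zs)
print(f"mean of Z: {z_mean:.3f}, sd of Z: {z_sd:.3f}")
```

If the claim holds, the simulated Z values should have a mean close to 0 and a standard deviation close to 1.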
From the owner's perspective, if the true mean is 0.4 or less, then the supplier is good to his word. And the owner might only wish to know if the mean is greater than 0.4, because that might enable him to cancel a contract with the supplier, or get a rebate on some funds that he paid out previously, or what have you. Now note that if we end up rejecting this null hypothesis in favour of this alternative hypothesis, we would also reject any hypothesized value less than 0.40. And so you sometimes see people writing the null hypothesis in this situation as: the population mean is less than or equal to 0.40. Some sources write it in this fashion. But in the bitter end, we do have to test a single value, and so I'm going to be writing it as mu equal to 0.40.

In a random sample of 16 pieces of tuna from this supplier, there was an average of 0.74 parts per million of mercury. Does this yield strong evidence that the true mean mercury content is greater than 0.40 parts per million? Suppose it's known that the population standard deviation sigma is equal to 0.08 parts per million. In the real world, sigma is usually going to be an unknown quantity, but let's pretend that we know it here.

In statistics it's almost always a good idea to plot your data to see what you're working with. And here I've plotted a boxplot of the mercury content from those 16 pieces of tuna. And I'm going to put in the hypothesized value of mu for a little perspective. The 16 pieces of tuna had a mean of 0.74, and so we're going to see if the difference between 0.74 and 0.40 gives us strong evidence against the null hypothesis. Visually it certainly looks like these observations are all well above that hypothesized mean, but there is some variability involved, so let's see if this gives us strong evidence against the null hypothesis. First we should investigate the normality assumption, especially for a sample size as small as 16. Here's a normal quantile-quantile plot of the mercury amounts.
And a normal quantile-quantile plot is approximately a straight line if the data is approximately normally distributed. And in the normal quantile-quantile plot world, that's a pretty straight line. So I'm going to give that a check, say the normality assumption is reasonable, and let's go ahead with the test.

We're going to test the null hypothesis that the population mean is equal to 0.40, against the alternative hypothesis that it's greater than 0.40. And there's our test statistic. The sample mean we found to be 0.74, sigma was assumed to be known to be 0.08, and we had 16 observations. Sigma X bar is the standard deviation of the sampling distribution of X bar, and that is simply sigma over the square root of n. In this case that's 0.08 over the square root of 16, which works out to 0.02. So when we work out the value of our test statistic, we take the sample mean, 0.74, minus the hypothesized mean of 0.40, all over sigma X bar, which is 0.02. And this works out to 17.

Now what do we do with that number? Recall that if the null hypothesis is true, this test statistic will have the standard normal distribution. So if the null hypothesis is true, the value that we get here should simply be a randomly sampled value from the standard normal distribution. Here I've plotted the standard normal distribution, which is the distribution of the test statistic if the null hypothesis is true, and over here I've plotted the observed value of the test statistic. Now this observed value of the test statistic is way, way out here in the right tail. It would be very, very difficult to get a value this big or bigger if we are sampling from the standard normal distribution. In fact, we can even work out the probability. If Z has the standard normal distribution, then the probability of getting a value of 17 or bigger is about 4.1 x 10^-65. A tiny, tiny value, very, very near zero.
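The arithmetic above can be reproduced with a short script. This is a minimal sketch using only the numbers stated in the example; the helper name `upper_tail_prob` is hypothetical, and the upper-tail area is computed with the standard library's complementary error function rather than with any particular statistics package.

```python
from math import erfc, sqrt


def upper_tail_prob(z):
    """P(Z >= z) for a standard normal Z, via the complementary
    error function: P(Z >= z) = 0.5 * erfc(z / sqrt(2))."""
    return 0.5 * erfc(z / sqrt(2))


# Values stated in the example.
xbar = 0.74    # sample mean (ppm)
mu0 = 0.40     # hypothesized mean (ppm)
sigma = 0.08   # population standard deviation, assumed known (ppm)
n = 16         # sample size

sigma_xbar = sigma / sqrt(n)      # standard deviation of X bar
z = (xbar - mu0) / sigma_xbar     # observed test statistic
p_upper = upper_tail_prob(z)      # area to the right of z

print(f"sigma_xbar = {sigma_xbar:.4f}")
print(f"z = {z:.2f}")
print(f"P(Z >= z) = {p_upper:.2e}")
```

The complementary error function is used on purpose: computing the tail as 1 minus the cumulative probability would return exactly 0 here, because in double-precision floating point the cumulative probability at 17 rounds to 1.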
So it would be nearly impossible to get the value we observed, or something even farther out, if the null hypothesis were actually true. So this gives us very, very strong evidence against the null hypothesis, and we can say there is extremely strong evidence that the true mean mercury content of bluefin tuna from this supplier is actually greater than the claimed 0.4 parts per million. Here the evidence against the null hypothesis is overwhelming. We would almost never see what was observed if the null hypothesis were true.

But it's not always that obvious. How far out does the test statistic need to be before we can say the evidence against the null hypothesis is significant? Before we can reject the null hypothesis in favour of the alternative hypothesis? To judge whether the evidence against the null hypothesis is significant, we use one of two approaches: the rejection region approach, or the p-value approach. I have a strong preference for the p-value approach, and use that in most settings, but I do look at both of these approaches in separate videos.
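To get a feel for a less clear-cut case, the sketch below computes the tail area for a more moderate test statistic under each of the three alternatives described earlier. It is an illustration under stated assumptions, not the method shown in the follow-up videos: the function name `z_test_tail_area`, the alternative labels, and the value z = 1.5 are all hypothetical.

```python
from math import erfc, sqrt


def z_test_tail_area(z, alternative):
    """Tail area under the standard normal curve for an observed z.

    alternative:
      "less"      -> area to the left of z,   P(Z <= z)
      "greater"   -> area to the right of z,  P(Z >= z)
      "two-sided" -> area in both tails,      2 * P(Z >= |z|)
    """
    if alternative == "less":
        return 0.5 * erfc(-z / sqrt(2))
    if alternative == "greater":
        return 0.5 * erfc(z / sqrt(2))
    if alternative == "two-sided":
        return erfc(abs(z) / sqrt(2))
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical test statistic, far less extreme than the 17 in the example.
z = 1.5
areas = {alt: z_test_tail_area(z, alt) for alt in ("less", "greater", "two-sided")}
for alt, area in areas.items():
    print(f"{alt:>9}: {area:.4f}")
```

With a statistic this close to the centre of the distribution, none of the tail areas is tiny, which is the kind of situation where a formal rule for judging significance is needed.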
Info
Channel: jbstatistics
Views: 155,701
Rating: 4.8982706 out of 5
Keywords: Hypothesis testing, hypothesis, testing, hypotheses, one sample, mu, population mean, one mean, tests of significance, test of significance, statistical inference, jbstatistics, jb statistics, statistics, 8msl, 8 minute stats lectures, intro stats videos, intro stats help, stats help, stats tutor, jeremy balka, AP statistics, p value, p-value, sushi, mercury
Id: pGv13jvnjKc
Length: 11min 12sec (672 seconds)
Published: Sat Jan 26 2013