Statistics 101: Single Sample Hypothesis Z-test Concepts

Captions
Hello, thank you for watching, and welcome to the next video in my series on basic statistics. Now, as usual, a few things before we get started. Number one, if you're watching this video because you are struggling in class right now, I want you to stay positive and keep your head up. If you're watching this, it means you've accomplished quite a bit already. You're very smart and talented, and you may have just hit a temporary rough patch. I know that with the right amount of hard work, practice, and patience you can get through it. I have faith in you, many other people around you have faith in you, and so should you.

Number two, please feel free to follow me here on YouTube, on Twitter, on Google+, or on LinkedIn. That way, when I upload a new video, you know about it, and it's always nice to connect with people who watch my videos online. The world is much too large and life is much too short not to take the opportunity to connect with one another.

Number three, if you liked the video, please give it a thumbs up, share it with classmates or colleagues, or put it on a playlist, because that does encourage me to keep making them for you. On the flip side, if you think there is something I can do better, please leave a constructive comment below the video, and I will try to take those ideas into account when I make new ones for you.

And finally, just keep in mind that these videos are meant for individuals who are relatively new to stats, so I'm just going over basic concepts, and I will be doing so in a very slow, deliberate manner. Not only do I want you to know what's going on, but also why and how to apply it. So, all that being said, let's go ahead and get started.

This video is the next in our series on hypothesis formulation and now, finally, hypothesis testing. Up to this point we've talked about what a hypothesis is, the null hypothesis, the alternative hypothesis, and type 1 and type 2 error, with many, many examples. All of those were leading up to this very
topic, and that is actually conducting a hypothesis test. There are many types of hypothesis tests, but we're going to do the simplest one in this video: a single sample with a known Sigma. In other words, we have a single sample that we're testing against a hypothesized mean, and we are given Sigma, the population standard deviation. We'll go over several distribution curves, we'll talk about critical values and how our alpha level affects them, and then we will walk through two real-world examples. So, all that being said, let's go ahead and dive right in.

It's very important to point out that hypothesis tests follow a very prescribed procedure. As usual, it always starts with a well-developed, clear research problem or analytical question. If the problem is poorly thought out, if what you're trying to accomplish is unclear, then no amount of statistics is going to be able to solve that; it can actually make it worse. So always think through what you're trying to find out at the problem stage. Once you have that, we establish our hypotheses, both the null and the alternative. Remember, the null and the alternative are complete opposites of each other, and together they must account for all possible outcomes. Then we determine the appropriate statistical test and sampling distribution. As I said before, there are many types of hypothesis tests: in this one we're going to be looking at the Z test, in others we might look at the T test, and there are more still after that. The sampling distribution will depend on whether Sigma is given to us, so we know it, or we have to estimate it. So step three is always to determine the appropriate statistical test and the sampling distribution. Then we choose our type 1 error rate. What comfort level do we have with making a type 1 error: is it 5%, 1%, 10%? Again, it will just depend on what our study asks for and what we are comfortable with. It also has to do with
what level of type 2 error we are comfortable making, because remember, the two are inversely related. Then we state our decision rule. In this case we're going to come up with a Z statistic, and then we will have to determine, based on that Z statistic, whether we reject our null hypothesis or fail to reject our null hypothesis. Then, and only then, do we go out and gather our sample data. I know a lot of students I've worked with are really excited about going out and collecting data as the very first thing, but I always have to say no: always form your research question or your analytical question first, set up your hypotheses so you know what you're actually going after, and then choose your test, your distribution, your error rate, your decision rule, and so on. Then go out and get your data. There is this impulse to want to go out and collect data first and then form the research question based on the data you collected. No, it's the other way around: always form your question first.

Once we have the data, we calculate our test statistic; in this case it will be the Z statistic. Based on that test statistic, we state our statistical conclusion: we compare the statistic to our decision rule, and however it compares to the decision rule will be our conclusion. And then finally, in the real world, we can either make a decision or an inference based on that conclusion. It may be some research question in a journal we're looking at, it may be a policy in our business, it may be some analytical work we are doing, maybe in the financial industry, the insurance industry, or the production industry, whatever it might be. We finally get to the point where we can make a decision or some policy recommendation based on our conclusion.

Now, as I said, there are really two types of these statistical tests: ones where we know Sigma and ones where we don't. As with confidence intervals, there are two types of
single sample hypothesis tests: one for when the population standard deviation Sigma is known or given to us, and one for when Sigma is not known and we therefore have to estimate it using s, the sample standard deviation. When Sigma is known or given to us, we use the standard normal, or Z, distribution to establish the non-rejection region and the critical values in our sampling distribution. We talked about that at great length when we looked at type 1 and type 2 error rates, so if you're still unsure what this concept is, go back and look at those videos. When Sigma is not known, we use the T distribution instead, because remember, the T distribution is a little bit shorter in the middle and has a little bit more probability in the tails to account for that unknown, that is, for the estimation we're doing of the population standard deviation.

Now, some instructors and some books will indicate that using the Z distribution is acceptable any time the sample size is 30 or greater, whether or not you know Sigma. I prefer to go ahead and use the T distribution any time I do not know Sigma. Remember, the reality is that as sample size increases, the Z distribution and the T distribution actually converge. So it just depends on what your instructor or your book is asking you to do, because the T distribution, with its fatter tails, will change a little bit how the alpha level affects your critical values.

It's always good to check the sample data for normality; better safe than sorry. You might want to look at a histogram, a Q-Q plot, or a P-P plot of your sample data to make sure it's not heavily skewed in one direction, that you don't have any really crazy outliers, or whatever else that might be.

So remember, what we're talking about here is the hypothesized
versus the true mean. Mu is the true mean of the population under analysis; any population we analyze actually has a real-world true mean. Mu sub 0 is the hypothesized mean of the population under analysis: we might have some guess, or some previous study, or something else we are testing it against. So we're comparing two means, the actual population mean versus some hypothesized value we think it is. What we're asking here is: is the true mean the same as the hypothesized mean? Are they coming from the same distribution? We will test this question using sample means, of course, and intervals much like confidence intervals, whose boundaries we'll call critical values here in a minute.

Now let's just remind ourselves about the rejection regions for a two-tailed test. Here we have our two hypotheses as we had before, and in this case we're going to choose an alpha of 0.05. So we have our sampling distribution that looks like this. Remember what we're actually saying with an alpha of 0.05: this blue area in the middle is 95%. Now, 95% of what? What we're saying is that, if the null hypothesis is true, 95% of the sample means we would take should fall within this blue region, and we risk 5% falling outside that region. The hypothesized mean is set here in the middle. We call this blue region the non-rejection region, and the two areas in the tails are both rejection regions. Our alpha in this case is spread evenly between both tails, so with an alpha of 0.05 we have 0.025 in the lower tail and 0.025 in the upper tail; that's 2.5% in the lower tail and 2.5% in the upper tail. The point dividing the non-rejection region from a rejection region is called the critical value; it's the boundary between the two. Remember, the critical value is determined by alpha, in this case 0.05. With an alpha of 0.05 and Sigma known, we would consult the Z table and find
the corresponding Z scores for a two-tailed test with an alpha of 0.05. When we do that, we see that our Z critical values are -1.96 and +1.96. Those z-scores are the boundaries between the non-rejection region and the rejection regions, based on our Z table and our alpha, and again, we're using the Z table because we know Sigma.

Now, what if we change the alpha level to 0.10? We had 0.05; now we have an alpha of 0.10, which is twice the previous alpha. If you look at our tails, something should be fairly obvious right off the bat: our rejection regions are larger, and our non-rejection region is smaller, or narrower. Now we are saying that 90% of our sample means should be in the blue non-rejection region, and therefore 10% would be in the tails, in the rejection regions, either above or below. So now alpha divided by two is 0.05: that's 5% in the lower tail and 5% in the upper tail. As far as critical values go, are they going to become smaller or larger? They're going to become smaller: the critical values move inward because there is less probability in the middle. So our Z critical values are now -1.645 and +1.645. So what happens when our alpha level increases? In this case we went from 0.05 to 0.10: our non-rejection region in the middle gets smaller, the rejection regions in the tails get larger, and our critical values move inward.

Finally, let's look at what happens to our critical values when we use an alpha of 0.01. In the previous slide we looked at an alpha of 0.10; now it is 0.01, so let's make some predictions about what's going to happen here. You notice that our non-rejection region in the middle is much wider; there's much more area there in the blue. The reason is that we have to take this 0.01, or 1%, and divide it evenly among both
tails, so we have 0.005, or one half of one percent, in the lower tail and 0.005, or one half of one percent, in the upper tail. What we're saying is that we expect 99% of our sample means to fall in this non-rejection region, the blue region in the middle, and we expect 1% of our sample means to fall in either the upper or the lower rejection region in the tails. The whole point of this series of slides is to talk about what happens to our critical values. So what's going to happen to our critical values with a very small alpha of 0.01: are they going to get larger or smaller? They are going to get larger. Here we have ±2.576, and those are by far the largest values, or you can think of them as the widest, of all the alphas we have used. So the overall point of these last few slides is the relationship between alpha, the area of our non-rejection region in the middle, our rejection regions on the ends, and the critical value that sets the demarcation, or boundary, between these regions.

So what are we really asking, in sort of real-world language? We're asking: did our sample come from the population we assume underlies the null hypothesis? If we take a sample to use in our Z statistic, we are testing whether or not it came from the population we hypothesize it came from. If so, then we expect our sample mean to be inside the non-rejection region either 90%, 95%, or 99% of the time, depending on what we choose for alpha. That's what we are really asking: is our sample mean from the population we hypothesize it is coming from?

So let's go ahead and look at the actual Z test for a single mean. Here is our formula, z = (x̄ - μ₀) / (σ / √n). It is comprised of x-bar, which is the sample mean; mu sub 0, which is
the hypothesized population mean given in our problem; Sigma, the population standard deviation, which again is given or known to us in this case; and n, of course, which is the sample size, as it always is. If you remember from the previous videos, this denominator is a very special term: it is the standard error of the mean, which is another name for the standard deviation of the sampling distribution of the mean. In other words, the standard error of the mean is the standard deviation of the distribution of the means of many, many samples. You may also see it written as Sigma sub x-bar, which is the same thing as Sigma over the square root of n; both are representations of the standard error of the mean. I just wanted to show you both ways, because depending on what class you're in or what books you're using, you might see it either way. The question we are asking when we find this Z statistic is: is the Z test value in the non-rejection region in the middle, or is it in the rejection region in one of the tails, depending on how our hypothesis is set up? That's what we're doing when we do a Z test.
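The critical values quoted on the slides (±1.645 for alpha = 0.10, ±1.96 for alpha = 0.05, ±2.576 for alpha = 0.01) can be reproduced without a Z table. The sketch below is a minimal illustration, assuming Python 3.8+ for `statistics.NormalDist`; the function name `two_tailed_z_critical` is hypothetical. It places alpha/2 in each tail, so the positive critical value is the (1 - alpha/2) quantile of the standard normal distribution.

```python
from statistics import NormalDist


def two_tailed_z_critical(alpha: float) -> float:
    """Return the positive critical value z such that P(|Z| > z) = alpha
    for a standard normal Z (alpha is split evenly between the two tails)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    # alpha/2 sits in the upper tail, so the boundary is the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05, 0.01):
    z = two_tailed_z_critical(alpha)
    print(f"alpha = {alpha:.2f}: critical values = -{z:.3f}, +{z:.3f}, "
          f"non-rejection region covers {1 - alpha:.0%}")
```

The printed values match the Z-table values quoted in the video and show the same pattern the slides describe: as alpha shrinks, the non-rejection region widens and the critical values move outward.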
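The single-sample Z statistic and its two-tailed decision rule can also be written out directly. This is a minimal sketch, not the author's method: it assumes a two-tailed test, a known population Sigma, and Python 3.8+. The function name `one_sample_z_test` and the numbers in the usage lines are hypothetical and are not taken from the video's examples.

```python
import math
from statistics import NormalDist


def one_sample_z_test(x_bar, mu_0, sigma, n, alpha=0.05):
    """Two-tailed single-sample Z test with a known population sigma.

    Returns (z statistic, positive critical value, decision).
    """
    if sigma <= 0 or n < 1:
        raise ValueError("sigma must be positive and n must be at least 1")
    # Denominator: standard error of the mean, sigma sub x-bar = sigma / sqrt(n).
    standard_error = sigma / math.sqrt(n)
    # How many standard errors the sample mean lies from the hypothesized mean.
    z = (x_bar - mu_0) / standard_error
    # Decision rule: alpha/2 in each tail of the standard normal distribution.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    if abs(z) > z_crit:
        decision = "reject H0"  # z falls in one of the rejection regions
    else:
        decision = "fail to reject H0"  # z falls in the non-rejection region
    return z, z_crit, decision


# Hypothetical numbers: mu_0 = 100, sigma = 15, n = 36, so the standard error is 2.5.
print(one_sample_z_test(x_bar=103, mu_0=100, sigma=15, n=36))  # z = 1.2, inside +/-1.96
print(one_sample_z_test(x_bar=106, mu_0=100, sigma=15, n=36))  # z = 2.4, outside +/-1.96
```

The same sample mean can lead to different decisions at different alpha levels: a z of 2.4 falls in the rejection region at alpha = 0.05 but inside the non-rejection region at alpha = 0.01 (critical values ±2.576), which is the trade-off the critical-value slides illustrate.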
Info
Channel: Brandon Foltz
Views: 172,057
Rating: 4.9485478 out of 5
Keywords: hypothesis testing single population, hypothesis testing statistics z test, one sample z test, hypothesis testing z test, statistics z test, z test statistics, z test statistic, z-test statistics, z test hypothesis testing, z-test, z test, alternative hypothesis, statistics, brandon c. foltz, hypothesis testing, brandon c foltz, brandon foltz, confidence interval, statistics 101, p value, p-value, hypothesis, null hypothesis, hypothesis test
Id: HoqzIR8xj4s
Length: 19min 9sec (1149 seconds)
Published: Tue Mar 26 2013