Lecture 19- Hypothesis Testing: T-Test, Z-Test

Welcome, everyone, to this session of Marketing Research and Analysis. In the last session we introduced the subject of hypotheses: what exactly a hypothesis is and why it is so important for any researcher. A hypothesis is basically an assumption. In everyday speech we say "I hypothesize that this is going to happen": my hypothesis is that today it might rain, or that this new machine will work better than the old one. In each case the thing has not happened yet, and we are trying to predict it, to say something in the positive or the negative. As we said, there are two types of hypothesis: the null and the alternate. The null is the one where the researcher maintains the status quo, meaning things will continue as they normally happen. It is a case of "equal to": two things are taken as equal until we have proven otherwise. For example, the means of two groups are equal, or the intelligence of two people is the same. So the words "same" and "equal to" belong to the null. The alternate, on the other hand, is whatever is not the null, or against the null. Suppose the researcher wants to know whether the intelligence of two people is the same or not: the null says it is the same, and the alternate says no, it is not the same. So the null is the case of "equal to", and the alternate is the case of "not equal to".
The most important thing is that in most research we want to disprove, or reject, the null hypothesis. Why? The reason is very simple: if a researcher is content with the status quo, with whatever would happen anyway, then what is the point of doing research? To get an outcome that is significant or effective, the more important thing for us to check is the alternate. That is why, when you read research papers, please understand that the hypotheses written in those papers are generally not the null hypotheses; they are the alternative hypotheses, the claims the researcher actually wants to make or check. Now let us get into hypothesis testing. In the last session I also discussed that there are several steps to test a hypothesis. First, one has to check the basic assumptions of a normal distribution, that is, that the data behave in a manner similar to a normal distribution. Second, you need to check the direction of the test: whether it is a one-tailed or a two-tailed test. Third, the researcher has to decide at what level of significance he wants to work; the significance level is decided by the researcher. When I say significance level, you can understand it as α. If you remember, α is the chance that a hypothesis which is true will still be rejected: the chance of rejecting a true hypothesis is called α.
The other side is β, the chance of accepting a false hypothesis. Once the significance level is decided, the fourth step is to calculate the statistic, meaning the z or t statistic. How do we calculate it? If you remember, z (or t) equals the sample mean x̄ minus the population mean μ, divided by the standard error; that was the formula, and the same applies to t. The basic difference between the z and t distributions, which I will explain, is that the z test is used when the sample size is large, while the t test is used when the sample size is small, up to about 30. Once you have the statistic, the fifth and final step is to compare the calculated z or t statistic with the critical value. The critical value can be found from the z or t table at the end of any statistics book, or you can simply look it up online. Now, this is the outline, so let me go through it. We are getting into the logic of hypothesis testing using the five-step model I just described, for a single sample or for more than one sample, and for sample means or sample proportions. When we talk about z or t, there are two ways of understanding it: hypothesis testing is described as a test of means or a test of proportions. What does that mean? Let us go to the basic meaning.
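The fourth step, computing the statistic, can be sketched in Python as below. This is a minimal illustration, not the lecture's own material: the function names `standard_error` and `compute_statistic` and the numbers in the usage lines are hypothetical. The same ratio serves for z and for t; only the distribution used to look up the critical value differs.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: the standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)

def compute_statistic(sample_mean: float, hypothesized_mean: float, se: float) -> float:
    """z (or t) = (sample mean - hypothesized mean) / standard error."""
    return (sample_mean - hypothesized_mean) / se

# Hypothetical numbers: sample mean 52, hypothesized mean 50, sd 10, n = 100.
se = standard_error(10.0, 100)            # 10 / sqrt(100) = 1.0
stat = compute_statistic(52.0, 50.0, se)  # (52 - 50) / 1.0 = 2.0
print(f"standard error = {se}, statistic = {stat}")
```

The resulting statistic would then be compared with the critical value from a z or t table, which is the fifth step.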
What does hypothesis testing actually say? Through hypothesis testing we are trying to say whether there is a significant difference between the sample mean and the population mean, or whether there is no significant difference. Breaking it down to a more elementary level: from the sample mean, can I say whether this sample comes from a particular population or not? Is the sample part of the population, or is it some other sample unrelated to that population? That is what we are actually testing in any hypothesis test. When we look at it in terms of means, as with the normal distribution, we are concerned with the mean. So we take the sample mean and the population mean and ask whether the difference is significant: if there is a difference, did it occur by chance, or would it happen again and again? So one is the test of means. The other, as I said, is the test of proportions. Proportions, as we know from mathematics, are expressed as p and q, where q = 1 − p; they are basically ratios. Suppose there is no mean and we only have an idea of what percentage of the population has or does not have some characteristic. Can we test a hypothesis in such situations? Yes: when you do not have the mean, you can use the proportion. Let us see an example.
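The lecture does not give the formula for a test of proportions at this point. A common form is the one-sample z test for a proportion, z = (p̂ − p0) / √(p0·q0 / n) with q0 = 1 − p0; the sketch below assumes that form. The function name `proportion_z` and the survey counts are hypothetical.

```python
import math

def proportion_z(successes: int, n: int, p0: float) -> float:
    """One-sample z statistic for a proportion against a hypothesized p0."""
    p_hat = successes / n           # observed sample proportion
    q0 = 1 - p0                     # hypothesized proportion of "not" cases
    se = math.sqrt(p0 * q0 / n)     # standard error under the null hypothesis
    return (p_hat - p0) / se

# Hypothetical survey: 60 of 200 respondents prefer a brand; H0 says 25% do.
z = proportion_z(60, 200, 0.25)
print(f"z = {z:.3f}")
```

Since |z| here is below 1.96, this hypothetical result would not be significant in a two-tailed test at α = 0.05.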
Hypothesis testing is designed to detect significant differences that did not occur by random chance. If there is a significant difference, say between the population mean and the sample mean, we are claiming that the difference has not happened due to some chance element; it is real. To claim or test this, we do the hypothesis test. There are a few types of test. As I said, the z and the t are more or less the same thing: if you look into any statistical software, you may not even see a separate z test, because the z is essentially what the t becomes for large samples. The only difference is that the t is for small samples and the z for large ones, with the cut-off around 30. How do the distributions look? The t distribution is flatter than the z distribution: if you look at the kurtosis, the peakedness of the curve, the t distribution has a lower peak and spreads more of its area into the tails, whereas the z distribution is the standard normal. So what happens when you increase the sample size?
As you go on increasing the sample size, the t tends to become a z, so there is hardly any difference between them: when you increase the sample size from 30 to, say, 40, 50, 70, 90, 100 or more, the t and z distributions look more or less the same. That is the basic understanding. Now, the t test itself comes in three types: the one-sample t test, the independent-samples (two-sample) t test, and the dependent, or paired-samples, t test. What does the one-sample test mean? When you have only one sample, what will you compare it against? You compare the mean of this one sample against some hypothesized mean. For example, you have the scores of a group of people, one section or one class, and you are checking their mean. Suppose you find the mean is 60%. Is this 60% significantly different from the population mean or not? How would you know? In such cases we compare it against the hypothesized population mean, the hypothesized value.
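To see the t distribution approach the z distribution as the sample grows, the sketch below computes two-tailed 5% critical values of t for increasing degrees of freedom using only the Python standard library. It is an illustration under stated assumptions: the t density is integrated numerically with Simpson's rule and inverted by bisection (a swapped-in numerical method, not a table lookup), and the helper names `t_pdf`, `t_cdf`, and `t_critical` are hypothetical. It also compares densities at the centre and in the tail, which shows the lower peak and heavier tails of t.

```python
import math
from statistics import NormalDist

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x: float, df: float, steps: int = 1000) -> float:
    """P(T <= x) for x >= 0: 0.5 below zero plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha: float, df: float) -> float:
    """Two-tailed cut-off: the x with P(T <= x) = 1 - alpha/2, found by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 20.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_crit = NormalDist().inv_cdf(0.975)
critical = {df: t_critical(0.05, df) for df in (5, 10, 30, 100, 1000)}
for df, t_crit in critical.items():
    print(f"df = {df:>4}: t critical = {t_crit:.3f}   (z critical = {z_crit:.3f})")

# Lower peak, heavier tails: compare densities at 0 and at 3 for df = 5.
print(f"density at 0: t(5) = {t_pdf(0, 5):.4f}, z = {NormalDist().pdf(0):.4f}")
print(f"density at 3: t(5) = {t_pdf(3, 5):.4f}, z = {NormalDist().pdf(3):.4f}")
```

For a one-sample t test, the degrees of freedom are n − 1, so these rows correspond to samples of 6, 11, 31, 101 and 1001 observations.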
The hypothesized value is the value we compare against, and it may have come from some past record or past experience. For example, we know from past records that students in a marketing research class generally score around 70%. That is our hypothesized value, and we compare the mean calculated from the one sample with this hypothesized mean. In other words, we compare a random sample from a large group with a population, where the population value is the hypothesized value; we compare a sample statistic with a population parameter to see whether there is any significant difference. This is highly useful in studies such as those in manufacturing industries. Suppose a company makes some product and wants to check its strength: it takes a sample and compares it against an earlier known value, say that the product should be able to take a load of 150 kg at a time. That 150 kg is the value against which the sample mean is compared. Let us take this problem. The education department at a university has been accused of grade inflation: education majors are said to have much higher GPAs than students in general. To check this, the GPAs of all education majors should be compared with the GPAs of all students.
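The one-sample comparison in the manufacturing example can be sketched as follows. This is an illustration only: the eight breaking-strength readings are made-up numbers, and `one_sample_t` is a hypothetical helper. It uses `statistics.stdev`, which divides by n − 1; dividing that by √n gives the same standard error as the s/√(n − 1) form used later in the lecture, where s is computed with n in the denominator.

```python
from __future__ import annotations

import math
import statistics
from collections.abc import Sequence

def one_sample_t(sample: Sequence[float], hypothesized_mean: float) -> tuple[float, int]:
    """Return the one-sample t statistic and its degrees of freedom (n - 1)."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)    # sample standard deviation, n - 1 denominator
    se = sd / math.sqrt(n)           # standard error of the mean
    return (mean - hypothesized_mean) / se, n - 1

# Hypothetical breaking strengths (kg) of eight sampled units, tested against 150 kg.
strengths = [152, 148, 155, 151, 149, 153, 150, 154]
t, df = one_sample_t(strengths, 150.0)
print(f"mean = {statistics.mean(strengths)}, t = {t:.3f}, df = {df}")

# The two standard-error forms agree: stdev/sqrt(n) == pstdev/sqrt(n - 1).
n = len(strengths)
assert math.isclose(statistics.stdev(strengths) / math.sqrt(n),
                    statistics.pstdev(strengths) / math.sqrt(n - 1))
```

With 7 degrees of freedom, the resulting t (about 1.73) would be compared with the two-tailed 5% critical value from a t table, roughly 2.365, so this made-up sample would not differ significantly from 150 kg.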
But there are thousands of education majors, far too many to interview; it is very difficult to work with such a large group. How can this be investigated without interviewing all of them? If I go on checking more than 1,000 majors, it is like checking the whole population, which is not advisable because of the lack of time and money. So what do we know from the data? The average GPA for all students is 2.7. This is the population parameter: μ = 2.7, and if you remember, μ is the symbol used for the population mean. Then we have the sample values: x̄, the sample mean of the students with education majors. Their scores were taken and the sample mean was found to be 3.0. s is the sample standard deviation; the population standard deviation is denoted by σ. Here the sample standard deviation s is 0.7 and n = 117: they took the scores of 117 students and wanted to run the test. The question is: is there a difference between the population mean and the sample mean, yes or no? The observed difference is 3.0 − 2.7 = 0.3.
Is this difference real, or could it have happened by chance because of the particular sample we took? That is what we want to check. There are two possibilities. First, the sample mean is the same as the population mean: the difference is trivial and caused by random chance. Second, the difference is real and significant: the education majors are different from all students, meaning their scores are actually different from those of the other students. As I said, you first have to write the null and alternative hypotheses. The null hypothesis says the difference is caused by random chance: there is no significant difference, and whatever difference of 0.3 has occurred is due to chance. In this case, the null says there is no significant difference between the population mean and the sample mean. But is that what we, as researchers, are interested in finding? No. What are we interested in? We want to see whether the difference is real: the alternative says the difference between the population and the sample is significant. Both explanations cannot be true at the same time; either there is a statistically significant difference or there is not.
So which one is true? We always test the null hypothesis: although we are interested in the alternate, it is the null that we check. The question is: what is the probability of getting a sample mean of 3.0 if H0 is true and all education majors really have a mean of 2.7? In other words, the difference between the means is due to random chance, which is the null hypothesis. If the probability associated with this difference is less than 0.05, we reject the null hypothesis. Remember what I told you in the last session about how to accept or reject a hypothesis: you calculate the z value and compare it against the table value. At what confidence level? That you have to decide beforehand. Why beforehand? Because if the confidence level, say 95% or 99%, is not fixed from the beginning, the researcher might change his mind when he does not get the desired result. So you always have to fix it at the start. The critical z value at 95% also depends on whether the test is two-tailed or one-tailed. For a two-tailed test, the rejection regions are spread across both ends of the curve, and at 95% the critical value is 1.96.
How do you check this? Go to the normal distribution table and look up an area of 0.475. Why 0.475? Because 0.475 × 2 = 0.95, that is, 95%. In a two-tailed test, 0.025 lies in each tail, so there is 0.475 on each side between the mean and the cut-off. A one-tailed test looks different: suppose I am not interested in the left side, only in the right; then the entire 5% rejection region lies in the right tail. The left half of the curve is 0.5 as it is, and the area between the mean and the cut-off becomes 0.45. If you look up an area of 0.45 in the table, the z value is no longer 1.96; it is 1.64 (more precisely, 1.645). Once you have calculated your statistic, check where it falls: if it falls on the acceptance side of the cut-off value, the null hypothesis is accepted; if it falls beyond the cut-off value, it is rejected. I also mentioned the P value; as I told you in the last session, the P value is the probability value. It tells you the chance of obtaining a value at least as extreme as your calculated value, out in the extreme ends of the distribution. If the P value is less than 0.05, you reject the null hypothesis. If it is more than 0.05, then at a 95% confidence level, please remember this, you accept the null hypothesis. At 99% the threshold changes to 0.01.
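The table lookups described here can be reproduced with the standard library's `statistics.NormalDist`, as in the sketch below. The helper names `z_critical` and `p_value` are hypothetical; the one-tailed p-value assumes a right-tailed test, matching the lecture's example of caring only about the right side.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_critical(alpha: float, two_tailed: bool = True) -> float:
    """Cut-off z value: alpha is split across both tails or placed in one tail."""
    tail_area = alpha / 2 if two_tailed else alpha
    return STANDARD_NORMAL.inv_cdf(1 - tail_area)

def p_value(z: float, two_tailed: bool = True) -> float:
    """Probability of a result at least as extreme as z under the null hypothesis."""
    if two_tailed:
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    return 1 - STANDARD_NORMAL.cdf(z)  # right-tailed test

print(f"two-tailed 5% cut-off: {z_critical(0.05):.3f}")
print(f"one-tailed 5% cut-off: {z_critical(0.05, two_tailed=False):.3f}")
print(f"area from 0 to 1.96:   {STANDARD_NORMAL.cdf(1.96) - 0.5:.3f}")
print(f"p-value for z = 1.96:  {p_value(1.96):.3f}")
```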
So the question is whether the probability falls within that 5% or not; if it is less, reject. I will not go into the details again, since I have already shown you how to calculate it: if the probability is less than 0.05, the calculated z will be beyond ±1.96. This is how the cut-off values look, the area we were just talking about. These are the five steps I described at the beginning; now let us go to the calculation. The null hypothesis is H0: μ = 2.7. In other words, the population mean and the sample mean are the same: the sample of 117 comes from a population with a mean GPA of 2.7, and the difference between 2.7 and 3.0 is trivial and caused by random chance. The alternative hypothesis is H1: μ ≠ 2.7. The sampling distribution is Z. One thing you have to settle is whether this is a two-tailed or a one-tailed test. Since the hypotheses are "equal to" versus "not equal to" the population value, it is two-tailed, because the sample mean can be either less than or greater than 2.7. Any difference with a probability less than α is rare and will cause us to reject the null hypothesis. The formula, which I have already covered many times, is, for large samples (which the slide takes as N ≥ 100), Z = (X̄ − μ) / (σ/√N), where N is the sample size. From this formula you can also calculate the sample size, as I said earlier.
But suppose the standard deviation of the population is not known. In that case the formula changes slightly: without the population standard deviation, you use the sample standard deviation s, which here is 0.7, and divide it by √(N − 1) instead of √N. That is the only change: Z = (X̄ − μ) / (s/√(N − 1)). In our case we did not have the population standard deviation; it was not given, only the sample one. So Z = (3.0 − 2.7) / (0.7/√(117 − 1)) = 4.62. Our critical value at 95% was 1.96, so 4.62 falls far out in the rejection region, and the null hypothesis is rejected. The obtained Z score falls in the critical region, so we reject H0: if H0 were true, a sample outcome of 3.0 would be very unlikely, therefore H0 is false and must be rejected. The conclusion is that education majors have a GPA that is significantly different from that of the general student body. Earlier, the null hypothesis said there is no difference between education majors and other students. Now the null hypothesis has been rejected, which is actually what we wanted, and there is a difference between the two groups. On the curve, 4.62 lies well beyond the cut-off at 1.96.
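The lecture's arithmetic for the GPA example can be checked with a short script. The inputs (μ = 2.7, x̄ = 3.0, s = 0.7, N = 117) come from the example; the p-value and the decision wording are additions in the same spirit, using the "fail to reject" phrasing the lecture recommends for the non-significant case.

```python
import math
from statistics import NormalDist

mu = 2.7      # population mean GPA (all students)
x_bar = 3.0   # sample mean GPA (education majors)
s = 0.7       # sample standard deviation
n = 117       # sample size

se = s / math.sqrt(n - 1)              # sigma unknown: divide s by sqrt(N - 1)
z = (x_bar - mu) / se                  # about 4.62
z_crit = NormalDist().inv_cdf(0.975)   # two-tailed cut-off at alpha = 0.05, about 1.96
p = 2 * (1 - NormalDist().cdf(abs(z)))
decision = "reject H0" if abs(z) > z_crit else "fail to reject H0"
print(f"Z = {z:.2f}, cut-off = ±{z_crit:.2f}, p = {p:.1e}: {decision}")
```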
In summary, as already explained, the GPA of education majors is significantly different from that of the general student body: we rejected H0 and concluded that the difference was significant. Now, the rule of thumb. If the test statistic is in the critical region (with α = 0.05, beyond ±1.96), reject H0: the difference is significant. If it does not fall in the critical region, that is, if it lies between −1.96 and +1.96, then, as I told you, we should never say that we have accepted the null hypothesis; we always say that we failed to reject the null hypothesis. Understand this difference in wording. In everyday interpretation people say "we accept the null hypothesis", but that is the wrong interpretation; we should say that we fail to reject it. So if the statistic falls between −1.96 and +1.96, we say the difference is not significant and that it is only a matter of chance that it happened this time. There is also Student's t distribution for small samples. This is the formula for the t test, which is similar to the one we used when we did not have the population standard deviation. I have brought some other problems, but you can do them later. Thank you for this session; we will meet in the next session, where we will continue with the t and z formulas.
Then we will get into a third condition, where we have more than two samples. Until now we have only worked with one sample; we have not yet done the two-sample case and the others, so we will continue in the next session. I hope you now have clarity on what the null is, what the alternate is, and how to test the null. Thank you so much.
Info
Channel: Marketing research and analysis
Views: 192,144
Rating: 4.7245359 out of 5
Keywords: Hypothesis Testing: T-Test, Z-Test
Id: zmyh7nCjmsg
Length: 31min 47sec (1907 seconds)
Published: Sat Aug 12 2017