An Introduction to Statistical Inference

Video Statistics and Information

Captions
In 1972, as part of a study on gender discrimination, 48 male bank supervisors were each given the same personnel file and asked to judge whether the person should be promoted to a branch manager job that was described as routine. The files were identical, except that half of the supervisors had files showing the person was male, while the other half had files showing the person was female. It was randomly determined which supervisors got male applications and which got female applications. Of the 48 files reviewed, 35 were promoted. The study is testing whether females are unfairly discriminated against.

Let's take a look at the data. The proportion of males promoted is 21 out of 24, roughly 88%, and the proportion of females promoted is 14 out of 24, roughly 58%. So there's a considerable difference between the proportions of males and females promoted in this study.

There are two possible explanations as to what might be going on in this study, and these are our two competing claims. One: there's nothing going on. Promotion and gender are independent, there's no gender discrimination, and the observed difference in proportions is simply due to chance. This is our null hypothesis. Two: there is something going on. Promotion and gender are dependent on each other, there is gender discrimination, and the observed difference in proportions is not due to chance. This is the alternative hypothesis.

Hypothesis testing is very much like a court trial in the U.S.
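As a quick check on the arithmetic, the sketch below reproduces the two observed promotion rates and their difference from the counts quoted above. The variable names are illustrative only. Note that 21/24 is exactly 87.5% and the difference works out to about 29.2 percentage points, which the transcript later rounds to 30%.

```python
# Counts from the study: 24 files per group.
male_promoted, male_total = 21, 24
female_promoted, female_total = 14, 24

# Observed promotion rates in each group.
p_male = male_promoted / male_total
p_female = female_promoted / female_total

# The statistic tracked throughout: male rate minus female rate.
observed_diff = p_male - p_female

print(f"males:      {p_male:.1%}")
print(f"females:    {p_female:.1%}")
print(f"difference: {observed_diff:.1%}")
```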
The null hypothesis says that the defendant is innocent, and the alternative hypothesis says that the defendant is guilty. We then present evidence, or in other words, collect data. Then we judge this evidence and ask ourselves the question: could these data plausibly have happened by chance if the null hypothesis were true? If the data were likely to have occurred under the assumption that the null hypothesis were true, then we would fail to reject the null hypothesis and state that the evidence is not sufficient to suggest that the defendant is guilty. Note that when this happens, the jury returns with a verdict of "not guilty." The jury does not say that the defendant is innocent, just that there is not enough evidence to convict. The defendant may in fact be innocent, but the jury has no way of being sure. Stated statistically, we fail to reject the null hypothesis. We never declare the null hypothesis to be true, because we do not know and cannot prove whether it's true or not. Therefore we also never say that we accept the null hypothesis. If the data were very unlikely to have occurred, then the evidence raises more than a reasonable doubt in our minds about the null hypothesis, and hence we reject the null hypothesis in favor of the alternative hypothesis of guilty.

In a trial, the burden of proof is on the prosecution. In a hypothesis test, the burden of proof is on the unusual claim. The null hypothesis is the ordinary state of affairs, the status quo, so it's the alternative hypothesis that we must consider unusual and for which we must gather evidence.

So to recap: we start with a null hypothesis that represents the status quo. We also have an alternative hypothesis that represents our research question, in other words, what we're testing for. We conduct a hypothesis test under the assumption that the null hypothesis is true, either via simulation or using theoretical methods. If the test results suggest that the data do not provide convincing evidence for the alternative
hypothesis, we stick with the null hypothesis. If they do, then we reject the null hypothesis in favor of the alternative.

If you have a deck of playing cards handy, you can actually conduct the simulation yourself with me. Remember, the objective is to conduct a simulation under the assumption that the null hypothesis is true, in other words, assuming there is no gender discrimination and that any observed differences in promotion rates are simply due to chance.

First, we're going to let a face card represent a file that was not promoted and a non-face card represent a promoted file. We start by setting aside the jokers. There are 52 cards in a deck, but only 48 files in our experiment, so to simulate the experiment we need to remove some cards to reach a total sample size of 48. We take cards out in such a way that the distribution of face and non-face cards matches the distribution of not-promoted and promoted files. So we're going to take out three aces. Counting aces as face cards for this simulation, that leaves thirteen face cards in the deck (the remaining ace plus the kings, queens, and jacks), which is the total number of files that were not promoted. We're also going to take out one number card, just any number card, so that there are exactly 35 number cards left in the deck, representing the promoted files.

Let's repeat that one more time with a bit of visual aid. We're taking out three aces and one number card, a total of four cards out of a full deck of 52, and hence we're left with a deck of 48 cards, the same number as the observations in our study. Number cards represent files that were promoted, and there are 35 of them; face cards represent files that were not promoted, and there are 13 of those. Then we shuffle the cards and deal them into two groups of size 24, representing males and females. Note that the random shuffling is what simulates this idea of leaving things up to chance. And here is some visual aid to go along
with that as well. Next, we count how many number cards are in each group, which represent the promoted files, calculate the proportion of promoted files in each group, and take the difference between the proportions of males and females promoted, just like we did with the original data.

Let's go through the results of my simulation together. If you have been following along with your own deck of cards, you might have different results than mine, since the shuffling and splitting into two piles was done completely randomly. Since we're randomly splitting the promoted files into two groups, we would expect to see no difference between the proportions of male and female promotions, in other words, between the proportions of number cards in the male and female piles. That being said, the simulated difference may not be exactly zero. In this case we had 18 number cards in the male pile, which yields a 75% promotion rate among the males, and there are 17 number cards in the female pile, yielding a 70.8% promotion rate. The difference between these simulated promotion rates is what we want to keep track of. We expect this number to be zero, but we also expect it to vary, and we want to know how much it varies so that we can compare our original difference of about 30% to the distribution of differences simulated under the assumption of independence between promotion decisions and gender. In this case we calculated a difference of 4.2%, so we note that before we proceed to the next simulation. Once we're done with one simulation, we repeat the shuffling, dealing, and calculating steps many times to build a distribution of simulated differences.

So let's go through this one more time. We're going to start by shuffling the cards. Usually, if you have a full deck of cards, it makes sense to shuffle them about seven times to get a truly random shuffle. When you're done with that, what we want to do is to split this into two equally sized decks of size 24, representing the males and the females. It doesn't really matter
which one you're calling male versus female, so let's just say this is our male pile and this is our female pile. The next step is going to be to determine how many files were promoted in each pile, which means we need to count the number of number cards in each pile. Among the males, I'm counting 17 number cards, so we have 17 out of 24 males promoted, which leaves 18 out of 24 females promoted, since there are 35 promoted files in total. In the next step we need to calculate the proportions, take the difference, and note that on our dot plot. We would repeat this many, many times to build a simulation distribution.

So how do we ultimately make a decision? If the results from the simulations look like the data, then we decide that the difference between the proportions of promoted files for males and females could plausibly be due to chance, and that the data do not provide convincing evidence against promotion and gender being independent. If, on the other hand, the results from the simulations do not look like the data, then we decide that the observed difference in the promotion rates was unlikely to have happened just by chance and that it can be attributed to an actual effect of gender. In other words, we conclude that these data provide evidence of a dependency between promotion decisions and gender.

If we repeat the simulation many times and record the simulated differences in proportions of males and females promoted, we can build a distribution like this one. For example, here we have a dot plot of the distribution of the simulated differences in promotion rates based on 100 simulations. While we showed earlier how to simulate this experiment using playing cards, we should note that the task of simulation is best left up to computation: it's faster and less prone to errors. The distribution is centered at zero, which we can also think about as the null value, since according to the null hypothesis there should be no difference between the promotion rates of males and females, yielding a difference of zero. We can see from the
distribution of the simulated differences in promotion rates that it is very rare to get a difference as high as 30%, the observed difference from the original data, if in fact gender does not play a part in promotion decisions. The low probability of this event, or of a difference even more extreme, suggests that promotion decisions may not be independent of gender, and so we would reject the null hypothesis. Our conclusion is then that these data show convincing evidence of an association between gender and promotion decisions made by male bank supervisors.

We just walked through a brief example that introduces you to statistical inference, and more specifically to hypothesis tests. We started by setting a null and an alternative hypothesis. Then we simulated the experiment assuming that the null hypothesis was true. We evaluated the probability of observing an outcome at least as extreme as the one observed in the original data, and since this probability was low, we decided to reject the null hypothesis in favor of the alternative. The probability of observing data at least as extreme as those observed in the original study, under the assumption that the null hypothesis is true, is called the p-value. It is one of the commonly used criteria for making decisions between competing hypotheses. We will continue our discussion of p-values and hypothesis tests in future units, and learn various methods for conducting hypothesis tests for various types of data.
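The transcript recommends leaving the card-shuffling simulation to a computer. The sketch below is a minimal Python version of that idea, built on a few assumptions that are not in the original. It uses 10,000 repetitions instead of the 100 shown in the dot plot and a fixed seed for reproducibility. It also computes a one-sided p-value that counts simulated differences at least as large as the observed one, because the alternative is that females are discriminated against. The names `simulate_difference` and `estimate_p_value` are illustrative only.

```python
import random

# Deck mirroring the study: 35 promoted files (number cards, coded 1)
# and 13 not-promoted files (face cards, coded 0).
N_PROMOTED, N_NOT_PROMOTED = 35, 13
GROUP_SIZE = 24
OBSERVED_DIFF = 21 / 24 - 14 / 24  # observed male rate minus female rate


def simulate_difference(rng: random.Random) -> float:
    """Shuffle the deck, deal two piles of 24, return male minus female promotion rate."""
    deck = [1] * N_PROMOTED + [0] * N_NOT_PROMOTED
    rng.shuffle(deck)
    male_pile, female_pile = deck[:GROUP_SIZE], deck[GROUP_SIZE:]
    return sum(male_pile) / GROUP_SIZE - sum(female_pile) / GROUP_SIZE


def estimate_p_value(n_sims: int = 10_000, seed: int = 1972) -> float:
    """Fraction of simulated differences at least as large as the observed one.

    Assumption: a one-sided test and 10,000 repetitions (the video's dot plot used 100).
    """
    rng = random.Random(seed)
    diffs = [simulate_difference(rng) for _ in range(n_sims)]
    # Small tolerance guards against floating-point rounding when a
    # simulated difference equals the observed one.
    extreme = sum(d >= OBSERVED_DIFF - 1e-9 for d in diffs)
    return extreme / n_sims


if __name__ == "__main__":
    print(f"estimated one-sided p-value: {estimate_p_value():.4f}")
```

With the study's counts, the estimate typically comes out at a few percent. This agrees with the transcript's point that a difference as large as the observed one is rare if promotion and gender are independent. Because both piles share the same 35 promoted files, every simulated difference has the form (2k - 35)/24, where k is the number of promoted files in the male pile.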
Info
Channel: Research Channel
Views: 14,441
Rating: 4.6767678 out of 5
Keywords: statistical, inference, null, alternate, alternative, hypothesis, hypotheses, testing, simulate, design, experiments, how to, what is
Id: 1UV1Q14oiL8
Length: 12min 16sec (736 seconds)
Published: Wed Jan 13 2016