Null Hypothesis, p-Value, Statistical Significance, Type 1 Error and Type 2 Error

Captions
Distinguished future physicians, welcome to Stomp on Step 1, the only free video series that helps you study more efficiently by focusing on the highest yield material. I’m Brian McDaniel, and I will be your guide on this journey through the null hypothesis, alternative hypothesis, Type I and Type II error, p-value, alpha, beta, power, and statistical significance. This is the 11th video in my playlist covering all of biostatistics and epidemiology for the USMLE Step 1 Medical Board Exam. There is a lot to cover, but we will try to move through things quickly and break them down into bite-sized pieces.

We will start with the null hypothesis, which is represented by H with a subscript zero (H0). The null hypothesis states that there is no difference between the groups being studied: there is no relationship between the risk factor or treatment being studied and the occurrence of the health outcome. For example, if we are comparing a placebo group to a group receiving a new diabetes medication, the null hypothesis states that blood sugars and medical complications would be roughly the same in each group. We will talk about this more in a second, but by default you assume the null hypothesis is correct until you have enough evidence to support rejecting it. If you are the researcher, it is usually kind of a bummer when the null hypothesis holds, because it means you didn’t find a treatment that works, or that the risk factor you are studying isn’t as important as you were hoping.

The alternative hypothesis is denoted by Ha or H1. As you might expect, it is the opposite of the null hypothesis: it states that there is a difference between the groups with regard to what is being studied. In other words, there is a relationship between the risk factor or treatment and the occurrence of the health outcome. Obviously, the researcher wants the alternative hypothesis to be true, because it means they discovered a treatment that improves patient outcomes or identified a risk factor that is important in the development of a health outcome.

However, you never prove the alternative hypothesis is true. You can only reject a hypothesis (say it is false) or fail to reject a hypothesis (it could be true, but you can never be totally sure). So a researcher really wants to reject the null hypothesis, because that is as close as they can get to proving the alternative hypothesis is true. In other words, you can’t prove that a given treatment caused a change in outcomes, but you can show that this conclusion is valid by showing that the opposite hypothesis (the null hypothesis) is highly improbable given your data.

Anytime you reject a hypothesis there is a chance you made a mistake: you rejected a hypothesis that is true, or failed to reject a hypothesis that is false. A Type I error is when you incorrectly reject the null hypothesis. The researcher says there is a difference between the groups when there really isn’t. It can be thought of as a false positive study result. Usually we focus on the null hypothesis and Type I error, because researchers want to show a difference between groups, and any intentional or unintentional bias is more likely to exaggerate the differences between groups based on this desire. The probability of making a Type I error is called alpha. You can remember this because alpha is the first letter of the Greek alphabet, so it goes with Type I error.
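To make alpha concrete, here is a minimal simulation sketch (my own illustration, not from the video; all numbers are invented). Both groups are drawn from the same population, so the null hypothesis is true by construction, and with alpha = 0.05 roughly 5% of the comparisons still come out "significant". Each of those rejections is a Type I error.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
alpha = 0.05          # our chosen probability of a Type I error
n_trials = 10_000     # number of simulated "studies"

false_positives = 0
for _ in range(n_trials):
    # Both groups come from the same population, so the null hypothesis
    # is true by construction and any rejection is a Type I error.
    group_a = rng.normal(loc=120, scale=15, size=30)
    group_b = rng.normal(loc=120, scale=15, size=30)
    _, p_value = stats.ttest_ind(group_a, group_b)
    if p_value < alpha:
        false_positives += 1  # incorrectly rejected a true null hypothesis

print(f"simulated Type I error rate: {false_positives / n_trials:.3f}")  # ~0.05
```

Running this prints an error rate close to 0.05, which is exactly what choosing that alpha accepts up front.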
I’m going to hold off on talking about alpha and the p-value for a few slides. A Type II error is when you fail to reject the null hypothesis when you should have rejected it. The researcher says there is no difference between the groups when there is a real difference. It can be thought of as a false negative study result. The probability of making a Type II error is called beta. You can remember this because beta is the second letter of the Greek alphabet.

Power is the probability of finding a difference between groups if one truly exists: the percentage chance that you will be able to reject the null hypothesis if it is really false. Power can also be thought of as the probability of not making a Type II error. In equation form, Power = 1 - beta. It is good for a study to have high power; a cutoff for differentiating high from low power is roughly 0.8 or 80%, which is the same as saying a beta below 20% is good. Where power comes into play most often is while the study is being designed. Before you even start the study you may do power calculations based on projections, so that you can tweak the design before you start and avoid performing an entire study with such low power that you are unlikely to learn anything. Power increases as you increase sample size, because you have more data from which to draw a conclusion. Power also increases as the effect size, the actual difference between the groups, increases: detecting a huge difference between groups is a lot easier than detecting a very small one. Increasing the precision (or decreasing the standard deviation) of your results also increases power, because it is easier to come to a conclusion when all of your results are very similar than when they are all over the place. (A short power calculation appears in the sketch below.)

The p-value is the probability of obtaining a result at least as extreme as the current one, assuming that the null hypothesis is true. Imagine we did a study comparing a placebo group to a group that received a new blood pressure medication, and the mean blood pressure in the treatment group was 20 mm Hg lower than in the placebo group. Assuming the null hypothesis is correct, the p-value is the probability that, if we repeated the study, the observed difference between the group averages would be at least 20 mm Hg.

You have probably picked up on the fact that I keep adding the caveat that this definition of the p-value only holds if the null hypothesis is true (i.e., if there is no real difference between the groups). Don’t let that throw you off. You simply assume this is the case in order to perform the test, because we have to start from somewhere; it is not as if you have to prove the null hypothesis is true before you can use the p-value.

The p-value is a measurement of how much the observed data disagree with the null hypothesis. When the p-value is very small, there is more disagreement between our data and the null hypothesis, and we can begin to consider rejecting it (i.e., saying there is a real difference between the groups being studied): the data suggest it is less likely that the groups being studied are the same. When the p-value is high, there is less disagreement between our data and the null hypothesis, it is more likely that the groups being studied are the same, and we will likely fail to reject the null hypothesis.
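As a sketch of the two ideas above (again my own illustration with invented numbers, not the video’s data), the following simulates the blood pressure trial and computes its p-value with a t-test, then runs the kind of pre-study power calculation just described using the statsmodels library. The effect size of 0.5 and the 80% power target are assumptions chosen for the example.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

rng = np.random.default_rng(seed=1)

# Hypothetical trial: placebo mean ~150 mm Hg, treatment ~20 mm Hg lower.
placebo = rng.normal(loc=150, scale=12, size=40)
treated = rng.normal(loc=130, scale=12, size=40)

t_stat, p_value = stats.ttest_ind(treated, placebo)
print(f"observed difference: {placebo.mean() - treated.mean():.1f} mm Hg")
print(f"p-value: {p_value:.3g}")  # tiny p-value: data strongly disagree with the null

# Pre-study power calculation, as described above: how many subjects per
# group are needed to detect a standardized effect size of 0.5 with
# alpha = 0.05 and the usual target power of 0.8 (i.e., beta = 0.2)?
n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"required sample size per group: {n_per_group:.0f}")  # roughly 64
```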
You may be wondering what determines whether a p-value is “low” or “high.” That is where the selected level of significance, alpha, comes in. As we have already discussed, alpha is the probability of making a Type I error (incorrectly rejecting the null hypothesis). It is a selected cutoff point that determines whether we consider a p-value acceptably high or low. If our p-value is lower than alpha, we conclude that there is a statistically significant difference between groups; if the p-value is higher than our significance level, we conclude that the observed difference between groups is not statistically significant. Alpha is arbitrarily defined: a 5% level of significance is most commonly used in medicine, based only on the consensus of researchers, and using a 5% alpha implies that a 5% probability of incorrectly rejecting the null hypothesis is acceptable. Because the cutoff is arbitrary, other alphas such as 10% or 1% are used in certain situations. So here is the key that you need to understand: in most cases in medicine, if the p-value of a study is less than 5%, there is a statistically significant difference between groups; if the p-value is more than 5%, there is not a statistically significant difference between groups.

There are a couple of caveats that complicate things a bit; both are related to the fact that you can’t take statistics out of context to make conclusions. First, statistical significance is not the same thing as clinical significance. Clinical significance is the practical importance of the finding. There may be a statistically significant difference between two drugs, but the difference may be so small that using one over the other is not a big deal. For example, you might show that a new blood pressure medication is a statistically significant improvement over an older drug, but if the new drug only lowers blood pressure by 1 more mm Hg on average, it won’t have a meaningful impact on the outcomes that are important to patients. (The sketch below shows this effect.)

Second, it is often incorrectly stated (by students, researchers, review books, etc.) that the p-value can be used to determine whether the observed difference between groups is due to chance (or random sampling error); in other words, that “if my p-value is less than alpha, then there is less than a 5% probability that the null hypothesis is true.” While this may be easier to understand, and perhaps even enough of an understanding to get test questions right, it is a misinterpretation of the p-value. The p-value is a tool that can only tell us the observed data’s level of agreement or disagreement with the null hypothesis; it cannot by itself answer the bigger question of whether our results were caused by random error. To make larger conclusions about research results you also need to consider additional factors, such as the design of the study and the results of other studies on similar topics. It is possible for a study to have a p-value of less than 0.05 yet be poorly designed and/or disagree with all of the available research on the topic. Statistics cannot be viewed in a vacuum when making conclusions, and the results of a single study can only cast doubt on the null hypothesis if the assumptions made during the design of the study are true.
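Here is a minimal sketch of the statistical-versus-clinical-significance point (my own illustration with invented numbers): with a very large sample, even a 1 mm Hg difference produces a p-value below 0.05, yet the finding has no practical importance to patients.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
n = 20_000  # an enormous trial

old_drug = rng.normal(loc=140, scale=15, size=n)
new_drug = rng.normal(loc=139, scale=15, size=n)  # only ~1 mm Hg better

_, p_value = stats.ttest_ind(new_drug, old_drug)
print(f"mean difference: {old_drug.mean() - new_drug.mean():.2f} mm Hg")
print(f"p-value: {p_value:.2g}")
# The p-value is typically far below 0.05 here, so the result is
# statistically significant, yet a 1 mm Hg drop is clinically trivial.
```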
A simple way to illustrate this is to remember that, by definition, the p-value is calculated using the assumption that the null hypothesis is correct. Therefore, there is no way the p-value can be used to prove that the alternative hypothesis is true.

Another way to show the pitfalls of blindly applying the p-value is to imagine a researcher who flips a coin 5 times and gets 5 heads in a row. A one-tailed test gives a p-value of about 0.03. Using the standard alpha of 0.05, this result would be deemed statistically significant and we would reject the null hypothesis. Based solely on this data, the naive conclusion would be that heads will show up significantly more often than tails on subsequent flips, with at most a 5% chance of being wrong. However, we know this conclusion is incorrect, because the study’s sample size was far too small and there is plenty of external data showing that coins are fair (given enough flips you will get heads about 50% of the time and tails about 50% of the time). In actuality, the probability that the null hypothesis is true is not the 3% a naive reading of the p-value suggests; for a fair coin it is essentially 100%.

Lastly, we have statistical hypothesis testing, which is how we test the null hypothesis and determine statistical significance. For the USMLE Step 1 Medical Board Exam, all you need to know is when to use the different tests; you don’t need to know how to actually perform them. When you are comparing the mean (average) of 2 groups, you use the t-test. When you are comparing the means of 3 or more groups, you use an ANOVA test. When you are working with categorical variables instead of numerical variables, you use a chi-squared test. Categorical variables are not measured on a continuous numerical scale; instead you have categories, such as gender or the presence or absence of a disease. (A short sketch of the coin-flip calculation and these three tests appears at the end of this transcript.)

That brings us to the end of the video. I’d like to give a big thanks to Brittany Hale & dave carlson for going to my website StompOnStep1.com and making donations which helped to fund this video. If you found this video useful, please comment below, as it really helps me out. If you would like to be taken directly to the next video in the series, which will cover confidence intervals, you can click on the black box here if you are watching on a computer. That video is very much related to this one, so I definitely suggest checking it out. Thank you so much for watching and good luck with the rest of your studying.
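As promised above, here is a minimal sketch (my own addition; the data are invented for illustration) of the coin-flip p-value and the three tests named in the transcript, using scipy.

```python
import numpy as np
from scipy import stats

# Coin flip: 5 heads in 5 flips, one-tailed test against a fair coin.
# p = 0.5 ** 5 = 0.03125, the ~0.03 quoted in the example above.
# (stats.binomtest requires scipy >= 1.7.)
coin = stats.binomtest(k=5, n=5, p=0.5, alternative="greater")
print(f"one-tailed coin-flip p-value: {coin.pvalue:.5f}")

rng = np.random.default_rng(seed=3)
group_a = rng.normal(100, 10, size=30)
group_b = rng.normal(105, 10, size=30)
group_c = rng.normal(110, 10, size=30)

# Comparing the means of 2 groups: t-test
print(stats.ttest_ind(group_a, group_b))

# Comparing the means of 3 or more groups: ANOVA
print(stats.f_oneway(group_a, group_b, group_c))

# Categorical variables (e.g., disease present/absent by gender): chi-squared
# test on a 2x2 table of made-up counts.
table = np.array([[20, 30],
                  [25, 25]])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi-squared p-value: {p:.3f}")
```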
Info
Channel: Stomp On Step 1
Views: 1,275,384
Id: YSwmpAmLV2s
Length: 15min 53sec (953 seconds)
Published: Wed Apr 20 2016