Distinguished future physicians, welcome to
Stomp on Step 1, the only free video series that helps you study more efficiently by focusing
on the highest yield material. I’m Brian McDaniel and I will be your guide
on this journey through Null Hypothesis, Alternative Hypothesis, Type I and Type II Error, p-Value,
alpha, beta, power & Statistical Significance. This is the 11th video in my playlist covering
all of biostatistics and Epidemiology for the USMLE Step 1 Medical Board Exam. There is a lot to cover, but we will try to move through things quickly and break them down into bite-sized pieces.
We will start with the Null Hypothesis, which is represented by H subscript zero. The null hypothesis states that there is no difference
between the groups being studied. In other words, there is no relationship between the risk factor or treatment being studied and the occurrence of the health outcome. For example, if we are comparing a placebo group to a group receiving a new diabetes medication, then the null hypothesis states that the blood sugars or medical complications would be roughly the same in each group. We will talk about this more in a second,
but by default you assume the null hypothesis is correct until you have enough evidence
to support rejecting this hypothesis. If you are the researcher, it is usually kind
of a bummer when the null hypothesis is valid, because it means you didn’t find a treatment
that works or that the risk factor you are studying isn’t as important as you were
hoping. The Alternative Hypothesis is denoted by H
subscript a or H1. As you might expect, it is the opposite of the null hypothesis. This hypothesis states that there is a difference between groups; the research groups are different with regard to what is being studied. In other words, there is a relationship between the risk factor or treatment and the occurrence of the health outcome.
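To make that concrete, here is one way to write the two hypotheses for the diabetes example, assuming the outcome we compare is mean blood sugar (the μ symbols for the group means are my notation, not something from the video):

```latex
H_0 : \mu_{\text{medication}} = \mu_{\text{placebo}}
\qquad
H_a : \mu_{\text{medication}} \neq \mu_{\text{placebo}}
```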
Obviously, the researcher wants the alternative hypothesis to be true. If Ha is true, it means they discovered
a treatment that improves patient outcomes or identified a risk factor that is important
in the development of a health outcome. However, you never prove the alternative hypothesis
is true. You can only reject a hypothesis (say it is
false) or fail to reject a hypothesis (could be true but you can never be totally sure). So a researcher really wants to reject the
null hypothesis, because that is as close as they can get to proving the alternative
hypothesis is true. In other words, you can't prove that a given treatment caused a change in outcomes, but you can show that conclusion is valid by showing that the opposite hypothesis (the null hypothesis) is highly improbable given your data.
Anytime you reject a hypothesis there is a
chance you made a mistake. This would mean you rejected a hypothesis
that is true or failed to reject a hypothesis that is false.
A Type I Error is when you incorrectly reject the null hypothesis. The researcher says there is a difference
between the groups when there really isn’t. It can be thought of as a false positive study
result. Usually we focus on the null hypothesis and Type I error, because the researchers want to show a difference between groups. If there is any intentional or unintentional bias, it is more likely to exaggerate the differences between groups based on this desire. The probability of making a Type I Error is
called alpha. You can remember this by thinking that alpha
is the first letter in the Greek alphabet, so it goes with Type I error. I'm gonna hold off on talking about alpha
and p-value for a few slides.
A Type II Error is when you fail to reject the null hypothesis when you should have rejected it. The researcher says there is no difference
between the groups when there is a real difference. It can be thought of as a false negative study
result. The probability of making a Type II Error
is called beta. You can remember this by thinking that β (beta) is the second letter in the Greek alphabet.
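If you want to see alpha and beta in action, here is a minimal simulation sketch in Python (the sample sizes, effect size, and seed are arbitrary values I picked for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n, alpha = 2_000, 50, 0.05

# Null hypothesis TRUE: both groups drawn from the same distribution.
# Rejecting here is a Type I error; the rate should land near alpha.
type1 = sum(
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).pvalue < alpha
    for _ in range(n_trials)
) / n_trials

# Null hypothesis FALSE: the groups truly differ by 0.5 standard deviations.
# Failing to reject here is a Type II error; that rate is beta.
type2 = sum(
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0.5, 1, n)).pvalue >= alpha
    for _ in range(n_trials)
) / n_trials

print(f"Type I error rate (alpha): {type1:.3f}")   # close to 0.05
print(f"Type II error rate (beta): {type2:.3f}")
print(f"Power (1 - beta): {1 - type2:.3f}")
```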
Power is the probability of finding a difference between groups if one truly exists. It is the percentage chance that you will be able to reject the null hypothesis if it is really false. Power can also be thought of as the probability of not making a Type II error. In equation form, Power = 1 - beta. It is good for a study to have high power. A cutoff for differentiating high from low power would be roughly around 0.8 or 80%. In other words, having a beta less than 20% for a given study is good. Where power comes into play most often is
while the study is being designed. Before you even start the study you may do
power calculations based on projections. That way you can tweak the design of the study
before you start it and potentially avoid performing an entire study that has really
low power since you are unlikely to learn anything. Power increases as you increase sample size,
because you have more data from which to make a conclusion. Power also increases as the effect size, or actual difference between the groups, increases. If you are trying to detect a huge difference between groups, it is a lot easier than detecting a very small difference between groups. Increasing the precision (or decreasing the standard deviation) of your results also increases power. If all of the results you have are very similar, it is easier to come to a conclusion than if your results are all over the place.
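Those relationships are exactly what a power calculation quantifies. Here is a minimal sketch using the statsmodels library (the effect size, alpha, and sample sizes are illustrative values, not numbers from the video):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of a two-group t-test with a medium effect size (Cohen's d = 0.5)
# and 64 subjects per group at alpha = 0.05: roughly 0.80.
print(analysis.solve_power(effect_size=0.5, nobs1=64, alpha=0.05))

# Sample size needed per group to reach 80% power for the same effect size.
print(analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05))

# Power rises as the sample size grows.
for n in (20, 64, 200):
    print(n, analysis.solve_power(effect_size=0.5, nobs1=n, alpha=0.05))
```

In a real study you would plug in an effect size estimated from pilot data or prior studies on the topic.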
The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming that the null hypothesis is true. Imagine we did a study comparing a placebo
group to a group that received a new blood pressure medication and the mean blood pressure
in the treatment group was 20 mm Hg lower than in the placebo group. Assuming the null hypothesis is correct, the p-value is the probability that, if we repeated the study, the observed difference between the group averages would be at least 20 mm Hg. Now you have probably picked up on the fact
that I keep adding the caveat that this definition of the p-value only holds true if the null
hypothesis is correct (AKA there is no real difference between the groups). However, don't let that throw you off. You just assume this is the case in order
to perform this test because we have to start from somewhere. It is not as if you have to prove the null
hypothesis is true before you utilize the p-value. The p-value is a measurement to tell us how
much the observed data disagrees with the null hypothesis. When the p-value is very small, there is more disagreement between our data and the null hypothesis, and we can begin to consider rejecting the null hypothesis (AKA saying there is a real difference between the groups being studied). In other words, when the p-value is very small, our data suggests it is less likely that the groups being studied are the same. Therefore, when the p-value is very low, our data is incompatible with the null hypothesis and we will reject the null hypothesis. When the p-value is high there is less disagreement
between our data and the null hypothesis. In other words, when the p-value is high it
is more likely that the groups being studied are the same. In this scenario we will likely fail to reject
the null hypothesis. You may be wondering what determines whether
a p-value is “low” or “high.” That is where the selected “Level of Significance”
or Alpha comes in. As we have already discussed, alpha is the probability of making a Type I Error (or the probability of incorrectly rejecting the null hypothesis). It is a selected cutoff point that determines whether we consider a p-value acceptably high or low. If our p-value is lower than alpha, we conclude
that there is a statistically significant difference between groups. When the p-value is higher than our significance
level we conclude that the observed difference between groups is not statistically significant. Alpha is arbitrarily defined. A 5% level of significance is most commonly
used in medicine based only on the consensus of researchers. Using a 5% alpha implies that having a 5%
probability of incorrectly rejecting the null hypothesis is acceptable. Therefore, other alphas such as 10% or 1% are used in certain situations where a different error tolerance is appropriate. So here is the key that you need to understand. In most cases in medicine, if the p-value of a study is less than 5%, then there is a statistically significant difference between groups. If the p-value is more than 5%, then there is not a statistically significant difference between groups.
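That decision rule is easy to express in code. A minimal sketch with made-up blood pressure numbers (the means, spreads, and group sizes here are all hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
placebo = rng.normal(150, 15, 40)    # hypothetical systolic BPs, placebo group
treatment = rng.normal(140, 15, 40)  # hypothetical systolic BPs, drug group

alpha = 0.05
result = stats.ttest_ind(treatment, placebo)
print(f"p-value = {result.pvalue:.4f}")

if result.pvalue < alpha:
    print("Statistically significant: reject the null hypothesis")
else:
    print("Not statistically significant: fail to reject the null hypothesis")
```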
There are a couple of caveats that complicate things a bit. Both are related to the fact that you can't take statistics out of context when making conclusions. Statistical significance is not the same thing
as clinical significance. Clinical Significance is the practical importance
of the finding. There may be a statistically significant difference between 2 drugs, but the difference is so small that using one over the other is not a big deal. For example, you might show a new blood pressure medication is a statistically significant improvement over an older drug, but if the new drug only lowers blood pressure on average by 1 more mm Hg, it won't have a meaningful impact on the outcomes that are important to patients.
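A big enough sample can make even a trivial difference statistically significant, which is one way to see the distinction. A quick illustrative sketch (the trial size and the 1 mm Hg true difference are invented numbers):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 200_000  # an enormous hypothetical trial
old_drug = rng.normal(150.0, 15, n)
new_drug = rng.normal(149.0, 15, n)  # true difference of only 1 mm Hg

# The p-value comes out astronomically small, so the result is
# "statistically significant," yet a 1 mm Hg drop in blood pressure
# is clinically meaningless.
print(stats.ttest_ind(new_drug, old_drug).pvalue)
```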
It is also often incorrectly stated (by students, researchers, review books, etc.) that the p-value can be used to determine that the observed difference between groups is due to chance (or random sampling error). In other words, "if my p-value is less than alpha then there is less than a 5% probability that the null hypothesis is true." While this may be easier to understand, and
perhaps may even be enough of an understanding to get test questions right, it is a misinterpretation of p-value. For a number of reasons, the p-value is a tool
that can only help us determine the observed data’s level of agreement or disagreement
with the null hypothesis and cannot necessarily be used for a bigger picture discussion about
whether our results were caused by random error. The p-Value alone cannot answer these larger
questions. In order to make larger conclusions about
research results you need to also consider additional factors such as the design of the
study and the results of other studies on similar topics. It is possible for a study to have a p-value
of less than 0.05, but also be poorly designed and/or disagree with all of the available
research on the topic. Statistics cannot be viewed in a vacuum when
attempting to make conclusions and the results of a single study can only cast doubt on the
null hypothesis if the assumptions made during the design of the study are true. A simple way to illustrate this is to remember
that by definition the p-value is calculated using the assumption that the null hypothesis
is correct. Therefore, there is no way that the p-Value
can be used to prove that the alternative hypothesis is true. Another way to show the pitfalls of blindly applying the p-value is to imagine a situation where a researcher flips a coin 5 times and
gets 5 heads in a row. If you performed a one-tailed test you would
get a p-value of about 0.03. Using the standard alpha of 0.05, this result would be deemed statistically significant and we would reject the null hypothesis. Based solely on this data, our conclusion would
be that there is at least a 95% chance on subsequent flips of the coin that heads will
show up significantly more often than tails. However, we know this conclusion is incorrect,
because the study's sample size was too small and there is plenty of external data to suggest
that coins are fair (given enough flips of the coin you will get heads about 50% of the
time and tails about 50% of the time). In actuality, the chance of the null hypothesis being true is not 3% like we calculated, but is actually 100%.
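For reference, the 0.03 figure is just the one-tailed probability of getting 5 heads in 5 flips of a fair coin, which you can check with scipy's exact binomial test (a quick sketch):

```python
from scipy import stats

# One-tailed exact binomial test: 5 heads out of 5 flips of a fair coin.
# P(5 heads | fair coin) = 0.5 ** 5 = 0.03125
result = stats.binomtest(k=5, n=5, p=0.5, alternative="greater")
print(result.pvalue)  # 0.03125, "significant" at alpha = 0.05

# Yet the coin is fair: the tiny sample and lack of outside context,
# not the arithmetic, are what lead us astray.
```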
Lastly, we have statistical hypothesis testing, which is how we test the null hypothesis and determine statistical significance. For the USMLE Step 1 Medical Board Exam, all you need to know is when to use the different tests. You don't need to know how to actually perform them. When you are comparing the mean or average of 2 groups, you use the t-test. When you are comparing the means of 3 or more groups, you use an ANOVA test. When you are using categorical variables instead of numerical variables, you use a chi-squared test. With categorical variables, rather than having a continuous numerical value that is measurable, you have categories such as gender or the presence or absence of a disease.
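To tie the three together, here is a minimal sketch of which scipy function matches each situation (all of the data below are invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a, b, c = (rng.normal(m, 1, 30) for m in (0.0, 0.2, 0.4))  # made-up numeric data

# Comparing the means of 2 groups -> t-test
print(stats.ttest_ind(a, b).pvalue)

# Comparing the means of 3 or more groups -> ANOVA
print(stats.f_oneway(a, b, c).pvalue)

# Categorical variables (e.g., disease present/absent by gender) -> chi-squared
table = np.array([[20, 30],   # hypothetical counts for one gender
                  [25, 25]])  # hypothetical counts for the other
chi2, p, dof, expected = stats.chi2_contingency(table)
print(p)
```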
Hale & dave carlson for going to my website StompOnStep1.com and making donations which
helped to fund this video. If you found this video useful please comment
below as it really helps me out. And if you would like to be taken directly
to the next video in the series, which will cover confidence intervals, you can click on
this black box here if you are watching on a computer. That video will be very much related to this
one so I definitely suggest checking it out. Thank you so much for watching and good luck
with the rest of your studying.