What is a p-value? by Daniel Lakens

Captions
If you read articles in the scientific literature, you often see people report p-values when they report statistical tests. P-values are widely used, and it's important to understand what they mean. They're also widely criticized, because people often misinterpret them. So in this lecture, the goal is to understand what p-values mean and how to correctly interpret them.

When we talk about p-values, the first question we should ask ourselves is why they are so popular in scientific articles. There is a reason for this, and Benjamini expresses it quite nicely: in some sense, a p-value "offers a first line of defense against being fooled by randomness, separating signal from noise." This is what p-values allow you to do. When you interpret your data, you might be very likely to interpret it in favor of the hypothesis that you have, even when the effect is only slightly in the right direction. The risk is that you're fooling yourself: you might be too likely to declare that something is going on when you're actually looking at random variation in the data. P-values are one way to prevent you from fooling yourself. A p-value tells you how surprising the data is, assuming that there is no effect. We'll look at all these aspects in more detail: what "surprising" means, why p-values are statements about the data, and why they're built on the idea that there is no effect.

Now, some people say that p-values are more accurately explained as what you use if you don't know Bayesian statistics yet. In Bayesian statistics, people don't use p-values, and I still remember having this confusion during my own PhD about whether I should use p-values or Bayesian statistics. My understanding was more or less that there was some problem with using p-values and that Bayesian statistics might be preferable, but most people didn't use Bayesian statistics, so it was fine to just continue using p-values. Now I think it's fine to use p-values, but you should interpret them
correctly. That's the goal of this lecture: to prevent this confusion and to make sure that you use p-values correctly, if you decide to use them.

Let's start with a practical example. Say you want to do a study examining the influence of calling while driving: does being on the phone in your car increase the risk of getting into an accident? You might design a study where half of the participants drive around the city while they're on the phone, and the other half drive around without being on the phone. You want to see if there is a difference, maybe in the number of people they hit while driving through the streets; or, if your ethics committee doesn't allow this, you're better off using a driving simulator. Once you have collected your data, counting how many people get hit by a car, either by drivers who are on the phone or by drivers who are not, you can look at the difference between the two conditions. This difference is never exactly zero; there's always some number after the decimal point that makes a difference. So let's say the difference you observe is a mean difference of 0.11. How should you interpret this mean difference? There are two options. Option A: what you're looking at is probably just random noise; there's always some random noise in your data. Option B: this is probably a real difference, something you should take seriously and at least examine further in future studies. Which of these two is true? We can use the p-value to differentiate between these two options. From the data that we have, we can calculate means and standard deviations, and we know the sample size. We can use these parameters to calculate a test statistic and compare this test statistic against a distribution. You can use many different types of distributions; if you examine precognition, you might want to use a
paranormal distribution, but most often people just use the normal distribution. This bell-shaped curve is something you might have seen before, and there's something you should note here: the distribution is centered on zero. When we talk about the p-value as indicating data that is surprising, assuming the null hypothesis is true, this is what we mean: we look at a distribution centered at 0. Most of the data, in this case 95 percent of it, will fall between two critical values. You might have seen the numbers 1.96 and −1.96; these are the critical values if you use an alpha level of 0.05. If data falls between these two values, it's not surprising, assuming that the null hypothesis is true. Most of the data will fall between these two critical values, but sometimes we might see a data point that's more extreme, and that is a surprising finding. Data is surprising whenever the mean difference, or the test statistic computed from this mean difference, falls in one of the two tails of the distribution. Whenever we find data that falls in these tails, it's surprising and we might want to examine it further; it also means that the p-value is smaller than 0.05.

The formal definition of a p-value is: the probability of getting the observed or more extreme data, assuming the null hypothesis is true. I highlight the word "data" here because it's important to realize that we're talking about the probability of observing data. A p-value is the probability that you observe some data; it is not the probability of a theory. This is a very common misunderstanding: people often want to make a statement about the probability that a theory is true, but when you calculate a p-value, all you can do is make a statement about the probability of the data. If you make this mistake, you're in good company. Let's take a look at an example from quantum physics, where a physicist
talks about the probability of observing a certain spin between quantum particles. This was a study where researchers measured the spin of a particle located in Delft and another located in Amsterdam, in the Netherlands. These two particles' spins are related, and this relationship, based on the data, was statistically significant with a p-value of 0.05. A physicist interviewed about this finding concludes: "in other words, there is a 96% probability they won the race." This person is making a mistake here, because by "won the race" the physicist means there's a 96% probability that the theory is correct. But that is a statement about a theory; it's not a statement about the data that was observed. So it's perhaps comforting that even a quantum physicist, someone who is supposed to be really smart, also misinterprets what a p-value means.

After you have observed a p-value smaller than 0.05, an effect is not 95% likely to be true. Think about precognition research: say I present one study to you with a statistically significant effect of precognition. Do you really think it's now 95 percent probable that precognition exists? Probably not. You cannot get the probability that the null hypothesis is true, given the data, from a p-value. If you look at the two statements below on the screen, you see that the probability of the data (or more extreme data) assuming the null hypothesis is true is not the same as the probability of a hypothesis given some data that you have observed. These two probabilities can differ widely. If you want to know the probability that a theory is true, you need to use Bayesian statistics; Bayesian statistics is the only approach that allows you to make statements about the probability that a theory is true.

What happens if you do a study and your p-value is larger than 0.05? Well, first
of course, you spent a lot of time and effort collecting this data, and maybe you hoped to find a statistically significant effect, so the first thing you do is cry a little. You're a little bit depressed; that's okay. But after this, how should you interpret the data? All that we know when the p-value is larger than 0.05 is that the data we have observed is not surprising. That's all. It doesn't mean that there is no true effect; there might very well be an effect, but you just didn't have enough participants in your study to detect it. Remember that you need large samples to statistically detect a small effect. So a p-value larger than 0.05 does not allow you to conclude that there is no effect; there might be a very small effect, and you don't know.

Personally, I try to think of a p-value larger than 0.05 as "mu," which is a concept from Zen Buddhism. In Zen Buddhism there's a famous saying that goes like this: a monk asked a Chinese Zen master, "does a dog have a Buddha-nature or not?" You might expect a yes or no answer here, because that's how the question is phrased, but instead the Zen master answered "mu," which basically means "I'm un-asking the question"; it negates the question that was asked. Whenever you find a p-value larger than 0.05, you might feel the tendency to ask, "so is there an effect or not?" But whenever the p-value is larger than 0.05, you can't answer this question, so you should just answer "mu."

So how do you use p-values correctly? The first thing to understand is that p-values can be used as a rule to guide behavior in the long run. You can calculate them for every single study, but they only work in the long run. Let's take a look at how. If you use the decision rule "whenever the p-value is smaller than the alpha level" (this is your Type 1 error rate, often set to 0.05), you can act as if the data is not noise. The word "act" is very important: it's independent of what you believe is true. All that you know is that if you use
this decision rule, then in the long run you won't say that there is something when there is nothing more than 5% of the time. Alternatively, when the p-value is larger than the alpha level, you can remain uncertain, or act as if the data is just noise. These are rules that you follow in the long run: when you act as if there is an effect whenever the p-value is smaller than 0.05, in the long run you won't be wrong more than 5% of the time. This is the interpretation of p-values proposed by Neyman, and it's often used.

Let's take the discovery of the Higgs boson as an example. If you remember, during the press conference about the Higgs boson, researchers were talking about whether the 5-sigma threshold was passed; 5 sigma is used as the threshold to declare something a discovery in physics. Now, 5 sigma corresponds to a p-value smaller than about 0.0000003. Based on this idea, we can act as if the Higgs boson exists. Every now and then, of course, we will be wrong, but with such a strict threshold for an error, we'll only be wrong in one of many billions of parallel universes. So there's one parallel universe where people spent the time and effort to build a Large Hadron Collider to detect the Higgs boson, declared the result statistically significant, and were actually wrong. But with such a strict threshold this rarely happens, and we can be pretty confident that there is a Higgs boson and we didn't make a mistake.

When you interpret p-values and you want to write something about what you found, you should not write "we found a p-value smaller than 0.05, so our theory...", because then you're making a statement about a theory based on a p-value, and you shouldn't do this. The correct way to discuss a p-value smaller than 0.05 is to say "we found a p-value smaller than 0.05, so our data...". You make a statement about the data, because that's what the p-value relates to. You might say something like "so our data is in line with" some idea that you want to test.

Whenever you find a
non-significant result, a p-value larger than 0.05, you enter what's known as a degenerative research line: you made a prediction, but it doesn't hold up, so you have something to explain. One explanation might just be random variation; p-values vary, and even if you examined a true effect, every now and then you'll observe a non-significant result. So you might say, "everything's fine, this happens; if I do another study that's exactly the same, you'll see that it will pan out and my prediction will hold." Other times you might need to say, "well, the effect that I predicted might be smaller than I expected," so you do another, larger study, and then you show that the effect is really there. Nevertheless, whenever you find a non-significant result, there is something to think about; you have to explain it in some way. One way might be: if you do a lot of studies, every now and then you'll find a non-significant result, but then you need a lot of studies to support this. Other times you might say, "I have to do the study in a slightly different way," and you can use this change in the paradigm to develop a progressive research line.

Remember that p-values vary, so always think meta-analytically about p-values. This is also recommended by the statisticians who wrote about p-values in the very beginning. For example, here is a quote by Neyman and Pearson: statistical tests should be used "with discretion and understanding, and not as instruments which themselves give the final verdict." So if you calculate a statistical test, that's only one thing that should go into your reasoning when deciding whether this is a true effect or not. Always think beyond a single study: the p-value might be a starting point, but you also want to look at effect sizes and at other studies that have been done. Fisher similarly says that a single p-value is not enough to declare a discovery: "a phenomenon is experimentally demonstrable when we know how to conduct an experiment which will rarely fail to give us a
statistically significant result." So we have to repeat the experiment multiple times. He also says: "no isolated experiment, however significant in itself, can suffice for the experimental demonstration of any natural phenomenon." He's saying that we should see a single p-value as, perhaps, an invitation to explore an effect further, but it can never be enough to declare something a scientific fact. We always need to do several studies, and p-values can guide us, in the long run, toward which studies we might want to do.

At the end of this lecture, let's take a look at the p-values you can expect when there is a true effect, and the p-values you might expect when there is no true effect. I never really realized how p-values are distributed across studies when you do a lot of them, and I think it's very important to understand this for the correct interpretation of a p-value. So take a moment to think about it: what kind of p-values would you expect when there is a true effect? What kind of p-values would you expect when there is no effect?

Let's look at what really happens when there is a true effect: the p-value distribution depends on the statistical power. In this graph you see the p-values for a hundred thousand simulated studies where every study had 50% statistical power. This means that it's 50% probable that we'll observe a p-value smaller than 0.05. Looking at the p-value distribution, we indeed see that it's much more likely to observe small p-values than high p-values, and if we look at the leftmost bar, we see that indeed fifty thousand of the hundred thousand simulated studies yield a p-value that falls between 0 and 0.05. If we increase the statistical power a bit, you see that with higher power we have basically pushed more of the p-values below the significance threshold of 0.05; here, eighty thousand of the hundred thousand simulated studies yield a
significant effect. If we increase the statistical power even further, to ninety-five percent, we see that most of the p-values we'll observe, given that there is a true effect, fall below the significance level.

So which p-values can you expect when there is no effect? I really never knew this myself. I thought that p-values might be distributed such that, if there's no effect, we'd see a lot of very high p-values, or that maybe they'd follow something like a normal distribution. Instead, it turns out that when there is no effect, p-values are uniformly distributed: every p-value is equally likely. This makes a lot of sense once you understand it. In this case we have simulated a hundred thousand studies where there is no true effect, and you see that no matter where you look in the distribution, low p-values or high p-values, they're all equally likely. This also means that 5% of the p-values we observe when there is no effect fall below the 0.05 threshold. So when there's no effect, we have a 5% probability of making a Type 1 error: of saying there is a significant effect when there's actually nothing going on. Here you can see this Type 1 error rate highlighted in red; this is what it means to make a Type 1 error. The reason that it's 5% is that the p-value distribution is uniform: if you were to increase your alpha level to 0.10, ten percent of the p-values would fall below 0.10.

To conclude: it's important to understand how to correctly interpret the p-value. Its use is often criticized because people incorrectly interpret what p-values mean, and I hope that after this lecture, you won't be one of them.
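The lecture's "surprise" framing can be made concrete in a few lines of code. Below is a minimal sketch, assuming a standard normal test statistic (as in the bell-shaped distribution discussed above); the function name `two_sided_p` is illustrative, not from the lecture.

```python
import math

def two_sided_p(z):
    """Probability of a test statistic at least this extreme (in either
    tail), assuming the null hypothesis is true and Z is standard normal."""
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))) is the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(round(two_sided_p(0.50), 3))  # 0.617: unsurprising data, large p
print(round(two_sided_p(1.96), 3))  # 0.05: exactly at the critical value
print(round(two_sided_p(3.00), 4))  # 0.0027: surprising data in the tails
```

This is why 1.96 and −1.96 are the critical values for an alpha level of 0.05: a test statistic exactly at ±1.96 yields a two-sided p-value of exactly 0.05.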
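Neyman's "act as if" decision rule described above can be checked by simulation: if you declare an effect whenever p < 0.05 and the null is true, you are wrong about 5% of the time in the long run. A sketch under simplifying assumptions (two equal groups of normally distributed outcomes, as in the driving example, and a large-sample z-approximation rather than an exact t-test; all names are illustrative):

```python
import math
import random

def two_group_p(g1, g2):
    """Two-sided p-value for a difference in means, using a large-sample
    normal approximation (a z-test rather than an exact t-test)."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    v1 = sum((x - m1) ** 2 for x in g1) / (n1 - 1)  # sample variances
    v2 = sum((x - m2) ** 2 for x in g2) / (n2 - 1)
    z = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(42)
studies, n = 5_000, 50
false_alarms = 0
for _ in range(studies):
    phone   = [random.gauss(0, 1) for _ in range(n)]  # both groups drawn from
    control = [random.gauss(0, 1) for _ in range(n)]  # the same population:
    if two_group_p(phone, control) < 0.05:            # the null is true
        false_alarms += 1

print(false_alarms / studies)  # close to 0.05 in the long run
```

The observed mean difference is never exactly zero in any single simulated study, yet following the decision rule keeps the long-run rate of "saying there is something when there is nothing" near 5%.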
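The two p-value distributions described at the end (uniform under the null, skewed toward small values under a true effect) can be reproduced with a small simulation. This sketch assumes a simple one-sample z-test with known variance; the effect size 0.28 with n = 49 is chosen so that 0.28 × √49 = 1.96, giving roughly 50% power, matching the first graph in the lecture.

```python
import math
import random

def one_sample_p(xs):
    """Two-sided p-value testing mean = 0, treating sigma = 1 as known."""
    n = len(xs)
    z = (sum(xs) / n) * math.sqrt(n)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(7)
studies, n, effect = 10_000, 49, 0.28  # 0.28 * sqrt(49) = 1.96 -> ~50% power

null_ps = [one_sample_p([random.gauss(0.0, 1) for _ in range(n)])
           for _ in range(studies)]
true_ps = [one_sample_p([random.gauss(effect, 1) for _ in range(n)])
           for _ in range(studies)]

print(sum(p < 0.05 for p in null_ps) / studies)  # ~0.05: uniform under the null
print(sum(p < 0.10 for p in null_ps) / studies)  # ~0.10: alpha 0.10 catches 10%
print(sum(p < 0.05 for p in true_ps) / studies)  # ~0.50: about 50% power
```

Histogramming `null_ps` shows every bin roughly equally full, while `true_ps` piles up near zero, exactly the pattern the lecture describes.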
Info
Channel: Daniel Lakens
Views: 17,285
Keywords: p-value, statistical inferences, frequentist statistics
Id: RVxHlsIw_Do
Length: 20min 31sec (1231 seconds)
Published: Mon Sep 16 2019