10. P-value

Video Statistics and Information

Captions
[Music] It's time now to talk about the p-value. The p-value is a bit of an esoteric statistical concept, but it is central to every medical study. It is talked about all the time, even when people don't use the term, so it is really critical that you know what it is. The goals of today's lecture are to explain how we decide whether the results of a study are "significant" (significant is in scare quotes there), to describe the p-value, the primary score that drives medical studies, and to explain exactly how so many people misinterpret the p-value and what you should really do to decide whether a study is significant.

To give you a visceral understanding of how the p-value works, I'm going to crowdsource this a little. We have a quarter here, a mysterious quarter: it may be an ordinary quarter, or it may be a quarter with heads on both sides. I'm not going to tell you which. I'm just going to flip it, and after each flip you have to decide whether you think it's a two-headed coin or a regular, old-fashioned coin. The first flip comes up heads. The next flip is heads, and the next, and the next. Are any of you starting to get a little suspicious about this coin? How about this flip: heads. Six heads; probably you're feeling a bit more suspicious. Heads, heads, heads. Pretty suspicious about this coin yet? One more heads, and one last heads: 12 heads in a row. People are getting pretty darn suspicious at this point.

So how weird was that, getting 12 heads in a row? This is a simple matter of probability. If we flip once, the chance of getting heads, assuming the coin is a normal, run-of-the-mill coin, is 50%, a probability of 0.50. Two heads in a row, assuming a normal quarter: 25%, or 0.25. Three in a row: 12.5%, and so on down the line. With a normal quarter it's pretty unusual to get ten heads in a row; that happens only about one in a thousand times.

And that felt intuitive to you: somewhere between four and five heads in a row, you started to get suspicious. You started to think maybe there's something weird about this quarter, that maybe it isn't an ordinary quarter after all. Note that the probability of four to five heads in a row lies between 0.06 and 0.03, right around the vicinity of 5%. There's something just viscerally strange when things start happening at about that probability, and in fact that's why we define statistical significance at that level: statistical significance is arbitrarily defined as a p-value of less than 0.05.

A p-value is just the probability of seeing results as strange as the results you saw, or stranger, assuming that there's nothing special going on, that it's just a run-of-the-mill, ordinary quarter. So we set the threshold for statistical significance at five percent: four heads in a row is a little weird, but not crazy. Keep that in mind. When a study reports a p-value of 0.04, just think: that's about four or five head flips in a row, assuming there's nothing here, assuming the coin is a regular coin, or assuming the drug doesn't work.
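The run-of-heads arithmetic above is easy to check for yourself. Here is a minimal sketch in plain Python (not part of the lecture's materials) that tabulates the probability of a run of n heads from a fair coin and flags where it crosses the arbitrary 0.05 threshold:

```python
# Probability of n heads in a row from a fair coin is 0.5 ** n.
# The 0.05 line is the (arbitrary) statistical-significance threshold.
for n in range(1, 13):
    p = 0.5 ** n
    flag = "  <- below 0.05" if p < 0.05 else ""
    print(f"{n:2d} heads in a row: p = {p:.4f}{flag}")
```

Four heads gives 0.0625 and five gives 0.0313, exactly the "between 0.03 and 0.06" window where the audience started to get suspicious; ten in a row is about 1 in 1,000, as the lecture says.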
Now, statistical significance has nothing to do with clinical significance. Clinical significance tells you something like "this is an important medication that will save lives and that people should take." Statistical significance, as I showed you, is purely a mathematical construction. To get a sense of that, think about testing two cars to see what their top speeds are. We might show statistically that one car has a top speed of 161 miles per hour and the other a top speed of 160 miles per hour: they are not the same top speed, they are different top speeds. But honestly, between 160 and 161, do you care that much? Maybe a few of you do. I certainly don't; I don't think I've ever gone 160 miles an hour in a car, and I never want to. That is the difference between statistical significance and clinical significance. All the time in the news you'll see "this finding was statistically significant," and it sounds like a medical breakthrough. It's not. Clinical significance, which is a subjective determination, is what really matters.

So, how weird are your results? That is what the p-value is telling you. It says: assuming there's nothing interesting going on, assuming the drug doesn't work, assuming the intervention is completely ineffective, assuming the coin is a regular coin, how weird are the results you saw? If I flip a coin ten times and get six heads, you'll remember from the prior lecture that that's not very weird. But if I flip it a hundred times and get 60 heads, that feels a little weirder, and if I flip it a thousand times and get 600 heads, that's weirder still. There's something linked to sample size here. In case you didn't do the math in your head, I got 60% heads in each of those experiments, but the experiment with a thousand coin flips was much weirder, despite the same percentage, and the p-value accounts for that. The p-value for getting six heads out of ten is 0.75; that is not statistically significant, and it says that, assuming this is a regular coin, getting six heads out of ten is not that strange; it could happen all the time. For 60 heads out of a hundred, the p-value is 0.06: that's a little weird, approaching the 0.05 threshold where we start to say something strange may be going on. And if you flip a thousand times and get 600 heads, you get an incredibly low p-value. At that point you'd say, wait a second, maybe this isn't a normal coin, because that would be a really lucky set of flips. So the p-value is the chance of seeing results as weird as the ones you saw, or weirder, assuming it's a normal coin. I'll say this again and again: it is computed under the assumption that nothing interesting is going on. And it accounts for the number of flips: the sample size is cooked into the p-value, which is great.

There's a technical term for "nothing interesting going on": the null hypothesis. We often think of medical studies in the context of two hypotheses: the null hypothesis, that the drug doesn't really work, and the alternative hypothesis, that it does. The p-value is calculated assuming the null hypothesis is true: it assumes the coin has a head and a tail just like every other coin, and simply quantifies how weird your results are. So the hypothesis-testing framework here is: null hypothesis, it's a normal coin; alternative hypothesis, it's a weighted coin, or a coin that's special or different in some way so that it comes up heads more often.

This brings us to a question I like to think of as "how different is different?" Whenever I measure a value in two different groups, I'm never going to get exactly the same number in both; it's just not going to happen. So I have to quantify how different the groups are, and how sure I am that they really differ, remembering the sampling framework.
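The three coin experiments above (6 of 10, 60 of 100, 600 of 1,000 heads) can be checked with a short script. The lecture doesn't say which test produced its numbers; this is a sketch of a simple exact two-sided binomial test for a fair coin, using only the standard library, and it happens to reproduce the 0.75 and 0.06 quoted above:

```python
from math import comb

def two_sided_p(heads, flips):
    """Exact two-sided p-value for `heads` out of `flips` fair-coin
    tosses: twice the probability of a result at least this extreme,
    capped at 1. Null hypothesis: P(heads) = 0.5."""
    k = max(heads, flips - heads)          # fold onto the upper tail
    tail = sum(comb(flips, i) for i in range(k, flips + 1)) / 2 ** flips
    return min(1.0, 2 * tail)

print(two_sided_p(6, 10))      # ~0.754: 6 of 10 heads is not weird
print(two_sided_p(60, 100))    # ~0.057: 60 of 100 is borderline
print(two_sided_p(600, 1000))  # vanishingly small: 600 of 1000 is very weird
```

Note how the same 60% heads rate gives wildly different p-values as the number of flips grows: the sample size really is "cooked into" the p-value.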
It's like asking how sure I am that this bag has more gold coins than that bag when I can only sample a few coins from each. To give you an example, I measured the weights of a hundred male internal medicine doctors like me and a hundred male surgeons. What's the null hypothesis here, the "nothing interesting is going on" hypothesis? That the average weight is the same in both groups. The average weight among the medicine guys was 175 pounds; among the surgeons, 180 pounds. Those are different numbers, so are we done? Do we conclude that surgeons are on average heavier than internists? Not really, because I only took a sample of medicine guys and surgical guys; I didn't measure every single one in the world, and I'm interested in who's heavier in general. What we have to ask is: assuming the null hypothesis is true, assuming average weight doesn't vary by surgical or medical specialty, how weird is it that I would get these particular results, 175 as an average for the medical guys and 180 for the surgical guys? That's a p-value. One of the groups is always going to come out heavier; I'm never going to get exactly the same average in both. But remember the issue of sampling: I'm not measuring every surgeon in the world, I'm taking (hopefully) a random sample. So if the surgeons turn out heavier, how sure am I that they would still be heavier if I could measure every single surgeon in the world? In other words, how weird are my results, assuming there are no real differences in weight by medical specialty? There are equations for this, and the one shown here is an equation you could plug your data into to calculate a p-value. We are not going to go through the math; it looks very intimidating, but it's not as hard as it looks.
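The captions don't show the formula that was on screen, but a standard choice for comparing two group means is a Welch-style two-sample test. A hedged sketch: the 175 lb and 180 lb means and the samples of 100 come from the lecture, while the 20 lb standard deviations and the normal approximation to the t distribution are my own assumptions for illustration:

```python
from math import erf, sqrt

def two_sample_p(mean1, sd1, n1, mean2, sd2, n2):
    """Welch-style two-sample test of 'the group means are equal'.
    Uses a normal approximation to the t distribution, which is
    reasonable at n = 100 per group."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # std. error of the difference
    z = abs(mean1 - mean2) / se
    # two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

# 100 internists averaging 175 lb vs 100 surgeons averaging 180 lb,
# assuming (hypothetically) a 20 lb standard deviation in each group:
print(two_sample_p(175, 20, 100, 180, 20, 100))  # ~0.08: not significant
```

With that assumed 20 lb spread, a 5 lb difference between samples of 100 lands near, but not past, the 0.05 cutoff; a bigger spread would make it even less impressive.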
How you calculate p-values is a lecture in itself; you just need to know how to interpret them. Remember: the lower the p-value, the weirder your results, and 0.05 is an arbitrary threshold defining statistical significance in medical studies.

But be careful; here's a warning. Say I flip my coin five times and get five heads in a row. Assuming this is a normal coin, one with a head and a tail, that's pretty unusual: the p-value is 0.03. It's not crazy, it would happen about three percent of the time, but it's a little interesting. So you could say: given that the coin is a regular old coin, I'd see results this weird only three percent of the time. That's a fair statement. It is not the same as saying "I am 97% sure this coin has two heads." The results you found were a bit weird assuming the coin is normal; that does not mean the coin is abnormal. People get this wrong all the time. They say, "the p-value is 0.03, so there's only a three percent chance this is a regular coin." No, no, no: the p-value already assumes it's a regular coin. It doesn't say anything about whether the coin is regular; it just tells you how weird your results are under that assumption.

Let me give you an example. Situation one: your friend is walking down the street, sees a quarter on the ground, picks it up, flips it, and gets five heads in a row. P-value: 0.03. Situation two: a shady street magician comes up to you, pulls a quarter out of his pocket, and makes a bet: if he flips heads, you pay him a dollar; if he flips tails, he pays you two dollars. A great bet, if it's a normal coin. He proceeds to flip five heads in a row. You still have a p-value of 0.03. Which of these two situations do you think is more likely to involve the coin with two heads?
Right: the shady street magician. You've got to be with me on this; don't trust the shady street magician and his quarter. What you did there was take information from before you ever saw the coin flips and combine it with what the flips told you to reach your final conclusion. You were essentially suspicious before the street magician flipped even once, and when his flips confirmed your suspicion, you became more suspicious still. Your friend who picked a quarter up off the street? You weren't suspicious at all, because a two-headed quarter lying on the side of the street would be a really weird thing; why would it be there?

So you have to interpret p-values in the context of your prior suspicion that something is going on, that the alternative hypothesis is true. That prior suspicion has a formal name, the prior probability, and we'll talk about it more in future lectures. For now, just know that the p-value does not tell you whether the drug works; it only tells you how weird the data is assuming the drug doesn't work at all.

You'll see this in the headlines all the time. Here are two headlines with the same p-value. First: "Seat belts save lives," p = 0.04, a statistically significant result. Would you believe this study? The null hypothesis here is that seat belts don't do any good; the alternative is that they save lives. A p-value of 0.04 says that, whatever data they had, assuming seat belts don't work, that data was pretty weird in a world where seat belts don't work. We'd say, yes, that makes sense, seat belts probably do work, because we already believed before the study was even done that seat belts work; it makes sense that they should. Now take a study that says "Astrological sign predicts income" with the same p-value. Assuming the study is truthfully reported, they're telling you that under the null hypothesis, where astrological sign doesn't matter at all, this data is a little weird; you'd only get data this weird four percent of the time. Does that mean you're now 96% sure that astrological sign predicts income? No. Chances are the study just got lucky, or unlucky, however you want to think about it: this was simply one of those four-percent-of-the-time datasets, one of those pulls of the jackpot lever where the four percent comes up.

So you have to interpret these results in the context of your prior belief about whether the thing could work. P-values do not tell you the chance that something is happening. To figure out the chance that something is happening, like whether a drug or intervention works, you have to know how likely it was to work before the study happened, before the coin got flipped: your prior suspicion that something strange was going on. How do you quantify that? You can read other studies that have been done. You can use biologic plausibility: seat belts really should work, because they restrain you in your seat, whereas astrological signs have no biologic plausibility behind them. You can use any metric you want, but you do have to come up with a sense beforehand of how likely you think something is. Remember: extraordinary claims require extraordinary evidence. That maxim is true because, when a claim is very extraordinary and your data is only a little bit weird, chances are you just got slightly weird data, not that you've blown up the entire underpinnings of modern science.
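The friend-versus-magician point can be made quantitative with Bayes' rule. This is a sketch with made-up priors (the lecture gives none): under the null, the chance of five straight heads is (1/2)^5 = 1/32, while a two-headed coin produces heads with certainty.

```python
def posterior_two_headed(prior, heads_in_a_row=5):
    """P(two-headed | observed run of heads), by Bayes' rule.
    A two-headed coin produces the run with probability 1;
    a fair coin with probability 0.5 ** heads_in_a_row."""
    like_trick = 1.0
    like_fair = 0.5 ** heads_in_a_row
    return prior * like_trick / (prior * like_trick + (1 - prior) * like_fair)

# Coin found on the street: say a 1-in-10,000 prior on it being two-headed.
print(posterior_two_headed(0.0001))  # ~0.003: still almost certainly normal
# Shady street magician: say a 50% prior.
print(posterior_two_headed(0.5))     # ~0.97: now very likely two-headed
```

The data, and the p-value of 0.03, are identical in both situations; only the prior differs, and with it the conclusion. And the magician's posterior of roughly 97% is a coincidence of the prior I picked, not a restatement of 1 − p.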
So how does this work in a real study? Here's an actual example, an article close to my heart as a kidney doctor, looking at our old friend Lipitor (atorvastatin), tested here to prevent acute kidney injury after cardiac surgery. Patients who have cardiac surgery sometimes develop kidney failure, and this trial asked whether giving them some atorvastatin would improve that. What you see is that 20.8% of patients in the Lipitor group had acute kidney injury, compared with 19.5% in the placebo group. If that's all we looked at, we'd say the atorvastatin group had more acute kidney injury, 20.8 versus 19.5. Does that mean the drug hurts the kidneys? Let's look at the p-value: it was calculated at 0.75. That p-value answers the question: how weird is this data, 20.8 versus 19.5, assuming Lipitor is inert, that it neither hurts nor helps the kidneys? And the answer is: not that weird at all. You'd get data this weird or weirder 75% of the time. So this p-value of 0.75 we would call not statistically significant, and we would conclude not that Lipitor hurts the kidneys, but that it doesn't seem to have any effect one way or the other.

Here are the take-home points for today. P-values are a measure of how weird the data is, and I'll say it again, assuming nothing weird is actually going on. The threshold for statistical significance, p < 0.05, is arbitrary and has nothing to do with clinical significance. We want p-values to be the chance that the study is wrong, but they are not: remember, p-values assume the thing isn't working, that the medication does nothing. They do not tell us how likely it is that the medication works; for that you need the prior probability of the study, which is somewhat subjective. So interpret a p-value in the context of your prior belief that the thing being tested could actually work. Thanks very much.
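As a footnote to the Lipitor example above, the 20.8% versus 19.5% comparison is the kind of thing a two-proportion z-test handles. The event rates are from the lecture; the group sizes below are hypothetical, since the captions don't give the trial's enrollment:

```python
from math import erf, sqrt

def two_proportion_p(p1, n1, p2, n2):
    """Two-sided two-proportion z-test of 'the event rates are equal'."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)  # pooled event rate under the null
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    # two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

# 20.8% vs 19.5% acute kidney injury, hypothetical 300 patients per arm:
print(two_proportion_p(0.208, 300, 0.195, 300))  # ~0.7: nowhere near significant
```

With these made-up sizes the sketch lands near the trial's reported p = 0.75; the exact value depends on the real enrollment, which isn't given here.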
Info
Channel: YaleCourses
Views: 1,230
Rating: 4.8709679 out of 5
Id: vN8DwyPIW8s
Length: 18min 15sec (1095 seconds)
Published: Fri Jul 31 2020