P-values Broke Scientific Statistics—Can We Fix Them?

Video Statistics and Information

Captions
A little over a decade ago, a neuroscientist stopped by a grocery store on his way to his lab to buy a large Atlantic salmon. The fish was placed in an MRI machine, and then it completed what was called an open-ended mentalizing task, where it was asked to determine the emotions that were being experienced by different people in photos. Yes, the salmon was asked to do that. The dead one from the grocery store. But that's not the weird part. The weird part is that researchers found that so-called significant activation occurred in neural tissue in a couple of places in the dead fish. Turns out this was a little bit of a stunt. The researchers weren't studying the mental abilities of dead fish; they wanted to make a point about statistics and how scientists use them. Which is to say, stats can be done wrong, so wrong that they can make a dead fish seem alive. A lot of the issues surrounding scientific statistics come from a little something called a p-value. The P stands for probability, and it refers to the probability that you would have gotten the results you did just by chance. There are lots of other ways to provide statistical support for your conclusion in science, but the p-value is by far the most common, and, I mean, it's literally what scientists mean when they report that their findings are significant. But it's also one of the most frequently misused and misunderstood parts of scientific research, and some think it's time to get rid of it altogether. The p-value was first proposed by a statistician named Ronald Fisher in 1925. Fisher spent a lot of time thinking about how to determine if the results of a study were really meaningful, and, at least according to some accounts, his big breakthrough came after a party in the early 1920s. At this party there was a fellow scientist named Muriel Bristol, and reportedly she refused a cup of tea from Fisher because he had added the milk after the tea was poured. She only liked her tea when the milk was added first. Fisher didn't believe she could
really taste the difference, so he and a colleague designed an experiment to test her assertion. They made eight cups of tea, half of which were milk first and half of which were tea first. The order of the cups was random and, most importantly, unknown to Bristol, though she was told that there would be four of each cup. Then Fisher had her taste each tea one by one and say whether that cup was milk or tea first. And to Fisher's great surprise, she went eight for eight. She guessed correctly every time which cup was tea first and which was milk first. And that got him thinking: what are the odds that she got them all right just by guessing? In other words, if she really couldn't taste the difference, how likely would it be that she got them all right? He calculated that there are 70 possible orders for the eight cups if there are four of each mix. Therefore, the probability that she'd guessed the right one by luck alone is one in 70. Written mathematically, the value of P is about 0.014. That, in a nutshell, is a p-value: the probability that you'd get the result if chance is the only factor. In other words, there's really no relationship between the two things you're testing, in this case how tea is mixed versus how it tastes, but you could still wind up with data that suggests there is a relationship. Of course, the definition of chance varies depending on the experiment, which is why p-values depend a lot on experimental design. Say Fisher had only made six cups, three of each tea mix. Then there are only twenty possible orders for the cups, so the odds of getting them all correct is one in 20, a p-value of 0.05. Fisher went on to describe an entire field of statistics based on this idea, which we now call null hypothesis significance testing. The null hypothesis refers to the experiment's assumption of what "by chance" looks like. Basically, researchers calculate how likely it is that they've gotten the data that they did even if the effect they're testing for doesn't exist. Then, if the results
are extremely unlikely to occur if the null hypothesis is true, then they can infer that it isn't. So, in statistical speak, with a low enough p-value they can reject the null hypothesis, leaving them with whatever alternate hypothesis they had as the explanation for the results. The question becomes: how low does a p-value have to be before you can reject that null hypothesis? Well, the standard answer used in science is less than one in twenty odds, or a p-value below 0.05. The problem is, that's an arbitrary choice. It also traces back to Fisher's 1925 book, where he said one in 20 was "convenient." A year later he admitted the cutoff was somewhat subjective, but that 0.05 was generally his personal preference. Since then, the 0.05 threshold has become the gold standard in scientific research. A P of less than 0.05, and your results are "significant." It's often talked about as determining whether or not an effect is real. But the thing is, a result with a p-value of 0.05 isn't more true than one with a p-value of 0.051. It's just ever so slightly less likely to be explained by chance or sampling error. This is really key to understand: you're not more right if you get a lower p-value, because a p-value says nothing about how correct your alternate hypothesis is. Let's bring it back to the tea for a moment. Bristol aced Fisher's eight-cup study by getting them all correct, which, as we noted, has a p-value of 0.014, solidly below the 0.05 threshold. But it being unlikely that she randomly guessed doesn't prove she could taste the difference. See, it tells us nothing about other possible explanations for her correctness, like if the teas had different colors rather than tastes, or she secretly saw Fisher pouring each cup. Also, it still could have been a 1 in 70 fluke. And sometimes, one might even argue often, one in twenty is not a good enough threshold to really rule out that a result is a fluke. Which brings us back to that seemingly
undead fish. The spark of life detected in the salmon was actually an artifact of how MRI data is collected and analyzed. See, when researchers analyze MRI data, they look at small units about a cubic millimeter in volume. So for the fish, they took each of these units and compared the data before and after the pictures were shown to the fish. That means even though they were just looking at one dead fish's brain before and after, they were actually making multiple comparisons, potentially thousands of them. The same issue crops up in all sorts of big studies with lots of data, like nutritional studies where people provide detailed diet information about hundreds of foods, or behavioral studies where people fill out surveys with dozens of questions. In all cases, even though each individual comparison is unlikely, with enough comparisons you're bound to find some false positives. There are statistical solutions for this problem, of course, which are simply known as multiple comparison corrections. Though they can get fancy, they usually amount to lowering the threshold for p-value significance. And to their credit, the researchers who looked at the dead salmon also ran their data with multiple comparison corrections. When they did, their data was no longer significant. But not everyone uses these corrections, and though individual studies might give various reasons for skipping them, one thing that's hard to ignore is that researchers are under a lot of pressure to publish their work, and significant results are more likely to get published. This can lead to p-hacking: the practice of analyzing or collecting data until you get significant p-values. This doesn't have to be intentional, because researchers make many small choices that lead to different results, like we saw with the six versus eight cups of tea. This has become such a big issue because, unlike when these statistics were invented, people can now run tests lots of different ways fairly quickly and cheaply, and just go with what's most likely to
get their work published. Because of all of these issues surrounding p-values, some are arguing that we should get rid of them altogether, and one journal has totally banned them. And many who say we should ditch the p-value are pushing for an alternate statistical system called Bayesian statistics. P-values, by definition, only examine null hypotheses. The result is then used to infer if the alternative is likely. Bayesian statistics actually look at the probability of both the null and alternative hypotheses. What you wind up with is an exact ratio of how likely one explanation is compared to another. This is called a Bayes factor, and this is a much better answer if you want to know how likely you are to be wrong. This system was around when Fisher came up with p-values, but, depending on the data set, calculating Bayes factors can require some serious computing power, power that wasn't available at the time, since, you know, it was before computers. Nowadays you can have a huge network of computers thousands of miles from you to run calculations while you throw a tea party. But the truth is, replacing p-values with Bayes factors probably won't fix everything. A loftier solution is to completely separate a study's publishability from its results. This is the goal of two-step manuscript submission, where you submit an introduction to your study and a description of your method, and the journal decides whether to publish before seeing your results. That way, in theory at least, studies would get published based on whether they represent good science, not whether they worked out the way researchers hoped, or whether a p-value or Bayes factor was more or less than some arbitrary threshold. This sort of idea isn't widely used yet, but it may become more popular as statistical significance meets more sharp criticism. In the end, hopefully all this controversy surrounding p-values means that academic culture is shifting toward a clearer portrayal of what research results do and don't really show, and that
will make things more accessible for all of us who want to read and understand science, and keep any more zombie fish from showing up. Now, before I go make myself a cup of Earl Grey, milk first of course, I want to give a special shout-out to today's President of Space, SR Foxley. Thank you so much for your continued support. Patrons like you give us the freedom to dive deep into complex topics like p-values, so really, we can't thank you enough. And if you want to join SR in supporting this channel and the educational content we make here at SciShow, you can learn more at patreon.com/scishow. Cheerio! [Music]
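The arithmetic in the captions can be checked with a short Python sketch. It is only an illustration under stated assumptions, not anything shown in the video: the function names are hypothetical, the comparisons are treated as independent (which neighboring MRI units generally are not), and the correction shown is Bonferroni, one common multiple comparison correction that the captions do not name.

```python
from math import comb


def all_correct_p_value(cups: int) -> float:
    """Chance of labelling every cup correctly by guessing alone.

    Assumes half the cups are milk first, half tea first, and that the
    taster knows this, as in the tea experiment described above.
    """
    possible_orders = comb(cups, cups // 2)  # ways to pick the milk-first cups
    return 1 / possible_orders


def chance_of_any_false_positive(tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `tests` comparisons falls below `alpha`
    when no real effect exists. Assumes the comparisons are independent."""
    return 1 - (1 - alpha) ** tests


def bonferroni_threshold(tests: int, alpha: float = 0.05) -> float:
    """Per-comparison threshold under a Bonferroni correction (an assumed
    choice; the captions only say corrections lower the threshold)."""
    return alpha / tests


if __name__ == "__main__":
    print(comb(8, 4))                         # 70 possible orders for eight cups
    print(round(all_correct_p_value(8), 3))   # 0.014
    print(comb(6, 3))                         # 20 possible orders for six cups
    print(all_correct_p_value(6))             # 0.05
    print(chance_of_any_false_positive(100))  # already very close to 1
    print(bonferroni_threshold(1000))         # much stricter than 0.05
```

With a single comparison the chance of a false positive stays at the 0.05 threshold, but under these assumptions it climbs quickly as comparisons are added, which is why the corrections amount to lowering the per-comparison threshold.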
Info
Channel: SciShow
Views: 323,977
Rating: 4.9031382 out of 5
Keywords: SciShow, science, Hank, Green, education, learn, mri, fmri, salmon, fish, p-value, statistics, Ronald Fisher, tea, probability, Null Hypothesis Significance Testing, null hypothesis, hypothesis, multiple comparison corrections, p-hacking, Bayesian statistics, Bayes factor, publish, research, Olivia Gordon
Id: tLM7xS6t4FE
Length: 10min 39sec (639 seconds)
Published: Wed Sep 11 2019