The Reproducibility Crisis

Captions
Welcome back, everybody. Today I'm in Oxford and I'm talking to Dorothy Bishop, who is a professor of psychology at the University of Oxford. Thank you so much for your time, Dorothy. I read your interesting comment in Nature magazine about the reproducibility crisis. Could you maybe start by briefly summarizing what this is all about?

Yes. In psychology we've become aware over the last 10 to 15 years that there's been a problem with psychology results not replicating, and for some time people just thought this was maybe a pedantic problem that statisticians were picking up on. Then we had a big study that actually tried to reproduce results that had been published in quite respectable journals and found that really only about 30 to 40% of them did reproduce, and so everybody started to get a bit alarmed and to consider what the problem was and what we could do about it.

I really first got heavily involved when I was asked to chair a meeting at the Academy of Medical Sciences, which wasn't so much about psychology as about the biomedical sciences in general. It turned out they had also been getting concerned, interestingly enough largely because the pharmaceutical industry was getting concerned: they were trying to build on results that had been coming out of experimental labs in bioscience and finding that they couldn't reproduce the basic result, so they couldn't get past first base. So we had this very interesting symposium with a diverse collection of people, including physicists telling us how they do things in physics, but also people from industry, and a lot of issues arose. It was clear there is no one cause and no one solution, and that some of the solutions are more to do with incentive structures, which I think is something you're very interested in. But I also got more and more interested in the extent to which we had problems with how people were designing their studies, often not intentionally; we're not talking here about fraud, we're talking about people designing studies in ways that are not optimal for finding out what's really going on. So in this Nature paper I just summarized some things from a talk I'd given, which was really about what I call the Four Horsemen of the reproducibility apocalypse. I'll have to try and remember all of them.

The first one is publication bias: the fact, and this affects all disciplines, that it's much easier to publish things that show a positive, exciting result than a null result. That was affecting not just psychology, and it's been known about for a long time, but it just goes on and on, to the extent that I think most scientists wouldn't even try to publish something if it wasn't significant, because they feel it wouldn't get accepted. So there's this notion of a big file drawer full of stuff that isn't published, which distorts the literature, because for any particular research question there should be a cumulative process of building on previous research, but if the only research that gets through the filter is the stuff with positive results, you get a very distorted idea. So publication bias is number one.

Number two is what we call p-hacking in psychology, which is not so much that you select which papers to publish, but that from within a study you select which specific results to pull out and focus on. There are lots of ways you can p-hack: you can analyze your data many ways and just focus on the one way that gives you an interesting result, or you may gather data on lots and lots of variables but only report the ones that look exciting. I got the impression that a lot of psychologists don't realize this is a problem and don't really appreciate how it can distort the literature, because they tend to think, oh, I've found something with a p-value less than 0.05, it must mean something. So that is the second big one, certainly in psychology.
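To make the p-hacking point concrete, here is a minimal Python sketch (an illustration added for this write-up, not something from the interview; the group size, number of outcomes, and number of simulated studies are arbitrary choices). If a study measures 20 unrelated outcome variables in two groups drawn from the same population and reports whichever variable clears p < 0.05, a "significant" finding turns up in well over half of all studies even though no real effect exists anywhere.

    # Sketch (assumed setup, not from the interview): p-hacking by measuring
    # many outcomes and reporting only the one that happens to be "significant".
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_per_group = 20      # participants per group
    n_outcomes = 20       # unrelated variables measured, all pure noise
    n_studies = 2000      # simulated studies

    lucky_studies = 0
    for _ in range(n_studies):
        # Both groups come from the same distribution: no true effect exists.
        a = rng.normal(size=(n_per_group, n_outcomes))
        b = rng.normal(size=(n_per_group, n_outcomes))
        pvals = stats.ttest_ind(a, b, axis=0).pvalue
        if pvals.min() < 0.05:    # "report the interesting one"
            lucky_studies += 1

    # Prints roughly 0.64, far above the nominal 0.05 false-positive rate.
    print(lucky_studies / n_studies)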
The next one is power, which is people doing studies with samples that are simply too small to show the effect they want to show. Again, this has been talked about since the 1970s, but my impression is that a lot of psychologists think it's statisticians being pedantic and that it doesn't really matter. So if they can only test 20 people in each of two groups, they may go on a wing and a prayer and think they'll find something, but the odds are very much against them, because most of the effect sizes we're talking about in psychology are really quite small, and it's become clear we need much, much bigger samples.

So what's a typical sample size in your field, like 50 people?

Well, it depends on the sub-area. In some areas it's really difficult to get large samples: if you work, as I do, with special groups, it can take three years to collect a sample of 30 children of a particular kind. The solution there is to collaborate, and people are beginning to realize that we have to form larger collaborative enterprises to investigate some questions. If you're just doing questionnaire studies it's obviously easy to get large samples, so it's very varied. With things like brain imaging it's just very expensive to collect large samples, because you may pay 500 pounds for each brain scan, so people are not motivated to get large samples until they realize that it's a real problem.

And then the fourth horseman is something that was described many years ago as HARKing, hypothesizing after the results are known. It ties in with p-hacking, because this is looking at your data and then, from the morass of stuff there, pulling out one little interesting-looking thing, but then making out when you write it up that this was your hypothesis all along. So you tell a good story, and we're all told when we write up our results that it's important to tell a good story, but what you're not supposed to do is use the same data both to generate a hypothesis and to test that hypothesis. When you're HARKing, that's what you're doing: you first look at the data, then you say, oh, that suggests this hypothesis, and then you use the same data to test it. Again, that can create a lot of problems.

So those are the four things I've been particularly focused on, and I'm very interested in how we might fix them. Part of it is just educational, and I'm very keen on using simulated data for this, where you are God: you know what the truth is because you've made up a data set with certain characteristics, and then you can show people how easy it is to find something significant when there's nothing there, and conversely you can simulate data where there is a real effect and show people how easy it is to miss it if their sample is too small. That is beginning to become more common in education, but it's still not routine that people are taught statistics that way, and I think that's a lot of the problem.
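The point about statistical power can be illustrated the same way (again an added sketch with assumed numbers, not figures from the interview). With a real but small effect, a Cohen's d of 0.3, and only 20 people per group, a standard t-test detects the effect only around 15% of the time, so a true effect is missed in roughly 85% of such studies.

    # Sketch (assumed numbers): an underpowered study of a real but small effect.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n_per_group = 20        # participants per group
    true_effect = 0.3       # standardized effect size (Cohen's d)
    n_studies = 5000        # simulated studies

    detected = 0
    for _ in range(n_studies):
        control = rng.normal(0.0, 1.0, n_per_group)
        treated = rng.normal(true_effect, 1.0, n_per_group)
        if stats.ttest_ind(treated, control).pvalue < 0.05:
            detected += 1

    # Prints roughly 0.15: the real effect is missed about 85% of the time.
    print(detected / n_studies)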
So maybe I should add, for clarification, that the problem with trying different analysis methods on the data is that you get the statistical significance wrong, because every different method of analysis is a new attempt to find something?

Yes.

So you can fool yourself by trying as often as you want to. I'm afraid this is something that partially also happens in physics. What a lot of collaborations in physics do is decide on a method of analysis before they analyze the data, and then they stick with it. So what are people in psychology doing now to try to address these things?

One of the things I really like is the development of something called registered reports as a publication model. I've tried this on several occasions, and it does actually fix all four of those problems I mentioned. What you do is submit your introduction and methods, and your methods have to be very highly specified, ideally with a script saying how you're going to analyze the data, and maybe with some simulated data, and that is what is evaluated by reviewers. They may suggest changes, just as they normally would, only normally this would all happen after you'd collected the data, and here it happens prior to data collection. They decide: is this an interesting question, is this a good way of tackling it, is the study adequately powered? If you can persuade the reviewers, you can then get an in-principle acceptance from the journal, which will publish your paper if you do what you said you were going to do. This puts all the time lag early on in the process, and that's why people don't like it, because you're waiting for reviewer comments before you've even got the data, but it does mean that reviewers can be much more constructive.

The person who has really developed this is Chris Chambers at the University of Cardiff, and he says the one thing that shouldn't affect a decision to publish is the results, because the results are not contingent on the quality of the methods; it should be more about the quality of the methods and the sensibleness of the question. With registered reports you basically break that link, because you don't know the results at that point. So you get this in-principle acceptance, and then, provided you do what you said you were going to do, you will get published. It slows you up at the start, but by the time you've finished gathering data the process is usually very rapid. That is becoming increasingly popular: initially very few journals would offer it as an option, but it's becoming more and more standard as people realize that it gets rid of a lot of the problems and helps us do better science.

So the journals are offering this as an alternative method of review?

Yes, but you have to have editors who know how it works and who are enthusiastic about it. I think it's slowly beginning to break through; it only came in as a new thing about six or seven years ago. Initially there was just one journal trying it out, and now I think there are about two or three hundred offering it, not just in psychology but in other disciplines too, and in fact PLOS ONE has now just agreed to offer it, so that should be a big change.
But it's mostly in the life sciences, right?

I think so, yes, though maybe sociology, and economics as well. In those areas it of course gets complicated if you're talking about analysis of existing data sets, because you then have less control over whether somebody has really already looked at the data.

So you said earlier that these complaints from the statisticians about p-value hacking and small sample sizes have been around since the 70s, or even earlier. Given that this was so well known, how do you think it could take so long for psychologists to notice?

I think there are two things. One is that people genuinely didn't understand how serious it was. Like you say, people see a p-value less than 0.05 and they think it must mean something. I try to illustrate it by thinking of a magician who says, I can deal you a particular hand of cards, something quite unusual (I don't play poker, but something like that). You come along, he deals it, and you think, ah, he's an amazing magician. But if you know that prior to you he's tried this on 100 other people who didn't get that hand, and that it only actually worked for one in a hundred, you have a completely different attitude towards that result. You have to understand that when you're talking about probabilities, they have to be interpreted in the context of all the tests that you've done; they're not a measure of an effect size in the way that people tend to think of them. So I think part of it is that misunderstanding.
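The magician analogy can be put into numbers (an added illustration assuming a trick with a 1-in-100 chance of working by luck, tried on 100 people, as in her example). The chance that it works for at least one audience member by luck alone is about 63%, not 1%, which is why a single significant result means little without knowing how many attempts lie behind it.

    # Sketch: the magician analogy in numbers.
    p_single = 0.01     # chance the trick works by luck for one person
    attempts = 100      # number of people it was tried on
    p_at_least_one = 1 - (1 - p_single) ** attempts
    print(p_at_least_one)   # roughly 0.63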
But the other very interesting change has been social media, because social media has given a voice to people who previously didn't have one, mainly junior, early-career scientists, who may have been encouraged to do all this or who really suffered as a result of finding they couldn't replicate something that was published and looked solid. They are actually getting quite militant about making science better. In the past, if you found something in a journal that you didn't think was well done or that you didn't agree with, the only way to respond was to write a letter to the editor, who might decide to publish it several months later, whereas now people can go straight onto Twitter and say so.

So what do you mean, they're getting militant about it?

They aggressively draw attention to it, and they're concerned to try and bring about all sorts of changes. With a couple of colleagues I've been running a course on advanced reproducible methods, for four years now I think, this was the fourth, and we work with early-career researchers, who go off really fired up. In Oxford we had a very good group of early-career researchers who started various initiatives. They started a journal club called ReproducibiliTea, where we drink tea and talk about reproducibility, but we've also had other events, and this has culminated in putting in a bid to the university to support someone to really coordinate these activities. My colleague in anthropology, Laura Fortunato, has headed up this bid to get funding across the university, so we're trying to bring in all disciplines to improve the credibility of scholarship, even in the humanities, where you might be talking about electronic collections of items in museums and so on, and just making sure that things are open and properly documented, and that if you're doing science your scripts are available.

But the other thing we have to tackle is the incentive structure. We have to make sure that when the university is hiring and firing, when they're promoting people, they take an interest in issues like how credible your science is, rather than just whether it's in a flashy journal. They are on board with that, and in part they've been heavily influenced by the funders, who are very motivated to bring about change. That original meeting I was at at the Academy of Medical Sciences was supported by the Wellcome Trust and two of our Research Councils, MRC and BBSRC, and of course they don't want to spend their funds on research that isn't credible, so once people became really aware of the problems the motivation from the funders was there. That will translate into new requirements for people submitting grant proposals, which means things will change whether people want them to or not. I think this university does also want to do very high quality research, of course, and so again, once people become really aware of just how endemic the problems are, I think we're on the cusp now. I've already seen quite a lot of change, and the way people do things is going to be quite different.

I find this super fascinating, because I always had the impression that in academia nothing ever moves, nothing ever changes.

I have to say that in a university like this it tends to be horribly true, because we are an old institution where everybody is very, very careful and many people have to be involved in all decisions. But it is nice that at heart there are a lot of people in the university, in different disciplines, who are all really very keen to bring about these changes, and that's our strength really, because we keep finding new people: somebody in economics, somebody in politics, somebody in computational biology, and they're all interested in the same goal and will have different ways of solving the problems. I think we're the only place internationally trying to do something at the level of all disciplines converging, but we've only just started; we really only got going in January, we launched in January, which is only last month, and we're already feeling very, very positive.

So you have the university administration and also the funding agencies behind you, which is a good start, and the community of course. How important do you think it is that the public is aware of these problems and of your efforts to do something about them?

That's a very good question, which was raised at that initial meeting, because there were some people saying, well, we mustn't really say too much about the problems because the public will lose confidence. But there were people there, whom I agreed with, science journalists, who said, this is appalling, you need to talk about the problems, otherwise people really will lose confidence. There is a difficulty, which is that it can be weaponized. We've already seen this happening in the US, where people with a particular agenda, for example the Trump government not wanting to obey regulations about environmental protections, are starting to say, well, we don't have to take any notice of any regulations unless the underlying data are open.
And given that data on things like asbestos were gathered years ago, before there was any chance for open data, they can therefore decide they don't want to take any notice of them, or they can just say, well, all the stuff on climate change, of course, if science is not very reliable then there are different points of view. That's the hardest thing, really: trying on the one hand to be open and honest about what we're doing, while on the other hand ensuring that this doesn't act as a hostage to fortune and allow people to just weaponize what we're doing. The best way, though, is to make sure the science is really, really good, and the more self-correcting we are, the faster we self-correct and deal with problems, the less easy it will be for people to deny bits of science that they don't like.

Do you also know examples of people who are actually commenting on this habit of p-value hacking or something, to argue that one should not take this or that science seriously?

I don't think p-hacking in particular. But even I, as a reviewer, have to say I've got quite skeptical about quite a lot of things that come through if the data isn't open, and more and more I just wish people would, even if they don't go down the route of doing a full registered report, at least pre-register their plan and their hypothesis. Quite recently I had to referee a paper which I was very concerned was probably affected by p-hacking, because they'd pulled out one little result, and the authors just replied and said, well, everybody does it. And they're right, everybody does, but that is not an adequate excuse for doing it. So it's a problem.

I actually think that kind of argument greatly contributes to the problem, because it's why people don't realize there's something wrong with it. They say, that's what we have learned, that's what everybody does, it must be okay.

Yes, and in psychology there's even a classic example of a sort of guidebook on how to be a good scientist which explicitly recommends both p-hacking and HARKing. It says, don't feel you have to test your hypothesis: what you first do is look at your data, look at them every way you can, find what's interesting, and then construct the paper around that.

Oh, that's painful. But when it comes to the involvement of the public, I have to say that I think it's probably worse to try to sweep the problems under the rug instead of being open about it and saying, yes, we have a problem, but we're working on it.

Yes, I agree, but I think we do have to be rather shamefaced about some of the things. It does mean confessing that, certainly in psychology, there have been some phenomena that we thought were very robust, things that made it into textbooks, but that we're now beginning to realise probably don't stand up, or are at least much weaker than we thought they were.

Well, we will see what holds up, I guess. Thank you so much, I think this is a good place to wrap this up. Thanks everybody for watching, and see you next week.
Info
Channel: Sabine Hossenfelder
Views: 92,808
Keywords: science, psychology, reproducibility crisis, university of oxford, p-hacking, statistical significance, problems in science, crisis in science, reliability of science, science skepticism, hypothesis, hypothesis fishing, what is the reproducibility crisis?, peer review, hossenfelder, dorothy bishop, academia
Id: v778svukrtU
Length: 21min 32sec (1292 seconds)
Published: Sat Feb 15 2020