The scandal that shook psychology to its core

Captions
This video was sponsored in part by Brilliant.

2011, Cornell University. Renowned psychologist Daryl Bem had completed a decade-long pet project. Perhaps to preserve secrecy, or perhaps because of its outlandish nature, Bem pursued no formal funding for this research. His reputation was on the line, so he made sure his study methods were up to snuff. He didn't want anyone to doubt him, so he replicated his study nine times, used the most conventional analyses, double- and triple-checked his work to make sure there was no mistake, wrote up the results, and sent them off to one of psychology's most discerning peer-reviewed publications: the Journal of Personality and Social Psychology.

He might as well have sent a bomb, because what Daryl Bem found in his ten-year study was evidence that ESP, the ability to sense the future, exists. As he predicted, the paper was accepted and soon published, though it didn't have the effect that Bem had anticipated. "But did you really show anything?" There are many ways to tell this story, but they all end the same way: with a rumbling. A rumbling that perhaps there's a fatal flaw in science. Perhaps the psychological studies that we've accepted as fact for decades, the research that has laid the foundation for everything we know about the mind, is wrong. This is the story of psychology in crisis.

I may sound as though I'm being overly dramatic in order to reel in views, but the reality is that the truth is dramatic enough on its own. Psychology, and science as a whole, is in a genuine crisis. What kind of crisis? Well, imagine with me for a moment that you're driving down the road and you catch a glimpse of what looks like Michael Myers standing at the corner. You might doubt yourself, but if you kept driving and you saw a second Michael Myers, and then a third Michael Myers, well, then you might feel more confident that you really did see Michael Myers the first time too. The more Michael Myers you see, the less likely the first sighting was just a mechanic with a really pale face. And maybe there's a totally logical reason there are so many men dressed like Michael Myers.

What I'm trying to convey is not the terrifying prospect of a Michael Myers fan con, but rather a simplified illustration of the importance of replication. All of science relies on evidence, and if researchers make claims based on evidence that was obtained using the scientific method, then you should be able to repeat their work and get more or less the same results. If you do, that's how you know there's some semblance of truth to the findings. So if a study says that 90% of students perform better on standardized tests after they see Michael Myers dancing, then I should be able to do that same study over again and get a similar outcome, right?

Yeah, well, that's where psychology's got a problem. Over four years, 270 scientists from the Reproducibility Project: Psychology took on a huge task: re-run 100 studies that had been published in three important psychology journals. They tried to be as true to the original studies as possible, even calling and consulting with the original authors to ask questions about how they conducted parts of their research. By 2015, they had replicated all 100. Of those 100 replications, only 36% resulted in any significant results, and for that 36% that did replicate, the effect sizes were on average half of what was originally reported. Thirty-six percent.
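To make "replication" concrete, here's a minimal sketch in Python of what a single replication attempt looks like for a simple two-group experiment. The effect size, sample size, and choice of a t-test are illustrative assumptions on my part, not the Reproducibility Project's actual protocol.

```python
# A minimal sketch of replication: run the same two-group experiment
# twice and see whether both runs clear the significance threshold.
# All numbers are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def run_study(true_effect, n_per_group):
    """Simulate one two-group study and return its p-value."""
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_effect, 1.0, n_per_group)
    return stats.ttest_ind(treatment, control).pvalue

original_p = run_study(true_effect=0.3, n_per_group=50)
replication_p = run_study(true_effect=0.3, n_per_group=50)
print(f"original p = {original_p:.3f}, replication p = {replication_p:.3f}")
# With a modest true effect at this sample size, the two runs will often
# disagree about significance -- a taste of why any single p < .05 result
# needs replicating before you trust it.
```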
Three years later, in 2018, a group of 24 psychologists attempted to replicate 21 studies from perhaps the most prestigious general science journals out there: Nature and Science. From such big names you'd expect the research to be rock solid, and yet only 62% of the studies had significant results, and the effects were, again, only about half as big as what the original papers showed. So this whole replication thing doesn't seem to be panning out.

As you can imagine, these revelations made huge waves in psychology, and the field found itself facing essentially an existential crisis. And like any existential crisis, different people responded in different ways. Some denied that anything was wrong and tried to repress the bad feelings. Some viewed it as an inflection point and tried to radically change things. Some bought a Corvette and started dating women 20 years younger. And some questioned whether psychology holds any truth and meaning at all. I think that last one is perhaps the most worrisome and insidious, because if we're honest, whenever anyone talks about the replication crisis in psychology, they aren't just talking about the fact that studies don't replicate. Rather, it leads many to wonder what research, if any, they can believe. So really, more than a replication crisis, psychology has a crisis of confidence. And it's not some historical blip from a bygone era; it's happening right now. So how much can you actually trust psychology?

Modern psychology has long striven to become more scientific. It's a relatively young field, and the mind and behavior are really hard to quantify, so starting out, psychological observations seemed squishy, especially compared to the so-called hard sciences that seemed to produce more tangible results. It's no surprise, then, that by the 1950s, tools like the IQ test and the Skinner box were idolized and imitated for their apparent ability to reliably conduct experiments as many times as you like, with as many different subjects as you like. Reproducibility was built in; it was relatively easy and encouraged, and that gave people faith in the results, because you could do it yourself the exact same way.

Simultaneously, as an extension of that, psychology started using math. Math was a game changer, because it unlocked the ability to use something concrete to analyze psychological findings. Now you didn't have to trust a person; instead, you could trust the math. Researchers could collect their data, run it through a statistical equation, and determine just how powerful or effective the results were. And the one number that rose above the rest and became the most important number in all of psychology, no, all of science, is the p-value. Now, "p-value" might sound like what my investment portfolio is doing in this economy right now. But "p-value" sure sounds like a line of budget incontinence products. But actually, "p-value" might sound like, uh... ah, I thought I had something there, but let's move on. It wasn't significant.

The p-value is the number that has been used to suggest whether a study's findings are statistically significant or not. With that mathematical stamp of approval, researchers could claim that their results were compelling, robust, and real. And so, with tighter experimental conditions, increased precision, and mathematical rigor, psychology had leveled up to an empirical science.
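Since so much of this story turns on that one number, here's a hedged sketch of the "trust the math" workflow just described: data goes in, a standard test spits out a p-value and an effect size. The scenario and every number here are fabricated for illustration.

```python
# Collect data, run it through a statistical test, read off the verdict.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
control = rng.normal(100, 15, 40)    # e.g., test scores without an intervention
treatment = rng.normal(106, 15, 40)  # e.g., test scores with it

result = stats.ttest_ind(treatment, control)
# Cohen's d: the mean difference scaled by the pooled standard deviation.
d = (treatment.mean() - control.mean()) / np.sqrt(
    (treatment.var(ddof=1) + control.var(ddof=1)) / 2
)
print(f"p = {result.pvalue:.3f}, effect size d = {d:.2f}")
# By convention, p < .05 earns the "statistically significant" stamp
# of approval described above.
```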
This all sounds like a good thing; I mean, it makes sense to use statistics to quantify and understand data. But for decades now, this statistical approach has led to some unintended consequences that have contributed to the crisis we see today. It's like a slow leak in the roof: it might not be noticeable at first, but with time it can cause a lot of damage. And 2011 was when everyone woke up to the roof caving in.

Daryl Bem's unconventional study, let's just say, titled "Feeling the Future," was definitely a contributing factor. Many psychologists found it absolutely unacceptable that a journal would publish this supposed evidence for a paranormal phenomenon like ESP, aka precognition. It was an embarrassment for the field, but it did meet the publication standards of the time, and so it led many to examine the research methods that made this possible. That same year, in one of the biggest cases of scientific fraud, a prominent social psychologist named Diederik Stapel admitted to falsifying data in dozens of influential studies that were published in top scientific journals. And to cap it all off, in 2011 a few clever researchers published a paper titled "False-Positive Psychology," in which, using common practices in psychological research, they obtained significant results for an impossible phenomenon: participants in their study became a year and a half younger after listening to "When I'm Sixty-Four" by the Beatles. It's an absurd finding, but that was the point. They pulled back the curtain and showed exactly how they arrived at the result, by intentionally abusing certain methodologies that, they argued, could knowingly or unknowingly create positive evidence for a false theory. This confluence of factors, along with the large-scale replication studies I mentioned before, made a lot of people ask: what the hell is going on? How did this happen?

Well, it's complicated, but let me start by asking you a question. If you ran a prestigious journal and you got hundreds of scientists submitting hundreds of studies every month, but you could only publish a small percentage of them, which ones would you choose to print? If we're being honest, probably the most exciting and impressive results. But therein lies a problem, because if you only accept papers with world-changing results, then you're denying studies with null results, basically, results that don't show any effect. And on the researcher's side, they might only want to publish results that support their original hypothesis, because it makes them look good. This is colloquially called the file drawer problem, because studies with inconclusive or contradictory results often never make it out of the researcher's file drawer: they either don't want to publish them or know that they won't get published. This has given rise to a strong publication bias, where around 96% of psychology studies published in journals report positive results.

In psychology, the threshold for statistical significance is typically p < .05, which means there's less than a 5% chance of seeing results at least that extreme if there's no real effect, that is, if the null hypothesis is actually correct. That might seem low (seems pretty high to me), but if only positive results are getting published, then that's a much smaller sample of studies, which means journals are inevitably inflating the occurrence of false positives. Some estimates say that the rate of false positives could be as high as 25% of published psychology studies. That's potentially a big chunk of the issue right there. Additionally, by focusing so intently on positive results, journals are more likely to publish studies that overestimate their effects, because the data seems amazing. So it actually makes total sense that when these studies get replicated, the effects disappear: either there was no correlation in the first place, or the effects look much less impressive because their apparent power regresses to the mean.
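That "as high as 25%" figure is easier to believe with a little arithmetic. Below is a back-of-the-envelope sketch of how a journal full of significant results can still carry a large share of false positives; the base rate of true hypotheses and the average power are my own illustrative assumptions, not numbers from the video.

```python
# If journals mostly print significant results, what fraction of those
# printed results are false positives? Illustrative inputs only.
alpha = 0.05        # significance threshold
power = 0.40        # assumed chance a study detects a real effect
prior_true = 0.25   # assumed share of tested hypotheses that are true

true_hits = power * prior_true           # real effects reaching p < .05
false_hits = alpha * (1 - prior_true)    # null effects reaching p < .05 anyway

share_false = false_hits / (true_hits + false_hits)
print(f"false share of significant results: {share_false:.0%}")  # ~27%
```

Under these made-up but not unreasonable inputs, roughly a quarter of "significant" findings are false, the same ballpark as the estimate above.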
The impact of psychology's publication bias also places pressure on researchers themselves. There's a common expression in academia: publish or perish. Meaning, if you want a successful career, you have to demonstrate that your work is worthy of being published, and published frequently, in top-tier journals, or you can kiss your chances of tenure goodbye. The problem with this mindset is that it often rewards quantity over quality. So if everyone knows what journals accept, and you stack on top of that the demanding expectations of academic institutions, then it's no surprise that all researchers want publishable results, and some might find any way to make their research publishable.

Taken to the extreme, this can lead to straight-up fraud, where researchers plagiarize the work of others, fabricate their data, or just make up a supposed study that they say they conducted. This kind of misconduct has existed in science for as long as science has existed, and we all know it's wrong. The good news is that blatant fraud is rare and likely doesn't contribute meaningfully to the replication crisis. The bad news is that there's a more subtle kind of misconduct that's far more common: questionable research practices.

See, as a researcher, you have a lot of flexibility in how you design, conduct, and analyze a study. You call the shots, and quite frankly, there are a lot of shots to call. But the inherent issue with having so many degrees of freedom is that it opens the door for random coincidence, confirmation bias, or personal motivation to influence the outcome. Now, that influence doesn't mean the researcher is an evildoer; this can happen unintentionally, without the researcher ever realizing what's going on. But if you've got loose ethics, it can be ridiculously easy to obtain significant results with a little bit of elbow grease. Don't have a strong hypothesis? Just go on a fishing expedition, or do some HARKing, which stands for Hypothesizing After the Results are Known. Worried you won't find a significant result? Well, just add some extra variables to increase your odds, or stop running your experiment as soon as the data looks good. This process of seeking out significance has many names, but the widely used catch-all term is p-hacking.

If you want to see where this happens most, we need to look at the numbers. This is where math, psychology's early savior, becomes a dark accomplice. Remember that study I talked about before, the one that found that listening to a Beatles song makes you a year and a half younger? Those researchers purposefully engaged in nefarious p-hacking. See, you can use any number of reasonable statistical tests to interpret your raw data set, and each one of them will spit out a slightly different result. Likewise, researchers have the discretion to remove outliers, deal with missing or incomplete data, and decide how to score their variables, among other things. So even if you conduct your study ethically, the way you process your data, or the model you choose to interpret it, could be the difference between statistical significance and nothing. And on the unscrupulous side, some researchers may run many tests, cherry-pick their data, or dredge the data until they find something significant.
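To see how much damage "stop running your experiment as soon as the data looks good" can do on its own, here's a toy simulation of that single p-hacking move, in the spirit of the "False-Positive Psychology" demonstration though not their actual design. Both groups are drawn from the same distribution, so every "significant" result is a false positive by construction.

```python
# Optional stopping: add participants in batches and test after each
# batch, stopping the moment p dips below .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def peeking_study(max_n=100, batch=5, alpha=0.05):
    """Both groups come from the SAME distribution (no real effect)."""
    a, b = [], []
    for _ in range(max_n // batch):
        a.extend(rng.normal(0, 1, batch))
        b.extend(rng.normal(0, 1, batch))
        if stats.ttest_ind(a, b).pvalue < alpha:
            return True   # stop early and declare "significance"
    return False

runs = 2000
hits = sum(peeking_study() for _ in range(runs))
print(f"false-positive rate with peeking: {hits / runs:.1%} (nominal: 5%)")
# Peeking after every batch pushes the error rate far above the
# advertised 5% -- significance found this way is largely noise.
```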
One of my college professors literally taught me how to p-hack. It was in 2011, which is very coincidental now that I think about it. I was taking a research methods and statistics class, and as part of it I had to go through the entire process of designing and running an IRB-approved study. At the end, when I conducted my analysis, my study showed no significant results; there was simply no correlation. But rather than my professor saying, "Great, write up your paper using these findings," she helped me manipulate the data and run new statistical tests until I found a significant result. Oh yeah, don't think I forgot about you, Dr. Kiddoo. You didn't write me a letter of recommendation, and in retrospect, I think that was good.

Anyway, it's worth mentioning that although p-hacking sounds like an intentional or evil act, like someone hacking into a computer, that's not really accurate. It's fairer to say that many questionable research practices are the result of carelessness, and many were common in psychology and tolerated by the field for a long time, because they were hard to distinguish from legitimate practices. That was definitely the case for my professor. I'm not saying that's good; I'm just saying that's how it is.

But I think that transitions nicely into another element that has contributed to the replication crisis: a lack of transparency and rigor in study designs. My study was crappy. It was just bad. The design? Horrible. If I could go back and redo it, I'd... well, I just wanted to get the grade, but I'd probably do it a little bit better. I mean, think about what I was just talking about: if a researcher conducts a study and gets significant results, but doesn't give a detailed description of their methods or how they processed their data, then it's far less likely that someone else will be able to replicate the results. Additionally, studies that use small sample sizes, or that don't have clear hypotheses, might have significant results but fall apart as soon as you test larger groups or scrutinize the claims. A p-value below .05 means absolutely nothing if your study design is bad and it can't be verified.
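As a rough illustration of why small samples are so fragile, here's a quick power calculation for a two-group design, using statsmodels; the effect size (Cohen's d = 0.3) and the sample sizes are assumptions chosen for illustration.

```python
# Statistical power: the chance a study detects an effect that is
# really there, at a given sample size and threshold.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n in (20, 50, 200):
    power = analysis.power(effect_size=0.3, nobs1=n, alpha=0.05)
    print(f"n = {n:3d} per group -> power = {power:.2f}")
# With 20 people per group, a d = 0.3 effect is detected well under half
# the time -- so a "significant" hit at that size says as much about
# luck as about the effect.
```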
At this point in the video, there's almost certainly someone saying, "It's not just psychology!" And you're right: pretty much every field in science has been touched by the replication crisis, from medicine to economics to neuroscience. They all face similar issues, and it's unfortunate, because this crisis has undermined people's confidence in science. When's the last time you had a friend say something like, "Scientists can't even make up their minds. One day they say that wine and chocolate are good for you, and the next day they say they're bad"? Now expand that out into other, more controversial topics, like vaccines or climate change, where people are looking for any excuse to disbelieve the research, and you can see the damage.

However, I want to be clear: I think it's a good thing that we're coming to terms with the poor replicability of our research. For too long, we've put too much faith in the numbers, and I think that misunderstood concepts of p-values and statistical significance have discouraged replication. The goal going forward is not to expect 100% correct research, because that's pretty much impossible. Instead, the goal should be a more accurate representation of findings, and more replication. Because just as one study doesn't prove or disprove whether something is true, one reproduction of a study doesn't prove or disprove whether something is true either. It simply gives more information.

So what can be done? What's being done? Luckily, the last decade has brought many changes, but there's still further to go. For example, there's an encouraging movement in psychology toward open science, a broad term describing ways to make research more reproducible, accessible, transparent, and rigorous. Some researchers have, for example, made their research data publicly available so that others can run their own analyses. I think this is a good idea, though I do see some potential drawbacks if you require or expect it of all researchers. For one, it puts a burden on the researcher to keep their data well organized and consistently accessible, which maybe isn't asking too much. But two, not everyone has the statistical know-how to properly analyze the data. So I think a better solution would be for journals to require researchers to submit their raw data along with their manuscript for publication; that way, the journal could perform its own analysis or give it to someone with the proper expertise.

Another wonderful change that has gained momentum recently is pre-registration and registered reports. Pre-registration involves outlining your research plan and registering it with a public database before you ever begin your study. Registered reports are similar, with one key difference: you submit a study protocol to a journal, and the journal decides whether or not to publish your study before it's ever run. These are great because they increase the quality of study designs, they increase the publication of null or negative results, and they prevent p-hacking by forcing researchers to commit to their methods and analysis ahead of time. You know how I was talking before about 96% of published psychology studies having positive results? When you only look at registered reports, that number drops to 44%. So it's clear there's a lot of research that wouldn't typically reach publishable status without this model.

Some have suggested changing the math. For example, if you change the statistical significance threshold for p-values from .05 to .005, then far fewer false positives would be published. Of course, the flip side is that you'd likely increase the rate of false negatives, where something is true but doesn't meet the necessary threshold. Personally, I'd be okay with that. Others have recommended doing away with p-values altogether as a primary standard for whether something gets published, and instead looking at design rigor, effect sizes, or confidence intervals as better gauges of quality.
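Reusing the back-of-the-envelope publication math from earlier, here's roughly what the proposed move from .05 to .005 would trade off. The power values are illustrative assumptions: at a fixed sample size, a stricter threshold also lowers the chance of detecting real effects.

```python
# Stricter threshold: far fewer false positives slip through, but more
# true effects get missed (false negatives). Illustrative inputs only.
prior_true = 0.25

for alpha, power in [(0.05, 0.40), (0.005, 0.20)]:
    true_hits = power * prior_true
    false_hits = alpha * (1 - prior_true)
    share_false = false_hits / (true_hits + false_hits)
    print(f"alpha = {alpha:5}: false share = {share_false:.0%}, "
          f"true effects detected = {power:.0%}")
# alpha = 0.05  -> ~27% of significant results false, 40% of real effects found
# alpha = 0.005 -> ~7% false, but only 20% of real effects found
```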
On top of that, in our big-data world, there's a distinct move in psychology toward obtaining larger sample sizes, finding more representative populations, and considering demographic elements that were perhaps ignored in the past. This will do a few things: it'll limit statistical anomalies by having more participants to average over, and it'll help identify how specific or general the results are, rather than simply studying a small group of American college students and assuming the outcome applies to everyone.

But from my perspective, what needs to change most is also the hardest to change: the incentive structures in research. I really think it's imperative that journals diminish the importance of positive results. I get it, it's exciting, but it's also created a harmful expectation that science always has to be right. To shift this dynamic, big adjustments are required. Journals would need to publish more null or negative results; maybe we need to set some minimum requirement for the number they have to publish each month. Funding sources would have to pour money into replication research; I know replication isn't sexy, because it isn't new, but it should be appreciated at the same level as novel results. And institutions have to change what's deemed valuable when making tenure decisions; I think that would really change the pressure placed on academics, and it might improve research quality. And then, of course, researchers should hope to be proven wrong. Null or negative results are just as important as positive results, and they should evoke the same level of appreciation and pride among all researchers.

Oftentimes, what we perceive as failure or messing up is actually just the learning process, and that's why it's so important to challenge yourself to be wrong: it pushes us to a higher level of understanding and knowledge. Which is part of why I want to tell you about Brilliant. Brilliant is an awesome way to interactively learn STEM topics using these kinds of challenges. They have tons of detailed courses that you can complete at your own pace, and they're all really well designed, with addictive questions and puzzles that help you understand new concepts on a deeper level. There's an entire section dedicated to statistics and probability, and I'm sure a lot of scientists would benefit from these courses, especially considering that in 2016 the American Statistical Association had to release a statement explaining p-values, because so many researchers misunderstood what they meant. Anyway, given all of the statistical trickery that we've been talking about, I highly recommend their Statistics Fundamentals course, which not only helps you understand stats but also teaches you how people use them to lie to you. I learned a lot from this course, and you know what? Even when I got a question wrong, I didn't feel so bad, because Brilliant gives such great explanations that I just ended up learning more. So do yourself a favor and dare to grow by joining Brilliant. To get started for free, visit brilliant.org/neurotransmissions or click on the link in the description, and the first 200 of you will get 20% off Brilliant's annual premium subscription.

The replication crisis is such a huge issue, and I'm not sure I could ever do it justice; I just don't have the expertise or the time to get into every facet of this complex topic. But I will say that I've been impressed by how far psychology has come in the last decade. It feels like the field is completely different from when I was in college, and I think this existential crisis is probably the best thing that could have happened to it. In a weird way, the poison is the cure, and so out of this replication crisis must come more replication. After all, that's kind of the purpose of the scientific process, right? You examine old ideas, and then you prune those that can't withstand careful scrutiny. It's not meant to catch wrongdoers or fake research; it's a way of building confidence in something that we can never know for certain. That is why replication is so important: our scientific knowledge can only grow and deepen when researchers can safely build on the foundation laid by the scientists before them. And so now, standing at this turning point, it feels like the future of psychology looks brighter than ever. Thanks so much for watching, and until next time, I'm Micah. Think about it.
Info
Channel: Neuro Transmissions
Views: 16,522
Keywords: replication crisis, reproducibility crisis, statistically significant, false positive, false negative, publication bias, null hypothesis, psychology, daryl bem, diedrick stapel, micah caldwell, micah psych, neuro transmissions, research, p-value, is psychology a science, power poses, marshmallow test, ego depletion, psychology debunked, psychology research
Id: QGWeVbYduOI
Length: 29min 34sec (1774 seconds)
Published: Wed Aug 24 2022