Andrew Gelman - Bayes, statistics, and reproducibility (Rutgers, Foundations of Probability)

Captions
Shall we begin? Sure. Okay, today's speaker is Andrew Gelman from Columbia. I think he's going to talk about, what was it called, Bayesian statistics and, I can't remember the exact title, reproducibility. Yeah, so I will talk about that, but first let me talk a little, and hopefully we'll have some discussion, I won't just be yapping on. I think there's something about statistics, and the philosophy of statistics, which is inherently untidy. And I say this as a former physics major. I took chemistry in high school, and I got a good grade because I managed to figure out the rules, like a crossword puzzle: when you mix this and this you get that chemical. I somehow cracked the code and was able to do the tests. Then I took a class in chemistry in college and I just hated it, and this is a common experience for physics majors, that they hate chemistry, because physics makes sense and chemistry is very arbitrary: they keep pulling out new principles. And I think that's how statistics probably feels to people with math training, although I had math training and statistics didn't feel that way to me, and I'm not quite sure why. But I feel like the way statistics is taught, including how I teach it, is this weird bunch of rules of thumb all over the place. Do this, except don't do that here; if n is greater than 30 you can do that, except when you can't; here we divide by n minus 1, which sounds sort of funny, and here's a reason that makes no sense to justify it, and it's like, why are you doing this thing that makes no sense? And then we're using an unbiased estimate, but it's on the scale of the variance, not the standard deviation, and why is that? And then you get to more advanced things and it's like:
where does the prior come from, and why do you want a 95 percent interval anyway, who elected 95 as the number? It's just full of that, and math isn't really full of stuff like that. I mean, there are big questions, like why do we care about prime numbers, but not like this. So maybe we can turn that frown upside down and make that bug into a feature, and say we fly our subjectivity flag proudly or whatever, but I think that statistics is inherently kind of incoherent, and I want to talk about that first, before going on, to kind of establish it. So I have a mini-talk, and I'll read the title: Bootstrap, Bayes, and Cantor. I don't really have a good colon-dot-dot-dot, but the subtitle would be something about essential incoherence, tension, I don't know, the productive tension of statistical intuitions; that's too many words, something like this. And the idea is, well, let's start with the bootstrap. Raise your hand if you know what the bootstrap is in statistics. Okay, good, you're well educated, you know these things. So the bootstrap is like a magic trick, and I mean that in a negative way, or in a positive way; I think the bootstrap is great, but I'm saying it's like a magic trick in a negative way, in that the characteristic of a magic trick is that you're watching this hand while the person's foot does all the magic, or the magic already got done: in the best magic tricks the deck already got stacked before you even entered the room, and meanwhile you're watching. So with the bootstrap, as you well know, you resample the data and then you reapply your estimator and get a sampling distribution and so on. The magic trick is that it's all
about how to do the resampling: bootstrapping a time series, bootstrapping spatial statistics, is it a parametric bootstrap or a nonparametric bootstrap, what do you do with the bootstrap, and so on. But the thing that already got done before you entered the room is the estimate that got bootstrapped. Because you have data y, and then you have theta-hat of y, and then you bootstrap the hell out of y, which induces a bootstrap on theta-hat, which induces a bootstrap distribution, and everyone talks about the bootstrap; but where did theta-hat come from? Well, I don't know, but that's okay. My point is that the bootstrap is a two-ish procedure. It has two pillars: where did theta-hat come from, and what are you bootstrapping? And there's no reason they have to be at all related to each other. In fact, in some sense the bootstrapping has to do with how the data are sampled, which isn't necessarily relevant to your estimate; it might be, but it doesn't have to be. There are two things going on, and I think one reason the bootstrap is so successful is its sort of non-ideological nature: you can bootstrap any estimator, or conversely, you can apply any bootstrap procedure to an estimator. They're two different things coming in. Now, Bayes. With Bayes you're maybe going to think the two things are the likelihood and the prior, but that's not what I want to say here. Bayesian inference, I feel, has two parts: inference and model checking. In Bayes you do inference based on the model, and that's codified and logical, but then there's this other thing we do, which is we say our inferences don't make sense, we reject our model, and we need to change it. And this is one of the things I've thought about for a long time. They say the way to learn something is to teach it, right? You've heard that.
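To make the two pillars concrete, here is a minimal Python sketch (my illustration, not code from the talk): the estimator theta-hat and the resampling scheme arrive as two independent ingredients that never need to know about each other.

```python
import numpy as np

def bootstrap(y, estimator, n_boot=1000, rng=None):
    """Nonparametric bootstrap: resample y with replacement and re-apply
    the estimator, giving the bootstrap distribution of theta-hat."""
    rng = np.random.default_rng(rng)
    y = np.asarray(y)
    n = len(y)
    # Pillar 2: how to resample (here, plain iid resampling of the data).
    # Pillar 1, the estimator theta-hat, was chosen independently and is
    # simply passed in.
    return np.array([estimator(y[rng.integers(0, n, size=n)])
                     for _ in range(n_boot)])

# Example: bootstrap the median of 200 draws from a normal distribution.
y = np.random.default_rng(1).normal(loc=5.0, scale=2.0, size=200)
theta_boot = bootstrap(y, np.median, n_boot=2000, rng=2)
se = theta_boot.std()                            # bootstrap standard error
lo, hi = np.percentile(theta_boot, [2.5, 97.5])  # percentile interval
```

Swapping `np.median` for any other estimator, or replacing the iid resampling with a block or parametric scheme, changes one pillar without touching the other.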
Writing a book is another way of learning something. So when we wrote our book Bayesian Data Analysis, over 20 years ago, I thought a lot about this idea of model checking. Shortly after I got my PhD in 1991, I went to a Bayesian conference, and it was a conference full of Bayesians. You've heard about these things: people get obnoxious and drunk and unpleasant, and they joke about how much fun it is to get drunk all the time. I'm not really into that; if people are like that, that's fine, but that part I didn't like so much. But one thing I noticed was this. I had a trick, which was to check the fit of your model; I had been doing a couple of applied projects where this came up, so I had learned about checking model fit. So every time I came up to someone's poster I'd say, hey, have you checked the fit of your model? And people would say, what do you mean? And I'd say, well, could you simulate fake data from your model and see if it looks like your real data? And, I know this is horrible: we have this expression we call a Feynman story, which is a story that makes you look good and everybody else look stupid, and I'm embarrassed to say I'm telling you a Feynman story right now, and I'm sorry, I shouldn't go around doing that, I'll tell you some stories where I look bad later, so I apologize in advance. Anyway, this particular Feynman story is real: I would go up to people, and first they had no idea about checking models, and I was like Mr.
Perfect: yeah, you could just do this. And they said, no, no, no, we can't check, we don't check our models. And why not? Well, because, you don't understand, in Bayesian statistics the prior is subjective, and because of that subjectivity you can't check it; it's just your belief, there's nothing you can use to check it. Which sounds sort of funny. Setting aside Bayes and all that, if someone says, well, I have some piece of knowledge and it's very subjective, maybe you think you really should check it. That's the whole point, right? Of course you should check it; your beliefs can be wrong all the time. But they had a certain ideology, and the ideology was that the model was not checkable. Now, underlying every good ideology is some solid theory that's just misapplied, right? Think of any ideology, even ideologies you hate: somewhere there's a solid theory that's being misapplied. The solid theory here was that there are certain aspects of a prior distribution that can't be tested. So suppose I have a very simple model: I have theta, a one-dimensional parameter, and I have some data y1 up to yn, and I let n be large; then I have a model of y given theta. If I get enough data, I can check my model of y given theta, right? It's a distribution, I can just look, I can check it. But I can't really check my prior on theta, because I only observe one theta. So the true math underlying the inappropriate ideology of these alcoholic Bayesians was that, in this situation, you can check the likelihood model but you can't check the prior; you can check the sampling distribution, you can't check the prior. And if you can't check it, you need to just suck it up and make an assumption. Just make assumptions, that's what you've got to do, and you can't check them. So if you can't
check them, just be honest that you can't check them, and let's move on. Okay, but wait. What they were wrong about was that we don't usually fit models with just one parameter, and when the models have more than one parameter and the parameters are connected to each other, then you can have internal evidence that can falsify a model, or probabilistically falsify a model, which is a concept that our idealized Karl Popper would be able to handle with no problem whatsoever. (There's room in here if you want; there are a couple of chairs right there you can unfold.) So that was one reason why they were wrong. The other reason why they were wrong is that you can actually check your model even if you only have one theta, which is that you can compare your inference to other prior information that you haven't put in your model yet. My favorite example comes from a toxicology problem I was working on many years ago, where we fit the model to some data and the parameter estimates didn't make sense. There were a bunch of parameters, including how much blood flows through different organs of the body and how big the different organs are, and it said that someone's liver weighed 8 kilograms. A liver doesn't weigh 8 kilograms, that's too much. So we had prior information that wasn't in the model, and you can check in that way too, because our models, and we'll get back to Cantor in a second, our models are inherently always incomplete, right? So I had this experience in which these gentlemen had an ideology under which they couldn't check their model, and checking was kind of a revolutionary idea. So let me tell you a little sociology. It's not really sociology; that's just a term we use when people behave in ways that we find annoying, we call it sociology.
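The "simulate fake data from your model and see if it looks like your real data" trick can be sketched in a few lines. This is my own minimal illustration, not code from the talk: fit a normal model to data that are secretly heavy-tailed, simulate a replicated dataset from each posterior draw, and compare a test statistic between the replications and the real data (the kind of check described in Bayesian Data Analysis).

```python
import numpy as np

rng = np.random.default_rng(0)

# "Real" data: secretly drawn from a heavy-tailed t distribution,
# but we will model them as normal and then check that model.
y = rng.standard_t(df=2, size=100)
n = len(y)

# Posterior draws of (mu, sigma^2) for a normal model under the standard
# noninformative prior: sigma^2 | y ~ scaled-inv-chi^2(n-1, s^2),
# mu | sigma^2, y ~ N(ybar, sigma^2 / n).
s2 = y.var(ddof=1)
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=1000)
mu = rng.normal(y.mean(), np.sqrt(sigma2 / n))

# One replicated dataset per posterior draw; the test statistic max|y|
# is sensitive to the tails the normal model cannot produce.
T_rep = np.array([np.abs(rng.normal(m, np.sqrt(v), size=n)).max()
                  for m, v in zip(mu, sigma2)])
T_obs = np.abs(y).max()
ppp = (T_rep >= T_obs).mean()   # posterior predictive p-value
```

If the observed statistic sits far out in the tail of the replicated ones (an extreme `ppp`), that is evidence the model misses the heavy tails of the real data.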
Not that I mean sociologists behave that way; it's just that when groups of people have group dynamics we don't like, we say, well, that's sociology. So I wrote a paper with some colleagues on Bayesian model checking. We wrote about posterior predictive p-values, which I'm not really into anymore, I don't like p-values so much, but at the time that's how I thought about it, based on an experience I had in medical imaging, where we were fitting a model to 20,000 data points and the model had 10,000 parameters. I wanted to know how many degrees of freedom to use for my chi-squared test. But although it had 10,000 parameters, they didn't have a prior; the parameters were constrained to be positive, and the maximum likelihood estimate was knocking up against the positivity constraints, so although it had 10,000 parameters it didn't really have 10,000 free parameters. So I was trying to come up with a statistical, mathematical, computational way to get an effective number of parameters for a model that was not normally distributed but had positivity constraints, so that the solution was on the boundary. There's a famous paper from 1954 by Herman Chernoff about how to do this in one or two dimensions, but it doesn't really work in 10,000 dimensions. I was trying to work that out, and we realized that if you just put a prior on it, then it's very easy; well, the answer might not be right, but at least it's very easy to get an answer, which moves you forward a little. So we had our posterior predictive p-values, and I was sort of tired of the Bayesians not wanting to check their models anyway, so I thought the non-Bayesians would love this idea. But actually they didn't, because it was Bayesian. In fact, I had this weird experience of non-Bayesians telling me, hey, I talked to so-and-so, Bayesian XYZ, and
they told me this isn't really Bayesian. And I'm like, why? You don't like Bayesian methods; how come you're letting this guy, who does stuff you don't like, be the authority? But that's how they were, so I actually had more success working within the community of Bayesians. That's yet another thing: you've sometimes got to work with people who speak your language even if you disagree with them, because at least you have a chance at dialogue. So anyway, to get back: the bootstrap, I told you, has two things; it proceeds on two tracks. Bayes also proceeds on two tracks, because what you need to fit the Bayesian model is a likelihood and a prior, but you don't actually need to have a data-collection mechanism. This is kind of subtle, but there are some famous paradoxes of Bayesian statistics that tell you things like: the stopping rule doesn't affect the likelihood, so it doesn't affect your posterior. You might have heard about some of these things. Basically, to do Bayesian inference you don't quite need to know how the data were collected, you just need to know the likelihood; but to do model checking, you do need to know how the data were collected. So you actually need more information to check the fit of a Bayesian model than to estimate the model. If God told you the model was correct, you could estimate it; but if you're not quite sure, because there are multiple gods in your universe, or the gods are malicious, or whatever, then you actually need to know more. So again, there's an incoherence to Bayesian inference. And maybe we can talk about a robot that would do inference and how the inferential robot would work; that's always fun. Everyone likes to talk about robots and AI; we've been talking about this for a long time, and now the robots actually exist, so it's an even better time to discuss these things.
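The stopping-rule point can be made concrete with a small sketch of my own (not from the talk): whether n = 20 coin tosses was fixed in advance (binomial) or you tossed until seeing y = 6 heads (negative binomial), the likelihoods differ only by a constant that does not involve theta, so the posterior comes out identical.

```python
import numpy as np
from math import comb

theta = np.linspace(0.001, 0.999, 999)   # grid over the coin's bias
prior = np.ones_like(theta)              # flat prior on (0, 1)

# The same realized data under two designs: n = 20 tosses, y = 6 heads.
n, y = 20, 6

def posterior(const):
    """Posterior on the grid: prior * likelihood, normalized.  The
    likelihood is const * theta^y * (1-theta)^(n-y); only const differs
    between designs, and it cancels in the normalization."""
    post = prior * const * theta**y * (1 - theta)**(n - y)
    return post / post.sum()

# Design A: n fixed in advance        -> binomial constant C(n, y).
# Design B: toss until y heads appear -> negative binomial constant C(n-1, y-1).
post_fixed_n = posterior(comb(n, y))
post_until_y = posterior(comb(n - 1, y - 1))

print(np.allclose(post_fixed_n, post_until_y))   # True: same posterior
```

A model check, by contrast, would simulate replicated datasets, and those simulations do differ between the two designs: that is the sense in which checking needs the data-collection mechanism while inference does not.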
So, Cantor. You know Cantor, right? Cantor is that guy, you remember, you've heard of him, philosophers? Okay. So all the action is on the diagonal; that's sort of life, that's where we always want to be. What's happening is: we fit a model, then we get more and more data, and we find problems with the model, and then we have to expand the model. So this axis is models and this axis is data. We expand the model, then we get more data, then we find it doesn't fit, so we expand the model again, and we have to keep doing that, right? We always need the step of model expansion, and we can never stop, because it's just the nature of incomputability that you can't move along the diagonal. You want to move along the diagonal, but you can't; all you can really do is move horizontally and take jumps. Because if you could move along the diagonal, you'd go along your diagonal until that didn't work, and then you'd need to find a new diagonal, right? That's the classic diagonal argument in the first place. So I'm arguing in different ways, by example, that statistics has this fundamental incoherence. Now, that wasn't really the topic of my talk, I just wanted to establish it, but since we're on the topic, maybe we can talk a little about an AI that could do statistics. I spend much of my time trying to figure out how to get money to pay people to write a computer program which then allows me to fit models, to do the applied statistics I want to do. So in some sense I'm trying to automate this: fitting models as I get more and more data. What do I spend my time doing? Either making it faster, so I can fit more data, or being able to jump down here and fit more
complicated models. Most of you who've worked on applied statistics problems will know that when you start a problem, you write down the model you want to fit, and it's more complicated than anything you can actually fit, so you have to make compromises right away. I want to reduce the number of compromises I have to make. But automating Bayesian inference is conceptually easy; it can be computationally difficult, maybe mathematically difficult, and there may be some problems you just can't automate because the computation is too challenging, but conceptually it's just doing big integrals. How do you automate the rest? That requires two steps: we need to automate model checking and automate model building. My thought is that the models we fit are kind of linguistic in form: we put distributions together like we put words into sentences, grammar is generative and all that, and statistical models are kind of like that too. There's a recursive aspect: we can put a model inside a model. So I could imagine, just as you can have computer programs that write English paragraphs, not by using silly Markov rules but by actually starting with concepts and generating phrases, that we should be able to generate these models. Checking is another story. I have this philosophy of Bayesian inference in which model checking plays a very active role, and I think a lot of Bayesians have a very fair objection to this, and the objection is: how does it work? How would you program a computer to do it? Because I'm saying I can do it, but I'm like a computer, right? So how did I program myself to do it? The idea is that I'm saying, oh yeah, the Bayesian way is:
you build a model, whatever model you can, you fit it to data, and then when you get enough data that the model doesn't fit, you go back and do something better. So how do you write a computer program to do that? Well, you kind of need a homunculus, right? You run the computer program, and then it checks. What would the computer program do? Here's the thing: I use graphs to check models. So what would my computer program, doing it all by itself, do? This is in the future, when we've all blown ourselves to bits, but there are some solar panels still operating, so the computer programs are living all by themselves. We're not around anymore; they're trying to figure out the world, there's nobody to program them, but they still want to do statistics, because what's the point of being alive otherwise? So they fit a model, and then they say, well, gee, the computer program has read the old statistics books, which said it's time to check the model, so what should we do? Well, hey, there are some old HP printers that are still working; they have a little bit of ink left, amazingly enough. So what we're going to do is make a graph, have it come out of the printer, grab the graph out of the printer with our robot arm, flip it around, put it on the scanner, scan it in, and then use our visual image processing to read that image and see if anything is surprising. Well, obviously that's ludicrous, right, to print it out? But that's kind of what we're doing. So how can graphical model checking, which is fundamental to how I do statistics, really be a fundamental process? Because if your computer program did it, it would need to pass it to a human. But I'll make the opposite argument, which I
think AI people are becoming increasingly comfortable with, which is: if AI is like a human brain, well, we do have different modules. We have an analytical module and a visual module and an executive function and all that. So I'm arguing that an AI that could do statistics would have these different modules. It wouldn't have a full model ahead of time and just push the buttons and do Bayesian inference, because Cantor's diagonal argument is math. This is deeper than anything; this is pure math, this is true. A computer, by virtue of being an AI, can't violate Cantor's diagonal principle any more than a computer-built rocket could go faster than the speed of light. It's just not possible. So an AI that could do statistics all by itself would really have to address this, and I feel like the only way it can address it is to have different modules, and an executive function, and to sometimes get things wrong, and all that. Now, that's a parochial view. What I'm really saying is that any AI has to be a little bit like me, you know? I'm like the old-style aeronautical engineers who said that any airplane would have to flap its wings just like a bird. I could be completely wrong, but that's how I see it. Okay, so that's really not what I came to talk about. Okay: Bayes, statistics, and reproducibility. You've heard a bit about the reproducibility crisis in science? So let me give a little background on it, not a personal background, because I think this is a subject which has been saturated in personality too much, but a statistical perspective, a scientific perspective. The reproducibility crisis has sort of three parts to it. The simplest part, but not really the most
important, is that people have done studies that do not reproduce. Someone will write a paper, and in the abstract they'll say a certain pattern is characteristic of the world; then later people will gather more data and not find this pattern. Now, for this discussion I'm going to skip the question of what it means to say a study got replicated or not, because that's a good question, what does it mean to replicate, everything's kind of noisy; but let's just say there are some cases where, unless you have a personal stake in the matter, it's pretty clear that something that was supposed to occur did not occur. And that has two consequences echoing backwards in time and one consequence moving forward in time. The two consequences going backwards are: first, of course, it calls into doubt whatever claim was made, and various sub-literatures. Without getting into examples, and I can give you examples if you want, you can have a study that was literally cited four thousand times, and presumably some of those citations were replications. They weren't really, though; they were what are called conceptual replications, meaning there were other studies where the results were deemed to be consistent with the earlier theories. So when people go and externally try to replicate a study like that and it doesn't replicate, it goes back and sort of erases an entire sub-literature. It's not that the sub-literature was never there, but it all gets reinterpreted, the same way that you can never watch a Cosby routine again in the same way. You can't look at these old studies, even if they were done in completely legitimate ways, the same way. Everything is poisoned by knowing what came after; it changes how you view all these things, and it's impossible to get it back. The second thing going back, though, is kind of
more interesting, which is that it makes you wonder: obviously something went wrong with the scientific method. Because what happened, speaking sociologically, remember what I mean by that, is we have the scientific method, which is some combination of hypotheses, deductions, rejections of hypotheses, non-rejections of other hypotheses, and also a social process, which involves some mixture of peer review, vetting, experimentation, loose ends, whatever. Putting that together created a high degree of subjective certainty in certain predictions that didn't occur. Now, when that happens, there are different interpretations. For example, after the 2016 presidential election, people said, well, look, something's wrong with polling, because the polls said Hillary Clinton was going to win and she lost. Whereas Hillary Clinton did win the popular vote; she got millions more votes than Donald Trump, so the polls weren't really so far off. As Lakatos would say, I'll do some exception handling; I can't remember if that was his term, but I'll say: actually, yeah, there were some state polls that were wrong, but that's because they didn't do the adjustments they were supposed to; if they had done it right, I'm telling you, the orbits would all have been exactly elliptical once you correct things appropriately, whatever. And I think that's actually kind of right. I don't think the election of Donald Trump means that the science of polling is wrong; I think it means that people had been overconfident and hadn't been adjusting their numbers well, and also that unexpected events can happen, and so forth. But the replication crisis is worse than that. With one or two studies that don't replicate, you can say something went wrong and people over-interpreted things; but there have been enough high-profile studies that didn't replicate that it does seem
that the scientific process, at least in some fields of science, such as biomedicine and psychology, has big problems. Now, going forward: I told you there are two things going back and one going forward. The thing going forward is, of course, that something has to change. People will have to change what they're doing, because they don't want to do things that have a high chance of not being replicated; at the very least, people are a lot less certain in their claims than they used to be. I'm going to talk about potential solutions to the replication crisis in a bit, but I haven't fully laid out the replication crisis yet, because I said part of it is studies not replicating, and that's not even the most important part. So what are the other parts? The other parts are statistics and substantive theory. The statistics part is that, if you look carefully, a lot of papers don't even do what they claim to do. For example, there was a paper claiming that male upper-body strength predicted certain political attitudes. Well, it turned out it wasn't upper-body strength: they had a bunch of guys put a tape around their biceps and measured that, so it was actually a study of men with fat arms. I think that's statistics, or at least research methods. There are a lot of high-profile papers where people didn't actually do what they said they did, where it just wasn't done as it was said, and in some ways you should know that without even needing the failed replication. What happened was that the lack of replication made people go back to a lot of papers and realize there were a lot of steps being skipped that shouldn't be skipped. There are three fundamental problems of statistics: generalizing from sample to population, generalizing from treatment to
control group, and generalizing from your measurement to your underlying construct of interest. All three of those are things people skip. Number two is the thing people often obsess about, like identification, but number one and number three are also very important, number three being actually measuring what you said you're measuring. Generalizing from sample to population is a little bit of a digression, but I feel like I have unlimited amounts of time here; I've only spoken for 37 minutes, though it feels like forever already, that's the funny thing. But I feel like I have unlimited digression, and it's all very clear in my mind where it all is; I love not using slides. So this is, as I say, a digression, but it's perhaps interesting and relevant to how we think about the world. When I was a statistics student, I was told there was this really cool thing: a randomized experiment followed by a permutation test, where you could establish a causal effect without knowing a mechanism, just by demonstrating that you have more effectiveness in the treatment group versus the control group. And there were two things that were cool: one is that you don't need to know the mechanism, and the other is that the people in your study don't need to be a random sample from the general population. They can just be 40 volunteers: you randomly give 20 the treatment and randomly give 20 the control. Now, both of those statements are correct, and yet often wrong. That is, mathematically it is correct that you can demonstrate a treatment's effectiveness without knowing the mechanism, and of course we've all heard of penicillin: there really are treatments that work where people didn't originally know why they worked. And it is true that you don't need a random sample of people to find out that penicillin works. However, not always. So let me take the easy one first. I'll
talk in a moment about the reason why you might need to know the mechanism, but let me talk about the other thing first. I thought hard about this because of something called the freshman fallacy. What happened was, a few years ago I criticized a study in various ways, but one thing I said was, well, one thing you can say in fairness about this study is that maybe it tells us a little bit about college students and Mechanical Turk participants, and we all care about them. I was making fun of the study, because it was a study of college students and Mechanical Turk workers, yet the paper claimed it was about people in general. So I got an angry email from a psychology professor who said, ah, you've made a mistake that my freshmen make in my introductory class, which is to think that you need a random sample from the population if you're doing causal inference. And from there I coined the term freshman fallacy, which is the fallacy of thinking that just because a freshman says something, it's wrong. Because his freshmen were right and he was wrong. If you are estimating a constant treatment effect, then you don't need a random sample from the population. But if your treatment effect interacts, that is, if the treatment effect is positive for some people and negative for other people, then it matters a lot, right? What if your population is almost all people like this, but your sample is mostly people like that? What if the treatment actually has a negative effect on most of the people in the population, but it helps some people, and it happens that the kind of people it helps are the kind of people who entered your study, maybe because they knew it would help? Well, all of a sudden, even if your study is completely kosher, randomized, blah blah blah, it's still not giving a good estimate of the
population average treatment effect. Well, of course, if you put the interaction in your model you'll find this; the trouble is that people don't always do that. And putting in the interaction gets into this whole mechanism thing, right? Because by understanding the mechanism, you might have an idea of what the treatment would interact with, and the interactions are crucial to understanding this. It's also difficult to estimate interactions. I can tell you a fun little piece of math here. If you know what sigma over the square root of n means, raise your hand. Maybe I won't go through it, because I hate to exclude the people who don't know what sigma over the square root of n means. I was going to give the little demonstration of why the estimate of an interaction has twice the standard error of the estimate of the main effect. Which is horrible, because interactions are presumably typically smaller than main effects. So, for example, if an interaction is half the size of a main effect, and an interaction is estimated with twice the standard error, that means that to estimate the interaction as accurately as the main effect you need sixteen times the sample size. Given that studies are usually just big enough to estimate the main effect, it's sort of impossible to estimate interactions. Yet interactions are crucial. So we are in a very horrible situation, needing theory, needing mechanisms, needing understanding of the world. I actually do think that one of the solutions to the replication crisis is for our statistical models to be more intimately connected to our understanding of the world. That's part of it.

I'd like to tell you something I've noticed: every study I've seen is both too large and too small. What do I mean by that? Too small is obvious: you never have enough people in your damn study.
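As a quick check on the twice-the-standard-error claim, here is a small simulation of my own (not part of the talk; all the numbers are made up for illustration). With N people split into four equal cells of a 2x2 design, the main effect compares two half-samples, while the interaction is a difference of differences across four quarter-samples, which doubles its standard error:

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma, reps = 400, 1.0, 20_000

main_effects, interactions = [], []
for _ in range(reps):
    # four cells of N/4 people each (treatment x subgroup), pure noise
    a, b, c, d = (rng.normal(0, sigma, N // 4) for _ in range(4))
    # main effect: treated half (a, b) versus control half (c, d)
    main_effects.append((a.mean() + b.mean()) / 2 - (c.mean() + d.mean()) / 2)
    # interaction: difference of the treatment effects in the two subgroups
    interactions.append((a.mean() - c.mean()) - (b.mean() - d.mean()))

se_main = np.std(main_effects)    # theoretically 2*sigma/sqrt(N) = 0.10
se_inter = np.std(interactions)   # theoretically 4*sigma/sqrt(N) = 0.20
print(se_inter / se_main)         # close to 2
# an interaction half the size, with twice the standard error, needs
# (2*2)^2 = 16 times the sample size for the same precision
```

The factor of 16 follows directly: halving the effect and doubling the standard error each cost a factor of 4 in sample size.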
It's never big enough. Whatever you're trying to estimate, if you're lucky, is just barely statistically significant, or "about statistically significant" if you don't like that term. Your inferences are typically just barely conclusive. Whatever you want to study, you get your estimate and your uncertainty, and if you're lucky you have enough data that you can sort of say: yeah, I can pretty much estimate this effect. But it's rare that you have extra data; it's rare that you feel you have data to burn. Even when people do have data to burn, when they're sitting on data at Facebook or wherever, they don't really have data to burn, because usually you'll want to look at variation: how do things vary by subgroup, or from month to month? If I have a survey with a hundred thousand people in it, which I sometimes do, I'm not trying to estimate the support for Donald Trump to three decimal places. I'm trying to estimate how things are changing, and I don't really have enough data. So in that way our studies are always too small.

But I've also found that when I work with real people, the studies are always too large. Meaning: you enroll people. I'm thinking of studies I've done with colleagues in social work or public health. You decide you need, say, 360 people in your study. Your funding, your power analysis, blah blah blah: you figure out you need 360 people. So you have the meeting, and people are excited about how to recruit people into the study: oh yeah, I talked to this guy, and he knows where we could find some people who would want to join. And then somewhere around person number thirteen or person number fifteen, the study becomes kind of boring, and you're already waiting for it to end so you can write the damn paper, write the next
grant. Because, when you talk about studies in public health and social work, the principle behind most of these studies is: we already know this is going to work, and we're just doing the study to prove it to those knuckleheads in Congress so they'll fund us for real. That's how it is. It's not like a complicated three-drug interaction; this is more like: people need social services, and we're going to provide them. We're going to help people out. A lot of public health interventions are like this, very direct, and of course it's going to work. And so these studies that go on and on are just painful, because to the extent that you learn anything from them, it's qualitative, and to the extent that the quantitative analysis is useful, it's maybe as a check on more extreme claims. At least you realize that if you try to make a general claim, you should try to back it up. And the design is relevant because, to the extent that the people in your study are like a random sample of somebody, and the treatment people are kind of like the control people, to the extent that you're being honest in your qualitative work, it keeps you under control. You're getting qualitative inference about a representative sample, and as long as you're not too internally biased, you can learn something. But it's kind of weird: what we do is so different from what's in the textbooks, including my own.

Someone asks: so the quantitative analysis is not necessary, it doesn't convey any real information, but it's used for the purpose of persuading somebody? Well, yeah, in a sense. It's not even statistics; it's the scientific paper as a means of rhetoric. I think it depends. Take social
services research. I'm not an expert in social services research, but my impression is that that's often how things are viewed: we have an idea that's got to work, and it's just a matter of proving it. In social psychology, my impression is that a lot of people have that same attitude, that we're going to prove that things work. The difference is that in a field that's kind of clinical, like social services research, where the studies are imitating clinical medical trials, there tend to be clearly defined endpoints. So there's not as much margin for people to let their theories be flexible. If I do an experiment in social services research and it doesn't really seem to work, then all I can do is say it didn't seem to work. I can give alibis, I can give excuses. But in social psychology, or in a lot of other fields like experimental economics and political science, I've seen many examples where someone does something and it doesn't work, it actually contradicts their initial hypothesis, and they just explain away the results. It doesn't work that way in social services. You don't say: well, we thought our treatment was going to work, and it seemed to have a negative effect, so we're writing a paper saying our treatment hurts people. That's not what people would do. They would just say: well, we must have done something wrong, because it's supposed to help. Whereas in an open-ended research field, if something goes the wrong way, they'll just say it went the right way, because the theory is flexible. I think in social services you have sort of two goals. One is to improve a very particular, clearly defined outcome, and the second is, in some sense, to promote your method, and those really go together. In academics, arts-and-sciences kind of academics, the goal is to make a breakthrough or a discovery, and it's
a lot easier to make a discovery than to show that something works. So that was a long answer, but I guess I do think it's all sort of a branch of rhetoric; it just has different flavours depending on what your goals are.

Okay, so now I'm getting closer to the meat of my talk, or at least of my abstract, which is the connection between statistical philosophy and reproducibility. In statistics we have two traditionally dominant philosophies, the Bayesian philosophy and the frequentist philosophy. Maybe things are changing, because now we have machine learning and artificial intelligence, and I'm not sure what their philosophy is, but I think it's probably slightly different; we could talk about that a little later. The Bayesian philosophy is pretty clear. You have a generative model for your data, you have a prior distribution, and you evaluate procedures based on how well they do when you integrate over the prior. Nobody ever says a single posterior interval is correct. Maybe it represents your subjective belief, or your model-based betting probability, or whatever, but a single Bayesian probability can't be evaluated from the outside. Bayesian methods can only be evaluated using calibration, and calibration is done by integrating over the prior. That's the definition: Bayesian methods are optimal if you integrate over the prior. If you integrate over a different distribution, they're no longer optimal; you'd have to use a different prior. So my take on this is that reproducibility, or replication, is central to Bayesian philosophy, or Bayesian theory, because a Bayesian procedure, to the extent that it has external validity and checkability, is calibrated by averaging over this prior. So it's a set of replications. Now,
I speak as someone who uses Bayesian inference, and I sort of think of myself as a Bayesian, and I like this idea. I've never been happy with the idea of asking how well my procedure works if I apply it to many different problems with the exact same value of theta, because that doesn't make sense to me: I'm not going to see a lot of problems with the exact same value of theta. But it does seem reasonable to say there's some distribution of problems that describes the problems I see. I have a story here, of the sort that I imagine philosophers would like. A statistician is sitting in a windowless room, and the room has two slots. There's an inbox, through which the statistician is given food, water, data, and problem descriptions. And there's an outbox, through which, in addition to dirty laundry, human waste, and so forth, the statistician sends out inferences. So you're the statistician: you get your data, you type the data into the computer, you run your regression or whatever it is you do, and you send the inference out. How do I know if I'm performing well? I want my procedures to perform well when integrated over the problems I'm given. And to make it even simpler, suppose I'm a statistician who can only do one thing. I can only do multiple regression with this one prior, say; whatever it is, I can only do one thing. Sometimes I get data and I refuse: these data are not appropriate for my method, don't give them to me, and I push them back out. But for anything I'm willing to analyze, the correct Bayesian procedure is to use as my prior the probability distribution of the true parameters for all of the problems that get sent to me. Now, of course, I don't know what that distribution is, although if it's a sequential game I can gradually change my prior.
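Here is a sketch of what calibration by integrating over the prior means for the statistician in the room. This is my own illustration, not from the talk; the normal-normal setup and all numbers are assumptions. Problems arrive with parameters drawn from a "true prior"; the statistician who analyzes them with the matching prior makes calibrated probability statements, and the one using a much wider prior is systematically overconfident:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
reps = 200_000
theta = rng.normal(0, 1, reps)   # the "true prior" over problems sent through the slot
y = rng.normal(theta, 1)         # one unbiased, noisy measurement per problem

def stated_prob(s, y_obs=1.0):
    """Stated P(theta > 0 | y) under an analysis prior N(0, s^2)."""
    v = 1 / (1 / s**2 + 1)                 # conjugate posterior variance
    return norm.cdf(v * y_obs / np.sqrt(v))

sel = np.abs(y - 1) < 0.1                  # the problems where we happened to see y near 1
actual = (theta[sel] > 0).mean()           # how often theta really was positive there

print(stated_prob(1.0))    # matched prior: about 0.76, calibrated
print(stated_prob(100.0))  # near-flat prior: about 0.84, overconfident
print(actual)              # empirically about 0.76
```

The wide-prior statistician claims 84% where the long-run frequency, averaging over the problems actually sent in, is closer to 76%.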
Eventually it will seem to match: if I get enough information, I can learn the prior. I can solve the inverse problem and figure out what the prior is. So it should be possible, and you could imagine that that's what a Bayesian statistician does. The reason I like this model of the world is that it emphasizes that there actually is a true prior. The true prior is not a property of the universe, and it's not a property of the statistician's subjective brain; it's something in between. It's a property of the set of problems that are given to the statistician, or the set of problems the statistician is willing to analyze. The true prior is really just the distribution over which, if you integrate, you get calibration, and then optimal procedures. I went a long time on that, I spent ten minutes on that, because some Bayesians have this propaganda where they say that Bayesian inference is supposed to be correct for every single instance. They'll say: frequency probability is fine if you're talking about a die that you're rolling, because you can really roll a die a long time before it breaks. But what about things that happen just once, like nuclear war? What's the probability of another nuclear war? We've only had one nuclear war in human history so far, so we don't have a lot of evidence. What's the probability we're going to have another one in the next five years? How do you answer a question like that? Well, the applied answer is that you have to model it; you have to connect it to the probability of other militarized interstate disputes, and so forth. But the point is, the Bayesian will say it's only going to happen once, and I would say: actually, no. Even if it only happens once, mathematically these decisions are based on a model for which there is a distribution of things. So I think that replication is
inherent to Bayesian inference. Another example I like to give: you roll a die, but it's the sort of die that rolls once and then a little time bomb inside it explodes, so you only get to roll it once. Even though that's true, somehow you wouldn't model it any differently than a die that you could roll over and over again.

Someone asks: why should the process be stationary? Oh, I'm assuming it's stationary. You're right: if it's not stationary, one would have to model it as a function of time, and then time would be an x variable that goes into the model, so it would be a model conditional on time. I don't think that mathematically changes anything important. But that reminds me of one of my favorite examples, one of these paradoxes of how Bayesians deal with sequential data collection. Someone has an experiment, and what they do is keep gathering data on the treatment and the control until one is statistically significantly better than the other, and then they stop. So how do you analyze that? I could tell you another time, but the easy answer is that in a situation like that you should allow your treatment effect to vary over time, because it turns out that these sequential designs can give you inferences that are very non-robust to departures from stationarity. When you're doing a very simple randomized experiment with fixed n, you don't really need to worry about time variation in the treatment effect. It's actually very related to the problem I pointed out earlier, the one connected to the freshman fallacy: if you do a randomized experiment, but it's not on a random sample of the population, your inferences are going to be non-robust to interactions.
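As an aside, the stop-when-significant design just described is easy to simulate (a hypothetical example of my own, not from the talk, with made-up numbers): even with a true effect of exactly zero, testing after every new observation will eventually cross the significance threshold far more often than the nominal 5%.

```python
import numpy as np

rng = np.random.default_rng(2)

def stops_at_significance(max_n=1000, first_look=10):
    """Gather data with TRUE effect zero; run a z-test after every observation."""
    x = rng.normal(0.0, 1.0, max_n)
    n = np.arange(1, max_n + 1)
    z = np.cumsum(x) / np.sqrt(n)      # z-statistic after each new data point
    return bool(np.any(np.abs(z[first_look - 1:]) > 1.96))

rate = np.mean([stops_at_significance() for _ in range(2000)])
print(rate)   # well above the nominal 0.05 false-positive rate
```

The "stop at significance" rule is guaranteed to find something if you wait long enough, which is exactly why inferences from such designs lean so heavily on the stationarity assumption.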
Similarly, if you do sequential data collection, your inferences are going to be non-robust to time trends in the treatment effect. So yes, sure, you could model that. When you're doing the modeling, you never assume you're going to see another instance of theta equal to exactly ten; but you are assuming that you're in a familiar situation, which is what makes it possible to make sense of the die with the bomb in it. You can bring this into the nuclear war example: you're making an assumption that the knowledge you have, based on past experience, is still somehow connected to the present. Yes, you need stationarity, or if you don't have stationarity, you need a kind of meta-stationarity: you need a model for the amount of non-stationarity there is. An example that comes up in my work is drug development. We have data on one drug, and now we're developing a new drug. Of course I could just analyze the data on the new drug, but then my inferences are very uncertain, so I want to use partial information from the old drug. So I'm going to say that the parameters are different, it's non-stationary, but I have a model for how different they are. In practice it's data or model: if you don't have the data, you need more model to deal with it. Conceptually, the role of the model is to bridge between the new question that you don't know the answer to and the old questions that you have data for.

Someone asks: do you have a heuristic for when you wouldn't feel comfortable doing anything, when you don't have enough background? Well, this isn't something I've worked on, but I can give the hard-line answer, which I kind of believe, which is: if you need to make the decision, then you bite
the bullet: you fit the model, you just write down clearly what your model is and what your assumptions are. With nuclear war it's hard, because nuclear war is the outcome and you want to reduce its probability. (Is it time to stop? Not yet? Okay. Well, nice to meet you all.) I mean, this is like anything we do. If you're really hungry, and you can find deer, you'll shoot a deer; if you can't find deer, you'll shoot squirrels; but is there some point at which you're just going to stop hunting and start searching for another form of food? At some point the problem is so hard that maybe it's not worth trying to fit a big model, because you'd put in a huge amount of effort and get nothing out of it beyond what you already have in your head. That has to do with how much you know. A successful model will integrate more data, and people get creative. For something like nuclear war, to stay on that example, people can do computer simulations, they can run little war games and see what it takes for people to push the button, they can look at situations where groups of people could die, where people are actually risking their lives; they can do all the things they do as social scientists. I'm not saying they have an obligation to put all the data together and make a big model. I just mean that if you are doing a Bayesian analysis of it, then I would think that implicitly it corresponds to some level of replication.

Okay, so that's the Bayesian half. The other half is the frequentist half, which is really easier to state, because frequentism is about
frequencies. In the frequentist philosophy of statistics, you evaluate a procedure over some reference set; probabilities are defined over that reference set. We have these examples like: what's the probability it's going to rain tomorrow? There's no frequentist answer to the question of the probability that it's going to rain in New Brunswick, New Jersey tomorrow, because that's a unitary event; it doesn't have a frequency. So you define a reference set. You say: we're going to consider all days of, whatever today is, January 30th, in cities in New Jersey. Or you say, well, that's not quite right, because there's some climate change, so we're going to fit a model, and so forth; but then the residuals have a frequency distribution. The point is that for frequency probabilities you have to have some reference set, otherwise the probability is not defined. But it's not stupid: you don't really need a set of identical things; that's just the simplest formulation. The same way that priors don't have to be fixed and can change over time, the reference set can be defined more abstractly, as a probability distribution, but it has to be defined over some potentially infinite sequence of events. So I think there is a real unity between the Bayesian and frequentist approaches in that way, one that I don't think is always recognized. And for me to tie it to replication is, in some ways, just a cheap trick, because everyone's talking about replication now. But maybe it's not a cheap trick; maybe there's something to this idea of replication being fundamental to statistical theory as well as to science.

I'll say two more things and then I'll either stop or pause. I have more I could say, but I think it's probably time for you all to talk. I could talk about my solutions to
the replication crisis, but I feel like that's a separate topic for this audience. So the two things I want to conclude with. I was at a conference here in New Jersey, across the river, a few years ago, about Bayesian and frequentist probability, and someone stood up, maybe it was Brad Efron, and said: well, I think the Bayesians should be a little more frequentist, and the frequentists should be a little more Bayesian. And I stood up (another Feynman story, I apologize) and said: actually, I think the Bayesians have to be more Bayesian, and the frequentists have to be more frequentist. A lot of problems in Bayesian statistics come about because Bayesians aren't Bayesian enough, and a lot of problems in classical statistics come about because people aren't being frequentist enough. (Were you at this? So you remember I said this? Okay. Well, whatever; it's a better story when I make the other person look conventional and myself look bold and clever, flipping it around and all that.)

I had one example for each. The Bayesian example: I have an unknown parameter, let's call it theta, and we don't know what theta is, so I'm going to give it a uniform prior from minus infinity to infinity. Now don't get all worked up about that: make it uniform from minus A to A, where A is some number that could be very large, so it's a proper prior; I'm not trying to paradox you on this one. Say uniform from minus a thousand to a thousand, on some reasonable scale. Then you get data. You have an experiment, and it's complicated, you gather some data, you run a regression, but when it all comes down to it, what you do is get an unbiased estimate of theta. Call it theta-hat; it's normally distributed with mean theta and standard error one on the appropriate scale. So I could say it's like an
observation, but it's really the result of an entire experiment. And I get a result: I observe y equals one. So it's one standard error away from zero. As an applied statistician, for real, I would say: well, this doesn't look like much going on; it's consistent with nothing going on at all. I guess I am running a hypothesis test; I'm saying this experiment doesn't really give me any useful information, the data are consistent with zero effect. But I could do the Bayesian inference. I wrote a damn book on Bayesian analysis, and it says that, according to Bayes' rule, my prior is uniform and my likelihood is centered at 1 with a standard deviation of 1, so my posterior is normal with a mean of 1 and a standard deviation of 1. Which means there's an 84% posterior probability that the treatment effect is positive, that theta is greater than 0, that the treatment works. So this is kind of weird. I have something which is pretty much the definition of noise, and it's leading me to say that I'm 84 percent sure this treatment is effective. How can that be? How can I go through my life like that, jumping around like a puppet on a string? Complete noise is making me bet 5 to 1 in favor of this treatment. What does that mean? And what if it were two standard errors away from zero? Then it would be statistically significant, and I'd be betting more than 40 to 1 in favor, which doesn't seem right either. So there's something wrong, and the problem, in the way I've framed it, has to be with the prior. The big wide uniform prior just isn't appropriate, because the very fact that I framed this problem in terms of whether I can reject a zero effect implies something: not that zero itself is so likely, but that it's likely things will be kind of near zero. That's a real possibility.
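The arithmetic here is simple enough to reproduce (my sketch, with the setup from the example): a flat prior and an observation of 1 with standard error 1 give a posterior of N(1, 1), and the probability mass above zero is what drives the betting odds.

```python
from scipy.stats import norm

# flat prior, unbiased estimate y = 1 with standard error 1  =>  posterior N(1, 1)
p = 1 - norm.cdf(0, loc=1, scale=1)
print(p)              # 0.841...: "84% sure the treatment works"
print(p / (1 - p))    # about 5.3-to-1 betting odds, from pure noise

# same setup, but an estimate two standard errors from zero  =>  posterior N(2, 1)
p2 = 1 - norm.cdf(0, loc=2, scale=1)
print(p2 / (1 - p2))  # about 43-to-1 odds, from a bare significance threshold
```
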
And I haven't really incorporated that into my prior. So it's kind of weird; the problem is in the framing somewhere. I think the point is that, for the Bayesian who uses the flat prior, in general statistical practice it's really not a good idea to start betting five-to-one odds on things that are one standard error away from zero. That's a way to lose money, I suspect a fast way to lose money; or a way to make money, if you just take the opposite side. So the problem is that the Bayesian is not being Bayesian enough: the Bayesian is not thinking about their prior.

Now, the frequentists not being frequentist enough: that one's very easy. Standard practice is, you do a study, you get an estimate, and if the estimate is more than two standard errors away from zero, you report it and publish it. If it's less than two standard errors away from zero, you shake the data until you get something that is two standard errors away from zero, and you publish that. If you do that consistently, you will overestimate your treatment effects, because after all, you can only report things that are at least two standard errors away from zero. It's biased no matter what, but if the true value of the treatment effect is less than two standard errors, it's deterministically biased, not even just biased in expectation. So the problem, in some sense, is not so much that these frequentists are not being Bayesian enough; it's that they're not being frequentist enough. They're not actually looking at the frequency properties of their method, because they're not carefully looking at their replication process, they're not modeling what they're doing, so they're wrong. Similarly with various abuses of p-values, I have a funny story: I criticized some paper for the p-value it reported.
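Before the story, the publish-only-if-significant rule just described is easy to simulate (a hypothetical example of my own, with made-up numbers): a small true effect of 0.2 measured with standard error 1, published only when the estimate exceeds two standard errors.

```python
import numpy as np

rng = np.random.default_rng(3)
tau, se, reps = 0.2, 1.0, 100_000        # small true effect, noisy studies
tau_hat = rng.normal(tau, se, reps)      # one unbiased estimate per study

published = tau_hat[np.abs(tau_hat) > 2 * se]   # only the "statistically significant" ones
print(published.mean())         # around 1.1: a roughly fivefold exaggeration of the true 0.2
print((published < 0).mean())   # and a sizable share of published estimates has the wrong sign
```

Conditioning on significance turns an unbiased estimator into a badly exaggerated one, which is exactly the frequency property the published-results filter hides.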
I said: you know, that was not a pre-registered study, and I think, had the data been different, they would have done a different analysis. And the people came back to me and said: how dare you criticize our study for something we didn't do? They were really indignant: you're not criticizing us for what we did, you're saying that if we'd seen something different, we would have done something different; what does that have to do with anything? Well, maybe it doesn't have anything to do with anything, but if you want to compute a p-value, it does, because the definition of a p-value has to do with what you would have done had the data been different. So by writing down a p-value, they were making a claim about something they would have done. And it's your problem: you didn't need to put a p-value in your paper, I never made you put a p-value in your paper, you put it in, so you're making a statement about what you would have done. (Again, this is horrible, I'm telling you too many stories that make me look good. I apologize.)

Okay, so in summary: there are some things out there that I don't really have the answers to, but I do think that reproducibility is not just the name of a river in Egypt, or just a current slogan, but also fundamental to statistics. So when we're thinking about things like the replication crisis, we shouldn't just think about various high-profile examples and silly studies; we should think about this as fundamental. And I think at some point we're going to move, I don't know exactly where, but away from things like p-values, toward statements that are more directly about replication. People have tried to do things like saying the p-value is the probability of a successful replication; that's not really right, and I think we have to compute different things. But the point is that replication is not only fundamental to science,
it's fundamental to statistics, and it can maybe help us understand our methods a little better. So now you all can talk.

Someone asks: going back to what you said about Bayesian priors, what should we do if we don't really have any prior data? Do we just make an assumption? I think the answer is transparency. If I don't have a good prior, then I would put down a prior that represents a certain model of the world. I'm a big believer in putting it down and then letting other people criticize it. It's a little bit like this trick: if you have a collaborator who's not doing any work, you put their name on the paper, and then, if they have any decency, they won't want their name attached to something crappy, so they'll start editing and fixing it up. It doesn't always work. Once I did that with someone and he said: oh, looks good. Wait: I wrote the paper, I put your name on it, and you said it looks good? Don't you understand the unwritten rules? Whatever; it wasn't written. In my paper on this subject, Beyond subjective and objective in statistics, we gave a couple of examples. I think you put the explicit prior down, and it might be sort of stupid, but you use it and then go from there. That's my belief.

Someone asks: if I try to repeat an experiment, can I use whatever prior was used in the earlier paper or experiment? Yeah, I guess you would, in the analysis, unless you think there's a problem with it. Then there's the general question of how exact a replication should be, because there's no such thing as an exact replication: it's done on new people, and times have changed. Strictly speaking, with a pre-registered replication, the pre-registration part is more important than the replication part. You say exactly what you're
going to do, and then you do the analysis. So I don't think you need to use the same prior that was used before, but if it's different, you should probably say why it's different, the same way that if part of your substantive theory were different, you would say: we used to think the mechanism was such-and-such, but now we think something else, so maybe we changed the experiment. And I don't think we should go around replicating things we don't believe just to satisfy some rule. There's no law that you have to replicate anything. If there's an old paper in the literature, you can believe it or not. Typically we replicate things because we feel we're going to learn something from the replication itself.

On pre-registration: pre-registration, for people unfamiliar with it, in its strictest version, is the idea that you write the paper, including your design and your analysis, before you collect the data, and you get it accepted into a journal before you collect the data. In medicine they do it a lot. I think you have to pre-register clinical trials with the Food and Drug Administration. Medical journals don't quite do the strictest version, but, for example, a journal like The Lancet will require that a clinical trial have already been pre-registered, and if you do an analysis that wasn't in the pre-registration plan, you're supposed to say: we did supplemental analyses, and so on. The pre-registration might not be published in the journal, but it will have been archived, registered with the FDA. I don't know how much they're relying on people's honesty, but there are a lot of transparency rules, and relying on people's honesty isn't that bad, because at least you're forcing them to lie outright, which is a bigger
threshold, and a lot more people will shade the truth than will actually lie. In psychology, some journals do it. In political science it's not going to be so common, because we often use available data; you're not going to say, I'm pre-registering the study, and first we're going to create six more wars and then gather the data. That's ludicrous. But it works in economics, where you can do experimentation. And what about internally? Suppose you're Google or Facebook. Google does A/B testing as a platform: if you want to put an ad on Google, you can give Google four versions of the ad, and it will A/B test it for you, or I guess if it's four versions, A/B/C/D test it, and run the entire analysis. Now, do you want to pre-register that? That's kind of interesting: it takes journals out of it. Forget journals; just say you're trying to make money. Should you do a pre-registered study or not? It's not clear. On one hand it's tying your hands; on the other hand, the advantage of doing the pre-registered study is that Google will do it for you for free. It's funny: when you change the context, it changes all of the calculations. Instead of it being all about who you trust, and about publication and delay, it becomes these practical things, like Google as a platform; so yeah, I'll do a pre-registered study, because I don't want to waste my time analyzing. My guess is that pre-registration is going to happen most quickly in situations like that, where it's economical, and those aren't the things we hear that much about; or with the FDA, because there's a law. You know, I don't care that much. I'm not a social psychologist. I care a little about psychology, but I don't care about it so much; if they don't feel like pre-registering their studies, in some way that's their problem.
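[Editor's note: the platform A/B test described here amounts to comparing click-through rates across ad versions. A minimal sketch in Python, with made-up counts and a plain two-proportion z-test standing in for whatever analysis the platform actually runs:]

```python
import math

def ab_test(clicks_a, n_a, clicks_b, n_b):
    """Two-proportion z-test comparing the click-through rates of two ad versions."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)        # pooled rate under H0: no difference
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return p_a, p_b, z

# Hypothetical impression/click counts for two ad versions
p_a, p_b, z = ab_test(clicks_a=200, n_a=10_000, clicks_b=260, n_b=10_000)
print(f"CTR A={p_a:.3f}, CTR B={p_b:.3f}, z={z:.2f}")
```

[The pre-registration question then becomes whether you commit to exactly this test, on exactly these versions, before the counts come in.]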
I sort of use psychology from the outside, because it's a great series of examples which helped me understand statistics. Psychology can gather data so cheaply that they can replicate; that's why you hear about replication in psychology: it's just so easy to replicate these studies. Biology also, but I guess it's less accessible. [Audience comment.] Oh yeah, you're right. So in some sense there are a couple of different kinds. One type of pre-registration is people pre-registering replications of their own research. Another is a sort of adversarial version, where someone already published something and the new group says, I don't believe you, so I'm going to show you by pre-registering. Or it might be more agnostic: I don't know whether to believe you, so I want to pre-register. That's presumably always going to be some subset of papers. The thing is, replication is kind of easy to do compared to coming up with your own study. Traditionally, the reason people don't replicate is that you don't get credit for it: if you do your own study, you can get published in a good journal and you can get an academic job, but if you replicate, then traditionally you're not going to get a journal article out of it. So that's why they needed to organize these things: if they didn't get together, no one would do it. But that's maybe really a different topic. [Audience question, partly inaudible, about transparency: reporting the search one went through to get something significant, and appending to the paper an account of all the things tried before finding something publishable.] Yeah, sure. I think... I guess the
question, but I guess my question would be: do you think we'll get there? Well, yeah, I don't know. I think people should move in the direction of publishing more. There are certain practices that we could really avoid, like the practice of going through, doing a lot of things, and circling the things that are statistically significant and isolating them. People don't understand this, so... you have a question, so I'll stop. [Audience question.] Well, I think it's in some sense even a little worse than that. People see textbooks and published papers, and everything seems like a success story. So people do an experiment, and I think, with the typically small sample sizes of psychology experiments, the probability of having a "success" is probably something like 6%. It would be 5% if it were all noise, so maybe it's really 6% or 7%, some small number like that. But I think people have the subjective view that the chance of getting a success is somewhere around one in three or one in two. Everyone knows they have failures: if you're a psychologist doing experiments, sometimes you do a whole experiment and it doesn't work, and that's too bad; then you do something else and it works. Let's say a third of the time you feel like you have this big breakthrough or great success, and you publish it, and you feel like you paid your dues, because maybe two-thirds of the time you shelved it. But actually, most of those one-thirds themselves are really not replicable. It depends on where you're working, but a lot of those could have problems too. So that's another aspect of it, which is about people not being frequentist enough: you come in with the expectation of success, and then you keep getting it, and so then you think things are going okay.
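[Editor's note: the "5% if it was all noise, maybe 6 or 7%" figure can be checked by simulation. Everything here is illustrative — a true effect of 0.1 standard deviations and n = 20 are stand-ins for a small, noisy study:]

```python
import random
import statistics

def sim_success_rate(true_effect=0.1, n=20, sims=20_000, seed=1):
    """Fraction of two-sided z-tests (known sd = 1) reaching p < 0.05,
    i.e. |sample mean| * sqrt(n) > 1.96, when the true effect is small."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        xs = [rng.gauss(true_effect, 1) for _ in range(n)]
        z = statistics.fmean(xs) * n ** 0.5   # z-statistic for the mean
        if abs(z) > 1.96:
            hits += 1
    return hits / sims

print(sim_success_rate())
```

[With these numbers the long-run "success" rate comes out around 7% — barely above the pure-noise 5%, and nowhere near the one-in-three that researchers subjectively expect.]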
So in that sense, actual external replication was kind of needed, because no amount of mathematical reasoning could convince him. There was a study that somebody did where I did two quick calculations suggesting that, to really have a good chance of estimating any effect, you would need a sample size a hundred or a thousand times bigger than what he had. It's not ironclad mathematical reasoning, because it did depend on the effect size, but it was pretty much as close to ironclad as you're going to get, and he just didn't believe me. I didn't have a long conversation with him, but I wrote something, and he didn't buy it, because he found that he was getting successes: he was looking at different things in different papers, going through the garden of forking paths and all that. In some way it's hard to convince people of that, because there's always someone saying you can't do something mathematically, and they could be wrong; I could be wrong too. So the failed replications have a real value, just because they're more direct than that. And it is hard. In policy analysis it's hard, or in education policy, because if you have a brilliant idea, usually somebody else has had it too, and when you go to a school to try your brilliant idea, you're comparing your treatment to the control, which is best practice, and it might be that they're already doing your idea as part of best practice. It's like that in medicine too; in fact, in medicine it's typically considered unethical to study large effects, because if your effect is that big, you shouldn't be giving the control to anybody at all. So really, classical statistical theory about getting statistical significance, and Bayesian theory with the flat prior, both of them don't really fit the modern world, because
they're both off: the Bayesian is wrong because it's based on this flat prior, and in the modern world most effects are small, peaked near zero; and the classical approach doesn't really work either, because getting statistical significance doesn't tell you much if your signal-to-noise ratio is low. So, getting back to the selection on statistical significance... [Audience: Is it that you submit your methods, it goes to peer review and gets accepted or rejected based on the methods, and then, having been accepted, you run the experiment and they publish it?] Yeah, it's already accepted. [Audience: But wouldn't there be publication bias here too?] Yeah, there's always going to be some selection bias; you're removing one source, but not all of it. I think in some sense the ideal world would be one where everything gets published somehow. Publication as a threshold creates problems; that's another system that seems a little bit broken. Now, I have to say, it can be good for our work: if it's hard to get something published, then you work extra hard to make it readable, so that's good. You also work extra hard because reviewers can catch errors. This happened to me just recently: reviewers caught errors in a paper; otherwise I would have published it, and people would have, based on my reputation, gone and thought it was right, so I'm really glad. It's kind of random whether reviewers catch errors: there were three reviewers and only one caught the error, so we were lucky. So that's great. On the other hand, there are a lot of problems with publication thresholds, so I don't really know. Again, if you think about these models, like, I'm running ads on Google and I want to make more money, that's a simpler framework.
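[Editor's note: the flat-prior point has a simple closed form in the normal-normal model: with a flat prior the posterior mean is just the raw estimate, while a prior that says "most effects are small, peaked near zero" shrinks noisy estimates sharply. The numbers below are hypothetical:]

```python
def posterior_mean(y, se, prior_sd=None):
    """Posterior mean for a normal estimate y with standard error se.
    prior_sd=None means a flat prior (posterior mean = raw estimate);
    otherwise a normal(0, prior_sd) prior shrinks the estimate toward zero."""
    if prior_sd is None:
        return y
    w = prior_sd**2 / (prior_sd**2 + se**2)   # precision-weighted shrinkage factor
    return w * y

# A noisy, barely "significant" estimate: y = 2.0, se = 1.0
print(posterior_mean(2.0, 1.0))                 # flat prior: 2.0, likely exaggerated
print(posterior_mean(2.0, 1.0, prior_sd=0.5))   # effects-near-zero prior: 0.4
```

[A "significant" z = 2 estimate doesn't move at all under the flat prior, but under the small-effects prior its posterior mean drops to 0.4 standard errors — which is one way to see why significance alone tells you little at low signal-to-noise.]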
Or with the FDA: we're trying to decide which drugs to approve, and that's a fairly clean set of problems that we can think about a little more. When you ask what papers political scientists, economists, and psychologists should be able to publish, then you get into these uncomfortable questions, like why we are studying these things in the first place. [Audience: getting the grant is the endpoint, obviously.] Oh no, it's not completely clear. Things do go on arXiv and get publicized without ever getting vetted, because they seem cool. Things get published in PNAS without getting vetted, and then they get promoted. So it's not clear. I agree that getting rid of traditional publication, blowing it all up and starting over, isn't necessarily a good thing. I wouldn't say publishing everything accomplishes nothing; I think it would change things, but it isn't clear how good or bad the change would be. Publishing everything might put even more of a premium, perhaps, on making exaggerated claims and hype and all that. You'll see arXiv papers get promoted: when people start talking about an arXiv paper, it's typically because it makes some big claim, and it might be kind of bogus. PNAS has the same problem, because it's kind of a non-refereed journal. Refereed journals make mistakes too, but maybe at least you have to be a little more moderate. It's not completely clear. You know, it's funny in computer science, because a lot of times I feel like some people build machines built to produce NIPS papers. That's the goal. If you're a professor, you spend lots of time grading papers, so
it's always the first thing on your mind. I have friends who are computer science professors, and their number one goal is to have all their students have a NIPS paper, a paper in every conference, so it's all about spewing out NIPS papers; and if you have a research program that's really good at spewing out NIPS papers, that's what they'll do. If you have something that could lead to an algorithm people might use, that's not necessarily where they're going to go. I don't know; in statistics it's true too. I always think it's funny, because we always think our methods are so great, and we're begging people to use them: my method is so great that I'm going to write a thirty-thousand-word journal paper to convince people my method is better. If my method were really so great, wouldn't people be coming to me and using it? If it's really great, couldn't I monetize it somehow? I don't know; I'm as bad as everybody else. I'm out there, I do some consulting for money, but it's not like I hide my best ideas: I want everybody to use them. I like to feel that's because I'm very socially minded: I've been given everything in my life for free, I had a free graduate education and all that, so I owe it to the world. But it's not quite that either. There is something funny in that we think our ideas are so damn valuable as academics: our ideas are valuable and brilliant, but at the same time they're evidently correct. It was said of Beethoven's music that every note, well, not every note, many notes, was unexpected, and yet once you heard it, it sounded just right. And that's like our research: our ideas are so brilliant, nobody but me could have come up with this idea, it's so great, and now, having written it down, it's so evidently correct that who could ever disagree? And yet we
find ourselves screaming at people: listen to me! Here I am, doing this. Well, but then there are people with other goals. OK, we were talking before about political polling. There are some pollsters who actually want to get results that are different from everybody else's, because they want to be in the news. It's kind of funny: political polling is like a loss leader for pollsters, and so some pollsters want the press, so they want to be the one pollster who's different. It's like placing a bet, and if their bet comes in, they become well-known. But other pollsters don't want to be surprised; they just want to be like everybody else, because they're actually not making their money from political polls, they're making their money from other things. So they basically adjust their polls to be like everybody else. Some people have found that as the election approaches, the variance among the different polls becomes less than binomial, because they're all herding. But I guess what I'm saying is that pollsters mostly do want to get it right. Even setting aside those examples, pretty much they want to be right if everyone else is right, and they want to be right if everyone else is wrong; they don't want to just be like everybody else. So they're motivated to get things right, but it's hard. I spend lots of time trying to convince pollsters to adopt new methods. The funny thing is, I have a colleague who works at a poll, he's a pollster, and he doesn't make a secret of it: they use certain methods, and he tells everybody what methods they use. It's not their secret sauce; it's more like, this is what we're doing, this is great, and it's a sales thing: use us, because we know what we're doing.
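[Editor's note: the less-than-binomial observation can be made concrete. With k independent polls of n respondents each, sampling variability alone should make the poll-to-poll standard deviation about sqrt(p(1-p)/n); a ratio well below 1 is the herding signature. The poll numbers here are invented:]

```python
import statistics

def herding_ratio(poll_shares, n_respondents):
    """Ratio of the observed SD across polls to the binomial SD expected
    from sampling variability alone; a ratio well below 1 suggests herding."""
    p = statistics.fmean(poll_shares)
    expected_sd = (p * (1 - p) / n_respondents) ** 0.5
    return statistics.stdev(poll_shares) / expected_sd

# Hypothetical late-campaign polls of 1000 respondents each, all clustered near 52%
polls = [0.520, 0.521, 0.519, 0.522, 0.520, 0.518]
print(herding_ratio(polls, 1000))   # well below 1: less spread than sampling alone implies
```

[Independent polls of 1000 people should disagree by about 1.6 percentage points just from sampling; these invented polls disagree by a tenth of that, which is what herded polls look like.]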
But I guess the business logic of it is that they have specialized expertise, so they can implement the methods better than other people. As I said, though, even in areas where there's actually money to be made, you see surprisingly little secrecy. Perhaps it's because pollsters are cooperative in a way: I think maybe different pollsters are like different NBA teams. They're competing, but all the NBA teams want basketball to be more popular, and pollsters are sort of all selling the idea that polling works. And that's sort of how scientists are too: science is our product. [Exchange with the audience, partly inaudible.] I'll just say, you all are the best audience ever; usually I talk and nobody asks any questions. [Audience question about the "true prior."] The true prior is the set of true thetas corresponding to the data sets that you analyze. Every day you get the data, and there's a parameter corresponding to it. You don't see it, so you'll never know the true prior, because you'll never know what the true thetas coming in were. But you can deduce it: if you have stationarity, in the time-series sense of a model, then as you get more and more data you can treat it as an inverse problem: what was the prior that gave you this set of data sets? And if your model is correct, eventually you can learn the prior and use it. So at least in this particular version of the world, this story, there is a true prior. And my hang-up is that, usually, the way statistics is taught, there's only one data set. The usual model is the exact same model I described, except you don't need the slot for the food or the slot to get rid of your waste products, because you're only analyzing one data set. Then there's no true prior, because there's only one theta. So
my innovation is not to put the guy, or girl, or whatever, the person, in a windowless room; my innovation is to have the person operating repeatedly, and that's what creates the true prior. That's another way of saying that Bayes is really about replication, because without replication there's no such thing as a true prior; it doesn't mean anything. [Audience question about empirical Bayes.] I don't like the phrase "empirical Bayes," because I feel it implies that there's some non-empirical Bayes, that Bayes is usually non-empirical, which I don't accept. In our book we talk about hierarchical modeling: in practice, you'll have a prior distribution with hyperparameters that you estimate from data. But I don't like that term; it seems a little insulting to me. It would be like if I said, would you consider yourself a sophisticated person from New Jersey, you know, some Jersey joke, me coming over here from New York. [Audience question, partly inaudible, about whether a certain statistician was something like an empirical Bayesian, and what his prior distribution was.] Perhaps he felt he had to be Bayesian if he wanted to say that cigarettes didn't cause cancer, because he had to have a really strong prior in order to evade the data. Maybe that was his reasoning, but I don't know; I haven't read enough of his stuff. [Audience comment.] Yes, there's a long list of famous statisticians who have consulted for the cigarette companies. [Audience question about non-stationarity.] Well, as I said, if it's non-stationary, you have to have more model to deal with it, and at some point your sensitivity to the model is so great that you're not getting much out of it anymore. But I think that's just sort of true of probability in general. Probability works really well for things like flipping coins, where there's a very clear physical mechanism, and when you have very clear replications.
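[Editor's note: the "inverse problem" of deducing the true prior from many replications can be sketched in the simplest normal case, where each day's theta_i is drawn from N(mu, tau) and the observation is y_i = theta_i + noise. The hyperparameters are then recoverable from the marginal distribution of the y's — a method-of-moments version of the hierarchical modeling mentioned here. All numbers are made up:]

```python
import random
import statistics

def learn_prior(ys, sigma):
    """Estimate the hyperparameters (mu, tau) of theta_i ~ N(mu, tau)
    from observations y_i = theta_i + N(0, sigma) noise, by moments:
    E[y] = mu and Var(y) = tau^2 + sigma^2."""
    mu_hat = statistics.fmean(ys)
    tau2_hat = max(statistics.variance(ys) - sigma**2, 0.0)
    return mu_hat, tau2_hat ** 0.5

# Simulate many replications of the "person in the room": a new theta every day
rng = random.Random(7)
mu_true, tau_true, sigma = 1.0, 2.0, 1.0
ys = [rng.gauss(rng.gauss(mu_true, tau_true), sigma) for _ in range(50_000)]
mu_hat, tau_hat = learn_prior(ys, sigma)
print(mu_hat, tau_hat)   # close to the true prior (1.0, 2.0)
```

[With one data set there is one theta and nothing to estimate; with many replications the prior becomes an ordinary estimable quantity — which is the sense in which Bayes is "about replication."]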
For things like the probability of rain tomorrow, it doesn't work so great, because the replication is ambiguous, and people end up modeling: you do an adjustment here and an adjustment there, and by fitting something like a regression model you can get predictions and pass statistical tests, but all that fooling around... well, I don't know. I thought a lot about this when we were writing Bayesian Data Analysis. I worked hard on chapter 1 of that book, where we talk about the foundations of probability, and one thing I felt was that there is no single true foundation. For example, you can talk about betting, but betting is kind of weird in some ways: first, it brings in money, and that's complicated; but also, to bet you need to have someone on the other side of the bet, and then it brings in a game-theoretic aspect. Then, symmetry is another principle: that works with dice and coins, but symmetry famously does not work with, say, what's the probability that there are cows on the moon? There either are or there aren't, so it's 50/50? That can't be right, and there are a lot of paradoxes using symmetry. So you say, well, we can't use symmetry, so we'll use replication, we'll use a reference set. But the trouble is, of course, once you have a reference set, why do all the things in the reference set get weighted equally? That's symmetry again, right? So you can't do either. It's turtles all the way down. Probability, I think, is itself a model of the world, and so there's not a single foundation of probability; I don't think that makes sense. I get kind of upset when people
think there's one story, to say probability is all about betting, or all about frequencies. To me that would be like asking: what is addition? Well, you could add sticks together: two sticks plus two sticks equals four sticks. But you could also add water together: if you mix a quart of water with another quart of water, you get two quarts of water. So is addition really about sticks, or is it about water? That's ridiculous, right? That's how I feel: these are all just a bunch of different examples. [Audience comment.] Well, that's what it's all about, although now I don't have the satisfying sense of having convinced you, because you were already in agreement. [Audience exchange, partly inaudible.] No, but seriously. [Audience comment, partly inaudible: there are popular views here; one is genuine stochasticity, and the other, even more interesting, is statistical mechanics. We have classical mechanics, we have quantum mechanics, and you still have probability; for example, you can get predictions about arrow-of-time asymmetry and macroscopic occurrences all from fundamental physics, and the idea that that might be the true foundation, the one that gives rise to frequencies and Bayesian priors, is very popular around here.] Well, this is interesting, but statistical mechanics is something I haven't thought that hard about. I always sort of have a simplistic, Jaynesian view that there's a joint distribution of everything, and you're conditioning on some subset of things: the temperature and the pressure of the gas are macroscopic properties, and things like the canonical ensemble, the microcanonical ensemble, and the grand canonical ensemble are just different conditional distributions of the joint. I haven't thought about how that plays out beyond that, in the real world. [Audience: that's for closed systems; for open systems...] No, I see your point, and I've read about that, but I really know nothing about it; I can't say anything useful beyond what you've already said. It relates a little bit to economics. I said earlier how we live on Cantor's diagonal; well, economics kind of lives on a phase transition. I was just talking with my fourteen-year-old kid today; we were walking, and there's an ad on a bus for mattresses. How come those mattress stores have so much money that they can always advertise? What's that about? I don't know. According to Karl Marx, they should only have just enough money to pay their employees starvation wages, and it hasn't quite happened yet, and I don't fully understand it. But in economics there's this idea that, if we were ever at equilibrium, everything would be kind of frozen, so we need liquidity, and somehow this liquidity is created by things like people living and being born and people dying and unexpected things happening, like diseases and
technological changes and people meeting each other. If we were in equilibrium, nothing could work; but if we weren't close to equilibrium, then the laws of economics would make no sense. So economics is kind of living on a phase transition. I personally find social science more accessible than hard science, which is why I switched and became a political scientist rather than a physicist; so when I think about physical principles, I'll often draw analogies in the other direction like that. I'm not adding anything, except that, whatever you're thinking, economists talk about this too. Economists are very smart: any idea I have, some economist has had too. But I don't find this in the mainstream of their discourse, and perhaps that's because, just as statisticians spend so much of our time fighting against misunderstandings of statistics, economists spend lots of time trying to explain Econ 101 to people: how come you can't just print more money and make people rich? They're always talking about stuff like that, so they never get to the interesting stuff. I'm sorry, that's no good ending for a philosophy seminar. I don't know.
Info
Channel: Harry Crane
Views: 15,228
Keywords: reproducibility, andrew gelman, statistics, bayes, replication crisis, science, harry crane, rutgers, philosophy, probability
Id: xgUBdi2wcDI
Length: 103min 15sec (6195 seconds)
Published: Tue Feb 06 2018