35th Fisher Memorial Lecture: And thereby hangs a tail: the strange history of P-values

Captions
It is a very great pleasure and an honour to be given this lecture. I'm a real Fisher fan, although if I'd known him in real life I might have found him rather difficult to get on with. So, first of all, some acknowledgements: thanks to the Trust, and also to various people who have helped me with various papers; in particular John Aldrich and Andy Grieve have made very helpful comments on an early version of this talk; and I should say that some of this work is supported by a European Union grant. I also have an apology to all of you. The abstract probably promised an awful lot: I was going to cover the whole of the history of p-values, and that would start with John Arbuthnot, a Scot, back in 1710, or possibly with Daniel Bernoulli, a Swiss, and continue to 2017. In fact I'm going to limit myself to just about 10% of this, but if you take what I promised as being the null hypothesis, well, 10% is not statistically significant, and so I haven't rejected my promise. On that basis I'm going to carry on, and I will actually salve my conscience by pretentiously sprinkling in a few names as I go along. I have various excuses for that, but the best is this: this lecture stands between you and a drink, and that is very worrying for any lecturer.

An apology also to the Bayesians. I'm going to say a few harsh things about Bayesian statistics — not about the statistics in itself, simply about a particular criticism of Fisher which Bayesians make — and I'm firmly convinced that what's really happening is a sort of proxy war: it's really a war between different schools of Bayesian statistics. They don't have to be at war, but I think they are, in a sense, and in the middle somehow are p-values; when you look at the history of p-values you'll see why this is so.

So I'm going to start with the case against Fisher. Here we have "Fisher, the cause of inferential confusion". Robert Matthews is a very interesting person: he is not only a journalist but a physicist and a statistician, and, as I often say, the intersection of those three sets surely only has one individual in it, so he is actually sui generis. He wrote that Ronald Fisher gave scientists "a mathematical machine for turning baloney into breakthroughs, and flukes into funding". You may say, well, that's pretty amazing — where did that appear? It appeared, of all places, in the Sunday Telegraph, and if the Sunday Telegraph knew as long ago as 1998 that p-values were a busted flush, why on earth are we still talking about them here? In fact he's quoted by David Colquhoun, a Fellow of the Royal Society, who recently, very late in his life, has become converted to looking at p-values in a Bayesian way, although he can't quite bring himself to utter the B word, and so he sort of says he's not really a Bayesian and doesn't understand what Bayesians are on about. But he is not the only one. Here's Francis Anscombe, who was a very great statistician himself, who says this: results of clinical experiments often culminate in a significance test; it is natural to suppose that well-established statistical theory supports such tests; this is not so; significance tests of such null hypotheses at the end of an experiment can fairly be laid at R. A. Fisher's door, especially because of his insistence on them in The Design of Experiments. So a particular book that Fisher wrote is blamed — though actually, myself, I would have said Statistical Methods for Research Workers perhaps deserves more of the blame. And here we have "Fisher, the false and plagiarising prophet".
That is Ziliak and McCloskey. Ziliak is a statistician who has at least one good thing going for him: his first name is Stephen. Here he says that Student, of Student's t-test, was right, and that his difficult friend Ronald A. Fisher, though a genius, was wrong; and they want, essentially, to convince you that all of Fisher's vast inferential machinery was a terrible mistake.

So this is the false history — this is what people are saying, and I'm going to claim it is false — and it goes like this. Scientists were using formal inference before Fisher came along; they were using something that we would now call Bayesian, although the word was not actually used then: they were dealing with what was sometimes called inverse probability. R. A. Fisher invented p-values as part of his rival system of frequentist statistics, and these delivered "significance" much more easily — this is Robert Matthews' baloney-into-breakthroughs machine: somehow Fisher made it easier for scientists to claim "Eureka, I've found something important." So p-values became an instant hit, and they seduced scientists away from the path of Bayesian rectitude, and this is largely responsible for the replication crisis that we now face: people publish one thing, other people try to repeat it — well, not very often, but if they do — and what they find is that it doesn't repeat, and so the question is what the cause of all this is. But the history is not like that; it's not true, in my opinion. P-values may or may not be good statistics — I'm not going to claim that they're a particularly brilliant way to look at things — but what I do think is that setting the historical record straight will help us to see what the problem is. The bad news is that we're nowhere near a resolution: we are not, I think, in the near future going to find some automatic system of inference, or even a theory that everybody can sign up to; and I'm also well aware that the problems we face will have to be solved by better brains than mine.

So let's go back to Gosset. This is the person that Ziliak and McCloskey say was the true inventor of modern statistics. He was actually born in Kent; he was educated at Winchester and at Oxford, where he got a first in mathematical moderations and a first in his degree in chemistry; and, unusually for someone who was a leading figure in statistics at that time, he was educated at Oxford. I fear, having said that, that our own president may have been educated at Oxford — I don't wish to cast aspersions on that particular place — but traditionally what happened was that statisticians were educated at Cambridge and then went to work at UCL, and I put it like this: only Cambridge was good enough for statistics, but statistics wasn't quite good enough for Cambridge, and so basically statisticians were forced to go off elsewhere. Gosset is an exception, and he went to work, of all places, with Guinness in Dublin. Guinness at the time had a very enlightened policy of getting its scientists to go out and interact with the academic world, and they gave him a year's leave of absence to go and work with the leading statistician of the time, Karl Pearson. During that time he published a paper called "The probable error of a mean" in Biometrika, which we regard as the origin of the t-test. This is the paper here; it's divided into the following nine sections. I'm not going to go through all of this, but I am going to go through one instance of the use of his tables, which appears in the paper, because that particular calculation that Student made was also repeated, in a slightly different way, by Fisher, and that will help us to understand what's going on.
So these are in fact the original data that Student used. He took data from Arthur Cushny, who was a physiologist born in Fochabers, a town near the Moray Firth. Cushny went to study in Bern and Strasbourg and was appointed, at only the age of 27, as professor of materia medica — which is essentially professor of pharmacology — at the University of Michigan in Ann Arbor. He worked a lot on optical isomerism, and together with a student of his, Peebles, he published a paper in 1905 in the Journal of Physiology. What they had done was compare the number of hours of sleep gained by a number of inmates at the insane asylum at Kalamazoo, to see how much extra they slept when given various treatments. By the way, they had in fact tested these treatments on themselves first, so they were not using the inmates as guinea pigs without having checked that the stuff was safe. These are their figures, as reproduced by Student. Actually, Student copied the data down incorrectly: it is not in fact a comparison of the dextro and laevo forms, it was a comparison of the racemate — the mixture of the right-hand and left-hand forms — with the laevo form, and they were trying to infer the difference between the two pure forms by comparing the mixture with one pure form; and in any case he also wrote down a different drug altogether.

But anyway, Student finds these particular differences here, and you'll notice that all the differences, with the exception of one, are positive, so it seems that the inmates are sleeping rather longer when given the left-hand form than when given what Student assumed was the right-hand form. And what Student says — and this is the interesting thing — is that he does an analysis and then concludes that the odds are about 666 to 1 that 2 is the better soporific. So he does not do a modern frequentist calculation: he essentially tells us something about the true value of the drugs; in a sense he's saying something about various hypotheses, and he's claiming that the odds are 666 to 1 that 2 — the laevo form, as he thinks, rather than the dextro form — is the better soporific, that odds of this kind make it almost certain that 2 is the better soporific, and that in practical life such a high probability is in most matters considered a certainty. So here we have someone essentially using a Bayesian argument — in what sense, I'll explain in a minute — who considers that a probability of this sort of nature is proof positive that one form is better than another.

What Student did do: he obtained the distribution of the sample standard deviation, which was necessary for him to do this; he showed it was uncorrelated with the sample mean, which was also necessary for him to develop his particular statistic; he obtained the distribution of the ratio of the mean to the standard deviation, assuming independence, through a rather indirect argument; he went to an enormous amount of work, with a hand-driven machine calculator, to tabulate this distribution; he carried out various empirical investigations and simulated some numbers; he applied it; and he interpreted the probabilities in a Bayesian way.
What he did not do: he didn't show that the sample mean and variance were independent; he didn't generalise the problem beyond one sample; he didn't define the t statistic in its modern form, which is actually due to Fisher; he did not use the modern significance-test interpretation; and he did not explicitly use Bayes' theorem, or any derivation we would now call Bayesian, in the actual calculations that he did. The extent to which he is a Bayesian comes from this statement only, within the paper: "The usual method of determining the probability that the mean of the population lies within a given distance of the mean of the sample is to assume a normal distribution about the mean of the sample with a standard deviation equal to s/√n, where s is the standard deviation of the sample, and to use the tables of the probability integral." So he doesn't really say anything about why this can be justified in the Bayesian sense, but he does claim that some sort of Bayesian statement — what we would now call a Bayesian statement — can be delivered. A more explicit reference to prior distributions is provided in his correlation coefficient paper, and I'm grateful to Andy Grieve for pointing this out to me: in another paper in the same journal he does use a more explicit Bayesian approach.

Now compare Student and Fisher. What Student says is this 666 to 1 figure I've already noted. Fisher uses another argument: he says, let us assume that the two treatments are identical; in that case only one value in a hundred of the statistic calculated will exceed this value. In other words, what he's saying is that if you believe these two treatments are identical, then the result that has just been seen is a very, very strange and unusual result. He uses an indirect argument; he doesn't directly tell you the probability that 2 is in fact better than 1. If we look at Student's analyses — he in fact compares three different analyses, but let's just look at the one we've been talking about so far — odds of 666 to 1 is the figure he concentrates on. What Fisher does instead is calculate the probability of observing a result as extreme or more extreme; well, in fact he doesn't calculate it completely, he just says it's no more than one in a hundred. And why does he do that? Because doing that obviates the necessity of being involved in more complicated calculations: what he did was provide you with the one-in-a-hundred tables, and you simply had to compare your statistic with the tabulated value to see whether it was larger or smaller. These are the critical values of t that you need: for a one-sided test, what we would now call 2.821, and for a two-sided test, 3.25; this is the value that we would use.
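As a quick check on those critical values — a minimal sketch assuming SciPy is available; Fisher, of course, had to tabulate them by hand — the 1% points of t on 9 degrees of freedom (ten paired differences) come out as quoted:

```python
# Critical values of t on 9 degrees of freedom, as in Fisher's tables.
from scipy.stats import t

print(round(t.ppf(0.99, df=9), 3))    # one-sided 1% critical value: 2.821
print(round(t.ppf(0.995, df=9), 3))   # two-sided 1% critical value: 3.250
```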
So what did Fisher do? He reformulated the statistic so that asymptotically it is normal(0, 1). Student's tables didn't have that feature; they weren't tending anywhere interesting (Student had tabulated what he called z, essentially t/√(n − 1), whose critical values shrink as the sample size grows), whereas Fisher's tables go towards the normal distribution — the distribution that was already being used for large samples by everybody before Student came along. He showed how it could be extended to other cases: he showed you could use it for two samples, he showed you could use it for regression coefficients, and he generalised it to three or more means. He stressed an alternative interpretation, the one we now use, though it could actually also be found in Karl Pearson's work. And he suggested a doubling: he said you really should double it, because you didn't know beforehand that you were going to find that the left-hand form was better than the right-hand form — it could have been the other way around — and you really ought to pay some sort of penalty for that; so what he proposed was a doubling. What he did not do was actually calculate the p-value: he didn't calculate it here, he merely said it's significant at the 1% level. Funnily enough, he does calculate the p-value for the sign test: he notes that the differences are all in one direction, and a half to the power of nine is 1 over 512, and he gives you that particular figure. If Student had had modern resources, this is what he would have calculated for his tail-area probability: the 666 to 1 corresponds to a one-sided P of 0.0014. And had Fisher been able to do the calculation at the flick of a switch or the press of a button — or the click of a mouse, I should say — as we do nowadays, then what he would have got is a value of 0.0028, which is simply twice the value that Student got. The only difference between them would have been that the probability was doubled; that's the only difference. But the point to remember is that Student's interpretation is orthodox Bayesian — or at least what we would now call Bayesian according to one particular model — at that particular point in time.
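Here is a minimal sketch (Python with SciPy, my own illustration rather than anything shown in the lecture) that reproduces these numbers from the ten paired differences in Student's table — the Cushny and Peebles data, familiar today as R's built-in sleep data:

```python
# Student's/Fisher's calculation on the ten paired differences in hours of sleep.
from scipy import stats

diffs = [1.2, 2.4, 1.3, 1.3, 0.0, 1.0, 1.8, 0.8, 4.6, 1.4]  # laevo minus (assumed) dextro

t_stat, p_two_sided = stats.ttest_1samp(diffs, popmean=0.0)  # Fisher's doubled, two-sided P
p_one_sided = p_two_sided / 2                                # tail in the observed direction

print(f"t = {t_stat:.2f} on {len(diffs) - 1} df")            # about 4.06 on 9 df
print(f"one-sided P = {p_one_sided:.4f}")                    # about 0.0014
print(f"two-sided P = {p_two_sided:.4f}")                    # about 0.0028

# Student's reading: under a flat prior, the posterior probability that the
# laevo form really is the better soporific is one minus the one-sided tail,
# giving odds of roughly (1 - p)/p to 1 -- about 700 to 1 with exact tables,
# 666 to 1 with the cruder tables Student had to hand.
odds = (1 - p_one_sided) / p_one_sided
print(f"posterior odds of roughly {odds:.0f} to 1")
```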
A diversion here, by the way: you sometimes find this mysterious thing, especially in America, referred to as NHST. NHST is null hypothesis significance testing, and it's regarded as a ghastly hybrid between a system due to Fisher and one due to Neyman and Pearson. Confusingly, the Pearson in question is not Karl Pearson but his son, Egon Pearson, who collaborated with Jerzy Neyman, a prominent Polish mathematician, in the 1930s to develop Fisher's approach further. This is what Steve Goodman, writing in Statistics in Medicine, says: in a Neyman–Pearson hypothesis test there is no formal role for a p-value; we are supposed to accept or reject the null hypothesis depending on whether the observation falls into a critical region defined by the study's design and the pre-trial alpha and beta errors; the precise location of the observation within the critical region, as indicated by an exact p-value, is not relevant. He is claiming there is a fundamental difference between Fisher's approach and Neyman's approach, and that we now have an unholy mess in which these two things are mixed together. The first thing any biologist does in writing out a plan — if they do write a plan for the statistical analysis — is to say "the significance level will be set at 5%"; then they calculate some p-values and compare them with 0.05. If they are less than 0.05, the champagne is broken open; if they are greater than 0.05, some serious critical thought is given as to how they could possibly still be in the right direction, and if that doesn't work, something is written like this: with a p-value of 0.08 one might write "the results showed a trend towards significance". This, I often say, is the wonder of having a medical colleague: it's the sort of thing they can do that a statistician couldn't. They know where the p-values are going, whereas we regard them as being already in a particular place — they either are or they aren't — but our colleagues know that with only a little more data they would go in that direction, which just goes to show that they obviously have some form of clairvoyance.

But actually this is not true: the distinction is completely false. No less a person than Lehmann, who was an avid disciple of Jerzy Neyman's, says that the value of a p-value is that it enables others to reach a verdict based on the significance level of their choice. You can imagine it like this: Pearson writes to Neyman saying, "My dear Neyman, result significant," and Neyman replies, "My dear Pearson, this is no good, because I know you always use the 5% level and I always use the 1% level; how am I to determine that a result which is significant for you is significant for me?" And vice versa: Neyman can say to Pearson, "My dear Pearson, result not significant," and Pearson says, "Well, how do I know it wouldn't be significant for me?" Eventually they will stumble upon the idea that what they need to do is quote the p-value, even though they are hypothesis testing, and you will find this in Testing Statistical Hypotheses by Erich Lehmann.

So here's a diagram explaining, if you like, how Student saw it. What he thinks he has done, once he has calculated all of this, is produce a Bayesian posterior distribution: the probability is centred on the point estimate — this is the mean difference in hours of sleep — and he's following Laplace via Airy, an astronomer of the day whom everybody read when they wanted to know how to do calculations involving errors; the distribution is centred on the statistic, and it's a statement about the probability of the parameter. But Fisher looks at it rather differently. As far as he's concerned, he's going to play devil's advocate: he's going to say, suppose there is no difference between them, and now let's look at how the statistic would vary given that there were no difference. So he's interested in this particular tail here — this is the value of about one and a half, the difference of about one and a half between the two, sorry, the two isomers, not the two racemates — and this here is the critical value; he just notes that the observed value is beyond it, and that is the value that corresponds to one per cent. If I put the two diagrams together, this is what they look like: the blue one is Fisher's curve and the red one is Student's curve, and they could calculate two probabilities which exactly agree, and I'll show you that by magnifying the diagram. This is the probability that the left-hand form is in fact worse than what Student thinks is the right-hand form after all, and this is the one-tailed p-value, and these two particular areas are exactly the same. So the point is that, numerically, Student and Fisher do not have to disagree, and the only extent to which they would disagree is that Fisher would produce a more conservative inference by doubling the tail.

Here, by the way, is a geometric way that Fisher looked at the t-test. He said: suppose we just knew how variable the data were; in that case we could see how unusual a particular result was. Here I've taken a rather degenerate case in which we have only two observations, because geometry is notoriously easier to see when you only have two dimensions, and I certainly can't follow Fisher in his inferential insights. Here we have the point zero-zero; on one axis is the first observation — the first poor patient who is given these optical isomers and for whom we take the difference between the hours of sleep on one occasion and another — and on the other axis is the second one. These are our two particular results; we have a sample of size two, and we can plot all such results.
Here are some simulations, by the way, that I've done — everybody seems to love simulations these days. These are the boundaries that will be critical; I've actually chosen a 10% region, simply because it's easier to see what's going on, and all the red points are samples that would lead us to believe that the mean was not in fact round about zero, which is the value we should see if there were no difference between the two isomers. This is the situation if there is in fact a big difference: we then expect all the points to go off into one particular area, and the statistical test enables us to say that many, many more occasions are significant — many more of the experiments that we could do here, with two individuals only, would cause us to believe that the result was significant. Unfortunately, the contours of this particular diagram are not known to us. They're known to me, because I'm the god of this universe — I created the simulation, I know exactly what everything is — but they're not known in general to anybody else. So what can we do? This is where Fisher showed there is a very cunning geometrical construction you can do. He says: suppose I simply cut a slice into the centre of the cake; this particular slice will cover 10% of the cake altogether — that's what I propose to do, or whatever percentage I've chosen; I'm a bit doubtful as to whether it's 10% here, but anyway, whatever it is, you can fix a certain slice that you cut, and you don't need to know the density in question. The cake can be dense in the middle and not so dense at the outside, and you will still end up choosing exactly that particular percentage of the cake. And this is what you see when the points go off in the other direction: because you don't know the contours of the cake, you suffer a loss — you more often mistakenly classify points as being not significant; only the red ones are significant — but nevertheless you have a method of guaranteeing that you won't claim significance more often than you should if the null hypothesis is true, and you still have a way of claiming it rather more often if the null hypothesis is not true. This, by the way, is very easy to demonstrate algebraically, but in preparing this lecture I remembered the warning of the wise Chinese sage Confuse-us — not to be confused with Confucius — and Confuse-us warns any lecturer: if you commit algebra, you will suffer an aftermath. That's a joke, by the way.
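A small Monte Carlo sketch of the point (hypothetical code of my own, not the lecture's simulation): with only two paired differences, and whatever the unknown scale of variation, the t-test cuts off the same 10% "slice of the cake" under the null hypothesis, and rejects more often when there is a genuine difference.

```python
# Type I error is controlled at 10% regardless of the unknown sigma; power exceeds it.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def rejection_rate(true_mean, sigma, n_sims=100_000, alpha=0.10):
    # Simulate pairs of differences and apply the one-sample t-test (1 df).
    x = rng.normal(true_mean, sigma, size=(n_sims, 2))
    p = stats.ttest_1samp(x, popmean=0.0, axis=1).pvalue
    return np.mean(p < alpha)

for sigma in (0.5, 1.0, 2.0):
    print(f"sigma={sigma}: null {rejection_rate(0.0, sigma):.3f}, "
          f"alternative {rejection_rate(1.5 * sigma, sigma):.3f}")
# The null rate sits near 0.10 for every sigma -- the slice covers the same
# fraction however dense the cake is -- while the alternative rate is larger.
```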
So, to sum up: Fisher and Student did not disagree as regards probabilities, numerically at least; in no way was Fisher more liberal. (There's also the whole business of the two-tail controversy, which I don't want to go into.) But they differed as regards interpreting the probability: Fisher saw that any Bayesian interpretation depended on particular prior assumptions, whereas Student simply used a default argument. The reason that Student's paper was eventually influential is thanks to Fisher, so I just want to take this opportunity to reject Ziliak and McCloskey's claim that Fisher was merely derivative here, and not very helpfully so. So who did produce a formal Bayesian derivation of the t-distribution? Well, Stephen Stigler — another gentleman with an excellent first name — has a law of eponymy, which basically says that if anything is named after somebody, they definitely did not discover it; and at the end of this wonderful paper he reveals that of course Stigler's law of eponymy was not discovered by Stephen Stigler, it was discovered by Robert Merton, but we all know it now as Stigler's law of eponymy, which just justifies what he says. But who in fact discovered the t-distribution? Well, definitely Jacob Lüroth, back in 1876 (John Aldrich has just been telling me today that it may have been anticipated even earlier, around 1860), Edgeworth again in 1883, Burnside independently in 1923 — but by then Fisher was on the case — Jeffreys again in 1931, and of course again in his book of 1939; but Jeffreys is a Bayesian, and he uses a formal Bayesian argument all the way through to produce this particular test.

So I'm now going to carry on and consider the story with Jeffreys, and this is the point at which I get to sprinkle lots of names pretentiously, as you will see. Here we have Laplace, Pierre-Simon de Laplace, a French mathematician eventually appointed by Napoleon as Minister of the Interior, an expert on calculus. Napoleon dismissed him, saying that he carried into the business of government a love of the infinitely small, so apparently he didn't turn out to be a great administrator, but he was made a baron by Napoleon — which didn't stop him helping to convict Napoleon in absentia when he was finally dethroned as Emperor. Then De Morgan, 1838, one of the first people to work at University College London; and here is Venn, of Venn-diagram fame and a great critic of Bayesian reasoning, who presents this particular formula based on a line of reasoning of Laplace's. Imagine that you have a number of counters in a bag; you have pulled out counters without looking inside the bag, having shaken it very thoroughly, and what you have found is that all the counters so far, let us say, are white. You then imagine that before you even started you saw one counter that was white and one counter that was not, and your estimate of the probability that the next counter pulled out will be white is the number of white counters you've seen so far plus one, divided by the total number of counters pulled out so far plus two. That's the mathematics. As Venn says, Laplace, for instance, pointed out that at the date of the writing of his Essai philosophique the odds in favour of the sun's rising again, on the old assumption of the age of the world, were 1,826,214 to 1 — this is assuming you've been around to see it rise all that time — and De Morgan says that a man who, standing on the banks of a river, has seen ten ships pass by with flags should judge it to be eleven to one that the next ship will also carry a flag. So this is the sort of reasoning that went on in those days. But a particular philosopher at Cambridge, Charlie Broad, pointed out that there was a great problem. Broad is a very interesting person, because he turns out to be a scientific sceptic who nonetheless believed in ESP, extrasensory perception — some of his research was psychic research — and he was eventually Knightbridge Professor of Philosophy; but in 1918 he published a bit of a bombshell, and it went like this. That's all fine, he said, but if you think this is an argument for proving scientific laws, there is a problem, because what Laplace has given us is a rule for the probability that the next counter drawn will be white; if we want to prove a scientific law, we're making a statement about something which always happens, and that's equivalent to saying that all future counters will be white. And it turns out, using exactly the same argument, that although it is correct that the probability that the next counter drawn will be white is large, given that they all have been so far, the probability that all future ones will be white is small; in particular, if the number of future counters exceeds the number that you have seen, then the probability is less than a half.
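For completeness, here is the standard little derivation behind the two statements just quoted, assuming Laplace's uniform prior on the unknown proportion θ of white counters (my notation, not Broad's):

```latex
% Rule of succession, and Broad's 1918 observation, under a uniform prior on theta.
\[
  P(\text{next white} \mid n \text{ white in } n \text{ draws})
  = \frac{\int_0^1 \theta^{\,n+1}\, d\theta}{\int_0^1 \theta^{\,n}\, d\theta}
  = \frac{n+1}{n+2},
\]
which is close to $1$ for large $n$ (and reduces to Venn's $(w+1)/(n+2)$ when only
$w$ of the $n$ draws are white). But for the \emph{general law} that the next $m$
counters are all white,
\[
  P(\text{next } m \text{ all white} \mid n \text{ white in } n \text{ draws})
  = \frac{\int_0^1 \theta^{\,n+m}\, d\theta}{\int_0^1 \theta^{\,n}\, d\theta}
  = \frac{n+1}{n+m+1}\;\le\;\tfrac{1}{2} \quad \text{whenever } m > n
\]
(with equality only at $m = n+1$).
```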
Jeffreys thought this was horrendous: a terrible assault on the whole business of scientific inference. What Broad showed, he says, is that on Laplace's approach there is no justification whatever for attaching even a moderate probability to a general rule; for that, the rule must have a moderate probability to start with. Thus, "I may have seen one in 1,000 of the animals with feathers in England; on Laplace's theory the probability of the proposition 'all animals with feathers have beaks' would be about 1 in 1,000. This does not correspond to my state of belief, or anybody else's." He thought there was something basically wrong, and he was going to fix it — and he did fix it. This, by the way, is The Economist getting it wrong: The Economist in 2000 had an article saying the new Bayesian era is upon us, and they talked about a newborn babe seeing the sun rise every day: gradually the initial belief that the sun is just as likely as not to rise each morning is modified to become a near certainty that the sun will always rise. So there we have it: The Economist is 82 years behind Charlie Broad on this particular one; they get wrong precisely the point that he was making.

Jeffreys didn't do this on his own; in fact he did it with Dorothy Wrinch, who was a philosopher who eventually went to America, and they wrote a paper together. The conclusion they came to was that you needed to have a lump of probability on the scientific law being true: you couldn't simply smear the probability any way you liked, you couldn't say every value is equally likely; you had to have lumps of probability on particular values. This, by the way, is a picture of Dorothy Wrinch — at least I think it is; I found it on the internet and it claims to be Dorothy Wrinch, but I don't know for sure.

So here's a simulation again, to show how this works. I'm going to take two particular approaches to hypothesis testing. One of them is what David Cox has called a dividing hypothesis: I believe it's possible that one or other of the two treatments may be better, I don't really care too much about them being exactly equal; I simply want to avoid claiming that what I think is the better treatment is really better when it is in fact worse — I want to avoid making the error of recommending to a patient that they take what I think is a better treatment when in fact it's worse — but I don't necessarily believe in the identity of the treatments. On the other hand, you could have a different sort of test: you could actually believe that the two treatments were identical. And ironically, at the time that Cushny and Peebles were doing their work, it would have been possible to believe that two optical isomers could have absolutely identical effects, because not very long before that chemists had been astonished to discover that under some circumstances they didn't. These were two molecules which to all intents and purposes were indistinguishable: written down in two dimensions on a sheet of paper they would look the same; only if you could envisage them in three dimensions would they be different; the same atoms were involved, the same atoms in the same positions, it was just some slight configuration that was different, and somehow, nevertheless, they could make a difference.
The fact that they did make a difference eventually led to the specific receptor theory of pharmacology: the idea that a molecule acts as a key in a lock on the cell and switches on something in the body. So in these two particular simulations, what I'm doing is working out how often I would get it wrong if I were simulating from each of these two models: in what proportion of the occasions on which I claimed there was an effect would there actually be no effect? What you can see is that the Bayesian posterior probability here actually tracks the p-value — there's some simulation error here, perhaps — whereas if I use a Jeffreys approach I get something radically different. It's the lump of probability that makes the difference, not the switching from p-values to a Bayesian prior distribution as such. Now, none of this is radically new — everybody knows it — but for some reason it's completely forgotten every time the debate takes place; I don't understand why.

So why the difference? The difference, I think, is fairly easy to understand. This is the likelihood: the probability of the result that we saw as a function of the true difference, delta, between the two treatments. This is the value the statistic has, and I've taken here a case where it is just significant at the 5% level two-sided, with a huge sample, so I can use the normal distribution rather than Student's t, and the wonderful value that we all remember off by heart is 1.96. Similarly, we can all remember the value for a chi-square with one degree of freedom, 3.841; for two degrees of freedom it's five point something, I forget, but anyway it's one of those values we know. So 1.96 is the value here, and this is the likelihood. Now let's imagine we have a dividing hypothesis: I'm interested in knowing whether the yellow region or the blue region is where the true difference delta lies. This is the likelihood of the result under each parameter value, and you can see that in this particular area here it's fairly well supported; and it doesn't matter that, even when I'm out here, well to the right, I can always find values further to the left which are not as well supported. In other words, if this is what I'm looking for, the picture is moderately encouraging that what I have got is an effect. But now let's imagine I'm in the world of Jeffreys hypothesis testing. This is the alternative hypothesis — all of this, and so is this; all of that is the alternative hypothesis — and I now have an infinite jar of probabilities, with a mass of probability at zero. Every time I pull out a likelihood value here, I can pull out one there, and it turns out that many of the values I'm looking at — not just in the wrong tail but also in the right tail — are actually less than the likelihood at this particular value, zero. People like David Colquhoun sometimes say, "Well, I start out with a probability that the null hypothesis is true of 0.5, which is surely not unreasonable"; but it's not so much the 0.5 on the null hypothesis, it's the exact mass at zero — an accretion around zero — which is the problem.
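A minimal numerical sketch of why the lump matters (my own illustration; the prior scale τ under the alternative is an arbitrary choice): take an estimate that is just significant two-sided at the 5% level. Under a smooth, flat prior the posterior probability that the effect has the "wrong" sign simply equals the one-sided p-value, but put half the prior probability as a lump on zero and the posterior probability of the null remains substantial.

```python
# Smooth prior versus a Jeffreys-style lump at zero, for a just-significant result.
# Assumption: the estimate is theta_hat ~ N(delta, se^2); under the alternative
# delta ~ N(0, tau^2) with tau = se (an arbitrary but not unreasonable choice).
from scipy.stats import norm

se = 1.0
theta_hat = 1.96 * se            # just significant at the two-sided 5% level

# (1) Flat prior: the posterior for delta is N(theta_hat, se^2), so
#     P(delta <= 0 | data) equals the one-sided p-value.
p_one_sided = norm.cdf(-theta_hat / se)
post_wrong_sign = norm.cdf(0.0, loc=theta_hat, scale=se)
print(p_one_sided, post_wrong_sign)          # both about 0.025

# (2) Lump of prior probability 1/2 on delta = 0, delta ~ N(0, tau^2) otherwise.
tau = se
like_H0 = norm.pdf(theta_hat, loc=0.0, scale=se)                     # delta = 0 exactly
like_H1 = norm.pdf(theta_hat, loc=0.0, scale=(se**2 + tau**2)**0.5)  # marginal under H1
post_H0 = like_H0 / (like_H0 + like_H1)
print(post_H0)                               # roughly 0.35, not 0.025
```

The 0.35 moves around with the choice of τ, but for any reasonable value it stays far above 0.025: it is the mass at zero, not Bayes as such, that drives the difference.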
So this is what I offer you as the real history. Scientists before Fisher were using tail-area probabilities to calculate posterior probabilities; this followed more than a century of interesting mathematical and statistical work, largely grounded in Laplace. Fisher pointed out that this interpretation was unsafe and offered a more conservative one. Jeffreys, influenced by Broad's criticism — actually I'm a bit uncertain about pronouncing it "Broad"; it may well be that his name was pronounced "Brode", I don't know — was unsatisfied with the Laplacian framework, and he used a lump of prior probability on a point hypothesis being true. In actual fact Wagenmakers — or perhaps Etz and Wagenmakers, I don't know — have recently claimed that Haldane in 1932 anticipated Jeffreys, and I have to say that might be right as regards some technical details, but Jeffreys and Wrinch had already had the idea earlier than that. Anyway, it is Bayesian Jeffreys versus Bayesian Laplace that makes the dramatic difference, not frequentist Fisher versus Bayesian Laplace.

So, in summary, what I think is that the major disagreement is not really between p-values and Bayes, or at least not that: it is between two Bayesian approaches, one using uninformative prior distributions and one using a highly informative one. This conflict is not going to go away by banning p-values. If you look at the ASA statement — the American Statistical Association felt it was necessary to put out a public statement on p-values, and, as I pointed out last year when we had an RSS session on the ASA statement, people complained that the statement was actually wishy-washy, and I put it like this: if only the other 26 people involved could have agreed with me, we would have had something much more reasonable; but there we are. So the conflict is not going to go away by banning p-values, and my lesson here — the point of view I have — is that there is really no automatic Bayesianism: you have to do it for real.

My tentative opinion is that there may be a harmful culture of significance. I don't want to argue in favour of p-values as being a great way to look at data; they are one of a number of different ways of looking at data, and they may be a signal that it is time to think a little when an unexpected result comes up — certainly 5% significance is a very, very moderate level of evidence. Where you have more structure, you can often do better. Here are some ideas: using likelihood, which has problems with nuisance parameters but is nevertheless useful; confidence distributions, another way of looking at things; Deborah Mayo has proposed severity, which is a way of thinking of p-values not just conditioning on the boring null hypothesis but conditioning on other values as well, a sort of p-value distribution; and point estimates and standard errors, which are in fact extremely useful for future research synthesisers — although the way the world is going we're just going to be sharing data in the future, so pessimistically I sometimes wonder whether there will be any statistical analysis at all which is worth publishing, because everybody will have their own, just using the data; however, we'll see, or some of you will see. And also, of course, Bayes. Bayes is good for personal decision making, and there is the subjective Bayesian school, in which you might place Ramsey, de Finetti, Savage, Lindley and more modern proponents such as Tony O'Hagan; pragmatic compromises — I could mention people like Jack Good and George Box; and then also colleagues of mine from when I was at Ciba-Geigy back in the 1980s: Amy Racine, Andy Grieve, Hugo Flühler and Adrian Smith.
Adrian Smith is a former president of the Royal Statistical Society, and they have a very nice paper, a read paper at the Royal Statistical Society in 1986, pointing out how one can use Bayes practically as an aid to thinking. Robert Matthews, whom I quoted at the very beginning — the person with the baloney machine — has a way of thinking about it like this: if you think this result is impressive, let's work out what your Bayesian prior distribution must have been to allow you to claim that, and we can then see whether that is a plausible thing to hold. And there is the conditional Bayesian approach of Spiegelhalter, Freedman and Parmar: there was a paper read to the Royal Statistical Society, "Bayesian approaches to randomised trials", which also advocates using different prior distributions and seeing how the results hold up, so you could have an uninformative prior, a sceptical prior, an enthusiastic prior and so forth. In that paper, I note, there is a very, very slight reference to using a lump of probability such as Jeffreys used, but on the whole — and David could certainly correct me over dinner — I don't think there is a particularly enthusiastic espousal of that in that paper. Not so good, in my opinion: Bayesian significance tests, of which I'm not particularly enamoured; Bayes factors, of which I'm not a great fan; and p-values modified to behave like Bayesian tests, of which I'm certainly not a fan. Andy Grieve said, "Well, okay, Stephen, that's fine, but how about this awful thing of Bayesian approaches modified just to make them behave like p-values?" That is equally bad, so both sides are guilty in that particular way. Speaking of BART, by the way, I was very much struck that, at the time when I was at Ciba-Geigy and David first unveiled his approach to, in particular, sequential monitoring and various other matters in clinical trials, Bart Simpson was just making an impact on television, and David often used to bring up cartoon figures of Bart; I think maybe he was slightly influenced by that in choosing "Bayesian Approaches to Randomised Trials". Speaking of Bart, however, Bart has since been replaced by somebody else in the Simpsons pantheon: as everybody now knows, the true hero of The Simpsons is not Bart, it is in fact Homer — and you can see what he's holding. This no longer stands between you and a drink. Thank you very much, ladies and gentlemen.
Info
Channel: RoyalStatSoc
Views: 2,574
Rating: 4.6666665 out of 5
Keywords: RSS, statistics, 2017 conference, Fisher Memorial Lecture, Stephen Senn
Id: vJIc_9wzh6Y
Length: 44min 50sec (2690 seconds)
Published: Mon Mar 26 2018