All About that Bayes: Probability, Statistics, and the Quest to Quantify Uncertainty

Reddit Comments

Physics person here. Every Bayesian starts their presentation in the same way, talking about the "great battle" with frequentism. Here's the thing: I've never heard anyone go the other way. I've literally never heard someone argue "probability can only be interpreted as the long run frequency of repeated measurements". It makes it seem like there's a massive strawman that every Bayesian statistician emotionally sets up and then pulls down. Does anyone else feel the same way?

👍︎︎ 6 👤︎︎ u/[deleted] 📅︎︎ Dec 20 2017 🗫︎ replies

That was a rather nice lecture. I thought she would reveal herself as being a Bayesian in the end.

In any case, a couple of things made me pause:

  • "Subjective […] means dependent on human judgement"

    Is that necessarily what it means? Might it not as well, especially for the objective Bayesian, mean that the conclusion depends on what information the person doing the analysis has available? Such information might come in the form of human judgement, for lack of something better, but that would be a special case, and not the central reason why it's normal for different observers to come to different Bayesian conclusions. (I would understand her choice of words better if she were characterising subjective Bayesianism, rather than Bayesianism in general, but that's not how I interpreted it.)

  • Nevermind, I have but one point.

👍︎︎ 2 👤︎︎ u/Bromskloss 📅︎︎ Dec 20 2017 🗫︎ replies

Well, I understand Bayesian stats much better, but frequentist stats still makes no sense

👍︎︎ 2 👤︎︎ u/gaybearswr4th 📅︎︎ Dec 19 2017 🗫︎ replies
Captions
Welcome to the seminar series. This is put on by the Computational Engineering Division. The Bayesian and frequentist divide within the statistics profession is of historic proportion, and it has relevance to statisticians and non-statisticians alike. As scientists and engineers have to grapple with ever larger quantities of data, statistics plays an increasingly important scientific role: for data fusion, for uncertainty quantification, and ultimately for informed decision making. An understanding of the historical approaches to statistics and probability is of interest to all of us who, as our speaker says, "employ, apply, consume, or contemplate statistics and data analyses." Kristin Lennox has been at the lab since finishing her PhD in 2010 at Texas A&M University. She is both the founder and the current director of the engineering statistical consulting service, and she has provided statistical expertise on a range of problems at the laboratory, involving everything from lasers to explosives. Please join me in welcoming our speaker, Dr.
Kristin Lennox. Today I speak to you of war: a war that has pitted statistician against statistician for nearly 100 years, a mathematical conflict that has recently come to the attention of normal people. And these normal people look on in fear, in horror, but mostly in confusion, because they have no idea why we're fighting. I speak, of course, of that Bayesian-versus-frequentist thing. Now, not too long ago, people didn't have to worry too much about the differences between different types of statistics; some people didn't even know there were different types of statistics. That started to change in the 90s, when this thing that we call Bayesian inference began to slowly creep its way in and infiltrate various areas of science and technical interest, but it was still something that you wouldn't just encounter in day-to-day life. No more. These days you can be minding your own business, reading the news, and all of a sudden bad Bayesian statistics happens. This is particularly common in election years; I blame Nate Silver, as so many of us do. But the reason this is an issue is that you're being presented with this information and no one has exactly told you what to do with it. We live in an increasingly quantitatively sophisticated world, and people have access to tools for prediction and inference and understanding data that previously they would never have seen, which again would be grand if someone had ever bothered to tell you how they work. Now, this is not a uniquely Bayesian problem. Let's face it: most people view all statisticians as sort of a hybrid of an accountant and a wizard, which is ridiculous; we have nothing in common with accountants. But the fact remains, before I can explain the difference between different kinds of statistics, I have to first tell you what statistics is, without qualifiers. And I like to call this the central dogma of inferential statistics: statisticians use something called probability to understand and to quantify uncertainty.

This is not the only thing statisticians do (sometimes we draw pictures), but it is something that all statisticians do, and it's something that we all spend a significant amount of our time doing. The statistical procedures that you have most likely been exposed to are this, or at least attempts at this. So if the underlying principles of statistics are this simple, where is there room for disagreement? I'll give you a hint: we mostly agree on what probability is. The place where you see conflict and division within statistics is the second italicized word: uncertainty. Probability is a mathematical concept; it is exquisitely well defined. Uncertainty is an English word; what it means kind of depends on when you hear it. So the central difference between Bayesians and frequentists is what they mean when they say that they are quantifying uncertainty. How am I going to explain this to you? Here's our little roadmap for the talk. Hopefully you already believe that this is important: it matters that people understand how statistics works, because frankly it's inescapable at this point, and you ought to know what you're expected to do with it, since it's already there. I'm going to explain the two key concepts in my central dogma of inferential statistics. First, what is this thing called probability? This is mostly a vocabulary lesson. Second, what is this thing called uncertainty? Or rather, what are these different things called uncertainty that different statisticians are trying to describe? I am then going to take you through a very, very abridged tour of the history of uncertainty quantification, which is the history of mathematical statistics and probability. The reason for doing this is that different people throughout the last several hundred years have used probability in different ways to solve different problems, and if I can show you the different styles of inference that people have been using throughout history, hopefully that will allow you to better
understand not only what the Bayesians are, but the different styles of inference that you are presented with today. And finally, the big reveal: this is a highly partisan topic, and one of the nice things about it is that it's very openly partisan. You are never really left in doubt as to whether the person you're speaking to is a frequentist or a Bayesian, and what their corresponding biases might be. So I have tried to keep this talk as unbiased as possible, but at the end it's only fair to let you know how I feel about it, so you can figure out where you should believe me and maybe where you should believe somebody else.

Moving along: what is this thing that we call probability? Probability as we currently understand it dates back to a 1933 monograph by Andrey Kolmogorov, and Kolmogorov's insight was that this concept of probability, which previously had been sort of isolated from the rest of mathematics, was actually a special case of something called measure theory. Measure theory is the mathematical formalization of the ways that we actually measure stuff in real life. For example, there is a mathematical formalization of the concept of length; it's called the Lebesgue measure. There's a mathematical formalization of the concept of counting; it's called the counting measure, because no one quite had the nerve to name counting after themselves, even after all this time. Different measures measure different kinds of things, and the kind of measure that we apply to things called uncertainty is probability. What makes it different from other measures is that it always takes a value between 0 and 1. So, key concepts in understanding probability: probabilities have to take a value between 0 and 1, which means that if you take all possible outcomes, or all possible things of interest, the probability of all of them together has to be 1. The way that you allocate those fractions of probability among individual events or sets of events is called a probability distribution, and there are a couple of familiar distributions: the exponential distribution, which allocates probability across all positive real numbers; the normal or Gaussian distribution, which allocates probability across all real numbers; and discrete distributions, such as the distribution over the outcomes when you throw a six-sided die, which is a distribution over the numbers 1 through 6.

Statisticians define distributions according to things called parameters. When you have a named distribution like the exponential or the normal, you can perfectly define it using a small set of numbers. For example, with the exponential distribution, if you know the scale parameter, you know everything about that distribution; you can calculate any number that you want. Similarly, with the normal or Gaussian distribution, if you know the mean and the standard deviation, you can calculate any probability that you want. For a dice distribution, the parameters are the probabilities of occurrence for each of the six sides. The reason that you, as non-statisticians, should care about parameters is that parameters are what statisticians do inference on. Let's say, for example, that I want to test a hypothesis about whether men who work at the lab tend to be taller than women who work at the lab. I will define a distribution on the heights of men at the lab, and I will define a distribution on the heights of women at the lab, but I'm not going to compare these big distributions head to head; I'm going to compare their parameters, probably their means, to assess whether or not my hypothesis is true. So parameters matter because they're the tools of inference that statisticians use. In a lot of cases I will have exceptionally bad notational manners and use the terms parameter and hypothesis interchangeably; I apologize, but hopefully now you'll kind of understand what I mean.
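The point that a named distribution is completely pinned down by a handful of parameters is easy to see in code. Here is a minimal sketch (the function names are mine, purely for illustration, not anything from the talk):

```python
import math

# A named distribution is fully specified by its parameters.
# Exponential: knowing the scale parameter tells you everything.
def exponential_pdf(x, scale):
    return math.exp(-x / scale) / scale

# Normal (Gaussian): the mean and standard deviation suffice.
def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

# Six-sided die: the parameters are the six face probabilities,
# and because this is a probability distribution they must sum to 1.
fair_die = {face: 1 / 6 for face in range(1, 7)}

print(exponential_pdf(0.0, scale=2.0))   # density at 0 is 1/scale = 0.5
print(round(normal_pdf(0.0, mean=0.0, sd=1.0), 4))
print(round(sum(fair_die.values()), 12))
```

Any probability you might want (tail areas, quantiles, moments) can be computed from these few numbers, which is exactly why statisticians do inference on parameters rather than on whole distributions.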
The last key concept is something called the likelihood. The formula that's up there is a probability mass function; it's a way of describing a distribution with parameter theta, and basically it is just the function that returns the probability that a random variable takes a value little x, given that the distribution has parameter little theta. If I sum over all possible values of little x, I get 1, because that's what probability distributions do. Now, what happens if I flip this function on its head? If instead of fixing my parameter theta and varying my data x, I fix x and vary theta, that is something called a likelihood. The likelihood is a key tool in both Bayesian and frequentist inference, and you can sort of see why that would be: if you calculate the likelihood for a bunch of different parameter values, the ones that have a higher likelihood value are in some sense more consistent with your observed data than the ones with lower likelihood values. Something to keep in mind about the likelihood, though, is that it is not a probability distribution. Remember, when I was summing over x, that had to sum to one; if I integrate over theta, the result does not have to be one. It doesn't even have to be finite, and this is going to matter a little bit later in the talk. So, I'm sure you're relieved: that's everything you need to know about probability for now.

Now we're going to get into the more subtle concept: what is uncertainty? Or rather, what are the different kinds of things that statisticians would call uncertainty and try to describe with probability? This is a subtle concept, and I have been told that one of the best ways to explain subtle concepts is through the use of story, so I wrote you one. I call it "The Statistical Lunch Bunch and the Summer Student Revolt of 2015." It is entirely fictional, yet strangely plausible. Once upon a time, there were six statisticians, and these statisticians worked together, and they played together, and
they went to lunch together every Thursday. Now, there are two key things that you need to know about statisticians: the first is that we are creatures of iron will; the second is that we are creatures of slight whimsy. The iron will was really causing problems for this group of statisticians, because they could never agree about where they should go to lunch. However, instead of taking turns, they came up with a more whimsical solution in the form of dice. Every statistician had a number; we rolled the dice, and whoever's number came up got to pick the restaurant. But these were not just any dice. Oh no, these are what are called precision dice, or gaming dice, and they are exquisitely engineered to have equal probability of occurrence for all six sides. So when I say I am uncertain about where I'm going to lunch on Thursday, that is due to the inherent random nature of the outcomes of throwing these dice, rather than to any situation where I'm ignorant about the physical properties of the dice. Now, one of these statisticians was a collector and connoisseur of dice and was not entirely satisfied with the current lunch selection solution, mostly because the dice are kind of boring. I mean, they're mass-produced, they're plastic, they're also sharp, and they kind of hurt when you hold them. So she came up with a more whimsical and overall superior solution in the form of hand-carved ebony dice with hand-applied brass pips. These dice add a certain amount of elegance and panache to any event where they are used, and she presented this alternative to her fellow statisticians, and they liked it. But they were a little bit concerned, because they were uncertain whether these dice were fair. Remember, with our blue dice we know that the outcome probability for each of those sides is equal; for these dice, we do not actually know what the probabilities of the different numbers are, because they are less carefully manufactured. So the statisticians talked for a little while, and they decided
that the increase in whimsy was worth the sacrifice in certainty. However, they did want to certify that these dice were at least kind of fair, and they came up with a testing procedure that basically involved rolling the dice 50,000 times and writing down the outcomes, which is doable, but no one was volunteering. Fortunately, a solution presented itself in the form of our summer students. We went to our summer students and we told them that they were going to spend their internship at a national laboratory rolling a die over and over and over again and writing down the outcomes. We told them they were following in the proud footsteps of John Kerrich, a mathematician interned in Denmark during World War Two, who spent the period of his imprisonment flipping a coin over and over and over and over and over again to demonstrate that the law of large numbers works, and he got a publication out of it. That is a true story, and I personally find it very compelling. They didn't buy it. They refused to help us with our little experiment, and we had a period of very tense negotiations while we were trying to figure out: how can we determine whether the dice are fair while the students have what they consider to be an intellectually satisfying summer experience? Apparently somebody told them that if you come to a national lab it's all lasers and explosions all the time. But eventually we were able to come to a mutually acceptable solution, in which they were going to leverage the fantastic computational capabilities available here at Lawrence Livermore to build an exquisitely refined physics model of our wooden dice, so that they could simulate rolling them over and over again instead of doing it for real. There was just one tiny problem: our summer students, like us, are statisticians, which means they had never built an exquisite three-dimensional physics model before, and we were uncertain as to whether or not they had done it right. So while we were
discussing how we might validate their work, the entire situation was overcome by events, in that the statistics group hired two new people. Now we can't use six-sided dice at all; we need eight-sided dice. You might imagine that a statistics group with a dice collector as a member might have access to an eight-sided die, or even a selection of eight-sided dice, and you would of course be correct. But you know, we're back to mass-produced plastic, aren't we? Whereas there are alternative options, such as additively manufactured, custom-made eight-sided spinners, available in a variety of attractive metals and finishes, which are just a wonderful way of making any eight-way decision-making process more fun. However, one is uncertain as to whether one should bite the bullet and purchase this item, because on the one hand I don't actually need it, and on the other hand all of my dice-collecting friends will be extraordinarily jealous. So here is a form of uncertainty which is not tied to any physical process at all; it exists only in one's mind, and it describes a state of belief. Now, I've run out of space, which means that we're not going to talk about any additional kinds of uncertainty, and I leave you on the cliffhanger: did I buy the top? Of course I bought the top. But the point is, I have shown you four different kinds of uncertainty (not an exhaustive list), and all of these styles of uncertainty are quantifiable, and all of them can be described using probability. However, if I use probability to describe all these different kinds of events, clearly that probability is going to mean something different in different situations. The two cleanest examples of that are the blue dice and the gold top. In the blue dice situation, we have an actual physical system about which we have exquisite knowledge, but we are still uncertain as to its next outcome. This is a style of uncertainty that's called randomness, or aleatory uncertainty, and when I use probability to
describe this situation, it's called objective probability or frequentist probability, where the word frequentist refers to the long-run frequencies of random events. On the other side, I have the gold top. The gold top is not tied to any physical system; there's not a right answer somewhere graven in stone. If I am using probability to describe my thought process on whether or not I should make this purchase, then that probability is entirely subjective: it describes a state of knowledge or a state of belief, and it could differ from person to person. This is sometimes called subjective probability or Bayesian probability. So we have now laid out the two sides in our conflict. Most statisticians would agree that objective probability is the right way to describe the blue dice and subjective probability is the right way to describe the gold top (it's really hard to get all statisticians to agree on anything, but most would). But what about those middle two situations? In both of those situations I have uncertainty about a real physical system. The wooden dice do exist; they have some physical properties; they have some distribution over their outcomes. So that's like the blue dice situation. However, I am ignorant on some level about how this system behaves; I could conceivably learn more about it. So in that case I also have some subjective uncertainty about how these systems work. When I say I want to use probability to describe those two intermediate systems, the frequentists say, "Well, of course you should use objective probability to describe a physical system," and the Bayesians say, "Well, you should use subjective probability to describe your state of belief," and then they fight. And so that's the difference between the Bayesians and the frequentists. It really is almost a units problem, even though probability is unitless: the same quantity can be used to describe two very different kinds of uncertainty. To give you an idea of how we found ourselves at this point, I'm now going to give you a brief
history of uncertainty quantification: how did we end up with different styles of probability describing different things? Going back to the beginning, what we consider to be the origin of mathematical probability today is a series of letters exchanged between Blaise Pascal and Pierre de Fermat in 1654. What were they writing about? They were writing about gambling, and specifically something called the problem of points, which is the question: if you are playing a game of chance against an opponent and you are interrupted before that game is completed, how do you split the stakes? This was a matter of great interest to both mathematicians and gamblers, and in the course of figuring out a mathematically rigorous way of addressing this problem, they came up with the core concepts that describe probability today. For example, Pascal came up with the concept of expected value, which these days means something like a mean or a measure of central tendency; back then it was literally the expected value you would get back when you placed a bet on a game of chance. And this is what probability was. It wasn't even called probability back then; it was called the doctrine of chances, or the study of chances, because it was the mathematical description of games of chance: things like dice, cards, and lotteries. And that's how things stayed for a very long time, with the first hint of a broader application coming in 1763, when a publication came out that had been written by a gentleman by the name of Thomas Bayes. Thomas Bayes was a Presbyterian minister, and he was also an amateur mathematician. He published in a variety of areas on different mathematical problems, but he had a very interesting idea about this doctrine of chances, which we now call probability: how can you use probability to describe learning? How can you use it to describe an accumulation of information over time, so that you can modify a probability based on additional knowledge? So, since he was a Presbyterian
minister, he wasn't going to talk about dice or cards or lotteries. He came up with his own thought experiment, which he did not bother to name, but which I shall: I call it blindfolded one-dimensional table watching, or Bayes bocce for short. This is a notional game setup. The way that the game is played is that you don your blindfold, you take in hand your starting ball, and you throw this ball at the table in such a way that it has an equal probability of landing in any location on that table. Your job is to guess where it is. If that sounds impossible, that's because so far it is. This is where your friend comes in; they're not wearing a blindfold. You take one of the secondary balls, shown here in blue, you throw it at the table in the same way, and your friend tells you whether it landed to the left or the right of the ball you started with. As you continue playing this game for multiple rounds, you will learn where the red ball is. So what does this look like? Here's my Bayes bocce simulator. You can see the overhead view of the table; we've also got the side view, because we're only interested in the distance from the left edge. On the bottom is my starting probability distribution for the location of the red ball. It is flat, because I throw the ball in such a way that I have no reason to believe it's in one part of the table rather than another, and the true location is marked by that vertical line. Round one: the blue ball fell to the left of the initial ball, and you can see I've updated my probability distribution. It has fallen to nothing on the extreme left edge of the table, gradually increasing the closer you get to the right. Round two: two balls have fallen to the left, and now I have almost no probability on the left half of the table, again increasing the closer we get to that right edge. Let's skip ahead a little bit. Round ten: I've now had nine balls fall to the left and one fall to the
right, and I've got this peaky distribution, which is not centered exactly at the true location, but it's pretty close. Skipping ahead to round 25, we've now had 22 balls fall to the left and three fall to the right, and again this distribution is becoming narrower and getting closer to the true location of the red ball. Trust me: if I kept playing this forever, eventually I would get something that's very, very closely centered around the true location of my starting ball. How did Thomas Bayes come up with this, and how am I doing those calculations for the probability distribution? Well, though Bayes was far too modest to say so, he used Bayes' theorem. Bayes' theorem describes a concept called conditional probability, which is the probability distribution of one event or random variable given information about other random events. In this case, the event we're trying to get information about is x, the location of the red ball. Our ancillary information is the left/right information we have about the secondary balls. And you can see this is a function of the distribution of the left/right information for the secondary balls given the location of the red ball (that's just a binomial distribution) and the starting flat unconditional distribution for the red ball. So what is Bayesian about this? Remember, I said Bayesians are so called because they use a subjective interpretation of probability: probability as a state of belief rather than a description of randomness. So what's Bayesian about Bayes' theorem? Absolutely nothing. Bayes' theorem is a theorem, which means it works no matter what your interpretation of probability is. If you want to be Bayesian with Bayes' theorem, then you need to take an additional step. What Bayesians do is use Bayes' theorem to perform inference about distribution parameters. Remember, I said that's what statisticians do inference on: we talk about distribution parameters and data.
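The updating in the bocce game can be sketched numerically with a grid approximation. This is my own toy reconstruction, not the speaker's actual simulator; with a flat prior, after some number of "left" and "right" reports the posterior over the red ball's position x is proportional to x raised to the number of lefts times (1 - x) raised to the number of rights:

```python
import random

random.seed(1)                        # reproducible demo
true_x = random.random()              # red ball lands uniformly on [0, 1]

grid = [i / 200 for i in range(201)]  # candidate positions for the red ball
posterior = [1.0] * len(grid)         # flat prior: no idea where it landed

for _ in range(25):                   # 25 rounds of Bayes bocce
    ball = random.random()            # secondary ball, thrown the same way
    if ball < true_x:
        # friend says "left": P(left | x) = x, so weight each candidate by x
        posterior = [p * x for p, x in zip(posterior, grid)]
    else:
        # friend says "right": P(right | x) = 1 - x
        posterior = [p * (1 - x) for p, x in zip(posterior, grid)]
    total = sum(posterior)
    posterior = [p / total for p in posterior]   # renormalize to sum to 1

mode = grid[posterior.index(max(posterior))]
print(f"true position {true_x:.3f}, posterior mode {mode:.3f}")
```

Each round multiplies the current distribution by the likelihood of the left/right report, which is Bayes' theorem applied over and over; run it longer and the posterior piles up ever more tightly around the true position.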
What Bayesians calculate is the conditional distribution of a parameter theta given the observed data x. So what makes this a Bayesian procedure? Well, it's got a couple of pieces. The first is our old friend the likelihood, where you're just taking a probability density or mass function and treating it as a function of theta rather than x. But then there is the prior. The prior is the unconditional distribution that you put on the parameter theta. We do not necessarily think theta is random. Remember my example of a parameter earlier: the mean height of men at the lab. We don't think that's a random variable; we just don't happen to know what it is. So if I'm using probability to describe something that is not random, I'm not using objective probability; I'm using subjective probability, probability as a description of a state of belief or a state of knowledge. Now, you put all these things together, and the result is called the posterior distribution of theta given x, and again we have our constant of integration, because probabilities have to integrate to one; it's the rule. So, I've already told you Bayes' theorem isn't Bayesian. Was Thomas Bayes Bayesian? Well, we don't quite know. We do know that he never really applied his theorem to real problems; in fact, the paper describing it wasn't even published until after his death. So who was the person who took Bayes' theorem and did this with it, who said we should be doing this subjective-probability-based inference for scientific problems? That gentleman was Pierre-Simon Laplace. Pierre-Simon Laplace was what you might call a probability enthusiast. He took this thing that was used to describe gambling games and he said, let's just use it to describe everything. He applied it to problems in biology, to problems in sociology, to astronomy. He wanted to apply it to criminal justice, where he wanted trials in France to actually quantify the evidence and come up with some kind of posterior distribution of guilt or innocence at the end. That part didn't catch on, but the science parts did, and
the reason was that up until this point there wasn't really a quantitative way of accounting for uncertainty in data or models. So this style of thinking, which we call Bayesian statistics (back then it was called inverse probability), was the only game in town. Eventually Laplace came up with something called the central limit theorem, which is a cornerstone of frequentist inference, so you know, the entire conflict really is his fault. But the central limit theorem only applies when you have large samples, and so during Laplace's time, and for over a hundred years after that, the only way to do inference for small samples was, basically, Bayesian statistics. And some people hated it. It's worth talking about why, because it's not as if there was a better alternative available. The issue was the prior. Some people just did not like the prior: they weren't sure how they were supposed to come up with it, and they really didn't like the fact that if two people came up with two different priors, you could get different answers to the same scientific question. So here is an example that illustrates this, one that is actually one of Laplace's own: how do you calculate the probability that the sun is going to rise tomorrow? First we need some prior on the probability of the sun rising, and Laplace said, well, it may rise, it may not; other than that, I don't know. So it's just a flat prior on that interval. We need a likelihood, and he said, okay, what we're going to use as our successes is every time the sun has risen in the past, which thankfully has been every day so far. Now, one of the confusing things about Bayesian statistics, at least when you talk about it, is that you end up calculating probability distributions on probabilities. What we're actually going to call the probability of sunrise tomorrow is the expected value of this distribution over the probability of sunrise tomorrow. It is available in closed form in this case (trust me, I'm a professional), and it is simply the number of
times the sun has come up in the past plus one, divided by the number of times the sun has come up in the past plus two. Plugging in numbers, he said the sun has risen every single day for the last 5,000 years, including leap years, and so that is his calculated probability of the sun rising tomorrow. One minus that, the probability of the sun not rising tomorrow, is about one in two million. Now, that number may seem small, but it is not small enough. For one thing, we know that the earth is over 4 billion years old, which means that if that were the actual probability of the sun not rising on any given day, we'd have had well over 2,000 sunless days by now. So not only do we not believe that number; Laplace didn't even believe that number. When he was talking about the probability of the sun rising, that 0.9999-and-so-on, he said this number "is far greater for him who, seeing in the totality of phenomena the principles regulating the days and seasons, realizes that nothing at the present moment can arrest the course of the sun." So if you ask some random person who doesn't know anything about the sun, this is their correct probability for the sun rising tomorrow; if you ask Laplace, he says, don't worry, it's coming up. How does this work? Remember, probability got its start describing gambling games, and the probability of rolling snake eyes does not depend on who you ask (ideally, it also doesn't depend on who's rolling). How do you make that leap from a description of physical phenomena, physical processes which are the same for everyone, to a situation where the probability of the sun coming up changes depending on who you ask? The answer, of course, is that this is subjective probability. We aren't talking about the actual probability of the actual sun coming up tomorrow; we're talking about our state of belief that the sun will come up tomorrow, which of course depends on what you happen to know.
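Laplace's sunrise arithmetic is easy to reproduce. A small sketch, using the 5,000-year figure quoted in the talk:

```python
# Laplace's rule of succession: with a flat prior on the success
# probability, after s successes in n trials the posterior expected
# probability of another success is (s + 1) / (n + 2).
def rule_of_succession(successes, trials):
    return (successes + 1) / (trials + 2)

# 5,000 years of daily sunrises, leap years included:
days = 5000 * 365 + 5000 // 4
p_rise = rule_of_succession(days, days)
p_fail = 1 - p_rise                   # probability the sun does NOT rise
print(f"1 in {round(1 / p_fail):,}")  # 1 in 1,826,252 -- about one in two million
```

The probability of no sunrise works out to exactly 1/(days + 2), which is why the talk's "about one in two million" follows immediately from the 5,000-year assumption.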
How did Laplace make this leap? How did he go from a description of random events to a description of a state of belief as the interpretation of probability, and why on earth didn't he tell people that that was what he was doing? The answer is that Laplace didn't think he was making a leap at all. Laplace did not believe in randomness. It was actually pretty common during Laplace's time to believe that the world was entirely deterministic and predictable, and that if you just had enough knowledge of starting conditions and the laws of physics, you could predict every event, including the roll of a die, going out into eternity. It wasn't until the 20th century that we started to see scientific and mathematical revelations that made people question this. So to Laplace, the only kind of uncertainty there was, was uncertainty due to ignorance, which is exactly the kind of uncertainty you describe with subjective probability. Now, that being said, even though this is how he justified it, and even though this was literally the only way people had of calculating probabilities for hypotheses, a lot of people didn't like it. First of all, you have to come up with a prior, and second of all, if different people come up with different priors, particularly when there's not a lot of data, you can get different answers. So there was a real push to get the prior out of statistics, the only way that you can: by coming up with a style of inference tied only to objective probability, only to randomness. And that is what the frequentists did. My two representative frequentists are Ronald Fisher, who came up with the concept of maximum likelihood and ANOVA (he was also very influential in developing the idea of randomized experiments, and would not appreciate being called a frequentist, by the way; he advocated a third style of inference called fiducial probability, but it didn't catch on, so we'll just leave him up there), and also Jerzy Neyman, who along with his partner in crime Egon Pearson came up with the decision-theoretic
foundations of frequentist hypothesis testing: the idea of controlling your false positive (type I) error rate and then minimizing your false negative (type II) error rate, and using that to define an optimal test. That was due to his work. So these gentlemen, and a variety of other people who founded frequentist inference, did something very interesting: they came up with a way of doing probabilistic inference that is tied only to randomness. It doesn't involve subjective probability at all. As you might imagine, there are some caveats and limitations, and the best way for me to explain them is to show you a particular inference problem, do it as a Bayesian, and then compare that to how you might approach the same problem as a frequentist. So let's go back to our statistical lunch bunch and imagine we have decided to use the wooden dice, and I have been assigned the number 1. Remember, we don't know if these dice are fair, so I want to make sure they're at least unfair in my favor. I do a little experiment: I roll the dice 50 times and I get twelve 1s, which is a 24 percent success rate. That's better than fair, which is closer to 17 percent, but I know I have not fully characterized these dice. So I want to know: what is the lowest reasonable value for the probability of rolling a 1, which I'm going to call theta from here on out? And I'm going to say I'm willing to accept an "uncertainty" of 5 percent, where that weasel word is used advisedly, because uncertainty is going to mean two different things over the next two minutes. First off, I am a Bayesian, so the first thing I have to do is pick a prior, and the prior you see up there is called the Jeffreys prior; it turns out that's a better way of expressing ignorance about a proportion than just a flat prior. Now I calculate my posterior distribution using my twelve successes in fifty trials, and I get something that looks like that: a nice peaked distribution centered around point two
four. And I say: this is my probability distribution for theta. I find the 5th percentile and say that with 95 percent probability, the probability of rolling a 1 with these dice is at least 15.3 percent. So far, so good. Now I put on my frequentist hat, and I can't do any of that anymore. Remember, as a frequentist I can only use probabilities to describe random events, or events that behave like random events. So what was random in the problem I just described? Well, the only thing that was random was the experiment where I rolled the dice 50 times. Unfortunately, it's not random anymore; I already did it. But can I come up with a procedure such that, if I had a hundred thousand dice and wanted an interval for each of them by rolling it 50 times and counting the 1s, the procedure would work for 95 percent of those random experiments? That is called a 95 percent coverage probability. It turns out you can, and the result is called a confidence interval. I don't really have time to explain how it works in general, but this is how you calculate it in this case. I'm going to figure out which outcomes of my hypothetical experiment are as consistent, or more consistent, with values in my interval as the one I saw; in this case that means anything that produces 12 or more 1s out of 50 trials. Now I take every possible theta, every possible probability of rolling a 1, and calculate the probability of getting 12 or more 1s out of 50 trials, and that's what that looks like. That is not a probability distribution; it is just a probability calculated for a bunch of different values of theta, which varies over the x-axis. I identify all the values of theta up there that have less than a 5 percent chance of producing 12 or more 1s, and I boot those out of my interval. The reasoning is that if theta really were that small, I don't think I would have gotten the data that I actually saw, which
was exactly twelve 1s. And then I can say: with 95 percent confidence, the probability of rolling a 1 is at least 14.5 percent. This procedure does in fact have that 95 percent coverage probability, which is to say that for 95 percent of experiments you carry out this way, the interval will capture the real parameter value. But of course you don't know whether you've captured it in any particular experiment. It's worth pointing out that I pretty much got the same answer twice; both say the probability of rolling a 1 is greater than 15 percent-ish, with "uncertainty", whatever that means, of 5 percent. But it was a lot harder to explain how the frequentist did it, and I would like to leverage my experience teaching frequentist statistics to tell you that it takes a very, very long time to get people to understand where probability comes into play in frequentist procedures: that you have this coverage probability for a confidence interval, and that when you're doing a hypothesis test you do not actually get a probability that the hypothesis is true. Bayesian inference can give you that, but frequentist inference can't, because a hypothesis is not random; a parameter value isn't random. It takes a while to wrap your head around that, and it is one of the primary Bayesian critiques of frequentist inference. So why did this catch on? And boy howdy, did it ever catch on: the advent of statistics as a key piece of the toolkit for science really was the advent of frequentist statistics. They're much better at marketing than the Bayesians. All of a sudden, even though objective probability is inflexible and requires these ancillary randomized experiments before you can calculate results, you've gotten rid of the problem of the prior, which is twofold: first, how do I come up with a prior that really represents my beliefs if I'm not a statistician and therefore don't think in probability, and second, how do I deal with the situation where two people with different priors get different answers? Oh no.
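Both interval calculations from the dice example can be reproduced with SciPy. This is a sketch of the computation as I understand it (the Jeffreys posterior is Beta(k + 1/2, n - k + 1/2), and the one-sided Clopper-Pearson lower confidence bound is the matching beta quantile); the numbers land near the 15.3 percent and 14.5 percent quoted in the talk:

```python
from scipy import stats

k, n = 12, 50  # twelve 1s in fifty rolls

# Bayesian: Jeffreys prior Beta(1/2, 1/2) plus a binomial likelihood gives a
# Beta(k + 1/2, n - k + 1/2) posterior; its 5th percentile is the one-sided
# 95% credible lower bound for theta.
bayes_lower = stats.beta.ppf(0.05, k + 0.5, n - k + 0.5)

# Frequentist: the one-sided 95% Clopper-Pearson lower confidence bound,
# obtained by inverting the binomial test ("12 or more 1s out of 50").
freq_lower = stats.beta.ppf(0.05, k, n - k + 1)

print(f"credible lower bound:   {bayes_lower:.3f}")   # about 0.153
print(f"confidence lower bound: {freq_lower:.3f}")    # about 0.145
```

As the talk notes, the two procedures give nearly the same number here while meaning quite different things by "95 percent".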
So by getting rid of the prior, they got rid of this problem, and they put statistics in the hands of people who don't think in probability. This made statistics enormously popular. It has also led to a lot of the problems we're experiencing in statistics today with misinterpretation of results; if you want my take on frequentist statistics, please see my other talk, "Everything Wrong with Statistics (and How to Fix It)", now available on YouTube. But rightly or wrongly, when this caught on, the Bayesians had to find a way to respond to it, and to the implicit critiques of Bayesian inference embedded in the popularity of frequentist inference, and you'll be shocked to hear they couldn't agree on how to do that. So there is one school of Bayesians, I'll call them the subjective Bayesians, who just doubled down on subjective probability. They said using these ancillary randomized experiments is silly when you can just directly use probability to quantify your state of knowledge about an experiment. But they did pay attention to the critique that it's hard to come up with a good prior when you don't think in probability, so they invested a lot of research effort into figuring out how to do that better: how do we really represent a person's state of knowledge using a probability distribution? They came up with a concept called expert elicitation, which is about pairing someone who thinks in probability with someone who knows the science, and finding a way to write down a state of scientific knowledge as a probability distribution. Some subjective Bayesians took it a little farther and denied the existence of objective probability at all. Probably the most famous of these is Bruno de Finetti, who liked to say "probability does not exist". He did not mean mathematical probability; he meant that probability is not a property of any real-world phenomenon. It was sort of a callback to Laplace's view: the only
reason we can't predict things is that we're ignorant about them, so the only viable description of uncertainty that probability can offer is as a description of ignorance rather than a description of randomness. (De Finetti never actually worked with quantum physics; it might have changed his mind.) On the other side you had the objective Bayesians, who were receptive to a different part of this critique of Bayesian statistics. By the way, don't be fooled by the name: they still use subjective probability, all Bayesians do. But the objective Bayesians said, all right, how do I come up with a procedure that's going to give the same answer no matter who uses it? How do I just let the data speak, instead of worrying about different people's prior knowledge? So they invested a lot of time and effort in coming up with what are called non-informative priors, priors that express ignorance in some mathematically defined way. For example, that smiley-face prior I showed you a few slides ago is a non-informative prior, and it's actually less likely to bias your results than a flat uniform prior is; every time somebody just uses a finite uniform prior, an objective Bayesian sheds a single tear. There are more rigorous ways of expressing that concept. Some objective Bayesians take it a step farther into objective territory, where they will actually calculate the frequentist properties of a Bayesian procedure. Remember that Jeffreys-prior-based credible interval I calculated for the probability of rolling a 1? It has a 95 percent subjective probability of containing my parameter value. That's great. It's also a 95 percent confidence interval, which means they've tested it using simulation and found that in 95 percent of cases where you do a random experiment and follow that procedure, it will capture the real parameter value.
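That kind of simulation check is easy to run yourself. Here's a minimal sketch of my own (with an arbitrary true theta, not a value from the talk) that estimates the frequentist coverage of the one-sided 95 percent Jeffreys credible bound:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
theta_true, n, reps = 0.20, 50, 20000

# Simulate many 50-roll experiments at a known theta, compute the one-sided
# 95% Jeffreys credible lower bound for each, and see how often the bound
# actually falls at or below the true value.
k = rng.binomial(n, theta_true, size=reps)
lower = stats.beta.ppf(0.05, k + 0.5, n - k + 0.5)
coverage = (lower <= theta_true).mean()

print(f"empirical coverage: {coverage:.3f}")
```

Because the binomial outcome is discrete, the empirical coverage wobbles around the nominal 95 percent rather than hitting it exactly, which is part of why objective Bayesians bother checking.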
So this is basically saying to the practitioner: you're not quite so comfortable with subjective probability as we are? You don't have to be; this procedure works under the objective probability framework as well. And you can guess what the subjective Bayesians thought of that. There are other styles of Bayesians as well, for example the empirical Bayesians, who use their data to inform their prior, which depending on which of those gentlemen you ask is either slightly cheating or a step away from human sacrifice. Something I hope you've picked up from the way I've been describing statisticians in this talk is that we take the interpretation of probability very, very seriously. That's because statisticians deliver probabilistic products: any time you get a statistical interval or the result of a statistical hypothesis test, that is either explicitly a probability or a function of a probability. If you don't start with a probability distribution that means something, and then maintain that meaning very carefully through your various mathematical manipulations, you end up with a number at the end that you can't use, because you don't know what it represents, what it stands for. So this is a big deal to statisticians, be they Bayesian or frequentist. But not everyone uses probability that way. Some people use probability as an intermediate step in some other procedure, and in that case, since the probability is a means to an end, they are a little less sensitive about maintaining a really clean interpretation for it. One example of that is Bayesian search, and one of the best and earliest examples I have of Bayesian search is the search for the USS Scorpion. The Scorpion was a Skipjack-class nuclear submarine that was lost with all ninety-nine crew aboard on May 22nd, 1968. When I say that she was lost, this was not just a tragedy, which of course it was; it also presented a very practical problem, because the last known contact with the Scorpion was on May 21st, when she was off the coast of Europe. She
was supposed to return to her home port in Norfolk, Virginia, on May 27th, but she never appeared, which means that somewhere in the Atlantic Ocean you have a missing submarine, and needle-in-a-haystack has got nothing on that. Eventually the US Navy was able to use acoustic signals, which they believed came from the accident in which Scorpion sank, to somewhat localize the wreck. However, their search box was still well over a hundred square miles, it was 400 miles from land, and it was in a part of the ocean up to two miles deep. So this is a non-trivial problem. On top of that, they had other sources of uncertainty. They couldn't precisely locate the search ship they were using to check the ocean floor for Scorpion; they didn't have GPS. They had both false positive and false negative probabilities for the sensors they were using to sweep the ocean floor. And they could not figure out how to fold all of this information together to conduct the search in a credible way. So they went to a man named Dr.
John Craven, a mathematician who was very interested in Bayesian search. Craven and his team said, all right, first of all we need a prior distribution, and they came up with one. They worked with a number of experts from the Navy on where an incident might have taken place: what would the heading, the speed, the depth of the Scorpion have been, and what scenarios might have caused the Scorpion to sink? They rolled all of this together into a probabilistic prior, sampled that prior, and fed the samples through a simulation to answer: if the accident had occurred in this way, where would the wreckage of the Scorpion now be found? The Navy had divided their search box into grid squares, and the prior is simply the count of the number of times the simulation placed the wreckage in a particular square, divided by the total of 10,000 simulations. On top of this, Craven's team came up with an update procedure, similar to what we saw with the table bocce example, where based on the false negative and false positive probabilities of the various sensors they were using to try to find the wreckage, they could update this probability distribution based on where they'd already looked. It took a little time to fold all of this together, but eventually they appeared, literally on the search ship, in August of 1968, with this prior and this update procedure. And you will all be shocked to hear that the Navy did not let the math people drive. That's because the Navy thought they'd already found the Scorpion: they had had a magnetometer contact at that location right there, and they were now repeatedly searching that grid box, but they were not able to get a photograph of the wreckage with their camera, which was their gold standard relative to the other search sensors. So the first job the mathematicians had was a futility calculation, which was basically asking: based on our false negative and false positive probabilities, when have
we searched here long enough that we would have seen the Scorpion if she were there? Eventually they concluded that they would have had an eighty percent chance of seeing the ship if she'd actually been there, so they moved on from that search box, concluding the contact had been a false positive. Eventually they did let the mathematicians drive, and they promptly went to that darkest blue box up in the middle, and they searched and searched and searched, and they didn't find anything. They concluded that they needed to refine their model a little: they needed to integrate their location uncertainty, and they needed to understand their sensors' false positives and false negatives a little better. So they went out on a calibration cruise to better understand their mathematical model, and promptly found the submarine. Such is life. Now, when I say that this was not an exquisite Bayesian model, what do I mean? Well, the biggest problem with it is the discretization of the probability distribution. The problem with discretizing your distributions is that you can bias your results depending on where you draw the boundaries, and you can actually see where that happened here: Scorpion is three hundred yards away from the highest-probability box, and each of these boxes is about three-quarters of a mile square. If they had drawn their boxes in a different place, they might have found the ship sooner. So that's my warning about discretizing your distribution. But I'm not saying they did anything wrong, for two reasons. Number one, they found the submarine, and they found it faster than they would have without Bayesian methods, so we have to call that a success. And second, they did not have a choice. They were on a ship in the middle of the ocean, literally working 24-hour shifts; they had to be able to update this distribution in real time, and that means they needed a distribution they could actually compute with.
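The grid update at the heart of this kind of search can be sketched in a few lines. The numbers below are hypothetical stand-ins, not the actual Scorpion values; the point is the Bayes update itself: an unsuccessful sweep of a cell multiplies that cell's probability by the miss rate, and everything renormalizes:

```python
import numpy as np

# Hypothetical prior over five grid cells (cell 0 plays the role of the
# magnetometer-contact cell), plus the chance that a single sweep of the
# correct cell actually sees the wreck.
prior = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
p_detect = 0.80

def search_update(p, cell, p_detect):
    """Posterior over cells after one unsuccessful sweep of `cell`."""
    p = p.copy()
    p[cell] *= 1.0 - p_detect   # probability mass that survives the miss
    return p / p.sum()          # renormalize across all cells

post = prior
for _ in range(3):              # three fruitless sweeps of cell 0
    post = search_update(post, 0, p_detect)

print(post)  # cell 0's probability collapses; the others grow proportionally
```

This also shows the logic of the futility calculation: after enough fruitless sweeps, the searched cell's posterior probability is so low that continuing there is pointless.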
If they had used a fully rigorous continuous Bayesian distribution, they almost certainly would not have been able to update it in time; in fact, at that time they probably wouldn't have been able to work with it at all. So they took the mathematical shortcuts that were necessary to work on this problem, and if you are ever asked to do a probability calculation in the 1960s on a ship in the middle of the ocean, you can take shortcuts too; otherwise, consider not doing it that way. I have just alluded to an issue that I think some of you are probably wondering about, which is: why do people think Bayesian statistics is new? It's not; it's hundreds of years old. In fact, it's older than its most popular competitor. So why didn't you start hearing about it until fairly recently? The answer is computational. Remember that constant of integration I kept glossing over earlier in the slides? That's actually a huge problem. It turns out you cannot arbitrarily combine a prior distribution with a likelihood and expect to be able to compute with it; you certainly can't write the result in closed form. Bayesian inference was, for hundreds of years, limited to a very small family of priors and likelihoods that were compatible and computable; there's a reason you saw the beta-binomial model three times during this talk. That changed in 1990 with the publication of a paper by Gelfand and Smith on something called Markov chain Monte Carlo. What MCMC lets you do is sample from a distribution even if you don't know the constant of integration. So now Bayesians can go to town: they can put together any prior and any likelihood they want, sample from the posterior, do numerical integration, and do inference on those samples. This is what caused the Bayesian renaissance: the combination of the availability of these algorithms with increasingly sophisticated computing hardware.
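To make that concrete, here is a toy random-walk Metropolis sampler of my own (not code from the talk) for the dice posterior from earlier. It uses only the unnormalized density, Jeffreys prior times binomial likelihood, which is exactly the situation where the constant of integration is unknown:

```python
import numpy as np

k, n = 12, 50  # twelve 1s in fifty rolls, as in the dice example

def log_post(theta):
    """Unnormalized log posterior: Jeffreys prior times binomial likelihood."""
    if not 0.0 < theta < 1.0:
        return -np.inf
    return (k - 0.5) * np.log(theta) + (n - k - 0.5) * np.log(1.0 - theta)

rng = np.random.default_rng(1)
theta, draws = 0.5, []
for _ in range(60000):
    proposal = theta + rng.normal(0.0, 0.05)        # random-walk proposal
    # Metropolis acceptance: always accept uphill moves, sometimes downhill.
    if np.log(rng.uniform()) < log_post(proposal) - log_post(theta):
        theta = proposal
    draws.append(theta)

draws = np.array(draws[10000:])                     # discard burn-in
print(draws.mean())  # approaches the closed-form posterior mean 12.5/51
```

Because this posterior is conjugate, we can check the sampler against the exact Beta(12.5, 38.5) answer; the power of MCMC is that the same loop keeps working when no closed form exists.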
This is what allows for the very elaborate hierarchical Bayesian models we see today. A National Laboratory connection: the MCMC algorithm is a direct-line descendant of something called the Metropolis algorithm, which came out of Los Alamos in the 1950s, and our old friend Edward Teller was a co-author on that paper. So prepare yourselves, it's time for the big reveal: what kind of statistician am I? I am all of the above. I have used every style of inference presented in this talk. In full honesty, I was raised frequentist, as so many of us are, and to this day my instinctive understanding of probability is a frequentist understanding: probability as a representation of random events. That said, when frequentist inference can't do something I want to do, I go Bayesian. A good example is my dissertation, which is actually a set of Bayesian models. One of them is a very elaborate model over protein folds, a probability distribution over the structures that proteins can take, and this model draws information from adjacent regions of the same protein and from families of closely related proteins; when that is not available, the prior goes farther afield and draws information from a broad class of general proteins. I could never have written that model in a frequentist way. So my take on inference is that if it makes sense to you for your problem, and if you are upfront about why you made the modeling decisions you made, it's all good. If somebody disagrees with you, that's fine: they can analyze it their way, and maybe they get the same answer, maybe they don't, and then you get to learn something. I've mentioned that the frequentists are better at marketing than the Bayesians. I meet a lot of people who are hesitant about Bayesian inference because of the word "subjective" in "subjective probability". Subjective does not mean arbitrary. Subjective does not mean non-rigorous. It means dependent on human judgment, and everything we do in science is dependent on human judgment: when we decide what hypotheses we're going to
test, how we set up our experiments, how we're going to take measurements. When you do your statistical analysis, your choice of likelihood is subjective. So if you introduce a well-considered prior distribution, that's not going to somehow contaminate your beautiful objective experiment. In fact, sometimes clinging to mathematical objectivity can lead you down the wrong path, and I have an all-frequentist example for you. Remember Ronald Fisher? He was one of the early advocates for the use of randomization in experiments, and one of the reasons is that randomized controlled experiments can help you establish causality in a mathematically rigorous way; in fact, it's probably the only thing that can. The early evidence linking smoking with lung cancer was all based on observational studies, so Fisher went on what was literally a world tour saying correlation is not causation: the evidence is just as consistent with cancer causing smoking as with smoking causing cancer. On the one hand, he's right on the math; on the other, he's wrong on everything else. When you think about it, you're never going to run that randomized controlled study of smoking; they never did. What they did instead to establish causality was look at the weight of the evidence: all of the different observational studies, the fact that people tend to start smoking before they get cancer, studies in animals (which could be randomized), chemical analysis of cigarette smoke. You put all of that evidence together and use human judgment to conclude that yes, smoking causes cancer, not the other way around. It's not justified by objective probability, but it's still the right answer. So what have we learned today? You have learned that statisticians use probability to describe uncertainty, and that we don't always agree on how that should be done. You have learned that it's not just the Bayesians versus the frequentists; we also have the frequentists versus the frequentists and the Bayesians
versus the Bayesians, in a giant statistical scrum. John Tukey once said the collective noun for a group of statisticians is a quarrel. What I hope you take away from this is a better foundation for your understanding of not just Bayesian statistics but all statistics. There are really only two pieces up here that feed into inferential statistics: one is mathematical probability, and the second is the application of mathematical probability to real-world problems. Neither of those is trivial, and I can't teach you how to be an expert in them in an hour, but it's a lot easier to build an understanding when you understand what you're building towards. So again, this gives you, I hope, a better way to approach statistical methodology, and better ways to increase your understanding of it. Assuming you're not now going to take a time-out of two to five years to study probability intensively before you encounter statistics again, please don't hesitate to reach out. If you're at the laboratory, you can call me, or you can call the statistical consulting service. If you're outside the laboratory, don't hesitate to contact a statistician at your own institution or at a university. I play up the stroppiness of statisticians in this talk because it's funny, but in my experience statisticians are very friendly, very collaborative people who just like to argue for fun. So on that note, I hope that you have learned something today that will help you in your future encounters with statistics, and thank you all so much for your time.
Info
Channel: Lawrence Livermore National Laboratory
Views: 53,306
Rating: 4.9307089 out of 5
Keywords: statistics, probability, Bayesian, uncertainty principle
Id: eDMGDhyDxuY
Length: 56min 35sec (3395 seconds)
Published: Tue Sep 27 2016