Why Bayes rule is nicer with odds

Reddit Comments

Absolute gold mine, Grant. You've changed the way I look at probability. This used to be my weakest point. Now I'm slowly able to tackle newer and newer concepts.

👍︎︎ 13 👤︎︎ u/kingbradley1297 📅︎︎ Dec 22 2020 🗫︎ replies

Great vid! One thing I gathered that I want to explore more is the intersection between quick estimations and combating misconceptions.

I feel like teachers and students alike struggle with this, as I think the idea of "quick estimations" has been misunderstood (maybe that's just me). I think a lot of profs I've had have just interpreted it as giving the student many problems instead of specifically tailored problems. Do quick estimations refer to solving a simple case (like prior << 1 in the video)? Or is it a gross approximation, like a Taylor series expansion to understand a function better around a local area (I know, weird example, that's all I could think of)? Obviously, the example chosen greatly affects how the misconception is tackled, but therein lies my question: what makes a good example for an avid learner?

Maybe I'm overthinking this or I'm completely misunderstanding it, but I do feel like this is such an integral part of the learning process. Also, if anyone is willing to share some resources on the matter it would be greatly appreciated.

👍︎︎ 7 👤︎︎ u/boiSlimThick 📅︎︎ Dec 22 2020 🗫︎ replies

Wow, this was very helpful. I've always known I had to be very careful handling medical test sensitivity, but I was never actually able to grasp its logical description. Very cool video! (as usual).

But, as every proper explanation should, as much as it clarified things for me, it raised just as many questions. If it is now very clear how a medical test can update your probability of a certain result, I'm struggling to understand the prior, or rather, I'm not understanding how to deal with my medical test when I don't have any information about the prior whatsoever.

Let me try to give an example.
Let's say I created a sensor for detecting the presence of a molecule in a solution. Let's assume for simplicity that the sensor's result is simply "yes, the molecule is present" or "no, the molecule is not present". Let's also assume I can create some test solutions, whose contents are known with 100% accuracy, that I can use to determine the four parameters of my sensor: sensitivity, specificity, FPR, FNR.
Having done so, what can I say about the result given by an arbitrary solution whose content is completely unknown?

OK, the example is full of flaws, but please try to understand my point. I imagine it is possible to create a medical test whose parameters (sensitivity, specificity, FPR, FNR) can be determined experimentally by "feeding" the test known samples. If this is possible, I could then use the medical test on a population whose prior is completely unknown, but what can I say about the test result then?!

👍︎︎ 4 👤︎︎ u/Niccco_ 📅︎︎ Dec 23 2020 🗫︎ replies

Hi Grant. Would you like to make a video about the history of numbers, like how people got inspired about numbers? Because that shows up everywhere.

👍︎︎ 1 👤︎︎ u/MMS_2705 📅︎︎ Dec 23 2020 🗫︎ replies

I really, really enjoyed this video. Bayes theorem presented in such a simple way simply cuts through all the explanations I have previously tried wrapping my head around. I actually think this method is the best bit of probability theory I have come across. My question: where can I find more resources, i.e. books, that cover this idea and in particular its practicalities? I have been scouring the internet for a couple of days, and I came across a paper about hypothesis testing, though it was a bit heavy for me. Anybody have links to more information about calculating Bayes factors? Thanks!

👍︎︎ 1 👤︎︎ u/Conscious_Chicken_22 📅︎︎ Dec 24 2020 🗫︎ replies
Captions
Some of you may have heard this paradoxical fact about medical tests. It's very commonly used to introduce the topic of Bayes rule in probability. The paradox is that you could take a test which is highly accurate, in the sense that it gives correct results to a large majority of the people taking it, and yet under the right circumstances, when assessing the probability that your particular test result is correct, you can still land on a very low number. Arbitrarily low, in fact. In short, an accurate test is not necessarily a very predictive test.

When people think about math and formulas, they don't often think of it as a design process. Maybe in the case of notation it's easy to see that different choices are possible, but when it comes to the structure of the formulas themselves and how we use them, that's something people typically view as fixed. In this video, you and I will dig into this paradox, but instead of using it to talk about the usual version of Bayes rule, I'd like to motivate an alternate version, an alternate design choice.

Now, what's up on the screen is a little bit abstract, which makes it difficult to justify that there really is a substantive difference here, especially when I haven't explained either one yet. To see what I'm talking about, though, we should really start by spending some time a little more concretely, just laying out what exactly this paradox is.

Picture 1,000 women, and suppose that 1% of them have breast cancer. Let's say they all undergo a certain breast cancer screening, and that 9 of those with cancer correctly get positive results, with one false negative. Then suppose that among the remainder without cancer, 89 get false positives and 901 correctly get negative results. So if all you know about a woman is that she does the screening and gets a positive result (you don't have information about symptoms or anything like that), you know she's either one of the 9 true positives or one of the 89 false positives. The probability that she's in the cancer group given the test result is 9 divided by (9 + 89), which is approximately 1 in 11.

In medical parlance you would call this the "positive predictive value" of the test, or PPV: the number of true positives divided by the total number of positive test results. You can see where the name comes from: to what extent does a positive test result actually predict that you have the disease?

Now hopefully, as I've presented it this way, where we're thinking concretely about a sample population, all of this makes perfect sense. But where it comes across as counterintuitive is if you just look at the accuracy of the test, present it to people as a statistic, and then ask them to make judgments about their test result.

Test accuracy is not actually one number but two. First, you ask how often the test is correct on those with the disease. This is known as the test sensitivity, as in: how sensitive is it to detecting the presence of the disease? In our example, the test sensitivity is 9 in 10, or 90%. Another way to say the same fact would be that the false negative rate is 10%. Then a separate, not-necessarily-related number is how often the test is correct for those without the disease, which is known as the test specificity. As in: are positive results caused specifically by the disease, or are there confounding triggers giving false positives? In our example, the specificity is about 91%; or, another way to say the same fact, the false positive rate is 9%.
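If it helps to see the sample-population count in code, here is a minimal Python sketch of the example above (the variable names are my own, not from the video):

```python
# Sample population: 1,000 women, 1% prevalence,
# 90% sensitivity, ~91% specificity.
population = 1000
with_cancer = 10                 # 1% prevalence
true_positives = 9               # 90% of the 10 with cancer
false_negatives = 1
false_positives = 89             # ~9% of the 990 without cancer
true_negatives = 901

sensitivity = true_positives / with_cancer                 # 0.9
specificity = true_negatives / (population - with_cancer)  # ~0.91

# Positive predictive value: of all positive results, how many are real?
ppv = true_positives / (true_positives + false_positives)
print(f"PPV = {ppv:.3f}")  # PPV = 0.092, about 1 in 11
```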
So the paradox here is that, in one sense, the test is over 90% accurate: it gives correct results to over 90% of the patients who take it. And yet, if you learn that someone gets a positive result without any added information, there's actually only a 1 in 11 chance that that particular result is accurate.

This is a bit of a problem, because of all the places for math to be counterintuitive, medical tests are one area where it matters a lot. In 2006 and 2007, the psychologist Gerd Gigerenzer gave a series of statistics seminars to practicing gynecologists, and he opened with the following example: a 50-year-old woman, no symptoms, participates in a routine mammography screening. She tests positive, is alarmed, and wants to know from you whether she has breast cancer for certain, or what her chances are. Apart from the screening result, you know nothing else about this woman.

In that seminar, the doctors were then told that the prevalence of breast cancer for women of this age is about 1%, and then to suppose that the test sensitivity is 90% and its specificity is 91%. You might notice these are exactly the same numbers from the example that you and I just looked at; this is where I got them. So, having already thought it through, you and I know the answer: it's about 1 in 11. However, the doctors in this session were not primed with the suggestion to picture a concrete sample of 1,000 individuals the way that you and I were. All they saw were these numbers.

They were then asked, "How many women who test positive actually have breast cancer? What is the best answer?", and they were presented with four choices. In one of the sessions, over half the doctors present said that the correct answer was 9 in 10, which is way off. Only a fifth of them gave the correct answer, which is worse than if everybody had guessed randomly!

It might seem a little extreme to be calling this a paradox. It's just a fact; it's not something intrinsically self-contradictory. But as these seminars with Gigerenzer show, people (including doctors) definitely find it counterintuitive that a test with high accuracy can give you such a low predictive value. We might call this a "veridical paradox", which refers to facts that are provably true but which nevertheless can feel false when phrased a certain way. It's sort of the softest form of a paradox, saying more about human psychology than about logic. The question is how we can combat this.

Where we're going with this, by the way, is that I want you to be able to look at numbers like these and quickly estimate in your head that the predictive value of a positive test should be around 1 in 11. Or, if I changed things and asked what would happen if 10% of the population had breast cancer, you should be able to quickly turn around and say that the final answer would be a little over 50%. Or, if I said to imagine a really low prevalence, something like 0.1% of patients having cancer, you should again quickly estimate that the predictive value of the test is around 1 in 100: that 1 in 100 of those with positive test results in that case would have cancer. Or, let's say we go back to the 1% prevalence but I make the test more accurate, telling you to imagine the specificity is 99%; there you should be able to relatively quickly estimate that the answer is a little less than 50%. The hope is that you're doing all of this with minimal calculations in your head.
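If you want to check those mental estimates against exact answers, here is a quick sketch (the helper function is my own, and it assumes the 90% sensitivity is held fixed throughout):

```python
def ppv(prior, sensitivity, false_positive_rate):
    """Exact positive predictive value, via the sample-population count."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)

print(ppv(0.01,  0.9, 0.09))  # ~0.092:  about 1 in 11
print(ppv(0.10,  0.9, 0.09))  # ~0.526:  a little over 50%
print(ppv(0.001, 0.9, 0.09))  # ~0.0099: about 1 in 100
print(ppv(0.01,  0.9, 0.01))  # ~0.476:  a little under 50% (99% specificity)
```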
Now, the goal of quick calculations might feel very different from the goal of addressing whatever misconception underlies this paradox, but they actually go hand in hand. Let me show you what I mean. On the side of addressing misconceptions, what would you tell the people in that seminar who answered 9 in 10? What fundamental misconception are they revealing? What I might tell them is that, in much the same way you shouldn't think of tests as telling you deterministically whether you have a disease, you shouldn't even think of them as telling you your chances of having a disease. Instead, the healthy view of what tests do is that they *update* your chances.

In our example, before taking the test, a patient's chances of having cancer were 1 in 100. In Bayesian terms, we call this the "prior probability". The effect of this test was to update that prior by almost an order of magnitude, up to around 1 in 11. The accuracy of a test tells us about the strength of this updating; it does not tell us a final answer.

What does this have to do with quick approximations? Well, a key number for those approximations is something called the Bayes factor, and the very act of defining this number serves to reinforce the central lesson about reframing what it is that tests do. You see, one of the things that makes test statistics so very confusing is that there are at least four numbers you'll hear associated with them: for those with the disease, there's the sensitivity and the false negative rate, and for those without, there's the specificity and the false positive rate. And none of these numbers actually tells you the thing you want to know!

Luckily, if you want to interpret a positive test result, you can pull out just one number to focus on from all of this: take the sensitivity divided by the false positive rate. In other words, how much more likely are you to see the positive test result with cancer versus without? In our example this number is 10. This is the Bayes factor, also sometimes called the likelihood ratio. A very handy rule of thumb is that to update a small prior, or at least to approximate the answer, you simply multiply it by the Bayes factor. So in our example, where the prior was 1 in 100, you would estimate that the final answer should be around 1 in 10, which is in fact slightly above the true correct answer. Based on this rule of thumb, if I asked you what would happen if the prior from our example were instead 1 in 1,000, you could quickly estimate that the effect of the test should be to update those chances to around 1 in 100.

In fact, take a moment to check yourself by thinking through a sample population. In this case you might picture 10,000 patients, where only 10 of them really have cancer. Based on that 90% sensitivity, we would expect 9 of those cancer cases to give true positives. On the other side, a 91% specificity means that 9% of those without cancer get false positives, so we'd expect 9% of the remaining 9,990 patients, which is around 900, to give false positive results. Here, with such a low prevalence, the false positives really do dominate the true positives, so the probability that a randomly chosen positive case from this population actually has cancer is only around 1%, just as the rule of thumb predicted.
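In code, the rule of thumb is a one-liner (a sketch with my own names; it is only trustworthy when the prior, and hence the product, stays small):

```python
bayes_factor = 0.90 / 0.09  # sensitivity / false positive rate, i.e. 10

def rough_posterior(prior):
    # Rule of thumb: multiply a small prior by the Bayes factor.
    return prior * bayes_factor

print(rough_posterior(1 / 100))   # ~0.1,  vs. the exact ~0.092
print(rough_posterior(1 / 1000))  # ~0.01, vs. the exact ~0.0099
```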
Now, this rule of thumb clearly cannot work for higher priors. For example, it would predict that a prior of 10% gets updated all the way to 100% certainty, but that can't be right. In fact, take a moment to think through what the answer should be, again using a sample population. Maybe this time we picture 10 out of 100 having cancer. Based on the 90% sensitivity of the test, we'd expect 9 of those cancer cases to get positive results. But what about the false positives, how many do we expect there? About 9% of the remaining 90, or about 8. So upon seeing a positive test result, it tells you that you're either one of the 9 true positives or one of the 8 false positives. This means the chances are a little over 50%: roughly 9 out of 17, or 53%.

At this point, having dared to dream that Bayesian updating could look as simple as multiplication, you might tear down your hopes and pragmatically acknowledge that sometimes life is just more complicated than that.

Except, it's not. This rule of thumb turns into a precise mathematical fact as long as we shift away from talking about probabilities and instead talk about odds. If you've ever heard someone describe the chances of an event as "1-to-1" or "2-to-1", things like that, you already know about odds. With probability, we take the ratio of the number of positive cases out of all possible cases: things like 1 in 5, or 1 in 10. With odds, you take the ratio of all positive cases to all negative cases. You commonly see odds written with a colon to emphasize the distinction, but it's still just a fraction, just a number. So an event with a 50% probability would be described as having 1-to-1 odds. A 10% probability is the same as 1-to-9 odds. An 80% probability is the same as 4-to-1 odds; you get the point. It's the same information, still describing the chances of a random event, just presented a little differently, like a different unit system. Probabilities are constrained between 0 and 1, with even chances sitting at 0.5, but odds range from 0 up to infinity, with even chances sitting at the number 1.

The beauty here is that a completely accurate, not-even-approximating-things way to frame Bayes rule is to say: express your prior using odds, then just multiply by the Bayes factor. Think about what the prior odds are really saying: it's the number of people with cancer divided by the number without it. Let's write that down as a normal fraction for a moment so we can multiply it. When you filter down to just those with positive test results, the number of people with cancer gets scaled down by the probability of seeing a positive test result given that someone has cancer. Similarly, the number of people without cancer gets scaled down, this time by the probability of seeing a positive test result given that someone doesn't have cancer. So the ratio between these two counts, the new odds upon seeing the test, looks just like the prior odds, except multiplied by this second term, which is exactly the Bayes factor.
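Here is that exact odds-based update as a minimal Python sketch (the conversion helpers and their names are my own):

```python
def prob_to_odds(p):
    return p / (1 - p)        # e.g. a 10% probability -> odds of 1/9

def odds_to_prob(odds):
    return odds / (1 + odds)  # e.g. odds of 10/9 -> probability 10/19

def update(prior_odds, bayes_factor):
    # Bayes rule in odds form: exact, not a rule of thumb.
    return prior_odds * bayes_factor

bf = 0.90 / 0.09                            # the Bayes factor of 10
posterior = update(prob_to_odds(0.10), bf)  # (1/9) * 10 = 10/9
print(odds_to_prob(posterior))              # ~0.526, the 53% from before
```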
Look back at our example where the Bayes factor was 10. As a reminder, this came from the 90% sensitivity divided by the 9% false positive rate: how much more likely are you to see a positive result with cancer versus without? If the prior is 1%, expressed as odds this looks like 1-to-99. So by our rule, this gets updated to 10-to-99, which, if you want, you could convert back to a probability: 10 divided by (10 + 99), or about 1 in 11.

If instead the prior were 10%, which was the example that tripped up our rule of thumb earlier, expressed as odds this looks like 1-to-9. By our simple rule, this gets updated to 10-to-9, which you can already read off pretty intuitively: it's a little above even chances, a little above 1-to-1. If you prefer, you can convert it back to a probability; you would write it as 10 out of 19, or about 53%. And indeed, that is what we already found by thinking things through with a sample population.

Let's say we go back to the 1% prevalence, but I make the test more accurate. Now, what if I told you to imagine that the false positive rate was only 1% instead of 9%? What that would mean is that our Bayes factor is 90 instead of 10; the test is doing more work for us. In this case, with the more accurate test, the prior gets updated to 90-to-99, which is a little less than even chances, something a little under 50%. To be more precise, you could make the conversion back to probability and work out that it's around 48%, but honestly, if you're just going for a gut feel, it's fine to stick with the odds.

Do you see what I mean about how just defining this number helps to combat potential misconceptions? For anybody who's a little hasty in connecting test accuracy directly to your probability of having a disease, it's worth emphasizing that you could administer the same test, with the same accuracy, to multiple different patients who all get the exact same result, but if they're coming from different contexts, that result can mean wildly different things. However, the one thing that does stay constant in every case is the factor by which each patient's prior odds get updated.

By the way, this whole time we've been using the prevalence of the disease, the proportion of people in a population who have it, as a substitute for the prior, the probability of having it before you see a test. That's not necessarily the case, however. If there are other known factors, things like symptoms, or, in the case of a contagious disease, things like known contacts, those also factor into the prior, and they could potentially make a huge difference.

As another side note: so far we've only talked about positive test results, but far more often you would be seeing a negative test result. The logic there is completely the same, but the Bayes factor you compute will look different. Instead, you look at the probability of seeing this negative test result with the disease versus without the disease. In our cancer example, this would have been the 10% false negative rate divided by the 91% specificity, or about 1 in 9. In other words, seeing a negative test result in that example would reduce your prior odds by about an order of magnitude.

When you write it all out as a formula, here's how it looks. It says that your odds of having a disease given a test result equal your odds before taking the test, the prior odds, times the Bayes factor.
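In symbols (a reconstruction, in my own notation, of the formula just described):

$$
\underbrace{O(\text{disease} \mid +)}_{\text{posterior odds}}
= \underbrace{O(\text{disease})}_{\text{prior odds}}
\times
\underbrace{\frac{P(+ \mid \text{disease})}{P(+ \mid \text{no disease})}}_{\text{Bayes factor}}
$$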
Now let's contrast this with the usual way that Bayes rule is written, which is a bit more complicated.

In case you haven't seen it before, it's essentially just what we were doing with sample populations, but wrapped up symbolically. Remember how every time we were counting the number of true positives and then dividing it by the sum of the true positives and the false positives? We do just that, except instead of talking about absolute counts, we express each term as a proportion. So the proportion of true positives in the population comes from the prior probability of having the disease multiplied by the probability of seeing a positive test result in that case. We copy that term down again into the denominator, and then the proportion of false positives comes from the prior probability of not having the disease times the probability of a positive test in that case. If you want, you could also write this down with words instead of symbols, if terms like sensitivity and false positive rate are more comfortable.

This is one of those formulas where, once you say it out loud, it seems like a bit much, but it really is no different from what we were doing with sample populations. If you wanted to make the whole thing look simpler, you often see this entire denominator written just as the probability of seeing a positive test result overall. While that does make for a really elegant little expression, if you intend to use it for calculations it's a little disingenuous, because in practice, every single time you do this, you need to break that denominator down into its two separate cases.

So, taking this more honest representation, let's compare our two versions of Bayes rule. And again, maybe it looks nicer if we use the words sensitivity and false positive rate; if nothing else, it helps emphasize which parts of the formula come from statistics about the test's accuracy. This actually highlights one thing I really like about the framing with odds and a Bayes factor, which is that it cleanly factors out the part that has to do with the prior from the parts that have to do with the test's accuracy. In the usual formula, all of those are intermingled. This has a very practical benefit: it's really nice if you want to swap out different priors and easily see their effects, which is what we were doing earlier. With the other formula, you would have to recompute everything each time; you can't leverage a pre-computed Bayes factor the same way.

The odds framing also makes things really nice if you want to do multiple different Bayesian updates based on multiple pieces of evidence. For example, let's say you took not one test but two, or you wanted to think about how the presence of symptoms plays into it. For each new piece of evidence you see, you always ask the question: how much more likely would you be to see that with the disease versus without the disease? Each answer to that question gives you a new Bayes factor, a new thing you multiply by your odds.
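For instance, here is a sketch of chaining two positive results (my own numbers; it also assumes the two tests are independent given the disease status, which is an idealization, not something claimed in the video):

```python
prior_odds = 1 / 99               # 1% prevalence, written as odds

bf_first_positive = 0.90 / 0.09   # a positive on our test: a factor of 10
bf_second_positive = 0.90 / 0.09  # a second, independent positive: another 10

posterior_odds = prior_odds * bf_first_positive * bf_second_positive
print(posterior_odds)                         # ~1.01, just over 1-to-1 odds
print(posterior_odds / (1 + posterior_odds))  # ~0.503, just over even chances
```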
Beyond just making calculations easier, there's something I really like about attaching a number to test accuracy that doesn't even look like a probability. If you hear that a test has, for example, a 9% false positive rate, that's just such a disastrously ambiguous phrase! It's so easy to misinterpret it to mean there's a 9% chance that your positive test result is false. But imagine if instead the number we heard tacked onto test results was that the Bayes factor for a positive result is, say, 10. There's no room to confuse that with your probability of having a disease. The entire framing of what a Bayes factor is, is that it's something that acts on a prior; it forces your hand to acknowledge the prior as something entirely separate, and highly necessary to drawing any conclusion.

All that said, the usual formula is definitely not without its merits. If you view it not simply as something to plug numbers into, but as an encapsulation of the sample-population idea that we've been using throughout, you could very easily argue that it's actually much better for your intuition. After all, it's what we were routinely falling back on in order to check that the Bayes factor computation even made sense in the first place.

Like any design decision, there is no clear-cut objective best here. But it's almost certainly the case that giving serious consideration to that question will lead you to a better understanding of Bayes rule.

Also, since we're on the topic of kind-of-paradoxical things: a friend of mine, Matt Cook, recently wrote a book all about paradoxes. I actually contributed a small chapter to it, with thoughts on the question of whether math is invented or discovered, and the book as a whole is a really nice collection of thought-provoking paradoxical things, ranging from philosophy to math and physics. You can of course find all the details in the description.
Info
Channel: 3Blue1Brown
Views: 555,806
Rating: 4.9723363 out of 5
Keywords: Mathematics, three blue one brown, 3 blue 1 brown, 3b1b, 3brown1blue, 3 brown 1 blue, three brown one blue
Id: lG4VkPoG3ko
Length: 21min 13sec (1273 seconds)
Published: Tue Dec 22 2020