Some of you may have heard this paradoxical
fact about medical tests. It's very commonly used to introduce the topic of Bayes rule in
probability. The paradox is that you could take a test which is highly accurate, in the sense
that it gives correct results to a large majority of the people taking it, and yet under the right
circumstances when assessing the probability that your particular test result is correct, you can
still land on a very low number. Arbitrarily low in fact. In short, an accurate test is
not necessarily a very predictive test. When people think about math and formulas they
don't often think of it as a design process. I mean maybe in the case of notation it's easy
to see that different choices are possible, but when it comes to the structure of the
formulas themselves and how we use them that's something that people
typically view as fixed. In this video you and I will dig into this
paradox, but instead of using it to talk about the usual version of Bayes rule, I'd like to motivate
an alternate version, an alternate design choice. Now what's up on the screen is a little
bit abstract, which makes it difficult to justify that there really is a substantive
difference here, especially when I haven't explained either one yet. To see what I'm
talking about though, we should really start by spending some time thinking a little more concretely
and just laying out what exactly this paradox is. Picture one thousand women and suppose that one
percent of them have breast cancer. And let's say they all undergo a certain breast cancer
screening and that nine of those with cancer correctly get positive results and there's one
false negative. And then suppose that among the remainder without cancer 89 get false positives
and 901 correctly get negative results. So if all you know about a woman is that she does
the screening and she gets a positive result, you don't have information about symptoms or
anything like that, you know that she's either one of these 9 true positives or one of these 89 false
positives. So the probability that she's in the cancer group given the test result is 9 divided
by (9 + 89) which is approximately 1 in 11. In medical parlance you would call this the
"Positive Predictive Value" of the test, or PPV. The number of true positives divided by
the total number of positive test results. You can see where the name comes from: to
what extent does a positive test result actually predict that you have the disease? Now hopefully, as I've presented it this
way where we're thinking concretely about a sample population, all of this makes
perfect sense. But where it comes across as counterintuitive is if you just
look at the accuracy of the test, present it to people as a statistic, and then ask
them to make judgments about their test result. Test accuracy is not actually one number but
two. First you ask how often the test is correct for those with the disease; this is known as
the test sensitivity. As in, how sensitive is it at detecting the presence of the disease? In
our example test sensitivity is 9 in 10 or 90%. Another way to say the same fact would
be to say the false negative rate is 10%. And then a separate not-necessarily-related
number is how often it's correct for those without the disease, which is known as the
test specificity. As in, are positive results caused specifically by the disease or are there
confounding triggers giving false positives? In our example the specificity is about 91%. Or
another way to say the same fact would be to say the false positive rate is 9%. So the paradox
here is that in one sense the test is over 90% accurate, it gives correct results to
over 90% of the patients who take it. And yet if you learn that someone gets a
positive result without any added information, there's actually only a 1 in 11 chance
that that particular result is accurate.
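Just to ground that arithmetic, here is a minimal Python sketch of the sample-population tally (the variable names are my own, purely for illustration):

```python
# A minimal tally of the sample population described above.
population = 1000
with_cancer = population // 100              # 1% prevalence -> 10 women
true_positives = round(0.9 * with_cancer)    # 90% sensitivity -> 9 true positives
false_positives = 89                         # ~9% of the 990 without cancer

ppv = true_positives / (true_positives + false_positives)
print(ppv)                                   # ~0.092, i.e. about 1 in 11
```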
This is a bit of a problem, because of all the places for math to be counterintuitive, medical tests are one area where it matters a lot.
In 2006 and 2007 the psychologist Gerd Gigerenzer gave a series of statistics seminars to practicing
gynecologists, and he opened with the following example: A 50-year-old woman, no symptoms,
participates in a routine mammography screening. She tests positive, is alarmed and wants to know
from you whether she has breast cancer for certain or what her chances are. Apart from the screening
result you know nothing else about this woman. In that seminar the doctors were then told
that the prevalence of breast cancer for women of this age is about 1%, and then to
suppose that the test sensitivity is 90% and that its specificity is 91%. You might
notice these are exactly the same numbers from the example that you and I just
looked at, this is where I got them, so having already thought it through you and
I know the answer: it's about 1 in 11. However the doctors in this session were not primed
with the suggestion to picture a concrete sample of one thousand individuals the way that
you and I had. All they saw were these numbers. They were then asked: "How many women who
test positive actually have breast cancer? What is the best answer?", and they
were presented with these four choices. In one of the sessions over half the doctors
present said that the correct answer was 9 in 10, which is way off. Only a fifth
of them gave the correct answer, which is worse than what it would have
been if everybody had randomly guessed! It might seem a little extreme to be calling
this a paradox. I mean it's just a fact, it's not something intrinsically self-contradictory.
But as these seminars with Gigerenzer show, people (including doctors) definitely find it
counterintuitive that a test with high accuracy can give you such a low predictive value. We might call this a "veridical paradox", which
refers to facts that are provably true but which nevertheless can feel false when phrased a certain
way. It's sort of the softest form of a paradox, saying more about human psychology than about
logic. The question is how we can combat this. Where we're going with this by the way is that I
want you to be able to look at numbers like this and quickly estimate in your head that it means
the predictive value of a positive test should be around 1 in 11. Or if I changed things and
asked what if it was 10% of the population who had breast cancer, you should be able to quickly turn
around and say that the final answer would be a little over 50%. Or if I said imagine a really low
prevalence, something like 0.1% of patients having cancer, you should again quickly estimate that the
predictive value of the test is around 1 in 100, that is, 1 in 100 of those with positive test
results in that case would have cancer. Or let's say we go back to the 1% prevalence but I make the test more accurate: I tell
you to imagine the specificity is 99%. There you should be able to relatively quickly
estimate that the answer is a little less than 50%. The hope is that you're doing all of
this with minimal calculations in your head.
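If you first want to verify those target answers the long way, here is a small Python sketch (the helper `exact_ppv` is my own naming, not a standard function) that runs the exact computation for each variation:

```python
# Exact predictive values for each variation mentioned above.
# `exact_ppv` is an illustrative helper, not anything official.

def exact_ppv(prior, sensitivity, false_positive_rate):
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)

print(exact_ppv(0.01,  0.9, 0.09))   # ~0.092: about 1 in 11
print(exact_ppv(0.10,  0.9, 0.09))   # ~0.526: a little over 50%
print(exact_ppv(0.001, 0.9, 0.09))   # ~0.010: about 1 in 100
print(exact_ppv(0.01,  0.9, 0.01))   # ~0.476: a little under 50%
```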
Now the goals of quick calculations might feel very different from the goals of addressing
but they actually go hand-in-hand. Let me show you what I mean. On the side of addressing
misconceptions, what would you tell the people in that seminar who answered 9 in 10?
What fundamental misconception are they revealing? What I might tell them is that in much the
same way that you shouldn't think of tests as telling you deterministically
whether you have a disease, you shouldn't even think of them as telling
you your chances of having a disease. Instead, the healthy view of what tests
do is that they *update* your chances. In our example before taking the test a
patient's chances of having cancer were 1 in 100. In Bayesian terms we call this the "prior
probability". The effect of this test was to update that prior by almost an order of
magnitude, up to around 1 in 11. The accuracy of a test is telling us about the strength of this
updating, it's not telling us a final answer. What does this have to do with quick
approximations? Well a key number for those approximations is something called the
Bayes factor, and the very act of defining this number serves to reinforce this central
lesson about reframing what it is the tests do. You see, one of the things that makes test
statistics so very confusing is that there are at least four numbers that you'll hear associated
with them. For those with the disease there's the sensitivity and the false negative rate, and then
for those without there's the specificity and the false positive rate. And none of these numbers
actually tell you the thing you want to know! Luckily if you want to interpret a positive test
result you can pull out just one number to focus on from all this. Take the sensitivity divided
by the false positive rate. In other words how much more likely are you to see the positive test
result with cancer versus without. In our example this number is 10. This is the Bayes factor,
also sometimes called the likelihood ratio. A very handy rule of thumb is
that to update a small prior, or at least to approximate the answer, you simply
multiply it by the Bayes factor. So in our example where the prior was 1 in 100, you would estimate
that the final answer should be around 1 in 10, which is in fact slightly above the true correct
answer. So based on this rule of thumb, if I asked you what would happen if the prior from our
example was instead 1 in 1,000, you could quickly estimate that the effect of the test should
be to update those chances to around 1 in 100. And in fact take a moment to check yourself
by thinking through a sample population. In this case you might picture 10,000 patients
where only 10 of them really have cancer. Then based on that 90% sensitivity we would expect
9 of those cancer cases to give true positives. And on the other side, a 91% specificity means
that 9% of those without cancer are getting false positives, so we'd expect nine percent
of the remaining patients, which is around 900, to give false positive results. Here, with such
a low prevalence, the false positives really do dominate the true positives, so the probability
that a randomly chosen positive case from this population actually has cancer is only around one
percent, just like the rule of thumb predicted.
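As a quick numerical check, here's a sketch comparing the rule of thumb against the exact count (again, the names are just illustrative):

```python
# Rule of thumb vs. an exact sample-population count, for a 1-in-1,000 prior.
bayes_factor = 0.90 / 0.09                 # sensitivity / false positive rate = 10

prior = 1 / 1000
print(prior * bayes_factor)                # rule of thumb: 0.01, about 1 in 100

true_pos = 10 * 0.90                       # 9 of the 10 with cancer (in 10,000 people)
false_pos = 9990 * 0.09                    # ~899 of the 9,990 without
print(true_pos / (true_pos + false_pos))   # ~0.0099, just under 1 in 100
```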
Now, this rule of thumb clearly cannot work for higher priors. For example, it would predict that a prior of 10% gets updated all the way
to 100% certainty, but that can't be right. In fact take a moment to think through what the
answer should be again using a sample population. Maybe this time we picture 10 out of 100 having
cancer again. Based on the 90% sensitivity of the test we'd expect 9 of those cancer cases to
get positive results, but what about the false positives? How many do we expect there? About 9
of the remaining 90, or about 8. So upon seeing a positive test result it tells you that you're
either one of these 9 true positives or one of the 8 false positives. So this means the chances are
a little over 50%, roughly 9 out of 17, or 53%. At this point, having dared to dream that Bayesian
updating could look as simple as multiplication, you might resign yourself
and pragmatically acknowledge that sometimes life is just more complicated than that. Except, it's not. This rule of thumb
turns into a precise mathematical fact as long as we shift away from talking about
probabilities to instead talking about odds. If you've ever heard someone talk about the
chances of an event being "1-to-1" or "2-to-1", things like that, you already know about odds.
With probability we're taking the ratio of the number of positive cases out of all possible
cases, right? Things like "1 in 5" or "1 in 10". With odds what you do is take the ratio of all
positive cases to all negative cases. You commonly see odds written with a colon to emphasize the
distinction, but it's still just a fraction, just a number. So an event with a 50% probability
would be described as having one-to-one odds. A 10% probability is the same as 1-to-9 odds.
An 80% probability is the same as 4-to-1 odds, you get the point. It's the same information, it
still describes the chances of a random event, but is presented a little differently,
like a different unit system. Probabilities are constrained between 0
and 1, with even chances sitting at 0.5, but odds range from 0 up to infinity with
even chances sitting at the number 1.
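If it helps, the conversion between the two unit systems is completely mechanical; a tiny sketch, with helper names of my own choosing:

```python
# Converting between probability and odds. Helper names are illustrative.

def prob_to_odds(p):
    return p / (1 - p)        # positive cases divided by negative cases

def odds_to_prob(o):
    return o / (1 + o)

print(prob_to_odds(0.5))      # 1.0    -> 1-to-1 odds
print(prob_to_odds(0.1))      # ~0.111 -> 1-to-9 odds
print(prob_to_odds(0.8))      # 4.0    -> 4-to-1 odds
print(odds_to_prob(4.0))      # 0.8
```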
The beauty here is that a completely-accurate, not-even-approximating-things way to frame Bayes rule is to say: express your
prior using odds, then just multiply by the Bayes factor. Think about what the prior odds are really
saying, it's the number of people with cancer divided by the number without it. Here, let's just
write that down as a normal fraction for a moment so we can multiply it. When you filter down just
to those with positive test results, the number of people with cancer gets scaled down by
the probability of seeing a positive test result given that someone has cancer, and then similarly
the number of people without cancer also gets scaled down, this time by the probability of
seeing a positive test result in that case. So the ratio between these two counts, the
new odds upon seeing the test, looks just like the prior odds, except multiplied by this
term here, which is exactly the Bayes factor.
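If you'd like to see that argument compressed into symbols (the notation here is mine, with $D$ for having the disease and $+$ for a positive result), it reads:

$$
O(D \mid +) \;=\; \frac{P(D \mid +)}{P(\neg D \mid +)} \;=\; \frac{P(D)\,P(+ \mid D)}{P(\neg D)\,P(+ \mid \neg D)} \;=\; O(D) \cdot \underbrace{\frac{P(+ \mid D)}{P(+ \mid \neg D)}}_{\text{Bayes factor}}
$$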
Look back at our example where the Bayes factor was 10. And as a reminder, this came from the 90% sensitivity divided by the 9% false positive rate;
how much more likely are you to see a positive result with cancer versus without. If the prior
is 1%, expressed as odds this looks like 1-to-99. So by our rule this gets updated to
10-to-99, which if you want you could convert back to a probability. It would be
10 divided by (10 + 99), or about 1 in 11. If instead the prior was 10%, which was the
example that tripped up our rule of thumb earlier, expressed as odds this looks like 1-to-9. By
our simple rule this gets updated to 10-to-9, which you can already read off pretty
intuitively. It's a little above even chances, a little above 1-to-1. If you prefer you
can convert it back to a probability, you would write it as 10 out of 19, or
about 53%. And indeed that is what we already found by thinking things
through with a sample population.
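In code, the whole update really is a single multiplication; a minimal sketch, assuming the same test statistics as before (the helper name is my own):

```python
# Bayes rule in odds form: posterior odds = prior odds * Bayes factor.

def update_odds(prior_odds, bayes_factor):
    return prior_odds * bayes_factor

bf = 0.90 / 0.09                      # = 10 for our test

print(update_odds(1 / 99, bf))        # 1-to-99 prior -> ~0.101, i.e. 10-to-99
print(update_odds(1 / 9, bf))         # 1-to-9 prior  -> ~1.11,  i.e. 10-to-9

posterior = update_odds(1 / 9, bf)
print(posterior / (1 + posterior))    # back to a probability: ~0.53
```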
Let's say we go back to the 1% prevalence, but I make the test more accurate. Now what if I told you to imagine that the
false positive rate was only 1% instead of 9%. What that would mean is that our
Bayes factor is 90 instead of 10, the test is doing more work for us. In this
case, with the more accurate test it gets updated to 90-to-99, which is a little less than
even chances, something a little under 50%. To be more precise you could make the conversion back
to probability and work out that it's around 48%, but honestly if you're just going for a
gut feel, it's fine to stick with the odds. Do you see what I mean about how just
defining this number helps to combat potential misconceptions? For anybody who's a
little hasty in connecting test accuracy directly to your probability of having a disease, it's
worth emphasizing that you could administer the same test with the same accuracy
to multiple different patients who all get the same exact result, but if
they're coming from different contexts, that result can mean wildly different
things. However the one thing that does stay constant in every case is the factor by
which each patient's prior odds get updated. And by the way this whole time we've been using
the prevalence of the disease, which is the proportion of people in a population who have it,
as a substitute for the prior, the probability of having the disease before you see a test result. However, prevalence
isn't necessarily the right prior! If there are other known factors, things like symptoms, or in the case of
a contagious disease things like known contacts, those also factor into the prior, and they
could potentially make a huge difference. As another side note, so far we've only
talked about positive test results, but way more often you would be seeing
a negative test result. The logic there is completely the same but the Bayes factor
that you compute is going to look different. Instead you look at the probability of seeing
this negative test result with the disease versus without the disease. So in our cancer example
this would have been the 10% false negative rate divided by the 91% specificity, or about
1 in 9. In other words seeing a negative test result in that example would reduce your
prior odds by about an order of magnitude.
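A quick sketch of that negative-result computation, using our example's numbers:

```python
# Bayes factor for a negative result:
# P(negative | disease) / P(negative | no disease).
bf_negative = 0.10 / 0.91             # false negative rate / specificity
print(bf_negative)                    # ~0.11, about 1 in 9

# A 1% prior (1-to-99 odds) drops to roughly 1-to-900 odds:
print((1 / 99) * bf_negative)         # ~0.0011
```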
When you write it all out as a formula, here's how it looks. It says your odds of having a disease given a test result equals your odds
before taking the test, the prior odds, times the Bayes factor. Now let's contrast this
with the usual way that Bayes rule is written, which is a bit more complicated. In case you haven't seen it before it's
essentially just what we were doing with sample populations, but you wrap it all up symbolically.
Remember how every time we were counting the number of true positives and then dividing it
by the sum of the true positives and the false positives? We do just that, except instead of
talking about absolute amounts we talk of each term as a proportion. So the proportion of true
positives in the population comes from the prior probability of having the disease multiplied
by the probability of seeing a positive test result in that case, and then we copy that term
down again into the denominator and then the proportion of false positives comes from the prior
probability of not having the disease times the probability of a positive test in that case.
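In symbols (again, my notation choices), the whole thing reads:

$$
P(D \mid +) \;=\; \frac{P(D)\,P(+ \mid D)}{P(D)\,P(+ \mid D) \;+\; P(\neg D)\,P(+ \mid \neg D)}
$$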
If you want you could also write this down with words instead of symbols, if terms like sensitivity
and false positive rate are more comfortable. This is one of those formulas where once you
say it out loud it seems like a bit much, but it really is no different from what
we were doing with sample populations. If you wanted to make the whole thing
look simpler you often see this entire denominator written just as the probability of
seeing a positive test result overall. While that does make for a really elegant little expression,
if you intend to use this for calculations, it's a little disingenuous because in
practice every single time you do this you need to break down that denominator into
two separate parts, breaking down the cases. So taking this more honest representation of it,
let's compare our two versions of Bayes rule. And again maybe it looks nicer if we use the
words sensitivity and false positive rate; if nothing else it helps emphasize which parts
of the formula are coming from statistics about the test accuracy. I mean this actually
emphasizes one thing I really like about the
which is that it cleanly factors out the parts that have to do with the prior and the
parts that have to do with the test accuracy. But over in the usual formula all of those are
thoroughly intermingled. And this has a very practical benefit: it's really nice if you want
to swap out different priors and easily see their effects. This is what we were doing earlier.
But with the other formula, to do that you have to recompute everything each time, you can't
leverage a pre-computed Bayes factor the same way. The odds framing also makes things really
nice if you want to do multiple different Bayesian updates based on multiple pieces of
evidence. For example, let's say you took not one test but two. Or you wanted to think about
how the presence of symptoms plays into it. For each piece of new evidence you see you
always ask the question: How much more likely would you be to see that with the disease
versus without the disease? Each answer to that question gives you a new Bayes factor,
a new thing that you multiply by your odds.
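As a sketch of that chaining, with an admittedly made-up Bayes factor for the symptom, and assuming the pieces of evidence are independent given disease status:

```python
# Chaining updates: one Bayes factor per piece of evidence, all multiplied in.
# The second test and the symptom's Bayes factor are hypothetical numbers.

prior_odds = 1 / 99                   # 1% prevalence

bf_test = 0.90 / 0.09                 # our test: Bayes factor 10
bf_second_test = 0.90 / 0.09          # hypothetical second, similar test
bf_symptom = 3.0                      # hypothetical: symptom 3x likelier with disease

posterior_odds = prior_odds * bf_test * bf_second_test * bf_symptom
print(posterior_odds)                 # ~3.03, i.e. roughly 3-to-1 odds
print(posterior_odds / (1 + posterior_odds))   # ~0.75 as a probability
```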
Beyond just making calculations easier, there's something I really like about attaching a number to test accuracy that doesn't even look like a
probability. I mean, if you hear that a test has, for example, a 9% false positive rate, that's
just such a disastrously ambiguous phrase! It's so easy to misinterpret it to mean there's a
9% chance that your positive test result is false. But imagine if instead the number that
we heard tacked on to test results was that the Bayes factor for a positive test
result is, say, 10. There's no room to confuse that for your probability of having a disease,
the entire framing of a Bayes factor is that it's something that acts on a prior;
it forces you to acknowledge the prior as something entirely separate and
highly necessary to drawing any conclusion. All that said, the usual formula is definitely
not without its merits. If you view it not simply as something to plug numbers into, but as an
encapsulation of the sample population idea that we've been using throughout, you could
very easily argue that that's actually much better for your intuition. After all, it's
what we were routinely falling back on in order to check ourselves that the Bayes factor
computation even made sense in the first place. Like any design decision there is
no clear-cut objective best here. But it's almost certainly the case that
giving serious consideration to that question will lead you to a better
understanding of Bayes rule. Also, since we're on the topic
of kind of paradoxical things, a friend of mine Matt Cook recently wrote a book
all about paradoxes. I actually contributed a small chapter to it with thoughts on the question
of whether math is invented or discovered, and the book as a whole is this really nice collection
of thought-provoking paradoxical things ranging from philosophy to math and physics. You can of
course find all the details in the description.