Bayes' Theorem Example: Surprising False Positives

Video Statistics and Information

Captions
Suppose you go to the doctor and get some routine medical tests done. There are a couple of different ways that a particular test could be inaccurate. One is what we call a false positive, where the test tells you that you're positive for whatever is being tested, whatever disease it might be, but you're not actually positive; the test was incorrect to say that you were. We also have false negatives, where the test says you don't have the disease but unfortunately you actually do.

So what I want to investigate is this: if you know the false positive rate, how confident can you be in the test results you get? For example, imagine a scenario with a five percent false positive rate. That means that of all the situations where you don't have the disease, five percent of the time the test tells you that you do. So my question to you is this: suppose you've taken the test, you've gotten a positive result, and you know the test has this five percent false positive rate. What are the chances that you've in fact got the illness?

Now, you might be thinking: five percent of the time I don't have the illness, so ninety-five percent of the time I do. That is a reasonable first guess, but I want to ask you this: does it matter how rare the underlying disease is? For example, if one out of a hundred people have it, does that change the answer compared with one out of ten thousand, one out of a million, or one out of three? That's something to think about. Something else to think about: I've got a false positive rate, but does it matter whether I also know the false negative rate? In this case, maybe I'll say there's a ten percent false negative rate. You can start to see that these things might matter. For instance, of all the times where you actually have the illness, if you've got a very high false negative rate, the test might not always reveal that with a positive result, so knowing you've got a positive test might hide a few cases.

To look at this, I want to imagine a specific example with a hundred people. So here we put up a hundred different people. Remember I said 1%, so one out of a hundred people is going to actually have the disease. Here we have this one person, and then there are 99 remaining. If all of those 99 people were to go and take the test, none of them actually have the disease, but five percent of the time the test gives a false positive. Five percent of 99 is approximately five, so there's going to be this other group of five people who also test positive. So what we see when we count them up is that we have this one person who actually is positive, and 90% of the time the test would show they were positive, and these five other people who are not positive but test positive because of this false positive problem. So really there are about six people here who test positive, and only one out of the six actually has the disease. We can say there's approximately a one-in-six chance that you actually have this particular disease.

However, this is not exactly accurate. It's not exactly one sixth, the 16.67% that one sixth represents, because first of all I said five percent, but there are only 99 remaining people, so five percent of them is a little bit off from five. And then this red dot, the one person who is sick, only shows up as positive 90% of the times they take the test. So there are little things around the edges that you've got to deal with, but it's approximately a one-in-six chance. By the way, one in six is way, way lower than you were probably thinking, because you were initially guessing a ninety-five percent chance, since a five percent failure rate suggests a 95 percent chance. But here I'm telling you it's only about 16.67%, far, far lower, and that has to do with how rare the disease is in the general population.

Now, the way we're going to formally compute this exactly is with Bayes' theorem. Bayes' theorem is a way to relate different conditional probabilities. When I write something like "the probability of A given B", written P(A | B) with a vertical line, this tells me the chance that A occurs if I know that B occurs. What's powerful about Bayes' theorem is that even if you don't know that, you often know it the other way around: you often do know the probability that B occurs if A occurs. Bayes' theorem lets you relate these two conditional probabilities:

P(A | B) = P(B | A) P(A) / P(B)

The way I like to think about it is that what we have already is one of the two conditional probabilities, in this case the one on the right, the probability of B given A, but what we want to get is the other way around, the other conditional probability.

So what are A and B in our scenario? We're trying to talk about the probability that you have the disease given that you've tested positive for it. So in this case A is that we have the disease, and B is that we test positive for the disease. We want to put the numbers we've seen previously into this formula, but I have to do one little piece of work first.

Let's look just at the numerator. The numerator is the probability that you test positive given that you have the disease, times the probability that you have the disease. That's the same thing as the probability that you have the disease and you test positive for it. But what about the denominator? The denominator, the probability that you test positive, splits into two cases. The first is that you actually have the disease and you test positive for it, which is exactly what the numerator is. The other possibility is that you don't have the disease, you get a false positive, and so you do test positive. That's another way you can test positive: you don't have the disease, but the test says you do. So for this denominator, P(B), I don't have an immediate answer, but I know it's the sum of these two cases; one case is just the numerator, and the other we'll figure out.

So let's expand the denominator, keeping everything else exactly the same, and we get this formula:

P(A | B) = P(B | A) P(A) / [ P(B | A) P(A) + P(B | ~A) P(~A) ]

On the left-hand side of the denominator we have the same term as the numerator, testing positive and having the disease: the probability of B given A times the probability of A. On the right-hand side we have the probability that you test positive even though you don't have the disease. The squiggly sign means "not", so ~A says that you don't have the disease, and P(B | ~A) is the probability that you test positive given that you don't have the disease; it gets multiplied by P(~A), the probability that you don't have the disease.

So now we have a formula. This is Bayes' formula in the case of two disjoint cases, you either have the disease or you don't, and it separates the denominator into these two buckets, if you will. You can see my previous video for a different example with multiple buckets.

Now, if we remember all the way back to the beginning, where I laid out the statistics for my hypothetical test, we said there is a 10% false negative rate, which means that 90% of the time that you actually have the disease, the test says yes, you have the disease. So the probability of B given A, the probability that you test positive given that you have the disease, is 90%, and I'm going to put in 0.9. For the probability that you have the disease, we said one out of a hundred people actually have it, so this is 0.01, one percent. Down on the bottom, the left-hand term is exactly the same as the numerator, so I can just plug in the same numbers. The probability that you test positive given that you don't have the disease was our five percent false positive rate, so I put in 0.05 for that. And finally, the probability that you don't have the disease: if one percent of people have the disease, ninety-nine percent do not, so I put in 0.99:

P(A | B) = (0.9)(0.01) / [ (0.9)(0.01) + (0.05)(0.99) ] = 0.009 / 0.0585 ≈ 0.154

Computing this out, I get about 15.4%, just a little bit lower than the one sixth we reasoned our way to at the beginning. This is a really small number, or maybe, since nobody wants a disease, it's a really good thing that it's a small number. If you test positive for a rare disease, it doesn't actually mean you're that likely to have it, and you might want to do a follow-up test to confirm it.

So what if we do that? We've done one test, it says we're positive, but we only think there's a 15.4% chance that we actually have the disease. So let's do the whole thing again. Now I'm using the same Bayes formula as before; nothing has changed in the actual formula, except that what I'm calling B has changed. B is now that I test positive twice in a row: I test positive once, then I do it again and test positive a second time. This changes some of the numbers, but some of them stay the same: the probability of A and the probability of not A don't change, so I can put in the same 1% and 99% they were before.

But what about the probability that you test positive twice in a row given that you have the disease? The first time you take the test, you have a 90% chance of testing positive if you have the disease. If I then do it a whole second time, it's 90 percent of that 90 percent, and since 0.9 times 0.9 is 0.81, this gives me 81 percent. So if I have the disease, 81 percent of the time the test will tell me twice in a row that yes, I have the disease, and I put in 0.81. The final thing to figure out is the probability that I test positive twice if I don't have the disease. If you don't have the disease, there's only a 5% chance that you'll get a false positive the first time, and 5% the second time, so 5% times 5% is 0.0025. Computing it out:

P(A | B) = (0.81)(0.01) / [ (0.81)(0.01) + (0.0025)(0.99) ] = 0.0081 / 0.010575 ≈ 0.77

I get approximately 77%. So that's way better: I've gone from about a 15% chance of having this particular disease all the way up to a 77% chance, which is still nowhere near the 95 percent that you might have initially guessed.

So what's the big takeaway from this? It's that the rate at which something occurs in the general population is really, really important. If instead of 1% we put in, say, 1 in 10 or 1 in a million, all of these numbers are going to really change. That said, think about when you might actually get a test. For example, if I go in and get a particular blood test, it might be that everybody in the population is getting this test, in which case the rate at which the disease occurs in the population, the 1% we put in here, is the relevant thing to consider. But a lot of the time when you get a medical test, you actually know something else: you come in with a list of symptoms, and those symptoms influence the category of people that you're in. For example, if you're getting a prostate exam as an older male, you don't want to compare to the rate of prostate cancer in all males; you want to consider the rate of prostate cancer in males of your same demographic.

Now, this kind of thing is very common in Bayesian theory, because the whole point of Bayesian analysis is that as I learn new things and get more information, I get to update the probabilities with which I believe events are going to occur. There's a difference between my probability of having a particular disease if I don't do any tests, which is just one in a hundred; if I take one test, which is this 15%; or if I take two tests, which is this 77%. If I update my information even more and find that I'm actually in some high-risk category given my symptoms, then my probabilities are going to go up further and further. So you always need to be very clear about what your assumptions are, and as more information comes in, you can adjust accordingly using this Bayesian inference.
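The hundred-person count in the captions can be checked with a short Python sketch. It is not from the video: the helper name `expected_counts` is hypothetical, and it uses exact fractions so the rough one-in-six estimate can be compared with the exact expected share of positives who are actually sick, assuming the stated 1% prevalence, 90% chance of a true positive, and 5% false positive rate.

```python
from fractions import Fraction


def expected_counts(population, prevalence, sensitivity, false_positive_rate):
    """Return (true positives, false positives) expected in a population.

    Keeps whatever number type is passed in, so Fractions stay exact.
    """
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * sensitivity              # sick people who test positive
    false_positives = healthy * false_positive_rate  # healthy people who test positive
    return true_positives, false_positives


# Rough count: the 1 sick person is assumed to test positive,
# and 5% of 99 healthy people is rounded to 5 false positives.
rough = Fraction(1, 1 + 5)

# Exact expected counts for 100 people with the video's rates.
tp, fp = expected_counts(100, Fraction(1, 100), Fraction(9, 10), Fraction(5, 100))
exact = tp / (tp + fp)

print(f"rough estimate: {float(rough):.4f}")
print(f"expected positives: {float(tp)} sick, {float(fp)} healthy")
print(f"exact P(disease | positive): {float(exact):.4f}")
```

With these inputs the expected counts are 0.9 sick and 4.95 healthy positives, and `exact` works out to `Fraction(2, 13)`, about 15.4%, which is the same value the Bayes' theorem calculation gives.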
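A minimal Python sketch of the two-case form of Bayes' theorem described above, assuming the video's numbers. The function name `posterior` and the constant names are hypothetical. For the two-test case it squares both conditional probabilities, as the video does by multiplying 90% by 90% and 5% by 5%; that step assumes the two test results are independent given whether or not you have the disease.

```python
def posterior(prior, p_pos_given_disease, p_pos_given_healthy):
    """Bayes' theorem with two disjoint cases (disease / no disease).

    P(A | B) = P(B | A) P(A) / [P(B | A) P(A) + P(B | ~A) P(~A)]
    """
    numerator = p_pos_given_disease * prior
    denominator = numerator + p_pos_given_healthy * (1 - prior)
    return numerator / denominator


PREVALENCE = 0.01      # P(A): 1 in 100 people have the disease
SENSITIVITY = 0.90     # P(B | A): 10% false negative rate
FALSE_POSITIVE = 0.05  # P(B | ~A): 5% false positive rate

one_test = posterior(PREVALENCE, SENSITIVITY, FALSE_POSITIVE)

# Two positives in a row; squaring assumes the tests are
# independent given disease status (hypothetical modeling assumption).
two_tests = posterior(PREVALENCE, SENSITIVITY ** 2, FALSE_POSITIVE ** 2)

print(f"P(disease | one positive)  = {one_test:.3f}")
print(f"P(disease | two positives) = {two_tests:.3f}")
```

The priors P(A) = 0.01 and P(~A) = 0.99 are the same in both calls; only the two likelihoods change when B becomes "tested positive twice".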
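The closing point about updating beliefs as information arrives, and the earlier question of whether the base rate matters, can both be sketched in Python. This assumes the video's 90% chance of a true positive, its 5% false positive rate, and conditionally independent repeat tests; the function name `update` is hypothetical. Feeding the one-test posterior back in as the new prior should reproduce the two-test answer, and the loop runs a single positive result against the base rates the video mentions, from 1 in 3 to 1 in a million.

```python
def update(prior, p_pos_given_disease=0.90, p_pos_given_healthy=0.05):
    """Posterior P(disease | positive) after one positive result.

    Defaults are the video's numbers: 10% false negatives, 5% false positives.
    """
    numerator = p_pos_given_disease * prior
    return numerator / (numerator + p_pos_given_healthy * (1 - prior))


# Sequential updating: each posterior becomes the prior for the next test.
# Assumes repeated tests are independent given disease status.
belief = 0.01
history = [belief]
for _ in range(2):
    belief = update(belief)
    history.append(belief)
print("after 0, 1, 2 positive tests:", [round(p, 3) for p in history])

# Same single positive test, different base rates in the population.
base_rates = [("1 in 3", 1 / 3), ("1 in 10", 1 / 10), ("1 in 100", 1 / 100),
              ("1 in 10,000", 1 / 10_000), ("1 in 1,000,000", 1 / 1_000_000)]
for label, prevalence in base_rates:
    print(f"{label:>15}: P(disease | positive) = {update(prevalence):.4%}")
```

Treating the previous posterior as the next prior matches the "update as you learn" framing. It gives the same result as plugging 0.81 and 0.0025 into the formula in one step, but only because the tests are assumed to be independent given disease status.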
Info
Channel: Dr. Trefor Bazett
Views: 71,371
Rating: 4.9486079 out of 5
Keywords: Education, Math, Solution, Discrete, Bayes' Theorem, Conditional, Probability, False Positive, test, illness, example
Id: HaYbxQC61pw
Length: 12min 36sec (756 seconds)
Published: Tue Feb 06 2018