Statistics 101: Logistic Regression Probability, Odds, and Odds Ratio

Captions
(clicking sounds) - [Brandon] Hello and welcome. Brandon here, thanks for choosing my video. If you like the video, please give it a thumbs up. If you think someone you know can also benefit by watching, please share. And, as always, please subscribe, I appreciate it very much. So, let's go ahead and get started. So, here we are in video two of logistic regression. Now, if you did not watch video one, I highly recommend going back and watching that one, and then coming back to this one. So, the first thing that we're gonna do in this video is just a basic review of probability. So, remember that probability is the number of outcomes of interest divided by the number of all possible outcomes. Let's look at a few examples. So, let's say we flip a fair coin. The probability of heads is one over two. The outcome of interest is flipping heads up, out of two possibilities, heads or tails, so the probability is point five. How about rolling a fair die? So, what's the probability of rolling a one or a two? That's two outcomes of interest divided by six possible: one, two, three, four, five, or six. That's two over six, or a probability of .333, or 1/3. How about a deck of playing cards? Standard playing cards. So, what's the probability of randomly pulling out a card that's a diamond? There are 13 of each suit: 13 diamonds, 13 hearts, 13 clubs, and 13 spades in a normal deck of cards. The probability of pulling out a diamond is 13 out of those 52, which is 1/4, or .25. So, again, that's just basic probability, and really that's all you need to understand about probability to grasp the basics of logistic regression. Okay, so that was probability. Now, what about odds? What are odds? You hear about them every day. The odds are the probability of something occurring divided by the probability of it not occurring. So, it's the probability of an event divided by the probability of a non-event, the event not occurring.
So, we think of it as: the odds equal p, the probability, divided by one minus p. Remember, probability can only be as high as one. So, if p is the probability of it happening, the probability of it not happening is one minus p. Let's go ahead and look at our earlier examples again in this context. So, how about flipping a fair coin? To get the odds of heads, we take point five, your probability of getting heads, the event occurring, and divide it by point five, which, again, is the probability of it not occurring, which in this case is getting tails. So, for this fair coin flip it's point five divided by point five, or one. So, the odds are one. Sometimes you'll see them written one to one, or 1:1. This means that the odds are even, and that makes sense in this case, because we're flipping a fair coin. So, how about rolling our fair die from before? What are the odds of a one or a two? Now, the probability of a one or a two is .333 repeating. We found that out on the last slide. So, that means the probability of not getting a one or a two is .666 repeating. So, we divide .333 repeating by .666 repeating, which is the same as one divided by two, or point five. So, in this case the odds of getting a one or a two are 1/2, or .5, or you can write them as 1:2. Now, how about our deck of playing cards? What are the odds of pulling a diamond card out? We found out the probability is .25, or 1/4. So, that means the probability of not pulling out a diamond card is the remainder, .75, and .25 divided by .75 is 1/3, or .333 repeating, so the odds are 1:3. So, again, the odds are related to the probability, but expressed in a different way: as the probability of an event occurring divided by the probability of that same event not occurring. So, we've talked about probability, we've talked about odds. Now, we're going to talk about the odds ratio. The odds ratio is exactly what it says it is. It's a ratio of two odds. So, remember our fair coin flip from the last slide.
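The probability and odds arithmetic above can be checked with a short Python sketch. It uses the standard library's `fractions.Fraction` so that values like 1/3 stay exact instead of turning into .333 repeating. The helper name `odds` is just an illustrative choice, not part of any statistics package, and the sketch assumes the probability is strictly less than one, since the odds are undefined when p = 1.

```python
from fractions import Fraction


def odds(p: Fraction) -> Fraction:
    """Odds = P(event) / P(not event) = p / (1 - p); requires p < 1."""
    return p / (1 - p)


# Probability = outcomes of interest / all possible (equally likely) outcomes.
examples = {
    "heads on a fair coin": Fraction(1, 2),             # 1 of 2 outcomes
    "a one or a two on a fair die": Fraction(2, 6),     # 2 of 6 outcomes
    "a diamond from a 52-card deck": Fraction(13, 52),  # 13 of 52 cards
}

for name, p in examples.items():
    o = odds(p)
    # Fraction reduces automatically, so 2/6 becomes 1/3 and 13/52 becomes 1/4.
    print(f"{name}: probability {p} ({float(p):.3f}), "
          f"odds {o} ({o.numerator}:{o.denominator})")
```

For the three examples this reports odds of 1:1, 1:2, and 1:3, matching the values worked out by hand.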
The probability of heads is point five, and therefore the odds of getting heads are one, or 1 to 1. Now, let's say we have an unfair coin, or a loaded coin. With this coin the probability of getting heads is point seven, not point five. That means the odds of getting heads are point seven divided by the probability of not getting heads, which in this case is only point three. The probability of tails is point three. So, we divide those and we end up with odds of getting heads of 2.333 repeating for this loaded coin. So, for the fair coin the odds of heads are one to one, and for the loaded coin flip down here at the bottom, the odds are 2.333 to 1 in favor of getting heads. So, the odds ratio is just a ratio of two odds. Now, if we wrote everything out it would look like this. On the top we have the odds for the first event. Remember, this is just how we figured out odds on the slide before: the probability of event one divided by one minus the probability of event one. So, that is the odds for that event there on the top. Now, on the bottom we have the same thing for the other event: the probability of the event on the bottom divided by one minus the probability of that event. So, we just have two odds stacked on top of each other. Now, in this case we can just go ahead and plug everything in. I usually put the larger odds, or the larger probability, on top. You don't have to; if you put the smaller odds on top, you just get the reciprocal, about .43 instead of 2.333. You just wanna make sure you interpret it correctly once you do the calculation. So, I'm gonna put the loaded coin on top. We have point seven divided by point three, that's our loaded coin, divided by our fair coin, which is point five divided by point five. Dividing by a fraction is the same as multiplying by its reciprocal, so we multiply those two fractions together and end up with point three five divided by point one five. When we do that division, we have an odds ratio of two point three three three.
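The odds ratio for the loaded and fair coins can be sketched the same way. Again, `odds` and `odds_ratio` are illustrative helper names chosen for this sketch. Exact fractions show that the ratio is 7/3 (2.333 repeating) and that swapping which coin goes on top simply gives the reciprocal, 3/7.

```python
from fractions import Fraction


def odds(p: Fraction) -> Fraction:
    """Odds of an event with probability p (requires p < 1)."""
    return p / (1 - p)


def odds_ratio(p_top: Fraction, p_bottom: Fraction) -> Fraction:
    """Odds of the 'top' event divided by odds of the 'bottom' event."""
    return odds(p_top) / odds(p_bottom)


loaded = Fraction(7, 10)  # P(heads) for the loaded coin
fair = Fraction(1, 2)     # P(heads) for the fair coin

r = odds_ratio(loaded, fair)
print(r, f"{float(r):.3f}")       # prints: 7/3 2.333
print(odds_ratio(fair, loaded))   # prints 3/7 (the reciprocal, about 0.429)
```

Because the fair coin's odds are exactly 1, the odds ratio here equals the loaded coin's own odds; with any other reference coin the two numbers would differ.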
Now, in this case that comes out very easily because the fair coin's odds are one, and dividing by one doesn't change anything. That is why the loaded coin's odds are the same as the odds ratio over on the right. It just happens to work out that way in this example. So, what does this mean? It means the odds of getting heads on the loaded coin are two point three three three times the odds on the fair coin. So, this is how probability, odds, and the odds ratio work, and it is central to understanding and interpreting the output from logistic regression. So, speaking of the odds ratio, let's go ahead and talk about the role of the odds ratio in logistic regression. In this slide we will use a very brief, very simple example. It is not related to our overarching problem on home mortgages, but again, it gives you a quick insight into how to interpret the odds ratio from computer output if you need to do that very quickly. So, what is the odds ratio? Well, in logistic regression the odds ratio for an independent variable represents how the odds change with a one unit increase in that variable, holding all other variables constant. Now, if you're new to logistic regression that may not make a lot of sense right now, but hopefully in the next minute or two it will. Let's just look at a fictitious example. Let's say we were looking at a study that involved a person's body weight and whether or not they have sleep apnea. If you don't know, sleep apnea is a condition where people stop breathing momentarily, and often repeatedly, in their sleep. Of course, that can cause a lot of health problems. So, we're going to look at how body weight is related to whether or not a person is diagnosed with sleep apnea. So, we did this analysis in SPSS, or R, or Minitab, or whatever, and our weight variable had an odds ratio in the output of one point zero seven. Now, what does that mean?
Well, this means that a one pound increase in body weight increases the odds of having sleep apnea by a factor of one point zero seven. In other words, it increases the odds by seven percent: the point zero seven part is the seven percent. If the odds had increased by a factor of two, that would have been a 100 percent increase. That's how we can go back and forth between odds ratios and percentages. So, this is not very high because we're looking at only one pound increments in body weight, which is a relatively small unit of measurement. Now, using that information we can also work out the effect of other amounts of body weight. So, a ten pound increase in body weight multiplies the odds by about 1.97, or increases the odds by about 97 percent, almost doubling a person's odds of having sleep apnea, and a 20 pound increase in body weight multiplies the odds by 3.87, almost a factor of four, and we will learn how to do these calculations later. The thing about logistic regression is that this holds true at any point in the weight spectrum. So, if I went from a weight of 200 pounds to 201 pounds, the odds ratio would be one point zero seven. If I went from 150 pounds to 151 pounds, the odds ratio would still be one point zero seven. If I looked at a 10 pound increase, let's say 200 to 210, that would have the same odds ratio of about one point nine seven as going from 130 to 140. So, the odds ratio for an interval of a given size is the same anywhere along the weight spectrum. And again, we'll talk about that in future videos as we go forward. The last slide is a warning. It is very important to separate probability and odds. In the previous example, a person gaining 20 pounds increases their odds of sleep apnea by almost a factor of four, regardless of their starting weight, because remember, that 20 pound increase applies at any point in the weight spectrum. However, the probability of having apnea is lower in people with lower body weight to begin with. So, why is that important?
So, while the odds are almost four times as large, the probability may still be low. We have to separate odds and probability. Even though gaining 20 pounds increases the odds by almost a factor of four, the reality is that people with lower body weight have a probability that's low to begin with. So, basically what this means is that the odds can change by a large factor even if the underlying probabilities are low. And here's one last example, just off the top of my head. Let's say we have two probabilities. The first is the probability that you are struck by lightning. The second is the probability that you are hit by a meteor falling out of the sky. The probability of either one of those happening is minuscule. Very, very, very low. However, the probability of being struck by lightning is higher than the probability of being hit by a meteor. So, the odds of being struck by lightning are probably much higher than the odds of being hit by a meteor, giving a large odds ratio, even though both probabilities are very, very, very low to begin with. So, we have to keep in mind the difference between probability and odds as we go forward interpreting logistic regression problems. So, I'll see you in video number three. Thanks for watching. (clicking sounds)
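To make the sleep apnea points concrete, here is a small Python sketch under stated assumptions. It assumes a simple logistic model, log-odds of apnea = B0 + B1 × weight in pounds, with B1 = ln(1.07) so that exp(B1) matches the fictitious 1.07 odds ratio. The intercept B0 = -16 is a hypothetical value invented purely for illustration; it does not come from the video or any real study. Because the model is linear on the log-odds scale, a k-pound gain multiplies the odds by exp(k × B1) = 1.07^k no matter where you start, while the change in probability depends heavily on the starting weight.

```python
import math

# Fictitious odds ratio per one-pound increase, from the example.
OR_PER_POUND = 1.07

# Hypothetical model: log-odds(apnea) = B0 + B1 * weight_lb.
B1 = math.log(OR_PER_POUND)  # slope on the log-odds scale, exp(B1) == 1.07
B0 = -16.0                   # HYPOTHETICAL intercept, for illustration only


def odds_at(weight_lb: float) -> float:
    """Odds of apnea at a given weight under the hypothetical model."""
    return math.exp(B0 + B1 * weight_lb)


def prob_at(weight_lb: float) -> float:
    """Convert odds back to probability: p = odds / (1 + odds)."""
    o = odds_at(weight_lb)
    return o / (1 + o)


# Odds ratios compound: a k-pound increase multiplies the odds by 1.07 ** k.
print(f"+10 lb odds ratio: {OR_PER_POUND ** 10:.3f}")  # prints 1.967
print(f"+20 lb odds ratio: {OR_PER_POUND ** 20:.3f}")  # prints 3.870

# The same 20-lb gain gives the same odds ratio at any starting weight...
for start in (130, 200):
    ratio = odds_at(start + 20) / odds_at(start)
    p0, p1 = prob_at(start), prob_at(start + 20)
    # ...but the change in probability depends on where you start.
    print(f"{start}->{start + 20} lb: odds ratio {ratio:.3f}, "
          f"probability {p0:.4f} -> {p1:.4f}")
```

With this made-up intercept, the 20-pound gain multiplies the odds by the same factor at both starting weights, yet the probability barely moves from its very low baseline at 130 pounds and moves much more starting from 200 pounds. That is the warning about separating odds from probability, in numbers.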
Info
Channel: Brandon Foltz
Views: 348,043
Rating: 4.9634371 out of 5
Keywords: logistic regression brandon foltz, statistics 101 logistic regression, odds ratio, logistic regression, logistic regression explained, simple logistic regression, logit regression, logistic regression analysis, logistic regression basics, logistic regression example, logistic regression model, machine learning, machine learning basics, machine learning tutorial, data science, Linear regression, odds ratio explained, machine learning explained
Id: ckkiG-SDuV8
Length: 13min 3sec (783 seconds)
Published: Sun Mar 08 2015