Econometrics - Binary Dependent Variables (Probit, Logit, and Linear Probability Models)

Captions
Hello! In this video I'm going to be talking about binary dependent variables. Everything we've done so far has been with ordinary least squares, and ordinary least squares is great, but it does assume a couple of things. One of the things it assumes is that the dependent variable effectively has infinite range: it can take any value, it's continuous and unbounded. Now, that's often not true. Most of the variables we work with couldn't literally take any value; they're constrained within some reasonable range. Even something with theoretically infinite range, like wealth, which could in principle be infinitely negative or infinitely positive, realistically won't be. But generally OLS works okay anyway: if the variable is more or less continuous, and its range extends well beyond the range of our data, we're fine with that.

There are a couple of kinds of dependent variables, though, where the strain on that continuous-and-unbounded assumption just gets to be too much, and in that case we need to use some other kind of model besides ordinary least squares. The most common case is a binary dependent variable: a dependent variable that can only take one of two values. This is so common because we are often interested in whether something happened or not. Say we put you through some sort of program that's supposed to make you more likely to graduate college; did you graduate college or not? That's a binary outcome. Did you survive to next year? That's binary. There are lots of binary dependent variables we're interested in, and OLS is not necessarily going to do such a great job with them. Let's demonstrate exactly why.

First, let's start working with binary dependent variables while still using ordinary least squares. This is called a linear probability model. A linear probability model is just ordinary least squares used with a binary dependent variable. That's it; that's the whole thing. So what's going on with this? Why is it so bad? Is it so bad? And how does it work?

For one thing, how would you interpret a model like this? If we run a regression of the binary dependent variable on the independent variables we're interested in, the nice thing is that we can interpret it exactly the same way we would typically interpret ordinary least squares, except now the outcome is in terms of a probability. If we get a coefficient of 0.03, typically that would be interpreted as "a one-unit increase in X is associated with a 0.03 increase in the dependent variable." That doesn't make any sense here; the dependent variable can't increase by 0.03. But what it can do is increase by 0.03 in probability. So if before we thought you had a 47% chance of having a value of 1 in your dependent variable, now we'd say you have a 0.50, or 50%, chance of having a 1.
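To make that concrete, here's a minimal sketch (my own illustration, not from the video) of a linear probability model in Python with statsmodels, on simulated data; all the variable names and numbers here are made up:

```python
import numpy as np
import statsmodels.api as sm

# Simulate a binary outcome whose probability of being 1 rises with x
rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(0.5 + 1.0 * x)))  # true P(y = 1)
y = rng.binomial(1, p)

# Linear probability model: plain OLS with a binary dependent variable
X = sm.add_constant(x)
lpm = sm.OLS(y, X).fit()

# The slope is read on the probability scale: a one-unit increase in x
# is associated with a lpm.params[1]-point change in P(y = 1)
print(lpm.params)
```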
So the interpretation is all the same; it's just that instead of affecting the value of the dependent variable itself, a coefficient affects the probability that the dependent variable is equal to one. That's how we interpret it.

So what's the problem with this? I will say: I'm an economist, and if you're watching this and you're not an economist, you've probably heard many more bad things about the linear probability model than I have. Economists tend to like it. The linear probability model does a couple of things really well: it's very easy to interpret, it works okay if you're only interested in certain kinds of slopes that economists tend to be interested in, and it's great if you have a lot of fixed effects. But it has some real problems as well. First of all, it makes terrible predictions, and second, the slopes themselves are actually incorrect in some cases.

So what's going on there? First, something you might notice is that this is fitting a straight line, and straight lines have infinite range. That means that eventually you're going to predict that the probability of getting a one is either below zero or above one. This is always going to happen if you go far enough out in either direction, but often it happens even within the range of the data you actually have. Here we have an OLS line that we fit (we also have a typo; I'll fix that later), and you can see that the ones tend to be grouped at higher values of X and the zeros at lower values of X. Unsurprisingly, we get a positive slope, which is what we'd want. But it also means that if we go to very small values of X, we're going to predict that you have a less-than-zero chance of getting a one, which makes no sense. The prediction is terrible.

But why is the slope also going to be wrong? That's because it's a straight line, and what you really want is for the prediction, and therefore the slope, to flatten out as you get towards zero or towards one. What should the line do as it gets closer and closer to zero? We want it to curve away and not hit zero; we don't want to give a negative prediction. So inherently, because we have bounded data, the slope shouldn't be constant. OLS fits a straight line, which gives a constant slope, but we can't have a constant slope, because if we go far enough out to the side we'll eventually break through the zero or one barrier. We want the line to curve before it hits that point, so the slope should change depending on which value of X you're at. OLS does not do that. So particularly in the regions of X where the probability is close to zero or close to one, the slope is going to be wrong. Down here we'd say, "okay, if we reduce X by one unit, I predict the probability drops by five percentage points," but that puts us below zero. It should say, "down here, if we reduce X by one, the probability drops by maybe 0.01," because there's not much more room to drop. If we're at three percent already, we can't drop by five percent. Maybe a five-percentage-point drop made sense in the middle, but not near the edges. In the middle, where we're predicting the outcome has about a 50% chance of happening, the OLS slopes are fine, but out near the edges they don't work so well.
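Continuing the sketch from above (again, my own illustration), you can see the out-of-range prediction problem directly by evaluating the fitted straight line at extreme values of x:

```python
# The LPM prediction is a straight line, so far enough out in x
# it leaves the [0, 1] range entirely
x_grid = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])
print(lpm.predict(sm.add_constant(x_grid)))  # "probabilities" below 0 or above 1
```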
So we have linear probability models, and they have problems. What can we do instead? We're going to use something called a generalized linear model. A generalized linear model has a lot of nice features, and it works with a lot of the same intuition as ordinary least squares; it's the same idea. In OLS we had an equation where Y is a function of X. We're going to take that exact same beta0 + beta1*X, run it through some function, and use that to predict Y. So instead of predicting Y with the straight line directly, we take the straight line, run it through a function, and use the function of that straight line to predict Y. That's the only difference. And we can see why it's called a "generalized" version: if we make the function the identity, f(x) = x, then we're right back to ordinary least squares. We're generalizing OLS by letting that function be something other than the identity.

So what do we do with this? First, let's get some terminology down. The part that goes into the function is called the index. We take all those explanatory variables and they combine into a single index; the higher that index is, the higher the value we plug into the function. The function itself is called the link function. We take our index, plug it into the link function, and out comes a predicted value. So we have our index, we have our link, and we choose our coefficients so that we predict our outcome variable as well as we can. That's the same idea as OLS, basically, except that we're running everything through this function first.

So what kinds of link functions should we use? There are a lot of link functions out there, and the right one depends on the kind of dependent variable you have. If you have a variable that just can't be negative, that suggests one kind of link function; if you have a variable that has to be a counting number, that might be a Poisson regression, which uses a log link; if you have a dependent variable that takes multiple categorical values, you might use a multinomial link function. But the most common ones, especially for binary dependent variables, are probit and logit. The link function for probit is the normal cumulative distribution function, which tells you, for a given z-score, the probability of getting a z-score less than that. The link function for logit is the logistic function: e to the power of the index, divided by one plus e to the power of the index.

The useful feature these share is that they can't possibly predict outside the bounds of zero to one. If you put in an index of negative infinity, you get a prediction of approximately zero, and it won't go any lower; if you put in an index of positive infinity, you get a prediction of one and no higher; and all the indexes in between give predictions in between. With these link functions set up, we can fiddle with the coefficients until we find ones that predict the outcome well, and those predictions will never fall outside the range of zero to one.
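Here's what those two link functions look like as a quick sketch (my own, not from the video), so you can check the boundedness directly:

```python
import numpy as np
from scipy.stats import norm

def probit_link(index):
    # Probit link: the standard normal CDF
    return norm.cdf(index)

def logit_link(index):
    # Logit link: exp(index) / (1 + exp(index))
    return np.exp(index) / (1 + np.exp(index))

# Very negative indexes give predictions near 0, very positive near 1,
# and nothing ever falls outside [0, 1]
for z in [-10, -1, 0, 1, 10]:
    print(z, probit_link(z), logit_link(z))
```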
Here's an example comparing the predictions from logit and probit against OLS. The logit and probit curves do the two things we wanted: first, they do not predict below zero, and second, as we get closer to the edges they curve away and avoid crossing the bounds, so the slope varies across the range of X. Another thing you can notice here is that the predictions from probit and logit are basically the same. That's pretty common; most of the time it doesn't matter much whether you choose probit or logit. I'm probably going to keep using logit as my example, because it's more common these days in most scenarios.

Two brief things to note before we go on (the next video will have more information about probit and logit). The first is that the slope should vary depending on the values of all of the independent variables. Look at this graph: the slope of the probit and logit curves is shallow over here and steeper up here, at different values of X. But what if we had more than one variable in the regression? Then the slope would vary not just with the value of X but with the value of the index function, and the index function is based on all of the variables in our regression. It's sort of like polynomials, where we said the slope on X would vary with the value of X, or the interaction-terms video, where we said the slope on X would vary with the values of some other variable. Here, the slope on every variable varies with the values of every variable. If the prediction you get from your model is very, very low, down close to zero, then the slope is going to be very shallow for all of your variables; if the prediction is more in the middle, then the slope is going to be a lot bigger. Again, once you're down near zero there's only so much further down you can go, and similarly, if we go far enough out to the right, there's a point at which we run up against one and the curve gets shallow again. The slope itself varies, so there is no one single effect of X; it depends on where you are. For example, what's the effect of this drug on whether you get cured of your disease? If you were already 99% likely to get cured, there's only so much it can do for you; if you were only 50% likely to get cured, it can do a whole lot. And that makes sense; that's exactly what we wanted.

The other thing to note is what happens when we look at the actual regression table. We see that the effect of X on the probability of the outcome in the linear probability model is 0.07: an increase of one unit of X increases the probability of the outcome by seven percentage points. In logit and probit we see 0.53 and 0.32. Are those very different predictions? No, actually. These are the probit and logit coefficients, which means they're the coefficients in the index function, which means they can't be interpreted as relating directly to the probability of the outcome; we can't interpret them the same way. We also noticed that the predictions from logit and probit were basically the same, yet these coefficients are very different; that's because the two index functions are on completely different scales, so we can't just read the coefficients side by side.
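To see both facts at once, the coefficients differing while the predictions agree, here's a sketch continuing the simulated example from earlier (statsmodels has Logit and Probit built in):

```python
# Fit logit and probit on the same simulated data as the LPM above
logit = sm.Logit(y, X).fit(disp=0)
probit = sm.Probit(y, X).fit(disp=0)

# The index coefficients are on different scales and look very different...
print(logit.params, probit.params)

# ...but the predicted probabilities are nearly identical,
# and always stay inside [0, 1]
print(logit.predict(X)[:5])
print(probit.predict(X)[:5])
```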
To interpret them, we need to do something: either interpret them as they are, which you can do if you get really, really used to using logit and probit, or interpret them in terms of marginal effects, which is something we'll talk about in the next video. All right, that's it for probit and logit for now. Thank you very much!
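(As a quick preview of that next video, and assuming the statsmodels fits sketched above: average marginal effects put logit and probit back on the probability scale, which makes them comparable to the LPM coefficient.)

```python
# Average marginal effects: the probability-scale interpretation
# of the logit/probit index coefficients
print(logit.get_margeff().summary())
print(probit.get_margeff().summary())
```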
Info
Channel: Nick Huntington-Klein
Views: 2,977
Rating: 5 out of 5
Id: FoQaJdT-1AI
Length: 12min 45sec (765 seconds)
Published: Thu Sep 03 2020