Maximum Likelihood estimation of Logit and Probit

Captions
In this video I want to talk about how we can actually use maximum likelihood estimation in order to estimate logit and probit models.

Just reminding ourselves of binary choice models: the idea is that we're modelling the probability that our dependent variable is equal to, let's say, 1 given X. In the linear probability model this was just equal to, in the case of having one independent variable, beta naught plus beta 1 times X. And we spoke about the problems with this particular type of model, in that the linear combination of the independent variables can lie anywhere between minus infinity and plus infinity. That isn't a very good thing, because probabilities are constrained to lie between 0 and 1 rather than between minus infinity and plus infinity.

So one way we spoke about getting around this is using a nonlinear function, which I'm going to write here as F of our linear combination of our independent variables. The way in which F works is that F of minus infinity is defined to be equal to 0, whereas F of plus infinity is defined to be equal to 1. So if we were to draw a quick graph of this, with our linear combination of independent variables, beta naught plus beta 1 times X, on the x-axis, it would look something like this: as our linear combination of independent variables increases, the function increases asymptotically towards 1, and as our linear combination of independent variables tends towards minus infinity, the function tends towards a value of 0.

And I should quickly mention the two particular cases of F which we spoke about. One of them is where F is given by the logistic function, so F here is written as capital lambda of beta naught plus beta 1 times X, which means just the exponent of that linear combination divided by 1 plus the exponent of that linear combination. Or we spoke about the probit model, which was just where we were taking the normal CDF of our linear combination of independent variables. So the case where we have a lambda here, where we're talking about exponents, is the logit model, and when we have a capital phi, like we do in the bottom one here, we're talking about the probit model. But for all intents and purposes, the way in which we actually go about estimating these two types of model is exactly the same, so we can speak about them both in one video.
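A minimal sketch of these two cases of F in Python (assuming numpy and scipy are available; the helper names `logit_link` and `probit_link` are illustrative, not from the video):

```python
import numpy as np
from scipy.stats import norm

def logit_link(z):
    """Logistic CDF: Lambda(z) = exp(z) / (1 + exp(z))."""
    return np.exp(z) / (1.0 + np.exp(z))

def probit_link(z):
    """Standard normal CDF: Phi(z)."""
    return norm.cdf(z)

# Both map the whole real line into (0, 1):
z = np.array([-5.0, 0.0, 5.0])
print(logit_link(z))   # approx [0.0067, 0.5, 0.9933]
print(probit_link(z))  # approx [0.0000003, 0.5, 0.9999997]
```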
Okay, so this is the probability that y equals 1 given X. If we want to find the probability that y equals 0 given X, that's quite easy to find: it's just 1 minus the probability that y equals 1 given X, because y is either 1 or 0. So that's just going to be 1 minus F of beta naught plus beta 1 times X.

Okay, so these are our two probabilities, which correspond to the two possible outcomes of our dependent variable, and we're going to use these in order to construct a likelihood function for one particular observation. The way in which we're going to do that is exactly the same way which we used for the case of a Bernoulli random variable, because if you think about it, our dependent variable has just got two potential values: it can take on a value of zero, or it can take on a value of one. So if we write out the probability that y equals y_i, where y_i can either be 1 or 0, given X, this is going to define our likelihood function, and it's going to have exactly the same form as that of the Bernoulli random variable. So we're going to have F of beta naught plus beta 1 times X, all to the power y_i, and then our next bracket is just going to be 1 minus F of beta naught plus beta 1 times X, all to the power 1 minus y_i. So it has exactly the same form as the Bernoulli random variable, which, if you remember, just had a likelihood function of p to the power y_i times 1 minus p to the power 1 minus y_i; here we're replacing p by F, because F is what actually represents our probability.

Okay, so why is this a likelihood function? Well, imagine the circumstance where y_i is 1. Then this first term here just has a 1 in the exponent, and the second term has 1 minus y_i, which is 0, in its exponent, so the whole second term becomes 1. We're just left with F of beta naught plus beta 1 times X, which is the first function up here: the probability that y_i equals 1 given X. Okay, so that's the case where y_i is equal to 1. What about when y_i is equal to 0? If y_i is equal to 0, then the first term is raised to the power 0, which means the whole first term becomes 1, and with y_i being 0 the exponent on the second term is 1, so we're left with 1 minus the function itself, which is exactly the probability that y equals 0. So in other words, our likelihood function is behaving just as we would like it to.

Okay, so that's the likelihood function for one observation. In reality, what we're dealing with is a sample of n observations from a population. When we're dealing with n observations from a population, what we define as the likelihood, if our observations are independent from one another, is just the product from i equals 1 to n of each of the individual likelihood functions. So all we're doing is taking the product from i equals 1 to n of F of beta naught plus beta 1 times X_i, to the power y_i, times 1 minus F of beta naught plus beta 1 times X_i, to the power 1 minus y_i.

As we spoke about before, likelihood functions, as they stand, are quite difficult to deal with, because we're essentially trying to differentiate a product. So what we typically do is take a monotonic transformation, in that we take the log of the likelihood, and then we maximize that instead. A benefit of taking logs is that the product becomes a sum: we get the sum from i equals 1 to n of y_i times the log of F of beta naught plus beta 1 times X_i, plus 1 minus y_i times the log of 1 minus F of beta naught plus beta 1 times X_i. So this is our log-likelihood function, and ideally what we would like to do is differentiate this with respect to the parameters which we want to estimate: dL over d beta naught and dL over d beta 1, or in general, if we have p parameters, we have p first-order conditions, whereby each of these derivatives is set equal to 0. But the problem with this type of maximization is that the solutions to these types of equations typically aren't analytic; in other words, there are no closed-form solutions. I can't just write that, you know, beta naught is equal to the sum of x times y divided by the sum of x squared.
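A hedged sketch of this log-likelihood and its numerical maximization, here for the probit case (the data `x`, `y` are simulated purely for illustration, and scipy's general-purpose `minimize` stands in for whatever search routine a statistics package would use):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_likelihood(beta, x, y, F):
    """Negative of sum_i [ y_i log F(b0 + b1 x_i) + (1 - y_i) log(1 - F(b0 + b1 x_i)) ]."""
    p = F(beta[0] + beta[1] * x)
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Simulated example data with true parameters (0.5, 1.0)
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = (rng.uniform(size=500) < norm.cdf(0.5 + 1.0 * x)).astype(float)

# No closed form exists, so minimize the negative log-likelihood iteratively
res = minimize(neg_log_likelihood, x0=np.zeros(2), args=(x, y, norm.cdf))
print(res.x)  # estimates of (beta0, beta1), close to (0.5, 1.0)
```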
Typically what we need to do is some sort of iterative computational search for the values of beta naught and beta 1 which come as close as possible to satisfying the conditions I've stated up here. And, which is great for us, we don't actually have to do this ourselves: all we need to do is tick a box in whichever statistical software program we're using, and the computer does the search through the parameter space for us. The values which it reports back are the values of beta naught and beta 1 which come as close as possible to satisfying these two above conditions. So even though it helps to know how the computer actually searches for these particular conditions, we don't need to know the ins and outs of how logit and probit computational search algorithms work. But I hope that this video has provided some background as to at least what the likelihood is, and what the computer is actually searching over.
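As a usage note, the "tick a box" step might look like the following with statsmodels (one illustrative package choice, not the one named in the video; assumes the simulated `x`, `y` from the sketch above):

```python
import statsmodels.api as sm

X = sm.add_constant(x)              # adds the intercept column for beta naught
logit_fit = sm.Logit(y, X).fit()    # logit: F is the logistic CDF
probit_fit = sm.Probit(y, X).fit()  # probit: F is the normal CDF
print(probit_fit.params)            # iteratively-found MLEs of beta naught, beta 1
```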
Info
Channel: Ben Lambert
Views: 131,648
Rating: 4.9413204 out of 5
Keywords: Maximum Likelihood, Econometrics (Field Of Study), logistic model, probit
Id: WflqTUOvdik
Length: 9min 18sec (558 seconds)
Published: Wed Oct 30 2013