Probit and Logit Models in Stata

Video Statistics and Information

Captions
Hello and welcome. This video clip is about how to do probit and logit models in Stata. Before you view this video clip, make sure that you have watched my videos on how to interpret the probit and logit models, as well as the data example. These are just going to be the commands on how to do that in Stata.

Here I have opened the do-file editor in Stata, and I have already executed the program by just hitting run (execute, or do), so the results are already here and we save time throughout the video.

So how does it start? The first thing you need to do is use, and then point to where your data is located. In my case the file with the data is named probit_insurance, and I have saved it on my C drive under econometrics/data. You can download this file from my website, as well as the program.

The first thing that I'm going to do here is define my y and x variables. I use global, so my y variable, or ylist, would be insurance, whether or not a person has insurance, and my xlist, these are my independent variables, would be retired, age, health status good, household income, and so on. I put them here. So basically, if you want to use this program for your own research, all you need to do is change the file name to the one you have with your own data and provide the path to where it's located, and you need to tell Stata which one is your dependent variable, a 0/1 variable, and which ones are your independent variables. Hopefully, if you cross your fingers, everything else will run for the rest of the program.

The first thing that we will do is describe the y and x variables, using describe $ylist $xlist. Notice that now we have these dollar signs in front of ylist and xlist; that's how we're going to refer to them later on in the program. So here is the command that got executed, describe ylist and xlist, and here we see the variable names and their formats and so on. They actually don't have labels, so you don't quite know what they refer to, but that's how the data came.

The next thing you can do is summarize your y and x variables, and here is the result. You see that for insurance the mean is 0.38; this is the proportion of people who have health insurance. We have 62 percent of the people retired, the average age is 66 years old, and so on. This is how you interpret those results.

Next you can list the y and x variables for the first 10 observations, from the first to the tenth observation, and this is what the data looks like. For the dependent variable, the first 10 observations are zeros, but there are also ones. Here are retired, age, health status, and so on. If you also want to see the data, you can click on the data editor now that you've read the data in. Here is the data, and you can look around and see that there are some ones for insurance. So this is our dependent variable. I'm going to close the data and continue.

The next thing you can do is tabulate your y variable, insurance. When you tabulate it, you again see the frequencies of both values: there are 3,206 observations, with 39 percent of them having insurance.

Next is how to run a regression, and that's done with reg: you put your y variable and then all the x variables behind it. These are the results that we have here, and if you notice, these are the results that are copied into the table on how to interpret them. The way to interpret these is, again, more likely or less likely, but don't interpret the magnitude of these coefficients.

A probit model is run very easily in Stata: you just put probit and then the dependent variable name and the independent variables. These are the probit model results, and you can see the output looks like that of a normal linear regression. Here are those results, and here's that pseudo R-squared that we talked about, which you also need to copy for your table: about 0.06, kind of low.

Next you can estimate the logit model with logit: you put the y variable and then your x variables. Here again are the y variable and the x variables, and you see those coefficients. Again, retired people are more likely to have insurance, and so on, because of the positive coefficient.

Next, this is how marginal effects at the mean and average marginal effects are calculated in Stata. You can quietly run a regression with your y and x variables and then use the command margins, dydx(*) atmeans, where atmeans means the marginal effects at the means. Here are the conditional marginal effects that are calculated, and they're listed over here; I copied these marginal effects from here. Notice that now you don't have a constant anymore; there's no marginal effect for a constant. And these are the means of the independent variables: for retired, 62 percent of the sample are retired. These are the x-bar values that are used to calculate those marginal effects. The way to interpret this now is that if a person is retired, they're four percentage points more likely to have insurance. So now you can interpret both the sign and the magnitude.

If you look at the next one, margins, dydx(*), these are the average marginal effects, which are a better approach to calculating marginal effects, but nevertheless, if you compare the magnitudes, they're very similar.

Okay, so these are for OLS, and then I have them for the logit and the probit models. These are the commands, and as you can see, they're exactly the same. One thing you're going to see for the regression is that the marginal effects are the coefficients in the OLS model; they have the same values. Then you can check them out for the logit and the probit models here. I have put them in the table and explained them.

Once you calculate this, one thing that you can do is calculate the odds ratios. In Stata that is done by using logistic as the command, and then you put ylist and xlist. Here are the results for logistic, which gives the odds ratios; you can see right here on top that it says odds ratio. For retired, the way to interpret this is that the odds of having insurance versus not having insurance are 1.21 times as high for a retired person, so if you're retired you're more likely to have insurance, and so on.

Next, what you can do is calculate predicted probabilities. You can quietly estimate the logit model and then use the command predict; plogit is the variable name that you give to these predicted probabilities, and the option pr after the comma is the standard way to call for the predicted probabilities. You do the same thing for the probit. For the regression you use xb after the comma, because these are just the coefficients times the x variables in the linear model. Then you can summarize these. Here we look at their summaries, and if you look at the means, they're very similar, and they're also very similar to the actual frequencies. So this is insurance, the sample frequency; these are the predicted probabilities from the logit model, this from the probit, and this from the OLS model.

If you open up the data editor and scroll down the data, they got calculated here, and these are the predicted probabilities. The first person in the data has a predicted probability of about 22 percent of having insurance according to all of these models. One very interesting thing is to notice this number right here. Do you see that it's negative? That's one of the drawbacks of the OLS model: it does not restrict the predicted probabilities to be between 0 and 1, so you have a predicted probability going outside of this region. Basically, you can report these predicted probabilities either as a probability like that or, if it's less than 0.5, you can say we're predicting that this individual does not have insurance, because it's less than 0.5.

Okay, so again, this was the summary of those predicted probabilities, and you see how close they are to the sample average. One thing that I can also show you: these are the values that we're trying to predict. You see how it's all zeros; these are the actual values. In the twelfth observation, that person has insurance. If we scroll and look at those probabilities, we're still predicting that that person is only 33 percent likely to have insurance, so that's not a very good prediction right here. But these are the values that we're trying to predict: insurance, the dependent variable, a 0/1 variable.

The final thing to show you is how to calculate the percent correctly predicted. You estimate the logit model again, giving it y and x, and then use the command estat classification. These are the results, and Stata has a way of displaying that table that was in my lecture, so you have the true predictions and the false predictions. The number that you need to report in your paper is the percent correctly classified, 62 percent. So the ability of this model to predict is not so good. Then for the probit model, if you estimate the probit model and then use estat classification, you can see that again it's around 62 percent, so the predictions are very similar for the probit and the logit models.

So again, this is how you do probit and logit models in Stata. All you have to do is change those lines based on where your data is and what your y and x are, and it should run with the problem that you have. Also look at how to interpret those results, which I have summarized. Thanks for watching.
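The commands described in the captions can be collected into one do-file. What follows is a minimal sketch reconstructed from the spoken walkthrough, not the author's original program. The file path and the variable names (ins, retire, age, hstatusg, hhincome) are assumptions made for illustration; the video names the variables only verbally and lists more regressors than appear here, so replace them with the names in your own data.

```stata
* Sketch of the workflow described in the video (not the original do-file).
* ASSUMPTIONS: the path and the variable names below are placeholders.
clear all
set more off

use "C:/path/to/probit_insurance.dta", clear   // hypothetical path

global ylist ins                               // 0/1 dependent variable (assumed name)
global xlist retire age hstatusg hhincome      // independent variables (assumed names)

* Descriptive statistics
describe $ylist $xlist
summarize $ylist $xlist
list $ylist $xlist in 1/10
tabulate $ylist

* OLS (linear probability model), probit, and logit
regress $ylist $xlist
probit $ylist $xlist
logit $ylist $xlist

* Marginal effects for each model: at the means, then averaged over the sample
foreach cmd in regress logit probit {
    quietly `cmd' $ylist $xlist
    margins, dydx(*) atmeans
    margins, dydx(*)
}

* Odds ratios from the logit model
logistic $ylist $xlist

* Predicted probabilities (pr) and linear prediction for OLS (xb)
quietly logit $ylist $xlist
predict plogit, pr
quietly probit $ylist $xlist
predict pprobit, pr
quietly regress $ylist $xlist
predict pols, xb
summarize $ylist plogit pprobit pols

* Percent correctly classified
quietly logit $ylist $xlist
estat classification
quietly probit $ylist $xlist
estat classification
```

To adapt the sketch, only the use line and the two global lines should need changing, which matches the advice given in the video.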
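For readers who want the quantities behind the output, they can be written in standard textbook notation. The symbols here (coefficient vector, regressor vector, sample size n) are assumptions of this note and are not shown in the video; the derivative formulas apply to continuous regressors.

```latex
% Probit and logit response probabilities
P(y_i = 1 \mid x_i) = \Phi(x_i'\beta) \quad \text{(probit)}, \qquad
P(y_i = 1 \mid x_i) = \Lambda(x_i'\beta) = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}} \quad \text{(logit)}

% Marginal effect of regressor j in the probit model:
% at the means (atmeans) versus averaged over the sample
\text{ME}_j^{\text{at means}} = \phi(\bar{x}'\hat\beta)\,\hat\beta_j, \qquad
\text{AME}_j = \frac{1}{n}\sum_{i=1}^{n} \phi(x_i'\hat\beta)\,\hat\beta_j

% For the logit model, replace \phi(\cdot) with \Lambda(\cdot)\,[1 - \Lambda(\cdot)].

% Odds ratio reported by logistic for regressor j
\text{OR}_j = e^{\hat\beta_j}

% Percent correctly predicted, classifying at the 0.5 threshold
\hat{y}_i = \mathbf{1}[\hat{p}_i \ge 0.5], \qquad
\text{PCP} = \frac{100}{n}\sum_{i=1}^{n} \mathbf{1}[\hat{y}_i = y_i]
```

In the linear model the marginal effect of regressor j is simply its coefficient, which is why the margins output after the regression reproduces the OLS coefficients.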
Info
Channel: econometricsacademy
Views: 173,057
Rating: 4.8857141 out of 5
Keywords: Probit regression, Probit model, Stata, Logit model in Stata, Probit regression Stata, Econometrics in Stata, Econometrics, Logistic regression, Logit regression, Probit model in Stata, Econometrics (Field Of Study), Logit model, Stata Software, Logistic regression Stata, Econometrics Stata, Econometrics Academy
Id: wU1DVbpD9SY
Length: 13min 51sec (831 seconds)
Published: Sun Jan 27 2013