Multinomial Probit and Logit Models in Stata

Captions
In this video I will show you how to estimate the multinomial probit and logit models, the conditional logit model, and the mixed logit model. I have three different programs for us to look at in Stata. This is the first program, for the multinomial probit and logit models. I have already executed it, and you can see the results on the right-hand side, so you can follow along.

We are going to consider a data set on fishing. I read in the data set and open it so we can take a look. The data are in the wide format, which means each row is for one individual. Mode is our dependent variable, and there are four different choices: beach, pier, private, and charter. The first person picked the charter mode, the second one charter, the third one private, the fourth one pier, and so on, with one line for each individual. The other variable we will consider is income, which is our independent variable. Notice that this variable is alternative-invariant: no matter which alternative an individual picks, they have only one income. These are the two variables that we will use in this model.

Next I define the variables with globals. For my Y variable I have mode, and for the X variables I have only one variable, income. If you have more independent variables, just list them here separated by spaces. One thing to notice about this example is that we have four categories, denoted by 1, 2, 3, and 4; these are the different fishing options. Pay attention to this for the rest of the program: if you have a different number of options than four, you will have to make slight modifications, and I will show you where. The rest of the program should work with your data if you just put your Y variable here and
then your X variables here.

I describe and summarize the Y and X variables. The fishing mode is the dependent variable, and income, monthly income in thousands of dollars, is the independent variable. In the summary, the mean reported for mode is meaningless, because the choices 1, 2, 3, and 4 are just different categories and the numbers themselves don't mean anything, so don't report that number in tables. The correct way to summarize a multinomial variable is with its frequencies or percent frequencies. We have 1,182 observations in the data set, and the charter option has the highest percent frequency: 38 percent of individuals picked it. This is a good table to report in your paper when you summarize the dependent variable.

The next thing is the multinomial logit model. The command in Stata is mlogit, followed by the name of the Y variable, mode in our case, and then all of your X variables. If you don't specify the base outcome, Stata uses the most frequent category. Here you can see that charter has no coefficients, because it was picked as the base category, so its coefficients are set equal to zero. The number of sets of coefficients is three, which is the number of alternatives minus one. To interpret the results: the income coefficient for beach is not significant. The pier coefficient is significant, and if income is higher, pier is less likely in comparison to charter. For private, if income is higher, the private option is more likely in comparison to charter boat. That is how we interpret these coefficients: we do not interpret their magnitude, only whether an alternative is more likely or less likely. Now, if you would like to specify the base
outcome, as shown in the program, we can set it to the second alternative. We could refer to pier either by its label or by the number 2, because the data record pier both ways. With the base outcome set to alternative 2, notice that pier is now the base outcome, so it has no coefficients; they are set equal to zero. Now we can say that, in comparison to pier, if income is higher, the beach option is more likely, the private option is more likely, and the charter option is also more likely. Notice that these coefficients, and even whether they are positive or negative, are very different from the previous model.

Next we can compute the marginal effects with the mfx command and the option predict(outcome(1)), and likewise for outcomes 2, 3, and 4. If you don't have four options, or they are not coded 1, 2, 3, and 4, you need to change these numbers to correspond to your categories. Here are the marginal effects. If income increases by one unit, the probability of the beach option being selected is higher by a very small amount, about .0075 percent. If income is higher by one unit, or one thousand dollars, the pier option is about two percentage points less likely, and so on. That's how you interpret marginal effects, and unlike the coefficients, you can also interpret their magnitude. Notice that the marginal effects across the four outcomes sum to zero, because the predicted probabilities always sum to one. If you want to look at the predicted probabilities, you can obtain them with predict, naming each new variable; these are
the names that I've given them, followed by the pr option for predicted probabilities. Summarizing the predicted probabilities, the means correspond very closely to the actual frequencies. So on average, at the mean, the model predicts very well; whether individual observations are predicted well is a different story.

Next we estimate the multinomial probit model. The command is mprobit instead of mlogit, followed by the Y and X variables. Again, if you don't specify a base outcome, the base outcome is the most frequent alternative. We can again say more likely and less likely, but notice that the coefficient magnitudes are different from the logit model, which is why we cannot interpret the magnitudes, only more likely or less likely. Likewise, if you add baseoutcome(2) to the probit model, the base outcome becomes the second outcome, pier, and its coefficients are normalized to zero. Again, if income is higher, people are more likely to choose the beach option in comparison to pier, and so on.

For the probit model you can also compute the marginal effects, and the interpretation is exactly the same as for the logit model. If you compare them, these marginal effects are very similar to the marginal effects from the logit model. This is the same situation as with binary probit and logit models: the coefficients may be different, but the marginal effects are very similar. Finally, you can compute the predicted probabilities for the probit model with the predict command, the variable names that you're
giving them, and the outcomes 1, 2, 3, and 4. If you summarize them, you can see that on average the multinomial probit model also does a good job of predicting the actual percent frequencies in the sample. That was the program for the multinomial probit and logit models.

Next, let's talk about the conditional logit model. I have the program on this side and the executed program here. We read in the data set called conditional_fishing, which is structured a little differently. If we open the data editor, we can see that this data set is in the long format. The id variable identifies the people in the sample: the first ID, the second ID, the third ID, and so on. The data are in the long form because each person has four rows, one for each option, so the options are repeated for every person. Each option label is also associated with a number: beach with the first option, and the others with the second, third, and fourth options. The mode variable that we had before now shows charter repeated four times, because the first person picked charter: these were the available options for them, and they picked charter. For the second person, these were the available options, and this person also picked charter. For the third person, these are the available options, and they picked private, and so on. So one column lists all the available options, and the other records what was picked. The variable d is a dummy variable equal to 1 for the option that the person picked and 0 otherwise. Notice that in the row where the option was charter and the person picked charter, d is 1, and it is 0 in that person's other rows; the same holds for the second person. For the third person, the option in one row was private, they picked private, and therefore
d equals 1 in that row for the third person, and everything else is 0 for them, and so on. So d is our dependent variable in the long form: it equals 1 if the person picked that option and 0 otherwise. Income is the alternative-invariant variable. Notice that income is repeated four times for each person, because a person has only one income, which does not vary across alternatives. The variables p and q are different: p is the price for each available alternative and q is the catch rate for each available alternative, so these do differ across alternatives. They are the prices and catch rates that the individual faces, and facing those prices and catch rates, the individual picked what they picked. These are our alternative-specific (alternative-variant) independent variables.

Now we define the variables. The dependent variable is 0/1 for whether the alternative is picked, so the ylist is our dependent variable, d in our case. The xlist contains the case-specific regressors, the alternative-invariant ones, and the zlist contains the alternative-specific regressors. When you modify this program, these are the variable names that you need to put in. We also need to give it the ID variable, which in my case is called id; in your case it may be called something different. You also need to give it the set of possible alternatives, which in this case is fishmode, where the alternatives are listed four times for each individual regardless of what they selected. Here I'm using base alternatives 1 and 2, being pier and charter, as the base categories. The first thing we do is describe and summarize the
data. In the description we have id, the person ID; fishmode; income; price; and catch rate, and here are the summary statistics. The summary of id doesn't make sense, and fishmode cannot be summarized because it consists of labels. The mean of d, our Y variable, is 0.25: we have four alternatives, and exactly one of the four rows for each person equals 1, the option the person picked, while the rest equal 0, so the mean is one out of four. Then we have income, price, and catch rate summarized.

To estimate the conditional logit model with base alternative 1, we use asclogit. First you put the ylist, the dependent variable, and then the zlist, the alternative-specific regressors. In case() you put the person ID. In alternatives() you give the possible alternatives, fishmode in our case. In casevars() you give the xlist, the case-specific regressors that don't vary with the alternatives. The last thing is basealternative(), where I give either 1 or 2; that is the only difference between the two models, so this way we estimate the conditional logit model two different ways.

Here are the results with base alternative 1. Notice that an alternative-specific regressor has only one coefficient, but an alternative-invariant variable like income has as many coefficients as there are alternatives minus one. You interpret the income coefficients the same way as in the multinomial logit model: if income increases, the beach option is more likely, the charter option is also
more likely, and the private option is also more likely, in comparison to pier. You interpret the coefficients on the alternative-specific variables as follows: if the price of an option increases, people are less likely to pick that option, and if the catch rate increases, they are more likely to pick it.

You can also estimate this model using a different base alternative. In the first model pier was the base alternative, and now charter is the base alternative, so the charter coefficients are normalized to zero. Notice that the coefficients on the alternative-specific variables are exactly the same as before, but the income coefficients completely changed, not only in size but also in sign, because now they measure how likely each outcome is in comparison to the base, charter. The interpretation is the same as before.

You can also estimate the marginal effects, and this is where it gets really complicated. In Stata you use estat mfx for the marginal effects, with the varlist() option containing the xlist and the zlist, because you want the marginal effects for those variables. Notice that we have marginal effects for each alternative on the probability of each choice, which is why it is a big table. The first block is for the probability of beach being selected. Income doesn't have a significant effect, but if the price of beach increases, beach is less likely to be selected, and if the prices of the other options increase, beach is more likely to be selected. Why is that? If an option becomes more expensive, you are less likely to demand that option and more
likely to demand the others. Likewise with the catch rate: if the catch rate for beach increases, you are about 1.7 percentage points more likely to select the beach option, and less likely to select the other options. The same interpretation applies to the rest of the marginal effects. Notice that each option has a negative marginal effect with respect to its own price. You can look at the handout, where I have summarized all of these.

The last step is to calculate the predicted probabilities. We do this with predict and the pr option, using the variable name I chose, then summarize them and compare them to the actual ylist. On average we get 0.25, exactly the empirical frequency of 0.25.

Finally, we can obtain the multinomial logit model from the conditional logit model if we use only the X variables and no zlist. Notice that after the ylist I don't have the zlist here. If you don't give it any variables that vary with the alternatives, you get exactly the multinomial logit model: if you compare these results with the earlier program for the multinomial logit model, they are exactly the same. So the multinomial logit model is a special case of the conditional logit model, when there are no variables that vary with the alternatives.

That was the conditional logit model. The last program is for the mixed logit model, and I have the program open here as well as its output. Looking at the data, it's the same fishing data but a little different: the data are in the long format, but some of the options have been deleted, so now we have only three options per
person: beach, pier, and private. The charter boat option is not here. Don't ask me why; that's how the data and the example came. It doesn't have to be this way, but in this example we have only three alternatives. Our dependent variable is d, which I already discussed in the previous program. fishmode lists the possible alternatives. One person picked private, so only the row where the option was private has a 1, and the rest are 0. For the person with an ID of 7, the row where the option was beach and they picked beach has d equal to 1, the pier row, which they didn't take, is 0, and the private row, which they didn't pick, is 0. That's how the dependent variable is defined.

Among the independent variables, q is the catch rate for each available alternative, listed for all alternatives regardless of whether the person picked that option. We also have a dummy variable equal to 1 if the alternative in that row is beach, and a dummy variable for pier; private is the reference category, which we omit. We also have ybeach and ypier, the products of the beach dummy with income and of the pier dummy with income. These are additional variables that we use. The other variable is p, the price of each alternative, that is, the prices people face before they make their choice, and we assume that the coefficient on price is random. So I define the ylist as d, our dependent variable; the xlist contains several variables; and the random variable is p. You also need to define the
ID variable, and if there are any groups in the data you can also identify them. I don't have any groups, only my ID variable, so I put id there as well. We describe and summarize: here is the ID number and the independent variables that we use. Again, the summary of id makes no sense, but d has a mean of 0.333, because we have three alternatives and d equals 1 in only one of the three rows for each person. The same holds for the beach and pier dummies, each of which equals 1 in one of a person's three rows.

To estimate the mixed logit, or random parameters, model, you use the command mixlogit. You put the dependent variable and then the independent variables. In group() you put the group variable if you have one; if you don't, just put the ID variable. Then id() takes the ID variable. Finally, rand() takes the random variable; in my case the global $rand is p, so price is the random variable, but you can choose whatever you want it to be. In the results from the mixed logit model, if the price of an option increases, that option is less likely to be selected. The estimated standard deviation of the price coefficient is also significant, which means there is heterogeneity in the data: people differ in how price affects their choices. You can look at the handout for more information on how to interpret these results.

That was a video about how to estimate the multinomial probit and logit models, the conditional logit model, and the mixed logit model in Stata. Thanks for watching.
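The summary step of the first program recommends a frequency table over the mean of the category codes. A minimal Python sketch of that point follows (the Stata analogue is a one-way tabulate). The eight outcomes are invented for illustration, and the 1 to 4 coding is assumed to match the video's.

```python
from collections import Counter

# Coding assumed from the video: 1=beach, 2=pier, 3=private, 4=charter.
LABELS = {1: "beach", 2: "pier", 3: "private", 4: "charter"}


def percent_frequencies(codes):
    """Percent of observations in each category (a one-way frequency table)."""
    counts = Counter(codes)
    n = len(codes)
    return {LABELS[k]: 100.0 * counts[k] / n for k in sorted(counts)}


# Made-up outcomes for 8 people, NOT the 1,182-observation fishing data.
mode = [4, 4, 3, 2, 4, 1, 3, 4]

table = percent_frequencies(mode)
mean_code = sum(mode) / len(mode)  # what summarizing the codes reports

print(table)      # {'beach': 12.5, 'pier': 12.5, 'private': 25.0, 'charter': 50.0}
print(mean_code)  # 3.125, which is not a category at all
```

The mean of 3.125 would change if the categories were simply renumbered, while the percent table would not, which is why only the table is worth reporting.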
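To see why the mlogit coefficients, and even their signs, change when the base outcome changes while the fitted probabilities do not, here is a hypothetical Python sketch of multinomial logit probabilities with one regressor (income). The coefficient values and the helper names mnl_probs and rebase are made up for illustration; they are not Stata output.

```python
import math


def mnl_probs(x, coefs):
    """Multinomial logit choice probabilities for a single regressor x.

    coefs maps each alternative to (intercept, slope). The base alternative
    carries (0.0, 0.0): that normalization is why it shows no coefficients.
    """
    utils = {j: a + b * x for j, (a, b) in coefs.items()}
    top = max(utils.values())  # subtract the max before exp() for stability
    expu = {j: math.exp(u - top) for j, u in utils.items()}
    total = sum(expu.values())
    return {j: e / total for j, e in expu.items()}


def rebase(coefs, new_base):
    """Re-express every coefficient relative to a different base alternative."""
    a0, b0 = coefs[new_base]
    return {j: (a - a0, b - b0) for j, (a, b) in coefs.items()}


# Hypothetical coefficients with charter as the base (NOT the video's estimates).
coefs_charter = {
    "beach": (-1.0, 0.03),
    "pier": (-0.5, -0.10),
    "private": (0.2, 0.09),
    "charter": (0.0, 0.0),
}
coefs_pier = rebase(coefs_charter, "pier")

income = 4.0  # monthly income, thousands of dollars
p_charter_base = mnl_probs(income, coefs_charter)
p_pier_base = mnl_probs(income, coefs_pier)

for alt in coefs_charter:
    print(f"{alt:8s} slope vs charter {coefs_charter[alt][1]:+.2f}  "
          f"slope vs pier {coefs_pier[alt][1]:+.2f}  "
          f"P = {p_charter_base[alt]:.4f} / {p_pier_base[alt]:.4f}")
```

Re-basing subtracts the new base's coefficients from every alternative, which shifts all utilities by the same amount and leaves the probabilities unchanged; the slope on pier is negative relative to charter, while the slope on charter is positive relative to pier.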
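The multinomial logit marginal effects reported by mfx with predict(outcome(#)) follow the standard formula dP_j/dx = P_j (b_j - sum_k P_k b_k), evaluated at a chosen x such as mean income. The Python sketch below uses hypothetical coefficients and an assumed mean income of 4.0, checks the formula against a finite difference, and shows why the effects across outcomes sum to zero.

```python
import math


def mnl_probs(x, coefs):
    """Multinomial logit probabilities; coefs maps alternative -> (intercept, slope)."""
    utils = {j: a + b * x for j, (a, b) in coefs.items()}
    top = max(utils.values())
    expu = {j: math.exp(u - top) for j, u in utils.items()}
    total = sum(expu.values())
    return {j: e / total for j, e in expu.items()}


def mnl_marginal_effects(x, coefs):
    """dP_j/dx = P_j * (b_j - sum_k P_k * b_k), evaluated at x."""
    p = mnl_probs(x, coefs)
    b_bar = sum(p[k] * coefs[k][1] for k in p)  # probability-weighted average slope
    return {j: p[j] * (coefs[j][1] - b_bar) for j in p}


coefs = {  # hypothetical values, charter as base
    "beach": (-1.0, 0.03),
    "pier": (-0.5, -0.10),
    "private": (0.2, 0.09),
    "charter": (0.0, 0.0),
}
x_bar = 4.0  # stand-in for mean income

me = mnl_marginal_effects(x_bar, coefs)

# Check against a central finite difference of the probabilities.
h = 1e-6
up, down = mnl_probs(x_bar + h, coefs), mnl_probs(x_bar - h, coefs)
numeric = {j: (up[j] - down[j]) / (2 * h) for j in coefs}

for j in coefs:
    print(f"{j:8s} analytic {me[j]:+.5f}  numeric {numeric[j]:+.5f}")
print("sum of marginal effects:", sum(me.values()))
```

Because each effect is P_j times a deviation from the probability-weighted average slope, the effects cancel when summed over all alternatives, which is the algebraic reason the probabilities keep summing to one.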
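The multinomial probit replaces the logit's extreme-value errors with normal errors. As an assumption-laden sketch only, the Python below simulates choice probabilities with independent standard normal errors using a crude frequency simulator; it is not how mprobit computes its likelihood, and the utilities are hypothetical. The errors behind the logit (type I extreme value, variance pi^2/6 each) are more spread out than a standard normal, which is one reason the two models put coefficients on different scales and only their marginal effects are directly comparable.

```python
import random


def probit_choice_probs(utils, draws=100_000, seed=2013):
    """Frequency simulator for multinomial probit choice probabilities.

    Assumption of this sketch: each alternative's latent utility is
    utils[j] plus an independent standard normal error, and the chosen
    alternative is the one with the highest latent utility.
    """
    rng = random.Random(seed)
    alts = list(utils)
    wins = dict.fromkeys(alts, 0)
    for _ in range(draws):
        latent = {j: utils[j] + rng.gauss(0.0, 1.0) for j in alts}
        wins[max(latent, key=latent.get)] += 1  # count the winning alternative
    return {j: wins[j] / draws for j in alts}


# Hypothetical systematic utilities (charter as base, fixed at 0).
v = {"beach": -0.9, "pier": -0.6, "private": 0.4, "charter": 0.0}
probs = probit_choice_probs(v)
for j, pj in probs.items():
    print(f"{j:8s} {pj:.3f}")
```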
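The long layout of the conditional_fishing data can be built from a wide layout; in Stata this is usually done with reshape long. The Python sketch below only illustrates the resulting structure: the wide-record fields (price, catch) and all values are hypothetical, and the first three choices mirror the ones narrated in the video. It confirms that income is repeated within a person, that d equals 1 exactly once per person, and that the mean of d is 1/J, 0.25 with four alternatives.

```python
ALTERNATIVES = ["beach", "pier", "private", "charter"]


def to_long(wide_rows):
    """Expand one-row-per-person records into one row per person and alternative."""
    long_rows = []
    for row in wide_rows:
        for alt in ALTERNATIVES:
            long_rows.append({
                "id": row["id"],
                "fishmode": alt,                      # alternative in this row
                "mode": row["mode"],                  # chosen mode, repeated
                "d": 1 if alt == row["mode"] else 0,  # 1 only for the chosen row
                "income": row["income"],              # alternative-invariant
                "p": row["price"][alt],               # alternative-specific price
                "q": row["catch"][alt],               # alternative-specific catch rate
            })
    return long_rows


# Invented wide records; values are illustrative only.
PRICE = {"beach": 100.0, "pier": 100.0, "private": 120.0, "charter": 150.0}
CATCH = {"beach": 0.1, "pier": 0.1, "private": 0.3, "charter": 0.5}
wide = [
    {"id": 1, "mode": "charter", "income": 7.0, "price": PRICE, "catch": CATCH},
    {"id": 2, "mode": "charter", "income": 2.5, "price": PRICE, "catch": CATCH},
    {"id": 3, "mode": "private", "income": 4.0, "price": PRICE, "catch": CATCH},
]

long_data = to_long(wide)
mean_d = sum(r["d"] for r in long_data) / len(long_data)
print(len(long_data), mean_d)  # 12 0.25
```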
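The asclogit model combines one coefficient per alternative-specific regressor (price p, catch rate q) with alternative-specific constants and income slopes measured relative to a base alternative. This hypothetical Python sketch computes those probabilities for one person, the own- and cross-price marginal effects of the kind estat mfx tabulates, and the special case with no alternative-specific regressors, which reduces to a multinomial logit. All coefficient values, and the helpers clogit_probs, price_effect, and with_beach_price, are invented for illustration.

```python
import math


def clogit_probs(person, beta_p, beta_q, case_coefs, base):
    """Conditional logit probabilities for one person.

    person: {"income": float, "p": {alt: price}, "q": {alt: catch rate}}
    beta_p, beta_q: one coefficient each for the alternative-specific regressors.
    case_coefs: {alt: (constant, income slope)} for every non-base alternative;
    the base alternative's pair is fixed at (0.0, 0.0).
    """
    utils = {}
    for j in person["p"]:
        a, g = (0.0, 0.0) if j == base else case_coefs[j]
        utils[j] = (beta_p * person["p"][j] + beta_q * person["q"][j]
                    + a + g * person["income"])
    top = max(utils.values())
    expu = {j: math.exp(u - top) for j, u in utils.items()}
    total = sum(expu.values())
    return {j: e / total for j, e in expu.items()}


def price_effect(probs, beta_p, j, k):
    """dP_j / d price_k: own effect b*P_j*(1-P_j), cross effect -b*P_j*P_k."""
    if j == k:
        return beta_p * probs[j] * (1.0 - probs[j])
    return -beta_p * probs[j] * probs[k]


# Hypothetical person and coefficients (pier as base), NOT the video's estimates.
person = {"income": 4.0,
          "p": {"beach": 100.0, "pier": 100.0, "private": 120.0, "charter": 150.0},
          "q": {"beach": 0.1, "pier": 0.1, "private": 0.3, "charter": 0.5}}
beta_p, beta_q = -0.02, 1.5
case_coefs = {"beach": (0.5, 0.02), "private": (0.8, 0.05), "charter": (1.5, 0.03)}

probs = clogit_probs(person, beta_p, beta_q, case_coefs, base="pier")
own = price_effect(probs, beta_p, "beach", "beach")
cross = price_effect(probs, beta_p, "beach", "charter")
print(f"own-price effect on P(beach):   {own:+.5f}")    # negative when beta_p < 0
print(f"cross-price effect on P(beach): {cross:+.5f}")  # positive when beta_p < 0


def with_beach_price(delta):
    """Copy of person with the beach price shifted by delta."""
    return {**person, "p": {**person["p"], "beach": person["p"]["beach"] + delta}}


# Finite-difference check of the own-price effect.
h = 1e-6
up = clogit_probs(with_beach_price(h), beta_p, beta_q, case_coefs, "pier")
down = clogit_probs(with_beach_price(-h), beta_p, beta_q, case_coefs, "pier")
numeric_own = (up["beach"] - down["beach"]) / (2 * h)

# With beta_p = beta_q = 0 only the case-specific part remains: a multinomial logit.
mnl_only = clogit_probs(person, 0.0, 0.0, case_coefs, base="pier")
```

Changing the base alternative here would alter only case_coefs, never beta_p or beta_q, which matches the observation that the price and catch-rate coefficients stay the same across the two asclogit runs.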
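In the mixed logit, the price coefficient varies across people, so a choice probability is a logit probability averaged over the distribution of that coefficient. The sketch below assumes a normally distributed price coefficient and uses plain pseudo-random draws; mixlogit is a user-written command whose own simulation settings may differ, and all values and helper names here are hypothetical. With a standard deviation of zero it collapses to an ordinary conditional logit.

```python
import math
import random


def logit_probs(utils):
    """Standard logit probabilities from a dict of utilities."""
    top = max(utils.values())
    expu = {j: math.exp(u - top) for j, u in utils.items()}
    total = sum(expu.values())
    return {j: e / total for j, e in expu.items()}


def mixed_logit_probs(prices, fixed_utils, b_mean, b_sd, draws=2000, seed=7):
    """Simulated mixed logit probabilities for one person.

    On each draw the price coefficient is sampled as N(b_mean, b_sd**2);
    the resulting logit probabilities are averaged over draws.
    """
    rng = random.Random(seed)
    acc = dict.fromkeys(prices, 0.0)
    for _ in range(draws):
        b = rng.gauss(b_mean, b_sd)  # this draw's price coefficient
        p = logit_probs({j: fixed_utils[j] + b * prices[j] for j in prices})
        for j in acc:
            acc[j] += p[j]
    return {j: acc[j] / draws for j in acc}


# Hypothetical three-alternative case, like the reduced data (no charter).
prices = {"beach": 1.0, "pier": 1.0, "private": 0.8}  # e.g. hundreds of dollars
fixed = {"beach": 0.3, "pier": 0.1, "private": 0.0}   # private as reference

mixed = mixed_logit_probs(prices, fixed, b_mean=-2.0, b_sd=1.0)
plain = logit_probs({j: fixed[j] - 2.0 * prices[j] for j in prices})
homog = mixed_logit_probs(prices, fixed, b_mean=-2.0, b_sd=0.0)

for j in prices:
    print(f"{j:8s} mixed {mixed[j]:.4f}  fixed-coefficient logit {plain[j]:.4f}")
```

A significant estimated standard deviation for the price coefficient corresponds to b_sd being meaningfully above zero here, that is, to people responding to price differently.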
Info
Channel: econometricsacademy
Views: 92,941
Rating: 4.8979592 out of 5
Keywords: Probit Model, Stata, Logit model Stata, Probit model Stata, Logit, Econometrics, Multinomial models, Logistic Regression, Logistic regression Stata, Econometrics (Field Of Study), Logit model, Stata Software, Probit, Mixed logit model in Stata, Econometrics Stata, Econometrics Academy
Id: iqypob4My4o
Length: 29min 10sec (1750 seconds)
Published: Sun Feb 10 2013