Logistic and Multinomial logistic regression on SAS

Video Statistics and Information

Captions
Hello, today we're going to learn how to do logistic regression in SAS. I'm going to open up SAS and my logistic regression code here. I'm going to do a very simple case, which looks at the number of hours that you have studied and whether you have passed the test or not.

I have my data step, where I have an ID, which identifies the student, and I have whether you have passed or not. In logistic regression your dependent variable has to be dichotomous; it's 1 or 0. In this case, if you have passed it is a 1, and if you have not passed it is a 0. And I have hours, which is the number of hours that you have studied. I'm going to run the data step first, and it has probably run correctly.

Now I'm going to run my logistic regression. It's very simple: it's proc logistic, and I have my data, and I do have to set it as descending. My dependent variable is 0 or 1, with pass as 1 and fail as 0, but by default proc logistic works the other way around and treats 0 as the event. So I have to set it as descending. And this is my model: my dependent variable is pass and my independent variable is the number of hours that you have studied. Let's run this and see what we get.

Proc logistic is running, and this is the result that I get. Where do we look? The key thing we're going to look at is the parameter estimate for hours. It gives me an estimate of 0.4949 with a standard error of 0.3841, and it gives me an odds ratio of 1.64, and this is my 95 percent confidence interval. So my odds ratio point estimate is 1.64. How do I get this? If I take the exponent of 0.4949, then I get my point estimate of 1.64, which means that the odds ratio is 1.64, meaning that my odds of
passing over failing will be multiplied by 1.64 if I study one additional hour. So it seems like a reasonable estimate: obviously, if I study more, then I probably have a higher likelihood of passing the test. However, is it significant or not? It doesn't look significant, because my 95 percent confidence interval includes 1. I can also check my p-value, and it is not significant at our normal significance level, which is the 5 percent level (95 percent confidence).

So that was the very simple example. Let's go to a more interesting example. I'm going to read in real data here. Thanks to dmanalytics.org, who provide a data set, I'm going to use a data set from their textbook. I have downloaded it into this file here, and the file is, by the way, chapter 14 Relay C test; I'm just going to use the test file. I'm going to read it in and call it retail relay. It's in Excel format, and the sheet is contained in Relay test dot CSV. It does have a header, so that's why I set getnames as yes. I'm going to read this in, and it should be in my work directory. Just to check where it is, let's look in my work directory. Here it is; this is the file that I have downloaded.

It has a lot of variables, but you can probably guess that the one we will be using as a dependent variable is retained, because we want to see whether the customer has been retained or not. And there are several independent variables that we can include in order to estimate our logistic regression.

Having that in mind, the first thing that we probably want to do is run a logistic regression with retained as our dependent variable and average order, which means the amount that the user has spent on an average
purchase. We want to see whether this is an important independent variable, because it kind of makes sense: if somebody is placing larger orders in terms of dollars, maybe that's a good indication of this person being retained as a customer. So I'm going to run the logistic regression here, and this is the result. I get a point estimate of 0.000467 and an odds ratio point estimate of 1. So it doesn't really seem that this is an important way to estimate our dependent variable. The odds, which in this case are the probability of being a customer over the probability of not being a customer, are increasing by nothing: if I spend one additional dollar on my purchase, the odds of being a customer stay the same. So it's meaningless to use the dollar amount as a way to see whether this person is retained as a customer or not.

That is a somewhat interesting result, but let us try a different independent variable. In this case I'm going to use doorstep. Doorstep is whether this customer ordered a doorstep service for their delivery. If somebody is ordering doorstep delivery, you would think they are spending a lot of money, because they have a lot of groceries that need to be delivered to their doorstep rather than to the local pickup place. So maybe this is a good indication of being retained. I'm going to run my logistic regression and see what happens.

Now I get an estimate of 1.0099, and that's an odds ratio point estimate of 2.745. That seems quite high, meaning that if a customer orders doorstep, the odds of retaining that customer are multiplied by 2.745, an increase of 1.745 times the original odds, so that's quite a nice
increase. Is it significant or not? It is significant, because the confidence interval does not contain 1; my lower confidence limit is actually over 1. You can also cross-check that with the p-value here, which is below 0.001, so it's very much significant. So if somebody is ordering doorstep delivery to their house, that person is very likely to be retained as a customer.

So that was the logistic regression part. Now let's try a multinomial logistic regression. In some cases your dependent variable isn't dichotomous; maybe you have a polytomous dependent variable, meaning more than two outcomes in the dependent variable. What I did was take the original data set, retail relay, and divide average order into three groups. A person who has an average order of less than $100 I'm going to call group 1; somebody who has over $100 but less than $200 I'm going to call group 2; and somebody with an average order of more than $200 is going to be group 3. In some cases you want a different segmentation of your customers, so you divide them by how large an average order they have placed. Let's see if that is a good way of trying to understand logistic regression in a multinomial way.

I'm going to create my data set here. So I've created my data set, and let's try to run a multinomial logistic regression. Running a multinomial logistic regression is pretty much the same: you can use proc logistic here. My dependent variable is group, the new variable that I have just created, which has three outcomes, 1, 2 and 3, and I'm going to use
doorstep again as my independent variable. I want to see whether using doorstep delivery can in some way predict the group. In order to run a multinomial logistic regression I have to specify a link, so I set my link as glogit, which is the generalized logit. I'm going to run this. I forgot to put my quit there, just to make my SAS program happy. So I'm going to run it and see what happens.

This is the result that I get. The part of the output that I need to look at is the maximum likelihood estimates, and then the odds ratios. In all of the cases, even though group has three outcomes, SAS is using group 3 as my reference. If I want to change that, I could put descending at the end of this part, but I have decided not to do that in this case. If you used descending here, you would probably get a reversed outcome, meaning that group 1 would be your reference. So let's leave it as it is.

Now I'm going to look at my results. Doorstep gives estimates for each group compared to the reference, the third group. Doorstep 1 means that, in reference to the third group, my point estimate is negative 1.3588, or, if I look at my odds ratio, it's 0.257. Meaning that if I'm in group 1, I am 0.257 times as likely to order a doorstep service compared to group 3. This is a value that is less than 1, meaning that group 1 is much less likely to order a doorstep service than group 3. That kind of makes sense, because group 1 are the people who order less than $100 and group 3 are the people who order more than $200, meaning that if you're
ordering more than $200, you probably have a lot of items and they could be quite heavy, so maybe you would prefer doorstep delivery.

Now going to doorstep 2, the odds ratio point estimate is 0.37, which means that group 2 is 0.37 times as likely to order doorstep delivery compared to the reference group, group 3. Again, remember this is a value that is less than 1, which means that group 2 is less likely to order doorstep delivery than group 3. That also makes sense, because these are people who are ordering between $100 and $200, which is still less than the $200 or more that group 3 is ordering, so they probably don't want to use doorstep delivery. For both cases the point estimate is quite low, and if you look at the confidence intervals, they don't contain 1. So yes, it is significant that they are less likely to order doorstep delivery, and that is also confirmed by the p-values here.

You can also run the multinomial logistic regression using proc catmod. It's a little bit more complicated, because you have to add a couple more lines that proc logistic did not need. For catmod you have to declare doorstep as a direct variable, since it is your independent variable; you say that your response will be logits; and your model is group, which is your dependent variable, with doorstep as your independent variable. If I run this, I should get identical results. And yes, you have probably noticed that I do get identical results to the first case. So that's how you would do the logistic regression or the multinomial logistic regression for this case.

So today we reviewed logistic regression and multinomial logistic regression, and I hope this video was helpful. Thank you.
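The captions describe the first example's code without showing it. A minimal sketch of what that code might look like follows. The data set name `study` and the ten data rows are made up for illustration and are not the values used in the video; only the variable names (`id`, `pass`, `hours`) and the estimate 0.4949 come from the narration.

```sas
/* Hypothetical toy data: these rows are invented for illustration
   and are NOT the data shown in the video. */
data study;
  input id pass hours;
  datalines;
1 0 1
2 0 2
3 1 2
4 0 3
5 1 3
6 0 4
7 1 4
8 1 5
9 0 5
10 1 6
;
run;

/* DESCENDING makes pass=1 the modeled event; without it,
   PROC LOGISTIC models the lower ordered value (pass=0). */
proc logistic data=study descending;
  model pass = hours;
run;

/* The odds ratio is the exponent of the parameter estimate.
   0.4949 is the estimate quoted in the video; this step writes
   a value of about 1.64 to the log. */
data _null_;
  odds_ratio = exp(0.4949);
  put odds_ratio= 6.2;
run;
```

Because the rows above are invented, the estimates from this sketch will not match the 0.4949 and 1.64 quoted in the video; only the final step reproduces the exponent arithmetic.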
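The retail example can be sketched in the same way. Everything named here is an assumption: the file path is a placeholder, and the variable names `retained`, `avgorder` and `doorstep` are guesses based on the narration, so they should be checked against the actual column headers. How orders of exactly $100 or $200 are grouped is also not stated in the video.

```sas
/* Placeholder path: replace with the location of the downloaded file. */
proc import datafile="/path/to/relay_test.csv"
    out=work.retailrelay dbms=csv replace;
  getnames=yes;   /* first row holds the variable names */
run;

/* Binary model. DESCENDING (assumed here) models retained=1.
   Variable names are assumptions, not confirmed by the video. */
proc logistic data=work.retailrelay descending;
  model retained = doorstep;
run;

/* Split average order into three groups. Boundary handling
   (exactly 100 or 200) is an assumption. */
data work.retail_group;
  set work.retailrelay;
  if missing(avgorder) then group = .;
  else if avgorder < 100 then group = 1;
  else if avgorder < 200 then group = 2;
  else group = 3;
run;

/* Multinomial model with the generalized logit link.
   Without DESCENDING the last level (group 3) is the reference. */
proc logistic data=work.retail_group;
  model group = doorstep / link=glogit;
run;

/* Same model in PROC CATMOD: DIRECT treats doorstep as a
   quantitative predictor, RESPONSE LOGITS requests generalized logits. */
proc catmod data=work.retail_group;
  direct doorstep;
  response logits;
  model group = doorstep;
run;
quit;
```

PROC CATMOD is an interactive procedure, which is why the sketch ends it with `quit`.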
Info
Channel: Jinsuh Lee
Views: 6,838
Rating: 4.9 out of 5
Id: OERhIyyLmlU
Length: 15min 6sec (906 seconds)
Published: Sat Nov 19 2016