Logistic and Multinomial Logistic Regression on SAS Enterprise Miner

Video Statistics and Information

Captions
Hello. Today we're going to do logistic regression and multinomial logistic regression in SAS Enterprise Miner. I've opened SAS Enterprise Miner, and I'm going to start a new project. I'm going to call it logit, for logistic, and mlogit, because I'm going to run both cases, and as my directory I'm going to use my usual SAS directory on my C drive. Once I do that, the project is created, and it seems that it has been created successfully.

Now I need to load the library where my data files are, and my data files are located in my SAS directory. So I'm going to create a new library, which I'll just call mylib, and the location is again my SAS directory. I click Finish, and everything is set to be used.

Now I'm going to select a data source. My data source is Retail Relay, from the textbook Cutting-Edge Marketing Analytics; you can also download it from their website. So I'm going to use Retail Relay and click through to Finish, and now I have the data set available. It's called Retail Relay and, like I said, it is a textbook data set that you can download from the Cutting-Edge Marketing Analytics web page at dmanalytics.org; you can probably find it there.

The data set looks like this. It has a few dependent variables, but what we're really interested in is retained, which is the dependent variable that we will be using. It indicates whether the customer has been retained or not. There are several independent variables that we can use, and I think we're probably interested in the average amount that customers have been spending. By the way, I created a group variable that is not included in the original data set. It simply looks at how many dollars
they have spent and expresses that in terms of groups: if you are in group 1, you spent less than $100 per order; if you are in group 2, you spent on average between $100 and $200; and if you are in group 3, you spent more than $200.

With that picture of the data set in mind, we are going to run a logistic regression. The first thing we have to do in SAS Enterprise Miner is create a canvas, which I'll just call my canvas. I get an empty canvas shortly, and there we go. The usual way of working in SAS Enterprise Miner is drag and drop. First I drag and drop my data, and then I go to the Model tab. To run the logistic regression I need the Regression node, which is on the Model tab, the third node from the right. Now I have a Regression node and my data node, and I connect my data to the Regression node.

As of now, if I run the regression, it's not going to work. I need to specify which variables are the independent variables and which is the dependent variable. I set these for Retail Relay by right-clicking the data node and choosing Edit Variables. You'll notice that there are a lot of different roles that can be assigned here. Remember, I'm going to use retained as my dependent variable, so that's going to be my target. In terms of level, this dependent variable has only two outcomes, so I'm going to specify it as binary. In the first case I'm going to use average order amount as my independent variable, so it stays as an input, but I'm not going to use any of the other variables in my logistic regression. So what I
have to do is set all of these to Label, meaning that I'm not going to use them as independent variables. If I leave them as Input, SAS Enterprise Miner assumes that they are independent variables and includes them in the logistic regression. So everything that I do not want to include as an independent variable I set to Label. Now I have changed everything, so my only dependent variable is retained and my only independent variable is average order amount in dollars. I click OK.

Now I'm ready to run, but I need to check one last thing. If I click on the Regression node, I have properties and values, and if I scroll down you'll notice a class target section with a regression type and a link function. I want to set the regression type to logistic regression; if you haven't touched anything, it will already be logistic regression. I want to use logit as my link function. You can change the type to linear regression if you want, and you can always choose a different link function, such as probit or complementary log-log, but I'm going to use logit for now.

It seems that I'm ready to run, so I'm going to run this. It is currently running; SAS Enterprise Miner takes a few moments to finish. It seems that it has run to completion, so I'm going to look at my results. I go to the output and look at the regression estimates and the odds ratio estimates. For average order amount, the estimate is 0.000467 and the odds ratio is 1. So in this case, adding a dollar to your average order amount really doesn't help in terms of the odds ratio, because remember, the odds are the
probability of being retained over the probability of not being retained, and an odds ratio of 1 means that those odds stay the same when the average order amount goes up by a dollar. So the amount spent is really not helping to explain whether you retain that customer or not.

So let's look at a different variable that might be more important. This time I'm going to use the number of emails that I have sent as my independent variable, so I set it to Input, and I'm not going to use order amount again, so I set that to Label. Now I want to see whether the number of emails that I have sent to a customer has a positive effect on whether they will be retained. So I run the logistic regression again with the new independent variable. The regression is running again, and now I can see the results.

I scroll down to look at my estimates. The number of emails sent has a point estimate of 0.2052, and the odds ratio is 1.228. That means that if I send one additional email to a customer, the odds of being retained rather than defecting are multiplied by 1.228. So the odds of retention increase, which is nice, and this is a highly significant estimate. It seems that the more emails you send to your customers, the more it helps in retaining them, so that is quite positive.

Now let's do a multinomial logistic regression, which means that your dependent variable doesn't have just two outcomes like retained, which is zero or one. In this case we want multiple outcomes, and remember, there was the
group variable, which has 1, 2, and 3 as three different outcomes. So in this case we want to use this group variable as our dependent variable. Remember, group is 1 if your average order is less than $100, 2 if it is between $100 and $200, and 3 if it is more than $200. So I want to segment the customers, and we'll use that segmentation as our dependent variable.

I drag and drop another data set, which is actually the same data set, but I'm going to use a different type of regression. This time I rename the node, because it is not a binary regression but a multinomial logistic regression, so I'll just call it mlogit. I connect the two nodes again, and I have to edit my variables again. Remember, I'm going to use group as my dependent variable, so this will be my target. In terms of level, it's not binary this time; it has to be nominal, because I have more than two outcomes.

As my independent variable I'm going to use doorstep, which indicates whether the customer has used the doorstep service for delivery. So I want to see whether using the doorstep delivery service helps predict the group variable. All the other variables have to be made inactive; they are all set to Input right now, so apart from doorstep I set all of them to Label. There we go. The only dependent variable now is group, which is nominal, and my only independent variable is doorstep, whether or not you have signed up for the doorstep service. So I'm going to
click OK, and now I'm going to run this. Enterprise Miner is running; it takes a little time, but now it seems that it has finished, so I'm going to look at my results. I scroll down to where I can see my estimates. Remember, there were three groups, and group 1 is the reference, so no result is reported for group 1. I do get results for doorstep in group 2 and doorstep in group 3.

Let's look at group 3 first. It has a point estimate of 1.3577, and the odds ratio is 3.887. That means that the odds of using the doorstep service in group 3 are about 3.89 times the odds in group 1. That makes sense, because group 3 customers order more than $200 on average and group 1 customers order less than $100. The more you purchase, the heavier the order probably is, so you are more likely to ask for the doorstep service.

The same holds for group 2. Comparing group 2 with group 1, the odds ratio is 1.438, so the odds of asking for the doorstep service in group 2 are about 1.44 times the odds in group 1. That also makes sense, because group 2 customers order between $100 and $200 and are more likely to ask for that service than customers who order less than $100.

So in this example we went through logistic regression and multinomial logistic regression, and you can
try changing your independent variables around and run your own logistic and multinomial logistic regressions. All right, thanks for watching this video.
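
The odds ratios read off in the video follow directly from the coefficient estimates, since under a logit link the odds ratio for a one-unit increase is exp(estimate). The Python sketch below is not part of the video and does not use SAS; it only redoes that arithmetic with the three estimates quoted in the captions. The function names `odds_ratio` and `spend_group` are hypothetical, and the handling of orders of exactly $100 or $200 is an assumption, because the captions do not say which group those fall into.

```python
import math


def odds_ratio(coefficient: float) -> float:
    """Odds ratio implied by a logit coefficient for a one-unit increase."""
    return math.exp(coefficient)


def spend_group(avg_order_dollars: float) -> int:
    """Assign the spending group described in the video.

    Assumption: orders of exactly 100 or 200 dollars fall into group 2;
    the captions only say "less than 100", "between 100 and 200",
    and "more than 200".
    """
    if avg_order_dollars < 100:
        return 1
    if avg_order_dollars <= 200:
        return 2
    return 3


# Coefficient estimates as quoted in the captions.
estimates = {
    "average order amount (binary logit)": 0.000467,
    "number of emails sent (binary logit)": 0.2052,
    "doorstep, group 3 vs group 1 (multinomial logit)": 1.3577,
}

for name, beta in estimates.items():
    print(f"{name}: estimate={beta}, odds ratio={odds_ratio(beta):.3f}")

# Example group assignments for three hypothetical average order amounts.
for amount in (45.0, 150.0, 320.0):
    print(f"average order ${amount:.2f} -> group {spend_group(amount)}")
```

Rounded to three decimals, this gives 1.000 for average order amount, 1.228 for emails sent, and 3.887 for doorstep in group 3, which matches the odds ratios reported in the output shown in the video.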
Info
Channel: Jinsuh Lee
Views: 7,110
Rating: 4.6666665 out of 5
Id: 9TQLIU3M0YE
Length: 16min 39sec (999 seconds)
Published: Sat Nov 19 2016