Decision Trees in SAS Enterprise Miner

Video Statistics and Information

Captions
All right, in this video I'd like to talk about decision trees, assuming that you have some data that has been partitioned. Notice in this Data Partition node that I have right here, 50% of the data is going to training and 50% is going to validation. I can then create my first model: under the Model tab I pull down a Decision Tree node and connect it to the partition.

First I'm going to demonstrate the creation of a maximal tree. The decision trees are created in every case using a chi-square-driven algorithm, and then they can either be pruned, if you run an automatic procedure, or left unpruned, which gives the maximal tree. To demonstrate that, over in the Properties panel I go to Interactive under Train. Once that's open, I see the first split point that's available. There are a couple of different things I could do here, but if I click Split Node I can take a manual approach. The chi-square-based number that is created is called log worth, and what it shows us is which variable is best to use for splitting our data to get better predictions of a 1 or 0 outcome, in this case donor versus non-donor. If I click Edit Rule, I can see the point within that variable's values where it is best to split. So two different things are happening: the algorithm scans every single variable to find which one is best, and then, within each variable, it finds the best point to split at. If I click this, it creates that first split point, and I could continue in this interactive fashion. We can tell just by the magnitude of the log worth that the second split is less important than the first in terms of predicting whether somebody is a one or a zero, and we can also see what the split point is.

We could do this manually all the way down, or I can right-click and say Train Node, then right-click and say View, Fit to Page, and this is our maximal tree: using this chi-square-inspired algorithm, it selects all the variables, and all the split points within the variables, to create a tree for making decisions. If I zoom back in a little bit, one thing I can see here, just to explain it, is that the first variable was gift count 36 months. If the value there is less than 2.5, then we already know that within our training data set there's a 57% chance it's a non-donor and a 43% chance it's a donor. Better yet, we know that in the data that has not been seen yet, the validation data set, it's also almost 57% no and 43% yes, and that is actually the more important thing to be considering.

Now, the maximal tree is generally not going to be your best tree to use; you actually want one that's trimmed down. The reason is that if you specify that many splits, you have usually over-optimized for the particular data set you trained with and over-complicated things. If you had simply chosen fewer split points, you would probably get a higher accuracy rate when predicting cases in a data set that hasn't been seen yet, because what we've done is over-train, or over-optimize, our tree for one particular set of data: it's picking up a great many nuances in that data that may just be random. So typically a smaller tree will be better. I'm going to rename this one "regular decision tree", and to run it I don't have to change anything, I just right-click and hit Run. But before I do that, let me just point out a
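To make the log worth idea more concrete, here is a small sketch in Python. It is not SAS Enterprise Miner's actual implementation, and the donor counts are made up; it just shows how a chi-square statistic on a 2x2 table for one candidate binary split can be turned into a log worth, where a larger value means a more important split:

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 split table:
                 donor  non-donor
    left branch    a        b
    right branch   c        d
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def log_worth(a, b, c, d):
    """Log worth = -log10(p-value) of the chi-square test (1 df)."""
    x2 = chi_square_2x2(a, b, c, d)
    # Survival function of chi-square with 1 df: P(X > x2) = erfc(sqrt(x2/2))
    p = math.erfc(math.sqrt(x2 / 2))
    return -math.log10(max(p, 1e-300))  # guard against log10(0)

# Hypothetical counts for a split like "gift count 36 months < 2.5"
print(log_worth(430, 570, 620, 380))  # strongly separates the classes
print(log_worth(500, 500, 510, 490))  # barely separates them
```

The interactive split-node view effectively does this for every candidate variable and every candidate split point, then ranks them by log worth.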
couple of things. There's this Subtree property right here, and the way the Decision Tree node works is that it first creates the maximal tree and then trims it down to work best against the validation set. There are a couple of different ways you can define "best." Because the node knows the target outcome is 1 or 0, by default it optimizes the branches for making a 1-or-0 decision with the particular data set we have. There is another option, which I'll demonstrate in a minute, but for now we'll stick with trimming the tree by checking against the validation data set, looking for the best alignment of ones and zeros that we can get. So I simply right-click and say Run; it creates a new tree that starts from the maximal tree and then tries several different ways of trimming it. We get this Results window, and notice how this tree looks very different from the first one: it has the same initial split the other one did, but it is much smaller.

Let me now create one last tree, which I'll call a probability tree, and alter the way it trims the tree: I'm going to use average squared error. Instead of optimizing for a 1-or-0 decision, this optimizes for the numeric outcome in any given leaf. Remember, when we visualized the leaves before, we saw things like a 57% chance and a 43% chance of one outcome or the other; here we are trying to get the prediction as close as we can to that number. So if we're trying to estimate something other than a yes/no decision, such as the amount of money people are going to spend, and we want to get as close to that number as we can, then average squared error would be the right way, and the best way, to trim the tree. It just depends on what you're trying to predict: if you're trying to predict yes/no, a decision measure is the right thing to use to trim your tree; if you're trying to predict a number as your outcome, you'll want average squared error.

All right, here we have another tree. Again it started with the same initial split, and behind the scenes it had the same maximal tree to begin with, but then it trimmed off leaves and split points based on how well the predicted numbers matched on the training data versus the validation data, and note that with this pruning method it came up with a different number of leaves.

Now, a question we should ask is: which one is actually the best, meaning which of the trees we made has the highest accuracy in predicting outcomes? Because we're making decisions, one of the best things to look at is the misclassification number in this little table labeled Fit Statistics. We want to look at two different things there. Misclassification rate is the percentage of the observations that the tree gets wrong: you build your tree to tell you, say, whether somebody with a value greater than 2.5 is going to give or not give, and this rate asks how often that classification missed. Given the tree we've created here, we're wrong 43 percent of the time in the validation data, which is what we want to look at: we trained with our training data, then tested on data the tree had never seen before, and it was wrong 43 percent of the time. The other measure is average squared error; that's another way to look at it, and we want the lowest number for this as well. So you could use either of these numbers to analyze, compare, and contrast
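As an illustrative sketch (with made-up leaf probabilities and outcomes, not SAS output), the two assessment measures just described can be computed like this: misclassification rate compares the 0/1 decision implied by each case's leaf against the actual outcome, while average squared error compares the leaf's predicted probability against it:

```python
def misclassification_rate(p_hat, actuals):
    """Fraction of cases classified wrongly, using a 0.5 cutoff
    on each case's leaf probability to make the 1-or-0 decision."""
    decisions = [1 if p >= 0.5 else 0 for p in p_hat]
    return sum(d != y for d, y in zip(decisions, actuals)) / len(actuals)

def average_squared_error(p_hat, actuals):
    """Mean of (predicted probability - actual outcome)^2; rewards
    leaves whose probabilities track the outcomes closely, not just
    leaves that land on the right side of the 0.5 cutoff."""
    return sum((p - y) ** 2 for p, y in zip(p_hat, actuals)) / len(actuals)

# Each validation case carries the donor probability of the leaf it fell into
p_hat   = [0.43, 0.43, 0.43, 0.70, 0.70, 0.20, 0.20, 0.20]
actuals = [0,    1,    0,    1,    0,    0,    0,    1]

print(misclassification_rate(p_hat, actuals))
print(average_squared_error(p_hat, actuals))
```

Pruning with the decision measure tries to minimize the first number on the validation data; pruning with average squared error minimizes the second, which is why the two settings can keep different numbers of leaves.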
which of the models is best. Because it takes a little time to go in and out of the windows, I won't do it here, but I would recommend that you look into each of these models, check their misclassification rate and their average squared error, and go with whichever one has the lowest.

One last thing I'd like to mention: after you've viewed the results of a model, if you go to View, Model, Subtree Assessment Plot, you can see what happened as it grew from the first split out to the maximal tree with 15 leaves. Let me go over to misclassification rate first. This gives you a sense of what's going on behind the scenes: as the tree grew toward the maximal tree, the misclassification rate on the training data kept getting better, down to about forty percent; however, that same tree, with all those different leaves, all the different categories you can be dropped into, didn't perform as well on the validation data; it was about three percent worse there. The blue line marks the simplest, smallest tree that had the best performance on the validation data set, which is what we want: something that predicts really well on data it has never seen before, in terms of predicting a one or a zero, a donor or a non-donor. Here the blue line indicates that five leaves is the tree that does that.

You could also look at average squared error and see much the same story, if you're going to judge your model on how well it predicts numerical outcomes rather than a one-or-zero class (one and zero are numbers too, of course, but I mean something like how much someone is going to spend, or, if you want to gauge it as accurately as possible, the percentage likelihood of doing something, rather than a simple one/zero decision). Again we can see that as leaves were added, all the way to the 15-leaf maximal tree, the squared error on the training data got relatively low, but the validation data set didn't do as well. We over-trained: we predicted our training data set better and better while predicting the validation data set worse and worse. And again the plot shows that five leaves is where the tree performs best against the validation data.
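The rule the blue line in the subtree assessment plot applies can be sketched in a few lines of Python. The assessment values below are illustrative numbers in the spirit of the plot, not actual SAS Enterprise Miner output; the idea is simply to pick the smallest subtree whose validation misclassification rate hits the minimum:

```python
# Hypothetical validation misclassification rate for subtrees of each size
valid_misclass = {
    1: 0.50, 2: 0.46, 3: 0.45, 4: 0.44, 5: 0.40,
    6: 0.40, 8: 0.41, 10: 0.42, 12: 0.43, 15: 0.43,
}

def best_subtree(assessment):
    """Smallest number of leaves achieving the lowest validation
    misclassification rate, like the blue line in the plot."""
    best_rate = min(assessment.values())
    return min(n for n, rate in assessment.items() if rate == best_rate)

print(best_subtree(valid_misclass))  # 5: simplest tree at the minimum
```

Note that 6 leaves ties the minimum here, but the simpler 5-leaf tree wins the tie, matching the preference for the smallest tree that generalizes best.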
Info
Channel: Degan Kettles
Views: 25,505
Rating: 4.9477124 out of 5
Id: z0WQe5_gWEc
Length: 13min 37sec (817 seconds)
Published: Fri Oct 07 2016