SAS Enterprise Miner - Decision Trees

Video Statistics and Information

Captions
Hi, this is Prashant, and I'll take you through a short video tutorial on SAS Enterprise Miner. Let's begin. To start, we'll create a diagram and name it "demo" for now. Let's go ahead and import our file with the File Import node. You can import a file by clicking Import File and then browsing for it on your computer. We'll select the wine quality data and open it. You can preview the data before importing the file. As you can see, the data has certain attributes of the wine, like residual sugar, citric acid, chlorides, and so on, and we have a quality column, rated from one to ten. This is the target variable that we are going to predict, so let's go ahead and import the file.

Now we can edit the variables here; we'll click Edit Variables. As you can see, we have all the attributes. Residual sugar should be an input, so we'll change its role to Input, and we'll make quality our target variable. You can observe that the measurement levels are Interval; this is because our variables are all numeric. Let's click OK.

Now we're going to do some statistical exploration, so we'll add the StatExplore node, connect the nodes, and run it. As you can see from the results, we have a correlation plot. It shows that alcohol has a positive correlation with the wine quality, and as we go down you see that density has a negative correlation with the target variable, the quality of the wine. If you look at the output, you can see some statistics for the different independent variables, such as the mean and the standard deviation. Moving to the variable worth plot, it shows that alcohol is the most important variable in predicting our target, the quality of the wine.

Now let's go ahead and create a partition for our data. We'll add the Data Partition node, and we'll also add a Control Point; this is because we want to run multiple models from the same point. We'll connect the Data Partition node, and in the properties of the partition we can assign the
partition of the training, validation, and test sets at 60, 30, and 10 percent. Now we'll connect the Control Point, and we'll add a model, which is the decision tree. We'll connect the Control Point to the Decision Tree node, and we can look at the properties of the decision tree. As you scroll down, you can see the maximum depth; we'll set it to 10, which is the number of generations of splits the tree is allowed to grow. We'll set the leaf size to 8 and keep the remaining properties at their defaults. Now we'll run this model and look at the results.

As you can see, the results include a score ranking and the fit statistics, and here is the decision tree that has been created; this is the graphical representation of the tree. If you look at the fit statistics, you can see that the data has been split into train, validation, and test: we have approximately 3,000 observations in train, and validation is around 500. Moving down, the average squared error for the training data set is 0.49, for validation it's 0.54, and for the test set it is 0.55. Moving further down, we can look at the output of the model that we just ran. The variable importance shows that alcohol is the most important variable in splitting our data, and we can also see the number of splits that were made. Moving to the leaf report, you can see the nodes, their depth, and the number of training observations assigned to each node. The depth, as you can see, is not in sequential order; this is because the tree has been automatically pruned.

Now let's go ahead and check the node rules. We can do that by clicking View, then Model, then Node Rules. As you can see, the node rules describe the if-then conditions that were used for splitting the nodes at each step.

So now we will compare our model. We'll add a Gradient Boosting node, and in its properties we'll keep the maximum depth at five. Then we'll add a Model
Comparison node and compare the decision tree with the gradient boosting model. Gradient boosting is nothing but a method that resamples the analysis data several times and generates the result as a weighted average. We'll run this and check the results. As you can see from the fit statistics, gradient boosting performed better than the decision tree; the criterion was the average squared error, which was 0.53 for the gradient boosting algorithm and 0.54 for the decision tree. That's all for this video. Thanks for watching!
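The flow above is built from SAS Enterprise Miner GUI nodes, so there is no code in the video itself. As a rough, hypothetical analogue only (not the video's actual SAS workflow), the same pipeline can be sketched in Python with scikit-learn: a 60/30/10 partition, a decision tree with maximum depth 10 and leaf size 8, a gradient boosting model with maximum depth 5, and a comparison on average squared error. The synthetic data here stands in for the wine quality file, and the SAS nodes use different algorithms internally, so the numbers will not match those in the video.

```python
# Hypothetical scikit-learn analogue of the Enterprise Miner flow in the video.
# Not SAS code: the node settings (depth 10, leaf size 8, boosting depth 5) are
# taken from the video, but everything else here is an illustrative assumption.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 1000
# Synthetic stand-in for the wine quality data; a real run would load the CSV.
X = rng.normal(size=(n, 3))  # stand-ins for e.g. alcohol, density, residual sugar
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0.0, 0.5, n)  # quality-like target

# Data Partition node: 60% train, 30% validation, 10% test.
idx = rng.permutation(n)
tr, va, te = idx[:600], idx[600:900], idx[900:]

def ase(model, rows):
    """Average squared error, the fit statistic compared in the video."""
    return float(np.mean((model.predict(X[rows]) - y[rows]) ** 2))

# Decision Tree node: maximum depth 10, leaf size 8.
tree = DecisionTreeRegressor(max_depth=10, min_samples_leaf=8, random_state=0)
tree.fit(X[tr], y[tr])

# Gradient Boosting node: maximum depth 5, other settings left at defaults.
boost = GradientBoostingRegressor(max_depth=5, random_state=0)
boost.fit(X[tr], y[tr])

# Model Comparison node: prefer the model with the lower validation ASE.
results = {"decision tree": ase(tree, va), "gradient boosting": ase(boost, va)}
print(results)

# Variable importance: the first feature dominates here by construction,
# mirroring how alcohol came out as the most important variable in the video.
print(tree.feature_importances_)
```

On real data the partition would be drawn from the loaded table rather than generated, but the shape of the comparison, fitting both models on the training rows and ranking them by validation average squared error, is the same idea the Model Comparison node implements.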
Info
Channel: Prashant Bhowmik
Views: 24,415
Rating: 4.3846154 out of 5
Keywords: Decision tree, SAS Tutorial
Id: pFBR6iXtOsc
Length: 5min 12sec (312 seconds)
Published: Fri Apr 08 2016