365, ch 10, discriminant analysis in Excel

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
this video will walk you through running a discriminant analysis in Excel we are using the data set train 10 where our dependent variable our Y variable is employee performance it is categorized as satisfactory which gets a one or unsatisfactory or to get to zero and we're going to try to predict that based on the scores on to aptitude tests test1 and test2 so that data is all here our first step is to sort the data by the dependent variable group so notice I have this sorted here pre-sorted come on into 0 1 1 you can you know highlight your data or you can highlight all of your data here sorry go into data and sort and then sort by you could sort by group value smallest to largest and it will you know change that to have all zeros and then all ones or you could have it the other way around as long as you start off sorting it it's fine it doesn't matter which way you do it okay next regress the independent variables on the dependent variable so go into data analysis and down to regression so you're going to run you know essentially a regular regression here where your dependent variable is your categorical variable your 0 or 1 whether or not the performance is satisfactory or not based on their performance on these two tests ok and we do have labels in there I want to put this output on this sheet so I can find it so I'm going to click right there and click OK so now it's going to run my general regression then use the regression equation to predict the independent variable Y hat given your specific or sorry to predict the dependent variable given your independent variable so we're going to do a series of individual predictions in here so I'm going to insert a new column in here and I'm going to call this Y hat so these are my predictions I could also call it predictions if I really wanted to okay so I am going to predict where they will be based on this intercept and that's going to be constant so I'm going to F for that plus their school they're the beta of test one so that's also in we constant so I'm going to F for that times the individual score on test one plus the beta of tests to again that's going to be constant so f for it times the individual score on test - okay and I'm going to carry that all the way down and it's going to predict a series of zeros and ones now I have to see did I get it right or not so I'm going to determine a cut-off point the cutoff point see um is classified classifies you know whether or not you belong in a 0 or 1 because these numbers may be rounded to 0 1 but they might not actually be zeros or ones so I can let's go here see for each of my individual yeah let's highlight all this I'm not really sure why it's not going out um they're rounding to 0 reserved ones but they're not actually giving me a value an exact value of 0 1 because there's nothing that constrains Excel to necessarily do that so here you really saw it rounded I had set to round but here are the actual Y hat values that it's giving you so what I need to do is determine a cut-off point and I'm going to call that C C is going to be based on the number in group 0 so the number of zeroes the number of ones the average value of your discriminant score and average discriminant score for Group 1 and then the average discriminant score for group 2 okay so this is D 1 let's do G 0 bar first and D 1 bar okay so I need to figure out the number of zeros that I had so I count this okay count those those are the number of zeros and there were 20 zeros and then I'm also going to use the count function to count the number of individuals who received a satisfactory scores that was the number of ones okay and there were 23 of those now I need to get the average discriminant score the average y hat for each of those groups so average of those who should be in group 0 what was the average Y hat that I got there and this is why I had originally divided my data into zeros and ones so that I could just go through into these averages relatively easily and then repeat the same thing the average Y hat for those who should have been in Group one ok enter ok so now I have that data and I'm going to use that to calculate something called C which is going to be my cutoff so C is equal to the numerator is n 0 times d 0 bar plus and 1 times d1 bar and that entire thing is my numerator and then I'm going to scale that by the number I have in each group so divided by n 0 n 0 + + 1 and again this equation is this equation is available on your PowerPoint slides ok so this gives me my cutoff point C which I'm going to get as 0.5 3 now 4 this is going to be my classification that will allow me to classify should it be in group 0 or 1 based on my prediction that I have here that I got with my Y hat because obviously my my hats are not actually zeros or 1 so for anything less any value less than that cutoff score I'm going to call it as 0 and for any value greater than that cutoff score I am going to call it a 1 all right so I'm going to go in here insert and let's see so what do I predict here so equals if this value I'm sorry backspace if this value here is less than this value and I'm going to f4 that because that's going to remain the same comma 0 otherwise 1 because if it's less than it should be in group 0 so carry that down all right so these are my actual predictions as to whether or not they will fall into group 0 1 and then I'm going to do a miss classification check so how often did I get it wrong okay so let's see this classification check space all right - classification check I am going to see whether or not I have things correct so I'm just insert another column here and I'm going to say correct you can call it miss classification you can call it whatever so if this cell equals this cell if my prediction is equal to the group it should have been in comma yes otherwise no and I think I need to have these in / in quotations but we'll see now I didn't like that okay so put these over here this needs to be in quotes okay and then so it's telling me did I get it correct well for the first three observations no I didn't get it correct but a large number here I have a whole bunch of yeses so it doesn't look like I did so bad um let's actually you know to evaluate my performance here I would kind of need to know the number of yeses and noes so number no number yes okay and then I'm just going to count the number of nose so count if this array right here this correct column comma to put this whole thing in so count if that range I didn't really need a parentheses around that but whatever that's okay um says no and then count if that same range comma yes okay and then I'm going to figure out my percentage of my percentage correct so my senate percentage correct is going to be my yeses divided by you know my total so this number okay so I correct divided by my total observations here and that needs to be in parentheses to add these two and indicate that they are going to be my denominator and I'll put that in percentage notation so that says that my model correctly predicted whether or not you would receive a satisfactory score roughly 79 percent of the time so it's not great but it's not really all that bad so I can see in here how it would do any individual predictions my interpretations and everything since this is essentially just a regular regression are all going to be the same so my interpretation for the beta of test one for each additional point scored on test one um it's kind of art I can't really say there you know it this is kind of like it increases your likelihood of being in the satisfactory group by point zero six is you know essentially the interpretation but this is basically just used for predictive purposes so given any individual test score I am then able to make a prediction this Y hat then based on that see that discriminant score I'm able to you know fill in whether or not based on my model I predict you would be in group zero or group one so this is essentially they're just for predictive purposes I can however use my p-values to see whether or not each test is significally significant so it looks here like your score on test one is statistically significant but not your score on test two so that's how we use discriminant analysis okay I will stop my recording
Info
Channel: Daria Newfeld
Views: 26,020
Rating: 4.860465 out of 5
Keywords: discriminant analysis, Excel
Id: NaZ6Xuczs94
Channel Id: undefined
Length: 13min 50sec (830 seconds)
Published: Tue Mar 25 2014
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.