Two-step Cluster Analysis in SPSS

Captions
In this video I'm going to show you how to play around with cluster analysis in SPSS. Cluster analysis is really useful if you want to, for example, create profiles of people (the good kind of profiling). You can see whether your employees are naturally clustered around a set of variables. Do I have highly educated employees who are married and have been on the job for a long time? Are there also divorced employees who have been on the job a long time and have a lot of education? And so on for various other clusters. There's a way to explore that here in SPSS, and I'll show you how to do it.

You go to Analyze, Classify, and we're going to use TwoStep Cluster. Let me just reset this. What you'll do is choose categorical and/or continuous variables to use as your clustering variables, your indicators. Let's just choose some of the ones we talked about. Let's go with years in current job. You know, I don't know what marital status means, so let's try years in the firm and education. Oh, these two years variables are probably overlapping way too much. Let me get rid of years in current job and throw in something like reliability. How about that?

Now, these are all continuous variables. If I chose something like gender, that would be categorical, because the numeric values in that variable (you can see over here in the gender column) don't actually mean anything numerically. Those ones and twos just mean male and female. So those are categorical variables. Anyway, I was going to get rid of gender. Gender is a swamping variable: there are only two values possible, so it's very unlikely that I'll end up with more than two clusters unless I force it. Actually, let's go ahead; that's a good example. Let's include gender.

In Options I'll change nothing, and in Output I want to evaluate my clusters on productivity and job satisfaction. These two variables won't be used to determine clusters, but they will be used post hoc, as a sort of afterward analysis, to see how each cluster rates with regard to productivity and job satisfaction. I also want to create a cluster membership variable. This creates a variable at the very end of your data set that shows the cluster membership number for each respondent. So for each row there will be a value, and that value is the cluster number. You can then use that cluster number as a moderating (grouping) variable in other analyses, like a structural equation model.

The other thing to do here is to determine the number of clusters we want to see. Well, I don't want to see 15; that's ridiculous, I wouldn't be able to figure that out. I think four is the maximum I want to see. Or I could specify that I only want four, but I'm going to let SPSS determine the most appropriate number of clusters. Then just hit OK.

All right, the only thing that comes out is a Model Summary, and at first you think, that's it? But it turns out you can double-click it to get more information. What this says is that we had four variables used to extract clusters and we only ended up with two clusters, just like I predicted. I figured gender was probably a swamping variable: there are only two values for it, so we're probably going to end up with a cluster of males and a cluster of females, and all the other variables would be useless. But it says that this set of indicators, those four variables, is actually good for clustering, so that's interesting.

OK, I just double-clicked it; let me expand this a little bit. It looks like the smallest cluster has only 78 members, which is probably fine. You want to keep your smallest cluster above 30 or 35 or so. Another good rule of thumb is to keep the ratio of sizes under three; I like under two. A ratio under two just means that no cluster in your cluster set is more than two times as large as any other cluster.

We can now look at variable importance. We used gender, education, reliability, and years in the firm to create these clusters. Well, it looks like gender is the swamping variable: the clusters were created essentially just based on gender. If we go over here and look at Clusters, we'll see that, sure enough, gender is 100% female in one cluster and 100% male in the other. That means everybody in this cluster was male, so it only clustered based on gender. So this is kind of useless.

I'm actually going to get rid of gender in order to make this more useful. Just close that, rerun the analysis, get rid of gender, and hit OK. We end up with three inputs and three clusters. Not a great fit, but it's not terrible. Double-click it, and here we are: the smallest cluster is 53, probably OK, and the ratio is 2.89, which is under three. Then we get variable importance. Oh man, another swamping variable: education. Well, bother. Let's try a different set of variables. It looks like these demographics probably aren't very good for profiling our employees.

So I'm going to try something else. Get rid of these. Let's see how happy and how productive they are: productivity and job satisfaction. Now, you remember we used these as evaluation fields, so I'm going to get rid of those there. I'll throw burnout in instead; that's probably a good one. We can evaluate the clusters based on burnout. Alrighty, there it is: just these two variables are all I'm going to use for clustering. All the other settings are the same. Hit OK.

It came up with three clusters. You know, though, I want four. I want a high-high, a high-low, a low-low, and a low-high, if that makes any sense. So I'm going to force it. Go back here, choose Specify fixed, set four clusters, and run it. It looks like the fit is just about the same, so it doesn't really matter. Double-click, expand, and here we are. The cluster sizes are pretty good, nice and even, with a 2.35 ratio. That's kind of nice. Look at variable importance: good, both variables were used to determine clusters.

Go over here to Clusters and expand this a little bit. There are a couple of things we need to do. First, you'll notice that the cluster numbers are not in order; it goes four, one, two, three. That's because they're ordered by the size of the cluster: this one has 127, then 62, 62, and 54. You can change that by clicking on this little icon down here, so now they're ordered one, two, three, four. That's pretty convenient. Next, I also want to display the evaluation variable, burnout. You have to click the Evaluation fields checkbox and hit OK, and you'll notice that it just gets added down here at the bottom. Once again, burnout was not used to create the clusters, but we can now see the level of burnout for these different clusters.

To view a cluster, you click on the very top, on cluster number one, and we can see over here on the right (let me scoot this over a bit) that this cluster is those who are not satisfied, so they're pretty unhappy at work, but they're still pretty productive. This is a box plot: the line represents the median, and the box spans the interquartile range, from the 25th to the 75th percentile. So it looks like very unsatisfied but fairly productive, and very burnt out. Yeah, that makes sense.

In cluster two we have fairly unsatisfied but not productive at all, and really burnt out. So whether you work hard or not, I guess if you're not satisfied you're going to be burnt out. Here are the ones that are really satisfied and love their job, but they're not very productive and they're not very burnt out. They're just sort of kicking back and taking it easy at work. And then you have those that are satisfied, are very productive, and are not burnt out at all. These are your golden employees. These are the ones you love. These are the ones that you'll probably punish by making them managers. Anyway, there we go: we can see these meaningful clusters.

Go ahead and close that, and let me close this as well. You can see on the very right side, at the very end of your data set, this TSC variable (that's something about clusters). In this one we have four, three, two, one; it's a one-to-four range. It's the cluster membership number. So respondent one belongs to cluster four, respondent two belongs to cluster three, and so on. You can now use this in ANOVAs as a factor variable, or you can use it in Amos when you're doing structural equation modeling. You can use it just as you would use any other multi-group moderating variable, like gender or job category.

I'll show you how to do it in here with ANOVA. We'd go to Analyze, Compare Means, One-Way ANOVA, and here are these variables: productivity, job satisfaction, burnout. Productivity and job satisfaction aren't good choices because I used them to create the clusters, but I can do burnout, and my factor will be that cluster number down there. Hit OK, and it looks like these clusters do differ with regard to burnout; they don't all have the same level of burnout. Very interesting.

You could also test it in an Amos model. I hope this has been helpful. If you want to learn how to test it in an Amos model, go watch my other video on multi-group moderation in Amos.
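
The cluster-size rules of thumb from the video (smallest cluster above roughly 30, largest-to-smallest ratio under three, or under two if you prefer) can be checked outside SPSS with a few lines of Python. This is only a rough sketch: the function name `size_check` and its parameters are made up for illustration, and the sizes are the four-cluster counts read off the screen in the video (127, 62, 62, 54).

```python
def size_check(sizes, min_size=30, max_ratio=3.0):
    """Apply the video's rules of thumb to a list of cluster sizes.

    min_size and max_ratio are the rough thresholds mentioned in the
    video; they are conventions, not hard statistical limits.
    """
    smallest, largest = min(sizes), max(sizes)
    ratio = largest / smallest  # size of largest cluster relative to smallest
    return {
        "smallest": smallest,
        "largest": largest,
        "ratio": round(ratio, 2),
        "size_ok": smallest >= min_size,
        "ratio_ok": ratio < max_ratio,
    }


# Cluster sizes from the four-cluster solution shown in the video.
four_cluster_sizes = [127, 62, 62, 54]
report = size_check(four_cluster_sizes)
print(report)

# The stricter "under two" preference would flag the same solution.
strict_report = size_check(four_cluster_sizes, max_ratio=2.0)
print(strict_report["ratio_ok"])
```

The ratio here is 127 divided by 54, which matches the 2.35 that SPSS reports for this solution.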
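
To make the last step concrete, here is a minimal sketch of what a one-way ANOVA with the cluster membership number as the factor computes. It is not SPSS's implementation, and the burnout scores below are invented purely for illustration (only the first four membership numbers, 4, 3, 2, 1, echo what is visible in the video). The function `one_way_anova_f` is a hypothetical helper that returns the F statistic and its degrees of freedom; it does not compute the significance value that SPSS prints.

```python
from collections import defaultdict
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up data: one cluster membership number and one burnout score per row.
cluster_ids = [4, 3, 2, 1, 1, 2, 3, 4, 1, 2, 3, 4]
burnout = [1.0, 2.0, 5.0, 4.0, 5.0, 4.0, 3.0, 2.0, 4.5, 4.5, 2.5, 1.5]

# Split the burnout scores by cluster, as the ANOVA factor does.
by_cluster = defaultdict(list)
for cluster, score in zip(cluster_ids, burnout):
    by_cluster[cluster].append(score)

groups = [by_cluster[c] for c in sorted(by_cluster)]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F relative to its degrees of freedom suggests the clusters do not all share the same mean burnout, which is the conclusion drawn in the video from the SPSS output.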
Info
Channel: James Gaskin
Views: 186,561
Rating: 4.902564 out of 5
Keywords: Two-step cluster analysis, SPSS, statistics, AMOS, moderation, multi-group, profiling, profiles, tutorial, demonstration, demo
Id: DpucueFsigA
Length: 11min 42sec (702 seconds)
Published: Mon Mar 19 2012