Two-Step Cluster Analysis in SPSS

Video Statistics and Information

Captions
Hello, this is Dr. Grande. Welcome to my video on conducting a two-step cluster analysis in SPSS. In counseling research, we use cluster analysis to group participants together based on variables that we specify in the analysis. Taking a look at these fictitious data I have loaded in Data View, I have 120 participants, two independent variables (program and gender), and three dependent variables (depression, anxiety, and substance use). Let's assume that these three dependent variables are all recorded using a standard score, specifically a T score, which has a mean of 50 and a standard deviation of 10, and that lower scores represent reduced frequency, duration, and severity of symptoms and higher scores represent increased frequency, duration, and severity of symptoms.

To get started with the cluster analysis, we'll first go to Analyze, then Classify, and select TwoStep Cluster. This is what the dialog looks like by default. The area of interest here is the continuous variables, specifically the dependent variables depression, anxiety, and substance use, so I hold down Control, select those three, and move them over. I'm making no changes under Options, but under Output I want to make sure I check Create cluster membership variable. That creates a new variable that you can see in Data View and that identifies which cluster a specific participant belongs to. Then we'll click Continue. For the number of clusters, you can have SPSS determine the number automatically and set a maximum, or you can specify a fixed number of clusters. By default the maximum is 15. I'm just going to leave it there and click OK.

In the output, you first see the model summary: the algorithm is TwoStep, there are 3 inputs, and it found 3 clusters. It also gives you an idea of the cluster quality, which here is good, but just barely. To access the rest of the output, we're going to double-click on this, and it's going to open up what SPSS refers to as the Model Viewer. I'm going to expand this out a little. You can see SPSS identified 3 clusters, shown over here and on this pie chart as well. The smallest cluster has 26 records and the largest has 49, and it gives you the ratio of the largest cluster to the smallest cluster. In this case the ratio is under 2, which is what we're looking for, although under 3 is often acceptable as well.

Down here at the bottom left of this window, with Cluster Sizes, you can select Predictor Importance, and you can see that substance use had the highest predictor importance, then depression, then anxiety.

Looking at the Model Summary, at the bottom left we can see that it's set to Model Summary. We can select Clusters, and that will provide information on each of the clusters. They're ordered here 2, 3, 1 because it's going by the number of records in each cluster. If you want to order them 1, 2, 3, you just go down to this button here, and now it reorders them 1, 2, 3. As you can see, cluster 1 has 26 records, cluster 2 has 49, and cluster 3 has 45, and it gives you the mean values of the different variables. Even at this level we can see a pattern: the lower scores tended to cluster together, then the mid-range scores tended to cluster together, and then the higher scores, and the largest group was those in that middle range.

If we select an entire cluster by clicking up here at the top, the cluster comparison comes up in the window on the right, and the median is provided for each of these variables. In cluster 1, which we know is the cluster that has the lower scores, substance use and depression moved quite a bit to the left, so they're quite a bit lower, but anxiety doesn't move quite as far. In cluster 2, they're all pretty much in that middle range, although substance use and depression are a little lower than anxiety. In cluster 3, anxiety is a bit higher, depression is even higher than that, and substance use is the highest.

If we were working with actual data here, we would recognize that anxiety appears not to move a lot, while depression and substance use move quite a bit, so it's less sensitive. Perhaps that would be a characteristic of the particular population sampled, or measurement error, or a combination of both, but either way the anxiety dependent variable seems to be less sensitive to movement in the other two domains, substance use and depression. Again, if you look at these three, you see that for clusters 1 and 3, substance use and depression are further left and right, respectively, but in that middle cluster, cluster 2, which is the largest, they're fairly close to the same.

Another useful output: if you select one of these cells, for example substance use in cluster 1, it shows you the cell distribution. The overall distribution is the pink in the background, and the red in the foreground is the substance use cell distribution for cluster 1. You can see the cell distribution is to the left, in the lower range, then for cluster 2 it's in the middle, and for cluster 3 it's toward the right. For depression we can see a similar pattern. For anxiety, this is the lower category, cluster 1, with the lower scores, then cluster 2 and cluster 3, so it's not as distinct with anxiety. The lower cluster does seem to be a bit lower, but the cell distributions for anxiety for cluster 2 and for cluster 3 don't seem to be as different from one another as you see for depression and substance use between clusters 2 and 3.

I'm going to minimize the Model Viewer and the output, and I want to show you the variable that we created in the TwoStep Cluster dialog, which identifies the cluster for each record. For ID 1001, this participant was in cluster 1; the second participant is in cluster 2; the third is in cluster 1; and so on. It gives you the cluster number for each record. If you wanted to look at these records by cluster, you could sort descending or ascending. In this case I'll go with ascending. Cluster 1 tends to represent scores that are a bit lower, so this is where the frequency, duration, and severity of symptoms were a bit lower. You remember cluster 2 is the mid-range cluster, and cluster 3 has a higher frequency, duration, and severity of symptoms. By sorting, you can get a look at the records divided up by cluster.

I hope you found this video on two-step cluster analysis in SPSS to be useful. As always, if you have any questions or concerns, feel free to contact me and I'll be happy to assist you.
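The cluster-size check described in the captions (largest cluster divided by smallest, ideally under 2 and often acceptable under 3) can be reproduced outside SPSS from the membership variable. The following is a minimal Python sketch, not SPSS syntax. The names `reported_sizes`, `membership`, `cluster_sizes`, and `size_ratio` are assumptions made for illustration, and the membership list is rebuilt from the sizes quoted in the video rather than taken from the actual data file.

```python
from collections import Counter

# Cluster sizes as read off the Model Viewer in the video.
reported_sizes = {1: 26, 2: 49, 3: 45}

# Stand-in for the membership column SPSS adds to Data View:
# one cluster number per participant (the order here is arbitrary,
# since the real per-participant assignments are not in the captions).
membership = [c for c, n in reported_sizes.items() for _ in range(n)]


def cluster_sizes(labels):
    """Count how many records fall in each cluster."""
    return dict(Counter(labels))


def size_ratio(sizes):
    """Ratio of the largest cluster to the smallest."""
    return max(sizes.values()) / min(sizes.values())


sizes = cluster_sizes(membership)
ratio = size_ratio(sizes)

print(f"records: {len(membership)}")
print(f"sizes by cluster: {dict(sorted(sizes.items()))}")
print(f"largest/smallest ratio: {ratio:.2f}")
```

With the sizes from the video, the three clusters account for all 120 participants and the ratio is 49 / 26, which is about 1.88, consistent with the "under 2" reading in the captions.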
Info
Channel: Dr. Todd Grande
Views: 28,097
Rating: 4.9085712 out of 5
Keywords: SPSS, two-step cluster analysis, cluster analysis, cluster, classify, independent variables, dependent variables, predictor, predictor importance, ratio, cell distribution, categorical variables, continuous variables, counseling, Grande, Statistics (Field Of Study)
Id: s0e0esZAk7w
Length: 10min 2sec (602 seconds)
Published: Sun Sep 13 2015