Hierarchical Cluster Analysis SPSS

Captions
So I've been learning a lot about cluster analysis lately, and I thought I'd make some videos to log what I've been learning. I'm not an expert yet, I'm not an expert at all, but I figured I'd make these videos to help others along the way who are trying to learn this as well.

There are a few different types of cluster analysis. There's hierarchical, and then there's non-hierarchical, which would be like k-means, which I'll make a video on shortly. And then there's the two-step approach, which I've already made several videos on, and that is a combination of both hierarchical and non-hierarchical. In this video I'm going to show you a hierarchical approach.

I have some data here. This is real data on burgers at various restaurants: Arby's, Burger King, McDonald's, Dairy Queen, Sonic, etc. Let's say I want to know how these burgers and burger groups are related to each other. I could do a hierarchical cluster analysis: go to Analyze, Classify, Hierarchical Cluster. Let me just throw all these out for a sec. What I'd like to do is use a certain number of variables to create these clusters in order to determine the groupings of burgers. So I'd like to cluster burgers based on calories, total fat, how about sodium, you know, all these terrible things. That's probably good for now. I want to see where the awful burgers are and where the good burgers are in terms of how these group. Now, I'd like to label each of these cases by the sandwich name, so I'll be able to see where each burger groups. You'll see how that works in a sec.

Now, under the Statistics options there's the agglomeration schedule, which might be handy, we'll see, and there's a proximity matrix. I don't typically use this, but you could use it if you wanted to export the cluster analysis into more biology-oriented software like MEGA or Clustal, or any of those clustering programs for biology, and then you could create all sorts of really cool phylogenetic trees. SPSS will create a dendrogram for you, and that's the extent of its phylogeny trees. I'm not going to use a proximity matrix. As for cluster membership, you could have a single solution or a range of solutions. So I could say I want a three-cluster solution, or, if you wanted to, a range: minimum two, maximum four. You know what, let's run with that, and I can show you what that does. Continue.

Plots: I would love to see a dendrogram, which shows us the sort of ancestral relationships among the burgers. I have not been able to figure out what this icicle chart really means, so I'm going to say None, and Continue.

Method: a lot of stuff here. There are several different methods for conducting a hierarchical cluster analysis. Two of the most popular are the centroid and the nearest neighbor. There's also Ward's method. Ward's method has a lot of flaws, but one thing it's good at is creating roughly equal clusters: you get a cluster with 20 burgers in it, another cluster with 18 burgers, another cluster with 21 burgers, very roughly equal groupings. Whereas with some of these others, like centroid, which is what I'm going to use, you could have a cluster of two burgers or one burger and a cluster of 50 burgers, all in the same cluster analysis. Now, when using the centroid clustering method, the best interval measure is actually the squared Euclidean distance. There are reasons for this; if you'd like to know why, go read chapter 9 of Joseph Hair's Multivariate Data Analysis book.

And then I'm going to standardize. You can see here I have calories, which are measured in calories, fat, which is in grams, and sodium, which is in milligrams, and these are on vastly different scales. You can see calories in this column here are in the hundreds, sodium is in the thousands, and total fat is in the tens and teens. So I'd like to standardize them to between negative 1 and positive 1, and Continue.

There's also an option to save membership numbers. For example, this burger number 3, the Beef and Cheddar Classic from Arby's, is going to land in a certain cluster, and each cluster is going to be numbered. If I want to know which cluster number the Beef and Cheddar Classic belongs to, I can save that solution, and it will save it as a new variable column. Right now I'm not going to save that, so I'm going to cancel here. I want both statistics and plots, and I'm going to cluster by cases. Hit OK.

It's going to think about it. Right now I have my data split by restaurant, so it's going to do an analysis for each restaurant. Let me just come down here to, whoa, Wendy's. Here we go, here's Wendy's.

In the agglomeration schedule we learn how much new information is provided by each additional cluster. One thing you can do to see if this is even useful is double-click it, highlight all of the coefficients here, right-click, go to Create Graph, and choose a line graph. This creates something like a scree plot, and we can see that these burgers actually don't cluster very well until you get out to 22 clusters, where it jumps up and we add a lot of new information. So that's unfortunate. Typically what we'd like to see, as far as I understand it, is a sharp increase early on and then a tapering off, so just the opposite of what we're seeing here, sort of like a scree plot for a factor analysis.

Here we asked for a minimum of two clusters and a maximum of four clusters, so it gives us the two-, three-, and four-cluster solutions, and it tells us for each burger which cluster number it belongs to. You can see that there are differences. If we go to number 245, the Double Stack, I think that's from Wendy's: in the four-cluster solution the Double Stack ends up in cluster 4, in the three-cluster solution it ends up in cluster 3, and in the two-cluster solution it ends up in cluster 1. That's what that'll do for you.

Now here's the dendrogram. I think this is pretty cool stuff. This tells you many things. It tells you how many different clusters we could observe. Right now we could say that there are two clusters: you have this one right here, which is just one burger, the three-quarter-pound triple hamburger, the death burger, which is just way outside the norm in terms of sodium, calories, and total fat, is my guess. So it's in a class of its own, a cluster of its own, and then you have everything else. Or we could say we have one, two, three, four, and then maybe this one is a fifth cluster. We can also see how things relate: this cluster right here relates to this cluster more than it relates to this cluster, and then this cluster relates to this cluster more than it relates to these clusters, and so on. So there are multiple levels, hence the name hierarchical: multiple levels of clustering.

To interpret this, we can just look at what's in it. Let me copy this and paste it into a Word document so we can zoom in a little better. Let's see, paste as picture, and there we go, we can see it a little better. In this first cluster here we have the grilled chicken go wrap, another wrap, a little burger for kids, a junior burger, another small burger, another kids' burger, a junior burger. These are the small, low-calorie burgers. We have chicken, chicken, and then another junior, so those are all the small burgers. Going down, they're closely related to chicken, chicken, chicken, and another junior. Hmm, this makes sense. So we have small burgers, relatively small burgers.

Let me scroll down. Now we have this group: the ranch club, the Ultimate Chicken Grill, the pretzel pub sandwich, the homemade chicken, and the quarter pounder. So these are slightly heavier. Then we move down to this next group: we have the Son of Baconator and the Baconator single, the half-pounder double hamburger. These are big burgers, and they're more related to this cluster right here than they are to these guys up here. And then of course you have your massive monster death burger, the three-quarter-pound triple hamburger, which is just so beyond everything else in terms of sodium, calories, and fat.

So that's how you would read a hierarchical analysis, a dendrogram, or a phylogenetic tree, and that gives you information about the relationships between burgers and groups of burgers, or whatever unit of analysis you're using. Typically, instead of burgers, it will be more like customers or companies or cars or something like that. And that's the extent of my knowledge on hierarchical cluster analysis at this point. If I learn more, I'll make another video.
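For anyone who wants to try the same workflow outside SPSS, here is a rough Python sketch of the steps walked through above: rescale the variables, run centroid-linkage clustering, print something like an agglomeration schedule, pull out the two-, three-, and four-cluster memberships, and get the dendrogram leaf order. Several things here are assumptions and not from the video: the burger names and nutrition values are invented for illustration, the helper `rescale_to_unit_range` is a hypothetical name, and the min-max rescaling may not match SPSS's own "range -1 to 1" option exactly. SciPy's centroid linkage reports plain Euclidean merge distances, so the sketch squares them to get values on a squared-Euclidean scale; treat those as only roughly analogous to the SPSS coefficients.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Hypothetical burgers and nutrition values, invented for illustration only
# (NOT the data set used in the video).
names = [
    "Grilled Chicken Wrap", "Kids Burger", "Junior Burger",
    "Quarter Pounder", "Pub Sandwich",
    "Half-Pound Double", "Bacon Double", "Triple",
]
# Columns: calories (kcal), total fat (g), sodium (mg)
X = np.array([
    [260, 10, 700],
    [250, 9, 500],
    [280, 11, 620],
    [520, 26, 1100],
    [600, 30, 1200],
    [810, 45, 1500],
    [850, 50, 1600],
    [1400, 95, 2600],
], dtype=float)


def rescale_to_unit_range(data):
    """Linearly rescale each column so its minimum is -1 and its maximum is +1."""
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    return 2.0 * (data - lo) / (hi - lo) - 1.0


X_scaled = rescale_to_unit_range(X)

# Centroid linkage on the rescaled data (SciPy uses Euclidean distance here).
Z = linkage(X_scaled, method="centroid")

# Rough analogue of an agglomeration schedule: one row per merge.
# Squaring the merge distance puts it on a squared-Euclidean scale.
coefficients = Z[:, 2] ** 2
for step, (a, b, _, size) in enumerate(Z, start=1):
    print(f"step {step}: merge {int(a)} + {int(b)} "
          f"coefficient={coefficients[step - 1]:.4f} size={int(size)}")

# Cluster membership for a range of solutions (minimum 2, maximum 4).
memberships = {k: fcluster(Z, t=k, criterion="maxclust") for k in (2, 3, 4)}
for i, name in enumerate(names):
    labels = ", ".join(f"{k} clusters: {memberships[k][i]}" for k in (2, 3, 4))
    print(f"{name}: {labels}")

# Leaf order of the dendrogram without drawing it; drop no_plot=True
# (with matplotlib installed) to draw the tree itself.
leaf_order = dendrogram(Z, labels=names, no_plot=True)["ivl"]
print(leaf_order)
```

With these made-up numbers the very heavy "Triple" sits far from everything else, so it ends up alone in the two-cluster solution, which mirrors the "cluster of its own" reading of the dendrogram in the video. Plotting `coefficients` against the step number gives the scree-like line graph described for the agglomeration schedule.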
Info
Channel: James Gaskin
Views: 196,053
Rating: 4.8634639 out of 5
Keywords: SPSS, Cluster Analysis, Hierarchical, Statistics, visualize clusters, dendrogram
Id: bMH-aHNlhBA
Length: 9min 49sec (589 seconds)
Published: Wed Jun 24 2015