60. Cluster Analysis in Practice - II

Welcome, friends, to this lecture of Marketing Research and Analysis. In the last class we started a new technique called cluster analysis. We saw that cluster analysis, like factor analysis, is an interdependence technique. It is a subjective method in which we classify objects into groups, or clusters, that have a lot of similarity within themselves. Our intention is to create clusters such that within each cluster there is a lot of homogeneity, while between any two clusters there is a lot of heterogeneity: two clusters should be as far from each other as possible, but the data points within a cluster should be very close to each other. That is the objective.

Where is it used? In areas like climate studies, biology, botany and genetics, and very widely in marketing. Our interest is mostly in marketing, because we want to segment the market, and to segment the market you have to understand consumers' buying behavior and patterns; cluster analysis is very helpful for that. Another area where cluster analysis is being used heavily nowadays is politics. During elections, leaders shape a speech according to the behavior of a particular cluster of voters. If a leader is speaking in a place where people are mostly modern and fashion oriented, he will speak to those kinds of issues; where people are more liberal about relationships, he will talk about the corresponding rules and regulations; and where people are very orthodox, he will talk more about holding on to the basics and fundamentals. So cluster analysis also helps leaders craft different speeches and cater to each audience accordingly; it is being utilized in almost every sphere nowadays.

We also said that instead of a correlation measure, cluster analysis uses a distance measure, and we learnt in the last lecture that there are basically two types of cluster analysis: hierarchical and non-hierarchical. The objective of hierarchical cluster analysis is to identify the clusters, that is, how many clusters can be formed. The non-hierarchical, or K-means, cluster analysis then helps you identify the characteristics of each cluster. So "how many" is answered through hierarchical clustering, and "what each cluster looks like" is answered through non-hierarchical clustering. Many times it is advised to use a combination of the two: the hierarchical stage decides the number of clusters, the non-hierarchical stage reveals the characteristics of the clusters, and then the researcher or marketer can decide which cluster to cater to. So let us see how it is done.
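Since the whole method rests on a distance measure rather than a correlation measure, here is a minimal Python sketch (the lecture itself works in SPSS) of the squared Euclidean distance between two respondents; the variable values are hypothetical, not from the lecture's data set.

```python
import numpy as np

# Two respondents described by age, memory span, IQ and reading ability
# (hypothetical values, not taken from the lecture's data set).
a = np.array([5.9, 4.9, 101.0, 6.6])
b = np.array([8.2, 5.0, 95.0, 7.1])

# Squared Euclidean distance: the default dissimilarity measure SPSS
# offers for cluster analysis.
squared_euclidean = np.sum((a - b) ** 2)
print(squared_euclidean)
# Note how the IQ difference dominates the sum -- this is why
# standardization matters when variables sit on different scales.
```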
Let us look at hierarchical cluster analysis first. It is a stepwise procedure that attempts to identify relatively homogeneous groups of cases. Remember, cluster analysis is done on cases, that is, on the respondents, just as factor analysis is done on variables. The cases are grouped on selected characteristics using an algorithm that is either agglomerative or divisive, resulting in the construction of a hierarchy, a tree-like structure depicting the formation of the clusters. It is preferred when the sample size is moderate (roughly under 300-400 cases, and not exceeding about 1,000).

So there are two methods, agglomerative and divisive. Agglomerative means combining; divisive means breaking up. The agglomerative method begins with each object or observation in a separate cluster; in each subsequent step, the two most similar clusters are combined to build a new cluster, and this goes on until only a single cluster remains: from n clusters down to 1. The divisive method begins with all objects in a single cluster, which is then divided at each step into additional clusters containing the most dissimilar objects: the single cluster is split into two, then one of those is split for a total of three, and this continues until every observation is in a single-member cluster, from 1 cluster to n clusters. It is the opposite path. On the tree diagram, reading in one direction (6, 5, 4, 3, 2, 1) is the divisive way, and reading the other way is the agglomerative way.

Within the agglomerative technique there are different algorithms; the three most popular are the linkage methods, the centroid method and the variance method. When you use software such as SPSS, it will ask you which approach you want. The linkage methods are single linkage, complete linkage and average linkage. Single linkage is also called the nearest-neighbor method: it defines the similarity between two clusters as the shortest distance from any object in one cluster to any object in the other. Among all the pairwise distances between the two clusters, it picks the closest pair of points and merges clusters on the basis of that nearest point. Complete linkage is just the opposite, the farthest-neighbor approach: it assumes the distance between two clusters is the maximum distance between members of the two clusters. Average linkage identifies all the pairwise distances between members of the two clusters, adds them up, and takes the average. That is also why hierarchical clustering becomes very difficult with large data sets: so many combinations arise, and so many calculations have to be done. After the linkage methods, the next one is the centroid method.
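As a rough companion to the SPSS discussion, here is a minimal SciPy sketch, on made-up data, of how the choice of linkage rule changes the agglomeration schedule.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))  # 20 cases measured on 4 variables

# Each linkage rule defines "distance between two clusters" differently:
# nearest pair (single), farthest pair (complete), mean of all pairs
# (average), or growth in within-cluster variance (ward).
for method in ("single", "complete", "average", "ward"):
    Z = linkage(X, method=method)
    # Z is the agglomeration schedule: each row records which two
    # clusters were merged and at what distance (the coefficient).
    print(f"{method:8s} last merge coefficient: {Z[-1, 2]:.3f}")
```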
The centroid method is a method of hierarchical clustering in which clusters are generated so as to maximize the distance between the centers, or centroids, of the clusters. What are the centroids? Cluster centroids are nothing but the mean values of the observations on the variables of the cluster. So we take the mean value: for each cluster, the centroid is its mean point, and the distance between two clusters equals the distance between the two centroids.

The third method is the variance method. What does it do? Here the clusters are generated so as to minimize the within-cluster variance: within each cluster you take all the distances, measure the variance, and look for the minimum. The algorithm used here is called Ward's method. In Ward's method, the similarity used to join clusters is calculated as the sum of squares between the two clusters summed over all the variables. It has a tendency to produce clusters of approximately equal size, because of this minimization of within-cluster variation. The one drawback is that Ward's method can become very heavy computationally, because a large number of calculations have to be done inside the software. For you and me it does not matter, since we only read the output, but for the computer it takes a lot of effort.

These are the pros and cons of hierarchical clustering. On the plus side, it is simple and yet portrays the full set of clustering solutions, and it has the speed advantage of generating an entire set of solutions in an efficient manner. On the minus side, it has difficulty with large samples, as I was just saying, and the results are susceptible to outliers; if there are outliers, it will have a problem.

So let us do one. This is the case I have brought: there are different children, and their age, memory span, IQ and reading ability are given. I think we also used this same data set during regression. Suppose I want to do a hierarchical clustering to see how many clusters can be formed; that is the first question. To do that, go to Classify and choose Hierarchical Cluster, then take the variables you are interested in. Under Statistics, the agglomeration schedule is there by default, so no issues; continue. I am generally not interested in looking at the dendrogram, because I can find the same information through the agglomeration schedule; you can request it, but it often looks very cluttered, and I do not feel it is even required. Which method should you use? One of the best choices is Ward's method, and for the distance measure, squared Euclidean distance is the default. Now, this next part is important: SPSS gives you the option to standardize, and you need to standardize when your variables are measured on different kinds of scales.
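The following is a minimal sketch, on made-up values standing in for the children's data, of those same two choices outside SPSS: z-score standardization followed by Ward's method. (SciPy's `ward` works on Euclidean rather than squared Euclidean distances, so the coefficients will not match SPSS's numbers, though the idea is the same.)

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.stats import zscore

# Hypothetical stand-in for the lecture's file: 20 children measured on
# age, memory span, IQ and reading ability (all values are made up).
rng = np.random.default_rng(42)
X = np.column_stack([
    rng.uniform(5, 9, 20),     # age (years)
    rng.uniform(4, 6, 20),     # short-term memory span
    rng.uniform(85, 115, 20),  # IQ
    rng.uniform(5, 9, 20),     # reading ability
])

Xz = zscore(X, axis=0)          # z-scores remove the scale differences
Z = linkage(Xz, method="ward")  # Ward's method on the standardized cases
print(Z[-4:, 2].round(3))       # last few agglomeration coefficients
```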
If the variables are not measured on different kinds of scales, you do not have any problem; but if there are scale issues, say some variables are categorical and others continuous, then you have to standardize. You do not have to do anything elaborate: just choose one of the two options, z-scores or range -1 to +1. If you pick either of them, the data will be standardized and the scale issues are automatically gone. I am not doing it here; I will show you in another case, maybe. Do you want to save the cluster membership? Not at the moment, because right now I only want to find the number of clusters.

This data set is very small; I have brought it just for classroom purposes. Now look at the agglomeration schedule: the number of clusters to be formed can be read from here. Go from the back. The last coefficient value is 794.185, and the next one is 249, so there is a significant difference in the values: a great fall coming backwards, or a great jump going forwards. The next difference is also significant, and the one after that to some extent. You keep going as long as you feel there is a serious, big difference between two successive values.

But remember one thing: if you create too many clusters, it is not very good, because the characteristics of the clusters become very thin and there may be very little difference between them. And if you create only one cluster, there is no point in doing a cluster analysis at all. So you should be reasonable in deciding how many clusters to form. In this case I will take 1, 2, 3, 4; let us say 4 clusters, and then see whether there is any significant difference. This whole task of understanding and identifying how many clusters to form is done through the agglomeration schedule: look at the coefficients, see up to what point there is a substantial change in the values, and stop when you find the differences are no longer much.

Now the non-hierarchical cluster analysis, the next part after you have identified the number of clusters. Non-hierarchical clustering does not involve the tree-like construction process; instead, it assigns objects into clusters once the number of clusters is specified. So here you have to specify the number of seed points, as we say, or the number of clusters. There are basically three non-hierarchical clustering methods, but we are mostly concerned with the K-means method. In fact, if you go into SPSS you will not find anything labelled "non-hierarchical clustering", only K-means, because K-means almost always represents the non-hierarchical approach. What are the three methods? The sequential threshold method starts by selecting one cluster seed and includes all objects within a prespecified distance (the whole measure is distance, remember). Its disadvantage is that once an observation has been assigned to one cluster, it cannot be reassigned to another cluster even if it is more similar to it. That is the problem with the sequential method.
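The "look for the big jump in the coefficients" rule can be mimicked programmatically. This is a minimal sketch on simulated data, not a reproduction of the 794.185 / 249 schedule from the lecture.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
Z = linkage(rng.normal(size=(20, 4)), method="ward")

coef = Z[:, 2]         # agglomeration coefficients, in merge order
jumps = np.diff(coef)  # change between successive coefficients
# The merge that reduces k clusters to k-1 is the (k-1)-th from the end,
# so a large jump there suggests stopping at k clusters.
for k in range(2, 6):
    print(f"going from {k} to {k - 1} clusters: jump of {jumps[-(k - 1)]:.3f}")
```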
The parallel threshold method is a non-hierarchical method that specifies several cluster centers at once, more in the spirit of a divisive approach; all objects within the prespecified threshold distance from a center are grouped with it. The third is the optimizing procedure, which is K-means. It differs from the other two in that objects can later be reassigned to clusters in order to optimize an overall criterion, such as the average within-cluster distance; minimizing that within-cluster distance is the same objective we discussed for Ward's method. These methods have their pros and cons, which you can check later on. We will use a combined approach.

Now let us go back to the data set. We found, say, 3 or 4 clusters; now we go to K-means cluster. Again, move all the variables across. Under Save, you can keep which cluster each particular respondent falls into (the cluster membership), and you can even save the distance from the cluster center. Under Options, ask for the initial cluster centers and the cluster information, then continue. For the number of clusters I will make a change here and go with 4; you may make it 3 or 5, there is no binding rule, it is the researcher's own logic that has to be used.

Now we have run it. We asked for 4 clusters and 4 clusters have come: this is the initial cluster centers table, and then it gives the cluster membership. There were 20 respondents; the first respondent is in cluster 1, the second in cluster 1, the third in cluster 3, the fourth in cluster 4, and so on, on the basis of certain behavioral traits. Let us see what those traits are by going to the final cluster centers. There are 4 clusters: cluster 1, cluster 2, cluster 3, cluster 4. What is the behavior of cluster 1? Its mean age is 5.9, short-term memory span 4.88, IQ 101.2, reading ability 6.58. Cluster 2 has a higher mean age than cluster 1, but the memory span is almost equal, the IQ is lower than cluster 1's, and the reading ability is slightly higher. Cluster 3 has a much lower mean age and a short-term memory span of 4.33, again quite low; the IQ is very high, but the reading ability is poor, which we can connect to the fact that these are young children: the reading ability lags simply because of the age. Cluster 4 has a mean age of 6.06, neither very high nor very low; its memory span is very, very poor in comparison, its IQ of 90 is the lowest, and its reading ability is also on the lower side.

Once a company or researcher identifies these clusters, the behavior of each cluster is also known. Now suppose this were the case of a marketer, a bookseller for example, who knows the age group of the people using his products, their memory span, their IQ level and their reading ability. You cannot target all the groups; that is exactly why segmentation is done, so that you can choose your target segment. Out of these 4, maybe he will choose only one, or two, or three. Let us say he wants to select 2 clusters out of it: which 2 clusters should he select?
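Before answering that, here is a minimal scikit-learn sketch of the K-means step just described, for readers without SPSS; the data are a stand-in, so the centers will not match the lecture's table. `cluster_centers_` plays the role of the final cluster centers and `labels_` of the cluster membership listing.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))  # stand-in for the children's data

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_.round(2))  # the "final cluster centers" table
print(km.labels_)                    # cluster membership for each case
```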
You can see this from the distances between the final cluster centers. Cluster 1 and cluster 2: 5.525; 1 and 3: 5.2; 1 and 4: 11. So clusters 1 and 4 are really far from each other; there is a very large distance between those two clusters. Cluster 2 and 3: 10.4, again far off; 2 and 4: 5; cluster 3 and cluster 4: very far off. So clusters 2 and 4 appear to be the pair that is very close. My sample size here is very small, but when you do this yourself you can use it on a larger sample. By reading this, the marketer understands that clusters 2 and 4 have some similarity and can target that pair. This is only to make you realize how it works and how you use it; you then apply your own logic to decide what is to be done and what is not. So this is how cluster analysis works.

The combined approach is what I just noted: first, hierarchical clustering finds the number of clusters, which serves as the initial cluster seeds for the non-hierarchical stage; the non-hierarchical stage then clusters all observations using those seed points to provide more accurate cluster memberships. A sketch of this two-stage idea follows below.

Now I want to show something in the data set. Everything is given to you: which cluster each particular respondent falls into, and the distance from the cluster center (the higher the distance, the farther the respondent is from the center). Suppose you want to know which cluster this person Ronald falls into: cluster 4. Suppose as the marketer you are interested in clusters 2 and 4; let us see who the people there are: Luzy, Ronald, Getrade, Betrese, Quinee, Thomas, Morris and Noel. These are the people who form your target segment, so you do not want to invest and waste your resources on clusters 1 and 3, and maybe even cluster 2. Say, for the time being, I am interested only in cluster 4, just to understand it; then all my resources will be diverted towards cluster 4. That is how it helps a marketer a lot.

Obviously the same logic applies to the political example I was giving. Suppose, I am just saying, cluster 1 is a very open-minded city, where people have a very modern thinking approach, and cluster 3 is a very traditional, religious kind of place. The speech of a politician will differ between those two cities. That is why it is so important to understand what kind of cluster a city, a person or a product falls into.

So this is what we have done. The steps are all mentioned here; they will help you understand how to proceed with a cluster analysis, and they are for your reference. Once you have done all of this, you can finally write your cluster analysis report, the interpretation. For example, you may write that we did a cluster analysis: we first conducted a hierarchical clustering analysis and found 3 or 4 clusters, and then we did a K-means clustering and found the behavior of each cluster.
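As promised, here is a minimal sketch of the combined approach: the centroids of a hierarchical solution seed K-means, and the distances between the final cluster centers are then tabulated. Again the data are simulated, so the output will not reproduce the 5.525, 5.2, 11, ... values above.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))

# Stage 1: hierarchical solution cut at 4 clusters; its centroids become seeds.
labels = fcluster(linkage(X, method="ward"), t=4, criterion="maxclust")
seeds = np.vstack([X[labels == k].mean(axis=0) for k in range(1, 5)])

# Stage 2: K-means refined from those seeds (n_init=1 since init is explicit).
km = KMeans(n_clusters=4, init=seeds, n_init=1).fit(X)

# Distances between the final cluster centers: close pairs behave alike.
print(squareform(pdist(km.cluster_centers_)).round(2))
```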
There are 4 clusters, for example: for cluster 1, write down the behavioral traits of the cluster in full, the age group, the memory span, and so on, and similarly for clusters 2, 3 and 4, and then you may give your opinion. One more thing you should do, please remember: kindly give names to the clusters. According to the behavior of cluster 1, you might name it "young and very intelligent students"; a second might be "impulsive, but with a very sharp IQ". Giving each cluster a name makes it much easier for others to understand the basic trait of that particular cluster. So this is how you do a cluster analysis and interpret it. I think this should be extremely helpful for you in the future, and if you use it well, you can really make good publications out of it. So wish you all the best, and thank you so much.