53. Exploratory Factor Analysis in SPSS

Captions

Welcome, everyone, to the class on marketing research and analysis. In the last lecture we started with factor analysis, specifically exploratory factor analysis. We understood that factor analysis is a technique used for data reduction or summarization. It is also called an interdependence technique, because there is no dependent or independent variable in it as such. I also mentioned that the heart of factor analysis is correlation. That means we try to find the correlations among the variables, and the stronger the correlations are within a construct, or within a factor, the better. So we assume that all the items, or all the variables, within a factor are strongly correlated. We also understood in the last lecture that, given a data set, we first try to find out what different factors are emerging. There are several ways you can select the number of factors. One of them is on the basis of the variance explained: if the solution explains a sufficient amount of variance, then we can settle on that number of factors. For example, as I said in the last lecture, the variance explained should be at least 60%. Let me show you on the data set, because then you will understand it better. So this was the data set. If you remember, we had several variables: product quality, e-commerce activities, technical support and so on. These are different variables for which a company is interested in seeing how they influence the customers. So we went for dimension reduction, or factor analysis. This is the exploratory factor analysis, and we will take all the variables in here.
Now, in Descriptives, we had learnt that we have to ask for the KMO and Bartlett's test. What is the KMO? The KMO is a measure of sampling adequacy, and Bartlett's test should be significant. If it is significant, the null hypothesis, which says that there is no correlation among the variables, is rejected, and we can say that there is at least some correlation among the variables, which is an important assumption of factor analysis. So we did this. Then we said we can extract on the basis of the eigenvalue or a fixed number of factors. The eigenvalue, if you remember, tells us how much the different variables contribute to a particular factor. We square the loadings and take the sum of the squared loadings down that particular factor, and that value is called the eigenvalue. The eigenvalue should be greater than 1. Or we could use a fixed number of factors. How do you determine the fixed number of factors? It could be determined on the basis of past knowledge, or sometimes on the basis of the amount of variance explained. Let us say you have 13 variables in this case and you have got, let us say, 4 or 5 factors, and these factors explain about 85%. Then you can say, well, since this is explaining quite a good amount of variance, I can reduce the number of factors a little further. In such cases the variance explained should be a minimum of 60% in the social sciences. So if it is 70% or 80%, you have room; you can still reduce the number of factors. That means, suppose 6 factors were explaining 85%, but I do not want to take 6 factors. I want only 4 factors from my theoretical understanding.
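The eigenvalue and variance-explained criteria described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the loading matrix is invented (it is not the SPSS output from the lecture), the function names are hypothetical, and the "sum of squared loadings equals the eigenvalue" relation is applied to unrotated principal component loadings on standardized variables.

```python
# Hypothetical unrotated loadings (rows = variables, columns = factors).
# The numbers are invented for illustration; they are not the lecture's SPSS output.
loadings = [
    [0.80, 0.10],
    [0.70, 0.20],
    [0.60, -0.30],
    [0.20, 0.75],
    [0.10, 0.65],
]


def eigenvalues_from_loadings(matrix):
    """Eigenvalue of a factor = sum of squared loadings down its column."""
    n_factors = len(matrix[0])
    return [sum(row[j] ** 2 for row in matrix) for j in range(n_factors)]


def percent_variance(eigenvalues, n_variables):
    """Each standardized variable carries 1 unit of variance, so the total is n_variables."""
    return [100.0 * ev / n_variables for ev in eigenvalues]


eigenvalues = eigenvalues_from_loadings(loadings)
variance_pct = percent_variance(eigenvalues, len(loadings))
# Eigenvalue-greater-than-1 rule: keep the (1-based) factor numbers that pass.
retained = [j + 1 for j, ev in enumerate(eigenvalues) if ev > 1.0]

print("Eigenvalues:", [round(ev, 3) for ev in eigenvalues])
print("% variance per factor:", [round(p, 1) for p in variance_pct])
print("Cumulative %:", round(sum(variance_pct), 1))
print("Factors retained:", retained)
```

With these made-up loadings both factors pass the eigenvalue rule, yet together they explain only about 53% of the variance, which would fall short of the 60% guideline mentioned in the lecture. The two criteria can therefore disagree and have to be weighed together.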
Then I will take 4 factors. That will reduce my variance explained, there is no doubt about it. As the number of factors reduces, the amount of variance explained also reduces. But if 4 factors now explain, let us say, 70% of the variance in the study, then it is good enough. So that is how you find the number of factors. I also explained rotation, if you remember. Why do we rotate? To get a simple structure. We rotate the factors so that the variables are not loaded too heavily on one factor and very little on the others, but are more or less equitably distributed. Among the orthogonal rotations, the most popular is the varimax rotation. Now what is orthogonal rotation and what is oblique rotation? I explained that orthogonal rotation is a rotation in which the factors are at 90 degrees to each other. What does that mean? If two factors are at 90 degrees, mathematically the cosine of the angle between them is zero, so they are uncorrelated. On the other hand, in an oblique rotation the angle is not 90 degrees, so the factors are allowed to be correlated. The problem is this: in real life, in social life, you hardly ever find relationships that are completely uncorrelated. Oblique rotation more realistically accepts that there is some correlation, but it is much more difficult to analyze mathematically. It is a very complex process, because the number of correlations that can occur will be very high, and in fact even the software is not developed to an extent where it can explain it properly. So most of the time, and there is also a theoretical logic for it, we feel that if there are 2 factors, these 2 factors should be as separate as possible.
So, believing that, we take an orthogonal rotation. That means the factors are not related. And on that basis we take the varimax rotation. I said, if you remember, that you should sort by size and suppress the small coefficients. What do I mean by suppressing the small coefficients? I mean that the factor loadings I take should be at least 0.3. What is a factor loading? It is the correlation between the variable and the factor. So if it is 0.3, the R square is 0.09, that is, roughly 10%. So we take that as a magic number, or cut-off value. And we will run it. Now let us see the output. The KMO is good; it is above 0.5. Bartlett's test is significant. I had also explained the role of communality to you. Now we come to the rotated component matrix, but first let us go slightly up to the unrotated one. If you look at this matrix, it looks a little more complicated. There are a lot of cross-loadings. That means one variable is loading onto several factors: factor 1, factor 2, factor 3, factor 4, factor 5. But if you look at the matrix after we have rotated, the loadings are quite neat. The variables load onto one and only one factor; only in a few cases is there a cross-loading. Before rotation, for example, there were variables loading onto 3 factors, and in one case even 4. But here, after rotation, a variable cross-loads onto at most 2 factors, which is fine. So now you have got factor 1, factor 2, factor 3, factor 4 and factor 5. Now you see, for one factor I have got a loading of 0.987, but it is a single variable. So if I want, I can run this in a slightly different way. I can go to Analyze, Dimension Reduction again. But first, let us go back to the output.
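The 0.3 cut-off and the idea of a cross-loading can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the loading values are invented, the variable names are borrowed from the lecture only as labels, and the helper functions are hypothetical rather than anything SPSS exposes.

```python
# Hypothetical rotated loadings: variable -> loadings on factors 1 to 3.
# Invented values; they do not reproduce the lecture's output.
rotated = {
    "delivery_speed": [0.91, 0.05, 0.12],
    "complaint_resolution": [0.88, 0.10, 0.21],
    "advertising": [0.02, 0.76, 0.08],
    "product_line": [0.58, -0.10, 0.45],
    "new_products": [0.18, 0.22, 0.25],
}

CUTOFF = 0.3


def shared_variance(loading):
    """A loading is a correlation, so its square is the variance shared with the factor."""
    return loading ** 2


def suppress_small(table, cutoff=CUTOFF):
    """Replace loadings below the cut-off (in absolute value) with None, like a blank cell."""
    return {var: [x if abs(x) >= cutoff else None for x in row] for var, row in table.items()}


def cross_loading_vars(table, cutoff=CUTOFF):
    """Variables that reach the cut-off on more than one factor."""
    return [var for var, row in table.items() if sum(abs(x) >= cutoff for x in row) > 1]


def hidden_vars(table, cutoff=CUTOFF):
    """Variables that disappear from the displayed matrix because no loading reaches the cut-off."""
    return [var for var, row in table.items() if all(abs(x) < cutoff for x in row)]


print("Shared variance at a loading of 0.3:", round(shared_variance(CUTOFF), 2))
for var, row in suppress_small(rotated).items():
    print(var, row)
print("Cross-loading:", cross_loading_vars(rotated))
print("Not visible:", hidden_vars(rotated))
```

A loading of 0.3 corresponds to 0.3² = 0.09, so the variable shares about 9% of its variance with the factor, which is the "roughly 10%" behind the cut-off.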
Look at the amount of variance explained here. The variance explained in this case is about 81%, which is sufficiently high. So instead of 5 factors, if I make it 4 factors, will it make much difference? Let us see. I will now extract only 4 factors, and I will run this again. Everything else remains the same. Now you see my variance explained has reduced from 81% to 73.7%, almost 74%. Has there been any change in the communalities? You see, one of the communalities has become very poor: new products. Let us look at the rotated matrix now. In the rotated matrix, new products is not visible, because its loading is now less than 0.3. It has a very low communality; it is not contributing to the factors. So it is automatically hidden, since it is below the 0.3 cut-off that we entered into SPSS. Now look at it. We have got 4 factors. Are these 4 factors, with 74% of the variance, explaining things slightly better? If yes, then you should choose this model instead of the other one. I am not forcing you to do anything; I am just giving you an option. You had 81% with 5 factors, but one factor had only 1 variable in it. So instead of keeping a single-variable factor, we tried to see whether the variance explained would decrease too much if we reduced to 4 factors. If the drop is not too much and the solution still explains enough, then why not do it? And in this case, that is what happened. Now that the analysis is done, let us move to the PPT. How do you make the final decision? The final decision about the number of factors to choose is the number of factors for the rotated solution that is most interpretable.
You can take it through the eigenvalue or through a fixed number of extracted factors; that is up to you. To identify the factors, group the variables that have the largest loadings on the same factor. We saw that in the output as well. Then interpret the factors according to the meaning of the variables. Now let us go back to the output file. Is there any relationship among the variables? That is your job now. There are certain variables loading onto factor 1. What are they? Delivery speed, complaint resolution, order and billing. New products is almost gone. Product quality is also not here. Why? It is in the second factor. Then price flexibility, product line. So now you have to see whether there is any relationship among the variables that load onto factor 1. Similarly, what is the relationship among the variables loaded onto factor 2? And what about factor 3? Sales force image, e-commerce activities, advertising. So this is something linked to promotion or image. And here, for example, technical support and warranty and claims are more on the service side. So now you have to find a name. You have to suggest a name for each factor. That is your job. As you saw the variables, you have understood that there is some relationship among them, and on the basis of this relationship you have to give a name to the factor. Now that you have understood this, let us see what you can do with the results of the exploratory factor analysis. Generally, when you say just "factor analysis", you mean exploratory factor analysis.
There is another type of factor analysis, done separately, which is called confirmatory factor analysis, and which you will be doing later on. Now, once you are done with this exercise, there are some options in your hand. First, you can select a single surrogate measure. What does that mean? Choose a single item with a high loading to represent the factor. Suppose there are 4 variables in the factor, and one of them, let us say v2, has a very high loading on this factor. You can select this particular variable to represent the factor; that is a possibility. But it has its own limitations. Why? Because it is not good to say that only this variable should represent the factor while we completely overlook the other 3. That is not the right way, at least in my personal view. The second way is to create a summated scale. What is a summated scale? Say there are 4 variables, v1, v2, v3, v4, and these are respondents 1, 2, 3, 4, 5; let us say you have 300 respondents. Take the scores given by each respondent. Let us say respondent 1 has given 3, 3, 4 and 2 to the four variables. Then you create a new column, let us call it summated v, which holds the average of these: 3+3+4+2 = 12, and 12/4 is 3. Similarly for respondent 2: suppose the scores are 3, 4, 3, 2. Again the average is 3. Finally, take this column and use it in later analysis. So what does the slide say? Form a composite from the items loading on the same factor. That is what we did. Average all the items that load onto the factor. Then calculate the alpha for reliability.
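The summated scale arithmetic from the two respondents above can be written as a minimal Python sketch. The column name `summated_v` follows the lecture; the data layout and the function name are assumptions made for illustration.

```python
from statistics import mean

# Scores of the two respondents from the lecture on the four items of one factor.
responses = [
    {"v1": 3, "v2": 3, "v3": 4, "v4": 2},  # respondent 1
    {"v1": 3, "v2": 4, "v3": 3, "v4": 2},  # respondent 2
]

# Only the items that load on this factor go into its summated scale.
factor_items = ["v1", "v2", "v3", "v4"]


def summated_score(response, items):
    """Average a respondent's scores over the items belonging to one factor."""
    return mean([response[item] for item in items])


# New column: one summated score per respondent.
summated_v = [summated_score(r, factor_items) for r in responses]
print(summated_v)
```

Both respondents end up with an average of 3 (12/4), matching the hand calculation. The resulting column can then be used like any other variable in later analysis.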
Now I will have to explain this to you. What are alpha and reliability? We use a term called Cronbach's alpha. It is based on inter-item correlation. That means: what is the correlation among the variables within a certain factor? Let us say this is factor 1. Think of it as a house, a particular residence. The members of this house are, let us say, v1, v3, v6, and the members of the other one are v2, v4, v7. The members within this house, or this factor, should have strong correlations with each other. Similarly, the variables within the other factor should be strongly correlated among themselves. But should the two factors be related to each other? No. That is why most of the time we say we need an orthogonal rotation, because we feel that the factors are uncorrelated with each other. Within a factor, though, there has to be a strong correlation. If that strong correlation is there, it means the items are consistently measuring the same thing, so the measure will give more or less the same result again and again. That consistency is what we call reliability. How to measure it, we will see. As I said, Cronbach's alpha has to be computed. After that you name the scale or construct. Now let me show you how to measure it. Take one of these factors, for example the one with variables 12, 17 and 10. Let us remember these. We go to the data file. How do you measure reliability? Go to Scale, Reliability Analysis. We will take 12, then I think it was 17, and 10. Now we want to see the alpha. You can see there are several options here. We do not need any of them right now.
You can even check this: suppose you have a large number of variables, 7 or 8, and because of some variables the overall reliability is coming down. Then you can use this option, "Scale if item deleted". It shows what the alpha would be if each variable were removed, so you can identify the variables that are contributing to the poor reliability. That is a very nice thing. But I do not want anything else at this moment; I just want to see the alpha. Now you see the output gives the number of items and the Cronbach's alpha, which is 0.67. Earlier it was said that this reliability value should be 0.7 or more. But if you go to Anderson's multivariate analysis book, it says that even if it is above 0.6, it is fine. That is because in social science the reliability measures will not be as high as you find in basic science, engineering or medicine. So in this case we will check whether the alpha is above, or at least around, 0.6. This is the second method, and it is one of the most preferred methods nowadays. The third method is also good, but this is the preferred one. The third method is the factor score method. I think I had shown you factor scores; let us go back. Go to the exploratory factor analysis dialog here and click Scores. There is an option to save the factor scores as variables. If you now go back to the data set, you will find as many new columns as there are factors. Here 4 columns have been inserted into the data set. These 4 columns are the factor scores. They are the scores that represent each factor.
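For readers who want to see what the reliability output is computing, here is a minimal Python sketch of Cronbach's alpha and of an "alpha if item deleted" table. It uses the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of the total score). The item scores and item names are invented for illustration and are not the lecture's variables 12, 17 and 10, so the result will not match the 0.67 seen in SPSS.

```python
from statistics import variance

# Hypothetical scores of 5 respondents on 3 items of one factor (invented data).
items = {
    "item_a": [1, 2, 3, 4, 5],
    "item_b": [2, 2, 3, 5, 5],
    "item_c": [1, 3, 3, 4, 4],
}


def cronbach_alpha(item_scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of respondents' total scores)."""
    columns = list(item_scores.values())
    k = len(columns)
    totals = [sum(values) for values in zip(*columns)]  # total score per respondent
    item_variance_sum = sum(variance(col) for col in columns)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def alpha_if_item_deleted(item_scores):
    """Recompute alpha with each item left out in turn (needs at least 3 items)."""
    return {
        name: cronbach_alpha({n: s for n, s in item_scores.items() if n != name})
        for name in item_scores
    }


alpha = cronbach_alpha(items)
deleted = alpha_if_item_deleted(items)

print("Cronbach's alpha:", round(alpha, 3))
for name, value in deleted.items():
    print("alpha if", name, "deleted:", round(value, 3))
```

If leaving out one item raises alpha noticeably, that item is the one pulling the reliability down, which is how the "Scale if item deleted" option is read.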
You can use these factor scores in some other analysis, maybe a regression, where you want to insert them as independent variables. But remember, the same thing could have been done had you taken the summated scales, because with summated scales there would also be 4 columns, one per factor. Only, in that case you took just the variables linked to that particular factor. Let us go back to the slide. The difference between the summated scale and the factor score is this: while calculating a factor score, the software considers all the variables, whereas while calculating a summated scale we use only those variables that are linked to that particular factor. So for factor 1, if v1, v2, v3 are its 3 variables, then only these 3 will be considered, and v4, v5, v6, v7, v8, v9 are not considered. But while calculating the factor score, all the variables are considered, and obviously more weight goes to the ones that have the strongest connection with the factor. So there are these 2 methods, but mostly the summated scale is used. Now, we just spoke about validity and reliability. What is validity, and what is reliability? Validity is the soundness or appropriateness of a test or instrument in measuring what it is designed to measure (Vincent, 1999). That is, is the instrument actually measuring what it is intended to measure or not? If it is doing that, then it is called a valid instrument. There is another term which is important for any construct or factor that needs to be measured, and that is reliability.
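The contrast between a factor score and a summated scale can be made concrete with a small Python sketch. All numbers are invented: the score weights below merely stand in for the coefficients a package would estimate, and standardized answers are used for both measures only to keep the comparison simple (in practice a summated scale is usually built from the raw item scores).

```python
# One respondent's standardized answers on five variables (invented values).
z = {"v1": 0.5, "v2": 1.0, "v3": -0.5, "v4": 2.0, "v5": -1.0}

# Hypothetical factor score weights for factor 1: every variable gets a weight,
# but the variables tied to the factor (v1 to v3) get the larger ones.
score_weights = {"v1": 0.40, "v2": 0.35, "v3": 0.30, "v4": 0.05, "v5": -0.02}

# Variables that load on factor 1 and therefore enter its summated scale.
factor_1_items = ["v1", "v2", "v3"]

# Factor score: weighted sum over ALL variables.
factor_score = sum(score_weights[v] * z[v] for v in z)

# Summated scale: plain average over ONLY the items linked to the factor.
summated_scale = sum(z[v] for v in factor_1_items) / len(factor_1_items)

print("Factor score (all variables, weighted):", round(factor_score, 3))
print("Summated scale (v1 to v3 only):", round(summated_scale, 3))
```

Here v4 and v5 still nudge the factor score through their small weights, while the summated scale ignores them completely.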
Reliability is the degree to which a test or measure produces the same scores when applied in the same or in different circumstances. Suppose you are going to measure your weight. A weighing machine is designed to measure weight, so we will say it is a valid instrument for this. Now let us do one thing. You go to a weighing machine and check your weight a first time. Then you check it again, and then once more. Suppose the deviation in the readings is not much: the first time you were 80 kg, the second time 79.5 kg, the third time 79.8 kg. Then we can say it is more or less a reliable machine. But suppose you go to another machine and it gives you 80, then 78, and then 83. Then it is not a reliable machine. So our measure, our construct, our scale or factors have to pass this test of validity and reliability. Now look at this example on the slide. Target A: poor validity. The shots should be hitting the center, but they are hitting somewhere else. Yet the reliability is good. Why? Because all the values are giving more or less the same result again and again. In the next case, validity is poor, nothing is hitting the center, and even the reliability is not good, because the shots are haphazard. Now look at this one. This is a case of good validity and also good reliability. So this is what we understand by validity and reliability. And I showed you how to find the reliability: you can do it through Cronbach's alpha, whose value should be above 0.6. If it is above 0.6, we say it is decent enough.
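The weighing machine example can be turned into numbers with a short Python sketch. The readings are the ones quoted in the lecture; using the range and the standard deviation of repeated readings as rough indicators of reliability is an illustrative choice, not a formal reliability coefficient.

```python
from statistics import stdev

# Repeated readings (kg) for the same person, as quoted in the lecture.
machine_a = [80.0, 79.5, 79.8]
machine_b = [80.0, 78.0, 83.0]


def spread(readings):
    """Difference between the largest and smallest reading."""
    return max(readings) - min(readings)


for name, readings in (("A", machine_a), ("B", machine_b)):
    print(
        "Machine", name,
        "range:", round(spread(readings), 2),
        "std dev:", round(stdev(readings), 2),
    )
```

Machine A's readings stay within half a kilogram of each other, while machine B's differ by 5 kg, which is why only the first would be called reliable. Neither number says anything about validity: a machine could be perfectly consistent and still be consistently wrong.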
This also gets affected by the sample size. The other thing is validity. For now, when I talk about validity, I will talk about face validity or content validity. What does that mean? We take the opinion of experts to see whether the instrument I am using is good enough to meet the objective of my study. We give the instrument to some experts and ask them to kindly check whether we have used the right questions or not. So you have content validity and face validity. Then there are other types of validity also: nomological validity, discriminant validity, construct validity, convergent validity. What is convergent validity? As I said, suppose factor 1 has got v1, v2, v3. If the correlation among these items is high, let us say 0.7 or more, then we say there is a strong correlation, and that means there is convergent validity. So convergent validity is also assessed through the correlation among the variables. I will continue with reliability and validity in the next section also, when I get into confirmatory factor analysis, because it is explained in more depth there. Now, how do you report? You have done a factor analysis, and finally you have to report it. If you create a factor-based scale, describe it like this. First, what is the theoretical rationale for EFA? Why did you do EFA? Suppose you say, I did a study to understand suppliers' behavior, and to understand it I needed to do a factor analysis. I had asked several questions; there were several items in my study.
And since the number of variables in my study was very high, I needed an exploratory factor analysis to squeeze it, or reduce it, to a few meaningful factors. Then give a detailed description of the subjects. What questions you asked and what factors were generated, you need to explain very clearly. Then also include the descriptive statistics: the mean, the frequencies, the maximum value, the minimum value, the standard deviation and so on. Show the correlation matrix. Why? It will tell you which variables are correlated with which others, and in which direction. Then you say what method you used. Was it principal component analysis or common factor analysis? I had already explained principal component analysis and common factor analysis. Then you can also report the communality estimates and the factor extraction, if required; otherwise you can omit this part. And what kind of rotation did you use to bring some sanity into the study? Suppose your solution was not behaving properly, or was not giving a good result; then you should do a rotation. Otherwise you need not. Suppose you found that most of the variables had loaded onto only 1 factor; then we do a factor rotation. Which type of rotation did you use? For example varimax, equimax, promax, whatever. And what criteria were employed for the number of factors and for a meaningful loading? For example, as I said, we do not take any loading less than 0.3, so that is a cut-off value and you have to mention it. Or perhaps only loadings above 0.5 were considered meaningful. 0.5, 0.3, whatever it is, you have to write all this in your research report.
Then finally you will say: after doing all this, I got 3 or 4 factors, and these factors explained so much of the variance. One thing is not given on the slide: the variance. How much variance in the study was explained through this factor analysis, that also you have to mention. I have already explained how to conduct the factor analysis in SPSS, so I need not go further. Once you have written this up, your exploratory factor analysis is over. In the next lecture I will continue with a new technique called confirmatory factor analysis. Confirmatory factor analysis is also a very important tool, and it is now widely used by researchers. What it means and how it is done, I will explain in the next lecture. Thank you so much.

Info

Channel: IIT Roorkee July 2018

Views: 6,530

Rating: 4.969697 out of 5

Keywords: Prof. J. K. Nayak, Department of Management Studies, Indian Institute of Technology Roorkee, factor analysis using spss, cronbach's alpha, validity, reliability, reporting results of factor analysis

Id: j5q1E4_wq3k

Length: 29min 5sec (1745 seconds)

Published: Thu Apr 04 2019