Principal Component Analysis and Factor Analysis

Captions
In this video I will talk about principal component analysis and factor analysis. Click on the link below the video to go to the website and find more information on this topic. In this presentation I will first give an overview of principal component analysis (PCA) and factor analysis, then talk about the PCA methodology, then about component/factor retention and component/factor rotation (the orthogonal and oblique rotations), then about when to use principal component analysis, and I will conclude with exploratory factor analysis.

So what are principal component analysis and factor analysis? They are data reduction methods used to re-express multivariate data with fewer dimensions. What we mean here is that you have a lot of variables in your data set and you wonder whether all of them should be used in the analysis, or whether some of them are redundant and all of those variables could be expressed with fewer factors or components. The goal of these methods is to reorient the data so that many original variables can be summarized with relatively few factors or components that capture the maximum possible information, or variation, from the original variables. PCA is also useful for identifying patterns of association across variables.

These two methods are very similar and are often used interchangeably, but there is a difference between them: factor analysis assumes the existence of a few common factors that drive the variation in the data; we just don't know what they are or how to measure them well, but we assume they exist. Principal component analysis does not make such an assumption: you just have data and you are interested in the most efficient way to represent it.

Here is a little more about the PCA methodology. The goal is to find components Z = (Z1, ..., Zp), linear combinations of the original variables X = (X1, ..., Xp), that achieve maximum variance. So if the number of original variables is p, those variables can be represented by the same number p of components that are just reoriented a little differently. How are they reoriented? The first component Z1 is the linear combination of the original variables that accounts for the maximum possible variance. In other words, given all these X variables, can we come up with a single component that captures as much of the variation in the data as possible, say 50 percent or so? The second component should capture most of the information that is not in the first component, and it is also desirable that it be uncorrelated with the first component; the same requirement applies to the third and subsequent components.

Because PCA seeks to maximize variance, it is sensitive to differences in the scale of the variables. For example, if you have a survey where all the attitude questions are ranked from 1 to 7, you can work with the covariance matrix. But if one variable is measured in dollars, another in numbers of people, and another in proportions, then because of these scale issues it is better to work with the correlations rather than the covariances of the original variables.
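The scale issue can be illustrated with a small simulation. The following is a minimal sketch, not from the video: the variable names and data are made up, and scikit-learn is used only for convenience. PCA on the raw data is dominated by the dollar-scaled variable, while PCA on standardized data (equivalent to working with the correlation matrix) treats the variables symmetrically.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
income_dollars = rng.normal(50_000, 15_000, n)        # measured in dollars
household_size = rng.integers(1, 6, n).astype(float)  # measured in people
X = np.column_stack([income_dollars, household_size])

# PCA on the raw data: the dollar-scaled variable dominates the total variance.
raw = PCA().fit(X)
print("raw scale:         ", raw.explained_variance_ratio_)

# PCA on standardized data, i.e. effectively on the correlation matrix.
std = PCA().fit(StandardScaler().fit_transform(X))
print("standardized scale:", std.explained_variance_ratio_)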
Going back, PCA maximizes the variance of Z = Xu subject to u'u = 1. Again, we want to find components, linear combinations of the original variables, that summarize the data in the most efficient way. The solution is obtained by what we call an eigenvalue decomposition of the correlation matrix of the X variables, that is, by finding the principal axes of the shape formed by the scatterplot of the data; the eigenvectors represent the directions of these principal axes. What we mean here is: suppose you have two original variables, X1 and X2, and the data forms a scatterplot. Can we find a different dimension, going right through the middle of that cloud, that summarizes the variation in the best possible way, and then a second, uncorrelated dimension that summarizes the next largest share of the variation? That is what we are trying to do here.

So we solve the equation (R - lambda*I)u = 0, where I is the identity matrix and R is the sample correlation matrix of the original variables X. Lambda is called the eigenvalue and u is the eigenvector. The lambdas are the variances associated with the components or factors, and the diagonal covariance matrix of the components is D = diag(lambda); these are reported in the results as the eigenvalues.

The proportion of the variance in each original variable Xi that is accounted for by the first C factors is given by the sum of the squared factor loadings, the sum over k = 1, ..., C of f_ik squared. When all of the components are retained, so that C equals p, the number of original variables, this sum of squared factor loadings equals 1, meaning all the variation in the data is explained. That makes sense intuitively: if you have 13 variables, you can always summarize absolutely all of the variation in the data with 13 components, so that statement is always true.

The factor loadings are the correlations between the original variables X and the components or factors Z, that is, Corr(X, Z) = U*D^(1/2); we will look at these in the example section in the next video. Because the factor loadings matrix shows the correlations between the factors and the original variables, the factors are typically named after the set of variables they are most strongly correlated with. For example, if you are designing a survey and you roughly know which factors are important, you can write several questions representing each factor, and the analysis will then show which questions really load on which factors and confirm that. The components can also be rotated to simplify the structure of the loadings matrix and to facilitate the interpretation of the results.

The next important topic is factor, or component, retention. Since PCA and factor analysis are data reduction methods, we need to retain an appropriate number of factors based on the trade-off between simplicity (retain as few factors as possible) and completeness (explain most of the variation in the data). So how many should we retain?
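This eigenvalue decomposition is easy to carry out directly. The following is a minimal sketch, not from the video, assuming the data matrix X is a NumPy array with observations in rows and variables in columns; it reproduces the quantities described above (eigenvalues, eigenvectors, and the loadings U*D^(1/2)).

import numpy as np

def pca_from_correlation(X):
    R = np.corrcoef(X, rowvar=False)       # sample correlation matrix of the X variables
    eigenvalues, U = np.linalg.eigh(R)     # solves (R - lambda*I)u = 0 for a symmetric R
    order = np.argsort(eigenvalues)[::-1]  # order components by decreasing variance
    eigenvalues, U = eigenvalues[order], U[:, order]
    loadings = U * np.sqrt(eigenvalues)    # factor loadings: Corr(X, Z) = U * D^(1/2)
    explained = eigenvalues / eigenvalues.sum()
    return eigenvalues, loadings, explained

# If all p components are kept, the squared loadings in each row sum to 1,
# i.e. all of the variance in every original variable is accounted for.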
One rule is the Kaiser rule, which recommends retaining only factors whose eigenvalues exceed unity; this means that any retained factor Z accounts for at least as much variation as any single one of the original variables, which makes sense. A useful tool is the scree plot, where we examine the eigenvalues after PCA and see how many to retain. In this example the first principal component has an eigenvalue of about 3, the next two are a little above 2 and about 2, the next is just above 1, and so on, and here is the line at 1. Because the rule says that anything above 1 should be retained, components 1 through 5 should be retained according to this rule. In a different example we have one eigenvalue of about 4 and another very close to 1, so according to this criterion we should again retain two.

In practice we also need to examine the scree plot to determine whether there is a break in the plot, with the remaining factors explaining considerably less variation, kind of like an elbow, as they call it. In the first example there is a significant break between the third and fourth eigenvalues, and the fourth and fifth are getting close to 1, so you might also use just the first three factors, or try both ways, five and three, and see which works better. In the second example the second eigenvalue is above 1, but only barely, so you may just want to use a single component to express all of the original variables.

Another topic to consider is factor rotation. The factor loadings matrix is usually rotated, or reoriented, to make most factor loadings on any specific factor small while only a few of them are large in absolute value. What we are doing with factor rotation is trying to have as few as possible of the original variables load on each factor or component, but with the highest possible values: we want the highest possible correlations on the fewest possible factors. This simple structure allows the factors to be easily interpreted as clusters of variables that are highly correlated with a factor. Again, we want to define clusters of variables that to a large extent define only one factor, because if original variables load on several factors, the interpretation gets harder.

There are two types of rotations typically used: orthogonal and oblique. An orthogonal rotation preserves the perpendicularity of the axes, which means that the rotated components or factors remain uncorrelated with each other, and some people consider that a desirable property. The one we will use is the varimax rotation, which simplifies the structure by focusing on the columns of the factor loadings matrix: Kaiser's varimax rotation aims to maximize the variance of the squared loadings across the variables, summed over all the factors. The quartimax rotation, by contrast, simplifies the structure by focusing on the rows of the factor loadings matrix. An oblique rotation, on the other hand, allows for correlation between the rotated factors; its purpose is to align the factor axes as closely as possible with the groups of original variables and to facilitate the interpretation of the results. Some authors consider this the better approach, since many variables or factors in real life are correlated with each other, so an oblique rotation represents the natural phenomenon better.
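As a concrete illustration of these two ideas, here is a minimal sketch, not from the video: a one-line Kaiser-rule check and a varimax rotation implemented with the standard iterative SVD algorithm. The inputs "eigenvalues" and "loadings" are assumed to come from a PCA such as the one sketched earlier.

import numpy as np

def kaiser_rule(eigenvalues):
    # Retain only components whose eigenvalue exceeds unity.
    return int(np.sum(eigenvalues > 1.0))

def varimax(loadings, max_iter=100, tol=1e-6):
    # Orthogonal rotation that maximizes the variance of the squared loadings
    # within the columns of the loadings matrix.
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        L = loadings @ rotation
        gradient = loadings.T @ (L**3 - (1.0 / p) * L @ np.diag(np.sum(L**2, axis=0)))
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt
        if s.sum() < objective * (1 + tol):
            break
        objective = s.sum()
    return loadings @ rotation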
The oblique rotation that we will use here is called the promax rotation. So when is it appropriate to use principal component analysis? It should be undertaken when there is sufficient correlation among the original variables to warrant this factor/component representation. If your data are mostly uncorrelated with each other, why try to summarize them with common factors or fewer components? But if there is a high degree of correlation among your variables, that is a good case for these data reduction methods.

One measure we will use is the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy, which takes values between 0 and 1; very small values indicate that overall the variables have too little in common to warrant a principal component analysis, while values above about 0.5 are generally considered satisfactory for PCA. Another is Bartlett's sphericity test, which examines whether the correlation matrix should be factored at all, that is, whether the data are dependent or independent. It is a chi-square test whose test statistic is a function of the determinant of the correlation matrix of the variables.

Now a little bit about factor analysis, in particular exploratory factor analysis. Here we think about the common factor model: under this model the observed variance in each measure is attributable to a relatively small number of common factors and a single specific factor unrelated to the other factors in the model. So we would have Xi, an original variable in the data set, equal to lambda_i1 * xi_1 + lambda_i2 * xi_2 + ... + lambda_iC * xi_C + delta_i. Here xi_1, xi_2, ..., xi_C are the common factors that we uncover during the analysis and that represent the original variable, and the specific factor delta_i can be thought of as the error term. They are called common factors because they may also appear in the equations for the other variables, while the error term is specific to this particular variable.

Factor analysis is appropriate when there is a latent trait or unobservable characteristic: if we know there are some factors that explain the data but we cannot measure or observe them very well, that is a good scenario for applying factor analysis. One useful feature of factor analysis is that factor scores can be obtained from it for use in an analysis of dependence, so instead of using the original variables you can use the factor scores that represent the data. Factor analysis is typically used with survey questions about attitudes, and the goal is to identify common factors capturing the variance from these questions, which can then also be used as factor scores.

Several assumptions are needed to determine a solution of the common factor model, and basically they say that everything is uncorrelated with everything: the common factors are uncorrelated with each other, the specific factors are uncorrelated with each other, and the common factors and specific factors are uncorrelated with each other. Next let's talk about the communality.
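Both adequacy checks can be computed directly from their textbook formulas. The following is a minimal sketch, not from the video, assuming X is an n-by-p NumPy array of observations; statistical packages report the same quantities.

import numpy as np
from scipy import stats

def bartlett_sphericity(X):
    # Chi-square test of whether the correlation matrix differs from the identity matrix.
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return statistic, stats.chi2.sf(statistic, df)

def kmo(X):
    # Kaiser-Meyer-Olkin measure: compares correlations with partial correlations.
    R = np.corrcoef(X, rowvar=False)
    inv_R = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(inv_R), np.diag(inv_R)))
    partial = -inv_R / scale                       # partial correlations (off the diagonal)
    off = ~np.eye(R.shape[0], dtype=bool)
    r2, q2 = np.sum(R[off] ** 2), np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)                          # values near 1 favor PCA / factor analysis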
The communality is the proportion of the variance in Xi that is attributable to the common factors. It is also equal to 1 - theta_i squared, where theta_i squared is the variance of delta_i, the error term I talked about on the previous slide. Theta_i squared represents the factor uniqueness, which we will see reported in some of the examples, and one minus the uniqueness is the communality. So basically the variance of each variable splits into the communality, the part of the variation attributable to the factors common to all the variables, and the factor uniqueness, the part that comes only from that particular variable and is not accounted for by the factors.

The solution to the common factor model is determined by orienting the first factor so that it captures the greatest possible variance, and then the second factor so that it captures the greatest possible variance not already accounted for and is uncorrelated with the first factor, exactly as with principal component analysis. The correlations between the original X variables and these xi factors are called the factor loadings, lambda. Once we have those, we can obtain the positions of the observations in the common factor space, the factor scores. The factor score coefficients are B = R^(-1) * Lambda, where R is the correlation matrix of the original variables and Lambda is the matrix of factor loadings. Once we have the factor score coefficients, we can calculate the factor scores as the standardized original variables times B. These factor scores are then included in the data and can be used instead of the original variables.

So this was a very quick introduction to principal component analysis and factor analysis. Now join me in watching the next video on the principal component analysis example.
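To make the last few formulas concrete, here is a minimal sketch, not from the video, computing communalities, uniquenesses, factor score coefficients B = R^(-1) * Lambda, and the factor scores themselves. "loadings" is assumed to be a p-by-C matrix of estimated loadings from an orthogonal (unrotated or varimax-rotated) factor solution.

import numpy as np

def factor_scores(X, loadings):
    # Standardize the original variables, since R is a correlation matrix.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)            # correlation matrix of the X variables
    B = np.linalg.solve(R, loadings)            # factor score coefficients B = R^(-1) * Lambda
    scores = X_std @ B                          # one score per observation and factor
    communality = np.sum(loadings**2, axis=1)   # variance in each X_i due to the common factors (orthogonal case)
    uniqueness = 1.0 - communality              # theta_i^2, the specific-factor variance
    return scores, communality, uniqueness

# The score columns can then be added to the data set and used in a subsequent
# analysis in place of the original variables.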
Info
Channel: econometricsacademy
Views: 114,506
Rating: 4.7456646 out of 5
Keywords: Econometrics, Principal Component Analysis, Econometrics (Field Of Study), Econometrics Academy, Factor Analysis, PCA
Id: hnzW8UxQlvo
Length: 21min 46sec (1306 seconds)
Published: Wed Jan 01 2014