Factor Analysis - an introduction

Captions
In this video I want to provide an introduction to factor analysis. First of all, I want to talk about the purpose of factor analysis. The idea is that we have a set of observed variables, which are variables we have data on, and we're trying to explain the variance and the covariance between those variables. Typically we do this using a sample, but what we're trying to do is come up with a model for the population. Our model explains the variance and covariance between these observed variables by a set of typically fewer unobserved factors and weightings.

What exactly I mean by that is probably best explained with an example. Suppose we have data on a particular set of observed characteristics for people: say, samples of whether individuals experience insomnia, whether they have suicidal thoughts, whether they hyperventilate, and whether they typically feel nauseous most of the time. This might be, for example, data on individuals who are admitted to psychiatric care.

Typically with this data we're dealing with a sample, not the entire population's data, and within that sample there is a degree of variance and covariance between this set of variables. For example, there might be a covariance in our sample between insomnia and suicidal thoughts of, let's say, 0.3. What we're trying to do is come up with a model that explains that covariance in the population.

Factor analysis works by supposing that the variance and covariance structure in our observed characteristics is, at least in part, due to some unobserved factors. The unobserved factors here might be whether an individual is depressed and
whether that individual is experiencing some form of extreme anxiety. So we suppose that there are perhaps these two underlying factors which are responsible for the variance and covariance between all of these variables: the two factors, depression and anxiety, actually cause the variance and covariance between all of these observed characteristics. There is a weighting of depression on each of these observed characteristics, as well as a weighting of anxiety, and that's what each of these arrows means. I'm saying that these two unobserved factors have a causal effect on each of the observed characteristics, and typically the weightings that the unobserved factors have on the observed characteristics differ.

So the weight of this first arrow, the amount to which depression causes insomnia, we might call ω₁₁ (omega one one): the first index indicates that this is the first unobserved factor, and the second that it is weighting the first observed variable. This blue arrow here we might call ω₂₁, the 2 now indicating that we're dealing with the second unobserved factor weighting the first observed variable. So there is a set of weights and a set of unobserved factors, and what we're trying to do is estimate these weights and these unobserved factors. I should mention now that these unobserved factors can themselves be correlated, and we can also think about higher-order models, in which these unobserved factors are themselves caused by other unobserved factors further down the chain.

Now we can think about the variance of a given variable. We might be thinking about the variance of insomnia across our sample, although typically what we're trying to do is explain the variance in insomnia in the population, and we estimate that using our sample of data. What we do is suppose that there is a
proportion of the variance in insomnia which is due to these shared unobserved factors. We call this the communality, because it is the proportion of variance explained by a set of factors that are common to the other observed variables. But we also suppose that there is a proportion of the variance in insomnia that isn't explained by these unobserved factors, and this is what we call the unique variance of that observed variable. It's unique in the sense that it is specific to that variable and not caused by the common set of factors. Typically in factor analysis we suppose that there is a set of unobserved variables, e1, e2, e3 and e4 in this case, which explain the unique variance of each observed variable. If these unique factors, say e1 and e2, are themselves correlated, then we can also think about a proportion of the covariance being due to the shared factors and a proportion being due to these unique factors.

So what are the uses of factor analysis? The main use is the one we've really been talking about in this video: explaining the variance and covariance between a set of observed characteristics in terms of a typically simpler structure. What do I mean by a simpler structure? In this case we have four observed characteristics, and we're trying to explain the variance and covariance between them in terms of two unobserved factors. Notice that we've gone from a system with a dimensionality of four to one with a dimensionality of two, so in that sense we've made it simpler.

The other main use of factor analysis, typically in psychology, is testing a particular theory. There might be some theoretical evidence linking depression and anxiety with each of these four characteristics, and what we're trying to
do is test whether our theory is likely to hold up when we compare it to the data.

The final use of factor analysis that comes to mind is dimensionality reduction. What do I mean by that? We started off with a system of four variables, and we're trying to explain the variance and covariance between those variables in terms of a system of lower dimensionality: we replaced the system of four with a system of two unobserved factors. This concept of dimensionality reduction typically comes to the fore in machine learning, where we have high-dimensional data and want to remove some of that dimensionality, partly to improve the predictive power of a model and partly to make the computation a little easier.
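The weights and the split of variance into communality and unique variance described above can be written out as a worked equation. This is a sketch under simplifying assumptions that are not from the video: the two factors F_1 (depression) and F_2 (anxiety) are taken to be uncorrelated with unit variance, and the unique terms e_1 to e_4 are uncorrelated with each other and with the factors. The symbols F_i, ψ_j, h_j², Ω, Ψ and Σ are notation introduced here; ω_ij follows the video's convention that the first index is the unobserved factor and the second is the observed variable.

```latex
\begin{aligned}
x_j &= \omega_{1j} F_1 + \omega_{2j} F_2 + e_j, && j = 1, \dots, 4 \\
\operatorname{Var}(x_j) &= \underbrace{\omega_{1j}^2 + \omega_{2j}^2}_{\text{communality } h_j^2} + \underbrace{\psi_j}_{\text{unique variance}} \\
\operatorname{Cov}(x_j, x_k) &= \omega_{1j}\,\omega_{1k} + \omega_{2j}\,\omega_{2k}, && j \neq k \\
\Sigma &= \Omega^{\top}\Omega + \Psi, && \Omega = [\omega_{ij}] \in \mathbb{R}^{2 \times 4},\; \Psi = \operatorname{diag}(\psi_1, \dots, \psi_4)
\end{aligned}
```

Here Σ is the 4 × 4 population covariance matrix of insomnia, suicidal thoughts, hyperventilation and nausea. If the factors are allowed to be correlated with covariance matrix Φ, the first term becomes ΩᵀΦΩ; if the unique terms are correlated, Ψ gains off-diagonal entries, which is the covariance due to the unique factors mentioned above.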
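To make the sample-versus-population idea concrete, here is a minimal pure-Python sketch of the two-factor model for the four observed characteristics. It computes the population covariance the model implies and each variable's communality and unique variance, then simulates a sample of individuals and computes the sample covariance, which should come out close to the population value. Everything specific here is hypothetical: the loadings in `OMEGA`, the unique variances in `PSI` (chosen so each variable has variance 1), and the helper names are made up for illustration. It also assumes uncorrelated, unit-variance, normally distributed factors and uncorrelated unique terms, which is simpler than the correlated-factor case the video mentions. It simulates from known weights; it does not estimate them.

```python
import random

# Hypothetical loadings OMEGA[i][j]: weight of unobserved factor i on observed
# variable j (row 0 = depression, row 1 = anxiety). All numbers are made up.
VARIABLES = ["insomnia", "suicidal_thoughts", "hyperventilation", "nausea"]
OMEGA = [
    [0.5, 0.5, 0.1, 0.2],   # depression
    [0.2, 0.25, 0.7, 0.6],  # anxiety
]
# Hypothetical unique variances psi_j, chosen so every variable has variance 1.
PSI = [0.71, 0.6875, 0.5, 0.6]


def communalities(omega):
    """Variance of each observed variable explained by the common factors."""
    p = len(omega[0])
    return [sum(row[j] ** 2 for row in omega) for j in range(p)]


def implied_covariance(omega, psi):
    """Population covariance Omega^T Omega + diag(psi), assuming uncorrelated
    unit-variance factors and mutually uncorrelated unique terms."""
    p = len(psi)
    sigma = [[sum(row[j] * row[k] for row in omega) for k in range(p)]
             for j in range(p)]
    for j in range(p):
        sigma[j][j] += psi[j]  # unique variance only adds to the diagonal
    return sigma


def simulate(omega, psi, n, seed=0):
    """Draw n individuals from x_j = sum_i omega[i][j] * F_i + e_j."""
    rng = random.Random(seed)
    k, p = len(omega), len(psi)
    data = []
    for _ in range(n):
        factors = [rng.gauss(0.0, 1.0) for _ in range(k)]  # unit-variance factors
        data.append([
            sum(omega[i][j] * factors[i] for i in range(k))
            + rng.gauss(0.0, psi[j] ** 0.5)  # unique term with variance psi_j
            for j in range(p)
        ])
    return data


def sample_covariance(data):
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[j] - means[j]) * (row[k] - means[k]) for row in data) / (n - 1)
             for k in range(p)] for j in range(p)]


if __name__ == "__main__":
    h2 = communalities(OMEGA)
    sigma = implied_covariance(OMEGA, PSI)
    for name, h, u in zip(VARIABLES, h2, PSI):
        print(f"{name:18s} communality={h:.4f} unique={u:.4f} total={h + u:.4f}")
    s = sample_covariance(simulate(OMEGA, PSI, n=5000))
    print("population cov(insomnia, suicidal_thoughts):", round(sigma[0][1], 4))
    print("sample     cov(insomnia, suicidal_thoughts):", round(s[0][1], 4))
```

With these made-up weights, two factors reproduce all ten distinct variances and covariances of the four variables, and the simulated sample covariance differs from the population value only by sampling error.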
Info
Channel: Ben Lambert
Views: 328,061
Rating: 4.8261728 out of 5
Keywords: econometrics, Sem, Factor Analysis, Statistics Field Of Study, Structural Equation Models, Statistics
Id: WV_jcaDBZ2I
Length: 7min 42sec (462 seconds)
Published: Thu Feb 20 2014