Factor Analysis | What is Factor Analysis? | Factor Analysis Explained | Machine Learning | Edureka

Video Statistics and Information

Captions
[Music] Data analysis is important for businesses today because data-driven choices are the only way to be truly confident in business decisions. Data analysis is also important in research because it makes studying data a lot simpler and more accurate. One such method is factor analysis.

Hi guys, this is Kavya from Edureka, and welcome to this video on factor analysis. First, let's have a look at the agenda. We'll start with an introduction to factor analysis, and then we'll see what latent variables are. We'll go through some assumptions in factor analysis, followed by the purpose of factor analysis. We'll have a quick look at the types of factor analysis and how to address the issues concerned with factor analysis. We shall conclude the session with the basic logic of factor analysis. But before we get started, make sure you subscribe to the Edureka YouTube channel and hit the bell icon to never miss an update. Also, if you are interested in online training certification, do check out the link given in the description.

Now let's get started with an introduction to factor analysis. Factor analysis is a statistical technique that is used to reduce a large number of variables into a smaller number of factors. For example, it is possible that variations in five observed variables mainly reflect the variation in one unobserved variable. It is also known as dimension reduction, since it reduces the dimension, or the total number of variables, in the data set. Factor analysis is a kind of latent variable model; more on that later. Consider a job satisfaction questionnaire. A person's satisfaction with a job can be based on numerous factors, such as satisfaction with the job role (whether it matches the person's qualifications), the supervisor (which can in turn depend on appraisal or communication), co-workers, pay, etc.

Let's take an example of factor analysis. Say that you are a foodie and you want to pick a restaurant to go to, so you start checking reviews of different restaurants. You find that 
the reviews are categorized on six variables: waiting time, cleanliness, staff behavior, taste of food, food freshness, and food temperature. Too many variables are making it difficult for you to pick a particular restaurant. The two factors that you really care about when picking a restaurant are, let's say, service and food quality. The variables in the reviews can be broadly categorized into these two factors, as shown. This is what factor analysis does. So service and food quality are not really present in the data but are derived from the data. These are called latent variables.

Now let's take a deeper dive into what latent variables are. In statistics, latent variables are variables that are not directly observed but are rather inferred from other variables that are observed. These are what we call factors. They are actually difficult to measure numerically. Mathematical models that aim to explain observed variables in terms of latent variables are called latent variable models; hence factor analysis is a latent variable model. Examples of latent variables are quality of life, business confidence, morale, happiness, and conservatism, among others.

Now, in order to perform factor analysis, certain assumptions are made about the data. Let's have a look at these. Firstly, we assume that our data is clean: there should be no outliers or missing values. Secondly, the sample size is expected to be greater than the number of factors. A general principle is the minimum 1 to 5 ratio; that is, for let's say 10 factors, you should have around 50 samples at least. Thirdly, the variables are expected to be interrelated. The concept of factor analysis is based on correlation in the data, so that variables can be grouped together. We can perform something called Bartlett's test to analyze the correlation. Next, metric variables are expected; that is, the variables are expected to be of numeric type, measured on an interval scale. And lastly, the data is preferred to be normally distributed; however, 
multivariate normality is not necessary.

Now let's have a look at the purpose of using factor analysis. The primary purpose of using factor analysis is data reduction. Having too many related fields can make it difficult to analyze the data; thus factor analysis reduces the number of variables. Factor analysis also helps in latent variable discovery. As we saw in the examples before, some factors, such as empathy, cannot be measured directly but can be formulated using other variables. Factor analysis supports simplification of items into a subset of concepts. Sometimes many fields in our data signify the same thing; in the restaurant example, delay in serving, staff behavior, and cleanliness signify the same factor, which is service. Moreover, with factor analysis you can assess the dimensionality and homogeneity of the data.

Now let's move on to the types of factor analysis. Factor analysis can be broadly classified into two types: EFA and CFA. Exploratory factor analysis is used to discover the underlying structure in the data using something like a correlation matrix. It is used for getting insights out of the data. Confirmatory factor analysis is based on the insights derived in EFA, so CFA is used to test those expectations. It makes use of equations for modeling the structure. EFA is further divided into many types: the very popular PCA, or principal component analysis; common factor analysis, or just factor analysis; image factoring, which makes use of a correlation matrix derived from OLS regression; the maximum likelihood method, which is again based on the correlation matrix; and other methods such as alpha factoring and weighted least squares. Of these, the most commonly used are principal component analysis and common factor analysis.

Now let's go through some of the issues that we need to address with factor analysis. First, you need to understand whether to use principal component analysis or factor analysis. Next, you should know how to interpret the results of your analysis, and 
finally, you need to figure out how many factors to pick. Let's address these issues one by one.

Principal component analysis tries to find variables that are composites of the observed variables. For example, in the house pricing data set, PCA would identify that air quality index is closely determined by the number of parks in the locality. But in factor analysis, we assume that there are some latent factors, some unmeasurable factors, which can only be derived from the given numeric variables. Secondly, in the case of PCA, we take into account the total variance in the data, that is, the sum of unique variance, variance due to error, and common variance. However, in factor analysis only the common variance, or shared variance, is considered. So when you want to find latent variables using many variables, use factor analysis, and when you want to eliminate some variables that have high variance, use PCA. When the number of variables is more than 30, the results of PCA and FA are the same.

Next, you need to address how to interpret the results of the analysis. For this we use something called loading. Factor loading is basically the correlation coefficient between the variable and the factor. Factor loading shows the variance explained by the variable on that particular factor. Let's say you have 10 variables that you want to reduce to 3 factors. For that, you make a table to account for how much of the variance of each variable is explained by each factor. It ranges from 0 to 1.
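The statement that a factor loading is the correlation between a variable and a factor can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the loading values are made up, the two factors (service, food quality) are simulated as independent standard normal variables, and each observed variable is built to have unit variance. None of these numbers come from the video.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical loadings of three review variables on two factors
# (service, food quality). Values are invented for illustration.
LOADINGS = {
    "waiting_time": (0.8, 0.1),
    "staff_behavior": (0.7, 0.2),
    "taste_of_food": (0.1, 0.9),
}


def simulate(n=20000, seed=0):
    """Draw two independent latent factors and build observed variables from them."""
    rng = random.Random(seed)
    factors = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(2)]
    observed = {}
    for name, (a, b) in LOADINGS.items():
        # Unique (non-shared) part, scaled so the variable has variance 1.
        unique_sd = math.sqrt(1 - a * a - b * b)
        observed[name] = [
            a * f1 + b * f2 + unique_sd * rng.gauss(0, 1)
            for f1, f2 in zip(*factors)
        ]
    return factors, observed


factors, observed = simulate()

# Correlate every observed variable with every latent factor.
estimated = {
    name: tuple(pearson(values, factor) for factor in factors)
    for name, values in observed.items()
}

for name, correlations in estimated.items():
    print(name, [round(c, 2) for c in correlations])
```

The printed correlations should land close to the loadings used to generate the data. Because loadings are correlations, they can in general be negative as well; the non-negative values here simply follow the kind of table used in the session.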
So if a significant amount of the correlation is explained by a factor, the variable can be denoted using that factor. For deeper analysis, you can calculate the communality of a variable. It is given by the horizontal sum of squares of the values. For example, for variable 1 it would be 0.7 squared plus 0.2 squared plus 0.1 squared. Similarly, the vertical sum of squares of the values for a factor is called the eigenvalue. For example, for factor 1 the eigenvalue will be 0.7 squared plus 0.4 squared plus 0.7 squared plus 0.1 squared, and so on. Also, sometimes a particular variable shows high correlation with more than one factor. This is called cross-loading, and in this scenario factor rotation should be performed.

So we know how to interpret the results of the analysis. Now, how do we know how many factors to select? When we talk about the sample size, the rule of thumb is to have a minimum of 5 observations per variable; that is, for let's say 5 variables you should have 25 observations, for 10 variables 50 observations, and so on. But when it comes to deriving the factors from the variables, let's say from a hundred variables, how do we know how many factors to take? 5, 8, 10, how many? For this you can make a scree plot and notice the bend in the plot. However, this is not very intuitive. You can instead use the latent root criterion, which states that for a particular factor, if the vertical sum of squares of all the values, called the eigenvalue, is greater than 1, you should include that factor in your analysis.

With that, now finally let's go through the basic logic of factor analysis. Factor analysis basically takes the items that you want to reduce. It creates a mathematical combination of variables that maximizes the variance that you can predict in all variables, which is the first principal component or factor. A new combination of items from the residual variance that maximizes the variance you can predict in what is left is your second component or factor. Continue this until all the variance is accounted for, and then select the minimal 
number of factors. With that, you can finally interpret the factors using the rotated matrix and loadings.

I hope you enjoyed the session. Thank you for watching this video. If you have any doubts, please leave a message in the comment section. Happy learning!

I hope you have enjoyed listening to this video. Please be kind enough to like it, and you can comment any of your doubts and queries and we will reply to them at the earliest. Do look out for more videos in our playlist and subscribe to the Edureka channel to learn more. Happy learning!
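The communality, eigenvalue, latent root criterion, and cross-loading ideas from the session can be sketched on a small loading table. The table below is hypothetical: the first row (0.7, 0.2, 0.1) and the first column (0.7, 0.4, 0.7, 0.1) reuse the figures mentioned in the talk, while the remaining entries and the 0.4 cross-loading threshold are assumptions made purely for illustration.

```python
# Rows are variables, columns are factors. Only the first row and the
# first column echo numbers from the talk; the rest are invented.
loadings = [
    [0.7, 0.2, 0.1],  # variable 1
    [0.4, 0.6, 0.1],  # variable 2
    [0.7, 0.1, 0.3],  # variable 3
    [0.1, 0.8, 0.2],  # variable 4
]

# Communality: horizontal (row-wise) sum of squared loadings.
communalities = [sum(value ** 2 for value in row) for row in loadings]

# Eigenvalue as described in the talk: vertical (column-wise) sum of
# squared loadings for each factor.
eigenvalues = [sum(value ** 2 for value in column) for column in zip(*loadings)]

# Latent root criterion: keep factors whose eigenvalue is greater than 1.
retained_factors = [i for i, ev in enumerate(eigenvalues) if ev > 1]

# Cross-loading: a variable loading strongly on more than one factor.
# The 0.4 cutoff is an assumed convention, not taken from the video.
CROSS_LOADING_THRESHOLD = 0.4
cross_loaded = [
    i
    for i, row in enumerate(loadings)
    if sum(abs(value) >= CROSS_LOADING_THRESHOLD for value in row) > 1
]

print("communalities:", [round(c, 2) for c in communalities])
print("eigenvalues:", [round(e, 2) for e in eigenvalues])
print("retained factor indices:", retained_factors)
print("cross-loaded variable indices:", cross_loaded)
```

With this table, variable 1 has a communality of 0.49 + 0.04 + 0.01 = 0.54, and factor 1 has an eigenvalue of 0.49 + 0.16 + 0.49 + 0.01 = 1.15, so it passes the latent root criterion. Variable 2 loads on both factor 1 and factor 2 at or above the assumed cutoff, which is the situation where the talk recommends rotation.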
Info
Channel: edureka!
Views: 738
Keywords: yt:cc=on, Factor Analysis, factor, analysis, what is factor analysis, factor analysis in research, factor analysis method, factor analysis example, confirmatory factor analysis, exploratory factor analysis, machine learning, data analysis, data analytics, Structural Equation Models, edureka factor analysis, edureka, data science concepts, factor analysis explained, explanation of factor analysis, machine learning concepts, machine learning factor analysis, edureka machine learning
Id: Jkf-pGDdy7k
Length: 11min 24sec (684 seconds)
Published: Mon Dec 20 2021