R Studio: Confirmatory Factor Analysis (CFA)

Captions
Hello, this is a brief video tutorial on how to do a confirmatory factor analysis with R and lavaan. Similar to my previous videos on how to do an exploratory factor analysis, I'm going to load the relevant libraries, import and view the data, identify and exclude outliers, and check assumptions such as additivity and linearity, essentially the same screening you would do for a regression. After we're done with all of that, we'll move on to the CFA. I'll show you how to create a factor model and run it, and how to create a path diagram, although it's probably going to be easier for us to just make the path diagram in PowerPoint, because my R kung fu is not quite up to speed yet; still, I'll show you how to generate one so at least you know how. We'll check for Heywood cases, because a lot of times these can help us figure out if there is something seriously wrong with the analysis, so you should check for Heywood cases. Then we're going to get global fit indices, check the modification indices, and talk about how we should take theory into consideration and not just the numbers. We'll also take a look at residual correlations to help with the decision of whether we should revise the model, and if so, how. Then we're going to rerun the revised model: create the factor model, run it, create a path diagram, check for Heywood cases and global fit indices, and write up the results. So without further ado, let's go ahead and get started.

I'm going to go to File > New File > R Markdown and just call this "video"; that's where I'll put this work. Let's get rid of the unnecessary template syntax. If you are my student in my psychometrics class, I would recommend that you just copy and paste these chunks, but for the assignment please pay attention that you're using the correct data set: you should use the "CFA data simon" CSV. For this example we will be using another data set, which I'll show you in a moment. Let me copy and paste the syntax into RStudio really quick.

First of all, let's load the lavaan package. You'll need lavaan to run the CFA, and semPlot to make the path diagrams. Oops, I got ahead of myself right there; that was the second part of the chunk. Please take a look at the grading rubric as well if you're in my psychometrics class; the instructions right here are basically transferred straight from the grading rubric, so pay attention to how many points are given to each step. Obviously, since this is a CFA assignment, a lot more points go to running the CFA and writing up the results and interpretation; that's the part that's really important, which is why I assign it a lot of points. Notice that semPlot can take some time to load.

All right, import and view the data. I've had mixed success running the code for importing the data, so let's see if this works. If you ever run into any problems, you can use the point-and-click interface; just make sure that R knows what your working directory is, since it usually assumes it's the same directory where your file is saved. There we go: 794 observations of 30 variables, so that's about right. This "CFA basics" file is the data set we'll be using: question 1 all the way to question 30, with 794 or so people.
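In code, the setup just described might look like this minimal sketch; the file name cfa_basics.csv is a placeholder for whatever the practice file is actually called.

# Load the two packages used throughout: lavaan fits the CFA models,
# semPlot draws the path diagrams.
library(lavaan)
library(semPlot)

# Import the practice data; "cfa_basics.csv" is an assumed file name.
master <- read.csv("cfa_basics.csv")
str(master)   # expect 794 observations of 30 variables (Q1 to Q30)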
If you ever have any problems with this, click on Session > Set Working Directory and choose the right directory; RStudio will actually spit out the corresponding syntax in the console, you can run it, and then you can load your file into the global environment.

All right. Use the summary function to figure out how much missing data is in this data set, summary(master), to see if there are any missing data. You don't see any NAs over here, so I'm pretty sure this was cleaned up already; that makes our lives easier. Use the View function to take a look at the data; a lot of this is basically a reiteration of my previous videos on how to prep the data. Note that View is the only function here that is capitalized; it has a capital V at the beginning of the word. This lets you actually see the data over here. I don't know why this one value has a decimal; that's kind of strange. Use the names function to see the actual variable names: Q1 all the way to Q30. That looks about right.

Okay, next: Mahalanobis distance outliers. Let's do this: we compute Mahalanobis distances on the master data set; the part of the syntax that handles missing data is probably not going to be relevant here because you don't have any missing data. Let me just run through this and make sure I typed it out correctly. Looks about right; there you go. All right, run a summary on the Mahalanobis values: they range from about 5 to 208, and that upper end is really large. Okay, calculate a cutoff score for p < .001 using the chi-square distribution, with degrees of freedom equal to the number of columns in master, and then find the number of people who exceed the cutoff score. Sorry, I got ahead of myself. So we got a cutoff score; let's also check the degrees of freedom, because you'll need those for your write-up later on. The cutoff is 59.70. The number of people who exceed the cutoff score is 69. All right, so let's check over here: 69 people have Mahalanobis values larger than the cutoff score, and 69 plus 725 is 794, so that seems about right. From now on we're going to be using this trimmed data set instead.

Next, check some assumptions. First of all, let's run some bivariate correlations: correl is the correlation matrix of the no-outlier data set. You don't strictly need the missing-data argument here, but I'm going to type it anyway just so that you have the syntax in case your data set does have missing data. Okay, let's make a table that flags correlations that are higher than about .9, with symbols instead of numbers, using symnum. So we're looking for B's and asterisks in this table of bivariate correlations. I don't think I see any, so I think we're good; I'll let you write up the interpretation.
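A sketch of that outlier screen and correlation check, assuming the data frame is called master as above:

# Mahalanobis distances for every participant.
mahal <- mahalanobis(master, colMeans(master), cov(master))
summary(mahal)   # ranged from roughly 5 to 208 here

# Cutoff for p < .001 on a chi-square with df = number of columns (30).
cutoff <- qchisq(1 - 0.001, ncol(master))   # about 59.70
summary(mahal > cutoff)                     # TRUE for the 69 flagged people
noout <- subset(master, mahal < cutoff)     # 794 - 69 = 725 rows remain

# Bivariate correlations; symnum() prints "*" and "B" for very high values.
correl <- cor(noout, use = "pairwise.complete.obs")
symnum(correl)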
Next, let's examine the linearity assumption with a fake regression. I think I forgot a header; there you go. I just like to have my headers in blue; it helps me make sense of things. Create a random chi-square variable: rchisq just simulates random numbers from a chi-square distribution with a given degrees of freedom, and we generate one for every row of this data set, so we're basically creating a random chi-square value for the same number of participants as in our data. Then run a fake regression where the random values you just created are predicted by all the relevant variables in the data set, and take the studentized residuals from that fake regression.

Make the QQ plot; let's make this a good one. All right, so it mostly matches; actually most of it matches. Remember we have 700-something people over here, and literally only a handful of them are not on this linear reference line, which compares the quantiles taken from our sample against the quantiles from a theoretical normal distribution. So I do think we're actually pretty okay. If you want to be conservative, you could say there might be some slight issues of linearity at the tails, and we could go with that, but we're looking for gross deviations from this line, and given the sample size I think this is fine.

The histogram of the standardized residuals would be nice next: we basically want a normal distribution, and it's slightly positively skewed, but in general I think this is pretty normal. Next is homoscedasticity, with a scatterplot of the standardized residuals against the standardized fitted values. Let's take a closer look. For homogeneity you basically want your points to be mostly evenly distributed across these four quadrants, and it's mostly okay; there are a few exceptions, but again, keep in mind that we have about seven hundred people over here, and maybe ten or eleven of them, just counting the top of the plot, are out on the far end. So again, depending on whether you want to be conservative or not, you could say that homogeneity is fine. For heteroscedasticity, we don't want to see any shapes; it should just be a lot of dots, and this looks like a blob, as it should. The main reason I like to run these checks, especially when the sample size is big, is not that I'm usually very uptight about these assumptions; it's that if something does go wrong, or something unexpected happens in the results, it helps to already know a little bit about the assumptions going into your analysis. So I'll be fine either way, whether you want to say that homogeneity is fine or that there might be slight issues with homogeneity; this is a large sample size.

Okay, let's finally get to creating the model itself. So where did this data set come from? Let me open it up for you. This is the practice data set; remember, it's different from your assignment data set. It comes from the Computer Aversion, Attitudes, and Familiarity Index, a measure that was created by my assessment professor back then.
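A sketch of those assumption checks; the 7 degrees of freedom for the random variable is an arbitrary illustrative choice.

# Fake regression: a random DV predicted by all of the items.
random <- rchisq(nrow(noout), df = 7)   # one random value per participant
fake <- lm(random ~ ., data = noout)

standardized <- rstudent(fake)          # studentized residuals
fitted <- scale(fake$fitted.values)     # standardized fitted values

qqnorm(standardized); qqline(standardized)   # linearity: points near the line
hist(standardized)                           # normality: roughly bell-shaped
plot(fitted, standardized)                   # homoscedasticity: an even blob
abline(h = 0); abline(v = 0)                 # the four quadrants described above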
The reason why I'm using it is that my stats professor, Dr. Andrew Cannon at Missouri State, used to use this as an example, so it was a very convenient data set for me to use as an example too. Unfortunately, right now I don't have access to the specific items corresponding to the item numbers. I do have brief descriptions of the items, but they're not matched to the item numbers. I do have the response form, though, so let me just open that up. There you go: it's a bunch of different statements, different attitude statements in regards to using computers, like "I often read computer magazines." A person would read that and then rate themselves on a scale from negative 3 to 3: negative 3 being absolutely false, zero being neutral, and 3 indicating that the item is absolutely true. So it's a Likert-type scale, as far as I know, and there are three factors. All the items on the computer aversion factor are reverse-scored, so we'll do that in this analysis as well. This is the citation, and I'll give you a copy of the paper. Now, one disadvantage of using this data set, especially when it comes to model revision, is that it really helps to know which specific item numbers correspond to which items. So I won't be able to walk you through, in a very detailed fashion, what theoretical considerations to weigh when you're deciding whether to, say, correlate errors between items, but I'll talk about these things in generalities so we at least know what to think about when you're doing the assignment.

So anyway, three factors. I believe this is the write-up that I had for this measure: it's a 30-item Likert-type scale, the respondents indicate their level of agreement or disagreement with each statement, and there are three factors right here. So I'm going to test out the original three-factor structure; let's call the model object cfa.model, and this is the lavaan factor syntax. Sorry about that. First, though, the reverse scoring. Your assignment is going to use an instrument that does not need any reverse scoring; that's the reason I don't have this step in your assignment, but for this example there will be some reverse-scored items, so let me just do this real quick. All right, so the scale runs negative 3 to positive 3, and it's going to be a handful of items: 6, 7, 9, 15, 17, and so on. That's a bit of a note to self, but that's okay; this shouldn't take you long at all.

So question 6, for example, has item responses from negative 3 to 3, and you can see the frequencies out there; 380 people, for example, are stacked at one end. I'm going to reverse just item number 6 first and see if I get it right, and then assume it worked the same way for the rest of the items. Let's do this. Okay, the frequency counts are mirrored, if memory serves me right. Okay, now we reverse all the other items. Then I re-ran the assumption checks on the reverse-scored data, and everything looks about the same as before: the QQ plot hugs the line, the histogram is maybe slightly positively skewed but overall pretty much the same as before, and the residual plot is still a blob with maybe some very slight heteroscedasticity in there. Okay.
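A reverse-scoring sketch for a -3 to +3 scale, where multiplying by -1 mirrors the endpoints; the list of remaining items below is assumed for illustration, not the measure's actual list.

# Check the frequencies before reversing.
table(noout$Q6)

# Reverse one item first and re-check that the counts are mirrored.
noout$Q6 <- -1 * noout$Q6
table(noout$Q6)

# Then reverse the rest; this item list is a placeholder.
rev_items <- c("Q7", "Q9", "Q15", "Q17")
noout[rev_items] <- -1 * noout[rev_items]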
Okay, back to the model; let's run it and make the path diagram. Again, it's not going to look really pretty, but semPlot is capable of doing this if you ever want to see what R can do, and this is how you set the font size, for example. If you're running this for the first time, you might not need to run this particular line: for some reason my plot margins were kind of messed up from when I was tinkering around with RStudio before, and I must have changed some of the defaults, so I had to adjust them so that the numbers that come out are at least sort of legible. Again, I much prefer to do this in PowerPoint. If you ever want to know what other arguments can do to format your diagrams, just type ?semPaths and it will show you all the different things you can do; you basically insert these arguments when you're making your path diagrams.

So you can see here that we are running the three-factor model, with each latent factor defined by its specific manifest variables, in other words your items. If you see that a loading is set to one, that's what sets the scale of the factors, essentially. The other way you can set the scale is to standardize the latent variables instead of fixing the first manifest variable of each factor; if you do that, the numbers in here would be the values you get when you're standardizing on the latent variances rather than on the first manifest variable. These would typically be similar to the numbers that we would also get in an exploratory factor analysis, and these are the correlations between the factors right there. Again, it's really hard to make this legible; as long as you can run it, that'll be fine, and I can show you how to get this in PowerPoint a lot easier, a lot quicker, and with fewer headaches, but that'll be another video for another time.

Next, we need to check for any Heywood cases, and a key reason why you want to do this is that Heywood cases are basically indicated by one of two things: squared multiple correlations that are over 1, or negative error variances. To see the squared multiple correlations, or SMCs, we need to use rsquare = TRUE in the summary arguments, along with standardized = TRUE. Standardized basically means that we want the solution under different scalings, so we can get results based on standardizing just the latent variances (Std.lv) or standardizing both the latent and observed variables (Std.all), as opposed to scaling on the first manifest variable; standardized = TRUE gives you all of those options, and that's why we use it.

All right, so look at the R-square column: mathematically R-square cannot be over one, so we're looking down the line for anything over one, and we don't see any, so that's good. And we don't want any of the error variances to be negative, because that doesn't make sense at all; negative variance would be like a magic trick. We don't see any negative ones, so we should be good. With Heywood cases, lavaan will usually also give you a warning notification if you try to run the model and get the global fit indices; it will tell you if something is inconsistent, but it's good to be able to find this out on your own as well, rather than depend on RStudio shouting at you that something is going on.
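A sketch of the model specification, path diagram, and Heywood-case check just described; the factor names and item-to-factor assignments are placeholders, since the actual mapping isn't available here.

# Three-factor CFA in lavaan syntax ("=~" reads "is measured by").
cfa.model <- '
  aversion    =~ Q6 + Q7 + Q9 + Q15
  attitudes   =~ Q1 + Q2 + Q3 + Q4
  familiarity =~ Q14 + Q21 + Q23 + Q30
'
fit <- cfa(cfa.model, data = noout)   # first loading per factor fixed to 1

# Path diagram; see ?semPaths for the formatting arguments mentioned above.
semPaths(fit, whatLabels = "std", layout = "tree")

# Heywood-case check: scan for R-square over 1 or negative error variances.
summary(fit, rsquare = TRUE, standardized = TRUE)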
Now let's take a look at the global fit indices. In general, what these fit indices tell you is how well your theoretical factor model is able to reproduce the correlation matrix from the actual data. You basically want a couple of things in here, and I've actually preloaded this write-up to save time. The chi-square I'm getting now is a little bit different, so I need to update these values; I think my saved values were from when the items for the third factor were not reverse-scored, but as you can see it doesn't change things a whole lot. So the chi-square is significant; that's still not good. The SRMR and RMSEA are around .10, so that's still not good either. The CFI is like .74, and let's get the AIC and BIC as well; these two fit indices are going to be very useful when you compare models, so let's grab those values while we're at it, since they'll be relevant when you do the model comparison. Basically, everything here is pretty bad, and none of it is acceptable.

So let's sort the modification indices to get the highest values at the top, and set a minimum value of about 10, which corresponds roughly to p < .001. Basically, when it comes to taking these into consideration, you want to look at the suggestions with the largest modification indices, especially the ones with p less than .001, and with expected parameter change, or EPC, values that are large; for the standardized EPC values, I'd treat anything bigger than about .25 as pretty large. So in addition to the modification index itself, you want to take the standardized EPC into account too. The nice thing about this output is that it gives you the actual code for revising the model: Q14 ~~ Q23 basically means correlating the residuals for those two items, and if you do that, the modification index says the improvement would be really, really significant; remember, all you need is about 10 for p < .001. If you look at the standardized EPC values, that pair is large there as well; anything above .25 you probably want to consider in terms of effect size, so item 14 with item 23, item 21 with item 14, and so on.

This is where it's really a disadvantage that I don't have the actual items corresponding to the specific item numbers, so I don't really know what item 14 says. But assuming that I did know what items 14 and 23 say, basically what I would want to do is revisit those items, reread them, and then ask myself: is there any particular reason to believe that these two items have a shared variance over and above the commonality they already show because they load on the same factor? If there isn't any theoretical argument supporting that, then don't just change the model based on the modification indices or the effect size; you definitely need theory in this.
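A sketch of pulling the global fit indices and the sorted modification indices from the fitted model:

# Global fit indices for the write-up.
fitMeasures(fit, c("chisq", "df", "pvalue",
                   "cfi", "rmsea", "srmr", "aic", "bic"))

# Modification indices sorted largest-first, keeping only values above 10;
# the EPC columns give the expected parameter change.
mod <- modificationindices(fit, sort. = TRUE, minimum.value = 10)
head(mod)   # rows like "Q14 ~~ Q23" suggest correlating those residuals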
Just from looking at this, the thing that could give me the most bang for my buck, since I don't really know what's in here, is to discard item 23, because we have a bunch of items loading onto that factor anyway; I think it was loading on familiarity. This may reduce, say, the predictive ability of that factor itself, but this is just for the sake of the demonstration; I want to show you some of the things to consider after you run the model. So in this write-up over here, this is basically what it says: the largest modification index suggested that items 14 and 23 should have correlated residuals; in addition to that, items 21 and 14, and items 1 and 30, also had modification indices above the critical value.

Now, let's say I decide after inspecting the items whether I can justify the revision. One rationale I typically worry about is wording effects, which usually come from items that are reverse-scored. Say, for example, there is an item on this measure that says "I love using computers" and another item that says "I hate using computers": those two items may share unique variance with one another for no reason other than the fact that they're just the inverse of one another. So they could have wording effects, especially if you have something that's just stated the other way around. There could be other theoretical reasons as well, and in your assignment you'd provide a theoretical reason for why you think a certain pair of items should have correlated residuals. So you can select a rationale if you think you have one; if you don't, and you think there is a problem item that's causing multiple issues, you could say something like what I put in here: "However, we were not able to locate or identify any compelling, theoretically based arguments" to justify the revisions. So, given that I couldn't justify the correlated residuals I'd spotted, I decided to discard item 23 and re-fit the factor model. I'm going to run this one more time, this time without item 23, because that might just take care of these correlated residuals; remove it from the model object, and that should be it. Okay.

We can also take a look at the residual correlations, so let me show you what those are. Without going too deep into the math, you basically need to remember that factor analysis tries to express each variable as the sum of two portions: a common portion and a unique portion. The common portions of all the variables are, by definition, fully explained by the latent factors that you specified in the model. The unique portions, ideally, are perfectly uncorrelated with one another, because if they do share any common variance within the same factor, it should have been explained by the factor. The degree to which a given data set satisfies this condition can be judged, in a very rough analysis, by looking at the residual correlation matrix: the residual correlations are the correlations that would have to be allowed among the unique portions of the variables, as opposed to the common portions, in order to make the model reproduce the observed data. If you have really high residual correlations over here, that basically means an increased likelihood that your model is not doing a very good job of explaining how your items covary with one another, and that's essentially what factor analysis is. To get these correlations, the syntax is a little bit more involved: we set type equal to "cor", so we're asking for correlations instead of covariances, and then let's View the result.
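A sketch of pulling the residual correlations; lavaan returns a list whose $cov element holds the matrix.

# Leftover correlations the factors failed to explain.
resid_cor <- residuals(fit, type = "cor")
View(resid_cor$cov)

# Standardized residuals behave like z-scores, so large entries flag item
# pairs the model reproduces poorly (e.g. |z| > 1.96 is p < .05).
resid_z <- residuals(fit, type = "standardized")
View(resid_z$cov)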
So we're picking the residual correlations out of this output. You can see the residual correlations right here, but I think a more useful way to examine this is to look for correlations that are really large, the ones that are statistically significant, and to do that we can transform those residual correlations into z-scores. So now we have these values: anything larger in magnitude than 1.96 is significant at the p < .05 level, anything larger in magnitude than 2.58 is p < .01, and a z-score larger in magnitude than 3.29 is p < .001. Based on the modification indices, the culprits were potentially items 23, 21, and 14, and if you look at item 23, its residual correlations are fairly large numbers right here. Let's compare item 23 with, say, item 27: the z-scores for item 27 are small, while for question 23 they are pretty large, so this is pretty consistent with what we were seeing in the modification indices.

So, assuming that there isn't any theoretical rationale for correlating any of those residuals, because I don't have access to the items right now, let's just get rid of item 23 and see if that significantly improves the fit. Great; it's the same process as before, just fitting the revised model right here. Okay: the chi-square with its degrees of freedom is still significant. The reason why I still report the chi-square over here with the degrees of freedom is that it's a pretty traditional fit index that a lot of people know about, but in recent years it's kind of fallen out of favor, because its significance is impacted by a lot of different things, such as sample size and the assumptions that go into it. So it's typically reported in papers, but I don't weight it heavily in interpretation, and I do explain why in the write-up. The RMSEA is .10-something, so that's not great; the SRMR is about .075; and then there are the AIC and BIC values. Basically, the model that has the lower values for AIC and BIC is the model with the significantly better fit, and really any difference larger than 10 points would be considered meaningful. Comparing the AIC values of the two models, the difference is easily more than 10 points, and the same thing holds for the BIC as well.

And so the write-up would look something like this: in order to conduct a direct model comparison of the original and revised three-factor models, we used the AIC and BIC, which are information-theoretic indices of comparative goodness of fit between models; lower values are better, and differences equal to or larger than 10 are the ones that matter. Then I start plugging in the CFI and RMSEA; the CFI is like .7-something, and I actually updated the SRMR. More importantly, inspection of the AIC and BIC values suggests that the revised three-factor model has significantly better model fit compared to the original three-factor model. All together, these global fit indices support better fit for the revised three-factor model over the original three-factor model.
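A sketch of the revision and comparison, reusing the placeholder factor structure from earlier with item 23 dropped:

# Revised model: identical except Q23 is removed from its factor.
cfa.model2 <- '
  aversion    =~ Q6 + Q7 + Q9 + Q15
  attitudes   =~ Q1 + Q2 + Q3 + Q4
  familiarity =~ Q14 + Q21 + Q30
'
fit2 <- cfa(cfa.model2, data = noout)

# Side-by-side fit; lower AIC/BIC by 10 or more points is the rule of
# thumb used in this video.
fitMeasures(fit,  c("cfi", "rmsea", "srmr", "aic", "bic"))
fitMeasures(fit2, c("cfi", "rmsea", "srmr", "aic", "bic"))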
Once you're done running everything, you would knit it all together using the R Markdown Knit function; I'm just checking to see that it looks right. So there you have it: that's how you run a confirmatory factor analysis in R. I hope you had fun watching this video. Try to do this on your own using the practice data set, based on this video, and see if you can reproduce the results over here; once you're done with that, you can go ahead and try the assignment. So have fun with this, let me know if you have questions, and stay tuned for my next video.
Info
Channel: Eu Gene Chin
Views: 3,200
Rating: 4.8888888 out of 5
Keywords:
Id: dmnFv-cVKvI
Length: 61min 2sec (3662 seconds)
Published: Mon Aug 13 2018