Confirmatory Factor Analysis in R using lavaan

Captions
Okay, so the purpose of this is to show you how to implement the confirmatory factor analysis exercise that I have in SPSS, but in R. I've downloaded the SPSS files in a zip file, so I'll just extract them and have a look inside. What we have is a data description that talks about the 25 personality test items; we have some metadata listing the 25 items; and we have SPSS data files, split into an exploratory file and a confirmatory file. In some cases you might run analyses and modification indices on the exploratory file to refine the items, so you get a final structure which you then evaluate on the confirmatory data file. We also have the instructions, which describe what we're going to do: essentially, the aim is to fit a number of different models to this data set (a one-factor model, five-factor uncorrelated, and so on), have a look at the modification indices, and create a table of model fits that includes the key fit statistics.

To do this in R I'll be using my ProjectTemplate approach. I've talked about this approach on my blog and in my workshops; there's a post that talks all about it, specifically about my customised version of ProjectTemplate. So I'll download a copy of that, extract the zip file, and give the folder an appropriate name like "r-confirmatory-factor-analysis-exercise". I copy that name onto the .Rproj project file, which will allow us to open it quickly in RStudio. We then want to move the data files into the data folder: these are the two SPSS files, so we copy them in and give them the names we want. As a convention I'll call the confirmatory data file "ccases" (the confirmatory cases), and I'll run the analyses on that file. We also have the metadata; as a convention I call my metadata file "meta", and ProjectTemplate will automatically import it. It contains information about factor scoring, item text, and which items are reverse scored.

So we've set a few things up. If we double-click on the .Rproj file now, assuming you've got R and RStudio installed, RStudio should open, and the nice thing about opening RStudio through this project file is that you will be in the correct working directory; you can see in the Files pane that you are in that folder. The first thing I want to do is go into the reports folder, into the Rmd file; this is where I'll work from. If I run library(ProjectTemplate) and load.project(), it does a whole pile of things (you can learn more in my post about ProjectTemplate), but essentially it loads the data, runs any initial data manipulations, imports relevant libraries, and configures relevant options. We can see that ccases and meta.personality have been added to the global environment. If I look at the first few rows with head(ccases), it looks like the data file we just saw in SPSS: six rows, with the variables a1 through o5 plus a few demographics. We can also look at the dimensions: 1,236 cases and 32 variables. Okay, that's good.
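As a rough sketch of that initial loading step (assuming ProjectTemplate is installed; the object names ccases and meta.personality follow the conventions described above):

```r
# Load ProjectTemplate and run the standard project start-up:
# reads files in data/, runs the munge/ scripts, loads the configured libraries
library(ProjectTemplate)
load.project()

# Quick checks on the confirmatory data file
head(ccases)   # first six rows: items a1-o5 plus demographics
dim(ccases)    # number of cases and variables
```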
The next thing we might want to do is a few data manipulations. It's probably just a habit, but I like to make the variable names lowercase; I find it easier to type, and R is case-sensitive. So names(ccases) <- tolower(names(ccases)) makes all the variable names lowercase. Notice I'm putting this in a file called 01-something in the munge folder; "munge" is a term that refers to manipulating and cleaning data. The reason it's in a dedicated file is that this process occurs within the overall loading of the project: whenever you want to analyse your data, the first step is to load the data and manipulate it, and then you can start analysing it, so running that one load.project() command gets you set up and ready to go.

Another thing I like to do is create a variable list. It's an empty list at the moment, but I'd like to have the items in there: a1 through o5. If we look at meta.personality we have the names of the items there as well, but they're in uppercase, so I want to make them consistent. I'll actually go into the Excel spreadsheet and make those names lowercase (=LOWER(), then Edit, Paste Special, paste the values). Now if I rerun load.project() and reload meta.personality, we've got lowercase names and I can copy them into the variable list. We seem to have a few extra entries, so let's check what's going on: the dimensions show 31 rows, so it looks like it's importing some extra blank rows from the spreadsheet. Let's open it up, clear those rows out, save, exit, and re-import; there we go — somehow those extra rows were coming in for some reason.

So now we have the personality test item names stored in a vector, which is handy, and we can run a quick factor analysis on that data: that's ccases with the variables in that item vector, and we'll ask for five factors, because that's what we believe is there, and set the rotation to promax. We can store that as fac1 and print it; under the loadings you can set a cutoff and sort, so with a cutoff of 0.3 the items are sorted and we can see that there aren't too many cross-loadings: items seem to be loading on their primary factors. That's a basic exploratory factor analysis. We could also look at something like the scree plot for this scale — this is from the psych package — and we can see one big factor, then a second, third, fourth, fifth, and arguably a meaningful sixth before the scree really begins. That's an interesting tidbit, but it's not really the purpose of today; you could go down the track of exploring a six-factor model. I should mention, while I'm up here, that under config there is a file called global.dcf, and I have specified there which libraries are loaded by default; the psych package is already among them.
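A minimal sketch of the munge script and the quick exploratory check (the file name, the list name v, and the use of factanal are assumptions about what's on screen; psych::fa with a promax rotation would work just as well):

```r
# munge/01-munge.R -- run automatically by load.project()
# Make all variable names lowercase for easier typing
names(ccases) <- tolower(names(ccases))

# Variable list; the 25 items follow the a1-a5, c1-c5, e1-e5, n1-n5, o1-o5 naming
v <- list()
v$items <- paste0(rep(c("a", "c", "e", "n", "o"), each = 5), 1:5)

# Quick exploratory factor analysis with five promax-rotated factors
fac1 <- factanal(ccases[, v$items], factors = 5, rotation = "promax")
print(loadings(fac1), cutoff = 0.3, sort = TRUE)

# Scree plot from the psych package
library(psych)
scree(ccases[, v$items])
```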
That's why the scree plot was available right away, along with lattice and Hmisc, which I sometimes use. The next step is to add the lavaan package. If you don't have it installed you'll have to run install.packages("lavaan"), which will download it off the internet and add it to your library, or you can go to the Packages tab in RStudio, click Install, and type the name. Lavaan is what we'll be using from this point, so let's have a look at the lavaan project website. It's a really great place to get started with lavaan: there are tutorials, and the first example is a CFA, showing you how to define a model, fit the model, and summarise the model. So let's use that as a template.

Okay, that's a good start. The idea is that we're going to fit five models. Before the five-factor correlated model, let's start with a one-factor (global) model. If we look at the names of the items, we've got 25 items, and I'll fit a global factor model to all of them. I'll call it m1 — actually, let's create a list called models and store this as models$m1, so we can keep all the models together in one place. The model is defined as a text string of lavaan syntax: on the left we have the factor, and then we say it is measured by these indicators. So I'm going to say global, and then we have to list each item: a1 + a2 + a3 + a4 and so on. Rather than typing all 25, a little trick: we can paste the item names with collapse = " + ", and there we go, that creates the string.

I believe that, under the default constraint, the loading of the first item is fixed to one, and therefore the direction of the global factor is driven by the first item. So if we want to ensure the global factor represents, say, a "good" personality — agreeable, conscientious, extraverted — we want to check that a1 is a positively worded item. If we check, a1 is "Am indifferent to the feelings of others", so it isn't. I think we can formally specify the marker item by writing one times a positively worded item (1*a3), and that changes things; okay, I'll tidy that up.

We're then going to fit the model, models$m1, using the cfa() function — which is from lavaan, obviously, and lavaan isn't currently loaded, so I'll load it and rerun. I'll also create a list of fits, and store this as fits$m1. Oops, I've used the wrong data file; it should be ccases. There we go, we've got fits$m1. Nothing is printed, but let's put it into summary(), and that's much better: you can see we have a chi-square test statistic, a basic set of fit indices, RMSEA and so on, and none of them are very good. At first it looks like my approach of constraining a3 to one did not prevent a1 from having a positive loading — there are probably other ways of doing this — but after rerunning and checking, a3 was indeed constrained to one, and a1 now shows a negative loading. So if we want to interpret this, we can certainly say that the fit is not great, but we can also see the loading of each item on the global factor. By default it looks like we've got unstandardised estimates, which are fine.
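A sketch of that one-factor model, assuming the item-name vector v$items from the munge step and the models/fits lists described above (the 1*a3 term reflects the choice of a positively worded marker item discussed here):

```r
library(lavaan)

models <- list()
fits <- list()

# One-factor (global) model: build the lavaan syntax string from the item names,
# fixing the loading of a positively worded item (a3) to 1 as the marker
models$m1 <- paste("global =~ 1*a3 +",
                   paste(setdiff(v$items, "a3"), collapse = " + "))

fits$m1 <- cfa(models$m1, data = ccases)
summary(fits$m1, fit.measures = TRUE)
```

With a3 fixed to 1, negatively worded items such as a1 should come out with negative loadings, which is what is observed after rerunning.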
We may want the standardised estimates, though. I have a little lavaan cheat sheet, which is quite nice: you can get the standardised solution with standardizedSolution(), passing the fit object as the argument, and there you go, you've got standardised estimates of each item's loading on the global factor. You can see that every item loads, some more, some less; the conscientiousness items are quite high, and most of the standardised loadings are pretty high, although a few of the agreeableness items are weaker. Okay, so we've successfully fit a global factor model.

So what about a five-factor model? How do we get that? Let's call this the five-factor model. I think we have to specify the factors but not the correlations between them in the cfa() call, which is a little bit different to what's done in AMOS. What we need to do is say something like agreeableness =~ and its items, then copy that down, break it up, make it pretty and aligned, and add the relevant items for conscientiousness =~, extraversion =~, neuroticism =~, and openness =~. As I said, we need to check that the first item is positively worded for each scale: for conscientiousness, "Am exacting in my work" looks good, that's conscientious; for extraversion, "Don't talk a lot" is not so good and "Find it difficult to approach others" is introversion, whereas "Know how to captivate people" works, so we'll just move e3 into the first position so that it gets the constrained loading of one and the directionality is captured; "Get angry easily" is fine for neuroticism, and "Am full of ideas" is good for openness.

So now we've got model 2, and I guess model 3 has the same measurement model, so for consistency we'll just use the same syntax. What we want for m2 is the version with uncorrelated factors; we're going to use the orthogonal argument. You can read the help on this, but by default orthogonal is FALSE, which allows correlated factors; if you set orthogonal = TRUE, the factors are uncorrelated. So we have the five-factor orthogonal solution as model 2, and the five-factor correlated solution as model 3.

We can then repeat the fitting process for the other models. So there's the summary of model 2, with its various bits of information, and likewise we do the same for m3. At this point you probably want to start comparing the fits of these three models. One way, of course, is to run summary(), find the chi-square (5460.242 for one model), run it again for the next (3087.668), and so on. You can do that, but it's not a very R way of doing things. Still, you can see that the chi-square is being reduced substantially at each step, which is good. So what's the R way of doing it? We go back to the cheat sheet I pulled up before: how do you extract specific fit measures? There's a fitMeasures() function, so fitMeasures(fits$m1) returns a named vector of relevant information. Let's define the set of fit measures we're interested in and which ones we want to pull out: I'll take the names and put them through dput(), which gives a nice little string I can reuse.
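A sketch of those two five-factor models (e3 placed first so the extraversion marker is positively worded, as discussed above):

```r
# Standardised loadings for the one-factor model
standardizedSolution(fits$m1)

# Five-factor measurement model; the first indicator of each factor is its marker
models$m2 <- '
  agreeableness     =~ a1 + a2 + a3 + a4 + a5
  conscientiousness =~ c1 + c2 + c3 + c4 + c5
  extraversion      =~ e3 + e1 + e2 + e4 + e5
  neuroticism       =~ n1 + n2 + n3 + n4 + n5
  openness          =~ o1 + o2 + o3 + o4 + o5
'
models$m3 <- models$m2  # same measurement model for the correlated version

fits$m2 <- cfa(models$m2, data = ccases, orthogonal = TRUE)  # uncorrelated factors
fits$m3 <- cfa(models$m3, data = ccases)                     # correlated factors (default)

summary(fits$m2, fit.measures = TRUE)
summary(fits$m3, fit.measures = TRUE)
```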
That's the full set of things I could potentially want to extract, and I'll store it somewhere general — a vector of the fit indices I want to keep. Let me remind us what each of these is doing: we might want the number of parameters in the model, the chi-square, the degrees of freedom, and the p-value (which is probably not that interesting, but I like to include it out of habit); I also want CFI and RMSEA, as well as the RMSEA confidence interval, and I quite like SRMR as well. There's lots of opinion about what the best fit measures are, but that's the set I'll use.

So we now have these fit indices. The point is that when we run fitMeasures() we can store the result, say as x; x is a named vector, so if I want to pull out just the ones of interest I can index it by those names. Now we've just got the measures of interest, and I'll round to three decimal places. That's starting to look like the set of indices we might want to pull out for each model.

This is a nice chance to make a function. It can take a fit object (which would be something like fits$m1), an argument saying which fit indices to extract (by default the list we just defined), and digits = 3 to control the rounding. All we're going to do is take the code we just wrote and make it general: replace the literal fit object with the general argument fit, use the fit indices argument, and round to digits. We run that, the function is loaded into R, and we can now call it on fits$m1 and it returns the fit measures. I'll move this into the lib folder so it's always available; that way it will automatically be loaded every time I reload the project.

Another nice thing, now that I've stored the fits in a list: I could call the function one model at a time, but better practice would be to use something like sapply. Because the fits are in a list, I can apply a function to each fit object: I pass the whole list and write function(x), where x is the particular fit, so it's sort of like writing a loop. So now we have something quite nice: a table of fits which shows quite clearly the number of parameters for each of the three models, the chi-squares, and the fit indices, and it's very easy to read off. Notice how RMSEA has gone from 0.124 in the one-factor model, to 0.091 for the five-factor uncorrelated model, to 0.080 for the five-factor correlated model. So we've gone from a dreadful model — or at least a model that makes no pretence of representing the structure in any comprehensive way — to models approaching something reasonable. This also highlights the benefit of storing the fits in a list and extracting from them with a function. We could probably do something similar with the standardised solutions to compare loadings.
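A minimal sketch of that helper, saved in lib/ so load.project() loads it automatically (the function and object names here are illustrative, not necessarily the ones used in the video):

```r
# lib/calc-fit-measures.R
# Fit indices to report for each model
fit_indices <- c("npar", "chisq", "df", "pvalue", "cfi", "rmsea",
                 "rmsea.ci.lower", "rmsea.ci.upper", "srmr")

calc_fit_measures <- function(fit, indices = fit_indices, digits = 3) {
    x <- lavaan::fitMeasures(fit)   # named vector of all available fit measures
    round(x[indices], digits)       # keep and round just the ones of interest
}

# Apply the helper to every stored fit to build a comparison table
sapply(fits, function(x) calc_fit_measures(x))
```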
Okay, so what other models do we want to fit? We've fitted a one-factor model, a five-factor model with uncorrelated factors, and a five-factor model with all the factors intercorrelated; we haven't fitted a data-driven model, and we haven't fitted a higher-order model. So let's explore the idea of using modification indices to justify some data-driven modifications. Going back to the section on model improvement, we see that there's a function called modificationIndices(). Let's start from that five-factor correlated model, which seems to be pretty good, and run the function on that fit. We can see that we get a lot of modification indices; part of the problem is that they're of different types: some are cross-loadings, some are correlated residuals, and I think there may be others as well. So you want, in a sense, to look at particular categories of modifications, and also to sort them by size; that's a convenient way to show the largest modifications. We'll store the result, and looking at its structure it has a bunch of columns, but mi is the modification index, so we'll order the data frame by decreasing mi and show the first ten rows. What you see is the improvement in the chi-square that would be achieved by adding a particular parameter, ordered by size. It seems that adding a correlated residual between n1 and n2 would substantially improve the model, as would adding a few different cross-loadings.

So let's have a look at meta.personality and pull out the rows for n1 and n2. We see that the two items are "Get angry easily" and "Get irritated easily"; they're clearly very similarly worded items. Compare that to the other neuroticism items, like "Have frequent mood swings", "Often feel blue", "Panic easily": there's something more similar about those two items, and so we're saying that the commonality of those two items is greater than what is shared in general within the factor, and we could have a better representation of this test if we allowed those two residuals to correlate.

To do that we take the existing model, call the new one model 4, and we need to allow those two item residuals to correlate, so I'll add a comment saying "correlated residuals" and a line saying n1 is correlated with n4. If we go to the lavaan website tutorial, you see that for covariances and variances you use the double tilde (~~) notation, for factor loadings you use =~, and for regression or prediction equations you use a single tilde (~); so we use the double tilde here to say the residuals of these two items are correlated. We then fit that model and rerun the sapply summary, and we have one more model in our table. We can see that by adding that correlated residual the chi-square has improved; just doing an anova() of m3 and m4 we can see the change more explicitly, and the chi-square has changed by 48.6. Oops — going back, I've clearly added the wrong line: I should have correlated n1 and n2. If I make it n1 ~~ n2 and rerun, then we see that the model has improved by 148.91, which, when we look back at the modification indices, still isn't quite the same value. That's intriguing.
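A sketch of that step (the item-name column in meta.personality and the mod_m3 name are assumptions):

```r
# Largest modification indices for the five-factor correlated model
mod_m3 <- modificationIndices(fits$m3)
head(mod_m3[order(mod_m3$mi, decreasing = TRUE), ], 10)

# Inspect the wording of the two flagged neuroticism items
# ("variable" as the item-name column in the metadata is an assumption)
meta.personality[meta.personality$variable %in% c("n1", "n2"), ]

# Model 4: five-factor correlated model plus one correlated residual
models$m4 <- paste(models$m3, "
  # correlated residuals
  n1 ~~ n2
")
fits$m4 <- cfa(models$m4, data = ccases)

# Chi-square difference test between the nested models
anova(fits$m3, fits$m4)
```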
I was expecting the change in chi-square to be the same as the modification index, so that's something for me to understand better; certainly the model has improved substantially. Okay, so that's an example of a correlated residual. Let's try an additional modification. You can either rerun the modification indices, which is probably advisable — in general it's advisable to make one modification at a time, though sometimes you can do a couple in a batch. Now it's suggesting that maybe e5 should load on conscientiousness, or o4 should load on neuroticism. If we look at meta.personality, e5 is "Take charge", so perhaps that has elements of conscientiousness — hard-working — as opposed to making friends and knowing how to captivate people. Anyway, we could think about this a little more, but if we wanted to make that modification (and whether to is always an issue), we could add e5 to the conscientiousness factor. We call this model 5, fit that object, and then rerun the sapply to get the full table. We see that we've now got RMSEA down to 0.076, so it's starting to approach the rules of thumb that would suggest a reasonable model, and certainly if we kept adding a large batch of additional cross-loadings and correlated residuals and so on, we could probably get it to a "good" model. Whether that's appropriate, and what it says about the test, is a much broader theoretical question, but that's the process by which you can introduce additional modifications, and in particular notice the checks I've done to make sure these modifications make theoretical sense.

Okay, so the final thing is to fit a higher-order global model. I've never done this one before, but let's see if it's straightforward. For simplicity I'll build it off model 2, because I don't think it really matters to have the cross-loadings in the global model; we'll just keep it simple. What we're saying is that a global factor has these latent factors loading on it. So, fingers crossed: model m6 — I'm not sure whether it will work as is — global is measured by agreeableness through openness, and I'm not sure whether I need any extra arguments here. Can we look at the summary of that model? We can see the chi-square, but what we probably really want is the standardised solution — sorry, I've got this wrong — okay. We now have the standardised loadings of each factor on the global factor. Neuroticism loads negatively, which is as we would expect; agreeableness, conscientiousness, extraversion, and openness all load in the positive direction. It's interesting to notice that extraversion happens to load most strongly, then agreeableness, and the other three are a little bit smaller.

We can then produce the summary of these fits — oops — and that's effectively our table of results. It's interesting to see what it says: for instance, how m6, which has a single global factor accounting for the factor correlations, compares to m3, which estimates the correlations freely. What we see is that m6 does have a larger chi-square, but it's also more parsimonious: it only took five parameters — loadings on the global factor — rather than the ten correlations between factors. Looking at CFI, it's still a little bit lower; RMSEA is pretty much the same; SRMR is a little bit higher.
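A sketch of those last two models, building on the objects above (whether model 5 is built on m4 or m3 isn't entirely clear from the audio, so treat the details as assumptions):

```r
# Model 5: add the suggested cross-loading of e5 on conscientiousness
models$m5 <- paste(models$m4, "
  conscientiousness =~ e5
")
fits$m5 <- cfa(models$m5, data = ccases)

# Model 6: higher-order model with a global factor above the five factors
models$m6 <- paste(models$m2, "
  global =~ agreeableness + conscientiousness + extraversion + neuroticism + openness
")
fits$m6 <- cfa(models$m6, data = ccases)

standardizedSolution(fits$m6)                   # loadings of each factor on the global factor
sapply(fits, function(x) calc_fit_measures(x))  # updated comparison table across all models
```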
So it looks like the global factor isn't quite as good as allowing all the correlations, but it goes a fair way toward representing the pattern of correlations. In fact, if you wanted to export this summary table to, say, Excel or Word or wherever you're reporting, we could run write.csv() on the summary table with a file argument. I usually write to the output folder, as a consistent place to store output, so something like output/summary-table.csv, and then we open that up — it generally opens in Excel — and you could report your models this way. Actually, I think it might be nice to transpose the summary table; yes, I think that's a bit nicer. So now we have a summary table like that, and you could add labels like "one-factor model", "five-factor uncorrelated", "five-factor correlated", "five-factor correlated with one modification", "with two modifications", and "global factor model". You can then tidy it up: think about what decimal places you want (for chi-square, presumably one decimal place is enough; for CFI I quite like the custom format .00 so values are shown without a leading zero, which is appropriate since it's always less than one), make the column widths a bit nicer, fix up the RMSEA columns, and obviously add some borders — a top border and a bottom border — to make it an APA-style table. You should probably save this as a file called something like output-processing, just to store these manipulations: you copy your CSV in and make your edits there. (It seems like I've gotten too nested in my folder structure to save this file properly, which is kind of funny.) Then you copy that into a Word file for your report, and there you go — I guess you'd autofit it to the window or whatever. Okay, so I think that's all I've got for tonight. So there you go: that's how you fit confirmatory factor analytic models with lavaan.
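As a final sketch of the export step described above (the object and file names are illustrative):

```r
# Build the fit table with models as rows, then write it to the output folder
summary_table <- t(sapply(fits, function(x) calc_fit_measures(x)))
write.csv(summary_table, file = "output/summary-table.csv")
```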
Info
Channel: Jeromy Anglim
Views: 15,879
Rating: 4.7473683 out of 5
Keywords: statistics, rstats, lavaan, confirmatory factor analysis, factor analysis, R (Programming Language)
Id: gcrXP2yMYY8
Length: 54min 42sec (3282 seconds)
Published: Fri Jul 31 2015