R - SEM - Confirmatory Factor Analysis Class Assignment

Captions
Next we have CFAs. CFA stands for confirmatory factor analysis, so this might be the follow-up to an exploratory factor analysis. This will be our first try at CFA, and we're looking at how to code them in lavaan, how to create pictures of them, how to work with the standardized solution, and how to work with a full data set this time, since all the other class assignments have used correlation or covariance tables.

We're going to use a scale called the Computer Aversion, Attitudes, and Familiarity Index, the CAAFI, by Schulenberg, who's a friend of mine. All of these questions are about different issues with computers: I don't like them, I think they're a good thing, I'm familiar with using them. What we're going to do is program the three-factor model that he found in his original exploratory analysis, then try a one-factor model, and test whether those models are different from each other. Then we'll look at some residuals (this should just say residuals here), standardized residuals, modification indices, loadings, and fit indices.

So let me pull up RStudio. Let me close everything out; there's some stuff still open from working on the last video. All right, let's start a new script. Remember, the first rule is always to load your libraries, so we're going to use lavaan and semPlot to create the pictures of the analysis.

The first thing I want to do is load the data. This time I have real data instead of a correlation or covariance table, and it's saved in a CSV file, so I can use the Import Dataset option: Import Dataset from Text File. All of these files will be available soon on our website, statstools.com. Let's see what I have here... structural equation modeling assignments... and there we go, CFA basics. Now, when you use this window, make sure you always check down here where it shows you how the data frame is going to come out. I'm going to rename mine to something simpler, like data, but you want to make sure the first row, where all your variable names are, actually holds the variable names you want to use. So I'm going to turn on Heading so I get q1, q2, q3 as column names, and not character columns where "q1" is the first row of data. SPSS imports files in the same type of way: you have to make sure it realizes that the first row is the names of the variables and not just another row of data. There are lots of other options, but I don't need them, so I'll hit Import. And then I always cheat, because I can never remember the paths for things; you could set your working directory instead of using a very specific path, in case you wanted to move the file, but for right now this works for us. All right, my data set is imported.
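If you'd rather skip the Import Dataset window, here's a minimal sketch of the same setup in code; the file name is a placeholder for wherever you saved the CSV.

library(lavaan)    # estimates the CFA models
library(semPlot)   # draws the model diagrams

# header = TRUE keeps the first row as the variable names (q1, q2, ...)
data <- read.csv("cfa basics.csv", header = TRUE)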
Now I want to create the models, so let's start with the three-factor model. The three-factor model has a specific set of questions, so let me copy that from Word, go back to RStudio, and temporarily paste it in to build my three-factor model.

So: familiar is approximately... I'm going to use =~, equals plus tilde, which is new. What is that? If you've been watching these videos in order, so far we've just been using ~, which says this Y variable is approximately these X variables added together. What we're doing now is creating a latent variable. With confirmatory factor analysis, you're trying to show that there are specific latent variables that affect the questions, the manifest variables, or indicators, depending on which book you're reading. It's basically saying that computer familiarity is what causes the answers to my questions. So I want to build familiarity out of these questions, but since this is a confirmatory factor analysis we want the model to be reflective, so that the latent variable predicts the questions. The =~ operator lets us say that yes, familiar exists, and it is approximately equal to all of these other things added together. So not only does this create a latent variable that isn't in the data set already, it also says that variable predicts all of these other ones.

I'm going to type in my question numbers, and if I look at the data they're labeled fairly simply: q1, q2, and so on. Looking down here at what the items are, it's q3 plus q13 plus q14 and q16. The spacing doesn't matter; I'm just trying to keep it nice and neat. Then q20 through q23; it would be great if you could type q20:q23 like in base R, but you cannot, so you have to type each one individually. Oops, don't leave off a number or it'll be very confused. One thing you want to make sure you don't do here is name a variable with just a number. I don't think R will normally let you do that anyway, because it thinks of numbers as numbers, so I can't just write 1 + 2 + 3 + 4 + 5 like you might want to, because it would assume I literally meant to add the numbers one through five, and it would think you were trying to add some sort of intercept. Starting the names with q, like q1 and q2, helps; in general, starting a name with a number is not a good idea, and you actually cannot name variables solely with numbers. I would say "tell me when I make a mistake," but this is a recording. 29... oops, q29. So that one's done.

Now, aversion here is approximately equal to question six plus question seven plus... there we go. Notice that I don't have to include the correlations between the latent variables that traditional CFAs include; remember that lavaan will do that for you. So this is my three-factor model. Oh, I didn't get rid of that uppercase Q; there we go. Now, to program a one-factor model I'm going to do the exact same thing, except this time everything will be computers. I'll copy this rather than make you watch me type it all again, delete the factor names, and put all the questions on one line, because I want them all to be predicted by the same variable, which I'll just call computers. So we're saying there might be a three-factor model, with familiarity, attitudes, and aversion, or it might be one giant one-factor model. And with that, my models are created.
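A sketch of the two model specifications. Only the items actually read out on screen are listed here, so the item lists are abbreviated; substitute the full CAAFI item lists from the assignment.

# three-factor model from the original EFA (abbreviated item lists)
three.model <- '
familiar  =~ q3 + q13 + q14 + q16 + q20 + q21 + q22 + q23 + q29
attitudes =~ q8 + q18 + q19
aversion  =~ q6 + q7
'

# one-factor model: the same questions predicted by one latent variable
one.model <- '
computers =~ q3 + q6 + q7 + q8 + q13 + q14 + q16 + q18 + q19 +
             q20 + q21 + q22 + q23 + q29
'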
The next thing to do is run the models. Let's see what I asked for, because I can't remember: we're going to include a picture with the standardized loadings, then check for some different things, Heywood cases and the squared multiple correlations, and then look at the loadings, the residuals, the modification indices, and the fit indices. We can do those one at a time.

So let's start by running the models. We've got three.model, so let's make three.fit and one.fit, and this time we're going to use the cfa() function. The cfa() function is a little different from the sem() function, but not by a lot. When we look at sem() in the lavaan library there are lots of different options; we've been working with models and covariance structures and fit indices, that sort of thing, and cfa() looks pretty much the same. But since we're doing a CFA, it helps to use the cfa() function, because that will help you remember what you ran; just know that there isn't really that big a difference between them. The first thing you've got to do is tell it what model to use, so let's give it three.model for three.fit. Then I'm going to do data = data. This time I'm not using sample.cov, because I don't have a covariance table, I have the raw data, so I pass the entire data set. That's all you really need, and that will run the confirmatory factor analysis and give us the three-factor model. Now let's get the one-factor model to run as well. Great.

Let's create two pictures first and see if that did what we expected. I'm going to use semPaths, because it works on cfa()-created models too. I want three.fit, and let's do whatLabels = "std", this time "std" for the standardized solution instead of "par" for the parameter, unstandardized, solution, and the layout that works best here is "tree". All right, it's loading slowly... there we go. It's all crammed down here, so I'll hit Zoom. The first thing it gave me is my three-factor model, and it really squishes the labels together (that's a funny abbreviation, and it's unintentional), but I can look at the standardized solution, which turns these paths out here into correlations. You can see the variances are standardized and set to one. The dashed line means my scaling was set on that manifest variable, so lavaan didn't actually estimate that parameter; it set it to one for scaling, but this is what the answer would be in the standardized solution. The correlations between the three factors are moderate to high, so a one-factor model may fit pretty well because they're so highly correlated, or it may be a second-order model, which is the next assignment; if you're interested in second-order and bifactor models, watch the next video. I can't really read all these paths, so I'd have to look at them individually, but it looks like it did it right; I mean, it looks like what I would expect, with the factors correlated with each other and so on.

Let's make the other picture with basically the same code, just traded out for the one-factor model. There we go: for computers, everything is tied to the one factor. So that looks like it created the pictures I would have expected. All right, I've done this part: create the one-factor model, include pictures of the standardized loadings. Cool.
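In code, running the models and drawing them looks roughly like this; cfa() applies the usual CFA defaults, and semPaths() comes from semPlot.

# fit both models to the raw data; no sample.cov / sample.nobs needed
three.fit <- cfa(three.model, data = data)
one.fit <- cfa(one.model, data = data)

# diagrams with standardized estimates; "par" would show the
# unstandardized parameters instead
semPaths(three.fit, whatLabels = "std", layout = "tree")
semPaths(one.fit, whatLabels = "std", layout = "tree")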
Now, for this three-factor model, let's check for Heywood cases. I'm pretty sure lavaan gives you a warning when you get Heywood cases, but it doesn't hurt to know what they are so you know what to look for. A Heywood case is an improbable solution where your squared multiple correlation, or R squared, is over one. In lavaan it's labeled R-Square, and that's pretty appropriate, because it's the amount of variance accounted for in that question or item, but most other books call it the squared multiple correlation, and if you're using a different program that's what it's labeled in the output. The other thing to look for is negative error variances. A variance cannot be negative, because it's squared in the formula, so that's mathematically impossible. Covariances can be negative, since those are relationships between latents, but error variances cannot, and those are labeled Variances in the output.

So let's do a summary of our three-factor model to answer that question; this will be our summary section. We use the summary() command on three.fit. We've been talking about wanting the standardized solution, so that's standardized = TRUE, which lets me look at the different versions of the standardized solution, and to get the squared multiple correlations the code is rsquare = TRUE.

All right, let me look at this output, because there are a couple of different things here about scaling. If you don't do anything special in the cfa() command, it automatically scales each latent variable by setting the loading of its first manifest variable to one, and you'll see that here. This is the unstandardized solution, and these are interpreted as regression paths: for question 13, every one-unit increase in familiarity gives a 0.88 increase on question 13. That's easy to interpret when I think about how the scale works. The std.lv column instead switches the scaling to the latent variable and forces the variance of the latent variable to be one. Those are just two different ways to scale. My opinion is that unless you specifically need to standardize on the latent variable, don't set that; let it automatically do manifest scaling for you, and then you can get both versions by turning on standardized = TRUE and look at either one. That gives me more options; if I scale on the latent variable I can only see one of them, and I like more information rather than less.

std.all is really useful because it standardizes on both ends. What I think that's useful for is that, since we're doing confirmatory factor analysis, you have probably already run an exploratory analysis, and std.all lets you see everything on the same scale you'd expect from an exploratory analysis. So when you're asking whether questions load over 0.3, which is kind of the informal rule for EFA, this gives you that same scale. Whereas if I look at the unstandardized solution and see 1.3, you'd think you've screwed something up, and you haven't, because that's in the scale of the data: it's a 1.3-point increase when familiarity increases by one. The same problem holds for std.lv, which is sort of like a z-score on the latent side only, so it doesn't necessarily put things on the familiar scale, but standardizing everything gives it to you kind of like EFA, and that's why I like that option. Honestly, there's nothing wrong with any of the options; just tell people which one you used when you write this up, so they understand why the parameters on your picture are what they are.

Okay, anyway, back to the Heywood question. I'll scroll down a little bit. See how this section is labeled Covariances? Those can be negative; none of them are, but they could be. Now I'll come down to Variances: these are my error variances for each question, and I just don't want them to be negative. They can be large, but they cannot be negative, and none of them are, so that's good. The last thing down here is R-Square: how much variance is accounted for in each question by the latent variable.
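For reference, the summary call used throughout this section:

# standardized = TRUE adds the std.lv and std.all columns;
# rsquare = TRUE prints the squared multiple correlations at the bottom
summary(three.fit, standardized = TRUE, rsquare = TRUE)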
While some of them are not the best (we'd want them to be really high, right?), none of them are over one, so we don't have a Heywood case in this instance. You want to check that your SMCs are not over one and your error variances are positive, and then you're good.

The next question is about the item loadings. We would expect questions to load onto their factors, right? You'd hope they would. So let me go back up here; this is what I was saying about why I like std.all. You could check the p-values, and they're all significant, but sometimes that's not super helpful, because with large samples everything is significant even if it's really tiny. Something can be statistically significant but not practically significant, and the loadings are sort of like an effect size, so you can look at those to see if items are indeed loading the way you'd want. You can use the 0.3 rule as a criterion, and these do look like they're loading pretty well. I'd say questions 18 and 19 may not be very good questions for attitudes, although they're still over our informal cutoff, so overall the questions are doing what I would expect; they don't seem too bad. You can also look at R-square, but R-square tells you the same thing the loadings do: items with really low loadings will give you low R-squared values too. This one is only accounting for 14% of the variance. For traditional R-squared there are sort of two sets of rules: either .01 is small, .06 is medium, and .14 is large, or it's .01, .09, and .25. So this one is somewhere between medium and large.

Let's go back to Word here. All right, so: no Heywood cases, not really. Residuals are a way to look at where the fit is not matching, so we can look at a residual correlation table to see what some of the fit problems may be. I'll show you two different ways to do this using the residuals() function, plus a hint on how to make it easier for yourself. Let's do the residual correlations first. I'm going to call this correl and save it, and you'll see why in a minute. I use the residuals() function on three.fit, which is where we saved our CFA, and I want to give it a specific type; otherwise it returns covariances automatically, and covariances are unstandardized, so I don't know how to interpret those or what a bad score would be. So let's do "cor" for correlation.

If I run that, it gives me a list over here: instead of just the correlation table, it actually returns a couple of pieces of information. We could view it by typing correl down here, but when you have 30 questions or more this correlation table is huge, and you'd be scrolling through tons of stuff. The correlation table itself, you can see, is saved under $cor, so what I can do is view just that piece. Remember that View() is one of the weird functions with a capitalized letter; I give it correl, subset from the list with the dollar sign, and ask for $cor, and that shows me the correlation table on its own. If you're using RStudio, the great thing is that it pulls the table up in the editor window and gives you the option to sort the columns. If you do not have that option, update RStudio, because I never had it on my Mac until I updated recently.
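A sketch of those two steps; note that the element name inside the returned list can differ by lavaan version, so check names(correl) if $cor comes up empty.

# residual correlations; the default type is unstandardized covariances
correl <- residuals(three.fit, type = "cor")

names(correl)      # the matrix is stored under $cor (or $cov in some versions)
View(correl$cor)   # capital V; opens the sortable viewer in RStudio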
So I can flip through and look for the bad combinations. Question 3 has a couple of negative pairings with questions, which isn't so good, and then a couple of positive ones, the same issue. I'd scroll through each one looking for the really bad spots... ooh, I saw one there: question 14 and question 23 are not playing along with each other, because look how high that residual correlation is; the model just cannot figure those two out together. The same thing shows up over here with question 21. So it's looking to me like question 23 might be problematic, because there are lots of places where it has a high residual value. Really, this table tells you where in the correlation matrix you might be having trouble with fit: it might be that one particular item has a bunch of really high residuals, or that one factor does, and that gives you clues about where you could edit the model or take a closer look at what's going on. The problem with these residual correlations is that, while they are standardized, they don't really have a cutoff score. I'd say anything over 0.1 is going to be kind of bad, and there were a lot at or over 0.1, but we haven't looked at the fit indices yet, so I don't know whether this model fits.

So let's give it a more traditional view and make them z-scores. Let's do zcorrel this time: residuals() again, the same thing as above on three.fit, but now type = "standardized". There's also another option, "normalized", and there's a discussion of the difference in the help page for residuals() in lavaan; essentially, standardized versus normalized is just a math issue. Most people will use standardized, but if you get a bunch of NAs, the suggestion is to try normalized instead. Standardized means make them z-scores. I'll save that and view it; now it's zcorrel with the dollar sign, and this time my options are type, cov, and mean, so you want to use $cov. Why are the element names different between the two calls? I don't know; they just are.

These are now z-scored, so anything over 1.96 is statistically significant at the .05 level; if you want to be more stringent you could use 2.58 for .01 or 3.29 for .001. I said we'd get some NA values, and look at this one: thirteen, good gracious, that's out of control. I think most of the NA values come from a question paired with itself, which isn't too surprising, because you can't z-score a zero residual... okay, well, it did give me a zero here, so I don't know why the NAs happen sometimes. What I'd look for is a question that reappears a lot with what we'd normally consider significant residual correlations once z-scored, and it looks to me like question 23 is the culprit here: it has a bunch of high values. Some of its values are normal, but most are pretty bad, so I might consider taking question 27 out... I'm sorry, 23. And we knew from before that 19 wasn't fitting very well, so let's look at 19: it has some, but not a whole lot. So the problem might not be the question you expect, let's put it that way. All right, that's how you look at the standardized residuals.
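And the z-scored version, under the same caveat about list element names:

# z-scored residuals; if you get a wall of NAs, try type = "normalized"
zcorrel <- residuals(three.fit, type = "standardized")

# flag cells over 1.96 (p < .05), 2.58 (p < .01), or 3.29 (p < .001)
View(zcorrel$cov)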
Are there any modification indices? We haven't talked about how to get those. They could be part of the summary with modindices = TRUE, but instead I'm going to tell you to run them separately, because then you can add more arguments. If you put it in the summary() function, you're stuck seeing every possible modification index in one huge table; in the separate function you can set some rules. So we want modindices() on three.fit, and let's do sort., where the argument name ends with a period, which is a little odd, but I'm going to use it: sort. = TRUE, so the highest values come first. We'll come back to minimum.value in a second. Run that and give it a moment... I got 561 of them, and lots of them are zeros. Look at all this stuff, so much stuff. Oops, scrolled too high; there you go. It did sort them in descending order for me, and if I look at the MI column (I'll explain what that means in a second) I can clearly see there are far too many. So let's create a criterion: I can use minimum.value and say only give me anything over 30, and that's just because I'm already looking at the numbers and know there are lots of big ones. You could instead use 3.84, which is the chi-square cutoff for one degree of freedom; these are sometimes called Lagrange multipliers. So let me cut some of those out. Basically, the sort argument and the minimum value just help you see only the biggest ones; if you need to see specific paths, that's when you might have to keep all of them and scroll through.

So what is in this output? The first thing it gives you is the actual code you would use to include the modification in your model, so you can cut and paste it, which is nice. This one is a double tilde, ~~, so it's a covariance between question 14 and question 23, and you'll tend to see things with high residuals pop up here, because adding the path tries to account for those residuals. If I added a covariance between questions 14 and 23, I'd get a change in chi-square of 372. So MI is the modification index, which is how much the model's chi-square test statistic would change if you added this exact path to the model. I don't think the rest of the columns are super important, but they're the expected change in the parameter, and then some standardized versions of it; that's the expected parameter change, I think. Generally people look at how big the change in chi-square would be, because usually you're doing this to improve model fit.

Now, it suggests things based totally on math, so you want to make sure they're things that theoretically make sense; at least in the social sciences, we're really focused on theory building and model testing based on those theories. So I don't know whether I should correlate those error terms; I'd need to go back and look at the questions and see whether, okay, the reason we're getting this is that the error terms on those two are really correlated because the items are really similar, and maybe I should consider excluding one of them because they're just way too similar, and that's why we're getting so much shared error there. The other things it gives you are what we'd normally consider cross-loadings. It's saying that question 8 should also load onto aversion, which it currently doesn't; if I go back and look at my model up here, question 8 currently loads on attitudes, so it's saying add a path from aversion to question 8, that it should be on that factor too. There are some folks who say you should never let questions cross-load, meaning load on more than one factor, which is the very traditional EFA simple-structure rule, and some people say it's fine.
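For reference, the modification-index call from this section:

# sort. = TRUE (trailing period included) puts the largest indices first;
# minimum.value trims the table: 3.84 is the one-df chi-square cutoff,
# and 30 is just an eyeballed threshold for this data set
modindices(three.fit, sort. = TRUE, minimum.value = 30)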
For cross-loadings you're just going to need a really good reason, and you have to hope you don't get one of those really staunch reviewers who doesn't like that sort of thing, which, from experience, will likely happen to you either way. All right, so that's how I'd look at the modification indices. I could add those paths to my model creation up here, then rerun the models and the pictures, reusing the code I've already written. So that's how you do modification indices.

The last thing is fit. You can include fit in the summary() command, and we actually haven't run a summary of the one.fit model yet, so let's do that. Let's see whether things load: they all appear to be statistically significant, but look at the numbers here. Ooh, this question is bad; question 21, not so much; and question 23, which should not surprise you given what we've already looked at. I'd argue those are probably bad questions, given the very low std.all loadings they have. I don't appear to have any Heywood cases, since all my variances are positive and all of my R-squareds are less than one, but look at question 21; good gracious, that R-squared is tiny, so we're not doing a very good job with that question. You'll also notice that all the R-squareds tend to be a little lower here, so I imagine the three-factor model is going to fit better.

For the fit statistics, we could use fit.measures = TRUE in the summary() function, or do it separately with fitmeasures(), and I want that on three.fit. I'll pull all the numbers I need from here for the three-factor model. I like separating them out so I can look at one piece of information at a time instead of scrolling through everything to find it, but to each their own on that one.

So I've got the chi-square here: 2874.75 on 402 degrees of freedom, which is clearly going to be significant, so that's not a helpful number. I'll ignore the baseline chi-square, because I don't need it right now. My CFI is .737 and my TLI is .716. I often like to include the NFI too, and you can also include the AIC and the ECVI; pick your favorites, just don't pick any of the goodness-of-fit statistics, because they're terrible. If I come down here to the NFI, it's .708. Scrolling on, the AIC is 73216.719. Here's my RMSEA: .088, with a 90% confidence interval of .085 to .091. My SRMR is .087. And I'll include the ECVI, the last one, at 3.779, because I'm going to do some model comparison here.

Now, which ones should you report? Chi-square, RMSEA, and SRMR are pretty much a given, and especially in our field the CFI is very popular. I might tell you to pick one of the comparative indices; CFI, TLI, and NFI all fall in that category, and there's also IFI and RFI, so there are lots of them. If you're going to do non-nested model comparison, or really any type of model comparison (you can use these for nested models too), there's the AIC and the ECVI, the Akaike information criterion and the expected cross-validation index.
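Pulling the code for this section together, a sketch:

# summary of the one-factor model, with the same options as before
summary(one.fit, standardized = TRUE, rsquare = TRUE)

# all the fit indices for each model (or add fit.measures = TRUE to summary)
fitmeasures(three.fit)
fitmeasures(one.fit)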
All right, let's get the one-factor model's numbers. Here we go: the chi-square is 4869.874, so already we're not doing well, because the chi-square has gone way up, almost double, and I've only really changed three degrees of freedom, because we collapsed all those factor covariances into one factor. My CFI went from kind of not-so-hot to really not-so-hot, so those values are all unacceptable, and the .71s weren't really acceptable either. The AIC is this big number, which I'll cut and paste rather than risk typing. So the first model's AIC is lower, and you always want it to be lower. My RMSEA is .113, with a 90% confidence interval of .110 to .116, so that's not in the acceptable range. The SRMR is .107, and the ECVI is 5.80.

All right, so which model is better? It doesn't take a chi-square table to tell you that a three-degree-of-freedom difference with nearly two thousand points of change is significant, so clearly these models are different. The change in CFI is greater than 0.01, which is the criterion for that index. And for the AIC and the ECVI, whichever is smaller wins, and clearly the three-factor model is smaller on both. So while the one-factor model is more parsimonious, since it's the simpler model, the three-factor model clearly has better fit: the loadings are better, although there are some problematic ones, and the fit indices are better, even if they're not perfect. I'd argue the three-factor model is much better than the one-factor model. From there, I might start exploring the modification indices to see what's going on, or try deleting questions and seeing whether there are outliers or problems with particular questions causing the poor model fit, and rethink how the questions load or whether they should be in there at all.

All of that taken together is the basic idea of CFA. The next assignment covers second-order CFAs, also called higher-order CFAs, and bifactor models, where you split the variance into a generalized factor and domain-specific factors.
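The video compares the two sets of numbers by hand; as a small addition not shown on screen, lavaan can also run the nested comparison directly:

# chi-square difference test for the nested models
anova(one.fit, three.fit)

# change in CFI; a difference greater than .01 favors the three-factor model
fitmeasures(three.fit, "cfi") - fitmeasures(one.fit, "cfi")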
Info
Channel: Statistics of DOOM
Views: 13,705
Rating: 4.9565215 out of 5
Keywords: YouTube Editor, confirmatory factor analysis, cfa, sem, structural equation modeling, R (Programming Language), fit indices, heywood cases, Factor Analysis, Statistics (Field Of Study), modification indices, residuals, model comparison
Id: TOsleQgu8RU
Length: 39min 29sec (2369 seconds)
Published: Sat Jul 04 2015