CFA and path analysis with latent variables using Stata 14.1 GUI

Captions
Okay, in this video I'm going to illustrate how you can use Stata's graphical user interface to carry out a confirmatory factor analysis and test a path analysis model using latent variables. I'm not going to go into all the bells and whistles associated with the program, but rather give you a bird's-eye view of the different options that are available to you in terms of drawing and obtaining different outputs.

Typically you would either have data already in a Stata data file, or you would have to import data from another source into Stata, save it as a Stata data file, and then perform whatever manipulations are necessary. I'm not going to go into issues related to missing data, data screening, and so forth; I'll mainly focus on the graphical user interface. But keep in mind that before you specify the model in the graphical user interface, you would typically want to do your data screening and save whatever changes you make to the data set.

I'm going to start off by importing some data from Excel into Stata. I'll go to File, Import, and Excel spreadsheet, and click on Browse. This is the data file that I'm going to be working with: a Microsoft Excel spreadsheet with the data. Now I'll click on Open, and you'll see that I've got variable names. These variable names are associated with different items on a questionnaire, so I have three items representing each of the different constructs, and then the values for those items below. I'm going to click on "Import first row as variable names" and then click on OK. Now I have the data imported into Stata, and I want to save the data file. Before we do that, I'm going to show you that it actually exists: if you go to Data, Data Editor, and go into browse mode, you can see that we have our variable names as
well as the data associated with them. In browse mode you can't make any changes to the values in the data set; if you wanted to make changes, you would have to go into the Data Editor in edit mode. At any rate, I'm going to save this as a Stata data file. I'll just call it SEM, and it's going to be given a .dta file extension. I'll click on Save, and now I have the Stata data file that was created from the data in our Excel file.

To run our CFA and our path analysis using latent variables, I need to go to Statistics, SEM, and Model building and estimation, and this box is going to come up. If the box looks too small and you want more area, you can always go to View and then fit it into the window, zoom in or out, adjust the canvas size, and so forth.

What I'm going to do is click on Add Observed Variable, and I'll do this for each latent variable in the model. I'm going to start here. This is the little button right here: the rectangles are used for indicator or observed variables, and the ovals are used for latent variables. As long as this button is pushed, boxes continue to appear as I put indicator variables into the interface. I'm going to do this on this side as well, since I'm ultimately going to have an outcome variable. There we go.

Now I want to click off of this button, so I'll just click Select, and now I can't draw any more boxes. If I wanted to draw more, I could certainly click back on the button and do that, but I'm going to click off. Next I want to click on the oval, which draws a latent variable. I'll draw one here, I'm going to draw one down
here, and then one here as well, and then click off. So these are the indicator variables, which are based on the data in your data file, and these are going to be the latent factors.

Next I'm going to draw the relationships between the latent variables and the indicator variables. I had three indicators per latent factor in my original model, so I'm going to draw this out. You can see that we're using the single-headed arrow, and as we draw we get these uniqueness values on the left side of the indicator variables. The uniqueness represents the variation in an indicator variable that is not explained by the proposed latent factor. There you go.

Now I need to give names to each of these variables. I'm going to click off the path-drawing tool, and then I can click on an indicator or observed variable. When I do that, it highlights and this little line appears up here, and I can just use the drop-down to fill it in. So I'm going to highlight each of these and fill them in; these are the variable names from the original data set.

Okay, next I need to give names to the latent variables. When I highlight a latent variable, up here where it says Name I'll just give it a name. I'm going to call this one Contact, just generally, and then press Enter, and there you go, it appears in the oval. This one I'm just going to call ID, and then press Enter, and there you go. And then here I'm just going to call this, and then press Enter, and there you go. These are the latent constructs that are used to explain variation in the original set of indicator variables.

Frequently with CFA models we allow the latent variables to be correlated, so
I'm going to use the little double-headed arrow here to add a covariance. I'll just draw from here to here, so there's one; then from here to here, there's two; and then from here to here, there's three. This represents the covariances among the latent variables.

At this point I'm going to click off again and estimate the model. If I go to Estimation and press Estimate, I have different estimation methods. We're going to stick with maximum likelihood and use the default standard errors, and I'll just press OK. Now the model is being estimated.

What we see in this case is the unstandardized solution. You'll notice that one of the paths for each of these latent variables is fixed at one. With the unstandardized solution we have to give the latent variable a scale, and we can do this by fixing a path coefficient to one. In the standardized solution, no paths would be constrained to one and those paths would be estimated, but the variances of the latent variables would be fixed at one.

Right now we have the unstandardized solution, and you'll notice that if you click on a given path, you get the unstandardized path coefficient as well as a test of significance of that path coefficient. For the unstandardized path coefficient from this latent variable to this variable here, the estimate is negative 0.6, the z value is negative 5.45, and the p-value is clearly going to be less than .05, so this would be a statistically significant path. This indicator is significant as an indicator of this latent construct. Down below we've got the standardized factor loadings as well. This is the standardized factor loading between the latent variable and this particular indicator, and you can see that you have significance tests associated with those too. You'll notice that for this particular path right here the
unstandardized loading is 1.02, the significance level indicates statistical significance, and here is our standardized path coefficient, or factor loading, as well. You interpret the standardized factor loading very much like you would interpret the factor loadings from an exploratory factor analysis, where the loadings are expressed more in correlational terms, or at least when you're looking at the structure matrix, where you don't have a more complex factor model.

You'll notice that for the paths that are fixed at one, you'll see right here that the regression coefficient was fixed at one, so there's no standard error associated with it. There's no test of whether this indicator is a significant indicator of the latent construct, because the latent variable is being scaled in relation to that variable. The same goes over here: there's the fixed path coefficient or factor loading, and the standard error is zero because that path is not being estimated. And the same goes right here.

In terms of the associations among the latent variables, you can see that in the unstandardized solution you get covariances among the latent variables, because again the latent variables are being scaled in relation to one of the indicator variables. In the standardized solution, the covariance is actually the correlation between the latent variables, so it's expressed in Pearson correlation terms.

If you want this information in a more compact form, you can go into the output file here, and you can see that we've got our factor loadings. These are the unstandardized loadings, so you've got your coefficients. You can see that for this latent variable going to the first of the indicators, the coefficient is fixed at one, it's constrained, so there's no test of significance of that particular path. From the
latent variable to contact 2, there's the unstandardized factor loading, standard error, z value, and significance test. Then we have the latent variable to contact 3; that factor loading is being estimated and tested. And so on for the other indicator variables where the paths are not fixed at one.

Let's say we want to look at the global fit of the model, to see how well our specification fits the data overall. We can do this by going to Estimation, Goodness of fit, and then clicking on Overall goodness of fit. Typically, under "Statistics to be displayed," I'll click on "All of the above" and then press OK. Now when we go to our output file, you can see that we have a chi-square test, the root mean square error of approximation, the comparative fit index, the Tucker-Lewis index, and the standardized root mean squared residual.

With the chi-square test that we see right here, we're looking at the p-value. One indicator of good model fit to the data is a nonsignificant chi-square test. A p-value less than 0.05 would indicate statistical significance, and that's actually a bad thing if we are assessing fit. The chi-square test is oftentimes thought of as a badness-of-fit test: if it's statistically significant, that suggests poor model fit to the data. We can see right here that for this particular test the p-value is less than .05, which would suggest poor model fit.

The downside of the chi-square test is that it is affected by sample size. Given that SEM procedures tend to be large-sample procedures, it would be very easy to obtain statistical significance with a model that exhibits fair or acceptable fit in other ways. So it's useful to assess other, more descriptive indices, like the RMSEA, or root mean square error of approximation. For this index, good fit is indicated by values that are 0.05 or below, and acceptable fit by values at 0.08 or
below. This index right here is falling at 0.08, which would be fairly acceptable; it's not great fit, but it's acceptable given that rule of thumb. When we look at the comparative fit index and the Tucker-Lewis index, good fit is indicated by values of .90 or above, and more optimal levels would be 0.95 or above. Using those rules of thumb, the CFI would be indicating at least acceptable model fit, whereas the TLI would be indicating poor model fit. For the standardized root mean squared residual, good model fit is indicated by values at 0.05 or below, and in this case the value of this index is falling above 0.05.

So we have mixed evidence for model fit. The chi-square test is suggesting poor fit, the RMSEA is suggesting at least acceptable fit, and the comparative fit index is consistent with an acceptably fitting model, but the Tucker-Lewis index and the standardized root mean squared residual are suggestive of poor model fit. We have sort of a mixed bag in this case, and that can sometimes happen: different fit indices can suggest different ways of thinking about fit. But we can look at the different fit indices as different lenses for assessing overall fit.

We can also look at fit in terms of equation-level goodness of fit, so I'm going to click on this and click on OK. What has happened is that we get R-squared values for each of the indicator or observed variables in the model. The R-squared values are basically communality estimates: they are proportions of variation in the indicator variables accounted for by the latent factors that have been specified to predict those variables. If we multiply these proportions by 100, we can talk about them in percentage terms. So we could say that the latent variable associated with contact 1 accounts for about 68
percent of the variation in that indicator; it accounts for about 44 percent of the variation in contact 2 and 33 percent of the variation in contact 3. The anxiety variable accounts for about 46 percent of the variation in the first indicator, roughly 36 percent in the second, and so on.

We can also ask for modification indices to explore where we could possibly add parameters to improve model fit. If we click on Modification indices and click on OK, then when we look at our output file, these are some suggested parameters that we could add to improve the overall model fit. The thing to keep in mind, though, is that modification indices are a purely empirical strategy for deciding what parameters to add to the model, so they may not necessarily be consistent with theory.

So that's, like I said, a very brief overview of running a CFA. Now I want to quickly illustrate running a path analysis using latent variables. I'm going to get rid of these parameters, so I'm highlighting this one and pressing Delete, highlighting this one and pressing Delete, and highlighting this one and pressing Delete. In fact, I'm going to click off of Show Estimates here to clean it up a little bit.

In this particular case I've got three latent variables. I could have other latent variables and specify more complex path models, but for this illustration I'm going to stick with this basic design. I'm going to draw this arrow from this variable to this variable, and you can see that when I do this, an error term, or disturbance term, is generated, because this latent variable is a latent endogenous variable, whereas these are being treated as exogenous. If I add in a covariance between these two variables here, then what this is is nothing more than a regression analysis
using latent variables as opposed to just indicator or observed variables. Like I said, I could add in another latent variable and make a more complex model. In fact, why don't I do this really quickly; I'll just add it in. Okay, so here I've added another variable to make this a slightly more complex path model. You can see that we've got these two latent variables predicting this variable, and only this variable predicting this one right here. Notice that we can move things around, too: if we highlight a given parameter, we can drag it around and make things a little bit cleaner looking, to improve the appearance of the model.

Now that I've specified this particular path model, what we have is a direct effect here from this variable to this variable, a direct effect here, a direct effect between this variable and this variable, and then an indirect effect via this route. We'll click on Estimation and Estimate and then click on OK. The model is being estimated, and I want to go back and show the estimates in the model. These are the estimates for the unstandardized solution. If we click on this path right here, this would be interpreted as an unstandardized path coefficient; the standardized path coefficient is given below. You can see that both of these have significance levels associated with them, but I would focus on interpreting the significance associated with the unstandardized path coefficient. There's the path coefficient for this path here, for both the unstandardized and standardized solutions, and so forth.

If we look in our output file, you can see that it kind of grows: we've got the measurement model, which is basically the CFA, and then the structural component, which is basically laying out
the paths and forming the predictive relationships among the latent variables. From this variable to this variable right here, the unstandardized path coefficient is negative 0.026, and it was not statistically significant; there's your p-value, and there's the z value associated with that test. From this variable to this variable right here, there's your path coefficient, and that relationship was not statistically significant. When we go from anxiety to this outcome variable here, there's the coefficient, and you can see that it was statistically significant. So looking at those parameters in the model, what we have is that the path from this variable to this latent variable is statistically significant.

If we want to look at overall model fit, we can go to Goodness of fit, Overall goodness of fit, leave it as All right here, and click on OK. There we go, we've got our chi-square test, and we can see that it was statistically significant, suggesting poor model fit to the data. The RMSEA was at 0.05, which is at a preferred level and suggests good model fit. The CFI was at .972 and the TLI at .963; both of these values were above .90, suggesting good fit. The SRMR was 0.057, which is a little bit above what we would conventionally consider good fit. So we have sort of a mixed bag related to the fit of the model: the RMSEA, CFI, and TLI are suggesting good fit, while the chi-square test and the SRMR are suggesting poor fit. And when we look at the paths in the model, we see that really only one of the paths among the latent variables is statistically significant.

If we want to look at the R-squared values associated with our latent variables, we can go to Goodness of fit and Equation-level goodness of fit, and we've got the R-squared values. These are basically
again the communalities associated with each of our indicator variables in the model. Then we have the R-squared values for the latent variables. For this variable it is 0.162, so our predictors accounted for about 16.2 percent of the variation in this variable. But for this variable right here, we account for less than one percent of the variation. So at the equation level, we're not doing a very good job of explaining variation in this variable right here based on this variable, but we are doing a fair job of explaining variation in this variable. Given that this path is the only one that's significant, the lion's share of the variation accounted for in this variable is due to the anxiety variable.

At any rate, we can still look at modification indices to see if there are any other paths or parameters that we might include to improve the fit of the model to the data, so that's certainly an option. Under Estimation we can also ask for direct and indirect effects. We can click on this, and we just go with the defaults. When we're looking at the indirect effects, we want to consult the structural component, since we're looking at the relationships among the latent variables. From prejudice to this variable right here, there was no indirect path estimated. From contact to this variable, there was no indirect path. From anxiety to that variable, there is. As you can see, this variable to this variable is a direct effect, and this variable to this variable via the prejudice variable is an indirect effect. That's the only indirect effect that was tested in the model.

At any rate, the point of this exercise was, again, just to illustrate how you can carry out a CFA and also a path analysis
using latent variables in Stata. Like I said, I'm obviously not going into a lot of depth in terms of interpreting the various parameters, because quite frankly there are a lot of them to look at. Just keep in mind that, generally speaking, one of the prominent strategies utilized when testing a full path analysis with latent variables is to start off with a CFA, a test of the measurement model, to see how well the indicator variables are representing their latent constructs, and then, if that model fits the data well, to move into laying out the proposed causal associations among the latent variables.

When I started this video I focused mainly on these three latent variables right here, and then as I expanded the discussion into the path-analytic part, I added this latent variable here with its indicator variables. Ordinarily I would have started off with a CFA model that incorporated all four of these latent variables and their respective indicators, tested it to see how well it fits the data, and, if it did fit, then laid out the proposed causal associations. I didn't do that in this particular case, but that would be the ordinary strategy one would use en route to testing a path analysis with latent variables.

So I hope you find this helpful as a discussion of how to use the GUI in Stata when you're running CFAs and testing path analysis models with latent variables.
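
The rules of thumb quoted in the video for reading the overall goodness-of-fit output can be written down as a small lookup. The following is only a sketch in Python, not part of the video and not Stata output: the function names and the labels "good", "acceptable", "poor", and "above cutoff" are assumptions made for illustration, while the cutoffs (RMSEA 0.05 and 0.08, CFI and TLI 0.90 and 0.95, SRMR 0.05) and the example values are the ones stated for the path model in the transcript.

```python
# Rule-of-thumb reading of SEM fit indices, using the cutoffs quoted in the video.
# Function names and labels are hypothetical; only the cutoffs come from the transcript.


def rate_rmsea(value):
    # Video: 0.05 or below is good fit, 0.08 or below is acceptable.
    if value <= 0.05:
        return "good"
    if value <= 0.08:
        return "acceptable"
    return "poor"


def rate_cfi_tli(value):
    # Video: 0.90 or above indicates fit; 0.95 or above is the more optimal level.
    if value >= 0.95:
        return "good"
    if value >= 0.90:
        return "acceptable"
    return "poor"


def rate_srmr(value):
    # Video: 0.05 or below is good fit. No second band is given in the video,
    # so anything higher is only flagged as being above the cutoff.
    return "good" if value <= 0.05 else "above cutoff"


def summarize_fit(rmsea, cfi, tli, srmr):
    # Collect the per-index readings so mixed evidence is easy to see.
    return {
        "RMSEA": rate_rmsea(rmsea),
        "CFI": rate_cfi_tli(cfi),
        "TLI": rate_cfi_tli(tli),
        "SRMR": rate_srmr(srmr),
    }


# Values reported in the video for the latent-variable path model.
report = summarize_fit(rmsea=0.05, cfi=0.972, tli=0.963, srmr=0.057)
for index_name, reading in report.items():
    print(f"{index_name}: {reading}")
```

The chi-square test is left out of the sketch because the video reports only that it was statistically significant, not its p-value. As the video stresses, these cutoffs are conventions, and different indices can point in different directions for the same model.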
Info
Channel: Mike Crowson
Views: 34,922
Rating: 4.9452057 out of 5
Keywords: STATA, confirmatory factor analysis with stata, structural equation modeling with stata
Id: NDmJm9AtkAI
Length: 31min 24sec (1884 seconds)
Published: Tue Apr 19 2016