A Gentle Introduction to Structural Equation Modelling

Captions
Good afternoon everybody, and welcome to your first theoretical lecture on structural equation modelling. The purpose of this lecture series is to take you through the theory behind structural equation modelling, and I'll take you through every step of the way, so don't worry: by the end of this little series you'll be able to manage everything quite well, I think. My name is Llewellyn van Zyl, I'm an assistant professor at the Technical University of Eindhoven, and I work with Mplus and with structural equation models every single day of my life. I want to share with you how to approach one, how to estimate one, how to compare different models with one another, and how to become the best possible researcher that you can.

During the next couple of lectures we'll be doing four things. Today we're going to reflect a little upon what you already know about structural equation modelling and what you already know about regressions. Once you have that, I'm going to take you through a gentle introduction to structural equation modelling: just the basic concepts, step by step, so you have a thorough understanding of what to expect in the next section. In the next video we will then specifically focus on how you estimate these models and how you compare them; there's a five-step process that we follow to get from idea to results, and I'll take you through it. Finally, we'll take you one step further, and I'll show you what mediation, or indirect effects, looks like from the structural equation modelling perspective. So after watching these three videos, there are three things we hope you will learn. First, a very deep understanding of structural equation modelling and its value in relation to the normal general linear approach. Second, how to differentiate and compare different measurement models with one another, and what the difference is between measurement models and structural models. And finally, the basics of mediation.

Okay. I'm going to show you that you already know quite a lot before you've even started with structural equation modelling; I'll take your hand and we'll walk through this step by step, I promise. So let's look at what you already know, with a little homage, a little reflection on normal multiple regressions. I'll use a paper that I did long ago, in 2010. The basic idea behind this paper was that we wanted to look at what predicts the work engagement of industrial psychologists. We had an assumption about work-role fit: me as an individual, do I fit the role that's required of me? If there's a match between the person that I am and the work that I do, I'm going to be more engaged. Also, if I find meaning in my work, if I see that my work contributes to something bigger, will that lead to engagement? Again, this is theory orientated, so we did normal stepwise regressions. We have our descriptive statistics here; it was on a scale of one to five, I think, with one being the best and five being the worst. Here we see work-role fit at 1.9, so basically two, which is quite good, and similarly for meaningfulness and engagement: people reported high levels of engagement, high levels of meaning, and high levels of work-role fit. We look at the skewness and kurtosis, and as you can see the variables were normally distributed because these values were smaller than one. But what's important first is to look at the relationships between the factors: here we see there is a relationship between all of the factors, they're very strong, and they are significant. The combination of these two things tells us that we can actually do a regression. So we put all of this into SPSS, and these are the results that we got. Overall we found that meaning and work-role fit
explain 55% of the variance in engagement, which is quite a lot actually, along with our fit statistics, namely the degrees of freedom. We also found that for our overall regression there is a significant difference between our null model and our actual model; yes, the nested models differ significantly, therefore we can move on. Now we take all of these things into consideration and plot them on a nice diagram. The standardized coefficients, the strengths of the relationships, are significant: the relationship between work-role fit and engagement is 0.22, the relationship between meaningfulness and engagement is 0.54, and both of them together explain 55% of the variance in work engagement. Great, we have an amazing model. But what are some of the limitations of this approach?

This would be called the ordinary least squares (OLS) approach, and one of the big issues is that we lose a lot of variance when we estimate this way, because there are things going on between these two factors: there are interaction effects, there are things happening that we're not capturing with normal regressions. We're also not capturing error. So there are a bunch of factors that make this a somewhat obsolete approach. Another major issue is how we get to these factors in the first place. In the traditional approach we take all of the items, say the five items that loaded on work-role fit, add them together, divide by five, and get a mean score; that's how we got to the score, and we do the same for each factor. What's the problem with this? Here is how it looks in practice. We asked people four items on a scale of one to five, one being the best: regardless of what you are doing with your time, do you feel time passes by quickly; I'm always very absorbed in my work; I love what I'm doing; and my life is too short to postpone pleasures. If we look at it from the ordinary least squares perspective, the mean score across all of our participants on the first item was 4.7, on the next 4.9, and so on, and the item loadings were 0.73, 0.77, 0.62 and 0.61; these are the factor loadings that you will be familiar with. Now, like I mentioned, when we want to create this happiness factor, we say item one plus two plus three plus four, divided by four, gives us happiness. Makes sense. What's the problem? When we do it that way, we assume that each one of these items loads exactly the same on that factor, and unfortunately it doesn't. Even though the mean scores stay the same, the item loadings, the factor loadings, are not equal; therefore you can't say that this plus this plus this plus this, divided by four, gives us that factor. We have to take this into consideration, and the ordinary least squares method unfortunately does not.

Now I want to show you what happens if I estimate exactly the same thing, as a confirmatory factor analysis, in Mplus, so using structural equation modelling. What's going to happen is that I'm going to add error to each one of these items, and we'll see what happens to the mean scores and the factor loadings. The mean scores stay exactly the same, but now the factor loadings are different: where OLS gave 0.73, we now see 0.75; where it gave 0.77, we see 0.8; where it gave 0.62, we see 0.6. So you see there is a real difference between these two approaches, because this approach also takes into consideration the interrelationships between the factors, and that's important. The estimated mean score here is 4.31, where there it was 4.62, so it's a bit lower. But when we use SEM we can't really report an actual mean score, because our actual mean
score is zero: we're more focused on the correlations, the covariance matrix, and the interactions than we are on mean scores. But I estimated it just to show you. Okay, so how do we compensate for this? What do we do? Well, we use structural equation modelling, and this solves all of our problems.

So what is structural equation modelling? Basically, it's not one specific technique; it's a series of different multivariate statistical techniques that we use to analyze and estimate very complex multiple relationships between factors. Unlike normal regressions, where we have factor one, then enter the next factor into our regression equation, and so on, we look at all of these relationships together. It's an integration of different approaches: we use item response theory, for example, like I showed you in the earlier slide with the different loadings; we use everything from latent profile analysis to regressions to correlations, et cetera. What also makes it different is that it's a confirmatory approach rather than an exploratory one. We have a theoretical idea about how these things work, about the relationships between the different factors, so we don't throw everything into the big statistical pot and pray that something positive comes out. We have an idea of how these things work and we test that theory against our data; we don't let the data dictate to us what it thinks is best. Structural equation modelling also specifies systems of relationships: it's not just a relationship between A and B, or between A and C; it's the relationships among A, B, and C in combination, rather than one specific relationship at a time. What also makes this very different from normal approaches is that we're estimating what we call unobserved factors: we're estimating latent factors from things that we've actually measured. We're creating things out of measured variables; out of those items we create happiness. The items together measure happiness, but each item by itself doesn't, so from the combination of these things we create something that is unobserved: we measure something that you can't see. And finally, what's really important is that we're testing different forms of the theory against our data. Does that make sense?

To summarize: structural equation modelling is not a specific technique; it's an integration of techniques that we take from psychometric theory, factor analysis, latent variable analysis, path analysis, regressions, simulation studies, et cetera, and this integration helps us to estimate systems of relationships between factors. So where do we use it? Primarily for theory testing: we have a hypothesis or theory and we test our data against it. We can also estimate mediation, or indirect effects, if we want to know how an effect is transmitted. Say we know that work-role fit leads to happiness and happiness leads to performance, but we want to know how work-role fit indirectly affects performance through happiness. We know that work-role fit explains, let's say, 50% of the variance in happiness; but how does that impact that work-role fit has on happiness in turn influence performance? That's mediation. We can also look at group differences, for example how males and females respond on our model, because maybe the relationships look different for them for some reason. We can look at longitudinal relationships, and we can look at it from a multi-level perspective. But all of this is underpinned by latent variable modelling, and latent variable modelling again means taking the observed things that
we've asked, and we're estimating something that doesn't exist. This happiness doesn't exist as such; it's a combination of these different factors. We ask these questions, and based on them we create that happiness thing, which is unobserved: it doesn't exist physically in our data.

So what are some of the advantages of structural equation modelling over the ordinary least squares method? Of course, it's a lot more flexible in its application; you can do a lot more with it at once than you can in SPSS, for example. Unlike the ordinary least squares method, we really focus on the estimation and incorporation of measurement error, the noise in the system; this happens as a byproduct in OLS, but it's an active consideration in our SEM process. We also test multiple relationships and interactions simultaneously. Unlike the OLS method, where we take one independent variable to our dependent variable, then the next IV to our DV, and so on, here we look at all of these things together and model them simultaneously, which is important because then we can estimate the flow of information between our different factors. Like I mentioned previously, we estimate latent factors as a function of our observed factors. It provides more accurate, improved statistical estimation, so the result is closer to reality than what you would get with the normal OLS method. It also helps us to remove a lot more of the random error, the noise, because we're estimating it and incorporating it into our model, and it helps us to find the best model for our data, where the OLS method does not.

Okay, those are some of the advantages. Now, what are some of the key things we have to consider when we want to use SEM? The first is that we always specify what we call an a priori model. A priori models are theoretically informed models, and we always specify the full model. In a normal OLS workflow we would do an EFA, an exploratory factor analysis, on the work-role fit scale, then one on the happiness scale, then one on the performance scale, and based on those we would create the means and only then do our analysis. In SEM we do all of these things together; everything must always be on the same proverbial page. You can't validate one instrument with a CFA (confirmatory factor analysis), then the next factor, then the third factor, and then put them all together. No: they're always together on the same page, you always estimate them together, and they're all theory-based. You can't just suck something out of your thumb; you have to have a very strong theoretical argument for why these things are in place.

The next assumption is that your data needs to be normally distributed. In more advanced programs like Mplus you can use other estimation methods to compensate for issues with normality, but if you use LISREL or AMOS or some of the more rudimentary packages, you have to ensure that your data is normally distributed, because you can't change the estimation method; you'd have to transform your data or something. A big thing with structural equation modelling is that it is unfortunately sample-size and model-complexity dependent. You need a very large sample size to get these models to work, and you have to do what's called a power analysis. There's a website you can go to (I'll link it in the description), or you can use G*Power: you enter your items, the size of the effect you want, and the alpha value, and you get a good indication of how many people you need for your model. Unfortunately, even with a very large sample size, model
fit (does my data fit my model?) is very dependent on how complex the model is. One of the limitations of structural equation modelling is that if you have a very complex model, your data is automatically going to fit better. That doesn't necessarily mean you have a better model, or better support for your theoretical model; it just means that the more complex the model, the more likely you are to get fit. And just because you have fit doesn't mean it's the best model for your data. Another assumption is that our residual variances, the error our items carry, need to be small, as close to zero as you can possibly get them. A large residual variance, that is, a lot of error in an item you measured, signals a big issue with that specific item, and you have to figure it out. A key assumption I mentioned before is that we test different models and compare them against each other; that's a major strength of this approach. And finally, we have certain criteria that we use to evaluate whether a model fits the data. We look at what we call model fit statistics, with certain cut-off criteria, but we also look at what we call measurement quality: specific item loadings, and whether our questionnaires are reliable or not. So those are the six things you have to consider when you want to use SEM.

Basically, in SEM we have two major types of models. The first is called the measurement model, and it is correlational in nature (I'll explain what these things mean on the next slide): there is a bivariate relationship, or covariance, between all of the different factors, and we have different versions of it. We don't specify what predicts what; we only specify how each factor is made up of its items. That's our measurement model, the model that tells us which theoretical model could plausibly match our data. Once we find the best-fitting measurement model, we transform it into what we call a structural model. Where before we had correlations, we now change these into regressions, so it becomes directional: the theory tells us that demands impact engagement and engagement impacts performance. As soon as we add that into the equation, the software automatically also estimates error terms for the downstream latent factors. Okay: measurement models are correlational; structural models are regressions, directional. To summarize: in measurement models we're interested in the relationships between latent variables and their measured, observed indicators, so the relationships between the factors and between the factors and the items. In structural models we specify directionality in the relationships, what you'd call a path model: A leads to B leads to C.

In structural equation modelling there are certain conventions you need to be aware of; how we draw things is very important here. In any model that you draw, a square means an observed indicator: an item or a mean score, something we actually observed. A circle is what we call a latent variable. A double-headed arrow indicates a correlation, and a single-headed arrow indicates a regression. Straightforward: squares observed, circles latent, double arrows correlation, single arrows regression. Let's see what this looks like. Here is an example of a path model, a structural model, a structural equation model. The little blue ones are what we call observed indicators, and these are the items, the questions that we
specifically asked each participant. The little green ones are what we call latent variables, and they're made up of those specific blue items that we measured. The little red ones are what we call the error terms, the noise in each one of the items. For any item that we ask, it's not as if 100% of the people answer it in 100% the same way: some people might have found it difficult, some people might have read it a little differently, so there is always some error in everything that we measure, and that's what those little red ones are for.

Now, unlike in the normal OLS method, we have different words for independent and dependent variables in structural equation modelling. Here, for example, we have an exogenous variable, which would traditionally be called our independent variable; I always remember it by the "exo", as in the beginning. So exogenous variables are basically equivalent to independent variables in the OLS method. Our endogenous factors are the outcomes of the relationships, the factors being predicted. In this model, competence is exogenous; engagement is endogenous relative to competence but exogenous relative to performance; and performance is endogenous. You also recognize an endogenous variable because, as soon as we enter it, the software automatically generates its own residual variance. Competence would not have one, because here competence predicts engagement, and it doesn't predict 100% of the variance: let's say it predicts 50% of the variance, then there's also some noise, and that's where that residual variance comes from; that's how you know it's an endogenous factor. Another thing you will notice is that each error variance, each residual variance, is constrained to one, so its loading is equal for everybody. You'll also notice that the first item loading on each factor is fixed to one. Why? Because it helps us to estimate the other parameters: if we want standardized values, we need something constrained to a constant so that we can generate them. Most software does this automatically; the first item loading will always be constrained to one. That's automatic, and it becomes important in a bit.

Now that we have a basic idea of some of the concepts in SEM: there are different types of what we call path models, basically three. We have the confirmatory factor analytic approach, what we call a latent path model, and an observed path model. In the confirmatory approach we are interested in understanding the relationships between the observed factors, these items, and their latent factor; so we're interested in the relationship between the observed indicators and the associated latent factors. It's confirmatory because the model is based on an a priori theoretical assumption: from this perspective we know that demands is made up of these three items, because that's what the theory tells us, and that's why it's a confirmatory approach. We can also estimate a normal path analysis with observed indicators, which is like what we did initially with the regressions: work-role fit predicts meaning and meaning predicts work engagement, but they're drawn as squares because they're mean scores here. In this case, unlike the other approach, we can only estimate the structural model; unfortunately we don't have different measurement models to compare, we only have the structural model. We are only interested here in the relationships between the constructs as represented by their direct measurement, so we're only interested in the specific relationships
between work-role fit, meaningfulness, and engagement, or rather the combination of those relationships. In other words, it's basically a series of multiple regressions put together in one equation.

The latent path model incorporates both the observed approach and the CFA approach. Here we simultaneously test the measurement models (this is a measurement model, that is a measurement model, and that is a measurement model) and the structural model, the combination of competence predicting engagement predicting performance. Does that make sense? We're basically doing the CFA and the path analysis all together. This approach is really awesome because it specifically incorporates the relationships between the observed and latent variables, in other words the measures and the factors; it also looks at the relationships between the latent factors themselves; and finally it incorporates and compensates for the residual error, the residuals, in other words the noise in the model. That's what makes this approach so amazing. So those are the three different path models that you can estimate.

In order for us to get to the final structural model, the path model that we're basically working towards, there is a certain process we have to follow. I'll go into a lot more detail on this in the next video lecture, but fundamentally it's a five-step process. First, we have what we call model formulation. Because structural equation modelling is a confirmatory technique, it needs to be based on a fundamental theory, so we have to figure out what underlying theory we want to test: SEM needs a model that specifies the relations between the factors, and you have to have this before you even start with your data collection. Once we have this, the model we want to test needs to be identified, so we look at model identification. This is mostly about having enough of what we call pieces of information in the model to help us estimate what's unknown, that latent variable that's there but not measured; we need enough degrees of freedom, enough pieces of information, to be able to estimate it. Once our model is identified, we have what we call model estimation, the actual estimation. Here we try to minimize the differences between the observed and the model-implied matrices; we try to explain as much variance as possible with the least complex model we can, to say as much as possible in as few words as possible. We use an iterative process to get there, estimating what are called nested and non-nested models, from least restrictive to more restrictive, and seeing which one fits the data. The fourth step, and this is where the measurement modelling strategy ends, is model evaluation and modification. Here we want to assess the model fit, the fit of the model against our data; we have certain criteria, certain tick boxes, that we use to do this: we look at certain fit statistics and we look at measurement quality. We also look at what we call absolute fit and comparative fit: absolute fit asks whether the model actually fits its data, while comparative fit means that, now that we've seen our model fits, we compare it to the other models we've also estimated, because we want to figure out which is best. Based on this iteration between these four steps, we finally create our structural model, and the structural model is the one that tells us A leads to B leads to C, which is what we're interested in.

The structural model is of course the outcome, but you can do a lot more with it: we can do mediation and moderation analysis; we can do multi-group analysis, looking at what the model looks like for different groups; we can look at changes over time, at how people change and grow; we can look at longitudinal models and causality; and we can look at the underlying profiles and classes in the data. So there is a lot that this approach can do. I really hope that you learned something from this lecture and that you now have a basic understanding of structural equation modelling. Ladies and gentlemen, thank you so much; in the next lecture we'll go into a lot more detail on how to estimate these models. Have an amazing day.
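The two scoring approaches contrasted in the lecture (a unit-weighted mean score versus a composite that respects unequal factor loadings), and the model-implied covariance matrix that estimation tries to match, can be sketched in a few lines. This is a minimal Python illustration with invented item means, loadings, and residual variances, not the values from the study or slides:

```python
# Hypothetical item means and standardized loadings for a four-item
# "happiness" factor (numbers invented for illustration).
item_means = [4.7, 4.9, 4.2, 4.6]      # scale 1-5
loadings   = [0.75, 0.80, 0.60, 0.62]  # standardized factor loadings

# Unit-weighted mean score: (i1 + i2 + i3 + i4) / 4.
# This implicitly assumes every item loads equally on the factor.
mean_score = sum(item_means) / len(item_means)

# Loading-weighted composite: items contribute in proportion to how
# strongly they load, so the two scores diverge whenever loadings differ.
weighted_score = sum(l * m for l, m in zip(loadings, item_means)) / sum(loadings)
print(round(mean_score, 3), round(weighted_score, 3))

# Model estimation (step three of the five) minimizes the gap between the
# observed covariance matrix and the model-implied one.  For a single
# factor with unit variance the implied covariance is:
#   off-diagonal: lambda_i * lambda_j
#   diagonal:     lambda_i**2 + theta_i  (loading squared plus residual variance)
residuals = [1 - l ** 2 for l in loadings]   # residual variances of standardized items
n = len(loadings)
implied = [[loadings[i] * loadings[j] if i != j
            else loadings[i] ** 2 + residuals[i]
            for j in range(n)] for i in range(n)]
print(round(implied[0][1], 3))   # implied covariance between items one and two
```

With standardized items the implied diagonal comes out to exactly 1.0, which is the sense in which the latent "mean" carries no information of its own: the scale is fixed by a constraint, and the focus shifts entirely to reproducing the covariance structure.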
Info
Channel: Mplus for Dummies
Views: 2,212
Rating: 4.8888888 out of 5
Keywords: structural equation modelling
Id: kBHzggVCYwc
Length: 32min 40sec (1960 seconds)
Published: Mon Sep 07 2020