EAM 2021 SEM Workshop

Captions
post this video on youtube um just when we're done well once it's done uh compiling i'll post this on youtube and oh you'll have access to in the future james i just wanted to request if you have the option can you turn the captioning on captioning yeah let's see live transcript yes sign nope copy no enable there it is enable and those who don't want it then just cross the button on top of the screen and they don't see it but those who would like to use it would see it all right there we go we'll see how it translates some of the statistics language well i speak with an accent so it will be better with than that it does a pretty good job although i have to say oh good all right looks like we have a few more uh uh jumping in so we will once again post that here's the link to the data um so what uh what dr billings asked me to do is cover just the basics of measurement models and structural models using spss and amos and we only have an hour and a half so it's not a lot of time i i do have to end sharply uh probably a minute before that before what is that noon your time um or no 11 year time uh because i have another meeting that i have to jump over to so we will see what we can do even though we're sort of constrained time wise if you have questions please do ask them anything we don't get to i have videos for i some of you may know i run a youtube channel that has a bunch of sem videos on it videos for pretty much anything you want to do in spss amos smart pls m plus uh i think that's it yeah and i'm going to add some more con well continuously and i have a course actually for anything if if what you we if what we do here is interesting to you and you want to learn more about uh analysis in spss and amos here's an online course i'm putting in the link here in the chat window there's an online course that goes over all this stuff in way more depth and actually explains it gives you references has exercises projects exams grades if you want for your doctoral students and so it covers about three two to three semesters of sem in that one course which is about a 50 hour course so if you find this useful let's get started so i'm going to just open that data set i better share my screen huh let me share my screen let's see how about not that let's go over to here take that down i'm going to share my screen and you should just see a tree right now and i'm going to open up that data set that we've been sending in the chat window if you still need that link we can post it one more time but it's called eam 2021 workshop dot save also in this workshop i'm not going to cover sort of the the preliminaries of prepping your data of going from a raw data set to um to a cleaned data set this data set is already cleaned and um i'm going to assume that you've all had a at least a basic statistics class so you know what a p-value is you know what a coefficient is um and all that if you don't then i'll i'll at least i'll speak i'll interpret things so that it will become evident as we go and here we go again if you have questions feel free to um feel free to chat them in the chat window or just stop and um and speak up i just gotta find where my chat window is so that i can keep track of it here we go pull this down here okay i'm just getting all set up here hopefully you should be able to see my screen and here we go so we're going to start with a factor analysis an exploratory factor analysis and there's some debate out there whether you need to do an exploratory factor analysis i should explain exploratory 
factor analysis is is a class of factor analysis where you find common groupings among a set of variables and it's unguided or undirected meaning we leave it to the software spss to determine those groupings based on iterative correlations between items and item groups so we give we give spss a set of items and we say find us the groupings here and what it does is it goes through each item and in the covariance matrix it finds how is this item related to that item has this set of items related to this set of items and what it does is it tries to form tight groups of highly correlated items into what we call factors and then try to separate those groups into distinct factors so there is as much tightness within a factor and distance between factors as possible it's a lot like cluster analysis but uh 90 degree turn of a cluster analysis cluster analysis is trying to find groupings of cases of rows in your data set and factor analysis is trying to find groupings of columns or variables so we're going to do that and and why would you even do this an unguided factor analysis because most of you you'll be coming into your data set knowing already what the factors are supposed to be right you you built a survey or you collected data where you had five measures that were intended to measure a specific construct so why do we need to explore whether they factor together the answer is there's debate about that probably about half maybe even more than half of the methodologists out there say you don't you don't have to factor now explore your factor analysis you can just do a confirmatory factor analysis um i am of the camp on the other side where i say well it doesn't hurt anything it's not hard and it gives you more information and as a scholar more information about your data is usually better than less information so i still always do an exploratory factor analysis just to inform myself is it required no unless you truly don't have a measurement theory going into your data set let's say you've inherited a data set secondary data then definitely do an exploratory factor analysis but i do it all the time anyway because it it emerges uh issues that you might not see in a confirmatory factor analysis okay somebody says we just heard the kind of opposite yesterday and i'm directly into cfa like i said there's debate on this topic and i'm in the smaller i'm in the minority uh of scholars on this and again my my logic here is what does it hurt it's so easy and quick just do it okay let's do it so here's the data set um for those who are following along i'm going to try to go at a moderate pace i'm not going to go as fast as i usually go and i will be going somewhat slow for those who are just watching so i thank your forgiveness and your patience so the first thing we'll do is we'll go to analyze at the top menu and you're going to go to dimension reduction because what we're trying to do is reduce the number of dimensions used to explain our data and then go to factor oops dimension reduction factor and this will bring up a window like this where we have all of our variables on the left and yours might look different from this yours might look like this um and we should probably fix that before we move forward actually because it's going to be a mess so um hit cancel if you're if you're following along um hit cancel real quick let's go up to edit and options and this will save a lot of headache if you do this so edit options and that'll bring up this window and what you want to do is here on the general 
tab change display labels to display names but there's more so don't close it just yet this is the first thing change display labels to display names and then over in the output tab change um these two the outline labeling should be names and then labels down here and same here names and then labels down here in the pivot table labeling that's going to clean up your output considerably and make life easier so again in the general tab change it to display names and the output tab should be names and then labels names and then labels on the left there once you do that hit ok i'm going to hit cancel because i've already done this you hit ok and it'll say reset all dialogs hit ok after that and then we should be back to here okay so back at it go to analyze dimension reduction factor once you do that the window pops up all the variables now should look like this on the left um if you did that edit options thing and then on the right it's the variables we want to include in our factor analysis so i'm just going to click on anxiety one i'm going to scroll down and hold shift and click on useful seven so that selects everything from anxiety one to useful seven it's all of our measures for our reflective latent factors i'm going to stick those in here just by hitting the arrow and so they are all now ready to be analyzed don't hit okay yet there are a few things we need to check over here on the right in the descriptives menu the default is to have the initial solution and nothing here checked go ahead and check the kmo and the reproduced matrix there are a bunch of other things i encourage you to go and explore those and have fun with those and learn what those give you i won't be covering those in this video or in this workshop but there are lots of cool things you can do with factor analysis that i don't do by default so again check the kmo and the reproduced hit continue and in the extraction menu we have different methods for extracting factors they're just different algorithms with different biases and constraints um and parameters and there are different schools of thought on this which ones you should use and if you should use them uh intentionally theoretically and uh in the end i like what joseph hair has to say about this joseph hair is this popular methodologist he's probably the most cited methodologist in the management area or in the business school he wrote the multivariate data analysis book in all of its editions but he says in his book in the end the results are almost identical this is an exploratory factor analysis choose whatever you want and then try a different thing and if it gives you better information then you have done your job of exploring so is there a reason to choose one over the other yeah there is actually uh but does it matter terribly no it doesn't so pick one i'm going to start with principal components it's kind of the softest solution i mean it's the most friendly usually for most data sets so i'm going to pick principal components and i should ask can you guys actually see my screen is it too small is the text too small it's too small but i can still see i don't know about others because i can go like this does that help yes yes i'll do that on occasion to zoom in that's better yeah i have a 4k screen and it's really really big so on laptops it might not show up very well all right so that was in the extraction menu we're going to do principal components for now um and you
can also extract factors based on various criteria the common approach is to extract based on eigenvalues eigenvalues are just a measure of the contribution of a factor towards the explanation of variance in the resulting solution and so anything with an eigenvalue over one is considered a good contributor and so worthwhile of extracting we'll start there and see how it goes hit continue and then in the rotations menu sorry it's going to get a little dizzy for a sec there we go um a factor analysis uh quote-unquote rotates the data in order to maximize the distance between factors to create more discrimination between factors and so we do want to rotate and the most popular is probably varimax the most useful i would say is promax it's faster with big data sets and it does maximize distance if you can't get a good solution with promax you can try varimax there are different reasons to choose each of these go ahead and just choose for now choose promax trust me on this one hit continue we're not going to go into the scores menu although there are some useful things there for those who are familiar with factor scores or latent variable scores you can save those through that menu and then in the options menu zoom out again here are the options um there are no missing values in this data set so i'm not going to mess with that for now but if you had missing values you could choose one of these options and then we could sort our result by size which is useful but i'm not going to do it right now and we can suppress small coefficients which i am going to do what this does is in the resulting solution which is a pattern matrix which is essentially a matrix of the groupings of the items we can suppress any coefficients that are not very meaningful to our analysis i'm going to suppress all the way up at 0.3 because i'm not really interested in loadings less than 0.3 in fact less than 0.4 is not very useful for what we're doing but i'll set it at 0.3 for now so we don't miss important information hit continue and now we're ready um we can hit ok and i'll zoom out for a sec yours might run for a sec yours might be done already um here are the results the kmo and bartlett's test says uh to what extent is this set of variables that we've included in the factor analysis usable or adequately correlated to run a factor analysis is it even appropriate to use this set of variables for factor analysis and it's assessed based on more or less the covariance matrix the extent to which all variables are related to all variables or at least enough variables and in this case we want the kmo to be high approaching 1.
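For readers who want to try these EFA steps outside SPSS, here is a minimal Python sketch using the factor_analyzer package. It is only a rough equivalent of the menu path described above; the CSV file name and the item column names are placeholders standing in for the workshop's .sav file, not files distributed with it.

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

# Hypothetical CSV export of the cleaned "EAM 2021 Workshop.sav" data set.
df = pd.read_csv("eam_2021_workshop.csv")
items = df.loc[:, "anxiety1":"useful7"]   # reflective indicators only (assumed names)

# Adequacy checks: KMO approaching 1 and a significant Bartlett's test (p < .05).
_, kmo_overall = calculate_kmo(items)
bartlett_chi2, bartlett_p = calculate_bartlett_sphericity(items)
print(f"KMO = {kmo_overall:.3f}, Bartlett p = {bartlett_p:.4f}")

# Kaiser criterion: retain factors with eigenvalue > 1 (SPSS's default rule).
eigenvalues = np.linalg.eigvalsh(items.corr().to_numpy())
n_factors = int((eigenvalues > 1).sum())
# To force a fixed number of factors instead (as done later with 7), set n_factors=7.

# Principal-components extraction with promax rotation, matching the video's choices.
efa = FactorAnalyzer(n_factors=n_factors, rotation="promax", method="principal")
efa.fit(items)

# Communalities and cumulative variance explained, then the pattern matrix with
# loadings below 0.30 suppressed, like the "suppress small coefficients" option.
print(pd.Series(efa.get_communalities(), index=items.columns).round(2))
print(f"cumulative variance explained: {efa.get_factor_variance()[2][-1]:.2f}")
pattern = pd.DataFrame(efa.loadings_, index=items.columns).round(2)
print(pattern.where(pattern.abs() >= 0.30, ""))
```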
anything over 0.8 is considered fine uh over 0.9 is good and we want the significance of the bartlett's test to be the p-value less than 0.05 in this case we do have that so what does that mean that means the set of variables we've chosen do have sufficient sufficiently high and sufficiently a sufficient number of correlations within this set of variables so they're appropriate for factor analysis and then the communalities table in the extraction column this is a similar test it says well to what extent is each item related to other items in this set of variables and we want high approaching 1 is probably too high but um we want them above about point three you'll read different thresholds out there some say point four some say point two um anything around point three or above is considered adequately correlated so as i look through this make uh through this table i don't see any that are particularly low there's a 0.38 but everything else seems to be adequately correlated with other variables what this means is that the solution we receive will likely have a loading a primary loading for every measure there won't be any measure that doesn't really load anywhere because all of the variables are related to something so that's good this total variance explained table it tells us something like an r squared uh over here in this cumulative percent column it says what percent of the variance is explained by the solution we've derived and the solution we've derived is a nine factor model or nine component model so we see that in this second column here the total column these are the eigenvalues and we have nine factors or components that have eigenvalues greater than one and so it is extracted a solution with nine factors because nine eigenvalues greater than one what's good in terms of how many factors are extracted good would be what we expect of course if it's exploratory we don't know what to expect um theoretically but in this case we do know what to expect because we had names on these measures and we came into it with an idea where every measure should go in terms of uh the percent of variance explained more is better usually and anything over 60 percent is great anything over 50 percent is adequate so um that's the variance explained you can see we had 47 measures included and so it could have extracted 47 different factors if every measure bucketed into its own factor which would not be good then there's no real relationships between factors i'm going to skip the components matrix for now and go down to the reproduced correlations matrix this is a massive matrix which shows the error the residuals on the solution and we want to minimize error as always and so a lower number here and this percentage is better different thresholds are published some say five percent some say 50 percent less is better so 10 that's probably okay and here is the solution the pattern matrix this is what we really want to look at and it shows us how our measures group together into factors or components in this case since we did a principal components analysis and where you see no loading where it's just blank that means the loading was less than our suppression threshold we set our suppression threshold to point three so any loadings that aren't at point three or above uh they're just missing they're not showing here we can see here that the loadings which we hope will approach one um for anxiety measures all load together and nothing else loads with them across that matrix so that's probably a good thing 
that shows a good convergent validity that they all hang together and good discriminant validity that nothing else tries to group with them same with computer use or no that's comprehensive use i should explain this data this data is a data set of how you use excel i mean very very uh old school stuff how how do you use excel do you use it comprehensively in unique ways atypically playfully which is silly but very academic but it works well for a data set anyway so this factor comprised of these four measures also has good convergent and discriminant validity for convergence we want to see loadings above 0.5 ideally averaging out above 0.7 but definitely above 0.5 well not definitely there there are very few definites in exploratory factor analysis um but higher is generally better above 0.9 you start getting suspicious um that there's maybe some high multi multi co-linearity going on we look down playfulness looks pretty good average loading is pretty high social desirability we see breaks up into three factors well that's interesting so social desirability is this measure you guys are probably familiar with that it's a measure of uh to what extent do i answer things in a way that uh is socially desirable um do i say that i i gossip a lot or am i influenced by the social desirable answer socially desirable answer and i say no i never gossip at all um so this broke up into three different factors which is telling so it is not a single construct we're measuring with these items there are multiple constructs or dimensions of this construct going on so we might want to take a look at that in a minute this one information acquisition loaded well decision quality loaded pretty well usefulness loaded very well and so the only mess is over here in social desirability what we could do for those who are familiar with social desirability um we just need it to assess method bias in the end in our confirmatory factor analysis um so we it's not a theoretically um like critical variable it is a is a methodologically useful variable and so i'm not too concerned with how it played out like this when we when we model it in cfa we might break it up in fact you can see here there's one item social desirability three that seems to load on two different factors is this a problem the answer is not in this case because the distance between these two loadings the difference is pretty good um it's more than 0.2 it's almost 0.3 different which means it's not really cross-loading between these factors it has a primary loading right here on factor 8 and then it just happens to be correlated with those two those two other items on factor nine so we don't need to adjust this at all now this is a good solution it just worked which is terrible for a workshop because and also good for a short workshop it's good because we don't have to go and troubleshoot but it's terrible because in real life this will never happen i mean unless you are really good at measurement design and study design and survey design this will almost never happen that it works first time so let's pretend for a moment that we do not want social desirability to break out into three dimensions what could we do well we could try this again so just back up here analyze dimension reduction factor and in the extraction menu instead of extracting based on eigenvalues what if i just extracted based on what i expect and what i expect is seven factors because we had nine and two of them were redundant right there there were three um factors for social 
desirability but we only wanted one so let's take off two of them and see if social desirability sort of consolidates into a single factor so again that was in the extraction menu i just changed the extraction options instead of i based on eigenvalues we're going to fix it at an exact number of factors this is useful uh for exploring your data continue and hit ok now we didn't add or remove any variables and so these first few tables aren't going to change they are identical but the total variance explain table will change because instead of nine factors we now have seven factors we can look at the cumulative variance it's still above 60 which is great so we're good there and we can go down to the reproduced matrix to see if there's more or less error looks like there is more error because we uh imposed an unnatural constraint on this on the solution so there's more error but is it a lot of error not really it's it's not up to 50 so we're good here's that solution and we can look at social desirability and what happened oh it clearly does not want to consolidate onto a single factor in fact what it did was since we forced it to have seven factors it said well the most tightly related factors are actually information acquisition and decision quality so push them all onto a single factor and then let social desirability stay separate under two factors so this is not an ideal solution we should probably just leave it how it was with nine factors and then when we model it in our cfa we'll model it as three different factors okay i want to show you one more thing in efa take a few questions and then we're pretty much going to be done with spss we'll move on to amos um so the other thing i want to show you is in an efa um i should show you here actually here's a shortcut in spss if you didn't know there is this button right here next to the edit undo button um it's the recall recently used dialogs this is all the stuff you've done recently you might not have a lot in here but there is factor analysis up here you can just hit that instead of going back to the menu system and trying to find it the last thing i want to show you is in the scores option let's see oh we gotta zoom out where we go here it is in the scores menu um there is an option to see as variables the factor scores so let's say you didn't have amos or some structural equation modeling software and you just wanted to stay here in spss do some multivariate regression or general linear modeling or something like that something that spss could handle um and in fact sbs can handle a lot now especially since hayes david david hayes i think that's his name made that macro called process he made it so that spss can actually do mediation and moderation and so if you wanted to stay in spss and not bother with latent factors you can save these factors as scores and what it will do is it will take every factor i'm just going to hit continue there it'll take every factor in the pattern matrix and create a standardized score of as a new variable to represent the entire factor so instead of eight items for one of the factors you'll have one and that one will be more or less a centroid of the factor representing all of the items that were on that factor so it's a good representation of the factor in fact it's way better than an average or a sum or a proxy a proxy is when you take one item from a measure as a proxy for all the other items this factor score is a more accurate estimate of measurement of that construct so if you hit save then what it 
does i'm actually going to change our extraction real quick back to eigenvalues so we get a good solution here um continue what it'll do and when you hit okay is part of the process of analysis is it will produce over in your data set let me zoom out over here in your data set if you were to go to the very end you'll see there are now up here let me zoom in again new variables that we didn't have before called fact c1 fact 2 1 um and 3 1 and these are the factor scores for the pattern matrix how do you know which one is which well you go back over to your output go to your pattern matrix and you can see factor one here is if you scroll down it is these items useful so i would then go back to my data set go to my variable view down here on the left because there's a data view in a variable view i would go to the variable view and find that new factor score this one right here fact one one and i would change this to usefulness that is the new one and you could say it's the over here in the label useful factor score something like that and so now i have a single observed well calculated variable representing that entire factor and now i can use that in a regression um or in an anova or whatever i want here in spss the reason you'd want to do that instead of retaining the latent factor is just if you were to stay in spss or move over to excel or something like that if you're going to use a structural equation modeling software like amos or smart pls or m plus or stata or all those they're a bunch eqs um keep it latent uh the one thing one of the things the factor score doesn't do very well is estimate measurement error there is some of that in the calculation of the factor score but it is it doesn't account for the current model because it was built during the factor analysis where there are no um guided parameters it's all it's all exploratory um where the solution is built by the computer and so once we change the model by adding regression lines and covariance lines or parameters then the new error inherent to the new model is not captured as well in a factor score as it is in a latent measurement model so keep it latent if you can you'll have a more accurate model the other reason you might want to use a factor score instead of a latent factor is that it's vastly simpler you will not have such a complex model and so when it comes to estimating a solution it's easier to minimize error you have fewer degrees of freedom and so it's easier to find a solution than if you have a complex latent model so if you have a low sample size for example and you don't want to estimate um a massive covariance matrix because you have so many variables in a latent factor model then then you can use factor scores which will really reduce that covariance matrix and make it much easier to run so that'd be another reason the last reason i can think of is related again due to complexity if you're going to run a moderation like an interaction between two variables it is very complicated to do that in a latent form it's way easier to take a two-step approach and and do a multiplication of two factor scores rather than a pairwise uh permutation of multiplications of all of the items associated with each of those factors i think it's really complicated okay that's the end of factor analysis maybe i think yes i'm actually going to say a couple more things on factor analysis real quick um and then and then we'll move on um the the things you want to test in a factor analysis the the quality criteria are three-fold or 
four-fold um in the outcome it's really just three-fold now you want to assess convergent and discriminant validity so are the factor loadings tight and high i guess um above 0.5 ideally above 0.7 in this case we can see yes in almost every case we have loadings above 0.7 um are they distinct discriminant are they cross-loading and the answer is yes in this case we're good um there are no major secondary loadings on any of these factors the closest we came was here with social desirability three and even this cross-loading isn't a problem because it's pretty different the primary loading is much higher than the secondary loading so we have discriminant validity which means that we have distinct factors that are measuring distinct constructs or distinct dimensions of a construct in the case of social desirability so as well as convergent and discriminant validity we also want to assess face validity which is do the groupings make sense um or do we have like some jumble of items on a single factor if we have some jumble of items on a single factor what we'd want to do is go look at the wording of the measures assuming they were survey questions go look at what it's actually measuring and see if there's some common theme that we didn't pick up on that's causing a bunch of what we expected to be unrelated measures to be measuring the same thing the same factor and if there truly is a common theme then maybe we form a new factor that we weren't expecting if it seems to be random or just coincidence then there is no face validity to that factor it doesn't make sense it's a statistical anomaly which does happen um in that case tough decisions need to be made about whether or not to remove items or to separate the factors by omitting certain measures the last is reliability you want to assess the cronbach's alpha to see if these are reliable factors the way to do that in spss i wish it would just do it automatically in the efa that would make sense but the way to do it in spss is to go to analyze at the top go to scale and reliability analysis and when you click on that it brings up this little menu where you can take a factor's measures so for example anxiety one through seven stick it over here in items and hit ok and that will produce a reliability score a cronbach's alpha let's see oops wrong way there we go um which if i can zoom into it here oh sorry this is jumpy i'm trying to get there it is um the cronbach's alpha in this case is 0.934 we want a cronbach's alpha ideally above 0.7 although there are some justifications for slightly lower if you have only a few items as you know fewer items means higher error and so higher error means lower cronbach's alpha by its nature a smaller set of items means uh potentially a bias towards lower cronbach's alphas so ideally above 0.7 but if you only have like three items in a measure in a factor um then you could accept down to 0.6 there is some justification there okay andrew hayes ah thank you alice not dave not david uh thank you for that correction okay um i think that's all i want to say about efa
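For those following the Python sketch rather than SPSS, here are the two follow-ups just covered, saving factor scores as new variables and checking reliability with Cronbach's alpha. It assumes the df, items, and efa objects from the earlier sketch, uses the pingouin package for alpha, and the anxiety item names are again placeholders.

```python
import pandas as pd
import pingouin as pg

# Factor scores: one standardized score per extracted factor (the rough equivalent
# of SPSS's Scores > "Save as variables" option).
scores = pd.DataFrame(
    efa.transform(items),
    columns=[f"factor_{i + 1}_score" for i in range(efa.loadings_.shape[1])],
    index=df.index,
)
df = df.join(scores)  # rename each score (e.g. to usefulness_score) after checking
                      # which items load on that factor in the pattern matrix

# Cronbach's alpha for one factor's items; ideally > 0.7, and around 0.6 can be
# defensible when a factor has only a few items.
anxiety_items = df[[f"anxiety{i}" for i in range(1, 8)]]
alpha, ci = pg.cronbach_alpha(data=anxiety_items)
print(f"Cronbach's alpha = {alpha:.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```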
are there any questions about the efa that was a lot like yeah um i do have one question uh first of all thank you very much for all of the information that you shared um i just have a quick question about do you advise that we do efa on all of the constructs at the same time or one at a time well it is such a great question because um there are some university doctoral programs that teach a very wicked tradition uh to do an efa on a single construct at a time what's the point of that the whole point of the efa is to explore the multiple dimensionality of your data set and to explore the discrimination between factors if you only do one construct's measures at a time you can't test discriminant validity so you've lost that one it's like if you go to disneyland and you don't ride on thunder mountain like what was the point of that um so yeah you should always be testing as many variables as you have data for uh that are appropriate for a factor analysis so for example um in this data set we have a bunch of variables let's say i collected this data with the intention of doing multiple studies and so there are actually more measures in here than i'm going to include in my first model would i still include all of the measures in a factor analysis or just the measures for that model the answer is i would include all of them here's why it's more and better information and it's more opportunity for uh measures to find a dimensional home uh and so it's like if you had a large data set would you want to include your full data set or just part of the data set uh in terms of sample size um include the full data set because it minimizes error same idea here the more variables you have assuming you have enough uh sample size uh the greater the potential to minimize error and have a more accurate representation of reality so yeah include all the variables that are appropriate notice i didn't include uh age experience gender industry things like that um they are not appropriate for a factor analysis because they do not belong to reflective latent constructs there are occasions to include them i won't cover them here but in your standard sem analysis you would only include reflective latent measures and all of them thanks for the question any other questions about efa for those who oh go ahead oh oh i have a question we were um yesterday there was the session um by lisa and i forgot her last name but she said if you have a scale that you're testing no reason to do an efa so right yeah cover that again i'm glad you brought it up so there are different uh schools different camps on whether you should do an exploratory factor analysis or if you should just jump into confirmatory um if you boil it down to what is required definitely if you have a scale that you have built with a theoretical construct in mind you do not need to do efa you can jump straight to cfa there is no compelling requirement to do an efa on data that has a measurement theory going into it a priori so and that applies to 99 percent of all cases which is i build a survey based on a model that i have a priori which has constructs that i'm intentionally measuring very rarely would you have a data set where there was no a priori model going into that data set although there are cases where that does occur which is in secondary data data run by a different organization like the world health organization or world bank or the government very rarely do they have an a priori theory they just have a bunch of questions they want to ask and in that case yeah you should do an efa um and it would be required because you don't know the constructs a priori now so then if this only occurs in one percent of all sem analyses why teach it uh the main reason is i still do it for every data set because efa is way better at um surfacing discriminant validity concerns than a cfa in the cfa it's very guided it
you say these measures belong to this construct or to this factor i guess and so you're constraining the model from the get-go um in an efa you don't have that you just say here are the variables you go find the the the factors and it has no theory that the software has no theory about how these should factor and so it's really good at surfacing the dimensionality of the constructs case in point let me go back to the output here when we go look at the pattern matrix and we look at social desirability if i jump straight to cfa what would i have done i would have made one factor for social desirability but when we tried to do that let me zoom out go back to this one when we tried to do that this is what happened it says no i am not one construct i refuse to be one construct even if you constrain me to be one construct i refuse and it will not converge onto a single single factor and the efa revealed that whereas if i jumped straight into the cfa i would have just struggled to figure out what is going on with this construct why won't it converge how are there multiple dimensions if there are how do i see them because in a confirmatory factor analysis you don't get a pattern matrix there is actually some software that produces a loadings matrix um but in the for the most part you don't get a pattern matrix and so you don't need to see where the natural unguided loadings are and that's why i do an efa more information is mo better right yeah and it's easy this this took like half an hour 45 minutes which is really long um and uh in in practice when you're not explaining it to someone and having people follow along it takes 10 minutes uh so just do it thank you thank you for explaining that that's helpful oh good i'm writing a paper right now that explains why you want to do an efa even in an a priory theorized model and it's been rejected from journal after journal after journal and so actually if you want to send it to me i'm doing scale development i'll cite it if it helps there we go all right i have a good plan for it it'll get published yeah any other questions on efa i know we spent like half our time here i apologize okay now that we've now you've seen the pace at which we're going and you've gotten familiar with the software somewhat are there any who are still following along um with me in in the in uh in the software in other words do you want me to keep going at this pace or should i speed it up because not many people are following along think we can speed it up all right i in amos i'm i'm going to assume you're not following along i'm just going to show you and explain uh and the point there is to expose you to amos and see what it's capable of and so that you might in the future decide to use it i get nothing from it i get no kickbacks in fact i emailed ibm and said hey i'm making a bunch of plugins and automation tools for amos can can i can we collaborate and they said no so i get nothing from this uh but i find it useful and and i don't like syntax based uh statistics software and i'm a developer i'm a programmer uh and i don't like syntax-based software so for those who are not developers i can imagine it's so much worse so i'm just going to show you some stuff i'm going to open up amos get it up on this screen it takes a moment to load because it's ibm it's thinking it's like adobe like opening up that software man takes forever here we go let me pull this over oops this is someone else's model that i was playing with i help out people every day around the world and they send 
me their models and data and that was one of them all right let's see i'm going to load a data set real quick um just like i need to move this oh there we go okay so in amos this is amos it was built in like the 90s and they never updated the interface it is very 1994 ish um i'm going to link a data set by using file system i'm just going to use this data set we've been using the eam 2021 and again i'm not going to go as slow as i have been going um hit ok and the the slow way to do this is to use this little candelabra and um you can draw latent factors here um and then click on them and create items for those factors so the the observed measures would go in the rectangles the latent factor name would go in the ellipse here and then the the small circles here are the residuals the error um it's kind of a pain to do this manually so i wrote a plugin that would just do it all for us so let me just erase this the plugin says let's see it's the pattern matrix builder and pull this up here i just says paste your pattern matrix in here so i'm going to go copy my pattern matrix copy and paste and hit create diagram and it should hopefully there we go it creates the diagram for us uh which is way faster and way less error prone than using the manual approach let me just really quickly name each of these um so that we have useful names this is use full uh ooh i have i have a variable name useful in the data set so i'm going to call this useful f for factor you can't have latent factor names um that are identical to your observed variable names in your data set we have a variable named useful we don't have one called usefulness because i didn't save the data set so i'll just use that anyway decision quality dec quality so i've got to be careful about what i name these things um anxiety i'm going to put an f on there because i think we have one named anxiety already playful oh mess because i don't think it's called that um info acquisition pretend i spelled that right comp use uh you don't want um comprehensive apprehensive there we go you don't want spaces or mathematical symbols in your variable names uh in your latent factor names social desirability a social desirability b hopefully these aren't uh bad acronyms in some language all right there you go so here we have our measurement model our cfa and this is a lot like our factor analysis we just did in spss except we have told the software exactly where every variable goes exactly what group it belongs to whereas spss tries to figure that out iteratively so we are constraining the model to to already have a solution and so what you want to do since we're already telling it what the solution is is you want to compare your proposed model this is our proposed model to the observed model the observed model is the covariance matrix that is inherent in the data set so every variable is related to every other variable in the data set by its nature there is a r coefficient of correlation coefficient associated with every pair of variables and excuse me um if we were to take that covariance matrix and more or less subtract the covariance matrix implied by this cfa this solution we would get the chi-square statistic and so the bigger the difference between the observed covariance matrix and the proposed covariance matrix the bigger the chi-square the bigger the chi-square the worse the model fit and in this case model fit literally means how well does your model fit the observed covariance matrix and so again the bigger the chi-square the worse fit 
because the bigger the chi-square the bigger the difference between the observed covariance matrix and the one you're proposing and so if we were to like move variables around and assign them to groups that they don't belong to there would be a more error associated with that proposed model and the more error the higher the chi-square the higher the difference so worse the fit so we can test the fit we just uh actually go to a couple things first we're going to go to the analysis properties which is up here on the top left it is the abacus with a color palette terrible terrible interface um oops excuse me we're going to click on that and it brings up this window called analysis properties which just disappeared where'd it go here it is uh down here it was on a different screen so it brings up the analysis properties and you can set a bunch of different properties here we're just going to go straight to the output tab and we want standardized estimates because it's easier to read and relate and then we want modification indices at a certain threshold a modification index is a uh is a measure of how the chi-square statistic would change if you were to add a parameter to the model that is not currently there it's essentially saying hey this covariance matrix from the observed model is really different from the proposed one but if you were to add this one from the observed model to the proposed one the chi-square would drop by a certain amount so what is the threshold we want to see for that this is a relative threshold because the chi-square is a relative measure so the more complex your model the more sample size you have the higher you want your threshold in this case i'm going to set it at 20. how do i know that experience you'll figure it out as you do more of these there's no ok button so you just hit x and zoom out and then let's save it with this floppy disk and i'm just going to save it here eam 2021 and i'm gonna run it with the regular old abacus not the one with the color palette on it looks like a piano a little bit it runs it says there is an error i have no valid license i'm really glad this happened um i do have a valid license this error occurs randomly it seems uh every now and then so hit okay and okay and okay and okay and i'm going to save this if you didn't get that error fabulous and if you're following along i'm going to just close amos and then i'm going to re-open amos i'm really glad the closed captioning understands the word amos that could turn out bad here we go here's the same model i'm just going to run it again oh look it works this time found my license again weird error i don't know why they do that okay here's the solution but this is the unstandardized solution i'm going to switch over here to the left where it says standardized i'm just going to click standardized and that will automatically toggle over to standardized estimates and you can see the advantage here for those of you who use r or stata or m plus or any non-visual sem software you can see some advantage here there's a visual representation of the model which is so helpful to to understand your model and the output comes out on the model which is so useful instead of trying to sift through tables of output um and it's just organized and pretty i like it much prefer this what we see here is the estimates the loadings of the items on the factory and just like in the efa in the cfa we want to use above 0.5 ideally above 0.7 averaging out above 0.7 at the very least and we observe that's roughly 
what we have here here's a 0.53 but everything else looks pretty good so that factor is fine and so we can scroll down and just make sure we've got more or less good results here and it looks like we do until we get down to social desirability which we have obviously here not averaging out to 0.7 same here same here so what do we do in this case um this is evidence that we don't have convergent validity here those loadings are too low um and so what do we do about that well in the case of social desirability like i said before it is a method factor it is a methodologically useful factor but it's not a theoretically useful factor it's not a core component of our theory and so i'm not terribly concerned about its validity i'm using it to eventually in a method bias test extract variance that is shared with this idea of social desirability and so that is the only way i'm going to use this so i'm not concerned i'm really just concerned with my theoretically important factors here and they all look good now how good are they well we need to check the model fit which we were talking about which is in this output option which will come up in just a second let's see here it is here um this is the output so there is some table output it's not all visual and um let me zoom in you can see your basic output your chi-square your degrees of freedom which are relative to the complexity of the model and the sample size so we have a pretty high chi-square here and our p-value for the model fit is significant which counter-intuitively we don't want because this is a test of bad fit we want the probability that the chi-square is zero to be non-significant we want the chi-square to be zero we don't want it to be different from zero i mean so anyway you want this p-value above 0.05 uh if you go to the model fit here we can see some familiar measures like let me scroll down here the cfi the comparative fit index uh we want the cfi for the default model that's the estimated model to be above 0.9 ideally above 0.95 um for model fit and there are other measures you probably recognize here the nfi the tli uh the rmsea is a popular one um we want that less than 0.06 ideally and the pclose to be non-significant again and um i think those are all the ones i want to show you here there is uh for some of you you're all going to have your favorite model fit measures if you do sem and different journals have different preferred fit measures there is one called the srmr that is pretty standard that's in the plug-ins menu by default with amos and for the srmr if you run that plug-in this happens which is not intuitive nothing happens but if you run this model with that window open it will disappear and then reappear with the srmr value which here is very small 0.048 you want it less than 0.08 so we're pretty good this is a good fitting model and again what does that mean that means that i'm not applying unnatural constraints to the measures the sets of measures that we are proposing here that the useful one through seven measures belong together that is not unnatural it is a natural grouping that we would probably find if we did an exploratory factor analysis so that's that um if you end up watching my videos or doing my course um there are a bunch of plugins here that will produce output for you um automatically
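For anyone working outside AMOS, the same kind of measurement model and the fit indices discussed here can be sketched with the semopy package, which uses lavaan-style syntax. The factor and item names below are assumptions based on the video, only a few of the nine factors are written out, and SRMR is not shown here (in the video it comes from an AMOS plugin).

```python
import semopy

# Confirmatory factor analysis: each latent factor is defined with =~ and its items.
cfa_desc = """
usefulF  =~ useful1 + useful2 + useful3 + useful4 + useful5 + useful6 + useful7
anxietyF =~ anxiety1 + anxiety2 + anxiety3 + anxiety4 + anxiety5 + anxiety6 + anxiety7
playful  =~ playful1 + playful2 + playful3 + playful4
"""
cfa = semopy.Model(cfa_desc)
cfa.fit(df)                        # df: the cleaned workshop data from the first sketch

print(cfa.inspect(std_est=True))   # standardized loadings, like the AMOS toggle
print(semopy.calc_stats(cfa).T)    # chi-square, degrees of freedom, CFI, TLI, RMSEA, ...
# Rules of thumb from the video: CFI > 0.90 (ideally > 0.95), RMSEA < 0.06,
# SRMR < 0.08, and a chi-square p-value that is ideally non-significant.
```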
here's a model fit plugin that i'm running and you'll see it produces a model fit table that can be copied and pasted and it interprets for you and gives you thresholds and references so it just does it all for you and those of you who are used to using r or m plus you'll realize how useful this is you don't have to sift through and make your own tables there's another really useful one um let's see validity with confidence intervals let's see if this works it's thinking here we go it's gonna make it please please work still thinking it's calculating a lot of stuff this is a very complex model i mean producing an htmt ratios matrix um here it is holy cow it produced a correlation matrix square root of the ave on the diagonal found significant correlations produced the composite reliability let me zoom in the cr composite reliability the ave the msv and a bunch of other statistics which help you understand the convergent validity and discriminant validity and reliability which i'll go through real quick for reliability can i ask you one question of course yeah so um i have looked at your plugins and they're free i know but how can i put it into my amos because mine is university provided so i think it restricts my ability is it on citrix like a version yeah it is on cloud yeah i don't know how to do it in the cloud okay i just thought i just don't know what it is okay thank you my guess would be it would have to be in the local uh the virtual local machine and so you'd have to find the hidden folder of app data in the local virtual machine um and you'd have to have permission to view that hidden folder which i doubt i kind of doubt your institution would allow that no i don't know it's unfortunate which is so sad because i could have avoided so much work it's phenomenal what you do and provide to others uh if only it worked everywhere um so here's the composite reliability you want the cr column you want those values to be greater than 0.7 when they're not here in the sd factors the social desirability factors those are not where we want them but again it's a methodological variable not theoretically useful so i'm not so concerned what i am concerned with is if there is red in the theoretically useful variables here and i see in information acquisition ave is flagged i'm trying to figure out why oh it's because the maximum shared squared variance is greater than the ave which is a sign of discriminant validity issues but there are many different measures of discriminant validity this is only one of them so you'd want multiple signs to say oh this has a problem one of the most common is the fornell-larcker approach which is uh does the square root of the ave in this case for information acquisition that's 0.726 is that greater than any correlation with all other variables um and so i look to the left at these correlations and uh oh yep there is a 0.727 which is greater than the 0.726 so there is a discriminant validity issue one more measure of this though is the htmt which is found down here and i can look at htmt for information acquisition and the one it's crossing with which is um decision quality right here so 0.765 you want the threshold for that ratio to be less than 0.850 and so actually we do have one measure of discriminant validity that says there is no problem this is the most recent measure and so i would feel comfortable relying on it to say no we don't have a discriminant validity issue here even though the fornell-larcker failed and this other test of maximum shared squared variance versus ave failed uh the htmt passed
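The validity statistics this plugin reports can also be computed by hand; below is a rough sketch of the formulas for composite reliability, AVE, and the HTMT ratio, with the thresholds mentioned in the video (CR > 0.70, square root of AVE above the factor's correlations with other factors, HTMT < 0.85). The example loadings at the end are made up.

```python
import numpy as np
import pandas as pd

def composite_reliability(loadings):
    # CR = (sum of loadings)^2 / [(sum of loadings)^2 + sum of (1 - loading^2)]
    lam = np.asarray(loadings, dtype=float)
    return lam.sum() ** 2 / (lam.sum() ** 2 + (1.0 - lam ** 2).sum())

def average_variance_extracted(loadings):
    # AVE = mean of squared standardized loadings
    lam = np.asarray(loadings, dtype=float)
    return (lam ** 2).mean()

def htmt(item_corr, items_a, items_b):
    # Heterotrait-monotrait ratio for two factors from the item correlation matrix:
    # mean between-factor correlation over the geometric mean of within-factor ones.
    hetero = item_corr.loc[items_a, items_b].to_numpy().mean()
    within_a = item_corr.loc[items_a, items_a].to_numpy()
    within_b = item_corr.loc[items_b, items_b].to_numpy()
    mono_a = within_a[np.triu_indices_from(within_a, k=1)].mean()
    mono_b = within_b[np.triu_indices_from(within_b, k=1)].mean()
    return hetero / np.sqrt(mono_a * mono_b)

# Made-up standardized loadings for one factor: CR ~ 0.86, AVE ~ 0.67, sqrt(AVE) ~ 0.82.
example = [0.82, 0.76, 0.88]
print(composite_reliability(example), average_variance_extracted(example))
```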
and so we're good enough anyway useful stuff confidence intervals for all that um and citations anyway so that's that's the cfa there there's so much more in the cfa uh so much more that i haven't covered there's method bias there's invariance testing which is a pain in the rear um and i and i hate doing it anyway there's a bunch of stuff you can do in a cfa but we don't have time for uh you can find it in my videos you can find it in my course and and i've written plugins for most of it to make it less painful oh where to get the course or the videos there we go somebody pasted it thanks um the videos are there i have let me paste the course in as well let's see i have it up here somewhere here it is copy that link and i'll paste it in here that's the oop that went to alice only sorry let's see paste again there we go that went to everybody okay um so that's the course let me go back to amos any questions about cfa before we do some causal modeling in amos okay if you do have a question just just chat it or interrupt um again i only scratch the surface of cfa there is there are so many layers to cfa um and the whole purpose of the cfa is to validate your measures uh in two factors to have confidence that when you go into causal modeling saying that this variable predicts that variable uh that you have confidence that those variables are distinct they are different very they are different constructs measures of different constructs and that those measures are reliable that's the whole point of a factor analysis is to validate your factors so that when you move on to testing your theory that you have confidence that the variables used to test your theory are valid otherwise you don't know whether what you find is just a statistical anomaly or random noise uh or or valid so you got to do a measurement validation here so from here if we want to test some theory there are multiple ways to do this you can use factor scores like we produced in efa you can also produce factor scores here i can show you that if you go to analyze and data imputation again that was analyze data imputation we can produce factor scores based on this result um it'll what what it does is it produces these factor scores into a different data set which is just whatever your current data set is called in the same folder but it adds an underscore c for composites um i think it's for composites i'm actually not positive what the c stands for but that would be my guess um so you just hit impute it happens you close and if it doesn't happen there's there's a good reason which i cover in some of my troubleshooting guides and that data set let me go find it here oh you guys can't see that it's in the same folder as the data set um you're currently using so i'm going to show you here's my downloads folder which not always safe to show your downloads folder on a recording zoom call i'm safe don't worry um okay here is that data set uh underscore c so if i were to open this it would have factor scores in there based on this amos model so what i'm going to do is i'm going to do this both ways i'm going to show you a latent causal model and then i'm going to show you a path model an imputed causal model to turn this cfa into an imputed causal model oh actually i want to show you one other thing first i'm sorry not enough time i want to show you a second order factor uh we have here with social desirability three dimensions of the same factor let's pretend this was a theoretically interesting uh factor what i would do to turn this 
into a second order factor is i would omit all these correlations between them oh i think it yep trying to get rid of all these little correlations and some of you who use syntax-based software thinking that's so slow i can do that with like one line in r or an m plus yeah you can't see the model as easily all right um and now what we want to do is create a new latent factor over here and i will call this social desirability where you go here it is i will call this variable name social d for now and close that and i will connect it to its subdimensions and since this is a covariance-based software whoops then we need to have a parameter constraint those who are familiar with sem know that for all covariance based methods you need a parameter constraint per latent factor at the highest level so you'll notice on each of these latent factors there is one one per factor and that is to provide a parameter constraint it is a necessary condition of covariance based methods so i will add a parameter constraint here just to this top one double click the arrow go to the parameters add one and close that also you'll know that every predicted variable must have a residual associated with it notice all of these predicted boxes here have residuals error so i need to add error to these predicted dimensions that's a mess there we go okay um and i need to name them there is a shortcut for that you can just use a plug-in name parameters not parameters whoops close that name un observed variables here we go name unobserved variables it automatically names those for me and now i can co-vary this with everything else and only draws clockwise how convenient so there it is co-vary this to everything else that's another thing all exogenous variables in covariance-based methods must be covaried to all other exogenous variables you can actually run it without but it's an assumption of covariance-based methods okay we now have a second order latent factor reflective latent factor that should run just fine if i were to hit run here big model okay there we go it does run just fine um and now we can see it has loadings for each of these sub dimensions um some of them are negative which implies that they're inverse to each other but it does work we can see that this does work as a second order factor now if we were to take this and move it to a causal model a structural equation model with causal uh parameters we could do that by removing some covariances from our proposed endogenous variables the dependent variables so let's pretend for a moment that decision quality is our dependent variable what i'm going to do is i'm going to remove the arrows that go into decision quality all the covariances i'm going to just for fun and for aesthetics turn that around and move it with this fire truck that makes a lot of sense over here and then i'm going to draw regression arrows from other factors to decision quality and this is pretty basic because i'm just having one dependent variable and a bunch of independent variables in amos you can actually do mediators and moderators and stuff like that um and i'm going to show that in a moment but this is how you would draw a latent causal model um and this should run just fine if i were to go run this i'm getting an oh it says produce an error it says you don't have an error variable here i need an error variable because it's being predicted there we go and i just need to name that parameters text variable name e what are we on 51 okay and now i can run it should have used a smaller 
I should have used a smaller data set; a smaller model would run a lot faster. At least I'm not bootstrapping. Okay, here we go, it now produces, let me zoom in, regression estimates for these relationships. You can see here, for example, that information acquisition to decision quality has a standardized regression weight of 0.63. I don't have the R-squared displaying; let me produce that. In Analysis Properties, in the Output tab, there's Squared Multiple Correlations that I can turn on, and actually I can make this run a lot faster if I get rid of some of these check boxes. There we go, now run it. It's in that window as well that you would request your bootstrap, if you wanted a bootstrap for a mediation or an interaction or something. So it is now producing an R-squared of 0.58, which is pretty good; that's the variance explained in the dependent variable. And that is a latent causal model.

As you can see, this is very complex. The model fit won't have changed much from what we observed in the CFA. If we were to reduce this down to factor scores and just create a path model, the model fit would change drastically, because we would have reduced the number of parameters drastically and the covariance matrix would shrink drastically, from all of these observed items down to just a few variables. So let me show you that. I'm just going to save this and create a new model. Here we go, Plugins... I'm just going to erase all this, there we go, and connect a new data set, the one with the factor scores. I'm going kind of fast here because we're going to run out of time. I'm just going to change the data set to the new one we created, the underscore C one, the one with the factor scores, and hit OK.

What you can see in this little stacky button thing, oh, let me move it so you can see it, it's down here, here we go, is a window with all of our variables in it, including the factor scores we just created; they're down at the bottom. I can just drag these out into my model. Oops, I just did the same one twice; let me zoom out so it's easier. Okay, there we go. Let me just pull out a few of these so you can see what we can do here. I wish you could pull them all out simultaneously, that would be nice, but you can't. Okay, close that.

Let me arrange these how I think they might theoretically relate to each other. I think we're going to have comprehensiveness on the left, decision quality on the right, usefulness in the middle, playfulness in the back, information acquisition in the middle, and this one on the left. Okay, here's what I think: I think these guys predict decision quality through the mediators of information acquisition and usefulness, excuse me. So I will covary these guys in the back, they're my exogenous variables, and I will have them predict these guys. Now, I'm adding arrows everywhere for the sake of time, but you wouldn't include ones that are not theoretically valid. Then I would provide an error term for anything predicted, and I would name those errors. Okay, this is kind of an ugly model, but magically... there we go. Okay, this is a causal path model where we have implied mediation through these two variables, and we can test all sorts of things in here.

If we just run this, then we can see the R-squareds for each of the predicted variables on the top right: 0.25, 0.16, 0.61. We can also see the standardized regression coefficients, the standardized regression weights, on the paths; this one is a very high coefficient. And I can move things around a bit so we can see a little better. Oops, move this to where we can see it, move this to where we can see it, there we go. Okay.
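Because the path model uses factor scores rather than latent variables, it can be approximated with one ordinary regression per predicted variable. The sketch below is only an illustration of that idea, not a replacement for Amos's simultaneous estimation: the file and column names are hypothetical, and the factor scores are z-scored so the coefficients resemble standardized regression weights.

    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical factor-score data set (the "_C" file) containing only numeric score columns.
    df = pd.read_csv("EAM 2021 Workshop_C.csv")
    z = (df - df.mean()) / df.std()   # z-score so coefficients resemble standardized weights

    # One regression per predicted variable, mirroring the arrows in the path diagram.
    m1 = sm.OLS(z["InfoAcquisition"],
                sm.add_constant(z[["Comprehensiveness", "Playfulness"]])).fit()
    m2 = sm.OLS(z["DecisionQuality"],
                sm.add_constant(z[["InfoAcquisition", "Usefulness"]])).fit()

    print(m1.rsquared, m2.rsquared)   # variance explained in each predicted variable
    print(m2.params)                  # path coefficients into decision quality
    print(m2.pvalues)                 # their significance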
And so you can get a pretty quick understanding of your causal model just at a glance, which is really nice. You can also produce useful tables, such as this one, where you have each predictor, each outcome, its standardized regression weight, and whether it's significant. And you can assess mediation; I have a plugin for that. This is an estimand. There's not enough time to explain estimands, but let's see... star dot estimand... let's start up VB... I'm going to have to show you that in a video later; I don't have it downloaded right now. But you can assess mediation using an estimand: essentially it names parameters and multiplies parameters by each other to create indirect effects, and those estimands are available on the StatWiki for free. [A rough sketch of this product-of-paths idea follows after the transcript.]

We've run out of time, unfortunately; I have another meeting I have to run to in two minutes. Let me show you one more resource. Let me go to the StatWiki real quick so you can see where you would find this stuff: statwiki.gaskination.com. So here's a wiki. It covers SEM topics and includes Amos and SPSS, but also Mplus and SmartPLS, and even PLS-Graph, which is like decades old. It has all the plugins in here, and it has very useful references over here in the references area, for different topics like validity, mediation, and moderation. It has access to all the automation stuff. It's all free; I just make this stuff as a service for people, and that's how I serve the community. And if you get really stuck, there are tutors you can go to here, a bunch of tutors I've worked with who know different software and different methodologies, and here are their LinkedIn contacts.

Anyway, there's so much more. I usually run a three-day bootcamp to go through this, and it covers three semesters' worth of statistics. I've tried to cram it down into an hour and a half, and it's just impossible, but it's all we have time for. Any final questions for the last 60 seconds? If you have any questions I can... oh, sorry, somebody was... no, no, go ahead. Oh, somebody asks, can moderation be done with something like this also? Yes. You can run multi-group moderations, you can run interaction moderations, and I have videos for that, and you can find it here on the wiki too: if you go to causal models, you'll see how to run multi-group, how to run interactions, mediation, all that stuff. Let's see... thank you. Yeah, you're welcome. Any other questions?

All right, well, good luck. You can also email me; I respond to every email I receive within 24 hours. Every day I respond to a couple dozen statistics questions from around the world. If you do email me, do me a favor, since I answer a lot of emails: keep it brief, and if it includes a model, just send me the model; don't try to describe it in words, I'm a very visual person. So, thanks. I will post this video to YouTube; I hope you don't mind. None of your faces are on it; I've kept it only on my screen, so your faces won't appear in the video. I've got to run. Okay, thank you so much. Thank you. Thank you. All right... it is stubborn, yes.
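The estimand mentioned above multiplies path coefficients to form an indirect effect and then bootstraps it. As a rough illustration of that product-of-paths logic (not Gaskin's plugin and not Amos's estimand machinery), here is a small Python sketch with hypothetical variable names: the a-path times the b-path gives the indirect effect, and resampling the data gives a percentile confidence interval.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_csv("EAM 2021 Workshop_C.csv")   # hypothetical factor-score data set

    def indirect_effect(d):
        # a-path: exogenous variable -> mediator
        a = sm.OLS(d["InfoAcquisition"],
                   sm.add_constant(d[["Comprehensiveness"]])).fit().params["Comprehensiveness"]
        # b-path: mediator -> outcome, controlling for the exogenous variable
        b = sm.OLS(d["DecisionQuality"],
                   sm.add_constant(d[["InfoAcquisition", "Comprehensiveness"]])).fit().params["InfoAcquisition"]
        return a * b   # indirect effect is the product of the two paths

    point = indirect_effect(df)
    boot = [indirect_effect(df.sample(frac=1.0, replace=True)) for _ in range(2000)]
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(point, (lo, hi))   # point estimate with a 95% percentile bootstrap interval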
Info
Channel: James Gaskin
Views: 2,585
Rating: 4.9398499 out of 5
Keywords:
Id: -_Kdr_8V4bI
Length: 85min 58sec (5158 seconds)
Published: Thu May 20 2021