Delhi SEM workshop

Captions
He has published more than 160 scientific articles in journals such as MIS Quarterly, Information Systems Research, the Journal of the Association for Information Systems, Computers in Human Behavior, and many others. His research focuses on augmented reality, human-computer interaction, and human cognition. Dr. Gaskin is best known for his efforts to make advanced statistical methods like structural equation modeling accessible to the world: his statistics YouTube channel and accompanying StatWiki are viewed thousands of times each day, with nearly 20 million views to date. Dr. Gaskin is also a serial entrepreneur, having helped start eight companies. Welcome, Professor James Gaskin, to IIT Delhi. The floor is all yours.

Thanks. Hello, everybody. I'm excited to do this. I occasionally do these workshops for universities, groups, or conferences, and I'm sorry this one is short — just an hour and a half. I usually hold big three-day seminars trying to cover three semesters of statistics in three days, which is pretty intense, but the pandemic has prevented me from doing much of that lately. Just so you know, this meeting is being recorded, and none of your faces are on it except Sachin's, so people will probably see that when I post it to YouTube; I hope you're okay with that. For the most part, you won't be seen, so don't worry.

I'm going to send a few links in the chat to the data set I'll use today. The first is a link to the SPSS file. It's actually a file I used for another workshop I just did for a conference, so forgive the name; we're covering the same material, so it will work just fine. Another link is to the StatWiki. Some of you might be familiar with the resources I've put online: one is a wiki called StatWiki, which has a lot of information about statistics and SEM in particular, and everything there is free. There's also an online SEM course if you want to brush up on your statistics or earn college credit for learning SEM. It's essentially an interactive textbook with videos, assignments, quizzes, grades if you want them, and a completion certificate when you're done. That one is not free, but I've made it far less expensive than any real workshop fee, so hopefully it's accommodating to most people.

Today I'm going to use the SPSS file linked at the top. We'll go through exploratory factor analysis (EFA) in SPSS, then confirmatory factor analysis (CFA) in Amos — and not everything with CFA; there's quite a bit you can do with it, and I'm just going to brush the surface. Then we'll get into some causal modeling: path models, latent causal models, hopefully some mediation, maybe some moderation, and we'll see how far we get. I do have to leave promptly at 10 p.m. your time in India for another meeting I could not reschedule, so we have an hour and a half.

One more thing: I'm going to assume that most people are not following along in the data set, because I'll be going faster than usual.
It's all being recorded, and there are already videos online for everything I'll do today; there just won't be time to follow along slowly. If you're fairly adept with statistics software — SPSS in particular — feel free to follow along, but I won't be answering troubleshooting questions or keeping everybody in step with me. I gave you the data set so that you can later go back, watch this video, and do the analyses yourself.

Also, Sachin, should I turn on closed captioning? Would that be helpful? You're muted, but I saw a nod. Okay, live transcription is enabled. It will probably be a bit confusing with statistics — that's like another language, so it's translating two languages at a time — but it looks like it's doing okay.

All right, here we go. I'm sharing my screen, and hopefully you can now see the SPSS window. As we go along, feel free to chime in and ask questions; this doesn't need to be a fire hose of information. Maybe don't constantly ask questions if you're the only one asking, but do ask. You can also post them in the chat window, which I'll keep open and monitor, and if I see a question there I'll try to answer it in real time.

For those who have no background in this — which I assume is at least a couple of you — this is SPSS. SPSS is statistics software primarily for first-generation statistics: t-tests, ANOVAs, correlations, regressions, and the like. It also allows some more sophisticated analyses, which I'll show you today, such as exploratory factor analysis. This data set is the one I linked in the chat window; if you still need it, just ask and we'll post it again. I'll go over how to use SPSS very briefly, and how to conduct an exploratory factor analysis in a little more detail.

First off, this is one view of SPSS: the data view. Down at the bottom there's a variable view as well, with all of our variables on the left and information about them, including the survey wording in the label column. For those not familiar with behavioral research: we often conduct behavioral research quantitatively through surveys, and in surveys we ask multiple questions about the same construct — like anxiety, computer use, or playfulness. Using all of those measures together, we can surround the idea of a construct and measure it, hopefully effectively and validly, and we can use SPSS, among other tools, to validate those measures — to see if we validly and reliably measured the construct we intended to measure. One way to do that is through exploratory factor analysis.

I should say: before you even do an EFA, there's a lot of other work, like cleaning your data, restructuring and organizing your data,
taking care of missing values, checking skewness, kurtosis, and the other normality assumptions, and looking at outliers. There's a lot you do before an EFA, but we don't have time for all of that. It's on the StatWiki and in the online SEM course, and I have other resources and videos online that cover it. We're going to jump straight to the EFA.

For those familiar with the measurement-model arguments in the literature, you'll know the EFA is sometimes debated: do we even need it? An EFA is an exploratory, unguided factor analysis. We say, "Here, SPSS, here are all of our measures — how do they group?" It's sort of like a cluster analysis, but for variables. Some scholars say we don't need that, because we never design a survey without a measurement theory in mind — we know which measures measure which constructs, so why explore it? And they're right: you don't need an EFA if you have a measurement theory in mind. But if you have someone else's data — secondary data you did not design, say from some large association, organization, or non-profit — there often was no measurement theory behind it, and an EFA is very helpful. That's why I'm going to show it to you.

To run an EFA, go to Analyze > Dimension Reduction > Factor. Again, I'm going fast because there isn't time to keep everyone with me. We're only going to bring in the latent measures — the measures for latent constructs — which here run from Anxiety1 down through Useful7. Notice I skipped the ID: you don't want to factor an ID variable. I also skipped the other variables at the bottom. (Is my screen big enough, or do I need to zoom in? Here's what it looks like zoomed in; I'll do that here and there.)

So here are all the measures for latent factors, several per construct; for example, SocDes1 through SocDes10 are all for social desirability. At the bottom we have things like age, experience, and gender. These are not measures that belong to sets of measures for a latent construct — they are individual observed measures; we have an actual value for age, gender, and so on. They don't belong to a latent construct, so they don't belong in a factor analysis, and I'm going to exclude them. I'll put all the latent-construct measures into the Variables area.

Now let me walk through the buttons quickly. On the Descriptives screen there are some options; I won't explain each one here (I do in the course, on the wiki, and in some videos) — they just determine what output you see in SPSS. I'd like to see the reproduced matrix and the KMO and Bartlett's test. The reproduced matrix helps me know whether I have a good factoring solution; the KMO and Bartlett's test tells me whether my set of measures is appropriate for a factor analysis. There's lots of other stuff I won't worry about right now, but you're welcome to explore it and play with it — that's how you learn.
The Extraction button is probably the most important one here. It lets us choose the method of factoring. The most common methods are principal components analysis (PCA), maximum likelihood, and principal axis factoring (PAF). There are other options — in fact, the default in some other software is GLS, generalized least squares — but I'm going to keep it at principal components. You're welcome to choose what you want; there are reasons behind each one, which I discuss on the StatWiki, and Joseph Hair, in his book Multivariate Data Analysis, talks about each of these methods and why you might choose one over another. Later in that book, though, he says you end up with more or less the same solution, so since it's exploratory, pick one, then pick another, and learn something — that's the whole point of exploring your data. There's no definitive right answer here; it's just exploration. I'm also going to extract factors based on eigenvalues. An eigenvalue is a measure of contribution to explained variance — there's a lot behind that, but essentially anything with an eigenvalue of one or more is considered a good contributor to explained variance, so I'll use that as my cutoff criterion for extracting a factor.

Rotation lets you pick the rotation type. A factor analysis can iterate and rotate the view on the data to minimize error, spreading out and condensing the factors so there's more convergent validity within factors and more discriminant validity between them. So I do like to rotate my factor analysis. I use promax, but there are other options: varimax is a very popular approach, and direct oblimin is the default in Mplus, I think. There are different reasons to use each; I'm going to use promax, and you'll just have to trust me on that one.

I'm not going to use Scores, but let me show you what's in there. Scores lets you save each factor as a variable. In SEM we have these massive models with factors measured by multiple measures, which gets messy — lots of degrees of freedom, lots of parameters — and one way to simplify a massive SEM model is to use factor scores instead of latent factors. It's like taking the average or sum, or using one proxy variable, to represent your factor, except this is a weighted average that is also standardized, so it's considered the most valid way to represent a latent factor with a single variable. I'm not doing it now, but if you were, you'd check "Save as variables" and it would add a variable at the bottom of your data set for each extracted factor.

Last is Options. This is preference stuff: we can sort the output so it's prettier, and we can suppress small coefficients so it's not as messy. I am going to do that one — suppress coefficients below 0.3. I don't care about loadings less than 0.3; don't show them to me in my pattern matrix.

Okay, that's the setup. I know that was a lot of information, but it's being recorded, you have it all in front of you, and you'll have access to it on YouTube when we're done. This is really just to make you aware of the options and what's possible, with a little explanation of why I'm doing each thing, but not a full explanation.
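For those who prefer code, everything clicked through in those dialogs can be expressed outside SPSS. As a rough sketch only — assuming the Python factor_analyzer package and hypothetical file and column names, none of which are part of this workshop — the same decisions look like this:

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer, calculate_kmo, calculate_bartlett_sphericity

# Hypothetical file/column names: keep only the latent-construct measures,
# no ID variable and no single-item demographics like age or gender.
df = pd.read_spss("workshop.sav").loc[:, "Anxiety1":"Useful7"]

# Adequacy checks (the KMO and Bartlett's test from the Descriptives button).
chi2, p = calculate_bartlett_sphericity(df)        # want p < .05
kmo_per_item, kmo_total = calculate_kmo(df)        # want ~0.9; about 0.7 is fine

# Extraction: keep factors with eigenvalue > 1 (the Extraction button).
fa0 = FactorAnalyzer(rotation=None)
fa0.fit(df)
eigenvalues, _ = fa0.get_eigenvalues()
n_factors = int((eigenvalues > 1).sum())

# Promax (oblique) rotation, like the Rotation button. 'principal' stands in
# here for the PCA-style choice; factor_analyzer also offers 'ml' and 'minres'.
fa = FactorAnalyzer(n_factors=n_factors, rotation="promax", method="principal")
fa.fit(df)

# Pattern matrix with loadings below 0.3 suppressed (the Options button).
pattern = pd.DataFrame(fa.loadings_, index=df.columns)
print(pattern.where(pattern.abs() >= 0.3).round(3))
```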
What we have here is the output. The KMO and Bartlett's test, again, is a measure of whether the set of variables we've chosen is adequately correlated, meaning it's appropriate for a factor analysis. A factor analysis has the basic assumption that the items are correlated — in fact, that's how it determines which items belong with which other items when grouping them into factors — and if your items aren't correlated, your KMO will be low, a sign that they aren't correlated strongly enough to factor. Same idea with communalities: the extent to which each item is correlated with the other items. For the KMO you want something above 0.9 ideally, but about 0.7 is fine. For communalities, in the Extraction column you're looking for values above 0.3. There are different published thresholds; I use 0.3. But if you find something less than 0.3, you don't automatically kick it out — it's just one bit of evidence that might contribute to a mountain of evidence for removing an item. You would never remove an item just because its communality was low.

Scrolling down, the Total Variance Explained table shows how many factors were extracted; you can tell by the last filled row. We have nine factors — or components, since it's a PCA — meaning nine had an eigenvalue greater than one. Right after the ninth, the eigenvalues drop below one, so we don't extract a tenth. The variance explained is sort of like the R-squared you're familiar with from regression: how much variance is explained by a nine-factor solution, and obviously more is better. We explain 66 percent of the variance. Is that good? Yes — anything over 60 is good, and anything over 50 means more explanation by the factors than by error or chance. So above 50 is fine; above 60 is better.

I'll skip the component matrix. Reproduced Correlations, as I mentioned, is a signal of whether your solution is good. It reports the share of non-redundant residuals with absolute values greater than 0.05 — in other words, how much error there is, and we want less, approaching zero: less than fifty percent for sure, less than five percent is good. We're at ten percent, so pretty good — not phenomenal, but pretty good.

And here is the pattern matrix. This is where you'll spend most of your time in an EFA. You look at the loadings — these values here — and the factors they load on, or in this case, since it's a principal components analysis, the components. Anxiety loads all together: all the measures for anxiety are highly correlated with each other but not highly correlated with the other items. How do we know? Because on component 3, all the anxiety items load together with loadings greater than 0.7 in this case. Those are strong loadings: anything above 0.7 is considered strong, anything above 0.5 is fine, and loadings down to 0.35 can be acceptable depending on the circumstances — but you want items to load strongly together on their home (primary) component and not load strongly elsewhere.
In this matrix I have suppressed — hidden — any loadings less than 0.3. So even though, for example, Anxiety1 does load on every component, its loadings are below 0.3 everywhere except component 3. Let me rerun this quickly without suppressing the small coefficients, just to show what I mean: going down to the pattern matrix again, this is the same pattern matrix — every item loads on every component — I'm just hiding the low loadings because it makes the matrix easier to read, and we don't care much about the really low ones.

So here we are again. CompUse (computer use), another construct, loads nicely together and nowhere else. Same with playfulness — although notice a loading less than 0.7 there. Is that bad? Should I delete that item? No. Don't delete items just because a loading is below 0.7; down to 0.5 is fine, and even lower can be fine depending on certain criteria. A low loading is not sufficient evidence by itself to delete an item — there should be lots of evidence against an item before you delete it. So I'm not worried about that at all.

We get down to social desirability, and something weird has happened: its items load on three different components. It didn't load together, although it did load in dimensions; we'll look at that in a minute. Let's make sure the other constructs are fine: information acquisition looks good, and decision quality has a lower loading — decision quality 11 at .481. Is that bad? Again, it's fine. It's not great, but it's probably not a justification to delete the item. If we look later and the Cronbach's alpha is suffering because of it, or convergent validity measures such as the AVE can't meet their targets, then maybe we'll consider dropping it. Usefulness looks pretty good.

So back to social desirability, where we have the problem: it should load as a single factor, and it doesn't. There are things we can do about this: we can force it to load as a single factor, or we can explore the wording of those dimensions and see whether there truly are multiple dimensions to this factor. Let's do the lazy, easy thing first and just redo the factor analysis. I'll turn suppression back on, and instead of extracting however many factors emerged — nine — let's extract seven, because that's how many we actually expect. In the Extraction options, instead of extracting based on eigenvalues, I'll extract a fixed number of factors: seven, regardless of eigenvalues. Hit OK. It's still a good solution, still explaining almost 62 percent of the variance. Scrolling down: 16 percent non-redundant residuals — not great, but not terrible. The pattern matrix looks a bit of a mess; it looks like I suppressed at 0.1 instead of 0.3, so let me fix that to make it easier to read. One thing I love about SPSS is that you can rerun things fast, so if you make mistakes or want to try new things, it doesn't take long.
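In the sketch from earlier, that forced seven-factor rerun is a one-parameter change:

```python
# Force a fixed number of factors (seven) instead of the eigenvalue criterion,
# mirroring the change in the Extraction dialog.
fa7 = FactorAnalyzer(n_factors=7, rotation="promax", method="principal")
fa7.fit(df)
pattern7 = pd.DataFrame(fa7.loadings_, index=df.columns)
print(pattern7.where(pattern7.abs() >= 0.3).round(3))
```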
Here we go: we forced social desirability to collapse, and it will not collapse into a single factor. Instead, it collapsed these two — information acquisition and decision quality — into a single factor, which tells me those two are more highly correlated with each other than the dimensions of social desirability are, because the social desirability dimensions won't collapse into each other. So we truly have multiple dimensions here.

What do you do with that? First let me rerun the factor analysis with the regular eigenvalue-based extraction, so that's my latest output. Then the not-lazy thing: go read the wording of those survey questions and figure out whether there's a conceptual theme to each set — in this case, are items one through three strongly related to each other but not conceptually related to items four and five? I'll do this once and then we'll move on. Here are SocDes1 through 3: "I am always willing to admit when I make a mistake," "I always try to practice what I preach," "I never resent being asked to return a favor." These are all in the same theme, for sure. Then four and five: "I have never been irked when people express ideas very different from my own" — first off, that's hard to read, and people might not have understood the wording — and "I have never deliberately said something that hurt someone's feelings," which is kind of a double negative. And those are separated from the rest: "I like to gossip at times" — now we're in the opposite direction. The first few were "no, I'm a good person"; these last ones are "no, I'm a bad person": I like to gossip, I try to get even, I insist on having things my way, I like to smash things. These are fun items, but they don't belong with the first three, so it makes sense that they factor separately. What do I do? I keep them separate.

Okay, that is a factor analysis. We want to talk about a few things with factor analyses. The first is adequacy: are these measures adequately correlated? Yes — we saw that in the KMO measure. Do they have convergent validity, which we observe in the loadings — are they loading strongly? Yes. Do they have discriminant validity — that is, are they loading only on their own factor and not strongly on some other factor? For the most part, yes. The one problem we observed, social desirability, didn't have good discriminant validity — and actually didn't have good convergent validity either, because it didn't all load together. But do the individual dimensions of social desirability show good convergent and discriminant validity? Let's look. SocDes1 through 3: those loadings are fairly good, all above 0.5, so convergent. What about SocDes3 loading in both places? Remember, it actually loads everywhere; this is just a secondary loading that happens to be greater than 0.3. Is that a problem? Not really — it's still quite different from its primary loading. The threshold I like to use is: is it 0.2 different? It is at least 0.2 different, and some publications say even 0.1 different is fine. So I would only count these two items as loading on this factor, and they load nice and high. The last factor is not as strong, and I'd say that's expected, because the items are quite divisive — gossip, getting even, smashing things — people will answer those very differently, so it's no surprise that its convergent validity isn't
really good.

The last thing with a factor analysis is the reliability of each construct's set of measures. In an EFA you test that with Cronbach's alpha; in a CFA it's composite reliability. Let me quickly show a Cronbach's alpha: Analyze > Scale > Reliability Analysis. I'll put all the items from a single set of measures in here and hit OK, and this very tiny table says we have a 0.934 Cronbach's alpha. Is that good? Yes, very good — we want above 0.7. There are reasons you might accept as low as 0.6 if there are only a few items, like three; that's in Joseph Hair's book as well — he discusses some flexibility with small sets of measures.

For social desirability, do we test all the items together? No, we test the dimensions separately: for the first dimension just the first three measures, for the second just those two, and for the third its four measures. Let's test that third one, since it looks a little dubious: SocDes6 through 10. Notice the reliability statistic is less than 0.7 — it's only 0.6. Is that okay? In the case of this construct, yes. Social desirability is a specific bias marker; it's not a theoretically critical variable — at least in this data set, it's just going to be used to test for method bias, specific bias attributed to some method variable. So is a lower Cronbach's alpha okay here? Yes. You don't want it down at 0.2 or anything — that becomes useless — but 0.6 is fine. Okay, that is EFA.

I have a question in the chat. Someone says "zoom, please" — oh, zoom in, yes, done. Someone else asks which extraction method and rotation are most accepted. As shown, there are several extraction methods, and which is most accepted really depends on your type of data and your area. In what I've done with business data, we most often use PCA, principal components analysis; occasionally PAF or maximum likelihood. A lot of behavioral and psychological research uses generalized least squares. But as Joseph Hair says, it's just exploratory, so what I do more often than not is use multiple extraction methods and see whether my factoring solution is robust to differing methods — if it is, it's a pretty strong solution.

Same with rotation. The default in Mplus, I believe, is direct oblimin; in other software it's varimax; and I choose promax in SPSS. Which is right depends on what you're doing. Varimax provides kind of a softer solution, so if you have what we call Heywood cases — a loading above one, and loadings should not be above one — varimax will fix that. If you want a clean solution — again, it's just exploratory: you're using the information to better understand your data, so a perfect solution isn't really the goal of an EFA — a lot of people use varimax; I'd say it's probably the most popular. I prefer promax because it doesn't soften the loadings: it lets them be what they want to be, extreme if they want to be extreme, so you get the truer loading, if I can say it that way. I almost never use direct oblimin unless I'm in Mplus.
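The Cronbach's alpha step from a moment ago, sketched in the same way — the pingouin package is my assumption here, as are the item names:

```python
import pingouin as pg

# Cronbach's alpha for one construct's items. For a multidimensional construct
# like social desirability, test each dimension separately, not all ten items.
items = df[["SocDes6", "SocDes7", "SocDes8", "SocDes9", "SocDes10"]]
alpha, ci = pg.cronbach_alpha(data=items)
print(round(alpha, 3))   # want > 0.7; ~0.6 can be acceptable for a marker variable
```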
Okay, I think that's all the questions — oh, here's one more: could you please reiterate the checks needed if factor loadings are low? Yes. Let me go back to the pattern matrix and pretend for a moment that this line, SocDes3, did not load anywhere above 0.3 — just blank all the way across. What do you do? With a loading less than 0.3 it's hard to do much, but before deleting the item (deleting is the last thing you want to do), I would change the extraction method. Let me switch from principal components to principal axis factoring right now and see if that fixes it. Going back to that loading: the loadings did change — this top one changed a lot — but the solution, the overall pattern, is still roughly the same, and that item now has a nice loading on the first factor. So just explore other options. That one didn't work as well as I'd hoped, so let me check maximum likelihood: run it, jump to the pattern matrix — oh, look at that, that's curious. All of the social desirability items loaded together now, some of them negatively. That's because, as we saw in the items, the first half were positively coded ("I am a good person") and the second half negatively coded ("I am a bad person"), so of course they're inverse to each other. If we were going to use them as a single factor, we'd want to reverse the values: since these are on a five-point scale, we'd subtract the values from six, and then they would all load together.

So that's what you do. And if you can't resolve it — if an item just won't load with anything, or the loading is really low — you might try it in your CFA, probably find it still doesn't work, and then you can omit it: it's not contributing toward the measurement of that construct. You'd have to mention that in your write-up: "We omitted this item because it did not load above a certain threshold and was bringing down the convergent validity of the construct." You just have to justify it.
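Those low-loading checks, in the same assumed setup — try the other extraction methods, and reverse-score the negatively worded five-point items before collapsing the dimensions:

```python
# Check a stubborn item under other extraction methods before deleting it.
for method in ("principal", "ml", "minres"):      # factor_analyzer's options
    alt = FactorAnalyzer(n_factors=n_factors, rotation="promax", method=method)
    alt.fit(df)
    # ...inspect pd.DataFrame(alt.loadings_, index=df.columns) for that item.

# Reverse-score negatively worded items on a 5-point scale (6 - x flips 1..5)
# before trying to collapse the dimensions into a single factor.
for col in ("SocDes6", "SocDes7", "SocDes8", "SocDes9", "SocDes10"):
    df[col + "_rev"] = 6 - df[col]
```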
The last question was: please suggest a good book for EFA. Joseph Hair's book is really good — let me post the reference in the chat window. He has a more recent edition; the eighth, I think from 2017. It's a great book. Most of what I learned about SEM from a book, I learned from this one. I've learned a lot from other instructors, from articles, workshops, exploring and figuring things out, and from videos on YouTube, but of any book, this one has probably contributed the most to my knowledge of SEM. I have it right here next to my desk and reference it regularly — you can't see it; this room is my daughter's bedroom, which is my office during the pandemic — but it's the only book on my desk, and even at my university office it's the only book there. I get no kickback, unfortunately, but there's my glowing endorsement of Joseph Hair's book.

Last question here: from a consumer behavior and marketing perspective, would PCA or the generalized least squares one be better? Again — do both. PCA is probably more common, but do both; it's just exploratory. My philosophy is that more transparency and more information about your data is better, because you can make better decisions and have more confidence in whatever you observe.

We're going to move forward with Amos. We've spent a lot of time in SPSS on a thing you might never have to do, but you need to understand it: it has the foundational principles for the confirmatory factor analysis, which you should always do if you have latent factors. So let me show you Amos. Amos is structural equation modeling software that lets you model measurement models and structural models, simultaneously or separately, as well as path models, and it's drag and drop — actually click and click; there's no dragging and dropping. It was built, I think, in the '90s, and the user interface has almost never been updated, so it's a pain in the rear to figure out the first time. But once you figure it out, or watch some of my videos, you'll be able to use it. You'll think, "That's kind of annoying and not intuitive," but it is literally the best software out there for doing covariance-based structural equation modeling by drag and drop. There are other packages that do more rigorous, more precise, more comprehensive SEM — Mplus, Stata, and several others — but this one is "user friendly," in air quotes.

The way you draw a confirmatory factor analysis: you click on this candelabra-looking tool, draw an ellipse that represents a latent factor at whatever size you'd like, then click it a few times to add observed measures. For example, in our solution over in SPSS, anxiety has seven measures, so I'd click seven times to get enough measures for anxiety, then double-click it and name it "anxiety." I'm not going to do this for everything, but you'll see what I'm doing. Then I'd go to the data set — which I haven't linked yet; you need to link your data set. Go to File > Data Files, find your data set (mine's in my Downloads folder — you can see my directory structure; nothing to hide there), and hit OK. Now I have a set of variables in this white stacky icon — I think it's supposed to represent a database — and I would have to drag the items
out one at a time. Each item defaults to showing a massive label, which you fix under View > Interface Properties > Miscellaneous > don't display variable labels. (I know you won't remember all this, but that's why it's being recorded.) You'd have to do this for each item, and it's kind of a pain in the rear — which is a good description of Amos in general, but it's better than what we had. And you'd have to repeat all of this for each of the factors in our solution, which takes forever. So I created a plugin that does it for you. Let me erase everything (Erase All) and show you how it works: go to Plugins > Pattern Matrix Builder, and all you have to do is paste in your pattern matrix — way easier. I copy the pattern matrix from SPSS, paste it in, hit Create Diagram, and it makes what we would otherwise have built manually — which probably would have included some human error, mistakes we'd have to figure out later. Then the only thing left is to name the factors. I'll actually do this, because we'll need the names: usefulness; decision quality; anxiety — I already have a variable named anxiety in this data set, so I'll call the factor Anxiety_F, F for factor; playfulness; information acquisition, which I'll call IA; computer use, CU; and social desirability, SD — but since there are three social desirability dimensions, SDa, SDb, and SDc. I know this is small; if I scroll, I zoom in.

Each latent factor is represented by an ellipse. This is common symbology in SEM: latent factors are ellipses, observed measures are rectangles. Error is attributed to all predicted variables — being predicted is indicated by an arrow going into you, making you endogenous. Since all of these observed measures have arrows coming into them, they are being predicted by the latent factor (which is latent — hidden, not actually observed), and because they're being predicted, they each need a unique error term, so we've named each of those uniquely.

A couple more things. Each latent factor is correlated with every other latent factor; that is an assumption of covariance-based SEM, at least for the measurement model. For the structural model, all exogenous variables — variables that are not predicted — are assumed to be correlated in covariance-based SEM, so you covary all exogenous variables. The last assumption is that one parameter must be constrained per latent factor. This provides an anchor for the algorithms to minimize properly; there's some math behind it, but just trust me: there must be one constrained parameter. Notice the parameter on this line is constrained to one. I can delete that and move it — it could be on another line instead, or even out on the latent factor as a variance constraint. There just needs to be at least one, usually one constrained loading per latent factor in a covariance-based SEM application. The reason is quite mathematical and I'm not going to cover it, but if your model doesn't run, it might be because of that.
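For comparison only — this is my assumption, not the workshop's tool — the same kind of measurement model written as code, using the Python semopy package and the hypothetical item names from earlier. The lavaan-style syntax fixes the first loading of each factor to 1, which is exactly the one constrained parameter per factor just described:

```python
import semopy

# Measurement model (factor list abbreviated, item names hypothetical).
# Latent factors are covaried by default, as in an Amos CFA.
cfa_desc = """
Anxiety_F =~ Anxiety1 + Anxiety2 + Anxiety3 + Anxiety4
Useful    =~ Useful1 + Useful2 + Useful3
IA        =~ InfoAcq1 + InfoAcq2 + InfoAcq3
DQ        =~ DecQual1 + DecQual2 + DecQual3
"""
cfa = semopy.Model(cfa_desc)
cfa.fit(df)
print(cfa.inspect(std_est=True))   # standardized loadings: > 0.7 ideal, > 0.5 fine
```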
Okay, let's run this and see what happens. First I'll save — Downloads, "workshop cfa" — and before running I'll select a few options. (I also need to show you where I got those plugins in a moment.) There's this abacus icon — believe me, it is an abacus; it took me forever to figure out what it was, I thought it was a piano — an abacus with a little color palette on it, which opens Analysis Properties. So intuitive. In the Output tab you select everything you want in the output. We want standardized estimates for sure, and I'll check one other thing: modification indices. Some of you may have heard of modification indices in relation to model fit, or goodness of fit; they tell you how to improve your fit. They have a visibility threshold, which I'm changing to 20. You may ask what the right threshold is — it's relative, based on the chi-square, which is based on your degrees of freedom and sample size. For a basic model with a small sample, four would probably be good; I have a fairly complex model with a larger sample, so I chose a larger threshold, because the modification indices inflate as the chi-square inflates.

Close that, save one more time, and run with the non-colored abacus. Oh, I'm so glad we ran into this — this is great. Check it out: Amos says there's no valid license. I'm amazed they haven't fixed this. I do have a valid license — this is not some pirated copy; I have paid for Amos — and this just happens, and I don't know why the programmers haven't fixed it. When it happens: hit OK, make sure your model is saved (I'll save again just in case), and close Amos completely — every instance. If you aren't sure whether you have multiple instances open, hover over the Amos icon in the taskbar to see them all; you can right-click and close all. I only had the one open. Opening Amos again, it comes back to the most recent model, and I run it again. Look, it ran this time. I do have a license. It's so weird.

Okay, it ran, and a bunch of numbers popped up. On the left there's a choice of unstandardized or standardized estimates; I'll select standardized. Standardized is easier to read in a measurement model because all the loadings are more or less on the same spectrum, from negative one to positive one, whereas unstandardized estimates can be any value from negative infinity to positive infinity and are harder to relate to each other. We can eyeball this: zooming in, here's playfulness, and these are the loadings — like the loadings we saw in the pattern matrix. They will be different — not strongly, absurdly different, but different — because a CFA is guided: we have
specified which factor each measure belongs to, so the analysis no longer has to deal with that extra error — well, there's different error involved in constraining the groupings rather than letting them be free — so the loadings will differ from the EFA, but only slightly, hopefully. As you hover over them they turn red and you can see the values. Again, we want above 0.7 ideally; above 0.5 is fine, and even down to 0.35 is probably okay unless it completely undermines the validity of that construct. Looking up and down, we're pretty good — I don't see any really low numbers. Oh, there's a .4 and a .3; those aren't fabulous, but again, that's social desirability, which is just a marker variable. Am I terribly concerned about the validity of that construct? I can be a little less strict with a marker variable. I don't want it to be completely invalid, but as long as we're above 0.3 there, I'm not too worried.

But that's just eyeballing it. Let's actually do some work and check the validity properly. I'll use another plugin for this, called Validity and Reliability (I'll show you where to get these plugins shortly). Clicking it checks my model and produces an HTML file containing a correlation matrix — all correlations, with significance indicated by asterisks — plus the composite reliability (CR), which is like Cronbach's alpha but for a CFA (it's a different measure, and considered more precise); the AVE, average variance extracted; the maximum shared variance (MSV); and MaxR(H) — we can ignore those last two for now. CR is supposed to be above 0.7, just like a Cronbach's alpha; AVE should be above 0.5. CR is a measure of reliability; AVE is a measure of convergent validity, although CR speaks to that too. The square root of the AVE is on the diagonal, bolded.

Let's talk about convergent and discriminant validity with this matrix. For convergent validity we want the AVE above 0.5, and we're doing pretty well. This one is in red, but for another reason — its MSV is greater than its AVE. The social desirability dimensions are below 0.5, which is a little concerning, but again, they're marker variables; I'm really concerned only with the constructs up top, the theoretical constructs that are part of our theory. So on convergent validity we're pretty good.

Discriminant validity means comparing the square root of the AVE, on the diagonal, to every correlation with another factor. We don't want the square root of the AVE to be less than any such correlation, because that would imply that the variance explained by a construct's own items is better explained by some other construct — which would say these two sets of measures are co-mingled, not distinct, not discriminant. Looking here: below this diagonal value we have nothing above 0.837, and below and to the left of this one, nothing above 0.749. We can do this for each one, and we're looking pretty good with all of our critical constructs.
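The CR and AVE in that table come straight from the standardized loadings; the formulas are standard, sketched here with hypothetical numbers:

```python
import numpy as np

def cr_and_ave(std_loadings):
    """Composite reliability and AVE from one factor's standardized loadings."""
    lam = np.asarray(std_loadings)
    error = (1 - lam**2).sum()                    # summed item error variances
    cr = lam.sum()**2 / (lam.sum()**2 + error)    # want CR  > 0.7
    ave = (lam**2).mean()                         # want AVE > 0.5
    return cr, ave

print(cr_and_ave([0.81, 0.78, 0.74, 0.70]))       # hypothetical loadings
```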
Getting down to social desirability, it looks like we're still good — no real discriminant validity issues except the one identified here: IA, information acquisition, whose square root of the AVE, 0.726, is less than a correlation of 0.727. So there is one, and it's really, really close. We could probably fix it by doing an EFA with just those two factors. You'd run an EFA like before in SPSS, but with only the items from those two factors — information acquisition and decision quality; remember, when we forced seven factors, those two collapsed onto a single factor — see where they overlap, remove the overlapping items, and run the CFA again. Although at this point it's so close you have to ask: do I really want to delete an item? You could justify it either way; that's up to you — and then see if your reviewers buy it.

There is also another measure of discriminant validity, the HTMT — the heterotrait-monotrait ratio. You want all of these ratios to be less than 0.85, or 0.9 if you're being a little loose about it, and it looks like we are, which indicates that we do have discriminant validity. There are references and notes for all of this on the StatWiki. And that's where you get these plugins: I have a video showing how to install them, but on the StatWiki's left navigation bar there's a Plugins page with explanations, a video on installing them, and examples of using each one. There are lots of plugins and they're all free — just use them. If they don't work on your version of Amos, I apologize; I've only tested them on my specific versions, and I usually note any compatibility issues.

There was a question: can we accept a factor with just three items? Good question — yes. You'll see in my model that I have a factor with three items and a factor with two items. Are those okay? Yes. A two-item latent factor is fine — not great, not ideal, but fine, as long as it meets the validity criteria. You can actually apply slightly looser validity criteria to two- and three-item factors, because they will have more error — there are fewer data points, just as a smaller sample means more error. Four is good; in fact, in Joseph Hair's book he says if you had to pick an ideal number of items, it's four: enough to identify the construct and measure it validly, but not so many that multiple dimensions are represented within the measures. So four is considered ideal by Hair. There are definitely arguments against that in other literature, but I subscribe to it to some extent — I'm a Hair disciple, if that's a thing.
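The HTMT mentioned a moment ago can also be computed directly from item correlations; here is a sketch of the standard ratio, with hypothetical item names again:

```python
import numpy as np

def htmt(df, items_a, items_b):
    """Heterotrait-monotrait ratio: want < 0.85 (< 0.90 if being loose)."""
    R = df[list(items_a) + list(items_b)].corr().abs()
    between = R.loc[items_a, items_b].to_numpy().mean()   # heterotrait block
    def mono(items):                                      # mean within-construct r
        r = R.loc[items, items].to_numpy()
        n = len(items)
        return (r.sum() - n) / (n * (n - 1))
    return between / np.sqrt(mono(items_a) * mono(items_b))

print(htmt(df, ["InfoAcq1", "InfoAcq2", "InfoAcq3"],
               ["DecQual1", "DecQual2", "DecQual3"]))
```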
Now, about measures with lots of items: decision quality has eight, which is a lot, and the problem with many items on a single reflective latent factor is that you're more likely to have multiple dimensions hiding within them, so modeling them as a single factor may be imprecise and lose information. I don't know that I'd ever want to go above eight measures per latent factor, and if I did, I'd want to test those items alone in a totally exploratory EFA, as a single factor, to see whether they break up into multiple dimensions.

Okay, so that's a CFA — oh, one more thing. One of the critical things you test in a CFA is model fit. Model fit is a comparison of the observed covariance matrix — the extent to which all variables are related to all variables — with the proposed covariance matrix implied by this model. We're suggesting that the measures are correlated most strongly in these specific sets, which produces a different covariance matrix, and when you subtract the proposed covariance matrix from the observed one — it's essentially a matrix subtraction problem — you get the chi-square. The chi-square is a measure of the error associated with your proposed model relative to the observed data, which is the extent to which your model fits the data: model fit. We need to test how well the model fits, because if there is a lot of error — i.e., a high chi-square — you have not modeled the relationships in this data set correctly: the sets of measures don't belong together the way they've been modeled, as indicated by a poor-fitting model.

You test that in the output window, under Model Fit, where there are a bunch of fit measures. CMIN is the chi-square — just another name for it — and DF is the degrees of freedom; their ratio, CMIN/DF, should ideally be between one and three, but that's an outdated measure that few people still use. More people use measures like the CFI, the comparative fit index, which should be above 0.9 and ideally above 0.95 — it's supposed to approach one; one is the target. Same with the RMSEA, except its target is zero: values less than 0.05 are great, less than 0.06 acceptable, and less than 0.1 also acceptable, depending on who you read. Measures like PCLOSE relate to the RMSEA and give you confidence in it; you actually want the PCLOSE to be above 0.05, and in this case it is, so this is a good-fitting model.

The last thing — I often get questions about the chi-square p-value. It's shown here, and also in Notes for the Model: here's your chi-square, your degrees of freedom, and your p-value. There's confusion around this p-value. The rule is that you want the chi-square test to be not significant: a significant chi-square test means that your proposed covariance matrix does not match the observed covariance matrix.
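Formally — a standard result, stated here for reference rather than derived in the talk — under maximum likelihood estimation the model chi-square is

$$\chi^2 = (N-1)\,F_{ML}, \qquad F_{ML} = \ln\lvert\hat{\Sigma}\rvert - \ln\lvert S\rvert + \operatorname{tr}\!\big(S\,\hat{\Sigma}^{-1}\big) - p,$$

where $S$ is the observed covariance matrix, $\hat{\Sigma}$ the model-implied one, $p$ the number of observed variables, and $N$ the sample size; it is referred to a chi-square distribution with $p(p+1)/2$ minus the number of free parameters as degrees of freedom. This is why the statistic inflates with sample size and model complexity, as noted next.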
You want that test to be not significant — you want the two matrices to be the same, more or less; you want the difference between them to be statistically indistinguishable from zero. It's statistics; there are a lot of double negatives. In this case you want the p-value above 0.05. But it's a very strict measure of model fit and considered outdated, because it depends on the chi-square, which inflates with sample size and model complexity. Very few people use it anymore. You can report it, but I would not throw out your model just because the p-value here was "bad" — meaning below 0.05; again, lots of double negatives: you want it above 0.05 to indicate good fit. I never use it; I use the CFI, the RMSEA, and one more, the SRMR.

Let me show you the SRMR. Amos doesn't actually compute it in the main output — they wrote a plugin for it and never bothered to integrate it — so you pick the Standardized RMR plugin, and then don't hit Close; hit run, and it pops up in the plugin window. Ours says the SRMR is .0480; we want something less than 0.08, so that's good. So we have a lot of indication of a good-fitting model. Not every fit metric was above or below its ideal threshold, but you don't need optimal fit; you need adequate fit. One way to summarize all of this is Plugins > Model Fit Measures — another of the plugins on the StatWiki — which lists the estimates we observed, the ideal thresholds, some interpretation of whether each is a good fit, and the citations for those thresholds. So that's model fit.
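In the earlier semopy sketch — again my assumption, not what the workshop uses — the analogous fit summary is one call:

```python
# Fit measures for the fitted measurement model: chi-square with its p-value,
# CFI, RMSEA, and others, in one row.
print(semopy.calc_stats(cfa).T)
# Rough targets from the talk: CFI > 0.90 (ideally 0.95), RMSEA < 0.06, SRMR < 0.08.
```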
We're running out of time, so let's move over to structural models real quick. As I mentioned, you can have a latent structural model, which gets really complicated really fast, or a path model with imputed factor scores; I'll show you both. A latent model is considered more rigorous — more valid — because it accounts for measurement error better than a path model, which uses just factor scores. So it's better to run a latent model when you can, but it's highly complex, and if you have a small sample, or you're running interactions, it's usually infeasible; then you'd do a path model instead. But for your sake, here's the latent one.

I'm going to make decision quality the dependent variable — using the symmetry option, and the moving truck (or fire truck) to move things around, and rotating the indicators out of the way. I delete the covariance arrows on decision quality because we're about to predict that latent factor — you can't both covary and predict — and draw arrows from usefulness to decision quality; these are single-headed regression arrows. Let's pretend we have a mediator: I move information acquisition to the middle and treat it as a mediator, delete its covariance arrows because we want to predict it, draw arrows from these variables to it, and from information acquisition to decision quality. It's now a mediator. Throw the rest in as well — you don't want a floating variable, one that isn't predicting anything or being predicted by anything; that would be weird and useless. Let me organize this a little (I really should have saved before making all these changes — oh well). So this is my latent causal model. It's a bit of a mess, very complex, and it's incomplete: anything predicted needs an error term, so both endogenous factors need one, and we need to name those errors. I know I'm going way too fast, but it's all being recorded.

With the errors named, let's run it. It ran, and standardized estimates are displayed. Looking more closely, information acquisition has a strong positive effect on decision quality: the regression weight is 0.63, probably statistically significant — you can't tell by looking at the model; Amos unfortunately doesn't display significance on the diagram, but you can see it in the output under Estimates, with p-values for all the regressions. The one we were looking at, information acquisition to decision quality, has a p-value of three stars, meaning less than 0.001. We can change the displayed decimals to whatever we want — it's still less than 0.0000000001, a very small p-value. (Let me change that back; all those decimals are kind of obnoxious.)

That's a latent causal model. Very few people do this in Amos because, again, it's very complex; they prefer a path model. You create a path model by imputing factor scores. I already have factor scores computed, but let me show you how: going back to the CFA model (without saving these changes), go to Analyze > Data Imputation. That computes a factor score — a regression-weighted average that is standardized — so that each factor, like usefulness, instead of being measured by seven observed items, is measured by one factor score that accounts for all those weights. I change the file name and hit Impute, and it says a new data set was created. I open a new canvas, load the new data — it's in the same directory as the old data, Downloads — and it has new variables at the very bottom, one per factor, starting with SDc.

So I can build a path model from just these: decision quality out to the right, information acquisition in the middle, and pretty much everything else on the left. I'm not bringing in social desirability for now — it would just complicate things — so we'll create a basic path model. To make it pretty I can resize the observed variables so they're all the same, draw my regression arrows just like before — with a lot less mess — draw and name my error terms, covary my exogenous variables, and make it look good. Well, it doesn't look very good, but it's pretty close.
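The factor-score route, sketched under the same assumptions: factor_analyzer's transform returns regression-weighted, standardized scores, and a path model can then be fit on those single columns. The column-to-factor mapping below is entirely hypothetical:

```python
# Regression-method factor scores: one standardized column per extracted factor.
scores = pd.DataFrame(fa.transform(df),
                      columns=["Useful", "Playful", "Anxiety", "CU",
                               "SDa", "SDb", "SDc", "IA", "DQ"][:fa.n_factors])

# Path model on the scores: Useful -> IA -> DQ, with IA as the mediator.
path_desc = """
IA ~ Useful + Playful + Anxiety
DQ ~ IA + Useful
"""
path = semopy.Model(path_desc)
path.fit(scores)
print(path.inspect(std_est=True))   # regression weights and p-values
```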
Now I can run this model. I'm going to add one more parameter to my output: squared multiple correlations, that's the R-squared. We could also do mediation right now, but let's just test this real quick. Save this as "workshop path" and run it. I know I'm going fast; it's because we're running out of time. After I run this, you can see here are the regression weights. We can see that usefulness has a strong effect on information acquisition, information acquisition has a strong effect on decision quality, and the R-squared for decision quality is pretty high, 0.62, which you can see up here in the top right.

Notice that this regression weight, 0.79, is different from what we observed in the latent causal model, the latent structural model, and there are a couple of reasons for that. The first is that we aren't including all of the same constructs here; we actually dropped all three social desirability constructs, and they may have been explaining some variance in decision quality that information acquisition is now explaining, so it has absorbed some of that and been inflated a bit. The other reason is that we are not accounting for all the measurement error, because this is just a path model. So these values will change slightly; they shouldn't change drastically, but they will change slightly. So which one's more valid? The latent model, because it accounts for all that error. Which one's easier to use? The path model. Which one is most commonly used? In AMOS, path modeling; in Mplus and other syntax-based software, usually the latent model, because the complexity is hidden in the code and you don't have to worry about it visually as much.

All right, we're really running out of time; we have about 13 minutes. There are ways to do mediation and moderation, and I'm going to cover those very fast and incompletely. To do moderation, you would double-click on Group Number 1 over here and add a new group, Group Number 2, and to each of these groups you would add its own data set right here. Then you'd run the model with both data sets, and you could toggle back and forth between, let's say, male and female: toggle the male data set and the female data set. It can actually be the same data set, as long as you provide a grouping variable right here and a grouping value, to say that in my gender column 1 is male and 2 is female, or something like that; it would then run the analysis for the ones in one group and the twos in the other group, and you could compare them. There are more sophisticated methods you could use there. I don't have time for that, but I have videos for it, and the course covers it if you want to try the course.

For mediation, I have lots of plugins, but the basics without the plugins are: go to Analysis Properties, check Indirect, Direct and Total Effects, and run a bootstrap. Mediation requires a bootstrap because you're producing confidence intervals. So here's the bootstrap; run it a lot of times, usually upwards of 500, and 2,000 is a pretty good number. You can do some bias correction, because there is some inflation going on, and you can also change the confidence level from 90 to 95 if you want; I prefer 90 because it's mediation, and mediation's already got some issues in it. Oops, I need to get rid of this group; delete. If we run this, we can see in the output, under Estimates > Matrices, let me zoom in here, that on the left we have total, direct, and indirect effects.
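[Editor's note: since AMOS hides the bootstrap behind a checkbox, it may help to see the bare logic spelled out. Below is a minimal sketch of a percentile bootstrap for the indirect effect in a simple X to M to Y chain; the column names usefulness, info_acq, and dec_quality are hypothetical, and this uses a plain percentile interval rather than the bias-corrected one AMOS offers.]

```python
# A minimal sketch of bootstrapped mediation, mirroring the idea behind
# AMOS's indirect/direct/total effects output. Column names are hypothetical.
import numpy as np
import pandas as pd

def indirect_effect(df):
    # a-path: regress the mediator on the predictor (with an intercept)
    X = np.column_stack([np.ones(len(df)), df["usefulness"]])
    a = np.linalg.lstsq(X, df["info_acq"], rcond=None)[0][1]
    # b-path: regress the outcome on the mediator, controlling for the predictor
    XM = np.column_stack([np.ones(len(df)), df["info_acq"], df["usefulness"]])
    b = np.linalg.lstsq(XM, df["dec_quality"], rcond=None)[0][1]
    return a * b  # the indirect (mediated) effect

df = pd.read_csv("workshop_data.csv")  # hypothetical stand-in for the SPSS file
rng = np.random.default_rng(0)

# 2,000 resamples, the "pretty good number" suggested in the workshop
boots = np.array([
    indirect_effect(df.sample(frac=1, replace=True, random_state=rng))
    for _ in range(2000)
])

# 90% percentile interval; the indirect effect is significant at that
# level if the interval excludes zero
lo, hi = np.percentile(boots, [5, 95])
print(f"indirect = {indirect_effect(df):.3f}, 90% CI [{lo:.3f}, {hi:.3f}]")
```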
Now I'll go to the standardized indirect effects, and we can see over on the right, oops, sorry, let me pull this over here, there we go, that the indirect effect from usefulness to decision quality has this standardized regression weight. If we want to see the p-value for it, we can come down here and look at the bootstrap confidence intervals, bias-corrected percentiles, two-tailed significance, and the p-value for that indirect effect is 0.001. So we have a significant indirect, i.e. mediated, effect between usefulness and decision quality.

I know that was a whirlwind. Again, it's all recorded, there are videos for all of it, it's all on the StatWiki, and it's in the online course. This was just a really quick exposure to the kinds of analyses you can do with SPSS and AMOS, and there's so much more; usually I cover this over multiple semesters of courses, so I skipped a lot. I didn't cover everything: I didn't talk about method bias, invariance testing, interactions, hierarchical models, or higher-order models. There's a lot we didn't cover. In the few minutes we have left, are there any questions about this or anything related to it?

If anyone has anything to ask, he or she can ask directly. I see Molina, you raised your hand, go for it.

It was a nice session, and very quick; we'll refer to the video later. Thank you very much for your time.

Oh, you're welcome.

Professor, I wanted to ask: can we perform multi-level modeling in SPSS?

By multi-level modeling I assume you mean something like organization, department, employee. The answer is yes, I think it's possible, but I've never done it, I have no videos on it, and I'm definitely not an expert in that area. I'd look for other YouTube creators to help with that.

Okay, thank you very much.

You're welcome. Somebody else asked, let's see: Sono asked, for common method bias, how do we select the variables? Oh yeah, there are multiple ways to handle method bias; I can just show you on the StatWiki real quick, I think I have some pictures under CFA and method bias. There are different approaches. There's the common latent factor, where you relate some unmeasured latent factor to all existing measures. There's another one where you have some specific marker, like social desirability, and you include that as a latent factor, and then again an unmeasured latent factor over here connecting all of them together. That's another way, and it's considered more rigorous. It doesn't have to be social desirability or some specific bias variable; it could just be some theoretically unrelated variable, some variable that shouldn't be related to the other factors, where the extent to which it is related can then be extracted out by that common factor. It's not perfect, because if there is some trait variance that is shared, that is also extracted, but it's better than using a common factor alone. What the common factor alone does is not extract method variance; it extracts common variance, which might be trait variance, which is what you're trying to use to explain your theory. So using just a common factor is actually not a great solution; having the marker variable in there helps, because it minimizes the amount of variance being extracted. The most current approaches I explain here in detail. The best approach is if you have a specific bias marker. For example, if you're collecting data on good and bad behavior, social desirability is a really good marker variable, because there is a socially desirable way to answer those questions.
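[Editor's note: as a rough structural outline of the marker-plus-common-latent-factor setup just described, here is a sketch in the same semopy syntax. The item names are hypothetical; the equal-loading constraint AMOS users usually place on the common latent factor is omitted; and the lavaan-style 0* fixing of covariances is assumed to be honored by semopy, so treat this as an illustration, not a ready-to-publish bias test.]

```python
# A rough outline of the marker-variable + common-latent-factor CFA described
# above, via semopy. Item names are hypothetical; the equal-loading constraint
# on the CLF, typical in the AMOS recipe, is omitted here.
import pandas as pd
import semopy

desc = """
Usefulness =~ useful1 + useful2 + useful3
InfoAcq    =~ ia1 + ia2 + ia3
SocDes     =~ sd1 + sd2 + sd3
CLF        =~ useful1 + useful2 + useful3 + ia1 + ia2 + ia3 + sd1 + sd2 + sd3
CLF ~~ 0*Usefulness
CLF ~~ 0*InfoAcq
CLF ~~ 0*SocDes
"""

df = pd.read_csv("workshop_data.csv")  # hypothetical stand-in for the SPSS file
model = semopy.Model(desc)
model.fit(df)

# Comparing these standardized loadings against a CFA fit without the CLF
# shows how much variance the common factor is pulling out of each item
print(model.inspect(std_est=True))
```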
With questions about bribery and theft and things like that, people are all going to say, no, I don't do that, no, I don't do that. There's a socially desirable way to answer those questions, so in that case social desirability is a good marker variable. In other cases, such as company allegiance or loyalty, when you're asking company questions like how innovative is your company, how successful is your company, how much do you like your company, all these things will be inflated for very loyal employees; and for disloyal employees, or employees who feel disenfranchised, feel unloved by their organization, or feel like leaving, they will all be deflated. They won't match the quote-unquote true measure of the construct, and so you might pick a specific bias variable like loyalty to extract that.

Thank you, Professor, for the answer. Just one more thing I wanted to ask: do they actually ask for justification of why we chose, for example, social desirability as the method bias factor, or is no justification needed?

Justification has to be given. If it's not evident, it should definitely be explained. Good reviewers will probably identify that, but not all reviewers are good reviewers, so they might not; in all cases, though, it's good to justify your analytical decisions. Whenever I read through a paper as an editor or reviewer, I check the analytical decisions the authors made that could have changed their results, and if there are decisions that aren't justified, even if I think I know why they did what they did, I ask them to please give me at least a parenthetical justification for why they made that decision and what the consequence would be.

Thank you. But can we use a new kind of variable, not necessarily one available in established scales, for common method bias as well? Are there any validated ones, or can we use any?

Yeah, the most common method bias marker variable is definitely social desirability bias. That is actually the only marker variable I'm aware of that has been validated to somewhat effectively extract method variance. Most others that have been tried share too much trait variance; like I mentioned, loyalty shares too much trait variance with other constructs, so it's hard to use it without breaking your model (a quick way to screen a candidate marker for this problem is sketched after this transcript). Method bias is a tricky thing. Accounting for it, trying to mitigate it methodologically in your model, often destroys the validity of your factors and so hurts your model. Instead, it may be better just to show that there is not significant bias and then move on without trying to mitigate any nominal bias. But that's a debated topic, and I'm definitely just one voice in that debate.

Thank you so much.

Well, I've got to run here; I have to run off to another meeting, so thank you for your participation.

Just for a quick memory, could we have a snapshot, if you don't mind? If all of you can switch on your videos...

Oh, you want everybody to turn on their video? Oh, sure. I'm going to stop recording, though.
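[Editor's note: as promised above, here is a quick screen, not part of the workshop, for whether a candidate marker variable shares too much trait variance with the substantive constructs. All column names and item groupings are hypothetical.]

```python
# A quick screen for a candidate marker variable: a good marker should
# correlate only weakly with the substantive scales. Strong correlations
# (as with "loyalty" in the discussion above) suggest shared trait variance
# that would break the model if extracted. Names below are hypothetical.
import pandas as pd

df = pd.read_csv("workshop_data.csv")  # hypothetical stand-in for the SPSS file

scales = {
    "usefulness":  ["useful1", "useful2", "useful3"],
    "info_acq":    ["ia1", "ia2", "ia3"],
    "dec_quality": ["dq1", "dq2", "dq3"],
    "marker_sd":   ["sd1", "sd2", "sd3"],  # candidate marker: social desirability
}
# simple scale scores as item means, then the marker's correlation profile
scores = pd.DataFrame({name: df[items].mean(axis=1) for name, items in scales.items()})
print(scores.corr().round(2)["marker_sd"])
```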
Info
Channel: James Gaskin
Views: 2,784
Rating: 5 out of 5
Id: YxF7vzbFlUQ
Length: 88min 28sec (5308 seconds)
Published: Thu Jul 15 2021