SPSS - Mediation with PROCESS and Covariates (Model 4)

Captions
Hey everybody, I am back doing some SPSS videos for you. Today we're going to cover the new version of PROCESS, PROCESS 3, which has some new additions to it. We're going to try a bunch of different permutations of the options in PROCESS, and also present some options for doing this in R if you would like to learn how to do it in R. I'm going to start with simple mediation, which is model 4 in PROCESS and one of the more popular options PROCESS has. It's simple mediation because it includes only one mediator: we have some X variable predicting Y, and we also want to see whether X predicts M, which in turn predicts Y. So we're adding a third variable, which is what mediation is, and we're looking at whether the path from X to Y is redirected through M, from X to M to Y. So it's kind of like predicting an order. I've done simple mediation on my channel before, so this time we're going to throw in two covariates to mix it up and show you how that works. In future videos we're going to talk about categorical predictors, because that's another popular request. So this is simple mediation with covariates in PROCESS version 3.

I pulled this data set from R: it's the mtcars data set. This example may not show real mediation, but it will show you how to do it. The number of cylinders in a car is going to predict miles per gallon, but we also know that horsepower is involved, so we'll predict miles per gallon with cylinders as X and also include an indirect path through horsepower, because cylinders can lead to horsepower, which can lead to miles per gallon. We also know that the weight and the gears of the car are influences, so we're going to include those as covariates. That leaves us with four IVs. If we were trying to do a power analysis for this, power is tied to the number of independent variables, and power for mediation can be
really tricky, because you really want the indirect effect to be different from zero, but you probably still need your R², the overall prediction for the model, to be significant in whatever format that means to you. So we're going to use an easy hack for power that anybody can use. To do that I'm going to use G*Power, which is one of my favorite power programs for non-R people. We'll pick F tests and Linear multiple regression: Fixed model, R² deviation from zero. That lets us use R² deviation from zero: we expect something to predict Y, whether it's X predicting Y originally, M being the one that actually predicts, or X and M together with our CVs (covariates) predicting Y. So this one works pretty well when I just want a quick hack estimate of power. Click Determine and choose "From correlation coefficient," because it's the easiest. The squared multiple correlation, ρ², is our estimated R². Let's say we want a medium effect, which is .06 for R²; there are a couple of different rules you can use here, but I've seen .06 the most. Click Calculate and transfer to main window, which converts it into Cohen's f². Alpha is .05, or pick your favorite; for power, most people use .80, at least in psych. We have four predictors (our IV, our M, and our two CVs), because we're doing this on the final model, which is the one we're most interested in. If you're on a Mac, the Calculate button is hard to find, so I just hit Enter; on a Windows machine you can find it pretty easily. It says we need two hundred people. I don't think we have quite that many cars in this data set, but that would be the required number. I'm going to copy and paste that into this how-to guide for you to look at later. Now we're going to pop
over to SPSS and work our way through some data screening. If you've watched my channel before, you know I'm really into data screening, because it's important to look at your data and understand what's happening in the background. These data screening procedures are from Tabachnick and Fidell. I'm not going to explain a lot of the steps, because I have about six videos on how to do this in SPSS, so if you want to learn more about why I'm doing these things, check out one of those videos. I'm trying to keep this one shorter by walking you through the steps, so you have a guide from start to finish without quite so much detail.

Let's start with data accuracy. Here's my data set in SPSS, and there's a lot more data than I'm going to use, so we'll just check whether the data points are accurate for the columns I'm interested in. Go to Analyze > Descriptive Statistics > Descriptives. We could also use Frequencies if we wanted histograms of each variable, but let's stick with Descriptives here: miles per gallon, cylinders, horsepower, weight, and gear. Under Options it gives you the main statistics you're interested in, but you really want the range too, so you get the min and the max (that might be one reason to use Frequencies instead of Descriptives). Miles per gallon ranges from 10, which seems kind of low, to 34, so that looks about right. Cylinders runs from 4 to 8. I only have 32 cars, so I'm way under my required sample size here; hopefully we have a bigger effect size than we think. Horsepower runs from 52 to 335; I have no idea if that's accurate, but I'm assuming it is. Weight runs from about one to five (in thousands of pounds), and gears from three to five, so we don't have any fancy six-gear cars in here. So this looks fairly accurate. I will be looking
for values that are outside the range of the expected data, or really wild standard deviations on variables I'm familiar with, where I know they should not be that high. So we've checked off accuracy. Now for missing data: I can already tell I don't have any, because the output would say down here which ones were missing. So check and check.

Now let's talk about outliers. Since this is a regression analysis, we're going to use the two-strikes-you're-out rule, calculating Mahalanobis distance, Cook's values, and leverage values for the final model. So we're not going to use step 1 or step 2 of the traditional mediation models; we're going to use the last model, with all of the variables in it. I have to do this as a regular regression and not use the plug-in, because the plug-in doesn't have this option yet, so go to Analyze > Regression > Linear. Our dependent variable is miles per gallon, and we're going to use cylinders, horsepower, weight, and gear as predictors, because they're our X, M, and two CVs. Under Save, we're going to check all three of these bad boys (Mahalanobis, Cook's, and leverage values). There are even more options for checking outliers, but these are the big three you'll see. Continue. That's all we really need for outliers, but since we're also going to move on to assumptions, let's set that up too. Under Plots, put *ZPRED as X and *ZRESID as Y, and check Histogram and Normal probability plot. This gives us everything we need to check for outliers and for the data screening assumptions. Continue, and there's going to be a lot of output. The first thing you want to do is ignore the output and come over to the data view to look at the saved scores. I have to figure out what kind of cutoff score to use for these, starting with my Mahalanobis scores: which ones are no good? First we have to figure out the degrees of freedom, and the degrees of freedom is the number of predictor
variables, which in this case is 4. If you've seen some of my other videos, sometimes that's the number of variables that go into the equation, but in regression it's the number of predictor variables. To find the cutoff score, some people use a chi-square table, so let me pull one up and zoom in so you can read it. We have 4 degrees of freedom, and we want to use p < .001, because we want people to be really weird before we start excluding them. That gives us 18.47 for 4 degrees of freedom. Now let's see how many people we should exclude. Back in SPSS, I'm going to sort my Mahalanobis values with Data > Sort Cases, because on my Mac Ctrl-click doesn't always bring up the sort option for some reason; you can also right-click on the column and choose Sort. I'll put them in descending order, since they're always positive. Anything over our cutoff score would be considered an outlier. We said it was 18.47, so we don't have any Mahalanobis outliers. For Cook's values, I need a Cook's cutoff score, and I listed those criteria in the document: 4 divided by (N − k − 1). N, I said, was 32. What the heck is k? Remember, k is the number of predictors, 4. Doing a little math on the phone, 32 − 4 − 1 is 27, and 4 divided by 27 is 0.148. It's usually best to go to three decimals, since these are always small numbers. While I have my handy calculator out, let's do leverage as well: (2k + 2)/N, which is (2 × 4 + 2) divided by 32, or .313. So let's go back to SPSS and sort
our Cook's values, still descending, and now I have some values that are over 0.148. I always want to mark those people in some way. I've shown a couple of different ways before, but really what we can do is use Transform > Compute Variable. I'll call it cooksout, for Cook's outliers, and set it to the Cook's values greater than 0.148. That's it. (I hate that it opens the output every single time.) Anything greater than .148 is now marked with a 1, which is really handy, because now I can filter or sort cases based on being bigger than this cutoff score. We didn't have any Mahalanobis outliers, which was pretty easy to see, but if you did, you could use the same procedure. Let's do that one more time for leverage; I'm pretty sure we have some leverage outliers. Call it leverageout; we decided the cutoff was .313, so anything greater than .313 is an outlier, and I can also sort those to see those cases at the top. I only have one leverage outlier, so I have two Cook's outliers and one leverage outlier. The two-strikes rule means that if you get two strikes, you're out: if a case were flagged on Cook's and leverage, or Cook's and Mahalanobis, and so on, it would be considered an outlier. So we can create a total outlier column. I don't have any Mahalanobis ones, so it's easy enough to add these two together, and it looks like nobody has two strikes. The nice thing about creating these columns is that later in life, when you wonder what the heck you did, they're there to help you remember. You can also use filters: Data > Select Cases, and I could say only use cases whose total outlier score is less than 2. In this case that will be all cases, but later, if I had an analysis where I do exclude outliers, I could remember how many outliers there were, and I
could use this filter column to filter people. That's an easy way to do it if you have thousands of cases and want to run analyses without the outliers, and you can turn the filter off and on. So we don't have any outliers. The nice thing about not having any is that we can use the output that keeps popping up and going away for all of our assumptions. So we would say no overall outliers were found. If you want to cite this procedure, I pulled Mahalanobis from Tabachnick and Fidell; Cook's and leverage are, I think, also in that book, but mostly in Cohen, Cohen, West, and Aiken, the blue regression book.

Additivity is really the first assumption. Remember that additivity means there's no multicollinearity. Now, the point of a mediation analysis is actually multicollinearity, so this gets a little tricky, but mainly we're just concerned that it's not so high that the whole thing doesn't run. To check, go to Analyze > Correlate > Bivariate and put in all of our predictor variables: not miles per gallon, because that's Y, but cylinders, horsepower, weight, and gear. We do expect these to be correlated, because we expect X, which in this case is cylinders, to predict M, which is horsepower. I just want to make sure the correlation isn't something like .99, so high that the regression analysis won't run. We are looking at some very highly correlated variables here, but again, we're doing mediation, and the point is kind of suppression: normally I would not want to include these two variables in the same analysis, because they're so highly correlated that they'll cancel each other out, but that's the point of mediation. So mainly, make sure nothing is perfectly correlated; that will help you troubleshoot any errors about a singular matrix. Lots of super highly correlated
variables; that's what I would expect here. Moving on to normality: we expect the error terms, the residuals, to be centered around zero and normally distributed. I already ran these plots as part of that regression analysis. If you exclude outliers, run the regression analysis again so you get the histogram without the outliers; we didn't have any outliers, so I'm going to use this one. It's centered over zero and mostly between −2 and 2. It's not perfect, but we also have a small sample size, and the larger the sample, the better it will hopefully approximate normal. With at least 30 people the central limit theorem kind of kicks in, but bigger samples are always better. You can see there may be some data missing out here, but it's mostly centered over zero and between −2 and 2, so it looks pretty good. Scrolling down to linearity, the normal probability plot, we mostly want the dots to be on the line. Be forgiving, especially with smaller samples: mainly you don't want these to look like S-curves, or like a hammock you could take a nap in. These look pretty close to the line to me. The last thing to look at is the standardized residual scatterplot: the residuals from the regression, that is, how far off we were at predicting our scores, plotted against the standardized predicted values. You want these to make a blob, although with this few dots it's hard to tell. For homogeneity, you want them centered around zero in both directions, so the graph should run from −2 to 2 on the bottom and −2 to 2 on the side, or −3 to 3, just not something like −2 to 6. We're mostly centered around zero. For homoscedasticity you want this to be just a blob. You don't want triangle shapes; I jokingly call these Dorito chips
sometimes. No triangle shapes, no megaphone shapes that are small on one end and get larger on the other, and no other funky shapes. We're not doing so hot here, because there's potentially data missing in this area; there's obviously a lower limit for a lot of these variables when you're talking about cars. I'd say this is mostly okay; there just aren't a whole lot of dots. If you draw a circle around the edge of the dots, you don't want it to make any weird shape. If you Google heteroscedasticity (it may take some tries on the spelling, even for me), you'll see a lot of these charts with examples of bad ones. So all in all, we've done the outliers and we've met our assumptions, so let's try the actual mediation. Since we point-and-click a lot, afterwards I'm going to cut and paste these graphs into this document, so you can open this Word document and follow along.

For the actual analysis, be sure you've installed the PROCESS plug-in; there's a how-to guide on Hayes's website. Go to Analyze > Regression, and down here is PROCESS. I'm going to use version 3 this time, so we're using the newest version available. This is going to be model 4. Most of the model numbers appear to me to be the same as in the previous version; unfortunately there are no templates for this new version (they're only available with the book), but at least the first six models appear to be the same. I haven't really tested the rest of them yet. So we're going to use model 4, which is our simple mediation model. Our Y variable is miles per gallon, our X variable is cylinders, our mediator is horsepower, and weight and gear go here in Covariates. Let me double-check that's what I said I was going to do: yes, cylinders, horsepower, weight, and gears. Great. Now, we can save the bootstrap estimates, but that takes a lot of space
so I tend not to use that option. For the number of bootstrap samples, five thousand is pretty common; if your computer runs really slowly, you can bump it down to a thousand just to preview it. I'm going to leave mine at 95% confidence intervals. Under Options, we can choose to see the total effect model, which is X predicting Y, sometimes called c, and I can check the effect size option if I want the effect size for it. For heteroscedasticity-consistent inference, you'd want to look up these different versions (there are explanations of them online) and pick the one you like. They're pretty different, so I'm just going to leave that at None, because I don't have an opinion. This section here is for moderation, which we're not doing yet, so Continue; mostly this is set up for mediation at the start. Okay, and then wait. All right. What I'm going to do is double-click on the output, copy it, and then write on it to explain all the pieces, so we're going to pop over to Word. It won't line up perfectly when we copy it, but we'll be able to figure it out. The first part just reminds you what you put in: here is everything that we entered into the equation. For this very first model, I always like to figure out which model it is first. It definitely looks like we've got X predicting horsepower (let me switch the font to Calibri, or a monospace font, I guess). Remember, horsepower is M, so this is X predicting M, which people call the a path. The a path is an important part of the indirect effect, because the indirect effect is a × b. It's also c − c′ in some instances, but it'll help if you're watching this series to remember that it's a × b, because when we get into serial mediation or moderated mediation, it's always some form of a × b, depending on what you're doing. So we're asking: do cylinders predict horsepower? We're doing X predicting
M. If I wanted to write this up in APA style, we're looking mainly at this line, and I would present b, the unstandardized coefficient; there's a long explanation in his book about why those are better. For every one-unit increase in X we get b units of increase in the outcome, so in this case, for every one-unit increase in cylinders we get about 33 units of increase in horsepower. We would include degrees of freedom here, and this is tricky. What are the degrees of freedom? It's always going to be the second degrees of freedom here, df2. It doesn't line up because of the way I copied it (if you paste it as a picture it will copy a little better), but you can see the second degree of freedom, which is N − k − 1: here 32 − 3 − 1, or 28. That's the degrees of freedom for our t value, and it's true for all three of these coefficients. So for cylinders I got t(28) = 6.44 if I line it up, and our p value is less than .001. We could maybe calculate pr² or something to add an effect size. So yes, our a path is significant. We could also talk about our covariates: one is significant and one isn't. Weight doesn't appear to be predicting horsepower: t(28) = 1.75, p = .090 (I'm going to three decimals here, as suggested by APA). Gear does appear to be predicting horsepower: t(28) = 4.43, p < .001. So that's how we'd read this: the coeff column is b, then we use the t column and the p values, and if you also want to present the confidence interval for the coefficient, here's your lower and upper limit for
confidence intervals; in this case it's a 95% confidence interval. So our a path is significant: X predicts M. Great. Coming down here, this is our full model, which includes b, the important one, and c′. Here we have X predicting Y, which is c′: cylinders predicting miles per gallon. I got b = −0.81. Our degrees of freedom in this case are 27, because we have one extra variable in this equation, which is M; that's where we got 27 for Cook's earlier. Here's the coefficient, here's the standard error, and here's t, which is −1.22, or −1.23 if I round up, and the p value is .23. Often you want c′ to not be significant, but that's really the older view, where if c′ was significant the world was over. Now we're not as interested in significance here; we're looking at the indirect effect. The indirect effect is a × b, which tells me whether there's a difference between c and c′, so we haven't figured out yet whether mediation has happened. We can also look at M predicting Y: this is the b path, M predicting Y with X in the equation (not to be confused with b, the notation for coefficients). For horsepower I got b = −0.02, with 27 degrees of freedom for t, and this was not significant either. Now the question is: is it not significant due to a lack of power, or because it just isn't predictive? This is where you'd have to decide whether this variable just isn't predictive, or whether the study is wildly underpowered because you expected a certain R² and maybe you're not hitting it. But if you look at R² right here, we have predicted 84% of the variance, so it's probably
likely that this variable just does not predict Y, or maybe there's some serial mediation going on. I could also talk about weight and gear one more time; I'll copy this so you don't have to watch me type for quite so long. In this case, weight is actually negative: its coefficient has changed from 17 to −3. But we're also talking about a different dependent variable here, so the two are not comparable at all; I should have said that. In this model weight is predicting miles per gallon, not horsepower, so don't compare those two; I got ahead of myself. Weight is significant: t(27) = −3.55 (notice I changed the degrees of freedom to 27), p = .001. Gear here isn't doing us a whole lot of good: b = 0.36, and t(27) is also 0.36, because the standard error is about 1, which is a little unusual, and p = .72. So in the full model, weight is predictive of miles per gallon (heavier cars get fewer miles per gallon), and gear is not predictive.

Now our total effect model, which we had to ask for, is X predicting Y. I get asked a lot whether X has to predict Y originally. No, because the idea is that X predicts M, which predicts Y; it kind of depends on how closely you want to adhere to the original Baron and Kenny steps, and which reviewer you get. The general feeling now is no, but your indirect effect should be different from zero. We're almost there. I could report this one by saying b = −1.53, if I round up. t is back to 28 degrees of freedom, because we only have X in the equation and not also M: t(28) = −3.64, and that actually is significant. Then we would talk about our covariates one more time if you wanted to, or you could say we covaried them out and never talk about them again; I've seen it done both ways. For weight we've now got −3.39. These two are comparable, since this is with and without M, so you can see how M affects the other coefficients. Now it's p < .001. Looking right here, gear actually flipped signs, which is interesting: before it was positive and now it's negative, and it's still not significant. So that's how I'd write all of those up.

Now let's get into the actual did-mediation-happen part. The output repeats the total effect on Y: this one is c, the total effect, and the direct effect is c′. They're pretty different: the effect is dropping, roughly halving, when I add in M, but at the moment I don't know if that change is big enough to be considered mediation. That change is listed here as the indirect effect; it says HP because that's the effect through M, the effect of adding in horsepower when I look at cylinders predicting miles per gallon. Here's where all the action happens. To report the indirect effect, I always just write "indirect =", because there's not really a symbol for it, and it's not really a b value, so I don't know if I'd stick b in front of it. It's −0.72. I also tend to list the standard error, the BootSE here, which is 0.55, and I definitely list the confidence interval. The 95% CI runs from −1.62 to a positive upper limit: one end is negative and one is positive, and that's no good. Since that interval includes zero, this indicates there is no mediation. Even though the change between c and c′, or a × b, is large and clearly looks like a nonzero number, there's a lot of standard error, so this does not indicate mediation, because zero is in the
confidence interval. There are also some different effect sizes; you can look at Hayes's materials to learn a little more about how these are calculated. You'll notice that PROCESS also doesn't give you the Sobel test anymore. If you want the Sobel test, you can use Kris Preacher's website. I don't know that the Sobel test is necessary; I find that it tends to agree with the confidence interval. But let's say you have a reviewer who is just in love with the Sobel test and you need it: on quantpsy.org there's a page where you can type in the values and get it, easy peasy. You type in a and b, the standard error for a, and the standard error for b, and it gives you all of the tests; I recommend this one. Or you can type in t for a and t for b and get the same answer. Let's plug that in: go back up and find t for a, which is probably the fastest way, and then t for b. That tells us it is not significant; as I said, it tends to agree with the confidence interval. So all of that together is how I would run a simple mediation model, model 4, with covariates in the equation, and how to report it in APA style. In this example we didn't find mediation, and we were also super underpowered, which is why we also talked about power. If you want to keep following this series, we're going to walk through all these different examples, so hit subscribe to save the channel. We're going to try to post videos more frequently on the different permutations of mediation and moderation models, especially focusing on categorical variables, because the new version of PROCESS does a lot of cool stuff with those. Thanks for watching, and good luck with your mediation.
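The G*Power step in the video converts an anticipated R² of .06 into Cohen's f² before solving for sample size. The arithmetic behind that conversion can be sketched in Python as below. This is an illustrative helper, not G*Power itself, and it does not search for the required N (that needs the noncentral F distribution). The function names are hypothetical, and the noncentrality formula λ = f² × N is our understanding of how G*Power parameterizes this test, so treat it as an assumption.

```python
# Hypothetical helpers sketching the effect-size arithmetic behind the
# G*Power step ("Linear multiple regression: Fixed model, R^2 deviation
# from zero"). This does not reproduce G*Power's sample-size search.


def cohens_f2(r2: float) -> float:
    """Cohen's f^2 = R^2 / (1 - R^2)."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R^2 must be in [0, 1)")
    return r2 / (1.0 - r2)


def f_test_df(n: int, k: int) -> tuple[int, int]:
    """Numerator and denominator df for the overall F test with k predictors."""
    return k, n - k - 1


def noncentrality(f2: float, n: int) -> float:
    """Assumed noncentrality parameter of the F test: lambda = f^2 * N."""
    return f2 * n


f2 = cohens_f2(0.06)              # the "medium" R^2 used in the video
print(f"f^2 = {f2:.4f}")          # f^2 = 0.0638
for n in (32, 200):               # mtcars size vs. the roughly required N
    print(n, f_test_df(n, 4), round(noncentrality(f2, n), 2))
```

Comparing λ at N = 32 with λ at N = 200 is a quick way to see how far the 32-car example falls short of the sample size G*Power asks for.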
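The Mahalanobis cutoff in the video comes from a chi-square table at p < .001 with df equal to the number of predictors (4), giving 18.47. As a check on that table lookup, here is a standard-library Python sketch. It uses the closed-form chi-square upper tail for even degrees of freedom, so it only covers even df (fine for df = 4); for odd df a statistics library would be the better tool. The function names and the example distance values are hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with an even number of df.

    For df = 2m the upper tail has the closed form
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form only covers positive, even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def chi2_critical_even_df(alpha: float, df: int) -> float:
    """Critical value c with P(X > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even_df(hi, df) > alpha:   # widen until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


cutoff = chi2_critical_even_df(0.001, 4)    # 4 predictors, p < .001
print(f"Mahalanobis cutoff: {cutoff:.2f}")  # Mahalanobis cutoff: 18.47

# Hypothetical saved Mahalanobis distances; flag cases above the cutoff.
d2 = [2.3, 7.9, 19.1]
print([d > cutoff for d in d2])
```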
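The Cook's and leverage cutoffs, the residual degrees of freedom, and the two-strikes rule that the video builds with Compute Variable and Select Cases reduce to a few lines of arithmetic. The sketch below reproduces the video's numbers for N = 32 and k = 4 (4/27 ≈ 0.148 for Cook's; (2 × 4 + 2)/32 = 0.3125, which the video rounds to .313) and the N − k − 1 df used for the t tests (28 with three predictors, 27 with four). The per-case values are hypothetical, not taken from mtcars, and the helper names are ours.

```python
def regression_df(n: int, k: int) -> int:
    """Residual (df2) degrees of freedom with k predictors: N - k - 1."""
    return n - k - 1


def cooks_cutoff(n: int, k: int) -> float:
    """Cook's distance cutoff used in the video: 4 / (N - k - 1)."""
    return 4 / regression_df(n, k)


def leverage_cutoff(n: int, k: int) -> float:
    """Leverage cutoff used in the video: (2k + 2) / N."""
    return (2 * k + 2) / n


def strikes(mahal, cooks, lev, mahal_cut, cooks_cut, lev_cut):
    """Count, per case, how many of the three indices exceed their cutoff."""
    return [int(d > mahal_cut) + int(c > cooks_cut) + int(h > lev_cut)
            for d, c, h in zip(mahal, cooks, lev)]


N, K = 32, 4
cook_cut, lev_cut = cooks_cutoff(N, K), leverage_cutoff(N, K)
print(f"Cook's > {cook_cut:.3f}, leverage > {lev_cut:.4f}")
print("df, M model:", regression_df(N, 3), "| df, Y model:", regression_df(N, K))

# Hypothetical per-case values (not mtcars) saved from a regression.
mahal = [3.1, 19.2, 5.0, 20.0]
cooks = [0.02, 0.30, 0.20, 0.01]
lev = [0.10, 0.40, 0.05, 0.20]
total = strikes(mahal, cooks, lev, 18.47, cook_cut, lev_cut)
keep = [s < 2 for s in total]     # mirrors the SPSS filter "total < 2"
print(total, keep)
```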
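PROCESS reports the indirect effect as a × b with a bootstrap confidence interval, and the video notes it can also be seen as c − c′. A minimal standard-library sketch of both ideas follows, with important simplifications: it fits simple mediation without covariates (the video's model also puts weight and gear in both equations), it uses a plain percentile bootstrap that may not match PROCESS's exact implementation, and the data are simulated rather than mtcars. With ordinary least squares on the same cases, c − c′ equals a × b exactly, which the example checks. All names are hypothetical.

```python
import random
from statistics import fmean


def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = fmean(u), fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def mediation_paths(x, m, y):
    """OLS paths for simple mediation (NO covariates).

    a          : slope of M regressed on X
    b, c_prime : slopes of M and X when Y is regressed on both
    c          : slope of Y regressed on X alone (total effect)
    """
    vx, vm = _cov(x, x), _cov(m, m)
    cxm, cxy, cmy = _cov(x, m), _cov(x, y), _cov(m, y)
    det = vx * vm - cxm ** 2          # zero if X and M are perfectly collinear
    return {
        "a": cxm / vx,
        "b": (vx * cmy - cxm * cxy) / det,
        "c": cxy / vx,
        "c_prime": (vm * cxy - cxm * cmy) / det,
    }


def percentile_boot_ci(x, m, y, n_boot=5000, level=0.95, seed=1):
    """Percentile bootstrap CI for a*b, resampling whole cases."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            p = mediation_paths([x[i] for i in idx], [m[i] for i in idx],
                                [y[i] for i in idx])
        except ZeroDivisionError:
            continue                  # skip a degenerate resample
        estimates.append(p["a"] * p["b"])
    estimates.sort()
    lo = estimates[int((1 - level) / 2 * len(estimates))]
    hi = estimates[int((1 + level) / 2 * len(estimates)) - 1]
    return lo, hi


# Simulated data (NOT mtcars): X -> M -> Y plus a direct X -> Y path.
gen = random.Random(42)
x = [gen.gauss(0, 1) for _ in range(100)]
m = [0.5 * xi + gen.gauss(0, 1) for xi in x]
y = [0.4 * mi + 0.2 * xi + gen.gauss(0, 1) for xi, mi in zip(x, m)]

paths = mediation_paths(x, m, y)
indirect = paths["a"] * paths["b"]
ci = percentile_boot_ci(x, m, y, n_boot=2000)
print(f"indirect = {indirect:.3f}, c - c' = {paths['c'] - paths['c_prime']:.3f}")
print(f"95% CI [{ci[0]:.3f}, {ci[1]:.3f}]; mediation is supported only if it excludes 0")
```

As in the video, the decision rests on whether the interval contains zero, not on the size of a × b alone.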
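For reviewers who want the Sobel test, the video points to Kris Preacher's calculator on quantpsy.org and notes that entering a, b, and their standard errors gives the same answer as entering the two t values. The sketch below shows why: the first-order Sobel z can be written either way. It is an illustration, not the calculator's code; the two-tailed p uses the standard normal distribution, and the inputs are hypothetical, not the video's output.

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed p value for z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def sobel_from_coefficients(a: float, se_a: float, b: float, se_b: float):
    """First-order Sobel test: z = ab / sqrt(b^2 se_a^2 + a^2 se_b^2)."""
    z = (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    return z, two_tailed_p(z)


def sobel_from_t(t_a: float, t_b: float):
    """Same statistic from the t values: z = t_a t_b / sqrt(t_a^2 + t_b^2)."""
    z = (t_a * t_b) / math.sqrt(t_a ** 2 + t_b ** 2)
    return z, two_tailed_p(z)


# Hypothetical inputs: t_a = 2.0 / 0.5 = 4 and t_b = 0.3 / 0.2 = 1.5.
z1, p1 = sobel_from_coefficients(2.0, 0.5, 0.3, 0.2)
z2, p2 = sobel_from_t(2.0 / 0.5, 0.3 / 0.2)
print(f"z = {z1:.3f} (coefficients) vs {z2:.3f} (t values), p = {p1:.3f}")
```

Dividing the numerator and denominator of the first formula by se_a × se_b gives the second, which is why the two entry methods agree.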
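The write-ups in the video follow a consistent pattern: t with its residual df in parentheses, p to three decimals without a leading zero, p < .001 for very small values, and the indirect effect reported as "indirect =" with its BootSE and 95% CI. A small, hypothetical formatter makes that pattern explicit. The numbers in the example are placeholders, not results from the video.

```python
def apa_p(p: float) -> str:
    """Format a p value APA-style: three decimals, no leading zero."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def apa_t(df: int, t: float, p: float) -> str:
    """t with residual df, e.g. 't(28) = 1.75, p = .090'."""
    return f"t({df}) = {t:.2f}, {apa_p(p)}"


def apa_indirect(effect: float, boot_se: float, lo: float, hi: float) -> str:
    """Indirect effect with its bootstrap SE and 95% CI, plus the zero check."""
    verdict = "CI includes zero" if lo <= 0.0 <= hi else "CI excludes zero"
    return (f"indirect = {effect:.2f}, SE = {boot_se:.2f}, "
            f"95% CI [{lo:.2f}, {hi:.2f}] ({verdict})")


print(apa_t(28, 1.75, 0.09))                   # t(28) = 1.75, p = .090
print(apa_indirect(-0.50, 0.30, -1.10, 0.05))  # placeholder values
```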
Info
Channel: Statistics of DOOM
Views: 34,575
Rating: 4.9211268 out of 5
Keywords: statistics, spss, process, regression, mediation, covariates, data screening, normality, outliers, linearity, homogeneity, homoscedasticity, apa style, plug in, hayes, model 4, data science
Id: D7wt9s0siNY
Length: 39min 31sec (2371 seconds)
Published: Fri Jun 15 2018