SPSS - Mediation Analysis with PROCESS

Video Statistics and Information

Captions
In this video I'm going to cover how to do mediation analyses, with a focus on the traditional steps listed by Baron and Kenny in their seminal paper on mediation. It also includes a lot of Hayes's work with PROCESS, the plug-in for SPSS. I will warn you that there are lots of different alterations to this type of design: David Kenny has a bunch on his website, and Kris Preacher has material on his site talking about the Sobel test. So just be aware that there are lots of different ways to do this, but this is the very traditional way.

What we're going to do is look at dataset 6 on Blackboard and work through a mediation design where we think that the treatment condition, there in the first column, affects the number of days people are housed. What's going on in the study is that we have temporary housing and we're looking at how many days people stay housed. We have two groups, a control group and a treatment group, but we think that the number of housing contacts people have mediates the relationship between condition and days housed, so that there is a change in the relationship between condition and days housed because of the contacts. This isn't saying that at each level of contacts we get a different relationship; it's that the relationship between condition and days housed essentially goes away when we add that third variable in. The way to think about mediation is that it's a version of the third-variable issue. I always think about it in the lawyer sense: a mediator is someone who changes the relationship between two parties that can't get along. You can also think about it like friends: you and your friend are in a room chitchatting, and when a third person comes in, it changes the dynamics between you and the first person. That's the big idea behind mediation.

For power I'm going to use G*Power, and it's the same power analysis we've been talking about for regression designs. The test family is F, the statistical test is linear multiple regression: fixed model, R-squared deviation from zero. Then we just enter our R-squared (as the effect size), our alpha, our power, which is 0.80, and the number of predictors, which in this simple kind of mediation is going to be two (at some steps it's one, but generally it's two), and we can calculate. This is a really simplistic way to calculate power for mediation; like I said, between Kenny and Preacher you could probably come up with ten different ways, but I think this is the easiest quick estimate, and I would recommend reading some of their work if you really want the best way to estimate power for mediation. It says we need 68 people, and we've got about a hundred and something, so we should be good.
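If you want to check that G*Power result outside of G*Power, here is a minimal Python sketch of the same calculation (an F test of R-squared deviation from zero for a fixed-model multiple regression). The effect size f² = 0.15, a conventional medium effect, is an assumption here, since the exact value entered in the video isn't shown; with alpha = .05, power = .80, and 2 predictors it lands at roughly the same N of about 68.

```python
# Sketch of the G*Power calculation: multiple regression, R^2 deviation from zero.
# f^2 = 0.15 (a medium effect) is an assumption; the value entered in the video isn't shown.
from scipy.stats import f as f_dist, ncf

def regression_power(n, n_predictors, f2, alpha=0.05):
    """Power for testing R^2 = 0 with k predictors and n participants."""
    df1, df2 = n_predictors, n - n_predictors - 1
    f_crit = f_dist.ppf(1 - alpha, df1, df2)
    lam = f2 * n                       # noncentrality parameter, as G*Power defines it here
    return 1 - ncf.cdf(f_crit, df1, df2, lam)

n = 10
while regression_power(n, n_predictors=2, f2=0.15) < 0.80:
    n += 1
print(n)                               # about 68 participants for f^2 = .15, alpha = .05, power = .80
```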
Now let's talk about data screening. Remember that the PROCESS plug-in runs in a separate window, so normally when you screen for regression you can run the actual regression and screen at the same time, but in this case we can't, because the mediation will be run separately. So we're going to data screen first and then run the analysis.

Let's check for missing data and accuracy: Analyze > Descriptive Statistics > Frequencies, move over all three variables, and under Statistics pick at least minimum and maximum, although I usually also recommend the standard deviation and mean; Continue and OK. I don't have any missing data, which is good, so I won't have to estimate anything for anybody. My treatment groups don't have any weird codings going on, and while I would have to know the upper and lower limits, clearly you can't be housed a negative number of days, and there are no negative numbers, which is good for us. This is where I would check for typos and any missing data.

Now let's run an outlier analysis; there are going to be three parts to this. We don't need a fake regression here because we can screen with the real one: Analyze > Regression > Linear, put the DV, which in this example is days housed, in Dependent, and put both our IV and our mediator in Independent(s), since right now we're just screening. Treatment is a categorical variable, it's one group or the other, so if your output looks totally crazy when you screen with categorical variables, try taking the categorical variable out and screening again to see if that's what's causing the issue; sometimes you'll get flat lines where the plot is clearly just separating the groups. The other thing you can do is split your file on the categorical variable and screen each piece separately; that's also an option people like to use. I'm going to try it with the categorical variable first, because I am going to use it as a predictor, and see what happens.

Since this isn't the actual mediation test, we only want the screening output: under Plots, put ZPRED against ZRESID and check histogram and normal probability plot, Continue; then under Save check all three of the distances, Mahalanobis, Cook's, and leverage, remembering that there are a bunch of different outlier statistics, it's just that these three are the ones we've been learning and they're the best bang for your buck; Continue and OK.

Back in the data window, since I have a lot more people than in the other examples on my channel, I'm going to use Transform > Recode into Different Variables to check for all the outliers. This is what I've been saying: when you have 10 or 20 people it's easy to just look at the data, but when you have hundreds of people this makes it much easier on you. The first thing I'll move over is Mahalanobis. You have to give it a name before you can do anything, so I'm going to call it outmah, "out" for outlier and "mah" for Mahalanobis, and hit Change. Click Old and New Values; this is where we code everybody who's an outlier as 1 and everybody who's not as 0. The first thing you have to figure out is the cutoff score, and that comes from the chi-square table: I have two predictors (k = 2, both the IV and the mediator), and I'm going to use p < .001 because I want someone to be really different before I cut them out, so 13.82 is going to be my cutoff score for Mahalanobis. We use "Range, value through HIGHEST," which takes everybody with a 13.82 and up and codes them as 1; click Add. Then "All other values" get coded as 0; click Add. That codes everybody as 0 = not an outlier, 1 = outlier; Continue and OK. Sorting that new column puts the one outlier at the top: only one person scores above my cutoff.
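As a quick cross-check, here is a small Python sketch of where the 13.82 cutoff comes from and how the 0/1 flag works. The DataFrame and column name are assumptions for illustration; MAH_1 is the name SPSS typically gives the saved Mahalanobis column, so the data-dependent line is left as a comment.

```python
# Where the 13.82 Mahalanobis cutoff comes from: the chi-square critical value
# with df = number of predictors (2) at alpha = .001.
from scipy.stats import chi2

k = 2                                   # predictors: treatment and contacts
cutoff = chi2.ppf(1 - 0.001, df=k)
print(round(cutoff, 2))                 # 13.82

# The SPSS recode ("Range, value through HIGHEST") is the same as this flag,
# assuming the saved column kept SPSS's default name MAH_1:
# df["outmah"] = (df["MAH_1"] >= cutoff).astype(int)
```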
Now let's do Cook's: Transform > Recode into Different Variables and hit Reset, because the three statistics have very different cutoff scores, so you can't do them all at once. Move Cook's over, rename it outcook, hit Change, and click Old and New Values. We'll use that same "value through HIGHEST" principle, but with the Cook's cutoff. The formula for Cook's is 4 / (n - k - 1), remembering that n is the number of participants and k is the number of predictors. So that's 4 divided by (109 - 2 - 1), which is 4 over 106, or about .038. Everybody who scored higher becomes a 1 (Add), and all other values become 0 (Add); Continue and OK. That makes one more column, and sorting it descending shows two people over the Cook's cutoff; you'll see there's a big jump between most of the participants and those top two.

One more time for leverage: Transform > Recode into Different Variables, Reset, move over centered leverage, call it outlev, hit Change, and open Old and New Values. The formula for the leverage cutoff is (2k + 2) / n, so that's 2 times 2 plus 2, which is 6 on top, divided by 109, or about .055. Scores of .055 and up become 1 (Add), everybody else 0 (Add), Continue and OK. Sorting descending on the leverage column shows four people over the leverage cutoff.

Now what do I do with all that information? What I'm going to do is find the people who have two out of three outlier flags, because that's usually a good criterion for saying, okay, I should eliminate them. We've been talking this year about Mahalanobis and how it's used all by itself; when you're talking about regression, you can have people who are far away from the rest of the points, and those tend to be called discrepant: their scores are not close to the rest of the data, but they might be perfectly in line with the regression equation. Leverage, on the other hand, is how much they're changing the slope, while discrepancy is how far away they are. Somebody whose data isn't really close to the rest of the data but is within the equation I'm expecting isn't really that big of a deal, because they're still within the range of data I would be expecting. But people with high leverage, whose scores are changing the slope a lot, are more of a concern, and you should ask why they're changing the slope so much. As always with outliers, look at why they're outliers: a lot of the time it's because they're at the top or the bottom of the distribution, which isn't too surprising, but ask why their scores are what they are.
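As a cross-check on those two hand-calculated cutoffs, and on the "two out of three" total described next, here is a small sketch under the same assumptions as before: a pandas DataFrame holding the SPSS-saved columns under their default names, with the data-dependent lines left as comments because the names are placeholders.

```python
# Cook's distance and leverage cutoffs from the video, plus the "two out of three" total.
n, k = 109, 2                          # participants and predictors
cooks_cutoff = 4 / (n - k - 1)         # 4 / 106  ~= 0.038
leverage_cutoff = (2 * k + 2) / n      # 6 / 109  ~= 0.055
print(round(cooks_cutoff, 3), round(leverage_cutoff, 3))

# Assuming the SPSS-saved columns were read into a DataFrame `df` (names are placeholders):
# df["outcook"]  = (df["COO_1"] >= cooks_cutoff).astype(int)
# df["outlev"]   = (df["LEV_1"] >= leverage_cutoff).astype(int)
# df["totalout"] = df[["outmah", "outcook", "outlev"]].sum(axis=1)
# keep = df[df["totalout"] < 2]        # drop anyone flagged on two or more criteria
```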
So how do I determine who has two out of three, other than visually spotting it because of the way I sorted? If you have lots of data that gets confusing, so the easiest thing is to create one more column: Transform > Compute Variable. Give it a name like totalout, and in the numeric expression window just add all three of the new columns together: outmah plus outcook plus outlev. Don't add the original distance columns together, because that total wouldn't make any sense; you're adding the three new 0/1 columns we created. Hit OK, and now I can tell how many people have two or more outlier flags. These two participants are outliers because they have at least two markers. So what is it about those two participants? They're in different treatment groups, so that's not it; they're not at the top of the days-housed variable, but one of them is at the top of the contacts variable, so it's not as simple as saying their scores are really high on one variable or the other. You'll want to sit down and think about why they're outliers so that you can explain why you eliminated them. For this example I'm going to delete these two, because they have two out of the three flags, so that the output you're looking at in the notes matches what I'm doing on the screen. So that's outliers.

Now let's look at multicollinearity. The only two variables we need to check are treatment and housing contacts, so we're checking X and M to make sure they're not too multicollinear, because they do get used together in the same regression equation as part of the steps. Analyze > Correlate > Bivariate: treatment and, no, not days housed, that's our DV; treatment and contacts, then OK. This is technically a point-biserial correlation because the treatment condition is dichotomous. Yes, they're correlated, which is good for us; we do want them to be correlated, because if they're uncorrelated, mediation isn't going to work. But they're not too correlated, so that's good.
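If you want to see that same check outside SPSS, scipy has a point-biserial helper; this sketch uses obviously made-up toy data just to show the call, since the real data file isn't reproduced here, and the variable names are placeholders.

```python
# Point-biserial correlation between a dichotomous predictor and a continuous mediator.
# The data below are fabricated purely to demonstrate the function call.
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(0)
treatment = rng.integers(0, 2, size=107)             # 0 = control, 1 = treatment (toy data)
contacts = rng.poisson(3, size=107) + 2 * treatment  # toy mediator that rises with treatment
r, p = pointbiserialr(treatment, contacts)
print(round(r, 2), round(p, 3))                      # point-biserial r is just Pearson's r with a 0/1 variable
```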
Since I deleted people, I can't use the old screening output, so I'm going to rerun my regression: Analyze > Regression > Linear, and it's still all set up, OK. One thing you'll notice when you rerun it is that you get new outlier statistics columns, because you just reran it; don't double-delete people. These cases were not outliers the first time, so don't delete them the second time. I'm just going to ignore those columns, or you can delete them if you want.

Let's go back to the output and look at normality. Here's what I meant when I said it would look funny: this is the normal curve for days housed given the other two variables, but one of them is categorical, and I think that's why you're seeing this double hump in the data, because there are two separate groups; maybe this hump is group one and this one is group two. So you may consider taking out the categorical variable in order to screen the data for just the continuous variables, because you can't really change the categorical variable anyway. If you see this sort of thing, that might be the time to take the variable out and try again. In general, though, most of the data is pretty normal; it runs from about -2 to 2, with the caveat that the bimodal hump is probably due to the categorical variable in the mix. Linearity, on the other hand, is messy, so that's not good, and it's another good clue that something about the conditions is different, which is the point. Homogeneity and homoscedasticity also look off; the reason you get this is that the data are limited in range, which is why the plot looks like a triangle, and overall the assumption really isn't met.

So, as I tell you to do in that situation, let's try the regression one more time without the treatment variable and see what the continuous variable looks like. Something is still going on: we have a strange distribution, and I don't know that it's really very normal, because that peak should be over zero, which is problematic. It might just be the way this data set is; we have at least 30 people, so we can kind of get away with normality. Linearity is also not great, and although the plot changed a little, there still looks to be a restriction of range. Essentially these assumptions are not met, but it's just an example, and I would tell you to dig into why the assumptions aren't being met: maybe it should be an x-cubed function, because this is starting to look like an S-curve, and maybe we shouldn't assume normality here. Those are different things to check when you're using categorical variables. But the point here is to show you how mediation works.

So let's run the mediation analysis: Analyze > Regression, and we're going to use the PROCESS plug-in because it's fantastic. This is going to be model 4: remember, model 1 is the basic moderation model, and model 4 is basic mediation. We put our Y variable, the DV, into Y, so that's days housed, since we want to know how many days we're going to have to house them; our treatment condition is X, and housing contacts is M. In moderation it does not matter if you switch X and M, because it's an interaction and it just depends on which way you want to look at the simple slopes; in mediation it really does matter, so make sure you understand the conceptual picture you're trying to make. The cool thing is that there's a templates file included in the package you get when you download PROCESS, and it shows you pictures of all the different designs that have been programmed in. If we go down to model 4, you can see that you really cannot switch X and M: this is a theoretical ordering where X leads to Y and X leads to M, and the X-to-Y path is the one that changes when you add things to the equation, so make sure you know which one is which. So the model number here is 4; leave the rest of the dialog alone. Under Options, pick effect size (we want kappa-squared), the Sobel test (to test whether mediation happened), the total effect model, and the comparison of the two models; essentially the bottom four options. Continue, and then OK.
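For reference, PROCESS model 4 corresponds to the standard simple-mediation regression equations; writing them out (this is standard notation, not something shown in the video) makes it clear which coefficient is which path.

```latex
% Simple mediation (PROCESS model 4) as three OLS regressions:
\begin{aligned}
  M &= i_1 + aX + e_M         && \text{path $a$ (treatment predicting contacts)}\\
  Y &= i_2 + c'X + bM + e_Y   && \text{path $b$ and the direct effect $c'$}\\
  Y &= i_3 + cX + e_Y^{*}     && \text{path $c$, the total effect}
\end{aligned}
% With OLS these fit together as c = c' + ab, so the indirect effect ab equals c - c'.
```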
Now let's talk about the steps to mediation; let me pull up a blank document. The way Baron and Kenny described mediation, the first thing that has to happen is that your X variable predicts Y, because you're saying that X changes with M in the equations, and if X doesn't predict Y then you've kind of missed the point. This is considered path c. The second thing that has to happen is that X predicts M; that's path a. So first I establish that there's a relationship between X and Y, and then I'm saying X has to predict M, because we're saying the path is going to be diverted through M, like a diverging roadway. The next piece puts the two together: we put X and M in together predicting Y, but specifically we want to know that M predicts Y, which implies that the diverging path continues past M instead of just stopping; otherwise you just have X predicting M and X predicting Y, and with mediation you're saying that instead of going X to Y, it actually goes X to M to Y. You're proposing a separate pathway with a third variable involved, so if M doesn't predict Y you don't really have mediation. That's path b, traditionally. Then the last piece is that X no longer predicts Y, or its prediction of Y is lessened, with M in the equation; that's path c prime (c'). The whole point of mediation is that c and c' are different, and usually that c is greater than c': you want c' to go down, closer to zero. It would be interesting if it flipped sign, but that would be a better indication that something else, more like an interaction, is going on.

The argument is that X to Y is my one path, like taking the highway from city one to city two, and X to M to Y is my second path, like taking the loop around the city to get there faster; when you add that loop, fewer people take the original highway, so you're diverting the relationship. That is the order Baron and Kenny proposed you go in, because if step one doesn't happen, the rest doesn't matter; if X doesn't predict M, the rest doesn't matter; and if M doesn't predict Y, the rest is moot. So the steps are in that order for a reason, but, somewhat conveniently, the output you're going to get comes out roughly in alphabetical order by path, so we have to match the steps to the output we're looking at. I'm going to go through it one piece at a time.
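For anyone who wants to see these steps as plain regressions, here is a hedged Python sketch of the same Baron and Kenny steps using statsmodels. The variable names (treatment, contacts, days_housed) are placeholders for whatever the data file actually uses; PROCESS model 4 fits these same models for you and adds the bootstrapping on top.

```python
# A sketch of the Baron & Kenny steps as three ordinary OLS regressions.
# Variable names are placeholders; PROCESS model 4 runs equivalent models.
import pandas as pd
import statsmodels.formula.api as smf

def baron_kenny(df: pd.DataFrame):
    m_c  = smf.ols("days_housed ~ treatment", data=df).fit()              # step 1: path c (total effect)
    m_a  = smf.ols("contacts ~ treatment", data=df).fit()                 # step 2: path a
    m_bc = smf.ols("days_housed ~ treatment + contacts", data=df).fit()   # steps 3-4: paths b and c'
    a       = m_a.params["treatment"]
    b       = m_bc.params["contacts"]
    c       = m_c.params["treatment"]
    c_prime = m_bc.params["treatment"]
    return {"a": a, "b": b, "c": c, "c_prime": c_prime, "indirect": c - c_prime}
```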
So, with all this output, the first thing it gives me is a summary of what I did in the analysis, which is nice because it reminds you what you picked as X, M, and Y. The next block says the outcome is housing contacts, so housing contacts is M, and the one predictor listed is treatment, which is X. So this block is X predicting M, which is actually step 2, path a. If you can't figure out which pathway a block is talking about, go in and label the variables as X, M, and Y and then match the block to the step. The overall model is significant. Sometimes people put all of these statistics in a table, and sometimes people ignore the model tests and just talk about the coefficients; you could do both, and I'll talk about both here. So for path a, the overall model is F(1, 105) = 8.87, using these degrees of freedom, p < .01, and R-squared is about 8%. Then there's the predictor itself, which is probably the more important part: the coefficient, lowercase b, is listed under "coeff," so b = 1.88; the t statistic, remembering that its degrees of freedom are the second degrees of freedom of the F statistic, is t(105) = 2.98, reading across coefficient, standard error, t; and the p value is the same p as before, since there's only one predictor, so p < .01. So what am I saying? X predicts M: treatment condition predicts the number of housing contacts people have. This is coded (check your own coding) so that 0 is control and 1 is treatment, and with a dummy-coded variable, b is the difference between the two groups: the treatment condition has about two more housing contacts than the control condition if I round up. More housing contacts in the treatment condition is what we wanted, and it needs to be a predictor. So that's path a.

The next thing we get, like I said, is b and c'. This block has days housed as Y with M and X entered together, and the first predictor that comes up is M predicting Y, which is path b. The overall model for this piece, with both predictors together, is a significant predictor of days housed: F(2, 104) = 16.82, p < .01, and R-squared is 24%, so a big effect size. Now, does M predict Y? That's the important part, and it does: lowercase b is 1.73, so for every additional housing contact we get about 1.73 more days housed; t(104) = 4.96 (remember, 104 is the second degree of freedom), and it is significant, p < .01.

Now what about c'? That's X predicting Y, but it's c' rather than c because M is in the model, so be sure you label this one c prime. In a perfect world we want this one to be non-significant, though we're also going to use the Sobel test. Here b is about 3.55 with a standard error of about 1.50, t(104), and p = .14, so it's actually non-significant, which is good for us; it follows the pattern we would expect. That doesn't always happen: sometimes this step is still significant, and that's partial mediation, but either way we'll use the Sobel test to show that c and c' are different when M is in the model. So if this step is significant, don't freak out.

The very last thing you get is labeled, nicely, "total effect model," and that's the first picture, path c; so the output order has been a, b, c', and then c. The overall model is significant: F(1, 105) = 7.38, p < .01 (or = .01 if you round up), and R-squared is about 7%, so overall it is significant. Then we want the effect itself, X predicting Y: b = 6.82, t(105) = 2.72 (using those 105 degrees of freedom), and the same p of about .01. So the most important step, X predicting Y, actually comes last in the output, but that's okay; I've filled in all of them.
Do I meet the criteria? Yes, X predicts Y; yes, X predicts M; yes, M predicts Y controlling for X; and path c', where X predicts Y controlling for M, appears to be lower than path c: compare 6.82 to 3.55. So the treatment and control conditions are about seven days apart with no mediator present and only about three and a half days apart with the mediator present, so there's a lessening in the number of days we're having to house people once you control for contacts. Essentially, contacts changes the relationship between the treatment condition and the number of days housed.

The next part of the output gives you everything again: the total effect of X on Y is c, the direct effect of X on Y is c', and then this indirect effect is c minus c', 6.82 minus 3.55, which is where this number of about 3.26 is coming from. You'll notice there's no statistical test next to it (that's further down in the output), but it does give you the bootstrapped confidence interval. The bootstrap confidence interval tells you whether the difference between the two crosses zero: if c minus c' includes zero in its confidence interval, that means there's no difference between the two, so you want to see that this confidence interval does not cross zero, because that implies there is a difference between the two.

The next couple of things you get are a bunch of different types of effect sizes. If you're interested in these I really recommend Andy Field's book (the SPSS version) for what they all are and how they're calculated; it's also in Hayes's PROCESS book. There's a lot going on here, but I'm going to suggest using the last one, Preacher and Kelley's kappa-squared, which I think is probably the best supported and the one that makes the most sense to me without getting too far into the math. You don't want its confidence interval to cross zero either, but if one of them crosses zero, usually the other one does too. I'm going to use kappa-squared when I talk about this last piece.

Finally, at the bottom, after all of these steps are supported, most people do the Sobel test, which is listed in the output as the normal theory test. It is a z-score test of whether the indirect effect, c minus c', is different from zero: basically, is it greater or less than zero, is it not zero in some form, because it could even be negative if the direction of the difference changed. So it's testing this 3.26, remembering that's c minus c', and here's the z-score for it. The way you write up the Sobel test is to report z, which here is about 2.5, and list the p value, and you do want this to be significant, saying that there is a significant difference between the two paths. Then I tend to list kappa-squared to show people how big that difference is, because effect sizes are important: how different are they? So find kappa-squared (the little k with the superscript 2), put it in the font we like, and kappa-squared = .13. You can interpret this the same way we interpret R-squared, and that's about a medium effect size if you go with the .01 / .09 / .25 guidelines. So that's basically saying that mediation did happen.
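For completeness, here is a sketch of the Sobel (normal theory) test and a percentile bootstrap of the indirect effect, the two checks described above, using the same placeholder variable names as the earlier sketch; PROCESS reports both of these for you when the corresponding options are checked.

```python
# Sobel (normal theory) test and percentile bootstrap of the indirect effect a*b
# (equal to c - c' in OLS). Variable names are placeholders for the real data file.
import numpy as np
from scipy.stats import norm
import statsmodels.formula.api as smf

def sobel_test(df):
    """z test of the indirect effect a*b using the Sobel standard error."""
    m_a = smf.ols("contacts ~ treatment", data=df).fit()
    m_b = smf.ols("days_housed ~ treatment + contacts", data=df).fit()
    a, se_a = m_a.params["treatment"], m_a.bse["treatment"]
    b, se_b = m_b.params["contacts"], m_b.bse["contacts"]
    z = (a * b) / np.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    return z, 2 * (1 - norm.cdf(abs(z)))

def bootstrap_indirect(df, n_boot=5000, seed=1):
    """Percentile bootstrap CI for a*b; the interval should not cross zero."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(df), size=len(df))
        samp = df.iloc[idx]
        a = smf.ols("contacts ~ treatment", data=samp).fit().params["treatment"]
        b = smf.ols("days_housed ~ treatment + contacts", data=samp).fit().params["contacts"]
        estimates.append(a * b)
    return np.percentile(estimates, [2.5, 97.5])
```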
Having the full picture with all three variables at once is important: X and Y are related, but X to M to Y is also related, so remember that the third variable is useful and interesting and is part of the equation. This is one good way to test for the third-variable problem. All that being said, this will look slightly different from the notes, because I didn't delete people when I ran it that way.

What do I want to report in my write-up? Generally, if you're going to report the F statistics, they can go in a table; most people list the t values, as I've included, to show which paths are significant and which are not; and then a little chart of the relationship. Making that triangle picture of X, M, and Y and including the b coefficients on the paths is a great figure for this. There's no conventional graph for this sort of analysis; it tends to be a depiction of the mediation relationship with the path coefficients, so put the b values on the picture and label them as paths a, b, c, and c prime, because that's the traditional nomenclature in this type of analysis. And that concludes mediation analyses using PROCESS in SPSS.
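If you'd rather draw that triangle figure programmatically than in a drawing tool, here is a minimal matplotlib sketch; the layout is arbitrary, and the coefficients on the arrows are simply the ones read off the output above.

```python
# A bare-bones mediation path diagram: X, M, Y with paths labelled a, b, and c'.
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
boxes = {"Treatment\n(X)": (0.1, 0.2),
         "Housing contacts\n(M)": (0.5, 0.8),
         "Days housed\n(Y)": (0.9, 0.2)}
for label, (x, y) in boxes.items():
    ax.text(x, y, label, ha="center", va="center",
            bbox=dict(boxstyle="round", facecolor="white"))

# Arrows for paths a, b, and c'; the total effect c is reported in parentheses.
paths = [((0.16, 0.28), (0.44, 0.72), "a = 1.88"),
         ((0.56, 0.72), (0.84, 0.28), "b = 1.73"),
         ((0.18, 0.20), (0.82, 0.20), "c' = 3.55  (c = 6.82)")]
for (x0, y0), (x1, y1), label in paths:
    ax.annotate("", xy=(x1, y1), xytext=(x0, y0), arrowprops=dict(arrowstyle="->"))
    ax.text((x0 + x1) / 2, (y0 + y1) / 2 + 0.04, label, ha="center")

ax.set_axis_off()
plt.show()
```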
Info
Channel: Statistics of DOOM
Views: 156,017
Rating: 4.9229121 out of 5
Keywords: SPSS (Software), Mediation, Plug-in (Software Genre), Statistics (Field Of Study), regression, statistics
Id: ByuUyLtoTt8
Length: 37min 21sec (2241 seconds)
Published: Thu Apr 16 2015