Running and interpreting multiple regression with dummy coded variables in SPSS (2019)

Captions
Hello. In this video I will be discussing how to run and interpret multiple regression with dummy variables using SPSS. The text file that I have open on my screen will be made available for download underneath the video description, as will a link to the data. The link itself also appears within the text file. So let's go ahead and get started.

It is a common misconception that only continuous variables can be used as predictors in multiple regression analysis. However, it is actually possible to incorporate categorical predictors through the use of dummy variables. Dummy coding is a means of translating the grouping information associated with a categorical variable into a new set of dichotomous variables, which can be included as predictors in a regression model. In other words, regression analyses can incorporate grouping variables so long as they have been appropriately recoded into a new set of dummy variables, the number of which is equal to J minus 1, or simply the number of groups on the original variable minus 1. Dummy variables generally have values of 0 and 1, with this coding making the intercept in regression models easier to interpret.

The simplest dummy coded variable is one that already has two values associated with it, such as gender identification coded 0 for male and 1 for female. If your variable in the data set already contains this coding, then there is no need to recode. The regression coefficient for gender identification is simply interpreted as the mean difference, conditional on any other predictors in the model, between persons identifying as male and female on the dependent variable. The intercept in the model is interpreted as the mean of the dependent variable for the group that is coded 0, which in this case would be persons identifying as male. By adding the regression coefficient to the conditional mean for males, you obtain the conditional mean for females.

So let's say that we run a simple regression where gender
identification, coded 0 for male and 1 for female, is included as a predictor of a dependent variable called achieve, and we obtain the following results using SPSS. In this case you'll notice in the results that we have a constant, and that is the intercept for the model. The value of the intercept is 94.847, and you can see we have the standard error right here. There is no beta coefficient for the intercept, but you'll notice that we have a t value and a p value for it. Then on the next line we have the gender ID variable, with a value of 6.306 for the slope, its standard error, the beta coefficient, and then our t value and our p value.

So the constant is the intercept in the model, and in a regression the intercept is the predicted value on Y when the predictors are all equal to zero. In the current model with our single predictor, the intercept is equal to the mean for males, and that mean is 94.847. The slope for the predictor indicates the predicted change in Y for a one-unit increase on X, which means that it is essentially the difference in means between males and females. As a result, the slope of 6.306 is interpreted as indicating that, on average, females scored 6.306 points higher than males. The mean for females is simply computed as the sum of the intercept and the slope, which gives us a mean of 101.153.

So let's run the analysis using some data. I'm going to open up our data set right here. You can see we have our gender ID variable coded 0 and 1, and our dependent variable is achieve. To run the basic analysis, first just note that gender ID is essentially already a dummy variable because it has codes of 0 and 1 on it, and we can include it as a predictor of achieve. So I'm going to go to Analyze, Regression, go down to Linear, and I'm just going to reset this. We'll move gender ID over to the Independent(s) box and
achieve to the Dependent box, click on OK, and now we have our results. Scrolling down here you can see that we have our R square value, multiple R, F value, and p value for the model, all the usual information. Then down here below we have our coefficients table, and this is essentially what you saw in the text file. We have the constant, that's the intercept, which is the mean for males, and then we have the slope for the gender ID variable, which is the difference between males and females. So if you add the intercept and the difference between males and females, you get the mean for females. You can also see that within the model the predictor, gender ID, was statistically significant, indicating essentially that females scored higher on average on the achieve variable than males.

Now let's say we run a multiple regression with gender ID and mastery goals, mastery goals being a continuous variable, as predictors of achieve, and we obtain the following results. In this case, right here is our intercept, and then we have each of our predictors, gender ID and mastery goals, with their respective slopes, standard errors, beta coefficients, and then our t values and p values. In this model the intercept is the predicted value on achieve for the male identification group, coded 0, conditional on mastery goals being in the model. The regression slope for gender ID is the difference between persons identifying as males and females, again conditional on mastery goals. The conditional mean for females is simply computed by taking the sum of the intercept, negative 0.597, and 5.824, which is the slope representing the difference between the two groups. So the conditional mean for females is 5.227.

As a side note, the intercept technically is the predicted value on Y when the predictors are all equal to 0, and because the scale for mastery goals in our data does not technically include 0 in the range
of possible values, this limits our interpretation of the intercept. One way to increase interpretability is to center the mastery goals variable, which is accomplished by subtracting the mean of mastery goals from the original raw scores on that variable. After doing the centering and including the centered variable in the regression analysis, we can interpret the intercept as the conditional mean for males scoring at the grand mean on mastery goals.

So let me show you how to do this. This is our mastery goals variable right here. The way that I obtained its mean is I first went to Analyze, Descriptive Statistics, Descriptives, clicked on mastery goals, moved it over, and clicked on OK, and I get the mean for mastery goals, which in this data set is 100. The next thing that I did was go under Transform, Compute Variable, and you can see right here I had already set it up. I created a new variable, which I'm calling mgoal centered, or mgoal.centered, and then I've got mgoal minus 100. When I click on OK, I'll just do it again, you can see that it essentially saved over what we had already done, and these are the same numbers as what we had before. So this is the mastery goals centered variable.

Now when I run the analysis, instead of including the original mastery goals variable as a predictor, which would produce the first set of results, I can include mastery goals centered right here. When I click on OK I get those results. You can see again we get our standard regression output, and then we have the same values for our regression slopes, standard errors, etc., as what we have in our text file, which is presented right here. Let me just again draw your attention to the fact that the intercept now is different because we've
included the mean centered mastery goals variable. This is the intercept following mean centering, and this is the intercept prior to mean centering, so those values are different. In this particular case, the set of results using the centered variable produces a more interpretable intercept. You'll also notice that the regression slopes for gender ID and mastery goals, and all the subsequent values, are exactly the same as what we had above. Let me also note that the slope for mastery goals would be interpreted as the predicted change on achieve for every one-unit increase on mastery goals.

Now when you have a categorical variable with more than two categories, things get a little bit more complicated. In this case you must create multiple dummy variables to represent group membership, with the number of dummy variables equal to, again, J minus 1, or the number of groups minus 1. When doing this you also need to establish a baseline, a reference category against which all other groups are compared. The regression coefficients for each dummy variable are equal to the difference in conditional means on the dependent variable between a specific group and the reference category. Let me just note that when we ran the analysis that included gender ID, the reference category was essentially male because it was coded 0.

So for example, let's say that we have a predictor called education level that has four ordered categories, where 1 is equal to very low, 2 is equal to somewhat low, 3 is equal to somewhat high, and 4 is equal to very high. To include education level as a predictor in a regression model predicting achieve, we will need to recode this variable into three dummy variables. Again, it's the number of groups minus one, and one group is going to be treated as the reference category. So below I demonstrate how to recode the original education variable into three dummy coded variables.
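
For readers who want to see the J minus 1 coding outside the SPSS menus, here is a minimal Python sketch of the same idea. It is only an illustration under stated assumptions: the function name `dummy_code`, the variable names, and the six sample values are made up for this example and do not come from the video's data set. The education codes (1 = very low through 4 = very high) and the choice of level 1 as the reference follow the transcript.

```python
# Hypothetical helper and sample data, for illustration only.

def dummy_code(values, levels, reference):
    """Return {level: [0/1, ...]} for every level except the reference.

    A case gets 1 on the dummy for its own level and 0 elsewhere;
    cases in the reference level get 0 on every dummy.
    """
    if reference not in levels:
        raise ValueError("reference must be one of the levels")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }

# Education level: 1 = very low, 2 = somewhat low,
# 3 = somewhat high, 4 = very high (made-up cases).
ed_level = [1, 2, 3, 4, 2, 4]

# J = 4 groups gives J - 1 = 3 dummy variables; level 1 is the reference.
dummies = dummy_code(ed_level, levels=[1, 2, 3, 4], reference=1)
sl, sh, vh = dummies[2], dummies[3], dummies[4]

# Print each case as (ed_level, SL, SH, VH).
for row in zip(ed_level, sl, sh, vh):
    print(row)
```

Passing `reference=4` instead would make the very high group the baseline, which changes what each slope is compared against but not the amount of grouping information carried by the three dummies.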
Education level 1 is going to be treated as the reference group and is going to be coded 0 across all three dummy variables. Education level 2, which is the somewhat low group, is going to be coded 1 0 0 across the three dummy variables. Education level 3, which is the somewhat high group, is going to be coded 0 1 0, and then education level 4, which is the very high group, is going to be coded 0 0 1. So we can see right here this is the original education level variable with codes of 1, 2, 3, and 4, and you'll notice that we have up here the dummy variables, and this is just demonstrating the system. For the very low group you'll notice that we have codes of 0 across the three dummy variables, for the somewhat low group we have codes of 1 0 0, the somewhat high group is coded 0 1 0, and then for the very high group we have 0 0 1. You'll notice that the information contained in the four groups is encapsulated in the three dummy variables. Also note that the naming system is arbitrary.

So the intercept for the regression model can now be interpreted as the conditional mean of the very low group. The regression slope for SL can be interpreted as the difference between the conditional means for the very low and somewhat low groups. The regression slope for SH can be interpreted as the difference between the conditional means for the very low group and the somewhat high group, and then the slope for VH can be interpreted as the difference in conditional means for the very low and very high groups.

So going back to our data set, we have our original ed level variable, and I've already got it set up in here where I've created those dummy variables. So there's the SL dummy variable, the SH dummy variable, and the VH dummy variable. Now how did I create these? Well, let me just go ahead and delete these and I'll show you. Basically, what I ended up doing was I used the Transform, Recode into Different Variables option. So I'm going to
click on that, and I'm going to reset this so you can see it play out. We'll move the ed level variable over to this box right here, and you'll notice at the top it says Numeric Variable, leading to the Output Variable. I have to create three dummy variables, so the first one that I'm going to create I'm just going to call SL, and I'm going to click on Change. You'll notice it says we're converting education level to SL. Next we'll click on Old and New Values, and this box comes up. On the education level variable the somewhat low group has a code of 2, so this is the old value of 2 on that variable, and I'm going to recode it into a new value of 1 on the new variable and then hit Add. Then I can do the same thing for the remaining groups, except that for our new values we will be using 0. So essentially we would say old value 1, Add, and so we're converting it to 0. Then we also have 3, then Add, converting it to 0, and then 4 to 0. So now you can see we have 1 to 0, 3 to 0, 4 to 0, and 2 to 1. Next we'll click on Continue and then on OK, and so now we get that variable SL.

So let's do it for the other two, and I'll show you a quicker way of making the conversion. Now I'm going to create the somewhat high dummy variable. In this case I can just leave this alone, type in a new name, and click Change, and so now we're creating the somewhat high dummy variable. We'll click on Old and New Values, and I'll show you a faster way of making these variables. In this case a value of 3 on the education level variable is the somewhat high group, and we're going to convert it to 1, again click on Add, and then I can click on All Other Values and make them 0. So now when I click on Add, there you go, and it's a lot faster. Now when I click on OK I get that variable, and we'll do the same
for the last one. We'll do Recode again, and in this case we'll make this VH, for very high, and click on Change. In this case I will just click on this button right here, and I can say the old value is 4, and it's again being changed to 1, and everything else is 0. In this case I'm just going to press the Change button right here, and there you go. Now when we click on Continue and then on OK, we have our three new variables. So we have our dummy variables for somewhat low, somewhat high, and very high.

Now before analyzing our data including those new dummy variables, I do want to mention that the reference category is an arbitrary choice by you as a researcher, or it may be a substantive choice, but it's basically up to you which group you want to treat as the reference category. You'll notice in this little system right here, just to demonstrate, that group 4, which is the very high group, is coded 0 across the three dummy variables, making that category the baseline or reference category. So with these other codes right here, we're essentially comparing the very lows against the very highs, here we would have the somewhat lows being compared against the very highs, and then the somewhat highs being compared against the very highs. That's how you would interpret the regression coefficients within the model.

So let's go ahead and run our analysis. We're going to leave gender ID in our model, we're also going to leave mastery goals centered in our model, and then include the three dummy variables. We'll go to Analyze, Regression, Linear right here, and we'll just add our three dummy variables. I'm going to move them over as well and click on OK. Now you can see our R square is higher, it's .409. We have again our F test, which indicates that the model is statistically significant, and then we have all of our regression coefficients for our intercept and our
predictors. Once again, the intercept is interpreted as the conditional mean on achieve when all the predictors in the model are 0. For the mgoal centered variable, a person scoring 0 is falling at the grand mean on the original variable. A person scoring 0 on gender ID is identified as male, and a person with values of 0 on the three dummy variables representing education level is falling into the very low education category. So taken all together, the intercept is the conditional mean for persons who (a) identify as male, (b) fall at the grand mean on mastery goals, and (c) fall into the very low education group.

The slope for mastery goals indicates that for every one-unit increment on this variable, the conditional mean for achieve increases by 0.838 units. The slope for SL is not statistically significant. You can see its p value is 0.638, which is right here, indicating no significant difference in the conditional means for the very low and somewhat low education groups. The slope for SH is not significant as well, indicating no significant difference in the conditional means for the very low and somewhat high education groups. Now the slope for the VH variable is statistically significant. The slope is 5.543, and you can see right here we have our p value, which indicates a significant difference between the very low and very high education groups. In other words, persons in the very high education group, based on the regression slope, scored 5.543 points higher on average than those in the very low group.

Okay, so that concludes our video on running and interpreting multiple regression with dummy variables in SPSS. Thanks for watching.
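
Two of the claims made in the video can be checked numerically with a short Python sketch: that with a single 0/1 predictor the intercept is the mean of the group coded 0 and the slope is the difference in group means, and that mean centering a continuous predictor changes the intercept but not the slopes. Everything here is an assumption made for illustration: the helper `ols`, the variable names, and the eight made-up cases are not the video's data, and the tiny normal-equations solver stands in for SPSS's regression procedure.

```python
# Hypothetical helper and made-up data, for illustration only.

def ols(rows, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

gender = [0, 0, 0, 0, 1, 1, 1, 1]            # 0 = male, 1 = female
mgoal = [96, 100, 104, 98, 102, 97, 105, 98]  # made-up mastery goals scores
achieve = [90, 95, 99, 94, 101, 96, 104, 99]  # made-up outcome scores

# Simple regression: achieve on gender only.
b0, b_gender = ols([[g] for g in gender], achieve)
males = [a for g, a in zip(gender, achieve) if g == 0]
females = [a for g, a in zip(gender, achieve) if g == 1]
mean_m = sum(males) / len(males)
mean_f = sum(females) / len(females)
print("intercept vs. male mean:", b0, mean_m)
print("intercept + slope vs. female mean:", b0 + b_gender, mean_f)

# Multiple regression with raw and mean-centered mastery goals.
mean_mgoal = sum(mgoal) / len(mgoal)
mgoal_c = [m - mean_mgoal for m in mgoal]
raw = ols(list(zip(gender, mgoal)), achieve)
cen = ols(list(zip(gender, mgoal_c)), achieve)
print("raw      [intercept, gender, mgoal]:", raw)
print("centered [intercept, gender, mgoal]:", cen)
```

In the centered fit the intercept equals the raw intercept plus the mastery goals slope times the mean of mastery goals, which is why it reads as the predicted score for males at the grand mean.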
Info
Channel: Mike Crowson
Views: 38,358
Rating: 4.8641977 out of 5
Keywords: SPSS, multiple regression, dummy coding, mean centering, least squares regression, regression analysis
Id: XGlbGaOsV9U
Length: 19min 17sec (1157 seconds)
Published: Fri Jun 14 2019