V14.7 - Stepwise Multiple Regression in SPSS

Captions
In this video I'm going to show you how to conduct a stepwise multiple regression in SPSS. The example is based on research trying to predict cholesterol levels. Total cholesterol is the dependent variable, predicted on the basis of a large number of variables, including age, sex, SBP, BMI, cigarette smoking, physical activity, alcohol consumption, family history, and sleep. So we've got 1, 2, 3, 4, 5, 6, 7, 8, 9: nine independent variables that might be predictive of total cholesterol.

To conduct the analysis, click on Analyze, Regression, Linear. Put the dependent variable, total cholesterol level, in the Dependent box, and then put all of the independent variables in the Independent(s) box. Now, this is important: the default in SPSS is to do a Method: Enter multiple regression, which would force all of these variables into the predictive equation irrespective of whether they were significant contributors or not. So you need to click on the Method drop-down and then click on Stepwise. Also click on Statistics, where we want Part and partial correlations, and also click on Descriptives. Click Continue and click OK.

Here are the descriptive statistics, which in terms of means and standard deviations are not really important here. What I wanted to point out are the Pearson correlations. If you look at the magnitude of the correlations with the dependent variable, you can see that age has the largest Pearson correlation with total cholesterol level. It's a positive correlation. Because it is the largest, this variable is going to be included in the stepwise multiple regression first, and then SPSS is going to look for other variables to include after controlling for the effects of age.

So let's keep scrolling down, and here we have the model summary. Even though we had nine independent variables as possible candidates for inclusion in the regression equation, we can see that SPSS ultimately chose only seven of those predictors, so two of them were left out of the analysis. You can see that the first multiple R is 0.44, which corresponds
to that 0.44 correlation between age and TC (total cholesterol), so that is what was selected first, as you'd expect. The R-squared value increases a little bit with the inclusion of each additional independent variable, and the increase becomes progressively smaller.

SPSS lists the variables that are included in each model. There's a little letter here (a, b, c, d, e, f, g), and you can see which predictors each refers to. "Predictors: (Constant)", that's the intercept, and age: that's the first model. Then model 2 has age and sleep, so sleep was chosen as the second variable to include in the model. We can see that sleep had a correlation of 0.21, so it actually wasn't the biggest remaining correlation: 0.29 was BMI's correlation with TC. But stepwise isn't looking for the biggest correlation for the second model. It's actually looking for the biggest partial correlation, and that's not shown here, but that is what was found: sleep, the variable with the 0.21 correlation, had the biggest partial correlation with total cholesterol, controlling for the effects of age.

So that is a basic model summary, with the model R increasing across the seven models, as well as the adjusted R-squared, along with the standard error of the estimate. The standard error of the estimate is decreasing as the percentage of variance accounted for in the dependent variable is increasing, as you'd expect.

Now here's the ANOVA table, and again there are seven models, with statistics for models one through seven. Each model has an F value, and these F values are testing the statistical significance of the model R, and each one is statistically significant. Once we get to the bottom, the seventh model, which is the one we would report in this type of analysis, we have an F value of 79.64, and with 7 and 992 degrees of freedom (992 in the residual) we have p < .001. So this R-squared value, and the R value, is statistically significant. And here's the adjusted R-squared, which would be more accurate as an estimate in the population.

Next we have the coefficients table, and you can see again it
increases: model 1, 2, 3, 4, 5, 6, 7, and that's because we have seven independent variables ultimately included in the model. Now, you typically would not report any of these. You wouldn't even consult the models that precede the final model; you'd pretty much just go straight to the final model. We can see that they are all significant predictors of total cholesterol, the dependent variable.

Here are the unstandardized beta weights, and here are the standardized beta weights. I simulated these data, so don't interpret these unstandardized beta weights thinking that they mean anything. They don't. In this particular study I didn't have enough information to get accurate unstandardized beta weights, but the standardized beta weights are probably fairly close to accurate. We can see that age was associated with a positive standardized beta weight of 0.430, which we would interpret as statistically significant. The semipartial correlation over here is 0.412, so we could square that and report it as a percentage of variance, much like we would for any other multiple regression.

Now, we can see here that the standardized beta weight is decreasing in magnitude, because as more and more predictors are added to the model there's less and less variance to predict, and SPSS is algorithmically trying to find the next best predictor. So naturally the magnitude is decreasing across the models. Sex was the last variable included in the model, with a standardized beta of 0.102 and a semipartial correlation of 0.101. I note that the standardized beta weights and the semipartial correlations are quite similar in this case, and that's because the correlations between the independent variables were rather weak in this study. Surprisingly, there isn't much of a correlation between these variables, at least based on the research I looked at.

So when reporting a stepwise multiple regression, you'd probably only report this final model, at least in the main text, and maybe you would report these earlier models
in the supplementary materials.

Now, the excluded variables. As the model number increases, the number of variables excluded from the model decreases. You can see that sex, which was the last variable included in the model, was excluded in model six, but then it's not appearing as an excluded variable in model seven. Instead, only two variables have been excluded from the multiple regression: physical activity and alcohol consumption. Now, physical activity was very nearly statistically significant. Had this been p less than .05, the multiple regression would have created yet another model and then excluded only alcohol consumption. But it wasn't significant, so the analysis did not identify a statistically significant increase from adding physical activity to the model, and therefore it excluded it, in addition to alcohol consumption.

So that is a stepwise multiple regression in SPSS.
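The entry logic described in the video (pick the candidate with the biggest partial correlation with the dependent variable, and stop when the next candidate is not significant) can be sketched outside SPSS. The Python below is a minimal illustration, not SPSS's implementation. It makes several assumptions: the data are made up, the names `rss`, `forward_stepwise`, `data`, and `tc` are hypothetical, and a fixed F-to-enter cutoff of 3.84 (roughly p < .05 for one added predictor in a large sample) stands in for the probability-of-F criterion. It covers only the entry step; SPSS's stepwise method also re-tests variables already in the model for removal, which is omitted here. Choosing the candidate with the largest F change is equivalent to choosing the one with the largest squared partial correlation.

```python
import random


def rss(columns, y):
    """Residual sum of squares from an OLS fit of y on an intercept plus columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(row[r] * row[c] for row in X) for c in range(p)]
         + [sum(row[r] * yi for row, yi in zip(X, y))]
         for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                factor = A[r][k] / A[k][k]
                A[r] = [a - factor * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def forward_stepwise(predictors, y, f_to_enter=3.84):
    """Add predictors one at a time by largest F change (assumed cutoff: 3.84)."""
    n = len(y)
    selected = []
    remaining = list(predictors)
    current_rss = rss([], y)  # intercept-only model
    while remaining:
        best_name, best_f, best_rss = None, 0.0, None
        for name in remaining:
            cols = [predictors[s] for s in selected] + [predictors[name]]
            new_rss = rss(cols, y)
            df_resid = n - len(cols) - 1
            # F change for adding this one predictor to the current model.
            f_change = (current_rss - new_rss) / (new_rss / df_resid)
            if f_change > best_f:
                best_name, best_f, best_rss = name, f_change, new_rss
        if best_name is None or best_f < f_to_enter:
            break  # no remaining candidate meets the entry criterion
        selected.append(best_name)
        remaining.remove(best_name)
        current_rss = best_rss
    return selected


# Made-up data: age has a strong effect, sleep a weaker one, alcohol none.
random.seed(0)
n = 300
data = {name: [random.gauss(0, 1) for _ in range(n)]
        for name in ("age", "sleep", "alcohol")}
tc = [0.8 * a + 0.4 * s + random.gauss(0, 1)
      for a, s in zip(data["age"], data["sleep"])]

selected = forward_stepwise(data, tc)
print(selected)
```

In this simulation age enters first and sleep second, mirroring the order in the video. Whether the pure-noise `alcohol` variable enters depends on the random draw, since a null predictor still passes a .05-style criterion about one time in twenty.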
Info
Channel: how2statsbook
Views: 5,872
Rating: 5 out of 5
Id: 2W6KoBcQSFg
Length: 7min 22sec (442 seconds)
Published: Sun Mar 03 2019