How to do Simple Linear Regression in SPSS (14-5)

Captions
We are now ready to calculate a simple linear regression in SPSS. Before we begin, I would like to say that I assume you've watched the previous video about doing regression by hand. That video explained what we are doing using a simple data set, and it explained the assumptions that underlie the test. If you have not yet watched it, and you really want to understand what we are doing and why we are doing it, I'd encourage you to go back and watch it first. It will really make a difference in understanding this lesson.

We will be computing a simple linear regression in SPSS using the data set Job Satisfaction.sav, in which an administrator at a mental health clinic was interested in predicting job satisfaction from burnout. These are data from 200 mental health counselors. We will predict job satisfaction among these counselors using their level of burnout, which is a loss of enthusiasm for the job. So the predictor, or x-value, is burnout. Burnout predicts job satisfaction. Theoretically, high levels of burnout would predict low levels of job satisfaction.

So let's turn to SPSS. Here we are in SPSS using the data set Job Satisfaction.sav. Let's begin with a scatterplot to make sure that we have some semblance of a linear relationship between these two variables. Go to 'Graphs' > 'Chart Builder...' In the gallery, click on 'Scatter/Dot,' then drag 'Simple Scatter' onto the canvas. In the drop zones for the simple scatterplot, move Burnout to the x-axis and Job Satisfaction to the y-axis, and now click OK. What I see here in the scatterplot looks like a negative correlation: the dots generally go down and to the right. But let's add the regression line. Double-click anywhere in the scatterplot to open the chart editor. In the chart editor, add a regression line by clicking on the icon 'Add Fit Line at Total.' You know that you have the right one if you hover your mouse over the icon and SPSS shows a tooltip that says 'Add Fit Line at Total.' Click.
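The fit line SPSS draws is an ordinary least-squares line. As a rough sketch of what that means, here is the same calculation done outside SPSS in Python with numpy, using made-up burnout and satisfaction scores (not the Job Satisfaction.sav data):

```python
import numpy as np

# Made-up burnout (x) and job satisfaction (y) scores, chosen to fall
# exactly on a negative line; NOT the actual Job Satisfaction.sav data.
burnout = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
satisfaction = np.array([200.0, 180.0, 160.0, 140.0, 120.0])

# np.polyfit with degree 1 returns the least-squares slope and intercept,
# i.e., the line that 'Add Fit Line at Total' draws through the cloud.
slope, intercept = np.polyfit(burnout, satisfaction, 1)
print(slope, intercept)  # slope is negative: satisfaction falls as burnout rises
```
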
And now you can close the chart editor, and your scatterplot is complete. Now let's check some assumptions while we're here. Remember, I covered these assumptions in detail in the previous video about doing regression by hand. In the scatterplot the data look linearly related and negative: as burnout goes up, job satisfaction goes down. Also, the spread of the data is similar all along the regression line; it is not cone-shaped or curved. So we have established homoscedasticity and linearity. We do not need to check for collinearity because there is only one predictor variable. Collinearity occurs when we have multiple predictors, some of which are correlated among themselves; that is not a problem when we only have one predictor.

So now we can conduct the regression analysis. Go to 'Analyze' > 'Regression' > 'Linear.' This opens the dialog box for regression. We are using burnout to predict job satisfaction, so move the variable Burnout into the Independent(s) window, then move the variable Job Satisfaction into the Dependent window. The method should be left set to Enter. Now we need some additional settings. Click on 'Statistics...' In this box we want to get confidence intervals, descriptives, Durbin-Watson, and casewise diagnostics. Now click Continue. Click on 'Plots.' Move *ZPRED (the standardized predicted values) to X, and move *ZRESID (the standardized residual errors) to Y. Then check 'Histogram' and 'Normal probability plot.' Click Continue and OK.

We are going to interpret the output, but first we should check the rest of the assumptions for regression. As I explained in the previous video about assumptions for the test, regression is very sensitive to outliers. We can check for outliers in the Residuals Statistics box. We will look at the standardized residuals, which, because they are standardized, can be interpreted like z-scores. The minimum value for the standardized residuals should not fall below -3.29, and the maximum should not exceed +3.29. If they do, you have outliers.
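That ±3.29 screen is easy to mimic outside SPSS. A minimal sketch in Python with numpy and invented data: fit the line, scale the residuals to z-score-like units, and flag anything beyond the cutoff. (This divides by the standard error of the estimate, a simplified version of what SPSS reports as standardized residuals.)

```python
import numpy as np

def flag_outliers(x, y, cutoff=3.29):
    """Fit a least-squares line, standardize the residuals, and return
    the indices of cases whose standardized residual exceeds the cutoff
    in absolute value (the +/-3.29 rule)."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    # Standard error of the estimate: sqrt(SSE / (n - 2))
    s = np.sqrt(np.sum(residuals ** 2) / (len(x) - 2))
    z = residuals / s
    return np.where(np.abs(z) > cutoff)[0]

# Invented data: 30 points on an exact line, with case 15 bumped way up.
x = np.arange(30.0)
y = 100.0 - 2.0 * x
y[15] += 80.0
print(flag_outliers(x, y))  # only case 15 is flagged
```
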
If we had outliers, we would stop now with the interpretation, go back to the data set, examine the scatterplot, and identify the outlier or outliers. You could then delete the entire case, or you could winsorize the outlier so that it has the same value as the next highest non-outlier value in the data set. And of course, if the outlier was a data entry error, we would just fix it and then rerun the regression. Okay, good. No outliers.

Let's check for independence of observations, which we do by examining for independence of errors using the Durbin-Watson test. The Durbin-Watson statistic is close to 2. Good. We don't want it less than 1 or greater than 3, so the assumption of independence of observations has also been met. The only other thing to check for is normality, which we can see in the P-P plot here at the end. The dots generally line up along a 45-degree line, so we have normality of residuals, and we can see that our dependent variable of job satisfaction is also nicely normally distributed. The scatterplot of the standardized residuals versus the predicted values is elliptical, as it should be; there is no pattern here, so all of the assumptions have been met. The data look great, nothing to fix or transform. It's almost like I planned it that way.

But now we are ready to interpret the regression. Going back to the top of the output window, we see Descriptive Statistics. The descriptive statistics will be useful later when we write up the model. We know the means, standard deviations, and that all 200 cases were used in the model. The correlation matrix shows us that the variables correlate at -.65; that is a moderately strong, negative correlation. The Variables Entered/Removed box tells us that the only predictor in the model was burnout, and the only dependent variable was job satisfaction. We used the Enter method, no surprise there. In the Model Summary we get some other useful information.
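By the way, the Durbin-Watson statistic read off the output above has a simple formula: the sum of squared differences between successive residuals, divided by the sum of squared residuals. A quick sketch in Python with numpy, just to show where the number comes from:

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals over the sum of squared residuals. Values near 2
    suggest independent errors; values near 0 or 4 suggest
    autocorrelated errors."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Residuals that flip sign every step (strong negative autocorrelation)
# push the statistic toward 4:
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
# Residuals drifting steadily in one direction push it toward 0:
print(durbin_watson([1.0, 1.2, 1.4, 1.6]))
```

The "not less than 1, not greater than 3" rule of thumb from the video is applied to this same 0-to-4 scale.
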
First, we get our correlation, which is .65; of course, we already knew that from the correlation matrix. We also get the R-squared, which is .423. So how do we evaluate the R-squared? Remember that we used a lowercase r-squared in bivariate correlation, as what we call the coefficient of determination. It was the proportion of variability in the outcome variable that was explained by the predictor variable. Here we have a capital R-squared statistic. The R-squared statistic is still the coefficient of determination, but in regression we typically use multiple predictor variables to predict an outcome. The capital R-squared is the multivariate equivalent of the lowercase r-squared from bivariate correlation. It still tells us the proportion of variance in job satisfaction accounted for by burnout. The R-squared for this equation is .423; that means 42.3% of the variance in job satisfaction was predicted from level of burnout. So again, capital R-squared means multiple predictors; lowercase r-squared, one predictor.

Now this is interesting. I thought that we were running a linear regression, but here we have an ANOVA summary table. Aren't those two different tests? Remember that regression, ANOVA, t-tests, and correlation are all part of the general linear model. So this ANOVA is just another way of looking at our regression model, and what it tells us is that our model with one predictor works better than simply predicting using the mean. The significance value here means that our model using burnout as a predictor was significantly better than prediction without burnout in the model. There is a statistically significant relationship between my predictor and the outcome variable. The notes underneath the table show us the dependent variable and the predictors: burnout is being used to predict job satisfaction. If this ANOVA were not significant, then the predictors you chose did not contribute to predicting the outcome any better than just using the mean.
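With only one predictor, the capital R-squared really is just the lowercase r squared. A quick check in Python with numpy and invented scores: compute Pearson's r, then compute R-squared from the regression sums of squares, and the two agree.

```python
import numpy as np

# Invented predictor and outcome scores; NOT the video's data set.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 12.0])
y = np.array([50.0, 44.0, 45.0, 38.0, 30.0, 22.0])

# Pearson correlation between predictor and outcome
r = np.corrcoef(x, y)[0, 1]

# R-squared straight from the regression: 1 - SS_residual / SS_total
slope, intercept = np.polyfit(x, y, 1)
ss_res = np.sum((y - (intercept + slope * x)) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# With a single predictor, R-squared equals r squared
print(abs(r ** 2 - r_squared) < 1e-9)  # True
```
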
Now we get to the coefficients. This is an important box. The ANOVA told us that the model works, but now we figure out how it works. There are two types of coefficients in regression: standardized and unstandardized. I have another video in which I go into detail about the differences and how to interpret them. For now, we're going to stick with the unstandardized coefficients.

So now we need to find a and b, so that we can plug them in and create our regression equation. You notice a column with a b, but no column for a. Actually, both a and b are in this column labeled capital B. Remember that a is a constant, so this value of a is 235.459; that will be our starting point. The b value is -2.112. Now the b coefficient has a t-value associated with it. This is a t-test to see if adding this variable as a predictor improves the predictive ability of the model. Of course, this would be more useful if we had multiple predictors, some of which were significant and others that were not. If the t-test for a coefficient is not statistically significant, then that tells you that this predictor does not add to your model, so ignore it. If, on the other hand, the t-test is significant, as it is here, then look at the coefficient. Look at the sign. Is it positive or negative? That tells you whether the dependent variable will increase or decrease due to an increase in the predictor.

So here is how you interpret a coefficient: for every one-unit increase in the predictor variable, the outcome variable will change by the unstandardized coefficient value. This coefficient is negative, so we would read it as: for every one-unit increase in burnout, job satisfaction will decrease by 2.112 points. The standardized beta would be interpreted as: for every one standard deviation increase in burnout, job satisfaction will decrease by .65 of a standard deviation. And by the way, does that negative .65 look familiar? Where have we seen that before?
Oh yeah, that was the correlation between the two variables. And what is correlation again? It's the standardized covariance of two variables, and beta is the standardized coefficient of the relationship between two variables. Is it all starting to make sense now? Cool.

So now that we have our a and b coefficients, we can use our regression equation for prediction. The regression equation for predicting job satisfaction from burnout is ŷ = 235.46 - 2.11x. We can use this equation. Four more employees are measured for burnout. What do you predict will be their job satisfaction? Using our regression equation, plug in these x-values: 25, 50, 70, and 120. Multiply each by 2.11, subtract that from 235.46, and feel free to stop the video while you work out each of these equations. Let's see if you got all of them right. For 25, the predicted value is 182.71. For 50, it's 129.96. For 70, it's 87.76. And for 120, it is -17.74. That -17 looks a little suspicious; we're going to talk about that more in the next video reviewing how to interpret regression output.

Here is a sample APA write-up for bivariate linear regression. We explain what test was done and what the results were, and we include the regression equation, the R-squared, and the interpretation of what that means. And that is how you do a simple linear regression.
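The four practice predictions are easy to verify in a couple of lines of Python, using the fitted equation from the video:

```python
# Fitted model from the video: y-hat = 235.46 - 2.11 * burnout
def predict_satisfaction(burnout):
    return 235.46 - 2.11 * burnout

for score in (25, 50, 70, 120):
    print(score, round(predict_satisfaction(score), 2))
# 25 -> 182.71, 50 -> 129.96, 70 -> 87.76, 120 -> -17.74
```

The impossible -17.74 for a burnout score of 120 is a reminder that the equation only holds within the range of the original data; extrapolating beyond it can produce nonsense.
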
Info
Channel: Research By Design
Views: 134,341
Rating: 4.8917975 out of 5
Keywords: Todd Daniel, statistics, flipped classroom, beginners, introduction, Research by Design, how to, research, linear regression, simple regression, regression, Method of Least Squares, SPSS, Durbin-Watson, collinearity, tolerance, VIF, heteroscedasticity, assumptions, slope
Id: 6xcQYmPDqXs
Length: 16min 6sec (966 seconds)
Published: Thu Apr 27 2017