In this video, I'm going to show you how
to do a simple linear regression in SPSS. So regression is built off of correlation
in that it deals with the degree of relationship between two variables. But
it goes one step further in allowing us to predict the value of one variable if
we know another. The accuracy of the prediction will depend on the strength
of the correlation between those two variables. I cover some of the conceptual
background of regression in a video I made showing you how to do regression in Excel. So, you may want to take a look at that
for some of the background, and I will use the same example here of trying to
predict a person's IQ score if we know how much caffeine they consumed. Now I hope it goes without saying that my
example is totally made up and these data are made up. So, don't start drinking a
ton of coffee and expect your IQ to change very much. If there is a relationship between two variables like these, we can use the value of one variable to predict
the value of another. So, whatever variable we're using to
predict something else will be the X variable, and whatever variable we're
trying to predict will be the Y variable. In this case, we'll pretend that we're
using caffeine consumption to predict a person's IQ score. Ultimately, we're going to end up with
a regression equation structured like this: y = bx + a
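The equation itself is simple enough to write as code. Here is a minimal Python sketch of it; the function name `predict` and the coefficient values are hypothetical, chosen only for illustration and not taken from the video's output.

```python
def predict(x, b, a):
    """Regression equation y = b*x + a.

    x: value of the variable we already know (the predictor)
    b: slope, how much predicted y changes per one-unit increase in x
    a: y-intercept, the predicted y when x equals 0
    """
    return b * x + a


# Hypothetical coefficients, NOT the values from the video.
b, a = 0.05, 95.0
print(predict(0, b, a))    # at x = 0 the prediction is just the intercept a
print(predict(100, b, a))  # 0.05 * 100 + 95.0
```

Plugging in x = 0 always returns the intercept, which is exactly what "the point at which the line crosses the y-axis" means.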
In the equation y = bx + a, y is the value of the variable we're predicting, and b is the slope of the line, which we multiply by x, the value of the variable we already know. Finally, a is
the y-intercept: the value of y where the line crosses the y-axis, that is, when x
equals 0. So before we construct anything like
this equation in this format and start predicting people's IQ scores, we need some data. We need to see
previous instances of how caffeine consumption relates to IQ scores and
these are the data on that. So we see various people's daily
caffeine consumption and those same people's IQ scores. So we can use these data to
construct the regression equation and then predict the IQ scores for other people whose IQs we don't actually know. The
only thing we do know about them is how much caffeine they consumed. So once you have these data popped in it's pretty easy to run the regression in SPSS. You
need to go to 'Analyze' and then 'Regression'. Now there are many different kinds of
regression and the one we choose depends on what kind
of relationship exists between the two variables. It can be straight, it can be curved, it can be U-shaped in some way. We don't actually know for sure what kind of relationship
exists. So it might be a good idea to graph it first and take a look. There are a couple of different ways to
create graphs in SPSS, and I'm honestly not the biggest fan of SPSS's graphing capabilities, but let's take a stab at it anyway. That menu is right up here under 'Graphs'. I prefer going to 'Legacy Dialogs' and
then just picking the kind of graph you want, which would be a scatter plot ('Scatter/Dot'). There are different kinds; I like to go with 'Simple Scatter'. Hit 'Define'. Now here you just have to
move the variables over into either the Y or X axis. Remember, the variable
you're trying to predict is the Y variable, which here is IQ score, and the variable you're using to predict is the X variable, so that'd be caffeine dose.
Once you have those things popped in there in the right order, click 'OK' and this graph will pop up.
You can get the exact same result if you go to 'Graphs' and 'Chart Builder'. I don't really like this because I find
it a bit confusing, but you get the same result. You pick what kind of graph you want,
'Scatter/Dot', and then whichever one of these kinds visually depicts what you'd like, just click and hold and drag it up here
and then just drag your variables over. So I want Caffeine_Dose to be on the
x-axis, and I want IQ_Score to be on the y-axis. Once
you have it (this is just an example of what it looks like or what it might look
like), click 'OK' and you get a very similar looking graph. So there are two different ways to get basically the same thing. Either way you do it, you can kind of get
the sense that the relationship is roughly a straight
line, so I think a linear regression would be the most appropriate analysis. Go to 'Analyze', 'Regression', and pick 'Linear'. Then, pretty much
the same way you did the graphs, you just move the variables over. Caffeine_Dose is going to be your
independent variable, the thing you're using to predict. And IQ_Score is the dependent
variable, the thing you're actually predicting. Once you're ready, click
'OK'. As usual with SPSS, we get a lot more
than we really need. This first table can pretty much be ignored. There's really
nothing useful there. The second one, 'Model Summary', does have some useful things. 'R' is the correlation coefficient between the two
variables. It's about .92, which is a very strong
positive correlation, which we could have guessed from the graphs. The points are very tightly
clustered and move from the bottom left to the upper right, so that's a positive
correlation. We also have 'R Square', which is exactly
what it sounds like. It's the square of 'R', or 0.917 x 0.917, which comes to about .84. Another name
for this is the coefficient of determination. Whatever you call it, it can be used
as an indication of how well your regression equation fits your data: it's the proportion of variance in your
Y variable that is explained by the X variable. Generally, numbers closer to
one indicate a better fit, so just keep that in mind. You can ignore
'Adjusted R Square' for now, but take a peek at this thing called the
'Std. Error of the Estimate'. This is a measure of variability, kind of like the
standard deviation, which we've dealt with before. It tells you how much inaccuracy
you're going to get in your predictions, and for now all you really
need to know is that smaller numbers mean more accuracy and larger numbers
mean less accuracy. The ANOVA table can be useful, but I'm not going to focus on
it here. Instead, I just want to show you where you can go
to get the information you need to construct a regression equation, and it's
right here in the 'Coefficients' table, under 'Unstandardized Coefficients'. This
thing called '(Constant)' is your y-intercept: the predicted IQ
score of a person who gets zero caffeine. The number right below it is the
slope, so for every additional milligram of caffeine a person consumes, this is
how much their predicted IQ score increases in this example. You can plug these values into
your regression equation, y = bx + a,
and start calculating y and making some predictions. Now, there's a lot more you can
do in SPSS with regression; there are all these wonderful options here. You can do
multiple linear regression, where you have several predictor variables, and you can
even do nonlinear regression, but the stuff we just did with simple linear
regression should get you started.
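To see where the numbers in the 'Model Summary' and 'Coefficients' tables come from, here is a minimal sketch that computes them by hand with ordinary least squares, using only the Python standard library. It assumes a small, hypothetical caffeine/IQ dataset (not the data used in the video), and the function name `simple_linear_regression` is hypothetical too; this illustrates the standard formulas, not SPSS's internal implementation.

```python
import math

# Hypothetical illustration data, NOT the data from the video:
# daily caffeine in mg (X, the predictor) and IQ score (Y, what we predict).
caffeine = [0, 100, 200, 300, 400]
iq = [96, 100, 101, 104, 109]


def simple_linear_regression(x, y):
    """Ordinary least-squares fit of y = b*x + a.

    Returns (b, a, r, r_squared, see), where see is the standard
    error of the estimate.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length lists with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                   # slope: change in predicted y per unit of x
    a = mean_y - b * mean_x         # intercept: predicted y when x equals 0
    r = sxy / math.sqrt(sxx * syy)  # Pearson r (SPSS shows R >= 0; same here, since r > 0)
    r_squared = r ** 2              # coefficient of determination
    # Residual sum of squares: the part of y's variation the line misses.
    sse = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
    # Divide by n - 2 because two parameters (b and a) were estimated.
    see = math.sqrt(sse / (n - 2))  # standard error of the estimate
    return b, a, r, r_squared, see


b, a, r, r_squared, see = simple_linear_regression(caffeine, iq)
print(f"IQ = {b:.2f} * caffeine + {a:.2f}")
print(f"R = {r:.3f}, R Square = {r_squared:.3f}, SEE = {see:.3f}")
# Predict the IQ of a hypothetical new person who consumes 250 mg a day.
print(f"Predicted IQ at 250 mg: {b * 250 + a:.1f}")
```

With these made-up numbers the fit works out to b = 0.03 and a = 96, so someone consuming 250 mg a day would be predicted to score 0.03 x 250 + 96 = 103.5.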