Welcome to the second video in SPSS
for Beginners from RStats Institute at Missouri State University. In our
first video, we learned how to create variables in SPSS. The next step is to
add some data, and we're going to begin in Data View. Here in Data View these are
the same four variables that we created in the first video. So now we can add
some numbers. Pause the video and enter these same numbers into your SPSS
spreadsheet. Now that we have numbers, it's important
to understand just what these data represent. The first column is a random
identification number. It stands in for the names of the participants and keeps
our data anonymous. This second variable is gender. And these last two columns
represent the height and the weight for each participant. Even after you've named
a variable, it's possible to change the variable names. Double-click on a
variable name to change it. When you do, you will be taken to Variable View, which
is where you actually will make the changes. We set the measure for each
variable previously. The ID variable is nominal because the number is only a label.
It stands in for a participant's name. The variable "gender" is also nominal, and we
are going to code gender as 1 and 2, for Male and Female. When a categorical
variable has only two categories, we call it "dichotomous." So the 1 and the 2 are
categories. You can be in one category or the other. You can't be in both; you can't
be in neither. These last two columns represent height and weight. Height and
weight are both quantitative variables, not categorical. They are measuring
something. They both have fixed intervals between the scores, and they both have a
meaningful zero. Both height and weight are set to Scale because they are both
ratio level. Before we begin analyzing these numbers, there is one other thing
that we should do. For a variable like gender - where the categories are coded as 1
and 2 - we don't want to get confused about who was male and who was female; what number
stood for what. And so we are going to assign value labels for each level of
this categorical variable. Click on Values. I'm going to tell SPSS to
represent all of the 1's as Male and all of the 2's as Female. Of course,
I could make 2 = Female or 0 = Female; really any number that I
wanted to, depending on the coding. The 1 does not mean that males are "first place."
The 2 does not mean that females are twice as good. The number is only a
placeholder; it does not indicate an order or a quantity. So, now click OK. In fact,
when I return to Data View, you can see the numbers, but watch this: you see
this button? Click it and you can toggle between numbers and value labels. Let's
leave this set with the value labels on. It's just easier that way. Now we
can look at our data. The height of our participants was measured in inches, and
we have values between 60 and 70 inches, which is between five and six feet tall
(1.5 to 1.8 meters). Weight was measured in pounds. Of course it might be easier
to see the range if we sorted these data. Ctrl-click the weight column heading on Mac, or
right-click it on PC, and choose "Sort Ascending." All of these participants were between 116 and 153
pounds. Just for illustration, I'm going to pretend that there were two participants
for whom we did not get their height or their weight; both of them female. Notice
that when the numbers are toggled on, all that I need to do is type "2"; however,
when the value labels are toggled on, I need to double-click and select "Female."
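If it helps to see the idea behind value labels outside of SPSS, here is a minimal Python sketch. It is not SPSS syntax: the names GENDER_LABELS and show are made up for illustration, and the rows are invented. The point is only that the stored data are always the codes, and the labels are a display layer that you can toggle.

```python
# Hypothetical stand-in for SPSS value labels: codes are stored, labels are displayed.
GENDER_LABELS = {1: "Male", 2: "Female"}


def show(codes, labels, use_labels=True):
    """Return the codes as typed, or their labels, like the toggle in Data View."""
    if not use_labels:
        return list(codes)
    # A code with no label defined is shown unchanged.
    return [labels.get(code, code) for code in codes]


gender_codes = [1, 2, 2, 1]  # made-up rows, not the data from the video

print(show(gender_codes, GENDER_LABELS, use_labels=False))  # [1, 2, 2, 1]
print(show(gender_codes, GENDER_LABELS))  # ['Male', 'Female', 'Female', 'Male']
```

Either way the underlying values are still 1 and 2, which is why typing "2" and selecting "Female" end up as the same entry.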
So now I think we're ready to analyze these data. One of the simplest things
that we can do is to count up how often things occur. For example, we want to
know how many males and females were in our sample. We want their frequencies. This is easy enough to do in SPSS. We're going
to use the Analyze menu. Whenever you run an analysis in SPSS, you use the Analyze
menu. We can see that there are lots of options, each with their own sub-menus
and sub-sub-menus. The one that we want is Analyze -> Descriptive Statistics ->
Frequencies. This window pops up and you will see lots of windows of this type in
SPSS. All of the variables that we have in our dataset are on the left, and the
variables that we want to analyze go on the right. You can select a variable for
analysis by clicking on its name and then clicking on this arrow between the
boxes. Alternatively, you can also drag-and-drop, and in some cases you can
double click. Let me show you just how easy it is to use SPSS: click OK. What we
are seeing now is the output window, and here is something very important to know
about SPSS, especially compared with other types of statistical software: SPSS
will give you copious amounts of output, often more than you really need, and you
need to know how to interpret that output. In SPSS, it is easy to run an
analysis, but it takes some education to learn how to interpret the output. First,
we see a summary of the variables in the box labeled "Statistics." We have 12 valid
scores for gender with no missing data, but for height, we only have scores for
10 people, with 2 missing values. The valid sample size is the number of
participants for whom we actually have scores. This first frequency table is for
gender. The total tells us that we have 12 valid scores. We see that there are
5 males and 7 females. Notice the columns for "Percent" and "Valid Percent."
They are exactly the same. They are the same because we have no missing values
for gender. This second frequency table is for height. Remember that we have
missing values for height for two of our participants, so we see the valid total
is 10. Two values are missing in the data set - called system missing - and the total
is 12. We see that the Percent column is different than the Valid Percent column.
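Here is a rough Python sketch of how those two columns can differ; it is not what SPSS runs internally. The gender counts come from our sample, but the individual heights are hypothetical values picked for illustration, None stands in for a missing score, and frequency_table is a made-up helper.

```python
from collections import Counter


def frequency_table(values):
    """Count each valid value; None marks a missing score."""
    valid = [v for v in values if v is not None]
    total_n, valid_n = len(values), len(valid)
    rows = {}
    if valid_n == 0:
        return rows, valid_n, total_n
    for value, count in sorted(Counter(valid).items()):
        rows[value] = {
            "frequency": count,
            # Percent uses every case, including the missing ones.
            "percent": round(100 * count / total_n, 1),
            # Valid Percent uses only the cases that have a score.
            "valid_percent": round(100 * count / valid_n, 1),
        }
    return rows, valid_n, total_n - valid_n


gender = [1] * 5 + [2] * 7  # 5 males and 7 females, no missing values
# Hypothetical heights in inches: 10 valid scores and 2 missing.
heights = [62, 63, 64, 64, 65, 66, 67, 68, 69, 70, None, None]

gender_rows, gender_valid, gender_missing = frequency_table(gender)
height_rows, height_valid, height_missing = frequency_table(heights)

print(gender_rows[1])  # percent and valid_percent are both 41.7
print(height_rows[64])  # percent 16.7, valid_percent 20.0
```

With no missing values the two denominators are the same, so the columns match; with two missing heights they are 12 and 10, so the columns differ.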
The Percent column is calculated based on the total sample size of 12; the Valid
Percent is calculated on the valid n of 10 people for whom we actually have
data. I recommend reporting the valid percent column unless you have a
specific reason why you need to report Percent. Well, this is a good start, but we
can do better. Let's make some pictures of our data. I am going to run another
analysis and I want you to see that you do not need to go back to the data set.
You can run a new analysis from the output window, as well. Just click on
Analyze -> Descriptive Statistics -> Frequencies. You can see that our
previous analysis is still in the window. We could clear it by clicking on this
Reset button, but let's just continue with these data. So this time, click on
Charts, and then under Chart Type, click on Bar Charts. Let's change Chart
Values to Percentages. Click Continue, but before you click OK,
let's turn off the frequency tables because we already have those. Now, click
OK. In the output window, we see that the
chart for gender looks really good. We have two distinct bars, one for male and
one for female, and we can estimate the percentages of each. But when we look at
the bar chart for height, the options just don't look as good. We can
definitely do better. Let's run another analysis. Click on Analyze ->
Descriptive Statistics -> Frequencies. This time, click on Charts, and then under
Chart Type, click on Histogram. Let's also choose "Show normal curve on histogram."
Notice that the chart values are now gray, because we don't need them. Click
Continue, but before clicking OK, let's do one more thing. Click on Statistics. Here
we can choose other options like the mean, the standard deviation, the minimum,
and maximum. We could also get the variance, the standard error of the mean,
and the sum is good, too. As you see, we can pick as many of these options as we
would like. If we change our mind, we can unselect them, too.
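For reference, the options in this dialog correspond to familiar formulas. Here is a minimal Python sketch, assuming hypothetical heights chosen to be consistent with the summary values reported in this video (the individual scores are not listed in the narration) and using the sample formulas that divide by n - 1.

```python
import math
import statistics

# Hypothetical heights in inches for the 10 participants with valid scores.
heights = [62, 63, 64, 64, 65, 66, 67, 68, 69, 70]

n = len(heights)
mean = statistics.mean(heights)
sd = statistics.stdev(heights)           # sample standard deviation (n - 1)
variance = statistics.variance(heights)  # sample variance (n - 1)
se_mean = sd / math.sqrt(n)              # standard error of the mean

print("N:", n)
print("Mean:", mean)             # 65.8
print("Std. deviation:", round(sd, 3))
print("Variance:", round(variance, 3))
print("Std. error of mean:", round(se_mean, 3))
print("Minimum:", min(heights))  # 62
print("Maximum:", max(heights))  # 70
print("Sum:", sum(heights))      # 658
```

The sum divided by the valid n gives the mean, which is a quick way to check that the output is based on the cases you expect.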
Click Continue and then OK. In the output window, we see all of the statistics that
we asked for. For example, the average height was 65.8 inches. The tallest
person? 70 inches tall. The shortest? 62 inches tall. If we added up all of their
heights, they would total 658 inches. But notice this first histogram for gender.
It just doesn't look good, not like it did with the bar chart. The bars are
connected, but gender is supposed to be discrete categories. We no longer see the
labels for males and females. And the normal curve makes absolutely no sense.
On the other hand, the histogram for height is much improved. The bars touch,
indicating that the data are connected, and the superimposed normal curve makes
sense with these data. We can see that the shape of the data matches reasonably
well with a normal distribution. The important thing to learn here is that
you should choose the statistics and the graphs that are appropriate to your data.
A nominal variable like gender should be reported with frequencies and a bar
chart. Scale variables like height should be reported with a mean, standard
deviation, and a histogram. We know that the average height for all participants
is 65.8 inches, but what if we want to split that by males
and females? Let me show you how. Click on Analyze, but instead of Descriptive
Statistics, choose Compare Means and this first option, simply labeled, "Means." Here
we have the options for dependent variables. "Layers" refers to the
independent variable, or categorical variable. We did not really assign people
to the condition called gender, so gender would really be what is called a "quasi
independent variable." Still, we will use gender as our independent variable. We
want to examine differences in height, so height will be the dependent variable.
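The idea behind this comparison can be sketched in Python: group the dependent variable by the categorical variable, then summarize each group and the total. This is only an illustration, not the SPSS procedure itself. The rows are hypothetical (5 males, 5 females with heights, and 2 females with missing heights), and means_by_group is a made-up helper.

```python
import statistics
from collections import defaultdict

LABELS = {1: "Male", 2: "Female"}

# Hypothetical (gender code, height in inches) rows; None marks a missing height.
rows = [
    (1, 65), (1, 67), (1, 68), (1, 69), (1, 70),
    (2, 62), (2, 63), (2, 64), (2, 64), (2, 66),
    (2, None), (2, None),
]


def means_by_group(rows):
    """Return {group: (n, mean, sd)} plus a 'Total' row, skipping missing scores."""
    groups = defaultdict(list)
    for group, score in rows:
        if score is not None:
            groups[group].append(score)
    report = {}
    for group, scores in sorted(groups.items()):
        report[LABELS[group]] = (
            len(scores),
            round(statistics.mean(scores), 1),
            round(statistics.stdev(scores), 2),
        )
    all_scores = [s for scores in groups.values() for s in scores]
    report["Total"] = (
        len(all_scores),
        round(statistics.mean(all_scores), 1),
        round(statistics.stdev(all_scores), 2),
    )
    return report


report = means_by_group(rows)
for name, (n, mean, sd) in report.items():
    print(name, n, mean, sd)
```

Cases with a missing height drop out of their group, which is why each gender ends up with 5 cases even though 7 females are in the data set.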
Now click OK. We can see the means and the standard deviations for males and
females separately and together. There are 5 each for males and females, 10
total. We see that males were a few inches taller on the average than
females. The total mean and standard deviation
here are the same as the values that we got earlier using the Frequencies
command. Overall, frequency counts, charts, and
descriptive statistics are a great way to take a peek at your data and see just
what you have. It's a good idea to do this before running any other kind of
analysis. In our next video, we will look a little bit more at these
descriptive statistics and how to convert raw scores into z-scores. I'll
see you then.