02 Descriptive Statistics and Frequencies in SPSS – SPSS for Beginners

Video Statistics and Information

Captions
Welcome to the second video in SPSS for Beginners from RStats Institute at Missouri State University. In our first video, we learned how to create variables in SPSS. The next step is to add some data, and we're going to begin in Data View. Here in Data View, these are the same four variables that we created in the first video. So now we can add some numbers. Pause the video and enter these same numbers into your SPSS spreadsheet.
Now that we have numbers, it's important to understand just what these data represent. The first column is a random identification number. It stands in for the names of the participants and keeps our data anonymous. This second variable is gender. And these last two columns represent the height and the weight for each participant. Even after you've named a variable, it's possible to change the variable name. Double-click on a variable name to change it. When you do, you will be taken to Variable View, which is where you will actually make the changes.
We set the measure for each variable previously. The ID variable is nominal because the number simply stands in for a participant's name. The variable "gender" is also nominal, and we are going to code gender as 1 and 2, for Male and Female. When a categorical variable has only two categories, we call it "dichotomous." So the 1 and the 2 are categories. You can be in one category or the other. You can't be in both; you can't be in neither.
These last two columns represent height and weight. Height and weight are both quantitative variables, not categorical. They are measuring something. They both have fixed intervals between the scores, and they both have a meaningful zero. Both height and weight are set to Scale because they are both ratio level. Before we begin analyzing these numbers, there is one other thing that we should do.
For a variable like gender, where the categories are coded only as 1 and 2, we don't want to get confused about who was male and who was female, or which number stood for what. And so we are going to assign value labels for each level of this categorical variable. Click on Values. I'm going to tell SPSS to represent all of the 1's as Male and all of the 2's as Female. Of course, I could make 2 = Female or 0 = Female; really any number that I wanted to, depending on the coding. The 1 does not mean that males are "first place." The 2 does not mean that females are twice as good. The number is only a placeholder; it does not indicate an order or a quantity. So, now click OK. In fact, when I return to Data View, you can see the numbers, but watch this: you see this button? Click it and you can toggle between numbers and value labels. Let's leave this set with the value labels on. It's just easier that way.
Now we can look at our data. The height of our participants was measured in inches, and we have values between 60 and 70 inches, which is between five and six feet tall (1.5 to 1.8 meters). Weight was measured in pounds. Of course, it might be easier to see the range if we sorted these data. Ctrl-click on Mac or right-click on PC and choose "Sort Ascending." All of these participants were between 116 and 153 pounds. Just for illustration, I'm going to pretend that there were two participants for whom we did not get their height or their weight, both of them female. Notice that when the numbers are toggled on, all that I need to do is type "2"; however, when the value labels are toggled on, I need to double-click and select "Female."
So now I think we're ready to analyze these data. One of the simplest things that we can do is to count up how often things occur. For example, we want to know how many males and females were in our sample. We want their frequencies. This is easy enough to do in SPSS. We're going to use the Analyze menu.
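As an aside, the value labels set up above can be pictured as a simple lookup from code to label. The following is a minimal Python sketch of that idea, not SPSS syntax; the names `VALUE_LABELS`, `ALT_LABELS`, and `to_labels` and the handful of codes are made up for illustration.

```python
# Illustrative only: these names are hypothetical and not part of SPSS.
VALUE_LABELS = {1: "Male", 2: "Female"}


def to_labels(codes, labels):
    """Replace each numeric code with its label; unlabeled codes are left as-is."""
    return [labels.get(code, code) for code in codes]


# A few made-up codes as they might appear in the gender column.
gender_codes = [1, 2, 2, 1, 2]
print(to_labels(gender_codes, VALUE_LABELS))

# Recoding Female as 0 instead of 2 changes the stored numbers, not the categories.
ALT_LABELS = {1: "Male", 0: "Female"}
recoded = [0 if code == 2 else code for code in gender_codes]
print(to_labels(recoded, ALT_LABELS) == to_labels(gender_codes, VALUE_LABELS))
```

Both coding schemes produce the same list of labels, which is the sense in which the number is only a placeholder.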
Whenever you run an analysis in SPSS, you use the Analyze menu. We can see that there are lots of options, each with their own sub-menus and sub-sub-menus. The one that we want is Analyze -> Descriptive Statistics -> Frequencies. This window pops up, and you will see lots of windows of this type in SPSS. All of the variables that we have in our dataset are on the left, and the variables that we want to analyze go on the right. You can select a variable for analysis by clicking on its name and then clicking on this arrow between the boxes. Alternatively, you can drag-and-drop, and in some cases you can double-click. Let me show you just how easy it is to use SPSS: click OK.
What we are seeing now is the output window, and here is something very important to know about SPSS, especially compared with other types of statistical software: SPSS will give you copious amounts of output, often more than you really need, and you need to know how to interpret that output. In SPSS, it is easy to run an analysis, but it takes some education to learn how to interpret the output.
First, we see a summary of the variables in the box labeled "Statistics." We have 12 valid scores for gender with no missing data, but for height, we only have scores for 10 people, with 2 missing values. The valid sample size is the number of participants for whom we actually have scores. This first frequency table is for gender. The total tells us that we have 12 valid scores. We see that there are 5 males and 7 females. Notice the columns for "Percent" and "Valid Percent." They are exactly the same. They are the same because we have no missing values for gender.
This second frequency table is for height. Remember that we have missing values for height for two of our participants, so we see the valid total is 10. Two values are missing in the data set, called system missing, and the total is 12. We see that the Percent column is different from the Valid Percent column.
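The two percentage columns differ only in their denominator: Percent divides by every case, and Valid Percent divides by the cases that have a score. Here is a rough Python sketch of that calculation. The function name `frequency_table` is hypothetical, and the individual height values are invented for illustration; only the counts (5 males, 7 females, 10 valid heights, 2 missing) come from the video.

```python
from collections import Counter


def frequency_table(values):
    """Build frequency rows of (value, count, percent, valid_percent).

    None marks a missing score. Percent uses all cases as the denominator;
    Valid Percent uses only the non-missing cases.
    """
    total_n = len(values)
    valid = [v for v in values if v is not None]
    valid_n = len(valid)
    counts = Counter(valid)
    rows = []
    for value in sorted(counts):
        count = counts[value]
        percent = round(100 * count / total_n, 1)
        valid_percent = round(100 * count / valid_n, 1)
        rows.append((value, count, percent, valid_percent))
    return rows, valid_n, total_n - valid_n


# Gender codes: 5 males (1) and 7 females (2), nothing missing.
gender = [1] * 5 + [2] * 7

# ASSUMPTION: invented heights in inches, with two missing scores.
height = [62, 63, 64, 65, 65, 66, 67, 68, 68, 70, None, None]

for name, data in (("gender", gender), ("height", height)):
    rows, valid_n, missing_n = frequency_table(data)
    print(name, "valid:", valid_n, "missing:", missing_n)
    for row in rows:
        print("  ", row)
```

With no missing scores, the two columns agree for gender; for the invented heights, a value that occurs twice is 2 out of 12 in one column and 2 out of 10 in the other.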
The Percent column is calculated based on the total sample size of 12; the Valid Percent is calculated on the valid n of 10 people for whom we actually have data. I recommend reporting the Valid Percent column unless you have a specific reason why you need to report Percent.
Well, this is a good start, but we can do better. Let's make some pictures of our data. I am going to run another analysis, and I want you to see that you do not need to go back to the data set. You can run a new analysis from the output window, as well. Just click on Analyze -> Descriptive Statistics -> Frequencies. You can see that our previous analysis is still in the window. We could clear it by clicking on this Reset button, but let's just continue with these data. So this time, click on Charts, and then under Chart Type, click on Bar Charts. Let's change Chart Values to Percentages. Click Continue, but before you click OK, let's turn off the frequency tables because we already have those. Now, click OK.
In the output window, we see that the chart for gender looks really good. We have two distinct bars, one for male and one for female, and we can estimate the percentages of each. But when we look at the bar chart for height, the options just don't look as good. We can definitely do better.
Let's run another analysis. Click on Analyze -> Descriptive Statistics -> Frequencies. This time, click on Charts, and then under Chart Type, click on Histogram. Let's also choose "Show normal curve on histogram." Notice that the chart values are now gray, because we don't need them. Click Continue, but before clicking OK, let's do one more thing. Click on Statistics. Here we can choose other options like the mean, the standard deviation, the minimum, and the maximum. We could also get the variance, the standard error of the mean, and the sum is good, too. As you see, we can pick as many of these options as we would like. If we change our mind, we can unselect them, too. Click Continue and then OK.
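For readers who want to see what the options in the Statistics dialog compute, here is a minimal Python sketch using the standard library. The ten height values are invented for illustration and are not the data shown on screen. The sketch assumes the sample (n - 1) formulas for the variance and standard deviation, and takes the standard error of the mean as the standard deviation divided by the square root of n.

```python
import math
import statistics

# ASSUMPTION: ten invented heights in inches, standing in for the valid scores.
heights = [62, 63, 64, 65, 65, 66, 67, 68, 68, 70]

n = len(heights)
total = sum(heights)                      # Sum
mean = statistics.mean(heights)           # Mean = sum / n
variance = statistics.variance(heights)   # Sample variance (divides by n - 1)
std_dev = statistics.stdev(heights)       # Square root of the sample variance
std_error = std_dev / math.sqrt(n)        # Standard error of the mean
minimum, maximum = min(heights), max(heights)

print(f"N = {n}, Sum = {total}, Mean = {mean:.1f}")
print(f"Std. Deviation = {std_dev:.3f}, Variance = {variance:.3f}")
print(f"Std. Error of Mean = {std_error:.3f}")
print(f"Minimum = {minimum}, Maximum = {maximum}")
```

The mean is simply the sum divided by the number of valid scores, so the missing cases play no part in any of these statistics.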
In the output window, we see all of the statistics that we asked for. For example, the average height was 65.8 inches. The tallest person? 70 inches tall. The shortest? 62 inches tall. If we added up all of their heights, they would total 658 inches.
But notice this first histogram for gender. It just doesn't look good, not like it did with the bar chart. The bars are connected, but gender is supposed to be discrete categories. We no longer see the labels for males and females. And the normal curve makes absolutely no sense. On the other hand, the histogram for height is much improved. The bars touch, indicating that the data are connected, and the superimposed normal curve makes sense with these data. We can see that the shape of the data matches reasonably well with a normal distribution. The important thing to learn here is that you should choose the statistics and the graphs that are appropriate to your data. A nominal variable like gender should be reported with frequencies and a bar chart. Scale variables like height should be reported with a mean, standard deviation, and a histogram.
We know that the average height for all participants is 65.8 inches, but what if we want to split that by males and females? Let me show you how. Click on Analyze, but instead of Descriptive Statistics, choose Compare Means and this first option, simply labeled "Means." Here we have the options for dependent variables. "Layers" refers to the independent variable, or categorical variable. We did not really assign people to the condition called gender, so gender would really be what is called a "quasi-independent variable." Still, we will use gender as our independent variable. We want to examine differences in height, so height will be the dependent variable. Now click OK. We can see the means and the standard deviations for males and females separately and together. There are 5 each for males and females, 10 total.
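The Means output described above amounts to grouping the scores by category and summarizing each group, plus a total row. Below is a rough Python sketch of that idea. The function name `means_by_group` is hypothetical and the individual records are invented; only the group sizes (5 and 5, with 2 females missing a height) and the overall mean of 65.8 are taken from the video, so the group means printed here are not the video's values.

```python
import statistics
from collections import defaultdict

# ASSUMPTION: invented (gender, height) records; None marks a missing height.
records = [
    ("Male", 65), ("Male", 67), ("Male", 68), ("Male", 68), ("Male", 70),
    ("Female", 62), ("Female", 63), ("Female", 64), ("Female", 65),
    ("Female", 66), ("Female", None), ("Female", None),
]


def means_by_group(rows):
    """Return {group: (n, mean, sd)} plus a 'Total' entry, skipping missing scores."""
    groups = defaultdict(list)
    for group, score in rows:
        if score is not None:
            groups[group].append(score)
    summary = {
        group: (len(scores), statistics.mean(scores), statistics.stdev(scores))
        for group, scores in groups.items()
    }
    everyone = [score for scores in groups.values() for score in scores]
    summary["Total"] = (len(everyone), statistics.mean(everyone), statistics.stdev(everyone))
    return summary


for group, (n, mean, sd) in means_by_group(records).items():
    print(f"{group:<7} N = {n:>2}  Mean = {mean:.2f}  Std. Deviation = {sd:.2f}")
```

Cases with a missing height drop out of their group, which is why only five of the seven females are counted.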
We see that males were a few inches taller on average than females. The total mean and standard deviation here are the same as the values that we got earlier using the Frequencies command. Overall, frequency counts, charts, and descriptive statistics are a great way to take a peek at your data and see just what you have. It's a good idea to do this before running any other kind of analysis. In our next video, we will look a little bit more at these descriptive statistics and at how to convert raw scores into z-scores. I'll see you then.
Info
Channel: Research By Design
Views: 540,549
Rating: 4.9281392 out of 5
Keywords: SPSS for beginners, Todd Daniel, statistics, flipped classroom, SPSS, beginners, introduction, diving, deeper, how to, how to do, how to use SPSS, introduction to SPSS, data cleaning, EDA, frequency table, online, teaching, learning, instruction
Id: bapuGcjwiLQ
Length: 14min 1sec (841 seconds)
Published: Tue Dec 05 2017