Tutorial: Introduction to SPSS

Video Statistics and Information

Captions
Welcome to this introductory tutorial on SPSS. This tutorial will cover entering data into the SPSS Data Editor and the basic manipulation and preparation of data for analysis. Then we'll look at running basic descriptive statistics, and finally we'll look at using the SPSS Syntax Editor.

Part 1: Preparing a data set

Welcome to SPSS. When you first open the program, you'll be given a few different options. These include opening an existing data file or typing in new data. To get a blank data set like this one, with nothing in it, you would select the option to type in new data.

Before you enter any data, you'll notice that both screens look the same. We have the Data View screen and the Variable View screen, and without any data or values entered into the data set they look almost identical. However, you will see that they are very different once we actually start using them.

To start, we'll go to the Data View. The Data View is very similar to Excel in that this is where we see all of the raw values in our data set. In SPSS, each row represents a participant. Let's start creating some data with an ID variable: we can give participant 1 an ID of 1, participant 2 an ID of 2, then 3, 4, 5, and so on. What we've done here is create a variable with five values, and SPSS knows that we have five participants in this data set because we have five activated rows. If we were to enter more data down the rows, SPSS would know that we're adding participants, whereas if we were to enter data into these columns, SPSS would know that we are adding additional information about the existing participants.

If you were to enter information down here on row 14, for instance a 1, SPSS automatically adds all of the other participants in between. Notice that those cells have been activated; however, since we didn't actually enter any values
they're just being identified right now by this period, which means that there's no information there. For now we'll delete all of these participants, and there we go: we're back to five.

Right now SPSS has given this first variable a default name, VAR00001. If we want to change that name, it's very simple: we click on the Variable View here, which brings us to a different view where we can see the name of our variable, and by double-clicking we can change it. We will call it ID.

SPSS is pretty open in terms of what you name your variables. The main rules are that names have to start with a letter; you can have numbers and some symbols in your variable names, but the name can't start with them. You also can't have any spaces, so if you want a space you have to use an underscore instead, and then you can put numbers or whatever you want afterwards and it'll be OK. For now we'll just keep it as ID.

You'll notice that we also have a lot of other options here, but before I show you those, we're going to create a few other variables: gender and score. Gender is going to be a categorical variable with two categories, male or female, and score will be a score on a test from 0 to 10.

If we go back to the Data View, we can enter the rest of our information. We have our participant IDs here, and now we want to enter gender. SPSS can handle word variables, which it calls strings; however, for the purposes of running analyses it's always better to use numbers. So instead of writing "male" and "female", you should assign a code to male and a code to female. Typically with gender we would use 0s and 1s, or 1s and 2s. Let's say for this example that males will be 1 and females will be 2. I've entered this here, and I'll show you shortly how to make that
also mean male and female, but for now we just have the numbers. Then these are their scores out of ten; let's say we have 7, 8, 9, 8, 10. Perfect. So we have a continuous variable here, a categorical variable, and an ID variable.

If we go back to our Variable View, we can tell SPSS a little more about these variables and look at the different options here. Numeric essentially means that we're using numbers in the data set. I did mention that SPSS can handle words; however, if you're going to enter words, you have to change the type of variable to String, down at this option, so that SPSS knows not to try to interpret the values in that variable as numbers and analyze them. Unfortunately, most SPSS analyses can't use string variables, so if you do mark something as a string, it's just going to skip over it and not use it. That might be fine if you're entering someone's address; however, if you want to look at gender differences, you need to use the Numeric option. Generally we're just going to use Numeric.

The Width here refers to how many characters wide the variable will be, and it includes the decimals. Right now, let's say none of these variables need decimals, so we can take the decimals out. If we go back (I only changed two of them), you'll see that the first two variables no longer have decimal points, but score still does, so we'll get rid of that here. We can also reduce the overall width of the variable to, say, 2, which just means that it won't be quite as wide. So decimals and widths are dealt with.

Label is an opportunity to give your variable a name that doesn't meet the naming requirements of SPSS. Let's say, for example, that you find the name you've given is very ambiguous, and you'd like to use a longer sentence or some spaces to describe it. Here you could put, for example, "Test scores", or "Males and females" instead of
gender. What this means is that when SPSS runs analyses with these variables, instead of using the variable name it will use this label in the output. We can remove that, no problem, or actually we'll replace it with just "Gender".

Values is very useful for categorical data, when you want to assign labels to the numeric codes in your variable. If you recall, in our gender variable we indicated males with 1 and females with 2. To let SPSS know that we're doing that, we click on this cell and say that 1 is "Male" and click Add, and 2 is "Female" and click Add. Once these are here you can click OK, and they should show up right there under the Values option. If we go back to the Data View, you'll notice that nothing has changed. However, if we click up here, we can ask SPSS to show us the value labels, and it changes to Male and Female. We still have the underlying numerical value, which means that SPSS will be able to run analyses on this variable.

Going back, the next option we have is Missing. Right now our data set does not have any missing data, but let's say that participant 4 is missing a score. If we want to tell SPSS that this is missing data, we can assign it a code; traditionally we use something like 9999. We enter that in the data set, then under Missing for score we select Discrete missing values and give it the code 9999. SPSS will then know that this is a missing value and will not analyze that particular case.

The next option is Columns; you can ignore that. Alignment you can also ignore. Then there is the measurement level (Measure). Sometimes, if you enter or import a lot of data, SPSS will attempt to determine the measurement level from the patterns in the data. However, you should always double-check this because it
doesn't always get it right. There are three options in SPSS: Scale refers to anything that's measured at the interval or ratio level, Ordinal is ordinal measurement, and anything that's categorical is Nominal. In our case, even though the ID variable technically has a numerical value and looks sort of like ratio data, we're assigning the numbers arbitrarily, so it has to be a nominal variable. Really, you shouldn't use it in any analyses; it's just there to identify our participants. Still, it's a good idea to make it nominal. Gender is also nominal because it's categorical, and our score on the test would be scale because it's a ratio variable: there is a true zero and the distance between each score is the same. You can ignore this last column here.

The reason it's important to use ID variables is in case you ever decide to sort your data. Let's say, for example, that we sort our data in ascending order here, just by right-clicking. This output window is going to open, and we can close it. What we notice is that SPSS has now sorted the data from smallest to largest, and we've actually changed the order of the participants. The row numbers here no longer correspond to the same participants they did before, and you can tell that from the fact that the ID variable has been reordered. Obviously, with just five participants and three variables this isn't a big deal; however, when you start having really large data sets, it's very important to assign an ID number so that you can keep track of which row of data belongs to which participant.

Part 2: Transforming data

Sometimes in SPSS, before we can run any analyses, we have to make certain transformations to our data. Before we do that, let's add a few more participants (6, 7, 8, 9, and 10), and again we'll arbitrarily assign them a gender and a score. First, let's assume that we want to create a new variable where we say
whether participants have either passed or failed a course. Here we have ten people, and we'll no longer have this missing data; let's say that this person got a 3 out of 10. Let's say that in order to pass this course you need a score of at least 5 out of 10. We want to recode this into a pass/fail variable, which is pretty easy to do with SPSS. We click on the menu called Transform and choose Recode into Different Variables. We take our score variable and bring it over into the input box, and then we give a name to our new variable, which is going to be a score pass/fail name written without spaces (something like score_passfail). Notice that I'm applying all the rules for naming variables in SPSS.

Next is Old and New Values. Here we need to assign a code for pass and for fail. Let's say that 1 is a pass and 2 is a fail. That means that anything in the existing variable with a score of 5, 6, 7, 8, 9, or 10 is going to be a 1 in the new variable, which is a pass, and anything below 5 is going to be a 2 in the new variable, which is a fail. You can do each of these values individually, or we can do a range. We can enter 5 through 10 as the old value, say that the new value is 1, and click Add. It now says 5 thru 10 equals 1. Then we can do 0 through 4 equals 2. So we have 5 through 10 as a 1 and 0 through 4 as a 2. Click Continue and click OK. The output window opens; we close it, and now we have our new variable, pass/fail. If you recall, we now need to go to the Variable View and specify under Values that 1 equals Pass and 2 equals Fail. Click OK, and if we go back and click on the value labels option, we have our pass/fail variable.

At other times we might need to either take an average or add up variables, so
let's say that we had our students do another test. We'll call it score2; again it was out of 10, and we'll enter some different scores, just random ones. Excellent. Now we have two variables, so we might want to create a new variable that represents the sum of these two scores, which would be a total out of 20. We go to Transform, Compute Variable, and from here we have a giant calculator. We'll call this one score total; we just have to give it a name. Then we treat this like a regular calculator: you enter score plus the other score. Double-clicking a variable makes it appear here, but you don't even have to double-click; you can just type the variable names yourself if you want. It works just like a regular calculator, so you could add brackets to have this operation done first and then maybe multiply by gender. That doesn't necessarily make sense, but you get the idea. We'll leave it as it is and click OK. Again the output window opens, and now we have our scores out of 20: 13, 15, 14, et cetera. If you check, 7 plus 6 is 13, 8 plus 7 is 15, and so on. Right now SPSS automatically applies the default of two decimal places; if you want to get rid of those, you go back to the Variable View and take them away, and when we go back to the data they'll be gone.

Another thing we might want to do, instead of creating a total of these two test scores, is create a mean score. Let's say that instead of adding them we want the average. SPSS has a whole list of functions that you can use for this, and creating averages is pretty common in statistics, especially in psychology when we work with scales, so it's a good function to know. Again we go back to Compute Variable. Everything we did before is still there; if we want to get rid of it, we click the Reset button, which brings everything back to the default.
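The recode and compute steps walked through so far can also be sketched outside SPSS. The following is a minimal Python illustration of the same logic, not SPSS syntax. The function names are hypothetical, and only the first three score pairs are used: 7 and 6, and 8 and 7, are read out in the video, while the third pair (9 and 5) is inferred from the first-test score of 9 and the reported total of 14. Treating uncovered values as missing, and letting a missing input make the total missing, are assumptions about how SPSS behaves.

```python
def recode_pass_fail(score):
    """Apply the rule from the video: 5 through 10 -> 1 (pass), 0 through 4 -> 2 (fail).

    None stands in for a missing value. Values outside both ranges are left
    missing here, which is an assumption about the SPSS behaviour.
    """
    if score is None:
        return None
    if 5 <= score <= 10:
        return 1
    if 0 <= score <= 4:
        return 2
    return None


def total(*values):
    """Add values like the Compute Variable calculator would.

    Assumption: if any input is missing, the result is missing too.
    """
    if any(v is None for v in values):
        return None
    return sum(values)


# First three participants only; the remaining scores are not read out in the video.
scores = [7, 8, 9]
scores2 = [6, 7, 5]

pass_fail = [recode_pass_fail(s) for s in scores]
totals = [total(a, b) for a, b in zip(scores, scores2)]

print(pass_fail)  # [1, 1, 1]
print(totals)     # [13, 15, 14]
```

A score of 3, like the one later given to participant 4, would be recoded to 2 (fail) by the same rule.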
So we'll call this one score mean. The best way to compute means, standard deviations, or anything similar is to use the function, because it automatically treats missing data properly, whereas if you do it with the calculator it handles missing data differently. My recommendation would be to use the function. If you click on the All function group and scroll down, you'll find Mean right here. Double-click it and you'll see MEAN(?,?). What you want to do is replace the question marks with the names of the variables that you want to include in the mean score. Right now we only have two, score and score2; however, we could add variable 3, variable 4, variable 5 if these were in our data set, and SPSS would take all of them. The important thing is that they're separated by commas and that there's a closing bracket at the end. For now we don't need those. So we have our two scores, we click OK, the output tells us that it has been done, and then we have our mean score: 7 and 6 give a mean of 6.5, 8 and 7 give a mean of 7.5, and so on. That is how you create a mean score.

Part 3: Descriptive statistics

The next thing we'll cover is how to run basic descriptive statistics in SPSS. The descriptive statistics fall under the Analyze menu, then Descriptive Statistics. Notice that there are a lot of different options; the two easiest ones to use are Frequencies and Descriptives. They both do similar things but serve different functions. We'll start with the Frequencies option. If you click here, this window will open, and you can select the variables you're interested in analyzing. The easiest thing to do is click on a variable and move it across with the arrow. For now we'll just look at score and gender. By default, this box here is going to be checked, which asks SPSS to display the frequency tables. What this will be is
a table that shows how many of each response we have for all the variables. If you uncheck that, the table will not be produced. SPSS is giving us a warning right now, though: since we haven't asked it to do anything else, there are no analyses to run once this box has been unchecked. If we're interested in some basic statistics, we click Statistics here and check all the options that we're interested in. You'll notice the central tendency measures are here: we've got mean, median, and mode, and you can also ask for the sum of each variable if you're interested. Then we have our measures of dispersion: standard deviation, variance, range, minimum, maximum, and standard error of the mean. We can also get our measures of skewness and kurtosis. SPSS will automatically give you the standard error of skewness and kurtosis as well, so that you can calculate your ratios. You can also get your quartiles and percentiles if you like. Maybe I'll just select them all and we'll see what that looks like.

You click Continue, and SPSS is going to produce all of that information for both variables. Since we have a categorical variable here, some of these measures are going to be appropriate for that variable and some of them will not, and others will be more appropriate for the continuous variable, score. Either way, we'll see what it looks like. We click OK and SPSS opens this new window, and this time, since we ran some analyses, it gives us all of our information. For gender, it's telling us that we have 10 valid responses and none missing, which makes sense, since we have no missing data right now, and that's the same for test scores. The mean isn't really interesting because gender is a categorical variable, so we don't care about the mean or the standard error of the mean, or the median either. However, the mode
is interesting since it's a categorical variable. Because there is a little asterisk here, it means that there's more than one mode. What this means in this data set, since there are only two values, is that we have an equal number of males and females. SPSS will automatically report the mode as the lowest value that you have assigned as a code. Say that instead of two categories we had three: if there were an equal number in all three categories, the one with the lowest number would always be the one that shows up. You just need to pay attention here to how many modes there are. We don't care about the standard deviation or any of these other things, since we have a categorical variable.

For test scores, however, this additional information does become useful. We know that the average is 6.8 and the standard error is 0.68. We have a median of 7 and a mode of 6, but again we have more than one mode, which means that although 6 was among the most frequent scores, it's not the only mode. We have a standard deviation of 2.15, and our variance is 4.6. Skewness and the standard error of skewness are here; kurtosis and the standard error of kurtosis are there. We have a range of 7, with a minimum score of 3 and a maximum of 10, so it makes sense that the range would be 7. If you add all the scores together you get a sum of 68, and then you can see the percentiles here. So those are our descriptive statistics.

We can get most of these same descriptive statistics through the Analyze, Descriptive Statistics, Descriptives option; however, you will notice that a lot of them are not available here. If I click on Options, we have just the mean and the sum; we don't have the median or the mode. We have all the dispersion statistics and all the distribution ones, but we don't have the percentiles or the quartiles either. Again, you can click on all the same things, and for whatever reason, if you run things through this option, SPSS
automatically gives you an output table with a different orientation. Instead of running in this direction, it goes wide: rather than having just two columns and a bunch of rows, we have a number of columns and just a few rows. You can see all the same information here again, and of course, depending on whether the variable is categorical or continuous, you should interpret the statistics accordingly.

There are two extra features available in those two dialogs that differ from each other. If we go back to Frequencies, we can reset it, which takes away all the other things we'd asked for, and we can ask it just to show us the frequency tables. If I click OK, this essentially gives me how many males and how many females we have in the data set, and how many scores of 3, 4, 6, 7, 8, 9, 10, and so on. If you recall, we were told that gender had two modes, which makes sense: we have five males and five females. We were also told that we had more than one mode on the test score, and if you recall it told us the mode was 6. In fact we had two 6s, two 7s, and two 8s, and SPSS told us 6 was the mode since it's the lowest of those scores, but we have more than one.

With frequency tables (we'll use this one for the example), you get the frequency, what percentage of all scores that represents, and what percentage of valid scores it represents. If we had missing data, there would be a Missing row here with its own percentage, and the Valid Percent column would make a correction: it would leave the missing data out and redistribute the percentages based on only the valid scores. Then here you have the cumulative percent, so the scores of 3, plus the 4s, plus the 6s, and so on. So we can close this. That's the other feature of Analyze, Descriptive Statistics, Frequencies. The other feature of Analyze, Descriptive Statistics, Descriptives is that we can ask
SPSS to give us the standardized values, that is, the z-scores, for a variable. This is obviously not interesting for a categorical variable; however, for the test scores, which are measured at the interval/ratio level, it would be interesting to see what the corresponding standardized scores are. We just select this little box here and click OK. SPSS opens the output window; these are all the options we had already selected before, since we haven't removed them. We can close this, and when we go back to our data set we notice that there is a new variable called Zscore. What SPSS does is take the variable name you have and add a Z in front of it, and now we know that these are the corresponding z-scores for the scores in the data set.

The next thing we might want to do is create some sort of visual representation of our data. If you recall, for categorical variables we're interested in bar charts, and for continuous variables we're interested in histograms. There are two ways to do those. Personally, I find the easiest way is through Descriptive Statistics and then the Frequencies option; however, you can also do it through the Chart Builder here. In Analyze we go to Descriptive Statistics, Frequencies, and select the variables that we're interested in. We're going through them one at a time, since one is categorical and one is continuous, and we'll start with the categorical variable. We have gender here; we can unselect Display frequency tables and then click on Charts. You'll notice there's the None option, bar charts, pie charts, and histograms. Since the variable is categorical, we'll select bar charts, click Continue, and click OK, and SPSS will generate a picture of this data. You can see it's a pretty boring chart, since we have exactly five males and five females, but what's nice is that SPSS automatically incorporates the labels that we've given
for the values. If you want to customize the look of your graphs, you can double-click the chart and eventually the editor will open; you can change the color by double-clicking and make it different if you'd like, but there's no need. There are lots of options there. We have our gender, and this is the number of observations for male and female. So that's the bar chart.

If we want a histogram, we go through the same option. This time we remove gender, add score, go to the Charts option, and click Histograms instead of bar charts. We continue, and this time we have the histogram. The difference between a histogram and a bar chart is that the histogram uses a scale across the bottom. Here it is a scale going from 0 to 12, and the bars are placed accordingly along that scale. With a bar chart, each category gets its own bar regardless of what the labels are, so if we ran the same variable as a bar chart, it would separate the values and you wouldn't have this scale across the bottom. The histogram is useful when you want to look at the distribution of scores. We notice here that there are no 5s, which is why there is a gap, and this axis is the frequency, so we have either one or two of each score, if you recall from our frequency table.

I'll show you what this looks like as a bar chart so you can see the difference; if it's continuous data, we do want to be looking at it with the histogram, but this is what the equivalent bar chart would look like. We've lost the distance between the scores along the bottom, and it's no longer obvious that 5 is missing, whereas that is important for the actual shape of the distribution. So this is not appropriate for continuous data.

Part 4: The SPSS Syntax Editor

As a final step, we're going to look at how to use the Syntax Editor in SPSS. Luckily, using the Syntax Editor in SPSS is very easy and does not require any knowledge of coding
however, it does create a nice record of all the different things that you've done in your data set, and once you start running more complicated analyses it's nice to be able to just copy and paste your syntax and rerun it that way. To open a new syntax file, you go to File, New, and click Syntax, and a new window is going to open that looks a lot like the output viewer except that you can write in it. In SPSS syntax, the only thing you need to know is that if you want to write little notes to yourself, you start with an asterisk, then write whatever you want, and end it with a period; SPSS then knows not to run that line. That's about all the code you need to know how to write, because everything else SPSS will generate for you.

The way SPSS generates it is through the Paste function. If we go back and redo everything we already did today, we can ask SPSS to paste it right into our Syntax Editor, and we'll have a copy. If you recall, one of the first things we did today was transform our data: we recoded some of it into a new variable. If we look here, we have the Recode into Different Variables option, and we had recoded score into a pass/fail variable. Instead of clicking OK and running it right away, I can click Paste, and SPSS gives me the syntax version. It says RECODE score, which is the variable name, where 5 through 10 are going to be equal to 1 and 0 through 4 are going to be equal to 2, INTO the new variable name. I just need to leave that here. I can delete this line that says DATASET ACTIVATE, because I only have one data set open. To run it, you select it and click Run Selection, and that does the same thing as clicking OK, except that now we have a copy of what we've done in this nice little window. Obviously the variable didn't show up again here, since we've already created it, but if you had given it
a new name (I'm going to make it pass/fail 2), it would redo the same thing again. So now we've made a new variable, and if I go back to my data set you'll see that the second pass/fail variable has been added.

Let me find my syntax window again; there we are. That was the recode example, and the same thing applies for Compute. If we click Paste, SPSS asks whether to change the existing variable; we click OK, and we have it here: it's computing score mean. If I want to tell myself what this was, I can add an asterisk and write "This is how I computed the mean score for the tests", and I end it with a period. If I forgot to put the period here, you'll notice that everything below turns gray; that is because SPSS thinks it's part of this note to myself and not an actual command in the syntax. By adding the period, it now knows that the next line is a new command. This EXECUTE line goes at the end of every series of commands so that SPSS knows to run them; it's the equivalent of clicking OK.

Now let's go and grab the descriptive statistics. We go back to Descriptive Statistics, and again we have that same Paste option: we can select everything we want in the menu, and then instead of clicking OK we select Paste, and it gives us the same information. We have FREQUENCIES on the variable score; we asked it to make no frequency tables, but we asked for the bar chart, as you can see here. The syntax will change depending on what you pick. Knowing the details of the syntax and really understanding it is not necessary; however, I do want you to get used to keeping a copy of what you did. So I'll add a note, "Frequency analyses to create bar chart", end it with a period, and we have that there. The same thing goes for the Descriptives option: here you click Paste, and you'll notice that we're running DESCRIPTIVES on score, and this SAVE line is telling SPSS to
create the standardized scores. Then you'll notice that under STATISTICS we have all the different things we'd asked it to do. If you want to drop some of these, you can delete them right from the syntax and it won't run them, or you can leave them. Then you need to add your EXECUTE right here. So it's not very complicated.

Like I said, this may not seem very useful right now; however, as we progress through the semester and you start running more complicated analyses, it will be very useful to have a written log of everything you've done. Or, for example, if you have to recode a lot of different variables, instead of doing it manually each time you could do a whole bunch at once: score one does this, then you repeat the same thing here for score two, then score three, and you can leave everything else the same, and it will do all of it at once instead of wasting time doing each one manually. So there are definitely some shortcuts once you get more comfortable with the syntax. For now, just keeping a record of the analyses you did and the order you did things in is sufficient. As you get more comfortable you'll see that it can help, and the syntax is pretty straightforward, especially if you have any experience with coding.
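As a cross-check on the numbers read out in Part 3, the sketch below recomputes them in Python using only the standard library (statistics.multimode needs Python 3.8 or later). The ten scores are reconstructed from the frequency table described in the video (one 3, one 4, two each of 6, 7 and 8, one 9, one 10); their order is an assumption, as are all variable and function names. The z-scores use the sample standard deviation, which is assumed to match what SPSS saves. The last helper illustrates the missing-data behaviour mentioned for the MEAN function in Part 2: it averages only the values that are present.

```python
import math
import statistics

# Reconstructed from the frequency table in the video; the order is assumed.
scores = [3, 4, 6, 6, 7, 7, 8, 8, 9, 10]

mean_score = statistics.mean(scores)
sd = statistics.stdev(scores)            # sample standard deviation (n - 1)
variance = statistics.variance(scores)   # sample variance (n - 1)
se_mean = sd / math.sqrt(len(scores))    # standard error of the mean
median = statistics.median(scores)
modes = statistics.multimode(scores)     # every value tied for most frequent

# Standardized scores: distance from the mean in standard deviation units.
z_scores = [(x - mean_score) / sd for x in scores]


def mean_of_valid(*values):
    """Average only the non-missing values; None stands in for a missing value."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


print(round(mean_score, 1), round(sd, 2), round(variance, 1), round(se_mean, 2))
# 6.8 2.15 4.6 0.68
print(median, modes, min(modes))  # the smallest of the tied modes is 6
print(mean_of_valid(7, 6), mean_of_valid(7, None))  # 6.5 7.0
```

The results agree with the mean of 6.8, standard deviation of 2.15, variance of 4.6, standard error of 0.68, and median of 7 reported in the output, and the three tied modes (6, 7, 8) explain why the output flags 6 as only the smallest mode.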
Info
Channel: Meredith Rocchi
Views: 704,462
Rating: 4.8420887 out of 5
Id: SL2bZXfWQls
Length: 36min 0sec (2160 seconds)
Published: Wed Aug 13 2014