How to Make Subsets, Filter, and Split Files

This is about how to make subsets within your dataset. It may be that you want to separate out some of your analysis into different groups, or you want to make a new dataset that only has some people or some cases in it, or you want to temporarily filter some cases or even remove them from your dataset. So I'm going to show you a few tricks that are very handy for multiple purposes.

One is that sometimes we're engaging in a sustained comparison between two different groups. This is the Education Longitudinal Study of 2002, and this portion of it is very focused on Hispanic students who live in the city. Say, for example, we take this variable about whether English is the student's native language. It's dichotomous, which is very useful; we can do this with non-dichotomous variables, but it's nice that this one is. What we're really interested in doing is running a bunch of different kinds of analysis comparing the relationship between variables for students for whom English is the native language and for whom it isn't.

So we go up to Data and then down here to Split File. This is a temporary split: you can turn it on and off. In fact, you have to turn it off at the end. It doesn't necessarily save within your dataset, so it makes no permanent change to your dataset. What it does is organize all of your output into two different groups, or as many groups as there are in your grouping variable. So we're going to put the variable that we're interested in here, and I'm going to say we want to sort the file by these grouping variables. Then we just click OK, and for all of the output you do from now on, it will automatically divide it by how people answered this question.

Say I just wanted to do a very simple frequency analysis, so I want to look at some frequencies of some other variable. I want to look at the question of
how far in school the student thinks they will get; that's just what I'm interested in. I might want to get some statistics on it: some quartiles, and a mean, a median, and a mode. Actually, I can't do the mean here, so the median and the mode. I'm going to do that.

First of all, it made a category for missing, so this first group is the missing cases (I should probably go back and clean my data up), and these are the folks who refused. But here are the students for whom "English is the student's native language" is "No". These are students who were raised initially speaking some other language, probably Spanish. It tells us how many students are included in this section, it gives us the median for this group, it gives us the mode, there are some percentiles, and then it gives us a frequency distribution just for students for whom English is not their native language and how they answered this question. So: what percentage of them think they will attend college, think they will graduate from college, think they will obtain a master's degree or equivalent? How many of them think that this is what they're going to end up doing?

Then we scroll down, and here are students for whom English is their native language. There are 12,000 of them, and we can go through the frequency distributions here. If you found multiple cross tabs overwhelming, this is a really easy way of using a control variable and getting smaller cross tabs, so you don't end up with those huge tables. You get individual tables for each subgroup.

So this is kind of useful, and it's particularly useful when we make graphs. Say we just want to make a very, very simple pie chart, and we want to make the same pie chart for each group: how far in school they think they're going to get. I'm going to change it to percentages here, and when we click OK, it makes two pie
charts for us. Here are students for whom "English is the student's native language" is "No", so they were raised speaking some other language, and we can see how they answered this question. And then here is the chart for students for whom English is the student's native language. So this can be very useful if you're comparing two different groups: tourists versus people who live in San Antonio, or people who have moved into a housing complex more recently versus those who have been there a long time. It can be a way of visually comparing two different groups very, very easily.

It may be that you're not that interested in doing this temporarily. So I'm going to go back and click Reset to get rid of this split file. It may be that we want to do something a little bit more permanent or a little more sustainable. Maybe we want to make a whole dataset that is just students whose native language is not English, because this is what my project is really about: students who answered "No" to this question.

So I could go up to Data and go to Select Cases. There are all sorts of different ways that I can do this. I'm going to use the If button here, and we're going to run into this more later. I'm going to say I only want students who said that English is not their native language. That's my selection. Then for my output, I can decide to temporarily filter out unselected cases. This will make a temporary filter so that I can just analyze within my dataset. Maybe I want to run a whole bunch of analyses, so I'll keep that filter. You'll notice it will make a filter variable; I'll show you what I mean by that. So I click OK, and if I look in the Data View, you can see that it actually crosses out anyone who is not being included in this analysis. So it's an easy way of figuring out if your
filter is on. As you go through, you can see that only people who said that English is not their native language are included, which is a very small group within this huge dataset. So any analysis I do, say a frequency distribution, will be for just those students. And because you can't really tell from the output that it's just those students, you have to be really aware of when you have a filter on.

The second thing it will do is make a new variable in your Variable View, which is your filter variable. So it can be a way of recoding things. You'll see that 1 means selected, so this is a student who was raised bilingual, and 0 means not selected. This is less impressive for something like this, which is already dichotomous. But if I had made a selection for, say, a range of individuals whose parents went to high school or earned a more advanced degree, it would make a dichotomous variable out of that for me that I could use again in the future. I can go and rename this "bilingual" instead. That's a little extra bonus perk.

The thing about Select Cases is that you also have to go back in and say, okay, I don't want to be filtering anymore, and reset it.

We can decide to get more particular. Say I'm really interested in students who think that they're going to go to college: they want to attend or graduate from college, and this is what they aspire to do. So they answered this question greater than or equal to five. I also may be interested in what their parents expect: whether or not their parents think they will graduate from college. So say this is a group of students that I'm really interested in, and I want to make a whole dataset that's just about them. We go to Data, Select Cases, and I'm going to say, okay, well, I'm going
to use these little parentheses for this: if the student's answer is greater than or equal to five, so they said either "I'm going to graduate from college" or "I might go on and get a master's degree". Then I can decide to make a whole new dataset of just kids who want to go to college. I have to name it, and it has to be just one word, so I'll call it "collegebound", if I can spell it. If I click OK, it would make this whole new dataset, just called collegebound.

But what if I want to get more complicated? Maybe it's not just these kids I'm interested in. Maybe I'm interested in whether they believe that they are going to go to college and their parents believe this as well. This will include only people for whom the student and the parent are in agreement that they're going to go to college. So this will create a whole new dataset that just has individuals where the kid wants to go to college and the parents also think that they will graduate from college.

One could get really interesting and say either/or: both conditions don't have to be met. As long as one or the other holds, they can be in the dataset. There are some advantages to that.

I can also get a little bit more complicated in how I'm thinking about this. If I'm really interested in kids who have aspirations above and beyond what their parents believe, I might say I'm really interested in cases where the kid really wants to go to college and the parents think the child will not go. That may be a whole new group that I'm interested in analyzing, so I can analyze them instead. If I continued with this, it would create this whole new dataset, or I could use that as a filter.

So you can get pretty creative here with this If button. You can also make a lot of problems with this If button. You can't string too many things together. If I said students expect this, and parents do not expect them to go to school, or I
want, you know, the student to be Hispanic, this is getting too complicated, especially when you start mixing ANDs and ORs: SPSS isn't really sure what you're trying to do. I can do a series of ANDs, a series of conditions that I want to be matched: I want Hispanic female students who expect to go to college but whose parents don't expect them to. This could be a really interesting analysis. Or I can do a series of ORs: either they want to go to school, or their parent wants them to go to school, or their friend expects them to go to school. These are all things I could do. But when you start mixing your ANDs and your ORs, it gets a little bit too messy.
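
For readers who want to see the same logic written out, here is a minimal sketch in plain Python rather than SPSS. It is only an illustration under stated assumptions: the records, the field names (`english_native`, `student_expects`, `parent_expects`), and the cutoff of 5 are hypothetical stand-ins for the survey variables shown on screen, and the helper functions are invented for this example. It mirrors splitting output by a grouping variable, building a 1/0 filter flag, and selecting cases with AND, OR, and a mixed condition made explicit with parentheses.

```python
from collections import defaultdict

# Hypothetical records; field names and codes are illustrative only,
# not the actual variable names in the survey file.
students = [
    {"id": 1, "english_native": 0, "student_expects": 5, "parent_expects": 5},
    {"id": 2, "english_native": 1, "student_expects": 3, "parent_expects": 5},
    {"id": 3, "english_native": 0, "student_expects": 6, "parent_expects": 2},
    {"id": 4, "english_native": 1, "student_expects": 2, "parent_expects": 2},
]

COLLEGE = 5  # assumed cutoff: codes >= 5 mean "graduate from college" or more


def split_by(rows, key):
    """Group rows by one variable, similar in spirit to a split file."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def add_filter(rows, condition):
    """Return copies of the rows with a 'filter' flag: 1 = selected, 0 = not."""
    return [{**row, "filter": int(condition(row))} for row in rows]


def select(rows, condition):
    """Return a new list holding only the rows that meet the condition."""
    return [row for row in rows if condition(row)]


def ids(rows):
    """Convenience helper: list the ids of the given rows."""
    return [row["id"] for row in rows]


# Split: one group per answer to the native-language question.
by_language = split_by(students, "english_native")

# Filter flag: nothing is deleted, each case is just marked 1 or 0.
flagged = add_filter(students, lambda r: r["english_native"] == 0)

# AND: student and parent both expect college.
both = select(students, lambda r: r["student_expects"] >= COLLEGE
              and r["parent_expects"] >= COLLEGE)

# OR: either one expects college.
either = select(students, lambda r: r["student_expects"] >= COLLEGE
                or r["parent_expects"] >= COLLEGE)

# Student aspires beyond what the parent expects.
student_only = select(students, lambda r: r["student_expects"] >= COLLEGE
                      and r["parent_expects"] < COLLEGE)

# Mixed AND/OR: parentheses state exactly which conditions belong together.
mixed = select(students, lambda r: r["english_native"] == 0
               and (r["student_expects"] >= COLLEGE
                    or r["parent_expects"] >= COLLEGE))

print(ids(both), ids(either), ids(student_only), ids(mixed))
```

In this sketch the mixed condition is only readable because the OR is wrapped in parentheses; without them, Python would evaluate the `and` before the `or` and select a different set of cases, which is the kind of mess the video warns about.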
Channel: Amy Stone
Views: 15,886
Rating: 4.5555553 out of 5
Id: Jtk4Di8rSGY
Length: 12min 26sec (746 seconds)
Published: Thu Mar 19 2015