Recoding data using R programming. Using the tidyverse and dplyr packages to create a new variable

Captions
Today we're going to talk about how to recode data. Now, what do I mean by recoding data? Let's take a quick look at the Star Wars data; you can see it's on the screen right now. The Star Wars data set, incidentally, comes with the tidyverse package, so if you've installed that onto your computer you've got the data set and you can practice everything I'm going to show you at home. The Star Wars data set's got rows, which are Star Wars characters (Luke Skywalker, etc.), and columns that are variables, like name, height, etc.

Now how, using R programming, do we get from this data set, the original one, to a new one that I'm going to create over here, which is a little bit different? Let me talk you through the differences. Here we've got height in meters, whereas the original data set had height in centimeters. We've also got gender using M and F, where in the original set we had male, female, etc. Also, we've gotten rid of all the missing values, and we've created a new variable called size: characters that are big are the ones that meet the criteria of being taller than one meter and weighing more than 75 kilograms. So let's see how we can get to this new data set from the original one using some simple code.

You have come to the right place: on this YouTube channel we're creating R programming videos on everything.

Of course, our starting point in R is that we'll always call the tidyverse package. You only ever install it once, but you call it using the library or require function, and once you've called the tidyverse package we also have access to the Star Wars data set. So we're gonna create a new object, we're gonna call that object sw, and we're gonna make that object equal to the starwars data set.

The first thing we're going to do is select the variables that we want to work with. So we get the pipe operator up and running: Shift+Command+M gives you the pipe operator. Enter. Now we type in the command select, and which variables do we want? We want name, height, mass, and gender. Now I'm gonna rename mass and call it weight, just because I prefer that. The other thing I'm gonna do is get rid of missing values. Bye, missing values: gone. I'm gonna create a separate video on how to deal with missing values some other time.

Next, I want to take height, which is at the moment in centimeters, and change that to meters. In other words, I want to divide each of these numbers by 100. So I use the function mutate: pipe operator, mutate. With mutate you can either create a new variable or change an existing variable. In this case I'm gonna change an existing variable: I'm gonna take height and make it equal to height divided by 100. And now, bada bing, bada boom, these are in meters.

Let's have a look at the gender variable. Here we've got it as males and females, and we want to change that to M and F. We could change that to zeros and ones; we could change it to anything we wanted. But importantly, if you scroll down, we can see that there's not just male and female, we've also got hermaphrodite. So before we carry on, we want to filter this variable and make sure that we've just got males and females, and I'm going to show you two ways to filter.

Firstly, we can say filter, gender is equal to male, and then a vertical line like that, which means or, gender is equal to female. We use the double equal signs because we're asking a question of the filter. We're saying: is any given observation equal to male, or is it equal to female? If so, use that observation. If we use just a single equal sign, that means we're making a statement that this is equal to that. Push Command+Enter, and now if we go up into our data set we don't have anything except males and females.

Now, a slightly more elegant way of doing the same filter: we could say gender %in% and then a concatenation of male and female. That does the same thing. The reason this is a slightly more elegant solution is that sometimes you may have many, many possible kinds of observations you want to filter for, and using a concatenation like that is tremendously useful. Okay, so we push Enter now, and of course we get the same thing.

Now let's recode male and female into M and F. So let's do our pipe operator. We're gonna use mutate, because we're changing an existing variable: we're changing the gender variable. It's gonna be equal to, and here we're using a new command called recode. And what is it that we're recoding? It's gender. Then I like to put a comma, just to keep things neat, and go to the next line, but you don't have to. We're going to recode male, and that's gonna be equal to M, comma, next line (but you don't have to), female equal to F, and there you go.

The last thing we want to do, if we look at our data frame, is create a new variable called size. We want size to say, for any of these Star Wars characters, if they're taller than one meter and weigh more than 75 kilograms, we're gonna call them big, and otherwise we're gonna call them small. So let's have a look at how we do that. Once again we're gonna say "and then" (the pipe operator is like saying "and then") mutate. We're gonna create a new variable called size, and it's gonna be equal to any observation where the height is more than one meter and the weight is more than 75 kilograms. Now if we look at the data frame, it's using TRUEs and FALSEs to say whether or not those criteria were met. This is what we call a logical vector.

So, to change the TRUEs and FALSEs into bigs and smalls, let's put a comma here; we're going to continue with the same mutate. We've got size is equal to, and now this is a nice little trick, it's called if-else. It's saying: if size is equal to TRUE, then put in big, otherwise put in small. And boom shakalaka, there you go.

So if you are serious about learning how to analyze data and you want to learn R programming, then hit the subscribe button now, and hit the little bell notification if you want to get notified of future videos.
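The steps described in the captions can be sketched as one dplyr pipeline. This is a reconstruction from the spoken description, not the code shown on screen, so several details are assumptions: the captions never name the function used to drop missing values (na.omit() is used here), the "if-else" step is written with dplyr's if_else() (base ifelse() would also work), and the lowercase "m"/"f" labels are one possible spelling. To keep the example independent of the dplyr version (newer releases of the starwars data code gender differently), it runs on a small hand-built stand-in table named sw_raw, which is hypothetical and only mimics the columns described in the video.

```r
library(dplyr)

# Hypothetical stand-in for the starwars data as described in the video:
# height in centimeters, mass in kilograms, gender as "male"/"female"/other.
sw_raw <- tibble(
  name   = c("Luke Skywalker", "Leia Organa", "Darth Vader",
             "R2-D2", "Jabba Desilijic Tiure", "Yoda"),
  height = c(172, 150, 202, 96, 175, 66),
  mass   = c(77, 49, 136, 32, 1358, 17),
  gender = c("male", "female", "male", NA, "hermaphrodite", "male")
)

sw <- sw_raw %>%
  # keep only the variables we want to work with
  select(name, height, mass, gender) %>%
  # call mass "weight" instead
  rename(weight = mass) %>%
  # drop rows with any missing value (assumed function; not named in the captions)
  na.omit() %>%
  # change an existing variable: centimeters to meters
  mutate(height = height / 100) %>%
  # keep only males and females; equivalent to
  # filter(gender == "male" | gender == "female")
  filter(gender %in% c("male", "female")) %>%
  # recode the remaining gender labels
  mutate(gender = recode(gender,
                         male = "m",
                         female = "f")) %>%
  # first a logical vector, then turn TRUE/FALSE into "big"/"small"
  mutate(size = height > 1 & weight > 75,
         size = if_else(size == TRUE, "big", "small"))

print(sw)
```

To try the same pipeline on the packaged data, replace sw_raw with starwars after checking which values its gender column actually contains in your installed version.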
Info
Channel: R Programming 101
Views: 18,987
Rating: 4.992157 out of 5
Keywords: R programming, R programming for beginners, tidyverse, recoding data, statistical analysis, cleaning data, quantitative analysis, data manipulation, data wrangling, dplyr
Id: KQuPsYHG1TI
Length: 7min 5sec (425 seconds)
Published: Fri May 15 2020