R Tutorial - 009 - How to use the mutate function in dplyr

Video Statistics and Information

Captions
Welcome back, everyone. In this video I'm going to be covering one of the last basic functions provided by the dplyr package: the mutate function. To start, we are going to load the flights data library that we had before, nycflights13, and we are also going to load dplyr, so we need to add both of these lines to load those libraries. Then, to load the data, you can name your data anything you want. I'm going to assign it to an object called data, but you could call it anything; it's just a name. This is going to be the flights data, which is provided by the nycflights13 library, so that's why I can write it just like this. When I run that, we'll have the data that you're a little bit familiar with from prior videos. Why don't we check out the column names of our data, just to remind ourselves what it looks like. We have some time periods, some flights, and some details on the flight timing.

Now we are exploring the mutate function provided by dplyr, so why don't we take a look at what that looks like. We come down here to the console and type ?mutate, and that loads up the documentation for the mutate function within dplyr. This will give you the usage, and it also gives you some examples. The thing that mutate does is it allows you to add new columns to your data table very easily, and you can even do some calculations for the new column. That's what we're going to be exploring.

Let's take a look at our data and see if we can come up with something interesting. We've got some timings: departure times, scheduled departure times, and the actual delay. That's interesting, because this flight took off two minutes later than scheduled. We have the same thing for arrival times, and then we also have flight numbers, destinations, air time, distance, and hours.

So why don't we do something really simple here. We're going to add a new column to this data table where we look at how much slower this flight was relative to expectations. The way I'm going to calculate that is to take the difference between the delay on departure and the delay on arrival. For example, in this first row we took off two minutes later than expected, so if the flight itself went exactly as we would expect, we should be arriving two minutes late as well. But in this case we arrived 11 minutes late, so I'd like to have a column here that just has a minus 9. It says the flight itself was 9 minutes slower than expected, and we can use the mutate function within dplyr to do that.

So how is that going to look? We're going to use the mutate function, and within this function we're going to apply the mutation to our data. What do we want here? We're creating a new column; I suppose we could call this flight gain. The idea is that if the flight is faster than expected we'll get a positive value, and if we have a loss it'll be a negative value. What is this equal to? Let's go back to our data and check the column names. This is simply going to be the departure delay minus the arrival delay, so we can type that in: dep_delay - arr_delay. And this is finished, so let's run it and see how it affects our data. We could assign the result to a different name if we didn't want to overwrite our data table for some reason, but this is fine as well. We run that, and then let's jump in here and see how it updated. We should have a column at the very end with our new calculation. So, flight gain: minus 9, exactly as we expected.

Another useful thing about the mutate function is that we can do other calculations in the same call. At the end of this column definition, if we add a comma, we can add another column here. Let's call this one gain per hour, and this is going to be equal to how much flight gain we receive per hour. How are we able to calculate this? We can take the value we just calculated and divide it by the total air time. Air time is in minutes, so if we want it in hours we need another adjustment as well. Let's say this is equal to our flight gain divided by our air time divided by 60, with the air time divided by 60 in its own parentheses, so the air time is now in hours. Make sure all the parentheses are correct.

Now when I run this, we're still going to be mutating data, and it already has this flight gain column, so we're going to be overwriting that as well. We're not going to get a second flight gain column here; we're replacing our entire data set. When I run that, let's take a look and see how it looks. Now we have this flight gain of minus nine divided by the total air time divided by sixty, so we were a little bit slower than expected on this flight. These values are all normalized by how long the flight is, so that's pretty interesting too. Perhaps we can do some plots of that later.

That wraps up the mutate function. We're going to be using this all the time, so keep it in mind, and we'll see it again. If you don't have everything solidified yet, it'll become solidified over time. All right, see you in the next video.
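The two mutate steps described in the captions can be sketched roughly as follows. This is a minimal sketch, not the code shown on screen: the column names flight_gain and gain_per_hour are assumed spellings of the names spoken in the video, and a small hand-made data frame stands in for the flights data so the example runs without nycflights13. Only the 2-minute and 11-minute delays in the first row come from the video; the second row and the air_time values are made up for illustration.

```r
library(dplyr)

# Illustrative stand-in for the flights data (assumption). With the real
# data set this would be: data <- nycflights13::flights
data <- data.frame(
  dep_delay = c(2, 10),    # minutes late on departure (row 1 from the video)
  arr_delay = c(11, 4),    # minutes late on arrival (row 1 from the video)
  air_time  = c(180, 120)  # minutes in the air (made-up values)
)

# A single mutate() call can create several columns, and a later column
# can refer to one created earlier in the same call.
data <- mutate(
  data,
  flight_gain   = dep_delay - arr_delay,         # positive = made up time
  gain_per_hour = flight_gain / (air_time / 60)  # air_time converted to hours
)

print(data)
```

Assigning the result back to data overwrites the original table, as in the video. Assigning it to a different name would keep the original untouched.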
Info
Channel: analystguides
Views: 22,982
Rating: 4.8285713 out of 5
Keywords: RStudio, R-programming, dplyr, mutate
Id: 2dFpblO7MB8
Length: 6min 39sec (399 seconds)
Published: Sat Mar 18 2017