dplyr tutorial | A quick guide to using dplyr in the wild

Video Statistics and Information

Captions
Hi! Welcome to the R programming channel from Dynamic Data Script. In this video we will learn how to use dplyr by example. I won't go into the details of the implementation of each of the dplyr functions, but I will show you a classic dplyr workflow.

During this video we will use a dataset from a scientific article. It is published in the journal PLOS ONE, which is an open-access journal, so you can get access to the data and read the article yourself. This article is about the effect of vegetation structure on the chance of a lion getting its prey. I thought it was a pretty cool dataset, so this is what we are going to use. If you go to the bottom of this page, you will find a link to the dataset. I will put the link in the description below. So we can just copy the link and bring it into R, and I'll show you how to load those data into R very quickly.

To load your data into R, you will need the download function. We'll use download.file() and paste in the hyperlink we copied. Then we write the name of the file that we will download (let's save it as "df.csv"). If you run that, it will download the dataset (it's a small CSV file) onto your computer, in your current working directory, and save it as "df.csv". Then we read "df.csv" and load it into `df`, and that's it. We now have the dataset here: there are 882 observations of 13 variables.

Let's look at it very quickly. We want to know what the variables are. To open the dataset, let's click this little icon here. This dataset seems to contain lion IDs, the sex of the lion, a column of ones and zeros representing whether the lion made a kill or not, and the type of prey (and we see that there are some unknowns). I have played a little bit with this dataset before, so I know one thing to notice: the prey species is not always written exactly the same way. You can see that here there is no capital K, and here there is a capital K.

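The loading step described above can be sketched roughly as follows. This is only an illustration: the URL is a hypothetical placeholder (the real link is on the article page), and the column names and values in the stand-in file are assumptions, not the actual lion data. The download.file() call is left commented out so the sketch runs offline.

```r
# Hypothetical placeholder URL; replace it with the link copied from the article.
# data_url <- "https://example.org/lion_kills.csv"
# download.file(data_url, destfile = "df.csv")  # saves into the working directory

# Stand-in for the downloaded file so this sketch is runnable as-is.
# Column names and values are assumptions made for illustration only.
csv_path <- file.path(tempdir(), "df.csv")
writeLines(c(
  "lion_id,sex,kill,prey_species",
  "Jess,female,1,Kudu",
  "Jess,female,0,kudu",
  "John,male,1,unknown",
  "Gina,female,1,"
), csv_path)

# stringsAsFactors = FALSE keeps text columns as character vectors
# (this has been the default since R 4.0.0, but older versions made factors).
df <- read.csv(csv_path, stringsAsFactors = FALSE)

dim(df)                  # number of observations and variables
str(df)                  # column names and types
unique(df$prey_species)  # shows the inconsistent spellings and the empty string
```

With the real link, the two commented lines would replace the writeLines() stand-in, and read.csv("df.csv") would read the downloaded file.
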
This seems to be a row ID, and then, at the end, we have some data about the viewshed, that is, how far the lion can see, and the distance to the closest downwind cover, and then the distance to any cover.

Let's say that the first thing we want to do is to clean up the prey species column. Just to have an idea of what we are working with, we will apply the unique() function to the prey species variable, to know what type of variability we are dealing with. We see there are 32 different ways to name species in this dataset: we have kudu, unknown, then an empty string, and then several animals, and sometimes there are qualifiers attached to those prey species. We won't get it super clean. The goal of this video is not to actually learn something about lions, but just to learn how to clean data.

Let's say that our first goal is to remove two types of species: the unknowns and the empty ones. We'll do that using the filter() function. We'll apply filter() to `df` itself, and I will save the result as `df1`, so we can see the impact of what we are doing on this dataset. filter() keeps every row for which the condition returns TRUE. I will ask it to give back all rows where the prey species is not equal to "unknown". Oh yeah, we first have to load the dplyr package. R was trying to use another function called filter(), and it couldn't figure out what I wanted, so it didn't work. I hadn't run the library() call; now I have, and now it works. As you can see, we have removed about 20 rows, those that contain "unknown".

Let's say that I want to give it another condition: I want the rows that are not equal to "unknown" and that are also not empty. To remove the empty rows, we will say that we also want the cases where nchar() is above zero, that is, where the number of characters in the prey species column is above zero. nchar() requires a character vector, and this fails because I forgot to tell the read.csv() function that it should not read strings as factors. This happens often and is the source of a lot of bugs. Now we see that we have 388 rows. If we run unique() on the prey species column of `df1`, we should see that "unknown" and the empty strings are not there anymore.

Now, to keep cleaning our data, we will use the mutate() function. Here you can see that I use the pipe operator. This operator is loaded when you load the dplyr package. What it does is take the result of filter(), which is usually a data frame, and pass it as the first argument of the next function. So if I use mutate() here, the first argument would usually be my data frame, but I omit it because I have used the pipe operator. What we will do now is change our prey species column to all lowercase: we will overwrite this column with the same values, but in lower case. If we reapply unique() to the prey species of `df1`, we can see that everything is now in lower case.

Let's say now that we want to know how many prey of each species each lion killed. To do that, we first do a group_by(), and we group by lion and by species. Then we summarise our data points, that is, we aggregate them. The aggregation sums the kill state, kill (1) or no kill (0), and I name that count `n_killed`, so this is the number of prey that were killed. Now we run that. It's "prey species" here. Here we go. We can see that the lion [name unclear] has killed two buffalo, one eland and 20 kudu, etc.

What if I wanted to sort my results from the highest number of kills to the lowest? I would use arrange() with desc(), for descending order, on `n_killed`. Let's look at our result, `df1`. You can see that the lion Jess has killed 38 ostriches. Same for John; no, John has 33. Then Gina got 22 kudu, and on and on.

Only one dplyr verb remains, and this is the select() function. I often use select() like that at the end to reorder my columns, or to rename them. Let's say that I want to rename lion ID to just "lion", so it's a bit shorter and cleaner. Instead of prey species, I want only "prey". `n_killed` is fine, so I keep it as it is, and I can just give the columns back like this. Okay, then I run it, and you see that select() had the effect of renaming those columns.

You can also use select() to take only a subset of your dataset. If my dataset had been very big, I could have used select() right at the beginning, like that. I could have applied select() here to `df` and asked to keep only the lion ID column; I would also have wanted the prey column and the kill-or-not state. That one has quite a long name, and I would have needed that column too, so let's say I would have saved it as `is_killed`, and then I would have used `is_killed` in this case, which would have been a bit easier. And that would be it.

I could actually have just renamed everything at the beginning like that, skipped my last select(), and made sure that I use the right column names throughout. I would have used "lion" everywhere that I have lion ID, and "prey" everywhere that I have prey species, which would have been shorter, and it would have been a good idea. So let's run that and see. Prey is not found. Oh, I didn't use the pipe. And then that's it. You see that this would have been equivalent; it's just another way to use the select() function.

So this was an example of how to use dplyr. I hope you liked it. Thanks for watching, see you next time!
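The workflow walked through in the captions can be sketched end to end as below. This is a minimal sketch, not the code shown in the video: the toy data frame, its column names (`lion_id`, `prey_species`, `kill`) and all values are assumptions made for illustration. It also assumes dplyr 1.0.0 or later, because of the `.groups` argument of summarise().

```r
library(dplyr)  # also makes the %>% pipe available

# Toy stand-in for the lion data. Column names and values are assumptions.
df <- data.frame(
  lion_id      = c("Jess", "Jess", "Jess", "John", "John", "Gina", "Gina"),
  prey_species = c("Kudu", "kudu", "ostrich", "unknown", "Eland", "", "kudu"),
  kill         = c(1, 1, 0, 1, 1, 1, 1),
  stringsAsFactors = FALSE  # nchar() below needs character, not factor
)

# Without library(dplyr), filter() would resolve to stats::filter(),
# a different function, which is the error hit in the video.
df1 <- df %>%
  # Keep rows that are neither "unknown" nor empty; commas in filter() mean AND.
  filter(prey_species != "unknown", nchar(prey_species) > 0) %>%
  # Overwrite the column with its lowercase version, so "Kudu" and "kudu" merge.
  mutate(prey_species = tolower(prey_species)) %>%
  group_by(lion_id, prey_species) %>%
  # Summing the 0/1 kill flag counts the kills per lion and prey species.
  summarise(n_killed = sum(kill), .groups = "drop") %>%
  # Highest number of kills first.
  arrange(desc(n_killed)) %>%
  # select() can rename while it selects: new_name = old_name.
  select(lion = lion_id, prey = prey_species, n_killed)

# Variant: select and rename first, then use the short names throughout.
df2 <- df %>%
  select(lion = lion_id, prey = prey_species, is_killed = kill) %>%
  filter(prey != "unknown", nchar(prey) > 0) %>%
  mutate(prey = tolower(prey)) %>%
  group_by(lion, prey) %>%
  summarise(n_killed = sum(is_killed), .groups = "drop") %>%
  arrange(desc(n_killed))

print(df1)
```

Both variants should give the same table. Selecting first mainly pays off on wide datasets, since the unused columns are dropped before the other verbs run.
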
Info
Channel: R Programming - DDS
Views: 4,164
Rating: 4.9712229 out of 5
Keywords: RStats, R programming, R programming tutorial, R tutorial, R introduction, RStudio, DDS, DDSR, dplyr, tidyverse, datascience, example
Id: 27zCOiIWwhE
Length: 10min 38sec (638 seconds)
Published: Fri Apr 03 2020