Tidyverse and Tidyquant

Captions
All right, so I was talking to my buddy Lucas, who's a student at McCombs at the University of Texas at Austin, and there are just a couple of things I want to show him, but this is for anyone who's interested. In R, a set of tools has been created that, in my eyes, makes a lot of the base R functions a bit obsolete. It's a new and cleaner way of writing code, I really enjoy it, and a set of packages has come out that builds on this tidyverse package, so I'm going to introduce it to you real quick.
First off, if you just google tidyverse, you can look at all the packages that are available. dplyr is really good for data manipulation, and it's the essential part of the tidyverse that I'd suggest you learn. The second most important, I think, is ggplot2, which is a lot of what has made R graphics and plots so wonderful. tibble is a kind of replacement for the data frame in R: it's a bit cleaner and neater, and it also yells at you more, which is good. readr is for importing data, and purrr we can get into a little bit later.
Okay, so let's jump in. We're going to load the tidyverse package, which loads all of those packages you saw before, and we're also going to load tidyquant. As you can see, there are all these different tidy packages that have been created to be used with the tidyverse, so let's load both of those.
All right, so why is the tidyverse great? Let's take the mtcars data set, which always shows up in R; it's always loaded for you. In the past, if you wanted to create a column, say weight divided by MPG, you would write something like mtcars$wt / mtcars$mpg. That's all great, but honestly I hate those dollar signs, and getting them out of the picture would make writing code a lot cleaner. That's one way to create a column. There are also ways of filtering data: in the past you'd write mtcars[mtcars$mpg > 21, ], run that, and get just the cars with MPG greater than 21. It's been so long that I've nearly forgotten how to sort... okay, so you want to sort a data set in R. order(), that's what it was. (Actually, I'm jumping the gun and showing things I haven't talked about yet.) So we pop order() in here, select the columns we want, and it comes back. Point being, this sucks, so let's learn a new way of doing these things.
The first thing I want to show you is the pipe, %>%, which can be read as "and then". (Yes, I'm drinking wine.) The pipe comes from the magrittr package, but it loads with the tidyverse, and dplyr is built to use it, which I'll show you in a second. For example, say we have a vector of the numbers 1 to 10. Instead of wrapping it in max(), we can say: build the numbers 1 through 10, and then take the max. This makes very readable code, because you don't have to deal with something like max(head(1:10)). Instead it's: take 1 through 10, and then take the head, and then give me the max. If you hit Shift+Command+M on a Mac, it pops out the pipe; there's a shortcut for Windows, I just don't know what it is.
So let's get into this. We have this iris data set, but actually, let's open up dplyr first. You can see its different functions here; there's a whole lot of tools that make data wrangling super easy, and there are a couple of verbs, or functions, that are super useful. I was going to use iris, but let's do mtcars and look at the head. Let's filter am equal to 1 and MPG greater than... whatever you want, and how about horsepower less than 100. Now we've subsetted the data however we've decided to designate it. Then we're going to mutate, which means create a new column, so let's create the column from earlier: mpg_div_wt = mpg / wt. Once we're done, let's select mpg_div_wt, mpg, wt and, I don't know, drat. So you can see how quickly we can subset this data.
There's a ton of stuff you can do; this is just scratching the surface. You could do something like select_if(is.numeric), which checks each column and only gives back the numeric ones, or select_if(is.factor). I don't think there are any factors in mtcars... no, there aren't. But if we ran iris through select_if(is.factor), we'd get back just the factor column.
For example, take the New York City flights data; I think the table is called flights. I'm only going to use dplyr functions here. We'll call this flights and look at its head. Let's take flights and then mutate date = paste0(month, "-", day, "-", year), and look at that. We've got this date, but let's also use lubridate: library(lubridate). (Take a sip of wine, because it's delicious.) We'll use its mdy() function, which converts something to a month-day-year date. For example, "1-1-2017" is a character, and mdy() converts it to the Date class, which makes plotting nice and easy. Now that we've got a date, we don't really care about year, month and day anymore, so we can say select(-year, -month, -day), and for good measure select(date, everything()) to put date first and everything else after it. What have I done here? Maybe I understand this less than I thought... oh, hold on, I see: we drop those columns altogether, then select date and everything else. It got rid of year, month and day, so okay, cool. Anyway, these are just things you can do.
Now we could either save this as a variable or just keep going down the pipe. We can say group_by(date) and then summarise, which computes some sort of summary statistic. We can see that different dates have different departure times; averaging those wouldn't be very useful, but we can do something like average air time by date with mean(air_time). All right, average air time by date; that's not very useful either. What I do think is useful is n(): each row is a flight, so let's just count all the flights. Now we have a date and the number of flights, so let's save that as flights_by_date. Then, just to get a quick look at it, we can call plot() with type = "l". We can see there's some sort of periodicity, like weekly periods here.
We could further subset by date if we wanted, so let's filter flights_by_date to dates before February 1, 2013, using mdy(). Let's see, there are 31 rows, so that's good. Then we can pipe it into ggplot() with aesthetics x = date and y = the flight count, and add geom_line(). All we've done is a simple count of how many flights there were each day, and now we can see that there are probably more flights during the week, and I know what this dip is here. We might even be able to find the day of the week. Let's see, I wonder if there's a way to get the day name... weekdays(), let's try it. Yeah, for the first day it gives Tuesday. So let's take flights_by_date, mutate a weekday column equal to weekdays(date), and then arrange in descending order. It looks like Saturday is the least popular day, so maybe Saturdays have the least amount of traveling, while Wednesday and Thursday seem to stick around about the same.
Anyway, that's dplyr. Let's open up tidyquant. There's tq_get_options(); this is what you look at to see what kind of stuff you can get with tidyquant. We're going to use the function tq_get(): x is probably a ticker, get is what we're actually looking for, and complete_cases will, I think, keep you from getting back NAs and probably cut out those lines. So let's look at Apple and Amazon and get stock prices. It runs, and we get back a data frame of these stock prices in what would be considered a tidy format: every row is an observation, which in this case makes it easy to plot and visualize. So we get this data back as stock_data, and it's really easy to visualize. We could say ggplot(stock_data, aes(...)); I need the column names... so symbol and adjusted: x = symbol, y = adjusted, and the colour (the guy who wrote this is from New Zealand, hence colour) equal to symbol, and then geom_line(). What am I doing here? Oh, x should be date. There we go. Now we can see these two series plotted against each other, and you could use your dplyr skills to get a subset of this.
What's kind of cool is that you can also apply functions to these columns right inside the aesthetics. I guess in some instances you could take the log of the adjusted price, and you could even stack multiple functions in there that just reverse everything you've done. Not that you ever would, but it's things you can do nonetheless.
Okay, let's check something else out: financials. tq_get again, and what were our options? There's key.ratios. We need to make this a vector, "AAPL" and "AMZN", then get = "key.ratios", and let's call that good. All right, let's look at it: we've got key_ratios here, and we get symbols back, sections back, and data back. What we're looking at are nested data frames. Basically, it's like taking an Excel spreadsheet and putting it in one little cell. This makes data really nice and tidy; we just have to know how to deal with it.
Let me show you on a separate example: let's open up iris. I could say nest everything but Species, so now, for each species, all of the rows for that species are packed into their own little cell in a separate column. If I filter Species == "setosa" and then unnest it, I've just packed it all up and unpacked it, but that's the way you would do these things. Or let's use mtcars for this example: group_by(vs, am), simple enough, and let's try nest() and see what happens. If we group by these two columns, it finds all the unique combinations and creates a separate data set for each of them, and we could unnest them again. It's just a way of tidying things up and keeping them clean.
So this is how the data is coming back in key_ratios. Let's look at profitability: filter section (that's the column name) equal to "Profitability", P-R-O-F-I-T-A-B-I-L-I-T-Y. Now that we're only looking at profitability, let's unnest it and see what we're looking at, and then look at sub_section. We're going to pull(sub_section), which just lists that column, to see the different types of subsections; it says there are two subsections of profitability. Why don't we plot the margin of sales? What have we got here... category, there we go, that's what I was looking for. Let's look at the EBIT margin, so filter category equal to that margin. Now we're only looking at that one measure, for Amazon and Apple, so why don't we throw it to ggplot: aes(x = date, y = value, colour = symbol) with geom_line(). Now we're looking at the EBIT margins for Apple and Amazon together. We just pulled out one measure of profitability at random; there are many more, and we've filtered the data down to a ridiculously small subsection of it all.
Let's go back to key_ratios and look at something else: filter section equal to "Efficiency", unnest, and go crazy with ggplot, aes(x = date, y = value, colour = category). This might look completely ridiculous... and it does, because we've coloured by category but this is for two different companies. So let's filter symbol equal to "AAPL". Here we go: these are all the different efficiency ratios for Apple. It doesn't really make sense to plot all of these at once unless you're looking at some sort of period thing, but you can do it, and you could do one for Amazon as well and pick out whatever is important to you just by filtering.
So that's some of the stuff you can do with tidyquant. Obviously there's a lot more, but if you're a business student over at McCombs, which is who I'm making this for, you can immediately see how useful this is for making plots that compare companies, and whatever else. Especially when you get into that financial capstone course where you have to do a lot of research, this is a really quick way of pulling those numbers out, and now I think you should at least have an idea of how to do it.
Let's see, there's key.stats here; let's check that out real quick... This is because, I think, Yahoo has turned to crap. Yeah, I don't think we can do that anymore, because Yahoo has turned off all of their stock data. There are some other ways to get the key stats; I can talk about them later if you want. But I can't just leave it with something going wrong, so let's look at dividends. There's get = "dividends", and here's another kind of cool function. We have our stock data, and we'll call the dividend result div_data. So we have the stock data but not the dividend data, so let's do left_join(div_data), and hopefully we'll have dividends on the correct days; we need to make sure we join by symbol and date. Yeah, cool.
So now we have the dividends in the table on the days they were paid. There obviously isn't dividend data for every single day, especially early on in some of these companies' lives, but it's there, and it's connected to the data set. Let's save this as stock_div. There's another cool function in dplyr that I like, called case_when(). We have all these NA values here; let's just View() this real quick. If I scroll down, this is back in 2008; either that was before Apple paid a dividend or... yeah, it looks like Apple's dividends start around 2012. And Amazon, I guess, has never paid a dividend; let me just verify... yeah, it looks like that's the case.
Now we've got the stock and dividend data together, and that column is named dividends. So let's say mutate(dividends = case_when(is.na(dividends) ~ 0, TRUE ~ dividends)). This little piece says: if it's NA, give it a zero; otherwise, just leave it where it is. (As much time as I've spent in finance, I should know it's dividends.) So now we've replaced those NAs with zeros.
I really like this case_when() function, and you can get really ridiculous with it. For example, with mtcars (oh, sorry, it has to go inside mutate()): mutate(stuff = case_when(qsec < 17 ~ "turkeys", am == 0 ~ Inf, drat < 3 ~ some random number, TRUE ~ "I am hungry for lots of wine")). I don't know what's going to show up... let's see: an error about types. We can't mix classes, so let's wrap the non-character values in as.character(). case_when() goes down the conditions, and whichever condition is true first is what happens. If qsec is less than 17, you get "turkeys"; otherwise, if am is equal to 0, you get infinity as a character; if neither of those holds and drat is less than 3, you get this random number, and it looks like we never reached that case; and otherwise, "I am hungry for lots of wine". That sounds like a good idea.
Okay, that's the case_when() function. There are some other things I think are really cool, things I use every single day, so let's look at mtcars again. Let's select just mpg and am to make this smaller, and then group_by(am). group_by() is like splitting the data set up into many data sets without actually having to split it. So let's do mutate(mean_mpg = mean(mpg)). Because we used group_by(), every time you use functions like mutate() or summarise(), they're computed in relation to that grouping: since we grouped by am, mean_mpg is the mean within each value of am, so every row with the same am gets the same average MPG. The difference between mutate() and summarise() is that mutate() keeps the whole data frame, or tibble, together as one large data set, while summarise() collapses it down into summary statistics. So with summarise() we get am = 0 and am = 1 and the average miles per gallon for each. In mtcars, am is 0 for an automatic and 1 for a manual transmission, so in this data set the manual cars tend to get better MPG.
There's always something else I could talk about, but I think I'm going to leave it at that. You can get into some really interesting things with those nested data frames, where you could run, say, a thousand different regressions on a thousand different data sets created by nesting. There's a really good example of this using the purrr package, and Hadley has some really good videos to check out; "Managing many models with R" would probably be good. Actually, here we go, I believe it's this plotcon one. Yeah, this is it: look up Hadley Wickham's plotcon 2016 talk to find some really cool stuff with purrr, dplyr and these nested data frames, which are the primary point of that talk. Anyway, I hope that was a good introduction to some of these things. Sorry, I'm a little long-winded sometimes, but if you have any questions, just let me know. Alrighty.
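The opening comparison between base R and dplyr can be sketched as follows. This is a minimal, hedged sketch rather than the exact code typed in the video. It assumes dplyr 1.0 or later, which is attached by library(tidyverse). The column names wt_div_mpg and mpg_div_wt and the thresholds (mpg > 21, hp < 100) are illustrative choices.

```r
library(dplyr)  # also attached by library(tidyverse); re-exports the %>% pipe

# Base R style: $ everywhere, indexing and order() for filtering and sorting
cars_base <- mtcars
cars_base$wt_div_mpg <- cars_base$wt / cars_base$mpg
cars_base <- cars_base[cars_base$mpg > 21, ]
cars_base <- cars_base[order(cars_base$mpg), ]

# The same three steps with dplyr verbs; read %>% as "and then"
cars_tidy <- mtcars %>%
  mutate(wt_div_mpg = wt / mpg) %>%
  filter(mpg > 21) %>%
  arrange(mpg)

# The pipe works with ordinary functions too: max(head(1:10)) read left to right
top <- 1:10 %>% head() %>% max()

# Several filter conditions, a new column, then a column subset
manual_subset <- mtcars %>%
  filter(am == 1, mpg > 21, hp < 100) %>%
  mutate(mpg_div_wt = mpg / wt) %>%
  select(mpg_div_wt, mpg, wt, drat)

# Keep only the columns for which a predicate returns TRUE
iris_factors <- iris %>% select_if(is.factor)
```

The dplyr version names the data frame once, and the pipeline is read top to bottom. The base version repeats cars_base on nearly every line.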
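The flights walkthrough (build a date, count flights per day, plot January, look at weekdays) can be reproduced roughly like this. It is a sketch under the following assumptions:

- The flights table comes from the nycflights13 package.
- The count column name n_flights is hypothetical.
- Averaging by weekday is a small variation on the descending sort shown in the video.

Note that weekdays() returns day names in the session's locale.

```r
library(dplyr)
library(lubridate)    # mdy()
library(ggplot2)
library(nycflights13) # assumed source of the `flights` table

flights_by_date <- flights %>%
  # Paste year/month/day into strings like "1-1-2013", then parse them as Dates
  mutate(date = mdy(paste0(month, "-", day, "-", year))) %>%
  group_by(date) %>%
  summarise(n_flights = n())  # each row of `flights` is one flight

# January only, drawn as a line to show the weekly pattern
january <- flights_by_date %>% filter(date < mdy("02-01-2013"))
jan_plot <- ggplot(january, aes(x = date, y = n_flights)) + geom_line()

# Average number of daily flights per weekday, busiest first
by_weekday <- flights_by_date %>%
  mutate(weekday = weekdays(date)) %>%
  group_by(weekday) %>%
  summarise(avg_flights = mean(n_flights)) %>%
  arrange(desc(avg_flights))
```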
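The group_by() discussion near the end, mutate() versus summarise(), comes down to the shape of the result. In this hedged sketch on the built-in mtcars data, mean_mpg is an illustrative column name. In mtcars, am is 0 for automatic and 1 for manual.

```r
library(dplyr)

# mutate() after group_by(): all 32 rows are kept, and mean_mpg is the
# mean MPG of the am group that each row belongs to
with_group_mean <- mtcars %>%
  select(mpg, am) %>%
  group_by(am) %>%
  mutate(mean_mpg = mean(mpg)) %>%
  ungroup()

# summarise() after group_by(): collapses to one row per am value
mpg_by_am <- mtcars %>%
  group_by(am) %>%
  summarise(mean_mpg = mean(mpg))
```

Calling ungroup() afterwards is a habit rather than a requirement here. It keeps later verbs from silently working per group.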
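The case_when() example on mtcars can be written like this. It is a sketch, not the exact code from the video. The number 42 is a hypothetical stand-in for "some random number". Every branch returns a character value because case_when() refuses to mix types, which is the error hit in the video.

```r
library(dplyr)

cars_labelled <- mtcars %>%
  mutate(stuff = case_when(
    qsec < 17 ~ "turkeys",               # checked first
    am == 0   ~ as.character(Inf),       # only reached if qsec >= 17
    drat < 3  ~ as.character(42),        # hypothetical "random number"
    TRUE      ~ "I am hungry for lots of wine"  # fallback when nothing matched
  ))
```

Conditions are evaluated top to bottom, and the first TRUE one decides a row's value. A condition placed after a broader one may therefore never fire.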
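The dividend step joins dividends onto daily prices by symbol and date, then turns the resulting NAs into zeros. It is sketched here on toy tibbles so that it runs without a network call. The symbols "AAA" and "BBB" and every value are made up and are not market data. The column names stock_data, div_data and dividends follow the transcript but are assumptions about what tq_get() returns.

```r
library(dplyr)

# Toy stand-ins for tq_get() results: made-up symbols and values
stock_data <- tibble(
  symbol   = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  date     = as.Date(c("2017-01-03", "2017-01-04", "2017-01-05",
                       "2017-01-03", "2017-01-04")),
  adjusted = c(100, 101, 102, 50, 51)
)
div_data <- tibble(
  symbol    = "AAA",
  date      = as.Date("2017-01-04"),
  dividends = 0.5
)

stock_div <- stock_data %>%
  # Keep every price row; dividends is NA where no dividend row matched
  left_join(div_data, by = c("symbol", "date")) %>%
  # Replace those NAs with 0 and leave real dividend amounts unchanged
  mutate(dividends = case_when(is.na(dividends) ~ 0, TRUE ~ dividends))
```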
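Nesting, unnesting and the "many models" idea mentioned at the end can be sketched as below. The code assumes tidyr 1.0 or later for the nest(data = ...) syntax. The mpg ~ wt regression is an arbitrary illustrative model, not one used in the video.

```r
library(dplyr)
library(tidyr)
library(purrr)

# One row per species; each cell of `data` holds that species' rows
iris_nested <- iris %>% nest(data = -Species)

# Pick one species and unpack its rows again
setosa <- iris_nested %>%
  filter(Species == "setosa") %>%
  unnest(data)

# group_by() + nest(): one nested data set per unique (vs, am) combination
cars_nested <- mtcars %>%
  group_by(vs, am) %>%
  nest() %>%
  ungroup()

# "Many models": fit one regression per nested data set, then pull out a slope
cars_models <- cars_nested %>%
  mutate(
    fit   = map(data, ~ lm(mpg ~ wt, data = .x)),
    slope = map_dbl(fit, ~ coef(.x)[["wt"]])
  )
```

Each model lives in the same row as the data it was fitted on, so filtering or arranging cars_models keeps data, fit and slope aligned.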
Info
Channel: Freddy Drennan
Views: 8,580
Rating: 4.8490567 out of 5
Keywords: tidyverse, tidyquant, R programming
Id: WM2hctrlMts
Length: 44min 49sec (2689 seconds)
Published: Sat Nov 11 2017