Introduction to ggplot2 in R

Captions
See it working? Okay, all right. So if you can all open up RStudio, this is probably something like what you should see. If you're unfamiliar with RStudio, I'll give a really quick overview. This here is your R script. You can think of it kind of like your notepad, for what you want to write. None of this code will actually run through the R console until you want it to; this is just a notepad. Then down here we have the console, which is where the R code is actively being run, so any line that you type in here will be run through the R program. This shows any variables in your environment, and this pane will show the files in your directory; there's Plots, there's Packages. You don't need all of these things, but anyway, when you make a plot, it will pop up here. This is what we're going to be doing today. So if you have not already, we're going to do install.packages and then, in quotations, ggplot2, and then hit Command or Control-Enter. Let's check that it's loaded in my library; you need to do that as well. Once the package is installed, you need to load it into your current environment, which means the package is available for use in R. I want to talk about what ggplot is really quickly here. I'm writing this stuff in a comment; comments are things that are not read by the program but are there for your visual pleasure. So, the basic components of a ggplot2 plot. You need a data frame. If you're unfamiliar with R, a data frame kind of looks like an Excel file: it's got rows and it's got columns. The example data set that we're going to be working with is one that's built into R, mpg. So if I just go ahead and run mpg, this is what your data frame will look like. So you have your
columns, which show all the manufacturers, the models. This is car data, so for whatever cars these are, they have a bunch of different information about them. Cool. [inaudible] So yeah, we need a data frame, and that's the data frame we're going to use. All right, so the next part we deal with in ggplot: we need a data frame first of all, and then we have these things called aesthetic mappings. The aesthetics are how the data are going to be mapped to things like size and color, how you want to display your data. So, essentially, the aesthetics. And then we have geoms, and these are the geometric objects, so these are things like points, lines, shapes. You would give the geom certain aesthetics. Make sense? Okay, so those are the three basic things. The way ggplot works is kind of interesting: it's built in layers. The gg stands for grammar of graphics, and this is the whole mindset behind this package. The reason everyone uses it is that it's really easy to layer things on, and it lets you modify a graph really, really easily. So let's get into it. Let's make a graph. We're going to call the ggplot function, which is the main function in ggplot, obviously. RStudio actually does something really nice where, if you type in a function, it will tell you all the arguments that function can take. So we're going to tell it our data is equal to mpg; we're using the built-in mpg data frame that comes with R. And the aesthetics tell it the x and y variables. I believe it was x and y, right? Yeah, thanks. Okay, so we're going to have the displacement, I guess, whatever
that means for cars, versus the highway mileage, which, whatever, doesn't matter. But if we run this function and try to call g, [inaudible] okay, well, there's nothing on it, right? It's just an empty graph, because we didn't give it any geoms; we didn't tell it that we need something to display our data. So what you do is add to your ggplot. This is actually somewhat different from regular R syntax, if you've been using R for any amount of time but aren't familiar with ggplot: you just use the plus operator to add on to your graph. Let's say we want a scatter plot. So we're going to call geom_point. We're not going to give it any aesthetics yet; we're just going to use the default, plain aesthetics. If we run this command in our console and then run g again (I'm typing g in the console here so that it displays it), now we've added points to our graph and we can actually look at it. So that is the gist of ggplot. You can imagine that this plus operator makes it really easy to add on whatever you want. So let's add a smooth line with geom_smooth. What this does is give us a line of best fit. Method is one of the arguments you need for this, and we're going to say lm, which is just regular linear regression. If I hit Command-Enter on that, we've changed what we defined for g, so let's see what we've got here. Now we have a linear regression line for this, and the default, I believe, gives a confidence interval of 95 percent, so that's what this gray band is. You can modify that to suit your needs for your plot. [inaudible] So, like, that was really fast, right? So imagine you already had your data set, like your professor gave
you an Excel file, and you imported your Excel file into a data frame in R with just another very simple function: you can throw up a plot really, really fast. So that's the nice thing about R. But to be honest, this is kind of an ugly plot, right? Gray, with random lines. I don't like it. So let's change the color of our points. Let's go with color equals blue; this is a default color. [crosstalk] I'm just trying to separate it out so you can see it; you can do it all in one line, you don't have to lay it out like this. So now if I call g again, we've modified our plot, so now we've got blue points. Cool. I mean, not that exciting. Color is one type of aesthetic for our point geom. Does that make sense? The interesting thing about ggplot is that each geom can have its own aesthetics, like I've just done there with color equals blue. This blue color only applies to the points, right? We're not coloring anything else on the graph. But if we wanted to color based on our data, which obviously could be informative, you have to specify that with the aesthetic function. So let's do that. Within our geom_point function we're going to call the aes function, and we're going to say color equals drv. This is the drv column from our mpg data frame. If I run this (because every time we change the plot you have to rerun the command), now we have colored our points based on the drive. Now we're showing three types of information on, you know, a two-dimensional graph; I guess it's almost three-dimensional. Anyway, we have our highway mileage, our displacement, and then I think this must be, like, front-wheel drive, four-wheel drive, and whatever. So, plus we
have our line of best fit. We can even change the size of these points, so let's set size equal to 4. [inaudible] Oh yeah, now it's a little easier to see. And the nice thing about ggplot is that, let's say, let's change the size of this too; let's make size equal to 2. Remember, these arguments only apply to the geom_smooth function, so we can still have our size 4 dots but have our line with size 2. And I think se equals FALSE will get rid of the error band. So if I run that and take a look at our graph... okay, yeah, so that's a big line, but it got rid of the confidence interval. So now we just have our regular linear regression fit line, and we have our colors. ggplot also has these things called themes that are built in. I always use theme_bw, black and white. [inaudible] If you do theme_bw, it automatically changes the plot for you, and I think it's better than the regular gray. I just like a white background; I find it easier to read. And within the theme you can change the font: the argument to change all the text is base_family. So let's try Times. [inaudible] Then we have a look at the plot. Okay, now it's changed to Times, and it looks a little cooler. And you can see that in literally 15 minutes, not even, and I'm obviously going slowly because I'm demonstrating, I've made a pretty decent graph, I think. You can modify this as much or as little as you want. Let's add a title. The function to add a title or labels is labs, with title equals, and let's run this through the console. So we've
added, if I can make this bigger... oh yeah. And you can see here that g, our plot, is a list of nine. That probably won't make sense unless you know a lot about R. [inaudible] Within the labs function you can add an x label, [inaudible] liters, and then you can add a y label, miles per gallon. Run g again... the scroll pad on my computer is a little finicky. Okay, so here we go: we've added miles per gallon and engine displacement, and we've added a title. [inaudible] And this is just a scatter plot, right? The other great thing about ggplot is that there is a ton of resources out there, given how many people use it. Can I open a new one? [Music] Here we go. If you go to docs.ggplot2.org/current, this will give you every type of geom possible with ggplot. You can do bar plots, box plots, density plots, dot plots, violin plots; any plot your little heart desires you can do on here. And there are also scales. Say you had data that was numerical and you wanted to scale it continuously on a gray scale; you can do that with different functions. [inaudible] The other thing I wanted to show you real quick is that ggplot has a cheat sheet, which is a really, really good PDF. I have this printed off double-sided near my desk. It's really good, actually. It tells you the basics of the package and the plots that you can do if you have one variable that
is continuous or discrete ([inaudible] bar plots for discrete), or, you know, if you have two variables. Whatever kind of variables you have, it gives you a nice overview of all the types of plot there are. And then the next page covers things like what to do if you want to change the stats, which I don't really do in ggplot at all, but I think this is more for histograms or density plots. Or special scales: you can set manual colors if there are particular colors you want, or you can scale based on aesthetics, like how we colored our plot based on another column in our data frame. You can change positions; here are the titles, the themes. I think the guides function gets rid of that legend that automatically pops up. So that is a really awesome resource. These two resources will be your best bet, or you can do what I often do and just Google your question; guaranteed, Stack Overflow will have somebody who ran into the same problem, or you can even post there, and I've done that a few times as well. So I might show you faceting. I tried this before, and on some people's computers, when we ran a workshop [inaudible], we had people with really small computers and it just crashed. So unless you have at least two gigabytes of RAM, I probably wouldn't do this; they must have had like one gigabyte of RAM or something, and it just crashed the computer. But generally, if you have a normal computer, you should be fine. That's kind of a downside. [inaudible] So that's kind of the downside of R in general, is
that pretty much everything is done in your RAM, your working memory, so that's always a downside. People have made packages to deal with that. A lot of this stuff, like ggplot, was made by Hadley Wickham and his group, and he's done a lot with that kind of thing; I think it's C code under the hood. So people have written packages to work around it. That's really the main downside of R, I'd say. If you're working with huge data sets... I remember when I first started using R, I had to use the big iMac in the lab, because if I ran it on my computer it would just crash. So that's a word of caution, and this is for really huge data sets; we're talking tens of thousands of rows. But anyway, let's facet something. The function is facet_wrap, and this is kind of interesting. Which variable should I wrap it with? Let's just do drv; we have it colored by that, but we can also facet by it. Okay, so here, this is what faceting really is. What we've done is separate the points out. They're still colored by the drive too, but now, with one small function, we can look at highway miles per gallon versus engine displacement for just the four-wheel-drive cars, the front-wheel-drive cars, and the rear-wheel-drive cars. We have it all separated out, they all have their line of best fit, and that was pretty easy. With the tilde, it kind of matters which way round it goes. If I put drv on the other side, it's going to do it the other way. [inaudible] Oh, right, it tells us that's not what we want; we want to go like this. I don't really facet much, to be honest, but it is a useful tool. [inaudible] Oh, so there's
two functions. I don't really know the difference between wrap and grid. I guess this is it: if you want to do it this way... yeah, this is what I wanted to show you. If you want them lying on top of each other, you have to use facet_grid; otherwise you would use facet_wrap. [inaudible] I think the default is for it to lay them out in columns, but if you put the variable before the tilde, it will lay them out in rows. Okay, that's probably something to figure out on your own time. [inaudible] Anyway, I'm not a programmer; I'm a dabbler, but I do use this a fair amount. You can also make heat maps with it. I wanted to show you mine, but the data is on my computer; I was going to show you what I make in ggplot, but I forgot the adapter for my computer, so sorry. But heat maps are something I do a lot, just based on the work that I do. [crosstalk] We should buy some adapters for the group. Okay, so I do have one here, using the reshape2 package. I didn't tell you to download this, but I'll just quickly throw it up; I have this all done. [inaudible] [Music] I'm installing dplyr; I think it'll take a while. All right. It's a good package to have. If you do any kind of bash scripting, you know about piping; dplyr allows you to pipe in R, so that's really the main benefit of it. Instead of typing command after command, you can just pipe it all through. I've gotten into using it recently, because the thing with any kind of heat map is that you
usually need to change the structure of your data frame from the normal wide format into the tall format, where you need, like, two columns, and that's how you would make a heat map in R. I don't know if anyone cares about that. [inaudible] Does anyone have any questions about ggplot, or R in general? No? Nothing? So I use mtcars, which is another data set in there. I don't know why there are a lot of car data sets. If I just look at that, it's more car data. So this is kind of an intro to dplyr as well. This is the pipe command: percent, greater than, percent. What that does is say, okay, take mtcars, use that data frame, and we're going to select. Oh, I think I need dplyr loaded too, right? [inaudible] Okay, cool. In every R script I always load, like, dplyr, tidyr, and ggplot, because I never really know which one's which. Anyway, we're going to select columns 1, 3, 4, 5, 6, 7. If you use Python at all and have no idea what R is: R doesn't do that thing where it starts counting at 0; it starts counting at 1. So what I'm saying is, let's select mpg, displacement, horsepower, drat. I'm getting rid of columns because I only want the numerical data, those that are fully numerical and not factor data, because I want to make a heat map. Then I'm saying, all right, once we've selected the columns we want, we take the pipe and perform a correlation on it. The next step I have here: I'm going to round my correlations, and then I'm going to melt. This is from the reshape2 package, which I don't know if you've downloaded or not. [inaudible] reshape is basically doing what I said, turning it from the wide into
tall format, if you're curious about that. And then we can actually pipe straight into ggplot, which is really nice. geom_tile is the one you're going to want for a heat map. So if I run all this... that took me, what, two minutes, not even. [inaudible] Oh, I didn't load the package. Okay. [inaudible] So, all the correlations: you can see how all these variables are correlated with each other, or not correlated at all. Using a few simple packages, we quickly threw up those correlations. When we use the melt function, it automatically names your first column Var1 and your second column Var2. So it did the correlations and then we melted it. Do you want me to show you each step as an individual variable? I can do that, actually. This would be the long way of doing it, without using the pipe. [crosstalk] So that's what I'm showing you right now: I'm going to show you what the melt function does. [inaudible] Anyway, [crosstalk] I probably should have shown you this first. I wanted to do this more in depth but didn't have time to figure it out today, so I took something I had already made and tried to modify it. So that's called piping. I'm going to show you how you would traditionally do this in R, which is what I'm doing right now, where you define all the
variables. So t is going to be the select, and then we say, all right, next up, let's perform the correlation, so r is going to be cor of t. We go step by step and do everything that I just did. Then we'll say q is going to be round of r to two digits, and then, what letter haven't I used, w is going to be melt of q. Okay. So this is how you would do it the usual way, storing everything in a variable. If I just do head of w... I'll show you the head. What we started with was our correlation values for everything, and then I took those and melted them into one variable versus the second variable, and then the value that goes with that pair. That's reshaping the data; the package is called reshape2. We need it to be in this long format, the w format, for geom_tile to work. As for what the percent thing is doing: you need the dplyr package, like I said, and that's what took a while to install; you can install it on your own time. What it does is, instead of doing it all this way and continuing on, writing ggplot of w and then all our aesthetics and geom_tile, and clogging up your environment with a bunch of different variables, you write it this way. I mean, the long way makes it easy to go back and look at all your stuff, so that's the bonus of doing it the long way, but this way saves memory and makes it faster. [crosstalk] Okay, this is called a pipe. Instead of typing out the command like this, you're saying: take this data and do this with it. It's kind of the backwards way of thinking about it. So you're saying take this data, do this with it, then take this data
and do that with it, then take that data and do this with it, and then take whatever the output of this is and do that with it, right? So you're funneling your stuff through it, as opposed to doing it this way, where you're saying: create a variable doing this thing, now create another variable doing this thing on the old thing, then create a new variable doing this thing on the other old thing, you know what I mean? So it makes your code easier to read as well. [inaudible] And in the end it's also more efficient, I think, for your computer, so it doesn't clog up so much of your RAM. [crosstalk] We should do a separate session on that at some point. [crosstalk] Yes, you can sort of do surface plots, but I've never done it. How do I get back to... It's not fully 3D, but you can do a density 2D plot, kind of like this. It's kind of like a contour plot. I saw one where someone did a volcano, and it was the shape of the volcano and then the height, and they had other stuff on it. You can make a graph like that, but you can't do a true x-y-z axes plot; I don't think there's anything for that. [crosstalk] For a third dimension I use color, so I guess that's what they're describing as three variables. An aesthetic like the fill technically, I guess, is another variable, and you can do that using a continuous variable, like I just did with the correlation. If I go back to RStudio, this is a full-on gradient, right? Like, it's
not plotting based on discrete values: one is this color, zero is that color, and everything in between is the gradient. So I guess that is a third variable, because it is a continuous variable and not discrete. That's not really something I ever do, but you can do, I guess, a third continuous variable. My data never gets that complicated. [inaudible] I think you could do that with radius, kind of like a scale. I'm sure there is a way to do it. I think the guy who wrote ggplot says they're kind of misleading. But like I said, there has been a ggplot release for Python, as far as I know; I'm not sure. [inaudible] So this is something if you're more familiar with Python programming. It uses the very same approach of adding layers on; they have it as plus [inaudible]. There might be a way you could marry the two in Python. I feel like Python has just been more robust for that kind of stuff than R, but it uses the very same kind of syntax. [inaudible] So it uses the aesthetics, the geoms, and all that kind of stuff. The most confusing part is that if you want your aesthetics to reference back to your data, you have to use the aes function; otherwise you just change it right in the geom. So that's the difference: these are aesthetics for the geom, and this is an aesthetic that refers back to your data and is specific to geom_point. And then these aesthetics, the x and y variables, apply to the whole graph. So anything in the ggplot function applies to the whole graph, and then these are
their own individual things here. That's why ggplot is called the grammar of graphics: it's saying, let's be able to separate this all out so we can modify the exact thing that we want. Like I said, changing the color, changing the font, whatever you want to do; I can specifically change the color of each x and y label and get really fancy. Any other questions, or are we done here? Good. Does everyone feel like they could at least muddle through a plot? [crosstalk] Sure. So here you go to Export; you can save them as an image or as a PDF. I would recommend saving them as PDF. Then you can preview it, and you can change the size if you want. It's not the best way to modify them, but you can always save them as a PDF, which means you can open them in Illustrator, which I do a lot. If you want to put two figures together that you've made in ggplot, I tend to do that in Illustrator, or you could use any sort of PDF editor. Or you could save it as a JPEG or another kind of image; I don't really use those, I do everything in vector-based graphics, but you could do that. So that's how you would export a graph. [crosstalk] What, ggsave? [inaudible] Okay, you can also specify the size. [crosstalk] So either of those methods will work to save your plot. And if you're just getting into R and want to get your own data in, there are two functions, read.table and read.csv. read.table is for, like, XLS files, Excel [inaudible], and read.csv is for CSV files. I kind of do everything
in CSV, because I find the Excel format doesn't always want to work with R. It's easier to just save your Excel file as a CSV in Excel and then use your CSV in R. read.csv will take your file and put it in a format that looks like mtcars, where you have your row names, whatever row names you've specified in your CSV file; you would put row.names equals whatever, and you can think about how you want to modify that function yourself. Then it will take every column you have in your CSV and put that in as a column in a data frame in R. So it's very easy and quick. And generally that tends to be what people have; like, that's what your supervisor's work looks like, at least mine does. And if I had data that I modified, say I calculated correlations on my data and I want to save that table, you can also export CSV files from R. So it's very easy to put in your data and take it out. [inaudible] But you can easily export your data. [inaudible] Now, before I go, does anyone have any other questions, however trivial or beginner? This is an introductory class, so please, there's never a [inaudible] question. Yeah? Cool. Thanks, guys.
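
The captions do not capture the code typed on screen. What follows is a minimal sketch of what the walkthrough appears to build, reconstructed from the spoken description, and not the presenter's actual script. Assumptions: the mpg columns displ, hwy, and drv are the ones meant by "displacement", "highway", and "drive" (mpg ships with the ggplot2 package); the title text and the object names g_facets, heat, and out_file are hypothetical; "serif" stands in for Times so the font resolves on most systems; and linewidth is used where the talk says size, since recent ggplot2 releases prefer it for lines.

```r
# Minimal sketch reconstructed from the spoken walkthrough; the exact code typed
# on screen is not in the captions, so names and label text are assumptions.
# install.packages(c("ggplot2", "dplyr", "reshape2"))  # one-time install
library(ggplot2)   # plotting; also provides the mpg example data frame
library(dplyr)     # select() and the %>% pipe
library(reshape2)  # melt()

# Data frame plus aesthetics that apply to the whole graph
g <- ggplot(data = mpg, aes(x = displ, y = hwy))

# Layers are added with `+`. aes() inside a geom maps a data column to that
# geom only; arguments outside aes() are fixed settings for that geom.
g <- g +
  geom_point(aes(color = drv), size = 4) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 2)  # older ggplot2: size = 2

# Black-and-white theme, a serif font (standing in for Times), and labels
g <- g +
  theme_bw(base_family = "serif") +
  labs(title = "Highway mileage by engine displacement",  # hypothetical title
       x = "Engine displacement (litres)",
       y = "Highway miles per gallon")

# One panel per drive type; facet_grid(drv ~ .) would stack the panels in rows
g_facets <- g + facet_wrap(~ drv)

# Correlation heat map: select numeric columns, correlate, round, melt to long
# format (columns Var1, Var2, value), then pipe straight into ggplot
heat <- mtcars %>%
  select(1, 3, 4, 5, 6, 7) %>%   # mpg, disp, hp, drat, wt, qsec
  cor() %>%
  round(2) %>%
  melt() %>%
  ggplot(aes(x = Var1, y = Var2, fill = value)) +
  geom_tile()

# Typing g, g_facets, or heat at the console draws the plot.
# ggsave() writes a plot to a file; the path here is a temporary file.
out_file <- file.path(tempdir(), "mpg_plot.pdf")
ggsave(out_file, plot = g, width = 6, height = 4)
```

The step-by-step version described in the talk (t, r, q, w) would produce the same melted data frame as the piped version; the pipe simply avoids naming each intermediate result.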
Info
Channel: UofT Coders
Views: 2,266
Keywords: ggplot2, data visualization
Id: jQVKQcbu92o
Length: 53min 26sec (3206 seconds)
Published: Thu Feb 16 2017