R visualisation with ggplot2 – intermediate

Captions
Welcome to this course called Data Visualisation with RStudio and ggplot2: Intermediate. This is a follow-up to our introduction; you might have followed that at UQ or online. We're going to have a closer look at the ggplot2 package today. Specifically, we're going to have a look at installing a custom tool, or a tool for picking colours rather, customise scales and ranges, divide a visualisation into several facets, explore new geometries, modify statistical transformations, adjust a geometry's position, and have a closer look at themes too. We might also introduce an extra add-in to RStudio that can potentially make your life, or your work with ggplot2, easier. So I hope you find this session enjoyable and useful, and we'll get started with opening RStudio.

You should have R and RStudio already installed on your computer. I've got the latest version of RStudio, RStudio 1.2, and the version of R I am currently using is version 3.6.0, which is also the latest at the time of recording this video. Having up-to-date software and packages is always a good idea; I mean, often a good idea, unless you really want to stick to particular package versions that are important for your work.

So we've got on the left here our console panel. You can see the R prompt here waiting for input. There's some information about, as I said, our R version, and some recommended functions here. What else can we see in RStudio? We've got our environment at the top right, and we've got the Files pane here that shows us our home directory files, and a few more useful tabs here that we'll go through as we need.

We'll have to make sure that we've got ggplot2 installed on our computer before we can keep going. What you can do is use the function install.packages and, as an argument, give the string of characters "ggplot2". Once you execute this, it should fairly quickly install the necessary packages. If you already have the package on your computer, you will update to the latest
version, and once you're back at your prompt and there's no error message, you should be good to go. You can double-check in your Packages pane: you can find this panel displayed in this tab here in the bottom right of RStudio, and you'll find that ggplot2 shows up here. It's described as "Create Elegant Data Visualisations Using the Grammar of Graphics". The version is 3.1.1; that's the one we're going to use today.

The other bit of setting up that we'll do is creating a new project. We're going to go to the second button in the toolbar here, the second icon. You can click on that and select Create Project, then New Directory, since we don't have a directory to start with, and New Project, so a basic new project. I save all my projects inside an R projects directory, and I'm going to name this one ggplot2_intermediate. This is the name of your project, which is the same as your working directory, and this is where it's going to be saved. Then I click Create Project.

It restarts our R session. We still have our package installed, because you only need to do that once: it stays on your computer. What we'll need to do is load the package later on to access the functions inside ggplot2. As you can see if you scroll through your packages, there's a few packages that are already loaded automatically when a new session is started. For example, you can see graphics here and grDevices: those are ticked, so they're loaded automatically and their functions are available. There's quite a few in there that are part of base R and that are usually needed for your work.

So we've got our project. Everything that we'll do now will happen by default inside this directory, ggplot2_intermediate, and the only file that we can see currently is the .Rproj file, our project file, that you can reopen later on to go back to your work in RStudio. One more thing that we can do to keep things tidy is to use the dir.create function to
create a new folder, and we're going to call this one "plots" so we can store all our plots in there: dir.create("plots"), with a string of characters as an argument. Once you press Enter and execute the command, you can see the new directory here in your Files pane.

Finally, still setting up a little bit, we'll be using a script today. You can use the first button in the toolbar here, our first icon, and use the first choice in this drop-down menu for a new R script. You can also use the Ctrl+Shift+N shortcut if you want to do that more quickly. We've got a new panel opening: it's the source panel, the source pane, and you can start typing your code in there. I'll start with a couple of comments; they start with a hash symbol. We'll say this is, first, a description: intermediate session on ggplot2 (little typo here); an author, that's my name; and a date, which is today's date, the seventh of May 2019.

In this session we're going to play around with a dataset called gapminder. It's a small version of the data that's provided by the Gapminder project, and we're going to import that in a minute. So I'll add a comment, "import data", and we'll put a command here to import the data and execute it from the script. One thing before we keep going: our script is called Untitled1 and there's an asterisk next to the name. This means we haven't saved it. We can click the floppy disk icon here, or we can use Ctrl+S, like in most applications, to save our script. I'll click on that; by default it saves it in the working directory, so the project directory, and I'm going to give it the name "process". We can click Save or press Enter, and now we've got a process.R file in our working directory. You can see the name at the top of our source pane, and you can see it in our Files tab too. It automatically added .R, with a capital R, because this is the default extension for an R script. It makes clear to your system that this file, this text file, simple text
file, contains R source code.

All right, so to import the data, we're going to use a command that starts with the name of the object we want to save the data in, so gapminder, and use the assignment operator to store something inside it. I'll use the base function read.csv, and as an argument I only need one thing: where the file is located. This can be locally on your computer; it can also be a URL. You can find the URL in the description of this video, but I'm going to copy it from the course notes, which are also linked under the video, in the description. So I'll copy this URL here. There you go, a rather long one, so I'd rather not type it. In between double quotes, we've got this string of characters that tells read.csv where the file is located.

I can now press Ctrl+Enter. If you've got your cursor anywhere in the command, or at the end of the line, you can use Ctrl+Enter. It's the same as pressing this Run button here; you can see that it tells us Ctrl+Enter is the shortcut for it. This sends the command to our console and executes it. So it's worked: we've got a new object here, gapminder, and we can see the size of the dataset. I'm just going to make a note of this shortcut to execute from the script.

So our data is called gapminder. It's got 1704 observations and six variables, and if you click on this blue arrow here you can see the different variables that are contained in it. Those are the different columns in your data frame: country, year, population, continent, life expectancy and GDP per capita. You can also see how the data was imported, or interpreted: there's a factor, an integer, some numerical data, and another factor here. So depending on the data R has found, it has apparently done the right thing and assigned a type to the data.

If you want to familiarise yourself with the data, you can use a couple of commands, like summary, which will give you summary statistics for the dataset. Execute that, and you'll see in
your console, for each one of those six variables, some summary statistics, which change depending on the type of the data. So now we know that the dataset starts in 1952 and finishes in 2007. We can also see that for each country there are twelve observations, so twelve years, and we can also see the maximum life expectancy here, and the minimum life expectancy ever recorded.

Another way to explore your data is to use the View function, if you simply want to have a look at a tabular view of the data. It opens a new tab titled gapminder, and you can scroll through or search for a term. This is more similar to any spreadsheet program that you might be used to, so this might be a better way to explore your data. Make sure you use View with a capital V here, because R is case-sensitive: the case of your functions, objects, etc. will matter.

So we have a bit of an idea of what the data is. Now let's start exploring it visually. Previously, in the introduction to ggplot2, we got used to the usual syntax of a ggplot2 visualisation, so we're going to keep using that full syntax and those three essential elements. I'll go through them once more, and then we'll experiment a bit more with ggplot2. So let's make sure we load the package. You can do that with the library function: we need library(ggplot2). Execute that, and if you check in your Packages pane you'll see that ggplot2 is now ticked.

Now let's visualise our dataset. What we're going to have a look at in our first visualisation is the growth in population over the years. We can do that starting with the ggplot function (make sure it's ggplot and not ggplot2), where we can set a few defaults, and we're going to use our three essential elements. The first is where the data comes from: that's gapminder, our dataset. The second argument, after a comma, is mapping, which will take the function aes, for aesthetics, and we'll decide
to associate the aesthetic x with the variable year. We'll also use the aesthetic y, for the y-axis, and associate it to population, pop. Make sure you close the two parentheses, and after that we'll add a plus to let ggplot2 know that the command is not finished, and add our third essential element, which is the geometry. In this case we use the geom_point function to use the point geometry. So we've got where the data comes from, we've got the mapping of aesthetic elements to variables from our dataset, and we've got a way to visualise the data, our geometry. I can execute this and visualise the growth in population for this particular dataset, going from 1952 to 2007. We've got two countries that dominate the dataset here, and that's going to be China and India at the top, and everyone else bunched up at the bottom. So always remember to have those three essential elements: data, mapping of aesthetics, and geometry.

Remember that we can add an extra aesthetic element in aes, and for example we can colour our points according to another variable. In our case we're going to use the continent variable, which is a factor, a categorical variable, and we want to identify where those points come from. So we're going to go colour = continent. Execute that again, and with this extra argument we can now have some colour on our points and see which continent each point is from. There's quite a few different aesthetics that you can use in your visualisations. Some of them work with some geometries, others won't, so depending on the geometry that you use, some aesthetic elements will make sense or not. So there's colour; we've used fill in the previous session; there's others like shape and size that you can use. You can experiment with those too.

So let's have a little bit more control of colours. This is the default palette that ggplot2 uses. There's going to be one for discrete values, or
for categorical variables, and there's going to be one blue colour scale, or colour gradient, that's used for continuous variables. But if we want to change or replace those defaults, let's have a look at the help page for a function called scale_colour_brewer. We don't need the parentheses at the end; we only need the name of the function. This is a function that's part of the package ggplot2, as you can see here. It's described as "Sequential, diverging and qualitative colour scales from colorbrewer.org". So when you see this "brewer", you know that those palettes have been sourced from the ColorBrewer project. Those palettes are straight away available in ggplot2 when you've loaded the package, and you can see the names of the different palettes when you get to the Palettes section. There's going to be some diverging ones, some qualitative ones, and some sequential ones.

If you want to visualise those, you can go to the website colorbrewer2.org. As you can see, it was originally created to categorise or colour spatial data. You can see those different palettes, click on them, and see the name here: PuBu, for example, that's for purple to blue, I believe. You can have a look at the different sequential ones. If you want to change that to diverging, you can click on this one here, "diverging", explore those and find the one that you like. And finally, "qualitative" is up here, and you can browse those too.

Here we're looking at qualitative data, continents, so for example we could change our palette to something different. Let's say we want to try this particular one, Set1. I'm going to go to our code and add an extra plus sign here, add an extra function afterwards, and modify the scale. So I'm going to go scale_colour_brewer and set the palette to "Set1". You can see that our colours changed: now it's using this particular ColorBrewer palette called
Set1. I made sure that I selected one that was qualitative, and it's worked.

Now, if you want to display the palettes and choose your palettes more comfortably inside RStudio, you can use the following functions to visualise them. To visualise ColorBrewer palettes, we can load the package RColorBrewer. Notice that this is the American spelling here: RColorBrewer, there's no "u", and there's a capital R, capital C and capital B. Execute that, we've got the package loaded, and we can use the function display.brewer.all. This function will display in the Plots pane the different palettes that are available. So I execute this line, line 29, and there's all our palettes. That may be a more comfortable way to find the palette that you want to use. I particularly like this function because you can slightly modify it by adding an extra argument, colorblindFriendly. You can see it suggested here in the drop-down; that's with a capital F. Set it to TRUE; by default it's set to FALSE. This will reduce the choices, although there's still quite a bit of choice there, and there's three qualitative ones. So for example, if you really want to make sure your visualisation is accessible to all, you can set your colour palette to Set2, which is a colourblind-friendly palette. You can see it listed here, and that will make sure that it's accessible to most people.

Right, so those are the built-in palettes that we can use straight away with ggplot2. Now we'll save ourselves some typing, because we'll keep modifying this plot: we're going to create an object called p that takes this base plot. So we had first, in the ggplot call, where the data comes from (let's type this one out to practise): data = gapminder; mapping = the aes function, which contains x = year, y = pop and colour = continent; and the geometry that we use is the point geometry. So this is
the base of our plot. If you execute this, there won't be anything in the plot window, but I can show you my environment here, and you'll see that there's a new object called p that is a list. That's the base of our plot that we can reuse afterwards.

For example, let's try to create a custom palette. Instead of using one of the several built-in palettes, I'm going to use our base plot here, so we don't have to type it again: p, add a plus afterwards, and then use the function called scale_colour_manual, so instead of brewer, the scale is called scale_colour_manual, and we have to provide it with values. R contains a bunch of colours, several hundred colours with particular names, so you can expect blue to exist, red, purple, green and orange. We've got five colours because we've got five continents here in my dataset. I'll execute that, and you can now see our updated plot using our custom palette.

If you need to find out about the colour names in R, you can use the function colours (plural), without any argument, and this will print out the list of 657 colour names. It's not particularly comfortable to pick a colour from, although you can search for particular terms in there. So there is a more comfortable way to do things, and that's using an add-in that's called colourpicker. You install it like a package: let's try install.packages in the console, with "colourpicker", all lowercase, with the British/Australian spelling here. I've executed this and installed this add-in. Instead of being a package that you load and use the functions of, you'll see that it adds an extra element in your menu in RStudio. There's an Addins drop-down here; you can click on that and pick the Colour Picker tool. It says that it lets you easily select colours, and that's particularly useful to find the colours that you need. So instead of printing out the whole list of R colours, you can
go to this tab here at the bottom, sorry, the third one, All R colours, and instead of having names, you can see the actual colours. If you hover over a colour, you can see that this one is called cornsilk4, this one is called darkslateblue, and this one is called hotpink1. There are other ways to find colours: you can find particular R colours by specifying what kind of colour you want, so if you want a light purple like this one, it'll tell you here the R colours that you can use the names of. And finally, Any colour will give you the hex value for any colour that you pick. You can see this code here, a hash and then six characters: this is a way to specify a particular colour, and that's called a hex value. You can always use hex values; you don't have to use an R colour name. The R colour names are only here for convenience; you can always use any hex value that you want.

So if you wanted to replace this with better colours, you can replace this whole concatenated list of colours and use the colour picker. Let's say you want to pick your custom colours: go to Colour Picker with the cursor at the right spot, and make sure that you've got enough boxes here; you can click the plus button to have five boxes. You can then pick the colours that you want. For the first one I want this kind of blue; for the second one, this kind of yellow; for the third one, something that's a bit pink; the fourth one, some red; and finally the fifth one will be orange. Right, so you've got your five colours. You can click Done here, the blue button, and this will automatically concatenate a list of those hex values. You can see that this is a more comfortable, more practical way to create a list of five different values. I can execute this and have a look at what it looks like, and there's my custom palette used in my visualisation.

I'll make some more space for our visualisations, and we'll follow with some scale modifiers. You can see that we used functions that
start with scale: we modified the colour aesthetic here. What we're going to do now is modify the y-axis to space out our data. You can see that we've got a lot of data bunched up at the bottom and two countries that dominate the dataset. If you want to transform the data on the y-axis to spread it out, you can use a built-in function, and one application is to apply a logarithmic transformation to your data. So, "modify the y scale to spread data": again, we can start by using our p base for our plot and use the function scale_y_log10, so it's the y aesthetic and it's called log10. We don't need to specify any argument; the default behaviour here is to transform the data along the y-axis by a log10 transformation. Execute that, and you end up with spread-out data like this. This might make sense for your particular application, but make sure you justify any transformation: it needs to make sense.

Another thing we can do is modify the axis breaks in our visualisation. On the x-axis at the bottom here, you can see that by default ggplot2 uses the decades for the years, so 1950, 1960, etc. This makes sense because it often makes the visualisation easier to read and more visually pleasing, but in our case, because we've got very specific years here where the data was collected, we might want to change that so it matches the actual data. So here, let's "modify x-axis breaks". The first step is to create a list of the years included in our dataset, and you can do that with the unique function. I'm going to create an object called unique_years and use the assignment operator to store the output of the function unique on the dataset gapminder, but specifically on the column year, the variable year. The unique function will find all the unique values in the year variable and store that as unique_years. You can execute that, and you can see in our environment there is a vector, a numeric vector, with 12 years.
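The steps described so far (a reusable base plot, a log10 y-axis, and the vector of distinct years) might look something like this in a script. This is only a sketch under stated assumptions: gapminder_demo is a small made-up data frame standing in for the imported gapminder data, because the URL of the CSV file is not reproduced in this transcript, and p_log is a hypothetical name for the transformed plot.

```r
library(ggplot2)

# Stand-in data (an assumption): a tiny data frame with the same column
# names used in the session (year, pop, continent) and the same 12 years.
years <- seq(1952, 2007, by = 5)
gapminder_demo <- data.frame(
  year      = rep(years, times = 2),
  pop       = c(seq(5e8, 1.3e9, length.out = 12),  # one very large country
                seq(5e6, 2e7, length.out = 12)),   # one small country
  continent = rep(c("Asia", "Oceania"), each = 12)
)

# Base plot stored in an object so it can be reused with different scales
p <- ggplot(data = gapminder_demo,
            mapping = aes(x = year, y = pop, colour = continent)) +
  geom_point()

# Spread out the data by applying a log10 transformation to the y-axis
p_log <- p + scale_y_log10()
print(p_log)

# Collect the distinct years present in the data
unique_years <- unique(gapminder_demo$year)
print(unique_years)
```

With the real dataset, gapminder_demo would simply be replaced by the gapminder object imported earlier with read.csv.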
Here it is, and it's called unique_years. Now I can use that to create my ticks, or breaks rather. We'll use the vector for the breaks on our x-axis. Again starting with the base of our plot, p, that we saved previously, we'll use again a scale function, for the x aesthetic, and because we've got a continuous variable we'll use the continuous version here: scale_x_continuous. The breaks argument will take the list provided by unique_years. Execute that, and you'll end up with a very similar visualisation, but your grid lines at the back, and your ticks and labels on the x-axis, correspond to the actual data. You can see 1952 all the way to 2007.

Another example of modifying the breaks. I'll change this comment here, because we're going to modify the y-axis as well. I'll add a plus after this scale_x_continuous function, and if I want to, I can add another comment here, so I'll say "simplify breaks on y-axis". To do that we can use scale_y_continuous, as you might have expected, and here we're going to define a list of breaks. It's not really simplifying, but rather making it more readable. Here we don't want those automatic breaks that use the scientific notation; we want to use our custom list, and it's going to be 0, followed by 100 million, 200 million, 500 million and finally 1 billion. This works straight away: you can have those different breaks with different distances between them, as you can see here on our y-axis. But it still uses this scientific notation, so what we can do is use an extra argument in our function scale_y_continuous: labels, a list of labels. I'm going to use again a concatenated list of labels that needs to be the exact same length as the breaks. Start with the 0, we're happy with this, but we'll follow with "100 m", a string of characters, "200 m", "500 m" and finally "1 b". Make sure that those strings of characters are surrounded by double quotes or single quotes, and here
you can see that our custom labels have replaced the scientific notation. So you can really customise the breaks as you want like this.

Still working on axes and scales, let's have a look at yet another modification. Imagine that you want to zoom in onto one particular part of your data. If you want to ignore, for example, China and India at the top, and zoom in at the bottom here on all the other countries, you can change the automatic limits of your y-axis. So let's "modify the y scale range". Let's use the base of our plot and, after a plus, use the function ylim; this is a shorter name for the limits of our y-axis. Here we need to provide a concatenated list, or a vector, of two values: where the axis starts, 0, and where it stops, 360 million. We end up with all the data except China and India, and notice that there is a warning message in our console here. It says: "Warning message: Removed 24 rows containing missing values (geom_point)". That's really important information: that's 24 data points, which means 12 plus 12, two countries, China and India. So this is a good reminder that this particular plot you just generated is not showing the whole dataset. At least here we can zoom in and have a look at the other countries in more detail. So this is it for this particular example on modifying scales.

We'll move on to a different geometry now and look at histograms, still using the same dataset. We'll start with ggplot and save ourselves some typing by not mentioning the names of our arguments: you can go straight to gapminder without using data =, and for the aesthetic elements you can go straight to the aes function without the mapping argument name, and say that we want to associate the aesthetic x with the lifeExp variable. Make sure that you've got a capital E there. After a plus, we'll change our geom function to geom_histogram. We don't need to specify anything here; we'll have a look at what it looks like. So here's our
histogram. The default of the histogram is to use 30 bins; that's the number of cells in your plot. It's categorised all our data along the range of life expectancy, and it's showing us what the range of the whole variable is, and also where the biggest densities are. So this is the default look of our histogram. If you want to change your number of bins, you can do that by using the bins argument, for example. That's what the message in our console is about: "stat_bin() using bins = 30. Pick better value with binwidth." So, two methods here. I can use binwidth, as is recommended, and say that I want for example 15 years for each bin. That's not particularly informative; I can reduce that to 10 and see what it looks like. The other method is to use the bins argument and specify directly how many cells you want in your histogram. I could say I want 40 cells rather than 30, so that's more definition. I can go further up, 60 cells, and that's even more definition, but there's more noise there. So let's say we're going to stick to 10 bins for our next examples: bins = 10.

I'll add some information here, again bringing in the continent variable. So I'll add to the aes function the aesthetic fill, associated to continent. Remember that we use fill here because we want to fill the whole area, whereas colour will only colour the outline of our areas. So now we can have a look at where continents are represented the most in the life expectancy range. You can see that, for example, Africa is more around the lower end of the range, whereas Europe and Oceania are over-represented at the top of the range.

Now, what's interesting about the histogram geometry is that we can modify the position. In the geom_histogram function we can add an extra argument called position, and the default is "stack", so this won't change anything. You can check by looking at the help page for geom_histogram, so I can press the
F1 key when I've got my cursor in the name of the function. It brings up the help page, and I can scroll to my arguments and see the position argument here, which is the "position adjustment, either as a string, or the result of a call to a position adjustment function". If you go to your function here, geom_histogram, you can see that the default is "stack", and that's all areas stacked on top of each other. Now, I can change that; it doesn't have to be like this. Depending on what you want to represent, you can change it to, for example, "fill", and this will fill the whole area. The position "fill" uses ratios rather than absolute values, so here on the count axis, rather than an absolute number of observations, we'll have a proportion, or fraction, of this particular bin. You can see that in the first bin here we've got 50% that's associated to Asia and 50% that's associated to Africa, whereas if you look at the last bin, the top bin is all in Asia. So that may be an interesting position to play around with. A third position that you can try is "dodge", and that will create a separate bar for each category in the fill aesthetic, which divides each one of your bins into several bars. That might be a more comfortable way to compare how a continent is distributed between bins. So play around with those positions; they might be very helpful to represent the data most effectively.

All right, let's move on to faceting now. Faceting is useful to divide a plot into several facets, several plots, according to a variable. We're going to use something very similar to our histogram from before. Start with ggplot, tell ggplot2 that the data comes from gapminder, followed by the aes function that groups our mappings of aesthetics: x is associated to lifeExp still, and fill is associated to continent. After a plus, I'll go to geom_histogram and specify my bins. And finally, this is the difference here: we're going to use a function called facet_wrap. You
can see that there's a couple of facet functions; you'll often use facet_grid and facet_wrap. In our case, we've got only one variable to divide our plot with, so we're going to use facet_wrap to automatically wrap on several lines. Here we have to use the formula notation, and that uses the tilde, so I'll start with the tilde and on the right side use the continent variable. Ctrl+Enter, and you can see what faceting does now: it divides our plot into several panels, several facets, one for each value of the variable that was used to create the faceting.

We've got a bit of information that's duplicated here, so let's do a little bit of theming, "faceting and theming", because we've got the continent variable that comes up for fill, for the colours, and the continent variable that comes up for the faceting. What we're going to do is remove this legend on the right side, because it doesn't really add to our visualisation, and make more space for all the rest. We can add an extra plus here and use the theme function with the argument legend.position set to "none", so no position means no legend. If you execute that, you end up with more space for your facets. This is also where you would change the position of your legend to the bottom of your plot, for example; you can see it at the bottom here. You can also change it to, as you'd expect, the top and the left. So you know that the legend position is by default to the right, but you can move it around, or set it to "none" if you don't need it.

Let's keep theming our plot. Remember that in the first session we used built-in themes. Let's have a look at theme_minimal here, which might make it a little bit leaner. That removes the grey background and changes a few different things, and that might be the look that you're after. Remember that there's quite a few functions: if you start typing theme_, you'll see black and white, classic, dark, etc. If you want to play around with more functions like
this, you can use an extra package to explore different ones. It's called ggthemes. So I'll go back to the console here, use install.packages with the string "ggthemes" and execute that. So you can use ggthemes for more options. I've installed it; I can load it now with library(ggthemes), and now if I start typing theme_ I can see that not only there's my ggplot2 functions... "R code execution error". I've got an issue here, not sure what it is. Yeah, it's having a struggle there, which might have to do with the latest versions of the packages. So for example, if I want to use the ggthemes function theme_calc, let's see what this does. So it still works, and that's using the style from Calc. You can also use ones based on other websites, or use a terrible, terrible theme from Excel, from an old version of Excel. So there's a few funny ones there, but there's more interesting ones too. I can go to the help and have a look at ggthemes, see if I find some help there. OK, so there's a problem with the help page; I think that's the only issue here, so we won't be able to have a look at that. You might find some information on your own, but the functions still work here, so try them out.

I'm gonna stick to minimal here, because I think it's nice and clean, and keep going with the theming. Now notice that something happened here: my legend is back. That's because I've used my theme modifier first and then the theme_minimal function, and what theme_minimal does is replace a bunch of the defaults, and it has apparently replaced my setting for the legend position. So what we can do here is invert the two: cut this and put it after my minimal theme, and this should work. There you go: we've got a minimal theme and we've got a legend that is gone.

One more thing you can do with your plot is add labels. So I'm gonna use xlab, which is the short version for adding a label to the x-axis. I'm gonna say "Life expectancy" here, a bit more
understandable than the current label, and after that ylab, similarly, to have maybe a capital here. OK, I've got my capitals and my different name for the x-axis.

So let's have a look at an example of customizing a scatterplot. I'm going to use mostly the tools that we've just had a look at during the session. I'm going to start with ggplot; gapminder is our dataset; in aes, the x-axis is associated to gdpPercap. We want to have a look at the relationship between GDP per capita and life expectancy, so on the y-axis lifeExp, on the x-axis gdpPercap. The geometry that I'll use is geom_point. I'm straight away going to use some labels: I'm going to label the x-axis "GDP per capita", the y-axis "Life expectancy", and the title is "How does GDP relate to life expectancy?". Finally, I'm going to use theme_bw, black and white. So this is all functions that you've seen before. It's not an ideal plot here; it's very dry, and we can probably present it a bit better. Here I didn't use xlab and ylab; I used the full labs function to specify labels for x, y and title. This is where you can also give your plots tags, if there's several of them, a caption, etc.

What I'm gonna modify is add an extra aesthetic to color according to continent, very similar to what we did before, but here we're gonna do it locally in the point geometry and say color is associated to continent. Look at what this does: OK, we've got our legend and we've got our colors on our points. Now the reason why I'm coloring locally here is because I'm gonna use an extra geometry. The extra geometry is geom_smooth, and it's a smooth line that's gonna go on top of my plot. This is a bit messy. By default it uses the gam method here, because I've got quite a few points; you can see in the console a little bit of information about the automatic method: it has picked a function that's most suited to the dataset, according to ggplot2. What I'm gonna do
here to make it better is go to a scale modifier after a plus, and I'm going to modify the x-axis to do a log10 transformation to spread out the data. OK. And the other thing that I want to do is replace the default smoothing method with a linear model, so in between double quotes here I can specify "lm" to use as the function for my smoothing method, and here it goes through the data with a straight line. Finally, I want to maybe make the plot a little bit more readable by adding some transparency to my point geometry. So I can go to my point geometry and add a second argument here, alpha, which is available in a lot of functions to give some transparency, and I'm gonna give 0.5 as a fraction of transparency, or opacity. Here we go, there's our enhanced plot; hopefully it makes a bit more sense now. Now, the transformations that we use and the particular smooth line on top, or trend line: you have to justify why you're using them, and you might find it makes sense or it doesn't, depending on the data that you're playing with and what your question is. Now just to make sure: for geom_point we used the local setting of mapping the color to the continent variable because we didn't want to specify it in the ggplot call; otherwise it would have been taken into account by geom_smooth, and it would have drawn a different line for each one of our continents. That's why we only set the color inside the geom_point function.

So, to save our plots, if we like this one in particular, we can use something we've used before, the Export menu here: you can save as an image, you can save as a PDF, you can copy to your clipboard. But if you want to automate the process, you can use a function called ggsave. It will automatically save the last generated plot, for example as "myplot.png". Now let's give it a more descriptive name, maybe "GDP lifeExp", but I'm gonna make sure I save it inside my plots directory, so I can specify
here the path to where I want to save it and also the name of the file. So you can do that; it tells you what size it has saved your image as, and if you go to your Files tab here, you can see your plots directory, and your file is here. If you click on that, it will automatically open it with whatever program you usually use to open pictures. So it saved it as a PNG, that's what we said; it gave it the right name and saved it inside the plots directory. If you want to see some help about ggsave, you can bring up the help page with F1, and you can see that there's more arguments available. In particular, what's interesting here is that you can specify the DPI: in dpi you can specify the plot resolution, which is important depending on where you want to present your visualization. It not only accepts numbers for dots per inch, but it can also take a word like "retina", "print" or "screen", depending on where you want to present the visualization. So that's a particularly handy one, especially if you want to put something on a poster and you want to make sure it's got no visible pixels. You can also specify directly the width and the height of your plot, making sure you specify the units too. So for example, I could say the width is 20 centimeters, the height is 15 centimeters, and the units are "cm". It doesn't give me any feedback here, because I've specified the size, but if I go to my files and open this now, it's got the size I set: 15 centimeters high, 20 centimeters wide. And I can also change the DPI if I want to.

Right, now, there's quite a few more geometries available. One example is the bar geometry, so let's have a look at this one, again with gapminder and the aesthetic x equals continent. A simple example there, just having a look at how many times each continent comes up in the dataset. You have to specify geom_ something, and you can see that there's quite a few available, including some ggthemes ones, I think,
yeah, this one here for example, rangeframe, up there. And I'm gonna have a look at the help pages... oops, there we go, there's my error. But here I'm gonna use bar, and by default the bar geometry will use the count statistic, which will count the number of times something comes up in a dataset. So we can see that Africa comes up more than 600 times, whereas Oceania just a few times. So that's our bar geometry, very similar to the histogram in a way.

Another one that might be very useful for you is the boxplot geometry. So again, ggplot is our default call, using the gapminder dataset, and in the aesthetics we'll have to use two here: x is continent and y is lifeExp, to see the spread of the data for each continent, using the geom_boxplot function. Here's our boxplots, so you can see your boxes and whiskers and your outliers outside. Very similar to the boxplot is the violin plot, which might be a bit more helpful to visualize densities more precisely. So using the exact same code here, gapminder as the data, the aesthetics are x associated to continent, y associated to lifeExp, then instead of boxplot we're going to use geom_violin. You'll see that it's very similar, but it allows you to visualize some variation in density, here for example in this particular one, rather than just a square box.

If you find that your axis labels at the bottom are overlapping, which might happen with some categories, some very long labels, you can play around with theming again. In this case I would use axis.text.x, because we want the text of the labels on the x-axis, and we'll have to use a function called element_text, which you use to modify any text element in ggplot2, and we'll specify that the angle needs to be 90. So this turns the text around, and now you can maybe have more space and you won't have any overlap.

And this is it for today. The one little extra that I want to show you is an add-in called esquisse, to cheat. So you can do that: we
install that package, esquisse, and it's another example of an add-in that's added to your RStudio interface. So install.packages with the string "esquisse". It gives you a graphical user interface for doing ggplot2 visualizations, so if you're feeling a little bit lazy and you want to use something graphical rather than figuring out the code, you can go to your Addins menu, and under the header esquisse here you can see the ggplot2 builder. Click on that and you will open this new window where you can select a data frame. Here we've got the gapminder dataset, and we can select the variables that we want to keep (by default it keeps them all), and we can click "validate imported data". So you end up with this window here that allows you to drag and drop variables into those boxes, those aesthetics boxes, and try and see what it comes up with automatically for a visualization. So for example, let's put gdpPercap on y, and we'll have continent on x. It ends up with a boxplot here; that's the default that it selected, but if you want, we can change that and select a different visualization. So I could change it to violin here, and you can play around with that. For example, you can also reuse continent, I think, in fill; now you've got different colors for each one of your violins. The good thing is that it's not all graphical: you can click on "export and code" here, and you can modify the code, or, I believe, you can at least copy it to the clipboard or instead insert the code in the script. So if I click on here and go back to my script, I've got the code here that went to my console (I didn't have my active cursor in the script), but you've got the code here that was just generated by esquisse. I can go back to my builder here, I can use gapminder and try other options. Now, you can also change other options here with the menus at the bottom, so you can change your legend position here, for example. So if I put continent in color, I can
straight away change my legend position here. I can also select one of the palettes that are automatically included; that's very similar to what we did with ColorBrewer. So here I can change it to a different qualitative one, Set1, change labels here, and then also filter the data. So if I want to reduce, for example, the years, and I only want to have a look at the years until 1987, I can do that here, and it reduces the range of my data. Again, you can see the code here.

So here you go, this is it for today. If you look at the material that's online, which stays public and up to date, you'll find that there's another example at the bottom that we haven't had a look at, but it's yet another thing that you can explore: have a look at the code, try and figure out what it does. And this is the resulting visualization, with a different dataset called diamonds. So if you're interested in learning more, you can go there, but also make sure that you follow our links, for ggplot2 in particular here, but there's also a compilation of more R resources that we compiled over the last few months and that we recommend you have a look at. So hopefully this session was helpful to you. Make sure you save your script, Ctrl+S; it will all be saved inside your working directory as your script, and you will be able to open your project again from the project menu if you want to. Thanks for watching, and hopefully we'll find you at another session.
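For anyone following along without the video, the main steps of this session might be sketched in code roughly like this. It is only a sketch under a few assumptions: the data is taken from the gapminder package as a stand-in for the gapminder data frame used in the session (with columns lifeExp, gdpPercap and continent), the object names p_hist, p_fill and p_scatter are made up for illustration, and the output file name and folder are hypothetical (a temporary directory rather than the plots directory from the video).

```r
# Sketch only: assumes the ggplot2 and gapminder packages are installed.
library(ggplot2)
library(gapminder)  # assumption: stand-in for the session's gapminder data

# Histogram with a non-default position adjustment:
# "fill" shows each bin as proportions; "stack" (default) and "dodge"
# are the other two positions mentioned in the session.
p_fill <- ggplot(gapminder, aes(x = lifeExp, fill = continent)) +
  geom_histogram(bins = 40, position = "fill")

# Faceted histogram, one panel per continent.
# theme_minimal() goes first: a complete theme resets earlier theme()
# tweaks, so the legend setting has to come after it.
p_hist <- ggplot(gapminder, aes(x = lifeExp, fill = continent)) +
  geom_histogram(bins = 40) +
  facet_wrap(~ continent) +
  theme_minimal() +
  theme(legend.position = "none") +
  xlab("Life expectancy") +
  ylab("Count")

# Customised scatterplot. Colour is mapped inside geom_point() only,
# so geom_smooth() draws a single trend line rather than one per continent.
p_scatter <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(colour = continent), alpha = 0.5) +
  geom_smooth(method = "lm") +   # linear model instead of the default
  scale_x_log10() +              # spread out the GDP values
  labs(x = "GDP per capita",
       y = "Life expectancy",
       title = "How does GDP relate to life expectancy?") +
  theme_bw()

# Save the scatterplot with an explicit size and resolution.
# The file name and folder below are hypothetical.
out_file <- file.path(tempdir(), "gdp_lifeexp.png")
ggsave(out_file, plot = p_scatter,
       width = 20, height = 15, units = "cm", dpi = "retina")
```

Printing p_fill, p_hist or p_scatter in the console draws the corresponding plot. Passing plot = p_scatter to ggsave is a choice made here so the saved file does not depend on which plot was drawn last; leaving it out saves the last generated plot, as described in the session.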
Info
Channel: UQ Library
Views: 12,218
Rating: 4.9610391 out of 5
Keywords: RStudio, RStats, ggplot2, data science, Tidyverse
Id: zzXCkYR84M0
Length: 71min 33sec (4293 seconds)
Published: Thu May 09 2019