R-Ladies St. Louis October 5, 2021: Getting Started with R

Captions
Okay, so I have a full screen, and I can't actually see the chat. On my system downstairs I usually have it on two screens, and I can see the chat at the same time. So someone's going to have to unmute themselves and tell me if there's a question or something I should stop for. Please, please, please do interrupt me, because if you are confused about something, you're likely not the only person.

So in terms of getting started, let's go ahead and do that. I'm guessing, given that there were instructions beforehand, that most folks have installed what you need. But if you haven't, the first step for getting started with R is to install both R and RStudio. Like I said, when I started out I was actually very confused by this, because it was like, wait a second, I need to install two things? I thought I was just using R. So let me talk you through the installation process, and then we'll talk through why you have to install two things.

The first thing you do, if you have never installed R before, is install R. On my slide here I have a link to, I'll actually just open it up here, what's called the Comprehensive R Archive Network, aka CRAN. This is where you can download R for your operating system. I'm guessing most folks here are either on Mac or Windows; if you're on Linux, you can select that as well. If you are on a Mac, for example, you would click that, and then there are basically two options: either you download this version if you have a slightly older Mac, or, if you have a new fancy M1 Mac, you would download that one. If you're on Windows, you would go to the Windows page, hit "base", and then hit "Download R 4.1.1 for Windows". From there, all you really need to do is install it like you would any other app. If you're on Linux, I am guessing you can figure it out. To be honest, I've never used
Linux, but people who are on Linux tend to be a bit more advanced with this kind of thing, so I'm guessing you can figure out which distribution you're using and what you need to install.

All right, so if you have not installed R, go ahead and do that now. I'm going to talk briefly about R itself and can also go over any questions. When people say "R", what I just talked about having you download is R, and you can in fact work directly within R. For example, I'm going to go to my Spotlight, I'm on a Mac, oops, and I'll open up R. You can work directly in R. When you open it up, it's going to give you these messages. People are always asking why it says "ABSOLUTELY NO WARRANTY" in bold letters. I know there's some background there around licensing, but don't worry about that. The most important thing is you get a cursor here, and when you work in just R, you work at the R console. So I could type two plus two, and R will return a value: it will return four. I can do eight times ten, and it will return eighty. I can do nine divided by three, and it will return three. You get the idea, right? You can use R to do any kind of mathematical operation.

I do see that there are chats. I'm just going to keep going, but if there are things that I need to address, someone let me know, please.

So if you haven't, go ahead and open R. If you've downloaded RStudio, I'm not asking you to open RStudio, just R. You should see what I saw, and I want you to enter two plus two or whatever and see your result. Let me actually just go through this. So that's what I did: I worked directly in R, I did two plus two, and I saw the result. For you, I want you to download and install R, then open it, and then just make any kind of mathematical expression,
enter it, and make sure it works. I'm going to pause for a minute or two. I'm going to open up the chat and see if there are questions, and then we'll come back and I'll keep working. Let me exit that.

Okay, Sarah, I'm glad you found it.

I think we've answered all the questions in the chat, David.

Okay, cool.

Yeah, I think we're caught up. Someone did have a question about how often you want to update your version of R and RStudio. I don't know, everybody has an opinion. Jean said maybe once a year, somebody said once a semester. I did it just yesterday and it broke a bunch of stuff.

Yeah, it's a good question. I think it is a personal preference thing. I actually did it yesterday too, because I was recording some materials and needed to demonstrate how to update R. I wouldn't say it broke things, but the packages that I had previously installed I had to reinstall, so that's just something to keep in mind. For me, realistically, I probably update maybe once every six months, or if I see that there's a new major version. So for example, R 4.0 came out several months ago, and I updated when that happened.

All right, I'm going to continue. Let's talk now about RStudio. You can just work in the R console. I was actually talking to a friend several months ago, and he was like, "Oh yeah, I use R too," and I was like, "Oh cool, you use RStudio?" He said, "No, no, I just use the R console." Which, again, is possible; I would not recommend it, and I don't think most people actually do that.

For RStudio, I've taken this from the ModernDive book; there's a link to that really great introductory book in the materials. They use this metaphor where R is the engine. The mathematical operations that we did just a minute ago: R gives you the engine that allows you to do that. But RStudio provides a really nice dashboard that helps you to drive. So I don't know, I
guess if I were to extend this metaphor, and I don't know much about cars, there might be a way to drive without a fancy dashboard, maybe like old-timey cars that just had a steering wheel, basically. That would be the equivalent of working just in R. But RStudio is going to provide a really nice dashboard that will allow us to organize things and see what's going on. So I definitely, definitely, definitely recommend you use RStudio. If you use RStudio, you have this user interface: you can see stored information, you can save code. It's a bit cut off, but you can see, for example, plots that you make, things that you have saved, and some really old code that I wrote probably three or four years ago at this point.

To get RStudio, you just need to go to the RStudio website. You're going to see that there are different versions. At the top of this page, you'll see "Choose Your Version". For most people, and when people think about R being open source and free, what they typically use is RStudio Desktop. So just hit download, download whatever the latest version for your operating system is, and again, just go ahead and install that.

Now I'll give you a brief tour of RStudio, and I'll flip over to RStudio in just a second. When we open up RStudio, it's got four panes. By default, you'll have your scripts on the top left and your environment or history on the top right. There are other things that each of these panes can do, but I'm just talking about the most common things. Your console and terminal will be on the bottom left, and your files, plots, packages, and help will be on the bottom right. If you don't know what these things are and you're wondering what I'm talking about, don't worry, I will talk through each of them. Let
me actually flip over to RStudio. So right now what you're seeing is my RStudio. In the top left, this is where I have code that I'm working on. This is an R Markdown document, which we'll briefly touch on. On the bottom left here is my console. Just to show you, I'm quitting RStudio and I'll come back in. The thing to note about your console is that you can run code in it, and if you look here, you'll see that your console is exactly what we saw when we just opened R itself. So basically, RStudio's console has what we saw when we just opened R, but RStudio adds a bunch of other panes on top of that. So again, I could do, whoops, two plus two and get four there. Again, this is where code we write will live. Up here will be our environment, so when we, for example, read in data, it will show up here. Down in the bottom here, you can see there are various files that are part of the project that I'm working with.

Okay, so again, I'm assuming that folks have downloaded and installed RStudio. If you haven't, go ahead and do that now. I'll probably just go quickly through this, because I think most folks have installed RStudio. You can do the same thing in the console if you want: use any mathematical operators to create an expression and make sure that it works. Let me stop for a second and see if there are any questions at this point.

Okay, well, I will continue. Please interrupt. Again, I assume most folks did those first two things, but I wanted to go over them to make sure we're on the same page.

Now that we've downloaded and installed R and downloaded and installed RStudio, we're ready to start talking about using RStudio. We're going to start by talking about projects. RStudio has the concept of projects, which allows us to keep all files
together when we're working on a particular project. Projects, and I'll show you how they work in just a second, allow you to keep together things such as any R script file, so any code that you've written you can save as part of a project. You can create and save what are called R Markdown files; I'll briefly touch on these shortly. And any data files: for example, if you have a CSV or an Excel file that you want to have as part of your project, projects allow you to keep those all together.

Hey David, Gary has a quick question: if RStudio wants an update, what's the best method to update?

I would just go to the RStudio download page, re-download the latest version, and install that. That'll update whatever you have. Just make sure you quit RStudio before you do that. Oh, interesting. Thanks, Shelly, I didn't know that that was an option. Good to know.

Okay, so this is a sample project, this visual that I have, where you have all of the pieces together: scripts, your environment, files, console, et cetera. Let me show you how to create a new project. If I'm in RStudio, I would go to File, New Project, and it will open up a window. For some reason it's always really slow opening up this window. I don't know why, but it always is. It is doing the right thing; it's just going very slowly.

Okay, so you can then choose whether you want to put your project in a new directory or in an existing directory. You may or may not have version control, depending on how you have set things up. So I'll just say let's make a new directory. It'll give you a bunch of options; starting out, you're always just going to choose New Project. From there, you need to give your project a directory name and then choose where to put it. So for now, let me just put it
on the desktop, and I'll just call this "test project". When I do that, it will open up that project, and you can see right now I don't have any files. The only file I have is this special .Rproj file, which is just an RStudio project file that we can use to open this back up if we want to. The way I would do that, let me actually close that. If I go to my desktop, you can see I now have this test project, and if I click this, you'll see that it will open up RStudio and put me back within this particular project that I was working on.

I'm actually not going to work on that project, because I need to go back to the project that I was working on, which is this intro meetup 1 october 2021.

People always ask me, how should I divide up my work? How should I decide when to make a new project? I'd say any time you have any kind of functionally new deliverable. It's tough to say, but typically for me, whenever there's going to be a new report at the end or something like that, I'll make a new project. It just allows me to separate things, so when I'm working on this thing, I just work in this project. It's something I think you'll get the feel of more as you go on.

So I showed how you can quit RStudio and then open up your project. Let's go ahead and have you all create a new project. It doesn't matter if it's in a new or existing directory. After you've created it, I want you to do what I did: find the .Rproj file and reopen your project. When you create the new project, for simplicity I would suggest just putting it on your desktop for now. Okay, oops. All right, so I'm going to pause for a couple of minutes, let folks do that, see if there are any questions, and then we'll go from there.

David, someone has a question about file paths.

Interesting. Sarah, why don't you try it again without spaces? I haven't seen that happen to me, but try it with
no spaces and see if that works. And if folks don't mind, what I've done in other trainings like this where we're on Zoom: if folks can do the little reaction thing with a thumbs up to let me know that you have been able to do this, that would be helpful. Cool, I'm seeing a bunch of folks saying they got it. Awesome. We'll take another 30 seconds or a minute, let other folks get it, and then we'll move on.

Rachel, you got a Project Options box? I'm guessing you're talking about that, in which case you can choose either New Directory or Existing Directory.

No, that's actually not what I got.

Oh, that's not it?

No, it says Project Options: use default to inherit the global setting, restore .RData into workspace at startup, save workspace to .RData on exit, and there are a lot of things on the side of it, Code Editing and so on.

So did you go to Tools, Project Options? That's what it looks like.

I don't think I went to Tools or Project Options. I thought I went to File.

Okay, I'm not sure what happened, but you can just hit Cancel for now. That's okay, just try it again.

Okay, thank you.

All right, I'm going to go ahead and continue. If anyone's having trouble getting the code, I'll give you some code that will have you download it in just a second.

Speaking of which, I've put together a course project, or I guess I'd probably call it a workshop project. This is adapted from a free course on the R for the Rest of Us website called Getting Started with R. I'm going to have you download this. I'll put up the "your turn", I'll demonstrate how to do it, and I may or may not put the code in the chat, because sometimes it messes up the quotes. That's not right, hold on. I just need to
refresh this. Okay, so what we're going to do is use a package called the usethis package. The code that I have here will allow us to easily download the materials that we're going to use today. I'm going to copy that, go into RStudio, and show you. Actually, let me do it one line at a time; that'll work out better.

First, in the console, I'm going to run install.packages and then, in quotes, usethis. This is installing a package that, again, we're just going to use to download the materials pretty easily. You don't necessarily need to do this if you already downloaded the materials. I made a couple of tiny changes right before, but you should be fine.

Is R case sensitive? Yes. I'll talk about that a little bit more in just a second.

Once you've done that, it'll give you this message here that it's downloading, and then it'll say the downloaded binary packages are in that location. That means it's installed. Let me just make it a little bit bigger for folks. At this point we can type library(usethis). Again, I will put that in the chat. That's just giving us access to the functions from the usethis package.

The last thing I'm going to do, and this is what actually changed from the materials that were on there, so if you downloaded them yesterday, I just changed the location of what you download: we're going to use this code, use_course, and then the URL in quotes, like that. I'm going to go ahead and run that. All this function does is give you a really easy way to download the materials. It's going to say "downloading into"; for me, that's the Desktop folder under my user directory. By default it'll put things on your desktop. It'll ask if that's okay; just pick whichever one is yes. It might look different for you; for me that's one. You'll download
that, and after it's downloaded, it'll ask if it should delete that zip file. Again, pick whichever one is yes, and that's fine. Then it will actually open up that project for you. So let me put in the use_course line, which should give you access to those materials. You'll know that it works when it opens up and says intro meetup 1 october 2021 main.

Let me pause for a second, and I'm going to put my screen back up so folks, oops, have this. Actually, I'll leave it like this so I can also see participants. If you wouldn't mind giving me a thumbs up when you've got it. Sorry, I don't have enough screen here. And if you run into issues, just put those in the chat.

All right, Marty said you can open up or update RStudio. Yeah, feel free. Oh, I see some good questions. Bobby, I will take your question in just a second; let me let folks just finish doing this. And Kate, same thing. Let me give folks just another minute or so, and then I'll talk about each of those.

Yeah, it will open up a new RStudio window, and yes, you can delete the zip file. Rachel, that message is totally fine as long as it works for you. It should, but if it doesn't work, let us know. Great, I see some thumbs up, thank you. Thanks, Jocelyn. Aubrey, thank you. All right, let's take another 30 seconds and then we'll move on.

If you downloaded the zip file, yeah, Crystal's absolutely right: if you downloaded it before, that's totally fine. Reena, I'm going to put the lines in the chat. What you need to do first is library(usethis) and then that use_course line, so in the opposite order of the way they're actually in the chat. Kate, excellent question; I will get to that in just a second. Hopefully I remember all the questions: so there's Rtools, oh, where did that go, Rtools, the usethis package, and what packages are. Perfect. Yes,
Serena, just put that in the console. Just down here is where you can put it.

Okay, so to answer your questions. Sarah, you don't need to specify that destination directory; just use the default.

I'll talk about what a package is in a bit. The usethis package, to be honest, and I don't want to diminish your question, I wouldn't actually worry about that much, because it's really more of an advanced package. It's typically used for things like making your own package or installing certain types of packages. For now, I just needed a package that makes it easy for folks to download materials, so it's not something you're going to use very much, at least not at this point.

If you are on Windows and you got asked about installing Rtools, let me go back. I'll put this URL in the chat. If you click on "Download R for Windows" and then click "Rtools", you may need to download that. To be honest, I'm not a Windows user. I've had 2,000-something people sign up for my getting started course, and some of them tell me they do have to use it, and some don't. So I'm not totally sure why you do or don't, but if you do need to, just go to this page and download whichever one is for your version or your computer. Then you should be able to restart RStudio, just close it and open it back up, and you should be good.

Okay, that's awesome. All right, let me keep going. I will say, at about five o'clock my time, I know it's not five o'clock for other folks, we'll take a short break and let folks stretch, or maybe get out of your car, or do whatever will work for you.

Oh, David, there have been a couple of people that said that they
got the message that usethis was built under R version 4.0.5, but that's just a warning message that shouldn't mess up anything, right?

Yep, it should be fine. If anyone runs into issues, like if you're not able to install or get the materials, let me know. But yes, it's just a warning message. I'll talk about packages more generally in just a minute, because we'll get to those.

All right, so let's talk about files in R. Actually, before I do that: down here is where we view all the files that are within the directory where we're working for this project. I'm working in this directory, and let me even go into it. I'm on a Mac, again, so you can see these are all the files that are in this folder, aka directory. Within an RStudio project, the files that you will see will be the exact same ones that are within that folder. So it's exactly the same here.

In terms of files in R, there are really two types of files, primarily. There are R script files, and these have the .R extension. With these, it's assumed that text that you write is R code unless you comment it. If you're like, what the heck is he talking about, don't worry, I will talk more about this shortly. Just as an example, if I go over here, there is an exercises.R file, which you should see, and this is where we're going to end up putting some code.

There are also what are called R Markdown documents. If you're not familiar with R Markdown, let me quickly open up the chat and put in a link to an article that I wrote about R Markdown and why it's so powerful. It's a way to write reports in R. I typically contrast R Markdown with a workflow where you get your data, say you import it into SPSS, you do any kind of data cleaning and analysis, then you spit out your clean
data to Excel, you make your graphs, then you copy your graphs into Word and you write your report. R Markdown allows you to do that all in a single place, within an R Markdown file. Unfortunately, we're not really going to get to how R Markdown works today, but the important thing to know is that R Markdown files end in .Rmd, and within those, text is assumed to be text unless you put it in what's called a code chunk. I said "more on this soon", although that might be a lie; I'm not actually sure if we're going to get to it.

If you go to the slides directory and find, for example, the getting started with R .Rmd file, this is an R Markdown document. This is a bit meta, but this document actually allows me to make the slides that you're seeing. This is the code that I wrote that ends up generating them. For example, if I search for "file types", whoops, I've got an extra space, this code here is what makes this slide. You can do all sorts of things with R Markdown: reports, slides, dashboards.

Let's talk, though, about making R script files, which is basically what we're going to do today. You can make a new R script file by going to File, New File, R Script. Let me just flip over here. So File, New File, R Script, and you can see I have "Untitled1" here. If I save this, I use the keyboard shortcut Command-S on Mac, and you can also just hit save there, I'll just call this "temp", and you'll see it'll automatically add the .R on the end. Where I did library(usethis) in the console before, I could also put it in an R script file and run it there. I'm not going to, because we've already done what we needed to do with usethis, but in a second we'll import some data and take a look at it using some code within an R script file. So if you haven't, just go ahead and create a
new R script file and save it, just so you see what that process looks like.

Okay, so you can run code, and there are several ways to do it. Let me flip over here. Let's say I want to run this line, this library(usethis). First of all, what most people starting out see is this Run button. I can click that, and that will run it. When you start using R a lot, you won't want to go up here and click every single time, so you'll probably want to use the keyboard shortcut. On Mac, that's Command-Enter, which will run whatever line you're on. You don't need to highlight the whole thing or anything like that; as long as your cursor is anywhere on that line, it will run. On Windows, that's Control-Enter.

There's also, within R, the idea of comments. Within an R script file, anything that starts with the pound sign is a comment. Typically, you use comments to say what you're doing in your code. And I put: do them for others and for your future self. What I mean by that is you will often want to write comments so that when you work on something this week, then don't work on it again until next week or next month and forget what you were doing, the comment will help you remember. So let's say I wanted to write a comment here. This is contrived, because I know what this does, but let's say I write "load the usethis package". By starting my line with a pound sign, this is not run. If I even hit Run, oh, it looks like it actually runs the next line. It won't run this line, because it knows this is an R script file, so anything that starts with the pound sign is not code; that's a comment.

All right, so that's comments and R script files. I'll go through one more thing, which is the idea of packages, and then I think we'll take a break at that point. Let me go ahead and talk about packages. The incredible thing about R, when
people say they use R, is that it's not just R itself. It's not just what you have when you first start out. There are a ton of what are called packages. Packages add functionality that's not present in base R, and they're really where much of the power of R is found. Oh sorry, it got cut off, but this is also from ModernDive: the metaphor that they use is that R is like a new phone, whereas R packages are like apps that you download. You get a new phone, and it's pretty powerful, you can do a lot, but there are probably specific things you want to do. Maybe you want to go on Twitter, so you download the Twitter app, or you're on an iPhone and you want the Google Maps app. You download a package to do whatever specific thing it is that you want to do.

The benefit of R being open source is that if you can think of something that you want R to do, it is highly, highly likely that someone has written a package that would allow you to do that. I'd say at this point it's rare that I'm like, "Oh, I wish R could do X," and I can't find a package. There are thousands and thousands of packages.

We're going to use two packages, although the first one is actually kind of unusual. It's called the tidyverse, and it's actually a collection of packages. Specifically, we're going to use the readr package today to import our data. I'll show you in a second: when we load the tidyverse, it'll show you which packages it's loading. There probably are a few other packages like this, but the tidyverse is kind of unusual in that it's a package that brings together multiple packages. We're also going to use the skimr package. It provides easy summary statistics, and it's a really nice way to get a sense of our data.

Let me actually check the... all right. So I saw one question
actually about GitHub. You don't need to know anything at this point about GitHub. That's just where we hosted the code and where you accessed it, but no, you don't need to know anything about it.

To install packages, you use install.packages, just like we did with usethis, and the package name has to be in quotes. Let me show you. If I did install.packages, oops, I'm so used to adding that quote that I did it automatically, but let me do tidyverse without quotes. It'll give me this error: "Error in install.packages: object 'tidyverse' not found". You can't install a new package if you don't put the name in quotes.

The way it works with packages is that you need to install them once per computer. Once you install the packages that we're going to use today, you won't need to do it on your computer again. Slight caveat: if you update R, depending on what type of update you do, you might need to reinstall them. But the nice thing is that the way you do it is really simple: you just run that line again.

The other thing I'm going to mention, just to make it explicit, is that when I install packages, and I recommend you do this, I do it in the console. Up here, in R script files, is where we run code that we want to save for a future session. But because packages only need to be installed once per computer, we don't need to run that code over and over; we just need to install the package once. Your console is where you want to put code that you're just going to use one time. So that's why I would install it down here.

Cool. And then the thing with R is that just because you have a package installed doesn't mean you can use it. To use a package, you have to do what's called loading it, and the way we do that is with the library function. So for example, if I
run library tidyverse or library skimr that will load those packages and here once we've installed them they no longer need to be quoted although you can quote them if you want so you'll see for example well here like i did library usethis you could also do like library tidyverse and i don't know if you saw that but it actually does a nice little autocomplete thing um and then i could either hit run or i'll do command enter to run that which will load the tidyverse and there's a message there that you will probably be slightly confused about so we'll talk about it in just a second i'll come back to that um so at this point um i'm going to have you open the project you downloaded before you probably already have it open uh i'm gonna have you open the exercises.r file and actually these directions are slightly confusing i'm realizing now um i should probably change them um actually no it's fine for now um just open that exercises.r file um just in the main directory uh the way this is gonna work from here on out is i have instructions that are set as comments so for example here it says install the tidyverse and skimr packages using the install.packages function again i don't actually recommend doing it this way i would typically do it down here in the console but just for now we can do it here okay so install those two packages and then i want you to go down to this section here and load your packages so load both the tidyverse and skimr okay um so i think at this point i'm gonna let folks work on that um crystal or one of the other organizers should we take like a five ten minute um short break and folks work on that and then we'll come back together that's something yeah um how about we do it at um 5 10 my time which is well depends where you are and if folks have questions as you're working please
just put them in the chat i'll be gone for just a second but i will come back and take a look at them in a few minutes okay thanks david all right let's start again in just a minute um kate feel free to chime in uh whenever you figure that out i or somebody else here can hopefully uh help you with that um let me see i'm gonna close this in the interest of screen space um i saw that question i'll get to that in just a second so uh just to make sure everybody did everything you should have gone and done install.packages tidyverse and then install.packages skimr uh i'm not gonna run these lines because i already have them installed um the tidyverse one especially if you're installing it for the first time might take a while um skimr should be relatively quick um what you're going to do then down here again if you didn't do this is library tidyverse and then library skimr so these i would run um so i'm going to hit command enter command enter um actually let me do one thing i'm going to restart my session so it's as if i'm starting fresh when i do command enter on library tidyverse you are going to get this message let me move this over just a bit um and people are always confused by this saying like what's this talking about recall that the tidyverse is actually an unusual package insofar as it's a collection of packages so all these packages here ggplot2 purrr tibble dplyr etc are the collection of packages that make up the tidyverse so what it's telling you is when you run library tidyverse it's actually loading all of those packages for you that's the first thing the second thing is this conflicts uh down here what it's telling you is when you load the tidyverse when you run library tidyverse um there is a function um so the green filter here what this syntax means here with dplyr and then a double colon and then filter that means a function called filter from the dplyr package and when it says
masks stats filter that means that there is another function called filter within the stats package which is one of the base r packages so whenever you load r that package is just loaded automatically you don't have to type like library stats but what it's telling you is okay now that you've loaded the tidyverse if you ever use the function filter i'm going to use the version from dplyr not the version from the stats package and then it's saying the same thing with the lag function um so it's nothing to worry about it's just telling you that that's what it's doing um cool all right um and then library skimr okay um i think there was one other question that i said i was going to get to and i did not well if there was a question that i said i was going to get to and i did not um please just ask it again because i think we got to it it was about um red red line oh okay cool the tidyverse tends to be very verbose it'll just tell you everything that it's doing and a lot of times it'll give you messages that look like something's gone wrong but they're just warnings or it's just telling you hey this is what i did so um don't be alarmed all right well there is indeed um okay i'm guessing crystal or janine you can try to help that person who's having trouble cool all right i'm going to continue on and hopefully uh get help for folks so let's talk now about importing data um and the thing to understand with r is that you don't actually have data until you run code that imports it into r and i know this was really confusing to me because i primarily used excel before i came to r and there's no idea within excel of like separating out you know this is where data lives that i do analysis from this is like my raw data file right like it's all just an excel file that's where you write formulas you do your analysis within r though it's actually different we have our raw data which for now is going to be csv files although you can read in any type of
data into r again there are packages for reading in say spss data check out the haven package anything you can think of you can pretty much read it in so the way to do that is to write code that reads in that data so this line here which i'm going to copy over into rstudio in just a second will read in data from a csv file and then it will assign it to an object called faketucky and if you're wondering what i'm talking about like what's an object what do you mean by assign it i'm going to talk about both of those in just a second but let me go ahead and flip over here and i'm going to copy that here and i'm actually not going to put this in the chat because i think when you're learning it's best not to copy code um so i would encourage you just to write it out when i run this what you're going to see let me bring this over here is that now in my environment i have what's called an object that's called faketucky so the only way i have data that i can use within r is if i see it here in my environment and the only way i get something here in my environment is if i run a line of code or lines that create that object here so again like i have a raw csv file here and i could open that in excel or whatever and look at it that way i mean i can even actually open it up in um that's pretty big i don't want to do it i could open it up here trust me it's just a csv file but i can also import it with this code here i know there was a question before about read_csv versus read.csv so for example there's read.csv there are two functions with the same goal in mind to import a csv file the second one here on line 21 read.csv is from base r uh this one on line 19 read_csv is from the tidyverse um i'm gonna say for today you're gonna need to use read_csv because we'll get to something later that won't work uh if you do read.csv let me check the chat okay um i think we're figuring
it out david okay cool yeah um so oh actually someone was asking about this is um r case sensitive yes uh it definitely is case sensitive um and so my suggestion for you and i saw actually some discussion of this in the chat is when you create uh objects you're working with just be consistent um i've given you a couple or three kind of options here um and i'll show you what i mean by this in just a second there's like snake case where you write a word and then an underscore to separate between words so snake underscore case camel case periods in names i am a big fan of snake case uh that's what i like doing a lot so for example if you wanted to create an object called student data if i were going to do that i would probably do student underscore data but again i've written like examples of what you can do if you want to create objects with other approaches uh the other thing i'm going to mention is that um you need to think about your directories when you're working within r so if you noticed let me go back here and what i'm going to do is i'm going to comment that to indicate that we're not going to run it i'm going to copy this in and i'm going to say faketucky is um we assign to faketucky the results of reading a csv but if i do just read_csv faketucky.csv you're going to note that it's going to say faketucky.csv does not exist in the current working directory and that's because if you look at your files here within your project our working directory is um the root of our project folder so in other words this is where our .rproj lives so if i look here there's nothing called faketucky.csv there's only a file that's faketucky.csv within the data folder so that's why i need to run read_csv and then in quotes data slash faketucky.csv and that works but that does not work um yeah a couple other things here i'm just going to power through those again
that's what i said before but our data when we import it lives in the environment history pane so if you have not already i want you to open up the exercises.r file import the data into a data frame called faketucky and make sure that you can see that faketucky object within your environment history pane um again this looks kind of good as i was talking um but if you don't mind just give me a little thumbs up or something to let me know or if you have other questions that would be great sarah can you let us know what your issue is we can try to help you troubleshoot um bobby i saw your comment about being like java i don't know java i don't actually understand exactly how that object that function would work so if you want to explain it more i can do my best to give you an explanation so um actually the thing that i see sarah um dealing with is actually something that's really common so let me actually um show you how sarah likely got that um so what i just did now is i restarted my session so it's as if um in this case the most important thing is that i haven't actually run the lines where it's library tidyverse and library skimr so if i try to run this line now and i hit run it's going to say could not find function read_csv and the reason why is you can only run that function after you've loaded the package that the function is a part of so if i do library tidyverse now i'll just load skimr for good measure now i can go down here and now it will work okay um cool i'm gonna continue on um so you heard me in the last little section talk about objects and functions let's talk about what objects and functions are um i like to use this quote um and i'll just read it and then we'll talk about it so to understand computations in r two slogans are helpful everything that exists is an object and everything that happens is a function call so this is from john chambers quoted in hadley wickham's advanced r book and
let's think about this specifically uh i've given an example of reading in data um it's not the faketucky data but it's the same concept so in r um when we wrote that code again remember everything that exists is an object so if i look here faketucky exists metaphorically speaking right um but it exists within rstudio we can see it so it's an object and everything that happens is a function call so read_csv is a function because that's what happened in other words the thing that happened was we read a csv uh in so for example here you know i'm creating an object called grants data using the function read_csv other important thing to note is that we use what's called the assignment operator so that's this thing here and what it does is it takes everything on the right side and assigns it to an object on the left side and i know that was confusing for me when i started out because i expected to just do something like this read_csv data faketucky.csv and you can actually do that you see down here how it actually um you know outputted some code the thing though is it doesn't actually save it all it does is it just displays those results so if i want to save it so let me call this like faketucky2 i have to use that assignment operator and create this object here so faketucky2 only exists when i run this line of code where we assign the results of this function to this object so now i have faketucky2 and i can click on it and i can see it up here in the viewer um okay all right i'm going to keep going so again that's just what i said all right now that we have read in data uh let me actually pause for a second see if there are questions because i've been talking for a little while and i like to see what questions folks have so let me pause for a second i think there's just a couple people that are still having problems reading the data and maybe like their directories are off or something yeah i'm guessing
with rachel um i would just run that usethis the use_course function again and then open that project that's what i would do um okay not seeing any additional questions so let me go ahead and um oh yeah so rebecca can you explain the code without the assignment operator again yep um so let me copy this and go down here so if i just run this line which is on line 27 now read_csv and then data faketucky.csv and go ahead and run that the console is where it will output results and so when i run that what it's doing is it's actually showing the results of running that function and so you can see it says a tibble which is um similar to what's called a data frame basically it's showing that it's data and it's showing okay the first column is student id then first high school then school district male race ethnicity etc and if i actually open up this object so if i click on the object up here note that that's the same student id first high school attended etc so it's just showing the results here whereas up here for example let me delete this because i don't actually need that that is um what is um actually saving that now as this faketucky object hopefully that kind of clarifies things for you okay cool all right let me keep going so let's take a look at our data and i'm going to talk about various ways to do that um and this is where we'll actually use the skimr package in just a minute so we can examine our data in several ways if we type the name of our data frame for example faketucky r will output the following so let me actually go into here so if i just type faketucky and note that rstudio will autocomplete as well once i have an object loaded so i'll do command enter to run that line it's going to output that same exact thing that happened when i did read_csv and the reason why is because we saved the results to this object called faketucky so now if i run this object called faketucky again it'll output the same
exact thing if you want to just look at the first few rows you could use the head function so for example if i do head faketucky 5 it'll show me just the first five rows and here faketucky and the number 5 are what are called arguments so the first argument is the data frame that we're using or tibble and 5 is the number of rows that we're showing uh in a similar vein tail will show us the last x number of rows so tail faketucky 5 sorry i should say faketucky comma 5 will show us the last five rows another thing you can do is use the view function let me copy this um and that will actually if i go into here and i do view with an uppercase v uh it will actually load faketucky here in the viewer again as i showed before you can also just click on the name of the object here i do this all the time i know a lot of people like to just view their data down here in the console but i don't know just for me it's easier to view it up this way all right so those are several ways that you can uh take a look at your data um actually let me show you one more that i didn't have in my slides but i started using a lot recently um there's also a function called glimpse so if i run for example glimpse faketucky when i just type faketucky it shows me whatever columns it can fit and then it tells me down below you know there are 57 something thousand more rows and six more variables and it tells me the names of those variables but it doesn't show me anything if i do glimpse it does kind of the opposite so it puts each column as a row and it shows me okay the first column is student id and then it shows me you know the first however many observations fit same thing for first high school attended school district male race ethnicity and i should say this data is just um made up data some folks at harvard made it and it's just uh something i use just to import some data it's not actually real data but it's based on real data um okay okay um so the skimr
package is a way to um get some really simple kind of summaries not summary statistics but just get kind of a sense of our data actually it does provide some summary statistics um it provides again more detailed information and it's broken up by the type of variable uh so if you haven't installed skimr you'll need to do that and then you can just run skim faketucky so when you do that skim faketucky um again okay so what it displays down here in the console is a nice summary of your data it tells me okay your data it's called faketucky it's got 57 something thousand rows 12 columns and then it tells me a bit about the columns so for example three of them are character so in other words they're text or at least that's how r or the read_csv function has interpreted them nine are numeric group variables is none i'm not going to get into that today but if we look down here you can see it tells us for example okay this first high school attended that's a character variable so it's telling us these are the three that are character it's telling us how many are missing the complete rate obviously there it's one which is 100 percent uh min and max is actually the length of the string so there's one high school with a four letter uh name and one with a 14 letter name it's got some other things number of unique observations etc um down here for our numerics it'll tell us say um like let's look at percent absent uh zero missing so 100 percent complete rate it tells us the mean the standard deviation and then it tells us the value at the zero percentile the 25th percentile the 50th percentile 75th percentile 100th percentile um huh that's weird now that i look at that i have no idea why i've never noticed that it should not be 3153 uh absent because obviously that's not possible um so anyway this allows us just to kind of get a sense uh of our data um i like to use it when i first start out just to see you know what my data looks like um so if you haven't already go ahead
and open the exercises.r file follow the instructions to use the head tail view and skim functions to take a look oops at your data okay so let me go ahead i'm going to exit out of this okay and just open up exercises.r and you'll see the comments and below each comment go ahead and do the things so for example write the function with head to show the first 10 rows and then same with tail last 20 okay let me pop out the participants and give me a thumbs up if you wouldn't mind when you are done all right looks like folks are good uh any particular questions on that did anyone notice anything unusual um let me move this out of here anything unusual that you noticed um on the output of the skim function here aside from the i don't know what's up with the 3153 any of the other variables anything look odd to you it's not a trick question there actually is something yes thank you sarah so you notice there are lots of 999 values so we've got some issues right and i'm going to talk about two things several variables have max values of 999 this seems suspicious right um so the other thing is that some variables show up as numeric but we know they're not really numeric so as an example like enrolled in college um that's a dichotomous variable i mean we can list it as 0 1 which is i'm guessing what was the intention but we can also you know read those in as a different type of variable so i'm going to show you how you can read for example those variables in as characters in r there's also the idea of a more complex data type called factors which you might actually want to use but for now i'm just going to show you how you can read them as characters and we can add to our read_csv function to do that so let's deal with each of these um okay so sorry i'm going through my slides uh but not talking about them so let's import our data again we're going to need to do two things to deal with this
we're going to need to tell read_csv okay how should you handle missing data um because that 999 here that we saw it's probably not actually 999 it's probably that 999 was used to indicate missing and we just need to tell r specifically read_csv hey don't treat that as a real value instead just treat it as missing and then we're also going to tell read_csv to assign the correct data type to each variable okay for the ones that we want to read in as character not as numeric so let's do these one by one so in terms of missing data recall that this is how we imported our data the first time um so we just ran that line of code but we can also add an argument to our code to tell read_csv what data is missing and so note that we have a comma after this and that indicates that there's an additional argument and so the argument is na equals and we want to tell read_csv hey treat whatever um i put here as missing so if i go back here let me actually just delete all these for simplicity and i'm going to overwrite that close this so if i run this again i've now re-uh read this back in and if i look in here like before i can actually search here i can search for 999.
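A minimal sketch of the na argument described above. It uses a tiny made-up csv file written to a temporary location rather than the real faketucky data, so the column names (student_id, act_score) and all values are hypothetical and only there for illustration.

```r
library(readr)

# Hypothetical stand-in for data/faketucky.csv: two made-up columns,
# with 999 used as a placeholder for a missing score.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("student_id,act_score",
    "1,21",
    "2,999",
    "3,18"),
  csv_path
)

# Without the na argument, 999 is read in as a real number.
raw <- read_csv(csv_path)

# With na = "999", any field that is exactly 999 is read in as missing (NA).
cleaned <- read_csv(csv_path, na = "999")

max(raw$act_score)             # the 999 placeholder is still the maximum
sum(is.na(cleaned$act_score))  # counts the values now treated as missing
```

One thing to keep in mind with this approach: the na argument applies to every column in the file, so a legitimate value of 999 in any column would also be read in as missing.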
and it looks like there are some 999s but i'm guessing that's like maybe in the i don't know what that would be in but in any case i don't see like 999s here and actually to make sure i can go ahead and just run skim faketucky again um and now you can see that i don't have those 999s here right because i've imported it and then run the skim function again on it okay so that's how we can read in and tell read_csv what values to treat as missing so again just what i did the other thing that we can do is we can tell read_csv which uh columns to treat as what data types you may have seen this message when you read in your data so let me even like show you where that showed up oh you know what actually hold on let me restart my session and run it again to see if uh i can generate that message so i'm looking down here in the console okay so of course since i made these slides they have actually changed the way it spits this out so now it shows me which ones are character and which ones are doubles double just means numeric um this is what it used to look like um so it's telling us hey these are the columns that i think are character and these are the columns that i think are numeric but let me go back and run that skim function again and let me treat for example like free and reduced lunch i don't actually want that to be uh numeric because it's really just a one or a zero the student receives free or reduced lunch or they don't so again these are the data types that i talked about and there are a lot that i'm not talking about so we can add this here and let me talk through what this is um and this is actually if you downloaded the materials like yesterday anytime before like right before we started i actually um or no i think yesterday i changed this what we can do with the col_types argument is we can say i want you to treat these columns as this type so for example with enrolled in college when i say enrolled in college equals c
that's saying don't treat enrolled in college as numeric which is what read_csv guessed that it was based on the values instead let's treat it as character same thing for free and reduced lunch same thing for male same thing for received high school diploma and it does have a kind of funny syntax not funny but um a syntax you probably haven't seen where you have to do list and then open parentheses and then each um column equals whatever type you want it to be for now just c is character d would be double for numeric but we're only turning things into characters so we can list the columns that we want to turn into characters in this way then we have a close parenthesis this one is matched to this one and this close parenthesis actually matches this one up here i will say most of the errors that i end up getting in r are due to like misplaced close parentheses or commas that i misplaced that kind of thing so um yeah um so if i go in here and i'm going to delete that paste that in run that and you can see now if i go down to skim faketucky the columns that i wanted to be character for example free and reduced lunch that we talked about before now shows up in the character section no longer in the numeric section so this syntax is how we can tell read_csv if we want to manually say this is what this data type should be so i want you to open up uh exercises.r change the code so that you correctly import the faketucky data frame telling read_csv which data is missing and explicitly defining column types where necessary so basically doing uh what i did just a minute ago and i think that's all um after you make your changes make sure you rerun the code and then you can um run skim on your newly created faketucky data frame and make sure that everything worked okay so let me go back here i'll just leave that up so folks can see it so again just give me a thumbs up or if you have questions put them in the chat um yeah i
mean you could make student id a character as well um yeah oh good question jocelyn i'm gonna actually i'll talk about that in just a second kate can you paste your code into the chat that you are trying to run and um we can take a look uh yes sarah i'm not sure why the col_types isn't working you could also paste your code too if you want us to look at it um i don't know kate i don't know if any of you organizers see anything that looks totally fine to me did it work kate when you just ran read_csv without the col_types i wonder are you in the project do you see intro meetup on october 2021 very strange yeah i mean as far as the code i don't see anything wrong either so i'm not sure um kate i'm gonna actually in just a second show you one thing like how i would deal with this or how i would attempt to figure it out um anyone else have issues questions all right i'm gonna close this and kate i'm not forgetting about you i'm just gonna go through the slides you'll see you actually set me up really nicely for the last slide that i have um so um hopefully um you have gotten a sense of how uh r and rstudio work um you've gotten everything set up you've learned about packages projects how to import data how to take a look at your data um let me show you one last thing so kate other kate you asked about what's going on with the 3153 for data cleaning um that's actually a really good question um so let me actually deal with that so what i would do is this um within my na argument i can actually give it multiple values that i think are nas and the way i do that is using the c function so the c function just combines multiple things so it was 3153 yeah and i would say na equals c 999 and 3153 so let me run that again and let me run skim faketucky and see uh
that's very weird it now has 658 um okay so if i were really dealing with this the way i would do it is this i would add that which is called a pipe and i would say sorry this is getting a bit advanced for what we're doing today but i would say basically mutate percent absent equal or no sorry um yeah replace na let me write the code and then um i'll explain no wait actually let me do this let me do filter um sorry i'm struggling to think uh while i actually talk so i would say mutate percent absent equals if else so basically if percent absent is greater than one make it an na otherwise keep it as percent absent so i'm basically just changing it oh so i need to make that um what is this yeah again a bit more complex but all this line does is it says if it's greater than one then make it missing otherwise leave it as percent absent so now if i went down to skim um yeah now it looks good there we go okay so let's talk lastly about getting help because um a really important skill with working with r is being able to get help um several ways you can do it first thing you can do i haven't shown you this because i wanted to just kind of have you walk through things with me but if you type read_csv or any function and you just put a question mark in front of it in the bottom right it will open up the help file for that function i will say the documentation for r functions varies widely the tidyverse functions tend to have better documentation um there's a skill to reading help files but it will give you all the arguments that you can use so you can see for example col_types is here where is na na is here um and down below it'll explain a bit more about them and then usually there will be at the bottom some examples that show you how to read in data um so i'd definitely start if you're having trouble with a function by taking a look at the help file i also highly recommend you take a look at the tidyverse
website um so pin that out of the way at the bottom maybe um so the tidyverse website if you go to packages you can see information about all the different packages for example if i click readr they have really nice websites with help files and what are called vignettes which are kind of like long form descriptions of how the packages work another reason why i really like the tidyverse is because of the websites that make it really easy to view the documentation again these are the package vignettes um so if you're using the tidyverse i would just recommend going to the website and then clicking on uh the top oops i think i already or yeah clicking on the top um there will be uh like articles or something like that and that will take you to the vignettes um within oops i don't want to open visual code studio um you can also gosh we're going to see how oh yeah that's right so if you type vignette um package equals skimr oops i don't actually do this no okay really i don't do this very much because i don't actually know how to load a vignette within the uh within r does one of the organizers want to help me i've never done that either actually um it is possible to load a vignette oh maybe is it vignettes no okay well um again if you're using tidyverse packages the websites will have the vignettes uh other places to get help twitter um if you're not on twitter and you're considering using r i would recommend joining just for r uh the rstats hashtag is what people use it's where i have learned like 90 percent of what i know about twitter um or sorry not what i know about twitter what i know about r so definitely a great place to go people are super welcoming um happy to help beginners so yeah i definitely recommend it there's also the r for data science community it's a slack community people are happy to give help there as well and then last thing and this is uh kate with the last name um your question about or the issue you ran into
I would google that. I like this quote from kale edmondson about what it means to be a skilled programmer. If it were me and I got that error that you've got, I would google it. So here's your error, and you would end up somewhere. Oftentimes it'll be what's called Stack Overflow, which is a place where people post issues. I don't know that I'm actually going to be able to solve it right now, but by googling it you're likely to find something that will hopefully allow you to figure out what's going on. I would probably just scroll through these. And actually, if I were doing this, I would add the name of the function that I was using, so I would search for read csv and then the error, and see if I'm getting anything that looks relevant.

Kate actually has a really good question. She said she fixed the error by changing how things were tabbed over on each row, which shouldn't matter too much. But she asked: can you share the significance of where the code is entered on the row, like when you go into the line and tab over, the way you have things tabbed over?

So that shouldn't actually really matter.

Yeah, it's more of an organization and readability thing.

But, I mean, it could possibly have been a line break in your code.

Yeah, and I will say, sometimes if you copy from the Zoom chat, it messes things up.

Yeah. So we didn't solve your question, but hopefully that gave you an idea for how you might think about starting. I do see it is time for us to wrap up. Are there any final questions, things folks are thinking about before we wrap up? Okay: are spaces after the end of a line an issue? I don't think so.

I don't think so either, no.

I wonder if you happened to have, not a syntax problem, but some weird character or something. Cheryl, if you want to uninstall a package and reinstall it, the easiest way... well, you can go to Packages
and you could find whatever package and hit that X, but the easiest way to do it is just to run install.packages("package name"), and then it'll just update to the latest version.

Okay, well, people can still ask questions, but I just want to say thank you so much, David. We really appreciate it. It was such a great talk, so thank you for being here. I'll stop this real quick, and people can keep asking questions as long as David wants to stick around.

Yeah, I'm happy to stick around for a few minutes.
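Two R tasks came up near the end of the session: loading a vignette from within R, and reinstalling a package. The following is a minimal sketch of one way to do each with functions from base R's utils package, not necessarily what the speaker intended to type. It uses the grid package only as an assumption for illustration, because grid ships with R; substitute any installed package, such as skimr. The install and remove calls are commented out because they change your library and need network access.

```r
# List the vignettes that ship with an installed package.
# "grid" is an illustrative choice (it comes with base R); swap in e.g. "skimr".
available <- vignette(package = "grid")

# The result holds a matrix of vignette metadata; "Item" is the vignette name.
vignette_names <- available$results[, "Item"]
print(vignette_names)

# Open one vignette by name. This launches a viewer, so only do it interactively.
if (interactive() && length(vignette_names) > 0) {
  print(vignette(vignette_names[1], package = "grid"))
}

# Alternatively, browse all vignettes for a package in the web browser:
# browseVignettes("grid")

# Reinstalling a package: installing over an existing copy replaces it with
# the latest version available from your configured repository.
# install.packages("skimr")

# To uninstall first and then reinstall:
# remove.packages("skimr")
# install.packages("skimr")
```

If the list of vignette names comes back empty, the package simply has no vignettes installed, and its website or help files are the next place to look.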
Info
Channel: R Ladies STL
Views: 84
Id: tbeLU71ZKkU
Length: 100min 51sec (6051 seconds)
Published: Thu Oct 07 2021