Importing/Reading Excel data into R using RStudio (readxl) | R Tutorial 1.5b | MarinStatsLectures

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hi I'm Mike Marin and in this video we'll look at how to import Excel formatted data into R using the Reid excel package and the our Studio menus a quick reminder to subscribe and click on the bell to receive notifications when we upload new videos we've got hundreds more awesome and absolutely free videos teaching both R and statistics so the read excel package can import both xlsx formatted files as well as the older XLS format to do so we can either click on file then import dataset from Excel or we can click on the import data set tab over here and then from Excel if we're importing data directly from the web we can enter the URL here or we can select browse if it's saved as a local file on your computer I'm going to go select this data here I have saved in my teaching folder and the Excel data file taking a look we can see this is saved as an XLS X format so let's open that up here now we can see over here our studio gives us a preview of what the data is going to look like over here is the name the data is going to be given and by default the data name will be the same as the file name I'm going to go ahead and change that to lungcapdata here looking at the other import options we can see this sheet option here and if we take a look at the excel file we want to import we can actually see that this file has two different worksheets this one here as well as the second worksheet here so by default R will open up the first worksheet but if you'd like it to open the second or another worksheet you can select which worksheet here so first we're going to look at importing the first worksheet and then we'll come back to looking at importing the second worksheet later the next option is range this allows you to select certain rows or columns to import and again we'll come back to this when looking at that other worksheet and importing that the max rows option allows you to limit the number of rows of data that get imported we can see if we set that equal to five it's going to end up importing just the first five rows of data now usually you're going to want to import all of the data but there may be occasions where you want to limit the amount of data that gets imported the Skip option here allows you to skip rows on the import setting this equal to one will skip the first row setting this equal to two will skip the first two rows and so on the n/a option here is how we deal with missing values so let's take a look at that excel file first you'll see here we have one cell that's left empty or blank and that's one way of indicating a missing value we also have some other cells with three stars in them to indicate missing values I included both of these so we can show you how our finds or identifies missing data so you can see here that blank cell is treated as n/a or missing by are so blank or empty cells are the kind of way of indicating to our that the value is missing if a different character or symbol is used to indicate missing values such as the three stars here we can tell our what our coding is for missing values by filling that in here so we can see if I tell our that three stars indicates a missing value now those starred observations become n/a or missing the final two options here I'm ticking the box of first row as names just lets our know that the first row of our data set is the variable names and will untick this if it we do not have variable names in the first row and taking the open data viewer gives us a data view once the data is imported in our studio we'll take a look at that once we import the data now we can notice that read Excel doesn't always get the variable types correct if we take a look at the disease variable its coded using ones and zeros for yes and no although this variable is actually a factor or categorical however if we hover over the little triangle here we'll see that this is going to be recognized as a numeric variable with a range from 0 to 1 and we want this to be treated as a categorical variable or a factor so if we click on the little triangle here we can change this to character we can also see some of the variables over here are listed as having a type that's double so let's just check those to make sure that go okay if we hover over the triangle we can see lung cap is going to be recognized as numeric and that's what we'd like it to be one final thing to point out is if we click on this triangle here we can also select this option to skip setting this variable to skip is going to have our skip it or not import that variable when importing the data so I'm going to have it skip age for the sake of demonstration now one last thing to mention before importing we can notice down here we get this code preview and this is the code that we would enter into the console in order to import the data from the command-line instead of using all these menu options so let's go ahead and copy that here and a quick note you can also just click on the clipboard there to copy it and what we're going to do is we're going to save this in an our script so that the next time we want to import this data we won't have to go through and enter all these menu options we'll have the code ready for us now let's go ahead and import that data now quick note on that warning message that popped up that's just letting me know that the read excel package was built using the most recent version of our and I'm working on a slightly older version it's nothing to worry about though so up here you can notice the data view and this allows us to just get a quick look at the data and this was the box that we left ticked in the options previously you can also notice that aid was not imported as we asked for this variable to be skipped so now we can go in here and under file we can create a new file and a new our script and in here I'm just going to paste those commands that were used to import the data so I'm just going to show you this here quickly I'm going to ream port the data again under a new data set called lungcap too and we can just submit that here ask it to import lungcap to data and if we asked it to view again we can see it's imported the data as we wanted it so this will just allow us to skip all those menu steps the next time we want to import data now let's quickly take a look at importing that other data sheet to do so we're an import data set from excel and I'm going to select that data file and first let's give it a name I'm just gonna call it other data and remember the default is to import the first worksheet so we want to select that second worksheet here and you'll notice things look a little bit messy here so let's take a look at why that is if we go and look at the original excel data we can see the data set in the second worksheet and we can see the first few rows are filled up with kind of a preamble maybe describing the data set a few rows after are some summary statistics like the mean and median and so on and really the data that we want import are just these variables here and this data is contained within the rectangle defined by starting at cell b3 and completing at cell 11 so we'd like to import just the data within this range so here under the data range we can let it know we want to import everything between cells b3 and 11 and you can see once you put in that range now the data looks like like we'd like it to just the four variables none of the extra stuff getting in the way so this looks good so let's go ahead and import that all good well I hope this helped subscribe like and share our videos and let us know what you thought in the comments below
Info
Channel: MarinStatsLectures-R Programming & Statistics
Views: 114,267
Rating: 4.9352851 out of 5
Keywords: RStudio, R Studio, import data into R, read data in to R, copy data into R, loading data into R, copy data from excel to r, readxl, reading data from excel, importing data from excel, importing data with RStudio, r packages, r course for beginners, r programming tutorial, r programming language, statistics with r, Data Science with R, statistics for data science, R programming for beginners, statistics crash course, statistics course for Data science, marinhamadani
Id: JYVWufSQ4OI
Channel Id: undefined
Length: 8min 12sec (492 seconds)
Published: Thu Nov 22 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.