Day 1 of Introduction to R

Captions
All right, great. Some of these questions will seem self-evident: if you're in this class, you probably know what R is, and maybe you want to know why you would use it. So I'm going to give a brief rundown of what R is. It's a language and environment for statistical computing and graphics. In a lot of respects it's also kind of like pipelining software. It has grown over the last 15 to 20 years, with a big boom in the last five to six. A lot of it was about statistical computing, but there are packages for all different types of things, like natural language processing, the whole game.

It's an open-source implementation of a language called S. This was built before the time of Google; otherwise they probably would have given the language a name that's a little more Googleable than "R". We will talk about how to Google for answers. It was based on S, which was developed at Bell Labs. It's open source and open development, which means there's a community of people (for some it's their job, some of them volunteer) who write all the code that goes into the thing you get when you download R: all the basic functionality and all that kind of stuff.

Some of the reasons why a lot of people like R: it's powerful and flexible, so we can do a lot of things. That being said, some of that flexibility comes at a cost. SAS is developed by an organization, Stata is developed by an organization, and they put a lot of work, time, energy, and money into standardizing things so that function A and function B have relatively similar syntax. In R, I write a package, the next person writes a package, the next person writes a package (and we'll talk about exactly what packages are), and those things might be completely different. I might have a very different way of organizing my
thoughts and coding functions than the next person. So it is flexible and it is powerful, but that does come at a cost.

It's free: you've downloaded it on your laptops, and there's no licensing. Where the power really comes in is the add-on packages, the add-on software written by the community. There are now, I believe, over 12,000 packages out there between the Comprehensive R Archive Network (CRAN), where you downloaded R, and Bioconductor, which is a bioinformatics repository. A lot of it was built around statistical computing, so there are a lot of state-of-the-art statistical methods: mixed effects models, penalized methods, you name it.

It's a high-level language, though there are things that are highly technical about it. It is an object-oriented programming language of sorts, and you can do certain things with objects that you won't be able to do in SAS or Stata. For example, take saving out a table as an object and being able to manipulate it. In Stata you have to save it maybe as a CSV and then read it back in, or use postfile or some of these other functions. It's not quite hacking your way into it, but Stata isn't necessarily designed for that. If you use Stata, getting out regression tables and things like that is kind of a chore. That is not a big deal in R, because you save that table out and you can change the columns and change the rows just like you would with any data set. SAS is a totally different kind of beast altogether: it does a lot of good reporting, but getting things manipulated exactly the way you want them is not always the easiest set of code.

So why not R? As you can see from the number of people here, which has been growing over the years of teaching this class, there's a fairly steep learning curve. There are a lot of resources online, but from a lot of new users I've heard the concern of where to start. There's not really a manual, or if there is, it is
sometimes too technical or too specific. That's why we built this course.

R is programming oriented and has a minimal interface. When you download R, it's just a console: there's no point and click, and there are no drop-down menus like there are in Stata or some other pieces of software. To a lot of new users that's kind of a non-starter. The other thing is that the R core members are not a company. They're not necessarily all paid to do this, so there's not a lot of centralized support. You rely on Stack Overflow, online communities, R-help, and listservs to get answers, but there's no guarantee you're going to get them in a timely manner, or necessarily in the most polite way. That does throw some users off.

Sometimes it is annoying to update. If they make major changes between versions, most things will still work, but that's the beauty and the curse of working with an actively developed language: your code from ten years ago might not work, whereas SAS puts in hours and thousands of dollars to say that your stuff today will work on a computer from 1996 using SAS 2.5.

R is slower and more memory intensive than more traditional programming languages like C or Python. A lot of packages in the last five years have really tried to bridge that gap, so there are some things that really speed up code. But if you're a software developer and you wanted to roll this out in something like an app, it might not be the fastest way.

So, really quickly: why are you here? I want to get a gauge on why. Learning R is good, but there are other languages and other reasons. Does anyone have any? Yeah. And SPSS, right, it has drop-downs; it is more or less built to be drop-down. Cool. Money: yes, I know it's free. Other responses I get are things like: someone in my lab
used it and they have to take over their code.

Yes, plotting is a big thing. If you've worked with data, you know that getting a figure to look exactly the way you want can take months, in some respects. I'll show a lot of figures in this course, and I will say that making a publishable figure takes a lot of time, a lot of energy, and a lot of customization. Accepting defaults is never something we're going to condone here, and we will show you how to customize figures. But still, for figures I make for papers, even though I've been programming in R for almost a decade, it takes me probably an hour or two to customize them exactly the way I want, with the resolution, the font sizes, and all that kind of stuff. I just want to put that out there.

Yeah, that's a good point. We will touch on ggplot2 for plotting, and there is another interesting package out there called plotly: if you have a plot that's already made and you want to make it interactive, there's a relatively easy way to do that. We will touch on it. We will not talk about Shiny, which is a system in R that allows you to write R code and essentially have a web application that the user can interact with. You can ask what they want to do, have them click some check boxes and things like that, and send that information over to R; it can run an analysis and plot a figure back, all based on a web page where no one has to actually write any code.

OK. I'm going to upload and update materials throughout the course. Hopefully that won't be too frequent, but if somebody does find an error or something like that, I'll try to upload a fix. So if you download the static PDFs, they might not necessarily be the most up to date, but if you go off the website it should have the most up-to-date versions. All of the source materials for this course I will share at the end. I'm not sharing them now because there are answers in them, you know, that you
have. So after you do the labs, there is a website where you can download literally everything, from start to finish, that made this course. It's almost all written in R.

OK, so really briefly we're going to talk about RStudio. When you start with R, say you've just downloaded R, this is usually what you get. I'm going to open the R application, and it looks something like this. Beautiful, just so intuitive. No, it's pretty much a console that you type commands in. It's got some buttons and some menu items. There are reasons why we don't just work off this, and that's why we use RStudio; we will talk about exactly why in the next slides.

But at its very core, R is just a console: a line, or a place, where you type in commands, whether you type them in interactively or say "run this little script". That's what R is, just a command-line interpreter. It's a calculator. You can create variables or objects, and we're going to talk about exactly that. Those are what I was talking about before, when I said R can hold a table or a graph as an object and you can manipulate those things. That's what I mean by creating variables, which is not as intuitive if you're coming from other statistical languages. And it applies functions: you say, I want to apply this function to this data set, or this vector, or this column.

Throughout the course I will be saying "variables" in the sense of things that are in R; I don't mean a variable in a data set. That being said, sometimes I will slip and say "this variable in this data set", because that is the way it's usually termed in statistics, as in independent and dependent variables. But most of the time I'll try to say "column of a data set" or "column of the data frame", and we'll talk about those in a minute. Whenever I say variable,
usually I'm talking about something loaded into memory on your computer. We'll get into the specifics.

In RStudio there's an analysis script, which allows for interactive exploration. Let's go to that really quickly. It should take a little... sorry, my fault. So this is what I want to show you. This is RStudio, and this is the console. If you've downloaded RStudio and you open it up, this is the console. When I opened that R application before, this is the exact same thing, just put in here. And the analysis scripts are put up here.

R revolves around functions. A function is just a set of commands that executes a piece of code to do something, whether you want to sort a column, make a new variable, run a model, something like that. That's pretty much what we mean by functions: commands that take an input, do something, and return an output.

When you download R, it has a base set of functions. When you load up that R application I showed you, it's got a number of functions already loaded, like lm for linear models, glm for generalized linear models, sort to sort something, things like that. That's what we're going to refer to as base R. It is the thing that the R organization maintains, writes, and says "this is what R is". That's important because when we start using other people's packages, we are going to differentiate between what is base R, the base that you just downloaded, and what are add-on packages. I'm going to do that because of the code you find online: if you post a question out there and say "I want to know how to do this", sometimes they're going to give you ways to do it with additional packages, and sometimes they're going to say, "well, if you don't have all these additional packages, you can do it this way in base R".

The other thing you will note is that there are many ways to skin a cat in R. There are many ways to do the same
operation. Some are more efficient than others, some are more intuitive than others, but there are five to ten different ways to do the exact same operation, with more or less intuitive code. We're going to try to give you the code that's the most intuitive, in my opinion.

Like I said, other people write these packages. I would like to think some of my packages are good; some of them are not. Some of them I wrote six or seven years ago, and I don't know if that code still works as well as it should, or whether it could have been optimized. All I'm saying is that if you are going to use packages from other people, we are going to talk very briefly about how to trust those things.

The way packages are described is as R extensions. Again, you need to know base R, because answers from Google will use it. But more intuitive ways have been developed over the last, I would say, five years, where you can manipulate a data set in a way that makes you say, "oh, that code makes sense". If you do it the standard way in R, the way I grew up on R about nine years ago, it's not intuitive to a new user.

Hadley Wickham is kind of an R guru out there. He wrote ggplot2, a package we'll be talking about extensively for visualization. He wrote a package called dplyr, which we'll be using for data manipulation, and a whole bunch of other packages that were dubbed the "Hadleyverse" packages. He has a lot of packages that are very ingrained in how we're going to teach this class. They've since been redubbed the "tidyverse", so that his name isn't attached to it. The reason I bring him up is that I trust almost every package he puts out. Unless he says "this is a package I wrote yesterday, don't trust it", it's a very reputable resource. He works at a company called RStudio. There's a company called RStudio that makes a piece of software
called RStudio. So the thing you downloaded is RStudio, by RStudio.

There's a pretty interesting blog post linked here, by one of our faculty, Jeff Leek, on how to decide whether to trust an R package. Generally, one strategy is to look at the popularity of the package: if only three people have ever used it, you wouldn't necessarily think that it's been well tested, at least by the community. That being said, if you're looking for a very specific analysis or a very specific function, sometimes you have to use such a package. But for the most part, for 95% of the things you want to do (make box plots, make histograms, do data manipulations, fit linear models, generalized linear models, mixed effects models, et cetera), those are all very popular packages that have been extensively tested. A lot of the ones for modeling, like mixed effects models, GEEs, et cetera, have been validated against other standard pieces of software, such as SAS and Stata.

Whenever I talk about RStudio for the rest of the course, I'm not talking about the company; I'm talking about the piece of software, which is described as an IDE, an integrated development environment. The reason we use RStudio is that it makes things easier for the user. Just typing commands into a console is not intuitive. Being able to search for commands and have everything in one cohesive place is really what RStudio allows you to do.

What it is not: it is not a drop-down menu of statistical tools like in Stata. In Stata you can click the drop-down menu and say, "I want to do a longitudinal data analysis, set up my data, this is my Y variable, these are my X variables", and it will set it up for you. That is not what RStudio is, and in general there's not really anything in the R community like that in the same way. There are two tools that we point to, though we're not going to talk about
them: R Commander and Radiant. If you click on those links, they will show you some tutorials. They have things that will help if you say "I want to do a histogram" or "I want to do a Fisher's exact test"; they'll have some drop-down boxes. But we'll cover all those commands here.

So, some of the things I mean when I say RStudio makes it easier. First, syntax highlighting. Sometimes you want to read in a data set, so we call a function that reads it and pass in the location of that file. That location has to be encapsulated in quotes: it's a string saying where the file is. But let's say you forgot a quote. With syntax highlighting, the rest of your code will change to a different color to indicate that you're missing a quote. Common mistakes, common typos, and things like that get caught with RStudio, which is really helpful.

Second, code completion. Let's say we're using a function that takes in three arguments, x, y, and z, but the arguments of that function aren't named x, y, and z; they're something much longer or stranger. If you hit Tab, it'll say, "hey, these are the arguments of that function", just so you don't have to remember. These are just intuitive ways to allow the new user, or any user, to get up and running.

Third, smart indentation. A lot of the time when you're reading code, especially in a new language, it's formatted really messily and it's really hard to read. RStudio really helps with that.

It also easily allows you to manipulate and jump between multiple projects and working directories. If you have two RStudio windows open, one can be an analysis for collaborator A and the other an analysis for collaborator B. There's no crosstalk between them: you can separate them out, run this one, let it run for a while, and switch over to
another project. That's totally fine.

It also tells you what's loaded up in your session. It shows you the history of your plots, and you can scroll through them. It allows you to export PDFs and figures by point and click; we're also going to show you how to export and save them from the command line. It has help and documentation all in the same window. And it shows you your history: all the commands you wrote, say the last 200 or 300 commands.

So again, this is the console. This is where you type things if you want to do things interactively: say, "I want to run this command" or "I'll create this data set". It's where code is actually executed. You can type things in here, but once you've got a piece of code that you want to save, don't just leave it in there; put it into a script.

By the way, let me pull up my RStudio to show you what I mean. My RStudio looks like this. I use a black background with light text. So my RStudio isn't necessarily going to look like yours all the time; that's just because I changed some preferences around. If you're asking, "why does yours look like this and mine doesn't look like that?", it's because I've changed the preferences. If you go to RStudio, Preferences, Pane Layout, it'll put the panes in whichever of the four positions you want. For example, Source will be here and Console will be over here. I like my code on the left and where I'm executing it on the right; that just intuitively makes sense to me. Let me jump back. RStudio by default has the code up here and where you execute it on the bottom. That doesn't make sense in my brain, so I've changed it. You can also click Appearance, so if you don't like a white background and dark text, you can change all that. And again, we're not talking about anything in R yet; this is just getting set up and running.

So here, let's say I've
been writing a lot of commands in here, and I finally have some commands I really want to keep. I'm going to put them in the script up here. You'll see that by default scripts are called "Untitled", and when you edit anything in here, the name up here will turn red with an asterisk to indicate that it hasn't been saved yet.

So, Source. If you double-click on an R function, sorry, an R file on your computer and RStudio is associated with it, that's where it opens up. It's where the code and the comments are. You can highlight code and press Command+Enter, or Ctrl+Enter on Windows or Linux, to run the code. That's where you will actually save your analysis. You'll do the interactive stuff in the console and save all the commands somewhere else, so that you have a reproducible analysis in a script.

Over here is the Workspace/Environment pane. It'll say "Environment is empty" when you've just started things up, but once you start loading in data sets or tables, or making models, this is where it lists everything that's loaded in your R session, all the things you've created. It tells you what objects are in R. Again, I'm going to say "objects" or "variables": things you've defined, things you've created. This could be a data set, this could be a model, this could be a whole bunch of different things.

Next, History. History shows you the previous commands. It's good to look at for debugging, but don't rely on it as a script. You can actually copy pieces of it into a script, but depending on how you open RStudio, for example if you open it in a different directory, it might not keep all of your commands from your previous session. So don't rely on it and say, "everything is saved here, I don't need to save a script".

You can also press the up key. Say I'm in an RStudio session and I type x = 5 (I'm going to make this a lot bigger), and it's
at x = 5, y = 4, then if I just hit up, I scroll through the previous commands that I've run. So instead of going to History, which says x = 5, y = 4 (this is a little low for most people, so let me pull it up), instead of having to go over here and look through that, you can just hit the up button and it will scroll through your previous commands.

There are other panes that we don't really talk about as much, since we don't use them as frequently. Files shows the files on your computer in the directory you're working in. Sometimes you're in a certain folder and just want the full list of files in that folder. For example, say you ran a piece of code that says "read in this data set" and it says the data set's not there; you can double-check whether there was a typo in the name or something like that.

Viewer: I will try to use this a bit more. Those who use Stata or SAS, for example, do a lot more interactive browsing of the data, where you actually look at it like a spreadsheet. RStudio allows you to do that; it allows you to view data sets like that. I don't do that so much. We're going to talk about two commands, head and tail, that pretty much just print out the first or last few rows to get a gist of what's going on. That's what we generally use; by "we" I mean myself and Andrew, the other creator of the course. But we will talk about the viewer and how you can scroll through things, if that's more intuitive to you.

Help, which we're going to go into, shows the help for R commands: what the arguments are, what you're supposed to be passing in, and so on. Plots holds all the plots you've made. Packages is a list of R packages and shows which ones are loaded up in your session.

There are some other shortcuts. Again, Command+Enter will execute the code; let me show you that in a minute. Ctrl+1 takes you to the script pane and Ctrl+2 flips you to the console. There's a whole bunch of
different shortcuts. But what I'm talking about here is this: if I type x = 5, then instead of having to copy and paste over and over again, I can just highlight these two lines and hit Command+Enter. Or if I hit Command+Enter with nothing highlighted, it will execute the current line and then move the cursor down a line. Let me demonstrate that here: I hit Command+Enter, and it executes and moves down, executes and moves down. That way you can go through your script step by step to see if something went wrong. You can step through and say, "OK, this is what the data looks like at this step; here there are 15 rows", and then you did some merging operation and there was no data there for some reason, so you can see that the merge went wrong.

OK, so we haven't talked about any R syntax or anything like that, and that's what we're going to get into now. Again, everything up to now is really just going over setup. The reason we go through this is that, in previous years, we made more or less a slide corresponding to every single question we ever got in this course about what's going on. That's exactly why we have that slide showing you how to change the layout: some students were a little thrown off that our RStudio didn't look like their RStudio. We were trying to head off any questions and get everybody set up on the same page. So does everyone have R and RStudio downloaded and up? OK.

All right, so let's start talking about code. The other thing is that downloading and getting set up is relatively easy, but you don't necessarily update your software, and sometimes different versions of the software behave differently. Let's say you downloaded R five or six years ago, and then you try to run something that someone else developed, and it didn't work. That could be because, for example, in a newer version of R they might have added one or two new
functions to the very base of R, and code using them might not work if you use your version from five or six years ago. That can sometimes be a frustration. The recommended thing is to update whenever there is a major or minor release. For example, everyone here should have R 3.4.0. Yes? For example, I probably have something like 100 packages downloaded on my machine, and there are ways to update those when there's a major release. On the version: in 3.4, the 3 is the major version. If you have something like 2.5, they've made a very major release since then, and it's not so easy to update, but we can do that; we can get you up to date. I guess for future reference, yeah, we can maybe touch on that a little later.

So, data type problems. Depending on how you read in the data, say there's one column you think is all numbers, but in the Excel spreadsheet you read in there are spaces in it. When you look at it, it looks like empty cells. When you read it into R, if you haven't specified that empty spaces, or just spaces, are empty cells, it will then say that the whole column is, for example, character; it's not numeric. Or let's say you have age, but one cell has "six" written out as text. It's going to read that whole column in as character, which is sometimes frustrating. We're going to talk about data types and what you can do, because then you can't do things like age plus five. You can't do that because R doesn't know how to add a number to a word. So we're going to show you how to identify what's a character and what's a number. For example, in Stata, if you open up the browser, they're different colors: strings are red, numbers are black, labeled things are blue. They're not color coded in R, but there are ways to differentiate, and there are functions to say
like, is this a character?

So, working directory problems. Sometimes you say "I want to read a file" and you put a file name in there. One, that is usually case sensitive, so if you have a capital A versus a lowercase a, it has to be exact. The second thing is that a lot of times new users will put in a complete file path: C drive, slash Users, slash my user name, and so on, slash R class, slash the file. If I gave you that piece of code, it would not run for you, because you don't have that path, unless you used my name as your login for some reason.

RStudio can help. RStudio projects, which we'll talk about a little later, will open you in the directory of that project. Here's what I'm talking about. When I go to the console, it tells me where I'm working. It says I'm in my Dropbox, Teaching, intro to R. So if I go to read in a file, let's say the Baltimore City employee survey, and I want to read the CSV in there, I use a relative path. I don't put C drive and all that. I'm just saying I want to read data, slash, this file name, and R says, "OK, you want to read in this file; I'm going to assume it's in this directory". Well, it's actually in the data subdirectory, but because I put data slash, that means it's in a subdirectory. If I go to Files and into data, we see that the file is in there.

I went over this very quickly, but when I typed data slash, I hit the Tab button, and it says, "these are the files in there". So if I open a quote and type some folder, I can scroll through, get the exact name, and hit Enter, so that I know 100% that this file name is going to be correct; there aren't going to be capitalization or spelling errors in it.

Yes? About the quotes: let's say filename is an object in R. Then these two run the exact same piece
of code. The only difference is that here I put quotes around it to say, "hey, this is a string", whereas here I passed an actual object into the function and it wasn't quoted. So R says, "let me look in my environment for something with this name, filename", finds it there, and essentially passes in the string. Does that make sense? So the difference, at least here, is the quoting: anything single or double quoted R interprets as a character, as a string, as a word, rather than as some sort of object.

No, that's just a subfolder; that's just a folder in my directory. I have a folder called data. Yeah, that's a good question. We're going to talk about that a little more in the data input/output lecture.

R is case sensitive. Lowercase x and uppercase X are totally different. Lowercase x in the example I just did was a data set, and right here I can say capital X = 4. They're totally different things; they're not linked in any way. Coming from other languages, this is a big concern, because you can have two columns, Age and age, one with a capital letter and one with a lowercase letter, and you can have them both as columns in your data set, whereas some other software ignores case. So just watch the case in R.

RStudio does help with tab completion. Let's say, for example, you like to make really long variable names. What I mean by that is: this is a variable. Note a few things. You can put underscores in there, and periods. There are some other rules: you can't start a name with a number, but you can put numbers in it. If you start typing in RStudio, it knows what's in memory and will show you what is in memory that matches. So I type the start of the name, up to the underscore, hit Tab, and it completes it for me.
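The points above about quoting and case can be tried in the console. This is only a minimal sketch: the file path and the object names (`filename`, `x`, `X`, `age`) are made up for illustration and are not from the course materials.

```r
# Hypothetical path, for illustration only; no file is actually read here.
filename <- "data/example.csv"

# A quoted value is a character string. An unquoted name is looked up in
# the environment and replaced by whatever it holds.
same_value <- identical("data/example.csv", filename)

# R is case sensitive: x and X are two unrelated objects.
x <- 5
X <- 4
total <- x + X

# is.character() and is.numeric() tell you what type an object is.
age <- "six"
age_is_character <- is.character(age)
x_is_numeric <- is.numeric(x)
```

Because `filename` holds the same string as the quoted path, passing either one to a function that reads a file would point it at the same location.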
So this is really important, especially when you're reading in datasets that you haven't necessarily curated yourself, or that have really long variable names, especially when reading in, for example, from Excel. Sometimes people like to write a column header that is more or less an entire sentence. So this is a good way to still be able to manipulate those variables. We're going to show you later how to rename these columns, but you can use tab completion in an effective way so that you don't have to remember exactly the way it's spelled or the case. All right, so you should have the latest version of R. So we're going to go to RStudio. You can do File, New File, R Script, and we're going to save the blank R script to the directory. I'm going to add a comment header. OK, so this is an example of a proper header. You'll note that on every single line, somewhere in the line, there's a hashtag, or pound sign, that tells R this is a comment. You can add as many as you want; we're just doing it so we can visually see it as a group. We put a title, an author, and a purpose: for example, say, "Demo R script," that I wrote it, and what the point was. This is usually useful later in life, when you come back and say, what was this thing supposed to do? It at least gives you an overall message of what the script is supposed to do. So if you put a pound sign at the beginning of a line, everything after it is interpreted as a comment. You can have a pound sign again within a comment; once the comment starts, R says, everything else I'm not going to interpret as code. This isn't R code, it's just comments. You can write text, put quotes, do indentation, whatever you want. So again, like I was talking about before, this is the comment header, and I can save that. I'm going to copy and paste that comment header at the top, and I'm going to save it, for example, to my desktop as day1.R, and if I highlight everything and hit Command-Enter, it will
run everything, but none of it is interpreted as code. And then also, if you have code, and then you have a pound sign, and then you have a comment afterwards, that's still OK: this is code, then the pound sign, and R doesn't interpret anything after that. There is no operator for multi-line comments. So you don't have a thing where you say, start a comment, I can write all the text I want in the comment, end the comment. That's not how it works; you have to put a pound sign in front of everything. But if I write a bunch of stuff, you can highlight it, and Command-Shift-C is the shortcut, or you can go to Code and then Comment/Uncomment Lines, and it will comment them all out for you. So that's another thing. For example, in Stata you say slash-star, then star-slash, and everything inside of that would be interpreted as a comment. That's not how it works in R; you have to comment out every single line that you want. But again, you can use the keyboard shortcut in RStudio after highlighting as many pieces of code as you want. All right, so in the slides we're going to talk about code, or code chunks. It's a shame, because projectors don't really like to differentiate the colors, but if you have these slides open on your machine, this should have a little gray box around it. If you look at that, it'll have a gray box all the way up here, and that shows you in the slide that that is code that we ran. So the way it's set up is: this is the code, it'll have a gray box around it, it'll also have some highlighting, and then afterwards is the output of that code, usually. So in this case we just print the string "I'm code," and this is the output from R. OK, so that's how you read these things, and we'll have code interspersed throughout. For every single lecture you don't have to be copying and pasting; there should be an R file on the website that has all the commands that we've executed
in this lecture. So if you go to the website, for example when you're in Basic R, if you click the R link, this is all the code that's run, so print "I'm code" is in there. So if you want to follow along and execute these things on your own, download the R file, open that up in RStudio, and you can go that way. Yes, yes, totally separate. OK, all right. So again, R can be just a calculator. If you type something in, it's just executing commands, so you can do all the things that you want to do with arithmetic: addition, multiplication, subtraction, powers. The caret is the power operator in R. You can also use two asterisks back to back: 2, two asterisks, then 4 is the same as 2 to the fourth power. Usually most people use the caret. Also note that by default, unless you tell R you don't necessarily want to see the result, it's going to print it right out to the console. That's not a huge deal most of the time, but sometimes you're just trying to do some manipulation and see what's going on, and the output could be a huge dataset. So you don't always want to print it out; you just want to save it to an object and then maybe just trash that. OK, again: caret or double stars for powers. You can work with it just like a calculator; you can do all kinds of operations in there, matrix multiplication, that kind of stuff, modulus. But it keeps the order of operations. I don't know what mnemonic you use; we use "Please Excuse My Dear Aunt Sally": parentheses, exponents, and so on and so forth, multiplication, division, addition, subtraction. All right, with parentheses you can group things together if you want them evaluated first. So this will execute 2 times 3, get 6, squared is 36, and add 2 to it. And if you want 1 plus 3, in parentheses, divided by 2, plus 45, again, this is not 4 divided by 47; this is 1 plus 3, that's 4, divided by 2, and then plus 45, as I
already said. So if you want to group things together, just mind the order of operations. OK, so what do we get? For the first one we get some fractions: we get six quarters, minus three, so negative 1.5, add two, and we should get 0.5. Just making sure I can do arithmetic. All right, yes, negative, yeah, then positive. All right, so again, you can just play around with that. A lot of times you will be doing similar things to this: you'll be multiplying two vectors together, or age squared, or some column, or something like that. All the same concepts still work; the only difference is that instead of using numbers here, you're going to be using names of variables in memory or column names from your datasets. All right, so now, how do you create a variable? How do you assign? I gave you a little preview while I was doing things interactively. The way you create something in R, or the way I'll describe it, is assignment. There are two assignment operators: one I'll call the arrow, but really it's called the assignment operator, and the other is the equals sign, single equals. So you say x equals 2; now in your workspace there will be a variable called x, and it'll have the value 2. Throughout the entire course we will be using equals. That being said, we show you the arrow assignment operator because sometimes it makes more intuitive sense to some people: it's a one-way operation, you're saying I want to put 2 into x. You can say x equals 2; you cannot say 2 equals x. It's not the same thing. The left-hand side is always the thing being assigned. So whatever object you want to create is always on the left-hand side. A lot of the code you'll find when looking for help in R, on Stack Overflow or any other website where you look for help, will use one of the two. Answer one might use the assignment operator, answer two might use the equals sign. So we want to let you know
this exists, but we're not going to use it. It's still very, very common. I don't use it because you have to press two separate keys versus one. So we say x equals 2; now we can just print x, and we're just going to get 2 back. And like we did with the other arithmetic operations, x times 4 is going to be the exact same thing as 2 times 4, and x plus 2 is going to equal 4. So there are many, many different classes that we're going to talk about. The most common one that we're going to talk about with respect to data is called a data.frame. You can think of it as essentially an Excel spreadsheet: it's a type of object that has rows and columns, and the columns can be many different things. Column 1 can be age, column 2 can be gender, column 3 can be some ID variable. That said, all the rows have to have a definition for all those variables; it can still be missing, but it has rows and columns. It's essentially a square, or sorry, rectangular object, and it has some column names, maybe some row names, and some data. There's a little bit of nuance in the way you access columns and rows of that, so we'll step back and start from the ground up. We're going to talk about one-dimensional classes, two-dimensional classes, and then some more complex classes. So, starting from the beginning, we're going to introduce one-dimensional classes, which are also called vectors. There can be multiple observations, but each observation has to be of the same class. And when I say class, I mean class of x; this is a function, class. Before, again, we assigned 2 to x, so class of x is going to say numeric. More or less anything that holds a number in R is a numeric class. When you're thinking about Excel spreadsheets, one column of your data, let's say age, is a numeric column. OK, we can assign y to be this string; again, we can tell it's
a string because of the quotes. Again, we can use single or double quotes; I think throughout the course I use double quotes for almost everything, but if single quotes make more sense to you, use single quotes. So we say "hello world," we set it to y, we print it out, and we ask for the class of y, and y is a character. The bottom of the screen is a little hard to see, so we can bring it up. So it's a character. The other thing I want to note, when you go back to full screen: y is character, x is numeric. When I print x, after the bracket it just has the number 2; for y, it's got quotes around it. So if you print something out and you see it's in quotes, you should know that the vector is character just by the way it's printed, without having to use the class function at all, because R said, when I'm printing out numbers, I'm not putting quotes around them; when I print out characters, I'm going to put quotes around them so people know that it's character. We're going to talk, in a little bit, about another format that is much more intuitive for categorical variables, called factors, but this is how R sees a string, for string manipulation and all that kind of stuff. OK, so let's try assigning your full name to an R variable called name. It really only has three components, and I'm going to do it up here: the name, the equals sign, and the quotes. So my name is John Muschelli. OK, so we go to the next slide; surprisingly, we already have that up there. If you just type it again, by default it's just going to print it, and we see my name in quotes. Note that those things are still called vectors. They only have one element, but they're still called vectors. A vector can be one element long or multiple elements, or a hundred million elements, it doesn't matter; whether it's 1 or 100, it's still called a vector. So some people, at least if you've taken some math classes over the years, say a one-dimensional thing
is a scalar and a bunch of them is called a vector. Here, anything that contains, not a string, I'm sorry, a set of things that are all the same class, we're going to call a vector. A lot of times it's just that one thing, but usually I have more of them put together. So the way you create a vector of multiple elements is c, the combine or collect function. That's how you create a vector in R; it's mostly used for creating vectors of numbers, character strings, and other data types. So here, I'm just going to show you: I thought I was 100% sure that we don't use the arrow assignment operator a lot, but right here we did. So we define this vector 1, 4, 6, 8: x, assign, c, and then we just start typing in the elements that we want to fill that vector with, separated by commas. So c is a function. When we print it out, it just prints it back out again; it doesn't have quotes, it's numbers. We say class, and it's still numeric. You'll use the c function all the time. c is important, c is useful, and luckily it's very short, but it collects things together. So let's assign: I'm going to assign my first and last name, as two separate character strings, into a single vector called name2. We're going to use c to collect things together, separate things by commas, and put quotes around things because they're characters. So name2 is equal to, and we're going to collect "John" and "Muschelli," and now this is a vector of length two, a two-element vector. So again, if we print it out, it prints out John, it prints out Muschelli, and it puts them in quotes because they're characters. We'll talk about why this bracketed 1 is there in a little bit. It's just a reference to show you the index of the vector. So if you print out things with, you know, five hundred elements or something like that, and you want to say, where's that element, is it element
five or a hundred or something like that, that's just an indicator saying that's where it is. Let me actually print out a hundred things so it doesn't always just say 1. So, for example, this is the 41st element, something like that. So for these numbers in brackets, in all the code that we've shown it just says 1. Remember, again, it doesn't have anything to do with your data; it's just a way of orienting yourself, so that if you want to say, where is this element, this will help you figure that out. We're going to show you how to do that programmatically, but that's just why you see this little bracketed 1. So now that you've got a vector, we can ask how long it is, how many elements it has. We use the length function. So again, x we assigned to 1, 4, 6, and 8; say length of x and you're going to get 4. So if you had a row or a column of your data and wanted to see how many elements you have there, you'd use length. Again, if we set y to "hello world," length of y asks how many elements are in this vector, and it gives us 1. It does not count the number of characters in the string. So although this string has about 12 characters, that is not what length does; it says this is a character vector of length one. There are functions for that if you want to say, how many characters are in this. Let's say, for example, you had a vector of IDs or something like that, and you wanted to say, I want to find the IDs that aren't correctly formatted, that have too few characters or something like that. There are other functions for that; length is not the function you use on characters. OK, so what do you expect the length of name to be? One, it's just one thing in quotes. And name2? It's just two. Again, you can perform functions on entire vectors just as you would if it was just one element. So you say x
plus 2, and what it's going to do is take every single element of x, the 1, the 4, the 6, the 8, and add 2 to each one of those numbers, element-wise. So that's why we get 3, 6, 8, and 10. If we multiply by 3, it is going to multiply each number by 3 and give you the resulting vector. You can add two vectors of the same length: so x plus 1, 2, 3, 4; remember x is 1, 4, 6, 8, so 1 plus 1 is 2, 2 plus 4 is 6, and so on. We will talk about this: with a warning, you can add two vectors of different lengths, and some odd behavior happens. If you come from other languages, that's not the case; you can't do that. For example, in MATLAB, if you try to add a vector of three numbers and a vector of four numbers, it's going to say that doesn't make any sense. I still don't think it makes sense, but R allows you to do it. We're going to tell you about the behavior, because sometimes it can give you a warning, and sometimes it can execute and not give you a warning, so it's good to know what you're doing. name2 was a two-element vector with my name, John Muschelli. So I try to add 4 to it, and we get this error: error in name2 plus 4, non-numeric argument to binary operator. So welcome to the world of errors in R. Some of them are very intuitive, some of them maybe not so much. Pretty much what it's saying: a binary operator is the plus, which is adding two things, so it's an operator that acts on two separate things; and non-numeric: I tried to add my name to the number 4. R doesn't know how to do that; there's no definition for it. This is important, though, because in other languages, if you do this with a character, other things happen. But here you can't add those things together. There's a command we'll talk about later called paste: if you want to put the number 4 at the end of every single element, you can do that, but that's not the way you do it here. If you want to repeat my name
four times, you can do that, but that's not the way; you can't add characters and numbers. So R keeps things in memory, but it's memoryless: it doesn't care what you assigned something to be before. Meaning, we're going to say y is equal to x plus 1, 2, 3, 4. R does not care that I assigned y to "hello world"; it doesn't care that before it was a character and now it's a number. It doesn't keep memory of that. So once I reassign this, it will not give me an error saying, hey, you already have a variable named y; it will not do that. And it doesn't care what y was previously; that is gone. It's the same operator, still assignment, but if you had y in memory before, it doesn't matter: y is overwritten in your memory. y is now 2, 6, 9, 12. So this, which we'll talk about later, is one way you can reassign variables. If you want to say age is equal to age plus 5, you can reassign a column in your data with a new value. When you have a dataset and one of the columns is age, you would say the dataset's age column is equal to something, and you'll replace the previous values, if you want to do that. OK, so str is another function, other than length; it stands for structure. It gives you a quick snapshot to see what kind of object this is. So we say structure of x: it's a numeric vector, 1 colon 4 (we'll talk about the colon operator; essentially that means 1, 2, 3, 4, the sequence 1 to 4 by 1), and it prints out the first couple of values. So the structure of x is that it's numeric, it has four values, and it shows you the first four, and similarly for y: it tells you it's numeric and gives you the length. If we went to our Environment, that is essentially what it's going to give us, the structure. So I see x is 1, 4, 6, 8; that's what's printed out here. But if you didn't want to go to the Environment, or if you're just working with code, you can say str of x and
it'll give us the exact same thing. If I make x a sequence up to 100, x is now much bigger, and it still gives us a quick snapshot. If we do str of x again, it'll show us the first 10 or something like that. So it gives you just a snapshot; str is another function that allows you to briefly view your data and see what's going on. All right, so we created a new script, used R as a calculator, assigned values to variables, performed some algebra on numeric variables, and showed some errors. We're going to do the lab now, but there's something I do want to talk about before we start the lab, because it is a very important gotcha in R. Let's say x is 1, 4, 6, and 8, and I add 1, 2, 3 to it. It gives us a warning. So I tried to add three numbers to four numbers. Remember, it did it; it didn't quit, it didn't throw an error. OK, so three things can happen. Sometimes a command will just execute and not tell you anything. Sometimes it'll execute with a warning. Sometimes it'll give an error and it won't execute, like what we tried before, where it said error. So if I take that one, assign it to, let's say, e, and then try to print e, it's not found, because this never executed; e was never defined. OK, so let's go back to the case I was talking about with x. What happened? It said: longer object length is not a multiple of shorter object length. Again, that's true and explicit, but not the most intuitive of messages; still, this is probably better than most, or better than some. So what it does is say: OK, you have a vector of length four and a vector of length three; I'm going to try my best to help you out, I'm going to try to execute this for you. It takes the first element, sees there's a 1, and adds 1 to it; that's how you get 2. For the second element, you add 2 to
4 and you get 6; for the third element, 6, you add 3 and you get 9. Now what happens? It says, OK, we've run out of things on the right-hand side, so I'm going to wrap around. It says, I'm going to repeat the shorter one over and over again until it makes the length that we need. So that's where you get the 9 again, because the 8 in x is added to the 1 here. I don't have any left, so I'm just going to go back to the beginning. That is sometimes helpful, usually dangerous. Notice it didn't say they weren't the same length; it said it wasn't a multiple. So if one length is a multiple of the other, it just does it. This is really great when you have super-structured data. Let's say you have a spreadsheet and every row is case, control, case, control, case, control, and you want to be able to label case and control easily. As long as there's an even number of rows, you say column three is the vector case, control, and it'll fill it in over and over again. Most times that's not how your data is, but sometimes this is what you want to do for efficiency; most times not. But you should be aware that R does this not only when you want it; it's going to do it even when you don't want it, and it doesn't always give a warning. So let's go to the lab. If you go to the website and go to the Basic R lab, either this will download, or you can copy the link address. If you're in a different browser, sometimes when you go to this it should download, but if it does open in the browser, you can do File, Save Page As, and I think you just want to save it as text, I believe. If that doesn't work, rather than keep trying to download, just have me come over, because saving off the website doesn't always work so well; it
tries to save you some weird HTML. That shouldn't be the case, but if it does, raise your hand. OK, so we are going to do the lab now. We're going to do about 15, probably about 20 minutes, and then reconvene. Leslie and I will be, sorry, there's a name here that's wrong, Leslie and I will be floating around answering any questions and going through it. So download it if you don't have it. If you don't have the internet, I can give you a jump drive with it on there. So does everyone have the internet right now? Can't get on? OK, so let me put the labs, or at least today's labs, on a jump drive and I'll be right over. [Music] [inaudible] [Music] So on the left-hand side you won't ever have quotes; it'll just be whatever you want the name to be. So for the first one, for example, mynum, we wanted it to contain six numbers, whatever ones you feel like. You just put the variable name; you don't put a quote, so you don't put a single or double quote around it. You never put quotes on this side. And then also, really quickly: if you put quotes around something, again, R sees that as a character. So here I typed mynum in quotes, and here I typed it without; contrast these two. This one doesn't have quotes, so R said this is not a character, this is an object in R, and so it's going to go to memory, retrieve what it is equal to, and print it out. So on the left-hand side there are no quotes, just whatever you want the variable name to be, and this is a different business. So when something is
encapsulated in quotes, it's interpreted very differently. [inaudible] So, combine: the c function is the combine or collect function. All right, one more thing; maybe it's a little ambiguous in the question. When we say combine, make a vector with both of them in it. Don't paste the strings together; just make a long vector with mynum and mychar. So use the c function for that. Just like we collected elements here separated by commas, we're just going to collect those two vectors in there. So we aren't going to be putting in numbers; we'll be putting in objects, again without quotes, because if we put them in quotes, that means something completely different. With quotes, this, to R, is saying: take this character and this character and put them together, whereas what we're trying to do in that question is put two different vectors together. So you have to leave off the quotes so that R knows these are objects. Yeah, yeah. [Music] [inaudible] All right, give me one second; I'm just going to put something up so that you can see the key. OK, so sometimes it takes a minute to refresh, but if you go back to the webpage and refresh a few times, OK, I'm just going to get some water, but there should be a key: if you refresh the website now, it should be there to download. So I'm going to open it up. All right, we'll walk through these step by step. So we're going to create a new variable called mynum, and it's going to contain six numbers. OK, so this is the way I tend to work. It is interactive; you can type in over here, but I generally write my code in the source and then run it, like I was talking about before, with Command-Enter, and it actually executes the code over here. If I wanted to print it out, I would
still write something like this, and then I would maybe comment that out or delete it, if I just wanted to do something interactively. OK, so again, we're collecting six numbers together; it didn't matter which ones. We will talk about seq and the colon operator: you could have said 1 colon 6, or seq of 1 to 6, which will do a sequence of numbers, by default with steps of 1, or you can sequence 1 to 6 by one-hundredths or something like that. We'll come to that a little bit later. But again, just collect six different numbers and assign them to mynum. Again, there are no quotes; there is nothing over here. You can put periods in here, you can put underscores; you can't start it with a number, and you can't start it with certain operators. We'll go over those rules a little later. Then mynum times 4 multiplies every single element by 4. We're going to create a second variable called mychar that contains five character strings. I think this is a little bit misleading: we did put mychar in quotes in the question just so that you knew that was a variable name, but maybe that is a little misleading, because on the left-hand side you still don't put quotes. So you do not put quotes around mychar. And here we're just putting John, John, John, five times. And when we say combine, we're just going to put these together; we're not pasting things together or anything like that, we're just putting them all in one vector. All right. So one thing to note about both: what does it have? It has quotes. So that should at least give you an indication of what number six is going to be. What's the length? mynum has six, mychar has five, so it's eleven. OK, we see that it is. Also, this is what I was talking about before with the bracket: you get at least a rough way to gauge how many there are; here you see a 9, so
you can get kind of a ballpark of how many there are. Use that as just a very gross gauge; if you're trying to say, are there a hundred cases in our dataset, you would still use functions like length or the number of rows or something like that. So what is the class of both? Character. Again, vectors have to be of one class. We see those quotes around here; we see that it became a character. This is what's called coercion. R doesn't know how to make a character into a number, but the way it thinks about how to make a number into a character is: OK, put quotes around it and treat it just like a word. So although mynum had numerics in there, and I can do mynum plus 5, when I try both divided by 3 we should get an error: non-numeric argument to binary operator. Again, because even though there were numbers in mynum, after you collected it into one single vector with mychar, R said, hey, I don't know what you're doing with numbers and characters, but I assume you want to make these numbers into characters, and I'm putting them together. Again, no warning. A lot of times this is what you want; there are a lot of cases where you want to do this, where a warning isn't warranted and would be kind of prohibitive in some respects. So R is kind of a catch-22: it's flexible, but if you don't know what you're doing, you can hurt yourself if you don't know some of these catches. So is that clear? So, like I said before, when you read in data, say age, and one of those cells has a space in it, and you didn't tell R that a space means missing, it's going to say: hey, this column is a vector, they all have to be the same class, one of them is a character, all the rest are numbers, so I'm going to make this whole thing into a character. OK.
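The lab steps just described (collect, combine, check length and class, then hit the coercion error) can be sketched roughly as follows. The six numbers are made up, since the lab lets you pick any; mynum, mychar, and both are the names used in the lab, and wrapping the failing division in tryCatch is an addition here so the script keeps running rather than stopping at the error.

```r
# Values are made up; the lab lets you pick any six numbers
mynum = c(5, 4, 7, 8, 12, 14)
mychar = c("John", "John", "John", "John", "John")

# Collect both vectors into one (no quotes around the object names)
both = c(mynum, mychar)

print(length(both))  # 6 + 5 elements
print(class(mynum))  # numeric
print(class(both))   # character: the numbers were coerced, with no warning

print(mynum + 5)     # fine: element-wise arithmetic on a numeric vector

# both / 3 is an error; tryCatch captures the message instead of stopping
result = tryCatch(both / 3, error = function(e) conditionMessage(e))
print(result)
```

Printing both shows every element in quotes, which is the visual cue that the whole vector is now character.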
Does that make sense? So when we combine them together, because vectors, again, have to be the same class (sorry, I keep saying class; what I mean is the same data type: numeric, character, and so on). OK, so we're going to create a vector with elements 1, 2, 3, 4, 5, 6 and call it x, and create another vector, call it y: 10, 20, 30, 40, 50. Again, what happens when you add them? It works, but it gives you a warning. What it did was 1 plus 10 is 11, and so on, all the way up to 6, where it doesn't have enough elements down here, so it adds the 6 to the first one and wraps around. If you google it, it's usually called the wraparound effect, or recycling, in R. Now if we append the number 60 to y and add them together: no warning, everything happens element-wise. Again, if we multiply, it would multiply each one, element by element. OK. [inaudible] So, for example, let's say mynum were actually ID numbers. Technically, in a lot of studies, IDs aren't truly numbers; you don't want to be able to add 5 to an ID, that doesn't make any sense. So maybe some of the IDs are just numbers and some aren't, just by the way they were written down or something like that, and you've got two vectors and you want to combine them into one big list of IDs. What R will do is say: OK, these are numbers; I don't know how to make characters into numbers, but I do know how to make numbers into characters, so I'm going to treat these as characters. That's what we actually get when we combine them together to get the variable both: it says, hey, these have to be all one class; these are characters, these are numbers; I'm going to change these to characters and put them together. OK, so we will be talking about
We will be talking about other functions that will do this explicitly. There are functions like as.character that are very explicit about it: you say, I want to convert this to a character vector. [inaudible] And again, you see it's in quotes, so you know that this actually was coerced to a character.

That being said, there is also one like as.numeric. We'll talk about missing data later. Again, this vector had a name in it and it had numbers in it. When you say as.numeric, you're saying: hey, I want to coerce this, I want to change this, I want to make sure I get it back as numeric. You can do that; R is just going to say, if you didn't put numbers in there, I'm going to give you this back: a warning, NAs introduced by coercion. [inaudible] So it knows that when the number five just has quotes around it, it can make that into a number, but it doesn't know how to make words into numbers. It knows how to change certain characters into numerics, but these are still pieces of a string. This one has commas in it, right? Some of the data you get might be in thousands of dollars or something, and you have to strip those commas away before you change them into numbers, stuff like that.

Okay, so are all these examples pretty intuitive? Or, not intuitive, but are they straightforward so far? Okay.

Normally we have about a 10-minute break. I'm going to allow people to break during the lab, because we started a little bit late today and I don't want to hold people later than they need to be. So if no one has a very strong need to get out of the classroom right now, we're going to charge forward.

All right, so, cool: we dealt with vectors and variables.
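The explicit conversions discussed in this part of the lecture can be sketched as follows. The values "john", "1,200" and "35,000" are made up for illustration; gsub() is one base R way to strip the commas, not necessarily the one used in the course.

```r
# Explicit coercion: numbers to character, and back again.
as.character(c(1, 2, 3))          # "1" "2" "3"
five <- as.numeric("5")           # quoted digits convert cleanly

# Words cannot become numbers: R returns NA and warns
# "NAs introduced by coercion" (silenced here to keep output quiet).
mixed <- suppressWarnings(as.numeric(c("5", "john")))
print(mixed)

# Values with thousands separators need the commas stripped first.
dollars <- c("1,200", "35,000")
clean <- as.numeric(gsub(",", "", dollars))
print(clean)
```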
What this section is really about is getting data in and getting data out. I spoke with some students, and we will talk about things that are like a log. There is no real direct analog of a log, right? We generally have scripts that execute code and save things out, whether that be a CSV, a table or something like that, or a figure. There are some things, well, there exist R object [inaudible]. Trying to use the words 'are' and 'R', or 'exists' in R, is kind of a hard thing to say sometimes.

Okay, so again, working directories. These are probably the biggest problems you'll have when you're trying to read data in. You have to make sure you're pointing the software to where the data is. I try to use relative directories for everything, like I did before with data/. I try to use nothing with absolute directories, like I was saying before: C:/, my username, stuff like that. We will show you a way in RStudio that allows you to do that relatively straightforwardly. We're going to show you, with one drop-down menu, how to import some CSVs and stuff like that. So there are going to be some drop-down things, but for the most part what we're going to do is use that drop-down menu, get the code, copy and paste that code into a script, and then run that script.

Again, case sensitivity. This is not as big of a problem with tab completion, but with data sets, if you don't type the column name from your data set exactly, it's not going to return the right thing. Columns can have the same name in different cases, and some names can be very, very long and complex depending on how you import the data. So use tab completion if you have that data set problem.

So again, we will be talking about data frames.
The only thing about data frames that is kind of concerning is that the viewer shows you all the data, like a spreadsheet, but it's not always 100% clear whether a column is encoded as a number or a character, right? Some of those commands that I didn't go over yet, the as-dot functions like as.character, let us say, make sure this is a character. We can use those in manipulating our data later. But if you just read a data set in and say, like, histogram of age, and you haven't checked that that column is numbers, you're not going to get the thing that you want out; you're not going to get the right plot. Then there are just general typos with reading data: again, missing brackets and stuff.

So, working directories. [inaudible] Again, you can change directories in scripts; I suggest not doing so. There are two functions: getwd stands for get working directory, and setwd is set working directory.

Say you have RStudio closed completely and you open a script. Let me close out RStudio, and I'm going to open the script. So this is the script; I click it, and it's going to open up in an RStudio session. It's a terrible example because it's in my Downloads; give me one minute. So we're in the data input/output folder. Sorry, this is very small. I'm going to click the data input/output .R file. Because of the way I have RStudio set up, that will open it in RStudio and sort of automatically set the directory to the directory where that file is. So if I start this project tomorrow and I open it like that, it will always open in the directory where that script is. And then I usually do everything like, you know, read data from data/ something, something, something, where it's always relative to the directory of the script. Sometimes you still want to set working directories, but that's not the way the code is going to be set up for this. [inaudible] So assume you're in the directory the script is in.
If you are going to set a working directory, set it at the beginning, at the top of your script. Do not set it in the middle, and do not set it five or six times. As I said before, if you're changing directories five or six levels in, it gets very hard to read and rerun.

Yes? [inaudible student question] So that's a good question. Usually I will read the data relative to that. I usually have a folder that is programs and a folder that is data, and my scripts are in programs. So, for example, I would do read CSV: go up a directory, go into data, and then read that. So one level up, then data. My folders are still separated between code and data. We will talk about this a little bit later, but this is a good question because there are just different ways people do it. All I'm saying is that this is relative to the script. Relative is a way to specify your data versus an absolute path. Let's contrast these, right? The absolute one will work on my machine, but it won't work on somebody else's machine if they copy the folder over.

Later we will talk about these things called RStudio projects. Sorry, it's really small to see this stuff on the big screen, but if you say File, New Project, and I'll show you how to do this in RStudio, you should note that up here in the right-hand corner it shows the project. A project will automatically set the working directory to the directory where that project is, and then you do things relative to that directory. So if you open that project and start writing code, you know exactly where you are, versus opening a script and assuming you're in the script's directory. Does that make sense? There are different ways; it's just kind of a project management idea. But the big message here is: if you're going to be reading in data, writing, whatever, make sure it's at least somewhat relative, and not hard-coded using your username or something like that.
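A minimal sketch of the programs/data layout described above, built in a temporary folder so it runs anywhere. The folder name project_demo and the file example.csv are placeholders, and base R's read.csv is used only to keep the sketch free of add-on packages.

```r
# Build a throwaway project: <root>/programs for scripts, <root>/data for data.
root <- file.path(tempdir(), "project_demo")
dir.create(file.path(root, "programs"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "data"), showWarnings = FALSE)
writeLines(c("id,age", "1,34"), file.path(root, "data", "example.csv"))

# Pretend the script lives in programs/; set the directory once, at the top.
old_wd <- setwd(file.path(root, "programs"))

# Relative path: up one level, then into data/.
rel_path <- file.path("..", "data", "example.csv")
found <- file.exists(rel_path)
dat <- read.csv(rel_path)
print(dat)

setwd(old_wd)  # restore the original working directory
```

Because the path never mentions a user name or drive, the same two lines would work for anyone who copies the whole project folder.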
That's the very least, if you think it's going to be in someone's Dropbox, right? We'll talk about the tilde, the home directory, in a minute. Okay, so this is kind of what we were going over, just not as structured.

The default directory structure in Windows uses single backslashes, but R interprets the backslash as an escape character. Sorry, there should be a slash there on the slide. So you must replace the backslash with the forward slash, or with two backslashes. Throughout the entire course, when writing anything with respect to paths, use the forward slash. So if I'm typing C: in Windows, I do not use the backslash; I use the forward slash, because that is the way you have to do it in R. You can still use backslashes. Windows users are used to them, but most systems do not work that way. So, more or less, don't use the backslashes. There are a lot of reasons for that, and I can go into them, but just don't use them; and if you're going to use them, you have to use two. RStudio will do this for you. With Linux and Mac you don't have to worry about this; they use the forward slash.

Note here, when I type quote, C, colon, backslash, quote: a plus appears. R says you're not finished, that command isn't done, I expect more to come next. If I just keep hitting Enter, I just keep getting these pluses and nothing executes. That's because the backslash in quote C colon backslash is interpreted as what's called an escape character. That means if I actually want a double quote in a string, I can escape it. I won't go into any more detail; more or less, backslashes can cause a little bit of a problem with respect to paths. Just use the forward slash, but, again, use it relative to the directory.

All right, so the other shortcuts. Period period goes up a level. If you're in directory a, the directory above that is period period, right?
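A short sketch of the backslash point made above. The path and user name are made up for illustration.

```r
# Forward slashes work in R on Windows, Mac, and Linux.
fwd <- "C:/Users/some_user/data"

# A backslash must be doubled inside a string; each pair is ONE character.
bck <- "C:\\Users\\some_user\\data"

print(nchar(fwd) == nchar(bck))   # the two strings have the same length
cat(bck, "\n")                    # shows single backslashes

# The backslash is an escape character, so \" puts a quote inside a string.
quoted <- "she said \"hi\""
cat(quoted, "\n")
```

Typing a string such as "C:\" at the console leaves the string unterminated, which is why R answers with a plus and waits for more input.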
This is just how you specify directories on your computer. Period period goes up a level; ./ or a single period is the current directory. So if I say set working directory, period period, it's going to go up a directory, and I can keep doing that and it'll go up, say, three directories. If I do this or this, with the single period, it'll set it to the exact same directory that I'm in. Again, when we use the drop-down menus it shouldn't be as big of an issue. The tilde is your home directory. Again, this is really important if you're sharing with other people: when you're giving code to someone else, make it easy for them.

Yes? [inaudible student question] So, one, you can look up here at the top of the console, but that's not helpful if it cuts off. getwd gets the working directory. There are also some file helper functions. dir will display the contents of a directory: dir, period, says tell me what files are in this directory; dir, period period, says tell me what files are in the directory above me. That kind of stuff, if you just want to do things without pointing and clicking.

So we talked about absolute and relative paths. Instead of using setwd, if you want to set your directory you can go to Session, Set Working Directory, Choose Directory, and it should pop up and ask where you want to set your directory. That's one way you can do it. If you go to Session, Set Working Directory, you can also say To Source File Location, which will set it to the directory of this script. You can also set it to wherever you are in the Files pane. So if you had changed the directory and you want to change it back, you can use this drop-down.

You do have to specify paths explicitly. If there was a subfolder in data, you have to say, like, data/subfolder.
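The shortcuts and helper functions above can be tried safely in a temporary folder. This is a sketch only: the folder name wd_demo and the file salaries.csv are placeholders, not the course files.

```r
# Build a throwaway folder so the example does not depend on your files.
root <- file.path(tempdir(), "wd_demo")
dir.create(file.path(root, "data", "subfolder"), recursive = TRUE,
           showWarnings = FALSE)
file.create(file.path(root, "data", "subfolder", "salaries.csv"))

old_wd <- setwd(file.path(root, "data"))  # setwd() returns the previous directory
here  <- dir(".")          # contents of the current directory
above <- dir("..")         # contents of the directory one level up
sub   <- dir("subfolder")  # a subfolder must be named explicitly

setwd("..")                # go up one level
now <- basename(getwd())   # name of the directory we are in now

home <- path.expand("~")   # what the tilde expands to on this machine
print(home)

setwd(old_wd)              # put things back the way they were
```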
You could have, for example, the Baltimore employee salaries FY 2015 file in the data folder, in the subfolder, or in one of the sub-subfolders, right? So you have to be explicit as to which one it is. Does that make sense?

But for the most part, File, Import Dataset will be what we're going to use. I'll show you. So, From CSV: I can click Browse and say I want to go to the data folder and get that Baltimore data in there. [inaudible] It's going to show us a little preview. All right, let me redo that. This is also in the notes; I just want to show you one way we can do this without having to worry about all this kind of stuff.

Okay, so some of the options. Is your first row actually a header, or just data? Do you want to trim some of the spaces? So if there are spaces on either end of those words, like space, John [inaudible], space, do we trim those away? Do you want to open the data viewer after you import? Is it a comma-delimited file, or you can specify something else. For NA, do you want missing values to be just an empty string, stuff like that.

The next thing, down here, is that it gives you the code. It says we're going to bring in this other package, we're going to assign a variable, and then it's going to view it. When I click Import, it reads it in, it brings up a viewer, and there's your CSV. Again, we'll get to that in a minute, but this is the way you can do this quickly. If you're new to R and you want to get a data set in and start playing around, use File, Import Dataset. You can see that it has multiple options: CSV, Excel, SPSS, SAS, Stata. If you want to load one of those in and start working with it immediately, it'll give you the code. So now you can copy and paste this code into a script, and you will have a script that will read the data. But again, it uses your absolute path, so it might not necessarily work on the next person's computer.
All right, okay. Before we talk about any of the functions, remember help. The way you can get to help is a question mark: you say question mark and then the function. dir was the function we talked about before. If you execute this in your R, it will come up with the help page in the bottom right-hand corner: List the Files in a Directory/Folder. This is the help file for dir. It has all this information: it shows you the usage, how you call the function, the arguments, what the arguments really mean, a little description, and a title. You can also use the help function explicitly, by calling help with dir in quotes; that will also bring up this help file.

Okay, so everything we'll be using is publicly available data. [inaudible] We will go through some toy data sets in the lectures; they're from Open Baltimore and data.gov.

Reading data is usually the first step. R can read in almost any file format: SAS, SPSS, CSVs with odd delimiters and strange things going on in there, Google Sheets, that kind of stuff. That's not to say that out of the box R can do all that. A lot of these reading capabilities come from external packages, add-on packages. So, for example, if you downloaded R and loaded it up, you wouldn't necessarily be able to read in a Stata data set. Some of these are user packages that people wrote so they could read things in. But general text-based data is not bad. We're going to talk about delimited text, usually: tab-delimited, or comma-separated using either commas or semicolons. And then we're going to be showing one or two packages to read Excel files.
We're going to be talking about the Youth Tobacco Survey from data.gov. It's about smoking and smoking cessation in youth. You can download the data. We're going to show you what happens when we click this link: it's the Youth Tobacco Survey, and it comes up as a CSV, right? If you get something like this in Safari or Chrome and you actually want to download the data, you say File, Save As, and it should download the data, not as a web archive but as page source, and you can save that wherever you like.

You can also read in a CSV from the web directly. I'm going to show the code where we just take this website address and pass it into a function to read it. So I'm going to go into R and say File, Import Dataset; it's a CSV. First, I already downloaded this, so I'm going to browse to it just to show you how to do that, and then I'm going to paste the CSV path directly from the website.

Yes? [inaudible student question] Yeah, it should give you an option just to update those. So RStudio, on the back end: you'll see here that the code it gives you calls a different package to read in that data set, so it depends what version of RStudio you have. If you don't have updated packages, it will ask you to update them.

Okay, so I'm going to go to my desktop. I've got the Youth Tobacco Survey, and it gives me a little preview. For the name, I'm just going to call this yts. It's going to try to make the file name your variable name, and because the file is called Youth Tobacco Survey something, that's a really long variable name, something you don't want to use over and over again, right? I'll call it yts for simplicity. Because of the way I'm zoomed in, I can't really get a good view of what the data looks like, but I know there is a header, and I want to keep all this stuff in there.
I'm going to open the data viewer again so you can take a look at it. It says 9,794 entries. We can scroll through and we can filter things, so we can take a look at this data as it's loaded in memory. We see there's a whole bunch of variables here. We can filter based on, let's say, Arizona, so it says 106 of [inaudible]. If you want to do some interactive browsing, Excel-style stuff, you can do that in this viewer.

Okay, so that's one way you can download it, and that's one way you can read a CSV. With these drop-down menus, like I said, we have this code, and I would copy and paste that into my script. All right, so now you have this data set in there. We will talk about data manipulation, but just for getting data into R, that's one way you can go.

Again, really briefly, we can do File, Import Dataset, From CSV, and I can paste the address directly; it will retrieve it. I'll call this yts2, so I name it here, yts2, and this executes. Again, it's exactly the same data set. It does give you a little output: parsed with column specification. For year it assumes an integer; for data value, doubles, which is what I think of as numerics. It took a guess. If you want to be specific as to what they are, you have to specify it directly in there, but for the most part it's really good at guessing. And if it does come to the case we were talking about before, where something that's not numeric is being made numeric, it will tell you. It won't delete or throw out data from your files without at least telling you.

Okay, so what's actually going on behind the scenes? That read_csv function is really back-ended by this function called read_delim. And, as you saw before, it used this function library, with readr. readr is the package that is used to read in CSV and tab-delimited files in a more efficient and somewhat smarter way than the defaults in R.
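A hedged sketch of what the generated import code boils down to. The tiny file written here is a stand-in with made-up values, not the real survey, and its three column names are assumptions chosen to resemble the ones read out in the lecture. The base R branch is only a fallback so the sketch still runs where readr is not installed.

```r
# Write a tiny stand-in file; the real survey has far more rows and columns.
csv_path <- file.path(tempdir(), "yts_demo.csv")
writeLines(c("YEAR,LocationAbbr,Data_Value",
             "2015,AZ,3.2",
             "2015,MD,4.1"), csv_path)

if (requireNamespace("readr", quietly = TRUE)) {
  # readr guesses the column types and reports its guess;
  # a URL string in place of csv_path is also accepted.
  yts <- readr::read_csv(csv_path)
} else {
  # Base R fallback (different defaults, plain data frame).
  yts <- read.csv(csv_path, stringsAsFactors = FALSE)
}

print(names(yts))
print(nrow(yts))
```

Keeping the path in one variable at the top makes it easy to swap the absolute path produced by the drop-down menu for a relative one.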
There is a read.csv function that's there when you download R. We are not going to go over that, because the defaults are a little bit strange. Almost every single data set that I've seen in public health, I think 90%, has column names, right? The base readers don't always assume that. So, more or less, when we're talking about reading in data, we're going to be using the readr package. This is the first package we're going to be talking about in this class, and it's going to be to read data.

The function that goes on in the background is read_delim, and these are all the arguments. You can see you can customize reading in a lot of different ways, depending on how, let's say, interestingly your data is formatted. You can skip certain rows; you can trim white space, for example; you can specify a different delimiter if it's not comma-separated but tab- or colon-separated or something like that; and there are all these other options that most of the time you don't need, especially if the file came from Excel or some other standard format.

You do have to specify the file path. The file name is the path to your file, either relative or absolute, and it has to be in quotes. Again, if you specify a relative path, it's looking in the working directory. You can also specify a website URL. We're not going to go into that exactly.

Okay, so again, on my computer, where this code was executed, the data set was one level up, in the data directory. We assigned it in this case to dat, or yts, or whatever; the name doesn't matter. So now the data is successfully in your workspace, and you can run all the types of data manipulations.

read_delim also returns this thing called a tibble. It's kind of a play on the word table or something like that, I don't know.
Like we said before, there are different data types: we talked about characters and numerics, and we'll talk about logicals and factors, or categoricals, a little bit later. Now, this object in R is not a vector. It's a data frame, or tibble, which means it doesn't just have one dimension; it has multiple dimensions. There are rows and columns, and each column can be of a different type. Before, with vectors, when we took numerics and characters and put them together, it made everything a character. That's not the case now. You can have one column of age that's numeric, one column of character, one column of whatever, just like an Excel spreadsheet, right? It doesn't all have to be numbers or characters or anything like that.

We will use read_csv throughout to read from CSV files. We use readr for everything that is standard text data, and we will use a different package to read Excel data. The reason I'm making such a point about read_csv is that if you see any questions online that ask about reading comma-separated values, 90% of the answers say use read.csv. They use read.csv for two big reasons. One, it comes with base R; you don't have to download anything else. read.csv is in R when you start it up, immediately; you do not have to download an additional package. Second, readr has only been around for maybe about four years. But the reason we use it is that it's much faster, it's a bit more intuitive, it has better defaults, and it works with RStudio.

All right, so although the View function is really helpful in RStudio, and you saw you can filter and do all these fun things, a lot of times when I explore data I don't use View. I use the function called head. What head will do by default is print the first six elements: if it's a data frame, it will print the first six rows; if it's a vector, it will print the first six elements.
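A small base R sketch of the two points above: columns of a data frame can have different types, and head() returns the first six rows or elements by default. The data are made up for illustration.

```r
# A data frame with one numeric and one character column.
dat <- data.frame(
  age  = c(34, 51, 28, 45, 39, 62, 30, 57),
  name = c("a", "b", "c", "d", "e", "f", "g", "h"),
  stringsAsFactors = FALSE
)

col_types <- sapply(dat, class)  # type of each column
print(col_types)

first_rows <- head(dat)          # first six rows of a data frame
first_vals <- head(dat$age)      # first six elements of a vector
print(first_rows)
print(head(dat, 3))              # ask for a different number of rows
```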
Usually I'm using that to get a gist of what's going on in the data. What are the columns? What are some of the values in there? So you can see here: YEAR is an integer, 2015; then location abbreviation, location description, topic type. So there's a whole bunch of variables; you saw in the viewer that it has a lot of them. But when I'm printing this out to the screen, head says: hey, I don't think you want me to fill the screen with all of those columns, so I'm just going to show you a couple of columns, and I'm going to tell you all the ones that I didn't print. Here you can see the first one, two, three, four, and it doesn't show the other 27. We'll talk about how to select the specific columns you want to view, that kind of stuff.

All right, so if, instead of writing head, I just type the name and print it out, by default these tibble objects print out the first ten rows and then tell you how many more rows and columns there are.

Yeah, so we do talk about read.delim and read.csv, because that is how a lot of data is read in online, and that's how a lot of the code you'll get from collaborators is written. But we will try to start you out using read_csv, because I think that's a much more intuitive way. You should still go back and look at the help files for these functions, because the default behavior of read.csv is very different from read_csv. For example, read.csv will give you just a standard data frame out, not this tibble. I say that because when you print them out, it looks different. It doesn't tell you the type of the column; it doesn't tell you it's an integer, double, or whatever. It's hard to see here; maybe I can make it a little smaller. It prints out every single column, right? It just keeps printing and printing and printing.
It doesn't just limit itself to the first few columns; it prints them all out over here. That's sometimes better if you want to see all the columns, and sometimes less intuitive. But if you do class and see that something is just a data frame, it'll print out like this, versus a tibble, which prints differently.

So, we have the data in R. It doesn't just have the data in there; it has the attributes that we just saw: column names, row names. One way you can rename the columns is with this names function. You can use colnames or names; they do the same thing in this case. So you say: the names of dat, I want the first element, and I'm going to call it year, lowercase. By default it's written in all caps; I want to rename the first column to year, lowercase.

Again, this is the operation: we're going to grab the first element of the column names and assign year to it. I'm going to step through each level of this. All right, so names, or colnames, which is maybe a bit more intuitive, gives us the column names of the data set. We'll go into a lot more detail on subsetting, but this is your easy way to jump in: the subsetting bracket, one, grabs the first element. And as we just saw up here, the first element was YEAR. Note that the first element of things is one, not zero like it might be in some other languages. It's a one-based system: the first element is assigned to the index one.

Now, when we do this, what we're actually doing is reassigning that column name to a new value, and in this case it's just year, all lowercase. Okay, so we look over here, and you see it's no longer capitalized; it's lowercase. This is important because we have a function on the left-hand side and we're assigning something to it, so it's setting it, and it's reassigning.
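The renaming steps walked through above, as a base R sketch on a two-column stand-in data frame (the real data set has many more columns; the values are made up).

```r
# Stand-in for the survey data: the first column name is in capitals.
dat <- data.frame(YEAR = c(2015, 2015), LocationAbbr = c("AZ", "MD"))

names(dat)                # all column names
names(dat)[1]             # first element: indexing starts at 1, not 0

# A function call on the left-hand side: replace the first column name.
names(dat)[1] <- "year"

print(names(dat))
print(identical(names(dat), colnames(dat)))  # same thing for a data frame
```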
This is why I would say R can sometimes be very confusing: on the left-hand side you can reassign many, many things. You can assign a variable, you can reassign elements of your data, and in this case we're reassigning the first column name of this data set to a new value. Again, we'll talk about some more intuitive ways to do that, but this is one of the ways.

Like read_csv, there is write_csv; like read_delim, there is write_delim. For the reading functions in readr there are associated write functions. So you read a CSV in, and, let's say, you take a subset of the rows, drop some columns, merge it with another data set, and you want to send that back out to a collaborator, right? Do all the manipulations, use write_csv, and then there will be a CSV on your computer. You can open it in Excel or whatever, and then you can send that along if you want. There are options to say what you want the missing values to be: do you want them to be empty quotes or NAs? What is the delimiter? It might be spaces, or you might want tabs or semicolons, but for CSV by default it's commas. And then you output. I'm going to breeze over that really quickly.

In this case we're going to do the operation we did before, and then we're going to write it out. If you run this command, and I'm going to use yts2, the first argument is the data frame and then the file name. If I don't put any other path or anything like that, just a file name, by default it'll go in whatever directory you're in. So the only thing we've done in this data set is change a column name and output it. There is a CSV here; I open it up in Excel, and I'm going to show you that the first column, year, is lowercase. Here it's just lowercase year.
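A sketch of the write-then-check round trip described above. The lecture uses write_csv from readr; base R's write.csv is swapped in here only so the sketch runs without add-on packages, and the data and file name are made up. Note that write.csv needs row.names = FALSE to avoid writing an extra column of row numbers.

```r
# Stand-in data with the renamed lowercase column and one missing value.
yts2 <- data.frame(year = c(2015, 2015),
                   LocationAbbr = c("AZ", "MD"),
                   Data_Value = c(3.2, NA))

# Write to a temporary file; na = "" writes missing values as empty fields.
out_path <- file.path(tempdir(), "yts_demo_out.csv")
write.csv(yts2, out_path, row.names = FALSE, na = "")

# Look at the raw text, then read it back to confirm the column name survived.
print(readLines(out_path))
back <- read.csv(out_path, stringsAsFactors = FALSE)
print(names(back))
```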
All right. So again, I mean, it's not ground-breaking, but obviously we're going to talk about filtering, making columns, all that stuff, and then you can output it. That's just a read and a write function. I know we haven't gone into detail about the delimiter option and these other things, but we want to let you get data in. The drop-down menu from RStudio is a really helpful solution, and you can come away from it with the code to get data sets in there. So my suggestion, when you're starting out, is to use the drop-down menu when you're trying to read CSV or other delimited data, or SPSS or Stata, grab the code there, copy and paste it into a script, and then you're ready to go with some data. This is a relatively new development, I would say; it wasn't available about a year and a half or two years ago. So I would say it's really good.

Especially if you're using Excel, you can use the readxl package. I'm going to show you, for example, how to read in Excel, and I would highly suggest playing around with this, playing around with reading in data. It says the first row is names, and then [inaudible]. So, for example, I can copy this code, get out of here, and paste it in here. Now if I run this code, although it's got an absolute path, I should be able to get that data set into R relatively easily and relatively quickly, without necessarily knowing anything about the readxl package and the read_excel function. And you now have a data set you can start cleaning and summarizing as well.

Most of the time, I would say, I use CSVs, because it's an open format.
Excel has a whole bunch of stuff in there; it's used for a lot of things, and it may not always be the best way to store data. I use the read_ and write_ functions for that. The readxl package is very good for reading in Excel files, but it doesn't write Excel files. That being said, if you read in an Excel file you can always output it as a CSV, and that should always be able to be opened in Excel. And if you really, really need to write out an Excel spreadsheet, the xlsx package will do that. I don't know, maybe you have a use case where you really need it to be an Excel file versus a CSV, but I really don't have any cases where I can't just output to CSV.

Okay, so these are some other packages. I haven't used the drop-down with SAS or SPSS or anything like that, but there are some really good packages that have come out relatively recently. haven can read in SAS, SPSS, or Stata formats. readxl we just talked about. readr is what we're going to be using for CSV and other delimited files; it's much, much faster than the base functions. sas7bdat will read in SAS sas7bdat files. The foreign package used to be the go-to for reading in Stata and SPSS formats, but that's kind of fallen by the wayside in favor of haven. And then there are other packages that have come out, like readstata13 or something like that, because the different versions of Stata output have been changing over the years, and I'm not 100% sure that all Stata formats can be read in by this package. That being said, if you're in Stata or SAS and you output something to CSV, you can read that into R just the same.

All right, so, take-home messages. The readr package is good. The drop-down menu for reading data is pretty well set up; it gives you a good, intuitive interface to start using.
rather than using the drop-down box. I'm not saying bad things about the drop-down menu [inaudible]. I would also suggest copying and pasting the code from the output into a script so that you can reproduce whatever analysis you're going to do, whatever cleaning. [inaudible] read a data set in and do some exploring, but you should really save that [inaudible].

Okay, so we have another lab, and I'm going to say we take like five minutes [inaudible]. For the most part, for this course, we assume most of you want to get data into R and start working with rectangular data [inaudible], so either Excel or CSV. If you have something like "I don't use those formats, I use this format exclusively," we would love to hear that, but I am not really running through those very specific cases yet. So we are going to lab. [inaudible] You can start on the lab now; we're going to go over it in about, I would say, a half hour, and then kind of wrap up and talk about what we're doing for the next lecture. If you did the Try R module and you're taking this for credit, you can send me just a snapshot of the badge or a snapshot of the completion on Code School; that is fully sufficient.

[Music] No, but with RStudio that should have been loaded up; if you had a fresh version it should have. Has anyone else tried it? So it said it can't install it? Okay, so it doesn't auto-install this. What comes up when you do this? Ah, readxl. Yeah, so if you don't have that, you can go to Tools, Install Packages. I think readr might come prepackaged; maybe readxl doesn't. [inaudible] you should be able to just write in the package name, for example readr. If you get
that, don't worry about it. You also want readxl. Sorry about that; I thought they had that prepackaged, but I guess not. So Tools, Install Packages is one way you can install what you need: if you go to Tools, Install Packages and put in the readxl package, that will give you what you want. Okay, let me double-check it. Okay, it works.

All right. So I thought the readxl package was bundled with R or with RStudio, but I guess it's not; readr seems to be, I believe. install.packages is the function we use to install things, so there are two ways. I'm going to say Tools, Install Packages, readxl, and it's going to download that. What that will do is grab that package off the internet and install it on your machine, so that's just like downloading a file, more or less. If you want to use any of the functions that exist in that package, you have to use library. [inaudible] So install really means download; library means use this package. That's what we mean by invoking the readxl package: you have to install it first.

There have been some people with some permissions issues on the machines with downloading packages. In that case let me know, because if you install the package and then you go to library and there's still nothing there, I'll try to help you with permissions as much as I can, as long as you don't need an administrator. And if it's already installed, it'll tell you that; if it's already installed it might ask you to restart. I don't know why it did that; that might be a new [inaudible].

But install.packages with the package name in quotes is the way you install packages. When you invoke it, though, when you actually call library, you don't technically have to have quotes around it.
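The install-once, load-every-session pattern described here can be sketched as below. This is a hedged example rather than the code shown on screen: the `requireNamespace` check is an addition so that the download only happens when the package is missing, and it assumes an internet connection if an install is needed.

```r
# install.packages() downloads a package onto your machine; the name goes in quotes.
# The check means the download only happens if readxl is not already installed.
if (!requireNamespace("readxl", quietly = TRUE)) {
  install.packages("readxl")
}

# library() makes the package's functions available in this session.
# Quotes are optional here: library(readxl) and library("readxl") both work.
library(readxl)
```

Installing is a one-time step per machine, while the `library` call normally sits at the top of every script that uses the package.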
There's a reason for that; it's kind of highly technical, so don't worry about it. But most times when you do this with a function, R is going to look for an object with that name, right; it's going to look for a variable called readxl, a data set called readxl. In the specification of library it doesn't do that; it knows you're trying to call a package. For library you don't have to put quotes; for install.packages you do.

So let me explain the reason for this. Let's say I have four hundred packages installed and I'm running R, and I don't want, you know, that package that reads in [inaudible]; I don't need that every time, I don't want those functions loaded every time. So by default R doesn't load any of those packages up [inaudible]. So normally, for a script, at the top we load the packages and then run our commands.

If you work in Stata, for example, all the ado functions that you've ever downloaded are accessible; one of the big reasons for that is it assumes you have a much smaller set of those. The other thing is, in R, let's say I don't like the read_csv function in readr; let's say I make my own read_csv function. That's fine; I can make a package and put it up in a place where you can download it. So now readr has a function called read_csv and I have a function called read_csv, so you only want to load the package that has the function that you want. Does that make sense? You can think of some simple command where there are five or six packages that have the same function name. There are ways to be explicit about that; we're not going to get into those right now. I haven't run into that much, where you loaded two packages that have functions with the same names, but that can happen.
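One way to be explicit about which version of a function you mean, which the lecture mentions but does not cover, is the `package::function` form. The sketch below uses only base R so that it runs anywhere: the user-defined `mean` is a deliberately artificial stand-in for "my own read_csv", and the same form would apply to a call such as `readr::read_csv`.

```r
# A user-defined function that reuses the name of an existing base function.
mean <- function(x) "my own mean"

# The plain name finds the user-defined version first.
mine <- mean(c(1, 2, 3))

# package::function names the package explicitly, so there is no ambiguity.
explicit <- base::mean(c(1, 2, 3))

mine      # "my own mean"
explicit  # 2

# Remove the masking definition so later code sees the usual mean() again.
rm(mean)
```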
So in your script, this is how you load up the functions, and now if you go to use any of these functions, they will be available to you. We do suggest you download the Excel file and then read it in. You can, I believe, read it straight from the internet, but some people are having some odd problems, an error with zip. You should be able to use Import Dataset, From Excel, paste the URL in there, click Update, and it should give you this. But if you're not getting this, if it's giving you some error when you click Update and you can't go that route, just go to the website and download it. Okay, let's do this: go to the website, download it, and you say From Excel and then click Browse, and you can go to wherever you have this stored, for example monuments is here, and it should come up. But if you do get an error when you read it off your hard drive, then [inaudible] something either went weird with the downloading or something like that.

Now, like we said, some people were getting this [inaudible] error. I don't know why that was the case, but if you do it from the drop-down menu there's probably some option on there. Ah, yeah, that's why: because RStudio is smart, it knows that the read_excel function cannot read the data straight off the internet; you need to actually download it first. It just gives you some very nice and convenient code that says: this is the URL, I'm going to download it for you right now, download the file, and then read. So RStudio does have some nice little wrappers, so that instead of going to the website, downloading it, saving it somewhere, and opening it, this is the actual code. No matter what folder you're in, it's going to download a file in there and then read it directly into R, so I can just copy this, paste it, and it'll do exactly what I want. And then you can add other fun things like file.remove.
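The download, read, then remove sequence described above could look roughly like this. It is a sketch under stated assumptions: `read_excel_from_url` is a hypothetical helper name, not something RStudio generates, and the `file://` address pointing at a workbook bundled with readxl is only a stand-in for a real web URL (built here for Unix-style paths) so that the example runs without a network connection.

```r
library(readxl)

# Hypothetical helper: download a workbook, read it into memory,
# then delete the downloaded copy from disk.
read_excel_from_url <- function(url) {
  destfile <- tempfile(fileext = ".xlsx")
  # mode = "wb" keeps the binary file intact, which matters on Windows.
  download.file(url, destfile, mode = "wb", quiet = TRUE)
  # Remove the downloaded file when the function finishes.
  on.exit(file.remove(destfile))
  read_excel(destfile)
}

# Stand-in for a real web address: a workbook that ships with readxl.
example_url <- paste0("file://", readxl_example("datasets.xlsx"))
dat <- read_excel_from_url(example_url)
head(dat)
```

Because the file is deleted afterwards, rerunning the code downloads it again, so results can change if the file at the URL changes.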
What this code will do is go to this URL, saying I want to download that URL, [inaudible] xlsx, read it into memory, then remove it off my disk, all in one fell swoop. So the drop-down menu does allow for some nice things. [inaudible] just download it to your disk; it's good practice anyway. We're showing you that somewhat for convenience's sake here [inaudible]; you can download whatever you want. We are showing you some of these things for convenience's sake; reading straight off the website is not so great, because what happens when tomorrow I change that data set? You want it to be reproducible, so you usually want to download the data.

Now we've got it in memory as mon, so now we want to write it out as a CSV. Again, the function we were saying we're going to use is write_csv. We're going to put in the path, mon.csv; we execute that code and you get a CSV of the monuments data, so that you can read it into anything that reads CSV. So this lab is a very simple one, converting Excel to CSV using R. Again, that's not so exciting or interesting, but when you start doing data manipulations, merging, and that kind of stuff, and then writing out an analytic data set for someone else to use, that's where things get more interesting, obviously.

So the key, give me one moment, I will put it online. [Music] Okay, so now I've put that up. I'm just going to refresh my website; if you refresh a few times your cache should clear, so the key should get there now. It essentially has the same exact things we were talking about. Again, library, install.packages, downloading; this is exactly the same type of code, just downloading a file to a temporary directory, so when I close my RStudio it'll go away.
Then we show you two ways to write: write.csv, which we're not going to talk about, and write_csv using readr. But I'm going to show you the output of both of those, because this is important: they're not exactly the same, based on defaults. I'm just going to library things in again; it's going to download the file, it's going to read the data in, I'm going to write two things out, and I'm going to show you the output from write.csv and write_csv.

Okay, the one on the left is write_csv from readr and the other is write.csv. The big difference between the two is the first column. The first column here is the assumption that you want to write out the row names; in this case the row names are just one to the number of rows. A lot of times you don't want that; many, many times you do not want actual data to be contained in the row names. But by default write.csv says, hey, we assume your row names are important to you, we're going to write them out, and a lot of times that's not what we want. write_csv doesn't assume that; by default it doesn't write them out. So although the readr functions are faster, they do have different defaults, different behavior. We are going to go with the one on the left and assume you don't care about the names of the rows. If you do care about the names of the rows, make them a new column.

All right, so we covered a lot. One thing I like to point out: R is a language. If we were taking a spoken foreign language, I would not assume that I would be fluent in that language by the end of a three-hour period, so I don't think you should assume that at the end of a three-and-a-half-hour period you should be able to write R fluently, but you should be able to get some of the gist. So we talked about input/output. We did show you how to do the drop-down, which in the long term we're not necessarily advocating.
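The row-names difference described above can be reproduced with base R alone. In this sketch the small data frame and file names are made up for illustration; the second call sets `row.names = FALSE` to mimic how readr's `write_csv` behaves, since `write_csv` does not write row names.

```r
# A tiny made-up data set standing in for the monuments data.
df <- data.frame(name = c("a", "b"), value = c(1, 2))

with_rows <- tempfile(fileext = ".csv")
without_rows <- tempfile(fileext = ".csv")

# Base write.csv writes the row names as an extra first column by default.
write.csv(df, with_rows)

# Turning row names off gives a file with only the real columns.
write.csv(df, without_rows, row.names = FALSE)

# Compare the raw text of the two files.
readLines(with_rows)
readLines(without_rows)
```

Reading the first file back in produces an extra column holding the old row names, which is usually not wanted in an analytic data set.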
We still think everything should be code in a script, but starting out it's a really great way to start, because it gives you the code and you can copy and paste that into wherever you want. So what we've covered so far is data types, very simple ones, numeric and character, and how to differentiate them; very simple algebraic operations; how to read a data set in; how to write a data set out; and a tour of RStudio, how to do some things there.

So maybe right now, in about ten minutes, I'll talk about projects, because we did talk about working directories [inaudible], but keeping things organized is really great for project management; it makes things a lot easier. The way you get a project: you say File, New Project. I'm not saving the key. Sometimes you have an existing directory, sometimes you have a new directory; either one. I'm going to start a new one. It gives you three different templates: [inaudible], one if you had a bunch of functions and wanted to design your own package, or one if you want to create one of these Shiny applications; I'll show you a little bit of the gallery after we go through this. Usually, though, you want to start with an empty project. Create a name; I'm going to put this on my desktop. You can do this thing called "Create a git repository" [inaudible]; we're not going to talk about git. Git is a version control system: if you have code and you've made a lot of changes over time, it can track those really well, but that's a whole different thing on its own. So we're just going to say OK and click Create Project.

What it'll do is close whatever is going on in your RStudio, restart it, and then go to that directory, and you'll see in the top right-hand corner that you are in a different project. So I might make a new file, let's say an R script, say read_csv; I'm going to say library(readr), and really quickly I'm going to copy that monuments data set
to this test folder, this test analysis folder. So read it in; I'm going to save this as read_function. Okay. If you click over here you can click on your project and open a different project. Right now I'm just going to close the project, and I'm going to open it back up by double-clicking this project folder. I'm going to close it out and it'll go back to whatever I was doing before in RStudio. Okay, so I'm in no project. I'm going to close out of this. It's hard to see, but there is a .Rproj file. So if you follow the steps here and make a project, when you double-click this it will automatically open RStudio; you're in that project, it will open up the script you had open at that time, and because you're in that directory, you know that the monuments file is in there. So you can bundle up this project, more or less, and give it to someone else, and they should be able to run all the code. That's, for example, how you might use RStudio projects to do some project management.

That being said, we're going to talk about this later, but I always like showing this stuff on the first day because I think it's cool stuff. So: New File, R Markdown. We're going to keep it a document; there are other things. I'm going to say the title is "example analysis", I'm going to keep the default output as HTML, and I'm going to click this button, Knit. It's going to make me save it, so it's example_analysis.Rmd; Rmd stands for R Markdown. And when you knit it, let me pull this over here: you have code over here; this is the R code, this is some text, "This is an R Markdown document". We're doing a summary of a data set, it does a plot, and on the output it gives us the text, let me make this big, the code, and the output. So this is, I guess, the most analogous thing to a log in Stata, but it's a little bit better because now you can include plots. The code for the plot was in there and we didn't want
to print it out, so we could say print it out or not [inaudible]. If we go into that folder and look at example analysis, there's an HTML file. And the beauty of this is you don't just have to knit to HTML; you can knit to a PDF document or a Word document. So if you knit to a Word document, what's going to happen is it's going to run that code, get the output, and then open up Word [inaudible]. So now in a Word document, again, let me make this bigger, it's got the embedded plot. It's not formatted exactly the way you need it to be all the time; there are ways to make it super formatted, but for the most part, for an example analysis you want to send to somebody, they can throw track changes on there and you can move forward with that. Then PDFs also, especially if you like writing, if you have to write an equation; it'll convert it to a Microsoft Equation Editor thing. You have to write it in a specific way. But if you knit it to PDF it will... apparently fail. Ah. So it'll usually work; [inaudible] I need a [inaudible] package which will allow me to convert things to PDF, but Word and HTML should more or less work out of the box. So that's the fun thing.

Oh, Shiny. We're not going to cover this really at all, but when you start getting into writing R, this thing called Shiny allows you to do some fun interactive visualization. So let's do... where's the simple one? Yeah, so here it's pretty simple: it's got two drop-down menus, X variable and Y variable, and a cluster count. Here's the code that goes into that: there's a server file and a user interface file that pretty much specify, hey, I want a drop-down menu for the X variable, a drop-down for my Y variable, I want to be able to set the number of clusters, and where to put the plot in there. And this is all R code, and the server is where all the R code is
executed. [inaudible] It is a totally different syntax, I will give you that, but the nice thing is you start clicking here: I want it to be petal width, so sepal length is my X variable, the Y variable is petal width [inaudible] petal length, and I'm going to say there are three clusters; actually, I think there are four clusters. You know, what if I ran a clustering method with that many clusters; how would that break down? And so all of this is written in R, it's interactive, and you can specify really effective yet simple inputs and outputs. [inaudible] this is all written in R code, right.

So those are the two things that I think are really important. Shiny is less commonly used, I think, by new users, but R Markdown is really, really effective for me, for having your code and your output in one area. So the nice thing here that I didn't demonstrate: we did a summary of cars [inaudible]; cars is a toy data set in R. We'll talk about how to subset data, but if you want to say something like "the mean speed of the cars is ... miles per hour", again, this shouldn't make too much sense to you yet, but I want to demonstrate it. [inaudible] let me go to Word. So the thing I want to demonstrate is that you can then say things like this, where you say the mean speed of the cars is 15.4 miles per hour. Right in there, 15.4: that was executed in the R code; it grabbed this column, got the mean, and embedded it in there, so that you can actually write a full report. It's really hard to do this, I will say, but you can write a full manuscript where you update the data and rerun everything and all the numbers will change.
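The number embedded in that sentence comes from an ordinary R expression on the built-in `cars` data set. The sketch below shows the calculation as plain R; the variable names are illustrative, and in an R Markdown document the same expression would sit in an inline code chunk within the text instead of being pasted together by hand.

```r
# cars is a small example data set that ships with R (columns: speed, dist).
mean_speed <- mean(cars$speed)

# Build the sentence from the computed value instead of typing the number.
sentence <- paste("The mean speed of the cars is", mean_speed, "miles per hour.")
sentence
# "The mean speed of the cars is 15.4 miles per hour."
```

Because the value is computed each time the document is knit, the sentence stays correct if the underlying data change.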
If you do data management, reproducible reports, or consistent reports, this is really a big deal, I will say. We will touch on that around the last day, once we get you up on R, but I usually want to show some of this stuff to excite you, because I think it's really cool. And at the end of the day it's really different, there's all this stuff going on here, but at its core it's just: you say I'm writing text, maybe writing it in a way that says I want this to be big text, I want it to be a header, and then you're also doing some of the things with inputs and outputs [inaudible], and you can have it all bundled in one and get it out as Word or PDF or HTML. But for right now we're still just reading data in and writing it out.

[inaudible] So I'm going to email tonight whoever is taking the course for credit, so that you know; just email me the badges. I believe I will email tonight when the homeworks are due, but more or less: homework 0 was the Try R; you can give me that at any point during the week. Homework 1 will be due, I think, in two days, so probably Wednesday, and really I only say it's due because we're going to go over the key; you just want to submit the homework before the keys are released. Yeah, question? Yes, that would be good. So did everyone else get the emails? Okay. [inaudible] I just want to make sure you're in there so that you know, for course evals [inaudible]. So if you three can talk to me real quick. All right, last-minute anything? All the stuff will be up there [inaudible]. If you have any questions, don't hesitate to email me. Okay, thank you. [Music]
Info
Channel: John Muschelli
Views: 4,732
Id: Xi-wsACc7p0
Length: 210min 19sec (12619 seconds)
Published: Thu Jun 22 2017