Conditional statements and loops in R

Captions
Welcome to this video on conditional statements and loops in R. A conditional statement lets us say: if our data meets this criterion, apply this action or function; if it meets that criterion, apply another function; otherwise, apply a third function or do nothing. Loops let us apply a function, or a set of functions, across many collections of data consistently and quickly. For example, if we have several thousand data frames, we can loop over every one of them to collect the same information from each, without having to do it one by one.

We'll start with the ifelse() function, which is similar to the IF function you might be familiar with in Excel. ifelse() takes three arguments. The first is a condition, a logical statement; it could be a comparison such as 4 > 5 ("is 4 greater than 5?"). The second argument is what we want R to return if the condition evaluates to TRUE; here we ask for the character vector "yes". The third argument is what we want R to return if the condition evaluates to FALSE; here, another character vector, "no". If you hit Ctrl + Enter to run this code, it prints "no": R evaluates the first argument, 4 > 5, finds that it is FALSE, and so returns the third argument. To show the other way around, if we change the condition to 5 >= 4 (five is greater than or equal to four), it evaluates to TRUE and R returns "yes" instead.

Here we've only returned the character vectors "yes" and "no", but you can do much more complicated things, like putting a function call in the second or third argument; if the condition evaluates to TRUE, R evaluates that argument and performs the function however you've defined it. That's great if you have a condition to test and just want a binary decision, so either do
this thing or do that thing. But sometimes we want a bit more control over our conditional statements, and the if statement is a way of exerting finer control over what happens.

Let's define two variables, a and b: a is 4 and b is 5. To start an if statement, we type if, and then in parentheses we give the logical statement for R to evaluate. We'll say a < b, which of course is TRUE. After the parentheses we open a set of curly brackets, and it's inside these curly brackets that we tell R what to do if the logical statement evaluates to TRUE. By convention, to make if statements easier to read, we put this next section on a new line; you can see that RStudio has already put the closing curly bracket on the line below and indented the code, so we can easily see that whatever we write there is part of the if statement. So let's say that if a is smaller than b, return a / b. If we run this if statement, we get the output 0.8: because a is smaller than b, the condition evaluates to TRUE, and so R executes the code inside the curly brackets. If we set a to 5 and run it again, nothing happens: the condition evaluates to FALSE, so the code within the curly brackets is not run and nothing is printed (R invisibly returns NULL).

As above, sometimes we want R to evaluate a logical statement and perform one action if it is TRUE and another if it is FALSE. You can do this with the ifelse() function we started with, but you get finer control if you combine the if statement as we've used it here with a separate else statement. So we'll start with a similar-looking if statement as before: if (a < b), close the parentheses, and then open the curly bracket, because this is where the code is going to go
if the logical statement evaluates to TRUE. If it's TRUE, we'll ask R to run the paste() function. paste() is useful because it lets us literally paste together different pieces of output. For example, we want to output to the console the character string "b divided by a equals", and if we give paste() several arguments separated by commas, it pastes all of them together. So alongside this character string we pass the value of b / a. Let's set a back to 4, and I'll show you what this looks like. If we run this snippet, R looks at the logical statement a < b, it evaluates to TRUE, and so R runs the code in the curly brackets: it pastes together the character string "b divided by a equals" and the actual value of b / a, which is 1.25, to make a single message at the bottom of the screen. It's common to use paste() like this to produce human-readable output messages.

But if the condition evaluates to FALSE, we want R to output something different, so we use else. The way this works is that we type the word else on the same line as the closing curly bracket of the if statement. This is really important: otherwise, when the code is run line by line at the top level like this, R treats the if statement as complete and reports an unexpected 'else' error. After else we open a new curly bracket and, again by convention, start the body of the else statement on a new line. If the condition is FALSE, we want R instead to paste "a divided by b equals", separated by a comma from the value of a / b. So what happens is that if a is smaller than b, the condition evaluates to TRUE, R executes the code in the curly brackets of the if statement, and then it stops; it doesn't
execute the else statement. If the condition evaluates to FALSE, R skips the curly brackets after the if statement and goes straight to the else statement. So if we run this code, we get the same output as a second ago. But if we set a to 5 again and rerun the code, instead of "b divided by a equals 1.25" we get "a divided by b equals 1": R has ignored the if section of the code and run the else section instead.

Again, this is a case where we have a binary outcome: if this is TRUE, do this; if it is FALSE, do that. But sometimes we want three or more different outcomes depending on our logical statements, and we can achieve this with an else if statement (not to be confused with the ifelse() function). To save some typing, I'll copy this if statement, and after the first if block, before the else, we'll add an else if statement. else and if are two separate words, and we treat this like a normal if statement: we open round brackets and give a logical statement inside. Let's say a == b; remember that the double equals sign means "is equal to". We close the round brackets and open curly brackets, making sure the closing curly bracket of the if block sits on the same line, just in front of else if. Then on a new line we say that if a and b are equal, we want R to print "a and b are equal", and on the next line we put the closing bracket for the else if statement. So far, R will evaluate the first expression; if it evaluates to TRUE, it will run that code and only that code. If it is FALSE, it will skip
this code and instead look at the logical expression we've given the else if statement. So this is saying: if this is TRUE, do this; else, if this other thing is TRUE, do that. If the else if condition evaluates to TRUE, R performs its code and only its code. If it evaluates to FALSE as well, we finally fall back on the else statement, whose code runs only when none of the preceding conditions was TRUE. For that to work, else needs to be on the same line as the closing bracket of the else if statement. So now we've got our initial if statement, our else if statement and our else statement: three possible outcomes. If we set a back to 4, the if condition evaluates to TRUE and we get "b divided by a equals 1.25". If a and b are both 5, the if code is not run, because a is no longer smaller than b; the else if condition evaluates to TRUE, so we get "a and b are equal", and the final else code is not run. If a is 6, then, as you can probably see, the else statement kicks in and we get "a divided by b equals 1.2".

Here we're only performing very minor operations, pasting some text and doing a quick division, but I hope you can see how you could use this to apply certain operations or transformations to your data, or to select data, depending on whether it meets certain criteria. Bear in mind that if() tests a single TRUE or FALSE value; to test every element of a vector, use ifelse() or combine if with a loop. You can also make these statements more complicated by adding more else if statements, if you want to perform four, five, six, eight or even more different operations depending on what the data evaluates to.

Sometimes we want to perform functions or actions while a particular condition remains TRUE, and we can do that with the while statement. The while statement is what it sounds like: you start with the word while, open parentheses, and then give it a logical condition to evaluate.
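Before the while example, the conditional forms covered so far can be gathered into one runnable sketch. It reuses the video's values (a = 4, b = 5) and output messages; the helper name compare_ab() and the vectorised ifelse() line are hypothetical additions for illustration, not code written in the video.

```r
# ifelse(test, yes, no): returns the second argument where the test is TRUE,
# the third where it is FALSE
ifelse(4 > 5, "yes", "no")               # "no"
ifelse(5 >= 4, "yes", "no")              # "yes"
ifelse(c(1, 6, 3) > 2, "big", "small")   # vectorised: "small" "big" "big"

# A plain if: the body runs only when the single condition is TRUE
a <- 4
b <- 5
if (a < b) {
  a / b                                  # 0.8
}

# if / else if / else inside a hypothetical helper so each branch is easy to rerun
compare_ab <- function(a, b) {
  if (a < b) {
    paste("b divided by a equals", b / a)
  } else if (a == b) {
    "a and b are equal"
  } else {
    paste("a divided by b equals", a / b)
  }
}

compare_ab(4, 5)   # "b divided by a equals 1.25"
compare_ab(5, 5)   # "a and b are equal"
compare_ab(6, 5)   # "a divided by b equals 1.2"
```

Wrapping the chain in a function is only a convenience. Inside braces R would also accept else on a new line, but keeping it on the same line as the closing brace is what lets the same code run line by line at the top level.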
So, for example: while (1 == 1). Just as with the if statement, we then open a curly bracket and, by convention, put the next code on a new line. Let's say we want to print "This is true". Notice that this time we're explicitly wrapping the character string in the print() function. In R there's a difference between printing something, such as text, and returning a value: a returned value is what gets stored if you assign the result to an object, whereas printing simply displays something on the screen. Inside a loop, R does not automatically print the values of expressions, so if you ran this code without print(), you wouldn't see anything on the screen; I'm using print() so that we see the text as a message.

If you hit Ctrl + Enter on this, R will keep printing "This is true", "This is true", "This is true" forever, because of course 1 is always going to be equal to 1. If you get into an infinite loop like this, which is usually not a good thing, you can click the stop button or press Escape and it should stop. I've used it only to illustrate what while does, so try not to get yourself into infinite loops like this. What happened is that R evaluated the condition; because it evaluated to TRUE, R ran the code between the curly brackets and then went back to the condition to check whether it was still TRUE. Of course, after printing "This is true", the maths stayed the same and 1 was still equal to 1. The while statement becomes useful when the code inside it changes, or may change, whether the condition evaluates to TRUE. Let's take an example to make that clearer.
So let's initialize two objects: x, which is equal to 1, and y, which is equal to 10. We make a while statement that says while (x < y), open curly brackets, print "x is smaller than y", and then modify x: we overwrite x with a new x equal to the current x plus 1, and then close the curly bracket. What happens is that R evaluates the condition, which in the first instance is "is 1 smaller than 10?". That evaluates to TRUE, so R executes the first line and prints "x is smaller than y" to the console, then runs the second line, which creates a new x equal to x + 1. Having done that, it goes back to the condition and checks whether it's still TRUE, which it is: 2 is smaller than 10. So it prints the message and increments x by 1 again, and so on: 3 is smaller than 10, 4 is smaller than 10, until x equals 10, at which point the condition evaluates to FALSE and the loop stops. If we run this, we get "x is smaller than y" printed nine times, once for each value of x from 1 to 9, and you can see over in the Global Environment pane that x is now equal to 10. So the while statement is useful when you want to repeatedly apply an operation or function that changes the variables in the while condition, so that the operation is performed a certain number of times, until the condition evaluates to FALSE.

Sometimes we want to apply a function to every element of a vector, a list or a data frame, and we can do that using for loops. Let's create an empty vector called result. You should remember the c() function, the combine or concatenate function that we use to create vectors; if you call it with no arguments and assign the result to an object, it will create an empty
vector with nothing in it (strictly, c() with no arguments returns NULL, but R grows it into a vector as soon as we assign elements to it). What we want to do now is fill this empty vector with the squares of the numbers 1 to 5, and we can do that with a for loop. The basic structure of a for loop is that we type for, open round brackets, and then supply curly brackets. Inside the round brackets, the first thing we state is the name of the object, or unit, that we're going to apply our code to. We can name it whatever we want. A common convention is i, which stands for index, but you can use words that make it easier for you to come back later and understand what you were iterating over, such as subject or mouse; in our case, we'll say number. Then you use the word in, and then you supply the vector, list or other structure that holds all the elements you want to iterate over; for us, that's the vector 1:5. So far, we're telling R that for every number in the vector 1:5, it should apply whatever is inside the curly brackets. We called it number because that's an easy way to remember what we're iterating over, but as I said, a common convention is to use the letter i, because it stands for index and is shorter than typing number each time, so we'll stick with that convention. Inside the curly brackets, again on a new line by convention, we supply the code we want to run: we say that the ith element of result is assigned i to the power of 2. So, for every number in 1 to 5, starting with 1, the code that's executed is result[1] <- 1^2, so the first element of the result vector is 1 squared. Then we go to the next element, 2, so result indexed to the
second position in the result vector, is equal to 2 to the power of 2, and then we do the same for 3, 4 and 5. If we run this and look at our result object, we have 1, 4, 9, 16 and 25: 1, 2, 3, 4 and 5 squared. We used i, but we could equally have said: for every number in 1:5, result indexed by number is equal to number to the power of 2, and we'd get the same result. The name you use doesn't matter: you can use something more verbose to remind you what you're iterating over, or you can just use i.

We can also combine for loops with if statements. For example, we could iterate over every number in 1:5 and, for each one, ask whether it passes some test and, if so, apply a function. We start with the for statement as before: for (i in 1:5), close the bracket, open a curly bracket, and then we say if (i %% 2 == 0). This is probably a new operator: the two percent signs, %%, are the modulus operator, which divides the first number by the second and returns the remainder. For example, 4 %% 2 is 0, because there is no remainder when you divide 4 by 2. So what we're asking here is whether i is an even number: if it is, i %% 2 will be 0, and if it's not, it won't. If this is TRUE, we want to output that i is even; otherwise (else), that i is odd. All we've done is wrap this if and else statement inside a for statement, so every element of the vector 1:5 is passed through the if/else structure to check whether it is even or odd. If we run this... oops, I forgot to wrap these in the print() function: inside a loop, values are not printed to the console automatically. If we rerun it, then for number one it says it is
odd, for number two it says it is even, and so on: each element of the vector 1 to 5 is passed through the if statement and we get an output for it.

Now, if you go onto Stack Exchange or any of the R forums looking for help with your for loops, you'll often find a lot of people telling you, sometimes unhelpfully, not to bother with for loops and to use the apply family of functions instead. There are a few reasons why people tend to prefer the apply family over for loops, and we'll look at some examples in a second. One is that sometimes, though not always, the apply functions can be quite a bit faster than for loops when you're iterating over tens of thousands of elements. Another is that they tend to be much more concise and easier to code: you can do on a single line what might take six or seven lines with a for loop. Finally, the apply functions are considered safer: a for loop can change a variable that already exists in your global environment, whereas the function you pass to an apply function runs in its own environment, so ordinary assignments inside it don't. It may not be obvious right now why that matters, but it helps you avoid accidentally modifying variables in your environment that you didn't want to modify.

They're called the apply family because there are lots of them: apply, lapply, sapply, vapply, tapply and mapply. We're going to look at three of them, lapply, sapply and apply, and at cases where they can be used instead of slightly more complicated for loops. Let's start with lapply. What does lapply stand for? It applies a function to every element of a list and returns a list; the l stands for list. So it takes a list as input, applies the function to every element of that list, and then returns the output as a
list as well. First we need some data, so let's initialize an empty list, like we did with the vector before. Then, for (i in 1:1000), we define an object called norm, which is a list containing ten random values from a normal distribution with a mean of 0 and a standard deviation of 1, generated with rnorm(). On its own, rnorm() gives ten random samples from that distribution; because we've wrapped it inside list(), norm is a list of length 1 with those ten values inside it. Then we update my list by combining my list, which initially contains nothing, with norm, and close the curly bracket. So, for every element of the vector 1 to 1000, that is, a thousand times, R generates a list of length 1 containing ten random samples from a normal distribution and combines it with the existing contents of my list. At the start, my list is empty; after one iteration it is a list of length 1 holding ten random values; on the next iteration those values are already there and the new ones are added, and so on. If I run this code, call head() on my list and look at the first three elements, we see that the first element of the list is a vector of ten random samples from the normal distribution, and so are the second and third elements, and so on for a thousand elements. That was just to produce a list of vectors of random normal values.

Now we want to extract the mean of every list element, that is, of every sample of ten values, and we can do that with a for loop. We say for (i in ...), and we could say 1:1000, because we know there are a
thousand elements in the list. But often we don't know exactly how many elements we have, so instead we can iterate over the vector 1 to whatever the length of my list is. If we run just that little snippet, we get the vector 1 to 1000, so this is the same as typing i in 1:1000, but for the sake of being good programmers we'll use 1 to the length of my list. Inside the loop, the ith element of list means, the list we created to hold the output, is assigned the mean of the ith element of my list. Remember that to extract a single element of a list, we put the index in double square brackets. If we run this and call head() on list means, we get the first six means from our list of random samples.

It took us a bit of thinking to get there, and this is actually far more complicated than it needs to be: we can do the same thing with far fewer characters, on a single line, using lapply. Let's create a new object called list means 2 to hold the data, and assign to it the output of lapply(). The way this function works is that you first supply the object you're going to iterate over, my list, and then the name of the function you want to apply to each element of that list, mean. Hit Ctrl + Enter, and if we call head() on list means 2, you'll see that we get identical results to the for loop: for each element of the list, we get the mean of the ten values randomly sampled from a normal distribution, and these values match the ones we produced with the for loop. So hopefully you can see that this takes out a lot of the hassle and thought needed to produce the output: the code is much neater, it's easier to see exactly what we're doing, and we're not in danger of altering any of our variables in
the global environment. For example, we could have done something inside the for loop that, without us realizing, altered the original my list variable, whereas lapply doesn't do this: it returns the output, which we then saved to a new object.

The output we got from lapply, as with the for loop, is in the form of a list: you can see the first, second and third elements of a list of individual values. Lists are good for holding data, but we often prefer to work with vectors, because vectors are a bit easier to work with as data. You can also view far more elements of a vector at once than of a list, because a list is printed down the screen, whereas a vector is printed across it. So often we'd rather get the output of a looped function as a vector, and for that we can use sapply. The s in sapply stands for simplify: sapply does the same as lapply but tries to return the output as a simplified vector instead of a list. If we copy and paste the lapply line, call the new object list means 3, change the l to an s and hit Enter, then calling list means 3 shows the mean of each element of our original list printed as a vector, which is much more amenable to manipulating as data and to actually inspecting all of the values. One word of warning with sapply: although it tries to simplify the results into a vector, depending on the data structure you give it, it may not always be able to. When it can't, rather than giving you an error, it does the best it can, for example returning a matrix or falling back to a list, which can sometimes give you something unexpected. So if you're using sapply, which is fine, always check that the output you get is actually what you want.

The very last thing in this video is the apply function, because often we want to apply a function over every
column or every row of a data frame. When we're working with data, it's quite common to want to apply a transformation to, or get summary statistics from, the rows or columns of our data, and you can do that with the apply function. apply is really designed for performing operations over the rows and columns of arrays or matrices, but we can use it on data frames as well, in which case it first converts the data frame to a matrix. Let's use the data() function to load our favourite iris data frame, and say we want the mean of every numeric column. Reminding ourselves of the iris data frame, we've got four columns of numeric data and then a fifth column, Species, which is a factor. So let's apply over iris, all of the rows and just columns 1 to 4, because we don't want the Species column: if it were included, the conversion to a matrix would turn every value into text and the means would come back as NA. As for sapply and lapply, the first argument is the object you want to iterate over. The second argument is where we tell apply whether to iterate over the rows or the columns: to iterate over the rows we supply the number 1, and to iterate over the columns we supply 2. We'll start with columns. Finally, you supply the function you want to apply, so we'll use mean. If I hit Ctrl + Enter, you can see that for every column R has returned the mean of the entire column. If instead we wanted the mean of every row, we just change the 2 to a 1 to tell apply to look at the rows, and we get a vector of row means.

So that was a very brief introduction to conditional statements and loops in R. We've kept things very simple, but I hope you can see that with a basic understanding of each of these building blocks, you can build, quite succinctly, sometimes quite complex decision structures for applying certain functions to certain kinds of data in a way that is quite quick and
intuitive.
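The counter-controlled while loop described in the video (x starting at 1, y at 10) can be sketched as follows. The n_printed counter is a hypothetical addition, included only to confirm how many times the loop body ran.

```r
x <- 1
y <- 10
n_printed <- 0   # hypothetical counter, only used to check how often the body ran

while (x < y) {
  print("x is smaller than y")  # explicit print(): values are not auto-printed inside loops
  n_printed <- n_printed + 1
  x <- x + 1                    # changing x is what eventually makes x < y FALSE
}

x           # 10
n_printed   # 9, once for each x from 1 to 9
```

Removing the x <- x + 1 line would turn this into the infinite loop from the 1 == 1 example, since nothing inside the body would ever make the condition FALSE.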
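The two for loops from the video, squaring 1 to 5 and classifying each number as even or odd with the modulus operator %%, can be sketched like this. The parity vector is a hypothetical addition used only to keep the messages for inspection, and the final ifelse() line is an alternative not shown in the video, included to contrast the loop with R's vectorised ifelse().

```r
# Fill an initially empty vector with the squares of 1 to 5
result <- c()          # NULL; it becomes a numeric vector on the first assignment
for (i in 1:5) {
  result[i] <- i^2
}
result                 # 1 4 9 16 25

# Combine for with if / else, using %% (remainder) to test for even numbers
parity <- character(0) # hypothetical: keeps each message so it can be inspected
for (i in 1:5) {
  if (i %% 2 == 0) {
    msg <- paste(i, "is even")
  } else {
    msg <- paste(i, "is odd")
  }
  print(msg)           # print() again, because loop bodies don't auto-print
  parity[i] <- msg
}

# Not from the video: the same test without a loop, via the vectorised ifelse()
ifelse(1:5 %% 2 == 0, "even", "odd")   # "odd" "even" "odd" "even" "odd"
```

Growing result one element at a time is fine for five values; for long loops, preallocating with numeric(n) avoids repeated copying.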
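A sketch of the list-of-means example, comparing the for loop with lapply() and sapply(). The object names (my_list, norm, list_means, list_means2, list_means3) are assumed renderings of the names spoken in the video, and set.seed() is an added assumption so that the random draws are reproducible. The last two lines illustrate the sapply() warning: it only returns a plain vector when every result has length 1.

```r
set.seed(1)  # assumption: fixed seed for reproducibility; the video doesn't set one

# Build a list of 1000 samples, each holding 10 draws from a normal distribution
my_list <- list()
for (i in 1:1000) {
  norm <- list(rnorm(10, mean = 0, sd = 1))  # a list of length 1 containing 10 values
  my_list <- c(my_list, norm)                # append it to the growing list
}

# Mean of each sample with a for loop; [[ ]] extracts a single list element
list_means <- list()
for (i in 1:length(my_list)) {   # seq_along(my_list) is safer if the list could be empty
  list_means[[i]] <- mean(my_list[[i]])
}

# The same with lapply(), which returns a list...
list_means2 <- lapply(my_list, mean)
# ...and with sapply(), which simplifies to a numeric vector here
list_means3 <- sapply(my_list, mean)
head(list_means3)

# sapply() only gives a plain vector when every result has length 1:
dim(sapply(my_list[1:3], range))    # 2 3: range() returns 2 values, so a matrix
class(sapply(list(1:2, 1:3), rev))  # "list": results of different lengths
```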
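The apply() examples on iris can be sketched as below. The variable names col_means and row_means are hypothetical, and the final line, which shows why the Species column is excluded, is an addition for illustration.

```r
data(iris)   # load the built-in iris data frame

# MARGIN = 2 applies mean() to each column; columns 1 to 4 are the numeric ones
col_means <- apply(iris[, 1:4], 2, mean)
col_means

# MARGIN = 1 applies mean() to each row instead (150 values)
row_means <- apply(iris[, 1:4], 1, mean)
head(row_means)

# Why Species is dropped: apply() first converts the data frame to a matrix, and
# with a factor column that matrix is character, so every mean comes back NA
suppressWarnings(apply(iris, 2, mean))
```

For these two particular summaries, base R's colMeans() and rowMeans() give the same results more directly; apply() is the general tool when the function is something other than a mean.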
Info
Channel: Hefin Rhys
Views: 25,495
Rating: 4.9085712 out of 5
Keywords: rstudio, statistics, programming, loops, for, if, while, lapply, sapply
Id: 2evtsnPaoDg
Length: 39min 1sec (2341 seconds)
Published: Mon Jul 10 2017