Hadley Wickham - Cupcakes (for-loops vs map/lapply)

Video Statistics and Information

Captions
If you've been programming for any amount of time, you've probably heard someone tell you that for loops are bad and that you're a bad person for using for loops. This is absolutely not true. But there are some advantages to taking the next step beyond for loops and learning these techniques of functional programming, which basically means that instead of writing the for loop every single time, you use a function that does the for loop for you. So in some sense this is all about being lazy: instead of doing the work yourself, rely on the work of someone else. I'm going to try to illustrate these ideas using some cupcake recipes as motivation.

So here is a cupcake recipe. This is a real cupcake recipe from the Hummingbird Bakery cookbook. How many of you know about the Hummingbird Bakery? It's a fairly famous bakery here. How many of you have made a cupcake in your lives? OK. So if you've ever made a cupcake before and you read this recipe, you notice it's quite explicit, right? "Put the flour, sugar, baking powder, salt and butter in a free-standing electric mixer with a paddle attachment and beat on slow speed until you get a sandy consistency and everything is combined." Or my favorite instruction here: "Continue mixing for a couple more minutes until the batter is smooth, but do not over mix." Right, worst advice ever. "Don't over mix": you can kind of take it as given, right? Don't under mix it either. Don't over bake your cookies.

So this is great the first time you make cupcakes. It's very, very explicit; it spells everything out in detail. But now imagine you've mastered vanilla cupcakes and you want to go on to make chocolate cupcakes. So what is the difference between a vanilla cupcake and a chocolate cupcake? I'm going to switch back and forth. You might notice... well, I'd say you put cocoa in a chocolate cupcake. And so you can imagine this is like a hundred-page cookbook: it tells you how to make cupcakes, and every page repeats this information and changes a few small
things. So this is important, because now that you've mastered vanilla cupcakes and chocolate cupcakes, maybe you want to go on to invent your own cupcake recipe. It's really useful to be able to easily see which parameters are the same and which parameters are different, but that's hard to follow because there's so much text here. So what I'm going to do is refactor this recipe. I'm going to rewrite it as if I were rewriting a function to be easier to understand.

One unfortunate story about this cookbook: my friend in New Zealand recommended it. She said this is a really great cookbook, and then I bought it and baked some cupcakes from it, and they turned out terribly. I complained to her and said this is a terrible cookbook. But it turns out that, unfortunately, I had bought the American version, which uses these ridiculous measures like "a scant 3/4 cup of sugar" instead of reasonable weights and metric measures. So the first thing I do to this recipe is convert it to reasonable units.

The next thing I'm going to do is rely on some domain knowledge. I'm going to assume that you've made some cupcakes before, or done some baking before, and then I can reduce the recipe to this: mix these things together until sandy, and then beat in the wet ingredients.

Now, an amusing historical anecdote: when I was a kid growing up, I did quite a lot of baking with my mom, and her recipes always really annoyed me because they were just a list of ingredients, and they would say "bake, hot oven, 10 minutes". There were never any instructions. It was just assumed that you would know, based on the ingredients, what to do with them. At the time that really frustrated me, so I rewrote all these recipes with very explicit instructions about what to do. And now I'm rewriting recipes in the opposite direction.

The other thing we can do here that's going to make it easier to generalize: I'm going to use
some variables. Let's just say: beat the dry ingredients together and then mix in the wet ingredients. And now that I've done that, I can put multiple recipes on a single page. This is useful because we can now see precisely what the difference is between a vanilla cupcake and a chocolate cupcake. It's not that you add cocoa, right? It's that you substitute cocoa for some of the flour. So this is useful, because now if you made this cupcake and it wasn't chocolatey enough, it's kind of obvious what to do: you put in more cocoa and less flour. Or if it was too chocolatey, you'd put in more flour and less cocoa.

And now that we've done that, the next step is to start thinking about these recipes as data. So I can now put them in a data frame, and I can fit even more recipes on the page. I've now added a recipe for lemon cupcakes. So what's the difference between a lemon cupcake and a vanilla cupcake? Well, instead of adding vanilla, you add some lemon zest, right? This is important because if you want to go on and generalize to create new types of cupcake recipes, you can now see what's normally the same and what's different.

Here's another recipe: red velvet cupcakes. You can see immediately that these are a little different, right? They have more flour, they have no baking powder, they have more sugar, and they have more butter. So right away you know this is going to make a bigger mixture, but it's probably not going to rise as much. Now, there's one sort of interesting thing: there's still one egg, right? Because eggs come in integer quantities. Although, interestingly, if you really get into baking, an integer number of eggs is not really accurate enough, and it's even better to weigh out something like 40 grams of egg, which I assure you is quite frustrating.

So what's this got to do with programming? Well, here are a couple of for loops, and I want you to just look at them for a couple of seconds and see if you can figure out what they do. What's the
same between these for loops and what's different?

So there are three parts to every for loop. Your for loop should always start by allocating space for the output. So here I'm going to make a vector of doubles that has one element for each column in the mtcars data set. That's the first part. You always want to do this, or your for loops are going to be terribly slow, because you'd have to grow the output at every iteration.

Then we're going to iterate: we say for i in seq_along. How many of you have heard of seq_along before? So, seq_along, let me just quickly demonstrate that. Say we've got 1, 2, 3 (let's not do that) and let's call it x. It just gives me a sequence of integers the same length as the vector, so basically like this. So what happens if I just have two elements in this vector? I get the numbers 1 to 2. What happens if I have one element in this vector? And what should happen if I have a vector with no elements in it? I should get no integers, right? And that's what I get with seq_along: I get an integer vector of length 0. But if I use 1:length(x), it's going to "helpfully" count back from 1 to 0. Now, normally you don't deliberately use zero-length vectors in your code, but you want to protect against accidentally using one, because that's not going to give you an informative error message. You're just going to get some weird error message somewhere later down the line, and you're going to have no idea what caused it. So seq_along is just a little protection.

OK, so we've got the output, we've got what we're looping along, and then we have the operation: extract the i-th column of mtcars, compute the mean, and then save it in the i-th element of this vector. So what's the difference between this for loop and that for loop? Well, we're just computing the mean in this one and the median in that one. So the problem with for loops is that they tend to emphasize the objects that you're working with, whereas what's
actually important is the action, right? This is the critical difference between these two for loops, but it's hard to see because it's, you know, something like 5% of the total characters I typed that's actually different. It's hard to see what's different when there are a lot of things that stay the same.

So what I think you should do instead, or what you should learn to do over time, is use a function that wraps up that for loop. I'm going to show you the functions from the purrr package. These have many analogs in base R, but the functions in purrr are kind of more consistent and have a few shortcuts that tend to help you out. So this code does exactly the same thing: we're going to take each element of mtcars, we're going to apply the mean to it, and we expect that the output is going to go into a double vector. And indeed, if we look at the code for that map_dbl function, this is exactly what we see. We see exactly the same code we used before, but instead of using a specific function like mean or median, that's now an argument to the function. So in R, functions can take other functions as arguments, and this allows us to write not a specific for loop but a generic for loop.

Now, this is actually somewhat of a convenient lie. The source code of map_dbl doesn't actually look like this, because for various not really important reasons I decided to write it in C rather than R. But basically this is what it's doing behind the scenes. It's still doing the same thing: it creates space for the output, iterates over each thing in the input, calls the function on it, saves the result in the output, and then returns it.

There are lots of other functions, right? If we've got a function that returns doubles, you might imagine we also have a function that returns integers, and we have functions for characters and for logicals. And then the most generic form of all is map, which returns a list. This is the most general because anything in R can go inside a list. Now, those functions have all varied
in their outputs. These functions can also vary in their inputs. So for example map2, instead of taking a single vector x, takes two vectors, x and y, and it loops through those in parallel, right? So it calls f with the i-th value in x and the i-th value in y. And so you can imagine map3 and map4 and so on, which don't exist, because there's a more general form, pmap, for parallel map.

And going back to the cupcake analogy, where we started thinking about recipes as data: you can even think about functions as data. So here I put the functions I want to apply in a list, and I can use some pretty terse purrr code to, basically, for each of these functions, compute it for each element in this data frame. So now we're moving higher and higher up this tower of abstraction, and maybe you'll never get to this point. But still, this idea that you can write a function that wraps up a common pattern of for loop is a really, really powerful idea.
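
The slides are not reproduced in the captions, so here is a rough reconstruction in R of the pattern the talk describes: the `seq_along` safeguard, the explicit for loop over `mtcars`, and a function that wraps that loop. It is only a sketch in base R. `map_dbl_sketch`, `out_mean`, `funs` and `summaries` are hypothetical names chosen for illustration, and `map_dbl_sketch` is a simplified stand-in rather than purrr's actual `map_dbl` (which, as the speaker says, is written in C).

```r
# seq_along() vs 1:length() on a zero-length vector
x <- numeric(0)
seq_along(x)   # integer(0): a loop over this runs zero times
1:length(x)    # 1:0, i.e. c(1, 0): a loop over this runs twice

# The explicit for loop: (1) allocate output, (2) iterate, (3) do the work
out_mean <- vector("double", ncol(mtcars))
for (i in seq_along(mtcars)) {
  out_mean[[i]] <- mean(mtcars[[i]])
}

# Hypothetical wrapper: the same three parts, but the action `f`
# is now an argument instead of being hard-coded in the loop body.
# Unlike purrr's map_dbl(), this sketch does not keep names or
# check that f() returns a single double.
map_dbl_sketch <- function(x, f, ...) {
  out <- vector("double", length(x))
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], ...)
  }
  out
}

# The only thing that differs between the two calls is the action
means   <- map_dbl_sketch(mtcars, mean)
medians <- map_dbl_sketch(mtcars, median)

# Functions as data: apply each function in a list to every column
funs <- list(mean = mean, median = median)
summaries <- lapply(funs, function(f) map_dbl_sketch(mtcars, f))
```

With purrr installed, the corresponding calls would be `map_dbl(mtcars, mean)` and `map_dbl(mtcars, median)`; the closest base R analog is `vapply(mtcars, mean, numeric(1))`.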
Info
Channel: Dmytro Perepolkin
Views: 15,203
Rating: 4.9593906 out of 5
Keywords: #rstats, #r, #functionalprogramming, #forloops, #lapply, #purrr, #hadleyverse, #hadley, #user, #thankyouhadley
Id: GyNqlOjhPCQ
Length: 12min 35sec (755 seconds)
Published: Tue Jun 21 2016