Loop like a native: while, for, iterators, generators

Captions
...to teach us how to loop like a native. Please give him a warm welcome.

Thank you very much. My name is Ned Batchelder. I'm not going to tell you who I am; you can Google me if you're interested. Let's get into the meat of the talk.

If this talk is too long and you didn't listen, here's the summary: iteration is in all of your programs. Python has very powerful tools for iteration. You can be very direct about your iteration. I'm going to show you examples of being weird about your iteration, and you should stop doing that. And you should write more abstractions. We're used to having functions that abstract series of statements, and classes that abstract bundles of data with methods to work on them. Python also has tools for abstracting the iteration that we use, and I'm going to show you some examples of how to do that.

This talk is labeled as a beginner talk because I start from the very beginning, but it actually gets into more advanced material. I prefer to think of it less as a beginner talk than as a fundamentals talk. If there are experts who've wandered in here by mistake, please stay; I think you'll find something useful.

So let's talk about iteration basics. This is an example of a loop. Let's say you have a list, and you want to iterate over all the values in the list and print them out. You can start a counter at zero. You can loop as long as the counter is still within the range of the list. You can get the element out of the list, you can print that value, and then you can increment i to go on to the next element, so that you go back to the while loop and it continues around and around. This is how most programmers are taught to loop initially, because it uses the simplest tools available in the language and it's available in any programming language.

Python programmers know you shouldn't do this. If you want to loop over all of the indexes of a list, you can use the range function: range(len(my_list)) will give you an i that ranges from 0 to one less than the
length of the list. You can get the element out of the list and you can print it. But both of these ways are really Rube Goldbergs, right? We started with the problem of "I want to print the elements in my list," and the first thing we do is start talking about integers. I don't care about integers. What does that have to do with the problem at hand? We might as well have said the boot kicks the football, which goes into the net and tips the watering can over.

So what you should really be doing, if you want to loop over the elements in your list, is use a for loop and loop just over the elements in the list: for v in my_list, print v. No integers. We haven't gotten distracted by some side issue about how lists are implemented. We can simply loop over the list directly.

The for loop is a very basic statement in Python, and it's very powerful. The iterable produces a stream of values. We'll talk later about exactly what an iterable is, but an iterable is something that can produce a stream of values. The for loop assigns each value in the stream to the name, and then it executes the statements once for each value in the iterable. The iterable gets to decide what values it produces. This is a key point, and lots of different things are iterable. So the for loop will execute the statements once for every value in the stream produced by the iterable, but the iterable itself gets to decide what those values are.

Python has a lot of different kinds of iterables. For instance, if you iterate a list, it gives you its elements. If you iterate a string, it gives you its characters. If you iterate a dictionary, you get its keys, which is a little counterintuitive. People often focus on the values in a dictionary, but a dictionary is really best thought of as a collection of keys, each of which gives you its value, which is why "for k in d" gives you all the keys and "if k in d" asks if a key is in the dictionary. Notice that it gives them to you in a surprising order. And also down
here, the dictionary has other methods if you want to iterate over different things. If you want to iterate over the values, there's a method on the dictionary to get a stream of the values, and you can also get a stream of the items, the key-value pairs, out of the dictionary. This is an interesting example, because notice that a dictionary is not a linear thing. There is no order to a dictionary, but when you iterate over it, it gives things to you in an order.

Files: you can iterate them, and they give you their lines. This is very handy, because if you want to open a text file and do something once for each line in the file, you don't have to use .readline() in a while loop. You can simply iterate over the file directly. Once we've opened the file as f, we can use "for line in f", and that will assign to line a string which is each line in the file in turn. So we get the first line, and then the second line, and then the third line.

The standard library has lots of other interesting iterables as well. For instance, in the re module there's finditer, which will give you a stream of match objects, one for each place in the string where your pattern matched. The os.walk function iterates all of the subdirectories in a tree of directories. This is interesting because we've taken a fundamentally two-dimensional structure, a tree of subdirectories, and we've flattened it out by iterating over the subdirectories. os.walk has kind of a strange form; you see it gives you three values each time. The point here is that os.walk produces an iterable, and the for loop can iterate over it.

itertools is a module which is full of all sorts of tools for playing with iteration. The count function gives you an infinite stream of integers starting with zero, and it just keeps going forever. It never stops, so probably here you want to have some way of ending this loop. But the important thing to note is that an iterable can be an indefinite iterable. There's no way to ask an iterable how many
values do you have, and that's a great thing. It doesn't give you that ability, because that lets us have infinite iterables.

And lastly, the itertools module has all sorts of cool functions that you can use to put things together. Here I've made a chain which repeats 17 three times and then cycles forever around all of the values in range(4), which gives me 17, 17, 17, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, forever. I'm not sure why you'd use that, but itertools gives you these pieces to build pipelines together to make interesting iteration.

The point here is not so much that you need to memorize what's in itertools, because you can't. Every time you think you have an iteration problem you don't know how to solve, look in itertools; it's in there somewhere. No one knows everything that's in there. That's fine, don't feel bad. The point is that Python gives you a way of creating iterables, which are streams of values, and they can take all sorts of forms. A point I'm going to make over the course of this talk is that your code has interesting streams of values that you are not producing in a nice iterable form yet, and you should.

In addition to the standard library giving you interesting iterables, Python has other things to do with iterables. You can iterate over them with the for loop, but, for instance, the list function will take an iterable, pull all the values out of it, and give you a list of all of them. Don't use this on itertools.count, because you will run out of time. A list comprehension can loop over an iterable and give you, for example, all of the values f(x) for every x in your iterable. sum will take an iterable full of things that can be added together and give you the total of all of them. min and max will take iterables of comparable values, like numbers or strings, and find you the least or greatest value. And join will take an iterable of strings and join them all together. This is just a few
examples of all of the different places in Python where you can use an iterable, where you can loop over your data without having an explicit loop. The list comprehension actually says "for" in it, so it looks kind of like a loop, but none of the others do. The loop is implicit in the function. So an iterable not only is something that you can construct to build your values the way you want, but you can pass iterables into functions, those functions can do things with them, they can return new iterables to you, and so on and so forth.

Let's talk about some common questions. When we were looping over our list, we got rid of the integers. We're no longer looping from 0 up to n minus 1. And the integer lovers out there who really wanted to use that form find a lot of ways to come back and say, "OK, you were right about that case, but I've got another case where I really need to use those integers, so I'm going to go back to using integers."

The first example is: "I also want the index." I want to loop over everything in my list, but I also want to know, is this the zeroth one, is this the 17th one, is this the 233rd one? Often what they do is resort to using "for i in range(len(my_list))". I'm going to assert right now that anytime you see range(len(something)), you don't need it; there probably is a better way to do it. So here we can use "for i in range(len(my_list))", we can pull the value out of our list, and then we can print both the index and the value. We've solved the problem of how to print out all of the values with their index, but this is not the way to do it.

The way to do it is that Python gives you an enumerate function, and enumerate will give you two values for every value you give it: the index and the original value. So here, if I have a list of landmarks with the Eiffel Tower, Empire State, and Sears Tower, and I enumerate those names and make a list of what enumerate gives me, you can see that I've gotten a tuple of 0 comma
Eiffel Tower, and 1 comma Empire State, and 2 comma Sears Tower. So enumerate has taken an iterable, and it pulls values off of the iterable and bundles them together with the index that's keeping track of how many it's gotten, and it gives you pairs that are the index and the value. So now we can do the right thing to number our values without resorting to integers. We're still focused on the actual thing we care about iterating, which is the values out of the list, but we can also annotate them with numbers, so we get the output that we want.

OK, the integer lovers are now saying, "All right, I really miss my integers, but I can see that you've got that case covered."

Oh, sorry, one other thing about "for i in range(len(my_list))". A problem with this is that this step here, my_list[i], only works if the value you're iterating can be indexed. Lists can be iterated over, but you can also ask for the seventeenth element. Lots of iterables don't let you do that. When we use enumerate on an iterable, we don't have to index into it. For example, with an open file object, you can't get the hundredth line in a file by saying f[100]; you have to read the lines in order. So if we wanted to read all the lines in a file with their numbers, we couldn't use the range(len(...)) trick. enumerate does it for us, and it does it better, because it's less code, we're not talking about integers, and so on and so forth.

By the way, here's the other bad way, which is you use a for loop but you also keep an integer alongside. This is just dumb. Don't do this. You don't need silly sidecars on your code. Why is that little i running alongside like a puppy next to your data? Stay focused on what you're trying to do here. You can write simpler code that is better.

OK, so the integer lovers, they'll grant us this, but OK, now I've got two lists. I've got a list of the names of my landmarks and their heights in meters, and I want to print out the correspondence between them,
right? So now I have to go back to my integers, right? I can go back to my integers, please? So you use "for i in range(len(names))", and then you can get the name out of the names list and the height out of the heights list, and you can do what you want.

We still don't have to do this. The answer is that there's a function called zip. zip takes a pair of streams and gives you a stream of pairs. What zip produces is a list of pairs, where the first pair is the first elements of the two iterables you gave it, the next is the second values out of the iterables you gave it, the third is the third values out of the iterables you gave it, and so on and so forth. So here we've got less code. If you look at the previous slide, we had four lines of code in our for loop, and we were distracted by all this names[i] and heights[i]. Here we've got less code and we're more direct. We're not distracted by integers anymore.

So again, zip is a function that takes iterables and produces an iterable. We don't just have static data that we're looping over; we can take an iterable as a value and manipulate it in a function to produce a new value, which is itself an iterable. Very powerful.

And by the way, just as another example of where Python does interesting things with iterables: the dict constructor will take a stream of pairs. That means if we have a list which is the names of our landmarks, and a list of their heights, dict(zip(names, heights)) will give us a dictionary mapping the names to the heights, because zip gives us a stream of pairs, and dict takes a stream of pairs and makes a dictionary out of them. So we're manipulating data in very powerful ways by constructing iterables and consuming iterables.

Now let's say that we've built that dictionary. We've got a dictionary called tall_buildings, which is all of these buildings with their heights. There are various ways that we can manipulate this. For instance, the max function
can tell us what the tallest value is: we know that the tallest building is 828 meters. The max of the items, if we sort them by their second element, is the Burj Khalifa, which is 828. And if we sort the keys based on their values, then we get the Burj Khalifa out, the name. All this just goes to show you that Python gives you the ability to loop tall buildings in a single bound. That joke's the whole reason I'm giving this talk.

OK, let's talk about customizing iteration. We've seen that Python gives you very powerful ways of dealing with data, dealing with serialized data, the units in a stream. We can also customize it to be even more direct. Let's say that we've got a list of numbers called nums, and we want to do something for all of the even values in our list. The simple thing to do is to iterate over our numbers, and for the even ones (this will test to see if it's even) we'll do something. And this is fine. This is a really short loop. But imagine that the actual condition we were looking for is much more complicated, so it's actually a number of lines of code. For example, we might want to separate that loop out. This loop, to me, is doing two things: it's picking values out of the list, and then it's doing something with the values. The picking and the doing can be two separate pieces that can be abstracted apart.

So, for instance, we could define a function called evens which accepts a stream of values. It makes an empty list, and for all of the values in the stream, it puts the even ones into our list, and then it returns the list. So now what we've got is a function called evens which accepts an iterable and produces a new iterable, which happens to be a list, but it's iterable. That means that now we can say "for n in evens(nums), do something". So we've successfully separated them. The evens function encapsulates which are the things we want to do things to,
and then the loop actually has the something we want to do. So we've separated them. This is abstracting our iteration. We've written a function that takes a stream and makes a new stream, which is just what we want.

The way we made that is not so great, and those of you who have dealt with generators before probably thought, "Why isn't he using a generator?" So what's a generator? A generator is like a function. A function, when you call it, runs all the statements and returns one value. A generator, when you call it, produces an iterator, and when you iterate the values in the iterator, it runs the statements in the generator, and every time it hits a yield statement, it produces one more value. So it's kind of like a function that can keep producing values over and over again.

For instance, here we've defined hello_world. This is a very simple generator; you'd never see this in the real world. It simply has two statements: first it yields "hello" and then it yields "world". If you loop over that, "for x in hello_world()", print x will get run twice, once to print "hello" and once to print "world", because the generator made an iterable which, when you ask it for values, will give you "hello", and then it will give you "world", and then it will tell you that it's done.

So generators are a really, really powerful way to implement iteration. And if you take nothing else from this talk... well, you're going to take the Superman joke from this talk. But after that, if you only take two things from this talk: you should be writing more generators. If you don't know what generators are and this confuses you because it's kind of weird... and by the way, if you're coming from other programming languages and you're learning Python, generators are the first place where you think, "Oh, it's not just a syntax change; this actually is executing very differently than I'm used to." So I understand if it's a little bit complex and a little bit foreign. Look into it; you'll be really glad.

So let's look
at how we can use a generator to make our evens function. Here's the same evens function, but we've turned it into a generator, and it was very simple. All we did is get rid of the initialization of the empty list. We still have the same loop over the stream, we still do the same comparison, but then instead of appending to our list, we just yield the value, and then we can get rid of the return of the list at the end. So the generator is actually much simpler than what people think of as the simpler thing, which is to make a list. And then we can use "for n in evens(nums)" and do the things we want to do.

A great thing about a generator is that it's lazy. For instance, that first evens function, if we passed it an infinite stream, would compute until the end of time trying to produce that list of all the even values. evens(itertools.count()) would never complete. But you can do evens(itertools.count()) with a generator, because all it does is execute statements until it finds something, and then it yields a value. So it would return 0 to you, and then it would return 2 to you, and then it would return 4 to you. evens doesn't care that the stream it's got is infinite. If you give it an infinite stream, it will be an infinite stream, but that's fine, because the generator is producing values as you need them rather than chunking them all up at once.

By the way, if you look in Python 2, range is a function that returns a list of all the numbers; xrange is a generator that produces the numbers as you need them. range of a hundred quintillion will run out of space; xrange of a hundred quintillion will run out of time. That's how you can keep them straight.

That was another example of abstracting your iteration. Here we've got a more complex and realistic example of the same problem as with evens. With evens, we were looping over stuff, picking the ones we want, and then doing things with them. Here we're reading a text file, but the text file has a format where it
can have blank lines or it can have commented lines, and we don't want to look at those lines. We're going to skip over those lines and only do something with the interesting lines. So we open our file, we loop over it, we strip the whitespace. If the line starts with a pound sign, then we're going to continue, meaning we're not going to end up getting down to the do-something line. If the line is actually blank after stripping it, meaning it was a blank line, then we're going to continue. If we make it through that gauntlet of tests, then we'll get down to our do-something. So for the interesting lines, we'll get down to here and we'll do something.

But again, this loop is doing two things: it's filtering out the values that we want to work on, and it's actually doing the thing. So just like we did with evens, we're going to make a generator for it. We're going to take all of that logic that we had at the top; it looks exactly the same, but at the end we're simply going to yield the interesting lines. So now we have a function called interesting_lines which can take an iterable. I said we took a file, but notice we don't do anything here that knows that it's a file. This is one of the great points of Python's polymorphic duck typing: this function will take any value that can produce strings, which means that we can test this function in our unit tests using lists of strings. So we have file processing that we can test without having any files, which is amazing. So there's another good reason to abstract your iteration: the piece you get once you've pulled it out will probably be more testable and more applicable to more kinds of data than what you started with.

Once we've got the generator, then we can use it. We can say "for line in interesting_lines(f)". This kind of processing of lines, skipping the comments and the blank lines, is the kind of thing that will probably come up in your programs many times, and so you've saved the logic and moved it into a generator. We've got it in just
one place, and you can test it and not worry about it so much. So this is a great example of abstracting the iteration. You've taken what seemed like a simple task (open a file, look at some lines, do something with just some of them), but instead of having the explicit conditions right there in the main line of your code, you've moved them into an abstraction that gives you a stream of the interesting lines in the file rather than a stream of all the lines in the file. This is what I mean when I say your data is probably something that you can make abstractions of iteration for.

OK, another question from the iterator lovers. From the integer lovers, sorry. We're the iterator lovers; the bad guys are the integer lovers. How do you break out of two loops? Let's say you've got a spreadsheet, and you're searching through the cells of the spreadsheet for something. We all know how to do a search through the cells of a spreadsheet: you have two loops. You loop over the rows, then you loop over the columns, and then you get the value out, and you can do something with it. And then if this is the value you want, you want to stop the loop. There's no way to break out of two nested loops in Python. The typical answers are: well, you put the inner loop into a function so you can return, or you use a try/except block. Those are all the wrong answers.

The answer is that you make it a single loop. Instead of having a double loop over the rows and columns, we can make a generator that produces pairs of numbers. It's a 2D range. We say we want all the coordinates in a grid that's width by height, so we've got our double loop in our generator and we yield the pairs. Now we can have a single loop that loops over our spreadsheet, and so now the answer is easy: we have a break. So not only have we gotten rid of the integers, much to the chagrin of the integer lovers, but we've got simpler code that we can actually do more with, which is fabulous.

Now, of
course, this is still kind of silly, because we're still sort of talking about integers, because column and row are coming out of there. The right answer is that the spreadsheet should have a method on it called cells which gives you all the cells. Whether it's a 2D spreadsheet or a 3D spreadsheet, why do you need to know that at this level? You just want to get all the cells out of the spreadsheet, and once you've got that, you can still have your single loop.

Let's talk a little about low-level iteration and how you can customize your objects. What's actually happening at the lower level? I've been using the terms iterable and iterator, and people often think they're interchangeable. They're not. An iterable is any value that can produce a stream of values, and an iterator is the object that knows where you are in that stream. You can think of it like this: a book full of pages is an iterable, and the bookmark is the iterator. An interesting point about that is you can have two bookmarks in the same book.

The way it works is that an iterable produces its iterator when you call the iter function on it. So the iter function will take an iterable and produce an iterator, and then on the iterator you can use the next function, just doing next, next, next to get values off the iterator. That's the only operation that's supported on iterators. You can't ask what the last value was. You can't ask what the next value will be. You can't ask, is there a next value? You can't ask how many values there are going to be. You can't say, please give me back this value the next time I ask you for a value. You can't do any of that stuff. All you can say is, give me the next value. That simplicity is the power of iterables: because it's promising so little, it's applicable to as broad a set of data as possible. Lots of different things can be iterable, because you don't need to know much about the data to iterate it.

As an interesting example of using low-level iteration, let's say you have a text file
that's a CSV file, maybe, and it's got a header line and then it's got all the data lines. So you want to read one line off the file and then read the rest. You can use the next function directly on your open file, and it will give you the first line as a string, and then when you loop over the rest of the file, it will give you all the rest of the lines. So you can use next to pick just one value off of an iterator, and then the next time you iterate over it, it picks up from where it's pointing and brings you all the rest.

How do you make your own objects iterable? Going back to, for instance, the spreadsheet case, it had a cells method on it. We're going to make a to-do list here. We're not really going to make it; it's got almost no code in it. But let's say we have a self.tasks attribute, which is a list, and it's going to be a list of these task objects. The way you make an object iterable is you implement the dunder-iter method, __iter__. When someone calls iter on your to-do list, it will invoke the __iter__ method, and the __iter__ method has to return an iterator. The simplest way to return an iterator is to use the iter function on some data that you've got. So if we have self.tasks, which is a list, iter(self.tasks) is an iterator over that list, and now our object is iterable, because we've got an __iter__ method that returns an iterator. I know you're getting all scrambled about iterator and iterable, and it'll probably sort itself out as you work through this.

Once we've done that, we can construct a to-do list, and now we can say "for task in todo". If we hadn't implemented __iter__ and you tried to do that, it would say that it is not an iterable object, but now it is. So we've made our own object iterable directly. We don't need to know that there's a task list in there and loop over the task list; we can just loop over the to-do list.

And by the way, __iter__ is a great place to put generators. Remember, I said a generator
is a function that, when you call it, produces an iterator, and then ten minutes later I said __iter__ is a method on your object that, when you call it, has to produce an iterator. Which means __iter__ is a really good place to use generators. So here, for instance, we've changed our to-do list so that when we iterate over it, we're going to look through all the tasks, and only if the task is not done will we return it. So iterating the to-do list directly gives you all of the not-yet-done tasks. And we've added another method called all, which gives us all of the tasks, which we know how to do; we just did it on the last slide. iter(self.tasks) will give us an iterator over our tasks list.

And by the way, here's another cool thing which I don't have time to explain, which is called a generator expression. This does the same kind of logic that our __iter__ method does, but in one expression, and it produces a generator without actually implementing a whole function for it. So once you've got the hang of generators, remember that generator expressions look just like list comprehensions, but they work like generators, so they're lazy and they only produce things when they need to.

Wrapping it up. (Thank you for getting that joke.) Iteration is everywhere. There's very little chance that any of you are writing real programs that don't have to iterate over data somewhere. Python has a very clean and powerful model for iteration that's used throughout the standard library but isn't as widely used throughout our own code. We're not writing enough generators and __iter__ methods. So abstract and customize your iterations, and if you do, you will have superpowers too. Thank you.

We have about five minutes for questions. If you'd like to ask a question, please line... well, that's really bright. Please line up behind the mic there and ask.

Matt, your mic's off.

Hey Ned, great talk. One nitpick: you mentioned that xrange is a generator. You know it's not a generator; it's an object
that is an iterator, right?

OK, OK.

So, yeah, you can iterate over a generator multiple times; you can't do that on a generator. You can also slice.

But you just blew it, because you said you can iterate over a generator multiple times.

I meant iterate over xrange multiple times. I twisted it.

Don't snipe at me, dude. Turnabout is fair play. Anyone else? Anyone else? No? That means I could have left in three slides. I won't do it to you. I'd do it to Matt; Matt and I are good friends. That's OK.

Can you talk a little bit about StopIteration?

Yeah. So what I didn't mention was that when you call next on an iterator, it will produce the next value, except if there is no next value, then it will raise an exception. That's how the end of the iteration is indicated in that low-level iteration protocol. And that's a great thing, because that means that any Python value at all can be an element of an iteration. You can have an iteration that produces None, None, None, None, 0, 0, 0, False, False, False, and there's a special thing that indicates the end, which is the StopIteration exception that gets raised. So, for example, in that code where I used next on the file, technically if I use an empty file here, I'll actually get a StopIteration exception when I try that, because there is no next value. So you might have to watch out for that if you're using the next function. It turns out the next function also has a second, optional argument, which is the value to return if there actually isn't another value, so you can avoid the StopIteration if you're doing that.

So I've seen some code where they are modifying the list as they're iterating over it, and what they usually do is have an integer iterate over the list backwards and remove elements. Would you recommend instead creating a new list?

It's usually much better to just make a new list. So the question is about iterating over a list and modifying the list as you do it. For instance, if you have a
list and you're going to examine elements, and you want to get rid of those elements, and you delete them from the list: if you do that going forwards, then the iterator thinks you're on the fourth element, you remove the fourth element, a new element comes down to be the fourth element, and then the iterator goes to the fifth element, and you've completely skipped this element, because you're modifying the list as the iterator is pointing at it. One way that people solve that is they just do it backwards. They go from the end, deleting things, and then they don't have to worry, because the iterator is going back that way. But really, the better way is just to make a whole new list. Instead of deleting the ones you don't want, make a new list which is all the ones you do want, and then you don't get into that trouble.

Other things have the same problem. If you modify a dictionary while you're iterating it, you'll actually get an exception, because the dictionary machinery can tell that you've done that, and it knows that your program is not going to work right. Because you can't predict the order of elements in a dictionary, you can't even psych it out and do it backwards to fix it. So you get an exception in that case.

Thanks. Anyone else? I won't snipe at you, I promise. No? Thank you, sir.
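The slides are not part of the captions, so the loops described in the talk can only be reconstructed approximately. The sketch below is one possible rendering in Python 3 of the main techniques the talk names: enumerate, zip with dict, a lazy evens generator, interesting_lines tested on a list of strings, a 2D range that allows a single break, next with a default, and a to-do list with a generator __iter__. The landmark names and the 828-meter figure come from the talk; the other heights, the find_cell helper, the range_2d name, the (description, done) task tuples, and the sample data are assumptions made for illustration only.

```python
import itertools

# Landmark names are from the talk; these three heights are illustrative values.
names = ["Eiffel Tower", "Empire State", "Sears Tower"]
heights = [324, 381, 442]

# enumerate pairs each value with its index, so range(len(...)) is not needed.
numbered = list(enumerate(names))

# zip walks two iterables in step; dict consumes the resulting stream of pairs.
tall_buildings = dict(zip(names, heights))
tall_buildings["Burj Khalifa"] = 828  # height quoted in the talk
tallest = max(tall_buildings, key=tall_buildings.get)


def evens(stream):
    """Generator: lazily yield only the even values from any iterable."""
    for n in stream:
        if n % 2 == 0:
            yield n


# Laziness lets evens() consume an infinite stream; islice stops after three values.
first_evens = list(itertools.islice(evens(itertools.count()), 3))


def interesting_lines(lines):
    """Yield stripped lines, skipping comment lines (#) and blank lines."""
    for line in lines:
        line = line.strip()
        if line.startswith("#"):
            continue
        if not line:
            continue
        yield line


# Duck typing: a list of strings stands in for an open file.
kept = list(interesting_lines(["# comment\n", "\n", "data 1\n", "  data 2  \n"]))


def range_2d(width, height):
    """Yield every (col, row) pair in a width-by-height grid."""
    for row in range(height):
        for col in range(width):
            yield col, row


def find_cell(grid, target):
    """Hypothetical helper: search a non-empty list-of-rows grid with one loop."""
    found = None
    for col, row in range_2d(len(grid[0]), len(grid)):
        if grid[row][col] == target:
            found = (col, row)
            break  # a single break ends the whole search
    return found


class ToDoList:
    """Iterating the list directly yields only the tasks that are not done."""

    def __init__(self, tasks):
        # (description, done) tuples stand in for the talk's task objects.
        self.tasks = list(tasks)

    def __iter__(self):
        for description, done in self.tasks:
            if not done:
                yield description

    def all(self):
        return iter(self.tasks)


todo = ToDoList([("write talk", True), ("tell joke", False)])
pending = list(todo)

# next() takes one value off an iterator; the default avoids StopIteration.
rows = iter(["name,height\n", "Eiffel Tower,324\n"])
header = next(rows, None)
rest = list(rows)
empty_header = next(iter([]), None)
```

Because interesting_lines and evens accept any iterable, the same functions work unchanged on an open file or on itertools.count(); only the caller decides where the stream comes from and when to stop consuming it.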
Info
Channel: Next Day Video
Views: 102,146
Rating: 4.9682035 out of 5
Keywords: psf, pycon2013, talk, NedBatchelder
Id: EnSu9hHGq5o
Length: 29min 14sec (1754 seconds)
Published: Thu Mar 21 2013