Python's secret second argument to iter()

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Hey everyone, welcome back to mCoding. I'm James Murphy. Today we're talking about the two argument form of iter(). That's right. iter() can actually take two arguments. And I bet you've never used the two argument form. Don't worry, though. You weren't missing much. First up, though, the one argument form. If you pass in one argument to iter(), all it does is call that argument's dunder iter method. If you've used Python for more than a few months, you've probably already come across this. Dunder iter allows you to customize what it means to iterate over your class. In this case, my class is a LucasSequence. It's a generalization of a fibonacci sequence. You define the next term in the sequence as p times the previous one minus q times the one before that. If you're familiar with the linear way to compute the fibonacci sequence, that's exactly what this code is doing. And in fact, by choosing the right initial values and constants, we can actually create the fibonacci sequence. We grab out the iterator and every time we call next, we get another number of the sequence. So, when we run the code, down here at the bottom we see 0 1 1 2 3. That's the start of the fibonacci sequence. Of course, the one arg form of iter() is deeply connected to how for-loops work. If I do for n in fib, then it's going to call the one arg form of iter() to get the iterator and then repeatedly call next putting that value into n. And this is probably the main reason why you've never used the two arg form of iter() Because normally, you don't call it or directly. You just use a for-loop. So, what does the two arg iter() do? Well, it's mostly equivalent to this code. The first thing to note is that although it does return an iterator, that's basically where the similarity with the one arg form of iter() ends. It very confusingly does something completely different. It doesn't even take in an iterable for its first argument. Instead, its first argument needs to be a callable, not an iterable. And no surprise, it's going to call that callable. So, this version of iter() is calling dunder call, not dunder iter. Whenever you ask it for its next value, it will call the given callable with no arguments. If the value equals the sentinel, then we return which ultimately leads to a StopIteration being raised. Otherwise, we yield that value. So, if you've never heard of this word sentinel before, you can also think of it just as a stop token. Keep calling the function f yielding its values until you see the sentinel or the stop token. And then that signals the end of the process. So, how might you use this two argument form? Well, actually, passing in a function is not going to be that helpful. If I have a pure function as most of your functions should be, then look what happens. Let's suppose that my sentinel is not 42. In this case, I'll just use None. Well, I'm supposed to keep calling the function until I hit the sentinel None which is never going to happen. It always returns 42. So, every time I call next, it's just going to return 42 again. Okay. So, that's pretty useless. Or, if I did put in 42, then I just immediately get a stop iteration because that's the sentinel. So, that's also pretty useless. So, what we've discovered is that this is useless for pure functions. But if you rewind, I actually said that it takes a callable, not necessarily a function. So, it's pretty much necessary that every time you call this callable, it changes some kind of global or internal state. That way, the next time it comes around and calls it, something else can happen. The most common kind of state mutating callable that you're going to run into in practice is just a class that defines a dunder call. So, let's take a look at one of those examples. The goal of this ChunkedReader is to allow you to read from a file in chunks of given size. f is our file handle. And every time we call this object, we just read that many bytes from the file and return the result. And note that every time we call read on a file, this advances where we're reading from in the file. So, eventually if we call this enough times, we'll get to the end of the file. And when that happens, it'll return either empty bytes or the empty string depending on whether you're in binary mode. So, we could tell that we're at the end of the file. And that we want to stop when we see that empty bytes or empty string. So, that will be our sentinel value. Then we could use the ChunkedReader like this. We'll open up some example file. We'll make a ChunkedReader that reads from the file 4 bytes at a time. 4 is really small but it's just for demonstration purposes. Then this says: Keep reading through the chunks until we hit the empty bytes sentinel value. Here's the file I'm reading from. It's just going to be 123456789 all the way up to 20. And when we run it, you can see '1234' '5678'. We get chunks of 4. So, technically this works. But there's a few problems with the design. First off, I had to create this ChunkedReader class in the first place. That probably shouldn't be necessary. Secondly, it's kind of a bad choice to call this method dunder call. Now, we had to do that in order to make it work with the two argument iter(). But it should really be called something like read_chunk. Normally, it's a bad idea to define something like dunder call just to allow for some special syntax, unless that syntax actually makes sense. If I was defining some kind of function proxy object where it would make sense to call a function, then defining a dunder call might very well make sense. But for this ChunkedReader, I don't think so. So, if I really want to make the two argument form make sense, then I need to pass in a class that kind of models function semantics. Pretty much the only realistic example of this that you would run into is funktools partial. This takes in a callable and some arguments to pre-fill for that callable. And you can then call that callable at some later time. This example is the same as the previous one. But it allows you to avoid having to make that intermediate class. Of course, though now, you have this monstrosity in your code. So, is that really much better? Well, I think it's a little bit better. But in my opinion, this two argument form of iter() was a mistake. I think that if you really wanted this behavior, it should be a function in the iter tools library, not a completely separate and different functionality in the built-in iter. So, stop using it. Or more likely, since you weren't already using it, just keep doing what you're doing. If it were up to me, I would deprecate this functionality and eventually remove it from the language. And I don't feel bad about suggesting this. Because as of Python 3.8, there is a much better alternative. Python 3.8 introduced the walrus operator and assignment expressions. Assignment expressions allow you to assign a variable and use it at the same time. You get to have your cake and eat it too. Every iteration of the loop, I can call my function however I want to and capture its value. Since I capture the value here, it now becomes available to use within the loop. But an assignment expression also returns the value back. So, I can continue using it and compare it to the sentinel value. In my opinion, this code is easier to read and the intent is much clearer. Additionally, this gives you a lot more flexibility on how to control when to stop. Here, I'm checking if the chunk is equal to the sentinel. But I could also check if the chunk is the sentinel, or if it's close to the sentinel, or any other condition involving the chunk and the sentinel. For example, here I take a starting time. And then every iteration of the loop, I get the current time. This condition basically says keep looping until it's been one second. Of course, I'm not going to hit exactly one second later. There's going to be some error. So, I can't just check for t0+1 as a sentinel value. But I can check whether or not I've exceeded that value. So, that's an example where the two argument iter just wouldn't work. In summary, the two argument form of iter exists. Don't use it. Instead, use assignment expressions. That's all I've got. Thank you to my patrons and donors for supporting me. If you enjoy my content, please do subscribe. And if you especially enjoy my content, please consider becoming a patron on Patreon. If you have questions, comments or concerns, feel free to leave a comment or join my discord. As always, thank you for watching. And slap that like button an odd number of times. See you next time.
Info
Channel: mCoding
Views: 58,886
Rating: undefined out of 5
Keywords: python
Id: YC-12-0sXR8
Channel Id: undefined
Length: 7min 19sec (439 seconds)
Published: Mon May 09 2022
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.