Hello and welcome to mCoding where
the only limit is your imagination. And your download speed. But mostly your imagination. In this episode, we're talking
about Python generators, including the yield keyword,
generator comprehensions, yield from, and how all
this relates to async. You define a generator function in
Python just like you would any other function using the def keyword and they
can take parameters. The only thing different that
makes this a generator is the presence of one of these
yield statements somewhere in the body. Generators act like normal functions, except
that when one hits a yield statement, it pauses. Every time it pauses, it
can also yield a value, in this case "hello", that
becomes available to the caller. Unlike with normal functions, calling
a generator function doesn't run its body. If we print out the result of the call,
we just see a generator object. The way you actually run a
generator is by calling its `__next__` method, which is what the next
built-in does. It runs the generator
until it hits a yield statement and returns the value at
the yield statement. With every next call, it resumes the
generator until it hits another yield statement or until the function ends. If you resume the generator and the
function ends before hitting another yield, you'll get a StopIteration
exception raised. Returning a value from
a generator is fine. But it doesn't appear as the
result of a next call like the others do. The return value of the
generator actually appears as the `value` attribute on the StopIteration
exception that's raised. This is mainly for a very niche purpose
that we'll talk about at the end. While Ariana Grande might prefer to thank you
next next next over and over again, in Python, it's more common to
let a for-loop do it for you. This works because under the hood a for-loop will
call that next function over and over again until it hits a StopIteration,
at which point it stops. Check out my video on what
a for-loop actually translates to under the hood if you haven't seen it already. In any case, here we see the items printed out: hello, world, 123. So what are generators good for? The most common use case for a
generator is to define an iterator of a class. For a simple example,
consider this `Range` class. Just pretend you don't know
about the built-in range. And you're building
a Range class yourself. In the spirit of being lazy
like a generator, our ranges are all going to
go from zero to some stop value. We don't support start or step. So just like the built-in range which
we don't know exists, range of 5 is the numbers 0, 1, 2, 3, 4. But when we create the Range, we don't
actually store all those numbers somewhere. We just store the start and the stop. But if all we have is a
start and a stop, how do we iterate over the elements of
the Range that we're supposed to be representing? The answer is the highly
sophisticated solution of counting. Start at the starting value. And then continually yield the
current value and add 1 until we get to the stop. We can now iterate over our Range just the same
as we would over the built-in range. And just like the built-in range because we're
only storing the start and the stop, not all the numbers in between, we
can construct huge ranges. There's no way a list of numbers from 0 all
the way up to this number would ever fit into memory. Yet we can construct the Range
and iterate over it quickly and efficiently. And to reiterate, it's this laziness, where
we're not actually constructing those numbers until we ask to see them,
that allows us to do this. So, while generators can be slightly slower
than lists in certain situations, if you're just processing them
one at a time like here we're
just printing out the current n, then a generator can be
a huge win over a list. And here's a little history for you. The built-in range in Python 2 actually
used to return a list. That turned out to be
a huge mistake. And for Python 3, it was changed to be
something more similar to what the generator is doing. It's not actually using a generator.
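A minimal sketch of the Range class described above. The attribute names and the body of `__iter__` are assumptions, since the on-screen code is not part of this transcript:

```python
class Range:
    """A lazy range from 0 up to (not including) stop. Illustrative only."""

    def __init__(self, stop: int):
        # Only the endpoints are stored, never the numbers in between.
        self.start = 0
        self.stop = stop

    def __iter__(self):
        # The yield makes __iter__ a generator function, so each call
        # returns a fresh generator that counts from start up to stop.
        current = self.start
        while current < self.stop:
            yield current
            current += 1


small = list(Range(5))
print(small)  # [0, 1, 2, 3, 4]

# A huge Range costs nothing to construct; values appear only on demand.
huge = Range(10**18)
it = iter(huge)
first_three = [next(it), next(it), next(it)]
print(first_three)  # [0, 1, 2]
```

None of this is how the built-in `range` works internally.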
It's using something handwritten in C. But it's the same idea. Another very common and useful place that
you might use the generator is reading from a file. Once again, this is a situation where
because files can be so big, you might not want to read
the whole file into memory all at once. If you can process things line
by line using a generator, then even if the file is gigabytes big,
it doesn't matter. You'll only need as much memory
as you would need to process a single line. So for example, I have
some custom dataclass. In this case, it's just xyz points. My file just looks like this. I just have floating point
values xyz, xyz, xyz. I don't know what's up with
the red squiggle. PyCharm thinks I have a
syntax error in my text file. Feel free to free associate about
what the problem is. In any case, we define a generator
that expects a file handle. What I mean by that is, the file
object that you get back from an open call. In a very generator like fashion,
iterating over a file actually iterates over the lines of the file. We strip off the trailing newline and
split it by the commas. Then we convert everything to floats. Create one of our custom data
structures and yield it. And then we just print out the rows. Of course, you
can do whatever data processing you like. The next very common use case for generators is
to think about them as lazy sequences. You can loop over them
repeatedly returning values. So you can think about those as
values of some sequence, whether it be a mathematical sequence like in this
case, or just a sequence of objects. And you just don't compute the next
term in the sequence until someone asks for it. So, here's a Collatz sequence. Take a positive integer n. If it's even, divide it by 2. Otherwise, multiply by 3 and add 1. Then repeat. If you ever got to 1 then the sequence
would start repeating, it would start to go 1, 4, 2, 1, 4, 2 and so on. As of 2022, it's one of the world's
most famous unsolved problems in math. Starting at any number do
you always get to 1? Or could there be a sequence
that goes off to infinity? Or maybe some other cycle
like 4, 2, 1, 4, 2, 1? Well I'm here to announce that
I've actually proved the Collatz Conjecture to be independent of
the axioms of mathematics. Just kidding. Anyway, here's what a
typical Collatz sequence looks like. It does some unpredictable stuff. And then eventually, you hit
a power of 2 and it shrinks down to 1. This showcases another very
important property of generators. Imagine, if instead of a
generator we were using a list. Well, besides the fact that this
list might need to be arbitrarily large because we don't know
how long a Collatz sequence will be, what if I didn't care about
the whole sequence? For instance, what if I just
wanted to know how long it is? If you're wondering, it's 111 elements. But if I had returned a list in this case,
that would have been a huge waste. Why allocate all that memory and
store all those numbers just to get the length? If I wanted to be more efficient, I'd
need to write another function that calculates the
length instead of storing the list. But that length function would be
basically identical to the list function. Just instead of appending into a list,
we add one to a count. If only there was a way to have one
implementation of the Collatz sequence that I can do whatever I want with. Once again, generators to the rescue. If I want the length of the sequence, then
I just count one for every element of the sequence. Once again, we see 111. And if I did actually want the whole list of
numbers in memory, then I can just call list on it. Generators can even be used to represent
sequences that we know are infinite. We can only ever use finitely
many terms. But we're able to compute as many
as we need without specifying ahead of time. So, you could represent all the powers of two, all the
rational numbers, the Fibonacci sequence, or all the prime numbers. All you need is an algorithm
for enumerating them. Defining a generator is as simple as
defining a regular function. But you can go even simpler
if your generator is simple enough. This is a list comprehension which
hopefully you're familiar with. And it creates a list whose elements are
x times x for each x in the range. Replace those brackets with parentheses.
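A sketch of the two forms side by side. The range size of 10 is an arbitrary choice for illustration:

```python
# List comprehension: builds every element in memory right away.
squares_list = [x * x for x in range(10)]

# Same expression in parentheses: a lazy generator, nothing computed yet.
squares_gen = (x * x for x in range(10))

print(squares_list)       # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(next(squares_gen))  # 0
print(next(squares_gen))  # 1

# When the generator is the only argument to a call, the extra
# pair of parentheses can be dropped.
total = sum(x * x for x in range(10))
print(total)  # 285
```

Only the delimiters differ between the first two assignments.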
And you now have a generator comprehension. This is really just shorthand notation
for defining and calling a generator function. Meaning this code and the elements
of this sequence will not be computed until you try to actually iterate
over the generator. And once again this can be
more efficient than the list version which creates all the
elements in memory immediately. If you happen to be immediately passing
a generator into a function, you can also do it this way. This creates a generator just
like in the previous line. And passes it to the sum function. Basically, it just lets you leave
off a pair of parentheses. And now that we know about
generator comprehensions, another great feature of generators is
that they're extremely easy to compose. You can build pipelines of data
out of generators in no time. Suppose, you want to be
able to parse a file like this. It has data in it that you want
to treat as floating point numbers. But you also want to allow
comments, full-line comments, NaNs, infinities, and blank lines. No need to write a fancy parser. Generators are plenty expressive
enough to get this job done. We start by opening the file and
iterating over the rows. Remember each row is
one line of the file. Strip off the newline and remove anything after a hash in
order to strip trailing comments. Then define another generator that
loops over the generator from the first line. All it does is filter out empty lines. Each line should now contain
a floating point number. So, we use float to convert it
from a string to a float. Then we do another filter operation to
throw out any infinities or NaNs. Then let's just pretend that
we want to replace anything negative with 0. And just for something to do,
let's just say we want to add up those numbers. This was very simple to write and easy to read like a step-by-step instruction
manual on how to create the pipeline. And once again all this happens lazily.
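The pipeline described above might be sketched like this. The file name, its contents, and the variable names are assumptions made so the example runs on its own:

```python
import math
import os
import tempfile

# Hypothetical sample data: comments, blank lines, nan, and inf are allowed.
sample = "1.5  # trailing comment\n# full-line comment\n\n-2.0\nnan\ninf\n3.0\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "numbers.txt")
    with open(path, "w") as f:
        f.write(sample)

    with open(path) as f:
        # Each stage is a generator comprehension wrapping the previous one.
        stripped = (row.partition("#")[0].strip() for row in f)
        non_empty = (row for row in stripped if row)
        numbers = (float(row) for row in non_empty)
        finite = (x for x in numbers if math.isfinite(x))
        clipped = (max(x, 0.0) for x in finite)

        # Only now do rows start flowing through the stages, one at a time.
        total = sum(clipped)

print(total)  # 4.5
```

Each stage holds at most one row at a time.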
So, it's very memory efficient. We've completely defined our pipeline
before we ever actually read from the file. At this line in the code, we haven't
even read a single byte from the file. Each next call inside the sum triggers
this generator to look for one more element. That triggers this generator to compute
one more element which triggers this generator
to compute more elements until it finds a finite one. Which triggers this one to compute
more elements until it finds a non-empty row. Which triggers this one
which finally reads a line from the file. So, we're able to process
the whole file. And we don't even need more
than one line at a time in memory. And now we get to the advanced
usage of generators. A yield statement is not
just a statement. It's also an expression.
It returns a value back to you. And that's because generators are
not just pausable functions that yield values. Generators are
actually bidirectional pipelines. Just like a generator can yield
a value up to its caller, its caller can send a value
back down to the generator. And it's these sent values that
are returned from the yield expression. So, here's how we read this. We have a worker generator. The worker has a collection
of tasks, initially empty. Initially, we yield None because we
haven't had any chance to receive any tasks. Our caller is expected to send
us a batch of tasks. The idea being that we're
supposed to evaluate the given function using the arguments that
the caller passed to us. If the caller passes some new tasks,
then we extend our task list with those arguments. Otherwise, we assume that
the caller is asking us to complete a task. So, if there's a task available,
we pop it off and evaluate the function with those
arguments to get some value, which then gets yielded
back out to the caller. And here's how a caller would
use the worker. So, our worker is just going to convert
whatever arguments we give it to a string. We use the send method to send
values into the generator. However, when we just create the
generator it doesn't start running it. The very first value that we send
can't possibly be accepted by the worker because the function is going
to start at the beginning. There's no yield statement there. So, first we just send None to
cause the generator to run to the first yield. Just like a call to next,
send will cause the generator to run until a yield or until the
function returns. After sending None,
the generator will be paused here. So now, we'll send in three tasks:
the number 1, the number 2, and the number 3. These are wrapped as single
element tuples because we're processing them
using star args. Now, if we call next three times,
then we'll see our three values evaluated. When we use the next call,
the return value from yield is going to be None. We can send in more values
and then print them out. And that's part of the
usefulness of this setup. I can add tasks or evaluate
tasks at any time. I don't need to have everything
prepared ahead of time. And I don't need to compute
all the answers at once. And another thing you can do is, use the
throw method to throw an exception inside the generator. As you can see, the exception acts
as if it was thrown from the yield statement. So, you could surround this in a try-except if you
wanted to handle exceptions that way. There's also the close method that
does basically the same thing as throw except it throws a
special GeneratorExit exception. This exception gets special treatment. And it's basically a way for you to cancel
the generator without having that error propagate up. So at this point, this should feel
very similar to something else in Python. We're basically submitting tasks
into this worker thing. And then something drives
the worker. And the worker decides how
the tasks are scheduled and when to actually call the
function and do the work. Doesn't that sound
a lot like async? I don't expect you to already know
all about async. Don't worry, I've got
a video on that coming. But just the general idea of
defining tasks and pausing functions and continuing later when
things are convenient. Well, it's no coincidence. As it happens, under the hood in Python, async/await
coroutines are defined in terms of generators. So, once again, being lazy is
paying off big time. The lazy machinery of generators is powerful
enough to design an entire async framework around. And that's not even the
end of the video because we still have one
major feature of generators left to cover: `yield from`. `yield from` allows one generator
to yield values from another generator. In most cases, you could use it exactly like you
would a for-loop looping over the values and yielding them. And that's totally fine.
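The two spellings might look like this. The generator names are hypothetical:

```python
def inner():
    yield 1
    yield 2
    yield 3


def outer_loop():
    # Explicit loop: re-yield each value from the inner generator.
    for value in inner():
        yield value


def outer_yield_from():
    # Same values, one line shorter, no loop variable.
    yield from inner()


print(list(outer_loop()))        # [1, 2, 3]
print(list(outer_yield_from()))  # [1, 2, 3]
```

For a caller that only iterates, the explicit loop gives the same result.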
There's nothing wrong with doing that. Using `yield from` is going to
be one line shorter. And it's going to avoid
using an extra local variable. However, that's not the intended
use of `yield from`. And it's not why it was introduced
into the language. Just think about it. Do you think they would really
introduce a whole new set of keywords just to have a shortened for-loop? The true purpose of `yield from`,
and the reason it was introduced into the language, is to facilitate the
bi-directional nature of generators. Remember a caller can receive
values from generators. But they can also send values
to generators. But what if a generator wants to take
the values that it receives from its caller and pass them to a sub-generator? For instance, here's a quiet worker. It's another generator because it
has a yield. And all it does is it creates
a worker. And then yield from allows the quiet worker to pass
messages from its caller directly into the worker. And likewise whatever the worker
yields is yielded back up to the caller. If a task causes an error,
the quiet worker just catches it. And then creates a new worker
to keep going. This is of course very bad practice because the task queue of
the worker may not have been empty. But we're throwing it out and creating
a new worker anyway. In any case, the `yield from` is what allows
us to pass messages bi-directionally. It essentially acts as a pass-through. Taking whatever messages from our
caller and passing them to the worker. And taking whatever messages from
our worker and yielding them up to the caller. That is the true purpose of `yield from`. The fact that you can use it to write for-loops that are one
line shorter is just a bonus. Oh yeah. And just like yield,
`yield from` also returns something. It's the return value of
the subgenerator. That thing inside the StopIteration from
the beginning of the video. This is its real purpose. So, there you have it.
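As a recap of that last point, here is a minimal sketch using two hypothetical generators, `child` and `parent`. It shows a sent value passing through `yield from`, and the subgenerator's return value coming back out:

```python
def child():
    received = yield "ready"        # value sent by the outermost caller
    return f"child got {received}"  # ends up in StopIteration.value


def parent():
    # yield from forwards sends to child and evaluates to its return value.
    result = yield from child()
    yield result


p = parent()
first = next(p)         # runs child up to its yield
second = p.send("hi")   # "hi" passes through parent into child
print(first)            # ready
print(second)           # child got hi

# Driving child directly, the return value is only visible on the exception.
c = child()
next(c)
try:
    c.send("hi")
except StopIteration as stop:
    returned = stop.value
print(returned)         # child got hi
```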
That's all I've got on generators. I hope you learned something.
I hope you enjoyed it. Let me know in the comments how
much you want me to make that async video. If you really enjoy my channel, please do subscribe and consider
becoming a patron or donor. As always, don't forget to slap that like
button an odd number of times. See you next time.