Functions within functions, closures, and variable scopes in Python

Captions

Hello and welcome to mCoding. I'm James Murphy. Today, we're talking about functions within functions, variable scopes, and closures in Python. Of course, don't forget to subscribe so that more people can see my content. Let's start with this example that looks complicated, but by the end, I'm sure you'll be able to get it. The question is: what happens when we run this function, "level_six"? We've got two functions, "donkey" and "chonky." "donkey" defines an inner function, which it ends up returning. "chonky" calls "donkey" to get that inner function, and then calls the inner function with an argument. So ultimately, "level_six" returns the result of calling this inner function. The confusion comes from the fact that we have multiple different 'x's, 'y's, and 'z's floating around all over the place. inner() has a parameter 'y', so that one's easy: it's just a local variable. But 'x' and 'z' need to come from somewhere else. It might use this "global x". Or maybe it takes the local 'x' in "chonky", because when the function is actually called (remember, it's the return result of this "donkey" call), the nearest enclosing scope might seem to be the scope of "chonky". As for 'z', since the inner function is defined before this 'z', maybe it uses this outer 'z'. Or maybe it somehow gets the 'z' defined inside "donkey". Or maybe it wants that "donkey" 'z' but can't have it because it isn't defined yet, so we get an error. Feel free to pause and work out the example yourself. But if you're already confused and you know you're not going to get the answer, don't worry. This is a tricky part of Python that doesn't come up that often, so even experienced programmers don't get a lot of practice with it and may not know the rules. So what answer do we actually get? It turns out that it uses the global value of 'x'. Of course, it uses the passed argument for 'y'.
It also uses the value of 'z' in the "donkey" function, the one assigned after the inner function was defined. We're going to start off with much simpler examples than this, but once you know the rule, even this example will make complete sense to you. Keep in mind that the key to understanding all of these is Python's compile time. That's right, compile time. A very common misconception is that Python is not compiled, only interpreted. In fact, it is both compiled and interpreted: module source is first compiled to bytecode, and then the interpreter executes that bytecode at runtime. We often don't realize this compilation is happening because it happens automatically. How variable scoping works inside nested functions is one of the few features that depends on the separation between compile-time and runtime behavior. Here's the one rule you need to remember to make sense of all of these examples: variable lookups happen at runtime, but where Python will look for a variable is determined at compile time. Let's start simple and work our way up. In "level_one", we just say `return x`. There's no 'x' argument; the only 'x' is the global 'x'. Every function's code is compiled ahead of time, and there are no assignments to 'x' in this or any enclosing function scope, so the compiler decides to look up 'x' in the global namespace. You probably didn't need this video to tell you that. In "level_two", once again, we're just returning 'x', but this time we take an argument 'v'. If 'v' is truthy, we assign a local variable 'x'; otherwise, we don't assign to 'x'. So which 'x' is returned? Remember, where we look for 'x' is determined at compile time, and the compiler doesn't use any information about the argument 'v'. The compiler simply notes that somewhere in this function, 'x' is assigned.
Therefore, 'x' is treated as a local variable everywhere in this function. That means if we pass in something truthy, we execute `x = "local x"` and then return that 'x'. But if we pass in something falsy, we get an error. The compiler determined that 'x' was a local variable, and we tried to return that local variable without ever giving it a value. That's why we get this "UnboundLocalError." It doesn't matter that there's a perfectly good variable named 'x' in the global scope; it was determined at compile time that the function would use the local 'x', not that one. Alright, let's keep going. In "level_three", we define our 'outer z', then our inner function, which takes only 'y' and returns 'x', 'y', and 'z', and then we call the inner function with a given argument. At compile time, the compiler sees that there is no local variable 'x' in the inner function and none in the "level_three" function, so it will look for 'x' in the global scope. There is a 'y' in the local scope, so it will use that local 'y'. There's no 'z' in the local scope, so it determines that it will use the 'z' from the outer scope. Not too surprisingly, we see the global 'x', the 'y' argument, and the outer 'z'. Okay, but what if we define "inner" first and assign 'z' afterwards? We do not get an error, even though 'z' isn't defined when the function is defined. Once again, at compile time, the compiler decides that 'x' will be found in the global scope, 'y' in the local scope, and 'z' in the outer scope. It doesn't matter where in the outer scope the assignment is. Somewhere in the "level_three" function, 'z' is assigned a value, which means 'z' is treated as a local variable of "level_three" everywhere in that function. So Python decides that the inner function's 'z' refers to that local variable.
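The first three levels can be reproduced with a short sketch. The exact on-screen code isn't part of this transcript, so the string values ("global x", "local x", "y arg", "outer z") are assumptions chosen to mirror the output described. The last two lines peek at the compiled code objects: `co_names` includes names resolved as globals, and `co_varnames` lists arguments and local variables, which makes the compile-time decision visible.

```python
x = "global x"

def level_one():
    # No assignment to x here or in any enclosing function: compiled as a global lookup.
    return x

def level_two(v):
    # The assignment below makes x local for the *whole* body,
    # whether or not the branch actually runs.
    if v:
        x = "local x"
    return x

def level_three():
    def inner(y):
        # x -> global, y -> local parameter, z -> level_three's local (a free variable here)
        return x, y, z

    z = "outer z"  # assigned after inner is defined; it only needs a value by call time
    return inner("y arg")

print(level_one())                            # global x
print(level_two(True))                        # local x
try:
    level_two(False)
except UnboundLocalError as err:
    print(type(err).__name__)                 # UnboundLocalError
print(level_three())                          # ('global x', 'y arg', 'outer z')
print("x" in level_one.__code__.co_names)     # True: x resolved as a global name
print("x" in level_two.__code__.co_varnames)  # True: x compiled as a local
```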
But remember, Python doesn't look up the value of the variable until runtime, and the runtime for this function is here, when it's actually called. By that time, 'z' is defined. So just as before, we see the global 'x', the 'y' argument, and the outer 'z', even though that outer 'z' wasn't defined at the time the function was defined. To repeat the rule: variable lookups happen at runtime, but where Python will look for a variable is determined at compile time. Let's drive this home with "level_four". Here, we define an outer 'z', then our inner function, then a new value for 'z', and then we call the function. So which value of 'z' does it use? There you see it: it uses the second one. Even though 'z' already had a value when the function was defined, the inner function does not use that value. Instead, Python looks up the value of 'z' in the outer function's scope whenever the function runs, that is, whenever the value of 'z' is accessed. By the time the function runs in this call, the second outer 'z' has already been set, so that's the value it finds. Now, this might be where you start feeling that something is fishy. How does it know? What if, instead of calling this function right here, I had just returned the function and then called it a million lines later? Isn't 'z' just a local variable that's going to disappear and be garbage collected once the outer function returns? This is where we need to talk about closures. Traditionally, a closure is an object that wraps up a function together with some extra environment. In this case, the environment would be something grouped together with the inner function that keeps a reference to the 'z' variable, keeping it from being garbage collected. Unfortunately, this is one of those cases where multiple definitions of "closure" are floating around. Some people use closure to mean the function together with its environment.
Other people, including the people who wrote Python, use closure to mean just the environment part. Printing out the closure of the inner function (its `__closure__` attribute), we see that it's a tuple containing a single element, which is a cell. Python determined at compile time that this cell is where the value of 'z' is stored. The cell holds a reference to a string object, which is the first outer 'z', because we're printing it before we assign the second outer 'z'. Printing out the closure again after we assign the second outer 'z', we see that the cell object itself hasn't changed: it has the same address. However, the string object it references has changed. Using a cell instead of the object itself is how Python ensures you always get the latest value of 'z' at runtime, because the inner function references only the cell, not the string object. This means both that when we run the function we get the latest value of 'z', and that we can define the inner function even when 'z' doesn't have a value yet. In that case, we see that the cell is just empty; once we assign a value to 'z', it gets put in the cell. Also note that there is only one cell, for 'z'. The global variable 'x' does not get a cell. For global variables, the function object instead keeps a reference (its `__globals__` attribute) to the global namespace of the module where it was defined. That means that even if you pass this inner function off somewhere else and call it from a different module, it will still look up global variables in the module it was defined in. This achieves a very similar effect to the closure: the global variable 'x' doesn't need to be defined at the time I define this function, and when I call the function, the most recent value is looked up in the global dictionary. And if my inner function didn't reference 'z', meaning it had no references to any nonlocal variables, then its closure attribute would just be None.
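The cell mechanics described here, together with the "level_four" behavior, can be inspected directly. This sketch relies on the standard function attributes `__closure__`, `__code__.co_freevars`, and `__globals__`, plus a cell's `cell_contents`; the helper names `empty_cell_demo` and `make_reader` and all string values are hypothetical. Reading `cell_contents` from an empty cell raises `ValueError`.

```python
x = "global x"

def level_four():
    z = "outer z 1"

    def inner(y):
        # x -> global lookup, y -> local, z -> read through a cell at call time
        return x, y, z

    cell = inner.__closure__[0]          # the one cell, holding z
    print(inner.__code__.co_freevars)    # ('z',)
    print(cell.cell_contents)            # outer z 1
    z = "outer z 2"                      # rebinding z stores a new object in the same cell
    print(inner.__closure__[0] is cell)  # True: the cell itself is unchanged
    print(cell.cell_contents)            # outer z 2
    return inner("y arg")

def empty_cell_demo():
    def inner():
        return z

    cell = inner.__closure__[0]
    try:
        cell.cell_contents
        was_empty = False
    except ValueError:                   # the cell exists but holds nothing yet
        was_empty = True
    z = "z assigned later"               # now the cell gets a value
    return was_empty, cell.cell_contents, inner()

def make_reader():
    def inner(y):
        return x, y                      # only a global and a local: no cells needed
    return inner

print(level_four())                      # ('global x', 'y arg', 'outer z 2')
print(empty_cell_demo())                 # (True, 'z assigned later', 'z assigned later')
reader = make_reader()
print(reader.__closure__)                # None
print(reader.__globals__ is globals())   # True: globals come from the defining module
```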
Let's move on to "level_five". The point of "level_five" is to show you that although functions are compiled at compile time, meaning their source is translated into bytecode at compile time, the actual function objects that get hooked up to that bytecode are created at runtime. That's what the "def" keyword does. "def" does not compile a new function; it creates a new function object with the given name and hooks it up to the pre-existing bytecode. That means every call to "level_five" creates its own copy of the inner function. Each of these copies is distinct and has its own closure. Every call to "level_five" has its own cell for its own copy of 'z', so the closures of the different copies of the inner function can refer to completely different 'z's. In this call, we pass in "n=0", and in this one, we pass in "n=1". In the first call, we see 'outer z 0', and in the second, we see 'outer z 1'. Of course, this is probably what you expected in this simple case, since there's only one 'z' floating around. And now we're back to "level_six". Let's just follow the rules. There's no assignment to 'x' in "inner", in "donkey", or in "level_six", the scopes that enclose "inner", so 'x' will be looked up in the global scope of this module. 'y' is a parameter of the function, so it'll be looked up as a local variable. There's no assignment to 'z' in "inner", but there is an assignment to 'z' in "donkey". Therefore, 'z' will reference the cell in "donkey", which eventually gets set to "donkey z" and is never modified again. We return the function, and because that function's closure references the cell, it points to "donkey z". That cell is not garbage collected, and it continues to point to the value "donkey z". In "chonky", we call "donkey", and the inner function gets assigned to 'f'. The 'x' here is irrelevant: the inner function will always look in the global scope, as determined at compile time.
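Both examples in this stretch can be sketched in code. Returning `inner` from `level_five` (instead of calling it in place) is an assumption made so that the two copies can be compared, and `level_six` is a reconstruction from the spoken description: where the "outer z" lives is a guess, flagged in a comment.

```python
x = "global x"

def level_five(n):
    z = f"outer z {n}"

    def inner(y):
        return x, y, z

    return inner  # assumption: returned rather than called, to compare copies

f0 = level_five(0)
f1 = level_five(1)
print(f0 is f1)                                # False: each def makes a new function object
print(f0.__code__ is f1.__code__)              # True in CPython: both share bytecode compiled once
print(f0.__closure__[0] is f1.__closure__[0])  # False: each call has its own cell for z
print(f0("y arg"))                             # ('global x', 'y arg', 'outer z 0')
print(f1("y arg"))                             # ('global x', 'y arg', 'outer z 1')

def level_six():
    z = "outer z"  # hypothetical placement of the "outer z"; inner never sees it

    def donkey():
        def inner(y):
            # x: no assignment in inner, donkey, or level_six -> global
            # z: donkey assigns z below -> donkey's cell
            return x, y, z

        z = "donkey z"
        return inner

    def chonky():
        x = "chonky x"  # local to chonky; chonky is not an enclosing scope of inner
        f = donkey()
        return f("y arg")

    return chonky()

print(level_six())  # ('global x', 'y arg', 'donkey z')
```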
When we call 'f', we pass in 'y', which becomes its local variable, and 'z' still references the cell that points to "donkey z". So when we print this out, we see 'global x', 'y arg', and 'donkey z'. Now that you know the rule, it's not so bad, right? Before we get to the final "level_seven", I'd like to go over a few odds and ends. First up, what about lambdas and comprehensions? Lambdas are syntactically a very different way of defining functions: you don't give them a name, their body has to be a single expression, they have an implicit return, and a lambda is an expression, not a statement like "def". Ultimately, though, lambdas are just functions. A lambda is syntactic sugar for defining a function without giving it a name, and all of the scoping rules for lambdas are exactly the same as for a function. So the rules for this lambda are the same as for this function: the code in the body is compiled to bytecode at compile time, variable lookups happen at runtime, and where Python will look for the variables is determined at compile time. Next up, comprehensions. I'm going to use a list comprehension, but it's the same for all of them. You may not have realized it before, but defining a comprehension is like defining a function and immediately calling it. This comprehension is semantically equivalent to this, which is equivalent to this: defining a generator and passing it into a list. And this part, defining the generator, is more or less equivalent to this: it defines a generator function and gives you an instance of it. The scoping rules for generator functions are exactly the same as for normal functions. So whenever you use a comprehension, just pretend you've got a generator function there and apply the normal scoping rules. These cases usually aren't very confusing, since comprehensions are typically very short. In this case, 'x' is treated as a local variable since the for loop assigns to 'x'.
The last thing we need before our final example is: what do "nonlocal" and "global" do? "nonlocal" and "global" don't correspond to any operation that happens at runtime; they are instructions to the compiler. By default, the compiler determines on its own whether each variable is a local, a nonlocal, or a global variable. Because we assign to 'x' in this function, the compiler would normally assume that it's a local variable. "nonlocal x" instructs the compiler to treat 'x' as a local variable of some enclosing function scope rather than of the current function scope. So this assignment actually changes the value in the enclosing function. In our first print statement, we see the "nonlocal x". Calling the function changes the value of 'x' to the overwritten "nonlocal" value. That's the return value, so we see it once. Then we print out 'x' again and see that, indeed, its value has changed. In "main", I'm also printing out the global 'x', so we can see that it hasn't changed. Now let's change this to "global x". In the first print, we see 'nonlocal'. Our call to 'f' changes the global value of 'x' and returns it. When we print this 'x' again, we still see the 'nonlocal x', because in this function 'x' is a local variable. But when we get to the global print, we see that its value has changed. So indeed, we see 'nonlocal', then the overwritten global, then 'nonlocal' again, and then the overwritten global. The lookup rules are exactly the same; you just get to override the compiler when it would have made a choice you didn't want. This is typically only needed if you want to assign to a variable from an outer scope within an inner function. If you're just reading the variable, what the compiler does on its own is usually the most sensible thing. But if you shadow your variables a lot, you might end up reaching for these.
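The nonlocal/global demo can be sketched as two functions, one per directive. The function names `main_nonlocal` and `main_global` and the exact strings are assumptions modeled on the output described; instead of printing, each returns the value of 'x' before the call, the call's return value, and 'x' afterwards.

```python
x = "global x"

def main_nonlocal():
    x = "nonlocal x"

    def f():
        nonlocal x  # compiler directive: x is main_nonlocal's x, not a new local
        x = "overwritten nonlocal"
        return x

    before = x
    returned = f()
    return before, returned, x

def main_global():
    x = "nonlocal x"  # local to main_global; f() below will not touch it

    def f():
        global x  # compiler directive: x is the module-level x
        x = "overwritten global"
        return x

    before = x
    returned = f()
    return before, returned, x

print(main_nonlocal())  # ('nonlocal x', 'overwritten nonlocal', 'overwritten nonlocal')
print(x)                # global x
print(main_global())    # ('nonlocal x', 'overwritten global', 'nonlocal x')
print(x)                # overwritten global
```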
So I'd say if you're using "global" just to read a value, the real solution is to choose a better name that doesn't conflict with a global variable, or better yet, not to use a global variable in the first place if you don't have to. So finally, we come to "level_seven". Please, please, please do not do this. But if you understand the rules, it's a straightforward application of them, and you should be able to see why you get the output that you do. Take a moment to think about it and comment below; I'm just going to blast right ahead. Inside "level_seven", we define "please_dont_do_this". This defines a generator, and it returns an instance of the generator along with a lambda that returns 'a'. Because of this monstrosity, 'a' is determined to be a local variable of "please_dont_do_this". It doesn't matter that this code will never execute: because there's an assignment to 'a' somewhere in the function, 'a' is treated as a local variable of that function. Inside the generator, we explicitly mark 'a' as nonlocal, so every time we resume the generator, it updates the value of that 'a'. Inside the lambda, we don't assign to 'a', so it is also resolved to the same enclosing 'a'. Then we return the generator and the lambda, whose closures both point to the same, still-empty cell for 'a'. In the outer code, we grab the generator and the lambda. If we call the lambda first, we get a NameError: "cannot access free variable 'a' where it is not associated with a value in enclosing scope." The lambda's 'a' points to a cell that is empty; that's why we get this error. But if we advance the generator first and then print the result of calling the lambda, we see the value 0. Every time we call "next" on the generator, it updates the value of 'a', and whenever we print the lambda's result, it gives us whatever the current value of 'a' is. Just imagine the person who's got a bug in their code.
They track it down to the source and find this lambda that just returns 'a', but every time they call it, they seem to get a different value. Gotta say, that would be a pretty bad day for that developer. So anyway, like, comment, and subscribe. Thanks for watching. Thanks to Kevin for submitting the donkey-chonky example. And as always, thank you to my patrons and donors. If you really enjoy my content, please do consider becoming a patron or donor; it does help me out. So please go forth with this knowledge and never do this. See you next time!
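For readers who want to run "level_seven", here is a reconstruction. The transcript doesn't show the exact code, so the never-executed assignment (`if False: a = None`), the counting loop inside the generator, and defining `please_dont_do_this` at module level are all assumptions. The `NameError` message text differs between Python versions, so only the exception type is recorded.

```python
def please_dont_do_this():
    def gen():
        nonlocal a  # binds to please_dont_do_this's a, which lives in a cell
        n = 0
        while True:
            a = n   # each resumption writes a new value into the shared cell
            yield
            n += 1

    if False:
        a = None  # never runs, but still makes a a local (cell) variable here

    return gen(), lambda: a  # the lambda reads the same, still-empty cell

def level_seven():
    g, f = please_dont_do_this()
    seen = []
    try:
        f()
    except NameError as err:   # lambda called before the generator ran: empty cell
        seen.append(type(err).__name__)
    for _ in range(3):
        next(g)                # advance the generator: it stores the next value in a
        seen.append(f())       # the "same" lambda now returns a different value
    return seen

print(level_seven())  # ['NameError', 0, 1, 2]
```

Deleting the `if False:` block would turn `nonlocal a` into a `SyntaxError`, because there would no longer be any binding for 'a' in an enclosing function for the declaration to refer to.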

Info

Channel: mCoding

Views: 62,578

Id: jXugs4B3lwU

Length: 18min 43sec (1123 seconds)

Published: Mon Nov 21 2022