super/MRO, Python's most misunderstood feature.

Captions
Hi, I'm James Murphy and this is mCoding. In today's episode, we're talking about Python's super. super is probably Python's most misunderstood feature. So, today we're going to break it down. We'll cover its basic usage for single and multiple inheritance, go over some of the misconceptions about how it works and what it actually does, and then finally see a pure Python implementation of it. I'm also giving away professional licenses for PyCharm and for CLion. So, comment #pycharm or #clion if you're interested.

Okay, here's the simplest and most common usage of super. We have a Base class that defines some function f. Then we have a Derived class that derives from the Base class and also defines a function f. The Derived class wants to do basically the same thing the Base class does but maybe with a slight modification. Instead of copy-pasting the code from the Base class, which introduces redundant code and is also very error-prone, we'll use super. Inside Derived, super f of x is going to call Base's f of x. The result in this case is that we see the Derived f called, then the Base f called, then the Derived f finishes.

Notice that we didn't pass self into super. But when the base function is called, we see that self was passed as a parameter. So, you should not try to manually put the self parameter in. It's automatically filled in and we'll see more on this later. You should just pass whatever remaining positional and keyword arguments there are, in this case, just x. super also works just fine with class methods. And in that case, it will automatically pass in the class instead of an instance.

Here's a more concrete example. Suppose we want a logging dictionary. I want it to be usable exactly like a dictionary. But whenever I get, set, or delete a key, I want a logging message printed to standard out. Instead of trying to implement our own dictionary, we just inherit from the built-in dict and then use super in the setitem, getitem and delitem methods.
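
A minimal sketch of the logging dictionary just described might look like the following. The class name `LoggingDict` and the exact log messages are assumptions for illustration, not taken from the video.

```python
class LoggingDict(dict):
    """A dict that prints a message on get, set, and delete (illustrative name)."""

    def __setitem__(self, key, value):
        print(f"Setting {key!r} to {value!r}")
        # self is filled in automatically; pass only the remaining arguments.
        super().__setitem__(key, value)

    def __getitem__(self, key):
        print(f"Getting {key!r}")
        return super().__getitem__(key)

    def __delitem__(self, key):
        print(f"Deleting {key!r}")
        super().__delitem__(key)


d = LoggingDict()
d[0] = "subscribe"
value = d[0]
del d[0]
```

One caveat with this approach: built-in dict methods such as `update` or `get` do not route through these overrides, so only the bracket syntax is logged here.
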
We print our logging message and then the super call ensures that the real dictionary functionality still goes through. We can then use a logging dictionary in much the same way that we use a normal one. Create it. Then we can set d of 0 equals "subscribe". We can grab out the value of d of 0 and delete it. In just a few lines of code, super allowed us to have a fully functioning dictionary that also gives us our desired logging messages.

Before we move on, although I think it's extremely important to understand how super works and the more complicated features of it, I think that for 99% of you, this basic use case of super is all you will ever need. And for 1% of you, congrats on your incredible job security.

Okay, quiz time. Suppose we have a class A and a.f works. Then we have a class B deriving from A that defines its own f and calls super f. Does super f necessarily call the f defined in A? Pause and think about it if you'd like. Okay, here's the answer. The answer is... No. In this case, although you can call f on an instance of A, A didn't define its own f. So, when you call super f, it actually goes up to the Root's f. So, in this case, super f did not go to the parent of B. It went further up the chain.

Okay, question two: Suppose we have a Root class and A inheriting from Root. In this case, both the Root class and A define their own version of f. So, my question is: Does this super f necessarily call the Root's f function? Once again, pause to think about it. Once again, the answer is no. When I run the example I'm about to show you, we see this A.f. So, that code is running. But the super call ended up calling some other function, B.f. Root.f never got printed out. So, the parent class's f was never called at all. This happened because A.f wasn't called on an instance of A. It was the result of a super call from a child class. I created a sibling class of A, called B, which also inherits from Root.
Then I created a C which inherits from A and B. We instantiate an instance of the C class and call its f function. The super f call in C calls A's f. But because of the multiple inheritance, the super call in A doesn't go up to A's parent; instead it goes to A's sibling B. The B class happens to not make any super call. So, the Root class's f is never called. If we add the super call into the B class, then the Root will eventually be called. But notice, the super call of A still called B, not the parent Root.

So, what's going on here? Why does super sometimes call a parent, sometimes call a sibling? It could even call different things based on the class of the object it was called on. If I had an instance of A, this super call would go to Root. But from an instance of C, that same super call went to B. The answer has to do with what's called the class's Method Resolution Order or MRO. It's actually kind of a misnomer because it applies to all attribute accesses, not just method lookups.

Consider the same inheritance hierarchy but instead of f being a function just let it be a variable. We create a C object and print c.f. Of course, C defines an f, so it should print C. But what if C didn't define its own f? Well, C inherits from both A and B. And A and B both do define f. So, when we look up c.f, we should be able to find one of these f's. But which one should we choose? And if neither A nor B defined f, we should still be able to find it all the way up at the Root.

By now, I hope you can see that this is a search problem. Given a class, when I try to look up an attribute, I need to decide on some order to look through that class and its parents and their parents and so on and so forth. This is what the Method Resolution Order for a class does. Here, the full class representations are kind of long. So, I'm also just printing out the names of the classes in the MRO. So, as you can see, the MRO for the class C is C, A, B, Root, object.
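
The diamond hierarchy discussed above can be sketched roughly as follows. The class names follow the transcript; having each f return a list of names instead of printing is an assumption made so the call order is easy to inspect. Here B does make the super call, so Root is reached.

```python
class Root:
    def f(self):
        return ["Root.f"]


class A(Root):
    def f(self):
        # "Next in line" after A depends on the MRO of type(self).
        return ["A.f"] + super().f()


class B(Root):
    def f(self):
        return ["B.f"] + super().f()


class C(A, B):
    def f(self):
        return ["C.f"] + super().f()


mro_names = [cls.__name__ for cls in C.__mro__]
print(mro_names)   # ['C', 'A', 'B', 'Root', 'object']
print(C().f())     # ['C.f', 'A.f', 'B.f', 'Root.f']
print(A().f())     # ['A.f', 'Root.f']
```

The same `super().f()` line inside A goes to B for a C instance and to Root for an A instance, because the search always follows the MRO of the object's own class.
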
That means, when you try to look up a value in C, it will first check in C, then in A, then in B, then in Root, then in object. If Python searches through all of these and still can't find your attribute, then an AttributeError is raised. Hmm. C, A, B, Root. That's exactly the order we saw when we were doing printouts with super calls. Remember this example. We saw C.f then A.f then B.f then Root.f.

So, finally we can understand where super is taking us. A super call does not take you to the object's parent class. It takes you to the next thing after the current class in the object's MRO. The way you should read and think about super in your head is next in line. super means next in line. In a single inheritance case that line is simple. Next in line is your parent. But with multiple inheritance, that's not necessarily the case.

As a programmer, there are just a few different properties of the MRO that you need to keep in mind. The MRO of any class starts with the class and eventually ends in object, because all Python objects eventually inherit from object. The next property is that a child goes before its parents and the parents must maintain their relative order as they appear in the declaration of the child. So, C goes before A and A goes before B. In this case, the resulting MRO is C, A, B, object. The third important property is that the MRO of a child must be an extension of the MRO of each of its parents. That means each parent's MRO is a subsequence of the child's. This property ensures that an object's parents and their parents and their parents and so on all appear in the MRO.

And the last important property of the MRO is that if these rules create a contradiction, then you'll get an error when you try to define the class. We'll get an error trying to create this D class because A is supposed to come before C. But C is supposed to come before A. That's a contradiction. So, we get an error telling us that we can't create a consistent MRO.
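
The contradiction described above can be reproduced with a small sketch. The exact hierarchy from the video is not in the transcript, so the one below is an assumption: C derives from A, and D lists A before C.

```python
class A:
    pass


class C(A):
    pass


error_message = None
try:
    # D's declaration says A comes before C, but C is a child of A,
    # so C must come before A. Python cannot satisfy both constraints.
    class D(A, C):
        pass
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Swapping the bases to `class D(C, A)` removes the contradiction, since a child is then listed before its parent.
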
But in the real world, you don't need to use logic to figure out the MRO; just access the dunder mro attribute. Still though, even if you understand MRO and you understand that super means go to the next thing in the MRO and call that, this still presents an interesting design challenge. When I design a class, I can't possibly imagine every single possible class that could be the next thing in the MRO.

This poses two major issues. First: If a super call isn't necessarily going to call my parent's version of the function, then how do I ensure that my parent's version of the function is eventually called? Presumably, I need that functionality. That's why I'm inheriting from my parent. And secondly, since a super call can go basically anywhere, how do I know that the parameters that I'm passing to that call are correct? For instance, what if the next class in line needs an x parameter that I don't have to give?

In general, it's not possible to make super work with an arbitrary class hierarchy. Instead, the solution is cooperative inheritance. The only way you can make this work is if through some means you can ensure that all classes in the hierarchy obey certain rules. The first rule is that there should be some Root class that everything in the hierarchy inherits from. An explicit Root class is often the easiest to work with. But it's also okay if everything ultimately inherits from a built-in like set or dict.

The second rule is that if you use a super call in one version of a method, then you should use the super call in every version of the method in the whole hierarchy except, possibly, the Root. This is what will ensure that a super call will eventually result in your parent class's version of the function being called. The parent class might not be the next in line. But whatever is will also make a super call and then whatever's next will also make a super call and eventually it will arrive at your parent.
The Root's version of that method will then act as a sink to end the chain of super calls.

Now, let's see how we make sure the arguments that we pass to a super call make sense. The first solution is one we've already seen and it's the one that dict uses. Just make all of the versions of the function take the exact same arguments. This is a great first choice but it's also very restrictive. Consider an init method. If I have a whole hierarchy of classes, I can't very well expect them to take the same parameters in their init. In this case, I recommend what I call keyword argument peeling. Do you know what signature can match any function? *args, **kwargs. Make your function take any positional arguments and any keyword arguments. And then just peel off the keyword arguments that you actually want.

In this case, I'm defining a ValidatedSet. It's just a set. But when you try to add an element into it, it checks that the element is valid. The initializer takes in an optional list of validators. It doesn't matter what other keyword arguments you pass to this function. Anything that's not validators gets shoved into kwargs, which is ignored and then passed on through the super call. If every class in the hierarchy cooperates, this will eventually pass all the unused args and kwargs to the Root object, which in this case is set. set will of course eat all the remaining positional arguments and error if there are any remaining keyword arguments. For the add method, we take the simpler approach. Every add method should take one positional argument.

We can then use the validated set like this. We pass in a list of validators; in this case I'm just checking that all the elements are integers. But trying to add the string "5" results in an error.

Let's see another example. This time I'll have a ReducedSet. Whenever you try to add something into a ReducedSet, it first reduces the element before adding it into the set.
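
Keyword argument peeling can be sketched as below. This is not the code from the video: the parameter names `validators`, `reducer` and `n`, the helper `is_int`, the `ValueError`, and the way `ModularSet` combines the two mixins are all assumptions chosen to illustrate the technique in this part of the talk.

```python
def is_int(x):
    """Hypothetical validator: accept only integers."""
    return isinstance(x, int)


class ValidatedSet(set):
    def __init__(self, *args, validators=None, **kwargs):
        # Peel off the keyword we care about; forward everything else.
        self.validators = list(validators) if validators is not None else []
        super().__init__(*args, **kwargs)

    def add(self, element):
        for validate in self.validators:
            if not validate(element):
                raise ValueError(f"invalid element: {element!r}")
        super().add(element)


class ReducedSet(set):
    def __init__(self, *args, reducer=None, **kwargs):
        self.reducer = reducer
        super().__init__(*args, **kwargs)

    def add(self, element):
        if self.reducer is not None:
            element = self.reducer(element)
        super().add(element)


class ModularSet(ValidatedSet, ReducedSet):
    """Integers mod n: validate first, then reduce, then store."""

    def __init__(self, *args, n, **kwargs):
        super().__init__(
            *args, validators=[is_int], reducer=lambda x: x % n, **kwargs
        )


mod5 = ModularSet(n=5)
for value in [0, 1, 2, 5, 10]:
    mod5.add(value)
print(sorted(mod5))  # [0, 1, 2]

try:
    mod5.add("5")
    rejected = False
except ValueError:
    rejected = True
```

Each initializer removes only its own keyword, so by the time the call reaches `set.__init__` no keyword arguments are left. Note that in this sketch elements passed positionally to the constructor go straight to `set.__init__` and are neither validated nor reduced.
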
Again we take *args, **kwargs and an optional function to reduce the elements. This time when we try to add an element, we first reduce it before passing it along through a super call. Then we can create a modular set which is both a validated set and a reduced set. I'm thinking about a modular set as being like integers mod n. So, I pass on the validators list as being just a single-element list containing is_int. And I take the reducer to be a function which reduces things mod n. We can then take a set of integers mod 5. 5 and 10 are multiples of 5. So, they'll wrap around and be reduced to 0. So, when we run the example, we just see 0, 1 and 2. And if we try to add something that's not an int, we get an error.

At this point, I hope we understand a little bit about how to use super in both single and multiple inheritance settings. But have you ever thought about what super is actually doing? There's clearly some weird behavior going on. First off, I can't even use the zero argument super outside of a class. I just get a RuntimeError.

Okay. So, let's create A, and B that inherits from A, and they both have functions f. And B calls super f. Let's try extracting super into a variable and then using it. Okay, it still works if we do that. Quiz time. What is this super variable? Take a moment to think about it. People often think that super returns the parent class object. Or if you're more enlightened, maybe you would say that it returns the next in line class object. But this is not what super actually returns. The return value of a super call is an instance of the super class. That's right. super is just a class and a super call is just constructing an instance of that class.

An instance of super is what's called a proxy object. It's a wrapper class that stores an object and then forwards attribute lookups to that object. Here's an example of the most basic kind of proxy object. I literally take in an object to the constructor and store it.
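
A basic forwarding proxy of the kind just introduced could be sketched like this. The class name `Proxy` and the attribute name `_obj` are illustrative assumptions.

```python
class Proxy:
    """Stores an object and forwards attribute lookups to it."""

    def __init__(self, obj):
        self._obj = obj

    def __getattr__(self, name):
        # Only reached when normal lookup on the proxy itself fails,
        # so self._obj (found normally) does not recurse.
        return getattr(self._obj, name)


numbers = [1, 2, 3]
proxy = Proxy(numbers)
proxy.append(4)   # forwarded to numbers.append
print(numbers)    # [1, 2, 3, 4]
```
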
The dunder getattr method is what's called when you use dot something on an instance of a class and the normal lookup doesn't find that attribute. So, if I called proxy.abc, then the string abc would be passed to this function. We then use the built-in getattr to pass that same request onto the stored object and return it. We could then use it like this. Say, I create an object, a list [1, 2, 3]. I create a proxy for that object and then I say proxy.append(4). proxy.append will call the dunder getattr and that will then return the underlying list's append function. We then call it with 4 and 4 gets appended to the underlying list.

But super doesn't just forward attribute requests and return the result. It does something a little bit different. Here's a proxy object that's a little bit more similar to what super actually does. This proxy object takes in a class and an object and it stores them. Then to look up an item, it forwards it to the class object. If the result is something like a function that has a dunder get method, then it calls that dunder get method to bind, say, the function to that object and return the function. Now, we can use our kinda super proxy object kind of like we use super. Then when we run it, we see the same B A printout that we would if we used super.

But there's one big difference here between this and the real super. Here, I'm just passing in A, telling it where the super call should go. But with the real super we didn't pass any arguments at all. So, how can it possibly know what class it's being run from, what the self object is, or where the super call should go next? Hold on to your shoes because super is doing something pretty cursed here.

Did you know that in Python you can get access to the currently executing stack frame? Let's take the current frame, go one function up in the stack and look at what the local variables in that scope are. This allows us to look up the local variables of the function that called this function. Take a look.
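
The frame trick just described can be sketched with the standard `inspect` module. The function name `print_callers_locals` comes from the transcript; returning the dictionary as well as printing it, and the `demo` and `Widget` names, are assumptions added for illustration. `inspect.currentframe` relies on interpreter support for stack frames, which CPython provides.

```python
import inspect


def print_callers_locals():
    """Print (and return) the local variables of whoever called us."""
    frame = inspect.currentframe()
    caller_locals = dict(frame.f_back.f_locals)  # one step up the stack
    print(caller_locals)
    return caller_locals


def demo():
    x = 5
    s = "subscribe"
    return print_callers_locals()


class Widget:
    def method(self):
        __class__  # merely mentioning the name makes it visible in this scope
        return print_callers_locals()


seen = demo()
seen_in_method = Widget().method()
```

The second call shows both `self` and `__class__` among the method's locals, even though nothing was passed to `print_callers_locals`.
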
We have x equals five, s equals "subscribe" and then we call this print_callers_locals. As you can see, print_callers_locals was able to see the x equals five and s equals "subscribe". Okay, let's create a class and a method and call print_callers_locals inside the method. We didn't pass any parameters to print_callers_locals but clearly it can see the self variable.

But don't worry. It gets even more cursed. super can use this trick to find the self parameter, or actually what it does is look at the first positional argument. But where does it get the information about where this code is being run from? Well, watch what happens if I so much as mention the word super inside the function. I didn't even call it. I just mentioned the word super. And now we have a new local variable in our function: dunder class. That's right. If you so much as mention the word super inside of a function, then Python will add this secret dunder class variable that contains the class that is currently executing. The same thing happens if you just mention dunder class. So now, a zero argument super can gather all the information that it needs by looking in its caller's stack frame.

This is super's true form. What's actually happening with the zero argument form of super is it's sneakily going up the stack frame and grabbing these two variables. The two argument form of super is how you should really think about it. It's a proxy object that stores the class that it's currently being run from and the object that it's currently being run on. Then it uses the MRO to find the next thing in line. It's just compiler magic that this works if you don't pass in the arguments.

You can actually use this two argument form of super if you want and you can actually use it anywhere, even outside of a class. You just tell it what class it's supposed to pretend it's running from and what object to proxy. I create a B object, create a super that's supposed to be running from within B on a little instance b.
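
A short sketch of the two argument form used outside of any class follows. The classes A and B mirror the ones named in the transcript; the return values are assumptions chosen so the result is easy to check.

```python
class A:
    def f(self):
        return "A.f"


class B(A):
    def f(self):
        return "B.f"


b = B()
proxy = super(B, b)   # pretend to run from inside B, on the instance b
method = proxy.f      # the next f after B in type(b)'s MRO, bound to b
result = method()
print(result)         # A.f

# The proxy records what it was built from.
print(proxy.__self__ is b)        # the proxied object
print(proxy.__thisclass__ is B)   # the class it pretends to run from
print(proxy.__self_class__ is B)  # the class whose MRO is searched

# Passing a class as the second argument, as a class method would.
class_proxy = super(B, B)
print(class_proxy.__self_class__ is B)
```
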
Then when I go to look up the method f, I get the method from A bound to the B object. Then you could call it just like you would any other super call.

If you want to see the information a super object has stored, you can access the dunder self, dunder self_class and dunder thisclass attributes. self is going to be the object little b that it's supposed to be proxying. thisclass is the first argument, big B, where it's supposed to be pretending to run the code from that class. self_class is used in order to make super work both with B instances and if you pass in the B class. If I pass in the B class, like what would happen if I was using a class method, then self_class will be the class itself rather than the type of that class, which would be type. All three of these can actually be different, typically if your super call is the result of another super call. That would cause the class that the code is being run from to be different from the type of the object.

There's also a single argument form of super where you just pass the class and don't pass the object to bind. The only thing useful you can do with a super that hasn't been bound to an object is to bind it to an object at some later time. But in my experience, I have never needed to do this.

So finally, are you ready to see my pure Python implementation of super? It's not for the faint of heart. So, thank you if you want to see it. It's also probably full of bugs but it does work. First off, in init, we take the class and the object to proxy. The most interesting case is when both of these are None. That's the zero argument super. In that case, we get the current stack frame, look up the local variables of our caller and find the first positional argument, which is probably going to be self. Then we look up the special dunder class variable. Now, this is one exception that I do have to make. The real super gets special treatment.
In that, if you even mention its name, Python will automatically fill in this dunder class variable in the caller's local variables. Well, I don't get that special treatment. So, what I'm going to say is if you want to use my super, then you have to mention dunder class somewhere in your function. Either do that or use the two argument form. That's the only exception that I'm making.

So, I try to look up the dunder class variable, then we set our thisclass variable and then use this bind_self method to set our self and self_class variables. This case is for the one argument form and I just set self and self_class to None. Otherwise, we check if our object is itself a class. In that case, both self and self_class are that class. Otherwise, the object better be an instance of the given class. In which case, the self is our object and the self_class is its type. Let's ignore dunder get. That's just to make the single argument version of super work, which you shouldn't even be using anyway.

Then we get to the meat of super, the dunder getattr, the thing that actually does the proxying. We start by looking up the MRO. And then take the next index after thisclass in the MRO. This is the next in line part of super. We look up thisclass in the MRO and then go to the next index. Then we start looping over the rest of the MRO. We grab out the ith class, then we try to look up the item in that class's dictionary. If the lookup fails, then we just pass and go to the next iteration. Otherwise, we check to see if the thing that we got back has a dunder get method, like a function. And if it does have a dunder get method, then we need to bind the function to the correct argument. After we bind the instance, then we return it. If we look through the entire MRO and still didn't find it, then we just raise an AttributeError.

And here's how you can use it. I just have a simple class hierarchy. I just mentioned dunder class and then I use super with zero arguments just like I would the normal super.
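
The steps walked through above can be condensed into the following sketch. It is not the code shown in the video: the name `MySuper` and its private attribute names are hypothetical, and the one argument form (and its dunder get support) is deliberately left out. As in the description, the zero argument form only works if the calling method mentions `__class__`.

```python
import inspect


class MySuper:
    """Simplified pure-Python stand-in for super() (illustrative only)."""

    def __init__(self, cls=None, obj=None):
        if cls is None and obj is None:
            # Zero-argument form: peek into the caller's stack frame.
            caller = inspect.currentframe().f_back
            code = caller.f_code
            if code.co_argcount == 0:
                raise RuntimeError("MySuper(): caller takes no positional arguments")
            # First positional parameter, usually self (or cls).
            obj = caller.f_locals[code.co_varnames[0]]
            try:
                cls = caller.f_locals["__class__"]
            except KeyError:
                raise RuntimeError(
                    "MySuper(): mention __class__ in the calling method"
                ) from None
        elif cls is None or obj is None:
            raise TypeError("only the zero- and two-argument forms are sketched")

        self._thisclass = cls
        if isinstance(obj, type) and issubclass(obj, cls):
            self._self, self._self_class = obj, obj        # class-method case
        elif isinstance(obj, cls):
            self._self, self._self_class = obj, type(obj)  # instance case
        else:
            raise TypeError("obj must be an instance or subclass of cls")

    def __getattr__(self, name):
        mro = self._self_class.__mro__
        start = mro.index(self._thisclass) + 1  # "next in line"
        for klass in mro[start:]:
            try:
                attr = vars(klass)[name]        # that class's own dictionary
            except KeyError:
                continue
            if hasattr(attr, "__get__"):
                # Bind descriptors such as functions to the proxied object.
                instance = None if self._self is self._self_class else self._self
                return attr.__get__(instance, self._self_class)
            return attr
        raise AttributeError(name)


class Root:
    def f(self):
        return ["Root.f"]


class A(Root):
    def f(self):
        __class__  # required so the caller's frame exposes the class
        return ["A.f"] + MySuper().f()


class B(Root):
    def f(self):
        __class__
        return ["B.f"] + MySuper().f()


class C(A, B):
    def f(self):
        __class__
        return ["C.f"] + MySuper().f()


print(C().f())  # ['C.f', 'A.f', 'B.f', 'Root.f']
```
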
Well, that's all I've got. Thank you so much for watching to the end if you got this far. Don't forget about the giveaway. Check the description for more info on that. As always, don't forget to comment, subscribe and slap that like button an odd number of times. See you next time.
Info
Channel: mCoding
Views: 186,000
Keywords: python
Id: X1PQ7zzltz4
Length: 21min 7sec (1267 seconds)
Published: Tue Mar 15 2022