8 things in Python you didn't realize are descriptors

Captions

Hello and welcome to mCoding. I'm James Murphy. Let's get going talking about descriptors. Contrary to how the name might sound, descriptors are not the mutual enemy of Autobots and Decepticons, nor do they have anything to do with descriptions. Descriptors in Python are something of a feature hidden in plain sight. Officially, an object is a descriptor if it has any of the dunder methods `__get__`, `__set__`, or `__delete__`. Their purpose is to let you customize what it means to get, set, or delete an attribute. In this case, `x` is an instance of the descriptor on the left. When you evaluate `obj.x`, this calls the descriptor's `__get__` method. When you execute `obj.x = something`, that calls the `__set__` method. And when you execute `del obj.x`, that calls the `__delete__` method. `__get__` takes the object and the object type. Importantly, this allows you to do different things based on whether something was called from an instance or the class itself.
If this is your first time hearing about descriptors in Python, you're probably thinking this is one of those very niche features. It's advanced use only. And, by the way, isn't that the same thing as `__getattr__`, `__setattr__`, and `__delattr__`? Why is there a need for descriptors at all? It is true that without knowing more about the internals of your class, `something.x` could be calling `__getattr__`, or it could be calling the `__get__` method of a descriptor. `obj.x = something` could be calling `__setattr__`, or it could be calling the `__set__` of a descriptor. And `del obj.x` could be calling `__delattr__`, or the `__delete__` of a descriptor. Without seeing the internals of the class, you just can't tell. But there's a big difference between `__getattr__`, `__setattr__`, and `__delattr__` versus the descriptor versions. Namely, the methods on the right-hand side are defined per class. Whereas, the ones on the left-hand side are defined per attribute. On the right, it's the class that's determining how to access attributes.
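The on-screen comparison described here is not part of the captions. A minimal sketch of what it might look like follows; the class names `Traced`, `PerAttribute`, and `PerClass` and the `"_x"` storage key are assumptions made for illustration, not the code from the video.

```python
class Traced:
    """Descriptor: controls get, set, and delete for ONE attribute."""

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # looked up on the class, e.g. PerAttribute.x
        return obj.__dict__.get("_x", "unset")

    def __set__(self, obj, value):
        obj.__dict__["_x"] = value  # "_x" is an assumed storage key

    def __delete__(self, obj):
        obj.__dict__.pop("_x", None)


class PerAttribute:
    x = Traced()  # only `x` is customized; other attributes behave normally


class PerClass:
    """Class-level hook: one method answers for every missing attribute."""

    def __getattr__(self, name):
        return f"fallback for {name}"


a = PerAttribute()
a.x = 5               # calls Traced.__set__
value = a.x           # calls Traced.__get__
del a.x               # calls Traced.__delete__
after_delete = a.x    # __get__ again, nothing stored any more

b = PerClass()
fallback = b.anything  # normal lookup fails, so __getattr__ runs
```

In this sketch the descriptor only affects `x`, while `PerClass.__getattr__` is consulted for any name that ordinary lookup cannot find.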
Whereas, on the left, it's the attribute itself that determines how it's accessed. Even still, you may be thinking: "Descriptors? I'm never going to need that, right?" "I never define descriptors, I never work with descriptors." "I don't need to know." Well, welcome to my list of descriptors hiding in plain sight. You may not have realized it, but you're using them all the time.
Descriptor number one: functions. Have you ever noticed the difference you get when you access a function through an instance versus through the class? Accessing the function `f` through an instance, little `a`, we get a bound method. But accessing the same `f` through the class itself gives us a function object. It's the same `f` in both cases, but I'm getting different results. That's because every function you define with the `def` keyword is a descriptor that defines a `__get__` method. And it uses the descriptor protocol to do something different based on whether it was called from an instance or the class itself. Functions in Python are written in C. So this isn't exactly what it's doing. But it's something close to this. Every function has a `__get__` method. If the object is `None`, then there's no instance associated with this lookup, meaning it was called from the class object itself. In that case, we just return the function as is. Otherwise, the object isn't `None`, and we're in a case like this, where we're looking up a function on an actual instance. In that case, instead of returning the function itself, we return some kind of bound function that remembers the object. So, yeah, if you're using functions, then you're using descriptors.
Descriptor number two hiding in plain sight: properties. And yes, this is a different case than just functions. Let me show you why. Let's reach into the class's dictionary and print out what this `area` thing actually is. I'm accessing it through the dictionary like this in order to avoid invoking the descriptor.
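The two lookups just described can be reproduced with a short sketch. The classes `A` and `Square` are assumed stand-ins for the on-screen code; only the names `f`, `a`, and `area` come from the captions.

```python
import types


class A:
    def f(self):
        return "called f"


a = A()
bound = a.f   # through an instance: a bound method
plain = A.f   # through the class: the plain function

# Calling the function's __get__ by hand yields the same kind of bound method.
manual = A.__dict__["f"].__get__(a, A)


class Square:  # hypothetical class with an `area` property
    def __init__(self, side):
        self.side = side

    @property
    def area(self):
        return self.side ** 2


# Going through the class dictionary bypasses the descriptor protocol,
# so this is the raw property object rather than a computed number.
raw = Square.__dict__["area"]
```

Here `bound.__func__` is the very same object as `plain`, which is one way to see that only the lookup path differs.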
We see that `area` is not actually a function. It's a property object. First, `area` is defined as a function just like normal. Then, you replace `area` with whatever you get by calling `property` on it. That's why technically, `area` is a property object, not a function. So the decorator form is exactly the same as assigning `area = property(area)` after the definition. And property is a descriptor. So, it controls what happens when you say `.area`. It just so happens that what it does is call that original `area` function. Properties like this are never really needed. You could always just call the function directly. But it's a common way to indicate to the programmer that this thing is really cheap to compute. By making `area` a property, you're basically telling users that `area` is so cheap to compute that it's basically as free as an attribute access. If for some reason your `area` function really was expensive, then make it look expensive. Don't hide the fact that it's a function call. Just make that explicit.
Funnily enough, the main reason that I see people use properties is actually to introduce a feature to Python that was specifically left out of the language. By design, in Python, all attributes are public. There's no way to prevent someone from accessing internal implementation details of your classes. And aside from inheriting from a built-in type like `tuple`, this also makes it impossible to make truly immutable types. But this is a pretty common pattern to prevent people from accidentally mutating your object. Add an underscore to the beginning of your attribute name. Then make a property with the same name without the underscore. People can still read the name, no problem. But if they try to write to it, they get an error. But of course, this isn't true immutability. Someone could just reach inside and manually change the underscore variable. But it's pretty much an unspoken rule in Python that if you have an underscore variable or underscore function, then you're not meant to touch those.
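The read-only pattern described above might look like the following sketch. The `Person` class and its `name` attribute are assumptions chosen for illustration.

```python
class Person:  # hypothetical example class
    def __init__(self, name):
        self._name = name  # underscore: internal by convention only

    @property
    def name(self):
        return self._name  # readable, but no setter is defined


p = Person("Ada")
read_back = p.name

error = None
try:
    p.name = "Grace"  # a property without a setter rejects assignment
except AttributeError as exc:
    error = exc

# Not true immutability: the underscore attribute is still reachable.
p._name = "Grace"
```

The assignment fails because a property object always acts as a data descriptor, so it intercepts writes even when no setter function was supplied.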
So if you do change an underscore variable or call an underscore function, then you should expect everything to break. It's your own fault. Anyway, here's how you might implement property if you were doing it yourself. The built-in property also handles `__set__` and `__delete__`, but you get the idea. As per usual, if you weren't passed an instance, then just return the property itself. Otherwise, call the stored function on the instance that was passed in.
Hidden descriptor number three: class methods and static methods. Both class and static methods allow you to call a function whether you have an instance of the class or the class itself. In both cases, since you might not have an instance to work with, there's no `self` parameter. And the difference is that a static method has no implied parameters. Whereas, a class method has an implied class parameter. So in both of these cases, whether you called with the capital `Animal` class or the lowercase `animal` instance, the class parameter of the `create` function will be filled in with the `Animal` class. I have a whole video on class methods versus static methods. Check that out if you want to hear more. As far as possible implementations go, they could look something like this. Just like properties, both of these take and remember the function that they're applied to. Static method is much simpler. Whether you were called with an instance or not, just always return the function back. Class method is a bit trickier because we need to supply that class parameter. If we weren't passed the type to use, then we just use the type of the object. Then this is how we bind that type to the class parameter. Remember, functions are descriptors. And the `__get__` method returns a bound version of the function where the first argument is bound to the first argument of the `__get__`. It's not totally clear if the second argument matters at all. But this works.
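The implementations referred to above are shown on screen, not in the captions. The following is a pure-Python sketch along the lines described; the names `MyProperty`, `MyStaticMethod`, and `MyClassMethod` are made up here, and the real built-ins are written in C and do more (for example, the built-in `property` also supports setters and deleters).

```python
class MyProperty:
    """Getter-only sketch; unlike the built-in, it defines no __set__."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self          # accessed on the class: return the property
        return self.fget(obj)    # accessed on an instance: call the function


class MyStaticMethod:
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        return self.func         # instance or class, hand back the function


class MyClassMethod:
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        if objtype is None:
            objtype = type(obj)
        # Functions are descriptors: binding to the class fills in `cls`.
        return self.func.__get__(objtype, type(objtype))


class Animal:
    def __init__(self, name):
        self.name = name

    @MyProperty
    def shout(self):
        return self.name.upper()

    @MyStaticMethod
    def kingdom():
        return "Animalia"

    @MyClassMethod
    def create(cls, name):
        return cls(name)


animal = Animal.create("rex")      # called on the class
sibling = animal.create("bo")      # called on an instance, same `cls`
```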
So as you can see, descriptors are often used to make sort of function object-like things. But that's not all they're useful for.
Let's take a look at number four: slots. This is another one that I have a full video on, but here's the quick rundown. Normally, objects have an instance dictionary. Anytime you store a variable into the object, it really just stores it inside this dictionary. But especially for small objects, dictionaries aren't necessarily the most efficient way to store things. If you define `__slots__ = ['x', 'y', 'z']`, then you're saying the only three attributes that my instances are going to have are `x`, `y`, and `z`. You can get, set, and delete `x`, `y`, and `z` no problem. But if you try to get, set, or delete `w`, then you get an error. Once again, directly reaching inside the class's dictionary, we see that `x` is a member object. These `__slots__` members also define all three of `__get__`, `__set__`, and `__delete__`. And because there's no instance dictionary to manage this, these `__get__`, `__set__`, and `__delete__` have to reach into the underlying C structure of the objects and manually modify them. Again, see my video on slots if you'd like to hear more.
And speaking of instance dictionaries, do you know what else are descriptors? Instance dictionaries. The `__dict__` attribute of any class that has instance dictionaries isn't a dictionary. It's an attribute object, which is a descriptor with all three of `__get__`, `__set__`, and `__delete__`. And notice this weird idiom that I had to do in order to see this attribute object. I had to reach into the dictionary of the class and then read the `__dict__` entry. If you print out just the class's `__dict__`, it looks like a dictionary. But if you look at the type, you see that it's actually a mapping proxy object. This happens because, remember, `__dict__` is a descriptor. So accessing `E.__dict__` invokes the descriptor. And what the descriptor does is return this proxy object instead of the actual attribute.
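Both lookups can be checked with a short sketch. The class name `Point` is an assumption; `E` is the class name used in the captions. The type names in the comments are what CPython reports, which is worth keeping in mind on other interpreters.

```python
import types


class Point:  # hypothetical slotted class
    __slots__ = ["x", "y", "z"]


class E:  # ordinary class whose instances have a __dict__
    pass


p = Point()
p.x = 1
w_rejected = False
try:
    p.w = 2  # `w` is not listed in __slots__
except AttributeError:
    w_rejected = True

# Reach into the class dictionary to avoid invoking the descriptors.
slot = Point.__dict__["x"]           # CPython: a member descriptor
dict_attr = E.__dict__["__dict__"]   # CPython: a getset descriptor

e = E()
e.a = 1
# Invoking the descriptor by hand returns the instance dictionary.
instance_dict = dict_attr.__get__(e, E)
```

The "double `__dict__`" idiom is needed because `E.__dict__` on its own already goes through a descriptor and yields a read-only `mappingproxy`.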
So that's why we had to do this double `__dict__` thing.
Moving on, how about a real-world library use case of descriptors? Look no further than one of the most popular Python packages of all time, SQLAlchemy. SQLAlchemy lets you communicate with databases through Python in a pythonic way. Here, I define a user account table with two fields, ID and name. When I define the class, I say that ID is a column that takes integers, and name is a column that takes strings. When I operate on an actual instance of the class, I'm working not with column objects but with actual ints and strings. Getting different behavior on a class versus an instance? You guessed it, they're descriptors.
Number seven: another common use for descriptors is field validation. For example, I want to say here, this item has a price that's greater than zero. Whenever I set a price, I want to make sure that it's positive, and if not, I want an error. This functionality is accomplished in the `__set__` method. Before setting the attribute, check if it's bigger than zero. If it isn't, raise a `ValueError`. Otherwise, proceed with setting the attribute. This is an interesting use case for Python's `__set_name__`. This function is called at class construction time. And its purpose is to let each object know what its name is. After the class body runs, this `greater_than` object will be told that its name is `price` by having `__set_name__` called with the owner being the `Item` class and the name being `price`. In this case, I'll prepend an underscore to the name. And then use that as the sort of private location where I'm storing the actual data. So the descriptor is stored at `Item.price`. But the value that's underlying the price, the actual price, gets stored at `item._price`. Knowing the name that we're assigned to is a good way to avoid conflicts if you have multiple of these descriptors in the same class.
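A validating descriptor along these lines might be sketched as follows. The class name `GreaterThan`, the `minimum` parameter, and the `private_name` attribute are assumptions; only `Item`, `price`, `quantity`, and the underscore-prefixed storage come from the captions.

```python
class GreaterThan:
    """Validating descriptor; `minimum` is an assumed constructor argument."""

    def __init__(self, minimum):
        self.minimum = minimum

    def __set_name__(self, owner, name):
        # Called once when the owning class is created, e.g. name="price".
        self.public_name = name
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if not value > self.minimum:
            raise ValueError(
                f"{self.public_name} must be greater than {self.minimum}, "
                f"got {value!r}"
            )
        setattr(obj, self.private_name, value)


class Item:
    price = GreaterThan(0)
    quantity = GreaterThan(0)

    def __init__(self, price, quantity):
        self.price = price        # goes through GreaterThan.__set__
        self.quantity = quantity  # separate descriptor, separate storage


item = Item(9.99, 3)

rejected = False
try:
    item.price = -1
except ValueError:
    rejected = True
```

Because each descriptor learns its own name, `price` and `quantity` end up in `_price` and `_quantity` and never overwrite each other.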
So if I also had a `quantity` field, then there wouldn't be any fight over where to store each of them. They each have their own private location.
And the final descriptor on my list is super lookups. Suppose I have a `Package` class that can ship to some address. And an `ExpressPackage` that ships faster. I don't really recommend doing this. But you can set a base view onto your class. Create a super object, and stick it on there. If you ship an `ExpressPackage`, it's on the way right away. But if you call `ship` on the base view, then you'll get the parent's behavior. I'm not going to go into why this works here. I have a whole video on `super` if you want to see the gory details.
This next portion is slightly more advanced and deals with some tricky issues you might run into. So if a descriptor's `__get__` method and a class's `__getattr__` method can both define what it means to say `object.something`, then what happens if you have a class that has both? I encourage you to take this example and try commenting and uncommenting things to see how things actually work. The first trip-up is that there's not just one `__getattr__`. There's `__getattr__` and `__getattribute__`. And `__getattribute__` is actually the one that's more similar to `__setattr__` and `__delattr__`. `__getattribute__`, `__setattr__`, and `__delattr__` are always called. When you say `object.something`, `__getattribute__` is always called. When you say `object.something = something`, `__setattr__` is always called. When you say `del object.something`, `__delattr__` is always called. And for these three functions, it's actually the base `object` class whose versions implement the descriptor logic. So if you're defining `__getattribute__`, `__setattr__`, or `__delattr__`, then you should call the base `object`'s version of that function inside your implementation if you want it to work with descriptors.
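The example the captions encourage experimenting with is not included in the text. A small sketch of the same idea follows; the class names `Ten`, `Cooperative`, and `Stubborn` are assumptions.

```python
class Ten:
    """Non-data descriptor that always produces 10."""

    def __get__(self, obj, objtype=None):
        return 10


class Cooperative:
    x = Ten()

    def __getattribute__(self, name):
        # Delegating to object's version keeps the descriptor protocol working.
        return super().__getattribute__(name)

    def __getattr__(self, name):
        # Only reached when __getattribute__ raises AttributeError.
        return f"no attribute {name!r}"


class Stubborn:
    x = Ten()

    def __getattribute__(self, name):
        # Never consults object.__getattribute__, so Ten.__get__ is skipped.
        return "intercepted"


via_descriptor = Cooperative().x
via_fallback = Cooperative().missing
overridden = Stubborn().x
```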
If you don't call the base `object`'s version, you'll find that your descriptor methods `__get__`, `__set__`, and `__delete__` are not called. Unless, of course, you manually call them in your version of the function. That's to say, you can sort of override the descriptor protocol if you so choose. And then there's `__getattr__`, which is actually only called if `__getattribute__` raises an `AttributeError`.
So how does `__getattribute__` decide what order to do things in? The object version of `__getattribute__` has a series of fallbacks that it tries in order. If it finds a descriptor that has a `__get__` and a `__set__` or `__delete__`, then that's called a data descriptor, and it has the highest priority. Next, it checks inside the instance dictionary. Then it checks for descriptors that just have a `__get__`. These are called non-data descriptors. Then it checks for variables found at the class level. Then it'll raise an `AttributeError`, which triggers `__getattr__` if it exists. These defaults are definitely not obvious, although they are generally good for most use cases. It may feel especially weird that instance variables sit between data and non-data descriptors in priority. The reason that non-data descriptors get lower priority than instance variables is because the most common use case for these is caching. The descriptor computes some expensive value and then saves it in the instance dictionary. The next time it gets looked up, it's found in the instance dictionary and isn't recomputed. This was chosen because of how common caching is. But if you want your descriptor to always be preferred, then you can just define a `__set__` or `__delete__` that just does the default thing.
Here's a case where you can see that priority difference play out. For `x`, I have a data descriptor. It defines both a `__get__` and a `__set__`. Whereas `y` just has a `__get__`. Both of our `__get__` methods return `None`. So when we print `x` and `y`, we get `None`, `None`.
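The priority example being walked through might be sketched like this. The class names `DataDescriptor`, `NonDataDescriptor`, and `C`, and the do-nothing body of `__set__`, are assumptions; the attribute names `x` and `y` and the value `42` come from the captions.

```python
class DataDescriptor:
    def __get__(self, obj, objtype=None):
        return None

    def __set__(self, obj, value):
        pass  # defining __set__ at all is what makes this a data descriptor


class NonDataDescriptor:
    def __get__(self, obj, objtype=None):
        return None


class C:
    x = DataDescriptor()
    y = NonDataDescriptor()


c = C()
before = (c.x, c.y)

# Write straight into the instance dictionary, bypassing __set__.
c.__dict__["x"] = 42
c.__dict__["y"] = 42

after = (c.x, c.y)
```

After the direct writes, `c.x` still goes to the data descriptor, while `c.y` is now served from the instance dictionary.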
Then for both `x` and `y`, we store `42` in the instance dictionary. Print them out again, and for `x`, we get `None`, but for `y`, we get `42`. This is because the data descriptor's `__get__`, which returns `None`, has a higher priority than the dictionary. Whereas the non-data descriptor's `__get__` is not called because the dictionary has higher priority in that case. Obviously, the rules are complex and non-intuitive. And I don't expect you to get them in one go. I just hope this helps you remember that there is some subtlety there. And you might want to look into it if you're ever defining a descriptor like this.
Anyway, thanks for making it to the end. I hope you enjoyed it. As always, thank you to my patrons and donors for supporting me. If you enjoy my content, please consider subscribing. And if you especially enjoy it, please consider becoming a patron on Patreon. Don't forget to slap that like button an odd number of times. See you next time.

Info

Channel: mCoding

Views: 80,308

Id: mMbVs17Vmo4

Length: 14min 21sec (861 seconds)

Published: Mon Sep 26 2022