8 things in Python you didn't realize are descriptors

Captions
Hello and welcome to mCoding. I'm James Murphy. Let's get going talking about descriptors. Contrary to how the name might sound, descriptors are not the mutual enemy of Autobots and Decepticons, nor do they have anything to do with descriptions. Descriptors in Python are somewhat of a feature that's hidden in plain sight.

Officially, an object is a descriptor if it has any of the dunder methods `__get__`, `__set__`, or `__delete__`. Their purpose is to allow you to customize what it means to get, set, or delete an attribute. In this case, `x` is an instance of the descriptor on the left. When you call `obj.x`, this calls the descriptor's `get` method. When you call `obj.x = something`, that calls the `set` method. And when you call `del obj.x`, that calls the `delete` method. `get` takes the object and the object type. Importantly, this allows you to do different things based on whether something was called from an instance or the class itself.

If this is your first time hearing about descriptors in Python, you're probably thinking this is one of those very niche features. It's advanced use only. And, by the way, isn't that the same thing as `__getattr__`, `__setattr__`, and `__delattr__`? Why is there a need for descriptors at all? It is true that without knowing more about the internals of your class, `something.x` could be calling the `getattr`, or it could be calling the `get` method of a descriptor. `obj.x = something` could be calling the `setattr`, or it could be calling the `set` of a descriptor. And `del obj.x` could be calling the `delattr`, or the `delete` of a descriptor. Without seeing the internals of the class, you just can't tell.

But there's a big difference between `getattr`, `setattr`, and `delattr` versus the descriptor versions. Namely, the methods on the right-hand side (the `getattr` family) are defined per class, whereas the ones on the left-hand side (the descriptor methods) are defined per attribute. On the right, it's the class that's determining how to access attributes.
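The on-screen code is not part of the captions, so here is a minimal sketch of the protocol as described so far. The names `Logged` and `Demo` are made up for illustration; the descriptor simply records which of its methods ran.

```python
class Logged:
    """Hypothetical descriptor that records which protocol method was called."""

    def __init__(self):
        self.calls = []

    def __get__(self, obj, objtype=None):
        # obj is None when the lookup goes through the class (Demo.x).
        self.calls.append(("get", obj is None))
        return 42

    def __set__(self, obj, value):
        self.calls.append(("set", value))

    def __delete__(self, obj):
        self.calls.append(("delete",))


class Demo:
    x = Logged()  # per attribute: only lookups of `x` go through Logged


obj = Demo()
value = obj.x         # Logged.__get__(descriptor, obj, Demo)
obj.x = 5             # Logged.__set__(descriptor, obj, 5)
del obj.x             # Logged.__delete__(descriptor, obj)
class_value = Demo.x  # Logged.__get__(descriptor, None, Demo)

# Reading the class dictionary returns the descriptor itself, untouched.
calls = vars(Demo)["x"].calls
print(calls)
```

The recorded calls show that attribute syntax on `x` is routed to the descriptor, and that `__get__` can tell a class lookup from an instance lookup.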
Whereas, on the left, it's the attribute itself that determines how it's accessed. Even still, you may be thinking: "Descriptors? I'm never going to need that, right?" "I never define descriptors, I never work with descriptors." "I don't need to know." Well, welcome to my list of descriptors hiding in plain sight. You may not have realized it, but you're using them all the time.

Descriptor number one: functions. Have you ever noticed the difference you get when you access a function through an instance versus through the class? Accessing the function `f` through an instance, little `a`, we get a bound method. But accessing the same `f` through the class itself gives us a function object. It's the same `f` in both cases, but I'm getting different results. That's because every function you define with the `def` keyword is a descriptor that defines a `get` method. And it uses the descriptor protocol to do something different based on whether it was called from an instance or the class itself.

Functions in Python are written in C. So this isn't exactly what it's doing. But it's something close to this. Every function has a `get` method. If the object is `None`, then there's no instance associated with this lookup, meaning it was called from a class object itself. In that case, we just return the function as is. Otherwise, `object` isn't `None`, and we're in a case like this, where we're looking up a function on an actual instance variable. In that case, instead of returning the function itself, we return some kind of bound function that remembers the object. So, yeah, if you're using functions, then you're using descriptors.

Number two descriptor hiding in plain sight: properties. And yes, this is a different case than just functions. Let me show you why. Let's reach into the class's dictionary and print out what this `area` thing actually is. I'm accessing it through the dictionary like this in order to avoid invoking the descriptor.
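A short sketch of the two lookups discussed so far: the function `f` reached through a class and through an instance, and `area` read straight out of the class dictionary. The class name `Square` and its `side` attribute are assumptions for illustration; only `f` and `area` come from the talk.

```python
import types


class Square:
    def __init__(self, side):
        self.side = side

    def f(self):
        return self.side

    @property
    def area(self):
        return self.side ** 2


s = Square(3)

# Same function, different results depending on how it is looked up.
via_class = Square.f   # plain function object
via_instance = s.f     # bound method that remembers `s`

# The instance lookup is roughly equivalent to calling __get__ by hand.
raw_f = vars(Square)["f"]
manual = raw_f.__get__(s, Square)

# Reading the class dictionary avoids invoking the descriptor.
raw_area = vars(Square)["area"]

print(type(via_class).__name__, type(via_instance).__name__)
print(type(raw_area).__name__)
```

Calling `raw_f.__get__(None, Square)` hands back the function unchanged, which matches the "no instance, return the function as is" branch described above.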
We see that `area` is not actually a function. It's a property object. First, `area` is defined as a function just like normal. Then, you replace `area` with whatever you get by calling `property` on it. That's why technically, `area` is a property object, not a function. So, this is exactly the same as this. And property is a descriptor. So, it controls what happens when you say `.area`. It just so happens that what it does is call that original `area` function.

Properties like this are never really needed. You could always just call the function directly. But it's a common way to indicate to the programmer that this thing is really cheap to compute. By making `area` a property, you're basically telling users that `area` is so cheap to compute that it's basically as free as an attribute access. If for some reason your `area` function really was expensive, then make it look expensive. Don't hide the fact that it's a function call. Just make that explicit.

Funnily enough, the main reason that I see people use properties is actually to introduce a feature to Python that was specifically left out of the language. By design, in Python, all attributes are public. There's no way to prevent someone from accessing internal implementation details of your classes. And aside from inheriting from a built-in type like `tuple`, this also makes it impossible to make truly immutable types. But this is a pretty common pattern to prevent people from accidentally mutating your object. Add an underscore to the beginning of your attribute name. Then make a property with the same name without the underscore. People can still read the name, no problem. But if they try to write to it, they get an error.

But of course, this isn't true immutability. Someone could just reach inside and manually change the underscore variable. But it's pretty much an unspoken rule in Python that if you have an underscore variable or underscore function, then you're not meant to touch those.
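The read-only pattern just described might look like the sketch below. `Circle` and `radius` are hypothetical names chosen for illustration, not the example shown in the video.

```python
class Circle:
    def __init__(self, radius):
        self._radius = radius  # underscore: internal by convention

    @property
    def radius(self):
        # Getter only, so assignment through `radius` is rejected.
        return self._radius


c = Circle(2.0)
read_ok = c.radius

try:
    c.radius = 10.0
except AttributeError as exc:
    error = exc
else:
    error = None

# Not true immutability: the underscore attribute is still reachable.
c._radius = 10.0
print(read_ok, type(error).__name__, c.radius)
```

Because the built-in `property` defines `__set__` even when no setter is given, the assignment fails instead of silently creating an instance attribute.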
So if you do change an underscore variable or call an underscore function, then you should expect everything to break. It's your own fault.

Anyway, here's how you might implement property if you were doing it yourself. The built-in property also does `set` and `delete`, but you get the idea. As per usual, if you weren't passed an instance, then just return the property itself. Otherwise, call the stored function on the instance that was passed in.

Hidden descriptor number three: class methods and static methods. Both class and static methods allow you to call a function whether you have an instance of the class or the class itself. In both cases, since you might not have an instance to work with, there's no `self` parameter. The difference is that a static method has no implied parameters, whereas a class method has an implied class parameter. So in both of these cases, whether you called with the capital-A Animal class or the lowercase animal instance, the class parameter of the `create` function will be filled in with the Animal class. I have a whole video on class methods versus static methods. Check that out if you want to hear more.

As far as possible implementations go, they could look something like this. Just like properties, both of these take and remember the function that they're applied to. Static method is much simpler. Whether you were called with an instance or not, just always return the function back. Class method is a bit trickier because we need to supply that class parameter. If we weren't passed the type to use, then we just use the type of the object. Then this is how we bind that object to the class parameter. Remember, functions are descriptors. And the `get` method returns a bound version of the function where the first argument is bound to the first argument of the `get`. It's not totally clear if the second argument matters at all. But this works.
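The implementations shown on screen are not in the captions, so the following is a rough pure-Python sketch of the three ideas just described, not the built-ins' actual code (those are written in C). The `My*` class names, `shout`, and `kingdom` are invented for illustration; `create` and the Animal class come from the talk.

```python
class MyProperty:
    """Getter-only sketch; the built-in also handles set and delete."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self        # looked up on the class: return the property
        return self.fget(obj)  # looked up on an instance: call the function


class MyStaticMethod:
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        return self.func       # instance or class: always the plain function


class MyClassMethod:
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        if objtype is None:
            objtype = type(obj)
        # Functions are descriptors: bind the class as the first argument.
        return self.func.__get__(objtype, type(objtype))


class Animal:
    def __init__(self, name):
        self.name = name

    @MyProperty
    def shout(self):
        return self.name.upper()

    @MyStaticMethod
    def kingdom():
        return "Animalia"

    @MyClassMethod
    def create(cls, name):
        return cls(name)


animal = Animal("rex")
from_class = Animal.create("a")
from_instance = animal.create("b")
print(animal.shout, Animal.kingdom(), type(from_instance).__name__)
```

One difference worth noting: this `MyProperty` has only `__get__`, so unlike the built-in `property` it would not block assignment to the attribute.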
So as you can see, descriptors are often used to make sort of function object-like things. But that's not all they're useful for. Let's take a look at number four: slots. This is another one that I have a full video on, but here's the quick rundown. Normally, objects have an instance dictionary. Anytime you store a variable into the object, it really just stores it inside this dictionary. But especially for small objects, dictionaries aren't necessarily the most efficient way to store things. If you define `__slots__ = ['x', 'y', 'z']`, then you're saying the only three attributes that my instances are going to have are `x`, `y`, and `z`. You can get, set, and delete `x`, `y`, and `z` no problem. But if you try to get, set, or delete `w`, then you get an error.

Once again, directly reaching inside the class's dictionary, we see that `x` is a member object. These `__slots__` members also define all three of `get`, `set`, and `delete`. And because there's no instance dictionary to manage this, these `get`, `set`, and `delete` have to reach into the underlying C structure of the objects and manually modify them. Again, see my video on slots if you'd like to hear more.

And speaking of instance dictionaries, do you know what else are descriptors? Instance dictionaries. The dunder `__dict__` attribute of any class that has instance dictionaries isn't a dictionary. It's an attribute object, which is a descriptor with all three of `get`, `set`, and `delete`. And notice this weird idiom that I had to use in order to see this attribute object. I had to reach into the dictionary of the class and then read its `__dict__` entry. If you print out just the `__dict__`, it looks like a dictionary. But if you look at the type, you see that it's actually a `mappingproxy` object. This happens because, remember, `__dict__` is a descriptor. So accessing `E.__dict__` invokes the descriptor. And what the descriptor does is return this proxy object instead of the actual attribute.
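Both observations can be checked with a few lines. This is a sketch under assumed names: `Point` is hypothetical, while `E` and the slot names `x`, `y`, `z` follow the talk.

```python
import types


class Point:
    __slots__ = ["x", "y", "z"]


class E:
    pass


p = Point()
p.x = 1
try:
    p.w = 2                    # not listed in __slots__
except AttributeError:
    w_rejected = True
else:
    w_rejected = False

# Reach into the class dictionary to see the slot descriptor itself.
slot = vars(Point)["x"]
protocol = ("__get__", "__set__", "__delete__")
slot_is_full = all(hasattr(type(slot), name) for name in protocol)

# The "double __dict__" idiom: the class's dict, then its "__dict__" entry.
dict_attr = E.__dict__["__dict__"]
dict_is_full = all(hasattr(type(dict_attr), name) for name in protocol)

e = E()
e.a = 1
print(type(slot).__name__, type(dict_attr).__name__, type(E.__dict__).__name__)
print(dict_attr.__get__(e, E))
```

Calling `dict_attr.__get__(e, E)` by hand returns the instance's dictionary, which is what `e.__dict__` does behind the scenes.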
So that's why we had to do this double `__dict__` thing.

Moving on, how about a real-world library use case of descriptors? Look no further than one of the most popular Python packages of all time, SQLAlchemy. SQLAlchemy lets you communicate with databases through Python in a Pythonic way. Here, I define a user account table with two fields, ID and name. When I define the class, I say that ID is a column that takes integers, and name is a column that takes strings. When I operate on an actual instance of the class, I'm working not with column objects, but with actual ints and strings. Getting different behavior on a class versus an instance? You guessed it, they're descriptors.

Number seven. Another common use for descriptors is field validation. For example, I want to say here that this item has a price that's greater than zero. Whenever I set a price, I want to make sure that it's positive, and if not, I want an error. This functionality is accomplished in the `set` method. Before setting the attribute, check if it's bigger than zero. If not, throw a `ValueError`. Otherwise, proceed with setting the attribute.

This is an interesting use case for Python's `__set_name__`. This function is called at class construction time. And its purpose is to let each object know what its name is. After the class body runs, this `greater_than` object will be told that its name is `price` by having `__set_name__` called with the owner being the `Item` class and the name being `price`. In this case, I'll prepend an underscore to the name. And then use that as the sort of private location where I'm storing the actual data. So the descriptor is stored at `Item.price`. But the value that's underlying the price, the actual price, gets stored at `item._price`. Knowing the name that we're assigned to is a good way to avoid conflicts if you have multiple of these descriptors in the same class.
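A minimal sketch of the validation descriptor just described, assuming a class named `GreaterThan` with a `minimum` parameter (both names are guesses; the captions only mention a `greater_than` object, `Item`, `price`, and `quantity`).

```python
class GreaterThan:
    """Hypothetical validating descriptor: values must exceed `minimum`."""

    def __init__(self, minimum):
        self.minimum = minimum

    def __set_name__(self, owner, name):
        # Called when the owning class is created, e.g. (Item, "price").
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if not value > self.minimum:
            raise ValueError(f"value must be greater than {self.minimum}")
        setattr(obj, self.private_name, value)


class Item:
    price = GreaterThan(0)
    quantity = GreaterThan(0)  # gets its own private slot, `_quantity`

    def __init__(self, price, quantity):
        self.price = price      # goes through GreaterThan.__set__
        self.quantity = quantity


item = Item(9.99, 3)
try:
    item.price = -1
except ValueError:
    rejected = True
else:
    rejected = False

print(item.price, item.quantity, vars(item), rejected)
```

Because each descriptor derives its storage name from `__set_name__`, `price` and `quantity` never collide even though they share one descriptor class.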
So if I also had a `quantity` field, then there wouldn't be any fight over where to store each of them. They each have their own private location.

And the final descriptor on my list is super lookups. Suppose I have a `Package` class that can ship to some address. And an `ExpressPackage` that ships faster. I don't really recommend doing this. But you can set a base view onto your class. Create a super object, and stick it on there. If you ship an `ExpressPackage`, it's on the way right away. But if you call `ship` on the base view, then you'll get the parent's behavior. I'm not going to go into why this works here. I have a whole video on `super` if you want to see the gory details.

This next portion is slightly more advanced and deals with some tricky issues you might run into. So if a descriptor's `get` method and a class's `getattr` method can both define what it means to say `object.something`, then what happens if you have a class that has both? I encourage you to take this example and try commenting and uncommenting things to see how things actually work.

The first trip-up is that there's not just one `getattr`. There's `getattr`, and `getattribute`. And `getattribute` is actually the one that's more similar to `setattr` and `delattr`. `getattribute`, `setattr`, and `delattr` are always called. When you say `object.something`, `getattribute` is always called. When you say `object.something = something`, `setattr` is always called. When you say `del object.something`, `delattr` is always called. And for these three functions, it's the base `object` class whose versions implement the descriptor logic. So if you're defining `__getattribute__`, `__setattr__`, or `__delattr__`, then you should call the base `object`'s version of that function inside your implementation if you want it to work with descriptors.
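To see why delegating to the base `object` matters, here is a small sketch with two made-up classes, `Cooperative` and `Bypassing`, that share a trivial hypothetical descriptor `Ten`.

```python
class Ten:
    """Hypothetical descriptor whose __get__ always returns 10."""

    def __get__(self, obj, objtype=None):
        return 10


class Cooperative:
    x = Ten()

    def __getattribute__(self, name):
        # Delegating to object keeps the descriptor protocol working.
        return object.__getattribute__(self, name)


class Bypassing:
    x = Ten()

    def __getattribute__(self, name):
        # Returns whatever is stored in the class dictionary;
        # Ten.__get__ is never invoked.
        return type(self).__dict__[name]


cooperative_x = Cooperative().x
bypassing_x = Bypassing().x
print(cooperative_x, type(bypassing_x).__name__)
```

The first lookup yields the descriptor's value, while the second hands back the raw `Ten` object because nothing in the override calls `__get__`.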
If you don't call the base `object`'s version, you'll find that your descriptor methods `get`, `set`, `delete` are not called. Unless, of course, you manually call them in your version of the function. That's to say, you can sort of override the descriptor protocol if you so choose. And then there's `getattr`, which is actually only called if `getattribute` raises an `AttributeError`.

So how does `getattribute` decide what order to do things in? The object version of `getattribute` has a series of fallbacks that it tries in order to figure out which one is the best one to call. If it finds a descriptor that has a `get` and a `set` or `delete`, then that's called a data descriptor, and it has the highest priority. Next, it checks inside the instance dictionary. Then it checks for descriptors that just have a `get`. These are called non-data descriptors. Then it checks for variables found at the class level. Then it'll raise an `AttributeError`, which triggers the `__getattr__` if it exists.

These defaults are definitely not obvious, although they are generally good for most use cases. It may feel especially weird that instance variables sit between data and non-data descriptors. The reason that non-data descriptors get lower priority than instance variables is because the most common use case for these is caching. The descriptor computes some expensive variable and then saves it in the instance dictionary. The next time it gets looked up, it's found in the instance dictionary and isn't recomputed. This was chosen because of how common caching is. But if you want your descriptor to always be preferred, then you can just define a `set` or `delete` that just does the default thing.

Here's a case where you can see that priority difference play out. For `x`, I have a data descriptor. It defines both a `get` and a `set`. Whereas `y` just has a `get`. Both of our `get`s return `None`. So when we print `x` and `y`, we get `None`, `None`.
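The priority example being walked through here can be reproduced with a sketch like the following. The class names are assumptions, and the `__getattr__` fallback is added only to show the last step of the lookup order.

```python
class DataDescriptor:
    """Has __get__ and __set__: outranks the instance dictionary."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        return None

    def __set__(self, obj, value):
        vars(obj)[self.name] = value  # store in the instance dictionary


class NonDataDescriptor:
    """Has only __get__: the instance dictionary outranks it."""

    def __get__(self, obj, objtype=None):
        return None


class Demo:
    x = DataDescriptor()
    y = NonDataDescriptor()

    def __getattr__(self, name):
        # Reached only after normal lookup raises AttributeError.
        return f"fallback for {name}"


d = Demo()
before = (d.x, d.y)

# Store 42 for both names directly in the instance dictionary.
vars(d)["x"] = 42
vars(d)["y"] = 42
after = (d.x, d.y)

missing = d.missing
print(before, after, missing)
```

Only `y` picks up the stored `42`; `x` keeps returning what the data descriptor's `__get__` produces.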
Then for both `x` and `y`, we store `42` in the instance dictionary. Print them out again, and for `x`, we get `None`, but for `y`, we get `42`. This is because the data descriptor's `get`, which returns `None`, has a higher priority than the dictionary. Whereas the non-data descriptor's `get` is not called because the dictionary has higher priority in that case. Obviously, the rules are complex and non-intuitive. And I don't expect you to get them in one go. I just hope this helps you remember that there is some subtlety there. And you might want to look into it if you're ever defining a descriptor like this.

Anyway, thanks for making it to the end. I hope you enjoyed it. As always, thank you to my patrons and donors for supporting me. If you enjoy my content, please consider subscribing. And if you especially enjoy, please consider becoming a patron on Patreon. Don't forget to slap that like button an odd number of times. See you next time.
Info
Channel: mCoding
Views: 80,308
Id: mMbVs17Vmo4
Length: 14min 21sec (861 seconds)
Published: Mon Sep 26 2022