Hello and welcome to mCoding. I'm James Murphy. Let's talk about descriptors. Contrary to how the name might sound, descriptors are not the
mutual enemy of Autobots and Decepticons, nor do they have anything to do with descriptions. Descriptors in Python are a feature that's
hidden in plain sight. Officially, an object is a descriptor if it defines any of the dunder methods
`__get__`, `__set__`, or `__delete__`, which I'll call get, set, and delete for short. Their purpose is to let you customize what it means to get,
set, or delete an attribute. In this case, `x` is an instance
of the descriptor class on the left. When you access `obj.x`, Python
calls the descriptor's `get` method. When you write `obj.x = something`,
that calls the `set` method. And when you write `del obj.x`, that
calls the `delete` method. `get` receives both the instance and its type, the owner class. Importantly, this lets you do different things depending on whether the attribute was accessed
from an instance or from the class itself. If this is your first time
hearing about descriptors in Python, you're probably thinking this
is one of those very niche, advanced-use-only features. And, by the way, isn't that the same thing
as `__getattr__`, `__setattr__`, and `__delattr__`? Why is there a need for descriptors at all? It is true that without knowing
more about the internals of your class, `obj.x` could be calling the class's `__getattr__`, or it could be calling the
`get` method of a descriptor. `obj.x = something` could
be calling `__setattr__`, or it could be calling a descriptor's `set`. And `del obj.x` could be calling
`__delattr__`, or a descriptor's `delete`. Without seeing the internals
of the class, you just can't tell. But there's a big difference
between `__getattr__`, `__setattr__`, and `__delattr__` and the descriptor versions. Namely, the methods on the right-hand
side are defined per class, whereas the ones on the
left-hand side are defined per attribute. On the right, it's the class that
determines how its attributes are accessed, whereas on the left, it's the attribute
itself that determines how it's accessed. Even so, you may be thinking: "Descriptors? I'm never going to need that, right? I never define descriptors, I never work with descriptors. I don't need to know." Well, welcome to my list
of descriptors hiding in plain sight. You may not have realized it,
but you're using them all the time. Descriptor number one: functions. Have you ever noticed the difference you get when you access a function
through an instance versus through the class? Accessing the function `f` through an
instance, lowercase `a`, we get a bound method. But accessing the same `f` through
the class itself gives us a plain function object. It's the same `f` in both
cases, but I'm getting different results. That's because every function you define
with the `def` keyword is a descriptor that defines a `get` method, and it uses the descriptor
protocol to do something different depending on whether it was accessed
from an instance or from the class itself. In CPython, the function type is implemented in C, so this isn't exactly what it does, but it's something close to this. Every function has a `get` method. If the object is `None`, then there's
no instance associated with this lookup, meaning the function was accessed from the class object itself. In that case, we just return the function as is. Otherwise, the object isn't `None`,
and we're in a case like this, where we're looking up a function
on an actual instance. In that case, instead of
returning the function itself, we return a bound
method object that remembers the instance. So, yeah, if you're using functions,
then you're using descriptors. Number two, descriptor hiding
in plain sight: properties. And yes, this is a different case
than plain functions. Let me show you why. Let's reach into the class's dictionary
and print out what this `area` thing actually is. I'm accessing it through the dictionary like this in order to avoid
invoking the descriptor. We see that `area` is not actually a function; it's a property object. When you use the `@property` decorator, first `area` is defined as
a function just like normal. Then `area` is replaced with
whatever you get by calling `property` on it. That's why, technically, `area`
is a property object, not a function. So decorating with `@property` is exactly the same as writing `area = property(area)` after the definition. And `property` is a descriptor, so it controls what happens
when you access `.area`. It just so happens that what it
does is call that original `area` function. A property like this is never strictly needed; you could always just call the function directly. But it's a common way to signal to other programmers that this thing
is really cheap to compute. By making `area` a property, you're essentially telling users that `area`
is cheap enough to be about as free as a plain attribute access. If for some reason your
`area` function were actually expensive, then make it look expensive. Don't hide the fact that
it's a function call; make that explicit. Funnily enough, the main reason
that I see people use properties is actually to add a feature to Python
that was deliberately left out of the language. By design, in Python, all attributes are public. There's no way to prevent someone from accessing internal implementation
details of your classes. And aside from inheriting
from a built-in type like `tuple`, this also makes it impossible
to make truly immutable types. But here is a pretty common pattern to prevent people from accidentally
mutating your object. Add an underscore to the
beginning of your attribute name. Then make a property with
the same name without the underscore. People can still read the attribute, no problem, but if they try to assign to it, they get an `AttributeError`. Of course, this isn't true immutability. Someone could just reach inside and
manually change the underscored attribute. But it's a well-established convention in Python that names starting with an underscore,
whether variables or functions, are not meant to be touched from outside. So if you do change an underscored
variable or call an underscored function, then you should expect
everything to break. It's your own fault. Anyway, here's how you might
implement `property` yourself. The built-in `property` also supports
`set` and `delete`, but you get the idea. As usual, if you weren't passed
an instance, just return the property object itself. Otherwise, call the stored function
on the instance that was passed in. Hidden descriptor number three,
class methods and static methods. Both class and static methods
allow you to call a function whether you have an
instance of the class or only the class itself. In both cases, since you
might not have an instance to work with, there's no `self` parameter. The difference is that a static
method has no implied parameters, whereas a class method
has an implied class parameter. So in both of these cases, whether you call it on the capitalized
`Animal` class or on the lowercase `animal` instance, the class parameter of the `create`
function will be filled in with the `Animal` class. I have a whole video on class
methods versus static methods. Check that out if you want to hear more. As far as possible implementations go,
they could look something like this. Just like properties, both of these take and
remember the function they're applied to. Static method is much simpler: whether it's accessed through an instance
or not, it always returns the function unchanged. Class method is a bit trickier because
we need to supply that class parameter. If we weren't passed the type to use,
then we just use the type of the object. Then this is how we bind
that type to the class parameter. Remember, functions are descriptors, and their `get` method returns
a bound version of the function where the function's first parameter
is bound to the first argument passed to `get`. In CPython, a function's `get` actually ignores the
second argument, so it doesn't matter here. Either way, this works. So as you can see, descriptors are often
used to build function-like objects. But that's not all they're useful for. Let's take a look at number four: slots. This is another one that I have a full
video on, but here's the quick rundown. Normally, objects have an instance dictionary. Any time you store an attribute on the object, it really just stores
it inside this dictionary. But especially for small objects, dictionaries aren't necessarily the most
efficient way to store things. If you define `__slots__ = ['x', 'y', 'z']`, then you're saying the only three attributes your instances are going
to have are `x`, `y`, and `z`. You can get, set, and delete
`x`, `y`, and `z` with no problem. But if you try to get, set,
or delete `w`, then you get an `AttributeError`. Once again, directly reaching inside the class's dictionary, we see
that `x` is a member object. These `__slots__` members also
define all three of `get`, `set`, and `delete`. Because there's no
instance dictionary to manage this, these `get`, `set`, and `delete` methods have
to reach into the underlying C structure of the object and manually modify it. Again, see my video
on slots if you'd like to hear more. And speaking of instance dictionaries,
do you know what else are descriptors? Instance dictionaries. The `__dict__` entry stored in the namespace of any class
whose instances have dictionaries isn't a dictionary. It's an attribute object, a descriptor with all three of
`get`, `set`, and `delete`. And notice this weird idiom that I had to
use in order to see this attribute object. I had to get the dictionary
of the class and then look up `'__dict__'` inside it. If you print out just `E.__dict__`,
it looks like a dictionary. But if you look at its type, you see
that it's actually a `mappingproxy` object. This happens because,
remember, `__dict__` is a descriptor. Accessing `E.__dict__` invokes the `__dict__` descriptor defined on `E`'s own type, `type`, and what that descriptor does is return
this read-only proxy instead of the class's underlying dictionary. So that's why we had to
do this double `__dict__` thing. Moving on, how about a
real-world library use case of descriptors? Look no further than one of the most
popular Python packages of all time, SQLAlchemy. SQLAlchemy lets you communicate
with databases through Python in a Pythonic way. Here, I define a user account
table with two fields, `id` and `name`. When I define the class, I say that `id` is a column that holds integers, and `name`
is a column that holds strings. When I operate on an actual instance
of the class, I'm working not with column objects but with actual ints and strings. Getting different behavior
on a class versus an instance? You guessed it: they're descriptors. Number seven. Another common
use for descriptors is field validation. For example, I want to say here that this
item has a price that's greater than zero. Whenever I set a price, I want to make sure that it's positive,
and if not, I want an error. This functionality lives
in the `set` method. Before setting the attribute,
check whether the value is bigger than zero; if it isn't, raise a `ValueError`. Otherwise, proceed with setting the attribute. This is an interesting use
case for Python's `__set_name__`. This method is called at
class creation time, and its purpose is to let
each descriptor know the name it was assigned to. After the class body runs, this `greater_than` object will be
told that its name is `price` by having its `__set_name__` called with the owner being the `Item` class and
the name being `'price'`. In this case, I'll prepend
an underscore to the name and then use that as a sort of private
location where I'm storing the actual data. So the descriptor lives on the class as `Item.price`, but the underlying value, the
actual price, gets stored on each instance as `item._price`. Knowing the name that we're assigned
to is a good way to avoid conflicts if you have multiple of these
descriptors in the same class. So if I also had a `quantity` field, then there wouldn't be any fight over
where to store each of them. They each have their own private location. And the final descriptor on
my list is `super` lookups. Suppose I have a `Package` class
that can ship to some address, and an `ExpressPackage` subclass that ships faster. I don't really recommend doing this, but you can give your class a base view: create a `super` object and stick it on the class as an attribute. If you ship an `ExpressPackage`,
it's on the way right away. But if you call `ship` through the base
view, then you'll get the parent's behavior. I'm not going to go into why this works here; I have a whole video on `super` if
you want to see the gory details. This next portion is slightly more advanced and deals with some
tricky issues you might run into. If a descriptor's `get` method
and a class's `__getattr__` method can both define what it means
to write `obj.something`, then what happens if you
have a class that has both? I encourage you to take this example
and try commenting and uncommenting things
to see how it actually works. The first trip-up is that
there's not just one `__getattr__`. There's `__getattr__`, and there's `__getattribute__`. And `__getattribute__` is actually the one that's
more similar to `__setattr__` and `__delattr__`. `__getattribute__`, `__setattr__`, and
`__delattr__` are always called. When you write `obj.something`, `__getattribute__` is always called. When you write `obj.something = value`, `__setattr__` is always called. When you write `del obj.something`,
`__delattr__` is always called. And for these three methods,
it's actually the base `object` class's versions of them that
implement the descriptor logic. So if you're defining `__getattribute__`,
`__setattr__`, or `__delattr__` yourself, you should call the base `object`'s version of that method
inside your implementation if you want it to work with descriptors. If you don't call the base `object`'s version, you'll find that your descriptor's
`get`, `set`, and `delete` methods are not called, unless, of course, you manually
call them in your version of the method. That is to say, you can sort of override
the descriptor protocol if you so choose. And then there's `__getattr__`, which is only called if `__getattribute__`
raises an `AttributeError`. So how does `__getattribute__` decide
what order to do things in? The `object` version of `__getattribute__`
has a series of fallbacks that it tries in order to figure out which
one is the right one to use. If it finds a descriptor on the class that has
a `get` and also a `set` or `delete`, that's called a data descriptor,
and it has the highest priority. Next, it checks inside the instance dictionary. Then it checks for descriptors
that only have a `get`. These are called non-data descriptors. Then it checks for plain attributes
found at the class level. Finally, it raises an `AttributeError`, which
triggers `__getattr__` if it exists. These defaults are definitely not obvious, although they are generally
good for most use cases. It may feel especially
weird that the instance dictionary sits in between data
and non-data descriptors. One big reason non-data descriptors
get lower priority than instance variables is that a very
common use case for them is caching. The descriptor computes some expensive value and then saves it
in the instance dictionary. The next time the attribute is looked up, it's found
in the instance dictionary and isn't recomputed. But if you want your descriptor
to always take precedence, then you can just define a `set` or
`delete` that does the default thing. Here's a case where you can see
that priority difference play out. For `x`, I have a data descriptor: it
defines both a `get` and a `set`, whereas `y` only has a `get`. Both of our `get` methods return `None`, so when we print `x` and
`y`, we get `None`, `None`. Then for both `x` and `y`, we store
`42` directly in the instance dictionary. Print them out again, and for `x`,
we get `None`, but for `y`, we get `42`. This is because the data descriptor's `get`, which returns `None`, has higher
priority than the instance dictionary, whereas the non-data
descriptor's `get` is not called because the dictionary has
higher priority in that case. Obviously, the rules are complex and non-intuitive, and I don't expect you to get them in one go. I just hope this helps you remember
that there is some subtlety there. And you might want to look into it if
you're ever defining a descriptor like this. Anyway, thanks for making
it to the end. I hope you enjoyed it. As always, thank you to my
patrons and donors for supporting me. If you enjoy my content,
please consider subscribing. And if you especially enjoy,
please consider becoming a patron on Patreon. Don't forget to slap that like
button an odd number of times. See you next time.
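To watch the lookup-priority rules from the last section play out, here is a minimal, self-contained sketch. It assumes the default `object.__getattribute__` (no custom `__getattribute__` on the class), and every class name in it (`DataDescriptor`, `NonDataDescriptor`, `CachedValue`, `Demo`) is hypothetical. It repeats the `x`/`y` experiment and adds a small caching non-data descriptor of the kind described earlier.

```python
class DataDescriptor:
    """Hypothetical: defines __get__ and __set__, so it is a data descriptor."""

    def __get__(self, instance, owner=None):
        return None

    def __set__(self, instance, value):
        pass  # deliberately ignore writes


class NonDataDescriptor:
    """Hypothetical: defines only __get__, so it is a non-data descriptor."""

    def __get__(self, instance, owner=None):
        return None


class CachedValue:
    """Hypothetical caching non-data descriptor: computes the value once,
    then the instance-dictionary entry shadows it on later lookups."""

    def __init__(self, func):
        self.func = func
        self.calls = 0  # how many times the value has been computed

    def __set_name__(self, owner, name):
        self.name = name  # called once, at class creation time

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class: return the descriptor itself
        self.calls += 1
        value = self.func(instance)
        # Store under the same name; because this is a non-data descriptor,
        # the instance dictionary wins on the next lookup.
        instance.__dict__[self.name] = value
        return value


class Demo:
    x = DataDescriptor()
    y = NonDataDescriptor()

    @CachedValue
    def total(self):
        return sum(range(10))


d = Demo()
print(d.x, d.y)  # None None

# Bypass __set__ by writing straight into the instance dictionary.
d.__dict__["x"] = 42
d.__dict__["y"] = 42
print(d.x, d.y)  # None 42: data descriptor beats the dict, non-data loses

print(d.total, d.total)  # 45 45
print(Demo.__dict__["total"].calls)  # 1: the second lookup hit d.__dict__
```

Writing into `d.__dict__` directly is what makes the comparison fair: `d.x = 42` would go through the data descriptor's `set` instead. The standard library's `functools.cached_property` relies on the same non-data-descriptor behavior: once it has written the value into the instance dictionary, later lookups find that entry first.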