Python's most DISLIKED __dunder__ (and what to use instead)

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Welcome back to mCoding. I'm James Murphy. If you had to pick, what would you say is your most disliked dunder method in Python? Maybe dunder new and all of the metaclass glory that goes with that? Or maybe you've seen my video about plus equals and don't like the in-place methods? Or hey, nobody likes taking the time to write a wrapper, maybe that's it? Well, I've got a different choice. One that's so disliked in the Python community that if you so much as ask what it does on Stack Overflow, you will be inundated with comments telling you not to use it. What dunder is it that could possibly harbor this much hate? It's none other than dunder del. So, in this video, let's check out `__del__`. See what it's supposed to do, hone in on some of those pitfalls. And, of course, see some more robust alternatives at the end. First up, how does `__del__` relate to the `del` keyword?   Well, there are three ways that you can use the `del` keyword. You can say `del x[something]`. And that usage corresponds exactly to calling the `__delitem__` method. You can call it with `del x.something`. And that corresponds exactly to the `__delattr__` method. Finally, there's just `del x`, which does not call the `__del__` method. This is our first hint about why people don't like `del`. `__delitem__` and `__delattr__` are very straightforward. Nobody hates those. `str(x)` calls `__str__` `repr(x)` calls `__repr__` `len(x)` calls `__len__` But `del x` does not call `__del__`. But it kind of looks like it does. If I run this example, I see the `__del__` printout. But if I comment out `del x`, it still gets called. Let's throw in some print statements at the start, just before we call `del`, and at the end, so we see exactly when this `__del__` is being called. We see start, then just before `del`, then the `__del__` printout, and then end. So, it seems like `del` is triggering `__del__`. When we comment out `__del__`, it appears that it's getting called after the end of the function. So, is this like manual versus explicit garbage collection? Well, still no. If we create another variable `y` and assign it to `x`, then even if we manually delete `x`, we don't see the `__del__` until after the end of the function. `__del__` is, of course, short for "delete." This function is supposed to run before the object is deleted. But there's a big difference between the object itself and the name that you assigned to it. Every object in Python has what's called a reference count. The reference count tells you how many things are referencing that object. Assigning a name like `x` to point to an object increases its reference count. If I create another name like `y` for the same object, that also increases the reference count. So, what the `del` keyword is doing here is actually just deleting the name `x`. This effectively just reduces the reference count of the object by one. But we don't want to delete the object if there are still things referencing it. So, `__del__` is actually supposed to be run when the object's ref count hits zero. If we didn't have the `y` variable, then `del x` here deletes the only reference to the object. So its ref count goes to zero, and it should be deleted. That's why `del` sometimes calls `__del__`. But that's kind of just circumstantial. But even once you understand that `del x` doesn't directly call `__del__`, there's still a lot of confusion around `__del__`. You might think that `__del__` is a great place to put cleanup code. Maybe I open up a file descriptor, and  then when nobody's using it anymore, close it. The main problem with this idea is that  the documentation for `__del__` explicitly states that it may just never be called. Not a great property for cleanup code. The first reason `__del__` may never be called is because of reference cycles. Suppose I have an `x` and a `y`. Maybe it's a tree structure, so `x`  has some children and `y` is one of them. Then, I include a back reference to `x` as the parent of `y`. So, `x`'s children have a reference to `y`, And then `y` has a reference back up to `x`. This is called a reference cycle. And when you have a reference cycle, it's impossible for either of the ref counts of `x` or `y` to hit zero. Even after this function returns. And it's impossible to access `x` or `y` ever again because they were just local variables. Their ref counts won't go to  zero because they're still referencing each other. Furthermore, even if an object's ref count hits zero, Python explicitly states that that doesn't guarantee the object's `__del__` is called at that time. Now, in CPython, which is the Python that 99% of you are going to be using, CPython does call `__del__` immediately when the ref count hits zero. And for reference cycles, there's a garbage collector which periodically runs and detects these kinds of cycles. As long as the interpreter didn't flat-out crash, CPython does a really good job of ensuring that the garbage collector runs and all of your objects' `__del__` methods are called. But still, the docs insist that this might not happen. So you can't really depend on it. Oh, and by the way, since the time that `__del__` is called isn't guaranteed, it's basically impossible for you to be able to  handle any errors that might propagate out of them. Therefore, the interpreter just completely  ignores exceptions raised in a `__del__` method. It'll still print the traceback. But as you can see, it still ran the code afterward. Oh yeah, and remember, if `__del__` is called at all, it might be called while the interpreter is shutting down. This `__del__` tries to  dump some state into a JSON file. Remember, exceptions get ignored if anything goes wrong trying to open this file. And then, if Python happens to wait until the interpreter is shutting down to call this `__del__`, then this `json` module, which is a global variable, may have already been deleted. In CPython, in practice, using even a relatively oldish version like 3.7, I haven't actually run into this issue. But once again, the documentation won't make any promises And says you need to watch out for this. What else could there possibly be that's more confusing about `del`? With all the lack of promises about even basic functionality, you wouldn't think `del` would support that many features. But it actually purposefully supports the idea of so-called resurrecting an object that's about to be deleted. Let's say this `__del__` method is called because there are no more outside accessible references to the `self` variable. Well, we can just put `self` into a global variable. Thereby increasing its reference count back up to 1. Of course, it's kind of your own fault if you do something like this. But Python guarantees you are allowed to do this. If you increase the ref count of an object inside `__del__`, then its memory won't be reclaimed by the garbage collector. It will survive until another day. But you know they're not letting you get away without throwing another curveball in there, right? If an object is resurrected, then when it dies again, its `__del__` method may be called, or it might not be called. So, you also need to make sure that your `__del__` doesn't do anything nasty if it gets called twice. I already mentioned that exceptions from `__del__` get ignored, right? Okay, so how about we just get to an example where it does something useful  and it should theoretically be okay to use? This class makes a temporary directory. When you create an instance, it creates an actual temp file. You can manually call `remove` on it. Or the file is removed when it's garbage collected. Notice that the first time we call `remove`, it sets `name` to `None`. This means that if `remove` is called multiple times, either by the user or  by `__del__` being called multiple times, nothing will happen after the first time. We're definitely depending here on CPython's unadvertised property that garbage collection will happen before shutdown. And it will happen early enough in shutdown that this module, `shell_util`, will still exist. I'm definitely not confident given all those restrictions. But it does seem to work with the current version of Python that I'm using. So, if `__del__` is so finicky, it might never get called, it might get called twice, it ignores exceptions, globals might no longer exist. How am I supposed to reliably clean up the resources used in my code? The most robust solution is to use a `with` statement. Define `__enter__` and `__exit__` methods. And make sure that you call your cleanup code in the `__exit__` method. Then you can use the class like this. We just say `with`, we create a new instance as `d`. Do whatever we need to do inside the `with` block. Then when the `with` block is over, Python guarantees that the `remove` method is called. The behavior of the `with` statement is much more precisely defined. And you can depend on it. This is slightly limiting though. We have to use the `with` statement. What if I don't know when I want to delete my temp file? Maybe I'm making a text editor. And I want to delete this temp file whenever the user is not looking at it anymore. That might not be confinable to a `with` block. But certainly, if Python has no more references to the object, then it's fine to delete. If you really want to support cases where you can't use `with` statements, there is another alternative. Let's leave our `__enter__` and `__exit__` methods. We still want to encourage people to do it the right way. Go ahead and delete the `__del__`. Then use the `weakref` module to create a finalizer for our object. We can manually  call the finalizer in our `remove` method. It's even fine to call it multiple times. We can also check if a finalizer has already run by checking its `alive` property. A finalizer does have some  of the same pitfalls as `__del__`. In particular, they have the same exception-ignoring behavior. However, Python makes much stronger guarantees that finalizers will run and when they'll run. In particular, when the interpreter shuts down, all remaining finalizers that are still alive and haven't been disabled will be called. And they'll be called in the reverse order that they were created in. It's also guaranteed that these finalizers will run early in the shutdown process, before any global variables like modules have been deleted. And, of course, by design, finalizers are allowed to be called multiple times. And their effects will only happen the  first time they're called. This makes them a more robust and dependable solution compared to  `__del__` when you can't use a `with` statement. I didn't really say much about  what weak references are. Normal references, like assigning the name `x` to a variable, increase the reference count. Those are called strong references. And they would keep  the object from being garbage collected. But we obviously don't want the finalizer of our object to keep it from being garbage collected. Its whole purpose is to run right before the object is garbage collected. So, it's actually possible to create so-called weak references that don't increase the reference count and don't prevent it from being garbage collected. If you want to hear more about them, make sure to leave a comment. That's all I've got. See you in the next one. And don't forget to slap that like button an odd number of times.
Info
Channel: mCoding
Views: 102,829
Rating: undefined out of 5
Keywords: python
Id: IFjuQmlwXgU
Channel Id: undefined
Length: 9min 59sec (599 seconds)
Published: Sun Jun 05 2022
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.