Welcome back to mCoding. I'm James Murphy. If you had to pick, what would you say
is your most disliked dunder method in Python? Maybe dunder new and all of the
metaclass glory that goes with that? Or maybe you've seen my video about
plus equals and don't like the in-place methods? Or hey, nobody likes taking the
time to write a `__repr__`, maybe that's it? Well, I've got a different choice. One that's so disliked in the Python community that if you so much as ask
what it does on Stack Overflow, you will be inundated with
comments telling you not to use it. What dunder is it that could
possibly harbor this much hate? It's none other than dunder del. So, in this video, let's check out `__del__`. See what it's supposed to do,
home in on some of its pitfalls. And, of course, see some
more robust alternatives at the end. First up, how does `__del__` relate to the `del` keyword? Well, there are three ways
that you can use the `del` keyword. You can say `del x[something]`. And that usage corresponds
exactly to calling the `__delitem__` method. You can call it with `del x.something`. And that corresponds
exactly to the `__delattr__` method. Finally, there's just `del x`, which
does not call the `__del__` method. This is our first hint
about why people don't like `__del__`. `__delitem__` and `__delattr__` are very straightforward. Nobody hates those. `str(x)` calls `__str__`, `repr(x)` calls `__repr__`, `len(x)` calls `__len__`. But `del x` does not call `__del__`, even though it kind of looks like it does.
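The on-screen code isn't reproduced here, but it's presumably something along these lines (a sketch; the class name `A` and the print messages are placeholders, not the actual on-screen code):

```python
class A:
    def __delitem__(self, key):
        print("__delitem__ called with", key)

    def __delattr__(self, name):
        print("__delattr__ called with", name)
        super().__delattr__(name)   # actually remove the attribute

    def __del__(self):
        print("__del__ called")


def main():
    x = A()
    x.attr = 42
    del x["some key"]   # del x[something]  ->  calls A.__delitem__
    del x.attr          # del x.something   ->  calls A.__delattr__
    del x               # only unbinds the name x; not a direct call to A.__del__


main()
```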
If I run this example, I see the `__del__` printout. But if I comment out `del x`, it still gets called. Let's throw in some print statements at the start, just before we call `del`, and at the end, so we see exactly when this `__del__` is being called. We see start, then just before `del`, then the `__del__` printout, and then end. So, it seems like `del` is triggering `__del__`. When we comment out `del x`, it appears
that it's getting called after the end of the function. So, is this like manual versus automatic garbage collection? Well, still no. If we create another variable `y` that refers to the same object (`y = x`), then even if we manually delete `x`, we don't see the `__del__` printout until after the end of the function.
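A sketch of that experiment, assuming the same placeholder class:

```python
class A:
    def __del__(self):
        print("__del__ called")


def main():
    print("start")
    x = A()
    y = x                  # a second reference to the same object
    print("just before del")
    del x                  # removes the name x; the object is still alive via y
    print("end")


main()
# Output in CPython:
#   start
#   just before del
#   end
#   __del__ called   <- only once main() returns and y goes away too
```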
to run before the object is deleted. But there's a big difference between the
object itself and the name that you assigned to it. Every object in Python has
what's called a reference count. The reference count tells you how
many things are referencing that object. Assigning a name like `x` to point
to an object increases its reference count. If I create another
name like `y` for the same object, that also increases the reference count. So, what the `del` keyword is doing
here is actually just deleting the name `x`. This effectively just reduces the
reference count of the object by one. But we don't want to delete the
object if there are still things referencing it. So, `__del__` is actually supposed to
be run when the object's ref count hits zero. If we didn't have the `y` variable, then
`del x` here deletes the only reference to the object. So its ref count goes to
zero, and it should be deleted. That's why `del` sometimes calls `__del__`. But that's kind of just circumstantial.
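You can watch the reference count directly with `sys.getrefcount` (a rough sketch; the exact numbers it reports can vary slightly between CPython versions):

```python
import sys


class A:
    def __del__(self):
        print("__del__ called")


x = A()
print(sys.getrefcount(x))   # typically 2: the name x, plus the temporary
                            # reference getrefcount holds on its argument
y = x
print(sys.getrefcount(x))   # typically 3: x, y, and the temporary reference
del x                       # drops the name x; the object survives via y
del y                       # drops the last reference -> "__del__ called" (in CPython)
```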
But even once you understand that `del x` doesn't directly call `__del__`, there's still a lot of confusion around `__del__`. You might think that `__del__`
is a great place to put cleanup code. Maybe I open up a file descriptor, and
then when nobody's using it anymore, close it. The main problem with this idea is that
the documentation for `__del__` explicitly states that it may just never be called. Not a great property for cleanup code. The first reason `__del__` may never
be called is because of reference cycles. Suppose I have an `x` and a `y`. Maybe it's a tree structure, so `x`
has some children and `y` is one of them. Then, I include a back
reference to `x` as the parent of `y`. So, `x`'s children have a reference to `y`, and then `y` has a reference back up to `x`. This is called a reference cycle. And when you have a reference cycle, it's impossible for either of the ref counts of `x` or `y` to hit zero, even after this function returns. And it's impossible to access `x` or `y` ever again, because they were just local variables. Their ref counts won't go to zero because they're still referencing each other.
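A sketch of that situation, with a hypothetical `Node` class standing in for the tree example:

```python
import gc


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []
        self.parent = None

    def __del__(self):
        print("__del__", self.name)


def make_tree():
    x = Node("x")
    y = Node("y")
    x.children.append(y)   # x -> y
    y.parent = x           # y -> x: this closes the reference cycle
    # when this function returns, the names x and y go away, but the two
    # objects still reference each other, so neither ref count hits zero


make_tree()
print("make_tree returned, no __del__ yet")
gc.collect()               # the cyclic garbage collector detects the cycle
print("after gc.collect()")
```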
Furthermore, even if an object's ref count hits zero, Python explicitly states that that doesn't
guarantee the object's `__del__` is called at that time. Now, in CPython, which is the
Python that 99% of you are going to be using, `__del__` does get called
immediately when the ref count hits zero. And for reference cycles, there's a garbage collector
which periodically runs and detects these kinds of cycles. As long as the interpreter didn't flat-out crash, CPython does a really good job of
ensuring that the garbage collector runs and all of your objects'
`__del__` methods are called. But still, the docs insist
that this might not happen. So you can't really depend on it. Oh, and by the way, since the time
that `__del__` is called isn't guaranteed, it's basically impossible for you to be able to
handle any errors that might propagate out of it. Therefore, the interpreter just completely ignores exceptions raised in a `__del__` method. It'll still print the traceback, but as you can see, it still ran the code afterward.
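A minimal sketch of that behavior:

```python
class Broken:
    def __del__(self):
        raise RuntimeError("boom")


x = Broken()
del x                    # CPython prints "Exception ignored in: ..." plus a
                         # traceback to stderr, but nothing propagates to us
print("still running")   # this line still executes
```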
Oh yeah, and remember, if `__del__` is called at all, it might be called while the interpreter is shutting down. This `__del__` tries to
dump some state into a JSON file. Remember, exceptions get ignored if
anything goes wrong trying to open this file. And then, if Python happens to wait until
the interpreter is shutting down to call this `__del__`, then this `json` module, which is a global variable, may have already been deleted. In CPython, in practice, even using a relatively oldish version like 3.7, I haven't actually run into this issue. But once again, the documentation won't make any promises and says you need to watch out for this.
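The class being described is presumably something like this sketch (the name `StateSaver` and the file path are placeholders):

```python
import json


class StateSaver:
    def __init__(self, path):
        self.path = path
        self.state = {"counter": 0}

    def __del__(self):
        # Two hazards here: any exception raised (e.g. the file can't be
        # opened) is silently ignored, and if this runs during interpreter
        # shutdown, module globals like `json` may already have been torn down.
        with open(self.path, "w") as f:
            json.dump(self.state, f)


saver = StateSaver("state.json")
```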
What else could there possibly be that's more confusing about `__del__`? With all the lack of promises about even basic functionality, you wouldn't think `__del__` would support that many features. But it actually purposefully supports
the idea of so-called resurrecting an object that's about to be deleted. Let's say this `__del__` method is called because there are no more outside
accessible references to the `self` variable. Well, we can just put `self` into a global variable, thereby increasing its reference count back up to 1. Of course, it's kind of your
own fault if you do something like this. But Python guarantees you
are allowed to do this. If you increase the ref
count of an object inside `__del__`, then its memory won't be
reclaimed by the garbage collector. It will survive until another day. But you know they're not letting you get away
without throwing another curveball in there, right? If an object is resurrected, then when it dies again, its `__del__` method may
be called, or it might not be called. So, you also need to make sure that your
`__del__` doesn't do anything nasty if it gets called twice.
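A sketch of resurrection (the global `graveyard` list is just an illustrative name):

```python
graveyard = []                    # module-level strong references


class Zombie:
    def __del__(self):
        print("__del__ called")
        graveyard.append(self)    # take a new strong reference to self:
                                  # the ref count goes back up, so the object
                                  # is "resurrected" instead of being freed


x = Zombie()
del x                             # __del__ runs, but the object survives
print(graveyard)                  # still alive, sitting in the graveyard
# Whether __del__ runs again when the object dies a second time is up to the
# implementation (modern CPython calls it at most once per object).
```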
I already mentioned that exceptions from `__del__` get ignored, right? Okay, so how about we just get to an example where it does something useful
and it should theoretically be okay to use? This class makes a temporary directory. When you create an instance, it creates an actual temp directory on disk. You can manually call `remove` on it, or the directory is removed when the object is garbage collected. Notice that the first time we call `remove`, it sets `name` to `None`. This means that if `remove` is called multiple times, either by the user or by `__del__` being called multiple times, nothing will happen after the first time.
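Reconstructed as a sketch (the name `TempDir` is assumed), the class is roughly:

```python
import shutil
import tempfile


class TempDir:
    def __init__(self):
        self.name = tempfile.mkdtemp()   # create a real temporary directory

    def remove(self):
        if self.name is not None:
            shutil.rmtree(self.name)
            self.name = None             # makes any later call a no-op

    def __del__(self):
        self.remove()                    # cleanup relies on __del__ running at all
```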
We're definitely depending here on CPython's unadvertised property that garbage collection will happen before shutdown, and that it will happen early enough in shutdown that this module, `shutil`, will still exist. I'm definitely not confident given all those restrictions. But it does seem to work with the current version of Python that I'm using. So, if `__del__` is so finicky,
it might never get called, it might get called twice, it ignores
exceptions, globals might no longer exist. How am I supposed to reliably
clean up the resources used in my code? The most robust solution is to use a `with` statement. Define `__enter__` and `__exit__` methods, and make sure that you call your cleanup code in the `__exit__` method. Then you can use the class like this: we just say `with`, create a new instance, and bind it to `d`. Do whatever we need to do inside the `with` block. Then, when the `with` block is over, Python guarantees that the `remove` method is called.
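In sketch form, with the same assumed `TempDir` name:

```python
import shutil
import tempfile


class TempDir:
    def __init__(self):
        self.name = tempfile.mkdtemp()

    def remove(self):
        if self.name is not None:
            shutil.rmtree(self.name)
            self.name = None

    def __enter__(self):
        return self                      # the value bound by `as d`

    def __exit__(self, exc_type, exc_value, traceback):
        self.remove()                    # runs whenever the with block exits


with TempDir() as d:
    print("working inside", d.name)
# remove() has been called by here, even if the block raised an exception
```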
The behavior of the `with` statement is much more precisely defined, and you can depend on it. This is slightly limiting, though: we have to use the `with` statement. What if I don't know when
I want to delete my temp file? Maybe I'm making a text editor. And I want to delete this temp file
whenever the user is not looking at it anymore. That might not be confinable to a `with` block. But certainly, if Python has no more
references to the object, then it's fine to delete. If you really want to support cases
where you can't use `with` statements, there is another alternative. Let's leave our `__enter__` and `__exit__` methods. We still want to encourage people
to do it the right way. Go ahead and delete the `__del__`. Then use the `weakref` module
to create a finalizer for our object. We can manually
call the finalizer in our `remove` method. It's even fine to call it multiple times. We can also check if a finalizer has
already run by checking its `alive` property.
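That version looks roughly like this; it's essentially the temp-directory example from the `weakref` documentation:

```python
import shutil
import tempfile
import weakref


class TempDir:
    def __init__(self):
        self.name = tempfile.mkdtemp()
        # The finalizer holds only a weak reference to self, so it doesn't keep
        # the object alive. It calls shutil.rmtree(self.name) when the object
        # is garbage collected or, at the latest, at interpreter shutdown.
        self._finalizer = weakref.finalize(self, shutil.rmtree, self.name)

    def remove(self):
        self._finalizer()                # safe to call any number of times

    @property
    def removed(self):
        return not self._finalizer.alive

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.remove()
```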
A finalizer does have some of the same pitfalls as `__del__`. In particular, they have the
same exception-ignoring behavior. However, Python makes much stronger
guarantees that finalizers will run and when they'll run. In particular, when the interpreter shuts down, all remaining finalizers that are still alive and haven't been disabled will be called. And they'll be called in the
reverse order that they were created in. It's also guaranteed that these
finalizers will run early in the shutdown process, before any global variables
like modules have been deleted. And, of course, by design, finalizers
are allowed to be called multiple times. And their effects will only happen the
first time they're called. This makes them a more robust and dependable solution compared to
`__del__` when you can't use a `with` statement. I didn't really say much about
what weak references are. Normal references, like binding a name `x` to an object, increase the reference count. Those are called strong references, and they would keep
the object from being garbage collected. But we obviously don't want the finalizer
of our object to keep it from being garbage collected. Its whole purpose is to run
right before the object is garbage collected. So, it's actually possible
to create so-called weak references that don't increase the reference count
and don't prevent the object from being garbage collected.
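A tiny sketch of a weak reference in action:

```python
import weakref


class Thing:
    pass


t = Thing()
r = weakref.ref(t)      # a weak reference: does not increase the ref count
print(r() is t)         # True: calling the weakref gives back the object
del t                   # drop the only strong reference
print(r())              # None (in CPython, immediately): the object is gone
```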
If you want to hear more about them, make sure to leave a comment. That's all I've got. See you in the next one. And don't forget to slap that
like button an odd number of times.