Python __slots__ and object layout explained

Captions
Hi, I'm James Murphy, and today we're talking about slots in Python. Slots are primarily a tool for saving memory when you have a lot of really small objects. Using them is pretty easy, and we'll go over that briefly. But the main point of this video is to help you understand what slots actually are and how they work. That means we're also going to learn a little bit about what makes up an instance of a class by default, a little bit about descriptors, and a little bit about how classes and instances are laid out in memory.
I'd like to thank me for sponsoring myself. Did you know that I am available for consulting, contracting, training and interview prep services? I also now accept bitcoin for all business purposes including donations.
By default, instances of a class act a lot like dictionaries, except you access attributes with dot syntax instead of square brackets. And that's no coincidence. Every instance of a normal class has a double-underscore, or dunder, dict attribute (__dict__) where you can actually view this dictionary. Notice that this dictionary only contains instance attributes. So, it contains the x that was assigned on the instance. But it does not contain the class variable v. The variable v is actually stored in the capital A class object rather than in any little a instance.
The way that attribute access works is that Python first looks in this dictionary and, if the key is found, returns that value. If Python can't find what it's looking for in the instance dictionary, then it'll start looking in the class, then its parent classes and their parent classes, and so on all the way up the chain, and it raises an AttributeError if it doesn't find anything. So, in this case, a.v is found in the capital A's dictionary instead of the little a's dictionary. You get an error if you try to access something that's not in the instance, the class, or any of the parent classes.
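Here is a minimal sketch of the lookup just described. The names A, v and x follow the video; the concrete values are made up for illustration.

```python
class A:
    v = 42  # class variable: stored on the class object A

    def __init__(self, x):
        self.x = x  # instance attribute: stored in the instance's __dict__


a = A(1)

print(a.__dict__)         # {'x': 1}: only the instance attribute
print("v" in a.__dict__)  # False: v is not stored on the instance
print("v" in A.__dict__)  # True: v lives in the class's dictionary
print(a.v)                # 42: not on the instance, so found on the class

try:
    a.missing  # not on the instance, the class, or any parent class
except AttributeError as exc:
    lookup_error = exc
    print("AttributeError:", exc)
```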
But you can add new attributes at runtime, because dictionaries are dynamically allocated and you can add new elements to them no problem.
Now let's define a class that uses slots. It's very simple. You just add a dunder slots attribute (__slots__) onto the class. It should just be a tuple of the names of your instance attributes. In this case, our only instance attribute is x. v is a class variable, not an instance variable. The first thing to note about using a slotted class is that instances no longer have this dunder dict attribute. Instead of using a dictionary, Python will just create the space for each slot directly on each instance. You can still look up slotted variables and change them as usual. Slotting does not affect mutability. The main difference in functionality with a slotted class is that you can no longer add new attributes at runtime. All the attributes are decided when the class is defined. However, keep in mind that the class is different from an instance. So, I can still set class variables on the class even though I couldn't set them on the instance.
So, it seems that slotted classes are just less functional classes. Why would you ever want to use one? Let's take a look at the memory usage. Here, I'm using sys.getsizeof() to get the size of an instance of A and an instance of B. Here, A did not use slots and B did. So, the A instance uses 48 bytes and the B instance uses 40 bytes. Okay. So, that's a little bit less memory. But sys.getsizeof() doesn't give us the full picture. sys.getsizeof() doesn't count the size of sub-objects within an object. So, it's completely ignoring the actual size of the dictionary. If instead I use a recursive getsize() function that actually counts the size of sub-objects and their sub-objects and so on, then we see much bigger numbers. The normal class used 230 bytes whereas the slotted class only used 68. That's a savings of more than a factor of three just for knowing the names of your variables ahead of time. Is it worth it?
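The comparison above can be sketched roughly as follows. The recursive getsize() used in the video is not shown in the captions, so deep_getsize below is a hypothetical stand-in that only follows instance dictionaries and slots. Exact byte counts depend on the Python version and platform, so none are claimed here.

```python
import sys


class A:  # normal class: instances carry a __dict__
    v = 0

    def __init__(self, x):
        self.x = x


class B:  # slotted class: instances have no __dict__
    __slots__ = ("x",)  # names of the instance attributes
    v = 0               # class variable, not a slot

    def __init__(self, x):
        self.x = x


def deep_getsize(obj, seen=None):
    """Hypothetical helper: sys.getsizeof plus everything reachable
    through the instance __dict__ and through slots."""
    if seen is None:
        seen = set()
    if id(obj) in seen:  # do not count the same object twice
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_getsize(key, seen) + deep_getsize(value, seen)
    elif hasattr(obj, "__dict__"):
        size += deep_getsize(vars(obj), seen)
    for cls in type(obj).__mro__:
        slots = getattr(cls, "__slots__", ())
        if isinstance(slots, str):  # a single slot may be given as a string
            slots = (slots,)
        for name in slots:
            if hasattr(obj, name):
                size += deep_getsize(getattr(obj, name), seen)
    return size


a, b = A(1), B(1)

b.x = 5   # slotted attributes are still mutable
B.v = 10  # class variables can still be set on the class

try:
    b.y = 2  # but new instance attributes cannot be added
    can_add_new_attribute = True
except AttributeError:
    can_add_new_attribute = False

print(hasattr(a, "__dict__"), hasattr(b, "__dict__"))
print("shallow:", sys.getsizeof(a), sys.getsizeof(b))
print("deep:   ", deep_getsize(a), deep_getsize(b))
```

Note that small integers and short strings are shared between objects in CPython, so a deep count like this one overstates what each additional instance really costs.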
The difference gets even bigger the more attributes you have. In this case, we have seven attributes. And we compare not slotted versus slotted and these are the results. With seven attributes, which I'd say is an upper middle amount for an average class, we have a nearly five times memory savings. Personally, I'm rarely memory constrained in Python. I have 32 gigs of RAM and I don't usually work with data that big. But in the rare case that I'm actually using lots and lots of small objects, a five times memory savings is a pretty good deal. Thankfully, if I'm using something like a named tuple, this happens automatically.
I also tested for speed differences in creating objects, getting attributes and setting attributes on slotted versus not slotted classes. I found slotted classes to be faster. But just a little bit. Nothing to redesign your code around.
So, if instances of slotted classes don't use instance dictionaries, then how do slots actually work? Here, I have a not slotted and a slotted class. And then I print out the dictionaries of both the classes. Note that it's the instances of slotted classes that don't have dictionaries. The class itself still has a dictionary. Notice that the class object capital B already has an x attribute even though I've never assigned to it. Well, printing it out didn't really tell us much. But it turns out that this object is actually a descriptor.
Descriptors, which you could write yourself, allow you to modify the way that attribute access happens. When Python looks up an attribute, if it finds that the resulting object has a get method (__get__), it'll actually call that method instead of just returning the object. When I try to access little b.x, it doesn't find the x in the little b object. So, it looks in the class. Since capital B.x has a get method, it goes ahead and calls it with the instance being the little b and the owner being the big B.
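To make the descriptor step concrete, here is a small sketch assuming CPython, where the object stored under B.__dict__["x"] is a member descriptor. The values 10 and 20 are arbitrary.

```python
class B:
    __slots__ = ("x",)


b = B()
b.x = 10

# The class dictionary already holds an entry for x: a descriptor.
descriptor = B.__dict__["x"]
print(descriptor)                 # on CPython: <member 'x' of 'B' objects>
print(type(descriptor).__name__)  # on CPython: member_descriptor

# Reading b.x ends up calling the descriptor's __get__ with the
# instance (b) and the owner (B).
value_via_get = descriptor.__get__(b, B)
print(value_via_get)

# Assigning to b.x ends up calling the descriptor's __set__.
descriptor.__set__(b, 20)
print(b.x)
```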
The real member descriptor is written in C and it will directly reach into the underlying memory underneath the instance object and return the value of x. It doesn't have to look up x in a dictionary. It just remembers that all instances of capital B store their x's at a fixed offset relative to the base of the instance. We'll see more on this in a bit. And a similar thing happens when you try to assign to x. It calls the set method (__set__). And then, it uses C to set the value at a fixed offset within the object.
Okay. So, there's some descriptor magic going on, some stuff that's happening in C. But what actually is a slot? To understand what a slot actually is, we need to enter the matrix. We need to understand how Python objects are laid out in memory. The actual numbers that we're looking at may depend a little bit on the architecture of your computer. On my computer, sizes take up 8 bytes and pointers take up 8 bytes.
So, here I have a slotted class A with no slots. Every instance of A is laid out in memory like this. All Python objects start out with a reference count and a pointer to their type. They could have other things. But for this class A, there's nothing else. The basicsize of a class (__basicsize__) tells you how much memory an instance of the class takes. Our class A is just a size and a pointer: the reference count and a pointer to the type. So, that's 16 bytes. And when I run it, we see that indeed A's basicsize is 16.
Similar to sys.getsizeof(), basicsize doesn't include the size of any sub-objects. It's basically just counting up the size of the pointers and the size. sys.getsizeof() actually counts a little bit more though. For the purpose of garbage collection, Python actually puts two pointers just before the object starts. And getsizeof() actually counts those two pointers. So, sys.getsizeof() is going to count 1, 2, 3 pointers and 1 size. That's four times eight is 32 bytes. And that's exactly what we see when we run it. But these pictures are big enough.
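The header arithmetic above can be checked with a short sketch. It assumes a standard CPython build; the 16 and 32 quoted in the video are for a 64-bit machine and the Python version used there, so other builds may print different numbers.

```python
import struct
import sys


class A:
    __slots__ = ()  # slotted class that defines no slots


pointer_size = struct.calcsize("P")  # size of a C pointer
size_size = struct.calcsize("n")     # size of a Py_ssize_t (the "size" above)

# Reference count (a size) plus the pointer to the type.
print("expected header:", size_size + pointer_size)

# With no slots, no __dict__ and no __weakref__, the class adds
# nothing on top of what a plain object() needs.
print("A.__basicsize__:", A.__basicsize__)
print("object.__basicsize__:", object.__basicsize__)

# sys.getsizeof may report more than __basicsize__ because it also
# counts any garbage collection header placed before the object.
print("sys.getsizeof(A()):", sys.getsizeof(A()))
```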
So, let's just focus on the instance and ignore the garbage collection stuff. The last class defined the slots variable but it was empty. Here's what happens if you actually define slots. Again, all objects start with their reference count and a pointer to their type. Directly following that is just a pointer for each slot. So, that's all the slot is. A slot is really just a piece of memory that you use for storing a particular piece of data about the instance. One slot holds x, just as one slot holds the reference count and another holds the type. So, really you should think of the dunder slots variable as naming the extra slots that this class defines. This is also in line with how it actually works with inheritance. So, for each instance, we have one size and four pointers. That's five times eight is 40 bytes. If you include the garbage collection stuff, then that's 16 more bytes and you get 56.
Let's look at the layout of just a normal class now. As always, we have the reference count and pointer to the type. Then instead of having a space for x, y and z directly, we have a space for the instance dictionary. x, y and z don't actually show up anywhere in the instance. You have to look inside the data for the dictionary and that's where you would find x, y and z. After they're initialized, of course. There's also going to be this weak reference attribute (__weakref__), which would be part of the object by default if you didn't define slots. Explaining weakref is a topic for a different video. Just know that defining slots gets rid of it. Go ahead and make sure you can explain why the basic size of this object is 32 bytes.
So, what about inheritance? What happens if you inherit from a class that defines slots but the subclass doesn't itself define slots? Well, being a slotted class is not inherited. So, even if B inherits from A which defines slots, B will get an instance dictionary. But remember, you should think about the dunder slots variable as being extra slots.
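To check the layout arithmetic from a moment ago (one pointer per slot on top of the reference count and type pointer), here is a small sketch. The class names Slotted and Normal are made up for illustration. The basicsize of the normal class is only printed, not asserted, because where CPython keeps the dictionary and weakref pointers has changed between versions.

```python
import struct


class Slotted:
    __slots__ = ("x", "y", "z")


class Normal:
    def __init__(self):
        self.x, self.y, self.z = 1, 2, 3


pointer_size = struct.calcsize("P")  # size of a C pointer

# Each slot adds exactly one pointer on top of the basic object header.
slot_bytes = Slotted.__basicsize__ - object.__basicsize__
print("bytes used by slots:", slot_bytes)
print("pointers used by slots:", slot_bytes // pointer_size)

# The normal class keeps x, y and z in the instance dictionary instead.
n = Normal()
print(vars(n))
print("Normal.__basicsize__:", Normal.__basicsize__)  # version dependent

# Defining __slots__ removes both the dictionary and the weakref support.
s = Slotted()
print(hasattr(s, "__dict__"), hasattr(s, "__weakref__"))
print(hasattr(n, "__dict__"), hasattr(n, "__weakref__"))
```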
So, B didn't define any extra slots. But it still has the ones from A. So, if I set b.x = 10, that will not show up in the instance dictionary of b. The x variable was a designated slot from A. So, it will get stored in its designated spot from A. If I do want to define more slots in a subclass, I should only include the additional ones. So, in this case, I add a t slot and the x, y and z ones will still be there from A.
For anyone wondering about metaclasses, you will get an error if you try to define nonempty slots on a metaclass. You can make it have empty slots though.
Finally, you can actually define __dict__ as one of the slots. If you do, instances will get instance dictionaries. In this case, instances will have a dict. But they won't have that weakref attribute. I could also define a __weakref__ slot if I wanted weakrefs but not dictionaries. As we saw before, empty slots will prevent the creation of both weakrefs and dictionaries. And if you don't define slots, then instances will have both weakrefs and dictionaries. The last case is if you have something like a dictionary along with additional slots. In this case, objects will have instance dictionaries. But specifically, the attributes x and y won't be stored in the instance dictionary. The legitimate use cases for doing something like this are pretty slim. But that's the way it works.
As always, thank you for watching and don't forget to slap that like button an odd number of times.
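The inheritance and special-slot cases above can be sketched like this. Classes A and B and the slot names x, y, z and t follow the video; C, D, E and Meta are names made up for illustration.

```python
import weakref


class A:
    __slots__ = ("x", "y", "z")


class B(A):
    pass  # no __slots__: instances get a __dict__ but keep A's slots


class C(A):
    __slots__ = ("t",)  # list only the additional slot


class D:
    __slots__ = ("x", "y", "__dict__")  # slots plus an instance dictionary


class E:
    __slots__ = ("x", "__weakref__")  # weakrefs but no dictionary


b = B()
b.x = 10     # goes into the slot inherited from A
b.extra = 1  # goes into the instance dictionary
print(b.__dict__)  # {'extra': 1}

c = C()
c.x, c.t = 1, 2
print(hasattr(c, "__dict__"))  # False

d = D()
d.x = 1      # stored in the slot
d.other = 2  # stored in the instance dictionary
print(d.__dict__)  # {'other': 2}
print(hasattr(d, "__weakref__"))  # False

e = E()
ref = weakref.ref(e)  # works because of the __weakref__ slot
print(ref() is e)     # True

try:
    class Meta(type):
        __slots__ = ("x",)  # nonempty slots on a metaclass
    meta_error = None
except TypeError as exc:
    meta_error = exc
    print("TypeError:", exc)
```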
Info
Channel: mCoding
Views: 91,037
Keywords: programming
Id: Iwf17zsDAnY
Length: 10min 16sec (616 seconds)
Published: Sat Oct 23 2021