This Is Why Python Data Classes Are Awesome

Captions
Data classes are pretty awesome. I did a video about data classes a while ago, but since then I've been using them much longer in Python and I've learned quite a lot about them. So in this video I'd like to revisit them, show you what you can do with them, and also cover a couple of really cool things that have been added to data classes in Python 3.10.

Before we start, I have something for you: a free guide that helps you make better software design decisions. It describes the seven steps that I take whenever I design a new piece of software, and I hope that by writing it down it's also helpful to you to get started. You can get it for free at arjancodes.com/designguide, and I've also put the link in the description of this video.

Data classes are mainly aimed at helping you write more data-oriented classes, as the name says. So what is that? Think of things like a class to represent a point, a vector, or any kind of simple data structure. This is very different from behavior-oriented classes: things like a payment service that exposes a number of methods that you call in order to process payments, or a button that can handle clicks and respond to other user inputs like hovering over it.

How does a data class help with representing data-oriented classes? Well, it adds a couple of convenient mechanisms: being able to represent the object as a string, easily compare the object with other objects, define the data structure easily, and add an easy initialization mechanism to it.

I'm going to start with a really simple example today, just to show you some of the possibilities with data classes. We have a function here called generate_id that generates a random string from uppercase characters; by default that's going to be a 12-character string. I'm not using it at the moment, I will do that later on. Then there's the class Person. It's also very simple: it has a name and an address that are set in the initializer and also
passed as an argument when you call that initializer. Then I have a main function that creates a person and prints the person, and we run the main function. Let's run this code and see what happens.

So this is the output. Unfortunately that's not very useful, because ideally, if you print a person, you want to see the name and the address on the screen. Otherwise, why would you print the person, right? Unfortunately, in Python, if you create your own class, add a couple of instance variables, and then try to print an instance of that class, this is basically what you're going to get: it's basically a memory address. Not really useful.

Now, you can define a __str__ dunder method in your class to indicate what should happen when we print an object. So what we can do is add a __str__ dunder method to the Person class. It's going to return a string, and let's use an f-string for this with self.name and self.address. Let's run this code again to see what happens. Now you see we actually get a string that makes some kind of sense: we have a better understanding of what the contents of the Person object are.

There are a couple of other things that might be useful if you have a Person class like this, like being able to compare a person with another person, to do sorting for example. Or you might want to add more fields to the person, like a gender, a city, state, zip code, phone number, email address, and so on. In the end, the Person class can actually get pretty complicated. The disadvantage of doing all this work yourself is that every time you need to add an extra field to Person, you need to add it as an argument so you can pass it to the initializer, you need to add it to the initializer body because it needs to be stored, you need to add it to the __str__ dunder method, and you need to make sure that if you compare persons you take that extra field into account, if that's applicable. So it complicates things, and especially for data-oriented classes
like this, there is the dataclass decorator, which helps you create these classes in a much simpler and quicker way. So what I can do, instead of writing all of this code myself, is simply turn Person into a data class. Let's import from dataclasses: we import the dataclass decorator and write that above the Person class. Now I can define my instance variables like so: name, which is a string, and address, which is also a string. The initializer is going to be generated by the dataclass decorator, and so is the string representation, so let's remove our own __str__ method. Now this is our Person.

Let's see what happens when we run this code. You see it prints the class name with the name and the address. That's because dataclasses generated a __repr__ dunder method for us, which print falls back on when there is no __str__ method. That's pretty useful. What I also like about this is that by defining the variables in this way and providing the type, it's actually really short. We can write these classes really quickly, so I use them quite a lot in my code.

One thing I don't really like about data classes is that they abuse the concept of a class variable to represent instance variables. This can be confusing, especially for beginners. And if you forget to write the dataclass decorator above the class definition (it has happened to me a couple of times), then you end up with a bunch of class variables and not instance variables, leading to all kinds of problems in the future. But overall I find that the pros of data classes outweigh the cons, and I still use them quite a lot in my Python code.

So let's look at a couple more things you can do with data classes that I think are really useful. One is that you can assign default values. Let's say we want to keep track of whether a person in our system is active or not. I could add a boolean instance variable here called active, and now I can provide a default value; for example, I can set that to True. Now if I run the code again, we have another person, but now it has active equal to True by default. And because it's a default value, we don't have to
provide it to the initializer.

Now, for primitive types like booleans, integers, floats, and strings, this way of using default values works pretty well. But what if we have something a little more complicated? For example, let's say we want to have a list of email addresses, so the type is a list of strings. How do we provide a default value? You would think: okay, let's just assign an empty list. But that's actually going to lead to problems, because Python evaluates these default values once, when it interprets the script. That means that no matter how many instances of Person you have, it's always going to be the same reference to the list, so each person would have the same list of email addresses, which is problematic.

In order to solve that, dataclasses provides a factory mechanism that you can use instead. To use it, you need to specify that this is a field, which is also from dataclasses (you see it at the import here), and inside that we provide a default_factory. In this case we want to create a new list, so we supply list. What happens is that when the data class creates an object, it's going to call this function. So it's a function: we don't provide a value here, we provide a function.

I can show you another example. Let's say we want to add an id, which is basically a string. We can provide a default value here as well, and we could do that with a fixed string, but let's say that we want the person to have a random id when we create that person. We can use exactly the same mechanism: I create a field, I have my default_factory, and there we need to provide the function that's going to generate the default value for us, and that's going to be generate_id. Let's run this code and see what it looks like. If you run this now, you see we have a person that has a name, an address, active, email addresses as an empty list, and a randomly generated id. Very simple.

Now, with these default values, you can still set them as part of the initializer.
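The Person class as described up to this point can be sketched roughly as follows. This is a reconstruction from the narration, not the code shown on screen: the body of generate_id, the field name email_addresses, and the sample names and addresses are assumptions, and list[str] needs Python 3.9 or newer.

```python
import random
import string
from dataclasses import dataclass, field


def generate_id(length: int = 12) -> str:
    """Return a random string of uppercase letters (12 characters by default)."""
    return "".join(random.choices(string.ascii_uppercase, k=length))


@dataclass
class Person:
    name: str
    address: str
    active: bool = True  # immutable default, safe to share between instances
    # default_factory is called per instance, so every Person gets its own list.
    # Writing `= []` here instead makes dataclasses raise a ValueError.
    email_addresses: list[str] = field(default_factory=list)
    # generate_id runs each time a Person is created without an explicit id.
    id: str = field(default_factory=generate_id)


def main() -> None:
    # Sample values below are made up for illustration.
    first = Person(name="John", address="123 Main St")
    second = Person(name="Jane", address="456 Side St")
    first.email_addresses.append("john@example.com")
    print(first)
    print(second)  # the second person's list is unaffected by the append


if __name__ == "__main__":
    main()
```

Because the factories run per object, the two persons end up with separate lists and (almost certainly) different ids.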
So, for example, I could still pass active=False, and now if I run this code again, you see that the person is initialized with active equal to False. The same goes for any other attribute as well. For example, I could also provide a custom id, and now when I run this, it's going to have that custom id, and the data class is not going to call the default factory for us, because we provided a value.

Let's say that we want to restrict this a little bit: we want to make sure that you're not able to set an id explicitly. Well, that's problematic at the moment, because id is part of the initializer that dataclasses generates. In order to avoid users being able to set the id themselves by calling the initializer, you can add an option to the field: init=False. That basically means that this instance variable is not going to be part of the initializer. So let's save this, and now if you try to run it, you see we got an "unexpected keyword argument 'id'" error. So we can't provide the id anymore, and that's what init=False does.

So for id the solution works: we simply call a function that generates a random id. But sometimes you want to generate a value from the other instance variables. How do you do that? You can't use a default factory for that, because it doesn't have access to those values yet. Well, that's when the __post_init__ dunder method comes into play. Let's say, for example, we have something called a search string. We have a Person class, this is going to be a huge database, and we want to be able to search for persons. The easiest way to do that is, when we create a person, to create a search string that contains the things that we want to be able to search on, like the name and the email address, or the name and the address. So we have a search_string instance variable for that. Now, obviously, we don't want search_string to be an argument to the class initializer, so I'm
going to mark it as init=False. When we do this, it's not part of the initializer, and we need to make sure ourselves that it gets initialized. So let's add a __post_init__ method. This method runs after initialization, so we know that the other instance attributes have a value, and we can now construct the search string from these values. For example, something like this: self.search_string equals, let's say, a string built out of the name and the address. Now let's run the program again. You see it prints the person with name, address, and active, it generates an id, and the search string is constructed from the name and the address that we provided to the initializer. So that's pretty neat.

One more thing you might want to do in your classes is make some distinction between what is supposed to be publicly available as part of your data and what is a more protected or private element. What you can do to indicate that search_string is something internal to the class is add underscores in front of it: two for private and one for protected. Generally I just stick to one underscore. I think that's enough, and it clearly indicates that _search_string is not something that you're supposed to change outside of the class. When you run this, we get of course exactly the same result, except that the search string is now a protected member variable.

The issue is that if you look at the output of the program, it prints the search string here. That's not really useful information, because it's just a copy of the other things that are already part of the person, and it's an internal thing of the person. So when we print a person, maybe we want to exclude the search string from that printed version. What you can do is indicate that in the field by saying repr=False. Now when we print the person, you see that the search string is omitted, but it's still part of the person. So as you can see, with fields
and the various options of fields, you can do quite a lot of different things.

There's one more thing I want to show you, and then I want to talk about some new things you can do with data classes in Python 3.10. One thing that I find quite useful is being able to freeze a data class. To achieve that, you can pass an argument to the decorator called frozen and set it to True. By default this is False, so by default a person is not frozen. What you can do if it's not frozen is this: person.name = "Arjan". Now of course we can print the person, and the person's name is Arjan. But when we freeze the data class, this means that once we've created the object, we can no longer change it: it's read-only. So if we try to run this code now, we're going to get an error: we can't assign to the field.

That's also pretty useful, because in many cases you want to make sure that your data is not mutable. Making things immutable generally simplifies your code, because you don't have to worry about whether something has changed or not: it's constant. That's why I really love using constants wherever I can. Now, Python doesn't have a constant mechanism for primitive types. This kind of comes close for objects, but it's only about the contents of the object. Of course I could still assign a new Person instance to the person variable; there's no way to get around that. If you look at other languages, they often have a const concept so that you can also no longer assign something else to the variable. I find that pretty useful, and it's a pity that's not available in Python.

So that's what I've just shown you: simple data classes, providing default values, using a default factory like list or supplying your own function to provide the default value, excluding some information from what you're printing, and using __post_init__ to augment the object with extra generated data after it's been initialized. I think this covers about, I
don't know, 95 percent of the things that I generally do with data classes. There's a bunch of other things you can do with data classes. I won't cover all of those details in this video, but I want to talk about a few things that I think are interesting that have been added in Python 3.10. So if you're on Python 3.10 or newer, these things are available to you, and they can be quite helpful.

The first thing that I want to talk about is the kw_only argument that you can pass to the dataclass decorator. Let me just delete this frozen stuff here again. Since Python 3.10, you can pass an argument kw_only and set that to either True or False. By default this is set to False, so let's set it to True. What this means is that you can only initialize an object of this type Person by supplying keyword arguments. This is what we're already doing, so when I run this code there is actually no issue: it simply prints the person. But what I'm not allowed to do is remove these keywords and simply initialize the person by providing the arguments directly as non-keyword arguments. Now you see we're going to get a TypeError, so this is not possible anymore because we set kw_only to True. If I set this to False again, then you see it's going to run without any issue. So kw_only allows you to force somebody that uses the Person class to actually supply the keywords. Let me put these keywords back in there, and now this is going to work as expected again.

A second thing that's been added in Python 3.10 is the match_args option. Python 3.10 introduced structural pattern matching. I actually did a video about structural pattern matching a while ago, and if you want to watch that video, I'll put a link at the top. What this option does is that if it's set to True, the data class is going to generate the __match_args__ dunder attribute, which supplies the arguments that you can
use in structural pattern matching. So that can be useful. match_args is set to True by default, so you can actually switch it off if you're not using structural pattern matching. Basically, it means data classes support structural pattern matching out of the box, but using match_args you can disable that if you want to.

The third thing that's new in 3.10, and that's actually a real game changer, is the slots option. Under the hood, a Python class is actually a kind of really advanced dictionary. When you create an instance of a class, there's going to be a __dict__ object that contains the references to all the instance variables. Like here, I have a person, and you can see there is a __dict__ right here. This is actually a dictionary from the name of the instance variable, a string, to its value. So you could actually print person.__dict__["name"], and this is going to access the instance variable name. If I run this code, you see that it actually prints the name of the person, and that's because under the hood in a class these instance variables are stored in a dictionary.

Now, dictionary access is pretty fast, but there's actually a faster mechanism in Python called slots that makes accessing instance variables in the class much faster, because it works in a much more direct way. By default, classes use the __dict__ object to access instance variables, so when you have a data class, its instances get this __dict__ object as well. If you want to benefit from faster access, you can indicate that slots should be True, and now the data class is not going to use the __dict__ object but slots instead. This still works exactly the same way, but instance variable access is just going to be much faster.

How much faster are slots exactly? Well, I made a little example to show you the difference. I have a class here, Person. It's a data class with slots=False, which is the default way of creating
a data class, so this uses the __dict__ object. And I have a data class PersonSlots that has slots set to True. Otherwise it's exactly the same, but it uses slots. (There's one more class here that I'll come to in a minute.) Then I have a function that does a get, a set, and a delete, just performing some operations. It takes either a Person or a PersonSlots object; this is the new union syntax in Python 3.10. And I have a main function here that creates a Person and a PersonSlots with exactly the same data. Then I'm going to repeat the get-set-delete call on the Person and the PersonSlots lots of times, and I'm going to time it using the timeit library. In the end I print the output and compute the percentage improvement of slots over no slots.

So let's run this and see what that looks like. As you can see, there is a performance improvement here of over 20 percent. If you're doing a lot of data processing, this actually makes a big difference: a 20 percent performance improvement is quite a lot.

There is an issue, though. You might think: hey, why are we not using slots everywhere? Why is __dict__ actually the default way to do it in Python? Well, the problem is that Python is a very extensible language, and that has a price. This is the price that you pay for that. The issue is that slots break when you try to use multiple inheritance. I have an example that I want to show you. Let's say we have an EmployeeSlots class that also uses slots, and now I'm going to create a class called PersonEmployee that uses multiple inheritance, combining both PersonSlots and EmployeeSlots. What happens if we run this code? We get this error, because there's a conflict between the two base classes that we have here: they both use slots, and now PersonEmployee doesn't know which list of slots it should rely on for the inheritance relationship.

In my opinion, this is yet another reason not to use things like mixins and multiple inheritance. Actually, in my course, The Software Designer
Mindset, I talk at length about why you shouldn't use those things, and I also give a lot of background on what it means in terms of cohesion and coupling in your design. But here you see very clearly the price that you pay in Python for having complete freedom to do whatever you want. Because Python can't just drop support for multiple inheritance and those kinds of things, they can't make slots the default way of defining a class. That means that by default, if you do nothing, you're going to pay a 20 percent cut in performance, even if you don't use multiple inheritance. If you're aware that you can define your classes using slots, then you get the performance improvement, but you have to do some work for that, and that's really a pity.

I would actually be in favor of Python adding a strict mode, similar to what JavaScript has, where basically a couple of things are going to be limited, like not being able to use multiple inheritance in strict mode. But because of that, you can use an optimized version of Python that does have slots by default. So the strict mode is going to be more limiting, but because of that, Python can make assumptions about how you're going to write your scripts, and that means it can optimize the performance more.

I hope you enjoyed this more in-depth video about data classes. If you did, give this video a like. It really helps support the algorithm. And let me know in the comments what you thought about it. If you want to learn more about software design and development, consider subscribing to my channel. Thanks for watching, take care, and see you next time.
Info
Channel: ArjanCodes
Views: 769,837
Keywords: python data class, python data classes, python dataclasses, data classes, python classes, python tutorial, data class, data classes python 3.7, python data class tutorial, tutorial data class, python dataclass, learn python, classes python 3, dataclasses python 3.6, python storing data, dataclasses python 3.7, python programming, data classes python, data classes vs data, python programming course, data vs behavior class, classes python, Python programming basics
Id: CvQ7e6yUtnw
Length: 22min 19sec (1339 seconds)
Published: Fri Mar 25 2022