Pydantic - Nested Models, JSON Schema and Auto-Generating Models with datamodel-code-generator

Video Statistics and Information

Captions
In this video we're going to continue learning about Pydantic, and we're going to dive deeper into what the library can offer, including looking at nested models that form parent-child relationships. We're going to look at how to use the Literal type from Python's typing module in order to constrain values. We're also going to look at how to define default values for particular fields on your Pydantic model. And finally we're going to look at something called JSON Schema: we're going to see how to create JSON schemas from Pydantic model classes, and we're also going to see the other way around. We're going to take a JSON schema that's out in the wild and convert it to a Pydantic model using a tool called datamodel-code-generator. That gives you flexibility: if you have a JSON schema in the wild, you can create Pydantic classes automatically and start using them in your code. So let's get started.

Now, I have a blog post here. This is a continuation from the previous post and the previous video, and what we're going to do is start by building nested Pydantic model classes. What we had in the previous video is a Student model, and that's a Pydantic model; as you can see, it inherits from BaseModel. In VS Code I've refactored this a little bit and moved the model into a models.py file. So the Pydantic model now lives in this file, and it's the same as we had at the end of the previous video: it contains these fields here as well as a custom validator function to ensure that the student is above the age of 16.
Now, in addition to that, one of the fields, the department field, is an enum value, and that's defined here in this enums.py file. Finally, the main.py file is where we fetch the data from GitHub, and in a for loop we convert each record in that data to a Student model. That data comes from this repository here. In the previous section we were just working with the student version 1 file, which contains a flat record for each student in the data. What we're going to do in this video is move on to the version 2 file, and that's this one here. As you can see, each student has another field in the JSON data, and that's a modules list, and each module is modeled by an object as well. So each item in this list of modules is an object, or a dictionary, with key-value pairs. What we're going to do in this video is validate that as part of our incoming data, and that's going to involve nesting a model within our existing Student model. We have the Student model here; we need to add a new key for the modules, and that's going to have to reference another Pydantic model. We're going to see how to link these up in the first part of this section.

So if we go back to our blog post and scroll down, you can see one example record from the module objects. Each module has an ID, a name, a professor and credits, as well as a registration code. Now, to help us learn more about Pydantic in this video, we're going to create the following constraints. Firstly, the ID field can either be an integer, as it is above, or it can be a UUID. Remember, if we go back to our Student model, that has an ID too, which is always a UUID object, but for the module object we're going to model the ID as either an integer or a UUID. The second constraint is that the credits field can only have values of either 10 or 20.
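A minimal sketch of the nesting described above, assuming Pydantic is installed. The Student here is cut down to two fields for illustration, the sample record is made up, and registration_code is an assumed spelling of the registration code key; the models.py shown in the video has more fields and validators. Union[int, UUID] is the spelling that also works before Python 3.10, where int | UUID is not available.

```python
from typing import List, Union
from uuid import UUID

from pydantic import BaseModel


class Module(BaseModel):
    # The id may arrive as an integer or as a UUID string.
    id: Union[int, UUID]
    name: str
    professor: str
    credits: int  # plain integer for now; constrained to 10 or 20 later
    registration_code: str  # assumed key name, not confirmed by the source


class Student(BaseModel):
    # Cut-down stand-in for the Student model from the previous video.
    id: UUID
    name: str
    # Falls back to an empty list when the "modules" key is missing.
    modules: List[Module] = []


# Hypothetical record shaped like the version 2 data described above.
record = {
    "id": "d15782d9-3d8f-4624-a88b-c8e836569df8",
    "name": "Example Student",
    "modules": [
        {"id": 1, "name": "Web Development", "professor": "Prof. A",
         "credits": 20, "registration_code": "abc"},
        {"id": "0b1d2c3e-4f5a-4b6c-8d7e-9f0a1b2c3d4e", "name": "Machine Learning",
         "professor": "Prof. B", "credits": 10, "registration_code": "def"},
    ],
}

student = Student(**record)
for module in student.modules:
    # Each nested dictionary has been validated into a Module instance.
    print(type(module).__name__, type(module.id).__name__)

# A record without a "modules" key still validates and gets the default.
no_modules = Student(id=record["id"], name="No Course Yet")
print(no_modules.modules)
```

Each dictionary in the modules list is validated against Module when the Student is constructed, so a bad nested record makes the whole Student fail validation.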
We can't allow just any integer for that field; it's going to have to be either 10 or 20. We're going to use Python's Literal type to do that later on. There are two more constraints here. If a student has not chosen a course, in other words if the course is null, then the modules list will not exist in the data. And finally, if a student has modules in the data, there must be exactly three modules for the academic year. So if we have modules in a course, we can only have a total of three modules, and we're going to write a custom validator function for that particular condition later in the video.

So let's get started. We're going to see how to define a nested model in Pydantic. We're going to create another class here within our models.py file, and this class is going to be called Module. Again, because it's a Pydantic model, it will inherit from the BaseModel class, and just like the Student model it's going to have some fields. Firstly, it's going to have an id field. Remember we said that can either be an integer or a UUID object, so we're going to model that with a union type: it's going to be an int or a uuid.UUID object. Remember that in Python 3.10 and above, the pipe character (int | uuid.UUID) is a syntax for a union type. This means it'll try to convert the ID it receives to an integer; if that doesn't work, it will then try to convert it to a UUID, and if that doesn't work it's going to throw a validation error. So that covers the id field in our data. Let's go back to GitHub, and what we're going to do now is model the name. If we copy the name of that field and go back to our model, we're going to define that as a very simple string. The name of the module could be something like web development or machine learning, so we'll just model it as a string. Let's move on to the next field. Back on GitHub you can see that we have the professor field, and just like the name that's just going to be a string. Now we have two more
fields in our data. Let's go back to GitHub. We have credits, which is an integer, and we're going to model that for now as just being an integer. But remember that we said it can only be 10 or 20, so later on we're going to change the model and make sure that it can only accept those two values. The final field, if we go back to GitHub, is this one here, the registration code, and we're going to model that as a string. So that's our simple Pydantic model for the modules that we're going to have in this application. What we're going to do now is copy the name of the class and reference it in the Student model, because remember the whole point of these nested models is that the student we have as a record here contains this field called modules, which is itself a list of objects. So we're going to type hint the modules field as a list of these new Module objects, and we can give that a default value of an empty list, which means that if we don't have a key called modules in our data, the value will default to an empty list. That's one way to define a default value with Pydantic: after defining the type, which of course comes after the colon, we can add an equals sign followed by the default value. So here we have a Student, and it contains a field called modules that is a list of nested Module objects. When Pydantic receives this data from GitHub, it's going to look at the modules, go through all the records, and validate each one against the Module model that we've defined above the Student. This is a very nice and Pythonic way of defining validation for complex data: you just define a class with fields and data types, and those validations occur automatically when you convert your objects to these models. So within the main.py file I'm going to change the source data. This is referencing the first file
that we had on GitHub; we're going to reference v2 now. So let's change that to v2 and save the file, and we can run this with python main.py and see what it outputs below. If we look at the first record here, we can see that modules is referencing a list of dictionaries. That's because we're using the model.json() function. What we can do instead is print out the model itself, and we can do that only for the first iteration of the for loop by breaking out after printing that model. So let's save that file, clear the terminal and run python main.py again. If we analyze that output, you can see that the modules list references a list of Module objects, and these have all been validated by Pydantic, because on our Student model class we are now referencing a list of Module objects. Pydantic will validate the incoming data from the nested record that you can see here; it validates all of those fields based on the specification that we've defined in the BaseModel subclass, and you can see that being printed to the terminal below. Now, we can see that the ID is a union: it's an integer or a UUID object. What we're going to do is go back to main.py and remove the print and the break from this for loop. Above that we have a model, which is a Student model. After converting the data to a Student, we can iterate over the modules for that particular model, so we can say for module in model.modules; remember, modules is now a field on the Pydantic model. For each one of those we can print out the module's ID, and we're going to see a mixture of integer and UUID data types. So let's print that out to the terminal, and you can see that we get integers for some of these records and UUIDs for others. So this is working: we have a union of types, and that is not throwing any validation errors. If we removed one of these types, for example the UUID, and tried to do this again,
we're now going to get some validation errors, because some of the data is not going to pass, and you can see that below. So that's just a demonstration of why we need the union type here in order to pass the validation. Let's now move on with the tutorial.

What we're going to show in this part of the video is a very quick caveat to using union types. If we go back to the blog post, we're going to focus on this section here, and there's a statement that is very important to know: when Pydantic encounters a union as a type, it's going to try to cast the data in the order defined in the union. So with what we have in our code, if we go back to our Module, it's going to try to convert the ID to an integer first, and if that doesn't work it's then going to move on to a UUID. But this can be a bit dangerous depending on what you're doing. If we go back to the post, let's imagine that we have a class called Number, which is a Pydantic model, and we have a field on that model called value, and we define our union: it could be an integer or a floating-point number. The caveat here is that if we pass a value such as 2.2, that can always be converted to an integer. Because the integer type is defined first in this union, Pydantic will convert that number to just the number 2, and that's because all floats can be converted to integers. So you need to be careful if you're doing something like this, where a value of one type can be coerced to another. This is probably not the behavior you want when you pass a number like 2.2 to this model; that can be, and should be, represented as a floating-point number. So the order of the union types is sometimes important. I'll link to a section of Pydantic's documentation that talks about this issue, but I just wanted to cover that directly. Let's now move on to the next section, and that's where we're going to define additional validations on our Module Pydantic class. You can see here that we've got two additional
validations that we want to perform on this Module class. Firstly, the credits can only have the values 10 or 20, and secondly, if a student has modules, the list must contain exactly three modules. So we're going to check whether or not the student has modules, and if they do we're going to constrain the length of that list of modules to three elements. Let's start with the credits field. We're going to go back to our models.py file, I'm going to make the terminal a little bit smaller, and at the top, from the typing module in Python, we're going to import the Literal type. This allows you to define a type that's constrained to the set of values you give to Literal. We can do this for credits, which is an integer field, by defining a Literal, and the two values that we're going to allow on this field are 10 and 20. So a module has credits at this university, but it can only be 10 or 20. We're not going to allow any integer; it has to be 10 or 20. Let's save the new model and go back to main.py. We're going to remove this for loop here and instead just print out the model for each record in the data on GitHub. Let's now run main.py, and I'm going to expand this terminal. It's not the easiest to see on this terminal, but we have credits equal to 20 on this first record, another one equal to 20, and finally one equal to 10 here. So these are all passing the validations even though they have different values for the credits, because none of them has a value other than 10 or 20.
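The effect of the Literal constraint can be sketched on its own, assuming Pydantic is installed. This Module is reduced to two fields for illustration, and the sample values are made up; it is not the full model from models.py.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class Module(BaseModel):
    name: str
    # Only the exact values 10 and 20 are accepted for this field.
    credits: Literal[10, 20]


# Both permitted values validate without complaint.
accepted = [Module(name="Web Development", credits=value) for value in (10, 20)]
print([module.credits for module in accepted])

# Any other integer is rejected with a ValidationError.
try:
    Module(name="Web Development", credits=15)
except ValidationError as exc:
    # Each error entry records which field failed in its "loc" tuple.
    error_fields = [error["loc"] for error in exc.errors()]
    print(error_fields)
```

The constraint lives entirely in the type annotation, so no validator function is needed for this rule.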
What we can do is introduce a value that is not valid by copying one of the source records from GitHub. I'm just going to copy this particular module here. We're going to go back to the data, and what I'm going to do to the data from GitHub is get the last record, get the modules key from that JSON data, and append our new module to that list of modules. I'm going to paste in this one here. The formatting is not great, but we don't need to worry about that. What I'm going to do is change the credits here from 20 to 24 and see what happens. If we rerun the main.py file, you can see that we have an error coming through. The error is related to the credits field, and it's an unexpected value: we are only allowed the values 10 and 20, but we have been given a value of 24. So that's the effect of adding this Literal type to a model: it constrains the values that are allowed for that field to whatever you specify in the Literal type, and because we're passing a value that's not defined in that type, this is now throwing a validation error. So let's go back to main.py and comment this out.

Now, the other thing that we wanted to do in the models.py file is for the Student model. We want to make sure that the modules list is either an empty list, if the student doesn't have a course, or, if they do have a course and a list of modules, that the length of this list is equal to three. For the Student model we've already written a custom validator function for the date of birth field, so we're going to define a new validator function here, and it's going to be for the modules field. Let's write that function. We're going to call it validate_module_length. It's going to take the class as the first argument, and it's also going to take the value that's coming through for the modules field in the data. Now, what we're going to do first of
all is check the length of the modules that we're getting from the data. If that's equal to 0, the condition evaluates to false and we don't go into this if statement. But if it's true, we chain an and onto this if statement and also check that the length of the value is not equal to 3. In that case we go into the if statement and raise a ValueError, and to that ValueError we pass a string saying that the list of modules must have a length of 3. If we don't go into that if statement, we know that everything is okay, so we just return the value. Let's go over this code quickly. In this if statement we're checking whether the modules list has a length that's greater than 0, in other words whether there are any elements in the list. If there are elements in that list, we use an and to then check whether the length of the list is not equal to 3; we require it to be three based on these requirements. So all this statement is doing is telling us that we have a list whose length is not zero and also not three, so we raise the ValueError. Now if we run python main.py, you can see that we get back all the data; everything is passing this validation. But we can introduce another error here by going back to main.py and uncommenting this append statement that we had, where we were adding one module to the list of modules for the last object in the data. If we append a new module to that list, it takes the number of elements from three up to four. If we execute this as it was before, you can see we get the same error because of the Literal that we had, so let's change the credits back to 20 and re-execute this code. You can see that we get our modules error here, saying the list of modules needs to have a length of three. So we're appending this new object to the
modules list, and that takes the number of elements beyond three. That causes the validation error to be thrown within this validator function, and that stops us from constructing the object from that data. So let's go back here, and again we're going to comment this out.

Now let's finish this video with something a little bit different. We're going to look at something called JSON Schema and how it can be used with Pydantic. We can go to a web page on this: this is the JSON Schema website, and you can see that it's a declarative language that allows you to annotate and validate JSON documents. So for fields in a JSON file you can define things like types and other constraints, quite similar to what Pydantic models can do. What we're going to do now is learn how to take a Pydantic model and convert it to a JSON schema. There's a section of the Pydantic documentation that I'll link below the video. This is on schemas, it's this page here, and it tells you how to create a JSON schema from your Pydantic model classes. As you can see in the documentation, there's a main model class here, and at the bottom we call the schema_json function on the model class to convert the model definition to a JSON schema. We're going to see how to do that right now with our Module code. We have the Module here that's defined in the models.py file. If we go back to main.py, we're going to comment out this for loop that we've been running, and below that we're going to use a print statement that references the Module Pydantic model. That has a function called schema_json, and we're going to print its output to our terminal. Of course, Module needs to be imported at the top here. Now let's expand the terminal and run this code to see what it outputs. You can see we get certain fields in this output. For example, the title is equal to Module, which references
the name of the class, and we also have things like properties. It's not particularly easy to read this, but the schema_json function has a parameter called indent, and we can set that equal to 2 here to print things out with a bit of indentation. So let's clear the terminal and rerun this, and you can see that we get a better representation of this data. The properties refer to the different fields on the model. So for example there's an id field, and that corresponds to what we have on our model class: there's an id, a name and a professor, and you can see that these are represented in the JSON schema. Each field has a type: for the name field it's string, whereas the id can be an integer or a UUID, and the schema uses the anyOf key to express that it could be either one of those. You can also see credits below here, which is another field on our model, and because we're using the Literal type it's specified as an enum whose values can only be 10 or 20.
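The schema output described above can be reproduced with a short sketch, assuming Pydantic is installed. The video calls schema_json on the model class; the hasattr check below is an addition so the same snippet also runs on Pydantic v2, where the method is named model_json_schema. The registration_code field name is an assumed spelling.

```python
import json
from typing import Literal, Union
from uuid import UUID

from pydantic import BaseModel


class Module(BaseModel):
    id: Union[int, UUID]
    name: str
    professor: str
    credits: Literal[10, 20]
    registration_code: str  # assumed key name


# Build the schema as a dictionary using whichever API this
# Pydantic version provides (v2: model_json_schema, v1: schema).
if hasattr(Module, "model_json_schema"):
    schema = Module.model_json_schema()
else:
    schema = Module.schema()

# Pretty-print with two-space indentation, like schema_json(indent=2).
print(json.dumps(schema, indent=2))

# The pieces called out in the walkthrough:
print(schema["title"])                 # name of the model class
print(schema["properties"]["id"])      # contains an "anyOf" list
print(schema["properties"]["credits"]) # contains an "enum" list
```

Working with the dictionary form rather than the JSON string makes it easy to inspect individual properties before writing the schema to a file.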
So this is what JSON Schema is: it gives us a way to declaratively annotate and validate our JSON documents. We can produce a schema from a Pydantic class, and that can then be given to other developers, who can develop their own code from that JSON schema. What I'm going to do now is save this output to a JSON file, and next we're going to use that JSON file to create a Pydantic model. That's one of the great strengths of JSON Schema: it allows you to share the schema with other developers, who can then load it into their own applications, whether that's in Python or Go or C# or any other language. So I'm going to copy this output here and paste it into a file. On the left here I'm going to create a new file that's going to be called Json schema.json, make the terminal a little bit smaller, and paste in the output from the model's schema_json function. We now have JSON within this file, and this is a JSON schema file. It determines the fields that we have in our JSON, as well as some of the types and which fields are required. Once you have this JSON schema file, you can put it on the internet, if you have an API or something like that, and other people can then generate code based on that schema. What we're going to do now is install another package, and it's this one here, called datamodel-code-generator. The project description for this package says it's a code generator that creates Pydantic models from an OpenAPI file. So let's copy the pip install command and paste it into the terminal in our virtual environment, then run the command, and this installs that package locally. We're going to use the command-line tool provided by that package in a minute to generate a Pydantic model based on this JSON schema. Now remember, this JSON schema is coming
from the Module model; when we run the schema_json function, it generates the content of this file. So we don't technically need to generate another model, because we already have the source model, but let's imagine that we didn't. Once we've installed our library, we can clear the terminal. This new library, datamodel-code-generator, provides a command-line tool called datamodel-codegen, so let's copy the name of that and paste it into the terminal. This is going to generate Pydantic models based on a JSON schema, so we need to give it an input file, which is a JSON file or some sort of JSON schema that's defined on the web. The input file is going to be this Json schema.json file that I've just saved from the Pydantic output, so let's reference that. So datamodel-codegen takes that JSON schema as input, and then we provide an output file, which is going to contain the Pydantic models that have been generated from the schema. Let's just call this models2.py. So let's run this and see if it works. When that finishes executing, you can see that we have a new file in this directory called models2.py, and at the top there's a comment saying it's been generated by datamodel-codegen, along with the source file that this code was generated from. It's quite interesting to see how it's taken that schema and generated this code for us, because it's not exactly the same as what we defined. For example, you can see that the credits field has been given a type of Credits, which references this enum here, and that contains the two numbers that we had in our own model, except that we defined them with Python's typing.Literal type. So this is doing things slightly differently: it's using an enum instead of Literal. You can see we also have the union defined here for the integer and the UUID, so it is replicating the model that we have. This is quite a
powerful tool. With datamodel-codegen you can take any JSON schema on the web and create Pydantic models automatically based on what's defined in that schema, and that can be very useful if you're integrating with a particular service that has a JSON schema available. We can also test whether this works with the Student class, which is a bit more complicated than the Module because it contains validations and also a nested model. So let's delete models2.py, and within main.py we're going to change what's output here: we're going to reference the Student model instead of the Module. If we run python main.py, you can see that we're getting some output, and what I'm going to do now is copy that to a file. So I've copied that JSON, and I'm going to overwrite what's in Json schema.json by pasting it in there. If we scroll to the top, you can see that this is defined for the Student object, and the properties now reference things like date of birth and GPA, which were defined on the Student model. Remember, the Student model also has a list of modules, and you can see that represented just below here: modules has a reference to the child model in the schema. So the modules type is array, and it has an items key which references the child model schema. If we rerun the datamodel-codegen command, it's going to take the input from this new JSON schema file for the Student and output that to our models2.py. So let's check that file and see what's been generated. You can see that we get the Department enum that we had in our own code, so it's done that successfully; it's exactly the same as what we defined. We also have the other enum for credits, we have the child model, which is Module, and we have the Student model as well. That contains all the fields from the Student, and it's included things like the constrained float for the GPA that keeps the value between 0 and 4. It's also included the optional value for the course and the list of modules. It's referencing a Field function that we're going to see in a future section of this series, but everything else is by and large the same. There is one limitation here, of course. If we go to models.py, you can see that we have these two custom validator functions. These are Python functions defined on the model class, and they are not represented in models2.py. That's because with JSON Schema it's not immediately clear how to take a Python function and represent the constraints in that function within a structured file such as a JSON schema. So it's not able to replicate the validator functions; apart from that, it's very good at taking the schema and creating models automatically.

So that's all for this section. We've learned how to define nested models, we've learned about the Literal type in Python, and we've learned how to define another custom validator function on our model. Finally, we've learned what JSON Schema is, how we can use it to create definitions for the data in our applications, and how we can read a JSON schema and create Pydantic models from it. In the next video we're going to talk about this Field function that you can see in these generated classes, which allows you to do more flexible things with Pydantic. We're also going to learn how to define Config classes for Pydantic models,
and these Config classes apply model-wide configuration to fields. Finally, we're going to see more advanced ways that we can export our models to dictionaries and JSON. So far we've only seen a complete dump of the data, but there are situations where we might not want to export everything in our model, so we can also constrain what's exported by the model.json() and model.dict() functions. We're going to see that in the next section, but that's all for this one. Thank you for watching. If you've enjoyed the video or learned anything, please like and subscribe to the channel, and we'll see you in the next video.
Info
Channel: BugBytes
Views: 13,783
Id: yD_oDTeObJY
Length: 24min 39sec (1479 seconds)
Published: Mon Mar 06 2023