Pydantic Tutorial • Solving Python's Biggest Problem

Captions
Welcome to this video tutorial where I'm going to show you how to use the Pydantic module in Python. One of the biggest issues with Python as a programming language is the lack of static typing. Python uses dynamic typing, which means that when you create a variable, you don't have to declare its type, like this x for example. Compare this to something like Java or C, where you actually have to declare the type upfront. Once a Python variable is created, you can also override it with a different type than what you created it with. So here, if I create x = 10, in the next line I can override that with the word "hello" as a string. And Python allows you to do this. This does make it easier to get started with Python, but it can cause a lot of problems later on. For example, as your app gets bigger, it becomes harder and harder to keep track of all your variables and what type they should be. It's also difficult when you have to work with functions where the argument types aren't obvious. For example, what is this "rect" argument supposed to be here? It could be a tuple, but then it doesn't tell you if the x-axis or the y-axis should come first. But the biggest downside of using dynamic types by far is that it allows you to accidentally create an invalid object. By that, I mean an object with values that it shouldn't be allowed to have. For example, here I'm trying to create a person, and the second argument is supposed to be age, so it's supposed to be a number. In the first example, I created it correctly with 24 as an integer, but in the second example, I created it with 24 as a string. Both of them might work at the beginning: Python will allow you to do this, and things can actually seem fine for a while. But eventually, when you do try to use that age variable as a number, it will fail. This can be really hard to debug, because the failure could occur at any time in your program, and it can be hard to associate that failure with the actual cause.
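The failure mode described above can be reproduced in a few lines. This is a minimal sketch; the `Person` class and the values are illustrative stand-ins for the on-screen example:

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age  # nothing stops age from being a string


jack = Person("Jack", 24)    # age is an int, as intended
jill = Person("Jill", "24")  # age is a string, and Python accepts it silently

# Much later, using age as a number fails far away from the actual cause:
print(jack.age + 1)  # 25
try:
    print(jill.age + 1)
except TypeError as e:
    print("failed:", e)  # can only concatenate str (not "int") to str
```

The second object was invalid from the moment it was created, but the error only surfaces at the point of use.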
Luckily, these days, Python has a lot of tools you can use to solve these problems. This includes dataclasses and type-hinting, like in this code example here. But today, we're going to be taking a look at Pydantic. It's an external library and it gives you powerful tools to model your data and solve all of these problems that we've just been talking about. Pydantic is a data validation library in Python. It's used by some of the top Python modules out there, notably HuggingFace, FastAPI, and LangChain. Its main benefits are that by modeling your data, you get better IDE support for type-hints and autocomplete. You can also validate your data so that when you create an object, you can be 100% sure that it's valid and it won't fail you later. And finally, if you ever need your data to be in a universal format like JSON, Pydantic gives you an easy way to serialize your objects. This really comes in handy if you need your Python app to talk to other apps on the internet, or if you just want to save your data to disk. Let's take a look at how all of that works. First, make sure that you've installed Pydantic into your Python environment. You can do it using this command. To create a Pydantic model, first define a class that inherits from the base model class. Inside the class, define the fields of the model as class variables. In this example, I'm creating a user model and it's got three fields, a name, which is a string, an email, also a string, and an account ID, which is going to be an integer. You can create an instance of the model like this and then just pass in the data as keyword arguments. You can also do this by unpacking a dictionary. So this works well if you already have the data and you just want to put it inside the model. For example, you have a response from an external API. If the data that you've passed in is valid, then this user object will be successfully created. You can then access each of the attributes of the user object like this. 
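Putting the steps above together (after `pip install pydantic`), a minimal sketch of the user model from the video; the field values here are illustrative placeholders:

```python
from pydantic import BaseModel


class User(BaseModel):
    name: str
    email: str
    account_id: int


# Pass the data as keyword arguments...
user = User(name="Jack", email="jack@example.com", account_id=1234)

# ...or unpack a dictionary you already have, e.g. a response from an external API:
data = {"name": "Jack", "email": "jack@example.com", "account_id": 1234}
user = User(**data)

# Each field is then available as an attribute:
print(user.name, user.email, user.account_id)
```

If the data passed in is valid, the object is created; otherwise Pydantic raises an error at construction time, as shown later in the video.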
I'm going to head over to my IDE so I can show you how this works in action. I have my user model defined here and I think by far the most useful feature of modeling your data is that you get type hints in your IDE. So what I mean is if I start typing out my user, I get autocomplete and auto suggestions based on this model. So here I've created this user object and I haven't filled in the data yet, but if I mouse over it, it actually tells me which arguments it accepts. And here I can fill it in with the examples you saw earlier, so a valid name, a valid email, and an account ID. And now if I print the user, you can see that all of this information is contained in this one object. And of course the type hinting makes it easier to work with when you actually need to use one of these models. So for example here, if I'm printing the user, I can just press a dot and then I get a list of all the valid variables associated with it. So for example, if I wanted email, I just start typing and it knows that this user has an email attribute. With type hints, your code becomes much easier to work with because you don't have to remember everything yourself. Your IDE does it for you, and this is especially useful if you're working with really large code bases or if you need to collaborate with other developers. Pydantic also provides data validation right out of the box. This means that if you try to create an object with the wrong type of data, it will fail right then and there. This is good because if your software has to fail, then it's better that it fails as early as possible. This will make it easier to debug. So let's go back to our example here and see how that works. If I try to create this user with an account ID that's not an integer, for example if I turn it into a string and I try to run it, I now get a validation error. So I can see immediately that I tried to create this object with the wrong type of data. 
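The wrong-type failure demonstrated above looks like this in code. One caveat: by default Pydantic will coerce a numeric string like "1234" into an int, so this sketch uses a clearly non-numeric string to trigger the error:

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: str
    email: str
    account_id: int


try:
    # account_id is not a number at all, so this fails right then and there
    User(name="Jack", email="jack@example.com", account_id="not-a-number")
except ValidationError as e:
    print(e)  # points directly at the invalid field
```

Failing at object creation keeps the error next to its cause, instead of surfacing much later when the field is first used as a number.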
And in cases like these, I'd much rather it fail right away with a descriptive error message than silently succeed, but then fail at some point much later down the line. You can also validate more complex types of data. For example, let's say I wanted to validate that this string is actually a valid email. First, let's change it to an invalid email, for example just "Jack" on its own. So this is no longer an email, and if I run this, it still works, because all this checks for is that it's a string. But I can actually import a special data type called EmailStr from Pydantic (it requires the optional email-validator package). If I replace the type with this and run it again, you'll now see that I get a validation error telling me that this string here is not a valid email. So let me change this back to a valid email again and see if that works. And after fixing this value, the validation passes. So I have an easy way to assert that this email field always contains a valid email address. If none of the in-built validation types cover your needs, you can also add custom validation logic to your model. For example, let's say that we want to enforce that all account IDs must be positive numbers, so we don't accept negative integers for our account ID. This is what we can add to our class to make that happen. First, we'll have to use this validator decorator from Pydantic. Then we write a custom function, which is going to be a class method. Inside the function, we check if the value is less than or equal to zero, and if it is, we raise a value error saying that this is not a valid value for this field. But if it isn't, we return the value. So let's go back to our code editor and try that out. Here I've imported this validator decorator, and this is the validation logic I'm adding as a class method of this user model. You can change this validation condition to whatever you want it to be for your app, but in this case, I'm just checking that it's greater than zero.
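The custom check described above can be sketched like this. It uses the `@validator` decorator from the Pydantic v1 API, which is what the video shows; in Pydantic v2 the equivalent is `@field_validator`, and `@validator` still works there with a deprecation warning:

```python
from pydantic import BaseModel, ValidationError, validator


class User(BaseModel):
    name: str
    email: str
    account_id: int

    @validator("account_id")
    def account_id_must_be_positive(cls, value):
        # Reject zero and negative IDs with a descriptive message
        # that includes the offending value.
        if value <= 0:
            raise ValueError(f"account_id must be positive: {value}")
        return value


# A positive ID passes validation...
user = User(name="Jack", email="jack@example.com", account_id=1234)

# ...while a negative one fails immediately with our custom message.
try:
    User(name="Jack", email="jack@example.com", account_id=-12)
except ValidationError as e:
    print(e)
```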
So if I run this with my current data, it should still work. And here you can see that it's fine. But if I change this to a negative number, let's see what happens. Now it fails with that validation error, and it says the account ID must be positive. And here we can actually make the error message really descriptive, because we can print anything we want here. We can even print the value that the user tried to create this model with. Another great thing about Pydantic is that it provides built-in support for JSON serialization. This makes it really easy to convert Pydantic models to or from JSON. To convert a Pydantic model to JSON, you can call the json method on the model instance. This will return a JSON string representation of the model's data. So if you print it out, you'll see something like this. And if you don't want a JSON string, but you just want a plain Python dictionary object instead, you can use the dict method. If you have a JSON string that you want to convert back into a Pydantic model, you can use the parse_raw method. And since JSON is widely used and understood across every major tech stack, this feature will make it really easy to integrate your Python code with external applications or APIs. Finally, let's see how Pydantic compares to dataclasses, Python's built-in module that solves a similar problem. As great as Pydantic sounds, Python actually does ship with some data modeling and type hinting capabilities on its own. For example, you can already specify type hints like this, and most IDEs should pick them up. There's also an in-built module called "dataclasses" in Python that lets you create a class with fields. So if you haven't used it before, this is what the syntax looks like. It's very similar to Pydantic, except instead of extending from a base model class, you're using this "@dataclass" decorator instead. As you can see, it's also really easy to use. So how does this compare to Pydantic?
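The serialization round-trip described above looks like this, using the method names from the video (the Pydantic v1 API); Pydantic v2 renames these to model_dump_json, model_dump, and model_validate_json, but keeps the old names as deprecated aliases:

```python
from pydantic import BaseModel


class User(BaseModel):
    name: str
    email: str
    account_id: int


user = User(name="Jack", email="jack@example.com", account_id=1234)

json_str = user.json()   # JSON string, e.g. '{"name": "Jack", ...}'
as_dict = user.dict()    # plain Python dictionary instead of a string

# Parse a JSON string back into a validated model instance:
restored = User.parse_raw(json_str)

print(json_str)
print(as_dict)
print(restored == user)  # True: the round-trip preserves all fields
```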
Well, let's take a look at some of the top criteria. They actually both give you type hints in the IDE, which personally is the biggest reason for me to use these libraries. So both of them tick that box. Dataclasses, however, do not give you any easy validation or deep JSON serialization out of the box. Now, if validation is a big deal for you, for example you have a lot of emails or a lot of fields where the data type is very specific, then you should probably go with Pydantic. If you're using dataclasses, then your JSON serialization capability isn't as good out of the box as Pydantic's. But if your data is simple enough, you can still do some basic serialization with a one-liner like this. The one major advantage that dataclasses have over Pydantic is that they're built into Python directly. That means they're more lightweight, and you don't even have to install anything. For many users, this may be enough. If you want some rough guidance as to which module you should use, then I recommend Pydantic if you have complex data models, you need to do a lot of JSON serialization, or you need to work with a lot of external APIs. But if data validation isn't important to you and your data isn't super complex, you can get away with dataclasses. And that's it for Pydantic. If you haven't used it yet, then give it a try and let me know what you think. If you've enjoyed this video and want to see more tutorials like this, then please subscribe to the channel and let me know in the comments what type of topics or modules you'd like to see covered next. Otherwise, I hope you found this useful, and thank you for watching.
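The dataclass version discussed above, including the basic serialization one-liner, might look like this. It uses only the standard library; asdict converts the dataclass into a plain dictionary that json.dumps can handle:

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class User:
    name: str
    email: str
    account_id: int


user = User(name="Jack", email="jack@example.com", account_id=1234)

# Note: no validation happens here. account_id could silently be a string.
# Basic serialization in one line:
print(json.dumps(asdict(user)))
```

Nothing to install, but also no validation and no parsing of JSON back into a typed object, which matches the trade-off described above.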
Info
Channel: pixegami
Views: 104,531
Keywords: python, python tutorial, fastapi tutorial, pydantic, python pydantic, pydantic tutorial, data manipulation, simplify data validation, serialization, complex data types, custom validation logic, data validation, data validation logic, deep json serialization, json serialization, data storage, dataclass, object creation, python developers, python pydantic tutorial, pydantic tutorial python, how to use pydantic, what is pydantic, pydantic python
Id: XIdQ6gO3Anc
Length: 11min 7sec (667 seconds)
Published: Mon Sep 18 2023