Carl Meyer - Type-checked Python in the real world - PyCon 2018

Video Statistics and Information

Reddit Comments

Nice talk, thank you for sharing

👍︎︎ 6 👤︎︎ u/r0s 📅︎︎ Jul 15 2018 🗫︎ replies

Interesting, I try to type annotate most of the python I wrote. Haven't considered actually running it through a type checker before, he mentions MyPy seems like its worth a look.

👍︎︎ 1 👤︎︎ u/boarnoah 📅︎︎ Jul 15 2018 🗫︎ replies
Captions
Hey folks, let's all give a round of applause for Carl Meyer, who's here to talk to us about type-checked Python in the real world.

Thank you, Rami. Welcome to the final session of talks at PyCon 2018. I hope that after three days of PyCon and three nights of enjoying the best of Cleveland you're all awake enough to process some Python code on slides, because we're gonna see a lot of it in the next half hour. I'm Carl Meyer, I work in Instagram's server core infrastructure, and I'm here to talk about type-checked Python.

If you see me around the internet, I probably look like this. This is me and my sister prototyping some eyewear designs of our own creation that never took off. I'm carljm pretty much everywhere on the internet. I've been writing Python code since before the turn of the millennium, which I'm fairly sure makes me officially old. For the last couple of years I've been working at Instagram, most recently on adding type checking and type annotations to our server codebase.

So, a rough plan for the next half hour: we'll talk a little bit about why you might want to type your Python code, we'll go into how you would go about it with a brief tour of Python's type system, and lastly we'll talk about gradual typing, what it means and why it matters.

So why type your Python code? If some of you in here are coming from a static typing background, you might have the opposite question: how is Python even usable without static typing? But since we're at PyCon, we'll take the question from the other side: I've been using Python for years, it's fine, why do I care? Why do I need type annotations?

So in this process method on some class, it takes an items argument. What is items? In Python we have this idea of duck typing: if it walks like a duck and quacks like a duck, it may as well be a duck. So we can give a duck-typing answer to our question. Items is some collection that we can iterate over, and each item in the collection should have a value attribute which
itself should have an id attribute. That's great. That's a very flexible answer to the question. It could allow us to reuse this process method in various contexts, passed different kinds of collections, maybe even containing different kinds of objects, as long as they all conform to this contract [unclear].

The problem with this, though, is that code is written once but maintained for a long time. So what if I come back to this code in six months and I've forgotten everything I knew when I wrote it? The contract that we just described I have to re-establish by reading through the function line by line; it's entirely implicit. Or how do I know that I'm conforming to this contract everywhere in my codebase? Maybe somewhere in some dark corner I'm passing in some object whose value attribute could in some cases be None, and then I'm gonna get an AttributeError. How would I catch that? Or maybe I need to add some new functionality to this function. I have a new requirement: I want to access a new attribute on my items. How do I know that everywhere I'm currently passing in objects to this function, they have this new attribute? If I have a large codebase, in some cases answering that question satisfactorily could require digging through layers and layers of code, not only the call sites of this method but perhaps their call sites and their call sites, until I track down the origin of the collection that ultimately is getting passed to this method.

With the type annotation, all of that goes away. Now I know exactly what I can expect to receive: a sequence of this particular Item class. I can go directly to the class, I know where it is, I can see what attributes and methods it has. This is nothing new, of course. People have been putting the same information into docstrings or into comments for years. There are even multiple standards, Epydoc and others, for how you can represent argument and return types in your docstrings in Python. So it's clearly useful
information for maintainers. The problem with the docstring annotation is that at some point it's guaranteed that someone will update the signature of the function and forget to update the docstring, at which point it's obsolete and arguably worse than useless. Whereas this type annotation can automatically be checked for correctness, so it has to remain up to date with the code.

So I can almost hear someone in the room thinking, that's great but I don't need it, I could catch those things with a test. Which is great. I love tests. I've written a lot of tests. I've given Python talks on writing tests in Python. There's this trope of the dynamic language programmer claiming they don't need static types because they write tests, or the static typing programmer claiming they don't need to write tests because the compiler catches all of their bugs. Both are right, and both are of course wrong.

If we imagine a two-argument function, and this plot is the space of all possible inputs to the function, so one argument on the x-axis and one on the y-axis, we have the space of all possible arguments. If we write a test case, it's a single example: we give two argument values, one for each argument, we assert the correct return value, and we've covered exactly one point in this plot. Typically we write test cases for a variety of points on this plot that we think or hope or maybe even know are representative of the space of possible inputs that we think we care about. Maybe if we're especially clever we can write a parametrized test case and cover a whole range of inputs with a single test, maybe even a QuickCheck-style or property-based test to cover an even wider array of possible inputs.

With type annotations we can add just a few characters to our function definition and instantly eliminate entire swaths of this area to cover. I annotate that my function takes two integers, and all of this area out here for all the possible strings and lists and dictionaries is eliminated. I can focus all my
testing effort on ensuring correctness with high granularity in this area where it really matters, where even the best type system can't fully ensure correctness.

So let's say I've convinced you and you want to start type checking your Python. How do you make it happen? Let's take a little tour of what typing looks like in Python. So we have a simple function, a square function. It takes an integer and returns an integer, the square of the argument. We've seen the syntax for this in previous slides: after each argument we can have a colon and then the type, and after the argument list we can have an arrow and then the return type. So let's call the function a few times. We'll take the square of 3, we'll take the square of a string just for kicks, and we'll take the square of 4 and then add it to a string.

Now let's try type checking this code. We'll pip install mypy. mypy is an open source Python type checker written and maintained by a team at Dropbox. It's by far the most commonly used Python type checker today, so we'll use it in our examples for this talk. So let's run mypy on our file, and we get a couple of type errors. Let's dig into them a little bit. We get one type error because we tried to pass in a string where an integer is expected, and another type error because we tried to add an integer, the annotated return type of the square function, to a string, which is a type error in Python. And we got both of those errors without having to set up any kind of test harness or write any kind of test case that would exercise this code, just by running a static analyzer over the code.

So the type checker asks us to annotate our function signatures in order to validate our assumptions about input and output types. In between, there's a lot it can infer. For instance, in this class, because we've told it that the types of the width and height arguments to the initializer are both integers, it can infer through the assignments to self and understand that every Photo instance will have a
width and a height attribute that are integers. And if in another method we try to return self.width and self.height and claim that they're a tuple of strings, the type checker can catch that and tell us no, that's a tuple of integers.

We can also infer the types of containers. If we create a list of Photo objects and try to append a string to it, the type checker will tell us, hey, maybe that's not what you intended to do. This is of course the type checker being a little bit opinionated. In Python it's perfectly legal to have a heterogeneous list, but the type checker assumes that if we initialized it with a homogeneous set of objects, that's probably what we intended, and it was probably a mistake to add a different type. We can use an explicit type annotation if we want to give a broader type to the list.

In some cases type inference won't be enough to understand the type of every variable. For instance, if we create an empty container, the type checker doesn't know what we intend to put into it, so it asks us to be explicit. We can add a type annotation for the variable like this and say this is intended to be a list of strings, and then the type checker is happy. This particular syntax, with a colon after the variable name and then the type annotation before the equals sign, is new in Python 3.6. If you're on an older version there's an alternative comment-based syntax you can use. I won't go over it here, but it's in the documentation.

So that's pretty much the basics. To review what we've covered: mostly you want to annotate your function signatures, the arguments and the return values, and occasionally you might have to annotate a variable, but usually you only want to do this if the type checker asks you to. Otherwise you'll end up with a bunch of redundant variable annotations for things the type checker could have inferred correctly anyway.

So let's go a little deeper. Sometimes we write functions that can take or return more than one type. The simplest way to
handle this is with a union type. So this function could return a Foo or a Bar, so we annotate the return type as a Union of Foo and Bar. That means it could return either a Foo or a Bar. A very common case of this is a function that can return something or None. It's so common, in fact, that there's a special form for that: Optional Foo means the same thing as Union of Foo and None. This function could return a Foo or it could return None.

So here we have a function get_foo that takes a foo_id, which is an optional integer, either an integer or None, and returns an optional Foo, either a Foo or None. So let's get a Foo instance, my_foo, and let's access its id. Oops, we have a type error, because we told the type checker that this function could return None and we didn't check whether my_foo was in fact None, so accessing the id attribute could be an AttributeError at runtime. So we get an error from the type checker. This illustrates why you want to avoid using unions and optionals, particularly as return types: if your function returns a Union or Optional, every caller has to check what they got back before they can safely make use of the return value.

In this case, though, that's a sad outcome. If we look at the code for get_foo, we can see that if we give it None it will always return None, and if we give it an integer it will always return a Foo. So we know that, but the type checker doesn't. So even though we call it with an integer, the type checker thinks the return value might be None, and this is going to cause us to have to add extra redundant checks into our code that are useless at runtime, just to satisfy the type checker.

There's a better option in this case. Using the overload decorator from the typing module, we can give the type checker more information about the invariants of our function. Overload allows a kind of pattern matching, similar to overloaded functions in other languages. So you can say in this case: if foo_id is None then the
return value will always be None, and if foo_id is an integer the return type will always be Foo. And then lastly we give the actual definition of get_foo. Now it's important to note that there's no kind of dynamic dispatch or anything happening here at runtime. This is purely additional information for the type checker. At runtime the only thing that's used is the final definition of get_foo. That's why the other two don't need a body, they can just use pass. They're just additional information for the type checker to better understand the type invariants that are actually implemented by the function. So with this definition, if we call get_foo with None, the type checker will understand that the return value is None, and if we call get_foo with an integer, it will understand that the return value is a Foo, and so we won't have to check before we access its id attribute or whatever else.

Another way that we can make the type checker smarter about understanding our code is generic functions. To define a generic function we can define a type variable, which is like a placeholder for a type. So here we define a type variable called AnyStr, which is a placeholder for either string or bytes. Type variables can be unbounded, where they could match any type, or in this case this type variable has a bound of string and bytes. So we can define a concat function that takes two AnyStr and returns an AnyStr, and it concatenates them and returns the result. Now this is different from using a Union of string and bytes, because the type checker will ensure that the type variable binds to the same type throughout any call to the function. So it will give us a type error if we try to call concat with a string and a bytes, which is good, because adding a string to a bytes is a type error. And of course, because the type variable is bound, it will also give us a type error if we try to call concat with two objects that are neither string nor bytes. And perhaps most
importantly, if we concatenate two strings together, the type checker will understand that the return value must be a string, not a string or a bytes. And similarly with bytes: if we concatenate two bytes, we definitely get a bytes back. In fact this AnyStr type variable is useful enough for defining functions that can handle strings or bytes that it's built into the typing module. We don't need to define it ourselves, we can just import it.

So to review again: we can use unions and optionals, but sparingly, and overloads and generics allow us to teach the type checker more about the invariants of our type signatures. Compared to using unions or optionals, generics or overloads can make your functions much more usable for callers, without needing redundant checks.

So at this point somebody might be wondering, what about my ducks? I like duck typing. In this new type-safe world, how do I write a function that can take any type at all as long as it has the right methods and attributes? For instance, maybe I want to define a render function that can take an object and will call its render method, any object that defines a render method, no matter its type. This is actually similar to a number of built-in protocols in Python. For instance, the len built-in will call the __len__ method on any object, or the next built-in will call __next__, etc.

So how can I type this? We could try to use object, since we know that every type in Python is a subtype of object. But this won't work: object has no attribute render. Or we could try to use the Any type. The Any type is a sort of escape hatch the typing system provides. Any is compatible with anything. In type system terms it's both the top type and a bottom type: it's a subtype and a supertype of everything. Or you could think of it as having every attribute and method. Basically it will never cause a type error. This makes our function type check okay, but it's a bit sad, because now we can pass in something that
doesn't have a render method, which will throw an error at runtime, but the type checker won't catch it. These are the kinds of bugs we want our type checker to catch for us.

So I mentioned that this pattern is similar to built-in protocols in Python, and the type system solution for it is also called Protocol. It's still technically experimental: you have to pip install typing_extensions and import it from typing_extensions. But in practice it's very unlikely to change and will soon be in the built-in typing module. So if we import Protocol, we can define Renderable as a subclass of Protocol and give it a render method. We don't need to provide a body for the method; all we're giving here is an interface. What matters is the attributes and their types and the methods and their type signatures. So once we have this protocol defined, we can say that our render method takes an object of type Renderable. And then if we have some random class Foo which has no explicit relationship to Renderable, simply because it has a render method with the correct signature, the type checker will accept this call. It will allow us to pass a Foo object to our render method, because it sees that it matches the protocol. If we try to pass some other object without a render method, we'll get a type error. So this is exactly what we want, and we've found our duck.

You might hear this feature also referred to as structural subtyping. With typical inheritance we have nominal subtyping, because if Foo inherits from Bar, we've named our supertype Bar, so that's nominal subtyping. With structural subtyping, Foo is a subtype of Renderable because it matches the structure of Renderable: it has the same attributes and methods. So that's structural subtyping.

So strict static typing tends to be really good for like 90 to 95% of your code that's pretty straightforward and isn't doing anything too dynamic. If you're writing production code or a production application, you probably want most of your code to be like this, because it's also
going to be easier for your co-workers to read and maintain. But there may still be those few cases where you really do want to take advantage of Python's dynamic nature. You really do want a metaclass, or to generate a bunch of classes on the fly, or whatever other off-the-wall thing you might be doing. Or, like us at Instagram, you may have a lot of legacy code that was written long before type checking existed, and you need to continue supporting that code even if it's doing some things that don't quite fit into the static typing world. So Python's type system feels that pain and provides some escape hatches that you can use when you really just need to tell the type checker to go take a hike.

So the first one we already saw: it's the Any type. One sample case where you might use the Any type is some kind of __getattr__-wrapping proxy, where you're wrapping some object and proxying every attribute access. You have no idea what you might be proxying or what attributes it might have or what their types are, so maybe the best you can do is just say that your proxy returns Any from its __getattr__. It's not great, because it means you lose all the benefits of type checking those wrapped objects, but in some cases it may be the best option you have.

A second escape hatch is the cast function. It basically lets you lie to the type checker about the type of some expression. So for example, at Instagram we have a configuration system, and we can get a configuration value by key, and basically they're JSON structures, dictionaries or lists or whatever, and we don't know what shape any given config var will have. So the best that our get-config-var function can do is be typed to return Any, because we don't know what shape of object it might return. But in practice, given a particular config at some specific call site, we probably do know what the shape of that config will be, otherwise we wouldn't be able to make use of it. So we can use the cast function to tell the type checker, look, actually
I know this function says it returns Any, but in this case I know it returns a dictionary mapping strings to integers, and the type checker will believe us. So of course, since you're lying to the type checker, you want to make sure that you're right, because if you lie to the type checker and you're wrong, well, you can expect the type checker to lie right back.

The third escape hatch is kind of a nuclear option. A type: ignore comment says ignore any type error on this line, no matter what the cause. We try to reserve this one for bugs in the type checker or limitations of the type checker that we can't work around any other way. So one example is that mypy currently has a bug where it can't handle a property decorator stacked on top of another decorator. So we just stick a type: ignore on the line where it throws an error, add an explanatory comment linking to the bug, and move on.

So if the cast function is a way to lie to the type checker, stub files are how you lie to the type checker at industrial scale. At Instagram we use a lot of Cython and C extensions for performance hotspots, and of course the type checker can't see into any of that code. It can't read Cython syntax, it can't read C code of course, so it has no idea what functions and classes are in our Cython or C code and what signatures they might have. So for example, say we have a fastmath compiled module with some fast math functions in it. If we put those functions in there, it's probably because we call them a lot, and if we call them a lot we'd really like our calls to them to be type checked. So we can solve this problem by putting a .pyi file next to the compiled module. So pyi is Python interface. It's sort of like a C header file but for Python code: it just provides the type signatures, the interfaces of our functions and classes that are in the compiled module, so that the type checker is aware of them. So for instance, in our fastmath.pyi, if we had a square function in our compiled module, we could put this
interface, this definition line, in our .pyi file, and now the type checker understands that the fastmath module has a square function that takes an integer and returns an integer. Similarly we could put class interfaces in there. Now it knows that we have a Complex class with these two attributes of these types. So now the type checker will be able to check the correctness of our uses of those functions and classes.

OK, so that's the end of our tour through Python's type system. Last review here: protocols are statically checked duck typing, or structural typing, and then we have a number of escape hatches that we can use if we need to escape from the restrictions of the type checker. We can use Any, cast, ignore, and stub files.

So we've talked about why you might want to type check and how you go about type checking. Lastly, what do we mean by gradual typing and why does it matter? We've actually already started talking about gradual typing. Gradual typing just means you can type check your program even though not all expressions in the program are fully typed. So when we look at something like the Any type, that's already an example of gradual typing. But we can go beyond that. Gradual typing also allows us to incrementally add type checking to our codebase as we're ready to deal with the consequences.

So for instance, here's our codebase: a bunch of Python modules, arrows showing dependencies between the modules. We introduce type checking and this is what we're gonna see: errors everywhere. It's not because the code is bad, it's just the nature of introducing type checking to a codebase that was never type checked before. But this is a problem. We can't deal with this. We have too much code, too much to do. We can't stop the world while we fix thousands of type errors. So gradual typing in Python is implemented with a simple rule: only functions with type annotations are checked. A function that has no annotations is considered to take Any and return Any, and the body of it isn't even checked at all. Nothing
inside. The type checker won't even look at anything inside the body of a function without type annotations. So this rule allows us to introduce type annotations where we're ready to deal with the consequences, step by step, function by function. And of course there's a network effect as we add more and more type annotations. So we annotate one module, and we will catch some type errors in internal calls within that module, maybe some calls to standard library functions. And as we annotate more modules, we'll be able to catch more and more type errors in calls between those modules. And of course the number of errors we can catch increases superlinearly with the network effect.

You'll want to start with your most used functions or modules, because that's where you'll get the most immediate benefit from type checking. And you'll want to use continuous integration to defend your progress. Once you've started adding type annotations and fixing type errors, you really want to make sure that nobody's adding new type errors back into that same code, so you'll want the type checker running in your continuous integration to prevent that. mypy also provides a lot of options for various strictness levels, and you can apply those options per module. So once you have a module that's fully type checked, where all the functions are type annotated and there are no type errors, you can tell mypy, don't allow any untyped function to be introduced into this module from now on, and protect your progress that way. You can even go one step further and say, don't allow any usage of the Any type within this module, if you really want to keep it strictly typed.

So that's all great. There's still a problem. I mentioned at the beginning of the talk how painful it can be if you come back to code that you're not familiar with and you try to figure out what types some function can take, and this may require digging through layers and layers of code to find all of the call sites of the call sites of the call sites. It turns out
that this painful process is exactly the same painful process that you have to go through when you're adding type annotations to code. You're trying to look through it, you're trying to understand: what are the types, what could be passed in here, how do I type annotate this correctly, how do I know if I am type annotating it correctly? Maybe I'm adding a type annotation that doesn't actually match what I'm doing in production. So our CTO at Instagram, Mike Krieger, was actually the first person to dive into type annotations at the beginning of last year. He tried annotating one of our big core modules, a thousand lines of code or so, and came back two weeks later and was like, I'm done, this is ridiculous. So he suggested that maybe we could build something that would trace at runtime what types were being passed into all of our functions, and then dump that information out in a really usable way to make it much easier to add accurate type annotations. So a couple of us set out to build that, and it turned out to work great, and last fall we released it as open source so you can also use it. It's called MonkeyType.

So an example of how you could use MonkeyType: pip install MonkeyType, of course, and then you can use monkeytype run to run any script. It could be your tests, or it could be any other script that exercises your code, or there are even ways you can install it to run in production, which is what we do at Instagram: we sample a small percentage of production requests and run them under MonkeyType tracing. Once you've collected some data using monkeytype run or MonkeyType tracing, then you can run monkeytype stub some.module and it will print out a stub file, just like the .pyi files we saw earlier, that's directly usable, and it will show exactly what types were recorded at runtime when your code ran. And then if you want to go further, you can use monkeytype apply, and it will take that stub and apply it to your code and rewrite your code with the type annotations applied. Then you can
review that, commit it, and you're type annotated.

So what's coming next in the world of Python typing? In Python 3.7 there will be a future import that will allow you to get rid of some ugly string forward references that are currently necessary when you have circular type references in your code. So that's one thing that's coming. Potentially in the future, this isn't for sure yet, but we may be able to also get rid of some of these extra imports from the typing module, like the capital-D Dict, and instead just use the lowercase dict that's already built in, in our type annotations. There's also a PEP that was recently accepted for a standard for how to bundle type stubs with third-party packages, which will make it much easier to distribute type annotations with your libraries on PyPI.

Conclusions from our experience at Instagram over the last year: type-checked Python is here, it works. There are some warts still, but it's been very productive for us in production use. We prevent landing diffs in our codebase if they have type errors, so we're using type-checked Python actively in development every day. Our experience also is that developers love it. We've received basically no pushback from anyone in our team of hundreds of developers working on our Python codebase, and our type coverage has grown almost entirely organically as developers choose to add type annotations because they see the benefits of reading and maintaining code that has annotations. Using MonkeyType you can annotate large legacy codebases. We've gone from zero to about half of our million and a half lines of Python code annotated over the last eight months, mostly by using MonkeyType. So it's early days, it's far from perfect, but it is good enough for use, and it will get better in the future. It's being actively worked on.

A few thanks before I go: to the team at Dropbox for creating and maintaining mypy, which has been a critical tool for us, and to everyone in the Python community who's contributed
to writing and reviewing typing PEPs. I should mention quickly, we did recently switch at Instagram from mypy to a new type checker, Pyre, that was developed by a team at Facebook, because it's faster for very large codebases. So if you have a very large codebase or you want to experiment with alternatives, you can also try Pyre. For our codebase mypy took about five and a half minutes, and Pyre takes about forty-five seconds for a full from-scratch type check.

If you're working with type-checked Python, there are lots of resources available. I won't list them all out loud in detail, but there are resources both for mypy and for Pyre and for the reference standards, and there's real-time support on Gitter, the PEPs, places you can file issues, and MonkeyType issues on GitHub as well.

And that's it. If you would like to follow up with me afterwards and explain to me the many failings of this talk, I would welcome that. I'm carljm almost everywhere, except of course on Instagram itself, where I was too late to the game. And yeah, I'll be taking questions outside in the hallway after the talk. If you want to chat, I'd love to talk to you. Thank you very much. [Applause]
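
The slides are not part of these captions, so the code the speaker points at is missing. As a rough sketch of three of the techniques described in the talk (the overload decorator on get_foo, a string-or-bytes type variable on concat, and a Renderable protocol), something like the following would fit the description. The class bodies, the render output, and the name AnyString are assumptions made for illustration, not the speaker's slide code. Protocol is imported from typing here, which works on Python 3.8 and later; at the time of the talk it came from typing_extensions.

```python
from typing import Optional, Protocol, TypeVar, overload


class Foo:
    """Hypothetical stand-in for the talk's Foo class."""

    def __init__(self, foo_id: int) -> None:
        self.id = foo_id

    def render(self) -> str:
        return f"<Foo {self.id}>"


# The two overloads are signatures only. The type checker reads them;
# at runtime only the final, undecorated definition is used.
@overload
def get_foo(foo_id: None) -> None: ...
@overload
def get_foo(foo_id: int) -> Foo: ...
def get_foo(foo_id: Optional[int]) -> Optional[Foo]:
    if foo_id is None:
        return None
    return Foo(foo_id)


# A type variable restricted to str or bytes. Within one call it must
# resolve to the same type for both arguments and the return value.
# (typing.AnyStr is the built-in equivalent mentioned in the talk.)
AnyString = TypeVar("AnyString", str, bytes)


def concat(a: AnyString, b: AnyString) -> AnyString:
    return a + b


class Renderable(Protocol):
    """Structural interface: anything with a matching render() method."""

    def render(self) -> str: ...


def render(obj: Renderable) -> str:
    # Foo never inherits from Renderable; it matches by structure alone.
    return obj.render()


my_foo = get_foo(3)      # a checker infers Foo here, not Optional[Foo]
print(my_foo.id)         # so no None check is needed before .id
print(concat("type", "checked"))
print(render(my_foo))
```

Running a type checker such as mypy over this file should accept it, while a call like concat("a", b"b") or render(3) would be reported as a type error without the code ever being executed.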
Info
Channel: PyCon 2018
Views: 55,901
Keywords: pycon, python, coding, tutorial, carl meyer
Id: pMgmKJyWKn8
Length: 32min 9sec (1929 seconds)
Published: Sun May 13 2018