Stanford Seminar - Optional Static Typing for Python

Reddit Comments

Why are annotations mandatory in data classes?

👍︎︎ 7 👤︎︎ u/[deleted] 📅︎︎ Jun 08 2018 🗫︎ replies

Too video; didn't watch link?

👍︎︎ 5 👤︎︎ u/ivorjawa 📅︎︎ Jun 08 2018 🗫︎ replies

Oh? Types are useful huh?

👍︎︎ 2 👤︎︎ u/gct 📅︎︎ Jun 09 2018 🗫︎ replies
Captions
Denis emailed me Monday morning and I did not have anything prepared, so this is kind of rough, although I've been talking about and thinking about this topic for a long time, as you will shortly see. So, last time I was here, Python was a pretty cool language. I'm still shocked that we're now number four on the most conservative list of popular languages and number one or two on other popularity ratings. And Denis just asked me how come Python is so popular. I don't actually know. I have a wild guess, which is that data science has lifted all boats. I don't do data science myself; I barely know what the pandas thing looks like. I'm sure several of you here spend a lot of time with that. So yay, Python. But as always, Python is easy to read, easy to learn, easy to write, but that comes last actually. And I should actually call out my great mentor Lambert Meertens, who in the late 70s, early 80s designed a language named ABC that was designed to be easy to read and write and learn, and I was one of the four or so programmers tasked with implementing ABC. After three or four years the project was cancelled for lack of popularity, and Python is really just a remix of ABC, with perhaps the biggest difference being that ABC used capital letters for keywords, as was sort of the prevailing fashion in the late 70s and before. Other things that sort of make Python a good thing for people: it lets you write short code, but not so short that it becomes cryptic, and it comes with batteries included. There is a huge standard library, plus it's really easy to put in different batteries, better batteries, batteries I had never heard of, batteries I don't care about, in the form of third-party packages. And Python, as well as pretty much everything in Python's ecosystem, is open source, which makes my work easier because I don't actually have to do any of it anymore; I just have to say I'm the BDFL. Now, oh, let's just do a quick poll: who here is familiar with coding in Python from sort of regular use? OK, that's almost
everyone, so I don't really have to explain too much about Python's runtime type system. One of the reasons that Python is concise is that you don't have to be explicit about the types of your variables, arguments, return types and other things. Python does all its typing at runtime, so from a compiler's perspective there's not much more you can say about a typical name that occurs in Python except that it is an object. This is true even if it's a very primitive type like an integer or a string or a floating-point number. There are no unboxed values in Python. There is some obscure library that will let you manipulate unboxed values, and in the data science world NumPy arrays are actually arrays of unboxed values, but everything is sort of demarcated by APIs, which means that the type system of the language itself has no worries about unboxed values. It also has no opportunities for efficient code generation, because the compiler just doesn't know. It's object-oriented, it supports multiple inheritance; if you really want to, you can do crazy things like write a function that computes a list of classes and use that as the set of base classes for your class. You can even create your own metaclasses. I'm not going into too much detail here. As I mentioned, there are no declarations as such; well, the only type of declaration is about the scope of a variable, but nothing about its type. You just say def add(a, b): return a + b, and it either works or it doesn't when you call it. Also, and a lot of people don't appreciate this when they start using Python, variables are not actually really variables in the sense that they are in most other programming languages. In Python a variable is basically just a dynamic binding. In most places, what the user thinks of as a variable, and what the language actually goes to considerable lengths to make you believe is a variable, is really just a key in a dictionary, and when you assign to that variable, under the hood it sets a value in that dictionary. This is
also why, the origin at least of why, you can delete variables in Python. Now, for local variables in functions there are certain constraints that make it possible for the compiler to actually cheat and not materialize that dictionary, and in fact just manipulate values as if they were registers or something like that, but for the most part you can't actually tell the difference. You have to sort of really try to use some introspection API before you can tell, oh hmm, I guess in this context there is no real dictionary, or you have to read the reference manual. So unlike some other dynamically typed languages, in Python there is still a certain level of type safety, which comes purely from the interpreter being fastidious in doing type checks when they matter. When you try to add two objects together and one of those objects is an int and the other is a float, great, we'll turn the int into a float, add the two floats and return the float. On the other hand, if one of them is an int and the other is a string, then we'll throw a TypeError. Now if you try to multiply an int and a string together, you probably know what happens: it's not a type error, you get a repeated string. Other things that are interesting in the type system that more or less follow from all this: the concept called duck typing. I guess the technical term would be structural type checking or structural subtyping. When you call a method on an object, it may or may not work. If it works, you win; if it doesn't work, well, you have a bug in your program. That's sort of the attitude, which means that you can do all sorts of interesting things where you pass a function an argument that has an attribute that is actually a method, and the function calls that method, and it will work for any class that defines that particular method. There is no requirement to say, well, the write method on files can be called here but the write method on some other object cannot be called even though it happens to be called write. Duck typing
for the win; people love it, with sort of some limitations. Also with this dynamic type system comes a lot of introspection. If you want to know what the attributes, the data attributes, of an object are, you can ask for that object's dunder dict attribute (we pronounce under-under as dunder). If you want to know what its type is, you can ask for its dunder class attribute. If you have a particular class and you want to check whether something is an instance of that class, you can call isinstance, like in most languages. Then there are all sorts of interesting quirks where at some point we added an interesting new feature like decorators, context managers, descriptors, a whole bunch of stuff that is well defined but is not apparent from the syntax of the language, or only marginally so. So why do we like dynamic typing, or why do Python's users in general like dynamic typing? Well, less typing, finger typing, means shorter code, less boring code. You can use those dynamic tricks to add a little touch of excitement, like if you have a function that takes arbitrary keyword arguments, then you can also turn those arbitrary keyword arguments into attributes of the current instance, of any instance in fact, although I wouldn't recommend it necessarily. Why does this not lead to terribly buggy programs? Because we also have a culture in the Python world of doing a lot of unit testing and other types of tests. I remember, it's sort of settled now, I think pytest has mostly won, but like a decade ago there was a battle of different testing frameworks, and that was a good thing; lots of interesting developments were going on, and this was all open source stuff where I didn't really have to do much. So dynamic typing is clearly popular. On the other hand, the static typing people have not given up yet. Static typing definitely catches bugs earlier, at least some bugs; it clearly doesn't catch all bugs. With static typing you have a little more confidence when you're refactoring your code: if
you add an extra argument to a method, then static typing will tell you when you forgot to add an extra value at a call site. In addition, and I call type declarations type annotations because that's what we call them in Python now that we sort of have them, human readers of large code bases, the people who are hired to maintain a million lines of code that they didn't write but are still going to be responsible for fixing or adding stuff to, those humans really appreciate having something that tells them what the types of arguments of functions are. And you can say, well, we have documentation. Well, documentation doesn't always stay up to date, sadly; I wish it was not so. But it turns out that it's really hard not to keep the type annotations up to date, because the compiler will tell you when they don't match the code. And it turns out static typing is also still pretty popular. So there is this new concept, well, it was new a decade ago, although I think I only heard about it about four years ago: gradual typing. In Python we typically call it optional static typing. It is a technique for adding static typing to a dynamic language that lets the user basically mix and match statically typed and dynamically typed parts of their code. Now obviously this means that if you want to cheat, you can just keep your code dynamic and the compiler will not catch you. We're not into that kind of bondage and discipline; you're just shooting yourself in the foot. But if you want to have more confidence about the correctness of your code, not perfect confidence but some better confidence, if you want to benefit from some of the benefits of static typing, including its popularity, you can start using type annotations in your Python code now. We've developed this over the past half decade, and we found that it is actually a very reasonable way to introduce typing into a large dynamic code base. Python is not unique, of
course: TypeScript is exactly the same idea for JavaScript, and Facebook did the same thing for PHP with Hack. So mypy, I just noticed it's doubly open source. It's a type checker for Python. It was developed, I think starting in 2012 or so, by a Finnish guy, Jukka Lehtosalo; I can never pronounce his name. He got a PhD from Cambridge (Cambridge, England) from that. I remember he approached me at a Python conference either in late 2012 or early 2013 and he said, look, I have this language that is sort of inspired by Python, but I've added static type annotations to it in a gradual fashion, because that was also part of it, and I've written a type checker and a transpiler that turns that code into Python code, which you can then execute. I said, wow, that's great, and how many users do you expect to have? Well, you know the story about the typical language developed by a PhD candidate: there's one user until the thesis has been approved. And I didn't come up with that story; I heard that from someone in the Haskell world. Anyway, I said, but if you change your syntax just a little bit, let's start by turning the angular brackets into square brackets, because angular brackets will not work, and we can change a few other things, and guess what, you can actually make your language a superset of Python, or maybe a subset, I'm never sure in which direction this goes, but you can make it compatible with Python so you don't need the transpilation phase. Code written in language X, well, language mypy, is valid Python code, and the Python interpreter will execute it just fine, and in fact the Python interpreter will execute it at the same speed as it would completely unannotated. Jukka thought about that, and together we sort of whipped up a plan, and after that, and after I hired him at Dropbox, we started actually introducing this in the Python community and we started working on standardizing this. This happened in 2014-2015 with a very
contentious PEP 484 that takes the existing function annotation syntax in Python and basically says henceforth function annotations will be used for type annotations, because syntactically these things already existed in Python; that was sort of the hint I was giving Jukka. But semantically there was never any agreement on what they meant. When we started the Python 3 work there was also a very contentious discussion about adding typing to Python somehow, and after going around that subject several times myself I realized that we would never get agreement, at least not in time for Python 3.0 and probably not in time for Python 3.10; we would never get agreement on exactly how those type annotations should work. But I said, if we just add a little bit of syntax without giving it meaning, then people can experiment with assigning different meanings to that syntax. The big debate, by the way, was about whether these type annotations are just used for correctness checking, or whether they're also used for runtime checking, or possibly even for code generation, and it turns out that you need quite different approaches for the different ideas. I wasn't sure that I knew what I wanted, but I knew that eventually we would want some sort of type annotations. Well, so that happened. Because Jukka worked at Dropbox, he started playing with mypy and the Dropbox codebase during events called Hack Week, and in 2015, I think after the second Hack Week where we played with it, we started introducing annotations in the Dropbox codebase for real. Fast forward to this year: we now have two million annotated lines of code. Now how do you count that? It means that the total line count of the functions that are annotated is about two million, this out of five or six million lines total. We currently have a four-person team including Jukka. It's pretty popular; in fact we almost never hear anything negative about this except about the speed of the
tooling or the speed of the introduction of type annotations in the remaining four million lines of code. Also, because mypy is open source, and we kept it open source and we aggressively developed it as open source, all the design discussions about how we should solve this problem in mypy, the entire bug tracker, everything, and all code reviews are done on GitHub, and as a result of that we have quite a few external contributors, some of whom occasionally get promoted to core dev and sometimes to Dropbox. We've also found that Facebook adopted this, Quora adopted this, Lyft adopted this. There is an open source chat bot named Zulip that has a Python code base of about 100, 500 thousand lines of code that was completely annotated over the course, I think, of the previous summer. So anyway, I think that's a pretty healthy project. The standardization efforts also paid off, because it's now supported by other type checkers for Python, perhaps the most well-known of which is PyCharm, a popular IDE which has its own code completion and type checking engine but which supports the same syntax that we introduced for mypy. Google has a static analysis tool that will eventually help them port their, I don't know, tens of millions of lines of Python 2 code to Python 3. Facebook was so excited that they wrote their own type checker, which they just recently open sourced at the last PyCon last month. And then there is a component which is sort of type stubs, or descriptions of the standard library and third-party packages that haven't been annotated, that everybody shares and contributes to. So with the project status out of the way, let's start showing some code and give some details. An important requirement was that there would not be changes to the Python syntax, which is where the existing syntax for function annotations was essential, which is also sort of why I convinced Jukka to change his angular brackets to square brackets, because it turns out that in Python you can overload array
indexing or dictionary indexing on anything, including, using metaclasses, on types themselves, so you can define the meaning of List[int] as: this refers to a list of integers. The other requirements were that Python itself should not slow down and should not actually do anything with the types, so you can put all the annotations you want in your code and your code executes exactly as if those annotations weren't there. And finally, no transpilers; otherwise we would have a lot more syntactic options, but it would also be incredibly hard to get people to adopt this, because transpilers only work if all the tooling understands the richer syntax that the transpiler accepts, and so I'm very happy that we didn't go that way. So now here is a tiny bit of code. I'm not using a fixed-width font because I think that's old-fashioned for slides. Let's sum a thing named x, sorry, a thing named a. We start with the total initialized to zero, we loop over all the things in a and we add those things to the total, and then we return the total. Well, you can see that if a was a list of integers or a list of floating-point numbers, or maybe a NumPy array of complex numbers, it would all make total sense. There are also things where it doesn't make sense, but that's okay; like, you can't sum a list of strings this way because the total is initialized to a number type. So this is fine. This is also still fine in mypy, but of course it is not type checked. Now let's add an annotation: we add List[int] and arrow int to the function heading, the signature, and that's the only change we make. Now the mypy checker has enough information to figure out what's going on, and it will in fact say this is fine, because I don't have any examples with bugs in them. So the syntax for annotations is name, colon, type, where type can be anything that I will explain soon. The return type is indicated using an arrow. I think there
are a few other languages that use the same notation. It's definitely not very common, but Python is not unique. I don't know, maybe Python sort of inspired the other languages that have this. I think Rust uses the same notation, and Scala also uses square brackets for generics. This same function we could annotate differently, for example with list of float to float. Now if you're still a Python 2 user, this is the only slide I'll spend on the Python 2 syntax. There is a Python 2 version of this syntax which uses a comment that we call a type comment: pound, type, colon, and then in this case the types in parentheses and an arrow for the return type. Note that the type comment does not repeat the variable names, or rather the argument names. Otherwise this does exactly the same thing as the Python 3 version. In fact you can write this even in Python 3. Why would we allow using type comments in Python 3? To enable people to write code that is sort of version neutral, which is an incredibly important property of code while you're porting to Python 3. Oh yeah, a reminder that we have about a year and a half until Python 2 is officially unsupported, which doesn't mean that the world suddenly will stop using it. Okay, so how does type inferencing work? This may be fairly basic. A simple example: the sum of two integers. For some reason I assign the sum to an intermediate local variable x and then we return that. And how does the type checker go about figuring out what's going on? Well, let's focus on the line x = a + b. We start by looking up the types of a and b, which come from the signature, so we know that both a and b are ints. Then, because we have operator overloading, we're looking up the function dunder add on integers. I'm simplifying things here; if you know how binary operators work in Python, you know it's much more complicated, but the complications don't add anything to explaining the type inference. So now we have this dunder add method on the
type of a, so that is actually int dunder add. Can it be called with an int? Darn tootin' it can. So what does it return in that case? Because that function could be overloaded, and in fact I believe it is, because if you call it with a float it will magically also work and return a float. Yes: you call it with an int, you get an int, and that is the type assigned to the variable x. On the next line, return x, we do a similar thing. We say, well, what is the type of x, because that's the entire expression? It's int, because we just computed that on the previous line. Is that the desired return type for this function? Yes, bingo. Okay, the whole function works. There's a lot more to this. In particular, I'm not touching on the notion of context, where in some cases, if you have a variable whose type is declared, the declared type can actually influence how we interpret the types occurring in the expression that's being assigned to it. That seems overly complicated; it turns out it's essential to deal with all sorts of corner cases of Python. I don't have time to go more into that if I ever want to get to the end of this talk. So then we have the question, well, what can a type be? I've shown a few very simple examples: int, list of int. Well, there are only a very small number of atomic types; others are string and bytes. Well, they're not even atomic, but they're sort of common built-in types. Then we have container types, and those are in fact not built in, because again for obscure technical reasons you have to spell them with a capital letter; at least that's how we made it so that you can use this in versions of Python that were already released before mypy was invented. So capital List with square brackets is a list of whatever occurs inside those square brackets. We also have sets. Dictionaries have two type parameters, because of course there is a key type and a value type in Python, unlike some other languages where the key type is always string. Tuples are technically not generic
types but a special form. For example, Tuple[int, int, bool] is a three-tuple. There's also a way to spell a tuple of variable length, but I don't really like those guys. Perhaps more useful than the concrete container types are the abstract container types like Sequence and Mapping and MutableMapping and MutableSequence. There's at least a dozen of those. For those, the existing names already were predefined in Python's collections.abc module, which is pretty old, from the first time we introduced abstract base classes in Python. For mypy we basically sort of steal those names and give them aggressively better meanings. Then there are a bunch of things that are not so regular. There is a type named Any, which means, well, for this particular variable don't bother checking the type, because I'm going to do nasty things to it, which perhaps you shouldn't do in a perfectly well-typed program, but in practice in Python, and sort of on the path to gradual typing, realistically that happens. Also Python has a bunch of dynamic features that, when used sparingly, are very useful. Then there is a whole bunch of stuff that I really don't want to get into much. There is Optional, which says it's either an X or it's None; very useful, a very common style of using things in Python even in untyped code. How often do you see if x is None? Then there is Union, which is, well, you can say Union of int and str: it could either be an integer or a string. Then you have to insert a runtime type check just like you would do if the code was untyped; I'll get to those a little later. There is a Callable type. You can say, well, the type must be None, which is not very useful for arguments but pretty popular for return types, obviously. Then there's NoReturn, which means this thing never returns; it either loops forever or it raises an exception. There's a couple of crazy things having to do with strings and bytes, and more things, NamedTuple, TypedDict, NewType, that you
can look up in the manual. So of course these types can be combined. Here is a very verbose function signature: Sequence of Tuple of float and float, returning Tuple of float and float. We can solve that with a type alias, which is just a Python assignment that takes a type on the right, like Tuple of float and float, and assigns it to a name on the left, and mypy is smart enough to realize, oh, this is not a regular variable assignment, this defines a type that I can then use later. And lo and behold, you can now write the signature using List of Vec and Vec. All pretty standard stuff. So, escape hooks. I already mentioned Any a few times. There are other escape hooks. There is a cast. The cast is completely unsafe. Maybe at some point we should add a downcast to the repertoire, which actually checks that you're doing a downcast and not just casting to some random other type that's unrelated, but we currently don't have that. We do have a cast, which is like the last resort: mypy is apparently confused about what the type of the node is, so we tell mypy, don't worry, I know it's a list expression, and then you can use it as a list expression. There is also a different way to just suppress an error that mypy might give. For example, if you import a module that mypy cannot find, you can put a type: ignore on that line and then mypy will not tell you that it cannot find it. However, it will create a variable, at least in its own representation of your program; it will say, well, there is a variable named setuptools that has type Any, and if you call a method on it the return type is still Any. Sure. [inaudible] I don't know that there are any Python implementations that run on such unusual platforms. CPython is portable as long as you have 8-bit bytes and your integers are at least 32 bits, and we're slowly entering a world where we don't even care about integers or pointers smaller than 64 bits. There are some other Python
implementations: Jython, which runs on the JVM; IronPython, I think, is still in existence, which runs on Microsoft's DLR. I'm not sure what the cast could do differently for other platforms. I mean, this cast is not there to truncate bits or anything; it's really there to correct the type checker. And whether the list expression example was actually meant to be a downcast, there's not much difference. Generics. Okay, so we have generics. I already showed List of int and Dict of, well, here's a slightly more complicated dictionary that has string keys and tuples of two integers for values. You can also create generic type aliases. You need a type variable. Type variables are one of the warts of the type system, because we didn't have a way to introduce type variables without pre-declaring them, so you have to call a utility function to create a type variable, and you shouldn't be playing dynamic tricks with that, because the type checker is not actually ever executing your code; the type checker is just reading your code and trying to guess what it means. Anyway, so we can create a generic type alias named Pair, which is a tuple of two values that have the same type, and then we can derive from that a type alias named Vec, which is a Pair of floats, and then we can define the same add-vectors function again. You can also define your own generic classes, which is much more interesting stuff than using them. Again you have to start with a type variable, and then you start with a class statement, class Stack. Your base class must be Generic of some type variable. Now you can also include other base classes still, so you can have multiple inheritance from a non-generic class or from a different generic class, but basically you must have Generic of T in there. Now we have an instance variable named data, declared in the class using Python 3.6 and later syntax. We spent a lot of time debating how we would introduce variable declarations that
were somewhat similar to the function annotations that we already have, and we ended up introducing syntax that does not use a new keyword: you just say variable name, colon, type, and that has worked out pretty well, very short and Pythonic. Before that you would have to use a type comment on a variable initializer. Anyway, this is your classic stack, except I forgot the full or empty method. There is a push that takes a T, there's a pop that returns a T. For things that return nothing you have to explicitly say returns None. And then on the right we create a stack. So to instantiate that generic class you have to first instantiate Stack of T to Stack of int, and then you have to call that to create an instance. There's some caching there to make it fast. Now you push an integer; you cannot push a string, at least, I mean, at runtime you can, but if the type checker catches you doing that it will give you an error. You can pop something from that stack, and then the inferred type for that variable will certainly be an integer, which you can prove to yourself by trying to do something to it that you can do with integers. There's also a built-in function, only in mypy, reveal_type, that will just print the type as an error message; that's really just for debugging. What are we getting to here? Runtime type checks. So say you have a function upper that takes an optional string, and apparently there are some cases where we want to be able to pass None into it, and then we want to get an empty string back. A different version actually would also return an optional string, but let's say, well, we want to always return a string. So the way you write that is: if the argument is actually a string, we call the upper method on it (there's an existing string method named upper, but of course you can't call that on None); otherwise, if it's not a string, we just return an empty string. On the right is code that works the
same way, except instead of using isinstance it just checks if it's None and returns an empty string; that's actually more idiomatic. mypy understands both versions equally well, and it will understand that in one branch of the if statement the type of the variable is actually str. What do we have on the bottom? We have a similar case where we take an argument that could be any object, which you can express with object, and unlike Any, object does not suppress any error messages, so you can't do anything with an object, because the type checker says object has no methods; well, you can convert it to a string, that's about it. So if that object is an integer then we add one to it, and otherwise we just give up and return zero. On the right I have a little helper function that takes an optional int and returns a non-optional int, and at runtime it just throws an AssertionError when the argument is not None, sorry, when the argument is None. Again, the type checker understands things like isinstance or is None, or even just, sort of, a == b, or just if a; it understands those things in assertions as well as in if statements, and it will do the right thing there. So again, those three statements on the bottom right type check correctly. The require call will in fact raise an AssertionError if there's no key in that dictionary d, which is apparently the functionality that we want for this particular require function. Yeah? [Audience question:] Is that included in some sort of specification, or is that just how mypy works and the user has to understand that, like, Pyre might work differently? No, that particular one we have nailed down in PEP 484. Yeah. PEP 484 does not nail down every corner case of the type system, unfortunately, and there are other corner cases of inference where it doesn't specify things completely, and unfortunately Pyre and mypy give vastly different answers on some programs, actually on many interesting programs, but
mypy is sort of the reference implementation at this point, and the Pyre people are feeding us complaints and bug reports and feature requests, and I expect that they will also send us requests to change the standard. Let's see. So after PEP 484 (good thing you mentioned that) we started evolving the type system a bit, and probably the biggest addition is protocols, or duck typing. I'll try to read through this example quickly. There is a log function that takes a message and a file object; it appends a newline to the string argument and then writes that to the file. Very primitive logging function. Then there is a class Saver, which derives from the Stack class I defined previously, just for convenience and to show that you can inherit from a generic class, specializing it to a specific type. Saver is no longer generic: it is a stack of strings that has a write method. The write method just pushes that message on the stack, and I don't know what happens to it after that. So we're creating one of those Savers and we try to log to it. Now if you run that code it will actually work, but if you try to type check it you get a complaint from mypy saying that the Saver object (never mind that it has a write method and everything works just fine) is not actually an instance of IO of str, which is the very elaborate type that defines the I/O streams in Python, which has like two dozen methods: write and read and readline and readlines and readinto, isatty, way too many methods. It's also generic, because you have files of strings and files of bytes. Anyway, the type checking doesn't work well here, so we want a way to say: accept anything that has a write method. Well, actually, accept anything that has a write method that takes one argument that is a string. So here we go: we define a class Writable and we inherit from a predefined thing named Protocol. We add a method to it. The method is a complete dummy; there's a pass
there. You can also literally use the syntax dot dot dot instead of the pass, although that won't work if you also have to be Python 2 compatible. This is not a class that you can instantiate; this is a class for putting in signatures. So now we change our log function, and the rest of the code is the same: we change it to say that the file argument is a Writable. Since the only thing we do with it is call its write method, that works just fine, and since that Saver class actually has a write method of the correct signature, the last line, the log call, is actually now correct. This is how we deal with duck typing. There are a few details: in most versions of the typing module you would have to import Protocol from a typing_extensions module, but we'll smooth that out eventually. The key thing to note here is that Writable is not an abstract base class, because Saver does not in fact inherit from Writable. Without protocols you could only solve this by defining an abstract base class that defines the write method and modifying the I/O class, which is deep inside the standard library, to inherit from that Writable class. We did this long ago in the standard library for one or two very popular duck-typed methods. There's a Hashable class there: if you want to check dynamically whether this thing is hashable, you can use isinstance of Hashable, and the Hashable class overloads the isinstance behavior and just checks whether there is a dunder hash method. That stuff doesn't really scale. Yes, questions? It does not. That is really difficult, because mypy would need its own interpreter that is able to execute all the code in this isinstance or issubclass overload method. So we chose not yet to go down that path. Future generations will get some PhDs out of that, probably. But that would really not solve everything for protocols; it would certainly be a very large stick to kill that particular mouse. Poorly translated Dutch
saying. Okay, well, glad you're all still with me here. There are many details I didn't even mention about the typing module; you can look that up in the documentation. I do want to mention stub files and a thing called typeshed. Sometimes there is no source code to annotate, and where is mypy going to get its type definitions from? This happens when the code is written in C or C++ or Fortran. Sometimes there is source code but it's too crusty, or it's read-only, or for some reason you don't want to annotate it. The standard library has a lot of very crusty code that would not be easy to annotate, so mypy is actually configured not to bother reading the standard library. It never reads any standard library code, and it doesn't read any installed third-party package code either, but it does read a thing called typeshed, which is a collection of stub modules that are basically class definitions with empty method bodies, and function definitions, and constant definitions, and a few other things. It's syntactically valid Python code; it is not executable, but it has all the type annotations that are necessary to make mypy understand what's going on. So we start out with a stub for the builtins module, and we have stubs for, I would say, at least a third of the standard library. That doesn't sound like much, but it's definitely the most popular third, plus a small number of popular third-party packages. I think I already mentioned this: this is also an open source collection, it's on GitHub, and it's actually shared with all the other static type analyzers for Python. Another thing I want to mention briefly: suppose you have a million lines of code, or maybe just 10,000 lines, to start small, somewhat small. How do you get that stuff annotated? If you point mypy at 10,000 lines of code that you wrote over the past two years, let alone five million lines of code you wrote over the past decade, it's gonna throw a lot of errors. It might even crash. Man, in 2015 when we started experimenting we
fixed a lot of mypy crashes, just because the Dropbox code was much richer in all the corner cases of Python that it explored than any of the test cases we had thought of. At the last PyCon, in May in Cleveland, there was a good talk by Greg Price, "Clearer Code at Scale: Static Types at Zulip and Dropbox", which explains step by step how you should go about it. My three-point summary of that is: write some kind of wrapper script that invokes mypy on the right set of files with the right set of flags; start annotating only a small number of files, in fact start analyzing only a small number of files at a time, so you don't get overwhelmed; and set up continuous integration, so that once you have a fixed point where you have zero errors, at least for the configuration that you actually have in that script, you will keep your developers (or yourself, if it's just you) honest and you don't introduce new type failures. Then I want to mention that there is a config file where you can set things like strict_optional = True. The whole story I mentioned about Optional actually only works if you use strict_optional = True. You can also say disallow_untyped_calls, which is a useful flag that tells mypy: if I'm in type-checked code and I see a call to a method or function that has no annotations, flag that as an error. By default, anything that's unannotated is just seen as a big blob of Anys, and when you call that it always type checks fine, and whatever it returns is now inferred as also having type Any, and that in many cases propagates through the rest of your code, or at least to the bottom of your function, or however far you're using the variable that captured that result. So by using disallow_untyped_calls you can make mypy pickier about that particular situation. There are like 30 more flags that affect all sorts of things, like where to look for modules, and a whole bunch more strictness flags. Oh yeah, you can also
specify most of the flags differently for different packages or modules. So here's a summary of the standardization effort. PEP 484 was the result of a long and painful discussion in the community about, well, should we have type annotations, what should it look like, what should the syntax be. In the end the mypy design mostly prevailed. It's definitely not a complete standard: you couldn't start implementing a type checker from just that specification and end up with a useful type checker; it would report errors for most interesting Python programs. The variable annotation syntax was not in Python 3 when it was created, so we had a separate PEP for that, and it's only available in 3.6. Then protocols were their own PEP. We have another PEP that lets you publish stubs outside the typeshed collection, because while typeshed is a really handy way to bootstrap you for third-party packages, it really doesn't scale to have everyone submit their stubs to typeshed. In fact there are many third-party packages that are not very popular or commonly used and have conflicting names, so whose package ends up in typeshed? The answer is neither, but each package can publish their stubs with their package or as a separate package. And then the first thing that came out of Facebook was forward references, which I didn't even mention here, but they are somewhat painful syntactically, and there's now a future import where you can avoid most of that pain. That's actually it, except for your questions. I have a bunch of handy links. mypy-lang.org is really the only one you have to memorize, because all the others are linked from there. Yeah, how to install it: pip3 install mypy. Mypy itself of course is a Python 3 program, even though it can analyze Python 2 code. And here are some more links to all the various bits of tooling that I could remember this afternoon, including Pyre, which is named pyre-check on GitHub. We have two of everything, it
seems. We have two different packages that do runtime type collection, which is something I again didn't mention at all. When you're annotating a large code base that is already working, and you've got unit tests and everything, it would be nice if you could instrument your running program with something that observes every single function call and logs what the types of the arguments are. Well, it turns out that's not a completely trivial problem, so Dropbox and Facebook came up with separate solutions. Type checkers galore. So: questions, heckling, selfie requests? [Applause] Couldn't cram that into a slide either. The variance story is that there is a way to say it's covariant, invariant or contravariant in each type variable. You can have things that have multiple type variables: dictionary is invariant in both of its type arguments; on the other hand Mapping, which is a read-only version of dictionary, is covariant in the value type. [inaudible] No, that is definitely an issue. I refrained from showing a certain example because of that. Currently it doesn't seem to be a big deal, because in general people have already solved this problem for untyped code, which is kind of a cop-out, but realistically that's where we are: we are mostly adding annotations to existing code. That stack example works just fine without ever being able to instantiate T. Obviously there are other examples where it's not so easy. In the summing example it would be nice if you could create a new T, but the problem is that in Python you can't always create a new instance of a class without knowing the signature of its constructor. So Python just doesn't really lend itself to doing something like that. Okay. I'm actually kind of curious how many cases there are of things being removed from typeshed? We have not done it for that reason. We have removed things from typeshed, but always for the reason that the module or the
package turned out to be hopelessly dynamic and someone had contributed partial stubs that caused more pain than they helped. I expect that people will start taking their own annotations into their own hands. The first example where that will happen, I think, is the attr package, or attrs; I think you do import attr. It's very popular in certain circles. It turns out that attrs is not easily typed using this type system, but mypy has a plugin mechanism, which is currently still very experimental, and someone contributed a plugin that lets you type classes created with the help of attrs decorators. Attrs is incredibly dynamic: it's a class decorator and a bunch of things you put in the class, and suddenly the class has an automatically generated comparison, constructor, or a bunch of other things, maybe pickle support, hashing. And because it's a class decorator that just adds methods to the class, mypy has no hope of understanding what's going on there, so when you instantiate an instance of those classes mypy is very sad, or actually the user is very sad. So using the plugin we basically special-case mypy so that it actually understands what that class decorator does in that particular case, without having a whole interpreter for class decorators built into mypy. But one of the first bits of feedback we got was: well, attrs is evolving, there's attrs 1.7 and 1.8 and whatever comes next, and they're all different, and now we want to have different versions of the plugin. That will happen. There are a couple of other packages, for example Django, where people are working on a set of stubs, and PEP 561 is coming just in time, so no Django stubs will ever have to be placed into typeshed, because Django can just publish its own stubs, and of course it can publish different versions of the stubs for different versions of Django, which is also the point. Sure, back to you. Well, okay, it supports a tiny bit of magic,
just enough to understand, I believe. Yeah? I'm sure Dropbox uses some of those things internally; is that how they approach that? Dropbox is trying to stay away from JSON, not always successfully. This explains why we have a protobuf plugin; well, it's a plugin for the protobuf compiler, actually. We don't have a good strategy for JSON other than saying, well, you have to write a verifier, and the verifier should probably return objects of a sane class that can be fully typed. If you have a very basic JSON schema that isn't recursive, you can also use a thing called TypedDict, which is a hack where you can tell mypy: here's a type, and it's just an alias for dict, but type check it as follows. It should only have these five keys, and for the first key the type of the value is that, and for the second key the type of the value is that, and so on, and you have to only index it with literals for the keys. It turns out that there is a lot of Perl-style code written in Python that uses this approach, and JSON often follows that pattern as well. But yeah, there are definitely some things that could be done better for JSON in particular. Well, the DB API is purely a runtime thing, so it's too late, because all the type checking happens just by reading your source code. So, yeah, for databases we would probably have to have some other way, where you somehow get the database engine to spit out the schema, and then you have a little translator that turns that schema into a set of class definitions that have annotations in them, and then you can use that. Yeah? I'm afraid that is really out of scope for this talk, because I have no idea what you're talking about. Sorry. File a bug. Okay. First off, thank you very much for the talk, and I'll admit I'm not a programmer, so I'm not qualified. These are two philosophical questions. I was startled by one of your slides stating the advantages of a dynamically typed language, including the fact it's
an exciting power tool. If Black & Decker made a chainsaw with no safety interlocks they could say it was a real exciting chainsaw, but I don't think that's what they were... So the first question is that a product should be safe above all else, and in retrospect do you think it was a mistake to not have this be strongly typed, statically typed from the start? That, I don't know. I'd invoke the anthropic principle for that: if I had created a statically typed language I would have had to compete with C++ and Java, and I wouldn't be standing here. From the opposite perspective, a product that's easy to use, or easy to jump onto the bandwagon of, that looks appealing, that's fun, that does have value, because otherwise it might just founder. I don't know how to balance the philosophy of giving people what they want versus giving people what they need, doing what they want versus pushing on them somehow, raising their consciousness, and what exactly it is that they need. I mean, I also created a language that, while being dynamically typed, was not easy to segfault, and apart from one corner of the standard library whose only goal it is to unsafely hook into C, you cannot segfault Python easily from within itself. You'd have to basically rely on a bug in the interpreter. [inaudible] both, where the types were declared up front, but maybe one of your types was "treat this as a dynamic type", such that in special cases like calls to standard I/O or something... C# uses that approach, I gotta admit. There is something to say for that idea. In ABC there were no visible type declarations, but ABC was statically typed. The language design had to compromise, because there was a different operator for each type, basically. There was no operator overloading, because plus meant that both the arguments were numbers, period. And then there was runtime stuff where, oh well, actually we have different representations of numbers: some are rationals and some are floats, and the
floats have a bit for whether they were exact or not, or something. And then there was a separate level in ABC where suddenly everything was still dynamic, because you could edit your code and the system as designed did not re-type-check every piece of code in existence when you changed one function. [Applause]
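The generic stack slide described in the talk (a push that takes a T, a pop that returns a T, then `Stack[int]` instantiated and called) could look roughly like the sketch below. The slide itself is not in the captions, so the attribute name `items` and the `empty` method are assumptions made for illustration.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")  # type variable standing for the element type


class Stack(Generic[T]):
    def __init__(self) -> None:
        # Variable annotation syntax: name, colon, type (Python 3.6+).
        self.items: List[T] = []  # "items" is an illustrative name

    def push(self, item: T) -> None:  # returns nothing, so say None
        self.items.append(item)

    def pop(self) -> T:
        return self.items.pop()

    def empty(self) -> bool:  # the method the speaker says he forgot
        return not self.items


stack = Stack[int]()   # specialize Stack[T] to Stack[int], then call it
stack.push(42)
value = stack.pop()    # a type checker infers int here
doubled = value * 2    # an operation that is valid for integers

# stack.push("hello")  # fine at runtime, but a type checker flags it
# reveal_type(value)   # understood by mypy only; leave commented out to run
```

At runtime nothing is enforced; the commented-out lines are the ones that only matter to the type checker.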
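The runtime type check examples from the talk (the two versions of `upper`, the function taking `object`, and the `require` helper) might be sketched as follows. The function names `upper_idiomatic` and `increment` and the sample dictionary are hypothetical; only `upper`, `require` and the dictionary `d` are named in the talk.

```python
from typing import Dict, Optional


def upper(s: Optional[str]) -> str:
    # isinstance narrows Optional[str] to str inside this branch.
    if isinstance(s, str):
        return s.upper()
    return ""


def upper_idiomatic(s: Optional[str]) -> str:
    # Same behaviour, using the "is None" check the speaker calls
    # more idiomatic.
    if s is None:
        return ""
    return s.upper()


def increment(x: object) -> int:
    # "object" accepts anything but, unlike Any, suppresses no errors:
    # x + 1 is only allowed after narrowing to int.
    if isinstance(x, int):
        return x + 1
    return 0


def require(x: Optional[int]) -> int:
    # Narrowing also works in assertions; fails at runtime on None.
    assert x is not None
    return x


d: Dict[str, int] = {"a": 1}   # sample data, not from the talk
a = require(d.get("a"))        # d.get returns Optional[int]
```

Calling `require(d.get("missing"))` would raise `AssertionError`, which is the behaviour the talk describes for a key that is not in the dictionary.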
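The protocol example from the talk can be approximated as below. This is a sketch under assumptions: in the talk `Saver` derives from the generic stack class, whereas here it keeps a plain list so the block is self-contained, and the attribute name `messages` is invented. The import falls back to `typing_extensions`, which is where the talk says `Protocol` lived at the time.

```python
from typing import List

try:
    from typing import Protocol  # available in the standard library since 3.8
except ImportError:  # older interpreters
    from typing_extensions import Protocol


class Writable(Protocol):
    # A signature-only class: anything with a matching write method
    # is accepted, whether or not it inherits from Writable.
    def write(self, message: str) -> None:
        ...


class Saver:
    # Note: Saver does NOT inherit from Writable.
    def __init__(self) -> None:
        self.messages: List[str] = []  # illustrative storage

    def write(self, message: str) -> None:
        self.messages.append(message)


def log(message: str, file: Writable) -> None:
    # Very primitive logging: append a newline and write it.
    file.write(message + "\n")


saver = Saver()
log("hello", saver)  # type checks because Saver has a compatible write
```

The design point made in the talk is that `Writable` is structural: the type checker compares method signatures, so no abstract base class has to be retrofitted into existing code.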
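For the JSON answer in the Q&A (write a verifier, or use TypedDict for a basic, non-recursive schema), a minimal sketch might look like this. The `Movie` schema and the `parse_movie` function are hypothetical examples, not something shown in the talk, and the import falls back to `typing_extensions` on interpreters older than 3.8.

```python
import json

try:
    from typing import TypedDict  # standard library since Python 3.8
except ImportError:
    from typing_extensions import TypedDict


class Movie(TypedDict):
    # At runtime this is a plain dict; the checker verifies keys and
    # value types, and keys must be indexed with string literals.
    title: str
    year: int


def parse_movie(text: str) -> Movie:
    # A tiny hand-written verifier: type checking only reads source
    # code, so data arriving at runtime has to be validated explicitly.
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    title = data.get("title")
    year = data.get("year")
    if not isinstance(title, str) or not isinstance(year, int):
        raise ValueError("unexpected schema")
    return {"title": title, "year": year}


movie = parse_movie('{"title": "Brazil", "year": 1985}')
year_plus_one = movie["year"] + 1  # checker knows this value is an int
```

A fully typed class returned by the verifier, as the speaker suggests, would be the sturdier option for anything beyond a flat schema.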
Info
Channel: stanfordonline
Views: 18,607
Rating: 4.9378238 out of 5
Keywords: Stanford, Stanford University, Seminar, ee380, Guido van Rossum, Dropbox, Python Software Foundation, Optional Static Typing, Python 3, Programming Languages, mypy type checker, code base
Id: GiZKuyLKvAA
Length: 77min 6sec (4626 seconds)
Published: Thu Jun 07 2018