Advanced Python or Understanding Python

>> Hello. Yes, hi. Welcome to the latest in our series of talks on advanced topics in programming languages. Today we're very lucky to have Thomas Wouters, who's here to talk to us about features that are currently available in Python but are advanced features. Last week we were very lucky to have Guido, who gave an awesome talk about upcoming features; today we'll be talking about existing features, right? Yes? As always, this is the latest in our series of talks, and I always want to make a pitch so that people will actually give talks of their own. It's a really useful thing to do, if you have specialized knowledge or interesting knowledge, to share that knowledge with as many Googlers as possible and potentially even the rest of the world. So please come and see me, and we can set up a talk and you can give yet another in our series of talks here. And with that, I will turn it over to Thomas, who shall give a very interesting talk. Thank you. >> WOUTERS: Is this on? Right. Although there's an echo. So I'm going to give a talk about Advanced Python, which is not an easy subject, in that there's a lot of Python that can be considered advanced. I'm not sure what your level of advancement is, so I'm going to cover basically everything from the start until the end, and we'll see how far we get. These are the subjects I'll be covering. I'll certainly be explaining the Python design, objects, iterators and generators, and hopefully decorators. Everything after that is a bonus. I'm going to keep the pace pretty high, so if there are any questions, just wave your arms and I'll stop and explain. But if everyone can keep up, then we can probably cover the more interesting topics at the end as well. And if there's specific interest in any of the topics especially, for instance Unicode, I can skip over the advanced stuff and jump right to Unicode.
So, first about Python. Python was developed by Guido as a middle ground between shell scripting and system programming. It was intended to be easy to use by normal programmers, but still allow more complex structures than a shell scripting language. It turns out that it's also convenient for library programmers, people doing the actual programming for end users, because you can hide a lot of smarts inside of objects or modules and just expose a very simple API to end users. This has gotten progressively better with later releases of Python. There are multiple implementations of Python; CPython is the one everyone uses, except for those that are developing the others. Jython has been around for quite a while and is actually in use by a lot of people, but not as much as CPython. IronPython and PyPy are still actively being developed, although IronPython is apparently very usable. Python is designed by implementation, in that if you don't come up with an implementation of a feature that you want, it's not going to happen. But it's not a slave to the implementation, and it doesn't mean that if you implement an idea, it will get in; it's still a very designed language. We have feature releases every one to two years. It used to be every year or so, but the last one was in the works for two years, and there are a lot of bugfix releases as well. An important note about the feature releases: if you do not get any warnings using a particular minor release, upgrading to the next minor release will almost certainly work. Anything that changes semantics or possibly breaks code will give a warning for at least one release. And bugfix releases have a similar requirement: you can always upgrade to a newer bugfix release of the same minor version, and nothing will break that wasn't already broken. Important in realizing how Python works is that everything is runtime; even compiletime is actually runtime.
As you may know, there are cached files, .pyc files, that are cached bytecode; that's basically borrowed runtime, someone else's runtime that you can skip. Also important is that all execution happens in a namespace, and that namespace is actually a dict: a dictionary object with keys and values. Modules, functions and classes all have their own little namespace, and they don't affect other namespaces unless they explicitly do so. And modules are executed top to bottom. Scripts start running at line one, and they run all the way down to the end, or until loops or whatever make them wait. So in that regard, it's very much like a script. When you import a module, it does the same thing: it just starts executing at line one and runs on down until the end, and then the import is done. Another important aspect is that the def and class statements don't define a function or a class at compilation time; it's a runtime thing. It actually happens when that statement is reached. So, some more about functions and classes as well, because they're important features. At compiletime, which is runtime, the code for the function will be compiled into a code object. It's a separate object from whatever the rest of the code is compiled into. Then at def time, when the def statement is executed, the code object will be wrapped into a function object, along with whatever arguments it takes and any defaults to those arguments. Essentially, def is an assignment operation. If you look at the bytecode, you'll see that it just uses the same bytecodes as normal assignment operations. So here's an example of a function. The red part is the actual def statement that gets evaluated when it's reached, including the part in blue, which is an argument default. That part gets evaluated at the moment of def.
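The point about def being a runtime assignment, with defaults evaluated at def time, can be sketched like this (the `append_item` function is my own invented illustration, in modern Python 3 syntax, not the slide's example):

```python
# Default expressions run ONCE, when the def statement executes,
# not on every call -- which is the classic mutable-default pitfall.

def append_item(item, bucket=[]):   # the [] is created at def time
    bucket.append(item)
    return bucket

print(append_item(1))  # [1]
print(append_item(2))  # [1, 2] -- the same list object both times
```

The same mechanism is why a def inside a loop or inside another function produces a fresh function object each time the statement is reached.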
So the Foo object is constructed, or the Foo function is called if it's a function, at the moment the def is being executed, and the green part has been compiled at that time, because compiletime has been in the past, but it's not being executed until later, which is when you call the function. So when you call the function, there's an inner function in there. What happens is that the green lines you see here have been compiled into their own code object, and when you call func, inner_func gets created at that time, and again the default for the arg3 argument is evaluated at the moment of that def. If you look after the definition of inner_func, you'll see that arg1 and arg2 are reassigned. That means that whatever arg2 was will be assigned to inner_func's arg3, and not whatever it is at the moment you call inner_func. However, inside inner_func we also use arg1, which uses nested scoping, which I'll be mentioning later as well, which means that it will be using the outer function's arg1, and that will be whatever arg1 is at the moment you call inner_func. So there's a very big difference between those two uses. >> Question? >> WOUTERS: Yes? >> That very final line, is that evaluating inner_func or [INDISTINCT] function? >> WOUTERS: No, it's just returning the function object. >> Okay, thanks. >> WOUTERS: So it's not calling anything; only the parentheses do actual calling. Yes? >> [INDISTINCT] could you repeat the question... >> WOUTERS: Sorry? Yes. >> [INDISTINCT] >> WOUTERS: The question was: is the last line actually evaluating inner_func? It is not; it's just returning the function object that is inner_func. So, class statements are very similar but also very different. A class statement is followed by a block of code, a suite, that's executed when the class statement is reached, right away. It is, however, executed in a separate namespace, a dict again.
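Since the slide itself isn't visible in the transcript, here is a hypothetical reconstruction of that inner-function example: arg2 is frozen as a default at the moment the inner def runs, while arg1 is looked up through the nested scope at call time.

```python
def func(arg1, arg2):
    def inner_func(arg3=arg2):   # arg2 is evaluated NOW, at def time
        return (arg1, arg3)      # arg1 is read from the enclosing scope at call time
    arg1 = 'changed'             # reassigned after the def...
    arg2 = 'also changed'        # ...so arg3's default still holds the old value
    return inner_func            # returns the function object; no call happens here

inner = func('a', 'b')
print(inner())  # ('changed', 'b')
```

The default captured the old arg2 value ('b'), while the nested-scope lookup sees the reassigned arg1 ('changed'): exactly the "very big difference" the talk points out.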
And then, after that block of code has been executed, the namespace is used to create the actual class object. Any functions that you define inside the code block then get turned into methods, by magic, basically. And, as people who have already programmed in Python will know, the self argument is passed implicitly when you call a method, but your method still has to receive it explicitly as the first argument. An important thing to realize is that the code block in a class is actually a normal code block. You can do whatever you want in there. For instance, at the top I create a class attribute, which is just something stored in the class: the result of calling func. I define a method, which is just a normal function, but it takes self as its first argument, and it assigns to an attribute of self when it is called. I also define a helper function, which just does some silly arithmetic, and call that helper function inside the class block itself to calculate or recalculate the class attribute. At the end, I delete helper, because otherwise it would end up in the class, and since it doesn't take self as its first argument, it wouldn't do very helpful things. Other important parts about Python: variables aren't containers like in other languages. They're not allocated memory that you can use to store data. People often say about Python that everything is an object, and they also sometimes say everything is a reference. Both are true, but both are not true when you apply them to variables, because variables are not objects and variables are not references. Variables are only labels. Everything concrete, everything that allocates memory, everything you can toss around is an object, and whenever you hold an object, whenever you store it anywhere, you're actually storing a reference. You never actually own an object; you only own references to an object.
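A sketch of that class-block pattern (the names `Spam`, `attr` and `_calc` are my reconstruction, not the slide's): the class body is ordinary code executed top to bottom in its own namespace, which then becomes the class dict.

```python
def helper(x):
    return x * 2

class Spam:
    attr = helper(10)          # class attribute, computed right now

    def method(self):          # a normal function; becomes a method
        self.value = self.attr

    def _calc(x):              # helper used only while building the class
        return x + 1
    attr = _calc(attr)         # recalculate the class attribute
    del _calc                  # delete it so it doesn't end up as a method

s = Spam()
s.method()
print(s.value)   # 21
```

After the class statement finishes, `_calc` is gone: only `attr` and `method` made it into the class namespace.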
Variables don't have type information; they don't have information on the scope of the variable, or how many objects live around, or when they were created. The only thing they do is refer to an object, and that's it. If you want to look at it from an implementation standpoint, it's a dict. It's a dictionary mapping names to objects; that's it. Scopes are also related to namespaces. Python has a very simple two-scope rule, which is actually three scopes. Names are either local, or they are global, or builtin. Local means it exists in the current function or class or module, but it doesn't exist outside. Global means it exists in the current module; it doesn't exist everywhere, just in the current module. And builtin is a special namespace that is used by builtin functions, and you can actually modify it. It's just a special module that you can modify yourself if you want to. When the Python compiler examines a function and compiles it, it keeps track of whatever names you assign to, and it assumes, correctly, by definition, that those names you assign to are local. You can change this by using the global declaration, which is the only actual declaration Python has. But don't do it too often; it's usually a sign that your code is incorrect. All other names that the compiler finds are either global or builtin, and the lookup is: it looks in globals, and then it looks in builtins. So if you have a global of the same name as a builtin, all the code in the module will find the global of that name instead of the builtin. You can actually use this on other modules as well. If you import a module and then assign to an attribute of the module, you'll be assigning to a global name in that module, and you can mask whatever that module thinks is a builtin name by assigning to that attribute. There's also a trick with nested scopes, which were added later in Python.
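The lookup rules just described can be sketched like this (a minimal illustration of my own; the names are invented):

```python
counter = 0

def bump():
    global counter      # without this, assignment would make counter a new local
    counter += 1

bump()
bump()
print(counter)          # 2

# A module-level name shadows a builtin of the same name for this module:
len = lambda seq: 42    # now 'len' resolves to the global, not the builtin
print(len([1, 2, 3]))   # 42
del len                 # remove the global; lookup falls through to builtins
print(len([1, 2, 3]))   # 3
```

Both effects follow from the plain global-then-builtin lookup order; nothing special happens to the builtin itself.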
I think it was Python 2.1 where, as I pointed out before, you can refer in an inner function to a variable in an outer function, but that is read-only. You cannot assign to a name in an outer function. This isn't really a mechanical problem in Python; it would be possible to allow assignment, but there's no syntax to say that a name I assign to belongs to an outer scope. Global only assigns to a global name, not to outer scopes. Apparently, Python 3.0 might be getting syntax for this. So, I mentioned modules. Modules are there to organize code. They're very convenient because they have their own namespace with global names. They also keep the rest of the program tidy. They're always executed on first import and then cached in sys.modules. This cache holds just about everything that you import in your program, and it's also the only thing that keeps track of what you have imported. So if you toss something out of sys.modules and import it again, it will get executed again. Import is just syntactic sugar; basically everything in Python is syntactic sugar. It calls the __import__ builtin function. If you want to import a module whose name you have in a string object, you use __import__. If you want to replace __import__, you can just write your own function and replace the one in the builtin namespace. Another trick is that sys.modules does not have to contain modules. It's a mapping of name to whatever object you want import to return. So you can toss any old object in there, and then importing that name will return your object. Storing None in sys.modules means this module couldn't be found. So, if you want to prevent some module from being imported, you can insert None in sys.modules under that name, and then it'll raise ImportError whenever you try to import it. Python objects in general are described in various terms. Mutability is a common one; it means the object is changeable.
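The sys.modules tricks can be sketched as follows (module names `fake_mod` and `blocked_mod` are invented for the demonstration; behavior shown is Python 3's):

```python
import sys

class Fake:
    answer = 42

sys.modules['fake_mod'] = Fake()   # not a real module at all
import fake_mod                    # import just returns the cached object
print(fake_mod.answer)             # 42

sys.modules['blocked_mod'] = None  # mark the module as "couldn't be found"
try:
    import blocked_mod
except ImportError:
    print('blocked')
```

The cache is consulted before any file search happens, which is why both tricks work.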
Lists, for instance, are mutable; tuples are not. Mutability is very important in Python because everything is a reference. If you have a mutable object and you end up changing it by accident, you're changing it for everyone, and that's rather inconvenient. So you have to keep in mind whether objects are mutable or not. Fortunately, this mostly works out correctly by accident anyway. A related concept is hashability: whether you can take the hash of an object. In normal operation, mutable objects are not hashable, and most immutable objects are hashable. For instance, tuples are hashable, but only if all their values are also hashable. Hashes are being used just for dicts and sets right now, so any dict key has to be hashable, and any set item, anything you want to store in a set, has to be hashable as well. But it's not inconceivable that more stuff will use hashes. And then there are the imaginary abstract base classes that are often referred to when saying that an object is file-like, or it's a sequence, or it's a mapping. Those are abstract concepts that are somewhat like protocols or interfaces or whatever you want in another language, but they're just informal. They're just saying: whenever an object is a sequence, it has the sequence operations implemented. You can index it and you can iterate over it, etcetera. Some of them are overlapping; for instance, sequences, mappings and files are all iterable, so the iterable interface applies to all of them. And actually, for all the objects you define, all that you're defining is an implementation of syntax. Python defines syntax, and you can plug whatever implementation you want into that syntax. You don't do it like in other languages, where you say you implement the less-than operator; in Python you implement the less-than operation, so that it can be used even when you're not actually using the less-than operator. Here's a short list of hooks that Python supports. I'm not going to go over all of them.
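The hashability rule for tuples can be seen directly (a small sketch of my own):

```python
key = (1, 'two', 3.0)
table = {key: 'ok'}          # all contents immutable: fine as a dict key
print(table[key])            # ok

bad = (1, [2, 3])            # the tuple is immutable, but the list inside isn't
try:
    hash(bad)                # so the tuple as a whole is unhashable
except TypeError:
    print('unhashable')
```

This is why a tuple can serve as a dict key only when everything it transitively contains is immutable too.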
Most should be common enough. The one thing that's worth mentioning is conversions. You can define a few hooks that define how your object gets converted to other types. That just works for builtin types, of course, because the hooks don't exist for arbitrary types. But most arbitrary types that want to do something like this can just say: if you have an __foo__ method, I'll call it to turn you into a Foo object if I want to. But it's not very common. These things are not Python hooks; these things you cannot configure on your object. Assignment: assignment is an operation that changes the namespace and not the object. Since a variable is just a reference to an object, you have no control over where your references are going. Another thing you cannot control is type checks. When people are comparing your type or doing instance checks, you cannot control what they get as a result. Related is identity comparison, the "is" operator: it checks whether two references are pointing to, or referring to, the same object, and you have no control over that. "And", "or" and "not" are boolean operations, and they just call the one truth-value test that your object can implement; you have no control over what they actually return. As some may know, "and" and "or" in Python return one of the two values, whereas "not" always returns a boolean. And method calls: you cannot define in a single operation what will happen when someone calls a method on you, because a method call is getting an attribute followed by calling that object. They're two separate steps. So in order to implement method calls in your object, you would have to implement getattr to return some go-between object, and then have that go-between object do something when it's called. I'm sorry. So, on to some implementation details in C. This applies to CPython, of course.
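The go-between trick just described, implementing "method calls" as attribute lookup plus call, can be sketched like this (the `Recorder` class is an invented illustration):

```python
class Recorder:
    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Step 1: the attribute lookup returns a go-between callable...
        def go_between(*args):
            # Step 2: ...which does the actual work when it is called.
            self.calls.append((name, args))
            return name
        return go_between

r = Recorder()
r.ping(1, 2)           # really: getattr(r, 'ping'), then (...)(1, 2)
print(r.calls)         # [('ping', (1, 2))]
```

Note that `getattr(r, 'ping')` alone already succeeds and returns the go-between without calling anything, which is exactly why the two steps can't be intercepted as one operation.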
Python objects are implemented as a struct that holds various bookkeeping information, like refcounts and the actual type of the object, as well as arbitrary C data. It can be pointers, it can be ints, it can be an array, whatever you want. Types are what describe what an object is and how it behaves. They are a separate struct, which is also a PyObject struct. The arbitrary data in a PyType struct is the function pointers and various other things that describe the type. The PyObject structs are not relocatable. You cannot move them around once you've given them out to anyone else. It's a blob of memory that's allocated, and that's it. If you want to move it around, you have to destroy the object, which means you have to get rid of all the references to it. That also means that variable-sized objects, like lists, that have a portion that needs to be reallocated and might move around, just use an extra layer of indirection: it's just a pointer stored somewhere in the struct. And of course, because it's C, it doesn't have refcounting by nature, so refcounting is done manually: whenever you receive a reference you incref, and whenever you toss one out you decref. It sounds simple, and it can get quite complicated, but the Python C API is mostly clear enough that it's not too hard once you get used to it. Another feature that is done reasonably manually is weak references. Weak references are references that get informed when their object goes away, so that they can do some cleaning up and not crash. The weak references are basically just callbacks in the PyObject struct's arbitrary data. One of the major problems with refcounting is reference cycles: two objects that refer to each other, causing them both never to be cleaned up. Two objects referring to each other is of course the simple problem; the complex problem is a huge loop of objects referring to objects and everything.
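The refcounting and weak references described here can be observed from Python itself; a rough sketch (exact refcount values are CPython implementation details and may vary, so only the deltas are meaningful):

```python
import sys
import weakref

class Node:
    pass

n = Node()
ref = weakref.ref(n)           # a weak reference: does not keep n alive
print(ref() is n)              # True while n exists

before = sys.getrefcount(n)    # the call itself holds a temporary reference
also = n                       # add one more strong reference
print(sys.getrefcount(n) - before)  # 1

del n, also                    # drop all strong references: refcount hits zero
print(ref())                   # None: the object has been cleaned up
```

The weak reference's callback fired when the last strong reference went away, which is why `ref()` now returns None instead of crashing on a dangling pointer.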
Python has a cyclic garbage collector, which keeps track of objects that might participate in cycles, for instance lists and dicts, and walks them every now and then to find groups of objects that are not reachable through anything but themselves, and then it cleans up the entire cycle. This is all done in C; if you write a C extension or a C type, you don't usually have to bother with it too much. There are some methods you have to implement if you want to participate in cyclic GC, when you think you might be participating in cycles. So in Python, when you have an object, what do you have? Well, you have an object that has two special attributes: the __dict__ attribute, which is a dictionary holding per-object data, and the __class__ attribute, which refers to the class. And like in C, the class defines behavior, and in newer Pythons the class is actually a PyType object, and it all matches. The way to define the behavior is not through function pointers, because Python doesn't have function pointers; it's with specially named methods. Methods that start and end with __ are special to Python. They're not special because they get allocated differently; they're just special because some of them get called by Python automatically in some situations. You can define your own __ methods and use them whenever you want. There's nothing special in that regard; it's just that some of them signal to Python: this needs to be called whenever that happens. In general, you should not be calling another object's __ methods yourself, not directly. You should always use them through whatever mechanism wraps them. So for instance, if you want the string representation of an object, you shouldn't be calling obj.__str__(); you should just be calling str(obj). Another feature of Python is that class attributes, that is, attributes of the class object, are also reachable through the instance.
If you do instance.attr and it's not defined in the instance's __dict__, it will be fetched from the class, and the class might have parent classes, so those get searched as well. And of course, in Python, the whole point is that you don't have to do stuff that Python can do for you, so refcounting, weak references and cyclic GC are all done automatically; you never have to worry about them. Typical Python code does not use type checks, partly because it was never possible, until a couple of years ago, to subclass builtin types. So pretending to be a builtin type meant that other people must not use type checks, or you could never pretend to be a builtin type. It's also very convenient; it just means you can pretend to be whatever type you want: implement the right methods and it'll just work. The C layer sometimes needs specific types. If you want to multiply a string by an integer, it needs to actually have a string and an integer, or there won't be any multiplication in C. So the C layer has conversions. When it wants an integer, there are special methods that say: I need an integer as this argument; it will convert whatever gets passed in to an integer or a string and do the multiplication that way. If you really must use type checks, for instance because you're faking C code or you're wrapping C code or whatever, use isinstance instead of type. Checking type for a specific value means you only accept objects of exactly that type, whereas isinstance does proper instance checks, so that someone who subclasses whatever type you expect works the right way. Functions become methods by magic in Python. It happens when you get an attribute off of an object, or rather, when you get an attribute of a class through an object.
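The isinstance-versus-type advice can be sketched like this (the `MyList` class is an invented illustration):

```python
class MyList(list):
    pass

items = MyList([1, 2, 3])

print(type(items) is list)        # False: an exact type check rejects the subclass
print(isinstance(items, list))    # True: isinstance accepts subclasses

# Subclassing the builtin also means existing operations keep working:
print(items + [4])                # [1, 2, 3, 4]
```

Code that checked `type(x) is list` would wrongly reject `items`, even though it behaves exactly like a list everywhere it's used.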
Whenever you get an attribute of a class that is a function, it gets turned into an unbound method, which is a method that knows it's a method and knows which class it belongs to, but doesn't know which instance. So when you call that method, it knows that the first argument you pass must be an instance of that class, and it'll complain if it's not an instance of the right type. Of course, that type check uses isinstance, so when you have an unbound method, you can pass a subclass of whatever class it expects and it works. Bound methods, on the other hand, are methods that know their instance, and they will be passing that argument automatically, so you call them without the first argument; you start at the second argument and it all looks normal. So, any questions so far? No? All right, on to iterators then. [pause] >> WOUTERS: So, iterators in Python are really simple. They are helper objects for iteration; they encapsulate, if you will, the iteration state, and they're created and used automatically. If you don't want to bother with them, you don't have to, and it all just happens by magic. If you do want to bother with them, you can do a lot of fun stuff with them, even more so if you combine them with generators, which I'll be talking about next. Iterators are two methods: __iter__ and next. Notice that there's no __ around next, because next is actually intended to be called directly sometimes. So there's no __ around it, or people would think that they shouldn't be calling it. Because they're simple, iterators in Python are not rewindable, they're not reversible, they're not copyable, none of that; they're the lowest common denominator in iterators. There are, however, ways to write reversible iterators if you want. You can just write your own iterator class and add methods to rewind or reverse or whatever. You can also just nest iterators.
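A minimal iterator class of the kind described might look like this (my own sketch; note that the talk's Python 2 spells the second method plain `next`, while Python 3, used here, renamed it `__next__`):

```python
class CountDown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self              # an iterator must return itself here

    def __next__(self):          # 'next' in the Python 2 of the talk
        if self.current <= 0:
            raise StopIteration  # signals that iteration is finished
        self.current -= 1
        return self.current + 1

print(list(CountDown(3)))        # [3, 2, 1]
```

Nothing stops you from adding a `rewind()` or `reverse()` method to a class like this; the protocol itself just doesn't require one.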
And iterators themselves are iterable; actually, an iterator is required, in its own __iter__ method, to return itself, or it wouldn't be an actual iterator. So in the example I have here, I explicitly create an iterator over range(10), which is a list of numbers from 0 to 10, not including 10, and then I call zip on it, which takes a number of iterables, takes one element of each iterable, wraps them in a tuple and stores that in a list which it returns. So I created a single iterator, passed the same iterator to zip twice, and as you can see, zip takes one element of each iterator, consuming the iterator as it goes, and ends up with two elements at a time, basically, from the original list of zero through nine. So, generators are a convenient way to write iterators. They are lazy functions that get executed as you iterate over the result, basically. Generators use the yield keyword instead of return. It works very much the same way, except after a yield statement, execution continues the next time you call it, or the next time you iterate. A function with yield in it will return a generator; it will return an iterator that you can loop over. This is terribly convenient for __iter__ methods, because you can just write what you would expect and it'll just work. In Python 2.5, generators were complexified, and they can now also receive data, and you can use them to build coroutines very easily; nothing you cannot also do with 2.4 and earlier iterators, just more conveniently and with extra magic. There are a lot of generator-like iterators in the itertools module, which is, I think, new in Python 2.2 or 2.3. There's a whole bunch of stuff in there you can use to chain and filter and whatever with iterators; it's just really convenient for chaining their various combinations.
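The zip example from the talk, reconstructed (in Python 2, zip returned the list directly; in Python 3, used here, we wrap it in `list()`):

```python
it = iter(range(10))        # one explicit iterator over 0..9
pairs = list(zip(it, it))   # zip pulls from the SAME iterator twice per tuple
print(pairs)                # [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
```

Because both arguments are the same iterator object, each tuple consumes two consecutive elements, pairing up the original sequence.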
So here's a generator example. The top function is map; as you may know, it (well, this version only takes one iterable) takes an iterable and a function, and it creates a list of results from applying the function to every item in the iterable. At the bottom, there's a two-line function that is the generator version, and then there's a one-line function that is the original map implemented in terms of the generator. As you can see, you basically lose half the function if you use a generator, because you generate each item on the fly and you return a generator instead of an actual list. Any questions? >> [INDISTINCT] now you can do it like that? >> WOUTERS: Can I define what a coroutine is and how I would do it in Python? A coroutine is a routine that is basically like a generator: it stops execution, passing data to someone else, but where a generator returns results to its caller, a coroutine returns results to someone else, another function. So you can have two routines where they both consume the output of the other, and the end result is handled data. How would you do them in Python? Well, as I said, you can do them in Python 2.5 with the new sending-data-to-a-generator thing. Before 2.5, you would write a class that was itself an iterator and just write it in bits and pieces, and it wouldn't be as convenient as coroutines in actual languages that have coroutines, because you don't have a single block of code; you have a whole bunch of separate pieces of code that get called at the right time. I wouldn't bother implementing them in Python right now. Maybe with 2.5 out, and people getting used to the new stuff you can do with generators, we'll get actual coroutines in Python. Yes? >> [INDISTINCT] >> WOUTERS: The question was: if generators are lazy, can you write a generator that loops infinitely and just keeps on yielding results as long as you ask for them? Yes, yes there are... >> Then you just ask for a finite number?
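A reconstruction of that slide's generator version of map (the names `imap` and `my_map` are my own; the slide's exact names aren't in the transcript): the two-line generator does the work lazily, and the list-building map is one line on top of it.

```python
def imap(function, iterable):
    for item in iterable:
        yield function(item)     # produce each result on demand

def my_map(function, iterable):
    return list(imap(function, iterable))

print(my_map(str, range(3)))     # ['0', '1', '2']
```

Calling `imap(...)` by itself runs none of the body; values are only computed as something iterates over the returned generator.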
>> WOUTERS: Well, do you need to ask for a finite number? You can: if you use the itertools module, you can slice iterators. You can say, up to so many items of this iterator, and it'll return an iterator that starts consuming the original iterator until the slice ends. But you don't have to do that. An iterator is something you loop over, so if you have a loop that loops over two things at once and stops whenever a condition is met, you can just loop over an infinite object and rely on the fact that your loop will end itself for other reasons than that the iterator stops. There's actually a number of infinite iterators in the itertools module, like itertools.count, which starts counting at zero and just keeps on counting forever and ever, unless you stop for some reason. [pause] >> WOUTERS: So, on to decorators. Decorators are syntax, in Python 2.4, for a feature that's been around forever, which is function wrapping. Decorators are just functions, or callables rather, that take a function as argument and return a replacement callable, or they can return the original function if they want. However, the syntax means that it can get confusing if you have a decorator that also has to accept its own arguments, because now you have a function that should return a callable that should accept a callable and then return the replacement callable; we'll see some examples of that. Another problem is that, because functions are special (when they're attributes of a class, they become methods), when you have a decorator that returns something that's not an actual Python function but something that acts like a Python function in most regards, it won't get turned into a method, unless you implement the same magic that functions have that turns them into methods, which is __get__, which I'll maybe explain later.
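Going back to the itertools slicing mentioned a moment ago, both ways of taming an infinite iterator can be sketched like this:

```python
import itertools

# Explicitly slice the infinite iterator: islice stops consuming count()
# as soon as the slice ends.
first_five = list(itertools.islice(itertools.count(), 5))
print(first_five)                # [0, 1, 2, 3, 4]

# Or rely on the loop itself to stop for its own reasons:
squares = []
for n in itertools.count():      # counts 0, 1, 2, ... forever
    if n * n > 50:
        break                    # the loop ends; the iterator never does
    squares.append(n * n)
print(squares)                   # [0, 1, 4, 9, 16, 25, 36, 49]
```

Either way, laziness is what makes looping over an infinite source safe.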
So if you write decorators, make sure they return actual functions, or they won't get turned into methods. Anyway, as I said, it's a simple concept, but the devil is in the details, and we'll see about that. Here's a simple decorator. It's a spam decorator that says: whenever the function that you decorate, the function at the bottom, is called, it actually calls the wrapper function at the top, which loops ten times calling the original function. So in this piece of code, the original function gets called ten times, and there's no return value of the actual function call, which is [INDISTINCT] the original either. Here's an example of a decorator with an argument. The decorator is spam(10); spam is no longer the decorator, it's rather the decorator generator, and it takes a number, which is the number of times to repeat. Then inside spam, we define the decorator function, which accepts the function as an argument, and that has a wrapper function which calls the original function; then we return wrapper, and of course return the decorator from the decorator generator. So that looks okay; I mean, it takes some getting used to, the nesting and everything. But there's another problem: what about introspection? Maybe Python users don't care about the name of their function, but some tools do, like pydoc for instance, which is the Python documentation tool. It actually looks at function objects and looks at their name and their signature and their docstring and whatever. And because we replaced the original function, when you ask for the documentation of sing, it'll actually say it's a function called wrapper, and it has a signature of *args and **kwargs, and it has no docstring. That's probably not what you want.
Another thing is that if you chain decorators, and another decorator in the chain changes attributes of the function, those changes will go away, because you're not doing anything with the attributes of the function. So some people write code like this, which is the original spam with the repeats argument, with the decorator function in there, with the wrapper function in there. And then after defining wrapper, we assign the __name__, the docstring, the __module__ and the __dict__ of the original function back to wrapper, so that all those things will actually be the same for the new function and the old one. And assigning __dict__ like that actually works; it's not a copy, it's a reference assignment, so all of the original function's attributes will be accessible on the wrapper function, and when you assign to either one of them it'll appear in the other one as well; they just share their attribute space. Now, this is not the easiest way to write a decorator, so in Python 2.5, in the functools module, there is a decorator for decorators that does this for you. So you have a decorator that you apply to your decorator--or to your decorator generator--and then that decorator-generated decorator gets applied to whatever function you have. So as you can see, the devil is in the details; it can get confusing somewhat quickly. Any questions? All right, how are we for time? >> You got [INDISTINCT]. >> WOUTERS: All right. New-style classes. When I say new-style classes--when anyone says new-style classes--they actually mean old new-style classes, because they were added in Python 2.2, which was released I think six or seven years ago, something like that; they're old. And they're a unification of the C types that I explained and the Python classes, because before--or actually still, in classic classes--instances and classes are distinct C types.
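The functools helper being referred to is functools.wraps; a minimal sketch of what it does for introspection:

```python
import functools

def spam(repeats):
    def decorator(func):
        @functools.wraps(func)  # copies __name__, __doc__, __module__; updates __dict__
        def wrapper(*args, **kwargs):
            for _ in range(repeats):
                func(*args, **kwargs)
        return wrapper
    return decorator

@spam(2)
def sing():
    """Sing a song."""

print(sing.__name__)  # 'sing', not 'wrapper'
print(sing.__doc__)   # 'Sing a song.'
```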
There's a C type called classobj that implements all the magic I talked about, turning functions into methods, and there's the C type instance, which makes sure that instances work as expected, with the __dict__ and everything. So they're separate types, and if you ask for the type of an int it will say it's an int, but if you ask about the type of any class that tries to be an int, it'll say "it's an instance". So you have no way of checking all that. And another problem with the original approach was that you could not subclass builtin types. So Guido worked, in Python 2.2, on unifying C types and Python classes, and the end result is pretty good. You can mix and match them and everything; it works well. But it only works because a lot of new general mechanisms were added. They were necessary to bridge the divide between C objects and Python types, where things that you assign from Python have to be inserted as a C data type in a C struct rather than as a Python object pointer. Classic classes are still the default, so if you write a new class and you don't specifically make it a new-style class, it'll still be a classic class; that was for compatibility reasons, because there's a couple of things that are slightly different between classic classes and new-style classes, mostly with multiple inheritance. And you can check if any class--or actually any instance of a class--is an instance of a new-style class, because it inherits from object instead of nothing. So you can actually do isinstance(my_thing, object) and you'll know whether it's an instance of a new-style class. The first of the mechanisms that I'm going to explain is descriptors. [pause] >> WOUTERS: Descriptors are a generalization of the magic that happens with functions to turn them into methods.
A descriptor is any object that lives in the class namespace--the class attribute space, so it's an attribute of the class--and that has __get__, __set__ or __delete__ methods. It doesn't have to have all of them; you can have just __get__ or __set__ or __delete__ for the specific operations. Whenever an attribute is retrieved--or an attempt is made to retrieve it--from an object whose class has an attribute with a descriptor in it, those methods on the descriptor will be called and the result of those calls will be passed back to the object. Same for setting: it'll call the __set__ method, and no result is passed back; and for deleting, it calls the __delete__ method. The delete method is not called __del__ because that was already taken for some other hook, apparently. It's the mechanism behind methods: if you want a function-like object that behaves the same way functions do, becoming a method, you can do that by defining __get__. And it's also the mechanism behind property, which is a trick for hiding accessors behind normal attribute access. So here's an example of properties--I'm not going to give an example of actual descriptors because it's too boring and you won't be using it anyway--but here's a property. We have a class, we define the function get_prop; it takes a self argument even though it's not going to be a method, and it returns whatever the value of the property is. And then we wrap it in property and store it in a name that'll eventually be a class attribute. Oh, I see I have an error right there--I should have had foo instead of x there. So we instantiate the class, and then we do foo.prop. foo.prop calls get_prop, and because it's a property--even though get_prop is just a function inside the class body, not a method--the property type knows that it needs to pass the instance it's called on, for convenience, to the function that wraps it.
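The property example as described, with the slide's typo (x where foo was meant) fixed; the stored value of 42 is made up for illustration:

```python
class Foo(object):
    def __init__(self):
        self._prop = 42          # hidden storage; name assumed

    def get_prop(self):
        # Takes self, even though it's never called with method syntax.
        return self._prop

    prop = property(get_prop)    # a descriptor stored as a class attribute

foo = Foo()
print(foo.prop)  # 42 -- plain attribute access invokes get_prop(foo)
```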
If you look at this you can say, "Oh, this can be a decorator, too," and it's true; you can just say @property at the top of def get_prop--except that property takes multiple arguments: you can also pass a set_prop and a del_prop if you want; I didn't do that in this example for brevity. But if you just have a getter, you can say @property at the top instead of prop = property(get_prop) at the bottom. Any questions about this? All right. So, the other general mechanisms--kind of related, they're also descriptors--are classmethods and staticmethods. Before Python 2.2, Python only had "instance methods", that is, normal methods: methods that take self as the first argument, that get called on an instance, and if you try to call them on a class without having an instance, you get an error. So in Python 2.2 we got classmethods and staticmethods. Classmethods take the actual class object as the first argument, and that's terribly useful; I'll show why in a moment. Staticmethods take no magic argument, and they're not very useful, even though Java and C++ programmers coming to Python often say, "Oh, I need a staticmethod for this." Generally not; they're only useful for one particular thing, and I'll show that in a minute. So here's a classmethod. Again, if you're using Python 2.4, you can use @classmethod at the top for the decorator syntax; if not, you'll have to use it at the bottom. So say we have a FancyDict, a subclass of dict, and we define a method to create a dict from a set of keys with a single value, so we don't have to generate a list of key-value pairs; we can just say "generate it from keys".
So what I have here is: we create a list of key-value pairs and pass that to the class, and because it's a classmethod and gets the actual class passed, we can call it on any subclass of FancyDict without anything in particular happening in the subclasses, and it'll create an instance of the subclass instead of a FancyDict itself. So whenever you think, "Oh, I should have a staticmethod and I'll do something with my own class," you should actually use a classmethod and do something with the first argument. Now, this is a rather silly example, because dict already has this exact thing implemented. There's already a fromkeys method that is a classmethod on the dict type, and it's very useful whenever you subclass dict, which is not too often. Anyway, at the bottom it's shown what happens when you use it. So, staticmethods: they're not very useful; the main use is protecting dependencies from becoming methods. When you use dependency injection, as I do here in the example, you don't know what you're actually injecting into your class. If it happens to be a function, or something that does something magical when used as an attribute of a class, this won't do what you want it to do; it won't do the same thing as calling sendmail.sendmail when I'm now calling self.sendmail. So you can wrap it in a staticmethod to prevent it from becoming an actual method. That's the only thing I've ever seen that makes sense for using staticmethod. Although, as we'll see later, Python actually has a staticmethod itself which is a good example of something that should have been a classmethod. Another new feature: __slots__, which is for omitting the __dict__ attribute used for arbitrary attributes; it basically prevents you from assigning arbitrary attributes to an object.
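Both patterns just described, sketched together (FancyDict is from the talk; the from_keys name and the sendmail stand-in are assumptions):

```python
class FancyDict(dict):
    @classmethod
    def from_keys(cls, keys, value=None):
        # cls is whatever class this was called on, so subclasses
        # automatically get instances of themselves.
        return cls((key, value) for key in keys)

class FancierDict(FancyDict):
    pass

d = FancierDict.from_keys(["a", "b"], 0)
print(type(d).__name__, sorted(d.items()))  # FancierDict [('a', 0), ('b', 0)]

def sendmail(to, msg):
    """Stand-in for an injected dependency like sendmail.sendmail."""
    return ("sent", to, msg)

class Mailer(object):
    # Without staticmethod, the injected function would become a bound
    # method and self would be passed as the 'to' argument.
    send = staticmethod(sendmail)

print(Mailer().send("alice", "hi"))  # ('sent', 'alice', 'hi')
```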
It reduces memory use, because the dict isn't allocated and it's a more compact form to store the same number of attributes, but it's not going to be much faster than a dict, even for a few attributes. The main reason to have it is when you want to emulate--or actually implement--immutable Python classes, like the immutable builtin types, where you don't want attributes assigned arbitrarily. There's a tiny, tiny class for showing slots right there. If you actually try to assign to something other than self.value, either in __init__ or anywhere else, it'll throw an exception--except for stuff that's already in the class; I mean, the __init__ def statement won't be throwing an exception, of course, because Python knows it's already there. Another new thing in Python 2.2 is an actual constructor. Before Python 2.2 there was just __init__, which is an initializer, not a constructor; it gets called after the object has been created, and it's your job to initialize it and set the right attributes and whatever, but the object is already there. __new__ is called to actually construct an object: allocate memory for it, make sure it's all right. In Python it's used mostly for implementing immutable types, because if you use __init__ to set attributes, it's too late--the object has already been created, and it can't be immutable if you can assign to it in __init__. So you need to do it in __new__. And this is the example of a staticmethod that shouldn't actually be a staticmethod. It cannot be an instance method, because its job is to create the instance, so there's no instance for it to be a method of. So Guido made it a staticmethod, before he added classmethods in the development cycle of Python 2.2, I think. It could have been a classmethod, but, well, it's too late now.
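Backing up to __slots__ for a moment, a tiny class along the lines of the one on the slide (attribute names assumed):

```python
class Point(object):
    __slots__ = ("x", "y")   # no per-instance __dict__ is allocated

    def __init__(self, x, y):
        self.x = x           # fine: 'x' is declared in __slots__
        self.y = y

p = Point(1, 2)
try:
    p.z = 3                  # not in __slots__, so this is refused
except AttributeError as exc:
    print("refused:", exc)
print(hasattr(p, "__dict__"))  # False
```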
It's a staticmethod that takes the actual class as the first argument, and you need to pass it around explicitly whenever you call the superclass's __new__. When you want to implement __new__, you generally always call object.__new__ or your superclass's __new__ to do the actual allocation, because there is no other way to allocate memory in Python. However, your __new__ method--or staticmethod--can also return existing objects; you can return any object you want from __new__, whereas __init__ either has to return self or return None--you can't actually change what it returns. From __new__ you can return whatever you want, and that's the result of instantiating your class. There's one caveat there: when you return an instance of your own class, whether it's an actual instance of your class or of a subclass of your class, __init__ is still called, even on an already-created object. That's because Python can't know whether your __new__ is returning an old object or a new object, so it always calls __init__. Of course, if you return something that's not your type--not a subclass of your type--it knows it's already been initialized, so it doesn't call __init__. So here's an example of a __new__: WrappingInt, which is an immutable type in Python. We set __slots__ to the empty list so it doesn't get arbitrary attributes. And then in __new__ we take the value, which is whatever our constructor was called with, and we take it modulo 256 so it doesn't go past 255. And then we actually create self by calling the parent class's __new__. As you can see here, it's a staticmethod, because even though we're calling it on a class, we're passing the class we were passed as well, explicitly. Any questions so far? Yes. >> How do you make an object of [INDISTINCT]. How do you define a class [INDISTINCT] make sure it is immutable? >> WOUTERS: How do you create a class and make sure it's immutable?
By not providing any of the things that mutate the object. So, for instance, this is an easy example because an int is its own value; you're not storing anything besides the value. We don't accept arbitrary attributes, and we let our parent create the object, and it's done. If you want an immutable type without subclassing an existing immutable type, you have to do a lot more work, because you need somewhere to store the data and then provide properties to read the data but not write the data. That's basically how you create one: you do the same thing as here, and you have some magic that sets a hidden variable, basically, that properties can get at but nobody else can get write access to. It's not easy, and it's usually not worth it. Mostly, Python classes are just implemented in terms of existing types, and if you want an immutable type you probably want something that is int-like, string-like, or tuple-like, and you can just subclass int or string or tuple and be done with it. Any other questions? All right, is there any interest in metaclasses? So I mentioned them--all right. So, metaclasses are this theoretically satisfying and mind-boggling concept where you can draw these graphs between what the class is and its metaclass, and what the class of the metaclass is, and where object is an instance of what. The general idea is that the metaclass is the class of a class. It's the type of classes; it's whatever implements a class. And of course the metaclass is an instance of something, so the metaclass has a metaclass. In Python, the base metaclass is called type. And type's type is type. And type is actually an object; the parent class of type is object. All objects have a base class that's object, so you can see how it gets confusing.
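Before moving on to metaclasses: the WrappingInt example being described, plus a sketch of the hidden-variable-and-property approach just mentioned (the Immutable class is my own illustration, not from the talk):

```python
class WrappingInt(int):
    __slots__ = ()                    # no arbitrary attributes

    def __new__(cls, value):
        # __init__ would be too late: an int's value is fixed at creation.
        return int.__new__(cls, value % 256)

print(WrappingInt(300))       # 44
print(WrappingInt(300) + 1)   # 45 -- still behaves like an int

class Immutable(object):
    __slots__ = ("_value",)

    def __new__(cls, value):
        self = object.__new__(cls)
        # Bypass our own __setattr__ guard to do the one allowed write.
        object.__setattr__(self, "_value", value)
        return self

    def __setattr__(self, name, value):
        raise AttributeError("instances are immutable")

    @property
    def value(self):                  # read access only
        return self._value

i = Immutable(5)
print(i.value)                # 5
```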
Of course, the type of object is type, so you can draw very funny graphs that way. But, you know, it doesn't matter, because in Python everything is simple, and you can just say: the metaclass is the class that creates the class. Of course that doesn't apply to type or object, because they are created secretly in Python by magic, but every other class you define has a metaclass whose job it is to create the class from the namespace that is the code block of the class. So if we go back all the way up to the class here: this is all done before the metaclass is even looked at. And when this piece of code--the blue parts and the green parts--is all compiled and executed, nicely wrapped up in a namespace, then that dict is passed to the metaclass along with the name and the parents, whatever you want to subclass. And it says, you know, "Create me a class and return it." And the result of that instantiation is your new class. So it's practically simple. And whatever you normally define in a class to define how an instance of the class behaves--you can do the very same thing with a metaclass, and it'll define how the class behaves. So the magic that creates methods from functions, which is hidden in the class, and the stuff that calls descriptors, which is hidden in the class, is actually called by __getattr__ or __getattribute__, which I probably should have mentioned. __new__ and __init__ are called to construct the class and to initialize the class; that all happens the same way you would expect.
So you can override them and do as many things as you want. The thing they're most useful for is post-processing a class: doing some wonderful renaming or mangling or other stuff with a class after it's been defined, before it's visible to any other Python code, without requiring an explicit post-processing step. As I said, you can do evil stuff in __getattr__ or __getattribute__ if you want; it's probably not worth it, it'll just make your code much more complex. So here's a metaclass example in Python. We subclass type, because it's convenient and I think it's necessary as well. We define __new__, which is a staticmethod as usual--I forgot to mention, you don't actually have to explicitly make __new__ a staticmethod, but you can if you want to. The metaclass's __new__ gets passed the class--that is, actually ourselves, because it's a staticmethod of ourselves, or a classmethod of ourselves--the name of the class that needs to be created, the bases, which is the tuple of the base classes named in the class statement, and the dict that is the namespace of the code object; it's just a dict with attributes. So what we do here is we go over all the attributes; we skip any that start with __, because you don't want to accidentally do the wrong thing with the magic methods. And then we name-mangle whatever attributes are left over, by appending or prepending foo to them to make them look funny; we delete the original attribute, and at the end we call the base class's __new__ with the same arguments but with the modified attributes. And then to use the metaclass, you have to explicitly tell Python to use a different metaclass than the default, which is type. You do __metaclass__ = ManglingType in the class dict or, if you want, at the top of the module. Now, metaclasses get inherited, so if you subclass the mangled class you automatically get the mangling type as metaclass.
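The mangling metaclass as described, written in modern Python 3 spelling (class Foo(metaclass=ManglingType) instead of Python 2's __metaclass__ attribute); the foo_ prefix is assumed as the mangling scheme:

```python
class ManglingType(type):
    def __new__(mcls, name, bases, namespace):
        mangled = {}
        for key, value in namespace.items():
            if key.startswith("__"):
                mangled[key] = value            # leave magic names alone
            else:
                mangled["foo_" + key] = value   # rename everything else
        return super().__new__(mcls, name, bases, mangled)

class MangledClass(metaclass=ManglingType):
    def greet(self):
        return "hello"

print(MangledClass().foo_greet())        # 'hello'
print(hasattr(MangledClass, "greet"))    # False -- the original name is gone
```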
If you want to subclass and have a different metaclass, your metaclass has to be a subclass of your superclass's metaclass. So if I wanted to have a more-mangled class with a more-mangling metaclass, I'd have to subclass ManglingType to get a MoreManglingType and have that as my metaclass. So, any questions there? >> Is MangledType [INDISTINCT]? >> WOUTERS: Yes, sorry, that's a typo. It should say ManglingType at the bottom and not MangledType. Yes. >> I think I remember that django uses a metaclass at the bottom [INDISTINCT], is that true? >> WOUTERS: Yes. >> Do you know how it does it? >> WOUTERS: Yes. I don't have an example right now. I have some django code, and it's very interesting how it works in django. Sorry--the question was: django uses metaclasses, and how is that done? Django has an API where you define a class with various attributes to describe your data model. And then you can have some extra classes inside the class to describe admin options and display options and whatever. What it does is, just like this, it goes over all the attributes that were the result of executing the code inside the class statement and examines them. The order doesn't matter to django for the most part, but where it does, it abuses the fact that when executing a class body, it executes top to bottom. A field is a particular type, and each field is an instantiation of a class--you say "field one is an IntegerField" and "field two is a CharField"--and django keeps track of the order in which those fields were created by keeping a global counter, for instance, so it knows which order the fields are in in the class statement. That's about the only magic thing that django does.
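A sketch of the creation-counter trick just described (greatly simplified; real django fields and its metaclass are much more involved):

```python
import itertools

class Field(object):
    _counter = itertools.count()          # shared, global creation counter

    def __init__(self):
        self.creation_order = next(self._counter)

class ModelMeta(type):
    def __new__(mcls, name, bases, namespace):
        # Recover declaration order from the counters, since a plain class
        # namespace dict historically gave no ordering guarantees.
        fields = sorted(
            (value.creation_order, key)
            for key, value in namespace.items()
            if isinstance(value, Field)
        )
        namespace["field_order"] = [key for _, key in fields]
        return super().__new__(mcls, name, bases, namespace)

class Entry(metaclass=ModelMeta):
    title = Field()
    body = Field()

print(Entry.field_order)  # ['title', 'body']
```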
And then, for the rest, it's just examining whatever's in the dict that gets passed to the metaclass, to write down the SQL that the database backend needs to store, and whatever options there are, etcetera. Does that answer your question? >> Yes. >> WOUTERS: All right. Any other questions? All right. So, I'm going to cover multiple inheritance, unless everyone here goes, "No, no, don't ever use multiple inheritance," all right? So, multiple inheritance in Python, and in other languages, is something that's frequently argued about: whether it's good or sane or insane. Well, it's generally not a good idea, but occasionally, especially in Python, it can make stuff a lot easier. New-style objects have a different way of doing multiple inheritance in that they use the C3 method resolution order, which is an algorithm described, I think, in a Dylan paper, describing how to handle multiple inheritance correctly, in particular involving diamonds, where multiple superclasses inherit from the same super-superclass. The algorithm is pretty simple: it does a depth-first, left-to-right search through all the classes, but then it removes all duplicates except the last one. So if a class appears two times, it'll be visited not the first time it appears, but the last time it appears. And in Python we also have a convenience object called super, which can help you continue method resolution. All your parent classes are visited after you are visited--but they might not be the next class visited; your superclass might not be the next class in the resolution order, and your subclasses might not have been visited right before you. That's rather important to realize. So calling your base class's method directly, saying my parent dot foo or whatever, is never right.
Because there's always going to be some situation where that will do the wrong thing and skip classes when visiting methods. So, the way to do it in Python is: you have a single base class with the full interface of whatever your tree is going to implement, so that any object can always call whatever method it wants to call within that tree, on any other class within that tree. In the usual cases, those functions will probably be do-nothing functions. They shouldn't raise errors, because then you can't safely call them all the time; if anything, they should do nothing. The signature of those methods should never change, because you cannot know which order the classes will be called in. If you have to change the signature of a method in a particular part of the multiple inheritance tree, you should have a second root class, basically, and make sure it's a separate section of the multiple inheritance tree. And then you should use super everywhere you want anything from a base class, anywhere. And that can be annoying, because all of your code has to follow it, in all of the classes, and you're usually not the only one developing all those classes, so you have to convince everyone to use super everywhere. And as I show here, using super is not actually convenient right now. You call super, passing the current class--the class you're coding in--and the instance, or the class if you have classmethods, that you were called on. That returns a proxy object, and then you can use that as if it were self, to call the original method. It's not too convenient, and I hope we'll have syntax for doing super calls in Python 3.0, maybe sooner, but I'm not holding my breath. Any questions about these? All right. I'm going to cover Unicode then, if we have time.
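The cooperative-super pattern described above, using the talk-era explicit super(Class, self) spelling; each method contributes its name so the resolution order is visible:

```python
class Base(object):
    def hello(self):
        return ["Base"]          # the root defines the full interface

class A(Base):
    def hello(self):
        # super() continues along the MRO -- which is *not* necessarily
        # A's own parent; in the diamond below it reaches B first.
        return ["A"] + super(A, self).hello()

class B(Base):
    def hello(self):
        return ["B"] + super(B, self).hello()

class C(A, B):                   # diamond: both A and B inherit from Base
    pass

print([cls.__name__ for cls in C.__mro__])  # ['C', 'A', 'B', 'Base', 'object']
print(C().hello())                          # ['A', 'B', 'Base']
```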
>> [INDISTINCT] >> WOUTERS: This [INDISTINCT] >> [INDISTINCT] >> WOUTERS: So, Unicode. It's a somewhat longer topic, though. >> [INDISTINCT] >> WOUTERS: No, just twice as long as the previous topic or whatever. So, Unicode is a standard for describing characters. The way byte strings describe bytes--you say "A is represented by 65"--in Unicode you say "A is code point 65", and there's no relation between Unicode as such and bytes on disk. In Python that means the old str type holds bytes; it's a byte string, we call it a byte string nowadays. And Unicode objects, or Unicode strings, hold characters, which for ASCII is actually the same thing, but for non-ASCII it's entirely different. Unicode has no on-disk representation as such, so if you want to store Unicode on disk, or even in memory, you need to use an encoding. Python internally uses either UTF-16 or UCS-4 encodings--or UCS-2 and UCS-4, depending on how exactly you define what Python does--but it uses either 2 or 4 bytes to represent each character, whereas byte strings always use 1 byte for every character, or byte. When you have a byte string and you want a Unicode string, a Unicode object, you have to decode the byte string into Unicode. And if you have a Unicode string and you want to store it somewhere--you want to pass it over the network, or you write it to disk--you have to encode Unicode into bytes. A convenient encoding is UTF-8, and some people confuse UTF-8 with Unicode; for instance Postgres, the database, has an encoding called UNICODE which is actually UTF-8 and not Unicode. It's one of those mistakes many people make, but UTF-8 is not Unicode. UTF-8 is a Unicode encoding, and it's a Unicode encoding that looks like ASCII to all Americans, or people who don't care about accents or funny characters.
But it can actually encode all of Unicode, and it does so by basically screwing Chinese and Japanese people, by having all of their characters be like 7 or 8 different bytes. So, in Python, Unicode is pretty convenient, except where it mixes with byte strings. You can have Unicode literals, which look just like string literals except you have a couple more escapes besides the normal \x escapes and \0. You have \u, which is a short Unicode escape, and \U, which is a long Unicode escape. The short \u takes, as you can see, 2 bytes, and the long \U takes 4 bytes. The long \U isn't very necessary until you start working with the higher planes that were added to Unicode last. Also, instead of chr, which creates a single character, you have unichr, which creates any Unicode character from a number. And we support, in the compiler at compile time, Unicode names: you can use backslash N and then, in curly braces, the name of any Unicode code point. The Unicode standard defines all these names; we have them all in the interpreter at compile time, and that actually results in a single-character Unicode string with a euro sign in there. It's a single-character Unicode string, but when you encode it in an encoding that supports the euro sign, it'll look like an entirely different character, or multiple bytes, or whatever. If you want to do these name lookups in reverse--if you have a character and you want to look up the Unicode name--the unicodedata module does that for you. If you have the name and you want the actual character, unicodedata does that, too. So, Unicode objects: they behave exactly like strings, and it's very convenient. You can slice them, and you're actually slicing characters instead of encoded data; the length is right, you can iterate over every character; everything is great, except when they mix with non-ASCII byte strings.
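The escapes and the unicodedata lookups, shown in Python 3 (where every str is a Unicode string, so no u'' prefix is needed):

```python
import unicodedata

euro = "\N{EURO SIGN}"                 # name lookup resolved at compile time
print(euro == "\u20ac")                # True -- same code point, U+20AC
print(unicodedata.name(euro))          # 'EURO SIGN' -- the reverse lookup
print(unicodedata.lookup("EURO SIGN") == euro)  # True
print(len(euro), len(euro.encode("utf-8")))     # 1 character, 3 bytes in UTF-8
```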
When they mix with ASCII byte strings, Python will automatically upgrade the byte string into a Unicode object using the ASCII encoding. So that works for ASCII byte strings, but if there's a funny character in the byte string that's not ASCII, it'll blow up, because it tries to interpret it as ASCII, sees that it's actually not ASCII, and doesn't know what you want to do with it. So don't do that. Another problem with the Python approach is that the decode and encode methods of strings and Unicode objects are generalized. They don't just encode to byte strings or decode into Unicode objects; you can actually convert strings into strings and byte strings into byte strings, or integers into whatever you want. It's inconvenient--I'm not entirely sure whether that should be fixed or not, but it's inconvenient when you only care about Unicode. On the other hand, they do have convenient features. So, using Unicode in Python is very simple: never mix Unicode objects and byte strings. It's a very simple rule; if you do that, everything will be great--except, of course, it's not always easy not to mix byte strings and Unicode. If you write libraries, you might get passed a byte string when you don't expect it, or you might get passed a Unicode object when you don't expect it. If you write an application, you have to make sure that anywhere you get a string, you either get it as a Unicode object or you get it as a byte string and you translate it yourself. The best way to do it is: decode byte strings into Unicode when you get them, and encode Unicode into whatever output encoding you have when you're outputting. And of course you have to remember to use the right encoding, so you have to know what the encoding is for your input and output--and there's no way to detect an encoding.
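The decode-on-input, encode-on-output rule, in Python 3 terms (bytes vs str instead of str vs unicode):

```python
# Pretend these bytes arrived from a file or socket, encoded as UTF-8.
raw = "caf\N{LATIN SMALL LETTER E WITH ACUTE}".encode("utf-8")
print(len(raw))              # 5 bytes: the accented e takes two in UTF-8

text = raw.decode("utf-8")   # decode as soon as the bytes come in
print(len(text))             # 4 characters

out = text.encode("latin-1") # encode only at the output boundary
print(len(out))              # 4 bytes in latin-1
```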
Because it's just bytes; there's no marker in there that says "this is UTF-8" or "this is latin-1", or UTF-16 for that matter. And it may all look vaguely familiar when you're looking at the bytes, but that doesn't mean it's correct to decode with that encoding. Fortunately, if you can figure out which encoding to use, Python does have some convenience functions and modules--the codecs module in particular. The codecs module has an open function that behaves like the builtin open function, except it takes an encoding, and it'll automatically decode data as you read from it. So when you read from files opened with codecs.open, you're actually reading Unicode, and when you write Unicode to them, it'll automatically be encoded into whatever encoding you passed in. If you want to wrap existing streams, like sockets or files you've partially read, you can use codecs.getreader or codecs.getwriter to transform the stream from byte strings to Unicode, or the other way around. And lastly, when you do write stuff like this and you're debugging your code and there's some problem with mixing Unicode and byte strings, pay attention to the exceptions, because there are two exceptions you can get: there's UnicodeDecodeError, which you get when decoding a byte string into Unicode goes wrong, and there's UnicodeEncodeError, which is the other way around. And if you use str.encode--so you're trying to encode a byte string with a Unicode encoding--what it'll actually do is silently try to decode the string first into Unicode, and then encode it with the encoding you gave it. So that actually tries to apply the default encoding, which is ASCII, to the str, and even though you called str.encode, you'll be getting a decode error if it turns out that the str is not an ASCII string. So, that was it. That was all my topics; I'm glad we went over all of them.
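The codecs.open behavior described above, sketched with a temp file so the example is self-contained:

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Behaves like open(), but encodes on write and decodes on read.
with codecs.open(path, "w", encoding="utf-8") as f:
    f.write("\N{EURO SIGN}100")

with codecs.open(path, "r", encoding="utf-8") as f:
    data = f.read()

print(data == "\u20ac100")   # True -- we read characters back, not bytes
print(len(data))             # 4
```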
Here's some more information if you want it: descriptors, metaclasses and super are all described in Guido's original descrintro tutorial for Python 2.2, which is still very relevant. Iterators and generators are well described in Andrew Kuchling's tutorial on functional programming; you don't have to follow the whole thing if you don't like functional programming, but the parts about iterators and generators are very good. And if you want to learn about writing C types, the standard documentation is a great source, as well as Python's source itself; the actual Python types all use the same API, and they're very readable source, even if I do say so myself, and highly recommended. And as always, the Python standard library and the Python C code are all great sources. That was it. Any more questions? >> [INDISTINCT] somewhere, that we can get it? >> WOUTERS: I can put it up somewhere. >> How about the previous presentation about the upcoming [INDISTINCT]? >> WOUTERS: Sorry? >> The previous presentation, I guess it was [INDISTINCT] about the future of Python? Is there any record of that somewhere that we can look at? >> WOUTERS: There's like four or five different movies, sorry. >> It's on my laptop, you can upload it. >> Okay, great. >> WOUTERS: Any other questions? >> What's a good resource for module import resolution? You know, like, when you're moving Python [INDISTINCT] from one place to another. [INDISTINCT] and is there, like, a standard way of how you do all that? >> WOUTERS: So, you mean from one tree to another? >> From, you know, one tree to another, or what [INDISTINCT] code base [INDISTINCT]. You start mixing it [INDISTINCT] and things like that, or in like [INDISTINCT] libraries. It's got to be like when you do [INDISTINCT]... >> WOUTERS: Usually a byte store...
Info
Channel: Google TechTalks
Views: 303,844
Rating: 4.8409872 out of 5
Keywords: python, google, howto
Id: E_kZDvwofHY
Length: 75min 44sec (4544 seconds)
Published: Mon Oct 08 2007