>> Hello. Yes, hi. Welcome to the latest in our
series of talks on advanced topics in programming languages. Today we're very lucky to have
Thomas Wouters, who's here to talk to us about features that are currently available in Python
but are advanced features. Last week we were very lucky to have Guido give that awesome
talk about upcoming features; this one will be about existing features, right? Yes?
I wanted to--as always, this is the latest in our series of talks. I always want to make
a pitch so that people will actually give talks of their own. It's a really useful thing
to do. If you have specialized knowledge or if you have interesting knowledge, to share
that knowledge with as many Googlers as possible and potentially even the rest of the world.
So, please come and see me and we can set up a talk and you can give--yet another, in
our series of talks here. And with that, I will turn it over to Thomas who shall give
a very interesting topic. Thank you. >> WOUTERS: Is this on? Right. Although there's
an echo. So I'm going to give a talk about Advanced Python which is not an easy subject
in that there's a lot of Python that can be considered advanced. I'm not sure what your
level of advancement is so I'm going to cover basically everything from the start until
the end and we'll see how far we get. These are the subjects I'll be covering. I'll
certainly be explaining the Python design, objects, iterators and generators and hopefully
decorators. Everything after that is a bonus. I'm going to keep the pace pretty high, so
if there's any questions, just wave your arms and I'll stop and explain. But if everyone
can keep up then we can probably cover the more interesting topics at the end as well.
And if there's specific interest in any of the topics especially, for instance, Unicode,
I can skip over the advanced stuff and jump right to Unicode. So, first about Python,
Python was developed by Guido as a middle ground between shell scripting and
system programming. So it was intended to be easy to use by normal programmers but still
allow more complex structures than a shell scripting language. It turns out that
it's also convenient for library programmers. People doing the actual programming for end
users because you can hide a lot of smarts inside of objects or modules and just expose
a very simple API to end users. This has gotten progressively better with later releases of
Python. There are multiple implementations of Python; CPython is the one everyone uses
except for those that are developing the others. Jython has been around for quite a while
and is actually in use by a lot of people, but not as much as CPython. IronPython and
PyPy are still actively being developed although, IronPython is apparently very usable. It is
designed by implementation in that if you don't come up with an implementation of a
feature that you want implemented, it's not going to happen. But it's not a slave to the
implementation and it doesn't mean that if you implement an idea, it will get in. It's
still a very designed language. We have feature releases every one to two years. It used to
be every year or so and now, the last one was in the works for two years and there's
a lot of bugfix releases as well. An important
note about the feature releases: if you do not get any warnings using a particular minor
release, upgrading to the minor release following that one will almost certainly work. Anything that
changes semantics or possibly breaks code will warn for at least one release.
And bugfix releases have a similar requirement that you can always upgrade to a newer bugfix
release of the same minor version and nothing will break that wasn't already broken. Important
in realizing how Python works is that everything is runtime, even compiletime is actually runtime.
As you may know, there are cached files, .pyc files that are cached bytecode, that's basically
borrowed runtime, someone else's runtime that you can skip. Also important is that all execution
happens in a namespace and that namespace is actually a dict, it's a dictionary object
with keys and values. Modules, functions and classes all have their own little namespace
and they don't affect other namespaces unless they explicitly do so. And modules are executed
top to bottom. Script start running at line one and they run all the way down to the end
or until loops or whatever make them wait. So in that regard, it's very much like a script.
When you import a module, it does the same thing, it just starts executing line one and
runs on down until the end and the import is done. And another important aspect is that
the def and class statements don't define a function or class at compilation time; it's a
runtime thing. It actually happens when that statement is reached. So some more about functions
and classes as well because they're important features. At compiletime which is runtime,
the code for the function will be compiled into a code object. It's a separate object
from whatever the rest of the code is compiled into. And then at def time when the def statement
is executed, it will be--the code object will be wrapped into a function object along with
whatever arguments it takes, any defaults to those arguments. And essentially def is
an assignment operation. If you look at the bytecode, you'll see that it just uses the
same bytecodes as normal assignment operations. So here's an example of a function. The red
part is the actual def statement that gets evaluated when it's reached including the
part in blue which is an argument default. That part gets evaluated at the moment of
def. So the Foo object is constructed or the Foo function is called, if it's a function,
at the moment the def is being executed and the green part has been compiled at that time
because compiletime has been in the past but it's not being executed until later which
is when you call the function. So this is when you call the function, there's an inner
function in there, so what happens is that the green lines you see here, they've been
compiled into their own code object and when you call func, the inner func gets created
at that time, and again the default for the arg3 argument is
evaluated at the moment the def is executed. If you look after the definition of inner func, you'll
see that arg1 and arg2 are reassigned. That means that whatever arg2 was, will be assigned
to inner func's arg3, and not whatever it is at the moment you call inner func. However,
inside inner func, we also use arg1, which is using the nested scope which I'll be
mentioning later as well, which means that it will be using the outer function's arg1
and that will be whatever arg1 is at the moment you call inner func. So there's a very big
difference between those two uses. >> Question?
>> WOUTERS: Yes? >> That very final line, was that evaluating
inner func or [INDISTINCT] function? >> WOUTERS: No, it's just returning function
object. >> Okay, thanks.
>> WOUTERS: So it's not calling anything, only the parenthesis do actual calling. Yes?
>> [INDISTINCT] could you repeat the question... >> WOUTERS: Sorry? Yes.
>> [INDISTINCT] >> WOUTERS: The question was is the last line
actually evaluating inner func? It is not, it's just returning the function object that
is inner func. So class statements are very similar but also very different. A class statement
is followed by a block of code, a suite, that's executed when the class statement is reached
right away. It is however executed in separate namespace, a dict again. And then after that
code of block has been executed, the namespace is passed to the--or is used to create the
actual class object. Any functions that you define inside the code block then get turned
into methods by magic, basically. And, well, as people who have already programmed in Python
will know, the self argument, in Python, is passed implicitly when you call a method,
but your method still has to receive it explicitly as the first argument. And an important thing
to realize is that the code block in a class is actually a normal code block. You can do
whatever you want in here, for instance, at the top I create a class attribute which is
just something stored in the class, the result of calling func. I define a method which is
just a normal function, but it takes self as its first argument and it assigns to an attribute of
self when it is called. I also define a helper function which just does some silly arithmetic,
and call that helper function inside the class block itself to calculate or recalculate the
class attribute, and at the end I delete helper because otherwise it would end up in the
class, and since it doesn't take self as its first argument, it wouldn't do very helpful
things. Other important parts about Python, variables aren't containers like in other
languages. They're not allocated memory that you can use to store data. People often say
about Python that everything is an object and they also sometimes say everything is
a reference. That's both true but both are not true when you apply them to variables
because variables are not objects and variables are not references. Variables are only labels.
Everything concrete, everything that allocates memory, everything you can toss around is
an object, and whenever you hold an object, whenever you store it anywhere, you're actually
storing a reference. You never actually own an object, you only own references to an object.
Variables don't have type information, they don't have information on the scope of the
variable, or how many objects are live, or when they were created; the only thing they
do is refer to an object and that's it. If you want to look at it from an implementation
standpoint, it's a dict. It's a dictionary mapping names to objects, that's it. So, scopes
also related to namespaces. Python has a very simple two scope rule which is actually three
scopes. Names are either local or they are global or builtin. Local means it exists
in the current function or class or module but it doesn't exist outside. Global means it
exists in the current module; it doesn't mean it exists everywhere, just in the current module.
And builtin is a special namespace that is used by builtin functions and that you can actually
modify. It's just a special module that you can modify yourself if you want to. When the
Python compiler examines a function and compiles it, it keeps track of whatever names
you assign to and assumes, correctly by definition, that those names are
local. You can change this by using the global declaration which is the only actual declaration
Python has. But don't do it too often, it's usually a sign that your code is incorrect.
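A minimal sketch of that local/global rule, written for modern Python (the names are made up for illustration):

```python
counter = 0  # a global name in this module

def increment_wrong():
    # Assigning to "counter" anywhere in this function makes it local,
    # so reading it before the assignment raises UnboundLocalError.
    counter = counter + 1

def increment_right():
    global counter      # the only actual declaration Python has
    counter = counter + 1

increment_right()
print(counter)  # 1
```

Calling increment_wrong() fails at runtime, not at compile time, because the compiler has already decided counter is local to that function.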
All other names that the compiler finds are either global or builtin and the lookup is
it looks in global and then it looks in builtin. So if you have a global of the same name as
a builtin, all the code in the module will find the global of that name instead of the
builtin. You can actually use this on other modules as well. If you import a module and
then assign to an attribute of the module, you'll be assigning to a global name in that
module and you can mask whatever it thinks is a builtin name by assigning to that attribute.
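For instance, in the module's own namespace (a toy sketch; shadowing builtins like this is rarely a good idea in real code):

```python
# A global named "len" masks the builtin of the same name for all
# code in this module: lookup tries local, then global, then builtin.
len = lambda seq: 42

masked = len("hello")    # 42: our global wins over the builtin

del len                  # remove the global again
unmasked = len("hello")  # 5: the builtin shows through once more
```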
There's also a trick with nested scopes which were added later in Python. I think in Python
2.1 where you can--as I pointed out before, you can refer in an inner function to a variable
in an outer function but that is only read-only. You cannot assign to a name in an outer function.
This isn't really a mechanical problem in Python. It would be possible to assign if
we added it, but there's no syntax to say "I want to assign to an outer-scope variable."
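The two behaviors from the earlier function example, def-time defaults versus the read-only nested scope, can be sketched like this (function and argument names are made up):

```python
def outer(arg1, arg2):
    # arg2's value *right now* is baked in as the default:
    # defaults are evaluated once, when the def statement executes.
    def inner(arg3=arg2):
        # arg1 is a free variable: looked up in outer's scope each
        # time inner runs, so rebinding arg1 later is visible here.
        return arg1, arg3
    arg2 = "changed"   # does not affect inner's default
    arg1 = "rebound"   # *is* seen by inner through the nested scope
    return inner

f = outer("a", "b")
print(f())   # ('rebound', 'b')
```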
Global only assigns to a global name and not to outer scopes. Apparently, Python 3 might
be getting--Python 3.0 might be getting syntax for this. So I mentioned modules. Modules
are there to organize code. They're very convenient because they have their own namespace with
global names. They also keep the rest of the module tidy. They're always executed on first
import and then cached in sys.modules. This cache is just about everything that you import
in your program and it's also the only thing that keeps track of what you have imported.
So if you toss something out of sys.modules and import it again, it will get executed
again. Import is just syntactic sugar, just like--basically everything in Python is syntactic
sugar. It calls the __import__ builtin function. If you want to do an import over module whose
name you have in a string object, you use __import__. If you want to replace __import__,
you can just write your own function and replace the one in the builtin namespace. Another
trick is that sys.modules does not have to contain modules. It's a mapping of name to
whatever object you want import to return. So you can toss in any old object in
there and then importing that name will return your object. Storing None in sys.modules
just means this module couldn't be found. So, if you want to prevent some module from
being imported, you can insert None in sys.modules under that name and then it'll raise
ImportError whenever you try to import it. Python objects in general are described in
various terms; mutability is a common one. That means the object is changeable. Lists,
for instance, are mutable; tuples are not. Mutability is very important in Python because
everything is a reference. If you have a mutable object and you end up changing it by accident
and changing it for everyone, that's rather inconvenient. So you have to keep in mind
whether objects are mutable or not. Fortunately, this mostly happens correctly by accident anyway.
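A quick sketch of why that matters, in modern Python:

```python
a = [1, 2, 3]
b = a              # b is another label for the *same* list object
b.append(4)        # mutates the one shared object

print(a)           # [1, 2, 3, 4]: "a" sees the change too

t = (1, 2, 3)      # tuples are immutable: no append, no item assignment
c = t[:2] + (99,)  # "changing" a tuple really means building a new one
```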
A related concept is hashability, whether you can take the hash of an object. In normal
operation, mutable objects are not hashable and most immutable objects are hashable. For
instance, tuples are hashable, but they're only hashable if all their values are also
hashable. Hashes are being used just for dicts and sets right now, so any dict
key has to be hashable, and any set item, anything you want to store in a set, has to be hashable
as well. But it's not inconceivable that more stuff uses hashes. And then there are the imaginary
abstract base classes that are often referred to by saying that an object is file-like or it's
a sequence or it's a mapping. Those are abstract concepts that are somewhat like protocols
or interfaces or whatever you want in another language but they're just informal. They're
just saying: whenever an object is a sequence, it has the sequence operations implemented.
You can index into it and you can iterate over it, etcetera. Some of them are overlapping; for
instance, sequences, mappings and files are all iterable so the iterable interface applies
to all of them. And actually, for all of the objects you define, all that you're defining is an implementation
of syntax. Python defines syntax and you can plug in whatever implementation you want into
that syntax. In other languages you say you implement the less-than operator; in Python
you say you implement the less-than operation, so that it can be
used even when you're not actually using the less than operator. Here's a short list of
hooks that Python supports. I'm not going to go over all of them. Most should be--should
be common enough. The one thing that's worth mentioning is conversions. You can define
a few hooks that define how your object gets converted to other types. That just works
for builtin types of course, because the hooks don't exist for arbitrary types. But most
arbitrary types that want to do something like this can just say: if you have an __foo__
method, I'll call it to turn you into a Foo object if I want to. But it's not very common.
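A toy sketch of such conversion hooks (the Money class is made up for illustration):

```python
class Money(object):
    """A toy class that plugs into the builtin conversions via __ hooks."""
    def __init__(self, cents):
        self.cents = cents
    def __int__(self):      # called by int(obj)
        return self.cents // 100
    def __float__(self):    # called by float(obj)
        return self.cents / 100.0
    def __str__(self):      # called by str(obj)
        return "$%.2f" % (self.cents / 100.0)

m = Money(250)
print(int(m), float(m), str(m))   # 2 2.5 $2.50
```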
These things are not Python hooks. These things you cannot configure on your object. Assignment.
Assignment is an operation that changes the namespace and not the object. Since a variable
is just a reference to an object, you have no control over where your references are
going. Another thing you cannot control are type checks. You cannot control when people
are comparing your type or doing instance checks, you cannot control what they get as
a result. Related is identity comparison, the "is" operator; it checks whether two references
are pointing to, or referring to, the same object, and you have no control over that. "And" and "or"
and "not" are Boolean operations and they just call the one truth-value test that
your object can implement, and you have no control over what they actually return.
Some may know "and" and "or" in Python return one of the two values, whereas "not" always
returns a Boolean. And method calls: you cannot define in a single operation whatever will
happen when someone calls a method on you because methods are getting an attribute followed
by calling that object. They're two separate steps. So in order to implement method calls
in your object, you would have to implement getattr to return some go-between object
and then have that go-between object do something when it's called. I'm sorry. So on to some
implementation details in C. This just applies to CPython, of course. Python objects are implemented
as a struct that holds various bookkeeping information, like refcounts and the actual type that an object
is, as well as arbitrary C data. It can be pointers, it can be ints, it can be an array,
whatever you want. Types are what describe what an object is and how it behaves. They
are a separate struct, which is also a PyObject struct. The arbitrary data in a PyType struct
is the function pointers and various other things that describe the type. The PyObject
structs are not relocatable. You cannot move them around once you've given them out to
anyone else. It's a blob of memory that's allocated and that's it. If you want to move
it around, you have to destroy the object which means you have to get rid of all the
references to it. That also means that variable-sized objects, like lists, that have a portion
that needs to be reallocated and might move around, just use an extra layer of indirection.
It's just a pointer stored somewhere in the struct. And of course, because it's C, it doesn't have
refcounting by nature, so refcounting is done manually: whenever you receive a reference
you incref, and whenever you toss one out, you decref. It sounds simple,
it can get quite complicated but the Python C API is mostly clear enough that it's not
too hard once you get used to it. Another feature that is done reasonably manually is
weak references. Weak references are references that get informed when their object goes away
so that they can do some cleaning up and not crash. The weak references are just callbacks
basically in the PyObject struct arbitrary data. One of the major problems with ref counting
is reference cycles. That is two objects that refer to each other causing them both never
to be cleaned up. Two objects referring to each other is of course the simple problem,
and the complex problem is a huge loop of objects referring to objects and everything.
Python has a cyclic garbage collector which keeps track of objects that might participate
in cycles, for instance, lists and dicts and walks them every now and then to find object
groups that are not reachable through anything but themselves and then it cleans up the entire
cycle. This is all done in C, if you write a C extension or a C type, you don't usually
have to bother with it too much. There are some methods you have to implement if you
want to participate in cyclic-GC, when you think you might be participating in cycles.
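From the Python side, the cycle collection just described can be observed with the gc module; a minimal sketch (the Node class is made up):

```python
import gc

class Node(object):
    pass

a, b = Node(), Node()
a.partner = b      # a refers to b...
b.partner = a      # ...and b refers back to a: a reference cycle

del a, b           # drop our references; the refcounts never reach
                   # zero because the objects still refer to each other

# The cyclic garbage collector walks candidate objects, finds the
# group reachable only through itself, and frees the whole cycle.
found = gc.collect()
print(found)       # at least 2: both Nodes were in the cycle
```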
So in Python, when you have an object, what do you have? Well, you have an object that
has two special attributes: the __dict__ attribute, which is a dictionary holding per-object data,
and the __class__ attribute, which refers to the class. And like in C, the class defines
behavior, and in newer Pythons the class is actually a PyType object and it all matches.
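A quick illustration of those two attributes, plus the class-attribute fallback mentioned a bit later (names made up):

```python
class Greeter(object):
    greeting = "hello"        # a class attribute

g = Greeter()
g.name = "world"              # stored in the instance's __dict__

print(g.__dict__)             # {'name': 'world'}: per-object data
print(g.__class__)            # the Greeter class object
print(g.greeting)             # not in g.__dict__, fetched from the class
```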
The way to define the behavior is not through function pointers because Python doesn't have
function pointers. It's with specially made methods. Methods that start and end with __,
those are special in Python. They're not special because they get allocated differently, they're
just special because some of them get called by Python automatically in some situations.
You can define your own __ methods and use them whenever you want. There's nothing special
in that regard; it's just that some of them signal to Python: this needs to be called
whenever that happens. In general, you should not be calling another object's __ methods
yourself, not directly. You should always use them through whatever mechanism wraps
them. So for instance, if you want to have the string representation of an object, you
shouldn't be calling obj.__str__(), you should just be calling str(obj). Another
feature of Python is that class attributes, that is attributes of the class object, are
also reachable through the instance. If you do instance.attr and it's not defined
in the instance __dict__, it will be fetched from the class, and the class might have parent classes,
so they--those get searched as well. And of course in Python, the whole point of Python
is that you don't have to do stuff that Python can do for you, so refcounting, weak references
and cyclic-GC are all done automatically, you never have to worry about them. Typical
Python code does not use type checks because--partly because it was never possible until a couple
of years ago to subclass builtin types. So pretending to be a builtin type meant that
other people must not use type checks, or you could never pretend
to be a builtin type. It's also very convenient; it just means you can pretend to be whatever
type you want, implement the right methods and it'll just work. The C layer sometimes
needs specific types, if you want to multiply a string by an integer, it needs to have,
actually have a string and an integer or there won't be any multiplication in C. So the C
layer has conversions. When it wants an integer, there are special methods that say "I need
an integer as this argument", and it will convert
whatever gets passed in to an integer or a string and do the multiplication that way.
If you really must use type checks, for instance, because you're faking C code or you're wrapping
C code or whatever, use isinstance instead of type. Checking type for a specific value
means you only accept objects of exactly that type, whereas isinstance does proper instance
checks, so that someone who subclasses whatever type you expect works the right way. Functions
become methods by magic in Python, it happens when you get an attribute off of an object
or rather when you get an attribute of a class through an object. Whenever you get an attribute
of a class that is a function, it gets turned into an unbound method, which is a method
that knows it's a method and it knows which class it belongs to but it doesn't know which
instance, so when you call that method it knows that the first argument you passed must
be an instance of that class and then it'll complain if it's not an instance of the right
type. Of course that type check uses isinstance, so
if you have an unbound method you can pass an instance of a subclass of whatever class it expects
and it works. Bound methods on the other hand are methods that know their instance, and
they will be passing that argument automatically so you call them without the first argument,
you start at the second argument and it all looks normal. So any questions so far? No?
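A sketch of the bound/unbound distinction (note that in Python 3 the unbound method type is gone and the class attribute is a plain function, but the two calling styles look the same; names made up):

```python
class Greeter(object):
    def greet(self, name):
        return "hello, " + name

g = Greeter()

bound = g.greet             # a bound method: it knows its instance
print(bound("world"))       # g is passed as self automatically

# Getting the attribute off the class gives the unbound form
# (an unbound method in Python 2, a plain function in Python 3):
unbound = Greeter.greet
print(unbound(g, "world"))  # here we must pass the instance explicitly
```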
All right, on to iterators then. [pause]
>> WOUTERS: So iterators in Python are really simple, they are helper objects for iteration;
they encapsulate, if you will, the iteration state and they're created and used automatically.
If you don't want to bother with them, you don't have to and it all just happens by magic.
If you do want to bother with them, you can do a lot of fun stuff with them, even more
so if you combine them with generators which I'll be talking about next. Iterators are
two methods, __iter__ and next, notice that there's no __ around next, because next is
actually intended to be called directly sometimes. So there's no __ around it or people would
think that they shouldn't be calling it. Because they're simple, iterators in Python are not
rewindable, they're not reversible, they're not copyable, none of that, they're the lowest
common denominator in iterators. There are however ways to write reversible iterators
if you want, you can just write your own iterator class and add methods to rewind or reverse
or whatever. You can also just nest iterators. And iterators themselves are iterable, they
are just--actually, an iterator is required in its own __iter__ method to return itself
or it wouldn't be an actual iterator. So in the example I have here, I explicitly
create an iterator over range(10), which is a list of numbers from 0 to 10, not including 10,
and then I call zip on it which takes a number of iterables and takes one element of each
iterable and wraps it in tuple and stores that in a list which it returns. So I created
a single iterator, passed the same iterator to zip twice, and as you can see zip takes one
element of each iterator consuming the iterator as it goes and ends up with having two elements
at a time basically from the original list of zero through nine. So generators are a
convenient way to write iterators, they are lazy functions that get executed as you iterate
over the result, basically. Generators use the yield keyword instead of return; it
works very much the same way, except after a yield statement execution continues there the
next time you call it or the next time you iterate. The function with yield in it will
return a generator; it will return an iterator that you can loop over, and this
is terribly convenient for __iter__ methods because you can just write what you would
expect and it'll just work. In Python 2.5 generators were complexified and they can now also receive
data, and you can use them to build co-routines very easily; nothing you cannot also do with
2.4 and earlier iterators, just more conveniently and with extra magic. There's a lot of generator-like
iterators in the itertools module, which is I think new in Python 2.2 or 2.3; there's
a whole bunch of stuff in there you can use to chain and filter and whatever with iterators
that are just really convenient for chaining their various combinations. So here's a generator
example. The first function is map; as you may know, it only takes one
iterable in this version. It creates a list of results from applying the function
to every item in the iterable. Below that, there's a two-line function that
is the generator version, and then there's a one-line function that is the original map
implemented in terms of the generator. As you can see, you basically lose half the
function if you use a generator, because you generate each item on the fly and you
return a generator instead of an actual list. Any questions?
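The slide's pair of functions might look roughly like this (single-iterable version only; the names imap and my_map are assumptions for illustration):

```python
def imap(function, iterable):
    # The generator version: lazy, produces items one at a time.
    for item in iterable:
        yield function(item)

def my_map(function, iterable):
    # The original eager map, now one line on top of the generator.
    return list(imap(function, iterable))

squares = my_map(lambda x: x * x, range(4))
print(squares)    # [0, 1, 4, 9]
```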
>> [INDISTINCT] now you can do it like that? >> WOUTERS: Can I define what a co-routine is
and how I would do it in Python? A co-routine is a routine that is basically like a generator:
it stops execution, passing data to someone else but where a generator returns results
to its caller, a co-routine returns results to someone else, another function. So you
can have two routines where they both consume the output of the other and then the end result
is the handled data. How would you do them in Python? Well, as I said, you can do them in
Python 2.5 with the new sending data to a generator thing, before 2.5 you would write
a class that was itself an iterator and just write it in bits and pieces and--it wouldn't
be as convenient as co-routines in actual languages that have co-routines because you
don't have a single block of code, you have a whole bunch of separate pieces of code that
get called at the right time. I wouldn't bother implementing them in Python right now. Maybe
with 2.5 out and people getting used to the new stuff you can do with generators,
we will get actual coroutines in Python. Yes?
>> WOUTERS: The question was if generators are lazy, can you write a generator that loops
infinitely and just keeps on yielding results as long as you ask for it? Yes, yes there
are... >> Then you just ask for finite number?
>> WOUTERS: Well, would you ask for a finite number? You can: if you use the
itertools module, you can slice iterators; you can say "up to so many items in this iterator"
and it'll return an iterator that starts consuming the original iterator until the slice ends.
But you don't have to do that. The iterator is something you loop over, so if you have
a loop that loops over two things at once and it stops whenever the condition is met,
You can just loop over an infinite object and rely on the fact that
your loop will end itself for other reasons than that the iterator stops. There's actually
an infinite number of infinite iterators in the itertools module, like itertools.count,
which starts counting at zero and just keeps on counting forever and ever, unless you stop it
for some reason. [pause]
>> WOUTERS: So on to decorators. Decorators are syntax in Python 2.4 for a feature that's
been around forever, which is function wrapping. Decorators are just functions,
or callables rather, that take a function as argument and return a replacement callable,
or they can return the original function if they want. However, the syntax means that
it can get confusing if you have a decorator that also has to accept its own arguments
because now you have a function that should return a callable that should accept a callable
and then return the original callable, so we'll see some examples of that. Another problem
is that because functions are special when they're attributes of a class they become
methods. When you have a decorator that returns something that's not an actual Python
function but something that acts like a Python function in most regards, it won't get
turned into a method unless you implement the same magic that functions
have that turns them into methods, which is __get__, which I'll maybe explain later. So if
you write decorators, make sure they return functions, or they won't
get turned into methods. And anyway, as I said, simple concept, but the devil is in the details
and we'll see about that. Here's a simple decorator, it's a spam decorator that says
whenever the function that you decorate, the sing function at the bottom, is called, it
actually calls the wrapper function at the top, which loops ten times calling the original
function. So in this piece of code the original function gets called ten times,
and the return value of the actual function call isn't returned by the
wrapper either. Here's an example of a decorator with an argument; the decorator is spam(10),
where spam is no longer the decorator, it's rather the decorator generator
that takes a number, which is the number of times to repeat. And then in spam, we define the
decorator function, which accepts the function as an argument and then has a wrapper function
which calls the original function; and then we return wrapper, and then of course
return the decorator from the decorator generator. So that looks okay, I mean, it
takes some getting used to, the nesting and everything. But there's another problem, what
about introspection? Maybe Python users don't care about the name of their function, but
some tools do, like pydoc for instance, which is the Python documentation tool; it actually
looks at function objects and looks at their name and their signature and their docstring
and whatever. And because we replaced the original function, when you ask for the
documentation of sing, it'll actually say it's a function called wrapper, and it has
a signature of *args and **kwargs, and it has no docstring. That's probably not what you
want. Another thing is that if you chain this with another decorator that changes
attributes of the function, those changes will go away, because you're not doing
anything with the attributes of the original function. So, some people write code like this, which
is the original spam with the repeats argument, with the decorator function in there, with
the wrapper function in there. And then, after defining wrapper, we assign the __name__,
the docstring, the __module__ and the __dict__ of the original function back to wrapper, so
that all those things will actually be the same for the new function and the old one.
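In today's Python 3 syntax, the manual copying just described might be sketched like this; spam and sing are the hypothetical names from the slides, and the functools version at the end is the 2.5 shortcut:

```python
import functools

calls = []

def spam(func):
    """Decorator that calls the decorated function ten times."""
    def wrapper(*args, **kwargs):
        for _ in range(10):
            func(*args, **kwargs)   # the original return value is discarded
    # Manually copy the metadata, as on the slide:
    wrapper.__name__ = func.__name__
    wrapper.__doc__ = func.__doc__
    wrapper.__module__ = func.__module__
    wrapper.__dict__ = func.__dict__   # a reference, not a copy: the attribute space is shared
    return wrapper

@spam
def sing():
    """Sing a song."""
    calls.append("spam")

sing()
print(sing.__name__)   # 'sing', not 'wrapper'
print(len(calls))      # 10

# functools.wraps (Python 2.5+) does the same copying in one line:
def spam2(repeats):
    def decorator(func):
        @functools.wraps(func)       # copies __name__, __doc__, __module__, __dict__
        def wrapper(*args, **kwargs):
            for _ in range(repeats):
                func(*args, **kwargs)
        return wrapper
    return decorator
```

spam2 is the decorator-generator form with an argument: @spam2(3) builds a decorator that repeats the call three times.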
And assigning __dict__ like that actually works; it's not a copy, it's a reference
assignment, so all of the original function's attributes will be accessible on the
wrapper function, and when you assign to either one of them
it'll appear in the other one as well; they just share their attribute space. Now
this is not the easiest way to write a decorator, so in Python 2.5, in the functools module,
there is a decorator for decorators that does this for you. So you have a decorator that
you apply to your decorator (or to your decorator generator), and then that decorated
decorator gets applied to whatever function you have. So as you can see, the devil is in the details;
they can get confusing somewhat quickly, any questions? All right, how are we for time?
>> You got [INDISTINCT]. >> WOUTERS: All right. New-style classes.
When I say new-style classes--when anyone says new-style classes--they actually mean
old new-style classes, because they were added in Python 2.2, which was released, I think,
six or seven years ago, something like that; they're old. And it's a unification of the
C types that I explained and the Python classes, because before--or actually still, in classic
classes--instances and classes are distinct C types. There is a C type called classobj
that implements all the magic that I talked about, about turning functions
into methods, and there's the C type instance which makes sure
that instances work as expected, with the __dict__ and everything. So they're separate types,
and if you ask for the type of an int it will say it's an int, but if you ask about
the type of an instance of any class that tries to be an int, it'll say "it's an instance". So you
have no way of checking for that. And another problem with the original approach was that
you cannot subclass builtin types. So Guido worked, in Python 2.2, on unifying C types
and Python classes, and the end result is pretty good: you can mix and match them and everything,
it works well. But it only works because a lot of new general mechanisms were added.
They were necessary to bridge the divide between C types and Python classes--things that
you assign from Python may have to be inserted as a C data type in a C struct rather than
as a Python object pointer. Classic classes are still the default, so if you write a new
class and you don't specifically make it a new-style class, it'll still be a classic
class, and that was for compatibility reasons, because there are a couple of things
that are slightly different between classic classes and new-style classes, mostly
with multiple inheritance. And you can check if any class--actually, any instance
of a class--is an instance of a new-style class, because it inherits from object
instead of nothing. So you can actually do isinstance(mything, object) and you'll
know whether it's an instance of a new-style class. The first
of the mechanisms that I'm going to explain is descriptors.
[pause] >> WOUTERS: Descriptors are a generalization
of the magic that happened with functions to turn them into methods. A descriptor is
any object that lives in the class namespace--the class attribute space, so it's an
attribute of the class--and that has __get__, __set__ or __delete__ methods. It doesn't
have to have all of them; you can define just __get__ or __set__ or __delete__ for specific
operations. Whenever an attribute is retrieved--or attempted to be retrieved--from an object
whose class has an attribute with a descriptor in it, those methods on the descriptor will
be called, and the result of those calls will be passed back to the object. Same for setting:
it'll call the __set__ method and no result occurs; and for deleting, it calls the __delete__
method. The delete method is not called __del__ because that was already taken for some other hook, apparently. It's
now the mechanism behind methods. So if you want to have a function-like
object that behaves the same way as functions do, becoming a method, you can do that by defining
__get__. It's also the mechanism behind property, which is a trick for hiding accessors
behind normal attribute access. So here's an example of properties. I'm not going to
give an example of actual descriptors because it's too boring and you won't be using it
anyway, but here's a property. We have a class, and we define the function get_prop; it takes
a self argument even though it's not going to be an ordinary method, and all it does is
return whatever the value of the underlying attribute is. And then
we wrap it in property and store it in a local name that'll eventually be a class attribute.
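Reconstructed in today's syntax (the attribute name _prop and its value are assumptions, since the slide itself isn't reproduced here):

```python
class Foo(object):
    def __init__(self):
        self._prop = 42          # the underlying value the property exposes

    def get_prop(self):
        # Takes self even though it never becomes an ordinary bound call;
        # the property type passes the instance in for us.
        return self._prop

    prop = property(get_prop)    # a local name that becomes a class attribute

foo = Foo()
print(foo.prop)   # property calls get_prop(foo) behind the scenes: 42
```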
Oh, I see I have an error right there: I should have had foo instead of x there. So we
instantiate the class, and then we do foo.prop. foo.prop calls get_prop, and because
it's a property--even though get_prop is not a method, because it's just a function inside
the class body--the property type knows that it needs to pass the instance it's actually
accessed on to the function that wraps it. If you look at
this you can say, "Oh, this can be a decorator, too," and it's true: you can just say @property
at the top of def get_prop--except that property takes multiple arguments, you can also pass
a set_prop and a del_prop if you want; I didn't do that in this example for brevity.
But if you just have a getter, you can just say @property at the top instead of
prop = property(get_prop) at the bottom. Any questions about this? All right. So the other
general mechanisms, kind of related, they're also descriptors, classmethods and staticmethods.
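Since classmethod and staticmethod are themselves descriptors, a minimal hand-rolled descriptor (a hypothetical example, not from the slides) shows the __get__ hook that ties the two sections together:

```python
class Doubler(object):
    """A descriptor: it lives in the class namespace and defines __get__."""
    def __get__(self, instance, owner):
        if instance is None:
            return self          # looked up on the class itself
        return instance.n * 2    # computed on every attribute access

class Num(object):
    doubled = Doubler()          # the descriptor is a class attribute
    def __init__(self, n):
        self.n = n

print(Num(21).doubled)   # __get__ runs and returns 42
```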
Before Python 2.2, Python only had "instance methods", that is, normal methods: methods that
take self as the first argument; they get called on an instance, and if you try to call
them on a class without having an instance, you get an error. So in Python 2.2 we
got classmethods and staticmethods. Classmethods take the actual class object as
the first argument, and that's terribly useful; I'll show why in a moment. Staticmethods take
no magic argument, and they're not very useful, even though Java and C++ programmers coming
to Python often say, "Oh, I need a staticmethod for this." Generally not; they're only useful
for one particular thing, and I'll show that in a minute. So here's a classmethod. Again,
if you're using Python 2.4, you can use @classmethod at
the top for the decorator syntax; if not, you'll have to use it at the bottom. So say we have
a FancyDict, a subclass of dict, and we define a method to create a dict from a set
of keys with a single value, so we don't have to generate a list of key-value pairs,
we can just say "generate it from keys". So what we do here is create a list of
key-value pairs and pass that to the class, and because it's a classmethod and gets the
actual class passed, we can call it on any subclass of FancyDict without anything
in particular happening in the subclasses, and it'll create an instance of that subclass
of FancyDict instead of a FancyDict itself. So whenever you think, "Oh, I should have a staticmethod
and I'll do something with my own class," you should actually use a classmethod and
do something with the first argument. Now, this is a rather silly example because dict
already has this exact thing implemented: there's already a fromkeys method that is
a classmethod on the dict type, and it's very useful whenever you subclass dict, which is
not too often. Anyway, at the bottom it's shown what happens when you use it. So staticmethods:
they're not very useful; the main use is protecting dependencies from becoming methods. When you
use dependency injection, as I do here in the example, you don't know what you're actually
injecting into your class. If it happens to be a function,
or something that does something magical when used as an attribute of a class, this won't
do what you want it to do; it won't do the same thing as calling sendmail.sendmail when
I'm now calling self.sendmail. So you can wrap it in a staticmethod to prevent it from
becoming an actual method. That's the only thing I've ever seen that makes sense for
using staticmethod. Although, as we'll see later, Python actually has a staticmethod
itself which is a good example of something that should have been a classmethod. Another
new feature: __slots__, which is for omitting the __dict__ attribute that holds arbitrary
attributes; it basically prevents you from assigning arbitrary attributes to an object.
It reduces memory use, because the dict isn't allocated, and it's a more compact form to
store the same number of attributes, but it's not going to be much faster than a dict,
even for a few attributes. The main reason to have it is when you want to emulate, or
actually implement, immutable Python classes, like the immutable builtin types, where you
don't want attributes assigned arbitrarily. There's a tiny, tiny, tiny class showing slots
right there. If you actually try to assign to something other than self.value,
either in __init__ or anywhere else, it'll actually throw an exception--except for stuff
that's already in the class; I mean, the def statement for __init__ won't be throwing an
exception, of course, because Python knows that it's already there. Another new thing in Python
2.2 is the actual constructor. Before Python 2.2 there was just __init__, which is an
initializer, not a constructor; it gets called after the object has been created, and it's your
job to initialize it and set the right attributes and whatever, but the object is already there.
So __new__ is what's actually called to construct an object: allocate
memory for it, make sure it's all right. In Python, it's actually useful mostly
for implementing immutable types, because if you have an __init__ to set attributes, it's
too late: the object has already been created, and it can't be immutable if you can assign to
it in __init__. So you need to do it in __new__. And this is the example of a staticmethod
that shouldn't actually be a staticmethod. It cannot be an instance method, because its
job is to create the instance, so there's no instance for it to be a method of. So Guido
made it a staticmethod before he added classmethods in the development cycle of Python 2.2, I
think. It could have been a classmethod, but, well, it's too late now. It's a staticmethod
that takes the actual class as its first argument, and you need to pass that around explicitly
whenever you call the superclass's __new__. When you want to actually implement __new__, you
generally always call object.__new__ or your superclass's __new__ to do the actual allocation,
because there is no other way to allocate memory in Python. However, your __new__ method
or staticmethod can also return existing objects; you can just return any
object you want from __new__, whereas __init__ either has to return self or return None,
because you can't actually change whatever it returns. From __new__ you can
return whatever you want, and that's the result of instantiating your class. There's
one caveat there: when you return an instance of your own class--whether it's an actual
instance of your class or of a subclass of your class--the __init__ is still called, even if
it's an already-created object. That's because Python can't know whether your __new__ is returning
an old object or a new object, so it always calls __init__. Of course, if you return something
that's not your type--not a subclass of your type--it knows that it hasn't been created by
this class, so it doesn't call __init__. So, here's an example of a __new__: WrappingInt,
which is an immutable type in Python. We set __slots__ to the empty list so it doesn't get
arbitrary attributes. And then in __new__ we take the value, which is whatever our constructor
was called with; we take it modulo 256 so it doesn't go past 255. And then we
actually create self by calling the parent class's __new__. As you can see here,
it's a staticmethod, because even though we're calling it on a class, we're passing the class
we were passed, explicitly. Any questions so far? Yes.
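A reconstruction of the WrappingInt slide in today's syntax (the wrap-around constant and exact spelling are assumptions):

```python
class WrappingInt(int):
    """An immutable int subclass whose value wraps around at 256."""
    __slots__ = ()               # no instance __dict__, no arbitrary attributes

    def __new__(cls, value):
        # __init__ would be too late: int is immutable, so the value must
        # be fixed at allocation time. Note the class is passed explicitly.
        return super(WrappingInt, cls).__new__(cls, value % 256)

x = WrappingInt(260)
print(x)                     # 4
print(isinstance(x, int))    # True
```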
>> How do you make an object of--[INDISTINCT]. How do you define a class and [INDISTINCT]
make sure it is immutable? >> WOUTERS: How do you create a class and make
sure it's immutable? By not providing any of the things that mutate the
object. So, for instance, this is an easy example because an int is its own value, so you're
not storing anything besides the value. We don't accept arbitrary attributes,
and we let our parent create the object, and it's done. If you want to build an
immutable type from scratch, you have to do a lot more work, because you need somewhere to store
the data and then provide properties to read the data but not write the data. That's basically
how you create one. So you do the same thing as here, and you have some magic in there that
sets a hidden variable, basically, that properties can get at but nobody else can
get write access to. It's not easy and it's usually not worth it. Mostly, Python classes
are just implemented in terms of existing types, and if you want an immutable type, you
either want something that is int-like, string-like or tuple-like, and you can just subclass
int, string or tuple and be done with it. Any other questions? All right, is there any interest
in Metaclasses? So I mentioned them--alright. So, Metaclasses are this theoretically satisfying
and mind-boggling concept where you can draw these graphs between what the class is and its
Metaclass, and what the class of the Metaclass is, and then where object fits in as
an instance of things. The general idea is that the Metaclass is the class of a class: it's
the type of classes, it's whatever implements a class. And of course the Metaclass is an
instance of something, so the Metaclass has a Metaclass, and in Python the
base Metaclass is called type. And type's type is type. And type is actually an object;
the parent class of type is object. All objects have a base
class that's object, so you can see how it gets confusing. Of course, the type of object
is type, so you can draw very funny graphs that way, but it doesn't
matter, because in Python everything is simple and you can just say: the
Metaclass is the class that creates the class. Of course, that doesn't apply to type or
object, because they are created secretly in Python by magic, but every other class you
define has a Metaclass whose job it is to create the class from the namespace that
is the code block of the class. So, if we go back all the way up
to the class here, this is all done before the Metaclass is even looked at. And then,
when this piece of code--the blue parts and the green parts--is all compiled and
executed, nicely wrapped up in a namespace, then that dict
is passed to the Metaclass, along with the name and the parents, whatever you want to subclass.
And it says, you know, "Create me a class and return it." And then the result of that
instantiation is your new class. So, it's practically simple. And whatever
you want to use it for--in Python, the stuff that you normally define
in a class defines how an instance of the class behaves; you can do the very same thing
with a Metaclass, and it'll define how the class behaves. So the magic that creates,
for instance, methods, which is hidden in the class, and the stuff that calls descriptors,
which is hidden in the class, is actually called by __getattr__ or __getattribute__, which I
probably should have mentioned. __new__ and __init__ are called to construct the class and
to initialize the class; that all happens the same way that you would expect. So you
can override them and you can do as many things as you want. The thing they're most useful
for is post-processing a class: just doing some renaming or mangling or other
stuff with a class after it's been defined, before it's visible to any other Python code,
without requiring an explicit post-processing step. As I said, you can do evil stuff in
__getattr__ or __getattribute__ if you want; it's probably not worth it, it'll just make your
code much more complex. So here's a Metaclass example in Python. We subclass type, because
it's convenient and I think it's necessary as well. You define an __new__, which is a
staticmethod as usual--I forgot to mention, you don't actually have to explicitly make
__new__ a staticmethod, but you can if you want to. The Metaclass's __new__ gets
passed the class that is actually ourselves (because it's a staticmethod of ourselves, or
a classmethod of ourselves), the name of the class that needs to be created, the bases,
which is the tuple of the bases that are named in the class statement, and the dict
that is the namespace of the code block; it's just a dict with attributes. So what
we do here is go over all the attributes; we skip any that start with __, because
you don't want to do the wrong thing with the magic methods by accident. And then we name-mangle
whatever attributes are left over, by prepending something to them to make them
look funny; we delete the original attribute, and at the end we call the base class's
__new__ with the same arguments but with the modified attributes. And then, to use the
Metaclass, you have to explicitly tell Python to use a different Metaclass than the default,
which is type. You do __metaclass__ = MangledType in the class body or, if you
want, at the top of the module. Now, Metaclasses get inherited, so if you subclass the mangled
class, you automatically get the mangled type as Metaclass. If you want to subclass and
have a different Metaclass, your Metaclass has to be a subclass
of your superclass's Metaclass. So if I wanted to have a more-mangled class with
a more-mangling Metaclass, I'd have to subclass the mangling type to get a more-mangling type
and have that as my Metaclass. So, any questions there?
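A sketch of the mangling Metaclass in today's Python 3 spelling, where the class statement takes a metaclass keyword instead of the __metaclass__ attribute; the exact mangling scheme on the slide is an assumption:

```python
class ManglingType(type):
    def __new__(cls, name, bases, namespace):
        mangled = {}
        for attr, value in namespace.items():
            if attr.startswith('__'):
                mangled[attr] = value               # leave the magic methods alone
            else:
                mangled['mangled_' + attr] = value  # rename everything else
        return super(ManglingType, cls).__new__(cls, name, bases, mangled)

# Python 2 would say __metaclass__ = ManglingType inside the class body.
class MangledClass(object, metaclass=ManglingType):
    def greet(self):
        return "hello"

print(MangledClass().mangled_greet())   # greet was renamed by the Metaclass
```

Because Metaclasses are inherited, any subclass of MangledClass is also built by ManglingType.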
>> Are MangledType [INDISTINCT]? >> WOUTERS: Yes, sorry, that's a typo. It
should say ManglingType at the bottom and not MangledType. Yes.
>> I think I remember that django uses a Metaclass at the bottom [INDISTINCT], is that true?
>> WOUTERS: Yes. >> Do you know how it does it?
>> WOUTERS: Yes. I don't have an example right now; I have seen some django code and it's very
interesting how it works in django. Sorry, the question was: django uses Metaclasses, and
how is that done? Django has an API where you define a class with various attributes
to describe your data model, and then you can have some extra classes inside the class
to describe admin options and display options and whatever. What it does is, just
like this, it goes over all the attributes that were the result of executing the code
inside the class statement and examines them. The order doesn't matter to django for
the most part, but where it does, it uses the fact that when executing
a class body, it executes top to bottom. A field is a particular type, and a field
definition is an instantiation of a class--you say "field1 = IntegerField()" and
"field2 = CharField()"--and it keeps track of the order in which those field objects
were created by keeping a global counter, for instance, so it knows which order the
fields are in the class statement. That's about the only magic thing that the django
Metaclass does. For the rest, it's just examining whatever's in the dict
that gets passed to the Metaclass, to write down the SQL that whatever database
backend needs to store it, and whatever options there are, etcetera. Does that answer your
question? >> Yes.
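The creation-counter trick he describes might be sketched like this (hypothetical names, not actual django code):

```python
import itertools

class Field(object):
    _counter = itertools.count()    # one shared, class-level counter

    def __init__(self):
        # Remember definition order, since the class-body dict alone
        # doesn't tell the Metaclass which field came first.
        self.creation_order = next(Field._counter)

class IntegerField(Field): pass
class CharField(Field): pass

class Model(object):
    field1 = IntegerField()
    field2 = CharField()
    field3 = IntegerField()

# A Metaclass (or anything else) can now recover the definition order:
fields = sorted((v for v in vars(Model).values() if isinstance(v, Field)),
                key=lambda f: f.creation_order)
print([type(f).__name__ for f in fields])   # ['IntegerField', 'CharField', 'IntegerField']
```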
>> WOUTERS: Alright. Any other questions? All right. So, I'm going to cover multiple
inheritance, unless everyone here goes, "No, no, don't ever use multiple inheritance,"
alright? So, multiple inheritance, in Python and in other languages, is something that's
frequently argued about, whether it's good or sane or insane. Well, it's generally not
a good idea, but occasionally, especially in Python, it can make stuff a lot easier.
New-style objects have a different way of doing multiple inheritance, in that they use the
C3 method resolution order, which is an algorithm described, I think, in a Dylan paper, describing
what to do when you have multiple inheritance, in particular involving diamonds, where multiple
superclasses inherit from the same super-superclass--how to handle that correctly.
And the algorithm is pretty simple: it just does a depth-first, left-to-right search through
all the classes, but then it removes all duplicates except the last one. So, if a class appears
two times, it'll be visited not the first time it appears, but the last time it appears.
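The diamond rule is easy to see by asking Python for the MRO directly (a sketch in today's syntax; the cooperative super calls follow that same order):

```python
class A(object):
    def who(self):
        return ['A']

class B(A):
    def who(self):
        return ['B'] + super(B, self).who()

class C(A):
    def who(self):
        return ['C'] + super(C, self).who()

class D(B, C):
    def who(self):
        return ['D'] + super(D, self).who()

# Depth-first, left-to-right gives D, B, A, C, A; keeping only the
# LAST duplicate of A yields:
print([cls.__name__ for cls in D.__mro__])   # ['D', 'B', 'C', 'A', 'object']
print(D().who())                             # ['D', 'B', 'C', 'A']
```

Note that from B, super continues to C, not to B's own base class A; that's exactly why calling a parent class directly is never right in a diamond.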
And in Python, we also have a convenience object called super, which can help you continue
along the method resolution order. All your parent classes are visited after you're visited,
but they might not be visited immediately after you: your
superclass might not be the next class to be visited, and your subclasses might not
have been visited right before you. That's rather important to realize. So calling your
base class's method directly, saying "MyParent.foo(self)" or whatever, is never right, because there's
always going to be some situation where that will do the wrong thing and skip classes when
visiting methods. So, the way to do it in Python is: you have a single base class with
the full interface of whatever your tree is going to implement, so that any object can
always call whatever method it wants to call within that tree, on any other class within
that tree. In the usual case, those functions will probably be do-nothing functions.
They shouldn't raise errors, because then you can't safely call them all the time; if
anything, they should do nothing. The signatures of those methods should never
change, because you cannot know which order the classes will be called in. If you have
to change the signature of a method in a particular part of the multiple inheritance tree, you
should have a second master, basically, in the multiple inheritance tree, and make sure
that it's a separate section of the tree. And then you
should use super everywhere you want anything from a base class, anywhere. And
that can be annoying, because all of your code has to follow it in all of the classes, and
you're usually not the only one developing all those classes, so you have to convince
everyone to use super everywhere. And as I show here, using super is not actually
convenient right now. You call super, passing the current class--the class you're
coding in--and the instance (or the class, if you have classmethods) that you
were called on. That returns a proxy object, and then you
can use that as if it were self, to call the original method. It's not too convenient, and
I hope we'll have syntax for doing super calls in Python 3.0, maybe sooner, but I'm not
holding my breath. Any questions about these? All right. I'm going to cover Unicode then,
if we have time. >> [INDISTINCT]
>> WOUTERS: This [INDISTINCT] >> [INDISTINCT]
>> WOUTERS: So, Unicode, it's somewhat longer topic though.
>> [INDISTINCT] >> WOUTERS: No, just twice as long as the
previous topic or whatever. So, Unicode is a standard for describing characters. The
way byte strings describe bytes--with a byte string you say, "A is represented by 65";
with Unicode you say, "A is code point 65"--and there's no relation
between Unicode as such and bytes on disk. In Python, that means the old str type
holds bytes; it's a byte string, we call it a byte string nowadays. And Unicode objects,
or Unicode strings, hold characters, which for ASCII is actually the same thing.
But for non-ASCII, it's entirely different.
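In today's Python 3 terms, where str is the Unicode type and bytes is the byte string, the distinction looks like this:

```python
# A byte string holds bytes; a (Unicode) string holds code points.
data = 'café'.encode('utf-8')    # encode: characters -> bytes
print(data)                      # b'caf\xc3\xa9' -- the é became two bytes
print(len(data))                 # 5 bytes

text = data.decode('utf-8')      # decode: bytes -> characters
print(len(text))                 # 4 characters
print(ord('A'))                  # 65: 'A' is code point 65
```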
A Unicode string has no on-disk representation as such, so if you want to store Unicode on disk,
or even in memory, you need to use an encoding. Python internally uses either UTF-16 or UCS-4
encodings--or UCS-2 and UCS-4, depending on how exactly you define what Python does. But
it uses either 2 or 4 bytes to represent each character, whereas byte strings always
use 1 byte for every character, or byte. When you have a byte string and you want to have
a Unicode string, a Unicode object, you have to decode the byte string into Unicode. And
if you have a Unicode string and you want to store it somewhere--you want to pass
it over the network, or you write it to disk--you have to encode Unicode into
bytes. A convenient encoding is UTF-8, and some people confuse UTF-8 with Unicode; for
instance Postgres, the database, has an encoding called UNICODE which is actually UTF-8 and not
Unicode. It's one of those mistakes many people make, but UTF-8 is not Unicode. UTF-8 is a
Unicode encoding, and it's a Unicode encoding that looks like ASCII to all Americans, or
people who don't care about accents or funny characters. But it can actually encode
all of Unicode, and it does so by basically screwing Chinese and Japanese people, by having
all of their characters take up multiple bytes each. So, in Python, Unicode is pretty convenient,
except when it mixes with byte strings. You can have Unicode literals, which look just like
string literals except you have a couple more escapes besides the normal backslash-x
and backslash-0 escapes. You can have \u, which is
a short Unicode escape, and a capital \U, which is a long Unicode escape. The short
\u takes, as you can see, 2 bytes and the long \U takes 4 bytes, and the long one isn't really
necessary until you start working with the higher code planes that were added last to
Unicode. Also, instead of chr, the builtin to create a single-character string, you
have unichr, which creates any Unicode character from a number. And
the compiler supports, at compile time, Unicode names: you can use \N and then,
in curly braces, the name of any Unicode code point. The Unicode data defines all these
names, and we have them all in the interpreter at compile time, so that actually results in
a single-character Unicode string with a euro sign in there. It's a single character of Unicode,
but when you encode it in an encoding that supports the euro sign, it may look like
an entirely different character, or multiple bytes, or whatever. If you want to do these
name lookups at runtime--if you have a character and you want to look up its Unicode name, the
unicodedata module does that for you; if you have the name and you want the actual
character, unicodedata does that, too. So Unicode objects behave exactly like
strings, which is very convenient: you can slice them, and you're actually slicing
characters instead of encoded data; the length is right; you can iterate over every
character; everything is great--except when they mix with non-ASCII byte strings. When
they mix with ASCII byte strings, Python will automatically upgrade the byte string into
a Unicode object using the ASCII encoding. So that works for ASCII byte strings, but if
there's a funny character in the byte string that's not ASCII, it'll blow up, because it
tries to interpret it as ASCII, sees that it's actually not ASCII, and doesn't know
what you want to do with it--so don't do that. Another problem with the Python approach is
that the decode and encode methods of strings and Unicode objects are generalized. They
don't just do encoding to a byte string or decoding to a Unicode object; you can actually
convert strings into strings, and byte strings into byte strings, and integers into
whatever you want, or to integers. It's inconvenient, and I'm not entirely sure if that
should be fixed or not, but it's inconvenient when you only care about Unicode. On the
other hand, they do have convenient features. So, using Unicode in Python is very simple: never mix
Unicode objects and byte strings. It's a very simple rule; if you follow it, everything
will be great--except, of course, it's not always easy not to mix byte strings and Unicode. If
you write libraries, you might get passed a byte string when you don't expect it, or you
might get passed a Unicode object when you don't expect it. If you write
an application, you have to make sure that anywhere you get a string, you either get it
as a Unicode object, or you get it as a byte string and you translate it yourself. The
best way to do it is to decode byte strings into Unicode when you get them, and encode
Unicode into whatever output encoding you have when you're outputting. And of course you
have to remember to use the right encoding, so you have to know what the encoding is when
you get input or produce output, and there's no way to detect an encoding. Because it's
just bytes, and there's no marker in there that says "this is UTF-8" or "this is latin-1"
or whatever, or UTF-16 for that matter. It might all look vaguely familiar when you're
actually looking at the bytes, but that doesn't mean it's correct to decode it
with that encoding. Fortunately, if you can figure out which encoding to use, Python does
have some convenience functions and modules--the codecs module, in particular. The codecs
module has an open function that behaves like the builtin open function, except it takes
an encoding, and it'll automatically decode data as you read from it. So when
you read from codecs.open objects, you're actually reading Unicode, and when you write Unicode
to them, it'll automatically encode it into whatever encoding you passed in. If you want to wrap
existing streams, like sockets or files you've partially read, you can use codecs.getreader
or codecs.getwriter to transform, on the fly, the stream from byte string
to Unicode or the other way around. And lastly, when you do write stuff like this and you're
debugging your code and there's some problem with mixing Unicode and byte strings, pay
attention to the exceptions, because there's two exceptions you could get: there's
UnicodeDecodeError, which you get when decoding a byte string into Unicode goes wrong, and there's
UnicodeEncodeError, which is the other way around. And if you use str.encode, so you're trying
to encode a byte string into a Unicode encoding, what it'll actually do is silently try to
decode the string first into Unicode and then encode it with the encoding you gave it. So,
that actually tries to apply the default encoding, which is ASCII, to str, and even though you
called str.encode, you will be getting a decode error if it turns out that str is not
an ASCII string. So, that was it. Those were all my topics; I'm glad we got through all of them.
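[Editor's note: a minimal sketch of the codecs conveniences described above, using a temporary file and an in-memory BytesIO stream standing in for a real socket or half-read file:]

```python
import codecs
import io
import os
import tempfile

# codecs.open behaves like the builtin open, but takes an encoding and
# decodes/encodes transparently: you write Unicode, bytes land on disk.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with codecs.open(path, "w", encoding="utf-8") as f:
    f.write(u"caf\u00e9 \u2013 20\u20ac")

# Reading back through codecs.open yields Unicode, already decoded.
with codecs.open(path, "r", encoding="utf-8") as f:
    text = f.read()

# codecs.getreader wraps an existing byte stream so that reads come
# back as Unicode; getwriter does the same for writes.
raw = io.BytesIO(u"na\u00efve".encode("latin-1"))
reader = codecs.getreader("latin-1")(raw)
wrapped = reader.read()

print(text)     # café – 20€
print(wrapped)  # naïve
```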
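[Editor's note: the two exception types, and the str.encode trap, can be seen in a short sketch; it is written in Python 3 syntax, whereas the talk itself targets Python 2:]

```python
# Decoding bytes that aren't valid in the codec raises UnicodeDecodeError.
try:
    b"\xff\xfe".decode("utf-8")
    decode_err = None
except UnicodeDecodeError as exc:
    decode_err = type(exc).__name__

# Encoding a character the codec can't represent raises UnicodeEncodeError.
try:
    u"20\u20ac".encode("ascii")
    encode_err = None
except UnicodeEncodeError as exc:
    encode_err = type(exc).__name__

print(decode_err)  # UnicodeDecodeError
print(encode_err)  # UnicodeEncodeError

# The Python 2 trap from the talk: byte strings also had an .encode()
# method, which silently *decoded* with the default ASCII codec first,
# so str.encode on non-ASCII bytes raised UnicodeDecodeError.  Python 3
# removed .encode() from bytes entirely, eliminating the trap.
```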
Here's some more information if you want it: descriptors, metaclasses and super are all described
in Guido's original descrintro tutorial for Python 2.2, which is still very relevant. Iterators
and generators are well described in A.M. Kuchling's tutorial on functional programming;
you don't have to follow the whole thing if you don't like functional programming, but
the parts about iterators and generators are very good. And if you want to learn about
writing C types, the standard documentation is a great source, as well as the Python
source; the builtin Python types are all written in the same API and they're very readable
source, even if I do say so myself, so it's highly recommended. And as always, the Python
standard library and the Python C code are all great sources. That was it. Any more questions?
>> [INDISTINCT] somewhere, that we can get up?
>> WOUTERS: I can put it up somewhere. >> How about the previous presentation about
the upcoming [INDISTINCT]. >> WOUTERS: Sorry?
>> The previous presentation, I guess, it was [INDISTINCT] about the future of
Python? Is there any record of that somewhere that we can look at?
>> WOUTERS: There's like four or five different movies, sorry.
>> It's on my laptop you can upload it. >> Okay great.
>> WOUTERS: Any other questions? >> What's a [INDISTINCT]--what's the good resource
for a sort of module import resolution? You know, like, when you're changing--when you're
moving Python [INDISTINCT] from one place to another. [INDISTINCT] and is there, like
a, standard way of how you do all that. >> WOUTERS: So, you mean from one string to
another? >> From, you know, one string to another or
what [INDISTINCT] code base [INDISTINCT]. You start mixing it [INDISTINCT] and things
like that or in like [INDISTINCT] libraries. It's got to be like when you do [INDISTINCT]...
>> WOUTERS: Usually a byte store...