Alex Orlov: Cython as a Game Changer for Efficiency (PyCon 2017)

Video Statistics and Information

Captions
...lecture today. We know Python is a great language for everything but speed, but how do we deal with it? Is there any way we can solve this problem? You will find the answer in the next talk. Let's welcome Alex for the talk "Cython as a Game Changer for Efficiency".

Hello everybody, thank you for coming. Today we're going to talk about Python efficiency in general, and more specifically we're going to talk about Cython.

So let's start with Python. It's a great language that you probably love, and you can probably name a zillion reasons why you like it. My favorites are listed here: speed of development, code readability, a great ecosystem of libraries and, of course, the community. I actually work with different programming languages, but whenever I switch back to Python I always feel a kind of relief, and I always get surprised by how fast and easy development is.

But let's admit that Python is probably not the most efficient programming language in the world. It's definitely quite efficient in terms of developer velocity, but not that good in terms of CPU usage or memory usage. The real question here is: how much do you care? If you're a back-end engineer in a typical web company and your company experiences growth, it's quite likely that the majority of your challenges will be somehow related to scaling issues of databases, or cache consistency, or something like that. But your web tier, it's quite likely that it still stays simple. Usually it's a stateless web server, and all you need to do in order to scale is add more boxes. As simple as that.

But at some point the number of machines that you add might become insane, or at least big enough for you to consider saving some money for your company and reducing the number of boxes. Even then, Python execution speed itself might not be a real concern. Your web tier can be CPU bound, memory bound or I/O bound, and those are all different types of issues, although sometimes they're correlated. But let's say that you're in the same boat as Instagram and you also have some CPU issues.
The first thing that you actually need to do is profiling. According to the Pareto principle, 20% of the work is responsible for 80% of the results or, in our case, we should expect 20% of the code base to be responsible for 80% of the global CPU footprint. And in our case that's actually true. So try to avoid premature optimizations and figure out what piece of code you really need to optimize.

So let's say you found the critical code. The next thing that you want to do is read your code. It's quite likely that it just performs unnecessary actions, or there is some misuse of a data structure, or some Python-specific stuff like imports inside of functions. For example, here we have this comprehension that intentionally generates a huge list, and then we have a for loop, and inside of the for loop we check if an element is present in the list. Looks good, looks like normal code, but the problem is that this data structure is not designed for this type of query, so we actually have an N-squared algorithm here. That is fine for many cases, but if it's a critical code path then even for a relatively small N it might become an issue. Fortunately, it's easy to fix: if you change this comprehension to a set comprehension, you'll reduce the complexity to linear, and problem solved. So my point is, before applying any dramatic changes and optimizations, try to read your code first and check your algorithm.

But let's say that you did it and your code looks sane, but it's still slow. Don't worry, at this point you also have multiple options.

Option A: microservices. You can take your critical piece of code and, if it's easy for you to decouple it from the rest of your app and if you're using a service-oriented architecture, you can probably create a separate microservice and write it in any programming language that you consider to be more performant than Python. And yeah, it should work, but it has obvious downsides. First of all, it sounds like a non-trivial amount of work to take a bunch of code and rewrite it in a completely different programming
language. And also, if you're not using a service-oriented architecture, it will add complexity to your system in terms of maintenance, deployment, capacity planning and whatnot. So hold on.

Option B: classic C extensions. Again, you can take your code, rewrite it in C or C++, create it as a separate library, and then just create Python bindings for it. It works, it works perfectly well and, as a matter of fact, that's how many of the libraries that we use are written. But you'll have to write in C++, which is arguably not the most friendly language to write in, especially if you're a product engineer and you don't have relevant experience.

Option C: you can change the Python runtime. Python the programming language is an abstraction, and CPython, which you probably use, is a concrete implementation of that abstraction. But there are multiple other options. There is PyPy, which is probably the most popular one, there's Pyston from Dropbox, Grumpy from Google, so a lot of good stuff, and each of them has pros and cons of its own, so we won't stop here. But let's just say that switching the Python runtime is not as easy as it sounds. For example, if you want to move to PyPy and your code, or the libraries that you use, depend on multiple C extensions, then migration might be tricky.

One more option: you can update Python. As you probably heard, we migrated from Python 2 to Python 3, and it improved the overall CPU usage of our system by 12%. There's actual ongoing work related to performance, so if it's feasible for you to update Python, yeah, why not just do it.

And finally, you can use Cython. So what is Cython? According to Wikipedia, Cython is a super... OK, I don't see it. Oh. Cython is a superset of the Python programming language designed to give C-like performance with code which is mostly written in Python. So in short, Cython is a programming language which is basically the same stuff as Python, but with optional extra syntax that you may or may not use,
up to you. It compiles to C or C++, and it works perfectly well with your existing runtime, so no changes in infrastructure are required at all.

Let's consider the following example. Don't worry, you don't have to read this code. That's the Django URL dispatcher, basically the piece of Django that takes a URL and figures out what view or controller it needs to execute. At one point we noticed that this module had started consuming four percent of global CPU in our system. As the first step we just compiled it with Cython, and this simple action gave us a 3x performance boost for this module and reduced the overall CPU consumption of this module from 4% to one-ish percent. And if you think about it, that's actually not bad, because so far we still don't know what Cython is, we didn't have to learn any syntax, and we didn't even have to read the Django source code. So, easy win.

On this slide we didn't change any line, and Cython was still able to apply some optimizations. But it can do a much better job if we somehow tell it what we are trying to do. Let's consider this example. Here we have a transformation function that takes x and just squares it, and an applier function which takes n and has a for loop, and inside of the for loop it just accumulates the results of the transformation into a local variable. If you compile this code, it will run 2.5x faster, which is quite similar to what we experienced before.

But Cython allows us to add types. OK, so this code now looks slightly different from normal Python, but what did we actually change here? We changed the signatures of the functions. For example, now it's "def applier(int n)", which means that our function only accepts integer arguments. And we also declared the types of the local variables inside the body of the second function. If you compile this code, for relatively large n, like more than a thousand, it will run 200x faster, and all thanks to static typing. To be fair, there are multiple optimizations that Cython provides, but if your goal is to
optimize existing Python code, then adding types is all you need to do in the majority of cases.

So let's take a step back and take a brief dive into the Cython syntax that you probably should learn if you want to optimize existing Python code. The main keyword that Cython introduces is cdef. It's used to declare the type of a variable. For example, here we have three variables: an integer variable i, an empty string s and an empty list data. OK, it may look weird, but it should be quite simple to understand. The same applies to function signatures: you can specify the return type of a function as well as the types of its arguments. And when I say you can, it means you can, you probably should, but you don't have to. If you don't specify some particular type, it will default to the most generic type that Python has, which is the Python object type. Cython won't be able to apply some optimizations, but your code will still be good to go.

Another thing that I should mention is that there are three different ways to declare functions in Cython: def, cdef and cpdef. def functions are normal Python functions, exactly what you would expect. But there is also the cdef declaration. Basically, Cython will compile your code into a native C function and, as a result, you won't be able to call it from normal Python code, only from Cython or from C. But on the bright side, it won't have any Python function call overhead. For example, it doesn't need to do marshalling from the Python object type and to the Python object type, so calls to those functions are considered to be much, much cheaper, and for functions in the basic modules of your system that might be quite critical. And there's also the cpdef declaration, which is the intersection of the two worlds: Cython will generate a native C function that will be used inside of Cython, but you will still be able to call it from external Python code, because Cython will also generate a thin wrapper.

A few words about the type system. Cython has support for all primitive C types, such as int, long, float, double and char. Of course it has
support for strings, both byte strings and Unicode strings. And by the way, Cython works perfectly well with both Python 2 and Python 3 so, for example, the str type here will be the unicode type in Python 3 and a byte string in Python 2. Cython also has support for all the Python collections that we love, such as list, set, dictionary and tuple. And again, in the majority of cases, if you want to squeeze out performance like 2x or 5x, that's all you need to use.

But sometimes, if you want to squeeze out even more performance, you can go deeper and start using low-level types. So that is a scary but unnecessary slide. Cython has support for such low-level types as C arrays or raw pointers; you probably should be very careful with those. It also has support for enums, C structures and unions. And, for example, if you're a big fan of the C++ Standard Template Library, as I am, and you always wanted to use a vector in your code, or a tree-based map, now you have the option to do that. For example, "cdef vector[int] data" will declare an empty vector of integers. It may look unusual and even slightly scary, but again, those types of optimizations you will probably apply in rare cases.

Another source of optimization that you probably should use is extension types. They look quite similar to normal Python classes. For example, here we defined a PyCon speaker class. It has three attributes, name, age and biography, and there is a constructor and a property. As you can see, the code here looks almost exactly the same as normal Python. The only difference is that we declared an explicit cdef block, and inside of the block we listed all the attributes as well as their types. Behind the scenes, Cython will use a typed C structure instead of a dynamic Python dictionary to store the attributes of this class. As a result, they consume much less memory, they have faster attribute lookup, they have faster method access, you can declare some methods as cdef and, what is most important, they can be used as a valid type
for Cython's static type system. For Cython, any class defined in Python is a black box, because Python is a dynamic language and you can override everything, so Cython has to be safe and it doesn't make any assumptions about the internal structure of your objects. And last but not least, they work perfectly fine with your existing runtime. You can create them, and you can even create a new Python class and inherit it from a Cython one.

So overall, your optimization workflow with Cython should consist of the following steps. First you detect critical modules, you compile them and you compare the performance numbers. If it's good enough, you can stop there. If not, you will have to add some types, then compile and run the performance comparison again, then add even more types, and so on and so on, until you get the performance that you want. It's also possible that everything will be typed but you still want to get even more performance. In this case you probably want to take a look at the more low-level features of Cython, but that's a rare case. For example, you can replace Python data structures with C-like data structures or C++ structures.

A few words about tooling. During compilation you have the option to specify the annotation flag. It will generate this beautiful HTML. Here, yellow lines indicate interaction with the Python virtual machine. As you can see, there are different shades of yellow, and bold yellow indicates the most expensive interaction with the Python VM. You can click on any particular line and you will get to see the generated C code. If you're not very comfortable with C or C++, this code will probably go over your head, but at least you will figure out what parts of the Python C API are considered to be expensive.

I just want to share with you some Instagram results that we have so far. So far we have converted only 10-ish, probably now it's closer to 15, modules to Cython, and when I say modules it actually means files. And it already reduced the global CPU consumption of
our web stack by 30%. And we just started, so we still see a lot of opportunities and places in our code base that we can optimize to reclaim even more CPU. A funny fact is that when we first experienced issues with CPU, Cython wasn't an obvious option. Overall, I had the impression that Cython is quite popular in the open-source community, but it's mainly used for the following two use cases: to wrap existing C code, or to optimize projects in the data science space. But as you can imagine, Instagram is quite a typical web service and, as you can see, we were able to reclaim a good chunk of CPU with little effort.

OK, to recap what we have so far. First of all, don't be concerned about Python execution speed too early. As you can see, it took Instagram a while before it became an issue, maybe a few hundred million users, so it's definitely a good problem to have. Once you get there, the first thing that you want to do is profiling. It might sound obvious, but practice shows that developers tend to optimize everything. I think your code base should be similar to ours, in the sense that you will be able to find some low-hanging fruit that you can optimize to reclaim a massive amount of CPU.

For optimizations you can use multiple tools, but we can recommend that you consider Cython. The reason is that it will help you avoid a massive code rewrite. It also allows you to optimize your code gradually: you can start with compiling existing code, then adding some types, then adding even more types, and so on. It will preserve Python syntax, so you don't have to learn a new language; it's basically Python, just with types and maybe a few more weird constructions. And last but not least, it will preserve your existing runtime, so you will be able to keep using CPython, and no changes in infrastructure are required.

OK, that's probably it. You can go to cython.org and check the documentation, it's pretty good. And I'm not sure why I put an Instagram reference here, but go to Instagram. Let's go. Yes,
questions? Yeah, please.

So I went to an earlier talk about type annotations in CPython, and I was wondering if there's anybody thinking about making it so that when you do that, you automatically get these kinds of benefits. Are there any projects like that?

So first of all, Cython is a good citizen. As I mentioned, it supports almost all features; it's slightly behind, but it supports all features that normal Python has. But right now it will just ignore all the type annotations that you provide. The reason is that the two type systems, mypy's type system and Cython's type system, are not very compatible, and they were designed for different reasons. Cython's type system is designed more for optimization, basically for that workflow, and to map to more primitive types, while mypy's type system is designed for a different reason, it's more for developer velocity and... you know better than me, probably. And the types that the mypy system has right now are not very convertible, and there's not much that Cython can use from them. For example, List[int]: of course we can safely assume that it's a list, but we cannot take a lot of advantage of the fact that it's a list of integers. Or, for example, if you put Iterable[int], Cython doesn't actually need that: if you iterate through that object in your function, Cython can automatically understand that it's iterable without any type annotations. You can argue that you could use it for primitive types such as integers and strings, but even then it's not quite safe to map a Python int to a C int because, for example, if the input is more than a 32-bit integer, you'll probably have a bad time. So right now I don't think there are plans to merge them.

OK, thank you.

Can Cython release the GIL inside?

Yes, yes, that's another feature. You can actually have threads inside of Cython, and you can release the GIL. And it actually takes care. One trick there is that you cannot work with any Python objects, but actually
Cython provides ways to keep you safe and to explicitly tell you that no, you cannot release the GIL here because we do some stuff with Python objects here. So yeah, it can release the GIL.

Does Cython add steps to building or distributing packages, or to a CI toolchain?

It just compiles the code. When you compile code, it will produce a .so file that you will be able to use as a normal shared library, almost the same as a .py file, right? And how you distribute your code base to production machines, that's totally up to you. It has nice tools that can simplify compilation, but we, for example, built our own pipeline.

And then you just distribute the objects separately?

Yeah, yeah.

Thank you.

OK, so sort of slightly related. One of your examples was about the Django routing system and compiling that bit. So how do you deal with compiling just a tiny bit of one library and not the rest of it, and how do you ship that?

You can compile files, so we just took one file. That file contains all the critical functions that use the CPU heavily.

So basically you don't use the pip package for Django anymore?

No, no, we still use Django. At the import stage we just patch one particular file, and that's it.

Oh, interesting.

So we moved this file to our code base, compiled it, and that's it.

OK, cool, thank you.

We didn't compile the whole of Django.

I've got a couple of questions. First, do you know roughly how many engineer-hours you've spent so far on converting things to Cython?

Much less than the migration to Python 3. Usually optimization of one particular module takes two hours.

And then I was also wondering if you could talk a bit about what the biggest problems you ran into were when trying to convert to Cython.

My own stupidity, that's it. Sometimes I assume that my function accepts a
particular type of argument, but it doesn't. For example, I once assumed that it was an integer, but that was actually false, and so I caused a bad situation with our website. Other than that, well, you should be careful with raw pointers but, as I said, you usually don't actually use them. We actually used them only a few times, maybe two times.

Thanks.

Thank you, great talk. I had a question about the kind of C extensions that it creates. Is it a ctypes-based extension, or can you also have a CFFI-based extension that works with other virtual machines?

So I'm not sure that I'm very capable of answering this question, but Cython by itself is a tool to write extensions.

So can the extension work with PyPy, for example?

Oh, for PyPy? Yeah, it has support for PyPy.

OK.

But they explicitly added the support, it's not like... I think it has.

I'm curious: if you get a 3x improvement on the dispatcher, why wouldn't you just run it on all of Django? Is there some downside?

What, sorry?

Why don't you run all of Django through Cython, not just one module? Is there some downside to doing all of it?

Yeah, that's a really long topic. If you want, we can chat offline. But first of all, compilation takes a lot of time, and we usually recompile everything on every rollout. And to answer your question, we didn't have to: Django doesn't consume much CPU on our machines right now. It was one particular module that consumed a lot, so we just compiled that one.

So one thing you didn't talk about that I know exists: there are a bunch of, essentially, compiler flags that you can put in. Are you making use of any of those things? I seem to remember that you could turn off type checking and various things, and maybe even [inaudible].

I don't remember. Probably a few. Again, we can follow up.

OK.

OK, thanks for this very heated discussion. Thanks, Alex, again. [Applause]
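Editorial sketch: the talk's first two pieces of advice, profile before optimizing and check the algorithm before reaching for Cython, can be tried in plain Python. The slide code is not part of the captions, so the function names below (count_hits_list, count_hits_set, profile) are hypothetical stand-ins, and the choice of cProfile from the standard library is an assumption about tooling, not something the speaker names.

```python
import cProfile
import io
import pstats


def count_hits_list(n):
    """Quadratic version: every membership test scans the whole list."""
    data = [i * 2 for i in range(n)]   # list comprehension
    hits = 0
    for i in range(n):
        if i in data:                  # O(n) scan per lookup, O(n^2) overall
            hits += 1
    return hits


def count_hits_set(n):
    """Linear version: same logic, but lookups hit a hash set."""
    data = {i * 2 for i in range(n)}   # set comprehension
    hits = 0
    for i in range(n):
        if i in data:                  # O(1) average per lookup
            hits += 1
    return hits


def profile(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    n = 5000
    for func in (count_hits_list, count_hits_set):
        result, report = profile(func, n)
        print(func.__name__, "->", result)
        print(report)
```

Both functions return the same count, so the only difference the profiler should show is the time spent on membership tests. If the set version is already fast enough, compiling with Cython may not be needed for that code path at all.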
Info
Channel: PyCon 2017
Views: 17,184
Rating: 4.920228 out of 5
Id: _1MSX7V28Po
Length: 27min 28sec (1648 seconds)
Published: Sat May 20 2017