Easy wins with Cython: fast and multi-core by Caleb Hattingh

Captions
Okay, we might make a start and let other people filter in as we go. Next up we have Caleb. Caleb has worked professionally both as a software developer and as a chemical engineer, and has been using Python for about 14 years. He's also one of PyCon's most avid retweeters, and today he is here to talk to us about Cython. Give it up for Caleb.

Thanks, everyone, for coming. It means a great deal to me to be able to do this and hopefully teach you something. Thank you very much.

So, is anyone else tired of hearing about how Python is slow and about how the GIL prevents concurrency? I've been hearing that for over a decade now. In the course of the work that I do, speed is a frequent concern: getting a simulator to run fast. And it's really easy to bypass. The objective of my talk today is to explain to you that there are ways you can do that, and they're not that difficult. So I'm going to try and make a pitch, a sales pitch, about how there's this thing called Cython that you can use when you really need to go really, really fast. But it's easy, and that's the point of my talk: not how fast you can go, but how easy it can be to get into.

So let's begin. I'm not going to speak about this; my last slide covers me, and I want to make sure I have time to cover everything, so I'm just going to move on. There's a lot of stuff in the world of Cython and I'm not going to cover it all. I'm going to focus on only two things. The first is making code fast, and the second is how to use all the CPUs in your machine or in a cluster: multi-threaded, shared-memory concurrency. And it's going to be easy. I'm going to show you a really, really easy way of getting into this world. You don't have to know everything, and it's not that hard to take a little bit of code and make it faster. I'm going to hide all the noise as well: I have a tool for dealing with setuptools, so you don't have to worry about that.

So the question that comes up
a lot online, if you read forums and everything, is: is Python fast? The first good answer that people usually reply with is, well, what do you mean by fast? It needs to just be fast enough for the task at hand. A much better question to ask in reply is: what do you mean by Python, exactly? Because the thing is, Python itself is actually written in C, and it's optimized. One way to say that is that Python is fast at doing what it intends to do. But that's not what people are talking about. People are talking about when they write Python code and find that it is slow.

So if you're new to this world, what is really happening is that Python is extremely dynamic. Python treats your code like a box of chocolates, and your variables can change type all the time. What the interpreter does, every time it sees a new variable, or even the same variable in a loop, is take a chocolate out of the box: "I see, you're this kind of chocolate, okay, I'll do that with you." Then it reaches in and takes another chocolate out of the box: "Oh, you're that kind of chocolate, now I'll do that with you." And even if you have a very, very big box and every single chocolate is exactly the same type and you want to do the same thing with it, Python doesn't care. It will check each time to make sure that your variable is the same thing.

So one way to deal with that is to tell Python, or some aspect of the execution engine, that you are really dealing with the same thing all the time. This is where the idea comes from that you can give types to things in your loops, and then they'll go much faster, because all this machinery about checking what you're actually working with, and how to handle that in the execution engine, goes away. So that's what Cython does, among other things; for our purposes, that's what it does. You can tell Python what the types of your variables are so that it doesn't have to check, and then it can process things efficiently.
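The "box of chocolates" behaviour can be seen in a minimal plain-Python sketch. The function name `double_it` and the sample values are made up for illustration and do not come from the talk. The same `+` is resolved against the operand's run-time type on every call, even when every element in the list happens to have the same type.

```python
def double_it(x):
    # The interpreter decides how to perform `+` from type(x) at run time,
    # on every single call; nothing here fixes the type in advance.
    return x + x


# A "mixed box": each element is a different kind of chocolate.
mixed_box = [2, 2.5, "ab", [1]]
mixed_results = [double_it(item) for item in mixed_box]

# A "uniform box": every element is a float, but the interpreter still
# performs the same run-time type dispatch for each one.
same_box = [1.5] * 4
same_results = [double_it(item) for item in same_box]

# Record which type each mixed result ended up with.
kinds_seen = [type(result).__name__ for result in mixed_results]
```

Declaring the type once, as the talk goes on to do with Cython, is what lets a compiler skip that per-element check.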
Part of what you need to get this to work is a compiler. I've got a slide, I'm not going to go through it, but I've got a slide with ways to install compilers on a bunch of different operating systems. I've written a blog post, which is that shortened URL at the bottom, about how you can get it set up on Windows for 32-bit and 64-bit on Python 2.7 and Python 3.4. There are other guides online as well.

A quick word about Windows. I don't want to use too much time, but you can't tell people to get a real operating system if what they work on is Windows. I feel incredibly strongly about that; it's extremely insensitive. I can't tell my eight-year-old students, "you know, get a real operating system." They're all running Windows. I think that's an important point to make. You guys are the experts; you make sure that your code works on their machine. I think that's a fair point.

Okay, so this is a tool that does really nothing clever whatsoever. It just wraps the setuptools machinery to build your Cython stuff. It's a command-line tool: you basically say easycython and the name of your Cython file, and it produces a binary object. There's no magic behind it. If you look into the code, it's a single file, less than 100 lines. It really just creates a setup.py and does the compilation for you, to make it simpler to use.

Right, so we're going to begin, and we're going to begin extremely simply. There is no complexity here whatsoever. There's a main file and there's a simple.py module. We import the simple.py module and we run the function inside it. It takes two arguments and multiplies them together. Nothing strange whatsoever; everyone in the room should be very comfortable with this. In the terminal I'm just printing out what the files are, and the way I run the program is I say python main.py and it prints the output. I'll give you a moment to just digest that. Nothing complicated whatsoever.

Now I'm going to use Cython and apply it to this problem, and what I'm going to do is I'm going to
Cythonize that file there, simple.py. So that's the change I've just highlighted in yellow at the top. What I've done is I've added an x to the file name, and then I call my magic setuptools handler, easycython, on simple.pyx. After that process finishes, and you see some compiler stuff go by on screen, you list the files in the terminal and you get a couple of files here. On the next slide I'll go into that in detail. The red thing there is a shared object. That's your binary library, which you can import. main.py will import that in the same way that it imported the .py file and run the function inside it.

So the first thing I want to drive home is that Cython is a superset of Python. You don't have to change your code. You can compile normal Python through Cython completely transparently. You end up with a binary thing at the end, but all the Python code inside the Cython file still works, and in fact it gets executed by the same Python engine. The interpreter still runs inside the binary file. It's only when you begin to add types to the things inside the Cython file that you begin to get access to the underlying C libraries and the speed and so on. But that's the first thing I really want to drive home: starting to play with Cython is really, really easy to do. It doesn't involve a lot of work.

Okay, so this is what the tool made. We ran easycython on our simple.pyx, and the yellow stuff there is new stuff. The blue is also new; ask me about that in question time. I'm not going to cover that, but it's cool. The build folder comes from the compilation process. simple.c is a C file that Cython made from our .pyx file, so what Cython does is convert your .pyx file into a C file. And simple.so is the binary shared object that you get after you compile the C file and link it against Python. easycython simple.pyx, and it spits that out. It's not too complicated yet, right? It's pretty straightforward.

So I tried to
come up with a case study, and the problem with Cython is that it's a tool that has been developed in the scientific community, and they usually introduce Cython at the same time as Fourier transforms or, you know, scientific learning. It's difficult to tease apart the bits that are useful for non-scientific things from the really, really complicated science that neuroscientists present at conferences about brain imaging and so on. So I tried pretty hard to think of a good example, and the one thing I came up with was tax, because death and taxes are common to everyone, right? I didn't want to go the death route, because it's morbid, so I went with the tax route.

Okay, so what we're going to do is we're going to calculate everyone's tax. So this is a tax table, and unless you're extremely young you should know what this is, because it influences you every year. Basically, in a progressive taxation system, the more you earn, the greater your tax rate is. So it's an if-else statement; that's really what it is. Your taxable income gets checked on the left, and depending on which bin you're in, your rate gets worked out on the right. Pretty straightforward.

The Python code is pretty much a one-to-one translation from the tax table, and this is one of the reasons why we like Python, why I use it: the code is extremely readable. I would argue, perhaps I'm biased, but I think that's actually more readable than the tax table, because you get a sense of how to apply it.

Right, so I'm going to switch between the Cython version and the pure Python version of that. I'm going to just go forward and back. So that's the Cython code. I've added cp in front of the def, the return type is a double-precision number, and the argument amount is a double-precision number. And that's it. So I change the extension of that thing to .pyx and run it through easycython, and I get a binary thing out again. You don't have to use easycython; that's not required whatsoever. The guide for Cython says you have
to write a setup.py file and say which things must get compiled with what arguments, and then you'll get the same outcome. easycython just does that; it's really just a helper thing. There's no magic. So, going back quickly: Python, Cython. I'm trying to impress on you that the logic really hasn't changed in the function. It's really just types that I'm applying there, and that mysterious cp at the front. I'm hoping everyone follows; I have a blindness for things that I'm used to, so I'm just trying to make absolutely sure that you really follow what's going on.

Okay, so there are eleven and a half million registered taxpayers in Australia. I just went with ten million, just to make it simpler. I randomized a bunch of incomes, from five thousand dollars a year to five hundred thousand dollars a year, in a big long list, and I want to calculate everyone's tax and then add it up. A pretty straightforward calculation. If you are doing a games engine, like an economic simulation of some kind, you might want to do this calculation many times. If you are trying to optimize a tax table for some objective, you would have to do this a lot of times. So there's an issue of speed; you might want to be able to do this pretty quickly.

Right, so I've got my generator in there, which calls the Python tax function for i in incomes, where incomes is my 10-million-element array, and then I sum them all together. So how long does that take to run? All of these things were run on my computer, and I'm going to show you times. I don't want you to take away from this what the absolute times are; what I'm really trying to impress on you is the kind of reduction you can get and how little work you need to put in to access that. So that takes 12 seconds. The Python function, I've got my list, sum it up: 12 seconds. It's okay, 12 seconds is not bad. But if I have to do that a thousand times, it begins to become difficult to really play with the system and reason about it.

So the function that I made a Cython
version of earlier, the one that I just showed you on the previous slide, that's eight times faster. Right, hold on. Okay, so the first question is: that loop there is actually still Python code. I'm still calling my function in a loop that runs in Python. So what happens if you put the loop in Cython as well? It's a good question. So we can also loop in Cython. I'll show you the code a little bit later, because I don't want that noise to interfere with the impact I'm about to make, which is: that takes 50 milliseconds. Okay, so that's a speed-up of 220 times, which is more than I'm used to. With numerical code I'm used to getting about 170 times speed-up. But it's significant; it's game-changing, actually, to be able to do this.

But again, what I want to impress upon you is how little work we have to do for this result. Although I'm still going to show you what the loop looks like, it's not too bad; it's pretty tractable. You can take a hotspot in your code, make a couple of very small changes, and get a reasonable speed-up. I don't think you could make it go much faster if you wrote C natively, and I don't know C that well, so I definitely couldn't do it. So this is a huge benefit to me in the work that I've had to do.

So I just want to show you quick comparisons with some other methods that you may have heard about for speeding up code. Ask me about them in question time; I'm not going to go into them in too much detail, but just to give you an idea of other tools that exist and what you might be able to do with them. So the first thing is NumPy. With this kind of problem, NumPy actually doesn't help you. The reason is that we're doing slightly different operations on each element in our array. If you're doing the same thing to every element in an array, NumPy is awesome. It's fantastic; you can crunch enormous quantities of numbers and it's great. But with this kind of scenario, where you're doing a slightly different calculation on
each element in a long list of numbers, NumPy actually doesn't help you at all, because the calculation is still getting done at the Python level. So I just want to show you there the one we did where we used Python code and called our Cython function. You can wrap that in a numpy.vectorize statement and you really don't get very much speed-up at all. It's pretty much doing a glorified loop around your stuff, and it's letting your function play with NumPy's broadcasting rules. So with this kind of situation NumPy doesn't help you; that's the conclusion there.

With PyPy: I'm not a PyPy expert, I've used it a little bit. I Homebrew-installed whatever PyPy was in the list, and I ran that on my Python version, and I got 50 times. I think that's pretty reasonable for really not very much effort on my part. I think that's worth looking at. I love the PyPy project, I'm not against them whatsoever, and I wish them every success. I regularly try out new PyPy releases on some of my code.

And the last one is Numba. So Numba is also in the 50-millisecond ballpark; in fact you get really the same answer that you get in Cython. And what do you have to do with Numba? You just have to add that jit decorator to your functions, to the loop and to your tax calculation function, and then you get the same answer. But no more about that; ask me in question time why this talk is not about Numba and why it's about Cython.

So this is the loop. I said originally we just had a sum over a generator in Python, where we just had a loop and we just walked over our tax calculation and summed it up. This is what the loop looks like in Cython, and once you start using Cython a lot, pretty much most of your Cython code is going to look like this. This is almost idiomatic. So you've got your return type there: it's going to return a double-precision number. You have an array of doubles which are going to come in, and this is pretty idiomatic. It's going
to look like that. If it's integers you'll say int[:]; if it's something else, it'll be something else like that. There you've got all the types of the variables that you're going to use in your function. But that's the entire function. This is something else: that is the loop that does the call out to the other thing that we Cythonized, and it returns total there. Most of your Cython will look like this.

So that's how to speed up code in Cython. I think it's a remarkably efficient use of your time to learn a little bit of Cython and get massive, enormous speed-ups in your code. And you get to blend it in: you get all the benefits of Python in your code that is Python and doesn't need to run fast, and the bits that need to run fast can run at pretty much native C speed. I think that's enormously beneficial, and the scientific community does not sell this strongly enough, I think, to the non-scientific audience.

Multi-core. So, concurrency, JavaScript: we're not talking about that kind of concurrency, that is, you know, single-core, asynchronous, I/O-bound type processes. What we're talking about here is multi-core, multi-threaded, shared-memory concurrency. Shared memory: so you have got multiple threads that are accessing the same memory. So if you truly love the GIL, set it free.

So this is the same loop that I just showed you. This is the work that's going to come in there, our array of incomes at the top, and it's exactly the same loop. The only change is what I've marked in yellow there: we now have a context handler, with nogil, that releases the lock. That's it. That's the change. So that code will now run and it won't hold the lock from Python, and it will run happily on its own on a separate core. One tiny catch: I've marked the name of the function there in red. There's just one little thing that you have to do. One of the provisions of this context handler is that every function that you call inside of it has to be marked
with the nogil directive, and what that does is allow the compiler to check that you're not calling Python anywhere in your code. That's really what it does. You can't release the lock if any part of your code inside the context handler is calling into Python, so it all has to be typed, and it becomes native C on the back end. So what that looks like is we just add that: our loop has a context handler that says with nogil, I'll just show you back there, that's what it looks like there, and of course it calls into our function, and our function just has to get that mark on the top. So that's it.

So now we have to set up our threads, and this is my favorite part of the talk. I think this is just the most awesome thing. Python threads are awesome, right? The guy just said my story was cool and he called me bro. It's become like a cargo cult inside Python that threads are bad: don't use threads, they're not really threads, they're not running concurrently. But the thing is, we just released the lock. We just released that. So can threads work for us again?

Right, so I'm going to leave this slide up. I've got some time left, so I'm going to leave the slide up for quite a while. I want you to take it in. I think this is the most awesome thing I've seen in a long time. In preparing these slides I had originally used Python 2 threading. This ThreadPoolExecutor is a Python 3 thing that I played with, and I think it's just marvelous. So it comes with Python 3; you can't get that in Python 2. It's a thing that you run in a context handler. I've got 4 CPUs in my laptop, so I make the workers 4, and you get this thing, which is a job handler, which I called exe there. NumPy provides an array_split function, which splits your work into the number of pieces you say, but it returns views, so that's not copied stuff. It returns four views on your chunk of memory. So sections is a list of 4 things that are views into a single area of
memory. And then we've got our iterator where we submit jobs, and the way the jobs get submitted is you give it a function, so that's my loop that does the tax summation, and the s is the section. So the jobs that you submit are your work function and the arguments for the work function. And I get jobs, I wait for them to return results, and I sum them up on the way out. You can hardly get bugs into this. If you've ever written native threads in C, it's the kind of thing you do where your takeaway is, "I should never do this again, this is terrible." In Windows, nonetheless: I've done it before in Windows, and it's awful. Multi-threaded programming is supposed to be awful, right? You can't get this wrong. It's very nearly brain-dead once you get this pattern going, and I thought this was really amazing. This is my big takeaway from preparing this talk.

So we've released the lock and we've got our threads set up and we're submitting jobs to them. So what kind of benefit do we get? Do we get any benefit at all? Remarkably, we get massive benefit. I've got the number of threads there on the bottom, and I'm running on my MacBook; it's got four cores. At one thread we get our 50 milliseconds, that's up there at the top left. With two threads it comes down, then three and four, and around four it kind of levels out. The improvement that you see as it goes up to eight is fake; that variation is within the noise. So it pretty much flattens out at four, because that's how many CPUs I've got. That efficiency is 92 percent, so I'm getting 92 percent of what you would expect in an ideal world where, you know, the amount of work that you can do is exactly proportional to the number of cores you add to the system. I've got no doubt that if I ran this on an AWS EC2 cluster with 64 CPUs it wouldn't be as high as that, but it would be significant, and depending on the size of my work, I'm pretty sure I'd get a
really big improvement. And I have pretty good confidence that it's going to run and work. I really don't expect any bugs, because the code is so short. So that's 14 milliseconds that I'm now down to on my MacBook. And just to be absolutely clear, I'm not trying to say that, you know, using more cores is better than a single core; that's obvious. The takeaway that I want you to get is that it's not that difficult to get there. If you've got code that is slow and you want to make a bit of it fast, like there's some part of your code that's just really bogging down your application, it's not that hard to get there, and the tools are available to make that relatively easy to do. You can do this, and in certain classes of problems it's easy to do multi-threaded programming, and this is one of them.

So it's so great, right? We should Cythonize everything. All of our programs, we should run through this thing and, you know, get this optimal speed and everything. The thing is, no, that doesn't work, and I've tried that, because it's intoxicating. It's like fine wine or crack or something: when you come down, it really hurts. So, good advice: use Cython very sparingly. Measure. Make sure which bits of your code are taking a long time to run, and just do those bits. The smaller you can confine the scope of your Cython work while getting the maximum benefit, the better; that's really what you want to be doing. The world gets to be a lonely and scary place as you move away from Python. I've suffered a great deal from trying to add Cython to too much code. The 90/10 rule very much applies: find the 10% of your code that is taking 90% of the time and just focus there.

I wish I could tell you a lot more. There are other things that Cython does that are magical and amazing. It's not just this. Extension types in Cython are ridiculously easy to do. They're basically Python classes with little bits of
extra magic, and they run extremely fast, and it's easy to make things that work and don't fail because of bad memory access. So it's pretty easy to do. One of the criticisms of Cython is, "well, now you need to know both C and Python together to do anything," and that's not true. I don't know C that well. I learn a little bit as I go, but I am not a C programmer, and I have used it heavily.

So there are other projects that also try to tackle the speed question, but not only the speed question, and they have slightly different trade-offs in how they want to approach things. PyPy wants to tackle the problem of just native Python without any decoration of types. Numba has a very strong focus on trying to get Fortran speed for number crunching. Shed Skin has their own thing, [unclear] is another one, and NumExpr does vectorized calculations easily. But this is a happy ecosystem. Sometimes there seems to be a little bit of confrontation between the groups, but mostly they get along well, and everyone is trying to follow the same dream, which is to make Python applicable to as many domains as possible. I personally have high hopes for PyPy and Numba. I'm very supportive of those projects and I do check them out regularly when they bring out new releases.

And that's my slide: I'm starting a company, which is kind of insane. I don't have the funding or the know-how to do that, but I'm trying anyway. And I've got a minute to spare, so we've got more question time.

Thanks Caleb, that's awesome.

I just wanted to give a plug for Python(x,y), which is a Windows bundle which includes Dev-C++, which is MinGW, and is a simple install which gives Windows people Cython ready to go. Anaconda does the same, so they're pretty good as well.

Thank you. And WinPython is the other one, which is similar to Python(x,y), by the same guy, so that's pretty good as well. Thanks.

Your example was of something which is what I'd call data-parallelizable, which is that you can essentially independently
process sub-parts of the data and then do a combine at the end. Do you have any best patterns for doing non-data-parallelizable tasks?

That's a great question, and I'm probably the wrong person to answer that. I don't have a computer science background, so whatever I learn is what I need to accomplish things. So I learned the hard way, which is stumbling blindly through the forest trying to make my way through.

Caleb, great talk, thank you. You wanted us to ask you about simple.html, so consider yourself asked.

Good, okay. So that's that file. The HTML that comes up looks like that. It's pretty clever: it's an HTML file with some coloring and syntax highlighting, and the yellow indicates the proximity to the Python runtime. If a line is white, it means there's no proximity, which means that it gets compiled completely natively and it's independent of Python. If there's some yellow, then there is some proximity to the Python runtime. And you can see this line here at the bottom is vaguely yellow; it's got a lighter shade of yellow. If you click on any of these lines, it'll show you the underlying C code that represents that line. And what you see when you open that line up is that Cython has put a check in to see whether you're referencing something in the array that doesn't exist, so it's an out-of-bounds check that's happening. You can add an additional decorator, above or below the Cython wraparound one, that says boundscheck False, and if you've set that to be false then that pale yellow line goes away and it no longer does the bounds check, and you get another five percent speed. So I think it's safe by default: for any way that your code could fail for the usual suspects, like out-of-bounds accesses and similar things, integers that wrap around, Cython's default is to be safe. So Python exceptions get thrown when those things occur, which is enormously beneficial. You don't get, you know, really bad crashes and access
violations and so on. So that's the view file; that's what you get. You can click on any other lines and you can see what the underlying C code is. It's not really pretty code, and the variables all become really strange variables with lots of underscores around them, so it kind of looks ugly, but if you work with this a lot you'll have to deal with that. What I usually do is just try to figure out if my code is too complex, and figure out what bit I can change to make it a bit simpler, and try to reason about what it's doing without digging too much into the C.

So here we are adding some types to the code. What about the overflows, the type checking? This seems to be a simple example with double. Actually, what if a char star or a const char star is added? Is it not, like, going to be tedious to use Cython? Second thing: instead of using a .pyx, or this way of making it typed Python, why can't we write native code in C and use it as an extension in Python? Is it not faster?

Great question. It is not faster. Not only is it not faster, it's more difficult to write if you do it in C, and unless you're a good C programmer you're probably going to get it wrong. That has been my experience. Making code easy to write is your best protection against bugs. I think a good C programmer probably can do tricks that get faster than what can be represented, but the guys who make Cython are incredibly smart people, way smarter than I am; I'm just here pitching it to you. But they're really good and they know all the tricks, and many of them are Fortran programmers, actually, long-time Fortran programmers doing significant number-crunching stuff. If you really know what you're doing in C, maybe you can, probably you can, but I'm not that guy, and I don't think most people are.

The first part of your question was: what if the types are more complex? As you add Cython stuff to your code, you
move along a continuum from Python to C. So whatever you can do in C you actually can do in Cython. You can do casts; there's a syntax for doing casts in front of variables, so you can do that, and you can handle complex cases, but the code becomes a bit more complex to work with. But you can do it, yes. Oh, and you'll get a Python exception if you overflow, unless you turn that off, and then you won't get an overflow exception; you'll actually get a crash, some kind of access violation or something. So, yeah.

Okay, can we get a... You raised PyPy at the end, which is one project I like. Do you actually have the numbers for how fast PyPy does this? Because this would seem to be the sweet spot for what PyPy can do.

I had... do you mean... oh wait, I went too far. Right, yes, here we go.

Oh, sorry, I missed it. I must have missed this line. I'm sorry, I do apologize.

So I don't want to say that's the best PyPy can do against the performance of the other stuff. It's naive: I Homebrew-installed PyPy and I ran my Python code through it, and that's what I got. So fifty times without doing anything else is actually pretty good.

And the other one: there are probably other projects in the community that this relates to. The type annotations that are coming in the upcoming 3.5: is there any plan for Cython to use those type annotations instead of, or in addition to, its own declarations?

I speak under correction, but I don't think so. And it's kind of disappointing that more of an effort wasn't made to kind of marry the types and syntaxes together. So what we're probably going to have, you know, in Cython with Python 3 stuff, when the type annotations become popular, is we're going to have it twice: you're going to have Cython types and then the other types, which will be disappointing. I think we should not do that.
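As a sketch of what the questioner is describing, here is the talk's progressive tax calculation written in plain Python with 3.5-style type annotations. The bracket thresholds and rates are illustrative placeholders, since the slide's tax table is not in the captions, and the names `calc_tax` and `total_tax` are hypothetical. The annotations are ordinary Python metadata and are not enforced at run time; in the talk, the equivalent type information was given to Cython through its own declarations (`cp` in front of the `def`, with a double return type and a double argument).

```python
from typing import Iterable


def calc_tax(amount: float) -> float:
    """Tax owed on a taxable income, as an if/elif ladder over brackets.

    The figures below are placeholder brackets for a progressive system,
    not the table shown on the speaker's slide.
    """
    if amount <= 18200.0:
        return 0.0
    elif amount <= 37000.0:
        return 0.19 * (amount - 18200.0)
    elif amount <= 80000.0:
        return 3572.0 + 0.325 * (amount - 37000.0)
    elif amount <= 180000.0:
        return 17547.0 + 0.37 * (amount - 80000.0)
    else:
        return 54547.0 + 0.45 * (amount - 180000.0)


def total_tax(incomes: Iterable[float]) -> float:
    """Sum the tax over all incomes; the loop runs in the interpreter."""
    return sum(calc_tax(income) for income in incomes)


# The annotations are stored on the function object but not checked.
tax_annotations = calc_tax.__annotations__
```

A type checker can read these annotations, but CPython itself still dispatches dynamically on every call, which is why the speaker expects the two kinds of type declaration to sit side by side.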
Info
Channel: PyCon AU
Views: 24,778
Rating: 4.958261 out of 5
Keywords: pyconau, pycon-au 2015, Python, PyCon
Id: NfnMJMkhDoQ
Length: 33min 4sec (1984 seconds)
Published: Mon Aug 03 2015