Miguel Grinberg: Asynchronous Python for the Complete Beginner (PyCon 2017)

Captions
Good afternoon everybody, and welcome to our next presentation at PyCon 2017. Before we get started, I would very much like you to encourage your electronic devices not to make any sudden noises, because when they do, everyone notices. With that said, I would like to introduce our next speaker. His name is Miguel Grinberg, and he will be talking about asynchronous Python for the complete beginner. Please make him feel welcome.

Thank you, thank you very much. Full house, no pressure, right? Why do I do this, anyway? So I'm going to get something out of the way: this is not a talk about Flask. I'm not going to talk about Flask; I'm going to mention Flask less than ten times. I'm going to try, at least. Okay, great. So, something that maybe you don't know: in addition to all the work I do with, you know, that framework that I cannot say, I also have an open source project that is an asynchronous server for the Socket.IO protocol. It started as an extension to, you know, that framework, but then it grew a life of its own, and now you can use it standalone or with other frameworks like Django, etc. And I see a lot of people having problems working with asynchronous code. So I'm going to start with a question: how many of you have heard people say that async makes your code go fast, or really very fast? Okay. So, for those of you that heard this but wonder why, or don't understand how that's possible, this is the talk for you. I'm going to try to explain it in very simple terms; at least I'm going to try.

So I'm going to start with a super simple definition; we're going to build on this later. Async, and I mean it as a generic term (I'm not specifically talking about asyncio; that's not the only way to do async), is one way to do concurrent programming, which means doing many things at once. So let's go through the few ways that we have in Python to do multiple things at once. The most obvious way is to use multiple processes.
From the terminal, you can start your script two, three, four, ten times, and then all those scripts are going to run independently, at the same time, and the operating system underneath will take care of sharing your CPU resources among all those instances. When you're using CPython, the most popular Python, that's actually the only way you can get to use more than one CPU at the same time. This is the only way.

The next way to run multiple things at once is to use threads. A thread is a line of execution, pretty much like a process, but you can have multiple threads in the context of one process, so they all share access to common resources, which is a headache. That's why threads have such a bad fame: it's difficult to write threaded code. The operating system, again, is doing all the heavy lifting of sharing the CPU; you don't have to worry about it when you write your Python code. And of course, you know about the global interpreter lock; this is special to Python. When you have multiple threads running code, the global interpreter lock allows only one to run Python code at a given time, so basically you're running on a single core even though you may have two or four or more.

The third one is the topic of this talk: asynchronous programming. To make the mystery even bigger, I'm going to tell you that the OS does not participate here. As far as the OS is concerned, you are going to have one process, and there's going to be a single thread within that process. But yes, we can get multiple things done at once. So what's the trick?

To try to explain this, I'm going to go thinking completely out of the box and pull a real-world scenario from the world of chess. This is a very old photo. The lady in this photo is Judit Polgár, one of the best chess players in the world, and what she's doing here is called a chess exhibition. I'm not sure this is still being done these days; it was pretty popular before computers killed the fun out of chess by being so good at it. But when I was a kid these were pretty exciting events, if you were into chess. Basically, she shows up at the event and she plays a game of chess against lots of people, normal people like you and I, and she usually wins all of them, but the whole idea is to play with a chess champion.

So imagine you need to run this event. I'm going to do some back-of-the-envelope math here and just pull some numbers out of nowhere. Let's say there are 24 people showing up for the event, so there are 24 games. And Judit Polgár is pretty good, so she's going to come up with a move in five seconds on average, and the opponents are going to take 55 seconds, so we get a round minute for a pair of moves. And let's say the average game has 30 moves, which is a short game, but she's going to cream everyone, so it's going to be a short game anyway for most of them.

Now imagine you're going to do this the synchronous way. Each game is going to last 30 minutes, half an hour, and she needs to play 24 of these, so she's going to be there playing for 12 hours, which is pretty bad, for her especially. In reality these events don't run like that; they do something else. What they do is use an asynchronous mode, and it works more or less like this. She walks to the first table and makes her move, so five seconds tick, and then she leaves the opponent at that table thinking. She doesn't wait; she's not waiting there for the opponent to make a move. She immediately moves to the second table and makes a move there, and she leaves that opponent also thinking, and moves to the third, and the fourth, and so on. So she can go around the room and make a move on all 24 games in two minutes, and by that time she's back at the first game, and the opponent at the first game has had more than enough time to make a move, so she can make her next move on that game without waiting. If you do the math, she can play all 24 games and win them all in one hour, versus 12 in the synchronous case.

So when people talk about async being really fast, it's this kind of fast. We're not putting an implant in Judit Polgár to play chess faster; we're just optimizing her time so that she doesn't waste time waiting. Makes sense? That's the secret, by the way; that's the complete secret. That's how it works. In this analogy, Judit Polgár, the chess champion, is our CPU, and the idea is that we want to make sure that the CPU doesn't wait, or waits the least amount of time possible, and always finds something to do. So now I can tell you a more complete definition. This is still mine; I'm inventing these definitions, I didn't take them from anywhere. Asynchronous programming is a mode in which the tasks that are running release the CPU when they enter a waiting period, and that allows other tasks that need the CPU to run while the first task waits. That's basically the secret. But you probably want to know a little bit more: how can you do that?
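The back-of-the-envelope numbers from the analogy can be checked with a few lines of Python. This is just the arithmetic from the talk written out, with the same figures quoted above:

```python
# Back-of-the-envelope math for the chess exhibition analogy.
games = 24            # opponents showing up to the exhibition
champion_move = 5     # seconds per move for the champion
opponent_move = 55    # seconds per move for an opponent
moves_per_game = 30   # move pairs in an average (short) game

# Synchronous: she finishes one whole game before starting the next,
# so she waits through every opponent's 55 seconds.
sync_total = games * moves_per_game * (champion_move + opponent_move)
print(sync_total / 3600)   # 12.0 hours

# Asynchronous: she only spends her own 5 seconds per move;
# the opponents think while she is busy at the other tables.
async_total = games * moves_per_game * champion_move
print(async_total / 3600)  # 1.0 hours
```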
With one process and one thread, you need two things, basically. The first thing you need, which sounds awfully difficult, is to have functions that can suspend and resume. We want the functions in our asynchronous program to suspend when they enter a wait, and then, when the condition that generated the wait ends, we want to resume those functions from the point where they were suspended. That sounds difficult, but actually I came up with four ways in which you can do this in Python without operating system help. One of those ways is callback functions, which are gross; I'm not even going to show an example of those, because they're pretty gross. The three ways that are a little bit more decent are: generator functions, which are a Python feature that's been there for a long time; in more recent Python, 3.5 and up, the async and await keywords that you can use for this; and finally a third-party package called greenlet, which actually implements this as a C extension to Python. You can install it with pip, and it gives you another way to suspend and resume functions.

So that's the first part: now we can suspend and resume. The next thing we need is a piece of code that can decide how the CPU is shared, which function gets the CPU next. We need a scheduler of sorts, and in asynchronous programming this is called an event loop. We're going to have an event loop; it will know all the tasks that are running, or that want to run, and it will select one and give control to it. That task is going to suspend when it needs to wait for something, control will go back to the loop, the loop will find another task, and it will keep going that way until the script ends. This is called cooperative multitasking, and it's an idea from many years ago; the very old versions of Microsoft Windows, for example, or Mac OS, did this. So it's an old idea.
So anyway, that's how it works. I created a bunch of examples to show you how this looks in practice. I'm not going to have time to show all of them, but if you go to that link you can see more examples that I created that I won't have time to show here; I'll show a few here. This is a super simple test that I created. Let's say we want to write a little script that prints "Hello", waits three seconds, and then prints "World". This is how we would do it in normal Python: you print the first text, then sleep for three seconds, and then print the second one. If I were to put a for loop at the bottom to call that hello function ten times, for example, this is going to run not for three seconds but for thirty, because each function invocation runs back-to-back.

Here are two examples that use asyncio. You can see there's a little bit of boilerplate at the top to create one of these event loops, and another couple of lines of boilerplate at the bottom to run the asynchronous function. But ignoring that, you can see that in the function we have two ways to do this suspending and resuming. On the left I'm using a generator function. Generators are these special functions that you typically use in Python to generate sequences of items, and the nice thing about them is that you don't have to pre-generate the entire sequence; you generate elements of the sequence as the caller of the generator asks for them. You can repurpose that, using the yield or yield from keywords, for any async function. Basically, what we're saying in the example on the left, when we reach the yield from, is: okay, loop, I'm done for now, so I give you back control; please run for me the function that follows the yield from, the asyncio sleep for three seconds, and when that's done I'm ready to continue. And the loop will take note of that and manage everything, because it's a scheduler; that's what it does. So if I were to call this hello function ten times, instead of running for 30 seconds you're going to see ten hellos, then a pause of about three seconds, and then ten worlds, because during that three-second wait the loop will find all the other nine: it runs one first, and then the other nine eventually get to run.

Okay, so in more recent Pythons there's an improvement: you get a much nicer syntax, which you can see on the right, but functionally these two are equivalent. You have the async def declaration, which is what you use to define an asynchronous function, and if you use that syntax to declare the function, then you get to use await for the suspension and resuming; that's the point where things are suspended. One of the things that I think asyncio is great for is that it makes very explicit the points where the code suspends and resumes, all those points where this magic of multitasking can happen.

But asyncio is not the only one; there are a bunch of others. I don't have time to tell you about all of them, but I wanted to mention the ones that are based on the greenlet package I mentioned earlier. These two, gevent on the left and eventlet on the right: you're probably going to have trouble finding what the loop is in this code, and actually you're probably going to have trouble finding the difference between this and the synchronous Python example. They look kind of the same. I don't know if you noticed, but the only difference is that the sleep function I'm using in these two is not the sleep function from the Python standard library; it's a different one that each framework provides. But that's the only difference.
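The hello/world example described here, reconstructed from memory rather than taken from the slides, might look like this with present-day asyncio. Two caveats: the talk predates `asyncio.run()` (Python 3.7), so the slides used the `loop = asyncio.get_event_loop()` / `loop.run_until_complete(...)` boilerplate instead; and the generator-based style shown on the left of the slide used the `@asyncio.coroutine` decorator with `yield from`, which was removed in Python 3.11, so only the async/await form is shown:

```python
import asyncio

async def hello():
    print("Hello")
    await asyncio.sleep(3)  # suspend here; the loop runs other tasks
    print("World")

async def main():
    # Run ten copies concurrently: ten "Hello"s, a pause of about
    # three seconds, then ten "World"s -- not thirty seconds of
    # back-to-back sleeps.
    await asyncio.gather(*(hello() for _ in range(10)))

asyncio.run(main())
```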
The goal of these two frameworks, which are very much alike, is to make asynchronous programming transparent, and that could be a blessing and it could also be a curse, depending on how you look at it. Me, as an open source developer, I find that a lot of people get into this thinking it's all the same: "I'm going to keep doing whatever I always do." And then weird things happen, because they're not considering that underneath this there is a loop running, and you need to make sure you never block, because if you block, then you're blocking the whole thing. Which leads into the pitfalls. What I'm listing here are actually the things that I'm constantly answering in the issues on my GitHub project, because people always trip on these things.

Pitfall number one: what happens if, in your asynchronous program, one task or a few of the tasks need to do some heavy CPU calculation? The problem is, if you use the CPU in your function for, say, one minute, then during that minute nothing else will happen, because this is a single thread. So all tasks need to be nice to the remaining tasks and release the CPU often. But if you have no opportunity to wait, nothing to wait for, what do you do? What you do is sleep. Basically, you have to be nice and call sleep every once in a while in your function, as often as you can. And if you're really greedy and don't want to give up the time that you've got, the best you can do is sleep for zero seconds, which is basically telling the loop: I'm going to sleep because I have to, I don't want to but I have to, but please give me control back as soon as possible, because I want to keep using this CPU. So basically you sleep zero, and if your calculation has a loop, which is pretty common, then you stick a sleep zero inside that loop, so once per iteration you allow other tasks to continue running. So that's pitfall number one.
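The "be nice and sleep zero" advice looks like this in asyncio terms. A sketch of mine, not code from the talk; `await asyncio.sleep(0)` is the standard way to yield control back to the loop without actually pausing:

```python
import asyncio

async def crunch_numbers(data):
    """CPU-bound work that still cooperates with the event loop."""
    total = 0
    for i, value in enumerate(data):
        total += value * value  # stand-in for real CPU-heavy work
        if i % 1000 == 0:
            # "I'm going to sleep because I have to, but give me control
            # back as soon as possible": yield to the loop once per chunk
            # so the other tasks are not starved for a whole minute.
            await asyncio.sleep(0)
    return total

async def main():
    print(await crunch_numbers(range(10_000)))

asyncio.run(main())
```

Yielding once per chunk rather than on every single iteration keeps the overhead of the loop round-trip small while still capping how long any one task can hog the CPU.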
Now the big one, and this is probably going to be a big surprise: there's a bunch of things in the Python standard library that are designed as blocking functions. Everything that has to do with networking, so reading and writing from network sockets, waiting on input or output from sockets, anything to do with processes, with threads, the sleep function that we've seen before: you cannot use them. This is true for every async framework. You cannot use these functions; if you use these functions, the thing is going to hang. So don't use them, okay? It's very unfortunate. When people ask about this, it's like they want me to tell them they can use them. Well, you can't, so you can't. All async frameworks provide replacements for these functions, and sometimes that kind of sucks, because you have to learn a different way to do the things you already know how to do, all these very common things you do with processes, threads and networking. Unfortunate, but it's true for asyncio, true for eventlet, gevent, Twisted, Curio, all of them. They all provide alternative ways to do these blocking things.

Now, remember the sleep function in eventlet and gevent that was coded almost, or actually, identically to the Python one? The folks who developed eventlet and gevent went out of their way to create all these alternative versions in a way that's very compatible with the ones in the Python library, and they both have an option to monkey patch the standard library. Basically, they swap out the blocking functions from the Python library and put their own in their place. Then you can take a piece of code that was designed to run synchronously, and somehow it inherits this asynchronous behavior, and for many applications that's enough to get code that was designed synchronously to work. But you have to use eventlet or gevent.
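The monkey patching described here can be sketched with gevent (a third-party package, `pip install gevent`; eventlet offers the equivalent `eventlet.monkey_patch()`). After `patch_all()`, the plain `time.sleep` from the standard library has been swapped for gevent's cooperative sleep, so unmodified synchronous-looking code runs concurrently:

```python
# Must run before other imports so the stdlib gets patched everywhere.
from gevent import monkey
monkey.patch_all()  # swap blocking stdlib functions for cooperative ones

import time
import gevent

def task(name):
    print(name, "start")
    time.sleep(0.5)  # after patch_all, this suspends only this greenlet
    print(name, "done")

# Both tasks sleep "at the same time": total wall time is about half a
# second, not a full second, even though the code looks synchronous.
gevent.joinall([gevent.spawn(task, "a"), gevent.spawn(task, "b")])
```

Note there is no visible loop and no await anywhere; that transparency is exactly the blessing-and-curse trade-off discussed above.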
Now I'm going to save us some time: somebody is going to come later, in the question period, and ask, how do I do this for asyncio? And I'm going to say no, you can't. You're not going to do this with asyncio. Asyncio was designed as an async framework that doesn't try to hide the asynchrony under the rug. It wants you to design and write your code thinking asynchronously, which is a different goal than the one eventlet and gevent have.

So I'm going to summarize this with a little table, and it's probably going to be surprising to some of you. This compares processes, threads and async on a number of categories. Maybe, based on what I've said so far, you think that this super cool non-blocking thing, doing something while a task waits, is exclusive to async. That is not true: processes and threads can do it pretty well too, and it's actually not Python doing it in that case, it's the operating system. So there's no winner here. There's a slight difference: in the processes and threads case it's the operating system doing it, while in the async case it's your async framework, asyncio or gevent and so on. So it's cooperative for async, and for the others it's pre-emptive, which is what it's called when the operating system yanks the CPU away from you without you knowing it. But all of them can do it, so there's no winner in this category.

Now, I already told you that if you want to use the multiple cores in your computer, then the only way is processes, so there's a clear winner: processes are the only option that can do that. And many times people combine processes with one of the other two: you write a multi-threaded or an async program, and then you run it as many times as you have cores, which is actually a pretty good idea. But processes are the winner there.

Then we come to scalability, and this is an interesting one, because if you're running multiple processes, each process has a copy of the Python interpreter and all the resources it uses, plus a copy of your code, your application, plus all the resources your application uses. All of that is duplicated, so if you start going crazy and starting new instances, you're going to find that pretty soon you're probably out of memory. You cannot run a lot of Python processes on a normal computer, so for scalability with processes, I would say you're in the ones or the tens, but not more than that. Threads are a little more lightweight than processes, so you can have many more threads than processes; you can scale a little better, I would say into the hundreds. If you go with async, it's all done in Python space; no resources at the operating system level are used, so these tasks are extremely lightweight. This is the clear winner: async can go into the thousands, or even tens of thousands. This would be a good reason to go async.

Then we have the bad news about the blocking functions in the Python standard library, which processes and threads can use no problem, because the operating system knows how to deal with them; but when we lose the support of the operating system, in async, we cannot use those functions and we need replacements. And last, the global interpreter lock. We know it causes some trouble with threads. In my experience, though, it's not that bad for the types of applications that are also good for async, which are heavily I/O, because, unlike what some people think, threads that are blocked on I/O don't hold the global interpreter lock. If a thread goes to wait, the operating system can give access to another thread without any problems. So really, it's not that bad; I mean, there aren't that many things that are better for async. That's it. So basically, the closing statement that I would like to make is this.
The best argument to go async is when you really need massive scaling. This would be servers that are going to be very busy, that want to handle lots of clients without going bankrupt buying hosting. Async can do that really well: you can go into the thousands or tens of thousands of connections and it's like nothing, it's not a problem, while threads cannot get there, and processes even less so. In any other category it's not really clear that you should go async, unless you like it, and then it's a totally valid way to develop your applications. If you like it, you like it; there's nothing to say against that. So this is pretty much all I have. It looks like I did good on time, so there's going to be time for questions.

[Music]

Host: Thank you, Miguel. Are there any questions?

Q: Is it possible to schedule asynchronous operations from multiple threads?

A: You can run multiple loops in different threads, but they don't go together in the same bucket, and then you have to use normal threading synchronization mechanisms if you need the tasks running under one loop to somehow coordinate with the other. It gets pretty nasty, to be honest. So pretty much each loop stays affiliated to its thread. Most of the time you have only one thread, so yes.

Q: Hi Miguel, thank you for that talk. I was hoping you could help me conceptualize this, like a stack frame: all this magic that happens, not necessarily just with async, but generators and everything in general.

A: Well, generators are a Python feature. When the generator function calls the yield or yield from keyword, it does sort of a partial return: it returns a value, and control goes back to the loop, and then the loop can call that function again to make it do a little bit more work. Which is exactly what happens when you write a generator function: you have a function that returns partial values; every time you call it, it does a little bit more work and returns another value as a result.

Q: Okay, that begs an additional question. In an architecture where you spawn off multiple processes of async workers, say for a server, how would the socket binding work across the different processes? If I have the same port that has to go through multiple processes, each of them with a single thread of async?

A: There are a couple of ways to do it. For example, you can have something like nginx, a reverse proxy, in front. Say you have four processes; they could be listening on different ports, four ports say, and then nginx consolidates that and reverse proxies into all these async processes. That would be one way.

Q: How much more imprecise is an async sleep command than time.sleep? Could it be seconds more, or a couple of milliseconds?

A: [Music] This is cooperative, so your task really depends on how the other tasks running at the same time behave. If you have a rogue task that does a lot of computation and doesn't return to the loop as often as it should, that's going to affect your timing. And that's actually the problem I see most often: people forget they're doing async, and there's some task that blocks and stops the whole thing for everybody. So yes, imprecise, sure; you need to make sure that all the tasks are well designed for async.

Host: I think we have time for one more short question.

Q: I guess my question is related to the one the gentleman asked previously. In JavaScript, one of the issues is that when you sleep for, say, ten seconds, it's actually at least ten seconds. So I guess in Python there's the same issue: when I want to sleep a certain callback for ten seconds, is it still at least ten seconds?

A: Right, it depends. As I said before, the sleep function is going to be implemented by the async framework that you use: asyncio implements sleep, and gevent implements sleep in a different way, and every framework does it in its own way. So you're going to have to find the best async framework if you are concerned about that; you need to find the one that's more accurate. But in the end it's cooperative, so it depends on all the tasks being nice to each other; if you don't have that, then this doesn't work well.

Q: I guess in JavaScript at least the guarantee is it will be at least ten seconds.

A: Yeah, you do get a guarantee of that sort, but exact times are heavily dependent on how the tasks return to the loop. I believe asyncio does that as well.

Host: Thank you; it can be continued out in the hallway, please. Many thanks to Miguel.

[Applause]
Info
Channel: PyCon 2017
Views: 98,018
Rating: 4.9354839 out of 5
Id: iG6fr81xHKA
Length: 30min 58sec (1858 seconds)
Published: Sun May 21 2017