import asyncio: Learn Python's AsyncIO #2 - The Event Loop

Captions
[Music] Hi, this is Łukasz from EdgeDB, and you're watching the second episode of "import asyncio", an introduction to the Python framework enabling asynchronous, single-threaded concurrent code using coroutines.

Before we begin, let me thank you for the very warm welcome the series already received after just one episode. I'm frankly a little terrified whether I'll be able to deliver on the very high expectations you seem to be having for this particular series. In this series we're trying to show you asyncio as an incrementally teachable, incrementally understandable concept, something that you can use to write applications that will support tens of thousands of concurrent customers with no problem. In fact, this is done very often with this particular framework.

So this series is split into eight parts. I always felt like a single conference talk was not enough to cover asyncio in good detail, so we're gonna make eight of those videos. We are at the second one now, but there are more to come, so that you can get incremental experience with asyncio, each time with a particular important piece of it.

This time around we'll focus on the fundamental construct of asyncio: the event loop. We'll learn how to use it, without coroutines this time, how to configure it, and how the various implementations of the event loop differ from each other. Yes, there's more than one. We'll even look under the hood at the implementation to make sure we find the actual loop within and understand how it works. Casual users of asyncio rarely go into such detail. We're gonna go pretty deep today, but we're careful learners, we are not in a hurry, and we know that this knowledge will save us from some heartbreaking debugging sessions in the future. So let's dig in.

Again, the plan for this particular episode is to recap what an event loop is from the last video, and to see how the event loop works from the perspective of an end user. So if you just want to code, if
you're just interested in what the API looks like, you're in the right place. We're also gonna talk about the differences between reactors and proactors, the two fundamental designs for event loops and I/O multiplexing. In fact, asyncio supports multiple implementations of event loops, and we're gonna see how they are different. Finally, we're gonna see how to configure event loops in asyncio, there's quite a few toggles that we can pull and push, and we're gonna look under the hood so that you are able to look at the very code asyncio consists of and see that there is no magic, there are no secret handshakes. You can really understand how all of those things connect together.

The last point I wanted to make, so let's just do it right now, is that if you are running asyncio in production, there is one particular implementation of the event loop that you really should be using, and that's uvloop. So we're gonna talk for a while about why.

If there is a single most important picture in this video, that's this picture: a very high-level diagram of what an event loop is doing. It's calling callbacks in some order, one by one.

So how does this look in asyncio? To get an event loop, let's start bpython, which is very nice, it's gonna help me make a nice, attractive video for you. You just get the event loop: there is a function on asyncio that lets us do this. And then you can run it forever. You have to believe me that this is what is happening already: it is running forever. It did not return, so bpython was unable to show us the prompt again. It is in fact now continuously looping through the event loop. This is kind of boring because there's nothing registered on it, so let's stop this and see how else we can run the event loop. We can, for example, run until complete. For example, let's just sleep for 5 seconds, so that it runs for a while and then releases control back to the REPL, to bpython. So you can run the event loop by
just saying run_forever(), or you can give it something to run and say: run until this particular thing is complete.

That's still not very interesting, because nothing is happening on the loop. So let's create a function that will just print the current date and time for us, literally print datetime.now(), and schedule it. Let's schedule it twice, so that you can see that there are many things we can schedule at the same time. When we finally run the event loop that way, you're gonna see that even though we're waiting for 5 seconds, "soon" is pretty soon: right after starting the loop, the two callbacks that we registered were in fact called. So that's how you schedule things to be called.

Now I would like to introduce to you trampolines, which are a very special construct, but they're actually pretty simple. They are callbacks that do something and then register themselves back on the loop. That's the entire trick: they do something, and then they call themselves again later. So if we run a trampoline just once, and call loop.stop() later, we can now run forever and see there are incrementally many calls to our one trampoline. Why? Because the trampoline registered itself again on the event loop, so the event loop has something to run pretty often. Instead of saying call_later(0.5, trampoline, name) we could say call_soon, but I didn't want to flood bpython with just the dates here. Trampolines are very useful, so remember this trick. We will need it in the future.

The awesome thing about trampolines is that we can run more than one trampoline in the same event loop. So let's create a trampoline with the name "first", a trampoline with the name "second", and for good measure a trampoline with the name "third". You can see there are three now. And if we call_later loop.stop(), just so that we are not in fact running forever,
after calling run_forever() we're gonna see all those three trampolines nicely interleave. They're scheduling the next step one after another, so they still maintain ordering, which is pretty cool in fact. So trampolines are pretty awesome: they can do something like cooperative multitasking that way. This is kind of a spoiler for the next episode, but you can see where this is going.

However, as we said before, the event loop can only ever run one thing at a time, just one callback at a time. So suppose we create a hog function that does a lot of Python-level computation which takes a while to complete. On my computer this is going to take at least a couple of seconds. Now we schedule the function, let's say right now, or maybe let's schedule it 15 seconds from now, so at first something else is going to happen. We also want to schedule loop.stop() so that we don't run forever. When we run forever, you're gonna see that there are nice coroutines, well, trampolines, but finally, after second 11, nothing is happening for quite a while, and after second 17 it resumed, and now the loop is stopped. So what happened? The hog function kicked in and clogged the event loop until it was done.

You might have noticed that this time we didn't have to schedule the trampolines again when we ran them. Since they schedule themselves on the loop, they were already there, and we are using that same loop object. That teaches us an important thing: an event loop can be started and stopped many times over. Just remember that if you stop it for too long with open network connections, those can time out.

So that's the event loop in a nutshell. It never does more than one thing at a time, and if that one thing is slow, it will slow everything down. So we avoid doing very long operations. At any given moment we are doing something as fast as possible and then yielding our execution to something else. We are never waiting actively; we're always trying to make sure that we're just
doing our work in the smallest chunks possible. So the callbacks that we are scheduling should also be rather short.

So if the event loop never does more than one thing at a time, but it can deal with many things at the same time, how does it do it? Well, by using a selector syscall. There are many kinds of selectors these days, but the first one that established this pattern was called select. select allows us to provide a list of file descriptors that we want to read from or write to. Naturally, those file descriptors can be regular files, but also network sockets, UNIX sockets, as well as pipes. When we call select with such a list, as soon as any of the file descriptors are ready to be read from or written to, the call modifies the list of descriptors to leave only those which are ready. So you're getting a list of things where something interesting is happening. There's also a timeout, so even if nothing is ready we can still unblock after a while to do something else, for example to react to user interface events.

This kind of multiplexing pattern is called a reactor. It is "reacting" because the user code reacts to notifications about file descriptors ready for reading or writing, and then the user code actually performs the reads and writes. So that's on the part of the user code, in the user application. Fun fact: the Twisted framework calls its event loop "the reactor", so that's why.

There's another approach to the same problem called I/O completion ports, IOCP for short, spearheaded by Microsoft and used in Windows. A single I/O completion port allows a pool of threads to block on it, waiting for new events. When an event arrives on that port, it wakes up just one thread to handle it. If there are many events, each thread gets a different one, so that allows us to use multiple CPU cores automatically. If there are more events than available threads, some events will wait until one of the threads in the pool frees up. So in other words, IOCP is like a
multi-threaded variant of select with built-in orchestration. This kind of multiplexing pattern is called a proactor, because IOCP internally initiates asynchronous reads and writes, so the operating system does them for you, and the user code is notified about I/O operations when they complete. Hence "I/O completion ports". This approach is extremely performant.

So which ones can we use in asyncio? Well, on Windows we can use both the proactor and the selector. On Unixes we can use the selector one, which does not mean you're going to use literally the select syscall; we can use multiple implementations of those.

All of this is organized in quite a few files. So if you're interested in learning how asyncio is built, you may be a little surprised that there's quite a few files you need to look in. Going from top to bottom here: first we have the abstract event loop, which describes the interface of what an event loop should be, with no implementation yet. Then you get the base event loop, with a base implementation of the event loop mechanics. So you have the actual while loop, it's there, but there is no multiplexer yet, no selector yet. Then you have the base selector event loop, and in the base selector you already have socket support. Given that there is some selector, we can already implement TCP, UDP, TLS, so those implementations are already there. Raw file descriptor support is also already implemented in the base selector event loop.

Moving on. On Windows you have the selector event loop, which in fact does not scale very well: it supports up to 512 sockets, and sockets are the only file descriptors that you can actually use, so there are no pipes, there are no subprocesses there. The proactor event loop scales much better. As I said before, the multiplexing uses I/O that happens on the kernel level, on the
operating system level, so that lets it be really fast, and there is no limitation of 512 sockets at the same time. The proactor event loop supports sockets and subprocesses, but it does not support arbitrary file descriptors. And neither of those supports UNIX sockets or UNIX signals. So that's Windows.

On UNIX you only have a selector event loop. That's boring, but that event loop already does everything that we need: it supports sockets, it supports file descriptors, it supports subprocesses. To support subprocesses it has a concept of child watchers, and there's a bunch of them. The default one starts a thread per subprocess, which is the most robust. You can also use FastChildWatcher, but this one can only be used if you're sure not to use the blocking subprocess module in your program, and that also extends to the dependencies you might have. That's why the default is the robust ThreadedChildWatcher.

But as I said, we don't necessarily get the select syscall as our selector. So what do we get? Well, Python will already select the most performant selector for you, depending on your operating system. On BSD and macOS you're likely gonna get kqueue. That's a single call that can both receive pending events and modify event filters, so with just this one call it's more efficient than the traditional select. And handles can be more than just file descriptors: they can also be child process state changes, very fine-grained timers, signals, and more. So kqueue is a pretty good choice for BSD and macOS.

On Linux you're gonna get epoll, which is edge-triggered or level-triggered event distribution, also very performant, available since Linux 2.5.44. So for a rather long time; even the old Linuxes that you're gonna be running on should have this.

On Solaris, though, you might only have /dev/poll, which is still a pretty good evolution of poll and select, because it is faster in terms of big O: it is O(active file descriptors), which is faster
than select, which is O(highest file descriptor), and faster than poll (poll being the AT&T System V equivalent of select), which goes with O(number of file descriptors).

The reason why we're going into so much detail about which one is faster is that this is literally the tightest loop in your program. Sometimes people say "oh, premature optimization, don't optimize unless it's a very tight loop". Well, this loop, if you are using asyncio, is going to be the tightest loop of your entire program. So it really makes sense to ensure you get the performance gains available on your particular platform. You wanna use the most advanced event loop available to you at any given point.

So those are the different selector event loops. Sometimes you might get one but actually want another for some particular purpose. Maybe you want to be able to audit the events that happen, maybe some particular behavior of the other syscall interests you. So can you change the implementation and set a different selector than the one selected by default for you? Yes, you can.

This is the asyncio documentation, and you see that even though Python will choose the most performant selector available on your operating system, if for some reason you have a strong preference, you can change it. In this example from the documentation, see the yellow line with set_event_loop(loop), you create a selector event loop with a specified selector, in this case we really want the OG select syscall, and then set the event loop manually.

You don't have to set an event loop manually in most cases, except for one important special case. Python creates a default event loop only in the main thread. So if you're starting a Python program and you say get_event_loop(), you're gonna get one. But if you're starting new threads, secondary threads or so-called worker threads, for those, if you'd like to use a separate thread
specific event loop, you will have to set it manually. Why is that? This is not done automatically, to save you from a very common gotcha. If code written to work in the regular, main, single-threaded Python process suddenly started running in a different thread, and that created its own event loop automatically, kind of behind your back, you would end up with two event loops, but one isn't even running. Remember, you have to call run_forever() or run_until_complete(), and you wouldn't do that for the secondary thread's loop if you never knew about it. Even worse, even if you did, after some clumsy debugging, that one would probably be misconfigured: it would have a different configuration from the main one, and it would not see events that happen on the other one. So there would be many problems with that approach. That bug is kind of hard to find, and attempts to fix it, you know, just running that event loop at the start of the thread or whatever else like that, never end well. So Python defaults to the safe thing, which is not to create an event loop for you on worker or secondary threads. If you really want this, which is fair enough, you should create that yourself.

Coming back to our diagram, this is the class hierarchy of event loops in Python. Again, the abstract event loop describes the interface, no implementation, but the base event loop includes the actual while loop. So are you ready to see it? I guess so, because I am. I always like to show this to people because that already makes asyncio more familiar. Look at it: there's literally a while True loop in run_forever().

If you're trying it out on your own now, you'll probably notice that your run_forever() is a bit more complex than this one. I cheated a bit: what you're looking at in this video is how the function looked in Python 3.5. That was before asynchronous generators, before automatic handling of the currently running event loop. So if you look in
Python 3.8, there's gonna be a bit more code, but those elements would make it harder to see what is happening here. So instead I chose to show you the 3.5 one, because I find it beautiful. It's powerful, but it's crazy simple: there is literally a while True loop that just calls _run_once(), unless it is stopping, then it breaks out. That's essentially it.

Of course, the heart of the matter is hidden in the _run_once() method. You want to see that? Let's see what it does. That's _run_once() from what will become Python 3.8.3, so I'm done with cheating, this is the latest and greatest version. It doesn't fit on the slide, but don't worry about understanding each line here. Let's just read the docstring first. This is a single iteration of the event loop. It calls all currently ready callbacks, polls for I/O using the currently chosen selector or proactor, schedules the resulting callbacks for the next iteration of the event loop, and finally, fourth, it looks at whether some of the call_later callbacks are ready. So that's essentially it. Let's try again, those are the four main things that happen: calling the currently ready callbacks, polling for I/O with the current selector or proactor, scheduling the resulting callbacks for the next iteration of the loop, and checking whether any of the call_later callbacks that we registered are ready to be called.

I need some advice from you right now, by the way. Would you rather see me use the light color scheme next time, or do you prefer the dark one, with the actual black background? I used the white one in the console because for conference talks I usually prefer a white background: it creates more contrast on a real screen, with a real projector. But for videos it might actually be better to use a dark background, I don't know. So let me know in the comments section if you'd rather see me use this color scheme or this one. This one is
actually the one I'm using when I'm coding day to day, so I feel very in it. So you decide.

Moving on. Again, we are in the _run_once() function that does four things. The most interesting part to me is where we use the selector. So let's see: scrolling a few lines down in that same function, you'll see that on line 1854 we run the select method on the current selector. That selector comes from the selectors module, which is a uniform API over the many possible implementations of selectors that you might be dealing with. So that's the heart of hearts of asyncio. Let's take a short moment to appreciate this wonder.

Now, some details are again hidden from us, in the aptly named _process_events() method. What does that look like? Can we also click through and see how that's implemented? Of course, let's do it. Fortunately this is also short. We go through the event list, and in doing so add callbacks for the new file reads and writes, and remove callbacks for cancelled reads and writes. That's it. Makes sense.

But all of this so far was just networking. The selector selects ready file descriptors; the _process_events() method just adds and removes callbacks. But who actually calls the callbacks? Where does that happen? If we scroll just a few lines down again, you will see that in the same _run_once() function we finally get to go through the list of ready callbacks and run them. And if our event loop is in debug mode, you see this "if self._debug", asyncio even times the execution of the callbacks for us, to warn us about slow functions hogging the event loop. Remember our example? So it can do this? That's crazy.

OK, can we actually configure our event loop to fix, or at least make it easier for us to see, the hog example doing bad stuff to us? It wasn't printing anything, so we never saw the hog function in the output, remember? Well, let's try that. The debug thing is something that we
can toggle. So if we now go ahead and set debug to true, as easy as that, and you can just flip this on an event loop that was already created, no problem, and again call_later the hog 15 seconds from now, and call_later loop.stop() another 20 seconds later, we can run forever and see what's gonna happen. Again: first, second, third, stuff is happening, everything is crazy, but suddenly it grinds to a halt. But as soon as it resumed, did you see that there was some warning that we received from asyncio? That was actually from the logging module, so if you were logging to a file, or syslog, or any other means that you configured through logging, it would be visible there; you would not lose it. We lost it now because the prints were too many, so let's look at this message again.

Looking at this message, you will see it says: executing TimerHandle hog, at some line and file, created at some other line and file, and it took six and a half seconds. Look how detailed this log message is. It tells you where your function is defined, it tells you where it was scheduled on the event loop, and how long it took to execute. That is very detailed for debugging. That is awesome.

I know I should be hiding this information from you until the last episode, which talks about debugging, but set_debug() is seriously cool, so you should be using it all the time when developing. Now, in Python 3.7, if you say -X dev while starting your Python interpreter, you will get debug set on your event loops by default. That finds many silly, and sometimes not so silly, problems in your code. You should be using that all the time, except when you're running in production. Well, if it's a staging platform, we might have this discussion later, offline. But if you're running production production, you should be running uvloop.

What is uvloop? Well, it is based on libuv, and libuv is a library written in C that powers Node.js. It is extremely performant, and also does this magic
of selecting the fastest method for the particular operating system that you're on. It is also able to do some magic with proactors that even goes beyond what asyncio is able to do directly with the Python-level loops. So libuv: pretty cool. Now, Yury Selivanov actually used Cython to wrap this library and provide it with an asyncio-compatible API. Remember that abstract event loop? It tells us exactly what things asyncio expects from an event loop implementation. That was a deliberate design, so by plugging this in you're able to use uvloop for your particular code.

But you might be asking me: if uvloop is so much better, then why does Python ship with some other event loops? There are many answers to this, but the most important thing is that the event loops you're getting in Python, in asyncio, are already plenty fast for most low-scale deployments. And before you get to deployment, there are plenty of things you need in order to develop your program. Having a reference implementation that some random dude can use in his video to show you that here is the while True loop, this is how this thing works, where we can click through and read sentences that almost read like English to explain how all of this connects together, is tremendously useful. It is also useful when you're running code on it, because if things are surprising every now and again in something that is an asynchronous event loop, you can just set a pdb breakpoint, stop your program, step through it, and really see what is happening. So that reference implementation is important for teaching, for correctness, and for being a reference implementation, so that others like uvloop can use it and compare behavior. So we will always have that particular implementation in Python, and it is already pretty fast, especially on Windows if you're using the default proactor loop. Again, I/O is happening on the kernel level, so that event loop is pretty
good from the start.

So you pip-installed uvloop. Did anything change? Not yet. What you need to do is, well, first, yes, let's pip install uvloop. The latest version at the time of making this video was 0.14. As soon as you have it, you only need to import it and say uvloop.install(), and at this point, when you are getting an event loop from asyncio, the one you're gonna get is in fact going to be the uvloop loop. So that's everything you need to do. Just remember to call uvloop.install() before you call get_event_loop(), because if the event loop is already set up for you, you would need to exchange it for the uvloop one. So it is very important to call uvloop.install() early on.

So that's pretty much it for today's episode. Today we focused on the event loop, to show you that it really is nothing magical. It is literally a loop which handles network events and executes callbacks one by one, as the picture shows. It can handle many network events concurrently using a selector or proactor, and can handle many callbacks concurrently even though it can only execute one callback at a time. You need to really internalize this; this is why I'm telling you this so many times. Some callbacks schedule themselves again on the event loop, which is a trick called a trampoline. For sure we're gonna see it again.

Now, if you felt like we went quite low-level here, and making big programs like this using call_soon and call_later doesn't seem very natural, your gut feeling is absolutely correct. Instead of using call_soon and call_later directly, asyncio programs are written using coroutines, and those are defined with async functions. As we are going through our videos, fortunately the very next episode is about that. So if you now feel like maybe this asyncio kind of thing is not really for you, just wait for it, just wait for that one episode. Coroutines make asynchronous programming really natural. The reason why we went through the event loop now is so that coroutines will
feel natural but won't feel magical. We will understand how they're implemented as well.

So pretty much this is where we are: end of slideshow, end of episode. What I would like to invite you to do now is, first of all, give me that feedback on whether a light background is better for us or a black background is better. I'm pretty sure that even though public speaking requires light backgrounds, here it's not quite clear. You might be looking at the screen in a room that is not very well lit. I don't really know how you are watching, how you're consuming those videos, so let me know.

The second thing I would like you to do: there are episodes coming in the future, and some of them say very little so far, like "batteries included". So let me know what kind of application you would like to write, what is brewing in your mind when you're thinking of asyncio. I might be able to help you when working on the "batteries included" episode, as well as the example web application that I already have in my plan. But again, if you tell me exactly what you are looking for, I might be able to introduce something specifically to answer your question, to answer your need. So subscribe to get notified about the next episode. Thanks again for watching, see you next time. [Music]
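
Editor's sketch: the trampoline trick described in the episode (a callback that does a little work and then re-registers itself on the loop) could look roughly like the code below. It is not the code typed in the video. The function name run_trampolines, its parameters, and the much shorter delays are assumptions made so the example finishes quickly; a fresh loop from asyncio.new_event_loop() is used instead of get_event_loop() so the sketch does not depend on any loop already existing.

```python
import asyncio
import datetime


def run_trampolines(names=("first", "second", "third"), interval=0.05, duration=0.3):
    """Run one trampoline per name on a fresh loop; return the (name, time) call log."""
    loop = asyncio.new_event_loop()
    calls = []

    def trampoline(name):
        # Do a small piece of work...
        calls.append((name, datetime.datetime.now()))
        # ...then schedule ourselves again: this is the whole trick.
        loop.call_later(interval, trampoline, name)

    for name in names:
        loop.call_soon(trampoline, name)

    # Without this the trampolines would keep the loop busy forever.
    loop.call_later(duration, loop.stop)
    try:
        loop.run_forever()
    finally:
        loop.close()
    return calls


for name, when in run_trampolines():
    print(name, when)
```

The printed log should show the three names interleaving, since each callback returns control to the loop right after scheduling its next run.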
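
Editor's sketch: the reactor pattern from the episode (ask a selector which file descriptors are ready, then let user code perform the actual reads) can be shown without asyncio at all, using only the standard selectors and socket modules. The name reactor_demo and the use of socket.socketpair() as a stand-in for real network connections are assumptions for illustration.

```python
import selectors
import socket


def reactor_demo(messages=(b"ping", b"pong")):
    """Register one socket per message, then react to readiness notifications."""
    sel = selectors.DefaultSelector()  # the best selector for this platform
    received = []
    pairs = []

    def on_readable(sock):
        # The selector only said "ready"; the user code does the read itself.
        received.append(sock.recv(1024))
        sel.unregister(sock)

    for payload in messages:
        reader, writer = socket.socketpair()
        pairs.append((reader, writer))
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, on_readable)
        writer.sendall(payload)

    # A tiny event loop: poll with a timeout, dispatch callbacks one by one.
    for _ in range(10):  # bounded, so the sketch cannot spin forever
        if not sel.get_map():
            break
        for key, _events in sel.select(timeout=1.0):
            key.data(key.fileobj)

    sel.close()
    for reader, writer in pairs:
        reader.close()
        writer.close()
    return received


print(sorted(reactor_demo()))
```

The order in which ready sockets are reported is up to the selector, which is why the result is sorted before printing.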
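
Editor's sketch: two configuration points from the episode combined in one place, namely choosing the selector explicitly and creating an event loop by hand in a worker thread. The first three lines of worker() follow the pattern of the documentation example mentioned in the video (SelectorEventLoop with a SelectSelector, then set_event_loop). The function name run_in_worker, the thread name, and the result dictionary are assumptions for illustration.

```python
import asyncio
import selectors
import threading


def run_in_worker(thread_name="worker-1"):
    """Run a hand-made event loop in a secondary thread and report where it ran."""
    result = {}

    def worker():
        # No loop is created automatically here, so we build one ourselves,
        # explicitly asking for the plain select() based selector.
        selector = selectors.SelectSelector()
        loop = asyncio.SelectorEventLoop(selector)
        asyncio.set_event_loop(loop)
        try:
            loop.call_soon(result.__setitem__, "thread", threading.current_thread().name)
            loop.call_soon(loop.stop)
            loop.run_forever()  # a loop that is never run does nothing
        finally:
            asyncio.set_event_loop(None)
            loop.close()

    thread = threading.Thread(target=worker, name=thread_name)
    thread.start()
    thread.join()
    return result


print(run_in_worker())
```

In real programs the default selector is usually the better choice; forcing SelectSelector here only demonstrates that the choice can be overridden.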
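
Editor's sketch: the debug-mode warning about a hog callback can be reproduced in a fraction of a second. The assumptions here are the helper name find_slow_callbacks, the use of time.sleep() as a stand-in for heavy Python-level computation, and a small logging handler that collects messages from the "asyncio" logger so they are not lost among other output. loop.set_debug() and loop.slow_callback_duration are the real asyncio knobs.

```python
import asyncio
import logging
import time


def find_slow_callbacks(hog_seconds=0.2):
    """Run a deliberately slow callback in debug mode; return asyncio's warnings."""
    messages = []

    class ListHandler(logging.Handler):
        def emit(self, record):
            messages.append(record.getMessage())

    logger = logging.getLogger("asyncio")
    handler = ListHandler(level=logging.WARNING)
    logger.addHandler(handler)

    def hog():
        time.sleep(hog_seconds)  # blocks the loop, like the hog in the episode

    loop = asyncio.new_event_loop()
    loop.set_debug(True)                # can be flipped on an existing loop
    loop.slow_callback_duration = 0.1   # warn about callbacks slower than this
    loop.call_soon(hog)
    loop.call_soon(loop.stop)
    try:
        loop.run_forever()
    finally:
        loop.close()
        logger.removeHandler(handler)
    return messages


for message in find_slow_callbacks():
    print(message)
```

The collected message names the callback, where it was defined and scheduled, and how long it ran, which matches the level of detail praised in the episode.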
Info
Channel: EdgeDB
Views: 18,271
Rating: 4.9386067 out of 5
Keywords: python, asyncio, edgedb
Id: E7Yn5biBZ58
Length: 36min 6sec (2166 seconds)
Published: Mon Apr 20 2020