import asyncio: Learn Python's AsyncIO #3 - Using Coroutines

Captions
[Music] Hi, this is Łukasz from EdgeDB. You're watching the third episode of import asyncio, an introduction to the Python framework enabling asynchronous, single-threaded, concurrent code using coroutines. This series is intended as a beginner's course, so if you haven't really used asyncio yet, that's perfect; just start at episode one if you haven't seen that yet. Today we will finally play with the async and await keywords. That's probably what you've been waiting for all along, and yet for an entire two episodes we haven't used them at all. Today this will change. In this video we'll look at using async and await in a variety of ways. We'll start with basic coroutines, but we'll also mention the other available awaitable objects and discuss tasks later on. Before we get there, we will talk about one of the most important features of the await expression: the ability to await multiple things at once. From there we will seamlessly discover how we can fire up tasks to run things in the background, including waiting for them to finish and cancelling them if we don't want them to finish anymore. We're also going to talk a bit about futures; however, common pitfalls and futures are going to have to wait for the next episode, because this one is going to be pretty long, as you'll see yourself. We're going to finish on cancellations in general, because I do realize this is an important topic for any production application.

Alright, before we begin, let me address one thing. In the last episode I asked you whether you'd like us to use dark text on a white background or vice versa. There's a lot of code in this series, so we wanted to make sure that we are maximally legible. Well, we received a lot of responses, but no clear favorite emerged. I'm sorry we won't be able to accommodate everyone, but we decided to go with the white background. This was not a coin flip; I decided to trust the voice of the most experienced teachers that shared their advice with me. Specifically, Anthony Shaw points out that dark text on a white background compresses better in video, achieving better contrast and allowing people to see text clearly in more environments, especially on smaller screens. If our videos were very long, it would maybe be less of an obvious choice, but given that every episode is planned to be well under 60 minutes, it seems like going with the white background is better on average. David Beazley shares this view: again, black text on a white background is more legible when minimizing the video or looking at smaller screens, and in general provides maximum contrast. If you need less contrast, it's possible to dim your screen, but if you don't have enough contrast to begin with, it is sometimes impossible to boost it. So that's it, this is the choice; we appreciate you sharing your opinions.

Moving on: in the last video we discussed running things on the event loop with call_soon and call_later. What you scheduled that way were just regular functions; they run until they are done. One clever function that we met during that exploration was the trampoline, which did some work and then rescheduled itself back on the event loop. This is not how you program in asyncio most of the time; instead, you'll use async functions, which are more high-level. Let's try to create an async function equivalent to what we are seeing on the screen right now, in other words an async function that keeps printing out the current timestamp. First, let's start Python. For today's session we will need our print_now function that prints the current timestamp, so let's just recreate it. Now we can create an async function which keeps printing the current timestamp at some interval. It'll take a name, which we can use to identify which async function is currently running, and then we just have a while True loop which goes on infinitely: it prints the name, prints the timestamp, and sleeps for a while, so that we don't have a constant stream of prints; that would not be very nice for showing you the examples.

Even before we get to running the code, let's look at our first async function and notice a few interesting things about it. First, thanks to using the async def keywords instead of just plain def, the function is compiled as asynchronous. That means we can use await expressions and other async constructs inside it. What does an await expression do? Two things. First, when you read this async function, you know that the await expression will block the function from executing anything else until whatever we are awaiting on is done. In our example it's guaranteed that the next iteration of the while loop will not happen until we waited at least half a second. And hold on, in asyncio it is always important to realize that whenever you see a number of seconds to wait, in timeouts and whatnot, that always means this is at least how long we're going to wait. Due to the asynchronous nature of the event loop, the fact that we might be running blocking functions on it, and the fact that this is all happening in a single thread, we will never get real-time guarantees down to the nanosecond of the chosen number. So if you asyncio.sleep for half a second, that means we're going to be sleeping at least half a second. Moving on: the await expression makes async functions natural to read, because they read from top to bottom like regular functions. Since the await essentially blocks execution within an async function, there is no callback magic that we need to worry about here. But there's another thing that the await expression is doing under the hood: it allows asyncio to do something else while we're waiting for the sleep to finish. This function is not doing anything else at that point, but the event loop can run other things in the meantime. Sometimes people will say that when an async function awaits, it yields execution back to asyncio, allowing asyncio to switch to executing something else.
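The async function being described might look like this minimal sketch; the captions don't show the exact code, so the names (`print_now`, `keep_printing`) simply follow the narration:

```python
import asyncio
from datetime import datetime


def print_now() -> None:
    # The helper from earlier in the series: print the current timestamp.
    print(datetime.now())


async def keep_printing(name: str = "") -> None:
    # async def makes this an async function, so await is allowed inside.
    while True:
        print(name, end=" ")
        print_now()
        # Yields to the event loop for *at least* half a second;
        # asyncio makes no hard real-time guarantees.
        await asyncio.sleep(0.5)
```

Calling `keep_printing("First")` only creates a coroutine object; nothing prints until that object is awaited.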
So let's see how that works. The easiest way to run an async function is to use asyncio.run. It will run our async function until completion; in our case, well, we'll have to interrupt it with Ctrl+C, because the async function is literally an infinite loop in our definition. So that's a nasty, nasty traceback. Is there a more graceful way to time out on an async function? Yes, we can wrap our coroutine in wait_for and give it a timeout. The invocation still raises an exception, as you will see, but that's a friendly exception, a TimeoutError; it tells us that the async function did not finish executing before the timeout happened. Naturally, we could have just used an async function that exits on its own in this example, but that would be boring. If it still bothers you to see a traceback in the console, we can do better: we can define a main function that we will pass to asyncio.run and treat as our entry point to the asynchronous world. We will do every asynchronous thing from our async main. This is a common pattern; in fact, in real-world asyncio applications you will often see some async main, which is why we're showing it to you right here. We still use the same wait_for for handling the timeout, but this time the result will probably be a little less dramatic. Let's look at what's going to happen when we actually execute this: asyncio.run, and now async main. Cool. Boom, time's up, no exceptions; everything was handled very gracefully with the try/except. This shows you that inside asynchronous functions you can handle exceptions just as you would otherwise. This is a very helpful thing to understand, because exceptions in Python are used for many things, including some control flow, which we're going to see with cancellations later in this very video.

So let's look at the documentation for the wait_for function. If we actually wanted to figure out how to use it, we would see this explanation of what wait_for does. What do we have here? Well, there's plenty of new words: awaitable, coroutine, task, future. All of those are in fact related; they are the building blocks of asyncio programs. We'll introduce all of them today, but first let's look at a higher level. Here's an abstract class hierarchy of the terms I highlighted before: you can find Awaitable and Coroutine in the collections.abc module, while Future and Task live in the asyncio package. So what do we have here? An awaitable is any object that can be awaited on; in other words, put it in an await expression and it will do something useful. To see this better, let's redefine our async main with the objects clearly instantiated. As you can see, we create a kp object here, but it doesn't really run yet. Similarly, we pass kp to a waiter object that we create, and that waiter object doesn't start counting seconds just yet; we need to await on it. Okay, that works exactly as before, but hopefully you can see that our awaitables are just regular objects: we can pass them around, and if we don't start awaiting on them, nothing will happen. To see this in action, let's recreate our async main again, this time forgetting to await. What would happen if we just didn't say await? We'll need a little more space. Again, we're creating the kp and waiter objects in exactly the same manner, but we, air quotes, forget to await. This would look silly in the try/except, because it would just say waiter, a bare variable name, since we created the object on a separate line; but if we created the waiter object right in the try block without assigning it a name, it would look just like a regular, old-style function call. When we run this async main, we'll see that asyncio helpfully tells us that we created two objects that we never awaited on.
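The asyncio.run, wait_for, and async main progression just narrated, as a minimal self-contained sketch; a finite timeout stands in for the Ctrl+C interruption, and the names follow the narration rather than any code shown on screen:

```python
import asyncio


async def keep_printing(name: str = "") -> None:
    # An infinite loop, like the timestamp printer in the video.
    while True:
        print(name, "tick")
        await asyncio.sleep(0.3)


async def main() -> None:
    # The entry point to the asynchronous world: handle the timeout
    # here so no traceback reaches the console.
    try:
        await asyncio.wait_for(keep_printing("demo"), 1.0)
    except asyncio.TimeoutError:
        print("oops, time's up!")


asyncio.run(main())
```

Because `main` catches the TimeoutError itself, the program exits cleanly instead of dumping a traceback.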
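A sketch of the forgotten await, with the `kp` and `waiter` names following the narration; the RuntimeWarning captured here is CPython's basic one, emitted when the unawaited coroutine objects are garbage-collected:

```python
import asyncio
import gc
import warnings


async def keep_printing(name: str = "") -> None:
    print(name)
    await asyncio.sleep(0)


async def main() -> None:
    kp = keep_printing("demo")          # a coroutine object; nothing runs yet
    waiter = asyncio.wait_for(kp, 1.0)  # also just an awaitable object
    # We "forget" to await waiter, so neither object ever executes.


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    asyncio.run(main())
    gc.collect()  # make sure the abandoned coroutines are collected now

print([str(w.message) for w in caught])
```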
Unused awaitables are often a sign of a bug like this, so it's good that asyncio informs us of it. But this warning doesn't appear by default; you need to start your Python process with PYTHONASYNCIODEBUG=1, that's an environment variable, and if you also set PYTHONTRACEMALLOC=1, the warning will be much more helpful, because it will tell you exactly where you created your awaitable object in your code. I use both of those pretty much all the time. There's also a -X dev mode in Python which you can use for this as well; however, it does not turn on tracemalloc by default, because tracemalloc does carry a performance penalty.

Moving on: now we know that an awaitable object is one that we can use in an await expression. There are a few kinds of them, including custom ones that you can create, but that's a topic for a different episode, probably. Now we're focused on the coroutine. See, I kept using the term async function when describing functions that are defined with async def and use await inside. Sometimes people will call them coroutines, but I like to make a distinction: an async function is the definition; it creates coroutines when called. Coroutines in turn are the awaitable objects that you create by calling async functions. So coroutines are awaitable, but async functions are not. Let's see how that works. If you try to asyncio.run the async function itself, this will not work: it tells us that a coroutine was expected but we just passed a regular function. But if we actually run a coroutine, that is fine; if we run the other one now, it is also perfectly fine. However, if we try to execute the one that we already awaited on again, this will not work: coroutines can only be awaited once. So just to drill this in a bit more: an async function is a function that creates coroutines when called; it is defined using async def and can use await expressions inside. A coroutine in turn is an object that you create by calling an async function, and that object is awaitable.
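The distinction can be demonstrated directly; this sketch mirrors the three experiments from the narration (passing the bare function, awaiting a fresh coroutine, awaiting the same coroutine twice):

```python
import asyncio


async def keep_printing(name: str = "") -> None:
    print(name)
    await asyncio.sleep(0)


# An async function is the definition; it is NOT itself awaitable.
try:
    asyncio.run(keep_printing)  # passing the function, not a coroutine
except (TypeError, ValueError) as exc:
    print("rejected:", exc)

# Calling it creates a coroutine object, which IS awaitable...
coro = keep_printing("first")
asyncio.run(coro)

# ...but only once: awaiting the same coroutine again fails.
try:
    asyncio.run(coro)
except RuntimeError as exc:
    print("rejected:", exc)  # cannot reuse already awaited coroutine
```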
Don't worry if this is a little mind-boggling right now; it will become more natural the more you use asyncio, and most of the time you don't really have to think about the difference between the two. Just like in the example here: usually you will await on a coroutine that you just created right there on the same line, and it really looks like you're just calling a thing. So keep_printing is an async function, but when called it creates a coroutine that we immediately await on in our async function right there. The more you do this, the more natural it will get, don't worry. I wanted to make the async function and coroutine distinction right at the beginning of our asyncio journey because I am a strong believer in no magic. When you understand this distinction, many bugs that confuse beginners become easy to spot and fix. And believe me, you will forget to type await sometimes, and you will try to await on the same coroutine many times; those things do happen. Most of the mistakes we make are silly mistakes, and now that you understand how those things are composed, it is less likely that you will be absolutely confused when stuff does not work.

Now that we know that we can await on coroutines, it's time to learn how those things compose. When we put consecutive await expressions one after another, the second one won't start executing until the first one is done; we've already seen that in our infinite loop example before. Similarly, the third one won't start executing until the second one; it reads from top to bottom, essentially. We can check this out ourselves: the coroutine that's running now is the first one, and since it is an infinite loop, I had to keyboard-interrupt it again; the second coroutine would never run. But what if we want all the coroutines to run at the same time? Fortunately, this is a built-in feature of asyncio, and a very powerful one at that. Instead of using many awaits, we will just use one, but we can put our coroutines inside a gather coroutine. Having that defined, running it now will cause our stuff to run essentially at the same time. However, remember that we are not executing in parallel but concurrently: all of this is happening in a single thread. First, second, third. Awesome. Thanks to cooperative multitasking, that single thread executes multiple coroutines at the same time. I still had to break this here, but we'll deal with the KeyboardInterrupt exception later.

I already told you about the bartender analogy, right? Another great analogy I've seen is this video of workers hammering in a pole: only one hammer hits at a time, and yet thanks to cooperation the job completes much faster. That's essentially how I think about cooperative multitasking, about an event loop doing many things at the same time in just one thread. I hope this is useful for you as well, because I just kind of like the elegance of what you can see in the video, and it gives me that same feeling.

All right, so as usual, let's fix the ugly KeyboardInterrupt with asyncio.wait_for. See, coroutines compose very well: just wrapping our gather in a wait_for is enough to make it stop running after a predefined time. We can write the gather right inside it; that will instantiate the coroutine, and the wait_for will deal with it. Let's see how that executes. We still have to recreate all three coroutines that we want to run at the same time, we set our timeout of five seconds, and we say: if there is a TimeoutError, just print "oops, time's up". Okay, let's execute this now and see what happens. Now that we run this, we should see our timeout, but we also see another thing. What is this? Asyncio is still complaining about some exception that we did not anticipate: apparently our keep_printing coroutines raised a CancelledError exception, but we never retrieved it. This gently introduces us to the concept of cancellation.
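The composition just described, as a runnable sketch; the sleep and timeout are shortened from the video's values so it finishes quickly:

```python
import asyncio


async def keep_printing(name: str) -> None:
    while True:
        print(name)
        await asyncio.sleep(0.3)


async def main() -> None:
    try:
        # One await drives three coroutines concurrently; on timeout,
        # wait_for cancels the gather, which cancels all three.
        await asyncio.wait_for(
            asyncio.gather(
                keep_printing("First"),
                keep_printing("Second"),
                keep_printing("Third"),
            ),
            1.0,
        )
    except asyncio.TimeoutError:
        print("oops, time's up!")


asyncio.run(main())
```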
Let's explore that for a while. Let's look at our latest async main function one more time: we are awaiting on an asyncio.wait_for, which wraps a gather coroutine, which gathers three keep_printing coroutines. The asyncio.wait_for is configured to time out after five seconds, but what does that mean? What actually happens? We've seen the docs for this function before, and they mention that when a timeout occurs, whatever we waited for is cancelled. So what were we waiting for? The gather coroutine, so that will be cancelled. Now, the gather documentation says that if it gets cancelled itself, it propagates the cancellation to all the awaitables that it is gathering. So far so good: asyncio.wait_for timed out and cancelled our gather, and gather in turn cancels all three keep_printing coroutines. But what does the keep_printing coroutine see when it is cancelled? Well, when a coroutine gets cancelled, what happens inside is that the await expression that is currently executing raises the CancelledError exception. Let's look at the code of our keep_printing async function: in our case it only has a single await expression, so we know that the exception is raised right there. I think it's a brilliant design decision to make cancellations exceptions, because this composes very well with the rest of Python. I already mentioned that exceptions are important, and the fact that you can handle things like cancellations with them is pretty cool.

Let's see how we can deal with those CancelledErrors, though. There are essentially two options: either we handle the CancelledError inside our async function by wrapping the await in a try/except block, or we don't do anything and let it bubble up, and whoever is awaiting on us will deal with it then. In other words, either we catch the exception or we let the caller worry about it. Let's see how catching it ourselves will work. We can reuse our async main function, so let's just redefine keep_printing. It will work exactly as before, but now we will wrap the single await expression in a try/except block. For now, let's just create the infinite loop and do the prints that we wanted; then we have a try with this silly sleep await, and now except asyncio.CancelledError: what should we do? Well, we can print that we were cancelled, which will let us see what's happening, and just break. Cool, so can we run this? Let's try. When we run it and it actually gets cancelled, we will see: first was cancelled, second was cancelled, third was cancelled, oops, time's up. Well, only partial success: the coroutines nicely reported that they were cancelled, but we are still getting complaints from asyncio that the CancelledError of the gather itself wasn't handled by us in any way. Can we fix that? Of course, but before we get there, let's talk about futures and tasks first.

As I mentioned, the Coroutine abstract base class can be found in collections.abc, alongside its parent, Awaitable. Those are core features of Python; they are built into the interpreter. Futures and tasks, on the other hand, are features of the asyncio framework. So if you were to use Twisted, Tornado, Curio, or Trio, you would still use awaitables and coroutines, but you wouldn't use asyncio's futures and tasks. In other words, the coroutine is a low-level building block; it doesn't know about essential concepts like the event loop or cancellation. This is why the documentation tells you that many of the built-in functions in asyncio wrap coroutines in tasks. That wrapping allows asyncio to keep track of the coroutines and handle results returned from the coroutine, as well as exceptions raised inside the coroutine, including cancellations. So let's first meet the task, just as users, seeing what tasks enable us to do. The mighty task is actually one of the features of asyncio that makes it so wonderful. So let's go.
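Catching the cancellation inside the async function, as just described; this is a sketch of the keep_printing rewrite, again with shortened delays:

```python
import asyncio


async def keep_printing(name: str) -> None:
    while True:
        print(name)
        try:
            await asyncio.sleep(0.3)
        except asyncio.CancelledError:
            # The cancellation surfaces at the currently executing await.
            print(name, "was cancelled!")
            break


async def main() -> None:
    try:
        await asyncio.wait_for(
            asyncio.gather(
                keep_printing("First"),
                keep_printing("Second"),
                keep_printing("Third"),
            ),
            1.0,
        )
    except asyncio.TimeoutError:
        print("oops, time's up!")


asyncio.run(main())
```

Each coroutine reports its own cancellation before breaking out of its loop, while the TimeoutError is still handled by main.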
To be quite frank, I expect that at this point you're probably pretty bored with asyncio.sleeps; those are kind of silly, right? I'm not entirely fond of those simplified examples myself, so let's do something more exciting. How about we write a trivial web crawler? Of course, there's nowhere near the time to make it production-grade, but the only thing we need is to crawl some particular set of trivial pages, like the ones I just generated here. The content of a given page includes hyperlinks to more pages, so we're going to follow those hyperlinks. For simplicity, the URLs are always absolute and always at the start of a line; there's no HTML, there's no robots.txt handling or anything like that. It's really a relatively trivial example, but with a twist: there's a 33% probability that a link will actually contain the full copy of Joseph Conrad's Heart of Darkness, making it a bit slower to download. That's essentially it; let's write our web crawler that will download all of this. Ready? I'm ready.

We will be using a third-party HTTP client library called HTTPX; just pip install httpx when you're done. I wanted to minimize new APIs and libraries in these videos so that they are not overwhelming, but I guess one doesn't really hurt, and it will help us do it the right way. So you can see crawl0, the first version, starting with zero. It gets some prefix, which is going to be our base website address, and a given URL; if we don't pass a URL, at first we just use the prefix as the URL. We print that something is happening, and now we create the async client, call client.get, and then finally close the client. Having the result, we can splitlines on the text of the result, and whenever a line starts with the prefix, it means we found a link, so we just await another crawl. Pretty simple, right? Even if you didn't know HTTPX before, that should probably read like English: we created an async client and we were just getting pages with it.

So now we're already crawling; it already does its job, but this will actually take a long while; as you can see, the downloads are happening rather slowly. Let's pop up our session side by side so we can discuss how it works and what's wrong with it. As you can see on the left side, it's slow, so there's something wrong. There are actually four things that I don't like about this initial version of ours. First of all, I don't like when things that do background activity are also responsible for reporting their status; reporting progress from that same task, especially with just a print or logging, is probably not a great idea. Second, if you really follow it, await crawl0 inside an async function that is itself called crawl0 is a recursive form, and that might become annoying later if we really had a gigantic and very deep website, so that's probably something that we should avoid as well. Third, await crawl0 essentially recreates a blocking environment, because we only ever crawl one particular URL at a time, so we are not using any of the nice features of asyncio that allow concurrency. And finally, maybe you noticed that we are creating the async client just like that, and we should probably use a context manager instead. You're right; however, I would like to introduce all of those concepts one by one, so we're not going to be using asynchronous context managers just yet. We listed four things that I didn't like; we can probably fix at least some of them, and maybe more, incrementally. I'd start by storing our URL in a variable so I have less to type later, and now let's write a progress-reporting async function. If we want to report progress in an elegant way, it will take the URL and another async function; see, it's a callable that returns a coroutine, and the static typing tells you exactly that.
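The captions don't include the exact crawl0 source, and hitting a real website isn't reproducible here, so this sketch replaces httpx with a hypothetical `fake_get` over a tiny in-memory site; the control flow (sequential awaits, recursive crawl) matches the narration:

```python
import asyncio

# A tiny in-memory "website": each page lists absolute links, one per line.
PREFIX = "http://example.local/"
PAGES = {
    PREFIX: PREFIX + "a\n" + PREFIX + "b",
    PREFIX + "a": PREFIX + "c",
    PREFIX + "b": "",
    PREFIX + "c": "",
}


async def fake_get(url: str) -> str:
    # Stand-in for httpx's client.get; sleeps to simulate download time.
    await asyncio.sleep(0.1)
    return PAGES[url]


crawled = []


async def crawl0(prefix: str, url: str = "") -> None:
    url = url or prefix
    print("Crawling:", url)
    crawled.append(url)
    text = await fake_get(url)
    for line in text.splitlines():
        if line.startswith(prefix):
            # Awaiting each child crawl directly makes this fully
            # sequential, which is why version 0 is slow.
            await crawl0(prefix, line)


asyncio.run(crawl0(PREFIX))
```

Every page blocks the next, so the total time is roughly the sum of the individual downloads; that's the blocking behavior the narration complains about.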
Oh, we have our first task created here: we pass it a coroutine and set a name, which is a good practice that helps with debugging. Now, to really report progress we need to know where we are, so we'll be using a todo set that holds URLs. Let's measure the time between when we start and when we finish, and then, while there's still stuff to do, let's print the status. There's some complicated printing here because I wanted to show the length of the todo set and maybe some of its members; we cannot quite show the entire thing because our console is kind of narrow, so let's just show the last 38 characters. We still have our old friend asyncio.sleep, just because we don't want to print the status all the time; we want to print it just often enough so that I as a human know what's going on. Now that we've ended, we only need to measure the time elapsed and report how long it all took. That fixes the reporting and actually gives us a form of a benchmark; well, not a strict benchmark in the sense that you would use to diagnose performance problems in your programs, but a good thing to have.

Now let's create a todo set and recreate the crawl function so that it uses the todo set. The crawl1 async function has that same signature: it takes a prefix, which is our base address for the website, and a URL, and if we didn't pass the URL, it uses the prefix at the very start. Again we're creating the async client, and now we are awaiting on a client.get to actually get the contents of the URL. Once we are done, we can close the client and splitlines on the text of the result, and if a line happens to look like another of our URLs, we add it to our todo set and await on it again. We're not changing that part yet; we will do this incrementally. Oh, we also have to discard things from the todo set when we are done with our own URL in the invocation. Running this again will show us some interesting results: we already see how all of this nicely runs step by step, but it will essentially be as slow as the old example that I showed you before.

So let me show you this nice printout of the progress async function, just so that we can look at it once more and talk a little about create_task. What is it? How come anything actually happens? We are not awaiting on the tasks anywhere, and yet we clearly see on the left side of the screen that there are in fact things happening. Let me just remove my head so you can see the entire code; there's not much there, but it may be helpful. Anyway, create_task: what it does is tell asyncio that there is now a coroutine that asyncio tracks, so it will run our thing in the background of our current execution context, meaning it will only run whenever we are awaiting on something; the await asyncio.sleep calls are the gaps in which background tasks actually execute. So, all of this took 94 seconds; well, that is pretty disgusting, to be honest with you. The new crawl function is still slow, and I'm pretty sure we can do much better than this. Let's just recreate the function; it will be essentially the same, with one important difference: we won't be awaiting the crawl but rather, using what we already saw, scheduling crawls as background tasks. All of this that you are seeing right now is the same, but now, when we splitlines and find a line that looks like a hyperlink, a URL, we add it to the todo set, and then we create tasks: create_task, essentially saying crawl2 here with the prefix and the line, which is the URL. In our case we will actually pass a name there, which is a good practice that gives you a descriptive name for debugging purposes.
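A sketch of the background-task version; again `fake_get` and the in-memory site are hypothetical stand-ins for httpx and the generated pages, and the todo set still holds URLs at this stage, as in the video:

```python
import asyncio
import time

PREFIX = "http://example.local/"
PAGES = {
    PREFIX: PREFIX + "a\n" + PREFIX + "b",
    PREFIX + "a": PREFIX + "c",
    PREFIX + "b": "",
    PREFIX + "c": "",
}


async def fake_get(url: str) -> str:
    await asyncio.sleep(0.1)  # simulated download latency
    return PAGES[url]


todo = set()  # URLs still being crawled


async def crawl2(prefix: str, url: str = "") -> None:
    url = url or prefix
    text = await fake_get(url)
    for line in text.splitlines():
        if line.startswith(prefix):
            todo.add(line)
            # Schedule the child crawl in the background instead of
            # awaiting it; naming the task helps with debugging.
            asyncio.create_task(crawl2(prefix, line), name=line)
    todo.discard(url)


async def progress(prefix: str) -> None:
    start = time.perf_counter()
    todo.add(prefix)
    asyncio.create_task(crawl2(prefix), name=prefix)
    while todo:
        print(len(todo), "to do:", sorted(todo))
        # Background tasks only advance while we await something,
        # for example during this sleep.
        await asyncio.sleep(0.05)
    print(f"took {time.perf_counter() - start:.2f}s")


asyncio.run(progress(PREFIX))
```

Because the child crawls run concurrently in the gaps between the progress sleeps, the pages on each level download at the same time instead of one after another.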
We still discard and whatnot; then let's just run that and see what happens. Okay, so far so good, it looks exactly the same. Oh no, it doesn't: at some point we run 20 of them at the same time, so there was much more concurrency. It actually took 13 seconds instead of 94; that's over seven times faster to do the same work, on the same internet connection, and still using just one thread. That should get you excited; that's the power of asyncio, essentially.

So now we saw that this concurrency thing enables us to do a lot even with just a single thread, and while this was, I hope, an impressive demo of what tasks can accomplish, we shouldn't let them run wild like this. If we just create tasks and they're happily doing things in the background, what if we actually want some results from them? What if something goes wrong in them, or an exception happens, or whatnot? In fact, if our progress meter got cancelled or otherwise interrupted, wouldn't we want those background tasks to also be cancelled, instead of just continuing until they're done, maybe a long time later? We would expect the pending tasks to be cancelled as well, but since they are running in the background, that will not happen unless we do something about it. So if we want to stop our application altogether, we'd like to either cancel the tasks gracefully, or in some cases maybe even wait for them to finish; maybe it's not a very long operation and it's nicer to just allow whatever is happening to come to its conclusion. Or at least you could ask the user: hey, are you sure? You're attempting to quit, but there are pending downloads. It is essentially a very good practice to have handling like this.

All right, cool. So we need to store the tasks somewhere to be able to cancel them when our progress indicator gets cancelled itself. Is that clear? That essentially means we need to have handles to our tasks so that we can tell them that we want them cancelled. Fortunately, we already have a todo set; it just used to hold somewhat useless URLs that we only used for printing, and we only need to modify it so that it holds the actual tasks. As you can see, the diff here is pretty small; there's no code hiding behind my head here, that is all we need to change in the crawl2 function. You may notice I even removed the line that discards the URL that was already dealt with. But why? Well, we won't need it anymore, because asyncio will help us maintain this set with its asyncio.wait function. So we need to change the progress async function now; our progress meter needs to handle this new life, we need to make it understand this new todo set. First, we will store the created task in a variable so that we can add it to the todo set as well, using that same functionality we just saw, just as we did in this modification to crawl2. Now, instead of asyncio.sleeps, we'll be using asyncio.wait, as I mentioned on the last slide. asyncio.wait takes a collection of tasks, so our todo set is great for this, and, as the name implies, it waits for them to complete. But we actually want to report progress while things are still going on, so we're setting a timeout on the wait. Unlike asyncio.wait_for, this function will not raise an exception on timeout; instead, it returns two sets of tasks: done tasks and tasks that are still pending. To clean up our todo set, we remove the done tasks from it. Now we only need to print the current status, which we are doing; I used more characters per line because we are now using the entire width that we have, and that's essentially it; the rest is pretty much the same. So let's see how this new progress works. All right, now we see its concurrency ramping up nicely to 20, then going down a bit because the longest downloads are still going on.
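The asyncio.wait pattern just described, in isolation; the `job` coroutines are hypothetical stand-ins for the crawl tasks:

```python
import asyncio


async def job(delay: float) -> str:
    await asyncio.sleep(delay)
    return f"slept {delay}"


async def progress() -> None:
    # The todo set now holds Task handles rather than URLs.
    todo = {
        asyncio.create_task(job(d), name=f"job-{d}")
        for d in (0.1, 0.25, 0.4)
    }
    while todo:
        # Unlike wait_for, asyncio.wait does not raise on timeout;
        # it returns two sets: tasks already done, tasks still pending.
        done, todo = await asyncio.wait(todo, timeout=0.15)
        print(len(done), "done,", len(todo), "still pending")
        for t in done:
            print("finished:", t.get_name(), "->", t.result())


asyncio.run(progress())
```

Reassigning `todo` to the pending set each iteration is what keeps the bookkeeping accurate without any manual discard calls.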
Those are still going on; and it took 13 seconds, so same performance, but a much cleaner approach, because we are now actually tracking our tasks: at any given point we know exactly which pieces of code are running on the event loop.

Okay, so the very last thing I wanted to show, because so far that was the happy case, is that we can cancel pending tasks when interruptions happen. For example, if somebody decides to kill your process, or you need to restart something because you changed the network port that you're listening on, we can cancel whatever is pending so that we don't leave things half-finished. This is a large topic, actually, so I won't cover everything just yet, but it's important enough to cover right now, at least in some basic form.

Thanks to our to-do set, we have just the thing that lets us clean things up. So let's leave our trusty progress() function alone and just deal with cleanup in a new async main(). We've seen an async main() at the start of this video, so we can use it again: we await progress() with a given address and our crawl2 coroutine as the algorithm. But when an asyncio.CancelledError happens, we go through our to-do set and cancel everything on it. We are nice people, though, so we still give everything one last chance to finish: we remove the tasks that are done from the to-do set, we remove the tasks that are pending from the to-do set, and we even cover an unlikely but annoying thing that can happen, namely more tasks being added while we were already cancelling; we'll discuss this on the next slide.

Now, trying to run this, we'll see that we cannot really just use asyncio.run() anymore. So let's take the loop, create our task for the async main(), and, using call_later() that we met in the last episode, run task.cancel() on it, say, ten seconds from now; we'll just kill it then. So now, running it with run_until_complete(), we'll see what happens.
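Here is a hedged sketch of that shutdown dance; slow_job and the timings are invented, but the shape follows what is described: catch CancelledError, cancel the to-do set, give it one last asyncio.wait(), and drive the whole thing with loop.call_later() and run_until_complete() instead of asyncio.run():

```python
import asyncio

todo: set = set()  # shared set of task handles, as in the video

async def slow_job(seconds: float) -> None:
    await asyncio.sleep(seconds)

async def async_main() -> None:
    try:
        for s in (0.2, 5.0, 5.0):
            todo.add(asyncio.create_task(slow_job(s)))
        await asyncio.wait(todo)
    except asyncio.CancelledError:
        # Cancel whatever is still pending, then give everything one
        # last chance to finish unwinding before we let go.
        for task in todo:
            task.cancel()
        done, pending = await asyncio.wait(todo, timeout=1.0)
        todo.difference_update(done)
        todo.difference_update(pending)
        if todo:
            # The unlikely case: tasks added while we were cancelling.
            print("warning: new tasks appeared during cancellation")

loop = asyncio.new_event_loop()
task = loop.create_task(async_main())
loop.call_later(0.5, task.cancel)  # pull the plug half a second in
try:
    loop.run_until_complete(task)
except asyncio.CancelledError:
    pass  # fine: the whole run was cancelled on purpose
loop.close()
print("shut down cleanly, nothing left over:", len(todo) == 0)
```

Because async_main() handles the CancelledError itself, the run ends without warnings or tracebacks, which is the graceful exit shown in the demo.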
And in fact, it did get cut short: the task did not finish crawling the entire website. But see, there are no warnings and no exceptions; we cleaned up after ourselves rather gracefully here.

So what is the point of the warning that we introduced in the code before? Let's think about this. asyncio.create_task() is not itself awaited: you didn't say await asyncio.create_task(...), you just ran that instruction, and it scheduled things on the event loop right there, right then. That is itself really powerful; it doesn't take much time, so why await it? Cool, but if you are cancelling things, you have to realize that cancellations, being exceptions, only happen during await expressions. So there can be a very unlucky course of events in which you are trying to cancel a thing, but you just about missed your chance, and whatever was happening later actually managed to create a task in the background. While this does not happen in the examples we had in this video, it is good to have this realization right now: cancellations can be tricky, and entirely graceful shutdowns might require multiple passes over ever-shrinking sets of tasks that are still somehow pending.

All right, that's essentially it for today's episode. Today you learned about async functions and coroutines: how they are related, but also how they differ; what you can await on, and how to await on many things at once. We saw tasks and how they enable us to run things in the background; we learned how to wait on them, how to keep track of them, and how to cancel them if needed. That was a lot of information in one go. We'll continue working with coroutines, tasks and futures in the next video. I promised you futures, and we didn't really get to them today, so there will be some focus on Future objects in particular. Next time around we'll also look at how all of this is implemented in Python, and we will cover some typical pitfalls in asyncio usage and how to avoid them. We've already seen the missing await; there are quite a few more. So subscribe to make sure you get notified when the episode is out. Thanks for watching, see you next time! [Music]
Info
Channel: EdgeDB
Views: 15,024
Rating: 4.949821 out of 5
Keywords: python, asyncio, edgedb, async, await, coroutine
Id: -CzqsgaXUM8
Length: 46min 40sec (2800 seconds)
Published: Tue May 05 2020