Threads vs Async: Has Asyncio Solved Concurrency?

Captions
Awesome, all right, I'll get going then. Thank you very much for that introduction. My name is Jacob, and today I'm going to talk to you on the topic of: has asyncio solved concurrency? As was just said, I'm a software engineer at Deloitte Analytics and Cognitive. Before we get going: if you're interested in helping some of the world's largest companies get value out of their data, please feel free to reach out to me, chat to me on Discord any time during the conference, or use the email address and Twitter handle on the screen right now.

Today I want to talk about concurrency, and I'll begin by getting us all on the same page about what we mean by concurrency. You should be able to see my terminal now. I'm going to run a little program, an extremely simple one called clock_echo, and what it does is print the time every two seconds. But it can also do something else: if I type something, say "hi", and press enter, it echoes that back to me in uppercase. There's nothing very impressive about this program, but what is a little interesting is that it's one program managing two different tasks at once, namely echoing input back in uppercase and printing the time. It's a really simple example of what a concurrent program looks like.

Two frameworks you've probably heard of for writing concurrent code are threads and asyncio. Of these two, threads are often billed, fairly or otherwise, as more old-fashioned, less performant, and a bit more error-prone or harder to reason about, whereas asyncio is young, fast, and generally the way of the future. The objective of this talk is to see how much truth there is to that view. We're not going to look too deeply at the APIs, how to use threads versus how to use asyncio; we're going to look at something more fundamental: how these two technologies differ in how they achieve concurrency at the most basic level, how they manage to give the impression that two things are happening at the same time. By following this more fundamental approach, it should drop out fairly naturally, first, how they work; second, which is better and under which circumstances; and third, the answer to the question posed in the title of this talk: does asyncio indeed spell the end for threads?

To understand these two technologies at a fundamental level we have to talk about scheduling. There's a scheduler at the heart of any concurrency framework, and its reason for existence is this: you have one CPU and lots of tasks you want that CPU to do, and the scheduler's job is to tell the CPU which task to focus on, when to focus on it, and how long to spend on it before switching its attention to something else. You can think of this as analogous to a chess grandmaster playing simultaneously against many opponents. Intuitively, the grandmaster should be able to win all of the games, because they have far more ability than any of their opponents, perhaps more than the sum of their opponents' abilities. But even so, they still need a strategy for focusing on the right game at the right time.
Maybe some game positions are harder and need more time, or some opponents are stronger; the scheduler is what tells the grandmaster whom to focus their attention on, and for how long.

One approach to this problem is called cooperative scheduling, and cooperative scheduling is what drives asyncio. To get some insight into what cooperative scheduling is, we'll work through a simple example. I'm going to write a program called clock_echo and recreate exactly what I just showed you: it prints the time every two seconds and echoes back the user's input. To begin with, I'll write the two functions without any regard for concurrency. The clock function is just a while True loop that prints time.ctime(), the current time, and then sleeps for two seconds. If I call that function, it prints the time every two seconds; that's a good start. The echo function is just as simple: while True, read the user's input into a message, then print the message in uppercase. Now whatever I type gets printed back in uppercase.

So we've successfully written two functions that do the two things individually; the challenge is to make them run concurrently. It's an interesting exercise if you've never done it before. Stare at this code and think: how could you possibly make these two things happen at once, without using any libraries, without reaching for any magic like threads or asyncio, using only ordinary Python control structures?
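The two functions from the live demo, reconstructed as a sketch. The talk's versions loop forever; the optional `ticks`/`rounds` bounds and the `read` callable standing in for the built-in `input` are my additions so the sketch can terminate and be driven programmatically:

```python
import time

def clock(interval=2, ticks=None):
    """Print the current time every `interval` seconds (forever if ticks is None)."""
    i = 0
    while ticks is None or i < ticks:
        print(time.ctime())
        time.sleep(interval)
        i += 1

def echo(rounds=None, read=input):
    """Echo each line of user input back in uppercase."""
    i = 0
    while rounds is None or i < rounds:
        message = read()
        print(message.upper())
        i += 1
```

Each function works fine on its own, but calling either one enters its loop and never returns.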
If you sit back and think about this for a bit, you'll quite quickly realize the problem: both of these functions begin with a while True loop, which means that once we enter either function we never exit; we're in an infinite loop for the rest of the program. So we can't use these functions in their current form; we need to do something about those while True loops.

One thing you might try to get around this is: instead of two separate functions, create one function called clock_echo containing a single big while True loop with both bits of logic inside it, the clock logic first and the echo logic underneath, and get rid of the original functions. On a superficial level this solves our while True problem, since there's now just one function with all the logic in one big loop. But if I run it, it prints the current time once and then stops. What's going wrong? We entered the function, went into the while True loop, printed the time, which is good, and slept for two seconds; but now we're stuck on the input line, waiting for the user to type something.
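A sketch of that merged version (an `interval` and `rounds` bound and a `read` callable in place of `input` are my additions so the sketch can be exercised; the point is that both `time.sleep()` and the read still block the whole loop):

```python
import time

def clock_echo(interval=2, rounds=None, read=input):
    """Both tasks in one loop -- but every line still blocks."""
    i = 0
    while rounds is None or i < rounds:
        print(time.ctime())
        time.sleep(interval)   # nothing else can happen while we sleep...
        print(read().upper())  # ...and now nothing can happen until a line arrives
        i += 1
```

The clock only ticks when the user types: the program spends its whole life stuck on one blocking line or the other.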
The program just sits on that line, waiting for user input, and it won't do anything else until it gets some. If I give it some input, it echoes it back in uppercase, goes back to the top of the while True loop, and prints the time again, but then we're stuck on the input line once more. So this isn't going to work: the clock isn't going to tick every two seconds, it's going to tick whenever you type something.

What we really want is for the echo lines to execute only when there is actually something to print in uppercase: skip those two lines when the user hasn't typed anything, and run them only when they have. There is a way of doing this in Python, using the select module, together with from sys import stdin. The API is slightly complicated and the details aren't particularly important, but it looks like this: we create a variable called available and assign it the first item of the result of a call to select.select (the other two items in the returned tuple don't interest us here). Essentially, something gets put into the available list if there is something on stdin that can be read; and if there's nothing on stdin to read, the final argument of zero means we time out after zero seconds, in other words immediately, rather than sitting there blocking the program while we wait for the user to type. So now we can say: if there's something available to read, run the two echo lines; and if there's nothing available, the condition is false and we go back and print the time again.
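The non-blocking check can be wrapped in a small helper, sketched here with the stream as a parameter (the talk polls `sys.stdin` directly):

```python
import select
import sys

def input_ready(stream=sys.stdin, timeout=0.0):
    """Return True if reading from `stream` would not block.

    With a timeout of zero, select() returns immediately instead of
    waiting, so this call can never get the program stuck."""
    available, _, _ = select.select([stream], [], [], timeout)
    return bool(available)
```

select() accepts file objects or raw file descriptors; note that on Windows it only supports sockets, so this particular trick is Unix-flavoured.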
All right, let's see if this works. This is cool: it's actually printing the time every two seconds now, so that's an improvement, the clock is working. And when I type something, it does get printed out in uppercase; but, and this might not be entirely clear if there's a bit of lag on the connection, although it prints what I wrote in uppercase, it only does so on the next tick of the clock. I have to wait up to two seconds before what I typed gets printed back. So we're closer, but there's still a problem: if I type something while we're sitting on the time.sleep line, we have to wait for the sleeping to finish before the input gets echoed back. We need to do something about that sleep.

One way around it, a somewhat hacky and inefficient way, but it works, is this: at the beginning of the function we record the start time, the moment we began executing the function, and initialize a counter i to zero. Then, instead of the time.sleep line, we say: if the current time minus the start time, which is the amount of time that has passed since the function started, floor-divided by two (the double forward slash divides and rounds down to the nearest integer) equals i, then print the time and increment the counter. So instead of sleeping for two seconds, we use the amount of time elapsed since the function began to work out when the current time needs to be printed, rather than
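Putting the two tricks together, a sketch of the finished cooperative loop. The `max_ticks` and `stream` parameters are my additions so it can run for a bounded time against any readable stream; the talk's version loops forever over `sys.stdin`:

```python
import select
import sys
import time

def clock_echo(interval=2, max_ticks=None, stream=sys.stdin):
    """Clock and echo in one loop, with nothing blocking."""
    start = time.time()
    i = 0
    while max_ticks is None or i < max_ticks:
        # Tick once each time an interval boundary has passed -- no sleep().
        if (time.time() - start) // interval >= i:
            print(time.ctime())
            i += 1
        # Poll for input with a zero timeout -- no waiting on a read.
        available, _, _ = select.select([stream], [], [], 0)
        if available:
            print(stream.readline().strip().upper())
```

As the talk says, this is hacky and inefficient: with a zero timeout the loop busy-spins. In real code you would pass a small non-zero timeout to select() so the CPU can rest between polls.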
sleeping. Again, let's see if that works. It's printing the time every two seconds, which is quite cool given that we haven't used the sleep function at all; it's interesting that that's even possible. And if I type something, it might not be obvious that there's a difference, but it now echoes back immediately, as soon as I type it. So we've implemented concurrency here from first principles. And we needn't stop at two things: we have the clock and the echo, but in principle we could add as many tasks as we want inside this while True loop, below those two little snippets.

There are a couple of things to note about what we just wrote, and they are general properties of cooperative scheduling. The first is that it is implemented at the application level. There's a small asterisk on that: I did use the select module, and select makes a system call, so there is a little bit of magic coming from the operating system, but only for the narrow purpose of checking whether there is something available to read. For the most part, the mechanism by which concurrency is achieved lives at the application level. Secondly, what we did with the select module and with removing the sleep line was to take everything that was blocking and replace it with a non-blocking alternative. At no point can we get stuck on a line of code, either waiting for two seconds to pass or waiting for the user to type input. And it's this point which, at some level of abstraction, is why we call it cooperative scheduling: we rely on each of those little snippets to do its thing and then hand control back to the outer while True loop almost immediately. The clock either prints the time or it doesn't, but either way it must hand control straight back; it can't hog the interpreter for any longer than it needs to. That's what we mean when we say it's cooperative.
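At that level of abstraction, asyncio packages up the same idea: every `await` is a snippet cooperatively handing control back to the event loop. A minimal sketch of the same shape in asyncio (reading stdin needs extra machinery in asyncio, so I've substituted a second hypothetical periodic task for the echo; the `ticks`/`beats` bounds are mine so the sketch can terminate):

```python
import asyncio
import time

async def clock(interval=2, ticks=None):
    i = 0
    while ticks is None or i < ticks:
        print(time.ctime())
        await asyncio.sleep(interval)  # yields to the event loop instead of blocking
        i += 1

async def heartbeat(interval=1, beats=None):
    i = 0
    while beats is None or i < beats:
        print("beat")
        await asyncio.sleep(interval)
        i += 1

async def main():
    # Both coroutines interleave on a single thread, with the event loop
    # playing the role of our hand-rolled while True scheduler.
    await asyncio.gather(clock(), heartbeat())

# asyncio.run(main())  # runs forever, like the live demo
```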
So that's one approach you can take to scheduling, and, although asyncio is of course much more complicated than the twenty lines we just wrote, at one level of abstraction that is what asyncio is doing.

Another approach is preemptive scheduling, and this is what is used by threads. Let's look at how we can achieve the same thing using threads. I have a little file ready here that rewinds us back to the point where we had the clock and echo functions, both written without any regard for concurrency. To make these two run concurrently, all we need to do is import threading and fire off two threads: threading.Thread(target=clock).start() for the clock function, and threading.Thread(target=echo).start() for the echo function. If I run that, it prints the time every two seconds, and whatever I type gets echoed back in uppercase. The two functions are running concurrently, and I haven't had to do anything; no effort required.

That, by comparison, is what preemptive scheduling looks like: you can run any two pieces of code at the same time very easily, almost like magic. The underlying mechanism is handled by the operating system; it would not be possible to implement the threading module in pure Python. The only way the threading module
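The threaded version of the demo, sketched (the bounds and the `read` callable are my additions for testing; the launch lines are commented out because, as in the live demo, they run until interrupted):

```python
import threading
import time

def clock(interval=2, ticks=None):
    i = 0
    while ticks is None or i < ticks:
        print(time.ctime())
        time.sleep(interval)   # blocking is fine: the OS preempts the thread
        i += 1

def echo(rounds=None, read=input):
    i = 0
    while rounds is None or i < rounds:
        print(read().upper())  # blocking is fine here too
        i += 1

# Exactly the live demo -- two blocking loops, zero restructuring:
# threading.Thread(target=clock).start()
# threading.Thread(target=echo).start()
```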
can work is by tapping into deep powers that the operating system has by virtue of being an operating system rather than an application. The operating system is able to freeze running code mid-flow, in which case we say the code was preempted, and switch the CPU's attention to something else. One advantage of this is that, unlike asyncio, it doesn't get stuck on uncooperative code. We don't need to make all our code non-blocking; we don't need to get rid of the sleep statement or the input line, because the all-powerful operating system can preempt any code it wants and focus on something else at any time. If you'd never seen either of these before, you'd probably be thinking right now that threads just look a whole lot easier and nicer.

So let's dig into which of the two is the better technology. The first thing to look at is the overhead of switching from one task to another. Threads have to preempt running code: take code that is mid-flow, freeze it, and then resume the execution of some other code. As you might intuitively imagine, this is a fairly expensive operation. With cooperative scheduling, on the other hand, a switch is just the flow of the Python program continuing: one function completes and control flows naturally into another, or another generator is resumed. It all happens at the application level, and the cost is no more than what you would incur anyway running a Python program and calling a function. So the overhead of switching tasks with asyncio is much lower, and one real-world consequence is that servers built on asyncio can typically handle far more concurrent connections than servers built on threads.

Another thing to consider is how easy the two are to reason about. With threads, although it feels really simple to write some functions and have them run in parallel, under the hood the code inside those functions can get preempted at any time. Let's look at an example of what this means in practice. I have a file with an empty list at the top called out, and two functions: the first appends the string "hello" and then the string ":)" to out, and the second appends the string "za" and then prints the contents of out to stdout. I'm going to run the first function in one thread and the second in another. When I run this program you'll see "hello", ":)", "za", which is nice enough, except that this isn't necessarily what happens. It will happen almost every time, but there is a one-in-a-million chance that it won't: the operating system might begin running the first function, reach its first append, then preempt that thread, run a line of the second function, then jump back. Although threaded code may seem simple, there are lots of different possible execution pathways that you need to think about, and this can make threaded code genuinely difficult to reason about. By comparison, cooperatively scheduled code has to explicitly hand control back to the scheduler, so you don't tend to run into this problem in the same way.
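The demo file looks roughly like this; the join() calls are my addition so the main thread waits for both workers before the program ends. The final contents of `out` are guaranteed, but the order in which the four operations interleave is not -- that is the one-in-a-million pathway:

```python
import threading

out = []

def first():
    out.append("hello")
    out.append(":)")

def second():
    out.append("za")
    print(out)  # almost always ['hello', ':)', 'za'] -- but not guaranteed

t1 = threading.Thread(target=first)
t2 = threading.Thread(target=second)
t1.start()
t2.start()
t1.join()
t2.join()
```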
On the other hand, having given those two points to asyncio, we do have to give threads credit for handling blocking code. Whatever the code is, you don't need to modify it to be non-blocking, as we did earlier; threads handle whatever you throw at them with minimal effort.

Assuming your code isn't blocking, or you can modify it to be non-blocking, it does seem like asyncio is simply the better technology. If we moved to a world where all new concurrent programs were written in a non-blocking way, maybe you could say that asyncio is just better and is going to take over the world of concurrency. So this is the third question we want to look at: does asyncio spell the end for threads? I think a nice analogy is that saying asyncio is going to replace threads is a bit like saying scissors are going to replace the chainsaw because scissors are cheaper, safer, and easier to use. Threads are a really powerful tool: they can preempt any code, blocking or otherwise, at any time, and it shouldn't surprise us that there are drawbacks to that kind of power. The decision between asyncio and threads is like the decision between scissors and a chainsaw. If your code is non-blocking, or can be modified to be non-blocking, or you're writing a concurrent program from scratch, then asyncio probably makes more sense. But there are plenty of cases where that just isn't possible, and you need to reach for the lower-level, more powerful, but harder-to-use tool: for example, when you're working with a blocking third-party library that you don't have the expertise to modify, or with user-contributed code or add-ins that you can't guarantee will respect the requirements of an async architecture.

In fact, I've been saying asyncio all the way through, but you don't even need asyncio to get the benefits of cooperative scheduling. We said before that cooperative scheduling takes place at the application level, and so it is easy to create new async frameworks. Well, that's a bit of a lie; it's probably extremely difficult to create a new async framework. But relative to creating a new operating system, which is what you would need to do to create a competitor to threads, it is easy. This is another really nice property that cooperative scheduling has over preemptive scheduling: preemptive scheduling happens at the operating-system level, where we're unlikely to see much innovation or many attempts to make things easier to use, whereas with cooperative scheduling there is a much richer patchwork of technologies and innovation, precisely because it can be done at the application level. So in addition to asyncio there are many other technologies out there that rely on the fundamental principles of cooperative scheduling, essentially what we demoed earlier. I had wanted to go into more detail on some of these technologies, but I don't think there's enough time; it's such a big topic. Of course I'm happy to chat about any of this during the conference, but two that I want to draw particular attention to right now are Curio and Trio.
If you've been using asyncio and you like it and you're interested in finding out about alternatives, Curio and Trio were both created with the philosophy of taking something similar to asyncio but simplifying it and optimizing it for a better development experience. There's actually a talk on Trio tomorrow, which would be a very nice complement to this one, so look out for that. Reasonable people can disagree about the best way forward and the best framework to use, but it is a great advantage of asyncio, and of cooperative scheduling generally, over threads and preemptive scheduling, that it happens at the application level, and there is therefore a rich patchwork of technologies out there to help you write async code.

That's all I wanted to say today. Thank you very much for coming along and listening; it's an honor to join you at your conference from the other side of the world. The code from this talk is available on GitHub; there's a shortened URL on the screen. I'm happy to take any questions you may have.

[Host] Great, thank you so much. I'm going to read out some questions, because we have a couple. From Adam: how would you rate processes versus threads and asyncio?

[Jacob] Sure. One thought going through my head is that, under the hood, processes and threads work in essentially exactly the same way in the Linux kernel, so you could argue they're the same thing. The difference is the use case: you'd reach for processes if you wanted to actually parallelize your code and take advantage of having more than one processor available. In Python, because of the GIL, if you're using threads then, despite logically having multiple threads of code running at once, they will only ever use one processor; whereas with processes you can literally have two streams of code running on different processors in parallel, rather than what threads give you, which is the impression that things are running in parallel when they actually aren't. So the use cases are quite different. That's the short answer.

[Host] Okay, cool. We had another question about whether asyncio uses threads under the hood, but that's been answered in the chat already, and the answer is apparently no. And a question from James: how difficult is it to integrate a legacy threaded package into an asyncio project?

[Jacob] Pretty difficult, if you actually want to change every function to be non-blocking. This comes back to that last question: asyncio is not using threads under the hood; it uses techniques conceptually similar to the one I demoed. Where I had a while True loop, asyncio has what's called an event loop, which is like my while True loop but much more sophisticated. So it's not built on threads, but there is an API in asyncio, and also in Curio and Trio and perhaps other frameworks, for dealing with threads in a way that plays nicely with the non-threaded tasks running under asyncio. I think it's called run_in_executor. You could potentially use that; it might be the best path forward for migrating threaded code, though it depends on exactly what your objectives are.
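A sketch of the run_in_executor pattern he mentions, with a hypothetical `legacy_blocking_call` standing in for the third-party code you can't rewrite:

```python
import asyncio
import time

def legacy_blocking_call(x):
    """Hypothetical stand-in for blocking third-party code."""
    time.sleep(0.1)  # blocks its worker thread, not the event loop
    return x * 2

async def main():
    loop = asyncio.get_running_loop()
    # Run the blocking call in the default thread pool; other asyncio
    # tasks keep running while we await the result.
    result = await loop.run_in_executor(None, legacy_blocking_call, 21)
    return result
```

Here `asyncio.run(main())` returns 42. On Python 3.9+ the shorter `asyncio.to_thread()` does the same job.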
[Host] Okay, and we have one more question: when things go wrong, what is it like debugging unhandled exceptions in asyncio code?

[Jacob] Asyncio gives you back futures, or future-like objects, I think, but I don't want to talk about something unless I'm sure, so I can't give a very authoritative answer to that. I do think this is an area where the different frameworks take significantly different approaches. It's a perennial problem in concurrency: if there is an exception, how does it get propagated? You don't have the simple model where the exception bubbles up to the top of the control flow, as you would with non-concurrent code. This is one of the benefits of having many different technologies built on cooperative scheduling: they can compete and innovate on the best way of handling this problem, and I believe it's a differentiator that Curio and Trio would point to; they'd say, we deal with exceptions in this way, and that's better for this reason.

[Host] And another question from Adam: how would you rate the current state of the ecosystem of equivalent libraries, equivalent to asyncio in the way you might compare requests and httpx?

[Jacob] Asyncio is pretty mature; it's used in production in many places. And there are others: you have gevent, which uses greenlets under the hood, and which is different but a similar concept, it's cooperative scheduling, and that's very mature; and Twisted and Tornado are mature technologies too. Among things that specifically use the async/await syntax in Python, I think Trio and Curio, perhaps more so Trio, though I don't really know, are still somewhere around the beta stage; I don't think they're widely used in production projects yet; they're up-and-coming libraries. If you want to use the Python 3.5+ async/await syntax, asyncio is probably the most mature, but that might be changing very fast.

[Host] Okay, do we have any other questions from anyone else? Going once, going twice. No, I think that's it. Thank you very much for your talk. More discussion can be had on Discord if anyone has later questions, and the next talk will start at a quarter to. Thank you very much, and see you all later, back in the same room hopefully. Thank you.
Info
Channel: PyCon South Africa
Views: 1,214
Rating: 4.826087 out of 5
Keywords: PyConZA, PyConZA2020, Python
Id: NZq31Sg8R9E
Length: 34min 28sec (2068 seconds)
Published: Wed Oct 21 2020