Raymond Hettinger, Keynote on Concurrency, PyBay 2017

Captions
It's all about concurrency. All right, so I'll do a quick introduction. Hi, my name is Raymond — so nice to meet you. I have a mission in life to train thousands and thousands of Python programmers. I've done that probably about forty-eight weeks a year for the last six years and have generated an enormous number of programmers. If you'd like to contact me for training or whatnot, here's my cleverly obfuscated email address, and my training company is Mutable Minds. For those of you who have access to Safari Books Online, I've got a series of videos available there too, for free — which is an excellent price; it's about twelve hours of training. The only way my wife knows if I've done any good up here is if you all tweet during this, so if I do anything good, go ahead and tweet to her and say "Raymond H. did something okay."

And so my talk is about concurrency. I think it's an exciting topic, a topic that's becoming more important over time, and I have a lot of advice for you on how to use concurrency and what your choices are, and I just want to share that with you. So that's my plan. By the way, I'll publish all these slides right after this — I'll give you a link and you can have all of this, because it's got a lot of notes in it; for a keynote it's rather detailed and rather technical. So essentially: this part over here is where we talk about our goal — why the heck would you want concurrency — and no keynote is complete without calling out Alex Martelli and something that I learned from Alex, so he's in here. Then we'll talk about the hated global interpreter lock, which I think is an irrelevant thing and not a thing that should be hated. Then we'll have a little battle, threads versus processes, and then a little battle, threads versus async. The goal is that at the end of this you have a pretty good idea of when do you use threads, when do you use processes, when do you use async, and the advantages and disadvantages of each. Does that sound like a
worthwhile goal? All right — achievable. And then, if you enjoy that, if you respond to it really well, I've got examples and I can go run code for you, including the incredibly dangerous live examples, where I have a whole bunch of code set up in a whole bunch of scripts and nothing can possibly go wrong with the live demo. Wish me luck. All right, so I'll switch out of slide mode really quickly. What we're going to do is walk through some examples of concurrency using threading, multiprocessing, and async. The idea is to acquaint you with each of those and show the rules and best practices for each.

Which raises the question: why concurrency? By the way, at the end we'll have a very brief question-and-answer session. What I'd like to do is answer all of your questions, and we're going to allocate about one minute for it. How is that possible? Concurrency: I'll have you all ask your questions at the same time, and then I'll try to answer them all at the same time. How well is that going to work? Well, let's saturate your CPU — okay, not your CPU, but the server's. There are limits to what concurrency can do for you. In the end, you have a certain number of clock cycles to go around, and you can spend all of those clock cycles serving requests; when the requests received exceed the number of clock cycles you've got, concurrency can't help you anymore. It doesn't provide more computing power — concurrency is all about taking advantage of the computing power you've got.

So why do concurrency? Well, it improves perceived responsiveness. I've got two people who want to ask a question; they line up nicely behind the microphone; one person asks a question and then another person asks a question. The second person feels like they have to wait on the first person, and they don't get responsiveness. If you can ask both questions at the same time, you get perceived responsiveness. Also, for improved speed — mainly, we want additional speed.
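[Editor's note: the perceived-responsiveness idea just described — answering two askers at once instead of back to back — can be sketched with a thread pool. The `answer` function below is a made-up stand-in for whatever work each question requires.]

```python
import time
from concurrent.futures import ThreadPoolExecutor

def answer(question):
    """Stand-in for handling one question (mostly waiting, like I/O)."""
    time.sleep(0.2)
    return f'answered: {question}'

questions = ['why threads?', 'why processes?']

start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    answers = list(pool.map(answer, questions))   # both askers served at once
elapsed = time.perf_counter() - start

print(answers)
print(f'{elapsed:.2f}s')   # roughly 0.2s total, not the 0.4s of back-to-back answers
```

Neither asker waits on the other, so both perceive a prompt response even though no extra computing power was added.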
Now, when you're using multiple processes, you're taking advantage of multiple cores, and that way you're actually throwing more clock cycles at the problem, and there are certain categories of problems that benefit from this. And then lastly, there's another reason to think about concurrency. I never really thought of this idea until I read the book The Pragmatic Programmer. I thought of concurrency as a last resort, something that you do when you have to, but in The Pragmatic Programmer it was pointed out by Kent Beck that the real world is concurrent. Things are happening right now: if I were to check the news, something's going on in the world somewhere, simultaneous with now. When you're trying to get a project done, you have to coordinate with a lot of people, and they're all working at the same time. The real world works this way, and we want our computer systems to model the real world as well as possible. So people who live in a single-process, single-thread world with no async aren't modeling the real world at all; they're modeling a simplified world. That was three good reasons for concurrency. Who wants concurrency now? Yeah, all right, fair enough.

So what did Uncle Alex have to teach me about scalability? Years ago he went to work at a little company called Google that had a few computers, and apparently they had a task of serving a lot of users concurrently, and he walked away with some opinions about scalability. There are three kinds of problems in the world. There are problems that you can solve with one core. Now, those problems don't sound interesting, but one core now is about a thousand times more powerful than it was back in the mid-1980s, so one core can do a heck of a lot. This did not used to be a very big category of things that could be done, but in fact you can run TensorFlow on one machine and train it in a short period of time to read handwriting or to do some voice recognition or whatnot, and the TensorFlow demonstrations make that appear remarkably easy. Isn't it amazing what you can do with just one little core?
Yes — but data is getting bigger and we want to serve more customers. There's another category of problem that you can serve with 2 to 8 cores. Two to eight is: I've got multiple cores on my machine and they're hyperthreaded, so effectively I have 8. Threadripper came out recently, so pretty soon everybody and their brother is going to be running 12 cores, 16 cores, and 32 cores — many cores are coming. And Alex had a thought about this. He said, well, you can use threads or you can use processes, as long as your problem fits in this range. So suppose my machine has 8 cores as a limit, and I have a problem that only requires 7 cores' worth of computing power. Can I use this machine for it? Can I use multiple threads and multiple processes? In fact I can, because I have enough computing power. But then the thought is: I am so close to the limits of my machine that if my problem just grows by 20%, all of a sudden I'm out of this range and I'm going to need more than 8 cores. And so his thought is, if you have a problem in this range, you happen to be lucky for just that point in time; when your luck runs out, you're going to really wish that you had just jumped up to distributed processing and the Google way — hundreds of thousands or millions of cores at the same time. By the way, do you have to work at a Google in order to get access to that kind of computing firepower? No — you could go on Amazon Lambda; there are lots of tools out there that will let you, for a fee, run your program across a hundred thousand cores in parallel, and even Google will sell you access to that type of computing power. So what Alex had to teach me is that this is a very, very big area: there are lots and lots of great things that you can do on one core, and the distributed space is a very, very rich set as well. And the thought that he recommended to me at one point was: mostly you don't want to work in the middle space, because you just happen to be there temporarily — it's too hard for one core, but I could do it in under 8
— but when the problem grows a little bit, it completely outgrows that range, and the problem will need to scale. And so, as time goes on, the second category becomes less common and less relevant as datasets grow bigger. That said, even when you go to distributed processing, what you'd like to do is take maximum advantage of whatever cores you have — you don't really want to distribute across 100,000 machines but only use 1/100 of their power. So in fact this is still an area of interest even when the problem is bigger than that; distributed processing is basically doing a whole lot of this on a bunch of other machines. Do you all agree? So category 2 is interesting, but you don't want to limit yourself to it.

All right, in every story there has to be a villain. Darth Vader — dum, dum, dum-dum-dum. Who likes the global interpreter lock? Oh, me too, me too. I like the global interpreter lock quite a bit, because if you're going to have to have locks, which would be better: one simple lock that covers all the cases and makes all the rest of your code clear, or thousands of little locks that can individually be messed up and are expensive to individually acquire and release? So, our friend Larry Hastings is working on a project called the Gilectomy, to remove the global interpreter lock from Python. Do you think that's a difficult project? That is an incorrect hypothesis. It takes about a day of work to remove the GIL — he removed the GIL on the first day; it was no problem taking it out. The problem then becomes all the locks that you have to put in everywhere else in order to get Python to function. It turns out that's not particularly hard either, and a few days later he had that done. So the GIL is gone, replaced by lots of other little locks on smaller data structures. Problem solved. Any questions? Oh — are locks expensive to acquire and release? In fact they are. And so the good news is: the GIL is gone, it's free-threaded, and it's dozens of times slower than regular
Python. So you actually get a payoff for the GIL, and the payoff is that you don't pay the performance cost of all of these individual lock acquires and releases. It's actually a really nice thing to have. It gets in the way of us free-threading, but we have ways of solving that problem. If you can't fully free-thread one Python, why don't I run eight Pythons in parallel, each with their own threads? Then it's no problem: I'm taking advantage of all of the cores. Or you can combine threading and multiprocessing — there are a number of ways to go. In fact, at some point most folks just get over "Python has a global interpreter lock," go ahead and saturate all eight cores to a hundred percent, and get full advantage of the machine — simply ignore the problem. There are lots of ways to work around the global interpreter lock. It is not that big of a deal for anyone except for Larry Hastings. Larry hates the GIL; he hates it with a passion. No one in this room hates it with a passion, but he does, because none of you are like him. Larry likes to play games, and Larry likes to hang out with the gaming community, and Python is not popular in the gaming community, because the gaming community is all about "I have one really powerful computer with a bunch of cores, and the more cores I can throw at a problem, the more likely it is I can have clear video and get a clean headshot," or whatever is important to a gamer. Gamers are all about taking one system and eking out every possible clock cycle, so they love threads — and Python, when you do threading, doesn't take advantage of your multiple cores; therefore Python is not popular in the gaming community, and that is a section of the world closed off to us. His thought is, if he can remove the GIL and keep the performance, we can recover that part of the community. So it is in fact a noble effort, and probably something that deserves to be explored.
In the meantime, the rest of us don't have that problem, and because we don't have that problem, it's my contention that the GIL is unimportant to you. Do you agree? How many of you have your work screwed up every day because the GIL is in your way? And so — my point is that if the GIL were removed, it basically... it hardly ever comes up. Okay, there we go, another familiar face. All right, so the global interpreter lock: I'm not going to say it's a non-issue, but I would just say that it has some advantages as well as disadvantages, and that pulling it out currently has a fairly large cost to us, so we need to learn to live with it.

All right, now a little bit about human resources. You guys are engineers; you know nothing about human resources. I know all about human resources: I dated somebody who worked in human resources, and I learned HR-type things. While I'm learning to hack a computer and bend it to my will, they are over at other conferences learning to hack you, and just like I know how to put a computer into an infinite loop, they know how to do that to you. In fact, one day I was having a staffing problem and I thought I'd discuss it with HR — I mean, my girlfriend — and I said, "Ellen, here's my situation; what should I do?" And she pulled out a little HR infinite-loop voodoo and hit me with this: "Well, Raymond, don't you know? Your weakness is your strength, and your strength is your weakness." That's not actionable! What do I do — get stronger? Get weaker? Where do they learn this? They're in conferences right now learning more of these; they've got a long list of them. Anyway, of course, that has nothing to do with engineering, so let's move on to threads versus processes.

What is the strength of threads? The strength of threads, that makes them so awesome, is that they have shared state, and because we have shared state, it is easy for one thread to write to a piece of memory and another thread to read it back, with no overhead in communication cost. Isn't that awesome? What is the weakness of threads? Shared state — because
we've got shared state, we now have race conditions. In fact, if you have a multi-threaded program and you don't have a race condition, you probably didn't need threads to begin with: the whole point of having threads with shared state is that they have cheap communication cost through that shared state — which means you need to put locks around it. So in fact Ellen was right: your weakness is your strength and your strength is your weakness. The strength of threads, the shared state, makes them run really fast; the weakness of threads is that it makes them very, very difficult to get correct.

So let's talk about processes. What is the strength of processes? They are fully independent of each other; they have no shared memory. That makes it easy to kill one process without killing another process, which is really kind of nice, and you don't have to put locks in, because they never step on each other, and there are no race conditions inside them. That is a wonderful strength of processes. What is the weakness of processes? The weakness of processes is that, because they're independent, they don't have shared state, and because they don't have shared state, if two processes are going to talk to each other, they have to take the objects, pickle them, move them across a raw socket or some other medium of transport, and unpickle them on the other side. So they have enormous communication cost compared to threads. Ellen was right: when it comes to threads and processes, your weakness is your strength and your strength is your weakness. Who learned something new?
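[Editor's note: the pickle round trip just described can be seen directly with the standard library — this is what multiprocessing does under the hood when one process sends an object to another. The child ends up with a copy, never the original; the `job` dict is a made-up example.]

```python
import pickle

job = {'task': 'resize image', 'status': 'pending'}

# Sending process: serialize the object into bytes for the socket/pipe.
wire_bytes = pickle.dumps(job)

# Receiving process: deserialize -- this builds a brand-new copy.
copy = pickle.loads(wire_bytes)

copy['status'] = 'done'       # the "other process" updates its copy
print(copy['status'])         # 'done'
print(job['status'])          # still 'pending' -- no shared state, no races
```

The serialize/transport/deserialize cycle is the communication cost processes pay for their independence; threads would have just read the same dict in memory for free.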
All right — async. Who's excited about async? No one used to be excited about async. Once in a while a person would read a book on Twisted, and a team would use a little Twisted to speed up the code in their servers, but it didn't dominate Python conferences: there was always a Twisted talk or two, a Twisted book or two, and a team or two that used Twisted. Then Facebook open-sourced Tornado, and there was a little bit more interest. And then Guido woke up one night and said, "What do I want to do with the rest of my life? I know — I want to live it asynchronously," and suddenly Guido became profoundly interested in asynchronous I/O.

Now, what's the logical thing to do if you want to program a computer? Well, you should go out and learn a computer programming language so you can be a programmer. Does that sound reasonable? Most reasonable people do that. But what if you're an unreasonable person? An unreasonable person looks around and says: there are hundreds of programming languages and none of them are just right; they don't express ideas very clearly; I don't believe in them; nobody else has done it right. That is a very unreasonable position, and in response to all these fools using other programming languages, you invent your own — unreasonable people invent their own programming languages, and all progress depends on these unreasonable people. So what's a reasonable thing to do if you get interested in async programming? Well, a reasonable thing to do is download Twisted or Tornado or one of the many, many, many async packages that are out there that are well tested, well documented, broken in. That's a reasonable thing to do. What would an unreasonable person do — I mean, what did Guido do? He built asyncio from scratch, from first principles. And because he became interested in it, it has become a key part of Python: asyncio is now part of Python 3.5 and 3.6, and it's starting to permeate the rest of the language, which got new keywords, async and await, to support these tools. And because Guido became interested in it, suddenly everybody else woke up and said, "Oh, this must be cool; I should do it too." Do you need async? Are you interested in async? I share that interest too, and a lot of folks are interested in async, but they're not sure what it is, or what the difference is from threads. So I'd like to give you a little model so you can talk about async. This is going to be it — if you know what's in the next few lines, about
200 words — if you know these 200 words, you will be the cool person at all the Silicon Valley parties. That's it: you know about threads versus async, and a whole bunch of people will gather around — "here, have a drink, tell me more." So this is what you need to become popular at Silicon Valley parties. They would say, "John, tell me about threads," and John would say, "You know, threads switch preemptively." What this means is the system decides for you when to switch tasks. This is great and very convenient, because you don't need to add any explicit code: your code will just be running along, and suddenly there will be a task switch, and you don't have to do it yourself, and the world is magically concurrent. Did I paint a really good picture? In fact, you basically get concurrency for free with a thread. You just take something you were doing before, run it in a thread, and poof — now it's happening in parallel. It's actually not much more difficult than that to launch threads; it's almost trivially easy.

Because it's preemptive — preemptive means you're right in the middle of doing something, and then someone else, the thread manager, decides to switch to another thread and later come back and turn you back on — the programmer has to do very, very little to get this going. Is there a cost to this convenience? There is. Because you can be interrupted at any time, you have to assume that a task switch can happen at any time. So if you are trying to arrange things nicely, where two things have to be consistent with each other — I'll update this variable and that variable, and they have to be equal to each other — your problem is that if you update one and get preempted, the other one might not be updated, and you will leave the system in an incoherent state. In fact, that's the reason for the global interpreter lock in Python: as you execute your Python program, global state is constantly updating — which task is running, which line number you are on, which opcode was most recently executed. This global
state is updating, and at any time the thread manager could come in and switch tasks — right in the middle of an update you could switch, and the system would be left in an incoherent state. So what do we have to do? Anything that's important like this is called a critical section, and we have to guard it with locks or queues or some other type of synchronization tool. The idea is: if two things have to happen together, I acquire a lock — which says nobody else should be running right now — do the critical section, then release the lock and let other people run. And so the challenge in multi-threaded programs is to identify all the places where the badness can happen, where you can leave the system in an incoherent state, and put locks around them.

How many of you have ever used a lock before? You've seen examples of them in books, and you might have seen them in an operating systems class at school. The problem with almost every published example I see on how to use locks is that they are way too simple: you've got one little resource, you've got one lock to acquire and release, and because they show you a simple example, it creates the illusion that locks are easy to use. But when you start to put them in larger systems, outside of operating systems, you'll find that if you add enough locks, it becomes insanely difficult to reason about your code — to know whether it will ever deadlock, whether it will starve a process, or whatnot. The dining philosophers problem is the simplest example of a problem that most people have a hard time solving using locks; there are correct solutions to it, just most people don't get to them easily. In other words, we have learned over time that it is insanely difficult to get large multi-threaded programs correct. But at least, if you do all the work, make the proofs, and think it through, it's possible to get them correct. Isn't that good news? And once you've got that, you can retire, right? I made a large, correct multi-threaded program with lots of locks; someone else will maintain it for me, and it will be fine in perpetuity.
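[Editor's note: a minimal sketch of guarding a critical section, in the one-resource, one-lock style the published examples use. Four threads hammer the same counter; the read-modify-write inside the `with lock:` block is exactly the kind of update that must not be interleaved.]

```python
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:          # acquire: nobody else enters the critical section
            counter += 1    # read-modify-write, safe from interleaving
        # lock released here; other threads may now run the same section

threads = [threading.Thread(target=worker, args=(50_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)   # reliably 200000; without the lock, updates can be lost
```

This is the "too simple" textbook case: one resource, one lock. The difficulty the talk describes arrives when a large system has many resources and many locks that must be acquired in a consistent order.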
The problem is, locks don't lock anything. They're called "locks," which is a really, really, really bad name for them. What is a lock? It is essentially a flag, a signal: if someone else checks that flag and sees "Oh, it's locked," they'll not touch that resource — but only if they check. In fact, the lock doesn't lock anything at all. If you set up a lock for access to the printer, what all of the other threads are supposed to do is acquire the lock every time they want to print. What if they forget to acquire the lock — can they print anyway? In fact, that's the case. So even if you have a large multi-threaded program that's correct, it won't necessarily stay correct over time: the tiniest little adjustments to the code can cause it to become incorrect in a way that's hard to see during code reviews. So this technique is fairly fragile, and most people with a lot of experience have learned to develop a natural revulsion, or aversion, to large multi-threaded programs — and that aversion is not because they don't know how to do it; they just know that it is a fairly hard task, and that getting it right once doesn't cause it to stay right in perpetuity. Fair enough?

All right, now: what is the limit for threads? Your limit, as always, is how many CPU cycles you have to begin with — but you don't get to use them all, because there's a cost to task switches and a cost to synchronization. Every time you task-switch, it eats some CPU cycles, and every time you acquire and release locks, that eats CPU cycles too. What Larry found out is that if you take out one big lock and put in lots of small locks, the total cost goes up quite a bit, making Python far less performant. Did we learn something new? So multi-threading: will it give you more hardware computing power than you started with? You're always worse off with threads — it always eats some of the power, so it never adds power to the
system. The question is how much it takes out, and we can rank the heaviness of process switches versus thread switches versus lightweight threads, greenlets, and whatnot. The whole reason for the existence of tools like greenlets is that these task switches are fairly expensive, and greenlets essentially try to get around that by not paying the full cost of the task switches. Is it fair to say that when you're trying to maximize the total CPU power, you're going to throw away some of that power when you start to multi-thread? So should you use threading at all? The answer is yes: if you don't need 100% of that CPU power, threading is actually a pretty reasonable way to go. If, on the other hand, the cost of your threads eats up the CPU power you need, and you need to get it back — there must be a better way: async.

So the difference between async and threads, roughly, is that async switches cooperatively. That means you don't get interrupted at an arbitrary time. What you do is go about your work, and then, when you get to a good stopping point, you go back to the async manager, or event loop, and say: "You know what, you can let someone else run now — I've gotten everything neatened up, all of my state is in a consistent state, and now someone else can start working." To switch cooperatively, you actually have to alter your code, unlike with threading: you have to add an explicit yield or await to cause a task switch. So what's the benefit? Because you control when the task switches occur, you pretty much no longer need locks or other synchronization primitives. (By the way, whenever I make a broad sweeping generalization, those things are always not quite true — in fact, sometimes in the async world we need the equivalent of locks or synchronization primitives. But by and large, a big advantage of async is a lot fewer locks.) Another advantage of async: the cost of a task switch is incredibly low.
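[Editor's note: a tiny sketch of that cooperative switching with asyncio. Each coroutine runs until it reaches its own `await`, then hands control back to the event loop, so the two tasks interleave only at points the code chose.]

```python
import asyncio

log = []

async def task(name):
    for i in range(3):
        log.append(f'{name}{i}')     # do a chunk of work, state consistent
        await asyncio.sleep(0)       # good stopping point: let someone else run

async def main():
    # Run both coroutines on one event loop; no preemption, no locks.
    await asyncio.gather(task('a'), task('b'))

asyncio.run(main())
print(log)   # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2'] -- switches happen only at the awaits
```

Between one `await` and the next, each task is guaranteed an uninterrupted run, which is why the incoherent-state problem from preemptive threading largely disappears.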
Who's ever written a Python function before? Who's ever called one? The process of calling that function is more expensive than a task switch in async. Who thinks that's kind of cool? That means async switches are cheap, cheap, cheap, cheap, because essentially they're using generators under the hood. Generators store all of their state, and to turn a generator back on, we just need to call that generator and say "keep going" — and it takes less time to do that than a function call, because a function has to build up its state, build a new stack frame, on every call, whereas a generator already has a stack frame and picks up where it left off. Is it fair to say that, out of all the techniques for switching back and forth between tasks, this is the cheapest? And not the cheapest by a little bit — it's the cheapest by far.

So if you need some concurrency and you're choosing between threads and async: you start with something that needs, let's say, 75% of your CPU power. You add in threads, and they cost you 25% of that power, leaving you with 50%. But if you put in async, it eats up 1% of your CPU power, leaving you with 74%. So do you have more cycles left if you use async? In terms of speed, async servers tend to blow threaded servers out of the water, and the comparison is: you can run hundreds of threads, but thousands or tens of thousands of async tasks per second — which is amazing. So async is very, very cheap, and one of the reasons it's popular is its low overhead. Does everyone see why folks are excited about async now? No locks — that's cool — and because you don't have locks, it's a lot easier to get your code correct: you just switch whenever you've got all your ducks in a row, when all of your state is consistent, and you don't have to worry about arbitrary interruptions. So the coding is a lot easier and the speed is faster. Isn't that awesome? How many of you love async now? Easier to get right than threads, and much, much faster and lighter weight, and it can handle enormous volumes. Are there any disadvantages?
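[Editor's note: the "generators under the hood" point can be sketched with a toy round-robin scheduler. Resuming a generator reuses its existing stack frame, which is why these switches are so cheap. This scheduler is a teaching sketch, not asyncio's actual event loop.]

```python
from collections import deque

def countdown(label, n):
    while n:
        yield f'{label}{n}'   # suspend here -- a voluntary task switch
        n -= 1

def round_robin(tasks):
    """Resume each generator in turn until all are exhausted."""
    queue = deque(tasks)
    trace = []
    while queue:
        task = queue.popleft()
        try:
            trace.append(next(task))   # picks up exactly where it left off
            queue.append(task)         # back of the line for another turn
        except StopIteration:
            pass                       # task finished; drop it
    return trace

trace = round_robin([countdown('a', 2), countdown('b', 2)])
print(trace)   # ['a2', 'b2', 'a1', 'b1']
```

Each `next(task)` is the "keep going" call the talk describes: no new stack frame is built, the suspended frame is simply resumed.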
Well, there's a little disadvantage in that you have to say yield or await every now and then — you have to do the cooperation part — so you do have to add a little bit to your code, but that's not very difficult. Is there any other downside? Yes: every single thing you do has to be non-blocking. You can't just read from a file anymore; you need to launch a task to read the file, send out that task, let it start reading, and when the data is available, go back and visit it and pick it up. You can't even use a plain file read anymore — you'll have to use an async version of that read — and pretty much everything that blocks, including sleep, you can't use the regular one. In fact, you need a giant ecosystem of support tools, and this dramatically increases the learning curve. To start up threads, you say thread-start and you're done. With async, you need to load an event loop of some sort — Curio, asyncio, Twisted, etc. — you'll need to switch all of your calls to non-blocking calls, and then put in the async and await. In other words, the learning curve on this is enormous. I can teach people the techniques of threading — to thread reliably — in just a few hours. I can teach people in a few hours to use multiprocessing correctly and get all the benefits out of it. But I think it takes days to teach a person to use async correctly. Which is not to say that you can't cut and paste an async example and have it work, but if you're going to debug it — and certainly if you're going to get into the event loop itself — you have to know a lot. And if you don't appreciate this, go look at the documentation for asyncio in 3.6, and then look at concurrent.futures, and when you see the entire ecosystem, you'll realize: I thought I knew Python, but in fact there's twice as much Python as you know — the other half of it is async. And async is continuing to grow; it is reaching its tentacles out throughout the entire language.
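[Editor's note: one standard escape hatch for the "everything must be non-blocking" rule is to push a blocking call onto a thread pool with `run_in_executor`, so the event loop stays free. A sketch — the `blocking_read` function is a stand-in for a real file read or database call that has no async version.]

```python
import asyncio
import time

def blocking_read():
    """Stand-in for a blocking call with no async equivalent."""
    time.sleep(0.2)
    return 'file contents'

async def main():
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    # The blocking call runs in a worker thread; meanwhile the event loop
    # is free to service other coroutines (here, a concurrent sleep).
    data, _ = await asyncio.gather(
        loop.run_in_executor(None, blocking_read),
        asyncio.sleep(0.2),
    )
    return data, time.perf_counter() - start

data, elapsed = asyncio.run(main())
print(data, f'{elapsed:.2f}s')   # the two 0.2s waits overlap instead of adding up
```

Note the trade-off the talk mentions: wrappers like this reintroduce threads behind the scenes, so they carry some of threading's cost rather than making blocking calls magically cheap.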
Anything in the language that doesn't fit well with async is about to get a parallel version that will be asynchronous. There are context managers in contextlib that don't play nicely with async, so we're going to get a whole new set of context managers that are async-aware. Many, many tools in Python will get a non-async version and an async version, and it might, in the end, double the size of the language. So is there a little downside to it? There is — but there's also a nice payoff.

What is my belief about what will win in the future? I think async is the future. Threading is so hard to get right, and it is so expensive; and I think as the ecosystem gets better and better, async will get easier to use, the best practices will become known, and once you learn those patterns you can get up and running with it pretty quickly. If you were at the talk last night, Łukasz probably told you that they had a great deal of success with it at Facebook: there is some on-ramping time, but once people have crossed that on-ramp, they can do quite a bit.

So here's the comparison. Async maximizes your CPU utilization. Why is it better than threads? Less overhead. Threading has the advantage that it works with existing code and tools. So if you have a lot of libraries and a lot of existing code, and suddenly you want to be concurrent, which will you choose — threading or async? It's not intended to be a hard question, but it is an important test question, and the person being tested is me: if you don't get the right answer, it means I failed to communicate a very important point. You have a lot of existing code that you wrote and existing libraries you want to continue to use, and you want to become concurrent. What do you use — threading or async? Ready? Threading — because with async you'll have to almost completely retool: every single thing that blocks needs to get a non-blocking version. That said, there are some tools being written to wrap around blocking calls, run them in another process, and give them kind of an async-y feel, but those tools also wrap things in a way that's fairly expensive and incurs all the
So in general, for a complex system, async is profoundly easier to get right than threads, but threads require very little retooling: you just throw in some locks and queues and you're done, while async requires an enormous ecosystem of futures, event loops, and non-blocking versions of everything. Who learned something new? If you can tell this to another person, and by the way I'm giving you these notes so you can go read this again, if you can say this to another person you are now qualified to make decisions on your team: should we use async, should we use threading, should we use multiprocessing. These are the core considerations. So I could just say I'm done, but they're going to give me more time and they haven't kicked me off the stage, so should we go look at some code? All right, what could possibly go wrong? Nothing could go wrong, because I have the code in the slides and can always fall back on a static demo. By the way, I haven't even tested the internet connection here because I had Wi-Fi issues this morning, so I'm now jacked in, so maybe that part of the demo will run. I've got two simple examples for you. One is: I have a global variable, a counter. I'm going to print 'starting up', loop ten times, increment the counter, print the count, print a little bar, and after I've printed that ten times I'll print 'finishing up'. Let's see what that example looks like. Okay, that's in Python 3.6, pretty basic, not terribly exciting output. This is an easy program that high school students should be able to write after their first few hours of Python training; you need very little Python skill to write code like this. It's a beginner problem, easy peasy: make a global variable, print 'starting up', increment the count in a loop, print out the result of the count, print 'finishing up'. How many of you consider that to be easy beginner code? I agree.
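The single-threaded counter program he describes might look something like this minimal sketch (the exact names and output strings are assumptions, not the actual slide code):

```python
counter = 0  # global shared state

def main():
    global counter
    print('Starting up')
    for _ in range(10):
        counter += 1                     # increment the shared counter
        print(f'The count is {counter}')
        print('---------------')
    print('Finishing up')

main()
```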
It says: take a list of websites, loop over those sites, open the website, read it, and once you've read the web page, get the length of the page in bytes and print out the URL and the length of the web page. So what this does is get the sizes of the home pages of all of these websites, and you'll learn that the Yahoo page is enormous and that the PyPI page is very, very small. Okay, I consider this to be beginner code. You do still have to teach a person about packages; in Python 3 we have to put .request after urllib, so the import is a little bit more annoying in Python 3, though it is kind of convenient that you can use the with statement so it'll automatically close the URL and release the underlying socket. I believe that the skills necessary to write this can be taught to a person who knows no Python; all the skills can be taught in under an hour. How many of you agree that this is beginner Python code, an easy thing, and easy to get right? Now, can you get paid lots of money to write code that's this easy? Heck no. Obviously I can't pay you a lot of money for this. My daughter is only in fourth grade and she can write that code after only one hour of Python training. You call yourself an engineer? My kid can do it, and she's still coloring and watching cartoons. You can't get paid for this sort of thing. I know what you're thinking: there must be a harder way. How can we make this hard so we can get paid for it? Well, there are three ways: you can thread it, you can multi-process it, or, in a worse way, you can thread it and multi-process it at the same time, or you can also do async. So let's do threading first. The scripting style is the one we saw, with the obvious output. The function style is to say: I'm an advanced programmer, I'm going to take this part and factor it out into a function, and so it would look like this. Your worker has the sole job of incrementing the counter and printing the count. All I did was factor out those three lines of code and put them in a function.
Very professional: the kids weren't doing that after one hour of training; I write code with functions that are reusable. And now I go to multi-threaded. Is multi-threading easy or hard? It's easy to add to existing code, because you don't have to retool; all you have to do is change one little piece. Instead of saying worker open-paren close-paren, all you need to do is target the worker and start a new thread, and presto, concurrency, by changing only one line of code. Is threading impressively easy? In fact, that's the case. By the way, I'm a professional, unlike these kids: before I ship the code I'm going to test it. Fair enough? So I go to run it, and I test to prove that the code is correct, and in fact it gets the answer I want. You're all thinking I'm cheating because it's in a slide, but no, I have an electronical computer here and we will run the function version of it: threading single. So this is the code that just ran; it's a little tiny on the screen, I can probably make this part bigger, there we go. All right, and the next version up was threading multi 1, the one we just had on the slide; the part that's different is this part here, and I'll go run it. There: it prints 'starting up', the count goes up to ten, and 'finishing up'. So what I've proved to you is that multi-threading is easy and that you can test to make sure your code is correct and it's ready to ship. I'm hearing sound effects like some of you don't believe me. Can you spot the race conditions? What is the race condition in this one? That's exactly it. Almost every single person I've ever taught can instantly spot this one: in between looking up the count and writing the count, another thread could run and have come in and updated the count. You can have two threads each reading the same count, updating it, and writing it out, and the consequence of this is we won't get up to ten.
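The threaded version he demos is roughly this sketch; the one-line change is swapping the plain worker() call for a Thread. Note that his buggy slide version does not join the workers (which is itself one of the race conditions); joins are added here so the example finishes deterministically, and the names are assumptions:

```python
import threading

counter = 0

def worker():
    """Same three lines as before, factored into a function."""
    global counter
    old = counter                        # read ...
    counter = old + 1                    # ... then write: the race window lives here
    print(f'The count is {counter}')
    print('---------------')

print('Starting up')
threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()                            # the one-line change: start a thread
for t in threads:
    t.join()                             # added so 'Finishing up' comes last
print('Finishing up')
```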
Which raises the question: why didn't we observe that effect when we tested it? The answer is: this happens so fast that it is unlikely that a task switch will happen in between the read and the write here. So this might run correctly a billion times before it ever fails, but in fact there is a bug there, and it will be quite difficult to observe. Also, the print itself is a race condition, because you have the main thread trying to print 'finishing up' before the others have printed. However, I would only see that in Python 2.7; I don't see it in Python 3.6. Does that mean that the bug is gone? The race condition is still there; the task-switching logic changed in Python 3, so whenever you print, it task-switches right away. So it takes this bug that used to be visible and makes it invisible during testing. Nice improvement: we don't like seeing bugs? Solvable problem, we won't show them to you. So in fact it is problematic that we tested this and it appeared to run correctly. Testing cannot prove that the code is correct. That is probably one of the most important lessons of all multi-threading: you need proof. That said, there is a way to undo some of the effects that made it invisible, and the technique is called fuzzing. The idea is: every time I call fuzz, I'll put in a random amount of sleep, and you put fuzz in pretty much all of the places where you would be putting an await or a yield in async code. But you have to put it everywhere, because while async only yields control at the points you choose, threading can task-switch at any time. So I'll put a fuzz between every step: looking up the old counter, doing the increment, doing the print, doing the other print; I put a fuzz in between every one, and a fuzz between launching the threads and finishing up. Fuzzing is a technique for amplifying the problems so that they become visible, so if you are going to test, it's a perfectly reasonable way to do it. So this is threading multi 2, the one with the fuzz in it.
With the fuzz it should run a little slower. We'll see 'starting up' and 'the count is one'... oh, there are bugs all over the place: 'the count is three' came in twice. Keep in mind the code itself has not changed; all I've done is put in arbitrary time delays. These bugs were always there; this output was always a possibility. So if someone tells you they tested their multi-threaded code, does it make you feel safe? They tell you: I tested the multi-threaded code that's being used to land your aircraft. So, everybody's excited about the Internet of Things except for me. I am not excited about the Internet of Things, for this reason: I see bugs in code all the time, and when there's a bug in the code on my machine or on some website, the consequence to me is, well, the website looks funny or my shopping cart gets emptied and there's something I have to go fix; the consequence is basically nothing. When there's a bug in the logic for my self-driving car, it's going to be bad for the person standing in front of that car. Fair enough? So as the Internet of Things gets closer to us, I would like us to develop a higher and higher aversion to multi-threaded code, because it's so difficult to get correct. If someone is sending a self-driving car aimed at me, I would much rather that car have been programmed with async code than threading code, fair enough? Just for reliability: even if you took all of the performance reasons away, it is profoundly better in terms of your ability to look at the code and see that it's right. Who learned something new? All right, I talked to a couple of people yesterday from Mozilla. Where are my Mozilla people? Over here, all right, tell me if you know this person. You do. So this is a fairly famous photo going around the internet: a fairly tall person at a standing desk, and just over his head there's this little sign. Every office needs that sign posted at about that level, because a lot of people think they're qualified to write multi-threaded code and they're not.
Are there ways to solve the problem? Can you write it correctly? Yes: you can be more careful and use atomic message queues. We have one built into Python, the queue module, but there are lots of atomic message queues out there; effectively your email is an atomic message queue. So we can fix up the code and actually make it correct with queues. I've put in here some advice for you, Raymond's rules of coding. I won't spend a whole lot of time on them because you can read them later, and I've got an example. One category of problem is that step A and step B need to happen sequentially. What's the solution? Put them both in the same thread, and then they will be sequential again; it's an easy thing. Another one is the concept of a barrier. A barrier is: you have several parallel threads launched and you want to make sure that they're all complete. There's a simple way to do that: you do a join on all of the threads. Keep in mind every multi-threading problem has a corollary in the real world: you have five programmers each working on a different part of your website, and you can't go live until they're all done. So I launch five threads, programmer one, two, three, four, five, and they all start working, and then I do a join: tell me when you're done. Now wait, now wait, now wait; you may happen to already be done, but I wait until you answer yes. Join, join, join, and after five joins I know that the website's done and I can go live. So 'go live' follows five joins; it's a simple thing, and this is called a barrier. What about daemon threads? What does that mean? A daemon thread is a thread that's never supposed to finish; it's a service worker. Every time you ask it to do a task it goes and does something for you, and it never finishes; it sits and waits. The printer in your office is typically running a daemon thread: the printer never turns off, it just waits for somebody to send it a print job, and when it finishes the print job it doesn't turn off, it never returns.
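The barrier idea from the five-programmers example, joins on all the workers before going live, can be sketched like this (the worker body is a stand-in):

```python
import threading

results = []                             # list.append is atomic under the GIL

def programmer(n):
    results.append(f'part {n} done')     # each worker builds one part of the site

workers = [threading.Thread(target=programmer, args=(i,)) for i in range(5)]
for w in workers:
    w.start()

# The barrier: five joins. We cannot go live until ALL of them are done.
for w in workers:
    w.join()
print('Go live!')
```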
So for daemon threads you can't do a join on them, because they never finish; the printer never turns off. So what do you do? You do a join on the message queue rather than on the thread itself. Who designed that API? Who's standing in front of you on stage? The only person. Okay, so that was me; I created that one. Before that, the only way to figure out if all the tasks were done was to put in a poison-pill message: I've asked you to do ten things, and the eleventh thing is I want you to send me a notification back that you've done everything else. So you had to have two message queues, one in and one out, and a poison pill to drop in and a poison pill to get back. Now you can just use join, which says: wait for all the tasks to be done. All right, Raymond rule number four: sometimes you need global variables to communicate between functions. You may think global variables are terrible, but in fact one of the reasons for using threading is so that you can use global variables, the shared state. The problem, though, is that something that works in a single-threaded program can be a disaster in multi-threaded code. You can wrap locks around it, but there's a better way: in the threading module you can mark it as local, and that means each thread has its own copy of that global variable. The decimal module does this, so when you set a decimal context, it's only for the current thread and it's not going to change the context for any other threads. Wow. Pretty much anything that could potentially be parallelized that has global state, you should wrap in threading.local. All right, and this one's kind of important, because I don't know why people don't seem to get this, but it happens all the time and you'll see it as a highly voted question on Stack Overflow: there are some mass murderers in this room.
There are those of you who hate threads and want to kill them, and you're always asking me: how do I kill threads? You call me up as a consultant and say, I want to kill a thread, and I say there's no public API for killing a thread, because it's a bad thing to do. Now, if you really, really want to kill a thread and I tell you that I'm not going to tell you how, what are you going to do? Go look it up on Stack Overflow. And Stack Overflow will show you how to do it: you can load the ctypes module, reach into the internals of the Python C API, and just make a call, and it kills the thread dead, dead, dead, instantly. By the way, if we wanted you to do this, would we have put in a function to do it? If Java wanted you to do it, would they put in a function to do it? Well, they did, and then they took it out. You know why? Because people used it; they went around mass-murdering threads. So why would you ever want to kill a thread? Remember, the whole reason for using threads is that you've got shared state, and if you've got shared state you've got race conditions, and you manage those race conditions through a lock. So when you want to modify the state, you acquire the lock, modify it, and release it. What if you get killed between the acquire and the release? When you kill a thread, you have no idea whether it's got a lock or not. If it's acquired a lock and you kill it, every other thread that's ever going to wait on that lock instantly deadlocks; it's a disaster. So I keep a log of all the consulting calls I get, and there's a pattern. January 15th: got a call from a friend at Cisco, who happens to be here. Raymond, how do I kill a thread? Don't go killing threads; there's not an API for that, don't do that. March 15th I get a call: we have a problem where we're getting deadlocks. I look back over my log and I know the cause. Should you be in the thread-killing business? No. If you want threads to die, you need to plan for it in advance: you need to have that thread periodically check a message queue or a global variable that says stop doing your thing, and then the thread itself can release its own locks and exit gracefully.
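The graceful alternative to killing a thread, having it periodically check a stop signal, might look like this sketch (using threading.Event as the signal; the sleep is a stand-in for one unit of work):

```python
import threading
import time

stop_requested = threading.Event()       # the polite alternative to killing

def worker():
    while not stop_requested.is_set():   # periodically check for the stop signal
        time.sleep(0.01)                 # stand-in for one small unit of work
    # on the way out, the thread can release its own locks and clean up

t = threading.Thread(target=worker)
t.start()
stop_requested.set()                     # ask the thread to finish
t.join()                                 # it exits gracefully on its own terms
print('worker alive?', t.is_alive())
```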
It takes some extra planning in order to do this. That said, in the context of that particular call they didn't have that option, because this was a large system where the threads were being written by other programmers, not experienced programmers, and if they had a bug we wanted to kill their thread; the problem is they could deadlock the entire system. What's the solution? Don't use threads for this kind of thing; use processes. We like processes because you can kill them. Fair enough? Who learned something new? All right, so applying all five of those rules, I implemented an atomic message queue version, leaving the fuzz in, threading multi 3, and you'll see the fuzz is still there, there are still the time delays, but it's going to get the correct answer; that's just the application of the five rules to this code. I'm giving you the code, so there's no real reason to study it now, but I will show you the clean version. In the clean version I take the fuzzing back out, and you'll see it's not actually that complicated. What I've done is taken the counter and isolated it in its own daemon thread: there is now a counter manager. The other threads never update the counter; they just send a message to the counter manager, hey, I want you to update, and the atomic message queue does them one at a time. We're isolating the resource; that was one of Raymond's rules. Another was: if you want things to happen sequentially, put them in the same thread. So after the increment, we send a request to print out the change; that guarantees those two things happen sequentially. The printer is in its own daemon thread, and we communicate with it through a message queue: it gets one print job at a time and prints it. Now every time you want to print, you don't print directly; you send a message to the printer queue and say 'print this', just like you do with real printers in your office: you never access the printer directly, you always send in a print job, and then it atomically does the jobs one at a time.
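The isolated-printer pattern just described, a daemon thread that owns the resource, fed by an atomic message queue and waited on with a join on the queue itself, can be sketched like this (the job list is a stand-in for real print jobs):

```python
import queue
import threading

print_queue = queue.Queue()
printed = []

def printer():
    """Daemon thread: the only code that ever touches the 'printer'."""
    while True:
        job = print_queue.get()          # atomically take exactly one job
        printed.append(job)              # stand-in for actually printing it
        print_queue.task_done()

threading.Thread(target=printer, daemon=True).start()

# Everyone else submits jobs; nobody prints directly.
for i in range(5):
    print_queue.put(f'print job {i}')

print_queue.join()                       # join the QUEUE, not the daemon thread
```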
We launch the workers, and we wait for the workers to be done with a join, which was rule three. After all the workers say we're done, we say we're finishing up. By the way, we have to do something else, because the workers have launched other tasks; we have to wait for those to complete as well, which is also done with a join. If you do anything less than this, your code is incorrect. Did we just take an easy problem that was solvable in about six lines of code and make it hard? In fact we did. The original code is very, very small, but the new, correct code, even cleaned up, is much, much longer. This careful threading with queues, even without the fuzzing, is bigger and requires more skill. That said, it's correct, it's beautiful, and it's much simpler with queues than the alternative. I know what you're thinking: there must be a worse way. There is another way to solve our problems, and it's with locks. Locks were invented to solve race conditions, and people tend to reach for locks first, because when you read about race conditions the first thing someone shows you is a lock. You should never show anyone locks unless they're writing operating systems, because locks are hard to use in bigger systems. So let me show you the same thing using locks and some threading. I'll get the version with locks, the clean version without the fuzzing, and you will see that it is still very short. There are still a couple of joins in there, but instead of message queues we're using the with statement: doing 'with printer_lock' and then the print acquires the lock, does the print atomically, and then releases the lock. All of this is done in the context of holding the counter lock. Most people who learn about locks and attempt this problem don't indent this one under the other one, and they leave a race condition in there.
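The lock version with the correct nesting, the print lock held inside the counter lock, might look like this sketch (the lock names are assumptions):

```python
import threading

counter = 0
counter_lock = threading.Lock()
printer_lock = threading.Lock()

def worker():
    global counter
    with counter_lock:                   # everything below holds the counter lock
        counter += 1
        with printer_lock:               # note: indented INSIDE the counter lock;
            print(f'The count is {counter}')   # un-nest it and a race slips back in
            print('---------------')

with printer_lock:
    print('Starting up')

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with printer_lock:
    print('Finishing up')
```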
Because most engineers I teach are already practicing engineers, real coders solving real problems, and they almost all make that mistake, that tells me that people don't reason well about locks, even for simple problems. But it does solve the problem, and I can demonstrate it over here: threading lock, clean version, and it gets the right answer every time. Is it possible to make correct, clean code? In fact, that's the case. So given that this is possible to do with a little bit of training, how many of you like the lock approach? Well, even if you ran across code like this, there's still something not to like about it. It is beautiful, the with statement makes it beautiful, and it's not that hard to reason about once you get it correct, but there's something else wrong with it. What was our whole reason for starting threads to begin with? What do we want? What's the subject of this talk? Concurrency. Here's the last of my rules: if you put enough locks into a program, eventually you throw away the concurrency and it actually executes sequentially. This program is now fully deterministic and it runs the same way every time; it actually is isomorphic to the original program. So it is slower and more complex than the original, but with none of the advantages. And people don't find this out early on: they think a problem is going to be simple, they make it threaded, they get a race condition, they start putting in locks, they've got bugs, they start putting in more, and by the time they get all the locks in, they realize: hey, we're actually slower than we started out to begin with. Am I a huge fan of locks? I am not. Note: locks don't lock anything. They're just flags, and they can be ignored; even though there's a print lock, you can print anywhere in the program. They are a low-level primitive and they're difficult to reason about; message queues are a lot easier to reason about. And the more locks you acquire, the more you lose the advantages of concurrency and the more sequential your program becomes. That is the world of threading.
I know what you're thinking: there is a better way. The better way is multiprocessing, and so this is the script we showed before that loops over all the websites. I'm actually curious if it runs right now; whether it runs or not is primarily dependent on whether the internet connection here is working. MP single... okay, no internet even though I'm plugged in, not cool. All right, I will not demonstrate the program because I am not connected to the internet. Had I run this, it would take about 25 seconds to loop over all of these sites, get all their sizes, and print them out. It takes forever, but the code is correct and it's not broken. That said, you will get a complaint from a user that makes it sound like it's broken, and the words they will use are: it hangs. What does hang mean? Hang doesn't mean broken; hang means taking a long time to do something when you want quicker results. I have this bug at my home all the time. I'm trying to get my son ready for school. It hangs. Go put on your shoes. It hangs. Get your backpack. It hangs. Hangs means it takes a lot longer to respond than I expect. Am I improving my code by moving the body out into a function that gets a site's size? The good news is I've used a professional programming technique and it's reusable; the bad news is it still hangs. So we ask ourselves: can some of this be parallelized? The benefits of concurrency come only for the parts that are parallelizable, and the important realization is that not everything is parallelizable; some things are intrinsically sequential. The classic example is baby-making: it takes nine months to make one baby. You put five workers on the task, it does not take five times less time; you don't put nine workers on the task and get a baby in one month. So you hit a point of diminishing returns very quickly for additional workers on the task. But then there are some things that are parallelizable, like mowing a lawn.
If you get two people mowing a lawn, it takes about half the time; not exactly half, because there's some overhead and coordination between the two, but you can get real improvement. Most problems are on a sliding scale between baby-making and lawn-mowing, and this is quantified in something called Amdahl's law. If you want to pass your coding interviews, be sure to mention Amdahl's law any time someone mentions concurrency. I've stated it here, but basically it says there's some portion of a task that will benefit from running in parallel and some portion that's intrinsically sequential, and Amdahl's law has a formula that combines the two and tells you the maximum potential benefit. So in this case I take a look at what's going on here and ask: can I parallelize it? Well, what it's doing internally is making a DNS request for the URL, then it has to get the response, then it acquires a socket, then it makes a TCP connection, then it sends an HTTP request, then it waits for the response and gets all the packets, then it counts all the characters on the web page. Is this sequential or parallelizable? The notes say non-parallelizable, because you can't make a TCP connection until you know the IP address, you can't know the IP address until you've sent a DNS request, you can't get the results until you've sent a request, and you can't count the page characters until you've got them. That said, you can count the characters in parallel: as you get a packet, count the characters inside. So there is a little bit of parallelization available in here, but this task itself is, I'd say, 95% baby-making and only 5% parallelizable, and so basically not worth all the work it would take to parallelize it. That said, what if you want a hundred babies? Well, you can't get any one of them quicker than nine months, but with 50 workers you can get it done in 18 months, and with a hundred workers you can get it done in nine months. And in fact that's what we're going to do here.
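Amdahl's law as described above can be sketched as a one-line formula: if fraction p of the work parallelizes across n workers, the maximum speedup is 1 / ((1 - p) + p/n). The example fractions below are illustrative assumptions:

```python
def amdahl_speedup(p, n):
    """Maximum speedup with n workers when fraction p of the work parallelizes."""
    return 1 / ((1 - p) + p / n)

# Fetching one web page is mostly sequential: roughly 5% parallelizable.
print(f'{amdahl_speedup(0.05, 8):.2f}x')   # barely faster than one worker
# Mowing a lawn is almost entirely parallelizable.
print(f'{amdahl_speedup(0.95, 8):.2f}x')
```

Note that as n grows, the speedup is capped at 1 / (1 - p), which is why extra workers on a mostly sequential task hit diminishing returns so quickly.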
We actually want a dozen different babies, and so we use multiprocessing or a thread pool, and this will take the total time down from about 25 seconds to about two seconds. The speed that this runs at is now the speed of making the slowest baby: it turns out the slowest website here, Ars Technica, is the one that takes the longest to load, and it determines the total running time. Your program can't finish until it gets a response, and so our speed is now governed by something external, which is how fast the data can be sent to us. This code is still pretty simple and beautiful; it's also very easy to get right. Are you liking multiprocessing? Well. And the one I saved for last, that I'm almost out of time for... actually I've got two things. One is combining threading and forking; let me cover this real quickly. We get bugs submitted to us all the time, and I've linked to one of them if you want to take a look at it; it is basically code that works like this. This was a summary of the bug report once condensed down: someone said, I ran this and it hung. They've got a thread pool executor and they're using multiprocessing, so they're using the two techniques together, and it hangs, it deadlocks, it never finishes: Python is broken. I'll quote Tim Peters, and I should actually put quotes around this: those of you who believe that you can mix threading and forking are living in a state of sin, and you deserve whatever happens to you. I'm a little more gentle about it. I'd say if you're going to mix threading and forking, there's a general rule: thread before you fork, not after. The problem is, if you thread first, when you fork all the threads get copied and they share the same locks. Yes? Ah, okay, yes, I will repair that. No: thread after you fork. I read it, the words are right on the page, I said it wrong; I don't need to fix the slide, the slides are correct. Thank you, five bucks. And a familiar face: someone was at the workshop yesterday. How was it?
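The thread-pool version of the page-size script described above might look like this sketch (the site list and pool size are assumptions; this uses multiprocessing.pool.ThreadPool, one of several pool APIs that fit his description):

```python
from multiprocessing.pool import ThreadPool
import urllib.request

sites = [
    'https://www.yahoo.com/',
    'https://pypi.org/',
]

def site_size(url):
    """Return (url, size of the page in bytes)."""
    with urllib.request.urlopen(url) as u:
        return url, len(u.read())

def main():
    # The pool fetches pages concurrently; total time is roughly the time
    # of the single slowest site rather than the sum of all of them.
    with ThreadPool(10) as pool:
        for url, size in pool.imap_unordered(site_size, sites):
            print(f'{url:40} {size:12,} bytes')

# main()   # requires network access
```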
How could people have possibly known? I didn't tell people what was going to be in the workshop, just like I didn't tell them what was in the keynote; people came anyway. What's it worth, showing up? All right, ten bucks. Now, by the way, on the front page is my contact information if you want me to conduct training for you, and I do have free training videos for those of you who have Safari online. They're about to boot me off the stage, but I would like to talk briefly about this mountain of code. I've made for you an asynchronous server. I don't think an example like this exists anywhere on the web; there are very, very complex event loops out there, but there's nothing that shows from core principles how you use async. The part at the top is a server that I wrote from scratch; I made my own miniature version of Twisted. Let's concentrate not on that but on the user's business logic at the bottom. The user's business logic is: they have an announcement function that prints, and they want it to run every 15 seconds. They also would like to run a server that asynchronously receives connections from multiple sources, and a person can switch to a title-case mode or an uppercase mode, and every character they send in either gets title-cased or uppercased, and we want to handle multiple users. What makes this asynchronous? Well, we say async here, and everywhere we would have blocked we use a non-blocking version of readline and put in an await, and that's all there is to it. Otherwise this code looks almost exactly like the single-process, single-thread version. This is fairly easy to write and fairly easy to get correct, so this is working code for those of you who want to experiment from first principles with writing your own async and await. And here we go: on the upper left is the code we just looked at; in the little server, notice that I've set it to non-blocking mode and I'm using select between the sessions. On the bottom left I'll turn on the server, okay, and it's now waiting on localhost.
In the upper right I'll go talk to it: I'll telnet into localhost 9600. Okay, on the bottom left it says it received a connection, and over here it sent down the message: we're starting in the uppercase mode. I say 'I love Python' and it responds in all uppercase. Meanwhile, down below, someone else telnets into localhost. I'm simulating multiple clients here; not even simulating, these actually are multiple clients. It's in uppercase mode and says 'well Ruby is nice too', okay, and it responds to that one; each one got their own response. But up at the top I can send in 'title' and it will switch to a title-case mode, and I'll give you a big Texas 'howdy'. Note the uppercase H but the lowercase other letters, while a 'howdy' at the bottom is still all upper. In other words, every user has their own individual state, and this was easy to write from the single-threaded mode: all I had to do was switch to non-blocking, put in async everywhere, and everywhere I would have waited on a result, put in an await rather than blocking. But I can't use the regular readline; I have to use the one provided up above. The reactor, or the event loop, is essentially an infinite loop that says: if anything came in on any socket, trigger a callback; also check the heap of events to see if there's a scheduled event and, if so, run that task. This code looks very simple, and if you look at the code for Tornado, it looks almost exactly like this: the heart of almost every event loop has code almost identical to this. The part that is different is that the async I/O I've been simplistic about here is not handling all the different types of futures and not putting in error-handling callbacks, and for that asyncio uses concurrent.futures. So asyncio is basically all of this code at the top on steroids: dozens and dozens of non-blocking things plus concurrent futures. So, to wrap up: thank you very much for inviting me to your conference. [Applause]
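Raymond's demo server is hand-rolled from select; a rough equivalent of the same business logic using the modern asyncio streams API might look like this sketch (the port, banner text, and command names are assumptions):

```python
import asyncio

async def handle_session(reader, writer):
    """One client session; each connection keeps its own mode state."""
    mode = str.upper                         # per-session state: starts in upper
    writer.write(b'<session opened: starting in upper case mode>\n')
    while data := await reader.readline():   # await instead of a blocking read
        line = data.decode().rstrip()
        if line == 'title':
            mode = str.title                 # switch only THIS session's mode
        elif line == 'upper':
            mode = str.upper
        writer.write((mode(line) + '\n').encode())
        await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle_session, 'localhost', 9600)
    async with server:
        await server.serve_forever()         # one thread, many clients

# asyncio.run(main())   # blocks, serving clients until interrupted
```

As in the talk, every await marks a place the single-threaded code would otherwise have blocked; the event loop interleaves the sessions at exactly those points.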
Info
Channel: SF Python
Views: 122,483
Keywords: PyBay, python, concurrency
Id: 9zinZmE3Ogk
Length: 73min 53sec (4433 seconds)
Published: Wed Aug 30 2017