Understanding Python: Multiprocessing

Video Statistics and Information

Captions
Welcome back to Understanding Python. My name is Jake, and today we'll continue our mini-series on concurrency by covering multiprocessing. Unlike threading, multiprocessing is a bit more complicated, but it's also the best option for computationally intensive tasks. If you haven't seen my video on threading, I recommend you watch that first; it will help you understand the differences between the two. If you have, then let's get started.

Multiprocessing, unlike threading, is a great way to speed up CPU-bound tasks. But what do we mean by CPU-bound tasks? These are tasks that take a lot of processing power, specifically from the CPU, to complete. They're typically things that perform complex calculations, manipulate a lot of data, or just run generally computationally intense algorithms; something like image or video processing would also count.

In order to use multiprocessing, we're of course going to have to import the multiprocessing module, which is part of the standard library. We have our computationally intense function right here, square_sum, and a simple list of numbers for it to operate on. We're going to divide the numbers into two chunks that we'll split across two separate processes. But how do we do this? The simplest way is actually pretty similar to what we would do with threads: we'll create two processes by calling multiprocessing.Process, passing in a target, which is our square_sum function up above, and a tuple of arguments to square_sum, that being chunk one for our first process. For our second process, we'll just change that to chunk two, so process 2 corresponds to chunk two.

So now we've defined two processes, but nothing has really happened yet. Thinking back to threading, you may know where this is going: we need to start each of these processes, so we'll call process1.start() and then process2.start().
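A minimal, runnable version of that first example might look like the sketch below. The exact list of numbers and the chunk boundaries are my reconstruction from the sums printed later in the video (5,500 and 33,000), and the label argument is an addition for clearer output:

```python
import multiprocessing

def square_sum(chunk, label):
    # Sum the squares of every number in this chunk
    total = sum(n * n for n in chunk)
    print(f"Sum of squares for {label}: {total}")
    return total

if __name__ == "__main__":
    # Chunks reconstructed from the sums printed in the video
    numbers = list(range(10, 101, 10))
    chunk1, chunk2 = numbers[:5], numbers[5:]

    # One Process per chunk: target is the function, args is a tuple of arguments
    process1 = multiprocessing.Process(target=square_sum, args=(chunk1, "chunk 1"))
    process2 = multiprocessing.Process(target=square_sum, args=(chunk2, "chunk 2"))

    # start() spawns a separate Python interpreter for each process
    process1.start()
    process2.start()

    # join() waits for each process to finish before the main process continues
    process1.join()
    process2.join()
```
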
What happens when we call start() is that it actually creates processes in the form of separate instances of the Python interpreter. Each of those instances has its own memory space and executes code independently of the others, so process 1 and process 2 will each do their own thing, separate not only from each other but from the main process we're in here. Now that we've got them going, we need to wait for them to finish, and the way we do that is by calling process1.join(), and the same for process 2.

You can think of this like driving down a road. You start off on a single-lane road, which is our main process here. When we create these two additional processes and start them, we've effectively created two more lanes on our road. In those two lanes, cars can pass us and we can pass other cars; each lane acts effectively independently of the others. That is, until we call join on those lanes: that merges the lanes back together, bringing it down to a single lane.

With that said, let's save this and run it and see what happens. Okay, pretty quick: we see that the sum of squares for the numbers 10 through 50 was 5,500, and the sum of squares for the second chunk is 33,000. Just like that, with one of the simplest examples possible, we did multiprocessing in Python, and it was really easy. While you can introduce multiprocessing to your code like this, you're likely going to want to do it in better ways, and that's what we'll cover in the rest of this video.

The first thing I want to introduce is a better way of spawning processes. We can create them manually like we did in the previous example, or we can work with multiprocessing pools. In this example we've got a helper function, is_prime, to check whether a number is prime, as well as our workhorse function, find_primes, whose goal is to find the prime numbers in some given number range. Those ranges, specifically, are going to be between 100 million and 101 million.
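The caption stream doesn't spell out the bodies of those two helpers, so here is a plausible sketch. The trial-division primality check, the assumption that each chunk arrives as a (start, end) tuple, and the choice to return a count (which matches how the results get summed up later) are all mine:

```python
def is_prime(n):
    # Simple trial division up to sqrt(n) -- deliberately CPU-heavy, not optimal
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True

def find_primes(bounds):
    # Count the primes in [start, end); bounds arrives as a (start, end) tuple
    start, end = bounds
    return sum(1 for n in range(start, end) if is_prime(n))
```
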
We're going to use a variable here to control both the number of chunks and the number of processes we'll use within our pool. Okay, I keep talking about a pool, but how do we create one, and what is it? To create a pool, we'll create a variable aptly named pool, which calls multiprocessing.Pool, and it asks for the number of processes you want to use; for that we'll pass in the variable we assigned before, which currently is one. What this pool does is give us a number of processes to work with, and it will evenly distribute tasks across the members of that pool. Because I want to demonstrate the speed difference between the different versions of this, we're going to start with just one member in the pool, meaning everything will run sequentially, close to how it would if we didn't use multiprocessing at all. So let's store our start time.

One of the things I really like about this pool is that it's super easy to both create the processes and get the results with some of the helper functions it provides. For example, we can go straight to getting the results of all our calculations by calling pool.map. Just like you would use the built-in map function normally (see my video about that), you can map our find_primes function over the chunks iterable. So for each of the chunks we've created, it's going to run find_primes. Of course, in our first run this is just one process and one chunk, so it will run once across the entire range; but when we increase the number of processes, we'll have more, smaller chunks to pass in to find_primes. We will need to collect the results from all the find_primes calls, so I'll do that now: this iterates through the results iterable and adds them all together, so we get a true count of how many prime numbers we found between those
ranges. Finally, we close the pool, and we'll also join on it. At the very end, we figure out how long it all took by storing our end time, and of course we want to print the results.

As I was typing out that print statement, I realized that many of you may not be familiar with some of the syntax you're seeing here, not only in the print statement itself but also up here, where I'm able to put underscores within numbers. This is something Python added a number of versions ago, back in Python 3.6. Instead of using commas or decimal points, depending on the country you're in and the standard you use, you can use underscores to make large numbers more easily readable. Instead of a bunch of zeros that are hard to look at and immediately understand, you put in underscores, and the interpreter effectively just removes them and uses the value as normal. You can do the same thing when formatting output in things like f-strings: you put a colon and then the format spec. When we place a single underscore there (we could also use a comma if we wanted), it does the same grouping as above, making the output more readable. Additionally, you can do the same thing with decimal precision: a colon, then .2, meaning two decimal places of precision, then f, saying it's a float. We'll see what that looks like when we run this.

So we'll save it and run it, and since this is just running in one process, it's probably going to take somewhere between 15 and 20 seconds or so. And with a shortcut, those 16.68 seconds have passed in no time at all: we found 54,208 prime numbers between 100 million and 101 million in 16.68 seconds. Let's put an extra space there just for better formatting. Okay, so it took 16.68 seconds to go through all of that.
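Putting the pool version together might look like the sketch below. The range is scaled way down from the video's 100 million to 101 million so it finishes in well under a second, the chunk-splitting arithmetic is my own, and the print statement demonstrates both the underscore digit grouping (:_) and the two-decimal float precision (:.2f) just discussed:

```python
import multiprocessing
import time

def is_prime(n):
    # Trial division; intentionally simple so the work is CPU-bound
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True

def find_primes(bounds):
    # Count primes in [start, end); each chunk is a (start, end) tuple
    start, end = bounds
    return sum(1 for n in range(start, end) if is_prime(n))

if __name__ == "__main__":
    # Underscores in the literals (PEP 515) are purely for readability
    range_start, range_end = 1_000_000, 1_040_000
    num_processes = 4

    # Split the full range into one chunk per process
    step = (range_end - range_start) // num_processes
    chunks = [(s, min(s + step, range_end))
              for s in range(range_start, range_end, step)]

    start_time = time.monotonic()
    pool = multiprocessing.Pool(processes=num_processes)
    # map() distributes one chunk to each worker and gathers the counts
    results = pool.map(find_primes, chunks)
    total = sum(results)
    pool.close()
    pool.join()
    elapsed = time.monotonic() - start_time

    # :_ groups digits with underscores; :.2f gives two-decimal precision
    print(f"Found {total:_} primes between {range_start:_} and {range_end:_} "
          f"in {elapsed:.2f} seconds")
```
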
Since we're using this pool, we can just bump that number up to two and everything else adjusts: our multiprocessing pool now gives us two processes to run on, and we split the work into two chunks. We'll run this now, and we should see it complete in roughly half the time. Like I said, roughly half: there is a bit of overhead when creating each process, and that's something to keep in mind with multiprocessing. Unless it's really worth creating additional processes, you probably want to stay in a single process, because spawning that additional interpreter takes a bit of overhead, since it has to recreate a number of objects.

Let's continue on; we don't have to stop at two. Let's double that again to four and run it again; this time it should take approximately a fourth of the time of the original. Yep, and here we go: completed in 4.68 seconds. Now, we're going to start to see some diminishing returns. If we double it again to eight and run it, it completes in 4.21 seconds. You're not going to get a huge benefit for everything just by throwing additional processes at it; there's certainly a sweet spot to this, and it looks like for us that sweet spot is probably somewhere around four, maybe five processes. Of course, there's going to be a little margin of error depending on what else my CPU is doing, but with five processes, three fewer than eight, I got it down to within 0.03 seconds of using eight processes. So in this case I'd stop there and keep it at five, or, if you want to be a little more conservative, put it back to four, where it completes in 4.68 seconds. That's something you'll want to think about when you're moving into multiprocessing.

However, this isn't the way I really like to do multiprocessing within Python, and in this example we're not really passing data between processes after they're initially created: we effectively give them their chunks when they start up, and they go from there. So let's go into our
final example of this video. In this example we do have a few more imports than before. The first one you may remember from our video on threading: concurrent.futures. Of course we're going to be using multiprocessing, but probably not in the way you're thinking of right now, and then a couple of things to help us down the line.

For our, quote, computationally intensive task, this one is going to be a lot simpler than the previous ones. We're not going to calculate primes or anything like that; we're just going to count up from zero to some ending number. So what are we going to use from the multiprocessing module if we're also using concurrent.futures? Well, the multiprocessing module gives us a really nice data structure that allows us to pass data back and forth between processes, and we'll do this with a context manager: we create a new Manager instance, and this manager gives us a number of tools to work with. The one we're interested in right now is a queue, specifically an instance of the manager's Queue object. I like this queue object over the other ones in the standard library because it has built-in synchronization, so you don't have to worry about any kind of locking to avoid conflicts between processes; it handles all of that for you.

Now, what are we going to do with this queue? We're going to use it to populate the count-to numbers for each of our compute tasks. You see here on line 10 we have count_to = q.get(), so each task will try to pull a number from the queue. Let's create those numbers now: for _ in range(10), we'll call q.put() to add a number to the queue, and that number is going to be a random integer between, say, 10 million and, sure, we'll go all the way up to 50 million; so we're going to be doing a lot of counting. And that's it: at this point our queue is populated and ready for us to pass through some processes. And to create our processes, instead
of creating them manually or using the multiprocessing pool, we're going to use concurrent.futures' ProcessPoolExecutor. This is very similar to the ThreadPoolExecutor we used in the threading video. Here we can specify max_workers; this time we'll start with four. If we don't specify max_workers, which we can play around with later on, it will assign a number likely based on the number of CPUs available on this system. But as you saw in our previous example, using the max number of CPUs isn't always worth it; that's something we can play around with.

So let's create all of our futures. Remember that a future is effectively a promise to execute some task. The way we create those futures is to submit our function, compute_task, passing in its task ID, which we'll call tid (it doesn't exist yet), as well as our queue. We'll do this for each task ID in range(10), corresponding to the number of items we created here. At this point, just for readability, we may want to tell ourselves that we've submitted all tasks to the process pool.

But what do we do with these futures? Well, of course we're going to pull the results from them; otherwise they're not going to do us much good. So we'll iterate: for i, future in... We're going to use enumerate here just to give us our i, and we're going to use another thing from concurrent.futures, the as_completed function, passing in our list of futures. What this does is cycle through our futures, see which ones are ready, and as soon as one is ready, give it to us in this for loop; at that point we can pull out its result. We'll call print first, just so we can talk to ourselves, to say that we're processing tasks. Since enumerate starts from a zero index, we'll print i + 1, and since we're doing non-zero starts here, we'll also go ahead and do that for our range of task IDs, so the two match. For each task, I'm going to
print the task ID out of 10. We'll put some dots there. Next, we'll get the result from that future by calling, you may have guessed it, future.result(), and we'll print that out as well: got result from task, this again being i + 1, and we'll put in the result, formatted with an underscore since it's likely going to be a large number.

Before I forget, we also want to keep track of how much time all this takes, so we'll store a start time from time.monotonic(), and at the very end we'll print that we finished in the current time, time.monotonic() minus the start time, wrapped up, just for readability, with .2f for two decimal places, and "seconds". Okay, that was a lot of typing, and potentially some issues I'm not seeing, but we won't know until we try to run it. Oh, looks like it's running, and it completed in 4.46 seconds.

So let's look through this. Okay: submitted all tasks to the process pool; let's even up the spacing there. At this point it created all the futures by calling executor.submit(). Then we see that we're starting task one, and this message comes when we're in compute_task, counting up to 16,887,628, so it's going to take a bit. We're also starting tasks two and four, and interestingly enough, three took a little longer to start up than four. We have a good range of values each of these is trying to count up to: two and three are more than double what one and four are counting up to, and it shows, because task 4, which is only counting up to 12.6 million, finished first. Oh, and I see I made a typo in "processing"; let's go ahead and fix that real quick. Stephen, if you're watching, I'm sorry for the typo. So: processing task one out of ten; but we said that task 4 finished first. That's something you need to keep in mind with concurrent.futures when using as_completed: results arrive in the order the tasks complete, not in the order they were submitted.
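The whole final example might be sketched as follows. The names compute_task, tid, and q follow the transcript, but the overall structure is my reconstruction, and the count-to values are scaled down from the video's 10 to 50 million so the script finishes quickly:

```python
import concurrent.futures
import multiprocessing
import random
import time

def compute_task(tid, q):
    # Pull this task's target from the shared queue, then count up to it
    count_to = q.get()
    print(f"Starting task {tid} of 10, counting to {count_to:_}")
    total = 0
    for _ in range(count_to):
        total += 1
    return total

if __name__ == "__main__":
    start_time = time.monotonic()
    # A Manager provides a queue proxy that is safe to share between processes
    with multiprocessing.Manager() as manager:
        q = manager.Queue()
        # Scaled down from the video's 10_000_000..50_000_000 for a quick run
        for _ in range(10):
            q.put(random.randint(100_000, 500_000))

        with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
            # Each submit() returns a future: a promise to run compute_task
            futures = [executor.submit(compute_task, tid, q)
                       for tid in range(1, 11)]
            print("Submitted all tasks to the process pool")

            # as_completed yields futures in the order they finish,
            # not the order they were submitted
            for i, future in enumerate(concurrent.futures.as_completed(futures)):
                result = future.result()
                print(f"Processing task {i + 1} of 10, got result {result:_}")

    print(f"Finished in {time.monotonic() - start_time:.2f} seconds")
```

Note that compute_task is defined at the top level of the module, which matters for the pickling discussion coming up.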
So if the order in which results come in and the order in which you store them matters, you're going to want to figure out a way to put them back in the right order. One thing you could do here is return the task ID along with the result in a tuple, so you can restore the order by task ID, but we don't need that in our case. Then we get the result here. Since "processing task one, two, etc." and "got result from task" don't really matter to the task itself, let's just look at which task finished at what time.

It then starts task 5, which only has to count to 10 million, and that finished before the previous tasks, even task 1, which is only counting up to 16 million; so you see just how fast this is going. It then starts task 6, which is counting up to 30 million, then task 1 finishes, and so on and so forth. Let's see which task finished last: that is, coincidentally, task 10, which not only had to start last but also had to count up to 35.5 million.

Okay, so again, all of that took 4.46 seconds. If we get rid of max_workers and let it use as many cores as it wants from our system and run that, we see all of them started at basically the same time, and it completes in 3.46 seconds. So it uses more than double the CPUs and gives us basically one additional second of quicker runtime.

Before we close this out, there's something really important you need to consider when it comes to multiprocessing, especially when passing data back and forth between processes: everything needs to be picklable. If you're unfamiliar with pickle, it's Python's built-in tool for object serialization, basically a way to export some type of object outside of a process itself, whether that's saving it to a file or passing it between processes. Pickle is what Python uses
for multiprocessing, but not everything is picklable. For example, lambda functions, nested functions, and classes not defined at the top level of a module (meaning nested classes) aren't picklable, and you need to figure out ways to get around that, which isn't always easy. It's something I've had to deal with in the past; in my particular case, aside from doing a whole bunch of workarounds, I ultimately decided to move over to using the ThreadPoolExecutor, which doesn't require pickling, because everything happens within the same Python process. Let me know if you want me to dig more into this in a future video. There's a lot more to multiprocessing, including pipes, but that's something we'll cover another time.

And that wraps up this video. Now that you have a good understanding of how multiprocessing works and when to use it, give it a try in a project of your own and let me know how it goes. If you have any further questions or recommendations for others, leave a comment down below. As always, today's code will be added to the Understanding GitHub repo, so check the description for a link. And of course, if you have any questions or suggestions for topics you'd like me to cover, let me know in the comments section. To keep up with this series, please consider subscribing. Thanks for watching!
Info
Channel: Jake Callahan
Views: 3,465
Keywords: python multiprocessing, multiprocessing python, multiprocessing, multiprocessing in python, python, multiprocessing module python, multiprocessing pool python, multiprocessing in python 3, python multithreading and multiprocessing, python multithreading, python multiprocessing pool, multiprocessing python queue, python multiprocessing module, python multiprocessing tutorial, python tutorial, python multiprocessing queue, concurrent futures processpoolexecutor, processpoolexecutor
Id: Zziu_duALcE
Length: 21min 49sec (1309 seconds)
Published: Sun Jul 02 2023