Functional Programming in Python: Parallel Processing with "concurrent.futures"

Captions
Hey there, and welcome to another video in my functional programming in Python series. In the last video, you saw how to take a piece of code that used the built-in map function and refactor it so that it works in a parallel processing fashion: it gets executed in parallel, we're processing multiple records at the same time, and that can lead to huge speedups in the execution time. We did that using the multiprocessing module that's available in Python 2 and Python 3.

Now, I already hinted at this towards the end of the previous video: there are other ways to implement parallel processing using a functional programming style in Python. What I want to show you in this video is how to use the concurrent.futures module that's built into Python 3. It's not available in Python 2, but it's kind of the nice and clean interface for doing parallel processing and parallel programming in Python 3.

All right, so let's bring back the multiprocessing implementation for a second and run this example program again. What you can see here is that we're taking this input data set, we're generating this output here, and this takes about two seconds to complete using multiprocessing. We can see, based on the logging output that I set up, that the work is distributed across a bunch of different processes. We have these four worker processes here that we can identify based on their process ID, and they're working on these records in parallel. Then at the end, the multiprocessing pool reassembles all the results and gives us a list with all these derived, or I guess I call them transformed, dictionaries that are based on the input data.

All right, so let's replace this code with the concurrent.futures module. This is the new and shiny way to do asynchronous computation in Python. It has a clean interface for working with process pools and also thread pools. It's kind of cool. It's only available in Python 3, and the first thing I'm going to show you is how we can replace this multiprocessing code here with code from the concurrent.futures module.

So let's just get this set up. Here I can go concurrent.futures.ProcessPoolExecutor. The way this interface works in the concurrent.futures module is that you have these different classes that are called executors, and they are different execution strategies for how your code is run in parallel, whether that's across multiple processes or multiple threads within a single process. They all follow the context manager protocol, so we can just enter this executor here and then do stuff with it, and it makes it very easy to do the cleanup as well. So here I can just go result = executor.map, and again you can see the central importance of this map function as a parallel processing primitive. I'm just going to pass it my transform function and my input data, and then hopefully this is going to run.

All right, and as you can see here, we're pretty much getting the same result that we got with the multiprocessing-based implementation. Again, this is fanning out and running across four processes in parallel. It's doing the transforms in parallel, it takes about two seconds to complete, and then we're getting this itertools.chain object. So this is maybe a small difference to what you've seen before: multiprocessing.Pool.map gives you a list of results, whereas this gives you an iterator. If I wanted to convert that back into an immutable data structure, I'd probably just call tuple on it. And again, maybe you want to go back to some of the previous videos to see why I needed to do that, because I explained it in, I think, the video on the map operation that's built into Python directly. So again, we're going to rerun this, and now we're getting the expected output, because we're consuming this iterator and turning it into a tuple with all these output elements, so we can print them nice and cleanly.

So that was pretty easy to do. You may be wondering, okay, why should I use this thing over multiprocessing, or over a multiprocessing pool? Well, what's kind of nice is that the concurrent.futures module provides a couple of different implementations that allow you to change very easily how your computations are happening in parallel. This was the ProcessPoolExecutor, and I can just swap out the word Process for the word Thread, and then we get a different result. Now you can see that the ThreadPoolExecutor actually performed all these calculations within a single process, spread out across multiple threads. I mean, in this case it's even faster, but this is just a toy example. It's not really doing anything, it's just sleeping for a second. But you can already see that the big difference is that now everything is running within a single process. Parallel computation is happening because we have multiple threads within that single process that are working on this data in parallel.

Now of course the question is, when should you use one over the other? So kind of the dark secret of Python is something called the global interpreter lock, and what it means basically is that no two threads can execute Python code at the same time. So even if you have multiple threads running in your Python program, only one of them can execute at a time. Now this sounds like a huge limitation, and in some cases it is, but really what happens most of the time is that your thread will be waiting on I/O to complete. In this case, if I'm calling time.sleep, that's a blocking call, much like an I/O operation, and it's blocking this thread. That means while that time.sleep call is blocking one thread, other threads can execute, and then at the end this thread will resume and they will all finish their processing. So in this case it doesn't really make a difference. However, if I was doing heavy number crunching in these threads, I would run into the global interpreter lock problem, because the end result wouldn't really be faster than running a single-threaded version.

So there's lots more to say about this, but really what you need to remember is that the best way to get around this is to use process-based parallelism in Python, and this is where the concurrent.futures module is extremely handy, because, see, I did it again: I just switched from thread-based parallelism to process-based parallelism. This gets around the global interpreter lock problem because every single process has its own interpreter, therefore they can all run in parallel. You can actually spread them out across multiple CPU cores. But this is definitely something that you need to keep in mind when you're writing parallel programs in Python. And this is also where the concurrent.futures module is kind of nice, because you can change the execution strategy very, very easily. Really, the ProcessPoolExecutor is just a wrapper around the multiprocessing module, but if you're using this interface it just becomes so simple to swap out the different execution strategies. And now we're back to a ThreadPoolExecutor again, and we're getting a different result.

All right, again, I'm only scratching the surface in this video, but I hope it whetted your appetite for doing more parallel processing in Python, and to also do it in a functional programming style, because I think one of the key advantages of functional programming is that it makes it very easy to parallelize your programs. If you can write your program in a way where it's using a map operation to transform some input data into some output data, or let's say if part of your program can be written that way, it is extremely simple to parallelize this like crazy. As you've seen here, all you need is to import concurrent.futures, and you need these two lines of code instead of a straight-up call to the map that's built into Python, and your code is running in parallel. This can lead to huge speedups. It can be a really quick win for a program that's I/O bound or that's CPU bound. So it's highly recommended for any kind of number crunching that you do, or for some super I/O-bound program, like most web scrapers are, where you're waiting a long time for the web request to finish so that you can parse out the data. This is extremely handy for those kinds of situations, and I would encourage you to play with it and get a better understanding of how this works, what the difference is between thread-based parallelism and process-based parallelism, and how the global interpreter lock works in Python.

Hopefully I get a chance to do some more videos on those topics in the future, so click the subscribe button to make sure you're not missing those updates, and have fun playing with parallel programming in Python. All right, talk to you soon.
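The code typed on screen is not part of these captions, so what follows is only a minimal sketch of the pattern the video describes, not the author's actual program. The records, the `transform` body, the reference year 2017, and the helper name `parallel_transform` are all assumptions made for illustration. It shows an executor used as a context manager, `executor.map` consumed into a tuple, and the same code run with either a process pool or a thread pool.

```python
import concurrent.futures
import os
import time

# Hypothetical input records; the video's real data set is not in the captions.
RECORDS = (
    {"name": "Ada Lovelace", "born": 1815},
    {"name": "Emmy Noether", "born": 1882},
    {"name": "Marie Curie", "born": 1867},
    {"name": "Tu Youyou", "born": 1930},
)


def transform(record):
    """Derive a new dict from one record, pretending the work is slow."""
    time.sleep(0.1)  # stand-in for slow work; blocks only the calling worker
    return {
        "name": record["name"],
        "age": 2017 - record["born"],  # 2017 is an arbitrary reference year
        "pid": os.getpid(),  # records which process handled this item
    }


def parallel_transform(executor_class, records, max_workers=4):
    """Apply transform to every record using the given executor class."""
    # Executors are context managers: leaving the block shuts the pool down.
    with executor_class(max_workers=max_workers) as executor:
        # executor.map returns an iterator, so consume it into a tuple.
        return tuple(executor.map(transform, records))


if __name__ == "__main__":
    # Swapping the execution strategy only means passing a different class.
    for executor_class in (
        concurrent.futures.ProcessPoolExecutor,
        concurrent.futures.ThreadPoolExecutor,
    ):
        start = time.perf_counter()
        results = parallel_transform(executor_class, RECORDS)
        elapsed = time.perf_counter() - start
        pids = {result["pid"] for result in results}
        print(f"{executor_class.__name__}: {elapsed:.2f}s, {len(pids)} process(es)")
```

With the thread pool, every result carries the same process ID, because all workers live inside one process. With the process pool, the IDs belong to separate worker processes. `transform` is defined at module level because a process pool has to pickle the function to send it to its workers, and the `if __name__ == "__main__":` guard keeps worker processes from re-running the demo on platforms that start workers by re-importing the main module.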
Info
Channel: Real Python
Views: 30,230
Rating: 4.9684544 out of 5
Keywords: python, dbader, functional programming, concurrent.futures, python parallel, multiprocessing, multithreaded, functional python, python fp, apply, python tutorial, concurrency, parallel computing, python multitasking, multithreading, python 3, python programming, python trick, python map, parallel programming, immutable data structure, immutable, learn python, programming, computer science, python coding, python tutorials, lambda, threadpoolexecutor, processpoolexecutor
Id: 0NNV8FDuck8
Length: 10min 1sec (601 seconds)
Published: Wed Oct 11 2017