Multiprocessing in Python | Basics to Advanced | Tutorial - 1

Captions
Hello, people from the future, and welcome to Normalized Nerd. Today I'm going to talk about multiprocessing and how you can implement it in real-world cases using Python. Trust me, if you use multiprocessing in the right place, it might speed up your code by five to six times, so it's a very important concept. If you are new to my channel, please subscribe and hit the bell icon, because I make videos about machine learning and Python regularly. So, without any further ado, let's get started.

First of all, I want to explain the concept of multiprocessing. If you already know the concept and are only interested in the implementation, you can skip ahead a couple of minutes. To make the concept easier to understand, I have drawn two diagrams: on the left-hand side we have the normal flow of a program, and on the right-hand side we have multiprocessing.

Let's start with single processing, the normal way a program runs. Suppose we have a piece of code and we have divided it into four chunks: J1, J2, J3, and J4, where J stands for job. So there are four jobs in the program. In traditional single processing, the CPU first takes care of job 1; after completing it, it goes on to job 2, finishes that, then moves to job 3, and then to job 4. So the total time taken to finish the program is the sum of the times for all the individual jobs.

Now imagine these four jobs are independent of each other. What do I mean by independent? I mean that the order of execution does not matter: it doesn't matter whether we finish job 2 first, or job 1, or job 3, or job 4. In this scenario, when the jobs are independent and can be processed in parallel, we can use multiprocessing. On the right-hand side we have four processes. When the program starts executing, we start four processes, and each process takes care of a different job. So what's the benefit? Here we are processing the four jobs in parallel, so the time taken to finish the whole program is the time taken by the longest job. In our case job 2 takes the most time, so that is the time taken by the whole program. I hope you now understand the concept of multiprocessing.

One very common mistake beginners make is to think that we can only run as many processes in parallel as the number of CPU cores available to us. That's not true: we can even run multiprocessing on a single core. In that case we won't get true parallelism. What happens is that when one job is doing some I/O-related work, the CPU is not being used by that process, so at that time some other process replaces it and does its CPU work. This is called context switching.

Another very important thing I need to mention is that each process gets its own separate memory, so there is no shared memory, whereas in multithreading the threads access the same memory. That's a very important difference between multiprocessing and multithreading. I hope this much of the concept is enough for you, so let's jump into the coding section.

We are going to start with a very simple program, just to demonstrate the difference between multiprocessing and single processing and how the thing actually works. We have two different functions here; you can think of them as two different jobs. The first one is a function called counter 1. It does a very simple thing: it just counts. You can see that I am running a for loop n times, and at each step I am increasing the counter. After the counting is done, it just prints "counter 1 done". The second function is named counter 2. Now, why 2? Because in its for loop you can see that I am increasing the step by 2, so this function should finish in roughly half the time of the first function.

So now we have our two jobs, and first I'm going to run them sequentially. I'm taking a huge number, 2 times 10 to the power 8, and then I am calling the functions counter 1 and counter 2. To see how much time this takes, we have to import the time library. Before the program starts we initialize a variable called st, and after the program has finished we initialize another variable called en. To see the time taken to finish the two jobs, we can just print the difference between en and st. Let's run this program. Both jobs are done, and the time taken to finish them is about 13 seconds.

Now I'm going to show you how we can use multiprocessing to speed up our program. First of all, we need to import a library called multiprocessing. A very important point for Windows users like me: you have to put your multiprocessing code inside the main block. If you are a Linux user, you don't need to worry about that. So here's our main block. We have two jobs, so ideally there should be two processes taking care of them. Here goes our first process. The syntax to initialize it is multiprocessing.Process, and we need to pass two parameters: target and args. The target is the function you want to run in this particular process, so in process p1 I want to execute the function counter 1, and in args I am passing the number. Please notice that you have to pass the arguments as a tuple. Similarly, we need to initialize another process called p2 with the same syntax, but in this case the target is counter 2.
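The two counter jobs and the process setup described above can be sketched roughly as follows. This is a minimal reconstruction, not the code shown on screen: the names counter_1, counter_2, run_sequential and run_parallel are assumptions, the functions return their count only so they are easy to check, time.perf_counter is used for the st and en timestamps, and n is much smaller than the 2 times 10 to the power 8 used in the video so that the sketch finishes quickly. It also starts and joins both processes so that it runs to completion.

```python
import multiprocessing
import time


def counter_1(n):
    """Count from 0 up to n one step at a time."""
    count = 0
    for _ in range(n):
        count += 1
    print("counter 1 done")
    return count


def counter_2(n):
    """Walk the same range in steps of 2, so the loop body runs half as often."""
    count = 0
    for _ in range(0, n, 2):
        count += 1
    print("counter 2 done")
    return count


def run_sequential(n):
    """Run both jobs one after the other and return the elapsed seconds."""
    st = time.perf_counter()
    counter_1(n)
    counter_2(n)
    en = time.perf_counter()
    return en - st


def run_parallel(n):
    """Run each job in its own process and return the elapsed seconds."""
    st = time.perf_counter()
    # args must be a tuple, hence the trailing comma in (n,)
    p1 = multiprocessing.Process(target=counter_1, args=(n,))
    p2 = multiprocessing.Process(target=counter_2, args=(n,))
    p1.start()
    p2.start()
    # join() blocks until the corresponding process has finished
    p1.join()
    p2.join()
    en = time.perf_counter()
    return en - st


if __name__ == "__main__":
    # Assumption: a reduced n; the video uses 2 * 10**8
    n = 2 * 10**6
    print("sequential seconds:", run_sequential(n))
    print("parallel seconds:", run_parallel(n))
```

The main guard is kept even on Linux in this sketch, since it is harmless there and is needed on platforms that start child processes by re-importing the script. Actual timings depend on the machine, so none are claimed here.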
After the processes are initialized, we need to start them. That's very simple: p1.start and p2.start. After this is done, we need to join them. That's it. Now we just need the time statements so that we can see how much time our multiprocessing code takes. Let's run this. Just look at the difference: the single-processing case took nearly 13 seconds, but the multiprocessing case takes only about eight and a half seconds. That's a really good improvement. Moreover, you can see that counter 2 actually finished before counter 1, which is proof that our two functions are actually running in parallel.

Okay, so that was a very basic program, and there we created the processes ourselves. But there's a better way that I'm going to show you right now. In this program we're going to write a prime-checking function using multiprocessing, so let's quickly import the modules we will need.

First I'm going to write a sequential function that finds whether a number is prime or not, for all the numbers less than n. We declare an array in which every element is initially true, and if we find that some number is not prime, we change its entry to false. We run a loop to check every number less than n. I check whether the number is less than 2; if it is, then obviously it's not a prime and we replace the true with false. If it is 2, we don't do anything, because 2 is a prime number. If the number is greater than 2, we check it in a loop in which the variable j runs from 2 to the square root of the number. In case you didn't know this method of checking for a prime, let me quickly explain it: we are checking whether there is a factor less than or equal to the square root of the number. If there is such a factor, the number is not prime, so in that case we replace the true value in the array with false. So that was our check prime function, which does the job sequentially.

Now we are going to write a check prime function that is compatible with our multiprocessing code. This check prime multi function takes num as its input and checks whether this number is prime or not. I am using the same method of finding primes as in the previous function. If the number turns out to be prime, the function returns the number and true, and if it's not a prime, it returns the number and false. Why are we returning the number as well? You will understand that when we write our multiprocessing code. So that was the check prime multi function.

Now let's write our main block. I'm going to take n as 2 times 10 to the power 6, and checking the primality of every number less than n should take our single-processing code quite a lot of time. Okay, so the single-processing code is ready; now let's do the multiprocessing code. First of all, we need an array that contains all the parameters for our multiprocessing function. Here that function is check prime multi, which takes a single number, so we need an array containing all the numbers less than n. Then I declare the number of processes; this will tell our computer to initialize 10 processes in parallel.

Here comes the most elegant way of writing a multiprocessing program in Python. The syntax goes like this: with multiprocessing.Pool, where you pass the number of processes you want to run in parallel, then as pool. Then you use the pool.map function. As the first parameter we pass the function that we want to run in parallel, and as the second parameter we pass the array that contains all the arguments to the check prime multi function; in our case this is just the array of all the numbers less than n. Please notice that the results variable stores the outputs of all the instances of the check prime multi function running in parallel. So what pool.map does is take the function that we want to parallelize and an iterable that contains all the parameters of the function, and run the given function on every item of the iterable. Here the iterable is num array. After that we need to close our pool, and then we just print the results.

Let's run this. Just look at it: the single processing took nearly 12 seconds, whereas the multiprocessing took only about 4 seconds. That's a huge improvement. And here you can clearly see why I returned the numbers as well: I wanted to show you that it is really working.

Here I want to say that the method we just explored with pool.map works only when you want to parallelize one single job; for example, in this case we wanted to parallelize the check prime multi function. Also notice that the parameter passed to check prime multi is different in each instance. But what happens when you don't want to change the parameter that is passed to check prime multi? For that, and for a real-world application of multiprocessing, we need to go to the next example.

In this example we are going to download a bunch of photos from Pixabay and convert them to grayscale. You will notice how multiprocessing makes a big difference in this program. You can see that I have already imported the libraries, created the list of URLs, and written a function that does the job serially. The download function takes the URL list, loops through every URL, downloads the image, converts it to grayscale, and then saves it. Now I'm going to write the main block, and to do the job serially first, I call the download function. After running it, you can see that the photos are being downloaded, converted to grayscale, and saved into the images folder.

Now we are going to write the same function using multiprocessing. I am calling this function download multi; it takes only one URL and does exactly the same thing. Then we write our multiprocessing code. Just like in the previous program, we use the Pool object, with the number of parallel processes equal to 10, and then I use the pool.map method, to which you pass the function you want to parallelize and the list of arguments that will go into it. Then we run the program, and you can see that the multiprocessing took about 3.5 seconds. Just imagine: previously it was around 10 seconds, and you are bringing the time down to 3.5 seconds.

Now I want to mention something extra. You can see that the download multi function takes only one argument, the URL, and it is different for every instance of the function. But what happens when you need to pass some extra parameter that does not change between instances of the function? To demonstrate that, I am adding another parameter to the download multi function, called directory path. Our directory path is images, so I replace that string with the variable directory path. To accomplish this we need something called partial from functools, so let's quickly import that. Then we can write something like this: inside partial you pass the function that you want to parallelize and the parameters that do not change between instances of the function. In this case the parameter directory path stays the same for every instance, which is why I have passed directory path here. Outside of it, you pass the iterable that contains all the parameters that do change between instances of the download multi function; in this case it is just the URL list. And obviously I need to initialize the folder variable here. Now we can run this program. See, that's working.

So that's about it. At this point you should have a really good understanding of multiprocessing in Python, and I really hope that you apply the multiprocessing module in your code to make it much faster.
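The Pool-based pattern from the last two examples might look something like the sketch below. It is a reconstruction under stated assumptions rather than the code from the video: check_prime_multi follows the description given (trial division up to the square root, returning the number together with a boolean), while describe_download is a hypothetical stand-in for download multi that only builds the save path instead of downloading anything, so the sketch runs offline. The example.com URLs, the pool sizes, and the small n are also placeholders.

```python
import math
import multiprocessing
from functools import partial


def check_prime_multi(num):
    """Return (num, is_prime) using trial division up to sqrt(num)."""
    if num < 2:
        return num, False
    # For num == 2 or 3 this range is empty, so they fall through to True
    for j in range(2, math.isqrt(num) + 1):
        if num % j == 0:
            return num, False
    return num, True


def describe_download(url, directory_path):
    """Hypothetical stand-in for download multi.

    It only computes where the file would be saved; no network access.
    """
    file_name = url.rsplit("/", 1)[-1]
    return f"{directory_path}/{file_name}"


if __name__ == "__main__":
    # Assumption: a tiny n for illustration; the video uses 2 * 10**6
    n = 50
    num_array = list(range(n))

    # pool.map runs check_prime_multi on every item and keeps input order
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(check_prime_multi, num_array)
    print([num for num, is_prime in results if is_prime])

    # Placeholder URLs; directory_path stays fixed for every call
    url_list = ["https://example.com/a.jpg", "https://example.com/b.jpg"]
    fixed = partial(describe_download, directory_path="images")
    with multiprocessing.Pool(processes=2) as pool:
        paths = pool.map(fixed, url_list)
    print(paths)
```

Binding directory_path by keyword is one way to fix the unchanging argument; binding it positionally also works if the fixed parameter comes first in the function signature. The functions are defined at module level because worker processes need to be able to import them by name.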
If you enjoyed this video, please share it with your friends, and don't forget to subscribe. Stay safe, and thanks for watching. [Music]
Info
Channel: Normalized Nerd
Views: 17,789
Keywords: artificial intelligence, machine learning, normalized nerd, data science, normalised nerd, multiprocessing, python, multiprocessing in python, multithreading, multiprocessing vs multithreading, pool.map
Id: PcJZeCEEhws
Length: 20min 7sec (1207 seconds)
Published: Tue Jun 22 2021