Python is NOT Single Threaded (and how to bypass the GIL)

Video Statistics and Information

Captions
All right, so there are these two misconceptions that I often encounter in regards to the Python programming language. I've encountered them both in professional settings as well as just reading through comments on the internet, and they seem pretty pervasive despite being completely wrong. Well, mostly completely wrong.

The first one is that Python is a single-threaded language. This one is just patently false, and I can show that very quickly in a second here. The second one is: if you want to use multiple cores on your computer in Python, don't use threading, and instead use multiprocessing. This one is a bit layered, because it's technically true. If you are using pure Python only, then this definitely makes sense, and the only context in which you should use threads is if your code is waiting on I/O in some way. If you're trying to do something that's computationally heavy, or that requires the use of multiple cores, threading is not going to get you anywhere. However, when was the last time you were doing something computationally intensive in Python using just pure Python? I'm going to bet the answer is almost never, because you were probably using NumPy or SciPy or Numba or Cython, or calling some C library, or using PyTorch, or any of a myriad of things. When you change your context that way, this statement stops being entirely true, and I'll show examples of that as well.

So let's debunk the first of these misconceptions: that Python is a single-threaded language, and that whenever you create a thread in Python, what you're doing is creating a green thread and not a system thread. Debunking this one is relatively straightforward. I have an htop instance open on the left, and I've put it into the tree view and searched for the Python process. Here I have opened an IPython session in my virtual environment, and I'm going to go to that in the terminal on the right. As you can see, there is only one thread under this sort of
main process; that's the main thread. So we're going to go ahead and import threading and then create a new thread. I like to keep most of my threads as daemon threads, because that makes sure they exit whenever the main thread exits. Then we go ahead and start it. Okay, that didn't actually do anything, because I forgot to give the thread something to do. So, in the interest of not disproving my own disproof, I've given the thread something to do now: I just made this function with a while loop that's doing something, and now when I create the thread I give it a target, set daemon equal to true, and start it. Now you can see on the left that we have a second thread under the main IPython process. We can do this again: we make a t2, and bam, every time we start a new thread in Python and actually give it something to do, we get a new thread showing up at the system level in htop.

As far as I'm concerned, this definitively proves that Python is absolutely not a single-threaded language, and that Python threads are quote-unquote real threads that are managed by the system; they're not managed by some internal event loop, and they're not green threads. Now, when it comes to async, which is something you definitely should use if you're dealing with a lot of I/O (or just doing web-related work in general and making a lot of web calls), async does run green threads; it is absolutely a single-threaded process. But threading in Python proper is not green threads; it's proper threading.

This, of course, brings us to point two, which is another misconception, though like I said, it's not quite a misconception if you're only doing pure Python work. So then comes the question: if threading is quote-unquote real in Python, then why can't we just use threading to make use of multiple cores, which, as I've said before, you can't do with pure Python code? The reason for this is the infamous global interpreter lock,
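The on-screen demo can be sketched in a few lines. The function name and the stop event are my own additions (the video's busy loop runs forever), but the pattern, a daemon thread with a target, started explicitly, is exactly what's shown:

```python
import threading

stop = threading.Event()

def busy_work():
    # Spin until asked to stop; this keeps the OS-level thread visibly busy.
    while not stop.is_set():
        pass

# daemon=True makes sure the thread dies when the main thread exits
t = threading.Thread(target=busy_work, daemon=True)
t.start()

# This is a real system thread, so the process's thread count goes up,
# which is exactly what htop shows in its tree view.
print(threading.active_count())  # at least 2: the main thread plus ours

stop.set()
t.join()
```

Each extra `threading.Thread` you start this way appears as another row under the main process in htop's tree view.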
or GIL; I'm going to refer to it as the GIL from now on. What the GIL does is force every thread that is running Python bytecode to acquire a shared lock first. So if you are writing only pure Python code, every thread has to acquire that shared lock, and because of that, at best you're only ever going to be able to use one core. This is why, when your code is pure Python, the recommendation is to use the multiprocessing module, because that launches a separate process: a separate Python interpreter under the main Python interpreter process, with any communication happening through pickling. (That's not entirely true as of Python 3.8, but that's new.) This has a higher RAM burden, but it does allow you to use multiple cores on your computer even for pure Python code, because it is literally running multiple instances of Python.

However, as I said before, if you are doing any kind of computationally intensive work, you are probably using something like NumPy or Numba or Cython or any number of other things, and then you can actually get away from the restriction of acquiring the global interpreter lock. One example: if you are acting purely on NumPy arrays, you can safely release the GIL to do whatever work you were going to do; in fact, NumPy does this internally. So what I'm going to do is take a page from my Numba video and show you an example of this happening with a function that I have JIT-compiled with Numba.

Okay, so I've gone ahead and written a function that I've already wrapped in the njit wrapper from Numba. It doesn't seem to take very long, primarily because we're jitting it, so I'm going to change it so that it takes a reasonable amount of time. Okay, clearly I am way underestimating the modern CPU, but I think it's fine; we can just throw a lot of threads at it. I really like to use ThreadPoolExecutor, so let's
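Before the Numba part, it's worth seeing the GIL's effect concretely. Here's a hedged sketch (the function and the sizes are mine, not from the video): a CPU-bound, pure-Python countdown run serially and then across four threads. The threaded version gets no speedup, because every bytecode instruction needs the shared lock:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def count_down(n):
    # Pure-Python CPU-bound work: every loop iteration executes bytecode,
    # so the running thread must hold the GIL the whole time.
    while n > 0:
        n -= 1
    return n

N = 2_000_000

start = time.perf_counter()
serial_results = [count_down(N) for _ in range(4)]
serial = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as ex:
    threaded_results = list(ex.map(count_down, [N] * 4))
threaded = time.perf_counter() - start

# The threaded run is no faster (and often slower) than the serial one,
# because all four threads contend for the same lock.
print(f"serial: {serial:.2f}s  threaded: {threaded:.2f}s")
```

This is exactly the case where the standard advice holds: for pure-Python CPU work, reach for multiprocessing, not threads.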
give it eight workers, because I have eight cores (well, four physical cores and four virtual cores). Then I like to use ex.map, and in this case I'm just going to give it a dummy array and have it do a bunch of work. I forget if the function goes first or not... the function goes first and the iterable goes second. It's going to need to take an argument, so let's change the definition (we're not going to do anything with that input), jit it again, and then give it some ridiculous input. That went far too quickly. Okay, so now that this is running, you'll see that in htop there isn't much of a difference: I'm not slamming all the CPUs. There's this peak that bounces from core to core as the OS schedules the thread in different locations, and under the main Python process you see the CPU percentage is only a hundred percent. This is the case because we haven't explicitly told Numba to let go of the global interpreter lock. We can do that. There we go: now we have explicitly told Numba, hey, this function does not need any interaction with the Python interpreter, and as you know, the code that we've written for it really does not. Now that we've told it, if we go ahead and run that same insane ThreadPoolExecutor call again, you'll see that it is actually able to at least break 100 percent.

Now, this is not particularly useful in its own right; the computation that we're doing here is not particularly heavy. The previous example that I just showed was not computationally very heavy by itself, so while I was able to show that the main Python process could go beyond a hundred percent CPU usage and start using more than one core, it wasn't particularly satisfying. So instead I've written up this function that creates a very large matrix, multiplies it by itself, and returns the result. As I mentioned before, NumPy will sometimes do its own threading in the background, but as far as I know it
doesn't do that for matrix multiplication; it does it for matrix inversion and some other operations. We'll see. So let's first go ahead and run this guy without releasing the GIL. We see the same kind of behavior: a bunch of spikes, but the CPU utilization never goes above 100%. Now we'll go ahead and make sure that the global interpreter lock is released when this function is called, and bam, there we have it. So whenever you can get rid of this restriction on Python threading (the need to acquire the global interpreter lock) in situations where you don't actually need the lock, then even with just standard Python threads you can use your CPU almost to its full capacity. And this is very useful, because the RAM impact of using Python threads is significantly less than that of using multiprocessing.

Oh, and this is very interesting: I decided to use ProcessPoolExecutor to show that difference. First of all, it's using roughly ten gigabytes more RAM than it was before with ThreadPoolExecutor, and second, somehow, even though I've told it to use eight workers, it's really only able to use four cores at best. Huh, that's very interesting. So here, in this particular example, ProcessPoolExecutor actually does worse.

Hopefully this demonstration of the nuance behind Python threads and the global interpreter lock (and how, when you can get around it, you should get around it, which will most likely give you very good performance in general) has been useful and clears up some of the confusion around Python threading for some people. Thank you very much for watching this video. If this was useful to you and you liked it, just hit that like button, and consider subscribing if you haven't already. Let me know in the comments if there are any other Python-related videos you'd like me to do, and see you next time. Bye!
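Numba and NumPy aren't the only escape hatches: a number of C-backed standard-library functions release the GIL too. For instance, CPython's hashlib releases it while hashing buffers larger than about 2 KiB, so hashing large chunks in threads can occupy several cores. A minimal sketch (the chunk size and worker count are arbitrary choices of mine):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

data = b"x" * (32 * 1024 * 1024)  # 32 MiB of dummy data per chunk

def digest(chunk):
    # hashlib drops the GIL while hashing buffers larger than ~2 KiB,
    # so several of these calls can run on different cores at once.
    return hashlib.sha256(chunk).hexdigest()

with ThreadPoolExecutor(max_workers=4) as ex:
    digests = list(ex.map(digest, [data] * 4))
```

Same takeaway as the video: when the heavy lifting happens in compiled code that releases the GIL, plain Python threads scale across cores without multiprocessing's RAM and pickling overhead.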
Info
Channel: Jack of Some
Views: 104,549
Keywords: python multithreading, python multiprocessing, python, python signle threaded, python single threaded, pythin gil, global interpreter lock, python parallel processing
Id: m2yeB94CxVQ
Length: 10min 22sec (622 seconds)
Published: Sun May 10 2020