python is removing the GIL! (PEP 703) (advanced) anthony explains #550

Captions
Hello and welcome to another video! In this one we're going to be talking about the GIL, the global interpreter lock, and PEP 703, which is going to remove the GIL. I've already talked about this in a previous video, so if you go on YouTube and search "anthony writes code gil" you'll find it; it talks about why you would want to remove the GIL and what that means for Python. We're going to start by rehashing that a little bit: what the GIL is, Python's threading situation, etc. Then we're going to talk about what needs to change in Python in order to remove the GIL, and finally at the end I'm going to leave my thoughts on whether I think this is a good thing or not, so I guess stick around for that part.

Let's start by quickly rehashing what the GIL is. First off, Python has real threads. It's often a myth that Python does not have threads, but there are real live threads here, and I'm actually going to reuse some code from that other video. Quickly, what the code is (I'm not going to type this all out by hand): we have a little bit of a timer that says how long a chunk of work takes, we have a work function that just runs a big CPU-bound loop, and finally we have a main function which spins up four threads, does a bunch of work, and tells us how much time it took. Just to show you that these are real threads: if we go ahead and run this (`python3 t.py`) it starts doing that work, and if we look at the tree of processes, the curly-brace entries are threads, so you can see there are four real threads doing work in this program. You can also see these threads each take about 1.4 seconds (some of them are a little bit faster at the end when they have less contention), but overall this is pretty slow for a single chunk of work.
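A minimal sketch of that kind of demo, with my own names and loop size (the video's exact `t.py` isn't reproduced here):

```python
import threading
import time

N = 2_000_000  # loop size: my own choice, just big enough to measure

def work() -> int:
    # a CPU-bound loop of pure-Python bytecode; under the GIL only one
    # thread can be executing this at any given moment
    total = 0
    for i in range(N):
        total += i * i
    return total

def timed(n_threads: int) -> float:
    # run `work` once per thread and report the total wall-clock time
    start = time.monotonic()
    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start

print(f"4 threads: {timed(4):.2f}s")
print(f"1 thread : {timed(1):.2f}s")
```

On a GIL build the four-thread run takes roughly four times the single-thread run, not the same time, which is exactly the behavior described above.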
A single chunk of work takes that long because the global interpreter lock is switching off between all of these threads, so they're taking a lot more time. Now, if we adjust this code to have one thread instead of four and run it in single-threaded mode, you'll see that the chunk of work takes a lot less time, only about 300 milliseconds; there's no contention between a bunch of threads trying to do work. Overall it's actually a little bit faster running this in single-threaded mode than in multi-threaded mode, because lock contention and thread context switching cost a bit of performance.

So basically: there are threads, but the global interpreter lock makes them kind of useless, because we can only execute one chunk of Python code at once. That's because there's interpreter state and a bunch of other things inherent to Python, things like the garbage collector and reference counting (which we'll talk about later), that are not thread safe, and the global interpreter lock exists to protect those not-thread-safe things. This generally means that only one thread can be running a chunk of pure-Python code (things with a `.py` extension) at once. Now, there are some tricks in C where you can drop the GIL and run some truly concurrent code, but that requires a C extension and a bunch of careful management of state, so you don't really see it too often; a few sciencey libraries take advantage of it, but in general you're bound by that GIL to one thread executing Python code at once.

Okay, so that's the TL;DR on the GIL, why threading is not great in Python, and why you may want to remove it. So let's talk about PEP 703. This was recently accepted by the Python Steering Council, or "tentatively accepted" I think
that's the wording they used. It's going to get tried out: an implementation will probably land in Python 3.13, and we'll see where it goes from there. I want to talk about what needs to change here, starting with the downsides, and then all of the changes, because a lot of stuff needs to change in order to make the interpreter itself thread safe.

Okay, so the first thing is that all of these changes added on top of Python are going to make single-threaded code slower. There are benchmarks in the PEP, way down towards the bottom: anywhere between a five and eight percent slowdown in single-threaded code due to the changes that need to happen. So no-GIL Python will be slightly slower for some stuff.

Oh, and I guess the other first thing: there will be two flavors of Python. This is actually not new; there are several flavors of Python right now. There's the debug build of Python and the non-debug build; in the past there were different modes for narrow and wide Unicode, and different implementations of malloc, whether it's pymalloc or plain malloc (I don't remember exactly, because at some point it was always pymalloc and so they dropped the little ABI flag). So there have been multiple flavors of Python in the past; this is not exactly new. But there would be a no-GIL Python and a GIL Python: two separate builds.

That brings us to the first question: why are there two separate builds? It's because a lot of the C-level ABI (application binary interface) necessarily needs to change to support this GIL-less future. For instance, the C structure that represents every single Python object, `PyObject`, needs to change; reference counting needs to change; things that are core to the implementation of Python objects need to change completely.
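As a side note on the two-flavors point: newer Pythons let you ask which flavor you're on. This sketch uses names that only exist in 3.13 and later, so it hedges with fallbacks for older versions:

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 on a free-threaded (no-GIL) build, 0 on a normal
# 3.13+ build, and None on versions that predate PEP 703 support
flag = sysconfig.get_config_var("Py_GIL_DISABLED")
print("free-threaded build:", bool(flag))

# 3.13+ also has a (private) runtime check; guard it for older versions
check = getattr(sys, "_is_gil_enabled", None)
if check is not None:
    print("GIL currently enabled:", check())
else:
    print("no runtime GIL check on", sys.version.split()[0])
```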
And so compiled extension modules from old Python are not going to work in this GIL-less flavor of Python. They will need to be recompiled, and sometimes, in some cases, code will need to change in order to support this GIL-less world. Basically every C extension will need to target both of these flavors. This has already been true in some sense: if you wanted a C extension that works with the debug build of Python, you would need to build a debug wheel and a non-debug wheel. No one does this, because no one really runs the debug build in production, but for instance when there was narrow and wide Unicode in Python 2, people would build different C extensions for narrow and wide Unicode. You can kind of think of this like building for another architecture: it doubles the amount of stuff you have to do, but maybe it'll result in something cool in the end.

This also means that, with two different flavors of Python, you'll have to know which one you're running, whether that's classic Python or this GIL-less Python. You'll also need some sort of distributor to make sure the GIL-less Python is available; maybe operating system packages will pick this up, and conda is apparently going to do a bunch of work to make this possible, but it's going to be tricky: you're going to need to know which Python you're running and how to switch between the two. So: two different Pythons, and you're going to need to rebuild all your C extensions.

All right, that brings us to the first thing that needs to change beyond just the CPython object structure, and that is changes to reference counting. The way reference counting works today, roughly (this is simplified, because there's some other stuff that happens), is that there's an integer inside the `PyObject` structure that gets increased and decreased based on whether the object is referenced at runtime (not "in code", because it's the runtime behavior that matters, not what's written down).
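You can watch that counter from Python itself: `sys.getrefcount` reports the current count, plus one temporary reference for its own argument.

```python
import sys

x = object()               # a fresh object with no other references
r1 = sys.getrefcount(x)    # typically 2: the name `x` plus the call argument
y = x                      # aliasing adds a reference
r2 = sys.getrefcount(x)    # one higher than before
del y                      # removing the alias drops it back down
r3 = sys.getrefcount(x)

print(r1, r2, r3)
```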
That number increases and decreases, and if it falls to zero the object gets deallocated. This integer inside the object is not an atomic integer, it's just a normal integer, and the atomicity of that value, in fact of that value across all Python objects, is protected by the global interpreter lock. But if you're getting rid of the interpreter lock, you need to figure out a new way to do that, and so there are, I think I counted, three different approaches to reference counting here that don't follow the original implementation.

The first of those is called immortalization, which is probably the easiest one to explain. Several objects, such as the singletons (`True`, `False`, `None`, etc.), the small integers which are pre-allocated in Python, and interned strings (the empty string, single-character strings, etc.) will be considered immortal: they will not participate in reference counting or garbage collection at all. They are considered to be alive, and always the same, for the entire length of the process execution. You would never deallocate `False`, for instance; you're always going to have `False` around because something is going to need it at some point, so these live the entire life cycle of the interpreter and won't participate in garbage collection at all. There's actually a different immortalization proposal, which I've sort of talked about in another video, and apparently they can't use it here; I didn't read into why, so just trust, I guess.
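On 3.12 and later (where the separate immortalization work, PEP 683, landed) you can see this from Python: an immortal object's reported refcount is a fixed sentinel value that doesn't move, while on older versions it behaves like any other count. A sketch that hedges for both cases:

```python
import sys

r_before = sys.getrefcount(None)
hoard = [None] * 10_000          # ten thousand new references to None
r_after = sys.getrefcount(None)

if sys.version_info >= (3, 12):
    # None is immortal: the count is a sentinel and never changes
    print("immortal:", r_before == r_after)
else:
    # pre-3.12: None is reference-counted like everything else
    print("count grew by", r_after - r_before)
```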
For the second type we're actually going to skip biased reference counting, because that one will come up in a second. The second type is called deferred reference counting, and this is where I get less good at explaining things. Deferred reference counting is a specialized type of reference counting, similar to immortalization but not quite. Mostly it's used for things like functions and modules: things that are typically cyclical, live for large spans of the interpreter life cycle, and constantly have their references increased and decreased as things call them, but which typically aren't deallocated until late in the process, like when a module goes away, which doesn't happen very often and usually happens at the very end of an interpreter's life cycle. They're slightly different from immortal objects because they aren't always alive; these are things that could be deallocated. You can hop into `sys.modules` and delete a module, and it will likely get garbage collected if nothing references it again. So this is an alternative implementation of reference counting for those objects, because they rarely get deallocated and they're highly cyclical; we'll get to cyclic garbage collection later, but the garbage collector needs to do special things for cycles, and so this avoids a bunch of work. That's the idea of deferred reference counting. Of course, I'm not an expert on any of this; this is my quick reading of the PEP, so feel free to correct me in the comments, but this is how I understand it.

All right, the third thing is biased reference counting, and the actual algorithm for this is fairly complicated; the PEP covers a lot of it and there's a whole bunch of text. The TL;DR, from what I've gathered, is that it's a special type of reference counting that biases (that's why it's called biased reference counting) towards the owner of the object: if you're doing the reference counting within the thread that owns the object, your increfs and decrefs are treated as a fast path.
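The "you can delete a module out of `sys.modules`" point is easy to see: evicting the cache entry means the next import builds a brand-new module object, and the old one becomes collectible once nothing names it.

```python
import importlib
import sys

import json as first                      # first import: created and cached

del sys.modules["json"]                   # evict the cached module object

second = importlib.import_module("json")  # re-executes the module, new object

print(first is second)                    # False: two distinct module objects
print(first.dumps([1]) == second.dumps([1]))  # True: same behavior, though
```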
Basically there's less contention when you unequivocally own an object; if you're accessing an object that's owned by a different thread, or owned globally, then you need to do a little bit more work. That's the TL;DR. It's a little more complicated than that, and I did not completely understand what they were doing here, but trust me, that's the idea.

So why are those three reference counting approaches needed? Because the naive implementation, which is to just use an atomic primitive for your reference count, is too slow. Most processor architectures have integers that are considered atomic, that is, reads and writes to them are guaranteed to be atomic by the processor's instruction set. But unfortunately those atomic primitives are much more expensive and would cause a much bigger slowdown in Python, so these reference counting tricks need to happen in order to keep things fast. That's why there are so many different strategies for reference counting rather than just doing the obvious easy thing: the obvious easy thing is really slow.

All right, cool, those are the changes to reference counting. The rest of this is mostly changes to things like the garbage collector, which is next. This is mostly about cyclic garbage collection; most of the other garbage collection that happens in Python doesn't really need to change, that is, when an object's reference count falls to zero, deallocation can occur. But for cyclic garbage collection, and in CPython specifically it's cyclic garbage collection, the collector needs to know an accurate reference count for every object.
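The detached-cycle situation it's looking for is easy to reproduce: after the last outside reference goes away, the cycle's refcounts stay positive, so only the cyclic collector can reclaim it. A small sketch with my own names:

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

gc.disable()                      # keep the collector from running early

a = Node()
b = Node()
a.other, b.other = b, a           # two objects pointing at each other

probe = weakref.ref(a)            # observe `a` without keeping it alive

del a, b                          # nothing outside points into the cycle,
                                  # but each refcount is still positive
alive_before = probe() is not None

gc.collect()                      # the cyclic collector finds the cycle
alive_after = probe() is not None

gc.enable()
print(alive_before, alive_after)  # True False
```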
It needs those counts to know whether there are detached cycles that aren't referenced: a cycle of objects that's isolated from everything else will all have positive refcounts, so none of them should be deallocated by plain reference counting, but if nothing points into that cycle then the garbage collector knows it can delete it. To do that it has to know accurate reference counts, and with a few of the approaches we talked about above, like deferred reference counting, you're not going to have exactly accurate reference counts all the time, so garbage collection has to change a little.

One thing that's probably going to be problematic, and those of you who have written Java before are familiar with this, is the stop-the-world approach to garbage collection: the garbage collector needs to stop all currently running threads in order to do a collection, giving you a GC pause. So we're going to have GC pauses in Python. I mean, we already have that today, because the garbage collection routines already run with the global interpreter lock held, but it'll still be a thing in the GIL-less world: there would still be a bit of a global lock while cyclic garbage collection occurs.

Cool. The other thing that needs to change is Python's allocator. The current allocator is called pymalloc, and it optimizes for small objects, because that's what happens a lot in Python: a lot of small things get allocated and deallocated. Basically it's a specialized memory-management layer, and it has to change because pymalloc is not thread safe. The PEP proposes changing it to mimalloc, and I looked up the pronunciation of that because there's no way I was going to get it right by guessing. This is an allocator, I believe written by Microsoft, which is
general purpose and thread safe, and so hopefully it will be able to replace pymalloc. It also optimizes for small allocations, similar to pymalloc, so hopefully that will be a pretty easy transition. They did note in the PEP that there will need to be some slight changes to how mimalloc works (such a fun name to say) to support some of the other optimizations and necessities Python needs, but presumably Microsoft, who is already sponsoring Python development, will step in and collaborate on that. So that's the change to the allocator.

The other thing that needs to happen is in the containers. A lot of the containers in Python right now aren't actually written in a thread-safe manner: things like extending a list, or building a set from a list, or other `dict` / `list` / `set` operations. Those are not currently written to be thread safe; they assume the global interpreter lock will ensure their thread safety during execution. So some of these will have to change in order to remain thread safe when that global interpreter lock no longer exists. The TL;DR is they're all going to have to take locks in some way or another, but adding locks to all these objects is going to make them slower, so there's a bunch of specialization that will have to happen. They talk about this in the PEP: basically they're going to preserve thread safety in all of the built-in collections by adding locks, and then they have to do some special stuff to avoid deadlocks, like adding fast paths for when they don't need a lock, or inventing this new "critical section" thing in CPython, which I didn't really understand, to be honest; magical non-lock sauce, that's what I'm calling it. And sometimes preemptively unlocking things if a deadlock may occur. So: special new collection stuff, special new locking.
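The same "we were relying on the GIL" issue bites pure-Python code too. A compound operation like `count += 1` is several bytecodes (load, add, store), and two threads can interleave between them; an explicit lock serializes the whole read-modify-write. A small sketch with my own names:

```python
import threading

count = 0
lock = threading.Lock()

def increment(n: int) -> None:
    global count
    for _ in range(n):
        # without the lock, the load/add/store triplet can interleave
        # across threads and updates can be silently lost
        with lock:
            count += 1

threads = [threading.Thread(target=increment, args=(50_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(count)  # 200000: every update survived
```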
It's all going to be needed in order to remove the GIL.

The last thing I have noted is probably pretty minor for most people, but a common pattern in C extensions is not actually thread safe: acquiring what's called a borrowed reference. A borrowed reference, in C land, means you're getting a reference to a thing but you didn't actually increase its reference count; you're just borrowing someone else's reference, and if you need to turn it into a strong reference, you incref it later. You can imagine a thread being interrupted between those two calls, and then the reference counts aren't correct; the object may get deallocated, maybe, who knows. This pattern is inherently unsafe without the GIL, so anything that does something like this with borrowed references will need to use one of the newer function calls, or implement its own locking. For instance, `PyDict_FetchItem` is a replacement for this pattern that gives you back an upgraded (strong) reference. So: some C extensions will need changes.

Okay, a lot of stuff needs to change. Those are the big ones for me. Now we're getting to the part where my opinions come out, so we've left fact-land; we're now in things-Anthony-thinks-about-this land. The first part is: it's going to make most code slower. I say most code because most code right now is single threaded in Python, because multi-threading doesn't really buy you much, and it's going to make most single-threaded code slower, five to eight percent, as the table in the PEP says. I think that's going to be a huge barrier to entry for the average person that would maybe benefit from it, because they're writing their web app with Flask or FastAPI or whatever, and those are all inherently written for a single-threaded world; they're designed around that. And it's
not really going to benefit asyncio either. Well, it may, depending on how your async I/O is implemented, because a lot of async callbacks are actually just threads in a trenchcoat, so maybe it'll benefit those a little bit; but most single-threaded performance is going to get worse.

The other big barrier to entry, and I think this is going to be almost on the level of the Python 2 to 3 migration, although probably not as much because I think less single-threaded pure-Python code will need to change, is needing to recompile every single C extension. That is going to be a huge barrier for the community at large. There are some C extensions that have not been recompiled for many, many years because of things like `abi3`, the stable ABI in Python. I know that some of the stuff I've built, okay, well, this one was released a year ago, so I've at least built this one recently, but due to abi3 I can basically build something targeting Python 3.6, which is ancient at this point, never recompile it, and still install it on Python 3.12 and it would just work. I'm going to have to go back and recompile at least some versions of that in order to support this no-GIL world, and I suspect there's a lot of stuff that is going to take years for people to recompile.

The other part, and why I think this might not succeed, is that a lot of code is not written in a thread-safe manner now and will suddenly need to be thread safe moving forward. A lot of pure-Python code assumes that the global interpreter lock exists, whether implicitly or not: they wrote some multi-threaded code, they ran it once or twice, it worked, there was no corruption, and they were good to go. But now that the GIL's not there, there are going to be all sorts of data races where they didn't exist before, and a lot of code is going to need to get rewritten to handle those data races. I don't think this is a bad thing; you should probably be writing thread-safe code if you're dealing with threads, that's a bit of a no-brainer, but the GIL was hiding a lot of thread-unsafety in existing code, and a lot of stuff is going to need to be rewritten. A lot of C extensions are going to need to be rewritten too: patterns like the borrowed-reference one, which you wouldn't look at and say "oh yeah, this is unsafe", are suddenly in the critical path where there's a data race, and you wouldn't implicitly know that without knowing that little tidbit about borrowed references, or reading very carefully. So I hope there is some amount of linter tooling or other stuff that can help here, but I haven't seen any linter tooling for CPython C code, so something's got to happen there; I don't even know if you'd be able to implement a compiler extension to make this a warning.

So: a lot of C code is going to need to change, a lot of pure-Python multi-processing or multi-threading code will probably need to change, C extensions are going to need to be rebuilt, and things are going to be slower. I'm overall pessimistic on this. I think it's going to take a monumental amount of work to make this successful, but there's potentially a lot of gain here. I don't know, we'll see what happens; I'm happy to eat my words later, but I'm pessimistic that this will be successful.

Anyway, that's my quick summary of PEP 703, including all of my misunderstandings and my hand-waving over various parts of it. I hope you found this useful, and if there are additional things you would like me to explain, leave a comment below or reach out to me on the various platforms. Thank you all for watching and I will see you in the next one!
Info
Channel: anthonywritescode
Views: 41,076
Id: OC2gnyfmwL8
Length: 24min 4sec (1444 seconds)
Published: Wed Aug 16 2023