Garbage Collection in Python: Speed Up Your Code

Captions
What is going on, guys? Welcome back. In this video today we're going to learn about garbage collection in Python, how it works, and what we can do to potentially speed up our code. So let us get right into it. [Music]

All right, so we're going to discuss garbage collection in this video today: what the basic idea behind it is, how it's implemented in Python, what the basic mechanism behind it is, and what we can do manually when it comes to garbage collection to maybe speed up our code and our programs. But before we get into any of the coding, I would like to briefly explain on a theoretical level what is happening behind the scenes and what reference counting is in Python. I don't want to go too deep into the technical details; I want to keep it concise and simple. For this I have Paint opened up.

The idea of reference counting is quite simple. Python collects garbage based on reference counting. I have some object here, for example object A, and this object A can be referenced by other objects. A very simple example: let's say I have an object B, and B is a list. A list contains many elements, and A could be part of B, which would mean B is referencing A. For each object Python keeps something called a reference count: how many references are there to this object A? So B adds one reference. Maybe I also have some object C, which could be a class for example, and one attribute of the class, for example c.name, could be pointing to A as well. That would also be plus one.

When it comes to reference counting, the idea is that when an object is completely unreachable, we can destroy it, because we don't need it anymore. So if this connection no longer exists, and this connection no longer exists, and there's no way to reach A, and we delete A, then the garbage collector can destroy A without any problems. That's the basic explanation without too much technical detail.

Now, it's not necessarily a problem, but the thing is that sometimes we can have cyclical references. We can have a link one pointing to a link two, pointing to a link three, and then link three points to L1 again, and then I have some external object A pointing to L1. Now I can delete L3, I can delete L2, and I can delete L1; however, these objects will still remain in memory. Why? Because there is a path. L1 will not be destroyed, because its reference count is at least one: A is pointing to L1. And because of that I can still reach L3: from A I can reach L1, from L1 I can reach L2, and from L2 I can reach L3. As long as this path exists, I cannot really destroy these objects. I can delete the variables L3, L2, and L1 in Python, but I cannot destroy the objects, because I can still reach them. Whereas if I have something else here (and this is all from the documentation, from the developer guide), say L4, link four, pointing to itself, and then you delete L4, you can completely destroy it, because nothing besides L4 itself is pointing to it. The basic idea is that garbage collection takes care of stuff like this: it analyzes which objects are unreachable and which objects are still reachable, and all the unreachable objects are destroyed so that we don't waste any memory. That's the basic idea of garbage
collection. I hope this was not too complicated.

The important thing to understand now is that we work with three so-called generations. When it comes to garbage collection in Python we have generation 0, generation 1, and generation 2. Every time you allocate space for an object, every time an object is created, it belongs to generation 0. So I can have an object A, an object B, an object C, and so on. After a certain threshold of allocations the garbage collector comes in and analyzes which of these objects are still reachable and which are not. Maybe it realizes A is still there and B is still reachable, but C is completely useless: it's unreachable, so we can destroy it. This is one cycle of garbage collection. C is destroyed; A and B are not destroyed, they persist, and because of that they're moved to generation 1.

Why do we move them from generation 0 to generation 1? The assumption is that most objects have a very short lifespan, but those that have already persisted for quite a long time are likely to persist even longer. Since A and B are still around after this cycle, we expect them to be around for even longer, so we don't have to check them all the time; we check them less often than the objects in generation 0. Then maybe we allocate space for some new objects D, E, F, and so on. After the threshold is reached again, we check again which are reachable. Maybe these two are no longer reachable, and D is moved to generation 1. After we do this 10 times (those are the default values), a threshold is reached for generation 1, and then the garbage collector analyzes which of the objects in generation 1 are still around. So A, B, D: are all of
these still reachable? Maybe we see that B is no longer reachable but A and D are still around, and after these 10 rounds of generation 0 garbage collection we can move A and D to generation 2, which is checked even less often: it's checked after 10 garbage collections in generation 1. I hope this is not too confusing.

The basic idea is that for each of these generations you have a threshold. For the first generation the threshold says: after 700 allocations (this is the default value in Python), perform garbage collection, analyze which of these objects are reachable and which are no longer reachable, and destroy all the unreachable objects. When you survive this garbage collection, you're moved to generation 1. After 10 garbage collections in generation 0, a garbage collection is performed for generation 1. If you survive there as well, you're moved to generation 2. And when you do garbage collection 10 times in generation 1, you do it once in generation 2. That's the basic idea.

All right, now let us go ahead and play around with these concepts in Python directly. For this I'm going to open up an interactive Python shell by typing either python or python3 into the command line, and then we're going to import the two modules sys and gc, which stands for garbage collection. Both are core Python modules; no need to install anything. One function provided by sys is getrefcount, which tells us how many references there are to a specific object. For example, if I say a = "hello world" and then run sys.getrefcount(a), you can see we have two references here. If I now say my_list is an empty list and call my_list.append(a), my_list is referencing a, and when I run this again I can see I have one more reference. I can also see where the references are coming from by running gc.get_referrers(a), and you can see my_list is part of it. So that is also something that you can explore here.

Now the interesting thing is that you can call the function gc.get_threshold to see what the current thresholds are: 700, 10, 10, which is what I just explained in Paint. You can also change these, so you can cause garbage collection to happen more often or less often. Of course garbage collection takes some time, and if you run it less often you can speed up your code in certain circumstances, not always. We can set the thresholds to something else by calling gc.set_threshold with, for example, 1,000, 20, 30, and when I run gc.get_threshold again you can see those are now the new values. I can also see what the current state is: gc.get_count tells me that we have 415 allocations, 10 generation 0 collections since generation 1 was last collected, and one generation 1 collection since generation 2 was last collected. And if you want to see when garbage collection is happening, you can enable the so-called debug mode: you can say gc.set_debug(True), and this will show you a message every time a garbage collection is performed. It will tell you exactly when it's happening and how it's happening.

Now for this we're going to move away from the interactive shell into an actual script. We start again by importing gc and sys, and then we say gc.set_debug(True) so that we can see what's happening. I'm going to use the example that I showed you in Paint, which is from the Python dev guide. This is a class Link, and this class has a constructor which takes a parameter next_link, and it also takes a value. In my case this is a little bit modified; this is not the exact example from
the dev guide, but the idea is that you have self.next_link equal to next_link and self.value equal to value. Then we have a __repr__ dunder method which just returns the value, so that we get a string instead of something like a __main__.Link object at a certain address. So that is our class.

What we're going to do now is cause garbage collection to happen quite often, first so that we can see the output, and second so that we can see that doing garbage collection less often can actually speed up the process. This is a very artificially crafted example, but there are also actual use cases where reducing the number of garbage collection checks speeds up your code. So we create one link which is going to be our main link: we say Link, it has no link that it's pointing to, and it has the value "Main Link". Then we say my_list is an empty list, and for i in range of a large number (which one did I pick here? 5 million) we create a new link, l_temp. This l_temp is a Link pointing to the main link, with the value "L", and it is added to my_list with my_list.append(l_temp), so we create quite a lot of references here. We also import the Python module time and measure how long this takes: start = time.perf_counter() before the loop, end = time.perf_counter() after it, and then print(end - start) to see how long it takes with the default settings. So we have this now. I can run this, and first of all what you can see here (maybe we can stop this, I can then disable it to show you), but you can see now
how the garbage collection is being done. You can see garbage collection is happening all the time, and it's also being logged. You can also see that it happens in different generations: 0, 0, 0, 1, 0, 0, 0, 1. So after 10 collections of generation 0 it happens in generation 1, and after 10 collections of generation 1 it happens in generation 2. You can analyze this whole process if you set debug to True, but of course this produces very verbose output. If we disable that, we just get our result, which is 4.31 seconds. I can run this again to see that this is roughly how long it takes in this case with all the garbage collection being done.

Now we can go ahead and change the thresholds. I can call gc.set_threshold and say: instead of doing it every 700 allocations, do garbage collection every 20,000 allocations, for generation 1 do it only every 50 times, and for generation 2 every 100 times. If I run this now, you will see that this takes much less time. I can also take this to an extreme by calling gc.disable(). This disables automatic garbage collection completely; it basically says don't do any garbage collection at all, and then you're going to see that this runs even faster.

What you can also do (for this we're going to use the interactive shell again) is collect the garbage manually to see what happens. We can say import gc, then gc.set_debug(True), and now I can call gc.get_count(). Then I can call gc.collect and pass a generation: collect generation 0. You can see I get this message, collecting generation 0, and you can see that the count for generation 1 increased as well. When I collect generation 2, you can see it collects everything, so it resets everything. So collecting generation 2 also collects everything else, or basically resets everything
to the beginning. So that is also something that you can do. You can disable and, of course, enable again, so you can also disable garbage collection for a certain section. For example, if you have some database access which causes a lot of references to be created, but you don't really have many unreachable objects, you don't want to do garbage collection too much. You can say gc.disable(), then some code, and then gc.enable() afterwards, and then maybe do a gc.collect() manually to catch up. That is something that can, in certain situations, speed up your code massively.

So that's it for today's video. I hope you enjoyed it and hope you learned something. If so, let me know by hitting the like button and leaving a comment in the comment section down below, and of course don't forget to subscribe to this channel and hit the notification bell to not miss a single future video for free. Other than that, thank you very much for watching, see you in the next video, and bye.
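
The pattern described at the end of the transcript (disable collection around an allocation-heavy section, re-enable it, then collect once to catch up) could be sketched roughly as below. The Link class follows the one described in the video; the paused_gc helper, the build_links and timed functions, and the object count of 200,000 are hypothetical choices made for this illustration, not something shown in the video. Timings depend on the machine and the Python version, so none are quoted here.

```python
import gc
import time
from contextlib import contextmanager


class Link:
    """Node holding a value and a reference to another Link (or None)."""

    def __init__(self, next_link, value):
        self.next_link = next_link
        self.value = value

    def __repr__(self):
        return self.value


@contextmanager
def paused_gc():
    """Hypothetical helper: pause automatic collection, then catch up once."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Restore the previous state only if collection was on before.
        if was_enabled:
            gc.enable()
        gc.collect()  # one full manual collection to catch up


def build_links(n):
    """Create n Link objects that all point at one main link."""
    main_link = Link(None, "Main Link")
    my_list = []
    for _ in range(n):
        my_list.append(Link(main_link, "L"))
    return my_list


def timed(n):
    """Return (elapsed seconds, number of links built) for one run."""
    start = time.perf_counter()
    links = build_links(n)
    return time.perf_counter() - start, len(links)


if __name__ == "__main__":
    n = 200_000  # arbitrary size for illustration; the video uses 5 million
    print("thresholds:", gc.get_threshold())
    default_time, _ = timed(n)
    with paused_gc():
        paused_time, _ = timed(n)
    print(f"automatic GC on:     {default_time:.3f}s")
    print(f"automatic GC paused: {paused_time:.3f}s")
```

Wrapping the disable/enable pair in a context manager means collection is restored even if the code in between raises an exception. Whether pausing helps at all depends on the workload, as the video notes.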
Info
Channel: NeuralNine
Views: 8,971
Keywords: garbage collection, garbage collector, python, python garbage collection, python garbage collector, python gc module, python gc
Id: pVGujarYk9w
Length: 16min 41sec (1001 seconds)
Published: Mon Jan 01 2024