Pablo Galindo Salgado - Time to take out the rubbish: garbage collector - PyCon 2019

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
happy may the fourth everyone we're gonna go ahead and get started just as a reminder please make sure you turn off all sounds from your electronic devices we have up Pablo Galindo who is a Python or C Python core developer and also works at Bloomberg he'll be giving time to take out the rubbish garbage collector awesome so thank you all for coming so in this talk I'm going to what you through basically the current implementation of the C Python garbage collector so at the beginning we're going to make a gentle interaction of memory management and reference counting but it is that we are centering our efforts or how the current algorithm for garbage collector collection works and why it's needed and so a very brief interaction so this is my Twitter handle I am passing co-developer I work now as a software engineer at Bloomberg before that I was in academia doing black hole physics research so if you are interested please come after a talk and talk with me and one of the authors of pave 570 an implementer we merge this last week and I spend my free time like squashing bugs and raise conditions on cpython awesome so good with an interaction of what garbage collection is and wise need so in mind that you have a variable right and for example here is this list and then you delete the variable as you know this is basically removing the name but we're interested in what happened at that point so this is Mary also present itself when you have a function for example in this case this factorized function and then you have an internal variable local to the function in this case is this list of primes and when the function finishes and the resource prime signal did it anymore so we say that the viola goes out of scope and we interesting to know what happens so how she python behaves to clean up that list that is don't need anymore so what happens to it right so some definitions so first we are going to find garbage as the memory that is occupied by Abdi that are not longer need in the program and therefore garbage collection is just a formal automatic memory management we can go to actual Python glossary and we see how it defined the process of free memory when is not use anymore so why we need this thing because otherwise we will run out of memory right if we don't clean the memory that we are using we will soon run out of memory and the idea is that it has to be automatic and this is very important because first we don't need to do it ourselves this involves less thinking and therefore fewer bugs because all of you that have compiled languages and especially low-level ones like see you will may know that memory management is one of the most challenging parts of the language of the source of many of the of bags and memory leaks there is a problem though because having it automatic can be a bit slower if it's not properly so nowadays modern languages tend to not do it manually so for example rest resists a good example France doesn't have garbage collector it has a very clever new approach which is the borrowed checker so this is a way apart from giving other nice features it's a way of avoiding sort of a manual memory management or even garbage collector so C and C++ usually were the same for hire manual memory might swim but there is some implementation of garbage collected version of the language available okay so let's let's walk through some pieces that we're going to need to understand the algorithm and the different faces that it went through so the first thing that we need to talk is finalized errs so basically in mind that we have a class in this case this class dog right and then we create one of these dogs and then we delete the dog and then we assume that this frees our memory right so it's actors have been more complicated because if we go to a Python Docs we will see that it's actually not guaranteed that their method is call especially when the interpreter axis usually is actually called but we cannot rely on that method to recall why not well because but the first thing is that the purpose of the doll method which is called the finalizar is only to free up memory when it's no longer necessary so we should not rely on this method to do some kind of cleanup of resources we have other roaches in Python particular with the statement and context managers that will help out with these things so for example you are going to close a file or finish a database connection you should use a context manager are not there then we are going to introduce reference companies so reference counting is one of them basically the major way that cpython manages memory on lifetime so reference counting is a strategy so the idea is that every object has internally has a number which basically refers to all the other objects that are referring to it for example you have a list and an object is inside the list then we have a with the number will be one because the list owns the object for example in this case here we have X which is a list of three elements and then we click a second variable so we know we say that X has two references one is the name X and the other is the name of Y you can actually try it for like objects so for example in the C's module we have this function called get ref count and you can call it to any object and it will return the number of references that are pointing to the object so basically internal integral the bleeps inside the object there is a catch though because the function will always return one more and this is because the function needs to see the object to return the reference count so you need to be aware that you always need to think one less so for example let's make an example to import malucia's we have this list that we talked before and then which is how many reference it has it says two we know that actor is only one then we click another name for the for the list which is y and then we check again and we see that actually it increased one we need an actor is two and then when we delete the name we say that we're back to two trillion so that's okay you have to be aware as well that a container also cleans ownership of this so if it's not only names to the variables if the variables inside a container the reference count will also be increased once for example in this case we have two variables so if we check the reference count of X we know is one but when we put this variable in a set we see that actually because the variables in have said the reference count increases by one and idea that when the reference count goes to zero we know that no other objects is actually using the one that we are talking about so we can actually destroy the object right so now we're going with a blurring which is cycles so let's imagine this situation I read I'm only using two nodes because three is like a tree and people just sat down attention when these are three in the slides so in this case we have a and B and every these objects point to each other and then there is this arrow on the left which is basically our reference from the outside so some other object outside is referring to a so it has two references one from B and one from the outside and B has one from a so there is a problem right because if we just we delete the reference from the outside then a points to B but B points to a so even if these two objects are not reachable anymore the reference count will never fall to zero and therefore we cannot clean them up so we need something else and this something else is the garbage collector so the situation for them is more clarity will be something like this right we have a lot of objects in our program which are here the blue cycles and they will still calls and then we have these two a and B sitting around and they are not reach hold this means that there is not an arrow from these other objects pointing to them so we need some something else some mechanism to identify these cycles and delete them so this is a garbage collector so to work through the algorithm we are going to imagine basically a situation in which have several cycles so first we need to know how all these cycles are created so for example this is a simple example you can create a class node you can create three of them and then setting attributes you can basically make every node point to the other the other two the situation is actually more complex this is an actual diagram of the relationship between all these classes you need to actually understand this thing but bear in mind that it is actually much more hidden objects that have reference inside in particular you can look how these a lot of things that point to the node type in there and that is actually one of the major source of cycles that you can form so the situation so it may even if you think that cycles are a bit weird it's actually very very common that happens so it's important to having handle in the in the runtime okay so let's go through the algorithm so bear in mind the idea is to identify cycles and remove them so first thing how we are going to identify the cycles okay let's swap two as an example let's consider the following situation which is this one so let's consider these three these three cycles which are separate we have x y&z which is like a triangle of cycles then we have M be an L the form is not important it just to have three of them that will have will different behavior along algorithm so basically the triangle is created as we saw before with our class node and internal reference and notice that we have an extra cycle that will help us see in some behavior later with this set that points to itself then we have a and B again with our class node and both a and B poet to myself an interesting one which is a list that contains itself this is a cycle of only one element which is cute cool so bear in mind also that the situation is a bit more complex because every object has a dictionary inside and is the dictionary the one that actually points to other things so we have to consider not only the ones that we saw before but every dictionary associated with a class so this is the situation right we have all these poor arrows pointing to each other so in the first row we have the objects in the second row we have the dictionaries and the least the has no dictionary inside and then the algorithm uses two linkedlist the first one is the linked list of all the objects that are being considered and the second one is a list of all the edges that are going to maybe be destroyed and I say maybe and you will see why so the first step is identifying which are the cycles so for that we basically are going to copy the reference count into a separate field because we are going to modify and we don't want to mess in case the objects are going to destroy so in this case the algorithm starts basically copying the scene and decrement in the reference count of all the objects that are point for every one of them and then we an object has a city reference count we might mark them it may be and richer so for example these are the numbers but are for the reference count so we start with the first object and we decrease the reference count of the only object that is point by X which is the dictionary so the dictionary becomes maybe unreachable that is what is marked in red color because it has a reference counts of zero so we with y Y points to a dictionary with a grease its referring town same thing with said the dictionary of X actually points to one handset so y&z are decreased by one the same thing for Y and when we read the dictionary said then Y and said are no longer referenced by anything so both become maybe enriched over again the same thing happens with a B and it situation right and then the least a point to itself only has one reference with a greasy so if you see in this situation what happens is that after this algorithm only the objects that are rich all from the outside which is that are you see up there has a reference count bigger than one so basically these are the objects that have reachable from the outside and all of the rest are basically isolated cycles okay so now we are not finished because as you might imagine if X is ritual from the outside we cannot delete any of the things that are active referred by X right because we will destroy an internal state so when as part of the algorithm is trying to figure it out which objects are actually retool for it so the way it works is that we start with the first object if the object has a reference count bigger than one we mark it as certainly reachable which is going to be this green color notice that we also mark X because X is the only object that is resolved so the dictionary of X because the dictionary of X is the only object which is resolved from X ok so we know that these two are Richard then we go to Y and Y has a reference count of zero so we don't know if he's going to reach all or not so we will move it temporally to the list the second list which is the object that we maybe are going to destroy the scenes the same thing happens with set and then we visit the X dictionary and the X dictionary is very interesting because it's pointed to Y and said and notice that y&z are already in the second list which is all maybe are going to destroy them but because they are being point by an object which is green we know that those we cannot destroy those so we need to move them back to the first list and we move them green because we know that certainly those are reachable then we go to the dictionary of Y and it has zero reference counts and we move it to the second list the scenes seen the same thing happens with the dictionary of said a as well B as well and the dictionary of a and the dictionary of E so we are basically moving all of these objects to second list because they have zero reference count and notice an interesting thing which is that every time we have one of these red arrows all the objects to the left are green this is an invariant of the algorithm and it's very important when you are implementing it because if this is not true you have a back and then with L has to no reference can we move it as well to a second list and then we need to visit Y notice that Y is visit twice like the first time we visited it has zero reference count so we move it to the second list but the second time we visited it was moved to the first list because someone else pointed to it and need we need to repeat the algorithm like which are the objects pointed by Y so actually is a dictionary of so we we we move it back from the second list to the first and then the same thing with the object point if I said is the dictionary offset also acts on the rest but we know that they are ritual so we don't need to do it and then we visit the dictionary of why but everything that is ritual from here we know this alive and the same thing for the dictionary say so only thing is this step basically we know that all the objects that are reachable from the outside so they are Marcos dream and the objects that are marked as Rea we know that they're isolated so this is objects that form cycles and they are not ritual from the outside and I'm repeating this because it's very important in some of the assumptions more assumptions one of the things that we're going to do afterwards and it's very important to bear in mind that nobody outside these objects that we have here can refer or see any of these objects so right now that we know the object that we need to destroy we simply destroy them right she's actually no because nothing in Python is actually deleting so there is this thing called reference weak reference so what are we reference so weak reference are basically a way to referring to objects that don't count towards the total sum of referring counting this is specially useful for not forming cycles in the first place but it has other usage especially for example if you are rapping C++ or other languages that have problems with domestic destruction and weak reference I'll special e tricky because we can reference have sometimes callbacks so basically the idea is that if you have many weak reference to a certain object and the object dies all the callbacks are executed basically it's a way of saying oh the object that I'm referring to have died so I need to do something else while I may be closing files or memory removing other cycles so these are very tricky because every time you put a function in when you are finalizing object in c python you have to be extremely extremely careful so in the in our case we know that these are the objects that should by the last pass so these are the two blue objects that we we know from before and list that contains itself so we have two kind of reference weak reference as we saw before so here the the one that has the little f of x are the ones that have callbacks and the other two so is the green ones right and the other two are weak references that do not have callbacks okay so for the one that do not have callback we can just remove them so the reference the weak reference are store actor in a linkedlist which is hidden from Python that lives inside the object so we just reach into the link a list and we unlink them destroy them so for the one I have call back we have to be very careful because it can be two possible situation one is that the the weak reference objects are actually outside the cycles remember that this site this weak reference cannot refer to the object inside the cycle so that's okay because they cannot mess with them but we have to be extremely careful with the second case the second case is one of the objects that we're going to destroy as part of the cycle happens to be a rough week reference to one of the others and this is especially complicated because the callback of the second one can actually mess with our object so for the first one I say we can just remove them because there is absolutely no risk they cannot reach the object in off cycle they cannot like create new references or restrict them or change the field so we just removed them for these ones there is a so the dangerous I saw a slight resurrection which is like the kolbar correctly increase the reference count again and we can bring it back to life so more later on this by the very sight trick which is that the garbage collector doesn't guarantee order of the extraction and under that assumption we can pretend that the reference count the weak reference re just happens to be before the object and we can't we just don't call it because we say oh the weak reference was actually destroyed before the object was destroyed so there's no reason to call it it's a trick but we can do it because we have this guarantees that I took before okay so we have call we have called all the weak reference so we can forget about those now let's call finalizes which is the second problem so finalizes are especially tricky as well why because listen my in this situation so we have a and then we have been they point to each other so the finalizar remembers that are the dell methods that you can implement in python inside is called it's a function pointer called TP finalized in Python 2.7 or well at some point it was a legacy one called TP Dell and I was linked to also to the well function but as we will see later that legacy has been replaced with a better system the problem is the following like we have these two objects and we need to destroy them and then we need to call the final Isis well we have bronan because which which finalize we need to call first right imagine for example of the final race of a need something in the final set of B imagine for example that a is a database and B is a connection or something and when you closed it when you are going to destroy the database you want to make sure that the connection is closed or something like that right which is not the best way of doing it as we saw before but imagine that is a possibility also what happens if the finalization of a calls the financial be for whatever reason so we can we don't know the order and then it's another problem which is for example this one which is resurrection in my name for example we have a global body roll call X we initialize it to none and then we have this class that we plan the dunder del method will print something to know that it was called and then basically when the valve method is called we put the object in the global variable so basically this is resurrecting the class the object is going to be destroyed but just happens that the del method points the object in the global variable increasing the reference count and putting it back to life so in this case when you create one of these objects and destroy the only name that points to it basically the the print time the print function is called and we see that the X variable instead of known now is our object so this is especially tricky because this can mess up with so many things it's actually so tricky that pythons you decided you know if there is one of these and it comes in a cycle and literally not going to do anything and indeed we can actually check that this is true so this is Python 2 or more known as legacy Python so we have this class null we implement this del method we print something to know that it was called is the better the bug in strategy and then weekly one of these two nodes weekly the cycles point in A to B and B to a and then we destroy both names and force a collection in this case we are forcing a collection there basically calling d c dot collect which is a function which is available in the GC module the so basically this this runs the algorithm that we saw before so after that we saw that these two objects are actually not destroyed they are in this special this is special list called GC garbage so there and basically it means that the never destroy there they're going to be there for the remaining of your program and this is more or less a link so in Python 3 we have per for for 2 why I'm 20 true and this adds these fixes this and as safe on deterministic object finalization so this takes into account all the things that are very very complicated like resurrection it takes into account also something very important that is that every day well the structure is finalized but they will finalize it is only called once so basically if the object is resurrected and then destroyed again the del method is not call again and also it takes into account safety and correctness so if we repeat the same experiment again so weakly the null quickly children with with the cycle with force a collection we know now with orange ribbon the bug in strategy does we see that both prints have been called therefore the del has been made and you can actually check that amb are no more so how this works so inside the object that is this function pointer called TP finalize the TP finalize is basically pointing to your del function so these are the objects that we have before from the algorithm and it is the following so for every object let's let's start for the the one which is the first day so we call the finalizer if it's present if it's not present we don't care if it's present we call it here we're going to represent that with this green circle so that means that is called then inside the object we mark that we have called the finalizer so we are going to represent that with the stick this is very important because this means that the object needs to every object needs to know is the finalizer has been called and this is something that we need to store inside the object and this is true for every object which is tracked by the garbage collector which means that absolutely all the object needs to have an extra fill inside somehow therefore increasing the memory consumption of all your Python program it might not all your Python object will have an extra little field but when you add all dem all together it can be an important increase so when when you do this kind of thing and when you modify the Python object extracted you have to be extra careful not to add like arbitrary stuff because if you do this in basically you're going to increase the memory consumption of all the programs that use Python so then you have to also be very careful because another thing that can happen is that when you call the finalizer of the first object it actually go and destroy the lattice I'm old some of them so for example the finalizar of the first object can go and destroy this the last one and this can happen so in this part of the algorithm you have to be extremely tricky because when you have the collection of the objects that you are iterating over they can start appear and disappearing and then your algorithm need to be resilient of this effect this is one of the parts of the garbage collector code which is full of comments explain how crazy this is and how careful this needs to be but it works sort of already works it's okay so then we we know that this can happen and in the second time we call this the other object finalization and we mark it as finalize and this little mark is the one that we need to inspect the next time we try to call the finalizar because if the mark is there we know that the method was called before so we don't need to do we don't need to destroy it so you may think ok this is ok but what happen if the first object is actually resurrected as we saw before so if the of the is resurrected it means that we need to abort all the collection it means that that object cannot be actually destroyed because it has been resurrected but not only that all the other objects that have been referred by that one need to be also resurrected as well because you cannot leave the object in an inconsistent state so basically your ward the collection and put all the objects back but that's basically all the possibilities that can happen you just need to be very careful with where you put things and that your data structures are very consistent so if you check the call there is actually a lot of assertions around this and the algorithm checks all the time that all the collection that is using is consistent and the reference concept we can see at this stage and that the different objects that happens to be there are not never deleted ok so the last step in the algorithm is cleaning the reference cycles so basically the idea is that we are left with this object and this object we know that we already know that we have called all the finalize errs there are no weak reference to these objects and is isolated from the outside so the to refer the reference count that is two of the half are only maintained by the refer the internal reference of both objects so you think about this it means that if some we destroy these arrows the object reference count will fall to zero and they will be naturally destroyed so that's the work of an extra function C function inside the object which is called TP clear so you in Python do need to implement this thing I'll worry about that is I implanted for you automatically in the object the object object but in C if you're implementing sign extension module or some custom class or even even in seitan you may want to implant typically usually we recommend that you always implant typically R so the idea is that even if you know if you are diving you need to do it or not do you need to and idea that this function the only job of this function is remove this internal arrows so for example in my eye simple case the least double contains itself so for the list that contains it sells this function just this deletes all the elements in the list so it calls least not clear for a dictionary it basically empties the dictionary for a set it empties the cell and so on for an object it basically also can set all the at least to none and there is that after calling this function there will be no internal reference so we will be in this situation and the reference panel has fall to zero so we know that we can literally destroy the object and this is and at this stage the thing that is going to be called right now is the actual destructor reactor destructor is a function pointer inside the object called TP dialogue that you don't see in Python but what your implant and extension modules you can implement it in case you want to clean up also some other things like C objects or extra things that your extension balloon is to care about another point we have done our work and this o vehicle dead are we just finished so this is the end of the algorithm ending of the RAM but there is more so the interesting part now is to talk about some optimizations on top of that because if you think about this basically this algorithm runs over all the objects that are tracked by the collector which is like the majority of them so there's some optimization that we haven't talked one of these is the generations so the idea is that the generational garbage collector is a optimization on top of this that is based on idea the objects that remain reachable after a run of the garbage collector so the green ones in the algorithm that we saw before will also remain reachable in future runs because the older the objects are the more likely to be used by the program and this idea is basically also in with the idea that in Python there is a lot of scenarios when you create a lot of objects very fast very little and very short leaf so the idea is that obviously will be created very very fast and will be destroyed very very fast as well so if an object has to drive a lot of rounds of the garbage collector we probably need to run over it less and less because it's very likely to survive so area that we have three collections which we call generations see python has three by default but you can compile C Python with more if you want the first engine generation which is in this by zero is the jungle generation then you have the other ones and the third one is the older and idea that object that should buy a pass in the generation are promoted to the next one basically they are moved and objects in order generation as I said basically have survived more and more collection passes so they so if I had objects in the older generation the Obi is very likely to be around for a long time or at least has to drive a lot those passes of the GC so let's see this inspect this thing in a fickle way so the the generation in the bottom is the younger generation then we have another and the other one so think that there is some errors on the top which I don't if so is there there these arrows here so these two errors can be there can be objects that are eternal so some object live in the actual data segment of your program and these subjects are never the allocated so it may be some arrows also from the outside from objects that will never do the allocated until Python finishes and those those arrows can be also taken into account so let's do this thing so we start with the younger generation we collect so we run the algorithm that we talked about and we after a finishing we know that the objects in grey are those that we need to destroy and the objects that remain blue in the bottom box at the urges that we need to promote so we do that so we destroy the objects in cycles and then we'll promote the rest to the second generation so some time passes and there's our new collection in the second generation so we run the algorithm again cleaning you know weak reference calling finalizer I said I said wrap and then after that we know that all these other objects need to be destroyed because they are not use anymore notice that this means in this generation diagrams that there are no errors from the original to the lower generations right so we see that the only object that is blue in the middle is the only object that has an arrow from the upper generation so after this run we just promote this little geyser and it's promoted then we have the object in the latest generation the latest generation is also a Tweaker it also suffers garbage collection brands in this case I don't I'm not going to show it here but also when the latest generation is run there is an extra a lot of extra steps that are done for example the free list the internal fleece free lists are clear so freely czar is basically optimization because when you create certain objects for example floats or doubles or objects that are very likely to appear when you destroy them instead of the strain then you Marcus that's and use and if you want to create a new one instead of creating a new one you use this the corpse of the other one to put new things in so this basically makes that is much faster to allocate them but when you run in the latest generation all of the all of the freely certainly some of the freely start here so that you may think okay so one when you run the algorithm in a generation so do you basically run the algorithm imagination what we call a pass of the garbage collector when the regeneration reach a certain threshold this means that the generation has a certain number of objects inside and you can actually inspect these numbers so for example you can import the GC module which is not for garbage collector and then you can call this function called get threshold and it will return a double in this case there are three elements in the table is seven hundred ten and ten those are the default and these are the number of objects that need to be in the generation to start Riggin in a collection in this case if you have compiled Python with more generation the tupple will be bigger you can actually call a function called get set set threshold and pass and numbers with n are the numbers of generation if you want to change this value this is an interesting trick as Victor Stenger here will is we'll know very well because if you have bags that happen in the garbage collector and you when you want to trigger more more collection very fast so put in long numbers in this threshold triggers more collection and it will trigger your bag more likely so you can reproduce it this is specially useful is debug arrays conditions so in this case we set up to 1050 and 20 and we call again get first hole we see that the values are changed so you can use it if you want if you have you have a knowledge an external knowledge of how often these generations are filled and you have a better strategy or you want to delay or make more often wrong and garbage collector runs so you can force also a generation a collection by hand if you want so there is this function called GC collect and it adds it receives a parameter which is the generation so you can make a run on a particular generation if you want you don't provide any parameter it will run on the latest generation and it will promote everything up you can actually also inspect all the objects that are tracked by the garbage collector this is sort of giving me of the objects that are alive in the Python program of course this is not all the objects before because all the objects that are not tracked by the GC will not appear here but this is still a bunch of them so the function that you can call is called get GC objects so what like this do you call these objects here and then you get this this list here which is very big so I don't I have not completed and since Python 3.8 you can actually now specify a generation so you can say Oh actually give me all the objects that are in a Python and a particular generation and this is especially interesting if you want to improve the GC or do some experiments of how your program is using the GC or how many objects are in your generations okay so let's talk about the gill briefly so the gill as you may know is a mutex it's actually just a combination of a mutex a condition body one on boolean you basically the idea is that one of the important things that it does that it protects s to Python objects so basically when you have multiple threads it protects them to modify the reference count because this this when you when you can have a Python object or do you refer to it basically you need to increase and decrease the reference count of the objects and this is just an integral so you have multiple threads accessing the same value this is a race condition and because this is the Python runtime doing this wrongly can create real problems so as you know the ill basically makes that your threads not happy but you can use multiprocessing instead so the way that one of the main reasons to have the gill is that it protects the reference count thus one of the main purposes but because we need basically initiative when we place them under chrism if you go to act all Python documentation you can see actually that is explained there so it's placed for example the case of a multi-threaded program when two three single time you increase the reference count of the same object and therefore you can have objects that will be leaked this is never destroyed all objects that will be destroyed before they should and other features of the language and extension will have grow to the pain on the guarantees of the guild but memory management becomes basically the main reason that we have this theme and this is the reason that removing the Gili so hire one of them because basically if you want to remove the guild you need to get rid of reference counting that's that's it so we do basically need a new garbage collector or some modify garbage collector that will basically do the job that the reference cone is doing and this is really really hard for many reasons so the garbage collector is also atomic it means that there is first is only one so it's global there is no one per thread one consequence is that it will be stopped all your threads from running we know that because of the yield there is not two threads running Python in parallel but it means that if you inspect the times that the first take one of them will take a bit longer because he's running the collection another interesting Finnish copy-on-write so what is copy-on-write so copy-on-write is an optimization in some platforms so basically is used for fork processes and the like and it is that if you have a process and you fork it this means that you have a copy of itself and the reason optimization that says well you're not going to modify the the memory because both processes will have the same memory so you're not going to modify it we can share it so only when you actually modify some value of the memory it will trigger a page fault inside the operative system and then it will make a copy this means that if you have four or five or six copies of your program you don't need four or five of six times the same memory but it is a problem and the price that reference counting because just looking at a Python object increase this number just by looking at it and because it increased the number is right into memory and ready to memory copies an entire page and this is a lot of memory and what is worse running the garbage collector increases and looks at the reference count of all the objects that are tracked by the GC and this means that all the objects although all the pages in the memory layout were this object we'll be copied which basically we'll make that is used often then processes you need ten times the same memory and discipline as you may know this is the parental Eastern phase and this a lot of interesting talks about this and they tried to disable the garbage collector and they have you do making odd is in memory going to grow indefinitely so actually not really there is still problems so they actually made some some new adjustments in the future but it is that reference count is still the primary mechanism for this so you can do really still have a lot of variables getting destroy because of it after that they actually continue back to see Python and implanted DC freeze so DC freezes a function that basically freezes all the objects tracked by the GC and you can use it in combination with fork this is basically what is in documentation so you can't read it there and you can use it in Commission to fork to basically make it a bit easier in this situation and don't suffer that much for this copyright violation semantics once you want to restart the GC overall the subject you can call this here unfreeze so let's go to some interesting practical applications of this to basically wrap up so one interesting thing is the back in reference cycles so you may be interested in saying okay I have my program right and I basically want to know which objects become parallel cycles because one of the things of cycles is that is very difficult to know when they happen because it's not only your immediate objects they will have internal structures inside and they will form cycles of themselves and they are really really tricky to find so one thing you can do is for example let's let's do this situation so we have our class node there as before so we create two of them we point one to each other and then in the GC module there is this barrier here called set debug and then you can place a flag called the back say ball so when you cut past off that flag what is going to do the disease that instead of a stranger objects that is going to put it into this special variable in the GC module called DC garbage so in this case is you follow the code sorry for the back background if you delete the names and force a collection you can inspect the DC garbage list there and we see that our two node objects at there so the idea that you set this variable then you run your program it and then you inspect the contest on this GC garbage and then basic you have an idea of which objects are becoming part of cycles so you may want to now use maybe weak references or I don't know look at your program to improve it or something like that so this is an interesting theme another interesting thing is for example calculating memory costs so this is like a simple stuff and it is that the GC every object for the object for the GC algorithm to work needs to know how to reach all the objects that contains so for example a list needs to know how to reach all the objects that are in the list a dictionary needs to know how to reach the key on the values a class all the attributes etc etc basically an object need to know how to recourse itself and you can use this theme so for example if you have this list here in the top so it's a list that has multiple lists inside and you may be tempted to calculate the total size of the list and then you can say okay I will call C get size of in X and it reports 90 96 bytes and then the point is that that's the does not the like the recursive size of the list it's only the size of the list plus the size of the pointers because you'll get this other list here with that one which has the same number of elements but they are less because they are not least by themselves the size is the same but you can use this function in the GC model called get reference so this value this function here will automatically know how to reach all the objects that are inside and then you can call this function recursively and then calling get size of within every element of the list actually there are dictionaries as well so you can actually know the total side in this case I'm not Cohen recursively I'm just calling it in this first level but you can use it as a genetic way of reaching all the objects in a generic one so the last thing that I want to talk about is an interesting thing that happens which is exception blocks so let's listen my in the following scenario right so we have a for loop there so we have 4x in range something then whatever and the thing is that the variable X that we have in the for X in range you can actually inspect it after the loop is done so in this case range 10 actually the last element is 9 so you can actually print x over there and you can print as well the last value of X is 9 this also happens with ex-manager so for example you can say we'd open something as file and then you print the very old file and the value and the value of files is still there so basically these variables outlive the context money right so this is this unknown thing well if you do exception management here so you have this try block so you do try to divide by 0 to create an exception and if you do accept exception as e and then you print e so print it works but you try to expect the value of e after the except block is done the value of is no more so basically the variable is not accessible after the except block is done and this is an interesting reason for it the interesting reason is that the variable e contains exception but the exception contains the frame object inside and that contains the variables in the frame which contains the section which contains Swiss a cycle this means that if we don't do this thing all the exception that happened and this has to be also take a thought like when you record up and up and have the exception chain will form cycles in my name for example that you have a gigantic numpy array on a wire another function and then you raise an exception if the numpy array is 1 gigabyte x if we don't do this thing it will leak one gigabyte until the garbage collector is run so it's not a leak technically because the garbage collector will end cleaning it so but the price that it will delay distraction until there is a pass and it just happens that your program never creates more objects there will be no runs so you basically have to maintain that gigabyte of nonpublic alive forever it's not important it's just an interesting thing to happen any might change in the future but they this gives you an idea how tricky garbage collection is how tricky reference cycles happen and they can appear in the most unexpected places so if yeah that you need to take out is that we have taken care of this we have put a lot of effort into making it correct and it's really really tricky there is a lot of pieces involved we also collaborate with other other open source projects like siphon to try to find backs related to this and solving as well but you have to sometimes or sometimes be aware that there is something called the garbage collector and what is for and when it runs and is related to the Gil and that is very key in case you want to move it in the future so this is the other talk I hope you have enjoyed it and we have now to have some questions okay we have room for about three minutes of questions so about three questions could you provide any like example use cases for when you might want to change the number of generations in the generational collector so real ones and we really mean real applications I don't have any that can come to my mind but one of them is the one that I mentioned before you have a bridge condition that only happens in the GC and this is by likely 4c extensions because sometimes that is memory that is corrupted and it just happens that the GC when is inspecting the object crashes so you want to reproduce it because it's a race condition low in the number instead of forcing new collections it's a good way of making it appear more often for real applications is more like a try to see what happens and look at the memory consumption or at least I don't have anybody show views maybe Instagram actually knows something else because they have play with it a bit more Thank You hello thanks for the talk in some of your code examples you were using Dell with two variables at the same time I didn't know that was possible and I'm a little surprised that Dell isn't actually trying to dereference or decrease the reference count of the tupple that goes into it can you explain how that works so he's actually part of the grandma she's not really creating at Apple in the middle it just goes to every variable in that particular list it's the same thing as well it's not exactly the same but it's the same wouldn't you do I'm packing unpacking it's not really tuples sometimes getting creative you just have part programmer you can actually see it so in the documentation there is a nice explanation of how actually Dell works with multiple arguments you don't need a parenthesis for example and you can also check the grammar file in the C Python source code so using grammar / grammar txt I think and you can check actually how is the exact way of or which you can put it there yeah thanks cool thank you hi are you first-time listener longtime caller so I was curious when you have a reference cycle and then you're calling TV finalized and then something in that cycle resurrects the whole cycle what prevents TV finalized from being called more than once I'm one of the things that was deleted first so the algorithm first goes to be finalized on everything and then he checks it of psychology still isolated and then if it's not it reflects everything when everything that is reachable so basically first it runs all the finalized errs in the family leaf is the resurrection or not and only after that it actually checks if the cycle is is still isolated and then it checks it runs checks for getting to know which kind of things need to be resurrect and which not but yeah so the final answers can be called more than once no no no no so it calls all the finalizar in mark DeMoss as called and then it runs the check for the isolation yeah but if they get if the whole cycle gets resurrected and then deleted again later but because the mark is there do you know when you try to call the final I sure that you don't need to because it's mark has already finalized thank you [Applause]
Info
Channel: PyCon 2019
Views: 2,431
Rating: 5 out of 5
Keywords: Pablo Galindo Salgado, pycon, python, coding, tutorial
Id: CLW5Lyc1FN8
Channel Id: undefined
Length: 46min 11sec (2771 seconds)
Published: Sun May 05 2019
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.