.NET Core Garbage Collection

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Captions

We've been talking a lot about memory allocation recently, but one thing we haven't really looked at is how the garbage collector actually works. Now from a programming point of view, it's not something you need to know too much about in detail, but it can be quite interesting - and having some understanding of it can help us avoid a few pitfalls that mean we're not dealing with memory very efficiently. So let's take a look at it, and what I've got here is a class called 'Person' and a Person has a 'Name' and then a Person has two children - both also Persons - 'ChildOne' and 'ChildTwo'. Now in reality of course, you wouldn't do it like this. You would have some kind of List or something like that to store the children. But that would just complicate matters because it means the List itself would need allocation on the heap, so all I've done is hard code this so we can have up to two children. And then what I've also got is this finalizer. Now we'll talk about finalizers in more detail in the next video, but for now just bear in mind that a finalizer is a method that gets called just at the moment when an object is being garbage collected. So all we're using it for here is just so we can keep track of what's going on. It's not a particularly normal use of finalizer - it's just there to keep track. So that's our Person class, and then in our Program what I do is in the main program, I call this function called 'Run' and then just print out where we are, and then I do this 'GC.Collect()' - so that is how we can from the program instruct the garbage collector to do a collection. Now all the advice is you really should never do this for yourself. The garbage collector itself knows much better when it's a suitable time to do a garbage collection - basically when the runtime system is starting to run short of memory. But from a programmer's point of view, we can't really have any idea because we're doing the programming a long time in advance of when the program is actually going to run; it's going to be running on a different machine. How can we know? So in normal programming, you just leave it to the system to work it out, but for now I want to put this in, so we can see precisely what happens at each stage. And then also just for completeness I've got to put in that 'GC.WaitForPendingFinalizers()' that just means the program won't terminate before the finalizers have been called. So we call 'Run', and in 'Run' I create a Person called 'Wilma' who has one child which is a Person called 'Pebbles'. I then pass that into this 'ShortLives' and then another message to tell us where we are, and another garbage collect, and then in 'ShortLives' I create another Person called 'Fred' who has a child called 'Bamm-Bamm'. Now I know for Flintstones fans that's not correct that, Bamm-Bamm is not Fred and Wilma's child, but for the purposes of memory management, that's what I'm going to do. And then I'm going to take Fred's ChildOne - Bamm-Bamm - and make that Wilma's ChildTwo, okay? We'll then come out of 'ShortLives' and we'll garbage collect, and we'll then come out of 'Run' itself and do another garbage collect. And so if we run that up we can see that as we leave 'ShortLives' and do the collection, then we simply collect Fred. And then as we leave 'Run', we collect the remaining objects, Bamm-Bamm and Pebbles and Wilma. So let's think about what's happening there. So if we put a break just at the end of 'ShortLives', so what we've got there is that the local variable 'fred' is a reference to the object 'Fred' and that has a reference to the object 'Bamm-Bamm'. And then we've also got the local variable 'parent' and the local variable 'wilma' - both of which refer to the person 'Wilma' who has two children, 'Bamm-Bamm' and 'Pebbles' If we then step on at that point, everything that is local in 'ShortLives' has gone out of scope. So 'fred' - the stack variable - is now out of scope, then we do our garbage collect. Now, the very first phase of garbage collection is what is known as 'marking'. And what the system does when we do that 'GC.Collect()' it looks for all of the local stack variables in our entire system - so in this case the only one we've got actually is going to be 'wilma' because we saw that 'fred' has gone out of scope. But the garbage collection system looks for all of the stack variables and also any static variables, which will be treated in the same way. We haven't got any here but stack variables and static variables. And so what the garbage collect will do, it will start with 'wilma' - the reference on the stack - and it will follow that to the object 'Wilma' on the heap. And then once it's reached the object 'Wilma' on the heap it's determined that that object is what is known as 'reachable' - that's to say there is a path from the stack or from a static variable to 'Wilma' - and therefore it marks 'Wilma' as reachable; effectively gives it a tick to say, this thing should not be garbage collected. Having done that, it then goes to Wilma's children. So it'll go to ChildOne, see that refers to a person 'Pebbles', therefore Pebbles is reachable, therefore Pebbles gets a tick that it should not be garbage collected. If Pebbles had any further children - so if Pebbles' children had references - it would follow the path right down, but there aren't any here. We then move on to Wilma's ChildTwo, which is a reference to Bamm-Bamm. So Bamm-Bamm gets a tick to say that Bamm-Bamm is still alive and so we can see when we look at those ticks that Wilma and everything connected to Wilma all have the ticks to say that they're still alive, whereas Fred - because Fred's lost the original reference on the stack - doesn't get a tick, and so once we've gone through that phase of marking, then we go through the phase of garbage collection, where anything that's been marked stays alive and anything that's not been marked - Fred in this case - is garbage collected. Which is why at that point we see that we're getting the message 'Collecting Fred'. Then let's carry on. And so when we leave 'Run', that means that the local variable 'wilma' disappears, but we've still got the 'Wilma' object and the 'Pebbles' object and the 'Bamm-Bamm' object on the heap, but now when we do the garbage collect, because Wilma doesn't have a reference from the stack Wilma, doesn't get marked, and neither do Pebbles and Bamm-Bamm. So they're all available for garbage collection, and therefore when we do the garbage collection we can see that we get the messages out 'Collecting Bamm-Bamm', 'Collecting Pebbles', 'Collecting Wilma'. And so we can see that's the mechanism by which it determines whether objects are alive. It's relatively efficient - you can imagine, obviously, we could have had a situation where, say, Bamm-Bamm was accessible through more than one reference. Well as soon as it's got the tick then any other references that reach it don't bother to carry on processing, because they know that that particular object and its descendants - anything it references - must also already be marked. So it only has to go through every object once, and then it can see they've been marked. Even so, if the system just worked like that it wouldn't be particularly efficient, and so in fact the garbage collection system in .NET is a little bit more complicated than that. It's actually what is known as a three generational garbage collection system. So actually we really in .NET have three heaps, which are known as the Generation 0 heap, the generation 1 heap and the Generation 2 heap. There's also - just for complexity - a thing called the 'Large Object Heap' which is where any objects greater than 85 K bytes get allocated; talk about that a bit more afterwards. But let's just stick to the main three to start with, and when objects are allocated, so when we had our Wilma and so forth allocated, they get allocated on the Generation 0 heap. And allocation is very efficient because actually, just like a stack, we have a heap pointer. And so each new allocated object is allocated at the bottom of the heap, and then the heat pointer moves up to show the next available slot, and so on and so forth. So there's no searching around for a spare gap of free memory on the heap like you get with some other systems - things like C++ and C - it just always piles them up like a stack. Then when the 0 Generation heap starts to get close to full - so when the garbage collector decides to kick in - it does this process of marking. So some of those objects will be ticked, and then after it's marked them, rather than having to explicitly delete the unmarked objects it simply moves the marked objects over from the Generation 0 heap to the Generation 1 here. And then anything that's left on the Generation 0 heap must be unmarked - must be unreachable. But now to tidy it up, it doesn't have to do anything much at all. It can just pop the heap pointer back down to the bottom of the heap, and therefore as the heap builds up again those collected objects will just be stomped over, just like happens on the stack. So that's really, really efficient. It means that the allocation always happens at the location pointed to by the heap pointer. We don't get the problem that they get on these other systems like C++ of heap fragmentation, where you could have a heap that overall has plenty of memory, but because of the ongoing process of allocation and deletion, all that memory is in very small fragments. And if you want to allocate a reasonably large object, although there's enough room in total, there isn't enough room at any one space. The reason it's done this way is that any objects that have made it from the Generation 0 heap onto the Generation 1 heap have proved themselves to have a reasonably long lifetime. So things that are very short-lived are just going to be on the Generation 0 heap for a short time, then they'll get garbage collected and go away. Again, anything that's already made it to the Generation 1 heap has proved itself and therefore is likely to stay around for longer. And the Generation 1 heap is collected much less frequently than the Generation 0 heap, and that's where we get our performance gain, that we're not having to garbage collect everything - we're only having garbage collect, for the most part, what's in the Generation 0 heap. However, it will sometimes happen that the Generation 1 heap starts to get full, in which case we do exactly the same process, except this time the surviving objects are moved over to the Generation 2 heap. And if they make it that far, they really have proved themselves and they are likely to be the kinds of objects that are around for the entire lifetime of the program, and therefore very unlikely to need garbage collecting at all. Obviously it can still happen, the Generation 2 heap may start to get full, and so we have to do a garbage collection on that. And that has to work slightly differently, because once we've marked up the surviving objects, we can't move them over to the next generation heap because there isn't one. And so what actually has to be done there is compacting the heap. So once the unreachable objects have been removed, everything else is compressed down so it all sits at the bottom of the heap, which is a slightly slower process, but it only happens rarely because we don't get much garbage collection on the third generation heap. And so that's the approach that we have objects on the zero generation heap are short-lived, if they're not short-lived on the first garbage collection they'll get transferred to the first generation heap. There they're much likelier to live for longer, but even then the really long lived ones will end up on the second generation heap and will be around probably for the whole lifetime of the program. I mentioned we also have this other thing called the Large Object Heap that is for large objects - things over 85K. And it really behaves on its own, but it behaves like a second generation heap. So when it gets garbage collected, it's got nowhere to transfer the surviving objects to, so it always has to compact them. And that's just worth bearing in mind slightly - what that means is the presumption is that large objects are also long-lived objects, because they behave like second generation objects. And that means just be a bit careful if you are allocating large objects. Make sure in terms of your program's structure, that they are going to be around for a long time. Typically when we talk about large objects, they're generally going to be collections like Lists or arrays. It's extremely rare that you'd have a large single object as big as 85K. So that's essentially how garbage collection works. A few other things to mention about it. One is to do with threading. The garbage collection runs in a separate thread from your program, so although when the garbage collector kicks in it might slow things down a little bit, it's not actually going to cause a complete halt while it does its work. There's one slight bit of configuration that's done for you automatically on that - but which you may want to change in certain circumstances - and that's to say there's a difference if you're running on a workstation - so a desktop computer, a laptop or something like that - or if you're running on a server. And the difference is on a server, the garbage collection thread is given a higher priority than the thread that your program's actually running in, which means it will have a slight tendency to interrupt your program, but the assumption is that a server has powerful enough resources that that's not going to be an issue. Whereas on a workstation, the thread priority is set to be the same as the program you're garbage collecting, and so neither of them will completely elbow out the other one from accessing the processor. You can change that - you can configure it - but it's pretty rare. Normally, the automatic setting works okay. The one thing we haven't really mentioned though, is how finalizers fit into all this. So I had a finalizer in there so we could see what was happening, but where exactly in the process they get called is slightly more complicated. And actually this is the area where you can make the biggest mistakes, because finalizers can really slow down the garbage collection process. So that's what we're going to be talking about in the next video. But for now, hope you enjoyed that. If you did, click the 'like' button. Do subscribe and I'll see you next time.

Info

Channel: Coding Tutorials

Views: 1,731

Rating: undefined out of 5

Keywords: garbage collection, .net core, finalizers, coding, tutorial, performance

Id: BeuNvhd1L_g

Channel Id: undefined

Length: 14min 54sec (894 seconds)

Published: Fri Mar 19 2021