We've been talking a lot about memory allocation
recently, but one thing we haven't really looked at is how the garbage collector actually
works. Now from a programming point of view, it's not something you need to know too
much about in detail, but it can be quite interesting - and having some understanding of
it can help us avoid a few pitfalls that mean we're not dealing with memory very efficiently.
So let's take a look at it, and what I've got here is a class called 'Person' and a Person has a
'Name' and then a Person has two children - both also Persons - 'ChildOne' and 'ChildTwo'. Now in
reality of course, you wouldn't do it like this. You would have some kind of List or something
like that to store the children. But that would just complicate matters because it means the List
itself would need allocation on the heap, so all I've done is hard code this so we can have up to
two children. And then what I've also got is this finalizer. Now we'll talk about finalizers
in more detail in the next video, but for now just bear in mind that a
finalizer is a method that gets called just at the moment when an object is being
garbage collected. So all we're using it for here is just so we can keep track of what's going
on. It's not a particularly normal use of finalizer - it's just there to keep track. So
that's our Person class, and then in our Program what I do is in the main program, I call this
function called 'Run' and then just print out where we are, and then I do this 'GC.Collect()'
- so that is how we can from the program instruct the garbage collector to do a collection.
Now all the advice is you really should never do this for yourself. The garbage collector itself
knows much better when it's a suitable time to do a garbage collection - basically when the runtime
system is starting to run short of memory. But from a programmer's point of view, we can't really
have any idea because we're doing the programming a long time in advance of when the program is
actually going to run; it's going to be running on a different machine. How can we know? So in normal
programming, you just leave it to the system to work it out, but for now I want to put this in, so
we can see precisely what happens at each stage. And then also just for completeness I've got
to put in that 'GC.WaitForPendingFinalizers()' that just means the program won't terminate
before the finalizers have been called. So we call 'Run', and in 'Run' I create a Person called
'Wilma' who has one child which is a Person called 'Pebbles'. I then pass that into this 'ShortLives'
and then another message to tell us where we are, and another garbage collect, and then in
'ShortLives' I create another Person called 'Fred' who has a child called 'Bamm-Bamm'. Now I
know for Flintstones fans that's not correct that, Bamm-Bamm is not Fred and Wilma's child,
but for the purposes of memory management, that's what I'm going to do. And then I'm going
to take Fred's ChildOne - Bamm-Bamm - and make that Wilma's ChildTwo, okay? We'll then come
out of 'ShortLives' and we'll garbage collect, and we'll then come out of 'Run' itself and do
another garbage collect. And so if we run that up we can see that as we leave 'ShortLives' and do
the collection, then we simply collect Fred. And then as we leave 'Run', we collect the remaining
objects, Bamm-Bamm and Pebbles and Wilma. So let's think about what's happening there. So
if we put a break just at the end of 'ShortLives', so what we've got there is that the local variable
'fred' is a reference to the object 'Fred' and that has a reference to the object
'Bamm-Bamm'. And then we've also got the local variable 'parent' and the local
variable 'wilma' - both of which refer to the person 'Wilma' who has two children, 'Bamm-Bamm'
and 'Pebbles' If we then step on at that point, everything that is local in 'ShortLives' has gone
out of scope. So 'fred' - the stack variable - is now out of scope, then we do our garbage collect.
Now, the very first phase of garbage collection is what is known as 'marking'. And what the
system does when we do that 'GC.Collect()' it looks for all of the local stack variables
in our entire system - so in this case the only one we've got actually is going to be 'wilma'
because we saw that 'fred' has gone out of scope. But the garbage collection system looks
for all of the stack variables and also any static variables, which will be treated in
the same way. We haven't got any here but stack variables and static variables. And so what
the garbage collect will do, it will start with 'wilma' - the reference on the stack - and it will
follow that to the object 'Wilma' on the heap. And then once it's reached the object 'Wilma' on the
heap it's determined that that object is what is known as 'reachable' - that's to say there is
a path from the stack or from a static variable to 'Wilma' - and therefore it marks 'Wilma'
as reachable; effectively gives it a tick to say, this thing should not be garbage collected. Having
done that, it then goes to Wilma's children. So it'll go to ChildOne, see that refers to a
person 'Pebbles', therefore Pebbles is reachable, therefore Pebbles gets a tick that it should
not be garbage collected. If Pebbles had any further children - so if Pebbles' children had
references - it would follow the path right down, but there aren't any here. We then move on
to Wilma's ChildTwo, which is a reference to Bamm-Bamm. So Bamm-Bamm gets a tick to say
that Bamm-Bamm is still alive and so we can see when we look at those ticks that Wilma and
everything connected to Wilma all have the ticks to say that they're still alive, whereas Fred -
because Fred's lost the original reference on the stack - doesn't get a tick, and so once we've gone
through that phase of marking, then we go through the phase of garbage collection, where anything
that's been marked stays alive and anything that's not been marked - Fred in this case - is
garbage collected. Which is why at that point we see that we're getting the message
'Collecting Fred'. Then let's carry on. And so when we leave 'Run', that means
that the local variable 'wilma' disappears, but we've still got the 'Wilma' object and the
'Pebbles' object and the 'Bamm-Bamm' object on the heap, but now when we do the garbage collect,
because Wilma doesn't have a reference from the stack Wilma, doesn't get marked, and neither do
Pebbles and Bamm-Bamm. So they're all available for garbage collection, and therefore when
we do the garbage collection we can see that we get the messages out 'Collecting
Bamm-Bamm', 'Collecting Pebbles', 'Collecting Wilma'. And so we can see that's the mechanism
by which it determines whether objects are alive. It's relatively efficient - you can imagine,
obviously, we could have had a situation where, say, Bamm-Bamm was accessible through more than
one reference. Well as soon as it's got the tick then any other references that reach it don't
bother to carry on processing, because they know that that particular object and its descendants
- anything it references - must also already be marked. So it only has to go through every object
once, and then it can see they've been marked. Even so, if the system just worked like
that it wouldn't be particularly efficient, and so in fact the garbage collection system
in .NET is a little bit more complicated than that. It's actually what is known as a three
generational garbage collection system. So actually we really in .NET have three heaps, which
are known as the Generation 0 heap, the generation 1 heap and the Generation 2 heap. There's also -
just for complexity - a thing called the 'Large Object Heap' which is where any objects
greater than 85 K bytes get allocated; talk about that a bit more afterwards. But let's
just stick to the main three to start with, and when objects are allocated, so when we had our
Wilma and so forth allocated, they get allocated on the Generation 0 heap. And allocation is very
efficient because actually, just like a stack, we have a heap pointer. And so each new allocated
object is allocated at the bottom of the heap, and then the heat pointer moves up to show the
next available slot, and so on and so forth. So there's no searching around for a spare gap of
free memory on the heap like you get with some other systems - things like C++ and C - it just
always piles them up like a stack. Then when the 0 Generation heap starts to get close to full - so
when the garbage collector decides to kick in - it does this process of marking. So some of those
objects will be ticked, and then after it's marked them, rather than having to explicitly delete
the unmarked objects it simply moves the marked objects over from the Generation 0 heap to the
Generation 1 here. And then anything that's left on the Generation 0 heap must be unmarked - must
be unreachable. But now to tidy it up, it doesn't have to do anything much at all. It can just pop
the heap pointer back down to the bottom of the heap, and therefore as the heap builds up again
those collected objects will just be stomped over, just like happens on the stack. So that's really,
really efficient. It means that the allocation always happens at the location pointed to by the
heap pointer. We don't get the problem that they get on these other systems like C++ of heap
fragmentation, where you could have a heap that overall has plenty of memory, but because of
the ongoing process of allocation and deletion, all that memory is in very small fragments. And
if you want to allocate a reasonably large object, although there's enough room in total, there isn't
enough room at any one space. The reason it's done this way is that any objects that have made it
from the Generation 0 heap onto the Generation 1 heap have proved themselves to have a reasonably
long lifetime. So things that are very short-lived are just going to be on the Generation 0 heap for
a short time, then they'll get garbage collected and go away. Again, anything that's already made
it to the Generation 1 heap has proved itself and therefore is likely to stay around for longer.
And the Generation 1 heap is collected much less frequently than the Generation 0 heap, and that's
where we get our performance gain, that we're not having to garbage collect everything - we're
only having garbage collect, for the most part, what's in the Generation 0 heap. However, it will
sometimes happen that the Generation 1 heap starts to get full, in which case we do exactly
the same process, except this time the surviving objects are moved over to the Generation 2 heap.
And if they make it that far, they really have proved themselves and they are likely to be the
kinds of objects that are around for the entire lifetime of the program, and therefore very
unlikely to need garbage collecting at all. Obviously it can still happen, the Generation 2
heap may start to get full, and so we have to do a garbage collection on that. And that has to work
slightly differently, because once we've marked up the surviving objects, we can't move them
over to the next generation heap because there isn't one. And so what actually has to be
done there is compacting the heap. So once the unreachable objects have been removed, everything
else is compressed down so it all sits at the bottom of the heap, which is a slightly slower
process, but it only happens rarely because we don't get much garbage collection on the
third generation heap. And so that's the approach that we have objects on the zero generation heap
are short-lived, if they're not short-lived on the first garbage collection they'll get transferred
to the first generation heap. There they're much likelier to live for longer, but even then the
really long lived ones will end up on the second generation heap and will be around probably for
the whole lifetime of the program. I mentioned we also have this other thing called the Large Object Heap that
is for large objects - things over 85K. And it really behaves on its own, but it behaves like a
second generation heap. So when it gets garbage collected, it's got nowhere to transfer the
surviving objects to, so it always has to compact them. And that's just worth bearing in mind slightly -
what that means is the presumption is that large objects are also long-lived objects, because
they behave like second generation objects. And that means just be a bit careful if you are
allocating large objects. Make sure in terms of your program's structure, that they are going to
be around for a long time. Typically when we talk about large objects, they're generally going to be
collections like Lists or arrays. It's extremely rare that you'd have a large single object as
big as 85K. So that's essentially how garbage collection works. A few other things to mention
about it. One is to do with threading. The garbage collection runs in a separate thread from your
program, so although when the garbage collector kicks in it might slow things down a little bit,
it's not actually going to cause a complete halt while it does its work. There's one slight bit of
configuration that's done for you automatically on that - but which you may want to change in
certain circumstances - and that's to say there's a difference if you're running on a workstation -
so a desktop computer, a laptop or something like that - or if you're running on a server. And the
difference is on a server, the garbage collection thread is given a higher priority than the thread
that your program's actually running in, which means it will have a slight tendency to interrupt your
program, but the assumption is that a server has powerful enough resources that that's not
going to be an issue. Whereas on a workstation, the thread priority is set to be the same as the
program you're garbage collecting, and so neither of them will completely elbow out the other one
from accessing the processor. You can change that - you can configure it - but it's pretty
rare. Normally, the automatic setting works okay. The one thing we haven't really mentioned though,
is how finalizers fit into all this. So I had a finalizer in there so we could see what was
happening, but where exactly in the process they get called is slightly more complicated.
And actually this is the area where you can make the biggest mistakes, because finalizers can
really slow down the garbage collection process. So that's what we're going to be talking
about in the next video. But for now, hope you enjoyed that. If you did, click the 'like'
button. Do subscribe and I'll see you next time.