.NET Core Garbage Collection

Captions
We've been talking a lot about memory allocation recently, but one thing we haven't really looked at is how the garbage collector actually works. From a programming point of view, it's not something you need to know about in much detail. It is quite interesting, though, and some understanding of it can help us avoid a few pitfalls that mean we're not dealing with memory very efficiently.

So let's take a look at it. What I've got here is a class called 'Person'. A Person has a 'Name', and a Person has two children, both also Persons: 'ChildOne' and 'ChildTwo'. In reality, of course, you wouldn't do it like this; you would have some kind of List to store the children. But that would complicate matters, because the List itself would need allocating on the heap. So I've just hard-coded this so we can have up to two children.

I've also got this finalizer. We'll talk about finalizers in more detail in the next video. For now, just bear in mind that a finalizer is a method that gets called at the moment when an object is being garbage collected. All we're using it for here is to keep track of what's going on; it's not a particularly normal use of a finalizer.

So that's our Person class. Then in the main program, I call a function called 'Run', print out where we are, and then call 'GC.Collect()'. That is how we can instruct the garbage collector to do a collection from within the program. Now, all the advice is that you really should never do this yourself. The garbage collector knows much better when it's a suitable time to do a collection, basically when the space it allocates from is starting to run short.
But as programmers we can't really know when that will be. We're writing the program a long time before it actually runs, and it's going to be running on a different machine. So in normal programming you just leave it to the system to work out. Here, though, I want to put this in so we can see precisely what happens at each stage. And just for completeness, I've also put in 'GC.WaitForPendingFinalizers()'. That blocks until the pending finalizers have run, so the program won't terminate before the finalizers have been called.

So we call 'Run'. In 'Run' I create a Person called 'Wilma' who has one child, a Person called 'Pebbles'. I then pass Wilma into 'ShortLives', and after that there's another message to tell us where we are and another garbage collection. In 'ShortLives' I create another Person called 'Fred', who has a child called 'Bamm-Bamm'. (I know, Flintstones fans, that's not correct: Bamm-Bamm is not Fred's child. For the purposes of memory management, though, that's what I'm going to do.) Then I take Fred's ChildOne, Bamm-Bamm, and make him Wilma's ChildTwo. We then come out of 'ShortLives' and garbage collect, and then come out of 'Run' itself and do another garbage collection.

If we run that, we can see that as we leave 'ShortLives' and do the collection, we simply collect Fred. Then as we leave 'Run', we collect the remaining objects: Bamm-Bamm, Pebbles and Wilma.

So let's think about what's happening there. Suppose we put a breakpoint just at the end of 'ShortLives'. At that point the local variable 'fred' is a reference to the object 'Fred', and that object has a reference to the object 'Bamm-Bamm'.
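The transcript doesn't include the source code, so the following is only a sketch of what the demo program might look like, reconstructed from the description above. The member names follow the transcript. The printed messages and the extra 'GC.WaitForPendingFinalizers()' and 'GC.KeepAlive' calls in 'Run' are assumptions. 'GC.KeepAlive(wilma)' matters because in an optimized build the JIT may treat 'wilma' as dead after its last use, which could let Wilma be collected earlier than the video shows.

```csharp
#nullable enable
using System;

// Person as described in the video: a name plus up to two hard-coded children.
class Person
{
    public string Name { get; }
    public Person? ChildOne { get; set; }
    public Person? ChildTwo { get; set; }

    public Person(string name) => Name = name;

    // Finalizer used only to trace when the object is collected (not a typical use).
    ~Person() => Console.WriteLine($"Collecting {Name}");
}

static class Program
{
    static void Main()
    {
        Run();
        Console.WriteLine("Left Run");
        GC.Collect();                  // force a full collection (demo only; avoid in real code)
        GC.WaitForPendingFinalizers(); // block until queued finalizers have run
    }

    static void Run()
    {
        var wilma = new Person("Wilma") { ChildOne = new Person("Pebbles") };
        ShortLives(wilma);
        Console.WriteLine("Left ShortLives");
        GC.Collect();
        GC.WaitForPendingFinalizers(); // assumption: added so "Collecting Fred" prints at this point
        GC.KeepAlive(wilma);           // keep 'wilma' a live root up to here, even in Release builds
    }

    static void ShortLives(Person parent)
    {
        var fred = new Person("Fred") { ChildOne = new Person("Bamm-Bamm") };
        parent.ChildTwo = fred.ChildOne; // Bamm-Bamm is now reachable from Wilma too
    }
}
```

Run as written, this should print 'Collecting Fred' after 'Left ShortLives' and the other three 'Collecting' messages after 'Left Run'. The order in which finalizers run is not guaranteed, so those three may appear in any order.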
At that breakpoint we've also got the local variable 'parent' and the local variable 'wilma'. Both refer to the Person 'Wilma', who has two children, 'Bamm-Bamm' and 'Pebbles'. If we then step on, everything that is local to 'ShortLives' goes out of scope. So 'fred', the stack variable, is now out of scope, and then we do our garbage collection.

The very first phase of garbage collection is known as 'marking'. When we call 'GC.Collect()', the system looks for all of the local stack variables in our entire program. In this case the only one we've actually got is 'wilma', because we saw that 'fred' has gone out of scope. The garbage collector looks at all of the stack variables and also at any static variables, which are treated in the same way. We haven't got any statics here.

So the garbage collector starts with 'wilma', the reference on the stack, and follows it to the object 'Wilma' on the heap. Once it has reached that object, it has determined that 'Wilma' is 'reachable': there is a path from the stack or from a static variable to 'Wilma'. So it marks 'Wilma' as reachable. Effectively, it gives it a tick to say this thing should not be garbage collected.

Having done that, it goes on to Wilma's children. It goes to ChildOne and sees that it refers to the Person 'Pebbles'. Therefore Pebbles is reachable, and Pebbles gets a tick to say she should not be garbage collected. If Pebbles had any children of her own, it would follow their references right down, but there aren't any here. We then move on to Wilma's ChildTwo, which is a reference to Bamm-Bamm.
So Bamm-Bamm gets a tick to say that he is still alive. When we look at those ticks, we can see that Wilma and everything connected to Wilma have ticks to say they're still alive. Fred, on the other hand, has lost his original reference on the stack, so he doesn't get a tick. Once we've gone through the marking phase, we go through the collection phase. Anything that's been marked stays alive, and anything that hasn't been marked (Fred, in this case) is garbage collected. That's why at that point we get the message 'Collecting Fred'.

Then let's carry on. When we leave 'Run', the local variable 'wilma' disappears, but we've still got the 'Wilma', 'Pebbles' and 'Bamm-Bamm' objects on the heap. Now when we do the garbage collection, Wilma doesn't have a reference from the stack, so Wilma doesn't get marked, and neither do Pebbles and Bamm-Bamm. They're all available for garbage collection, and when we do the collection we get the messages 'Collecting Bamm-Bamm', 'Collecting Pebbles' and 'Collecting Wilma'. So that's the mechanism by which the garbage collector determines whether objects are alive.

It's relatively efficient. You can imagine a situation where, say, Bamm-Bamm was reachable through more than one reference. As soon as he has a tick, any other reference that reaches him doesn't need to carry on processing. The collector knows that that object and its descendants (anything it references) have already been taken care of. So it only has to process each object once.

Even so, if the system just worked like that, it wouldn't be particularly efficient, because every collection would have to trace through every live object in the program. In fact, the garbage collection system in .NET is a little more complicated than that.
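To make the marking idea concrete, here is a toy model of the 'tick' process in C#. It is not how the runtime is implemented. 'Node', 'Mark' and 'Sweep' are hypothetical names chosen for illustration: a node stands in for a heap object, and the roots stand in for the stack and static variables. The check on whether a node already has a tick is what guarantees each object is processed only once. As a side effect, it also stops reference cycles from looping forever.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a heap object: a name plus outgoing references.
class Node
{
    public string Name { get; }
    public List<Node> Refs { get; } = new List<Node>();
    public Node(string name) => Name = name;
}

static class MarkSweep
{
    // Mark phase: tick every object reachable from the roots (stack locals, statics).
    public static HashSet<Node> Mark(IEnumerable<Node> roots)
    {
        var marked = new HashSet<Node>();
        var pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            var node = pending.Pop();
            // Add returns false if the node already has a tick, so it is not traced again.
            if (!marked.Add(node)) continue;
            foreach (var child in node.Refs) pending.Push(child);
        }
        return marked;
    }

    // Sweep phase: anything on the heap without a tick is garbage.
    public static List<Node> Sweep(IEnumerable<Node> heap, HashSet<Node> marked)
    {
        var garbage = new List<Node>();
        foreach (var node in heap)
            if (!marked.Contains(node)) garbage.Add(node);
        return garbage;
    }

    static void Main()
    {
        var wilma = new Node("Wilma");
        var pebbles = new Node("Pebbles");
        var fred = new Node("Fred");
        var bammBamm = new Node("Bamm-Bamm");
        wilma.Refs.Add(pebbles);
        wilma.Refs.Add(bammBamm);
        fred.Refs.Add(bammBamm);
        var heap = new List<Node> { wilma, pebbles, fred, bammBamm };

        // After ShortLives returns, 'wilma' is the only root left.
        var garbage = Sweep(heap, Mark(new[] { wilma }));
        foreach (var dead in garbage) Console.WriteLine($"Collecting {dead.Name}"); // Collecting Fred
        heap.RemoveAll(garbage.Contains);

        // After Run returns, there are no roots at all.
        foreach (var dead in Sweep(heap, Mark(Array.Empty<Node>())))
            Console.WriteLine($"Collecting {dead.Name}"); // Wilma, Pebbles, Bamm-Bamm (heap order)
    }
}
```

With 'wilma' as the only root, the sweep finds just Fred. With no roots left, it finds the other three. In this model their order simply follows the list, whereas the real finalizer order is unspecified.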
The .NET garbage collector is actually what is known as a generational garbage collector, with three generations. So in effect we have three heaps in .NET, known as the Generation 0 heap, the Generation 1 heap and the Generation 2 heap. There's also, just for complexity, something called the 'Large Object Heap'. That is where any object of 85,000 bytes or more gets allocated; I'll talk about it a bit more afterwards.

But let's stick to the main three to start with. When objects are allocated (so when our Wilma and so forth were allocated), they get allocated on the Generation 0 heap. Allocation is very efficient because, just like a stack, we have a heap pointer. Each new object is placed at the heap pointer, and then the heap pointer moves up to the next available slot, and so on. So there's no searching around for a spare gap of free memory on the heap, as you get with some other systems such as C and C++. It just piles objects up like a stack.

Then the Generation 0 heap starts to get close to full, and the garbage collector decides to kick in and does this process of marking, so some of those objects get ticked. After it has marked them, it doesn't have to explicitly delete the unmarked objects. Instead, it simply moves the marked objects over from the Generation 0 heap to the Generation 1 heap. Anything that's left on the Generation 0 heap must be unmarked, that is, unreachable. Now, to tidy it up, it doesn't have to do much at all. It can just pop the heap pointer back down to the bottom of the heap, and as the heap builds up again those collected objects will simply be stomped over, exactly as happens on the stack. That's really efficient. It means allocation always happens at the location pointed to by the heap pointer.
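The bump-pointer allocation described above can be sketched as a toy class. 'Gen0Region' and its members are hypothetical and sizes are abstract units. 'Reset' stands in for the whole "mark, move the survivors to Generation 1, reset the pointer" step. The real runtime allocator is considerably more involved.

```csharp
using System;

// Hypothetical toy model of the Generation 0 heap: a fixed-size region with a bump pointer.
class Gen0Region
{
    public int Capacity { get; }
    public int HeapPointer { get; private set; } // offset of the next free slot

    public Gen0Region(int capacity) => Capacity = capacity;

    // Allocation: hand out the current pointer, then bump it. No searching for gaps.
    public int? Allocate(int size)
    {
        if (HeapPointer + size > Capacity) return null; // region full: time for a collection
        int offset = HeapPointer;
        HeapPointer += size;
        return offset;
    }

    // Once the survivors have been moved to Generation 1, the whole region is free again:
    // drop the pointer back to the start, and new objects overwrite the old ones.
    public void Reset() => HeapPointer = 0;
}

static class Program
{
    static void Main()
    {
        var gen0 = new Gen0Region(100);
        Console.WriteLine(gen0.Allocate(24)); // 0
        Console.WriteLine(gen0.Allocate(40)); // 24
        Console.WriteLine(gen0.Allocate(32)); // 64
        Console.WriteLine(gen0.Allocate(16) is null ? "full" : "allocated"); // full (96 + 16 > 100)

        gen0.Reset(); // stands in for: mark, move survivors to Generation 1, reset
        Console.WriteLine(gen0.Allocate(16)); // 0
    }
}
```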
So we don't get the problem of heap fragmentation that these other systems like C++ have. There, you could have a heap with plenty of free memory overall, but because of the ongoing process of allocation and deletion, all that free memory is in very small fragments. So if you want to allocate a reasonably large object, there's enough room in total, but not enough in any one place.

The reason it's done this way is that any object that has made it from the Generation 0 heap onto the Generation 1 heap has proved itself to have a reasonably long lifetime. Things that are very short-lived are only going to be on the Generation 0 heap for a short time; then they'll get garbage collected and go away. Anything that has already made it to the Generation 1 heap has proved itself and is therefore likely to stay around for longer. The Generation 1 heap is collected much less frequently than the Generation 0 heap, and that's where we get our performance gain. We're not having to garbage collect everything; for the most part, we only have to collect what's in the Generation 0 heap.

However, the Generation 1 heap will sometimes start to get full. In that case we do exactly the same process, except this time the surviving objects are moved over to the Generation 2 heap. Objects that make it that far really have proved themselves. They are likely to be the kind of objects that are around for the entire lifetime of the program, so they're very unlikely to need garbage collecting at all.

It can still happen, though. The Generation 2 heap may start to get full, and then we have to do a garbage collection on that. That has to work slightly differently: once we've marked the surviving objects, we can't move them over to the next generation's heap, because there isn't one. So what has to be done there is compacting the heap.
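You can watch promotion happen with the real 'GC.GetGeneration' API. This is a minimal sketch. Each forced 'GC.Collect()' is a full collection, so an object that is still referenced is normally promoted one generation at a time until it reaches 'GC.MaxGeneration', after which it stays put. The exact numbers printed are up to the runtime, so treat the comments as typical rather than guaranteed.

```csharp
using System;

static class Program
{
    static void Main()
    {
        var survivor = new object();
        Console.WriteLine(GC.GetGeneration(survivor)); // a fresh allocation starts in Generation 0

        for (int i = 0; i < 3; i++)
        {
            GC.Collect(); // still referenced, so it survives and is normally promoted
            Console.WriteLine(GC.GetGeneration(survivor)); // typically 1, then 2, then 2 again
        }

        Console.WriteLine(GC.MaxGeneration); // the oldest generation: there is nothing beyond it
        GC.KeepAlive(survivor);              // keep 'survivor' reachable for the whole demo
    }
}
```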
With compaction, once the unreachable objects have been removed, everything else is shuffled down so it all sits at the bottom of the heap. That's a slightly slower process. It only happens rarely, though, because we don't get much garbage collection on the Generation 2 heap.

So that's the approach. Objects on the Generation 0 heap are expected to be short-lived; if they survive their first garbage collection, they get transferred to the Generation 1 heap. There they're much more likely to live for longer. Even then, the really long-lived ones will end up on the Generation 2 heap and will probably be around for the whole lifetime of the program.

I mentioned that we also have the Large Object Heap, which is for large objects, those of 85,000 bytes or more. It's really a heap of its own, but it behaves like a Generation 2 heap: it's only collected when Generation 2 is collected, and it has nowhere to transfer surviving objects to. However, it isn't compacted by default, because copying large objects around is expensive. Instead, the freed space is reused for later large allocations, which means the Large Object Heap can become fragmented.

That's worth bearing in mind. The presumption is that large objects are also long-lived objects, because they behave like Generation 2 objects. So be a bit careful if you are allocating large objects, and make sure, in terms of your program's structure, that they are going to be around for a long time. Typically, when we talk about large objects, we mean collections such as Lists or arrays; it's extremely rare to have a single ordinary object as big as 85K.

So that's essentially how garbage collection works. There are a few other things to mention about it. One is to do with threading. Some of the garbage collector's work can happen on a separate thread from your program. Generation 2 collections can run largely in the background while your program carries on, although the quick Generation 0 and Generation 1 collections do briefly pause your program's threads. So when the garbage collector kicks in, it might slow things down a little, but it won't normally bring your program to a complete halt for any length of time.
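A quick way to see the Large Object Heap behaving like Generation 2 is to ask for the generation of a large array. Objects on the Large Object Heap are reported as Generation 2 from the moment they're allocated. The sketch also shows 'GCSettings.LargeObjectHeapCompactionMode', which requests a one-off compaction of the Large Object Heap at the next blocking full collection. The two array sizes are arbitrary choices either side of the 85,000-byte threshold.

```csharp
using System;
using System.Runtime;

static class Program
{
    static void Main()
    {
        var small = new byte[1_000];   // well under 85,000 bytes: an ordinary Generation 0 allocation
        var large = new byte[100_000]; // over the threshold: goes on the Large Object Heap

        Console.WriteLine(GC.GetGeneration(small)); // normally 0
        Console.WriteLine(GC.GetGeneration(large)); // 2: LOH objects are reported as Generation 2

        // The LOH is swept rather than compacted by default. If fragmentation becomes a problem,
        // compaction can be requested for the next blocking full collection.
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();

        GC.KeepAlive(small);
        GC.KeepAlive(large);
    }
}
```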
There's one bit of configuration here that's normally set up for you, but which you may want to change in certain circumstances. It's the difference between running on a workstation (a desktop computer, a laptop or something like that) and running on a server. On a server, the garbage collection threads are given a higher priority than the thread your program is actually running in. That means the collector will have a slight tendency to interrupt your program, but the assumption is that a server has powerful enough resources that that's not going to be an issue. On a workstation, on the other hand, the garbage collector runs at the same priority as the program it's collecting for, so neither of them will completely elbow the other out of the processor. You can change that through configuration, but it's pretty rare; normally the default setting works fine.

The one thing we haven't really mentioned, though, is how finalizers fit into all this. I had a finalizer in there so we could see what was happening, but exactly where in the process finalizers get called is slightly more complicated. This is actually the area where you can make the biggest mistakes, because finalizers can really slow down the garbage collection process. That's what we're going to be talking about in the next video. For now, I hope you enjoyed that. If you did, click the 'like' button, do subscribe, and I'll see you next time.
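You can check at run time which mode a program is actually running in, because 'GCSettings' exposes it, as in the minimal sketch below. The mode itself is chosen by configuration rather than by detecting the hardware. For example, setting the 'ServerGarbageCollection' property in the project file, or 'System.GC.Server' in 'runtimeconfig.json', turns server GC on. ASP.NET Core apps use server GC by default.

```csharp
using System;
using System.Runtime;

static class Program
{
    static void Main()
    {
        // True if the app is configured for server GC, false for workstation GC.
        Console.WriteLine($"Server GC: {GCSettings.IsServerGC}");

        // Current latency mode: Interactive means concurrent (background) collection is enabled,
        // Batch means it is disabled.
        Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}");
    }
}
```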
Info
Channel: Coding Tutorials
Views: 1,731
Keywords: garbage collection, .net core, finalizers, coding, tutorial, performance
Id: BeuNvhd1L_g
Length: 14min 54sec (894 seconds)
Published: Fri Mar 19 2021