Is the COST of JavaScript’s GC REALLY that high?

Captions
Let’s get right to the point, shall we? Is JavaScript’s garbage collector as bad as some people think it is? I have no idea, so let’s poke at this a bit and see if we can learn something.

Basically, when you allocate memory in languages like C++, you have to explicitly free it, otherwise you have a memory leak. Languages like JavaScript, C#, Java, and Go are all garbage collected, meaning they handle a good chunk of the memory management for you. When you allocate some memory, at some point JavaScript has to decide you’re done with it. So if I allocate an array, here’s my array somewhere in memory, and then I fill that array with stuff, a number, a string, etc., it really doesn’t matter. If I now null out one of the entries in the array, that little blob of memory gets left stranded; it has no connection to anything anymore, nothing refers to it. In other words, it’s not reachable: if we start at a root object, we can’t reach that little blob, and thus it’s garbage. It’s safe to reclaim that memory because nothing can possibly be using it anymore.

That’s the general idea anyway, so let’s see it in action with some code. What I’ve done here is write a really dumb test that just spam-fills an object with random strings and other objects. This is called via requestAnimationFrame, kinda like you would in a game loop. So see here: I create data, set it empty to start, then I loop over and create a bunch of junk in there, basically a linked list of objects with random names. These all have to allocate memory; the strings take up memory, as do the objects themselves. This “data” variable isn’t used anymore at the end of each loop, so we’re achieving our goal of wasting memory and forcing the garbage collector to act (a rough reconstruction of this test is sketched below).

So, in order to actually test this, you need to be able to measure how much time JavaScript spends in garbage collection. You could pull up dev tools, hit the performance button, and dig around there, but there’s another way that’s slightly less known: Chrome tracing (chrome://tracing), the backbone of Chrome’s performance testing framework. Hundreds of benchmarks measuring god knows how many thousands of metrics are built on it. In another life, I was an owner here and even led the infra team monitoring and processing all this data. It’s pretty cool: you can see and attribute time to various calls, see IPCs between processes, and in general it just gives way more info. For our purposes, I think it’ll be easier to focus on what we want to know: garbage collection events. So from here, you can start a trace, choose which events you’re interested in (in our case, we only care about V8 stuff), and then start recording. You just need to wait a bit to capture enough of a trace; capture however long you want, you don’t need to wait until 100%. The viewer uses the Doom keys to move, WASD, so let’s just zoom in and look a bit more closely at all these garbage collection events. We can see we were successful at provoking the garbage collector into action: all these MinorGC events are V8 reclaiming memory, and if we’re lucky, maybe we see some MajorGC events in here too. These are both parts of V8’s garbage collector in action. That brings up an important question before we move on: what is a minor GC vs a major GC? That’s a good question Simon, thanks for asking that.
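For reference, the spam-allocation test described above probably looked something like this. It’s a loose reconstruction, not the video’s actual code; the name DoSomethingStupid comes from the captions, everything else is a guess.

```js
// Spam-allocate short-lived objects every frame to provoke the GC.
function DoSomethingStupid() {
  let data = {};
  for (let i = 0; i < 10000; i++) {
    // Build a little linked list of objects with random string names.
    data = { name: Math.random().toString(36).slice(2), next: data };
  }
  // "data" is unreachable the moment this function returns, so every
  // allocation made here immediately becomes garbage.
}

function Frame() {
  DoSomethingStupid();
  requestAnimationFrame(Frame); // fire once per frame, like a game loop
}
requestAnimationFrame(Frame);
```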
V8 has a great blog post that talks in depth about Orinoco, their garbage collector, and goes through some of the high-level details of their collection strategy. There’s this idea in garbage collection, the generational hypothesis, that most items die really young, also called infant mortality, because why not just use super morbid terms for memory management. As a super handwavy explanation, V8 basically divides allocations into new and old. You have the new generation and the old generation, and in essence, allocations start in the new generation. So if I allocate a string, it starts out in the new generation, specifically the nursery part of the new generation. Then a garbage collection pass happens, and let’s assume that some objects are reclaimed and others aren’t. If ours happens to survive that first garbage collection attempt, it gets copied out, or evacuated, into a sort of intermediate area. It’s not old yet, but it’s not brand new either. Another garbage collection event occurs, and a bunch of other stuff gets reclaimed. If at that point our object survives, if it’s still in use, it gets evacuated out to the old generation. V8 makes this distinction so that it can apply separate strategies to the new and old generations. A MinorGC, the Scavenger, is optimized to collect from the new generation, while a MajorGC, which doesn’t have a cool name like the Mutilator or anything, does the heavy lifting of sifting through the old generation. Maybe you could call it the Reaper or something.

So, looking at the trace of our first test, we can select the first 10s or so, and we see there were 384 occurrences of MinorGC, averaging about 1.079ms each, and 56 occurrences of MajorGC at around 0.838ms each, so our test is getting V8 to do both minor and major GCs.

We can change this test around just a tiny bit to try to poke the MajorGC into action. What I’ve done here is take a random subset of the data and stuff it into this.stuff, overwriting whatever happened to be there. this.stuff is permanent, never deleted for the duration of the test, so this should in theory keep a bunch of random things around a lot longer than a single call of DoSomethingStupid() and promote things to the old generation (a rough sketch of this tweak follows below). When we get a trace of the whole thing and select the first 10s or so, we can see that this is indeed what happens: we managed to force the garbage collector to work a bit harder, and we’ve added some time to both MinorGC and MajorGC.

To try to understand the relationship between allocation patterns and MinorGC and MajorGC, I fiddled with this a bit more. We’ll try creating fewer objects, but make them more complex. This is the trace here; we’ll select the first 10s again, and looking at it, MinorGC isn’t changing a whole lot, but MajorGC is changing quite substantially, nearly doubling the time taken vs the simpler cases. I poked at this a bit more by adding a setup step to the test, where I just allocate a whole buncha random crap and store it. It’s never referenced in the test, but it’s there. When we look at a trace of THAT, we see our MajorGC calls getting much more expensive. Although there are fewer calls to MajorGC, each invocation is taking 26ms, which for a game would basically be a skipped frame, something that’d be inexcusable.
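Again as a rough reconstruction (the captions don’t show the actual code, so the sizes and names here are assumptions), the retention tweak looks something like this:

```js
// Keep a random subset of each frame's garbage alive across frames so
// that the survivors get promoted to the old generation.
class GCTest {
  constructor() {
    // Never cleared for the duration of the test.
    this.stuff = new Array(1000);
  }

  DoSomethingStupid() {
    const data = [];
    for (let i = 0; i < 10000; i++) {
      data.push({ name: Math.random().toString(36).slice(2) });
    }
    // Overwrite random slots of this.stuff with random entries of data.
    // Whatever lands here stays reachable long enough to survive a few
    // scavenges and get evacuated to the old generation.
    for (let i = 0; i < 100; i++) {
      const src = (Math.random() * data.length) | 0;
      const dst = (Math.random() * this.stuff.length) | 0;
      this.stuff[dst] = data[src];
    }
  }
}
```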
So that suggests there’s a relationship between just the amount of crap you have allocated and how long the GC takes, which, I mean, is hardly surprising; it has to figure out what you’re using and not using, it’s not magic. If you were using a language where memory management is explicit, such as C++, it wouldn’t matter how much or how little you have allocated; there’s no ongoing cost for having it allocated, you only spend time allocating or freeing. Whereas in JavaScript there’s definitely a tax on what’s currently allocated.

I tried a couple more experiments to see if I could make MinorGCs more expensive, so what I did here was allocate just random stuff in a loop. Looking at the trace, we DID spend quite a bit of total time in MinorGCs, almost 10% of wall time in fact, although each individual call to MinorGC was just shy of 2ms, so not quite the crazy long time MajorGC took. I took another stab at it by allocating sparse arrays and just randomly setting values in there, and V8 kinda laughed that attempt off: nearly 800 MinorGC calls, but they ran in a pretty negligible amount of time.

I consider 10% to be pretty bad, but simultaneously, it’s a ridiculous setup, so maybe instead let’s build a somewhat more plausible game setup to see the effects of GC in action. Here’s a quick and dirty one: a super basic, bare-bones “game”, in quotation marks. I don’t bother actually rendering any of this, but we do simulate the entities. So you’ve got this list of entities; we call update on each one, when they die we destroy them, then filter them out, and we keep a stable population by creating new ones in their stead. The entities themselves do some legit work: they have transform data associated with each one, along with values like energy and health, and they have a spatial hash grid client to track their position in the world and query other entities. They also have this update function, and then I force them to do some random stuff. I didn’t feel like making the logic extensive, so I just repeat it a few times. When they’re fully charged, they shoot off a bullet, which gets added to the world. And bullets are their own thing: they have a full transform, their position is updated according to velocity, and eventually they explode and kill nearby things. So it does some stuff.

I took a few traces of this running, chose a bad one, and we can poke around in here a bit. If I select the first 10s of the trace, we can see some numbers for MinorGC and MajorGC. We got around 500 or so MinorGC calls, taking on average less than half a ms, but adding up to around 240ms or so. There aren’t many MajorGC calls, barely any actually, and the ones that do fire take a little over 1.1ms each. There was always going to be some overhead for garbage collection, but at the same time, this is written super poorly; realistically, an experienced game developer will code with memory allocations in mind.

So let’s make it a tiny bit better. I’ve got a slightly improved version of these entities that attempts to minimize allocations. In the main loop, we actually reuse entities instead of allocating new ones, and the entities themselves use pre-allocated matrices and vectors to perform their calculations. Nothing super crazy here. Looking at a trace of that version though, we can see a monster of a difference: whereas before we had around 500 MinorGC calls, this version gets 110 or so, and a touch over 50ms total time spent there.
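The reuse-heavy version presumably looks something like the sketch below: an illustrative object pool plus preallocated scratch storage, not the video’s actual code (the names and fields are assumptions).

```js
// Recycle dead entities through a pool instead of allocating new ones.
const pool = [];

function SpawnEntity() {
  const e = pool.pop() ?? { x: 0, y: 0, vx: 0, vy: 0, health: 0, alive: false };
  e.health = 100;
  e.alive = true;
  return e;
}

function DestroyEntity(e) {
  e.alive = false;
  pool.push(e); // reuse later rather than leaving it for the GC
}

// One preallocated scratch vector shared by every update, instead of
// allocating a fresh temporary vector per entity per frame.
const scratch = { x: 0, y: 0 };

function UpdateEntity(e, dt) {
  scratch.x = e.vx * dt;
  scratch.y = e.vy * dt;
  e.x += scratch.x;
  e.y += scratch.y;
}
```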
There’s one MajorGC call accounting for another ms of time. One really interesting thing stands out here: in the first example, if we zoom in and look at these MinorGC calls, they’re happening DURING the frame, sometimes multiple times, meaning they’re contributing to longer frames. In the second example, they’re happening in the idle time between frames, which is totally by design. V8 specifically built this feature into Chrome: when it detects an idle period, that’s when V8 has the go-ahead to quickly run GC (there’s a small sketch of tapping into these idle periods at the end).

What I kinda got from all of this is that short-lived objects are often pretty cheap from a GC perspective. That doesn’t mean they’re free, but if you can get allocations down low enough, the collections may just run in the idle time between frames and you may not see much overhead from them. The more crap you keep in memory, the more long-lived cached stuff, the more expensive your MajorGCs are, as we saw with my stupid synthetic test. I know my own code though, usually, so this could probably be at least partially mitigated in the future by some sort of API that lets me give hints to the garbage collector. Something like that, I dunno; I’ve given this a full 10 seconds of thought, you’d need to actually plan it out more.

One major problem with JavaScript, though, is that although you might be able to control allocations in your own app, with 3rd party stuff you have very little control, as far as I know. I mean, in something like C++ it’s sorta the same deal, but at least you can override memory allocation at a global level and, depending on your application, extend this to 3rd party libraries as well. In JS, no such luck; I can’t isolate a third party library to its own memory pool, so that’s just something you have to live with.

Another point is that, yeah, of course there’s a cost to having GC, and direct management will be more performant when done right. But in reality, decently made games don’t just allocate and free on a whim either; most do it in a very controlled way. I’ve spent enough time moving systems to different allocation patterns, linear allocators, pool allocators, LIFO and scratchpad-style frame allocators, in order to wring out extra performance. C++ programmers may love to toot that horn, but even among gamedevs, that was usually handled by just a few people at the studio, not everyone. In JS, you have some control over memory, but it’s pretty tenuous. Mostly it’s down to your pattern of allocation: minimizing allocations by reusing objects, that sort of thing. Maybe you could go crazy using typed arrays, but let’s save that for another time.

And lastly: how much memory overhead does JS incur for doing all of this? How’s performance on Firefox? Etc., etc. Great questions! No idea, didn’t test those. Maybe next time. Throw some comments about what you’d like to see tested in the future. Until next time, Cheers
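As a closing aside on that idle-period mechanism: you can’t invoke V8’s idle-time GC from script, but Chrome exposes the same between-frame idle gaps to pages via requestIdleCallback, so you can at least push your own non-urgent cleanup into them. The queue and the 1ms threshold below are illustrative choices, not anything from the video.

```js
// Defer non-urgent cleanup work into the browser's idle periods so it
// doesn't eat into frame time.
const cleanupQueue = [];

function RunCleanup(deadline) {
  // Only keep working while the browser says idle time remains.
  while (deadline.timeRemaining() > 1 && cleanupQueue.length > 0) {
    cleanupQueue.pop()();
  }
  if (cleanupQueue.length > 0) {
    requestIdleCallback(RunCleanup);
  }
}

// Usage: queue the work mid-frame, let it run between frames.
cleanupQueue.push(() => { /* trim caches, drop references, etc. */ });
requestIdleCallback(RunCleanup);
```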
Info
Channel: SimonDev
Views: 91,014
Keywords: simondev, game development, programming tutorial
Id: easvMCCBFkQ
Length: 13min 52sec (832 seconds)
Published: Mon Jun 19 2023