Weak JavaScript - HTTP 203

Captions

DAS SURMA: So it's my turn to tell you about something new in JavaScript land and the web land, actually. And I have a proposal. JAKE ARCHIBALD: Had we started, by the way? DAS SURMA: I thought so. JAKE ARCHIBALD: Oh [BLEEP]. Sorry. I didn't realize. I thought you were still just like, um, might just start soon. DAS SURMA: So. JAKE ARCHIBALD: So. DAS SURMA: We're trying this once more. Another episode of-- JAKE ARCHIBALD: I love the sigh there. Like, [SIGH], here we are again. I'm having to talk to Jake. And this time, I have to present something. Come on. DAS SURMA: Yes. JAKE ARCHIBALD: Be a bit cheerier. This is YouTube. You know people don't watch anything for more than-- DAS SURMA: I'm actually really-- JAKE ARCHIBALD: --five minutes unless you're like, oh my god, this is the most amazing thing that's ever happened to JavaScript. This is going to be incredible. You know, you're going to learn so much in this episode. You're going to come away with your brain being, like, three times the size it was. It's just-- this is going to be the best part of your day, if not your whole year. DAS SURMA: Isn't that how you die, if your brain grows to three times the original size? I think you die. JAKE ARCHIBALD: I think the idea is the skull would have to also inflate. DAS SURMA: Oh. So you look like Brain from "Pinky and the Brain." JAKE ARCHIBALD: Yes. DAS SURMA: Not-- shall we-- now we need to put a disclaimer on the show. Might make your skull grow if you watch too much of this. JAKE ARCHIBALD: That's the new tagline for the show. "HTTP 203--" might break your skull. DAS SURMA: Great. So I found a new JavaScript thing that intrigued me mostly in my role as the maintainer of Comlink-- JAKE ARCHIBALD: Ah. DAS SURMA: --where, you know, values in workers and the main thread pretend to be in both places, but they're actually not. And then you get into garbage collection issues because things that you share will never get garbage collected because they're being shared. 
There are references being held. And so I have learned more about this new proposal that has now shipped in Chrome stable, which is weak refs. And I thought, you know what? Let's make a little episode about all this weak stuff that JavaScript has anyway. JAKE ARCHIBALD: Oh, so we're not just covering weak references. This is going to be the-- DAS SURMA: Well, we have a short intro bit about what's been there so far and why it is the way it is because there are some weird rough edges. JAKE ARCHIBALD: Well, stop teasing me and just give me the information. DAS SURMA: Let's do it, shall we? OK, so I thought we'd start with what actually is a weak reference because it's been thrown around a lot and I just want to make sure everybody knows what we're talking about. Really what it means is whenever you have an object and you, quote unquote, "save" that object in a variable, that variable is a strong reference to that object. And as long as there's any variable in your code that has this object, this object will not get garbage collected because it would be bad if the stuff you have in variables suddenly disappeared. Now, a weak reference is the kind of reference that doesn't prevent garbage collection. So if there was a way to have a variable being a weak reference to an object, if there are no references to an object or only weak references-- basically no strong references-- that object is free to get garbage collected at any point in time, and you might see your value actually disappear. JAKE ARCHIBALD: OK, so a weak reference is kind of like-- like, if your weak reference is actually pointing to the object, then that's your way of-- you kind of know, eh, there's probably something else that's using this right now. DAS SURMA: Maybe. You basically know that it has not been garbage collected yet. That's pretty much as much as you can deduce from that. 
And there's already one little difference that you mentioned there because just because you give up all strong references to an object doesn't necessarily imply that it will be immediately garbage collected. So a weak reference could theoretically still give you an object even though there are no strong references to it anymore. And that's a detail that we're going to get more into a little bit later because it's an important distinction to make. JAKE ARCHIBALD: It was-- I heard that. I'm glad you're doing this talk because I did read the article on weak references, and I came away still being quite confused. And it was some of the uncertainty that I was a bit like, I don't know, this uncertainty sounds bad. DAS SURMA: So we're going to have a lot of that because basically, we have had two weak data types in JavaScript for actually quite a while. And those are the WeakMap and the WeakSet. And some might be surprised that these are fairly old. They were in IE 11. So they've been around for a long time. JAKE ARCHIBALD: Yes. DAS SURMA: And what these two do is basically-- in the WeakMap, the key is weak, meaning you can only get to the value when you have the key. But the key itself is not prevented from being garbage collected just by the fact that it's being used in the map. So if you use a DOM node as the key in the WeakMap and don't save this DOM node anywhere else, if it gets garbage collected, your key and your value can disappear from that map. And that can be incredibly helpful. I often see questions on Twitter, sometimes directed at me, sometimes others, what the use cases for WeakMap and WeakSet are. And so one of the prime uses for me is that for the longest time in JavaScript, expandos, as they're called, have been fairly common. You just take an object, and you throw new properties on it. Like, you just put an _mystuff on it and put your stuff in there. And you know, that's JavaScript. It works. 
But one of the downsides is that it's bad etiquette, basically, to transform objects that you don't own, also because that can cause the JavaScript runtime to deoptimize because you're changing the shape, as it's called, of an object. And so generated binary code might not work anymore. And they would have to regenerate it. And so mutating objects you don't own is often a bad practice. JAKE ARCHIBALD: And there's an ownership issue here as well, isn't there, because now, anything else that gets that DOM node can see that internal data. Yeah, because it's not really private. DAS SURMA: And you can never be 100% sure that you're not creating a name clash. Maybe in this case, you know, some library code actually does use _internal. Who knows? And so what you should be doing instead is create a WeakMap. So in this case, we're getting something from the DOM. And instead of putting something on the DOM node itself, we use the DOM node as a key in our WeakMap and put our extra data into the value of the WeakMap. And that means as long as the DOM node exists, the value will continue to exist. But when the DOM node gets garbage collected, so will the value. And that's really useful. And for a WeakSet, it's basically for the use case where you find yourself doing a WeakMap where you store true in the value. That's like my prime example. You just want to remember, have you visited a certain object? Have you already seen it? Has it already been processed? Those kinds of things, that's where a WeakSet comes in. However, and that is something that trips some people up or raises question marks, at least, is that neither of these are iterable. You cannot iterate over the keys in the WeakMap. You can't iterate over the entries in a WeakSet. Weak stuff in general was never iterable, and that kind of comes from the fact that one of the fundamental principles, I think, by the W3C was you should never expose garbage collection. 
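Editor's note: the WeakMap-instead-of-expando pattern and the "have I seen this object" WeakSet pattern described above can be sketched roughly as follows. This is a minimal illustration, not code from the episode: the `node` object is a hypothetical stand-in for a DOM node so the snippet runs outside a browser, and `metadata`, `recordClick`, `seen`, and `processOnce` are names invented for this example.

```javascript
// Hypothetical stand-in for a DOM node, so the sketch also runs outside a browser.
const node = { tagName: "DIV" };

// Side table: the node is the (weakly held) key, our private data is the value.
// Nothing is added to the node itself, so its shape stays untouched.
const metadata = new WeakMap();
metadata.set(node, { clicks: 0 });

function recordClick(target) {
  const data = metadata.get(target);
  if (data) data.clicks += 1;
  return data;
}

// WeakSet: remember which objects were already processed,
// instead of a WeakMap whose values are always `true`.
const seen = new WeakSet();

function processOnce(target) {
  if (seen.has(target)) return false; // already handled
  seen.add(target);
  return true; // first time we see this object
}

recordClick(node);
```

Once nothing else references `node`, both the WeakMap entry and the WeakSet membership become eligible for collection along with it, which is the property the expando approach lacks.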
And the reason for it is actually from security-- [MUSIC PLAYING] Because exposing garbage collection is, what I learned when I researched this, is called a covert channel because now two pieces of JavaScript can communicate without ever knowing really of each other's existence. You can think about it a little bit that if you bundle two libraries and you pass, you know, just an object for one library to another, now the one library might be able to track when the other library stops using that. And that could actually-- if those two libraries do it in coordination, you can communicate a lot through that without ever accessing each other's data. And I mean, if you think about cookies, that's how tracking happens. And I think something like this could be exploited the same way. And so they never wanted to open this can of worms, basically. However, over time, there were more and more demands and use cases for actual weak references and exposing garbage collection. And so over time, they spend a lot of design work and collaboration between the engine authors to figure out how they can make this happen without making this covert channel an actual problem. And so the weak refs proposal was born. JAKE ARCHIBALD: Before we move on to that, like, one of this model with weak sets and weak references where you can't iterate on the keys, in some ways, it almost makes more sense to me because it's that idea of, like, you can only open the door with a key, right? And if you lose the key, you can no longer open the door. The equivalence here with like-- because we don't just have WeakMap and WeakSet. There's also map and set, which can iterate. And that seems like a kind of model where I've lost my keys. Uh, I better go and ask the door for them back. You know? DAS SURMA: Which, fair. JAKE ARCHIBALD: Would be problematic in general, wouldn't it? DAS SURMA: Yeah, let's not let real doors be inspired by computer science maps. No, I don't think that's a good idea. 
So yeah, so this is where we now can talk about WeakRef. This is marked as advanced because it can be fairly hard to reason about. The second garbage collection is involved, it becomes non-deterministic and unpredictable. But also, whenever you use it, think really, really hard if there isn't a way you can solve the problem without it because this will be likely to introduce more bugs than it fixes because it is a fairly hand wavy spec almost. And we'll talk about that a bit more. JAKE ARCHIBALD: Well, whether garbage collection happens isn't specced at all. DAS SURMA: Isn't specced, and that's one of the underlying problems for this entire thing. JAKE ARCHIBALD: In fact, somebody even said to me, like, JavaScript engines are not required to garbage collect. Like, the spec-- DAS SURMA: We'll get to that. We'll get to that because that is-- JAKE ARCHIBALD: Oh, OK. DAS SURMA: Basically, we have weak refs now. And in the end, they do violate the principle of not exposing garbage collection. They do create that covert channel. But they did spend a lot of time in the design to make sure that on the one hand, engines are still able to optimize and, you know, iterate on their internal architecture freely without being constrained by this WeakRef existing and also to limit the bandwidth of this covert channel to such an extent that the risk of that actually being a real problem is extremely low. What are these actual problems? Why is this actually such a, quote unquote, "advanced" topic? Well, the first problem is that garbage collection is non-obvious or non-deterministic. For example, if two values become unreachable at the same time, like, you give up your strong references at the same time, that does not mean they will get garbage collected at the same time. Or-- and that is also something that you might not know. 
It might look like you made something unreachable, but it is, in fact, not because the way that internal data structures work or native data structures to JavaScript or closures, there might still be a reference you're just not aware of. And so your reasoning might just be flawed and really hard to figure out where it is going wrong. Engines will garbage collect differently from each other. So something that might work in one browser or when something gets garbage collected in one browser, it might not in another. It actually might get garbage collected in one previous release of your browser but not in the current one. Or it might get actually garbage collected in the first run but not in the second run of the same code on the same website because there's so many factors that the engine incorporates when to run garbage collection that it is inherently unpredictable. And this is what I brought up earlier. Garbage collection and something being unreachable are inherently separate things. Something being unreachable is a requirement for it getting garbage collected, but they basically can be an arbitrary and infinitely long time in between. Most browsers run their garbage collector when they feel like the browser has downtime and only when it's under memory pressure. So if you're running on a super beefy machine, garbage collection might just not happen as a trade off for keeping the page responsive, snappy, and fast. And so that's just something to really keep in mind also for the rest of the discussion that when we talk about garbage collection, we mean the moment in time when a value is being deleted from memory and the memory is being freed up, not the point where it's becoming unreachable for your code. JAKE ARCHIBALD: Yeah, it's an optimization pass. Like, garbage collection is optimization. 
It was weird when I heard that browsers aren't required to garbage collect or JavaScript engines aren't required to garbage collect because it's like, but all the code everywhere would break if they didn't garbage collect. And it's like, well, yeah, but never mind. It is something they all do. It's just when they do it and how they do it is completely unspecified. DAS SURMA: Yeah, and just to give it a separate point, GC will happen late or sometimes not at all. That can happen. And actually, there are a lot of details of this that we'll talk about a bit later. All right, let's look at the actual WeakRef API. It is a very, very tiny, slim API, which is nice. So you create a value. That value has to be an object because only objects get garbage collected. Things like numbers, strings, the primitives don't get garbage collected. And you just put it in a WeakRef. And now you have a WeakRef to the value. JAKE ARCHIBALD: Wait, no. Wait, wait, wait, wait. I'm confused. No, no, no. No, no, no. OK, strings do get garbage collected, right? If I create a 100 megabyte string and then lose my reference to it, it will be garbage collected. I think the difference is that you can recreate that exact string later whereas you can't-- once you lose access to an object, you can't recreate that exact object again. It would be a different instance of that object. DAS SURMA: I'm not 100% sure on strings now that you say that. I think strings are special in that they're not objects and that two strings you create separately will be considered identical. And I think strings are shared in like a string pool because of this identity, reference identity that they have. So I think they will probably be removed from that pool, especially large strings. But I think that's not considered garbage collection. I'm not sure on this one. So we'll probably add something in the notes of this video. I'm pretty sure you cannot create a WeakRef to a string. 
JAKE ARCHIBALD: Yeah, it's pretty similar to how it can't be the key to a WeakMap or a WeakSet-- DAS SURMA: Yeah. JAKE ARCHIBALD: --because you can recreate them again. So the same goes for like numbers and symbols and, like, null and undefined, that kind of thing. DAS SURMA: So this is now a WeakRef. The WeakRef is a weak reference to the value that I created above. Even if we didn't put the value in its own variable, which is the strong reference-- like, if we didn't have a single strong reference, we would be guaranteed that this value would continue to exist until the next task. So that way, we can make sure it doesn't get garbage collected immediately. We might have a promise chain. Maybe want to use this value in the next micro task. That much is guaranteed. And this is not only for code to work as you expect but also one of these mitigations for the bandwidth. And that's the lifetime of the value is elongated so that you can't communicate as much through garbage collection just because now it's delayed until the next task. And now we could basically call .deref on the weak reference to see if the value is still there at some point later. And then this maybe value will be undefined if the value had been GCed at some point between these calls. Or it will be the actual value if it hasn't. And if we call deref and get the value, again, that's a new point in time where we say, OK, now the value will continue to exist until the next task at the very least, again, for this guarantee and to limit the amount of communication that you can do. JAKE ARCHIBALD: So when you call deref all right, two questions. Can you call deref multiple times? DAS SURMA: Yes, as many times as you like. JAKE ARCHIBALD: And then the thing you've got is now a strong reference. deref gives you a strong reference. DAS SURMA: Yes, it gives-- yeah, basically. And so until that variable ceases to exist. But even then, the value will not be garbage collected at the very least until the next task. 
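Editor's note: a minimal sketch of the WeakRef API as described above, assuming an engine that supports `WeakRef`. The `value` object and the `describe` helper are hypothetical names for this illustration. Because `value` is still strongly referenced here, `deref()` is certain to return it; in real code the `undefined` branch is the one you cannot predict.

```javascript
// Hypothetical object standing in for something expensive to keep around.
let value = { name: "expensive thing" };

// Two weak references to the same object. They do not keep it alive.
const ref = new WeakRef(value);
const sameRef = new WeakRef(value);

function describe(weakRef) {
  // deref() returns the object, or undefined once it has been collected.
  // Holding the result in a local variable makes it a strong reference again.
  const maybeValue = weakRef.deref();
  return maybeValue === undefined ? "collected" : `alive: ${maybeValue.name}`;
}

const status = describe(ref);

// Strings (like numbers, null, and undefined) cannot be WeakRef targets.
let primitiveRejected = false;
try {
  new WeakRef("just a string");
} catch (err) {
  primitiveRejected = err instanceof TypeError;
}
```

Always check the result of `deref()` before using it, and call it again later rather than caching the returned object, since caching would turn the weak reference back into a strong one.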
This is actually one of the consistency guarantees. So you can have multiple WeakRefs to the same value, and calling deref on all of them will always yield the same result. There's no way that one WeakRef will still have it and the other one won't. Those will always be consistent. So with this, we can have weak references and hold onto a value without preventing it from being garbage collected. But actually, sometimes it's more helpful to have it the other way around where we get informed when something has been garbage collected because this way, we don't. We have to check, I guess, every frame or something. Like, it's really hard. You would have to do polling, and that's bad. You want to have the inversion where you get notified when something gets garbage collected. And that's the other part, the other half of this proposal. And that is called the finalization registry. The finalization registry takes a call back that will be called whenever a value that you have registered with the registry gets garbage collected. And you basically register that value with a held value. And now this is something that might be confusing to some people at first because the value that will be called [INAUDIBLE] not the value that gets passed into the callback. And that's simply because at the time the callback is called, the value has already been garbage collected. Doing it the other way around where you pass the value in just before it gets garbage collected could allow you to store a new strong reference. Then the garbage collector would have to undo its thing, and it would be way more complicated. So what you do instead, you get a separate value that you can pass in and that can be something for your internal bookkeeping to know which value it originally was. Or maybe it's the underlying resource you want to free up. Whatever. JAKE ARCHIBALD: Can you pass the same object in twice? No, that's not going to work because it has to have a strong reference, then. 
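Editor's note: a rough sketch of registering a value with a held value, assuming an engine that supports `FinalizationRegistry`. The names `cleanedUp`, `registry`, `someValue`, and the `"socket-42"` label are invented for this example. On Jake's question about passing the same object twice: as far as the current ECMAScript specification goes, `register` throws a `TypeError` when the target and the held value are the same object, and the snippet checks for that.

```javascript
// Bookkeeping list, only here so the effect of the callback is observable.
const cleanedUp = [];

// The callback receives the held value, never the collected object itself.
const registry = new FinalizationRegistry((heldValue) => {
  cleanedUp.push(heldValue);
});

// Hypothetical object whose lifetime we want to be told about.
let someValue = { payload: new Array(1000).fill(0) };

// The held value is strongly held by the registry, so it should be
// something small, such as an id for the resource to free.
registry.register(someValue, "socket-42");

// Using the same object as target and held value is rejected by the spec.
let sameObjectRejected = false;
try {
  registry.register(someValue, someValue);
} catch (err) {
  sameObjectRejected = err instanceof TypeError;
}

// Dropping the last strong reference makes the object collectable.
// When, or whether, the callback then runs is entirely up to the engine.
someValue = null;
```

Nothing in this sketch waits for the callback, which is deliberate: code that depends on the callback firing at a particular time, or at all, is relying on behavior the engine does not promise.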
DAS SURMA: Yeah, so the held value is called held value because it is strongly held. JAKE ARCHIBALD: Yes. DAS SURMA: I mean, you could call register(someValue, someValue). But that would prevent garbage collection from ever happening on someValue. JAKE ARCHIBALD: Yes. So this is your optimization step. So if you've got-- if you've vended an object and you've got some sort of system where I want to keep this web socket open while this object is still being used, this is the pattern you would use because you get that callback, and you go, oh, that object's gone now. Now I can do my related optimizations like, close the web socket. DAS SURMA: Yeah. And this is also important to note that the callback will be called at the same time the value gets garbage collected or at some point later in time. And again, this is up for interpretation by the engine. It can be very, very long between the actual garbage collection and when this callback will actually be invoked. JAKE ARCHIBALD: But what about WeakRef then? Are there guarantees there? Like, could there be a situation where if I deref a weak reference, it returns, like, undefined or whatever, but this callback fires like an hour later or something? DAS SURMA: Yes, I think that is a possibility. So I think if the callback has been called, deref will definitely return undefined. Deref returning undefined doesn't require that this callback has been called because it isn't required to get called at all. And this is what I want to talk about next because there is no guarantee the callback will run. And that's why it's so important that this callback isn't used for critical work. This callback should be used if a value being garbage collected means you can free up even more memory internally. It is supposed to allow you to reduce memory pressure further. It is not supposed to be used to actually do business logic cleanup work because it might not ever get called. 
And so in that case, you might actually be more wasteful with user resources than necessary. JAKE ARCHIBALD: Yeah, you couldn't build, like, an analytics system that tells you how many objects are around using this because it doesn't come with those guarantees. DAS SURMA: Yes, so that's one of the reasons, for example, why this has been tailored with WebAssembly in mind because WebAssembly often passes values to JavaScript that represent something that's actually in WebAssembly memory. And so in that case, it makes sense because if the representation in JavaScript gets garbage collected, meaning the engine thinks it's under memory pressure, then code can run to free up even more memory in WebAssembly land. But if that memory doesn't get freed, it doesn't inhibit the functionality of the app or waste user resources to an unnecessary extent. So as I said, there is no guarantee that this will run, and there are two things that they point out that make it especially unlikely for them to run. One is if you close the tab or if you kill the process, there is no need for the engine or no requirement for the engine to run all the registered callbacks if you just kill the process. So it shouldn't be used to do cleanup work when somebody leaves the page. And it's also very unlikely for these callbacks to run if the registry itself becomes unreachable by your code. So I guess if the registry gets garbage collected or is unreachable, there is no guarantee that these callbacks will ever be called in the future. Now that we've registered something to get cleaned up, sometimes we might discover later on that actually, we don't want this callback handler to run ever and we want to unregister a certain value. For that to work, the function register takes a third parameter, which is the unregisterToken. It's similar to the held value in that it symbolizes what you need to have to unregister this one value. 
In this case, however, unregisterToken could be the same thing as someValue because it is not strongly held. But sometimes, it might be easier for you to use a different value altogether just for bookkeeping purposes or keeping it simple or something like that. And then you can just call unregister with this unregisterToken, and that will remove your value from the registry. And now one last method on the finalization registry, which actually I found really interesting because it's the first time that I have seen a normative optional API. That means this API is in the spec, is specced out, but implementers are not required to implement it. I think the only other API on the web that has a similar fate currently are shared array buffers in that they are specced. They're agreed upon. They're merged and everything. But they're not required to be implemented to be fully spec compliant. JAKE ARCHIBALD: Is this the same reason? Is it a security timing thing? DAS SURMA: Yes. I mean, shared array-- you all know or have probably heard of the history of the shared array buffer. That was removed because of Spectre and Meltdown, and now it's slowly being incorporated. But they still leave it up to the engine to decide whether they want to enable them. If they don't-- basically, browsers who hadn't shipped any mitigation or found a good mitigation shouldn't be considered spec incompliant-- uncompliant? Incompliant? JAKE ARCHIBALD: Incompliant, I think it is. DAS SURMA: Whatever it is, don't say that about the browser that doesn't have shared array buffer because it's a very tough problem. That's basically the thinking behind it. And this apparently had a long discussion behind it, which I'm not going to try to summarize. But some browsers didn't want to ship this. Other browsers did want to ship with it. 
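Editor's note: the register/unregister flow with an unregister token, described a little earlier, can be sketched like this. It assumes `FinalizationRegistry` support; `released`, `tokenRegistry`, `target`, and the `"resource-1"` label are hypothetical names. The target is used as its own unregister token, which is allowed because the token is held weakly.

```javascript
// Bookkeeping list so the callback has something observable to do.
const released = [];

const tokenRegistry = new FinalizationRegistry((heldValue) => {
  released.push(heldValue);
});

// Hypothetical object we are tracking.
const target = { id: 1 };

// Third argument: the unregister token. Reusing the target is fine here,
// because unlike the held value, the token is not strongly held.
tokenRegistry.register(target, "resource-1", target);

// Later we decide the cleanup callback should never run for this object.
// unregister() returns true if at least one registration was removed.
const removed = tokenRegistry.unregister(target);

// A second call finds nothing left to remove.
const removedAgain = tokenRegistry.unregister(target);
```

A separate token object is a reasonable alternative when the code that cancels the registration should not need access to the tracked object itself.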
There was basically a big contention around should this ever be-- because it's a synchronous API that runs callbacks on some number of values that might have been garbage collected-- oh, sorry. I should explain what this function does first. So the function is called cleanupSome, and that is basically saying, hey, if you have a chance, call all the callbacks for the values that have been garbage collected if you haven't called the callback on them just yet. So you're kind of opting in, like, hey, I would like to process these things now. I'm actually kind of surprised they made it optional because it would be just as spec compliant to make it a no-op because you don't have to run the callbacks. You can say, like, no, not running anything. JAKE ARCHIBALD: It's just a little poke, isn't it? It's just like please, please, please. DAS SURMA: But there were browser vendors disagreeing on whether this is good for the main thread or not. And so in the end, they just made it normative optional to implement this. And it currently looks like, at least on the web, we will only be shipping it in workers, not on the main thread, which I thought is quite interesting. So you do have to check whether this method exists before calling it. And this is where optional chaining comes in super handy. We just do a little question mark dot parentheses, and then it will be called only if that function actually exists. JAKE ARCHIBALD: And that's not the first time that's happened in JavaScript, right? Because we've got-- there's the Atomics stuff, right? There's methods on Atomics that are only in workers as well. DAS SURMA: Yeah. And so, yeah, just keep in mind you need to check whether it's there before calling it. And again, this method is just a request to run these callbacks. There is no requirement from the browser to actually process, call any callbacks, or process any values. That's what I mean. 
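Editor's note: the "question mark dot parentheses" call mentioned above looks roughly like this. It is a sketch that assumes `cleanupSome` may simply be missing in the environment running it, so the code works either way; `optionalRegistry` and `hasCleanupSome` are names made up for the example.

```javascript
const optionalRegistry = new FinalizationRegistry(() => {
  // Free any extra memory tied to the collected object here.
});

// Feature check, only kept so the result can be inspected or logged.
const hasCleanupSome = typeof optionalRegistry.cleanupSome === "function";

// Optional call: invoked only when the method exists, otherwise a no-op.
// Even when it exists, it is just a request; the engine may run no callbacks.
optionalRegistry.cleanupSome?.();
```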
Like, this entire API is very much trying to leave this freedom for the engine implementers to allow them to implement better garbage collection algorithms in the future so that they're not constrained by WeakRefs and finalization registry. And so it is very-- you need to be very careful when working with this. One thing I want to close with-- that with this, you can now actually build an iterable WeakMap. And interestingly enough, there's an entire implementation of this in the explainer. That is, I have a link here-- bit.ly/iterable-weak-map if you want to look at that. I'm not going to go through it here because there are a lot of things to track to make sure that everything is garbage collected correctly and so on and so forth. But if you're curious, take a look. The WeakRefs implementation has shipped in current stable Chrome. I have actually merged it into Comlink. So if your browser supports WeakRefs, I will automatically garbage collect stuff across the worker boundary now, which I think is kind of cool. So yeah, try it out. Play around with it, but remain careful too. JAKE ARCHIBALD: Do we have enough time for you to quickly describe how you used it in Comlink? DAS SURMA: Comlink, what it does is you have a value in a worker. And you basically create a proxy on the main thread saying like, you, proxy. Behave like that object in the worker. And then under the hood, I use a message passing protocol to say, this person wants to access property A. Send me the value of property A, please. And then this round trip happens. What I'm doing now is basically have a simple reference counting algorithm. Whenever a proxy is created, I increment that counter. And previously, I had an explicit release function. I still have that, an explicit release function. Like, I don't need this proxy anymore. That decreases the counter. And when the counter reaches zero, that thing can get garbage collected-- on the worker side, that is. 
Now you implicitly call that function when the proxy gets garbage collected. And you can opt in to that. You can opt out of that. JAKE ARCHIBALD: Nice. DAS SURMA: But this-- again, like, this is-- if there is memory pressure, it might be useful for the engine that I release unused objects on the worker side as well. And so I pretty much just use a WeakRef-- finalization registry, sorry-- to get a notification that this proxy has been released. And then I can decrease the count on the worker side and just let go of the message channels once that counter reaches zero. JAKE ARCHIBALD: You should say, this is super advanced stuff, and anyone watching this shouldn't be like, oh, my code doesn't use weak references. I should go and add them in somehow. Like, don't do that. DAS SURMA: Please don't. JAKE ARCHIBALD: Yeah. DAS SURMA: Don't. Please don't. JAKE ARCHIBALD: But it is just enabling these couple of edge cases where you can make extra optimizations. Or like you say, you can avoid a case where you have to call an explicit destructor, like, release function or something like that. DAS SURMA: Yeah. Yeah, for me, obviously the most exciting part is the WebAssembly bit where you now get a deeper integration between the garbage collector and JavaScript and your memory in WebAssembly. So this is the first stepping stone towards getting garbage collected languages to WebAssembly without having to ship your own garbage collector code in WebAssembly. So that's-- JAKE ARCHIBALD: Got you. DAS SURMA: It's not the whole story, but it's the first puzzle piece. And so I'm obviously thinking ahead and quite excited about the prospect of having managed languages be smaller and, I guess, better in WebAssembly. JAKE ARCHIBALD: Cool. Well, we'll link to the V8 article, and-- DAS SURMA: Yes. JAKE ARCHIBALD: --Comlink. We'll show where you're using it in Comlink. DAS SURMA: Why not? JAKE ARCHIBALD: Yeah, that's good, actually. I do now understand it a lot better than I did before. 
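Editor's note: the reference-counting idea Surma describes (an explicit release function, plus an implicit release when a proxy is garbage collected) might look something like the sketch below. This is not Comlink's actual code. `WorkerSideEntry`, `createProxy`, and `proxyRegistry` are hypothetical names, and everything runs in one thread here, with no message passing, purely to show the bookkeeping.

```javascript
// Hypothetical bookkeeping for one object exposed from the worker side.
class WorkerSideEntry {
  constructor() {
    this.count = 0;
    this.closed = false;
  }
  retain() {
    this.count += 1;
  }
  release() {
    if (this.closed) return;
    this.count -= 1;
    // At zero, the real object and its message channel could be let go.
    if (this.count === 0) this.closed = true;
  }
}

// Implicit path: if a proxy is collected without an explicit release,
// the engine may eventually call this. It is a bonus, not a guarantee.
const proxyRegistry = new FinalizationRegistry((entry) => entry.release());

function createProxy(entry) {
  entry.retain();
  let released = false;
  const proxy = {
    // Explicit path: still the only way to release deterministically.
    release() {
      if (released) return;
      released = true;
      proxyRegistry.unregister(proxy); // avoid a second, implicit release
      entry.release();
    },
  };
  // The entry is the held value; the proxy doubles as unregister token.
  proxyRegistry.register(proxy, entry, proxy);
  return proxy;
}

const entry = new WorkerSideEntry();
const first = createProxy(entry);
const second = createProxy(entry);
first.release();
first.release(); // repeated calls are ignored
```

Keeping the explicit `release()` alongside the registry reflects the advice in the discussion: the finalization callback only helps under memory pressure and may never fire, so correctness should not depend on it.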
DAS SURMA: I'm glad. JAKE ARCHIBALD: Well done, Surma. DAS SURMA: Well done, me. JAKE ARCHIBALD: It's super advanced, this stuff. And, like-- DAS SURMA: Yeah. JAKE ARCHIBALD: --anyone watching this-- and anyone watching this shouldn't-- oh, Christ. Jesus, I'm out of practice. We should say, though, that this is super advanced stuff.

Info

Channel: Google Chrome Developers

Views: 21,670

Rating: 4.9179206 out of 5

Keywords: GDS: Yes, javascript, weakref, weakmap, weakset, v8, weak JS, bad JS, bad JavaScript, comlink library, HTTP203, new in tech, JS 101, JS news, HTTP, tech, tech podcasts, tech videos, JS videos, JavaScript videos, new videos from Chrome, Chrome, Chrome Developers, Web, Chrome devs, developers, Google, Jake and Surma

Id: uygxJ8Wxotc

Length: 29min 21sec (1761 seconds)

Published: Tue Sep 01 2020