The main thread is overworked & underpaid (Chrome Dev Summit 2019)

Captions
[MUSIC PLAYING]

SURMA: The main thread is overworked and underpaid. And yet all of us run our code almost exclusively on the main thread. And I'm not wagging my finger at you. This has been the norm. This is the best practice on the web currently. And it makes sense, because if you had the choice between driving your car on the main road or on a side alley, you would drive it on the main road. And so what I'm really saying is the main thread is overworked and underpaid-- and the name is bad, too. So what is this all about?

Well, this whole thing started to become a topic for me when I was researching people coming online for the first time. 50% of the world's population are currently online, which means that 50% are not, or at least not yet. These 50% are now slowly coming online through a vast variety of devices. For example, feature phones-- they have been incredibly popular for a long time in emerging markets like India. They are incredibly cheap to manufacture and, as such, can be sold at a very low price, bringing more people from the world of offline to online. The Jio Phone that you can see here on the very right is running a fork of the old Firefox OS called KaiOS, which is based on Firefox 42. So that isn't a recent version of Firefox by any means, but it is modern enough to browse the current web. And this phone only costs $15, which means the phone is incredibly popular and makes the mobile internet accessible to many more people than before.

These people coming online for the first time with comparatively low-powered phones are sometimes referred to as the Next Billion Users, or NBU. And I know that many of you hear this and might think of emerging markets like India. And that is absolutely not wrong, but it's also not the entire story, because there are also people in America. Now, they might not be coming online for the first time, necessarily, but they do spend their time on devices with very similar performance characteristics, for example, the Nokia 2. The Nokia 2 is a great phone because it is nice looking, it is very cheap, and it runs modern Android. However, the Nokia 2 smartphone is as smart as Iron Man is iron. There is a resemblance from the outside, but it's really made from something much more lightweight. Many Americans have phones like the Nokia 2 because these phones are subsidized. They are available at little or almost no cost for people living below the poverty line, and that is around 16% of Americans. Similar programs exist in other countries in the Western world.

To compare, look at how the iPhones have been climbing the single-core benchmark over the years. They are absolute beasts, and they keep getting faster. The Nokia 2, on the other hand, is down here. It came out in 2018, but it's pretty much on par with the iPhone from 2011. That's seven years ago. It's ancient by technology standards. And yet this ancient hardware runs a modern version of Android with the most recent version of Chrome. So you get all the modern, new APIs, but on old hardware. So you should be looking at the Nokia 2 or a similar phone to see how your web app feels for up to 16% of Americans. Or to phrase it another way, the Nokia 2.1 is probably representative of a 95th-percentile phone for America. If your app runs on this phone, your app will be usable for 95% of Americans. And that's just America. Globally, the percentiles are skewed much more towards the low-end spectrum of phones.
Either way, you should try out this phone and see how your web app feels. The bottom line here is that, even in the wealthy Western world, we need to care about hyperconstrained devices with crappy CPUs, pretty much no GPUs, small screens, and sometimes even no touch. And more precisely, we need to start caring about the people who are constrained to these kinds of devices.

As an exercise in this, we wrote PROXX earlier this year. It's a Minesweeper clone as a PWA, so it has all the PWA goodies, like offline support and nice graphics. It was projected that around 400 million feature phones would be sold in 2019 alone, so we explicitly wanted to include that audience in our target audience, so that they can play on these devices as well. We wanted to see what it takes to make a game run on hyperconstrained devices like this without writing a completely separate version of the game. And a couple of early experiments showed that we were already pushing the performance boundaries of these devices with something as simple as a table with a couple of buttons and some JavaScript to update the table. The main thread was completely overworked.

So what is the main thread? Since the beginning of browsers, websites only ever had one thread. In the early days, the entire browser had just one thread, because you just had one window. If you wanted to surf multiple websites, you would start a completely separate second instance of that browser. Since then, we have at least gotten one thread per tab-- kind of; there are exceptions. But at the same time, the web has evolved from static documents with a couple of styles and images to full-blown, dynamic applications. And everything that is required to make this jump from documents to applications has just been added to this one thread, the main thread, over time.

As a result, the main thread ended up with a lot of responsibilities when loading and running a website. It has to process the events that the user causes by scrolling or interacting with the page, and figure out if there's any JavaScript that needs to run in response to each event. If there is, the browser needs to run that JavaScript, then figure out whether the JavaScript changed any styles and which elements are affected by those changes. Then it needs to do layout to figure out where the elements end up on the page and where the text flows and breaks. Then it needs to paint everything, meaning it colors in the elements' backgrounds and borders and images and text and shadows. And lastly, it needs to composite all these individual elements into the final image that you see on your screen.

Now, to put that into context, we want to ship 60 frames a second, because that's what it takes to make scrolling or animations feel smooth to, well, humans with human psychology. Not hitting that goal is what can make your web app feel low quality or unpolished. Failing to hit a consistent 60 FPS is one of the bigger factors in why web apps feel worse than their native counterparts. So if you want to ship 60 frames a second, the entire system can spend, at most, 16.6 milliseconds to finish each frame, start to end. Most of these tasks are run by the browser, so you really don't have any direct control over their duration. The only thing whose duration you have direct control over is your JavaScript, the amount of code that you run on the phone.
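To make that budget concrete, here is a minimal sketch (my addition, not code from the talk) of how you can watch for blown frame budgets from JavaScript with requestAnimationFrame:

```js
// Minimal sketch: log whenever a frame clearly overshoots the 60 fps budget.
const FRAME_BUDGET_MS = 1000 / 60; // ~16.6 ms per frame at 60 fps

let lastFrameStart = performance.now();
function onFrame(now) {
  const frameDuration = now - lastFrameStart;
  if (frameDuration > FRAME_BUDGET_MS * 1.5) {
    // We missed at least one vsync; something on the main thread ran long.
    console.warn(`Long frame: ${frameDuration.toFixed(1)} ms`);
  }
  lastFrameStart = now;
  requestAnimationFrame(onFrame);
}
requestAnimationFrame(onFrame);
```

Anything that shows up as a long frame here is work that pushed the main thread past its 16.6-millisecond allowance.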
But it's not just the amount of JavaScript; it's also the amount of work that the JavaScript causes. It's really hard to tell how much work a piece of JavaScript will cause, and that's why it's so important to test on real devices. And here you might realize that the device you choose for testing will have a massive impact on the results you get. So while you test on your iPhone or even your laptop, it might look like this, and you feel, oh, that's fine. But then you check on a Moto G4, and it looks like this-- still fine, but definitely less headroom. And then you run your app on, say, a feature phone or a Nokia 2, and suddenly you're way over budget. And again, the budget was 16 milliseconds, because that's how much time you have per frame when you want to fit 60 frames into one second.

But recently, Google brought out the Pixel 4, which has a 90-hertz screen. So on that device, you only have 11 milliseconds per frame. On that note, two years ago Apple released the second generation of the iPad Pro, which has a 120-hertz screen, so there you only have 8 milliseconds. We barely make it through our [INAUDIBLE] styles here. Did I mention that there are desktop screens running at 144 hertz? Yeah, we're in trouble here.

So on the one hand, we have hyperconstrained devices that are not getting faster, but cheaper. And at the same time, wealthy Westerners buying the flagship phones get faster hardware, but also screens that want to ship more frames per second. Both of these developments leave us with less and less time to spend on the main thread for our code. We can't just keep putting code there without thinking about it.

So really what I'm trying to say here is: if we want to follow the RAIL guidelines, we are imposing budgets on ourselves based on how an app feels when a user uses it-- so basically based on human psychology. And that is completely independent of the device that the user holds in their hands. And then we write some code, and we throw it all at the main thread. Every piece of code we run consumes a piece of our budget on the main thread, but how much is dependent on the hardware and, as such, is completely device dependent. So we are setting ourselves up for failure here. We have no control over the environment that our app will run in. The problem is that the main thread is completely unpredictable: what takes two milliseconds on a modern flagship phone might take 20 milliseconds on the next low-end phone. How can we escape this unpredictability?
Looking at native platforms like Android or iOS, they provide threads, and patterns around and on top of threads, and have done so for a very, very long time. Basic threading often looks like this. This snippet would work like this in Java or C#, but most other languages are similar: you just give a thread a function, and now that function will run in parallel to the rest of your program. You can access the same variables from both threads, and to make sure there are no race conditions, you can use [INAUDIBLE] to synchronize access to these shared resources. In terms of higher-level abstractions, iOS, for example, has Grand Central Dispatch, a scheduling service, which allows you to think in tasks. Here's an example of how you use Grand Central Dispatch in Swift. In this case, we want to update a label in our UI with a new text. And to know what goes into the label, let's say we have to hit the database or the network.

So we schedule the loadArticleText function that does this in the background. It runs independently of the main thread and with a lower priority. And once it is done, it will schedule another main-thread task that actually does the assignment to the label, because only the main thread can access UI elements. And this is what I would love to have for the web.

However, JavaScript, as a language, is incapable of providing this kind of threading. JavaScript was designed around the concept of a single thread, and we can't just add threads and shared memory to JavaScript, because it would actually break everything and set it on fire. So instead, we have to isolate the concept in a dedicated type, like SharedArrayBuffer, and provide parallelism through workers. Now, SharedArrayBuffers are fairly new, but workers are actually not new at all. They have been around since roughly 2007 and have had wide support in every browser since 2012. And just to make it clear: web workers are something very different from service workers and worklets. They share some characteristics, but be careful to not conflate them. In the context of this talk, I'm only talking about web workers.

Workers, in case you don't know, are a bit as if I opened the browser twice, but one of them is kind of headless. They're completely isolated, no variables can be shared, and they run in parallel. In terms of code, you create a worker by passing a file to the Worker constructor, and that will basically spin up that isolated, headless second browser. You can still communicate with it by sending messages. The value of the message you want to send is the parameter for the postMessage call. That value will be copied over to the worker, and to receive it, you register a handler for the message event. The value can then be read from the .data property of the event. The worker is, of course, allowed to send a message back to the main thread with the exact same API, and you receive it on the main thread, again, with the exact same message event handler. And that's all you get. That's all you can use when you want to use workers.

Now, that might seem kind of OK. So far, workers have historically only been used for moving a piece of heavy work away from the main thread, and the worker only exists for the duration of that task. For example, in Squoosh, we did exactly this. We spin up a worker, we load our WebAssembly-fied image codecs, we send over a bitmap, WebAssembly does its thing and responds with the encoded image, and then we terminate the worker. We are done with it. It's just a one-off worker for a single task-- a single-purpose worker, if you will.

However, things get unwieldy quite quickly when you want to offer more than one operation in a worker. To get back to our previous example: what if, in addition to addition, we also wanted to offer subtraction? Now we have to encode into the message not only the parameters, but also the operation that the worker is supposed to execute. And that has implications for the complexity of the worker, because now it needs to not only inspect the operation, but also dispatch the parameters to the right piece of code that actually handles it. And what if, while the first operation is being calculated in the worker, the main thread sends another operation? How do we know which response maps to which original request? We now have to do bookkeeping with IDs. And it is not great.
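To make the pain concrete, here is a hypothetical sketch (my addition, not the talk's actual code) of that operation-plus-ID bookkeeping:

```js
// worker.js -- inspect the operation, dispatch to the right handler,
// and echo the id back so the main thread can match the response.
const operations = {
  add: (a, b) => a + b,
  subtract: (a, b) => a - b,
};

addEventListener("message", (event) => {
  const { id, operation, args } = event.data;
  postMessage({ id, result: operations[operation](...args) });
});
```

```js
// main.js -- the bookkeeping with IDs.
const worker = new Worker("worker.js");
let nextId = 0;
const pending = new Map(); // id -> resolve function

worker.addEventListener("message", (event) => {
  const { id, result } = event.data;
  pending.get(id)(result);
  pending.delete(id);
});

function call(operation, ...args) {
  return new Promise((resolve) => {
    const id = nextId++;
    pending.set(id, resolve);
    worker.postMessage({ id, operation, args });
  });
}

call("subtract", 44, 2).then((result) => console.log(result)); // 42
```

Every new operation grows this plumbing, and none of it has anything to do with the actual work.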
If you've ever worked with threads in any other language, coming to JavaScript's workers is going to feel really bad and very complicated. And I think that's one of the main reasons why workers haven't seen a lot of adoption on the web to this day. I actually believe that postMessage has been a bit misunderstood, and it could actually be a strength if you build something around that message-passing pattern. For example, the actor model is a perfect fit here, and Paul Lewis and I talked about this here at CDS last year. You should check that talk out if you're interested. But since we already talked about it last year, I want to talk about a different approach this year, and that is libraries like Comlink.

Comlink is a library that removes the conceptual overhead of communicating with a worker. Its goal is to let you use workers without actually thinking about them. Through some convoluted proxy magic, Comlink allows you to share variables between the worker and the main thread almost like in other programming languages. For example, I can import Comlink into my worker and define a set of functions that I want to expose to the main thread. Then, on the main thread, I can also import Comlink, wrap the worker, and get access to these exposed functions. The api variable here, on the main thread, will behave exactly the same as the one in the worker, except that every function will now not return a value, but a promise for that value. And in combination with async/await, it barely makes a difference syntactically.
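Here is a minimal sketch of that pattern using Comlink's expose and wrap functions (the file names are illustrative, and the import assumes a bundler or import map resolves the comlink package):

```js
// worker.js
import * as Comlink from "comlink";

const api = {
  add: (a, b) => a + b,
  subtract: (a, b) => a - b,
};
Comlink.expose(api); // make these functions callable from the main thread
```

```js
// main.js (loaded as a module script)
import * as Comlink from "comlink";

const worker = new Worker("./worker.js", { type: "module" });
const api = Comlink.wrap(worker);

// Every call returns a promise for the result.
console.log(await api.add(40, 2)); // 42
```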
And this is exactly what we used in PROXX to move parts of our game into a worker. But now the next question is: which parts did we actually move? Because one of the limitations that many people point out is that workers do not have access to the DOM. And actually, they can't access a whole bunch of APIs. So depending on whether your app relies on some of these APIs, you might not be able to move most of your app into a worker. So really, the title of this talk should be "The main thread is overworked and underpaid-- the name is bad, too-- and yet, sometimes, you don't even have a choice." At that point, you have to chunk the code that's running on the main thread, with APIs like isInputPending that Eddie and I talked about. But again, it's really hard to know how small the chunks should be, because devices are so widespread in their performance characteristics. And just because we cannot move everything doesn't mean we should abandon the entire effort altogether. Every small piece of code that we can move buys a little more headroom for the stuff that we have to run on the main thread, like access to the DOM, because the DOM is not available in a worker and is therefore bound to the main thread. And I'm pretty sure all our apps have UIs.

Again, this is not an alien concept. Both Android and iOS do not let you access your UI from anywhere but the main thread. Let's go back to that Swift example from earlier, where we hop back to the main thread just to change the text of a label. If we were to skip that step, the app would crash. They actively enforce it. You cannot access your UI from anywhere but the main thread, which is actually why both iOS and Android often call their main thread the UI thread. And I find that really helpful, because it tells you what should be there and what should not be there.

One of the struggles I often see is that current UI frameworks on the web are the center of your universe. They are the entry point to your code, and they are the overall orchestrator of everything. Anything else that you want to use in your app ends up being a component within that UI framework. And again, it's not something we can blame UI frameworks for. This is how it's been on the web since its inception; it has been a best practice, or even the only choice. UI frameworks think in UI components and are inherently tied to the UI and the DOM. And as a result, workers are not very useful from a UI framework's perspective. I think we can move forward here by separating these concerns. We should try to use the UI thread for UI work only. UI frameworks do UI work; they are allowed on the UI thread. They belong there. But many other things can actually go somewhere else.

And that's the mantra we followed for PROXX. We distinguished between visual state and game state. Or to categorize it another way: the main thread runs our two rendering engines. Yes, we have two rendering engines-- one using WebGL and one using Canvas 2D, because not all phones actually have WebGL, like these feature phones. This code handles state for animations and transitions and small things, where we want to be really snappy in response, so the code is small and really, really fast. The worker runs the game logic, which is purely computational. This code is longer and can actually run longer, or even in a blocking fashion. So note there are two kinds of state: UI state and app state. The separation has proved quite useful to us, but it's somewhat of a change in mindset.

Now, this might sound familiar to some of you, because what we're doing here is pretty much something very similar to the Flux pattern, as in Flux or Redux. And I found this realization really interesting, because to me it means that many apps that use Flux or Redux might actually have a pretty easy time migrating to an off-the-main-thread architecture. In case you don't know it, this is what the Flux architecture looks like. It is implied that only the view is supposed to do the UI work, and as such, it should run on the main thread. The view emits actions; the actions are received by a dispatcher; and the dispatcher then kicks off the functions that manipulate the state according to the action. The new state gets stored in, well, the store. And this is the important part: all of that can run in an off-main-thread environment or, more specifically, in a worker. Now, no matter how much processing the dispatcher has to do, it does not lock the main thread. It could even run a while(true) loop; the UI would stay responsive, and the user could keep interacting. So if you use Redux or any other form of the Flux pattern, I wrote a blog post on how to run Redux in a worker, which might be of interest to you.
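Here is a rough sketch of that idea (my own illustration, not the code from the blog post): the dispatcher, reducer, and store live in a worker, while the main thread only renders and emits actions.

```js
// store.worker.js -- dispatcher, reducer, and store live off the main thread.
let state = { revealed: [], gameOver: false };

function reduce(state, action) {
  switch (action.type) {
    case "REVEAL_FIELD":
      // Arbitrarily heavy game logic is fine here; it cannot block the UI.
      return { ...state, revealed: [...state.revealed, action.field] };
    default:
      return state;
  }
}

addEventListener("message", (event) => {
  state = reduce(state, event.data); // event.data is an action
  postMessage(state); // ship the new state back to the view
});
```

```js
// main.js -- the view: emits actions, re-renders when new state arrives.
// render() is an assumed function that updates the DOM from the state.
const store = new Worker("store.worker.js");
store.addEventListener("message", (event) => render(event.data));

document.addEventListener("click", (event) => {
  store.postMessage({ type: "REVEAL_FIELD", field: event.target.id });
});
```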
Now, if you want to adopt an off-main-thread architecture, I want you to be very aware that it will not make your app faster. It will make it more reliable. We are really just moving the same amount of work to a different thread; the overall amount of work stays the same. If anything, it might actually get a tiny bit slower because of the additional communication overhead between the worker and the main thread.

The difference is that, while the worker is busy running whatever logic you have, the main thread stays free and available to process user interactions, do scrolling, and do all these kinds of things while JavaScript is running. It's often better to make the user wait a little bit longer than to drop a frame: the time to drop a frame is on the order of milliseconds, while the time to make a user wait is on the order of hundreds of milliseconds. So taking slightly longer to process a state change is less risky than squeezing more work into the next frame, especially when the amount of time your code will take on the main thread is so unpredictable.

Now, of course, off-main-thread architecture can make your app faster, because phones have multiple cores, and with workers we can make use of all these cores in parallel. So if your app's logic is parallelizable, you should go ahead and reap the benefits. However, do keep in mind that on phones, often only one or two cores are actually fast, and all the other cores are a lot slower, so it's hard to estimate what the benefits are going to be. For this talk, I want to focus on risk reduction, because I think that's really the key word. I see off-main-thread as a means to reduce risk and make your app more robust in the face of adverse runtime conditions. For me, it's not about parallelism, and it's not about improving my microbenchmarks.

And in PROXX, we actually have a pretty extreme example of this. Let's look at it. Here we have a version of PROXX where everything runs on the main thread. No workers are in use. The timer starts when a user taps one of the fields on the screen. Ready? Go. The game engine is now figuring out what needs to happen, which fields need to get revealed. And during that time, the UI is frozen: no animations, no scrolling, for six seconds in total. That's pretty bad. Now let's compare this to the game running on the same hardware, but with our off-main-thread architecture with workers. Ready? Go. We see an animation; we see that the game engine is working. And during all this time, the UI is responsive. The user can scroll and tap and keep playing. Basically, the user is getting feedback that something is happening-- a very basic UX rule.

And here's why I say it's an extreme example: the game takes almost twice as long to reach the same state. Now, that sounds pretty bad, doesn't it? But the question is, is this really the number we should be looking at here? Is the question how quickly we can get this work done, or how we can make the game feel better? Let's measure how long it takes for the game to give the user a visual response. On the version without workers, we had to wait six seconds for the task to finish. And since it was on the main thread, the main thread was completely blocked, so six seconds in is the first time we can actually show any change. And that's exactly what the number shows. When we use workers, we keep the main thread free, and we can use that freedom to update the UI while the game logic is running. So the first update actually happens seven frames after we tapped, which is roughly 100 milliseconds, and so perfectly in line with our RAIL budget. So is "it takes twice as long" a big deal? Yeah, it's a big deal, but it's also a very conscious trade-off.
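As a side note, here is one hypothetical way (my addition) to measure "time to first visual feedback" like this in your own app:

```js
// Stamp the tap, then read the clock in the rAF callback that runs
// right before the first paint containing our UI update.
let tapTime = 0;

document.addEventListener("pointerdown", () => {
  tapTime = performance.now();
});

// Call this right after kicking off the visual response to the tap
// (for example, right after starting the reveal animation).
function measureFirstFeedback() {
  requestAnimationFrame(() => {
    const delta = performance.now() - tapTime;
    console.log(`First visual feedback after ${delta.toFixed(0)} ms`);
  });
}
```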
And it's also important to note that the slowdown is not because we're using workers. It is slower because we are using the freedom to ship more frames than the other version, which shipped no frames at all. And shipping frames on these low-end devices is very expensive. It looks very different if you run the exact same code on a modern piece of hardware. Ready? Go. That's it. It's pretty fast. So we can give the users of hyperconstrained devices a better experience without penalizing the experience of flagship-phone users, all with the same code base. To simplify: slower does not always mean worse.

As an anecdote-- you might have heard this one before-- Houston airport got a lot of complaints that people were spending too much time waiting for their luggage at the belts. They could have spent time optimizing how quickly they got the luggage from the plane onto the belt. But instead, they made the walk from the plane to the belt longer. The customers were kept busy with walking and technically spent less time waiting. They were happier, and the airport got fewer complaints. And this is kind of what we're doing here: we made a task slower to gain more freedom, and we used that freedom to keep our users busy with a nice animation and the ability to continue playing the game. So if you find yourself doing microbenchmarks, keep in mind that there might be not-so-obvious trade-offs, and that the numbers game is not always the best game.

Something I've glossed over a little bit so far in this talk is that the value you send from a worker to the main thread needs to be copied, and that process is called structured cloning. It has been a source of worry for many people evaluating workers over time. So I ran a benchmark. You don't need to look at this graph too much; I just put it in here so that I look legit when I talk about numbers. What I tried to show here is that the time it takes to copy a value from one thread to another depends on how complex the object is. A deeply nested, big object will take longer than a long-ish string or a simple array. And it turned out that there is actually a simple rule of thumb: the amount of time it takes to structured-clone an object is roughly proportional to the length of its JSON representation.

Now, keep in mind that the number you get is very much device-specific, so I ran some more benchmarks to establish a lower bound, and I want to look at the results I got on the Nokia 2. The TL;DR of this graph is that even on the Nokia 2, if your raw JSON is under 10 kilobytes, you don't have to worry about bursting through any of your RAIL budgets. Now, 10 kilobytes might not be enough for every app, but you can actually do quite a lot with 10 kilobytes. If, however, you are running into problems with this postMessage pattern, you can look into alternatives like transferring ArrayBuffers, using SharedArrayBuffers and Atomics, or even WebAssembly. I can't fit all of that into the 30 minutes that I have, so if your interest is piqued, I also have a more detailed blog post where I explain the graphs and the methodology, as well as all of these alternative techniques that might help you address performance problems if you actually encounter them.
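That rule of thumb translates into a simple guard. This is a hypothetical sketch (my addition) that approximates the structured-clone cost via JSON length and shows the transferable alternative mentioned above:

```js
// Rough heuristic: structured-clone time is roughly proportional to the
// length of the payload's JSON representation. ~10 KB was safe even on
// the Nokia 2 in the benchmarks mentioned above.
const BUDGET_CHARS = 10 * 1024;

function send(worker, payload) {
  const approxSize = JSON.stringify(payload).length;
  if (approxSize > BUDGET_CHARS) {
    console.warn(`Payload is ~${approxSize} chars of JSON; consider transferables.`);
  }
  worker.postMessage(payload);
}

// Alternative: transfer an ArrayBuffer instead of copying it. The buffer
// moves between threads without a structured clone, but becomes unusable
// on the sending side afterwards.
function sendBuffer(worker, buffer /* ArrayBuffer */) {
  worker.postMessage(buffer, [buffer]);
}
```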
So if you want to experiment with workers now, you might be wondering what the tooling situation is. Because workers are not really mainstream, they have been overlooked by most of the tooling that we have today. Webpack and Rollup, for example, don't support workers out of the box. To change that, my colleague Jason wrote a plug-in that teaches Webpack about workers, and I wrote a plug-in that teaches Rollup about workers. And I want to give a big shout-out to the Parcel people, because they actually made workers work out of the box-- so thumbs-up to them.

And this was basically my off-main-thread-on-the-web speedrun for you. There is much more to talk about; there is much more nuance in this topic, and it's very exploratory at this point. So I just want to leave you with this. We are experiencing death by 1,000 cuts. Our problem is not really that any specific UI library is slow or that painting takes too long in one browser. It's the accumulation of all of these tasks, the fact that we run everything on the UI thread. Support hyperconstrained devices: it is a matter of inclusivity. We need to look beyond the tech bubble that we often live in and experience the web at the 50th percentile and, probably even better, the 75th percentile. Some of our current pages won't even load over 3G on a feature phone, let alone run in a usable way. By embracing off-main-thread architecture, you are moving execution costs to a different thread, but you actually also move parsing costs. In turn, it might mean that your UI thread now boots up faster, giving you a better Time to First Contentful Paint or maybe even a better Time to Interactive. And so, in turn, you could increase your Lighthouse scores-- just saying. And lastly, web workers seem very complicated and crazy and scary, but they can be enjoyable, either by embracing the communication model or through libraries like Comlink. And with this, I think we can actually take a big step for the web development ecosystem, making our web apps more reliable, but also more usable, for everyone. Thank you.

[APPLAUSE]

[MUSIC PLAYING]
Info
Channel: Google Chrome Developers
Views: 101,401
Rating: 4.8967104 out of 5
Id: 7Rrv9qFMWNM
Length: 30min 6sec (1806 seconds)
Published: Mon Nov 11 2019