[MUSIC PLAYING] SURMA: The main thread is
overworked and underpaid. And yet all of us run our
code almost exclusively on the main thread. And I'm not wagging
my finger at you. This has been the norm. This is the best practice
on the web currently. And it makes sense
because, if you had the choice between driving
your car on the main road or on a side alley, you would
drive it on the main road. And so what I'm really
saying is the main thread is overworked and underpaid-- and the name is bad, too-- really? So what is this all about? Well, this whole thing started
to become a topic for me when I was researching
people coming online for the first time. 50% of the world's
population are currently online, which means that 50%
are not, or at least, not yet. These 50% are now slowly coming
online through a vast variety of devices. For example, feature
phones-- they have been incredibly popular for
a long time in emerging markets like India. They are incredibly
cheap to manufacture and, as such, they can be
sold for a very low price tag, bringing more people from the
world of offline to online. So the Jio Phone that you can
see here on the very right is running a fork of the
old Firefox OS called KaiOS, which is based on Firefox 42. So that isn't a recent version
of Firefox by any means. But it is modern enough
to browse the current web. And this phone only
costs $15, which means the phone is
incredibly popular and makes the mobile internet
accessible to many more people than before. And these people coming
online for the first time with comparatively
low-powered phones are sometimes referred to as
the Next Billion Users, or NBU. And I know that many of you
hear this and might think of emerging markets like India. And that is
absolutely not wrong. But it's also not
the entire story, because there are also
people in America. Now, they might not be coming
online for the first time, necessarily. But they do spend
their time on devices with very similar
performance characteristics, for example, the Nokia 2. The Nokia 2 is a great phone
because it is nice looking, it is very cheap, and
it runs modern Android. However, the Nokia 2
smartphone is as smart as Iron Man is iron. There is a resemblance
from the outside, but it's really made
from something much more lightweight. So now these Americans that
have phones like the Nokia 2 have these phones because
they are subsidized. These phones are
available at little or almost no cost for people
living below the poverty line. And that is around
16% of Americans. And similar programs
exist in other countries in the Western world. So to compare, look
at how the iPhones have been climbing the
single-core benchmark over the years. They are absolute
beasts, and they continue to keep getting faster. The Nokia 2, on the
other hand, is down here. It came out in 2018, but
it's pretty much on par with the iPhone from 2011. That's seven years ago. It's ancient by
technology standards. And yet, this ancient
hardware runs a modern version of Android with the most
recent version of Chrome. So you get all the modern,
new APIs, but on old hardware. So you should be looking at
the Nokia 2 or a similar phone to see how your web app feels
for up to 16% of Americans. Or to phrase it another
way, the Nokia 2.1 is probably representative
as a 95th percentile phone for America. If your app runs on
this phone, your app will be usable for
95% of Americans. And that's just America. Globally, the percentiles
are skewed much more towards the low-end
spectrum of phones. Either way, you should
try out this phone and see how your web app feels. The bottom line
here is that, even in the wealthy
Western world, we need to care about
hyperconstrained devices with crappy CPUs, pretty
much no GPUs, small screens, and sometimes even no touch. And more precisely, we
need to start caring about people who are constrained
to these kinds of devices. And as an exercise in this, we
wrote PROXX earlier this year. It's a minesweeper
clone as a PWA. So it has all the PWA goodies,
like offline and nice graphics. And it was projected that around
400 million phones, feature phones, would be
sold in 2019 alone. So we explicitly wanted
to include that audience in our target audience so that
they can play on these devices as well. So we wanted to
see what it takes to make a game run
on devices like this, these hyperconstrained devices,
without writing a completely separate version of the game. And a couple of
early experiments showed that we are really
pushing the performance boundaries of these devices
by something as simple as a table with a couple of
buttons and some JavaScript to update the table. The main thread was
completely overworked. So what is the main thread? Since the beginning of
browsers, websites only ever had one thread. In the early days,
the entire browser just had one thread because
you just had one window. If you wanted to surf
multiple websites, you would start a completely
separate second instance of that browser. Since then, we have at least
gotten one thread per tab, kind of; there are exceptions. But at the same
time, the web has evolved from static
documents with a couple of styles and images to
new, full-blown, dynamic applications. And everything that is required
to make this jump from docs to applications has just been
added to this one thread, to the main thread, over time. And as a result, the
main thread ended up with a lot of responsibilities
when loading and running a website. So it has to process the
events that the user causes by scrolling or
interacting with it and figuring out if there's
any JavaScript that needs to be run in response to this event. If there is, the browser needs
to run that JavaScript, then figure out if the
JavaScript changed the styles and which
elements are affected by the changes in these styles. Then it needs to do
layout to figure out where the elements end
up on this page and where the text
flows and breaks. And then it needs
to paint everything, meaning it needs to color
in the elements' backgrounds and the borders and the images
and the text and the shadows. And lastly, it needs
to finally composite all these individual
elements into the final image that you see on your screen. Now, to put that
into context, we want to ship 60 frames
a second because that's what it takes to make
scrolling or animations feel smooth to, well, humans
with human psychology. Not hitting that goal is
what can make your web app feel like low quality
or unpolished. Failing to hit a
consistent 60 FPS is one of the bigger factors
in why web apps feel worse than their comparable
native app. So if you want to ship
60 frames a second, the entire system
can spend, at most, 16.6 milliseconds to finish
each frame, start to end. Most of these tasks
are run by the browser. And so you really don't
have any direct control over the duration. The only thing where
you have direct control over the duration is your
JavaScript, the amount of code that you run on the phone. But it's not just that,
not just the JavaScript, but also the amount of work
that the JavaScript causes. So it's really hard to tell how
much work a piece of JavaScript will cost. And that's why it's so important
to test on real devices. And here you might
realize the device that you choose for
testing will have a massive impact on the results
that you get in your testing. So while you test on your
iPhone or even your laptop, it might look like this. And you feel, oh, that's fine. But then you check
on a Moto G4, and it looks like this, which is
still fine, but definitely less headroom. And then you run your app on,
say, a feature phone or a Nokia 2, and suddenly, you're
way over your budget. And again, the budget
was 16 milliseconds because that's how
much time you have when you want to fit 60
frames into one second. But recently, Google
brought out the Pixel 4, which has a 90 hertz screen. So on that device, you only
have 11 milliseconds per frame. On that note, two
years ago, Apple published the second
generation of the iPad Pro, which has a 120 hertz screen. So that means that, yeah,
you only have 8 milliseconds. We barely make it through
our [INAUDIBLE] styles here. Did I mention that there are
desktop screens with 144 hertz? Yeah, we're in trouble here. So on the one hand, we
have hyperconstrained devices that are not
getting faster, but cheaper. And at the same time,
also wealthy Westerners getting the flagship phones
get faster hardware, but also screens that want to ship
more frames per second. So both of these
developments leave us with less and less time to
spend on the main thread for our code. We can't just keep
putting code there without thinking about it. So really what I'm
trying to say here is, if we want to follow
the RAIL guidelines, we are imposing
budgets on ourselves based on how an app feels when
a user uses it-- so basically based on human psychology. And that is completely
independent of the device that the user holds
in their hands. And then we write some
code, and we throw it all at the main thread. And every piece of code we run
consumes a piece of our budget from the main thread. But how much is actually
dependent on the hardware and, as such, is completely
device dependent. So we are setting ourselves
up for failure here. We have no control
over the environment that our app will run in. So the problem
is that the main thread is completely unpredictable. What takes two milliseconds
on a modern flagship phone might take 20 milliseconds
on the next low-end phone. How can we escape
this unpredictability? Looking at native platforms
like Android or iOS, they provide
threads and patterns around and on top of
threads and have done so for a very, very long time. Basic threading often
looks like this. This is a snippet. It would work like
this in Java or C#. But most other
languages are similar where you just give
a thread a function, and now that function
will run in parallel to the rest of your program. You can access the same
variables from both threads. And to make sure there
are no race conditions, you can use [INAUDIBLE]
to synchronize access to these shared resources. In terms of higher level
abstractions, iOS, for example, has Grand Central Dispatch,
a scheduling service, which allows you to think in tasks. Here's an example from Swift
and how you use Grand Central Dispatch in Swift. In this case, you want to
update a label in our UI with a new text. And to know what
goes into the label, let's say we have to hit
the database or the network. So we schedule the
loadArticleText function that does this in the background. So it runs independent
of the main thread and with a lower priority. And once it is done, it will
schedule another main thread task that actually does the
assignment to the label because only the main thread
can access UI elements. And this is what I would
love to have for the web. However, JavaScript,
as a language, is incapable of providing
this kind of thread. JavaScript was designed around
the concept of a single thread. And we can't just add
threads and shared memory to JavaScript, because it
would actually break everything and set it on fire. So instead, we have to isolate
the concept to a dedicated type, like SharedArrayBuffer,
and provide parallelism through workers. Now, SharedArrayBuffers
are fairly new, but workers are
actually not new at all. They have been around
since roughly 2007 and had wide support in
every browser since 2012. And just to make it
clear, web workers are something very different
from service workers and worklets. They share some
characteristics, but be careful to not conflate them. In the context of this talk, I'm
only talking about web workers. So workers, in case
you don't know, are a bit as if I
opened the browser twice, but one of them is
kind of headless. So they're completely isolated,
no variables can be shared, and they run in parallel. Now, in terms of code,
you create a worker by passing a file to
the worker constructor. And that will basically
spin up the isolated worker that you can-- the second browser without-- the headless version. You can still communicate
with it by sending messages. And the value of the
message you want to send is the parameter for
this postMessage call. That value will now be
copied to the worker. And to receive it, you
must register a handler for the message event. The value can then be
read on the .data property of the event. And the worker is,
of course, allowed to send a message back
to the main thread with the exact same API. And you receive it on
the main thread, again, with the exact same
message event handler. And that's all you got. That's all you can use when
you want to use workers. Now, that might seem kind of OK. So far, workers have
historically only been used for moving
a piece of heavy work away from the main thread. And the worker only
exists for the duration of that work, for
example, in Squoosh. We did exactly this. We spin up a worker. We load our WebAssembly-fied
image codecs. We send over a bitmap. WebAssembly does
its thing, and it responds with the
encoded image, and then we terminate the worker. We are done with it. It's just a one-off worker for
a single task, a single purpose worker, if you will. However, things will
get unwieldy quite quickly when you want to
offer more than one operation in a worker. To get back to our previous
example, what if, in addition to addition, we also
wanted to add subtraction? Now we have to not only
encode the parameters into the message, but
also the operation that the worker is
supposed to execute. And that has implications for
the complexity of the worker because now we need to not
only introspect the operation, but also dispatch the parameters
to the right piece of code that actually does that. And now what if, while
the first operation is being calculated in the
worker, the main thread sends another operation? How do we know which response
maps to which original request? We have to now do
bookkeeping with IDs. And it is not great. If you've ever worked with
threads in any other language, coming to JavaScript
workers is going to feel really bad
and feel very complicated. And I think that's one
of the main reasons why workers haven't seen a
lot of adoption on the web to this day. I actually believe
that postMessage has been a bit misunderstood. And it could actually
be a strength if you build something around
that message-passing pattern. For example, the actor
model is a perfect fit here. And Paul Lewis and I
talked about this here, at CDS, last year. And you should check that
talk out if you're interested. But since we already
talked about it last year, I want to talk about a
different approach this year. And this is with
libraries like Comlink. Now Comlink is a library that
removes the conceptual overhead of communication with a worker. Its goal is to let you use
workers without actually thinking about them. So through some
convoluted proxy magic, Comlink allows you
to share variables between the worker
and the main thread almost like the
normal programming languages that are out there. So for example, I can import
Comlink into my worker, and I can define
a set of functions that I want to expose
to the main thread. And then, on the main
thread, I can also import Comlink and
wrap the worker and get access to these
exposed functions. The API variable here,
on the main thread, will behave exactly the same
as the one in the worker, except that every function
will now not return a value, but a promise for that value. And in combination
with async await, it barely makes a difference
syntactically, though. And so this is exactly
what we used in PROXX to move parts of our
game into a worker. But now the next
question is, which parts did we actually move? Because one of the limitations
that many people point out is that workers do not
have access to the DOM. And actually, they can't
access a whole bunch of APIs. So depending on whether
your app relies on access to some of these
APIs, you might not be able to run most of
your app in a worker. So really, the
title of this talk should be "The main thread
is overworked and underpaid-- the name is bad, too,
and yet, sometimes, you don't even have a choice." At that point, you
have to chunk your code that's running on the main
thread to make sure it stays responsive, with APIs like isInputPending that
Eddie and I talked about. But again, it's
really hard to know how small the chunks
should be because devices are so widespread in
their performance metrics. And also just because we cannot
move everything doesn't mean we should abandon the
entire effort altogether. Every small piece of
code that we can move buys a little bit more headroom
to make room for the stuff that we have to run on the main
thread, like access to the DOM because the DOM is not
available in a worker and, therefore, it's
bound to the main thread. And I'm pretty sure
all our apps have UIs. And again, this is
not an alien concept. Both Android and
iOS do not let you access your UI from anywhere
but the main thread. So let's go back to that Swift
example that we had earlier. We go back to
the main thread just to change the text of a label.
If we were to skip that step, the app would crash. They actively enforce it. You cannot do that. You cannot access your UI from
anywhere but the main thread, which is actually why both
iOS and Android often call their main thread the UI thread. And I find that really
helpful because it tells you what should be there and
what should not be there. One of the struggles
I often see is that current UI
frameworks on the web are the center of your universe. They are the entry
point to your code. And they are the overall
orchestrator of everything. Anything else that you
want to use in your app ends up being a component
within that UI framework. And again, it's not
something that we can blame UI frameworks for. This is how it's been on
the web since its inception. That has been a best practice
or even the only choice. UI frameworks think
in UI components and are inherently tied
to the UI and the DOM. And as a result, workers are
not very useful from a UI framework's perspective. And I think we can
move forward here by separating these concerns. I think we should try to use
the UI thread for UI work only. And UI frameworks do UI work. They are allowed on the UI. They belong there. But many other things can
actually go somewhere else. And that's the mantra that
we've followed for PROXX. We actually distinguished
between a visual state and game state. Or to categorize
it another way, we had the main thread, which
runs our two rendering engines. Yes, we have two rendering
engines-- one using WebGL and one using Canvas 2D,
because not all phones actually have WebGL like
these feature phones. And this code handles state
for animations and transitions and small things. So we want it to be really
snappy and responsive, and the code is small
and really, really fast. The worker runs the game logic,
and it's purely computational. This code is longer and can
actually run longer or even in a blocking fashion. So note there are two kinds
of state, UI state and the app state. The separation has proved
quite useful to us, but it's somewhat of
a change in a mindset. Now, this might sound
familiar to some of you because what we're doing here is
pretty much use something that is very similar to the Flux
pattern, as in Flux Redux. And I found this realization
really interesting because, to me, it means that
many apps that use Flux or Flux Redux might actually have a
pretty easy time to migrate to an off-the-main-thread
architecture. In case you don't know it, this
is what the Flux architecture looks like. It is implied that only the view
is supposed to do the UI work. And as such, it should
run on the main thread. Then the UI emits actions. The actions are received
by a dispatcher. And the dispatcher
then kicks off the functions that manipulate
the state according to the action. And the new state gets
stored in, well, the store. And this is the important part. All of that can run in an
off-main-thread environment or, more specifically,
in a worker. Now, no matter how much
processing the dispatcher has to do, it does not
lock the main thread. It could even run
a while(true) loop. The UI would stay responsive. The user can keep interacting. So if you use Redux or any
other form of Flux pattern, I wrote a blog post on how to
put Redux in a worker, which might be of interest to you. Now, if you want to adopt an
off-main-thread architecture, I want you to be very aware that
off-main-thread architecture will not make your app faster. It will make it more reliable. Because we are really just
moving the same amount of work to a different thread. The overall amount of
work stays the same. If anything, it
might actually get a tiny bit slower because of
the additional communication overhead between the
worker and the main thread. The difference is that,
while the worker is busy running whatever logic
you have, the main thread stays free and available to
process user interactions and do scrolling and do
all these kind of things while JavaScript is running. It's often better to make the
user wait a little bit longer than to drop a frame. The time to drop a frame is
on the order of milliseconds. The time to make a user
wait is on the order of hundreds of milliseconds. And so adding a thread hop
to process a state change is less risky than squeezing
more work into the next frame, especially when
the amount of time that you cause on the main
thread is so unpredictable. Now, of course, off-main-thread
can make your app faster because phones
have multiple cores. And with workers,
we can make use of all these cores in parallel. So if your app's logic
is parallelizable, you should go ahead
and reap the benefits. However, do keep in
mind that, on phones, it's often only one or two
cores that are actually fast, and all the other
cores are a lot slower. So it's actually
hard to estimate what the benefits are going to be. And for this talk, I want
to focus on risk reduction because I think that's
really the key word. I see off-main-thread as
a means to reduce risk, make your app more robust in
the face of adverse runtime conditions. It's not about
parallelization for me. It's not about improving
my microbenchmarks. And in PROXX, we actually have a
pretty extreme example of this. Let's look at this. Here we have a version
of PROXX where everything runs on the main thread. No workers are in use. And the timer starts when a
user taps one of the fields on the screen. You ready? Go. The game engine is now figuring
out what needs to happen. Which fields need
to get revealed? And during that time,
the UI is frozen-- no animations, no scrolling
for six seconds in total. That's pretty bad. Now let's compare
this to the game running on the same hardware
but with our off-main-thread architecture with workers. Ready? Go. We see an animation. We see, actually, that the
game engine is working. And during all this time,
the UI is responsive. The user can scroll and
tap and keep playing. Basically, the user
is getting feedback that something is happening,
a very basic UX rule. And here's why I say
it's an extreme example. The game takes almost twice as
long to reach the same state. Now, that sounds
pretty bad, doesn't it? But the question is, is this
really the number that we should be looking at here? Is the question how quickly
can we get this work done, or how can we make
the game feel better? Let's measure
how long it takes for the game to give the
user a visual response. On the version
without workers, we saw we had to wait for six
seconds for the task to finish. And since it was
on the main thread, the main thread was
completely blocked. So after six seconds
is the first time that we actually can
cause any change. And that's exactly
what the number shows. When we use workers, we
keep the main thread free. And we can use that freedom to
update the UI while the game logic is running. So the first update actually
happens seven frames after we tapped, which is
roughly 100 milliseconds, and so perfectly in line
with our RAIL budget. So the question is, is
this "it takes twice as long" a big deal? Yeah, it's a big
deal, but it's also a very conscious trade-off. And it's also important to
note the slowdown is not because we're using workers. It is slower because we
are using the freedom to ship more frames than
the other version, which shipped no frames at all. And shipping frames on
these low-end devices is very expensive. And it looks very different
if you run the exact same code on a modern piece of hardware. Ready? Go. That's it. You can see it. It's pretty fast. So we can give the users
of hyperconstrained devices a better experience
without penalizing the experience of flagship phone
users with the same code base. To simplify, really, slower
does not always mean worse. As an anecdote-- you might
have heard this one before-- Houston airport actually
got a lot of complaints that people were spending
too much time waiting for their luggage at the belts. Customers were complaining. And so they could
have spent time optimizing how
quickly they could get the luggage from
the plane onto the belt. But instead, they made the
way longer for the customers to walk from the
plane to the belt. So the customers were being
kept busy with walking and spent, technically,
less time waiting. They were happier, and
they got fewer complaints. And this is kind of
what we're doing here. We made a task slower
to have more freedom and to use that freedom
to keep our users busy with a nice animation
and actually with the ability to continue playing the game. So if you find yourself
doing microbenchmarks, keep in mind that there might
be not so obvious trade-offs, and that the numbers game
is not always the best game. Something I've glossed over so
far a little bit in this talk is that the value that you send
from a worker to a main thread needs to be copied. And that process is
called structured cloning. It has been a source of worry
for many people evaluating workers over time. So I ran a benchmark. You don't need to look
at this graph too much. I just put it in here
so that I look legit when I talk about numbers. But what I try to prove
here is that the time it takes to copy a value
from one thread to another is dependent on how
complex the object is. A deeply nested,
big object will take longer than a long-ish
string or a simple array. And it turned out
that there is actually a simple rule of thumb. The amount of time
it takes to structure clone an object is roughly
proportional to the length of its JSON representation. Now, keep in mind that
the number that you get is very much device-specific. And so I ran some
more benchmarks to establish a lower bound. So I want to look at the results
that I got on the Nokia 2. The TLDR of this graph
is that even on the Nokia 2, if your raw JSON
is under 10 kilobytes, you don't have to worry
about bursting through any of your RAIL budgets. This might not be enough. The 10 kilobytes might not
be enough for every app, but you can actually do quite
a lot with 10 kilobytes. If, however, you are
running into problems with this postMessage
pattern, you can look into alternatives
like transferring ArrayBuffers, using SharedArrayBuffers
and Atomics, or even look into WebAssembly. But I can't fit all of that
into the 30 minutes that I have. So if your interest
is piqued, I also have a more detailed
blog post on this, where I explain the graphs
and the methodology, but also these alternative
techniques that might help you address the performance problems
if you actually encounter them. So if you want to
experiment with workers now, you might be wondering what
the tooling situation is. Because workers are
not really mainstream. They have been overlooked
by most of the tooling that we have today. Webpack and Rollup, for example,
don't support workers out of the box. To change that,
my colleague Jason wrote a plug-in that teaches
Webpack about workers. I wrote a plug-in that
teaches Rollup about workers. I want to give a big shout
out to the Parcel people, because they actually made
workers work out of the box, so thumbs-up to them. And this was basically my
off-main-thread on-the-web speed run for you. There is much more
to talk about. There is much more
nuance in this topic, and it's very explorative
at this point. So I just want to
leave you with this. We are experiencing
death by 1,000 cuts. Our problem is not really that
any specific UI library is slow or that painting takes
too long in one browser. It's the accumulation
of all of these tasks, that we run everything
on the UI thread. Support hyperconstrained
devices. It is a matter of inclusivity. We need to look beyond the tech
bubble that we often live in and experience the web as the
50th percentile and probably, even better, the
75th percentile. Some of our current
pages won't even load over 3G on a feature phone
let alone run in a usable way. By embracing an
off-main-thread architecture, you are moving execution
costs to a different thread. But you actually also
move parsing cost. So in turn, it might mean
that your UI thread is now booting up faster, giving you
a better time to first contentful paint or maybe even
a better time to interactive. And so, in turn, you could
increase your Lighthouse scores-- just saying. And lastly, web workers
seem very complicated and crazy and scary, but
they can be enjoyable, either by embracing
the communication model or through libraries
like Comlink. And with this, I
think, we can actually take a big step for the
web development ecosystem to make our web apps more
reliable, but also more usable for everyone. Thank you. [APPLAUSE] [MUSIC PLAYING]
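As a rough illustration of the frame budgets mentioned in the talk (60, 90, 120 and 144 hertz), the arithmetic can be sketched as below. The function names are hypothetical and only serve this example. The talk rounds 1000 / 60 down to 16.6 milliseconds, while the code keeps the exact value.

```typescript
// Time available per frame, in milliseconds, at a given refresh rate.
function frameBudgetMs(refreshRateHz: number): number {
  if (!(refreshRateHz > 0)) {
    throw new RangeError("refresh rate must be a positive number");
  }
  return 1000 / refreshRateHz;
}

// Budget left over after running `scriptMs` of JavaScript in one frame.
// A negative result means the frame would be dropped.
function remainingBudgetMs(refreshRateHz: number, scriptMs: number): number {
  return frameBudgetMs(refreshRateHz) - scriptMs;
}

// Refresh rates mentioned in the talk.
for (const hz of [60, 90, 120, 144]) {
  console.log(`${hz} Hz: ${frameBudgetMs(hz).toFixed(1)} ms per frame`);
}

// The same piece of work on two different devices at 60 Hz, using the
// talk's example of 2 ms on a flagship phone and 20 ms on a low-end phone.
console.log(remainingBudgetMs(60, 2) > 0);  // flagship: within budget
console.log(remainingBudgetMs(60, 20) > 0); // low-end: over budget
```

Keep in mind that the browser's own style, layout, paint and composite work comes out of the same budget, so the time really available to your JavaScript is smaller than these numbers suggest.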
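The bookkeeping problem described in the worker part of the talk (several operations in one worker, and responses that have to be matched to requests by ID) can be sketched roughly as follows. This is a minimal sketch and not the code from the slides: the message shapes, `handleRequest` and `RequestTracker` are hypothetical names, and the transport is passed in as a plain function so the logic can run without an actual worker.

```typescript
// Message shapes are hypothetical; the talk does not prescribe a wire format.
type Operation = "add" | "subtract";

interface WorkerRequest {
  id: number; // lets the main thread match a response to its request
  op: Operation;
  a: number;
  b: number;
}

interface WorkerResponse {
  id: number;
  result: number;
}

// Worker side: look at the operation and dispatch to the right piece of code.
const operations: Record<Operation, (a: number, b: number) => number> = {
  add: (a, b) => a + b,
  subtract: (a, b) => a - b,
};

function handleRequest(req: WorkerRequest): WorkerResponse {
  return { id: req.id, result: operations[req.op](req.a, req.b) };
}

// Main-thread side: bookkeeping of requests that are still in flight.
class RequestTracker {
  private nextId = 0;
  private pending = new Map<number, (result: number) => void>();
  private send: (req: WorkerRequest) => void;

  // `send` would typically be (req) => worker.postMessage(req).
  constructor(send: (req: WorkerRequest) => void) {
    this.send = send;
  }

  call(op: Operation, a: number, b: number): Promise<number> {
    const id = this.nextId++;
    return new Promise<number>((resolve) => {
      this.pending.set(id, resolve);
      this.send({ id, op, a, b });
    });
  }

  // Call this from the worker's "message" event handler with event.data.
  receive(res: WorkerResponse): void {
    const resolve = this.pending.get(res.id);
    if (resolve === undefined) return; // unknown or already-settled ID
    this.pending.delete(res.id);
    resolve(res.result);
  }

  get pendingCount(): number {
    return this.pending.size;
  }
}
```

In a browser, the worker file would call `handleRequest` inside its own message handler and post the result back, and the main thread would forward each incoming `event.data` to `receive`. Libraries like Comlink exist to hide exactly this kind of plumbing.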
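The structured cloning rule of thumb from the talk (cost roughly proportional to the length of the JSON representation, with about 10 kilobytes being safe even on the Nokia 2) could be turned into a quick check like the one below. This is only a sketch: `jsonLength` and `fitsCloneBudget` are hypothetical helpers, and treating "10 kilobytes" as 10 * 1024 characters of JSON is an assumption of this example, not a number taken from the benchmark itself.

```typescript
// Rule-of-thumb limit from the talk. Reading "10 kilobytes" as 10 * 1024
// characters of JSON is an assumption of this sketch.
const CLONE_BUDGET_CHARS = 10 * 1024;

// Length of the value's JSON representation, used as a proxy for clone cost.
function jsonLength(value: unknown): number {
  const json = JSON.stringify(value);
  // JSON.stringify returns undefined for values such as functions.
  if (json === undefined) {
    throw new TypeError("value has no JSON representation");
  }
  return json.length;
}

// True if the value is small enough that copying it via postMessage is
// unlikely to be a concern according to the rule of thumb.
function fitsCloneBudget(
  value: unknown,
  budgetChars: number = CLONE_BUDGET_CHARS,
): boolean {
  return jsonLength(value) <= budgetChars;
}

// Hypothetical state object, just to show the call.
const exampleState = { width: 40, height: 40, revealed: [1, 2, 3] };
console.log(fitsCloneBudget(exampleState));
```

A check like this is only a hint for where to look. The actual cost is device-specific, so measuring on a real low-end device remains the more reliable approach.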