SURMA: What better way to
start a recording after, I don't know, two months since
the last time we did this and change up everything? What could possibly go wrong? [MUSIC PLAYING] Should I talk a bit about
WebAssembly Threads, Jake? What do you think? JAKE: Well, have you
written slides about that? Because if that's what you've
written all the slides about, then I think that is what
you should talk about. Otherwise, it'll
get very confusing. SURMA: I wrote the title,
and I'm like, well, now I wrote the title, I got
to write the rest around this. And so I did. Well, so the thing I really
like about WebAssembly-- and this is very much,
I think, a Surma thing that doesn't necessarily
apply to everyone-- is that WebAssembly has very
little surface in itself. WebAssembly can't do a
lot of things by itself. It can do only very tiny things. And I'm going to talk
about this, what it can do. And yet we end up
with capabilities like Threads and
SIMD, all these things that JavaScript can't provide. And so I wanted to talk about
that, talk about WebAssembly, what it can and
cannot do by itself, and how interactions with
JavaScript can open up all these possibilities. And for that, I
thought I'd start a little bit with a
small introduction to WebAssembly at the low level. Before I start, it's
probably important to say, this is not what you need
to know to use WebAssembly. This is a bit like
looking under the hood. JAKE: So before we continue,
you said JavaScript doesn't have threads. And JavaScript doesn't
have threads, correct. But the web has workers. Node has workers. SURMA: I'll talk about that. JAKE: Oh, OK. Fair enough. SURMA: Yeah, that's exact-- because that's exactly
the interesting bit, because they're,
to an extent, yes, but from another perspective,
you would say no. And then you combine these two,
and suddenly magic happens. So that's what I
want to talk about. So I'm going to give a bit
of an incomplete overview, enough so that you hopefully
understand what WebAssembly can and cannot do, but probably not
enough to cover every variant in which it can do these things,
but just a peek under the hood of WebAssembly, something you
don't necessarily need to know to use WebAssembly, but that
could be interesting or useful to know every now and then. So this is WebAssembly, kind of. This is the human readable
assembly language, the human readable as-- JAKE: I'm going to have to
take issue with human readable. SURMA: [LAUGHS] JAKE: Because this
human can't read it. SURMA: It's-- yes. I hear you. It's Assembly. I mean, people who have seen
Assembly or any form of machine code, the human readable
versions are not readable in, oh, I understand
what's happening. But at least you can
decipher individual words compared to the binary
representation of this file. So this text
representation is called WAT, or "what," WebAssembly
Text. Period, I guess. This language is literally
a text representation of what is in the
file in binary. So you can define a module,
and one module will basically end up as one WebAssembly file. A Wasm module can contain
multiple functions, and functions can take
numbers as parameters. And numbers in WebAssembly
can be 32 and 64-bit integers and 32 and 64-bit floats. And you can do math
with these numbers, and then you can return
a number as a result. And you can export-- JAKE: What more could
you possibly want? SURMA: Exactly. It's all you need, right? And you can export
some of these functions to be callable
from "the outside." And we're going to talk
about outside in a bit. So JavaScript is one of the
host systems of WebAssembly. There is now actually multiple
host systems out there, some for PHP, some just
standalone like Wasmtime, which runs as a standalone
app on your desktop machine. But we're going to
talk about JavaScript, because we are a web show. And so in JavaScript Land, you
would fetch the .wasm module and instantiate it, which means
that it will compile the module and instantiate it, and you can
call these exported functions, which is now-- this is what the outside is. And here I declare
a function that takes two parameters
and a return type. 32-bit integer is the return
type of this function. And then I use these two
parameters to add them, and that is also implicitly the
return value of this function. And at that point, it
returns to JavaScript. The JavaScript
environment knows how to convert between
JavaScript types and WebAssembly types. And pretty much
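That round trip can be sketched end to end. Below is a minimal hand-assembled .wasm binary (the equivalent WAT is in the comment) that exports an add function, instantiated and called from JavaScript. In practice you would fetch and compile a real .wasm file; the inline bytes are just to keep the sketch self-contained:

```javascript
// A minimal hand-assembled .wasm binary, equivalent to:
//   (module
//     (func (export "add") (param i32 i32) (result i32)
//       local.get 0
//       local.get 1
//       i32.add))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic "\0asm" + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section: func 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section, one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0, local.get 1, i32.add, end
]);

const module = new WebAssembly.Module(bytes);
const instance = new WebAssembly.Instance(module);
// JavaScript numbers are converted to i32 on the way in and back on the way out.
console.log(instance.exports.add(2, 40)); // 42
```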
all the Wasm types just turn into 64-bit floats,
because that's the only number type JavaScript has. Recently, there's been an
addition to WebAssembly where 64-bit integers are
now going to map to BigInts, because a 64-bit float can't
represent all the numbers a 64-bit integer can assume. So that will address
that problem. JAKE: But does that mean
then JavaScript can-- JavaScript has a number
type that WebAssembly doesn't support as well? Because if you get an
arbitrarily big number, it will get to a point
where WebAssembly can't-- once it's beyond the 64 bits. SURMA: Yeah. That can happen currently. So yeah, if the BigInt
grows too big, it cannot be represented in 64 bits. I actually don't quite remember
how it is handled, if it throws or if it gets clamped
or something like that. I would have to look into that. [ELEVATOR MUSIC] You see here that JavaScript
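For what it's worth, the JS BigInt integration pins this down: an i64 parameter goes through ToBigInt64, so an oversized BigInt wraps modulo 2^64 rather than throwing or getting clamped. A quick check with a hand-assembled i64 identity function (the export name id64 is just for this sketch):

```javascript
// Equivalent to: (module (func (export "id64") (param i64) (result i64) local.get 0))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // magic + version
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7e, 0x01, 0x7e,             // type: (i64) -> i64
  0x03, 0x02, 0x01, 0x00,                                     // func 0 uses type 0
  0x07, 0x08, 0x01, 0x04, 0x69, 0x64, 0x36, 0x34, 0x00, 0x00, // export "id64"
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x20, 0x00, 0x0b,             // body: local.get 0, end
]);
const { exports } = new WebAssembly.Instance(new WebAssembly.Module(bytes));

console.log(exports.id64(7n));             // 7n -- i64 maps to BigInt, not Number
console.log(exports.id64(2n ** 64n + 5n)); // 5n -- oversized values wrap modulo 2^64
```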
can call WebAssembly, but WebAssembly has access
to absolutely nothing. You can pass in
the numbers and use these numbers to do
arithmetic within WebAssembly, but you don't have
access to any of the APIs you might be used to. The WebAssembly is
completely isolated. And that alone is actually
surprisingly powerful, but we need a bit more to
make it actually useful. So the next step is that
you can declare imports. And here I am saying
that I'm expecting an import in the
surmas_imports namespace and that the import
is called alert. And I expect it to
be a function that takes one 32-bit
integer as a parameter. Later, I call that function with
a result of our computation, returning the result
of that computation. The instantiation for
the WebAssembly module remains largely
unchanged, except now we have to provide these imports. And that has to happen
at instantiation. And instantiation will fail if
I don't provide all the imports the module requires. So this here is the
so-called imports object, and the alert is obviously
good old alert function that I hope we all remember
from our start of JavaScript debugging. I certainly use that
a lot for debugging. So this shows now
that you can not only export WebAssembly
functions to JavaScript, but also you can expose
individual JavaScript functions to WebAssembly. But still, only
number types will be able to be passed back and
forth, because WebAssembly, for example, has no built-in
understanding of strings. Now so far, these
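A sketch of that import mechanism, again with a hand-assembled module. It declares an import alert in the surmas_imports namespace, as in the episode; the exported function name run and the stand-in for the real alert are made up for this example:

```javascript
// Equivalent to:
//   (module
//     (import "surmas_imports" "alert" (func $alert (param i32)))
//     (func (export "run") (param i32)
//       local.get 0
//       call $alert))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x05, 0x01, 0x60, 0x01, 0x7f, 0x00,             // type: (i32) -> ()
  0x02, 0x18, 0x01,                                     // import section, 1 import
  0x0e, 0x73, 0x75, 0x72, 0x6d, 0x61, 0x73, 0x5f,      // "surmas_"
  0x69, 0x6d, 0x70, 0x6f, 0x72, 0x74, 0x73,            // "imports"
  0x05, 0x61, 0x6c, 0x65, 0x72, 0x74, 0x00, 0x00,      // "alert", kind func, type 0
  0x03, 0x02, 0x01, 0x00,                               // defined func (index 1) uses type 0
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01, // export "run" = func 1
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x20, 0x00, 0x10, 0x00, 0x0b, // local.get 0, call 0, end
]);

const received = [];
const importsObject = {
  surmas_imports: { alert: (value) => received.push(value) }, // stand-in for window.alert
};
// Instantiation fails if any declared import is missing.
const { exports } = new WebAssembly.Instance(new WebAssembly.Module(bytes), importsObject);
exports.run(123); // Wasm calls back into the imported JavaScript function
console.log(received); // [123]
```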
WebAssembly modules have worked with just
parameters and that arithmetic on these parameters. For any more complex
kind of work, you actually need
a bit of memory. And that's why
WebAssembly also has a way to handle chunks of memory
that you might already know as array buffers. So here we declare that this
WebAssembly module expects a memory in our import
object and that it needs to be at least one page big. WebAssembly measures
memory in pages, each page being 64 kilobytes. That has security and operating
system integration reasons. It doesn't really matter, but
basically, the smallest unit of memory is 64 kilobytes,
and every memory has to be a multiple of that size. And now we can
use load and store to manipulate the
values in that memory. So instead of adding the
two values from the function parameters, we are
now adding two values that we find in memory. WebAssembly memories
are, as I mentioned, a lot like array buffers,
but not quite the same. They have their own
type, because they grow, they can grow. They have a different
unit of measurement, and they need a [? little ?]
of special setup for security under the hood. But in a way, they
behave exactly the same. So here I create a new
WebAssembly memory. So you can create a typed
array view on that memory, just like with
normal array buffers. And then we can use this DataView
to put these values into memory and then use our Wasm
module to add them up. This is obviously
not a hugely useful example, just put two values in
memory and add them up. But it just shows how the
interaction with the memory works. JAKE: So the memory
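A runnable sketch of that memory interaction, with a hand-assembled module that imports a memory and adds the two i32 values at offsets 0 and 4. The import names env and mem are invented for this example:

```javascript
// Equivalent to:
//   (module
//     (import "env" "mem" (memory 1))
//     (func (export "add") (result i32)
//       (i32.add (i32.load (i32.const 0)) (i32.load (i32.const 4)))))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,             // type: () -> i32
  0x02, 0x0c, 0x01, 0x03, 0x65, 0x6e, 0x76,             // import "env"
  0x03, 0x6d, 0x65, 0x6d, 0x02, 0x00, 0x01,             // "mem", kind memory, min 1 page
  0x03, 0x02, 0x01, 0x00,                               // func 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x0f, 0x01, 0x0d, 0x00,                         // code section, one body, no locals
  0x41, 0x00, 0x28, 0x02, 0x00,                         // i32.const 0, i32.load
  0x41, 0x04, 0x28, 0x02, 0x00,                         // i32.const 4, i32.load
  0x6a, 0x0b,                                           // i32.add, end
]);

// One page = 64 KiB; the memory's buffer is an ArrayBuffer we can view.
const memory = new WebAssembly.Memory({ initial: 1 });
const view = new DataView(memory.buffer);
view.setInt32(0, 40, true); // Wasm memory is little-endian
view.setInt32(4, 2, true);

const { exports } = new WebAssembly.Instance(
  new WebAssembly.Module(bytes),
  { env: { mem: memory } }
);
console.log(exports.add()); // 42
```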
in Wasm, it's just like-- or sending a
function into Wasm, like you did with alert, and
sending memory into Wasm, is just-- is exactly the same. SURMA: It's pretty
much the same. You have to declare what an
import is supposed to be, because WebAssembly
is strongly typed. So at compile time, it is known
that this import needs to be a function and this
import needs to be a memory. But it's the same way. You as the host system have
to make the conscious choice to give something
to WebAssembly. WebAssembly cannot just
grab anything by itself, which is one of the security-- JAKE: Because that talks to
what you said before, of how WebAssembly is so lightweight. I always assume there
was a deep integration of how the memory in
WebAssembly works. But it is just chucking a
JavaScript object in there, and then you're performing
operations on it in WebAssembly Land. SURMA: So for
complete transparency, a WebAssembly
module can actually declare its own memory
and export it instead, but it still functions the same. It's just like, here's-- because the WebAssembly module
declares its own memory, you'll also know it doesn't
get access to anything it shouldn't get
access to, because it's created at instantiation time,
so it doesn't get random access to unknown data. So it's all about the
security and the primitives that are being exposed here. That covers all the
things that WebAssembly can do that we need to know
about to talk about it. This is pretty much
what is called the Wasm MVP, the Minimum
Viable Product, which was the synchronized launch
between all the browsers. There are proposals,
obviously, in WebAssembly Land to augment what WebAssembly
can do, but almost all of them are just almost syntactic
sugar on top of these things. Very few of the
proposals actually expose new capabilities,
and if they are, they're often limited
to arithmetic, which I think is very interesting. So let's talk about
threads, because JavaScript is a bit weird on this topic,
because it is, by design, single-threaded. JavaScript, however,
supports parallelism, at least on the web and
in Node recently, with workers, which
runs a JavaScript file in a truly parallel
fashion to your main thread. However, you can only send
messages back and forth with the worker and the main
thread with postMessage. And there is no way to share
a variable between those two threads, like you might be
used to from other languages that support threads or any
form of threading primitive. And since the-- JAKE: Oh, well, except
shared array buffer, right? That's-- SURMA: Well, Jake, that's
what I'm getting to. JAKE: Oh, OK, I'm sor-- [LAUGHS] SURMA: So if we just added
shared memory to JavaScript, things would break, because
many of the primitives are not designed around the fact
that they can get interrupted or that there could
be race conditions. So instead, the
shared memory concept has to be isolated to
a specific type, which, as you already spoiled, Jake,
is the shared array buffer. And that's-- JAKE: I'm sorry. I'll just shut up and
sit here and let you talk, because it's clearly-- everything that I'm thinking of,
you've already got covered, so. SURMA: I'm just glad you asked
this question exactly here, because that was my next slide. So I did something
right, at least. So yeah, shared array
buffer is pretty much just like an array buffer in the end. But you can get shared access
with the same array buffer from across the threads. And both of these threads will
see the memory manipulation under the hood in real time. So here, what I'm doing is
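The sharing itself can be seen without any Wasm at all. For brevity both views live in one thread here; in the setup Surma describes, one of them would live in a worker, created from the same SharedArrayBuffer passed over postMessage:

```javascript
const sab = new SharedArrayBuffer(4);

// In real code these two views would be created in different threads,
// each from the same SharedArrayBuffer.
const mainView = new Int32Array(sab);
const workerView = new Int32Array(sab);

Atomics.add(workerView, 0, 1);          // the "worker" increments the first cell...
console.log(Atomics.load(mainView, 0)); // 1 -- ...and the "main thread" sees it immediately
```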
I'm running the main thread and basically have
a "while true" loop to wait until the first
cell of the memory is greater than or equal to 100. So this basically means
the main thread is blocked. And then the worker, I get
access to the same shared array buffer, and I increment
the first cell value. So even though the main thread
is blocked, at some point it's going to get unblocked,
because the worker is running in parallel for real and can
work on the same exact memory chunk as the main thread. So this is called a
spinlock, where you just keep spinning in an endless
loop and keep checking your condition to continue. They're a thing, but
they're obviously quite bad, because you're
just locking your CPU at 100% for this thread,
because that's all the processor's doing. And-- JAKE: Are you going to
mention atomics in-- SURMA: I am now going
to mention atomics. Aren't you on top of things. JAKE: [LAUGHS] SURMA: Because I actually
think that atomics are not that well known,
because very few people probably work with
shared array buffers, and they only work with
shared array buffers. And basically, they have
just a couple operations with atomics to make operating
on these shared array buffers more reliable
and predictable. So for example, in
a worker here, we can block on a memory
cell and wait for other threads to notify us that this memory
cell is now ready to use. It's a form of a mutex. So in the worker, we
can use Atomics.wait to wait on a certain cell. The first value
here is the index, and the second parameter
is the ex-- the first parameter is the view. I should read my own code. The first parameter is
the actual memory view. The second parameter
is the index, and the third parameter
is the expected value that needs to be in the cell
before we start blocking. That is a typical mutex
programming pattern, that you check if the
value had been changed before you started waiting. And then in the main thread,
we can basically just wait on the user to click
a button, and once they do, we use Atomics.notify,
and that will wake up all the threads that are
waiting on this memory cell. And so this way, the CPU
is not locked at 100%. This is not a spinlock. It's actually in cooperation
with the operating system and will put the thread to
sleep and save system resources. Now that was basically
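The Atomics.wait contract can be poked at even without a second thread. Note this runs on Node's main thread, which is allowed to block; browser main threads are not allowed to call Atomics.wait at all:

```javascript
const view = new Int32Array(new SharedArrayBuffer(4)); // one i32 cell, initially 0

// If the cell doesn't hold the expected value, wait() returns immediately.
console.log(Atomics.wait(view, 0, 123)); // "not-equal"

// If it does match, wait() blocks until notified or until the timeout expires.
console.log(Atomics.wait(view, 0, 0, 50)); // "timed-out" (after ~50 ms)

// notify() returns how many waiters it woke -- nobody is waiting here.
console.log(Atomics.notify(view, 0)); // 0
```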
all the prelude, all the building
blocks that we need to know to understand the
WebAssembly threads proposal that is now in Chrome stable. So the WebAssembly threads
proposal is actually much less than I thought it was, because
when you think about threads in C or Rust, you think
about calling a function and having it run in
a separate thread. WebAssembly Threads is not that. What it really is,
it's just it allows you to declare a memory as
shared, which basically makes this WebAssembly
memory behave exactly like a normal memory,
but also like a shared array buffer in that it can have
multiple views in real time onto the same memory
from different threads. And additionally, it
exposes these atomics as WebAssembly instructions. Now this is interesting,
because it doesn't actually allow you to spawn a thread. It just gives you these atomics. And this is actually
solved by the language compiler or the runtime that you
use to write your WebAssembly. So I did a diagram. And I know people find these
kind of UML diagrams scary. But that's actually what
I find interesting, that it is more complicated
outside of WebAssembly than it is inside
of WebAssembly. So basically, you
go through JavaScript, and whenever you compile
something from C to WebAssembly with Emscripten, Emscripten will
not only generate a .wasm file, but also a JavaScript
glue code, as it's called. And that piece of
JavaScript takes care of loading the
WebAssembly module for you, populating the memory
with all the values it needs to be in there, and
it provides the integration with the host system
that the C language expects to be in place. So for example, when in C
you call pthread_create, which is the function
to create a thread, that is actually a
JavaScript function that is imported
into the WebAssembly. So when you call pthread_create,
the call goes into JavaScript. It will spawn a worker. It will send the module
and the shared array buffer over to the worker. The worker will also
load the glue code and instantiate the
same module on top of the exact same memory. And now we have main
thread and worker running on the same memory with
the same WebAssembly module, and they can now use the
atomics to synchronize. So from here on in, it actually
behaves like a real C program. But all the magic
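A non-runnable pseudocode sketch of what that glue does -- the worker file name, message shape, and import names here are made up for illustration and do not match real Emscripten output:

```js
// main.js (glue) -- pseudocode sketch, not real Emscripten output
const memory = new WebAssembly.Memory({ initial: 256, maximum: 256, shared: true });
const module = await WebAssembly.compileStreaming(fetch("program.wasm"));

// pthread_create is imported into the Wasm module and lands in JavaScript here:
function pthreadCreate(startRoutine, arg) {
  const worker = new Worker("pthread-worker.js");
  // Modules and shared memories are postMessage-able; the memory is NOT copied.
  worker.postMessage({ module, memory, startRoutine, arg });
}

// pthread-worker.js -- instantiates the SAME module on the SAME memory:
// onmessage = async ({ data }) => {
//   const instance = await WebAssembly.instantiate(data.module, {
//     env: { memory: data.memory },
//   });
//   instance.exports.runThread(data.startRoutine, data.arg);
// };
```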
really happens in-- JAKE: So what's
the new bit then? Because we could have, like-- today, before this Wasm
feature became a thing, we could still give
WebAssembly a shared array buffer as memory. So there'd be nothing
to stop us instantiating two bits of Wasm
that are actually using the same bit of
memory in different workers, and they could be instructed
to work on separate things at separate times. So is that the case? And if so, what's the new bit? SURMA: Before, you couldn't
instantiate WebAssembly on a shared array buffer. That just wasn't possible. And you didn't have the
atomics instructions inside WebAssembly. So that's what I mean. Those are really just
the new additions, and they already
exist in JavaScript. They are not new capabilities
on the web really. The difference is
that in JavaScript, we don't have any way of expressing
complex objects on top of a shared array buffer. But any normal
compiled language does exactly that, where you
can build complex classes and structs, and
they all somehow get represented
in linear memory. And so that is this
combination of actually kind of old JavaScript
features, shared array buffers and atomics, and
WebAssembly's capability to bring the high level
constructs to a low level virtual machine that,
in combination, we now have real threads on the
web, which kind of was possible with workers
and shared array buffers, but not in a comfortable way. Yeah, we've been using
it with Squoosh now, and it worked surprisingly
well for some of it. Yeah, and so I looked
into it, and I was like, I was surprised at how
small, to an extent, the proposal really is. And yet the combination
of these things really turns into
something very powerful. JAKE: So I'll put you on
the spot a little bit, because I haven't-- SURMA: Yes, please. JAKE: --looked in
detail at what we've done with Squoosh, because
it wasn't me doing that work. So it's going to be
spinning up these workers. When does it get rid
of those workers? Does it generate them
per thread and then destroy them per thread? Or does it have a thread pool? SURMA: I think it has to. JAKE: Mm. SURMA: So I-- actually,
no, that's not true. I think different compilers
will handle it differently. I know that Emscripten takes
a worker pool parameter. JAKE: Hmm. SURMA: So I wonder if that is,
to an extent, how it works. Because I mean, a computer has-- you can spawn as many
threads as you like, technically, but
you have limited amount of cores
that can actually run any of these threads
at any given time. My hunch is that,
naively, it will probably spin up as many workers
as you create threads, and it will kill the worker
when the thread is done. JAKE: Mm. SURMA: But there's probably
smarter things out there when you created
worker pool, that they get recycled and reused. I mean, workers-- not workers. I think
WebAssembly threads are still so new, to
an extent, that there's so many things to measure and
to optimize and to see how, with actual usage patterns,
how that affects performance. I know that Google Earth has
been using them for a while, so I'm guessing they
had-- and I know they've been
talking about it, so I wonder how much feedback
has been flowing back between the Emscripten engineering
team and the Google Earth engineering team. But so far, Threads has been,
quote unquote, "good enough" to run all these use cases
with good performance results in the wild. And that makes me kind
of hopeful at least. JAKE: So is there any proposal
to put in genuine threads? Like-- I don't know,
I say genuine threads, and that's kind of like what-- SURMA: You want to spawn
something without having to write the glue code, right? That's what I thought, that
that would be the capability to somehow spawn a thread. No, I don't think so. I think what this
would be, that would be WASI, where you have a
standardized systems interface. So instead of,
for each language, you reinvent how to mock
out a thread creation call, there would be a standardized
interface to the host system, which is what WASI
is, where you can say, open the file, create a network
connection, but probably also, spawn a thread. And then the WASI
implementation, may that be in a desktop
environment runtime or maybe a JavaScript layer on
the web or wherever, would just have this
generic implementation. But that is still, I think,
a bit out, being worked on-- there are experiments, and
that's really, really good, but nothing that I would say you
could settle on for production right now. But yeah, that was basically the
WebAssembly handwritten version speedrun with Surma and threads. JAKE: Did you mention-- you
mentioned something about SIMD along the way as well. What's happening there? SURMA: SIMD is an even smaller
proposal, because it just-- well, it adds one new type,
which is 128 bit something. And you can interpret
these 128 bits either as four 32-bit
integers, as two 64-bit integers, or as eight 16-bit integers,
and just see them as vectors, and add them and multiply them. And they settled on
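In WAT, that looks roughly like this (SIMD proposal syntax; a sketch, with made-up names):

```wat
;; add two vectors of four 32-bit integer lanes each, in one instruction
(func $add4 (param $a v128) (param $b v128) (result v128)
  local.get $a
  local.get $b
  i32x4.add)
```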
that, because that seems to be what will
compile on most actual CPUs. Because WebAssembly by itself
is just an intermediate format, right? When you download
WebAssembly, it will get compiled to
real machine code that runs on your actual processor. So it needs to find a
SIMD equivalent that will compile on as many
processors as possible. And then all it does is
just add this new type and add a couple of
new instructions. And then the compiler can
decide whether the CPU supports it or not,
and either actually run those instructions
in one instruction cycle, or just do it in
series and pretend that it did it in one cycle. And yeah, that's the
other thing that we need to look into but we
haven't done yet, have we? That's still one
of the things we want to look into for Squoosh. JAKE: Actually, so the status of
Threads, that's shipped, right? SURMA: Mm-hmm. Well, we've-- so
Ingvar, our colleague, has done some experiments and
has got it to work with both Rust and C++, and we have seen
some pretty good performance improvement on those. SIMD is harder, because often
SIMD needs to be handwritten. The compiler often can only
figure out very clear cases to auto-vectorize, as it's
called, to automatically turn a loop into a SIMD instruction. That only works
in very few cases. The problem is that many codecs
that have SIMD instructions use handwritten assembly, but
not handwritten WebAssembly, but for other CPUs. So those don't compile
to WebAssembly. So we're in a niche
where we don't know how we can make use of
the SIMD from these codecs to make use of WebAssembly SIMD. But I think we'll
play around a bit, and maybe he'll find something. JAKE: But in terms of the
thread stuff, the new thread stuff, what browsers
has that landed in? SURMA: I know it's
in desktop Chrome. I know it's in Firefox
desktop, if you have COOP and COEP enabled. And I think we even have it
in Android's Chrome soon, if you have COOP and
COEP enabled, which we have on Squoosh. So we also have
feature detection. So if you have support
on your browser, Squoosh will just use threads. If not, you will get the old
codec version without threads. JAKE: So I think
the reason for this is because on desktop Chrome,
you get process isolation out of the box, whereas you
don't get that in Firefox, and you don't get
that on Android, because there's more
of a memory concern. And that's when you need to-- SURMA: Exactly. JAKE: --yeah,
close all the doors to outside content, which
is the COOP and COEP stuff. We've got another
episode on that. We can link to that. SURMA: Oh, do we? JAKE: Yeah. I think we did. SURMA: We lost shared
array buffer for-- oh, we did that one, you're
right, with all the acronyms. Yeah, I mean we lost shared
array buffer for a while due to Spectre and Meltdown. And now we have found mechanisms
how we can bring them back without them being a risk,
which is COOP and COEP. And we should link that
episode, because it was good. JAKE: Thank you. [LAUGHS] SURMA: [LAUGHS] JAKE: Well done, us. SURMA: [LAUGHS] All
right, that's it. That was my
WebAssembly speedrun. JAKE: Cool. Buh-bye. SURMA: Bye. [MUSIC PLAYING] JAKE: All right, [INAUDIBLE] SURMA: Oh no.