[MUSIC PLAYING] RICARDO CABELLO: All right. Hello, everyone. Thanks for coming
to this session. I hope you're having
a good conference. Before we begin, we'll be
sharing a lot of links. But don't worry-- you'll
find them on the YouTube page whenever this recording goes up. OK. Let's start then. My name is Ricardo. And together with
Corentin, I will be talking about the future of graphics on the web. But before we do that,
let's have a quick look at the past and present. WebGL landed in browsers
in February 2011. That was in Chrome
9 and Firefox 4. Those browsers were the first ones to implement it. Back then, with the
Google Creative Lab, we created an interactive music video that aimed to showcase the new powers the technology was bringing to the web. It was a pretty big project involving creators, directors, concept artists, and animators. Around 100 people worked on the project for half a year, and 10 of us were JavaScript graphics developers. We knew the workflow and tools
were very different compared to traditional web development. So we also made the
project open source so others could use
it as a reference. Some years later, Internet Explorer, Edge, and Safari implemented WebGL too, which means that today the same experience works in all major browsers, on desktops, tablets, and phones. What I find most
remarkable is the fact that we didn't have to modify
the code for that to happen. Anyone with experience
doing graphics programming knows that this is
rarely the case. Usually you have to recompile the project every couple of years when operating systems are updated or new devices appear. So here's a quick recap--
just double checking. WebGL is a JavaScript API that provides bindings to OpenGL. It allows web developers to utilize the user's graphics card in order to create efficient, performant graphics on the web. It is a low-level API, which means that it is very powerful, but it's also very verbose. For example, a graphics card's main primitive is the triangle. Everything is done
with triangles. Here's the code that we're
going to need to write in order to display just one triangle. First, we need to
create a canvas element. Then with JavaScript, we get
the context for that canvas. And then, things get pretty
complicated pretty fast. After defining positions
for each vertex, we have to add them to a
buffer, send it to a GPU, then link the vertex
and fragment shaders, and compile a program that will be used to tell the graphics card how to fill those pixels.
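In rough terms, the code looks something like this (an illustrative sketch rather than the exact code from the slide; error handling omitted):

```javascript
// A minimal sketch of the raw WebGL steps just described:
// canvas, context, vertex buffer, shaders, program, draw.
const canvas = document.createElement('canvas');
document.body.appendChild(canvas);
const gl = canvas.getContext('webgl');

// One triangle: three 2D vertex positions, uploaded to a GPU buffer.
const positions = new Float32Array([0, 0.5, -0.5, -0.5, 0.5, -0.5]);
const buffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
gl.bufferData(gl.ARRAY_BUFFER, positions, gl.STATIC_DRAW);

// Compile the vertex and fragment shaders and link them into a program.
function compile(type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  return shader;
}
const program = gl.createProgram();
gl.attachShader(program, compile(gl.VERTEX_SHADER,
  'attribute vec2 position; void main() { gl_Position = vec4(position, 0.0, 1.0); }'));
gl.attachShader(program, compile(gl.FRAGMENT_SHADER,
  'void main() { gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0); }'));
gl.linkProgram(program);
gl.useProgram(program);

// Tell WebGL how to read the buffer, then draw the three vertices.
const location = gl.getAttribLocation(program, 'position');
gl.enableVertexAttribArray(location);
gl.vertexAttribPointer(location, 2, gl.FLOAT, false, 0, 0);
gl.drawArrays(gl.TRIANGLES, 0, 3);
```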
So that's why a bunch of us back then started creating libraries and frameworks that abstract all that complexity, so developers and ourselves could stay productive and focused. Those libraries take care of placing objects in 3D space, material configuration, loading 2D and 3D assets, interaction, sound, et cetera, anything you need for any sort of game or application.
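For comparison, here is roughly what that kind of setup looks like with a library such as three.js, which hides the buffer and shader plumbing (a sketch based on the current three.js API, not code from the talk):

```javascript
import * as THREE from 'three';

// Scene, camera, and a simple mesh: the library handles buffers and shaders.
const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(70, innerWidth / innerHeight, 0.1, 100);
camera.position.z = 2;

const mesh = new THREE.Mesh(
  new THREE.BoxGeometry(1, 1, 1),
  new THREE.MeshNormalMaterial()
);
scene.add(mesh);

const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(innerWidth, innerHeight);
document.body.appendChild(renderer.domElement);
renderer.render(scene, camera);
```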
Designing those libraries takes time. But over the years, people have been doing pretty amazing projects with them. So let's have a look at
what people are doing today. So people are still doing
interactive music videos. That's good. In this example, "Track"
by Little Workshop not only works on desktop and mobile, but also on VR devices, letting you look around while traveling
through glowing tunnels. Another clear use of the
technology is gaming. "Plume" is a beautiful
game developed by a surprisingly small team and
was released as part of last year's Christmas Experiments. Another use is web experiences. In this case, "Oat the Goat" is an interactive animated storybook designed to teach children about bullying. And the folks at Assembly
used Maya to model and animate the characters and then export them to glTF via Blender. For the rendering, they wrote something like 13,000 lines of TypeScript to make the whole thing work. And another very common use is product configurators. The guys at Little
Workshop, again, show how good those
can look in this demo. But those use cases
do not end there. People are doing
data visualizations, enhancing newspaper articles,
virtual tours, documentaries, movie promotions, and more. Like, you can check the ThreeJS
website and the BabylonJS website to see more
of those examples. However, we don't want to end up in a world where the only HTML elements in your page are just a canvas tag and a script tag. Instead, we must find ways
of combining WebGL and HTML. So the good news
is that lately we have been seeing more and more
projects and examples of web designers utilizing
bits of WebGL to enhance their HTML pages. Here's a site that
welcomes the user with a beautiful,
immersive image. We're able to interact
with the 3D scene by moving the mouse
around the image. But after scrolling
the page, we reach a traditional static layout with all the information about the product, just as traditional websites usually look. The personal portfolio
of Bertrand Candas shows a set of DOM elements affecting a dynamic background. It's a little bit dark, but OK. With JavaScript, we can figure out the position of those DOM elements. And then we can use that information to affect the physics simulation that happens in the 3D scene in the background. But for underpowered
devices, we can just replace that WebGL scene
with a static image, and the website is
still functional. Another interesting
trend we have been seeing is websites that
use distortion effects. The website for Japanese
director, Tao Tajima, has a very impressive
use of them. However, the content is actually
plain and selectable HTML. So it is surprising
because, as you know, we cannot do these kinds of effects with CSS. So if we look at it again, what I believe they are doing is taking the DOM elements, copying the pixels of all those elements into a background WebGL canvas, and then applying the distortion to that canvas. When the transition is finished, they put the next DOM content back on top. So it's still something that you can enable or disable depending on whether it works well on mobile, and something that you can use to progressively enhance the page, basically. One more example worth citing applies the distortion effect on top of the HTML, basically making the layout look truly fluid. Then again, this is surprising because it wouldn't be possible with CSS. So I think those are
all great examples of the kind of results you can
get by mixing HTML and WebGL. But it still requires the developer to dive into JavaScript. And that, as we know, can be a little bit tedious when connecting all the parts. If you're more used to React, this new library by Paul Henschel can be a great option for you. React Three Fiber mixes React concepts on top of the previous abstractions. So here's the code for the animation that we just saw. Notice how the previously defined Effect and Content components are composed into the canvas. It makes the code much more reusable and much easier to maintain.
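The transcript doesn't include the actual Effect and Content components, but the general shape of a React Three Fiber app is something like this (a sketch in the style of the react-three-fiber API of that era; names are approximate):

```javascript
import React from 'react';
import { Canvas } from 'react-three-fiber';

// A regular React component that renders a three.js mesh declaratively.
function Box() {
  return (
    <mesh rotation={[0.4, 0.2, 0]}>
      <boxBufferGeometry attach="geometry" args={[1, 1, 1]} />
      <meshStandardMaterial attach="material" color="hotpink" />
    </mesh>
  );
}

// Components compose into the <Canvas>, and the library drives three.js.
export default function App() {
  return (
    <Canvas>
      <ambientLight />
      <pointLight position={[10, 10, 10]} />
      <Box />
    </Canvas>
  );
}
```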
However, I think that we can still make it even simpler. I believe Web Components will allow us to finally bring all the power of WebGL right into the HTML layer. We can now encapsulate all those effects in composable custom
the code complexity. So for example, here
is another project that we did for the WebGL
launch eight years ago. It was kind of a globe platform. It was a project that
allowed JavaScript developers to visualize different data
sets on top of a globe. You will have the library. You have your data, and then
you'll have to use different-- manage here different parts
of the data to display. But even if we tried
to hide the WebGL code, developers still had to
write custom JavaScript for loading the data
and configure the globe and append it to the DOM. And the worst part
was the developers will still have to handle
the positioning of the DOM object and the resizing. And it was just difficult to
mix it with a normal HTML page. So today, with Web Components,
we can simplify all the code. We use those two lines. The developer only has to
include the JavaScript library on their website. And powerful custom
element is now available to place whenever
they need in the DOM. Not only that,
but at that point, by duplicating the line they
can have multiple globes. Before, they will have to
duplicate all the code, and it will be, again, harder,
more code to read and parse. A component that is
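The transcript doesn't show the actual two lines, but the idea is something like this (the script name and the webgl-globe element here are hypothetical placeholders, not the real API):

```html
<script type="module" src="webgl-globe.js"></script>

<webgl-globe data="population.json"></webgl-globe>
```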
Now, a component that is already ready to use: the previous one is not ready yet, but this one, model-viewer, already is. The problem is that displaying 3D models on the web is still pretty hard. So we really wanted to make it as simple as embedding an image in your page, as simple as adding an image tag. That's the main goal. For this one, again, the developer only has to include a JavaScript library, and then a powerful custom element is ready to display any 3D model using the glTF open standard.
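In practice, that looks roughly like this (check the model-viewer documentation for the exact script URL and attributes):

```html
<script type="module"
        src="https://unpkg.com/@google/model-viewer/dist/model-viewer.min.js"></script>

<model-viewer src="astronaut.glb"
              alt="A 3D model of an astronaut"
              camera-controls
              auto-rotate>
</model-viewer>
```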
An important feature of HTML is accessibility. For low-vision and blind users, we're trying to inform them both of what the 3D model is and also of the orientation of the model. Here you can see that the view angle is being communicated verbally to the user so they can stay oriented with what's going on. And it also prompts for how to control the model with the keyboard, and how to exit back to the rest of the page. model-viewer also supports AR, Augmented Reality. And you can see how it's already being used on the NASA website. So just by adding the ar attribute, it is going to show an icon and be able to launch the AR viewer on both Android and iOS. For iOS, you also have to include a USDZ file.
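The AR usage being described looks roughly like this, where the ar attribute enables the AR button and ios-src points at a USDZ version of the model for iOS Quick Look (attribute names as documented by model-viewer; file names are placeholders):

```html
<model-viewer src="astronaut.glb"
              ios-src="astronaut.usdz"
              alt="A 3D model of an astronaut"
              ar
              camera-controls>
</model-viewer>
```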
And lastly, while building the components, we realized that, depending on the device, you can only have up to eight WebGL contexts at once. So if you create a new one, the first one disappears. It is actually a well-known limitation of WebGL. But it's also good practice to only have one context, to keep memory in one place. The best solution that we found for this was creating a single WebGL context offscreen. So like [INAUDIBLE]. And then we use that one to render all the model-viewer elements on the page. We also utilized IntersectionObserver to make sure that we are not rendering objects that are not in view, and ResizeObserver to detect whenever the developer modifies the size, so we re-render if we have to.
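A simplified sketch of that pattern, assuming a hypothetical renderElement() function that draws one element with the shared offscreen WebGL context:

```javascript
// Track which elements are currently on screen.
const visible = new Set();

// Only draw elements that are actually in view.
const intersections = new IntersectionObserver((entries) => {
  for (const entry of entries) {
    if (entry.isIntersecting) visible.add(entry.target);
    else visible.delete(entry.target);
  }
});

// Re-draw an element whenever its size changes.
const resizes = new ResizeObserver((entries) => {
  for (const entry of entries) renderElement(entry.target); // hypothetical draw helper
});

document.querySelectorAll('model-viewer').forEach((el) => {
  intersections.observe(el);
  resizes.observe(el);
});

function frame() {
  for (const el of visible) renderElement(el); // hypothetical draw helper
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);
```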
But we all know how the web is. Sooner or later, someone will want to display hundreds of those components at once. And that is great. We want to allow for that. But for that, we'll need to make sure that the underlying APIs are as efficient as possible. So for that, now
Corentin is going to share with us what's
coming up in the future. Thank you. [APPLAUSE] CORENTIN WALLEZ: OK. Thank you, Ricardo. This was an amazing
display of what's possible on the web
using GPUs today. So now I'll give a
sneak peek of what's coming up next in the
future where you'll be able to extract even more
computational power from GPUs on the web. So hey, everyone. I'm Corentin Wallez. And for the last
two years at Google, I've been working on an
emerging web standard called WebGPU in collaboration with
all the major browsers at W3C. So WebGPU is a new API that's
the successor to WebGL. And it will unlock the
potential of GPUs on the web. So now you'll be asking,
Corentin, we already have WebGL, so why are
you making a new API? The high level
reason for this is that WebGL is based on
an understanding of GPUs as they were 12 years ago. And in 12 years, GPU
hardware has evolved. But also the way we use
GPU hardware has evolved. So there is a new
generation of GPU APIs in native-- for example, Vulkan-- that help do more with GPUs. And WebGPU is built to
close the gap with what's possible in native today. So it will improve what's
possible on the web for game developers, but not only that: it will also improve what you can do in visualization, in heavy design applications, for machine learning practitioners,
and much more. So for the rest of
the session, I'll be going through specific
advantages or things that WebGPU improves over
WebGL and show how it will help build better experiences. So first, WebGPU is still
a low level and verbose API so that you can
tailor the usage of WebGPU to exactly what
your application needs. This is the triangle
Ricardo just showed. And as a reminder,
this was the code to render that
triangle in WebGL. Now, this is the
minimum WebGPU code to render the same triangle. As we can see, the complexity is similar to WebGL.
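The slide itself isn't reproduced here, but a condensed sketch of a WebGPU "hello triangle" looks something like this. Note that this uses today's WebGPU API and WGSL, which have changed quite a bit since the experimental version shown in the talk:

```javascript
// Run inside an async function or a <script type="module">.
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();

const canvas = document.querySelector('canvas');
const context = canvas.getContext('webgpu');
const format = navigator.gpu.getPreferredCanvasFormat();
context.configure({ device, format });

// Vertex and fragment stages written in WGSL.
const module = device.createShaderModule({
  code: `
    @vertex fn vs(@builtin(vertex_index) i : u32) -> @builtin(position) vec4f {
      var pos = array<vec2f, 3>(vec2f(0, 0.5), vec2f(-0.5, -0.5), vec2f(0.5, -0.5));
      return vec4f(pos[i], 0, 1);
    }
    @fragment fn fs() -> @location(0) vec4f { return vec4f(1, 0, 0, 1); }
  `,
});

const pipeline = device.createRenderPipeline({
  layout: 'auto',
  vertex: { module, entryPoint: 'vs' },
  fragment: { module, entryPoint: 'fs', targets: [{ format }] },
  primitive: { topology: 'triangle-list' },
});

// Record and submit the commands that clear the canvas and draw 3 vertices.
const encoder = device.createCommandEncoder();
const pass = encoder.beginRenderPass({
  colorAttachments: [{
    view: context.getCurrentTexture().createView(),
    loadOp: 'clear',
    clearValue: { r: 0, g: 0, b: 0, a: 1 },
    storeOp: 'store',
  }],
});
pass.setPipeline(pipeline);
pass.draw(3);
pass.end();
device.queue.submit([encoder.finish()]);
```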
But you don't need to worry about it, because if you're using a framework like Three.js or Babylon.js, then you'll get the benefits transparently, for free, when the framework updates to support WebGPU. So the first limitation that
WebGL frameworks run into is the number of elements or objects that they can draw each frame, because each drawing command has a fixed cost and needs to be issued individually each frame. So with WebGL, an optimized application can draw a maximum of around 1,000 objects per frame. And that's kind of already pushing it, because if you want to target a variety of mobile and desktop devices, you might need to go even lower than this.
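Schematically, the per-object cost comes from a render loop that has to do something like this every frame (illustrative pseudo-JavaScript, not code from the talk):

```javascript
// Every object pays the JavaScript and API overhead of its own state
// changes and draw call, every frame.
function drawScene(gl, objects) {
  for (const object of objects) {
    gl.useProgram(object.program);                              // rebind state
    gl.bindBuffer(gl.ARRAY_BUFFER, object.vertexBuffer);
    gl.vertexAttribPointer(0, 3, gl.FLOAT, false, 0, 0);
    gl.uniformMatrix4fv(object.matrixLocation, false, object.matrix);
    gl.drawArrays(gl.TRIANGLES, 0, object.vertexCount);         // one draw call per object
  }
  // With around 1,000 objects, this loop alone can eat the frame budget.
}
```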
So this is a photo of a living room. It's not rendered. It's an actual photo. But the idea is that it's super stylish, yet it feels empty and cold. Nobody lives there. And this is sometimes what it feels like looking at WebGL experiences, because they can lack complexity. In comparison, game developers
in native or on consoles are used to, I don't know,
maybe 10,000 objects per frame if they need to. And so they can build
richer, more complex, more lifelike experiences. And this is a huge difference. Even with the limitation
in the number of objects, WebGL developers have been able
to build incredible things. And so imagine what
they could do if they could render this many objects. So BabylonJS is another
very popular 3D JavaScript framework. And just last month,
when they heard we were starting to implement
WebGPU, they're like, hey, can we get some WebGPU now? And we're like, no. It's not ready. It's not in Chrome, but
here's a custom build. And the demo I'm going to
show is what they came back to us with just two days ago. So can we switch to
the demo, please? All right, so this is a complex
scene rendered with WebGL. And it tries to replicate
what a more complete game would do if every object
was drawn independently and a bit differently. So it doesn't look like it, but
all the trees and rocks and all of that are independent objects, and could each be different objects. So in the top right
corner, there's the performance numbers. And we can see that as we zoom
out and we see more objects the performance starts
dropping heavily. And that's because of the
relatively high fixed cost of drawing each object,
of sending the command to draw each object. And so the bottleneck here
is not the power of the GPU on this machine or
anything like that. It's just JavaScript
iterating through every object and sending the command. Now let's look at
an initial version of the same demo in WebGPU. And keep in mind this was
done in just two weeks. So as the scene
zooms out, we can see that the performance
stays exactly the same, even if there's more
objects to draw. And what's more, we can see
that the CPU time of JavaScript is basically nothing. So we are able to use
more of the GPU power because we're not
bottle-necked on JavaScript. And we also have
more time on the CPU to run our application's logic. So let's go back to the slides. What we have seen is that for this specific and early demo, WebGPU is able to submit three times more drawing commands than WebGL and leaves room for your application's logic. A major new version of
BabylonJS, BabylonJS 4.0 was released just last week. And now, today, the
BabylonJS developers are so excited about
WebGPU that they are going to implement full
support for the initial version of WebGPU in the next version
of BabylonJS, BabylonJS 4.1. But WebGPU is not just about
drawing more complex things with more objects. A common operation
done on GPUs is, say, a post-processing image filter-- for example, the depth
of field simulation. We see this all the time
in cinema and photography. For example, in this photo of a fish, we can see the fish is in focus
while the background is out of focus. And this is really important
because it gives us the feeling that the fish is lost
in a giant environment. So this type of effect
is important in all kinds of rendering so we can get a
better cinematic experience. But it's also used
in other places like camera applications. And of course, this is one
type of post-processing filter, but there's many other cases
of post-processing filters, like, I don't know,
color grading, image sharpening, a bunch more. And all of them can be
accelerated using the GPU. So for example, the
image on the left could be the background
behind the fish before we apply
the depth of field. And on the right, we see the
resulting color of the pixel. What's interesting is that
the color of the pixel depends only on the colors in a small neighborhood of that pixel in the original image. So imagine the grid on
the left is a neighborhood of original pixels. We're going to
number them in 2D. And the resulting color will be
essentially a weighted average of all these pixels. Another way to look at it
is to see that on top we have the output
image, and the color of each of the output pixels
will depend only on the 5 by 5 stencil of the input
image on the bottom. The killer feature of
WebGPU, in my mind, is what we call GPU Compute. And one use case
of GPU Compute is to speed up local image
filters, like we just saw. And so this is going
to be pretty far from DOM manipulation, like
React, or amazing web features, like CORS headers,
so please bear with me. We're going to go through
it in three steps. First, we look at how
GPUs are architected and how an image filter in
WebGL uses that architecture. And then we'll see
how WebGPU takes better advantage
of the architecture to do the same image
filter but faster. So let's look at how GPU
works, and I have one here. So this is a package
you can buy in stores. Can you see it? Yes. So this is a package you can buy in stores, with a huge heat sink. But if we look inside,
there's this small chip here. And this is the actual GPU. So if we go back
to the slides, this is what we call a die shot,
which is a transistor level picture of the GPU. And we see a bunch of
repeating patterns in it. So we're going to call
them execution units. These execution units are
a bit like cores in CPUs in that they can run in parallel
and process different workloads independently. If we zoom in even more in
one of these execution units, this is what we see. So we have, in the
middle, a control unit which is responsible for
choosing the next instruction-- like, for example, add two registers, or load something from main memory. And once it has
chosen an instruction, it will send it to all the ALUs. The ALUs are the
Arithmetic and Logic Units. And when they receive an
instruction, they perform it. So for example, if they
need to add two registers, they will look at their
respective registers and add them together. What's important to see is
that a single instruction from the control unit will
be executed at the same time by all the ALUs, just
on different data because they all have
their own registers. So this is single instruction
multiple data processing. So this is the part
of the execution unit that is accessible from WebGL. And what we see is that
it's not possible for ALUs to talk to one another. They have no ways
to communicate. But in practice, GPUs
look more like this today. There is a new
shared memory region in each of the execution units
where ALUs can share data with one another. So it's a bit like
your memory cache in that it's much cheaper
to access than the main GPU memory, but you can program it
directly, explicitly, and have ALUs share memory there. So a big benefit
of GPU Compute is to give developers access to
that shared memory region. This was the architectures of
GPUs and their execution units. So now we're going to look at
how the image filter in WebGL maps to that architecture. For a reminder, this
was the algorithm we're going to look at. And in our example, since our
execution unit has 16 ALUs, we're going to compute
a 4 by 4 block, which is 16 pixels, of the
output in parallel. And each ALU will
take care of computing the value for one output pixel. And this is GPU pseudo-code
for the filter in WebGL. And essentially, it's
just a 2D loop on x and y that fetches from the inputs and
computes the weighted average of the input pixels.
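In JavaScript-flavored pseudo-code, the per-pixel program is roughly the following, where coords is pre-populated per ALU, and input.load() and weights.at() are hypothetical helpers standing in for texture fetches and kernel weights:

```javascript
// Pseudo-code for what each ALU runs in the WebGL version: a 5x5 weighted
// average around the pixel at `coords`. Every input.load() is a fetch from
// main GPU memory.
function filterPixel(coords, input, weights) {
  let sum = 0;
  for (let x = -2; x <= 2; x++) {
    for (let y = -2; y <= 2; y++) {
      // Each iteration is a separate load from main memory.
      sum += weights.at(x, y) * input.load(coords.x + x, coords.y + y);
    }
  }
  return sum; // written to the output pixel at `coords`
}
```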
What's interesting here is that the coordinates argument to the function is a
bit special because it's going to be pre-populated
for each of the ALUs. And that's what will make the ALUs each do an execution on different data, because
they start populated with different data. So this is a table for the
execution of the program. And likewise, we can see the
coordinates are prepopulated. So each column is the
registers for one of the ALUs. And we have 16 of
them for the 16 ALUs. So the first thing that happens
is that the control unit says, hey, initialize sum to 0. So all of them
initialize the sum to 0. And then we get to the first
iteration of the loop in x, and each ALU gets
its own value for x. Likewise, each ALU
gets its own value for y. And now we get to the line
that does the memory load of the value of the inputs. So each ALU has
a different value of x and y in their registers. And so each of them will
be doing a memory load to a different
location of the input. Let's look at this
register at this ALU. It's going to do a memory load
at position minus 2, minus 1. We're going to get
back to this one. So if we go in and do another
iteration of the loop in y, likewise, we update
the y register, and we do a memory load. What's interesting
here is that the first ALU will do a memory load at minus 2, minus 1. That's a redundant load, because we already did it at the last iteration. Anyways, the loop
keeps on looping, and there's more loading
and summing and all that that happens. And in the end, we
get to the return, and that means the sum will get
returned to the output pixel and the computation for our
4 by 4 block is finished. Overall, the execution
of the algorithm in WebGL for a 4 by 4 block did 400 memory loads. The reason for this is that we have 16 pixels, and each of them did 25 loads. So now, this was how the
filter executed in WebGL. We're going to look at how
WebGPU uses shared memory to make it more efficient. So we take the same
program as before. It's that exact
same code, and we're going to optimize it
with shared memory. So we introduce a
cache that's going to contain all the
pixels of the input that we need to do
the computation. This cache is going
to be in shared memory so that it's cheaper to
access than the actual input. It's like a global
variable that's inside the execution unit. Of course, we need to modify the
shader to use that input tile. And because the input tile
needs to contain values at the beginning, we can't
just start like this. So this function is going to be
a helper function that computes the value of the pixel. And we're going to have a
real main function that, first, populates the cache,
and then calls the computation. So like the previous
version of the shader, the coordinates are
pre-populated so each of the ALUs does a
different execution. And then all the ALUs work
together to populate the cache. And there's a bunch of
loops and whatnots there, but it's not really important. So I'll spare you this. What's interesting to see
is that only 64 pixels of the input are loaded
and put in the cache. There are no redundant
memory loads. Then we go through
the main computation of the value and likewise. This is very similar to
what happened before. But on this line,
the memory load is now from the shared memory
instead of the main memory, and this is cheaper. So overall, thanks
to the caching of the tile of the
input, the WebGPU version didn't do any redundant
main memory load. So for our 4 by 4 block,
it did 64 memory loads. And like we saw before,
WebGL had to do 400.
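Expressed in today's WGSL (which did not exist at the time of this talk), the shape of the WebGPU version is roughly the following: the workgroup fills a shared tile once, synchronizes, and the weighted average then reads from the tile instead of main memory. Bindings and the kernel weights are simplified:

```javascript
// A rough WGSL sketch of the shared-memory version of the filter.
const filterShader = /* wgsl */ `
  @group(0) @binding(0) var inputTex  : texture_2d<f32>;
  @group(0) @binding(1) var outputTex : texture_storage_2d<rgba8unorm, write>;

  // 8x8 tile of input pixels cached in the execution unit's shared memory:
  // the 4x4 output block plus a 2-pixel border for the 5x5 stencil.
  var<workgroup> tile : array<array<vec4f, 8>, 8>;

  @compute @workgroup_size(4, 4)
  fn main(@builtin(local_invocation_id)  lid : vec3u,
          @builtin(global_invocation_id) gid : vec3u) {
    // 1. The 16 invocations cooperate to load the 64 input pixels exactly once.
    let origin = vec2i(gid.xy) - vec2i(lid.xy) - vec2i(2, 2);
    for (var i = lid.x; i < 8u; i += 4u) {
      for (var j = lid.y; j < 8u; j += 4u) {
        tile[i][j] = textureLoad(inputTex, origin + vec2i(vec2u(i, j)), 0);
      }
    }
    // 2. Wait until the whole tile is populated.
    workgroupBarrier();

    // 3. The weighted average now reads from the cheap shared-memory tile.
    //    Uniform weights here for simplicity; a real blur would use a kernel.
    var sum = vec4f(0.0);
    for (var x = 0u; x < 5u; x++) {
      for (var y = 0u; y < 5u; y++) {
        sum += tile[lid.x + x][lid.y + y] * (1.0 / 25.0);
      }
    }
    textureStore(outputTex, vec2i(gid.xy), sum);
  }
`;
```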
So this looks very biased in favor of WebGPU, but in practice, things are a bit more mixed: WebGPU didn't do redundant main memory loads, but it did a bunch of shared memory loads, and those are still not free. And also WebGL is a bit
more efficient than this because GPUs have a
memory cache hierarchy, and so some of these
memory loads will have hit the cache that's
inside the execution unit. But the point being,
overall, WebGPU will be more efficient
because we explicitly are able to cache input data. So the code we
just talked about-- in the graphics world, it's
called an image filter. But if we look at the
machine learning world, it's called a
convolution operator. All the optimizations
we talked about, they also apply to Convolutional
Neural Networks, also known as CNNs. So the basic ideas for CNNs were
introduced in the late '80s. But back then it was just too
expensive to train and run the models to produce the
results we have today. The ML boom of the last
decade became possible because CNNs and
other types of models could run efficiently
on GPUs in part thanks to the optimization we just saw. So we are confident
that machine learning web frameworks such
as TensorFlow.js will be able to take
advantage of GPUs to significantly improve the
speed of their algorithms. Finally, algorithms can be
really difficult to write on GPUs using WebGL. And sometimes they're just
not possible to write at all. The problem is that in WebGL,
where the output of computation goes is really,
really constrained. On the other hand, GPU
Compute that WebGPU has is much more flexible because
each ALU can read and write memory at any place in
the computer memory. This unlocks a whole new
class of GPU algorithms from physics and particle
based fluid simulation, like we see here, to
parallel sorting on the GPU, mesh skinning, and many,
many more algorithms that can be offloaded from
JavaScript to the GPU. So to summarize, the
key benefits of WebGPU are that you can have increased complexity, for better and more engaging experiences. And this is what we have
seen with BabylonJS. It provides performance
improvements for scientific computing,
like machine learning. And it unlocks a whole
new class of algorithms that you can offload from JS
CPU time to run on the GPU in parallel. So now you're like, hey,
I want to try this API. You're in luck. WebGPU is a group effort,
and everyone is onboard. Chrome, Firefox, Edge, Safari-- they're all starting
to implement the API. Today, we're making an
initial version of WebGPU available on the
Chrome Canary on MacOS, and other operating systems
will follow shortly. To try it you just need to
download Chrome Canary on MacOS and enable the experimental
flag called Unsafe WebGPU. And again, this
is an unsafe flag, so please don't browse
the internet with it for your daily browsing. More information about WebGPU
is available on webgpu.io. So there's the status
of implementations. There's a link to some
samples and demos. A link to a forum where
you can discuss WebGPU. And we're going to add more
stuff to this with articles to get started and all that. What we'd love is for
you to try the API and give us feedback on
what the pain points are, what you'd like the
thing to do for you, but also what's going great
and what you like about it. So thank you, everyone,
for coming to this session. Ricardo and I will be at the
Web Sandbox for the next hour or so if you want
to discuss more. Thank you. [MUSIC PLAYING]
So many "You can't do this in CSS!" examples.
Why doesn't the CSSWG fix this?
Maybe because it shouldn't be the job of css to do heavy operations when it comes rendering a page. Also, I think it would overly bloat the language and make it harder for browser makers to maintain parity when implementing the spec.