SURMA: (BEATBOXING)
Let's do some 203. Are we good? JAKE: We are good. [THEME MUSIC PLAYING] SURMA: I don't know
if you've noticed, but we've built a thing. JAKE: Are we going to
talk about Squoosh again? SURMA: A little bit. JAKE: OK. SURMA: Maybe, but another
aspect of Squoosh. JAKE: All right. SURMA: So that's
kind of interesting. JAKE: OK. SURMA: So this might be a
long one, so bear with me. I'm going to start
at where we started, and then we kind of fell
down into this rabbit hole. And I want the audience to fall
into this rabbit hole with us. JAKE: Yes, and I'm really
looking forward to this one. Because sometimes when
we do these, one of us is maybe slightly pretending
to know less about the subject than we do. Whereas in this
one, there's a lot that I really don't understand. So-- SURMA: And I'm really worried
that I might not actually be able to explain everything
as much as you would like me to. JAKE: OK. Well, I'll-- SURMA: So let's see
where we end up. JAKE: Yes, I'll let
you know honestly. SURMA: Let's start with,
what are images on the web if we manipulate
them with JavaScript? So let's talk about image
data, which is a data structure that we use in Squoosh. So once we get an image
in and we decode it, and we turn it
into an image data object, which is a
data structure that exists on a platform. Basically, it has three
properties-- the width, the height, and data. And data is a Uint8ClampedArray. And in there, you have just
four bytes for each pixel. JAKE: Yes. SURMA: And it's the first row,
and the second row, and so on. JAKE: And then each one is like
red, green, blue, alpha, right? SURMA: Exactly. So what you see
here is like it's a red pixel, then a green
pixel, and a blue pixel, and a white pixel. And because the image is 2 by
2, that is what the image would look like, right? JAKE: Huh, nice. SURMA: So it's
basically just a series of numbers with no concept
of rows or columns. But because of that
information, we can rearrange them
and interpret them as a proper
two-dimensional image. JAKE: Brilliant. SURMA: That's kind
of how it works. All right.
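Here is a minimal sketch of that ImageData layout, using the 2-by-2 red, green, blue, white example (the exact pixel values are just illustrative):

    // A 2-by-2 image: four bytes (r, g, b, a) per pixel, rows laid out one after another.
    const data = new Uint8ClampedArray([
      255, 0, 0, 255,     // pixel (0, 0): red
      0, 255, 0, 255,     // pixel (1, 0): green
      0, 0, 255, 255,     // pixel (0, 1): blue
      255, 255, 255, 255, // pixel (1, 1): white
    ]);
    const image = new ImageData(data, 2, 2);
    // image.width === 2, image.height === 2, image.data === data

SURMA: So now, in Squoosh,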
we had the goal to rotate an image
by 90 degrees. JAKE: Sounds like
a simple thing. Probably only take 10 minutes. SURMA: I mean, you wrote it-- the first version, right? And so let's talk
about how you wrote it. Your rotate-an-image-by-90-degrees function gets an input image, which
is this image data object. JAKE: Yes, it is. SURMA: And what we
do, we figure out by 90 degrees, what is the
width and the new height, which is pretty much just
height and width swapped. JAKE: You're doing fancy
Surma code already. SURMA: It's a little bit because
otherwise it wouldn't fit. So I'm compressing things down. JAKE: Right, OK. SURMA: This is
actually kind of two-- JAKE: So here,
you're essentially assigning the
height to the width, and the width to the height,
because it's 90 degrees. Right, OK, I'm following. SURMA: And I'm
creating a new output image, which has this new
width and the new height. JAKE: Yes. SURMA: So now the goal is
to go through the pixels and put them in the right
spot in the output image. JAKE: Ba ba ba ba ba. SURMA: So what we do,
we for loop over all the pixels in the input
image, and we figure out where they would have to
land in the output image. So basically the
new x-coordinate is that kind of formula,
the new y-coordinate's that. Then we figure out
which input pixel it is, which output pixel,
and just copy it over. JAKE: More fancy
Surma code here-- wouldn't get through review. SURMA: I know. You don't like it. JAKE: OK, that's fine. SURMA: And then because we
have four bytes per pixel, we just loop over four times,
and just do another thing. We copy the r value, the
g value, the b value, and the a value. JAKE: Yep. And off we go.
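Roughly what that first, byte-by-byte version looks like -- a sketch rather than the exact Squoosh code, assuming a clockwise rotation:

    function rotate90(input: ImageData): ImageData {
      const { width, height, data } = input;
      // The output has width and height swapped.
      const output = new ImageData(height, width);

      for (let y = 0; y < height; y++) {
        for (let x = 0; x < width; x++) {
          // Input pixel (x, y) lands at (height - 1 - y, x) in the output.
          const newX = height - 1 - y;
          const newY = x;
          const inIndex = (y * width + x) * 4;         // 4 bytes per pixel
          const outIndex = (newY * height + newX) * 4; // output rows are `height` pixels wide
          for (let i = 0; i < 4; i++) {
            // Copy r, g, b and a, one byte at a time.
            output.data[outIndex + i] = data[inIndex + i];
          }
        }
      }
      return output;
    }

SURMA: And this works. And this was actually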
decently fast. We shipped it this way. JAKE: We should say the reason
we did this rather than canvas is because we wanted
to run it in a worker. SURMA: That's an entirely different story. But yes, we did a lot of tests where the fancier [INAUDIBLE] technology didn't seem to work. So we ended up writing our
own piece of JavaScript just for this problem. JAKE: Yes, because
offscreen canvas, only in a couple of browsers,
whereas this is just basic-- SURMA: JavaScript. JAKE: --JavaScript, so
that works everywhere. SURMA: And it can
run in the worker because it only needs the image data. So this-- we shipped this. This worked. JAKE: Yes. Yes, it did. SURMA: And then I looked at
some point, and was like, hmm, there's actually kind of
an obvious optimization that you missed. And so I basically
added a little patch. This all stays the
same, same as before. But now I'm creating
a u32 array. JAKE: Yes, yes. SURMA: But basically we
have the same underlying chunk of memory, but instead of
seeing it as a series of bytes, we see it as a
series of 32-bit numbers-- because every pixel
consists of a 32-bit number, right, for r, g, b, and a. And so this way, we can
simplify or actually remove the inner loop. JAKE: So it's this bit
that was here that-- doing something four
times every time, we're now just doing it once. SURMA: It's now one copy
operation which actually maps to a machine instruction. Most of the time, V8 will be, like, super smart and go, like, whoa, fast.
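A sketch of that patch, with Uint32Array views over the same buffers (again illustrative, not the exact Squoosh code):

    function rotate90(input: ImageData): ImageData {
      const { width, height } = input;
      const output = new ImageData(height, width);
      // Same memory, viewed as one 32-bit number per pixel instead of four bytes.
      const inView = new Uint32Array(input.data.buffer);
      const outView = new Uint32Array(output.data.buffer);

      for (let y = 0; y < height; y++) {
        for (let x = 0; x < width; x++) {
          const newX = height - 1 - y;
          const newY = x;
          // The inner 4-byte loop becomes a single copy.
          outView[newY * height + newX] = inView[y * width + x];
        }
      }
      return output;
    }

SURMA: So this was actually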
quite a bit faster. So, cool. And then we ship
this-- still fine. And then it turns out that for
some reason, in one browser, this was super slow. JAKE: Right. And we've been
advised by legal-- SURMA: By our legal department
to not name the browser. JAKE: Apparently it's a
Chrome policy not to-- SURMA: I've never heard
that before, but-- JAKE: No, we're not allowed
to talk about other browsers. So we can't mention
which browser it is. SURMA: But it's one that
didn't run in our machines. JAKE: It didn't run on
your machine, did it? You had to use a VM to run
this different browser. OK. SURMA: Either way-- like, most
browsers were fine, good enough at least, and then
for some reason, this one browser just ended
up being extremely slow, like unreasonably slow. So we must have hit
some weird corner case. JAKE: Yes. SURMA: Because this
browser isn't slow usually. It's a very good browser. JAKE: Yes, and different
JavaScript engines optimize with different things. So the fact that one
browser was slower here isn't saying that that
browser is terrible. It's just saying V8 is very good
with this kind of tight loop code, over engines that
have optimized for, like, more dumb bindings stuff. SURMA: Exactly. JAKE: So it wasn't
that surprising that one browser was completely
different in terms of problems with this piece of code. SURMA: So we thought,
well, what do we do? Maybe we throw WebAssembly
at the problem, right? JAKE: Aye. SURMA: So we looked into that. And the first
problem we had was that, when you write WebAssembly
and you load it, it turns into a module
that has functions, the functions that you
wrote in whatever language you were using. JAKE: Yes. SURMA: Right? JAKE: This is different
to an ECMAScript module. It's a Wasm module. It's a different thing. SURMA: It's a different thing. And these functions can only
take in and return numbers. So there is no easy
way, straight up, to pass in an image. So what do you do, right? So what we ended up doing-- I'm going to reuse the video
I made for my article-- JAKE: Oh, brilliant. SURMA: --basically,
the JavaScript was going to load the image, put
it into the WebAssembly memory, and then we're going to
use WebAssembly to just do the reordering within that
WebAssembly memory buffer and use JavaScript to
read it back afterwards. JAKE: Right. SURMA: So that means
the WebAssembly really is completely isolated from
all of the outer world, really, so to speak. It just has its chunk
of memory to work on, and will read in the
image, do the reordering that was shown before,
and then JavaScript comes back, takes over, and
reads back the resulting image. JAKE: So JavaScript
and WebAssembly, the thing they share is memory. That they-- SURMA: Pretty much. JAKE: WebAssembly,
it's its memory. SURMA: So this
WebAssembly.memory is WebAssembly-specific memory. But it is also exposed
as an array buffer that we can use as a u32
array or whatever we need in that very instance, right? JAKE: So the amount of memory
we need for WebAssembly is essentially double
the size of the image, because it's going to-- SURMA: Yeah. JAKE: --have the main image in
memory and then the next bit. OK. SURMA: So how do we create Wasm? We've done it before
with Emscripten and C. There's also Rust. But we actually found a
very interesting project we stumbled over
called AssemblyScript. JAKE: Yes. SURMA: Which is a-- they call themselves
a TypeScript-to-WebAssembly
compiler, which is true. But might be a little
bit misleading. Because you can't just
take any TypeScript and compile it to WebAssembly. It is using the
TypeScript syntax and the TypeScript
standard library things, but with their own version of the standard library that is specifically tailored to WebAssembly. So what you can see here is the function signature. Now we have types, as you
know, from TypeScript. But there's the i32 type, which
is the type WebAssembly has, but JavaScript doesn't. JAKE: And that's the
32-bit integer, right? SURMA: Yes, the
signed 32-bit integer. JAKE: Signed. SURMA: There's also u32, which is the unsigned one. JAKE: Why are we using signed? SURMA: For reasons. JAKE: Four reasons? OK. Let's gloss over it. This is good. Because I can recognize this. It looks a lot like JavaScript. It looks a lot like TypeScript. SURMA: And so will the
rest, except for two lines. So this looks the same. So we switch height and width. JAKE: Yep. SURMA: Now this is
a bit interesting. Because we have this
chunk of memory, we need to know where
our input image starts and where our
output image starts. That's what these
two variables are. So our input image starts at
0, at address 0 in this memory. JAKE: Which it always does. SURMA: Index 0, you can say,
and the output image is right after the input image ends. And the input image consists of
width times height times four-- JAKE: Four bits per pixel. SURMA: Bytes. JAKE: Bytes per pixel. [LAUGHTER] See the thing
about this, and I'm sorry to interrupt the flow. I should say that
I came to the web as a CSS person, CSS Front-End,
and I learned JavaScript. Whereas you came to
the web from being a programmer-- well, and then
you went to the web, right? SURMA: [INAUDIBLE] I
did embedded systems. Like I was literally
writing kernel code and low level
memory management. And I had no idea
about CSS and how to do UI and anything like that. JAKE: Right. SURMA: It's just two
completely different angles. JAKE: But I would say that
if anyone is watching this, thinking what is going on? SURMA: Yeah. This is-- JAKE: I am feeling
exactly the same. So don't worry
too much about it. All right. Come on. Let's-- Let's go. SURMA: But for now, these are
basic indices in the array. Where does the
input image start? Where does the
output image start? And then this looks familiar--
looping over all the pixels. JAKE: Yep. SURMA: And figuring out where
the new coordinates are. We did all this before. And now there's these two
AssemblyScript specific functions. The first one is
load, which allows me to load a u32 from the
memory at a given address. JAKE: Right. SURMA: And so in this
case, what I'm doing is I'm using the input image
space, where the image starts, plus the pixel I want to read. JAKE: So this is very similar
to what we were doing before with the uint32array. But it's where-- but there's
a special command to get it straight from
memory rather than-- SURMA: Yeah. Because it's a
WebAssembly memory. And that's, like-- JAKE: Right. SURMA: --implicit. It's not something you
get handed as a reference. It's almost like a global. JAKE: But it's the same thing. We're passing the
same indices into it. Yes. SURMA: Exactly. JAKE: OK. SURMA: So we're
loading our pixel and then all we have
to do is write it back to the output image. And it's the same thing: storing the value v, which we just read, back as a u32 into the output image space. JAKE: OK. SURMA: And now we have written AssemblyScript.
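A sketch of what that AssemblyScript could look like, along the lines described here (not the exact Squoosh code; the address arithmetic is kept in usize so the types stay consistent):

    // The input image sits at byte 0 of the WebAssembly memory,
    // and the output starts right after it.
    export function rotate90(width: i32, height: i32): void {
      const w = <usize>width;
      const h = <usize>height;
      const inputStart: usize = 0;
      const outputStart: usize = w * h * 4; // 4 bytes per pixel

      for (let y: usize = 0; y < h; y++) {
        for (let x: usize = 0; x < w; x++) {
          const newX = h - 1 - y;
          const newY = x;
          // load() and store() are AssemblyScript built-ins that read and
          // write raw WebAssembly memory at a given byte address.
          const v = load<u32>(inputStart + (y * w + x) * 4);
          store<u32>(outputStart + (newY * h + newX) * 4, v);
        }
      }
    }

JAKE: And then this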
converts to WebAssembly. And what really
struck me with this is that if I wanted
to write WebAssembly, this is the tool I would use. SURMA: Yeah. JAKE: Because this looks
really familiar to me. SURMA: You don't have to
learn a new language, right? Because-- JAKE: Yeah. SURMA: --I think you've learned
a bit of C because of Squoosh. JAKE: Yes. SURMA: But that's pretty
much it, as far as I know. You've not written
Rust, I think. You kind of-- JAKE: I know PHP. [LAUGHTER] SURMA: [INAUDIBLE] PHP to a
WebAssembly compiler, then. JAKE: I would love it. It was the first
language I learned. SURMA: So we have
this function now. And now I want to compile
it to WebAssembly. And luckily, AssemblyScript
makes it very easy. So we just installed the
AssemblyScript package. And then we have an
asc command, which we give our TypeScript file to. And it will give us
back a WebAssembly file, with no additional
glue or JavaScript, which I think is
quite interesting. Because most other
implementations for WebAssembly give you glue code,
which is the initial-- JAKE: A huge JavaScript file,
otherwise, is really difficult to deal with and work with. But this is just-- yeah, just Wasm, right? SURMA: So we did this. We got a rotate.wasm file. And now the interesting bit
might be how to load it. Because usually, glue
code loads it for you. But now you don't
have glue code. How does this work? It's actually not
that difficult. What you do is you take
the instantiateStreaming function from the
WebAssembly object and put a fetch in there. Because the
WebAssembly compiler, at least the
non-optimizing one, can compile while the Wasm
file is still downloading.
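A sketch of that load, assuming the compiled file is called rotate.wasm (depending on the AssemblyScript build options, an import object may also be needed):

    // Inside a JS module or an async function.
    const { instance } = await WebAssembly.instantiateStreaming(
      fetch("rotate.wasm")
    );
    // instance.exports now holds the rotate90 function and the memory it works on.

JAKE: So this instantiate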
streaming takes a promise. SURMA: A promise, or
response, or an array buffer-- JAKE: That's a weird API. It's, like, why does
it take a promise? SURMA: Because they want to
make this simple-- so you don't have to await the fetch, right? JAKE: OK. I don't agree with it. But that's fine. SURMA: Sure, fine. JAKE: You should just
put an array in there. SURMA: Either way-- I find it really interesting. It starts compiling while
it's still downloading. So it's not like
download, then compile. It's actually
almost in parallel, which, for WebAssembly modules that can be quite big-- you know, I think that the Unreal Engine
one is, like, 40 megabytes. That will make
quite a difference. JAKE: Yes, absolutely. Not so much here. SURMA: So no, absolutely not-- so yeah, the Wasm
module is, by the way, it's, like, 500
bytes or something. So it's really small. It's smaller than the compressed
version of the JavaScript code that we had. JAKE: Nice. SURMA: That was
actually quite cool. So now we get an instance
back from this one. And on that instance,
we can have exports. And exports is all the
functions, but also the memory that we are going to work on. JAKE: Right. SURMA: So we can
grow our memory. Because we didn't
know what size it has. But we have to grow it to a size that fits our image two times, right? Which we would
have to calculate. We'll skip this here. But I would-- JAKE: So that would just be
that the size of the [INAUDIBLE] array data times 2. SURMA: Yeah. JAKE: OK. OK. SURMA: And then I will
somehow load this image into the buffer,
which is really just-- memory has a dot buffer property
which is a normal array buffer. Plus we can use
all the [INAUDIBLE] to put data in there. JAKE: Right. SURMA: Just put it in. JAKE: Yep. SURMA: And then you call rotate90, and we read the image back, and you're done.
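A rough sketch of the whole dance, assuming the instance from the previous snippet, an imageData to rotate, the exports described here (rotate90 and memory), and WebAssembly's 64 KiB page size:

    const { width, height, data } = imageData;
    const wasmExports = instance.exports as any;
    const memory: WebAssembly.Memory = wasmExports.memory;
    const rotate90: (w: number, h: number) => void = wasmExports.rotate90;

    // Grow the memory until the input and the output image fit side by side.
    // WebAssembly memory grows in 64 KiB pages.
    const bytesPerImage = width * height * 4;
    const pagesNeeded = Math.ceil((bytesPerImage * 2) / (64 * 1024));
    const currentPages = memory.buffer.byteLength / (64 * 1024);
    if (pagesNeeded > currentPages) {
      memory.grow(pagesNeeded - currentPages);
    }

    // Copy the input image to address 0 of the WebAssembly memory.
    new Uint8ClampedArray(memory.buffer).set(data);

    // Let WebAssembly do the reordering, then read the result back from
    // right after where the input image ends.
    rotate90(width, height);
    const resultBytes = new Uint8ClampedArray(
      memory.buffer, bytesPerImage, bytesPerImage
    ).slice();
    const result = new ImageData(resultBytes, height, width);

JAKE: Ah. So exports has all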
of the methods. SURMA: So this is the method. This is the magic, where
you call into WebAssembly. And you can also see
it's synchronous. So WebAssembly is something
that will actually take the control away from
JavaScript and do its thing, and then return the
control back to JavaScript. It's just like an
actual function. JAKE: OK. OK. SURMA: Which I
think is super nice. And so this was fast, and we
were super happy about this. JAKE: Yes. This was much faster than-- SURMA: It wasn't faster
in Chrome in the sense that it didn't
outperform JavaScript. It was as fast,
or almost as fast. But it was consistently
fast across all browsers. JAKE: Yes. It had taken the
browser that doesn't run on a Mac from seven
seconds down to, like, 500 or something, 500 milliseconds. SURMA: It was very,
very acceptable. JAKE: Yes. It was really nice to
see that similar value-- SURMA: --across all browsers. So we were super
happy about this. So we opened a PR on
Squoosh and you reviewed it. And we wrote an article. And then "Hacker News" happened. JAKE: "Hacker News" happened. SURMA: And that's something
I would never say. Because usually the
comments on our articles are quite annoying. [LAUGHTER] JAKE: "Hacker News" can
sometimes be quite pedantic, I find. But in this instance,
there was some pedantry. But the pedantry was
really interesting. SURMA: It was
really interesting. JAKE: Some fascinating results-- and just a lot of it
I didn't understand. And I hope you're going
to explain it to me. SURMA: Yeah. So someone said, why
aren't they using tiling? Tiling would make
this so much faster. Let me quickly try it. And yeah, they totally did it, at something like 20 milliseconds. I was, like, what? JAKE: Yeah. So they had taken it
from-- what was it, sort of four to five hundred milliseconds down to-- what was it? SURMA: I think 40. JAKE: 40, which is such
a huge improvement. And that was even
faster than we were seeing from a canvas element. Yeah. SURMA: So I had to
obviously sit down and actually understand what's happening. So let's talk about
what tiling actually is. JAKE: Yes. Please do. Because I have no idea. SURMA: So I'm going
to explain tiling. But there was also
another suggestion for performance optimization. I'm going to talk
about both of these. But I'm going to get
the other one first to get it out of the way. Basically, some
people were saying, oh, if you look at
this y times width, it's completely independent
of the inner loops. If I move it out between the
outer and the inner loop, that would make it faster. Because that calculation
can happen only once per outer loop. It doesn't need to happen
every time in the inner loop.
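A sketch of what that suggestion amounts to in the JavaScript version: y * width only depends on the outer loop, so it can be computed once per row instead of once per pixel.

    function rotate90Hoisted(input: ImageData): ImageData {
      const { width, height } = input;
      const output = new ImageData(height, width);
      const inView = new Uint32Array(input.data.buffer);
      const outView = new Uint32Array(output.data.buffer);

      for (let y = 0; y < height; y++) {
        const rowStart = y * width; // hoisted: computed once per outer iteration
        for (let x = 0; x < width; x++) {
          outView[x * height + (height - 1 - y)] = inView[rowStart + x];
        }
      }
      return output;
    }

JAKE: Yes. And I thought this was going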
to be the kind of thing that the optimizer thingy doo
dah would take care of for me. SURMA: And it is. JAKE: Ah. SURMA: So this is
the kind of advice where you don't have to worry
about these kind of things. Like moving constants
out of a loop is something that not only most
compilers can do-- so, like, the [INAUDIBLE] compiler could
do this or the Rust compiler-- but even the V8 compilers that go from JavaScript to machine code or
from WebAssembly byte code to machine
code, will do this. So this is an optimization
that we don't have to do. And where we can say let's
keep it readable and obvious and don't introduce
another variable where people that
read the code would have to have even more
state in their head to understand what's going on. JAKE: Yes. OK. SURMA: But the other
thing is tiling. And tiling is something
that I hadn't heard of. I actually had heard of it. But I also was
under the impression that compilers
would do it for us. And in this case, it is not. JAKE: What is it? Tell me. SURMA: So what is tiling? So this is an image-- JAKE: Correct. SURMA: --it's actually the
album cover of our podcast. I don't know. Did you know that we
do a podcast, Jake? JAKE: We do a podcast, as well. SURMA: We should link to it
in the description, Jake. JAKE: Yes. We should. SURMA: So we have been reading
this image so far like this. We've been going row by row. JAKE: Yeah. SURMA: And just--
what is this pixel? Where does it belong? OK. Copy. And look at the next
pixel in the same row. That's kind of what we
did, and we thought fine. Tiling is a different
approach where you tile the image into tiles. JAKE: That's good. Yeah, those are tiles. Excellent. SURMA: And then do
whatever you're trying to do within a tile first. So instead of going row by
row, you just go tile by tile. And within the tile,
you go row by row. JAKE: This is legitimately-- SURMA: It's the same thing. JAKE: This is legitimately
a different way of doing the same thing. SURMA: I know. Now the interesting thing
is that this turned out to be so much faster. JAKE: Yeah. Like, a tenth of the time. I still don't understand yet. SURMA: So let's implement
this real quick. Which it's not actually-- JAKE: Can I just say, in one of our recent episodes, we talked about the dangers of over-optimization. SURMA: Yeah. JAKE: And why are we doing this? SURMA: Because it ends
up being so much faster. JAKE: OK. OK. OK. SURMA: With this optimization, we actually end up going well below
100 milliseconds, which within the RAIL
guidelines makes it feel like an instantaneous
response to the button. JAKE: It is an
optimization that matters. Cool. SURMA: And before that, we
were at, like, 300 to 500. Which was fine, but if
you can go under 100, we should go under 100. JAKE: Especially
for bigger images. OK. SURMA: So basically, I just do
an additional two outer loops. Which usually sounds
wrong, but in this case is very, very right,
where we iterate over all the tiles that we have. And then, in there, we basically
have the same old loop, where we loop over
each individual tile. JAKE: I'm starting
to hyperventilate. Why? OK. SURMA: So this is
tiling implemented.
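A sketch of the tiled version (illustrative, not the exact Squoosh code; the tile size is a tuning knob, and something around 16 worked well in the benchmarks discussed later):

    const TILE_SIZE = 16;

    function rotate90Tiled(input: ImageData): ImageData {
      const { width, height } = input;
      const output = new ImageData(height, width);
      const inView = new Uint32Array(input.data.buffer);
      const outView = new Uint32Array(output.data.buffer);

      // Two extra outer loops walk the image tile by tile...
      for (let tileY = 0; tileY < height; tileY += TILE_SIZE) {
        for (let tileX = 0; tileX < width; tileX += TILE_SIZE) {
          const maxY = Math.min(tileY + TILE_SIZE, height);
          const maxX = Math.min(tileX + TILE_SIZE, width);
          // ...and inside each tile, it's the same old loop as before.
          for (let y = tileY; y < maxY; y++) {
            for (let x = tileX; x < maxX; x++) {
              outView[x * height + (height - 1 - y)] = inView[y * width + x];
            }
          }
        }
      }
      return output;
    }

JAKE: So I get it. SURMA: Let's talk about why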
this might make things faster. JAKE: OK. That is the bit I
don't understand. SURMA: So originally, tiling-- when I Googled tiling
and researched it, it was mostly the use case
for matrix multiplication, which is a different use case. Because input values
are used multiple times. If you multiply
two matrices, you have to read the cell
at 1, 1 multiple times-- for each column of the output matrix that you're calculating, I think. JAKE: OK. SURMA: So it makes sense
that, if you do tiling, you have a better chance
of having that value still in the cache. We're talking now processor
level 1 cache, by the way. JAKE: So hang on. OK. We will need to explain what
that is at some point, as well. But my feeling is, by
reading memory sequentially, you're more likely
to hit caches. Because you're dealing
with a little bit of memory that was very close to
the last bit of memory. SURMA: So if I have
these two really big matrices, and I go through the
first row of the input matrix, by the time I come-- I end up at the end, the
values from the start might have been kicked
out of the cache. Because level 1 cache in the
processor is really small. We're talking like 200 kilobytes
of cache, maybe, or less. JAKE: Right. So the processor
has an L1 cache-- SURMA: Which is,
like, super fast. JAKE: --so that there's
this set of caches that gets bigger and slower. SURMA: Yeah. JAKE: Until you get to memory-- SURMA: Actually, memory is really slow. JAKE: Memory is really slow-- in relative terms. SURMA: Yeah. JAKE: OK. SURMA: And so what
the tiling does is, by shortening
the amount of time you spend going away
from the initial value, you have a better chance of
having the initial value still in the-- JAKE: Buh, buh, buh,
buh, buh, buh, but-- SURMA: For matrix
multiplication. So with this one-- JAKE: Yeah. SURMA: This is
going to make sense why this would make it faster. JAKE: Because the second row is
a massive jump from the first-- SURMA: Yeah. The rotation-- we
read every value once, and we write it once. So why would caching
make things better? JAKE: That is roughly the
question I have in my head. SURMA: So there's two theories. And I don't know which one
of them is actually true. JAKE: Are you telling
me you don't even know? SURMA: Well, I even talked to
Benedict, our V8 VM engineer. And he's, like, I
have two theories. But it's really hard to test. JAKE: OK. OK. SURMA: So one theory
is that [INAUDIBLE] are really smart at
predicting what memory you are going to grab next. So by basically
seeing the tiles, it can make better
predictions what-- JAKE: Oh. SURMA: --cells to grab,
already put into the cache for you, even though you
haven't executed that code yet. And the other thing is that,
because the cache is so small, that there's a certain
pattern of which memory address can be cached in which cache cell. So this gets a
little bit confusing. But basically, if
you think about it, if you have like
three cache cells, just three individual cells-- JAKE: What can go in a cell? SURMA: One value. JAKE: One value, OK. OK. SURMA: OK. So memory address zero can only go in cache cell zero. Memory address one can go in cache cell one. Memory address two can go in cache cell two. Memory address three can only go in cell zero again. You wrap around, right? So you assign those. JAKE: Yep.
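A sketch of that mapping -- a simplified direct-mapped cache, assuming the three cells from the example (real caches map whole cache lines rather than single values):

    const CACHE_CELLS = 3;
    // Each memory address can only ever live in one particular cache cell.
    const cacheCellFor = (address: number) => address % CACHE_CELLS;

    cacheCellFor(0); // 0
    cacheCellFor(1); // 1
    cacheCellFor(2); // 2
    cacheCellFor(3); // 0 again -- it wraps around, so address 3 evicts whatever address 0 put there

SURMA: And then again,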
by keeping it smaller, you have a better chance of
not overwriting the old value that you have put into
your level one cache. So basically, all this is
about is making the area of memory you access smaller, so that you don't evict the things that you already have in the cache-- JAKE: No, this is
basically, it's working because our
inner loops are smaller. SURMA: Yeah. JAKE: Right. SURMA: So it makes the processor
make better predictions and also make the processor
not evict the cache. Because the area we
work on is smaller. JAKE: So then does
the tile size-- yeah, what's the tile size? SURMA: So that's what
I thought, right. And so I did some benchmarks. On a MacBook, on an
iMac, and a Pixel 3-- because the bigger the machine
or the bigger the processor, the bigger the level
one cache usually is. JAKE: Right. SURMA: So the iMac that I have
is an 18 core massive processor thing. It has massive [INAUDIBLE],
while the Pixel 3, obviously, has a very, very
tiny level 1 cache. JAKE: All this code is
single core, anyway, right? SURMA: Yeah. JAKE: Yeah. SURMA: So basically, at
zero is the relative time it took for no tiling. JAKE: So that's the original-- SURMA: That's the baseline, no tiling. JAKE: --wasn't it, yes. SURMA: So what you
can see here is how the time shifted
relatively to that base time, depending on
what the tile size is. JAKE: Interesting. SURMA: So if I have
a tile size of two, a two-by-two pixel grid,
it makes the code slower. Which is not very
surprising, because you have so much more looping
going on and more jumps. JAKE: OK. SURMA: It gets faster
really, really quick. At some point, over here, you
kind of hit level one cache boundaries where it
then gets slower again. JAKE: Right. I see. OK. SURMA: To be honest, there's
one weird thing where the Pixel 3 is slow, even with
the massive grid, which I'm not quite
sure why that is. I think-- JAKE: You mean it's fast even-- SURMA: Yeah. JAKE: It's faster. SURMA: I expected the
Pixel 3 was going to, like, go up somewhere around here. JAKE: You would assume
the level one cache is smaller than any MacBook's. SURMA: It probably is. And there's probably
another effect here that I don't
quite understand. JAKE: OK. SURMA: But what I felt-- JAKE: There's different
architecture, as well, in that processor. OK. SURMA: But it seems to be a
sweet spot between, like, 16 and, I don't know, 64,
depending on what you want. I think 16 looks really
promising in this graph. Which means you have, like,
a 256 pixel grid that you work with. JAKE: I thought I was going
to come into this episode and I was going to
go away understanding why the tiling works. No. It just does. SURMA: I spent the
last week on this. Right? JAKE: Right. SURMA: You've been kind
of sitting across from me and hearing me talking to people
and trying to figure this out. This is as close as I've gotten to understanding it. In that there is
this interaction between the processor predicting
what values to put in the cache. And then, not forcing the
processor to evict that cache, because you read too far ahead. JAKE: But this is a massive case
for tools, not rules, right? Don't go away and
rewrite all your code-- SURMA: With tiling. JAKE: --with tiling. SURMA: No. Right. JAKE: This is
something you would have to very carefully profile
on a wide range of machines with different processor
architectures to see if this is actually
working across-- SURMA: And also I
find it interesting because we started with, let's
rotate an image, a very high level use case. And we fell down. And ended up with,
like, let's talk about processor architecture
and level one caches. JAKE: Yes. SURMA: So thanks
to "Hacker News", I guess, for ruining my week. But it's been actually
been very educational, even though I still don't
fully understand it. JAKE: But I feel like-- SURMA: I'm OK with that. JAKE: Yeah, and I feel that my
understanding of lower level stuff is-- like I say, there's
that confusion element. But I feel like I've got
an appreciation for-- SURMA: The smarts, right-- JAKE: Yeah. SURMA: --that go into that. JAKE: It's incredible. SURMA: So let's take a breather. And we'll see our poor audience next time. [LAUGHTER] JAKE: But this is
going into Squoosh in-- SURMA: Yeah. It's going to be-- [INTERPOSING VOICES] [MUSIC PLAYING] JAKE: Yes. SURMA: Oh. That's the one thing-- ah. [GRUNTS] I have to
write this down. How do I fix this? JAKE: [LAUGHS] (SINGING)
Something for the edit. Something for the edit. SURMA: OK. Let's go from here.