(lively music) (audience applauds) - I'm sure that none of you
really need an introduction for our closing plenary speaker. I know everyone is really tired but if you could all get up
some energy, put it together for Chandler Carruth, who is going to scare the heck out of us about the things that can get into our systems. (audience applauds) - All right everybody. How are folks doing, folks like energized, you sticking strong to
the end of the conference? It's been a long week. I'm here to talk to you about Spectre. How many folks here do not
know anything about Spectre, have no idea why this is
even an interesting talk? It's okay, you can put your hand up, it's not a problem. A few people. Spectre is a big security issue that was, kind of, uncovered over a year ago. It seemed really interesting to me to come and give a talk
about this, in part because, last year, I was up on a
stage here giving a talk. I was really hoping to
actually roll the video for you but I don't think we managed to get all of the technical
issues sorted out here. But the key thing here is the very last question in my talk last year, which I didn't give a very good answer to. Someone asked me, the whole talk was about
speculative execution. If you haven't seen it, it's a great talk, not to self-promote but it's a great talk. At the end of it, someone asked, what happens to instructions
that are speculatively executed if they would, like crash
or do something weird? Very fortunately for me, that
was at the end of the talk, I said that, I don't know
and I kind of blew it off and I said the session was over and we wrapped up for the day. That's not a great response and so, I'm gonna give you an entire talk instead. (audience laughs) Before we get too far into it, we gotta set some ground rules. I'm talking about security issues today and I'm actually not a security person, you may not know this,
I'm not a security expert. I'm not gonna know all of the answers that I'm talking about
here okay, and that's okay. That's part of why a bunch of
the Q&A is gonna be in a panel that we have after the talk so I can bring some other experts who've been working on Spectre with me and with a lot of other
people in the industry up onto the stage and they can help me out in answering your questions. But we need some ground rules 'cause security can be
tricky to talk about. A good friend of mine who's
also been working on this was at a conference and he was
talking about security issues and they were having a great hallway conversation, and he ended up tweeting something like, I think I can probably attack this particular vulnerability this way, and didn't really give a lot
of context, it was a tweet, we've all made tweets
without adequate context. And so, the Las Vegas Police Department came and talked to him about
exactly why he was figuring out how to attack people at this conference. He had to have a very long conversation with them, I don't want any of you
to have that conversation, I don't wanna have that conversation. So, we're gonna try and use careful words, I don't really want to talk
about exploiting things, I want to talk about vulnerabilities. I don't want to talk about attackers, I wanna talk about threat actors. Sometimes, these people
are actually white hats, they're actually working
for the good people, they're trying to find vulnerabilities. I'm not gonna be perfect but
I just wanna encourage people to really think about what words we use as we're talking about this stuff. The other thing I gotta
tell you is, unfortunately, with a sensitive topic like security, I am not gonna be able to
say everything that I know. I couldn't say it last year and I'm still not gonna be
able to say everything I know. I'm gonna do my very best
but please understand, when I have to cut you off
or say like I'm really sorry, I can't talk about that, please
be respectful, understand, I'm doing everything I can
but there are restrictions. These deal with issues
spanning multiple companies, sometimes intellectual property issues and also security issues
where we can't disclose things without kind of, responsible
time for people to get patched. And that last point brings
me to another thing. If you're out here, and there are some really brilliant people in the room, I'm sure, and you think, I've got it, I totally see this way more awesome way to break through that system, to find a new vulnerability, I would ask you, don't
come up to the microphone in public with that, right here and now because none of the security people really like to have
instantaneous disclosure, they like responsible disclosure. I'm happy to talk to you offline, I'm happy to point you at other people who can talk to you offline and figure out how to go through that process. That said, I do want you to ask questions, especially at the panel, please come up to the
microphone with questions, just understand, if we have
to push back a little bit, we're doing what we can to
try and keep this discussion at the right level 'cause
we're talking about very recent and very current events. With that, let's get started. When I first started working on this, I actually had a hard time
even following the discussions, I felt like I was a kid, I
didn't know what I was doing and a lot of that was
because there's background and terminology that I simply didn't have. I can't give you all of that, I don't have all of it myself,
I'm not a security researcher but I'm gonna try and give
you enough for this talk. First off, we have vulnerabilities. This is a common term and it's pretty obvious: it's
some way you can take a system and cause it to behave in an unexpected and unintended manner. Not too fancy. But a gadget is a weird
thing, in a security context, we mean something very
specific, by the term gadget. We mean some pattern of code, some thing in a program, that you can actually leverage to make a vulnerability work. These tend to be the
little building blocks of vulnerabilities. So, whenever you hear security
people talking about a gadget in the code, that's what we mean. Let's get to slightly more
interesting terminology. An information leak. This is a kind of vulnerability. There's a very classic
example, Heartbleed. What does an information leak do? Well, it takes information that
you shouldn't have access to and it gives you access to it. But I don't think talking about it is the easiest way to figure this out, so let's see if we can actually show you what an
information leak looks like. Hopefully, my live demo
actually works here. I've written probably the simplest information leak that you'll ever find. We have some lovely data
here including hello world and hello all of you, but
we also have a secret, something we don't want to share publicly. We have a main function
that's gonna go through and process some arguments. This could be just any old
API that takes untrusted input and it tries to validate it. We try and make sure that, if we don't have the argument, we give it a nice default; if we do, we actually set it from the command line. We extract this length and then we even bounds check it, but we wrote our bounds check in a really funny way. Some of you may be reading this bounds check and just being like, uh-uh, this isn't gonna end well for you buddy. Unfortunately, it's not.
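Roughly, the program has this shape; what follows is a simplified sketch with made-up strings and names rather than the exact demo code, but the broken check has the same flavor.

    #include <cstdio>
    #include <cstdlib>

    // All of the data lives in one buffer, with a secret sitting right
    // after the strings we actually intend to print.
    static const char data[] =
        "Hello world!\0Hello CppCon!\0SECRET: do not share";

    int main(int argc, char **argv) {
      // Untrusted input: how many bytes of the greeting to print.
      unsigned length = (argc > 1) ? std::atoi(argv[1]) : 13;

      // The bounds check written "in a really funny way": it checks against
      // the whole buffer rather than the one string we meant to expose.
      if (length >= sizeof(data))
        length = sizeof(data) - 1;

      std::fwrite(data, 1, length, stdout);
      std::putchar('\n');
      return 0;
    }

Let's take a look at how this actually works. So, if I run this program, it doesn't do anything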
interesting, it has defaults, it says, hello world. But I don't like just
talking to the world, let's talk to all of you, hi everybody. This is all fine. And we see we have a length
of 13 that's our default. If I give it a small length,
it just truncates it off that's fine. But what happens if I give
it too long of a length? Uh-oh. This is because my bounds
check isn't very good. And if I give it a long enough length, it's actually going to print
out all of this secret. It wasn't intended, I
didn't write any code that would've allowed it naturally, to go and read that secret. If I try and just give it a higher index, it's like no, you can't read it. But because there's a bug in my code, I could have an information
leak, and this is literally the core bug behind Heartbleed, this is how Heartbleed happened. Is everybody happy with information leaks? Let's talk about side channels. Side channels are the next
core component of this. A side channel is some way
of conveying information using the natural behavior of the system. Without setting up some kind of explicit communication channel, can we embed a communication inside of something that's already taking place, something that's routine and common and expected to take place? You'll see in some
discussions, this gets kind of muddied with the term covert channel. I don't particularly like using that term for things like Spectre. A covert channel, I understand much better by thinking about old-fashioned spy, who here likes spy movies? I've got some people who like spy movies. Covert channels are like spy movies. That's like when you say, when I raise my blinds on the
third Wednesday of the month, we meet, that's a covert channel. It's not a normal thing, I'm not always raising
and lowering my blinds, it's just that, it doesn't look like a communication mechanism but it is intentionally set up
as a communication mechanism and used for that purpose. A side channel is not something
we intentionally set up, it's just something we
can take advantage of that was already happening. Let's look at a side channel. Again, I think seeing
this stuff is a lot better than just describing it. I built a little side
channel demo for you all but unfortunately, this is
gonna be a lot more code so, I'm gonna try and step through it. It's okay if you don't understand
everything, like I said, we're gonna have a whole panel, but I'm gonna try and give you at least, the gist of how this works. The first thing I have is a secret. The secret is just a string,
it's nothing too fancy and I have some code
that does forced reads, and I have some timing code,
I have some random math code that's not super important. The main body of this is
this leak bytes thing. The very first line of
this, up at the top, I have a timing array
and this timing array is a big array of memory that I
can access in different ways to access different cache
lines on a modern processor. I then extract this string
view, this nice string view which tells me about this
in bounds range of text and I build some data structures
to collect information, latency and scores. And then we start runs,
and we do a bunch of runs until we get enough information to believe that we have actually found
some information embedded in another medium, in this case,
in a timing side channel. First thing we do is, we
flush all of the memory then we force a read
but not just any read. We load the information out of data then we use that to
access the timing array. And we access it not just
locally but at strides. And so this means that,
for different values in this data array, I'm gonna
access different cache lines. Then, I have to go and see
whether that was successful and in order to see
whether it was successful, I have a loop down here
which kind of shuffles
the way I access memory and then, accesses each
and every cache line in this timing array, does a read and computes the latency of that read. It is just timing each cache
line in a way that lets us see whether one of these
cache lines was faster than all of the others, because
we've already accessed it, we accessed it right before,
in the previous loop. Makes some sense? And then, we go and we
find the average latency because we don't wanna
hard-code any constants here. If one of the latencies, if one of the latencies
for one of the cache lines was substantially below
the average, then we think, cool, that was probably a signal embedded in our timing side
channel, we bump the score and if we get the score high
enough, down here at the bottom, if we get the score high enough, we gain confidence: yeah, we found our signal, we've actually found the information. Makes sense to folks?
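Boiled down, the receive side of that channel looks something like this; it's a simplified sketch with invented names, and it leaves out the shuffled access order and the average-latency scoring I just described.

    #include <x86intrin.h>  // _mm_clflush, _mm_lfence, _mm_mfence, __rdtscp
    #include <cstddef>
    #include <cstdint>

    // One cache line per possible byte value, spaced a page apart so the
    // prefetcher doesn't blur them together.
    constexpr std::size_t kStride = 4096;
    alignas(4096) static volatile char timing_array[256 * kStride];

    // Time a single load; a cache hit is dramatically faster than a miss.
    static std::uint64_t timed_read(volatile char *p) {
      unsigned int aux;
      std::uint64_t start = __rdtscp(&aux);
      (void)*p;
      _mm_lfence();
      return __rdtscp(&aux) - start;
    }

    // The sender side of the channel: touch exactly one line, chosen by value.
    static void transmit(unsigned char value) {
      (void)timing_array[value * kStride];
    }

    // The receiver: flush every line, let the transmit happen, then see
    // which line comes back suspiciously fast.
    static int receive() {
      for (std::size_t i = 0; i < 256; ++i)
        _mm_clflush(const_cast<char *>(&timing_array[i * kStride]));
      _mm_mfence();

      transmit('!');  // stand-in for the access we don't directly control

      std::uint64_t best = ~0ull;
      int best_index = -1;
      for (std::size_t i = 0; i < 256; ++i) {
        std::uint64_t t = timed_read(&timing_array[i * kStride]);
        if (t < best) { best = t; best_index = static_cast<int>(i); }
      }
      return best_index;  // with luck, '!' == 33
    }

Let's see how this works. If I run this, that was pretty fast. If I run this, you're gonna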
see, it's gonna print out each of those characters. And each one of those,
it's not actually looking at the character, it's
timing the access to memory. Makes some sense? It's actually that simple. There's not more, I don't
have anything up my sleeves. Like I promised, this is like
a real, this is a real demo. You have one more key piece of core knowledge here and
that's speculative execution. We talked a lot about
speculative execution in the talk I gave last year,
I'm not gonna try and give you a full rundown on how processors
do speculative execution, the key thing is, that
it allows them to execute instructions way past what
the program currently is at and sometimes, with
interesting assumptions. Because in order to execute further along than the program currently has, the processor has to make predictions. These predictions, are
really more like guesses. And sometimes, it guesses wrong and it makes an incorrect prediction but it continues to speculatively execute and it just unwinds all of that later. But when you have this misspeculation and you combine it with a side channel, it allows you to leak
information that was only visible during that speculative execution. And that speculative
execution may have occurred with strange invariants, with
invariants simply not holding and so, you can actually
observe behavior from a program that violates the fundamental
invariants the program set up. And that's Spectre and that's
why Spectre is so confusing. You wrote the code and it
clearly, only does one thing but observation shows something else. Let's see if we can map this on. My demo for this one is going
to be essentially, Spectre v1. But I've tried to make it as similar to the previous two demos as I could. Just like last time, I have a
text table with three strings. I've hard coded it to try and
read using this second string. We can jump down to the main function and you can see what it's actually doing here. We actually are going to,
always use this text table one. That's the only thing
we hand to leak byte. We do not hand the second,
like the third entry in our text table to this routine and we hand it to string
view with a bound in it. And then this loop is
essentially, computing an out-of-bounds index into this thing and we're passing this index. But this i is always
going to be out of bounds. We're computing it based on
a totally different string. This index is never in bounds. Once we get up to the leak byte, we have a slightly different
routine, we have the same setup with one small difference. We put the size of our
string view into memory. This is me cheating so that
it fits on a slide but, the idea being that your
size might not be sitting in a register, it might be slow to access. Then we have our runs. Getting a good demo of
this is a bit tricky. One thing we need to do,
is we have to essentially, train the entire system
using correct executions before we can get it to
predict an incorrect execution. And so, I build a safe
index into my buffer of text and this is always gonna be in bounds, this index is totally fine. But it's important to note, this index is not stable,
each run gets a different one, it's not at all going to be useful for extracting any
information from this routine. The only thing it's
useful for is actually, accessing my data in a safe manner. Then I am going to flush
the size out of my cache. It doesn't matter that I'm flushing it or doing something else,
all I really need to do is make size very slow to compute. Then I wait a while. Turns out that this little
stall here is important or it doesn't tend to work. And I compute this weird local index. This local index is essentially, the training and then the attack. For the first nine runs, we just access a perfectly safe index,
but then on the tenth run, we switch to the index the user passed in. So, just nine good, a tenth one bad. Then we do a bounds check. I wanna be really clear, we always do a bounds check and this is a correct bounds check. We make sure that the index
is smaller than the size and that means, we will
never access the data out of bounds here. We hid it in a string view, a safe entity. Herb has told us all about
how safe string view is but then when I come down here, I'm going to access it using a local index and the problem is that
this access right here using the index may happen
speculatively and it may happen before the bounds check finishes and when the bounds
check was going to fail. So, it accesses an
out-of-bounds piece of memory, it uses that, scaled up,
to access the timing array then we read through that yet again and all of a sudden, we
have leaked information, we've actually accessed our side channel.
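The gadget at the heart of it, a correct bounds check that the processor can speculate past while its bound is still being fetched, looks roughly like this; again a sketch with invented names, and on its own it does nothing observable without the training runs and the timing loop around it.

    #include <x86intrin.h>  // _mm_clflush, _mm_mfence
    #include <cstddef>

    constexpr std::size_t kStride = 4096;
    alignas(4096) static volatile char timing_array[256 * kStride];

    static const char *data = "in-bounds text";
    static std::size_t data_size = 14;  // lives in memory, like the size in the demo

    void leak_one_byte(std::size_t untrusted_index) {
      // Make the bound slow to load, so the branch has to be predicted.
      _mm_clflush(&data_size);
      _mm_mfence();

      if (untrusted_index < data_size) {              // always honored architecturally...
        unsigned char value = data[untrusted_index];  // ...but may run speculatively
        (void)timing_array[value * kStride];          // leaves a cache footprint to time
      }
    }

The rest of this is the exact same code. We go through and we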
measure all the times to see like yes, did we in fact
find one of these cache lines being slower, and if so, we compute it, there's nothing else
different from this example and the previous one. And when I run this, it's actually going to print the string. And we never accessed this memory. If I made this example
a lot more complicated and moved that memory into a separate page, I could even protect the page so that any access would fault, and the program would still run fine. Because we never access
the memory directly, we leaked it through a side
channel, so that is Spectre. I know I just ran this on an Intel laptop here. If we make really good time, I'm happy to try and actually
show you this actually working on a non-Intel machine as well. I have it but unfortunately,
we had some AV issues and so, I'd have to sit here and type in passwords for like half a minute, it's not really fun. Let's, for now, kind of go back to the presentation. We've gone through and we've looked at all this speculative execution, we've looked at Spectre and mis-speculated execution, but if this were just one issue, maybe it wouldn't be that bad. It isn't just one issue. This is an entirely new class
of security vulnerabilities. No one had really thought
about what would happen if you combine speculative
execution and information leaks. They had no idea that there
was something interesting here and as a consequence,
we have had a tremendous new set of security issues coming in and I'm gonna try and
give you a rough timeline of all of this. It started off last year in June, when Project Zero at Google
informed vendors of various CPUs and other parties about the
first two variants of Spectre which are called bounds check bypass and branch target injection
or variants 1 and 2. Then, a few weeks later,
they found a third variant, it's called variant 3
or rogue data cache load or much more popularly, Meltdown. And vendors were working
furiously for the rest of the time until January, when these were
finally disclosed publicly as variants 1 and 2 of Spectre and variant 3, or Meltdown. During this time period, they were found by other researchers who were looking in the same
areas, kind of concurrently and all of the researchers
kind of held their findings in order to have a
coordinated disclosure here because this was such a
big and disruptive change to how people thought about security. Most of the companies
working in this space actually didn't have teams
set up in the right place or with the right expertise to even address these security issues. So, it was a very, very disruptive and very challenging endeavor
because it was the first time and a totally new experience. But we weren't done. After this, we started to see more things. The next one was in
March, called BranchScope. BranchScope wasn't a new form
of attack, it was actually a new side channel. Instead of using cache timings, it pointed out that you could use the branch predictor
itself to exfiltrate data from inside a speculative
execution to a normal execution, just a different side channel. We also started to see issues coming up which had nothing to do with Spectre but were unfortunately,
often grouped with Spectre because this stuff is complicated. I don't know about you all, but I think this stuff is complicated, the press thinks this stuff is complicated and they ended up merging
things together, understandably. And so, there were issues around POP and MOV SS which are weird,
Intel and x86 instructions that have a surprising semantic
property that essentially, every operating system
vendor failed to notice when reading the spec. And unfortunately, those bugs
persisted for a long time but now that people were looking at CPUs and CPU vulnerabilities, they were able to uncover
these and get them fixed. They don't have anything to
do with speculative execution or Spectre. There's also GLitch which, again, doesn't have anything to do with speculative execution on CPUs or Spectre. But there was another
interesting one in May and this is two things, variant 3a, was a very kind of, obscure
variation on variant 3 and then variant 4. Variant 4 was really
interesting, and I mean, really interesting. This one's called
speculative store bypass. This was also discovered by Project Zero and by other researchers concurrently. And this one made Spectre even
worse than it already was. So, this really kind of, amplified everything we were dealing with. And we still weren't done. The next issues were
Lazy FPU save and restore which we saw in June. This was super easy to fix,
it's kind of a legacy thing that hadn't been turned off
everywhere it should have been and it turns out there's a bug. During speculative execution, you may be able to access FPU state. That the operating system
has kind of left there from when the previous
process was running. With the idea being, that it has an, it's gonna trap if you actually access it, and once it traps, it'll save it, it'll restore your FPU state and then let your execution proceed. But the trap happens after
speculative execution. And so, you can speculate right past it, access the FPU state and leak it. This isn't arbitrary memory, but it ends up still being fairly scary because the FPU state includes things that are used by Intel's encryption instructions. And so, you would actually
put private key data in the exact place that you leaked which was really unfortunate. Again, this was mostly a legacy thing, very quickly and easily turned off. Intel and other vendors
have been providing better mechanisms than
this for a long time but we hadn't turned it off
everywhere that we needed. We have another kind of
mistaken entity in this, we got a new side channel attack that had nothing to do
with speculative execution. It's just a traditional
side channel attack on cryptographic
libraries, called TLBleed, it's a very interesting attack, it's very interesting research but it doesn't have a
lot to do with Spectre. And apparently, I have... Then in July, we started to see, in my opinion, even more interesting things coming up. These ones are called
variants 1.1, 1.2.0 and 1.2.1 or collectively, bounds
check bypass store, which is a, kind of a mouthful but this was a big, big thing. This essentially, extended variant 1 in really exciting ways
that we're gonna look at. Then later in July, we
got still more good news. We got to hear about
SpectreRSB and ret2spec, yet more variations on this. And then in July, we got the
worst news, for me at least, which was NetSpectre. NetSpectre was not a new vulnerability, it was not a new variation on Spectre, it was a really, exemplary demonstration that all of the Spectre
things we're looking at can be leveraged remotely. It does not require local access. So, the NetSpectre paper
actually used this remotely. Oh sorry, and one more
thing, L1 Terminal Fault. This one was extremely
scary but fortunately, has relatively little impact outside of operating system vendors so, we're not gonna spend
too much time on that one. But there was yet another one
that happened pretty recently. I don't think that we're over. This timeline is going to
keep going as time passes. We're going to keep
seeing more things come up as the researchers and the vendors kind of explore this new space, so, you should not expect this to stop. That doesn't mean that the sky is falling, it's just that we have to
keep exploring this space and understanding the
security issues within it. And this is gonna keep
going for some time. But for now, let's try
and dig into these things and understand how they work
in a little bit more detail, especially outside of the one example that I've kind of shown you already. Let's look at the broader
scope of variant 1, because variant 1, I've shown you just bypassing a bounds check,
but variant 1 is actually, a much more general problem. Any predicate that the processor
can predict can be bypassed and if that predicate
guards against unexpected behavior by setting up some
invariants or assumptions, which most predicates do, you may have very surprising consequences. As an example, we might have, a small string optimized
representation here, where we have a different
representation for a long string and a short string. Up here, we have a
predicate, is this long, is this in the long representation? And you might actually train and the branch predictor might
think, this is probably long or it might think, this is probably short. Turns out, short strings
are the most common case, so the branch predictor will predict that this is probably going to be short.
string optimization strings, the pointer to the short string
is inside the object itself often on the stack, where
there are other things that are really, really
interesting to look at adjacent to the string object. And so, if we predict that this is short, we're going to get the short pointer 'cause it's actually just
a pointer to the stack and we're going to start speculating on it and if we speculate far enough to find some information leak,
this can be exploited.
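As a sketch of the shape I'm describing, with invented field names rather than any particular library's layout:

    #include <cstddef>

    struct SmallString {
      bool is_long;
      std::size_t size;
      union {
        char inline_buf[16];   // short strings: the bytes live inside the object
        const char *heap_ptr;  // long strings: the bytes live on the heap
      };

      const char *data() const {
        // The processor may predict this branch as "short" even when the
        // string is long; speculation then treats inline_buf, raw bytes of
        // this object sitting on the stack next to other interesting things,
        // as if it were the string.
        if (is_long)
          return heap_ptr;
        return inline_buf;
      }
    };

Then you have another interesting case. What about virtual functions,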
what about type hierarchies? Here, we have a type hierarchy,
we have some base class for implementing key data
and hashing of the key data and then we have public keys where we don't have to worry
about leaking the public key, and we have a private key
where we have to worry about leaking the key data. We have this virtual dispatch
here and what happens, if we've been hashing public keys over and over and over
again, and then we predict that in fact, we think we
have another public key when we don't. We may dispatch it to the wrong routine, to the non-constant time one, speculate it and run right across the cryptography bug that this whole thing
was designed to prevent. Again, the invariants you expect in your software don't hold once speculative execution starts, and that's what makes it so hard to reason about.
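In code, the key-hashing hierarchy I just described looks roughly like this; the names and the trivial hash bodies are invented for illustration.

    #include <cstddef>
    #include <cstdint>

    struct KeyData {
      virtual ~KeyData() = default;
      virtual std::uint64_t hash(const std::uint8_t *data, std::size_t n) const = 0;
    };

    struct PublicKey : KeyData {
      // Nothing secret here, so a straightforward, fast hash is fine.
      std::uint64_t hash(const std::uint8_t *data, std::size_t n) const override {
        std::uint64_t h = 0x9e3779b97f4a7c15ull;
        for (std::size_t i = 0; i < n; ++i) h = (h ^ data[i]) * 0x100000001b3ull;
        return h;
      }
    };

    struct PrivateKey : KeyData {
      // Written to be constant time, because timing here can betray the key.
      std::uint64_t hash(const std::uint8_t *data, std::size_t n) const override {
        std::uint64_t h = 0;
        for (std::size_t i = 0; i < n; ++i) h += (h << 5) ^ data[i];
        return h;
      }
    };

    std::uint64_t hash_key(const KeyData &key, const std::uint8_t *data, std::size_t n) {
      // After a long run of PublicKeys, the indirect-branch predictor may send
      // a PrivateKey here to PublicKey::hash speculatively, running the
      // non-constant-time path over private key material.
      return key.hash(data, n);
    }

There are also other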
variant 1 derivatives. So far, we've looked at cases where you speculate past some predicate and you immediately find
an information leak. But, there aren't that many
information leak code patterns in your software maybe, so,
that might be relatively rare. But that's where the variants 1.1, 1.2 or the bounds check bypass
variants came into the picture. Here, we have some delightful code which has some untrusted size. We're gonna come in and we're gonna have an out-of-bounds access here, and once we have this
out-of-bounds access, we're actually going to
copy into a local buffer on our stack, data that has
been given to us by the attacker because we've got an out-of-bounds store that we can also speculatively execute. This speculatively stores
attacker data over the stack. And if this happens, then later on, we're going to potentially,
return from this function and when we return from this function, the return address is stored on the stack but we've speculatively written over it, this is a classic stack smashing bug now come back to haunt us
in the speculative domain. Even though the bounds check
is correct, it didn't help, we were still able to conduct
a speculative stack smash. And this sends speculative execution to an arbitrary address controlled by the attacker.
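A minimal sketch of that shape, with an invented function and sizes:

    #include <cstddef>
    #include <cstring>

    void copy_field(const char *untrusted_data, std::size_t untrusted_len) {
      char local_buf[16];

      // A perfectly correct bounds check...
      if (untrusted_len <= sizeof(local_buf)) {
        // ...but if the branch is mispredicted, this copy can run
        // speculatively with an attacker-chosen length, writing attacker
        // bytes over the stack, including the saved return address.
        std::memcpy(local_buf, untrusted_data, untrusted_len);
      }
      // If this function then returns speculatively, the smashed return
      // address sends speculation to an address the attacker picked.
    }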
to really think about why, sending control to an
arbitrary address is so scary. We've had bugs involving
stack smashing forever, it's one of the most common
security vulnerabilities but once you do that, you tend to want to build some kind of, remote code execution, you wanna build logic and
trigger logic out of that. The best way to do this is
to find the logic you want inside the existing executable and just send the return to that location. It's called return-oriented programming. You take the binary and you analyze all of the
code patterns in the binary to find little pieces
of code that implement the functionality you want. And then, you string them
together with returns by smashing the stack and
going to the first one which does something and
then goes to the second one and so on and so on. The most amazing thing to me, again, I'm not a security researcher
so when I heard about this, it just like, blew my mind. The most amazing thing is that, some very, very delightful
individuals have built a compiler that analyzes an arbitrary binary to build a Turing complete set of these gadgets and then, emit a particular
set of data values and a start point which
can implement any program, which is a little bit frustrating. And then you realize, that it's actually easier
in the speculative domain. It doesn't matter if it crashes after I do my information leak. For a real code execution,
I don't just have to execute the code I want, I also probably, wanna keep the service
running for a while, like I wanna, set it aside
and not disturb it too much. Don't need to do that, I just need to hit my information leak, it can do whatever it wants, it can crash, it can do anything. And this means, if the attacker
can get to this return, they're done. They have so much power, because we have this long history of work figuring out how to use this return to do really, really bad
stuff to the program. Makes sense? But there are more ways you can do this. You can imagine, you have again, some type with some virtual interface. And you have this virtual
function you created on your stack, but then you process some data, also on the stack, with an attacker-controlled offset that may be mispredicted.
use that offset to index and this can index from one
object on the stack to another because it can go out of bounds, 'cause we're in speculative execution. And then, we can potentially
write attacker data over the stack, and this might write over the actual vpointer that points to the vtable for this object. Again, speculatively. It's all gonna get rolled back eventually but if we then hand control,
off to some other function and this other function
doesn't use the derived type, it uses the base class to access it, it's going to use that V pointer
to load the virtual table to load a function out
of it and call that. But you just got to point
it at any memory you want which means you get to send
this virtual function call anywhere you want in
the speculative domain. It's just like the
return, except this time, with the virtual function call. And I can keep going, there
are a bunch of different permutations of how you can
hijack control flow here. But the easiest way to hijack control flow and send it to your
information leak gadget was in variant 2. And this is why variant 2 was extra scary until it got mitigated. Variant 2 works something like this. Again, we have our class
hierarchy, we have some, sorry, not class hierarchy, we have a function pointer here, just any indirect function call, doesn't matter how you get there. We're gonna call through it.
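Just to make the shape concrete, something like this; the names are invented and there is nothing special about them.

    struct Codec {
      void (*decode)(const char *buf, unsigned len);  // set up somewhere else
    };

    void handle_packet(const Codec &codec, const char *buf, unsigned len) {
      // An ordinary indirect call: the processor can't know the target until
      // the function pointer has been loaded, so it predicts a target and
      // starts speculating there.
      codec.decode(buf, len);
    }

Well, how does this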
actually get implemented in the hardware? To really understand variant 2, we've gotta start dropping down
a few layers into hardware. We're gonna drop into x86
assembly at this point. This is actually the x86
assembly produced by Clang a little while ago for that C++ code. Right here we have this
call with the weird syntax, we're actually calling,
like through memory. And what this is doing, it's
actually loading an address out of the virtual, sorry,
out of the state function and then calling through it. This is an indirect call. This is really hard on the
processor because it doesn't know where this call is going to
go and it wants to predict it, that's how we got into
speculative execution. But the implementation of this predictor has a special problem. This is my world's worst diagram for it but it gets the point across. The implementation of this predictor is essentially, a hash table. It's a hash table that maps
from the program counter or the instruction pointer
of the indirect call to a particular target
that we want to predict. But it doesn't map it to the
actual target address, oh no, it maps it to a relative displacement from the current location
because that's smaller, we can encode that in a lot fewer bits. And then you realize something else. This is a really size constrained thing, this is literally, a hash
table implemented in silicon. And so, in order to implement this, the hash function actually has
to reduce this key by a lot, it doesn't use most of the bits and the hash function is
really straightforward in a lot of cases. And so, there are collisions
in these hash tables all the time. They're tiny, you would expect
collisions and that's okay. So long as the collisions
are infrequent enough, the performance is still good. But if you can kind of try out
the collisions long enough, you can figure out how to
cause a collision reliably in this hash table. If you can cause a collision reliably, you can train this predictor
to go to your displacement. And then, when we do this call,
we look up in the hash table we hit a collision, we
get the wrong displacement and we go to the wrong location. And it turns out, this is really easy. The only thing you have to
have in the victim code here is an indirect call and that's everywhere. Or even just a jump table
to implement a switch, is enough to trigger the same behavior. That makes this really, really
easy to exploit and actually, take and send control
flow to wherever you want. But it's worse than that. There's another kind of
indirect branch in x86 code, if you have a return. Returns on x86 get implemented with some instruction
sequences that look like this. And again, we don't have a
specific destination here, the destination's in
memory, it's on the stack. And so, when you go to return, the processor has to predict it somehow. For calls and returns, processors all have very specialized predictors that are super, super accurate, typically called the return stack buffer. Unfortunately, sometimes,
these predictors run out. They may not have enough
information to predict it and on some processors, when
that happens, they fall back to the exact same hash table solution as we saw for virtual calls and for jump tables. And so, even a return can, in some cases, trigger this behavior. That means, it's actually pretty
easy to find these in code. That's variant 2. I'm gonna keep going. I'm skipping over variant
3 because variant 3 was completely addressed
by the operating system, user code does not need
to worry about variant 3. So, let's look next at variant 4. Variant 4 is called
speculative store bypass. This is actually pretty easy
to understand what it does. It's exactly what it says in the name. Sometimes, when you read from memory, instead of reading memory
that was just stored at that location, you will read
speculatively, an old value. That's really it. The problem here, is that
the processor may not know whether the addresses of
these loads and stores match. And so, instead of waiting
to see if they match, they'll guess, they'll predict. If they mispredict, they
may predict that the store and the load don't have the same address. And if it predicts they
don't have the same address, it may speculatively execute the load with whatever was there
before, that store. That's pretty simple and you
can imagine how this works. Imagine you have an application which runs some sandbox
code in the callback here and hands that sandbox code,
a specific private key. We don't ever want to hand a private key to the wrong callback here. One of these callbacks
owns one of the keys, another callback owns a different key. But when we're going through this loop, the key gets passed by value and that means, since it's a bit too big to fit into registers, we're going to store a copy
of this key onto the stack, then we're gonna call the function with the pointer to
that entry on the stack. It's gonna finish, come
back, we go to the next one, we store the next key onto the stack and call the next function. But if that function happens
to speculatively execute in the right way, its loads may
not observe that stored key, it may observe the previous
function's stored key. And then it can leak that, and we have another information leak.
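A sketch of that loop, with invented types; the point is just that the key is passed by value, so every iteration materializes a fresh copy in the same stack slot.

    #include <array>
    #include <vector>

    struct PrivateKey {
      std::array<unsigned char, 64> bytes;  // too big to live in registers
    };

    struct SandboxedCallback {
      PrivateKey key;
      void (*run)(PrivateKey key);  // each callback receives "its" key by value
    };

    void run_all(const std::vector<SandboxedCallback> &callbacks) {
      for (const auto &cb : callbacks) {
        // The by-value argument is written to the stack and the callback
        // effectively gets a pointer to that slot. If the callback's loads
        // speculatively bypass that store, they can observe the previous
        // iteration's key still sitting in the same slot.
        cb.run(cb.key);
      }
    }

It turns out that this is the fastest of the information leaks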
that we have found. If you can hit this reliably, you can extract data
at an unbelievable rate with this particular technique. This technique caused
tremendous problems for browsers and other people doing
sandboxing as a consequence. But there are also other implications. You can imagine a variant
1-style information leak that's actually powered by variant 4. So here, we have a vector
that we're returning from some function, which means
we're gonna store a pointer like some pointers but also
a size into memory here. Then, when we come down
to our bounds check, we may be reading size out of memory and if we're reading size out of memory and it happens to be slow,
it may not see the store just before this in size. And so, it may speculate instead, reading whatever was on
the stack before the store, which might just be a
random collection of bytes, probably a very large number, which means this bounds check will pass, but it's using the wrong bound. It's not that we've
bypassed the bounds check, the bounds check occurred,
it just used the wrong bound. And again, we get into the classic information
leak as a consequence. Variant 3, like I said, this is mostly about operating systems. I can explain if you folks want, but I'm just gonna keep
moving for the sake of time. We also have Lazy FPU save and restore, I mentioned kind of how this worked. But again, this was largely
fixed by operating systems since the operating system
is the one switching context, it can change its behavior
and prevent application code from having to worry about this. An L1 Terminal Fault. The way L1 Terminal
Fault works is amazing. There are certain kinds of
faults that, when they happen speculative execution can again, occur. And if you arrange everything just right, especially with page tables and
other aspects of your system you can essentially read
arbitrary data out of the L1 cache while this terminal fault is being handled and leak it with speculative execution. And there are a bunch of
different ways to observe this, there is a great paper
that introduced this called Foreshadow and showed,
that this actually works inside of Intel's secure Enclave SGX. And yes, it just allows you
to read the entirety of L1. If you haven't seen it yet,
go and look for the video online about this. You can actually find
one of the researchers with a window at the
bottom of a Windows machine and as they type in the
administrator password, the window shows the administrator
password in real time. It's really, really effective. But again, this is mostly
an operating system concern and so, operating system
changes and hardware changes are being used to address this. Application code doesn't have
to deal with this directly. I don't know about all of you, but I think that was too much information. So, I'm gonna try and summarize in a way that you can kind
of wrap your head around. This is gonna be the
most busy slide I have. This is the summary slide, of essentially, all of this background information. We have four variations on Spectre v1. There's v1, 1.1, 1.2, ret2spec, which I just didn't have
time to show you all. These are all taking advantage of the same fundamental mechanisms and they have very similar properties. They can impact application code, they can impact operating system code. They don't require to
be using hyper threading or simultaneous
multi-threading in your CPU. We have really slow software
fixes that none of us like and we don't have any realistic
hardware fix on the horizon. These are actually the thing
I'm gonna talk about most, because these are for me, the most scary. Note that red column on the right. We also have variant 2, which is the primary variant 2, but also SpectreRSB, which helps show how you can
application to another, you really have to be
using hyper threads or SMT. The other nice thing is that, we have some much better hope of fixing these. We have a very good
software fix for variant 2, we don't have a great
software fix for SpectreRSB or variant 2 when it's hit
with the return instruction but there's some stuff you can do, but it's not as satisfying. But we do have good Hardware
fixes on the horizon, future Intel hardware, future other, future hardware from other vendors is going to do a very good
job of defending against this. Then, we have variant 4. Variant 4 looks, in terms of
the risk, more like Spectre v1 but with less hope of mitigating it. It impacts applications, it
impacts operating systems, it does not require hyper threading for one application to attack another. We have absolutely no hope
of fixing this in software and so far, the hardware
fixes are proving problematic. There is one that's slow and the browser vendors aren't using it and have some concerns about it, and so, this one's still pretty fuzzy. And then we have a bunch
of things at the bottom that I really view very
differently from the rest because these are fundamentally, CPU bugs that just interacted very poorly
with speculative execution and the Spectre techniques. And these, I think are
going to very consistently, get fixed rapidly. I think these are in some
ways, the least scary for application developers. Most of them don't impact
applications at all, you don't have to change your code at all. They're only in the OS. We have a great software fix for Lazy FPU, so good that no one is going
to try and fix the hardware and we have great hardware
fixes for the other ones. And so, I think these
are generally speaking, going very well. I'm gonna really focus on
Spectre variant 1, variant 2 and variant 4 because those
are the things that are really continuing to impact software today. To really talk about what you
need to know in this space, we need to have a threat model. If you went to one of the earlier talks at the conference about security, there was a great discussion around how you do threat modeling. Unfortunately, that person is actually a
security researcher and I'm not. And I'm certainly not
your security researcher and so, I can't help
you build a threat model and that's not what I'm gonna do up here. But I can give you some
questions you can use when building your own threat model to really understand the
implications of Spectre and speculative execution attacks on your particular software system. First off, does your service have any data that is confidential? Because if not, it doesn't matter if you have an information
leak vulnerability, it's a very simple, simple answer. I love this threat model. Next, does your service interact with any untrusted services or inputs? Is there any input you don't fully trust? Is there any entity that
talks to you in some way that you would not want to share all of the information you have with? If the answer's again,
no, then, you're fine. This gives you a nice simple rule that fortunately excludes, I think, the majority of software
we have out there. If you have nothing to
steal, or no one to steal it, you have nothing to secure from information leaks. This is a pretty solid,
mental model to use when coming up with your threat model. Unfortunately, we do still
have a lot of software that doesn't fit this model. So, let's talk about
how we can dig through those pieces of software. Do you run untrusted code
in the same address space as you have confidential
information stored? Do you have some information there and you're gonna run untrusted
code right next to it? If this is the case,
you have a hard problem. We do not know how to
solve Spectre effectively for this case, outside of isolating your entire code from your confidential information. This is the case that browsers are in. You're going to see browsers
increasingly dealing with this particular case. If you hit this, almost nothing else about the questions here matters, you're going to have the
highest risk from Spectre. But maybe you don't have untrusted code running in the same address space, there's a lot of software that
doesn't run untrusted code, which is good. Now you need to ask yourself,
does an attacker have access to your executable? Can they actually look at your
binary and reason about it in some way? Can they steal a copy of it easily? Is it distributed in some way
that they would have access? That's gonna really
change the threat model. If no one has access to your executable, they're going to have
an extremely hard time using these techniques. It's not impossible, but it
becomes incredibly difficult. However, you wanna be a
little bit careful here because they don't need access
to the entire executable. If you use common open source libraries, and if you link them in
and if you build them with common flags, then, they have access to part of your executable. If you run on a distribution
and you dynamically link the common distribution shared objects, they may have the exact same distribution and they'll have access
to some of the executable and they don't need access to all of it to mount a successful attack. So, you wanna be a little bit careful how you think about this but it does really dramatically
influence how open you are to these kinds of risks. The next question is, does any untrusted code run
on the same physical machine? Because if the answer here is, no, you're really looking at a
single mechanism for attack and that's the ones
presented in NetSpectre. That's the way you're
going to be seeing this. NetSpectre gives us pretty
clear bandwidth rules and it turns out, the
bandwidth is low and so, if you don't have untrusted
code running on the same machine there's some very specific
questions you wanna ask. How many bits need to be leaked
for this information leak to actually be valuable to someone else? How many bits are at risk? If you have a bunch of data, if you have the next manuscript for, I guess Harry Potter is over, but whatever the next fancy book is, leaking that manuscript's going
to be really hard, it's big. You don't need to worry
about someone leaking the next video game that you've got a copy of on your machine, that's gonna be really slow. But if you have a cryptographic key, that may only be a few thousand bits. If you have an elliptic
curve cryptography key, that may only be 100 or 200 bits before it's compromised. And worse with cryptographic issues, you may not need all the
bits for it to be valuable. So, you really wanna think about this. Another thing to think about is, how long is this data accessible? If it's in the same place for
one request in your service and then you throw it away and then it shows up somewhere else, then, you may not have big
problems here because, it may be very hard to conduct
all of the things necessary while the data is in the same place. You also wanna look at
what kind of timings that someone can get in the
NetSpectre style of attack. You wanna look at, what is
the latency of your system? How low is that latency,
how low can they get it? And you also want to look at, just how many different systems,
have the same information? So, if you have, for
example, a cryptographic key that is super important
and you have distributed it across thousands and thousands of machines and all of those machines can
all be attacked simultaneously you have a much bigger bandwidth problem than if it only exists on
one machine, because then, the bandwidth is much narrower. These are key things to
think about around bandwidth. And really, NetSpectre is all about this. You're essentially, always going
to be making this bandwidth risk, value and complexity
trade-off because, it's going to be very hard
to mitigate this otherwise, so, you want to think
very carefully about this. But what if you do run untrusted
code on the same machine? There are a lot of shared
machines that actually have shared users here, and I don't
mean in the cloud, since, if you have separate VMs, that's enough. Like you can think of
those as separate machines, but what if you're actually, really running on the same machine? Then you have to ask more questions. Do you run untrusted code
on the same physical core? And this may not always be obvious. If you don't have hyper threading or simultaneous multi-threading,
then, you clearly don't run untrusted code on the same
physical core simultaneously. But there are other ways you may get here, you may partition your workload
across different cores. There are a lot of ways
that may influence this and all of the variant 2-style attacks from application to application, rely on running on the
same physical core and so, in a lot of ways, if you can exclude this you get to take out an entire
variant from your threat model and that's really, really useful. With that, we've kind of talked about all of the different things
you wanna think about from threat modeling. I do wanna re-emphasize,
this is about applications. Operating systems and hypervisors have totally different challenges here, I'm not covering them. They're there, they're very real risks but I'm not covering them. If you wanna know all
about operating systems and hypervisors, you can
come and ask all about them at the panel but, I'm
actually not the expert there and it's a very different thing and it seemed like a different crowd that might be more interested in that. I'm focusing on application issues here. With that, let's move over
to talking about mitigations. How do we actually cope with this? First things first, you have to mitigate your
operating system otherwise, none of this matters. If you do not deploy the
operating system mitigations that your operating system
vendor is providing, you cannot do anything useful here. These are essential. So, please, especially now,
it's increasingly important that you have a way to
update your operating system and that your operating system vendor is actively providing you updates. If they aren't, you should probably look for a different
operating system vendor. This stuff is important. Let's assume you've gotten all of your operating system mitigations and all of your operating system
updates and so you're good. And let's talk about how you can mitigate your application code. First off, there are some
x86 kind of operating system and hardware-based mitigations
for application code. These come in three flavors. They have again, weird acronyms. IBRS which is, indirect
branch restricted speculation. IBPB, which I missay every time I try, which is indirect branch predictor barrier. And STIBP, which is single thread indirect branch predictors. Your operating system and your
hardware can turn these on. When they do, they can provide
certain levels of protection from some of these variants. But an important thing to
realize, for an application, these do not help with variants 1 or 4. They're exclusively
helping with variant 2. They also, may be very slow in some cases. These are especially slow
on current and older CPUs. We're expecting newer
CPUs to increasingly, make these things fast and
for them to be essentially, unobservable in terms of performance. But if you have the older CPUs, even turning these on
with your operating system may be a very significant performance hit and there are some alternatives. But the alternatives are software-based, and so, we need to talk
about how we can use software to go after mitigation. The first one is called Retpolines. This was developed at Google by a colleague of mine. The idea is, well, since we can recompile the source code of our application, we wanted to see, is there something we could change in the source code that could be effective at mitigating at least some of the most risky variations on this. Notably, variant 2, which is far and away the easiest to attack in a working system. It seemed like something we
really wanted to mitigate in software, given the
performance impact we were seeing from the OS hardware-based mitigations. It does require recompiling your source, which can be painful, but if you can, this mitigates Specter variant 2 and SpectreRSB in restricted cases but there're a bunch of
asterisks and hedges there. And it's usually going
to be faster than STIBP on current CPUs and older CPUs for mitigating your current application. Not always, but there's a decent chance you probably want to look at it. Going forward, in the
future, we do expect this to become less and less relevant because the hardware
is really catching up. We're expecting in the future, this is just going to work on hardware and you're not going to
need to worry about this. But for now, you might
want to worry about this if you have a service
that is at risk here. How does this work? We have some indirect call,
just like the previous one but when you compile your
code with Retpolines, we don't emit these instructions, we emit a different set of instructions. Here, we've taken this address
that you wanted to call and we've put it into a register r11. And then we've transformed the call into a call to this helper
routine, __llvm_retpoline_r11. If we look at this
routine, this is a very, very strange function. The first thing it does is a call but it doesn't call a
function, it calls a label, a basic block inside of itself. And once it does that, it then takes the address
you wanted to call and smashes the stack with
it, this is a stack smash, this clobbers the return address with this address you wanted to call and then it uses a return to actually branch to that location. So, that's a pretty weird thing to do. The key idea here, is that by doing a call followed by a return, we
put a particular address, an unambiguous address into
the call and return predictor, the return stack buffer. And this predictor is
really fast and really good so, the processor prefers
it anytime it can use it. And in the vast majority of cases, it's going to be able to use
it here, and when it does, if it speculates this return,
it actually ends up here, because the speculative return can't see that stack smash operation. So, when it speculates
return, it goes here which then goes to this
weird pause instruction. How many folks here have used
the x86 pause instruction? I don't know what kinda
code you people are writing except for this one over here. I know what you're doing too. The pause instruction's super weird, I never even knew what this was, I thought this was like
something from old, old, old x86 days but no, it
actually has lots of uses, and in this case it is
the cheapest possible way to abort speculative execution. And we want to abort it
because speculative execution consumes resources, like power, and we don't want to waste those, and so, we cut it off here. Unfortunately, pause doesn't
do that on AMD processors, it only does it on Intel processors. After we pause, we then
do an LFENCE and this, will actually work on AMD processors once you install your
operating system updates. Finally, just in case
all of this magic fails, we make this into an infinite loop. You're not getting out
of here, this is keeping the speculative execution in
a safe, predictable place. This essentially, turns
off speculative execution and branch prediction for indirect calls and indirect branches, and that protects us from variant 2.
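Pulled together, the thunk has roughly this shape; this is a sketch of the __llvm_retpoline_r11 pattern as just described, written as file-scope GNU assembly in a C++ translation unit, not the exact compiler output.

    // The call/ret pair pins return-predictor speculation to the pause/lfence
    // loop, while the architectural return branches to the target in %r11.
    asm(R"(
            .text
            .globl  retpoline_r11_sketch
    retpoline_r11_sketch:
            callq   .Lsetup          # pushes the address of .Lcapture as the return
    .Lcapture:
            pause                    # any speculation of the ret lands here...
            lfence
            jmp     .Lcapture        # ...and stays here, harmlessly
    .Lsetup:
            movq    %r11, (%rsp)     # smash the return address with the real target
            retq                     # architecturally branches to *%r11
    )");

The overhead of doing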
this is remarkably small. This is about your worst case scenario, we built very large C++
servers with this enabled and the overhead was under
3%, reliably under 3%, but it does require that
you use some pretty advanced compilation techniques. You need to be using profile
guided optimizations, you need to be using ThinLTO
or some other form of LTO. I can't emphasize that
enough, but when you use them, you can keep the overhead
here very, very low. And if you're working in
something very specialized like some really specialized
code or a kernel, you can usually avoid
the indirect branches and indirect calls,
manually, with essentially, no measurable performance
overhead by introducing kind of, good guesses for what
the direct call target should be and a test to make sure
that that's correct, rather than relying on indirect
calls and indirect branches. We've been able to use this to make our operating system mitigations incredibly inexpensive, as a consequence. But this is only for variant
2 and maybe variant 2 is gonna be fixed in future hardware and maybe, you're not even subject to it. So, what about the other variants? That's where things start to get bad. You can manually harden
your branches for variant 1, which is nice. But it can be a bit painful. Intel and AMD are suggesting that you use the LFENCE
instruction right after a branch.
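For readers of the transcript, a minimal sketch of that suggestion in C++, reusing the illustrative victim-function shape from earlier; _mm_lfence() is the compiler intrinsic for the LFENCE instruction, and the exact placement here is an assumption, not the code from the demo.

```cpp
// Hedged sketch: manually hardening the variant 1 gadget with an LFENCE
// right after the branch, as Intel and AMD suggest. Same illustrative names
// as the earlier sketch.
#include <cstddef>
#include <cstdint>
#include <immintrin.h>   // _mm_lfence

extern std::uint8_t array1[];
extern std::uint8_t array2[];
extern std::size_t  array1_size;

void victim_function_lfence(std::size_t untrusted_offset) {
  if (untrusted_offset < array1_size) {
    _mm_lfence();  // speculation can't proceed past the fence, so the loads
                   // below only execute once the bounds check has resolved
    std::uint8_t value = array1[untrusted_offset];
    volatile std::uint8_t sink = array2[value * 512];
    (void)sink;
  }
}
```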
And actually, while we're here, I think we have enough time. Everybody likes live demos, let's see if we can actually just do this. I come down here. And after my branch, I do an LFENCE. We would expect this to mitigate things, hopefully it does. This is gonna run really
slow but it's also, not gonna produce my string. Nothing's happening here
and that's a good thing, it's running. I can even build the debug
version if you all are worried that I'm being sneaky here. I have a debug version that
actually prints out stuff while it's going. We're trying to leak it, it's a secret and you're seeing what it's finding here and it's not finding any
character data from the secret. And just so that we're all clear, I don't have anything up my sleeve. Comment this out. No, have to rebuild. Goes right back to working. LFENCE works, that's nice. We like mitigations that work. But it is a bit slow and
it can be really expensive and there're cheaper
ways to do the same thing if you can go through and mitigate each and every one of your branches. Both Google and ARM have been looking at building APIs to do this
in a more efficient way and in a little bit clearer way in the source code because an LFENCE feels
pretty magical to just like, oh no, no, I just put an
LFENCE here, I'm good. We can do something a little
bit better with an API. There's a lot of work to do that though, I've got links up on the
slides if you wanna go to them. This is gonna show you, kind of, where these different
organizations are looking to build APIs, but we don't have anything that's really production quality and that you can reach out and use today. The best you can do right now
is actually something like LFENCE, I think ARM has
a similar thing to LFENCE that they suggest with
an intrinsic as well. But, this doesn't scale well. You have to manually do
this to every single point in your code, that's
really, really painful. Maybe you can use a static
analysis tool to automate this but what we found is that
the static analysis tools either cannot find the interesting gadgets that look like Spectre variant 1 because they're very careful and accurate and they leave lots of unmitigated code or they find hundreds and
hundreds and hundreds of gadgets that are completely
impossible to actually reach with any kind of real-world scenario. You can't actually get there and use them to conduct a Spectre,
kind of, information leak. So, this means that they're
not super satisfying to use, they're better than the
alternatives of doing it manually without a static analysis tool, but they still pose real
scalability problems. Ultimately, my conclusion is that, this isn't going to continue to scale up to larger and larger applications. We're already right about at the threshold of how much we can do
with static analysis tools and manual mitigations when we're working on large applications. So, we need an alternative. There's another system called
speculative load hardening, this is also developed by Google and this is an automatic
mitigation of variant 1. This is not related to the /Qspectre flag in Microsoft's compiler. That is not an automatic mitigation
of variant 1 in all cases, that handles specific cases
that they've taught it about. Other kinds of variant 1,
other instances of variant 1 aren't caught by it, which makes it, potentially, risky to use. But this is a categorically
different thing. This is a transformation that removes the fundamental exploitable
entity of variant 1 from your code, and it
does it systematically across every single piece
of code you compile. You still have to recompile your code but you can deploy this to get kind of, comprehensive mitigation of variant 1. Just so you are aware,
this is incredibly complex, it's still very, very
brittle, this has been something that we're
working on for a long time but I don't want you to get
the impression that, this is production quality, ready
to go right out the door. We're all still, really working on this, but I wanna try to
explain how this can work. Let's take an example. This is a little bit simplified version of the Spectre variant 1 example
from the original paper. We have a function that accepts some untrusted offset and some arrays, and it's going to try and do a bounds check. So, we come down, we do a bounds check, we potentially bypass this bounds check.
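Since the slide itself isn't reproduced here, a hedged sketch of that simplified example, following the shape of the victim function from the original Spectre paper; the names and sizes are illustrative, not the exact code on the slide.

```cpp
// A hedged sketch of the simplified variant 1 example being described; the
// names and sizes are illustrative, not the exact code on the slide.
#include <cstddef>
#include <cstdint>

std::uint8_t array1[16];
std::uint8_t array2[256 * 512];
std::size_t  array1_size = 16;

void victim_function(std::size_t untrusted_offset) {
  if (untrusted_offset < array1_size) {                 // the bounds check...
    std::uint8_t value = array1[untrusted_offset];      // ...can be speculatively bypassed,
    volatile std::uint8_t sink = array2[value * 512];   // leaving a cache footprint keyed on `value`
    (void)sink;
  }
}
```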
Let's look at how this bypassable bounds check is actually implemented in x86. If we compile this code down, we get the instructions on the right. These instructions are going to compare whether we're below the bound. If we're greater than
or equal to the bound, we're going to skip this body of code. That's what this does. When we're going to use
speculative load hardening, we need to somehow transform this so that a branch predictor, predicting that the index is within the bound and that we enter this code, is kept from working against us. The way we do this is, instead of generating
the code on the right, we generate the code on the left. So, let's try and walk
through this code on the left, which is for the same C++ pattern, and understand how it works. First we need to build what we
call, a misspeculation mask. So, it's just all ones. We're going to use this whenever
we detect misspeculation in order to harden the
behavior of the program. We also need to extract the caller's mask because speculative execution
can move across function calls, it could be interprocedural. So, we want the caller to pass in any speculation state that it has, and we pass it in the high bit of the stack pointer. This transforms the bit hidden in the stack pointer into a mask of either all ones or all zeros. And in a normal program, you'd
expect this to all be zeros and in a misspeculated
execution, this is going to be all ones just like our
misspeculation mask. Now, we do our comparison
just like we did before, we have our branch just like we did before and we may mispredict this branch. If we mispredict the branch though, we're going to enter this basic block, when the condition is actually
greater than or equal to. And so, in that case, we
have a CMOV instruction and CMOV instructions
today, are not predicted by any x86 hardware,
and so, as a consequence we can write the CMOV,
using the same flag, greater than or equal to. And if we enter this block
when that flag is set, which should never happen, we write the misspeculation
mask over our predicate state, over this state that
we got from the caller. This essentially collapses
us to the all ones if we ever misspeculate this branch. Then we come down and we load
some memory just like normal, but keep in mind, this may
have loaded leakable bits, these bits may actually be,
something that can get leaked in some kind of actual attack scenario. There are some operations on
this that we actually allow. These are data invariant operations, these are the same kinds of
operations we would allow on private keys, if we were implementing a
cryptographic algorithm. They do not exhibit any change in behavior based on the data that they observe and so, they're safe
to run over this data. They just move things around and there's nothing that
you can glean from these. But before we actually
use this piece of data to index another array, we mask
it with our predicate state, OR-ing all of those bits over
the data that we loaded. And because of this, if we misspeculated, all of the bits are now all ones, none of what we loaded is observable. And so, the fact that we then
do this data-dependent load remains safe. This is the core transformation of speculative load hardening.
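As a rough, source-level paraphrase of what was just described: the real transformation happens in the compiler backend (for example behind Clang's -mspeculative-load-hardening flag) using CMOV and the stack-pointer trick, which plain C++ can't express, so the hedged sketch below only shows the intended data flow for the earlier victim function.

```cpp
// Hedged, source-level paraphrase of the transformation applied to the
// earlier victim function. A real compiler may fold the redundant check
// away; this only illustrates the masking of loaded data.
#include <cstddef>
#include <cstdint>

extern std::uint8_t array1[];
extern std::uint8_t array2[];
extern std::size_t  array1_size;

void victim_function_slh(std::size_t untrusted_offset) {
  // Predicate state: all zeros on the correct path, all ones once we detect
  // that we are executing down a mispredicted edge.
  std::uint64_t predicate_state = 0;
  if (untrusted_offset < array1_size) {
    // Architecturally this condition is true here; under misspeculation it
    // is false, and the backend's CMOV (not predicted by current x86 cores)
    // folds the all-ones misspeculation mask into the state.
    predicate_state |= (untrusted_offset < array1_size) ? 0 : ~0ULL;
    std::uint8_t value = array1[untrusted_offset];
    // OR the predicate state over the loaded data before it can feed a
    // data-dependent load: if we misspeculated, every bit is now a one and
    // nothing about the loaded value is observable.
    value |= static_cast<std::uint8_t>(predicate_state);
    volatile std::uint8_t sink = array2[value * 512];
    (void)sink;
  }
}
```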
And we do this for every single predictable branch in the entire program, and we do this hardening
and we do this hardening for every single piece of loaded
data in the entire program. It's very, very comprehensive. There aren't these huge
gaps in what gets hardened and what doesn't get hardened. But there is a catch. The overhead is nuts, it's
just beyond belief, it's huge. 30 to 40% CPU overhead is a
best-case, medium-case scenario. Worst-case scenario is
even worse than this. If you don't access a lot of memory, then it can be lower overhead
than this, but then you're not accessing a lot of memory, which is a weird situation to be in. For most applications, we expect this overhead to be very large. We've built a very
large service with this, we've actually had it tested in a live situation so
we can actually measure the real-world performance overhead, this is a very realistic
performance overhead you can expect from deploying speculative
load hardening to your service. I am very aware that
this is not an acceptable amount of overhead for most systems. They probably don't have
the CPU just kicking around. If they're latency-sensitive, this is actually going
to impact your latency. If you're not latency-sensitive,
you're still going to need a 30 to 40% increase in
capacity of CPU to handle this or 30 to 40% decrease in the
amount of battery you have if you're running on a device. This is a really, really
problematic overhead. Unfortunately, this is the
best that we know how to do while still being, truly comprehensive. The only things we know to
really reduce this at this point also open up exposure to
various forms of attack and that's not what we want, that's not the trade-off we wanna make. So, what else can we do? This has been a grim list
of, stories about mitigation. The other thing you can
do, is you can isolate your secret data from the risky code. Sandbox it. And this is actually the thing that works even for untrusted code. When you have sandboxed code, you
have to actually separate it from the data with some
kind of processor level security abstraction,
typically separate processes on a modern operating system. That's, really the only
thing that's enough for untrusted code, because
this is the only mitigation we realistically have for variant 4. This is what all the
browsers are working on in order to mitigate variant 4, long-term. Everything else looks
short-term, too expensive or doesn't work in enough cases. The other interesting
thing is, if you do this, this protects against all of
the other variants of Spectre. If you actually, can separate
your code in this way, you are truly protected from
Spectre, and it gets better. You're also protected
from bugs like Heartbleed. It's now, very hard to
leak information at all because the attacker
doesn't have access to the program that actually is
touching the secret data. So, the extent to which you
can design your system this way, it can really, really increase
the security of your system, it can really make it hard to
suffer from information leak vulnerabilities in general. We really do think this is a
powerful mitigation approach.
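As a hedged sketch of that idea, here is a toy POSIX example of the process boundary plus a narrow, fixed-format channel; the "SIGN" protocol and every name in it are purely illustrative, not any real API.

```cpp
// Hedged sketch of the isolation idea: keep the secret in its own process
// and give the at-risk code only a narrow, fixed-format request/response
// channel. Protocol and names are illustrative.
#include <unistd.h>
#include <cstring>

int main() {
  int to_secret[2], from_secret[2];
  if (pipe(to_secret) != 0 || pipe(from_secret) != 0) return 1;

  if (fork() == 0) {
    // Secret-holding process: it never runs untrusted code and only ever
    // parses a fixed-size request, so there is no gadget here for the risky
    // process (or a Heartbleed-style bug in it) to leverage.
    char request[8] = {};
    read(to_secret[0], request, sizeof(request));
    const char* reply = std::strncmp(request, "SIGN", 4) == 0 ? "ok" : "no";
    write(from_secret[1], reply, std::strlen(reply) + 1);
    _exit(0);
  }

  // At-risk process: even if misspeculation leaks its whole address space,
  // the long-lived secret simply isn't mapped here.
  write(to_secret[1], "SIGN", 5);
  char reply[8] = {};
  read(from_secret[0], reply, sizeof(reply));
  return 0;
}
```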
Ultimately, you're going to need some combination of approaches targeted to your application. Oh, I almost forgot, sorry. We actually
can live demo this too. Just so that we're all on the same side. I build this and you can
see there's a little, there's an extra flag in
there and now when I run it, whoa, that's not good. Helps if you run the right program. So, when I actually run the mitigated one, it doesn't leak anything. This is just like leaking
random bytes of data. If you want, I can open up the
binary, we can stare at it, it's gonna look a lot
like what I presented. But this actually does work. You should expect to need some mixture of these things. You've got to look at your
application, your threat model, your performance characteristics, how much of an overhead you can take to pick some approach here. There's not this, oh yeah,
you do this, this, this, you're done, go home, everything is easy. That's why I gave a long
presentation about it. This isn't sadly, the easy, easy case. There's also some stuff I
want to see in the future because like I said, we're not done here, we're not finished. So, I've got three things that I would really, really like to see. Number one, we have to have a cheaper operating system and hardware solution for sandboxing protections,
like the last one I mentioned because that's the most
durable protection; it provides the most value by far. We need an easier way to do this. The browser vendors are really
struggling to do this today and we should make that much, much better so that we can deploy it more widely. The second thing is, cryptography
really needs to change. The idea that you do cryptography with a long-lived private key
that you keep in your memory, is a very bad idea. We need to go and make sure every single cryptographic system is separating the
long-lived private key data into a separate subsystem
and a separate process, potentially, leaving it
on disk until it needs it because this is too high risk. We have the cryptographic
parameters we need here, things like ephemeral keys in TLS 1.3, we have good techniques here
in the cryptographic space, we need to use them, we need to stop using
older cryptographic systems that require these long-lived,
stable private keys, especially, small elliptic
curve stable private keys to be visible in memory,
to a system under attack. That's a very, very bad,
long-term proposition in the wake of Spectre. And last, I think we have to
solve Spectre v1 in hardware. I do not think, that anything
I've shown you for v1 is tenable, long-term. I think we may be able to sneak by for the next five to 10 years, while the hardware
community moves on this. I understand that there are
real timeline issues here that they cannot change,
but they must actually, solve this in hardware. Think of it in a different way. I do not believe that
we can teach programmers to think about Spectre v1. How do we teach programmers? We say, like, well, you have
these set of assumptions and once you build up these assumptions, you work within them and then
you build up more assumptions and you work within those, and
you build up more assumptions and you work within those. And how does Spectre work? It says, eeeh, not really. You have all those
assumptions, they're very nice but I didn't pay any attention to them. Now, we have to teach people to think about the behavior of their code, when literally, none of
their predicates hold and I don't think that's viable. This is different from saying, like today, we have C++ without contracts and we're gonna get contracts added to it. This is worse than going
back to C++ without contracts 'cause today, what we have
are unenforced contracts, we have contracts in our
documentation, in our comments, everywhere, right? We have asserts, we have
predicates, everywhere. Imagine having none of them and having to write code
that was correctly behaved even in their absence. I don't think that that's
viable, and so, I do not think we can exist in a computational
world where Spectre v1 is a thing programmers are thinking about. I think we have to actually remove it. And so, I'll give you a brief conclusion. Spectre: misspeculation plus side channels gives you information leaks of secrets. It's new and it's an
active area of research, this is going to keep happening
for a long, long time. We have at least a year,
maybe years, plural, of issues that have yet to be discovered. You need to have a threat model to understand its implications for you and you need to tailor
your mitigation strategy to your application because there is not a single one that looks promising for every case. And ultimately, I want
all of you to help me convince our CPU vendors,
that they must fix Spectre v1 in hardware. We can't actually sustain this world where our assumptions do not hold. So hopefully, you all
can help me with that, and I thank all of you
and I also wanna thank all the security researchers
that I've been working with for the last year, across the industry, it's a tremendous group of people, they've taught me a whole lot. Hopefully, I've taught you
all at least a little bit and I'm happy to take questions. (audience applauds) Just as a quick reminder, we only have a few minutes for questions like four or five minutes for questions. I would really encourage you, focus your questions on my talk. We're going to have a panel to talk about everything to do with Spectre in about, just over half an hour. I'll be there, a couple of the other folks working on this will be there. If you have generic questions,
feel free to wait until then and we'll try to answer them then. With that, let's do the
question on the left or the right here. - Some mitigations require recompilation. I'd like to understand, it's like a recompilation of everything, right? It's not a C-specific problem, it's a processor-instruction-specific problem? - Yes. The key thing here is, as we start to work with Spectre, we see an increasing need for
you to be able to recompile all of your source code in
your application somehow. Because all of it, potentially
has, the vulnerable piece. - So, that's true about Java, managed systems and whatever? - To a certain extent, it's true
of Java and managed systems however, constructing ways to actually break these types of things is much harder in managed systems. - Hi. This all is based on the fact that the speculative execution executes code that is actually not supposed to run. So, eventually, the pipeline will catch up and the CPU will realize that, I'm actually not supposed
to execute this branch and then stop executing it. Just, like a ballpark estimate, how much code can I get into that before the CPU realizes that,
I shouldn't be executing this and stops doing it? - That's a great question. The key question is, how much code can be speculatively
executed in this window? What's the window of my risk? I have been asking processor vendors that question for a long time
and they will not answer me. But I'm not throwing them under the bus. I actually understand why, increasingly, I really understand why. I don't think that there
is a simple answer, it's not that easy to
reason about because, what you actually are
seeing is the exhaustion of resources on the processor. But different kinds of instructions exhaust resources at different rates. It's very hard to say,
oh no, 100 instructions and then you'll be done,
because different instructions may take up different
amounts of resources. However, in practice, we have seen hundreds of instructions
execute speculatively. Not tens, hundreds. And we should expect that we
will get better and better at tickling this particular,
weird part of the processor and sending it further and
further down these traces. We should also expect that
processors are going to speculate more and more as they get
larger and more powerful. - Thanks. - You said a mitigation for
this is to put untrusted code in a separate process
from the secret data. - Correct. - But you also said that there's
something called NetSpectre where you can exploit over a
network, how does that work? - If you're moving untrusted
code into a separate process what you're protecting the data
from, is the untrusted code. You can also move
trusted code that handles untrusted inputs to a separate process. And then, NetSpectre is going to leverage that code to
read data in that process. But if that process doesn't
expose to its untrusted inputs, any control over the inputs to the process with the secret data, you
can't construct an attack. And you have to think
really carefully about, just how trusted is my input? Can I fully trust, can I fully validate the communication, the
secondary communication from the at-risk process
to the trusted process? But sometimes you can do that. Sometimes you can say like, no, all of the communication there
is written by the programmer, is trusted. All the attacker can do is select between those, they can't construct arbitrary risky inputs, so now, we can trust our
inputs in the trusted process, we don't have to worry about
a Spectre vulnerability. - So, we have to think
about, not just trusted code but also, trusted input?
- Absolutely. At-risk code is either untrusted code or code handling untrusted data. - Cool, thanks. - It seems to me that the
whole issue is because, the CPUs are trying to
speculate where they are going and try to do this optimization along the way as they are working. How bad would it be to turn
this completely off? - What's the cost of turning
off speculative execution? It's actually pretty
easy to simulate this. When I built the speculative
load hardening compiler pass, I also built something that added Intel's suggested mitigation of an LFENCE but instead of doing it
only on the risky branch, it adds them on all of them. It's a very simple transformation, much simpler than the
speculative load hardening. And I measured the performance of that. And that's actually an
interesting thing to look at because what LFENCE
does, is it essentially, blocks speculation past the fence. And so, this doesn't turn
speculative execution completely off, but it
dramatically reduces speculative execution on the processor. The performance overhead
of this transformation was somewhere between a 5X and a 20X to 50X performance reduction. There were several very
tight computational loops so, well over 20X performance
reductions and at that point, I started having trouble
measuring with high accuracy. I don't think that's
even remotely desirable due to the performance impact. This shows you also,
how incredibly important speculative execution is. No one should leave this and be like, "Oh, those processor designers, "why do they have to use
speculative execution?" It makes your program 20X faster. It's really good, unfortunately, it does come with a problem. - Hello, I wonder on the impact
on compile optimizations. For example, when it was pretty
new I tried to get rid of all my indirect jumps by just
not using function pointers and I observed that basically, the only option I had to
parse to my compiler was to disable jump tables to get rid of it. Like some compiler parsers
now being overthought to like maybe, generate
completely different code. - The question is, is
Spectre really changing how we think about compiler optimizations? I don't think it is in a lot of ways because a lot of software isn't
really impacted by Spectre. So, we want the
optimizations to run there. But when we know we're mitigating against some part of Spectre,
we definitely turn things off as necessary. So, when you're using
Retpolines for example, we turn off building jump tables, so that we don't introduce more
of these risky things that we then, have to transform. But I don't think there's
a lot of impact beyond that long-term. Mostly, the impact on compiler
optimizations is figuring out how we can mitigate these
things less expensively. - Okay, thanks. - Most of this leaking of memory happens during speculative execution, and gadget chains are a relatively inefficient
use of instructions. How deep can you go, how many
instructions can you execute speculatively, given
those two things combined? - Again, we don't know, we
don't have hard answers here, but our experimentation shows
hundreds of instructions, which is more than enough to form any of these information leaks. And remember, even though a ROP-style gadget chain may be fairly inefficient, the set of operations
needed here is fairly small. They fit into a pretty tight loop, especially if you're willing
to have a lower bandwidth timing mechanism. I used a fairly high bandwidth, high reliability timing mechanism. There are other approaches that are much shorter code sequences, that for example, extract a single bit at a
time rather than extracting all eight bits of a byte in one go. And so, there are a lot of different ways you can construct this. - Thank you. - It sounds like you said that, none of these approaches
will work across a process or a hypervisor boundary,
and I was just curious if you could elaborate a
little bit on why that is and what protects us in that scenario. - The key question here is, why are we safe across these boundaries, these operating system
and hardware boundaries such as system calls,
privilege transitions, virtual machine transitions? Fundamentally, we aren't
protected by these inherently but the operating systems and hypervisors have all been updated in
conjunction with the hardware to introduce protections
on those boundaries. And so, that's why, the
very first thing I said was, you must have the operating
system mitigations in place, otherwise, you don't have
the fundamental tools to insulate one process from another. - Thank you. We're gonna cut this short but I'll take these three questions. If you do have a question that
would be fine at the panel, consider if you can
just wait 20 minutes and ask it then. - You said that basically, if you don't have anybody
to steal the secrets, then you're safe, so like, nobody your process communicates with-- - You're safe from information leaks. - Yes. I think I remember reading,
when Spectre came out that you can actually
use it by just running another process on the same machine, so like, there's no obvious
communication going on but you can, like, time caches or something, without any relation between the processes. - You have to have some way
of influencing the behavior of the thing you're running. There are some edge cases
where you can do that from outside the process, as
just a sibling but those are pretty rare and isolated, I think it would be very,
very hard to do that. If you have no way of triggering a particular type of behavior in the victim, it's gonna be very hard to cause it to then actually leak the information
you really care about. This is less true for
some of the other things that are mitigated at the
operating system level, but it holds for Spectre specifically. - Can you tell us anything about Spectre and non-memory-related
side channel attacks? - The question is, are
there other side channels and the answer is, yes. There are many, many,
many other side channels. BranchScope showed a, branch predictor-based side channel. The NetSpectre paper
included a frequency-based, very generally, a
frequency/power-based side channel. Essentially, any bit of state in the micro-architecture of the processor that you can cause to change
during speculative execution and that does not get
rolled back is a candidate and there are a tremendous
number of these things. - Thank you. - You ended your talk
with a sort of, call to arms for us to help you convince-- - I wouldn't say arms, I would say action. - Action, sure. For us to help you
convince hardware vendors to mitigate this in hardware. I have heard that Google
spends quite a lot of money with hardware vendors, so, one might be forgiven for wondering, if Google can't convince them, what hope do the rest of us have? - The key issue is: why is one entity asking the hardware vendor, even one that buys a lot of CPUs, not enough? Fundamentally, these hardware vendors are not in a good position
to scale their production and their economies of their production in ways that differentiate
between customers arbitrarily. So, if only one customer
really needs this to happen, they may not be in a good position to spend a tremendous amount
of money building that when only one of their
customers will benefit. If all of their customers want it, then they get the full economies of scale for that particular feature. My fear is that, this feature is going
to be expensive enough on the hardware end, that
unless it's universally desired, it won't make economic sense
to the hardware vendor, and so, that's why I think,
everyone needs to do this. But it's also important to keep in mind, we literally do not
know how to do this yet. We have some ideas, a
few people have ideas, they're not fully fleshed out,
we're not sure that they work, we're not sure that they're implementable. And so really, the first step is to try and figure out how to do this, what the cost would be and then hopefully, if there is a way to do it
at a cost that at least, is reasonable, if the entire
user base of these processors lobbies very effectively, I'm hopeful that the processor
vendors will actually step up and provide a real solution, long-term. But with that, we should
probably end the Q&A and hopefully, you'll all
come to the panel session which will be a lot of fun, thank you all. (audience applauds)