(lively music) (audience applauds) - I'm sure that none of you
really need an introduction for our closing plenary speaker. I know everyone is really tired but if you could all get up
some energy, put it together for Chandler Carruth, who is going to scare the heck out of us about the things that can get into our systems. (audience applauds) - All right everybody. How are folks doing, folks like energized, you sticking strong to
the end of the conference? It's been a long week. I'm here to talk to you about Spectre. How many folks here do not
know anything about Spectre, have no idea why this is
even an interesting talk? It's okay, you can put your hand up, it's not a problem. A few people. Spectre is a big security issue that was, kind of, uncovered over a year ago. It seemed really interesting to me to come and give a talk
about this, in part because, last year, I was up on a
stage here giving a talk. I was really hoping to
actually roll the video for you but I don't think we managed to get all of the technical
issues sorted out here. But the key thing here is the very last question in my talk last year, which I didn't give a very good answer to. Someone asked me, the whole talk was about
speculative execution. If you haven't seen it, it's a great talk, not to self-promote but it's a great talk. At the end of it, someone asked, what happens to instructions
that are speculatively executed if they would, like crash
or do something weird? Very fortunately for me, that
was at the end of the talk, I said that, I don't know
and I kind of blew it off and I said the session was over and we wrapped up for the day. That's not a great response and so, I'm gonna give you an entire talk instead. (audience laughs) Before we get too far into it, we gotta set some ground rules. I'm talking about security issues today and I'm actually not a security person, you may not know this,
I'm not a security expert. I'm not gonna know all of the answers that I'm talking about
here okay, and that's okay. That's part of why a bunch of
the Q&A is gonna be in a panel that we have after the talk so I can bring some other experts who've been working on Spectre with me and with a lot of other
people in the industry up onto the stage and they can help me out in answering your questions. But we need some ground rules 'cause security can be
tricky to talk about. A good friend of mine who's
also been working on this was at a conference and he was
talking about security issues and they were having a great hallway conversation, and he ended up tweeting something like, I think I can probably attack this particular vulnerability this way, and didn't really give a lot
of context, it was a tweet, we've all made tweets
without adequate context. And so, the Las Vegas Police Department came and talked to him about
exactly why he was figuring out how to attack people at this conference. He had to have a very long conversation with them, I don't want any of you
to have that conversation, I don't wanna have that conversation. So, we're gonna try and use careful words, I don't really want to talk
about exploiting things, I want to talk about vulnerabilities. I don't want to talk about attackers, I wanna talk about threat actors. Sometimes, these people
are actually white hats, they're actually working
for the good people, they're trying to find vulnerabilities. I'm not gonna be perfect but
I just wanna encourage people to really think about what words we use as we're talking about this stuff. The other thing I gotta
tell you is, unfortunately, with a sensitive topic like security, I am not gonna be able to
say everything that I know. I couldn't say it last year and I'm still not gonna be
able to say everything I know. I'm gonna do my very best
but please understand, when I have to cut you off
or say like I'm really sorry, I can't talk about that, please
be respectful, understand, I'm doing everything I can
but there are restrictions. These deal with issues
spanning multiple companies, sometimes intellectual property issues and also security issues
where we can't disclose things without kind of, responsible
time for people to get patched. And that last point brings
me to another thing. If you're out here, and there are some really brilliant people in the room, I'm sure, and you think, I've got it, I totally see this way more awesome way to break through that system, to find a new vulnerability, I would ask you, don't
come up to the microphone in public with that, right here and now because none of the security people really like to have
instantaneous disclosure, they like responsible disclosure. I'm happy to talk to you offline, I'm happy to point you at other people who can talk to you offline and figure out how to go through that process. That said, I do want you to ask questions, especially at the panel, please come up to the
microphone with questions, just understand, if we have
to push back a little bit, we're doing what we can to
try and keep this discussion at the right level 'cause
we're talking about very recent and very current events. With that, let's get started. When I first started working on this, I actually had a hard time
even following the discussions, I felt like I was a kid, I
didn't know what I was doing and a lot of that was
because there's background and terminology that I simply didn't have. I can't give you all of that, I don't have all of it myself,
I'm not a security researcher but I'm gonna try and give
you enough for this talk. First off, we have vulnerabilities. This is a common term and it's pretty obvious: it's
some way you can take a system and cause it to behave in an unexpected and unintended manner. Not too fancy. But a gadget is a weird
thing, in a security context, we mean something very
specific, by the term gadget. We mean some pattern of code, some thing in a program, that you can actually leverage to make a vulnerability work. These tend to be the
little building blocks of vulnerabilities. So, whenever you hear security
people talking about a gadget in the code, that's what we mean. Let's get to slightly more
interesting terminology. An information leak. This is a kind of vulnerability. There's a very classic
example, Heartbleed. What does an information leak do? Well, it takes information that
you shouldn't have access to and it gives you access to it. But I don't think talking about it is the easiest way to figure this out, so let's see if we can actually show you what an
information leak looks like. Hopefully, my live demo
actually works here. I've written probably the simplest information leak that you'll ever find. We have some lovely data
here including hello world and hello all of you, but
we also have a secret, something we don't want to share publicly. We have a main function
that's gonna go through and process some arguments. This could be just any old
API that takes untrusted input and it tries to validate it. We try and make sure that, if we don't have the argument, we give it a nice default; if we do, we actually set it from the command line. We extract this length and then we even bounds check it, but we wrote our bounds check in a really funny way. Some of you may be reading this bounds check and just being like, uh-uh, this isn't gonna end well for you buddy. Unfortunately, it's not.
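Roughly, the program has this shape; what follows is a simplified sketch with made-up strings and names rather than the exact demo code, but the broken check has the same flavor.

    #include <cstdio>
    #include <cstdlib>

    // All of the data lives in one buffer, with a secret sitting right
    // after the strings we actually intend to print.
    static const char data[] =
        "Hello world!\0Hello CppCon!\0SECRET: do not share";

    int main(int argc, char **argv) {
      // Untrusted input: how many bytes of the greeting to print.
      unsigned length = (argc > 1) ? std::atoi(argv[1]) : 13;

      // The bounds check written "in a really funny way": it checks against
      // the whole buffer rather than the one string we meant to expose.
      if (length >= sizeof(data))
        length = sizeof(data) - 1;

      std::fwrite(data, 1, length, stdout);
      std::putchar('\n');
      return 0;
    }

Let's take a look at how this actually works. So, if I run this program, it doesn't do anything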
interesting, it has defaults, it says, hello world. But I don't like just
talking to the world, let's talk to all of you, hi everybody. This is all fine. And we see we have a length
of 13 that's our default. If I give it a small length,
it just truncates it off that's fine. But what happens if I give
it too long of a length? Uh-oh. This is because my bounds
check isn't very good. And if I give it a long enough length, it's actually going to print
out all of this secret. It wasn't intended, I
didn't write any code that would've allowed it naturally, to go and read that secret. If I try and just give it a higher index, it's like no, you can't read it. But because there's a bug in my code, I could have an information
leak, and this is literally the core bug behind Heartbleed, this is how Heartbleed happened. Is everybody happy with information leaks? Let's talk about side channels. Side channels are the next
core component of this. A side channel is some way
of conveying information using the natural behavior of the system. Without setting up some kind of explicit communication channel, can we embed a communication inside of something that's already taking place, something that's routine and common and expected to take place? You'll see in some
discussions, this gets kind of muddied with the term covert channel. I don't particularly like using that term for things like Spectre. A covert channel, I understand much better by thinking about old-fashioned spy, who here likes spy movies? I've got some people who like spy movies. Covert channels are like spy movies. That's like when you say, when I raise my blinds on the
third Wednesday of the month, we meet, that's a covert channel. It's not a normal thing, I'm not always raising
and lowering my blinds, it's just that, it doesn't look like a communication mechanism but it is intentionally set up
as a communication mechanism and used for that purpose. A side channel is not something
we intentionally set up, it's just something we
can take advantage of that was already happening. Let's look at a side channel. Again, I think seeing
this stuff is a lot better than just describing it. I built a little side
channel demo for you all but unfortunately, this is
gonna be a lot more code so, I'm gonna try and step through it. It's okay if you don't understand
everything, like I said, we're gonna have a whole panel, but I'm gonna try and give you at least, the gist of how this works. The first thing I have is a secret. The secret is just a string,
it's nothing too fancy and I have some code
that does forced reads, and I have some timing code,
I have some random math code that's not super important. The main body of this is
this leak bytes thing. The very first line of
this, up at the top, I have a timing array
and this timing array is a big array of memory that I
can access in different ways to access different cache
lines on a modern processor. I then extract this string
view, this nice string view which tells me about this
in bounds range of text and I build some data structures
to collect information, latency and scores. And then we start runs,
and we do a bunch of runs until we get enough information to believe that we have actually found
some information embedded in another medium, in this case,
in a timing side channel. First thing we do is, we
flush all of the memory then we force a read
but not just any read. We load the information out of data then we use that to
access the timing array. And we access it not just
locally but at strides. And so this means that,
for different values in this data array, I'm gonna
access different cache lines. Then, I have to go and see
whether that was successful and in order to see
whether it was successful, I have a loop down here
which kind of shuffles
the way I access memory and then, accesses each
and every cache line in this timing array, does a read and computes the latency of that read. It is just timing each cache
line in a way that lets us see whether one of these
cache lines was faster than all of the others, because
we've already accessed it, we accessed it right before,
in the previous loop. Makes some sense? And then, we go and we
find the average latency because we don't wanna
hard-code any constants here. If one of the latencies, if one of the latencies
for one of the cache lines was substantially below
the average, then we think, cool, that was probably a signal embedded in our timing side
channel, we bump the score and if we get the score high
enough, down here at the bottom, if we get the score high enough, we gain confidence: yeah, we found our signal, we've actually found the information. Makes sense to folks?
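Boiled down, the receive side of that channel looks something like this; it's a simplified sketch with invented names, and it leaves out the shuffled access order and the average-latency scoring I just described.

    #include <x86intrin.h>  // _mm_clflush, _mm_lfence, _mm_mfence, __rdtscp
    #include <cstddef>
    #include <cstdint>

    // One cache line per possible byte value, spaced a page apart so the
    // prefetcher doesn't blur them together.
    constexpr std::size_t kStride = 4096;
    alignas(4096) static volatile char timing_array[256 * kStride];

    // Time a single load; a cache hit is dramatically faster than a miss.
    static std::uint64_t timed_read(volatile char *p) {
      unsigned int aux;
      std::uint64_t start = __rdtscp(&aux);
      (void)*p;
      _mm_lfence();
      return __rdtscp(&aux) - start;
    }

    // The sender side of the channel: touch exactly one line, chosen by value.
    static void transmit(unsigned char value) {
      (void)timing_array[value * kStride];
    }

    // The receiver: flush every line, let the transmit happen, then see
    // which line comes back suspiciously fast.
    static int receive() {
      for (std::size_t i = 0; i < 256; ++i)
        _mm_clflush(const_cast<char *>(&timing_array[i * kStride]));
      _mm_mfence();

      transmit('!');  // stand-in for the access we don't directly control

      std::uint64_t best = ~0ull;
      int best_index = -1;
      for (std::size_t i = 0; i < 256; ++i) {
        std::uint64_t t = timed_read(&timing_array[i * kStride]);
        if (t < best) { best = t; best_index = static_cast<int>(i); }
      }
      return best_index;  // with luck, '!' == 33
    }

Let's see how this works. If I run this, that was pretty fast. If I run this, you're gonna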
see, it's gonna print out each of those characters. And each one of those,
it's not actually looking at the character, it's
timing the access to memory. Makes some sense? It's actually that simple. There's not more, I don't
have anything up my sleeves. Like I promised, this is like
a real, this is a real demo. You have one more key piece of core knowledge here and
that's speculative execution. We talked a lot about
speculative execution in the talk I gave last year,
I'm not gonna try and give you a full rundown on how processors
do speculative execution, the key thing is, that
it allows them to execute instructions way past what
the program currently is at and sometimes, with
interesting assumptions. Because in order to execute further along than the program currently has, the processor has to make predictions. These predictions, are
really more like guesses. And sometimes, it guesses wrong and it makes an incorrect prediction but it continues to speculatively execute and it just unwinds all of that later. But when you have this misspeculation and you combine it with a side channel, it allows you to leak
information that was only visible during that speculative execution. And that speculative
execution may have occurred with strange invariants, with
invariants simply not holding and so, you can actually
observe behavior from a program that violates the fundamental
invariants the program set up. And that's Spectre and that's
why Spectre is so confusing. You wrote the code and it
clearly, only does one thing but observation shows something else. Let's see if we can map this on. My demo for this one is going
to be essentially, Spectre v1. But I've tried to make it as similar to the previous two demos as I could. Just like last time, I have a
text table with three strings. I've hard coded it to try and
read using this second string. We can jump down to the main function and you can see what it's actually doing here. We actually are going to,
always use this text table one. That's the only thing
we hand to leak byte. We do not hand the second,
like the third entry in our text table to this routine and we hand it to string
view with a bound in it. And then this loop is
essentially, computing an out-of-bounds index into this thing and we're passing this index. But this i is always
going to be out of bounds. We're computing it based on
a totally different string. This index is never in bounds. Once we get up to the leak byte, we have a slightly different
routine, we have the same setup with one small difference. We put the size of our
string view into memory. This is me cheating so that
it fits on a slide but, the idea being that your
size might not be sitting in a register, it might be slow to access. Then we have our runs. Getting a good demo of
this is a bit tricky. One thing we need to do,
is we have to essentially, train the entire system
using correct executions before we can get it to
predict an incorrect execution. And so, I build a safe
index into my buffer of text and this is always gonna be in bounds, this index is totally fine. But it's important to note, this index is not stable,
each run gets a different one, it's not at all going to be useful for extracting any
information from this routine. The only thing it's
useful for is actually, accessing my data in a safe manner. Then I am going to flush
the size out of my cache. It doesn't matter that I'm flushing it or doing something else,
all I really need to do is make size very slow to compute. Then I wait a while. Turns out that this little
stall here is important or it doesn't tend to work. And I compute this weird local index. This local index is essentially, the training and then the attack. For the first nine runs, we just access a perfectly safe index,
but then on the tenth run, we switch to the index the user passed in. So, just nine good, a tenth one bad. Then we do a bounds check. I wanna be really clear, we always do a bounds check and this is a correct bounds check. We make sure that the index
is smaller than the size and that means, we will
never access the data out of bounds here. We hid it in a string view, a safe entity. Herb has told us all about
how safe string view is but then when I come down here, I'm going to access it using a local index and the problem is that
this access right here using the index may happen
speculatively and it may happen before the bounds check finishes and when the bounds
check was going to fail. So, it accesses an
out-of-bounds piece of memory, it uses that, scaled up,
to access the timing array then we read through that yet again and all of a sudden, we
have leaked information, we've actually accessed our side channel.
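The gadget at the heart of it, a correct bounds check that the processor can speculate past while its bound is still being fetched, looks roughly like this; again a sketch with invented names, and on its own it does nothing observable without the training runs and the timing loop around it.

    #include <x86intrin.h>  // _mm_clflush, _mm_mfence
    #include <cstddef>

    constexpr std::size_t kStride = 4096;
    alignas(4096) static volatile char timing_array[256 * kStride];

    static const char *data = "in-bounds text";
    static std::size_t data_size = 14;  // lives in memory, like the size in the demo

    void leak_one_byte(std::size_t untrusted_index) {
      // Make the bound slow to load, so the branch has to be predicted.
      _mm_clflush(&data_size);
      _mm_mfence();

      if (untrusted_index < data_size) {              // always honored architecturally...
        unsigned char value = data[untrusted_index];  // ...but may run speculatively
        (void)timing_array[value * kStride];          // leaves a cache footprint to time
      }
    }

The rest of this is the exact same code. We go through and we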
measure all the times to see like yes, did we in fact
find one of these cache lines being slower, and if so, we compute it, there's nothing else
different from this example and the previous one. And when I run this, it's actually going to print the string. And we never accessed this memory. If I made this example
a lot more complicated and moved that memory into a separate page, I could even protect the page so that any access would fault, and the program would still run fine. Because we never access
the memory directly, we leaked it through a side
channel, so that is Spectre. I know I just ran this on an Intel laptop here. If we make really good time, I'm happy to try and actually
show you this actually working on a non-Intel machine as well. I have it but unfortunately,
we had some AV issues and so, I'd have to sit here and type in passwords for like half a minute, it's not really fun. Let's, for now, kind of go back to the presentation. We've gone through and we've looked at all this speculative execution, we've looked at Spectre and mis-speculated execution, but if this were just one issue, maybe it wouldn't be that bad. It isn't just one issue. This is an entirely new class
of security vulnerabilities. No one had really thought
about what would happen if you combine speculative
execution and information leaks. They had no idea that there
was something interesting here and as a consequence,
we have had a tremendous new set of security issues coming in and I'm gonna try and
give you a rough timeline of all of this. It started off last year in June, when Project Zero at Google
informed vendors of various CPUs and other parties about the
first two variants of Spectre which are called bounds check bypass and branch target injection
or variants 1 and 2. Then, a few weeks later,
they found a third variant, it's called variant 3
or rogue data cache load or much more popularly, Meltdown. And vendors were working
furiously for the rest of the time until January, when these were
finally disclosed publicly as variants 1 and 2 of Spectre and variant 3, or Meltdown. During this time period, they were found by other researchers who were looking in the same
areas, kind of concurrently and all of the researchers
kind of held their findings in order to have a
coordinated disclosure here because this was such a
big and disruptive change to how people thought about security. Most of the companies
working in this space actually didn't have teams
set up in the right place or with the right expertise to even address these security issues. So, it was a very, very disruptive and very challenging endeavor
because it was the first time and a totally new experience. But we weren't done. After this, we started to see more things. The next one was in
March, called BranchScope. BranchScope wasn't a new form
of attack, it was actually a new side channel. Instead of using cache timings, it pointed out that you could use the branch predictor
itself to exfiltrate data from inside a speculative
execution to a normal execution, just a different side channel. We also started to see issues coming up which had nothing to do with Spectre but were unfortunately,
often grouped with Spectre because this stuff is complicated. I don't know about you all, but I think this stuff is complicated, the press thinks this stuff is complicated and they ended up merging
things together, understandably. And so, there were issues around POP and MOV SS which are weird,
Intel and x86 instructions that have a surprising semantic
property that essentially, every operating system
vendor failed to notice when reading the spec. And unfortunately, those bugs
persisted for a long time but now that people were looking at CPUs and CPU vulnerabilities, they were able to uncover
these and get them fixed. They don't have anything to
do with speculative execution or Spectre. There's also GLitch which, again, doesn't have anything to do with speculative execution on CPUs or Spectre. But there was another
interesting one in May and this is two things, variant 3a, was a very kind of, obscure
variation on variant 3 and then variant 4. Variant 4 was really
interesting, and I mean, really interesting. This one's called
speculative store bypass. This was also discovered by Project Zero and by other researchers concurrently. And this one made Spectre even
worse than it already was. So, this really kind of, amplified everything we were dealing with. And we still weren't done. The next issues were
Lazy FPU save and restore which we saw in June. This was super easy to fix,
it's kind of a legacy thing that hadn't been turned off
everywhere it should have been and it turns out there's a bug. During speculative execution, you may be able to access FPU state. That the operating system
has kind of left there from when the previous
process was running. With the idea being, that it has an, it's gonna trap if you actually access it, and once it traps, it'll save it, it'll restore your FPU state and then let your execution proceed. But the trap happens after
speculative execution. And so, you can speculate right past it, access the FPU state and leak it. This isn't arbitrary memory, but it ends up still being fairly scary because the FPU state includes things that are used by Intel's encryption instructions. And so, you would actually
put private key data in the exact place that you leaked which was really unfortunate. Again, this was mostly a legacy thing, very quickly and easily turned off. Intel and other vendors
have been providing better mechanisms than
this for a long time but we hadn't turned it off
everywhere that we needed. We have another kind of
mistaken entity in this, we got a new side channel attack that had nothing to do
with speculative execution. It's just a traditional
side channel attack on cryptographic
libraries, called TLBleed, it's a very interesting attack, it's very interesting research but it doesn't have a
lot to do with Spectre. And apparently, I have... Then in July, we started to see, in my opinion, even more interesting things coming up. These ones are called
variants 1.1, 1.2.0 and 1.2.1 or collectively, bounds
check bypass store, which is a, kind of a mouthful but this was a big, big thing. This essentially, extended variant 1 in really exciting ways
that we're gonna look at. Then later in July, we
got still more good news. We got to hear about
SpectreRSB and ret2spec, yet more variations on this. And then in July, we got the
worst news, for me at least, which was NetSpectre. NetSpectre was not a new vulnerability, it was not a new variation on Spectre, it was a really, exemplary demonstration that all of the Spectre
things we're looking at can be leveraged remotely. It does not require local access. So, the NetSpectre paper
actually used this remotely. Oh sorry, and one more
thing, L1 Terminal Fault. This one was extremely
scary but fortunately, has relatively little impact outside of operating system vendors so, we're not gonna spend
too much time on that one. But there was yet another one
that happened pretty recently. I don't think that we're over. This timeline is going to
keep going as time passes. We're going to keep
seeing more things come up as the researchers and the vendors kind of explore this new space, so, you should not expect this to stop. That doesn't mean that the sky is falling, it's just that we have to
keep exploring this space and understanding the
security issues within it. And this is gonna keep
going for some time. But for now, let's try
and dig into these things and understand how they work
in a little bit more detail, especially outside of the one example that I've kind of shown you already. Let's look at the broader
scope of variant 1, because variant 1, I've shown you just bypassing a bounds check,
but variant 1 is actually, a much more general problem. Any predicate that the processor
can predict can be bypassed and if that predicate
guards against unexpected behavior by setting up some
invariants or assumptions, which most predicates do, you may have very surprising consequences. As an example, we might have, a small string optimized
representation here, where we have a different
representation for a long string and a short string. Up here, we have a
predicate, is this long, is this in the long representation? And you might actually train and the branch predictor might
think, this is probably long or it might think, this is probably short. Turns out, short strings
are the most common case, so the branch predictor will predict that this is probably going to be short.
string optimization strings, the pointer to the short string
is inside the object itself often on the stack, where
there are other things that are really, really
interesting to look at adjacent to the string object. And so, if we predict that this is short, we're going to get the short pointer 'cause it's actually just
a pointer to the stack and we're going to start speculating on it and if we speculate far enough to find some information leak,
this can be exploited.
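As a sketch of the shape I'm describing, with invented field names rather than any particular library's layout:

    #include <cstddef>

    struct SmallString {
      bool is_long;
      std::size_t size;
      union {
        char inline_buf[16];   // short strings: the bytes live inside the object
        const char *heap_ptr;  // long strings: the bytes live on the heap
      };

      const char *data() const {
        // The processor may predict this branch as "short" even when the
        // string is long; speculation then treats inline_buf, raw bytes of
        // this object sitting on the stack next to other interesting things,
        // as if it were the string.
        if (is_long)
          return heap_ptr;
        return inline_buf;
      }
    };

Then you have another interesting case. What about virtual functions,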
what about type hierarchies? Here, we have a type hierarchy,
we have some base class for implementing key data
and hashing of the key data and then we have public keys where we don't have to worry
about leaking the public key, and we have a private key
where we have to worry about leaking the key data. We have this virtual dispatch
here and what happens, if we've been hashing public keys over and over and over
again, and then we predict that in fact, we think we
have another public key when we don't. We may dispatch it to the wrong routine, to the non-constant time one, speculate it and run right across the cryptography bug that this whole thing
was designed to prevent. Again, the invariants you expect in your software don't hold once speculative execution starts, and that's what makes it so hard to reason about.
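In code, the key-hashing hierarchy I just described looks roughly like this; the names and the trivial hash bodies are invented for illustration.

    #include <cstddef>
    #include <cstdint>

    struct KeyData {
      virtual ~KeyData() = default;
      virtual std::uint64_t hash(const std::uint8_t *data, std::size_t n) const = 0;
    };

    struct PublicKey : KeyData {
      // Nothing secret here, so a straightforward, fast hash is fine.
      std::uint64_t hash(const std::uint8_t *data, std::size_t n) const override {
        std::uint64_t h = 0x9e3779b97f4a7c15ull;
        for (std::size_t i = 0; i < n; ++i) h = (h ^ data[i]) * 0x100000001b3ull;
        return h;
      }
    };

    struct PrivateKey : KeyData {
      // Written to be constant time, because timing here can betray the key.
      std::uint64_t hash(const std::uint8_t *data, std::size_t n) const override {
        std::uint64_t h = 0;
        for (std::size_t i = 0; i < n; ++i) h += (h << 5) ^ data[i];
        return h;
      }
    };

    std::uint64_t hash_key(const KeyData &key, const std::uint8_t *data, std::size_t n) {
      // After a long run of PublicKeys, the indirect-branch predictor may send
      // a PrivateKey here to PublicKey::hash speculatively, running the
      // non-constant-time path over private key material.
      return key.hash(data, n);
    }

There are also other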
variant 1 derivatives. So far, we've looked at cases where you speculate past some predicate and you immediately find
an information leak. But, there aren't that many
information leak code patterns in your software maybe, so,
that might be relatively rare. But that's where the variants 1.1, 1.2 or the bounds check bypass
variants came into the picture. Here, we have some delightful code which has some untrusted size. We're gonna come in and we're gonna have an out-of-bounds access here, and once we have this
out-of-bounds access, we're actually going to
copy into a local buffer on our stack, data that has
been given to us by the attacker because we've got an out-of-bounds store that we can also speculatively execute. This speculatively stores
attacker data over the stack. And if this happens, then later on, we're going to potentially,
return from this function and when we return from this function, the return address is stored on the stack but we've speculatively written over it, this is a classic stack smashing bug now come back to haunt us
in the speculative domain. Even though the bounds check
is correct, it didn't help, we were still able to conduct
a speculative stack smash. And this sends speculative execution to an arbitrary address controlled by the attacker.
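A minimal sketch of that shape, with an invented function and sizes:

    #include <cstddef>
    #include <cstring>

    void copy_field(const char *untrusted_data, std::size_t untrusted_len) {
      char local_buf[16];

      // A perfectly correct bounds check...
      if (untrusted_len <= sizeof(local_buf)) {
        // ...but if the branch is mispredicted, this copy can run
        // speculatively with an attacker-chosen length, writing attacker
        // bytes over the stack, including the saved return address.
        std::memcpy(local_buf, untrusted_data, untrusted_len);
      }
      // If this function then returns speculatively, the smashed return
      // address sends speculation to an address the attacker picked.
    }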
to really think about why, sending control to an
arbitrary address is so scary. We've had bugs involving
stack smashing forever, it's one of the most common
security vulnerabilities but once you do that, you tend to want to build some kind of, remote code execution, you wanna build logic and
trigger logic out of that. The best way to do this is
to find the logic you want inside the existing executable and just send the return to that location. It's called return-oriented programming. You take the binary and you analyze all of the
code patterns in the binary to find little pieces
of code that implement the functionality you want. And then, you string them
together with returns by smashing the stack and
going to the first one which does something and
then goes to the second one and so on and so on. The most amazing thing to me, again, I'm not a security researcher
so when I heard about this, it just like, blew my mind. The most amazing thing is that, some very, very delightful
individuals have built a compiler that analyzes an arbitrary binary to build a Turing complete set of these gadgets and then, emit a particular
set of data values and a start point which
can implement any program, which is a little bit frustrating. And then you realize, that it's actually easier
in the speculative domain. It doesn't matter if it crashes after I do my information leak. For a real code execution,
I don't just have to execute the code I want, I also probably, wanna keep the service
running for a while, like I wanna, set it aside
and not disturb it too much. Don't need to do that, I just need to hit my information leak, it can do whatever it wants, it can crash, it can do anything. And this means, if the attacker
can get to this return, they're done. They have so much power, because we have this long history of work figuring out how to use this return to do really, really bad
stuff to the program. Makes sense? But there are more ways you can do this. You can imagine, you have again, some type with some virtual interface. And you have this virtual
function you created on your stack, but then you process some data, also on the stack, with an attacker-controlled offset that may be mispredicted.
use that offset to index and this can index from one
object on the stack to another because it can go out of bounds, 'cause we're in speculative execution. And then, we can potentially
write attacker data over the stack, and this might write over the actual vpointer that points to the vtable for this object. Again, speculatively. It's all gonna get rolled back eventually but if we then hand control,
off to some other function and this other function
doesn't use the derived type, it uses the base class to access it, it's going to use that V pointer
to load the virtual table to load a function out
of it and call that. But you just got to point
it at any memory you want which means you get to send
this virtual function call anywhere you want in
the speculative domain. It's just like the
return, except this time, with the virtual function call. And I can keep going, there
are a bunch of different permutations of how you can
hijack control flow here. But the easiest way to hijack control flow and send it to your
information leak gadget was in variant 2. And this is why variant 2 was extra scary until it got mitigated. Variant 2 works something like this. Again, we have our class
hierarchy, we have some, sorry, not class hierarchy, we have a function pointer here, just any indirect function call, doesn't matter how you get there. We're gonna call through it.
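Just to make the shape concrete, something like this; the names are invented and there is nothing special about them.

    struct Codec {
      void (*decode)(const char *buf, unsigned len);  // set up somewhere else
    };

    void handle_packet(const Codec &codec, const char *buf, unsigned len) {
      // An ordinary indirect call: the processor can't know the target until
      // the function pointer has been loaded, so it predicts a target and
      // starts speculating there.
      codec.decode(buf, len);
    }

Well, how does this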
actually get implemented in the hardware? To really understand variant 2, we've gotta start dropping down
a few layers into hardware. We're gonna drop into x86
assembly at this point. This is actually the x86
assembly produced by Clang a little while ago for that C++ code. Right here we have this
call with the weird syntax, we're actually calling,
like through memory. And what this is doing, it's
actually loading an address out of the virtual, sorry,
out of the state function and then calling through it. This is an indirect call. This is really hard on the
processor because it doesn't know where this call is going to
go and it wants to predict it, that's how we got into
speculative execution. But the implementation of this predictor has a special problem. This is my world's worst diagram for it but it gets the point across. The implementation of this predictor is essentially, a hash table. It's a hash table that maps
from the program counter or the instruction pointer
of the indirect call to a particular target
that we want to predict. But it doesn't map it to the
actual target address, oh no, it maps it to a relative displacement from the current location
because that's smaller, we can encode that in a lot fewer bits. And then you realize something else. This is a really size constrained thing, this is literally, a hash
table implemented in silicon. And so, in order to implement this, the hash function actually has
to reduce this key by a lot, it doesn't use most of the bits and the hash function is
really straightforward in a lot of cases. And so, there are collisions
in these hash tables all the time. They're tiny, you would expect
collisions and that's okay. So long as the collisions
are infrequent enough, the performance is still good. But if you can kind of try out
the collisions long enough, you can figure out how to
cause a collision reliably in this hash table. If you can cause a collision reliably, you can train this predictor
to go to your displacement. And then, when we do this call,
we look up in the hash table we hit a collision, we
get the wrong displacement and we go to the wrong location. And it turns out, this is really easy. The only thing you have to
have in the victim code here is an indirect call and that's everywhere. Or even just a jump table
to implement a switch, is enough to trigger the same behavior. That makes this really, really
easy to exploit and actually, take and send control
flow to wherever you want. But it's worse than that. There's another kind of
indirect branch in x86 code, if you have a return. Returns on x86 get implemented with some instruction
sequences that look like this. And again, we don't have a
specific destination here, the destination's in
memory, it's on the stack. And so, when you go to return, the processor has to predict it somehow. For calls and returns, processors all have very specialized predictors that are super, super accurate, typically called the return stack buffer. Unfortunately, sometimes,
these predictors run out. They may not have enough
information to predict it and on some processors, when
that happens, they fall back to the exact same hash table solution as we saw for virtual calls and for jump tables. And so, even a return can, in some cases, trigger this behavior. That means, it's actually pretty
easy to find these in code. That's variant 2. I'm gonna keep going. I'm skipping over variant
3 because variant 3 was completely addressed
by the operating system, user code does not need
to worry about variant 3. So, let's look next at variant 4. Variant 4 is called
speculative store bypass. This is actually pretty easy
to understand what it does. It's exactly what it says in the name. Sometimes, when you read from memory, instead of reading memory
that was just stored at that location, you will read
speculatively, an old value. That's really it. The problem here, is that
the processor may not know whether the addresses of
these loads and stores match. And so, instead of waiting
to see if they match, they'll guess, they'll predict. If they mispredict, they
may predict that the store and the load don't have the same address. And if it predicts they
don't have the same address, it may speculatively execute the load with whatever was there
before, that store. That's pretty simple and you
can imagine how this works. Imagine you have an application which runs some sandbox
code in the callback here and hands that sandbox code,
a specific private key. We don't ever want to hand a private key to the wrong callback here. One of these callbacks
owns one of the keys, another callback owns a different key. But when we're going through this loop, the key gets passed by value and that means, since it's a bit too big to fit into registers, we're going to store a copy
of this key onto the stack, then we're gonna call the function with the pointer to
that entry on the stack. It's gonna finish, come
back, we go to the next one, we store the next key onto the stack and call the next function. But if that function happens
to speculatively execute in the right way, its loads may
not observe that stored key, it may observe the previous
function's stored key. And then it can leak that, and we have another information leak.
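A sketch of that loop, with invented types; the point is just that the key is passed by value, so every iteration materializes a fresh copy in the same stack slot.

    #include <array>
    #include <vector>

    struct PrivateKey {
      std::array<unsigned char, 64> bytes;  // too big to live in registers
    };

    struct SandboxedCallback {
      PrivateKey key;
      void (*run)(PrivateKey key);  // each callback receives "its" key by value
    };

    void run_all(const std::vector<SandboxedCallback> &callbacks) {
      for (const auto &cb : callbacks) {
        // The by-value argument is written to the stack and the callback
        // effectively gets a pointer to that slot. If the callback's loads
        // speculatively bypass that store, they can observe the previous
        // iteration's key still sitting in the same slot.
        cb.run(cb.key);
      }
    }

It turns out that this is the fastest of the information leaks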
that we have found. If you can hit this reliably, you can extract data
at an unbelievable rate with this particular technique. This technique caused
tremendous problems for browsers and other people doing
sandboxing as a consequence. But there are also other implications. You can imagine a variant
1-style information leak that's actually powered by variant 4. So here, we have a vector
that we're returning from some function, which means
we're gonna store a pointer like some pointers but also
a size into memory here. Then, when we come down
to our bounds check, we may be reading size out of memory and if we're reading size out of memory and it happens to be slow,
it may not see the store just before this in size. And so, it may speculate instead, reading whatever was on
the stack before the store, which might just be a
random collection of bytes, probably a very large number, which means this bounds check will pass, but it's using the wrong bound. It's not that we've
bypassed the bounds check, the bounds check occurred,
it just used the wrong bound. And again, we get into the classic information
leak as a consequence. Variant 3, like I said, this is mostly about operating systems. I can explain if you folks want, but I'm just gonna keep
moving for the sake of time. We also have Lazy FPU save and restore, I mentioned kind of how this worked. But again, this was largely
fixed by operating systems since the operating system
is the one switching context, it can change its behavior
and prevent application code from having to worry about this. An L1 Terminal Fault. The way L1 Terminal
Fault works is amazing. There are certain kinds of
faults that, when they happen speculative execution can again, occur. And if you arrange everything just right, especially with page tables and
other aspects of your system you can essentially read
arbitrary data out of the L1 cache while this terminal fault is being handled and leak it with speculative execution. And there are a bunch of
different ways to observe this, there is a great paper
that introduced this called Foreshadow and showed,
that this actually works inside of Intel's secure Enclave SGX. And yes, it just allows you
to read the entirety of L1. If you haven't seen it yet,
go and look for the video online about this. You can actually find
one of the researchers with a window at the
bottom of a Windows machine and as they type in the
administrator password, the window shows the administrator
password in real time. It's really, really effective. But again, this is mostly
an operating system concern and so, operating system
changes and hardware changes are being used to address this. Application code doesn't have
to deal with this directly. I don't know about all of you, but I think that was too much information. So, I'm gonna try and summarize in a way that you can kind
of wrap your head around. This is gonna be the
most busy slide I have. This is the summary slide, of essentially, all of this background information. We have four variations on Spectre v1. There's v1, 1.1, 1.2, ret2spec, which I just didn't have
time to show you all. These are all taking advantage of the same fundamental mechanisms and they have very similar properties. They can impact application code, they can impact operating system code. They don't require to
be using hyper threading or simultaneous
multi-threading in your CPU. We have really slow software
fixes that none of us like and we don't have any realistic
hardware fix on the horizon. These are actually the thing
I'm gonna talk about most, because these are for me, the most scary. Note that red column on the right. We also have variant 2, which is the primary variant 2, but also SpectreRSB, which helps show how you can
application to another, you really have to be
using hyper threads or SMT. The other nice thing is that, we have some much better hope of fixing these. We have a very good
software fix for variant 2, we don't have a great
software fix for SpectreRSB or variant 2 when it's hit
with the return instruction but there's some stuff you can do, but it's not as satisfying. But we do have good Hardware
fixes on the horizon, future Intel hardware, future other, future hardware from other vendors is going to do a very good
job of defending against this. Then, we have variant 4. Variant 4 looks, in terms of
the risk, more like Spectre v1 but with less hope of mitigating it. It impacts applications, it
impacts operating systems, it does not require hyper threading for one application to attack another. We have absolutely no hope
of fixing this in software and so far, the hardware
fixes are proving problematic. There is one that's slow and the browser vendors aren't using it and have some concerns about it, and so, this one's still pretty fuzzy. And then we have a bunch
of things at the bottom that I really view very
differently from the rest because these are fundamentally, CPU bugs that just interacted very poorly
with speculative execution and the Spectre techniques. And these, I think are
going to very consistently, get fixed rapidly. I think these are in some
ways, the least scary for application developers. Most of them don't impact
applications at all, you don't have to change your code at all. They're only in the OS. We have a great software fix for Lazy FPU, so good that no one is going
to try and fix the hardware and we have great hardware
fixes for the other ones. And so, I think these
are generally speaking, going very well. I'm gonna really focus on
Spectre variant 1, variant 2 and variant 4 because those
are the things that are really continuing to impact software today. To really talk about what you
need to know in this space, we need to have a threat model. If you went to one of the earlier talks at the conference about security, there was a great discussion around how you do threat modeling. Unfortunately, that person is actually a
security researcher and I'm not. And I'm certainly not
your security researcher and so, I can't help
you build a threat model and that's not what I'm gonna do up here. But I can give you some
questions you can use when building your own threat model to really understand the
implications of Spectre and speculative execution attacks on your particular software system. First off, does your service have any data that is confidential? Because if not, it doesn't matter if you have an information
leak vulnerability, it's a very simple, simple answer. I love this threat model. Next, does your service interact with any untrusted services or inputs? Is there any input you don't fully trust? Is there any entity that
talks to you in some way that you would not want to share all of the information you have with? If the answer's again,
no, then, you're fine. This gives you a nice simple rule that fortunately excludes, I think, the majority of software
we have out there. If you have nothing to
steal, or no one to steal it, you have nothing to secure from information leaks. This is a pretty solid,
mental model to use when coming up with your threat model. Unfortunately, we do still
have a lot of software that doesn't fit this model. So, let's talk about
how we can dig through those pieces of software. Do you run untrusted code
in the same address space as you have confidential
information stored? Do you have some information there and you're gonna run untrusted
code right next to it? If this is the case,
you have a hard problem. We do not know how to
solve Spectre effectively for this case, outside of isolating your entire code from your confidential information. This is the case that browsers are in. You're going to see browsers
increasingly dealing with this particular case. If you hit this, almost nothing else about the questions here matters, you're going to have the
highest risk from Spectre. But maybe you don't have untrusted code running in the same address space, there's a lot of software that
doesn't run untrusted code, which is good. Now you need to ask yourself,
does an attacker have access to your executable? Can they actually look at your
binary and reason about it in some way? Can they steal a copy of it easily? Is it distributed in some way
that they would have access? That's gonna really
change the threat model. If no one has access to your executable, they're going to have
an extremely hard time using these techniques. It's not impossible, but it
becomes incredibly difficult. However, you wanna be a
little bit careful here because they don't need access
to the entire executable. If you use common open source libraries, and if you link them in
and if you build them with common flags, then, they have access to part of your executable. If you run on a distribution
and you dynamically link the common distribution shared objects, they may have the exact same distribution and they'll have access
to some of the executable and they don't need access to all of it to mount a successful attack. So, you wanna be a little bit careful how you think about this but it does really dramatically
influence how open you are to these kinds of risks. The next question is, does any untrusted code run
on the same physical machine? Because if the answer here is, no, you're really looking at a
single mechanism for attack and that's the ones
presented in NetSpectre. That's the way you're
going to be seeing this. NetSpectre gives us pretty
clear bandwidth rules and it turns out, the
bandwidth is low and so, if you don't have untrusted
code running on the same machine there's some very specific
questions you wanna ask. How many bits need to be leaked
for this information leak to actually be valuable to someone else? How many bits are at risk? If you have a bunch of data, if you have the next manuscript for, I guess Harry Potter is over, but whatever the next fancy book is, leaking that manuscript's going
to be really hard, it's big. You don't need to worry
about someone leaking the next video game that you've got a copy of on your machine, that's gonna be really slow. But if you have a cryptographic key, that may only be a few thousand bits. If you have an elliptic
curve cryptography key, that may only be 100 or 200 bits before it's compromised. And worse with cryptographic issues, you may not need all the
bits for it to be valuable. So, you really wanna think about this. Another thing to think about is, how long is this data accessible? If it's in the same place for
one request in your service and then you throw it away and then it shows up somewhere else, then, you may not have big
problems here because, it may be very hard to conduct
all of the things necessary while the data is in the same place. You also wanna look at
what kind of timings that someone can get in the
NetSpectre style of attack. You wanna look at, what is
the latency of your system? How low is that latency,
how low can they get it? And you also want to look at, just how many different systems,
have the same information? So, if you have, for
example, a cryptographic key that is super important
and you have distributed it across thousands and thousands of machines and all of those machines can
all be attacked simultaneously you have a much bigger bandwidth problem than if it only exists on
one machine, because then, the bandwidth is much narrower. These are key things to
think about around bandwidth. And really, NetSpectre is all about this. You're essentially, always going
to be making this bandwidth risk, value and complexity
trade-off because, it's going to be very hard
to mitigate this otherwise, so, you want to think
very carefully about this. But what if you do run untrusted
code on the same machine? There are a lot of shared
machines that actually have shared users here, and I don't
mean in the cloud, since, if you have separate VMs, that's enough. Like you can think of
those as separate machines, but what if you're actually, really running on the same machine? Then you have to ask more questions. Do you run untrusted code
on the same physical core? And this may not always be obvious. If you don't have hyper threading or simultaneous multi-threading,
then, you clearly don't run untrusted code on the same
physical core simultaneously. But there are other ways you may get here, you may partition your workload
across different cores. There are a lot of ways
that may influence this and all of the variant 2-style attacks from application to application, rely on running on the
same physical core and so, in a lot of ways, if you can exclude this you get to take out an entire
variant from your threat model and that's really, really useful. With that, we've kind of talked about all of the different things
you wanna think about from threat modeling. I do wanna re-emphasize,
this is about applications. Operating systems and hypervisors have totally different challenges here, I'm not covering them. They're there, they're very real risks but I'm not covering them. If you wanna know all
about operating systems and hypervisors, you can
come and ask all about them at the panel but, I'm
actually not the expert there and it's a very different thing and it seemed like a different crowd that might be more interested in that. I'm focusing on application issues here. With that, let's move over
to talking about mitigations. How do we actually cope with this? First things first, you have to mitigate your
operating system otherwise, none of this matters. If you do not deploy the
operating system mitigations that your operating system
vendor is providing, you cannot do anything useful here. These are essential. So, please, especially now,
it's increasingly important that you have a way to
update your operating system and that your operating system vendor is actively providing you updates. If they aren't, you should probably look for a different
operating system vendor. This stuff is important. Let's assume you've gotten all of your operating system mitigations and all of your operating system
updates and so you're good. And let's talk about how you can mitigate your application code. First off, there are some
x86 kind of operating system and hardware-based mitigations
for application code. These come in three flavors. They have again, weird acronyms. IBRS which is, indirect
branch restricted speculation. IBPB, which I missay every time I try, which is indirect branch predictor barrier. And STIBP, which is single thread indirect branch predictors. Your operating system and your
hardware can turn these on. When they do, they can provide
certain levels of protection from some of these variants. But an important thing to
realize, for an application, these do not help with variants 1 or 4. They're exclusively
helping with variant 2. They also, may be very slow in some cases. These are especially slow
on current and older CPUs. We're expecting newer
CPUs to increasingly, make these things fast and
for them to be essentially, unobservable in terms of performance. But if you have the older CPUs, even turning these on
with your operating system may be a very significant performance hit and there are some alternatives. But the alternatives are software-based, and so, we need to talk
about how we can use software to go after mitigation. The first one is called Retpolines. This was developed at Google by a colleague of mine. The idea is, well, since we can recompile the source code of our application, we wanted to see, is there something we could change in the source code that could be effective at mitigating at least some of the most risky variations on this. Notably, variant 2, which is far and away the easiest to attack in a working system. It seemed like something we
really wanted to mitigate in software, given the
performance impact we were seeing from the OS hardware-based mitigations. It does require recompiling your source, which can be painful, but if you can, this mitigates Specter variant 2 and SpectreRSB in restricted cases but there're a bunch of
asterisks and hedges there. And it's usually going
to be faster than STIBP on current CPUs and older CPUs for mitigating your current application. Not always, but there's a decent chance you probably want to look at it. Going forward, in the
future, we do expect this to become less and less relevant because the hardware
is really catching up. We're expecting in the future, this is just going to work on hardware and you're not going to
need to worry about this. But for now, you might
want to worry about this if you have a service
that is at risk here. How does this work? We have some indirect call,
just like the previous one but when you compile your
code with Retpolines, we don't emit these instructions, we emit a different set of instructions. Here, we've taken this address
that you wanted to call and we've put it into a register r11. And then we've transformed the call into a call to this helper
routine, __llvm_retpoline_r11. If we look at this
routine, this is a very, very strange function. The first thing it does is a call but it doesn't call a
function, it calls a label, a basic block inside of itself. And once it does that, it then takes the address
you wanted to call and smashes the stack with
it, this is a stack smash, this clobbers the return address with this address you wanted to call and then it uses a return to actually branch to that location. So, that's a pretty weird thing to do. The key idea here, is that by doing a call followed by a return, we
put a particular address, an unambiguous address into
the call and return predictor, the return stack buffer. And this predictor is
really fast and really good so, the processor prefers
it anytime it can use it. And in the vast majority of cases, it's going to be able to use
it here, and when it does, if it speculates this return,
it actually ends up here, because the speculative return can't see that stack smash operation. So, when it speculates
return, it goes here which then goes to this
weird pause instruction. How many folks here have used
the x86 pause instruction? I don't know what kinda
code you people are writing except for this one over here. I know what you're doing too. The pause instruction's super weird, I never even knew what this was, I thought this was like
something from old, old, old x86 days but no, it
actually has lots of uses, and in this case it is
the cheapest possible way to abort speculative execution. And we want to abort it
because speculative execution consumes resources, like power, and we don't want to waste those, and so, we cut it off here. Unfortunately, pause doesn't
do that on AMD processors, it only does it on Intel processors. After we pause, we then
do an LFENCE and this, will actually work on AMD processors once you install your
operating system updates. Finally, just in case
all of this magic fails, we make this into an infinite loop. You're not getting out
of here, this is keeping the speculative execution in
a safe, predictable place. This essentially, turns
off speculative execution and branch prediction for indirect calls and indirect branches, and that protects us from variant 2.
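Pulled together, the thunk has roughly this shape; this is a sketch of the __llvm_retpoline_r11 pattern as just described, written as file-scope GNU assembly in a C++ translation unit, not the exact compiler output.

    // The call/ret pair pins return-predictor speculation to the pause/lfence
    // loop, while the architectural return branches to the target in %r11.
    asm(R"(
            .text
            .globl  retpoline_r11_sketch
    retpoline_r11_sketch:
            callq   .Lsetup          # pushes the address of .Lcapture as the return
    .Lcapture:
            pause                    # any speculation of the ret lands here...
            lfence
            jmp     .Lcapture        # ...and stays here, harmlessly
    .Lsetup:
            movq    %r11, (%rsp)     # smash the return address with the real target
            retq                     # architecturally branches to *%r11
    )");

The overhead of doing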
this is remarkably small. This is about your worst case scenario, we built very large C++
servers with this enabled and the overhead was under
3%, reliably under 3%, but it does require that
you use some pretty advanced compilation techniques. You need to be using profile
guided optimizations, you need to be using ThinLTO
or some other form of LTO. I can't emphasize that
enough, but when you use them, you can keep the overhead
here very, very low. And if you're working in
something very specialized like some really specialized
code or a kernel, you can usually avoid
the indirect branches and indirect calls,
manually, with essentially, no measurable performance
overhead by introducing kind of, good guesses for what
the direct call target should be and a test to make sure
that that's correct, rather than relying on indirect
calls and indirect branches. We've been able to use this to make our operating system mitigations incredibly inexpensive, as a consequence. But this is only for variant
2 and maybe variant 2 is gonna be fixed in future hardware and maybe, you're not even subject to it. So, what about the other variants? That's where things start to get bad. You can manually harden
your branches for variant 1, which is nice. But it can be a bit painful. Intel and AMD are suggesting that you use the LFENCE
instruction right after a branch.
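For readers of the transcript, a minimal sketch of that suggestion in C++, reusing the illustrative victim-function shape from earlier; _mm_lfence() is the compiler intrinsic for the LFENCE instruction, and the exact placement here is an assumption, not the code from the demo.

```cpp
// Hedged sketch: manually hardening the variant 1 gadget with an LFENCE
// right after the branch, as Intel and AMD suggest. Same illustrative names
// as the earlier sketch.
#include <cstddef>
#include <cstdint>
#include <immintrin.h>   // _mm_lfence

extern std::uint8_t array1[];
extern std::uint8_t array2[];
extern std::size_t  array1_size;

void victim_function_lfence(std::size_t untrusted_offset) {
  if (untrusted_offset < array1_size) {
    _mm_lfence();  // speculation can't proceed past the fence, so the loads
                   // below only execute once the bounds check has resolved
    std::uint8_t value = array1[untrusted_offset];
    volatile std::uint8_t sink = array2[value * 512];
    (void)sink;
  }
}
```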
And actually, while we're here, I think we have enough time. Everybody likes live demos, let's see if we can actually just do this. I come down here. And after my branch, I do an LFENCE. We would expect this to mitigate things, hopefully it does. This is gonna run really
slow but it's also, not gonna produce my string. Nothing's happening here
and that's a good thing, it's running. I can even build the debug
version if you all are worried that I'm being sneaky here. I have a debug version that
actually prints out stuff while it's going. We're trying to leak it, it's a secret and you're seeing what it's finding here and it's not finding any
character data from the secret. And just so that we're all clear, I don't have anything up my sleeve. Comment this out. No, have to rebuild. Goes right back to working. LFENCE works, that's nice. We like mitigations that work. But it is a bit slow and
it can be really expensive and there're cheaper
ways to do the same thing if you can go through and mitigate each and every one of your branches. Both Google and ARM have been looking at building APIs to do this
in a more efficient way and in a little bit clearer way in the source code because an LFENCE feels
pretty magical to just like, oh no, no, I just put an
LFENCE here, I'm good. We can do something a little
bit better with an API. There's a lot of work to do that though, I've got links up on the
slides if you wanna go to them. This is gonna show you, kind of, where these different
organizations are looking to build APIs, but we don't have anything that's really production quality and that you can reach out and use today. The best you can do right now
is actually something like LFENCE, I think ARM has
a similar thing to LFENCE that they suggest with
an intrinsic as well. But, this doesn't scale well. You have to manually do
this to every single point in your code, that's
really, really painful. Maybe you can use a static
analysis tool to automate this but what we found is that
the static analysis tools either cannot find the interesting gadgets that look like Spectre variant 1 because they're very careful and accurate and they leave lots of unmitigated code or they find hundreds and
hundreds and hundreds of gadgets that are completely
impossible to actually reach with any kind of real-world scenario. You can't actually get there and use them to conduct a Spectre,
kind of, information leak. So, this means that they're
not super satisfying to use, they're better than the
alternatives of doing it manually without a static analysis tool, but they still pose real
scalability problems. Ultimately, my conclusion is that, this isn't going to continue to scale up to larger and larger applications. We're already right about at the threshold of how much we can do
with static analysis tools and manual mitigations when we're working on large applications. So, we need an alternative. There's another system called
speculative load hardening, this is also developed by Google and this is an automatic
mitigation of variant 1. This is not related to the /Qspectre flag in Microsoft's compiler. That is not an automatic mitigation
of variant 1 in all cases, that handles specific cases
that they've taught it about. Other kinds of variant 1,
other instances of variant 1 aren't caught by it, which makes it, potentially, risky to use. But this is a categorically
different thing. This is a transformation that removes the fundamental exploitable
entity of variant 1 from your code, and it
does it systematically across every single piece
of code you compile. You still have to recompile your code but you can deploy this to get kind of, comprehensive mitigation of variant 1. Just so you are aware,
this is incredibly complex, it's still very, very
brittle, this has been something that we're
working on for a long time but I don't want you to get
the impression that, this is production quality, ready
to go right out the door. We're all still, really working on this, but I wanna try to
explain how this can work. Let's take an example. This is a little bit simplified version of the Spectre variant 1 example
from the original paper. We have a function that accepts some untrusted offset and some arrays, and it's going to try and do a bounds check. So, we come down, we do a bounds check, we potentially bypass this bounds check.
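Since the slide itself isn't reproduced here, a hedged sketch of that simplified example, following the shape of the victim function from the original Spectre paper; the names and sizes are illustrative, not the exact code on the slide.

```cpp
// A hedged sketch of the simplified variant 1 example being described; the
// names and sizes are illustrative, not the exact code on the slide.
#include <cstddef>
#include <cstdint>

std::uint8_t array1[16];
std::uint8_t array2[256 * 512];
std::size_t  array1_size = 16;

void victim_function(std::size_t untrusted_offset) {
  if (untrusted_offset < array1_size) {                 // the bounds check...
    std::uint8_t value = array1[untrusted_offset];      // ...can be speculatively bypassed,
    volatile std::uint8_t sink = array2[value * 512];   // leaving a cache footprint keyed on `value`
    (void)sink;
  }
}
```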
Let's look at how this bypassable bounds check is actually implemented in x86. If we compile this code down, we get the instructions on the right. These instructions are going to compare whether we're below the bound. If we're greater than
or equal to the bound, we're going to skip this body of code. That's what this does. When we're going to use
speculative load hardening, we need to somehow transform this so that a branch predictor, predicting that the index is within the bound and that we enter this code, is kept from working against us. The way we do this is, instead of generating
the code on the right, we generate the code on the left. So, let's try and walk
through this code on the left, which is for the same C++ pattern, and understand how it works. First we need to build what we
call, a misspeculation mask. So, it's just all ones. We're going to use this whenever
we detect misspeculation in order to harden the
behavior of the program. We also need to extract the caller's mask because speculative execution
can move across function calls, it could be interprocedural. So, we want the caller to pass in any speculation state that it has, and we pass it in the high bit of the stack pointer. This transforms the bit hidden in the stack pointer into a mask of either all ones or all zeros. And in a normal program, you'd
expect this to all be zeros and in a misspeculated
execution, this is going to be all ones just like our
misspeculation mask. Now, we do our comparison
just like we did before, we have our branch just like we did before and we may mispredict this branch. If we mispredict the branch though, we're going to enter this basic block, when the condition is actually
greater than or equal to. And so, in that case, we
have a CMOV instruction and CMOV instructions
today, are not predicted by any x86 hardware,
and so, as a consequence we can write the CMOV,
using the same flag, greater than or equal to. And if we enter this block
when that flag is set, which should never happen, we write the misspeculation
mask over our predicate state, over this state that
we got from the caller. This essentially collapses
us to the all ones if we ever misspeculate this branch. Then we come down and we load
some memory just like normal, but keep in mind, this may
have loaded leakable bits, these bits may actually be,
something that can get leaked in some kind of actual attack scenario. There are some operations on
this that we actually allow. These are data invariant operations, these are the same kinds of
operations we would allow on private keys, if we were implementing a
cryptographic algorithm. They do not exhibit any change in behavior based on the data that they observe and so, they're safe
to run over this data. They just move things around and there's nothing that
you can glean from these. But before we actually
use this piece of data to index another array, we mask
it with our predicate state, OR-ing all of those bits over
the data that we loaded. And because of this, if we misspeculated, all of the bits are now all ones, none of what we loaded is observable. And so, the fact that we then
do this data-dependent load remains safe. This is the core transformation of speculative load hardening.
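As a rough, source-level paraphrase of what was just described: the real transformation happens in the compiler backend (for example behind Clang's -mspeculative-load-hardening flag) using CMOV and the stack-pointer trick, which plain C++ can't express, so the hedged sketch below only shows the intended data flow for the earlier victim function.

```cpp
// Hedged, source-level paraphrase of the transformation applied to the
// earlier victim function. A real compiler may fold the redundant check
// away; this only illustrates the masking of loaded data.
#include <cstddef>
#include <cstdint>

extern std::uint8_t array1[];
extern std::uint8_t array2[];
extern std::size_t  array1_size;

void victim_function_slh(std::size_t untrusted_offset) {
  // Predicate state: all zeros on the correct path, all ones once we detect
  // that we are executing down a mispredicted edge.
  std::uint64_t predicate_state = 0;
  if (untrusted_offset < array1_size) {
    // Architecturally this condition is true here; under misspeculation it
    // is false, and the backend's CMOV (not predicted by current x86 cores)
    // folds the all-ones misspeculation mask into the state.
    predicate_state |= (untrusted_offset < array1_size) ? 0 : ~0ULL;
    std::uint8_t value = array1[untrusted_offset];
    // OR the predicate state over the loaded data before it can feed a
    // data-dependent load: if we misspeculated, every bit is now a one and
    // nothing about the loaded value is observable.
    value |= static_cast<std::uint8_t>(predicate_state);
    volatile std::uint8_t sink = array2[value * 512];
    (void)sink;
  }
}
```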
And we do this for every single predictable branch in the entire program, and we do this hardening
and we do this hardening for every single piece of loaded
data in the entire program. It's very, very comprehensive. There aren't these huge
gaps in what gets hardened and what doesn't get hardened. But there is a catch. The overhead is nuts, it's
just beyond belief, it's huge. 30 to 40% CPU overhead is a
best-case, medium-case scenario. Worst-case scenario is
even worse than this. If you don't access a lot of memory, then it can be lower overhead
than this, but then you're not accessing a lot of memory, which is a weird situation to be in. For most applications, we expect this overhead to be very large. We've built a very
large service with this, we've actually had it tested in a live situation so
we can actually measure the real-world performance overhead, this is a very realistic
performance overhead you can expect from deploying speculative
load hardening to your service. I am very aware that
this is not an acceptable amount of overhead for most systems. They probably don't have
the CPU just kicking around. If they're latency-sensitive, this is actually going
to impact your latency. If you're not latency-sensitive,
you're still going to need a 30 to 40% increase in
capacity of CPU to handle this or 30 to 40% decrease in the
amount of battery you have if you're running on a device. This is a really, really
problematic overhead. Unfortunately, this is the
best that we know how to do while still being, truly comprehensive. The only things we know to
really reduce this at this point also open up exposure to
various forms of attack and that's not what we want, that's not the trade-off we wanna make. So, what else can we do? This has been a grim list
of, stories about mitigation. The other thing you can
do, is you can isolate your secret data from the risky code. Sandbox it. And this is actually the thing that works even for untrusted code. When you have sandboxed code, you
have to actually separate it from the data with some
kind of processor level security abstraction,
typically separate processes on a modern operating system. That's, really the only
thing that's enough for untrusted code, because
this is the only mitigation we realistically have for variant 4. This is what all the
browsers are working on in order to mitigate variant 4, long-term. Everything else looks
short-term, too expensive or doesn't work in enough cases. The other interesting
thing is, if you do this, this protects against all of
the other variants of Spectre. If you actually, can separate
your code in this way, you are truly protected from
Spectre, and it gets better. You're also protected
from bugs like Heartbleed. It's now, very hard to
leak information at all because the attacker
doesn't have access to the program that actually is
touching the secret data. So, the extent to which you
can design your system this way, it can really, really increase
the security of your system, it can really make it hard to
suffer from information leak vulnerabilities in general. We really do think this is a
powerful mitigation approach.
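As a hedged sketch of that idea, here is a toy POSIX example of the process boundary plus a narrow, fixed-format channel; the "SIGN" protocol and every name in it are purely illustrative, not any real API.

```cpp
// Hedged sketch of the isolation idea: keep the secret in its own process
// and give the at-risk code only a narrow, fixed-format request/response
// channel. Protocol and names are illustrative.
#include <unistd.h>
#include <cstring>

int main() {
  int to_secret[2], from_secret[2];
  if (pipe(to_secret) != 0 || pipe(from_secret) != 0) return 1;

  if (fork() == 0) {
    // Secret-holding process: it never runs untrusted code and only ever
    // parses a fixed-size request, so there is no gadget here for the risky
    // process (or a Heartbleed-style bug in it) to leverage.
    char request[8] = {};
    read(to_secret[0], request, sizeof(request));
    const char* reply = std::strncmp(request, "SIGN", 4) == 0 ? "ok" : "no";
    write(from_secret[1], reply, std::strlen(reply) + 1);
    _exit(0);
  }

  // At-risk process: even if misspeculation leaks its whole address space,
  // the long-lived secret simply isn't mapped here.
  write(to_secret[1], "SIGN", 5);
  char reply[8] = {};
  read(from_secret[0], reply, sizeof(reply));
  return 0;
}
```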
Ultimately, you're going to need some combination of approaches targeted to your application. Oh, I almost forgot, sorry. We actually
can live demo this too. Just so that we're all on the same side. I build this and you can
see there's a little, there's an extra flag in
there and now when I run it, whoa, that's not good. Helps if you run the right program. So, when I actually run the mitigated one, it doesn't leak anything. This is just like leaking
random bytes of data. If you want, I can open up the
binary, we can stare at it, it's gonna look a lot
like what I presented. But this actually does work. You should expect to need some mixture of these things. You've got to look at your
application, your threat model, your performance characteristics, how much of an overhead you can take to pick some approach here. There's not this, oh yeah,
you do this, this, this, you're done, go home, everything is easy. That's why I gave a long
presentation about it. This isn't sadly, the easy, easy case. There's also some stuff I
want to see in the future because like I said, we're not done here, we're not finished. So, I've got three things that I would really, really like to see. Number one, we have to have a cheaper operating system and hardware solution for sandboxing protections,
like the last one I mentioned because that's the most
durable protection; it provides the most value by far. We need an easier way to do this. The browser vendors are really
struggling to do this today and we should make that much, much better so that we can deploy it more widely. The second thing is, cryptography
really needs to change. The idea that you do cryptography with a long-lived private key
that you keep in your memory, is a very bad idea. We need to go and make sure every single cryptographic system is separating the
long-lived private key data into a separate subsystem
and a separate process, potentially, leaving it
on disk until it needs it because this is too high risk. We have the cryptographic
parameters we need here, things like ephemeral keys in TLS 1.3, we have good techniques here
in the cryptographic space, we need to use them, we need to stop using
older cryptographic systems that require these long-lived,
stable private keys, especially, small elliptic
curve stable private keys to be visible in memory,
to a system under attack. That's a very, very bad,
long-term proposition in the wake of Spectre. And last, I think we have to
solve Spectre v1 in hardware. I do not think, that anything
I've shown you for v1 is tenable, long-term. I think we may be able to sneak by for the next five to 10 years, while the hardware
community moves on this. I understand that there are
real timeline issues here that they cannot change,
but they must actually, solve this in hardware. Think of it in a different way. I do not believe that
we can teach programmers to think about Spectre v1. How do we teach programmers? We say, like, well, you have
these set of assumptions and once you build up these assumptions, you work within them and then
you build up more assumptions and you work within those, and
you build up more assumptions and you work within those. And how does Spectre work? It says, eeeh, not really. You have all those
assumptions, they're very nice but I didn't pay any attention to them. Now, we have to teach people to think about the behavior of their code, when literally, none of
their predicates hold and I don't think that's viable. This is different from saying, like today, we have C++ without contracts and we're gonna get contracts added to it. This is worse than going
back to C++ without contracts 'cause today, what we have
are unenforced contracts, we have contracts in our
documentation, in our comments, everywhere, right? We have asserts, we have
predicates, everywhere. Imagine having none of them and having to write code
that was correctly behaved even in their absence. I don't think that that's
viable, and so, I do not think we can exist in a computational
world where Spectre v1 is a thing programmers are thinking about. I think we have to actually remove it. And so, I'll give you a brief conclusion. Spectre: misspeculation plus side channels gives you information leaks of secrets. It's new and it's an
active area of research, this is going to keep happening
for a long, long time. We have at least a year,
maybe years, plural, of issues that have yet to be discovered. You need to have a threat model to understand its implications for you and you need to tailor
your mitigation strategy to your application because there is not a single one that looks promising for every case. And ultimately, I want
all of you to help me convince our CPU vendors,
that they must fix Spectre v1 in hardware. We can't actually sustain this world where our assumptions do not hold. So hopefully, you all
can help me with that, and I thank all of you
and I also wanna thank all the security researchers
that I've been working with for the last year, across the industry, it's a tremendous group of people, they've taught me a whole lot. Hopefully, I've taught you
all at least a little bit and I'm happy to take questions. (audience applauds) Just as a quick reminder, we only have a few minutes for questions like four or five minutes for questions. I would really encourage you, focus your questions on my talk. We're going to have a panel to talk about everything to do with Spectre in about, just over half an hour. I'll be there, a couple of the other folks working on this will be there. If you have generic questions,
feel free to wait until then and we'll try to answer them then. With that, let's do the
question on the left or the right here. - Some mitigations require recompilation. I'd like to understand, it's like a recompilation of everything, right? It's not a C-specific problem, it's a processor-instruction-specific problem? - Yes. The key thing here is, as we start to work with Spectre, we see an increasing need for
you to be able to recompile all of your source code in
your application somehow. Because all of it, potentially
has, the vulnerable piece. - So, that's true about Java, managed systems and whatever? - To a certain extent, it's true
of Java and managed systems however, constructing ways to actually break these types of things is much harder in managed systems. - Hi. This all is based on the fact that the speculative execution executes code that is actually not supposed to run. So, eventually, the pipeline will catch up and the CPU will realize that, I'm actually not supposed
to execute this branch and then stop executing it. Just, like a ballpark estimate, how much code can I get into that before the CPU realizes that,
I shouldn't be executing this and stops doing it? - That's a great question. The key question is, how much code can be speculatively
executed in this window? What's the window of my risk? I have been asking processor vendors that question for a long time
and they will not answer me. But I'm not throwing them under the bus. I actually understand why, increasingly, I really understand why. I don't think that there
is a simple answer, it's not that easy to
reason about because, what you actually are
seeing is the exhaustion of resources on the processor. But different kinds of instructions exhaust resources at different rates. It's very hard to say,
oh no, 100 instructions and then you'll be done,
because different instructions may take up different
amounts of resources. However, in practice, we have seen hundreds of instructions
execute speculatively. Not tens, hundreds. And we should expect that we
will get better and better at tickling this particular,
weird part of the processor and sending it further and
further down these traces. We should also expect that
processors are going to speculate more and more as they get
larger and more powerful. - Thanks. - You said a mitigation for
this is to put untrusted code in a separate process
from the secret data. - Correct. - But you also said that there's
something called NetSpectre where you can exploit over a
network, how does that work? - If you're moving untrusted
code into a separate process what you're protecting the data
from, is the untrusted code. You can also move
trusted code that handles untrusted inputs to a separate process. And then, NetSpectre is going to leverage that code to
read data in that process. But if that process doesn't
expose to its untrusted inputs, any control over the inputs to the process with the secret data, you
can't construct an attack. And you have to think
really carefully about, just how trusted is my input? Can I fully trust, can I fully validate the communication, the
secondary communication from the at-risk process
to the trusted process? But sometimes you can do that. Sometimes you can say like, no, all of the communication there
is written by the programmer, is trusted. All the attacker can do is select between those, they can't construct arbitrary risky inputs, so now, we can trust our
inputs in the trusted process, we don't have to worry about
a Spectre vulnerability. - So, we have to think
about, not just trusted code but also, trusted input?
- Absolutely. At-risk code is either untrusted code or code handling untrusted data. - Cool, thanks. - It seems to me that the
whole issue is because, the CPUs are trying to
speculate where they are going and try to do this optimization along the way as they are working. How bad would it be to turn
this completely off? - What's the cost of turning
off speculative execution? It's actually pretty
easy to simulate this. When I built the speculative
load hardening compiler pass, I also built something that added Intel's suggested mitigation of an LFENCE but instead of doing it
only on the risky branch, it adds them on all of them. It's a very simple transformation, much simpler than the
speculative load hardening. And I measured the performance of that. And that's actually an
interesting thing to look at because what LFENCE
does, is it essentially, blocks speculation past the fence. And so, this doesn't turn
speculative execution completely off, but it
dramatically reduces speculative execution on the processor. The performance overhead
of this transformation was somewhere between a 5X and a 20X to 50X performance reduction. There were several very
tight computational loops so, well over 20X performance
reductions and at that point, I started having trouble
measuring with high accuracy. I don't think that's
even remotely desirable due to the performance impact. This shows you also,
how incredibly important speculative execution is. No one should leave this and be like, "Oh, those processor designers, "why do they have to use
speculative execution?" It makes your program 20X faster. It's really good, unfortunately, it does come with a problem. - Hello, I wonder on the impact
on compile optimizations. For example, when it was pretty
new I tried to get rid of all my indirect jumps by just
not using function pointers and I observed that basically, the only option I had to
parse to my compiler was to disable jump tables to get rid of it. Like some compiler parsers
now being overthought to like maybe, generate
completely different code. - The question is, is
Spectre really changing how we think about compiler optimizations? I don't think it is in a lot of ways because a lot of software isn't
really impacted by Spectre. So, we want the
optimizations to run there. But when we know we're mitigating against some part of Spectre,
we definitely turn things off as necessary. So, when you're using
Retpolines for example, we turn off building jump tables, so that we don't introduce more
of these risky things that we then, have to transform. But I don't think there's
a lot of impact beyond that long-term. Mostly, the impact on compiler
optimizations is figuring out how we can mitigate these
things less expensively. - Okay, thanks. - Most of this leaking of memory happens during speculative execution, and gadget chains are a relatively inefficient
use of instructions. How deep can you go, how many
instructions can you execute speculatively, given
those two things combined? - Again, we don't know, we
don't have hard answers here, but our experimentation shows
hundreds of instructions, which is more than enough to form any of these information leaks. And remember, even though a ROP-style gadget chain may be fairly inefficient, the set of operations
needed here is fairly small. They fit into a pretty tight loop, especially if you're willing
to have a lower bandwidth timing mechanism. I used a fairly high bandwidth, high reliability timing mechanism. There are other approaches that are much shorter code sequences, that for example, extract a single bit at a
time rather than extracting all eight bits of a byte in one go. And so, there are a lot of different ways you can construct this. - Thank you. - It sounds like you said that, none of these approaches
will work across a process or a hypervisor boundary,
and I was just curious if you could elaborate a
little bit on why that is and what protects us in that scenario. - The key question here is, why are we safe across these boundaries, these operating system
and hardware boundaries such as system calls,
privilege transitions, virtual machine transitions? Fundamentally, we aren't
protected by these inherently but the operating systems and hypervisors have all been updated in
conjunction with the hardware to introduce protections
on those boundaries. And so, that's why, the
very first thing I said was, you must have the operating
system mitigations in place, otherwise, you don't have
the fundamental tools to insulate one process from another. - Thank you. We're gonna cut this short but I'll take these three questions. If you do have a question that
would be fine at the panel, consider if you can
just wait 20 minutes and ask it then. - You said that basically, if you don't have anybody
to steal the secrets, then you're safe, so like, nobody your process communicates with-- - You're safe from information leaks. - Yes. I think I remember reading,
when Spectre came out that you can actually
use it by just running another process on the same machine, so like, there's no obvious
communication going on but you can, like, time caches or something, without any relation between the processes. - You have to have some way
of influencing the behavior of the thing you're running. There are some edge cases
where you can do that from outside the process, as
just a sibling but those are pretty rare and isolated, I think it would be very,
very hard to do that. If you have no way of triggering a particular type of behavior in the victim, it's gonna be very hard to cause it to then actually leak the information
you really care about. This is less true for
some of the other things that are mitigated at the
operating system level, but it holds for Spectre specifically. - Can you tell us anything about Spectre and non-memory-related
side channel attacks? - The question is, are
there other side channels and the answer is, yes. There are many, many,
many other side channels. BranchScope showed a, branch predictor-based side channel. The NetSpectre paper
included a frequency-based, very generally, a
frequency/power-based side channel. Essentially, any bit of state in the micro-architecture of the processor that you can cause to change
during speculative execution and that does not get
rolled back is a candidate and there are a tremendous
number of these things. - Thank you. - You ended your talk
with a sort of, call to arms for us to help you convince-- - I wouldn't say arms, I would say action. - Action, sure. For us to help you
convince hardware vendors to mitigate this in hardware. I have heard that Google
spends quite a lot of money with hardware vendors, so, one might be forgiven for wondering, if Google can't convince them, what hope do the rest of us have? - The key issue is: why is one entity asking the hardware vendor, even one that buys a lot of CPUs, not enough? Fundamentally, these hardware vendors are not in a good position
to scale their production and their economies of their production in ways that differentiate
between customers arbitrarily. So, if only one customer
really needs this to happen, they may not be in a good position to spend a tremendous amount
of money building that when only one of their
customers will benefit. If all of their customers want it, then they get the full economies of scale for that particular feature. My fear is that, this feature is going
to be expensive enough on the hardware end, that
unless it's universally desired, it won't make economic sense
to the hardware vendor, and so, that's why I think,
everyone needs to do this. But it's also important to keep in mind, we literally do not
know how to do this yet. We have some ideas, a
few people have ideas, they're not fully fleshed out,
we're not sure that they work, we're not sure that they're implementable. And so really, the first step is to try and figure out how to do this, what the cost would be and then hopefully, if there is a way to do it
at a cost that at least, is reasonable, if the entire
user base of these processors lobbies very effectively, I'm hopeful that the processor
vendors will actually step up and provide a real solution, long-term. But with that, we should
probably end the Q&A and hopefully, you'll all
come to the panel session which will be a lot of fun, thank you all. (audience applauds)