The Memory Sinkhole - Unleashing An X86 Design Flaw Allowing Universal Privilege Escalation

Reddit Comments

TLDW; An old feature that lets you memory map a bunch of zeros to anywhere on the memory is used to force reading a bunch of zeros in ring -2 code. (ring -2 code has privileges even the operating system doesn't have) Causing that specific ring -2 code to read only zeros causes it to jump to and execute code at an offset from 0x00000000 which the operating system can modify allowing arbitrary code execution in ring -2 from ring 0. Effectively this means somebody with root access on a machine can royally screw up the machine. He used it to install a backdoor in a computer to allow an unprivileged user to gain root access in a way undetectable to anti virus. (Anti virus cannot access ring -2 code) Processors before 2013 have the flaw. The vulnerable machines cannot be patched.

TLDR; An old cpu feature opened up an exploit to escalate cpu privileges higher than even the operating system.

👍︎︎ 65 👤︎︎ u/soiguapo 📅︎︎ Nov 07 2017 🗫︎ replies

Wow, that's a deep rabbit hole. I commend the presenter on his ability to not only ferret out that issue, but to actually turn it into an exploit.

👍︎︎ 35 👤︎︎ u/vogon_poem_lover 📅︎︎ Nov 07 2017 🗫︎ replies

Isn't this the guy who also wrote an obfuscator which only uses mov operations? (Because obviously, mov on x86 is turing complete).

👍︎︎ 18 👤︎︎ u/C5H5N5O 📅︎︎ Nov 08 2017 🗫︎ replies

Google's NERF project had a section on how they are neutralizing the SMM ("ring -2.5") as well as other ring -2/ring -3 exploits. I don't know if the SMM work has had its code published yet or if it was just a special kernel configuration on top of booting from u-root.

👍︎︎ 11 👤︎︎ u/zamadatix 📅︎︎ Nov 08 2017 🗫︎ replies

I remember this actually. It ended up only affecting a handful of ancient Intel boards, since AMD's x86 was unaffected and SMM doesn't exist on x86_64. Those boards were salvageable with a firmware update.

Quick Google search makes it look like it never even got a CVE. A deeply interesting flaw, but luckily a few years too late for it to have reached its full destructive potential.

👍︎︎ 3 👤︎︎ u/not_a_novel_account 📅︎︎ Nov 08 2017 🗫︎ replies

Wild, this guy has the best job

👍︎︎ 3 👤︎︎ u/BLAHFUK 📅︎︎ Nov 08 2017 🗫︎ replies
Captions
This talk is "The Memory Sinkhole: Unleashing an x86 Design Flaw Allowing Universal Privilege Escalation," with our speaker, Christopher Domas. Thanks.

All right, good morning everyone, thanks for coming out. My name is Christopher Domas. I'm a cybersecurity researcher at a group called the Battelle Memorial Institute. It's a pretty cool place to work; it's my chance to do a lot of research into the fringe areas of cybersecurity that I wouldn't otherwise get to see. But what I wanted to talk to you about today is something I've been tinkering with in my free time for the last couple of months. It's actually an x86 architectural vulnerability that allows privilege escalation, and it's been hidden on these processors for the last 20 years without anybody noticing. What it effectively does is open up an entirely new class of exploits on these processors, and I want to talk about exactly how that's going to work.

But first I want to give a demonstration, in order to sort of frame what we're going to see today. This demonstration is a little bit risky, so instead of doing this on my actual presentation laptop, I've got this little netbook over here. I love doing this deep vulnerability research on these netbooks because I can brick these things all day long and just not feel bad about it. And just in case something does go very bad, I've got an entire stack of backups here, so we should be prepared.

So I'm SSH'd into this netbook; that's the prompt that we see here. I'm just SSH'd in as some boring unprivileged user who can do boring unprivileged things. But one of the things a boring unprivileged user can do is run boring unprivileged programs. So I've got this simple little C program here that doesn't really do anything interesting. All it does is create a couple of 64-bit variables. It's got this sort of kludginess here that I'll fix later on; this ensures that the process is going to be scheduled on core 0 of the processor. Right now this exploit's only going
to work on core 0; I can fix that later. But then we do something very, very simple: we just sit in a loop for a couple million iterations, and all we do in this loop is write the 64-bit value out to memory, read it back in from memory into the processor registers, and then write that back out to memory again. We do that over and over and over again. So this isn't Rowhammer; this is literally just reading a 64-bit value into the processor registers for a very, very long time. After that's done, we're gonna exec a shell, hoping that maybe we caused something to happen by using this magic 64-bit number.

So I'm gonna go ahead and compile this simple C program here, and one more time before I run this, double-check whoami: I am "user" right now. Then we're going to go ahead and run this little C program, and hopefully maybe something will actually happen. So we run this, and it sits in that loop, and it didn't work. We still have that dollar sign, which means if we check whoami, I'm still "user." So of course it didn't work. You can't escalate privileges by loading a 64-bit number into the processor registers; that's kind of silly.

But maybe you can. Maybe if we find the right 64-bit number, we can do that. So I'm going to change that last 3 in the 64-bit number; it's going to become a 4 instead. We're going to retry this attack: we'll recompile the program, one more time verify that we're just "user," and run the program. And all of a sudden something changed.

So we're gonna spend the rest of the hour figuring out how this thing worked. But I've got to warn you, it's flashier than it actually seems. There is absolutely nothing special about that 64-bit value. If you run this exact same code on your computers, absolutely nothing will happen, exactly as you would expect. All that 64-bit number really was, was a signal to something running much, much deeper on the processor, something running so deep that our process couldn't see it. In fact, even the kernel running in ring 0 couldn't
see the piece of code that just gave us that root access. In fact, you could tear the system apart to the last piece and do a forensic analysis on it and not be able to see the piece of code that just gave us that root access. So what we're really gonna find out is exactly how to get something running so deeply on the processor that nothing else can see it there. What we're gonna see today is an architectural solution for ring -2 privilege escalation.

In order to understand how this works, we need a little bit of background information on the x86 privilege model. x86 divides privilege into different rings. At the top is ring 3, where we really can't do anything useful. That's where all our userland code runs, inside of ring 3, just doing boring, uninteresting things. We've got ring 2 and ring 1 below that; nobody really uses those anymore. But ring 0 is where the real magic happens, where the kernel is. That's where we can finally start doing some interesting things on the processor, and usually if you want to do something really interesting in an exploit, you try to go from ring 3 down to ring 0.

But it doesn't stop there; it goes much, much deeper than that. Over the course of the evolution of the x86 processor, we sort of found out that some things are so important that ring 0 shouldn't have access to them, so we created additional levels of privilege deeper on the processor. We invented this ring -1, which is more commonly known as simply the hypervisor. But as we found out eventually, some things are still so important that even the hypervisor shouldn't be able to access them, so we created another level on the processor, what we call ring -2, or system management mode.

System management mode started out simply enough. It was just: what if we have this mode of execution that's invisible to the operating system? Well, why would we want something like that? Originally it was pretty simple: we just wanted to be able to do power management without the operating system having
to worry about it. That was a simple goal for system management mode, but it evolved from there. Over the course of time, system management mode sort of became this dumping ground for all these miscellaneous things that we didn't want the operating system to have to worry about. And then all of a sudden came a big one: we dumped platform security into system management mode. And why not? Platform security is really, really important, and ring 0 could be compromised. If we put platform security into ring -2, all of a sudden we don't have to worry about ring 0 compromises anymore.

But this really opened up a whole Pandora's box for what system management mode can do. It's now in charge of an alarming number of very important things on the processor, like cryptographically authenticated variables, signature verifications, hardware locks, and TPM communications. It controls a platform lock box on the system, and it's actually the interface to the root of trust on the system, because system management mode alone can modify the very first instruction that the processor will ever execute when it turns on. So basically what we've seen over the evolution of SMM is that whenever we had anything that was so important that we didn't want the kernel to screw it up, or so secret that it needed to be hidden from the OS and DMA accesses, or so sensitive that it should never be touched by anyone, we just tossed that into system management mode.

So on modern processors, this is really what our privilege model looks like. At the highest level, least privileged, is just our ring 3 code, and sitting just below ring 3 is ring 0, where the kernel executes. But much deeper than ring 0 is the hypervisor, and there's a chasm that separates the hypervisor from system management mode and ring -2. So ultimately ring -2 is what's really in control of the processor. If you think that you own a system when you get down to ring 0, you're really not even close, because there are layers and layers of separation
between ring 0 and the actual processor. On modern systems, ring 0 is simply not in control. Ring -2 is what controls the hardware, the firmware, and all the most critical security checks. So if we want to do something interesting, we need to find a way to get deeper than ring 0.

But in order to figure out how to do that, we need to understand how the privilege model for SMM works. How can SMM be something invisible to the operating system? How can ring 0 not see something sitting in memory? Well, the idea behind system management mode is that it's going to execute from a special region of memory called system management RAM, or SMRAM, and that region of memory is only going to be accessible to SMM code. So the idea is that the processor will receive what's called a system management interrupt, an SMI. That's going to switch the processor over to system management mode, and that's going to unlock SMRAM. System management mode is going to execute from SMRAM, and when it's done it's going to issue the resume instruction, which is going to leave system management mode and relock SMRAM. So with this model, system management RAM is only ever visible to SMM code. That's how we hide it from ring 0.

So if you're sitting in ring 0 and you try to read from SMRAM (on the specific system I was looking at, SMRAM started at 1FF.8 megabytes), if we try to read from that address from ring 0, we're just going to get a bunch of garbage, a bunch of F's in this case. We can't actually see what's there. But how does that work? How is it possible that ring 0 can't see memory that's physically on the system? Well, that's the memory controller hub's job. Its job is to separate SMRAM from ring 0, and it's ultimately in charge of enforcing SMM security.

So this is sort of the layout that we have for SMM security. We've got the processor on the left, the memory controller hub sitting in between the processor and memory, and SMRAM sitting somewhere in memory. So if we're in SMM and we try to read
from a memory address inside of SMRAM, the memory controller hub looks at that address and sees that it's in SMRAM. It looks at what mode we're in and sees that we're in SMM, so it allows us to access that memory; it returns to us some real values. On the other hand, if we were in ring 0 and we tried to issue that exact same instruction, that address goes over to the memory controller hub, and the memory controller hub says: no, you're not in SMM, you can't access this memory, here's a bunch of F's instead. So that's sort of how SMM security is designed.

But who's to say we can't modify the memory controller hub? Why can't we just go around this? I mean, we're ring 0 code after all; we should be able to set this stuff up. It turns out that there are layers and layers of protections built around SMRAM to keep us from ever seeing what's inside of that memory, from ever modifying system management mode code. We've got ways to configure CSEG, HSEG, and TSEG, the different regions of SMRAM. We've got ways to lock SMRAM down. We've got ways to lock the locks down. We've got ways to enforce cache coherency on the processor. We've got ways of preventing remapping of the memory controller hub configuration. We have locks and locks and locks and locks and locks on this processor keeping us out of system management RAM. This thing is better protected than ring 0 is, and it is a daunting task to try to get past all of these things. Some of these locks are on the processor itself, but most of these locks exist in the memory controller hub.

As far as how we can get around these things, there's a lot of really, really cool research going on right now in how to circumvent these protections in order to dive into ring -2 code. There are ways of attacking the fringes of the memory controller hub, ways of exploiting misconfigurations in the firmware and SMM code. So if you're interested in that kind of thing, you should check out the research from LegbaCore and ITL and ATR; there's some really cool stuff going on here.

I want to present something a little bit different, because there's a way to simultaneously circumvent every single one of these protections, and it's built into the architecture itself. In order to understand how that works, we need to step back in time and look at something completely unrelated to system management mode. A while back, some 20 years ago, we had something called the local APIC. The local APIC in the x86 architecture is in charge of receiving interrupt events from things happening on the system and sending those over to the processor. It used to be that the local APIC was a physically separate chip on the chipset, and it sent things over to the processor. But that was sort of an inefficient design, so 20-some years ago Intel moved the local APIC onto the processor itself; it moved the local APIC into the actual silicon that the processor was using. That had a lot of benefits: it's way more efficient and cheaper to manufacture things this way. And it also opened up a unique opportunity. We no longer have to use I/O instructions to communicate with the local APIC; we can now use MMIO, memory-mapped I/O, for communicating with the local APIC, which means we can configure and communicate with the APIC way, way faster than we ever would have been able to before.

So with the P5 generation of microarchitectures, Intel reserved this region of memory, this four-kilobyte region starting at FEE megabytes [0xFEE00000], and they said: whenever you try to access this region of memory, instead of being sent out of the processor, that memory address is going to be accepted by the local APIC, and you're going to be accessing the APIC registers instead. So for example, let's say you wanted to quickly access APIC register 280. What you would really do is issue what looks like a memory read instruction, accessing 280 offset from FEE megabytes; you'd move that into one of your registers. So the APIC is going to catch that memory access and then give you back one
of its own registers.

This caused a problem for Intel. Intel doesn't like to break things when they release a new processor, and that makes sense, but this model broke something. Namely, there were some old legacy systems that were already using that memory range for something else. If they then drop in one of these new processors that expects that memory range to go to the APIC, that's going to break things. So they could have fixed their stuff, but instead Intel decided to fix their processors. So with the P6 generation of microarchitectures, they changed things a little bit. If you dig out the 1997 version of the Intel software developer's manual, you'll find this blurb buried in there: "The P6 family of processors permit the starting address of the APIC registers to be relocated from FEE megabytes to another physical address. This extension of the APIC architecture is provided to help resolve conflicts with the memory maps of existing systems."

So this is how that's going to work. In its default configuration, the APIC accepts accesses within this memory range. So if we were to issue a memory request within that memory range, the APIC's gonna catch that first and it's going to give us back one of its registers, in this case register 280. On the other hand, if we try to access something just one megabyte below that, something that looks similar but is not in the APIC range, the APIC's gonna look at that and be like, well, that's not in my range. So it's gonna send that out on the system bus, and you're actually gonna fetch from memory instead.

But with this new addition to the P6 generation of microarchitectures, they let us move where the APIC is. So we can issue a couple of assembly instructions and a write MSR instruction in order to change where the APIC window is located. So now I just moved the APIC so that it's located at FED megabytes instead of FEE megabytes. If I issue those same exact two instructions now, FEE goes out onto the system bus and
fetches from RAM, and FED is accepted by the APIC, so that we access an APIC register.

So it seems trivial. It's a vestigial feature to fix some things that occurred a long time ago. It's largely been forgotten; nobody really uses this anymore. In fact, software and manuals almost universally assume that the APIC is located at FEE megabytes. It's really just a forgotten patch to fix a forgotten problem on some tiny number of legacy systems some twenty years ago. But incidentally, this opens up an incredible vulnerability on an entirely unrelated piece of the processor today. We can actually use this relocatable APIC feature from 20 years ago to attack modern system management mode code, and here's how we would do that.

If we look at the standard configuration for the processor: if we're sitting in ring 0 and we try to read something out of SMRAM, that address is going to be sent out to the memory controller hub. The memory controller hub is going to say, you're not in SMM, you can't access SMRAM, and it's going to give us back a bunch of F's. We can't do anything. On the other hand, if we did that exact same thing from SMM, that address is going to be sent out to the memory controller hub, it's going to allow us to access SMRAM, and it's going to give us back an actual value.

But what if we're in ring 0 code and we move the APIC, we put the APIC so that it overlaps the SMRAM region instead? All of a sudden we've changed the view of our memory. So from ring 0, we issue that exact same instruction that we just tried to issue. Before, we got back a bunch of F's; now that address never makes it out of the processor. The APIC accepts that address now, and it gives us back one of its registers instead. That's not really interesting; it doesn't matter if we can modify our view of memory in ring 0. The interesting part is that this allows us to modify SMM's view of memory. So now if SMM tries to fetch a value from SMRAM, instead of getting what it actually expected to see,
the APIC's going to grab that memory access and give us back a register instead. So the MCH in this situation never actually receives the memory request, meaning that the primary enforcer of ring -2 security has just been removed from the picture. So through this legacy APIC-based feature, ring 0 can manipulate the MMIO range and intercept ring -2 accesses to SMRAM.

So how could we actually use this to attack ring -2 from ring 0 and try to infiltrate this most privileged realm of execution on the processor? Well, the concept is that SMRAM sort of acts as a safe haven for SMM code. It's where SMM lives: it's in SMRAM. As long as SMM code stays inside of SMRAM, we can never see it from ring 0; we can't touch it, we can't modify it in any way. But if we could get SMM to step outside of its hiding spot, if we could get it to leave SMRAM and go into normal memory, then we could hijack its execution and gain SMM privileges for ourselves.

So the first version of this attack is really pretty straightforward. The idea is to move the APIC over SMRAM and let the system switch over to system management mode. It's going to fault, because it's reading a bunch of data that it didn't expect to have there, and when it faults it's going to try to figure out how to handle that exception. It's going to look up an exception handler in the interrupt descriptor table, and that handler is under our control. Our handler is then going to execute within SMM context. So, a pretty simple idea for an attack.

Unfortunately, it fails. There's an undocumented security feature on these processors: when the processor switches over to system management mode, it zeros out the IDTR limit field, effectively disabling interrupt handling. When that happens, if you trigger an exception in system management mode, the system does what we call triple faulting; it's essentially a reset of the system. So if we try this attack from ring 0, we can triple fault the system in system management mode, but what good does it
do us to simply reset the system from ring 0? We could have done that anyway. So we've got to find a much, much more elaborate version of this attack.

So here's what we're going to do instead. We're gonna overlay that APIC MMIO range at the SMI entry point. So when the system first switches over to system management mode, the very first thing it tries to execute, we're gonna put the APIC right over that location. Then we're gonna load up our APIC with a payload that we want the processor to be executing inside of system management mode. We're going to trigger an SMI, a system management interrupt, to switch the system over to system management mode, and that's going to allow us to hijack execution when SMM begins executing our payload directly out of the APIC.

So that looks something like this. In a normal situation, when the processor receives a system management interrupt, it tries to fetch the first SMM, or system management mode, instruction from a fixed address. It sends that address out to the MCH, the MCH retrieves that instruction from SMRAM and gives it back to the processor to execute. Simple enough. But if in ring 0 we decided to move the APIC range so that it exactly overlapped the SMI entry point, we've all of a sudden taken control of which instructions the processor is going to be fetching when it switches to system management mode. So from ring 0, then, we can trigger an SMI by writing to the B2 port. That's going to transition the processor to system management mode, only now when it tries to fetch the first SMM instruction, it's going to fetch it out of the APIC registers instead of out of SMRAM. So the APIC's going to return one of its registers, and the processor is going to try to execute that as if it was an instruction. So if we could store shellcode in the APIC registers, then we could gain control of the processor within SMM.

This turns out to be really, really hard to do. The challenge is that the APIC registers have to be 4K aligned, meaning that we have
no real leeway in how we place this APIC memory window; we have to put it exactly at the SMI entry point. We've got 4096 bytes available inside of the APIC that we can use for our shellcode. Unfortunately for us, only those bytes are actually writable, and only a few bits of each of those bytes are writable. That's not a lot of control over the shellcode. Complicating matters, this is actually an invalid instruction. So SMM is going to begin executing these registers from the top, but when it hits this invalid instruction, that's what we call triple faulting; it's gonna reset the system. Our attack's over at that point. We have control over exactly 17 bits before that happens.

The blacked-out registers in this diagram are largely hardwired to zero. If we try to disassemble 00 00 00 00 as an assembly instruction, that's just an add instruction. It doesn't do anything useful, but it's also not harmful, which means the attack can continue. So this is what our shellcode ends up looking like: it's mostly just a bunch of add instructions that we have no control over, and every once in a while a couple of bits that we can actually modify, where we have to try to do something useful. Unfortunately, eventually we're going to hit this region where we can't change it and that's not a valid instruction. It's going to trigger a fault and reset the system. We need to make something useful happen before that occurs.

So these are the exact APIC registers we have control over in this region. When I say control, I mean that word very loosely, because some of these registers we can't actually write directly to; they're actually pulled from other things on the processor and on the system. So getting those registers set to the right value involves setting the system up to a very specific state, just so that those are read as the instruction that we want them to be. It turns out this is pretty hard to work with. If you tinker with these registers for a really, really long time and try to make them
into something useful, what you're eventually going to find is that the best instructions you can possibly get into these registers do nothing, but the bad instructions, the ones you don't want to use, crash the system. So you end up wasting most of your 17 bits just trying to keep the system alive, trying to keep it from resetting when it's executing out of the APIC. But if you do that just right, you can get it to not crash by the time it hits your last byte.

So that last byte is our last chance. That's the spurious interrupt vector register inside of the APIC. If you look up the documentation on this, it'll tell you that the low nibble, bits 0 through 3, is hardwired to one; the high nibble is actually writable. That means if we're trying to pick an instruction to place into that register, it's got to end in F. That gives us four bits left to try to take control of the most privileged mode of execution on the processor. So we need to figure out what instruction we could possibly put into the spurious interrupt vector to make this thing do something useful before the system crashes.

If you consult an opcode map, our choices are pretty slim. We've got prefix bytes that don't do anything, some ways of modifying the stack, a bunch of instructions that shouldn't even exist in this architecture. None of these do anything for us. But all of a sudden, right about when we're out of possible opcodes to place into that register, we hit a small miracle: an IRET instruction. That's a return from an interrupt routine. An IRET is gonna pop a return value off the top of the stack and jump to that address. That'll give us a way to escape this horrible APIC and jump to some code that we actually have reasonable control over. So by placing an IRET instruction into that spurious interrupt vector register, and then configuring the stack to support this IRET instruction, we could then remap the APIC, trigger an SMI, and take control of the processor. So this is what our APIC
payload attack looks like: we set up that IRET instruction, set up a stack, set up a payload to jump to, remap the APIC, and then trigger an SMI. This is really, really cool; I was really excited to find this. So I launched my payload at the processor, and it doesn't work. That was depressing. I spent 40 hours debugging this thing, trying to figure out why it doesn't work, and it turns out that instruction fetches actually bypass the APIC window. That's not the kind of thing they document; nobody was ever supposed to try to execute code out of the APIC registers. It turns out only data fetches actually hit that window. This attack is useless at this point. It just got a whole, whole lot harder.

So I was kind of despairing at this point. This was a real vulnerability: I could influence the view of SMRAM. But it doesn't do us any good if we can't actually take control of the system. But I tried to reason about it and started thinking: well, we can't execute from the APIC, we don't have control of the instructions, so we're gonna have to take control of SMM through data accesses alone. And suddenly that started to sound like something rather familiar. There are other common situations where we don't have control of our instructions but we do have control over data, and we try to get the system to do something useful. So maybe we can do something like APIC ROPing in order to circumvent these execution limitations.

We can sort of do that, but it turns out to be way, way more difficult than I imagined. Because if we fault the system, the system resets and the attack is over. SMRAM is invisible; we can't even see the thing we're trying to attack. 99.5% of the APIC bits are hardwired to zero; we have no control over the data that we're using in this attack. And the APIC has to be 4K aligned and it's 4K large; that makes it really, really unwieldy. So this is more like blind ROPing with this enormous, unwieldy payload of zeros. But if we're open-minded, we can maybe make this work. So that's the
memory sinkhole attack that we're going to see. Essentially, from ring 0, by moving that APIC we can sinkhole a single page of ring -2 memory, so that reads from that memory return zero, because most of those registers are wired to zero, and writes are completely lost: since those registers are hardwired, any writes are simply discarded. So that's the sinkhole effect for this SMM attack. The challenge is, how do we attack code when our only control is the ability to disable a page of memory? That's not a whole lot of control, but if we use our imagination, maybe we can find ways to do this.

But our ultimate goal needs to be to cover as many systems as we possibly can. In order to do that, we need to understand a little bit about where SMM code comes from. SMM code is actually installed by your system firmware, so if we want to make a very broad attack, we need to understand the firmware ecosystem a little bit. Firmware starts with Intel. Intel writes what they call the EFI template code; it's sort of a skeleton firmware for other people to adapt from. They give that template code to the independent BIOS vendors, the IBVs. The IBVs add their own modifications to the template code and sell it to the OEMs, things like HP and Dell. They add their own modifications to the template code and then adapt it for each of the individual models of computers that they sell. So you end up with really, really, really diverse system management mode code by the time you get down to the individual systems.

So we could try to attack that OEM code, the stuff they added, but even if you find an exploit in that code, it's really only going to affect one system. If we attack the code that the IBVs wrote, we'd do a little bit better. But if we really want a universal attack, we have to go after that template code, that code that everybody is using. It's hard to attack, but one exploit will allow you to attack almost any system out there. It just so happens that the EFI template code is actually responsible for the SMM
entry points, and that's what that entry point looks like. This is a little bit hard to decipher if you're not familiar with the SMM execution environment, but this is the entry point for SMM on nearly every modern system, because it comes from that template code.

So execution in SMM starts in what we colloquially call unreal mode, and it's executing from high memory, which, as it turns out, is a really, really complicated environment to try to write code for. So the very first thing that this SMM handler tries to do is set up some kind of reasonable execution environment for the rest of the SMM code. It's going to do that by building up segment descriptors, transitioning the processor to protected mode, and then transitioning the processor to long mode. The pieces of this that are really important for that process are these little pieces right here, so with a little work we can reverse engineer this and understand exactly how it's operating.

The very first thing this SMM code is going to try to do is set up a GDT descriptor. That's a global descriptor table, which is basically going to define the memory mappings to use in the rest of SMM. So the first thing it does for setting up this GDT descriptor is read the size of the GDT out of memory and store that into the descriptor. Then it reads the location of the GDT out of memory and stores that into the descriptor.

It's going to use a little self-modifying code here. We've got an upcoming far jump; that far jump is going to transition us from 16-bit protected mode to 32-bit protected mode. It uses self-modifying code to write out a selector into this far jump, it writes out selector 0x10 for that, and then some more self-modifying code to write out the offset that it's going to jump to for this far jump. Finally, we hit this far jump. What the far jump does is it looks at the global descriptor table, looks up its location in memory, and then accesses that global descriptor table
and then moves to offset 0x10 in the global descriptor table to figure out which segment of memory it's about to be jumping to. So segment 2 happens to be at offset 0x10 inside of this table. It fetches information about that segment and then jumps to this offset within that segment. So that's how execution is supposed to work.

If we look at that code in a little bit more depth, what you're going to find is that all of that information that it loaded out of memory is being read from one large structure called the DSC structure. The template SMM handler uses this single structure for storing all of its most critical environment information. Things like the global descriptor table information, the segment selector information, and the memory mappings that it's going to use inside of system management mode are all stored inside of this one structure. So sinkholing this structure, which has all the most important information for SMM, would be devastating for the processor. So I figured, let's see what happens when we do that.

So from ring 0 we try to sinkhole that DSC structure by moving the APIC over its location and then triggering an SMI to transition the system to system management mode, and we can watch what exactly happens on the system. We don't have control over the processor anymore at that point. It switches to system management mode and then it's out of our hands. Any exception is going to reset the system, but maybe, maybe it'll do something that'll let us hijack execution before it resets.

So these are the pieces of memory out of that same region that we just looked at. These are now the memory accesses that are going to be sinkholed by our attack. Each of these is going to read a 0 out of the APIC registers instead of what it intended to read out of SMRAM. So once again it's going to try to set up a global descriptor table descriptor in order to define its memory mappings, and the very first thing it's going to try to do is load up the size of that global descriptor
table. But it accesses the memory sinkhole, reading a zero for the size. That would kill our attack right there: if you have a zero size for your descriptor table, that means you have no memory mappings on the system, and it's going to triple fault and reset. But we've got this miraculous instruction in here that just happens to save us. They decrement that size, so instead of being zero it becomes 0xFFFF, the largest possible descriptor table size we could possibly have, and then they store that into the global descriptor table register.

The next thing they need to figure out is where the global descriptor table is located. Again they read that out of the sinkhole and then write that into this GDT descriptor, and again it's miscalculated, now as being at address zero. So we just successfully tainted the entire global descriptor table descriptor that they constructed here. That's going to help us a lot in some upcoming code.

This little chunk is not especially relevant. They end up doing some more self-modifying code for an upcoming instruction. Because they're reading that information out of the sinkhole, they end up corrupting that upcoming instruction, so if we get this far, if we get down here past this jump, the system is going to triple fault because of this corruption and reset, and our attack will be over. Fortunately, we never make it that far.

So once again they do some self-modifying code. They write out that selector 0x10 into the upcoming far jump. They try to set up the offset used for that far jump; that's going to be miscalculated again because of the sinkhole, so we managed to taint that far jump value as well. So when it hits the far jump now, it's going to try to figure out: where's the global descriptor table located, what memory mappings am I using, what segment of memory am I going to? So it's going to fetch that global descriptor table from address 0. Address 0 is now outside of SMRAM; that's something we have control over from ring 0. So we've successfully got it to fetch some data that we
can actually influence at this point. So it's going to look up a global descriptor table at address 0, it's going to move to offset 0x10 inside of that table, and read a segment selector out of that offset. The segment selector now, since this is in a GDT that we control, is also a selector we control. That means we control which memory mappings it's going to be using. The memory segment it's about to jump to is now under our control. It's going to add its offset to that memory segment base, and it's going to jump to that region. So that far jump caused SMM to load a protected mode memory mapping from a GDT under our control, and if we were to preemptively configure a malicious global descriptor table and place it right at address zero, then we could control exactly where this jump is about to go. We could force it to jump outside of SMRAM to code that we control, to hijack the processor in ring -2.

So this is what our payload looks like in this situation. It seems a little bit complex, but after preprocessing it boils down to just these instructions: flushing the cache, setting up a descriptor entry in low memory for it to use, moving the APIC registers to sinkhole that DSC structure, and then simply looping in place, waiting for a periodic SMI to transition the system to system management mode so that we can take control of the processor. We fire that at the processor, we wait, and it works this time around.

So I was extremely ecstatic to see that this thing worked, because with eight lines of code we've now successfully exploited a huge number of features on this processor. We exploit hardware remappings, descriptor cache configurations, and instruction and data cache properties. The processor is going to go through four execution modes and four different memory models before it actually hits that far jump that we tainted and jumps to code under our control, so that we can successfully infiltrate the most privileged mode of execution on the x86 processor.
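
The descriptor arithmetic described in this part of the talk can be traced with a small model. What follows is only a sketch under stated assumptions, not the template handler or the speaker's payload: it touches no hardware, `sinkhole_read`, `simulate_far_jump`, and `low_memory` are hypothetical names, the segment base 0x00100000 is an arbitrary illustrative value, and the jump offset is taken as 0 because the talk does not give the tainted value. It shows the three effects the speaker lists: a zero size decremented to 0xFFFF, a table base of zero, and selector 0x10 resolving to the entry at byte offset 0x10.

```python
import struct


def sinkhole_read():
    """Model a read from the sinkholed page: the value comes back as zero."""
    return 0


def encode_descriptor(base, limit, access, flags):
    """Pack a legacy 8-byte x86 segment descriptor."""
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,                                # limit 15:0
        base & 0xFFFF,                                 # base 15:0
        (base >> 16) & 0xFF,                           # base 23:16
        access & 0xFF,                                 # access byte
        ((flags & 0xF) << 4) | ((limit >> 16) & 0xF),  # flags, limit 19:16
        (base >> 24) & 0xFF,                           # base 31:24
    )


def descriptor_base(entry):
    """Extract the 32-bit segment base from an 8-byte descriptor."""
    _, base_lo, base_mid, _, _, base_hi = struct.unpack("<HHBBBB", entry)
    return base_lo | (base_mid << 16) | (base_hi << 24)


def simulate_far_jump(low_memory, selector, offset):
    """Follow the handler's steps with every DSC read replaced by zero."""
    gdt_limit = (sinkhole_read() - 1) & 0xFFFF  # 0 decremented wraps to 0xFFFF
    gdt_base = sinkhole_read()                  # table is now "located" at address 0
    entry_offset = selector & ~0x7              # drop RPL/TI bits, leaving index * 8
    if entry_offset + 7 > gdt_limit:
        raise ValueError("selector is outside the descriptor table limit")
    start = gdt_base + entry_offset
    entry = bytes(low_memory[start:start + 8])
    target = (descriptor_base(entry) + offset) & 0xFFFFFFFF
    return gdt_limit, gdt_base, target


# Hypothetical setup: a flat 32-bit code segment based at 0x00100000,
# placed at offset 0x10 of a pretend low-memory buffer.
low_memory = bytearray(0x100)
low_memory[0x10:0x18] = encode_descriptor(0x00100000, 0xFFFFF, 0x9A, 0xC)

limit, base, target = simulate_far_jump(low_memory, selector=0x10, offset=0)
print(hex(limit), hex(base), hex(target))
```

In this model the jump target is simply the base stored in whatever descriptor sits at offset 0x10 of low memory, which is why control over address 0 translates into control over where the handler lands.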
And the coolest thing about this whole thing is that that template SMI handler had no vulnerability in it. There's absolutely nothing wrong with that code; we managed to attack it through this flaw in the x86 architecture. And like I mentioned earlier, this opens up an entirely new class of exploits against ring -2. We can start looking for other things that we could apply this sinkhole against.

So for example, this is the firmware that I pulled off of the MacBook that I've got up here for this demonstration. This is how the Mac handles interrupts in system management mode. They have this table of function pointers over here, then they're going to use this call instruction to call into one of these function pointers. If you were to sinkhole this table of function pointers, that call instruction is going to fetch a 0 as the address that it's supposed to branch to, forcing SMM to jump outside of SMRAM to where we can hijack it.

Alternatively, the SMM stack is another fantastic place to apply the sinkhole. All SMM code is going to try to set up a stack to use when it's executing inside of system management mode. So if we were to apply the sinkhole over the stack, then every call instruction inside of SMM is going to try to push a return address onto the stack; that return address is going to get consumed by the sinkhole and just completely lost. But when it issues a return instruction, it's going to try to pop a return address off of the stack, it's going to read a zero out of the sinkhole, and branch to address 0, under our control. So there are just a tremendous number of things we can apply this to. It's really whatever you can imagine.

So the question then is what do we actually do with this capability? Okay, we've got ring -2 control; how do we use that? Well, we can unlock hardware that wasn't available to us in ring 0. We can disable the cryptographic checks on the system. We could brick the system if we wanted to. Since system management mode is in charge of
thermoregulation on many systems, we could just halt and catch fire. We can open the lock box on the system, or we could just install a really, really nasty rootkit. So that's the route I took. The normal demonstration for when you gain system management mode access on a system is to simply brick the system, and it's a really cool demonstration, but it's not necessarily the most practical attack. I thought it'd be way, way more interesting to actually install a system management mode rootkit using the memory sinkhole attack. So that's what we do here: we deploy the rootkit through the memory sinkhole.

Once the rootkit has control in system management mode, that means it can preempt the hypervisor, it can periodically intercept ring 0 code, it can filter ring 0 I/O accesses, it can modify any memory on that system, and it can escalate process privilege levels. And it can do all of this completely invisibly to the operating system, to ring 0, to antivirus technology, and to the hypervisor. Our rootkit at this point, once it's placed in SMRAM, is entirely undetectable on that system.

So for this demonstration I hacked together some code created by this guy Cr4sh, who made a really, really cool, robust SMM rootkit, so I adapted some of his code for this demonstration. What we do is signal that rootkit with some magic 64-bit number in userland, and when the SMM rootkit sees that magic 64-bit number, it knows that some process is secretly requesting to be escalated in its privileges. So the SMM rootkit then parses that process's page tables, locates the process credentials, and gives it root access to the system.

So if we revisit the attack that we illustrated early on, this is the way it really worked. We have this attack driver code. The attack driver is in charge of using the memory sinkhole exploit in order to actually install a rootkit onto the processor. So the attack driver is actually not too complex. It simply installs a hook into low memory, that's what SMM is accidentally going to
branch to, copies the actual rootkit that we want to use into memory for later reference, and then it has some inline assembly here to actually apply the sinkhole attack. So in this specific situation I was tinkering with using the sinkhole attack against the SMM stack, so we sinkhole their stack, forcing them to branch to address zero, where our hook will take control and install the rootkit into system management RAM.

Our rootkit itself, like I said, adapted from Cr4sh's source code, is also not especially complicated. All it's going to be doing is reading a couple of register values out of what's called the state save map. Basically, it's going to look at what process it interrupted, and if it sees this magic 64-bit value in those registers, then it knows that a process is secretly requesting privilege escalation, so it calls the privilege escalation function for that process. The privilege escalation function parses that process's page tables, locates its credentials, and assigns it root privileges here.

So going back to the actual escalation attack: when we run the escalation attack, all it's really doing is loading a couple of 32-bit values into the registers to signal to the rootkit running on the system. Now, if we remember to actually install the rootkit: when we run the sinkhole script, it'll use that driver to install our rootkit into system management mode, where it can no longer be seen, so that now, when we run our escalate process, we surreptitiously gain access to root.

So the impact of this is fairly large. We simultaneously just circumvented every single ring -2 protection in place on this processor, and the mitigations don't look great. Could we do a firmware update to fix this kind of thing, maybe move the APIC back to the proper location inside of the system management mode code? You can try that. The problem is, by the time you're able to do that, you have probably already been sinkholed by the attack, so it doesn't really work very well.
Could you do a microcode update? Probably not. This is too deeply ingrained in the processor; it's not something you can easily reconfigure. Really, this is unpatchable. The only real mitigation to this attack is to build new processors. Unfortunately for me, that's exactly what Intel did. Somebody at Intel discovered this shortly before I did, and this problem is fixed on their latest generations of processors. So starting with Sandy Bridge and Atom 2013 processors, what they have are some undocumented internal checks against what are called the system management range registers when you try to relocate the APIC, and if you try to put the APIC over SMRAM, they're going to block that from happening. The good news from an attacker's perspective is that this still requires the SMRRs to be properly configured, which an alarming number of systems are still not doing.

As far as AMD goes, I haven't had enough time to thoroughly research AMD. I'm still analyzing that now, and I'll keep people posted on what I find. What I can tell you for now is that AMD doesn't have these SMRR registers; they never needed them for the same reason that Intel did. So this fix on Intel processors won't work on AMD processors. On top of that, AMD tends to document their stuff really, really well, which is actually incredibly helpful from an attacker's point of view. So AMD actually has, buried in their developer manuals, a blurb stating that the APIC window takes precedence over the SMRAM window, meaning that the APIC attack should work on AMD processors. I don't know for certain, but I suspect these are very vulnerable to this attack.

But these mitigations only work if every single one of these other security mechanisms is properly configured, and even when that's the case, there are hundreds of millions of processors out there right now on which this can't be fixed. So I'm working on getting a CVE number for this. If you're interested in Intel's perspective on the problem and what they're doing to
mitigate the issue, you can go to intel.com/security and see their write-up right now. So I coordinated this disclosure with Intel, and I've got to tell you, they've been fantastic to work with. They were really interested in keeping their processors secure. They instantly started working on mitigations wherever it was architecturally feasible. They're working on their most recent vulnerable processors and working their way backwards. It will take a little bit of time for their updates to trickle down, so in the meantime, just be careful.

So looking forward to the future of this attack: as far as I know, this is actually only the second architectural privilege escalation vulnerability on the x86 processor, and that's behind the original SMM cache poisoning attack discovered by Duflot and Invisible Things a couple of years back. So there have been exploits against the chipset, configuration vulnerabilities, and the firmware and software. You've even had DRAM corruption errors. But the processor itself has actually been surprisingly resilient to attack throughout its long history. But x86 is an immensely complex architecture with 40 years of evolution behind it, so I think what we're going to keep on finding is that there are a multitude of pieces that are individually secure but maybe collectively vulnerable, as was the case here, where the SMM was secure and the APIC was secure, but combining those two things introduced an interesting vulnerability. So I really think we're just beginning to scratch the surface in terms of this style of attack.

So a lot of people went into helping this work, either directly or indirectly, so I wanted to make sure to acknowledge my coworker Scott Lee, a bunch of fantastic researchers in this field, the teams at Intel, and Cr4sh, who created that SMM rootkit, which I was able to deploy with this attack. So if you're interested in looking at some proof-of-concept code for this in a little bit more depth, you can check out my GitHub page. You can also follow me on
Twitter for more updates on this attack as I do more research. It's xoreaxeaxeax, or you can email me at the same address. And I've got a few minutes left, so I wanted to diverge a little bit and talk about a side project. Since I'm at Black Hat and I'm talking about weird x86 things, I also thought it'd be a good opportunity to release another weird x86 project I've been working on.

So if you saw me speak at REcon a couple of months ago, I talked about this weird x86 fact, that the mov instruction is itself Turing complete. What that meant from our perspective is that you could write entire programs using only the mov instruction, which is really cool. As a proof of concept for that, I made this compiler that compiled code from this really, really awful language called Brainfuck into only mov instructions, so we could actually run mov-only programs, which was neat. But writing code like this isn't the most practical thing in the world, so I promised an alteration to that compiler. So today, in the next hour, if you check out the GitHub page, I'll post what I think is the world's first single-instruction C compiler.

So it's got a little shell script there. After you download the source code, it'll help you build everything, and then it'll help you actually run through this to see how the single-instruction C compiler actually works. So I call this thing the movfuscator. When you do a git clone, you'll get this check script. It's going to download an open source implementation of the AES algorithm, then it's going to compile it with the movfuscator and dump out the assembly results. So what you see is just an absolutely monstrous number of unconditional mov instructions used to implement this C program. And sometime today this is actually going to finish dumping out, and when it does, we'll be able to actually see this program run, and run exactly as you would expect it to run. So this was sort of an exercise or a thought experiment in anti-reverse-engineering or
code obfuscation, but mostly I just thought it was funny. So when it's done, it'll actually run this program, and it'll do all of AES using only mov instructions. So I think it's a really cool, kind of neat tool. I would love to get some feedback on it, so I'd love it if people were interested in checking that out.

So if you're interested in weird x86 things, I'll be talking a little bit more about that at DEF CON. But more importantly, at DEF CON I wanted to illustrate another side project I've been working on. It's a way of manipulating control flow graphs in assembly code. By manipulating your control flow graphs very carefully, I found that you could do really groundbreaking, revolutionary things, like take a selfie in IDA. So I'll release some source code for doing that; check that out at DEF CON. If you're more interested in practical things, the memory sinkhole is practical, and you can find that source on GitHub as soon as I post it in about an hour.

But I appreciate everyone's time here today. I think I'm out of time for questions, although I'm not sure about that, but if anyone wanted to talk to me about this offline, I'd love to answer any questions or discuss this further. So thank you again.
Info
Channel: Black Hat
Views: 118,522
Rating: 4.9625182 out of 5
Keywords: Black Hat USA 2015, Information Security, InfoSec, BlackHat, Black Hat
Id: lR0nh-TdpVg
Length: 46min 33sec (2793 seconds)
Published: Tue Dec 29 2015