GOD MODE UNLOCKED - Hardware Backdoors in x86 CPUs

Reddit Comments

This guy again. Every time I see his face I know I'm in for a ride.

👍︎︎ 318 👤︎︎ u/[deleted] 📅︎︎ Aug 29 2018 🗫︎ replies

I loved the lecture and hearing about all the work that went into this hack. But the short version, should anyone not care to watch the whole thing, is that on certain VIA C3 x86 CPUs there is an actual, shit-you-not undocumented instruction that enables an undocumented execution core which in turn gives you complete access to the system from user space, including read/write access to ring 0.

On some CPUs he tested the attack requires you to first set a particular MSR bit, which can only happen in ring 0 anyway, but on others he found that this bit was already set.
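
As a rough illustration of what checking that bit could look like on Linux, here is a minimal Python sketch. It assumes the register the talk calls MSR 1107 is hexadecimal 0x1107 and that the relevant bit is bit 0, as the speaker describes; the constant and function names are made up for this example. It only reads the register, through the Linux msr driver (/dev/cpu/N/msr), which needs root and the msr kernel module.

```python
import os
import struct

# Assumptions for illustration: "MSR 1107" from the talk read as hex,
# and bit 0 as the enable bit the speaker describes.
GCR_MSR = 0x1107
GOD_MODE_BIT = 0


def is_bit_set(value: int, bit: int) -> bool:
    """Return True if the given bit of a 64-bit MSR value is 1."""
    return (value >> bit) & 1 == 1


def read_msr(address: int, cpu: int = 0) -> int:
    """Read one 64-bit MSR via the Linux msr driver.

    The driver exposes each CPU as /dev/cpu/<n>/msr; the file offset is
    the MSR address and every read returns 8 bytes. Requires root.
    """
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, address)
    finally:
        os.close(fd)
    # MSR contents are returned as a little-endian unsigned 64-bit value.
    return struct.unpack("<Q", raw)[0]
```

Run as root with the msr module loaded, `is_bit_set(read_msr(GCR_MSR), GOD_MODE_BIT)` would report whether the bit is already on; on a CPU that does not implement this MSR the read is expected to fail with an I/O error.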

Either way, it's absolutely insane that this is a thing, but at least this particular vulnerability doesn't seem to apply to Intel or AMD processors. Still, the fact that a major company like VIA would build a backdoor directly into their processors should make you wonder about Intel and AMD too (if you didn't already), and black boxes in general.

I'll just leave this here.

👍︎︎ 636 👤︎︎ u/ReturningTarzan 📅︎︎ Aug 29 2018 🗫︎ replies

Black hat wearing dark wizard

👍︎︎ 58 👤︎︎ u/vegetablestew 📅︎︎ Aug 29 2018 🗫︎ replies

You'd think an event this big could afford to record audio properly.

👍︎︎ 129 👤︎︎ u/nikomo 📅︎︎ Aug 29 2018 🗫︎ replies

Hey I used to work in the same office as this guy!

I was participating in a CTF and he was there. I got to watch him RE some stuff.

Dude is insane. The code he was REing was obfuscated to wit's end. He solved the challenge in about 10 hours.

👍︎︎ 73 👤︎︎ u/leroy_hoffenfeffer 📅︎︎ Aug 29 2018 🗫︎ replies

So I got a question. How do these security researchers earn a living? His company is here, apparently they do security/hacking for IoT devices. They say they serve the Fortune 50 and the U.S. intelligence community but I'm just trying to figure out what exactly they would sell these companies and spies? Also, is that what all the security researchers do?

👍︎︎ 51 👤︎︎ u/foreheadteeth 📅︎︎ Aug 29 2018 🗫︎ replies

I think the most interesting parts of this were all the tools he developed for blackbox testing hardware. It seems like you could use these tools to look for undocumented instructions on any processor. Particularly impressive to work out the ISA to the extent of building an assembler for it, all completely blackbox.

👍︎︎ 14 👤︎︎ u/blauster 📅︎︎ Aug 29 2018 🗫︎ replies

The audio quality hurts my delicate ears

👍︎︎ 11 👤︎︎ u/cold12 📅︎︎ Aug 29 2018 🗫︎ replies

Ah, that topic again. Move on, nothing to see here.

This is about a specific CPU, the VIA C3 (ancient), and the behaviour described is in fact in the manual, so no backdoor whatsoever.

Here is the datasheet, check page 82. http://datasheets.chipdb.org/VIA/Nehemiah/VIA%20C3%20Nehemiah%20Datasheet%20R113.pdf

👍︎︎ 107 👤︎︎ u/twi6 📅︎︎ Aug 29 2018 🗫︎ replies
Captions
All right, thank you, Stephanie. Welcome, everyone. I'm here to talk to you today about something that I didn't really think was possible at first, but it's going to be interesting. The idea of backdoors is thrown around a lot today, to the point that it's largely lost all meaning. But what I'm going to talk to you about today is not the Management Engine, it's not the Platform Security Processor, it's none of the things that people are normally so concerned about. It's something that we never really saw coming, and I think something a lot more interesting.

But before we begin, like any good research, I'll start off with a disclaimer: I did all this research on my own, in my own time. I was an independent consultant, and none of this reflects in any way the beliefs or actions of my current employer. With that, my name is Christopher Domas. I'm a cybersecurity researcher. I've tinkered with a lot of different things over the years, but for the last couple of years what I've been interested in is low-level processor exploitation and vulnerability research.

So let's start off with a demo of what I mean by that and the kinds of things we can unlock, and specifically let's look at what we're going to explore today in this presentation. I am logged into a system, just a regular system, unmodified, running a default OS configuration. I'm logged in as a non-privileged user named delta. I'm going to open up a .c file called demo.c, and demo.c is a very simple file. All we do is load an address into the EAX register, then we've got a label, and then we've got all of these bound instructions. The x86 bound instruction is not a very common instruction, so you might not be familiar with it, but the idea behind it is that it will take an address and a second address, and see if the first address is within the bounds specified by the second address. Now you'll notice this bound instruction has a rather unusual set of
second addresses associated with it. These basically look like random numbers, and in fact this processor does not have access to the memory at the address that's being specified. And like anything else in x86, if you don't have access to the memory that you're trying to use, you will get a general protection exception or, in Linux, a segmentation fault. So despite the fact that we know all of these instructions are going to cause segmentation faults, at the end we're still going to try to launch a shell and see if anything happens. Let's give this a try: we'll compile this little program, we will execute it, and sure enough, like we expected, we get a segmentation fault and our user hasn't changed. So nothing terribly interesting here.

But if I go back into this program, I'm going to make one tiny little change. I'm going to add one x86 instruction. It's an instruction that's so obscure and unknown it doesn't actually have a name. In fact, it's not supposed to exist; I have to write this in machine code. 0f 3f is the one instruction I'm going to add to the beginning of my executable. When I execute this instruction, the fundamental nature of all the subsequent bound instructions is going to change, and what I'm going to be able to do is use those instructions to reach directly into the kernel, bypassing all the processor's security mechanisms, in order to give myself root access on this system.

So this kind of thing is not supposed to exist, and the rest of this presentation is going to be a long, convoluted journey of seeing how I came across this feature. The whole thing begins with the idea of rings of privilege. In the beginning, thirty years ago, in x86 there was no concept of separation of privileges on the processor. Basically any code running on the processor had the same permissions as any other code running on the processor, and things were basically chaos. There was
nothing stopping Minesweeper from having the exact same privileges as the kernel, and that's not a good situation to be in. So thirty years ago they implemented the idea of separation of privileges, different rings of execution on x86 processors, and it looked something like this: only some code would have complete, unfettered access to the entire system, unrestricted access to the system's hardware, and that was the kernel code, which would live inside the most privileged ring, ring zero. Outside of ring zero, slightly less privileged code would live in ring one, less privileged than that in ring two, and our least privileged code, where we would throw all of our user code, would live in ring three. That fundamental idea of separation is why we can have some sort of confidence that our Minesweeper game is not also harvesting credentials from my banking account sitting in another process, because in order for ring three code to do anything of importance, it has to go through very, very strict, careful hardware security checks in order to ask ring zero to do something for it. So that's the fundamental basis of all security in x86 processors today.

But we started digging deeper. This ring model wasn't well suited for running multiple operating systems on one processor; we needed something more privileged than ring zero in order to handle that. So we invented the hypervisor, and since it was more privileged than ring zero, colloquially we called that ring minus one. But there were some things we didn't want the hypervisor to do. We threw all those things into System Management Mode, and since that was more privileged than the hypervisor, we called that ring minus two. Then a couple of years ago some researchers came along and said, hey, there's this entirely different processor sitting on the platform that can actually do things that the x86 processor can't do, so we started calling that ring minus three. It's just sort of getting ridiculous at this point. But if you've been following
this research as it has expanded over the last twenty years, in the back of your head you're probably thinking: can we go further? How deep does this rabbit hole go? That's the question I set out to answer when I went down this path.

When I'm beginning research on something really big and unknown, I've found a good place to start is sometimes with patents, because sometimes you can find information in patents that you can't find in any other documentation. So given this privilege model, these rings of privilege in x86, imagine my surprise when I was sifting through patents and I saw this little blurb just nonchalantly buried in the middle of a patent on a completely different idea. It said: "Additionally, accessing some of the internal control registers can enable the user to bypass security mechanisms, for example, allowing ring 0 access at ring 3." My head kind of exploded when I saw this. All of our security on x86 is based around this idea of rings of privilege, and this little blurb is telling me there may be some way to circumvent all of that in one fell swoop. They go on to say: "In addition, these control registers may reveal information that the processor designers wish to keep proprietary." Well, that's kind of understandable; if I had some circumvention for all the privilege mechanisms on the processor, I'd probably want to keep that proprietary too. But then they go on to say: "For these reasons, the various x86 processor manufacturers have not publicly documented any description of the address or function of some of the control MSRs." So that makes sense, but it means we're probably dealing with something undocumented that we don't have a lot of access to.

So I did what any rational person would do in this situation: I went out and bought fifty-seven computers to start doing some research on, to see if I could dig into this idea a little bit further. I had some idea, based on the patent owner and the patent
time frame, of what processor I might be trying to look at here. But patents are a funny thing: the intellectual property gets bought by different entities, and ideas trickle through the industry in weird ways, so I wanted to cast a wide net to try to analyze this idea of a ring circumvention mechanism. Eventually what I settled on was a system with a VIA C3 processor. VIA is one of the three major x86 manufacturers, and the C3 is a processor that they had a while back. These were specifically targeted at embedded systems. They're marketed towards point-of-sale, kiosks, ATMs, gaming (since we're in Vegas, you might want to start poking around after this), digital signage, healthcare, digital media, industrial automation, and of course you can still find them in PCs and laptops. So this is the system I eventually pulled off my shelf for this research and what I'm going to talk about for the rest of this presentation. This is a thin client with a C3 Nehemiah core inside of it. I'll talk later on about how this issue might affect other processors, but for now that's the system we're going to be focused on.

Now, I was unable to find a developer manual for this processor. That would have been a really useful starting point, but even the patents hinted at the idea that a lot of this stuff is not going to be documented, so we have to find some other path forward. What I did was try to follow a trail of patent breadcrumbs: read different patents that might be related to one another in order to piece together as much information about this backdoor as I can. This quote isn't actually from one of the patents I ended up using; this was just another patent that I stumbled across along the way. But in order to give you some idea of what kind of things you're dealing with when you're reading this patent literature, I wanted to quickly give you an
example of some patent-speak. This says: "Figure 3 shows an embodiment of a cache memory. Referring to figure 3, in one embodiment, cache memory 3xx is a multi-way cache memory. In another embodiment, cache memory 3xx comprises multiple physical sections. In one embodiment, cache memory 3xx is logically divided into multiple sections. In one embodiment, cache memory 3xx includes four cache ways, i.e., cache way 310, cache way 311, cache way 312, and cache way 314. In one embodiment, a processor sequesters one or more cache ways to store or to execute processor microcode." This is the most convoluted legalese I've ever had to sift through, and it is just so incredibly tedious trying to derive any information from patent literature. You can imagine how awful it is to try to sift through a hundred pages of this kind of text. Just to put that in perspective, that one four-page patent had the phrase "in one embodiment" a hundred and forty-two times. Trying to do research this way is absolutely painful, but if you're persistent it can start to pay off.

After a lot of work, I narrowed in on these six patents that seemed to have some loosely related ideas that might be able to point me in the right direction for circumventing the ring privilege model in x86. Some of the key ideas from these patents that I narrowed in on: it looked like, at the time, VIA was embedding a non-x86 core alongside their x86 cores in the C3 processor. This non-x86 core was a RISC-type architecture, and the patents didn't have a consistent term for it, but I started calling it the deeply embedded core, the DEC. They also talked about something called a global configuration register. It was a register that would be exposed to the x86 core through a model-specific register, and the patents suggested that this global configuration register could activate the RISC core. They also talked about what was called a launch instruction. It would be a new instruction added to the x86
instruction set architecture, where once the RISC core was activated you could use the launch instruction to start a sequence of RISC instructions.

So what does that leave us? Well, this is all wild speculation at this point, piecing together little pieces of different patents. But if our assumptions about this deeply embedded core are correct, it could be used as a sort of backdoor, a means of surreptitiously circumventing all of the processor's security checks. That's obviously something worth diving into a lot further.

So let's start at the beginning: how would we go about activating this circumvention mechanism? There are three little tidbits from the various patents that I was able to pick up on that might point us in the right direction. We've got one patent that says a model-specific register can be used to circumvent processor security checks. We've got another patent that says a model-specific register can be used to activate a new instruction in x86. And we've got a third patent that suggests that a launch instruction can be used to switch to a RISC instruction sequence. If you put these three things together and fill in the gaps, it sounds like there's some model-specific register bit that enables a new x86 instruction that activates a RISC core that can bypass the processor's security checks.

Let's start with the first step in that chain, this model-specific register. Just a little bit of background for people not familiar with the idea of MSRs on x86: MSRs are 64-bit control registers, and they are really, really varied. They're used for things like debugging, performance monitoring, cache configuration, and feature configuration; basically anything not directly related to computation can be tucked into the MSRs on x86. Unlike the registers you might be more familiar with, like EAX or EDX, MSRs are accessed by address, not by name, and the MSR addresses go from 0 to 4 billion. So the way you
access a specific MSR is you load its address into the ECX register, and then you use the rdmsr and wrmsr instructions in order to access the contents of that MSR. So theoretically one of these MSRs will allow us to eventually bypass security mechanisms on the processor. But there's a saving grace here: you can only access MSRs from ring 0 code. So even though we might be able to enable a backdoor on the processor, it should require one-time ring 0, kernel-level access in order to activate. Or maybe not; we'll revisit this part later. But just so that we can move the research forward, for now let's assume that we have one-time ring 0 access to tinker with the MSRs in order to get the rest of this chain active, and we'll revisit that limitation later on.

Going back to that original patent I talked about, they do mention that the various x86 processor manufacturers have not publicly documented any description of the address or function of some of the control MSRs. That leaves us in a bit of a conundrum: if we don't have any documentation for which MSR we're looking at, how are we going to find the one that activates these secretive features? So step one, it seems, is to figure out which MSRs are implemented by the processor. Ignoring any documentation or anything else like that, let's figure out which MSRs actually exist on the processor. This one's pretty easy to solve. In a kernel module, you set your general protection exception handler to be a specific function under your control; you can do that with the lidt instruction to reconfigure that exception handler. Then you load an MSR address into the ECX register. Let's say I wanted to figure out whether MSR 1337 exists on this processor: I'll load 1337 into the ECX register, then I'm going to try to read that model-specific register, and if I don't get a fault, that means the MSR exists. Whether or not the documentation says that the MSR exists, it must
exist if I don't get a fault. On the other hand, if my exception handler gets control, that means that MSR does not exist. So this is a really, really simple way to iterate through all the possible MSRs and figure out which exist on the processor and which don't.

When I ran this little algorithm on my target C3 processor, what I found was that it had 1,300 implemented model-specific registers. That is way more than would be typical on an x86 processor, and it threw a wrench in this whole process. That's too many MSRs to analyze. I think one bit in one of these MSRs is going to activate this new x86 instruction, but I can't sift through 1,300 MSRs on my own. So the next question is: which of these MSRs are actually unique? Which could really be the ones that I'm looking for? What are the interesting ones that I should be focused on?

I came up with this idea for a timing side-channel attack on the processor, where basically what I would do is calculate the access times for all four billion possible MSRs. What that looks like is: we have a rdmsr instruction, and on either side of that rdmsr instruction we have some serialized rdtsc (read time stamp counter) instructions, and that lets me measure exactly how much time it takes to read any given model-specific register. What that ends up looking like is this. On the x-axis here I've got my four billion possible MSRs on this platform; on the y-axis is the amount of time it takes to access each of those MSRs. In green I'm showing the MSRs that actually exist on the processor; in red are the unimplemented MSRs. This gives some really, really interesting insights into the processor that would normally be totally off-limits to us. We can actually peer under the microcode and understand what's happening with these various MSRs using this timing side channel.

So I want to throw out an observation here, which is that functionally different MSRs
should have different access times. For example, accessing the time stamp counter MSR should take a different amount of time than accessing a thermal sensor MSR, and that's because each of those MSRs has different microcode implementing it, and that microcode is going to take a different amount of time to execute. So I would expect the access times for each MSR to be different if the MSRs are very different. On the other hand, functionally equivalent MSRs should have approximately the same access time. For example, there's something called the MTRRs, the memory type range registers, in x86. It's a set of MSRs that control caching behavior for different regions of memory. What I would expect is that MTRR0 would have roughly the same access time as MTRR1, because even though they control different regions of memory, they're functionally equivalent MSRs. What that means is that this timing attack gives me a way to differentiate between like and unlike MSRs, and I'm going to define like MSRs as adjacent MSRs with equal or functionally related access times.

Then specifically, I want to throw out this hypothesis that the global configuration register, this model-specific register hinted at in these patents, is probably unique. It doesn't make sense to have multiple functionally equivalent versions of the GCR. This thing should have a bit that activates a new x86 instruction that I can use for circumventing protections, and it doesn't make a lot of sense to have several of these different registers all doing the same thing. So when I look at my timing graph, I can begin to pick out the functionally unique MSRs in it: they're the ones that are separate from the other MSRs on the system. When I begin to identify the functionally unique MSRs, I'm able to pare down that original list of MSRs a lot more. Using the side channel, I found 43 unique model-specific registers from the 1,300 that are actually implemented on the
processor. That's really exciting for me; it means my research can move forward. 43 sounds like a much more tractable number of model-specific registers to analyze than 1,300. The problem is that 43 MSRs is still 2,752 bits to check, at 64 bits apiece. That is a lot of bits to sift through. I want to find one bit that activates this launch instruction, this new x86 instruction, but that's too many bits for me to go through by hand, especially when you consider the magnitude of the x86 instruction set.

Theoretically, one of these bits enables a new x86 instruction. Well, if we look at how many possible x86 instructions there can be, it turns out there are a lot. x86 is a really, really complicated architecture, and an upper bound on the number of possible x86 instructions would be something like 1.3 undecillion instructions. So I'm looking for a single new instruction amongst 1.3 undecillion possible instructions. Even if I take a really, really generous estimate and say that I can scan one billion possible instructions a second, we can do a quick Fermi calculation: 1.3 undecillion, divided by a billion, divided by 60 seconds in a minute, divided by 60 minutes in an hour, divided by 24 hours in a day, divided by 365 days a year, means scanning for a new instruction in x86 is going to take approximately one eternity. I don't have time to sit around waiting to find this new instruction, especially when I've got to do a scan for every bit, so I'm looking at about 2,700 eternities trying to find which bit activates this new x86 instruction.

I was really fortunate in this case, because I actually looked at this exact problem last year and developed a tool called sandsifter. What sandsifter does is find an intelligent way to scan through the x86 instruction set: it uses page fault analysis and a depth-first search algorithm to quickly find all the x86 instructions of interest on a processor. I still can't run sandsifter 2,700
different times, since it takes about a day to scan a processor, but this at least opens up some opportunity for moving forward. What I can do is look at each of those 2,700 candidate MSR bits and try to toggle each of them one by one, not doing an instruction scan between each one, just toggling each of these bits one at a time. Of course, I'm going to run into some problems. These are configuration bits that control the deepest workings of the processor, and I have no idea what they do, so a lot of them are going to lock the processor, freeze the OS, panic the kernel, or just reset the system entirely. So this is still not really doable by hand, but we can automate the process.

I developed a simple setup where I would have a target system, the VIA C3 processor, with a wire soldered onto its power switch; that wire is connected to a relay, and that relay is connected to a master system. The target system boots from the master over the network, and the master system can SSH into the target and assign it jobs. Basically, it can SSH into the target and start toggling its potential MSR bits one by one, and the master will detect when the target is frozen or panicked or locked and use that relay to power-cycle the target. This way I can automate the process of trying to toggle each of these 2,700 MSR bits one by one, and see how many I can get on before the system becomes unstable. Through hundreds of automated reboots (this thing ran for about a week) I was able to identify exactly which of those bits could be toggled without the system having any sort of visible side effects.

With that done, I would go in and toggle all the stable MSR bits that I could possibly access, and then run the sandsifter scanning tool to see if any new instructions had appeared on that x86 processor. That looks something like this. So sandsifter is scanning the system for
new instructions using its page fault analysis and depth-first search. I started taking this video in the middle of a search, but if you let this scan for about a day, what you'll eventually find is that sandsifter will spit out some new information for us. There we go: after about a day of scanning, sandsifter finds exactly one new instruction in x86 that was not supposed to be there. It finds 0f 3f. Judging from the patent literature, this is what they're calling the launch instruction; this is the new x86 instruction enabled by some bit in the global configuration register. With GDB and a little trial and error, I was able to figure out that this launch instruction is effectively a jump EAX instruction: it just jumps to whatever address is in the EAX register.

So now I want to know which of those bits that I activated was really responsible for enabling this launch instruction. Fortunately, now that I know what the instruction is, I no longer have to run complete sandsifter scans in order to test my theories. What I can do is activate a candidate MSR bit and see if 0f 3f exists on the processor. If it doesn't, that wasn't the correct MSR bit, but if it does, then I've found the bit that I'm after. Using this approach, I was able to determine pretty quickly that MSR number 1107 on the processor activates this launch instruction, so 1107 must be what the patents are calling the global configuration register, the register that will unlock this new functionality.

Going further, what I suspect at this point is that by unlocking this other RISC core on the processor, I will be able to use that core to circumvent all of the security checks built into x86, and that really opens up some phenomenal opportunities in exploitation. Because of that power, I called that specific bit in MSR 1107 the god mode bit. Basically, bit 0 of that register was the one ultimately responsible for enabling this new x86 instruction. So with this I've figured out the god
mode bit, I've figured out what the launch instruction is, and now the question is: how do I actually execute instructions on this RISC core? We can dive into the patents a little bit more to try to speculate on how this might work. The patents include some interesting figures for this dual-instruction-set pipeline, and what they suggest is that some time after fetching an instruction, that instruction might be sent to the x86 core or, if the RISC core has been activated, it might be sent to the RISC core. I went through a lot of trial and error and a lot of different models for exactly how this would work, and I ended up settling on something like this. What's essentially going to happen is an instruction is going to be fetched from the instruction cache. It's going to go to some x86 pre-decoder that's basically going to break that instruction apart into its basic components. Then those components are going to pass a check: the processor is going to check whether the RISC core has been activated with that launch instruction or not. If it has not, it's going to pass those components over to the full x86 decoder, and those are going to go through the x86 pipeline. If it has, it's going to break off one of these components, send it over to the RISC decoder, and execute that as a RISC instruction instead. In other words, with this setup there is some x86 instruction where, if the processor is in RISC mode, it can pass a portion of itself over to the RISC processor. And since this x86 instruction essentially joins the two cores, the x86 and the RISC core, I started calling it the bridge instruction.

But we don't know what this instruction is yet; I have no idea how to actually feed instructions to this RISC core that I've activated. To find the bridge instruction, it should be sufficient to detect that a RISC instruction has been executed. But that's easier said than done. I have no idea what this RISC core looks like. I don't know what instruction execution on
that core should look like. So how are we going to detect if we've successfully executed a RISC instruction? Well, there is one easy way. The theory here is that if this core actually does provide a way to get around the processor's security checks, then there should be some RISC instruction that, when executed in ring 3, would corrupt the system. That would give us something really easy to detect: if the processor locks, or the kernel panics, or the system resets, that would be a sure sign that we executed a RISC instruction, because none of those things should be able to happen if you're just executing ring 3 x86 instructions. So if we detect any of that behavior, we know that we've found a corrupting RISC instruction and therefore found the x86 bridge instruction as well.

To explore this, I tore apart the heart of sandsifter and repurposed it for a brute-force fuzzing of the processor. What sandsifter is going to do now is execute that launch instruction before every x86 instruction that it generates, and what sandsifter is trying to do is generate a processor lock. You'll see that we just did that after fuzzing the processor for a little bit. When you observe that the processor is locked, when from ring 3 we've locked the processor, that indicates that we've found the bridge instruction we're looking for: we've found a way to send commands to that RISC core embedded alongside the x86 core. It takes about a full hour of fuzzing to find this, but what sandsifter uncovered was that a bound EAX instruction in x86 is able to send instructions over to the RISC core after that core has been activated. Specifically, bound EAX has a 32-bit constant value in it, and it's this 32-bit constant value that forms the actual RISC instruction that will be sent to that deeply embedded core that might be able to bypass processor security checks.

So we're getting further. We've figured out how to execute instructions on
the DEC: I use this bridge instruction to send it a RISC instruction. So the next question is, what do I actually want to execute? What do these instructions look like? What architecture am I even dealing with? If I want to use this to actually circumvent security checks, I need to answer these questions. The obvious first approach to dealing with this unknown architecture is to just assume that it's some common architecture. There's really no need to reinvent the wheel when adding a new core to your x86 processor, so maybe this thing is some common architecture that we've seen before. To test that, we can try generating simple instructions in some common architecture like ARM, PowerPC, or MIPS. I might generate an instruction like "add 1 to r0" for ARM, try executing that on the deeply embedded core, and see what happens. The challenge here is that, without knowing anything about that deeply embedded core, there's no obvious way for me to check the results of an instruction. Sure, I can generate some simple RISC instruction and try to execute it, but I don't know what I'm looking for after it executes. The good thing is that there is still a simple way to rule out architectures. What I was observing is that a lot of the RISC instructions I tried to execute would actually lock the processor. So if I generated a really simple instruction like "add 1 to r0" for some RISC architecture, tried to execute that on the DEC, and the processor locked up, that's a pretty good sign that that wasn't the architecture I was expecting. I can rule out that this is ARM if it's choking on simple ARM instructions. Unfortunately, when I took this approach, I pretty quickly ruled out 30 very common RISC architectures for that deeply embedded core. Now, I still think that this thing is probably based on some common architecture, since there'd be no reason to spin something up from scratch, but it's probably heavily enough modified that I
couldn't recognize it. So that means I've got to deal with this deeply embedded core as a black box: I've got to treat this as an unknown architecture and figure out how to actually communicate with it. To do that, we've got to reverse-engineer the instruction format for that deeply embedded core. I spent enough time on this that I made up my own name for it: I started calling it the deeply embedded instruction set, the DEIS. So what might this DEIS look like, and how might we begin to reverse-engineer it? One idea would be to execute a RISC instruction and try to observe its results. Like I said, the challenge with that is we don't know what the RISC ISA looks like, so there's not necessarily an easy way to observe the results of an instruction. But as I started reading more into the patent literature, I saw hints that this RISC core and the x86 core had a partially shared register file, which means we should be able to observe the results of at least some of these RISC instructions from the x86 core. This is a diagram from the patent literature showing an ARM core and how some of its registers are shared with the x86 core on the processor. What that means is we could do something like this, and this is just a rough summary. We could generate a processor state: generate some buffers in kernel memory, generate some buffers in user memory, generate a state for all the processor registers, and load that state onto the processor. After we've loaded this state, we'll execute the launch instruction followed by an x86 bridge instruction. The bridge instruction will feed a RISC instruction over to the deeply embedded core and let that instruction execute. After that instruction executes on the deeply embedded core, we're going to read the state off of the processor: we'll read out all the registers, we'll read out all the buffers that we set up,
and then we'll see if anything's changed between that input and output state, to see if we can start deciphering what effect that randomly generated instruction had. Nothing's ever simple with this, though, and another challenge came up. I'm dealing with an unknown instruction set with unfettered access to ring 0, and I'm just randomly generating these RISC instructions to see what happens. It turns out it's really, really easy to generate an instruction that causes a kernel panic, a processor lock, or a system reboot. In practice, I could only execute about 20 random RISC instructions before the system became unrecoverably corrupted and had to be rebooted. Even after optimization, the fastest I could get one of these systems to boot was about two minutes, so some quick calculations suggested that gathering enough data to reverse-engineer the entire instruction set at this rate was going to take months and months. So I decided to expand the original system that I had built for fuzzing this. What we see down here is that instead of one target, I've now got seven of these thin-client targets. If you look closely, you can see a little green wire coming out of the chassis of each of these, all hooked up to this relay up here. The relay is hooked up to a master system that can use it to power cycle each individual target. The targets network boot through the master, and once each system has booted, the master will SSH into the target and give it a fuzzing task. The master records the results of the fuzzing into a database as they occur, and when it detects that one of these targets has become corrupted, if the kernel has panicked, the system's frozen, or what have you, it will use the relays to power cycle that specific frozen target. So we can see what this thing looks like in action. I'm going to start a fuzzing job on the master. It's going to generate some fuzzing tasks for each of the targets, and once it's done that, it'll start
powering up each of these targets one by one. If you watch the lights on the relay, and if you listen carefully you can hear the relays clicking, you'll see that each of these targets is powering up one by one. If you watch the little green LEDs on the targets, you'll see them coming up. It'll take about two minutes for those targets to completely come up, but once they do, we can actually start to see some fuzzing results. What we're seeing on the master system now is that every once in a while a state diff scrolls by on the screen. The master is recording each of those diffs as they occur, and my hope is that I'll be able to analyze those diffs offline later in order to detect what changes each instruction is making. But of course, every once in a while one of these instructions will lock the system, and you can see the relay blinking as the master resets frozen target systems. This thing ran for about three weeks. I collected 15 gigabytes of logs, about 2.3 million different state diffs, across four thousand hours of compute time, in order to gather enough information to reverse-engineer this unknown instruction set. The very first thing I wanted to figure out after I collected this giant set of logs was: is any of my theory correct? Am I actually able to circumvent ring protections through this embedded RISC core on the processor? I was really excited when I found this first instruction, A7719563, an instruction that I executed in ring 3, sending it over to the deeply embedded core through the bridge instruction. We can see that EDX got loaded with a new value here: EDX became 80050033. That is the value of control register 0. CR0 is supposed to be a register that is only accessible to ring 0, but we just read it in ring 3 using the deeply embedded core. And we're not limited to leaking data from ring
0. We can actually write data that only ring 0 should have access to. For example, only ring 0 should be able to modify the debug registers in x86, but here we can see that this DEC instruction, this deeply embedded instruction, was able to write the value of EBP into the DR0 register. So this is a really good sign; things are looking promising for using this backdoor for privilege escalation. Really, once we've started tearing down the boundaries between rings, once we can reach directly into the kernel, we can do whatever we want. But I wanted to come up with some simple, interesting proof-of-concept payload to demonstrate the capabilities of this deeply embedded core, and I thought an easy proof of concept would be a privilege escalation payload. The privilege escalation payload would look something like this. The very first thing we want to do is read the global descriptor table register. In x86, the global descriptor table register points us to the global descriptor table, and one of the entries in the global descriptor table is going to be for the FS segment register. Theoretically, if we can circumvent ring protections, we can reach directly into kernel memory and pull out that segment descriptor from the global descriptor table. There are some bit fields in that segment descriptor that will give us a pointer to our process's task structure. And again, if we are able to reach into kernel memory and pull out information, we can grab a pointer to what's called the cred structure, which sits within this task structure. The cred structure holds my process's permissions. So with that credentials structure, what I'm going to do is give myself root permissions. Root in Linux is basically defined by zero, so I'm going to use my ring circumvention techniques to write the value for root to the UID, the GID, the EUID,
and the EGID in that credentials structure. At least, that's what our payload theoretically looks like. The next question is: can we actually build this thing using the deeply embedded core? Now, if you look carefully at the payload, there are really only a few places where we actually bypass the ring boundaries, where we actually reach into ring 0 and start modifying things. Those are really the only places where I need this deeply embedded core to do my work for me. But it is kind of fun to write instructions for this secret core on the processor and make it do interesting things, so I thought it would be a little more interesting to write the entire payload in instructions for the deeply embedded core, as opposed to just the kernel accesses that I needed. So, to build that payload: I'm in a situation where I've got 15 gigabytes of logs, and I've basically got to start sifting through them for primitives. This actually starts to feel a lot like building a ROP chain, where conceptually I know what I'm trying to accomplish, but I need to figure out how to use my individual pieces to accomplish it. So we can start looking through those logs for some things to help us out. For example, I want to find a GDT read instruction for the DEC. Sure enough, I can find that A313 seems to read the global descriptor table register into the EBX register. I need a kernel read primitive, some way to read kernel memory from the deeply embedded core. Sure enough, I can find an instruction that does that: D407 read a single byte out of kernel memory and into a processor register. I need a kernel write primitive, and I can find that E2B7 does exactly that: it wrote the low byte of ECX into a kernel memory buffer. So this is promising, but sifting through these logs like this just doesn't scale very well. I really wanted to be able to write robust payloads that do a lot of different things for this deeply
embedded core, so I needed some way to automate this approach that wouldn't require me to manually go through 15 gigabytes of logs. I started looking for a way to reverse-engineer this deeply embedded instruction set in an automated fashion. My idea was that if we could extract instruction behavior from the set of state diffs, I could start to identify bit patterns and structure in these RISC instructions. So I built a tool for this that I called the collector, and what it's going to do is help us automatically reverse-engineer this unknown instruction set. The collector starts out by looking at state diffs and identifying basic patterns. For example, it'll look for words being copied around in the state diff. It'll look for immediate values being loaded into registers, and for one register being transferred to another. It'll look for memory writes, memory reads, increments, decrements, shifts, and immediate loads, and for various arithmetic and bitwise instructions, just by looking at patterns in these state differential records. What it gives you is a classification of instructions based on the patterns that it observed. For example, here it's telling me: these are all the instructions I found for you that transferred information from one register to another. Now, the very first thing the collector is going to do after it generates these instruction bins is try to resolve what the individual bits mean in each instruction. Through some manual analysis, I was able to figure out, for example, that the EAX register is encoded by the bits 0000 and the ECX register is encoded by the bits 0001. So the first thing the collector does is check, for each of these instructions, based on the register change that I'm observing, whether it can find those bit patterns somewhere in the instruction. It'll pull out patterns like this. This is basically
everywhere in this entire set of data that those individual registers could have been encoded in their corresponding instructions. What you'll see is that there are patterns here: there's only one location in this set of instructions where the pattern is consistent. So the collector is able to infer that it has to be these two columns encoding your registers, one column encoding the source and one column encoding the destination. We can use that same technique to pull out other bit patterns. For example, here we're trying to resolve what opcodes are being used to encode these different instructions, and you'll see that it's not perfect. Ideally, we would see all these instructions with exactly the same opcode pattern, and that's not what we see: some of the later ones don't really follow the patterns that we saw in some of the earlier ones. But that's okay, the collector will deal with that. It'll just try to pick out the patterns that are the most probable or the most common. It will even do things like try to pick out don't-care bits, or bits that have some unusual statistical properties that might indicate that they do something even though we don't know what. Then it'll jam all this information together in order to automatically derive a bit encoding for that specific instruction. Based on the bins that the collector was finding, these are the encodings that I came up with and the basic instructions I had to work with for the deeply embedded core. I've got instructions to load the global descriptor table, and I've got instructions to move data around, load immediate values, read data out of memory, and write data to memory. I've got all the basic primitives I need now to implement that payload. So at this point I decided to go all out, and I wrote a complete assembler for this custom assembly language, the DEIS assembler, which
basically means that I can write instructions in one of their primitive forms like this, at a high level, and then get them encoded into a binary format. I can then wrap each of those instructions in the x86 bridge instruction and send it over to the RISC core for execution. So, looking at our payload again in DEIS assembly, it looks something like this. We grab the global descriptor table register. We calculate an offset to that FS descriptor. We use our ring circumvention instructions to pull the descriptor directly out of kernel memory and parse its bit patterns. At that point we've got a pointer to the task struct, so we'll pull apart that pointer in order to get a pointer to the cred struct, and we'll pull apart that pointer in order to start writing out the UID, the GID, the EUID, and the EGID fields, all in this custom deeply embedded instruction set for that alternate core on the processor. Collectively, when we actually build that into a working payload, it looks something like this. We've got our introductory instructions, which are going to be the launch instruction to activate the deeply embedded core, and then we've got each of those assembled DEIS instructions wrapped in the bridge instruction so that I can feed it over to that embedded core. Finally, when we're done, we launch a shell to try to get our privilege escalation. So, one more time, revisiting that demo from the beginning, now that we understand how all the different pieces work. We begin by unlocking the backdoor by executing that secret x86 launch instruction that was enabled by the god mode bit. After we've executed the launch instruction, the processor is ready to start accepting RISC instructions, but only if we get those instructions over to the RISC core by wrapping them in the x86 bridge instruction. So that bound EAX instruction is how we send these individual RISC instructions over to that deeply
embedded core, to get it to execute exactly what we want and make it circumvent the processor security checks that we're trying to get around. Finally, we'll execute a shell in order to view our results. So once again, we simply compile this program and check who we are: we're still a regular user. We'll launch this program and see that we've become root. Now, I'm only going to half-seriously throw this out, but this is sort of like a ring -4 on the processor. Ring -3 was a core that was completely separate from the x86 processor. This is a little bit more deeply embedded: it's sort of a secret, co-located core alongside the x86 core. Unlike ring -3, it has unrestricted access to the x86 core's register file, and it's got a shared execution pipeline with the x86 core, which in many ways makes this more powerful than ring -3 was. At the same time, the whole thing's nebulous this deep, and the whole ring model has sort of completely shattered by this point, but it is an interesting thing to think about. So that leaves us with this: direct ring 3 to ring 0 hardware privilege escalation on x86. This has never been done before. Every other attempt at this kind of thing has used operating system code or other program code in order to exploit some flaw. This is purely a hardware circumvention mechanism for the fundamental ring privilege model behind all of our processors. Fortunately, it shouldn't be that big of a deal: we still need ring 0 access to get this whole thing kicked off. The whole rest of the pipeline can be done in ring 3, but the first change, toggling the god mode bit, had to be done with ring 0 access. At least, that's the theory. As I started poking around some of the other systems I had in my stockpile, that turned out to not be the case. What we're looking at here is another VIA C3; this is a Samuel 2 core. I've just booted up the system from scratch; nothing else has been modified, nothing else has been touched
on it. We're going to log in as an unprivileged user. I could have dumped all this over serial, but I didn't have the right setup for that. After we log in to the system, what we're going to do is install the msr kernel module in order to gain access to the MSRs. We're not going to modify them, but I am going to read them out. I don't have the rdmsr tool on this system, so I'm just going to hex dump a specific MSR. I'm looking at the global configuration register on this processor, and when I do that, you'll see that the low byte of the global configuration register is D7, which is 11010111. The low bit of this register is the god mode bit. That means the god mode bit is active by default on a freshly booted Samuel 2 core, which means that any unprivileged code on this system could have access to the kernel at any time, which is a scary prospect. When you blow away the x86 ring protection model entirely, all of our other protections fall apart. Antivirus does nothing now. ASLR and DEP are easily circumvented when you can just reach directly into ring 0. Code signing, control flow integrity, and kernel integrity checks don't do anything when there's no more separation between the rings. So it's a scary prospect, but there are mitigations. One approach would be to update the microcode to lock down the god mode bit, or we could update the microcode to disable the bridge instruction so that we couldn't feed instructions to the deeply embedded core. Alternatively, we could update the OS and firmware to disable the god mode bit and periodically check its status to make sure it hasn't been enabled. But at the end of the day, this is an older processor, and it's not in widespread use. I don't mean to throw VIA under the bus: I think this was a good idea that just had a flaw in its implementation. Their target market was embedded, and this was probably a useful feature for their customers. So instead, we should take this as a case study. Backdoors do exist in hardware, but we can
find them with the right techniques. Looking forward, I do think this is a big deal. This isn't just a C3 problem, and this isn't just an x86 problem. This is an overall problem in all of computer engineering: we have no way to introspect these black boxes that we're trusting for all of our computation. So whether or not these hardware backdoors exist anywhere else, this is a problem that's going to continue to haunt us until we have a way to look into these black boxes. I think that's the big takeaway here: whenever we find something secretive or off-limits, we as security researchers need to push deeper, because that's how we establish trust. That's how cybersecurity should work. Along those lines, I've open-sourced all this information. All the tools, techniques, code, and data from this are available online now. You can scan anything, scan everything, and I really, really hope people will build off of this for future research and dive deeper and deeper into processor security. I do want to quickly pitch a side project, Project Nightshift. I'll talk more about this in a few days, but it turns out that you can use that side channel attack that I demonstrated in order to reveal password-protected registers in some x86 processors, so I'll be demonstrating that at DEF CON if you're around. Beyond that, you can find all this work right now on my GitHub, that's github.com/xoreaxeaxeax. Project Rosenbridge is this specific project. You can also find the sandsifter fuzzer there, the movfuscator single-instruction C compiler is there, and some other fun stuff I've done over the years is all there. If you have any feedback or ideas on this, I would absolutely love to hear it. I'm totally out of time now, but you can grab me after the talk, shoot me an email, or contact me on Twitter at @xoreaxeaxeax, or the same thing at gmail.com. So thank you everybody for attending, and that's all I got.
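The state-diff and bit-field ideas described in the talk can be illustrated with a small sketch. This is not the speaker's collector tool, only a rough approximation of the technique under stated assumptions: `diff_state`, `field_offsets`, and `encode` are hypothetical names, the 32-bit instruction layout in `encode` is invented for the example, and only the EAX (0000) and ECX (0001) codes and the EDX value 80050033 come from the talk.

```python
from typing import Dict, List, Tuple

# EAX = 0000 and ECX = 0001 are the encodings stated in the talk.
# The EDX and EBX codes below are assumptions made up for this example.
REG_CODE: Dict[str, int] = {"eax": 0b0000, "ecx": 0b0001, "edx": 0b0010, "ebx": 0b0011}


def diff_state(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {register: (old, new)} for every register whose value changed."""
    return {reg: (before[reg], after[reg]) for reg in before if before[reg] != after[reg]}


def field_offsets(observations: List[Tuple[int, str]], width: int = 4, bits: int = 32) -> List[int]:
    """Return every bit offset (LSB = 0) at which each instruction word
    contains the code of the register observed for that instruction."""
    mask = (1 << width) - 1
    return [
        offset
        for offset in range(bits - width + 1)
        if all((word >> offset) & mask == REG_CODE[reg] for word, reg in observations)
    ]


def encode(dst: str, src: str) -> int:
    """Hypothetical 32-bit format: fixed upper bits, dst in bits 4-7, src in bits 0-3."""
    return 0xA5C39600 | (REG_CODE[dst] << 4) | REG_CODE[src]


# One state diff: only EDX changed (the value is the CR0 leak quoted in the talk).
before = {"eax": 1, "ecx": 2, "edx": 0}
after = {"eax": 1, "ecx": 2, "edx": 0x80050033}
changed = diff_state(before, after)

# Register-to-register transfers, recorded as (dst, src) pairs seen in the diffs.
moves = [("eax", "ecx"), ("ecx", "edx"), ("ebx", "eax"), ("edx", "ebx")]
src_offsets = field_offsets([(encode(d, s), s) for d, s in moves])
dst_offsets = field_offsets([(encode(d, s), d) for d, s in moves])

print(changed)                   # registers that changed between input and output state
print(src_offsets, dst_offsets)  # candidate bit positions for the source and destination fields
```

With enough varied observations, only one offset per field stays consistent across every instruction, which is the inference the talk describes as "these two columns" encoding source and destination. Real data would be noisier, which is presumably why the talk mentions picking the most probable pattern rather than requiring an exact match.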
Info
Channel: Black Hat
Views: 261,903
Rating: 4.902452 out of 5
Keywords: Black Hat, Black Hat USA, Black Hat USA 2018, BHUSA, Black Hat 2018, BlackHat, Black Hat Briefings
Id: _eSAF_qT_FY
Length: 50min 59sec (3059 seconds)
Published: Tue Aug 28 2018