DEF CON 24 - Ulf Frisk - Direct Memory Attack the Kernel

>>Hi, so let's get this show on the road. Please settle down, take a seat, and please give a warm welcome to a first-time DEF CON speaker, Ulf. [applause]
>>[laughter] So thank you everyone for coming and listening to me. Today we are going to direct memory attack the kernel. My name is Ulf Frisk, and helping me with the demos today I have [inaudible]vist. Today we are going to totally own the Linux, Windows and OS X kernels by DMA code injection. We're going to dump memory at speeds in excess of 150 megabytes per second, and we're going to both pull and push files from and to the target system. We're going to execute code and spawn a system shell. After this talk I will be open sourcing the software making all this possible, and since we are talking about a hardware-based attack you'll also need hardware, which is already available for purchase online for less than $100. But first, a little bit about myself. My name is Ulf Frisk. I'm working as a penetration tester, primarily with online banking security. I'm employed in the financial sector in Stockholm, Sweden. I have a Master of Science in Computer Science and Engineering, and most recently I've been taking a special interest in low-level Windows programming and DMA. This has been a bit of a learning-by-doing project on my part, learning more about 64-bit assembly and operating system kernels. Actually, in order to be able to do this talk I had to put up this slide: I need to point out that this talk is given by me as an individual. My employer is not involved in any way whatsoever with what I'm doing here today. But I'm here today to present PCILeech. PCILeech is the combination of the PLX Technologies USB3380 development board with a custom firmware and custom software. On the image here you see this development board in the mini PCI Express form factor.
To the left you see the PCI Express side, the side that goes into the target computer, or the victim computer if you want to call it that. The USB3380 is able to send both DMA reads and writes into the target system's main memory. To the right you see a USB 3 connector, which allows us to connect this board to a controlling computer, and once connected, this controlling computer is able to transfer memory at very high speeds over USB 3 straight into the memory of the target computer. What's very nice about this hardware is that it requires no drivers at all on the target computer; it just works. It's hardware only. And with this piece of hardware I'm able to get well over 150 megabytes per second DMA transfer speeds. Unfortunately this chip is only capable of 32-bit addressing, and that means that you're only able to access the lower 4 gigabytes of memory with this card. As we will see later on, that's not really a problem in practice. Actually, the USB3380 has been presented here at DEF CON before. It was presented two years ago as the NSA Playset SLOTSCREAMER device by Joe Fitzpatrick and Miles Crabill. So I really want to thank Joe for bringing this really nice piece of hardware to my attention, so thank you very much, Joe. If I compare PCILeech to SLOTSCREAMER, it's exactly the same hardware, but it's a different firmware and different software. This also means that if you already have a SLOTSCREAMER device you should be able to reflash it and try this software on it. It's faster: the SLOTSCREAMER was able to achieve around 3 MB per second, something like that, and the PCILeech device is able to achieve well over 150 MB per second DMA transfer speeds. PCILeech is also capable of kernel implants; in fact it relies heavily on kernel implants. But what makes all this possible is, of course, PCI Express.
PCI Express is a high-speed serial expansion bus, or it's not really a bus since it's point-to-point communication, but anyway, it's packet based. To the upper right you see a schematic of PCI Express. You have the PCI Express root complex anchored within the CPU chip. From this root complex you have several serial lanes that you can connect several PCI Express endpoints to. You can also connect PCI switches and bridges, so you can say that PCI Express forms a small device network within a computer. Depending on how much bandwidth the device needs, it can consume between one and sixteen serial lanes; a graphics card that needs lots of bandwidth typically consumes sixteen lanes. PCI Express is designed to be hot pluggable, and it comes in many form factors and variations. It comes in the standard form factor that you know from standard graphics cards and similar things. It comes in the mini PCI Express form factor, as you saw on a previous slide. It comes as ExpressCard, which goes into laptops, and Thunderbolt also encapsulates PCI Express. And what's nice about PCI Express from our point of view is that it's DMA capable, and that means it's circumventing the CPU, so the PCI Express endpoints can read and write memory directly. But what's direct memory access and how does it work? When you have a CPU core, it usually executes code in something called a virtual address space, and you have a memory management unit built into the CPU which uses page tables in order to translate these virtual addresses into physical addresses. It actually translates pages, and a page is typically 4 KB long; it can be larger as well, but in most cases it's 4 KB. And PCI Express devices have traditionally been able to access all physical memory directly, without any limitations whatsoever.
But CPUs nowadays do have something called an IOMMU, which works in a similar way to the memory management unit of the CPU, and this allows for virtualization of device addresses as well. So in theory, operating systems should be able to protect themselves fully against DMA attacks if the IOMMU is fully used. But as we will see later on, that's not really the case. Actually, this is the complete firmware of the PCILeech device. It's a whopping 46 bytes in total. The first two bytes are a header, or actually the first byte is the header: 5A 00 tells the chip to just load data from the configuration data into the configuration registers at power on. Next we have the length, which is in little endian, so 2A is 42 bytes of configuration data. Then we have the USB controller register; we need it to enable the USB 3 port on this board because it's disabled by default. First you have the address of the register, the 2310 here, and then you have a DWORD, or 4 bytes or 32 bits, which is programmed into that register at power on. And this enables the USB 3 port. Then we set the PCI Express vendor ID and product ID to a Broadcom SD card, and this is pretty much a leftover from the SLOTSCREAMER software I started to toy around with. And then in green here we enable the four DMA endpoints, which are capable of high-speed DMA transfers between USB 3 and PCI Express. We set the first endpoint to a write endpoint, which allows us to write memory from USB into the main memory of the target computer at high speeds. Then we set the following three endpoints to read endpoints. The reason why we set three endpoints to read endpoints is that reading is much more common than writing, and we can get a little more transfer speed out of this chip if we're doing multi-threaded access. And last, we set the USB vendor ID and product ID to Google Glass. The reason why I'm doing this is that I wrote this program for Windows.
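Editor's sketch: the firmware layout described above (a two-byte header, a two-byte little-endian length, then register address and DWORD pairs) can be illustrated with a small parser. This is a minimal sketch under stated assumptions, not the actual PCILeech firmware: the talk does not spell out the byte order of the register address, so a 16-bit little-endian address is assumed here, and every register address and value in `build_example_blob` is a made-up placeholder. Only the overall sizes (4 + 42 = 46 bytes, seven 6-byte entries) follow from the numbers given in the talk.

```python
import struct

ENTRY_FORMAT = "<HI"  # assumed: 16-bit register address, then 32-bit DWORD value
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 6 bytes


def build_example_blob():
    """Build a 46-byte image with seven placeholder (address, value) entries."""
    # Placeholder addresses/values only; these are NOT real USB3380 registers.
    entries = [(i + 1, 0x11111111 * (i + 1)) for i in range(7)]
    payload = b"".join(struct.pack(ENTRY_FORMAT, a, v) for a, v in entries)
    # Header 5A 00, then the payload length as a little-endian 16-bit value.
    return b"\x5a\x00" + struct.pack("<H", len(payload)) + payload


def parse_config_blob(blob):
    """Split a configuration image into its header and (address, value) pairs."""
    header = blob[0:2]
    (length,) = struct.unpack_from("<H", blob, 2)
    payload = blob[4:4 + length]
    if len(payload) != length or length % ENTRY_SIZE != 0:
        raise ValueError("truncated or misaligned payload")
    entries = [
        struct.unpack_from(ENTRY_FORMAT, payload, off)
        for off in range(0, length, ENTRY_SIZE)
    ]
    return header, entries


if __name__ == "__main__":
    header, entries = parse_config_blob(build_example_blob())
    print(header.hex(), len(entries))
    for address, value in entries:
        print(f"register 0x{address:04x} <- 0x{value:08x}")
```

Seven entries match the register writes listed in the talk: one USB controller register, one PCI Express ID pair, four DMA endpoints, and one USB ID pair.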
Windows has a very nice user-mode USB stack called WinUSB, but in order to activate it for certain hardware you need to sign a small configuration file with a driver signing certificate. And those are kind of expensive, so I didn't want to purchase one. It was much easier to find a device out there that already uses WinUSB and lie about being that device. But let's get into the kernels. Most computers today have more than 4 gigs of memory. If you are able to get a kernel module into a system, it should be able to access all memory, and also be able to execute code. So what we can do is search for kernel structures, code signatures, whatever, in lower memory using DMA, then patch that code and hijack the execution flow of the kernel that way. And when we are doing this we need to keep in mind that PCI Express DMA works with physical addresses, while kernel code runs in a virtual address space. I divided exploitation into three stages: first you have stage one, which is pretty much just a hook; then we have stage two, which is the stager for the final stage three kernel module implant. We start by trying to locate the kernel, or a driver, or whatever in kernel space that we can target. Usually at the end of the kernel itself or a driver there's some free space in the last page, because it's usually not completely filled, and that page is already executable, so we put our stage two code in there, which is around 500 bytes. Then we search for a function to hook, and once we find that function we overwrite it with a call into the stage two code which is already written into the kernel. When a thread starts executing the hooked function, it immediately jumps to the stage two code, and the very first thing the stage two code does is restore the stage one code to its original state. Then we check if we're the first thread running here, since we might be running in a multi-threaded environment.
And if we're not the first thread running here, it immediately jumps back to the now unhooked stage one function and resumes the normal execution flow for that kernel thread. But if we are the first thread running here, we locate the base of the kernel, and we need that in order to look up some function pointers that we are going to need later on. For example, we need those function pointers in order to allocate two pages of memory. The first page we use as a buffer; the PCILeech main control program running on the other computer can use DMA on this buffer in order to communicate with the kernel module that we are going to insert. The second page is for the kernel module, the stage three code itself. Then we write a small stage three stub into the second page, and this is pretty much just a tight loop. And then we create a new kernel thread in that loop. At the very end of stage two we write the physical address of the buffer we allocated into the place where the stage two code is located. The PCILeech main control program is polling this location all the time with DMA, and once it receives the physical address, it writes the complete stage three contents to the address it received. Then the loop, which is already executing in the kernel thread, senses that the complete stage three contents have been written, so it exits and starts by setting up a DMA buffer, around 4 to 16 megabytes big, in lower memory. And then it starts looping, waiting for commands. The commands are pretty much: read memory, write memory, execute code, or exit. But let's start by attacking Linux. The Linux kernel is located in low physical memory. If kernel address space layout randomization is not enabled, it's located at 16 megabytes in physical memory. If KASLR is enabled, it slides in 2-megabyte chunks.
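Editor's sketch: the placement rule just described (16 MB without KASLR, a slide in 2 MB chunks with it) keeps the search space small. The snippet below only enumerates candidate physical base addresses inside the 4 GB window the hardware can reach natively. It assumes, purely for illustration, that the slide only moves the kernel upward from 16 MB; the names `candidate_kernel_bases` and `looks_like_kernel` are hypothetical and do not come from PCILeech.

```python
MB = 1024 * 1024
GB = 1024 * MB


def candidate_kernel_bases(start=16 * MB, limit=4 * GB, step=2 * MB):
    """Return every 2 MB-aligned candidate base from 16 MB up to the 4 GB limit."""
    # Assumption: the KASLR slide only moves the kernel upward from 16 MB.
    return range(start, limit, step)


def first_match(bases, looks_like_kernel):
    """Return the first candidate accepted by the caller-supplied predicate."""
    # looks_like_kernel is a hypothetical callback that would inspect the
    # memory read at that address; here it is just a plain function.
    for base in bases:
        if looks_like_kernel(base):
            return base
    return None


if __name__ == "__main__":
    bases = candidate_kernel_bases()
    print(len(bases), hex(bases[0]), hex(bases[-1]))
    # Simulated check: pretend the kernel slid by three 2 MB chunks.
    print(hex(first_match(bases, lambda b: b == 16 * MB + 3 * 2 * MB)))
```

With a 2 MB step there are only about two thousand candidates below 4 GB, which is why a signature search over DMA is practical.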
So once we find the kernel we search for a function, a random function to hook. In my code I chose [inaudible], since it gets called pretty often and it works fine. And then I search for a function called kallsyms_lookup_name. This is pretty much the equivalent of GetProcAddress in Windows. It allows me to send a kernel symbol name to that function, and it looks up the function pointer, or the symbol address, for that name. Then we write the stage 2 code and write the stage 1 code, then we wait for the stage 2 code to return with the physical address of stage 3. We write the complete stage 3 code, and then it's demo time. In this demo I will show how we can use a generic kernel implant in order to both pull and push files from and to a Linux system, and I'm also going to dump the memory. Hmm. Let's see, demos aren't supposed to be like this. Sorry about that. You see a Kali Linux computer. We will try to log on to that computer with the root account. [extended silence] Ah! That one was not working here with Joe. We will reboot the computer afterwards and do the demo, do the Windows demo, but we will start by dumping the memory on this computer anyway. So let's dump the memory. We use the Linux 64-bit kernel module. We're going to dump the memory and store it in C:\temp here. So first we insert the kernel module into the running kernel, then we receive execution, and then memory dumping starts. Memory dumping works the following way: the kernel module first asks the running kernel about the physical memory map. In computers, physical memory is not one big chunk of memory. You have things like memory-mapped PCI Express devices in between, and if you read those you can crash the computer. You also have unreadable memory, such as system management mode memory, that you can't read. So it first queries the computer about the memory map.
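Editor's sketch: the reason for asking for the physical memory map first can be shown with a small read planner. Given a list of readable ranges, it produces chunked read requests that never touch the holes between ranges (device memory, unreadable regions). This is an illustrative sketch only; the `(base, size)` tuple layout, the function name `plan_reads`, and the example map are assumptions and not PCILeech's actual data structures.

```python
def plan_reads(memory_map, chunk_size):
    """Yield (address, length) reads that stay inside the readable ranges.

    memory_map is a list of (base, size) tuples describing readable physical
    memory; anything not listed is treated as a hole and never read.
    """
    for base, size in memory_map:
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            yield base + offset, length
            offset += length


if __name__ == "__main__":
    # Made-up example: two readable ranges with a hole between them.
    example_map = [(0x0000, 0x3000), (0x8000, 0x2800)]
    for address, length in plan_reads(example_map, chunk_size=0x1000):
        print(f"read 0x{length:x} bytes at 0x{address:x}")
```

The last read of each range is shortened rather than padded, so nothing outside the reported map is ever requested.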
The kernel module reports the memory map back to the PCILeech main control program, and once the main control program knows about the physical memory map it can ask the inserted kernel module to read certain memory chunks. Dumping memory is usually pretty fast. It should be well over 150 megabytes per second, but in this demo I have to use a crappy USB hub, so the speed is a little bit lower, but it should still be well in excess of 100 megabytes per second, as you can see here. [applause] Thank you very much. And now that we've dumped the memory, let's try to run Volatility on it as well. I'm running a Linux process-listing command here, just to show you that it's working. At the very bottom you see the PCILeech kernel thread for the inserted kernel module. And if you scroll up you see lots of kernel threads and user-mode processes, and systemd at the very top. So let's move on to Windows 10. In Windows 10 the kernel is located at the top of physical memory, which is kind of boring for us since we can't access it directly, and this is a problem for us if a computer has more than about three and a half gigs of RAM. The reason for that is that memory-mapped PCI Express devices and other things push the last bytes of memory well above 4 gigs. So this means that the kernel executable is not reachable directly, and most drivers are above 4 gigs as well, so they're not reachable either. But if we look at the memory structures below 4 gigs, we see that the page tables for the kernel itself and for important kernel drivers are actually located below 4 gigs in their entirety. So let's attack the page table. Paging on a 64-bit system works this way. First you have a virtual address, or a linear address, at the top in red here, that you wish to translate into a physical address, and this is what the memory management unit is doing.
So the memory management unit starts by reading the physical address in a CPU register called CR3 in order to find the physical base address of a table called the page map level 4. You take the topmost bits from the virtual address to point out which entry in that table to use, and in the PML4 entry you have the physical address of the page directory pointer table. You take some more bits from the virtual address to point out the entry in that table, which contains the physical address of the page directory. Take some more bits from the virtual address to point out the entry in that table, which contains the physical base address of the page table itself. Take some more bits from the virtual address and you get the page table entry. It's this entry that we're going to target and corrupt in order to gain kernel execution. What's nice about this is that all four paging structures here are actually loaded below 4 gigs, so we can access them by using DMA. Kernel address space starts at the address that you see here on this slide. Windows does have kernel address space layout randomization, which means that there are no fixed virtual addresses between reboots; the kernel is loaded at different places and drivers are loaded at different places as well, so we can't rely on that. But if we take a page table entry and have a look at the lowest three bits and the highest bit, which tell us whether the page is present, whether it's a read-only or writable page, whether it's a user or supervisor (kernel) page, and whether it's an executable or non-executable page, those four bits together form what I call a page signature. And if you have a look at a driver, or the kernel itself, you can call its collection of page signatures a driver signature. So what am I doing? I'm searching for the driver signature by walking the page table. Once I find the correct driver to target, I locate the page.
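Editor's sketch: the four-level walk and the "page signature" described above come down to a little bit arithmetic. The sketch below assumes standard x86-64 4-level paging with 4 KB pages: a 9-bit index per level, a 12-bit page offset, the present, read/write and user/supervisor flags in bits 0 to 2, and the no-execute flag in bit 63. The function names are illustrative and not taken from PCILeech; large pages are ignored.

```python
ADDRESS_MASK = 0x000FFFFFFFFFF000  # physical frame address, bits 12..51


def split_virtual_address(va):
    """Split a 64-bit virtual address into the four table indices and offset."""
    return {
        "pml4": (va >> 39) & 0x1FF,    # index into the page map level 4
        "pdpt": (va >> 30) & 0x1FF,    # index into the page directory pointer table
        "pd": (va >> 21) & 0x1FF,      # index into the page directory
        "pt": (va >> 12) & 0x1FF,      # index into the page table
        "offset": va & 0xFFF,          # byte offset inside the 4 KB page
    }


def page_signature(pte):
    """Return (present, writable, user, no_execute) for one page table entry."""
    return (pte & 1, (pte >> 1) & 1, (pte >> 2) & 1, (pte >> 63) & 1)


def with_physical_address(pte, new_physical):
    """Return the entry with only its frame address replaced; flags are kept."""
    return (pte & ~ADDRESS_MASK) | (new_physical & ADDRESS_MASK)


if __name__ == "__main__":
    print(split_virtual_address(0xFFFFF80000001234))
    entry = 0x8000000012345003  # made-up entry: present, writable, kernel, NX
    print(page_signature(entry))
    print(hex(with_physical_address(entry, 0x00ABC000)))
```

A run of consecutive `page_signature` tuples for the pages of one driver is what the talk calls a driver signature; it survives address randomization because it depends only on the flags, not on where the driver was loaded.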
Then I rewrite the physical address in that page table entry to point to a place below 4 gigs which I can control over DMA. So let's continue on to the Windows 10 demo. In this demo I will use a page table rewrite in order to implant a kernel module. I'm going to execute code and I'm going to dump memory. I'm going to spawn a system shell and also try to unlock the computer. So let's switch over to the demo. Here we have a Windows 10 computer. We will try to log on to that computer without using a password. As you can... [silence] As you can see, we couldn't log on to that computer without using a password on the domain account. But what we can do is insert the PCILeech device into the computer, and once we've done that we can try to load a kernel module into the running kernel by using a page table hijack. In Windows 10, because we are looking for driver signatures, we need to target a specific driver version. So let's do that and use the page table hijack here. So we search for the page table location. We hijack the page table. We wait for a kernel thread to start executing there. We receive execution, and we've loaded the kernel module at this memory address. And now we can try to remove the password requirement on that computer. By the way, it's fully BitLockered, so we can't log on to it without using a password. In order to do this we are going to use the unlock implant. It works in a similar way to Inception, but this is all done in kernel code because we are inserting this kernel module into the target system. And in order to insert it, we also need to specify the memory address we just received here. So let's do that. And zero is success here. So it's a zero here, so let's try to log on. [applause] As you can see, it's quite easy to log on to this computer. Let's try to dump the memory of that computer as well. And we need to specify the kernel module address of the loaded kernel module here as well.
Dumping memory works in a similar way to Linux. First we ask the kernel module that is already inserted to report back the physical memory map to the PCILeech main control program running on my demo computer here. Then the control program asks the running kernel module to read certain memory chunks that it knows are accessible and store them in the DMA buffer that was already allocated in lower memory. Memory dumping takes around a minute on an 8-gig system. And of course, once you've dumped all the memory you can run memory forensics tools on it, such as Volatility; you should also be able to, for example, extract credentials with mimikatz or things like that. And this works on fully BitLockered computers, by the way. So let's wait until the memory dump is complete here. And let's try to spawn the system shell. In order to spawn the system shell, we can use the pscmd system kernel implant. And we also need to specify the memory address of the kernel module that is already inserted into the kernel, so let's try to run the system shell. And it's as easy as that. So let's check who we are. [applause] Thank you very much. And as you can see, we're SYSTEM here, and once we're SYSTEM, of course, we can do everything. We can disable BitLocker, we can spy on other users' files and do whatever. But let's not do this here. So, because this is a Windows demo, there is one more thing missing here. We need to specify the kernel module address here as well. We're missing a blue screen here. So let's run the psblue kernel implant. And as you can see [audience laughter and applause], as you can see, Windows doesn't like me. Actually, Windows 10 does have some very nice anti-DMA features built into the Enterprise version. But they are not enabled by default.
Windows 10 can be made rather secure against DMA attacks if the virtualization-based security features are enabled, like Credential Guard and Device Guard. But it's often quite easy for users to mess around with settings in UEFI, like disabling VT-d, disabling Secure Boot and things like that, and then these virtualization-based security features will be disabled in Windows as well. So, we'll come to recommendations later on. But let's target the last missing operating system here, and that is OS X. OS X is just like Linux: the kernel of OS X is located in low physical memory. Its location is dependent on the KASLR slide, which slides in 2-megabyte chunks. OS X nowadays enforces kernel extension signing and System Integrity Protection. System Integrity Protection means that users can't write to certain folders, and kernel extension signing means that you can't load unsigned drivers. Macs today pretty much all have Thunderbolt, but Thunderbolt is actually protected with VT-d. OS X actually uses the IOMMU to protect itself from DMA attacks, so that's kind of boring. So what can we do in order to change that? We can visit Apple's website. Thank you, Apple. Apple tells us plainly on their website how to disable VT-d. So, yeah, it's as easy as that. In OS X we'll first search, by using DMA, for the Mach-O kernel header. Mach-O is the binary format of binaries on the Mac, including the kernel. Then we search for a nice random function to hook; I think I hook memcpy in this example. Then we write the stage 2 code into the memory of the target computer, and then we write the stage 1 code. We wait for the stage 2 code to return with the physical address of the stage 3 code, we write the stage 3 code, and then it's demo time. In this demo I will show you how to disable VT-d in order to gain DMA access. And then we're going to dump the memory and unlock the computer.
So here you have a Mac, and to the right here you have an ExpressCard-to-Thunderbolt adapter, which you don't really need for this part. All you need in order to disable VT-d is to power on the Mac, which we will do in a second. This is kind of slow here... I think the movie was very slow. Hmm. Let's try to re-open it. Ah, let's move on. We're actually rebooting into recovery mode by pressing Command-R when we are starting the computer. Then you enter recovery mode; there is no password needed to enter recovery mode [some clapping from audience] [laughs], and then you start the terminal and type nvram boot-args="dart=0", just as Apple tells you on their website, and VT-d is now fully disabled. So once VT-d is fully disabled, we should be able to target the computer over Thunderbolt, so let's do that. You have a MacBook Air with that adapter connected to the right. And let's try to log on to that computer without using a password at all. As you saw, we could not log on to that computer, which is kind of boring, so let's insert the PCILeech; I control the adapter from [inaudible] here. Let's start by loading a kernel module into the running OS X kernel. And it's as easy as that. We say that we're going to load the kernel module and that we're going to target OS X here. And the kernel module is loaded at this address, and then we should be able to remove the password requirement on this Mac. So let's run the Apple 64-bit unlock implant here. And we need to specify the memory address of the already inserted kernel module as well. And zero is success here. We have a status of zero, so we should be able to log on. So let's try to do that. [applause] And we're in! Thank you very much. Thank you very much. So what can we do about this in order to protect ourselves better?
Of course we can purchase hardware without any DMA ports whatsoever; that's the low-tech variant. It works perfectly fine. If you do have Windows with auto-booting BitLocker and things like that, you should be able to disable ports like ExpressCard [inaudible] in the computers. You can usually do this in the UEFI settings, but then you probably need to change the BitLocker settings in order for it to trigger if this port is re-enabled at a later stage. Of course, if you don't want your Mac's security disabled in recovery mode you can set a firmware password on the Mac in order to protect yourself, and setting a BIOS password on a PC is also a good idea. Of course, pre-boot authentication is always nice to have. And of course the long-term solution here is for the operating system vendors to actually make full use of the IOMMU that is already in the hardware. Windows 10 has some very nice virtualization-based security features going on there, so Microsoft seems to be doing some very nice work as well. So what can we use PCILeech for? Of course we can use it for awareness. That's part of why I'm doing this talk. You saw today that full disk encryption is not really invincible in any way. It's excellent for forensics and malware analysis. Sometimes you want to run malware samples on real hardware and you don't want to pollute that system with lots of diagnostics software or whatever, so it could be nice to have a kernel implant in a hardware device. You can use it to load unsigned drivers into your operating system kernels. It's a good pen testing tool. I do realise that law enforcement might use this tool as well. But please, if you want to take a look at this, don't do any evil with this tool. PCILeech targets 64-bit operating systems. It runs on 64-bit Windows 7 and 10 at the moment.
It's able to read up to 4 gigs natively, and if you are able to insert the kernel module it should be able to read all memory of the target system that the kernel can read. And if a kernel module is inserted, obviously you can execute code on the target system as well. I have kernel modules for Linux, Windows and OS X at the moment. It's written in C and assembly in Visual Studio. It's a modular [inaudible] design; I tried to make it as modular as possible. You should be able to create your own signatures very easily, and also create your own kernel implants. Actually, to the right here you see a very minimal kernel implant. It's in assembly, and it reads some control registers of the CPU and prints them on the screen of the computer running the PCILeech main control program. Maybe we should... but we are missing one thing here, so let's try the Linux demo again. See if we have better luck this time. So as you saw, we couldn't log on with toor as the default password. So let's pull a file from the Linux system. A nice file to pull is the shadow file. And it's as easy as that, pulling a shadow file from a running Linux system, which uses disk encryption, by the way. And we can open the shadow file and have a look at it. The root account here has a very long password hash. So of course you can try to crack it, but it's no fun doing that, so let's replace it instead with the default password hash of toor; so this is the default password hash of toor. So let's write the file back. We're going to push it back to the Linux system, and we are going to use the file push kernel implant here. And now it should be on the target system. So let's try to log on here. See if it works better this time. [extended silence] And as you can see, we're in. [applause] So when you leave here today I want you to remember that inexpensive universal DMA attacking is here; it's the new reality of today. Physical access is still very much an issue.
You should be aware of potential evil maid attacks, for example if you bring your Mac to a security conference [laughter from audience], and please do remember that full disk encryption is not invincible. After this talk I will be making the GitHub [inaudible] public at this address here. Please give me a couple of hours in order to do that, but I will definitely do it today. And thank you very much to Joe for the SLOTSCREAMER; you've been a huge inspiration for my work here, so thank you very much, Joe. [applause] And also thank you to Inception for being a big source of inspiration for my work. And also to the guys at PLX Technologies for creating this wonderful chip. So, thank you, thank you very much for today. [applause]
Info
Channel: DEFCONConference
Views: 11,505
Rating: 5 out of 5
Keywords: DEF CON, DEFCON, DEFCON24, DEF CON 24, DC24, DC-24, hack, hackers, hacking, computer security, security conference, conference speakers, ulf frisk, memory attacks, kernel attacks
Id: fXthwl6ShOg
Length: 43min 19sec (2599 seconds)
Published: Sun Nov 13 2016