>>Hi, so let's get this show on
the road, please settle down,
take a seat and please give a warm welcome to a first time
Defcon speaker, Ulf. [applause]
>>[laughter] So thank you everyone for coming and
listening to me. Today we are
going to direct memory attack the kernel. My name is Ulf Frisk
and helping me with the demos
today, I have, er, [inaudible]vist. Today, we are
going to totally own Linux,
Windows and OSX kernels by DMA code injection. We're going to
dump memory at speeds in excess
of 150 megabytes per second, we're going to both pull and
push files from the target
system. We're going to execute code and spawn a system shell.
After this talk I will be open
sourcing the software making all this possible, and since we are
talking about a hardware based
attack you also need hardware, which is already
available for purchase online
for less than $100. But first a little bit about myself. My name
is Ulf Frisk, I'm working as a
penetration tester, primarily with online banking security.
I'm employed in the financial
sector in Stockholm, Sweden. I have a master of Science in
Computer Science and Engineering
and most recently I've been taking a special interest in
low-level Windows programming and
DMA. This has been a bit of a learning-by-doing
project on my part, learning
more about 64 bit assembly and operating system kernels.
Actually, in order to be able to
do this talk, I had to put up this slide. I need to point out
that this talk is given by me
as an individual. My employer is not involved in any way
whatsoever with what I'm doing
here today. But I'm here today to present PCILeech. PCILeech is
the combination of the PLX
Technologies USB3380 development board with custom
firmware and custom software.
On the image here you see this development board, in the mini
PCI express form factor. To the
left you see the PCI express side, the side that goes into
the target computer, or if
you want to call it that, the victim computer. The USB3380 is able to
send both DMA reads and writes
into the target system's main memory. To the right, you see a USB3
connector, which allows us to
connect this board to a controlling computer, and once
connected, the controlling
computer is able to transfer
data at very high speeds over
USB3 straight into the memory of the target computer. What's very
nice about this hardware, is
that it requires no drivers at all on the target computer, it
just works. It's hardware only.
And with this piece of hardware, I'm able to get well over 150
megabytes per second DMA
transfer speeds. Unfortunately this chip is only capable of 32
bit addressing, and that means
that you're only able to access the lower 4 gigabytes of memory
with this card. As we will see
later on, that's not really a problem in practice. Actually,
the USB3380 has been
presented here at Defcon before. It was presented two years ago
as the NSA Playset SLOTSCREAMER
device by Joe FitzPatrick and Miles Crabill. So I want to
really thank Joe for bringing
this really nice piece of hardware to my attention, so
thank you very much, Joe. If I
compare PCILeech to SLOTSCREAMER, it's obviously
exactly the same hardware, but
it's a different firmware and different software.
This also means that if you
already have a SLOTSCREAMER device you should be able to
reflash it and try this software
on it. It's faster. The SLOTSCREAMER was able to
achieve around 3 MB per second,
something like that, and the PCILeech device is able to
achieve well over 150 MB per
second DMA transfer speeds. PCILeech is also capable of
kernel implants; in fact it
relies heavily on kernel implants. But what makes all
this possible is, of course, PCI
express. PCI express is a high-speed serial expansion bus, or
it's not really a bus since it's
point-to-point communication, but anyway, it's packet based.
To the upper right you see a
schematic of PCI express. You have the PCI express root
complex anchored within the CPU
chip. From this root complex you have several serial lanes
that you can connect several PCI
express endpoints to. You can also connect switches
and bridges, so you can say that
PCI express forms a small device network within a computer.
Depending on how much bandwidth
a device needs, it can consume between one and sixteen
serial lanes; a graphics card
that needs lots of bandwidth typically consumes sixteen
lanes. PCI express is designed
to be hot pluggable and it comes in many form factors and
variations. It comes in the
standard form factor that you know from standard PCI express
graphics cards and similar
things. It comes in the mini PCI express form factor, as you saw
on a previous slide. It comes as
ExpressCard, which goes into laptops, and Thunderbolt also
encapsulates PCI express. And
what's nice about PCI express from our point of view is that
it's DMA capable, and that means
it's circumventing the CPU, so PCI express
endpoints can read and write memory
directly. But what's direct memory access and how does it
work? When you have a CPU core,
it usually executes code in something called a virtual
address space, and you have a
memory management unit built into the CPU which uses
page tables in order to
translate these virtual addresses into physical
addresses. It actually
translates pages, and a page is typically 4 KB long. It can be
larger as well, but in most
cases it's 4 KB. PCI express devices have
traditionally been able to
access all physical memory directly, without any
limitations whatsoever. But CPUs
nowadays do have something called an IOMMU, which works in a
similar way to the memory
management unit of the CPU, and this provides
virtualization of device
addresses as well. So in theory, operating systems should be
able to protect themselves fully
against DMA attacks if the IOMMU is fully used. But as we will
see later on, that's not really
the case. Actually, this is the complete firmware of the
PCILeech device. It's a whopping
46 bytes in total. The first two bytes are a header, or
actually the first byte is the
header: 5A 00 tells the chip to just load data from the
configuration data into the
configuration registers at power on. Next we have the length,
which is in little endian, so 2A
is 42 bytes of configuration data. Then we have
the USB controller register; we
need it to enable the USB 3 port on this board because it's
disabled by default. First you
have the address of the register, the 2310 here, and then you
have a DWORD, or 4 bytes or 32
bits, which is programmed into that register at power on, and
this enables the USB 3 port. Then
we set the PCI express vendor ID and product ID to a Broadcom SD
card, and this is pretty much a
leftover from the SLOTSCREAMER software I started to toy
around with. And then in green
here we enable the four DMA endpoints, which are capable
of high-speed DMA transfers
between USB 3 and PCI express. I set the first
endpoint to a write endpoint,
which allows us to write memory from USB into the main memory of the
target computer at high speed.
Then we set the following three endpoints to read endpoints.
The reason why we set three endpoints
to read endpoints is that read is much more common than write,
and we can get a little more
transfer speed out of this chip if we're doing multi-threaded
access. And at last we set the
USB vendor ID and product ID to Google Glass. And
the reason why I'm doing this is
that I wrote this program for Windows. Windows has a very nice
user mode USB stack called
WinUSB, but in order to activate it for a certain
piece of hardware you need to sign a
small configuration file with a driver signing certificate. And
those are kind of expensive,
so I didn't want to purchase one. It was much easier to find
a device out there that actually
uses this WinUSB stuff already and lie about being that device.
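
The layout just described (a header, a little-endian length, then register address and value pairs) can be sketched as a small builder and parser. This is only an illustrative sketch, not the actual PCILeech firmware: the 16-bit address width is an assumption inferred from 42 bytes split over seven entries, and every register address and value below is a made-up placeholder. Only the 5A 00 header and the 2A length come from the talk.

```python
import struct

# From the talk: header bytes 5A 00, then a little-endian length.
HEADER = bytes([0x5A, 0x00])
# Assumption: each entry is a 16-bit register address plus a 32-bit value,
# both little endian (6 bytes per entry, no padding).
ENTRY = struct.Struct("<HI")


def build_image(entries):
    """Serialize (address, value) pairs into a header + length + data image."""
    body = b"".join(ENTRY.pack(addr, value) for addr, value in entries)
    return HEADER + struct.pack("<H", len(body)) + body


def parse_image(image):
    """Parse an image back into a list of (address, value) tuples."""
    if image[:2] != HEADER:
        raise ValueError("unexpected header")
    (length,) = struct.unpack_from("<H", image, 2)
    body = image[4:4 + length]
    if len(body) != length or length % ENTRY.size:
        raise ValueError("truncated or misaligned configuration data")
    return [ENTRY.unpack_from(body, off) for off in range(0, length, ENTRY.size)]


# Seven hypothetical entries standing in for: USB control, PCIe IDs,
# four DMA endpoints, USB IDs. Addresses and values are placeholders.
PLACEHOLDER_ENTRIES = [(0x1000 + 4 * i, 0) for i in range(7)]

image = build_image(PLACEHOLDER_ENTRIES)
print(len(image), image[:4].hex())  # 46 5a002a00
```

With seven six-byte entries the data section is 42 bytes (0x2A), and with the four bytes of header and length the image comes to the 46 bytes mentioned in the talk.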
But, let's get into the kernels.
Most computers today have more than 4 gigs of memory.
If you are able to get a
kernel module into a system, it should be able to access all
memory and also be able to
execute code. So what we can do is search for kernel
structures, code signatures,
whatever, in lower memory using DMA, patch that code and
hijack the execution flow of the
kernel code that way. When we are doing this we need to
keep in mind that PCI
express DMA works with physical addresses, while kernel code runs in
a virtual address space. I
divided exploitation into three stages: first you have stage
one, which is pretty much just a
hook, then we have stage two, which is the stager for the
final stage three kernel module
implant. We start by trying to locate the kernel, or a
driver or whatever in kernel
space that we can target. Usually at the end of the kernel
itself or a driver there's some
free space in the last page, because it's usually not
completely filled, and that
page is already executable, so we put our stage two code in there,
which is around 500 bytes. Then
we search for a function to hook, and once we find
that function we overwrite it
with a call into the stage two code, which is already written
into the kernel. When a
thread starts executing the hooked function, it immediately jumps
to the stage two code, and
the very first thing the stage two code does is restore the
stage one code to its original
state. Then we check if we're the first thread running here;
we might run in a multi-threaded
environment. If we're not the first thread running
here, it immediately jumps back
to the now unhooked stage one function and resumes the normal
execution flow for that kernel
thread. But if we are the first thread running here, we locate
the base of the kernel, and we
need that in order to look up some function pointers that
we are going to need later on.
For example, we need those function pointers in order to
allocate two pages of memory.
The first page we use as a buffer; the PCILeech main
control program running on the
other computer can use DMA into this buffer in order to
communicate with the kernel
module that we are going to insert. The second page is for the
kernel module, or the stage three
code itself. Then we write a small stage three stub into the
second page, and this is pretty
much just a tight loop. Then we create a new kernel thread
running that loop. At the very end
of the stage two section we write the physical address of
the buffer we allocated into the
code where the stage two part is located. The PCILeech main
control program is polling for
this all the time with DMA, and once it receives the
physical address, it writes the
complete stage three contents
to the address it received. Then
the loop, which is already executing in the kernel
thread, senses that the
complete stage three contents are written, so it exits and
starts by setting up a DMA
buffer, which is around 4 to 16 megabytes big, in lower memory.
And then it starts looping,
waiting for commands. The commands are pretty much: read
memory, write memory, execute
code or exit. But let's start by attacking Linux. The Linux
kernel is located in low
physical memory. If kernel address space layout randomization is not
enabled, it's located at 16
megabytes in physical memory. If KASLR is enabled, it slides in 2
megabyte chunks. So once we find
the kernel we search for a function, a random function to
hook; in my code I chose
[inaudible], since it gets called pretty often and it works
fine. And then I search for a
function called kallsyms_lookup_name. This is pretty much the
equivalent of GetProcAddress
in Windows. It allows me to take a kernel symbol name,
send it to that function, and
it looks up the function pointer, or
symbol, for that name. Then we write
the stage 2 code and write the stage 1 code, then we wait for
the stage 2 code to return with
the physical address of stage 3. We write the complete stage 3
code and then it's demo time. In
this demo I will show how we can use a generic kernel implant in
order to both pull and push
files from and to a Linux system, and I'm also going to dump the
memory. Hmm. Let's see, demos
aren't supposed to be like this. Sorry about that. You see a
Kali Linux computer. We will
try to log on to that computer with the root account.
[extended silence] Ah! That one
was not working here with Joe, we will reboot the computer
afterwards and do the demo, do
the Windows demo, but we will start by dumping the memory on
this computer anyway. So
let's dump the memory. We use the Linux 64 bit kernel
module. We're going to dump the
memory and store it in C:\temp here. So first we insert the kernel
module into the running kernel, then
we receive execution and then memory dumping is starting.
Memory dumping works the
following way: the kernel
module first asks the running
kernel about the physical memory
map. In computers, physical memory is not one big chunk of
memory; you have memory-mapped PCI
Express devices in between there, and if you read those
you can crash the computer.
You also have unreadable memory, such as system
management mode memory, that you can't
read. So it first queries the computer about the memory map.
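
The idea of reading only the ranges listed in the memory map, in chunks that fit a fixed buffer, can be sketched like this. It is a simplified illustration and not PCILeech code: the function name, the example map and the 4 MB chunk size are all assumptions chosen for the example (4 MB matches the low end of the DMA buffer size mentioned earlier).

```python
MB = 1024 * 1024


def plan_reads(memory_map, chunk_size):
    """Yield (address, length) read requests that stay inside readable ranges.

    memory_map is a list of (start, length) tuples describing readable RAM.
    Anything not listed (device memory, SMM and so on) is never requested.
    """
    for start, length in sorted(memory_map):
        end = start + length
        addr = start
        while addr < end:
            size = min(chunk_size, end - addr)
            yield addr, size
            addr += size


# Hypothetical map: 640 KB of low RAM, a hole up to 1 MB, then 9 MB of RAM.
EXAMPLE_MAP = [(0, 640 * 1024), (1 * MB, 9 * MB)]

requests = list(plan_reads(EXAMPLE_MAP, 4 * MB))
for addr, size in requests:
    print(f"read 0x{addr:08x} +0x{size:x}")
```

The hole between 640 KB and 1 MB is never touched, which is the point of asking for the map before reading anything.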
It reports this back to the
PCILeech main control program, and once the main control
program knows about the physical
memory map it can ask the inserted kernel module to read
certain memory chunks. Dumping
memory is usually pretty fast. It should be well over 150
megabytes per second, but in this
demo I have to use a crappy USB hub, so the speed is a little
bit lower, but it should still
be well in excess of 100 megabytes per second, as you can
see here. [applause] Thank you
very much. And now that we've dumped the memory, let's try to run
Volatility on it as well. I'm
running the Linux PCI command here, just to show you that it's
working here. At the very bottom
you see the PCILeech kernel thread for the inserted kernel
module. And if you scroll up
here you see lots of kernel threads, and the user mode
processes here, and the system at
the very top here. So let's move back to Windows 10. In
Windows 10 the kernel is located
at the top of physical memory, which is kind of
boring for us since we can't
access it directly. This is a problem for us if a computer
has more than about 3 and a
half gigs of RAM. The reason for that is that memory-mapped PCI
express devices and other things
push the last bytes of memory well above 4 gigs. So this
means that the kernel executable
is not reachable directly, and most drivers are also
above 4 gigs, so they're
not reachable. But if we look at the memory structures below 4
gigs, we see that the page tables
for the kernel itself and important kernel drivers are
actually located below 4 gigs in
their entirety. So let's attack the page table. Paging on a 64
bit system works this way. First
you have a virtual address, or a linear address, at the top in red
here that you wish to translate
into a physical address, and this is what the memory management
unit is doing. The memory
management unit starts by reading the physical address in
a CPU register called CR3 in
order to find the physical base address of a table called
page map level 4, or PML4. You
take the topmost bits from the virtual address to point out
which entry in that table to use,
and then in the PML4 entry you have the physical address of the
page directory pointer table.
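
The index selection in this walk follows the standard x86-64 4-level paging layout with 4 KB pages: each of the four tables is indexed by 9 bits of the virtual address, and the lowest 12 bits are the offset within the page. A minimal sketch of that arithmetic follows. The function names are made up for this example, and nothing here is specific to PCILeech.

```python
def split_virtual_address(va):
    """Split a 64-bit virtual address into 4-level paging indices.

    Assumes 4 KB pages: 9 index bits per level, 12 offset bits.
    """
    return {
        "pml4": (va >> 39) & 0x1FF,   # index into page map level 4
        "pdpt": (va >> 30) & 0x1FF,   # index into page directory pointer table
        "pd": (va >> 21) & 0x1FF,     # index into page directory
        "pt": (va >> 12) & 0x1FF,     # index into page table
        "offset": va & 0xFFF,         # byte offset inside the 4 KB page
    }


def entry_address(table_base, index):
    """Physical address of an entry: every entry is 8 bytes wide."""
    return table_base + index * 8


# Build an address from known indices and split it again.
example_va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
print(split_virtual_address(example_va))
print(hex(entry_address(0x1000, 3)))
```

Each table holds 512 eight-byte entries, so a table is exactly one 4 KB page. That is why a page-granular DMA read is enough to fetch a whole table at a time.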
And you take some more bits from the virtual address to point out
the entry in that table which
contains the physical address of the page directory. Take some
more bits from the virtual
address to select the entry in that table which holds
the physical base address of the
page table itself. Take some more bits from the
virtual address and you get the
page table entry. That's the entry that we're going to target and
corrupt in order to gain kernel
execution. What's nice about this is that all four paging
structures here are actually
loaded below 4 gigs, so we can access them by using DMA. Kernel
address space starts at the
address that you see here on this slide. Windows does have
kernel address space layout
randomization, so that means that there are no fixed
virtual addresses between
reboots; the kernel is loaded at different places and drivers are
loaded at different places as
well, so we can't rely on that. But if we take a page table entry
and have a look at the lowest
three bits and the highest bit, that is, the present bit,
whether it's a read or write page, whether
it's a user or supervisor (kernel) page, and whether it's an
executable or non-executable
page, those four bits together form what I call a page
signature. And if you have
a look at a driver, or the kernel itself, you can
call its collection of
page signatures a driver signature. So what am I doing?
I'm searching for the driver
signature by walking the page table. Once I find the correct
driver to target, I locate the
page. I rewrite the physical address in the page table entry
to a place below 4 gigs which I
can control over DMA. So let's continue on to the Windows 10
demo. In this demo I will use a
page table rewrite in order to implant a kernel module. I'm
going to execute code and I'm
going to dump memory. I'm going to spawn a system shell and also
try to unlock the computer. So,
let's switch over to the demo. Here we have a Windows 10
computer, we will try to log on
to that computer without using a password here. As you
can...[silence]. As you can see
we couldn't log on to that computer without using a
password on the domain account.
But what we can do is insert the PCILeech device
here into the computer, and once
we've done that we can try to load a kernel module into the running
kernel by using a page table
hijack. In Windows 10, because we are looking for driver
signatures, we need to target a
specific driver version. So let's do that and use the page
table hijack here. We search
for the page table location. We hijack the page table. We wait
for a kernel thread to start
executing there. We receive execution and we have loaded the
kernel module at this memory
address. And now we can try to remove the password requirement
on that computer. By the way,
it's fully BitLockered, so we can't log on to it without using
a password. In order to do this
we are going to use the unlock implant.
It works in a similar way to
Inception, but this is all done in kernel code because we
are inserting this kernel module
into the target system. In order to insert it, we also
need to specify the memory
address we just received here. So let's do that. And it's a
success here. It's a
zero here, so let's try and log on. [applause]. As you can
see it's quite easy to log on to
this computer. Let's try to dump the memory of
that computer as well. And we
need to specify the kernel module address of the loaded
kernel module here as well.
Dumping memory works in
a similar way to Linux. First we
ask the kernel module that is
already inserted to report back the physical memory map to the
PCILeech main control program
running on my demo computer here. Then it asks the
running kernel module to read
certain memory chunks it knows are accessible and
store them in the DMA buffer
that was already allocated in lower memory. Memory dumping
takes around a minute on an 8
gig system. And of course once you've dumped all the memory you
can run memory forensics tools on it,
such as Volatility. You should also be able to, for example,
extract credentials with
Mimikatz or things like that. And this works on fully
BitLockered computers, by the
way. So let's wait until the memory dump is complete here.
And let's try to spawn the
system shell. In order to spawn the system shell, we can use the
pscmd system kernel implant.
And we also need to specify the memory address of the kernel
module that is already inserted
into the kernel, so let's try to run it and spawn the system shell. And
it's as easy as that. So let's
check who we are. [applause]. Thank you very much. And as you
can see, we are SYSTEM here,
and once we are SYSTEM, of course, we can do everything. We
can disable BitLocker, we can
spy on other users' files and do whatever, so. But let's
not do this here. So,
because this is a Windows demo, there is one more thing missing
here. So, we need to specify the
kernel module address here as well. We're missing a blue
screen here, and what's missing
that one? So let's run the PS blue kernel implant here.
And as you can see [audience
laughter and applause] as you can see, Windows doesn't like me.
Actually, Windows 10 does
have some very nice anti-DMA features built into the
enterprise version. But they are
not enabled by default. Windows 10 can be made rather secure
against DMA attacks if the
virtualization-based security features are enabled, like
Credential Guard and Device
Guard. But it's often quite easy for users to mess around
with settings in the UEFI,
like disabling VT-d, disabling secure boot and
things like that, and then
these virtualization-based security features will be
disabled in Windows as well.
So, we'll come to recommendations later on. But let's target
the last missing operating
system here, that is OSX. OSX is just like Linux:
the kernel of OSX is located in
low physical memory. Its location is dependent on the
KASLR slide; it
slides in 2 megabyte chunks. OSX nowadays has kernel
extension signing and System
Integrity Protection. System Integrity Protection means
that users can't write to
certain folders. And kernel
extension signing means that you can't load unsigned drivers. Macs
today pretty much all have
Thunderbolt, but Thunderbolt is actually protected with
VT-d. OSX actually uses the
IOMMU to protect itself from DMA attacks, so that's kind of
boring. So what can we do in
order to change that? We can visit Apple's website. Thank you,
Apple. Apple on their
website tells us plainly how to disable VT-d. So, yeah, it's as
easy as that. In OSX we first
search, by using DMA, for the Mach-O kernel
header. Mach-O is the binary
format of binaries on the Mac, including the kernel. Then
we search for a nice random
function to hook. I think I hook memcpy in this example. And
then we write the stage 2 code
into the memory of the target computer, then we write the
stage 1 code. We wait for the stage
2 code to return with the physical address of the stage 3 code, we
write the stage 3 code, and then it's
demo time. In this demo I will show you how to disable
VT-d in order to gain DMA
access. And then we're going to dump the memory and unlock the
computer. So here you have a
Mac. Actually, to the right here you have an ExpressCard to
Thunderbolt adapter, which you
don't really need for this part. All you need in order to
disable VT-d is to
power on the Mac, which we will do in a second. This is kind of
slow here... I think the movie
was very slow. Hmm. Let's try to re-open it. Ah, let's move on
here. We're actually rebooting
into recovery mode by pressing Command-R when we are
starting the computer. Then you
enter recovery mode; there is no password needed to enter recovery
mode [some clapping from
audience] [laughs] and then you start the terminal and then you
type nvram boot-args="dart=0",
just as Apple tells you on their website,
and VT-d is now fully disabled
here. So once VT-d is fully disabled, we should be able to
target the computer over
Thunderbolt here, so let's do that. You have a MacBook
Air with that adapter connected
to the right. And let's try to log on to that computer without
using a password at all. As you
saw, we could not log on to that computer, which is kind of boring,
so let's insert the PCILeech. I
control the adapter from [inaudible] here. Let's start
by loading a kernel module into
the running Mac OS kernel here. And it's as easy as that. We say
that we're going to load the
kernel module and that we're going to target OSX here. And
the kernel module is loaded at
this address, and then we should be able to remove the password
requirement on this Mac. So
let's run the Apple 64 bit unlock implant here. And we need
to specify the memory address
of the already inserted kernel module as well. And it's a
success here. We have a
status zero here, so we should be able to log on here. So let's
try to do that. [applause]. And
we're in! Thank you very much. Thank you very much. So
what can we do about this in
order to protect ourselves better? Of course we can
purchase hardware without
any DMA ports whatsoever; that's the low-tech variant. It works
perfectly fine. If you do
have Windows with auto-booting BitLocker and things like that,
you should be able to disable
things like express [inaudible] in the computers. You can usually do
this in the UEFI settings,
but then you probably need to change the
BitLocker settings in order for
it to trigger if this port is re-enabled at a later stage. Of
course, if you don't want your
Mac security disabled in recovery mode, you can set a
firmware password on the Mac in
order to protect yourself, and also setting a BIOS password
on the PC is a good idea. Of
course pre-boot authentication is always nice to have. And
of course the long-term solution
here is for the operating system vendors to actually make full use
of the IOMMU that is already in
the hardware. Windows 10 has some very nice
virtualization-based security features
going on there, so Microsoft seems to be doing some very nice work as
well. So what can we use
PCILeech for? Of course we can use it for awareness. That's partly
why I'm doing this talk. You saw
today that full disk encryption is not really invincible in
any way. It's excellent for
forensics and malware analysis. Sometimes you want to run
malware samples on real
hardware and you don't want to pollute that system with lots of
diagnostics software or whatever,
so it could be nice to have a kernel implant in a hardware
device. You can use it to load
unsigned drivers into your operating system kernels. It's a
good pen testing tool. I do
realise that law enforcement might use this tool as well. But
please, if you want to take a
look at this, don't do any evil with this tool. PCILeech targets
64 bit operating systems. It
runs on 64 bit Windows 7 and 10 at the moment. It's able to read
up to 4 gigs natively, and if you
are able to insert the kernel module it should be able to read
all memory of the target system
that that kernel can read. And if a kernel module is
inserted, obviously you can
execute code on the target system as well. I have kernel
modules for Linux, Windows and
OSX at the moment. It's written in C and assembly in
Visual Studio. It's a modular
[inaudible] design. I tried to make it as modular as possible.
You should be able to create
your own signatures very easily, and also create your own
kernel implants. Actually, to the
right here you see a very minimal kernel implant. It's in
assembly and it reads some
control registers of the CPU and prints them on screen on the
computer running the PCILeech
main control program. Maybe we should... but we are missing one
thing here: trying the Linux
demo again. Let's see if we have better luck this time. So as you
saw, we couldn't log on with
toor as the default password. So let's pull a file from the Linux
system. A nice file to pull is
the shadow file. And it's as easy as that, pulling a shadow
file from a running Linux
system, which uses encryption, by the way. We can open
the shadow file and have a look
at it. The root account here has a very long password hash.
So, of course you can try to
crack it, but it's no fun doing that, so let's replace it
instead with the default
password hash of toor. So this is the default password hash of
toor. So let's write the file
back. And we're going to push it back to the Linux
system. We are going to use
the file push kernel implant here. And now it should be on
the target system. So let's try
to log on here. See if it works better this time. [extended
silence] And as you can see
we're in. [applause] So when you leave here today I want you
to remember that inexpensive
universal DMA attacking is here; it's the new reality of today.
Physical access is still
very much an issue. You should be aware of potential evil maid
attacks, for example if you bring
your Mac to a security conference [laughter from
audience], and please do
remember that full disk encryption is not invincible.
After this talk I will be making
the GitHub [inaudible] public at this address here.
Please give me a couple of hours
in order to do that, but I will definitely do it today. And
thank you very much to Joe for
the SLOTSCREAMER. You've been a huge inspiration
for my work here, so thank
you very much, Joe. [applause]. And also thank you to Inception
for being a big source of
inspiration for my work. And also the guys at PLX Technologies for
creating this wonderful chip.
So, thank you, thank you very much for today. [applause].