If you are struggling to get the firmware
out of your device, this is the video for you! In this video I will explain the possible
ways to get the firmware of our IoT device. I will show a practical example of one of these
possible ways. I will connect the PC to the UART of our sample
device, I will analyze the boot log, I will access the command line interface of the boot
loader, and I will dump the firmware, exploiting the dump command available in the boot loader. I will use a couple of scripts, first to dump the
entire EEPROM to a hexadecimal ASCII text file and then to convert this file back
into binary form to get the exact image of the EEPROM. I am Valerio Di Giampietro, I am an Electronic
Engineer with a background in Digital Electronics and in Information Technology Infrastructures. I would like to be your friendly Italian hacker
neighbor willing to share with you tools and techniques for hardware hacking that I learned
by myself hacking many devices. And now let's start! This is the fourth episode of the series "Hardware
Hacking Tutorial". In this series we will talk about the hacking process, which is based
on: gathering information from our device; building an emulation environment in which to
run interesting binaries; discovering how the device works; and then hacking the device and modifying its firmware. This episode is about getting the firmware
file, that is one of the last steps in the information gathering phase. In this episode I will use the same sample
Gemtek router as in the previous episodes. One of the most important basic principles
in Hardware Hacking is to follow the "easiest path first", so, also in this case, the first
and easiest thing to do is to search the Internet, or our device manual, to see if the manufacturer
has a website with a firmware image to download. If we find an image to download we can move
forward analyzing the image. Sometimes the downloaded image is encrypted,
it will be decrypted by the boot loader or by the operating system self-upgrade procedure;
in this case, unless we find useful information on the Internet for decrypting it, we need another
way to dump the firmware. Once we have dumped the firmware, extracted its file
systems and analyzed the software, maybe we can discover how the firmware was encrypted. Sometimes the firmware is not directly available
for download from a PC with a standard browser, but the device itself is able to download
the image. If this is the case we can sniff the communication
with Wireshark and, usually, we can obtain some information: we can get the URL of the firmware file so
we can download the firmware from our Linux PC. Sometimes the server will allow the firmware
download only if it receives the "User Agent" string of our device, in this case we can
use a command line tool, like wget or curl to set the same User Agent string as in our
device (links to these tools in the description below). Or we can let the device download the firmware
and sniff the entire communication with Wireshark and, then, use the Wireshark ability, to reconstruct
and save the file downloaded by our device. But we can also have difficulties in using
this approach if the device uses an encrypted protocol like HTTPS; in this case we can get
the fully qualified server name with Wireshark, but not the complete URL or the file content. We could try some tricks, like using the mitmproxy
software (link in the description below) to attempt a man-in-the-middle attack, but
if the IoT device correctly validates the security certificates this attack will not succeed. I will show some examples of this type of attack
in future episodes. If our IoT device is a router and if we try
to sniff a router's communication we have the added difficulty that the router will
download the firmware update using the WAN interface, that is the ADSL or fiber interface,
and, for us, it is almost impossible to sniff on that interface. We could try to connect this router to an
existing LAN, change its routing table in its web management interface, and see if it
will agree to download the firmware update using the Ethernet interface. I will show an example of this approach in
a future episode. Another possibility, to dump the firmware
of our device, is to attach our Linux box serial interface to the UART of the device,
and interact with the device boot loader. If the device has a boot loader that has a
Command Line Interface, and if this Command Line Interface has a dump flash command, we
can use this command to dump the entire EEPROM. This is the approach we will use for our sample
Gemtek router that we will see later. If everything above fails, we can try to use
the JTAG interface with Bus Pirate or Bus Blaster and OpenOCD, as explained in the previous
episode of this series, but this is quite complicated and, often, the JTAG interface
is disabled in software. Sometimes the JTAG interface is available
for a few milliseconds after powering on the device, before it is disabled, but to exploit
this possibility some additional circuitry that controls the power supply of the device
must be used. In future episodes I will show you some examples
of this approach. Another possibility is to read the EEPROM
memory chip directly; in a few cases, with serial EEPROMs in easy-to-access packages,
like DIP8 or SOIC8, it is possible to read and write the EEPROM
content without removing the chip from the board; but also in this case we have to
power the chip, and we don't want to power the entire board and have the CPU
start and interfere with our readings, so sometimes we have to temporarily disconnect some
pins from the board. Anyway, usually the packages are much more
complex. For example, in our sample router, the package
is really compact, with a pin pitch of 0.5 mm; in these cases there is no possibility to attach
a clip directly on the board; one possibility is to desolder the chip and remove it from
the board using a hot air gun, like the one shown in the picture, and then use the appropriate
adapter, if we have one, to read the EEPROM with an EEPROM programmer attached to our
PC. Anyway, this operation is not easy for a hobbyist:
it is possible to damage the chip and nearby components if the temperature
goes too high, and it is almost impossible to manually re-solder the EEPROM on the board
later, so this approach can be used when we have more than one board and we can afford to destroy
one. For our sample Gemtek router the firmware
is not available on the Internet for download, so we have to find another way to dump its
firmware. We will connect our Linux box to the UART
interface and will analyze the boot log to see if it is possible to interact with the
boot-loader to dump the EEPROM and to get more information about our device. So, first of all, I connect my PC to the router's
UART interface. All the details on how to find the UART interface
and how to connect to it are available in the second episode of this series, link below
in the description. Then I start the PuTTY serial terminal emulator,
enable logging to a file, power on the device and wait until the boot process has finished
writing a lot of information on the serial console and the standard Linux login prompt
has appeared. Now I try to log in using the default username/password
printed in the manual to access the web management interface but, as we can see, it doesn't
work on the serial console. Now we can start analyzing the boot log file
written by PuTTY to see if there is something interesting; usually we can get a lot of information
from a boot log file: the boot loader name and version; the System on a Chip part number and its architecture
and instruction set; the amount of RAM installed; the amount of EEPROM installed; the operating system kernel and version; the file system types used; the EEPROM partition details; information on the init process, on Linux
systems; whether the boot loader has a Command
Line Interface. Now we close the PuTTY terminal emulator
and start looking at the boot log file it has written to disk with the "less" command. One of the first pieces of information is related to
our EEPROM device; on this line MTD stands for Memory Technology Device and it is the
name of the device driver for interacting with flash memory. In our case we have a NAND Flash memory that
has some peculiarities: it can be read or written a page at a time. A page belongs to a larger block that must
be erased before writing (erasing sets every bit to 1). It can be erased a block at a time, and a block
includes many pages. During operation some bits can spontaneously
fail; for this reason each page has a certain number of extra bytes for error correction codes,
called OOB or Out Of Band data. It has a finite number of program/erase cycles;
this means that the file system must be aware of this limitation and spread the write/erase
cycles evenly across the memory. This information tells us that the page size
is 2 KB. The OOB, Out Of Band data used for error correction,
is 64 bytes for each page. The erase size is 128 KB. The memory width is 8 bits, which means it is
accessed a byte at a time. Then we understand that the boot loader is
U-Boot version 1.1.3, U-Boot is a popular Open Source boot loader (link in the description
below). It seems we have an additional Ralink WiFi
board; Ralink is a WiFi chipset manufacturer that was acquired, a few years ago, by MediaTek. This board is probably the one below the metal
sheet on our motherboard. It seems that this additional board is also
running the boot loader U-Boot with a more recent version, but, at the moment, we are
not interested in this additional board. We can confirm that the System On A Chip is
a dual core MediaTek MT7621A. We already identified it, visually inspecting
the board, in the first episode; it is running at 880 MHz. We have 128 MB of RAM and we can confirm that
we have a NAND Flash EEPROM. Then we have a very interesting menu: It is
a U-Boot boot-loader menu that allows us, among other things, to enter a Command Line Interface
prompt; it is exactly what we were looking for. Later we will reboot the router and we
will enter this menu. Anyway, by default, U-Boot has booted the
Operating System from the flash memory. U-Boot loads the boot image in memory, it
has two parts, the Linux kernel and the root file system. The image is loaded at page 81.00.00.00 and
it is a MIPS Linux image; this confirms that we have a MIPS architecture. Then the Linux operating system starts and
prints its kernel version, which is 2.6.36, and the CPU revision and type, and this confirms
that we have a 32-bit MIPS CPU. We also get another very useful piece of information:
this system has been built using Buildroot version 2015.02; this will help a lot when
we build an emulation environment in which to run interesting binaries. Buildroot is a simple, efficient and easy-to-use
tool to generate embedded Linux systems through cross-compilation. Another useful piece of information is that the root
file system is a squashfs file system; this is a popular read-only file system in embedded devices;
it is never modified in EEPROM, it is loaded in RAM during boot and every time the system
is powered off and then powered on, it reloads the same unmodified root file system. This is the second image loaded by the U-Boot
boot loader. Then we can spot the most useful information:
how the EEPROM is partitioned; for each partition we have the starting address and the partition
length in hexadecimal. We have 9 partitions: two partitions for the boot loader; one partition that probably stores the
router configuration; two partitions for the environment, the boot
loader environment; two partitions for the kernel and the squashfs
root file system; two storage partitions for the read/write
file system used by the router. The reason why each partition is duplicated,
except the router's configuration partition, is that the router can be upgraded by writing to the non-active
partitions and then switching partitions to boot from the newly upgraded ones. If something goes wrong, the router can automatically
boot from the old partitions. The configuration partition is not duplicated
because it stores the router configuration (like WiFi password, web admin password etc.)
that will remain the same across upgrades. Finally the Linux kernel starts the init process,
which is the first process started on a Linux or Unix system; here we can find another very
useful piece of information: the init process is BusyBox version 1.23.1. This means that this Linux system is based
on BusyBox; this is a very popular choice in embedded devices because BusyBox, in a
single small binary, implements, with minor limitations, a lot of traditional Linux commands,
such as the init process, the shell interpreter, the grep command, the ls command and many,
many other Linux commands. Another useful piece of information is that the storage
partition is a UBIFS file system that uses the LZO compressor. UBIFS is a popular file system for NAND Flash
devices because it is aware of the NAND flash peculiarities and it is good at so-called
wear leveling, which means distributing the writes evenly across the entire NAND flash device
to extend the life of the NAND EEPROM, which has a limited number of rewrites before starting
to fail. Near the end of the boot cycle we can see
that the router tries to connect to its master, acs.linkem.com, using the TR-069 protocol;
this is a standard protocol to allow an Internet Service Provider to remotely access, reset,
reconfigure and upgrade your router without needing your help or your consent. In this case the router is disconnected from
the Internet, so it cannot resolve its master's hostname or contact it. Finally, at the end, we get our login prompt. The router calls itself "buildroot"; this is
the default hostname of embedded Linux systems that have been built using the Buildroot software. We try to log in with the "admin" username,
because the manual gives the "admin" user and its password for accessing the web interface,
but instead of receiving a password prompt we receive a challenge code that seems to be
a binary string encoded in Base64, because the characters belong to the Base64 character
set, which includes letters from a to z, both lowercase and uppercase, digits from 0 to
9, the / character and the + character. In one of the next episodes we will reverse
engineer this login binary and will understand how the authentication works. We will not be able to defeat this authentication
algorithm, but we will easily work around it, replacing this login binary with a standard
login executable. We have seen that analyzing what the device
prints on the serial console during boot, we got a lot of very useful and interesting
information; but for now we are mainly interested in the U-Boot command line interface to see
if we can dump the EEPROM content. Analyzing the boot log file we have seen that
the U-Boot loader prints a menu that lets the user choose the operation to perform, so we
will power cycle our router and will wait until the menu is displayed on our terminal
emulator and then press "4" to enter the U-Boot command line interface. We now have a U-Boot prompt. U-Boot is an open source boot loader that
can be heavily customised, this means that usually only a small subset of all U-Boot
commands are actually available; we type "help" to have a list of commands. The most interesting command for us, at the
moment, is the "nand" command; to have more information we type "help nand". We can see that we have the "nand read" command
that can read from EEPROM and write to RAM. The "nand write" command does the opposite,
can read from RAM and write to EEPROM. The "nand erase_write" is a similar command,
but will erase the EEPROM before writing. The "nand dump" command looks like the one we need to dump
the content of the EEPROM, but it doesn't do what we want: it dumps only
some information about the EEPROM. The command that does what we need is "nand
page"; we can see that if we pass the page number, it will dump the content of the entire
2 KB page on our terminal, including 64 bytes of OOB data, the Out Of Band data used for
error correction. If we type "nand page 0", then "nand page
1" and so on up to "nand page FFFF", we can dump the content of the entire EEPROM. But we have two issues: first, it is not feasible to manually type
"nand page" followed by the page number more than 65,000 times; second, we get the EEPROM dumped in hexadecimal
in a text file and not in a binary file. For the first issue we can write a small script
that gives the "nand page" commands for us; I am an old man, so I used an ancient tool
that was popular in the nineties, called "expect", which is based on the Tcl language,
a language with a quite unusual and strange syntax. You can write this script in Python using
the Pexpect module if you prefer. I called this script "serial-flash-dump.expect",
you can find it in my GitHub repository, link in the description below. One important thing to note is that this expect
program has to interact with a TTY device, the serial interface in this case, and not
with the standard input/standard output; for this reason we need the expect tool or the
Pexpect module in Python, because they are able to interact with a TTY device. In our case this device is the serial device
but, more generally, expect or the Pexpect Python module will interact with a terminal
device. Anyway this is a very simple program: get the serial device name as parameter, in
our case it is /dev/ttyUSB0; set serial parameters, like serial speed and
so on; open the modem; wait for the string "Load Boot Loader code
etc" that is the last option in the U-Boot menu; then send the string "4" to select the U-Boot
command line interface; then execute a long loop, from 0 to FFFF,
each time waiting for the prompt and then immediately issuing the "nand page" command, passing
as parameter the loop variable converted to hexadecimal. We can see what this command does by executing
it in our Linux box terminal. To save what I am dumping to a file, we can
use the same command with a pipe, passing its output as input to the "tee" command;
the "tee" command will write to standard output everything it reads from its standard input
and will also write the same content to the file named as its parameter, in this
case "eeprom.txt". In this way the entire EEPROM will be dumped to the "eeprom.txt"
file and we can monitor that this script is running and has not frozen. We know that the EEPROM has a size of 128 MB;
it is dumped in hexadecimal, so each byte is converted to 3 characters (two hex digits plus
a space), plus we have the OOB data for error correction, which is 64 bytes every 2 KB,
a 3% overhead; this means that the dumped file will be about 400 MB. The serial interface has a speed of 115200
bit/s, which means about 11.5 KB/s, so it will take about 10 hours to dump the
entire EEPROM content! We can launch the expect script in the evening,
we can have a long sleep, and in the late morning we have the entire content of the
EEPROM dumped, in hexadecimal, in our text file. If we look at the text file, we can see the
strings that our expect script wrote, then, moving forward we can see the menu written
by the U-Boot boot loader; our script selected option "4" for the command line interface. Then it waited for the command prompt and sent
the "nand page 0" command, and U-Boot dumped the first 2 KB page of the EEPROM, including
the OOB data used for error correction. Then the script waited again for the command
prompt and issued the "nand page 1" command, and so on until the last page of the 128 MB
EEPROM, which is page FFFF. If we look at this file we can see that after
the "nand page" command we have a line with the string "page" and the number of the page
in hexadecimal, then we have the 2 KB EEPROM page dumped in hexadecimal, 32 bytes per line,
arranged in four groups of 16 lines separated by blank lines. To convert this text dump file back to binary
we can write a script that does this conversion. Again, I am an old man and I learned the Perl
language in the early nineties and have used it extensively ever since, so I wrote this script
in Perl but, if you prefer, you can rewrite it in Python. It will read the text file from standard
input, and it will use regular expressions to extract the hexadecimal strings, convert
them to binary and write the output to a binary file that will be, bit by bit, the EEPROM
image. I have ignored the OOB data, the data used
for error correction, and it seems that this hasn't caused any issue with the EEPROM image. The script is simple but it looks more complicated
because it has an option to include the OOB data in the output and does some error checking
to prevent writing the same page twice if, for example, the input file has been generated
in multiple, overlapping sessions. The script is called "hexdump2bin.pl" and
you can find it in the same GitHub repository as the previous script. Links in the description below. The core of the program is this regular expression
that is expecting 2 hex digits followed by a space, repeated 31 times, followed by 2
hex digits; this time not followed by a space because at the end of the line we have an
end of line char and not a space. Then this line is split in 32 hex bytes, some
error checking is done, and then each hex byte is converted to binary and written to
the standard output. We can convert the EEPROM text dump file into
the corresponding binary file with the command shown. If we take a look at the converted binary
file with the "hexdump" command we can see that it seems OK. If we take a look at the converted binary
file with the "binwalk" command we can see that it finds some interesting stuff inside,
like a U-Boot image header, a U-Boot version string and a squashfs file system, so this probably means
that our binary file is OK. Binwalk is a fantastic tool to analyze firmware
files; it can scan a binary file searching for many different types of signatures, identifying
many types of boot loaders, file system images, segments of compressed data, digital certificates
and so on. It can also graphically display the entropy
of a binary file, letting us easily understand if it's a plain file or an encrypted or compressed
file with an undetected compression algorithm. It is particularly useful when the firmware
file has been downloaded, as a firmware update file, from our device supplier's web site. Anyway, we will see in the next episode how
to extract the boot loader, the root file system and the other file systems from this
EEPROM image and to use, more generally, the "binwalk" tool. If you have found this video interesting please
subscribe, help this channel grow, share this video with friends interested in hardware
hacking. Please click the subscribe button and the
notification bell to be notified when new episodes are released. And don't forget to click the thumbs up icon! Please let me know, in the comments below,
if you have found this episode easy to follow. Please give me feedback, writing comments
below, let me know if you have suggestions to improve this channel, if you enjoyed this
video, or if you didn't like it, or if you have any other type of comment. Every comment, both positive and negative,
is welcome! Thank you for watching, see you again on this
channel.
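
For viewers who prefer Python, the text-to-binary conversion described in this episode can be sketched roughly as follows. This is only a minimal sketch and not the hexdump2bin.pl script from the repository: the names parse_dump, write_image, PAGE_HEADER and HEX_LINE are made up for illustration, and the exact format of the "page" header line is an assumption (a line such as "page 1f"), so that pattern may need adjusting to the real dump. The sketch keeps the 2048 data bytes of each page, treats any further hex lines of the same page as OOB data and drops them, and skips pages already captured in an earlier, overlapping session.

```python
import re

PAGE_SIZE = 2048  # data bytes per NAND page, as reported in the boot log

# Assumption: each page dump is preceded by a line such as "page 1f".
PAGE_HEADER = re.compile(r"^page\s+(?:0x)?([0-9a-fA-F]+)\s*$")
# 31 groups of "two hex digits + space", then two hex digits at end of line.
HEX_LINE = re.compile(r"^(?:[0-9a-fA-F]{2} ){31}[0-9a-fA-F]{2}$")


def parse_dump(lines):
    """Return {page_number: data_bytes} parsed from the text dump lines."""
    pages = {}
    current = None  # page number being collected, or None when idle
    buf = bytearray()
    for raw in lines:
        line = raw.strip()  # also removes the "\r" sent by the serial console
        header = PAGE_HEADER.match(line)
        if header:
            number = int(header.group(1), 16)
            # Skip a page that an earlier, overlapping session already gave us.
            current = None if number in pages else number
            buf = bytearray()
            continue
        if current is None or not HEX_LINE.match(line):
            continue  # menu text, prompts, blank lines, OOB lines
        buf += bytes.fromhex(line)
        if len(buf) == PAGE_SIZE:
            pages[current] = bytes(buf)
            current = None  # later hex lines of this page are OOB data


    return pages


def write_image(pages, out):
    """Write the pages in ascending order to a binary file object."""
    for number in range(max(pages, default=-1) + 1):
        if number not in pages:
            raise ValueError(f"page {number:x} is missing from the dump")
        out.write(pages[number])
```

Assuming the dump was saved as "eeprom.txt", a call like write_image(parse_dump(open("eeprom.txt", errors="replace")), open("eeprom.bin", "wb")) would write the image; "eeprom.bin" is just an example name. Raising an error on a missing page is a design choice of this sketch, so that a gap in the dump is noticed instead of silently shifting the rest of the image.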