If you have downloaded the firmware file for
your device from the supplier's website or if you have dumped the EEPROM from your device
and you want to extract the root file system and other information, this is the video for
you! In this episode I will talk about the available
options to understand where the root file system is located in the firmware image, and
the tools to use to extract it in order to analyze it. I am Valerio Di Giampietro, I am an Electronic
Engineer with a background in Digital Electronics and in Information Technology Infrastructures. I would like to be your friendly Italian hacker
neighbor, willing to share with you tools and techniques for hardware hacking that I
learned by myself hacking many devices. And now let's start! This is the fifth episode of the series "Hardware
Hacking Tutorial". In this series we will talk about the hacking process, based
on: information gathering from our device; building an emulation environment in which to
run interesting binaries; discovering how the device works; and then hacking the device and modifying its firmware. This episode is about extracting the root
file system, which is the last step in the information gathering phase. In this episode we will use 3 different types
of firmware file. The first is an encrypted firmware update file for a digital
camera, downloaded from the supplier's website. I will not succeed in extracting the root file
system, but we will learn something useful anyway. The second is a firmware upgrade for a
home router, downloaded from the supplier's website; we will successfully extract the
file system, with some minor issues. The last file is an EEPROM dump that we dumped
from the sample Gemtek router in the previous episode. We will do everything on our Linux box using
some simple tools:
the "file" command, which gives very basic information about any type of file;
the "strings" command, which prints the strings embedded in a binary file;
the "hexdump" command, which prints the hex dump of a file, including the ASCII equivalent of each byte;
the "binwalk" software, which scans a binary file searching for signatures of many different file system images, compressed data segments, digital certificates and many other types of information embedded in a single binary file; it can also show the running entropy of a file, which helps us understand whether there is an encrypted or compressed segment inside the binary file;
the "dd" command, which can dissect a file, easily extracting part of it, or reassemble a file by putting together different parts.
We have downloaded from the Canon website
the firmware update for my Canon EOS M50 camera, the one that I am using now to record this
video. Link to the download page in the description
below. We get 2 files, plus the upgrade instruction
manual: one file, CCF19.DAT, is only about 4 KB; the other file, CCF19103.FIR, is about 31 MB
and it is the firmware file. If we use the "file" command on these files
it tells us that we have data files. The "file" command will scan the first bytes
of each file trying to understand its format, so it will tell us
whether the file is an executable file, a shell script, a text file, a compressed file, a
video or image file and so on. But if it doesn't find anything known it simply
says, as in this case, that we have a data file. We can use the "strings" command on each of
these files. The "strings" command searches for, and prints,
every sequence of at least n consecutive printable ASCII characters; by default n is
equal to 4, which produces a lot of random garbage on output; one useful argument is
"-n", to increase this default value. If we increase the minimum string length to
6 we have a reduced number of garbage lines, but we can see that nothing useful is printed
for either file. We can look at these files with the "hexdump"
command. Hexdump can dump a file in ASCII, decimal,
octal or hexadecimal; the most useful option is "-C" to dump in both hexadecimal and ASCII. If we take a look at the smaller file, CCF19.DAT,
piping the output of hexdump to the "less" command, we are not able to see anything recognizable
or interesting, it seems just random bytes. If we look at the firmware file, CCF19103.FIR,
we can recognize some runs of consecutive zero bytes at the beginning of the file, we can
recognize the string "1.0.3", which is the firmware version, but from offset 0x000000bc
onward, it seems that we have only random bytes. The situation is the same till the end of
the file. We can now use the "binwalk" command to scan
the entire content of these two files to search for signatures of file systems, compressed
segments, digital certificates, and so on. But we can see that "binwalk" hasn't recognized
any signature inside either of the binary files. Binwalk is a very powerful utility; it also
has the very useful option "-E" to plot the entropy of a binary file. The entropy goes from "0" to "1" and it is
a sort of measure of the randomness of the byte sequence in the binary file. This means
that:
a totally random file will have entropy equal to 1; if we create a 1 MB random file reading from the random device generator, /dev/urandom, and then we use "binwalk -E", we can see that the entropy is always near 1;
a binary executable ELF file, which has some form of redundancy and different sections inside, has an entropy always below 1, as the example with the binary file /bin/ls shows;
a text file, like this long Wikipedia article, the "List of mountains of the British Isles by height", downloaded with wget, has a great deal of redundancy and its entropy is always well below 1;
a compressed file, for example the same Wikipedia article compressed with gzip, has an entropy always near 1, because the compression operation, to save space, removes the redundancies in the original file;
an encrypted file, like the same Wikipedia article encrypted with mcrypt, has an entropy always near 1, because the encryption is often preceded by compression and because the purpose of the encryption algorithm is to transform a file into something similar to an unintelligible random sequence of bytes.
Now that we understand better what binwalk
entropy means we can execute binwalk on the shorter file first and we can see that the
entropy is near 1 for almost the entire file, it decreases toward the end, but if we look
with "hexdump -C" at the end of the file we cannot spot anything meaningful. If we repeat the same thing on the firmware
file we can see that the entropy is always near 1 for the entire file. This means that the firmware file is encrypted,
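The kind of entropy value plotted by "binwalk -E" can be approximated with a few lines of Python. This is only a sketch under stated assumptions: it computes the Shannon entropy of the byte values, scaled so that 8 bits per byte equals 1, on fixed blocks of 1024 bytes. The function names and the block size are chosen just for this example, and binwalk's own implementation may differ in its details.

```python
import math
import os
from collections import Counter


def normalized_entropy(block: bytes) -> float:
    """Shannon entropy of the byte values in block, scaled to the range 0..1."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    # Sum p * log2(1/p) over the byte values that occur; the maximum is 8 bits.
    bits = sum((c / total) * math.log2(total / c) for c in counts.values())
    return bits / 8.0


def running_entropy(data: bytes, block_size: int = 1024):
    """Yield (offset, entropy) for each consecutive block of data."""
    for offset in range(0, len(data), block_size):
        yield offset, normalized_entropy(data[offset:offset + block_size])


# Small in-memory demonstration: redundant text versus random bytes.
text = b"List of mountains of the British Isles by height\n" * 200
print("text  ", round(normalized_entropy(text), 3))
print("random", round(normalized_entropy(os.urandom(len(text))), 3))
```

To look at a real file, read it in binary mode and pass its content to running_entropy(); a file where almost every block is close to 1 is either compressed or encrypted, and the entropy alone cannot tell the two cases apart.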
probably the smaller .DAT file has something to do with the encryption algorithm, but we
don't know exactly. Anyway, if we are not lucky and don't find
anything on the Internet on how to decrypt this firmware file, we have to find another way
to dump the firmware as explained in the previous episode of this series. In this case this means disassembling the
device, trying to identify components, UART and JTAG interfaces and so on, but this device
is a $700 camera, extremely compact and with a high probability of breaking something during
disassembly and re-assembly, so I will not try to do it! If some previous firmware version were not encrypted,
we could download the last unencrypted version, extract its file system and try to understand
and reverse engineer the self-upgrade procedure, which will surely contain the decryption
algorithm. Unfortunately this is not the case for our
Canon EOS M50 camera! There are no old, unencrypted firmware versions available. As a second example we have downloaded the firmware
file for a D-Link router (link in the description below), a DVA-5592 router, that is distributed
in Italy by Wind to its ADSL and Fiber customers. We have this firmware file ending in .sig. First of all we use the "file" command that
tells us that we have a binary file. Then we use the "strings" command with the
"-n 6" parameter to reduce the number of insignificant strings; we pipe the output of this command
to the "less" command and we can see a lot of meaningful ASCII strings; this means that
this file is not encrypted, or at least that significant portions of this file are not encrypted or
compressed. We take a look at this file with the "hexdump
-C" command and we can see some meaningful strings at the beginning of the file. We run the "binwalk -E" command on this file
and we can see that the entropy is often at 1, but not always; this probably means that
this file is not encrypted, but that there are segments of compressed data and, probably,
compressed file systems where the data itself is compressed, but the metadata present in
each block probably is not. If we run the command "binwalk" without the
"-E" option we can see that binwalk has identified, at offset 512 decimal, a JFFS2 file system;
it is a popular file system in embedded devices with a NAND flash EEPROM. JFFS2 stands for Journaling Flash File System
version 2. It has also identified a segment of gzip compressed
data. If we run "binwalk" with the "-e" option it
will extract the data it has identified inside the firmware binary file. To extract the file system it will use external
commands, that must be available; in this case it needs the Jefferson open source software
to extract the JFFS2 file system. Link in the description below. We can see that it has extracted two files
inside a directory named after the firmware file, prefixed with an underscore and suffixed
with a ".extracted" string. If we use the "file" command we can correctly
identify a tar file and see its content, it seems a file containing additional packages
and related checksums. Files ending in ".ipk" are package files,
similar to the .deb package files in a Debian Linux system. Ipk files are often used in embedded devices
with the "opkg" package manager. If we use the "file" command on the file 200.jffs2
we can see that it is correctly identified as a JFFS2 file system image with data in
Little Endian format. In a Little Endian file system, a multi-byte
word, like a 32-bit integer, is stored with the least significant byte first. We can see that under the folder jffs2-root,
binwalk has extracted the file system; it seems that we have 3 file systems: the first one, fs_1, seems to be a boot file system
containing a bootloader image, cferam.000, and a compressed Linux kernel; the second one, fs_2, seems to be the root file
system; the third one, fs_3, seems to contain pieces
of the root file system. I analyzed this firmware (you
will find a link to the detailed analysis below): the Jefferson program had a glitch
and erroneously split the root file system in two parts, fs_2 and fs_3, while they
both belong to the root file system. Anyway we have successfully extracted the
root file system from the firmware file downloaded from the supplier's website. It is now possible to analyze this file system. The third example is related to our Gemtek
sample router; in the previous episode we dumped the EEPROM image to a text file using
the "nand dump" command, available in the U-Boot bootloader, and then converted it back
to a binary file. Our binary file, which is the exact image of
the 128 MB EEPROM, is "eeprom.bin". We can use the "file" command on this file,
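As a rough illustration of how a file type can be recognized from its first bytes, here is a minimal Python sketch. It is not how "file" or "binwalk" are actually implemented: the real tools use large signature databases and extra consistency checks, while this example only knows four well-known magic numbers. The table and the function names are chosen just for this example.

```python
# A few well-known magic numbers; real tools know thousands of them.
MAGIC_NUMBERS = {
    b"\x27\x05\x19\x56": "U-Boot legacy image header",
    b"\x1f\x8b\x08": "gzip compressed data",
    b"\x7fELF": "ELF executable",
    b"hsqs": "squashfs file system, little endian",
}


def identify(data: bytes, offset: int = 0) -> str:
    """Describe the data starting at offset, looking only at its first bytes."""
    for magic, description in MAGIC_NUMBERS.items():
        if data.startswith(magic, offset):
            return description
    return "data"


def scan(data: bytes):
    """Return sorted (offset, description) pairs for every magic number found."""
    hits = []
    for magic, description in MAGIC_NUMBERS.items():
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, description))
            pos = data.find(magic, pos + 1)
    return sorted(hits)


# Synthetic image: a header magic at offset 0 and a squashfs magic at offset 1024.
image = b"\x27\x05\x19\x56" + bytes(1020) + b"hsqs" + bytes(100)
print(identify(image))
print(scan(image))
```

identify() only looks at one position, which is why a tool working this way reports just the first image in an EEPROM dump; scan() looks at every offset, but with signatures this short it can report false positives on real data.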
this command will read the first bytes of the file and will identify the U-Boot boot
loader because this is the first image located at the beginning of the file. If we use the "strings" command on this file
we find a lot of garbage, but also many embedded strings in the file system, this is quite
normal in an EEPROM image that usually is not encrypted, but can have one or more compressed
file systems inside. If we take a look at the file with the "hexdump
-C" command we can identify some strings at the beginning of the file. Using "binwalk -E" we can see the entropy
of this EEPROM file and we can see that we have two segments with entropy equal to 1,
probably these are related to the compressed kernel and the compressed squashfs root file system. Using "binwalk", with only the "-t" parameter
to pretty-print its output, we can see that binwalk identifies a lot of interesting information,
like a U-Boot image header, LZMA compressed data that is probably the kernel, a squashfs
file system and a UBI erase count header, but it hasn't clearly identified all the partitions
that we have inside the EEPROM. In the previous episode we looked at what
was printed on the serial console during the boot cycle, including the EEPROM partition
table, so we know exactly how the EEPROM is partitioned, and this is information unknown
to binwalk; for this reason it is much better to split the EEPROM with the "dd" command,
creating a file for each partition. For simplicity, and to use the "dd" command
more easily, we can rewrite the EEPROM partition table, which was printed by the operating
system during boot, in decimal and in Kbytes or, if you prefer, in decimal numbers of 1024
byte blocks. We can now use the powerful linux "dd" command
to extract the 9 partitions from the EEPROM file and store them in 9 different files,
executing the "dd" command for each partition. The "dd" command is very useful to extract
an arbitrary sequence of bytes from a file or to do the opposite, inserting an arbitrary sequence
of bytes into a binary file. In this case it takes as arguments: "if", the input file, in our case it
is always "eeprom.bin"; "of", the output file, in our case it
is the partition number followed by the name of the partition; "bs", the block size, that is the number
of bytes that will be read and written in a single system call by the "dd" command. This parameter can have the value that we
want, it can also be 1, but it has a huge impact on performance, on the time needed
by "dd" to execute the operation. If it is "1", "dd" will do a read system call
for each byte, and it will be painfully slow; it can take tens of minutes to split a file of a few
megabytes. In our case, as a good trade-off between simplicity
and efficiency, we choose 1024 bytes, that is one Kbyte. Other parameters in the "dd" command, like
"skip" or "count", are expressed in units of this block size; this means, for example,
that count equal to 1024 means 1024 blocks, which is equal to 1 Mbyte in this case; "skip" is the number of blocks to skip before
starting copying blocks from the input file to the output file. We can see that this value is zero for the
first partition, it is 1024 for the second partition, because we have to skip the previous
partition size, it is 2048 for the third partition, because we have to skip the previous partitions
and so on; "count" is the number of blocks to copy from
the input file to the output file; it is 1024 for the first, the second and the third partition;
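The meaning of "bs", "skip" and "count" can also be sketched in Python. This is an illustration, not the script shown in the video: the function names are chosen just for this example, the output file names are hypothetical, only the values quoted in this episode are listed, and the skip value of the fourth partition assumes that the partitions are contiguous.

```python
import os

BLOCK_SIZE = 1024  # same value as "bs=1024" in the dd commands


def dd_extract(src_path, dst_path, skip, count, bs=BLOCK_SIZE):
    """Copy count blocks of bs bytes from src_path to dst_path, after skipping skip blocks.

    Returns the number of bytes actually copied.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(skip * bs)
        for _ in range(count):
            block = src.read(bs)
            if not block:  # end of the input file reached
                break
            dst.write(block)
            copied += len(block)
    return copied


# (output file name, skip, count), all in 1024-byte blocks.
# File names are hypothetical; 3072 = 1024 + 1024 + 1024 assumes that the
# fourth partition starts right after the third one. The remaining
# partitions have to be filled in from the partition table in the boot log.
PARTITIONS = [
    ("01-bootloader.bin", 0, 1024),
    ("02-bootloader2.bin", 1024, 1024),
    ("03-config.bin", 2048, 1024),
    ("04-uboot-env.bin", 3072, 2560),
]


def split_image(image_path, out_dir, partitions=PARTITIONS):
    """Run dd_extract once per partition, returning {file name: bytes copied}."""
    sizes = {}
    for name, skip, count in partitions:
        sizes[name] = dd_extract(image_path, os.path.join(out_dir, name), skip, count)
    return sizes
```

A call like split_image("eeprom.bin", ".") would then write one file per listed partition in the current directory, the same result as running one "dd" command per partition.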
it is 2560 for the fourth partition and so on. After having executed the script with these
"dd" commands we now have a file for each EEPROM partition. If we use the "file" command
on these partitions we can see that it correctly identifies some of them: partition 1, the bootloader partition, is
identified as a U-Boot image, standalone program, not compressed, which means that it is the boot
loader itself; partitions 2, 3, 4 and 5 have not been recognized; partitions 6 and 7 have been recognized as
multi-file U-Boot images, multi-file because they contain the Linux kernel plus the squashfs
root file system; partitions 8 and 9 are UBIFS file system images. We can see, using the "diff" command or the
"sha1sum" command, that calculates a SHA1 checksum, that the two kernel partitions,
6 and 7, are identical. We can use the "hexdump -C" command to look
at the unrecognized images and we can see that:
partition 2, the bootloader2 partition, has all bytes set to FF; this means that it is an empty partition, and it will probably be used when this router is upgraded by the upgrade procedure;
partition 3, the configuration partition, has some apparently random bytes inside, probably related to the default router configuration;
partition 4, the U-Boot environment partition, has some environment strings inside; we can recognize the baud rate, the MAC address and so on;
partition 5, the U-Boot environment 2 partition, seems to have the same information as the previous partition, but the two files are not equal, as we can see with the diff command; we can compare these two binary files with the "binwalk" command with the option "-W", which compares the hex dumps of the files, and with the option "-i", which prints only the lines that are different; we can see that they differ by a single bit; probably this bit is used to select the active U-Boot environment partition.
We are mainly interested in the kernel partition
06 or 07, where the squashfs root file system is located. We could use "binwalk" on this
partition file, and it correctly identifies: the U-Boot header; the LZMA compressed data, that is the Linux
kernel; and the squashfs file system, xz compressed, that
is the root file system. We could use binwalk to successfully extract
the squashfs root file system but, again, we can find the layout of the U-Boot header
on the Internet, then use "dd" to extract more exactly the U-Boot header, the compressed
kernel, and the squashfs file system image. If we look at the U-Boot layout, found in
related documentation on the Internet, and, at the same time, at the first bytes of the hex
dump of the kernel partition, and at the boot log that contains what the boot loader printed
during boot, we can see that the U-Boot header has:
the magic number to identify itself;
the header CRC checksum;
the creation time stamp;
the image data size; the hex size in the kernel partition hex dump is exactly the same as the decimal size in the boot.log file;
the load address;
the entry point address;
the CRC for the image data checksum;
the operating system, Linux in this case;
the CPU architecture, MIPS in this case;
the image type, a multi-file image in this case;
the compression type, lzma in this case;
the image name, =01.01.02.90 in our case;
then it has 8 bytes for the length of the first image, that is the kernel; the length is specified in the first 4 bytes of these 8 bytes; in some U-Boot documentation the length is a 32-bit integer, but in our case we also have an additional 32 bits, or 4 bytes, of zero;
then we have 8 bytes for the length of the second image (the squashfs file system image);
8 bytes of zero, to terminate the list;
and then the first image, followed by the second image.
It is not difficult to extract the image lengths
from the U-Boot header and then, using "dd", extract the kernel and the squashfs file system
image. We can use the script shown on the screen. We can use "hexdump" to extract the image
lengths: "-s 64" means start at offset 64, where the
first image length starts, and print 4 bytes, as specified by the "-n" argument; the "-e" option specifies the format string; the output of hexdump is given to "bc" to
make a hexadecimal to decimal conversion. First we extract the first 64 bytes from the
kernel partition; this is the U-Boot header itself. Then we extract the 24 bytes that include
the two 8-byte image lengths, plus the 8 bytes of zero as terminating value. Then we extract the compressed kernel. Then we extract the squashfs file system. We execute this script and we can see that
the images we have extracted have exactly the same length, as the length of the images
written by the boot loader in the boot log file, and this confirms that we have correctly
extracted the two images. We can now, finally, extract the squashfs
root file system with the unsquashfs command; we execute this command, as a regular non-privileged
user, inside the fakeroot environment. Fakeroot replaces file manipulation functions,
simulating the effect the real library functions would have had, if the user was really root;
basically it fakes a root environment; this is useful in our case to avoid errors
when unsquashfs tries to create device files or to change file
ownership; both of these operations require root privileges. The "-s" (save) option saves the fakeroot
environment so that we can restore this environment with the "-i" option; for example we can
run a shell inside fakeroot and explore the extracted file system as if it was extracted
by root, and as if we were root. The "-d" option specifies the folder where to
extract the root file system. For example if we look at the extracted file
system as a normal user we can see that all the files are owned by the user, and that
files under the /dev folder are normal files and not device files. If, instead, we execute bash in a fakeroot
environment, with the "-i" option reading the previously saved file, and we explore
the file system, we appear to be root, the files seem to have the correct ownership
and files in the /dev folder seem to be device files and not normal files. "fakeroot" will be particularly useful when
we modify the root file system and recreate the squashfs root file system image. Without fakeroot we would need to be root
to create device files and to change file ownership; with fakeroot we will be able to
create a valid squashfs root file system image as a normal user. We have successfully extracted the root file
system from our EEPROM image; in the next episode we will start analyzing the file system
with the purpose of hacking the device, logging in as root, and identifying interesting
binaries to reverse engineer in an emulation environment. If you have found this video interesting please
subscribe, help this channel grow, share this video with friends interested in hardware
hacking. Please click the subscribe button and the
notification bell to be notified when new episodes are released. And don't forget to click the thumbs up icon! Please let me know, in the comments below,
if you have found this episode easy to follow or if you have found that my bad English is
a big obstacle in understanding the content. Please give me feedback by writing comments
below; let me know if you have suggestions to improve this channel, or whether you have enjoyed
this video or not. Every comment, both positive and negative,
is welcome! Thank you for watching, see you again on this
channel.