#05 - How To Get The Root File System - Hardware Hacking Tutorial

Captions
If you have downloaded the firmware file for your device from the supplier's website, or if you have dumped the EEPROM from your device, and you want to extract the root file system and other information, this is the video for you! In this episode I will talk about the available options for understanding where the root file system is located in the firmware image, and about the tools to use to extract it in order to analyze it.
I am Valerio Di Giampietro, an electronic engineer with a background in digital electronics and in information technology infrastructures. I would like to be your friendly Italian hacker neighbor, willing to share with you the tools and techniques for hardware hacking that I learned by myself hacking many devices. And now let's start!
This is the fifth episode of the series "Hardware Hacking Tutorial". In this complete series we talk about the hacking process, which is based on: gathering information from our device; building an emulation environment in which to run interesting binaries; discovering how the device works; and then hacking the device and modifying its firmware. This episode is about extracting the root file system, which is the last step in the information gathering phase.
In this episode we will use 3 different types of firmware file. The first is an encrypted firmware update file for a digital camera, downloaded from the supplier's website; I will not succeed in extracting the root file system, but we will learn something useful anyway. The second is a firmware upgrade for a home router, downloaded from the supplier's website; we will successfully extract the file system, with some minor issues. The last is the EEPROM dump that we took from the sample Gemtek router in the previous episode.
We will do everything on our Linux box using some simple tools:
The "file" command, which gives very basic information about any type of file.
The "strings" command, which prints the strings embedded in a binary file.
The "hexdump" command, which prints the hex dump of a file, including the ASCII equivalent of each byte.
The "binwalk" software, which scans a binary file searching for the signatures of many different file system images, compressed data segments, digital certificates and many other types of information embedded in a single binary file. It can also show the running entropy of a file, which lets us understand whether there is an encrypted or compressed segment inside the binary file.
The "dd" command, which can dissect a file, easily extracting part of it, or reassemble a file by putting different parts together.
We have downloaded from the Canon website the firmware update for my Canon EOS M50 camera, the one that I am using now to record this video (link to the download page in the description below). We get 2 files, plus the upgrade instruction manual: one file, CCF19.DAT, is only about 4 KB; the other file, CCF19103.FIR, is about 31 MB and is the firmware file.
If we use the "file" command on these files, it tells us that we have data files. The "file" command scans the first bytes of each file trying to understand its format, so it tells us whether the file is an executable, a shell script, a text file, a compressed file, a video or image file and so on. If it doesn't find anything it knows, it simply says, as in this case, that we have a data file.
We can use the "strings" command on each of these files. The "strings" command searches for, and prints, every sequence of at least n consecutive printable ASCII characters. By default n is 4, which produces a lot of random garbage on output; one useful argument is "-n", which raises this default value. If we increase the minimum string length to 6 we get fewer garbage lines, but we can see that nothing useful is printed for either file.
We can look at these files with the "hexdump" command.
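The behaviour of "strings" described above (print every run of at least n printable ASCII characters) can be sketched in a few lines of Python. This is only an illustration, not the real implementation: the function name `extract_strings` is made up for this example, and it treats only the bytes from 0x20 to 0x7e as printable, while the real tool also accepts a few other characters such as tab.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return every run of at least `min_len` printable ASCII bytes.

    Rough equivalent of `strings -n min_len`; "printable" here means the
    bytes 0x20 (space) to 0x7e (~), which is a simplification.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# A tiny made-up sample: binary noise with two readable strings inside.
sample = b"\x00\x01abc\x00hello!\xff\xfe1.0.3\x00"
default_run = extract_strings(sample)           # minimum length 4
stricter_run = extract_strings(sample, min_len=6)
```

With the default minimum of 4 the short run "abc" is discarded; raising the minimum to 6 also discards "1.0.3", which shows why a larger "-n" value removes garbage but can hide short, meaningful strings too.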
Hexdump can dump a file in ASCII, decimal, octal or hexadecimal; the most useful option is "-C", which dumps in both hexadecimal and ASCII. If we take a look at the smaller file, CCF19.DAT, piping the output of hexdump to the "less" command, we are not able to see anything recognizable or interesting; it seems to be just random bytes. If we look at the firmware file, CCF19103.FIR, we can recognize some runs of consecutive zeros at the beginning of the file and the string "1.0.3", which is the firmware version, but from offset 000000bc (hexadecimal) onward it seems that we have only random bytes. The situation is the same until the end of the file.
We can now use the "binwalk" command to scan the entire content of these two files, searching for signatures of file systems, compressed segments, digital certificates, and so on. But we can see that "binwalk" hasn't recognized any signature inside either of the binary files.
Binwalk is a very powerful utility; it also has the very useful option "-E" to plot the entropy of a binary file.
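To make the "hexadecimal plus ASCII" view concrete, here is a minimal Python sketch that produces output in the spirit of "hexdump -C". It is a simplification and not a replacement for the real tool: the name `hexdump_c` is invented for this example, the extra gap that hexdump prints after the eighth byte is omitted, and repeated lines are not collapsed into "*".

```python
def hexdump_c(data: bytes, width: int = 16) -> str:
    """Format `data` as offset, hex bytes and ASCII column, one row per `width` bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Non-printable bytes are shown as "." in the ASCII column.
        ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on a short last row.
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  |{ascii_part}|")
    return "\n".join(lines)


# Made-up bytes: a version string surrounded by non-printable values.
dump = hexdump_c(b"1.0.3\x00\xff")
```

The leading column is the offset in hexadecimal, which is how a position such as 000000bc is read off the real dump.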
The entropy goes from 0 to 1 and is a measure of the randomness of the byte sequence in the binary file. This means that:
a totally random file has an entropy equal to 1; if we create a 1 MB random file by reading from the random number generator device, /dev/urandom, and then run "binwalk -E" on it, we can see that the entropy is always near 1;
a binary executable ELF file, which has some redundancy and different sections inside, has an entropy always below 1, as the example with the binary file /bin/ls shows;
a text file, like this long Wikipedia article, the "List of mountains of the British Isles by height", downloaded with wget, has a great deal of redundancy and its entropy is always well below 1;
a compressed file, for example the same Wikipedia article compressed with gzip, has an entropy always near 1, because compression, in order to save space, removes the redundancy of the original file;
an encrypted file, like the same Wikipedia article encrypted with mcrypt, has an entropy always near 1, because encryption is often preceded by compression and because the purpose of an encryption algorithm is to transform a file into something resembling an unintelligible random sequence of bytes.
Now that we understand better what binwalk's entropy means, we can run "binwalk -E" on the shorter file first, and we can see that the entropy is near 1 for almost the entire file; it decreases toward the end, but if we look at the end of the file with "hexdump -C" we cannot spot anything meaningful. If we repeat the same thing on the firmware file we can see that the entropy is always near 1 for the entire file. Since binwalk did not find the signature of any compressed segment either, this means that the firmware file is almost certainly encrypted. The smaller .DAT file probably has something to do with the encryption algorithm, but we don't know exactly what.
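The entropy cases listed above can be reproduced with a short Python sketch. It assumes that the 0 to 1 scale is the Shannon entropy of the byte values, measured in bits per byte and divided by 8; the function names are invented for this example, and the block size used by binwalk itself may differ from the 1024 bytes chosen here.

```python
import math
import os
from collections import Counter


def normalized_entropy(data: bytes) -> float:
    """Shannon entropy of the byte values in `data`, scaled to the range 0..1."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    bits_per_byte = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return bits_per_byte / 8.0  # 8 bits per byte is the maximum possible


def running_entropy(data: bytes, block_size: int = 1024) -> list[float]:
    """Entropy of each consecutive block, similar in spirit to an entropy plot."""
    return [
        normalized_entropy(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


random_like = normalized_entropy(os.urandom(1 << 20))             # random bytes
text_like = normalized_entropy(b"the quick brown fox " * 100)     # redundant text
constant = normalized_entropy(b"\xff" * 4096)                     # e.g. erased flash
```

Random data lands very close to 1, repetitive text well below it, and a run of identical bytes at 0. Note that very small blocks of random data score a little below 1 simply because there are too few bytes to see every value equally often.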
Anyway, if we are not lucky and don't find anything on the Internet on how to decrypt this firmware file, we have to find another way to dump the firmware, as explained in the previous episode of this series. In this case this means disassembling the device and trying to identify components, UART and JTAG interfaces and so on, but this device is a 700-dollar camera, extremely compact, and with a high probability of breaking something during disassembly and reassembly, so I will not try to do it!
If some previous firmware version were not encrypted, we could download the last unencrypted version, extract its file system, and try to understand and reverse engineer the self-upgrade procedure, which must contain the decryption algorithm. Unfortunately this is not the case for our Canon EOS M50 camera! There are no old, unencrypted firmware versions available.
As a second example we have downloaded the firmware file for a D-Link router (link in the description below), a DVA-5592 router, which is distributed in Italy by Wind to its ADSL and fiber customers. We have this firmware file ending in .sig. First of all we use the "file" command, which tells us that we have a binary file. Then we use the "strings" command with the "-n 6" parameter to reduce the number of insignificant strings; we pipe the output of this command to the "less" command and we can see a lot of meaningful ASCII strings. This means that this file is not encrypted, or at least that significant portions of it are neither encrypted nor compressed. We take a look at this file with the "hexdump -C" command and we can see some meaningful strings at the beginning of the file. We run the "binwalk -E" command on this file and we can see that the entropy is often at 1, but not always. This probably means that this file is not encrypted, but that there are segments of compressed data and, probably, compressed file systems where the data itself is compressed but the metadata present in each block is not.
If we run the command "binwalk" without the "-E" option we can see that binwalk has identified, at decimal offset 512, a JFFS2 file system; it is a popular file system in embedded devices with a NAND flash EEPROM. JFFS2 stands for Journalling Flash File System version 2. It has also identified a segment of gzip compressed data.
If we run "binwalk" with the "-e" option it will extract the data it has identified inside the firmware binary file. To extract the file system it uses external commands, which must be available; in this case it needs the open source Jefferson software to extract the JFFS2 file system (link in the description below). We can see that it has extracted two files inside a directory named after the firmware file, prefixed with an underscore and suffixed with ".extracted".
If we use the "file" command on them, one is correctly identified as a tar file, and if we look at its content it seems to contain additional packages and the related checksums. Files ending in ".ipk" are package files, similar to the .deb package files of a Debian Linux system. Ipk files are often used in embedded devices with the "opkg" package manager.
If we use the "file" command on the file 200.jffs2 we can see that it is correctly identified as a JFFS2 file system image with data in little-endian format. In a little-endian file system, a multi-byte word, like a 32-bit integer, is stored with the least significant byte first.
We can see that binwalk has extracted the file system under the folder jffs2-root, and it seems that we have 3 file systems: the first one, fs_1, seems to be a boot file system containing a bootloader image, cferam.000, and a compressed Linux kernel; the second one, fs_2, seems to be the root file system; the third one, fs_3, seems to contain pieces of the root file system.
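The little-endian ordering mentioned above can be seen directly with Python's `struct` module. The second half of the sketch guesses the byte order of a JFFS2 image from its first node; the magic value 0x1985 comes from my own knowledge of the JFFS2 on-flash format, not from this episode, and the helper name `jffs2_byte_order` is invented, so treat it as an assumption to check against your own image.

```python
import struct

# The same 32-bit integer stored in the two byte orders.
value = 0x12345678
little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first

JFFS2_MAGIC = 0x1985  # assumed magic of a JFFS2 node header (not stated in the episode)


def jffs2_byte_order(image: bytes, offset: int = 0):
    """Return "little", "big" or None depending on how the magic is stored at `offset`."""
    (as_little,) = struct.unpack_from("<H", image, offset)
    (as_big,) = struct.unpack_from(">H", image, offset)
    if as_little == JFFS2_MAGIC:
        return "little"
    if as_big == JFFS2_MAGIC:
        return "big"
    return None
```

For a file like the one in the episode, the offset argument would be 512, the position where binwalk reported the JFFS2 signature.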
I analyzed this firmware (you will find a link to the detailed analysis below), and in this case it is the Jefferson program that had a glitch and erroneously split the root file system into two parts, fs_2 and fs_3; they both belong to the root file system. Anyway, we have successfully extracted the root file system from the firmware file downloaded from the supplier's website. It is now possible to analyze this file system.
The third example is our Gemtek sample router; in the previous episode we dumped the EEPROM image to a text file using the "nand dump" command, available in the U-Boot bootloader, and then converted it back to a binary file. Our binary file, which is the exact image of the 128 MB EEPROM, is "eeprom.bin". We can use the "file" command on this file; this command reads the first bytes of the file and identifies the U-Boot boot loader, because this is the first image, located at the beginning of the file. If we use the "strings" command on this file we find a lot of garbage, but also many strings embedded in the file system; this is quite normal in an EEPROM image, which is usually not encrypted but can have one or more compressed file systems inside. If we take a look at the file with the "hexdump -C" command we can identify some strings at the beginning of the file. Using "binwalk -E" we can see the entropy of this EEPROM file, and we can see that we have two segments with entropy equal to 1; these are probably the compressed kernel and the compressed squashfs root file system.
Using "binwalk" with only the "-t" parameter, to pretty-print its output, we can see that binwalk identifies a lot of interesting information, like a U-Boot image header, LZMA compressed data that is probably the kernel, a squashfs file system and a UBI erase count header, but it hasn't clearly identified all the partitions that we have inside the EEPROM.
In the previous episode we looked at what was printed on the serial console during the boot cycle, including the EEPROM partition table, so we know exactly how the EEPROM is partitioned. This information is unknown to binwalk, so it is much better to split the EEPROM with the "dd" command, creating a file for each partition. For simplicity, and to use the "dd" command more easily, we can rewrite the EEPROM partition table, which was printed by the operating system during boot, in decimal and in KB or, if you prefer, in decimal numbers of 1024-byte blocks.
We can now use the powerful Linux "dd" command to extract the 9 partitions from the EEPROM file and store them in 9 different files, executing the "dd" command once for each partition. The "dd" command is very useful to extract an arbitrary sequence of bytes from a file or to do the opposite, inserting an arbitrary sequence of bytes into a binary file. In this case it takes these arguments:
"if" is the input file; in our case it is always "eeprom.bin";
"of" is the output file; in our case it is the partition number followed by the name of the partition;
"bs" is the block size, the number of bytes read and written in a single system call by the "dd" command. This parameter can have any value we want, even 1, but it has a huge impact on performance, that is, on the time "dd" needs to complete the operation. If it is 1, "dd" does a read system call for each byte and is painfully slow; it can take tens of minutes to split a file of a few megabytes. In our case, as a good trade-off between simplicity and efficiency, we choose 1024 bytes, which is one KB.
The other parameters of the "dd" command, like "skip" and "count", are expressed as a number of blocks of this size; this means, for example, that a count equal to 1024 means 1024 blocks, which in this case is equal to 1 MB;
"skip" is the number of blocks to skip before starting to copy blocks from the input file to the output file. We can see that this value is zero for the first partition; it is 1024 for the second partition, because we have to skip the size of the previous partition; it is 2048 for the third partition, because we have to skip the previous two partitions; and so on;
"count" is the number of blocks to copy from the input file to the output file; it is 1024 for the first, the second and the third partition, 2560 for the fourth partition, and so on.
After executing the script with these "dd" commands we have a file for each EEPROM partition. If we use the "file" command on these files we can see that it correctly identifies some partitions: partition 1, the bootloader partition, is identified as a U-Boot image, standalone program, not compressed, which means that it is the boot loader itself; partitions 2, 3, 4 and 5 have not been recognized; partitions 6 and 7 have been recognized as multi-file U-Boot images, multi-file because they contain the Linux kernel plus the squashfs root file system; partitions 8 and 9 are UBIFS file system images.
We can see, using the "diff" command or the "sha1sum" command, which calculates a SHA1 checksum, that the two kernel partitions, 6 and 7, are identical.
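The way "bs", "skip" and "count" work together can be sketched in Python. This is an illustration of the arithmetic, not the script shown in the video: the function names and the output file names are invented, and only the first four partition sizes are listed because those are the only values quoted in this episode; the remaining ones must be taken from the boot log.

```python
from pathlib import Path

BLOCK_SIZE = 1024  # bytes per block, the "bs" value chosen in the episode


def carve(src, dst, skip, count, bs=BLOCK_SIZE):
    """Copy `count` blocks of `bs` bytes from `src` to `dst`, skipping `skip` blocks.

    Mirrors: dd if=src of=dst bs=bs skip=skip count=count
    Returns the number of bytes actually written.
    """
    written = 0
    remaining = count * bs
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fin.seek(skip * bs)
        while remaining > 0:
            chunk = fin.read(min(remaining, 1 << 20))  # up to 1 MiB per read
            if not chunk:  # the input ended before `count` blocks were read
                break
            fout.write(chunk)
            written += len(chunk)
            remaining -= len(chunk)
    return written


def split_image(image, partitions, out_dir, bs=BLOCK_SIZE):
    """Carve consecutive partitions; each skip is the sum of the previous sizes."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    skip = 0
    results = []
    for number, (name, count) in enumerate(partitions, start=1):
        dst = out_dir / f"{number:02d}-{name}.bin"
        carve(image, dst, skip, count, bs)
        results.append((dst, skip, count))
        skip += count
    return results


# Sizes in 1024-byte blocks for the first four partitions, as quoted in the episode.
# The short names are placeholders chosen for this sketch.
FIRST_PARTITIONS = [
    ("bootloader", 1024),
    ("bootloader2", 1024),
    ("config", 1024),
    ("uboot-env", 2560),
]
```

Calling `split_image("eeprom.bin", FIRST_PARTITIONS, "partitions")` would produce skip values of 0, 1024, 2048 and 3072, matching the values described above. Unlike "dd" with "bs=1", the read size here is independent of the block size, so the choice of unit does not affect speed.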
We can use the "hexdump -C" command to look at the unrecognized images and we can see that:
partition 2, the bootloader2 partition, has all bytes set to FF; this means that it is an empty partition, which will probably be used when the router is upgraded by the upgrade procedure;
partition 3, the configuration partition, has some apparently random bytes inside, which are probably related to the default router configuration;
partition 4, the U-Boot environment partition, has some environment strings inside; we can recognize the baud rate, the MAC address and so on;
partition 5, the U-Boot environment 2 partition, seems to have the same information as the previous partition, but the two files are not equal, as we can see with the "diff" command. We can compare these two binary files with the "binwalk" command using the option "-W", which compares the hex dumps of the files, and the option "-i", which prints only the lines that are different; we can see that they differ by a single bit. This bit is probably used to select the active U-Boot environment partition.
We are mainly interested in the kernel partition, 06 or 07, where the squashfs root file system is located. We could use "binwalk" on this partition file, and it correctly identifies: the U-Boot header; the LZMA compressed data, which is the Linux kernel; and the squashfs file system, xz compressed, which is the root file system. We could use binwalk to successfully extract the squashfs root file system but, again, we can find the layout of the U-Boot header on the Internet and then use "dd" to extract more exactly the U-Boot header, the compressed kernel, and the squashfs file system image.
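The comparison of the two U-Boot environment partitions described above can also be done with a few lines of Python, which makes it easy to count exactly how many bits differ. This is a sketch with invented function names; it assumes both files have the same length, which is the case for two partitions of equal size.

```python
def differing_bytes(a: bytes, b: bytes):
    """Return (offset, byte_a, byte_b) for every position where the inputs differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def count_bit_flips(a: bytes, b: bytes) -> int:
    """Number of individual bits that differ between the two inputs."""
    return sum(bin(x ^ y).count("1") for _, x, y in differing_bytes(a, b))


# Made-up data: two copies of an environment block that differ by one bit.
env_a = b"baudrate=115200\x00" + b"\x00" * 16
env_b = bytearray(env_a)
env_b[20] ^= 0x01  # flip the lowest bit of one byte
env_b = bytes(env_b)
```

On real partition files the inputs would come from reading the two carved files in binary mode; a result of one differing byte and one flipped bit corresponds to the single-bit difference seen with "binwalk -W -i".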
If we look at the U-Boot header layout, found in the related documentation on the Internet, and at the same time at the first bytes of the hex dump of the kernel partition and at the boot log that contains what the boot loader printed during boot, we can see that the U-Boot header has:
the magic number that identifies it;
the CRC checksum of the header;
the creation time stamp;
the image data size; the hexadecimal size in the kernel partition hex dump is exactly the same as the decimal size in the boot.log file;
the load address;
the entry point address;
the CRC checksum of the image data;
the operating system, Linux in this case;
the CPU architecture, MIPS in this case;
the image type, a multi-file image in this case;
the compression type, lzma in this case;
the image name, =01.01.02.90 in our case;
then 8 bytes for the length of the first image, which is the kernel; the length is specified in the first 4 of these 8 bytes. In some U-Boot documentation the length is a 32-bit integer, but in our case we also have an additional 32 bits, or 4 bytes, of zero;
then 8 bytes for the length of the second image (the squashfs file system image);
then 8 bytes of zero, to terminate the list;
and then the first image, followed by the second image.
It is not difficult to extract the image lengths from the U-Boot header and then, using "dd", extract the kernel and the squashfs file system image. We can use the script shown on the screen. We can use "hexdump" to extract the image lengths: "-s 64" means start at offset 64, where the first image length starts, and "-n" specifies that 4 bytes must be printed; the "-e" option specifies the format string; the output of hexdump is given to "bc" to convert it from hexadecimal to decimal. First we extract the first 64 bytes of the kernel partition, which are the U-Boot header itself. Then we extract the 24 bytes that include the two 8-byte image lengths plus the 8 bytes of zero used as the terminating value. Then we extract the compressed kernel.
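The header fields listed above map naturally onto a fixed 64-byte structure, so they can be decoded with Python's `struct` module instead of "hexdump" and "bc". The sketch below is based on my understanding of the legacy U-Boot image header (big-endian fields, magic number 0x27051956, header CRC computed with its own field set to zero); none of those details are stated in this episode, so verify them against your own dump. The function names are invented, and the 8-byte length entries follow what was observed on this router, while standard documentation describes 4-byte entries.

```python
import struct
import zlib

UIMAGE_MAGIC = 0x27051956          # assumed legacy U-Boot image magic
HEADER = struct.Struct(">7I4B32s")  # 7 x 32-bit, 4 x 8-bit, 32-byte name = 64 bytes


def parse_uimage_header(blob: bytes) -> dict:
    """Decode the 64-byte header at the start of `blob`."""
    (magic, header_crc, timestamp, data_size, load_addr, entry_point,
     data_crc, os_id, arch_id, type_id, comp_id, raw_name) = HEADER.unpack_from(blob, 0)
    if magic != UIMAGE_MAGIC:
        raise ValueError(f"unexpected magic 0x{magic:08x}")
    # The header CRC is checked over the header with the CRC field itself zeroed.
    zeroed = blob[:4] + b"\x00" * 4 + blob[8:HEADER.size]
    return {
        "timestamp": timestamp,
        "data_size": data_size,
        "load_addr": load_addr,
        "entry_point": entry_point,
        "data_crc": data_crc,
        "os": os_id, "arch": arch_id, "type": type_id, "compression": comp_id,
        "name": raw_name.split(b"\x00", 1)[0].decode("ascii", "replace"),
        "header_crc_ok": (zlib.crc32(zeroed) & 0xFFFFFFFF) == header_crc,
    }


def multi_image_lengths(blob: bytes, entry_size: int = 8):
    """Read the list of image lengths that follows the header of a multi-file image.

    Each entry holds the length in its first 4 bytes; a zero length ends the list.
    Returns (lengths, offset_of_first_image).
    """
    lengths = []
    offset = HEADER.size
    while True:
        (length,) = struct.unpack_from(">I", blob, offset)
        if length == 0:
            break
        lengths.append(length)
        offset += entry_size
    return lengths, offset + entry_size  # skip the terminating zero entry
```

With two images and 8-byte entries the first image starts at offset 64 + 24 = 88, which matches the 64-byte header plus the 24 bytes of lengths described above. The sketch does not handle any padding between the first and the second image, so compare the carved sizes with the boot log before trusting the result.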
Then we extract the squashfs file system. We execute this script and we can see that the images we have extracted have exactly the same lengths as the images reported by the boot loader in the boot log file, and this confirms that we have correctly extracted the two images.
We can now, finally, extract the squashfs root file system with the "unsquashfs" command; we execute this command, as a regular unprivileged user, inside the fakeroot environment. Fakeroot replaces the file manipulation functions, simulating the effect the real library functions would have had if the user were really root; basically it fakes a root environment. This is useful in our case to avoid errors when unsquashfs tries to create device files or to change file ownership; both of these operations require root privileges. The "-s" (save) option of fakeroot saves the fakeroot environment so that we can restore it later with the "-i" option; for example, we can run a shell inside fakeroot and explore the extracted file system as if it had been extracted by root, and as if we were root. The "-d" option of unsquashfs specifies the folder where the root file system is extracted.
For example, if we look at the extracted file system as a normal user we can see that all the files are owned by our user, and that the files under the /dev folder are normal files and not device files. If, instead, we execute bash in a fakeroot environment, with the "-i" option reading the previously saved file, and we explore the file system, we appear to be root, the files seem to have the correct ownership, and the files in the /dev folder seem to be device files and not normal files.
"fakeroot" will be particularly useful when we modify the root file system and recreate the squashfs root file system image. Without fakeroot we would need to be root to create device files and to change file ownership; with fakeroot we will be able to create a valid squashfs root file system image as a normal user.
We have successfully extracted the root file system from our EEPROM image. In the next episode we will start analyzing the file system in order to hack the device, to log in as root on the device, and to identify interesting binaries to reverse engineer in an emulation environment.
If you have found this video interesting please subscribe, help this channel grow, and share this video with friends interested in hardware hacking. Please click the subscribe button and the notification bell to be notified when new episodes are released. And don't forget to click the thumbs up icon! Please let me know, in the comments below, whether you have found this episode easy to follow or whether my bad English is a big obstacle to understanding the content. Please give me feedback by writing comments below; let me know if you have suggestions to improve this channel, and whether or not you enjoyed this video. Every comment, both positive and negative, is welcome! Thank you for watching, see you again on this channel.
Info
Channel: Make Me Hack
Views: 7,872
Rating: 4.9795399 out of 5
Keywords: Firmware Hacking, Router Firmware Hacking, Binwalk, Router File System, Firmware Analysis, How To Get Firmware, How To Get The Root File System, Extracting Root File System, Embedded File System, Embedded Root File System, Root File System, Hacking U-Boot, Extracting U-Boot images, Hardware Hacking Tutorial, Hardware Hacking, How To Do Hardware Hacking, Reverse Engineering, Practical Hacking, Hacking Tutorial, Hacking for beginners, Router Hacking, Gemtek, Firmware, Linkem
Id: -AYmTMILsM8
Length: 33min 20sec (2000 seconds)
Published: Sat Apr 18 2020