Reverse Engineered old Compression Algorithm for Frogger

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
one thing I really love about the Internet is that there are communities for anything you can imagine and I'm so fascinated by people completely nerding out over something weird for example the 1997 video game Frogger there are people who still spent their time in 2020 on this game and I loved it so today we are going to explore a weird data compression algorithm used within this game [Music] before we go very technical let's just learn more about Frogger nice nap the master hopper of the highway frogs forum contacted me quite a while ago and asked if I would be interested in making a video about his work on a tool called frog Lord and Frogger in general like I said I find the stuff always very fascinating so we jumped on a call and he showed me stuff so this is Frogger as you can see a very old game you are playing as a frog and you have to get through the level without dying hopping around and evading enemies and obstacles quite a trivial game with typical old-school 3d graphics nice nap has quite a lot of nostalgia for this game as you can see the camera is very close to the Frog and doesn't reveal much about the whole level he told me that when he was a kid the game always felt mysterious these levels seemed so large but it was difficult to explore the maps because there was a very strict timer I can totally relate to the sense of wonder from the games of our childhood in fact I will have some videos coming out about a game from my childhood soon anyway so nice nap got motivated to work on this tool called Frog Lord it's a modding tool it allows you to open the game files and explore them and even make modifications this is how the Frog of photo structure looks like the exe obviously has the code but the important file containing the assets of the game is this MWD file they don't even know what exactly the name of this file type is they assume it's called medieval word or millennium WOD nice nap explained to me that this is a very old file format it was designed for CDs CDs are split up in multiple sectors and so this file format makes sure to align different parts of it to the sectoral boundaries that's why there's a lot of empty data areas until the next section begins this makes reading from the CD more efficient so in there must be all the different assets used by the game and so this file contains multiple files and it's quite easy to know where the single files start and end because they start at those boundaries and have all this padding to the next section but these embedded files have again their own format at the start of a file you typically find some information about the file type this is referred to as the file magic and so this file for example has a signature pp20 followed by a lot of gibberish gibberish data can hint at two things it could be encrypted or it could be compressed so we have here an indication that this is a compressed file and actually when nice snap started with frog tool there existed already a tool that could decompress this data the tool is called quick BMS by a Luigi quick BMS supports tons of games and file formats archives encryptions compressions obfuscation and other algorithms it's an advanced tool for reverse engineers in case you wonder how could one figure this compression in the first place well there are a few options maybe you just know it because you worked in that kind of industry maybe you'll find stuff documented online about pp20 or you sit down and do with a very laborious work of debugging and reverse engineering the game itself when it loads this data the compression algorithm must be included in the Frogger game right so this is already cool there were tools to decompress the data SS from fragra and you could explore them now in the first you get after decompressing it might also be another custom file format and you need to figure that one out but you're one step closer so let's learn a bit more about this file compression nice nap explained some basics to me so PP 20 is the magic that you can check and be sure yep this is my compressed file and then the following data contains some metadata and all the way at the end of the file three of the last four bytes is the uncompressed file size so you can already allocate the size needed for the decompressed file and this byte the hex 1f means you need to skip 31 bits backwards the whole decompression works kind of from the end of the file you see very weird stuff however that's only the compression what if you would want to modify some assets and then compress it again so based on the decompression algorithm you could reverse engineer what pression algorithm does but it turns out that actually this compression algorithm seemed to have been used in the amiga community this compression algorithm is called power Pecha 2.0 pp20 it's made by some guy in like 1989 and nice nap actually fought the original compression tool but he has no clue how to run it it's a compiled library for the amiga so it's kind of interesting that such an old compression algorithm remember it came out in 1989 for the amiga was supported for a few years now shows up in 1997 in the Frogger game so it's speculation but it could be that the Frogger developer used to work with Amiga and knew it from there I've had this fascinating but anyway in general this compression algorithm seems to be based on El Zetas s Lempel Ziv store Szymanski el CSS is a lossless data compression algorithm and wikipedia roughly shows how it works let's say you want to compress this text you can see it has some repeating patterns and those you can reference so in the compressed text you encode offset in length instead of having the actual text itself so when you decompress this vowel you read in this data reach this reference and it says please go to the fifth character and take three bytes and then please go to the start and take four bytes this is very simple but has an interesting property a file without any references is still a valid compressed file and the decompressor could read it it would just not do anything it also means the work and efficiency of the compression relies on the compress algorithm because depending on what it chooses to reference or not choose it does good compression or bad compression it could be very fast or very slow so if you are very lazy you don't really have to do much to compress a file just keep it uncompressed and rapid in the metadata information from pp20 but obviously there is less cool then developing an actual compression algorithm also you might need it to fit the data into the smaller CD format so let's check out how the decompression works in a bit more detail I think it's a neat introduction to how data compression works I think with that you can develop some intuition for it first I'm using IntelliJ to clone the Frog to a repository and follow the setup steps also I had to install the Java 1.8 SDK and installed the Lombok plugin then I was able to build the project but we are only interested in the PPA 20 pecker and unpacker so we can create a new minimal main here's a simple hello world and in the build configuration I create my own one that launches our new main testing running it and it prints hello world grid now we can simply access the Frog tool classes pp20 pecker and unpacker I also added a basic hex stamp and binary dump so we can look at the actual data and this works now we can compress a string with Peck data and decompress it with unpacked data next I'm adding some debug output in various places of the pp20 unpacker class this way we can easily follow what happens during the compression let's start simple let's compress the single letter A and then unpack it again here's the output we see this is the compressed data in hex and this is the same data in binary in this output I have removed the first four bytes the first bytes would be the magic files during PPE two but it's not part of the actual decompression so I removed it now let's see what happens let's head into unpack as mentioned earlier the decompression starts from the back so he had reads the last byte apparently bits to skip but the last byte is zero so nothing to skip then we get the decoded data size which is taken from the next three bytes from the end and you can see those three bytes are of one so the output length is 1 which will be the single a this also means max file size you can compress with this is hex ffffff so 16 megabytes if we had bits to skim we would skip them here but it was zero so at this point we processed all of those bytes and our bit reader is currently here now we decode the segment it reads the next bit and checks if it's zero if it's zero it's an indication that raw data is following in that case we execute copy from input the data we want to unpack as the input so copy from put here we initialize a counter how much data to read and follow it up with reading to bits if the two bits are a three so in binary 1 1 we would add 3 to the counter and enter a loop to read the next two bits but we only have a single byte to read the signal a so you read 2-0 bits they are later added to the counter which was initialized with 1 so 1 plus 0 is 1 which means we copy one bite from the input to the output and so here we read the next 8 bit which is that one byte 0 1 0 0 0 0 0 1 is the binary ASCII value for capital a so we copy the single a to the output and we are done now let's check what happens with two A's I'm switching back and forth between single and two A's you can see here minor changes first we see that the output decompressed size is now 1 0 so a decimal tool and the 2 bits read for the input length is now a 1 so 1 plus 1 is 2 we read 2 bytes 2 times 8 bits now let's try 3 a's here's where it starts to get interesting we read again the skip byte which is 0 then we read the 3 bytes for the final output length which is now 1 1 so 3 and then we read the next bit which is 0 indicating we have raw data again so copy from input we read the next 2 bits which indicate the input length its default value is 1 so 1 plus 0 is 1 so we read 1 raw byte and then again is a single a and that single 8 is the last a in the output now let's check where the other two A's come from after the copy from input we now run into the copy from decoded the color is basically the output so here we are entering logic that copies data from the already outputted data make sense we want to copy the single a that we got so first we read the compression level bits no clue what that means and if it would be 1 1 we would have extra length data but we read 0 0 so that's not the case and based on that info we decide an amount of off bits to read it takes from this offset bit length array which is actually 7 cents and 7 which you can find at the start of the compressed data to be honest I don't really understand its purpose it's always 7 all I can say is that the copy' length is minimum - in our case the amount of bytes we still need is exactly - so the off value that we read is zero which means we copied 2 bytes from the already decoded output here's the loop doing that it copies the last a to the second last position then it copies the second last position for the third last so the start of the string and there we have our output now this was rather inefficient our compressed data is so much longer than just single three days but let's try to compress a very long a string 500 days let's compress and uncompress there we go the compressed data is much shorter now like always we start at the end we have zero bits of script then we have the decompress sighs so this is binary for the 500s then we check if we have raw data it's zero so yes then we read the next two bits they are zero by default it's one so one plus zero is the length of rope bytes we read so one the next eight bits are the byte we read and that is the a again zero one zero zero zero zero one so we copy the very last a into the output now it gets a bit more interesting in the copy from decode first we read the two compression level bits which are one one so a three which means we have extra length data then we read another bit which is zero so we take the default optional bits small offset which is also a seven again no clue what's up with those sevens all I know is we read the next seven offset bits and they are all zero so no offsets but our compression level was binary 1:1 so a 3 so we have 3 plus 2 which sets the copy length to 5 but of course we have a lot more a is to copy than just 5 and this is indicated by the has extra length data and here we have now a while loop but it always reads 3 bits and adds that value to the copy length so 3 bits or ones would be a decimal 7 so as long as we read one on one we add 7 to the copy length in our compressed data you can see a lot of ones so it will find many of these 3 one-one-one packages and keep adding seven plus seven plus seven it will add so many sevenths that we eventually get close to the 499 copy length we need as long as it's one-on-one and we'll keep adding more but eventually we read the binary one zero zero and that means we read the last value to add we are done with the loop and our copy length is now four nine nine and then we have the same copy loop as before we are copying 499 times the decoded output and we recover our 500 ace so now you got a feeling for how compression algorithms might work and all of this work is being done for this old game Frogger I love nerds as I introduced in the beginning nice nap wrote this code for modding with frog tour and he showed me some cool discoveries through the tool and how you can explore the levels and even modify them basically change the levels or maybe even make your own levels I think he would be really excited if anybody would try to play around with Fraga and the modding tool and maybe share something on their forum it's a small community so there's space to make something nice I'm also linking another video below where nice nap collaborated with another youtuber where they are just exploring some interesting files and maps if you're curious about this kind of data mining it's highly entertaining thanks nice nap for making the internet just a little bit more awesome and of course all the other people involved in the Frogger community I think this kind of work and community is what really inspires me to keep digging into computers and technologies and doing it in my area of interest CTFs and security and they do it in old games it's awesome some people might think that something like this is just a waste of time but nice nap has gotten so much experience and skill from building such a tool which will certainly help him in his career and in the end what they are doing is also hacking and reverse engineering just with a different goal not necessarily to break security but to explore the mysteries of the imaginations who had as a child playing those levels and I can relate to that and that's why I want to explore Pokemon Red and Blue next [Music] you [Music]
Info
Channel: LiveOverflow
Views: 252,918
Rating: undefined out of 5
Keywords: Live Overflow, liveoverflow, hacking tutorial, how to hack, exploit tutorial, frogger, frooglord, kneesnap, froggerhighway, compression, pp20, powerpacker, 2.0, powerpacker 2, frog, unpack, pack, zip, tar, 7zip
Id: BwoOB2QFXvw
Channel Id: undefined
Length: 16min 28sec (988 seconds)
Published: Mon Apr 06 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.