MissingNo.'s Glitchy Appearance Explained

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments

This was an interesting, well presented video.

👍︎︎ 1 👤︎︎ u/misterfrenik 📅︎︎ Apr 29 2020 🗫︎ replies

So many deleted Pokemon ☹️

👍︎︎ 1 👤︎︎ u/PM_Dix 📅︎︎ Apr 29 2020 🗫︎ replies
Captions
When it comes to video game glitches, MissingNo. from the first generation of Pokémon games comes to mind for a lot of people. From its distinctive grainy texture to its iconic stair-step shape, it really embodies the feeling of glitchiness. But just how and why does MissingNo. look like this? For a placeholder literally named 'missing number,' one might expect this Pokémon to show up as a solid square, a placeholder graphic, or maybe even not show up at all. In this video, we'll look at exactly how Pokémon are laid out and drawn onto the screen--uncovering the unintentional design of MissingNo. and other glitch Pokémon. The odd visual form of MissingNo. has to do with several errors that occur when decoding the Pokémon's sprite data and displaying it onto the screen. But before we tackle that, let's first look at how the sprite data is stored in the ROM in the first place. By knowing the format these graphics use, the whole process will be easier to understand. During a battle, a 56x56 pixel square area is designated as the area where the opponent Pokémon shows up. This area can be divided up into 49 8x8 tiles--the entire area is 7 tiles wide and 7 tiles tall. These tile graphics are stored in column order--that is, starting at the top left and going down. The graphics used here (and on the Game Boy in general) have a bit depth of 2 bits per pixel, which allows for 4 different colors. This means that each tile will take up 16 bytes in memory. Multiply that by the 49 total tiles, and you'll find that the buffer size that holds the Pokémon's sprite data is 784 bytes large. This is how the sprite is displayed during a battle. However, keeping the data in this format in the ROM just would be wasteful. 784 bytes is a lot of space, especially for just a single sprite. Considering the fact that there are 151 Pokémon, this would mean over 100 thousand bytes just for Pokémon front sprites. Then considering literally all of the other graphics in the game, that would be a crazy amount of data just for graphics alone. To save a lot of space very easily, the graphics data was compressed before inserted into the ROM. Then, when a sprite is loaded, the graphics are decompressed on the fly. For example, this Articuno sprite data is only 568 bytes when compressed in the ROM, before being decompressed into the 784 bytes in video RAM. The actual compression and decompression used in the game is fairly complicated, so I won't go into too much detail here, but there is one important thing that you'll need to know for later. As mentioned earlier, each pixel requires two bits to know what color to use. When loaded in video RAM, these two bits are near each other, and two bytes in video RAM represents one row of 8 pixels. But to make the compression more efficient, these two bits are stripped apart from one another. By separating these bits throughout the entire sprite (which is referred to as a bitplane), instead of one sprite at 2 bits per pixel, we get 2 sprites at 1 bit per pixel. These two sprites are then compressed and stored one after the other. This was what was done to large Pokémon sprites like this. However, a lot of Pokémon took much less than the allotted 56x56 area. Take Charmander for example. There is a lot of empty space around its sprite. Even though this would probably compress really well, it is still a waste to include it within the sprite data, since there's nothing actually here. Therefore, this extra empty space around the edges was omitted before compressing the sprite data. More specifically, a bounding rectangle was placed around the sprite to include as few empty tiles as possible. In practice, only 5x5, 6x6, and 7x7 bounding boxes were used, but as we'll see later, the algorithm that deals with this can support all the way to a 1x1 sprite, as well as rectangular bounding boxes. The sprite data is still stored in order from top to bottom, left to right. Then, one additional byte is used to tell how large the bounding box was. The upper 4 bits held the box's height in tiles, while the lower 4 bits held its width. This byte was copied into two separate locations: it was bundled with the compressed graphics data to help the decompression algorithm piece together the sprite correctly, and it was located in a separate table for the algorithm that reinserts these blank tiles to get ahold of. Now, let's look at the algorithms that were used to convert this compressed data back into a viewable sprite image. In order to manipulate the data correctly, a buffer of 1176 bytes was used, which is enough space to hold three 56x56 pixel sized sprites, or 147 8x8 tiles at 1 bit per pixel. This buffer was split into three equally sized portions, which were 392 bytes each--I'll refer to them as buffers A, B, and C. In this view here, I'll display the data in column order at 1 bit per pixel--this makes it easier to see what exactly is going on. And over here I'll show the data in row order at 2 bits per pixel, which is more likely what you would see if you were using a memory viewer. You can tell from how messy this looks that it is important to interpret graphics data at the correct bit depth. First, the 16-bit pointer to the compressed graphics data was loaded from the Pokémon's statistics table, and the single byte that holds the bounding box size was reserved. Then, the two bitplanes were decompressed into buffers B and C. The method here involves several steps, but it's pretty complex and out of the scope of this video, so we'll just skip to the part where the graphics data is decompressed and in the correct format here. Note that the entire buffer may not be used, particularly if the sprite we're dealing with is smaller than 56x56 pixels. If it is 56x56, then the entire buffer is filled. The data should however be the exact same size in each of the buffers, since these bitplanes are two halves of the same sprite. Now the data has been decompressed, but the empty tiles outside of the sprite's bounding box haven't been inserted yet. For example, if we were to merge the bitplanes and move the tiles into video RAM as they are now, it would be a mess since none of these empty tiles around the edges exist yet. To account for this, a simple routine is used to piece together the correct version of the sprite given the graphics data and the bounding box size. Here is a basic outline of what the routine does. Step one: initialize all of buffer A into all zeros, so the entire bitplane is blank. Step two: calculate the offset of the top left corner of the sprite. We'll come back to this in a bit. This step will give us a pointer to memory within buffer A where the graphics data should be copied to. Step three: copy over one column of tiles. The number of bytes transferred will be eight times the height of the sprite's bounding box, since each tile takes up 8 bytes here. Step four: revert the pointer back to what it was before copying over this column, then increment it by 56 bytes. This effectively sets up the pointer to be at the top of the next column. And step five: repeat steps 3 and 4 for each column remaining in the sprite. These steps will run a number of times equal to the width of the sprite's bounding box. After this subroutine finishes, the Pokémon's sprite will be properly positioned within the allotted 56x56 pixel area. Or at least, one of the bitplanes will be. The same subroutine is run again on the data within buffer C, but it copies into buffer B. So after all of this, buffer A contains the first bitplane and buffer B contains the second bitplane. To finish building the sprite, the two bitplanes are zippered together to match the graphics format that the Game Boy requires so it can actually display the sprite correctly. The data in buffers A and B are copied into buffers C and B starting from the end and working backwards. This sprite takes up two buffers since the number of bits representing each pixel has doubled from 1 to 2. And finally, the 784 bytes of buffers B and C are transferred into video RAM to be displayed on the screen. Okay, let's revisit that one step that I skipped over: finding the top left corner of the bounding box of the sprite within the 56x56 pixel area. First, I'll mark off each 8x8 tile's index number in this area, in order of how they are stored in memory. The goal is to fit a smaller box of graphics into this larger box, and find the tile index number of its top left corner. The smaller box should always rest on the bottom of the larger box, and it should also be centered horizontally. We can find the index of the top left corner by finding the horizontal offset and vertical offset of the smaller box in tiles. Then, if we multiply the horizontal offset by 7 and add the vertical offset, that will give us the tile offset we want. For example, if the smaller box were 5x5, we would want it to show up here. It has a horizontal offset of 1 and a vertical offset of 2. Seven times one plus two is nine, which is exactly the index of its top left corner. First, the vertical offset. This one is actually pretty easy; it is just 7 minus the height of the bounding box. We can double check this by seeing that a 7x7 sprite should have an offset of zero, and that our 5x5 sprite had an offset of 2. The horizontal offset is a bit trickier. The formula for this offset is 7 minus the width of the bounding box, all divided by 2, plus one half, and rounded down to the nearest whole number. Again we can double check by seeing that a 7x7 sprite should have an offset of zero, and that the 5x5 sprite had an offset of 1. The plus one half part ensures that even-numbered width sprites, which can't be centered exactly, will favor the right side of the allotted area instead of the left. Now that we have the horizontal and vertical offsets, the index number of the top left corner can be calculated. However, the algorithm that inserts the empty tiles deals with bytes of graphics data, not whole tiles. At this point in time, the graphics are in 1 bit per pixel format, so one tile takes up 8 bytes. This means to convert a tile index into a byte index, we multiply it by 8. This may have seen like a lot of pedantic mathematics, but it really is important to uncover the mysterious shape that MissingNo. takes. Before moving on to MissingNo., let's look at a few more examples so you can see this algorithm in action. First, a 7x7 sprite. The data is decompressed and decoded into buffers B and C. Now, without the need for any spacing, these already look pretty good. The horizontal and vertical offsets are both zero, making the tile offset and pointer zero, but the data is copied into buffers A and B anyway, since the same code is run for all sprites regardless of their size. Then, the two 1 bit per pixel images are merged into a single 2 bits per pixel image. This data is what gets written into video RAM. Next, a 6x6 sprite. The data is decompressed and decoded into buffers B and C. This time, everything looks jumbled since the empty area needs to be readded. The horizontal offset turns out to be 1, and so does the vertical offset, which means the tile offset is 8, and the pointer is at 64 bytes in to the buffer. The empty area gets added to the first bitplane when it is copied from buffer B to buffer A, then it's added to the second bitplane when it's copied from buffer C to buffer B. Now that the sprite is centered correctly, the images are merged, then send to video RAM. And finally, since we saw a 5x5 sprite earlier, let's do a 3x2 sprite just for fun. A sprite this small and non-square is never actually used in the game, but the algorithm here can handle it just fine. The first and second bitplanes are decompressed into buffers B and C, and the data is decoded. You can tell the sprite is much smaller because much less of these buffers are written to at this step. The horizontal offset is found to be 2, and the vertical offset is 5, resulting in the top left corner tile having index 19, and the pointer becomes 152. Then the 6 tiles of the sprite are arranged as you would expect. The same is done for the second bitplane, and then the two are merged just as before. And then of course these final tiles are transferred to VRAM like usual. Alright, finally we can look into what makes MissingNo. look the way it does. Remember, the only two things required to draw a Pokémon's sprite is a pointer to compressed graphics data, and one byte that describes the width and height of the sprite's bounding box. With MissingNo. and other glitch Pokémon, these two parameters are corrupted. So while the pointer is supposed to point to compressed graphics data, in most cases it doesn't, and it points to somewhere completely unrelated. This is what gives MissingNo. its glitchy texture. And then the size of the bounding box is corrupted as well. The algorithm that inserts empty space around the bounding box only works correctly for widths and heights between 1 and 7 inclusive. If a 0 or 8-15 were to show up, the padding around the bounding box won't be applied correctly. In fact, if a Pokémon's bounding box is specified to have a width of 0, the algorithm completely fails and the game crashes. Furthermore, the two copies of MissingNo.'s sprite's width and height are different, which makes things a little bit inconsistant. Let's step through the entire process again, but using MissingNo.'s data as the example. The pointer to MissingNo.'s supposed compressed graphics data is $1900, which happens to point in the middle of some code responsible for writing tilemap data to the screen, including most of that which draws text boxes and dialog. But that doesn't really matter here, because right now it will be treated as graphics data. The first problem that arises with loading MissingNo.'s data is decompressing this junk data. The buffers that hold each of the bitplanes are only 392 bytes big. However, MissingNo.'s decompressed graphics data ends up being much more than 392 bytes. The dimension byte bundled with the compressed data is $DD, which corresponds to a width and height of 13 tiles. So instead of writing a maximum of 49 8x8 tiles, 169 tiles get written instead. Of course, this means that these decompression buffers are indexed way out of bounds, and everything in memory after them gets clobbered. This just so happens to be where the Hall of Fame data is stored, which is why it gets corrupted just from encountering a MissingNo. or some other glitch Pokémon. The algorithm continues to chug along anyway, and eventually these 'graphics' are decompressed into buffers B and C, after processing over 3 times as much data as intended. This is the reason why the game seems to load for much longer than normal when encountering MissingNo.. After it finally wraps up, we are left with data that looks like this. Now we insert the proper amount of empty space around the sprite's bounding box. And here is the second problem that arises--MissingNo.'s bounding box is found to be 8x8 tiles, since this byte is garbage also. Well, let's plug in a width of 8 tiles and height of 8 tiles into those formulas and see what happens. The horizontal offset is found to be 0, which is all fine and dandy, but the vertical offset is calculated to be -1. Therefore, the tile index of the top left corner is -1. Multiply by 8 to get the graphics data offset and we get -8. Now, this offset is calculated inside of an 8-bit register. Negative 8 is represented as 11111000 in binary. However, when working with offsets like this, a negative number doesn't make much sense, so this 8-bit value is treated as an unsigned number instead of a signed number. As an unsigned number, this value is $F8, or decimal 248. If we divide this value by 8 we can get the true tile index, which is 31. It's a pretty unintuitive result, but this places the supposed top left corner way over here. Now, 8 columns of 8 tiles are copied over one at a time. Notice how the last tile of the column is overwritten by the subsequent column, since advancing to the next column is only an increase of 7 tiles, when there are supposed to be 8 tiles in one column. And by the third column, the buffer is again indexed out of bounds, and starts running into the same buffer that this data is coming from. The same thing happens for the second bitplane just as you would expect. Finally, the two bitplanes are merged into a single 4-color sprite as normal. And there you have it! MissingNo.'s shape and texture are both results of unintended parameters that come from a corrupted Pokédex number. Speaking of Pokédex number, let's look into why MissingNo. exists in the first place, which will lead us into the other glitch Pokémon. Each Pokémon has an internal ID number and a Pokédex number. For example, Bulbasaur has Pokédex number 001, but Rhydon has the internal ID number of 1. The Pokédex number and name of a Pokémon are derived via the internal ID, so technically more than one Pokémon can have the same Pokédex number. The internal ID number is the only truly unique identifier between Pokémon species. Now, throughout this list of Pokémon ordered by internal ID, you will notice a bunch of them have Pokédex number 000. These IDs probably belonged to Pokémon that were removed from the game during development. Rather than shifting all of the IDs over when removing a Pokémon, they just left a gap and filled it with a zero. This is also where the placeholder name of 'missing number' came from, since the Pokédex number for the cut Pokémon went missing. Now while a Pokémon's name is derived by its internal ID, its sprite is derived by its Pokédex number. All of these gaps were filled with placeholder zeros, and the Pokémon name list is more of the same, where you can find the string 'MissingNo.' many times. But the reason that all MissingNo. look the same is strictly because they all share the same Pokédex number--000. Oddly, the glitch Pokémon with internal ID zero, 'M, happens to also have Pokédex number 000 just by chance, which is why it looks the same as MissingNo.. As we look through all of these glitched Pokémon sprites, you'll see some of them have been marked with red. These are tiles that change their appearance depending on what is in memory at the time the sprite is loaded. See, the 16-bit pointer to compressed graphics data normally points to a ROM address, since that is where the data is supposed to be located. However, some of the glitch Pokémon have pointers that refer to other regions of the address space, such as work RAM, SRAM, or video RAM. These will look different almost every time they are loaded. The same goes for many of the glitched Pokémons' names. The area of video RAM where the alphabet is stored is near the area where other currently loaded graphics are, such as the back of your own Pokémon, and the front sprite itself. In fact, some glitch Pokémon, like 'M, have parts of its own sprite within its name, which I think is just wonderful. Loading up a Pokémon's sprite in its status screen will even allow tiles from the currently loaded tileset to appear in certain glitched names. This video is getting pretty long, so I'll end it here for now. In a followup video, I will try to 'fix' some of these glitched sprites. I also want to explore more of the data in the ROM, and look at some theoretical glitch Pokémon that don't actually exist.
Info
Channel: Retro Game Mechanics Explained
Views: 875,949
Rating: undefined out of 5
Keywords: video, game, programming, code, glitch, trick, explain, description, hack, tas, pokemon, missingno, cheat, sprite, graphics
Id: ZI50XUeN6QE
Channel Id: undefined
Length: 21min 20sec (1280 seconds)
Published: Mon Apr 27 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.