When it comes to video game glitches, MissingNo. from the first generation of Pokémon games
comes to mind for a lot of people. From its distinctive grainy texture to its
iconic stair-step shape, it really embodies the feeling of glitchiness. But just how and why does MissingNo. look like this? For a placeholder literally named 'missing
number,' one might expect this Pokémon to show up as a solid square, a placeholder graphic,
or maybe even not show up at all. In this video, we'll look at exactly how Pokémon
are laid out and drawn onto the screen--uncovering the unintentional design of MissingNo. and
other glitch Pokémon. The odd visual form of MissingNo. has to do
with several errors that occur when decoding the Pokémon's sprite data and displaying
it onto the screen. But before we tackle that, let's first look
at how the sprite data is stored in the ROM in the first place. By knowing the format these graphics use,
the whole process will be easier to understand. During a battle, a 56x56 pixel square area
is designated as the area where the opponent Pokémon shows up. This area can be divided up into 49 8x8 tiles--the
entire area is 7 tiles wide and 7 tiles tall. These tile graphics are stored in column order--that
is, starting at the top left and going down. The graphics used here (and on the Game Boy
in general) have a bit depth of 2 bits per pixel, which allows for 4 different colors. This means that each tile will take up 16
bytes in memory. Multiply that by the 49 total tiles, and you'll
find that the buffer size that holds the Pokémon's sprite data is 784 bytes large. This is how the sprite is displayed during
a battle. However, keeping the data in this format in
the ROM just would be wasteful. 784 bytes is a lot of space, especially for
just a single sprite. Considering the fact that there are 151 Pokémon,
this would mean over 100 thousand bytes just for Pokémon front sprites. Then considering literally all of the other
graphics in the game, that would be a crazy amount of data just for graphics alone. To save a lot of space very easily, the graphics
data was compressed before inserted into the ROM. Then, when a sprite is loaded, the graphics
are decompressed on the fly. For example, this Articuno sprite data is
only 568 bytes when compressed in the ROM, before being decompressed into the 784 bytes
in video RAM. The actual compression and decompression used
in the game is fairly complicated, so I won't go into too much detail here, but there is
one important thing that you'll need to know for later. As mentioned earlier, each pixel requires
two bits to know what color to use. When loaded in video RAM, these two bits are
near each other, and two bytes in video RAM represents one row of 8 pixels. But to make the compression more efficient,
these two bits are stripped apart from one another. By separating these bits throughout the entire
sprite (which is referred to as a bitplane), instead of one sprite at 2 bits per pixel,
we get 2 sprites at 1 bit per pixel. These two sprites are then compressed and
stored one after the other. This was what was done to large Pokémon sprites
like this. However, a lot of Pokémon took much less
than the allotted 56x56 area. Take Charmander for example. There is a lot of empty space around its sprite. Even though this would probably compress really
well, it is still a waste to include it within the sprite data, since there's nothing actually
here. Therefore, this extra empty space around the
edges was omitted before compressing the sprite data. More specifically, a bounding rectangle was
placed around the sprite to include as few empty tiles as possible. In practice, only 5x5, 6x6, and 7x7 bounding
boxes were used, but as we'll see later, the algorithm that deals with this can support
all the way to a 1x1 sprite, as well as rectangular bounding boxes. The sprite data is still stored in order from
top to bottom, left to right. Then, one additional byte is used to tell
how large the bounding box was. The upper 4 bits held the box's height in
tiles, while the lower 4 bits held its width. This byte was copied into two separate locations:
it was bundled with the compressed graphics data to help the decompression algorithm piece
together the sprite correctly, and it was located in a separate table for the algorithm
that reinserts these blank tiles to get ahold of. Now, let's look at the algorithms that were
used to convert this compressed data back into a viewable sprite image. In order to manipulate the data correctly,
a buffer of 1176 bytes was used, which is enough space to hold three 56x56 pixel sized
sprites, or 147 8x8 tiles at 1 bit per pixel. This buffer was split into three equally sized
portions, which were 392 bytes each--I'll refer to them as buffers A, B, and C.
In this view here, I'll display the data in column order at 1 bit per pixel--this makes
it easier to see what exactly is going on. And over here I'll show the data in row order
at 2 bits per pixel, which is more likely what you would see if you were using a memory
viewer. You can tell from how messy this looks that
it is important to interpret graphics data at the correct bit depth. First, the 16-bit pointer to the compressed
graphics data was loaded from the Pokémon's statistics table, and the single byte that
holds the bounding box size was reserved. Then, the two bitplanes were decompressed
into buffers B and C. The method here involves several steps, but
it's pretty complex and out of the scope of this video, so we'll just skip to the part
where the graphics data is decompressed and in the correct format here. Note that the entire buffer may not be used,
particularly if the sprite we're dealing with is smaller than 56x56 pixels. If it is 56x56, then the entire buffer is
filled. The data should however be the exact same
size in each of the buffers, since these bitplanes are two halves of the same sprite. Now the data has been decompressed, but the
empty tiles outside of the sprite's bounding box haven't been inserted yet. For example, if we were to merge the bitplanes
and move the tiles into video RAM as they are now, it would be a mess since none of
these empty tiles around the edges exist yet. To account for this, a simple routine is used
to piece together the correct version of the sprite given the graphics data and the bounding
box size. Here is a basic outline of what the routine
does. Step one: initialize all of buffer A into
all zeros, so the entire bitplane is blank. Step two: calculate the offset of the top
left corner of the sprite. We'll come back to this in a bit. This step will give us a pointer to memory
within buffer A where the graphics data should be copied to. Step three: copy over one column of tiles. The number of bytes transferred will be eight
times the height of the sprite's bounding box, since each tile takes up 8 bytes here. Step four: revert the pointer back to what
it was before copying over this column, then increment it by 56 bytes. This effectively sets up the pointer to be
at the top of the next column. And step five: repeat steps 3 and 4 for each
column remaining in the sprite. These steps will run a number of times equal
to the width of the sprite's bounding box. After this subroutine finishes, the Pokémon's
sprite will be properly positioned within the allotted 56x56 pixel area. Or at least, one of the bitplanes will be. The same subroutine is run again on the data
within buffer C, but it copies into buffer B.
So after all of this, buffer A contains the first bitplane and buffer B contains the second
bitplane. To finish building the sprite, the two bitplanes
are zippered together to match the graphics format that the Game Boy requires so it can
actually display the sprite correctly. The data in buffers A and B are copied into
buffers C and B starting from the end and working backwards. This sprite takes up two buffers since the
number of bits representing each pixel has doubled from 1 to 2. And finally, the 784 bytes of buffers B and
C are transferred into video RAM to be displayed on the screen. Okay, let's revisit that one step that I skipped
over: finding the top left corner of the bounding box of the sprite within the 56x56 pixel area. First, I'll mark off each 8x8 tile's index
number in this area, in order of how they are stored in memory. The goal is to fit a smaller box of graphics
into this larger box, and find the tile index number of its top left corner. The smaller box should always rest on the
bottom of the larger box, and it should also be centered horizontally. We can find the index of the top left corner
by finding the horizontal offset and vertical offset of the smaller box in tiles. Then, if we multiply the horizontal offset
by 7 and add the vertical offset, that will give us the tile offset we want. For example, if the smaller box were 5x5,
we would want it to show up here. It has a horizontal offset of 1 and a vertical
offset of 2. Seven times one plus two is nine, which is
exactly the index of its top left corner. First, the vertical offset. This one is actually pretty easy; it is just
7 minus the height of the bounding box. We can double check this by seeing that a
7x7 sprite should have an offset of zero, and that our 5x5 sprite had an offset of 2. The horizontal offset is a bit trickier. The formula for this offset is 7 minus the
width of the bounding box, all divided by 2, plus one half, and rounded down to the
nearest whole number. Again we can double check by seeing that a
7x7 sprite should have an offset of zero, and that the 5x5 sprite had an offset of 1. The plus one half part ensures that even-numbered
width sprites, which can't be centered exactly, will favor the right side of the allotted
area instead of the left. Now that we have the horizontal and vertical
offsets, the index number of the top left corner can be calculated. However, the algorithm that inserts the empty
tiles deals with bytes of graphics data, not whole tiles. At this point in time, the graphics are in
1 bit per pixel format, so one tile takes up 8 bytes. This means to convert a tile index into a
byte index, we multiply it by 8. This may have seen like a lot of pedantic
mathematics, but it really is important to uncover the mysterious shape that MissingNo. takes. Before moving on to MissingNo., let's look
at a few more examples so you can see this algorithm in action. First, a 7x7 sprite. The data is decompressed and decoded into
buffers B and C. Now, without the need for any spacing, these
already look pretty good. The horizontal and vertical offsets are both
zero, making the tile offset and pointer zero, but the data is copied into buffers A and
B anyway, since the same code is run for all sprites regardless of their size. Then, the two 1 bit per pixel images are merged
into a single 2 bits per pixel image. This data is what gets written into video
RAM. Next, a 6x6 sprite. The data is decompressed and decoded into
buffers B and C. This time, everything looks jumbled since
the empty area needs to be readded. The horizontal offset turns out to be 1, and
so does the vertical offset, which means the tile offset is 8, and the pointer is at 64
bytes in to the buffer. The empty area gets added to the first bitplane
when it is copied from buffer B to buffer A, then it's added to the second bitplane
when it's copied from buffer C to buffer B. Now that the sprite is centered correctly,
the images are merged, then send to video RAM. And finally, since we saw a 5x5 sprite earlier,
let's do a 3x2 sprite just for fun. A sprite this small and non-square is never
actually used in the game, but the algorithm here can handle it just fine. The first and second bitplanes are decompressed
into buffers B and C, and the data is decoded. You can tell the sprite is much smaller because
much less of these buffers are written to at this step. The horizontal offset is found to be 2, and
the vertical offset is 5, resulting in the top left corner tile having index 19, and
the pointer becomes 152. Then the 6 tiles of the sprite are arranged
as you would expect. The same is done for the second bitplane,
and then the two are merged just as before. And then of course these final tiles are transferred
to VRAM like usual. Alright, finally we can look into what makes
MissingNo. look the way it does. Remember, the only two things required to
draw a Pokémon's sprite is a pointer to compressed graphics data, and one byte that describes
the width and height of the sprite's bounding box. With MissingNo. and other glitch Pokémon,
these two parameters are corrupted. So while the pointer is supposed to point
to compressed graphics data, in most cases it doesn't, and it points to somewhere completely
unrelated. This is what gives MissingNo. its glitchy texture. And then the size of the bounding box is corrupted
as well. The algorithm that inserts empty space around
the bounding box only works correctly for widths and heights between 1 and 7 inclusive. If a 0 or 8-15 were to show up, the padding
around the bounding box won't be applied correctly. In fact, if a Pokémon's bounding box is specified
to have a width of 0, the algorithm completely fails and the game crashes. Furthermore, the two copies of MissingNo.'s
sprite's width and height are different, which makes things a little bit inconsistant. Let's step through the entire process again,
but using MissingNo.'s data as the example. The pointer to MissingNo.'s supposed compressed
graphics data is $1900, which happens to point in the middle of some code responsible for
writing tilemap data to the screen, including most of that which draws text boxes and dialog. But that doesn't really matter here, because
right now it will be treated as graphics data. The first problem that arises with loading
MissingNo.'s data is decompressing this junk data. The buffers that hold each of the bitplanes
are only 392 bytes big. However, MissingNo.'s decompressed graphics
data ends up being much more than 392 bytes. The dimension byte bundled with the compressed
data is $DD, which corresponds to a width and height of 13 tiles. So instead of writing a maximum of 49 8x8
tiles, 169 tiles get written instead. Of course, this means that these decompression
buffers are indexed way out of bounds, and everything in memory after them gets clobbered. This just so happens to be where the Hall
of Fame data is stored, which is why it gets corrupted just from encountering a MissingNo.
or some other glitch Pokémon. The algorithm continues to chug along anyway,
and eventually these 'graphics' are decompressed into buffers B and C, after processing over
3 times as much data as intended. This is the reason why the game seems to load
for much longer than normal when encountering MissingNo.. After it finally wraps up, we are left with
data that looks like this. Now we insert the proper amount of empty space
around the sprite's bounding box. And here is the second problem that arises--MissingNo.'s
bounding box is found to be 8x8 tiles, since this byte is garbage also. Well, let's plug in a width of 8 tiles and
height of 8 tiles into those formulas and see what happens. The horizontal offset is found to be 0, which
is all fine and dandy, but the vertical offset is calculated to be -1. Therefore, the tile index of the top left
corner is -1. Multiply by 8 to get the graphics data offset
and we get -8. Now, this offset is calculated inside of an
8-bit register. Negative 8 is represented as 11111000 in binary. However, when working with offsets like this,
a negative number doesn't make much sense, so this 8-bit value is treated as an unsigned
number instead of a signed number. As an unsigned number, this value is $F8,
or decimal 248. If we divide this value by 8 we can get the
true tile index, which is 31. It's a pretty unintuitive result, but this
places the supposed top left corner way over here. Now, 8 columns of 8 tiles are copied over
one at a time. Notice how the last tile of the column is
overwritten by the subsequent column, since advancing to the next column is only an increase
of 7 tiles, when there are supposed to be 8 tiles in one column. And by the third column, the buffer is again
indexed out of bounds, and starts running into the same buffer that this data is coming
from. The same thing happens for the second bitplane
just as you would expect. Finally, the two bitplanes are merged into
a single 4-color sprite as normal. And there you have it! MissingNo.'s shape and texture are both results
of unintended parameters that come from a corrupted Pokédex number. Speaking of Pokédex number, let's look into
why MissingNo. exists in the first place, which will lead
us into the other glitch Pokémon. Each Pokémon has an internal ID number and
a Pokédex number. For example, Bulbasaur has Pokédex number
001, but Rhydon has the internal ID number of 1. The Pokédex number and name of a Pokémon
are derived via the internal ID, so technically more than one Pokémon can have the same Pokédex
number. The internal ID number is the only truly unique
identifier between Pokémon species. Now, throughout this list of Pokémon ordered
by internal ID, you will notice a bunch of them have Pokédex number 000. These IDs probably belonged to Pokémon that
were removed from the game during development. Rather than shifting all of the IDs over when
removing a Pokémon, they just left a gap and filled it with a zero. This is also where the placeholder name of
'missing number' came from, since the Pokédex number for the cut Pokémon went missing. Now while a Pokémon's name is derived by
its internal ID, its sprite is derived by its Pokédex number. All of these gaps were filled with placeholder
zeros, and the Pokémon name list is more of the same, where you can find the string
'MissingNo.' many times. But the reason that all MissingNo. look the same is strictly because they all
share the same Pokédex number--000. Oddly, the glitch Pokémon with internal ID
zero, 'M, happens to also have Pokédex number 000 just by chance, which is why it looks
the same as MissingNo.. As we look through all of these glitched Pokémon
sprites, you'll see some of them have been marked with red. These are tiles that change their appearance
depending on what is in memory at the time the sprite is loaded. See, the 16-bit pointer to compressed graphics
data normally points to a ROM address, since that is where the data is supposed to be located. However, some of the glitch Pokémon have
pointers that refer to other regions of the address space, such as work RAM, SRAM, or
video RAM. These will look different almost every time
they are loaded. The same goes for many of the glitched Pokémons'
names. The area of video RAM where the alphabet is
stored is near the area where other currently loaded graphics are, such as the back of your
own Pokémon, and the front sprite itself. In fact, some glitch Pokémon, like 'M, have
parts of its own sprite within its name, which I think is just wonderful. Loading up a Pokémon's sprite in its status
screen will even allow tiles from the currently loaded tileset to appear in certain glitched
names. This video is getting pretty long, so I'll
end it here for now. In a followup video, I will try to 'fix' some
of these glitched sprites. I also want to explore more of the data in
the ROM, and look at some theoretical glitch Pokémon that don't actually exist.
This was an interesting, well presented video.
So many deleted Pokemon ☹️