if you've seen my previous video where I play
Bad Apple inside Super Mario Bros you might be wondering "how"?
Or maybe "what?!" Two people asked "why".
In this video I'll be walking you through the process I took to create this TAS.
Hopefully by the end you would have the knowledge to create a similar TAS yourself.
Part zero: A General Understanding. Glossary. This is a TAS or Tool Assisted Superplay.
In many cases that 's' stands for speedrun, but here my goal is to play a
music video in instead of go fast. Consider a series of musical notes on a timeline.
If played in a digital workstation the computer will always play the notes
exactly as written without fail. "Surprise!"
This was an analogy for TASes. Instead of notes on a timeline
to play a song, consider button inputs on a timeline to play a video game.
When we talk about a tool assisted run you might ask how under this context we define a tool.
The popular examples are save states and playing the game in slow motion.
In this video I'll be using the Hex Editor to look at the consoles RAM.
The Trace Logger to see every instruction that was executed in a given frame.
And as I'll discuss later in the video we'll be writing our own custom tools.
Arbitrary Code Execution! The ROM data of a NES cartridge
is just a bunch of zeros and ones grouped into octets which we call bytes.
A different value of a byte represents a different instruction for the CPU to process.
The ROM is just a bunch of bytes. Fun fact: RAM is just a bunch of bytes.
For example in the Super Mario Bros. 3 any% run, the X Position of enemies in
the stage can be manipulated to form a jump instruction that runs the game's credits.
Now writing a payload in RAM is not enough for an ACE exploit we also need some
way to move the console's Program Counter to RAM in order to execute it.
In the case of Mario 3 this is performed by hitting an out of bounds Note Block.
Something that excites many TASers is what we call Total Control.
The ability to write anything anywhere in RAM and execute it.
If we can write and execute any code we want all we need to do now is
figure out what to do with it. Ace in Super Mario Bros!
Whenever I make a video showcasing arbitrary code execution in
Super Mario Bros. I get the same comment. "Mario 1 has ACE?!"
It does! (Asterisk) As I'll explain in a minute it requires
manipulating RAM with a different NES game and then swapping cartridges.
Quite a few people pointed out that this TAS happens inside World 'N'.
Let's talk about why. When you defeat Bowser with Fireballs he
gets replaced with a different object. The object that replaces Bowser
is different for every world but what would happen if you were beyond the
eight regular worlds of Super Mario Bros.? For instance it's possible to spawn Bowser
in world 9 though you're unable to spawn a fire flower before encountering him.
Let's make a table of all 256 worlds in Super Mario Bros.
Let's also ignore the first eight worlds since we know Bowser is
programmed specifically for those worlds. Filtering for worlds that contain Bowser
and then filtering even further for worlds where you can obtain a fire flower first
we're left with 13 worlds to investigate. You might be happy to hear that Bowser can be
replaced with quite a few other existing objects. A firebar. A moving platform. Even a firework! There are a few worlds where defeating Bowser this way results in a jump to RAM and
this is where we want to investigate. let's not stall for time here.
You know the TAS is in World 'N' so what goes on when Bowser is
defeated this way in this world? In World 'N', defeating Bowser with Fireballs
replaces him with an object with the ID of hex C9. This doesn't immediately jump to RAM though
it does start a chain reaction that will. The state machine's value is incremented and
that will lead to a jump to open bus, which is... Out of the scope of this video.
So I'm just going to wave my hands around a bit aaand we're moving on!
This moves the PC to RAM which allows arbitrary code execution.
Though this all relies on being in World 'N'. How do you even get there?
For this TAS I'll be using a custom version of the bizhawk emulator that will allow
me to swap cartridges in the middle of a TAS. You might be familiar with this trick
where you can start Super Mario bros. beyond world 8 by playing tennis.
I won't be using tennis for this TAS but this should hopefully
explain how cart swapping is used. Now let's talk about the
arbitrary code we'll be running. To begin with let's examine the code
that runs right before the jump to RAM. The value of the state machine is
loaded into the "A register" and it will jump to code that handles jump tables.
This is the code that handles jump tables but we parse the table out of bounds!
This leads us to open bus which we use to jump to address $1181.
The question marks here aren't just because this is RAM, but more
specifically this is uninitialized Ram! Super Mario bros. will never write
to this area under normal gameplay. For example if I were to boot up Kirby's
Adventure this area will be filled with some data but after swapping to
Super Mario Bros. it will still be there! Let's do a brief recap while also
explaining some other fun facts about RAM. For instance there are only hex 800 bytes
of RAM but this is all the way in hex 1100. Due to "RAM mirroring", address $1181
has the same value as address $0181. The entire range from address $0160 to $01E4
is never written to by Super Mario Bros. Since we can use another game to write
here and an ACE exploit another game would allow us to write anything here we've got a
golden ticket to execute anything we want. The question is what do we write here?
The goal is to play the Bad Apple music video but we only have 132 bytes to work with and that's
definitely not enough to put the whole video in. We're going to need to write code
that lets us write more code. This will be done by streaming
data through the controller ports. Speaking of- how does a set of
button presses correspond to a byte? There are eight buttons on the controller and each
button represents a single bit of an 8 bit value. In this example I want to write
the binary number %01010101. Represented as button inputs
this is B, Start, Down and Right. Writing our own custom code!
If we want to write our own code we need to know where to write it
and how many bytes we wish to write. To determine where we're writing I'll read from
the controller and store this at address $C3. This will be the high byte of the 16-bit address.
I'll read from the controller again and store it at address $C2.
This will be the low byte of the address. Now we have the location
let's determine the length. This is done by reading the controller once more
and I'll store it in the same area at address $C1. Now if we look in the hex editor at the region we
stored these we can see how many bytes we want to write and where we want to write it.
In this example I'll be storing hex 34 bytes at address $700.
Now to actually write the code at the target address.
We're going to use the "X register" to determine how many bytes we've
written so far so let's set X to zero. I'll read from the controller and
write to our target address with the X register being used as an offset.
X is incremented and that will happen in a loop until X is equal to our payload
length determined by address C1. Now this is a whole lot of information, so let's
actually see this payload in action and how I as the TAS developer can use this.
Suppose I want to write some code here at address $210.
First I'll add the three controller inputs that form the location and length.
02 is written with just the left button. 10 is written with Start.
And I'll write 9 bytes here which is going to be Up + Right.
Once these inputs are emulated we'll see this at address $C1.
I'll paste a series of inputs that form the code I want to write and as I step through each input you
can see the bytes being written in the Hex Editor. So that's how we'll write code in this TAS.
Now we have to make a choice: do I want to write more code at a different location or
do I want to execute some specific code? If I want to write more code
I'll leave the next input blank. If this input isn't blank we're
going to move move the PC somewhere. Let's read the controller twice to set up the
desired place to move the PC and then an indirect jump instruction can take it there.
Let's see this in the emulator too. We wrote code at address
$210 now let's jump to it. First I'll add anything to this
next input so it isn't blank. Now to jump to $210.
I'll write 2 and 10. And just like that I'll step ahead
a few inputs and there it is! Here's the assembly code that I'm using to
take these inputs and use them to write code. If you want to read it go
ahead and pause the video. You might wonder how this
data is going to be written. That's a lot of bytes to manipulate! I'll be using a very brief total
control TAS of Super Mario Bros. 3. That was it!
That was about 3/4 of a second and everything is all ready for Super Mario Bros.
This was likely a lot to take in so let's recap. A TAS of Mario 3 is used to write
a payload and manipulate RAM so Super Mario Bros can begin in World 'N'.
Defeating Bowser in World 'N' leads to a chain reaction ending with this payload being executed.
This payload allows me to use controller inputs to write my own custom code and execute it.
And now we're ready to talk about the Bad Apple music video.
To begin with, let's cover the graphics. The Graphics: how to convert
the video into Mario tiles. This is all the graphics data of Super Mario Bros.
Okay- there's also the object graphics, but we won't be using any of those.
If you want to take a frame from the music video how would we render it using Mario tiles?
There are many ways to do this and in hindsight I probably could have Googled for
existing code for making mosaics. Alas I didn't so here's
how I converted the images. I started by shrinking the image down
to fit inside the NES resolution. Then I broke the image down into 8x8 pixel chunks.
let's take a look at one of these chunks. The goal here is to find which character in the
Mario graphics is the best fit for this chunk. I'll iterate over every character
so let's begin with this one. My method of determining how close these match was
to get the total difference between each pixel. Let me take the brightness of these
two pixels and add up the difference. We've got 254.
Moving on to the next pixel I'll calculate the difference again: 201.
Next pixel, do it again. This repeats for every pixel and I'll tally
up all the differences to get a score. This number will be recorded and
we move on to the next character. I'll run the same process and if
the score is smaller then this character is now the new best fit.
This repeats for every character and finally once all have been checked we
have the character that is the best fit! Repeat this for every chunk and we
have the full frame rendered this way. Of course, doing this by hand would take thousands
of years to make it through the entire video so let's write a program to do this for us.
This is Visual Studio. I'll begin by making a new .net forms C# project. I'll call it- uh- BadAppleTAS.
I'll add a button to this application so all I need to do is click
this button and this function will run. I've got the image scaled down to the
proper resolution so that's a good start. Slap that bad boy in a loop and we've
got ourselves every frame from the video rendered in Mario tiles!
It's quite pleasing to see this code working so far.
Of course PNG files are nice to look at though we can't just drag a PNG file into Super Mario Bros.
We'll need to be a bit more creative. Instead I exported all of the Frame data one
tile at a time in the form of a .bin file. Here's all the bites that I wish
to draw into the NES background. Though this poses a good question, "How
do you draw to the background on the NES?" Graphics part two: how to draw
to the background of the NES. here's a new ROM that I put together.
So far it just sets up the palettes and then crashes the NES.
The goal here is to write the string "hello world!" on the background.
The first step is to know where on the background we want to write it.
How is the middle of the screen defined in NES terms?
Using Bizhawk's Nametable Viewer we can see the PPU address of each tile.
According to this cartridge I already made, it looks like the ideal place
to start is address $21CA. Drawing to the Nametable is a two-step process.
First you need to change the PPU's read/write address.
And then you write to the PPU. Changing the read/write address requires two
separate store instructions to address $2006. In this case I write 21 and then CA which
sets the read/write address to $21CA. Now we need to write "hello
world" one character at a time. To begin with which character is 'H'?
Looking at all the Mario tiles, "H" is Row 1 column 1
so we'll use hex 11 to draw the letter H. Storing this value at address $2007 is how we
communicate with the PPU and draw the tile. I'll compile this ROM and
drag it into the emulator. For this I'll use a compiler called NES ASM. With this ROM loaded in you can see we've
successfully drawn the letter 'H' on screen. If I want to continue and write the rest of "hello
world" I'll need to write the letter 'E' next. This can be done by copying the previous two
lines we wrote and changing it to an 'E'. Notice how we don't need to move the
PPU read/write address as that happens automatically when storing it address $2007.
To make this code look nicer I wrote a lookup table for the entire string and I'll replace
this code up here with a loop that reads from the lookup table and writes to $2007. Now this
should print out the entire "hello world" string. Taada! Now I want to draw all the characters
I exported from the music video. Currently in our TAS we just defeated Bowser.
You'll notice that in the Nametable Viewer, the screen still has all the
tiles from Bowser's Castle. The first step here I took was to remove all that.
Here's some assembly code that changes the palettes and overwrites every tile
on the Nametable with a black square. After setting up the pointers to write
this at address $300 I'll paste all the bytes for this code and watch the code
get filled in through the hex editor. Now to actually execute this code I'll
make the next input non-blank and I'll write the pointer to address $300.
Now I've got a blank canvas. We'll need to write some code to draw the data
on screen but first let's try and compress it. Graphics part three: compress
the data into packets. This row of bytes represents a
strip of tiles to draw on screen. You might notice that many
of these bytes are repeated. The easiest form of compression I could
think of was to simply indicate how many of a tile to draw followed by
the ID of the tile to draw. I should also include the address of where
this tile is being written so in total we have turned these 20 bytes into 12 bytes!
If we were to take a look at a much busier frame we can see how this style of compression
can immediately begin to backfire, as we have transformed these 20 bytes into 40 bytes.
I decided for this video I'll store this graphical data in two ways as a row
of duplicate tiles in this format: Or as a row of individual tiles in this format: Let's take a look at an example.
This is the second frame from the video with any changes.
You can see there are 14 tiles that need changed. With this system of packets I can
modify only the tiles that need changed. Starting with this tile on the
upper right at address $2119 I will write one tile with the ID CB.
Following this format, here are the rest of the packets.
I also want to end each group of packets with a Terminator.
This will tell my code that the data for the frame is complete and we can wait for the next frame.
Graphics part four: the limitations of V blank. I initially had a section here where I would
explain how long it takes in CPU Cycles to update tiles on the screen to be honest
this part of the video was a bunch of math and then walking you through my assembly
code and oh wow it was not entertaining! In summary there's not enough time using my
method to draw an entire 32x 24 tile image in a single frame.
Not even close! I'm going to render this video
as 20 x 15 tiles instead. That means there's a lot of blank space around
the video but not much I can do about that. Moving back to C# land let's adjust for this
new resolution and regenerate that .bin file. Right now we have code to interpret packets
as graphical data we still need code to interpret button presses as packets.
Something else I'll be glossing over is a hardware level race condition for
my method of waiting for the next frame. Again, it's...
Out of the the scope of this video. More hand waving- just know that I read the
controller an extra time in a frame to prevent it. You might have noticed that the NES
has two screens worth of background. Since the NES runs at 60 FPS and
we're rendering a 30 FPS video I can take two frames to render each image.
If I render a frame on the background not being shown I can seamlessly render
30 FPS video at a larger scale. I'll be switching which background is being
shown every two frames that way I don't show the frame I'm changing when it's halfway finished.
Graphics part five: creating the packets from the frame data and converting it into button presses.
Here are the formats for our packets and here's the data we exported from our C# program.
Let's begin by separating these packets into each frame we can then separate each frame
into the individual rows that make it up. All we need to do is write a function that can
take in a row of bytes and spit out a packet. Do this for every row and we got
ourselves a full set of packets! You might notice there are two terminators here.
That's because we're spending two frames for each image.
Now we'll write a function that takes in the bytes of our packet and convert
each byte into the TAStudio button format. Paste in the inputs (wait a few minutes)
and we're ready to watch the TAS! In case you want to know what these button inputs
are doing here's an example frame to look at. The first input is to prevent that
race condition I glossed over. The next input is how many bytes of packet
data need to be written for the next frame. In this case there are hex 26 bytes to write.
The following 'n' bytes is the packet data. Again in this case there are
hex 26 bytes of packet data. And now we just sit back and watch the TAS. And here it is!
We've got the graphics from the Bad Apple music video being rendered inside Super
Mario Bros. through a whole lot of button presses! I'm really happy with this result though
it does seem to be missing something... This is a .wav file. If you take a look up close on an audio
editing tool such as audacity you can see that this waveform is not actually
continuous but it is broken into "samples". Imagine it as the "resolution" for the audio.
Let's listen to a musical chord but hear how it sounds at lower and lower sample rates.
You can hear how it gets dramatically worse the fewer samples there are per second.
Let's learn how a .wav file stores this data! This is done by storing where the sample
is in relation to some Middle Point. In this case, Zero.
The further away a sample is from zero the larger the number.
If we convert these into hexadecimal we can see these exact numbers being
stored inside the .wav file! 06 and D3 are stored in the file as D306.
23 and D8 are stored as D823. 49 and 62 are stored as 6249, and so on.
It's not super important for the video though if you're interested in knowing
what the other bytes in the .wav file are, here's a breakdown for that file's header.
Not every .wav files header will look exactly like this.
Some may contain metadata and other information in here as well though
again that info isn't really needed here. The .wav file for Bad Apple that I'll be using
stores the samples as 16 bits with zero as their midpoint and a sample rate of 48 khz.
If I want to play this audio on the NES I need to work with the NES limitations.
The NES can play 7bit samples 0 to 127 with 64 as the midpoint, and it plays one
sample whenever I write to address $4011. This means that we will be responsible for
playing the audio at the desired sample rate! Here's the plan: I will read from the controller and then
I will store that data at address $4011. This will be happening in a loop.
Here's the assembly code for this. You'll notice the comments
here are counting CPU Cycles. This Loop takes 71 cycles per iteration.
To determine the sample rate we could do some real quick math!
71 cycles per loop. About 1.78 million cycles per second.
To calculate the sample rate we will divide the cycles per second with the cycles per Loop.
And this case that gives us a sample rate of over 25 khz!
25 khz audio is not too bad coming out of the NES! Let's recap the order of
events happening in a frame. We begin a frame by updating the graphics using
the packet information stored at address $0000. I read the controller to prevent the
race condition at the end of the frame. I read the controller to
determine the packet length. I read the controller 'n'
times to write the packet data. And all the remaining time
until the end of a frame is spent reading the controller to play 25 khz audio.
After the frame ends this loop repeats and we'll do this for every frame of the music video.
Audio part two: generate the inputs and give it a listen!
Welcome back to C# land where I have a puzzle to solve.
I need to make sure that the sample I grabbed from the .wav file is the correct
sample to play at this moment in time. How do I know which sample to grab?
In short I need to know how many cycles have happened this Frame so far, and since
every frame could take a different amount of time to update the graphics and
write the packet for next frame... I decided to just write a custom emulator to
emulate the frame for me and count cycles. This is a pretty standard NES emulator.
(Though it does support open bus) Here's how I set up the
emulator to run in my C# code. The number of cycles will be
stored in EMU.totalCycles. A few lines down I'll convert cycles into seconds.
Then as long as I still have more than 71 CPU cycles in a frame, I will grab the sample from
the .wav file, convert it to 7bit, add 71 CPU cycles for the next audio sample, and loop!
With the program ready I click a button and this generates the inputs in
the TAStudio button format. It looks like this TAS will have
5.4 million inputs to paste! Copy that, paste it into TAStudio (and
then wait 15 minutes) and we're good to go! You'll notice every frame now takes
forever to emulate since we're cramming in as many inputs as possible but if we were to
play it back in real time how does it sound? Uh- headphone warning... it sounds terrible!
And it shouldn't come as a surprise either. Consider how much time in a frame is
spent not writing to the audio chip. There's some pretty big gaps.
These gaps are actually visible if you take a look at the audio we're creating from the NES.
Let's figure out how to fix this! The audio part three: fixing the audio.
One idea I had was to smoothen out the first audio samples of a frame.
Here's how that sounds: it still sounds bad.
It would be really nice if we could just somehow send data to the audio
chip during the other parts of the frame. And with this plan all figured out let's
write audio data while writing packet data. Here's my code that writes the packet data
as you can see it is a loop that reads the controller and stores the graphics data.
If I were to make this change now it will alternate between writing graphical data and
writing audio data well how does it sound? it sounds much better!
But why stop there? We still need to write audio
data while updating the graphics! This is done by simply cramming
in audio data inside the packets. (and a small change to the packet reading loop)
And with that the entire frame now has audio data. The plan has worked and we now have the
full music video looking and sounding good! Step three: return to stable gameplay.
I wanted to end the TAS by faking the level transition screen and spawning
Mario directly over the axe. Well, how do I make that work?
You might notice that in our music video playing loop we don't really have any
way to exit the loop when the video ends. I made a small modification to this step
here so if I press the A button it exits the loop by jumping to address $0181 which
was the code that lets me write more code. I would walk through the assembly code for
this though it's not really that interesting it's much more amusing to just watch the
bytes get filled in on the hex editor and watch the Nametables be overwritten.
First I set up the level transition screen and then I replace the right
Nametable with Bowser's Castle. Now I'm just waiting for a few frames so
that this screen lasts the correct length. Now I finally blank the screen and
I draw the rest of Bowser's Castle. And finally I overwrite almost every byte in RAM.
This is kind of like loading a save state. By enabling the non-maskable interrupt and jumping
to where Super Mario bros. waits for the next frame, I return everything to stable gameplay
and and Mario falls to the axe winning the game! No more inputs. The TAS has ended.
Step four: even further beyond! I'd like to share some ways to improve this TAS.
Could I improve the audio any more? Currently I'm interlacing one write to the
audio chip for every byte of graphical data, but some of these frames
have very little to write! For frames with a lot of graphical data
I'll stick to one interlaced audio sample but for frames with very little to write I
could pack around four writes to the audio chip for every byte of graphical data.
I wrote four different audio interlacing functions to be used depending on how much
data needs to be written for a given frame. Here's the code that chooses
which function to use. Could the main audio reading Loop
be faster than 71 CPU cycles? As it turns out, yeah!
If before the loop begins I set up the X and Y registers I can speed
up the process of strobing the controller. I still need A to be equal to zero when this
begins so we can only save two cycles here. I can also remove this CLC
instruction saving two more cycles. This brings the loop down to 67
Cycles which is around 26.6 khz audio! Of course the huge drawback here being that...
I didn't think about that when I made this TAS and now it's too late!
What about the graphics? Could this be rendered at a larger resolution?
I spent a lot of effort looking for ways to increase the resolution, but taking even more
time to draw the graphics means we're going to take a huge dip in the audio quality.
As an experiment though, what if we remove the audio altogether? could we
play the video at a full 32 x 24 tiles? I also spent a ton of time on this one.
In theory this code could run entirely within Vblank, though it would take over 16,000
cycles more than the available cycles in a frame in order to write the data.
How about 60fps? As it turns out I initially thought the video
was 60fps and my first test with the graphics was noticeably faster than it was supposed to be.
Though we'll have to make the resolution smaller since everything will be drawn
in one frame instead of two. How about color video?
Something I wanted to try was rendering a non- grayscale video and what I found was that
shockingly enough, I wouldn't even need to change my assembly code at all in order to do this!
The system of packets was designed for writing to the Nametables but it could be used
to write to any address on the PPU! The attribute tables and even color palettes
could be changed with this existing code. Suppose for instance I wanted to render this
trailer for my indie game inside Super Mario Bros. The real difficult part here would be
converting the frames into Mario tiles. While this step was easy for the black and
white video, if I have 64 potential colors to choose from dived up into four color palettes
and the limitations of the attribute tables... Where do I begin for this conversion?
Let's begin by taking a frame from the video and shrinking it down to the resolution
we'll render it inside Super Mario Bros. Let's set every pixel to the
closest color on the NES palette. Uh- right now this has about 35 colors
in it so it's not quite NES-ready. The NES background can have up to 13 colors.
A single background color for all four palettes and three more colors for each palette.
Trimming for only the top 13 used colors gives us this!
The attribute tables of the NES will subdivide this image into 16 pixel chunks and
each of these chunks will be assigned a palette. The first problem: how do we choose
which colors to assign to which palette? Let's tally up the top four colors in every chunk.
I wanted to know which color exists in the most palettes.
This will be our background color since that color is used in every palette.
Now we group every remaining color into their most popular tri-color-arrangement.
And now we force every 16 pixel chunk to use colors exclusively in the
palette that most represents that chunk. And finally we can use the same method as
before to draw the image using Mario tiles. Look at that!
It's barely recognizable! I found the title screen for my game
rendered this way was laughably unreadable. Now all I need to do is convert the
entire trailer for my game, and- I wonder... So yeah I got Doom inside Super Mario Bros. Keep in mind this is not the source
code of Doom ported inside the NES, this is just Super Mario Bros. being a video
player, though I'm really happy with the results! In theory, this could be used to play
just about any 4x3 video as long as you're willing to have the graphics completely ruined.
And that about wraps things up for this video! All of the assembly code can be found in the
TASVideos submission linked In the description. If you want to know how this TAS can beat Mario 3 in less than a quarter of a second
I plan to explain that one next. But what if I could beat Mario
3 in 5 millionths of a second?! Stick around!
By the way, if you want to support me (so I can make more videos like this)
consider checking out my game, Fantastic Fist. It's a keyboard and mouse platforming
game about punching things! There's a free demo and a link in the description.
And with that I'll see you around Oh- right! And if you asked
why I made this uh it was fun.