The Atari 2600 had extremely primitive hardware. It really is a great feat of engineering and
programming that any of the games made for the system work at all. The biggest reason why this is such a commonly
stated fact is that the console has basically no video memory. The same processor that handled the actual
game logic was also in charge of coordinating how things appear on the screen. And all of this had to be perfectly timed
and in sync with the television's scanning electron beam. Hence, this kind of programming to produce
an image was often referred to as 'racing the beam'. Here's a question: how many bits are used
to produce one frame of gameplay on a standard Atari 2600 game? I'll give you just a few hints: the picture
resolution was 160 by 192 pixels (for NTSC systems at least), with up to 30 lines in
overscan. There was one background, two player sprites,
two 'missile' sprites, and one 'ball' sprite. And all of these had a graphical bit-depth
of 1 bit per pixel, which just means that you could only choose between two colors at
a time--on or off. Pause the video now if you wanna try to do
the math. So what did you come up with? 160 times, say, 222 bits for the background,
plus maybe 8 times 20 times 2 for the player sprites, and 8 times 2 for the missile sprites,
and just a single bit for the ball sprite. This totals 35857 bits. If that was your guess, you were a bit too
high. In fact, you were way too high. The correct answer is 39 bits. Yes, just 39 bits--I can even show that many
right here on the screen for you: 20 bits for the background (or 'playfield', as
Atari's documentation calls it), 8 bits for each player sprite, and 1 bit each
for the 2 missile sprites and the ball sprite.

How in the world does this work? First, let's just focus on the playfield; we'll
come back to the sprites in a bit. I mentioned the full resolution of the output
from the Atari is 160 pixels across and 192 pixels down. Well, the playfield has a lower horizontal
resolution than that. It can only have 40 distinct 'pixels' per
scanline--let's call these blocks. This means that each block of the playfield
is actually a small group of 4 pixels next to each other horizontally. 40 playfield blocks times 4 pixels per block
gives us the 160 total pixels across. If we cut the number of blocks per line in
half, we get 20. This is where the 20 bits for the playfield
comes from. The playfield is actually split into a left half and a right half,
and both halves share the same memory space. The right half can either repeat or mirror the left half;
a lot of games used this mirroring functionality. But what do you do if you don't want the playfield
to be a repeated or mirrored pattern? The general idea was that, in the game's code,
the playfield bits to display the left half of the screen were written to memory right
when the scanning beam of the television was just about to display this part of the screen. Then, a very short time later,
the beam moves on to draw the right half of the screen. Just before that happens, the data for the right
half of the screen had to be written. This way, two different patterns for the playfield
could be drawn using the same exact memory region. This is where the phrase 'racing the beam'
comes from: you have to time everything around where the scanning beam is at every moment. If you write the second half of the screen's
data too early, the first half gets corrupted because the beam is still drawing that half. If you write it too late, the second half
gets corrupted because the beam is already drawing that half. It's a very delicate process that requires
working out exactly how long each bit of code takes to execute, so that the writes to memory
occur exactly when they need to. Now, the process I just described was only
for a single scanline. Since there are only 20 bits of memory for the entire
playfield, not only do the left and right halves of the screen share the same memory,
but all scanlines share the same memory as well. This means the process of writing the data
for the left and right halves of the playfield has to occur every single scanline if you
want the graphics to change as the beam moves down the screen. Now that we've got a general idea of what
is going on, let's look at the actual memory values used to hold the playfield data. 20 bits can be organized into 3 bytes, with
4 bits to spare. These three bytes are known as PF0, PF1, and
PF2. Only the higher 4 bits of PF0 are used, the
lower 4 go unused, while PF1 and PF2 are used entirely. To draw a single scanline of the playfield,
the bits are used in this order: First, the 4 bits of PF0 are used going from
least significant to most significant; so they appear backwards. Then, the 8 bits of PF1 are used going from
most significant to least significant; they appear forwards. Then, the 8 bits of PF2 are used going from
least significant to most significant; appearing backwards again. If the playfield is currently set to repeat,
then these 20 bits are just drawn again in the same order. Otherwise, if it's in mirror mode, the 20
bits are drawn in reverse order to fill out the scanline. So in order to draw a standard line of the
playfield, first, the PF0 byte should be set before the beam even starts drawing this line. The PF1 byte needs to be written before this
point, where its bits are used to draw the next 8 blocks. And then the PF2 byte needs to be written
before this point, where its bits are used to draw the next 8 blocks. The last 20 blocks are drawn using the same
values--once the beam draws out all 40 blocks of the playfield, the first 4 blocks of the
next line can be set up via PF0. Now, what if we wanted the right side to be
completely different from the left side? Not a repeated pattern, or a mirrored pattern? Then we would just have to write to each of
the playfield registers twice per line, once for the left half, like we just did, but then
a second time for the right half. There is one thing we have to watch out for
though--the playfield registers should not be modified while their respective blocks
are being drawn to the screen. Doing so results in garbage appearing on the
screen, since the register is being written to and read from at the same time. We update the left half of the scanline like
before, but then we can update the right half independently. The PF0 register needs to be updated sometime
after this point, but before this point. Then the PF1 register needs to be updated
when the beam is somewhere between here and here, and finally PF2 is updated after this
point and before this point. Before moving on, let's get a sense of scale
for how quickly things are happening here. How fast is the electron beam in the television
moving compared to how quickly the Atari's CPU can execute instructions? Is updating the playfield registers just barely
feasible, or is the CPU so fast that this is no problem at all? Recall that each playfield block is made up
of 4 pixels, so let's show those with some grid lines. And, we're gonna zoom in a bit. The Atari 2600's CPU, the 6507, clocks in
at about 1.19 MHz. Because all of the circuitry in the console
runs off the same clock signal (the video chip's roughly 3.58 MHz pixel clock divided by 3 gives the CPU clock), it turns out that one CPU machine cycle takes
the same amount of time as it takes the beam to draw out 3 pixels horizontally. Each line, including the horizontal blanking
period, takes the same amount of time to scan as 76 machine cycles: 228 pixel clocks (160 visible plus 68 of H-blank) divided by 3. Different CPU instructions take a different number
of machine cycles depending on how complicated they are. For example, a simple read from or write to the
zero page of memory takes only 3 machine cycles. So writing a byte to the playfield registers,
which reside within the zero page, takes 3 cycles each. 3 machine cycles times 3 pixels is 9 pixels
in total, which is 2 and a quarter playfield blocks. We can mark out the exact locations where
the scanning beam will be at the latest possible moment that each playfield register can be
written by finding the boundaries between the playfield blocks that correspond to the
different registers. If each of the playfield registers needs
to be updated twice in one line, we see that the first write to PF0 can start no later
than 19 machine cycles after the start of the horizontal blanking period, or H-blank,
for this line. That write will take 3 cycles, then we have
a gap of 3 cycles before the next write to PF1 needs to be made. Continuing, we have gaps of 7, 8, 2, and 8
cycles; after the second write to PF2, there are 11 remaining cycles before H-blank begins
again. As you can see, using 18 of the 76 available
cycles for the line uses up quite a bit of time. And these are only the writes--the data actually
being written needs to be read from somewhere, which will take up even more of these cycles. Additionally, this is only to display the
playfield--we still have the player, ball, and missile sprites to display too! As well as actually executing the code responsible
for handling the game's logic and everything. This is why a full, non-mirrored
and non-repeating playfield design wasn't very common, and why most games stuck
to some sort of background that could be mirrored, or nearly mirrored.

Let's take a look at the sprites now. There are two player sprites, two missile
sprites, and one ball sprite. The two player sprites use 8 bits each for
their graphics. Just like with the playfield, only one line
of graphics is stored at a time, so if your sprite changes shape vertically, it will need
to be updated on the fly. In fact, the only way to disable a player
sprite is to just set its graphics to all zeros, so you will need to update it on the
fly anyway, unless your player sprite is going to be a big vertical column spanning the entire
screen. The two missile sprites and the ball sprite
are each just a single bit, either on or off. There is a pair of registers that allows the
player and missile sprites to be stretched or duplicated on the scanline, but still only
8 bits per player and 1 bit per missile sprite exist. However, these bits can be updated between
the times that the copies are drawn, allowing for what looks like two distinct sprites on
the same line using only one sprite's data.

Say you have a small graphic, perhaps 8x8 pixels,
that you want to display at a specific spot on the screen, using the player zero sprite. Let's say 40 lines from the top of the screen,
and 60 pixels from the left side of the screen. Just write the value 60 to the X position
register and 40 to the Y position register, right? If only it were that easy. Neither of these registers exists; positioning
sprites has to be done manually by counting scanlines and machine cycles. Let's work on the horizontal component first. The easiest way of setting the horizontal
position of a sprite is by using the reset position registers. There are five of them, one for each sprite: RESP0, RESP1, RESM0, RESM1, and RESBL. When one of these registers is written to (the value
written doesn't matter since this is what's called a strobe register), the position of
the sprite will be set to the current location of the scanning beam. There is also a bit of delay, so it will end
up being a bit to the right of where the beam was--for player sprites that aren't stretched,
that offset is 5 pixels from where the store instruction ended. So, in order to get our sprite to show up
60 pixels from the left edge of the screen, we need the CPU to finish writing to this
register exactly when the beam is scanning the 55th pixel of a scanline. How do we know where the scanning beam is
at any point in time? There is no register to read the current position
of the beam. The way you do it is by syncing up the CPU
with the beam at a known position, and then keeping track of how far along you are on
your own, by counting how many machine cycles each instruction of your code uses. There is a special register called WSYNC which
pauses the CPU until the scanning beam finishes a scanline. This way, we know that the instruction that
runs after writing to WSYNC will execute at the very start of H-blank, which can be thought
of as being 68 'pixels' before the first visible pixel of the line. So after we sync up at the start of H-blank,
we have to wait these 68 'pixels' along with the 55 pixels of offset we want, before writing
to RESP0. 68 plus 55 is 123 pixels; at 3 pixels per
machine cycle, that's 41 machine cycles. The write instruction itself takes 3 cycles, so
we need to wait for 38 cycles. We need to load in the graphics for the sprite,
and that will take 5 cycles: 2 for an LDA instruction that loads a constant, and 3 for an STA to
the graphics register. This leaves 33 cycles to wait around. If there's nothing else to do on this scanline,
we can just use a bunch of NOP instructions, which do nothing but take 2 cycles to execute
each. There's no instruction that takes only 1 cycle, and 33 is odd,
so we need one 3-cycle instruction to get to exactly 33 cycles (3 plus 15 NOPs at 2 cycles each). That STA instruction takes
3 cycles, so we may as well just execute it twice. So for this first line of the sprite, the
code will look like this: the LDA to prepare the graphics data, the STA to store that data,
another STA that does nothing but use up 3 cycles, 15 NOP instructions, then an STA to
the RESP0 register, then an STA to the WSYNC register to wait for the next scanline to
begin. The horizontal position of every sprite is
persistent until another write to the reset position registers. So to draw the next 7 lines of our sprite,
we only have to update the graphics register. This makes the code much shorter since we
don't have to wait for anything; just load graphics, store them, and write to WSYNC. That will position the sprite in the correct
horizontal position, but we still have the vertical position to deal with. Counting 40 lines from the top of the screen
should be pretty easy, you think? Well, unlike with horizontal position and
using WSYNC, there is no way to sync up the CPU with the beam vertically. The Atari actually has no hardware that times
vertical syncing or blanking on its own; that has to be done in the game's code as well. Vertical syncing is what the television uses
to be able to tell where the start and end of each frame is. The vertical sync period is a small moment
where the video signal becomes 'darker than black', that is, a negative value. Vertical syncing can be enabled and disabled
with the special VSYNC register. In order for the picture to come through clearly,
VSYNC needs to be set for exactly 3 scanlines, then reset back to zero. This can easily be done by using the WSYNC
register to make sure these writes occur right at the start of H-blank. After resetting VSYNC back to zero, the vertical
blanking period begins. This also has a register associated with it,
fittingly called VBLANK. Blanking should be turned on (by setting bit 1 of VBLANK) at the start of the
vertical blanking period, and turned off again at the end. Just like how the horizontal blanking period
occurs when the scanning beam has to travel from the end of one line to the start of the
next, the vertical blanking period is when the beam has to travel from the bottom of
the screen to the top, starting a new frame. There are 37 lines in V-blank in the NTSC
format, or 45 lines in the PAL format. So, in order for our sprite to show up on
the 40th line, we have to wait 40 plus 37 equals 77 lines after the end of V-sync. We can use this extra time to actually process
game logic, if there were any. The V-blank period is one of the few times when
we don't have to worry about updating graphics for anything, so this is a great time for
handling stuff like that. In this example though, we have nothing else
to do other than showing this sprite, so we can just write to WSYNC a bunch of times to
wait out all this time. Make sure to clear the VBLANK register at
the appropriate time, then write to WSYNC 40 more times, which equals the number
of lines we want our sprite to show up below the top of the screen. After that, we can append all of the
code we wrote for setting up the graphics and horizontal positioning of the sprite.

Let's look at a few examples. We'll start with the title screen of E.T., since
it has two main elements that show a simple and a more complex way of doing things. First is the title text up here that says
E.T. The E is drawn with player sprite 0, the T
with player sprite 1; and the two periods are drawn with missile sprites 0 and 1. Here is the code that positions these sprites
on the screen. This loop runs 5 times, once for each of the
sprites (even the ball sprite, which is just disabled here). There is a write to WSYNC right at the start
of the loop, so only one iteration of the loop will run per scanline. Inside the main loop, there are two main sections. First, some value is read from the game's
ROM and written to this register, which I'll get to in a moment. Then here, the lower nybble of this byte is
used as a counter for how many times to run through this small inner loop. The purpose of this loop is simply to waste
time waiting for the scanning beam to reach a particular point. Then the write to the corresponding reset
position register occurs, before the X index is decremented and the main loop starts again. After the loop concludes, another WSYNC occurs,
and then a write to another register we haven't seen yet. These two new registers are the horizontal
movement and movement application registers. This is just another way to move sprites around. Remember how one CPU machine cycle takes just
as long as it takes for the beam to scan across three pixels? This means that if you only use the reset position
registers, you can only place a sprite in steps of 3 pixels. To locate sprites horizontally down to the
pixel, you need to use the horizontal movement registers. By writing to the upper 4 bits of an HM register,
you are queuing up a value that corresponds to a number of pixels to move the sprite,
going from 7 pixels to the left to 8 pixels to the right (the nybble is a signed number, and positive values move left). The movement doesn't happen when this value
is written, but instead the movement values queued up for all 5 sprites are applied when
the HMOVE register is written to. In this routine, you can think of the write
to the reset position registers as setting the 'coarse' horizontal position, the write
to the horizontal movement register as queuing the 'fine' horizontal position, and the write
to the movement application register as applying the 'fine' position. For example, the value retrieved from ROM
for the big E is $52. This byte is written to the HMP0 register,
which will correspond to a movement of 5 pixels to the left, since it only looks at the upper
nybble. Then, the lower nybble is used as the inner
loop counter--this branch will be taken twice. If we count out the machine cycles for this
routine, we see that each inner loop execution adds 5 extra machine cycles, which means the
coarse horizontal position moves in steps of 5 times 3 equals 15 pixels. This range is covered by the fine horizontal
position, since that ranges from 7 left to 8 right, a range of 16 pixels. For this big E sprite, the write to RESP0
finishes 36 machine cycles after the start of H-blank. This corresponds to 36 times 3 or 108 pixels. Subtract the 68 pixels from the H-blank period,
but add in a delay offset of 6 pixels this time (because the sprite is stretched), and
you get 46 pixels from the left edge of the screen. Then, after the loop concludes and the other
sprites' coarse positions and HM registers are set up, the HMOVE register is written
to. This shifts all the sprites over by a few
pixels corresponding to the value stored in the HM registers. The big E moved to the left 5 pixels, and
is now 41 pixels from the left side of the screen. Okay, that was the code to set up the position
of the 4 sprites, but now it's time to look at the code that updates the graphics each
scanline. Here is the code that does that. This first block is just run on the scanline
before the first line the sprite should appear. It does things like setting the color of the sprites,
making sure their graphics are not flipped horizontally, and setting the number and sizes
of the sprites. In this case, the player sprites are shown
stretched at 4 pixels per bit, and the missile sprites are 2 pixels wide. The letters E and T are 16 units tall, so
this loop is prepared to run 16 times. All this loop does is grab a couple bytes
from ROM and write them to the corresponding graphics registers. The missile graphics are
only written during the last 2 iterations of the loop, since the missiles only appear at the bottom
of the text. You'll notice that there are two writes to
WSYNC inside the loop, and that is because each unit of these letters is actually two
scanlines tall. So what looks like one large pixel here is
actually 4 pixels wide and 2 scanlines tall. Recall that if nothing changes from one scanline
to the next, no registers need to be written. Here, only a single DEX instruction is run
for the entire scanline. Once this loop finishes all of its iterations,
code for the other parts of the game starts to run, including drawing E.T.'s face here. This sprite is 48 pixels wide and 40 pixels
tall. Only the two player sprites are used to create
it. This is an example of a sprite that is drawn
by using the setting that copies the sprite and displays it three times in a single scanline,
with 8 pixel gaps in between them. Then, the two player sprites are horizontally
positioned 8 pixels apart, so that the entire 48 pixel range can be drawn. The sprite is split into 6 columns of 40 bytes each,
and each 8x1 strip is encoded as a single byte. This is all stored in a table somewhere in
the ROM. The code responsible for this large sprite
starts 3 scanlines in advance, since a few things need to be set up before actually drawing
it. In the first line, the sizing and positioning
of the two sprites are set up. This write to HMOVE is not important here.
A constant #$03 is loaded into the accumulator, and #$00 is loaded into the Y index register. The reflection registers are zeroed out here
and here, to make sure the sprite is not drawn flipped. The sizing registers are written to, which
display the sprites 3 times per line with 8 pixel gaps between the copies. Then this pair of registers is written to,
which we haven't seen yet. These are the graphics delay registers. When these registers are cleared, a write
to the graphics registers goes directly to those registers with no delay. When VDELP0 is set, the graphics for player
sprite 0 won't fully go through until the graphics for player sprite 1 is written. This way the graphics can essentially be applied
at the same exact time. Likewise when VDELP1 is set, the graphics
for player sprite 1 won't fully go through until the graphics for player sprite 0 is
written. Here, both registers are set, which means
each register's writes are delayed until the other register is written to. This is used for timing reasons: it allows
an extra graphics byte to be queued up, which is what lets this method of writing 6 bytes' worth
of graphics data quickly work at all. I'll explain a bit more about this when we
get to that point. Anyway, next, the graphics are zeroed out
in case they weren't already, making sure to write to each register twice because the
delay settings for both registers are enabled. Then, after a single NOP instruction to wait
an extra 2 cycles, the two writes to the reset sprite position registers occur. If we count the machine cycles here, that
works out to 39 and 42 cycles. Multiply by 3, subtract 68 for H-blank, add
5 for beam delay, and we get horizontal positions of 54 and 63 for the two sprites. Oops, that's a difference of 9, not 8. That's because this kind of STA instruction,
which writes the data to the register we need, takes 3 machine cycles, which translates to
an offset of 9 pixels. Two back-to-back writes are as close as we can get two sprites together
in a single scanline. No worries, we'll just use the HM register
to scoot the first player sprite to the right by one pixel to put it at horizontal position
55, and we're all good. The next scanline, all the pointers to the
sprite graphics data in ROM are set up in RAM. The first instruction is the write to HMOVE,
since writes to HMOVE should always come right after a write to the WSYNC register. The rest of this just sets up the six pointers
to the six columns of data for the sprite. This is done like this because the routine
that draws the sprite is actually a general routine that can be run with any graphic. So this is a prerequisite to calling that
routine. The third line just sets up the colors of
the sprites, and initializes the loop counter to #$27, or 39, since the loop needs to run
40 times, counting down from 39 to 0. Here is the JSR instruction that calls
the general drawing routine. Before the last setup scanline concludes,
the graphics data for the left-most column of 8 pixels is queued up by writing to GRP0. Since the graphics data delay registers are
enabled, this doesn't actually do anything yet, but once GRP1 is written to, this write
will take effect. I'm going to refer to the 6 bytes of graphics
for the six columns as A, B, C, D, E, and F.
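The positioning rule used throughout this video (the cycle on which the RESPx write ends, times 3, minus the 68 pixels of H-blank, plus a small delay) and the way bytes A through F tile the 48-pixel range can be checked with a short Python sketch. It is a model under stated assumptions, not the game's code: the function and constant names are hypothetical, and the delay offsets of 5 and 6 pixels are simply the figures quoted in the narration.

```python
# Sketch of the horizontal-position arithmetic described in this video.
# Assumptions: x is counted in pixels from the left edge of the visible
# picture, the RESPx write takes effect on the last cycle of the STA, and
# the delay offsets (5 normal, 6 stretched) are the narration's figures.
# All names are hypothetical.

H_BLANK_PIXELS = 68    # pixel clocks of H-blank before the first visible pixel
PIXELS_PER_CYCLE = 3   # one CPU machine cycle == 3 pixel clocks
NORMAL_DELAY = 5       # offset after RESPx for an unstretched player sprite
COPY_SPACING = 16      # "three copies close": 8-pixel sprite + 8-pixel gap


def coarse_x(end_cycle: int, delay: int = NORMAL_DELAY) -> int:
    """X position of a sprite whose RESPx write ends end_cycle cycles after WSYNC."""
    return end_cycle * PIXELS_PER_CYCLE - H_BLANK_PIXELS + delay


def byte_columns(p0_x: int, p1_x: int) -> dict:
    """Map bytes A-F of the 48-pixel sprite to the 8 screen pixels each covers.

    Player 0 draws A, C, E and player 1 draws B, D, F, one byte per copy.
    """
    names = "ABCDEF"
    columns = {}
    for copy in range(3):
        for player, base_x in enumerate((p0_x, p1_x)):
            start = base_x + copy * COPY_SPACING
            columns[names[2 * copy + player]] = range(start, start + 8)
    return columns


p0_x = coarse_x(39)  # RESP0 write ends on cycle 39 -> x = 54
p1_x = coarse_x(42)  # RESP1 write ends on cycle 42 -> x = 63
p0_x += 1            # HMP0 nudges player 0 one pixel right -> x = 55

for name, pixels in byte_columns(p0_x, p1_x).items():
    print(name, pixels.start, pixels.stop - 1)
```

Running it lists A at pixels 55 to 62 through F at 95 to 102: six contiguous 8-pixel columns. The same `coarse_x` helper reproduces the earlier numbers too: `coarse_x(41)` gives 60 for the first positioning example, and `coarse_x(36, delay=6)` gives 46 for E.T.'s stretched E before HMOVE shifts it 5 pixels left.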
So after this scanline, byte A is waiting in the temporary register associated with
GRP0. Finally WSYNC is written to, and we wait for
the first scanline where the sprite will actually show up. Here is the main loop, which is responsible
for writing the graphics data at the appropriate time. The general method here is to write the required
byte right before it is needed. The data is actually written to the graphics register
on the last machine cycle of the write instruction. So a write instruction needs to completely
finish executing before the beam even begins drawing that part of the sprite. However, that means the write instruction
can begin before the beam is done scanning the previous copy of that player sprite. This is good, because a write instruction
takes a minimum of 3 machine cycles, or 9 pixels, which won't actually
fit in the 8 pixel gap between two copies of a single player sprite. Before looking at exactly how the code does it,
let's try a naive approach, adjust it until it actually works, then compare the two. Let's suppose the graphics delay registers
are not set for now--we'll see why we need them enabled soon. In order to have byte F drawn to the screen
in this exact position, the third write to GRP1 needs to finish at the latest 1 pixel
before the beam reaches the start of this copy of the sprite. Why one pixel? Because one machine cycle translates to 3
pixels, an instruction can't start or end on just any pixel, only on every third one.
of every 3 pixels. And it turns out this 3 pixel machine cycle
grid lines up right here. So if this write instruction must end by this
point, and it takes 3 machine cycles, or 9 pixels, it must start here. This means the third write to GRP0 to write
byte E can start no later than here, the second write to GRP1 for byte D no later than here,
and the second write to GRP0 for byte C no later than here. We are cutting it close here, but remember
that the data actually gets written during the last cycle, so the fact that these two
cycles writing to GRP0 overlap with the first copy of the player 0 sprite is completely
fine. Then we can tack on the two initial writes
to the graphics registers here and here. Just to make it a bit more clear, let's draw
out which machine cycles actually correspond to sprite data being written. Now, this would work, but there is one problem. We can't have six write instructions in a
row, especially if all six data values are different. The 6507 CPU only has three internal registers
for general-use data. So it can only hold three values at a time. We'll need to stick some read instructions
or data movement instructions in here to actually move six bytes around this quickly. The first easy step we can do is push these
first two writes to the graphics registers way out here. There's nothing stopping these from being
super early, since the sprite doesn't exist over there; there's nothing wrong with writing
to the registers so early for the first two bytes. Since there's room, we can even insert
a read instruction before each of these two writes. This leaves 4 bytes to move around quickly. We can preload three of these bytes into the
A, X, and Y registers before the sprite is drawn as well. This way we can write three bytes in sequence
easily. The problem occurs with this fourth byte. We can't squeeze in a fourth read instruction
while the sprite is being drawn. Even if we use a data movement instruction
which is only 2 machine cycles and try to be fancy by utilizing the stack pointer register
S to temporarily hold a fourth value, it just won't work. The issue is right here, since this instruction
to write byte C can't come any earlier or else it will conflict with byte A being drawn
to the screen. This is why we need to set the graphics data
delay registers. Doing this will give us an extra buffer for
free, since the data is held in that temporary internal register waiting to be written, which
will act as our fourth register. So writing byte A to GRP0 doesn't actually
affect anything at first, then writing byte B to GRP1 applies byte A to the sprite graphics,
then writing byte C to GRP0 applies byte B, and so on. Now, we can actually shift the second write
to GRP0 to the very front as well, allowing us to use the accumulator to store byte F.
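The trick of using the delay registers as a fourth storage slot can be modeled in a few lines of Python. This is a simplified sketch, assuming each graphics register has a pending copy (the 'temporary internal register' above) and a displayed copy, and that a write to one player's graphics register releases the other player's pending byte; the class and method names are invented for illustration.

```python
# Sketch of GRP0/GRP1 with both VDELP0 and VDELP1 set.
# Assumption: each player has a "new" register (what the CPU writes) and an
# "old" copy (what is displayed while its delay bit is set). Writing GRP0
# copies new GRP1 into old GRP1; writing GRP1 copies new GRP0 into old GRP0.
# Names are hypothetical.


class DelayedPlayerGraphics:
    def __init__(self):
        self.new = [0, 0]  # last values written to GRP0 / GRP1
        self.old = [0, 0]  # delayed copies that actually reach the screen

    def write(self, player, value):
        """Model a store to GRP0 (player 0) or GRP1 (player 1)."""
        other = 1 - player
        self.new[player] = value
        # The same write releases the *other* player's pending byte.
        self.old[other] = self.new[other]

    def displayed(self):
        return tuple(self.old)


tia = DelayedPlayerGraphics()
# Write order from the routine: A, B, C, D, E, F, then a dummy GRP0 write.
writes = [(0, "A"), (1, "B"), (0, "C"), (1, "D"), (0, "E"), (1, "F"), (0, "dummy")]
for player, byte in writes:
    tia.write(player, byte)
    p0, p1 = tia.displayed()
    print(f"GRP{player} <- {byte:<5}  on screen: P0={p0}  P1={p1}")
```

Printing the state after each write shows every byte reaching the screen one write after it was stored. In particular, byte F is still pending after the sixth write and only becomes player 1's displayed graphics after one more write to GRP0, which is why the routine needs a final dummy write.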
So after the third write, we queue up the three CPU registers, A, X, and Y, with the
three remaining values. We strategically write to each graphics register
when the beam is scanning over the opposite sprite. After the sixth write, all of the data is
written, but that last byte is still stuck in the temporary internal register. In order to get byte F actually applied, we
just have to execute a dummy write to the opposite graphics register, GRP0 in this case. This will write that last value just in time
for the beam to finish drawing the rest of the sprite. If we look at the game's code, this is essentially
what is happening. The main difference is that reading from ROM
indirectly via a pointer like this takes 5 cycles each instead of 2 like in our mock-up,
which makes things a bit more difficult. The first write to GRP0 occurs at the end
of the previous line like we saw. The first write to GRP1 and second write to
GRP0 occur at the start of the scanline. At this point, the two graphics registers
hold bytes A and B, while byte C is ready to go in the temporary register. Then, bytes D, E, and F are read from ROM
and prepared. The Y index register is actually used to fetch
the graphics data from ROM, so we have to clobber that last, and use a temporary memory
byte to hold the value for the accumulator. Then, surprisingly without any waiting required,
bytes D, E, and F are written to their respective graphics data registers, and the final dummy
write to GRP0 happens after that to actually put byte F in the graphics register. The loop counter is decremented and we jump
back up and prepare for the next line. Finally, after the entire sprite has been
drawn, the loop falls out, and the graphics data, sprite size, and graphics delay
registers are all zeroed out.

One last example, going back to the playfield. The cubes in the stage layout of Q*bert are
made via the playfield. You'll see it's not quite symmetric due to
these center cubes in the rows with an odd number of cubes. Also, the playfield seems to be using 3 or
4 colors at once, instead of just one like I said before. Let's look through one scanline and see how
this is accomplished. Just like how the graphics data for things
can be changed mid-scanline, so can the colors. Only one color can be used at a time, but
that one color can change over time. So the playfield can be blue on a black background
or yellow on a black background, but it can't be yellow and blue at the same time. The COLUPF register is responsible for controlling
this color. In this scanline, there are 3 cubes; the color
values for the cubes are already stored in memory, so they are just moved from there
to the proper register at the correct time. Because of the odd size of the cubes, the
playfield registers, namely PF2, need to be updated mid-scanline. This is just so that the center cube doesn't
show up larger than the others; however, it does result in this large zig-zag shaped gap. The register is written at just the right moment
so that the two halves of the cube show up properly even though they share the same bits
of data. Along with the weird gap down the middle of
the screen, the mirrored playfield is also the reason why the shading on the cubes is
inconsistent.

Thank you for watching. Retro Game Mechanics Explained has a Patreon
and SubscribeStar page if you would like to be generous and help support the channel. Otherwise, watching the video, leaving a comment,
and sharing it with your friends is a great way to help too. If you would like to see more about the Atari
2600, please leave a comment saying so!