NES Emulator Part #4: PPU - Background Rendering

Reddit Comments

Must admit you've put a lot of work into this and done it justice. Well done, loving it!

Score: 1 · u/grant2258 · Oct 01 2019
Captions
Hello, and welcome to part 4 of my NES emulator series. If you've not seen the other parts, I suggest you do so before watching this one. I also want to fit in a small disclaimer before I start. At no point do I claim that this emulator is 100% faithful to the original hardware. It is likely not cycle accurate, and it probably isn't compatible with all of the NES games out there, but that's not the point. What I'm trying to do with this series is explain an approach to emulating a system, and it just so happens I've chosen a system which means a lot to me.

This video is the first of two videos describing the Picture Processing Unit, or PPU. This is the device on the NES's main hardware that is responsible for producing the images we see on the screen. Specifically, in this video we're going to look at how the NES stores and renders the graphics it uses for the background. And if you would indulge me for a moment, I'd like to thank all of my Patreon and YouTube channel members for their support.

You may recall from the previous video that the PPU has access to three memories on its address bus:

- An 8-kilobyte pattern memory, or character ROM. It can be a RAM, but for this video I'm going to assume it's a ROM.
- A 2-kilobyte nametable memory, which is mirrored through this address range. You'll occasionally hear me refer to it as VRAM. The configuration of this memory is non-trivial, and we'll look at it in detail.
- A small memory that holds the palette information, which decides which colors are displayed on the screen.

Broadly speaking, the pattern memory contains the sprites, i.e. what things look like, stored as bitmapped images. The nametable describes the layout of the background, and the palette memory contains the colors. This video focuses specifically on rendering the backgrounds of games. So even though the pattern memory handles sprites, don't assume it is exclusively for sprites. It really just contains the graphical information that all parts of the PPU require to draw things on the screen.

The 8-kilobyte pattern memory can be broadly split into two 4-kilobyte sections. Each section is a 16 by 16 grid of sprites. A sprite can also be called a tile, and each tile is 8 by 8 pixels, so if we viewed a section as an image it would be 128 pixels wide by 128 pixels tall. The PPU can select either the left-hand or the right-hand side of this memory as the source for its drawing. Assuming the pattern memory lies solely on the cartridge, we have to go via the mapper to access it. That means the mapper can selectively swap out sections, giving access to different sprites, and in fact this is how many sprites are animated on the NES.

8 by 8 pixels per tile doesn't seem like very much, so the sprites and background elements we see in a game usually consist of multiple tiles. It's quite common to see this memory split so that one half handles sprites, the characters that jump about and move around the screen. The other half handles background tiles, which make up things like the scenery. Never underestimate how creative an artist can be with such a limited number of tiles to choose from. Famously, in Super Mario Bros. the clouds use the same tiles as the bushes, just in a different color, so tiles can be reused.

The tile is an 8 by 8 bitmap, but it's not like the bitmaps you've seen in Windows Paint. The NES uses only 2 bits per pixel, which gives a choice of four colors per pixel, and it stores each tile in a quite convenient way. Since we only have four colors, a pixel can take the values 0, 1, 2 or 3. The information for a tile is stored in two bit planes, which I'll call the least significant bit plane and the most significant bit plane. A pixel's value combines the two bits at the same location in each plane. The bit from the least significant plane is worth 1, and the bit from the most significant plane is worth 2. So a pixel with value 0 has a 0 in both planes, and a pixel with value 1 has a 1 in the least significant plane and a 0 in the most significant. Value 2 has a 0 in the least significant plane and a 1 in the most significant. And of course, 3 has a 1 in both. For reasons that will become clearer a little later, pixel values of 0 can be considered transparent.

Storing tiles in bit planes like this is quite convenient, because it means we don't have to do lots of shifting and bitwise manipulation to extract the data. As you may have noticed, each row of a bit plane is 8 bits across, which is a single byte. In memory, a tile is stored as the entire least significant bit plane followed by the entire most significant bit plane, so it takes 16 bytes to store an entire tile. (I probably should have drawn those in hex.) Going back to our pattern memory, once we've found the offset of the tile we're interested in, we can simply read the next 16 bytes to get the entire tile.

The 2 bits of a tile's pixel are not enough to specify its color; we need to combine them with a palette. The palette memory is structured as follows. At address 0x3F00 we have a single entry for the background color. This is an 8-bit value that indexes one specific color the NES is capable of displaying; I'm showing here the NES palette generated by Bisqwit. So if I wanted the background color to be, say, cyan, I'd look up the appropriate color in the palette table, which in this case is 0x2C. Whereas 0x3F00 stores one byte, the other entries in this memory map store four bytes each. Even though there are four possible entries, the fourth byte isn't used, so let's try and understand why. If we number our palettes 0 through 7, we can combine the chosen palette ID with our 2-bit pixel value to select the memory location that contains the color we want.

Let's assume the palette ID is 1 and the pixel value is 3. Each palette entry consists of 4 bytes, so we multiply the palette ID by 4 and add the pixel value. Our palette memory starts at 0x3F00, so we index from that address. Counting 0, 1, 2, 3, 4, 5, 6, 7 gives us entry 3 of palette 1 (address 0x3F07), which is what the pixel and palette ID specified. Curiously, if the pixel value is 0, the total offset would be 4, which takes us to our "unused" location. A pixel value of 0 means the pixel should be transparent, i.e. the background color should show through. So when I said earlier that this palette entry was unused, that was a little misleading. In effect it mirrors the background color address, as do all of the fourth entries for each palette.

This clever architecture means that no matter which palette you select for your drawing, you effectively get a bonus color, the background color. That bonus color can make parts of the item you're drawing look transparent. This is most obvious for sprites, but it applies to background tiles too. I think this whole arrangement is a wonderful example of the designers thinking about the most efficient memory strategy. It requires the minimum amount of computation to produce the final output color for a pixel, depending on whether or not that pixel can be seen. One final partition of this memory is that the first four palettes are used for the background and the last four for the foreground, or sprites. You'll hear me use the terms foreground and sprites interchangeably.

I think it'll be quite fun to visualize the pattern memory for different games. I'm going to start with the code just as we left it in the last episode. Because there's a lot to get through in this video, I'm not going to go through the code byte by byte as I did in the CPU video. Instead, I've provided an accompanying source file that explains every facet of the PPU at this stage in some detail, and you may find it very useful to consult while watching.

We finished the previous video with a rudimentary skeleton of the PPU. It didn't do anything except artificially generate some noise, but it was connected correctly to the CPU bus and to the cartridge via mapper 0. We'd already filled in an array of all the colors the NES is capable of displaying. We'd also created two sprites in an array called sprPatternTable, which we'll use to visualize the pattern memory. So let's implement this function, because I think doing so will give us a real understanding of how palettes and bitmaps work together.

We know that a pattern table contains 16 by 16 tiles, so I'm going to create two nested for loops to iterate through them. I want to convert my 2D coordinate into a 1D coordinate to index the pattern memory. We've used y times width plus x many, many times, and I mention it in many videos, but this one is slightly different. Here the width is 256, because a single tile consists of 16 bytes and there are 16 tiles in a row of the pattern memory. So this offset is the byte offset of each tile in that memory. Each tile has 8 rows of 8 pixels, so for completeness I'll add loops over the 8 rows and the 8 columns.

To read from the pattern memory we use our ppuRead function. It places an address on the PPU's address bus and gets the data from wherever it needs to come from; hopefully the mapper will sort out that translation for us. Calculating the address is quite easy. We have two pattern tables to choose from, and we pass in an index i depending on which one we want the sprite for. A full pattern table is 4 kilobytes, and within it we offset by the tile offset we've just calculated. A single row of a tile's pixels is one byte, so I also offset by the row. We read the least significant bit plane before the most significant bit plane, so, purely for visual reasons, I put a + 0 here. The corresponding most significant byte is at exactly the same address plus an additional 8.

Together, this little routine gives me two bytes, each containing eight pixels' worth of data, and we need to combine them into a final pixel value between 0 and 3. I take the least significant bit of each byte and give the bit from the most significant plane a weight of 2. Simply adding the two bits would only give 0 to 2, whereas weighting them this way gives the full range 0 to 3. Each time we come round this loop, we shift both bytes right by one, so that the next pixel's bit becomes the least significant bit.

Now that we have a pixel value, we can draw it into the sprite that represents this section of the pattern memory. The x-coordinate is the tile's x location in pixel space, plus 7, minus the column. The least significant bit of our tile byte refers to the rightmost pixel in the tile, but we're drawing from the top left of the tile, so we need to flip it on the x-axis. The y-axis is much simpler. Finally, we need to choose the color, but we've not used a palette yet; all we have is the pixel's 2-bit value. So I'm going to create a function called GetColourFromPaletteRam, which takes a palette ID and the pixel value and returns the final screen color. That means I also need to add a palette parameter to this function.

So let's add the GetColourFromPaletteRam function. As we saw, we multiply the palette ID by 4 and add the pixel value. That value is an offset into the palette memory, so we perform a ppuRead of that location. The value returned is an index into the NES's color palette, which we've stored in the palScreen array, so we can return the color directly.

The ppuRead and ppuWrite functions are empty at the moment; they don't do anything. So let's add in the three main memories. We can defer to the mapper on the cartridge to handle relocation of the requested addresses. The pattern memory sits between 0x0000 and 0x1FFF, and the nametable memory sits between 0x2000 and 0x3EFF. Finally, the palette memory sits right at the end, from 0x3F00 to 0x3FFF. Whatever we do for reads, we also do for writes. In our PPU class we added these memories as arrays for the nametables, the palette and the patterns. For the palette array, we select the index by masking the bottom 5 bits. I also hard-code the mirroring, so that 0x3F10, 0x3F14, 0x3F18 and 0x3F1C map onto 0x3F00, 0x3F04, 0x3F08 and 0x3F0C. All that's left is to read directly from that memory location, and naturally we'll do something very similar for writes.

Since we want to visualize the pattern memory, it would be useful to be able to read from it, and that's quite simple. This address range is the pattern memory array. The first dimension chooses the left- or right-hand half of that array, by examining the most significant bit of the PPU address. The offset into that half is calculated by masking the remaining bits of the PPU address. Pattern memory is usually a ROM, but just in case, I'm also going to add it to ppuWrite, because on some cartridges this memory is in fact a RAM.

Let's go back to the main application. This is the Pixel Game Engine derivative we're using to visualize and interact with the emulator. To this class I'm going to add a variable for the selected palette, so the user can choose which palette is used to draw the pattern tables. They do this by pressing the P key, which simply increments the value and wraps it around. Drawing the pattern tables is now quite simple. We draw the sprite by calling the PPU function GetPatternTable, which renders the pattern table instantaneously for us. We want to look at both tables, so we pass in the index 0 or 1, along with the current palette we want to view them in. While I'm here, I'm also going to draw the palettes and show which one is selected. I won't go into detail on this, because it's unrelated to the emulation and purely cosmetic for the user interface.

So let's take a look. We now have two gray rectangles in the bottom corner representing our pattern memories. If I press the P key I scroll through the eight available palettes, but everything's gray and boring. That's simply because the program hasn't run yet, so it hasn't established what the palettes should be. Yet even after running for a little while, there's still no visible information here. What's going on? If I step through the program, we can see it doesn't get very far. It's trying to read from 0x2002 on the CPU address bus, it never gets the value it's looking for, and so it's stuck in this loop.

0x2002, and in fact the whole range 0x2000 to 0x2007, are important registers. They control the PPU and reveal its status to the CPU, so before we can go any further we need to look at them. The CPU talks to the PPU through 8 registers (it's actually 9, but we'll look at the ninth in the next episode). On the CPU address bus these sit from 0x2000 to 0x2007, although they are mirrored every 8 bytes up to 0x3FFF.

- 0x2000 is the control register, which configures the PPU to render in different ways.
- 0x2001 is the mask register, which decides whether backgrounds and sprites are drawn and what happens at the edges of the screen.
- 0x2002 is the status register, a very important one. For this video, it tells us when it's safe to render.
- I'm going to leave the next two registers out of this video.
- 0x2005 is the scroll register. Through this register we can represent game worlds far larger than what we can see on the screen: as Mario runs to the right towards the flag, the level is effectively scrolling to the left.
- 0x2006 and 0x2007 are the address and data registers, which let the CPU read and write the PPU's memory directly.

The full addressable range of the PPU is 14 bits, but the CPU can only transfer 8 bits per write. So it has to make two successive writes to set the address: the first sets the high byte and the second sets the low byte. The actual data is then written and read through the data register.

In the PPU header file I'm going to add some structures made of bit fields to represent the important registers. Here is the status register. It has three important bits: one for vertical blank, one for sprite zero hit and one for sprite overflow. I've unioned them with an 8-bit register so we can also access it numerically; refer to the first episode of this series to understand how this structure works. The mask register is really just a series of switches that turn parts of the PPU on or off, and the important ones here are rendering sprites and rendering the background. Finally, we have the control register. I fully appreciate that's a huge dump of new material, but I intend to explain it as we go along.

The test program I ran before got stuck because it was reading the status register; in particular, it was checking the vertical blank bit. So let's take a moment to understand the order of events while a frame is rendered. In the last video I equated two rather strange variables, scanline and cycle, to y and x values on our screen. There is certainly a relationship, but it's not one-to-one. Scanlines represent the horizontal rows across the screen, from the days when an electron gun fired at the phosphorescent material at the front of the screen. The NES's resolution is effectively 256 pixels across by 240 pixels down, but the scanline can go beyond these dimensions.

As the scanline goes across, it counts cycles, and we crudely estimated that one cycle equals one pixel along the scanline. Since the scanline goes beyond the edge of the screen, so does the cycle count; in fact there are 341 cycles per scanline. If you read the NESdev documentation closely, you'll see that some of these figures are slightly simplified. At the end of each scanline the gun switches off and returns to start the next one. It keeps doing this all the way down the screen through the visible area, so there are 240 visible scanlines. But it doesn't stop at the bottom: there are quite a few more scanlines below the screen, giving 262 scanlines in total.

This period of unseen scanlines is known as the vertical blanking period, and it's important that the game knows when it starts. If the CPU talks to the PPU while a scanline is being drawn across the screen, it can inadvertently cause all sorts of graphical artifacts. Some advanced games deliberately exploit this, but in our simple case it would just produce noise that doesn't look very nice. So the CPU can do other processing while the screen is being drawn, but it shouldn't change the state of the PPU. Once the vertical blanking period has started, however, the CPU can change the PPU's state freely. We can't see the output then, so it's free to make it as messy as it likes. This is typically when the CPU sets up the PPU for the next frame.

For our emulation, we assume that after the last scanline we don't jump back to scanline 0. Instead we use a conceptual scanline -1, and we'll see more about that when we start rendering the background tiles a little later.
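The frame geometry described above can be sketched as a pair of counters. This is a minimal C++ sketch, not the series' code. It assumes 341 cycles per scanline and 262 scanlines numbered -1 (the conceptual pre-render line) to 260. FrameTimer and its members are hypothetical names. The nmiEnabled flag stands in for the control register's enable-NMI bit.

```cpp
// Minimal sketch of NTSC PPU frame timing (hypothetical names).
// Assumes 341 cycles per scanline and scanlines -1..260 (262 in total),
// where 0..239 are visible and 241 onwards is vertical blank.
struct FrameTimer {
    int cycle = 0;             // 0..340, roughly one pixel per cycle
    int scanline = -1;         // -1 is the conceptual pre-render line
    bool verticalBlank = false;
    bool nmiEnabled = false;   // stands in for the control register's NMI bit
    bool nmiRequested = false; // a bus would forward this to the CPU
    bool frameComplete = false;

    // Advance the PPU by one cycle.
    void clock() {
        // Entering vertical blank: scanline 241, cycle 1.
        if (scanline == 241 && cycle == 1) {
            verticalBlank = true;
            if (nmiEnabled) nmiRequested = true;
        }
        // Leaving vertical blank: start of the pre-render line.
        if (scanline == -1 && cycle == 1) {
            verticalBlank = false;
        }

        ++cycle;
        if (cycle >= 341) {        // end of scanline: move to the next one
            cycle = 0;
            ++scanline;
            if (scanline >= 261) { // past line 260: wrap to -1
                scanline = -1;
                frameComplete = true;
            }
        }
    }

    // True while the counters are inside the 256 x 240 visible picture.
    bool inVisibleArea() const {
        return scanline >= 0 && scanline < 240 && cycle >= 1 && cycle <= 256;
    }
};
```

Clocking a fresh FrameTimer 341 × 262 times brings it back to scanline -1, cycle 0 with frameComplete set. Vertical blank stays raised from scanline 241 until cycle 1 of the following pre-render line. Checking a single flag at the bus level, rather than calling into the CPU directly from the PPU, keeps the two devices decoupled.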
So the vertical blank bit in our status register tells us whether we're drawing the visible screen or are in the unseen period, where we're free to make a mess. At this point we can also optionally send an interrupt request to the CPU, using the 6502's non-maskable interrupt (NMI) feature. Whether or not we emit this interrupt is governed by a bit of the PPU's control register. The vertical blank bit and the non-maskable interrupt together are used to synchronize the PPU with the CPU so that things on the screen look normal. It's important that the CPU finishes whatever it's doing while the screen is being rendered. Otherwise we get lag, and the CPU may have to wait an entire frame before the screen is updated. You saw this a lot in NES games when things got busy on screen, and for exactly the same reasons you see it in modern games too.

In the previous video we left placeholders for these registers, so it's time to start filling some of them in. Let's start with the control register. For a write, we simply assign the data being written to our control register, and since it's convenient, we do the same for the mask register. You can't write to the status register, and I'm ignoring the next two registers for this video. We'll also come back to scroll a little later.

To handle the address and data transfers I need a few variables. I need to know whether I'm writing the low byte or the high byte, so I'll create a variable called address latch that records which. Data read from the PPU is in fact delayed by one read, so we need to buffer that byte. I'll also add a 16-bit variable to store the assembled address. So if we're writing to the address register and the address latch is 0, I store the lower eight bits of the new PPU address. Then I set the address latch to 1, which primes it for the next write to the address register, where I store the high byte. Once I have the full address, I can write data using my ppuWrite function.

In the cpuRead implementation, you can't read from the address register; that doesn't make sense. We can read from the data register, but the result is delayed by one read. So I transfer the current contents of my buffer to the output data, and then read the value at the current address into the buffer.

Here we hit the first of many quirks of emulation. This delayed read applies to almost all of the PPU address range, except where the palettes reside, and there are various hardware reasons why this could be the case. Everything is synchronous to the PPU's clock, so before a memory can output a value it needs to be primed with an address. That all takes clock cycles, which is why you get a delay. However, some types of small memory don't need this delay. They work using combinatorial logic and can output data within the same clock cycle. It takes a little time, but the designer normally ensures that the propagation time through that circuit gives correct results within the clock cycle. I believe the palette memory is built exactly this way. So we need a special case for palette addresses, where the data is returned immediately instead of waiting for the next read. I know some of you are thinking ahead and realizing that none of this can possibly work in an actual emulation. You're quite right, but we'll come back to that when we implement background scrolling.

The only other register we're interested in reading right now is the status register, and reading it also changes the PPU. That's quite an unusual concept to get your head around: the mere act of reading changes the state of the device. When we read the status register, we're only interested in the top three bits. The unused bits tend to be filled with noise or, more likely, with whatever was last in the PPU's internal data buffer. I don't think any legitimate NES games rely on this behavior, but it is documented on NESdev, so I've included it here. You could probably get away without this part at all.

Interestingly, reading the status register also clears the vertical blank flag. So whether or not you were in vertical blank, as soon as you read the status to find out, the flag is reset to 0. As well as clearing the vertical blank flag, reading the status register resets our address latch to 0.

Hang on, this reminds me of something we've just done incorrectly. At the moment, when writing to the address register, I set the low byte first and then the high byte. That's wrong, and I apologize: it should be the high byte first and then the low byte, so I'm going to swap these round.

We know our program was getting stuck reading the status register, so I'm going to hack something in just to make progress. The status read will always return a 1 for the vertical blank bit, which gets the program past the point where it was stuck. Let's take a look. I'll run the program, and... great, I'll just pause that. We're now getting some sprite information in our pattern tables, but it looks a little off to me. This is the nestest ROM, and I know it shouldn't look like this; these aren't its colors. The palettes aren't quite right, and it hasn't really made any effort to set them to something useful.

We're missing a fairly important part of what happens when the CPU writes data to the PPU. It would be very tedious for the programmer to write 2 bytes of address and then one byte of data every time. Most of the time programmers write data to successive locations, so the PPU provides a facility for this. It automatically increments the PPU address after each write to or read from the data register.
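The register behavior described above can be pulled together into one short C++ sketch. It covers writing the address high byte first, the one-read delay on 0x2007 with the palette exception, the side effects of reading 0x2002, and auto-increment. It is a sketch under stated assumptions, not the series' implementation. PpuRegisters and its members are hypothetical names. PPU memory is modeled as a flat 16 KB array instead of being routed through the mapper and nametable mirroring. The increment-by-32 mode (bit 2 of the control register) is a real hardware feature that goes beyond the simple increment used in the video.

```cpp
#include <array>
#include <cstdint>

// Hypothetical sketch of the CPU-facing PPU registers 0x2000-0x2007.
// PPU memory is a flat 16 KB array here, purely to keep it self-contained.
class PpuRegisters {
public:
    uint8_t control = 0x00;                  // 0x2000
    uint8_t mask = 0x00;                     // 0x2001
    uint8_t status = 0x00;                   // 0x2002, bit 7 = vertical blank
    std::array<uint8_t, 0x4000> memory{};    // 14-bit PPU address space

    void cpuWrite(uint16_t addr, uint8_t data) {
        switch (addr & 0x0007) {             // registers mirror every 8 bytes
        case 0x0: control = data; break;
        case 0x1: mask = data; break;
        case 0x6:                            // address: high byte, then low
            if (addressLatch == 0) {
                ppuAddress = uint16_t(((data & 0x3F) << 8) | (ppuAddress & 0x00FF));
                addressLatch = 1;
            } else {
                ppuAddress = uint16_t((ppuAddress & 0xFF00) | data);
                addressLatch = 0;
            }
            break;
        case 0x7:                            // data: write, then auto-increment
            ppuWrite(ppuAddress, data);
            increment();
            break;
        default: break;                      // status read-only; 3-5 omitted
        }
    }

    uint8_t cpuRead(uint16_t addr) {
        uint8_t data = 0x00;
        switch (addr & 0x0007) {
        case 0x2:                            // status: top 3 bits are real
            data = uint8_t((status & 0xE0) | (dataBuffer & 0x1F));
            status &= 0x7F;                  // reading clears vertical blank
            addressLatch = 0;                // and resets the address latch
            break;
        case 0x7:
            data = dataBuffer;               // normally one read behind...
            dataBuffer = ppuRead(ppuAddress);
            if (ppuAddress >= 0x3F00) data = dataBuffer;  // ...but not palettes
            increment();
            break;
        default: break;
        }
        return data;
    }

private:
    uint8_t addressLatch = 0;                // 0 = expecting the high byte
    uint8_t dataBuffer = 0x00;               // delayed-read buffer
    uint16_t ppuAddress = 0x0000;

    uint8_t ppuRead(uint16_t a) const { return memory[a & 0x3FFF]; }
    void ppuWrite(uint16_t a, uint8_t d) { memory[a & 0x3FFF] = d; }

    // Control bit 2 selects +32 (one nametable row) instead of +1.
    void increment() {
        ppuAddress = uint16_t((ppuAddress + ((control & 0x04) ? 32 : 1)) & 0x3FFF);
    }
};
```

After writing 0x21 and then 0x08 to 0x2006, successive 0x2007 reads return the stale buffer first and then the bytes at 0x2108, 0x2109 and so on. A palette address such as 0x3F01 is returned on the first read. Resetting the latch on a status read gives the CPU a known state before it writes an address.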
Let's try it again. That's better: we can now see a full range of colors in the palettes. As I press the P key I can scroll through them and see how the pattern table looks with each palette applied. For demonstration purposes I've significantly simplified this part of the emulation. In the second part of this video we'll see how it gets a lot more complex, because the choice of tiles depends on where the scanline is.

We're now going to get into the serious side of rendering the background. Before we do, let's add the timing that emits the non-maskable interrupt at the end of a frame. We know precisely when we enter vertical blank: when the scanline is 241 and the cycle is at its beginning, cycle 1. At that moment I set the vertical blank bit to 1. If the enable-NMI bit is set in the control register, I also set an nmi variable to true, which is a boolean I've added to the header file. We need to get the fact that the interrupt was emitted from the PPU to the CPU, and I do that in the bus. It's very simple: if the flag is true, the bus calls the CPU's nmi function and resets the flag on the PPU. We also know when we leave the vertical blanking period: we're effectively back at the top left of the screen, at the start of scanline -1, so there I set the vertical blank bit to 0. Now the status register accurately reflects the vertical blanking period, and we can optionally emit the non-maskable interrupt. That means I can get rid of my little hack.

At this point I thought it would be interesting to see whether other games display their pattern memory too. Here is the pattern memory for Super Mario Bros., and you can just about make out the sprites that make up a certain Italian plumber. On the left are the sprites of the characters, and on the right are the tiles that make up the background scenery. As I change the palettes, we can see the colors change too. Let's try another one. Again we have background tiles and sprites, and this time they change color on their own. That's not me changing the palette; the program itself has changed it. This is encouraging stuff.

Let's have a quick recap of where we're up to, because it's about to get complicated. We have some fixed memories for the nametables, the palette and the pattern memory, and we've created three registers. The status register tells us where we are in the rendering process, and we haven't really looked at the mask register yet. From the control register, all we've used so far is the enable-NMI bit, to emit the correct interrupts. I've also temporarily created a register called PPU address, so we can visualize the pattern memories while the program runs. In reality this is a dramatic oversimplification of how this address works, and that's what we'll look at next.

Now that we've seen how graphical information is stored on the NES, we can get to the heart of this video: how backgrounds are stored and then rendered. Backgrounds in NES games usually make up the level scenery and are more or less static compared with the foreground objects, known as sprites. The background is stored in a nametable in the nametable memory. Here I have a nametable. It's one kilobyte in size, so it's 32 entries across by 32 entries down. Each entry is a single byte, and that byte is an ID into the pattern memory we saw earlier. A pattern table is a 16 by 16 grid of 8 by 8 tiles, so, quite conveniently, there are 256 possible tiles for each nametable location. Since each tile is 8 by 8 pixels, we can now see where the NES gets its resolution: 32 times 8 is 256 pixels across, and 32 times 8 is 256 pixels down.
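A nametable entry is just a tile ID, so getting from a nametable byte to a palette RAM address combines the bit-plane decoding and palette arithmetic described earlier. This C++ sketch assumes the pattern-memory layout used in this video: two 4 KB tables of 256 tiles, each tile being 8 bytes of least significant bit plane followed by 8 bytes of most significant bit plane. tilePixel and paletteAddress are hypothetical helper names. paletteAddress maps pixel value 0 straight to 0x3F00, which reflects the shared background color rather than reading each palette's "fourth" entry.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// 8 KB of pattern memory: table 0 at 0x0000, table 1 at 0x1000.
using PatternMemory = std::array<uint8_t, 0x2000>;

// Decode the 2-bit value of pixel (col, row) of a tile.
// 'table' picks the left (0) or right (1) pattern table; 'tileId' is a
// nametable byte. Bit 7 of each plane byte is the leftmost pixel.
uint8_t tilePixel(const PatternMemory& pat, int table, uint8_t tileId,
                  int row, int col) {
    assert(table >= 0 && table < 2 && row >= 0 && row < 8 && col >= 0 && col < 8);
    const int base = table * 0x1000 + tileId * 16 + row;
    const uint8_t lsb = pat[base + 0];   // least significant bit plane row
    const uint8_t msb = pat[base + 8];   // most significant bit plane row
    const int bit = 7 - col;
    // The MSB-plane bit is worth 2 and the LSB-plane bit is worth 1.
    return uint8_t((((msb >> bit) & 0x01) << 1) | ((lsb >> bit) & 0x01));
}

// Palette RAM address for palette ID 0-7 (0-3 background, 4-7 sprites)
// and a 2-bit pixel: 0x3F00 + palette * 4 + pixel. Pixel value 0 is
// transparent, so it always resolves to the background color at 0x3F00.
uint16_t paletteAddress(uint8_t palette, uint8_t pixel) {
    assert(palette < 8 && pixel < 4);
    if (pixel == 0) return 0x3F00;
    return uint16_t(0x3F00 + palette * 4 + pixel);
}
```

For a nametable byte id, tilePixel(pat, table, id, row, col) gives the pixel value, and paletteAddress(palette, pixel) gives the address to pass to ppuRead. The byte stored there is the index into palScreen. For example, palette 1 with pixel value 3 gives 0x3F07, matching the worked example earlier.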
but it's not the Nintendo vertical resolution is in fact 240 which means not all of the rows of the name table are used in fact these bottom two rows of the name table are used for something else which we'll look at later when crafting a level the game designer carefully chooses the tiles they need and the locations they need to be placed within and sometimes the background even contains elements you might think a foreground in its most simple form a single name table like this represents a whole visible screen and this is just fine for some of the earlier Nintendo games like Donkey Kong slightly more sophisticated games like Mario required the screen to scroll left or right and we can write how many tiles are offset from the top left into the scroll register of the PPU but we've got a problem we've clearly run out of name table here to facilitate scrolling the nez actually stores to name tables and they line next to each other in the memory as the viewable area of the screen Scrolls across it crosses this boundary and we render from two different main tables simultaneously the CPU is tasked with updating the invisible parts of the name table with the bits of level that are going to be seen next when the viewable window Scrolls pass the end of the second name table it's effectively wrapped background into the first one and this allows you to have a continuous scrolling motion in two directions the nests itself has memory for just two named tables so that's two kilobytes of video RAM but by utilizing address mirroring we can conceptually have four named tables there's still only two kilobytes to store the data so two of these named tables are simply duplicates of the other two this duplication is also rather fusing Lee called mirroring in this configuration this is called horizontal mirroring because our actual memory contains these two named tables but the ones to the side of them are the mirrors so if I write to this location here I'm also writing to this location 
here. In Mario the screen scrolls horizontally, but in some games it only scrolls vertically, and so in games that scroll vertically between the name tables we're using horizontal mirroring. Games like Super Mario Brothers use vertical mirroring, where the two-kilobyte memory stores these name tables and the ones underneath are the mirrors; in this case, writing to this location is exactly the same as writing to this location. The configuration of the name table mirroring can come from a variety of sources. On some cartridges, like Super Mario Brothers, it's hard-coded into the cartridge circuit, and that means the entirety of Super Mario Brothers is limited to scrolling horizontally. Some mapper circuits can also dictate how mirroring is implemented and can dynamically switch between the two depending on what's required; admittedly, this is on slightly more advanced games. In games like Zelda you see scrolling occurring in both directions, and in more advanced games like Super Mario Brothers 3 you can see scrolling in both directions simultaneously. But the core concept to understand here is that as you're scrolling in a particular direction, the CPU must be updating the name table with the background that's about to appear. We can specify a particular tile offset using the scroll register, and as we've done many times in this video, it's not dissimilar to y times width plus x, but if the width is a power of 2 we can take advantage of binary numbers to perform this calculation for us just by existing. Here we know we've got 32 by 32 tiles, which requires 5 bits per tile offset. I'm going to call these offsets coarse Y and coarse X; both of these are 5-bit words. By simply concatenating them to form a 10-bit word, we have actually implemented the equation coarse Y * 32 + coarse X, because in binary, if we split the word at some point, the number on this side is a count of how many times this side occurs. It's a power-of-two boundary, in this case 32, so this side is automatically multiplied by 32, plus whatever this
value is. Since we've got four possible name tables that we can address, we're going to need an additional two bits (excuse me for just reusing the same grid pattern here), and so, using exactly the same logic as before, I'm going to extend our 10-bit address to a 12-bit address with name table X and name table Y. Four name tables give us 4096 possible locations, which very conveniently is the maximum number we can represent with a 12-bit address. If our game only scrolled in whole tiles, it would look quite blocky and jumpy, and the NES doesn't look blocky and jumpy; it can scroll quite smoothly. Because each cell is 8 by 8 pixels, we also need to store an offset into a single cell, and so I'm going to introduce another two variables, fine Y and fine X. Note that these are not part of this 12-bit address. As the scanline is zipping across the screen, we count cycles to work out which particular tile the scanline is currently residing within. Each cycle of the scanline is the equivalent of a single pixel being drawn, and therefore for each cycle we can increment our fine X value. If the fine X value goes past 7, it wraps back to 0 and we increment our coarse X value. In a similar way, each time we progress down the screen with new scanlines we increment our fine Y value, and again, if it goes past 7, we increment whatever is in coarse Y. This doesn't change the number of bits; we're simply adding one to a number. If we have scrolled the screen, then as we're incrementing along the scanline counting cycles, at some point we'll cross into the next name table. If this happens we can invert our name table X bit, and likewise, as we're scrolling down the screen, we'll invert the name table Y bit. And even further, if we carry on scrolling in one of those directions and go beyond the second name table in that direction, we'll invert those bits again, which resets them back to normal and implements the wraparound functionality. We'll look at this form of addressing in a bit more detail in a few moments, but some of you
may already be thinking: well, hang on, we know which tile we want to display from our pattern memory for a particular location on a name table, but we've not specified which palette. Recall that the bottom two rows of a name table are not visible on the screen; this is 64 bytes of what is called attribute memory. Each one of these bytes is responsible for a region of the name table, and since we've got 64 of them, it's fair to assume that we can divide the name table up into an 8 by 8 grid. One attribute byte is responsible for a 4 by 4 cell region of tiles. These 16 tiles occupy quite a large area of the screen, so limiting that one area to a single palette would be rather restrictive for the designer. Also recall from the palette memory that there are only four palettes available to use in the background, so in principle we only need two bits to select a palette. Since the attribute byte is eight bits, we can specify four distinct palettes in that word, and so we can reasonably assume we can break our 4 by 4 tiles into four lots of 2 by 2 tiles. The NES does this, and it apportions bits 7 and 6 to the bottom right, bits 5 and 4 to the bottom left, bits 3 and 2 to the top right, and bits 1 and 0 to the top left, so all four tiles in each 2 by 2 group must share the same palette. For a sense of scale, the question mark block in Super Mario Brothers is 2 by 2 tiles. So not only do we have to work out where our scanline is after scrolling in the name table to get the tile ID; we can use the same information, albeit crushed down slightly by throwing some of it out and offsetting it to a different location, to choose the appropriate attribute byte for that region. We can work this out by taking our composite address of coarse Y and coarse X and reducing their 5-bit representations to 3 bits. This is the same as dividing by 4, so our original 32 tiles across the name table have now been reduced to the 8 palette regions in the attribute memory. If we assume up here
is address 0 for our name table, then clearly whatever number we have got that represents our attribute memory offset needs to be offset to the start location of the attribute memory, which is 0x3C0. We'll use the 2 bits that we've just thrown out here to help us choose which two-bit section of the 8-bit attribute byte we're using for a given tile. Personally, I found this part of the NES emulation probably the most complicated part, and my very simplified description of it here is probably not doing it justice, so please do consult the source file, because there's a lot of detail in the comments specifically about how this mechanism works. Before we start rendering the backgrounds, I think it might be useful to visualize a name table, but I'm not going to visualize it in graphical detail, because that would just be rendering the background. Instead, what I'm going to do is display the tile IDs, just to make sure that we've got all of the memory set up correctly. Here are our two 1-kilobyte name tables, and we've left space in our ppuRead and ppuWrite functions to access the name table memory with the appropriate mirroring, but we need to know what the mirror mode currently is. For the simple demonstration games I'm using, that information is contained in the cartridge, so looking at the structure that represented the header for the ROM, in a similar way to how we determined which mapper was being used, I can also extract how the cartridge is mirrored. For now I'm going to assume four very basic modes: horizontal and vertical, and a couple of others we'll worry about in the future. So I'm going to set up special conditions depending on the mirroring mode. There are some fancy bitwise mathematical ways to do this, but as I've described, I'm trying to keep this visually simple. If we're in vertical mirroring mode, then we look at the address offset from the start of the name table RAM and choose the appropriate physical name table depending on that address. The horizontal
mirroring case is quite the same, except the physical name tables are in a different order, and as always, what we do for read we also do for write. Before we display the name table IDs, I'm just going to stop the emulator from outputting the noise; I think we're done with that now. Because I don't want to fully visualize the name table, I'm just going to draw the IDs in the corresponding name table locations on top of whatever the screen is outputting. In this case I've chosen name table zero, and I'm converting the IDs to hex. I've preloaded the nestest ROM, so let's take a look. Nothing's happened yet; the name table is all zero. So I'll run the emulation, and we can see the name table has changed and it's got some structure to it. It's perhaps a little bit too difficult to interpret, so let me just stop the emulation. We see that the value 0x20 appears quite a lot, and because everything conveniently is in hex and our arrays of patterns are in a 16 by 16 grid, we can easily work out which particular pattern tile the name table refers to. For 0x20, that would be two rows down and zero across, so that's empty space. That is correct for the most part, because the menu screen for the nestest ROM is mostly empty space with a list of text in between. A quick and crude way to visualize the name table could be to take this ID directly and use the DrawPartialSprite function of the Pixel Game Engine, with the pattern table as the source sprite, to draw that tile in the right place. You know what, I couldn't resist doing that, so I've just implemented that line of code and we'll take a look. This is hopefully going to show the name table, but it's not rendered as the NES intended. Well, there we go, we actually see the test menu for that particular ROM. I should be able to change the palette too. Yep. And because I'm getting all giddy and excited, I can't resist; I'm just going to try another game. Well, hmm, I don't think it's chosen the correct pattern table there; that
looks like the one for the sprites. I'll just brute-force swap that over for now. Well, again, it's kind of there, but not quite right. Given that at the moment the performance is atrocious, I was quite pleased to see this screen pop up automatically. Let's just see if we can repair some of those glitches, because I think I know what's causing them. Name tables are fundamentally updated by the CPU writing to the PPU. To increase the efficiency of this process, the PPU address is automatically incremented, so you can just continuously write a stream of data. However, just incrementing by one will only move the address in a horizontal direction across the screen. What if you wanted to write to the name table in a vertical orientation? Well, this is exactly what the NES is designed to do. The control register has a specific bit that can be set by the program, called increment mode, and this decides whether the increment should be 1 or 32. If it's 1, we're incrementing along the x-axis, but if it's 32, we're skipping 32 tiles at a time along the x-axis, which is the same as going down one row in the y-axis. Now that the CPU can control the direction of this increment, let's see if that's repaired our glitches. Oh, very nice, it has; the screen looks far more sensible now. The reason the performance was so bad is that I was generating the pattern tables for every single tile, so I quickly hacked a fix together. I know I'm shooting off on a tangent, I'm just excited, but let's have a look at how Donkey Kong renders now. It's rendering much, much faster than it was before, and here we can see the uncorrupted Donkey Kong level. Oh, we can see Donkey Kong himself being animated. We can choose a palette; that looks like a Donkey Kong palette, doesn't it? Maybe that one, perhaps. So that's very pleasing to see. But even though it looks like we've rendered the background, we haven't at all; it's a complete hack. So what we'll do now is implement the background rendering properly, and so we don't get confused, I'm going to
comment out my hack; I'll keep it around just in case it helps us out with some debugging later. Rendering to the screen requires counting scanlines and cycles, I think we've established that, and one of the most useful diagrams you'll find on the NESdev wiki is this frame timing diagram. What this diagram shows is the cycles going across the screen and the scanlines going down, and it tells us what operations need to be performed when. Don't forget each cycle across the screen represents one pixel, so eight cycles represent one row of one tile. The PPU is only capable of storing information for the next tile it's going to render, so during those eight cycles it's loading up the information it requires for the next eight pixels. In this case it loads the name table byte, so that's the tile ID; then it loads the attribute byte, which contains the palette information; then it loads the pattern itself. Remember this was split into two planes, one representing the LSB and one representing the MSB. It knows which pattern to read because it's already read the name table byte, and it knows how to combine these into the correct color because it's read the attribute byte. Once the eight pixels have been drawn, we effectively move to the next tile; this movement is illustrated by this "increment horizontal" cell in the chart. Then we repeat the process, and we keep doing that all the way across the screen. There are 256 pixels across the screen, so naturally it stops at that point. Even though the last eight pixels of a scanline follow the same pattern, that data is unused, because there's no point in rendering beyond the scanline, and don't forget the number of cycles exceeds the number of pixels on the screen. What we do need to prepare is the rendering system for the next row's first tile, and that happens here, towards the end of the scanline. There are some additional reads of the tile ID, but they get ignored. You may also recall that we jump to scanline minus one at the start. Well,
the whole purpose of jumping to scanline minus one (in this case it's labeled as zero) is simply to prepare the very first pixels that are visible on the screen. So as the scanline traverses across the screen, at each tile boundary we increment our address that's accessing the name table. When we get to the end of the visible area we increment our address, but we increment it vertically in our name table and reset the x position. The nice thing about this timing diagram is that it also includes additional information, things that we've already covered, such as when the vertical blank flag gets set. It also contains information about where and how the sprites are loaded, but we're not looking at that in this video. When you start reading the documentation for NES emulation, you'll inevitably come across something called loopy. Loopy was a guy who came up with quite a convenient memory structure for representing this information. It's not dissimilar to the coarse X and coarse Y with the name table bits that I showed before when using it as a 12-bit address. This address is distinct from all of the other addresses we've used so far: it's an internal address maintained by the PPU that correlates the scanline position to, well, everything else that's going on, and it's almost always maintained by the PPU itself. In fact, this is the reason why you can't write to the PPU whilst it's rendering: you would inadvertently change this address, and the PPU would get confused about where it's reading its graphics sources from. I'm going to do things the loopy way, so thank you very much, loopy, for creating this fantastically convenient interpretation of the internals of the PPU data bus. Unsurprisingly, the NESdev wiki promotes loopy's approach too, and it's quite verbose in telling us when and how the loopy registers are updated. Two registers are maintained. One is labeled v, which is the internal address register that the PPU increments as required to get the data. The other one is called t, and this
is the one that gets affected by the user, so when the user reads and writes to the PPU, this is the register that is updated. Periodically, parts of v need to be updated with the contents of t, for things like facilitating the reset at the end of a scanline to go back to a known location. These registers combine the scrolling information as well as the PPU location in order to access the right bytes of memory. In the first episode of the series I implied that there is no code on NESdev; well, the loopy register is the only place where I have actually found some implementable pseudocode. It's written using bitwise tricks all over the place, and my implementation makes this a little bit more verbose. So it is my intention now to replace the PPU address variable that I created temporarily with these loopy registers. Looking at the internals of the register, it is very similar to how I described it before: coarse X and coarse Y are written to by the scroll position, the name table selection gets a bit for each axis, and we've also got a field that stores our fine Y position. It's only a 15-bit register, so I'm using a 16-bit word to store it, and I'm creating two loopy registers, vram_addr and tram_addr. The only missing piece of information now is fine X scrolling, which is kept in its own separate variable. The PPU address is now effectively replaced with the vram address. In CPU read, not much interesting goes on regarding the loopy registers except for the auto-increment. CPU write, however, does have quite a lot of influence over these registers. The control register contains two bits which represent which one of the four name tables we're interested in using, and I'm going to store these in the tram address variable. We never write directly to the vram address; we always write to the tram address, so I've modified writing to the PPU address register accordingly. However, once a full 16 bits of address information has been written, the vram address is updated with the tram address. The register that causes all of this complexity is the scroll
register, and again this is written to in two halves, with each write successively flip-flopping between the two halves (X first, then Y). The data written to the scroll register sets the pixel offset in screen space, so our fine X offset, between 0 and 7, is the bottom three bits of the data, but we can also extract the coarse X location from the top five bits of the same byte. In a similar way, we can store fine Y and coarse Y. I'm now going to modify my clock function in accordance with the timing table we can get from NESdev. We've already included two specific entries for setting the vertical blank flag. The bulk of these operations are going to apply to all of the visible scanlines: for a range of cycles on each scanline we want to extract the tile ID, the attribute and the bitmap patterns, and when we get to the end of a scanline we want to increment our loopy register in the y-direction. The repeated eight cycles per tile can be implemented with a switch-case statement. These cycles are for preloading the PPU with the information it needs to render the next eight pixels, so I need to create some variables to store this information. In the header file I'm going to add four variables: the background next tile ID, the next attribute, and two 8-bit variables that represent the planes of the pattern memory for the next eight pixels (recall that these are one bit per pixel). The first thing to do is to read the tile ID, then the attribute byte, and don't forget that's a single byte that represents additional data split up into two-bit patterns, so there's a little bit more manipulation required to actually extract the final two-bit information that we need. Then I'll extract the least significant bit plane, and then the most significant bit plane. You can see these are the same, except there is an offset of eight, as described earlier. Now, I'll forgive you for looking at that and going "wah". In the accompanying source file I've gone to some great lengths to explain precisely what is
happening at each one of these stages, and I break down the bitwise operations into their component parts with a description. I've already described most of this in the slides, and I appreciate it's a little confusing to see it formalized like this on the screen, but I really want this video to be under two hours long. On the timing diagram, all of the cycles marked in red imply that we're doing some additional manipulation of the loopy registers, and there are four essential functions: incrementing in the x-direction, incrementing in the y-direction, resetting along the x-axis and resetting along the y-axis. I'm going to implement these as lambda functions in my clock function. Incrementing in the x-direction simply adds one to the coarse X part of the vram address, but if we go beyond the edge of the name table, we flip the name table bit, so now we're indexing into the other name table. This line will make a little bit more sense in the next episode, but effectively we can only do these things if we're rendering something: if rendering is disabled, none of this applies, and rendering is enabled by setting bits in the mask register. Incrementing in y is pretty much the same thing, except we increment our y address. Because we're operating on a scanline basis and scanlines are one pixel high, in y we need to use the fine Y value, whereas when we're moving along the x-axis we're reading new tile information every eight pixels, so that aligns with the coarse X value. Once fine Y goes past 7, meaning we've stepped through all eight pixel rows of a tile, we reset it and increment the coarse Y variable, and just as before, if we go beyond the name table vertically, we flip the name table bit so we can access the other name table, its corresponding counterpart of the pair. Resetting the address (I've called it transferring the address) is simply a case of copying the X components of the tram address variable into the vram address variable, and similarly for y. Going back to our case statement, once we've
gone through eight pixels, we know we must be going on to the next tile, so we'll increment scroll X, and when we're done with a visible row, we'll increment scroll Y. But because we've incremented scroll Y, our x-coordinate is now incorrect, so we need to reset it back to the start of the scanline, and we'll need to set our y components on the non-visible pre-render scanline, ready for a new frame. Curiously, on scanline 240 nothing happens. We're almost there: we now have the facility to know precisely which background tile we need to access, depending on which name table is in place and how far we've scrolled across the screen. But of course, right now everything is happening in 8-pixel chunks, because we've been reading bytes from the memory. These 8 pixels need to be buffered so that they can be rendered over the next 8 cycles. We've already loaded and stored this information in these variables, but now we need to go on to a slightly parallel part of the PPU that takes this information and composites it into the correct pixel color in the correct location. The NES utilizes shift registers in order to do this. Here I have a row of 16 pixels on a single scanline. Whilst the scanline is rendering these pixels, it's loading up the information for the next 8. This information is loaded into the low byte of a 16-bit shift register. Every cycle, and therefore every pixel, the shift registers shift one bit to the left, towards the most significant bit, so by the time we get to this tile boundary, the bit information for the next 8 pixels is in the high byte of the 16-bit shift register. Let's look at the pixel bit planes as an example. When I get to rendering this pixel, I can take the most significant bits of the shift registers to give me my pixel value. When I move to the next pixel, both of these registers have shifted by one, so again I can take the most significant bits to give me the correct pixel value for that pixel. In a way, as the scanline is rolling this way, the shift registers are rolling this way, so they converge to give us the correct pixel values
in the right location. However, there is something else that affects this value, and that is the fine X position we set via the scroll register. Everything so far has happened at tile boundaries, but we want pixel precision for our scrolling, so instead of choosing the most significant bit all of the time, we choose the bit selected by the fine X register. Supposing our fine X is equal to 3, then instead of the most significant bit we choose the one that is 3 in from the most significant bit. This has effectively scrolled our tile by three pixels, and then of course the shift register moves everything to the left, our scanline moves one to the right, and we carry on as normal. The palettes were only represented as two bits, but they apply to the whole 8-pixel row of the tile, so when we load in the palette attribute information for that particular tile, I'm going to set all of the bits to be the same. That way I can use exactly the same mechanism to choose all of the information I need to get my final pixel color. As the scanline continues through the pixels that we're rendering in the background, we're loading up the information for the next eight pixels, and because these are 16-bit shift registers, these all get moved along by one as well. So we end up with a never-ending stream of one-bit information that gives us a very smooth scroll in the x-direction, but also supplies us with the information we need to produce the correct color. I'm just going to use 16-bit words to represent my shifters, and to the clock function I'm going to add another two lambda functions. The first prepares the shifters for rendering: for the pixels, we load the whole 8-bit word into the bottom of the 16-bit shifter, and for the palettes, we take the individual binary bit specified here and inflate it to a full eight bits, which ensures our palettes stay in sync with our patterns. The second lambda simply updates the shifters by shifting them all left by one bit. That's for the pixels, and that's for the
palettes. Every visible cycle we want to update the shifters, and when the internal cycle counter loops around eight pixels, we're going to load our background shifters with the next eight pixels' worth of information. Now that we've got a cycle and scanline tracking architecture, and we know which pixel we're at, which palette it's in and which bitmap it's using, it's time to composite it all and draw it, and this will seem relatively simple given everything we've seen so far. I only want to draw the background if drawing the background has been enabled in the mask register. To select which bit of the shift registers to read depending upon my fine X value, I start with a mask that by default selects the most significant bit, and I shift that mask according to the value of fine X. Once that bit has been moved, I can use it to extract a particular bit in my shifters. I'm going to do that for both pixel planes here, and I'll combine those pixel planes into a two-bit word that represents the pixel; you can see we're coming back round to things from the start of the video now. In exactly the same way, I'm going to get the two-bit information for my palette. This means that where we were just generating noise before, I can now use my GetColourFromPaletteRam function and pass in the palette and pixel we've just created. It seems like a long time ago now that we defined that function to choose the final color to draw on the screen. So let's take a look. Start the emulation, and now we can see Donkey Kong rendered with the correct colors this time. Before, the whole screen was rendered with a single palette, but now it's choosing the palette depending on the attribute regions of the final output. And we can see Donkey Kong very happily doing, well, he's just having a dance at the moment; there's nothing else to really do up there. It's interesting that he was part of the background. Super Mario Brothers kind of works; there's a reason why it doesn't fully, and we'll see a lot more of that in the next video. At this stage it's difficult to find titles
that support scrolling, because we've only implemented mapper 0, which is a very simple one, so most of those games were fixed single screens. As we've just seen with Super Mario, it relies on a few features we haven't implemented yet before it'll start to scroll the demo at the beginning, but this one's quite nice: here we can see vertical scrolling being used very smoothly indeed. Interestingly, as it's playing out the demo, we can see parts of the name table in the background disappearing; this will have been the ice climber chipping away at the blocks above him. And here's good old Kung Fu. I'm kind of glad that one's not working; it's a terrible game. So this has been a long and complicated video, and it is by far the longest and most complicated video of this series. The remaining videos should be considerably simpler than this: next week we're going to look at sprites, and after that we'll be looking at sound. If you've enjoyed this one, please give me a big thumbs up and have a think about subscribing. Thanks again to all of my Patreons and the people who have joined the YouTube channel. We're getting to a really exciting number of subscribers now, which I never would have thought. As always, come and have a chat on the Discord server if you've got some questions. The source code is going to be available on GitHub, and I'll see you next time. Take care.
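The name table mirroring described in this video can be sketched as a small address-translation function. This is a hypothetical C++ sketch, not the video's source: it assumes 2 KB of physical name table RAM split into two 1 KB tables and models only the horizontal and vertical modes, and the names `Mirror` and `nametable_offset` are illustrative. The mode-to-table mapping follows the NESdev wiki (vertical: 0x2000 = 0x2800 and 0x2400 = 0x2C00; horizontal: 0x2000 = 0x2400 and 0x2800 = 0x2C00).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirroring modes; the one-screen modes are not modeled here.
enum class Mirror { Horizontal, Vertical };

// Translate a PPU address in 0x2000-0x3EFF to an offset (0-2047) in 2 KB of
// physical name table RAM: physical table 0 is 0-1023, table 1 is 1024-2047.
inline uint16_t nametable_offset(uint16_t addr, Mirror mode)
{
    uint16_t a = addr & 0x0FFF;     // 0x3000-0x3EFF mirrors 0x2000-0x2EFF
    uint16_t logical = a / 0x0400;  // logical table 0-3 (0x2000, 0x2400, 0x2800, 0x2C00)
    uint16_t index = a & 0x03FF;    // byte offset inside that 1 KB table

    uint16_t physical;
    if (mode == Mirror::Vertical)
        physical = logical & 1;     // 0x2000 = 0x2800 -> 0, 0x2400 = 0x2C00 -> 1
    else
        physical = logical >> 1;    // 0x2000 = 0x2400 -> 0, 0x2800 = 0x2C00 -> 1

    return static_cast<uint16_t>(physical * 0x0400 + index);
}
```

A ppuRead/ppuWrite pair could call something like this for any address in the name table range and then index a 2048-byte array, so reads and writes automatically share the same mirroring. One-screen modes would simply force `physical` to 0 or 1.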
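The four fetches in each 8-cycle group (tile ID, attribute byte, and the two pattern bit planes) come down to address arithmetic on the loopy-style register. The sketch below is a hedged illustration assuming the 15-bit layout documented on the NESdev wiki (coarse X in bits 0-4, coarse Y in bits 5-9, name table select in bits 10-11, fine Y in bits 12-14). The function names are hypothetical, and the actual memory reads are left to the caller.

```cpp
#include <cassert>
#include <cstdint>

// v is assumed to use the loopy layout: yyy NN YYYYY XXXXX
// (fine Y bits 12-14, name table bits 10-11, coarse Y bits 5-9, coarse X bits 0-4).

// Tile ID: name table base 0x2000 plus name table select, coarse Y and coarse X,
// which together form the 12-bit address described in the video.
inline uint16_t tile_address(uint16_t v)
{
    return static_cast<uint16_t>(0x2000 | (v & 0x0FFF));
}

// Attribute byte: attribute memory starts 0x3C0 into each table; coarse Y and
// coarse X are divided by 4 (5 bits -> 3 bits) to pick one of the 8x8 regions.
inline uint16_t attribute_address(uint16_t v)
{
    uint16_t nametable = v & 0x0C00;
    uint16_t coarse_y = (v >> 5) & 0x1F;
    uint16_t coarse_x = v & 0x1F;
    return static_cast<uint16_t>(0x23C0 | nametable | ((coarse_y >> 2) << 3) | (coarse_x >> 2));
}

// The bits dropped by the divide (bit 1 of coarse X and coarse Y) pick the 2x2-tile
// quadrant: bits 1-0 top left, 3-2 top right, 5-4 bottom left, 7-6 bottom right.
inline uint8_t palette_from_attribute(uint8_t attrib, uint16_t v)
{
    uint16_t coarse_y = (v >> 5) & 0x1F;
    uint16_t coarse_x = v & 0x1F;
    if (coarse_y & 0x02) attrib >>= 4;  // bottom half of the 4x4 region
    if (coarse_x & 0x02) attrib >>= 2;  // right half of the 4x4 region
    return attrib & 0x03;
}

// Pattern row: 16 bytes per tile, LSB plane first and MSB plane 8 bytes later.
// table is 0 or 1 (which 4 KB half of pattern memory the background uses).
inline uint16_t pattern_address(uint8_t table, uint8_t tile_id, uint8_t fine_y, bool msb_plane)
{
    return static_cast<uint16_t>((table << 12) + (tile_id << 4) + fine_y + (msb_plane ? 8 : 0));
}
```

For example, with coarse X = 31 and coarse Y = 29 in name table 0, `attribute_address` lands on 0x23FF, the very last attribute byte of that table.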
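The CPU-side register writes that fill in t (and eventually v) can be sketched as follows. This is a minimal sketch, not the video's code: the `ScrollRegs` struct and its method names are hypothetical, and the bit assignments follow the NESdev wiki's description of the control (0x2000), scroll (0x2005) and address (0x2006) registers, including the write toggle w that the scroll and address registers share. On real hardware, reading the status register (0x2002) also clears w; that is not modeled here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of PPU state touched by CPU writes to the scrolling registers.
struct ScrollRegs {
    uint16_t v = 0;       // loopy "v": address the PPU renders from
    uint16_t t = 0;       // loopy "t": address being assembled by CPU writes
    uint8_t fine_x = 0;   // 3-bit fine X scroll, kept outside t
    bool w = false;       // write toggle shared by 0x2005 and 0x2006

    // 0x2000 (control): bits 1-0 select the name table -> t bits 11-10.
    void write_control(uint8_t d) { t = (t & ~0x0C00) | ((d & 0x03) << 10); }

    // 0x2005 (scroll): first write is X, second write is Y.
    void write_scroll(uint8_t d)
    {
        if (!w) {
            fine_x = d & 0x07;                        // bottom 3 bits: fine X
            t = (t & ~0x001F) | (d >> 3);             // top 5 bits: coarse X
        } else {
            t = (t & ~0x73E0) | ((d & 0x07) << 12)    // bottom 3 bits: fine Y
                              | ((d >> 3) << 5);      // top 5 bits: coarse Y
        }
        w = !w;
    }

    // 0x2006 (address): high byte then low byte; v only changes on the second write.
    void write_address(uint8_t d)
    {
        if (!w) {
            t = (t & 0x00FF) | ((d & 0x3F) << 8);     // 6 bits used; bit 14 cleared
        } else {
            t = (t & 0xFF00) | d;
            v = t;                                    // full address written: copy t into v
        }
        w = !w;
    }
};
```

Keeping fine X outside t mirrors the point made in the video that fine X is not part of the register's address bits; it only selects which shifter bit is read.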
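The four red-cell operations from the timing diagram (increment X, increment Y, transfer X and transfer Y) can be sketched with explicit bit masks. This follows the pseudocode on the NESdev wiki rather than the lambdas and named fields built in the video; the `loopy` namespace and function names are hypothetical. One detail worth noting: coarse Y wraps after row 29 rather than 31, because rows 30 and 31 of a name table hold attribute data. The "only when rendering is enabled" check from the mask register is left to the caller.

```cpp
#include <cassert>
#include <cstdint>

// Loopy-style 15-bit register held in a uint16_t: yyy NN YYYYY XXXXX
namespace loopy {

constexpr uint16_t COARSE_X = 0x001F;  // bits 0-4
constexpr uint16_t COARSE_Y = 0x03E0;  // bits 5-9
constexpr uint16_t NT_X     = 0x0400;  // bit 10
constexpr uint16_t NT_Y     = 0x0800;  // bit 11
constexpr uint16_t FINE_Y   = 0x7000;  // bits 12-14

// One tile right; wrapping past tile 31 flips to the horizontally adjacent name table.
inline void increment_x(uint16_t& v)
{
    if ((v & COARSE_X) == 31) {
        v &= ~COARSE_X;  // coarse X back to 0
        v ^= NT_X;       // switch horizontal name table
    } else {
        v += 1;          // coarse X is the lowest field, so +1 advances it
    }
}

// One pixel row down; after 8 rows move down one tile.
inline void increment_y(uint16_t& v)
{
    if ((v & FINE_Y) != FINE_Y) {
        v += 0x1000;                    // fine Y < 7: next pixel row inside the tile
        return;
    }
    v &= ~FINE_Y;                       // fine Y wraps to 0
    uint16_t y = (v & COARSE_Y) >> 5;
    if (y == 29) { y = 0; v ^= NT_Y; }  // last visible tile row: switch vertical name table
    else if (y == 31) { y = 0; }        // inside attribute rows: wrap without switching
    else { y += 1; }
    v = (v & ~COARSE_Y) | (y << 5);
}

// Copy the horizontal parts of t (coarse X, name table X) into v.
inline void transfer_x(uint16_t& v, uint16_t t)
{
    v = (v & ~(COARSE_X | NT_X)) | (t & (COARSE_X | NT_X));
}

// Copy the vertical parts of t (fine Y, name table Y, coarse Y) into v.
inline void transfer_y(uint16_t& v, uint16_t t)
{
    v = (v & ~(FINE_Y | NT_Y | COARSE_Y)) | (t & (FINE_Y | NT_Y | COARSE_Y));
}

}  // namespace loopy
```

Because coarse X occupies the lowest bits, the common case of increment_x is a plain `+1`; only the wrap at tile 31 needs special handling.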
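The background shifters and the fine X selection can be sketched as a small struct. It follows the arrangement described in the video (new tile data loaded into the low byte, registers shifted left once per cycle, the output bit chosen as bit 15 moved right by fine X), but `BackgroundShifters` and its methods are hypothetical names, and combining the result with the palette RAM lookup is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical background shifter state: one 16-bit register per bit plane.
struct BackgroundShifters {
    uint16_t pattern_lo = 0, pattern_hi = 0;  // pattern LSB / MSB planes
    uint16_t attrib_lo = 0, attrib_hi = 0;    // palette bits, inflated to 8 copies per tile

    // At a tile boundary: place the next tile's data in the low bytes.
    void load(uint8_t next_lo, uint8_t next_hi, uint8_t next_attrib)
    {
        pattern_lo = static_cast<uint16_t>((pattern_lo & 0xFF00) | next_lo);
        pattern_hi = static_cast<uint16_t>((pattern_hi & 0xFF00) | next_hi);
        // One palette per tile, so each palette bit becomes 0x00 or 0xFF.
        attrib_lo = static_cast<uint16_t>((attrib_lo & 0xFF00) | ((next_attrib & 0x01) ? 0xFF : 0x00));
        attrib_hi = static_cast<uint16_t>((attrib_hi & 0xFF00) | ((next_attrib & 0x02) ? 0xFF : 0x00));
    }

    // Every visible cycle: shift one bit towards the MSB (old bit 15 is discarded).
    void shift()
    {
        pattern_lo <<= 1; pattern_hi <<= 1;
        attrib_lo <<= 1;  attrib_hi <<= 1;
    }

    // The bit to read: bit 15 by default, moved fine_x (0-7) places towards the LSB.
    static uint16_t mux(uint8_t fine_x) { return static_cast<uint16_t>(0x8000 >> fine_x); }

    // 2-bit pixel value (0-3) from the two pattern planes.
    uint8_t pixel(uint8_t fine_x) const
    {
        uint16_t m = mux(fine_x);
        return static_cast<uint8_t>(((pattern_hi & m) ? 2 : 0) | ((pattern_lo & m) ? 1 : 0));
    }

    // 2-bit palette index (0-3) from the two inflated attribute planes.
    uint8_t palette(uint8_t fine_x) const
    {
        uint16_t m = mux(fine_x);
        return static_cast<uint8_t>(((attrib_hi & m) ? 2 : 0) | ((attrib_lo & m) ? 1 : 0));
    }
};
```

After eight shifts, a tile loaded earlier sits in the high byte, so `pixel(0)` returns its leftmost pixel and `pixel(fine_x)` returns the pixel fine_x positions further in. Shifting once and then reading `pixel(0)` gives the same result as reading `pixel(1)` before the shift, which is why one mechanism handles both per-pixel stepping and fine scrolling.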
Info
Channel: javidx9
Views: 86,649
Rating: 4.9813156 out of 5
Keywords: one lone coder, onelonecoder, learning, programming, tutorial, c++, beginner, olcconsolegameengine, command prompt, ascii, game, game engine, pixelgameengine, olc::pixelgameengine, nes, nes emulator, emulation, 8-bit, picture processing unit, ppu, graphics, nametables, pattern memory, palettes, loopy
Id: -THeUXqR3zY
Length: 67min 9sec (4029 seconds)
Published: Fri Sep 20 2019