The Madness of Z80 I/O

Captions
Prepare to descend into total madness. Not only does the Z80 OUT instruction not do what you think it does, but most documentation is actually wrong. Then the Amstrad CPC flips it on its head, and at the end we may not even be able to use that instruction and instead need to use crazy methods. Welcome to a deep dive into the madness of Z80 I/O.

Let's start with a quick recap of how some CPUs talk to other chips on the computer. The simplest possible way that a CPU can talk to another chip is to simply have that chip listen for a particular address in the memory addressing space. For example, I could have the graphics chip listen on address 900F. Whenever the CPU flashes that address on the address bus along with the write line, the graphics chip can read the data bus and interpret that as the screen border colors, for example. Just a totally random example, or maybe that's exactly how it's done in the VIC-20 with a 6502 CPU. That's actually a super nice and straightforward way to communicate with everything: just give out addresses and document how each chip interprets the data. It also works beautifully to read data back from chips: just do a read operation on that address and the chip can return some data on the data bus. The CPU is already set up for that. So this is great, problem solved, we're done with the video.

Well, not quite. Here's a memory map of the VIC-20 as an example. The area in pink is the address range reserved for I/O, and it's only 4 kilobytes out of the possible 64 kilobytes that you get with a 16-bit address bus. And certainly on a VIC-20 with only 20 or 32 kilobytes of RAM total it will work without a problem; we have lots of room to spare. But what if you want to have 64 kilobytes of RAM without giving up 4 kilobytes for I/O? That's when things start getting interesting.

The 6510 is the CPU on the Commodore 64. A lot of people just think of it as a 6502, and it's mostly true except for a key difference that we care about here: the 6510 added a 6-bit processor port. These are six physical pins on the 6510
that are wired from the CPU to different chips. That way the CPU can use the signals on those lines to talk to different chips and not waste any addressing space on I/O addresses. Those six new lines for the processor port are also handled in a very clever way: instead of adding a new instruction on the CPU, they're simply mapped to the lowest six bits of address 0 to set the I/O direction of each pin, and the lowest six bits of address 1 for the signal on the pins. The problem is that you're a bit limited with only six bits. And while Commodore could have technically used that to address 2 to the 6, or 64, different ports on different chips, instead they mostly wired each line for a very specific purpose, so they can only control six different things. That means that a lot of the I/O still happens through specific addresses, like CIA communication.

And so we finally come to the Z80. Apparently they decided that if you're going to have a 16-bit address bus then, gosh darn it, you'd better be able to have 64 kilobytes of real memory to access, and they didn't want to take any space with I/O addresses. That sounds like a fantastic idea. The solution is very straightforward, actually: add a new instruction that signals that this is specifically an I/O operation and not just a memory read or write. And that instruction is called OUT.

On a Z80, to write some data to a memory address we could do something like this. That puts C000 on the address bus and 33 on the data bus, and then sets the write and MREQ (memory request) signals. The rest of the system detects that and puts that address in the right location. Perfect. The OUT instruction functions the exact same way, except that instead of setting the MREQ signal it sets the I/O request signal. That simple.

Except that I just lied to you, and that is not the right syntax for OUT. It's actually written out like this. Now you're probably saying, wait a second, that doesn't look like it puts the full value of BC on the address bus. Surely that only puts the eight
bits from C, right? That would make a lot of sense. So let's find out.

Let's have a look at the bible of Z80 programming, How to Program the Z80 by Rodnay Zaks: the contents of the specified register are output to the peripheral device addressed by the contents of the C register. (Quick note from Noel from the future here: I actually missed that the very last line in there says register B supplies bits A8 to A15, so that's pretty clear right there. Okay, let's continue.) So that seems consistent with the way the instruction is written, right? And how about a Z80 reference site: the value of B is written to port C. Yeah, that's pretty clear, right?

So let's go to the source, to the Zilog documentation itself: the contents of register C are placed on the bottom half of the address bus to select the I/O device at one of 256 possible ports. The contents of register B are placed on the top half of the address bus at this time. Then the byte contained in register r is placed on the data bus and written to the selected peripheral device. That is totally different. And why does register B come into play, since it doesn't even appear in the instruction? So is the Zilog manual right and everybody else is wrong, or did Zilog mess up the documentation of their own CPU?

So I did what any sane person would do and I wired up a minimal Z80 system. I know this is not up to Ben Eater standards, but it was just supposed to be a quick temporary setup. This is truly a minimal build: I have the Z80 processor hooked up to a ROM with a simple program that simply does OUT (C),A with BC of FF00 and A of 33, and then it just repeats again. And actually, since the program is only 12 bytes long, I only wired the first four lines of the address bus and the rest are just going to ground. The initial intention was that you would step through all the instructions with this push button, and that when we get to the OUT instruction the IORQ signal would be high and you could see it in an LED, and then we would examine the address bus with LEDs as
well at that time. However, after banging my head for many days on this, I remembered that the original Z80 processor cannot be stepped at a really low frequency. Apparently it's implemented internally with some kind of dynamic refreshing, so unless it has a minimum frequency in the kilohertz it will just not preserve the correct state. So you simply can't do that. So if it's been many weeks since my last video, that's probably one of the reasons why.

So now instead I'm going to be driving the clock signal from this signal generator. I'm going to use something slowish like 100 kHz. That would be fast enough that the Z80 actually works, but slow enough that running on a breadboard with all the wires crossed and all that won't really matter as far as noise and crossover and all that. And here you can see the clock signal generated by the signal generator, and after I reset it, when we look at IORQ, there you go, you see that it periodically comes up like that. This is not ideal, but we can check out the address lines at the same time we're checking IORQ on the oscilloscope. And yeah, it looks like we're getting a one on every bit on the upper half of the address bus, and on the lower half of the address bus it looks like it's zero. So yeah, that seems to work.

This would be a perfect job for a good data analyzer, and all I have is this cheap one. The cheap part doesn't really matter as far as frequency because it's running really slowly, but it only has eight channels. So I'm going to hook up one of them to IORQ and the other seven to the middle of the address bus, maybe between A5 and A11. That way we see part of the lower half and part of the upper half. And yeah, there we can see the same thing: when IORQ is active, the lower half of the bus is zero and the upper half is set to one.

You know what else would make this much more convenient to work with? If I had made this into a PCB with today's sponsor, PCBWay. Maybe it sounds weird to make a PCB for something like this; this is supposed to be a
prototype. But this is setting up a Z80 with a ROM, and that's something that you're going to be doing a fair amount, so it would be pretty trivial to turn this into a PCB design, upload it to PCBWay's website, and then I would get a simple PCB back in just a few days, or a week at most. PCBWay has great prices for very small runs like that, so you can share them with other people, or maybe you can have multiple of them set up that way. That's actually a project that I may consider doing in the future. So if that's the kind of thing that you may be interested in, definitely check out pcbway.com.

So this is great: we finally know what OUT (C) does exactly, and the Zilog manual was correct. So now the big question is, why is the instruction written as OUT (C),A instead of OUT (BC),A, which is really what it's doing? And how come B isn't even mentioned in most of the documentation? The answer, I'm afraid, will cause us to go one step deeper into this mess. The answer is the instruction OUTI.

Before we go there, I want to touch briefly on how to read from an I/O device on the Z80. If we continue our parallel with reading and writing to memory: if we want to read from some address we would do something like this, so it would make sense that to read from an I/O device we would do something like this, or I guess OUT A,(C). But I guess someone thought that it was too clashing to use the instruction OUT to bring data in, so they did exactly that and changed the instruction to IN A,(C). Now, this is small potatoes compared to the mess that we're about to see, but it just bugs me that we use LD, for load, both ways, but then we have to go and use different instructions for OUT and IN. And actually that's not a horrible idea, so why not just ditch LD and go with a read and a write instruction instead? From a purely design aesthetic, that's just ugly having it the way it is. But okay, fine, fine, fine, we'll have OUT and IN.

Let's get to some of the really ugly stuff. One of the culprits of this whole mess is the instruction OUTI and
its corresponding INI, of course. So let's warm up by looking at an easier-to-understand instruction, LDI. LDI is an instruction that will do several things: it will copy the byte pointed to by HL into the address contained in DE, and then it will increment HL and DE and decrement BC. And why does it do all of that? Because that way it can be inserted in a loop with BC as a counter and copy a block of memory from HL to DE pretty quickly. And as if that wasn't enough, the Z80 designers decided to take it one step beyond and implement LDIR, which actually implements the loop itself and continues copying until BC is at zero. I imagine for anyone who didn't grow up with Z80 assembly this monster of an instruction is just a real eyesore, and I totally see that. At the same time, I remember how LDI is so much more efficient than trying to do the same thing by hand, and since moving data around memory is just so slow and so important for drawing things on the screen, it does have a reserved spot in my heart. But yes, it's still really ugly.

Now that we know about LDI and LDIR, I bet you can imagine what OUTI does: it does an OUT (C) with the contents of HL, increases HL, and decreases B. There's even a looping instruction, except that instead of calling it OUTIR they called it OTIR. They really wanted to trigger my OCD.

Anyway, let's do an example to see how that works. Say we have this code. It will read the data pointed to by HL, which is the first byte in my data, which is A0, and it will do an OUT (C) with it. That means BC goes on the address bus, 03F7, and A0 goes on the data bus. Then it will decrement B. Since this is expected to be part of a loop, let's repeat it until B is zero. So it does another output, this time with 02F7 on the address bus and B0 on the data bus. Since B is not zero, it does it again with 01F7 and C0, and now B is decremented, hits zero, and we're finally done with that loop.

Did you notice anything there? We blasted several values to port F7, the contents of C, but the value
of B was a countdown, so it was mostly irrelevant. So even though the Z80 puts both B and C on the address bus, only the lower byte is really intended for decoding by the hardware. The fact that B is added to the address bus, I'm not sure if it's a side effect, or to save on transistors, or maybe they're doing that on purpose to let some devices know about the countdown, because maybe it will be important in some weird cases. But the fact that they named the instruction OUT (C),A, just with C, is a very clear indication that Zilog intended to address 256 ports (they even say that in the manual) and that the value of B should be mostly ignored. And in fact this is what happens in every sane Z80 system out there, from the ZX Spectrum to MSX computers: chips just look at the lower byte of the address bus to see if their address number has shown up.

Confusing enough for you yet? No? The Amstrad CPC just entered the chat. Amstrad engineers decided that it would be a good idea to go against the normal use of the Z80 and, drum roll please, use the top byte of the address bus to access the chips. Why? We'll see that in a second, but let's have a look at what it actually means. Now you would do something like this, and you would use the F7 value stored in B to determine the port to communicate with. So really you don't need to load C at all, and you could simplify that to something like this, which makes it super confusing because the instruction names C but we aren't even using it.

Which finally leads us to what started this whole descent into madness. In my last video I wrote this code, and it seems innocent enough, but I got swamped with comments saying that it was incorrect and it should have been OUT (BC). But now we finally understand why that code is so weird: it really is doing OUT (BC),C, and the Amstrad, which was the platform that I was writing about, only cares about the B part. That's actually a very common idiom in Amstrad CPC assembly language and you'll see it all over the place. There you're loading both bytes you
care about and then doing an OUT instruction right away. In that particular case it's accessing the gate array and setting the screen to mode 0.

Aha, some of you may be saying, that's why Amstrad made that odd choice: it's much more efficient to load all the values at once and then just do an OUT instruction than it would be to write something like this. It would kind of make sense, since each of those LD instructions takes seven cycles and an LD BC takes only 10 and takes one fewer byte. But that's not the reason at all, because you could have just used OUT (C),B and loaded BC the exact same way as before.

The real reason for that odd choice was something that Amstrad was really good at: reducing cost. Since we have a whole byte, that would in theory give us 256 different ports that we could address. However, Amstrad decided to take the top five bits and use one of those bits for each major I/O system: gate array, CRTC, ROM selection, printer, and PPI. It was done that way so that each chip could have just one line directly connected to it, and the logic to talk to that chip would be trivial. The alternative would have been to have eight bits coming into the chip, which would not be realistic and would require more pins than are available for a particular package size in the case of the gate array, or, in the case of third-party chips, they would have to have external logic to convert those eight bits into a single bit that would activate the chip. That would have made the computer significantly larger and more expensive.

So that makes sense, but what does that have to do with using the top of the address bus? The answer to that is what we talked about in the last video: memory banking. Remember how in that video I mentioned that we had banks of 16 kilobytes and that we could detect which bank we're accessing by looking at the top two bits of the address bus? That means that the gate array needs to already be connected to A15 and A14, which are the top two bits of the address bus. If we were using
the lower byte of the address bus for I/O like a normal Z80 system, the gate array would need to have an extra pin or two just for that. So in this case Amstrad managed to squeeze all of that into a 40-pin package. By pulling that weird optimization they saved some cost, and it made it so that all I/O on the Amstrad works completely differently from all other Z80 computers. And not only that, but OUTI and INI and OTIR or OTDR are pretty much useless on an Amstrad CPC, because as part of the loop B is going to be decremented, which is going to change the port that you're talking to. So they've killed about four or five instructions of the Z80 by doing that. Thanks, Amstrad.

Okay, we've finally made sense out of all of that, but there is still one big problem: the I/O space is mostly full. Check out this list of known I/O ports for different Amstrad extensions. They're a mix of original expansions and newer ones, and each of them lists addresses and, most importantly, the bits that it cares about. And here's the big problem. Remember how I said that Amstrad used the top five bits? If any of those bits are low, one of the five built-in systems will think that it is being addressed. That only leaves three bits for extensions, and if we assume that we'll do a full decoding of those bits, that's only 2 to the 3, or eight, different ports. That's it. And looking at that table, it looks like those ports are all used up with memory expansions right at the beginning.

So what's a new hardware designer supposed to do if they want to make an extension for the Amstrad? Again looking at the table, it looks like the answer is to also use the lower byte of the address bus. Yes, I know I said that Amstrad wired only the upper byte, but I was talking about their own chips. As long as you have full access to the address bus, you're welcome to read any part of it and use it to trigger your own activation, and that's what people are doing for the most part. We'll see some interesting exceptions at the end. When you look down the list you
see that a lot of entries are taken, and what's worse, there are some extensions that claim a whole range of bytes to themselves. That might have been okay for the original board, but now, given how congested that table is and how easy it is to implement the bit logic on a CPLD, there is simply no excuse for a modern device to only do partial address decoding. So if you're making a device today, please, please, please confirm that there's room in that table and that you're only active when all the bits you've selected are active, and at no other time. Otherwise you'll be running into constant conflicts with other hardware, and I've actually seen several devices doing that. But even if you're well behaved and you're only triggered by one of those addresses, there are still only 255 possible ones, and the list is mostly full. To be fair, most of the expansions aren't going to be running at the same time, so maybe it's safe, kind of. Still, some hardware designers are giving up on I/O ports completely and finding alternate ways of communicating with the extensions.

So you thought it was crazy until now? No, this is where crazy starts. All right, I'm going to give you a problem and I'm going to let you think about it for a second. Imagine you're an expansion designer and you have full access to the expansion bus, meaning the data and address buses and the usual control signals. How could you communicate with custom hardware without using IORQ? In case you're thinking this is purely hypothetical, this is exactly how the Dandanator is implemented, and the solution doesn't use any secret signals or anything that we haven't talked about. It's a really ingenious solution, so pause here for a second and think about it if you want.

Okay, got it? Are you ready for the solution? The key to this communication is extremely simple: sending some pre-agreed data pattern on the data bus, specifically during the opcode fetch phase, which happens while M1 is active. That's it. If you agree that the byte to trigger the device is going to be
for example C9, whenever the CPU fetches the next opcode and it's C9, the device can do its thing, knowing, hey, they're talking to me. Yes, I can already hear the Z80 experts yelling at the screen. I know, I know, I know: C9 is the binary code for the RET instruction. I did that on purpose. If we did that, the device would be triggered constantly by normal programs every time they return from a subroutine, so that won't work. We really need to find some data that will not be present in normal Z80 programs. Except that, look at that table: that is the matrix with every byte and what instruction it belongs to. No room in there, right?

Well, not quite. Do you see those darker squares? Those are multi-opcode instructions. They're instructions that start with a byte but then have another byte that defines the full instruction. For example, for bit instructions that prefix will be CB, and then we look up the second byte in the CB table, and if that was 1E the instruction was RR. How does that help us? Well, not all those tables are as full as that one. As a matter of fact, the MK table is mostly empty. So the Dandanator in particular chose the FD prefix, which leads to the IY instructions table, followed by FD again, which is empty. Once the Z80 has started decoding an instruction like this it needs to finish it, so it will actually ignore the second FD byte and keep trying to match an instruction in the table. That's why the Dandanator expects this instruction, which is FD 07 and then something. So the Dandanator is actually looking for the sequence FD FD FD, and then the next byte tells it more information about what kind of operation we want to do.

That's crazy. Depending on how you look at it, that's either really clever (and I had never heard of this approach before) or a horrible hack, but the fact of the matter is that it works beautifully and it avoids the overcrowded I/O table. The choice is really smart because clearly no well-written program is ever going to have that byte sequence while fetching opcodes, and
also the CPU will not be interrupted in the middle of an opcode fetch phase, so there's no need to disable interrupts before sending any data to the Dandanator. The only possible downside is that the instruction has a side effect of storing B in some RAM byte, so you need to worry about where you point IY to, and then it's going to be a little slower than just doing OUT instructions. But it's really, really ingenious, and the good news is that if other expansions wanted to take a similar approach, there's a lot of room in that table to have hundreds or maybe even thousands of different sequences.

I probably lost most of my hardware audience by this point with so much talk about software, but if you made it this far, congratulations: you have conquered the madness that is Z80 I/O. And now when you see OUT (C),C you probably know exactly what it's doing. Or not, you never know. Better check those connections with a multimeter to be sure. In any case, I'll see you next time.
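
Editorial sketch (not part of the video): the bus behaviour described in the captions can be modelled in a few lines of Python. This is only an illustration under stated assumptions. The function names `out_c`, `otir_as_narrated` and `cpc_selected` are made up for this sketch, the OTIR replay follows the walk-through as narrated, and the mapping of address bits A15..A11 to CPC devices is an assumption taken from commonly documented CPC I/O decoding rather than from the captions, which only list the five devices.

```python
def out_c(b, c, value):
    """Model OUT (C),r: return (address_bus, data_bus) for the I/O write.

    B drives the upper byte (A15..A8), C the lower byte (A7..A0).
    """
    address = ((b & 0xFF) << 8) | (c & 0xFF)
    return address, value & 0xFF


def otir_as_narrated(b, c, data):
    """Replay the OTIR walk-through from the captions.

    Assumption: B appears on the bus before it is decremented, as narrated.
    The Zilog manual's OUTI entry describes the already-decremented B on the
    bus, so real hardware may show an upper byte that is one lower per write.
    """
    writes = []
    index = 0
    while b != 0 and index < len(data):
        writes.append(out_c(b, c, data[index]))
        index += 1
        b -= 1
    return writes


# Assumed bit-to-device mapping (hypothetical simplification, active low).
# It ignores finer conditions such as the gate array also wanting A14 high.
CPC_SELECT_BITS = {
    15: "gate array",
    14: "CRTC",
    13: "ROM select",
    12: "printer",
    11: "PPI",
}


def cpc_selected(address):
    """List the built-in CPC devices that would respond to this I/O address."""
    return [name for bit, name in CPC_SELECT_BITS.items()
            if not (address >> bit) & 1]
```

With the breadboard values from the video, `out_c(0xFF, 0x00, 0x33)` gives address FF00 and data 33, and `cpc_selected(0xFF00)` is empty because every select bit is high. Feeding the narrated OTIR addresses through `cpc_selected` shows why the block instructions are awkward on the CPC: under this simplified model, an address such as 03F7 has all five select bits low at once.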
Info
Channel: Noel's Retro Lab
Views: 71,356
Id: aB9AuKx8kBU
Length: 22min 52sec (1372 seconds)
Published: Fri Jan 12 2024