Inside the CPU - Computerphile

Captions

In a previous video we looked at how CPUs can use caches to speed up accesses to memory. The CPU has to fetch things from memory, it might be a bit of data, it might be an instruction, and it goes through the cache to try and access it, and the cache keeps a local copy in fast memory to try and speed up the accesses. But what we didn't talk about is: what does the CPU do with what it's fetched from memory? What's it actually doing, and how does it process it?

So the CPU is fetching values from memory, and we'll ignore the cache for now, because it doesn't matter whether the CPU has got a cache or not; it's still going to do roughly the same thing. We're also going to look at very old CPUs, the sort of things that were in 8-bit machines, purely because they're simpler to deal with and it's simpler to see what's going on. But the same ideas still apply to an ARM CPU today, or your x86 chip, or whatever it is you've got in your machine.

Modern CPUs use what's called the von Neumann architecture. What this basically means is that you have a CPU and you have a block of memory, and that memory is connected to the CPU by two buses. These are just a collection of several wires connecting the two. And again, we're looking at old-fashioned machines; on a modern machine it gets a bit more complicated, but the principle's the same. So we have an address bus, and the idea is that the CPU can generate a number on here, in binary, to access any particular value in here. We'll say that the first one is at address 0 and, as we're going to use a 6502 as an example, we'll say that the last one is at address 65535 in decimal, or FFFF in hexadecimal. So we can generate any of these numbers on the 16 bits of this address bus to access any of the individual bytes in this memory. How do we get the data between the two? Well, we have another bus, which is called the data bus, which connects the two together.

Now, the reason why this is a von Neumann machine is because this memory can contain both the program, i.e. the bytes that make up the instructions that the CPU is going to execute, and the data. So the same block of memory contains some bytes which contain program instructions and some bytes which contain data, and the CPU, if you wanted it to, could treat the program as data or treat the data as program. Although if you do that, it'll probably crash.

So what we've got here is an old BBC Micro, which uses a 6502 CPU, and we're going to just write a very, very simple machine code program that uses one of the operating system's routines just to print out the letter C, for Computerphile. So if we assemble it (we're using hexadecimal), we've started our program at 084C, so that's the address where our program's been created. And our program is very simple. It loads one of the CPU's registers, which is just basically a temporary data store that you can use, and this one is called the accumulator, with the ASCII code 67, which represents a capital C. Then it says jump to the subroutine at this address, which will print out that particular character. And then we tell it that we want to stop, so we've got a return from subroutine. If we run this and type in the address of where I've put it, 084C, then you'll see it prints out the letter C, and then we get the prompt to carry on doing things.

So our program: we write it in assembly language, which we can understand as humans, ish. LDA, load accumulator; JSR, jump to subroutine; RTS, return from subroutine. You get the idea once you've done it a few times. And the computer converts this into a series of numbers in binary. CPUs work in binary, but to make it easier to read we display them in hexadecimal. So our program becomes A9 43, 20 EE FF, 60. That's the program we've written, and the CPU, when it runs it, needs to fetch those bytes from memory into the CPU.

Now how does it do that? To get the first byte, we need to put the address 084C on the address bus, and a bit later on the memory will send back the byte that represents the instruction, A9. Now how does the CPU know where to get the instructions from? Well, it's quite simple. Inside the CPU there is a register which we call the program counter, or PC, on a 6502; on something like an x86 machine it's known as the instruction pointer. It has different names, it doesn't make any difference. All that does is store the address of the next instruction to execute. So when we were starting up here, it would have 084C in it: that's the address of the instruction we want to execute. When the CPU wants to fetch the instruction that it's going to execute, it puts that address on the address bus, and the memory then sends the instruction back to the CPU.

So the first thing the CPU's got to do to run our program is to fetch the instruction, and the way it does that is by putting the address from the program counter onto the address bus and then fetching the actual instruction. So the memory provides it, and the CPU then reads that in on its input on the data bus. Now, it needs to fetch the whole instruction that the CPU is going to execute, and in the example we saw there it was relatively straightforward, because the instruction was only a byte long. Not all CPUs are that simple; some CPUs will vary these things, so this hardware can actually be quite complicated, because it needs to work out how long the instruction is. It could be as short as one byte; it could be as long, on some CPUs, as 15 bytes, and you sometimes don't know how long it's going to be until you've read a few of the bytes. Or this hardware can be relatively trivial: an ARM CPU makes it very, very simple. It says all instructions are 32 bits long, so the Archimedes over there can fetch the instruction very simply: 32 bits. On something like an x86, it could be any length up to 15 bytes or so, and so this becomes more complicated, and you have to sort of work out what it is until you've got it.

But we've fetched the instruction. So in the example we've got, we've got A9 here, so we now need to work out what A9 does. To do that, we need to decode it into what we want the CPU to actually do. So we need to have another bit of our CPU's hardware which is dedicated to decoding the instructions. So we have a part of the CPU which is fetching it, and a part of the CPU which is then decoding it. So the A9 comes into the decode, and it says: well, okay, that's a load instruction, and so I need to fetch a value from memory, which was the 43 (hexadecimal for 67), the ASCII code for the capital letter C that we saw earlier. So we need to fetch something else from memory, so we need to access memory again, and we need to work out what address that's going to be. We also then need to, once we've got that value, update the right register to store that value. So we've got to do things in sequence. So part of what the decode logic should do is take the single instruction byte, or however long it is, and work out what's the sequence that we need to drive the other bits of the CPU to do.

And so that also means that we have another bit of the CPU, which is the actual bit that does things, which is going to be all the logic which actually executes instructions. So we start off by fetching it, and then once we've fetched it we can start decoding it, and then we can execute it. And the decode logic is responsible for saying: put out the address of where you want to get the value to load from in memory, and then store it once it's been loaded into the CPU.

So you're doing things in order. We have to fetch it first, and we can't decode it until we've fetched it, and we can't execute things until we've decoded it. So at any one time, we'd probably find on a simple CPU that quite a few bits of the CPU wouldn't actually be doing anything. While we're fetching the value from memory, to work out how we're going to decode it, the decode and the execute logic aren't doing anything; they're just sitting there waiting for their turn. And then when we decode it, it's not fetching anything and it's not executing anything. So we're sort of moving through these different states one after the other, and they'll take different amounts of time. If we're fetching 15 bytes, that's going to take longer than if we're fetching one. Decoding might well be shorter than fetching something from memory, because it's all inside the CPU. And the execution depends on what's actually happening.

So your CPU will work like this: it will go through each phase, and then once it's done that it'll start on the next clock tick. All the CPUs are synchronized to a clock, which just keeps things moving in sequence. And you can build a CPU like that; something like the 6502 worked like that. But as we said, lots of the CPU isn't actually doing anything at any one time, which is a bit wasteful of the resources. So is there another way you can do this? And the answer is yes: you can do what's called a sort of pipelined model for a CPU.

What you do here is you still have the same three bits of the CPU, but you say: okay, so we're going to fetch (and I'll just use an F) instruction one. In the next bit of time, I'm going to start decoding instruction one. But I'm not using the fetch logic here, so I'm going to start to do things ahead of schedule: I'm also, at the same time, going to fetch instruction two. So now I've got two bits of my CPU in use at once: I'm fetching the next instruction while decoding the first one. Then once we've done the decoding, I can start executing the first instruction. So we'll execute that, but at the same time I can start decoding instruction two, and hopefully I can start fetching instruction three.

So it's still taken the same amount of time to execute that first instruction, but the beauty is that when it comes to executing instruction two, it completes exactly one cycle after the other. Rather than having to wait for it to go through the fetch, decode and execute cycles, we can just execute it as soon as we've finished instruction one. So each instruction still takes the same amount of time, it still takes, say, three clock cycles to go through the CPU, but because we've sort of pipelined them together, they actually appear to execute one after each other, so each appears to complete one clock cycle after the other. And we could do this again: we can start decoding instruction three here at the same time as executing instruction two.

Now, there can be problems. This works for some instructions, but say this instruction said store this value in memory. Now you've got a problem: you've only got one address bus and one data bus, so you can only access or store one thing in memory at a time. You can't execute a store instruction and fetch a value from memory, so you won't be able to fetch it until the next clock cycle. So we'd fetch instruction four there, while executing instruction three, but we can't decode anything here. In this clock cycle, we can decode instruction four and fetch instruction five, but we can't execute anything. We've got what's called a bubble in our pipeline, or a pipeline stall, because at this point the design of the CPU doesn't let us fetch an instruction and execute an instruction at the same time. It's one type of what we call pipeline hazards that you can get when designing a pipelined CPU: the design of the CPU doesn't let you do the things you need to do at the same time, and so you have to delay things, which means that you get a bubble. So you can't quite get up to one-instruction-per-cycle efficiency, but you can certainly get closer than you could if you just had everything do one instruction at a time.
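
The three-stage pipeline and the bubble described above can be sketched as a small Python simulation. This is only an illustration under stated assumptions, not a model of any real CPU: every stage is assumed to take exactly one clock cycle, the function name simulate and its stores parameter are made up for this sketch, and a store is assumed to occupy the single address/data bus during its execute stage so that no fetch can happen in that cycle. The second call assumes, as the cycle-by-cycle description suggests, that instruction two is the store.

```python
def simulate(n_instructions, stores=frozenset()):
    """Simulate a hypothetical 3-stage (fetch, decode, execute) pipeline.

    Instructions are numbered from 1. `stores` holds the numbers of the
    instructions assumed to use the memory bus while executing, which
    blocks the fetch stage for that cycle (the "bubble").
    Returns one (fetch, decode, execute) tuple per clock cycle, with
    None marking a stage that is idle in that cycle.
    """
    fetch = decode = execute = None
    next_instr = 1
    completed = 0
    timeline = []
    while completed < n_instructions:
        # Everything already in the pipeline moves along one stage.
        execute = decode
        decode = fetch
        # One address bus and one data bus: a store in the execute
        # stage means nothing can be fetched this cycle.
        bus_busy = execute in stores
        if next_instr <= n_instructions and not bus_busy:
            fetch = next_instr
            next_instr += 1
        else:
            fetch = None
        timeline.append((fetch, decode, execute))
        if execute is not None:
            completed += 1
    return timeline


if __name__ == "__main__":
    n = 5
    print("one at a time:", 3 * n, "cycles")
    print("pipelined, no stores:", len(simulate(n)), "cycles")
    print("pipelined, instruction 2 is a store:")
    for cycle, (f, d, e) in enumerate(simulate(n, stores={2}), start=1):
        print(f"  cycle {cycle}: fetch={f} decode={d} execute={e}")
```

With five instructions, going one at a time costs 15 cycles, the ideal pipeline finishes in 7, and the single store stretches that to 8. In the store run, cycle 5 fetches instruction four while executing instruction three with nothing to decode, and cycle 6 decodes four and fetches five with nothing to execute, which is the bubble moving through the pipeline.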

Info

Channel: Computerphile

Views: 294,679

Rating: 4.9397163 out of 5

Keywords: computers, computerphile, computer, science, computer science, Dr Steve Bagley, University of Nottingham, CPU, Computer Hardware, Pipeline

Id: IAkj32VPcUE

Length: 11min 16sec (676 seconds)

Published: Wed Mar 22 2017