[MUSIC PLAYING] RAYMOND: Dave is largely
responsible for the ARM architecture. I'm going to give you the
briefest possible intro to ARM. It's why this is not this
big and burning a hole in your pocket. That's it. With that, Dave. DAVE JAGGAR: Thanks
so much, Raymond. I'm going to skip through the
first slides fairly quickly, because I hope it's
fairly well-known stuff. On that introduction, just
a wee detail about ARM is in a lot of
products these days, it is dominant in cell phones,
but a lot of other products as well. ARM was formed in 1991-- late 1990, start of
1991-- with 12 engineers from a British
company called Acorn. We started an intellectual property company with no intellectual property. This is not something
to try at home. We got $2.5 million from Apple. Why all that came together
will become clear in a moment. Chip designers-- ARM
doesn't make chips, but everyone else
does, pretty much. And ARM makes a royalty. Just recently the 150
billionth chip shipped. But none shipped
in 1991 to 1995. And about 23 billion last year. So that's about 20 for
everyone on the planet. More than 60% now of the world
have access, use ARM every day. And that's about the
same as having access to basic sanitation. 730 chips per second
are manufactured 24/7. They're a tiny company
compared to Google, of course. Everyone is. But $1.83 billion turnover
in 25 years is not too bad. And about $1.1 billion
is from royalties. 6,000 employees, nearly 1,700
people manufacturing ARMS. And they were bought
by SoftBank in, uh-- whether it's for
better or for worse. The Pope was
inaugurated in 2005. And there's a photograph. And the change that
perhaps ARM had. In 2013 it kind of
looked like that. I think maybe the Pope thinks
the whole world is covered in purple and green splotches
from all the flashes going off. Shipments [INAUDIBLE]
150 billion growth curve. That's to the end of 2018. So that's why it's not quite 150 billion. I was largely responsible
for the yellow and orange and the start of the red. ARM stopped reporting those
products as individuals. And that's why it
becomes that beige color. That's the combination of
those three after that time. And then the Cortex M is
largely the development of the yellow stuff that I did. And it's a direct descendant. And then the power-- the big chips that are on your
phone-- that's the Cortex A series-- the purple at the top. So you can see a lot of ARMs
are going into everything other than the main processor
in a cell phone too. Annual shipments look like that,
approaching 25 billion a year. Quarterly shipments
are really jumpy, because you fill the pipeline
with products in about the end of second and third quarter. And the fourth quarter
and the first quarter are pretty quiet after
Christmas for production. So a little bit
about the background. This processor was
originally developed by a company called Acorn. And that's the name. ARM is the Acorn RISC Machine. They had a lot of success
in the early '80s. They were kind of
like a British Apple. Had an educational computer. And they sold so many
of them that they then decided to do this computer,
which was a follow-on. Unusually, they decided to
develop everything themselves, right down to the
keyboard and the mouse. They had multiple operating
systems, networking, file systems, and
the core processor, and three support chips-- the
memory controller, the I/O controller, and the
video controller. Kind of unusual. Unfortunately, it was never
particularly successful. It was probably just
overtaken by the IBM PC, like most things. In parallel in 1990, Apple
were developing the Newton-- the first PDA
with handwriting recognition. You can roughly imagine
it as a large cell phone without any connectivity. Probably why it
wasn't successful is it didn't have
any connectivity. But the advanced technology
group led by Larry Tesler were building this Newton. And they really wanted a
low power 32-bit processor. It was actually Jony
Ive's first job at Apple was to design the Newton. And because Apple and Acorn
competed in the UK market, they decided to spin
out this company. The other bit of
serendipity was the timing of this company was it was just
the start of the world going digital. So just to cast your mind
back, about 1990 a lot of applications that
were developed on PCs wanted to be put into
sort of portable products that ran on batteries. So as far as that was
concerned, the ARM processor came online just about the
right place at the right time. At high school, my math teacher
said I'd never be an engineer. This is kind of
ironic, because one of the reasons I'm on a
tour of the US at the moment is Dave Flynn and I-- another senior engineer at ARM-- were awarded the James Clerk
Maxwell medal from the IEEE. So I think maybe I can talk to
my teacher with my head up now and say, actually, I'm
probably a decent engineer. But because of that I actually
did computer science instead. So I did every single
paper at my university in New Zealand,
which was unusual. And again, it was a
bit of serendipity. We just seemed to have all
the right people with-- we only had 10 teaching staff. But we just seemed to have
the right bits of technology. We had especially Tim Bell. If you know anything
about text compression, the original bible was written
by Bell, Cleary, and Witten. And Tim Bell was one
of our lecturers. We also had compiler technology. We had the source to
an operating system-- a good one. But we had almost no
hardware expertise. So I spent a lot of time
in the engineering library learning about hardware. John Mashey was
from MIPS computers. He has an esteemed career. And he gave a guest
lecture in 1988 and really taught me that
there was such a thing called a computer architect. And I worked with lots
of old mainframes. If you look at anything I
design and you squint at it a little bit, you get a PDP-11. So. So for my master's thesis I
looked at this ARM processor. And the word
"interesting" is in quotes because it's a slightly
crazy architecture. But it is interesting. So my thesis was called "A
Performance Study of the Acorn RISC Machine." And I wrote a C compi-- sorry, I wrote a C compiler. I wrote an instruction set simulator called the ARMulator. I put MIPS-style floating
point on the side of it. And put a complete
software stack on top of that-- compilers,
assemblers, the Sun OS sources, and ran the whole lot. And I really think I learned a
lot in those couple of years, because you really haven't
lived until you've debugged a complete stack like that.
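A rough idea of what an instruction set simulator like the ARMulator does at its core-- this is a hypothetical C sketch with a made-up two-instruction encoding, not the actual ARMulator code:

    #include <stdint.h>
    #include <stdio.h>

    #define MEM_WORDS 1024

    static uint32_t mem[MEM_WORDS];   /* word-addressed memory image        */
    static uint32_t regs[16];         /* R0-R15; R15 is the program counter */

    /* One simulator step: fetch the word at the PC, decode a couple of toy
       opcodes, execute, and advance the PC. Real ARM decode is far richer. */
    static int step(void)
    {
        uint32_t instr = mem[regs[15] / 4];            /* fetch   */
        regs[15] += 4;

        uint32_t op  = instr >> 28;                    /* decode  */
        uint32_t rd  = (instr >> 24) & 0xF;
        uint32_t imm = instr & 0xFFFFFF;

        switch (op) {                                  /* execute */
        case 0x1: regs[rd] += imm; return 1;           /* "add immediate"     */
        case 0x2: regs[rd]  = imm; return 1;           /* "move immediate"    */
        default:  return 0;                            /* anything else: halt */
        }
    }

    int main(void)
    {
        mem[0] = 0x20000005;                           /* MOV R0, #5 (toy encoding) */
        mem[1] = 0x10000003;                           /* ADD R0, #3 (toy encoding) */
        while (step())
            ;
        printf("R0 = %u, PC = %u\n", regs[0], regs[15]);   /* R0 = 8 */
        return 0;
    }

Everything sitting on top-- the compiler, the assembler, the operating system sources-- ultimately funnels through a loop like that, which is part of why debugging the whole stack on it teaches you so much.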
There were actually a couple of bugs in the SunOS source when it was compiled for a new machine, all the way down to a bunch of
bugs, of course, in my code. And later I modeled
a 16-bit ARM. And this comes in later as a
replacement for the teaching simulator. A couple of days after
handing in my thesis I saw my first copy of Patterson
and Hennessy's "Computer Architecture-- A Quantitative Approach." And I remember standing in the
university new book section at the library going, well,
you could have told me sooner. RAYMOND: If only I had known. DAVE JAGGAR: I could have just
saved myself so much time. But yeah. So inspiration number
one was John Mashey from MIPS and Silicon Valley. And I was very fortunate to give
a talk in Stanford last week. And John was there. So after all these years-- 30 years-- I was able to sort of
repay a little bit of the thank you's. He walked in with a stack
of overhead projector slides about this tall, all messy,
straight out of his attache case, and said, I won't have
time to present all this, and then did it. And it was like drinking from
a fire hose for 90 minutes. It was fantastic. And I guess at the end
of that I knew what I wanted to be when I grew up. He also came out
with this taxonomy of a complex instruction
set computer and a reduced instruction set computer, and
a continuum of how you grade these things as far as
what a processor might be on that continuum. And I've never really
entered into this argument. ARM is not a pure RISC. Our CTO, Mike Muller said
that way back in 1992. It's kind of on the spectrum. I found out recently that
mushrooms are closer to animals than they are to plants. And I think it's
kind of like that. It's just a different thing. And you shouldn't try
and rule it in too much. Maybe it's a MISC
for miscellaneous. And maybe you have
to bacronym the M. And we'll call it
an M for modest. It was a modest little chip,
mainly because it was designed not to have any on-chip caches. It was designed to
connect directly to DRAM. And that has a ton of
architectural implications, both good and bad. And it took a long time to
sort of undo some of those. Probably the single
biggest mistake they made is they had a limited
26-bit address bus. They artificially limited the address bus. And so this thing could only
address 64 megabytes of memory. But it was low cost, it was
low heat, and it was a low part count. They had to fit in
a plastic package back then with no
heat sink, no fans. It was probably no smaller
than the early MIPS machines. We used to say it was. But with hindsight, when
I learned more about how the MIPS machines
were laid out, it was probably no simpler. Anyway. The implications of
having no caches on-chip meant a long cycle time. And this meant that the
whole instruction pipeline of the machine stalls
whenever you access memory, because the machine's
trying to load an instruction every cycle. That's what RISC machines do. They're always trying
to load an instruction. And as soon as you
need to access memory for a load or store, you have
to stop fetching instructions. And the whole pipeline stalls. This is kind of unusual. Because that whole
pipeline stalls, you've actually got
time to do other stuff while you're accessing memory. And they went ahead and
baked a lot of the stuff into the instruction set. Really quite
unusually in an ARM, a single instruction can
do a shift and an ALU op in a single cycle. I don't know any other
machines that do that. It also has load and store
multiple instructions, which allow you to get
fast DRAM access for data. This is not how a computer pipeline should look. But this is how ARM2 looks. A trained computer architect will look at this and see an anaconda
that has swallowed a goat. So you've got that empty,
empty, empty, huge bulge that just doesn't look
right, empty, empty, empty. And there's just everything's
done in the execute stage of the ARM pipeline. It's really not pipelined
at all-- the back end of the machine. Because it's so
simple, that thing just loops while it's
accessing the single memory system. A little bit on code. It's cute and fun to write
assembly code for this thing. For example, the top
instruction multiplies register 1 by 5, which
is kind of a handy thing to do in one cycle. You can do other
things like move bytes around in and out of registers
to do bit field and certain [INAUDIBLE] in a single cycle. The ALU and shifter combination was also used to form addresses for loads and stores. And that meant you could do quite complex addressing modes. So the C programmers
in this world will understand the
R5++ nomenclature. You can do auto increment
and auto decrement built into every instruction.
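A minimal C sketch of those two tricks, treating them as a software model rather than ARM's actual logic: the barrel shifter feeds the ALU in the same cycle (so ADD R1, R1, R1, LSL #2 is a multiply by 5), and the same adder can write an updated base register back for the R5++ style of addressing.

    #include <stdint.h>
    #include <stdio.h>

    /* One ARM2-style data-path operation: the second operand goes through the
       barrel shifter and then the ALU, all in a single cycle. */
    static uint32_t add_shifted(uint32_t rn, uint32_t rm, unsigned lsl)
    {
        return rn + (rm << lsl);       /* ADD R1, R1, R1, LSL #2  ==  R1 * 5 */
    }

    int main(void)
    {
        uint32_t r1 = 7;
        r1 = add_shifted(r1, r1, 2);
        printf("R1 = %u\n", r1);                  /* 35: multiplied by 5 */

        /* Post-indexed load, roughly LDR R0, [R5], #4: use R5 as the address,
           then write R5 + 4 back so the next load steps through memory. */
        uint32_t memory[3] = { 10, 20, 30 };
        uint32_t r5 = 0;                          /* byte offset into 'memory'  */
        uint32_t r0 = memory[r5 / 4];
        r5 += 4;                                  /* base register written back */
        printf("R0 = %u, R5 = %u\n", r0, r5);     /* R0 = 10, R5 = 4 */
        return 0;
    }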
Register 15 was the program counter. And register 14 was
the return address. So to return from a subroutine, you just put register 14 back into 15. And that returned. The last one is a conditional return from a function call, all in one hit. So if something is equal to 0-- that's the EQ bit-- you load multiple, increment after. Register 13 is the stack pointer. And the exclamation mark means, I'm going to update the stack pointer after I've done this operation. It loads registers 4, 5, 6, and 7, and the program counter. So it does a return all in one instruction.
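A hedged C model of what that single instruction does-- roughly LDMEQFD R13!, {R4-R7, PC}-- built only from the description above; the register file and the stack layout here are illustrative, not ARM's real microarchitecture.

    #include <stdint.h>
    #include <stdio.h>

    /* If the Z flag is set (the EQ condition), pop R4..R7 and the program
       counter (R15) off the stack, and write the updated stack pointer (R13)
       back because of the '!'. Otherwise the instruction does nothing. */
    static void ldmeq_return(uint32_t regs[16], const uint32_t stack[], int z_flag)
    {
        if (!z_flag)
            return;
        uint32_t sp = regs[13];                   /* word index, for simplicity */
        for (int r = 4; r <= 7; r++)
            regs[r] = stack[sp++];
        regs[15] = stack[sp++];                   /* loading R15 is the return  */
        regs[13] = sp;                            /* '!' -- write the SP back   */
    }

    int main(void)
    {
        uint32_t regs[16] = { 0 };
        uint32_t stack[5] = { 40, 50, 60, 70, 0x8000 };   /* R4..R7, then PC */
        ldmeq_return(regs, stack, 1);             /* condition met: do the pop */
        printf("R4=%u R7=%u PC=%#x SP=%u\n",
               regs[4], regs[7], regs[15], regs[13]);
        return 0;
    }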
So that's all kind of cute and fun. The trouble is, the
instructions say it also defined all the bottom ones. And they had to work as
well, because they really didn't have a concept
of, we shouldn't allow people to do this. We'll just let them do whatever
the pipeline can achieve. So if you want to
multiply by 33,554,431, you can do that
in a single cycle. It's just not
particularly useful. You can branch to
that funny address. You can load a byte with funny offsets, like 75-byte offsets. The load instruction underneath
that-- that LDR R15-- that takes the program counter
and rotates it by 15 bits. And then it adds it to
the program counter. Then it accesses that
memory address and loads that into the program counter. Now this is almost
completely useless. But it still had to work
in every implementation we did after that, because
the programmers used weird instructions
like this early on. The last one loads register
13 and updates register 13 as the stack pointer. So after that instruction, it's not clear what register 13 is. And so this was a lesson
about architecture versus implementation. They had a tight
little implementation that worked well for DRAM. But the world was
moving very quickly towards high level languages. And Steve Furber, the original
implementer of the design, said this recently. "We expected to get into
the project finding out why it wasn't a good idea to do it. And the obstacle just never
emerged from the mist. We just kept moving
forward through the fog." Now we've all been
in that situation with a new design of
something where we just really don't know what we're doing. If we're honest, we're just
progressing through the fog, trying to work out
where things are. But I really love this
very honest description. It explains why the architecture
was more or less missing. Because to have a
chip architecture you really need
to have visibility over lots of implementations. You really need to be
able to look forward several implementations
to design a good architecture that's not
going to be costly to implement in the future. I'll skip this slide. It unpacks that
a little bit more for those that are interested. It basically says why those
initial interesting things in the pipeline
become hazardous. So if ARM2 was M for modest, as
soon as you add on-chip caches, M has to begin something else. And it's muddled or messy or any
other kind of similar adjective. Just beyond this, on-chip
caches became affordable. And as soon as you do that
to the ARM architecture, the whole architecture
starts to look a bit strange and silly
and hard to implement. But they went ahead and made
one of these things anyway. Acorn did ARM3. It was an ARM2
with a 4-kilobyte instruction and data cache. It had no write buffer. And this was due to
self-modifying code. They had a habit of writing
the instruction stream directly ahead of executing it. This was mainly for
bit blit graphics. But they would just spit
out instructions, and then expect that to be loaded
straight into the pipeline and executed. That means that you really
can't have anything buffered in the write stream, because
it needs to come straight back into the cache. And it means you can't
have separate I and D caches-- instruction and data caches-- because
they become incoherent. My thesis predicted that this
would be a 40% performance loss. And that number probably
got me the job at ARM. So yeah. They really had no idea of
Gordon Moore and Moore's law. It was really-- I don't know whether they
just ignored it or just missed the memo. I guess it was
somewhere in the fog. So the next generation
process would have given them twice as much silicon. But they didn't exploit it. So just a quick summary. At the start of 1991, there was
this joint venture between ARM and Apple with 12 engineers. Sophie Wilson, the original
ARM instruction set, stayed at Acorn. She did not join ARM. Steve Furber took the
professor of computing at Manchester University. So he was gone. Al Thomas-- this name is going
to be important in a moment. He was the ARM3 cache designer. He took over all
CPU design at ARM. And we had no patents. No coverage at all. Acorn had never filed any
patents on this technology. And the money from Apple. And we had fab space from
VLSI Technology in San Jose. Robin Saxby joined as CEO. He brought a lot of experience. We'll talk about
Rob in a minute. And then later on there
was a layout engineer. The office manager, Simon
Segars is the current CEO. That's the tall
one in the picture. I'm the other one
in the picture. And then Dave Flynn
joined just after me. And he's the co-recipient
of the medal. Robin brought a lot
of experience to ARM. I'm a hoarder of emails. I never throw away an email. And I have all my emails. And you can look back and
see that he completely predicted this domination. He has a bunch of sayings there. And you'll hear him
say those things a lot. We did work hard. We did have fun. So ARM started in this
tiny little barn-- in a 17th century barn-- in a
town called Swaffham Bulbeck-- can that name be
any more English-- about eight miles
Northeast of Cambridge. We added about 10
staff per year. And we had almost no money. We were almost
always going bust. In fact, Brexit
is kind of funny, because we would have
not been in business if it wasn't for the
European government funding that we received. So if Europe hadn't
been part-- if England hadn't been part of Europe, ARM wouldn't exist. Acorn and Apple commitments: we had to do an ARM6, and an
ARM7, an ARM8 for Acorn plus floating point
and video controller. They really wanted high
performance workstation processors. Apple really wanted
something that would fit in a thing that
looked a wee bit like a phone. And that was Robin's
balancing act for years. I think, as he's practicing
on the unicycle there. First thing ARM did-- this
was just before I joined-- was the ARM6 family for Apple. The nomenclature is if it's a
single digit like a 6, that's just a processor core. Can't really use it by itself. 60 is a processor
core bonded out. And this is the very
first ARM development card with an ARM60 right here. At the same time
as I got the metal, Dave Flynn presented
this to me as a gift. I'm so proud to have the very
first ARM development card. I've since promised it without
asking Dave yet-- sorry, Dave, if you're watching this-- to give it to the Computer
Museum in San Jose, because it's kind of a
start of the revolution. So that's an ARM60,
which is ARM6 bonded out. An ARM600 or 610 would have caches on it. If there was a 6000 later-- we started going up to four digits-- that would be an SoC. So that's how the naming worked. They put a write buffer
on this for Apple when they pushed it out
to a 32-bit-wide address bus for Apple. And hey, the write buffer
produced a 40% performance increase. So that was handy. I've only ever had one job. I worked at ARM for nine
years, and then I retired. So straight out of
university I joined. I sent them my thesis by
post and heard not a thing. Not a single sausage was
heard in New Zealand. And of course, postage back then
from New Zealand to England, I didn't actually know
the means by which my parcel would
travel, whether it was on an airplane or a boat. So I waited patiently. And on the 2nd of
May I sent an email to Jamie Urquhart, who was
running the VLSI group, asking if they'd got my thesis and-- John Mashey-style-- asking if they had any jobs. Jamie came back and
said contact HR, which made me kind of
think probably not. But on the 3rd of May I heard
from somebody called Lee Smith at advancedRISCMachines.co.uk. And he said the following: I have your CV. I've been impressed by it. And he was currently looking for a software person to start around the end of June. So this was another
piece of good timing. I got a telephone interview
on the tenth of May. And I had a job offer
on the 17th of May. And I arrived in the
UK on the 20th of June. As part of those emails
going backwards and forwards, I had the following paragraph. Lee said, "Over
the past few days it has come to my attention
that our understanding of ARM at the software level
is insufficient." This really troubled me. I couldn't quite understand
how that sentence could exist. These were ARM. How did they not
understand their processor at a software level? He was actually talking about
doing HDL models-- Verilog and VHDL
models of the ARM. And I went on to do
some of that too. So I joined about two months
after ARM600 taped out. Robin Saxby lived
two hours away. And he didn't want
to move his family up to Cambridge at the time. This was a start-up,
so he didn't want to disrupt his whole family. So we ended up renting
an apartment together in Cambridge. We have a lot in common,
including the same birthday, 20 years apart. And we're both
Cambridge outsiders. So we got on really well. We still do. We see each other a lot. I had a very modern software
development background then. I was used to symbolically
[INAUDIBLE] C and Unix. Acorn's way of hand-coding
things and they used a lot of interpreted Basic
was kind of archaic to me. I certainly knew that
the ARM processor was too slow to compete
with the big boys. I knew that we had a decent
modest implementation. But the architecture was
pretty much non-existent. And ARM really didn't understand
the concept of architecture back then. And I knew that we didn't
have John Mashey's experience. So day one was write an
instruction set simulator. Day two, I handed
in my thesis code. That was the easiest day's
work I've ever done, cause I pretty much had that written. [LAUGHTER] [INAUDIBLE] Actually, I
spent about three months fighting X86
compilation back then. And then as I said,
Dave Flynn and I developed this development card. I did the software,
he did the hardware. And then we did Verilog and VHDL
models by wrapping that C code. I was made the head
of technical marketing because I was the only one
in technical marketing. So therefore I was
the head and the body. Because I knew how
to benchmark code and had a good
experience with this, I was just flying around
the world benchmarking code for people. Just so you know what a
high tech startup looked like in 1991, we called
Cambridge once an hour at five minutes to the hour
with a 2,400 baud modem to send and receive all the
email for the entire company. So if you had an
important email to send and it was 10 to the hour,
you were typing very quickly to catch that dial-up. We had no wireless. Wireless really hadn't
been invented then. 10-megabit Ethernet everywhere. A few Sun workstations. A few Acorn-based workstations. But all pretty crude. So a summary is, we had
two low volume customers with very different needs. We had one CPU designer. We had a modest ARM6. We had the 600 with cache, MMU, and write buffer. We had some software tools. But we had no
experienced architect or complete CPU design team. We had no development cards. We had no HDL models. We had no general
purpose operating system. No way to debug an ARM6
if it was buried in an SoC. And as I said before,
no volume customers. But most shockingly,
we had no patents. Shockingly, in 1992, as I
said, Sophie stayed at Acorn. Steve went to Manchester,
took a professorship. And Al Thomas passed away
halfway through 1992. The other half of the company--
about half the company-- were working on Acorn parts,
another quarter on software tools, and the remaining quarter
were support sales marketing. It turned out, 12 months
after leaving university, I was the only
one in the company that really had an in-depth
knowledge of the ARM. And I had absolutely no
clue about processor design. So I was really thrown
in at the deep end. I did point out
that maybe perhaps it would be a good idea
if we had some patents. So they immediately
made me the chairman of the patent committee. RAYMOND: Were you the
entire patent committee? DAVE JAGGAR: Yeah. I was the chair of
the patent committee. So I had to walk
around and bribe people into writing things
up as patents. So I was the entire CPU team. I understood bits
of their design, because it was written
in C. Other bits to instantiate it into
their timing simulator I did not understand at all. We needed a follow-on
processor quickly. I did have a lot of background
with software architectures in general. And this was really
the rebirth of ARM. Back then RISC was very popular. All the big guys were doing
the RISC processor in some way. Intel had the I960,
the I860 going on. Motorola had the 88K. And all the old MIPS and
Sun really started all this. But everyone followed. Down the bottom there was a
bunch of small embedded cores. And in between there
actually wasn't much. The Motorola 68k really
owned that market back then. There was a little bit
of X86, but not much. I remember Robin rented
my room Monday to Friday. And we had some pretty
candid talks every evening. I need to rewrite this line. I think we convinced
each other pretty quickly we couldn't compete
with the big fish, and we should just
go somewhere else. Richard Feynman has
that term, there's plenty of room at the bottom. And I really like that term. There's plenty of
room at the bottom. I think there's
still plenty of room at the bottom of this market. So that's what we-- we started to go down into
the embedded side of things. And RISC was the
buzz of the industry. It was much better than CISC. So we kept calling it a RISC. But I'm really
sticking with the MISC. M is now for the embedded
instruction set computer. I did a really quick spin of
the ARM6, made it go faster. And there's a big critical
path I knew about. I learned about
transistors real quick. You can't have
big stacks of them if you wanted to
run at low voltage. So you rearrange a few things
to get it down to 3.3 volts. I put a tiny bit of debug in. I removed the reset from the return address register, so it kept its value after the processor was reset. I removed the reset wire from that latch. The hardware guys
go nuts when you do this, because they point at
it and say things like hi-Z. And I didn't even
know what hi-Z was. Sounded like an energy
drink that hadn't even been invented then. But what they let you do
is you reset the processor. And then at least you knew where
it was when you pressed reset. That's how crude our
debug was at the start. And I filed a very
narrow patent on that, because it's quite
an unusual thing to do to not reset
part of your chip when you hit the reset button. And that was, I think--
that was my first patent. We called it ARM7. Those changes gave it enough
to give it a new name. I then went on and
started to get into DSP. So at this time we were
looking at MP3 code for doing digital audio
players like the iPod. And so we added a
faster multiplier. And I did proper
integrated debug so that we could debug
the processor when it was buried under an SOC. I think I'll skip this. I did multiply properly
to get us into DSP. Simon Segars, who's
the current CEO, freed up from the
video controller. And he did most of ARM7TDMI. It's really great to have a CEO
with a technical background. It was very well received. The debug interface
really revolutionized a lot of the design tools. Because I'm a software
guy, interfaces kind of come naturally to
me-- well-defined interfaces. And so that really
started the ARM ecosystem where people could write a
debugger once and interface to a lot of different
chips, because it was a proper interface
at that level. I was traveling a
lot at this time, doing a lot of benchmarking. The performance was great. The power consumption was great. The die size was fantastic. I was spending a lot
of time in America. So the weather was much
better than England. But code density bit
us, and it bit us hard. We were trying to replace
eight and 16-bit controllers. And obviously the reason you're
putting a new microprocessor in your product is
you want-- either it's a brand new product,
or you're trying to put a bunch of new features in. And we ended up having
code size that was bigger than the original products. We originally thought
we would be smaller. But it turned out being bigger. And of course, the
way memory works is you don't go from 12
kilobytes to 13 kilobytes. If you go from 12
kilobytes to 17 kilobytes, you then probably need a
32-kilobyte memory system. That's the first problem. We blew the memory budget
such that they really needed to double their memory size. The other problem is a 32-bit
RISC instruction set computer wants 32 bits every
cycle to hit full speed. It wants to swallow
instructions as much as it can. And a 32-bit wide
memory system then was two or even four chips. So this was painful for
everyone to maybe quadruple the size of their memory system. What really drove
this home to us-- a lot of people think that
the chip that this became was for Nokia. It actually wasn't. It was for Nintendo. And back then games
cartridges plugged in. And they were basically
a bit of plastic, a tiny little bit of brass,
and a stack of memory. And so if we made the whole
cartridge twice as expensive or four times as expensive,
that really ate all their profit at Nintendo. So this was against
the industry. Now Mike Horowitz-- the quote
here was at Stanford last week. So I'm slightly OK that I've
told this joke to his face. But it's unusual to see
the word "ridiculed" in a technical document. But the thinking at
the time was very much this, that you shouldn't
try and do code density. You should do simple decode. And that's absolutely
the correct thinking for a high performance
workstation. And it's just the wrong
thinking for embedded. So simple decode, simple
decode, simple decode was the way everyone thought. And you'd be ridiculed if you
tried to do anything else. And so to swim against that
tide was hard work back then. But any fixed-length instruction set is wasteful. And as we saw on that code slide earlier, not all combinations
are very useful. So if you can get rid of
them somehow, it's good. So on a train from Nintendo
to a ski weekend at Matsumoto in 1994, and
literally on a napkin, I started writing the
16-bit instruction set. It was pretty much the same
one that I used in my thesis. I'd learned a few
more tricks by then. And so I crippled
the C compiler. And what I did was I
made the C compiler only produce 32-bit instructions that
weren't too complicated that I knew I could compress down
into 16-bit instructions. So that, because I was
not using the full power of the instruction set, the
programs actually got bigger. Because it was still
32-bit instructions. But they had-- but they had gaps in them. And I knew that I could
then take all those and squish them down to 16 bits. So when the program size
only went up by about 40%, I smiled, because I
knew I could halve that immediately back down to
70% when I re-encoded them in 16 bits.
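A sketch of that squishing step in C, using an invented pair of encodings purely to show the idea: a restricted 32-bit ADD (low registers, small immediate) leaves unused bits, so it can be re-packed into 16 bits and expanded back without losing anything.

    #include <stdint.h>
    #include <stdio.h>

    /* Invented "wide" encoding: [4b opcode][4b Rd][4b Rn][20b immediate].
       The crippled compiler only ever emits Rd/Rn in R0-R7 and a 6-bit
       immediate, so most of those bits are a gap we can squeeze out. */
    static uint16_t squish(uint32_t wide)
    {
        uint32_t op  = (wide >> 28) & 0xF;
        uint32_t rd  = (wide >> 24) & 0x7;        /* restricted to R0-R7  */
        uint32_t rn  = (wide >> 20) & 0x7;
        uint32_t imm = wide & 0x3F;               /* restricted to 6 bits */
        return (uint16_t)((op << 12) | (rd << 9) | (rn << 6) | imm);
    }

    /* The inverse mapping -- conceptually the little decoder that later got
       slipped into the front of the pipeline. */
    static uint32_t expand(uint16_t narrow)
    {
        uint32_t op  = (narrow >> 12) & 0xF;
        uint32_t rd  = (narrow >> 9) & 0x7;
        uint32_t rn  = (narrow >> 6) & 0x7;
        uint32_t imm = narrow & 0x3F;
        return (op << 28) | (rd << 24) | (rn << 20) | imm;
    }

    int main(void)
    {
        uint32_t add_r1_r2_5 = (0x1u << 28) | (1u << 24) | (2u << 20) | 5u;
        uint16_t squished    = squish(add_r1_r2_5);
        printf("32-bit %#010x -> 16-bit %#06x -> back to %#010x\n",
               add_r1_r2_5, squished, expand(squished));
        return 0;
    }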
The real light bulb moment, though, was when I realized that
this processor should have two instruction sets. Now at the time,
remember, we're talking about reduced instruction
set computers. You should have one instruction that
does one thing at all times. So having a machine that
has two completely different instruction sets and
codings and two instructions that do exactly the same
thing was really weird. It's about as unRISC as
you can possibly get. So I called this thing thumb,
because that's the useful bit on the end of your arm. It's a second instruction
set, more compact than the original one. I recorded the instructions. As I said, programs end
up being 70% smaller. And if you're running
from narrow memory, the code runs faster because
you get a 16-bit instruction every cycle instead of having
to halve the memory bandwidth to get a 32-bit instruction. I added some support
for 16-bit data. I left in the ARM
instruction set, so you can still do full speed
if you want to, especially from on-chip memory. I also defined
something called TOM. Tom Thumb, right? A 32-bit data path with only
the 16-bit instruction set. And that's what's called
Cortex M0 and M1 today. The other really
big volume chips. And I also defined and put
all the hooks in TOM 16 with a full 16-bit data path. We never did that, and
we really should have. A lot of people don't know-- Unix runs really nicely
on a 16-bit machine. It started life on
a 16-bit machine. And one of my few
regrets is that bit. So Thumb really put us on this
different curve, this red curve where we could have more
performance and less cost. And depending on which-- how you encoded
your program, if it was an important
bit of code, you encoded in the 32-bit
instruction set. If it was a less important
bit of code-- for example, all the GUI-- you ran all that in 16-bit code. So you had the best
of both worlds. It was really on
a different curve. And it was really
the breakthrough for ARM and embedded. I left all the original
stuff in because it was a really easy
sell to say you've got the best of both worlds. And I never would have
got away with replacing the entire instruction set. Remember, by this
stage I'm only a couple of years out of university. So although that's exactly what I was doing, I put a back door in that we later used for Thumb-2. I put in a prefix instruction. And no one spotted
that, fortunately. It was smart
politically, because it looked like a relatively
small change for the chip. And for those who called
it architecturally ugly, I said, yeah, it's ugly. But gee, it works well. Sophie Wilson, who was the
original architect that stayed at Acorn, she hated it. She wrote to ARM's board
and said, to be brief, I don't like Thumb. As a short-term hack
it might be survivable. As a long-term
architectural component, it is in my view a disaster
of enormous proportions. It represents a backward step. Now the first chip
sold 30 billion units. So maybe not quite as
backward as she was expecting. But it was a big deal. There was an emergency
board meeting. Robin Saxby's bonus was cut
by 20% if he chose to do this. They really tried to stop it. Steve Furber was called
in from Manchester as the judge and jury. And he narrowly sided with ARM. Steve recently said, "ARM
addressed the code density issue with an imaginative leap. They introduced the Thumb
16-bit instruction set." So it went from a backward
step to an imaginative leap. So that's a pretty good U-turn. And this is why I say my
part in ARM's downfall. It was downward in
market position. But it was very much
upward in success. I will say it's much harder to
simplify something like this than you think. Looking back on it, it
looks so easy now. It was just, how do I take
this big complex problem and make a simple solution? And RISC in general
is a little like that. It's often hard to look across. It still looks like an
Anaconda that swallowed a goat. But there's this little
Thumb decode in the front. There was fresh air in there. And I could slip
the decoder in so that we just decoded
16-bit instructions to 32-bit instructions. And the rest of
the pipeline just thought it was being
fed 32-bit instructions. I fixed quite a
lot of other things that were wrong with
the architecture. I hid a lot of the ugliness. And I really thought
no one noticed. But in the latest version
of Patterson and Hennessy there's the statement
at the bottom. "In many ways, the
simplified Thumb architecture is more conventional than ARM." So someone actually noticed that
I did a bunch of cleaning up in there. And they least miss
the guy that originally gave me the job
just said last year, "Thumb was essential
to our success." That's his summary of it. 32-bit ARM sealed the deal,
getting to 2/3 of the code size took 10 years,
but they could see we were on a trajectory
to an asymptote. Nokia were driving
round Finland with a van full of equipment testing
cell phones at the time. They looked at Thumb,
realized how much it outperformed the competitors,
and were sold on it. And so Ericsson and Motorola
were the other big names in phones. Then they had to follow. And so we ended up selling
an ARM license to Motorola. So this was-- wow. We've actually sold a
license to the big guys. Texas Instruments loved it. They combined it with
a lot of their DSPs. The chip was called MAD-- microprocessor and DSP. And I think it's fair
to say it really rewrote the rulebook on what an embedded
processor should look like. MIPS followed fairly
quickly with MIPS 16. The latest RISC-V, if
you're familiar with it out of Berkeley
and Stanford, has the optional C extension-- a compressed 16-bit instruction set that you can bolt onto
it for embedded control. These are the two big patents. Notice that actually
MISC is not a bacronym. Right on the patent
title back then, multiple instruction sets. Multiple instruction
set mapping. So MISC isn't a bacronym really. Multiple Instruction
Set Computer. They were filed early 1994. I'm the inventor. No, I do not get all the money. Everyone asks that. That would be nice. But ARM is the assignee. The patent people
in the audience might like to read this
one at their leisure. We had some narrow patents
and some wide patents. ARM7TDMI, the processor
that came out of this, was never cloned successfully. The little guy, ARMs
2, 6, and 7 when they had less patent coverage,
or almost no patent coverage, were cloned a lot. I was flying a lot by this time. I was just selling this thing
and benchmarking this thing. We're still a pretty small
company-- maybe 40 people. And all the big names
getting into printers, getting into hard
drives, getting into headless terminals and
all this sort of stuff. Cars, of course. The printer and camera
guys really liked it. We had some weird customers. NKK Steel, who were just a big
steel company, took a license. I still don't know why. We had some on the eurofighter. That scared me. I didn't want to be anywhere
near a eurofighter, cause I knew how many bugs
we'd seen over the years. But anyway. There was one on
the eurofighter. And I accidentally
visited the NSA. They wanted me to put a
backdoor on the processor. I thought I was honestly
visiting the National Semiconductor of America. National Semiconductor
used to be a firm. And "of America"
used to be a thing you put on your
end of your title. My boss gave me a bollocking
for why I didn't return with any business cards. And later I worked
out what the NSA was. That became that
skipjack clipper program that came on much later. In parallel we had a big
project running at ARM-- ARM8 and the 810 were using up about half our resources to try and do a fast processor
for Acorn as best we could. It had a single, unified instruction and data cache because of the self-modifying
code problem. But we didn't put Thumb
and debug on that. And the floating point
was difficult too. But we did that to
their specification. But it used up an enormous
amount of resource. So that's what most of
the company were doing. So ARM7TDMI was
really successful. I was traveling a lot-- I'm starting to think
about how to go faster-- when Digital
Equipment Corporation, who were the third or fourth biggest computer company in the world then, came along and said, we'd like to do a
fast ARM for Apple. Now Digital had about four-- well, they had
exactly four that I know of-- reduced instruction
set programs going on at the company at that time. They had the Hudson RISC. They had Titan. They had Prism. And lately, Alpha. And Alpha was
originally called EV, because their programs kept
getting canned because VAX was everything at Digital. And if your program had nothing to do with VAX, when the cutbacks came,
they were just canned. So the Prism architecture was a
beautiful little architecture. But it got canned. So they started a new
architecture, which they called Extended VAX-- EV. And it didn't get
canned, even though it had nothing to do with VAX. It just had VAX in the title. And I really learned about that. I thought, well,
that's kind of-- hide that from the board. Later, by the way, the
marketing people got hold of it. And they called
it the Alpha AXP. And the joke among the engineers was, AXP stood for Almost
Exactly Prism. They blew the doors
off the industry, they were running
at 200 megahertz when everyone else was about 66. It actually turned out to be too late to save Digital. But probably the best
design team on the planet. Quite a lot of these
people are still active. I went to Texas for eight weeks
and wrote the ARM ARM-- the Arm Architecture Reference Manual. I just cleaned up the
whole architecture and said, don't do this. We promise not to halt and
catch fire if you do do this. We promise not to get
privileged if you do do this. Otherwise, don't do this. And I learned a whole
lot about how to design a chip from these guys. They were a very
friendly bunch of people. I didn't downplay Thumb. But I didn't talk it up either. I basically said, you
guys do the high end where you've got 32-bit memory
systems, 32-bit on-chip caches. We'll stay at the low end. And that could be
our differentiation. We all agreed that
was quite a good idea. So the StrongARM
processor came out. They basically cut
an Alpha in half. It was so fast that
Apple started rewriting their self-modifying code. But it was-- did I say Apple? Acorn started rewriting their
self-modifying code, which was the nail in our
mate's coffin, really at the ARM company. But nothing could
save Acorn by then. It was just too late for them. But I snuck back to Cambridge
having learned everything about how the
StrongARM was designed, and told ARM we
should do an ARM8E. And this was that lesson
about don't call anything new because it may well
get canned by the board if it's not in line with
the product roadmap. So I called it ARM8E, even
though it had absolutely nothing to do with ARM8. It was the StrongARM pipeline. A direct rip-off. I added Thumb and debug to it. And a tiny little design team,
again including Simon Segars. And it was launched
at ARM as ARM9TDMI. There's the pipe. It's starting to look a lot less
like an Anaconda full of goat. It's pretty streamlined,
that machine. Digital taught us
how to do that. Those two chips together are
still responsible for about 80% of ARM shipped today. So they've been
tremendously successful. That TOM32 machine did get built
as that Cortex M0 and Cortex M1. The little ARMs have no 32-bit
instruction set at all anymore. Then we decided to do-- it was silly for ARM and Digital
to be designing chips separately. And we particularly--
I particularly-- wanted to do floating
point properly. And they had a lot of
floating point experience. So we decided to do a joint
design center in Austin, Texas. We employed just about
everybody in England that could spell microprocessor
backward, let alone design one. So we really needed to
tap into another market. America was a lot more
expensive for salaries than Cambridge was. But we had to bite that bullet. So we did this design
center in Austin, Texas. So I went to Austin
in late 1996. My oldest daughter
Catherine was born. She's here today too. She's just finished
her EE degree. So that's what I've been
doing in between by the way, is raising my children. But this program ran into some
huge unforeseen problems-- unavoidable problems. First of all, I
noticed with ARM9 we didn't have a
great debug strategy. And they were booting
operating systems. We had Windows CE, the Symbian operating system, and Linux all running
on ARM at the time. And our first silicon ran
about 10,000 instructions and fell over. And so we spun the silicon. Got the silicon back. It ran about 10,000 more
instructions and fell over. And they did this four
times from memory. And this is a very expensive
long loop to be going around. Digital had enough
performance that they were booting the operating
system on the netlist. They had enough compute
in the Digital company that they were getting about
100 instructions per second. And they were booting Unix
up to the command prompt. I really thought that was
cool, and really wanted to exploit that somehow. The next thing that happened
was that Digital sued Intel. And Intel looked at the
price of the lawsuit and the price of Digital and
went, let's just buy Digital. No one at the Digital
design center in Austin wanted to work for
Intel, so they all quit. And they didn't want
to work for ARM. They wanted to do
their own startup. And then we were using Compass
design tools at the time. They were bought by Avanti. And I believe overnight the
licenses just stopped working. So we had no design flow. So obviously big problems
are a big opportunity. I went back to the emergency
board meeting in Cambridge. Do we stop this now and fire the
four or five people we've hired and apologize profusely? Or do we change gear and
do our own design center? That's what we decided to do. They gave me headcount for 50. I only ever used about 20. But I didn't get to spend
much more time at home with my family. So we had a new chip, a new
team, new tools, new flow, new country. So obviously I had to get
infrastructure, buildings, admin, the time
zones are a pain. There was a hiring frenzy. I borrowed the support
people from elsewhere. I just didn't have time
to put all that together. But this was really
a startup in Austin. And I didn't want to back
off on the deliverable. I wanted ARM10 to be about
twice as fast as ARM9. By the time you
add floating point to that and new support
for operating systems, it ended up being about four
or five times more complex than ARM9. I was really worried
about the ARM9 long loop around booting code. I really wanted to find
some way of getting much better validation
in these silicon chips. We didn't do super
scalar on ARM10. But we set up for the next
chip to be super scalar. And that group in
Austin went on to do the start of all those
Cortex A series that were all in the phones. It was another group and Sophia
in France, they had it also. They ping-ponged back
between the two designs. But the ARM10 was a decent chip. It had an 8-stage pipe. It ran fast. It did indeed become
twice as fast as the ARM9. We fixed up everything we could
that we knew about in ARM9. And so ARM10 was
very successful. We did brand new floating
point from the ground up with little
short vectors in it. That architecture is still
in use today in ARMV8. So that architecture is
23 years old already. So that probably says
it was half decent. That floating point also got
back-ported to ARM9 and ARM7. So it really was a
broad architecture. We put in some proper
software hooks. By this time people
were actually debugging code that
was running on the arm. So we'd sort of
come full circle. People were making
something powerful enough to actually develop
on the machine. We completely reworked our
validation methodology. We started including random
instruction generation to just throw random
instructions at the core to see if we could make it blow up.
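A minimal sketch in C of the random-instruction idea-- hypothetical, since the real flow drove a full CPU design rather than a toy: generate a stream of random 32-bit words, run each one through both a golden reference model and the design under test, and stop at the first state mismatch. The two step functions here are trivial stand-ins.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { uint32_t regs[16]; } cpu_state;

    /* Stand-ins for the two models being compared; in the real flow these
       were the instruction set simulator and the transistor-level design. */
    static void reference_step(cpu_state *s, uint32_t instr) { s->regs[instr & 0xF] ^= instr; }
    static void design_step(cpu_state *s, uint32_t instr)    { s->regs[instr & 0xF] ^= instr; }

    static int states_differ(const cpu_state *a, const cpu_state *b)
    {
        for (int i = 0; i < 16; i++)
            if (a->regs[i] != b->regs[i])
                return 1;
        return 0;
    }

    int main(void)
    {
        cpu_state ref = { { 0 } }, dut = { { 0 } };
        srand(1);
        for (long i = 0; i < 1000000; i++) {
            uint32_t instr = ((uint32_t)rand() << 16) ^ (uint32_t)rand();
            reference_step(&ref, instr);           /* golden model      */
            design_step(&dut, instr);              /* design under test */
            if (states_differ(&ref, &dut)) {       /* first divergence  */
                printf("mismatch after %ld instructions: %#010x\n", i, instr);
                return 1;
            }
        }
        printf("no mismatches found\n");
        return 0;
    }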
I had a small brainwave about that code that I wrote way
back for my thesis. I brought it up to
ARM10 specification. And I made that code record
an instruction trace. When we booted an
operating system, I saved every instruction
that got pulled into the core. And I recorded every data
transfer to and from the core. And I played that back
to the transistor model. And wherever they
were different, we sat around the
table and worked out whether it was
their fault or mine. It was about 50/50. But we got it to
the point where we could make this instruction
trace of the three operating systems booting. And we could run that on
a simple Sun workstation. And as soon as we
saw a problem, we could stop and go and look,
really pinpointed where the problems were. And of course, we'd fix
the transistors manually. Run a regression test up
to that point to make sure we hadn't broken
anything with the fix. And then kept on booting. So we were able to boot whole
operating systems that way. And we ended up finding
every bug in ARM10 that way. Worked beautifully, along
with the random instruction generation. ARM10200 was very successful. And as I said, it was the start
of that Austin design center. In the year 2000 we had
that silicon back again. We knew what we were
going to do for Rev 1. There's always a few tweaks. You don't get it perfect
when you type it out. You get something
that's very close, and then make some
silicon, bring it up. The Austin office
was about 45 people by then, a pretty
experienced team. And they've gone on to be a
wonderful set of CPU designers. I bailed. I was from New
Zealand, remember. So I bailed back to
New Zealand in early June 2000 just after
nine years at ARM. Technically I was on sabbatical. And I've been ever since working
on much more powerful CPUs. I spent the next few years
actually wading through patents because there was a
lawsuit over ARM7TDMI. But I was pretty happy
with what I've achieved. I did work hard, but had fun. We got a few things wrong. We backed-- we backed-- the people we backed
were mostly wrong. And the people we didn't
back were mostly right. So we really got it-- we were one inverter away
from success, one NOT gate away from success. If we had backed any of
the other games consoles, we probably would
have been fine. If we had backed
the Palm Pilot, we probably would have
been a little better off. We didn't see Nokia coming. I personally did not see
cell phones coming at all. I looked at the possibility of
the cell phone infrastructure and thought, wow. They're really going
to dig up every road and put aerials on
top of buildings. And this just
seemed so unlikely. But I actually thought
the Iridium cell ph-- the satellite stuff was
going to work better. I often look back at my life. And I don't if you know the
movie "Slumdog Millionaire." it's quite well-known. It's the Indian fellow
who's had a hard life. But just through serendipity
he just happens to know-- he only asks the questions he
knows the answers to somehow. He doesn't know much. But he knows the answers to
the questions he's asked. And I always feel
that my career has been a little bit like that. If any of the things on
the bottom were missing, I just don't think much of
this would have come together. Certainly Lee-- ARM was
25 years old in 2015. And Lee wrote me
this lovely email saying that he had been asked
as one of the four founders that were still at
ARM what their most significant milestone was. And he said it was hiring
me on the telephone. I love the quote. He said, starting with
memorable moments, starting with returning to the
Barn, the beautiful old Barn at quarter to 9 to phone
you in New Zealand. Robin Saxby, [INAUDIBLE]
was just leaving the pub and offered to buy me a pint-- of beer, obviously. If I'd ever accepted and
missed the interview, history might have
been very different. It ended with picking me up, taking me to Mike Muller's place for a shower. I'd been on an
airplane for 24 hours. I'm glad he did that. And then into the
Barn and out for lunch to a curry house in Bottisham--
another cute wee town. I'll never forget your comment
when your food arrived. "Gee, Mom. I flew halfway around the world
to eat lamb and potatoes." Great time with great people. But yeah, a lot of
serendipity in there. And that's-- Robin's 70 and
I'm 50 in that photograph. A couple of years ago. We still get together
for our birthdays. So with a few
minutes to go, I've got time for any questions. Sorry if that was
a little rushed. But it's hard to pack nine
years into 45 minutes. RAYMOND: You did an amazing job. You can go to the
microphones for questions. AUDIENCE: So what
would you recommend for somebody who is interested
in learning about CPU design and implementation nowadays,
even as just a hobby? Or even just any
silicon chip in general. DAVE JAGGAR: Is there
any ARM snipers? No. I would Google RISC-V and
find out all about it. They've done a fine
instruction set, a fine job. And they're explaining it. This is Berkeley and
Stanford are behind this. There are obviously
commercial companies like [INAUDIBLE] doing things. But it's the state of the art
now for 32-bit general purpose instruction sets. And it's got the 16-bit
compressed stuff. So you're learning about
that, learning from the best. Still. AUDIENCE: All right. Thank you. AUDIENCE: Hello. So it seems like if you're
programming in the '80s, you would know a lot more
things kind of down below, like, the lower levels. And now things are so
complicated that if someone's coming out of
school, they're not going to be able to really
understand everything that's going on below them. So do you think that's
sort of making it harder for us to have a full view? Or maybe that's just the
way that things are now. And we're just going
to have to accept that? What do you think about that? DAVE JAGGAR: I certainly
agree with you. There's so much going on. I mean, I'm still very active. What was I doing the other day? I have a-- first of all, let's
talk about the Raspberry Pi. That whole program was trying
to address exactly what you're talking about. It's giving something
simple enough where you can look at a
software stack top to bottom. Well, that's still complicated. Even I look at the
boot process and go, man, this is hard work
to keep in your head. So there's a lot going on. I absolutely agree with you. I personally hate programming
languages like Python, because I look at inserting
something into the list. And just know how many bazillion
instructions are going on to support that piece of code. I just can't quite get my head
around doing all that stuff. I know it's productivity. I think the best
we can probably do is things like a Raspberry Pi. I was recently looking at the
hostapd code, cause it didn't work at 5 gigahertz. And you can burrow down into
that a bit and learn a lot. I think, to come back to
that statement about fog too. When I started out
I remember being-- I think scared is
the right word. When you're in that fog
and you know nothing, and you really feel like
you're a dumb idiot. And you go, other people
understand this, but I don't. I've sort of embraced
that over time and gone, I know tomorrow I will
know more than I do today. I always feel like I'm
kind of groping around in a dark room trying
to find the furniture. But I think that's
also the thing is not to be afraid
of that situation and know that you're
a bright person if you're in this
room, let's face it. Other people
understand this stuff. But not to be afraid to grope
around in the dark like that and just try and get one
more piece of information than you got yesterday. And then slowly start-- stuff comes together, and
you can build on that. But yeah. It's complicated now. I mean, look at-- well, I don't want to say
Android on top of Linux on top of ARM. But man, there's a stack. There's a stack
of code in there. I mean, I've hacked around
in that quite a lot. And it takes a lot of
understanding, even with my background. So yeah, it is. It's complicated. It's hard. Maybe there will be-- with machine learning--
maybe there'll be another big revolution. I'm pretty sure it's coming. Where we really look at
what an algorithm is now in the modern world,
and reinvent hardware to support that top down. So I really think that's coming. I've got a pretty good idea
of how that will shake down, I think. But yeah. Yeah. AUDIENCE: How do you
think it will shake down? DAVE JAGGAR: Pardon? AUDIENCE: How do you
think it will shake down? DAVE JAGGAR: If-- and this is a
really interesting experiment. I think everyone should
do this at some point. Open a messenger
session to a friend, have them use a different
service provider to you, send them-- hit the 1 key. And run everything in
between on a simulator. And then just watch how
much data gets sucked in and sucked out to send
the one key through all that networking,
all the fonts, all the graphics and everything. What's going on is-- and I think it's-- there's this wonderful
one analogy on YouTube. It's a comedian. And he says, the difference
between male brains and female brains-- and this
strikes a chord with me. He basically says, men's brains
put everything in little boxes, and the boxes mustn't touch. And female brains go
[BUZZING SOUNDS] all the time. And I really think
we have to design hardware that's much
closer to [BUZZING SOUNDS].. I think a lot of engineering has
got this data based in blocks. And we call them buffers. And we have these interfaces
where you call a piece of code, and that passes
back a nice buffer. And then that code must
never touch that code. And that code must-- and
they're all separated. I really think we're going
to end up with a machine where you put the
data on the top. And the data is going
to fall out the bottom. And it's working in a
much more integrated way. If you YouTube that comedian,
you'll sort of understand. I'm not telling the
story very well. But it really comes
across as I really think we have to be thinking
in a much more holistic way than generally engineers
have in the past. I think it's a
limited way that we think when we partition data. And it means that, of
course, think about-- let me give an easy example. Think about
inserting a character into the middle of a string. So this should be
kind of easy, right? If I'm a character of
a string, and you're the next character
of the string, and I want to put a
character in between us, I say, well, I'm going to just
not hold your hand anymore. You're going to hold his hand. And away we go. And it's easy. It's all local. And we understand
exactly what's going on. When you convert
that into a computer, you've got a 64-bit address. I've got a 64-bit address. I might be 0. You might be 1. But I'm stored on
a 64-bit number. I have absolutely no idea
of what the locality of you is in the program related to me. But if you build
on this hardware, and it's easy to do
when you think about it, if I want you to move
along the array more, I just pull on a wee
line to you that says, increment your index,
and slot that new guy in. That's really easy to do. If I want to delete
you from the queue, I just say, remove yourself. And you say, everyone north
of you, decrement your index. And everything will
sort of close up.
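A tiny C sketch of that hand-holding picture: in a linked list, inserting a character only touches the two neighbours-- the local, "pull on a wee line" behaviour being described-- whereas the flat 64-bit-address view of the same data is what loses that locality.

    #include <stdio.h>
    #include <stdlib.h>

    /* Each character only knows whose hand it is holding. */
    struct ch {
        char c;
        struct ch *next;
    };

    /* Insert a new character after 'me': only 'me' and the newcomer change;
       nothing else in the string needs to know it happened. */
    static void insert_after(struct ch *me, char c)
    {
        struct ch *n = malloc(sizeof *n);
        n->c = c;
        n->next = me->next;    /* the newcomer holds the hand I used to hold */
        me->next = n;          /* and I hold the newcomer's hand             */
    }

    int main(void)
    {
        struct ch a = { 'a', NULL }, c = { 'c', NULL };
        a.next = &c;
        insert_after(&a, 'b');
        for (struct ch *p = &a; p; p = p->next)
            putchar(p->c);
        putchar('\n');         /* prints: abc */
        return 0;
    }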
You get all that in a program. It's called the directed flow graph. That's all there. And we sort of throw that away
with the stack of software we put on top. That's the crazy thing. All the layers that we've put
in between, all the differences between hardware and
software and assembler and linkages and
operating systems. With layers and
layers and layers. And you actually lose the
meaning of the program. And the hardware then works
very hard to try and put that meaning back together. Anyway. Sorry that was a long
answer to a simple question. RAYMOND: That's a great answer. One thing, I want to
say that I found-- I was really glad
to hear you say was, you'll be smarter tomorrow. DAVE JAGGAR: Yeah. RAYMOND: One of the things
I always tell myself, gets me through
every day is, you know those smart gals and guys? They're just meat
and bone like you. DAVE JAGGAR: They are. Yeah, yeah, yeah. RAYMOND: Thank you. AUDIENCE: Great story. Thank you. I was particularly struck
by one of the quotes on the slide where Robin
Saxby says that you will never manufacture chips. DAVE JAGGAR: Yeah. AUDIENCE: And I was
wondering if you could talk more about those when
you decide not to take a path. I mean, was that a
courageous decision? DAVE JAGGAR: He is incredibly-- there's this buzz word-- global. He was always global. He said, we have this
partnership business model. We'll do this. And they do that. And we're not going to compete. And the very broad-- and it's nowhere
near this defined-- but the very broad thought
was, if we design the chip once and sell it three
times, we can afford to sell it for about
half or a third of what it would cost
them to develop it. They're getting a deal. We are getting a business. And there's just no need
for us to sell any product. Our product is just
going to be design. And it was a very successful
intellectual property company. I mean, as I said, it's
tiny compared to Google. But it really has
no real product. And so his foresight was
very strong about that. We challenged it a few times. We should make a few
embedded controllers to go on development cards. No. We should make some
SOCs as demonstrators. No. And we did do some
SOCs in the end. But we never bought
any fab space. Always done through partnership. And that clear distinction
was incredibly beneficial. And he was he was absolutely
rigid in that decision, and absolutely right
in that decision. AUDIENCE: It's slightly
different from say, what Qualcomm has done, for example. DAVE JAGGAR: That whole industry
is sort of on its-- on its-- AUDIENCE: Ear. DAVE JAGGAR: On its ear now. So now those fabs that
don't design anything. So yeah, absolutely. The TSMCs and the global
foundries of this world, you can just buy fab space. So there's this other product
family in it, knitted in very well with what
ARM-- you know, you've got a designer
of a chip, somebody that integrates the rest
of the IP and then fabs it. And so they're all quite
separate things now. But yeah. The TSMC and global
foundries of this world are almost exactly
the other way around. AUDIENCE: Right. DAVE JAGGAR: Yeah. Again, they don't compete. AUDIENCE: Thank you. DAVE JAGGAR: Yeah. RAYMOND: We're over. But I'm gonna say
one more question. I want to say host privilege. One more question. AUDIENCE: It's a
incredible talk. DAVE JAGGAR: Thanks. AUDIENCE: Very briefly,
Specter and Meltdown. DAVE JAGGAR: Yeah. AUDIENCE: So how much has
that changed your thinking? And do you feel like there's
a future for CPUs where they solve the problem in some way? Or there are more secure CPUs that have completely rigorous, predictable
performance, and others that can have
variable performance, but a risk of side channels. Thank you. DAVE JAGGAR: This
answer is in 1996 I wrote a patent that
said if you bring anything speculatively into the
chip, make sure you take it all the way back out again. I guess they lost that patent
down the back of the couch, right? AUDIENCE: They did. DAVE JAGGAR: I always
hassle them about that. Again, that's thinking
like a software guy. Bringing that stuff
in speculatively. You've got to take
it out again, guys. You can't leave it
in the processor. How to handle that
in the future. I think we're all basically
nice engineers that just don't expect people to
stab us in the back with a-- we're just-- and now we're
all a little less innocent and probably looking at how
can we break this thing. But we're always going to be
chasing our tails, you know. It's impossible to find
every single backdoor into the processor. We're always going to be
chasing our tails as far as trying to spot where some
sneaky little person might-- and so they should, by the way. You know, if they don't do it,
someone that's really nefarious will. But I don't know whether there's
a good solution in the-- my patent, while smart, was
a whole lot easier back then when the chips were
a whole lot simpler. But I think now the
side channel attacks. [INAUDIBLE] better known. Or we're better
able to handle them. Yeah. AUDIENCE: Thank you. DAVE JAGGAR: Yeah. RAYMOND: All right. With that, thank you
all, and thank you, Dave. [APPLAUSE]