[MUSIC PLAYING] JOHN HENNESSY: Boy, I'm
delighted to be here today and have a chance
to talk to you about what is one of the biggest challenges
we've faced in computing in 40 years, but also a
tremendous opportunity to rethink how we
build computers and how we move forward. You know, there's been
a lot of discussion about the ending of Moore's law. The first thing to remember
about the ending of Moore's law is something Gordon
Moore said to me. He said, all exponentials
come to an end. It's just a question of when. And that's what's
happening with Moore's law. What does it really mean to say Moore's law is ending? Well, look at what's
happening in DRAMs. That's probably a
good place to start because we all depend on the
incredible growth in memory capacity. And if you look
at what's happened in DRAMs, for many
years we were achieving increases of about 50% a year. In other words, going
up slightly faster even than Moore's law. Then we began a
period of slowdown. And if you look what's happened
in the last seven years, this technology we were
used to seeing boom, the number of megabits per chip
more than doubling every two years, is now going
up at about 10% a year and it's going to take
about seven years to double.
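As a quick back-of-the-envelope check on those rates (my arithmetic, not a figure from the talk), a steady annual growth rate r doubles capacity in log(2)/log(1+r) years:

```python
# Doubling time for a steady annual growth rate r is log(2) / log(1 + r).
import math

for label, rate in [("historical DRAM scaling", 0.50), ("recent DRAM scaling", 0.10)]:
    years = math.log(2) / math.log(1 + rate)
    print(f"{label}: {rate:.0%} per year -> doubles in about {years:.1f} years")
```

At 50% a year that is well under two years per doubling; at 10% a year it is indeed about seven.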
Now, DRAMs are a particularly odd technology because they use deep
trench capacitors, so they require a
very particular kind of fabrication technology. What's happening in
processors, though? And if you look at the
data in processors, you'll see a similar slowdown. Moore's law is that
red line going up there on a nice logarithmic plot. Notice the blue line. That's the number of
transistors on a typical Intel microprocessor at that date. It begins diverging,
slowly at first. But look at what's happened in the last 10 years, roughly. The gap has grown. In fact, if you look at
where we are in 2015, 2016, we're more than a factor
of 10 off, had we stayed on that Moore's law curve. Now, the thing to remember
is that there's also a cost factor in here. Fabs are getting a lot more
expensive and the cost of chips is actually not
going down as fast. So as a result of that, the cost per transistor is actually getting worse. So we're beginning to
see the effects of that as we think about architecture. But while the slowdown of Moore's law is the thing you see all the press about, the bigger issue is the end of what we call Dennard scaling. So Bob Dennard was
an IBM employee; he was the guy who invented the one-transistor DRAM. And he made a prediction
many years ago that the energy, the power
per square millimeter of silicon would stay
constant, because voltage levels
would come down, capacitance would come down. What does that mean? If the
power stays constant and the number of transistors
increases exponentially, then the energy per transistor
is actually going down. And in terms of
energy consumption, it's cheaper and cheaper
and cheaper to compute.
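One rough way to see that arithmetic (a sketch with illustrative, normalized numbers, not from the talk): dynamic power per transistor goes roughly as C·V²·f, and under ideal Dennard scaling each generation shrinks linear dimensions by a factor k, cutting C and V by 1/k while frequency rises by k.

```python
# Idealized Dennard scaling, one process generation per loop iteration (illustrative only).
# Power per transistor ~ C * V^2 * f falls as 1/k^2, while transistor density rises as k^2,
# so power per square millimeter stays constant.
k = 1.4  # roughly a 0.7x linear shrink per generation
C, V, f, density = 1.0, 1.0, 1.0, 1.0  # normalized starting values
for gen in range(5):
    per_transistor = C * V**2 * f
    per_mm2 = per_transistor * density
    print(f"gen {gen}: power/transistor = {per_transistor:.2f}, power/mm^2 = {per_mm2:.2f}")
    C, V, f, density = C / k, V / k, f * k, density * k**2
```

The power-per-square-millimeter column stays at 1.00 while the per-transistor column keeps falling; that constant column is exactly what broke.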
Well, what happened with Dennard scaling? Look at that
blue line there. The red line shows
you the technology improving on a standard
Moore's law curve. The blue line shows you
what's happening to power. And you all know. I mean, you've seen
microprocessors now, right? They slow their clock
down, they turn off cores, they do all kinds of things,
because otherwise they're going to burn up. They're going to burn up. I mean, I never thought we'd see
the day where a processor would actually slow itself down to
prevent itself from overheating, but we're there. And so what happens
with Dennard scaling is it began to slow
down starting about '97. And then since 2007,
it's essentially halted. The result is a big change. All of a sudden, energy,
power becomes the key limiter. Not the number of transistors
available to designers, but their power consumption
becomes the key limiter. That requires you to think
completely differently about architecture, about
how you design machines. It means inefficiency in the
use of transistors in computing. Inefficiency in how an
architecture computes is penalized much
more heavily than it was in this earlier time. And of course, guess what? All the devices we carry
around, all the devices we use are running off batteries. So all of a sudden, energy is
a critical resource, right? What's the worst thing that can happen? Your smartphone runs out of power. That's a disaster, right? But think about all the
devices we walk around with. They're all running off batteries. Think about the coming era of IoT, where we're going to have devices that are always on,
which are expected to last 10 years
on a single battery by using energy
harvesting techniques. Energy becomes the key resource
in making those things work efficiently. And as we move more and
more to always on devices with things like
Google Assistant, you're going to want your
device on all the time or at least you're going to want
the CPU on all the time, if not the screen. So we're going to have to worry
more and more about power. But the thing that surprises many people is that energy efficiency is a giant issue in large cloud configurations. This shows you what the
typical capital cost would be like for a Google data center. You'll notice that green slice
there, those are the servers. But look at the size
of that red slice. That red slice is
the cost of the power plus cooling infrastructure. We're spending as much on power and cooling as we're spending on the processors. So energy efficiency becomes
a really critical issue as we go forward. And the end of
Dennard scaling has meant that there's
no more free lunch. For a lot of years,
we had a free lunch. It was pretty easy
to figure out how to make computation
more energy efficient. Now, it's a lot harder. And you can see
the impact of this. This just shows you 40 years of
processor performance, what's happened to uniprocessor, single
processor performance, and then multiprocessor performance. So there were the early years
of computing, the beginning of the microprocessor era. We were seeing about 22%
improvement per year. Then came the creation of RISC in the mid-1980s, with a dramatic use of instruction-level parallelism, pipelining, and multiple issue. We saw this incredible
period of about 20 years, where we got roughly
50% performance improvement per year. 50%. That was amazing. Then the beginning of the
end of Dennard scaling. That caused everybody
to move to multi-core. What did multi-core do? Multi-core shoved the
efficiency problem from the hardware designer
to the software people. Now, the software
people had to figure out how to use those multi-core
processors efficiently. But Amdahl's law came
along, reared its ugly head. I'll show you some data on that. And now, we're in
this late stage period where it looks like we're
getting about 3% performance improvement per year. Doubling could take 20 years. That's the end of
general purpose processor performance
growth as we've known it for so many years. Why did this happen? Why did it grind
to a halt so fast? Well, think about what was
happening during that RISC era, where we were building these deeply pipelined machines: pipelines 15, 16, 17 stages deep, issuing four instructions per clock. That machine needs to
have 60 instructions that it's working on at once. 60 instructions. How does it possibly
get 60 instructions? It uses speculation. It guesses about branches,
it yanks instructions and tries to execute them. But guess what happens? Nobody can predict
branches perfectly. Every time you predict
a branch incorrectly, you have to undo all the work
associated with that missed prediction. You've got to back
it out, you've got to restore the
state of the machine. And if you look inside
a typical Intel Core i7 today, on integer
code roughly 25% of the instructions
that get executed end up being thrown away. Guess what? The energy still got burnt to
execute all those instructions. And then, I threw
the results away and I had to restore the
state of the machine. A lot of wasted energy.
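A crude model shows where a number like 25% comes from (my own sketch with assumed branch statistics, not measured data): count how many useful instructions you fetch between mispredictions, and how large an in-flight window gets squashed each time.

```python
# Rough model of speculative waste (assumed numbers, for intuition only).
branch_fraction = 0.20       # assume roughly 1 in 5 instructions is a branch
predictor_accuracy = 0.97    # assume a good modern branch predictor
in_flight_window = 60        # ~4-wide issue x ~15-stage pipeline, as in the talk

useful_between_flushes = 1 / (branch_fraction * (1 - predictor_accuracy))
wasted = in_flight_window / (useful_between_flushes + in_flight_window)
print(f"instructions squashed: ~{wasted:.0%}")   # comes out to roughly a quarter
```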
That's why the single processor performance curve ended, basically. But we see similar
challenges when you begin to look at multi-core things. Take Amdahl's law: Gene Amdahl formulated it more than 40 years ago. It's still true today. Even if you take
large data centers with heavily parallel
workloads, it's very hard to write a big
complicated piece of software and not have small sections
of it be sequential, whether it's synchronization or
coordination or something else. So think about what happens. You've got a 64 processor
multi-core in the future. Suppose 1%, just 1% of
the code is sequential. Then that 64 processor
multi-core only runs at the speed of
a 40-processor machine. But guess what? You paid all the energy
for a 64 processor core executing all the
time and you only got 40 processors out of
that, slightly more than half. That's the problem.
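Worked out, Amdahl's law makes that arithmetic concrete for the 64-core, 1%-sequential example:

```python
# Amdahl's law for the example in the talk: 64 cores, 1% of the code sequential.
def amdahl_speedup(cores, sequential_fraction):
    return 1.0 / (sequential_fraction + (1.0 - sequential_fraction) / cores)

speedup = amdahl_speedup(64, 0.01)
print(f"speedup on 64 cores: {speedup:.1f}x")               # ~39x, about a 40-processor machine
print(f"fraction of the machine used: {speedup / 64:.0%}")  # ~61%, slightly more than half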
We've got to break through this efficiency barrier. We've got to rethink
how we design machines. So what's left? Well, software-centric
approaches. Can we make our
systems more efficient? It's great that we have
these modern scripting languages: they're interpreted, dynamically typed, and they encourage reuse. They've really
liberated programmers to get a lot more code
written and create incredible functionality. They're efficient
for programmers. They're very inefficient
for execution, and I'll show you
that in a second. And then there are
hardware-centric approaches, what Dave Patterson and I call
domain-specific architectures. Namely, designing
an architecture which isn't fully
general purpose, but which does a set of
domains, a set of applications really well, much
more efficiently. So let's take a look at
what the opportunity is. This is a chart that comes out
of a paper by Charles Leiserson and a group of colleagues
at MIT, called "There's Plenty of Room at the Top." They take a very simple example,
admittedly, matrix multiply. They write it in Python. They run it on an 18
core Intel processor. And then they proceed
to optimize it. First, rewrite it in C.
That speeds it up 47 times. Now, any compiler in the world
that can get a speed up of 47 would be really remarkable,
even a speed up of 20. Then they rewrite it
with parallel loops. They get almost a factor
of nine out of that. Then they rewrite it with memory optimizations: they block the matrix and allocate it to the caches properly. That gives them a factor of 20. And then finally,
they rewrite it using Intel AVX instructions,
using the vector instructions in the Intel Core, right,
domain-specific instructions that do vector
operations efficiently. That gives them
another factor of 10. The end result is
that final version runs 62,000 times faster
than the initial version.
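You can feel the bottom rungs of that ladder yourself with a toy comparison (a sketch, not the MIT code; numpy's BLAS-backed matmul stands in for the hand-tuned versions):

```python
# Naive interpreted matrix multiply vs. a tuned BLAS call (illustrative sketch only).
import time
import numpy as np

n = 256
A, B = np.random.rand(n, n), np.random.rand(n, n)

def naive_matmul(X, Y):
    size = len(X)
    Z = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            acc = 0.0
            for k in range(size):
                acc += X[i][k] * Y[k][j]
            Z[i][j] = acc
    return Z

start = time.time(); naive_matmul(A.tolist(), B.tolist()); naive_s = time.time() - start
start = time.time(); A @ B; blas_s = time.time() - start
print(f"naive: {naive_s:.2f}s  BLAS: {blas_s:.4f}s  ratio: ~{naive_s / max(blas_s, 1e-9):.0f}x")
```

Even this crude test usually shows a gap of a few hundred times on one core, and the full paper's ladder of optimizations compounds it much further.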
Now admittedly, matrix multiply is an easy case, a small piece of code. But it shows the
potential of rethinking how we write this software
and making it better. So what about these
domain-specific architectures? Really what we're
going to try to do is make a breakthrough
in how efficient we build the hardware. And by domain-specific,
we're referring to a class of processors which
do a range of applications. They're not like, for
example, the modem inside the cell phone, right? That's programmed once,
it runs modem code. It never does anything else. But think of a set
of processors which do a range of
applications that are related to a particular
application domain. They're programmable, they're
useful in that domain, they take advantage
of specific knowledge about that domain
when they run, so they can run much more efficiently. One obvious example: neural network processors, things that focus on machine learning. GPUs are another example of
this kind of thinking, right? They're programmable
in the context of doing graphics processing. So for any of you who have
ever seen any of the books that Dave Patterson
and I wrote, you know that we like quantitative
approaches to understand things and we like to analyze
why things work. So the key about
domain-specific architectures is there is no black magic here. Going to a more limited
range of architectures doesn't automatically
make things faster. We have to make specific
architectural changes that win. And there are three big ones. The first is we make more
effective use of parallelism. We go from a multiple
instruction, multiple data world that you'd see
on a multi-core today to a single instruction
multiple data. So instead of having
each one of my cores fetch separate
instruction streams, have to have
separate caches, I've got one set of
instructions and they're going to a whole set
of functional units. It's much more efficient. What do I give up? I give up some flexibility
when I do that. I absolutely give
up flexibility. But the efficiency
gain is dramatic. I go from speculative
out-of-order machines, what a typical high-end
processor from ARM or Intel looks like today,
to something that's more like a VLIW, which uses sets of operations that the compiler has decided can occur in parallel. So I shift work from
runtime to compile time. Again, it's less flexible. But for applications
when it works, it's much more efficient. I move away from caches. So caches are one of
the great inventions of computer science, one of
the truly great inventions. The problem is when there
is low spatial and low temporal locality, caches
not only don't work, they actually slow
programs down. They slow them down. So we move away from that to
user-controlled local memories. What's the trade-off? Now, somebody has
to figure out how to map their application
into a user-controlled memory structure. A cache does it automatically for
you, it's very general purpose. But for certain applications,
I can do a lot better by mapping those things myself. And then finally, I focus on
only the amount of accuracy I need. I move from IEEE floating point to lower-precision floating point, or from 32- and 64-bit integers
to 8-bit and 16-bit integers. If that's all the
accuracy I need, I can do eight
integer operations, eight 8-bit operations in
the same amount of time that I can do one
64-bit operation. So it's considerably faster.
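A minimal sketch of what trading precision for throughput looks like (my illustration; this simple scale-and-round scheme is not any particular accelerator's format):

```python
# Quantize float32 values to int8 with a single scale factor (illustrative only).
import numpy as np

weights = np.random.randn(16).astype(np.float32)
scale = np.abs(weights).max() / 127.0                        # map observed range onto [-127, 127]
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
restored = q.astype(np.float32) * scale

print("max quantization error:", float(np.abs(weights - restored).max()))
print("8-bit values per 64-bit word:", 64 // 8)              # eight narrow operands per wide one
```

If the application tolerates that small error, the hardware gets to do eight narrow operations in the footprint of one wide one.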
But to go along with that, I also need a domain-specific language. I need a language
that will match up to that hardware configuration. We're not going to be able to
take code written in Python or C, for example, and extract
the kind of information we need to map to a
domain-specific architecture. We've got to rethink how
we program these machines. And that's going to be
high-level operations. It's going to be
vector-vector multiply or a vector-matrix multiply or
a sparse matrix organization, so that I get that high-level
information that I need and I can compile it down
into the architecture.
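As a sketch of what "expressing the high-level operation" means, here is a tiny TensorFlow example; the point is that the program states a matrix multiply and an activation, and the framework's compiler decides how to map that graph onto a CPU, a GPU, or a TPU:

```python
# Express the work as high-level tensor operations rather than explicit loops,
# so the framework can compile the same graph to different hardware.
import tensorflow as tf

x = tf.random.normal([1, 1024])      # a batch of activations
W = tf.random.normal([1024, 256])    # a weight matrix

@tf.function                          # traced into a graph of matmul/relu ops
def dense_layer(inputs):
    return tf.nn.relu(tf.matmul(inputs, W))

print(dense_layer(x).shape)           # (1, 256)
```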
The key in doing these domain-specific languages will be to retain enough machine
independence that I don't have to recode things, that a
compiler can come along, take a domain-specific
language, map it to maybe one architecture
that's running in the cloud, maybe another
architecture that's running on my smartphone. That's going to
be the challenge. Ideas like TensorFlow and OpenGL
are a step in this direction, but it's really a new space. We're just beginning to
understand it and understand how to design in this space. You know, I built my first
computer almost 50 years ago, believe it or not. I've seen a lot of revolutions
in this incredible IT industry since then-- the creation of the internet,
the creation of the World Wide Web, the magic of
the microprocessor, smartphones, personal computers. But the one I think
that is really going to change our
lives is the breakthrough in machine learning and
artificial intelligence. This is a technology
which people have worked on for 50 years. And finally, finally, we
made the breakthrough. And the basis of
that breakthrough? We needed about a million
times more computational power than we thought we needed
to make the technology work. But we finally got
to the point where we could apply that
kind of computer power. And the one thing-- this is some
data that Jeff Dean and David Patterson and Cliff
Young collected-- that shows there's one
thing growing just as fast as Moore's law-- the number of papers being
published in machine learning. It is a revolution. It's going to change our world. And I'm sure some of you saw
the Duplex demo the other day. I mean, in the domain
of making appointments, it passes the Turing
test in that domain, which is an extraordinary
breakthrough. It doesn't pass it
in general terms, but it passes it in
a limited domain. And that's really an
indication of what's coming. So how do you think
about building a domain-specific architecture
to do deep neural networks? Well, this is a picture
of what's inside a tensor processing unit. The point I want
to make about this is that if you look at what
uses up the silicon area, notice that it's not used
for a lot of control, it's not used for
a lot of caching. It's used to do things
that are directly relevant to the computation. So this processor
can do a 256 by 256 array of multiply-accumulates, that is 65,536 (64K) 8-bit multiply-accumulates, every single clock. Every single clock. So for inference workloads, it can really crunch through enormous amounts of computation.
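Putting a rough number on that (my arithmetic; the clock rate here is an assumption for the sketch, not a quoted spec):

```python
# Back-of-the-envelope throughput of a 256 x 256 multiply-accumulate array.
array_dim = 256
macs_per_cycle = array_dim * array_dim            # 65,536 8-bit MACs each clock
clock_hz = 700e6                                  # assumed clock rate for the sketch
tera_ops = macs_per_cycle * clock_hz * 2 / 1e12   # count multiply + add as two ops
print(f"{macs_per_cycle} MACs/cycle -> roughly {tera_ops:.0f} tera-ops per second")
```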
You're not going to run general-purpose C code on this. You're going to run something
that's a neural network inference problem. And if you look
at the performance and you look at-- here we've
shown performance per watt. Again, energy being
the key limitation. Whether it's for your
cell phone and you're doing some kind of machine
learning on your cell phone or it's in the cloud, energy
is the key limitation. So what we plotted here is
the performance per watt. And you see that the first
generation tensor processing unit gets more than
30 times the performance per watt compared to a
general purpose processor. It even does considerably
better than a GPU, largely by switching
from floating point to lower density integer,
which is much faster. So again, this notion of
tailoring the architecture to the specific domain
becomes really crucial. So this is a new era. In some sense, it's
a return to the past. In the early days of computing,
as computers were just being developed,
we often had teams of people working together. We had people who were early
applications experts working with people who were doing
the beginning of the software environment-- building
the first compilers and the first
software environment-- and people doing
the architecture. And they're working
as a vertical team. That kind of
integration, where we get a design team
that understands how to go from application
to representation in some domain-specific
language to architecture and can think about how to
rebuild machines in new ways to get this, it's an
enormous opportunity and it's a new kind of challenge
for the industry to go forward. But I think there are enough
interesting application domains like this where we can
get incredible performance advantages by tailoring
our machines in a new way. And I think if we can do that,
maybe it will free up some time to worry about another small
problem, namely cybersecurity and whether or not the
hardware designers can finally help the software
designers to improve the security of our system. And that would be a great
problem to focus on. Thank you for your
attention and I'm happy to answer any
questions you might have. [APPLAUSE] Thanks. AUDIENCE: Can you talk
about some of the advances in quantum and
neuromorphic computing? JOHN HENNESSY: Yeah. So quantum-- that's a
really good question. So my view of this
is that we've got to build a bridge from where
we are today to post-silicon. The possibilities
for post-silicon, there are a couple. I mean there's organic,
there's quantum, there's carbon
nanofiber, there's a few different
possibilities out there. I characterize them as
technology of the future. The reason is the people working
on them are still physicists. They're not computer scientists
yet or electrical engineers, they're physicists. So they're still in the lab. On the other hand,
quantum, if it works, the computational power
from a reasonably modest sized qubit, let's say 128 corrected
qubits, meaning they're
accurate, that might take you 1,000 qubits to get
to that level of accuracy. But the computational
power for things that make sense, protein
folding, cryptography, of 128 qubits is phenomenal. So we could get an enormous
jump forward there. We need something post-silicon. We need something post-silicon. We've got maybe, as
Moore's law slows down, maybe another decade or so
before it comes to a real halt. And we've got to get an
alternative technology out there, because I think there's
lots of creative software to be written that wants
to run on faster machines. AUDIENCE: I just-- at the
end of your presentation, you briefly mentioned how we
could start using hardware to increase security. Would you mind
elaborating on that? JOHN HENNESSY: Sure. Sure. OK, so here's my
view with security. Everybody knows about
Meltdown and Spectre? First thing about
Meltdown and Spectre is to understand what happened: an attack that basically undermined the architecture in a way that we never anticipated. I worked on out-of-order
machines in the mid-1990s. That's how long that bug
has been in those machines, since the 1990s. And we didn't even realize it. We didn't even realize it. And the reason is that
basically what happens is our definition
of architecture was there is an instruction set. Programs run. I don't tell you
how fast they run, all I tell you is what
the right answer is. Side channel attacks that use
performance to leak information basically go around our
definition of architecture. So we need to rethink
architecture itself. You know, in the
1960s and 1970s, there was a lot of
thought about how to do a better job of protection. Rings and domains
and capabilities. They all got dropped. And they got dropped
because of two things. First of all, we
became convinced that people were going
to verify their software and it was always
going to be perfect. Well, the problem is that the
amount of software we write is far bigger than the amount
of software we ever verify, so that's not going to help. I think it's time for architects
to begin to think about how they can help software people build systems which are more secure. What's the right
architecture support to make more secure systems? How do we build those? How do we make sure they
get used effectively? And how do we together--
architects and software people working together-- create
a more secure environment? And I think it's going to
mean thinking back about some of those old ideas and bringing
them back in some cases. AUDIENCE: After I took my
processor architecture class, which used your book-- JOHN HENNESSY: I hope
it didn't hurt you. AUDIENCE: Hopefully not. I had a real appreciation
for the simplicity of a RISC system. It seems like we've
gone towards more complexity with domain-specific
languages and things. Is that just because
of performance or has your philosophy changed? What do you think? JOHN HENNESSY: No, I
actually think they're not necessarily more complicated. They have a narrower
range of applicability. But they're not more
complicated in the sense that they are a better match
for what the application is. And the key thing to understand
about RISC, the key insight, was that we weren't
targeting people writing assembly language anymore. That was the old way
of doing things, right? In the 1980s, the move was on. Unix was the first operating
system ever written in a high level
language, the first ever. The move was on from
assembly language to high level languages. And what you needed to target
was the compiler output. So it's the same thing here. You're targeting the output of
a domain-specific language that works well for a
range of domains. And you design the architecture
to match that environment. Make it as simple as
possible, but no simpler. AUDIENCE: With the
domain-specific architectures, do you have examples
of what might be the most promising areas
for future domain-specific architectures? JOHN HENNESSY: So I think the
most obvious one are things related to machine learning. I mean, they're computationally
extremely intensive, both training as
well as inference. So that's one big field. Virtual reality. Virtual reality and augmented
reality environments. If we really want to construct a
high-quality environment that's augmented reality, we're
going to need enormous amounts of computational power. But again, it's
well-structured kinds of computations that could match
to those kinds of applications. We're not going to
do everything with domain-specific architectures. They're going to give
us a lift on some of the more
computationally-intensive problems. We're still going to have to
advance and think about how to push forward general purpose,
because the general purpose machines are going to drive
these domain-specific machines. The domain-specific machine
will not do everything for us. So we're going to have
to figure out ways to go forward on
that front as well. AUDIENCE: Professor, what do
we think about some emerging memory technology? How will it impact the
future computer architecture? Thank you. JOHN HENNESSY: Yeah, that's
a really great question. So as we get to
the end of DRAMs, I think some of the more
innovative memory technologies are beginning to appear. So-called phase
change technologies, which have the advantage
that they can probably scale better than DRAM
and probably even better than Flash technologies. They have the advantage that
lifetimes are better, too, than Flash. The problem with
Flash is it wears out. Some of these phase change
memories or memristor technologies have the
ability to scale longer. And what you'll get is probably
not a replacement for DRAM. You'll probably get a
replacement for Flash and a replacement for disks. And I think that technology
is coming very fast. And it'll change the way we
think about memory hierarchies and I/O hierarchy, because
you'll have a device that's not quite as fast as
DRAM, but a lot faster than the other alternatives. And that will change the way
we want to build machines. AUDIENCE: As a person, you think
about education quite often. We all saw Zuckerberg having
a conversation with Congress. And I'm excited to
see children getting general education around
computing and coding, which is something
that a lot of us didn't have the
opportunity to have. Where do you see education, not
only for K-12, grad, post-grad, et cetera, but also
existing people in policy-making
decisions, et cetera? JOHN HENNESSY: Yeah. Well, I think first
of all, education has become a lifelong endeavor. Nobody has one job for
a lifetime anymore. They change what they're doing
and education becomes constant. I mean, you think
about the stuff you learned as an undergrad and
you think how much technology has already changed, right? So we have to do more there. I think we also
have to make more-- society needs to be
more technology-savvy. Computing is changing
every single part of the world we live in. To not have some understanding
of that technology, I think, limits your ability
to lead an organization, to make important decisions. So we're going to have to
educate our young people at the beginning. And we're going to have to
make an investment in education so that as people's careers
change over their lifetime, they can go back and
engage in education. Not necessarily going
back to college, it's going to have to
be online in some way. But it's going to
have to be engaging. It's going to have
to be something that really works well for people. AUDIENCE: Hi. Olly [INAUDIBLE] from BBC. Just wondered what your view is
on the amount of energy being used on Bitcoin mining
and other cryptocurrencies and that sort of thing. JOHN HENNESSY: Yeah. So I could build a special
purpose architecture to mine Bitcoins. That's another
obvious example of a domain-specific
architecture for sure. So I'm a long-term
believer in cryptocurrency as an important
part of our space. And what we're
going to have to do is figure out how
to make it work, how to make it work
efficiently, how to make it work seamlessly, how
to make it work inexpensively. I think those are all problems
that can be conquered. And I think you'll
see a bunch of people that have both the algorithmic
heft and the ability to rethink how we do that, and
really make cryptocurrencies go quite quick. And then we can also build
machines which accelerate that even further, so
that we can make-- a cryptocurrency transaction
should be faster than a cash transaction and certainly
no slower than a credit card transaction. We're not there yet. But we can get there. We can get there
with enough work. And I think that's where
we ought to be moving to. AUDIENCE: What do you think
the future operating system has to have to cope with this? JOHN HENNESSY: Yeah. The future of operating
system, you said, yes? Yeah. So I think operating
systems are really crucial. You know, way back
when in the 1980s, we thought we were going to
solve all our operating system problems by going to
kernel-based operating systems. And the kernel would be this
really small little thing that just did the core functions
of protection and memory management. And then, everything
else around it would be protected, basically. And what happened was
kernels started out really small and then they got
bigger and then they got bigger and then they got bigger. And all of a sudden, almost
the entire operating system was in the kernel, primarily to
make it performance-efficient. And the same thing
happened with hypervisors. They started really small
in the very beginning and then they got bigger. We're going to
have to figure out how we structure complex
operating systems so that they can deal with
the protection issues, they can deal with efficiency
issues, they can work well. We should be building
operating systems which, from the beginning,
realize that they're going to run on large
numbers of processors, and organize them in
such a way that they can do that efficiently. Because that's the future, we're
going to have to rely on that. AUDIENCE: In your
intro video, you mentioned this chasm between
concept and practice. And also in your
talk, you've mentioned that hardware is vital to
the future of computing. Given that most investors
are very hardware-averse, especially this day
and age, where do you expect that money to come from? Is that something that
will come from governments or private investing? How are we going to fund
the future of computing is really what my question is. JOHN HENNESSY: Yeah,
it's a good question. I mean, I think
the answer is both. You know, certainly Google's
making large investments in a lot of these technologies
from quantum to other things. I think government
remains a player. So government, you look at
how many of the innovations we're used to: the internet, RISC,
the rise of VLSI, modern computer-aided
design tools. All had funding basically
coming from the government at some point. So I think the government
should still remain a player in thinking about-- what's the one area the
government has probably funded longer than anybody else? Artificial intelligence. They funded it for 50
years before we really saw the breakthrough that came. Right? So they're big believers. They should be funding
things long-term. They should fund things that
are out over the horizon that we don't yet
really understand what their practical
implications may be. So I think we're going
to have to have that and we're going to have to have
industry playing a big role. And we're going to have
to make universities work well with industry, because
they complement one another, right? They do two different
kinds of things but they're complementary. And if we can get
them to work well, then we can have the
best of both worlds. AUDIENCE: You
talked a little bit about the difference
between the memory hierarchy and storage that is coming
up with these new memory technologies. Have you seen any
applications where the compute and the
storage get combined, kind of more like the brain? JOHN HENNESSY: Yeah, I
think increasingly we'll see things move
towards that direction where the software takes care
of the difference between what is in storage and-- "storage,"
quote unquote, right, because it may actually be Flash
or some kind of next generation memory technology--
and what's in DRAM. What you need to tell
me is what's volatile and when do I have to ensure
that a particular operation is committed to
nonvolatile storage. But if you know that, we've
got log-based file systems, you've got other ideas which move in the direction of trying to take advantage of a greatly different memory hierarchy, a greatly
different storage hierarchy than we're used to. And we may want to continue
to move in that direction, particularly when you
begin to think about-- if you think about things
like networking or I/O and they become major
bottlenecks in applications, which they often
do, then rethinking how we could do
those efficiently and optimize the hardware,
but also the software. Because the minute you stick
an operating system transaction in there, you've
added a lot of weight to what it costs to get
to that storage facility. So if we can make that
work better and make it more transparent without
giving up protection, without giving up a guarantee
that once something is written to a certain storage unit
it's permanently recorded, then I think we can make
much faster systems. AUDIENCE: So do you
see the implementation of a domain-specific
architecture being implemented as hetero type or do you see
it off-die, off-chip type implementations, or both? JOHN HENNESSY: I think both. I mean, I think it's a
time of great change. The rise of FPGAs,
for example, gives you the opportunity to implement
these machines, try them out. Implement them in
FPGA before you're committed to design a
custom silicon chip. Put it in an FPGA. Unleash it on the world. Try it out, see
how it works, see how the applications map to it. And then, perhaps,
decide whether or not you want to freeze
the architecture. Or you may just want to build
another next generation FPGA. So I think we'll see lots
of different implementation approaches. The one thing we have to do-- you know, there was a big
breakthrough in how hard it was to design chips that
occurred from about the mid-'80s to
about 1995 or 2000. Things have kind of ground
to a halt since then. We haven't had another big-- we need a big
breakthrough because we're going to need many more people
designing processors targeting particular application domains. And that's going to mean we need
to make it much easier and much cheaper to design a processor. AUDIENCE: I'm wondering,
as a deep learning engineer for a private
enterprise, what is my role in pushing forward DSA? JOHN HENNESSY: Yeah. Well, I think your role
is vital because we need people who really
understand the application space. And that's really critical. And this is a change. I mean, if you think about how
much architects and computer designers, hardware
designers have had to think about
the applications, they haven't had to
think about them. All of a sudden,
they're going to have to develop a bunch
of new friends that they can
interact with and talk to and colleagues they can
work with, to really get the insights they need in order
to push forward the technology. And that's going to be
a big change for us, but I think it's something
that's absolutely crucial. And it's great for
the industry too, because all of a
sudden we get people who are application
experts beginning to talk people who are
software domain experts or talk to hardware people. That's a terrific thing. AUDIENCE: You mentioned the
performance enhancements of domain-specific languages
over Python, for instance, but they're also
much harder to use. So do you think software
engineering talent can keep up in the future? JOHN HENNESSY: Yeah. I think the challenge will be-- the gain we've
gotten in software productivity in the
last 20 or 30 years is absolutely stunning. It is absolutely stunning. I mean, a programmer
now can probably write 10 to 100 times more code
than they could 30 years ago, in terms of functionality. That's phenomenal. We cannot give that up because
that's what's created all these incredible applications we have. What we need to
do is figure out-- all of a sudden, we need a new
generation of compiler people to think about how do we
make those run efficiently. And by the way, if the
gap is a factor of 25 between C and
Python, for example, if you get only
half that, that's a factor of 12 times faster. Any compiler writer
that can produce code that runs 12 times
faster is a hero in my book. So we have to just
think about new ways to approach the problem. And the opportunity
is tremendous. AUDIENCE: Are there any
opportunities still left in x86 as far as, like, lifting
the complexity of the ISA into software and exposing
more microarchitecture to the compiler? JOHN HENNESSY: It's tough. I mean, I think the Intel
people have spent more time implementing x86s than anybody's
ever spent implementing one ISA, one instruction set ever. They've mined out almost
all the performance. And in fact, if you look at the
tweaks that occur, for example, they do aggressive
prefetching in the i7. But you look at what happens
with prefetching, some programs actually slow down. Now on balance, they get a
little bit of speed up from it, but they actually slow
down other programs. And the problem
right now is it's very hard to turn that
dial in such a way that we don't get overwhelmed
with negative things. And I see my producer telling
me it's the end of the session. Thank you for the
great questions and for your attention. [APPLAUSE] [MUSIC PLAYING]