[MUSIC PLAYING] JOHN HENNESSY: Boy, I'm
delighted to be here today and have a chance
to talk to you about what is one of the biggest challenges
we've faced in computing in 40 years, but also a
tremendous opportunity to rethink how we
build computers and how we move forward. You know, there's been
a lot of discussion about the ending of Moore's law. The first thing to remember
about the ending of Moore's law is something Gordon
Moore said to me. He said, all exponentials
come to an end. It's just a question of when. And that's what's
happening with Moore's law. What does it really mean to say Moore's law is ending? Well, look at what's
happening in DRAMs. That's probably a
good place to start because we all depend on the
incredible growth in memory capacity. And if you look
at what's happened in DRAMs, for many
years we were achieving increases of about 50% a year. In other words, going
up slightly faster even than Moore's law. Then we began a
period of slowdown. And if you look what's happened
in the last seven years, this technology we were
used to seeing boom, the number of megabits per chip
more than doubling every two years, is now going
up at about 10% a year and it's going to take
about seven years to double.
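As a quick back-of-the-envelope check on those rates (my arithmetic, not a figure from the talk), a steady annual growth rate r doubles capacity in log(2)/log(1+r) years:

```python
# Doubling time for a steady annual growth rate r is log(2) / log(1 + r).
import math

for label, rate in [("historical DRAM scaling", 0.50), ("recent DRAM scaling", 0.10)]:
    years = math.log(2) / math.log(1 + rate)
    print(f"{label}: {rate:.0%} per year -> doubles in about {years:.1f} years")
```

At 50% a year that is well under two years per doubling; at 10% a year it is indeed about seven.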
Now, DRAMs are a particularly odd technology because they use deep
trench capacitors, so they require a
very particular kind of fabrication technology. What's happening in
processors, though? And if you look at the
data in processors, you'll see a similar slowdown. Moore's law is that
red line going up there on a nice logarithmic plot. Notice the blue line. That's the number of
transistors on a typical Intel microprocessor at that date. It begins diverging,
slowly at first. But look at what's happened in the last 10 years, roughly. The gap has grown. In fact, if you look at
where we are in 2015, 2016, we're more than a factor
of 10 off, had we stayed on that Moore's law curve. Now, the thing to remember
is that there's also a cost factor in here. Fabs are getting a lot more
expensive and the cost of chips is actually not
going down as fast. So as a result of that, the cost per transistor is actually getting worse. So we're beginning to
see the effects of that as we think about architecture. But while the slowdown of Moore's law is the thing you see all the press about, the bigger issue is the end of what we call Dennard scaling. So Bob Dennard was
an IBM employee; he was the guy who invented the one-transistor DRAM. And he made a prediction
many years ago that the energy, the power
per square millimeter of silicon would stay
constant, because voltage levels
would come down, capacitance would come down. What does that mean? If the
power stays constant and the number of transistors
increases exponentially, then the energy per transistor
is actually going down. And in terms of
energy consumption, it's cheaper and cheaper
and cheaper to compute.
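One rough way to see that arithmetic (a sketch with illustrative, normalized numbers, not from the talk): dynamic power per transistor goes roughly as C·V²·f, and under ideal Dennard scaling each generation shrinks linear dimensions by a factor k, cutting C and V by 1/k while frequency rises by k.

```python
# Idealized Dennard scaling, one process generation per loop iteration (illustrative only).
# Power per transistor ~ C * V^2 * f falls as 1/k^2, while transistor density rises as k^2,
# so power per square millimeter stays constant.
k = 1.4  # roughly a 0.7x linear shrink per generation
C, V, f, density = 1.0, 1.0, 1.0, 1.0  # normalized starting values
for gen in range(5):
    per_transistor = C * V**2 * f
    per_mm2 = per_transistor * density
    print(f"gen {gen}: power/transistor = {per_transistor:.2f}, power/mm^2 = {per_mm2:.2f}")
    C, V, f, density = C / k, V / k, f * k, density * k**2
```

The power-per-square-millimeter column stays at 1.00 while the per-transistor column keeps falling; that constant column is exactly what broke.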
Well, what happened with Dennard scaling? Look at that
blue line there. The red line shows
you the technology improving on a standard
Moore's law curve. The blue line shows you
what's happening to power. And you all know. I mean, you've seen
microprocessors now, right? They slow their clock
down, they turn off cores, they do all kinds of things,
because otherwise they're going to burn up. They're going to burn up. I mean, I never thought we'd see
the day where a processor would actually slow itself down to
prevent itself from overheating, but we're there. And so what happens
with Dennard scaling is it began to slow
down starting about '97. And then since 2007,
it's essentially halted. The result is a big change. All of a sudden, energy,
power becomes the key limiter. Not the number of transistors
available to designers, but their power consumption
becomes the key limiter. That requires you to think
completely differently about architecture, about
how you design machines. It means inefficiency in the
use of transistors in computing. Inefficiency in how an
architecture computes is penalized much
more heavily than it was in this earlier time. And of course, guess what? All the devices we carry
around, all the devices we use are running off batteries. So all of a sudden, energy is
a critical resource, right? What's the worst thing that can happen? Your smartphone runs out of power. That's a disaster, right? But think about all the
devices we walk around with. They're all running off batteries. Think about the coming era of IoT, where we're going to have devices that are always on,
which are expected to last 10 years
on a single battery by using energy
harvesting techniques. Energy becomes the key resource
in making those things work efficiently. And as we move more and
more to always on devices with things like
Google Assistant, you're going to want your
device on all the time or at least you're going to want
the CPU on all the time, if not the screen. So we're going to have to worry
more and more about power. But the thing that surprises many people is that energy efficiency is a giant issue in large cloud configurations. This shows you what the
typical capital cost would be like for a Google data center. You'll notice that green slice
there, those are the servers. But look at the size
of that red slice. That red slice is
the cost of the power plus cooling infrastructure. We're spending as much on power and cooling as we're spending on the processors. So energy efficiency becomes
a really critical issue as we go forward. And the end of
Dennard scaling has meant that there's
no more free lunch. For a lot of years,
we had a free lunch. It was pretty easy
to figure out how to make computation
more energy efficient. Now, it's a lot harder. And you can see
the impact of this. This just shows you 40 years of
processor performance, what's happened to uniprocessor, single
processor performance, and then multiprocessor performance. So there were the early years
of computing, the beginning of the microprocessor era. We were seeing about 22%
improvement per year. Then came the creation of RISC in the mid-1980s, with a dramatic use of instruction-level parallelism, pipelining, and multiple issue. We saw this incredible
period of about 20 years, where we got roughly
50% performance improvement per year. 50%. That was amazing. Then the beginning of the
end of Dennard scaling. That caused everybody
to move to multi-core. What did multi-core do? Multi-core shoved the
efficiency problem from the hardware designer
to the software people. Now, the software
people had to figure out how to use those multi-core
processors efficiently. But Amdahl's law came
along, reared its ugly head. I'll show you some data on that. And now, we're in
this late stage period where it looks like we're
getting about 3% performance improvement per year. Doubling could take 20 years. That's the end of
general purpose processor performance
growth as we've known it for so many years. Why did this happen? Why did it grind
to a halt so fast? Well, think about what was
happening during that RISC era, where we were building these deeply pipelined machines: pipelines 15, 16, 17 stages deep, issuing four instructions per clock. That machine needs to
have 60 instructions that it's working on at once. 60 instructions. How does it possibly
get 60 instructions? It uses speculation. It guesses about branches,
it yanks instructions and tries to execute them. But guess what happens? Nobody can predict
branches perfectly. Every time you predict
a branch incorrectly, you have to undo all the work
associated with that missed prediction. You've got to back
it out, you've got to restore the
state of the machine. And if you look inside
a typical Intel Core i7 today, on integer
code roughly 25% of the instructions
that get executed end up being thrown away. Guess what? The energy still got burnt to
execute all those instructions. And then, I threw
the results away and I had to restore the
state of the machine. A lot of wasted energy.
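A crude model shows where a number like 25% comes from (my own sketch with assumed branch statistics, not measured data): count how many useful instructions you fetch between mispredictions, and how large an in-flight window gets squashed each time.

```python
# Rough model of speculative waste (assumed numbers, for intuition only).
branch_fraction = 0.20       # assume roughly 1 in 5 instructions is a branch
predictor_accuracy = 0.97    # assume a good modern branch predictor
in_flight_window = 60        # ~4-wide issue x ~15-stage pipeline, as in the talk

useful_between_flushes = 1 / (branch_fraction * (1 - predictor_accuracy))
wasted = in_flight_window / (useful_between_flushes + in_flight_window)
print(f"instructions squashed: ~{wasted:.0%}")   # comes out to roughly a quarter
```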
That's why the single processor performance curve ended, basically. But we see similar
challenges when you begin to look at multi-core things. Take Amdahl's law: Gene Amdahl formulated it more than 40 years ago. It's still true today. Even if you take
large data centers with heavily parallel
workloads, it's very hard to write a big
complicated piece of software and not have small sections
of it be sequential, whether it's synchronization or
coordination or something else. So think about what happens. You've got a 64 processor
multi-core in the future. Suppose 1%, just 1% of
the code is sequential. Then that 64 processor
multi-core only runs at the speed of
a 40-processor machine. But guess what? You paid all the energy
for a 64 processor core executing all the
time and you only got 40 processors out of
that, slightly more than half. That's the problem.
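Worked out, Amdahl's law makes that arithmetic concrete for the 64-core, 1%-sequential example:

```python
# Amdahl's law for the example in the talk: 64 cores, 1% of the code sequential.
def amdahl_speedup(cores, sequential_fraction):
    return 1.0 / (sequential_fraction + (1.0 - sequential_fraction) / cores)

speedup = amdahl_speedup(64, 0.01)
print(f"speedup on 64 cores: {speedup:.1f}x")               # ~39x, about a 40-processor machine
print(f"fraction of the machine used: {speedup / 64:.0%}")  # ~61%, slightly more than half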
We've got to break through this efficiency barrier. We've got to rethink
how we design machines. So what's left? Well, software-centric
approaches. Can we make our
systems more efficient? It's great that we have
these modern scripting languages: they're interpreted, dynamically typed, and they encourage reuse. They've really
liberated programmers to get a lot more code
written and create incredible functionality. They're efficient
for programmers. They're very inefficient
for execution, and I'll show you
that in a second. And then there are
hardware-centric approaches, what Dave Patterson and I call
domain-specific architectures. Namely, designing
an architecture which isn't fully
general purpose, but which does a set of
domains, a set of applications really well, much
more efficiently. So let's take a look at
what the opportunity is. This is a chart that comes out
of a paper by Charles Leiserson and a group of colleagues
at MIT, called "There's Plenty of Room at the Top." They take a very simple example,
admittedly, matrix multiply. They write it in Python. They run it on an 18
core Intel processor. And then they proceed
to optimize it. First, rewrite it in C.
That speeds it up 47 times. Now, any compiler in the world
that can get a speed up of 47 would be really remarkable,
even a speed up of 20. Then they rewrite it
with parallel loops. They get almost a factor
of nine out of that. Then they rewrite it with memory optimizations: they block the matrix and allocate it to the caches properly. That gives them a factor of 20. And then finally,
they rewrite it using Intel AVX instructions,
using the vector instructions in the Intel Core, right,
domain-specific instructions that do vector
operations efficiently. That gives them
another factor of 10. The end result is
that final version runs 62,000 times faster
than the initial version.
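You can feel the bottom rungs of that ladder yourself with a toy comparison (a sketch, not the MIT code; numpy's BLAS-backed matmul stands in for the hand-tuned versions):

```python
# Naive interpreted matrix multiply vs. a tuned BLAS call (illustrative sketch only).
import time
import numpy as np

n = 256
A, B = np.random.rand(n, n), np.random.rand(n, n)

def naive_matmul(X, Y):
    size = len(X)
    Z = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            acc = 0.0
            for k in range(size):
                acc += X[i][k] * Y[k][j]
            Z[i][j] = acc
    return Z

start = time.time(); naive_matmul(A.tolist(), B.tolist()); naive_s = time.time() - start
start = time.time(); A @ B; blas_s = time.time() - start
print(f"naive: {naive_s:.2f}s  BLAS: {blas_s:.4f}s  ratio: ~{naive_s / max(blas_s, 1e-9):.0f}x")
```

Even this crude test usually shows a gap of a few hundred times on one core, and the full paper's ladder of optimizations compounds it much further.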
Now admittedly, matrix multiply is an easy case, a small piece of code. But it shows the
potential of rethinking how we write this software
and making it better. So what about these
domain-specific architectures? Really what we're
going to try to do is make a breakthrough
in how efficient we build the hardware. And by domain-specific,
we're referring to a class of processors which
do a range of applications. They're not like, for
example, the modem inside the cell phone, right? That's programmed once,
it runs modem code. It never does anything else. But think of a set
of processors which do a range of
applications that are related to a particular
application domain. They're programmable, they're
useful in that domain, they take advantage
of specific knowledge about that domain
when they run, so they can run much more efficiently. One obvious example: neural network processors, things that focus on machine learning. GPUs are another example of
this kind of thinking, right? They're programmable
in the context of doing graphics processing. So for any of you who have
ever seen any of the books that Dave Patterson
and I wrote, you know that we like quantitative
approaches to understand things and we like to analyze
why things work. So the key about
domain-specific architectures is there is no black magic here. Going to a more limited
range of architectures doesn't automatically
make things faster. We have to make specific
architectural changes that win. And there are three big ones. The first is we make more
effective use of parallelism. We go from a multiple
instruction, multiple data world that you'd see
on a multi-core today to a single instruction
multiple data. So instead of having
each one of my cores fetch separate
instruction streams, have to have
separate caches, I've got one set of
instructions and they're going to a whole set
of functional units. It's much more efficient. What do I give up? I give up some flexibility
when I do that. I absolutely give
up flexibility. But the efficiency
gain is dramatic. I go from speculative
out-of-order machines, what a typical high-end
processor from ARM or Intel looks like today,
to something that's more like a VLIW, which uses sets of operations that the compiler has decided can occur in parallel. So I shift work from
runtime to compile time. Again, it's less flexible. But for applications
when it works, it's much more efficient. I move away from caches. So caches are one of
the great inventions of computer science, one of
the truly great inventions. The problem is when there
is low spatial and low temporal locality, caches
not only don't work, they actually slow
programs down. They slow them down. So we move away from that to
user-controlled local memories. What's the trade-off? Now, somebody has
to figure out how to map their application
into a user-controlled memory structure. A cache does it automatically for
you, it's very general purpose. But for certain applications,
I can do a lot better by mapping those things myself. And then finally, I focus on
only the amount of accuracy I need. I move from IEEE floating point to lower-precision floating point, or from 32- and 64-bit integers
to 8-bit and 16-bit integers. If that's all the
accuracy I need, I can do eight
integer operations, eight 8-bit operations in
the same amount of time that I can do one
64-bit operation. So it's considerably faster.
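A minimal sketch of what trading precision for throughput looks like (my illustration; this simple scale-and-round scheme is not any particular accelerator's format):

```python
# Quantize float32 values to int8 with a single scale factor (illustrative only).
import numpy as np

weights = np.random.randn(16).astype(np.float32)
scale = np.abs(weights).max() / 127.0                        # map observed range onto [-127, 127]
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
restored = q.astype(np.float32) * scale

print("max quantization error:", float(np.abs(weights - restored).max()))
print("8-bit values per 64-bit word:", 64 // 8)              # eight narrow operands per wide one
```

If the application tolerates that small error, the hardware gets to do eight narrow operations in the footprint of one wide one.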
But to go along with that, I also need a domain-specific language. I need a language
that will match up to that hardware configuration. We're not going to be able to
take code written in Python or C, for example, and extract
the kind of information we need to map to a
domain-specific architecture. We've got to rethink how
we program these machines. And that's going to be
high-level operations. It's going to be
vector-vector multiply or a vector-matrix multiply or
a sparse matrix organization, so that I get that high-level
information that I need and I can compile it down
into the architecture.
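As a sketch of what "expressing the high-level operation" means, here is a tiny TensorFlow example; the point is that the program states a matrix multiply and an activation, and the framework's compiler decides how to map that graph onto a CPU, a GPU, or a TPU:

```python
# Express the work as high-level tensor operations rather than explicit loops,
# so the framework can compile the same graph to different hardware.
import tensorflow as tf

x = tf.random.normal([1, 1024])      # a batch of activations
W = tf.random.normal([1024, 256])    # a weight matrix

@tf.function                          # traced into a graph of matmul/relu ops
def dense_layer(inputs):
    return tf.nn.relu(tf.matmul(inputs, W))

print(dense_layer(x).shape)           # (1, 256)
```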
The key in doing these domain-specific languages will be to retain enough machine
independence that I don't have to recode things, that a
compiler can come along, take a domain-specific
language, map it to maybe one architecture
that's running in the cloud, maybe another
architecture that's running on my smartphone. That's going to
be the challenge. Ideas like TensorFlow and OpenGL
are a step in this direction, but it's really a new space. We're just beginning to
understand it and understand how to design in this space. You know, I built my first
computer almost 50 years ago, believe it or not. I've seen a lot of revolutions
in this incredible IT industry since then-- the creation of the internet,
the creation of the World Wide Web, the magic of
the microprocessor, smartphones, personal computers. But the one I think
that is really going to change our
lives is the breakthrough in machine learning and
artificial intelligence. This is a technology
which people have worked on for 50 years. And finally, finally, we
made the breakthrough. And the basis of
that breakthrough? We needed about a million
times more computational power than we thought we needed
to make the technology work. But we finally got
to the point where we could apply that
kind of computer power. And the one thing-- this is some
data that Jeff Dean and David Patterson and Cliff
Young collected-- that shows there's one
thing growing just as fast as Moore's law-- the number of papers being
published in machine learning. It is a revolution. It's going to change our world. And I'm sure some of you saw
the Duplex demo the other day. I mean, in the domain
of making appointments, it passes the Turing
test in that domain, which is an extraordinary
breakthrough. It doesn't pass it
in general terms, but it passes it in
a limited domain. And that's really an
indication of what's coming. So how do you think
about building a domain-specific architecture
to do deep neural networks? Well, this is a picture
of what's inside a tensor processing unit. The point I want
to make about this is that if you look at what
uses up the silicon area, notice that it's not used
for a lot of control, it's not used for
a lot of caching. It's used to do things
that are directly relevant to the computation. So this processor
can do a 256 by 256 array of multiply-accumulates, that is 65,536 (64K) 8-bit multiply-accumulates, every single clock. Every single clock. So for inference workloads, it can really crunch through enormous amounts of computation.
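Putting a rough number on that (my arithmetic; the clock rate here is an assumption for the sketch, not a quoted spec):

```python
# Back-of-the-envelope throughput of a 256 x 256 multiply-accumulate array.
array_dim = 256
macs_per_cycle = array_dim * array_dim            # 65,536 8-bit MACs each clock
clock_hz = 700e6                                  # assumed clock rate for the sketch
tera_ops = macs_per_cycle * clock_hz * 2 / 1e12   # count multiply + add as two ops
print(f"{macs_per_cycle} MACs/cycle -> roughly {tera_ops:.0f} tera-ops per second")
```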
You're not going to run general-purpose C code on this. You're going to run something
that's a neural network inference problem. And if you look
at the performance and you look at-- here we've
shown performance per watt. Again, energy being
the key limitation. Whether it's for your
cell phone and you're doing some kind of machine
learning on your cell phone or it's in the cloud, energy
is the key limitation. So what we plotted here is
the performance per watt. And you see that the first
generation tensor processing unit gets more than
30 times the performance per watt compared to a
general purpose processor. It even does considerably
better than a GPU, largely by switching
from floating point to lower density integer,
which is much faster. So again, this notion of
tailoring the architecture to the specific domain
becomes really crucial. So this is a new era. In some sense, it's
a return to the past. In the early days of computing,
as computers were just being developed,
we often had teams of people working together. We had people who were early
applications experts working with people who were doing
the beginning of the software environment-- building
the first compilers and the first
software environment-- and people doing
the architecture. And they're working
as a vertical team. That kind of
integration, where we get a design team
that understands how to go from application
to representation in some domain-specific
language to architecture and can think about how to
rebuild machines in new ways to get this, it's an
enormous opportunity and it's a new kind of challenge
for the industry to go forward. But I think there are enough
interesting application domains like this where we can
get incredible performance advantages by tailoring
our machines in a new way. And I think if we can do that,
maybe it will free up some time to worry about another small
problem, namely cybersecurity and whether or not the
hardware designers can finally help the software
designers to improve the security of our system. And that would be a great
problem to focus on. Thank you for your
attention and I'm happy to answer any
questions you might have. [APPLAUSE] Thanks. AUDIENCE: Can you talk
about some of the advances in quantum and
neuromorphic computing? JOHN HENNESSY: Yeah. So quantum-- that's a
really good question. So my view of this
is that we've got to build a bridge from where
we are today to post-silicon. The possibilities
for post-silicon, there are a couple. I mean there's organic,
there's quantum, there's carbon
nanofiber, there's a few different
possibilities out there. I characterize them as
technology of the future. The reason is the people working
on them are still physicists. They're not computer scientists
yet or electrical engineers, they're physicists. So they're still in the lab. On the other hand,
quantum, if it works, the computational power
from a reasonably modest sized qubit, let's say 128 corrected
qubits, meaning they're
accurate, that might take you 1,000 qubits to get
to that level of accuracy. But the computational
power for things that make sense, protein
folding, cryptography, of 128 qubits is phenomenal. So we could get an enormous
jump forward there. We need something post-silicon. We need something post-silicon. We've got maybe, as
Moore's law slows down, maybe another decade or so
before it comes to a real halt. And we've got to get an
alternative technology out there, because I think there's
lots of creative software to be written that wants
to run on faster machines. AUDIENCE: I just-- at the
end of your presentation, you briefly mentioned how we
could start using hardware to increase security. Would you mind
elaborating on that? JOHN HENNESSY: Sure. Sure. OK, so here's my
view with security. Everybody knows about
Meltdown and Spectre? First thing about
Meltdown and Spectre is to understand what happened: an attack that basically undermined the architecture in a way that we never anticipated. I worked on out-of-order
machines in the mid-1990s. That's how long that bug
has been in those machines, since the 1990s. And we didn't even realize it. We didn't even realize it. And the reason is that
basically what happens is our definition
of architecture was there is an instruction set. Programs run. I don't tell you
how fast they run, all I tell you is what
the right answer is. Side channel attacks that use
performance to leak information basically go around our
definition of architecture. So we need to rethink
architecture itself. You know, in the
1960s and 1970s, there was a lot of
thought about how to do a better job of protection. Rings and domains
and capabilities. They all got dropped. And they got dropped
because of two things. First of all, we
became convinced that people were going
to verify their software and it was always
going to be perfect. Well, the problem is that the
amount of software we write is far bigger than the amount
of software we ever verify, so that's not going to help. I think it's time for architects
to begin to think about how they can help software people build systems which are more secure. What's the right
architecture support to make more secure systems? How do we build those? How do we make sure they
get used effectively? And how do we together--
architects and software people working together-- create
a more secure environment? And I think it's going to
mean thinking back about some of those old ideas and bringing
them back in some cases. AUDIENCE: After I took my
processor architecture class, which used your book-- JOHN HENNESSY: I hope
it didn't hurt you. AUDIENCE: Hopefully not. I had a real appreciation
for the simplicity of a RISC system. It seems like we've
gone towards more complexity with domain-specific
languages and things. Is that just because
of performance or has your philosophy changed? What do you think? JOHN HENNESSY: No, I
actually think they're not necessarily more complicated. They have a narrower
range of applicability. But they're not more
complicated in the sense that they are a better match
for what the application is. And the key thing to understand
about RISC, the key insight, was that we weren't
targeting people writing assembly language anymore. That was the old way
of doing things, right? In the 1980s, the move was on. Unix was the first operating
system ever written in a high level
language, the first ever. The move was on from
assembly language to high level languages. And what you needed to target
was the compiler output. So it's the same thing here. You're targeting the output of
a domain-specific language that works well for a
range of domains. And you design the architecture
to match that environment. Make it as simple as
possible, but no simpler. AUDIENCE: With the
domain-specific architectures, do you have examples
of what might be the most promising areas
for future domain-specific architectures? JOHN HENNESSY: So I think the
most obvious one are things related to machine learning. I mean, they're computationally
extremely intensive, both training as
well as inference. So that's one big field. Virtual reality. Virtual reality and augmented
reality environments. If we really want to construct a
high-quality environment that's augmented reality, we're
going to need enormous amounts of computational power. But again, it's
well-structured kinds of computations that could match
to those kinds of applications. We're not going to
do everything with domain-specific architectures. They're going to give
us a lift on some of the more
computationally-intensive problems. We're still going to have to
advance and think about how to push forward general purpose,
because the general purpose machines are going to drive
these domain-specific machines. The domain-specific machine
will not do everything for us. So we're going to have
to figure out ways to go forward on
that front as well. AUDIENCE: Professor, what do
we think about some emerging memory technology? How will it impact the
future computer architecture? Thank you. JOHN HENNESSY: Yeah, that's
a really great question. So as we get to
the end of DRAMs, I think some of the more
innovative memory technologies are beginning to appear. So-called phase
change technologies, which have the advantage
that they can probably scale better than DRAM
and probably even better than Flash technologies. They have the advantage that
lifetimes are better, too, than Flash. The problem with
Flash is it wears out. Some of these phase change
memories or memristor technologies have the
ability to scale longer. And what you'll get is probably
not a replacement for DRAM. You'll probably get a
replacement for Flash and a replacement for disks. And I think that technology
is coming very fast. And it'll change the way we
think about memory hierarchies and I/O hierarchy, because
you'll have a device that's not quite as fast as
DRAM, but a lot faster than the other alternatives. And that will change the way
we want to build machines. AUDIENCE: As a person, you think
about education quite often. We all saw Zuckerberg having
a conversation with Congress. And I'm excited to
see children getting general education around
computing and coding, which is something
that a lot of us didn't have the
opportunity to have. Where do you see education, not
only for K-12, grad, post-grad, et cetera, but also
existing people in policy-making
decisions, et cetera? JOHN HENNESSY: Yeah. Well, I think first
of all, education has become a lifelong endeavor. Nobody has one job for
a lifetime anymore. They change what they're doing
and education becomes constant. I mean, you think
about the stuff you learned as an undergrad and
you think how much technology has already changed, right? So we have to do more there. I think we also
have to make more-- society needs to be
more technology-savvy. Computing is changing
every single part of the world we live in. To not have some understanding
of that technology, I think, limits your ability
to lead an organization, to make important decisions. So we're going to have to
educate our young people at the beginning. And we're going to have to
make an investment in education so that as people's careers
change over their lifetime, they can go back and
engage in education. Not necessarily going
back to college, it's going to have to
be online in some way. But it's going to
have to be engaging. It's going to have
to be something that really works well for people. AUDIENCE: Hi. Olly [INAUDIBLE] from BBC. Just wondered what your view is
on the amount of energy being used on Bitcoin mining
and other cryptocurrencies and that sort of thing. JOHN HENNESSY: Yeah. So I could build a special
purpose architecture to mine Bitcoins. That's another
obvious example of a domain-specific
architecture for sure. So I'm a long-term
believer in cryptocurrency as an important
part of our space. And what we're
going to have to do is figure out how
to make it work, how to make it work
efficiently, how to make it work seamlessly, how
to make it work inexpensively. I think those are all problems
that can be conquered. And I think you'll
see a bunch of people that have both the algorithmic
heft and the ability to rethink how we do that, and
really make cryptocurrencies go quite quick. And then we can also build
machines which accelerate that even further, so
that we can make-- a cryptocurrency transaction
should be faster than a cash transaction and certainly
no slower than a credit card transaction. We're not there yet. But we can get there. We can get there
with enough work. And I think that's where
we ought to be moving to. AUDIENCE: What do you think
the future operating system has to have to cope with this? JOHN HENNESSY: Yeah. The future of operating
system, you said, yes? Yeah. So I think operating
systems are really crucial. You know, way back
when in the 1980s, we thought we were going to
solve all our operating system problems by going to
kernel-based operating systems. And the kernel would be this
really small little thing that just did the core functions
of protection and memory management. And then, everything
else around it would be protected, basically. And what happened was
kernels started out really small and then they got
bigger and then they got bigger and then they got bigger. And all of a sudden, almost
the entire operating system was in the kernel, primarily to
make it performance-efficient. And the same thing
happened with hypervisors. They started really small
in the very beginning and then they got bigger. We're going to
have to figure out how we structure complex
operating systems so that they can deal with
the protection issues, they can deal with efficiency
issues, they can work well. We should be building
operating systems which, from the beginning,
realize that they're going to run on large
numbers of processors, and organize them in
such a way that they can do that efficiently. Because that's the future, we're
going to have to rely on that. AUDIENCE: In your
intro video, you mentioned this chasm between
concept and practice. And also in your
talk, you've mentioned that hardware is vital to
the future of computing. Given that most investors
are very hardware-averse, especially this day
and age, where do you expect that money to come from? Is that something that
will come from governments or private investing? How are we going to fund
the future of computing is really what my question is. JOHN HENNESSY: Yeah,
it's a good question. I mean, I think
the answer is both. You know, certainly Google's
making large investments in a lot of these technologies
from quantum to other things. I think government
remains a player. So government, you look at
how many of the innovations we're used to: the internet, RISC,
the rise of VLSI, modern computer-aided
design tools. All had funding basically
coming from the government at some point. So I think the government
should still remain a player in thinking about-- what's the one area the
government has probably funded longer than anybody else? Artificial intelligence. They funded it for 50
years before we really saw the breakthrough that came. Right? So they're big believers. They should be funding
things long-term. They should fund things that
are out over the horizon that we don't yet
really understand what their practical
implications may be. So I think we're going
to have to have that and we're going to have to have
industry playing a big role. And we're going to have
to make universities work well with industry, because
they complement one another, right? They do two different
kinds of things but they're complementary. And if we can get
them to work well, then we can have the
best of both worlds. AUDIENCE: You
talked a little bit about the difference
between the memory hierarchy and storage that is coming
up with these new memory technologies. Have you seen any
applications where the compute and the
storage get combined, kind of more like the brain? JOHN HENNESSY: Yeah, I
think increasingly we'll see things move
towards that direction where the software takes care
of the difference between what is in storage and-- "storage,"
quote unquote, right, because it may actually be Flash
or some kind of next generation memory technology--
and what's in DRAM. What you need to tell
me is what's volatile and when do I have to ensure
that a particular operation is committed to
nonvolatile storage. But if you know that, we've
got log-based file systems, you've got other ideas which move in the direction of trying to take advantage of a greatly different memory hierarchy, a greatly
different storage hierarchy than we're used to. And we may want to continue
to move in that direction, particularly when you
begin to think about-- if you think about things
like networking or I/O and they become major
bottlenecks in applications, which they often
do, then rethinking how we could do
those efficiently and optimize the hardware,
but also the software. Because the minute you stick
an operating system transaction in there, you've
added a lot of weight to what it costs to get
to that storage facility. So if we can make that
work better and make it more transparent without
giving up protection, without giving up a guarantee
that once something is written to a certain storage unit
it's permanently recorded, then I think we can make
much faster systems. AUDIENCE: So do you
see the implementation of a domain-specific
architecture being implemented as hetero type or do you see
it off-die, off-chip type implementations, or both? JOHN HENNESSY: I think both. I mean, I think it's a
time of great change. The rise of FPGAs,
for example, gives you the opportunity to implement
these machines, try them out. Implement them in
FPGA before you're committed to design a
custom silicon chip. Put it in an FPGA. Unleash it on the world. Try it out, see
how it works, see how the applications map to it. And then, perhaps,
decide whether or not you want to freeze
the architecture. Or you may just want to build
another next generation FPGA. So I think we'll see lots
of different implementation approaches. The one thing we have to do-- you know, there was a big
breakthrough in how hard it was to design chips that
occurred from about the mid-'80s to
about 1995 or 2000. Things have kind of ground
to a halt since then. We haven't had another big-- we need a big
breakthrough because we're going to need many more people
designing processors targeting particular application domains. And that's going to mean we need
to make it much easier and much cheaper to design a processor. AUDIENCE: I'm wondering,
as a deep learning engineer for a private
enterprise, what is my role in pushing forward DSA? JOHN HENNESSY: Yeah. Well, I think your role
is vital because we need people who really
understand the application space. And that's really critical. And this is a change. I mean, if you think about how
much architects and computer designers, hardware
designers have had to think about
the applications, they haven't had to
think about them. All of a sudden,
they're going to have to develop a bunch
of new friends that they can
interact with and talk to and colleagues they can
work with, to really get the insights they need in order
to push forward the technology. And that's going to be
a big change for us, but I think it's something
that's absolutely crucial. And it's great for
the industry too, because all of a
sudden we get people who are application
experts beginning to talk people who are
software domain experts or talk to hardware people. That's a terrific thing. AUDIENCE: You mentioned the
performance enhancements of domain-specific languages
over Python, for instance, but they're also
much harder to use. So do you think software
engineering talent can keep up in the future? JOHN HENNESSY: Yeah. I think the challenge will be-- the gain we've
gotten in software productivity in the
last 20 or 30 years is absolutely stunning. It is absolutely stunning. I mean, a programmer
now can probably write 10 to 100 times more code
than they could 30 years ago, in terms of functionality. That's phenomenal. We cannot give that up because
that's what's created all these incredible applications we have. What we need to
do is figure out-- all of a sudden, we need a new
generation of compiler people to think about how do we
make those run efficiently. And by the way, if the
gap is a factor of 25 between C and
Python, for example, if you get only
half that, that's a factor of 12 times faster. Any compiler writer
that can produce code that runs 12 times
faster is a hero in my book. So we have to just
think about new ways to approach the problem. And the opportunity
is tremendous. AUDIENCE: Are there any
opportunities still left in x86 as far as, like, lifting
the complexity of the ISA into software and exposing
more microarchitecture to the compiler? JOHN HENNESSY: It's tough. I mean, I think the Intel
people have spent more time implementing x86s than anybody's
ever spent implementing one ISA, one instruction set ever. They've mined out almost
all the performance. And in fact, if you look at the
tweaks that occur, for example, they do aggressive
prefetching in the i7. But you look at what happens
with prefetching, some programs actually slow down. Now on balance, they get a
little bit of speed up from it, but they actually slow
down other programs. And the problem
right now is it's very hard to turn that
dial in such a way that we don't get overwhelmed
with negative things. And I see my producer telling
me it's the end of the session. Thank you for the
great questions and for your attention. [APPLAUSE] [MUSIC PLAYING]