[MUSIC PLAYING] NICHOLAS THOMPSON: Hello,
I'm Nicholas Thompson. I'm the editor in
chief of "Wired." It is my honor today
to get the chance to interview Geoffrey Hinton. There are a couple--
well, there are many things I love about him. But two that I'll just
mention in the introduction. The first is that he persisted. He had an idea that
he really believed in that everybody
else said was bad. And he just kept at it. And it gives a lot of faith to
everybody who has bad ideas, myself included. Then the second, as someone
who spends half his life as a manager
adjudicating job titles, I was looking at his job
title before the introduction. And he has the most
non-pretentious job title in history. So please welcome Geoffrey
Hinton, the engineering fellow at Google. [APPLAUSE] Welcome. GEOFFREY HINTON: Thank you. NICHOLAS THOMPSON: So
nice to be here with you. All right, so let us start. 20 years ago, when you wrote some
of your early, very influential papers, everybody started
to say, it's a smart idea, but we're not actually going
to be able to design computers this way. Explain why you persisted, why
you were so confident that you had found something important. GEOFFREY HINTON: So actually
it was 40 years ago. And it seemed to me there's no
other way the brain could work. It has to work by learning
the strengths of connections. And if you want to make a
device do something intelligent, you've got two options. You can program it,
or it can learn. And we certainly
weren't programmed. So we had to learn. So this had to be
the right way to go. NICHOLAS THOMPSON:
So explain, though-- well, let's do this. Explain what neural
networks are. Most of the people here
will be quite familiar. But explain the
original insight and how it developed in your mind. GEOFFREY HINTON: So you have
relatively simple processing elements that are very
loosely models of neurons. They have connections coming in. Each connection
has a weight on it. That weight can be
changed to do learning. And what a neuron does
is take the activities on the connections times the
weights, adds them all up, and then decides whether
to send an output. And if it gets a big enough
sum, it sends an output. If the sum is negative,
it doesn't send anything. That's about it. And all you have to
do is just wire up a gazillion of those with
a gazillion squared weights and just figure out how
to change the weights, and it'll do anything. It's just a question of
how you change the weights.
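A minimal sketch of that unit in Python; the function name, the zero threshold, and returning 0.0 for "nothing" are illustrative assumptions, not details from the talk:

    def unit_output(incoming_activities, weights, threshold=0.0):
        # Multiply each incoming activity by the weight on its connection
        # and add them all up.
        total = sum(a * w for a, w in zip(incoming_activities, weights))
        # If the sum is big enough, send an output; otherwise send nothing.
        return total if total > threshold else 0.0

Learning is then entirely a matter of how those weights get changed.

NICHOLAS THOMPSON: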
So when did you come to understand that this was
an approximate representation of how the brain works? GEOFFREY HINTON: Oh, it was
always designed as that. NICHOLAS THOMPSON: Right. GEOFFREY HINTON: It was designed
to be like how the brain works. NICHOLAS THOMPSON: But
let me ask you this. So at some point
in your career, you start to understand
how the brain works. Maybe it was when you were 12. Maybe it was when you were 25. When do you make the
decision that you will try to model
computers after the brain? GEOFFREY HINTON:
Sort of right away. That was the whole point of it. The whole idea was to have
a learning device that learned like the
brain, the way people think the brain learns, by
changing connection strengths. And this wasn't my idea. Turing had the same idea. Turing, even though
he invented a lot of the basis of standard
computer science, he believed that the brain
was this unorganized device with random weights. And it would use
reinforcement learning to change the connections. And it would learn
everything, and he thought that was the best
route to intelligence. NICHOLAS THOMPSON: And so you
were following Turing's idea that the best way to make
a machine is to model it after the human brain. This is how a human brain works. So let's make a
machine like that. GEOFFREY HINTON: Yeah, it
wasn't just Turing's idea. Lots of people thought
that back then. NICHOLAS THOMPSON: All
right, so you have this idea. Lots of people have this idea. You get a lot of credit. In the late '80s,
you start to come to fame with your published
work, is that correct? GEOFFREY HINTON: Yes. NICHOLAS THOMPSON: When
is the darkest moment? When is the moment
where other people who had been working on it, who agreed
with this idea from Turing start to back away and yet
you continue to plunge ahead? GEOFFREY HINTON:
There were always a bunch of people who kept
believing in it, particularly in psychology. But among computer
scientists, I guess in the '90s, what happened was
data sets were quite small. And computers weren't that fast. And on small data sets,
other methods like things called support vector machines,
worked a little bit better. They didn't get confused
by noise so much. And so that was very depressing
because we developed back propagation in the '80s. We thought it would
solve everything. And we were a bit puzzled about
why it didn't solve everything. And it was just a
question of scale. But we didn't really
know that then. NICHOLAS THOMPSON:
And so why did you think it was not working? GEOFFREY HINTON: We
thought it was not working because we didn't have
quite the right algorithms. We didn't have quite the
right objective functions. I thought for a long
time it's because we were trying to do
supervised learning where you have to label data. And we should have been doing
unsupervised learning, where you just learn from the
data with no labels. It turned out it was
mainly a question of scale. NICHOLAS THOMPSON: Oh,
that's interesting. So the problem was you
didn't have enough data. You thought you had the
right amount of data, but you hadn't
labeled it correctly. So you just misidentified
the problem? GEOFFREY HINTON: I thought
that using labels at all was a mistake. You would do most
of your learning without making any
use of labels just by trying to model the
structure in the data. I actually still believe that. I think as computers get faster,
for any given size data set, if you make computers
fast enough, you're better off doing
unsupervised learning. And once you've done the
unsupervised learning, you'll be able to learn
from fewer labels. NICHOLAS THOMPSON:
So in the 1990s, you're continuing
with your research. You're in academia. You are still publishing, but
it's not coming to acclaim. You aren't solving big problems. When do you start-- well, actually, was
there ever a moment where you said, you know
what, enough of this. I'm going to go
try something else? GEOFFREY HINTON: Not really. NICHOLAS THOMPSON: Not that
I'm going to go sell burgers, but I'm going to figure out a
different way of doing this. You just said we're going
to keep doing deep learning. GEOFFREY HINTON: Yes, something
like this has to work. I mean, the connections in the
brain are learning somehow. And we just have
to figure it out. And probably there's a bunch
of different ways of learning connection strengths. The brain's using one of them. There may be other
ways of doing it. But certainly, you have to
have something that can learn these connection strengths. And I never doubted that. NICHOLAS THOMPSON: OK,
so you never doubt it. When does it first start
to seem like it's working? OK, you know, we've got this. I believe in this idea, and
actually, if you look at that, if you squint, you
can see it's working. When did that happen? GEOFFREY HINTON: OK, so one
of the big disappointments in the '80s was if
you made networks with lots of hidden layers,
you couldn't train them. That's not quite true because
convolutional networks designed by Yann LeCun, you could
train for fairly simple tasks like recognizing handwriting. But most of the deep nets, we
didn't know how to train them. And in about 2005,
I came up with a way of doing unsupervised
training of deep nets. So you take your
input, say your pixels, and you'd learn a bunch
of feature detectors that were just good at
explaining why the pixels were behaving like that. And then you treat those
feature detectors as the data and then you learn another
bunch of feature detectors. So they've got to explain why
those feature detectors have those correlations. And you keep learning
layers and layers. And what was interesting
was you could do some math and prove that each time
you learned another layer, you didn't necessarily have
a better model of the data, but you had a bound on
how good your model was. And you could get a
better bound each time you added another layer. NICHOLAS THOMPSON: What do you
mean you had a bound on how good your model was? GEOFFREY HINTON: OK, so
once you got a model, you can say how surprising
does a model find this data? You showed some
data and you say, is that the kind of thing
you believe in or is that surprising? And you can sort of measure
something that says that. And what you'd like
to do is have a model, a good model is one that looks
at the data and says yeah, I knew that. It's unsurprising, OK? And it's often very
hard to compute exactly how surprising
this model finds the data. But you can compute
a bound on that. You can say this model finds the
data less surprising than this. And you could show that, as
you add extra layers of feature detectors, you get a model. And each time you
add a layer, the bound
on how surprising it finds the data gets better.
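A rough sketch of that layer-by-layer recipe; fit_one_layer and the activities method are assumed placeholders, not routines Hinton names:

    # Greedy, layer-by-layer unsupervised learning, roughly as described:
    # learn feature detectors for the pixels, then treat those detectors'
    # activities as the data for the next layer, and so on.
    def pretrain_stack(data, layer_sizes, fit_one_layer):
        layers = []
        for size in layer_sizes:
            layer = fit_one_layer(data, size)   # learn feature detectors for this "data"
            layers.append(layer)
            data = layer.activities(data)       # their activities become the next layer's data
        return layers

Each extra layer trained this way is what gives the improved bound he describes on how surprising the model finds the data.

NICHOLAS THOMPSON: Oh, I see. OK, so that makes sense. So you're making observations,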
and they're not correct. But you know they're closer
and closer to being correct. I'm looking at the audience. I'm making some generalization. It's not correct, but I'm
getting better and better at it, roughly? GEOFFREY HINTON: Roughly. NICHOLAS THOMPSON: OK,
so that's about 2005 where you come up with that
mathematical breakthrough? GEOFFREY HINTON: Yeah. NICHOLAS THOMPSON: When do
you start getting answers that are correct and what
data are you working on? Is this speech data, where you
first have your breakthrough? GEOFFREY HINTON: This was
just handwritten digits. Very simple data. Then around the same time,
they started developing GPUs. And the people doing
neural networks started using GPUs
in about 2007. I had one very
good student called Vlad Mnih, who started
using GPUs for finding roads in aerial images. He wrote some code that was
then used by other students for using GPUs to recognize
phonemes in speech. And so they were using
this idea of pre-training. And after they'd done
all this pre-training, then they'd just stick labels
on top and use back propagation. And it turned out that way,
you could have a very deep net that was pre-trained this way. And you could then
use back propagation, and it actually worked. And it sort of
beat the benchmarks for speech recognition
initially just by a little bit. NICHOLAS THOMPSON: It beat the
best commercially available speech recognition. It beat the best academic
work on speech recognition. GEOFFREY HINTON: On a relatively
small data set called TIMIT, it did slightly better than
the best academic work. There was also work done at IBM. And very quickly people
realized that this stuff, since it was beating
standard models that had taken 30 years to develop,
with a bit more development it would do really well. And so my graduate students
went off to Microsoft and IBM and Google. And Google was the fastest
to turn it into a production speech recognizer. And by 2012, that work
that was first done in 2009 came out in Android. And Android suddenly got much
better in speech recognition. NICHOLAS THOMPSON: So tell me
about that moment where you've had this idea for
40 years, you've been publishing on
it for 20 years, and you're finally better
than your colleagues? What did that feel like? GEOFFREY HINTON:
Well, back then I'd only had the idea for 30 years. NICHOLAS THOMPSON: Correct,
correct, sorry, sir. Just a new idea. It's fresh. GEOFFREY HINTON:
It felt really good that it finally got the state
of the art on a real problem. NICHOLAS THOMPSON:
And do you remember where you were when you first
got the revelatory data? GEOFFREY HINTON: No. NICHOLAS THOMPSON: No, no, OK. All right, so you realize it
works on speech recognition. When do you start applying
it to other problems? GEOFFREY HINTON: So then
we start applying it to all sorts of other problems. So George Dahl, who was
one of the people who did the original work on speech
recognition, applied it to-- I give you a lot of
descriptors of a molecule and you want to predict if that
molecule will bind to something to act as a good drug. And there was a
competition on Kaggle. And he just applied our
standard technology design for speech recognition
to predicting the activity of drugs and
it won the competition. So that was a sign that this
stuff was sort of fairly universal. And then I had a student
called [INAUDIBLE], who said, you know, Geoff,
this stuff is going to work for image recognition. And Fei-Fei Li has created
the correct data set for it, and it's a public competition. We have to do that. And so what we did was take an
approach originally developed by Yann LeCun. A student called Alex
Krizhevsky was a real wizard. He could make GPUs do anything. Programmed the GPUs
really, really well. And we got results
that were a lot better than standard computer vision. That was 2012. And it was a coincidence I
think of the speech recognition coming out in the Android. So you knew this stuff could
solve production problems. And on vision in 2012,
it had done much better than the standard
computer vision. NICHOLAS THOMPSON: So those are
three areas where it succeeded. So modeling chemicals, speech,
voice, where was it failing? GEOFFREY HINTON: The failure is
only temporary, you understand. [LAUGHTER] NICHOLAS THOMPSON:
Where was it failing? GEOFFREY HINTON: For things
like machine translation, I thought it would be a very
long time before we could do that because
machine translation, you've got a string
of symbols comes in and a string of
symbols goes out. And it's fairly plausible to say
in between you do manipulations on strings of symbols, which
is what classical AI is. Actually, it doesn't
work like that. Strings of symbols come in. You turn those into great
big vectors in your brain. These vectors interact
with each other. And then you convert it back
into strings of symbols to go out. And if you told me in 2012
that in the next five years, we'll be able to translate
between many languages using just the same technology,
recurrent nets, but just the stochastic
gradient descent from random initial weights,
I wouldn't have believed you. It happened much
faster than expected. NICHOLAS THOMPSON: But
so what distinguishes the areas where it
works the most quickly and the areas where it
will take more time? It seems like the visual
processing, speech recognition, sort of core human
things that we do with our sensory perception
seem to be the first barriers to clear. Is that correct? GEOFFREY HINTON: Yes and no
because there's other things we do like motor control. We're very good
at motor control. Our brains are clearly
designed for that. And that's only just now
neural nets are beginning to compete with the best
other technologies there. They will win in the end. But they're only
just winning now. I think things like
reasoning, abstract reasoning, they're the kind of last
things we learn to do. And I think they'll be
among the last things these neural nets learn to do. NICHOLAS THOMPSON:
And so you keep saying that neural nets will
win at everything eventually. GEOFFREY HINTON: Well, we
are neural nets, right? Anything we can do they can do. NICHOLAS THOMPSON:
Right, but just because the human brain is not
necessarily the most efficient computational
machine ever created. GEOFFREY HINTON:
Almost certainly not. NICHOLAS THOMPSON: So
why could there not be-- certainly not my human brain. Couldn't there be a way
of modeling machines that is more efficient
than the human brain? GEOFFREY HINTON:
Philosophically, I have no objection to the idea
there could be some completely different way to do all this. It could be that if
you start with logic and you're trying
to automate logic, and you make some really
fancy theorem prover, and you do reasoning,
and then you decide you're going to do visual
perception by doing reasoning, it could be that that
approach would win. It turned out it didn't. But I have no philosophical
objection to that winning. It's just we know
that brains can do it. NICHOLAS THOMPSON: Right,
but there are also things that our brains can't do well. Are those things
that neural nets also won't be able to do well? GEOFFREY HINTON:
Quite possibly, yes. NICHOLAS THOMPSON:
And then there's a separate problem, which
is we don't know entirely how these things work, right? GEOFFREY HINTON: No, we really
don't know how they work. NICHOLAS THOMPSON: We don't
understand how top down neural networks work. There is even a
core element of how neural networks work that
we don't understand, right? GEOFFREY HINTON: Yes. NICHOLAS THOMPSON:
So explain that and then let me ask
the obvious follow up, which is, we don't know
how these things work. How can those things work? GEOFFREY HINTON: OK, you ask
that when I finish explaining. NICHOLAS THOMPSON: Yes. GEOFFREY HINTON: So if you
look at current computer vision systems, most of them, they're
basically feed forward. They don't use
feedback connections. There's something else about
current computer vision systems, which is they're
very prone to adversarial examples. You can change a
few pixels slightly and something that was
a picture of a panda and still looks exactly
like a panda to you, it suddenly says
that's an ostrich. Obviously, the way you
change the pixels is cleverly designed to fool it into
thinking it's an ostrich. But the point is it still
looks just like a panda to you. And initially, we thought
these things work really well. But then when
confronted with the fact that they look at a panda and
be confident it's an ostrich, you get a bit worried. And I think part of
the problem there is that they're not trying to
reconstruct from the high level representations. They're trying to do descriptive
learning where you just learn layers of
feature detectors and the whole, whole objective
is just to change the weights. So you get better at
getting the right answer. They're not doing things
like at each level of feature detectors, check that
you can reconstruct the data in the layer below from
the activities of these feature detectors. And recently in Toronto,
we've been discovering, or Nick Frost's
been discovering, that if you introduce
reconstruction then it helps you be more resistant
to adversarial attack. So I think in
human vision, to do the learning we do in
reconstruction and also because we're doing
a lot of learning by doing reconstructions,
we are much more resistant to adversarial attack. NICHOLAS THOMPSON:
But you believe that top down communication
in a neural network is how you test,
how you reconstruct, how you test and make sure
it's a panda not an ostrich? GEOFFREY HINTON: I think
that's crucial, yes. Because I think if you-- NICHOLAS THOMPSON:
But brain scientists are not entirely agreed
on that, correct? GEOFFREY HINTON:
Brain scientists all agreed on the
idea that if you have two areas of the cortex
in a perceptual pathway, if there's connections
from one to the other, they'll always be backwards
connections, not necessarily point to point. But there will always
be a backwards pathway. They're not agreed
on what it's for. It could be for attention. It could be for learning, or
it could be for reconstruction, or it could be for all three. NICHOLAS THOMPSON: And
so we don't know what the backwards communication is. You are building your new neural
networks on the assumption that-- or you're building
backwards communication that is for reconstruction
into your neural networks even though we're not sure
that's how the brain works. GEOFFREY HINTON: Yes. NICHOLAS THOMPSON:
Isn't that cheating? GEOFFREY HINTON: Not at all NICHOLAS THOMPSON:
If you're trying to make it like
the brain, you're doing something we're not
sure is like the brain. GEOFFREY HINTON: Not at all. NICHOLAS THOMPSON: OK. GEOFFREY HINTON: There's two-- I'm not doing computational
neuroscience. That is, I'm not trying to make
a model of how the brain works. I'm looking at the brain
and saying this thing works. And if we want to make
something else that works, we should sort of look
to it for inspiration. So this is neuro inspired,
not a neural model. So the neurons we
use, they're inspired by the fact neurons have
a lot of connections and they change the strengths. NICHOLAS THOMPSON:
That's interesting. So if I were in
computer science and I was working on neural
networks, and I wanted to beat Geoff
Hinton, one thing I could do is I could build in
top down communication and base it on other
models of brain science. So based on learning,
not on reconstructing. GEOFFREY HINTON: If they were
better models, then you'd win, yeah. NICHOLAS THOMPSON: That's
very, very interesting. All right, so let's move
to a more general topic. So neural networks will be able
to solve all kinds of problems. Are there any mysteries of the
human brain that will not be captured by neural
networks or cannot? For example, could the emotion-- GEOFFREY HINTON: No. NICHOLAS THOMPSON: No. So love could be reconstructed
by a neural network? Consciousness can
be constructed? GEOFFREY HINTON: Absolutely,
once you've figured out what those things mean-- we are neural networks, right? Now consciousness is something
I'm particularly interested in. I get by fine without it. But um-- [LAUGHTER] So people don't really
know what they mean by it. There's all sorts of
different definitions. And I think it's a
pre-scientific term. So 100 years ago, if you
ask people what is life? They would have said, well,
living things have vital force. And when they die, the
vital force goes away. And that's the difference
between being alive and being dead, whether you got
vital force or not. And now we don't
think that sort of-- we don't have vital force. We just think it's a
pre-scientific concept. And once you understand
some biochemistry and molecular biology, you
don't need vital force anymore. You understand how
it actually works. And I think it's going to be
the same with consciousness. I think consciousness
is an attempt to explain mental phenomena with
some kind of special essence. And this special essence,
you don't need it. Once you can really
explain it, then you'll explain how we do
the things that make people think we're conscious. And you'll explain all
these different meanings of consciousness without
having some special essence as consciousness. NICHOLAS THOMPSON: Right,
so there's no emotion that couldn't be created. There's no thought that
couldn't be created. There's nothing
that a human mind can do that couldn't
theoretically be recreated by a fully
functioning neural network once we truly understand
how the brain works. GEOFFREY HINTON: There's
something in a John Lennon song that sounds very like
what you just said. [LAUGHTER] NICHOLAS THOMPSON: And you're
100% confident of this? GEOFFREY HINTON:
No, I'm a Bayesian. So I'm 99.9% confident. NICHOLAS THOMPSON: OK,
and what is the point one? GEOFFREY HINTON: Well, we
might, for example, all be part of a big simulation. NICHOLAS THOMPSON:
True, fair enough, OK. [LAUGHTER] [APPLAUSE] That actually makes me think
it's more likely that we are. All right, so what are
we learning as we do this and as we study the brain
to improve computers? How does it work in reverse? What are we learning
about the brain from our work in computers? GEOFFREY HINTON: So I think what
we've learned in the last 10 years is that if you take
a system with billions of parameters, and you'd
use stochastic gradient descent in some
objective function, and the objective function
might be to get the right labels or it might be to fill in
the gap in a string of words, or any objective function,
it works much better than it has any right to. It works much better
than you would expect. You would have thought, and
most people in conventional AI thought, take a system
with a billion parameters, start them off
with random values, measure the gradient of
the objective function. That is, for each
parameter figure out how the objective function
would change if you change that parameter a little bit. And then change it in that
direction that improves the objective function. You would have
thought that would be a kind of hopeless
algorithm that will get stuck. And it turns out, it's
a really good algorithm. And the bigger you scale
things, the better it works. And that's just an
empirical discovery really. There's some theory
coming along, but it's basically an
empirical discovery. Now because we've
discovered that, it makes it far more
plausible that the brain is computing the gradient
of some objective function and updating the weights
of strength of synapses to follow that gradient. We just have to figure out
how it gets the gradient and what the
objective function is.
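A minimal sketch of that recipe; gradient stands in for whatever returns the objective's gradient on one batch of data, and the objective is treated here as a loss to be minimized:

    # Start from random parameter values and repeatedly nudge each parameter
    # a little in the direction that improves the objective.
    def stochastic_gradient_descent(parameters, batches, gradient, learning_rate=0.01):
        for batch in batches:
            grads = gradient(parameters, batch)
            parameters = [p - learning_rate * g for p, g in zip(parameters, grads)]
        return parameters

NICHOLAS THOMPSON: But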
we didn't understand that about the brain. We didn't understand the
re-weighted [INAUDIBLE].. GEOFFREY HINTON:
It was a theory. It was-- I mean,
a long time ago, people thought
that's a possibility. But in the background,
there was always sort of conventional computer
scientists saying, yeah, but this idea of
everything's random, you just learn it all
by gradient descent, that's never going to work
for a billion parameters. You have to wire in
a lot of knowledge. NICHOLAS THOMPSON:
All right, so-- GEOFFREY HINTON: And we
know now that's wrong. You can just put in
random parameters and learn everything. NICHOLAS THOMPSON: So
let's expand this out. So as we learn more and
more, we will presumably continue to learn more and
more about how the human brain functions as we run these
massive tests on models based on how we
think it functions. Once we understand
it better, is there a point where we
can, essentially, rewire our brains to
be more like the most efficient machines or
change the way we think? GEOFFREY HINTON:
You'd have thought-- NICHOLAS THOMPSON: If it's a
simulation that should be easy, but not in a simulation. GEOFFREY HINTON:
You'd have thought that if we really
understand what's going on, we should be able to make things
like education work better, and I think we will. NICHOLAS THOMPSON: We will? GEOFFREY HINTON: Yeah. It would be very odd
if you could finally understand what's
going on in your brain and how it learns and not be
able to adapt the environment so you can learn better. NICHOLAS THOMPSON:
Well, OK, I don't want to go too far into the future. But a couple of
years from now, how do you think we will be
using what we've learned about the brain and about how
deep learning works to change how education functions? How would you change a class? GEOFFREY HINTON: In
a couple of years, I'm not sure we'll learn much. I think it's going to
change education. It's just going to take longer. But if you look
at it, Assistants are getting pretty smart now. And once Assistants can really
understand conversations, Assistants can have
conversations with kids and educate them. So already, I think most of
the new knowledge I acquire comes from me
thinking, I wonder, and typing something
to Google and Google tells me, if I could
just have a conversation, I'd acquire knowledge
even better. NICHOLAS THOMPSON:
And so theoretically, as we understand
the brain better, and as we set our children
up in front of Assistants. Mine right now almost certainly
based on the time in New York is yelling at Alexa to play
something on Spotify, probably "Baby Shark"-- you will program the Assistants
to have better conversations with the children based on
how we know they'll learn? GEOFFREY HINTON: Yeah, I haven't
really thought much about this. It's not what I do. But it seems quite
plausible to me. NICHOLAS THOMPSON: Will
we be able to understand how dreams work, one
of the great mysteries? GEOFFREY HINTON: Yes, I'm
really interested in dreams. NICHOLAS THOMPSON: Good,
well, let's talk about that, GEOFFREY HINTON:
I'm so interested. I have at least four
different theories of dreams. NICHOLAS THOMPSON:
Let's hear them all-- 1, 2, 3, 4. GEOFFREY HINTON: So a long time
ago, there were things called-- OK, a long time ago there
was Hopfield networks. And they would learn
memories as local attractors. And Hopfield discovered
that if you try and put too many memories in,
they get confused. They'll take two
local attractors and merge them into an attractor
sort of halfway in between. Then Francis Crick and Graeme
Mitchison came along and said, we can get rid of these false
minima by doing unlearning. So we turn off the input. We put the neural network
into a random state. We let it settle down,
and we say that's bad. Change the connections so you
don't settle to that state. And if you do a bit
of that, it will be able to store more memories. And then Terry Sejnowski and
I came along and said, look, if we have not just the
neurons where you're storing the memories, but
lots of other neurons, too, can we find an
algorithm that will use all these other neurons
to help you store memories? And it turned out in the end,
we came up with the Boltzmann machine learning algorithm. And the Boltzmann machine
learning algorithm had a very interesting property
which is I show you data. That is, I fixed the states
of the observable units. And it sort of rattles
around the other units until it's got a
fairly happy state. And once it's done
that, it increases the strength of all
the connections based on if two units
are both active, it increases connection strength. That's a kind
of Hebbian learning. But if you just do that,
the connection strengths just get bigger and bigger. You also have to have a
phase where you cut it off from the input. You let it rattle
around to settle into a state it's happy with. So now it's having a fantasy. And once it's had
the fantasy you say, take all pairs of
neurons that are active and decrease the strength
of the connection. So I'm explaining the algorithm
to you just as a procedure. But actually that algorithm is
the result of doing some math and saying, how should you
change these connection strengths so that this
neural network with all these hidden units finds
the data unsurprising? And it has to have
this other phase. It has to have this what we
call the negative phase, when it's running with no input. And it's canceling out-- it's unlearning whatever
state it settles into. Now what Crick pointed
out about dreams is that, we know that you dream
for many hours every night. And if I wake you
up at random, you can tell me what you were just
dreaming about because it's in your short term memory. So we know you dream
for many hours. But in the morning,
you wake up, you can remember the
last dream, but you can't remember all the others,
which is lucky because you might mistake them for reality. So why is it that we don't
remember our dreams at all? And Crick's view was it's
the whole point of dreaming is to unlearn those things
so you put the learning rule in reverse. And Terry Sejnowski and I
showed that actually that is a maximum [INAUDIBLE]
learning procedure for Boltzmann machines. So that's one
theory of dreaming.
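A toy sketch of that two-phase Hebbian rule; in the real Boltzmann machine algorithm these correlations are averaged over many settled states, and the names and learning rate here are illustrative:

    import numpy as np

    def boltzmann_update(weights, wake_states, dream_states, learning_rate=0.01):
        # Positive (wake) phase: strengthen connections between units that
        # are active together while the data is clamped on.
        positive = np.outer(wake_states, wake_states)
        # Negative (dream) phase: weaken connections between units that are
        # active together while the network runs with no input.
        negative = np.outer(dream_states, dream_states)
        return weights + learning_rate * (positive - negative)

NICHOLAS THOMPSON: You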
showed that theoretically? GEOFFREY HINTON: Yeah,
we showed theoretically that's the right thing
to do if you want to change the weights so that
your big neural network finds the observed data
less surprising. NICHOLAS THOMPSON: And I want
to go to your other theories, but before we lose
this thread, you've proved that it's efficient. Have you actually set any of
your deep learning algorithms to essentially dream? Right, study this image data
set for a period of time, resort, study again,
resort versus a machine that's running continuously? GEOFFREY HINTON: So yes, we had
machine learning algorithms. Some of the first
algorithms that could learn what to
do with hidden units were Boltzmann machines. They were very inefficient. But then later on, I found a
way of making approximations to them that was efficient. And those were
actually the trigger for getting deep
learning going again. Those were the things that
learned one layer feature detector at a time. And it was efficient form of
restricted Boltzmann machine. And so it was doing
this kind of unlearning. But rather than going
to sleep, that one would just fantasize
for a little bit after each data point. NICHOLAS THOMPSON: So Androids
do dream of electric sheep. So let's go to
theories 2, 3, and 4. GEOFFREY HINTON: OK,
theory 2 was called the wake-sleep algorithm. And you want to learn
a generative model. So you have the idea
that you're going to have a model that can generate data. It has layers of
features detectors. And it activates the high level
ones and the low level ones and so on, until it activates
pixels, and that's an image. You also want to
learn the other way. You want to learn
to recognize data. And so you're going to have an
algorithm that has two phases. In the wake phase,
data comes in. It tries to recognize it. And instead of learning
the connections it is using for
recognition, it's learning the
generative connections. So data comes in. I activate the hidden
units, and then I learn to make those hidden
units be good at reconstructing that data. So it's learning to
reconstruct at every layer. But the question is, how do you
learn the forward connection? So the idea is, if you knew
the forward connections, you could learn the backward
connections because you could learn to reconstruct. NICHOLAS THOMPSON: Yeah. GEOFFREY HINTON: Now
it also turns out that if you knew the
backward connections, you could learn the
forward connections because what you could
do is start at the top and just generate some data. And because you
generated the data, you'd know the states of
all the hidden layers. And so you could learn
the forward connections to recover those states. So that would be
the sleep phase. When you turn off the input,
you just generate data and then you try and
reconstruct the hidden units that generated the data. And so if you know the
top down connections, you'd learn the bottom up ones. If you know the
bottom up ones, you could learn the top down ones. And so what's going
to happen if you start with random connections and
try doing both-- alternate both kinds of learning and it works. Now to make it
work well, you have to do all sorts of
variations of it. But it works.
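A high-level sketch of one wake-sleep step; every method name below is an assumed placeholder for the two kinds of learning he describes, not an implementation from the talk:

    def wake_sleep_step(data, recognition_net, generative_net):
        # Wake phase: recognize real data with the bottom-up connections,
        # then train the top-down (generative) connections to reconstruct
        # each layer from the layer above.
        hidden_states = recognition_net.infer(data)
        generative_net.learn_to_reconstruct(hidden_states, data)
        # Sleep phase: fantasize data with the top-down connections, then
        # train the bottom-up (recognition) connections to recover the
        # hidden states that produced the fantasy.
        fantasy_states, fantasy_data = generative_net.generate()
        recognition_net.learn_to_recover(fantasy_data, fantasy_states)

NICHOLAS THOMPSON: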
All right, that is-- do you want to go through
the other two theories? We only have eight minutes left. I think we should probably jump
through some other questions. We'll deal with-- GEOFFREY HINTON: If you
give me another hour, I could do the
other two theories. [LAUGHTER] NICHOLAS THOMPSON: All
right, well, Google I/O 2020. So let's talk about
what comes next. So where is your
research headed? What problem are you
trying to solve now? GEOFFREY HINTON: The main
thing I'm trying to solve, which I've been doing for
a number of years now-- actually, I'm reminded
of a soccer commentator. You may notice
soccer commentators, they always say things like
they're doing very well, but they always go
wrong on the last pass. And they never seem to sort
of notice there's something funny about that. It's a bit circular. So I'm working--
eventually, you're going to end up working on
something you don't finish. And I think I may
well be working on the thing I never finish. But it's called
capsules, and it's a theory of how you do visual
perception using reconstruction and also how you
root information to the right places. And the two main
motivating factors were in standard neural
nets, the information-- the activity in the layer just
automatically goes somewhere. You don't make decisions
about where to send it. The idea of capsules
was to make decisions about where to send information. Now since I started
working on capsules, some other very smart
people at Google invented transformers, which
are doing the same thing. They're deciding where
to route information, and that's a big win. The other thing that motivated
capsules was coordinate frames. So when humans do
visual, they're always using coordinate frames. And if they impose the wrong
coordinate frame on an object, they don't even
recognize the object. So I'll give you a little task. Imagine a tetrahedron. It's got a triangular base
and three triangular faces, all equilateral triangles. Easy to imagine, right? Now imagine slicing
it with a plane. So you get a square
cross-section. That's not so easy, right? Every time you slice
it, you get a triangle. It's not obvious how
you get a square. It's not at all obvious. OK, but I give you the same
shape described differently. I need your pen. Imagine, the shape you
get, if you take a pen like that, another pen that
right angles like this, and you connect all
points on this pen to all points on this pen. That's a solid tetrahedron. OK, you're seeing it relative
to a different coordinate frame where the edges of
the tetrahedron-- these two line up with
the coordinate frame. And for this, if you think
of the tetrahedron that way, it's pretty obvious
that at the top, you've got a long
rectangle this way. At the bottom, you get a
long rectangle that way. And there's [INAUDIBLE]
that you've got to get a square in the middle. So it's pretty obvious how you
can slice it to get a square. But that's only obviously
if you think of it with that coordinate frame. So it's obvious that for
humans, coordinate frames are very important for perception. And they're not at all
important for conv nets. For conv nets, if I
show you a tilted square and an upright diamond, which
is actually the same thing, they look the same
to a conv net. It doesn't have two
alternative ways of describing the same thing. NICHOLAS THOMPSON: But how
is adding coordinate frames to your model not
the same as the error you were making
in the '90s where you were trying to put rules
into the system as opposed to letting the system
be unsupervised? GEOFFREY HINTON: It
is exactly that error. And because I am so adamant
that that's a terrible error, I'm allowed to do
a tiny bit of it. It's sort of like Nixon
negotiating with China. [LAUGHTER] Actually that puts
me in a bad role. Anyway, so if you
look at conv nets, they're just neural
nets where you wired in a tiny bit of knowledge. You add in the knowledge that
if a feature detector is good here, it's good over there. And people would love to
wire in just a little bit more knowledge about
scale and orientation. But if you do it
in the obvious way of having a 4D grid
instead of a 2D grid, the whole thing blows up on you. But you can get in that
knowledge about what viewpoint does to an image by using
coordinate frames the same way they do them in graphics. So now you have a
representation in one layer. When you try and reconstruct the
parts of an object in the layer below, when you do
that reconstruction, you can take the coordinate
frame of the whole object and multiply it by the
part whole relationship to get the coordinate
frame of the part. And you can wire that
into the network. You can wire into the
network the ability to do those coordinate
transformations. And that should make it
generalize much, much better. It should mean the networks
just find viewpoint very easy to deal with. Current neural networks
find viewpoint other than translation very
hard to deal with.
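A small sketch of that coordinate-frame multiplication, done the way graphics does it; the 4x4 homogeneous transforms are illustrative assumptions:

    import numpy as np

    # The pose of a part is the pose of the whole object multiplied by the
    # fixed part-whole relationship.
    def pose_of_part(pose_of_whole, part_whole_relationship):
        # pose_of_whole: maps the object's coordinate frame to the viewer's.
        # part_whole_relationship: maps the part's frame to the object's frame.
        return pose_of_whole @ part_whole_relationship

NICHOLAS THOMPSON: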
So your current task is specific to
visual recognition, or it is a more general way
of improving or coming up with the rule set for
coordinate frames? GEOFFREY HINTON: OK, it could
be used for other things. But I'm really interested in
the use for visual recognition. NICHOLAS THOMPSON:
OK, last question. I was listening to a podcast
you gave the other day. And in it, you said that the
people whose ideas you value most are the young graduate
students who come into your lab because they aren't locked
into the old perceptions. They have fresh ideas, and
yet they also know a lot. Is there anything that you, sort
of looking outside yourself, you think you might be locked
into that a new graduate student or somebody in this
room who came to work with you would shake up? GEOFFREY HINTON: Yeah,
everything I said. NICHOLAS THOMPSON:
Everything you said. [LAUGHTER] Take out those coordinate units. Work on feature three,
work on feature four. I wanted to ask you
a separate question. So deep learning used
to be a distinct thing, and then it became sort of
synonymous with the phrase AI. And then AI is now
a marketing term that basically means using a
machine in any way whatsoever. How do you feel
about the terminology as the man who
helped create this? GEOFFREY HINTON: Well,
I was much happier when there was AI, which
meant you're logic-inspired and you do manipulations
on symbol strings. And there was neural
nets, which means you want to do learning
in a neural network. And they were completely
different enterprises that really sort of
didn't get along too well and fought for money. That's how I grew up. And now I see sort
of people who spent years saying neural
networks are nonsense, saying I'm an AI professor. So I need money. And it's annoying. NICHOLAS THOMPSON: So
your field succeeded kind of ate or subsumed
the other field, which then gave them an advantage
in asking for money, which is frustrating? GEOFFREY HINTON: Yeah,
now it's not entirely fair because a lot of them
have actually converted. NICHOLAS THOMPSON:
Right, so wonderful. Well, I've got time
for one more question. So in that same interview,
you were talking about AI. And you said, think of it
like a backhoe, a backhoe that can build a hole, or if
not constructed properly, can wipe you out. And the key is when you
work on your backhoe to design it in such a way
that it's best to build a hole and not to clock
you in the head. As you think about
your work, what are the choices
you make like that? GEOFFREY HINTON: I guess
I would never deliberately work on making weapons. I mean, you could
design a backhoe that was very good at
knocking people's heads off. And I think that would be
a bad use of a backhoe, and I wouldn't work on it. NICHOLAS THOMPSON: All right,
well, Geoffrey Hinton-- extraordinary interview. All kinds of information--
will be back next year to talk about dreams
theories three and four. That was so much fun. Thank you. [MUSIC PLAYING]