- Okay, I think we're
ready to get started here. My name is Ken Goldberg. I'm a professor here at UC Berkeley, and I've been studying artificial
intelligence and robotics for over 40 years, and I've always been a skeptic
about artificial general intelligence until recently. Now, has anyone out there tried or heard about ChatGPT? Okay. I must say, that was a very earth-shattering experience for me. And I have to say, I'm really recalibrating a lot of my positions, especially about the ability of ChatGPT to be creative, because I never thought an AI system would tell a good joke or write a great story. And it's passed all those by far. Now, I'm very excited about this series because I think this is on so many people's minds. It really is gonna change our lives. There's no doubt about it. And I'm happy to report that ChatGPT, the specific program, has
some Berkeley DNA in it. The primary architect of
ChatGPT is a Berkeley grad, so you're gonna be hearing from him. He'll be here on the 19th of April at 5:00 PM. It's sold out, but you can watch online. So that should be a very interesting talk. And we have some of the world's experts in AI here at Berkeley, and we have one of them, probably one of the most well known, here today. Now, this is gonna be part of a series of these lectures; CITRIS has put these together with the Berkeley AI Research Lab. The next one will be next Wednesday at lunchtime, same as this, and then every lunchtime in April we're gonna present a different speaker. So next week will be Alison Gopnik, then Mike Jordan, then Pam Samuelson. And then we actually have Rodney Brooks coming as well, and John Schulman is sandwiched right in the middle. So please look at the CITRIS webpage to get updates on all of that. And Berkeley has a nice link on this, at berkeley.edu/ai. So please save your questions for the end; we'll be leaving time for some questions before we end at one, and we'll be happy to take them then. Our speaker today is a professor in the Department of Computer Science at UC Berkeley. He's a leading researcher
in artificial intelligence, and he wrote the book. He's the author, with Peter Norvig, of Artificial Intelligence: A Modern Approach, and that textbook is being used in 1,400 universities around the world. It is the single most popular textbook on artificial intelligence. He has a number of other accomplishments. He has held the Blaise Pascal Chair in Paris. He's on the World Economic Forum's Global Council on Artificial Intelligence. He delivered the Reith Lectures on BBC Radio, and they're available online if you wanna check them out: amazing lectures to most of the world on artificial intelligence and his views. And he is an Officer of the Most Excellent Order of the British Empire. So as I understand it, that's the closest you get to being a knight if you live in the United States. So he's sort of royalty. And he also signed an open letter, as you may have seen, about artificial intelligence that was widely circulated. Now, just a few last comments here. He's been referred to as
the godfather of artificial intelligence, but don't worry, he's much more approachable
than Marlon Brando. And one of his most significant
contributions to the field is his work on inverse reinforcement learning, a technique that allows machines to learn from human behavior by inferring the underlying goals, preferences, and rewards. And this is a groundbreaking approach that's changing the way we think about machine learning and human-robot interaction. I must admit these were
written by ChatGPT. Okay, so the last one, when he is not busy
revolutionizing the field of AI, he's also a lover of jazz
music, a skilled wood worker, and a proud owner of a collection
of antique typewriters. Please welcome Stuart Russell. (audience clapping) - Thanks very much, Ken. And the last part is, as
you can probably guess, complete hallucination. Okay. So we're gonna have, I'm sure we're gonna have
the usual AV problems. Good. All right. So I want to begin by getting
everyone on the same page about what we're talking about. There's a lot of use of the
word AI bandied about in the media and in corporate pitch decks; pretty much everyone is doing AI, if you believe what the pitch decks say. But of course, AI has meant, from the beginning, even going back to the 1840s when Charles Babbage and Ada Lovelace were talking about it, making machines intelligent. But if you're actually gonna do it, you have to take the next step. Okay, well, what does that mean? Sometime in the
when Charles Babbage and Ada Lovelace were talking about it, making their machines intelligent. But if you're actually gonna do it, you have to take the next step. Okay, well, what does that mean? And sometime in the
1940s, this idea gelled, which was really borrowed
from economics and philosophy, that machines are intelligent
to the extent that their actions can be expected to
achieve their objectives. And this is so prevalent that
I've called it the standard model. It's also the standard model
of control theory, of operations research, and to some extent of economics: we define objectives, we create optimizing machinery for those objectives, and then we set off the machinery and off it goes. So I'll be arguing later, actually, that that model is incorrect
and it's actually the source of a lot of our problems. But the goal of the field
since the beginning, and you can find this very
explicitly in the writings and speeches of some of
the founders of the field, is to create general purpose AI, not specific tools like equation
solvers or chess programs, but general purpose AI, meaning that it can quickly
learn to behave well in any task environment where the human
intellect is relevant. We don't have a formal
definition of generality, but I think we will
know it when we see it. And maybe Ken already saw it. So let me begin by saying that
there's a reason we're doing this, right? Not just because it's cool or
fun or profoundly interesting, but it could be extraordinarily
beneficial for our civilization because, you know, our civilization is made
out of intelligence. That's the material
from which we build it. And if we have access to a lot more, then we might be able to have
a much better civilization. So by definition, from what I
said about general purpose AI, if we had access to it, we could use it to deliver at
least what we already know how to deliver with the
intelligence that we have, which is a good standard of
living for hundreds of millions of people. But we haven't yet extended
that to everybody on earth. But general purpose AI can
do it at much greater scale, at much less cost. So we could, I think, deliver a respectable standard of living to everyone on Earth. And if you calculate the
net present value of that: that will be about a tenfold increase in GDP, which tells you how far we have to go, and it will be about 13.5 quadrillion dollars of net present value. So that gives you a lower bound on the cash value of creating general purpose AI.
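As a back-of-the-envelope check on where a number like 13.5 quadrillion dollars can come from, here is one way the arithmetic works out. The specific annual increment and discount rate below are illustrative assumptions, not figures given in the talk:

```python
# Back-of-the-envelope NPV of a permanent boost to world GDP,
# treated as a perpetuity: NPV = annual_gain / discount_rate.
# Both numbers below are illustrative assumptions.

def npv_perpetuity(annual_gain: float, discount_rate: float) -> float:
    """Net present value of a constant annual cash flow forever."""
    return annual_gain / discount_rate

# Suppose general purpose AI eventually adds ~$675 trillion/year
# to world output, discounted at 5%/year:
annual_gain = 675e12      # dollars per year (assumed)
discount_rate = 0.05      # per year (assumed)

print(npv_perpetuity(annual_gain, discount_rate))  # 1.35e16, i.e. $13.5 quadrillion
```

Other choices of increment and discount rate give numbers of the same order, which is the point: even conservative assumptions put the prize in the quadrillions.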
It gives you a sense of what's going on right now, right? The various entities that are competing, whether you think of China and the US or Microsoft and Google: that's the size of the prize. So it makes all the investments, all the numbers that we've
ever talked about are just minuscule in comparison. And it also creates a certain
amount of momentum, which makes ideas of, okay, well, we could just stop doing this, probably difficult to implement. But of course we could
have more than that, right? We could have much better healthcare. We could solve a lot of
scientific problems that are too hard for us right now. We could deliver amazingly
good individualized tutoring based education to every child
on earth, so we could really, perhaps, have a better civilization. Actually, I just gave a talk at the OECD where I had politics at the end of that list, and people
laughed, so I took it off again. (audience laughing) And towards this goal, we
are making a lot of progress. I mean, the self-driving car, you know, is now commonplace on the
streets of San Francisco. And I remember John McCarthy, one of the founders of the
field saying that he would be happy when a car could take
him to San Francisco Airport by itself. And that's now something that is perfectly feasible. This is some work from my group: one of my PhD students, Nimar Arora, developed the monitoring system for the Nuclear Test-Ban Treaty. This was developed not using deep learning; it's actually a large-scale Bayesian inference engine, and it's running 24/7 at the United Nations in Vienna. This shows the accurate detection
of a nuclear explosion in North Korea a few years ago. And so there's the entrance
to the tunnel down there that was later found in satellite images. This is our estimate in
real time of where the event happened, about 300
meters from the tunnel. And then we have the, the amazing things that
generative AI does. So here's one of my favorite
examples, using DALL-E: you ask it for teddy bears mixing sparkling chemicals as mad scientists in a steampunk style. And this is what it produces
in a few seconds, right? So this is pretty amazing, right? Particularly given that
for us to do this right, we would need a rendering engine, which is hundreds of thousands
of lines of graphics code. It doesn't have a rendering
engine, doesn't have any graphics code, and yet it's
able to produce this image. It's not perfect. The teddy bear seems to have a foot where he should have a hand and so on, but it's pretty amazing. And then of course we had what is often described as China's Sputnik moment: the defeat of human world champions at Go, which was predicted, after the defeat of Garry Kasparov, to take another hundred years, because Go was thought to be so much more complex than chess. I'll tell you a little
bit more about this later. Okay, so how are we doing this, right? How are we filling in this question mark between the sensory input of an AI system and the behavior that it generates? And there have been various
approaches over time. The one that's popular right now is this, right? You simply fill that box with an enormous circuit. That circuit has billions, or now trillions, of tunable parameters, and you optimize those parameters using stochastic gradient descent in order to improve the performance on the objective that you've specified.
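To make "tunable parameters plus stochastic gradient descent" concrete, here is a toy sketch with a single parameter standing in for the billions in a real network. The model and objective are illustrative assumptions:

```python
import random

# Toy version of the current recipe: a parametric model plus
# stochastic gradient descent on a specified objective.
# Here the "circuit" is a single parameter w, and the objective is
# squared error against noisy samples of the target y = 3x.

def loss_grad(w: float, x: float, y: float) -> float:
    """Gradient of (w*x - y)**2 with respect to w."""
    return 2 * (w * x - y) * x

random.seed(0)
w = 0.0
lr = 0.01
for _ in range(5000):
    x = random.uniform(-1, 1)
    y = 3 * x + random.gauss(0, 0.1)   # noisy observation
    w -= lr * loss_grad(w, x, y)       # one SGD step

print(round(w, 1))  # settles near the true slope of 3
```

Real systems differ only in scale: trillions of parameters instead of one, and a learned objective signal instead of a known target, but the update rule is the same.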
A much older approach, popular in the 1950s, put FORTRAN programs in the box instead of a circuit. And rather than just stochastic gradient descent, you also did crossover: in other words, you took pieces of those different FORTRAN programs and mixed and matched them in a way reminiscent of evolution. So that didn't work in 1958, but you should keep in mind
that they were working with computers that were at least a quadrillion
times slower than the ones we have now. So we are using, in a
typical machine learning run, now a quadrillion times more computation than they typically used then, even in a multi-day machine learning run. Okay? For most of the history of the field, we pursued what we might
think of as a knowledge-based approach: the idea that intelligent systems have to know things, right? That seems so fundamental and so obvious, and we built technologies around that principle. Initially, technologies based on logic, where the knowledge was represented as formal sentences in a mathematical logic. Later on, we embraced uncertainty and developed technologies based on probability theory, with the advent of Bayesian networks. And then in the 2000s, we in some sense produced a unification of those two approaches, with probabilistic logics or probabilistic programs. And I'll talk some more about that later on. And the advantage of building knowledge-based systems is that each element of a knowledge-based system has its own meaning and can be verified and checked against reality separately. And that provides an enormous combinatorial advantage in the rate of learning and in the ability
of a system to fix itself when something isn't working. So let me give you an
example of what I mean by a knowledge-based approach, right? So here are a pair of black holes, about 1.2 billion light years away. So a good distance
across the known universe, not all the way across,
but a good distance across. And well, 1.2 billion years ago, they started rotating around each other. And as they did that, they lost energy and
became closer and closer, and they started radiating
gravitational waves. At the peak level of radiation, they were emitting 50 times more energy than all of the stars in the universe. So, quite a cataclysmic event. 1.2 billion years later, just by coincidence, we had completed LIGO, the Laser Interferometer Gravitational-Wave Observatory, which is designed to detect
exactly this type of event: the gravitational waves emitted by this kind of collision of black holes. So this is several kilometers in length. It contains enormous amounts of advanced technology: lasers, lots of computers, incredible materials. And it measures the distortion of space caused by these gravitational waves. And how big is that distortion of space? Well, to put it on a relatable scale, it's the distance to Alpha Centauri relative to the width of a human hair. So if you change the
distance to Alpha Centauri, which is four and a half light
years by the width of a human hair, this system would be
able to detect that change. So that's how sensitive it is. And they predicted what that
distortion would look like if there was a black hole
collision like this, and it looked exactly the way
they predicted. They could even measure the masses of the two black holes, and so on. So this was the result of knowledge that we as the human race accumulated, the knowledge of physics, of materials, et cetera, et
cetera, over centuries. And so if we're going to
create general purpose AI, it needs to be able to
do at least this, right? Because humans did this, it's hard to see how we would
train a deep learning system. Where would we get all the
billions of examples of different gravitational wave detectors
to train it in order to figure this idea out, right? It starts to just not even make sense when you don't think about things in terms of systems that know things. So I'm gonna give you a few reasons to be a little bit doubtful
about this orthodoxy that has dominated AI for the last decade, which is that all we need is
bigger and bigger circuits, and it's going to solve everything. So first of all, just a simple
technical reason, right? Circuits are not very expressive compared in particular to
programs or formal logics. And that causes an enormous
blowup in the size of the representation of some perfectly
straightforward concept. So for example, the rules of Go just talk
about pieces being put on a board and surrounding each other, being connected together
in groups by vertical and horizontal connections. These are very easy things to
write in English or in logic: about half a page is enough. But to try to write those down
in circuits requires millions of pages of circuit. And that means that it takes
millions or even billions of examples to learn an
approximation to the rules, which it can't actually learn correctly. And I'll show you
examples of that later on. And you also get very
fragile generalizations, which we see in adversarial
examples in image recognition where tiny invisible changes
in images cause the object recognizers to completely change their mind. Take what was a school bus, change some pixels invisibly on the image, and now it says it's an ostrich. In fact, you can change any
object into an ostrich by making invisible changes to the image. Okay, so these should worry you, right? Rather than just saying, oh, well, that's peculiar, and carrying on, they should make you say, okay, maybe there's something we're not understanding here. Maybe our confidence is misplaced in the so-called success of these technologies. And some students at MIT
working with Dave Gifford, did some research on the standard high-performance
convolutional neural network systems that do object recognition
on the ImageNet database, which is the standard benchmark. And so this parachute here looks like a parachute to us, but the system is recognizing it by looking at those pixels, right? As long as those pixels are blue, this is a parachute. So that should worry you, right? As long as there's a certain type of grass on the right-hand edge, then it's a golden retriever, right? If you are basing your trillion-dollar company on the idea that these systems are going to work in the real world, then these results should worry you. Well, you could say, okay, we know that these Go
programs are truly amazing. They've beaten the human
world champion, right? What more proof could you want than that? Okay, so here's what's happening with Go. So in 2017, the successor system of AlphaGo defeated the current world champion, whose name is Ke Jie (I'm probably mispronouncing that), who's Chinese. And that was really China's Sputnik moment. The best human player's rating on the Elo scale is 3,800; the best Go systems now are operating around 5,200. So they are stratospherically better than human beings, right? They're as much better than the human world champion as the human world champion is better than me, right? And I'm useless. So, okay, so in our research team, we have a Go player, his name is Kellin Pelrine, and his rating is around 2300. So way better than I am, but way worse than the human world champion. He's an amateur and, you know, completely hopeless relative to the best program. So the best program right now is called JBXKata005, which is way better than AlphaGo, and this is the current number-one program on the internet Go server. And Kellin decided to give it a nine-stone handicap, right? Which is the maximum handicap you can give to another player, right? So it's the kind of handicap you give to a five-year-old when you're teaching them to play, right? To give them some chance of having a decent game. Okay, so the computer is black, starts out with a nine-stone advantage. And here we go. So I'm just gonna play
through the game very quickly. So remember the human is
white, computer is black, and watch what happens in
the bottom right corner. So capturing in Go consists
basically of surrounding your opponent's pieces so that they
have no breathing spaces left at all. So white makes a little group, and then black is trying to
capture it so it's surrounding that little group. And then white is surrounding
the black group, right? So we're kind of making
a circular sandwich here. And black seems to pay
no attention at all, doesn't seem to understand
that its pieces are in danger. And there are many, many
opportunities to prevent this. It's actually a completely
hopeless strategy for white. If you play this against a
human, they immediately realize what's happening. And now you see that black
has lost all of its pieces, right? This is a program that is far, far better than any human go player. And yet it's playing ridiculously
badly and losing a game even with a nine stone
handicap against an amateur. So if that doesn't worry you, then I don't know what to say, okay? So I think we are constantly
overestimating the capabilities of the AI systems that we build. And we actually designed this approach. This is a human playing this game. But the circular sandwich idea
came from the thought that the Go program had not learned correctly what it means to be a connected group and what it means to be captured. In other words, it doesn't understand the rules, because it can't represent them correctly in a circuit. And so we just looked for
configurations that it doesn't recognize as captures. And we found one fairly quickly. We originally designed some
by hand that involved sort of interlaced groups of white
and black pieces like this, but we couldn't make it happen in a game. And then we did some searching
and found this configuration. And you can reliably
beat all the programs. So not just JBXKata005, but actually all the leading
programs can't do this. So something is wrong with
their ability to learn and generalize. So I would argue we still
have a long way to go towards general purpose AI. Despite the impression one has using the large language models, I don't believe they really understand language, and I don't believe they even understand that language is about the world in a real sense. And I think the other big
missing piece is probably the third bullet: the ability to manage our
activity at multiple scales of abstraction, which lets us handle this world. The world is so big, so complicated, right? If you take AlphaGo right, which has an amazing ability
to look ahead in the game, it's looking ahead 50 or
60 moves into the future. But if you take that idea and
you put it on a robot that has to send commands to its
motors every millisecond, right? Then you're only getting 50
milliseconds into the future, this doesn't get you anywhere, right? The only way we manage is by
operating at multiple scales of abstraction. And we do that seamlessly and
we construct those different hierarchical levels of
abstraction during our lives, and we also inherit them from
our culture and civilization. We don't know how to get
AI systems to do that. So we have a long way to go and it's very hard to predict
when this is gonna happen. Now, this has happened before, right? The last time we invented a civilization-ending technology was with atomic energy. And the idea of atomic energy goes back to special relativity in 1905: this idea that there's a mass defect, some missing mass between, you know, helium and hydrogen, right? We know what the pieces are, we know how they fit together, but there's some missing mass, which represents the binding energy. So we knew that there were
enormous quantities of energy available if you could
transmute one type of atom into another. But physicists or most
leading physicists at the time believed that that was impossible. So Rutherford, the leading
nuclear physicist of that age, gave a talk in Leicester on September 11th, and he was asked this question: do you think in 25 or 30 years' time we might be able to release the energy of the atom? And he says, no, this is moonshine, right? And he said the same thing in many different fora and in different ways. But Leo Szilard read a report of that speech in The Times (he was staying in London), went for a walk, and invented the neutron-induced nuclear chain reaction the next morning. So when I say unpredictable, right? I think it is pretty unpredictable when these kinds of advances might occur in AI. And this is the title of a
paper that was published a few weeks ago by Microsoft, who have been evaluating GPT-4. So maybe they're claiming that this is that moment, right? In fact, they are claiming that this is that moment: they claim to detect the sparks of artificial general intelligence in GPT-4, and they confidently claim that successive versions will quickly move towards a real AGI. So I'm not gonna say whether
I think that's right or wrong, but I'll come back to it later. So in the last part of the talk, I want to talk about: what happens if we succeed? What happens if Microsoft is right? And this is Alan Turing's version of that. He was asked that question
in 1951, and he said: it seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers. At some stage, therefore, we should have to expect the machines to take control. So why is he saying that? Well, I think it's really
pretty straightforward, right? That intelligence really means
the power to shape the world in your interests. And if you create systems that
are more intelligent than humans, either
individually or collectively, then you're creating entities
that are more powerful than us. So how do we retain power over
entities more powerful than us forever, right? That's the question. And I think Turing is answering it. He's saying you can't. So if Turing is right,
then maybe we should stop. But that 13.5 quadrillion
dollars is saying no, it's gonna be very hard to stop, right? So I think we could try to
understand whether we can answer this question differently
from Turing and say, actually we can, and I'm
gonna argue that we can do it, but only if we abandon the
standard model of AI and do things differently. So let me give you an example
of where things are already going wrong, and I'm using
this word misalignment. That's the clue, right? Why is it going wrong? Because the objectives that
we put into the systems are misaligned with our objectives, right? So now we are creating
basically a chess match between machines pursuing a misspecified objective and what we actually want the future to be like. So with social media algorithms, right? These are recommender systems. Think of YouTube, for example, that loads up the next video. You watch one video and
another one comes along and, and loads up and starts playing. So the algorithms learn
how to choose that content, whether it's content for your
newsfeed or whatever it is on Twitter, and they do that in order
to optimize some objective. So click through is one the
sort of total number of clicks that we're gonna get
from this user over time. It could be engagement, the amount of time the user
engages with the platform, and there are other metrics being used. And you might think, okay,
to maximize click through, then the system has to
learn what people want, which is sort of good, right? Sounds helpful. But we quickly learned that that's not what's going on, right? It's not what people want, it's what people will click on, which means that the systems learn to promote clickbait, because they get more clicks out of it, and so they favor that kind of content. And we very quickly
learned about this idea of the filter bubble, where you start to only see
things that you are already comfortable with. And so you become narrower
and narrower in your understanding of the world. But that's not the optimal
solution either, right? If you know anything about
reinforcement learning, you know that reinforcement
learning systems learn to produce sequences of actions
that generate the maximum sum of rewards over time. And those sequences of actions
change the environment in which the agent operates. So what is the environment that the recommender system is operating in? It's your brain, right? That's the environment. And so it learns to find a sequence of actions that changes your brain so that in the long run you produce the largest number of clicks. This is just a theorem, right? About how those learning systems will behave.
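That dynamic can be sketched in a toy simulation. Everything here (the click rates, the nudge size, the horizon) is an illustrative assumption; the point is only that an agent maximizing long-run clicks prefers to change the user's state:

```python
# Toy model of a recommender whose "environment" is the user's state.
# All numbers are illustrative assumptions, not data from any platform.
# State s in [0, 1]: how far the user has drifted toward predictable,
# extreme content. Two actions per step:
#   "neutral": expected click rate 0.5, leaves s unchanged
#   "extreme": expected click rate 0.3 + 0.6*s, nudges s upward

def run(policy, steps=200, s=0.2):
    clicks = 0.0
    for _ in range(steps):
        if policy(s) == "neutral":
            clicks += 0.5
        else:  # "extreme"
            clicks += 0.3 + 0.6 * s
            s = min(1.0, s + 0.05)   # the nudge: the action changes the user
    return clicks, s

def myopic(s):
    """Pick whatever clicks best right now."""
    return "neutral" if 0.5 > 0.3 + 0.6 * s else "extreme"

def nudger(s):
    """Sacrifice early clicks to reshape the user."""
    return "extreme"

print(run(myopic))   # (100.0, 0.2): user left unchanged
print(run(nudger))   # higher total clicks; user pushed to s = 1.0
```

The myopic policy never shows the extreme content because it pays less right now; the long-horizon policy accepts lower early click rates in exchange for a more predictable, higher-clicking user later, which is exactly the modification being described.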
So they simply modify people to be more predictable, via sequences of thousands of nudges over time. And at least anecdotally, people
have reported that this can happen quite quickly. That your nice Midwestern
middle of the road granny, you know, in a few months can be reading
the Daily Stormer and posting on neo-fascist websites. And these algorithms are really stupid, right? They don't know that people exist. They don't know that we have
minds or political opinions. They don't understand the
content of what they're sending to us, right? If we made the AI systems better, the outcome would be much worse, right? And in fact, this is a theorem
that one of my students (it's off the bottom of the slide, sorry), Dylan Hadfield-Menell, and Simon Zhuang proved: a very simple theorem saying that when the objectives are misaligned, optimizing the wrong objective makes the situation worse with respect to the true objective, under fairly mild and general conditions.
So we need to get rid of the standard model. We need a different model, right? This is the standard model: machines are intelligent to
the extent their actions can be expected to achieve their objectives. Instead, we need the machines
to be beneficial to us, right? We don't want this sort of
pure intelligence that once it has the objective is off
doing its thing, right? We want the systems to be beneficial, meaning that their actions
can be expected to achieve our objectives. And how do we do that? Well, it's not as impossible
as it sounds, right? So I'm gonna do this in two ways. One is in sort of an Asimovian
style saying here are some principles, right? So the first principle is that
the robot's only objective is to satisfy human preferences
and preferences here is not just what kind of pizza do you like, but which future of the
universe do you prefer, right? What is your ranking over
probability distributions over, you know, complete
futures of the universe. So it's a very complex and
abstract mathematical object, but that's what the objective
of the robot should be. If you prefer a less technical notion, think of it as to further human interests. But the key point is number two, that the machine knows that
it doesn't know what those preferences are, right? That you do not build in a
fixed known objective upfront. Instead, the machine knows that it doesn't know what the objective is, but it still needs a way of
grounding its choices over the long run. And the evidence about human
preferences, we will say, flows from human behavior. So I'll talk a little bit more about that. So these three principles
actually can be turned into a mathematical framework. And it's similar for the
economists in the room, Ben, it's similar to the idea of
what's called a principle agent game, except in principle agent games, the agent is typically
thought of as another human, and you're trying to get the
human to be useful to the principal. So how do you do that even
when they don't have the same objectives? But here we're actually get
to design the agent and we'll see that the agent doesn't
need to have any objectives of their own. So in some ways it's actually easier. So we call this an assistance game. So it's a, involves at least one
person, at least one machine, and the machine is designed
to be of assistance to the human. So let's have a look at that
a little bit more depth. So we've got some number of humans, M, and they have their
utilities, U1 through Um, which we can think of as preferences about what the future should be. And then we have some machines
or robots end of those. And the robots in order to, right, their, their goal is to further human interests. So if I was a utilitarian, I would say the collective
human interest is the sum of the utilities of the individual humans. And we could have other
definitions if you prefer different ways of aggregating preferences. The key point is there's a
priori uncertainty about what those utility functions are. So it's gotta optimize something, but it doesn't know what it is. And during, you know,
if you solve the game, you in principle, you can just solve these games
offline and then look at the solution and how it behaves. And as the solution unfolds effectively, information about the human
utilities is flowing at runtime based on the human actions. And the humans can do deliberate
actions to try to convey information, and that's part
of the solution of the game. They can give commands, they can prohibit you from doing things, they can reward you for
doing the right thing. In fact, in GPT-4, one of the main methods is basically 'good dog' and 'bad dog': that's how they get GPT-4 to behave itself and not say bad things, just by saying 'bad dog' whenever it says a bad thing, right?
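The 'good dog, bad dog' signal described here is the core of what is usually called reinforcement learning from human feedback. One standard way to formalize "this response was better than that one" is a Bradley-Terry reward model; below is a minimal sketch with entirely hypothetical features and data, not anything from OpenAI's actual pipeline:

```python
import math

# Toy sketch of preference-based reward learning ("good dog / bad dog"):
# fit a linear reward model from pairwise comparisons using the
# Bradley-Terry likelihood. All features and data here are invented.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_reward_model(pairs, dim, lr=0.5, epochs=200):
    """pairs: list of (preferred_features, rejected_features)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for good, bad in pairs:
            # P(good preferred over bad) = sigmoid(r(good) - r(bad))
            p = 1.0 / (1.0 + math.exp(-(dot(w, good) - dot(w, bad))))
            # gradient ascent on the log-likelihood of the observed preference
            g = 1.0 - p
            w = [wi + lr * g * (gi - bi) for wi, gi, bi in zip(w, good, bad)]
    return w

# Hypothetical per-response features: [politeness, factuality, toxicity]
pairs = [
    ([1.0, 1.0, 0.0], [0.0, 1.0, 1.0]),  # "good dog" beats the toxic answer
    ([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]),
]
w = train_reward_model(pairs, dim=3)
# the learned reward now prefers the kind of answer that was rewarded
assert dot(w, [1.0, 1.0, 0.0]) > dot(w, [0.0, 1.0, 1.0])
```

In full RLHF the learned reward model would then drive a further fine-tuning step; here it only illustrates how scattered approval and disapproval can be compiled into a preference structure.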
So in some sense, you know, the entire written record of humanity is a record of humans doing things and other people being upset about it, right? All of that information is useful for understanding what human preference structures really are. Algorithmically, we can solve these games, and in fact the one-machine, one-human game can be reduced to a partially observable MDP. For small versions of that, we can solve it exactly and actually look at the equilibrium of the game and how the agents behave. But an important point here: the word alignment is often used in discussing these kinds of things, and as Ken mentioned, it's related to inverse
reinforcement learning, the learning of human
preference structures by observing behavior. But alignment gives you this
idea that we're gonna align the machine and the human and
then off they go, right? That's never going to happen in practice. The machines are always going to have considerable uncertainty about human
preference structures, right? Partly because there are just
whole areas of the universe where there's no experience
and no evidence from human behavior about how we would
behave or how we would choose in those circumstances. And of course, you know, we don't know our own
preferences in those areas. In Human Compatible, the book that Ken mentioned, I use durian as an example. Durian is a fruit that some people love and some people despise, to an extreme degree, right? It's either the most sublime fruit the world is capable of producing, or skunk spray, sewage, stale vomit, and used surgical swabs; these are the words people use. So, I was in Singapore last week, and I tried durian for the first time. I'm actually right in the middle: I'm not on one end or the other, okay? But before that, I literally didn't know my
preferences and I learned something about them as a result. So when you look at these solutions, how does the robot behave? If it's playing this game, it actually defers to human
requests and commands. It behaves cautiously because
it doesn't wanna mess with parts of the world where
it's not sure about your preferences. In the extreme case, it's
willing to be switched off. In the interest of time, I'm gonna have to skip over the proof of that, which is proved with a little game. But basically, we can show very straightforwardly that as long as the robot is uncertain about how the human is going to choose, it has a positive incentive to allow itself to be switched off, right? It gains information by leaving that choice available for the human. And it only closes off that choice when it believes it has perfect knowledge of human preferences.
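The skipped proof can at least be illustrated numerically. In the simplest version of the off-switch game, a robot that is unsure of the utility U of its proposed action compares acting now (worth E[U]) with deferring to a human who will switch it off whenever U < 0 (worth E[max(U, 0)]). A toy sketch, with an invented uniform belief over U:

```python
import random

# Sketch of the off-switch idea: a robot is unsure of the utility U of its
# proposed action. It can act now (expected value E[U]), or defer and let
# the human switch it off whenever U < 0 (expected value E[max(U, 0)]).
# The uniform belief over U is invented purely for illustration.

random.seed(0)
beliefs = [random.uniform(-1.0, 1.0) for _ in range(10_000)]  # samples of U

act_now = sum(beliefs) / len(beliefs)                        # approx E[U]
defer = sum(max(u, 0.0) for u in beliefs) / len(beliefs)     # approx E[max(U, 0)]

# Deferring is at least as good as acting, and strictly better while the
# robot still assigns positive probability to U being negative.
assert defer >= act_now
assert defer > 0.0
```

Deferring and acting coincide exactly when the robot is (or believes it is) certain that U is non-negative, which is the sense in which it only closes off the human's choice under believed perfect knowledge.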
So there are many more extensions. The most obvious issue is what happens when we have many humans. As I mentioned, one option is to optimize the sum of utilities, but that's an oversimplified solution and you have to be quite careful. So if you've ever seen
the Avengers movie where Thanos, right? He collects these infinity stones. Once he's got them all,
he's got this plan, right? A very rational plan, or so he thinks, which is that if there were half as many people in the universe, the remaining people would be more than twice as happy. So, as a good naive utilitarian, he clicks his fingers and gets rid of half the people in the universe, okay? The Financial Times review of the movie was that Thanos gives economics a bad name, right? So as AI systems reach these sorts of Thanos levels of power, we had better figure this out before we get there, and not implement naive solutions to these kinds of very important questions. There are issues to do with making sure that, when we have lots of assistance-game solvers in the world at the same time, from many different manufacturers, they don't accidentally
get into strategic interactions with each other. We have to take into
account the real psychology, the real cognitive architecture of humans, which is what's generating their behavior, which is what's providing
evidence about their underlying preferences. So you have to be able to invert that correctly. Since we're getting rid of the standard model, right, that's an enormous pain for
me because in writing the textbook, right, all the major chapters of the
textbook assume the standard model, right? If you think about search
and problem solving, you assume a cost function
and the goal test, right? But what if you don't know
what the cost function and the goal test should be? Well then we gotta rewrite
that whole chapter. We're gonna rewrite Markov decision processes and reinforcement learning, and supervised learning, and
all the other branches of AI that rely on knowing
the objective upfront. And then, if we develop successful and capable technologies on this foundation, we've gotta make sure that they actually get used, and that regulations are developed
to ensure that unsafe versions of AI systems are not deployed. So I'll just briefly mention
one of the results about aggregation of preferences
across multiple humans. And this has been a fairly settled area of economics for a long time. Actually, John Harsanyi, who was a Berkeley professor, won the Nobel Prize in part for this work. The social aggregation theorem says that if you are making a decision on behalf of N people, then the only undominated, or what's called Pareto-optimal, strategies are to optimize a linear combination of the preferences of the individuals, right? And if you in addition assume that no individual should be privileged, then that linear combination just becomes a sum, right? Everyone gets equal weight
in that linear combination. But that requires an assumption
that's quite unrealistic, which is that everybody
has the same belief about the future. So this is called the
common prior assumption. If you get rid of that assumption
and allow people to have different beliefs about the future, the theorem gives you a different result. This is something that Andrew Critch, who's a member of our group, proved a few years ago: the optimal policies when
you have different beliefs actually have dynamic weights
for the preferences of the individual. And those weights change over
time as the predictions turn out to be true or false that
each individual is making about the future. So whoever has the best prior
that turns out to give high probability to the future that
actually unfolds will end up getting an exponentially larger
weight than the person who has the worst prior. It's not a very egalitarian idea, but it is a theorem, right? And everybody prefers this policy, because everybody believes that their beliefs are the right beliefs, right? Nobody believes that their beliefs are wrong, 'cause then they wouldn't be beliefs. So this is inevitable.
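The flavor of this dynamic-weights result can be shown in a toy simulation. Everything here (the two agents, their prediction probabilities, the number of rounds) is invented; it sketches the mechanism, not the theorem itself:

```python
# Toy illustration of preference aggregation with belief-dependent weights,
# in the spirit of the dynamic-weights result. All numbers are invented.

def aggregate(weights, utilities):
    """Weighted sum of per-agent utilities for one candidate outcome."""
    return sum(w * u for w, u in zip(weights, utilities))

def update_weights(weights, prior_probs):
    """Scale each agent's weight by the probability their prior assigned
    to the event that actually occurred, then renormalize. Consistently
    better predictors gain weight over time."""
    scaled = [w * p for w, p in zip(weights, prior_probs)]
    total = sum(scaled)
    return [s / total for s in scaled]

weights = [0.5, 0.5]      # start egalitarian
for _ in range(3):        # three observed events
    # hypothetical probabilities each agent's prior gave the observed event
    weights = update_weights(weights, [0.9, 0.4])

# the better predictor's weight grows exponentially in the number of events
assert weights[0] > weights[1]
```

After a few rounds the better predictor's weight dominates, and the gap grows exponentially with the number of observations, which is exactly the effect being described.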
What we do about it is a different question. So I'll just wrap up and
say a little bit about large language models because that, that's probably what you were
expecting me to talk about. So what are they, right? They're circuits trained
to imitate human linguistic behavior. So you want to think of
their outputs not as words, but as actions, right? They are choosing to emit a
linguistic act with each word. So that's the way to think about it. And as Ken experienced, they do it incredibly well. It's really hard to see coherent
grammatical text and not think that there's a mind behind it. So if you need an antidote, think about a piece of paper
that has written on it, some beautiful piece of prose
or poetry, and ask yourself, is this piece of paper intelligent? Because it has this beautiful
paragraph of text on it, right? We don't know where the large
language models are between a piece of paper and AGI, right? They're somewhere along there; we just don't know where. Now, human linguistic behavior
is generated by humans who have goals. So from a simple Occam's
razor point of view, right? If you are going to imitate
human linguistic behavior, then the default hypothesis is
that you are going to become a goal seeking entity in
order to generate linguistic behavior in the same way that humans do. Right? So there's an open question, are these large language
models developing internal goal structures of their own? And I asked Microsoft that
question a couple of weeks ago when they were here to
speak, and the answer is, we have no idea, right? And we have no way to find out. We don't know if they have goals, we don't know what they are, right? So let's think about
that a bit more, right? One might initially think, well, you know what they're doing: if they're learning to imitate humans, then maybe, almost coincidentally, that will end up with them being aligned with what humans want. All right? So perhaps we accidentally are
solving the alignment problem here, by the way we're
training these systems. And the answer to that is it depends. It depends on the type of
goal that gets learned. And I'll distinguish two types of goals. There's what we call common goals, things like painting the wall or mitigating climate change, where if you do it, I'm happy; if I do it, you're happy; we're all happy, right? These are goals where any
agent doing these things would make all the agents happy. Then there are indexical goals, meaning goals that are indexical to the individual who has the goal. So, drinking coffee, right? I'm not happy if the robot
drinks the coffee, right? What I want to have happen is
if I'm drinking coffee and the robot does some inverse reinforcement learning: hey, Stuart likes coffee, I'll make Stuart a cup of coffee in the morning. The robot drinking a coffee is not the same, right? So this is what we mean by an indexical goal; and becoming ruler of the universe is not the same if it's me versus the robot, okay? And obviously, if systems are learning indexical goals, that's arbitrarily bad as they get more and more capable. And unfortunately, humans have a lot of indexical goals.
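The common-versus-indexical distinction can be sketched in code. In this toy illustration (all state fields and agent names are invented), a common goal's value is independent of who achieved it, while an indexical goal's value depends on who the beneficiary is:

```python
# Toy contrast between common and indexical goals. A common goal's value
# doesn't depend on which agent brings it about; an indexical goal's does.
# All state fields and agent names here are made up for illustration.

def common_goal_value(state):
    """E.g., mitigating climate change: everyone benefits regardless of
    who did the mitigating."""
    return state["tons_co2_mitigated"]

def indexical_goal_value(state, beneficiary):
    """E.g., drinking coffee: only counts if *this* agent gets the coffee."""
    return 1.0 if state["coffee_drinker"] == beneficiary else 0.0

robot_acts = {"tons_co2_mitigated": 5.0, "coffee_drinker": "robot"}
human_acts = {"tons_co2_mitigated": 5.0, "coffee_drinker": "stuart"}

# Common goal: same value for Stuart, whoever acted.
assert common_goal_value(robot_acts) == common_goal_value(human_acts)

# Indexical goal: Stuart gets no value when the robot drinks the coffee.
assert indexical_goal_value(robot_acts, "stuart") == 0.0
assert indexical_goal_value(human_acts, "stuart") == 1.0
```

An imitation learner that copies an indexical goal without re-indexing it to the human is exactly the failure mode being warned about here.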
We do not want AI systems to learn from humans in this way. Imitation learning is not alignment. And then the question is
not just do they have goals, but can they pursue them, right? How do those goals causally
affect the linguistic behavior that they produce? Well, again, since we don't even know if they have goals, we certainly don't know if they can pursue them. But the empirical, anecdotal evidence suggests that, yeah: if you look at the Kevin Roose conversation, Sydney, the Bing version of GPT-4, is pursuing a goal for 20 pages, despite Kevin's efforts to redirect and talk about anything but getting married to Sydney, right? So if you haven't seen that conversation, go and read it and ask yourself: is Sydney pursuing a goal here, as opposed to just
generating the next word. Okay? The last point I wanna make before I wrap up, and I'm sorry I'm a little over time, is that black boxes, as I've just illustrated, are really tough to understand, right? This is a trillion
parameter or more system. It's been optimized by about
a billion trillion random perturbations of the parameters. And we have no idea what it's doing. We have no idea how it works. And that causes the problem, right? Even if you wrap it in
this outer framework, the assistance game framework, right? It's gonna be very hard for
us to prove a theorem saying that, yep, this is definitely gonna
be beneficial when we start running it. So part of my work now is
actually to develop AI systems more along the lines
that I described before: the knowledge-based approach, where each component of the system, each piece of knowledge, has
its own meaningful semantics that we can individually check
against our own understanding or against reality. Those pieces are put together in ways where we understand the logic of the composition, and then we can start to have a rigorous theory of how these
systems are gonna behave. I just don't know of any other way to achieve enough confidence in the behavior of
the system that we would be comfortable moving forward
with the technology. And probabilistic programming
languages are one possible substrate, not the only one, but one substrate that
seems to have these types of properties and to exhibit the
kinds of capabilities that we would like, for example, being able to do computer vision
better than deep networks, right? Being able to understand
language and learn extremely fast from very small numbers of
examples. So, to summarize: I think AI has vast
potential and that creates unstoppable momentum. But if we pursue this
in the standard model, then we will eventually lose
control over the machines. But, talking of losing control over the machines, we can take a different route, one that actually leads to AI systems that are beneficial to humans. And I'm afraid this is going to sound really boring, right? It's a very exciting time in AI, but I think AI is going to end
up looking more like aviation and nuclear power where there's
a lot of regulation where the developers, the engineers are extremely
careful about how they build their systems, they check, they verify, and they drive down the error rate as much as they possibly can. We don't look anything
like that right now. But if we're gonna be mature about this, if we believe that we
have sparks of AGI, right? That's a technology that could
completely change the face of the earth and civilization, right? How can we not take that seriously? Thank you. (audience clapping) - [Moderator] Okay, we do have time for a couple of questions. I know many people have
to go to other classes. I have to teach actually in
five minutes, so I have to go. But if I know that there's
a lot of questions here and Stuart will, you'll be willing to stick
around for a little bit? - Yep.
- Is that okay? Yeah. Yeah, okay, great. So, questions? Wow. Everyone okay? You had one right here. Go ahead.
- [Audience Member] First
of all, thank you so much. What do you advise young founders in the AI space, in the sense of making their machine learning models as morally aligned, positive, and well-defined as possible, while still driving
innovation in that space too? - So I think that's a great
question because at the moment it's really difficult if
your business model is, well we get GPT four or
some open source equivalent, we fine tune it for a
particular application. We figure out how to get
it to stop hallucinating and lying, right? I mean, the last thing you want to do is, is you know, have AI systems selling people
insurance policies that don't exist for houses on Pluto and
all this kind of stuff, right? And I've talked to many, many CEOs who ask how do
we use, you know, these, these language models? You know, who can I
replace in my organization? I would say, well, if you have a lot of psychotic
six year olds who live in a fantasy world in your
organization doing jobs, you could probably replace
those with GPT four. So I actually believe that
we might be able to come up with a different form of
training that still uses all this human data but avoids learning the indexical goals, right? So that might be a help. But I'm afraid, at the moment,
the other technologies, the well founded technologies, we can do some amazing things
like the global monitoring system for the nuclear test ban treaty, which literally took me half an hour to write, but it's not a panacea, right? One thing we can't do with it is just download gigabytes of data, shove it into a tabula rasa system, and have it work off the bat like that. So the technologies that are available off the
shelf, that at least claim to solve your problem without any real effort, are so unreliable that I don't think you can ethically
use them for any high stakes application. And interestingly, you know, I think OpenAI did make serious efforts. If you look at their webpage
and the model card paper, they made serious efforts to try to get it to behave itself: basically, bad dog, bad dog. But if you've got a dog who pees
on the carpet and you say bad dog, right? It says, oh, okay, you mean don't pee on the
carpet before breakfast? Good. I won't do that, right? So it's kind of hard to get
these things to learn, because we don't know how they work. So for anything that's high stakes, OpenAI says maybe you shouldn't use GPT-4 for that application.
- Right here,
- Yep. - [Audience Member 2] Thank
you so much for your talk. I have two questions. One is, I read that you signed the
petition to ban AI experiments for a certain amount of time. Why don't you think we could have done what we will do in the next six months in the past, before all this blew up? Second question is, if AI starts replacing all the jobs, what kind of solution should we seek? Should the government start distributing the increased production? Or should each one of us
try to find another job that we can do?
- Yeah, okay, so two questions. So the open letter asks for a moratorium on training and releasing systems more powerful than GPT-4 for six months. I did not write the letter at all. When they asked me to sign it, I suggested all kinds of changes, but it was too late at that point to change it. But I decided to sign it
because I think the underlying principle is very simple. You should not deploy systems
whose internal principles of operation you don't understand, that may or may not have their own internal goals that they are pursuing, and that show sparks of AGI, right? It's incredibly irresponsible to do that. And you know, there's the OECD, which is a very boring
organization, right? The Organisation for Economic Co-operation and Development. It's all of the wealthy countries talking about making sure that
their economies interoperate successfully. So they have AI principles
that all the major governments have signed up to, right? This is a legal document that
says that AI systems need to be robust, predictable, and you need to be able to
show that they don't present an undue risk before you
can deploy them, right? So I view this petition as
simply asking that the companies abide by the AI principles
that the OECD governments have already agreed to and we're
sort of nudging the governments to go from principles to regulation. So that's the idea. Okay, second question: what about jobs? Well, that's a whole other lecture. In fact, one of the four Reith Lectures is exactly about that. And it's a long story. My view in a nutshell is
that the impact on jobs is going to be enormous and
obviously general purpose AI would be able to do any
job that humans can do with a couple of exceptions, right? So jobs that we don't want
robots to do, and jobs where humans have a comparative advantage because we have, we think, the same subjective experience as each other, which machines do not, right? A machine can never know what
it's like to hit your thumb with a hammer or what it's
like to fall out of love with somebody, that's simply
not available to them. And so we have that empathic
advantage and to me that suggests that the future
economic roles for human beings will be in interpersonal roles. And that's the future of the economy. Not a whole bunch of
data scientists, right? The world does not need 4 billion data scientists; I'm not even sure we need 4 million data scientists. So it's a very different
future, but not a terrible one; one we are totally unprepared for, 'cause the human sciences have been neglected. We don't have the science
base to make those professions productive and really valuable
in helping each other. Thanks. Last question, apparently. - [Audience Member 3]
Thank you for the talk. During one part of the talk, you've mentioned that a good
way to reach alignment is to prioritize the parts of the population, or the people, who have a strong prior about the future. So the ones who are right about the future will kind of end up winning over, in terms of what the systems align to. How do we ensure that those are positive for humanity, and that bad actors don't end up being right about the future and ending up in a self-fulfilling prophecy, just making a negative future?
- Okay, so actually I wanna separate
out two things here. So, that theorem is a theorem, right? You just have to face it. I mean, I think the obvious recommendation
coming from that theorem is that it's a good idea to get people
to have shared beliefs about the future, right? But actually, it's interesting: it provides, for example, a way to get agreements between
people who fundamentally disagree about the state of the world, but you can get them to
agree by having this sort of contingent contract that
says, well, if you are right, you win. And if you are right, you win. And both of them agree
to this contract, right? So you can naturally resolve
negotiations better by understanding this theorem. But there's the separate question about how we deal with the fact that some people's preferences might be negative. So Harsanyi calls this the problem of sadism, right? That some people actually
get their jollies from the suffering of others. And he argues that if
that's literally true, we should zero out those negative
preferences that they have for the wellbeing of others. And you might say, okay, well
those people are pretty rare, right? For most people, all other things being equal, if it had no impact on my welfare, someone else being better off or happier, I'm not gonna be too upset by that, right? Most people would actually be pleased that someone else is not suffering rather than suffering. But there's an entire category
of preferences that you might think of as relative preferences. Some people in economics call
these positional goods, right? So I like having this nice
shiny car parked out in front of my house, not just because it's a nice shiny car and I like driving it, but because it's nicer and shinier than the guy down the street's, right? I like having a Nobel Prize, not because it's a big shiny
piece of gold and I get a million euros, but because
nobody else has one. If everyone else in the world
got a Nobel Prize at the same time I did, I would feel a
lot less good about myself, right? Isn't that weird? But relative preferences
play a huge part in people's identities, right? Think about soccer fans and how
they care about their soccer team being better than the
other people's soccer teams and so on. So if we were to zero out all
those relative preferences, which operate mathematically
exactly like sadism, right? It would be a massive change. So I don't have a clear
recommendation on that point, but it's certainly
something we should work on.
- [Moderator] Thank you so much. That's all the time we have. Thank you so much, Professor Stuart Russell
for the enlightening talk. (audience clapping) Thank you all for coming. Please stay tuned for the next events. Go to berkeley.edu/ai for
the rest of the schedule. Thank you so much.