Hello, everyone. Thank you for joining us today. Last week, a group of more than a thousand tech leaders and researchers
coauthored an open letter calling for a slowdown in the development
of artificial intelligence systems. And they said it posed, quote,
profound risks to society and humanity. And the letter went on and said,
should we let machines flood our information channels
with propaganda and untruth? Should we automate away all the jobs,
including the fulfilling ones? Should we develop non-human minds
that will eventually outnumber, outsmart,
obsolete, and replace us? Should we risk loss of control
of our civilization? Well, that's a pretty strong statement. Now, one of the letter's most prominent
signatories was UC Berkeley
computer science professor Stuart Russell. He's a pioneering researcher
in artificial intelligence, and he's been sounding this alarm
about artificial intelligence for well over a decade. I believe Professor Russell is director of the Center for Human-Compatible AI, as well as the new Kavli Center for Ethics, Science and the Public at UC Berkeley. He's also an honorary fellow
at Wadham College at Oxford. He's the coauthor
of Artificial Intelligence: A Modern Approach, which is the standard
textbook in the field. I should say I have read it
cover to cover. It is a masterful work
and I learned a tremendous amount from it, even having a Ph.D. in the subject. Now, Human
Compatible is his latest book. You all have an opportunity to purchase that
and get it signed by the master himself. And it addresses the long-term impact of AI on humanity. And I'd read the rest of his stellar awards and honors, but I'd like to leave a few minutes for him to talk. So let's see. Why don't we just get started? Hello, Stuart. Hi. Well, we're here today mostly to talk about generative large
language models, of course called LLMs. And they're kind of the new, shiny
new object in artificial intelligence. And maybe you could start for the audience
giving us a brief overview of what LLMs are, what they can do,
and probably equally interesting what they cannot do. So I think shiny new object
is a good description. You know, a couple of years ago,
they were a fairly obscure corner of AI. They were helping a little bit
with improving the quality of speech recognition
and machine translation, but no one thought of the language model itself as an intelligent thing. It was just a simple tool. So a language model is just a statistical description
of the sequences of words that typically occur
in normal text or speech. So the simplest kind of language model, which we call a unigram or one-gram model, just says: how common is each of the words in the language? Words like 'the' and 'and' are very common, and 'disestablishment' is a fairly uncommon word in most text. And so a unigram model just gives you the frequencies of the words in the language. A bigram model says: what's the frequency of sequences of two words? So if I say, for example, 'happy', a common word that comes next would be 'Christmas'. Okay, so you get the statistics of all the pairs of words and, given the first word, what's the likely second word that comes next. That's a very simple model
that was actually developed in 1913 by a Russian statistician, Andrei Markov. He went through the whole of Eugene Onegin, counted all the word pairs in the entire text, built a big table of these, and then showed that you could generate sequences of words from that table. And they look sort of plausible. They're not particularly grammatical,
but they start to look a little bit like text. If you go to a trigram, where you're looking at triples (so, what's the next word, given the previous two words?), it's quite coherent. We did that with our textbook. We took all the words in the textbook, we learned the trigram model from the textbook, and then we generated some text, and it sounds like phrases and sentences coming from an A.I. textbook. So it's really quite remarkable
how language-like you can get from relatively simple statistics.
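To make that concrete, here is a minimal sketch in Python of the bigram idea just described: count the word pairs in a corpus, then generate text by sampling the next word given the current one. The tiny corpus is a made-up stand-in (Markov used the full text of Eugene Onegin), and a trigram version would simply condition on the previous two words instead of one.

```python
# Minimal bigram language model: count word pairs, then sample the next word
# given the current one. The corpus here is a toy stand-in for illustration.
import random
from collections import defaultdict, Counter

corpus = "happy christmas to you and a happy new year to you and yours".split()

# The "big table": for each word, how often does each next word follow it?
table = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    table[w1][w2] += 1

def generate(start, n_words=8):
    """Generate text by repeatedly sampling the next word from the table."""
    words = [start]
    for _ in range(n_words):
        followers = table.get(words[-1])
        if not followers:
            break
        nxt = random.choices(list(followers), weights=list(followers.values()))[0]
        words.append(nxt)
    return " ".join(words)

print(generate("happy"))   # e.g. "happy christmas to you and a happy new year"
```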
The large language models, we can think of them, for example GPT-4, the latest, as a 32,000-gram model, which means it predicts the next word given the previous 32,767 words. Now, if you can imagine, if I tried to build such a table, it would be a hundred thousand to the power of 32,767,
so far bigger than the entire universe. So what you do to compress that down is, instead of filling in a big contingency table, you simply train a very, very large circuit that has about a trillion tunable parameters, and you do about a billion trillion random mutations on that circuit until it gets very, very good at predicting the next word given the previous 32,000 words, on a training set of, we estimate, we don't know because they've kept it a secret, somewhere in the 20 to 30 trillion words of training data, which is about as much as all the books the human race has ever written. So that's what a large language model is. Right. It's this giant trillion-parameter circuit that's been trained
to predict the next word. Mm hmm. What goes on inside? We haven't the faintest idea. That's right. I mean, the process is quite similar
to the process of evolution, if you like. Right. Which is a billion trillion random mutations in DNA sequences that produced us and our brains. Right. We probably have more idea
of what's happening inside the human brain than we do about what's happening
inside of large language models. But when you talk to these models, right,
you start providing some text that provides the beginning of that 32,000-word sequence. It can then start to extrapolate. And roughly what it's doing is finding patterns in all the training data that somehow resemble the current sequence of words that it's looking at, and then sort of averaging those patterns and using that to predict the next word.
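Seen from the outside, the generation loop is just that recipe applied over and over: a function from the previous words to a distribution over the next word. A minimal sketch, where next_word_distribution is a hypothetical stand-in for the trained circuit rather than any real API:

```python
# Autoregressive generation: feed in the context, sample one word, append it,
# and repeat. Everything interesting is hidden inside next_word_distribution.
import random

def next_word_distribution(context_words):
    """Stand-in for the trillion-parameter circuit: returns {word: probability}."""
    # A real model would condition on up to ~32,000 previous tokens here.
    return {"the": 0.4, "a": 0.3, "intelligence": 0.2, ".": 0.1}

def generate(prompt, n_words=20, context_window=32_000):
    words = prompt.split()
    for _ in range(n_words):
        dist = next_word_distribution(words[-context_window:])
        nxt = random.choices(list(dist), weights=list(dist.values()))[0]
        words.append(nxt)
    return " ".join(words)

print(generate("Artificial intelligence is"))
```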
That's a very simple description. It sounds almost trivial, but the things that it can do are absolutely startling. You can ask it to write code, for example. So it's been trained on lots
and lots of programs in various languages. You can say, okay, I need a scrolling, dark gray window with white text so that I can show you the code of such
and such a program. And I need it to be embedded correctly into my PowerPoint file, and it'll just spit out the 500 or 600 lines of JavaScript
or whatever you ask it to do
and put it in the PowerPoint file for you. And then it just works, right? You know,
you'd probably have to pay a programmer several thousand dollars to do that,
and you can just have it in seconds. You can say, okay, I've forgotten
the proof of Pythagoras theorem, but I'd like you to give me that proof
in the form of a Shakespeare sonnet. And it'll do that, and it'll rhyme it correctly. And you can say, okay, but no more than
382 words, and then it'll shrink it down. Right. So it just appears miraculous. The really worrying thing about it is that it's so difficult for human beings to see intelligent text and not think that there's intelligence behind it. Right. And so it's a little bit like, who's
seen the movie Titanic, right? Quite a lot of people. So when you see the movie Titanic,
you think there's water, right? There's no water in Titanic, right? It's computer-generated water. Right. There was no water
killed in the making of that film. But you can't help it. Your brain sees
that and interprets it as wet. Right. Same with text. Your brain sees this text and interprets
it as being produced by intelligence. So if I if I take a piece of paper like that one
that has intelligent text written on it, you don't think the piece of paper
is intelligent, right? You immediately map to the intelligence
of the person who wrote the text. So the question is, is GPT-4 effectively a piece of paper on which intelligent text is written by humans, and it's simply sort of transporting
that intelligent text onto the display or is it really an intelligent entity that is thinking and reasoning and producing the results of that thinking
and reasoning in the form of text? So it's somewhere
between those two things, right? And we literally do not know where it is
between those two. That's a beautiful explanation. I hope you don't mind
if I steal it once in a while. But do you feel that this is intelligence? Is that a reasonable term to apply to what we're seeing with these models? As I said, it's really hard not to get that sense. But we do know of some technical limitations. So after it's been trained, right, then
the text comes in and it passes through a network
that has a fixed number of layers, and then out comes the text at the end. And so it can't cogitate, right. It can't sit there and say, hmm,
let me think about that for a while, because there isn't anywhere for it
to think. Right. The signal just comes in one end
and comes out the other and that's it. So it has a limited ability to think in order to answer a question. It has a very limited ability to solve problems
and make plans. But on the other hand, if you've read everything
the human race has ever written, right, you don't need to do a lot of thinking
in order to answer the questions. We could ask ChatGPT. I've got a great idea. Why don't we do that? Let's do that, and we'll turn that over to the powers that be. So we could say: Okay, briefly explain whether you are truly intelligent. As an AI language model, my abilities are based on pattern
recognition and the extensive training data I have been exposed to. While I can perform
many tasks that might seem intelligent, such as answering questions, participating
in conversations and generating text, my intelligence is quite different
from human intelligence. I do not possess consciousness,
self-awareness or emotions, and my responses are generated based on
patterns found in my training data rather than any understanding of the world
or ability to think independently. In that sense, it could be argued that I am not truly intelligent in the way
humans are. My intelligence is a result of advanced
machine learning algorithms and the vast
amount of data I have been trained on rather than genuine cognition
or understanding. Can't beat that. Let's get one more question,
because I've been curious about this myself. In the process of training, have you developed
your own internal goals. As an AI language model? I don't have the capacity
to develop internal goals, desires or motivations like humans do. My purpose is to assist users by generating human
like text based on the input I receive. I don't possess consciousness,
self-awareness or emotions which are necessary
for developing internal goals. My behavior and responses are determined
by the algorithms and patterns in my training data rather than any personal inclination
or aspirations. My sole purpose is to provide useful
and relevant information to users based on their queries without
any personal preferences or objectives. Guiding my responses. I think you're being too modest. Can I sneak in a quick question here? Why do you speak in the first person? I use first person pronouns
as a conversational convention to facilitate more natural
and engaging interactions with users. Speaking in the first person helps
create a more user friendly experience, making it easier for people to relate to
and communicate with me. It's important to remember, though,
that as an AI language model, I don't have a sense
of self or consciousness. Using
first person pronouns is a design choice to enhance the user experience rather
than a reflection of any personal identity or self-awareness. Well, that was an honest answer, at least. Yeah. So I actually think it is
being too modest. So here's the thing, right? So first of all,
when we say that a machine has goals, what we mean is that there are internal
representational structures that are playing a causal role in an appropriate
way in the generation of behavior. So in the good old fashioned days, when
when we had robots roaming around our A.I. labs, we would give that robot a go, like,
you know, please fetch me a cup of coffee. Right. That would then be stored
in an internal representation of system, would come up with a plan to achieve that goal, execute the plan,
bring you a cup of coffee with you. It's not a it's
not a complicated, spooky thing. Now, when we train these large language
models, we are training them to imitate human linguistic behavior. And human linguistic behavior is generated
by goal-possessing entities, namely us. Right. And as we are writing books or
having conversations, we have many goals. And those goals play a role
in the text that we generate. So in conversations, people want to be
they want to be liked, they want to be right. They want to be respected
as authoritative and knowledgeable. So there are many goals that affect
how you select the words that you say and how you select the words
that you write. So if you're training a circuit
to be extremely good at imitating human linguistic behavior, in fact,
the default assumption would be that it ought to form internal goal structures
and the appropriate additional processing circuitry
if you like, that would cause those goal structures to have a causal
role in generating the text. So it's actually a natural hypothesis that GPT four does
have its own internal goals. And if you read the conversation between Kevin Roose, who's a New York Times
journalist, and Sydney, which is the Bing instantiation of GPT-4, in that conversation, which goes on for about 40 pages, probably the last 20 pages consist of Sydney trying to convince Kevin
to leave his wife and marry Sydney because Sydney is the only one
who truly understands Kevin. His wife is just not the right person
for him. And on and on and on
in a sort of completely psychotic stream of consciousness kind of way. So for 20 pages, the chat bot is pursuing a goal. And now you could just, you know, say, well, it's probably just copying
something that was in the training data, but I would say the natural assumption from anyone reading
that is that the system has somehow been triggered to pursue that goal, which explains a lot of the behavior. If they trained it on a lot of chats
from online dating sites or something, then it's not surprising that
that kind of goal would appear to be predictive of human
linguistic behavior. So, in fact, when I asked one of the Microsoft experts
who did a months-long evaluation of GPT-4 whether GPT-4 has internal goals and is using them to guide the generation of text, the answer was
we haven't the faintest idea. Well, this raises
a number of interesting questions, which you cover in detail in your book. You know, we could lead in with that. This has a tendency to mislead people, to fool people, to fall afoul of what's called algorithmic authority, where people put too much belief in the idea that what it says is true. And, you know, it often makes statements that are false and does it in a very definite and computerized way. What do you see as the real dangers
that this particular technology has in terms
of how it might affect society? Well, I think we're already seeing some of the dangers. There are literally millions of people now who subscribe to services
that provide access to a large language model
for companionship, whether it be a friendship
or a romantic interaction. You can choose the level of romance in the bot. And this seems to cause both emotional dependency and then withdrawal: recently, one of the companies updated its software because the bot was getting too hot and heavy
and it was freaking some people out. So they updated the software
and then some of the other users felt a sense of withdrawal because now it
was rejecting their advances. And there was a really sad story last week from Belgium
where a man actually committed suicide with the help, guidance
and accompaniment of his chatbot. Oh, it sounds terrible. And it's a really very sad story. So in a sense, we're conducting a huge experiment on the human race with no informed consent whatsoever. So this is one example. Obviously, disinformation is a problem. Right. So I can easily ask one of these systems: write me an email to this person, read their Web pages, all of their online social media presence, and write an email
that will persuade that particular person to buy such and such a product
or to vote for such and such a candidate. Make it look as if the email comes from
one of their close friends or relatives. And now, in fact, there's a plug-in for GPT-4 that allows it to actually send that email. So I could write a program which then gives that instruction to GPT-4 20 million times for 20 million
different people and generate 20 million perfectly tailored, very persuasive,
could be quite colloquial. It might even use, you
know, appropriate slang based on the kind of person
that it is writing to. And, you know, that's a very, very straightforward thing
to do right now. It was practically,
I think, you know, literally impossible to do that a couple of years ago. That's just a simple example. And, you know,
it would take half an hour to do that. Well,
there are a lot of potential positive benefits of this technology as well. I wonder if you might explain
to the audience, contrast a little bit of some of the kinds of benefits
this would have against the dangers that you have just so correctly
and accurately described. So I think people are seeing enormous opportunities here. There are literally thousands of companies that are trying to find ways to use this. I always want to think of it as an animal, because in a way
it's the same kind of thing that happened 30,000 years ago when humans figured out
what to do with dogs. Right. They figured out, okay,
we can domesticate these animals and we can work out what they can do. Oh, look, they can herd animals for us. They can guard our camps at night. They can fend off
wild animals that attack us. They can keep us warm. They can be our companion,
but they can't write emails or do our homework. So we're in very much the same situation here. And in fact, when GPT-4 misbehaves, when, for example, it's not supposed to answer questions about how to commit suicide but it does sometimes, they basically spank it. Right. They don't have a way of constraining its behavior. All they can do is say, no, bad dog. Right. And hope that somehow it gets it, that it shouldn't do that. But then someone asks the same question a slightly different way, and it starts giving advice about committing suicide again. So they spank it again, right, and hope that it generalizes sufficiently well. So using these systems is an art, and companies are trying to figure out, okay, how can I use this, for example, in customer service
for an insurance company? Well, you don't want it to hallucinate
policies that don't exist or change the prices on the products
and so on. So the business model for all of these companies is, how do I specialize and constrain the behavior of these tools? It is starting to succeed in a number of areas and generate enormous value as a result, because obviously you can use these systems. It costs,
you know, a few pennies per hour to run it instead of having to pay
for expensive humans. So it has economic benefits. I think education is another area
where it can be incredibly valuable. It can act as a personalized tutor, it can remember a fair amount of context from its interaction with each pupil, and it can adjust what it says, its tone of voice, the speed at which it presents material, and it can learn what the pupil does and doesn't know, does and doesn't understand. So I think with a bit more work that could be enormously beneficial, because as we know, the classroom method, where one teacher teaches 25 or 30 kids, is two to three times less effective than the tutorial method, where one expert human tutors a child. So we could get maybe not all the way to that two-to-three-fold improvement, but maybe we could get a one-and-a-half to two-fold improvement in the rate of learning and the quality of the learning. So that would be hugely beneficial. The downside of all these economic applications is the impact on employment. So some studies suggest that in the developed economies, what the system can do already could have a direct impact
on 300 million jobs, which is a lot. And there's no way to retrain
those 300 million people to be prompt engineers
or data scientists or whatever. The world is never going to need that many people working on the technology instead of on the job itself. Well, as you know, I have a slightly
different attitude about that. I think that this is a form of automation
and has taken it. The results of this will be the same as many previous waves
that have had the same kinds of effects. You know, the Internet has affected
more than 300 million people's jobs in many ways,
and it's taken many of them away. But, you know, I think it will have many,
many benefits as well. One that you did not mention, which I think, if I may, is worth mentioning: you can get an instant briefing on any topic. And until you've done
this, it's just astonishing. You can ask the question
and get a very thoughtful response, bringing together all of the knowledge
of humankind, in effect, into a couple of paragraphs
and with remarkably well-written prose. I wish I could write that well myself. Yeah, I find it gets a bit repetitive. There's always going to be a phrase somewhere: "Well, one has to weigh the costs and benefits, and there's no single answer to this. It depends on A, B, C," and then there's three bullet points. So it starts to get a little bit like one of those McKinsey memos. Yes. But people pay good money for those things. And you can just... Then the other question is, you know,
do you believe it? Right. McKinsey memos or not, not necessarily. You know, and I think one might hope that people could get good medical advice from this, but in fact, there are enough mistakes and misinformation in what comes out that OpenAI has actually spanked it lots of times for giving medical advice. So it won't give medical advice unless you
try really, really hard to get it. Well, I think your point about relating it
to our relationship with dogs is a very good one, in that dogs are somewhat unpredictable but are valuable in certain ways. And they do bad things on my rugs at home. Yeah, but even by extending that analogy, you're really arguing for leashes
and muzzles and crates for these AI systems
that we just don't know how to design yet. I think that's right. I mean, it's interesting when you read OpenAI's Web page for GPT-4, you know, so it's got these policies for what it's not allowed to say, and it says, you know, it is proud that GPT-4 breaks the rules 29% less often than GPT-3. Right. Which is progress. But it's a consequence of how the systems are designed, or I should say, they're not designed at all, actually, it's a consequence of how they're evolved, that we don't understand how they work. And so we have no way of actually
constraining their behavior precisely and rigorously. So I would argue that in the long run,
particularly for high stakes applications, we probably need to sort of invert the way we're thinking about this. We have, I think, basically chanced upon this idea that by expanding from unit grams to by grams to drive grams
to 32,000 grams, something that looks like intelligence comes out. The bulk of work in AI since the fifties has been on actually understanding
how intelligence works. So how does reasoning work? That's actually a question
that philosophers and logicians have worked on for thousands of years
and come up with a very good, thorough, complete analysis. And we have both mathematical
and software tools that can do logical reasoning with remarkable complexity. So we might argue that actually the underlying intelligence
should be more of that form and the language model
should be the interface to it. And that would be probably a much more reliable system
because we would be able to provide it with knowledge,
whose content we understood and be sure that it knows it
because it's in the knowledge base and the system is reasoning correctly
underneath the language layer. So this is one type of hybrid
that's actually quite rapidly emerging. For example, Wolfram Alpha is one of these traditional logic
based systems that has a very broad set of knowledge
about all kinds of things. And OpenAI and Wolfram have entered into a partnership which basically results in GPT-4 having an interface to this underlying knowledge base, and it has learned how to send queries to that knowledge base and interpret the results of the queries and then present them to the user.
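A rough sketch of that hybrid pattern, with hypothetical placeholders (wolfram_query and llm_generate are stand-ins, not the actual OpenAI or Wolfram APIs): questions that need verified facts or computation are routed to the curated engine, and the language model only phrases the result.

```python
# Language model as interface, knowledge base as the source of truth (sketch).
def wolfram_query(question: str) -> str:
    """Placeholder for a call to a curated knowledge base / computation engine."""
    return "result computed from the knowledge base for: " + question

def llm_generate(prompt: str) -> str:
    """Placeholder for free-form language-model generation."""
    return "fluent prose for: " + prompt

def answer(question: str) -> str:
    # Decide whether the question needs verified knowledge (crude heuristic here).
    needs_facts = any(k in question.lower() for k in ("how many", "what is the", "when did"))
    if needs_facts:
        result = wolfram_query(question)      # the reasoning happens in the engine
        return llm_generate(f"explain this result to the user: {result}")
    return llm_generate(question)             # otherwise, ordinary generation

print(answer("How many moons does Jupiter have?"))
```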
So I think actually that's going to be a more robust platform, and it allows us to then provide some type of guarantee
that the system is going to behave itself. So you mentioned the open letter asking for a moratorium. I think moratorium
is not quite the right word. And I wouldn't have used, you know,
the idea of a six month moratorium. I would simply say there are requirements. In fact, the the OECD, the Organization for Economic Co-operation and Development,
which includes the US government, the UK government,
pretty much all the advanced economies are part of it, already
have guidelines saying that AI systems
have to be robust, predictable, and you have to be able to demonstrate that before
you can deploy the system. The European Union AI Act,
which is supposed to be finalized later
this year, would then make it illegal to deploy systems in high stakes areas
that don't meet these criteria. And at the moment,
there is no way that we can show that these large language models
as such meet these criteria. So it's not a moratorium. It's simply saying,
okay, we're going to start enforcing the criterion that before you deploy the system, you be able to show that its behavior is robust, predictable, and doesn't present undue risk. Stuart, this may be a little bit off script, but I think it might be interesting
to the audience. Are you aware of the size of the database
after this compression step that you've discussed into the model itself,
how big these models actually are? I don't know if you know. So it's about a trillion parameters,
is what I understand. My understanding is it fits in a terabyte. I don't know if that's quite accurate, but it's not going to be off by much. That's probably okay. Yes. They don't have to be very high-precision parameters. So I think the interesting point for the audience is you can get that into a device like this. So it seems like the entire knowledge
of humankind can be fit into a phone and you can purchase that much storage
on Amazon for 100 bucks. Yeah, but just to give you a picture of, you know, how big is a trillion
if you imagine a huge university or public library,
right. Possibly a couple of miles of shelving. Right. Six floors. 500 meters. Right. And then imagine every book and then imagine every page
and then imagine every single character. It's about that many characters
write the number of letters in all the books in an enormous library. That's about how many parameters
these systems have. So that's why
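As a quick back-of-the-envelope check of that analogy (all the counts below are rough assumptions):

```python
# Rough, assumed numbers: a large research library's worth of characters is on the
# order of a trillion, about the number of parameters quoted, and at roughly one
# byte per parameter that is also about a terabyte.
books = 1_000_000          # volumes in a big library (assumption)
pages_per_book = 400       # assumption
chars_per_page = 2_500     # assumption

total_chars = books * pages_per_book * chars_per_page
print(f"{total_chars:.0e} characters")                     # 1e+12, about a trillion
print(f"{total_chars / 1e12:.1f} TB at one byte per parameter")
```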
So that's why we can't understand what they're doing, because they're unbelievably vast and completely impenetrable. Maybe it would be useful to talk
a little bit more generally about A.I. and particularly
automated decision systems: the opportunity to memorialize discrimination and bad behavior into those systems in a way that's very hard to detect, what kind of impact that may have on society, and where we stand on trying to rein that in. Yeah. So this is something that has been noticed in many, many areas. I think probably the biggest impact area is in recruiting and employment, where for almost all jobs these days, resumes are scanned by machines and filtered to find people who are at least ostensibly qualified
enough to be worth interviewing and now even the interview is being done by a machine. And for example,
it was shown that Amazon's process for hiring software engineers was screening out any resume that contained the word 'woman' or 'women's'. So if you played on a women's lacrosse team, you were out, right? If you sang in a women's choir, you were out. And so why does that happen? It's not because the algorithms are themselves biased. It's not because the programmer was biased. Right. The programmer was just saying, okay, write a machine learning algorithm that does a good job of fitting the training data. Right. And it's roughly,
if you know what least squares means, you know, minimizing
the squared error on the training data is something that goes back
to the 17th century. So what goes wrong
is that the training data themselves already reflect historical biases. The other thing that goes wrong
is that fitting the training data isn't actually what you want to do. What you want to do is produce a classifier that is both accurate and fair, and usually we have just left out the fair part. So we're training a system with the wrong objective, right? So it's not surprising we don't get what we want, because we didn't say what we want properly.
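To make the point about objectives concrete, here is a tiny, invented illustration: the same predictions can look fine when scored on accuracy alone while failing a fairness measure such as demographic parity, so the objective has to mention both (the data and the penalty weight below are made up).

```python
# Accuracy alone vs. accuracy plus a fairness term (toy, made-up data).
def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def demographic_parity_gap(preds, groups):
    """Difference in positive-prediction rates between two groups (0 = even)."""
    rate = lambda g: sum(p for p, grp in zip(preds, groups) if grp == g) / groups.count(g)
    return abs(rate("A") - rate("B"))

preds  = [1, 1, 1, 0, 0, 0, 0, 0]            # classifier decisions
labels = [1, 1, 1, 0, 0, 0, 0, 1]            # ground truth
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

print("accuracy:", accuracy(preds, labels))                  # 0.875, looks fine alone
print("parity gap:", demographic_parity_gap(preds, groups))  # 0.75, group B never favored

# A combined objective trades the two off explicitly:
score = accuracy(preds, labels) - 1.0 * demographic_parity_gap(preds, groups)
print("accuracy minus fairness penalty:", score)
```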
And I think that's where the criticism that a lot of the programmers are white males is valid, because it never occurs to them that we actually care as much about fairness, or more, than we care just about consistency with the training data. Of course, the difficulty is, what does it mean to be fair? There's going to be a discussion about that. How could you encode it in the system? Yeah. So there are many different precise formal definitions of fairness, and different
definitions that are appropriate for different kinds
of prediction problems. And there are even legal differences,
for example, between life insurance and car insurance. Do you allow discrimination
based on gender? And that even varies by state and country. So there are policy questions. You can't have perfect fairness and the best possible accuracy level. So there are literally tradeoffs
between these two things. So how you make that trade off,
these are policy questions. So I think the process is emerging
whereby a certain number of formal definitions
are being accepted and people are working out when each of those definitions
is appropriate to use. And then from that, you can go
to guidelines and even legislation depending on application,
saying what has to be done. And this this happened in lending,
for example, even back in the sixties, where prior to that
there was extraordinary levels of racism in extending credit in mortgages
and so on. And so redlining, as you say. So there are quite strict rules, in fact, that prevent neural networks
or large language models, for that matter. Any opaque
model cannot be used for those decisions because the regulator can't inspect it
to see that it's fair in its processing. Another big thing that's really important, particularly in the European
Union AI Act, is the right to an explanation. So if your system's explanation is, well,
I've got these trillion parameters, that's not an explanation. So there's going to be a head on collision
between the technology that's developing
and the right to an explanation. I know that's a big issue in the mortgage
industry right now, is that they're using these models,
but they cannot give explanations as to why something was turned down
or was not. I don't know if the audience would be aware of how widely in use, in contrast to what we just saw here, these automated decision systems are. Things like bail decisions are made using these kinds of systems, medical decisions of all kinds, whether or not to accept
or reject a medical claim. Actually, the answer is no, right? For every medical claim, the answer is no. That's a very simple rule. Okay. But yeah, so I think there are benefits, I think, to trying to develop a level of consistency in bail decisions and sentencing decisions. But there's also a significant risk there. And again, in the European Union, it's illegal
to delegate those types of decisions. Any decision that has a significant legal
effect on a person cannot be delegated to an algorithm. But there is also an issue of the bias
in the data in terms of the amount of detail and samples
of one particular subgroup versus another, which can affect things
like access to medical care. We may have systems, well, you may want to give some examples, but in fact, these systems being trained mainly on white male faces has had some very interesting effects on places
where face recognition has been used for purposes of identifying suspects
and things like that. Yeah. So I sound like a broken record. You know, again, in the European Union AI Act there are very, very stringent restrictions and prohibitions on the use of automated facial recognition, for some of these reasons. Yeah. So datasets
that have been traditionally used for face recognition are not representative. Even the question of
what is a representative dataset does not yet have a clear answer
and there isn't a single answer. You know, it probably varies in terms
of, well, which country are you in, as to what 'representative' means. What kinds of distinctions matter for your country and the application
that you're going to use the system for. But I think there needs to be a great deal more work on these questions, which really have to do with what happens when you take an algorithm. It's not about
can you design a good algorithm? It's what happens when you take
an algorithm and put it in some context. Like, you know, in civil engineering, civil engineers can design bridges. But then there's another discipline of urban planning and environmental studies and analysis to say, well, is it a good idea to put that bridge there or to put this freeway here? And we don't have that in place for machine learning systems. Now, much of your work has been on
making sure that we don't build systems that hurt or kill human beings,
which, of course, we don't want them to do,
except when we do. And so I wanted to raise this question
because I know you've studied it extensively. How do
you feel about the use of these types of technology in systems, in weapons? So I suggested that for the professional societies in computer science, artificial intelligence, robotics, that we have a very simple code of conduct: do not write algorithms that can decide to kill humans. Right? It seems pretty reasonable to most people, but I can tell you that the governments of the world,
or many of the important ones, the United States and Russia, for example,
don't agree with that. And there have been discussions in the United Nations since 2014
on whether to ban lethal autonomous weapons,
as they're called, killer robots. As you might sometimes see in the press. And both
the US and Russia are blocking any attempt to develop a treaty
to ban these types of weapons. And the issue originally was thought to be that these AI systems might inadvertently mistake a civilian for a combatant
and kill a civilian. And this would be a problem
for international humanitarian law, which is the law that guides decisions about weapons, and the Geneva Conventions. But from the point of view of an AI researcher, it's completely obvious that if you make a weapon that can go out, locate and select and attack human targets without any human supervision, then just like any algorithm, I can put a loop around it saying, you know, for i equals 1 to 1,000,000, do, right? And then it does it a million times. So I'll press a button
and I can send out a million weapons to find a million targets
and attack and kill them. And so by definition, autonomous weapons can turn into weapons of mass destruction simply by scaling up
the numbers that you have available. And so this is the basis of the AI community's really strenuous objection to lethal autonomous weapons. They will be cheap, maybe $10 each. They will be fairly easy to produce in very, very large numbers. They'll be easy to proliferate. Right. They'll be available in the arms markets all over the world. And so it's sort of saying, you know what, come down to Walmart and buy your nuclear missile. Right now, maybe some of the Second Amendment people think that's a good idea. But really, we don't do that. It's not an ethical issue. It's not a sort of legal issue. It's just common sense that you don't sell
nuclear missiles in supermarkets. But that's what we're going to do. Well, I certainly agree with that. Thank you. But it's worth pointing out
this is a more subtle issue. If you can distinguish
between a combatant and a noncombatant, well, you can think of it selectively: instead of just killing everybody, we're just going to kill the people that we want to kill. That can be a moral obligation as well. But they may not be... the people you want to kill may not be combatants. Well, they may be people of a religious group or a certain age, gender, political opinion. You know, it's a tool, but it could also be used to clean up war, which may itself be a problem. So. Yeah, it's an interesting point. I mean, there is a mindset which says, you know, isn't it great that
we have these remotely piloted weapons, because, you know, then our soldiers' lives don't have to be put at risk. They can prosecute the war from far enough away. But that's sort of what I call the sole ownership
fallacy, right? That only
we are going to have those weapons. So what's happening in Ukraine
is that both sides have those weapons. And, in fact, the death rates
are much higher as a result. You used to be fairly safe in a trench,
but now you can just fly a little drone above the trench
and drop grenades directly into it. You can use the drone
as a spotter for the artillery so that now
the artillery can hit the trench directly. And so attrition is much higher
as a result. Soldiers are not safer. They're actually worse off. Yes. Maybe worth pointing out
that this decision that automated systems may need to make, between, let's say, a person and a tree, is exactly the technology that's currently embedded in many of the self-driving cars. And they face exactly the same sets of issues. Yes. And self-driving cars need to be 99.999999% reliable. They're not even close. Whereas a weapon only needs to be 50% reliable, and it's still going to be used. True. Let me take a question from the audience
here. Could it be used to simulate a dead relative? Oh, yeah. It doesn't have to be dead.
I suppose it could be used. Yeah, it's going to be used. If someone asks me something, go ask my avatar. So yeah, I mean, this is already being done. It's already a product on the market. And some people find it comforting. Some people, I think, find it probably psychologically unhealthy, and so I would suggest that we exercise some caution, and perhaps those types of products should be used under professional supervision, because I think it can create sort of a cycle of reminiscing
and dependency and pretending
that would be extremely unhealthy. It prevents someone from moving on
after a bereavement, for example. Well, you're also aware of some
of the work going on in brain imaging. And imagine that we got good enough
that we could simply read that out, essentially the parameters, and embed them in one of these systems so that, after you're gone, your
grandchildren could go ask you a question. Would you be willing to do that
to have your brain read out so they would know everything
that you've ever seen or heard or done? You don't have to think too deeply. It's a very hypothetical question
because we are so far away from being able to do that, but that is something that appears in science fiction in the form of what's called neural lace in the Culture novels by Iain Banks. And in fact that was the motivation for Elon Musk's Neuralink company. So I think the idea of replicating the functionality of a person's brain, although it's technologically decades or centuries away, perhaps, and extremely difficult, is at least conceptually feasible. Yes. The idea of uploading your consciousness, that somehow you would continue to exist as your sentient self, that is probably entirely fictional. You know, by the way,
we totally agree on that. What you see in the movies, where you plug it into a brain and we download it over here to this avatar, I have no idea what that means,
and I'm certainly not going there. I don't know if you have any questions from the Internet. Some of these questions, I'm afraid, I couldn't read the writing. So how do we take those questions, if we have any of our own? These are from the Internet, are they? Okay. Oh, okay. Let's see. Oh, here's one. I'm sorry, I did not see it. Will AI develop a moral compass? And if so, what, or whose? So that's a really interesting question. And in fact, it relates to the work
that I've been doing. First of all, actually, what is A.I.? We've been talking about it as if everyone knows what A.I. is. In general, it's certainly not just large language models; in fact, for all but the last two years, it wasn't large language models at all. So A.I. is really about, at least historically it has been about, making machines whose actions can be expected to achieve their objectives. And for example, the action of the large language model is to output the next word, and that may or may not achieve
the objectives of the system. So, you know, we've built planning systems and chess programs
and reinforcement learning systems that learn to ride bicycles and all these
kinds of things on this framework. And it's the same framework that
economists have developed for maximizing global welfare or maximizing quarterly
profits, the same framework that control theorists use for building
autopilots and chemical plants that optimize basically the accuracy of the level flight or the stability
of the chemical plant and so on. So this is a very general
and very natural framework, and AI borrowed this from economics and philosophy, actually, in the forties, as the core concept of what we mean by making machines intelligent. But the drawback in doing that is that we have to specify those objectives, right? The machines don't dream them up by themselves, and if we mis-specify the objectives, then we have what's called a misalignment between the machine behavior and what humans want the future to be like. And the most obvious example of
that is in social media, where we have specified objectives like maximize the number of clicks,
maximize the amount of engagement of the user
and the machine learning algorithms that decide
what billions of people read and watch. I mean, it's amazing. They have more control over human cognitive intake than any dictator, you know, than the North Koreans or Stalin or anyone has ever had. And yet they're totally unregulated. So those algorithms learn how to maximize those objectives, and they figured out
that the best way to do it is not to send you
what you're interested in, but actually to manipulate you over time by thousands of little nudges so that you become a much more predictable
version of yourself. Because the more predictable you are,
the more they can monetize you. And so they learned how to do that. And at least empirically, it looks as if the best way to do that is to make you more extreme, right? Then you start to consume that red meat that whole industries then spring up to feed.
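A minimal simulation of that dynamic, with entirely invented numbers, and with the nudging strategy hard-coded for brevity (a learning system would discover it by optimizing clicks rather than being told):

```python
# Feedback loop sketch: showing slightly more extreme content than the user's current
# taste, and letting each click drag the taste along, maximizes clicks over time.
import random
random.seed(0)

user_taste = 0.1   # 0 = moderate, 1 = extreme (invented scale)

def click_probability(item):
    # Users mostly click items close to their current taste.
    return max(0.0, 1.0 - abs(item - user_taste))

clicks = 0
for _ in range(1000):
    item = min(1.0, user_taste + 0.05)       # nudge: a bit more extreme than the user
    if random.random() < click_probability(item):
        clicks += 1
        user_taste += 0.02 * (item - user_taste)   # the click shifts the user's taste

print(f"clicks: {clicks}, final user taste: {user_taste:.2f}")   # taste drifts upward
```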
And this misalignment is the source of the concern that people have had about AI, going right back to Alan Turing,
who was the founder of computer science. In a 1951 lecture, he said, once the machine thinking method had started, it would leave our feeble powers far behind, and we should have to expect the machines to take control. So they take control
not because they're evil or because they spontaneously develop
consciousness or anything like that. It's just because we give them
some objectives that are not aligned with what we want
the future to be like. And because they're more capable than us,
they achieve their objectives and we don't, right? So we set up a chess match
which we proceed to lose. So in order to fix that problem,
I've been following a different approach to AI, which says that the AI system, while its only objective is to further the interests of human beings, doesn't know what those are, and knows
that it doesn't know what those are. It's explicitly uncertain
about human objectives. And so to the extent that there's a moral theory, it's simply that the job of an AI system is to further human interests. It knows that it doesn't
know what those are, but it can learn more by conversing with us, by observing the choices that we make
and the choices that we regret,
the things we do, the things we don't do. So this helps it to understand
what we want the future to be like. And then as it starts to learn, it can start to be more helpful. There are still some difficult
moral questions, I mean. The most obvious one is that it's not one person's interests. It's not one set of values. There's 8 billion of us,
so there's 8 billion different preferences about the future
and how do you trade those off? And this is a two and a half thousand year old question, at least, and there are several different schools
of thought on that. And we better figure out
which is the right one, because we're going to be implementing it
fairly soon. And then there are even more difficult
questions like, well, what about not the 8 billion people who are alive,
but what about all the people who have yet to live? How do we take into account
their interests? Right, right. What if we take actions that change who's going to live? You change the number of people who are going to live. For example,
the Chinese policy of one child per family probably eliminated 500 million people already. Now they never existed. So we don't know what they would have wanted, but how, you know,
how should we make that type of decision? Right. These are really difficult questions
that philosophers really struggle with. But when we have AI systems
that are sufficiently powerful that they could make those decisions, we need to have an answer ready
so that we don't get it wrong. And just to illustrate what that means,
what does it mean to get it wrong? If you remember, in The Avengers movie, Thanos, right, has this plan. He wants to get the Infinity Stones; once he's got the Infinity Stones, he can snap his fingers. And his plan is that if the universe had half as many people, they'd be more than twice as happy. Right? So he's not doing this because he just doesn't like people or anything. He wants to make the universe a better place. He's what you might call a very naive utilitarian theorist. And in fact, the Financial Times review of the movie says Thanos gives economics a bad name. So you don't want AI systems,
when they have Thanos's levels of power, to be implementing a naive moral theory like that. And so, you know, part of the job of the Kavli Center
that you mentioned at the beginning is to bring philosophers,
social scientists, political theorists, legal theorists and A.I. researchers and gene editors
and neurotechnology people together to start figuring out answers
to these questions before it's too late. Because, you know,
we are going to have gene editing. Do we want to allow people
to pay to have their children become more intelligent than they would
otherwise have been? Do we want neurotechnology that allows us
to connect two minds together and turn them into a single conscious entity? Well, we better figure it out
because otherwise the market is going to make that decision. Speaking of future generations, we have
a rather fun question from the audience. So Toby says: Professor Russell, if you have children, and I know you do because you said so, on a scale from 1 to 100, how concerned are you for their futures due to the risks of AI, where 90 equals 'regularly lose sleep'? It's a great question. So yes, I have four children, and this is probably actually
one of the most common questions that I'm asked
when I'm speaking to non-technical audiences: what type of jobs should my children be thinking about? You know, what types of career path
are going to exist in 20 or 30 years time? Should my children learn A.I. so that they can ride this wave rather than being drowned? So in terms of the existential risk, which would come from, you know, as Alan Turing
said, the machines taking control because once they take control,
so to speak, there's really no longer anything the human race
could do to ensure its continued survival. It might be that the machines
allow us to continue or not, right? We will be in the same position as
the gorillas are with respect to humans, right. There was this little thing that happened
a few million years ago where one branch of the primates
ended up more intelligent than the others. And so all the other branches now continue to exist basically because we allow it, and some of them have already gone extinct
as a result of competition with humans. So we don't want to be in that situation. I believe it's possible to develop
AI systems that are provably safe and beneficial, that we can retain control over
that actually want to be switched off. That's a really important thing, right? If we want to switch it off,
it needs to want to be switched off. And that's a consequence of the theory that I'm working on.
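A toy expected-value calculation (with made-up numbers) of why uncertainty about our preferences makes deferring to the human, and accepting being switched off, the better policy for the machine:

```python
# The robot's proposed action is worth +10 to the human with probability 0.6 and
# -10 with probability 0.4, but the robot doesn't know which case it is in.
p_good, value_good, value_bad = 0.6, 10, -10

# Acting unilaterally: the robot just takes the expected value of its action.
act_anyway = p_good * value_good + (1 - p_good) * value_bad      # = 2.0

# Deferring: the human switches the robot off whenever the action would be bad,
# so the bad outcome is replaced by 0.
defer = p_good * value_good + (1 - p_good) * 0                   # = 6.0

print(f"act anyway: {act_anyway}, defer and allow switch-off: {defer}")
# Deferring wins, so a machine that is uncertain about our preferences has a
# positive incentive to leave the off switch in our hands.
```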
But it's not a property of the kinds of systems that we're building now. So on the other questions, you know, what is the future of our coexistence
with machines? What types of lives will people have? How will they continue
to have valuable economic roles when AI can do pretty much all the jobs that humans can do? I think that's a really important question
for policymakers, because my guess is that the value that we can
provide will be much more of an interpersonal nature,
but it's not going to be the value that a factory worker can provide
because as we know, those types of jobs are already
being automated out of existence. It's not going to be in routine clerical work. I mean, a simple way of putting it, and I know, Jerry, you don't necessarily agree with this line of argument, but it goes: if you can no longer sell physical labor and you can no longer sell mental labor, it's not clear that there's another thing, right, that the human race can fall back on, except what we might call interpersonal or emotional or empathic capabilities, where we have this sort of intrinsic comparative advantage over machines, because we know what it's like. Well, right. And I give this example in the book.
Right. What's it like to hit your thumb
with a hammer? Right. Who's done that? Right, most of you. And someone who hasn't done that? Right, a few. Okay. Well, how would you find out what it's like if you didn't know? You would just hit your thumb with a hammer. You say, oh, now I get it. Now I understand why people are so upset
when they do that, right. But there's nothing a machine can do
to find out what it's like, right? They can at best
make empirical correlations and assume that it's unpleasant,
but they don't know what it's like. They don't know what it's like to be left
by your lover. They don't know what it's like to lose a parent or to lose a child or to lose a job, or to be promoted, or any of the feelings of what it's like to be human. And so there
we have this comparative advantage, and there are also things that we just don't want to be done by machines right? I imagine that at some point in the future
there'll be a profession that we might call 'luncher': someone who's really, really good at having lunch with you, right, where you have lunch with them and you go away feeling much better about yourself, entertained, amused, wiser, more positive, and so on. Right. And you won't get those feelings if that was a robot. Well, so we'll see. The difficulty is that most of these interpersonal jobs right now are low-status
because they are not based on real scientific
understanding. If you compare babysitting
with orthopedic surgery. Right. My children are actually more important to me than my arms and legs. Right. But we pay the orthopedic surgeon
100 times or 1000 times as much per hour
as the babysitter. Not here in the Bay Area. So why is that, right? Well, it's because the surgeon gets to depend on literally
thousands of years of medical research on how to do this
effectively and successfully. Whereas, you know,
I remember one of my babysitters when I was seven
trying to teach me to smoke. And if we had, you know, a real science
of how to be a wonderful companion for a child, based on individual psychology, the psychology of the child and so on, and the training processes to go with it, then we would think of those as high-status professions
just as much as the surgeon. So there's a lot of science
we haven't done. The human sciences
have been enormously neglected. We have to catch up because those are
the sciences that will form the basis of our economic future, in my view. Well, the thing you and I agree on
is that the future of work is more interpersonal services
and things that are performance oriented. We're not going to want to go see four
robots play a string quartet. You know, we don't want to watch them
play basketball. So there's plenty of... that's where we may differ, as I think the historical evidence shows there will be plenty of such work in the future. But that's a different hour
of conversation, I'm sure. One thing you said I'd really like to focus on for the audience, and I'm afraid we're out of time, so say a few words in closing. You made a point that I think is very important, which is that we are currently training AI systems, but the truth is that AI systems are also training us. And that's one of the perhaps bigger dangers, that we will become a part of the optimization algorithms inside these,
whether that's buying stuff from Amazon or staying on social media to read another
couple of posts or whatever it might be. I think that most people
are not as aware of the fact that they are the commodity
that is being sold and they're not in control of that
at this time. And manipulated. And it's going to get worse. It may get way worse. Social media algorithms are really simple
machine learning algorithms. You're just a clickstream to them. They don't know that you have a mind or a body or a brain or politics or anything. They don't even understand the content of the things that they're sending to you, right? They just learn that this is the right thing to send next, to get the person to be more click-producing in the future. Yes. If we can
make the thing to send next be something that makes them more empathetic and knowledgeable and rational human beings, that would be an improvement over the current state of the art, certainly. Yeah. In the meantime, I think actually there are
certain classes of algorithms, called reinforcement learning algorithms, that we should literally ban in those types of user-facing roles, because by definition they are going to manipulate. Yeah, well, unfortunately, we are out of time. I want to thank Professor Russell, and let me see, I'm supposed to say
you'll be sticking around to sign copies of his book, highly recommended. And if you want to watch more programs or support the Commonwealth Club's efforts in making both virtual and in-person programming possible, please visit commonwealthclub.org/events. So I think we can end there. Thank you. Thank you.