Stuart Russell – AI: The Story So Far – CSRBAI 2016

Video Statistics and Information

Reddit Comments

Slides, and links to other - more technical, really cutting edge, contemporary stuff: https://intelligence.org/colloquium-series/

3 points | u/Pas__ | Aug 01 2016 | replies

nice

thanks

1 point | u/harbifm0713 | Aug 01 2016 | replies
Captions
Good morning, everybody, and welcome to the colloquium series. I'm really excited about today's lineup of distinguished speakers, starting with Professor Stuart Russell, Professor of Computer Science and Smith-Zadeh Professor in Engineering here at the University of California, Berkeley. It would take too long to list his qualifications, awards, and contributions, so I'll just mention the book he co-authored with Peter Norvig, Artificial Intelligence: A Modern Approach, which is now used by universities numbering in the quadruple digits, in countries numbering in the triple digits, across the world. He has also been a powerful influence in getting the field of artificial intelligence to take seriously the positive and negative effects of future advances in AI on the world and on the things we care about. In that respect he has been speaking to influential groups, at major conferences as well as recently at the Davos World Economic Forum. So we're very pleased to have him here today to talk about the prospects for re-founding the field on provably beneficial artificial intelligence, and MIRI has also been pleased to have him as a research advisor, helping direct us toward the things that are important for us to work on. So please join me in welcoming our very first speaker, Professor Stuart Russell.

Thank you very much. I made a last-minute decision to switch to a much shorter talk, which will hopefully give us much more time for discussion. So I'm going to dispense with the usual preliminaries, where I talk about what AI is and what's happening now and all this amazing progress and all these milestones, and just say: let's take it as given, for the sake of argument, that eventually we will exceed human capabilities in some still not very clearly specified sense, partly because we don't really know what human capabilities are. If we think about what it means to make decisions, and to make better decisions - taking into account more information, having a better model of how the world works, computing more extensively on that model, and looking further and further into the future; think of AlphaGo moving from the Go board to the whole world - then AI systems are going to make better* decisions than humans. I put an asterisk on that. An asterisk is what linguists use to mean "this is not quite a felicitous expression in the natural language." What could I possibly mean by putting an asterisk on "better"? There's a piece missing: not just taking into account more information and looking further into the future, but what is the objective being optimized in making the decision? That turns out to be the crucial point.

The upside, as Nate mentioned, is pretty large, because pretty much everything we have is the result of our being intelligent, and so if we had more intelligence at our disposal to use as a tool, we could do all kinds of wonderful things. Each of these areas has been problematic for the human race pretty much forever, and the last one, ecological degradation, is getting much worse. It seems it couldn't hurt to have access to more intelligence to help, and you can imagine very concrete ways in which it might be useful. One of the biggest issues with poverty and disease and war is actually coordination: it's not that we don't know what to do about these; it's that we have difficulty with the management of collective decision-making and implementation, and that is something AI can clearly help with - if you like, global distributed governance at a micro level, where lots and lots of people have to do lots and lots of things for it to work. So in the long run we could get away from the constant fight with ourselves and with necessity - with physics, in a sense - and actually choose how we want human life to be. That could be very good, or not; at least we would have a choice. Whether we know how to make that choice is another question, but it would be nice to have one.

Then the downside. Everyone knows about killer robots, everyone knows about the end of employment, and then there is this other thing, the end of the human race, which seems to be a very popular theme these days. But most of the discussion of this theme, at least in the media and among the people I meet when I go around giving these talks - almost everyone seems to have got hold of the wrong end of the stick, and there are many wrong ends of this stick. There is a general sense, going back certainly to Alan Turing, who said that at best we should expect the machines to keep us as pets, or words to that effect, that if you make something much smarter than you are, you might find yourself in the situation of the gorillas. Here they are having a meeting - this guy is falling asleep, so you can tell it's a meeting - and they're discussing whether it was a good idea for their ancestors to have created this human race, these human things that are much smarter than they are. They're having a really hard time with the issue, and I think they have pretty much concluded it was a terrible idea, because now they have no control over their own futures and could easily go extinct; if they had the ability to conceptualize their own state, they would probably be very sad about it. But that's a very inchoate fear, and it gets translated in the media into all kinds of things, like armies of killer robots spontaneously rising up and deciding they hate human beings, and so on. Hollywood sometimes gets it almost right and mostly gets it mostly wrong.

More specifically, the problem is this: the machines are going to be incredibly good at making decisions and doing stuff, but somehow it isn't the right stuff. If they were incredibly good at making decisions and it was the right stuff - if they really were helping us realize whatever it is we decide we want to realize - that would be what we want. So it must be that they're doing something else; the objective on which they're making decisions is not the right one. And unfortunately AI, by and large, and the neighboring areas - operations research, control theory, and so on - all assume that specifying the objective is not part of the problem at all. It's simply given; the user knows what it is. In control theory it's, say, squared error with respect to the reference trajectory. Why squared error? Because it makes the equations easier, not because it has much connection to what anyone really cares about. So there actually isn't a lot of help: when you say, OK, we have to get these objectives right or we're in trouble, what discipline can I turn to? There really isn't one.

Norbert Wiener pointed this out. This is a very useful paper - I don't know if you have a reading list, Nate, for the group, but I often point journalists to it. He wrote it in Science, I think in 1960, after looking at Arthur Samuel's checker-playing program, which learned to play checkers better than its own author - a very early demonstration refuting the usual claim that machines can only do what we program them to do, so we needn't worry. And he said, in effect: if we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere, we had better be quite sure that the purpose put into the machine is the purpose we really desire. That's a pretty clear statement of the problem from 56 years ago.

But arguably that statement could have been written by King Midas - there's some uncertainty about the date. The story of King Midas is, in both microcosm and macrocosm, a lesson for humanity. Whoever it was that granted King Midas's wish took his objective literally, and then it was too late: once his food and his wine and his daughter had all turned to gold, he couldn't undo it, and he said, damn, I wish I had said it right. You find this in other cultures too: a genie grants you wishes - this goes back to the time of King Solomon, and there are versions in Jewish culture, in Arab cultures, and in lots of others - you ask for wishes, you get exactly what you asked for, and your last wish is "please undo the first two wishes, because I got them wrong." In the macrocosm, the story is telling us something about what we are wishing for: the ability to automate everything, to have super control over everything, essentially unlimited power, may actually be a poisoned chalice for the human race in general, not just for the individual. So we had better be careful about our macro policy.

Steve Omohundro pointed out some additional problems. It's not just that a machine with the wrong objective sets up, in some sense, a chess match or a Go match between the human race and a machine busily pursuing that wrong objective - and we know how those matches go. Steve pointed out that it's actually worse than that. If you give a goal to a machine, then even if you never mention that it should preserve its own existence - Asimov didn't really need the third law saying that machines should avoid harm to themselves, because it's unnecessary - the machine will nonetheless form self-preservation as a subgoal, because you can't fetch the coffee if you're dead. You give the machine the goal of fetching the coffee; the machine figures out, based on physics, that if it's dead it can't get the coffee; so it naturally acquires a subgoal of not being dead, purely as a consequence of needing to get the coffee. That's a very straightforward point. Also, for typical goals in the real world, you improve your chances of success by having more resources - more computational resources, more money, and so on - so, all other things being equal, you're going to want to acquire more of those. If a machine has the wrong objective and is going to have these things as subgoals, you can clearly see that you're going to have problems. That's the high-level story, and it's a pretty straightforward story.

There have been a number of arguments for why we should nonetheless pay no attention to this issue, so I thought it would be helpful to go through some of them; we can discuss them further at the end. You will come across these - you have probably come across many of them already. (I'm sorry, this color isn't ideal for the lighting situation - could we turn the lights down? We thought they were low enough, but apparently not, given that I chose the wrong color. The orange text is the things that other people say.)

One typical response is: it's never going to happen - we're not going to achieve human-level AI, so it's pointless to worry about this - or it's so far off in the future that worrying is ridiculous. And I think if you had gone to people a million years ago - pre-humans, actually - who had figured out how to make fire, and told them that this fire stuff was going to cause global warming and they should stop, that would not have been good advice. So if AI were going to happen a million years in the future, then yes, it's probably too soon to even think about what we might do. In response I sometimes point to a historical example. Ernest Rutherford was the most famous nuclear physicist of his time - not a weird fringe dude, but the main guy in nuclear physics - and here's what he said on September 11th, 1933: essentially, that it will never be possible to get energy out of atoms. They knew the energy was in there - they had done the mass-defect calculation, they knew E = mc², they knew how much energy was there - but his considered view, which he expressed in many forms and on many occasions, was that it was impossible to ever get it out, and even Einstein more or less agreed. He said this at a meeting of the British Association for the Advancement of Science, and it was reported in The Times. Leo Szilard read it in The Times the next morning, got annoyed, and invented the neutron-induced nuclear chain reaction. Within a few months he had patented an early version of the nuclear reactor, with negative-feedback control mechanisms to damp out the chain reaction; soon after that, people were patenting nuclear bombs, and so on. So it went from "never" to sixteen hours. It's very hard to predict these things, and "I'm an expert and it's never going to happen" is not a good enough argument. This is what Szilard wrote after he witnessed a fission chain reaction demonstrated: there was little doubt in his mind that the world was headed for grief - because at that point they were in an arms race with Germany, and he anticipated nuclear conflict with Germany.

Another version is: it's too soon to worry about it. If you ask people when they think it's likely to happen - I generally try to avoid giving predictions, precisely because of the nuclear physics example: it requires breakthroughs, and it's very hard to say when those will happen - but if you ask people in or near the field, they'll give you some number that looks like 50 to 75 years. Some say earlier, but not many think it won't happen this century. Now, if I said that in 50 years' time a giant asteroid is on course to collide with the Earth, the response would not be, "well, it's way too far away to even start thinking about the problem - come back in 48 years and then we'll give you some funding to work on it." And arguably, for climate change, the right time to intervene would have been around 1900, when we already knew the basic physics - Arrhenius and others had published quantitative calculations of the greenhouse effect and projections of carbon dioxide, and influential people like Alexander Graham Bell had said this is going to be a major problem, we have to do something - but it was ignored. I haven't looked at the history of exactly why people didn't pay attention then, but that would have been a time to intervene, before the fossil fuel industry and electrical power production became so important to our entire economy that they are now very hard to change. You could have started investing in wind power, solar power, improved battery technology, and so on a long time ago, but we didn't.

My distinguished colleague Andrew Ng has another version of this: it's like worrying about overpopulation on Mars. He has since changed that to Alpha Centauri, to make it seem even more ridiculous - or perhaps he decided that on Mars it is reasonable to worry about rovers; having seen The Martian, I'm not sure. It's an appealing analogy, but I think it's totally misleading. Another version, which I saw in a paper recently, is that it's like worrying about black holes suddenly materializing near us: yes, if that happened it would be terrible, but there's no particular reason to think it will, so it's silly to worry about it. The answer to both is this: if we were spending billions of dollars to move the human race to Mars without thinking about what we would breathe when we got there, that would be silly; and if we were spending billions of dollars to cause black holes to materialize in near-Earth orbit, it would be reasonable to ask whether that's a good idea, whether we've thought about the consequences, and how we would prevent the obvious sequel. So I don't find Ng's argument convincing. If you're going to claim that this is just like worrying about black holes materializing, I say no, it isn't just like that: the onus is on the person making that argument to actually show that AI is harmless - that it isn't a black hole - because we are spending billions of dollars to make it happen.

Another version: if the problem comes from us giving objectives - make some paperclips, or whatever - to the AI system, then it's better not to give it goals at all; just let the machine invent its own objectives. Which is a little odd. It's like saying that if you have a problem steering straight, the best thing to do is remove the steering wheel altogether and leave it to chance to make the right thing happen. And then there's something you see a lot - IBM, for example, has this general view of why we don't have to worry: because we're going to have beneficial human-AI teaming, so it won't be machines independently operating and deciding what to do; humans and AI will work together in teams. But you can't have a human-AI team unless the team members are aligned in their objectives, so this is just a restatement of the problem. Yes, of course we want beneficial human-AI teaming, but that is exactly begging the question: how do you ensure that the AI part of the team is actually on the team?

Another common response: OK, you're right, it's a real issue, but there's nothing whatsoever we can do about it, because it's well known that you can't control research - there's no way to put a stopper on human creativity - and then people show cute movies of kids interacting with robots at exhibitions: look at this outpouring of human creativity, there's nothing you can do about it. There is some validity to that, but it's not really true. We can, and we do. Biologists deliberately decided that engineering the human genome is not something we want to do, and that was a complete switch, because an awful lot of work in genetics and early molecular biology was precisely about the ability to improve humans. Then it was decided that perhaps that isn't an ideal goal for biology, because it opens up a Pandora's box of genetically engineered humans and all the rest of the things that science fiction has already explored. So they said no, and it's been 40 years and it still hasn't happened - although the issue has been reopened recently with CRISPR technology, and even the inventors of CRISPR believe we shouldn't use it to engineer better humans.

Another interesting reaction: this is just typical Luddism - you're attacking AI, attacking technology. Elon Musk and Stephen Hawking and various other people - I guess everyone who signed the open letter on robust and beneficial AI - were included as winners of the 2015 Luddite of the Year award from the Information Technology and Innovation Foundation, who seem to be vehemently opposed to any of these thoughts. I just think this is misdirected; it's a complete misunderstanding of what we're saying. If a fusion researcher says that fusion reactions need to be contained in order to be safe, that doesn't make them a Luddite, and they're not attacking physics by saying it. It's ridiculous to say that Turing was attacking AI by pointing out this long-term issue, or that Wiener was attacking AI, or that Bill Gates is - these are people who put a lot of their effort into creating AI in the first place.

Another reaction you often see, even from very distinguished AI researchers, is that there isn't really a risk, because if there's anything we don't like we can just switch off the machine and that solves the problem - as if a superintelligent entity couldn't possibly have thought of that eventuality. It's like saying that if you're losing a game against AlphaGo, well, just win - what's the problem? Some people say that if we just avoid anthropomorphizing, and don't put in goals like self-preservation, then of course there won't be a problem. Steven Pinker's version is that we should just make female AIs - they wouldn't want to take over the world; he literally said this, that it's just these stupid male AI researchers who don't get it. But you can't not put it in. It doesn't matter whether you put it in; it will arise anyway, because you can't fetch the coffee if you're dead. I'm happy to discuss any of these further, and you may have heard other arguments that you're not sure how to respond to.

So here is the proposal. Part of the problem is the way AI is traditionally conceived - and I bear some guilt in conveying this idea - namely that AI is about rational behavior, which means optimizing objectives, a conception that simply doesn't address the issue of what happens if the objective isn't the one you actually want optimized. So could we change AI into a different field? Initially we'll call it provably beneficial AI* - and you can see why there's an asterisk, because the phrase is almost oxymoronic: "beneficial" is vague and touchy-feely, and "provably" doesn't seem to fit with that. Eventually it will just be called AI, because, just as a civil engineer doesn't say "I work on bridges that don't fall down" - you just say "I work on bridges," since not falling down is intrinsic to bridge design - it should be intrinsic to AI system design that systems are beneficial to you; that's sort of what it means to do AI. But for the time being we have to distinguish it from traditional AI.

How do you do that? Here is one way - and there are others: there's a whole range of research on trying, in some sense, to constrain the behaviors of AI systems, which I'm not going to talk about, but it's a completely plausible and interesting, though as yet totally unsolved, direction. If we want to get rid of the problem of misaligned values, you could say the only way is to get the values to be exactly the same - to give the machine exactly the objectives of the human race - and then everything's fine. But that is too difficult, and it also isn't quite necessary. So, first, note that Asimov's laws, or at least one of them, are superfluous: we don't want the robot to care about itself at all. It has no intrinsic objectives whatsoever; its only objective is to optimize human values. Second - and this is the crucial point - it doesn't know what those values are. This gives you, if you like, soft alignment: the machine is at least compatible with humans, because it is uncertain about what the human objective is, and, as we say in probability, the support of its distribution includes whatever the true human value function might be, even though the machine isn't sure which of the possible value functions is right. That turns out to be quite helpful. Third: you could have a robot that is very, very uncertain - it doesn't know whether humans like losing legs, or like gaining extra legs, or just like having the number of legs they have - but that's not a very helpful robot, because now it says, "I'm really not sure what to do to help you." So you want it to get better at understanding humans, so it can be more helpful, and the information source is there: the raw data, the ground truth if you like, is contained in human behavior, because behavior reveals information about human preferences. Those three simple ideas can be put together in various ways to start making progress.
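A compact way to state the second and third points together (an editorial gloss in my own notation, not a formula from the slides): the robot chooses actions to maximize expected human value under its current belief over possible human value functions, where the true value function is assumed to lie in the support of that belief:

\[
a^{*} \;=\; \arg\max_{a}\; \mathbb{E}_{\theta \sim P(\theta \mid \text{evidence})}\big[\, U_{\theta}(a) \,\big],
\qquad \theta_{\mathrm{true}} \in \operatorname{supp}\, P(\theta),
\]

with the "evidence" being observations of human behavior that narrow the belief $P(\theta)$ over time.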
So, a version of the self-preservation thesis from Omohundro goes like this. Suppose you have a robot with an off switch, so that someone can come along, press the switch, and the robot is dead. If you take Omohundro literally, then if the robot has the objective of getting the coffee, one way of failing is that someone comes along and presses the off switch; so if the robot has an action that permanently disables the off switch - a sort of internal off-off switch - it will naturally take it. There's no cost, and it removes one branch of the tree that would lead to failure, so it's clearly a good idea from the robot's point of view. When you put it like that, it's hard even to think of a way around it, and in fact when you put it into mathematics there is no way around it; it's unavoidable. But if, instead of giving the robot a precise objective, you allow it to be uncertain about the objective - for example, it knows it's supposed to get coffee, but it's uncertain about the signs of the other variables in the value function: is it allowed to kill people who get in the way of the coffee machine? it's not sure - then its behavior will be different because of that uncertainty in the value function. So now you have uncertainty about the human objectives, plus some attribution of rationality to humans - it doesn't have to be perfect, but human behavior has to be at least correlated with human objectives - and then, roughly speaking, you can think of the human action of switching off the robot as providing information to the robot about what the human's true value function is: in particular, that whatever the robot was about to do is not helping, which is why we're switching it off. So the robot should be happy to be switched off, because that leads to an outcome that is better for the human than if the robot disabled the off switch. When you do the math, that works out, and in fact the margin of safety is proportional to the allowed amount of uncertainty about the human value function. Of course, the more uncertainty there is about human value functions, the less helpful the robot can be, and that seems to be an unavoidable trade-off. But the consequence is that it is actually in the robot's interest to leave the off switch available.
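To make the intuition concrete, here is a minimal numerical sketch - my own toy payoff model, not the actual game from the talk; the Gaussian belief, the payoff numbers, and the three options are all illustrative assumptions.

```python
import numpy as np

# Toy model of the off-switch argument. The robot's proposed action has true
# value u to the human (the robot's only objective is human value). Options:
#   act     -> payoff u
#   defer   -> human switches the robot off if u < 0 (payoff 0), else lets it act (payoff u)
#   disable -> disable the off switch, then act -> payoff u
rng = np.random.default_rng(0)

def expected_payoffs(belief_samples):
    u = belief_samples
    act = u.mean()
    defer = np.maximum(u, 0.0).mean()   # human intervenes exactly when u < 0
    disable = u.mean()                   # identical to acting; the switch never helps a certain robot
    return act, defer, disable

# Case 1: the robot is UNCERTAIN about u (zero-mean belief).
uncertain = rng.normal(loc=0.0, scale=1.0, size=100_000)
print("uncertain belief:", expected_payoffs(uncertain))
# defer > act = disable: keeping the switch available is strictly better,
# because the human's decision to press it carries information about u.

# Case 2: the robot is CERTAIN that u = +0.3 (a fixed, possibly wrong objective).
certain = np.full(100_000, 0.3)
print("certain belief:  ", expected_payoffs(certain))
# act = defer = disable: the switch is worthless to the robot, and any small
# chance of being wrongly switched off tips it toward disabling the switch.
```

The qualitative pattern matches the claim in the talk: the incentive to preserve the off switch comes entirely from the robot's uncertainty about the human value function, and it disappears when the objective is treated as known.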
Let me now talk a little about the third point, value alignment: how do we learn what the value function is, how do we narrow down this uncertainty from observed behavior? There's an old field called inverse reinforcement learning, and it has other versions. In economics and applied consumer theory they do something called preference elicitation: you present consumers with 81 different versions of headphones and ask how much they would pay for each, or which ones they like better, and so on, to try to figure out the human value function for headphones. Those are non-sequential decision problems - do you want this one or that one? But there's another field called structural estimation of MDPs, where, for example, economists look at when people have children and try to figure out the value of children from people's sequential child-bearing behavior, and things like that. The general idea is that behavior is a very complex manifestation - made complex by the environment in which the behavior is produced - but underlying it there is a simple explanation, which is that the human wants some things and cares about some things. That, if you like, is the physics of behavior: the underlying law is that humans want things and act to try to get them, so you can invert the behavior to figure out what it is they want. This has been around in AI since 1998, there are quite effective and scalable algorithms, and there are several hundred papers on how to do it.

But it's not quite the right problem, for one obvious reason: you don't want the robot to adopt the value function of the human. That's a trivial point, but an important one. If the robot watches me struggling out of bed and wandering downstairs like a zombie to get my coffee, it can figure out that Stuart really likes to have coffee when he wakes up - but you don't want the robot to want coffee; that doesn't help. Adopting the value function is usually how it's done in inverse reinforcement learning: you watch a helicopter pilot, you learn what desirable helicopter maneuvers look like, and then the robot actually adopts that value function. So the framework we developed is a generalization of that, called cooperative inverse reinforcement learning, which is a game-theoretic setting. You have a human (or multiple humans) and a robot (or multiple robots); as I mentioned, the human has a value function and at least implicitly knows it, even if they can't make it explicit; the robot doesn't know it, and knows that it doesn't know it, but that value function is its objective to maximize. When you solve this game and look at its solutions, they automatically produce the kinds of behavior you want: the robot is cautious, it asks questions, and the human actually has an incentive to teach the robot, because the faster the robot figures out what the human wants, the more helpful it can be - and we can show little examples of this. This actually contradicts the inverse reinforcement learning assumption. The IRL assumption is that the human is acting optimally according to some value function, we observe the behavior, and we try to figure out what the value function is. But in this setting the human doesn't act the same way as they would if the robot weren't there: they will demonstrate things, they'll even point out what not to do, whereas a human by themselves would never do that because it would be pointless. So you get different solutions, and since the human is going to behave, as it were, non-optimally - at least in the isolated sense - the algorithms for learning from that behavior also have to be different. The standard IRL algorithms won't work in this setting and have to be revised, so it creates a much richer, more complicated, and more interesting setting.
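For reference, the cooperative inverse reinforcement learning setting can be written roughly as follows (a paraphrase of the formulation in the CIRL paper; the notation here is mine and may differ in detail). A CIRL game is a two-player Markov game with identical payoffs,

\[
M \;=\; \big\langle\, S,\; \{A^{H}, A^{R}\},\; T(s' \mid s, a^{H}, a^{R}),\; \Theta,\; R(s, a^{H}, a^{R};\, \theta),\; P_0(s_0, \theta),\; \gamma \,\big\rangle,
\]

where both the human $H$ and the robot $R$ receive the same reward $R(\cdot\,; \theta)$; the parameter $\theta$ (the human's value function) is observed by the human but not by the robot; $P_0$ is a prior over the initial state and $\theta$; and a solution is a pair of policies $(\pi^{H}, \pi^{R})$. Because the human's optimal policy in this game generally differs from how an optimal agent would act in isolation, plain IRL inference applied to the human's behavior in the game is the wrong tool, which is exactly the point made above.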
Here's a deliberately trivial example that my student Dylan Hadfield-Menell - who isn't here right now - put together. You have a grid world, and there are three locations that act as centroids of value; any of them could be positive or negative, and they radiate that value to their neighboring squares, as you can see here. This is a peak of value, this is another peak of value, and this is a region you want to avoid. If a human - a rational agent - is put in this environment, starting here, slightly to the left of center, then the optimal behavior is to go directly to the left-hand peak of value and stay there; that's the optimal solution for this environment. What I've shown here is what happens if you observe that behavior and run IRL: this grey map shows the conclusion the IRL algorithm draws about the value function underlying the behavior. In the posterior over value functions, the right-hand location - which in truth is highly positive - now looks slightly negative, because the demonstrator didn't go to the right, and that rules out the possibility that it is the highest-value square; so the mean of the posterior there sits slightly below zero. The demonstrator definitely didn't go down, so the algorithm is pretty sure that's not a good idea either. So you get the wrong conclusion from observing the behavior. If instead you solve the game - actually, this is one round of best response, so not a complete solution to the game - then after one round of best response the human's behavior is to visit both regions of high value, and this shows the posterior that the learning agent then obtains, which is much closer to the truth compared to the other one. So this is just a simple observation that the solutions of these two-player games are different from optimal behavior by one agent being observed by a second agent that's trying to figure out the values.
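Here is a one-dimensional toy version of that inference failure - my own construction for illustration, not Hadfield-Menell's actual experiment; the grid of candidate rewards, the distances, and the near-deterministic rationality model are all assumptions.

```python
import numpy as np

# Two "peaks" of unknown value theta = (theta_L, theta_R) sit at the ends of a
# line; the demonstrator starts slightly closer to the left one. An IRL-style
# observer sees only the demonstrator's choice and infers a posterior over theta.
gamma = 0.9
d_left, d_right = 2, 4          # steps from the start to each peak (left is closer)

def optimal_choice(theta_L, theta_R):
    """Which peak a rational isolated agent heads to ('stay' if both are bad)."""
    v_left = (gamma ** d_left) * theta_L
    v_right = (gamma ** d_right) * theta_R
    best = max(v_left, v_right, 0.0)
    if best == 0.0:
        return "stay"
    return "left" if v_left >= v_right else "right"

# Uniform prior over a grid of candidate value functions.
grid = np.linspace(-1.0, 1.0, 41)
thetas = [(tl, tr) for tl in grid for tr in grid]
prior = np.full(len(thetas), 1.0 / len(thetas))

# Observed demonstration: the human went to the LEFT peak.
eps = 0.01  # small chance of a non-optimal choice
likelihood = np.array([1.0 - eps if optimal_choice(tl, tr) == "left" else eps
                       for tl, tr in thetas])
posterior = prior * likelihood
posterior /= posterior.sum()

theta_R_vals = np.array([tr for _, tr in thetas])
print("prior mean of theta_R:    ", float(theta_R_vals @ prior))      # ~0.0
print("posterior mean of theta_R:", float(theta_R_vals @ posterior))  # < 0.0
# Even if the right-hand peak is in truth highly positive, "went left" drags
# the posterior on theta_R downward -- the wrong conclusion described above.
# In the cooperative solution the human would instead demonstrate both peaks,
# and the learner's posterior would land much closer to the truth.
```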
So, looking ahead, beyond trivial toy examples: let's imagine we take this seriously. We are actually going to need to figure out, to a large extent, what the human value function is, and that's easily a twenty- or thirty-year project. It's interesting to think about what the output would be. If you were a bunch of venture capitalists and I were here saying, "hey, I need funding to start this," you'd ask: OK, what are you going to sell at the end? I'm going to sell value functions - well, what exactly is that going to look like? Just try to imagine doing this and taking it seriously. What are the sources of information? There is actually an enormous amount of information about human behavior. Pretty much everything that's ever been written by humans is about people doing things - some of it very boring, like someone buying two bushels of corn and exchanging them for some arrowheads, but even that is useful information about the human value function - and then there are novels and newspaper articles and every television program; there aren't a lot of television programs that only talk about rocks and say nothing about what people do or care about. So almost everything out there is useful information. A lot of it, in newspaper articles and novels and everything else, is of the form: one person does something and another person gets upset or happy. That's also useful information; again, it's a form of behavior - it's not direct proof that one is wrong and the other is right, but it's evidence, and it can all be thrown into the mix if it's understood properly. To do this we'll need natural language understanding, and computer vision to understand all the TV programs and what everyone is doing, and speech, and everything else - there's lots of AI to be done to make this work - but it's easier than building the superintelligent AI system we are preparing for, so it should be feasible. And this is good news: we need to solve this much earlier anyway. This startup, the Values-R-Us corporation, will actually have customers fairly soon: self-driving cars, domestic robots. One example I give - I don't think I have the slides here; I just gave a talk in Korea where I made a little cartoon sequence - is a robot in the house: the little kids are sitting there, their plates are empty and they're hungry, the robot has to find something to eat, the fridge is empty - oh, and there's a cute little kitty - and the robot thinks, fine, we'll cook the kitty for dinner. Then there's a newspaper headline, and that's the end of the domestic robot industry. So there is a very strong economic incentive for self-driving car companies, domestic robot companies, and personal digital assistant companies: if they're going to be helping you book your airline flights and set up meetings, you don't want them making meetings with lunatics or booking flights to Antarctica. They all need to understand your value system fairly well, so there is a very strong economic incentive to get this right even fairly soon. That's good: it means this should be part of the AI industry, and we will be developing the technology.

These are related reasons to the concern about superintelligent AI, but much more mundane. The difficulties include the fact that humans are complicated, and some of them are nasty: there's a lot of bad behavior out there, so how do we avoid learning that robots should be supporting all these undesirable behaviors? It's not even clear to what extent our behavior can be successfully described as trying to optimize any value function at all; there are lots of reasons to think it can't, including the fact that evolution doesn't care about us as individuals anyway. A lot of evolutionary theory says: it's nothing to do with you and your desire to reproduce; it's actually small groups of genes, which exist across multiple species, that are the units of optimization and the things really being selected. And even if you think about the species as a unit: a species that's going to survive needs to do both exploration and exploitation, and one way a species explores is by producing individuals who are completely nuts - who act in extremely risk-prone ways, go off and explore, sail across some ocean they think is going to fall off the end of the Earth, and happen to arrive at another continent - completely nuts, the kind of stuff they do on Star Trek. It's not that the individuals involved are irrational; it's that the concept of rationality, in some sense, doesn't apply to individuals at all - they're fulfilling a function that is part of the rationality of the species, or the tribe, or the gene group, or whatever. So things can get really complicated in understanding the full spectrum of human behavior and how we infer anything from it. We are also computationally limited: if you watch two people playing chess and one of them loses, does that mean they wanted to lose the game? No - it's because they're both computationally limited, and one perhaps slightly more than the other. It could be that they're trying to lose - that does happen - but usually not. And of course all humans are individuals, there are differences across cultures, and then there are the questions of trade-offs: even if you learn the value functions of individuals, you can't optimize everyone's value function, because there aren't enough countries for everyone to be king or queen of one, and there isn't enough money for everyone to be a billionaire. So how do you deal with those? These are age-old questions in the social sciences, and we're not going to solve them just by observing human behavior, but by making everything much more explicit, mathematical, and empirical we can hopefully make a lot of progress, and maybe we'll learn more about what we think we should be doing, which will make us better at doing it.

So the consequences are various. The objective is, I think, in part to change how we think of the field so that it includes these considerations, and to ensure that what we're building actually produces behavior we're happy with. As I said, there are a lot of questions that social scientists have studied for a long time, and some of those concepts will have to be incorporated. And the last question is: when you get concrete and say, OK, in twenty years' time the Values-R-Us corporation is selling these things, what do they actually look like? It's not at all obvious. I could do it for chess very easily: I could sell you a value function for chess - nine points for a queen, five for a rook - and it's pretty straightforward, because chess is fully observable and there's no argument about whether you have a queen or not. But the inputs to a domestic robot are the video sequences coming in through its cameras, and you're not going to define value functions in terms of video sequences coming in through cameras - over zillions of pixels; that would be daft. So what are you going to do? That, I think, is a somewhat open question, but a technically important one to answer. And coming back to Norbert Wiener - in his paper, which I really do recommend reading, he points out that these questions are incredibly difficult: a scientist sees only a very local part of an unending stream that goes on for millennia, and might think that what he's doing is beneficial when in fact it could be entirely wrong. You have to look over a long time scale and try to figure out the answers, and it's very difficult, but you have no choice but to try to do it. I guess that's why you're all here. Thank you.

[Q&A follows.]

Yes, that's a good question. I think it's the point I mentioned towards the end: value functions have to apply in these partially observable environments, and how do you define them? Take something very simple, like whether the cat is alive or dead. You could put a higher value on the cat being alive than on the cat being dead, but for different robots the mapping from percept sequences to the probability that the cat is in fact alive or dead would be different. So presumably we would all have to agree on what we mean by "alive" and "dead," and then the robot manufacturer has to supply some recognizer for that, and you supply the values for "alive" and "dead" - this is all very hand-wavy. The problem is that "alive" and "dead" are not actually well defined. If you talk to anyone like a neurosurgeon who works in a hospital, it's extremely hard in many cases to figure out whether someone is alive or dead. One of my colleagues told me that his hospital allowed him to run experiments on people who had been officially declared dead, so he kept their bodies functioning on the ventilator - they were already dead, officially - and two of them got up and went back to work. So it's a tricky thing; it's not really well defined, and this is exactly where the worry about superintelligence lies: they find the loopholes. They find ways of achieving what you specified as the objective that are so extremely counterintuitive you would never have imagined they would think of them. It's just like tax law: you think you've ruled out a loophole, and people find some completely bizarre way around it - they pay their employees in gold coins, because a five-dollar gold coin is legally five dollars, I give you one each, and you don't have to pay any tax because you're only making five dollars a month. That's the kind of example, but they will come up with much, much more devious ways, including with "alive or dead." People in the existential-risk literature talk precisely about situations that are arguably in this gray zone, or they'll point out: yes, you're still alive, but you're immobilized in a box with a heroin drip, and you might say you'd rather be dead, but no - you're alive; the stated criteria were met. So that, I think, is where the question of having alignment that is not perfect comes in. You get a clash of intuitions. One intuition says: if the value function the robot is optimizing is within epsilon of the true value function, then nothing too bad can happen, and maybe you can prove that the most you can lose is something like epsilon squared over one minus gamma. But the other intuition says: if the robot is way, way smarter than you, it can somehow use that epsilon as a loophole to produce something that, in the long run, you are extremely unhappy with. That seems like a question that can be attacked mathematically; I think it will come out the right way, but I'm still not sure about it.
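For the record, one standard form of the kind of bound alluded to in that answer looks like this (my reconstruction under a bounded-reward-error assumption, not necessarily the exact statement Russell had in mind): if the reward model $\hat R$ used for optimization satisfies $\|\hat R - R\|_{\infty} \le \varepsilon$, then the policy $\hat\pi$ that is optimal for $\hat R$ loses at most

\[
V^{\pi^{*}}_{R}(s) \;-\; V^{\hat\pi}_{R}(s) \;\le\; \frac{2\varepsilon}{1-\gamma}
\qquad \text{for all states } s,
\]

where $\pi^{*}$ is optimal for the true reward $R$ and $\gamma$ is the discount factor. The proof just bounds $|V^{\pi}_{R} - V^{\pi}_{\hat R}| \le \varepsilon/(1-\gamma)$ for any fixed policy $\pi$ and applies it twice. The unresolved tension in the talk is whether such worst-case-in-value bounds rule out the "loophole" scenarios, since an epsilon measured in reward can still correspond to outcomes humans would regard as qualitatively terrible.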
[Audience comment:] The reward functions in inverse reinforcement learning are not fully identifiable - they're posited to exist but never directly observed - so you have all the challenges of latent-variable learning: often you cannot pin down their exact values. The reason to use them anyway is that they give a very compact, maybe even approximately causal, explanation of the behavior. So it will be tough, and it means the AI system needs to know that it doesn't know - which comes back to your very first slide - and needs to behave robustly with respect to that.

[Russell:] Yeah, but I almost want to put on my Lotfi Zadeh accent and say: it's not just that we can't observe "alive" and "dead" directly; it's that they aren't well defined at all. Even notionally you can't say, here's a particular world where "alive" is true and here's another where it's false - there isn't always a clean dichotomy.

[Audience:] But is that uncertainty treated the same way? In other words, do you take expectations over it, or do you take the worst case over it?

[Russell:] That's one way of thinking about it. Of course, the other response is to try to find those corner cases and then check with people: so, what do you think about this one?

[Audience:] What is more dangerous - in cooperative IRL there's this particular agent, the human, which is a well-defined thing, but in the real world there's uncertainty, and this relates to diversity: what counts as a human, and which human actions count? Do you think that changes things very much?

[Russell:] What counts as human - it's not just having two arms and two legs; it's what counts as what you care about. In some sense that's a political question: should we include in our observations the behavior of the clinically insane? What about animals, and so on? I'm not sure I can construct anything that amounts to a rigorous definition, or say exactly whose preferences count - it gets back to the earlier point. I don't know whether we will have to be microchipped at birth, with non-fakeable microchips, so that the system knows who counts as real. And maybe we need to make sure it doesn't ever make people.

[Moderator:] I think we have time for just one last question.

[Audience:] Is there some way to test whether you've got it right?

[Russell:] Funny you should ask - I was just thinking about that last night. I was imagining a large library of decision-making scenarios, each represented by, say, an embedded 3D virtual-reality experience that goes on for some time, in which the robot has to decide how to behave - kind of like a driving test. If you've got a hundred thousand scenarios and you're sure that across all of them it's behaving adequately well, that would be a good start for a domestic robot, I think, and for a self-driving car.

[Audience:] Doesn't that assume that we humans can induce from those hundred thousand scenarios what exactly the value function is? The assumption seems to be that we can largely act in a morally and societally reasonable way across a large variety of settings, but we're unable to make explicit, in a reliable way, exactly what the value function is - enough for you to say, yes, here is the right value function.

[Russell:] Yeah - think about the ImageNet competition. I can recognize all these objects, I can build a system that learns by some mechanism to recognize objects, and I can test it on a million test cases, but I can't write down the discriminating function. It's just like that. And that's one way of doing it; in some sense it's precisely what I'm proposing, except that I'm looking at training data that occurs naturally - the actual behavior of the human race - where I haven't validated that everyone is behaving precisely according to the values I want the robot to learn. You still have to be able to learn from that kind of data. King Midas did what he did, and then he expressed remorse; you can learn from that behavior sequence something about human values - that people like gold, but they like their daughters even more than gold, and also that they're not 100 percent rational in anticipating the outcomes of their choices. That's all good information, even though none of it is optimal behavior.

[Audience:] What we used to call near-miss examples, in the old machine learning literature, before it was entirely forgotten and replaced by a new machine learning literature.

[Russell:] Yes, near-miss examples - these are all good cases, and you could look at fables and other instructive stories for children as doing precisely that for human beings.

[Moderator:] All right, so let's thank Professor Russell again.
Info
Channel: Machine Intelligence Research Institute
Views: 6,884
Rating: 4.9252338 out of 5
Keywords:
Id: zBCOMm_ytwM
Length: 63min 21sec (3801 seconds)
Published: Thu Jul 28 2016