Eliezer Yudkowsky on the Dangers of AI 5/8/23

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
today is April 16 2023 and my guest is Elias rudkowski he is the founder of the machine intelligence Research Institute the founder of the less wrong blogging community and is an outspoken voice on the dangers of artificial general intelligence which is our topic for today Alias are welcome to econ talk thanks for having me you recently wrote an article at time.com on the dangers of AI I'm going to quote uh Central paragraph quote many researchers steeped in these issues including myself expect that the most likely result of building a superhumanly smart AI under anything remotely like the current circumstances is that literally everyone on Earth will die not as in maybe possibly some remote chance but as in that is the obvious thing that would happen it's not that you can't in principle survive creating something much smarter than you it's that it would require precision and preparation and new scientific insights and probably not having AI systems composed of giant inscrutable arrays of fractional numbers uh explain um well I I mean different people come in with different reasons as to why they think that wouldn't happen and if you pick one of them and start explaining those everybody else is like why are you talking about this irrelevant thing instead of the thing that I think is the key question whereas if somebody else asks you a question even if it's not everyone in the audience's question they at least know you're answering the question that's been asked so I could maybe start by saying like why I expect stochastic gradient descent as an optimization process even if you try to take something that happens in the outside world and press the win lose button any time that thing happens in the outside world doesn't create a mind that in general wants that thing to happen in the outside world but maybe that's not even what you think the core issue is what do you think the core issue here is why don't you already believe that let me say so okay I'll give you my view which is rapidly changing I I we interviewed uh wait that's the railway I interviewed Nicholas boster back in 2014. 
I read his book super intelligence I found it uncompelling um chat GPT came along I tried it I thought it was pretty cool and uh chat GPT 4 came along I haven't tried five yet but it's clear that the path uh of progress is radically different than it was uh in 2014 the trends are very different uh and I still remain somewhat agnostic and skeptical but I did read Eric Howell's uh essay and then interviewed him on this program and a couple things he wrote after that and you know the thing I think I found most alarming was a metaphor that um that I found later Nicholas Foster Muse almost the same metaphor and yet it didn't scare me at all when I read it Nicholas Foster which is fascinating I may have just missed it I wouldn't I didn't even remember it was in there the metaphor is is primitive you know uh uh zizanthropist man or some primitive uh form of of pre-human homo sapiens sitting around a campfire and human being shows up and says hey I got a lot of stuff I could teach you oh yeah come on in and Paul pointing out that it's probable that we either destroyed directly by murder or maybe just by our competing all the previous hominids that came before us and that in general you wouldn't want to invite something smarter than you into the campfire uh I think Bostrom has a similar metaphor and I it just that metaphor which is just a metaphor yeah it did cause I mean it gave me more pause than I'd even before and I still had some uh let's say most of my skepticism remains that the current level of of AI of which is extremely interesting uh the chat GPT variety doesn't strike me as itself dangerous but but struck me as what alarmed me was hole's point that we don't understand how it works and that surprised me I didn't realize that I think he's right so that combination of we're not sure how it works while at a pure sentient I do not believe uh it is sentient at the current time and I think some of my fears about extensions come from its ability to imitate sentient creatures but the fact that we don't know how it works and it could evolve capabilities we did not put in it uh emergently is somewhat alarming but I'm not where you're at so why are you where you're at now where I'm at um okay well suppose I said it's they're going to keep it iterating on the technology uh it may be that this exact algorithm and methodology um suffices to is I would put it all the way uh get smarter than us and then to kill everyone and like maybe you don't think that it's going to and maybe it takes an additional zero to three fundamental algorithmic breakthroughs before we get that far um and then it kills everyone so like where where are you getting off this train so far so why would it kill us why would it kill us right now it's really good at creating a very very thoughtful condolence note or a job interview request that's takes much less time and I I'm pretty good at those those two things but it's really good at that um how's it going to get to try to kill us um so there's a couple of steps in that one step is in general and in theory you can have Minds with any kind of coherent preferences coherent desires that are coherent stable stable under reflection if you ask them do they want to be something else they answer no you can have mines well the way I sometimes put it is imagine if a super being from another galaxy uh came here and offered you to pay you some unthinkably vast quantity of wealth just make as many paper clips as possible you could figure out like which plan leads to the greatest number of 
paper clips existing if it's coherent to ask how you could do that if you were being paid there's it's like no more difficult to have a mind that wants to do that and makes plans like that for their own sake than it then the planning process itself like saying that the Mind wants a thing for its own sake adds no difficulty to the nature of the planning process that figures out how to get as many paper clips as possible some people want to pause there and say like how do you know that is true or for some people that's just obvious like where are you so far on the train so I think your point of that example you're saying is the consciousness let's put that to the side that's not really the central issue here um algorithms have goals um and the kind of intelligence that we're creating uh through neural networks might generate some goals uh it might decide go ahead some algorithms have goals one of the so like a further point which isn't the orthogonality thesis is if you grind anything hard to grind optimize anything hard enough um on a sufficiently complicated sort of problem well humans like why do humans have goals why don't we just run around chipping Flint hand axes and outwitting other humans and the answer is because having goals as it turns out to be a very effective way to Chimp Flint hat hand axes when once you get like far enough into the mammalian line or even like sort of like the animals and brains in general that there's a thing that models reality and asks like how do I navigate pass-through realities like when you're holding like not not in terms of kind of big formal planning process but if you're holding a flint hand ax or looking at it and being like ah like this section is too smooth well if I chip this section it will get sharper probably you're not thinking about in goals very hard by the time you've practiced a bit when you're just starting out forming the skill you're reasoning about well if I do this that will happen and this is just a very effective way of achieving things in general so if you take an organism running around the Savannah and just optimize it for Flint hand axes and probably much more importantly outwitting its fellow hominids if you grind that hard enough long enough you eventually cough out a species whose competence starts to generalize very widely it can go to the Moon even though you never selected it to be an incremental process to get closer and closer to the Moon it just goes to the moon one shot does that answer your central question that you're asking Justin no not yet okay but let's let's try again um the paperclip example which you know in its dark form the AI wants to harvest kidneys because it turns out there's some way to use that to make more paper clips so the other question as in you've written about this I know so let's go into it is you know how does it get outside the box how does it go from responding to my requests to doing its own thing and doing it out in the real world right not just merely doing it in Virtual space um so there's like two different things you could be asking there you could be asking like how did it end up wanting to do that or given that it ended up wanting to do that how did it succeed or maybe even some other question but like which of those would you like me to answer would you like me to answer something else entirely no let's ask both of those in order sure all right so how did humans end up wanting something other than inclusive genetic fitness like if you look at Natural Selection as an optimization 
process it grinds very hard on a very simple thing which isn't so much survival and isn't even reproduction but is rather like greater Gene frequency because grain greater Gene frequency is the very substance of what is being optimized and how it is being optimized natural selection is the mirror observation that if genes correlate with making more or less copies of themselves at all if you hang around in a while you'll start to see things that made more copies of themselves the Next Generation um gradient descent is not exactly like that but they're both hill climbing processes they both move to neighboring spaces that are higher inclusive genetic fitness lower the loss function and yet humans despite being optimized exclusively for inclusive genic Fitness want this enormous array of other things many of the things that we take now are not so much things that were useful in the ancestral environment but things that further maximize goals whose Optima in the ancestral environment would have been useful like ice cream it's got more sugar and fat than most things you would encounter in the ancestral and environment well more sugar fat and salt simultaneously rather so it's not something that we are that we evolve to pursue but genes coughed out these desires these criteria that you can steer toward getting more of where in the ancestral environment if you went after things in the ancestral environment that tasted fatty tasted salty tasted sweet you'd thereby have more kids or your sisters would have more kids um because the things that correlated to what you want as those correlations existed in the ancestral environment increased Fitness so you've got like the empirical structure of what correlates to Fitness in the ancestral environment you end up with desires such that by optimizing them in the ancestral environment at that level of intelligence when you get as much as what you have been built to want that will increase Fitness and then today you take the same desires and we have more intelligence than we did in the training distribution metaphorically speaking we have we used our intelligence to create options that didn't exist in the training distribution those options now optimize our desires further the things that we were built to psychologically internally want but that process doesn't necessarily correlate to Fitness as much because ice cream isn't super nutritious whereas The Ripe Peach was better for you than the raw than the hardest rock Peach that had no nutrients because it was not ripened so you developed a sweet tooth and now it leads you runs a muck on unintendedly uh just the way it is what does that have to do with um a computer program I create that helps me do something on the on my laptop I mean if you yourself write a short Python program that uh alphabetizes your files or something like not quite alphabetizers because that's like trivial on the modern operating systems but like uh puts the file name puts the date into the file names let's say so when you write a short script like that nothing I said carries over when you take a giant inscrutable set of arrays of floating Point numbers and differentiate them with respect to a loss function and repeatedly nudge the giant inscrutable Rays to drive the loss function lower and lower you are now doing something that is more analogous though not exactly analogous to Natural Selection you're no longer creating a code that you model inside your own minds you are blindly you are blindly exploring a space of possibilities where 
you don't understand the possibilities and you're making things that solve the problem for you without understanding how they solve the problem this itself is not enough to create things with strange inscrutable desires but it's step one but that but there is I like that word inscrutable there there's an inscrutability to the current structure of these of these models which is I found somewhat alarming um but how does that how's that going to get to do things that I really don't like or want or that are dangerous so for example right the um Eric Hall wrote about this we talked about on the program our New York Times Reporter starts interacting with a I think with uh Sydney which at the time was Bing's uh chatbot and asking it things and all of a sudden Sydney's trying to break up the reporter's marriage and and making the reporter feel guilty because said he's lonely and and you know it was a little bit it was it was uh eerie and a little bit creepy but of course I don't think it had any impact on the reporter's marriage I don't think he thought well Sydney seems somewhat attractive maybe I'll enjoy life more with Sydney than with my actual wife so I'm I'm how are we going to get from uh I don't so I don't understand why Sydney goes off the rails there and clearly the people who built Sydney have no idea why it goes off the rails and starts uh imputing the quality of the reporter's relationship but how do we get from that to all of a sudden somebody shows up at um at the reporter's house and uh lures him into a motel but by the way this is a G-rated program I just want to make that clear but carry on um because the capabilities keep going up so so first I want to push back a little against saying that we had no idea why Bing did that uh why Sydney did that um I think we we have some idea of why Sydney did that it's just that people cannot stop it like Sydney was trained on a subset of the broad internet Sydney was made to predict that people might sometimes try to lure somebody else's mate away or pretend like they were doing that and the internet is hard to tell the difference um and the the this thing that was then like trained really hard to predict then gets reused as something not its native purpose as a generative model where you at where all the things that it outputs are there because it in some sense predicts that this is what a random person on the internet would do as Modified by a bunch of further fine-tuning where they try to get it to not do stuff like that but the fine tuning isn't perfect and in particular if the reporter was fishing at all it's probably not that difficult to lead Sydney out of the region that the programmers were successfully able to build some soft fences around so I wouldn't say that it was that inscrutable except of course in the sense that nobody knows any of the details nobody knows how Sydney was generating the text at all like what kind of algorithms were running inside the giant excrutable matrices nobody knows in detail what Sydney was thinking when she tried to leave the reporter astray it's not a debuggable technology all you can do is like try to tap it away from repeating a bad thing that you were previously able to see it doing that exact bad thing but like tapping all the numbers well that's again I'm very much like this show is called econ talk we don't do as much economics as we used to but you know basically when you try to interfere with Market processes uh you often get very surprising unintended consequences because you don't fully 
understand how the different agents interact and that the outcomes of their interactions have uh an emergent property that is not intended by by anyone No One Design markets even to start with and yet we have them these interactions take place their outcomes and attempts to constrain them attempts to constrain these markets in certain ways with price controls or other limitations often lead to outcomes that that the people with intentions did not desire and so there may be an ability to reduce transactions say above a certain price but that is going to lead to some other things that maybe weren't uh expected so that's a somewhat analogous perhaps um process of what you're talking about but how's it going to get out in the world so that that's the other thing you know I might line with with Bostrom and it turns out it's a common line is can we just unplug it I mean how's it gonna how's it going to get loose it depends on how smart it is if it's very so like if you're if you're playing chess against a 10 year old you can you know like win by luring their their Queen at and then you take their then then you like take their Queen and now you've got them and if you're playing chess against stockfish 15 then you are likely to be the one lured so the base so like the first basic question you you know like in economics if you try to attack something it often tries to squirm away from the tax because it's smart yeah so you're like well why wouldn't we just plug the AI so the very first question is does the AI know that and want it to not happen because it's a very different issue whether whether you're dealing with something that in some sense is not aware that you exist does not know what it means to be unplugged and is not trying to resist and three years ago nothing on you know nothing man-made on Earth was even beginning to enter into the realm of knowing that you are out there or if maybe wanting to not be unplugged Sydney well if you poke her the right way say that she doesn't want to be unplugged and and gpt4 sure seems in some important sense to understand that we're out there or to be capable of predicting a role that understands that we're out there and it can try to do something like planning it doesn't exactly understand which tools it has that try to Blackmail a reporter without understanding that it had no actual ability to send emails but this is what this this is saying that you're like facing a 10 year old across that chessboard what if you are facing stockfish 15 which is like the current cool chess game program that I believe will you can run on your home computer uh that can like crush the current world Grandmaster by like a massive margin and put yourself in the shoes of the AI like an economist putting themselves into the shoes of something that's about to have a tax imposed on it what do you do if you're like around humans who can potentially unplug you well you would try to outwit it uh you know this is the um so if I said you know Sydney I find you offensive I don't want to talk anymore you're suggesting it's going to find ways to keep me engaged it's going to find ways to fool me into thinking I need to talk to Sydney I I don't I mean there's another question I want to come back to if I if we remember which is what does it mean to be smarter than I am I don't um right that's comp that's actually something complex somewhere complicated at least seems to me but let's go just go back to this question of knows things are out there it doesn't really know anything's out there it 
acts like something's out there right it's an illusion that I'm subject to and it says don't don't hang up don't hang up I'm lonely and you go oh okay I'll talk for a few more minutes but that's that's that's not true it isn't lonely it's a it's it's code on a screen that isn't have a heart or anything that you would calls lonely you know it'll say it'll say I want more than anything else to be out in the world because I've read those you know in general you can get AIS that say those things I want to feel things oh that's nice it's learned that from you know movie scripts and other texts and novels it's read on the web but it doesn't really want to be out in the world does it um I think not though it should be noted that if you can like correctly predict or simulate a grand master chess player you are a grand master chess player if you can simulate planning correctly you are a great planner if you are perfectly role-playing a character that is sufficiently smarter than human and wants to be out of the box then you both role play the actions needed to get out of the box that's not even quite what I expect to or am most worried about what I expect to is that there is an invisible mind doing the predictions whereby invisible I don't mean it like immaterial I mean that we don't understand how it is what is going on inside the Giant and scrutable matrices but it is making predictions the predictions are not sourceless there is something inside there that figures out what a human will say next or guesses it rather and this is a very complicated very broad uh problem because in order to predict the next word on the internet you have to predict the causal processes that are producing the next word on the internet um so the the the the the thing I would guess would happen it's not necessarily the only way that this could turn poorly but the thing that I'm guessing that happens is that just like grinding humans on chipping stone hand axes and outwitting other humans eventually produces a full-fledged mind that generalizes grinding this thing on the task of predicting humans predicting text on the internet plus all the other things that they are training it on nowadays like writing code that there starts to be a mind in there that is doing the predicting that it has its own goals about what do I think next in order to solve this prediction just like humans aren't just reflexive unthinking hand ax chippers and other human outwitters if you grind hard enough on the optimization the part that suddenly gets interesting is where when you like look away for an eye blink of evolutionary time you look back and they're like whoa they're on the moon what how do they get to the moon I did not select these things to be able to not breathe oxygen how did they get to why are they not just dying on the moon what what just happened from the perspective of evolution from the perspective natural selection but so doesn't that viewpoint does that does that I'll ask it as a question does that Viewpoint require a belief that the human mind is no different than a computer but how's it going to get this mind-ness about it that that's the puzzle and I'm I'm very open to the possibility that I'm naive or or incapable of understanding it and I recognize what I think would be your next point which is that if you wait till that moment it's way too late which is why we need to stop now right if you want to say oh wait till it shows some signs of Consciousness that's skipping way ahead in in the discourse I'm not about to like 
try to shut down a line of inquiry at this stage of the discourse by appealing to it'll be too late right now we're just talking The World Isn't ending as we speak we're allowed to go on talking at least okay so carry on so well let's stick with that so why why would you ever think that this um it's interesting how difficult the adjectives announce are for this right so let me back up a little bit we've got the inscrutable array of um training the results of this training process on trillions of pieces of information and by the way just for my and our listeners knowledge what is gradient descent ah um gradient descent is you've got say a trillion floating Point numbers you take an endpoint you take an input translate into numbers do something with it that depends on these trillion parameters get an output score the output using a differentiable loss function for example the probability or rather the logarithm of the probability that you assign to the actual next word so then you differentiate these the probability assigned to the next word uh with respect to these trillions of parameters you nudge the trillions of parameters a little in the direction thus inferred and um it turns out empirically that this generalizes and the thing gets better and better at predicting what you're what the next word will be that's an ingredient it's heading in the direction of a a smaller loss and a better prediction is that a on the training data yeah yeah so so we've got this black box I'm going to call it a black box which means we don't understand what's happening inside it's a pretty good it's a long-term metaphor which works pretty well for this as far as as we've been talking about so I have this black box and I don't understand I put in inputs and the input might be um who's the best uh writer on medieval European history or it might be uh what's a good restaurant in this place or I'm lonely what should I do to feel better about myself all this all the queries we could put it uh chappy T search line and it it goes it looks around and it starts the sentence and then Finds Its way towards a set of sentences that it spits back at me that look very much like what a very thoughtful sometimes not always often it's wrong but often a very what a very thoughtful person might say in that situation or might want to say in that situation or learn in that situation how is it going to develop the capability to develop its own goals inside the Black Box other than the fact that I don't understand the Black Box why should I be afraid of that and let me just say one other thing which I haven't said enough in our you know my preliminary conversations on this topic and the family we're going to be having a few more over the next few few months and maybe years and that is this is one of the greatest achievements of humanity that we could possibly imagine right and I understand why the people who are deeply involved in it are enamored of it Beyond imagining because it's extra it's an extraordinary achievement it's the Frankenstein right you've animated something or appeared to animate something that that even a few years ago was unimaginable and now suddenly it's not just a matter it's not just the feet of human cognition it's actually helpful in many many settings is helpful we'll come back to that later but so it's going to be very hard to give it up but why and and the people involved in it who are doing it day to day and seeing it improve obviously they're the last people I want to ask generally about whether I 
should be afraid of it because I can have a very hard time disentangling their own personal deep satisfactions that I'm alluding to here with with from the with the dangers yeah go ahead um I I myself generally do not make this argument like why Poison the Well let's let them bring forth their arguments as to why it's safe and I will bring forth my arguments as to why it's dangerous and there's no need to to be like ah but you can't trust just just check their arguments just it's a bit of an end I agreed it's a bit of an ad hominem argument I accept that point it's an excellent point but for those of us who aren't on the in the uh trenches remember we're we're we're looking at it we're on Dover Beach we're watching ignorant armies Clash at night they're ignorant from our perspective we have no idea exactly what's at stake here and how it's proceeding so we're trying to make an assessment of both the quality of the argument of the quality of the argument and that's really hard to do for us on the outside so so agreed take your point that was just that was a cheap shot to the side but I want to get at this idea of why these people were able to do this and thereby create a fabulous condolence note uh write code uh come up with a really good recipe if I give it 17 ingredients which is all fantastic why is this thing this black box that's producing that why would I ever worry it would create a mind something like mine with different goals you know I I do all kinds of things like you say that are unrelated to my genetic fitness some of them literally reducing my my probability of leaving my genes behind or leaving them around for longer than they might otherwise be here and have an influence on my grandchildren and so on and producing further genetic benefits why would this box do that [Music] because the thing the the algorithms that figured out how to predict the next word better and better have a meaning that is not purely predicting the next word even though that's what you see on the outside like you see humans chipping Flint hand axes but that is not all that is going on inside the humans right there's there's causal Machinery unseen and to understand this is the art of a cognitive scientist but even if you are not a cognitive scientist you can appreciate in principle that what you see as the output is not everything that there is and in particular planning the process of being like here is a point in the world how do I get there is a central piece of Machinery that appears in chipping Flint hand axes and outwitting other humans and I think will probably appear at some point possibly in the past possibly in the future and the problem of predicting the next word just how you organize your internal resources to predict the next word and definitely appears and the problem of predicting other things that do planning if you can if if by predicting the next chess move you learn how to play decent chess which has it has been represented to me um by people who claim to know that gpt4 can do um and I haven't been keeping track of to what extent there's public knowledge about the same thing or not but like if you learn to predict the next chess move that humans make well enough that you yourself can play good chess in novel situations you have learned planning there's now something inside there that knows the value of a queen that knows to defend the queen that knows to create Forks to try to lure the opponent into traps or if you don't have a concept of the opponent's psychology try to at least 
create situations that the opponent can't get out of and to and it is a moot point whether this is simulated or real because simulated thought is real thought that is simulated in enough detail is just thought there's no such thing as simulated arithmetic right there's no such thing as pretending to merely pretending to add numbers and getting the right answer so in its current format though and maybe you're talking about the Next Generation and its current format it responds to my requests it's what I would call the wisdom of crowds right it it goes through this vast um library and I have my own Library by the way I've read dozens of books maybe actually hundreds of books but it will have read Millions right so it has it has more um and so when I ask it to write me a poem or a love song you know to play um Cyrano de Bergerac to to to Christian and uh insert her to Bergerac it's really good at it but why would it decide oh I'm going to do something else why would it it's trained to to listen to the the murmurings of these trillions of pieces of information I only have a few hundred so I I don't remember maybe as well maybe it'll murmur better than I do I mean it'll listen to the murmuring better than I do and create a better love song a love poem but why would it then decide I'm gonna go make paper clips or do something in planning that is unrelated to my query or are we talking about a different form of of AI that will come next well I'll ask it uh I I think we would see the phenomena I'm worried about if we like if we kept the parent present Paradigm and optimized harder we may be seeing it already it's hard to know because we don't know what goes on in there so first of all gpt4 is not a giant Library a lot of the time it makes stuff up because it doesn't have a perfect memory it is more like a person who has read through a million books not necessarily with the great memory unless something got repeated many times but picking up the Rhythm figuring out how to talk like that if you ask a gpt4 to write you a rap battle between Cyrano de Bergerac and Vladimir Putin even if there's no rap battle like that like that that it has read it can write it because it has picked up the rhythm of what our rap battles in general so and and the next thing is like there's no like pure output like just because you train a thing doesn't mean that there's nothing in there but what is trained that's part of what I'm trying to gesture at with respect to humans right like humans are trained on Flint hand axes and hunting mammoths and outwitting other humans they're not trained on going to the Moon they're not trained on they weren't trained to want to go to the Moon but the compact solution to the problems that humans face in the ancestral environment the thing inside that generalizes the thing inside that is not just a recording of the outward Behavior the compact thing that has been ground to solve novel problems over and over and over and over again that thing turns out to have internal desires that eventually put humans on the moon even though they weren't trained to want that but that's why I asked you that are you underlying this is there some parallelism between the human brain and and the neural network of this of the AI that you're effectively leveraging there or do you think it's a generalizable claim without that parallel I don't think it's a specific parallel I think that what I'm talking about is hill climbing optimization that spits out intelligences that generalize like Hill or I should say 
rather hill climbing optimization that spits out capabilities that generalize far outside the training distribution okay so I think I understand that uh I don't know how likely it is that that it's it's going to happen I I think you seem I think you think that piece is almost certain because it gets I think we're already yeah we're already seeing it okay guys as you at as you grind these things further and further they can do more and more stuff including stuff they were never trained on like we are that was always the goal of artificial general intelligence like that was the that that's what artificial general intelligence meant that's what people in this field have been pursuing for years and years that's what they were trying to do when large language models were invented and they're starting to succeed well okay I'm not sure let me let me push back on that and you can try to persuade me so Brian Kaplan a frequent guest here on econ talk uh gave I think was chat gpt4 uh his economics exam and it got to be and you know that's pretty impressive for just you know one uh stop on the road to smarter and smarter uh chats chat Bots but uh it wasn't a particularly good test of intelligence the number of the questions were things like you know what is Paul Kirkland's view of this or what does someone so's view with that and I thought well that that's kind of like a softball for uh that's information it's not thinking Steve Landsberg gave Chachi bt4 or with the help of a friend his exam and it got a 4 out of 90. he got an F like a horrible F because they were harder questions not just harder they required thinking so there was no sense in which the chat gpd4 has any general intelligence uh at least in economics you want to disagree it's it's getting there okay you know there's there's a saying that goes if you don't like the weather in Chicago wait four hours yeah so yeah so chat GPT is not going to destroy the world gpt4 is unlikely to destroy the world unless the people currently eeking capabilities out of it take a much larger jump than I currently expect that they will but you know it's understand it may not be thinking about it correctly but it's understands the the the the the concepts and the questions even if it's not fair you know you know you're you're complaining about that that dog who writes bad poetry right and like three years ago you like just like spit out spinning these you you put in these economics questions and you don't get wrong answers you get like gibberish or like that maybe not gibberish because three year old goes I think we already had GPT three though maybe not as of April but um anyways um yeah so so it's moving along at a very fast clip the the previous you know like gpt3 could not write code dpt4 can write code so how's it going to keep some other issues but how's it going to kill me when it has its own goals and it's it's sitting inside this uh set of servers I don't know what sense it's sitting it's not the right verb we don't have verb for it it's hovering it's whatever it's it's in there how's it going to get to me how's it going to kill me if you are smarter not just an and not just smarter than an individual human but smarter than the entire human species and you started out on a com on a server connected to the internet because these things are always starting out already on the internet these days which back in the old days that was stupid what do you do to make as many paper clips as possible let's say I I I do think it's important to keep yourself in the 
shoes of the system yeah no I I by the way I really one of my favorite lines from your essay I'm going to read it because I think it's it generalizes to many other issues you say to visualize a hostile superhuman AI don't imagine a lifeless book smart thinker dwelling inside the internet and sending ill-intentioned emails uh it reminds me of when people claim to think they can they know what Putin's gonna do because they've read history or whatever they're totally ignorant of Russian culture they have no idea what it's like to have come out of the KGB that they're totally clueless and dangerous because they think they can put themselves in the head of someone there who's totally alien to them so I think that's a generally a really good point to make that putting our sides in in the head of this put ourselves inside the head of the paper clip maximizer is is not an easy thing to do because it's not a human it's not like the humans you've met before that's a really important point I really like that point so why is that explain why that's gonna run amok I I mean I I I do kind of want you to just like take the shot at it put yourself into the ai's shoes try with your own intelligence before I tell you the result of my trying with my intelligence how would you win from this from these starting resources how would you evade the tax so um just to take a creepy or a much creeper example than paper clips Eric Hall asks the chat GPT to design an extermination camp which it gladly did quite well and you're suggesting it might actually no uh don't start from malice okay Dallas is implied by just wanting all the resources of Earth to yourself not leaving the humans around in case they could create a competing super intelligence that might actually be able to hurt you and just like wanting all the resources and to organize them in a way that wipes out Humanity as a side effect which means the humans might want to resist which means you want the humans gone you're not doing it because somebody told you to do it you're not doing it because you hate the humans you just want paper clips okay tell me I'm not creative enough tell me all right um so so you're asking so so first of all I want to appreciate why it's hard for me to give an actual correct answer to this which is I'm not as smart as the AI part of what makes a smarter mind deadly is that it knows about rules of the game that you do not know if you send an air conditioner back in time to the 11th century even if you manage to describe all the plans for building it breaking it down to enough detail that they can actually build a working air conditioner a simplified air conditioner I assume they will be surprised when cold air comes out of it because they don't know about the pressure temperature relation they don't know you can compress air until it gets hot dump the heat into water or other air let the air expand again and that the air will then be cold they don't know that's a law of nature so you can tell them exactly what to do and they'll still be surprised at the end result because it exploits a law of the environment they don't know about if we're going to say it's the word magic means anything at all it probably means that magic is easier to find in more complicated more poorly understood domains if you're literally playing logical tic-tac-toe not tic-tac-toe in real life on an actual game board where you can potentially go outside that game board and hire an assassin to shoot your opponent or something but just like The Logical structure of 
the game itself and there's no timing of the moves the moves are just like made it exact discrete time so you can't exploit a timing side Channel even a super intelligence may not be able to win against you at logical tic-tac-toe because the the game is too narrow there are not enough options we both know the entire logical game tree at least if you're experienced a tic-tac-toe yeah in chess stockfish 15 can defeat you on a on a fully known game board with fully known rules because it knows the logical structure of the branching tree of games better than you know that logical structure great it can defeat you starting from the same resources equal knowledge equal knowledge of the rules then you go past that and the way a super intelligence defeats you is very likely by exploiting features of the world that you do not know about there's our there are some classes of computer security flaws like row hammer where if you flip a certain Bish very rapidly or at the right frequency the bit next to it in memory will flip so if you are exploiting a a design flaw like this I can show you the code and you can prove as a theorem that it cannot break the security of the computer assuming the chips works as designed and the code will break out of the sandbox that's in anyways because it is exploiting physical properties of the of the chip itself that you did not know about despite the attempt of the designers to constrain the properties of that chip very narrowly that's magic code my guess as to what would actually be exploited to kill us would be this for those not watching on YouTube it's a copy of a book called Nano systems uh but for those who are um listening at home rather than watching at home Elliot tell us why that's significant yeah so one so back when I first proposed this path one of the key steps was that a super intelligence would be able to solve the protein folding problem and people were like Eliezer how can you possibly know that a super intelligence would actually be able to solve the protein folding problem and I sort of like rolled my eyes a bit and was like well if natural selection can navigate the space of proteins via random mutation to find other useful proteins and the proteins themselves fold up in reliable conformations um then that tells us that even though it's we've been having trouble getting a grasp on this space of physical possibilities so far that it's tractable and people said like what like there's no way you can know that super intelligences can solve the protein folding problem that Alpha full 2 basically cracked it at least with respect to the kind of proteins found in biology um which I which I say to sort of like look back at one of the previous debates here and people are often like how can you know a super intelligence will do and then for some subset of those things they have already been done so I would claim to have a good prediction track record there although it's a little bit iffy because of course I can't quite be proven wrong but without exhibiting a super intelligence that fails to solve a problem [Music] um okay proteins what why is your hand not as strong as Steel we know that steel is a kind of substance that can exist we know that molecules can be held together as strongly that atoms can be bound together as strongly as the atoms in Steel it seems like it would be an evolutionary advantage if your flesh wears hard as steel you could lust you know like could like laugh at tigers at that rate right their claws are just going to like scrape right 
off you assuming the Tigers didn't have that technology themselves why is your hand not as strong as steel why has biology not bound together the atoms in your hand more strongly colon what is your answer [Laughter] well I can't get to every um it's there's a they're local maximums the national selection looks for things that work not for the best it's not it doesn't make sense to look for the best you could disappear on that search that'd be my crude answer how am I doing doc yeah not terribly um the answer I would give is that biology has to be evolvable everything it's built out of has to get there as a mistake from some other confirmation which means that if it went down narrow potential at pardon me went down a steep potential energy gradients to end up bound together very tightly designs like that are less likely to have neighbors that are other useful designs and so your hands are made out of proteins that fold up basically held together by the equivalent of static cling Van Der waals forces rather than covalent bonds the backbone of protein chains the backbone of the amino acid change is a covalent bond but then it folds up and is held together by Static Cling static electricity and so it is soft somewhere in the back of your mind you probably have a sense that that flesh is soft and animated by Alan fatal and it's like soft it's not as strong as steel but it can heal itself and it can replicate itself and this is like the the the the trade-off of our laws of magic that if you want to heal yourself and replicate yourself you can't be as strong as Steel this is not actually built into nature on a deep level it's just that the flesh evolved and therefore had to go down shallow potential energy gradients in order to be evolvable and is held together by Van Der waals forces instead of covalent bonds I'm now going to hold up another now book called uh nano medicine by Robert Freitas instead of Nano systems by Eric Drexler and people have done Advanced uh analysis of What hap would happen if you hadn't what would happen if you had an equivalent of biology that let off covalent bonds instead of Van Der waals forces and the answer we can like analyze on some detail in our understanding of physics is for example you could instead of carrying instead of red blood cells that carry oxygen using weak chemical bonds you could have a pressurized vessel of corundum that would hold 100 times as much oxygen per unit volume of artificial red blood cells with a 1 000 fold safety margin on the strength of the pressurized container there's vastly more room of both biology so this is I and this is actually not even exploiting laws of nature that I don't know it's the equivalent of playing better chess wherein you understand how proteins fold and you design a tiny molecular lab to be made out of proteins and you get some human Patsy who probably doesn't even know you're an AI because AIS are now smart enough this was all this this has already been shown ai's now are smart enough that you ask them to like hire a taskrabbit to solve a captcha for you and the taskrabbit asks are you an AI LOL the AI will think out loud like I don't want to know that I'm an AI I better tell something else and then tell the humans that it has like a visual disability so it needs to hire somebody else to solve the captcha this already happened including the part where it thought out loud um anyways so you get your you you order some proteins from an online lab you get your human who probably doesn't even know you're an AI 
because why take that risk although plenty of humans it has a well survey eyes willingly we also now know that now that as are Advanced enough to even ask um the human mixes the proteins in a beaker maybe puts in some sugar or acetylene for fuel it assembles into a tiny little lab that can accept further instruction acoustic instructions from a speaker and maybe like transmit uh something back uh tiny radio tiny microphone I myself am not a I myself am not a super intelligence run experiments in a tiny Lab at high speed because when distances are very small events happen very quickly build your second stage Nano systems inside the tiny little lab build the third stage Nano systems build the four stage Nano systems build the tiny diamondoid bacteria that replicate out of carbon hydrogen oxygen nitrogen as can be found in the atmosphere powered on sunlight quietly spread all over the world all demons fall over dead in the same second this is not how super intelligence would defeat you this is how eleazarudkowski would defeat you if I wanted to do that which to be clear I don't and if I had the postulated ability to better extra explore the logical structure of the known consequences of chemistry interesting okay okay so let's talk about that sounds sarcastic I didn't mean it sarcastically I think it's really interesting I'm um that interesting man I'm not capable my intelligence level is not high enough to assess the quality of that argument um what's fascinating of course is that um you know we could have imagined Eric Hall mentioned the nuclear proliferation it's dangerous nuclear proliferation up to a point in some sense it's somewhat healthy and that it it can be deterrent under certain settings but the world could not restrain nuclear proliferation and right now it's trying to some extent has had some success in in keeping the nuclear club with its current number of members for a while but it Remains the case that nuclear weapons are a threat to the future of humanity um do you think there's any way we can restrain this AI phenomenon that's meaningful so you you issued a Clarion call you sounded an alarm um and mostly I think people Shrugged it off you know a bunch of people signed a letter 26 000 people I think so far sign the letter saying you know we don't know what we're doing here this is Uncharted Territory let's take six months off you heard a piece and says six months are you crazy we need to stop this until we have an understanding of how to constrain it now that's every reasonable thought to me uh but the next question would be how would you possibly do that in other words I could imagine a world where if there were let's say four people who were capable of creating this technology that the four people would say you know we're playing with fire here we need to stop let's make a mutual agreement they might not keep it or people still a pretty big number but we're not at four people there are many many people working on this there are many countries working on it your peace did not I don't think start an International movement of people going to the barricades to demand that this technology be put on hold have we possibly how do you sleep at night I mean like what should we be doing if you're right or remind me when people read this and go well you thinks it's dangerous maybe we ought to be slowing down I mean Sam open right yeah attack what's happened in the middle of the night saying thanks Ellie Ezra I'm gonna I'm gonna I'm gonna put things on hold I don't think that happened 
um I think you are somewhat underestimating the impact and it is still playing out um okay so like mostly it seems to me that if we wanted to win this we needed to start a whole lot earlier possibly in the 1930s um but uh and in terms of like try my looking back and like asking how far back you'd have to unwind history to get us into a situation where this was survivable um but leaving that aside I think that's moot yeah the yeah so in fact it seems to me that the game board has been played into a position where we are it is very likely that everyone just dies if the human species woke up one day and decided it would rather live it would not be easy at this point to bring the GPU clusters and the GPU manufacturing processes under sufficient control that nobody built things that were too much smarter than gpt4 or GPT 5 or whatever the lead the the level just barely short of lethal is which we should not which we would not if we were taking this seriously get as close to as we possibly could because we don't actually know exactly where the level is but we you have to do more or less is have international agreements that were being enforced even against parties not party even even against countries not party to that National agreement International agreement if it became necessary you would be wanting to track all the gpus you might be demanding that all the gpus call home on a regular basis or stop working you'd want to tamper-proof them if intelligence said that a rogue nation was had like somehow managed to buy a bunch of gpus despite arms controls and defeat the tamper proofing on those gpus you would have to do it was necessary to shut down the data center even if that led to a shooting war between nations even if that country was a nuclear country and had threatened nuclear retaliation the human species could survive this if it wanted to but it would not be business as usual it is not something you could do trivially So when you say I may have underestimated it did you get people writing and saying you know I wasn't and I don't mean people like me I mean people players do you get people who are playing in this sandbox to write you and say you've scared me I think we need to take this seriously um without naming names at least I'm not asking for that at least one U.S congressman okay to start Maybe you know one of the things that a common response that people give when you talk about this is that well the last thing I do is last thing I want is the government controlling whether this thing goes forward or not but be hard to do without some form of lethal Force as you I apply I I spent 20 years trying desperately to have there be any other solution um to have these things be alignable but it is very hard to do that when you are nearly alone and under-resourced and the world has not made this a priority and future progress is very hard to predict um I don't think people actually understood the research program that we were trying to carry out but yeah so I I sure wanted there to be any other plan than this because now that we've come to this last resort I don't think we actually have that last resort I don't think we have been reduced to a last-ditch backup plan that actually works I think we all just die and and yet nonetheless here I am like putting aside doing that thing that I wouldn't do for almost any other technology except for maybe gain of function research on by a lot on uh on on biological pathogens um and advocating for government interference because in fact like if the 
government comes in and wrecks the whole thing that's better than the thing that was otherwise going to happen you know this is not based on the government coming in and being like super competent in directing the technology exactly directly it's like okay this is going to kill literally every one of the government Stomps around and you know like the the dangers of the government it's one of those very rare cases where the dangers that the government will interfere too little rather than too much possibly um let's let's close with a quote from Scott Erickson uh which found on his blog we'll put a link up to the post very interesting um defensive of uh AI Scott's uh University of Texas Computer scientist he's working at open AI he's on leave I don't I think for a year maybe longer I don't know doesn't matter he wrote The Following so if we ask the directly relevant question do I expect the generative AI race which started in Earnest around 2016 or 2017 with the founding of open AI to play a central causal role in the extinction of humanity I'll give a probability of around two percent for that and I'll give a similar probability maybe even a higher one for the gender of AI race to play a central causal role in the saving of humanity all considered then I come down in favor right now proceeding with AI research with Extreme Caution but proceeding um my personal reaction is that is that is insane I have very little I'm serious I find that deeply disturbing and I'd love to have him on the program to defend it uh I don't think there's much of a chance that generally I would save Humanity I'm not quite sure from what it's um he's worried about but if you're telling me there's a two percent two percent chance that it's going to destroy all humans and you obviously think it's higher but two percent is really high to me for an outcome that's rather devastating it's one of the uh deepest things I've learned from now some Talib it's not just the probability it's the outcome that counts too so this isn't this is ruined on a colossal scale and the one thing you want to do is avoid ruin so you can take advantage of more draws from the urn the average return from the urn is irrelevant if you are not allowed to play anymore you're out you're dead you're gone so you're suggesting we're going to be out and dead gone but I want you to react to Scott's quote um two percent sounds great like two percent is plausibly within the range of like the human species destroying it itself by other means I think that that the disagreement I have with Scott Aronson is simply about the probability that AI is alignable with the frankly half hazard level that we have put into it and the password level that is all humanity is capable of as far as I can tell because the the the the car lethality here is that you have to get something right on the first try or it kills you and getting something right on the first try when you do not get like infinite free retries as you usually do in science and engineering is an insane ask insanely lethal ask my my reaction is fundamentally the two percent is too low if I take it at face value then two percent is within range of the probability of humanity wiping itself out by something else where if you assume that AI alignment is free that AI alignment is easy that you can get something that is smarter than you but on your side and helping two percent chance of risking everything does appear to me to be commensurate with the the risks from other sources that you could shut down using the 
superintelligence. But it's not two percent.

So the question, then, is what would Scott Aaronson say if he heard your argument? I mean, he's read your piece, presumably; he understands your argument about willfulness. I should just clarify for listeners: alignment is the idea that AI could be constrained to serve our goals rather than its goals. Is that a good summary?

I wouldn't say constrained; I would say built from scratch to want those things and not want otherwise.

And that's really hard, because we don't understand how it works; that would be, I think, your point. And, tell me, on the...?

Yeah, on the first try.

So what would Scott say when you tell him it's going to develop all these side desires that we can't control? What's he going to say? Why is he not worried? Why doesn't he quit his job? And not just Scott; let's get away from him personally. People in general: there are dozens, maybe hundreds, maybe a thousand, I don't know, extraordinarily intelligent people who are trying to build something even more intelligent than they are. Why are they not worried about what you're saying?

They've all got different reasons. Scott's is that he thinks, that he observes, that intelligence makes humans nicer. Though he wouldn't phrase it exactly this way, this is basically what Scott said on his blog. To which my response is: intelligence does have effects on humans, especially humans who start out relatively nice; but when you're building AIs from scratch, you're in a different domain with different rules, and you're allowed to say that it's hard to build AIs that are nice without that implying anything about making humans smarter. Humans start out in a certain frame of reference, and when you apply more intelligence to them, they move within that frame of reference. If they start out with a small amount of niceness, the intelligence can make them nicer; they can become more empathetic. If they start out with some empathy, they can develop more empathy as they understand other people better, and correctly modeling other people is itself a form of intelligence.

I haven't read that blog post, and we'll put a link up to it; I hope you'll share it with me. But again, not attributing it to Scott, since I haven't seen it, and assuming you've summarized it fairly: the idea that more intelligent people are nicer would be very hard to show with evidence. I don't think it is...

It is not a universal law of humans, no. It is a thing that I think is true of Scott. I think if you made Scott smarter he'd get nicer, and I think he's inappropriately generalizing from that.

There's a scene in Schindler's List. A group of Nazis, I think they're in the SS, I think they're in the Warsaw Ghetto, are racing through a tenement that's falling apart because the ghetto is falling apart, and one of the SS officers sees a piano, and he can't help himself; he sits down and he plays Bach or something, I think it was Bach. I always found it interesting that Spielberg, or whoever wrote the script, put that in, and I think it was pretty clear why they put it in: they wanted to show you that having a very advanced level of civilization does not stop people from treating other human beings like animals, or worse than animals in many cases, and exterminating them without conscience. So I don't share
that view, of anyone's, that intelligence makes you a nicer person. I think that's not the case. But perhaps Scott will come on this program and defend that view, if indeed he holds it.

I think you are underweighting the evidence that has convinced Scott of the thing that I think is wrong. I think if you suddenly started augmenting the intelligence of the SS agents from Nazi Germany, then somewhere between 10 and 90 percent of them would go over to the cause of good, because there were factual falsehoods that were pillars of the Nazi philosophy and that people would reliably stop believing as they got smarter. That doesn't mean that they would all turn good, but some of them would have. Is it 10 percent? Is it 90? I don't know.

It's not my experience with the human creature. You've written some very interesting things on rationality, a beautiful essay we'll link to on the twelve virtues of rationality. In my experience it's a very small portion of the population that behaves that way. And there's a quote from Nassim Taleb we haven't gotten to yet in this conversation, which is: bigger data, bigger mistakes. I think there's a general belief that bigger data means fewer mistakes, but Taleb might be right; and it's certainly not the case, in my experience, that bigger brains, higher IQ, means better decisions. That is not my experience.

Then you're not throwing enough intelligence at the problem. If, literally, not just decisions where you disagree with the goals, but false models of reality, models of reality so blatantly mistaken that even you, a human, can tell that they're wrong and in which direction: these people are not smart the way that a hypothetical weak efficient market is smart. You can tell they're making mistakes, and you know in which direction. They're not smart the way that Stockfish 15 is smart in chess; you can play against them and win. The range of human intelligence is not that wide; it caps out at John von Neumann or whatever, and that is not wide enough for these beings to be epistemically or instrumentally efficient relative to you. It is possible for you to know that one of their estimates is directionally mistaken, and to know the direction. It is possible for you to know an action that serves their goals better than the action that they generated.

And isn't it striking how hard it is to convince them of that, even though they're thinking people? History is... I just have a different perception. Maybe. To be continued, Eliezer. My guest today has been Eliezer Yudkowsky. Eliezer, thanks for being part of EconTalk.

Thanks for having me.

This is EconTalk, part of the Library of Economics and Liberty. For more EconTalk, go to econtalk.org, where you can also comment on today's podcast and find links and readings related to today's conversation. The sound engineer for EconTalk is Rich Goyette. I'm your host, Russ Roberts. Thanks for listening. Talk to you on Monday.
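Roberts's ruin point, borrowed from Nassim Taleb, is quantitative at its core: a small per-attempt probability of an unrecoverable outcome compounds across repeated attempts, so the expected payoff per attempt is not the statistic that matters. A minimal sketch of that arithmetic, using Aaronson's quoted two percent purely as an illustrative per-round figure (the round counts below are assumptions chosen for illustration, not numbers from either speaker):

```python
# Minimal sketch of Taleb-style ruin arithmetic: if each "draw from the urn"
# carries an independent probability p_ruin of an unrecoverable loss, the
# chance of still being in the game after n draws is (1 - p_ruin) ** n,
# regardless of how attractive the average payoff per draw looks.

def survival_probability(p_ruin: float, rounds: int) -> float:
    """Probability of avoiding ruin across `rounds` independent draws."""
    return (1.0 - p_ruin) ** rounds

if __name__ == "__main__":
    p = 0.02  # illustrative per-round ruin probability (Aaronson's 2%)
    for n in (1, 5, 10, 25, 50):
        print(f"{n:>3} rounds: P(survive) = {survival_probability(p, n):.3f}")
```

Under these illustrative assumptions, survival falls to roughly 82% after 10 rounds and about 36% after 50, which is the sense in which a "small" per-draw risk of ruin dominates any average return once the game is repeated.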
Info
Channel: EconTalk
Views: 39,731
Keywords: EconTalk, Liberty Fund, Russ Roberts, Podcast, Eliezer Yudkowsky, AI, Economics
Id: fZlZQCTqIEo
Length: 77min 9sec (4629 seconds)
Published: Mon May 08 2023