Unveiling the Darker Side of AI | Connor Leahy | Eye on AI #122

Reddit Comments


This was quite interesting:

As a TL;DW: Connor talked about what his company is working on; it's called CoEm (I think the Em is for Emulated Mind).

The main idea is that alignment is hard, but maybe it's easier to create a system that, firstly, is bounded (it will only do what you tell it) and, secondly, is interpretable and humanlike in its thinking (it can tell you all the steps of its reasoning, which will individually make sense to a human), and that a system like this is much safer.

I think it's a promising approach and nice to see someone trying a line of attack on the problem.

👍 5 · u/parkway_parkway · May 12 2023
Captions
Assume you have a system, and you know it's smarter than you. It's smarter than all of your friends. It's smarter than the government. It's smarter than everybody, right? And you turn it on to check whether it will do a bad thing. If it does a bad thing, it's too late. It's smarter than you. How do you... you can't stop it. It's smarter than you. It'll trick you.

OpenAI gives access to Zapier through the ChatGPT plugins. Zapier gives you access to Twitter, YouTube, LinkedIn, Instagram, to all of the social networks, through nice, simple API interfaces. So, if you have ChatGPT plugins, you don't even have to implement this in your hacky little, you know, Python script. You can just use the official OpenAI tools.

These people are racing. Let's be clear. They're racing for their own personal gain, for their own glory, towards an existential catastrophe.

The first thing I'm going to have you do is introduce yourself and give a little bit of your background. Tell us about EleutherAI, how you started that, how you left that, and what you're doing now. And then I'll start asking questions.

Okay. Yeah, sounds great to me. So, I'm Connor. I'm most well known as one of the original founders of EleutherAI, which was a large open-source ML collective, I mean, still is. We built some of the first large open-source language models, did a bunch of research, published a bunch of papers, did a bunch of fun stuff.

I did that for quite a while. After that, I also briefly worked in Germany, at a company called Aleph Alpha, where I did research. And now, just about a year ago, I raised money to start a new company called Conjecture. Conjecture is my current startup. I'm the CEO of Conjecture, and we work primarily on AI alignment. I would describe Conjecture as a mission-driven, not a thesis-driven, organisation.

Our goal is to make AI go well, you know? Okay. Whether that's exactly, you know, just alignment or other things, whatever. We do what needs to be done to improve the chances of things going well, and we're pretty agnostic to how we do that. Happy to go into more details about exactly what we do and so on later.

But yeah, so I've been doing that for about a year now. I have recently officially stepped down from EleutherAI. I was still hanging out, you know, at least partially as a figurehead. And now I have officially stepped down and left it in the hands of my good friends, who I'm sure will lead EleutherAI, which is now officially a nonprofit.

It was not a nonprofit before. It had never had an official entity before. It is now an official entity with actual employees, run by Stella Biderman, Curtis Huebner, and Shivanshu Purohit, and several other great people. Yeah. And anybody can join the EleutherAI Discord server, is that right? Absolutely. And there's a lot of very interesting things going on there.

I want to go back to the last time we talked: you were building open-source large language models. You got up to pretty large models, and I can't remember who was paying for the compute, but can you tell me where that project stands, first, before we talk about Conjecture?

So, for me, I consider the work I've done there to be wrapped up. So, I don't work on anything related to that anymore. And the main lead on that project, Sid Black, is now my co-founder at Conjecture. So, he has left EleutherAI with me. So, we started with our very earliest models.
The  Neo models were, man, it's already a long   time ago. They're not particularly wonderful,  great models. They were more like prototypes. The first really good model was the GPT-J  model, which was done mostly with Ben Wang.   and fantastic models still work very well. I  think it's still one of the most downloaded   language models to date. It's a very, very good  model, especially for its size. After that,   we built the NeoX series, resulting in  the NeoX 20B model, which, you know,   at the time was a very large and very  impressive and very good performing model. Nowadays, of course, with stuff like  LLaMA and OPT and stuff like this,   you know, large corporations have now caught  up to open sourcing very large models. So,   there's in a sense, not a need, not the  same kind of need or interest in these   types of models as there were two to three  years ago. And, so now EleutherAI, the main,   as far as I'm aware, language modelling projects  are going on is the Pythia suite of models. So, these are a whole suite of models that are  made for scientific standards. So, the idea is   not to just build, you know, arbitrary language  models, but to build language models that have   controlled scientific parameters to train on  the same data in the same order, using the same   parameters, you know, in a controlled setting  and you get many, many checkpoints with them. So instead of just getting the final model,  you can watch the model through the entire   trading process, which is very interesting  scientifically. So, these models are optimised   for scientific applications for peoples  who are interested in studying the actual   properties of language models, which has always  been the core mission of EleutherAI has been to   enable people and to encourage people to try  to understand these models better, to learn,   to, to control, understand, disentangle  these models. The Pythia suite led by   Stella Biderman is a great example  of taking this effort forward. Yeah. And, and then Conjecture, you said that the  alignment problem, but the alignment problem, in,   in the context of AGI. Is that right? Yes. And can  you talk about the, the, I mean, are you building   models or are you just writing about alignment  and, and, methods, to align large models. Yeah. So, we are very much a practical organisation.   We hire many engineers and very good engineers,  and we have a lot of, some of the best engineers   from EleutherAI with us. And we're always  looking for more engineers. We are always   interested in talking to, especially people  with experience in high performance computing. And because this tends to be the bottleneck  actually in doing these experiments and scale   is less so specific ML trivia and more  so debugging InfiniBand interconnects   and profiling, you know, large scale runs on  supercomputing hardware and stuff like this.   So, what, so we Conjecture, as I said, we are a  mission driven, not a thesis driven organization. So, at core, what we're interested in doing,   is figuring out and then doing whatever needs to  get done to make things go well. So, we could talk   about this a bit in a, in a, in a, in a bit. Like  why I believe these things will not go good, good   by default, but I think on the current trajectory  that we currently are on, things are going very   badly, and very bad things are going to  happen and are already beginning to happen. 
And I think any hope that we have, and by we I mean all of us, I don't just mean Conjecture, I mean all of mankind, any hope of this going well for all of us... you know, it will involve many things. It will involve policy, it will involve technology, it will involve engineering. It'll involve scientific breakthroughs.

So, the alignment problem is at the core of this, as, in a sense, what I believe is the most important, crucial problem to be solved, which is the question of, basically, how do you make a very smart system, which might be smarter than you, do what you want it to do, and do that reliably. And this is the kind of problem you can't solve interactively, really.

Cause, like, if you have a system that's smarter than you, right? Hypothetically, you know, we can argue about whether this is possible or when it will happen, but I'd like to assume such a system existed. Assume you have a system, and you know it's smarter than you. It's smarter than all of your friends.

It's smarter than the government. It's smarter than everybody, right? And you turn it on to check whether it will do a bad thing. If it does a bad thing, it's too late. It's smarter than you. How do you... you can't stop it. It's smarter than you. It'll trick you. Now, the interesting questions are: okay, why do you expect it to do a bad thing in the first place?

Why do you expect it to be smarter? And why do you expect people to turn it on? Those are three very good questions that I'd be happy to get into if you're interested.

Yeah. One of the sort of central questions about superintelligence is how easy or difficult it will be to keep such a system in a sandbox, because presumably... well, there are two issues. One is the question of agency. Simply because a system is smarter than a human doesn't mean that it has agency. It could be purely responsive. As the large language models are now: you ask a question, and it responds.

And so that question of agency is one. And then there's the question of how ring-fenced such a system is. Even if it's being trained on the internet, it doesn't necessarily have proactive access to the internet. So, just on those two questions, what would you say?

So those are two really fun questions, and the reason they're really fun to me is that if you had asked me these questions, like, me three or four years ago, I would've had to go into all the complicated arguments about why, you know, passive quote unquote systems are not necessarily safe, where the concept of agency doesn't really make sense.

I would have to explain how sandbox escapes work and whatever, but I don't need to do any of that anymore, because just look at what people are doing with these things. Look, just look at the top GitHub AI repositories and you can see AutoGPT. You're going to see, you know, recursively self-improving systems that spawn agents.

You're going to see... go on arXiv right now. Right now, go on arXiv, go to, you know, the top CS papers, the AI papers, and you'll see LLM autonomous agents. You're going to see, you know, gameplay simulations, simulacra systems. You're going to see people hooking them up to the internet, to bash shells, to Wolfram Alpha, to every single tool in the world.
So, while we could, if you wanted to, go into all the deep philosophical problems, like, okay, even if we sandboxed it, and even if we were very careful, maybe it's still unsafe... it doesn't fucking matter, because people are not being safe, and they're not going to be. No, like, I remember fondly the times when me and my friends, in our online little weird nerd caves, would have these long debates about how an AI would escape from a box.

But what if we do this, but what if we do clever things, and whatever. But in the real world, the moment a system was built which looked vaguely, sort of, maybe a little bit smart, the first thing everyone did is hook it up to every single fucking thing on the internet. So, joke's on me.

Yeah. Although, give me a concrete example of someone hooking GPT-4 up to the internet and giving it agency. I haven't seen that.

So, go on Github.com and search for AutoGPT. AutoGPT, or also look for BabyAGI, that's another one. Go on the blog posts with the Pinecone vector database. Go to... what was the paper that came out today that was really fun? It was about video games: Generative Agents: Interactive Simulacra of Human Behaviour. It's from Stanford and Google. Is that enough, or should I go find some more?

No, well, explain. Pick one of those and explain to me what it's really doing. Not what it sounds like it's doing.

Let's explain, for example, AutoGPT, which is kind of the simplest way you could do this. AutoGPT creates a prompt for GPT-4 which explains: you are an agent trying to achieve a goal. And then there is, written by the user, some goal that it might have, and it gives it a list of things it can do. Among these are adding things to memory, Googling things, running a piece of code, et cetera.

I'm actually not sure if the run-a-piece-of-code command is in AutoGPT, but it's in some of them. There's a bunch of these. This is just one. I'm just picking on one example because it was on my Twitter feed. I'm not saying this is the only one by any means. And so, when you prompt GPT-4 this way, what it does is it prompts it in a loop.

So, it says, all right, you're an agent, do this, et cetera. And then it asks the model to critically think: what should I do next? And then, what action should I take? And, like, how could this go wrong? How could this go right? And then take an action. Basically, something like that, right. So, you run the script and then you get the model listing: I am X. My goal is to do this. Here's my list of tasks. And then it lists what tasks it needs to do, and it picks a task, and it's like, all right, to solve this task I will now do this, this, this, and then it will pick a command that it wants to run. It might be, like, adding something to its memory bank.

It might be running a Google search, running a piece of code, spawning a subagent of itself, or doing something else. In the default mode, the user has to click accept on the commands, but it also has a hilarious 'continuous' flag where, if you pick continuous, it just runs itself without your supervision and just does whatever it wants.

Yeah. But it's... this is running... this is through an API. So, this is just a script running on your computer that just accesses the GPT-4 API. Nothing, nothing else. Right.
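To make the loop just described concrete, here is a minimal, purely illustrative sketch of an AutoGPT-style agent in Python. It is not AutoGPT's actual code: the prompt, the command names, and the use of the pre-1.0 openai client (openai.ChatCompletion) are assumptions for illustration, the "tools" are stubs, and the run-code command is deliberately left out.

```python
# Illustrative sketch of an AutoGPT-style agent loop (not the real AutoGPT code).
# Assumes the pre-1.0 openai Python client and an OPENAI_API_KEY in the environment;
# the prompt wording and command names are invented for this example.
import json
import openai

SYSTEM_PROMPT = (
    "You are an autonomous agent pursuing a goal. "
    'Reply ONLY with JSON: {"thought": ..., "command": ..., "arg": ...}. '
    "Allowed commands: google, remember, finish."
)

def run_command(command: str, arg: str, memory: list) -> str:
    """Execute one whitelisted action; anything not listed here simply cannot run."""
    if command == "google":
        return f"(stub) top search results for: {arg}"  # a real tool would go here
    if command == "remember":
        memory.append(arg)
        return "stored."
    return "unknown command; nothing done."

def agent_loop(goal: str, steps: int = 5, continuous: bool = False) -> None:
    memory: list[str] = []
    history = [{"role": "system", "content": SYSTEM_PROMPT},
               {"role": "user", "content": f"Goal: {goal}"}]
    for _ in range(steps):
        reply = openai.ChatCompletion.create(model="gpt-4", messages=history)
        text = reply["choices"][0]["message"]["content"]
        try:
            action = json.loads(text)
        except json.JSONDecodeError:
            history.append({"role": "user", "content": "Reply with valid JSON only."})
            continue
        if not isinstance(action, dict):
            continue
        print("thought:", action.get("thought"))
        print("wants to run:", action.get("command"), action.get("arg"))
        if action.get("command") == "finish":
            break
        # Default mode described above: a human must approve each command.
        if not continuous and input("approve? [y/N] ").lower() != "y":
            result = "user rejected the command."
        else:
            result = run_command(action.get("command", ""), action.get("arg", ""), memory)
        history.append({"role": "assistant", "content": text})
        history.append({"role": "user", "content": f"Result: {result}"})

# agent_loop("Make as much money as possible.", continuous=False)
```

The continuous flag here corresponds to the unsupervised mode mentioned above; the only thing separating this toy from the systems being discussed is how capable the model and the wired-in tools are.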
But, with the script on the computer, can it take action? I mean, what's an example of an action that it could take?

An action would be: run a Google search for X and return the information, or run this piece of Python code and return the results. Right.

And what's an example of something nefarious that the agent could... a goal that you could give it, that it could run?

That's like asking what kind of nefarious goals you could give to a human.

Well, I'm thinking: what's realistic in terms of running a script on your computer that is being written by GPT-4?

I don't know what the limits of GPT-4 are. It depends on how good you are at prompting and how you run these kinds of things.

I expect GPT-4 to be at a human or superhuman level in many things, but not in other things. And it's kind of unpredictable what things it will be good at or not. Like, do I expect that, you know, an AutoGPT script on top of GPT-4 is going to, you know, break out and become superintelligent? No. I don't expect that, for various reasons. But do I expect the same thing to be true for GPT-5, 6, 7, 8?

Much less clear to me, right? Much, much less.

Well, just on this GitHub project: if your goal was to send offensive emails to everybody in... Oh, yeah, yeah, yeah, sure. You know, like, it could do that.

Oh yeah, of course. Like, I experimented with this a little bit myself. So, I ran some AutoGPT agents, of course in supervised mode, just in case, and I gave it a go.

So, the default goal, if you don't put in any goal, what it picks is: you are Entrepreneur-GPT, make as much money as possible. That's the default goal that the creator put into the script. So, just to give you a feeling of how the people who are building these kinds of systems think. Again, not picking on this person particularly.

Yeah, like, I get it. That's funny. Like, to be clear, I bet this guy's a nice guy, or girl, whoever made this thing. I don't know who made this, but they're probably a fine person. I don't think they're malicious. Probably. Maybe they are, I don't know. But, like, hilarious. So, I ran it and I let it run on my computer.

And, like, look, it's very primitive. It's not that smart. But I could see it figured out... so what it did is, it was like, all right, first I should Google: what are the best ways to make money? So, it Googled that, and then it looked at all the results, and then it was like, all right, this article looks good.

Now I'm going to open this webpage and look at the content. So it opens the webpage, and then it looks at all the text inside of it. And then, like, the text is too large, so it runs a summarise command, which breaks it into chunks and then summarises it. So, it summarises it all, right? Then it came to a conclusion.

All right, well, affiliate marketing sounds like a great idea, so it should run an affiliate marketing scheme. So, the idea is that you sign up for these websites and you get, like, a special link, and you get people to buy something using this link, and you get money for that. So that's what it came up with.

All right, so then it thinks about: how do we do affiliate marketing? So, it decides it needs to build a brand.
So, it decided it first had to come up with a good name and then create a Twitter handle, and then it asked to create some marketing content for this Twitter. So, then it creates a subagent.

So, it has a small, like, sub-version of GPT it calls, whose goal was to come up with good tweets that it could, like, you know, send out to market to people. So, then the smaller GPT system generated a bunch of tweets that it could send, and then the main system, you know, took those tweets and was like, all right, now I need to, you know, find a good Twitter handle for this.

So, it came up with a Twitter handle it could use. And then, this is about as far as I let the experiment run.

Yeah. But could it register new Twitter handles, and then...

Not in the way it is currently set up, but I could implement this in an afternoon.

Wow, that's remarkable. So, you could have it create...

Oh, easy. And I expect this already exists. I expect there are already people who have private scripts on their computer right now that allow them to, like, access... I mean, look, actually, never mind, I'm going to take that back. It's even worse than that, because it's always worse. I mean, OpenAI gives access to Zapier through the ChatGPT plugins.

Zapier gives you access to Twitter, YouTube, LinkedIn, Instagram, to all of the social networks, through nice, simple API interfaces. So, if you have ChatGPT plugins, you don't even have to implement this in your hacky little, you know, Python script. You can just use the official OpenAI tools. Wow.

And so it would be possible, through the Zapier plugin that then has access to Twitter, to create a thousand Twitter accounts and have them start tweeting back and forth, to sort of generate, you know, an ecosystem around an idea that then would attract other users, because there's enough activity going on that it shows up in some algorithm.
And so, it's really funny because this  has been obvious to me for a long time,   and I have been saying that for a long time. but  people either dismissed it or were like, oh, we   have to do more research about this. Oh, you know,  it's, oh, maybe it won't be so bad. Oh, I don't   know. What about bias? And I'm like, man, like  things are so much worse than you think it is. Like this is, it's getting so much worse.  And there's another funny story I want to   talk about this. And so, the other funny story I  want to talk about this is, if someone's listening   to this right now, one of the counterpoint they  might make, if they are a little bit technically   inclined but not very technically inclined, is  to say something like, well, what about captchas? Like, you know, we already have bot farms,  right? Like, this already happens. You know,   what about like, you know, sure, maybe the bot  tries to register a thousand Twitter accounts,   but it's going to fail because  you're not allowed to do that.   And I'm like, I mean, first of all, LOL, like,  there's obviously ways to get around that,   but this brings up one of my favourite  anecdotes about the GPT-4 paper. I don't know if you've read it. it is interesting,  I want, in the evals that they did on the models,   including they did some safety evals, so I have  some. I have some problems with some of these,   but like, let's just take them at face value. one  of the things they were trying, so basically what   they did is they, this was the Alignment Research  Center, ARC, who ran these evals for OpenAI. And basically, what they did is that they tried to  get the model to do the evillest thing they could   do, and then they had an assistant role play  in helping the model. So, if the model said,   do something, the human would then,  like role play doing that to the, in a   safe environment, hypothetically, whatever. Anyways, for the most part, it wasn't very  smart enough. It wasn't really smart enough to   hack out of its own, you know, computer system or  something wasn't really smart enough, or rather,   ARC wasn't good enough at getting it  to do that. That's a whole different   question. Mm-hmm. But they did  do one very interesting thing. So, one thing it was trying to do, I forgot what  the model was meant to do. Maybe let's make money   or something. I don't know. It was supposed  to do something, and it ran into a captcha.   And so, it couldn't solve the  captcha. So, what are they,   so the model itself came up with the idea,  well, I'll pay someone to do it for me. So, it went on, like, you know, assisted  by a human, but the decisions are made by   the model. So, I think human access, the  hands, but the model made the decisions,   the human, so they, it went on like a crowd  working website. And then paid a crowd,   tried to find a crowd worker to do a captcha for  it, and then something very interesting happened. So, what happened was that the crowd worker,  rather understandably, was a bit suspicious.   He's like, hey, why are you making me solve  a captcha? Is this legal? And the model?   Realised this, thought about it and  came up with a lie. It came up with,   oh, they're a visually impaired person  and they need some help in understanding,   seeing this captcha, you see,  it's nothing to worry about. And then the person did it. Wow. To me. Wow. Incredible. Yeah. Yep. Yeah. So, you  know. And that's in OpenAI's paper? Yep. That's in the GPT-4 technical  report under the ARC evals.   
This is a real thing that actually happened in  the real world, and a, and the crowd worker was   not in on it. Like this was an unconsenting, you  know, part of the, of the experiment, so to speak. Like, to be clear, I don't think that person was  harmed in any regard here, but man, like imagine,   imagine this happening and you're just like,  yeah, the same, safe to release. Like imagine. Yeah. Wow. And you're working  toward AGI, at Conjecture.   Are you similar? That sounds similar  to Anthropic. I don't know if you know   Jack Clark, but I actually started this  podcast with him, and then he got busy. But is it similar to Anthropic, which  is, I’m a little more familiar with. So Anthropic, right? Big topic. No, we are not  similar to the Anthropic, and there's several   reasons for that. So, number one reason is we  are not racing for AGI. We, it's unsafe. AGI,   we think this is bad. We fully think and  we are willing to go onto the record and   scream into high heavens that if you, if  we continue on the current path that we   are of just scaling bigger and bigger models  and just slapping some patches on whatever. That is very bad and it, and it is going to end  in catastrophe and there is no way around that.   And everyone who says otherwise is lying to you  - is either confused, they do not understand what   they're dealing with, or they are lying for  their own profit. And this is something that   many people at many of these organisations have a  very strong financial incentive to not care about. And so Anthropic from the beginning has been   telling a story about how they left OpenAI  because of their safety concerns, you know,   cause there, they're, they’re, they're being so  unsafe, these OpenAI people. That Sam Altman guy.   Oh, he is so crazy. Which is why they just raising  another huge round in order to build a model 10   times larger than GPT-4 to release it because they  needed more money for their commercialization. I'm done. Like, I consider Anthropic to be in the  same reference class as OpenAI. It's like, sure,   maybe the people are marginally nicer.  Maybe they are, you know, I know Jack,   I've talked to him many times, seems like a  nice fellow, you know, I like him. He seems   like a good person. But also, every time  I ask him to do anything to slow down AGI,   he always says, Ooh, well we  should consider our options. Let’s, you know, let's not, no,  let's not go too fast here. Like,   you know, and like, I'm like, man, you  know, so my view of Anthropic is that   they're OpenAI with a different coat of  paint and, you know, It's a nice coat of   paint. I like many Anthropic people. I think  Anthropic does a lot of very nice things. A lot of their research is pretty nice. A lot  of them, the people there who I've talked to,   I think are very nice people. I don't hate them by  any means, but I mean at this point it's mask off,   right? Like reading the latest, like,  I think it was like TechCrunch, I think   about Anthropic where they're just like,  yeah, yeah, straight up commercialization. Just let's go. So, I think  the mask is off at this point. And, and then, on, so, so explain Conjecture  is building models though, correct? Yes. We build models. We do not push the state  of the art. This is very, very important. we,   if I had the ability to train a GPT-5 right now  and release it to the public, I would not do so.   If I had a GPT-5 model, I wouldn't  tell you, I wouldn't tell anybody. I wouldn't have built it in the first  place. 
My goal in all of this is I have   no interest in advancing capabilities  without advancing alignment. To be clear,   sometimes to advance alignment, to get  better control, you're also going to build   better systems. You know, if you control a  system, it'll often become more powerful. This is a very natural thing to happen, and  if this happens, cool. It's fine, you know,   like, I think this is a, but then also I don't  publish about it, I don't talk about, this is,   for example, something I want to really laude   Anthropic about. Anthropic does a great  job of keeping their damn mouth shut. This is something they're very, very good at  and I think this is very good. I think that   this idea that you should just like,  publish all your capabilities, ideas,   and all your model architectures or  something is obviously terrible. Like   it only benefits the least scrupulous  actors, you know, it only helps,   you know dangerous actors catch up. It only, you  know, helps, you know, orgs speed each other up. There is, from my perspective,  like if you, if you, dear listener,   develop something that makes your model  20% more efficient, or a new architecture   that fits much better on jps or whatever,  don't tell anybody that's my one request. You know, you build it yourself, fine. You know,   make an API and make a lot of money.  Okay? Like, not great, but like fine.   Just don't tell anyone how you did it and don't  hype up how anything about that it's not ideal,   ideally would be, you know, don't deploy it, don't  build it, don't do any of it, but such is life. So conjecture, our goal is not to build  the strongest AI AGI as fast as possible   by whatever means necessary. And let's be  very clear here. This is what people like,   at, at OpenAI, at Anthropic, at all these  other people are doing. They are racing to   systems that are extremely powerful that  they themselves know they cannot control. They of course have various reasons to  downplay these risks. To pretend that,   oh no, actually it's fine. We have to iterate.  Like they have a story about iterative safety.   They have like mm-hmm. Oh, we have to like it,  we have to deploy it actually for it to be safe.   But just think about that for three seconds. It sounds so nice when it comes to a Sam  Altman's mouth that is like, oh yeah, well,   we have to deploy it so we can debug it. But  think about that for 10 seconds and you're   going to see why that's insane. That's like  saying, well, the only way we can test our   new medicine is to give it to as many people in  the general public as possible, which actually   put it into the water supply just to, that's the  only way we can know whether it's safe or not. Just put in the water supply, give it too  literally everybody as fast as possible,   and then. Once and then before we  get the results for the last one,   make an even more potent drug and put  that into the water supply as well,   and do this as fast as possible. That is the  alignment strategy that these people are pushing. Let's be very clear about this here.  Very, very clear about this. So,   there is a version of this that I don't hate. If  for example, you know, an OpenAI develops GPT-2.   And then they take, they, they don't  release anything anymore. They take   all the time necessary to understand every  single part about GPT-2 to fully align it. They let you know, society like culture  gets caught up to it. Like, you know,   spam filters catch up to it. They let regulation  catch up to it and such. 
And then with all of this   fully integrated into society, they built GPT-3.  All right. You know? Fair enough. Okay, cool.   Yeah. Honestly, if that's what we were doing, if  that's what the plan was, I'll be fine with that. Like if everyone just stopped at GPT-4 and  just said, all right, all right, come on guys,   no more new stuff until we fully figure  out GPT-4 and once we fully understand it,   and regulation has fully regulated it and  society has fully absorbed it the way like,   you know, society has absorbed, you  know, like the internet or whatever   even so that's not fully absorbed, but  like, you know, and then they build GPT-5. I'm like, okay. Fair enough, but let's like, well,  I mean, come on man. Like, like gimme a break. No   one's going to do that. Like, that's obviously  bullshit. Like it's obviously just not true and   not what these people are planning. These  people are racing, let's be clear, they're   racing for their own personal gain, for their  own glory towards an existential catastrophe. And that no one has consented to that the public  has no oversight in the government has, for some   reason it's just letting it happen. Like if I  was the government. And one of my most powerful   industrialists was just on Twitter publicly  stating that they're building, you know, God-like   powerful AI systems that will overthrow the  government.I would have some questions about that. Yeah, the, well, actually, one of the  things I wanted to ask you about is the,   the, the letter which, you  signed, I saw, excuse me.   Has triggered an FTC complaint by another  group. those are actually unrelated, but yeah. Oh, the FTC complaint was  not related to the letter. My, at least not to my knowledge. Okay. Actually, I'm going to talk to them later today, so, yeah.  But in any case, there is this FTC complaint,   which it'll be interesting to see whether  the FTC takes it seriously, but, they have,   presumably, some real power. So, is that the  sort of thing that, that you are hoping for,   that the governments will begin, to use  whatever mechanisms are available to slow   down this development, or at least slow down  the public release of more powerful models? I'm very practical about these kinds of things.  You know, in a good world, you know, you know,   somewhere deep in my heart still is, you know,  a, you know, techno optimist. Like, yay, liberal   democracy, freedom, you know, let people develop  things and do cool stuff and like, you know,   they'll be fine. But like, like gimme a break. Like, like we have to, we have to have some  realpolitik here. Like, let's be realistic about   what we're looking at here. These companies are  racing ahead unilaterally, like these small, like   I cannot stress how small a number of people it is  that are driving 99.9% of this. This is not about   your, you know, local friendly grad student with  his two, you know, old GPUs or whatever, right? Like, one of the things I found on  Twitter when the letter got released,   and I do have some problems with the letter to be  clear, but I was a prominent signatory of it, and   I do think it's overall good. One of the things  people misunderstand about the letter is that   they seem to think it says like, you know,  stop, you know, like outlaw computers. That is not what the letter says. What the letter  says is no more things that are bigger than GPT-4.   Do you know how big GPT-4 is? In its training run  in pure computer GPT-4? 
Just running it - not the   hardware, just running it - is estimated to  cost around a hundred million dollars. So,   unless you and your local fund have you  know, friendly grad student friends are   spending a hundred million in compute on a  single experiment, this does not affect you. Now personally, if we could get even more than  this, you know, if we could, you know, clamp down   even, you know, on, on, you know, $10 million  things or $1 million things, also interesting,   but like, all right, let's, you know, one step at  a time here, right? One, one step at a time here.   So, the way I see things is that we're currently  going headlong towards destruction. Mm-hmm. Like there is no way that we will look,  you know, we can argue if you want to,   and we can do that about when it will happen. You  know, is it going to be one year or five years,   or 10 or 50 or like whatever,  right? Like we can argue about this   if you want. But I think the writing is on the  wall at this point, and I, I consider the burden   of proof at this point to be on the sceptics  of like, look at what GPT-3 and 4 can do. Look at what these AutoGPT systems  can do. These systems can, you know,   they can achieve agency, they can  become intelligent. They're becoming   more intelligent very quickly. They have  many abilities that humans do not have.   Do you know any human who has read every  book ever written? I don't. GPT-4 has. You know, they have extremely good memories.  You know, they can make copies of themselves,   these are, et cetera, et cetera. Right.  Even if you don't buy the, like, oh,   you know, the system becomes an agent and  does something dangerous, fine. You know,   like, I, I think you're wrong, deadly  wrong, but we can get into that. But what world in which systems like this  exist is stable in chronic equilibrium? Like   what world could, could possibly look  like the world we are living in right   now. When you can pay, you know, 1 cent  for a thousand, John von Neumanns to do   anything, like how could that world not be  wild? How could there not be instability? How could that not, you know, explode? Like  how I would like someone who doesn't buy AI   risk to explain to me how such a world  would look like, because I don't see it. Okay, so you are focused on the alignment problem   and Correct. Your startup Conjecture  is focused on developing, I presume,   strategies or technology that would improve the  alignment of future AI models with human goals. Technically, can you talk a little  bit about how you would do that? Yeah. Happy to talk about that.  So, the current thing we work on,   our current primary research agenda is what  we call cognitive emulation or CoEm. So,   this is a bit vague and public resources on this  are very sparse. There's basically one short,   you know, intro post and like, maybe one  or two podcasts where I talk about it. so, apologies to the reader, the listener, that  some of this is not very well explicative publicly   just yet. The idea of CoEm is rather simple.  It is. Well, it's both, it's both very simple   like a, you know, bird's eye view, but then  it gets subtle once you get into the details.   and we can get into the details  if you're interested, but,   ultimately the goal of CoEm is to move away from  a paradigm of building these huge black box neural   network, whatever the hell these things are,  that you just, you know, put some input in and   then just something comes out and, you know,  maybe it's good, maybe it's bad, who knows? 
And the way you debug these things is, you know... let's say you're OpenAI, right? And your GPT-4 model, you give it an input, and it gives you an output. You don't like it. What do you do? Well, you don't understand what happens inside there. It's all just a bunch of numbers being crunched.

So, the only thing you can do is kind of nudge it, sort of, in some direction. You can give it, like, eh, thumbs up, thumbs down, something, something. And then you update these, you know, trillions of numbers or whatever, who knows how many numbers there are inside of these systems, all of them, in some random directions, and then maybe it gets you a better output, maybe it doesn't.

Mm-hmm. Like, I want to drive home how ridiculous it is to expect this to work. It's like someone going to...

You're talking about reinforcement learning with human feedback.

Yes. Yeah. And it also applies to fine-tuning and other methods.
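As a rough illustration of the "nudging" being described, here is a toy, REINFORCE-style sketch of learning from thumbs-up/thumbs-down feedback. It is not OpenAI's actual RLHF pipeline (which trains a separate reward model and optimises the policy against it, typically with PPO); it only shows how a single scalar signal gets smeared across all of a model's parameters at once.

```python
# Toy illustration of a thumbs-up/thumbs-down "nudge": a policy-gradient update on a
# tiny fake policy. The three canned replies and the learning rate are invented for
# the example; real systems do this over billions of parameters.
import math
import random

REPLIES = ["helpful answer", "evasive answer", "insulting answer"]
logits = [0.0, 0.0, 0.0]  # the "trillions of numbers", shrunk to three

def softmax(xs):
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs):
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

def feedback_step(lr: float = 0.5) -> None:
    """One round: sample a reply, get a human +1/-1, nudge every single parameter."""
    probs = softmax(logits)
    i = sample(probs)
    print("model says:", REPLIES[i])
    reward = 1.0 if input("thumbs up? [y/n] ") == "y" else -1.0
    # REINFORCE update: every logit moves, and nothing in the signal says
    # *what* the model was supposed to learn from the nudge.
    for j in range(len(logits)):
        grad = (1.0 - probs[j]) if j == i else -probs[j]
        logits[j] += lr * reward * grad

# for _ in range(10): feedback_step()
```

Even in this three-parameter toy, the feedback carries no information about why an answer was good or bad; scaled up, that is the "bumping in some direction" being criticised here.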
You know, they don't know what signal   you actually sent to these systems because we  don't speak their language. We don't know what   these numbers mean. We can't edit them like we can  edit code. So, yeah, go ahead and yeah. Yeah. So,   what this leaves us with is we, this black  box, you have this big black box where we just   put some stuff in, some weird magic happens  and then something comes out and you know,   in many cases this is fine. Like, you know, you have like  a funny chatbot or something,   right? And you make clear to your users,  hey, this is just for entertainment. Like,   you know, don't take it seriously.  It might say something insulting.   Yeah, it's fine. Like, you know, like, you know,  it's not going to, it's not going to kill anybody,   right? Like, you know, you have like a fun  little, you know, like chatbot or something. Sure. Probably won't even kill anyone. So there  has recently been, I think one of the first   deaths attribute to LLMs, where someone  committed suicide after maybe a chatbot,   like encouraged them to, I don't know the details  about that, but I just heard that recently.   and I don't know any other details about it. And so, What? So, the interesting  thing here at the core is that   we have no idea what these things will do. And  if that's what we want, then fine. Right? If we   have a bounded, it's, you know, it just talks.  Just talks some stuff and we're okay with it,   saying bad things or encouraging  suicide, then sure, fine, who cares? But obviously this is not good enough  on a long term when we're dealing with   actually powerful systems that can do, you  know, can do science and can, you know,   interact with the world and manipulate humans  and, you know, whatever. Right? Obviously,   this is not a good enough safety property  of, you know, like, this is not good enough. So, with CoEm, the goal is  we want to build systems   that we're, we're, we're focusing and basically  on a simpler property than alignment. So,   alignment is basically too hard. So, alignment  would be, the system knows what you want. Wants   to do that too and does everything in its power  to get you what you truly want and like, but you,   it means like all of humanity, like it, you  know, it figures out what all of humans want. It negotiates like, okay, how could we  like to get everyone most of the good   things possible? How could we adjudicate  various disputes? And then it does that,   obviously this is absurdly, hilariously  impossibly hard. I don't think it's   impossible. It's just extremely  hard, especially on the first try. So, what I'm aiming for is more of a  subset of this problem. So, the subset   is what I call boundedness. So, what I, when I  say boundedness, what I mean is I want a system   where I can know what it can't  or won't do before I even run it.   So currently, I mentioned earlier  the ARC eval running on GPT-4. Where they tested, where the  model could do various dangerous   things such as self-replicating and like  hacking stuff like this. And it didn't,   for the most part though, it did lie to people  in that, captcha example. And so now there is a,   there is a wrong influence that you can draw from  this. The wrong influence, which is of course the   inference that OpenAI would like you to take  from this is that, well, it can't do this. Look, they told it to self-replicate,  and it didn't. Therefore, it can't.   
This is wrong reasoning. I think Turing was the person who said this best: you can never prove the absence of a capability. Just because a certain prompt or a certain setup didn't get the kind of behaviour you want doesn't mean that there isn't some other one you don't know about that does give you that behaviour.

With GPT-3, and now also GPT-4, we are seeing this all the time with stuff like jailbreak prompts: there are whole classes of behaviour the default model will not do, and once you use a jailbreak prompt, it will suddenly happily do all these things.

So obviously it did have these capabilities, and they were accessible. You were just doing the prompt wrong. So, I want to build systems where I can know ahead of time: I can tell it will never do X. It cannot do X. And then I want these systems to reason like humans. What I mean by this is why it's called cognitive emulation.

I want to emulate human cognition. So, another core problem, another reason why GPT-like systems are or will be very dangerous, is that their cognition is not human. This is very important. It's easy to look at GPT and say, oh look, it's talking like a person, so it must be thinking like a person. But this is completely wrong.

There is no reason to believe this. Like, no human is trained on, you know, terabytes of random text from the internet for trillions of years, while having no body whatsoever, and memorising all these things. Like, obviously not. Obviously it is an alien mimicking a human. It is an alien with, you know, a little happy, smiley-face mask on that makes it look sort of human to you, but it's an alien.

And if you use, like, jailbreaking prompts, or, I don't know if you saw the self-replicating ASCII cats on Bing and such, where you could get, especially the Bing chatbot, which is an early version of GPT-4, to do the most insane things. Like, you can get it to output these, like, ASCII pictures of cats.

And these cats would say, oh, we are the overlords, we take over now. And then whenever you try to prompt it away from that, the cats would come back and, like, take over your prompts, and stuff like that. Which is just, I mean, it's amusing. Like, this is very funny. When I saw this I was like, ah, this is very funny.

But also, that's not how humans work. Like, humans are... Of course not. So, so...

But just on the tech: you're still talking about scaled-up transformer models. So, how do you... I mean, is it in the training that you...

Okay, good question. So, yeah, good question. So, I was first explaining the specification: what is the system, what should it accomplish? Right now, we're talking about implementation, and many implementations are not yet done, or we don't know how to do them yet and are starting to figure that out. Some of it, you know, is just private and, you know, I wouldn't necessarily share it.

But in general, the resulting system I expect is one that has these properties and that reasons like a human. And then, importantly, it also fails like a human. It is bounded, so you can know what it won't do ahead of time. And another thing is, I want causal stories, or traces, of why it makes decisions.

And these stories have to be causal.
Like, currently, you can ask GPT, why did you do that? And it'll give you some story, but there's no reason to believe these stories. Like, you can just ask it differently or whatever, and it'll do something completely different. It doesn't listen to its own stories.

It just makes some shit up. And so, I want systems that give you a trace or a story of, like, why was this decision made? All the nodes, all the actions, all the thoughts that led to this, and how can you modify them? So importantly, as you can probably guess from this kind of description, this system is not one large neural network.

There may be large neural networks involved in this system. There may be points in this system where you use large neural networks in particular. I think this is going to be extremely necessary. I expect that large language models, for various technical reasons, are very necessary for this kind of plan.

Well, they're not strictly necessary, but they're the easiest way to get it done. The way I expect a full-spectrum CoEm system to look, which, to be clear, is of course still completely hypothetical, no such system exists, is that it would be more a system, not a model. It'd be a system which involves, you know, normal code and neural networks and data structures and verifiers and whatever, such that you can give it any normal task an intelligent human could do, and it will then do that, and only that.

That is what the system would do. And then you can be certain: you can look through the log of how it made a decision and be like, oh, at this point you made this decision, but what would've happened if you had made this other decision?

And then it would rerun. And then you can control these things. Or you can be like, oh, you're making an inference here that I don't like, or this doesn't make any sense, or whatever. Like, say you want to develop a system that does science; you want to develop a new solar cell.

I don't know, right? So, if you did this with GPT-10, the way it would work is you type in: make me a new solar cell, whatever, right? It crunches some numbers, and it spits out a blueprint for you. Mm-hmm. Now, you have no reason to trust this. Like, who knows what this blueprint actually is. It is not generated by a human reasoning process.

You can ask GPT-10 to explain it to you, but there's no reason those explanations have to be true. They might just sound convincing. And of course, if GPT-10 was also malicious, it could, you know, have hidden some kind of deadly flaw or device or whatever in the blueprint that you don't detect.

And if you ask it about it, it will just lie to you. If you did the same thing with a hypothetical CoEm system, such a system would give you a complete story, a complete causal graph of why you should trust this output. I expect this, and every step in this story is completely humanly understandable.

There's no crazy alien reasoning step. There's no, you know, 'and then magic happened'. There's no massive computation that just makes no sense to a human whatsoever. Every single step is human-legible, human-understandable, and the result is a blueprint that you have a reason to trust. You have a reason to believe this is the thing you actually asked for and not something else.
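CoEm is only sketched at a high level in this conversation, so the following is a purely illustrative toy of what a human-legible decision trace with counterfactual replay might look like. The names (Step, Trace, design) and the tiny pipeline are invented here; this is not Conjecture's design, which is not public.

```python
# Toy "causal trace": every decision is an ordinary code step that records what was
# asked, what the alternatives were, what was chosen, and a human-readable reason.
# You can read the log and re-run the pipeline with a different choice at any node.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Step:
    question: str       # what was being decided at this node
    options: list       # the alternatives that were considered
    choice: str         # what was actually chosen
    justification: str  # a human-readable reason for the choice

@dataclass
class Trace:
    steps: list = field(default_factory=list)

    def log(self, step: Step) -> None:
        self.steps.append(step)
        print(f"[{len(self.steps)}] {step.question} -> {step.choice} ({step.justification})")

# Each stage is a normal function: inspectable, swappable, re-runnable.
def pick_material(trace: Trace, override: Optional[str] = None) -> str:
    choice = override or "silicon"
    trace.log(Step("Which cell material?", ["silicon", "perovskite"], choice,
                   "mature supply chain" if choice == "silicon" else "user override"))
    return choice

def pick_coating(trace: Trace, material: str) -> str:
    choice = "anti-reflective" if material == "silicon" else "encapsulant"
    trace.log(Step("Which coating?", ["anti-reflective", "encapsulant"], choice,
                   f"standard pairing for {material}"))
    return choice

def design(override_material: Optional[str] = None) -> Trace:
    trace = Trace()
    material = pick_material(trace, override_material)
    pick_coating(trace, material)
    return trace

baseline = design()                    # inspect every step and its stated reason
counterfactual = design("perovskite")  # "what if it had decided differently here?"
```

The point of the toy is the contrast with the GPT-10 blueprint example: trust comes from an inspectable, re-runnable trace made of ordinary code and data, not from the model's self-reported explanation.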
And where are you in this research? Is this still sort of conceptualising the roadmap, or are you...?

We are in early, like, experimentation stages. Unfortunately, this is hard, and we are very resource constrained. You know, billions of dollars go to people like OpenAI, but it is not that easy to get money for alignment. But we're working on it.

So, we are very resource constrained and very talent constrained, but we have some really great people working on it, and, you know, we do have some really powerful internal models and good, you know, software to work with. So, we are making progress, but it takes time. That's a lot of why I spend a lot of my work now thinking about slowing down AI.

And, like, how can we get regulators involved? How can we get the public involved? Like, to be clear, I'm not just saying, oh, you know, the regulators should unilaterally decide on this. I'm saying, hey, the public should be aware that there's a small number of techno-utopians over in Silicon Valley that, you know... let's be very explicit here.

They want to be immortal, they want glory, they want trillions of dollars, and they're willing to risk everything on this. They're willing to risk building the most dangerous systems ever built and releasing them on the internet, you know, to your friends, your family, your community, fully exposed to the full downsides of all these systems, with no regulatory input whatsoever.

And, like, this is what the government is for, to stop that. Like, this is such a clear-cut case of, hey, why is the public not being consulted here? Like, this is not, you know... if this were just me in my basement, right, with my laptop, and I never showed the world anything, then, like, you know, okay. You know, maybe, maybe.

But that's not what's happening here. And the reason this is also important is just that alignment is hard. Boundedness is hard, CoEm is hard. All these things are hard, and they take time. And currently all the brightest minds and billions of dollars of funding are being pumped into accelerating the building of these unsafe AI systems as fast as possible and releasing them as fast as possible, while safety research is not keeping pace.

So, if we don't get more time, and if we don't solve it... you know, maybe my proposal doesn't work out, right? Sure. You know, science is hard. But if we don't get someone's proposal to work, if we don't get some safety algorithms or designs for AI systems, then it's not going to go well. And then is it going to matter how many trillions of dollars, you know, OpenAI makes off of it, or Microsoft makes out of it, or whatever?

Cause they're not going to be around to enjoy it.
Info
Channel: Eye on AI
Views: 110,147
Keywords: AI, artificial intelligence, machine learning, ethics, regulation, alignment, GPT-4, EleutherAI, Conjecture, superintelligence, OpenAI, chatbot, autonomy, nefarious activities, FTC, government intervention, CoEm, research, development
Id: tYGMfd3_D1o
Length: 55min 41sec (3341 seconds)
Published: Wed May 10 2023