Connor Leahy on the State of AI and Alignment Research

Video Statistics and Information

Captions
Alright, welcome to the Future of Life Institute podcast. I'm back here with Connor Leahy. Connor is, of course, the CEO of Conjecture. Connor, welcome to the podcast, back again.

So how does the strategic landscape of AGI look right now? Tell me if you think this structure is approximately true: you have OpenAI and DeepMind in front, followed by the American tech giants like Apple, Google, and Meta, then perhaps American startups, and then the Chinese tech giants and startups, in terms of capabilities. Is that the right ordering?

Sort of. I would say DeepMind is behind OpenAI by a pretty significant margin right now. I think Anthropic might actually be ahead of DeepMind at this point — not 100% clear; DeepMind keeps its cards much closer to the chest, so it might have some really impressive internal things. I've heard some things to that effect, but I don't have evidence of them. So it seems OpenAI is clearly front and center, ahead of everyone else. I expect Anthropic will catch up; it seems like they're trying very hard to train their GPT-4-equivalent model right now, and I expect they will succeed. The tech giants — it really depends. Meta is pretty far behind, Google is DeepMind, and Apple doesn't do anything as far as I can tell. I would, for example, say startups such as Character.AI are ahead of all of them. Character.AI is a company founded by Noam Shazeer and others — Shazeer being one of the inventors of the Transformer — and they're focused on chatbots and such, but their models are quite good. So that's roughly how I would set it. I don't feel that Chinese tech giants and startups are very relevant; I think they're really far behind, and I don't expect them to catch up anytime soon.

Are we simply waiting for the American tech giants to make moves here? Apple has a lot of money, a lot of talent, and machine learning chips on all of their iPhones — you could easily see an enhanced, GPT-4-style Siri.

Sure, but Google, which is supposed to be the best of these, couldn't even keep the guy who invented Transformers around, because they're so dysfunctional. One of the first things I was told by experienced investors when I founded a startup is that incumbents are not competition — they're all incompetent. Sure, all these things are possible, and they have the resources to do them, but there are a lot of reasons why it can be very hard for large organizations to execute on these kinds of things. Another great example is Bard, the chatbot that Google produced. It was severely delayed, it had lots of problems, and it was extraordinarily underwhelming compared to what a much smaller group at OpenAI was able to do in less time. Google is in code red now, the CEO is personally involved, everyone's freaking out — and that doesn't mean they can catch up. Just because a lot of people in a boardroom say something should be done, and they have a lot of money, that's not enough. There are some things that are actually hard, and training complex cognitive engines like GPT-4 is among them. Another example is chip production.
China made a huge deal about how they were going to build up domestic chip production and catch up to TSMC, and that has been slowly choking, because it's just not succeeding — no one in the world other than TSMC can get these ultraviolet lithography machines to work, for whatever reason.

So might describing the incumbent technology giants as incompetent be mistaken? Perhaps they're hiding things — they're waiting to release until they have something very polished, which would be a very Apple-like thing to do. Perhaps DeepMind has something they're not releasing because they are safety-conscious. Or is this wishful thinking?

Wishful thinking.

Okay. That's interesting, because then the situation is kind of how it would look to a naive observer: you just see OpenAI making lots of progress, and the most legible big players are the most advanced big players. So the strategic landscape is transparent in that way?

It is quite transparent. The truth of the matter is that this field is not good at secrecy. This is not like defense contractors, where people have a culture of secrecy and keeping things close to the chest. Apple is the only company on this list that is good at that; DeepMind and Anthropic are also trying, but with mixed results. If you ask who builds better fighter jets — Lockheed Martin or Airbus or whoever — I genuinely don't know. That is actually hard to know, and people will actively make it hard for you to know. But all these AI players have an incentive to make it public how good they are, and they do so quite aggressively when it benefits them. Google scrambled after ChatGPT to catch up with Bard, they put their best effort forward, and it was a flop — same thing with Ernie in China and so on. Don't galaxy-brain yourself: it's just what it looks like.

And the AI researchers in the most advanced organizations want to publish their research, perhaps so they can move to another organization — they have interests and incentives that are not particularly aligned with the company they work for. There are these publishing norms where your resume as an AI researcher is your published papers. Does this make it basically difficult to prevent new advances from spreading to a lot of companies?

Yep, that's exactly correct.

But if that's the case, why can't Google catch up?

Because there is an additional aspect to it, which is execution and tacit knowledge. Especially with large language model training, a massive amount of what differentiates a good language model from a decent language model is weird voodoo — someone just has a feeling, like, oh, you have to turn down the Adam beta-2 decay parameter. Why? Because Noam said so. There is the theoretical catch-up — coming up with the right architecture, the right algorithms, whatever — but there's also engineering, research, and logistics. Setting up a big compute data center is hard; it takes a lot of money, time, and specialized effort. So there's a logistical aspect. There is also a question of will and executive capacity.
Can an organization commit to doing something like this, and have someone lead it who can actually manage the complexities involved? And then there's a huge aspect of tacit knowledge: the stuff that isn't written down — and there is a lot that is not written down.

And that tacit knowledge might be particularly important in chip production, which could be why the companies in front in chip production are very difficult to copy.

That is exactly correct. This is also what people on the inside of chip production will tell you: there is an absolutely, phenomenally large amount of tacit knowledge that goes into producing high-end computer chips — a lot of know-how that only TSMC has — and they're not writing it down; even if they wanted to, they probably couldn't.

Why is it that, for example, with defense companies or with chip production there is a protection of intellectual property that we don't see in AI companies? The closest thing to secrecy in AI companies is simply not saying what's happening; you don't see company secrets protected by intellectual property in the same way.

Yep. I think this is a completely contingent, historical, cultural fact. It didn't have to be this way; it's literally just coincidence — the personalities of the people who founded this field and the academic norms it came from. It's a very academic area; there was much less military involvement and much less industry involvement initially, and academics have much more bargaining power. Because of the high level of tacit knowledge, academics have larger bargaining power here: if Noam Shazeer wants to publish, he can go wherever he wants, so if you don't allow him to, he'll just go somewhere else — and the same goes for any high-profile person. But I think this is totally contingent. From the perspective of people in ML, it seems like this is the way things have always been, will always be, can only be — and that's obviously wrong. You're telling me the people who build stealth fighters are not incredible scientists and engineers? Give me a break. Just because someone is a great scientist or engineer doesn't mean they're compelled by their very genome to want to publish. No, this is just a contingent truth about how the culture happened to evolve.

As the race to AGI heats up, will we see more closedness — closed data, closed algorithms? Will we see labs not publishing as many papers? Will these open-source norms from the AI researcher community begin to fall apart?

I hope so. We're seeing some of it, that's for sure, and I hope it accelerates.

Okay. So as you see it, we are already in a race towards AGI?

Correct. Yep, obviously so.

Maybe ten years ago, when people were debating AI, people would debate whether human-level AI is possible and whether we could get there within a couple of centuries, and so on. Is it time now to retire the term AGI and talk about specific predictions instead, since people mean a lot of different things by it? If you had asked Connor Leahy in 2010 whether ChatGPT or GPT-4 was an AGI system, what would you have responded?
Depending on how well you described it, I would have said yes. I do think these things are AGI — I still do — but people just don't like my definition of the word AGI, so I don't use that word very much. I agree that the word AGI is not particularly good; people use it to mean very, very different things. By reasonable definitions of AGI that were used ten years ago, obviously ChatGPT is AGI. Most of the definitions of AGI from ten or twenty years ago were, vaguely, "can do some human-like things in some scenarios" or "reasonable human-level performance on a pretty wide range of tasks" — obviously ChatGPT and GPT-4 have reached that level, and obviously they have the ability to go beyond it. But there are other definitions they don't fulfill, like "strictly better than humans at literally everything". Sure, GPT is not that — but also, come on. That's perhaps not super interesting; there will always be the two percent where humans are just better, or people are just testing it wrong, or not bothering to try, or whatever.

The real definition of AGI that I am most interested in personally is more vague than that, and it's something like: a system that has the thing humans have that chimps don't. Human brains are basically scaled-up chimp brains — maybe three or four times larger or something, with all the same structures, all the same kinds of things internally — but humans go to the moon and chimps don't. Some people say, okay, but there's always going to be some task where a specialized tool wins — an AGI couldn't fold proteins as well as AlphaFold or whatever. Sure, maybe AGI can't fold proteins as well as AlphaFold, but it can invent AlphaFold. So the relevant thing I'm personally interested in is a system that is powerful enough to learn and do science, and to pose potentially existential risks to humanity. That's the thing I care about, and when I talk about AGI, that's generally what I'm referring to. But I agree it's a bad meme. I think it's a bad word, because, as I've said before, some people hearing "AGI" think of a friendly humanoid robot buddy who's sort of as smart as you but not really smarter, while other people think AGI is a god-like super-thing that can do basically anything. We can have a semantic fight about this, but I don't know.

So perhaps a way to resolve these issues is to make predictions. Would you be willing to make a prediction about when, for example, an academic paper created by an AI model will get published in, say, a reasonably high-quality scientific journal?

That's underdefined. How much human intervention is allowed here? Do I give it a prompt? Does it have to navigate the website and upload the paper itself?

Say you give it a detailed prompt, and the system simply has to create the paper and nothing else.

Okay. Does this have to have actually occurred, or just be possible? Because it was possible yesterday. It hasn't actually occurred, I think, only because no one has gotten around to it — no one has bothered to do this.

Do you think this could be done now?

Yeah, absolutely, obviously so.
The Sokal affair got complete nonsense published decades ago. I think you could have done this with GPT-2, probably, if you allow non-STEM journals; you maybe need GPT-3 for STEM journals. Have you read ML papers? So many of them are so awful. This is not that hard.

So you think this is basically already here now?

Oh yeah, absolutely. But I think this is not capturing the thing you're interested in. The thing you're probably interested in is: can it do science? You're not interested in: can it trick reviewers into thinking something is good? So the question "when will it publish a scientific paper" — correct me if I'm wrong — what I expect you're actually looking for is the question "when can it do science", not the question "how stupid are peer reviewers".

True. So how can we make this question interesting? Is it: when can an AI system publish a scientific paper that gets cited a lot? Or is that also just a way of gaming the system?

There are various ways we can think about this, and I'm going to give the unsatisfying but, I think, correct answer, which is: by the time it can do that, it's too late. If you have a system that can fulfill whatever good criteria you can actually come up with — criteria that actually mean it can do actual science — it's too late. And I expect that at that point, if we have not aligned such systems and the things they create, then end times are upon us and we do not have much time left, if any. If you ask me to bet on these things with real money, I just wouldn't, because I don't expect the bet to pay out.

Do you expect AIs to publish credible scientific papers before they can empty a dishwasher?

Credible or correct? Those are different. Correct, interesting scientific papers — I expect that to happen probably before the dishwasher, yeah. I can make a concrete prediction: I expect the world to end before more than 10% of cars on the street are autonomous.

Okay. So what we have here is a scenario in which we are close to transformative AI, we could call it — or perhaps deadly AI. If we are very close, does this mean the game board is settled, in a sense — that the big players are the players that are going to take us all the way there? For example, we could ask: is it OpenAI that ends up creating transformative AI?

Seems pretty likely on the current trajectory. If nothing changes, if government doesn't get involved, if culture doesn't shift, if people don't revolt, then yeah: OpenAI, DeepMind, Anthropic — I'd put maybe 70 or 80% on one of them, with the rest of the probability smeared over other actors, or actors that have yet to emerge.

Are we getting hyped up on an exponential curve that's about to flatten off — because we run out of data, or run out of accessible compute, or something of that nature? Is this not something you see coming?

I don't see any reason to expect it. My general prediction method is: if you don't know what the future holds, predict that what just happened will happen again, and that is what I'm seeing. We're now in takeoff; exponentials are happening. Will it flatten off at some point? Yeah, sure — I just expect that to be post-apocalypse.
Let's take a tour of the landscape of the different alignment solutions, or AI safety solutions. The current paradigm used by OpenAI, for example, is that you train on human-created data and then you do reinforcement learning from human feedback (RLHF), fine-tuning the model afterwards. If nothing changes, if we are very close to transformative AI, and if transformative AI is developed by OpenAI, could this succeed as a last option? Do you think reinforcement learning from human feedback could get us to at least somewhat safe systems?

No, there's no chance of this paradigm working. It's not even an alignment solution — it's not a proposal. I don't think the people working on it would even claim that. I'm pretty certain that if you asked Paul Christiano whether RLHF is a solution to alignment, he would just say no.

And for context, Paul Christiano basically invented reinforcement learning from human feedback.

He was one of the core people involved in it, yes. Maybe some people involved in its creation would claim this, I don't know, but I would expect that if you asked the people who created these methods whether this is an alignment solution, they would say no — they don't expect this to work. I don't think RLHF in any way addresses any of the actual problems of alignment. The core problem of alignment is: how do you get a very complex, powerful system that you don't understand to reliably do something complicated that you can't fully specify, in domains where you cannot supervise it? It's the principal-agent problem writ large. RLHF does not address this problem; it doesn't even claim to address this problem. There's no reason to expect that RLHF should solve it. It's like clicker training an alien. For those not familiar, you can imagine a simplified version: the model produces some text, you give it a thumbs up or a thumbs down, and then you do a gradient update to make it more or less likely to produce that kind of output. You have no idea what is in that gradient — no idea what it is learning, what it is updating on. Say your model threatens some users, and you say, oh, that's bad, and give it a thumbs down. What does the model learn from this? One thing it might learn is "don't threaten users". Another thing it might learn is "don't get caught threatening users". Or it could learn to use fewer periods, or to avoid the word "green" — who knows? In practice, what it learns is a superposition of tons of these possible explanations, and it changes itself the minimum amount to fulfill the criterion, or to move in that direction, in that domain. You have no reason to expect this to generalize. Maybe it does, sure — but maybe it doesn't, and you have no reason to expect it to. There is no theory, there is no prediction, there is no safety. It's like Siegfried and Roy and the tiger: well, we've raised it from birth, it's so nice — and then it mauls you. Why? Who knows. The tiger had a bad day.
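To make the thumbs-up/thumbs-down point concrete, here is a deliberately simplified, hypothetical sketch of that kind of update. It is not OpenAI's actual RLHF pipeline (which trains a separate reward model and optimizes against it with PPO); it is a bare REINFORCE-style update on a single binary feedback signal, with illustrative names throughout.

```python
# Hypothetical sketch: one binary-feedback gradient update on GPT-2.
# NOT the real RLHF pipeline -- no reward model, no PPO -- just enough to show
# that a thumbs-down becomes an opaque gradient through the whole network.
import torch
from torch.nn.functional import log_softmax
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

def feedback_update(prompt: str, feedback: float) -> str:
    """feedback = +1.0 (thumbs up) or -1.0 (thumbs down)."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    # Sample a continuation from the current policy.
    generated = model.generate(prompt_ids, do_sample=True, max_new_tokens=30,
                               pad_token_id=tokenizer.eos_token_id)
    # Re-run the full sequence to get log-probs of the sampled tokens.
    logits = model(generated).logits[:, :-1, :]
    logprobs = log_softmax(logits, dim=-1)
    targets = generated[:, 1:]
    token_logprobs = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Only score the newly generated tokens, not the prompt.
    new_token_logprobs = token_logprobs[:, prompt_ids.shape[1] - 1:]
    # A thumbs-down pushes probability mass away from this exact sample;
    # nothing here says WHICH property of the sample the model blames.
    loss = -feedback * new_token_logprobs.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return tokenizer.decode(generated[0])

print(feedback_update("The assistant said to the user:", feedback=-1.0))
```

Nothing in this update pins down which feature of the sampled text the feedback was "about" — which is exactly the generalization worry described above.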
Perhaps the counterargument is something like this: it's only when we begin interacting with these systems empirically that we can actually make progress. The ten years of alignment research before, say, 2017 didn't really bring us any closer; it was only when OpenAI began interacting with real systems that we understood how they even worked, and therefore perhaps gained information about how to align them. Do you buy that?

No — and which part of that is even true? The claim that no progress happened when it was three people in a basement working with no funding — what are you talking about? Given that it was three people in a basement, they made a lot of progress on predicting things that would happen, on deconfusing concepts, on building an ontology for things that don't yet exist. That was extremely impressive given the extremely low amount of effort put into it. Sure, they didn't solve alignment — but has any progress happened in alignment since then? It's not obvious to me. There are more people using the word, there are a lot more papers about it, but there's stuff like RLHF which I don't consider actual progress; in a sense it's a regression. And this is not meant as a critique of the people who did RLHF, because I think they were fully aware that, hey, this is not alignment, this is an interesting thing we want to study a little bit — which I think is totally fair and legitimate. RLHF has its purpose: it's a great way to make your product nicer. As a capabilities method, RLHF is totally fine — just don't delude yourself into thinking it's alignment. I don't buy this whole "well, if it makes the model behave better in some subset of scenarios, this is progress towards alignment". I think this is a really bad way of thinking about the problem. It's like saying: well, if I hide my password file two folders deep, then that is security, because there are some scenarios where an attacker would not think to look two folders deep. Sure, in some trivial sense that's true, but that's obviously not what we mean when we talk about security. If you encrypt the password file but your encryption is bad, I'm like: alright, cool, this is obviously safety work, I accept this as safety, but now I have problems with your encryption system — that is progress on alignment that we can argue about. Moving your password.txt two folders deep I do not consider progress; you weren't even trying to solve the problem, you were trying to do something different.

You don't think that Microsoft's Bing Chat, or Sydney, was less aligned than ChatGPT?

Not in the ways I care about. I think this stretches what I use the term alignment for. You can make that statement — it's a completely legitimate way of defining the word alignment, if you want to define it that way — but it's not the thing I care about. I do not expect that if I had an unaligned, existentially risky AGI and I did the ChatGPT equivalent to it, that that saves you. I think that gives you nothing; you die anyway. Nature doesn't grade on a curve: just because you're 10% better, if you don't meet the mark, you still die.
It doesn't matter if your smiley face is ten percent larger than the next guy's smiley face — if you're only painting incrementally larger smiley faces, it doesn't matter.

What about extrapolations from reinforcement learning from human feedback — for example, having AIs give feedback on other AIs? Could that maybe scale to something more interesting for you?

Why would you expect that to work? Where does the safety come from? There is no step in this process where you actually address the core difficulty: how do you deal with a system that will get smarter, that will reflect upon itself, that will learn more, that is fundamentally alien, with fundamentally alien goals encoded in ways we do not understand and cannot access or modify, that is extrapolating into larger and larger domains that we cannot supervise? No part of this addresses that problem. You can play shell games with where the difficulty is until you confuse yourself sufficiently into thinking it's fine. This is a very common thing — it happens in science all the time, especially in cryptography. There's a saying that anyone can create a code complex enough that they themselves can't break it, and it's similar here: anyone can create an alignment scheme sufficiently complicated that they themselves think it's safe. But so what? If you just confuse yourself about where the safety comes from, that doesn't actually give you safety.

What would be evidence that you're wrong or right here? For example, if it turns out that the GPT models available right now are not used to create havoc in the world, are not used to scam people, and turn out not to be as dangerous as we expected, would this be evidence that perhaps OpenAI is doing something right?

A proof of this would be that no one on the entire internet can find any way to make the model say something OpenAI doesn't want it to say — that no such prompt can be found.

Perhaps not say something bad?

It's not specifically about bad words — this is actually quite important — because what OpenAI is trying to do is stop the model from saying bad things. That's what they were trying to make it do, and they failed. That's the interesting thing. If they had an alignment technique that actually worked, one that I'd expect to have a chance of working on a superintelligent system, it should be able to make their less smart systems never, in any scenario, say a bad thing.

So it should work in almost all cases, or basically all cases?

Importantly, it has to be all cases, because if it's not all cases, then — unless you have some extremely good theoretical reason why that's actually okay — well, by default these are black boxes. I don't accept any assumptions unless you give me a theory, a causal story, about why I should relax my assumptions. Otherwise: if it's breakable, it will break. And this is the security mindset. The difference between security mindset and ordinary paranoia is that ordinary paranoia assumes things are safe until proven otherwise, while security mindset assumes things are unsafe until proven otherwise. Sure, you can't apply security mindset to literally everything all the time, because you'd go crazy. But when we're dealing with existential threats from extremely powerful, superhuman optimizing systems, you have to.
These are systems whose whole purpose is to optimize reality into weird edge cases, to find and break systems, to glitch things, to exert power over systems — exactly the type of system you have to have a security mindset for. If you have a system that is just randomly probing your wall and you have one small hole, maybe that's fine, if it's small enough. But it's not okay if the system is deliberately looking for the small hole, and is really good at finding it.

What about the cyber security industry, for example? You would assume they have a security mindset, or at least they should — but accidents happen all the time, data is leaked, and so on. Isn't that evidence that we can survive situations where we should have had security, where there are holes in our security?

It's not actually true that those systems have to work 100% of the time. The fact that we survived has nothing to do with the security methods; it has to do with the systems being secured not being existential. If those systems had been existentially dangerous AGIs, yes, I expect we would be dead.

It's only because of the limited capabilities of these systems that they can be hacked, and have been hacked, and so on?

Exactly.

Let's take another paradigm of AI safety, which is mechanistic interpretability. This is about understanding what the black-box machine learning system is doing — trying to reverse-engineer the algorithm encoded in the neural network weights. Is this a hopeful paradigm, in your opinion?

I think it's definitely something worth working on — it's something many people at Conjecture work on as well. The way I think about interpretability is not as an alignment agenda; interpretability doesn't solve alignment, but it might give us tools with which we can construct an aligned system. In my ontology, mechanistic interpretability is attempting to move cognition from black-box neural networks back into white boxes. As I've said before, "black box" is observer-dependent: neural networks are not inherently black boxes — it's not a property of the territory, it's a property of the map. If you had extremely good interpretability in your head, an extremely good theory, a lot of compute in your head, and so on, then a neural network would probably look like a white box to you. And if that were the case — fantastic. Now you can bound lots of things, and maybe stop it from doing bad things, and so on. This is the default thing I tell people to do if they don't know what to do: if they say, "I don't know what to do about alignment or safety", I tell them, okay, just work on interpretability — just bash your head against it and see what happens. Not because I think it's easy; I expect it to be very hard. I also think a lot of the current paradigm of mechanistic interpretability is not great — I think a lot of people are making simplifying assumptions that I don't think they should be making — but in general I'm in favor, and I think this is good.
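As a concrete illustration of that "just bash your head against it" starting point, here is a minimal, hypothetical sketch of a first mechanistic-interpretability exercise: hook one GPT-2 MLP layer, record its activations on a prompt, and see which neurons fire hardest. The layer choice and prompt are arbitrary, and real interpretability work goes far beyond this; the point is only that the weights and activations are fully inspectable — "black box" describes our understanding, not the network.

```python
# Hypothetical sketch: inspect MLP activations of one GPT-2 block.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

captured = {}

def save_activations(module, inputs, output):
    # output: [batch, seq_len, hidden_dim] activations of this MLP.
    captured["mlp"] = output.detach()

# Hook the MLP of block 5 (an arbitrary choice for illustration).
handle = model.transformer.h[5].mlp.register_forward_hook(save_activations)

prompt = "The Eiffel Tower is located in the city of"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    model(**inputs)
handle.remove()

# Which neurons fired hardest on the final token?
last_token_acts = captured["mlp"][0, -1]
top_values, top_neurons = last_token_acts.topk(5)
for neuron, value in zip(top_neurons.tolist(), top_values.tolist()):
    print(f"neuron {neuron}: activation {value:.3f}")
```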
One problem, or perhaps the main problem, with interpretability research is whether it can move fast enough. We are just beginning to understand some interesting clusters of neurons in GPT-2, but right now GPT-5 is being trained. Can it keep up with the pace of progress, do you think?

The same applies to literally every other approach; my default answer is no. I don't expect things to go well. Again, I expect things to go poorly; I do not expect us to solve alignment in time; I expect things will not slow down sufficiently; I expect things will continue, and I expect us to die. I expect this is the default outcome of the default world we are currently living in. That doesn't mean it has to happen. The important thing is that the world being the way it is is not overdetermined. It didn't have to be this way. The path we're currently on is not determined by God; it is the result of decisions that individual humans, institutions, and so on have made in the past, which have led us to the moment we are in right now. It didn't have to be this way, and it doesn't have to continue to be this way — but it will, if nothing changes. So, do I expect interpretability to work in time? No. Do I expect these other agendas to work in time? No. Do I expect RLHF to ever work? No. I don't expect any of these things to work. That doesn't mean it's impossible — if we take action, if things change, if we slow down, or if we make some crazy breakthroughs. I think there is a lot that can be done in interpretability — a lot of theory, a lot of work that could happen. Is it possible? Yes. Will it happen in time? Probably not.

Then there is the research done by Paul Christiano and Eliezer Yudkowsky, at the Alignment Research Center and the Machine Intelligence Research Institute. This is, for me at least, difficult to understand. As I see it, it is attempting to use mathematics to prove something about the background assumptions in which alignment is situated. What do you think of this research paradigm — is there any hope here?

I feel like both Paul and Eliezer would scream if you put them in the same bucket; their research is actually very different. Just to say a few words on that: I'm going to be strawmanning here, specifically because Eliezer and Paul are some of the most difficult people to get right. Basically every single person I know completely mischaracterizes Paul, even people who know him very well: whenever I ask someone who knows Paul very well what Paul believes, they tell me X, and then I ask Paul and he tells me something different. I don't think this is malicious; I think Paul's and Yudkowsky's opinions are just very subtle and very complex, and communicating them is hard. So I am prefacing this: I am definitely misunderstanding what Paul and Eliezer truly believe; I can just give my best strawman.

My best strawman of Christiano's view: he currently works on something called ELK — eliciting latent knowledge — which, as far as I'm aware, is an attempt to think through worst-case scenarios: how can you get true knowledge out of neural networks, how can you get their true beliefs — not necessarily just neural networks, any arbitrary system — even if they're deceptive? And related to that, he does some semi-formal work on proofs and causal tracing through neural networks, things like that.
That's a strawman — definitely not an accurate description of what he actually does — but it's the closest I can get. Eliezer, meanwhile, is currently on hiatus, on sabbatical, so I don't think he's currently doing any research. But historically, what MIRI — the organization he founded — does is build formal models of agency: trying to deconfuse agency and intelligence on a far more fundamental level than just writing some code, building an AI, and then figuring out how to align it. It's much more thinking from first principles: what is an agent, what is optimization, what would it mean for systems to be aligned or corrigible, can we express these things formally, how can systems know that they or their successors will be aligned, how can they prove things about themselves, how would they coordinate or work together — a lot of work on decision theory, on embedded agency, and things like that. I think a lot of the MIRI paradigm is much more subtle — and I understand the MIRI paradigm better than I do Paul's — and I actually think a lot of the MIRI work is very good and really interesting. But that's just my opinion.

When people talk about formal mathematical theories and so on, they often refer, I think, to something Eliezer said in the Sequences: that the only way to get aligned AGI is a formally proof-checked system — you know, solve all of alignment in theory and then build AGI. I don't know if he still believes this; he probably does, but I don't know — I don't think I've asked him, or if I have, I don't remember his answer. And I don't think Paul believes this. The last time I talked to Paul — again, this is a strawman, please don't hold me to this; Paul, sorry if I'm misrepresenting you here — my understanding is that he has something like 30% p(doom) even on the current path, which obviously isn't going through formal methods, so from that I deduce that he doesn't expect formal methods to be necessary. If that's wrong, I apologize; that's just my impression — that Paul is quite open to non-formal things and neural networks and that kind of stuff, whereas Yudkowsky has this belief that if it's just neural networks, we're just super, super screwed — there's nothing we can do, it's way too hard. So a lot of the MIRI perspective, I think, is that aligning neural networks is so hard that we have to develop something that is not neural networks, that is easier to align, and use that instead — and that has not been super successful, as far as I can tell.

My view on this: I'm not sure about Paul's agenda — a bit pessimistic. I'm pretty pessimistic about ELK; it seems too hard. I don't really understand the other stuff he's working on, so I can't really comment on it. I definitely disagree with him on some points about interpretability and p(doom) and such; I think he's too optimistic about many things, but every time I bring this up he actually has good counterpoints, so maybe he has good counterpoints I just don't know about. For Eliezer: I agree that in a good world, that's what we would do.
In a good world where people are sane and coordinated and we take lots of time, we would do much more MIRI-like things — not necessarily exactly what MIRI did; I think some of the exact details of MIRI's research agenda are not what I would have done — but the general class of things: deconfuse agency, deconfuse alignment and corrigibility, and then try to build formal models and try to understand them. I think that is a super sensible thing to do. It didn't work in this one specific instance, given the constraints they had, but I don't think that means the entire class of methodologies is ontologically flawed and cannot possibly work. They tried, they found some things I find interesting, and other things didn't work out — that's how science works.

And perhaps it could have worked if we had started in 1950 and developed it alongside the general mathematics of computation, or something like that?

Yep, I think that's completely feasible — completely possible if things had gone slightly differently, or if MIRI had had one more John von Neumann get involved and get really into it in the early days. It's not obvious to me that this is a hundred years away or something; it might be, but it's not obvious. Things always feel impossibly far away until they're not. People thought flying machines were hundreds of years away the day before they happened — same thing with nuclear fission and things like that. So I feel like MIRI gets a bad rap. Sure, they made some technical bets and they didn't quite work out, but I think that's fair, so I'm pretty sympathetic to the Yudkowsky view even so. My personal view is more of a strategic decision: if I knew I had fifty years, I would probably work on MIRI-like stuff, but I don't have fifty years, so the kind of CoEm stuff I work on is more of a compromise between the various positions. There's a spectrum between fully formal, everything white-box, and nothing formal, completely black-box — let's try to move as far towards the formal side as possible, but no further, if that makes any sense.

It does. So perhaps introduce CoEms, these cognitive emulations.

Cognitive emulation, or CoEm, is the agenda that I and Conjecture are primarily focused on right now. It is a proposal — well, more a research agenda — for how we could get towards safer, useful, powerful AI systems, by fundamentally trying to build bounded, understandable systems that emulate human-like reasoning. Not arbitrary reasoning — not systems that just solve the problem by whatever means necessary — but systems that solve it the way humans would solve a problem, in a way that you can understand. When you use a CoEm system — and these are systems, not single models; I don't expect this to be one neural network; it may, and probably does, involve neural networks, but it will be a system of many subcomponents, which can include neural networks but also non-neural-network components — at the end you get a causal story, a reason to believe, using human-like reasoning, why it made the choices it did, why it did the things it did, and why you should trust the output to be valid in that regard.
And for listeners who were enticed by that description: go back and listen to the previous podcast in this series, where Connor and I discussed this for an hour and a half. Alright — as we see more and more capable AI systems, do you expect us to also see more and more public attention? Do you expect public attention to scale with capabilities, or will it lag behind and only come in right at the very end?

Both, in that I think we're at the very end and we're starting to see the attention now.

Okay. Do you think this will on net be positive or negative? Will public attention to AI make AI safer?

At the current point, I see it as positive. This is not obvious — it could still go negative very easily — but the way I currently see things is that everyone is racing headlong into the abyss, and at least the public, in my experience so far, has been able to notice: hey, wait, what the hell, don't do that. Which is great progress. It is truly amazing how many smart ML professors and so on are so incredibly, utterly resistant to the possibility that what they're doing might be bad or might be dangerous. It is incredible the level of rationalization people are capable of — well, it's not incredible, it's actually very expected, it's exactly what you'd expect: it is difficult to get a man to understand something when his salary depends on his not understanding it. And even the people who claim to understand it and say all the right words still do it. OpenAI can say all the nice words about alignment they want, or Anthropic or DeepMind or whoever — they're still racing to AGI, and they don't have an alignment solution. I don't like speculating about people's internal minds — why they're doing it, whether they're good or aligned — I don't really care. What I care about is what they do, and for me the writing is on the wall: people are just racing towards the abyss, and if no intervention happens, if nothing changes, they will go straight off that cliff with all of us in tow. And I think the public, even though they don't understand many things and there are many ways in which they could make things worse, do seem to understand the very, very simple concept of: don't careen off into the abyss. Stop that.

So here's an argument: OpenAI releases GPT-4, this draws a lot of attention, and therefore we get more resources into regulation and AI safety research and so on — so it's actually a good thing to release a model that's very capable but not a superintelligent AGI. Is this galaxy-brained, is this 4D chess, or do you think there's something there?

Sure, there's something there, but it is also just obviously 4D chess. If you had a theory with which you could predict where the abyss is, sure, okay — but there is no such theory. You have no idea what these systems can do when people get their hands on them, no idea what happens when you augment them with tools or whatever — there are no bounds, no limits. So every time you release a model, every time you build such a model, you're rolling the dice. Maybe this time is fine, maybe next time is fine, but at some point it won't be. It's Russian roulette: you can play some Russian roulette, and most of the time it's fine — five out of six people say Russian roulette is fine.
What about possible counterproductive overreactions to more public attention on AI? For example, imagine that we decide to pause AI research, but AI safety research gets lumped in with AI capabilities research — so even though we're not making any progress on capabilities, we're not making any progress on safety either, and when we lift the pause we are in the same place as when we instigated it.

Honestly, I'd be extremely surprised if that happened. I'm trying to imagine how that would actually play out in the real world. People won't even accept a moratorium on training things larger than GPT-4, which is the easiest thing to implement, the easiest thing to monitor, and which affects a teeny tiny sliver of all AI research — there are so few people who could or ever would train a GPT-4-sized model. Not even that is feasible in the current political world; it's very hard to get done. An overreach so large that MIRI doodling type theory on their whiteboards gets shut down — that's not the world we live in. If we were in that world, I'd say, okay, interesting, let's talk about it, but that's just not the world we live in.

What about AI becoming a military technology, where only the military can work on it, and perhaps they work on it in ways that turn out to be dangerous?

Yep, I am concerned about this. I think this is one of the ways things can go really badly. I used to be more virulently against this than I am now; now, in another sense, I look at where we're currently heading and I think: alright, currently we have a hundred percent doom chance — what are the other options? Look, I'm not going to defend many of the atrocities committed by militaries across the world. I'm not going to say there are no problems here, or deny that there are some really messed-up people involved in these organizations — of course there are. But also, at least in the democratic West — I don't want to speak about other nations — there is such a thing as oversight. There are court-martials; that actually happens. In a sense the military is authoritarian in a good way: there is hierarchy, there is authority, there is accountability, there is structure. The US military does a lot of bad things, but at least to some degree they are accountable to the American public — not perfectly, there are lots of problems — but if a senator wants a hearing to investigate something going on in the military, they can usually get it. That's something, and politicians do care. They might make very stupid mistakes and make things worse: the DoD could scale up GPT-4 very easily, they could make something much bigger than that. If they did a Manhattan Project and put all the money together to create some end-of-the-world system, they could, and that would be bad. So I think it can make things worse, but not obviously so. It could also be that they're a super incompetent, slow, bureaucratic mess — and the military is very, very conservative about what it deploys.
They want extremely high levels of security and extremely high levels of reliability before they use anything. If we built AI systems to military standards of reliability — if the military required that every AI system be as reliable as a flight control system — I'd say that sounds awesome. Of course, that is a rosy view. I think it's not a question of if the military gets involved; it's a question of when. And when it happens — well, the law of undignified failure says that if a thing can fail, it will always fail in the least dignified way possible — so it probably won't get to this level. But I think we should not dismiss this out of hand. First of all, I think it's ridiculous to expect that the military will not get involved; that's just impossible at this point, unless we get paperclipped tomorrow — unless things go so fast that no one can react. The military will get involved, and we should work with them; we should be there asking, alright, how can we help the military handle this as non-stupidly as possible? And I do think a lot of people who work in the military do care and would like things to be safe and work well. So is it worse than Sam Altman, Dr. Strangelove style, running things as fast as possible? Is it worse if the military nationalizes the whole thing and grinds it into a bureaucratic monstrosity? Not obvious to me. I'm not saying it's obviously good, but it's not obviously not good.

Alright, Connor, thank you for coming on the podcast.

Pleasure as always.
Info
Channel: Future of Life Institute
Views: 16,329
Id: nf-2goPD394
Length: 52min 8sec (3128 seconds)
Published: Thu Apr 20 2023