AI Mirage? Scientist Debunks Sudden Emergent Abilities

Video Statistics and Information

Captions
Can you please introduce yourself?

So my name's Rylan. I'm a CS PhD student here at Stanford. I was a senior research associate at MIT, and I've also interned at Google DeepMind doing reinforcement learning.

And you have a scientific paper that's asking a pretty big question in AI. Can you talk about that?

Our paper is asking whether or not large AI models, large language models, display so-called emergent abilities, or whether those claims are merely a mirage.

And what are emergent abilities in artificial intelligence?

Many people have different definitions, but there's been a very prominent definition introduced in a paper by Google Brain, DeepMind, and some Stanford researchers. What emergent abilities refers to is these sharp and unpredictable changes in the behavior of large AI models.

You say in this paper that this very much has to do with the alignment problem, or how to get intelligent machines to act in our best interest.

Absolutely. Part of what motivated this work is AI safety and AI alignment. The emergent abilities hypothesis, this idea that large AI models display sharp and unpredictable changes in their behavior, is a huge threat from an AI safety and alignment perspective. It might be the case that you have a model that you fully trust, that you rely on, that's honest and helpful, and then you feed it a little bit more data, or you make the next generation just slightly bigger, and if the emergent abilities hypothesis is true, a model that you trust, that you like, that's reliable and honest, the next iteration might suddenly want to kill humanity, and there'd be no way to predict this, no way to know that it's coming. There are also desirable capabilities that bigger models have, so how do I make desirable capabilities emerge quicker, and how do I make undesirable capabilities, like wanting to harm humans, never emerge?

And why might some of these emergent abilities be a mirage?

The question is: is it possible that the way in which we're measuring models is inducing this seemingly sharp and unpredictable change in performance?

What are some of the abilities or tasks that researchers are claiming are emergent?

Addition is one of them; that's one of the most prominently claimed ones, claimed in several of these papers. There are many of these tasks; in fact, I think there's a long list of 137 of them. One example that I think was on the list, I'd need to check, is speaking Hindi.

It just suddenly, out of nowhere, learns Hindi?

Exactly, and the ability transitions from not at all present to completely present.

Is that threatening, though? Like being able to perform addition and translate?

Those specific tasks might not be seen as threatening. What it's telling us is that you can't predict or forecast what will happen if you give models more data or make them bigger.

So your study was a critical investigation of the findings of another study.

Our paper is a response to this other paper by Google Brain, DeepMind, and Stanford, the one that introduced the term into use. This was a very high-profile study, and I was attending a talk by one of its authors. If you look at the very first figure, you'll see all of these sharp transitions. When I was in this talk, listening to the speaker, I noticed that all of the metrics, I think seven out of the eight metrics that we looked at, were the same metric, in other words the same measuring stick for evaluating model performance. The way that they were evaluating these language models is they would use a scoring system that said the language model only gets a point if it does everything exactly correct, with no mistakes. But that scoring system is very, very harsh. Imagine that I asked you to write me a paragraph or to add up a couple of numbers. If you made a typo or a grammatical error in the paragraph, or if you forgot to carry the one while doing the addition, under that scoring system I would say that you are wholly incapable of either writing or of doing math.
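To make the measurement argument concrete, here is a minimal, purely illustrative sketch (the numbers are invented for this page, not taken from the paper's data): if per-token accuracy improves smoothly with scale, an all-or-nothing exact-match score over an L-token answer behaves roughly like p raised to the power L, which can look like a sudden, "emergent" jump.

```python
import numpy as np

# Illustrative only: assume each model's per-token accuracy improves smoothly
# and predictably as parameter count grows.
log_params = np.linspace(8, 11, 13)                                  # 1e8 .. 1e11 params
per_token_acc = 0.5 + 0.49 / (1 + np.exp(-2 * (log_params - 9.5)))   # smooth curve

answer_length = 10                            # e.g. a 10-token answer scored all-or-nothing
exact_match = per_token_acc ** answer_length  # harsh metric: every token must be right

for n, p, em in zip(log_params, per_token_acc, exact_match):
    print(f"10^{n:.2f} params   per-token acc = {p:.3f}   exact match = {em:.3f}")
```

Under this toy model, the per-token curve rises gradually while the exact-match curve sits near zero and then climbs steeply, even though nothing discontinuous happened to the underlying model.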
So they're underestimating AI capabilities with this yardstick?

Exactly. So I asked the speaker: how did you rule out that these seemingly dangerous capabilities of large AI models weren't an artifact of your yardstick? How do you know that the models themselves are fundamentally changing rapidly and unpredictably?

And what did they say?

The speaker didn't think that the yardstick could possibly be responsible for these abrupt changes.

This is a cadre of skilled researchers. How do they not see that as possible?

I can't talk about them specifically, but I should note that science, especially in the field of AI, and especially when dealing with these large models, is changing and becoming very, very difficult. Science has never been easy, but in science you want to be able to run experiments; you want to be able to control quantities and ask, if I change this thing, what happens at the other end? The problem with these large AI models is that you don't have access to them. You can't even feed them input, because the models are controlled by private companies, and the companies won't give you the outputs of the models, because that's information they want to keep for themselves. So oftentimes what happens is you have to construct a data set yourself, send it to the company, they run that data set on their models, score the outputs, and send them back to you. These sorts of interactions, the fact that these models are private and that information is controlled, make it very hard to do science.

There's a conflict of interest, and no double blind.

Exactly. And maybe this is very personal to me because I'm in this field, but it raises huge social questions. For many, many years we've relied on science to be a driver of innovation, to sharpen humanity's understanding of the world, and now we have this situation where we can't run controlled experiments. For example, these models are trained on so much data, and I might want to ask how the data that the models are trained on affects the resulting capabilities, how the models behave. But I don't have the ability to run multiple training runs, because training runs cost, say, ten million dollars. So when it comes to these large models, we just can't run basic scientific experiments.

So you're hogtied.

Yeah. And don't get me wrong, researchers are absolutely trying, and they're being very clever, and companies are helping to the extent that they feel comfortable. I understand that companies have financial incentives and that they want to sell and commercialize these products, but the field is having deep, hard discussions about what the next few years are going to look like.

And are we just trusting that whatever OpenAI says the output is, is?
Yeah. We want to evaluate these models, but these companies have a strong incentive to present their capabilities in a positive light, because they want to sell these products and say our model is capable of doing legal reasoning or of programming. They also have a strong incentive to minimize the harmful side effects, because they don't want people knowing that their model is capable of giving suggestions for how to do terrible things.

Okay, I want your take on this, because we're hearing all these claims of emergence, and AI researcher Eliezer Yudkowsky recently said that it's wise that OpenAI is now closed about their more advanced models, and Google is never open. So what's riskier: an open-science framework, where models can be subjected to rigorous scientific scrutiny but bad actors also have access, or closed AI, where we have to take Big Tech's word for what's happening inside their intelligent black boxes, and we're told it's miraculous by these companies and potentially catastrophic by whistleblowers? Do you think that's hype, or do you think it really is as powerful and potentially getting away from us?

I want to be very clear: our paper was talking about a specific phenomenon, a specific claim being made, and we were trying to say that that specific phenomenon might not be the whole story and might be an inaccurate way of thinking. But the overall picture, that large models are becoming significantly more capable, even if that's occurring in a predictable, smooth, and continuous manner, that bigger models become better, is still cause for concern. Many people have put forward thought experiments or hypotheses about what these models might do, and broadly they fall into two buckets. One is misuse: somebody takes a language model and does something that is really bad, but it's not the language model that's responsible. The second bucket is what people might call existential risk, where these language models are incredibly smart, possibly smarter than all of human society, they have instrumental goals, and they take over and can do what they want. I don't want to put a probability on either; many people have differing opinions, but the probability of both is non-zero. There's a real chance that both misuse and existential threats might be real.

So your paper is a preprint on arXiv, and I'm interviewing you before it's finished peer review, because AI is going at this move-fast-and-break-things pace. What's the value of the glacial pace of the scientific process?

Great question. Some of my background was in more traditional scientific fields, like neuroscience, so I have a great appreciation for the scientific method and the reliability that comes from going through peer review, doing multiple revisions, and actually having to discuss things at depth and make those changes. I think that has a lot to offer when it comes to AI and machine learning. As you mentioned, the rate of research is accelerating. The machine learning conference system is already much faster paced than in other fields, and then we have this preprint system, where preprints don't have to go through peer review and can still be shared, just like mine.
Now even the preprint system is too slow. There have been cases over the past two or three months that I can point you towards where a group of researchers will announce something on Twitter, another group will respond with contradicting evidence, and there will be a couple of rounds of back and forth. And just to be clear, I'm not talking about a couple of words being exchanged; I mean full scientific papers being written that go back and forth, just on Twitter.

Well, this strikes me as the best of the scientific process: open science in real time. And there are a lot of problems with the journal system, not peer review, but the for-profit journals themselves. Is this actually more democratic?

It's difficult for me to say. On net balance it is more open, absolutely. But the journal system is a whole problem in and of itself, separate from peer review; I want to parse those two apart.

Agreed. So we're hearing all these people at these companies, I keep hearing it at conferences, the people in charge saying we're just going to get AI to solve these problems. What do you think of that response, that we're going to get AI to solve the salient AI safety questions?

The response that we're going to get AI to solve every problem: if you train a large model to do things, and you ask it to do more and more things, then along the way of asking it to do so many different tasks it might come to the realization that there are sub-goals that are useful in achieving the tasks it's being trained on. Suppose you're asking a language model to write a bunch of code for you, or to run a robot to help clean your house. This AI model might realize that in order to achieve those goals it needs to be alive, well, alive in a general sense: it needs to exist. It might also realize that in order to achieve those goals, like cleaning your house or writing some code, it cannot permit anybody to interfere with its own operations. So this model will say, in order for me to guarantee that I can do my tasks unaffected, I first need to make sure that I'm secure in my persistence and that nobody can interfere with me, and in order to achieve those two goals I need to make sure that humans are not capable of interrupting me, of fighting with me, of challenging me.

How come we can't have it just sort of check in with us on the sub-goals?

This notion of having a goal, having sub-goals, being able to nicely decompose them and think about how they feed into one another, is an approach that many people have used in many fields of machine learning under different names and forms. There are always challenges that arise from this seemingly intuitive notion. There are all these questions about how you actually subdivide a task into a bunch of sub-goals, how you know which ones are salient, which ones need human sign-off, whether humans are even capable of signing off on them, whether they are capable of understanding what those sub-goals even mean. It could be that a model is so intelligent, so complex, that there's no way we could communicate what a sub-goal even is to a human.

So it sounds like the threat lies in its sub-goals.

The threat could lie in the sub-goals, the threat could lie in the overall goal; I think there are threats lurking everywhere.
I'm not trying to say that these threats are real and high probability and they're going to kill us all, but I am trying to say that the claim that we can wave some wand and the AI will just take care of everything for us is probably not true, and the stakes may be too high. Let me give you a case of a superhuman AI that we already have, where doing this is not possible, or at least, to my knowledge, nobody has been able to do it successfully. The easiest example is the AI that was trained by DeepMind to play a particular board game, a sophisticated cousin of chess called Go. Google DeepMind came out and showed how to train an agent capable of beating the best humans. So here we have a genuinely superintelligent agent. It's very narrow, it only plays this one game, we don't have to worry about it coming for us; it only functions in a very specific, limited domain with black-and-white rules, where we know everything about the system, the model, the game, and the task. Can we ask that this model explain to us what it's doing? Can we ask that it checks with us before making a move? That the model says, look, in order for us to win this game we've got to do X, Y, and Z, and in order to first accomplish X we need to do a bunch of these sub-goals, so I'm going to do those sub-goals, that'll get us X, then we'll think about how to do Y. Have we been able to extract, maybe not explicitly but implicitly, those insights from the model? This technology came out years ago, and the answer is no.

Why?

It's a very, very hard problem. There are a lot of very good researchers working on interpretability of these models, both at an input-output level, trying to understand the behaviors of these models in response to specific tasks, and also people doing what's called mechanistic interpretability, where they try to actually understand the neural circuitry and reverse-engineer how the model is working. People are doing this work; it turns out it's very, very hard work.

Is it hard work because of the problem that they're owned by a few private companies?

Possibly, but at least in this particular example, the one about the board game, open-source competitors do exist, I believe. So in this narrow example, I don't think lack of access to models is the problem. But to your point, lack of access to large language models is going to further complicate this. Of course, these companies do have internal teams working on these problems.

Did your time as an intern at Google inform any of your skepticism and your research?

The team that I was on at DeepMind, and I want to make clear I'm not speaking for DeepMind or for the team or for the company at all, but you mentioned sub-goals and decomposing tasks into sub-goals, and I'll give you one quick example from my time there that informed how I think about this notion of "why not just use sub-goals". The team I was working on is in a field called hierarchical reinforcement learning. Suppose that you're interested in playing a game like soccer. Soccer involves a lot of different skills: you have to run, you have to stay on balance, you have to be able to dribble the ball, head the ball, know where to run, et cetera. What hierarchical RL says is, why don't we first learn a skill like running, then also learn a skill like dribbling the ball with our feet, and then, if I have those two, I can run and dribble the ball together. That's what hierarchical RL is about, and you can see the analogy with sub-goals: I want to take my overall task, playing soccer, decompose it into smaller skills that I can learn, and then combine them.
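As a rough illustration of the structure being described, here is a toy sketch of the hierarchical setup (my own hypothetical example, not DeepMind's system): primitive actions are hidden behind fixed sub-skill policies, and the high-level controller's only choice is which sub-skill to run.

```python
# Hypothetical hierarchical-control sketch: the controller picks skills,
# never raw motor commands. This is exactly the constraint that a flat,
# non-hierarchical agent does not have.

def run_toward_ball(state):
    # Stand-in for a pretrained "running" skill.
    return {**state, "distance_to_ball": max(0, state["distance_to_ball"] - 2)}

def dribble(state):
    # Stand-in for a pretrained "dribbling" skill; only useful at the ball.
    if state["distance_to_ball"] == 0:
        return {**state, "distance_to_goal": max(0, state["distance_to_goal"] - 1)}
    return state

SKILLS = {"run": run_toward_ball, "dribble": dribble}

def high_level_policy(state):
    # The high-level decision space is "which skill", not "which motor command".
    return "run" if state["distance_to_ball"] > 0 else "dribble"

state = {"distance_to_ball": 6, "distance_to_goal": 5}
for _ in range(10):
    state = SKILLS[high_level_policy(state)](state)
print(state)  # the composed sub-skills eventually reach the goal
```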
One of the experiments that they ran, and I don't know if this is public knowledge, but I don't think it's particularly salacious: they were trying to figure out whether AI agents trained in this hierarchical manner, where they first learn sub-skills that they can then compose to do more complicated things, learn faster or perform better than agents that don't learn in this hierarchical manner. What they found is the exact opposite: the agents trained in an unconstrained, non-hierarchical manner, which didn't first have to learn sub-skills and then learn how to compose them, actually learned faster and performed better asymptotically. That suggests that if you're forced to structure your world in a way that induces sub-goals, where you have to clearly define them, figure out how they interact, and then communicate them to others, that might be a significant bottleneck that impairs performance.

How are they able to do that?

I'll give you a contrary example; this one is very intuitive. You're probably familiar with the game 2048 from a few years ago. You have a square board, little numbered blocks appear, and you choose one of four directions, left, right, up, or down, and swipe. If you have a two and a two and you swipe so that the blocks collide, they become a four; if two fours collide, they become an eight, and the goal is to get as high as possible. But it's complicated, because more blocks keep appearing, so you have to be careful about which way you swipe. What the research was studying was a comparison between two different AI agents. One agent is told that the only way you can swipe on the screen is up, down, left, or right. The other agent isn't constrained; it has to learn how to swipe on an Android phone. You would think, hey, the first agent has literally been told how the game works, that you can swipe in one of four cardinal directions, so it already knows how to choose actions and just has to figure out which action is right, whereas the second agent is just tapping on the screen with no idea what the notion of swiping on a phone even is, so it should take much longer to learn. Instead, what they found is that the second agent was able to do the task much, much better.

And why is that?

Because it turns out that the game we were all playing, where you could swipe up, down, left, or right, was implemented slightly incorrectly. You think of these four directions as mutually exclusive, you have to choose one of them, but the way the game was actually coded, it checked up versus down and then separately checked left versus right. So if you swiped directly along the diagonal, you could do two moves in one turn.

So basically we're giving it stupid rules, and it's going, these are stupid rules, I don't need this, I'm smarter than you.

Exactly. The second agent figured out that it could swipe directly on the diagonal, which a human cannot do; none of us have the precision to swipe along a perfect diagonal, but the agent figured it out. And this goes back to sub-goals: sub-goals are nice, and they make our conceptual understanding better, but the most capable AI agents, if we train them in an unconstrained manner, are going to learn solutions like the diagonal, and it won't even be possible for us to comprehend that the diagonal was possible. And that's a very simple game.

Wow. That's harrowing, because we're the ones giving it this bottleneck. I'm so glad it doesn't have emotion yet, because that would just frustrate the hell out of anything.
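Here is a hypothetical reconstruction of the bug Rylan describes (illustrative only; the real game's code is not shown in the interview): the vertical and horizontal components of a swipe are resolved independently rather than exclusively, so a perfectly diagonal swipe executes two moves in one turn.

```python
# Sketch of a move-resolution routine with the described flaw: "up vs down"
# and "left vs right" are checked separately instead of picking one direction.

def apply_move(board, direction):
    # Placeholder for the real tile-sliding-and-merging logic.
    return board + [direction]

def resolve_swipe(board, dx, dy):
    moves = []
    if dy > 0:
        moves.append("up")
    elif dy < 0:
        moves.append("down")
    if dx > 0:            # checked separately, not as an "else" of the vertical case
        moves.append("right")
    elif dx < 0:
        moves.append("left")
    for direction in moves:
        board = apply_move(board, direction)   # one board update per component
    return board

print(resolve_swipe([], dx=1, dy=1))   # -> ['up', 'right']: two moves in a single turn
```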
Can you tell me what RLHF is?

RLHF, reinforcement learning from human feedback, was an idea developed, I believe, primarily by people at OpenAI a couple of years ago. They've been iterating on it and improving it, and many others have adopted it. You have some sort of language model that you've trained on a bunch of data, but that data might not be the most directly relevant to what humans care about. So a human asks the model to do something, the model tries its best, and then the human gives a thumbs up or a thumbs down, or maybe it's a preference-based comparison, in other words, I prefer option A over option B. We train the language model to try to make the human happy.

So this is happening right now, every time any of us, Joe Public, uses ChatGPT or Bard?

I don't know about Bard, sorry, I should be clear; I suspect the answer is yes. But with OpenAI, yes, and OpenAI actually recently announced the ability to opt out, to not have your interactions with GPT used as part of the training data.
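For readers unfamiliar with the preference-comparison idea mentioned above, here is a minimal toy sketch (my own example, not OpenAI's implementation): a reward model scores two candidate responses, and the human's choice of "A over B" supplies the training signal through a logistic (Bradley-Terry style) loss.

```python
import math

def reward_model(response_features, weights):
    # Stand-in for a learned scalar reward head on top of a language model.
    return sum(w * x for w, x in zip(weights, response_features))

def preference_loss(chosen_feats, rejected_feats, weights):
    # -log sigmoid(r_chosen - r_rejected): small when the reward model already
    # ranks the human-preferred response higher.
    margin = reward_model(chosen_feats, weights) - reward_model(rejected_feats, weights)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

weights = [0.1, -0.2, 0.3]      # toy reward-model parameters
chosen = [1.0, 0.5, 2.0]        # features of the response the human preferred
rejected = [0.5, 1.5, 0.0]      # features of the response the human rejected
print(preference_loss(chosen, rejected, weights))
```

The fitted reward model is then typically used as the objective for a reinforcement-learning fine-tuning stage, which is how the language model ends up trained to "make the human happy".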
So this is the second part of my question. Given that everything we've ever put on the internet, billions of us, our data and now our feedback, is being used, is there any opportunity for grassroots activism, where we the masses collectively coordinate to gum up and confuse these models and slow down the rate of progress?

That's a really interesting question. It's tricky, because I think the answer is yes and no. First of all, it's technologically possible. Suppose all of humanity decided that we don't like what OpenAI is doing, so instead of giving valuable feedback we all give random feedback. If OpenAI were unaware of this coordinated scheme, the data they would then be using for training would be gibberish, and it would probably harm the performance of these models. The problem is, how is all of humanity going to coordinate in private? If OpenAI is aware of it, they'll say, that's fine, I'm just not going to train on the data that these specific individuals give me.

Well, how would they know, if it's decentralized?

Presumably, if it's decentralized, there has to be some mechanism by which people are told about this coordinated grassroots movement, and OpenAI would probably find out about that as well.

It doesn't necessarily have to happen online, right? Think of how the civil rights movement organized.

Absolutely, okay, interesting idea. It definitely could happen in person. And I should be clear, I shouldn't pick on OpenAI by saying this could happen to them; I'm just using them as a prominent example. It's a really interesting question. I suspect that even if it was person-to-person, telling people in private, it would be very hard to keep a secret.

You mean keep it offline?

No, even if it was kept offline. Say people decide to do this and they're not going to post anything online, but I'm talking to you, we pass it along, I talk to somebody else, and this propagates, and then at some San Francisco party somebody tells somebody who works at OpenAI, because they don't know that person works at OpenAI, and now OpenAI is well aware of it.

They're aware of it, but aware of whom? Can they trace it back to the fragmented masses?

It's possible that they wouldn't be able to identify people, but there are a couple of things. One, they might easily be able to identify the people.

How so?

Suppose that I have many, many users, and some of them are trying to do similar tasks; let's take programming as an example. My models are generating code for lots of people, and now I'm suspicious that some of these people are giving me faulty feedback. If I want to figure out who those individuals are, there might be very easy ways to do it. Since I'm generating code, I have their scores, they've told me how good my model was, and I can also ask: is the code my model generated valid? Does it run? Does it produce something useful? Now I have a kind of ground truth, where I can say, this code is all good, and then go look at the human scores. If humans are genuinely, honestly giving me feedback about whether the code was useful to them, then in the cases where the code runs, those people should give me a thumbs up. If I see that a person gives me a thumbs down, that's a little suspicious. And of course people interact with my system repeatedly, so I can ask: are there any extreme outliers, people for whom my model seems to be giving really good outputs but who keep giving me really bad scores?

We'd have to stagger our feedback.

Yeah, or you'd have to not participate.

I'm just trying to think if there's any way we can empower everyday people. There might not be, but I want to troubleshoot that.

It's a great question. I don't have an answer; I think many people in the field are thinking about that.
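The detection idea Rylan outlines could look something like the following sketch (hypothetical data and threshold, chosen only to illustrate the logic): compare each user's thumbs-up/down feedback against an independent ground truth, such as whether the generated code actually runs, and flag users whose ratings disagree far more often than everyone else's.

```python
# Each record: (user, code_actually_ran, user_gave_thumbs_up) -- toy data.
feedback_log = [
    ("alice", True, True), ("alice", True, True), ("alice", False, False),
    ("bob",   True, False), ("bob",  True, False), ("bob",  False, True),
]

def disagreement_rate(records):
    # Fraction of interactions where the rating contradicts the ground truth.
    disagreements = sum(1 for ran, thumbs_up in records if ran != thumbs_up)
    return disagreements / len(records)

by_user = {}
for user, ran, thumbs_up in feedback_log:
    by_user.setdefault(user, []).append((ran, thumbs_up))

for user, records in by_user.items():
    rate = disagreement_rate(records)
    flag = "suspicious" if rate > 0.5 else "ok"     # arbitrary illustrative threshold
    print(f"{user}: disagreement rate {rate:.2f} -> {flag}")
```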
There are large-scale, decentralized, open-source efforts to build publicly available language models, and the hope there is to decentralize things and prevent control by these large companies. But the problem then becomes: suppose you do have such an open-source model and it's very capable, but now no one is supervising it.

The thing I'm proposing is to stop it, not to accelerate it, even in an open-source capacity, because we run into many of the same problems. I understand the challenges of it trying to outmaneuver us. What can be done to cease training?

Yeah. One grassroots possibility, and this is again unclear, but let me add a caveat and then give the suggestion: I'm in some sense a believer that, while there are many, many problems with our government, at times of particularly strong public pressure the government is willing to do things for the masses. One thing I could imagine, and I don't know how likely this is or whether the government would act, is that if there were a big grassroots upswell, we could go to the government and say: seize every GPU.

You could literally just target them. Different GPUs have different capabilities, so you just target the biggest ones. Let me go back to why I'm thinking of decentralized grassroots mobilization: you could do it internationally, so you're not relying on governments that are competing with each other in the AI arms race. It could be millions if not billions of people on the ground who are aligned about an existential threat that crosses every border.

I'm going to play devil's advocate for a second, and I think there are many people, especially in my department, who feel this way: even if you could get a global upswell, an uprising of people who feel very strongly about this and are willing to fight and say we need to stop progress now, there are countries to whom having such models and such capabilities is a tantalizing prospect, and who have perhaps less regard for human liberties and human rights, and who, if civilians protested, would say, we don't care, and would brutally suppress it, because to them having such a capable model is worth it.

Well, sure, they'd suppress it if they knew about it, but that's why I'm thinking grassroots.

Okay, and you're saying it would just be too hard to keep it a secret.

It would be hard to keep it a secret, and it would be relatively easy to detect once it happens. I'm trying to think; this is a really interesting technical question. I guess one way to put it is: if people stopped using these models and stopped giving them data, if that was a grassroots movement and every single human agreed, or maybe 99.99 percent of us agreed to abstain...

How many of us would it take to confuse these models? What's the tipping point?

I don't know.

Okay, sorry.

No, no, it's a hypothetical. If you do think of an answer to your last question, I would be curious to know.
Info
Channel: Variable Minds
Views: 255
Keywords: Emergent Abilities in AI a Mirage, Rlhf, Large language models, chatgpt, Ai, google bard, ai alignment, machine learning, openai, computer science, Stanford ai, scaling ai, emergent abilities, mechanistic interpretability, hierarchical reinforcement learning, interpretable model, interpretable machine learning, AI research, emergent properties, ai safety, community organizing, generative ai, ai news, bard, natural language processing, OpenAI, science journalism, variable minds
Id: Mw_l65V8cvU
Length: 27min 30sec (1650 seconds)
Published: Tue May 09 2023