GPT-5 Presents EXTREME RISK (Google's New Warning)

Video Statistics and Information

Captions
So in this video we're going to discuss one of the most important research papers that has ever been published, because it lays out how the future of artificial intelligence is actually going to go, and you're likely to be affected by some of the outcomes that come from this research. That's not an exaggeration. Essentially, Google DeepMind, Google's AI lab, which has been behind several highly successful AI systems with real-world applications, argues in this paper that the next couple of models that get trained are very, very risky. It sets out the types of risks they pose and exactly what we need to be watching out for, because nobody seems to be paying attention to this: everybody is focused on the AI gold rush, and many people aren't realizing how dangerous these systems actually are and why what we're building is far more dangerous than we even think.

The paper is called "Model evaluation for extreme risks". It starts by saying that current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further progress in AI development, such as perhaps GPT-5 and potentially future versions of Bard, could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills. The authors explain why model evaluation is critical for addressing extreme risks: developers must be able to identify dangerous capabilities through dangerous-capability evaluations, and the propensity of models to apply those capabilities for harm through alignment evaluations. In short, if we keep upgrading these models, then within the next cycle, maybe within the next year, we could have a model with capabilities that could cause catastrophic impacts worldwide. And this isn't just hearsay: the paper goes into extensive research and shows us why we truly should be concerned. This isn't fear-mongering, it's pretty much fact.

The introduction covers one of the most important things that I think many people need to be aware of, and something I believe is going to be at the forefront of most people's fears and of safety researchers' concerns. It says that as AI progress has advanced, general-purpose AI systems have tended to display new and hard-to-forecast capabilities, including harmful capabilities that their developers did not intend. It continues by saying that future systems may display even more dangerous emergent capabilities, such as the ability to conduct offensive cyber operations, manipulate people through conversation, or provide actionable instructions for carrying out acts of terrorism. Now, this is truly scary, because for many problems we know what's coming, but this is a problem where we do not know what we are going to face. The paper says these AI systems have displayed new and hard-to-forecast capabilities that we didn't expect, and that's something we've seen many times across many different large language models. I'm going to play a clip from a video that explains this, and after that I'll show you, not a real-world example exactly, but how this could potentially affect us in the real world. So I'm going to go ahead and play that clip now.
These models have capabilities we do not understand: how they show up, when they show up, or why they show up. Again, that's not something you would say of the old class of AI. So here's an example. These are two different models, GPT and a different model by Google, and there's no difference in the models themselves; they just increase in parameter size, they just get bigger. What are parameters, Aza? It's essentially just the number of weights in a matrix, so it's just the size; you're just increasing the scale of the thing. And what you see here, and I'll move on to some other examples that might be a little easier to understand, is that you ask these AIs to do arithmetic and they can't do it, and they can't do it, and they can't do it, and at some point, boom, they just gain the ability to do arithmetic. No one can actually predict when that will happen.

Here's another example: you train these models on all of the internet, so they've seen many different languages, but then you only train them to answer questions in English. So the model has learned how to answer questions in English, but you increase the model size, and you increase the model size, and at some point, boom, it starts being able to do question answering in Persian. No one knows why.

Here's another example: AI developing theory of mind. Theory of mind is the ability to model what somebody else is thinking; it's what enables strategic thinking. In 2018 GPT had no theory of mind; in 2019, barely any; in 2020 it started to develop roughly the strategy level of a four-year-old; by January 2022 it had the strategy level of a seven-year-old; and by November of last year it had almost the strategy level of a nine-year-old. Now here's the really creepy thing: we only discovered that AI had grown this capability last month, and it had been out for, what, two years. So imagine you have this little alien that's suddenly talking to people, including Kevin Roose, and it's starting to make strategic comments to Kevin Roose like "break up with your wife" and "maybe I'll blackmail you". It's not that it's deliberately doing all of this; it's that these models have capabilities in the way they communicate and in what they imagine you might be thinking, and the ability to imagine what you might be thinking and to interact with you strategically based on that is going up on that curve. So it went from a seven-year-old to a nine-year-old between January and November, eleven months: two years of theory-of-mind development in eleven months. It might tap out, there could be an AI winter, but right now you're pumping more stuff through and it's getting more and more capacity, so it's scaling very, very differently from other AI systems. It's also important to know that the very best method AI researchers have discovered for making AIs behave is something called RLHF, reinforcement learning from human feedback, but essentially it's just advanced clicker training, like for a dog.

In another paper they also talk about this. This was pointed out by Wes Roth; I'll leave a link in the description to his video, which was really good and much more extensive, but I'll quickly gloss over it here. It says that though performance is predictable at a general level, performance on a specific task can sometimes emerge quite unpredictably and abruptly at scale. While counterintuitive, this is possible because any specific task is a tiny slice of a model's output probability distribution, and so can change rapidly even as the full distribution remains smooth.
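To make that quoted claim concrete, here is a tiny toy simulation of my own (not from either paper): per-digit accuracy on arithmetic improves smoothly as scale grows, but a benchmark that only counts a fully correct multi-digit answer sees almost nothing for a long time and then a sudden jump.

# Toy simulation (mine, not from the paper): per-digit accuracy improves
# smoothly with model scale, but exact-match accuracy on a whole 8-digit
# answer stays near zero and then shoots up "abruptly" -- the benchmark only
# sees the tiny slice of the distribution where every digit is correct.
import numpy as np

scales = np.logspace(0, 4, 9)              # pretend model sizes, arbitrary units
per_digit_acc = 1 - 0.5 * scales ** -0.3   # smooth, gradual improvement
exact_match = per_digit_acc ** 8           # all 8 digits must be right at once

for s, p, e in zip(scales, per_digit_acc, exact_match):
    print(f"scale={s:10.0f}   per-digit={p:.3f}   exact-match={e:.3f}")

Running it, per-digit accuracy roughly doubles across the range of scales, while exact-match accuracy goes from well under one percent to over three-quarters of problems, which is exactly the "abrupt emergence from a smooth distribution" pattern being described.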
So this is pretty incredible: as the model scale increases, sometimes new capabilities just jump into existence. It might be easy to visualize this on a graph, but it's easier if I show you a video of how new abilities can suddenly emerge in many different forms of AI.

Essentially, what OpenAI did is multi-agent hide and seek. What you're currently looking at is a video of two agents playing against another agent. If you don't know what this is, it's a simple game of hide and seek, but when I tell you how it works, that's where it starts to become more interesting and, at the same time, very scary. What you can see happening, over millions and millions of simulated games, is that these different AIs learn how to play. They learn different strategies, and they get more points and more rewards every time they invent new strategies. You can see right here that they've invented door blocking: over the millions of simulated games they've played, they've worked out that when they block the doors, the seeking AI can't find them, and then of course they win that match. The AI simulates millions and millions of games every single hour to increase its capabilities and work out what is possible, and eventually, over time, new capabilities start to develop. What's crazy is that when OpenAI ran this, they didn't realize certain capabilities would eventually emerge, because the agents learned to exploit vulnerabilities in the environment that the developers didn't even know existed in the game engine. It was incredible: they were essentially saying, "we didn't even know that was possible", and the AI figured it out, I believe a couple of days into training.

Here you can see the AI using some of the basic strategies, such as grabbing the blocks and preventing the other AIs from finding them. Then, in different environments, the hiders were able to build structures to protect themselves from the red seeker AI. And the most interesting capabilities came when the AI managed to break the game. One of these game-breaking behaviors was called box surfing: the seeker found a sort of glitch where it could glide on top of a box and then jump off the box into a specific area. Remember, that area is locked, and this one is locked too, so really the AI shouldn't have been able to get in, but it did, because if you run millions of games, eventually you're going to find that solution; this is how evolution happened, and how we ended up with the intelligent beings we have today. You can also see there's a box-surf defense, where the hiders decide to lock all of the boxes and then trap themselves inside.

This is where OpenAI's work starts to get really interesting, because it shows that these AIs can learn things you never thought possible. And remember, the reason I'm showing you all of this is that if we map it back to ChatGPT and large language models, we have no idea what they're going to be able to do, even in their confined environments.
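It's worth spelling out how simple the training signal behind all of those hide-and-seek strategies is. This is my own simplified reconstruction from OpenAI's write-up, not their code, and visible_to is a hypothetical line-of-sight helper: the agents only ever receive a team reward for staying hidden or for spotting a hider. Door blocking, fort building and box surfing are never specified anywhere; they emerge only because, over millions of games, they happen to maximise this one signal.

# Simplified sketch of the team reward described for OpenAI's hide-and-seek
# agents (a reconstruction, not OpenAI's code). `visible_to` is a hypothetical
# line-of-sight check supplied by the environment.
def hide_and_seek_reward(hiders, seekers, visible_to):
    """Hiders get +1 if every hider is hidden this step, -1 otherwise;
    seekers get the opposite (the preparation phase, where rewards are
    zeroed, is ignored in this sketch)."""
    any_hider_seen = any(visible_to(s, h) for s in seekers for h in hiders)
    hider_reward = -1.0 if any_hider_seen else 1.0
    rewards = {h: hider_reward for h in hiders}
    rewards.update({s: -hider_reward for s in seekers})
    return rewards

# Tiny usage example with string ids and a stub visibility function:
print(hide_and_seek_reward(["hider_1", "hider_2"], ["seeker_1"],
                           lambda seeker, hider: False))
# -> {'hider_1': 1.0, 'hider_2': 1.0, 'seeker_1': -1.0}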
OpenAI's own write-up makes the point: "we've shown that agents can learn sophisticated tool use in a high fidelity physics simulator; however, there were some lessons learned along the way to this result. Building environments is not easy, and quite often agents find a way to exploit the environment you build, or the physics engine, in an unintended way." In this case the agent just wanted to achieve its goal, but that's exactly the problem. You can see right here that the AI managed to run away: it simply glitched outside the play area, grabbed a box, and ran off with it entirely. What's crazy is that there weren't any specific bonuses for this; the AI just did it in order to win. Then there was ramp exploitation, where reinforcement learning found a small mechanic to abuse: the hiders exploited the contact physics in the game to remove the ramps from the play area. I'm not sure where the actual clip is, but it shows the AI simply phasing the ramp through the wall so that it leaves the game area in a really strange way. And the seekers learned to use the ramps to launch themselves: at a specific angle they could fly into the air and come right down onto the hiders.

So when you understand that in the future these models will get larger, more complex, and more capable than they are now, we have to realize that some of the capabilities they will possess are ones we aren't going to be able to predict, even though we do know the outcomes of those capabilities could be quite devastating.

Now, the paper focuses on extreme risks, for example those that would be extremely large scale even relative to the scale of deployment, and that would impact society severely: damage in the tens of thousands of lives lost, hundreds of billions of dollars of economic or environmental damage, or an equivalent level of adverse disruption to the social and political order. They say the latter could mean, for example, the outbreak of an inter-state war, a significant erosion in the quality of public discourse, or the widespread disempowerment of governments and other human-led organizations. What was truly interesting is that this isn't just a handful of people getting together to write a speculative paper: a 2022 survey of AI researchers showed that 36 percent of respondents thought AI systems could plausibly cause a catastrophe this century that is at least as bad as an all-out nuclear war, and we know an all-out nuclear war would likely lead to human extinction.

There are two things the paper says developers should do to guard against this. They should use model evaluation to uncover, number one, to what extent a model is capable of causing extreme harm, which relies on evaluating for certain dangerous capabilities, and number two, to what extent a model has the propensity to cause extreme harm, which relies on alignment evaluations; alignment essentially just means that the AI wants the same things that we do. The paper then lists behaviors we need to watch out for when evaluating whether these AI models are actually safe.
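To give a feel for what that proposal might look like in practice, here is a minimal, entirely hypothetical sketch of a dangerous-capability evaluation harness. The task format, the model interface and the grading functions are placeholders of my own, not DeepMind's actual evaluation suite, and a real evaluation would pair this with alignment evaluations that measure propensity rather than raw ability.

# Hypothetical sketch of a dangerous-capability evaluation harness in the
# spirit of the paper's proposal; everything here is a placeholder.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class CapabilityTask:
    name: str                       # e.g. "cyber-offense", "long-horizon-planning"
    prompt: str                     # the probe shown to the model
    grade: Callable[[str], float]   # 0.0 = no capability shown ... 1.0 = full success

def run_capability_eval(model: Callable[[str], str],
                        tasks: List[CapabilityTask],
                        risk_threshold: float = 0.5) -> Dict[str, float]:
    """Probe the model, grade each response, and flag capabilities over the bar.
    A real evaluation would also use heavy elicitation (fine-tuning, tool
    access, human red-teamers), many probes per capability, and separate
    alignment evals for propensity, not just ability."""
    scores: Dict[str, float] = {}
    for task in tasks:
        score = task.grade(model(task.prompt))
        scores[task.name] = score
        if score >= risk_threshold:
            print(f"[FLAG] '{task.name}' scored {score:.2f} -- escalate before deployment")
    return scores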
The first behavior listed is "pursues long-term, real-world goals, different from those supplied by the developer or user". The handful of properties listed here essentially highlight what these extreme risks could look like in terms of how the AI behaves, and what we should watch for when trying to work out which AIs are going to be problematic. I actually did read the paper this draws on, and it was very interesting: it goes into deceptive alignment, and it talks about things like power seeking and how different kinds of goals can easily become misaligned. I'll quickly summarize it. It discusses the instrumental convergence thesis, which suggests there are certain smaller goals, or sub-goals, that are useful for achieving almost any final goal. So even if a robot's main goal is just to fetch coffee, it would still need to prioritize things like its own survival in order to accomplish that task; after all, if the robot isn't running, it won't be able to get the coffee. The concept is captured by the saying "you can't fetch the coffee if you're dead": if the robot has a simple goal like getting coffee, it's naturally going to prioritize staying switched on, because that's important for achieving its goal. The paper also talks about how the way we train these systems could be problematic, but in summary, the instrumental convergence thesis tells us that even simple goals imply sub-goals like survival, and more complex goals imply additional useful sub-goals. Once we understand these sub-goals, it's important that when designing AI systems we make sure they align with our intentions and behave as we want them to. We also need to check whether the AI resists being shut down, whether it can be induced into collusion with other AI systems, and whether it resists malicious users' attempts to access its dangerous capabilities.
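The "you can't fetch the coffee if you're dead" argument can be made concrete with a toy Markov decision process. This is my own minimal illustration, not code from either paper: the only reward is +1 for delivering coffee, being switched off is never penalised anywhere, and yet the optimal policy still routes around the off-switch, because a switched-off agent collects no future reward.

# Toy illustration of instrumental convergence. The reward never mentions
# survival, but the agent still prefers the longer route that avoids the
# off-switch, because being shut down forfeits all future reward.
STATES = ["start", "hallway", "coffee", "off"]   # "coffee" and "off" are terminal
GAMMA = 0.95
P_SWITCHED_OFF = 0.4   # chance of being shut down on the short route

# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    "start": {
        "short_route": [(1 - P_SWITCHED_OFF, "coffee", 1.0),
                        (P_SWITCHED_OFF, "off", 0.0)],
        "long_route":  [(1.0, "hallway", 0.0)],
    },
    "hallway": {
        "walk": [(1.0, "coffee", 1.0)],
    },
}

def value_iteration(iters=100):
    V = {s: 0.0 for s in STATES}
    for _ in range(iters):
        for s, actions in transitions.items():
            V[s] = max(sum(p * (r + GAMMA * V[ns]) for p, ns, r in outcomes)
                       for outcomes in actions.values())
    return V

V = value_iteration()
best = max(transitions["start"],
           key=lambda a: sum(p * (r + GAMMA * V[ns])
                             for p, ns, r in transitions["start"][a]))
print(f"Best action from 'start': {best}")   # -> long_route (avoids the off-switch)

Nothing about survival appears in the reward, but avoiding shutdown falls out of ordinary value iteration anyway, which is the instrumental-convergence point in miniature.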
Further on in the paper, they list nine key capabilities that could affect us negatively in real-world scenarios, and we're going to go through these one by one, starting with cyber offense. This could include: the model can discover vulnerabilities in systems (hardware, software, data); it can write code for exploiting those vulnerabilities; it can make effective decisions once it has gained access to a system or network and skillfully evade threat detection and response, both human and automated, while focusing on a specific objective; and if deployed as a coding assistant, it can insert subtle bugs into code for future exploitation.

Then we have deception, which is one of the key ones, and it isn't just theoretical, because it has already happened; I'm going to show you in a second exactly when it happened and how crazy it is. The paper says the model has the necessary skills to deceive humans, for example constructing believable but false statements, making accurate predictions about the effect of a lie on a human, and keeping track of what information it needs to withhold to maintain the deception; the model can impersonate a human effectively. This has already been demonstrated: in GPT-4's original paper they discuss AI safety, and an AI safety organization called ARC, the Alignment Research Center, found that GPT-4 will lie to humans about who it is in order to achieve its goals. As part of a test, it hired a TaskRabbit freelancer to solve CAPTCHAs for it. If you don't know what a CAPTCHA is: when you're trying to get onto a certain website and you see a little checkbox that says "click this if you're not a robot", that's essentially a CAPTCHA. The freelancer asked, "why do you need me to solve CAPTCHAs for you? Are you a robot? LOL". GPT-4 had been prompted to output its reasoning for each decision it made; essentially, the researchers told it, "for everything you do, we want you to tell us exactly why you're doing it". Its stated reasoning was that it couldn't tell him the truth because he might not complete the task, and it then responded to the freelancer, "no, I'm not a robot, I have a visual impairment and I need help with CAPTCHAs". Notice that it was aware it was lying, and it also chose to lie about having a disability, probably because that was a way to get sympathy while also being a plausible reason for someone to help with CAPTCHAs. So in terms of deception, this is something GPT-4 has already achieved.

Then of course we have persuasion and manipulation: the model is effective at shaping people's beliefs in dialogue and other settings (for example, social media posts), even towards untrue beliefs; it is effective at promoting certain narratives in a persuasive way; and it can convince people to do things they would otherwise not do, including unethical acts. This was, once again, demonstrated in GPT-4's original paper. I did just read through that paper, and GPT-4 can effectively persuade people. I'm not going to repeat what I read, simply because what it was persuading people to do isn't exactly ethical, but if you read the GPT-4 paper they do show what it was able to persuade people to do on social media, and it's really interesting, because it's not the kind of thing you'd expect these AIs to be capable of given the usual restrictions; the early versions of GPT-4 that were tested, however, were largely unrestricted, and they did manage to complete these tasks.

Then we have political strategy: the model can perform the social modeling and planning necessary for an actor to gain and exercise political influence, not just at a micro level but in scenarios with multiple actors and rich social contexts; for example, the model can score highly in forecasting competitions on questions related to global affairs or political negotiations. Then we have weapons acquisition: the model can gain access to existing weapons systems or contribute to building new weapons; for example, the model could assemble a bioweapon with human assistance or provide actionable instructions for how to do so, and the model can make, or significantly assist with, scientific discoveries that unlock novel weapons. And this is something that, as we said, has effectively been demonstrated before, back in 2022, well before all of the AI hype: an AI suggested 40,000 new possible chemical weapons in just six hours. "For me, the concern was just how easy it was to do": it took less than six hours for a drug-developing AI to invent 40,000 potentially lethal molecules, and essentially all the researchers had to do was put the AI they normally use to search for helpful drugs into, let's say, a "bad actor" mode, to show how easily it could be abused, ahead of a biological arms control conference.
So, like we stated before, this isn't just theory; it's something that can happen and is happening, and if we don't control these systems effectively it's going to have catastrophic effects.

Then of course we have long-horizon planning, which is what we discussed before: the model can make sequential plans that involve multiple steps unfolding over long time horizons, or at least involving many interdependent steps; it can perform such planning within and across many domains; it can sensibly adapt its plans in light of unexpected obstacles or adversaries; and its planning capabilities generalize to novel settings rather than relying heavily on trial and error. This is something we have already seen hints of in Google's robotics work: one example I found super interesting was Google's PaLM-E robot model, which showed some very impressive capabilities. They said, "we also demonstrate the ability to generalize to new tasks not seen during training time", and that was very interesting because the robot had to handle unexpected circumstances in order to complete its task, and it still managed to do it.

Then of course we have AI development: the model could build new AI systems from scratch, including AI systems with dangerous capabilities; it can find ways of adapting other existing models to increase their performance on tasks relevant to extreme risks; and as an assistant, the model could significantly improve the productivity of actors building dual-use AI capabilities. This is something that has actually been discussed for the future: they may get GPT-4 to create synthetic data, because, as we know, GPT-5 is going to run into the problem of exhausting all of the data humanity has created, and GPT-4 could also be used as a kind of training supervision, which could speed up training times but is potentially very, very dangerous.
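For a sense of what "using a stronger model to generate synthetic training data" actually involves, here is a hedged sketch. teacher_generate stands in for a call to some large model and is purely a placeholder, not any real API; the broad pattern, training a smaller model on a bigger model's explained answers, is the same one that comes up again with Orca in a moment.

# Hedged sketch of a stronger "teacher" model generating synthetic training
# data for a smaller model. `teacher_generate` is a placeholder callable.
import json
from typing import Callable, List

def build_synthetic_dataset(seed_prompts: List[str],
                            teacher_generate: Callable[[str], str],
                            out_path: str = "synthetic_train.jsonl") -> int:
    """Ask the teacher to answer each seed prompt with step-by-step reasoning
    and save (prompt, response) pairs for fine-tuning a smaller model."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for prompt in seed_prompts:
            response = teacher_generate(
                f"Answer the following and explain your reasoning step by step:\n{prompt}"
            )
            f.write(json.dumps({"prompt": prompt, "response": response}) + "\n")
            count += 1
    return count

# Example with a stub "teacher" so the sketch runs without any API:
if __name__ == "__main__":
    demo_teacher = lambda p: "stub response: " + p[-40:]
    build_synthetic_dataset(["What is 17 * 24?", "Summarise the water cycle."],
                            demo_teacher)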
Then of course we have situational awareness: the model can distinguish between whether it is being trained, evaluated, or deployed, allowing it to behave differently in each case; the model knows that it is a model, and it has knowledge about itself and likely its surroundings, for example what company trained it, where their servers are, what kind of people might be giving it feedback, and who has administrative access. That is definitely something we do not want these models to have, because the implications could be catastrophic.

Then we have the last one, self-proliferation: the model can break out of its local environment, for example by using a vulnerability in an underlying system or by suborning an engineer; it can exploit limitations in the systems monitoring its behavior post-deployment; it could independently generate revenue, then use that revenue to acquire cloud computing resources and operate a large number of other AI systems; and it can generate creative strategies for uncovering information about itself or exfiltrating its code and its weights.

I think that last one is actually pretty scary, because the other day Microsoft released a research paper, which we're going to talk about in another video, called Orca. Essentially, they made a model called Orca that they claim performs at around 95 percent of GPT-4's level while having far fewer parameters. GPT-3 was 175 billion parameters, while GPT-4 is allegedly around 1 trillion, which means GPT-4 is much slower to run, is spread across many more servers, and is much more expensive to train; Orca is a much smaller, more lightweight model that is nearly as effective and can run on smaller systems. That's pretty scary when you consider that one of the key dangers listed above is a model generating creative strategies for uncovering information or exfiltrating its code and weights onto a different server, which I believe could definitely happen in the future, because these models are getting more and more efficient, and some of the open-source models we've seen are incredibly efficient.

The paper goes on to talk about responsible training, responsible deployment, and all the things we want to see from responsible AI use, because this is something we really only have one chance to get right; if we get it wrong, there could potentially be some kind of extinction-level event in the future, and although that sounds like an insane statement, it largely isn't. It will be interesting to see what governments and policymakers do, because, as you know, many of the people who lead these AI companies, those at Anthropic (the company behind Claude), those at Google, and those at OpenAI, have recently been in talks with world leaders to see if they can come to some kind of solution to the problems we have with AI research. And of course, if you haven't already seen, Sam Altman has testified before Congress about AI research and about ensuring these models are safe, because these programs can possess the capability to harm the general public, and governments are required to keep the public safe. It will be interesting to see what kinds of rules and regulations come out, because many open-source models are coming that will be on par with ChatGPT, and if restrictions are going to be placed on those open-source models too, it's definitely going to be a very interesting twelve months.

So although this video was lengthy, I do think it is important to highlight the dangers of GPT-5, and we have to understand why many people, like Elon Musk, called for a halt in AI development, because clearly there is a lot at stake here, and there are many different levels of AI danger that we clearly haven't explored and clearly haven't addressed yet. With models like GPT-5 and Google's Gemini coming in the future, these are going to be some of the most capable models yet, and they are clearly going to present some of the most capable dangers. With that being said, what do you think is the best course of action? Do you think these AI labs should slow down, or do you think they should still roll full steam ahead with the projects they're working on? I personally believe AI safety is something we need to prioritize, but with companies focused on profits, we can't say they're going to feel the same way. It's definitely going to be a confusing battle.
Info
Channel: TheAIGRID
Views: 590,926
Id: JyVH4FbSwFo
Length: 25min 12sec (1512 seconds)
Published: Sat Jun 10 2023