GPT-5 Is Now In Training (OpenAI GPT-5 Announcement)

Video Statistics and Information

Captions
So, ladies and gentlemen, we finally have another huge update on OpenAI's next large language model, or more likely multimodal model, and it seems the leaks were right. You can see it in this article: "OpenAI chief seeks new Microsoft funds to build superintelligence." This update is very interesting because, ladies and gentlemen, we are talking about GPT-5.

In this FT article, if we scroll down, you can see they talk about many different things, and there is a lot to unpack here, because there is a lot of information about GPT-5 that we discussed over the summer and that many of you will have missed. So make sure you watch the video until the end, because it isn't just today's announcements where they have talked about GPT-5; they have dropped many hints about exactly what GPT-5 is going to be, and by the end you will have a really good sense of it.

One of the things the article says is that the company is also working on GPT-5, the next generation of its AI model, Altman said, although he did not commit to a timeline for its release. It will require more data to train on, which Altman said would come from a combination of publicly available data sets on the internet as well as proprietary data from companies.

Before we talk more about GPT-5: of course Altman did not commit to a release timeline, because, as you know, with these AI models some things perform well and some things underperform. What I mean is that for the release of GPT-5 they will have to ensure a very smooth rollout, and they know it is going to be a massive hit, which means that if they set a specific date for this model and then miss it, there will be a lot of speculation. That is why we do not usually get release dates for these things.

For example, the robot you are seeing on screen now is called Neo, and OpenAI is backing the robotics firm behind it. What is likely to happen is that OpenAI's LLMs or vision models will be embedded into these robots. The only problem is that this company essentially said it would release the robot during the summer, and as of yet we have had no updates on this robot that is supposedly going to carry the next generation of large language models, AI systems embodied in physical robots. That is why we are seeing many companies decline to commit to timelines.

If you are wondering what the timeline for GPT-5 might be, I can offer some guidance based on what we had for GPT-4. In a video we released four months ago we talked about release dates, and when we looked at GPT-4's release we realized that its data collection was cut off in September 2021, it finished training in August 2022, and it was released in March 2023. So it was roughly a two-year cycle in which the data is collected, the training is finished, and then there is a release.

I am going to show you a clip from that video where I explain the exact timelines and why GPT-5 is not that far away based on current estimates, and this is not just my view; it is based on what Sam Altman said himself. In that video we spoke about them starting training around December, and I think that was very accurate. But this is where Sam Altman gives us a slight glimpse of when he plans to start training GPT-5, or at least start collecting the data for it, so take a look at this clip:
"What about you, Mr Altman, do you agree with that? Would you pause any further development for six months or longer?" "So, first of all, after we finished training GPT-4 we waited more than six months to deploy it. We are not currently training what will be GPT-5; we don't have plans to do it in the next six months. But I think the frame of the letter is wrong; what matters is audit..."

Remember, that was four months ago, so if the model does come out in mid-to-late 2025, I think that would be a pretty accurate prediction.

The article continues by stating that while GPT-5 is likely to be more sophisticated than its predecessors, Altman said it was technically hard to predict exactly what new capabilities and skills the model might have. Although this is careful wording by Sam Altman, I don't think it is entirely the case: we do know it is going to have an LLM base, and we do know the image capabilities are going to be upgraded. What we have seen from GPT-4 in its current state, as it has evolved, is already far more sophisticated than what we initially got with GPT-4. So what is likely happening is that Altman is saying they can't say exactly what new capabilities and skills it might have because, as you know, these things can change if certain features don't work out.

He says: "Until we go train that model, it's like a fun guessing game for us. We're trying to get better at it, because I think it's important from a safety perspective to predict the capabilities, but I can't tell you exactly what it's going to do that GPT-4 didn't." This is broadly true, because in previous interviews Sam Altman has talked about GPT-5 and what future models are going to be like, so I am going to show you a snippet from another video where Sam Altman talked about video for GPT-5. Take a look, because I think it is super interesting:

"There is a law here: we can predictably say this much compute, this big a neural network, this training data, and this will be the capabilities of the model. We can predict how it will score on some tests. What we're really interested in, which gets to the latter part of your question, is whether we can predict the qualitatively new things, the new capabilities that didn't exist at all in GPT-4 but that do exist in future versions like GPT-5. That seems important to figure out, but right now we can only say, here's how we predict it will do on this eval or that one. There are a lot of things about coding that I think make it a particularly great modality to train these models on, but that of course won't be the last thing we train on. I'm very excited to see what happens when we can really do video. There's a lot of video content in the world, and there are a lot of things that I think are much easier to learn from video than from text. There's a huge debate in the field about whether a language model can get all the way to AGI: can you represent everything you need to know in language, is language sufficient, or do you have to have video? I personally think it's a dumb question, because it probably is possible, but the fastest way to get there, the easiest way to get there, will be to have these other representations like video in these models as well. Text is not the best for everything, even if it's capable of representing everything."
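To make the "predictable scaling" part of that quote concrete: published scaling laws fit pretraining loss as a smooth function of parameter count and training tokens, which is what lets labs forecast a model before training it. Below is a minimal sketch in Python, assuming the parametric loss form from Hoffmann et al. (2022); the constants are taken from that paper's published fit and are only illustrative here, since OpenAI's internal fits are not public.

# A minimal sketch of scaling-law style prediction, assuming the parametric
# loss form from Hoffmann et al. (2022):
#     L(N, D) = E + A / N**alpha + B / D**beta
# The constants below follow that paper's published fit and are illustrative;
# they are not OpenAI's numbers.

def predicted_loss(n_params: float, n_tokens: float,
                   E: float = 1.69, A: float = 406.4, B: float = 410.7,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted pretraining loss for a model with n_params parameters
    trained on n_tokens tokens."""
    return E + A / n_params ** alpha + B / n_tokens ** beta

# Example: a GPT-3-scale run versus a hypothetical ten-times-larger one.
for n, d in [(175e9, 300e9), (1.75e12, 3e12)]:
    print(f"N={n:.0e}, D={d:.0e} -> predicted loss {predicted_loss(n, d):.3f}")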
So with that, based on the interviews and statements we currently have, we do know that GPT-5 is likely to have video. I do think video is one of the hardest things to do at the moment, because even companies completely focused on video, like Runway and Pika Labs, haven't completely solved the video challenge yet, although there have been some recent breakthroughs that we will discuss. I don't think video is yet at a level where a language model can take it as input. I do think video is possible, because GPT-4 can output MP4s in its current state, but the videos that come out aren't as good as they should be. Of course, as you know, AI is evolving at a very rapid rate, with tons of solid research papers appearing.

What was very interesting was that at the end of his developer keynote, Sam Altman made a statement that I think most people overlooked and I'm pretty sure most people missed. He stated that "what we launched today is going to look very quaint relative to what we're busy creating for you now", and I can't imagine he isn't talking about the next-generation large language model. I'm pretty sure they are working towards AGI with GPT-5, although that is a broad statement. With the capabilities we are now seeing in these large language models, and combining all of these models, and even looking at what users, not just Sam Altman and OpenAI, are doing; for example, if you saw our recent video on what people were able to do with the vision API, models can literally be hooked up to computers and control computers and phones and complete a lot of tasks effectively, with something like 70 to 80% reliability. If OpenAI manages to streamline that and get it working very effectively next year when we do have GPT-5, I think it is really going to have some AGI-like capabilities.

And the striking thing is that many of the large language models still being released today don't even match GPT-4's current capabilities, which means that by next year, if they do release GPT-5, or at least some parts of GPT-5, maybe a video version of DALL-E, it seems that OpenAI's lead is just going to get further and further away. What is even crazier is that OpenAI is reportedly trying to poach Google AI talent with pay packages of around $10 million as the race heats up, which essentially means OpenAI knows that if they can get GPT-5 off the ground successfully, it is going to be the large language model, or multimodal model, that completely decimates everything else. Since they are throwing so much money at this problem, they clearly believe that if they can solve it, they will be at the very top of what a technology company is able to do.

Now, with regard to reasoning and GPT-5's other technical abilities, I'm pretty sure you are about to be surprised by what we say next, because there is still a lot of information we want to include in this video. Depending on the number of modalities it is trained on, it could be larger or it could be smaller, but we do know that if GPT-5 were just text-based, the parameter count could be significantly smaller. That is because recent papers have shown that the quality of your data matters much more when training a large language model than simply increasing the parameter count. Let me show you some examples of a smaller parameter count being more effective than a larger one when you use high-quality data to train your large language model.
So if they are going to train GPT-5, we can refer to this paper recently released by Microsoft, "Textbooks Are All You Need". What they essentially state is that when they used high-quality data instead of low-quality data, they increased the effectiveness of their large language model roughly threefold, and the only thing they changed was the training data. This means that if OpenAI applies the three different methods we are about to talk about to the training data, we could see a 3x jump in the quality and responsiveness of GPT-5 even if the parameter count does not increase. I can summarize the phi-1 paper by saying that they trained a large language model with 1.3 billion parameters and it achieved results on par with, or better than, models of 16 billion and even 175 billion parameters, including GPT-3.5, with significantly fewer parameters. Which means GPT-5 does not need a huge number of parameters to be effective; what it needs is high-quality data.

Additionally, something else that is likely to be done with GPT-5 follows from another paper: show your working. OpenAI released a paper describing how they increased the ability of the raw version of GPT-4 just by using a different method. Essentially, they trained two reward models: one providing feedback on the final answer to a math problem, and another rewarding the intermediate reasoning steps. By rewarding good reasoning, the model achieved a surprising success rate of 78.2% on a math test, almost double the performance of GPT-4, and outperformed models that were only rewarded for correct answers. The approach of rewarding good reasoning steps extends beyond mathematics and shows promise in domains like calculus, chemistry and physics. The paper highlights the importance of alignment and process supervision: training models to produce a chain of thought endorsed by humans, which is considered safer than focusing solely on correct outcomes.

In other words, when you get these large language models to think step by step, they can roughly double their effectiveness simply through this chain-of-thought reasoning, which means the output you get is going to be about twice as good. GPT-5 is therefore likely to incorporate this into the way it produces answers: even if you put in a simple prompt, you won't need to say "let's think step by step"; it will have that thought process built in, and the output is going to be much better.
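A rough sketch of the outcome-supervision versus process-supervision idea described above: the outcome reward looks only at the final answer, while the process reward scores every intermediate reasoning step. The score_step function below is a hypothetical placeholder standing in for a trained process reward model; none of this is OpenAI's actual code, it just illustrates the distinction.

# A minimal sketch of outcome supervision vs. process supervision, assuming a
# hypothetical step scorer in place of a trained process reward model (PRM).
# Illustration of the idea only, not OpenAI's implementation.

from typing import List

def score_outcome(final_answer: str, reference: str) -> float:
    """Outcome reward: 1.0 only if the final answer matches the reference."""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0

def score_step(step: str) -> float:
    """Process reward: rate a single reasoning step in [0, 1].
    Placeholder heuristic; a real PRM would be a trained model."""
    return 0.0 if "unsupported leap" in step else 1.0

def process_reward(steps: List[str]) -> float:
    """Reward the whole chain of thought by its weakest step,
    so one bad intermediate step drags the solution down."""
    return min(score_step(s) for s in steps) if steps else 0.0

solution_steps = [
    "Let x be the number of apples, so 3x + 2 = 11.",
    "Subtract 2 from both sides: 3x = 9.",
    "Divide by 3: x = 3.",
]
print("outcome reward:", score_outcome("x = 3", "x = 3"))
print("process reward:", process_reward(solution_steps))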
Then, of course, we have another research paper that blows everything out of the water. As we said before, the way you prompt GPT-4, or any large language model, can improve its output by 2x or 3x, and with GPT-5 I do know they are trying to increase capability further. This paper, called Tree of Thoughts, increased GPT-4's reasoning ability on the tasks tested by as much as 900%, improving the base model just by changing the words you give it. Essentially, the paper describes tree-of-thoughts prompting: every time the model can make a decision, there are about five different candidate outcomes; the large language model is asked to rate each candidate from five, the best, down to one, the worst; then, for each decision it keeps, it continues that same process to the end, ranking all the different outputs and, by working through every promising path, arriving at the best output it can produce. This increased reasoning performance by around 900%. So if tree of thoughts is implemented in GPT-5, which it most likely will be, it could increase GPT-5's reasoning by a huge amount.
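Here is a minimal sketch of the tree-of-thoughts search pattern just described: propose several candidate next steps, rate each partial line of thought from 1 (worst) to 5 (best), keep only the top-rated branches, and repeat. The propose and rate functions are hypothetical stand-ins for language-model calls, so this shows the search structure rather than the paper's implementation.

# A minimal sketch of tree-of-thoughts style search, assuming hypothetical
# propose() and rate() functions standing in for language-model calls.
# It keeps the highest-rated partial solutions at each depth (a beam search
# over "thoughts"), which is the pattern the paper describes.

from typing import List

def propose(partial: List[str]) -> List[str]:
    """Hypothetical LLM call: return up to five candidate next steps."""
    return [f"step {len(partial) + 1}, option {i}" for i in range(1, 6)]

def rate(partial: List[str]) -> int:
    """Hypothetical LLM call: rate a partial solution from 1 (worst) to 5 (best)."""
    return 1 + hash(tuple(partial)) % 5   # placeholder for a model judgement

def tree_of_thoughts(depth: int = 3, beam_width: int = 2) -> List[str]:
    frontier: List[List[str]] = [[]]              # start from an empty line of thought
    for _ in range(depth):
        candidates = [p + [step] for p in frontier for step in propose(p)]
        candidates.sort(key=rate, reverse=True)   # best-rated thoughts first
        frontier = candidates[:beam_width]        # keep only the top branches
    return frontier[0]                            # best complete line of thought

print(tree_of_thoughts())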
So, along with the data and a very different training approach, the parameter size is of course hard to pin down, but I do think the quality of GPT-5 will be absolutely incredible. If we look at how smart GPT-5 is going to be, as we discussed earlier in the video, there are countless research papers, with new ones coming out every single week, showcasing ways to increase the effectiveness of large language models without changing the models themselves. We know that data and related improvements will increase the capabilities, but one thing we haven't talked about is how GPT-5 will actually perform. We know that GPT-4 was a huge leap over GPT-3.5 and that GPT-4 is outstanding: it was able to pass the bar exam and scored around 90% on various tests used as benchmarks for artificial intelligence. Knowing this, it is currently estimated that if GPT-5 succeeds in its ability to reason, think critically and incorporate this tree-of-thoughts style of thinking, it could theoretically score around 99% on pretty much every test there is. We know it is already great at math and already covers every subject there is; the remaining work is essentially to fine-tune everything in one last cycle, which is why many people think GPT-5 will be very close to AGI. In addition, remember that GPT-5 will have images embedded in it, and we know the performance of GPT-4 increased greatly when vision was added: many of the exam questions GPT-4 took were given both with and without vision, so some included diagrams and some didn't, and where it could see the diagrams, GPT-4 with vision improved significantly.

Then, of course, we need to talk about the limitations that GPT-5 will inherit, because although GPT-5 is going to be absolutely insane when you consider everything we've discussed, from larger context windows to image and audio and new ways of thinking and prompting, GPT-4 and GPT-3.5 still struggle with the most basic concepts. You might think that is a ridiculous statement, but look at this TED Talk, which documents how AI can be incredibly smart and also shockingly foolish because it cannot understand basic concepts. Let the video explain it, because it will do a better job. All it shows is a simple common-sense question, one that doesn't take a genius to answer, but that GPT-4 continually gets wrong:

"AI is passing the bar exam. Does that mean AI is robust at common sense? You might assume so, but you never know. Suppose I left five clothes to dry out in the sun, and it took them five hours to dry completely. How long would it take to dry 30 clothes? GPT-4, the newest, greatest AI system, says 30 hours. Not good. A different one: I have a 12-liter jug and a 6-liter jug, and I want to measure 6 liters. How do I do it? Just use the 6-liter jug, right? GPT-4 spits out some very elaborate nonsense: step one, fill the 6-liter jug; step two, pour the water from the 6-liter jug into the 12-liter jug; step three, fill the 6-liter jug again; step four, very carefully pour the water from the 6-liter jug into the 12-liter jug; and finally, you have 6 liters of water in the 6-liter jug, which should be empty by now."

So with that, it will be interesting to see how they solve this issue. I am not aware of any solutions just yet, but it will be interesting to see whether they even focus on it, because we largely gloss over these problems and just focus on the exciting stuff.

Next, now that you know we don't really understand exactly what AI is doing, we also need to talk about emergent capabilities. This is something we have spoken about in the past, but you have to understand that GPT-5 is likely to be a few echelons better than GPT-4, which means that even if the parameter count is the same, a few emergent capabilities are going to appear in GPT-5 that we simply cannot predict, just as they appeared in GPT-4. One of GPT-4's most notable emergent capabilities was theory of mind, which is essentially an AI being able to reason about how other people are thinking in certain situations. This is particularly worrying if you consider how an AI could potentially manipulate humans into doing things for it, because these large language models have access to almost every piece of text on Earth, and that includes books on persuasion and manipulation tactics. Now take a look at this clip, which explains emergent capabilities perfectly. You've likely seen it before, but for those of you who haven't, you really need to understand it, because this emergent-capabilities phenomenon is likely to be one of the reasons GPT-5 doesn't ship on time: if a new capability emerges, the researchers at OpenAI will need to learn how to contain it effectively, or potentially remove it.

"Some people use the metaphor that AI is like electricity, but if I pump even more electricity through the system, it doesn't pop out some other emergent intelligence, some capacity that wasn't even there before. So with a lot of the metaphors we're using, paradigmatically, we have to understand what's different about this new class of golem-class generative large language model AIs. This is one of the really surprising things about talking to the experts, because they will say these models have capabilities, and we do not understand how they show up, when they show up, or why they show up. Again, not something you would say of the old class of AI. So here's an example. These are two different models, GPT and a different model by Google, and there's no difference in the models; they just increase in parameter size, that is, they just get bigger." "What are parameters?" "It's essentially just the number of weights in a matrix, so it's just the size; you're just increasing the scale of the thing. And what you see here, and I'll move on to some other examples that might be a little easier to understand, is that you ask these AIs to do arithmetic and they can't do it, and they can't do it, and they can't do it, and at some point, boom, they just gain the ability to do arithmetic. No one can actually predict when that will happen."
"Here's another example. You train these models on all of the internet, so they have seen many different languages, but then you only train them to answer questions in English, so they have learned how to answer questions in English. But you increase the model size, and you increase the model size, and at some point, boom, they start being able to do question-and-answer in Persian. No one knows why."

Now, we have already seen that clip on emergent capabilities, but I think this next example will also show you why AI has these emergent capabilities. You have to understand that although we can see the outputs of artificial intelligence models like GPT-4 and, soon, GPT-5, we still don't know what they are actually doing internally. Take a look at this tweet. This person tweeted: "These engineers never speak a word or document anything; their results are bizarre and inhuman. This guy trained a tiny transformer to do addition, then spent weeks figuring out what it was doing, one of the only times in history someone has understood how a transformer works." And transformers are essentially the building blocks of these large language models. Then you can see here the algorithm it created to add two numbers: a long, surprisingly complex calculation just to add two simple numbers, which is pretty crazy if you ask me, and it means these AIs think completely differently from us. The example shows that this artificial intelligence represented basic addition as rotation around a circle (sketched below), which goes to show that although it might tell us an answer, it doesn't tell us how it got there. That is what's so scary about AI: we would never have guessed that it was thinking about rotations around a circle when performing simple addition, but it is. It means we need to ensure these artificial intelligences are completely aligned, because if you release something like that to the public, the risks could be ex...
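For what it's worth, the "rotation around a circle" algorithm can be reproduced with ordinary arithmetic: encode each number n as the angle 2*pi*n/p on the unit circle, compose the two rotations by multiplying the complex numbers, and read the resulting angle back off; that recovers addition modulo p. A minimal sketch, with p = 113 chosen purely as an example modulus rather than taken from the tweet:

# Addition mod p expressed as rotation around a circle: encode each number as
# an angle, compose the two rotations, and decode the resulting angle.
# This mirrors the Fourier/rotation style of algorithm recovered from the tiny
# transformer trained on modular addition; p = 113 is just an example modulus.

import cmath, math

P = 113

def encode(n: int) -> complex:
    """Map n to a point on the unit circle at angle 2*pi*n/P."""
    return cmath.exp(2j * math.pi * n / P)

def add_by_rotation(a: int, b: int) -> int:
    """Compose the two rotations, then read the angle back off as a number."""
    angle = cmath.phase(encode(a) * encode(b))   # angles add when you multiply
    return round(angle * P / (2 * math.pi)) % P

# Check the rotation trick agrees with ordinary modular addition everywhere.
assert all(add_by_rotation(a, b) == (a + b) % P for a in range(P) for b in range(P))
print(add_by_rotation(100, 50), (100 + 50) % P)   # both print 37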
Info
Channel: TheAIGRID
Views: 59,458
Id: Dv4yBH5dwtY
Length: 21min 43sec (1303 seconds)
Published: Tue Nov 14 2023