OpenAI's New SECRET "GPT2" Model SHOCKS Everyone (OpenAI New gpt2 chatbot)

Captions

So, a new model has recently been released by OpenAI on the Chatbot Arena, and many people are speculating that it could be GPT-4.5 or even GPT-5 itself. I'm going to get into exactly what it is, because it has a lot of people wondering about OpenAI's next model release.

First, if you don't know it, let me explain the Chatbot Arena. Right here we're on the Chatbot Arena web page, a site where you can test chatbots against each other. You can enter any question, for example "Why is AGI so dangerous?", and when I click Send, two different AI systems respond to that query so I can compare them side by side and see which one is better. It's a blind test, and this is the website many people are using to find out what this new gpt2 model can do. We read both answers and then vote for the one that was better. Here, one response listed nine points, which was pretty good, and the other listed ten; I think the second was maybe a little better, but I'll call this pretty much a tie and click Tie. The site then reveals which models we were talking to: model A was Llama 3 70B, and model B was GPT-4.

I'm showing you this so you understand what's going on. Sometimes companies quietly release models on this website to see how they stack up on the leaderboard, because every vote feeds into the leaderboard's Elo ratings. We can see that GPT-4 is currently quite high. But something interesting happened a few days ago.
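
As a rough illustration of the Elo ratings mentioned above, here is a minimal Python sketch of the classic Elo update for a single blind pairwise vote (win = 1, tie = 0.5, loss = 0). The K-factor of 32, the starting rating of 1000, and the function and model names are illustrative assumptions; the Chatbot Arena's actual leaderboard computation may differ from this simple online update.

```python
from typing import Dict, Iterable, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the Elo model (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, score_a: float,
               k: float = 32.0) -> Tuple[float, float]:
    """Return updated (r_a, r_b) after one vote.

    score_a is 1.0 if A won, 0.5 for a tie, 0.0 if B won.
    K = 32 is an assumed, illustrative step size.
    """
    delta = k * (score_a - expected_score(r_a, r_b))
    # Zero-sum: whatever A gains, B loses.
    return r_a + delta, r_b - delta


def apply_votes(ratings: Dict[str, float],
                votes: Iterable[Tuple[str, str, float]],
                k: float = 32.0) -> Dict[str, float]:
    """Apply votes in order, mutating `ratings`; unseen models start at 1000 (assumption)."""
    for model_a, model_b, score_a in votes:
        r_a = ratings.get(model_a, 1000.0)
        r_b = ratings.get(model_b, 1000.0)
        ratings[model_a], ratings[model_b] = elo_update(r_a, r_b, score_a, k)
    return ratings


if __name__ == "__main__":
    # Hypothetical votes: a tie like the Llama 3 70B vs GPT-4 one in the video,
    # then an invented gpt2-chatbot win for illustration.
    votes = [("llama-3-70b", "gpt-4", 0.5), ("gpt2-chatbot", "gpt-4", 1.0)]
    print(apply_votes({}, votes))
```

With these votes, the tie between two equally rated models leaves both at 1000, and the win then moves gpt2-chatbot to 1016 and gpt-4 to 984. An upset against a higher-rated model moves ratings further than an expected win, which is why a strong newcomer can climb quickly.
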
Here's what we had: someone posted on Reddit about the gpt2-chatbot on the LMSYS Chatbot Arena, saying, "I was using the arena, as you do, and got a really good response from something called gpt2-chatbot. The other model was GPT-4-0125, and the gpt2 one was at least as good. Does anyone know what this was? It doesn't seem to be manually selectable in head-to-head." This post is what initially sparked the speculation about the gpt2 model, and it was the first time we saw it in the wild. Many of the Reddit comments said the model was really good: at least as good as GPT-4, and in some cases even better.

It's important to note that this is not the GPT-2 model released in February 2019, the second model in the Generative Pre-trained Transformer series, which preceded GPT-3, which in turn preceded GPT-3.5 and then GPT-4, the model we use now. In other words, this is not an old model, and I'll get into the additional information that shows why. You can see right here that some people have been saying, "GPT-4.5 probably, and it comes with an eight-prompt daily limit. Have fun, it's really good." Essentially, a model called gpt2-chatbot has been appearing on this website and beating other AI systems, mainly GPT-4 but also other state-of-the-art models like Claude 3 Opus, on reasoning and coding tasks. As I said before, many people are now speculating about what this model really is. Some say it could be GPT-4.5; others say it could be a GPT-4 "lite", because when you ask the model what it is, it says it's based on the GPT-4 architecture, a type of LLM developed by OpenAI, which is a pretty standard response. Now, we do know that this
is actually from OpenAI, because there have been some tweets that confirm the speculation.

But the most interesting thing about it is, of course, what this model can do, so I'm going to walk you through some of its abilities, which are quite impressive considering its stealth release. Someone posted this on Reddit (posted by kala DEA), showing a question he asked many of the state-of-the-art models. In the top right is the question, "How many characters are in this message?", and the correct answer, highlighted here, is 40. Llama 3 got it wrong, saying 34 characters; Mistral Large got it wrong, saying 59; Claude 3 Opus got it wrong, saying 38; and the latest GPT-4 got it wrong, saying 43. Apparently the only model that got it right was gpt2-chatbot, which took a different approach to reasoning and arrived at 40. I think the most interesting thing here is how different its reasoning steps were. Llama 3 and the other models answered straight away, whereas this one worked step by step, much like chain-of-thought prompting, before giving its answer, and it came out ahead of every other model on this test. People have been running other tests too. I want to be clear that I didn't run this one myself, but there are other capabilities worth talking about, such as the apple test. There are many variations of it, but basically it's a simple reasoning test that asks: Today, Tommy has two
apples. Yesterday, he ate one apple. How many apples does Tommy have? This question is a bit tricky for large language models because they get confused by the fact that he has two apples and ate one, so they compute 2 - 1 = 1, which is incorrect. The trick is that he has two apples today and the apple he ate was eaten yesterday, so he still has two: eating something yesterday doesn't change the amount you have now. Many models got this wrong. You can see this answer says Tommy has two apples today, which is correct, and I'm guessing the model that got it wrong was probably GPT-4 or something like that. I did try this myself as well. There are many different tests you can run, but one of the most interesting things I saw was a claim from Demetrius that gpt2-chatbot solved this but used completely wrong reasoning ("filler tokens help, I guess"). Someone else asked whether GPT-4 Turbo solves it, and he replied that neither Claude 3 Opus nor GPT-4 Turbo 2024-04-09, both at zero temperature, can solve it. So there does seem to be some increased capability in this model, along with a huge amount of speculation.

There was also one example I found even more interesting, where someone asked GPT-4 Turbo and gpt2-chatbot to make a game using JavaScript in a single HTML document. Here are the results. The first one is GPT-4 Turbo's: you can see a game, but it pretty much doesn't work. It has a score and a timer, but unfortunately nothing happens in the game. Then here you can see a Space Invaders-style game where you collect points, which gpt2-chatbot apparently coded from scratch. This potentially
shows a level of complexity in gpt2-chatbot that surpasses GPT-4 Turbo.

Now, here's the thing: many people are speculating about whether this model is even real, and there's a lot people could say, but what really put the nail in the coffin was this tweet from Sam Altman. As you know, he is the CEO of OpenAI and the person most likely to know when something changes, so him tweeting "I do have a soft spot for gpt2" is rather important. Why is it even more important than we might think? Because of how the tweet changed: he first wrote "gpt-2" with a dash and then changed it to "gpt2". That's a substantial difference, because "GPT-2" with a dash refers to the original GPT-2 model from 2019, which is very different from what is being tested now. Removing the dash suggests he isn't talking about the older model; he's referring to the model currently on the Chatbot Arena, which, as you know, is called gpt2-chatbot. You can also see that if you go to the Chatbot Arena and scroll down to gpt2-chatbot, it currently sits between GPT-4 Turbo and GPT-3.5 Turbo.

I've also tested this myself, and I'm going to show you the tests I used, because I wanted to see how good it is. First, I wanted to test its reasoning ability. I saw a question like this one on Twitter, and it does get the question right, but
one of the things I also wanted to test was whether it could write code that works straight away. I simply asked it to write a script I could use in TradingView to trade based on the RSI, take profits, and sell on that signal, and I compared it against Gemini and Claude 3 Opus. To be honest, in this test gpt2-chatbot's code didn't work, and I'll show you that. But this doesn't mean the chatbot isn't good, because one simple test doesn't give you enough data points to judge whether a model is good. If you open TradingView and go to the Pine Editor on any chart, you can paste in some code, click Save, and add it to the chart; usually there will either be an error or there won't. When I paste in the code from gpt2-chatbot, click Save, and then click Update on the chart, you can see there's an error right here. But when I take Claude 3's code, replace the script with it, and click Update on the chart, you can see that the code works in TradingView. I just wanted to see whether the code runs, and in that one test, gpt2-chatbot's code didn't while Claude's did.

Another fascinating thing many people have been talking about is ASCII art. ASCII art isn't really a benchmark you can use to decide whether an AI system is good, but it does serve as an interesting indicator of how AI systems perform in this area, and more advanced systems usually do much better. What we can see is that gpt2-chatbot does a lot better on certain ASCII art. Still, the point I'm trying to make is that currently it seems there might
be some increase in ability compared to GPT-4 Turbo. There was also this demo where someone stated, "gpt2-chatbot is insane at ASCII art, miles ahead of any other model," but there was a small caveat. Initially, the user asks both models to create an ASCII art unicorn: Llama 3's is pretty terrible, while gpt2-chatbot's is pretty incredible. However, someone else realized that the ASCII art is copied one-to-one from the internet, and it seems the gpt2 model is simply better at recalling its training data. You can see here that this comes from an ASCII art archive where the exact same robot appears, so it seems gpt2-chatbot reproduced it verbatim.

So what we currently have is a mysterious chatbot that has popped up, and with Sam Altman saying he has a soft spot for gpt2, it seems likely that we are looking at another OpenAI model. The thing is, the differences in ability don't look like a GPT-5-level jump. If it were that good, it should have gotten my question right. Of course, that's just my personal experience, but as others have said, if this is GPT-4.5 or GPT-5, it would be a little disappointing, because it doesn't seem to be a huge leap in reasoning capability. Maybe it is, and maybe I'm completely wrong, because this whole thing is very strange and feels very much like a deliberate leak; as others have pointed out, one of the most important companies in the world is essentially leaking things, with its CEO tweeting "I do have a soft spot for gpt2." The point is that it's currently open to speculation. Like many people, I think it's probably just a differently fine-tuned version of GPT-4, maybe a less lobotomized version of it, or maybe one trained in a different way, but I'm not guessing that this is GPT-4.5. It is pretty strange,
though, that if it were GPT-4.5 they wouldn't just name it that, or call it a GPT-4 preview as they have done on the leaderboards. Why would they name it gpt2, which is bound to throw people off? I have to say that's pretty weird. My best bet is that this gpt2 model is some kind of test OpenAI is running to see where one of its models lands on benchmarks. Like I said before, if it were GPT-4.5, I think they would have just named it that, or maybe made an announcement. Either way, they are doing some testing, and it would be really nice to see what benchmarks have been run, because right now all we have are a few echoes on Twitter, with some people even posting on 4chan that they'd done some testing: "Completely invented model name, no info about it anywhere, absurdly good output, feels like an improved GPT-4, not visible in the leaderboard, not visible in the API, results unaffected by other models, possibly a peculiar rate limit." One current theory is that it may be GPT-4.5 or GPT-5.

Here's another example of gpt2-chatbot getting a reasoning question wrong, with GPT-4 Turbo, Gemini Ultra, and Claude 3 Opus also failing to get it right. One thing I would like to see is this model evaluated on actual benchmarks like MMLU, although MMLU isn't the best benchmark because of inaccuracies in it. Right now it could be that our own biases are getting in the way, making us think this model is completely superior when it actually isn't. The problem is that there's a peculiar rate limit of only eight messages, so it's pretty hard to test: once you've sent eight messages, you get locked out. So let me know what you think this is. I do think this is just kind of
an OpenAI test. I don't really think this is an entirely new model; whatever it is, I think it's probably some version of GPT-4 that we might see in the future. But if it were, why wouldn't they just say so? It's definitely a curveball, and if any new theories or other information come up, I'll leave them in the comment section below so you're all kept updated.
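
The TradingView test described above asked each model for a script that trades on the RSI (Relative Strength Index). The video does not show the prompt or any model's code, so the following is only a minimal Python sketch of what such logic might look like, not a reproduction of any model's output, and it cannot be pasted into TradingView (Pine Script has its own built-in `ta.rsi`). It computes RSI with Wilder's smoothing and applies a hypothetical long-only rule: buy when RSI drops below 30, sell when it rises above 70. The 14-bar period and the 30/70 thresholds are common defaults assumed here, and the function names are illustrative.

```python
from typing import List, Optional, Tuple


def _rsi_from_averages(avg_gain: float, avg_loss: float) -> float:
    """Convert average gain/loss into an RSI value on a 0-100 scale."""
    if avg_loss == 0:
        return 100.0  # no losses in the window: maximally overbought
    if avg_gain == 0:
        return 0.0  # no gains in the window: maximally oversold
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def wilder_rsi(closes: List[float], period: int = 14) -> List[Optional[float]]:
    """RSI per bar using Wilder's smoothing; None until `period` price changes exist."""
    if period < 1:
        raise ValueError("period must be at least 1")
    rsi: List[Optional[float]] = [None] * len(closes)
    if len(closes) <= period:
        return rsi
    # gains[i] / losses[i] describe the move from closes[i] to closes[i + 1].
    gains = [max(closes[i + 1] - closes[i], 0.0) for i in range(len(closes) - 1)]
    losses = [max(closes[i] - closes[i + 1], 0.0) for i in range(len(closes) - 1)]
    # Seed with a simple average of the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    rsi[period] = _rsi_from_averages(avg_gain, avg_loss)
    # Then Wilder's smoothing: new_avg = (prev_avg * (period - 1) + current) / period.
    for bar in range(period + 1, len(closes)):
        avg_gain = (avg_gain * (period - 1) + gains[bar - 1]) / period
        avg_loss = (avg_loss * (period - 1) + losses[bar - 1]) / period
        rsi[bar] = _rsi_from_averages(avg_gain, avg_loss)
    return rsi


def rsi_signals(closes: List[float], period: int = 14,
                oversold: float = 30.0,
                overbought: float = 70.0) -> List[Tuple[int, str]]:
    """Hypothetical long-only rule: buy below `oversold`, sell above `overbought`."""
    signals: List[Tuple[int, str]] = []
    in_position = False
    for bar, value in enumerate(wilder_rsi(closes, period)):
        if value is None:
            continue  # not enough history yet
        if not in_position and value < oversold:
            signals.append((bar, "buy"))
            in_position = True
        elif in_position and value > overbought:
            signals.append((bar, "sell"))
            in_position = False
    return signals


if __name__ == "__main__":
    prices = [10.0, 9.0, 8.0, 7.0, 8.0, 9.0, 10.0]  # toy data, not market data
    print(wilder_rsi(prices, period=2))
    print(rsi_signals(prices, period=2))
```

With the toy prices and a 2-bar period, RSI is 0.0 at bars 2 and 3 (only losses so far), so the rule buys at bar 2; RSI then climbs to 50.0, 75.0, and 87.5, and the rule sells at bar 5. A real strategy would also need position sizing and an explicit take-profit, which this sketch leaves out.
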

Info

Channel: TheAIGRID

Views: 46,768

Id: eOS1qeU4Cfs

Length: 16min 17sec (977 seconds)

Published: Tue Apr 30 2024