Elon Musk FINALLY Introduces GROK 1.5 - XAI Grok 1.5 MASSIVE UPDATE!

Video Statistics and Information

Captions
So ladies and gentlemen, it finally happened: we got an update on Grok. They've been shipping a lot of updates recently, so this is actually quite a surprise, because just last week they announced they were open-sourcing Grok-1. You can see right here, on March 28th, 2024, it says: announcing Grok-1.5. Grok-1.5 comes with improved reasoning capabilities and a context length of 128,000 tokens, available on X (twitter.com). Now this is rather surprising considering that, like I said before, they just went open source, and the jump to 128,000 tokens suggests there may have been some breakthrough inside the industry that many labs are starting to pick up on.

It says here: introducing Grok-1.5, our latest model, capable of long-context understanding and advanced reasoning. Grok-1.5 will be available to our early testers and existing Grok users on the X platform in the coming days. By releasing the model weights and network architecture of Grok-1 two weeks ago, we presented a glimpse into the progress xAI had made up until last November. Since then, we have improved reasoning and problem-solving capabilities in our latest model, Grok-1.5. So as you can all see, they basically wanted to improve the capabilities and reasoning of this model.

It also says, and I'm just going to zoom in here: one of the most notable improvements in Grok-1.5 is its performance in coding and math-related tasks. In our tests, Grok-1.5 achieved a 50.6% score on the MATH benchmark and a 90% score on the GSM8K benchmark, two benchmarks covering a wide range of grade-school to high-school competition problems. Additionally, it scored 74.1% on the HumanEval benchmark, which evaluates code generation and problem-solving abilities.

Now, you can see right here that this is rather fascinating, because there are so many benchmarks they compared this to. We've got Grok-1 and then Grok-1.5, and we can see that on MMLU it went up to around 81.3%, which is actually pretty decent. We can also see that on the MATH benchmark they pretty much doubled the capability, which is really nice. On GSM8K, the grade-school math questions, they also went up to 90%, which is very good, and on HumanEval they managed to get up to 74.1%.
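As a quick sanity check on the "roughly doubled" and "around eight points" observations, here is a minimal Python sketch that tabulates the figures quoted in the announcement and works out the point gains. The Grok-1.5 column comes from the post discussed here; the Grok-1 column is taken from xAI's earlier Grok-1 reporting, so treat both as reported figures rather than independently verified numbers.

    # Benchmark scores (in %) as reported for Grok-1 and Grok-1.5.
    scores = {
        #            Grok-1  Grok-1.5
        "MMLU":      (73.0,  81.3),
        "MATH":      (23.9,  50.6),
        "GSM8K":     (62.9,  90.0),
        "HumanEval": (63.2,  74.1),
    }

    for bench, (old, new) in scores.items():
        gain = new - old
        print(f"{bench:<10} {old:5.1f}% -> {new:5.1f}%  (+{gain:.1f} points)")

On those figures, MATH roughly doubles (23.9% to 50.6%) and MMLU gains a little over eight points, which matches what is described above.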
So I think this is fascinating for two reasons. Number one, if xAI is now going to be open-sourcing their architecture, then I guess the benchmarks we compare it against are going to be different from some of the industry standards. Now, one thing I've recently been thinking about and discussing is that, while these benchmarks for AI systems are pretty impressive, I think one of the things most people aren't realizing is that all of these AI systems are essentially going to be products, and I think one of the key differences that separates GPT-4 from even some of the better systems, like Claude 3 Opus, is the fact that it is actually a good product. And while Grok-1 is really cool, the point I'm trying to make here, guys, is that you can have a really comprehensive system, but the question is: what is going to be xAI's product? Are they going to productize the Grok-1 model, productize Grok-1.5, or are they going to be a completely open-source company where they just open-source all the models?

Like I said, if they decide to keep open-sourcing all their models and deviate from the way other companies have gone, then if we compare them to the other open-source companies, we can see that they're actually better than some of them. Compared to Mistral Large, for example, on GSM8K Grok-1.5 actually does outperform it, which is pretty crazy considering that this is now arguably mainly an open-source company. I'm not sure if they're going to open-source this one, but I think it's going to be really interesting.

Now, what's also crazy is this: one thing I want you all to understand about this company is that, yes, Grok-1.5 is trailing behind Gemini 1.5 Pro, GPT-4, and Claude 3 Opus, but you have to understand that xAI is a rather small team, much smaller than some of these large companies. Claude 3 is funded by Anthropic, and recently, I think it was today, they literally got $2.7 billion invested in their company again; they've had a lot of investors and billions of dollars in funding rounds. GPT-4 is from OpenAI, which had $10 billion from Microsoft, which is absolutely incredible. We know that Gemini 1.5 Pro is from Google, and Google is literally a trillion-dollar company. So these models are from billion-dollar, even trillion-dollar, companies, and Grok-1.5 is competing with them. I wouldn't say destroying them, because that goes a bit too far, but it's definitely on par with some of the other open-source models. And not just that, guys, you have to think about the speed at which they've managed to do this: from Elon Musk's announcement until now, I think it's been about nine months to a year, and in that time they've come quite a long way. So it will be interesting to see how this goes. And with the recent product releases, there was something I saw the other day, I'm not sure if you're all clued up on it, but it kind of showed me that these AI systems are going to keep getting new releases, and many different companies are going to be releasing different things.

Now, in addition, we also got long-context understanding. It states here that a new feature in Grok-1.5 is the capability to process long contexts of up to 128,000 tokens within its context window. This gives Grok an increased memory capacity of up to 16 times the previous context length (Grok-1's context window was 8,192 tokens, and 16 times that is roughly 128K), enabling it to utilize information from substantially longer documents. This is something that is pretty incredible. You can see right here that the accuracy is pretty much 100%, and that holds across the board, and like I said, it's pretty impressive that they've been able to catch up to what some of these other companies are doing; 128,000 tokens is going to provide a lot more utility. It also says that, furthermore, the model can handle longer and more complex prompts while still maintaining its instruction-following capability as its context window expands. In the needle-in-a-haystack evaluation, Grok-1.5 demonstrated powerful retrieval capabilities for embedded text within contexts of up to 128K tokens, achieving perfect retrieval results. I find that very fascinating, because they didn't say near-perfect retrieval results, they said perfect retrieval results.
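For context on what that evaluation actually measures, here is a minimal sketch of a needle-in-a-haystack style test: a short "needle" fact is buried at varying depths inside a long filler document, and the model is asked to retrieve it. This is a generic illustration, not xAI's harness; the query_model function is a hypothetical placeholder for whatever model API you would wire up.

    NEEDLE = "The magic number for this document is 48151623."
    QUESTION = "What is the magic number mentioned in the document?"
    FILLER_SENTENCE = "The sky was a calm shade of blue over the quiet city. "

    def query_model(prompt: str) -> str:
        """Hypothetical placeholder for a real LLM API call."""
        raise NotImplementedError("wire this up to the model you want to test")

    def build_haystack(total_chars: int, needle_position: float) -> str:
        """Build a long filler document with the needle inserted at a relative depth (0.0-1.0)."""
        filler = (FILLER_SENTENCE * (total_chars // len(FILLER_SENTENCE) + 1))[:total_chars]
        cut = int(len(filler) * needle_position)
        return filler[:cut] + " " + NEEDLE + " " + filler[cut:]

    def run_eval(context_sizes, depths):
        """Check whether the model recalls the needle at each context size and depth."""
        for size in context_sizes:
            for depth in depths:
                prompt = build_haystack(size, depth) + "\n\n" + QUESTION
                answer = query_model(prompt)
                found = "48151623" in answer
                print(f"context~{size} chars, depth={depth:.0%}: {'found' if found else 'missed'}")

    # Example sweep over a few document lengths and insertion depths:
    # run_eval(context_sizes=[10_000, 100_000, 400_000], depths=[0.1, 0.5, 0.9])

"Perfect retrieval" in this framing just means the needle is recovered at every tested length and depth; the colour charts usually shown for this test are that grid plotted as a heatmap.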
That could mean they have something even more advanced going on, which is pretty interesting. Now, in addition, the post goes on to describe the Grok-1.5 infrastructure. They state that cutting-edge LLM research like Grok-1.5 runs on massive GPU clusters and demands robust and flexible infrastructure. Grok-1.5 is built on a custom distributed training framework based on JAX, Rust, and Kubernetes, and this training stack enables their team to prototype ideas and train architectures at scale with minimal effort. A major challenge of training LLMs on large compute clusters is maximizing the reliability and uptime of the training job, and their custom training orchestrator ensures that problematic nodes are automatically detected and ejected from the training job. They also optimize checkpointing, data loading, and training job restarts to minimize downtime in the event of a failure (a rough sketch of what that checkpoint-and-restart pattern might look like follows after the transcript). And if working on their training stack sounds interesting to you, you can apply to join the team. So it seems they've built a really efficient infrastructure for training and deploying these models, and you could potentially join the team if that's something you're interested in.

It also says: looking ahead, Grok-1.5 will soon be available to early testers, and we look forward to receiving your feedback to help us improve Grok; as we gradually roll out Grok-1.5 to our wider audience, we're excited to introduce several new features over the coming days. So right here you can see they said they're excited to introduce several new features over the coming days, and I'm not sure if they're just talking about the standard Grok or if that includes some additional features.

Now, the only thing I would say annoys me about xAI and their Grok model is that it isn't easily accessible. You have to subscribe to Premium, and I've even subscribed to Premium, which basically means you need to get verified on Twitter, which is, I think, $5 or $10 a month. The problem is that if you're in the UK or in certain other countries, you just don't have access, so it's pretty frustrating right now to not have access to the model even though you're paying for Premium. I understand there's a waitlist, but it's still pretty annoying not to be able to test out an LLM backed by Elon Musk; he's got a huge fan base, and people always want to know what his companies are up to. So I think if Elon Musk just put this on a different website (I know he's pretty much obsessed with x.com and driving traffic and revenue there), increased accessibility would be good for the long term.

Now, let me know what you think about this model. Do you think it's really good? Do you think it's really cool? How are you going to be using it, if you are? Let me know what you think, and either way, I'll see you in the next one.
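As flagged above, here is a minimal sketch of the checkpoint-and-restart pattern the infrastructure section describes: save training state periodically, and on restart resume from the latest checkpoint instead of starting over. This is a generic illustration in plain Python, not xAI's actual JAX/Rust/Kubernetes stack; the train_step function and the checkpoint format are placeholders, and the real system layers node health checks and automatic ejection of bad hosts on top of this at the orchestration level.

    import os
    import pickle

    CKPT_DIR = "checkpoints"
    CKPT_EVERY = 100  # save a checkpoint every 100 steps

    def train_step(state):
        """Hypothetical placeholder for one optimizer step; returns the updated state."""
        state["step"] += 1
        return state

    def save_checkpoint(state):
        os.makedirs(CKPT_DIR, exist_ok=True)
        path = os.path.join(CKPT_DIR, f"step_{state['step']:08d}.pkl")
        with open(path, "wb") as f:
            pickle.dump(state, f)

    def load_latest_checkpoint():
        """Return the most recent checkpoint, or a fresh state if none exists."""
        if not os.path.isdir(CKPT_DIR):
            return {"step": 0}
        ckpts = sorted(os.listdir(CKPT_DIR))  # zero-padded names sort by step
        if not ckpts:
            return {"step": 0}
        with open(os.path.join(CKPT_DIR, ckpts[-1]), "rb") as f:
            return pickle.load(f)

    def train(total_steps):
        # On (re)start, resume from wherever the last run got to.
        state = load_latest_checkpoint()
        while state["step"] < total_steps:
            state = train_step(state)
            if state["step"] % CKPT_EVERY == 0:
                save_checkpoint(state)

    if __name__ == "__main__":
        train(total_steps=1_000)

The design point is simply that a crashed or ejected job loses at most CKPT_EVERY steps of work rather than the whole run, which is what "minimizing downtime in the event of a failure" amounts to in practice.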
Info
Channel: TheAIGRID
Views: 53,942
Id: 4Ot5HLKhyVw
Length: 8min 55sec (535 seconds)
Published: Fri Mar 29 2024