Meta's LLAMA 3 Just STUNNED Everyone! (Open Source GPT-4)

Video Statistics and Information

Captions
So Meta have finally released their long-anticipated Llama 3 model, an open source model that grants access to a variety of new capabilities in terms of how well the model answers questions, and this is a truly landmark event for the AI community. I'm going to let Mark Zuckerberg state exactly what's going on, and then we'll dive into the technical details of what this release means.

"All right, big day here. We are releasing the new version of Meta AI, our assistant that you can ask any question across our apps and glasses, and our goal is to build the world's leading AI and make it available to everyone. Today we are upgrading Meta AI with Llama 3, our new state-of-the-art AI model that we're open sourcing. I'm going to go deeper on Llama 3 in just a minute, but the bottom line is that we believe Meta AI is now the most intelligent AI assistant that you can freely use. To make Meta AI even smarter, we've also integrated real-time knowledge from Google and Bing right into the answers. We're also making Meta AI much easier to use across our apps: we built it into the search box right at the top of WhatsApp, Instagram, Facebook, and Messenger, so any time you have a question you can just ask it right there, and we built a new website, meta.ai, that you can use from the web. We're also releasing a bunch of unique creation features: Meta AI now creates animations, and it now creates high-quality images so fast that it actually generates and updates the images for you in real time as you're typing. It's pretty wild, and you can go check it out now on WhatsApp or the website. We are investing massively to build a leading AI, and open sourcing our models responsibly is an important part of our approach. The tech industry has shown over and over that open source leads to better, safer, and more secure products, faster innovation, and a healthier market, and beyond improving Meta's products, these models have the potential to help unlock progress in fields like science, healthcare, and more. So today we're open sourcing the first set of our Llama 3 models, at 8 billion and 70 billion parameters. They have best-in-class performance for their scale, and we've also got a lot more releases coming soon that are going to bring multimodality and bigger context windows. We're also still training a larger dense model with more than 400 billion parameters. To give you a sense of Llama 3's performance, this first release of the 8 billion is already nearly as powerful as the largest Llama 2 model, and this version of the 70 billion model is already around 82 MMLU and leads on reasoning and math benchmarks. The 400 billion parameter model is currently around 85 MMLU, but it's still training, so we expect it to be industry leading on a number of benchmarks. We're going to write a blog post with more technical details on all of this if you want to go deeper. In the meantime, enjoy Meta AI and let me know what you think."

So that was Mark Zuckerberg's statement, and honestly there is quite a lot of information to dissect, because there's just so much in this release; it's actually a lot more than many people, including myself, anticipated. So let's take a look at one of the first things he talks about, which is of course the benchmarks. We can see here that the benchmarks are rather surprising: looking at the Meta Llama 3 Instruct model performance, the reason these numbers are so surprising is that these models are state-of-the-art.
In other words, this is the very best you can currently get in terms of AI: there is nothing better that exists right now at the 8 billion or the 70 billion parameter size, so Llama 3 is leading the way for open source.

Now, one of the most surprising things, which I think shocked most people, is that if you take a look at some of the benchmarks, you can see Claude 3 Sonnet, part of Claude 3's family of large language models, and it seems to have been surpassed by Meta with Llama 3, which is quite surprising. We don't know exactly how large Claude 3 Sonnet is, but it's remarkable that a 70 billion parameter large language model can surpass a state-of-the-art model that many people use on a daily basis for a variety of tasks. It goes to show that this industry is constantly being shaken up in terms of who is the market leader on the benchmarks at different sizes and different price points, and I can see this being a key area of dominance for Meta and Llama 3, given their ability to continually update their models and thrash the competition on benchmarks. To be honest, this was something I think nobody really expected, because these models are open source and aimed mainly at the developer community; yes, we knew there were going to be improvements, but surpassing even Gemini 1.5 Pro on some benchmarks like MMLU is obviously quite surprising. And of course, when we compare it to other models of similar sizes, like Google's Gemma and Mistral 7B Instruct, Llama 3 absolutely thrashes them in performance and in general ability overall. So right now it seems that even companies like Mistral are being beaten in their ability to launch large language models and AI systems that stay at the front of the pack, which is, to be honest, quite surprising.

With that being said, there is some other information you should know. One of the things Meta did was seek to optimize performance for real-world scenarios, and to this end they developed a new high-quality human evaluation set. This evaluation set contains 1,800 prompts covering 12 key use cases: asking for advice, brainstorming, classification, closed question answering, coding, creative writing, extraction, inhabiting a character/persona, open question answering, reasoning, rewriting, and summarization. And to prevent accidental overfitting of the models on this evaluation set, even their own modeling teams do not have access to it. Essentially, they built a new human evaluation set, and I think this is really important, because I've always said that humans are the end users of these products, so they should be optimized for humans and not for benchmarks. That's what some of the LLM leaderboards have aimed to capture, ranking which models people actually prefer, because they're being voted on by real humans, and that should be the default benchmark people test their models against, because it reflects what people will actually use. It doesn't really matter if something looks great on benchmarks like MMLU or GSM8K if people can't actually use the model on a day-to-day basis, unless of course it has a really specific use case, so I think this evaluation is going to show us how good the model really is.
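The win/tie/loss percentages quoted in the next section come from pairwise human comparisons: an annotator sees two anonymized responses to the same prompt and picks the better one, or calls it a tie. As a minimal sketch of how such judgments are aggregated into those percentages, using made-up placeholder data rather than Meta's actual evaluation pipeline:

```python
# Aggregate pairwise human preference judgments into win/tie/loss rates.
# The judgments below are made-up placeholder data for illustration only.
from collections import Counter

# Each entry is one annotator's verdict comparing "model A" (e.g. Llama 3 70B)
# against "model B" (e.g. Claude Sonnet) on a single prompt.
judgments = ["win", "win", "tie", "loss", "win", "loss", "win", "tie", "win", "loss"]

counts = Counter(judgments)
total = len(judgments)
for outcome in ("win", "tie", "loss"):
    print(f"{outcome}: {100 * counts[outcome] / total:.1f}%")

# With roughly 1,800 prompts and multiple annotators per prompt, percentages
# like the 52% win / 12.9% tie / 34% loss reported below become fairly stable.
```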
Now, if we look at Llama 3 versus the human evaluations, one of the things they did was test it against some of the other state-of-the-art models. In the initial comparison, Meta Llama 3 was tested against Claude Sonnet, and overall it won the majority of the time in the human evaluation, which, like I said before, is quite surprising: 52% wins, 12.9% ties, and 34% losses. Across the board, even against Mistral Medium, Meta's Llama 3 70 billion parameter model is really surprising in its capabilities, and it does increasingly better compared to Mistral Medium, GPT-3.5, and Meta's own Llama 2. Honestly, they've done a pretty amazing job of being more efficient and producing an even better AI system while remaining at the same number of parameters.

In comparison with other models, the pre-trained Llama 3 models also exceed the abilities of other open source, and even closed source, models. The Llama 3 8 billion parameter model, compared against Mistral 7B and Gemma, completely thrashes them in performance, and the 70 billion parameter model does better than Gemini Pro 1.0 and Mistral's Mixtral 8x22B. This is quite surprising considering that Mistral only just released the 8x22B model, so I'm not entirely sure how on earth Meta was able to secure that lead. If you don't know, Mistral (the company behind the Mixtral models) is an AI company that releases completely open models, and when they release something it's pretty wild, because they don't announce that they're working on it: they literally just post a download link that you can go and grab, and then you have to find out what they released, how good it is, and benchmark it yourself, with no real blog post. So the fact that Mistral can drop anything open source at any time, and that they're pretty much the leading open source lab, is impressive, and to be honest, hats off to them, because they're not backed by a giant company like Meta and they don't have billions and billions of dollars. They did recently raise at around a $2 billion valuation, but Meta is a multi-billion-dollar company, so the comparison is a bit skewed. The point is that Meta was still able to do better, and while yes, it is only marginally better, it's still pretty surprising that they managed it against something that was released only a couple of days ago.
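Several of the scores quoted throughout this video, including the "82" and "85" figures in Zuckerberg's statement, are MMLU accuracies. As a rough, hypothetical illustration of what that metric means (the sample question and the ask_model stub below are placeholders, not part of the real benchmark or Meta's evaluation harness):

```python
# Minimal illustration of how an MMLU-style score is computed: each item is a
# four-way multiple-choice question, and the reported number is accuracy over
# the test items (the real MMLU has ~14,000 items across 57 subjects).
items = [
    {
        "question": "Which planet in the Solar System has the most moons?",
        "choices": {"A": "Earth", "B": "Mars", "C": "Saturn", "D": "Venus"},
        "answer": "C",
    },
]

def ask_model(prompt: str) -> str:
    """Stand-in for a call to the model; returns one of 'A'-'D'."""
    return "C"  # placeholder answer

correct = 0
for item in items:
    prompt = (
        item["question"]
        + "\n"
        + "\n".join(f"{k}. {v}" for k, v in item["choices"].items())
        + "\nAnswer:"
    )
    if ask_model(prompt).strip().startswith(item["answer"]):
        correct += 1

accuracy = 100 * correct / len(items)
print(f"MMLU-style accuracy: {accuracy:.1f}")  # "82 MMLU" means roughly 82% here
```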
Now, in terms of the model architecture, there are some quite interesting things here. Llama 3 uses a tokenizer with a vocabulary of 128,000 tokens that encodes language much more efficiently, which leads to substantially improved model performance. In addition, the training data is rather fascinating, because people always love to see what a model was trained on: Llama 3 is pre-trained on over 15 trillion tokens, all collected from publicly available sources, and Meta state that this training dataset is seven times larger than the one used for Llama 2 and includes four times more code. To prepare for upcoming multilingual use cases, over 5% of the Llama 3 pre-training dataset consists of high-quality non-English data covering over 30 languages, though they do not expect the same level of performance in these languages as in English, which makes sense. So with the training data they essentially ensured that the dataset was truly high quality, and that's why they're able to get so much more out of this model compared to other models of a similar size. Remember, as we've always discussed, data is one of the most important things when training models, as we've seen with smaller models like Orca 2 and Microsoft's Phi-1.5 and Phi-2.
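One way to see what the larger 128,000-token vocabulary means in practice is to count how many tokens the Llama 2 and Llama 3 tokenizers need for the same text. Here is a minimal sketch using the Hugging Face transformers library, assuming the repo IDs meta-llama/Llama-2-7b-hf and meta-llama/Meta-Llama-3-8B and that you have been granted access to those gated repositories:

```python
# Rough sketch: compare how many tokens each tokenizer needs for the same text.
# Assumes `transformers` is installed and access to the gated meta-llama repos
# has been approved (Meta's license must be accepted on Hugging Face).
from transformers import AutoTokenizer

text = (
    "Meta AI is our assistant that you can ask any question "
    "across our apps and glasses."
)

llama2_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
llama3_tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

n2 = len(llama2_tok(text)["input_ids"])
n3 = len(llama3_tok(text)["input_ids"])

print(f"Llama 2 (32k vocabulary):  {n2} tokens")
print(f"Llama 3 (128k vocabulary): {n3} tokens")
# A larger vocabulary generally means fewer tokens per sentence, so more text
# fits in the same context window and each forward pass covers more content.
```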
In addition, something crazy that he actually spoke about, and I think this is pretty fascinating, is that Llama 3 is also going to come in a 400 billion parameter version, and this model is currently still in training; the last checkpoint was April 15, 2024. You can see that this pre-trained model is pretty intense, and its benchmarks are quite surprising considering that Meta has not previously trained a model of this size, so this is the first time we're getting a look at what Meta can really do when compared to these closed source companies. I think it's fascinating how quickly these LLMs and AI systems are evolving as these giant companies try to one-up each other: beyond simply providing a better service, they constantly raise the bar on these benchmarks.

This was compared in a table provided by Jim Fan, and he states that the upcoming Llama 3 model will mark the watershed moment when the community gains open-weight access to a GPT-4-class model. It will change the calculus for many research efforts and grassroots startups. He pulled the numbers for Claude 3 Opus, OpenAI's recent GPT-4 Turbo (the 2024-04-09 version), and of course Gemini, and he notes that Llama 3 400B is still in training and will hopefully get even better in the next few months; there is still a lot of research potential that can be unlocked with such a powerful backbone, and he expects a surge in builder energy across the ecosystem. So what we can see here is that this is basically on the level of GPT-4, which is pretty crazy, because if you have an open source model at GPT-4 level, people can build a variety of applications and AI systems they couldn't before, which means the ecosystem is going to truly evolve from this moment. That's why I said this Llama 3 release was definitely going to be something pretty crazy: it's basically open source GPT-4 when it is released in the coming months. I do think it will be interesting to see how they maintain safety with this model, because as you know, when you open source something there are usually bad actors that try to tweak the model, so it will be interesting to see how Meta changes its approach as the model gets smarter. Honestly, this table is quite shocking: on MMLU it pretty much surpasses the rest of the models, on GPQA it's on par, on HumanEval it's on par, and on the math benchmarks it lags a little, but like they said, it's still in training, so the system will likely get even better.

Now, here's something else they did beyond the technical details: they made a new website where you can access this model right now. I do want to state that if you are in the EU, and I know at least if you're in the UK, you won't currently be able to access this page; that's because of rules and regulations that mean things take longer to get here. However, I'm going to be doing a full tutorial on this, which will release probably about one or two hours after this video goes live; I would have included it in this video, but that would make it significantly harder for the algorithm to surface, especially for those looking to use this. There are many different features, like he said: animations, different ideas, different images, so it's going to be pretty interesting to see how this new model is used. If you would like to access it while being in the UK like I am, you're going to have to use a VPN, and I think at this point I might just find a VPN partner to team up with, because I find myself logging into VPNs all the time; for whatever rules and regulations exist, those of us living in Europe and the UK are subject to delayed AI releases.

So with that being said, let me know what you thought about this release. Are you surprised by the benchmarks? Are you excited for the open source community, which is basically getting access to an open source GPT-4-level model sometime in the future? And are you ready to try out Meta AI? Let me know what you think.
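If the meta.ai website is blocked in your region, the open weights themselves can also be run locally. Here is a minimal sketch using the Hugging Face transformers library, assuming access has been granted to the gated meta-llama/Meta-Llama-3-8B-Instruct repository and that you have a GPU with enough memory for the bfloat16 weights; check the official model card for the exact, up-to-date usage:

```python
# Minimal local-inference sketch for Llama 3 8B Instruct.
# Assumes: access granted to the gated repo, `transformers` and `torch`
# installed, and roughly 16 GB of GPU memory for bfloat16 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a concise, helpful assistant."},
    {"role": "user", "content": "Summarize what is new in Llama 3 in two sentences."},
]

# Build the chat prompt from the tokenizer's built-in chat template.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```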
Info
Channel: TheAIGRID
Views: 486,643
Id: cEHFzvU-pzk
Length: 15min 29sec (929 seconds)
Published: Thu Apr 18 2024