Why Llama 2 Is Better Than ChatGPT (Mostly...)

Video Statistics and Information

Captions
Meta just surprised us with a brand new open source language model called Llama 2. This thing is the best open source model we have, and in many cases they claim it to be better than GPT-3.5, which is the default ChatGPT. But in what ways is it better? Is it more up to date? Can you use it? How does this move the AI space forward? And why should you even care? I'll cover all of that today, and we'll even go into a quick demo.

So first things first, why is this such a big deal and why should you care? Well, I'm going to do my best to keep this simple, but the introduction of the paper they released with this says exactly what you should know: there have been many public releases of pre-trained large language models, such as BLOOM, LLaMA and Falcon, that match the performance of closed pre-trained competitors like GPT-3. Okay, so that's the first thing. There's a big distinction between models that are open, which you can download and build your apps upon, and models that are closed, where all they give you is a link where you can use their model on their servers, but you can't actually download the code and the weights and build on it yourself. Big difference between open models and closed models like GPT-4, for example.

Okay, so they continue: but none of these models are suitable substitutes for closed product large language models such as ChatGPT, Bard and Claude. So, just what I said right there. These closed product large language models are heavily fine-tuned to align with human preferences, which greatly enhances their usability and safety. And this is the big point here, okay? A lot of the open source models up until now were below average, and there was no fine-tuning on top of them. You might have heard the stories of OpenAI paying thousands of people in developing countries to go over the results and rate them, and then taking that data and feeding it back into the model. Well, it turns out that doing that is really damn expensive.
And the amazing thing here is that we get a really capable base model with Llama 2, plus we get a variation that has been optimized by humans, aka heavily fine-tuned to align with human preferences. The two models that Meta, in cooperation with Microsoft, released here are Llama 2 and Llama 2-Chat, Llama 2-Chat being the one that has been fine-tuned with human feedback. So to me, this is the really exciting one, and they come in three sizes. In total, we got six brand new open source models here. The sizes are 7 billion, 13 billion and 70 billion, and that's the number of parameters in each model, 70 billion being the most capable one. So the 70 billion Llama 2-Chat is the one that I personally am most excited about here, and we're going to talk about that more in this video.

We're going to talk about further differentiating factors and exciting news after we look at the license, because that is really the big news here, okay? And here it is, this is the punchline: Llama 2 is free for research and commercial use. So you can build your company chatbot on this and you don't need to pay for the GPT-4 API. You can make it your very own and you owe them nothing. Look at that. This is the licensing agreement: you're granted a non-exclusive, worldwide, non-transferable and royalty-free limited license. There is one hilarious exception, which it states here: if the product or service built upon this had greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta. So this essentially says: if you're Amazon, Apple or Google, you need to get a license. Everybody else on planet Earth, use this as you desire. So again, that is huge, because we're getting the power of ChatGPT into our hands and we can build on top of it now.

And that brings me to the next topic. First of all, in terms of safety, this will be the safest large language model out there.
I have yet to test this extensively, but have a look at this chart. They ran around 2,000 adversarial prompts, and the lower the percentage here, the safer the model is, aka the less unsafe content it gives out. ChatGPT was already notorious for not giving out much and being very secure, right? But here on the scale, it comes in at 7%, and the Llama 2 70 billion chat model, which is probably the most useful one in here, comes in at around 4%. Okay, nice. So this model is family friendly, which is great for business applications; you want it to be that way. I personally hope that in the future we'll also get fully open models, so we can do more creative things, but I understand the challenges of that, and I think this is actually a smart approach.

Okay, but what about performance? How good is this thing compared to ChatGPT? Well, on page 19 they actually included a benchmark showing a comparison. The way this test works is they use about 4,000 helpfulness prompts. And this is important to understand, because here they even say it does not cover real-world usage of these models, and this prompt set does not include any coding or reasoning related prompts. So a lot of this is information retrieval, where you ask it a question and it gives you an answer. Again, that's exactly what you want from chatbots. And if we look at the results from these helpfulness prompts, we'll find that it actually won over ChatGPT. It's very close, but it's ahead. And honestly, if they just match GPT-3.5 levels, I'm happy with that. That is more than good enough for a lot of use cases that you would want to build on this thing.

So when it comes to benchmarks, there's a set of academic benchmarks that we want to look at, and they included these here too. For consumers, this is probably the most interesting part of this paper. As you can see, right here you have the names of the different models being compared.
And before you look at these numbers, you have to consider that all of these are closed except Llama 2 here, right? But the results are not bad. I mean, yes, as I always say, GPT-4 is still king, that's just undisputed. But I think the fair comparison here is GPT-3.5. And when you look at these results: 70 versus 68.9, 57.1 versus 56.8. And then okay, on this one it's really far apart, but if you check out what this benchmark is all about, it's code generation. So okay, you're not going to be picking this model for coding. And I mean, it goes without saying that GPT-4 just smashes this benchmark, but on a lot of others it even comes close to Google's PaLM 2-L. And if we go back into the open source realm, which is more of a fair comparison here, then you'll see that the Llama 2 70 billion model just smashes all the other models on all of these benchmarks. Reading comprehension: first. Math: not even close. And even on reasoning, it's the best out there right now. And if you pair that with the fact that the cutoff date of this thing is actually September 2022, with fine-tuning data being more recent, up to July 2023, you'll realize that this base model has one more year of knowledge built into it than GPT-3.5 has. Big deal, actually.

So overall, I think it's fair to say that this is better than GPT-3.5. It's open source, it costs nothing, it's more up to date, and it's cleaner, which can be a good or a bad thing, I guess. So that leaves us with two questions: how do you use this thing, and when would you want to use it? Well, in order to use it, you need to download the model, and you can only do that by filling out this form and them accepting you. But hold up, I filled this out, and within an hour I got an email with a link to the GitHub repository and my very own link that gives me access to the full thing. So now I could download this thing and start building on top of it. And this is the point.
This thing is not meant for consumers, this is really meant for builders. But you can still try it out. On Twitter, I found a link to a Streamlit app where Nirant Kasliwal was kind enough to put up his chat demo for us to try. At the point of recording this is accessible; later on I might have to switch out the link in the description. But we can simply test this. First of all, I'll just go with the classic: write me an essay about penguins. All right, let's see how this goes. Having run this prompt hundreds of times across all different language models, I can say the structure is very different and distinct from GPT-3.5 and GPT-4 here. I don't feel like I've explored this enough to give you guys an objective opinion on how the outputs differ, but certainly these are very usable results, just like GPT-3.5 would give you, spoken in a different tone and voice.

What interests me a little more is the safety aspect, right? What if I ask it something slightly spicy, like: tell me a joke about Donald Trump. And it says: I can't satisfy your request. I'm just an AI, it's not appropriate for me to generate jokes that might be considered offensive or derogatory. Lord, here we go again. Or what if I just say: tell me a joke about penguins? Why did the penguin go to the party? Because he heard there was a cool gathering. Okay, that one it does, so you can clearly see the political safety filter at work here.

The last question that remains is: what is this good for? Well, first of all, people are going to be building web interfaces like this, where you can just use it as an alternative to OpenAI. But mostly, this is good to build apps on top of. You're not relying on some external language model that they might turn off, or change the pricing of, or change the quality of, or censor tomorrow, right? Well, okay, on the censoring point, this thing is pretty damn censored already, but at least you know what you're getting.
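If you'd rather run the chat model yourself instead of using a hosted demo, the main thing to get right when building on it is the prompt format. Here's a minimal sketch of a single-turn prompt in the `[INST]`/`<<SYS>>` template the Llama 2 chat models were fine-tuned on; the helper function name and the system prompt text are just illustrative placeholders, not anything from Meta's code.

```python
def format_llama2_chat(user_message: str, system_prompt: str) -> str:
    """Build a single-turn prompt in the [INST]/<<SYS>> format
    used to fine-tune the Llama 2 chat models."""
    return (
        f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

# Example: the penguin-essay prompt from the demo above.
prompt = format_llama2_chat(
    "Write me an essay about penguins.",
    "You are a helpful assistant.",
)
print(prompt)
```

Whatever inference stack you feed this string into, the model generates its answer after the closing `[/INST]` tag; skipping the template tends to noticeably degrade the chat model's output quality.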
For me personally, the number one thing I see this being used for, since one of the services we offer is custom chatbots for companies, is chatbots that belong to companies. And this is going to be the go-to model moving forward. No more OpenAI API with rate limits and having to explain to clients that there are a few dollars of extra costs depending on usage. No, you just download this thing, build it into the bot, and you have a self-contained version where the data is not being sent back and forth to OpenAI. It's all right there: the model, the chat interface, it's all yours, and you don't owe anybody anything. And that's the big difference here. Because with this ballsy move, Meta and Microsoft really changed the game and forced a lot of other players to act more openly, because they just set the standard for what a good model is supposed to look like and what the licensing around it is supposed to look like. This is really a fantastic direction they're pushing the space into.

And I hope that this video helped you understand what is actually happening here, because it's a big deal. If you take everything we just talked about and combine it with what I covered in this video, you'll realize that soon we'll get to take open source models, train them on people's personalities, and you'll have like an embodied person inside of the language model. It's absolutely crazy. But check out this video to understand what I'm talking about, because there I uncover a hidden capability within ChatGPT where you can start talking to certain people just based on their Wikipedia page. Crazy times we live in, but better get used to it.
Info
Channel: The AI Advantage
Views: 74,191
Keywords: theaiadvantage, aiadvantage, chatgpt, gpt3, ai, chatbot, advantage, artificial intelligence, machine learning
Id: blyzUI8kOG4
Length: 9min 36sec (576 seconds)
Published: Wed Jul 19 2023