FreeWilly (Orca 🐳): NEW LLM Leader! Better Than Llama-2

Video Statistics and Information

Captions
It's been less than a week since Llama 2 was released, and there is already an entirely open-source model that is outperforming it on some of the benchmarks. The new best model is called FreeWilly 2, released by Stability AI, the company behind the Stable Diffusion project. How were they able to outperform the Llama 2 models? For the answer, we need to look at the Orca paper from Microsoft Research. Although we are still waiting for Microsoft to release the Orca model as well as its dataset, a number of open-source projects have been able to replicate the data creation process proposed in the arXiv paper, and this has resulted in some really impressive models. The new FreeWilly models use a very similar approach: they are basically fine-tuning a 70-billion-parameter Llama 2 model on a dataset curated using the approach proposed in the original Orca paper.

Stability AI is releasing two different models. The first one is called FreeWilly 1 and the second one is called FreeWilly 2. The original FreeWilly 1 model is a fine-tuned version of the original LLaMA 65-billion-parameter foundation model, carefully fine-tuned on a newly generated synthetic dataset, and we're going to look at how this dataset was generated. The FreeWilly 2 model is a fine-tuned version of the Llama 2 70-billion-parameter foundation model, and according to the evaluation, its performance compares favorably with ChatGPT for some tasks. Now, I just want to say this: the results are on benchmarks, and your mileage on real-world applications might vary. One not-so-good thing about these models is that even though they are open models, they cannot be used for commercial purposes. I think that comes down to the data creation process they followed, because the base Llama 2 70-billion-parameter model is open and can be used commercially. So let's look at the data generation and collection process and see what exactly they have done here.

As noted here, the methodology they adopted was inspired by the Orca paper from the Microsoft Research team, and they go on to say that while their data generation process is similar, they differ in their data sources. The dataset and model associated with the Orca paper are supposedly open source, but they have not been released by Microsoft Research, and that's why open-source projects are using the methodology proposed in the paper to create their own datasets for training these models. Stability AI used nearly 600,000 data points, or examples, which is around 10 percent of the size of the original dataset proposed in the Orca paper. That is a relatively small number of examples, but they are high-quality instructions, and that's why the resulting model is really incredible in its performance. It shows that you need to pay close attention to the dataset you are going to use for training or fine-tuning these large language models. As a result, they state that despite training on one tenth the sample size of the original Orca paper, the resulting FreeWilly models demonstrate exceptional performance across various benchmarks, and again that comes down to how good your training dataset is.

Let's talk about the performance evaluation. They used the LM Eval Harness from EleutherAI (EleutherAI, they note, is also a company under Stability AI). According to them, the FreeWilly models excel in many areas, including intricate reasoning, linguistic subtleties, and answering complex questions related to different domains, for example law and mathematical problem solving, so these seem to be pretty general models. Now let's look at a couple of very interesting comparisons they have provided on different benchmarks. Keep in mind these results are on benchmarks; real-world performance will most probably differ from what you see here.

With that in mind, let's look at this comparison. They are comparing both models to ChatGPT (GPT-3.5 Turbo). Now, we know that ChatGPT has not been evaluated on all of these benchmarks, so the results you see here are probably not official numbers from OpenAI. On at least one benchmark, HellaSwag, the FreeWilly 2 model is able to outperform ChatGPT. For the rest it doesn't really seem to be close: for MMLU it's within, I guess, one percent, and for the ARC dataset the performance is pretty bad relative to ChatGPT. However, on the AGIEval datasets the results are really impressive and comparable to ChatGPT: out of the eight datasets on which the zero-shot comparison was performed, FreeWilly 2 is able to outperform ChatGPT on six of the eight benchmarks, which I would say is pretty impressive.

Now there is this very interesting table comparing FreeWilly 1 to FreeWilly 2, and the result is surprising, for me at least. Even though FreeWilly 2 is based on the Llama 2 70-billion model and FreeWilly 1 is based on the original LLaMA 65-billion model, the results are pretty comparable to each other; I don't see a clear winner. I think they perform pretty consistently across these datasets, which again shows the importance of the dataset rather than the size of the model, because both of them were fine-tuned on the same 600,000 examples. So even though there is a size difference of 5 billion parameters, I don't think it translates into a huge performance improvement for the FreeWilly 2 model.

Now, the question is: can you use these models? The answer is yes. The models are already released and you can access them on Hugging Face, and since FreeWilly 2 is based on Llama 2, it's fully compatible with the Transformers library from Hugging Face. There are even a couple of Spaces where you can experiment with these models; however, you need to be aware that the wait time is too long for them to be useful. For example, here is a test prompt that I want to use in the Stability AI FreeWilly 2 Space, and if you notice, I am actually 48th in the queue, and it doesn't seem to be changing at all. So we probably need to wait for somebody to create a quantized version of this model; you will probably only be able to run a 2-bit quantized version, because it's the biggest Llama 2 model, and I think it will be very interesting to see its performance in the real world.

Okay, I'm still in the queue, so I don't think I'll be able to include this in the video. But the amount of innovation happening in the open-source community is simply mind-blowing and impressive and, to be frank, really hard to keep up with. I hope Stability AI will release a smaller version of the model at some point so that it would be more practical to run locally. This was a quick update on the FreeWilly projects. I hope you found it useful. Thanks for watching, and see you in the next one.
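The Orca-style data curation the video describes, pairing varied system prompts with instructions so that a teacher model produces step-by-step explanation traces rather than short answers, can be sketched roughly as below. All prompt strings and helper names here are illustrative assumptions; neither Microsoft nor Stability AI has published their exact pipeline.

```python
# Rough sketch of Orca-style "explanation tuning" data construction.
# The system prompts and helper names are illustrative only; the real
# FreeWilly data sources and prompts have not been released.

import json
import random

# Orca's key idea: vary the system prompt to elicit detailed reasoning
# traces from a strong teacher model, then fine-tune a student on them.
SYSTEM_PROMPTS = [
    "You are a helpful assistant. Explain your reasoning step by step.",
    "You are a teacher. Justify every step of your answer.",
    "Answer carefully and show how you arrived at the result.",
]


def build_training_example(instruction: str, teacher_response: str) -> dict:
    """Pack one (system, instruction, response) triple for fine-tuning."""
    return {
        "system_prompt": random.choice(SYSTEM_PROMPTS),
        "instruction": instruction,
        # The response is the teacher's step-by-step explanation trace.
        "response": teacher_response,
    }


if __name__ == "__main__":
    example = build_training_example(
        "What is 17 * 24?",
        "Step 1: 17 * 20 = 340. Step 2: 17 * 4 = 68. Total: 340 + 68 = 408.",
    )
    print(json.dumps(example, indent=2))
```

A corpus of a few hundred thousand such triples, roughly the 600,000 examples mentioned above, is what the FreeWilly models were reportedly fine-tuned on.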
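Since the video says the model is fully compatible with Hugging Face Transformers, loading it would look roughly like the standard Llama 2 recipe below. The repo id and the `### System / ### User / ### Assistant` prompt layout are assumptions drawn from memory of the FreeWilly2 model card, so verify them before use, and note the 70B fp16 weights need very substantial GPU memory (hence the interest in quantized versions).

```python
# Hedged sketch: loading FreeWilly2 with the Transformers library.
# The repo id and prompt layout are assumptions; check the model card.


def build_prompt(system: str, user: str) -> str:
    """Format a prompt in the assumed FreeWilly chat layout."""
    return f"### System:\n{system}\n\n### User:\n{user}\n\n### Assistant:\n"


def main() -> None:
    # Heavy: the 70B model is ~140 GB in fp16, so this is kept out of
    # module import and should only run on suitable hardware.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "stabilityai/FreeWilly2"  # assumed Hugging Face repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",  # shard the weights across available GPUs
    )

    prompt = build_prompt(
        "You are a helpful assistant.",
        "Summarize the Orca training approach in two sentences.",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(output[0], skip_special_tokens=True))


# main()  # uncomment on a machine with enough GPU memory
```

The `device_map="auto"` option lets Transformers shard the checkpoint across whatever GPUs are available, which is the usual way people run 70B-class Llama models without a single giant accelerator.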
Info
Channel: Prompt Engineering
Views: 13,684
Keywords: prompt engineering, Prompt Engineer, GPT-4, meta llama, llama2, llama-2, llama ai, freewill, freewilly2, freewilly1
Id: _eE91oJKKA4
Length: 8min 18sec (498 seconds)
Published: Mon Jul 24 2023