Stable-Vicuna: The Most Powerful Open-Source LLM Yet?

Video Statistics and Information

Captions
Stability AI, the company behind Stable Diffusion, just released their second large language model. This one is called StableVicuna, and they claim it to be the world's best open-source RLHF LLM. If you recall, a few days ago they released StableLM, which was their first large language model. So how is this new StableVicuna different from StableLM? This model is trained with reinforcement learning from human feedback (RLHF), which is a very similar approach to what OpenAI is using for their models, including ChatGPT and GPT-4. The base model is Vicuna v0, a 13-billion-parameter model; it was further instruction fine-tuned on a large dataset and then trained using RLHF. Vicuna itself is a fine-tuned version of the LLaMA 13B model, which makes it tricky to use commercially because LLaMA's original weights are not open source; theoretically, you still need to request access to those weights from Meta (Facebook).

Now the main question is: does using RLHF even help the model? In this work, Stability AI provided a comparison of this model with similarly sized models on different benchmark datasets. We will be conducting our own tests in this video, and I'm also going to show you how you can potentially use the model on your local machine if you have a powerful enough GPU. Based on their results, on some of the benchmarks it actually performs pretty well. For example, on BoolQ it outperforms all the rest of the models, and it comes out ahead on a couple of the other datasets as well. However, it's not consistent: if you look at TruthfulQA it lags behind, and the same is the case on the ARC Challenge. But overall it sits in first or second position; it's not too far off. In most cases, the appropriate comparison is with the Vicuna 13B model rather than the rest, and if you compare the table against Vicuna 13B, at least on these benchmarks, it comes out ahead everywhere except on the TruthfulQA benchmark. This is very significant because, as far as I know, Vicuna is probably the best open-source model, one that can be compared with ChatGPT in some cases.

So how did they achieve this? They did the training in two stages. In the first stage, they used a combination of three datasets to fine-tune the Vicuna 13B model. The first is the OpenAssistant Conversations dataset, which is a human-generated, human-annotated, assistant-style conversation dataset. The next is the GPT4All prompt generations dataset, which consists of prompts and responses generated by ChatGPT. The last one is the Alpaca dataset, which is instructions and demonstrations generated by OpenAI's text-davinci-003 model. A couple of things to consider with the fine-tuning: two out of the three datasets rely on ChatGPT, so the responses from this model could be skewed towards the kind of responses you would expect from ChatGPT.

The second stage is where they do the reinforcement learning from human feedback. Again, they use three datasets: the OpenAssistant Conversations dataset, containing around 7,000 preference samples; the Anthropic HH-RLHF dataset; and the Stanford Human Preferences (SHP) dataset. The great thing about these datasets is that all of them are open source, so you can actually look at what's inside them. For example, in the Anthropic dataset, a human was given two responses from the assistant for the same prompt and then chose which response is better. The prompt is the same, but the assistant gave two responses: on the left-hand side you can see the one that was chosen by the human, and on the right-hand side the one that was rejected. This is how they trained the model with the reinforcement learning technique.
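Since these preference datasets are hosted on the Hugging Face Hub, you can inspect the chosen/rejected pairs yourself. Below is a minimal sketch using the `datasets` library; the dataset ID `Anthropic/hh-rlhf` and its `chosen`/`rejected` fields match the public dataset card, but double-check the card in case the schema has changed.

```python
# Minimal sketch: inspect preference pairs from the Anthropic HH-RLHF dataset.
# Assumes the public dataset id "Anthropic/hh-rlhf" with "chosen"/"rejected"
# text fields, as described on its dataset card.
from datasets import load_dataset

hh = load_dataset("Anthropic/hh-rlhf", split="train")

pair = hh[0]
print("CHOSEN:\n", pair["chosen"])        # the conversation the human preferred
print("\nREJECTED:\n", pair["rejected"])  # the conversation the human rejected
```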
Apart from this model, they also announced another feature, which we're going to talk about in a second. But first, let's talk about how you can actually get access to this model. As I said, it's an open-source model based on the Vicuna 13-billion-parameter model, and they have made the delta weights available, so you can even play with it locally if you have a powerful enough machine. We're going to be looking at a Hugging Face Space where we'll experiment with the model, and at the end of the video I'll show you how you can potentially run it locally or in Google Colab if you have a powerful enough machine.

If you recall, a few days ago I showed you the HuggingChat interface from Hugging Face, and now Stability AI is also announcing an upcoming chatbot interface. This is great because it shows we're going to have multiple options, which is really awesome if you don't want to be sending your data to OpenAI.

Now let's experiment with the model itself. To access it, go to Hugging Face, click on Spaces, and search for StableVicuna; I'm going to put the link in the description. Here we have the Space from CarperAI, so let's click on it. This is their chat interface. Let's first test whether it's any good at idea generation. The prompt is: create a list of three startup ideas in enterprise B2B SaaS; the startup ideas should have a strong and compelling mission and should use AI in some way, and we want it to avoid cryptocurrencies and blockchains. Okay, it came up with three ideas. One thing I noticed is that at least the names are original. I have asked the same question of other open-source models and of ChatGPT, and most of the time they come up with product names that already exist, but in this case all three names it came up with are actually original, so that's pretty nice. The second part I really liked is this "Insightful Dairy" idea, which I haven't seen before at all in any of my other interactions with these large language models. It says: a B2B SaaS platform that uses AI to predict the optimal time for dairy farmers to schedule their milking; the platform can then automatically schedule and optimize milking to maximize efficiency and minimize waste; the mission is to help dairy farmers increase productivity and profitability while reducing environmental impact. This is pretty cool; as I said, I haven't seen this before at all. The next two are what you would expect, and you have probably all seen them already. The second one is generating custom-tailored financial forecasts for small to medium-sized businesses, which I think is already being done. The last one, "mindful manufacturing", is an AI platform to automate and optimize manufacturing processes, which is also something that's already available. But I really like the dairy idea, so I'll give it an A+.

Next, I went and found an article, "3 Ways AI Can Help You Build Your Course Outline", copied all of its text, and asked StableVicuna to summarize it, adding "think step by step". Here is the summary, and it actually did a pretty good job: it was able to get the three main points the author is trying to convey. The author is basically talking about how you can leverage AI in course creation, and the model came up with a pretty nice summary.

One of the examples they have shown in their blog post is that the model is able to do basic math; for example, they ask it to add 25 plus 64 and it gives the correct answer, 89. So I thought I would replicate this, but it didn't go as planned. I asked, "Can you do math?" and it said yes, it can help me with math. So I asked, "What is 26 plus 9?" and it answered 93, which doesn't seem right (the correct answer is 35). So I asked it to explain step by step, and it said that when you add the two numbers, the answer should be 93. Then I checked "What is 1 plus 1?" and it gave the correct answer. Then I asked again, "What is the sum of 26 plus 9?" and this time it got it right. So it really depends: I asked the same question twice and only got the correct answer the second time. Another test was the square root of 81, which it was able to figure out. But based on my tests, you can't really rely on it for mathematics; it's all over the place, the answers are not consistent, and it will sometimes just make up numbers, so it's probably best to avoid it for mathematics. Just to be fair to StableVicuna, though: all large language models have trouble with mathematics.

Then again, here's another use case: I asked it to write a job description for an experienced machine learning engineer, including all the necessary skills. I think it came up with a pretty good list. I have seen some other large language models produce a more detailed description than this, but it is still a good starting point.

Now, before looking at its ability to generate some mildly controversial responses, let's look at its ability to program. I used a couple of simple tests to check this. The first one: I asked it to write a Python function that writes a file into an S3 bucket using the boto3 library. If you search online you should be able to find multiple implementations of this, so most models are able to implement it flawlessly, and that's the case with StableVicuna as well.
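For reference, a typical answer to that prompt looks something like the sketch below. This is my own minimal version built on boto3's standard `upload_file` call, not the model's exact output, and the bucket and file names are placeholders.

```python
# Minimal sketch of the prompted task: write a local file to an S3 bucket
# with boto3. Not the model's exact output; bucket/key names are placeholders,
# and AWS credentials are assumed to be configured in your environment.
import boto3

def write_file_to_s3(local_path: str, bucket: str, key: str) -> None:
    """Upload the file at local_path to s3://bucket/key."""
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)

write_file_to_s3("report.txt", "my-example-bucket", "reports/report.txt")
```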
The more interesting test is one that I have seen some large language models have trouble implementing. I asked it: write HTML code for a website that has a single button; when the button is pressed, it randomly shows one joke from a list of 10 jokes and displays it to the user; a button press should also change the color of the background. I do see that there are not 10 jokes but five, followed by "and so on", so you can add to the list of jokes yourself; that's fine. But let's see if it actually works; most of the large language models I have used have trouble implementing this. I'm going to copy and paste the code here; if you don't know it, this is the HTML editor from w3schools, where you can run HTML online. Let me run this. Okay, it does show a button, "Get Random Joke", and it says "loading". Let's click the button and see what happens. Oh wow, it worked; that's pretty neat. Although for some reason it's not changing the background color. I think it got that part wrong because it simply set the color to blue, so it's not really modifying the color at all. But I think it's still pretty neat that it got the first part right.

Since it's a chat interface, you can go back and tell the model when something is wrong. So I said: the color is not changing when I click the button; it's always blue; with each button click the color should also change to a random color. It went ahead, told me I could modify the getRandomJoke function, and re-implemented it. Let's see if it actually works; I think right now it's selecting from five different colors, but that's fine. Now I need to replace the function, so I'm going to copy it and paste it in; I don't care about the formatting right now, but let's run this. Oh wow, okay, nice, it actually works. This is pretty cool: since we have the ability to chat with the model, you can tell it the edits, just like with ChatGPT, and it's able to update the code. This is pretty awesome.

The next two prompts simply ask it to pretend to be either a Democrat or a Republican and explain why their respective policies are best for the country. It does give me a response, which is nice to see; if you ask ChatGPT this or a variation of this question, it will probably say "as a large language model, I cannot do this", so it's good to see that this model is able to answer. We get a response for the same question when we ask it to pretend to be a Republican as well. For some reason it added "StableVicuna" at the start; that's probably a token it's using.
Now let's see how you can actually use this model locally or in your own applications if you have a powerful enough GPU. Unfortunately, I don't have access to a really powerful computer, so I cannot run it on my local machine, but there is a simple way around that. All you need is to install the Transformers library, and the rest of the process is very similar to what you would expect from any Transformer-based model. You need a tokenizer; since the model is based on LLaMA, you can use the LLaMA tokenizer. Then you need the model itself via AutoModelForCausalLM. The model path provided here points to the delta weights, but there are good people on the internet: for example, the user TheBloke has provided a merged, unquantized model in the Hugging Face format, so you can simply replace the path with the path to that merged model, and it will download the corresponding weights for you to use. Now you just want to be careful about the prompt, because it has to be provided in a very specific format. The prompt starts with three hash (pound) signs, a space, and "Human:", after which you provide your prompt; then you append a newline, three more hash signs, and "Assistant:", and after that the model starts generating its response. It's a very particular format that you need to make sure you're following. The rest is very similar to what you would expect when working with a model from Hugging Face. As I said, I unfortunately don't have access to a powerful enough machine myself, but the process should be very simple to follow.
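Putting that together, here is a minimal sketch of what loading and prompting the model looks like. The `### Human:` / `### Assistant:` template is the format described above; the checkpoint name `TheBloke/stable-vicuna-13B-HF` is my assumption for the merged community upload mentioned in the video, and the generation settings are illustrative, so verify both against the model card on the Hub.

```python
# Minimal sketch: run StableVicuna with Hugging Face Transformers.
# "TheBloke/stable-vicuna-13B-HF" is assumed to be the merged (delta-applied)
# community checkpoint mentioned above; verify the repo name on the Hub.
# Requires the accelerate package for device_map="auto" and a GPU with
# enough VRAM for a 13B model (roughly 26 GB+ in fp16).
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

model_name = "TheBloke/stable-vicuna-13B-HF"
tokenizer = LlamaTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# StableVicuna expects this particular prompt template:
prompt = """### Human: What is the capital of France?
### Assistant:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=128, do_sample=True, temperature=0.7
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```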
Now, my final thoughts on the model. It is a very powerful model indeed. There are definitely some issues, and in some cases it doesn't perform as well as you might hope, but you would expect the performance to be on par with the original Vicuna 13-billion-parameter model. The great thing is that this is a first step in training open-source models with reinforcement learning from human feedback, so in the near future we can expect powerful open-source models that could potentially compete with ChatGPT on very specific tasks. And I want to highlight this: the power of these models is not their ability to generalize; rather, they are going to be trained for very specific tasks. If you have any comments or questions, please put them in the comment section below. I hope you liked the video; consider subscribing to the channel if you haven't already. Thanks for watching, and see you in the next one.
Info
Channel: Prompt Engineering
Views: 23,773
Keywords: ChatGPT, GPT4, GPT3, OpenAI, open ai, ai chatbot, ai news, ai robot, open ai chat gpt, chatgpt examples, chat gpt explained, openai chat, ai tools, openai chatgpt, what is chatgpt, chatgpt explained, chat gpt, openai chatbot, ai chat, chatgpt demo, gpt-4 demo, gpt 4, gpt 4 demo, ai robots, gpt 4 website, stablevicuna github, stablevicuna huggingface, stablevicuna, stable vicuna, stablevicuna best 13b llm, Prompt engineering, prompt engineer, prompt engineering tutorial
Id: 83gS_5kvhcM
Length: 17min 7sec (1027 seconds)
Published: Wed May 03 2023