How to Create Llama 2 Chatbot with Gradio and Hugging Face in Free Colab

Captions
Hello! In today's video we will chat with Llama 2 using Gradio's new ChatInterface module. In one of my previous videos I showed you how to prompt Llama 2, but that video wasn't a perfect example, because we didn't really create a chat; we only got one-off responses from Llama 2. Today we will build a fully functional chat that lets you send prompts to Llama 2 and get responses, and we will run all the code in a Colab notebook, so you can copy the notebook and run it yourself. It all runs on the free tier, so you don't have to pay anything, and you will end up with a fully functional chat. So let's dive in.

Let's move to the project. This first step may be required: I had trouble upgrading Gradio, and running this cell and then restarting the runtime solved the problem. So if you have trouble running the Gradio install cell, come back to this one, uncomment it, run it, and restart the runtime.

Let's start with the basic installations: Hugging Face Transformers, torch, and accelerate. With those in place, let's install the newest version of Gradio. Like I said, we will use its ChatInterface, which is very convenient for chatting with large language models. Gradio 3.42 is installed.

We will need access to the model on Hugging Face, and this link leads you to the Llama-2-7b page. Either you already have access, or you will see a message that sends you to the Meta website. There you'll find a short form; fill it in with basic information (if you don't know what to put under "organization", just write "personal"). It may take a couple of minutes to be granted access, but this is necessary to use Llama 2.

The next step is the Hugging Face login, which is also required. It gives you a link; click it, copy a token from your Hugging Face tokens page, paste it here, and choose "no" when asked about adding the token as a git credential. I'm successfully logged in, and just to make sure, I always call whoami, and there is my username.

Next is loading the model and tokenizer. Here we define the model we're going to use, which, as I showed you before, is Llama-2-7b-chat-hf. Then we need the tokenizer, and luckily Hugging Face has the AutoTokenizer module, so we can use it with any model. The next big step is creating the Llama pipeline (really, a pipeline works for any model; because we're using Llama, this is where we define the model again). Pipelines support many different tasks, and for our project we will use text generation. Let me run this cell, because it takes a lot of time, and I'll get back to you once everything is loaded. OK, the model is now loaded; as you can see, it took over three minutes to run this cell.

At this point I will partly repeat what I've already shown in one of my previous videos: how to get basic responses from Llama 2. This is pretty much the same function I used in that other video. In this function we generate sequences with the Llama pipeline we've just defined, and those sequences contain the responses. Let me initialize the function and call it with a simple introduction.
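The notebook cells themselves aren't shown in the transcript, so here is a minimal sketch of the setup just described; the pipeline keyword arguments (torch_dtype, device_map) are my assumptions for a free-tier Colab GPU, not something the video confirms:

```python
# Run these in Colab cells first; if upgrading Gradio misbehaves,
# re-run the install and restart the runtime, as mentioned above.
#   !pip install transformers torch accelerate
#   !pip install --upgrade gradio

import torch
from huggingface_hub import notebook_login
from transformers import AutoTokenizer, pipeline

# Llama 2 is a gated model: paste a Hugging Face access token when prompted
notebook_login()

model_id = "meta-llama/Llama-2-7b-chat-hf"

# AutoTokenizer resolves the right tokenizer for any Hub model
tokenizer = AutoTokenizer.from_pretrained(model_id)

# text-generation pipeline; loading the 7B weights took over three minutes in the video
llama_pipeline = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,  # assumption: half precision to fit free-tier GPU memory
    device_map="auto",          # assumption: let accelerate place the model
)
```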
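And the naive single-turn helper from the earlier video might look roughly like this; the sampling parameters (do_sample, top_k, max_length) follow the standard Hugging Face Llama 2 examples and are assumptions on my part:

```python
def get_response(prompt: str) -> str:
    """Naive single-turn generation: no chat template, no conversation history."""
    sequences = llama_pipeline(
        prompt,
        do_sample=True,
        top_k=10,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        max_length=256,
    )
    return sequences[0]["generated_text"]

# The model has no idea who you are, so it will happily invent a name later
print(get_response("Hi! This is my name."))
```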
OK, so the answer is very funny, but it also shows the other point I really want to make in this video (though it deserves a separate video on how to actually prompt Llama): I just introduced myself, and it's giving me advice on how to get my ex back. Not really what I expected. And when I ask "what's my name?", it answers "my name is Jack". That's really interesting, to be honest. In the previous video my goal was just to show you how to produce answers from Llama 2, but it wasn't really conversational, and now you can really see why: it doesn't know my name, so it made one up. This is a very bad example of how to use Llama 2. The drawbacks of this solution: there's no history, you can't really customize it, and it's definitely not ready to be used as a chatbot. And as you can see, it just keeps generating text without me even asking for more.

So the idea for this video is to show you first how to improve prompts. This is the correct structure of Llama 2 prompts, and I'll put a link in the description to the Hugging Face blog post where I got it from. In general, your user message goes between the [INST] and [/INST] tokens; <s> marks the start of a sequence and </s> marks its end, where a "sequence" here is just a piece of text. In the first message you also define the system prompt, wrapped in <<SYS>> and <</SYS>>, which is something I want to talk about in depth later, but not in this video. This is the right way of prompting Llama 2, because this is how Llama 2 was fine-tuned, and it really matters for getting precise and correct responses rather than the garbage I got here.

For this project we will also use Gradio, and Gradio's ChatInterface works with a message and a history. The message is your latest input, and the history is the record of the conversation: user input, chatbot response, user input, chatbot response. It's a list of tuples: your first message paired with the bot's answer, then your second message paired with the bot's answer, and so on. Of course, at the beginning of the conversation the history is empty, and when the history is empty we just want to use our system prompt, which in our case is simply: "You are a helpful bot. Your answers are clear and concise." I'm using the structure from the blog post: the opening <s>[INST] <<SYS>> line, then my system prompt, closed with the <</SYS>> token. In this format_message function, if the history is empty it means I haven't sent any message yet, so I initialize the system prompt together with my message (my first query) and close it with the [/INST] tag, because I've just sent my user message. The </s> token is not supposed to go there; it comes only after the model's response. Then I build up the formatted message, because when we later prompt Llama, we will always send the whole conversation.
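Putting that together, here is a sketch of format_message. The template follows the Llama 2 prompt format described above, the system prompt is the one from the video, and the three-message memory limit is the one mentioned later in the demo; the variable names are mine:

```python
SYSTEM_PROMPT = """<s>[INST] <<SYS>>
You are a helpful bot. Your answers are clear and concise.
<</SYS>>

"""

def format_message(message: str, history: list, memory_limit: int = 3) -> str:
    """Wrap the current message plus recent history in Llama 2's prompt format."""
    # keep only the last few (user, bot) exchanges so the prompt stays short
    if len(history) > memory_limit:
        history = history[-memory_limit:]

    # first turn: system prompt + user message; [/INST] closes the user part,
    # and </s> comes only after the model's answer, never here
    if len(history) == 0:
        return SYSTEM_PROMPT + f"{message} [/INST]"

    # the first exchange already carries the system prompt
    formatted_message = SYSTEM_PROMPT + f"{history[0][0]} [/INST] {history[0][1]} </s>"

    # every later exchange: <s>[INST] user [/INST] bot </s>
    for user_msg, bot_msg in history[1:]:
        formatted_message += f"<s>[INST] {user_msg} [/INST] {bot_msg} </s>"

    # finally, the message we are sending right now
    formatted_message += f"<s>[INST] {message} [/INST]"
    return formatted_message
```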
When using ChatGPT, we're used to it remembering the history of our conversation, at least until we exceed the context length. But with open-source models like this one, as the conversation grows we save the whole conversation ourselves and pass it back in: when we send, say, the third message to Llama, we don't send just the third message, we send the whole history. That's why the history is so useful here, and why, when we handle the conversation history, we format the message using all the previous user messages and bot answers except the first one, which we already combined with the system prompt. At the end comes the newest message we're sending. Awesome, let's run this function.

Then we need another function to get responses from Llama, because format_message only builds the prompt: it takes the message history and the current message and produces what we will send to the Llama model. So get_llama_response also takes the message and history, and the first thing we do is build the query, which is exactly what format_message returns. Then we generate sequences, and from the output we keep only the model's response, not the query. This is how we remove our prompt from the response: with this sequences-based approach, the model's output also contains the user input, meaning Llama would give you back both your query and its answer, and we don't want to see both. So we strip the user input, and the function returns just the response. Let me initialize this function.

Where the whole UI magic happens is Gradio and its powerful new ChatInterface module. As its only required parameter it takes the function we want to use, which is our get_llama_response. Like I said, Gradio automatically sends the message and the history, which is why we don't pass those two parameters ourselves; they are handed to our function on every turn. (Sketches of both pieces follow below.)

Let's try it. I give it some information about me and ask "what's my name?" Awesome, it remembers my name. Because I've limited the history to the last three messages, let's test that for a second. I tell it I've got two kids ("they're doing great, thank you for asking"), that I love basketball, that I'm from Poland, and that I live in Germany ("have you ever been to Poland?"). So I've given it five pieces of information. Now: how many kids do I have? It doesn't remember anymore. And where do I live? It's forgotten. That's interesting. Again, we're playing with the smallest Llama chat model, which means it may give us really poor answers; I just want to show you the limitations of these models, and they are still quite big, to be honest.
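Here is a sketch of get_llama_response as described above, again with assumed sampling parameters:

```python
def get_llama_response(message: str, history: list) -> str:
    """Format the conversation, generate, and strip the echoed prompt."""
    query = format_message(message, history)

    sequences = llama_pipeline(
        query,
        do_sample=True,
        top_k=10,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        max_length=1024,
    )

    generated = sequences[0]["generated_text"]
    # the pipeline echoes the input, so cut off everything up to and including our query
    return generated[len(query):].strip()
```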
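Wiring it into the UI then takes a single call; ChatInterface invokes the function with (message, history) on every turn, where history is the list of (user, bot) tuples:

```python
import gradio as gr

# Gradio passes `message` and the conversation history automatically
gr.ChatInterface(get_llama_response).launch()
```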
But the open-source space of large language models is very new, so don't get discouraged: the implementation you learned here is something you can reuse for newer and better models. When more powerful models come out, you can plug them in and get better answers and a better user experience.

That's it for today. I showed you how to use this amazing ChatInterface module from Gradio, and we used it to chat with Llama 2. I hope you liked it and learned something new. Feel free to like and subscribe, and hit the bell on my profile so you get notified when new videos come out. Thank you for today!
Info
Channel: Kris Ograbek
Views: 4,507
Keywords: gradio chat interface, gradio chatbot, gradio hugging face tutorial, gradio chat ui, huggingface pipeline, huggingface tutorial transformers, llama 2, llama 2 huggingface, llama 2 chat, llama 2 tutorial, llm hugging face, large language models tutorial, llama 2 colab, llama 2 python, llama 2 ai, llm tutorial, llm tutorial for beginners, machine learning, ai, open source llm, python huggingface, chat with llama 2, gradio llama 2, how to use llama 2 in colab
Id: lSBX-nMQ8cE
Length: 19min 13sec (1153 seconds)
Published: Thu Sep 07 2023