Create a ChatBot in Python Using Llama2 and LangChain - Ask Questions About Your Own Data

Video Statistics and Information

Captions
Okay, so today we'll be building a chatbot that can read through PDF documents, and we'll also be demoing it using Gradio. To get started we first install these libraries; I'll be adding a link to this Colab notebook in the description below. I've already installed them, and we check the GPU: you need a GPU instance to run this, it will not work on a normal CPU instance. You can go to Runtime and change the runtime type to GPU. A bigger GPU is better, but I'm using the free tier, so you don't need anything extra to run this.

Next you log in to the Hugging Face Hub, because you need Hub access to use the Llama 2 models if you don't have access to them already. We won't be using any OpenAI API keys; this is all open-source LLMs. We also need nltk.

Then we load the Llama 2 7-billion-parameter chat model. It takes a little time, so I ran the code beforehand. I'm loading it in 4-bit, which lowers the GPU memory requirements so there is enough room left to load the context into GPU memory as well. You create the BitsAndBytesConfig with 4-bit loading, set the model ID, and load the tokenizer and the model with the quantization config attached. As you can see, it takes around five gigabytes of memory.

Then you create the prompt template. As the Meta research page says, you need the instruction tokens and the system tokens, and this helper function builds the prompt from them. Next we download the Hugging Face embeddings, which is how your documents are stored in a vector store for retrieval later, and we import the recursive character text splitter, the Chroma vector store, and the PDF loader.
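The Llama 2 chat format wraps each turn in `[INST]`/`[/INST]` instruction tokens with an optional `<<SYS>>` block inside, as documented by Meta. A minimal sketch of the prompt helper described above (the function name `create_prompt` and the default system prompt are illustrative, reconstructed from the video):

```python
# Llama 2 chat special tokens, as documented by Meta.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

DEFAULT_SYSTEM_PROMPT = (
    "You are an expert in chess. You will be given a context to answer from. "
    "Be precise in your answer wherever possible, and cite sources if there are any."
)

def create_prompt(instruction: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    """Wrap an instruction and system prompt in the Llama 2 chat template."""
    return f"{B_INST} {B_SYS}{system_prompt}{E_SYS}{instruction} {E_INST}"
```

The resulting string is what gets fed to the tokenizer; the model was fine-tuned to expect exactly this token layout, so deviating from it degrades answer quality.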
For the toy example we're using the 2023 World Chess Championship match between Ding Liren and Ian Nepomniachtchi, because it's quite recent and I don't think the Llama models were trained on it, so we'll use it as the context for our questions. We're using the recursive character text splitter with a small chunk size and a chunk overlap of 20. We keep the chunks small because all of the documents you pass as context also go into GPU memory, and we don't want a CUDA out-of-memory error; if you have a larger machine you can increase the chunk size. We load the pages and create the embeddings in a database, using a persist directory: in case you have a lot of documents, you don't want to redo this over and over again, you want to store the index once and load it directly afterwards.

Now the instruction: "Given the context that has been provided" (the context will be passed in here), "answer the following question." The system prompt is: you are an expert in chess, you will be given a context to answer from, be precise in your answer wherever possible, and if there are any sources, cite them. As you can see, our helper function builds the prompt from the instruction together with the system tokens.

Then we set up LangChain. We set up the Hugging Face pipeline, because we want to use that to create the LLM, and we import the prompt template, the conversational retrieval chain, the conversation buffer memory, and the buffer window memory. (The RetrievalQA import is actually not required, so just remove it; I don't think we use it anywhere.) We'll be using the window memory, but you can also use the conversation buffer memory: that one loads all the conversations you've had, while the window memory loads up to the number of conversations you define. This is the template and this is the prompt. We'll be loading the last five conversation turns; I don't think we'll need more than that for this demo.
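The point of the small chunk size with a 20-character overlap is that each retrieved chunk, plus the prompt, has to fit in GPU memory. A stripped-down, pure-Python illustration of fixed-window splitting with overlap (the real RecursiveCharacterTextSplitter is smarter: it also tries to break on separators like paragraphs and sentences rather than mid-word):

```python
def split_text(text: str, chunk_size: int = 100, chunk_overlap: int = 20) -> list:
    """Naive fixed-window splitter: each chunk starts chunk_size - chunk_overlap
    characters after the previous one, so consecutive chunks share an overlap.
    The overlap keeps sentences that straddle a boundary retrievable from
    either side."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

With a larger GPU you would raise `chunk_size`, trading memory for more coherent context per retrieved chunk.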
Then we define our retriever, and we create the Hugging Face pipeline: a text-generation pipeline with the model, the tokenizer, and max_new_tokens as a parameter of the function, so we can change it whenever we need to. Then we create the chatbot class. This is just to help me make changes easily; you don't need a class to create a chatbot. It initializes the memory, the prompt, and the retriever, and then we have a function that builds the chatbot: it defines the Hugging Face pipeline, the LLM, and the chatbot itself, which is the conversational retrieval chain from LangChain. We create an instance of the class here, and then we create the bot itself.

Here we define two helper functions, and I'll roughly explain why. First, we want to clear the bot's memory in case it's getting too big, so we create a button which, when pressed, clears the memory: when the clear-memory button is clicked, it executes bot.memory.clear(). Second, we want to see what happens when we play around with the system prompt and how the LLM behaves, so we create a text field where you can pass in your own system prompt. When you hit Enter on the update-system-prompt field, if the text is not empty, we update the prompt inside the bot using the template provided.

We're not using the ChatInterface from Gradio, because I don't think it gives you as much flexibility as plain Blocks, so I'll be using Gradio Blocks. Here is the text box to update the system prompt, and here is the chatbot box; this is not the bot itself, it's the component that displays the conversation.
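The class is really just a bundle of memory, prompt, and retriever so the two Gradio callbacks can reset or reconfigure the bot in place. A skeleton under assumed names (the chain and memory are stubbed here; in the video they come from LangChain's ConversationalRetrievalChain and ConversationBufferWindowMemory):

```python
class ChatBot:
    """Bundles chain, memory and system prompt so the Gradio button
    callbacks can mutate the bot in place."""

    def __init__(self, chain, memory, system_prompt):
        self.chain = chain            # e.g. a ConversationalRetrievalChain
        self.memory = memory          # e.g. ConversationBufferWindowMemory(k=5)
        self.system_prompt = system_prompt

    def clear_memory(self):
        # Wired to the "clear memory" button in the UI.
        self.memory.clear()

    def update_system_prompt(self, new_prompt: str):
        # Wired to the system-prompt text box; empty submissions are ignored,
        # matching the "if it is not empty" check described in the video.
        if new_prompt.strip():
            self.system_prompt = new_prompt
```

Keeping this state on an object (rather than in globals) is what makes the "clear memory" and "update system prompt" buttons easy to wire up.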
Then there's the message box where you type your question. The Clear button clears the previous messages, and clear-memory clears the memory of the bot. The respond function runs whenever you hit Enter on the message box: it gets the answer, appends it to the chat history, and returns the chat history, which is what Gradio remembers and shows you in the interface.

All right, we'll launch the demo with debug set to True and share set to False, because we want to see any errors as they happen. As you can see, we have the update-system-prompt box, the bot itself, and the question box. We'll ask: who won the World Chess Championship in 2023? As soon as I hit Enter it loads the context using the retriever, passes it through the LLM, and tries to come up with an answer. We told it to be as precise as possible, so let's see if it is. Yes: it returned "Based on the information provided in the text, the winner of the World Chess Championship in 2023 is Ding."

Now let's update the system prompt: act like a drunk person, so your responses should have spelling mistakes. With the system prompt updated, let's ask: what was the final score, and did the match go into tiebreaks? We know the match went to tiebreaks, so let's see if our chatbot can answer it in that particular flavor of a drunk person.

One more thing to remember: we are not using any API, so you can run this on your local machine as well. You don't even need internet access to run it, apart from, I think, Gradio, though as far as I remember you don't need internet access for Gradio either. There are also things you can do like streaming the responses, so it doesn't feel like they're taking so long.
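The respond callback described above is what Gradio invokes on Enter: query the chain, append the (question, answer) pair, and hand the updated history back to the chatbot widget. A minimal sketch with the chain stubbed out and the bot passed in explicitly for clarity (the Blocks wiring, e.g. `msg.submit(respond, ...)`, is omitted):

```python
def respond(message, chat_history, bot):
    """Gradio-style callback: run the question through the bot's chain,
    append the exchange to the history, and return an empty string (to
    clear the input textbox) plus the new history (to refresh the widget)."""
    result = bot.chain({"question": message})  # real chain: ConversationalRetrievalChain
    answer = result["answer"]
    chat_history.append((message, answer))
    return "", chat_history
```

Returning the full history each time is what lets the Chatbot component re-render the whole conversation after every turn.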
I haven't set this up for fast inference, so it will take some time. Okay, clearly it has broken: it just repeats "World Chess Championship" many, many times. Let's ask it a different question: what was the result of the first game? As far as I remember, the first game was a draw; let me just confirm. Okay, now it has replied, and as you can see it has some flavor of a drunk person: "what, hiccup, blinks, the result of the first game was a 49-move draw." I think that's correct, the first game was a draw, so it works well.

So this is how you can create a working demo of a chatbot that can go through your own documents and provide reasonably good answers, on just Google Colab. That means you don't need much hardware, and you also don't need the OpenAI API or anything like that. Thanks for watching, and do like and subscribe. Also let me know in the comments if you want more large-language-model tutorials or anything else you'd like me to cover. All right, thanks for watching!
Info
Channel: ML Explained
Views: 6,555
Keywords: Gradio, ChatBot, Custom data, Chess Chatbot
Id: E8NUSCDvfEQ
Length: 12min 2sec (722 seconds)
Published: Thu Jul 27 2023