ChatBOT with Memory for Your Documents - OpenAI & LocalGPT

Video Statistics and Information

Captions
How to create chatbots for your own documents that have memory: that's the topic we are going to be covering today. First we will build a chatbot that you can use to talk to your own documents, then we will add a nice graphical user interface for you to interact with it. In this chatbot we're going to be using ChatGPT as our LLM and embedding model. If you have been following my channel, you probably know that I have my own localGPT project. That project lets you chat with your documents on your local machine, and no data leaves your device. So in the next step I'm going to show you how you can replace OpenAI's models with the localGPT models.

For this video we're going to be borrowing the main code from the new course "LangChain: Chat with Your Data" by DeepLearning.AI. I recommend everybody check out this course because it's a really great beginner course. Let's quickly walk through the code; I made some adjustments to it to make it functional. First and foremost, we simply install all the required packages using pip within a Jupyter notebook. Next we import all the required packages, and after that we import the operating system module. You need to define your API key as an environment variable. It's actually better to put the API key in a .env file and load it from there, but I'm simply providing it here. Then we define the model that we want to use: for this first iteration we're going to be using GPT-3.5 Turbo, but later in the video I will show you how to replace it with an open-source model.

Okay, let's quickly talk about how you can add memory to your LLM or conversational chain. LangChain has different types of memory; one is the conversation buffer memory, and this is how you define it. If you're looking for a more detailed treatment of the subject, I would recommend you watch this video. Basically, you define a memory object, then you define your embeddings as well as the vector store that you want to use, and after that you add the memory to your conversational retrieval chain, or whatever other type of chain you are working with. This is the simplest way of adding memory to your chain or LLM. I have a dedicated video on the different types of memory that are available in LangChain, so I would suggest watching that after this video.

Now let's see how you can use these concepts to put a graphical user interface around it. Here they have defined a function called load_db that receives three inputs: the file name (this is basically a PDF file), the type of chain that you want to use for information retrieval, and k, which is the number of chunks that you want to use during retrieval. If you're not familiar with these concepts, I would recommend you watch this specific video in which I go into a lot more detail.

Within the function, let's see what is happening. First we load our PDF file using a document loader; in this specific case we're using PyPDFLoader. Next we define a text splitter, which divides a document into smaller chunks or segments; each segment is supposed to have a size of 1,000 tokens with an overlap of 150 tokens between consecutive chunks. Then we use this text splitter to split our documents into those chunks. These should be very familiar concepts. In this case we're using the OpenAI embeddings.
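To make this first part of load_db concrete, here is a minimal sketch of the loading, splitting, and embedding steps, assuming the LangChain APIs available around the time of the video (PyPDFLoader, RecursiveCharacterTextSplitter, OpenAIEmbeddings). The course code follows the same pattern, but treat the exact splitter class and helper function name as assumptions.

```python
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings

def load_and_split(file_path: str):
    # Load the PDF into a list of per-page documents
    documents = PyPDFLoader(file_path).load()

    # Split into overlapping chunks; the video quotes a size of 1,000 with a 150 overlap
    # (note: this splitter counts characters by default, not tokens)
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
    docs = splitter.split_documents(documents)

    # OpenAI embeddings, used to vectorize the chunks in the next step
    embeddings = OpenAIEmbeddings()
    return docs, embeddings
```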
For the vector store, we are using a vector DB; this specific vector database is DocArrayInMemorySearch. This vector store lives in memory and nothing is persisted, which is especially helpful if you give users the ability to upload different documents while interacting with your app. Next we define a retriever, which accepts the number of chunks (k) that we are going to look at in order to extract information, and after that we define our conversational retrieval chain. The chain accepts the LLM as an input (in this case GPT-3.5 Turbo), then the type of chain that you want to use (we're going to be using "stuff"), then the retriever built on the vector store we just defined, and we want it to return the sources as well as the generated questions. When you look at the generated questions, they're going to be slightly different from your actual prompts because there is a system message placed on top of your prompt.

One thing you will notice is that in this specific case they have not added memory into the conversational chain; they're doing it outside this function. The rest of the code creates the graphical user interface using the Panel package, so I'm not going to go into a lot of detail on it, but I do want to highlight the portion that relates to memory: the code keeps track of the chat history, the chat history is extended with each query and the result you get, and that history is passed on to your QA chain for subsequent prompts (a condensed sketch of this pattern follows below). To run the dashboard, you need to execute this specific line at the end.
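Here is a condensed sketch of the retrieval chain and the external chat-history handling described above. It follows LangChain's ConversationalRetrievalChain API from mid-2023; the question text is just an example, and the commented-out memory line shows the alternative where the chain manages history itself.

```python
from langchain.vectorstores import DocArrayInMemorySearch
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain

# In-memory vector store over the chunks from the previous step
db = DocArrayInMemorySearch.from_documents(docs, embeddings)
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 4})

qa = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    return_generated_question=True,
    # Alternative: let the chain manage history itself with
    # memory=ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    # (imported from langchain.memory), instead of passing chat_history below.
)

# Memory handled outside the chain, the way the course code does it
chat_history = []
query = "What is the term limit of the president?"
result = qa({"question": query, "chat_history": chat_history})
chat_history.extend([(query, result["answer"])])

print(result["answer"])
print(result["generated_question"])
print([d.metadata.get("page") for d in result["source_documents"]])
```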
Once you do that, you're going to see a graphical user interface like this, running on localhost at port 52743. Let me quickly walk you through its different components. There's a conversation tab, which simply keeps track of the user and assistant conversation; then there is a database tab, which will show you the DB; then there's a chat history tab and a configuration tab. Right now we are working with the U.S. Constitution, but you can upload your own files if you want. If you're curious, here is where the default file is defined: I'm using the source documents folder, and within that we have the constitution.pdf file, but you can choose another file and then click on "Load DB".

Let's ask it a simple question: what is the term limit of the president? It comes back with the answer that the term limit of the president is two terms, as stated in the 22nd Amendment of the U.S. Constitution. We can also ask follow-up questions, so I asked: what is the age restriction? Since it has memory, it knows we are still talking about the president, which is why it says the age restriction for the president is that they must be at least 35 years old. So you can basically upload your own documents and start chatting with them.

Now let's look at the database tab. It shows the four different chunks that were used for the query; it found information on page 5, page 13, page 2, and I believe page 17. Those different chunks are the sub-documents that were used for generating the response. There is also a chat history tab, where you can see the first prompt and its corresponding response, and then the second prompt and its response. And as I said, on the last tab you can upload another document, so let's try to upload something else. This time I am going to upload the Orca paper: I chose the file, clicked on "Load DB", and it replaced the previous document, so this is the Orca paper now. If we go to the conversation tab we can simply ask a question, let's say "who are the authors", and here is the response from GPT-3.5 Turbo. I wish they would put the more recent interactions at the top, but for some reason they're appended at the bottom; you can easily change that. So this is a really nice little GUI that you can use for creating your own chatbots, and as I said before, it's not my code, it's coming from DeepLearning.AI.

Now let's look at the second part, where we replace the OpenAI LLM as well as the embeddings with our localGPT embeddings and LLM. In this case I went ahead and cloned the repo of the localGPT project. If you are not familiar with localGPT, I would recommend you watch this video. In summary, localGPT is a project which enables you to chat with your documents locally and privately; nothing leaves your system, so if you have any type of sensitive data or are concerned about your privacy, I would recommend checking that project out.

What I have done here is create this chatbot_localGPT.ipynb notebook, which is basically exactly the same code that I showed you, just with a couple of changes. You need to make sure that you have installed localGPT on your machine, and after that you can run this notebook. The first change is that we are importing load_model from run_localGPT; this is the function that we are loading from our localGPT file. It needs three inputs: the device type (whether you want to run on a CPU, a GPU, or, if you have an M1 or M2 MacBook, MPS), the model that you want to run, and the model base name. We're going to look at this in a little bit, and I'll explain some awesome functionality that localGPT has.

The next change is that instead of the OpenAI embeddings I'm using Hugging Face embeddings. You can replace these with the Instructor embeddings as well, but those need to run on a GPU and the download is around four gigabytes, so if you don't want a big embedding model you can just use the Hugging Face embeddings instead. The last change: if you remember, we were passing in an LLM, and that is now replaced by the load_model function. The model I'm actually using is the Orca Mini v2 7B GGML model; if you notice, you can now work with GGML models within localGPT. I'm running this on an M2 Max, so the device type is going to be MPS, and the model base name is the quantized version of the model that I want to use; in this specific case I'm using the 4-bit quantized model.
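Here is a rough sketch of those two changes, assuming localGPT's load_model helper takes the three arguments mentioned above (device type, model id, model base name). The exact model identifier and file name strings below are illustrative placeholders for an Orca Mini v2 7B GGML 4-bit build, so check the Hugging Face model card and the localGPT source for the real names.

```python
from langchain.embeddings import HuggingFaceEmbeddings
from run_localGPT import load_model  # requires localGPT cloned/installed locally

# Swap OpenAIEmbeddings for a local sentence-transformers model
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Swap ChatOpenAI for a local, quantized GGML model loaded through localGPT.
# The identifiers below are illustrative placeholders, not verified names.
llm = load_model(
    device_type="mps",  # "cuda" for NVIDIA GPUs, "cpu" otherwise
    model_id="TheBloke/orca_mini_v2_7B-GGML",
    model_basename="orca-mini-v2_7b.ggmlv3.q4_0.bin",
)

# Everything else stays the same: build the vector store with these embeddings
# and pass this llm into ConversationalRetrievalChain.from_llm(...) as before.
```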
I think this is a good point to talk about a couple of changes that have been made to the localGPT project. I'll be making a detailed video on all the new changes, but I just want to highlight some of the features here. When I first released the localGPT project, it only supported Hugging Face models; since its release there have been quite a few updates thanks to the open-source community. It now has support for GPTQ-format models, but keep in mind that if you're running GPTQ-based models you need to have an NVIDIA GPU. If you do not have an NVIDIA GPU, you can run GGML-format models instead, and thanks to the contributors we're now able to run almost all the different types of models. Support for MPT models is coming soon, and as I said, I will be making a separate video on all the updates to localGPT.

So we created our LLM chain, and the rest of the process is very similar to what we did before: simply run the rest of the code cells. When you run these, you will notice output like this because I'm using llama.cpp; these are the different diagnostics that llama.cpp prints. When we run the code, we are presented with exactly the same graphical user interface, but in this case the back end is using open-source models.

I asked this model the same question again, and it came up with a very detailed answer. The reason is that for the model we have chosen, the Orca Mini v2 7B model, the output will really depend on the type of system prompt that you provide. In this specific example I haven't provided any system message or system prompt, but when you are working with these open-source models you need to pay close attention to both the prompt template and the corresponding system message that you provide. We are going to look at that in a future video; a rough, illustrative example of such a template is sketched below.

So in this video I wanted to show you a few things: first, how to integrate memory into your model, either by using the memory buffer object or by adding it as part of the context, the way it's done here; second, the graphical user interface; and third, how to replace OpenAI's models with open-source localGPT models. I hope you found this video useful. Consider liking the video and subscribing to the channel for similar content. We have a very active Discord server where folks are helping each other out, so if you have any questions or comments, please come join us; you will find a lot of value there if you're working on LLM-based projects. If you want to discuss your project, check out my calendar in the description of the video. As always, thanks for watching and see you in the next one.
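As a follow-up to the prompt-template point above, here is a rough, illustrative sketch of an Orca-style prompt wrapped in a LangChain PromptTemplate. The exact section markers and wording depend on how the model was fine-tuned, so treat this template as an assumption and verify it against the model card before using it.

```python
from langchain.prompts import PromptTemplate

# Illustrative Orca-style template (system / user / response sections).
# Check the model card for the exact format the model was trained with.
ORCA_TEMPLATE = """### System:
You are a helpful assistant. Answer the question using only the provided context.

### User:
Context:
{context}

Question: {question}

### Response:
"""

qa_prompt = PromptTemplate(
    template=ORCA_TEMPLATE,
    input_variables=["context", "question"],
)

# Hypothetical wiring into the chain from earlier; from_llm accepts a custom
# answer-generation prompt via combine_docs_chain_kwargs:
# qa = ConversationalRetrievalChain.from_llm(
#     llm=llm,
#     retriever=retriever,
#     chain_type="stuff",
#     combine_docs_chain_kwargs={"prompt": qa_prompt},
# )
```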
Info
Channel: Prompt Engineering
Views: 7,085
Keywords: prompt engineering, Prompt Engineer, natural language processing, GPT-4, chatgpt for pdf files, ChatGPT for PDF, langchain in python, embeddings stable diffusion, Text Embeddings, langchain demo, long chain tutorial, langchain, langchain javascript, gpt-3, openai, vectorstorage, chroma, train gpt on your data, train openai, train openai model, train openai with own data, langchain tutorial, how to train gpt-3, embeddings, langchain ai, localGPT, localGPT with memory, Llm with memory
Id: ct8XoZc9W7I
Length: 15min 40sec (940 seconds)
Published: Fri Jul 07 2023