Llama2 Chat with Multiple Documents Using LangChain

Video Statistics and Information

Captions
Hello guys, welcome back. Many of you asked me to create a video about chatting with documents without OpenAI models, so here you go: we are going to chat with documents using the Llama 2 model from Meta. I have already created a video about Chainlit before, covering chat with PDF and CSV documents, dockerizing the application, and deploying it to GCP; you can follow that one as well. In this video I am again going to use Chainlit.

So how does it work? This is the flowchart I have explained so many times before, but if you want to go in depth into what is happening here, I have explained it in detail in my Flowise UI video, so please refer to that.

For quantized Llama 2 we are going to use TheBloke's Llama 2 7B Chat GGML model. To put it simply, to run on commodity hardware we use a quantized model, and to load it we use the CTransformers library. If you don't know what GGML is, I will provide the link in the README of this repository, along with the CTransformers link. What is CTransformers? Python bindings for Transformer models implemented in C/C++ using the GGML library. So thank you to the whole open source community that has done this brilliant work to help us achieve everything we are doing right now: before, when we wanted to run these big models, it was difficult and we needed a GPU, or even several GPUs, but now we can run them on a CPU. That's all I wanted to show you at the beginning. I will now go to the GitHub repo, clone the repository locally, and deploy the model using Chainlit. Let's get started.
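As a rough picture of what that loading step looks like, here is a minimal sketch using LangChain's CTransformers wrapper. The file name and generation settings are illustrative assumptions, not necessarily the repo's exact values:

```python
# Minimal sketch: load a quantized Llama 2 GGML model on CPU through
# LangChain's CTransformers wrapper. The file name and config values
# below are illustrative assumptions, not the repo's exact settings.
from langchain.llms import CTransformers

llm = CTransformers(
    model="model/llama-2-7b-chat.ggmlv3.q8_0.bin",  # local quantized weights (assumed name)
    model_type="llama",                             # tells GGML which architecture to expect
    config={"max_new_tokens": 512, "temperature": 0.5},
)

print(llm("Explain self-attention in one sentence."))
```

Because the weights are quantized, this runs on an ordinary CPU; the trade-off, as we will see later, is latency and some loss of answer quality.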
Okay, now I am on the GitHub repository of llama2-chat-with-documents. All the instructions are provided here, and I will go through them as I clone the repository. Just to let you know, I have already built the same thing using OpenAI before (chat with any documents); you can refer to that if you want to use the OpenAI models. The YouTube video and all the instructions are there, and I will place the link in the description.

I have already cloned the repository, as you can see, and there are a few steps to follow, same as before; I'm not going to go through every detail because I have done so many times. What you need to do is git clone and go inside the repository. Then copy example.env to .env, get the Hugging Face Hub API token from the URL given there, and paste the token into the .env file. Next, create a virtual environment with the command provided, but make sure you have Python 3.9 or later, because that is what I have used and tested; earlier versions of Python may work, but I haven't tested them, so they may not compile. After that, install the necessary packages from requirements.txt. That's all.

Because this is a local model, the first step is the embedding part, so we run the ingest file before the main app. What is in the ingest.py file? We import the necessary things (I have already gone through these); it will create a DB folder when the file is run. Then there is the create_vector_database function. I'm going to use a PDF file, a README file, and text files; they are all listed here with a loader for each, and we load documents from all the loaders. Then we use a text splitter from LangChain to split them into chunks, and we use the Hugging Face embeddings model, the sentence-transformers all-MiniLM-L6-v2, with the device set to CPU. The rest is the vector database: Chroma.from_documents with the documents, the Hugging Face embeddings, and the DB folder as the persist directory, followed by vector_database.persist(). That is all we need in order to run this ingest file.
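Condensed, the ingest step described above looks roughly like this; the data folder layout, glob patterns, and chunk sizes are assumptions for illustration:

```python
# Sketch of the ingest step: load PDF, markdown, and text files, split
# them into chunks, embed them with all-MiniLM-L6-v2 on CPU, and persist
# the vectors to a local Chroma "db" folder. Paths and globs are assumed.
from langchain.document_loaders import DirectoryLoader, PyPDFLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

def create_vector_database():
    loaders = [
        DirectoryLoader("data/", glob="**/*.pdf", loader_cls=PyPDFLoader),
        DirectoryLoader("data/", glob="**/*.md", loader_cls=TextLoader),
        DirectoryLoader("data/", glob="**/*.txt", loader_cls=TextLoader),
    ]
    documents = []
    for loader in loaders:
        documents.extend(loader.load())

    # Split into overlapping chunks so each one fits the embedding model.
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = splitter.split_documents(documents)

    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        model_kwargs={"device": "cpu"},
    )
    vector_database = Chroma.from_documents(
        documents=chunks, embedding=embeddings, persist_directory="db"
    )
    vector_database.persist()

if __name__ == "__main__":
    create_vector_database()
```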
Now we can run `time python3 ingest.py`. Why time? It simply reports how long the embeddings take to compute and store in the vector database. Behind the scenes it is doing the embedding work, and a DB folder will be created once it is done, so let's wait a moment. Now it is completed: as you can see, the DB folder has been created, and inside it all the embeddings are stored. The ingest process is finished.

Once the ingest part is done, we can go to the main.py file to run the Chainlit app. If we open main.py, all the pieces are here, and by the way, everything is the same as the last video in my Chainlit series, which you can refer to; I have just split the different steps into separate functions.

For the code we need to save the model locally. First create a folder with `mkdir model`; a model folder appears here. Go inside the model folder, and download the model from Hugging Face. On the model page we visited before, go to Files and versions, and if you scroll all the way down, there is a download icon next to the file we are going to use. You can download it and place it in the model folder, but the easiest way is to right-click, copy the link address, go back to VS Code, type wget, and paste with Ctrl+V. Press Enter and it downloads the model into the model folder. It will take some time, so I will get back to you once it's done. Okay, as you can see, it took 4 minutes and 10 seconds to download, and the Llama 2 model is now inside the model folder.

Now you can go through the main.py file. We have the prompt template here, and by the way, the .env file is automatically picked up by Chainlit, so we don't need to provide it. First we have the custom prompt, and then we create the retrieval QA chain; you can study it in detail, but it is what we have been through many times. We take the top three sources via similarity search from the vector store, then we load the model, the one we just downloaded. Next we create the retrieval QA bot, and after that we retrieve the bot answer: here is the QA bot instance, we have create_retrieval_qa_bot, and we pass the query into it to get the answer out. Then we have the bot initialization, and after that the function which processes the incoming chat messages. That is how it works, and the steps match the diagram: everything up to storing the embeddings lives in the ingest file, and everything after that lives in this main.py file.
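A minimal sketch of what that chain assembly might look like, assuming the prompt wording, the model file name, and the "stuff" chain type (which the app's logs mention); treat it as an illustration, not the repo's exact code:

```python
# Hedged sketch of the main.py chain: a custom prompt, the persisted
# Chroma store as a top-3 retriever, and the local quantized model.
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import CTransformers
from langchain.prompts import PromptTemplate
from langchain.vectorstores import Chroma

template = """Use the following context to answer the question.
If you don't know the answer, say you don't know; don't make one up.

Context: {context}
Question: {question}
Answer:"""  # assumed wording; the repo defines its own template

def create_retrieval_qa_bot():
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        model_kwargs={"device": "cpu"},
    )
    db = Chroma(persist_directory="db", embedding_function=embeddings)
    llm = CTransformers(
        model="model/llama-2-7b-chat.ggmlv3.q8_0.bin",  # assumed file name
        model_type="llama",
    )
    prompt = PromptTemplate(template=template, input_variables=["context", "question"])
    return RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",  # the "stuff documents chain" seen in the logs
        retriever=db.as_retriever(search_kwargs={"k": 3}),  # top three sources
        return_source_documents=True,
        chain_type_kwargs={"prompt": prompt},
    )
```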
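And the Chainlit wiring around it might look roughly like this; the decorator names follow Chainlit's documented API, while the handler and helper names are assumptions about how the repo structures it:

```python
# Sketch of the Chainlit app: build the QA bot when a chat starts and
# answer each incoming message with it. create_retrieval_qa_bot is the
# (assumed) helper sketched above.
import chainlit as cl

@cl.on_chat_start
async def start():
    chain = create_retrieval_qa_bot()
    cl.user_session.set("chain", chain)
    await cl.Message(content="Welcome to Chat With Documents using Llama2.").send()

@cl.on_message
async def process_message(message):
    # Depending on the Chainlit version, `message` is a plain string or a
    # cl.Message object; 2023-era versions passed the raw string.
    query = message if isinstance(message, str) else message.content
    chain = cl.user_session.get("chain")
    # Run the blocking chain call without freezing the event loop.
    response = await cl.make_async(chain)({"query": query})
    await cl.Message(content=response["result"]).send()
```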
Now we can just run the main.py file. For that, go to README.md, copy the command given there, paste it, and press Enter; it runs the app locally on our system. As you can see, it says "Welcome to Chat With Documents using Llama2," so I can go here and ask questions related to the attention paper: who are the authors of the "Attention Is All You Need" paper? When I ask this, it runs the retrieval QA, using the stuff documents chain with an LLM chain inside. Let's see whether it provides the answer or not, because we have three different documents: a PDF, a README file, and a text file.

One thing I noticed is that it takes some time compared to the OpenAI models, which is understandable because we are running this locally and the model is 7.2 GB. The answers I got for this retrieval question answering were also not that good. It says here "could not reach the server"; this is something I have experienced a lot with Chainlit. It refreshes and then provides the answer, but let's see whether it does; it's still loading. So as you can see, it takes some time; by this point an OpenAI model would already have answered, but there we are paying for resources and here we are not, since this is open source. There is some latency, and if you are building and deploying an app, at this point in time I don't think this is a good choice. As you can see, it just produced something random: "the author of the course goes" something something. It is not providing the right answer; it is trying, pointing at document one (instruction, self-attention, sentence embedding) and going to the "Attention Is All You Need" paper, but it does not get the answer right.

Now let's ask a question from the README file. I'll say: how to run the app using Docker. It shows the answer immediately because I asked the same question before; in Chainlit, if you ask the exact same question, the answer is already in the cache and is returned directly, which is a nice feature, but a different question has to go through the whole process again. So let me modify the question: how can I run the app using Docker? If I run this, it goes through the retrieval QA again. Let's see whether it answers. As you can see, for the PDF the answer was not that good; I'm just showing you what I get. It cites the source correctly, data/attention.pdf, so it reaches the right file, but it does not get the answer right.

Okay, "how can I run the app locally": it refreshed again, there was some issue connecting to the server, so let's see whether it provides the correct answer or not. The main idea of this video is simply to show how you can load these models and use them in your own use cases. Okay, it says: you can run the app using Docker by creating a Dockerfile in the root of your project directory and running the command given, which will build and run the app. Let me see where it took that from: the source content covers docker-compose and deploying the app on Google Cloud using Cloud Run. So it is providing the answer now, and the sources show exactly where it came from, because everything is stored in separate chunks. That part is really good.

The next question is about the text file: what is the president saying about NATO? It answers directly because it was in the cache, and it is the right answer, by the way. I don't even need to ask a different question, because this one was already being answered correctly, but if you want to test it, you can ask other questions and it will answer.

Okay, that was all I wanted to show you in this video. It now says "could not reach the server" because I stopped the app from the terminal, but while it was running, everything we asked, along with the logs, appeared in the terminal. I have canceled it, and the app is no longer running. So this is everything I wanted to show: I hope you learned something new, and now you can use Llama 2 instead of the OpenAI models. But remember that the answers are not always correct and there is latency. You can use better resources, maybe a GPU, but I wanted to show that it is possible to run on a CPU with at least 16 GB of RAM; I have 24 GB, and it still does not always provide the right answer, so it all depends on what hardware you are using. Thanks again to all the people building the libraries that let us run these big models easily on CPU machines. That's all for this video; thank you for watching, and see you in the next one.
Info
Channel: Data Science Basics
Views: 11,509
Keywords: code, chat ai, llm, chat, langchain, langchain tutorial, langchain openai, langchain explained, framework, langchain hugging face, langchain chat gpt, llms, chat models, prompt, chain, agents, csv, chat with any tabular data, create chart with llm, documents, markdown, chat with your data, own chatgpt, llama2, llama, chat with documents using llama2, llama2 with langchain, chromadb, chainlit, llama 2 locally, llama 2 how to use, llama 2 python, llama 2 huggingface, llama 2 local install
Id: VPk-at5oqAY
Length: 15min 22sec (922 seconds)
Published: Sun Aug 06 2023