Chat with Multiple Documents with Llama 2 and ChromaDB (Free LLMs and Embeddings)

Video Statistics and Information

Captions
In this video tutorial we will learn how to create a chatbot to chat with our documents, using Llama 2 or OpenAI as our large language model and Chroma as our vector database, and we will do the whole implementation with the LangChain framework. Here is the complete architecture; let me zoom in so you can see it. First, the user uploads the documents, which can be text files, Word (.docx) files, or PDF files, and we extract the data from them. As you know, whether we use a Llama model or an OpenAI model, there is an input token limit, so we cannot pass the extracted data directly to the large language model. Instead we split the data into multiple chunks, create embeddings for each text chunk, and store the embeddings in a knowledge base; after splitting the data and creating the embeddings, we have built a semantic index. Embeddings are basically vectors of floating-point numbers, like 0.1, 0.3, 0.3, and we create them to compress the text chunks into a compact numeric form. When the user asks a question, we create embeddings for the question as well and then do a similarity search: within the embeddings we have stored in the knowledge base, we try to find the answer to the user's question.
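To make the retrieval idea concrete, here is a toy, self-contained sketch of the chunk-ranking step. Everything in it is illustrative rather than the video's code: the `embed` function is a crude letter-frequency stand-in for a real embedding model (in the video this role is played by OpenAI or sentence-transformers embeddings, with Chroma doing the search), but the cosine-similarity ranking works the same way:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a 26-dim letter-frequency vector.
    A real system would call an embedding model here."""
    counts = Counter(c for c in text.lower() if c.isalpha())
    total = sum(counts.values()) or 1
    return [counts.get(chr(ord('a') + i), 0) / total for i in range(26)]

def cosine(a, b):
    """Cosine similarity between two vectors (0 when either is all-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(question, chunks, k=3):
    """Rank stored chunks by similarity to the question's embedding."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = ["YOLOv7 is an object detection model",
          "The resume lists teaching experience",
          "Chroma stores embeddings locally"]
print(top_k("what does YOLOv7 detect?", chunks, k=1))
```

A real embedding model maps text into hundreds of dimensions that capture meaning rather than spelling; the point here is only the shape of the pipeline: embed the chunks, embed the question, rank by similarity, keep the top k.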
We then rank the results, for example taking the top three answers most similar to the user's question, pass those results to the large language model along with the question itself, and the large language model generates a natural response. That is how the complete architecture works. We will do this with Llama 2 as well as with OpenAI, so let's first see how to do it with Llama 2. First of all I will install all the packages. Before running the script, please make sure you have selected a GPU runtime. This installation might take a few minutes, so let's wait until it completes. Now that this package is installed, I will install the chromadb package, because Chroma is our vector store, or you could say our vector database, where we will be storing our embeddings. With Chroma we store our embeddings locally, while with Pinecone we store our embeddings in the cloud; in a previous tutorial we discussed Pinecone as well, so you can check that out to see how to store your embeddings in the Pinecone vector store. This might take a few more seconds to complete. Next let's install the openai and tiktoken packages, and then we are good to go. Now I will import all the required libraries. We need PyPDFLoader so we can load a PDF file and extract the data from it, TextLoader so we can load a .txt file and extract the data from it, and Docx2txtLoader so we can load .docx (Word) files and extract the data from them.
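Collecting the installation commands from this section in one place. The video does not pin versions, so these package names and the unversioned installs are my best reconstruction, not a verbatim copy of the notebook:

```shell
# Core framework plus the model stack for the Llama 2 path
# (bitsandbytes is what enables the 4-bit quantized load later).
pip install langchain transformers accelerate bitsandbytes sentence-transformers
# Chroma: the local vector store used as the knowledge base.
pip install chromadb
# Only needed for the OpenAI variant of the tutorial.
pip install openai tiktoken
# Backends for the PDF and .docx document loaders.
pip install pypdf docx2txt
```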
Then we need RecursiveCharacterTextSplitter so we can split the text into chunks, and we need HuggingFaceEmbeddings because we will be converting our text chunks into embeddings, creating an embedding for each text chunk as I showed you earlier; for this we are using an open-source sentence-transformers embedding model. We need Chroma, because Chroma is our vector database, or vector store, where our embeddings will be kept. To download the Llama 2 model from Hugging Face we need notebook_login, which lets us log in to our Hugging Face account by entering our access token, so that we can access and download the models hosted on the Hugging Face Hub into our notebook. Then we need torch and transformers so we can import AutoTokenizer, which converts our text into a numerical array, and pipeline, because we are running the Llama 2 model locally rather than accessing it through an API key. We need HuggingFacePipeline so we can wrap that pipeline for LangChain, ConversationalRetrievalChain so we can do question answering, and ConversationBufferMemory so we can save the chat history; that way, if I ask a question that is related to a previous question, our model has the conversation context, just as we see in the ChatGPT application. We also need OpenAIEmbeddings, because I will also show you how to create embeddings for each text chunk using OpenAI embeddings, and ChatOpenAI so that we can chat with our large language model.
I will be doing this with the Llama 2 model as well as with the OpenAI model, which is not free; it is a closed-source model with some cost associated with it. Then we import os so we can work with paths such as the data folder, and sys so I can exit the process cleanly when the user wants to quit. So let's import all these required libraries. Now that all the required libraries are imported, I will create a directory named docs, where I will upload my PDF files; you can also upload Word files or text files. You can see the docs directory has been created, and now I will upload two PDF files into it. Let me tell you about these PDF files: one is an AI-generated resume, and the other is the official YOLOv7 research paper (YOLOv7 is an object detection model). So these are two different kinds of documents; you can upload any kind of PDF file, Word document, or .txt file, since those three extensions are supported here. I will run this now to extract the data from the two PDF files; the extracted data is stored in the variable `documents`, which you can see here. Next we split the text into chunks, with each chunk holding 500 characters. So, as I told you, we first extract the data from the documents and then split it into chunks.
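The splitting step can be illustrated without LangChain at all. This little stand-in mimics what a character text splitter does: cut the extracted text into windows of at most 500 characters, with a small overlap so a sentence that straddles a boundary survives in both neighbouring chunks. The 50-character overlap is my assumption; the video only specifies the 500-character chunk size:

```python
def split_into_chunks(text, chunk_size=500, chunk_overlap=50):
    """Mimic a character text splitter: fixed-size windows with overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    step = chunk_size - chunk_overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

document = "x" * 1200  # stand-in for text extracted from the PDFs
chunks = split_into_chunks(document)
print(len(chunks), [len(c) for c in chunks])
```

LangChain's RecursiveCharacterTextSplitter is smarter than this: it prefers to break on paragraph and sentence boundaries before falling back to raw character counts, but the chunk-size/overlap idea is the same.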
Now you can see that we have extracted the data from the documents and split it into different chunks, so next I will create embeddings for each of the text chunks. We can use Hugging Face embeddings, such as sentence-transformers embeddings, or we can use OpenAI embeddings; the OpenAI embeddings are not free and have some cost associated with them. First let's see how to do this with OpenAI embeddings and OpenAI models, and then we will see how to do it with Hugging Face embeddings and the Llama 2 model. Here you can see that I pass my OpenAI API key and download the OpenAI embeddings; by default this uses the text-embedding-ada-002 model (you can change the embedding model, but this is the default, and we are using it). Now I am initializing the Chroma vector database. What we are doing here is taking the text chunks we extracted from the PDFs, creating embeddings for each of them, and saving those embeddings into the vector database, or vector store, or knowledge base: using Chroma, I save those embeddings into the Chroma knowledge base. Chroma is basically our vector database, or vector store, and we are storing our embeddings in the Chroma knowledge base.
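Assembled from the narration, the embedding step looks roughly like this in LangChain. Treat it as a sketch rather than the notebook verbatim: the import paths match the LangChain releases current when the video was published (mid-2023) and have since moved, `text_chunks` is the output of the splitting step, and you must supply your own OpenAI API key:

```python
import os
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder: use your own key

# Defaults to the text-embedding-ada-002 model mentioned in the video.
embeddings = OpenAIEmbeddings()

# Embed every text chunk and store the vectors in Chroma,
# the local knowledge base we query later.
knowledge_base = Chroma.from_documents(text_chunks, embeddings)
```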
Here you can see that I pass in the downloaded embeddings along with the document text chunks (the variable that holds our document chunks), so I am converting each text chunk into an embedding and then building a knowledge base; the knowledge base here is the Chroma vector store, or vector database, which is what we are creating. Let's run this. You can skip the notebook login cell for now; we only need it when we download the model from Hugging Face. Next I initialize my OpenAI model: we are using GPT-3.5 Turbo (GPT-4 is available as well, and you can use it), and here I set the temperature to 0.7. The temperature defines how creative your model will be; it varies from 0 to 1, where a temperature of 0 means the model is not creative and takes no risks, while 0.8 or 0.9 means the model is creative and will take risks. Here I also define a ConversationBufferMemory so the chain can track inputs and outputs and save the conversation in memory. Then I create a ConversationalRetrievalChain; a conversational retrieval chain is basically built on a RetrievalQA chain and adds a chat-history component, so you can define a memory or manage the chat history yourself. Now I ask a first question about YOLOv7; let's see what answer we get.
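Piecing the narration together, the chain setup would look something like the following. Again this is a hedged sketch against mid-2023 LangChain: `knowledge_base` is the Chroma store built above, and retrieving the top three chunks (`k=3`) mirrors the ranking described in the architecture overview rather than a value shown in the video:

```python
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

# Temperature 0.7: moderately creative, as explained in the video.
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7)

# Saves inputs and outputs so follow-up questions see the chat history.
memory = ConversationBufferMemory(memory_key="chat_history",
                                  return_messages=True)

chain = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=knowledge_base.as_retriever(search_kwargs={"k": 3}),
    memory=memory,
)

result = chain({"question": "YOLOv7 is trained on which dataset?"})
print(result["answer"])
```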
The response: YOLOv7 is a real-time object detection algorithm used in computer vision applications, so I have just asked this question against the YOLOv7 paper. Now let me ask about the resume instead, about the candidate's education; the response is that the candidate's education includes a PhD in English from the University of Illinois and an MA in English. You can see we have the chat history available as well. (You can see a warning here too; you can simply import the warnings module at the start and ignore it.) Asking about experience, the answer is that the candidate has experience in teaching and advising, specifically as a composition instructor. Next I ask which models YOLOv7 outperforms, and the response is that YOLOv7 outperforms YOLOR, YOLOX, and YOLOv5. If I want to exit the chat I can just type q; when I type q it exits, and that's good. So we are done with this part: we have seen how to create a chatbot to chat with multiple documents using the OpenAI model and OpenAI embeddings, with LangChain as the framework. Now let's do all of this with Llama 2, and this time we will not use the OpenAI embeddings or the OpenAI model; we will use open-source embeddings and an open-source model, namely Hugging Face sentence-transformers embeddings and the Llama 2 model, which are both free and have no cost associated with them. So let's do it from here: I download the Hugging Face embeddings, and you can see the embeddings being downloaded. Let's see whether we can create the sentence-transformers embeddings; I pass in the document chunks to see how it goes. I think I just need to restart the runtime, because our previous embeddings are creating an issue for us.
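The question-loop-with-escape pattern used above is worth isolating. This is my own minimal version of it, with the retrieval chain replaced by a plain function so the control flow runs anywhere; in the notebook, `ask` would wrap the ConversationalRetrievalChain call and the `break` would be a `sys.exit()`:

```python
def chat_loop(questions, ask):
    """Answer questions in order, keeping (question, answer) history;
    stop as soon as the user types 'q'."""
    history = []
    for question in questions:
        if question.strip().lower() == "q":
            print("Exiting")
            break
        answer = ask(question)
        history.append((question, answer))
        print(answer)
    return history

# Stubbed model so the loop is runnable without any LLM.
history = chat_loop(["What is YOLOv7?", "q", "never reached"],
                    lambda q: f"answer to: {q}")
print(len(history))  # 1: the loop stopped at 'q'
```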
So let's restart the runtime and start over; let me just sort this out. I have restarted the runtime and installed all the required packages again, so now we will see how to do this with the open-source embeddings from sentence-transformers and the open-source Llama 2 model. Previously we did this with the OpenAI model and OpenAI embeddings, which are not free and have a cost associated with them; now we are trying it with open-source Hugging Face embeddings (we are using sentence-transformers embeddings in this tutorial) plus the open-source Llama 2 model. Reinstalling the packages might take a few more seconds. Now the required packages are installed, so I import all the required libraries, following the same procedure as before. Next I go down and download the model from Hugging Face. I do a notebook login here: I go to my Hugging Face account, open Settings, and under Access Tokens I copy my token and log in with it, so that I can download models from Hugging Face. You can see 'Login successful'. Now I am downloading the Llama 2 chat model with 7 billion parameters from Hugging Face, and I am loading it with 4-bit integer quantization so that I have an optimized model; by default it is 16-bit, but I am using the free version of Google Colab, which offers a 15 GB GPU (the paid Colab subscription offers a 40 GB GPU, so you can use that if you want). Since I am on the free tier, I have to optimize my resources so that I can work comfortably and don't face runtime crashes.
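Here is the login and quantized download step, reconstructed from the narration. The checkpoint name is my assumption (the video only says "the Llama 2 chat model with 7 billion parameters"), access to it requires an approved Hugging Face account, and the `load_in_4bit` flag needs the bitsandbytes package and the transformers releases of that period:

```python
from huggingface_hub import notebook_login
from transformers import AutoModelForCausalLM, AutoTokenizer

# Paste the access token from Settings -> Access Tokens on huggingface.co.
notebook_login()

model_id = "meta-llama/Llama-2-7b-chat-hf"  # assumed checkpoint name

# Converts text into the numerical arrays the model consumes.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit integer quantization keeps the 7B model inside the ~15 GB GPU
# of the free Colab tier; the default 16-bit weights would not fit comfortably.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,
    device_map="auto",
)
```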
That is why I am using 4-bit integer quantization: so I can run the Llama 2 model within the resources available to me. Now you can see the model being downloaded. Here we have AutoTokenizer so we can convert our text into a numerical array, and here we specify the Llama 2 model and download it. There are basically two ways to use a model from Hugging Face: access it through an API key, or download the model locally. I tried to access it from Hugging Face through the API key, but it was not working, so I am downloading the Llama 2 model from Hugging Face locally; this might take some time because it is quite large. After downloading the Llama 2 model, I create a pipeline: a Hugging Face pipeline, since we don't want the OpenAI models this time. Now we have the Llama 2 model available, which is quite good; we have created a Hugging Face pipeline and have everything ready. Next I will create the memory before we load the conversation chain. Let's go back up and run the rest of the steps. Here I create a directory, this time named documents, which you can see, and I upload the same two PDF files. Now that the files are uploaded, I extract the data from these PDFs and split it into text chunks. Then I download the embeddings from Hugging Face: I am using sentence-transformers embeddings as the open-source embeddings, and they have no cost associated with them. So now we have the sentence-transformers embeddings ready, and that's cool.
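With the model and tokenizer in hand, the pipeline and the open-source embeddings would be created roughly like this. The sentence-transformers checkpoint name and the `max_new_tokens` value are assumptions (the video later mentions adjusting the maximum new tokens but does not show a number):

```python
from transformers import pipeline
from langchain.llms import HuggingFacePipeline
from langchain.embeddings import HuggingFaceEmbeddings

# Text-generation pipeline around the quantized Llama 2 model.
generate = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,  # assumed; bounds the length of each answer
)

# Wrap the pipeline so LangChain can use it as an LLM.
llm = HuggingFacePipeline(pipeline=generate)

# Free sentence-transformers embeddings in place of OpenAI's.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"  # assumed checkpoint
)
```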
Let's go down from here and run this. We can skip this step, since we have already downloaded our open-source embeddings and split the text; now we just want to convert the document chunks, the text chunks we created, into embeddings and then create a vector database, our knowledge base, using the Chroma vector database. So we have converted our document chunks into embeddings and are now creating the vector database, or knowledge base, where we store the embeddings created for the text chunks. Let's go down, ask a question, and see what response we get: I ask, 'YOLOv7 is trained on which dataset?'. You can also see that we have saved our embeddings in the Chroma vector database; you can see the embeddings files here. This might take a few more seconds before it is ready. Here is the response: YOLOv7 is trained on the MS COCO dataset, which it says contains 80,000 images and 30,000 annotations. That last part is not quite right, so we can adjust the maximum new tokens setting here so that we don't get these kinds of answers. Along with this, you can now interact with your document chatbot just as we did previously and generate the responses you need. That's all for this video tutorial; I hope you have learned something from it. See you all in the next video, and until then, bye bye.
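For completeness, the closing steps, building the local knowledge base from the open-source embeddings and querying it through the Llama 2 chain, would look roughly like this. The `persist_directory` name is my choice (the video only shows that embedding files appear on disk), and the rest reuses the pieces built above:

```python
from langchain.vectorstores import Chroma
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

# Store the sentence-transformer embeddings on disk this time.
knowledge_base = Chroma.from_documents(
    text_chunks, embeddings, persist_directory="chroma_db"
)

memory = ConversationBufferMemory(memory_key="chat_history",
                                  return_messages=True)
chain = ConversationalRetrievalChain.from_llm(
    llm, retriever=knowledge_base.as_retriever(), memory=memory
)

result = chain({"question": "YOLOv7 is trained on which dataset?"})
print(result["answer"])  # in the video, the answer names the MS COCO dataset
```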
Info
Channel: Muhammad Moin
Views: 7,835
Keywords: Llama 2, Llama, #Large Language Models, LangChain, Generative AI, ChromaDB, Multiple Documents, Multiple PDF Files, ChatGPT, chatbot, LLMs, llama arts, deep learning, open ai, Open AI
Id: nac0nVBO24w
Length: 24min 12sec (1452 seconds)
Published: Fri Jul 28 2023