Chat with Multiple PDFs using Llama 2, Pinecone and LangChain (Free LLMs and Embeddings)

Captions
In this video tutorial we will learn how we can chat with multiple PDF files using LangChain, Llama 2 and Pinecone. Here is the complete architecture. First, the user uploads the PDF files; there can be two, three, four or more of them. In step two we extract the text from those PDF files. Now, the Llama 2 model has an input token limit: the limit is 4,096 tokens, and one token is roughly equal to four English characters, so we cannot pass more than about 16,000 English characters to the input of the Llama 2 model. When we extract the text from multiple PDFs, there can easily be more than 16,000 English characters, so we cannot pass the extracted data directly to the Llama 2 model. Instead, we split the data into small text chunks, and we can define that each text chunk holds at most 500 (or 1,000) English characters.

Then we create an embedding for each of the text chunks. Embeddings are vectors that represent a text chunk in compressed numeric form, something like [0.1, 0.3, 0.3, ...]. We create one embedding per text chunk: if we have 10 text chunks we will have 10 embeddings, 20 chunks give 20 embeddings, 100 chunks give 100 embeddings.

After creating the embeddings, we build a knowledge base from them, that is, a vector database. In this tutorial we are using Pinecone as our vector database, so we will create a knowledge base where we store all the embeddings. Pinecone keeps the embeddings in the cloud, and one advantage is that they can be accessed from anywhere, whereas other vector stores like FAISS or Chroma store the embeddings locally on your system.

In the next step, when the user asks a question, we create an embedding for that question and then do a semantic search: we find the top three or top five passages related to the user's question from the knowledge base, where all our text is stored in the form of embeddings.
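To make the character budget above concrete, here is a minimal sketch of the arithmetic, assuming the rough four-characters-per-token rule of thumb used in the video (the 80,000-character figure is an invented example):

```python
# Rough arithmetic behind the chunking step.
TOKEN_LIMIT = 4096                  # Llama 2 input context window
CHARS_PER_TOKEN = 4                 # rule of thumb for English text
char_budget = TOKEN_LIMIT * CHARS_PER_TOKEN   # ~16,384 characters fit in one prompt

extracted_chars = 80_000            # hypothetical: text extracted from several PDFs
chunk_size = 500                    # max characters per chunk
num_chunks = -(-extracted_chars // chunk_size)  # ceiling division -> 160 chunks

print(f"prompt budget: {char_budget} chars, chunks to embed: {num_chunks}")
```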
After finding the results, we rank them and keep only the top three or five; we can define how many results to keep by setting the parameter k. We then send those top responses to the Llama 2 model, along with the user's question, and Llama 2 generates a natural-language response for the end user. That, in short, is how the whole thing works; you can say this is the complete architecture, the end-to-end view. So let's move on to the implementation and see how we can do this.

Here is the notebook file, which I have already prepared. Before running the script, please make sure you have selected the T4 GPU runtime. In the first step we install all the required packages, so I'll just run this cell. The installation might take a few seconds... and now the packages are installed. We are using the langchain package; LangChain is a framework that lets us build applications around large language models. Then we have pinecone-client, which we will use to store our embeddings in the cloud with Pinecone. We require the sentence-transformers package so we can download free embeddings from Hugging Face, because we want to create an embedding for each text chunk. We require pdf2image only if we want to display a page of a PDF as an image in the Google Colab notebook. We require pypdf so we can load the PDF files and extract their text. And we require the transformers package, together with bitsandbytes and accelerate, to load the Llama 2 model.

Then we import all the required libraries: PyPDFDirectoryLoader so we can load the PDF files, RecursiveCharacterTextSplitter so we can split our text into chunks, HuggingFaceEmbeddings so we can download embeddings from Hugging Face, and the Pinecone vector store so we can store our embeddings in the cloud. We also import pinecone itself and convert_from_path (which we are not actually using here), and then AutoTokenizer and AutoModelForCausalLM so we can load the Llama 2 model. We can access a model from Hugging Face in two ways: through the hosted API with an API key, or by downloading the model locally and creating a pipeline. To use Llama 2 here, we download the model locally and create a pipeline; we cannot access Llama 2 through the hosted API because that requires a Pro or Enterprise account on Hugging Face, which I don't have. So I will download the Llama 2 model from Hugging Face and then create a pipeline.
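The installs and imports described above look roughly like this; a sketch assuming the pre-0.1 langchain import paths that were current when this video was published:

```python
# Install the required packages (run once in Colab):
# !pip install langchain pinecone-client sentence_transformers pypdf pdf2image \
#              transformers accelerate bitsandbytes

import os
import sys

import pinecone
from pdf2image import convert_from_path            # only needed to render PDF pages
from transformers import AutoTokenizer, AutoModelForCausalLM

from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Pinecone
from langchain.llms import HuggingFacePipeline
```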
We also import os and sys; I will use sys at the very end so we can exit from the chat loop. From LangChain I am importing HuggingFacePipeline; again, we are just creating a Hugging Face pipeline rather than calling Llama 2 through the API, because that service is only available to Pro or Enterprise clients of Hugging Face. We will also create a prompt template, in which we pass the default system prompt and our instruction, and we will use it in a chain so that we can have a conversation with our PDFs.

Now, using mkdir, I create a directory named pdfs. I have already placed my PDF files in my Drive, and I download them from Drive into this pdfs folder. These are the two PDF files; let me show you. One is "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors"; YOLOv7 is an object detection model, and this is its official paper. The other is a sample resume, which I will use for the chat first. The YOLOv7 paper is 15 pages long, so it is quite a long paper, while the resume is two or three pages. You could just as well use three, four, or tens of PDF files.

Now you can see that the pdfs folder contains both files, so I extract the text from them using PyPDFDirectoryLoader. Let me show you the extracted data: we have pulled out all the text from both PDFs, and it is quite a large amount. After this step, the data variable holds everything we extracted from the PDFs.

In the next step we split the data into small chunks, where each chunk has a maximum of 500 English characters and there is an overlap of 20 characters between adjacent chunks. As I told you at the start, we cannot pass the extracted data directly to the Llama 2 model because it has an input limit of 4,096 tokens, or about 16,000 English characters, and with multiple PDFs the extracted text can easily exceed that. If I print docs[0] and docs[1], you can see the individual text chunks, and each chunk contains at most 500 English characters with a 20-character overlap.
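A minimal sketch of the loading and splitting steps just described (the directory name follows the video; the rest is the standard LangChain usage of the time):

```python
# Load every PDF in the pdfs/ directory and extract its text.
loader = PyPDFDirectoryLoader("pdfs/")
data = loader.load()

# Split into chunks of at most 500 characters, overlapping by 20.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
docs = text_splitter.split_documents(data)

print(len(docs))              # e.g. 168 chunks for these two PDFs
print(docs[0].page_content)   # inspect the first chunk
```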
Now I will download the embeddings from Hugging Face. You can see I am downloading them using a sentence-transformers model, and you can find more details on the model card. After downloading the embeddings, let's check their length: you can call embed_query on a sample string and take the length of the result, which turns out to be 384.

Next, let's go over to Pinecone. I don't have a paid account, only a free one, and on the free account you cannot create more than one index, so I will delete my previous index and create a new one; that is a limitation of the Pinecone free tier. I create an index named langchain-pinecone, and I have to set its dimension to the dimension of our embeddings, which is 384, as I showed you in the notebook. If I go to Indexes you can see it is initializing. You can copy the environment from here and paste it into the notebook, and paste your Pinecone API key as well; you can easily create a Pinecone account with Gmail or any other email, go to API Keys, and copy your key. Initialization might take a few seconds. The index name shown here is the same name you add in the notebook. Now you can see the green signal, which means the index is ready.

Back in the notebook, I initialize Pinecone with the API key and environment, and pass my index name. Then I create an embedding for each of the text chunks. One thing to remember: we split our text into 168 chunks, so we will have 168 vectors, one embedding per text chunk, 168 embeddings in total. If I go back to Pinecone, you can see the index now holds 168 vectors, because we have 168 text chunks and we create one embedding, that is, one vector, for each chunk.
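The embedding and upload steps might look like the following sketch. The model name is an assumption: all-MiniLM-L6-v2 is a common free sentence-transformers model whose 384-dimensional output matches the dimension quoted in the video; the API key and environment placeholders are yours to fill in:

```python
# Download free sentence-transformers embeddings from Hugging Face.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"  # assumed model; 384-dim output
)
print(len(embeddings.embed_query("hello world")))        # -> 384

# Initialize the Pinecone client and push one vector per text chunk.
pinecone.init(
    api_key="YOUR_PINECONE_API_KEY",    # from the Pinecone console
    environment="YOUR_ENVIRONMENT",     # shown next to your index in the console
)
index_name = "langchain-pinecone"       # must match the index created in the console

docsearch = Pinecone.from_texts(
    [d.page_content for d in docs], embeddings, index_name=index_name
)
```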
If you have already created embeddings for these PDFs, as I have, you don't want to create the embeddings again and re-save them to the cloud every time you rerun the script. I already have an index in Pinecone where the embeddings are saved, so if I want to use those embeddings again, I don't need to rerun the embedding step; I can simply load the existing index into my Google Colab notebook by running this cell.

So, what we have done so far is create embeddings and save them into the Pinecone vector database. Now, when the user asks a question, we create an embedding for that question and do a semantic search. Here the user asks a question, we create the embedding, and we run the similarity search; these are the answers we got: first, second, third, fourth. By setting the value of k you can control how many of the best responses you get back from the embeddings: k=2 would give the two best matches, while this call returns four.

Next I do a notebook login for Hugging Face. Let me go to Hugging Face; you can easily create an account, then go to Settings, then Access Tokens, copy your access token, paste it here, and click Login. Now we download the Llama 2 model. Since we are not accessing Llama 2 through an API key, we create a pipeline for it; Llama 2 is used here as a text-generation model. So we download the model, and then we create the pipeline.

I have set the temperature to 0.1. The temperature varies from 0 to 1. If the temperature is high, the model is more creative: it might generate a wrong response, but it will be very creative. If the temperature is low, the model is more deterministic and gives a to-the-point response. If you are generating something like a poem you can set the temperature high, but if you are generating a factual text or article you keep it low. Since we want the responses to come from our PDF files, we don't want the model to be very creative; we want it to be deterministic, so we set the temperature to 0.1.
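Here is a sketch of those two steps: reusing the existing index plus similarity search, then loading the chat model into a pipeline. The meta-llama/Llama-2-7b-chat-hf checkpoint and the generation settings are assumptions, since the video doesn't spell them out:

```python
import torch
from huggingface_hub import notebook_login
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain.vectorstores import Pinecone
from langchain.llms import HuggingFacePipeline

# Reuse the index that already holds the 168 vectors instead of re-embedding.
docsearch = Pinecone.from_existing_index(index_name, embeddings)

# Semantic search: k controls how many best-matching chunks come back.
results = docsearch.similarity_search("YOLOv7 is used for?", k=4)
for doc in results:
    print(doc.page_content, "\n---")

# Log in so the gated Llama 2 weights can be downloaded.
notebook_login()

model_id = "meta-llama/Llama-2-7b-chat-hf"   # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.float16
)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.1,   # low temperature -> deterministic, PDF-grounded answers
)
llm = HuggingFacePipeline(pipeline=pipe)
```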
Now we create a prompt template. There is a default system prompt that we define, along with the instruction and system tags. If you go to the Llama 2 GitHub repository, you can see that we are using the Llama 2 chat model here. Llama 2 comes with both pretrained and fine-tuned models. The pretrained models cannot be used for chat or question answering; as the repository says, these models are not fine-tuned for chat or Q&A, and they should be prompted so that the expected answer is the natural continuation of the prompt. Since we are going to use Llama 2 to chat and do question answering over our PDFs, we cannot use the pretrained model: the pretrained model can be used for sentence completion, but not for chat or question answering. So here we use the fine-tuned Llama 2 chat model.

If you read further in the repository, it says that to get the expected features and performance from the fine-tuned chat models, we need to follow a specific formatting, including the instruction and system tags. So that is exactly what we do here: this is the default system prompt (Llama 2 was trained with this default system prompt; I have modified it slightly, so you could call it a custom system prompt), we have an instruction tag, and the system prompt sits inside the system tags, as you can see. And this is the instruction I am passing.

Using these we create the template: "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know; don't try to make up an answer." That is the system prompt, and my instruction has two inputs, the context and the question the user will provide. So I create a prompt template, passing context and question as the input variables, and then I use a RetrievalQA chain from LangChain so that I can do question answering, i.e. chat, with my PDFs.

Let me ask a question: "YOLOv7 is used for?", and let's see what response we get; the answer to this question lies in the paper. You can see: "Based on the provided context, YOLOv7 is used for object detection", which is exactly right, since YOLOv7 is an object detection model. We can also ask another question here; we simply change the text and ask "YOLOv7 outperforms which models?" and see what response we get. You can also see the input prompt with the exit check here: if the user types exit, the while loop ends and the script stops.
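A sketch of the prompt template, the RetrievalQA chain and the chat loop described above, assuming the standard Llama 2 chat tags; the exact wording of the instruction body is an assumption:

```python
import sys
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

# Llama 2 chat formatting: instruction and system tags.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

system_prompt = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer."
)
instruction = "Context: {context}\nQuestion: {question}"   # assumed wording

template = B_INST + B_SYS + system_prompt + E_SYS + instruction + E_INST
prompt = PromptTemplate(template=template, input_variables=["context", "question"])

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",   # stuff the retrieved chunks directly into the prompt
    retriever=docsearch.as_retriever(search_kwargs={"k": 3}),
    chain_type_kwargs={"prompt": prompt},
)

# Simple chat loop: type 'exit' to stop.
while True:
    user_input = input("Input prompt: ")
    if user_input.strip().lower() == "exit":
        print("Exiting")
        sys.exit()
    print("Answer:", qa_chain.run(user_input))
```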
It might take a few more seconds before we get the response... and here it is: "Based on the provided context, YOLOv7 outperforms the following models", and you can see which models YOLOv7 outperforms. I can also ask a question from the other PDF, the resume: "What is the qualification?" Let's see what response we get: "Based on the provided context, the qualification is a PhD in English from the University of Illinois at Urbana-Champaign in 2023." If you look at the resume, you can see that this is the right answer: the candidate's qualification is a PhD in English, so we have got a correct response. If we want to stop, we just type exit, and the loop ends.

That's all from this tutorial. I hope you have learned something from it. Thank you for watching, have a great day ahead, bye bye.
Info
Channel: Muhammad Moin
Views: 3,765
Keywords: llama2, llama, llama llama, chatbot, chatgpt, multiple pdf files, llm, embeddings, pinecone, pinecone-client
Id: TcJ_tVSGS4g
Length: 26min 34sec (1594 seconds)
Published: Thu Sep 07 2023