4 ways to do question answering in LangChain | chat with long PDF docs | BEST method

Video Statistics and Information

Captions
Do you know that there are at least four ways to do question answering in LangChain? In this video we're going to show you four different ways to do question answering in LangChain, and in the next video I'm going to show you how to build a question answering PDF chatbot so you can ask questions about your PDF, just like this app. Okay, let's get started with the coding.

You will first need to install the required packages, and then you need to type in your OpenAI API key. I already ran the cell and removed my key, but you will need to pass in your own key here. Because we're building a question answering engine, you will need your external document first. I am interested in interacting with a PDF, so here I'm using PyPDFLoader. There are a lot of different document loaders: if you are working with a CSV file you can use CSVLoader, you can use the DuckDB loader if you are using a DuckDB file, there's even an S3 loader, and a YouTube loader so you can interact with your YouTube videos (I think that one actually gives you the transcript of the video, which is super convenient). In this example I'm using a PDF file, because I was reading the 2023 AI Index Report, which is a really good read. I saved the first chapter as example.pdf in the materials folder. If you check the documents you can see the content of the document; it is actually separated into different pages.

Okay, so the first method to do question answering in LangChain is called load_qa_chain. This chain provides the most generic interface for answering questions: it lets you do question answering over a set of documents, but it uses all of those documents. So in this example we call load_qa_chain with the LLM (we're using the default OpenAI GPT-3 model) and a chain_type; let's do "stuff" first. Stuff means that we want to pass all of the text into the prompt for the LLM. Then we can run the chain with the question "How many AI publications in 2021?" Let's take a look. This error is expected, by the way: you can see the model's maximum context length is 4,097 tokens, but we requested far more tokens than that, because our PDF file actually has more than 50 pages, so it's a lot of text. There are two kinds of solutions: one is to change the chain type, and the other is the second method I'm going to talk about in a little bit. If we change the chain type to, for example, map_reduce, this works. Now we get the answer: in 2021 the total number of AI publications was almost 500,000, which is correct, by the way.

Okay, so now you might be wondering what other chain types there are. Great question, here's the LangChain documentation. The stuff chain, as I said before, basically passes all the text into your prompt. The map_reduce chain separates your documents into different batches, feeds each batch into your large language model separately, and returns an answer for each batch. Here's an example where we have four batches: the first batch was able to answer the question, the next three batches were not, and the final answer was based on the answer from the first batch. One thing to note is that batch size actually matters, and you can define the batch size on your language model. The refine chain is similar to the map_reduce chain: you also break your documents down into batches, but you feed the first batch into your language model, then feed the output of the first batch together with the second batch into your language model, and so on, so your answer gets refined along the sequence of batches; as you can see here, the answers just get longer and more refined at each step. The refine chain is therefore a sequence, whereas with map_reduce everything can run in parallel, so map_reduce can be a little faster. Here is, for example, how refine works under the hood: there is a refine template that says we have an existing answer, and now we have the opportunity to refine the existing answer (only if needed) with some more context below, where that context is the text of the next batch. The map_rerank chain is also very similar to map_reduce, with an additional score at the end of each batch's answer. The score reflects how fully the answer addresses the user's question, and the final answer is based on the answers with the highest scores.
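As a minimal sketch of this first method, assuming the classic LangChain 0.0.x imports that were current when this video was published (newer releases moved these modules to langchain_community and langchain_openai), the code looks roughly like this; the file name and the question come from the video, everything else is illustrative:

```python
import os
os.environ["OPENAI_API_KEY"] = "sk-..."  # your own OpenAI API key

# Classic LangChain (0.0.x) imports
from langchain.document_loaders import PyPDFLoader
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

# Load the PDF; each page becomes one Document
loader = PyPDFLoader("example.pdf")
documents = loader.load()

query = "How many AI publications in 2021?"

# chain_type="stuff" puts every page into a single prompt and exceeds the
# 4,097-token context limit for a 50+ page PDF.
# chain_type="map_reduce" answers per batch of pages, then combines the answers.
chain = load_qa_chain(llm=OpenAI(), chain_type="map_reduce")
answer = chain.run(input_documents=documents, question=query)
print(answer)  # roughly: almost 500,000 AI publications in 2021
```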
So those are the four chain types. As you can see in our example, when we used the map_reduce chain it was indeed able to return an answer for us. The downside, though, is that this uses a lot of tokens; it actually consumes all of the tokens in the PDF file, which can be really costly if your file is large. A better solution, in my mind, is to retrieve the relevant text chunks from your document and only pass those relevant chunks to your language model, so your language model isn't looking through all the text, only through a small chunk of it. That is our second method, RetrievalQA. The RetrievalQA chain actually uses load_qa_chain under the hood; we retrieve the most relevant chunks of text and feed those to the language model.

First we load our document as we did before, and then we split the document into different chunks; you can define the chunk size here. Then we select which embeddings we want to use (here we're using the OpenAI embeddings) and create embedding vectors for all of the chunks. Then we can expose this index through a retriever interface. What that means is: when you have an embedding vector for your question and a database of embedding vectors for the text chunks, you can find which vectors in your vector database are the most similar to your question vector and retrieve only the relevant ones. That's the retriever step. The final step is to create a chain to answer questions. Here we use the RetrievalQA chain; again you can define which language model you'd like and which chain type. Now if you ask the question "How many AI publications in 2021?", you can see in the result we have our query, the result (almost 500k, which is correct), and the source documents. There were actually two source documents here, the two most relevant documents for answering this question, because I set my number of documents to two.

That's pretty straightforward, but there are a lot of options you can play with. For example, there are a lot of embedding methods: here we used the OpenAI embeddings, but you can also use Cohere embeddings or Hugging Face embeddings, and you can even choose different models from Hugging Face. The next thing we can choose is the type of text splitter. In this example we used CharacterTextSplitter, which splits the text on a single character, and the chunk size is measured in number of characters; but you can also use a different text splitter that measures chunks in tokens instead of characters. Another thing we can explore is the vector store: here we used Chroma, but different vector stores give you different capabilities that you can take a look at. And then there are also different retrievers. I know there are a lot of options you can explore here; I just used the most generic vector store retriever, but there are different search types. The first is similarity search, where search_type equals "similarity", which means you're trying to find the vectors most similar to your question vector. Or you can use MMR, maximum marginal relevance search, which doesn't only optimize for similarity between vectors but also for diversity, which means the first chunk it returns should be a little different from the second chunk.
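A minimal sketch of this RetrievalQA pipeline, again assuming the classic 0.0.x API, might look like the following. The chunk size, the k=2 retriever setting, and the variable names are illustrative assumptions; only the overall flow (split, embed, store in Chroma, retrieve, answer) comes from the video, and `documents` is the list loaded in the earlier snippet:

```python
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA

# 1. Split the loaded pages into smaller chunks (chunk size measured in characters)
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# 2. Embed each chunk and store the vectors in a Chroma vector store
embeddings = OpenAIEmbeddings()
db = Chroma.from_documents(texts, embeddings)

# 3. Expose the vector store as a retriever that returns the 2 most relevant chunks
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 2})
# retriever = db.as_retriever(search_type="mmr")  # alternative: maximal marginal relevance

# 4. Build the question answering chain on top of the retriever
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
)

result = qa({"query": "How many AI publications in 2021?"})
print(result["result"])            # the answer
print(result["source_documents"])  # the two chunks the answer was based on
```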
The third method is VectorstoreIndexCreator, a wrapper around the functionality we just covered. It is exactly the same under the hood; it's just a higher-level interface that lets you get started with three lines of code. As you can see, with just three lines of code you get the same answer, and of course with VectorstoreIndexCreator you can still change the parameters and configure different options for your text splitter, embeddings, and vector store, as we mentioned previously. If you want to take a look at the code, here is how the VectorStoreIndexWrapper is defined. As you can see, the default language model is the OpenAI GPT-3 model, it uses the RetrievalQA chain from the second method, the default vector store is Chroma, the default embeddings are the OpenAI embeddings, and the default text splitter is a RecursiveCharacterTextSplitter. So you can see all the default components in this wrapper, and you can change them right here.

The final method is the ConversationalRetrievalChain, which basically combines conversation memory with the RetrievalQA chain. So if you want to keep all your chat history and pass your chat history to your language model, this one is for you. Everything here is basically the same as what we have seen before: you split the documents into chunks, select embeddings, create a vector store, use the retriever interface, and then use a chain to answer questions. The change is that here we're using ConversationalRetrievalChain, and we call it qa. We define the chat history as an empty list, and in addition to defining our question query, we also pass all of the chat history into the language model. When we start the conversation the chat history is empty (it's an empty list), so the answer here is the same as before, 500,000. Then, for the chat history, we pass in the query and the answer from the previous turn (which was 500k), and we ask another question, "What is this number divided by 2?", sending this history and this question to the language model. It returns the correct answer, which is 250,000; as you can see, it actually had the context that "this number" refers to 500,000, and 500,000 divided by 2 is 250,000.
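Here is a rough sketch of these last two methods, once more assuming the classic LangChain 0.0.x imports; the variable names, and the reuse of `loader` and `retriever` from the earlier snippets, are illustrative assumptions rather than the exact notebook code:

```python
from langchain.indexes import VectorstoreIndexCreator
from langchain.chains import ConversationalRetrievalChain
from langchain.llms import OpenAI

# Method 3: VectorstoreIndexCreator, the high-level wrapper.
# Defaults: OpenAI embeddings, Chroma vector store, RecursiveCharacterTextSplitter,
# and a RetrievalQA chain under the hood; all of them can be overridden.
index = VectorstoreIndexCreator().from_loaders([loader])
print(index.query("How many AI publications in 2021?"))

# Method 4: ConversationalRetrievalChain, i.e. RetrievalQA plus chat history.
qa = ConversationalRetrievalChain.from_llm(OpenAI(), retriever)

chat_history = []  # the conversation starts with no history
first = qa({"question": "How many AI publications in 2021?",
            "chat_history": chat_history})
print(first["answer"])  # roughly: almost 500,000

# Pass the previous (question, answer) pair back in as history
chat_history = [("How many AI publications in 2021?", first["answer"])]
followup = qa({"question": "What is this number divided by 2?",
               "chat_history": chat_history})
print(followup["answer"])  # roughly: 250,000
```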
So that's it for this video: we introduced four methods to do question answering in LangChain. In the next video I'm going to make a question answering chatbot, so see you next time.

Info
Channel: Sophia Yang
Views: 13,598
Id: DXmiJKrQIvg
Length: 11min 36sec (696 seconds)
Published: Sat Apr 08 2023