GPT-4 & LangChain Tutorial: How to Chat With A 56-Page PDF Document (w/Pinecone)

Captions
Hey, this is Mayo from Chat with Data, and in today's video I'm going to be talking about how to chat with a long PDF. Here we have a 56-page legal document, actually a massive Supreme Court case in the United States. You can see we've got tons of pages, which is typical for most PDF documents, and it's that kind of horrible text you can't even properly copy out. What we want to end up with is a situation where we can chat with the document. So we can ask "What is this legal case about?", press enter, and hopefully we get a response back: the legal case is about a student, Frederick. Interesting. We also get sources referring back to the PDF, plus sections of the PDF that you can review. Maybe you don't understand something in the response, so you can follow up with "What do you mean by qualified immunity?" and see what comes back. It's this kind of back-and-forth interaction, using LangChain and GPT-4, that we're after. Let's check that the response points to actual links; it does, so you get the references and you also have the sources inside the document. Cool.

So how do we do this, and how does it work? Let's jump into the diagram and get started. This is the PDF chatbot architecture using LangChain and GPT-4. When I show the code, or if you want to replicate it from the codebase, bear in mind that you can swap out the models; you don't have to use GPT-4. I was just lucky to get access to the API.

You have the PDF documents, and we convert them to text. Then we split the text into chunks, because of the issue of the context window: if you've ever used ChatGPT and tried to paste an entire PDF's text into it, you've probably noticed it tells you the input is too big. We overcome that by using LangChain to split the text into chunks, where each chunk is a certain number of characters, maybe a thousand, maybe two thousand, whatever the case may be. From those chunks we create embeddings. An embedding is just a number representation of your text, and we store it somewhere. You can think of this as an ingestion phase, and we'll talk about it in a second when we jump into the code: the ingestion phase takes the document, converts it to text, splits it, and converts it into numbers that are stored in a vector store, in this case Pinecone. I'll come back to that in a second.

That's phase one. Phase two: from your front end, the user asks a question. Maybe they say "How do I create an account?", where the PDF you've ingested is your company's support documentation. You combine that question with the chat history and send it to the large language model, GPT-3.5 or GPT-4, and you say: based on the chat history and the new question, create a standalone question. That standalone question we convert into embeddings. If I do a quick sketch, an embedding looks something like 0.1, 0.2, ..., 1.1, and in the case of OpenAI you end up with 1,536 of these numbers representing the text of the standalone question. All these vectors are then taken to the vector store, which compares the numbers it was given against the numbers it already has.
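To make "an embedding is just a number representation of your text" concrete, here is a minimal TypeScript sketch using LangChain's OpenAIEmbeddings wrapper. This is an illustration, not the repo's code: module paths vary across LangChain versions, and it assumes OPENAI_API_KEY is set in your environment.

```typescript
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';

(async () => {
  // The embeddings model maps a piece of text to a fixed-length vector of floats
  const embeddings = new OpenAIEmbeddings(); // defaults to text-embedding-ada-002

  const vector = await embeddings.embedQuery('What is this legal case about?');
  console.log(vector.length);      // 1536 dimensions for ada-002
  console.log(vector.slice(0, 3)); // e.g. [0.012, -0.031, 0.004]
})();
```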
Remember that when you stored the chunks during ingestion, each one was represented as one of these vectors, and they all have different values. So what the vector store does is check which chunks are most similar to the standalone question that was asked. It looks through the embedded documents and retrieves the relevant ones, which here are the source documents. Then it combines the standalone question (whatever the original question plus the chat history led to) with the relevant docs as context: based on this standalone question and these relevant docs, do X, Y, Z. You can obviously customize what you want the model to do. Then GPT-4, in my case, returns an answer, and that's what we saw earlier: the response comes back. That's the architecture in a nutshell, so let's jump into the code itself to make sense of what's going on.

Basically, there are two phases involved. We already spoke about the ingestion phase, which is effectively converting your PDF into these vectors that get stored in a vector store. We've gone over the high level, so here's what's going on in the code. LangChain has this thing called a PDF loader. The PDF loader takes a file path (in this case the PDF is in here, so this is the file path) and loads the raw documents from the PDF file; it does all of that for you under the hood. Those raw documents just contain the text of the PDF. Once we have that, we split, remember the split step in the diagram, into chunks of a thousand characters with an overlap of 200 from one section to the next; again, LangChain provides this to make it easier. So we split the docs, and then we create the OpenAI embeddings function; remember, we need this thing that's going to create the numbers from the text. Then we initialize the index for Pinecone; you can think of an index as the name of your store, where your vectors will live. Then we run this fromDocuments function, which goes through the process of creating the embeddings and putting them into Pinecone under the namespace here. You can change this namespace; actually, in the configuration you would need to, because when you create a Pinecone index you give it a name, and you can optionally give it a namespace too. I recommend using namespaces, because you probably want a way to categorize the different sets of embeddings you put into the store. I'll show you what that looks like; I know it might sound very opaque right now. So there we go: you have your index, which is Pinecone; you have your documents, which are already split; you create the embeddings and store them under the namespace. Let me run this again, but I'll change the namespace so I don't overwrite what I currently have; let me just call it "demo" and show you what that looks like. There's a script in package.json called ingest that runs this function, so that's npm run ingest. I just want you to see what actually happens: there we go, it does the splits, gets the metadata, and creates the vector store.
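As a rough sketch, an ingestion script like the one described above might look something like this. The file path, index name, and the pinecone-client helper are illustrative assumptions, and LangChain's module paths have moved around between versions:

```typescript
import { PDFLoader } from 'langchain/document_loaders/fs/pdf';
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { pinecone } from './pinecone-client'; // hypothetical initialized Pinecone client

(async () => {
  // 1. Load the raw text out of the PDF (path is illustrative)
  const loader = new PDFLoader('docs/legal-case.pdf');
  const rawDocs = await loader.load();

  // 2. Split into overlapping chunks so each one fits a context window
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 200,
  });
  const docs = await splitter.splitDocuments(rawDocs);

  // 3. Embed each chunk and store the vectors under a Pinecone namespace
  const index = pinecone.Index('my-index'); // index name is illustrative
  await PineconeStore.fromDocuments(docs, new OpenAIEmbeddings(), {
    pineconeIndex: index,
    namespace: 'demo',
  });

  console.log('ingestion complete');
})();
```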
Ingestion complete. The embeddings were created and the ingestion finished because we ran the script, so let's go into Pinecone and see what that looks like. This is my Pinecone dashboard; you can set this up on your own and create your own index name. Like I said, think of it as storage. You set your environment, which is basically the region it will be served closest to, and you want to make sure it matches what's in your code. Cosine is the calculation used to find what's similar, and then these are the dimensions of each vector, as I mentioned. So you'd have an index here, and if you check it out: we just did "demo", and demo has 178 vectors, the same count as the existing namespace for this PDF. Let me show you what a test query looks like. This is what a vector looks like; these values are empty here, but effectively you'd have an array of numbers representing a particular section, the chunk you put in. So when we say 178 vectors, that's what we're referring to. That's Pinecone in a nutshell. Let me see if I can fetch one... there we go. This is an example of what the vectors look like: every vector has an ID, it has values, and it has metadata, which is the text. You can think of the metadata as your chunk, and your chunk is represented by those values, those vectors, and it's these vectors that get compared against the user's question to decide which chunks are the most relevant. So I hope that explains Pinecone.

Back to the code. The ingestion is done, so that's phase one complete. What's next? Let's go through the other pieces. This is the Pinecone client that initializes Pinecone; here you set your environment, as discussed, and your API keys. Make sure you clone the repo, then create an environment variables file from the provided examples (copy these and fill them in here), and go to OpenAI and Pinecone to get the API keys. The visual guide is also in here. This is the OpenAI client, which you can get from LangChain directly, but I'm trying to keep things more structured. Then we have makechain. makeChain is what produces the streaming effect you saw, and it's actually a custom chain. In LangChain you have this thing called a ChatVectorDBQAChain, and in a nutshell it takes the question, goes through the flow we showed in the diagram, retrieves the similar documents, and responds when you call the chain. You can think of a chain as a series of actions, just like in the diagram. Here we're passing in the vector store, which is Pinecone, along with some custom prompts. We set returnSourceDocuments to true, which is how we get the ability to see the source documents, and k is set to two, which is how many source documents to return. The streaming effect is optional. This is the model name, and you can change it to whatever you currently have access to, whether that's gpt-3.5-turbo or Davinci. Temperature is set to zero to prevent randomness in the responses; especially when it comes to legal material, you don't want too much creativity. Streaming is enabled, and there's a callback manager that handles the tokens being streamed back.
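A hedged sketch of such a makeChain helper is below. ChatVectorDBQAChain matches the LangChain JS API around the time of the video (newer versions deprecate it in favor of ConversationalRetrievalQAChain), and the onToken callback is an illustrative name:

```typescript
import { OpenAIChat } from 'langchain/llms/openai';
import { ChatVectorDBQAChain } from 'langchain/chains';
import { CallbackManager } from 'langchain/callbacks';
import type { PineconeStore } from 'langchain/vectorstores/pinecone';

export const makeChain = (
  vectorstore: PineconeStore,
  onToken: (token: string) => void, // called once per streamed token
) => {
  const model = new OpenAIChat({
    modelName: 'gpt-4', // swap for whatever model you have access to
    temperature: 0,     // zero randomness: important for legal content
    streaming: true,
    callbackManager: CallbackManager.fromHandlers({
      handleLLMNewToken: async (token: string) => onToken(token),
    }),
  });

  return ChatVectorDBQAChain.fromLLM(model, vectorstore, {
    returnSourceDocuments: true, // lets the UI show where answers came from
    k: 2,                        // how many source chunks to retrieve
  });
};
```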
That's that side of things, so let's go to the front end. OK, there's quite a bit going on here. I've received a lot of requests to do a step-by-step tutorial, especially from people who are new to JavaScript or beginners in coding, so if you check the description of this video, there's a link to a waiting list you can sign up for if you're interested. But I'll do my best, in the short time I have now, to go over what's happening.

This is the front end. We're dealing with the query, which is the question, and we have state to manage the source documents coming back. As you can see, there's an initial state: messages holds the messages, pending holds the message that's currently streaming in, and history is your chat history. Again, we're representing the diagram I showed in code form. Let me skip forward. This is the submission handler: we clean up the query, because the user might have stray spaces in their question, so we trim it, then we update the state to take into account the previous state plus the user's new question. That gets pushed into messages with the type userMessage, we start loading, and we set pending to an empty string.

At that point we hit the API endpoint, api/chat. If I jump in: we receive the question and the history, sanitize the question to make sure it's clean for embeddings, and then for Pinecone we create the vector store, where we have the embeddings, the namespace, and the index name. Then we create a function to tell the client, the front end, that we're going to send it data, and that function is here. Effectively, when the chain is called (as you saw in the previous code, it uses the ChatVectorDBQAChain), it goes off, retrieves the similar documents, and comes back, and you saw that we set up streaming with tokens. So it takes the vector store, does the search, and then streams back the tokens. A token is essentially one small string at a time, and each string is sent to the front end as it arrives; that's how you get the streaming effect. This is the callback function we created so that every time a token comes in, it's forwarded to the front end. And here is where we call the makeChain function with the sanitized question and the chat history, which again matches the diagram we spoke about. Because we set returnSourceDocuments to true, the source documents come back in response.sourceDocuments and we send those to the client too, which is how you're able to see the source documents. When all of this finishes, a done signal is triggered.
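Putting that together, the api/chat endpoint might look roughly like this, assuming a Next.js API route, server-sent events for the streaming, and the makeChain sketch from above (import paths, the index name, and the namespace are illustrative):

```typescript
// pages/api/chat.ts
import type { NextApiRequest, NextApiResponse } from 'next';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { pinecone } from '../../utils/pinecone-client'; // hypothetical helper
import { makeChain } from '../../utils/makechain';      // hypothetical helper

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  const { question, history } = req.body;
  // Strip newlines: OpenAI recommends this for better embedding results
  const sanitizedQuestion = question.trim().replaceAll('\n', ' ');

  // Reconnect to the vectors we ingested earlier
  const index = pinecone.Index('my-index'); // illustrative index name
  const vectorStore = await PineconeStore.fromExistingIndex(new OpenAIEmbeddings(), {
    pineconeIndex: index,
    namespace: 'demo',
  });

  // Server-sent events: each token is flushed to the client as it arrives
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache, no-transform',
    Connection: 'keep-alive',
  });
  const sendData = (data: string) => res.write(`data: ${data}\n\n`);

  const chain = makeChain(vectorStore, (token) =>
    sendData(JSON.stringify({ data: token })),
  );
  const response = await chain.call({
    question: sanitizedQuestion,
    chat_history: history || [],
  });

  // Send the retrieved chunks so the UI can render sources, then close
  sendData(JSON.stringify({ sourceDocs: response.sourceDocuments }));
  sendData('[DONE]');
  res.end();
}
```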
That's why you can see what's going on here: when it's done, we set the history, we set the messages, and we turn off the loading state, because at that point there's no pending message or pending source documents. We basically say: here's the API message, with the pending state representing the message that came in, and the source documents alongside it. Otherwise, if it's not done yet, we keep appending the data that's coming in; here we're just checking for source documents before we set them into state. We also use useMemo to memoize the combined message list, since it's something we recompute over and over, so we're trying to be a bit more efficient (there's a rough sketch of this below). And this is the front end that captures all of that, maps over it, and so on and so forth.

Because of limited time, that's just the overview. The source code will be available, like I said, and the visual guide is in there as well. There have been a lot of requests for a more in-depth, step-by-step tutorial, so if you're interested, check the description and join the waitlist for a potential workshop. I'll talk to the people on the waitlist, and if there's enough demand I'll do a comprehensive, step-by-step workshop on how to build a chatbot for your documents, whether that's a single PDF, a book, multiple PDFs, a .docx, or an Excel file. By the end of it, you should hopefully be able to build an application for yourself, your clients, or whoever, and have a back-and-forth interaction with your documents. That's it in a nutshell. If you have any questions, just shoot me a message in the comments. Thanks for watching, cheers.
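To round off the frontend side, here is a hedged sketch of the messages/pending state and the useMemo mentioned in the walkthrough above. The field names follow the transcript; the hook itself and everything around it is illustrative:

```typescript
import { useMemo, useState } from 'react';

type Message = {
  type: 'apiMessage' | 'userMessage';
  message: string;
  sourceDocs?: unknown[]; // source documents attached once streaming is done
};

export function useChatMessages() {
  const [messageState, setMessageState] = useState<{
    messages: Message[];
    pending?: string; // tokens streamed in so far for the in-flight answer
    history: [string, string][];
  }>({ messages: [], history: [] });

  // Merge the streaming `pending` text into the rendered list; useMemo keeps
  // this from being recomputed on unrelated re-renders
  const chatMessages = useMemo(() => {
    const { messages, pending } = messageState;
    return [
      ...messages,
      ...(pending ? [{ type: 'apiMessage' as const, message: pending }] : []),
    ];
  }, [messageState]);

  return { messageState, setMessageState, chatMessages };
}
```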
Info
Channel: Chat with data
Views: 171,542
Keywords: gpt3, langchain, openai, machine learning, artificial intelligence, natural language processing, nlp, typescript, semantic search, similarity search, gpt-3, gpt4, openai gpt3, openai gpt3 tutorial, openai embeddings, openai api, text-embedding-ada-002, new gpt3, openai sematic search, gpt 3 semantic search, chatbot, langchainchatgpt, langchainchatbot, openai question answering
Id: ih9PBGVVOO4
Length: 23min 56sec (1436 seconds)
Published: Thu Mar 16 2023