Talk to ANY PDF without OpenAI APIs: LangChain, it's not what you think

Captions
Hello everybody. Some of you were asking about building a system to talk to a PDF, so this video is all about that. I will be showing you more than that, so please bear with me. There are various approaches to do this, but I will tell you the easiest one, with an approach similar to my previous video. If you haven't seen my previous video, please go and watch it to better understand this.

There are two approaches to do this. One uses OCR: with OCR we can just extract the text from the PDF and save it in a .txt file. We did something similar in our previous example, where we extracted the text from a URL and kept it in a text file; after that, everything remains the same. I guess under the hood LangChain does the same. But if you are too lazy to do this, you can use the PDF loader that LangChain provides, which can be imported from langchain.document_loaders. After that you just load your PDF into the PDF loader and store it in a document variable. The process after this remains exactly like what we did in our previous video. Here I have used my resume to do this.

One problem I found doing this is that there is no preprocessing of the text: as you can see, we cannot remove newlines and some other characters from the text (or maybe I am doing it wrong). Our model is not as powerful as OpenAI's, so results might be inaccurate, and we don't have much control over the whole process: we have just loaded our document into the PDF loader and everything else is done by LangChain. To solve this problem, we can build the entire pipeline ourselves.

To build an in-depth pipeline, let's see how LangChain works in general. First we load our document by using loader.load(). Then we split it into multiple chunks; here the chunk size is 500. Next we encode each chunk into a vector, here done by using embeddings, and semantic search runs on
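The load-then-chunk steps described above can be sketched in plain Python. This is a minimal, hypothetical illustration (the function names are mine): in practice LangChain's text splitters handle chunking with smarter boundary logic, but the idea is the same, and `preprocess` shows the newline cleanup the speaker found missing when relying on the loader alone.

```python
def preprocess(raw: str) -> str:
    """Collapse newlines, tabs, and repeated spaces into single spaces --
    the cleanup step missing when the loader's text is used as-is."""
    return " ".join(raw.split())


def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of at most chunk_size characters.
    A small overlap keeps sentences that straddle a boundary searchable."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each resulting chunk is what gets encoded into a vector in the next step.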
those vectors later on. After that, we retrieve the top-k search results when we pass a query in. Then we pass those chunks into a QA model along with the query. Here the QA model is Google's T5. Since LangChain uses FAISS for the semantic search, we can leverage that to our benefit. Here is the FAISS repo as it is found online; you can study the repo to create your own semantic search, but let's see how we can use it to create ours.

Let's understand with a diagram. We have a document here; for now the document is a PDF. We can extract the text from the PDF using OCR. The text can then be broken down into multiple chunks: chunk 1, chunk 2, chunk 3. Each chunk can be encoded using any text-to-vector model: you can use embeddings from Sentence Transformers or any transformer you like, including BERT. The chunk vectors can be stored in a vector database like Chroma DB or Pinecone. Then we run semantic search on those by using FAISS. After running the semantic search, we get the top-k results from it. As our LLM is basically a QA model, we need to pass it a context and a query to get a result. So this is how the entire LangChain process works.

This works for every practical example, including a chatbot where we can create an infinite memory for it. To create infinite memory in a chatbot, a similar approach can be taken. The only difference is that we don't have a document this time; we have a chatbot which includes history. The history can be extracted from the responses of our previous chats, and we can run semantic search on those chats: we pass the retrieved history along with the user's message to generate the response from our model. This way infinite memory can be done for a chatbot.

So this is all I wanted to show you in this video. Thanks for watching, and subscribe if you like my video.
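The retrieval step (embed chunks, then return the top-k most similar to the query) can be illustrated with a self-contained sketch. This is a toy: `embed` here is a bag-of-words counter standing in for a real embedding model such as Sentence Transformers, and the brute-force cosine ranking stands in for what FAISS does at scale. All function names are mine, not a LangChain or FAISS API.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real pipeline would call a
    sentence-transformers model here instead."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query and return the k best.
    FAISS replaces this linear scan with an indexed nearest-neighbor search."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

The top-k chunks returned here are what get concatenated into the context passed to the QA model along with the query.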
Info
Channel: Whispering AI
Views: 2,356
Keywords: LLMs, prompt engineering, natural language processing, langchain openai, langchain in python, long chain tutorial, langchain, gpt-3, train gpt on your data, train openai, langchain tutorial, langchain ai, whispering ai, asmr, prompt engineering course, langchain chatgpt, long chain python, gpt 4, pinecone, hugging face, hugging face models, langchain prompt, langchain agent, chat pdf, document question answering, how to use langchain with pinecone and openai, chatgpt, huggingface
Id: MQZINa6Y6Wk
Length: 4min 43sec (283 seconds)
Published: Mon May 15 2023