Summarize PDFs with a Local AI (Private GPT) in Python

Captions
In this video, I'll show you how to write your own app to process, summarize, and query PDF documents using large language models that run privately and locally on your own machine, for free. This is Vincent Codes Finance, a channel about coding for finance research. If you're interested in that topic, please subscribe. In my previous video, where I showed how to install a ChatGPT clone locally on your machine, Open WebUI had a feature that let you upload documents and query them. However, that feature only looks at part of the document at a time; it doesn't consider the whole document. Many of you asked in the comments whether it was possible to summarize documents using Open WebUI. I haven't found a way, so I decided instead to show you how to build your own app to do it. That's what we'll be doing today. The app I'm building runs fully locally on my machine. I'll be using Ollama to serve the models, and to build the app I'll be using LangChain, a library that provides many convenience tools for working with large language models programmatically. I'm doing everything in Python today, but the library is also available in JavaScript if you're more comfortable with that. Finally, I'll be using Streamlit, a Python module that lets you build interactive dashboards very easily. All the code for this tutorial is available on GitHub; see the video description for the link to the repository and to my blog, which provides written instructions.

The app I'm building today is based on examples from the LangChain documentation. There is a diagram in that documentation that explains the two methods we'll be looking at. First, we'll use the stuffing method to produce a summary. Then I'll use the map-reduce method, not to produce a summary (although you could, and that's what the documentation example does), but to query the document. The way this works is that you first load the documents. When we load the PDF, each page becomes its own document, so we'll have a list of documents, which are the pages from the PDF. We extract the text and run our query on that text. With the stuffing method, we simply put all of that text into the context of our query, combine everything into one big prompt that we submit to the large language model, and the result is our summary. The map-reduce process works in multiple steps: we take each page and run a query on that page to find and summarize the information relevant to our question, and then we run a final query that takes the per-page results and aggregates them into the final answer.

First, we'll use Ollama to serve our models. To install Ollama, go to their website and click download, or, if you're using Homebrew on a Mac, run "brew install ollama". Then you'll need to download a model to work with; have a look at their model library. I'll be using Mixtral for my examples here. To install that model, run "ollama pull mixtral", and it will download the model. It is quite big, so it's going to take a few minutes to download.
For this project, you'll need to install the following dependencies. I'm obviously using langchain, but to use all the LangChain features I want, I'll also need pypdf, which LangChain uses to load PDFs. Even though we won't be using OpenAI for this project, we'll need the langchain_openai and openai modules, because Ollama exposes an OpenAI-compatible API and that's what we'll be talking to. That way, if you ever want to take your app and run it against OpenAI instead of locally, there are only a few settings to tweak and everything will work. We also need tiktoken to help count the tokens in our queries, python-dotenv to load our .env file, and streamlit to build our dashboard. Finally, I have one last dependency, rich. I'm not using it in this tutorial, but my repository also includes a CLI tool, a command-line tool that does the same things, and it uses Rich for its output. As a sample document, I'm using one of my research papers; the link is in the description if you ever want to read it. It is open access, so you can download it for free, and it's what we'll be summarizing and querying.

First, we need to read our documents. For that, we'll use LangChain, which provides document loaders for different types of files, whether they are local files such as the PDFs we're using today, CSVs, or other formats; it also has integrations for loading data from third-party providers. In our case, we'll use the PDF document loader, which relies on pypdf, which is why we need it as a dependency. We'll be following roughly the example from the documentation. Note that the loader also supports extracting images from PDFs, which I won't be doing in this tutorial. To read the file, we import the pypdf loader from LangChain, provide the file path, create our PDF loader, and call load, which produces a list of documents: one document for each page in my PDF.

Next, we'll summarize the document using the stuffing method: read the document, put it all in one query, and ask the large language model to summarize it. This is inspired by the example in LangChain; I tweaked it a bit so that it gives slightly better results, in my opinion. The first thing to do is create our prompt. You can play with it to see whether you get better results, but what I have is: "Write a long summary of the following document, only including information that is part of the document, and do not include your own opinion or analysis." Then I provide the document. It is a prompt template from LangChain, so I can add curly braces for variables that LangChain will fill in at runtime; this one is the document. I finish with "Summary:" and ask the large language model to complete the prompt. I create my prompt from the template, and next I define my large language model chain. We'll be using Ollama, but through Ollama's OpenAI-compatible API, so if you ever want to switch to GPT-4 in the future, you only have to change a few parameters.
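Before moving on to the model setup, here is a rough sketch of the loading step and the summary prompt described above. The file name is a placeholder, and the import paths assume a recent LangChain release where the community loaders live in langchain_community; older versions expose PyPDFLoader from langchain.document_loaders instead.

```python
# Sketch: load the PDF into one LangChain Document per page, then define the
# summary prompt. "paper.pdf" is a placeholder for your own file.
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.prompts import PromptTemplate

loader = PyPDFLoader("paper.pdf")
docs = loader.load()
print(f"Loaded {len(docs)} pages")

prompt = PromptTemplate.from_template(
    "Write a long summary of the following document, only including "
    "information that is part of the document, and do not include your "
    "own opinion or analysis.\n\n"
    "Document:\n{document}\n\n"
    "Summary:"
)
```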
For the model, we set a few parameters (sketched in code below). First, the temperature, which controls how much randomness there is in the model: if you set it to zero, the output is almost deterministic and you should always get the same output for the same prompt, whereas adding some temperature adds randomness, so you can play with it and see what it gives you in terms of results. Then we set the model name; I'll be using Mixtral, as I said. We have to provide an API key even though everything is local; since I'm running Ollama locally, I can pass anything, it doesn't really matter. And I change the base URL so that instead of the OpenAI endpoint it points at my local machine. With that, I create my large language model chain, an LLM chain from LangChain. Finally, I create the full chain using a Stuff Documents chain from LangChain, which takes two things: the LLM chain we just created, and the name of the variable that represents the document in our prompt, which in our case is "document" (if we go back to our prompt, that's the variable). That's all I need to create my chain. Once the chain is created, I can invoke it; to invoke a Stuff Documents chain, I provide the documents I want it to run on, get the results, and print them so we can see what we get.

It took a while, obviously. The first time you call a model, Ollama has to load it into memory, so there's an overhead on the first query, but it took a bit over a minute on my computer. To look at the output in a nicer way, I use textwrap, a built-in Python library, to format it. We do get a summary of the paper, but it doesn't really have anything to do with the paper: it claims the paper examines the post-earnings announcement drift (PEAD), when the paper doesn't talk about that at all. Where is it getting this information? From the references: at the end of the document there is a list of references that discuss these things, and that's where the information comes from. Obviously, I don't want that; it's not part of the article. To avoid this, I can select the pages I want to use in my summary, for example by dropping the last two pages, which contain mostly references. To do that, I slice my list of documents so that the last page I keep is the third from the end, excluding the last two pages. If I reinvoke my chain on these documents, I get a much better summary that actually discusses what the paper is about, and it's completely different from the abstract, which is also quite interesting. By the way, if you've been enjoying the video so far, please like it so that others also get to discover it. Okay, so that was the first thing I wanted to show you: this is how we can work with a document by simply stuffing its full content into one query.
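Here is a rough sketch of that stuffing pipeline, reusing the docs list and prompt from the earlier sketch. The model name, the dummy API key, and the localhost base URL mirror what the video describes for Ollama's OpenAI-compatible endpoint; LLMChain and StuffDocumentsChain are the classic LangChain chain classes, so exact import paths may differ slightly between versions.

```python
# Sketch: summarize with the "stuffing" method against a local Ollama server.
from langchain_openai import ChatOpenAI
from langchain.chains.llm import LLMChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain

llm = ChatOpenAI(
    temperature=0.1,                        # 0 is near-deterministic; raise for more randomness
    model="mixtral",                        # model served by Ollama
    api_key="not-needed",                   # Ollama ignores the key, but one must be provided
    base_url="http://localhost:11434/v1",   # Ollama's OpenAI-compatible endpoint
)

llm_chain = LLMChain(llm=llm, prompt=prompt)
stuff_chain = StuffDocumentsChain(
    llm_chain=llm_chain,
    document_variable_name="document",      # matches {document} in the prompt
)

# Drop the last two pages (mostly references) so they don't pollute the summary.
result = stuff_chain.invoke(docs[:-2])
print(result["output_text"])
```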
It's good; it summarizes, but it isn't necessarily extracting the information I want, and sometimes I want something other than a summary. Another option is to query the document with a prompt that changes from query to query, and for that we'll use the map-reduce approach. A common way to query a document would be a RAG approach: split the document into chunks, generate embeddings (vector representations of the text in those chunks), store them in a database that you can query by embedding, then embed the user's query, do a semantic search over the chunks, retrieve the most relevant ones, and use those to build the answer. That's not what we'll be doing today. If you're interested in that approach and would like me to make a video on it, I'll be happy to; just let me know in the comments. Here, I'm taking a simpler but much more processing-intensive, brute-force approach: map-reduce. The idea is to look at every page in the document, ask the large language model whether that page is relevant to the query and, if so, to extract the information relevant to it; after doing that on every page, I put all the results back together and produce the final answer from them. That's the mapping (applying the first prompt to every page) and the reducing (taking those outputs and aggregating them into a final answer). For this, I'll use a sample query: "What is the data used in this analysis?"

I need two parts. The first is the map part, which is applied to every document. The map prompt starts with "The following is a set of documents." It's actually not a set of documents, it's just a page, but we'll keep that wording because we'll be using the Stuff Documents chain to build it, so it keeps things consistent. Then we ask the model: based on this list of documents, please identify the information that is most relevant to the following query (the user query); if the document is not relevant, please write "Not relevant." And then I ask for a helpful answer. From that, I create my prompt and partially fill it: I set the user query directly in the prompt, and then pass the resulting map prompt, already filled with the user query, to an LLM chain, which I'll call the map chain. This one is used for mapping.

After that, we need a reduce part, which uses a different prompt: "The following is a set of partial answers to a user query" (these are my documents); "take these and distill them into a final, consolidated answer to the following query" (the user query); "Here is my complete answer." Again I create my prompt from the template and partially fill it, setting the user query directly so that it's already integrated. With both prompts in place (sketched below), I'll finally build the full chain.
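A sketch of the two prompt templates and the partial fill described above. The exact wording is paraphrased from the video; the {documents} variable name matches what the Stuff Documents chain will fill in, partial() is the standard LangChain way to pre-fill a template variable, and the llm object is the one defined in the earlier sketch.

```python
# Sketch: map and reduce prompts, each pre-filled with the user query.
from langchain_core.prompts import PromptTemplate
from langchain.chains.llm import LLMChain

user_query = "What is the data used in this analysis?"

map_template = (
    "The following is a set of documents:\n{documents}\n\n"
    "Based on this list of documents, please identify the information that is "
    "most relevant to the following query:\n{user_query}\n"
    "If the documents are not relevant, please write \"Not relevant\".\n\n"
    "Helpful answer:"
)
map_prompt = PromptTemplate.from_template(map_template).partial(user_query=user_query)
map_chain = LLMChain(llm=llm, prompt=map_prompt)   # applied to every page

reduce_template = (
    "The following is a set of partial answers to a user query:\n{documents}\n\n"
    "Take these and distill them into a final, consolidated answer to the "
    "following query:\n{user_query}\n\n"
    "Complete answer:"
)
reduce_prompt = PromptTemplate.from_template(reduce_template).partial(user_query=user_query)
```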
To build the full chain, I'll use a MapReduce Documents chain and a Reduce Documents chain. First, I need to create the LLM chain for the reduce part, still using the same LLM we defined earlier, together with my reduce prompt. Then we build a few chains. The first is a Stuff Documents chain that combines documents using our reduce chain; that's what reduces everything at the end. This chain is passed to a Reduce Documents chain, which takes all of the mapped documents and reduces them together using that combine-documents chain. As it combines the different outputs, it also makes sure not to exceed the maximum number of tokens; if the limit is reached, it produces an intermediate answer and uses that in a further reduce step. Finally, we put all of that together in a MapReduce Documents chain: we set the map LLM chain, the Reduce Documents chain we just defined, the document variable name, and we choose not to return the intermediate steps, since we're only interested in the final result (sketched in code below). As you can imagine, this generates a lot more queries than the first method.

Okay, it's completed. It took a bit over 5 minutes, which is a bit longer than my earlier trials; I think it's mostly because I'm recording at the same time and that's maxing out my computer, making things slower than before. Still, it takes a couple of minutes to run all the steps, because we are issuing a lot of queries. The answer it provided is fairly decent: it gives a good summary of the data used in that project. That's all good, but if I want to summarize documents often, this is not a very nice interface to use. I did build a CLI interface for myself; it's in the GitHub repository, and you can have a look at it and download it. What I think is even nicer, though, is a proper UI, so the next step is to package all of this in a nice UI that I can reuse more often. We'll build it with Streamlit, a Python module that lets you build interactive apps that run in your browser. You can also deploy Streamlit apps online, but since we're running local models here, that wouldn't quite work; we'll just use it to build our UI. Looking at the structure of my project, I've added a few files. There's a pyproject.toml that defines all the dependencies, and I've repackaged most of my code into a module I call documents_llm. It contains a helper to load a PDF (and also a text file, although here we'll only use the pypdf one): a simple function that takes either a path or a file, plus the start page and end page I want, and wraps the pypdf loader we've just seen. Then there's a summarize.py file that packages our summarize-document function: it takes a list of documents and some model parameters, creates the summarization chain, invokes it, and returns the output text. This is just what we've done before.
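For reference, here is a rough sketch of the map-reduce chain assembly described above, reusing the map chain and reduce prompt from the earlier sketch. The token_max value is an illustrative assumption, and the chain classes follow the classic LangChain API, so import paths may vary with your version.

```python
# Sketch: assemble the map-reduce pipeline from the map chain and reduce prompt.
from langchain.chains.llm import LLMChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.combine_documents.reduce import ReduceDocumentsChain
from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain

reduce_llm_chain = LLMChain(llm=llm, prompt=reduce_prompt)

# Stuffs the per-page answers into the reduce prompt to produce one answer.
combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_llm_chain,
    document_variable_name="documents",
)

# Reduces all mapped outputs, collapsing in stages if they exceed token_max.
reduce_documents_chain = ReduceDocumentsChain(
    combine_documents_chain=combine_documents_chain,
    collapse_documents_chain=combine_documents_chain,
    token_max=4000,   # illustrative limit
)

map_reduce_chain = MapReduceDocumentsChain(
    llm_chain=map_chain,                        # applied to every page
    reduce_documents_chain=reduce_documents_chain,
    document_variable_name="documents",
    return_intermediate_steps=False,            # only keep the final answer
)

result = map_reduce_chain.invoke(docs[:-2])
print(result["output_text"])
```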
In query.py, I did the same thing with our map-reduce chain: it takes documents, a user query, and the model parameters, builds the model and the map-reduce chain (I abstracted that part away), invokes the chain, and returns the output text. The get_map_reduced() chain-building function is just what we've done before; it packages all the steps that combine our chain. I also have some Streamlit helper functions that we'll come back to in a moment. My Streamlit app lives in doc_app.py, and it is fairly straightforward. I've added os and time imports to load environment variables and to time the query execution. I import Streamlit, then use dotenv to load the .env file; you can use it to set your defaults for the OpenAI URL, API key, and model name, and if you want to use the actual OpenAI API you can just change things there. From the Streamlit helpers I also import run_query, our main function for running the query, which we'll get back to later. So, first, the app loads the .env file if it exists and reads the parameters from the environment, and after that I build the app itself. Going back to the app, there's a title, "PDF Document Analyzer", with a little description, and then a sidebar with different widgets for the user input and configuration.

How is that built? It's fairly simple. Using Streamlit, I write the title and then my little description. Then I create a sidebar using "with st.sidebar", and everything inside that with block becomes part of the sidebar. There's a header for the model; I use a text input for the model name, with the default taken from the environment variable if it exists. Then there's a slider for the temperature, with a default value of 0.1 that you can adjust as you like. Next comes the documents section, where you can upload a PDF file; I use a file uploader restricted to PDF files. I also add a selector for the page range: I create two columns so the inputs sit side by side, the first with the start page, a number input with a default value of zero and a minimum of zero, and the second with the end page, which has a default of minus one. In Python, minus one just means the last element, so by default it goes to the end, but you can change it, and it also accepts other negative values. Then there's a radio button to select the query type, either summarize or query, the two modes we've been using. Outside that block, I add one more thing: if the radio button value is query, we also need the user's query, so in that case I display a text area asking for it, with a default query provided. Finally, I add a button. The way you define a button in Streamlit is a bit unusual: you write an if statement with your button in the condition, and whatever is under that if statement runs once the button is clicked (a sketch of this layout follows below).
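A rough sketch of the sidebar and button layout just described. Widget labels, default values, and the MODEL_NAME environment variable name are approximations of what the video shows, and run_query stands in for the helper from the repository's documents_llm package.

```python
# Sketch: doc_app.py layout — sidebar configuration plus the run button.
import os
import time

import streamlit as st
from dotenv import load_dotenv

load_dotenv()  # defaults for base URL, API key, and model name come from .env

st.title("PDF Document Analyzer")
st.write("Summarize and query PDF documents with a locally served LLM.")

with st.sidebar:
    st.header("Model")
    model_name = st.text_input("Model name", value=os.getenv("MODEL_NAME", "mixtral"))
    temperature = st.slider("Temperature", min_value=0.0, max_value=1.0, value=0.1)

    st.header("Document")
    uploaded_file = st.file_uploader("Upload a PDF file", type="pdf")
    col1, col2 = st.columns(2)
    start_page = col1.number_input("Start page", value=0, min_value=0)
    end_page = col2.number_input("End page", value=-1)  # -1 means "until the last page"

    query_type = st.radio("Query type", ["Summarize", "Query"])

if query_type == "Query":
    user_query = st.text_area("User query", value="What is the data used in this analysis?")
else:
    user_query = ""

if st.button("Run"):  # everything under this if runs when the button is clicked
    result = None
    start = time.time()
    ...
```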
Inside the button block, I first initialize my result to None, start my timer, and check that there's a file. If there's no file, I write an error saying "Upload a file." If there is a file, I display a status, run my query by passing all the parameters to that big function, and when it's done I update the status to "Done." If there's an error, I catch it with a catch-all and just display the error. After that, there's an if on the result: if the result is not None, meaning a result was generated, I create a new container with a border labeled "Result" and display the result, wrapping it in a markdown display. Some LLMs generate results in a markdown-like format, so in that case it gets rendered; otherwise it's just shown as text. Then there's a little info bar with the time it took to run the model.

The last thing we haven't looked at is how the query is run, so let's look at that. In the Streamlit helpers file, I have two functions. One is save_uploaded_file. Why? Because the LangChain document loader needs an actual file on disk, whereas the Streamlit file uploader only keeps the file in RAM. So I take that file, write it to the temporary folder, and return the path to that temporary file. I use it inside run_query, which relies on the different functions we created before. It takes the uploaded file, straight from Streamlit, then a Boolean saying whether to summarize (if not, it's a query), the user query, which can be empty when summarizing, the start page, the end page, and the model parameters. It first loads the file and then either summarizes or queries. To load the file, I call my save_uploaded_file function, but before that I write a status update. What isn't obvious at first glance is that our query runs inside a "with status" block, so anything written within that block becomes part of the status widget; if we go back to run_query, the st.write calls end up inside that status bar, just providing more information about progress. We save the uploaded file, and once it's on disk we load the PDF and use unlink to delete the temporary file. Then we either summarize, calling our summarize-document function, or query, calling our query-document function, and return the result. And that's all there is to it (sketched below).

If we run our Streamlit app with "streamlit run doc_app.py", it loads the page. Here I have the model name, loaded from my .env file, which I could change if I wanted, and the temperature. I load my PDF file, and it says "File uploaded successfully." I can set the start and end page, say two or three, something like that, and then either query or summarize. If I click "Query," the text box appears, but here I'll just do a simple summary, so I click "Run." We see the status updates: saving the uploaded file, loading the documents, summarizing the document, and there it is. It took a little over a minute, and we have our result, the summary of the paper.
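For reference, a rough sketch of the save_uploaded_file helper and the run_query flow described above. The function signatures, the load_pdf, summarize_document, and query_document names, and the status messages are approximations based on the walkthrough, not the repository's exact code.

```python
# Sketch: Streamlit helpers — persist the upload to disk, then run the query
# inside a status block so st.write() calls show up as progress messages.
from pathlib import Path
import tempfile

import streamlit as st


def save_uploaded_file(uploaded_file) -> Path:
    # The LangChain loader needs a real file; the Streamlit upload lives in RAM.
    temp_path = Path(tempfile.gettempdir()) / uploaded_file.name
    temp_path.write_bytes(uploaded_file.getvalue())
    return temp_path


def run_query(uploaded_file, summarize: bool, user_query: str,
              start_page: int, end_page: int, model_name: str, temperature: float) -> str:
    st.write("Saving the uploaded file...")
    file_path = save_uploaded_file(uploaded_file)

    st.write("Loading the documents...")
    docs = load_pdf(file_path, start_page, end_page)   # hypothetical helper from documents_llm
    file_path.unlink()                                  # delete the temporary file

    if summarize:
        st.write("Summarizing the document...")
        return summarize_document(docs, model_name, temperature)      # hypothetical helper
    st.write("Querying the document...")
    return query_document(docs, user_query, model_name, temperature)  # hypothetical helper


# In doc_app.py, the button handler wraps run_query in a status block:
# with st.status("Analyzing the document...") as status:
#     result = run_query(uploaded_file, query_type == "Summarize", user_query,
#                        start_page, end_page, model_name, temperature)
#     status.update(label="Done", state="complete")
```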
So, now you have all the building blocks. You can play with the prompts and try your own types of queries; let me know how it works for you and what I should try. You can also try different models: here I'm using Mixtral, and Llama 2 is obviously much faster; it's not as powerful, but I have also had good results with it. Let me know in the comments what you try, what works, and what doesn't. This is how you can summarize PDF documents locally, privately, on your own computer. That's it. If you enjoyed the video, please like and subscribe.
Info
Channel: Vincent Codes Finance
Views: 2,232
Keywords: researchtips, research, professor, datascience, dataanalytics, dataanalysis, bigdata, data science, python pandas, big data, chatgpt, gpt, ollama, artificial intelligence, chat gpt, machine learning, uncensored, opensourceai, llama2, mistral, private, privacy, opensource, javascript, code, programming, python, langchain, streamlit
Id: Tnu_ykn1HmI
Length: 28min 57sec (1737 seconds)
Published: Sat Mar 30 2024