Summarizing and Querying Multiple Papers with LangChain

Captions
In this video you're going to learn how to summarize and query multiple papers using LangChain. Let's go!

Okay guys, so to get started, what we're going to do is import our dependencies, then we're going to start with summarization first, and then we're going to go for querying. The papers we're going to be working with are these three: we have the LLM+P paper, "Empowering Large Language Models with Optimal Planning Proficiency"; then we have "Learning to Prompt for Vision-Language Models", from 2022; and finally we have a recent paper, "Scaling Transformer to 1M Tokens and Beyond with RMT". So these are the three papers we're going to be summarizing and then querying.

To get started it's actually quite simple. All we're going to do is import our dependencies: we import load_summarize_chain from LangChain's summarize module for the summarization; then we import PyPDFLoader for loading the unstructured PDFs; and then we import OpenAI, for access to the LLM (in this case we're going to be using ChatGPT), and PromptTemplate, in case we want to do custom summaries for the papers we're working with. After that we import glob, which we're going to use to loop over all the PDFs in a folder, and that's pretty much all we need.

Now, for the actual summarization, we start by setting up our LLM with OpenAI. We set the temperature to 0.2, which is relatively low, because we want the summarization to be as precise as possible. Then we're going to set up our function for the summarization, and the approach we're going to take is very simple: we're just going to have a folder with the PDFs of the papers we want to summarize, and we're going to loop over that folder and get the summaries for each individual paper.
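As a rough sketch, the setup just described might look like this. It assumes the pre-1.0 LangChain module layout from mid-2023 and an OPENAI_API_KEY in the environment; the "./pdfs" folder name is an assumption, and PromptTemplate is imported only in case you want custom summary prompts:

```python
# Sketch of the per-folder summarization setup, assuming the pre-1.0
# LangChain module layout and an OPENAI_API_KEY in the environment.
import glob

from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import PyPDFLoader
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate  # only needed for custom summary prompts

# Low temperature so the summaries stay as precise as possible.
llm = OpenAI(temperature=0.2)

def summarize_pdfs_from_folder(pdfs_folder):
    summaries = []
    for pdf_file in glob.glob(pdfs_folder + "/*.pdf"):
        loader = PyPDFLoader(pdf_file)
        docs = loader.load_and_split()  # load the PDF and split it into chunks
        chain = load_summarize_chain(llm, chain_type="map_reduce")
        summary = chain.run(docs)
        print("Summary for:", pdf_file)
        print(summary, "\n")
        summaries.append(summary)
    return summaries

summaries = summarize_pdfs_from_folder("./pdfs")
```

This is a sketch, not a definitive implementation; newer LangChain versions move these imports around.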
So we're going to set up a function called summarize_pdfs_from_folder, and we're going to feed it the PDF folder. What we do is set up an empty list that's going to contain our summaries, then loop over all the PDFs in the PDF folder, and for each PDF we load it with PyPDFLoader. Then we set up the indexing with LangChain by first loading and splitting the documents into chunks, and then running the summarization chain; in this case I'm using map_reduce, which I found works very well. So that's it: we load the document and split it into chunks, we use load_summarize_chain to create the chain for the summarization with the LLM we set up, and then we run that summarization chain and save the result to the empty list we set up at the beginning. I'm also printing each summary, and finally we return the summaries.

Okay, so that's perfect. Now let's test our function: I'm just going to call the summarization function, feed it the PDFs folder where I have all the papers I just mentioned, and run it. I'm going to do a magic trick so that this goes a little bit faster.

Okay, so we got our first summary: "This paper presents CoOp, a prompt learning method that uses learnable context vectors to optimize pre-trained vision-language models for downstream image recognition tasks. Experiments on 11 datasets show that CoOp outperforms hand-crafted prompts and the linear probe model and is robust to domain shifts. The paper also discusses various research papers related to computer vision, including topics like natural adversarial examples, etc." So the first summary is perfectly aligned with this paper, "Learning to Prompt for
Vision-Language Models". It talks about CoOp, which is this context optimization framework, an approach for adapting CLIP-like vision-language models for downstream image recognition tasks. So, awesome. And there we have the two other summaries, which we can take a look at here: one presents the Recurrent Memory Transformer, a model designed to process sequences longer than a million tokens (that's perfect, that's exactly what that paper is about), and then we have "this paper introduces LLM+P, a framework that combines the strengths of large language models and classical planners to solve long-horizon planning problems". All of these are perfectly aligned with the papers we have here in this folder, so this is awesome.

Now that we've done this, instead of having these summaries sitting on the terminal, we're going to save them to a file; we're going to save all the summaries into one .txt file. I'm going to loop over the summaries, write each one, and then jump a few lines so that the summaries are separated, and that's pretty much it. Now if I come back here and open the summaries file side by side, you can see the summaries: we have our first summary, about the first paper, and obviously we can organize this output however we want. If I were doing specific research on these topics I might save the summaries in some particular way, but right now it's just for you guys to see what they look like, and I already checked the outputs, so I'm quite satisfied with their quality. As a researcher, or someone who does independent research, this is just a priceless tool to have, and if you can have it as code instead of something you have to subscribe for, you have many more customization options and things you can do.
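A minimal sketch of that dump-to-file step; the summaries list here is placeholder data standing in for the chain's real output, and the "summaries.txt" file name is an assumption:

```python
# Hypothetical summaries standing in for the summarization chain's output.
summaries = [
    "Summary of the CoOp paper...",
    "Summary of the RMT paper...",
    "Summary of the LLM+P paper...",
]

# Write all summaries into one .txt file, separated by blank lines.
with open("summaries.txt", "w") as f:
    for summary in summaries:
        f.write(summary + "\n\n\n")
```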
So, it's pretty cool: I have the three summaries here for the papers, and that's awesome. Now what we want to do is query those PDFs; we want to query the papers. Because it's not just about reading some random summary: if you're actually doing research, or actually trying to understand what the papers are talking about, you need the ability to interact with those papers. There are a bunch of tools out there, like ChatPDF, that I actually really like, because you can have actual conversations with the PDFs, and that can be quite interesting; but I found that just the ability to quickly query PDFs from a folder is really powerful, so that's what we're going to be doing.

To do that, we're going to import VectorstoreIndexCreator, for the indexing and vectorization of the PDFs so that we can query them, and we're going to import PyPDFDirectoryLoader from LangChain, which allows us to load multiple PDFs from a directory, which is just awesome. We're going to run this, and now I set up my loader using PyPDFDirectoryLoader, feeding it the "./pdfs" folder, and I load the documents, do the chunking, and create the vector store index, using the VectorstoreIndexCreator class and calling its from_loaders method, feeding it a list containing the loader we were just talking about. Now I can run this, and it's indexing all the papers that were here; again, I'm going to skip the part where it's running. Essentially it's using Chroma to do the vectorization.

So now let's go for the first query, on the first paper. To give you guys a little bit of context, I will be querying the CoOp paper, which proposes context optimization, an approach for adapting CLIP-like vision-language models for downstream image recognition tasks; concretely, CoOp models a prompt's context words with learnable vectors while keeping all the pre-trained parameters fixed.
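Under the same pre-1.0 LangChain assumptions as before, the indexing and querying steps might look like this ("./pdfs" is again an assumed folder name):

```python
# Sketch of the querying setup. from_loaders loads, chunks, and embeds
# every PDF in the folder, storing the vectors in a Chroma index behind
# a single queryable interface.
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.indexes import VectorstoreIndexCreator

loader = PyPDFDirectoryLoader("./pdfs")
index = VectorstoreIndexCreator().from_loaders([loader])

# One natural-language query searches across all of the indexed papers.
answer = index.query("What is the core idea behind the CoOp (context optimization) paper?")
print(answer)
```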
All right, cool. So I'm just going to ask, "What is the core idea behind the CoOp (context optimization) paper?"; that's all I'm going to say, and now I'm querying the paper. And I got my response: the core idea is to model a prompt's context words with learnable vectors while keeping the entire set of pre-trained parameters fixed, in order to adapt CLIP-like vision-language models for downstream image recognition tasks. That is exactly what this paper is about, and it makes total sense for this paper, so this is a perfect question-and-answer result.

Now I'm going to query the second paper, "Scaling Transformer to 1M Tokens and Beyond", and all I'm going to ask is, "What is the central idea that can allow for scaling transformers to 1 million tokens?" It's a very general, superficial question, but just because. I query now, and the answer is: the central idea is to use the Recurrent Memory Transformer architecture to extend the context length of BERT, allowing it to store and process both local and global information across up to 2 million tokens. So that is perfect, and it's actually what's on the first page of the paper, essentially. And that's really cool, because we're using just one centralized way to query multiple papers; I can imagine creating some database that I can query with natural language, one that knows all the papers from some research I'm doing. Whatever the case may be, this is really interesting.

Okay, so for the last paper, let's just ask something like, "According to the LLM+P paper, how do you empower large language models with optimal planning proficiency?" And now we're going to query, and we get the result: it takes in a natural language description of a planning problem and returns a correct plan for solving that problem in natural language, by converting the natural
language description into a file written in the Planning Domain Definition Language (PDDL), then leveraging classical planners to quickly find a solution, and then translating the found solution back into natural language. Now this is really cool: PDDL, the Planning Domain Definition Language, lets classical planners quickly find a solution, which is then translated back into natural language. So yeah, that's perfect, and as far as summarization and queries go, this is how you do it; it's essentially very simple. I'm going to set up a repo with this so you guys can access it and play around with the code.

All right guys, so that's it for today. If you liked this video, don't forget to like and subscribe, and see you next time. Cheers!
Info
Channel: Automata Learning Lab
Views: 12,739
Id: p_MQRWH5Y6k
Length: 11min 25sec (685 seconds)
Published: Tue May 02 2023