Workaround OpenAI's Token Limit With Chain Types

Captions
So you got yourself in a bit of trouble: OpenAI returns an error saying you have exceeded the token length. That's an issue, and we're going to show you four different ways to fix it.

First, let's set up the problem one more time. I'm going to copy and paste a short passage into the playground on OpenAI (this is the same for the API) and say, "Hey, please summarize this for me." You'll notice it's thinking about it, it's trying... oh no, the model can only process a maximum of 4,000 tokens in a single request. This is an issue, and if you're running a business or making a product based on OpenAI, you're going to have to work your way around it. Now, there's word that OpenAI's Foundry will increase this token length, but it's going to cost five or six figures a year just to be able to use it, so I'm not holding my breath. Regardless, I imagine model limits will always be a problem, so it's good to learn how to work around them, and that's what we're going to figure out right now.

The way I want to talk about these four methods is by starting with a diagram, because when you get to the code it can be kind of confusing. So let's go into the diagram first and use some pictures. I like pictures.

Let's reframe the problem one more time: OpenAI has a 4K token limit. Scenario one: you give a prompt, it gives you a response, and the total is still below 4K, so you're golden. Scenario two: a short prompt and a longer response; as long as it's under 4K, you're still golden. Scenario three: a long prompt and a short response; you're still good to go. The issue comes when you have a long prompt and a long response, or any combination of the two, and you exceed the 4K.

So let's figure out how to fix this. Method number one is what we call stuffing. Well, I shouldn't really call it a solution; it's a method of prompt management, if you will. In this case we have a document we want to summarize, and it's only 2K tokens long, so we can feed it straight into OpenAI, say, "Hey, please summarize this for me," and get a response while staying under the 4K limit, which is a good thing. However, if our document is too long, we run into the issue again: we have the 4K limit, but our document (or documents) is 8,000 tokens long. We can't feed all of that into OpenAI; it's going to throw an error at us. The pros of stuffing: you make one API call and all of your data is in the prompt, which is a good thing because the language model has all the context it needs. The con is the limited context length; you're going to run into that error.
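Before stuffing a document into the prompt, it helps to know whether it will actually fit. The video doesn't show this step, but here's a minimal sketch assuming the `tiktoken` library; the model name and limit are placeholders you'd swap for your own:

```python
import tiktoken

# Pick the tokenizer that matches the model you plan to call.
# "text-davinci-003" is an assumption; use your own model name.
enc = tiktoken.encoding_for_model("text-davinci-003")

def fits_in_context(text: str, limit: int = 4097) -> bool:
    # Count the prompt's tokens; remember the completion needs room too.
    n_tokens = len(enc.encode(text))
    print(f"{n_tokens} tokens")
    return n_tokens < limit

fits_in_context("Please summarize this for me: ...")
```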
The second method we're going to look at is called MapReduce. This is an interesting one because again we have our 4K token limit, but our document is 8K tokens long. So what do we do? First we slice the document into individual pieces, and then we pass each piece over to OpenAI. Instead of one API call with one prompt, we make four separate API calls with four prompts, each saying, "Hey, it would be great if you could please summarize this for me." In response we get four different summaries, because we split the document into four chunks. Then we make a fifth call on top of that and say, "Given all these summaries you just produced, give me a final summary," a summary of the summaries. That's MapReduce. The pros: you can scale it to pretty large documents, which is cool, and it can be parallelized, meaning you can make all four API calls at once; you don't need to wait for one to return before making the next. The cons: you start making more API calls compared to the stuffing method, and you might lose a little information because in some cases you're doing summaries on top of summaries.
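Here's a minimal sketch of the map-reduce idea in plain Python, assuming a hypothetical `llm(prompt) -> str` helper that wraps a single completion call (LangChain's built-in version appears in the code section below):

```python
def map_reduce_summarize(chunks, llm):
    # "Map" step: summarize every chunk independently.
    # These calls don't depend on each other, so they could run in parallel.
    partial_summaries = [
        llm(f"Write a concise summary of the following:\n\n{chunk}")
        for chunk in chunks
    ]
    # "Reduce" step: one final call that summarizes the summaries.
    combined = "\n\n".join(partial_summaries)
    return llm(f"Write a concise summary of the following:\n\n{combined}")
```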
Method number three is the refine method. With this one we still split up our document, but we start by passing it just chunk number one and saying, "Hey, please generate a summary." Okay, cool. With chunk number two, we give it summary number one that we already got, and we say, "Given this existing summary and the context from chunk number two, please combine the two and give us a new, refined summary." This keeps going on and on until you get to the end of your chunks, and that final piece is the fully refined summary, if you will, your final output. The pro is that you get pretty relevant context, because you can carry the important parts across your chain. The con is that these are all dependent calls; it's a synchronous process where you need to wait for one, then the next, then the next, so it can take a long time.

Method number four is called map re-rank, and this one is more for specific questions rather than summaries; in fact, the library we're using today only supports it for questions. The way it works is that we still split our documents, but this time we pose a question to each chunk, and the method asks the model, "How confident are you that the answer you've given from this chunk is the final answer we actually need?" So we ask our question of the first chunk, and the model is 80% confident it has the right answer. We do it for chunk number two, and there's only 25% confidence it's the right answer. This is all just the language model scoring itself, so it isn't a scientific process. Then at the very end, and this is where the re-rank part comes in, you rank the scores and return the answer that had the highest one. It would be difficult to do this for a summary, which is why you only do it for question answering. The pros: it scales well, and it's good for single-answer questions, so not very complex questions. The con is that you're not combining any information between documents; when you compare chunk one and chunk two, there's no sharing of information across them.

All right, those are the four methods in diagram form; let's go check them out in code. We're going to be using the LangChain library. I don't think I need to tell you, but LangChain is extremely good at file loading, document management, prompt management, and chaining all these things together; it's really the magic behind how we're doing everything here, so if you haven't checked it out, please go check it out. I'm going to load up some libraries for us: this includes the file loader, the summarize chain (which is going to do the summarizing for us), and a QA chain (which is going to do question answering for us). Let's load up some documents. We have a John Muir essay about Lake Tahoe and a Paul Graham essay about work. Exciting... well, I like programming. I run a quick summary function I made on our docs: we have one document, about 2,200 to 2,300 words, with a preview: "The glory of the Sierra..." How beautiful, how poetic. Then let's look at Paul's essay: one document, about 12,500 words, so quite a bit larger, with a preview: "Before college the two main things I worked on, outside of school, were writing and programming." Not quite as poetic as Mr. Muir, but we'll let it slide.

Let's load up our LLM; in this case we're doing OpenAI, passing in our key. Okay, cool. Then we load our summarize chain, and this is going to be with the stuff method, the first method we talked about: we take our entire document and stuff it into the prompt. I like how visceral that one sounds. Let's run it on our small doc, and I set verbose=True because that shows us what's under the covers and what LangChain is actually doing. "Write a concise summary of the following" (this is a LangChain prompt, by the way), then it inserts our text. And LangChain gives us back a concise summary: "In this article..." and so on. So we have our summary of the small doc, which is cool. Now, if we do this with the large doc, LangChain does the same exact thing: "Write a concise summary of the following," and then the following, but this is quite large, and this is where the issue is, because down at the bottom: oh no, this model's maximum context length is 4097 tokens. So how do we get a summary of this larger doc? That's where the other methods come in, so let's talk about those.
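A minimal sketch of this setup, assuming LangChain's early-2023 API and `TextLoader` with placeholder file paths (the video doesn't show which loader or paths it uses):

```python
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import TextLoader

# Load the short John Muir essay (path is a placeholder).
small_doc = TextLoader("data/john_muir_lake_tahoe.txt").load()

llm = OpenAI(openai_api_key="YOUR_API_KEY")

# "Stuff" method: the entire document goes into a single prompt.
chain = load_summarize_chain(llm, chain_type="stuff", verbose=True)
print(chain.run(small_doc))

# Running the same chain on the ~12,500-word Paul Graham essay fails:
# the model's maximum context length is 4097 tokens.
```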
The first one we'll look at is MapReduce, so I'll say chain_type="map_reduce" and verbose=True. Now, if we run this on the small doc, no surprise, it's more or less the same thing we got with stuffing, because stuffing worked and MapReduce just has one document to work with. So let's not even worry about that, but to prove it to you, you can see down at the bottom we get more or less the same summary.

The problem with our large doc is that it's one big, huge chunk, and we need to split it into smaller chunks. The way I'm going to do that is with LangChain's RecursiveCharacterTextSplitter. Okay, cool, we set this up: I'll say chunk_size=400 (I would normally make this much bigger, but I'm keeping it small just to show you how it works) and chunk_overlap=0, meaning I don't need any Venn-diagram-style overlap between chunks. Then I say, "Hey, split my documents," passing the large doc, and I put the result into large_docs. I know that's not wonderful naming, but that's what we're going to do. Running a summary of that: I now have about 200 documents with roughly the same total word count as before, and we still have our preview. Okay, cool, but the important part is that instead of one big doc, we now have about 200 smaller docs. Now I run the MapReduce chain we just made, but only on the first five documents, because 200 is way too many and I don't want to spend all that cash querying the API.

Here's where the cool part starts to happen. What LangChain does is say "Write a concise summary of the following," and then it gives it one short chunk; it's not passing the entire thing, just that one chunk. So there's section number one, section number two, three, four, and five, because I said to give it just the first five sections. Then it takes all those summaries, summary one through summary five, and says "Write a concise summary of the following" again, a summary of the summaries. And so we finally get a summary of our entire document, which was way too big for the prompt, via the MapReduce method. Awesome.

Let's look at this one more time, but using the refine method. I'll set chain_type="refine" and verbose=True, and again run it on just the first five documents. This is where it gets kind of interesting. The very first call it makes (remember, this is not in parallel) is "Write a concise summary of the following," with the first chunk. Then here's the interesting part, where LangChain inserts this prompt when talking to OpenAI: "Your job is to produce a final summary. We have provided the existing summary up to a certain point" (this is the summary it pulled from chunk number one) "and we have the opportunity to refine it using this extra context. Given the new context, refine the summary." All right, cool, so that's chunk number two. Then with chunk number three, oh, interesting: now we have a longer summary, because it had two chunks to go off of, plus part number three, and it keeps saying "give me a new summary" again and again. So we keep refining and refining and refining, which is why this one takes a little bit longer. That's the refine method.
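Continuing the sketch above (same early-2023 LangChain API, placeholder paths and key), the splitting plus the map_reduce and refine chains look roughly like this:

```python
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = OpenAI(openai_api_key="YOUR_API_KEY")

# Load the long Paul Graham essay (path is a placeholder).
large_doc = TextLoader("data/paul_graham_essay.txt").load()

# chunk_size=400 is deliberately tiny for the demo; use a larger size in practice.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=0)
large_docs = text_splitter.split_documents(large_doc)  # ~200 small Documents

# MapReduce: summarize each chunk, then summarize the summaries.
map_reduce_chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)
print(map_reduce_chain.run(large_docs[:5]))  # first five chunks only, to limit API spend

# Refine: carry a running summary forward, refining it one chunk at a time.
refine_chain = load_summarize_chain(llm, chain_type="refine", verbose=True)
print(refine_chain.run(large_docs[:5]))
```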
Whether it's a good alternative depends on your use case; I suggest you try these methods out and see how it goes.

For the final method, instead of summarization we're going to do question answering, which is what map re-rank is for. Again I'll say verbose=True, but in this case I also want to return the intermediate steps, which is just a fancy way of saying, "Hey, show me even more of what's under the hood." So we've got our chain and our query: "Who was the author's friend who he got permission from to use the IBM 1401?" I saw this referenced in the document, which is why I'm pulling it out. I input just the first five docs again, give it my question, and return the outputs. Let's go ahead and run this. What it's doing is kind of a complicated prompt, but it's cool to see: "Use the following pieces of context to answer the question at the end. In addition to the answer, also return a score of how fully it answered the user's question." Not only does it specify the output format and how to determine the score, it also gives the model examples of how to score, which is kind of interesting. Just by the way, this is pretty good prompt engineering, a fine example of it. We have the context right here, and then the final question, "Who is the author's friend...?" Then it does the same thing for chunk number two, chunk number three, and so on, and it finishes the chain. So it went through those five chunks, asked the question of each one, and ranked the answer from each. Let's see what we got: Rich Draves, which (I won't show it, but it is in the essay) is the right answer. "My friend Rich Draves" is pulled out of one of the chunks, which is cool. Now let's take a look at the intermediate steps. It went through the five docs we passed it, and for most of them (I don't know which number each was) it said "This document does not answer the question," score of 0: does not answer, does not answer, does not answer. But for one document it did, and it gave that answer a score of 100, which is why that answer was returned. Super cool. That is the map re-rank method.

So in the end, there are four different methods of prompt management, or kind of like query management if you will, for chaining your different commands together to fit your use case. Have fun, let me know which ones work for you, and we'll see you later.
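To round out the walkthrough above, here is a minimal sketch of the map re-rank question-answering call, under the same assumptions as the earlier sketches (`large_docs` is the list of split Documents from the previous sketch):

```python
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

llm = OpenAI(openai_api_key="YOUR_API_KEY")

# Map re-rank: ask the question against each chunk, have the model score
# how fully its own answer addresses the question, and return the top answer.
qa_chain = load_qa_chain(
    llm,
    chain_type="map_rerank",
    verbose=True,
    return_intermediate_steps=True,  # expose each chunk's answer and score
)

query = "Who was the author's friend who he got permission from to use the IBM 1401?"
result = qa_chain(
    {"input_documents": large_docs[:5], "question": query},
    return_only_outputs=True,
)

print(result["output_text"])         # e.g. "His friend Rich Draves"
print(result["intermediate_steps"])  # per-chunk answers with their scores
```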
Info
Channel: Greg Kamradt (Data Indy)
Views: 40,342
Id: f9_BWhCI4Zo
Length: 15min 52sec (952 seconds)
Published: Wed Mar 01 2023