RAPTOR: New Retrieval Method for RAG | Chat with your Data using GPT

Captions
I just found another issue with chat-with-your-data apps that use retrieval augmented generation, or RAG.

You just had your coffee, right?

Yes.

I can guess; you're over-caffeinated again. Okay, what is the shortcoming you discovered with RAG implementations?

Well, sometimes you ask questions whose answers need a holistic understanding of the context of your data, and just bringing the relevant chunks of data into the prompt is not enough to answer them. Let's say you have indexed the Harry Potter book series, and you want to ask how the series explores the concept of power through different characters. You cannot just bring relevant chunks of the books into the prompt to answer that; you need a bigger picture of the context of the data, and that doesn't work with the current RAG implementation.

Well, there's a solution for it, called RAPTOR. RAPTOR is a new retrieval technique that recursively embeds, clusters, and summarizes your data, over and over again, and uses a tree-based retrieval approach that considerably improves your RAG performance and is particularly good for those holistic types of questions.

What? I have nothing to say; that was the end of our script. You didn't write any dialogue for this part; you forgot our dialogue again. Can we just go ahead and show them the demo, or do we have to record again? You know what, let's go. Before we start, make sure you subscribe and hit the bell icon so you get notified about the next video. Thank you.

All right, let's get into RAPTOR, or to spell it out, Recursive Abstractive Processing for Tree-Organized Retrieval. As you can guess from the name, it's a new retrieval method, and that's what we're going to discuss. But before we talk about how RAPTOR retrieves your data when you chat with your data, let's talk about the challenge we can potentially resolve by leveraging it.

With RAG, or retrieval augmented generation, the question is how we can better retrieve our knowledge into the prompt so we can ask questions about our own data. LLMs have no idea about our private data, because they have not been trained on it, so RAG is the technique that connects our own data to large language models. If you don't know how to chat with your data or what RAG is, I have already created a video about it, so make sure you watch that first; I'll add that video to the top right of the screen.

Okay, so I assume you already know what RAG is and how you typically retrieve your knowledge, as chunks of data, into the prompt to answer your question or chat with your data. But what's the challenge with these typical retrieval approaches? I'd like to answer that by referring to the published RAPTOR paper. As the authors mention, sometimes you are chatting with your long-tail knowledge. What does that mean? Sometimes you ask questions whose answers need a holistic understanding of your overall document context, not just specific chunks, like the example from the intro of this video. If you have a long book, a long PDF, or a series like the Harry Potter books, and you ask something like "How is the concept of power portrayed through different characters in the Harry Potter series?", you cannot just bring relevant chunks into the prompt; the LLM needs to know the overall document as well as some of the details.
That means the typical approach of just chunking the data and bringing in the relevant chunks is not enough, and we might fail to answer those questions. This is the challenge RAPTOR addresses through a new, tree-organized retrieval method, which we're going to discuss and then see implemented shortly. The authors report that, on the datasets they benchmarked, it improved accuracy by up to 20%.

Okay, so let's see how RAPTOR addresses that challenge. This picture shows what RAPTOR typically does; let me open it up. Their implementation is not really hard or complex, but the idea is pretty smart. Just follow my pointer: you can see nodes one to five. Let's assume these are your chunks of data, different chunks of different parts of the Harry Potter book series, your own data, whatever. RAPTOR will start to cluster these chunks of text. Say chunk three and chunk five talk about a similar topic, so we cluster them together (I'll get to how RAPTOR does the clustering in a moment), and the other chunks that look similar likewise end up under their own clusters, so out of five chunks, RAPTOR has automatically created three clusters for me.

Then, using an LLM, say GPT-3.5 or GPT-4 on the backend, it starts to summarize each cluster. For example, the cluster that includes chunks three and five talks about topic A, and its summary becomes node six; the summaries of the other clusters become nodes seven and eight. Then the same thing happens again: the summaries are now treated as new chunks, and they get clustered and summarized in turn. This process of embedding, clustering, and summarizing repeats recursively until the nodes can no longer be clustered together. For example, if nodes six, seven, and eight are each a different cluster by themselves and cannot be put under one cluster, we have reached the highest level of context we can gain from our documents.

By the way, I said I'd mention how they cluster the chunks: they use GMMs, or Gaussian Mixture Models. That's a pretty flexible model for clustering because it doesn't necessarily ask you how many clusters you want; it decides that more or less dynamically, which I believe is a great approach.
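To make that construction loop concrete, here is a minimal Python sketch of the embed-cluster-summarize recursion. It is a simplification of what the paper describes (the paper additionally reduces embedding dimensionality with UMAP and uses soft cluster assignments), and `embed_texts` and `summarize` are hypothetical stand-ins for a real embedding model and a real LLM call:

```python
# Simplified sketch of RAPTOR tree construction: embed the chunks,
# cluster them with a Gaussian Mixture Model, summarize each cluster
# with an LLM, then repeat on the summaries until nothing merges.
import numpy as np
from sklearn.mixture import GaussianMixture

def embed_texts(texts):
    # Placeholder: swap in a real embedding model here.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 8))

def summarize(texts):
    # Placeholder: swap in an LLM summarization call here.
    return " ".join(texts)[:200]

def pick_n_clusters(embeddings, max_k=10):
    # BIC-based model selection, so the number of clusters is chosen
    # dynamically instead of being fixed up front.
    ks = range(1, min(max_k, len(embeddings)) + 1)
    bics = [GaussianMixture(n_components=k, random_state=0)
            .fit(embeddings).bic(embeddings) for k in ks]
    return ks[int(np.argmin(bics))]

def build_raptor_tree(chunks):
    layers = [chunks]                        # layer 0: the leaf chunks
    while len(layers[-1]) > 1:
        texts = layers[-1]
        embeddings = np.asarray(embed_texts(texts))
        k = pick_n_clusters(embeddings)
        if k >= len(texts):                  # every node is its own cluster:
            break                            # the tree has converged
        labels = GaussianMixture(n_components=k, random_state=0) \
            .fit_predict(embeddings)
        summaries = [
            summarize([t for t, lab in zip(texts, labels) if lab == c])
            for c in range(k)
        ]
        layers.append(summaries)             # summaries become the next layer
    return layers
```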
With this formation of document chunks in place, let's go back to how RAPTOR retrieves your knowledge when you ask questions about your own data. There are two approaches: tree traversal retrieval and collapsed tree retrieval. Let's start with number one: how does tree traversal retrieval work? We have the tree structure I described, built by chunking, then clustering and summarizing again and again, and each of those rounds is a layer. When we say we want to retrieve the top-k, say the top three, documents from our chunks, that means we're going to select three chunks from the overall documents to bring into the prompt. With tree traversal retrieval, the top-k documents are selected out of each layer. In this example, the top-k was one, so out of each layer it selects one document, using cosine similarity to check which node in that layer is the most similar, or relevant, to the question the user asked. Say it picks the second node here, the second node there, and the third node in the last layer: I end up with three nodes, because I had three layers and selected the top one from each, and that retrieved context goes to the large language model to answer the question.

If you go with collapsed tree retrieval instead, it doesn't care about the tree structure anymore: it flattens all the nodes, and when I say, for example, give me the top three, it simply selects the most relevant nodes, up to the top-k I set, retrieves them, adds them to the query, and answers the question.

Now you might ask which one is better and which one you should choose. The authors actually benchmarked this and found that the majority of the time, regardless of the length of the context, collapsed tree works better, because it gives the solution the flexibility to choose the most relevant nodes for the question, and a node can be the summary of multiple chunks or a single chunk, because, as you remember, we did all that clustering. So if it chooses node number six, it is including some data from chunks three and five: that is one selection, but it carries detailed information from underneath, and that's why you're giving a higher level of context to the prompt. I think that's pretty brilliant.
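In the same hypothetical style, reusing the `embed_texts` placeholder from the sketch above, here is roughly what the two retrieval modes boil down to. Note this is deliberately simplified: the paper's actual tree traversal descends from the root through the children of the nodes it selects, whereas this version scores each layer independently, the way it is described above:

```python
import numpy as np

def cosine_top_k(texts, query_vec, top_k):
    # Rank a pool of texts by cosine similarity to the query vector.
    vecs = np.asarray(embed_texts(texts))
    sims = vecs @ query_vec / (
        np.linalg.norm(vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [texts[i] for i in np.argsort(sims)[::-1][:top_k]]

def tree_traversal_retrieve(layers, query, top_k=1):
    # Pick the top-k most relevant nodes from each layer,
    # root layer first, down to the leaf chunks.
    q = np.asarray(embed_texts([query]))[0]
    selected = []
    for layer in reversed(layers):
        selected.extend(cosine_top_k(layer, q, top_k))
    return selected

def collapsed_tree_retrieve(layers, query, top_k=3):
    # Ignore the tree structure: flatten every node (leaf chunks and
    # cluster summaries alike) into one pool and take the global top-k.
    pool = [node for layer in layers for node in layer]
    q = np.asarray(embed_texts([query]))[0]
    return cosine_top_k(pool, q, top_k)
```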
Okay, so now we have an understanding of how RAPTOR works in general, and you can of course read the paper for more: they have benchmarked this solution over multiple different datasets and against other retrieval approaches, so check out the results. But here's a quick implementation of how you can do this yourself. I will add the code reference link to the Discord channel: below this video there's a description section, and in the description I have added the Discord channel link. Click it, and when you join the Discord channel there's a references section; click on references and you will see the links to all the code relevant to this video, and even to all the previous videos I have recorded.

So, to start, you first need to install a couple of packages, listed in the notebook in the references. This is developed by LlamaIndex, which is why you'll see you need it installed, and from LlamaIndex I'm importing the Raptor pack. I want to chat with my own data, but since I don't have any specific data handy, this example uses the PDF of the RAPTOR paper itself as the data to chat with, so this code simply downloads the paper. Of course it needs an LLM, so make sure you add your OpenAI key; I added mine and removed it after recording this video. And this part is important in case you want to run an asynchronous event loop in Python, especially inside a Jupyter notebook. I'm using Google Colab, but that doesn't mean you have to; you can use any Python environment you have.

Then, simply using LlamaIndex, I load that PDF file (or, later on, whatever data you have), and I create a vector database out of it. You can use your own vector database, but this demo notebook uses an open-source one called ChromaDB, created in the local directory where I'm running this. We then create a collection in this vector database, call it raptor, and assign it as our vector store.

Now, because we have imported the Raptor pack, I can call it to start building the tree-based retrieval formation we discussed. These are my documents that I loaded at the top; I generate embeddings using a text embedding model (you can certainly change the model, but this is the one that was in the demo, so I didn't change it); the model I use for generating summaries of the clustered chunks is GPT-3.5; and this is the vector store I'm going to use for the retrieval. Remember top-k: that value sets how many chunks to select from each layer if you go with the tree traversal approach, or the total number of chunks to retrieve if you go with the collapsed approach. Here we have set the mode to collapsed, and that's it; the rest is the chunk size and the overlap, so you don't miss any information across chunk boundaries.

And this is the structure of my tree: starting from the first layer, I have 11 clusters generated out of my chunks, and the next layer has just one cluster. This can go on recursively, and as you can see, I ended up with two layers, or, counting layer zero, actually three layers in my tree-based knowledge structure.

Okay, now it's time to ask a question, not to get the answer yet, but just to see whether we can retrieve relevant chunks. Let's ask, "What baselines is RAPTOR compared against?" This is a question from the PDF, and with the mode set to collapsed you can see that some selected chunks, including node number two, were brought into the prompt. If I go with tree traversal, this is how you change the mode (I covered the difference between these modes in the paper section), and as I mentioned, it selects two chunks under each layer, because the top-k value was two; that's why more chunks of data come in now.

But when you want to answer a user's question, not just inspect the retrieved knowledge, you have to create a retriever, specifying the embedding model used to embed the question (because behind the scenes it does cosine similarity again), the top-k, your vector database, and again the mode, which this time is tree traversal. After wrapping that in a query engine, I just type my question, again the same one, "What baselines is RAPTOR compared against?", and it gives me the answer directly, produced from the retrieved chunks that the user won't see.
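For reference, here is a compact reconstruction of the notebook being walked through, based on the public LlamaIndex RaptorPack example. Treat it as a sketch rather than the video's exact code: module paths, model names, and parameter defaults may differ across llama-index versions.

```python
# pip install llama-index llama-index-packs-raptor \
#             llama-index-vector-stores-chroma chromadb
import os
import nest_asyncio

nest_asyncio.apply()  # allow a nested async event loop inside a notebook

import chromadb
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.packs.raptor import RaptorPack, RaptorRetriever
from llama_index.vector_stores.chroma import ChromaVectorStore

os.environ["OPENAI_API_KEY"] = "sk-..."  # use your own key

# The data to chat with: here, the RAPTOR paper PDF itself.
documents = SimpleDirectoryReader(input_files=["./raptor_paper.pdf"]).load_data()

# Open-source vector store (ChromaDB) persisted to a local directory.
client = chromadb.PersistentClient(path="./raptor_paper_db")
collection = client.get_or_create_collection("raptor")
vector_store = ChromaVectorStore(chroma_collection=collection)

# Build the tree: chunk, embed, cluster, summarize, recursively.
raptor_pack = RaptorPack(
    documents,
    embed_model=OpenAIEmbedding(model="text-embedding-3-small"),
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),  # cluster summarizer
    vector_store=vector_store,
    similarity_top_k=2,  # per layer (tree traversal) or in total (collapsed)
    mode="collapsed",
    transformations=[SentenceSplitter(chunk_size=400, chunk_overlap=50)],
)

# Retrieval only: inspect which nodes come back for a question.
nodes = raptor_pack.run("What baselines is RAPTOR compared against?",
                        mode="collapsed")
print(len(nodes), nodes[0].text[:200])

# Same question, but walking the tree layer by layer instead.
nodes = raptor_pack.run("What baselines is RAPTOR compared against?",
                        mode="tree_traversal")

# Full question answering: wrap a retriever in a query engine.
retriever = RaptorRetriever(
    [],  # no new documents; the tree is already in the vector store
    embed_model=OpenAIEmbedding(model="text-embedding-3-small"),
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
    vector_store=vector_store,
    similarity_top_k=2,
    mode="tree_traversal",
)
query_engine = RetrieverQueryEngine.from_args(
    retriever, llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1))
print(query_engine.query("What baselines is RAPTOR compared against?"))
```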
All right, that was a quick overview of a recent novel idea on how to better retrieve your knowledge when you build your own RAG and chat with your data, especially for questions that need a high-level, holistic understanding of the context of the data, where a simple RAG implementation might fail. I think it's worth taking a look, implementing it, and checking how it performs in your own use case. I haven't tried it at a larger scale, but based on the benchmarks they have showcased and the concept we talked about, it seems very promising. So give it a try and let me know what you think; write down in the comments if you see any pros and cons, and if you have anything to add as feedback in general, I would love to read it, and the community will appreciate it as well. If you like the video, please make sure you click the like icon. Thank you so much. By helping others, you are paying the rent for the room you have here on Earth. Dream big, my friends, believe in yourself, and take action. Till the next video, take care.
Info
Channel: MG
Views: 3,673
Keywords: finetune your own llm, large language models, vector database, train gpt on documents, train gpt on your own data, how to train chat gpt 4 on your dataset, retrieval augmented generation, large language models explained, retrieval augmented generation chatgpt, retrieval augmented generation architecture, retrieval augmented generation (rag) tutorial, train custom gpt, machine learning, Connect your data to chatgpt, how to train your custom gpt, train gpt on pdf
Id: c9EVkLDd6_Q
Length: 16min 0sec (960 seconds)
Published: Tue Mar 05 2024