Is Tree-based RAG Struggling? Not with Knowledge Graphs!

Video Statistics and Information

Captions
Long-context models are out; what's the future of RAG? Something's missing here. This really surprised me, because we clearly found Joe Hisaishi in the context, but the system was not aware of him. Let's go to our knowledge graph and fact-check. I'm not so sure whether it's a lost-in-the-middle thing or whether there's something wrong with the vector similarity search. This is the enhanced answer versus the original answer; now you see the obvious difference, right?

Ever since long-context models came out, such as Google Gemini Pro 1.5 or the open-source Large World Model, both claiming they can process documents of 1 million tokens, their results from the needle-in-a-haystack test have looked pretty promising. This test checks how well a model can find the relevant information it needs from a huge amount of text. From previous research, "lost in the middle," language models seem to struggle to fetch information, especially if it's not placed at the very start or the very end. If these long-context models really turn out to be as amazing as the reports show, does that mean we'll never see the lost-in-the-middle phenomenon again? And does it also mean the end of RAG? At this point you're probably thinking: another debate between long context and RAG. If that's you, you're at the right place, because we're not taking that direction. I mean, why not both? Long-context RAG.

If you've been closely following LlamaIndex or LangChain, you probably already saw them using RAPTOR in some of their RAG applications. So let me briefly explain what RAPTOR is and why it can, sort of, bring long-context features into RAG. There are three main steps in RAPTOR. First, instead of chunking documents as we usually do, it embeds directly at the document level. Then it groups similar documents together as clusters to form high-level summaries. It's a recursive process that builds what they call a document tree. The document tree has a hierarchical structure, which allows the retrieval process to reference both high-level concepts and the more granular details in the individual documents.
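The three RAPTOR steps described above (embed, cluster, summarize, recurse) can be sketched roughly as follows. This is a toy illustration, not the actual RAPTOR or LangChain code: `embed()` is a fake character-frequency embedding and `summarize()` a fake LLM summary, stand-ins so the recursive control flow is runnable; a real system would call an embedding model and an LLM at those two points.

```python
# Toy sketch of a RAPTOR-style recursive document tree.
# embed() and summarize() are hypothetical stand-ins for a real
# embedding model and an LLM summarizer.

def embed(text):
    # Fake embedding: 26-dim letter-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def cluster(docs, threshold=0.8):
    """Greedy clustering: a doc joins the first cluster whose seed it resembles."""
    clusters = []
    for doc in docs:
        for c in clusters:
            if cosine(embed(doc), embed(c[0])) >= threshold:
                c.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters

def summarize(docs):
    # Stand-in for an LLM-written summary of one cluster.
    return "SUMMARY(" + " | ".join(d[:20] for d in docs) + ")"

def build_tree(docs, max_levels=3):
    """Recursively cluster and summarize; every level gets indexed for retrieval."""
    levels = [docs]
    while len(levels[-1]) > 1 and len(levels) <= max_levels:
        summaries = [summarize(c) for c in cluster(levels[-1])]
        if len(summaries) == len(levels[-1]):  # nothing merged; stop recursing
            break
        levels.append(summaries)
    return levels
```

Retrieval then searches across all levels at once, so a query can match either a high-level summary or an individual document, which is the "long-context" flavor the video refers to.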
This is how RAPTOR skips the traditional text splitting, so you don't have to scratch your head over the optimal chunking strategy. Keep in mind that I'm giving you an overly simplified summary of the RAPTOR approach; if you're interested in the details, I included the source in the description, so go check that out.

Now it's time to see how this version of long-context RAG works. This is the RAPTOR code from LangChain, and we'll be following most of its steps to construct a long-context RAG. However, we also added something different here: constructing a knowledge graph. You will see later in the video how knowledge graphs not only can help evaluate RAG performance but also enhance the answers generated from vector search.

Our knowledge source will be Hayao Miyazaki's Wikipedia page; the famous Japanese animator and producer just got his second Oscar. The first step is pretty simple: we just put in his name, and WikipediaLoader will handle everything. Before jumping into the long-context RAG part, we'll use the LLM graph transformer to help us extract entities and relationships from the article. You will see later in the video how cool this is, because lengthy unstructured text data can be visualized, and the powerful things about knowledge graphs are not limited to visualization; you'll know what I mean at the end of the video.

So this is what I'm talking about. Originally these texts were lengthy and unstructured, but now you get to visualize them: the awards he got, the places he stayed, the organizations he worked for, and relationships such as work relationships and skills are now clearly linked and visualized. Now that we have our knowledge graph ready, we'll put it aside and circle back to the long-context RAG. Time to run the code. This is a very long chunk of code, and I'm not going to go through the details.
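The extraction step in the video uses LangChain's WikipediaLoader and an LLM graph transformer to turn the article into entity-relationship triples stored in a graph database. Since that pipeline needs API keys and a live graph store, here is a dependency-free stand-in showing roughly what the resulting structure looks like and how it can be queried. The triples and relation names below are hand-written examples based on the facts mentioned in the video, not real extractor output.

```python
# A toy triple store standing in for the knowledge graph built in the video.
from collections import defaultdict

class TripleStore:
    def __init__(self):
        self.triples = []
        self.index = defaultdict(list)  # subject -> [(relation, object)]

    def add(self, subject, relation, obj):
        self.triples.append((subject, relation, obj))
        self.index[subject].append((relation, obj))

    def objects(self, subject, relation):
        """All objects linked to `subject` by `relation`."""
        return [o for r, o in self.index[subject] if r == relation]

# Hand-written examples of what an LLM graph transformer might extract
# from the Miyazaki article (relation names are made up for illustration).
graph = TripleStore()
graph.add("Hayao Miyazaki", "WON", "Academy Award")
graph.add("Hayao Miyazaki", "WON", "Japan Academy Prize")
graph.add("Hayao Miyazaki", "HONORED_AS", "Person of Cultural Merit")
graph.add("Hayao Miyazaki", "WORKED_WITH", "Isao Takahata")
graph.add("Hayao Miyazaki", "WORKED_WITH", "Toshio Suzuki")
graph.add("Hayao Miyazaki", "WORKED_WITH", "Joe Hisaishi")
graph.add("Joe Hisaishi", "HAS_POSITION", "Composer")
```

In a real graph database the `objects()` call would be a graph query (for example, a Cypher pattern match in Neo4j), but the idea is the same: structured facts you can look up by entity and relation.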
First, I don't want to bore you guys; second, the main focus of this video is really to see the performance of this long-context RAG and how knowledge graphs can help. So I'll just run through everything. This is what we got, and then we add the document tree into the vector database.

Our first test question is: "Tell me about Miyazaki's career achievements." In this answer: Studio Ghibli, Spirited Away, Academy Award. I don't know what you feel about this answer, but a lot of things are probably missing. If you still remember, we have a knowledge graph, so let's go to it and fact-check. Remember, we have the awards here: apparently Miyazaki's achievements are not limited to the Academy Award. You see there are others, such as the Japan Academy Prize and Person of Cultural Merit. Something seems off, but let's move on and test another question.

Our second question is: "Who worked with Miyazaki?" Here we got Takahata. Let's go back to our knowledge graph to fact-check again. The people with direct work relationships with Miyazaki include Joe Hisaishi, Toshio Suzuki, and Takahata. Takahata is correctly identified here, but again something's missing: why are Joe Hisaishi and Toshio Suzuki not included in this answer?

So I followed up with a third question, wanting to make sure the system is aware of the existence of Joe Hisaishi, and this is what I got. It does not clearly say whether Hisaishi is a colleague of Miyazaki. The system did correctly identify Takahata and Suzuki as Miyazaki's colleagues, but in the previous answer Suzuki somehow was not included, and I don't know why. Then I got confused; I started wondering whether Hisaishi was even in the context at all. So this is what I did, and Hisaishi was indeed in the context. For example, in this random paragraph we do find Hisaishi, because this is Miyazaki's work and Joe Hisaishi composed the music; the system should somehow make the inference that they are working partners. Anyway, that was another follow-up question, and it really surprised me.
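The fact-checking loop described here, comparing a generated answer against entities the knowledge graph says should appear, can be sketched as a simple function. This is a minimal illustration: `expected_entities` would come from a graph query (for example, everyone with a work relationship to Miyazaki), and the string containment check is a crude stand-in for proper entity matching.

```python
def fact_check(answer, expected_entities):
    """Return graph entities that the generated answer fails to mention."""
    answer_lower = answer.lower()
    return [e for e in expected_entities if e.lower() not in answer_lower]

# The situation from the video: the RAG answer names only Takahata,
# while the knowledge graph lists three direct collaborators.
rag_answer = "Hayao Miyazaki worked closely with Isao Takahata."
colleagues = ["Isao Takahata", "Toshio Suzuki", "Joe Hisaishi"]
missing = fact_check(rag_answer, colleagues)  # → ['Toshio Suzuki', 'Joe Hisaishi']
```

The list of missing entities is exactly what the next step feeds back into the answer-enhancement prompt.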
We clearly found Joe Hisaishi in the context, yet the system was not aware of him. So at this point I'm not so sure whether it's a lost-in-the-middle thing or whether there's something wrong with the vector similarity search; I don't know. But what we can do here is enhance and improve the generated answers with our knowledge graph, so let me show you what that looks like.

I asked exactly the same question as before, "Who worked with Miyazaki?", and the graph correctly returned Takahata, Suzuki, and Hisaishi. If you don't remember, it's okay; there we go. I also asked about his career achievements, and here it correctly fetched the other awards Miyazaki has won, so we can use this information to improve the generated answer. I'll show you probably the simplest method, and the changes should already be obvious to you: I just call ChatOpenAI and guide the system to enhance the original answer with information from the knowledge graph.

Look what we got. The "Who are Miyazaki's colleagues?" question is now enriched: you not only see Toshio Suzuki but also Joe Hisaishi, whose profession as a composer is included too. How does it know that Hisaishi is a composer? Because in our knowledge graph, Hisaishi's position has a record of "composer". You see, multiple pieces of structured information can be easily stored in and extracted from the knowledge graph. This is the enhanced answer versus the original answer; now you see the obvious difference, right?

Let's move on to the next one; we'll just look at the difference. It was about Miyazaki's achievements. Originally it was a very simple answer, and now it's more solid: not only the Academy Award but also other honors, such as Person of Cultural Merit, which adds more context and makes it more informative too.

Finally, there are still lots of challenges in using long-context models, such as demanding memory requirements and high latency. Regardless, long-context models are definitely changing the way we think about RAG.
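The "simplest method" mentioned above amounts to building a prompt that contains the original answer plus the facts pulled from the knowledge graph, and asking the chat model to rewrite the answer. The sketch below only shows the prompt construction; the actual ChatOpenAI call is omitted, and the prompt wording and fact format are my own illustration, not the video's exact code.

```python
def build_enhancement_prompt(question, original_answer, graph_facts):
    """Assemble an enhancement prompt: original RAG answer plus
    (subject, relation, object) facts from the knowledge graph.
    In the video this prompt would then be sent to ChatOpenAI."""
    facts = "\n".join(f"- {s} {r} {o}" for s, r, o in graph_facts)
    return (
        "You improve answers using verified facts.\n"
        f"Question: {question}\n"
        f"Original answer: {original_answer}\n"
        "Knowledge-graph facts:\n"
        f"{facts}\n"
        "Rewrite the answer so it includes any facts it is missing."
    )

prompt = build_enhancement_prompt(
    "Who worked with Miyazaki?",
    "Hayao Miyazaki worked with Isao Takahata.",
    [("Hayao Miyazaki", "WORKED_WITH", "Toshio Suzuki"),
     ("Hayao Miyazaki", "WORKED_WITH", "Joe Hisaishi"),
     ("Joe Hisaishi", "HAS_POSITION", "Composer")],
)
```

Because the graph carries Hisaishi's `Composer` position as its own fact, the rewritten answer can mention his profession, which is exactly the enrichment the video points out.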
Long-context RAG may be a future application. As you saw in the video, adding knowledge graphs to these systems can be really helpful as a source for validating information. I hope your takeaway from this video is that knowledge graphs are not just fun visualization tools; they can be powerful at grounding trustworthiness in LLM-based systems. Let me know if you learned anything new about knowledge graphs, and I'll see you in the next one.
Info
Channel: Diffbot
Views: 45,533
Id: g1TzbKDNr7M
Length: 9min 5sec (545 seconds)
Published: Mon Mar 25 2024