FLARE: New Retrieval Method for Increasing Confidence in RAG | Chat with your Data using GPT

Video Statistics and Information

Captions
Yeah, we all know this is a mutual business, and we are not doubting your LLMs. We're just saying we're not happy with the latency. Do you have any cheaper LLMs in general? And we are a big RAG customer, you know, you could give us some good discounted fees... No, but a lack of confidence does exist in RAG implementations and chat-with-your-data use cases. I think he's got RAG phobia.

Anyway, this video is all about a new retrieval method called FLARE, or Forward-Looking Active Retrieval augmented generation, which decides when and what to retrieve from your knowledge base based on the confidence of the sentences the LLM has generated. For example, if your LLM is answering your question but is not confident about specific tokens, words, or sentences, FLARE takes care of them by retrieving knowledge for the parts it is not confident about, giving you a more confident answer. That's pretty cool, so let's go. Before we start, make sure you subscribe and hit the bell icon so you get notified about the next video. Thank you.

All right, let's get into FLARE. This is the official published paper, which you can go through yourself, but here I'll mainly walk through the flow of how FLARE works and then quickly show you a demo in Python of how we can leverage it.

Let's see what FLARE is trying to accomplish in your RAG application. Say you ask it to generate a summary about Joe Biden, and you have a knowledge base from which you can retrieve relevant information about him, whether from the internet or from local documentation, to answer that user question. FLARE can tentatively predict what the next sentence is going to be based on the context of the data; for example, most likely it's going to say Joe Biden was born on this date and is this-numbered president of the United States. Each generated token or word has a probability, which shows how confident the model is, for example, about this date. For this first sentence everything was pretty confident, so we keep the response as-is. But the next predicted sentence says Joe Biden attended the University of Pennsylvania, where he earned a law degree, and here the underlined tokens or words are the ones the model is not confident about: they have low probability. FLARE then asks the LLM: hey, generate a question whose answer would be "University of Pennsylvania". That question, something like "Where did Joe Biden attend university, and what did he earn?", then goes back to the knowledge base (web search, your vector database, Azure AI Search, whatever you have) to fetch that specific knowledge, and the sentence gets corrected, because the original claims were not entirely accurate: now we have the correct university and the correct degree, retrieved precisely because we were not confident. The same thing happens for the next sentence, "Joe Biden announced his candidacy for the presidential election on this month, day, and year": again we're not confident about the date, so we go back to retrieval and get the correct one. That, technically, is the idea of how FLARE works. Interestingly, the authors benchmarked their solution against a couple of other retrieval methods, such as the single-retrieval approach, and across the datasets they benchmarked, the green bar, which shows the FLARE approach, seems to be the most promising compared to the others.
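To make that loop concrete, here is a minimal sketch of FLARE's confidence check in Python, using the OpenAI API's per-token logprobs to flag the shaky spans. The gpt-4o-mini model name and the 0.3 probability threshold are my own illustrative assumptions, and the paper's actual sentence masking and query generation are more involved than this.

```python
# A minimal sketch of FLARE's "when to retrieve" decision, not the paper's
# exact implementation. Assumes OPENAI_API_KEY is set; the model name and
# the 0.3 threshold are illustrative choices.
import math

from openai import OpenAI

client = OpenAI()

def draft_with_confidence(question: str, threshold: float = 0.3):
    """Generate a tentative answer and collect the tokens whose probability
    falls below the threshold -- the spans FLARE would retrieve for."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
        logprobs=True,  # ask the API to return per-token log-probabilities
    )
    choice = resp.choices[0]
    shaky = [
        t.token
        for t in choice.logprobs.content
        if math.exp(t.logprob) < threshold  # convert logprob to probability
    ]
    return choice.message.content, shaky

answer, shaky = draft_with_confidence("Where did Joe Biden attend university?")
if shaky:
    # FLARE's next step: have the LLM turn each low-confidence span into a
    # search query, retrieve supporting documents, and regenerate the sentence.
    print("Low-confidence tokens worth retrieving for:", shaky)
```

In the full method this check runs sentence by sentence: confident sentences are kept, and low-confidence ones are regenerated after retrieving evidence for the shaky spans.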
So let's see how it works in action. The good news is that LangChain has already wrapped this solution, so you don't need to develop it from scratch. You need your OpenAI API key (I remove mine after running this notebook) and a Serper API key. What is Serper? It's a wrapper around the Google Search API. Our knowledge base here is going to be the internet, so the information we have low confidence about gets grabbed from the internet, but it could just as well be your own knowledge base or your vector database; it doesn't matter.

These are the imports you need from LangChain, so make sure you pip install langchain and openai. After importing everything, I define how the knowledge is retrieved. This is the place where you could connect your Azure AI Search or vector database, however you retrieve knowledge; for me it's the Google Search API via Serper, which I've already set up. By the way, when you sign up for Serper for free, it gives you, I think, a couple of thousand free credits, so you can search by API.

Here is the part that is already developed in LangChain: you simply import FlareChain, and you get all the usual parameters we already know. Because FLARE is iterative (it keeps asking questions about the parts of the answer it is not confident about), you can also limit the maximum generation length, since each time the LLM is asked to generate a question and go to the knowledge base, there's a cost associated with that too. A sketch of this whole setup follows below.

As an example, I ask it to explain in great detail the difference between LangChain and AutoGen. We know AutoGen is an open-source library developed by Microsoft for creating autonomous multi-agent applications. When I run this, let me scroll to the top and show you the initial answer: it says LangChain is a software development framework designed to help developers create programming languages. I'm not happy with this answer; this is technically what you get without FLARE in place. And if I quickly ask this question directly of an LLM, say GPT-4, it answers something else entirely, because there's no retrieval method there to confirm anything, and no FLARE either, so the answer is quite wrong.

But now, with FLARE, we get this: as you can see, there are some entities and terms in the answer it's not confident about, so it needs to generate some questions to get the answers and confirm them. For example, "both tools have their own unique features": we don't know what those features are, so it generated a question whose answer would be exactly that: "Can you explain the purpose and functionality of the LangChain framework in the context of software development, and what are the key distinctions between them?" This will capture the unique features. So it goes back, retrieves the information, and regenerates the answer. Again, there are still specific entities it's not confident about; for example "data-driven applications" (it's not sure whether LangChain is for data-driven applications) or "through its host" (we need to check what that actually means). Since it's not comfortable or confident about those parts of the answer, it issues another follow-up question to our knowledge base, which is the Serper API, i.e. the internet. That's how we grab the missing information.
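For reference, here is roughly what that notebook setup looks like, modeled on the FlareChain example in the LangChain docs. Module paths and parameter names drift between LangChain versions, so treat the exact imports and signatures as assumptions rather than gospel.

```python
# Sketch of the FLARE demo setup, based on the FlareChain example in the
# LangChain docs; import paths may vary by LangChain version.
import os

os.environ["OPENAI_API_KEY"] = "..."  # remove your key after running!
os.environ["SERPER_API_KEY"] = "..."  # Serper wraps the Google Search API

from langchain.chains import FlareChain
from langchain_community.utilities import GoogleSerperAPIWrapper
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever
from langchain_openai import ChatOpenAI


class SerperSearchRetriever(BaseRetriever):
    """Expose Serper web search as a retriever; you could swap in your
    vector store or Azure AI Search retriever here instead."""

    search: GoogleSerperAPIWrapper

    def _get_relevant_documents(self, query, *, run_manager=None):
        # Return the raw web-search results as a single document.
        return [Document(page_content=self.search.run(query))]


retriever = SerperSearchRetriever(search=GoogleSerperAPIWrapper())

flare = FlareChain.from_llm(
    ChatOpenAI(temperature=0),
    retriever=retriever,
    max_generation_len=164,  # cap on each generation step, which also bounds
    min_prob=0.3,            # cost, since every low-confidence span triggers
)                            # another retrieval round

print(flare.run("Explain in great detail the difference between LangChain and AutoGen"))
```

Here min_prob is the confidence threshold: any token whose probability falls below it marks a span that FLARE will generate a follow-up question and retrieval for.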
Scrolling all the way down through this iterative process, this is the final answer, which is much better than the previous one: it tells me both of them are, roughly speaking, AI agent frameworks, and that LangChain is designed to make prompt engineering more efficient. That makes sense to me now, rather than saying LangChain is a language development framework, which is just not accurate.

That's all; it was pretty quick. I just wanted to give you an overview of what FLARE is and give you the idea: you don't necessarily need to import FlareChain from LangChain, but you can use the probability, the confidence, of the tokens generated in the answer to know what specifically you need to retrieve again in order to come up with a more confident answer, as in the sketch earlier. This is something we may have been ignoring: we just grab the knowledge, give it to the LLM, and wait for the response. But the final response is generated by the LLM, and while the LLM has seen the retrieved knowledge you bring into the context, even that given context may not be enough, so the LLM gives you an answer while being unsure about a specific part, or hallucinating, or whatever. So use that token probability to see what you need to re-fetch from your knowledge base, maybe another chunk of data, because now we have a new query generated by the LLM on the back end; bring more context into the prompt, and then get the answer in a more confident manner. That helps a lot in boosting the accuracy, or groundedness, of the answers you get from your RAG implementation.

I hope you liked the video and found value in it. If you enjoyed it, I'd be happy if you pressed the like button, and make sure you write down any comments or questions. Thank you so much, and take care until the next video. Life is all about a battle between concentration and distraction: with concentration you will grow in power and inner peace; don't let them drag you to the slaughterhouse with distraction. Dream big, my friends, believe in yourself, and take action. Take care.
Info
Channel: MG
Views: 593
Keywords: vector database, retrieval augmented generation, large language models, vector similarity search, langchain chatbot, vector search, langchain ai, langchain chatbot python, train gpt on documents, train gpt on your own data, retrieval augmented generation chatgpt, retrieval augmented generation architecture, retrieval augmented generation (rag) tutorial, Connect your data to chatgpt, how to train your custom gpt, Increasing accuracy and confidence in RAG or Chat with data
Id: aXBBYh4xdgM
Length: 10min 28sec (628 seconds)
Published: Tue Mar 19 2024