What is Retrieval Augmented Generation (RAG) - Augmenting LLMs with a memory

Captions
When using ChatGPT, you have most probably encountered responses like "I'm sorry, but as of my last knowledge update in January 2022..." or even responses that are not true at all. This is where RAG comes into play: it injects more knowledge or content into your interactions with an LLM to help it answer unknown and up-to-date questions.

We hear about LLMs, prompts, and RAG everywhere by now. I think most of us know what an LLM and a prompt are, but did you know that RAG is currently just as important as both of these and powers most applications you may use that involve a chatbot? I recently ran a poll on the Learn AI Together Discord community to find out whether people had already studied, created, or used RAG applications, and most voted to understand what RAG is used for. RAG is as important as your course book is for success in a class, so understanding what it is is highly relevant in AI.

An LLM, or large language model, is just an AI model trained on language to talk with humans, like GPT-4 used in ChatGPT. A prompt is simply your interaction with it: the question you ask. But if you experience issues like hallucinations or biases when using such a language model, then RAG, or retrieval augmented generation, comes into play.

Let's quickly clarify hallucinations first. A hallucination is when the model returns something that seems true but isn't, simply because it doesn't know the answer. In fact, a language model is constantly hallucinating: it only predicts words in a statistical way. It turns out that when these models are trained on the entire internet, there are so many examples that they manage to accurately predict the next logical words to answer most questions. Despite this, the model hallucinates; it doesn't really understand what it's talking about and just outputs one probable word at a time. What is incredible is that most of these hallucinations are actually true and answer our questions. However, some of them are real hallucinations of fabricated facts or scenarios, and that can cause quite a few problems if they are not sufficiently controlled. While there are several reasons why LLMs hallucinate, it is mostly because they lack relevant context, either because they cannot find the relevant data or because they don't know which data to refer to for a particular question. This is because they were trained to answer, not to say "I don't know."

RAG solves this by automatically adding more knowledge or content to your interactions with an LLM. Put simply, you have a dataset (which is required), and you use it to help the LLM answer unknown and upcoming user questions. This is the simplest form, and it requires a few steps to make it work, but this is the gist of a RAG-based system: a user question triggers an automatic search of the database for relevant information, which is then used, along with the question, to give an answer back to the user. As you can see, with RAG we use context from both the user question and our knowledge base to answer it. This helps ground our model in knowledge we control, making it safer and more aligned. The disadvantage is that it limits our answers to our knowledge base, which is finite and probably not as big as the internet. It's just like an open-book exam you would have in school: you already have access to most answers and simply need to know where they are in your knowledge base. If you find the answer in the manual, it's quite hard to fail the question and write something wrong.
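To make that loop concrete, here is a minimal, self-contained Python sketch of the idea. The tiny in-memory knowledge base and the word-overlap "search" are illustrative stand-ins of my own; a real system would use embeddings and an actual LLM call, as sketched further below.

```python
# Minimal sketch of the RAG loop: search our own knowledge base,
# then inject what we find into the prompt (the "open-book exam").
# The word-overlap scoring below is a toy stand-in for real retrieval.

KNOWLEDGE_BASE = [
    "RAG stands for Retrieval Augmented Generation.",
    "RAG injects retrieved context into the prompt sent to the LLM.",
    "Hallucinations are plausible-sounding but fabricated answers.",
]

def search_knowledge_base(question: str, top_k: int = 2) -> list[str]:
    # Toy relevance score: count words shared between question and chunk.
    q_words = set(question.lower().split())
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda chunk: len(q_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question: str) -> str:
    # The retrieved chunks become the only context the LLM may use.
    context = "\n".join(search_knowledge_base(question))
    return (
        "Answer using ONLY the context below. If the answer is not there, "
        f"say you don't know.\n\nContext:\n{context}\n\nQuestion: {question}"
    )

print(build_prompt("What does RAG stand for?"))
```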
Jerry Liu, CEO of LlamaIndex, gave a very interesting view on how to see RAG in my most recent podcast with him: "If you think about it, RAG is basically prompt engineering, because you're basically figuring out a way to put context into the prompt. It's just a programmatic way of prompt engineering; it's a way of prompting so that you actually get back some context." He also said to subscribe to the channel to learn more about AI. Okay, maybe that's just a hallucination, actually, but you should still do it, honestly.

In RAG, you first need data or knowledge, which can be in the form of documentation, books, articles, etc., and you only allow the LLM to search and respond if the answer to the question is inside this knowledge base. Anyway, if you have access to accurate information in your school manual, why would you try to come up with something different instead? This is currently the best way to control your outputs and make your model safer and more aligned; basically, the best way to ensure you give the right answer and get your best grade.

For example, we recently built an AI tutor to answer AI-related questions. We wanted accurate responses for our students, both in terms of accuracy, to give the right answer, and in terms of relevancy, so up-to-date information. With RAG, you can simply update your database if things have changed; it's no big deal. If the whole PyTorch library had a big update yesterday, scrape it again and update your dataset in a second, and voilà: you don't have to retrain the whole model or wait for GPT-4 to finally get a knowledge-cutoff update. The overall process of the bot is quite straightforward: we validate the question, ensuring it is related to AI and that our chatbot should answer it; then we search our database to find good and relevant sources; and finally we use ChatGPT to digest those sources and give a good answer to the student. If you need safe information from an AI chatbot, like a medical assistant, a tutor, a lawyer, or an accountant, you will be using RAG for sure. Well, maybe not if you're listening in 2030, but as of now, RAG is by far the best and safest approach for a chatbot that needs to give factual and accurate information.

To build a RAG-based chatbot or application like our AI tutor, we start by ingesting all our data into memory. This is done by splitting all the content into chunks of text, so splitting our textual data into fixed or flexible parts, for example 500-character parts, and processing each chunk with an embedding model, like OpenAI's text-embedding-ada model. This produces embeddings, which are just vectors of numbers representing your text; they make your life easier and allow you to compare pieces of text with one another easily. You can save those vectors in memory. Then, for a new question from a particular user, you repeat this process: embed the question using the same approach and compare it with all the existing embeddings in your memory. Here, you are basically looking for the most probable answer to the question by searching your memory, just like you do in an exam, looking through the chapters to find a title that seems relevant to the current exam question. Once the most similar embeddings are found, ChatGPT is asked to understand the user's question and intent and to use only the retrieved sources of knowledge to answer it. This is how RAG reduces hallucination risks and keeps information up to date, since you can update your knowledge base as much as you want, and ChatGPT, or your current language model, simply picks information from it to answer. Plus, as you can see, it cites all the sources it found for the question, so you can dive in and learn more, which is also a plus when you're trying to learn and understand a new topic.
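Here is a rough end-to-end sketch of that ingest-then-retrieve pipeline, assuming the openai Python client (v1 style) with an OPENAI_API_KEY set in the environment. The 500-character chunking, the model names, and the in-memory NumPy "vector store" are illustrative choices, not the exact setup of the AI tutor.

```python
# Sketch of the RAG pipeline described above: chunk, embed, store,
# then retrieve by cosine similarity and answer with ChatGPT.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def chunk(text: str, size: int = 500) -> list[str]:
    # Fixed-size 500-character chunks, as in the example above.
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(texts: list[str]) -> np.ndarray:
    # One embedding vector per piece of text.
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

# --- Ingestion: split the knowledge base and keep its embeddings in memory.
documents = ["...your documentation, books, articles, etc..."]
chunks = [c for doc in documents for c in chunk(doc)]
chunk_vectors = embed(chunks)

def retrieve(question: str, top_k: int = 3) -> list[str]:
    # Embed the question the same way, then rank chunks by cosine similarity.
    q = embed([question])[0]
    sims = (chunk_vectors @ q) / (
        np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q)
    )
    return [chunks[i] for i in np.argsort(sims)[::-1][:top_k]]

def answer(question: str) -> str:
    # Hand ChatGPT the question plus only the retrieved sources.
    context = "\n\n".join(retrieve(question))
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": (
                "Answer using only the provided sources and cite them. "
                "If they don't contain the answer, say you don't know.")},
            {"role": "user", "content": f"Sources:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```

At query time you would call `answer("...")` and return both the text and the retrieved chunks to the user, which is how the cited sources mentioned above can be surfaced.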
There are still many things to consider, like how to determine whether to answer a question at all (is it relevant, is it in your documentation?), how to handle new terms or acronyms not in ChatGPT's knowledge base, how to find the relevant information more efficiently and accurately, etc. Those concerns are all things we've improved through various techniques, like better chunking methods, rerankers, query expansion, agents, and more, which you can learn about in our free advanced RAG course, built together with Towards AI and Activeloop, linked in the description below.

Before some of you ask: yes, an alternative to RAG would be to fine-tune your model on your specific task, basically to further train a model on your own data to make it more specific, ingesting the knowledge rather than always searching through it, like memorizing the book before the exam instead of bringing it with you. I have a video comparing fine-tuning and RAG to teach you when you should consider each, but in short, RAG stays relevant with or without fine-tuning, as it is much cheaper to build and is better at reducing undesired hallucinations: you force the model to give answers based on documentation you control, not simply things it ingested and will hopefully regurgitate correctly, as with fine-tuned models. Coming back to our open-book exam, it's just like professors making you focus on understanding the core material and logic rather than the knowledge itself, since you can always find it in manuals or on Google. The same goes for LLMs when we complement them with RAG. Plus, even though those models have much better memories than us, they are not perfect, and they will not retain all the information you give them. Thus, even with a model fine-tuned on highly specific data, RAG remains something worth leveraging.

Before we end this video, I just wanted to mention that we discuss both these topics in depth, with coding examples, in our LLM and RAG courses, if you want to put this knowledge into practice. The link is in the description below, and they are completely free. I hope you've enjoyed this video and that it helps you better understand the goals and principles of RAG. If you did, please share it with a friend or your network to spread the knowledge and help the channel grow. Thank you for watching.
Info
Channel: What's AI by Louis-François Bouchard
Views: 16,236
Keywords: ai, artificial intelligence, machine learning, deep learning, ml, data science, ainews, ai news, whats ai, whatsai, louis, louis bouchard, bouchard, ai simplified, simple ai, ai explained, ai demystified, demystify ai, explain ai, what's ai, rag, what is rag, vector databases, what are vector databases, why use rag, when to use rag, retrieval augmented generation, memory augmented generations, rag explained
Id: LAfrShnpVIk
Length: 9min 40sec (580 seconds)
Published: Tue Jan 09 2024