AWS re:Invent 2023 - Use RAG to improve responses in generative AI applications (AIM336)

Captions
- Alright, I think it's time to get started. So thank you everyone for attending our talk on how to use retrieval augmented generation to improve your responses in generative AI applications. So hopefully that's what you're here for and if not, then you're stuck with us. So let's get started. We've got an action-packed agenda today where we're gonna cover a variety of topics including customizing foundation models, why you should think about customization and common approaches for how you should customize. Then we will go specifically into retrieval augmented generation or RAG, and we'll deep dive into how it works and cover all the different components from the data ingestion workflows to how embeddings work and demystify a lot of the concepts that you might have heard throughout this conference. We'll introduce Knowledge Bases for Amazon Bedrock, which I hope you heard in the keynotes from Adam and Swami. And then we'll talk about how we're making the building of RAG applications really, really easy. And then we'll cover lastly how these capabilities of knowledge bases work with other parts of the Bedrock ecosystem such as Agents for Bedrock and also how you can leverage open source generative AI frameworks such as LangChain for building retrieval augmented generation capabilities. And I forgot to introduce myself. And so I'm Ruhaab Markas. I lead product management for Knowledge Bases and Amazon Lex. - And I'm Mani Khanuja, technical lead for generative AI specialists in the worldwide specialist organization. And today I'll be co-presenting with Ruhaab and taking you through this journey of how you can build your retrieval augmented generation applications using Knowledge Bases for Amazon Bedrock. - So quick show of hands, who has heard of Knowledge Bases for Bedrock, either through the keynote or through our previews? Okay, fantastic, quite a few of you have, and it looks like a few of you haven't, so we'll definitely have a variety of content for everyone here today. Alright, so first I want to talk about why you should think about customizing a foundational model. So foundational models have a vast amount of pre-trained knowledge directly encoded into the model, right? If you've heard the term GPT, that's really what the P stands for, right? That pre-trained knowledge. But it's important to understand that in many ways these models don't really know a lot about your specific company, right? And so the first reason you might want to customize a foundational model is to adapt to domain-specific language. Let's say you're in healthcare and you need the model to understand all the medical devices that you sell, right? That doesn't come out of the box in most instances. Secondly, you might want these models to really perform better at unique tasks suited for your company, right? Let's say you're a financial services company and you're looking to do more advanced accounting or analysis on earnings reports, and you wanna teach these models about tasks and help really specialize these models on a company-specific data set or task. And lastly, you may want to think about customization when you want to improve the context and awareness of these models with your external company data. So how do you bring company repositories such as FAQs or policies and other documents that exist in your company and pass that as context into a foundation model? So those are a few reasons why you may want to think about customization.
And now we'll cover some common approaches for how you can customize a foundation model. These aren't exhaustive and there are other approaches, and these will, you know, grow incrementally in terms of the complexity, the cost and time it takes to implement these changes. So the simplest approach for customizing a foundational model is prompt engineering. And a prompt is simply the user input that you pass to a foundation model, right? And these prompts can be crafted and iterated upon to really steer the direction to get the right output from the foundation model. And there's a variety of different approaches that you can leverage for prompt engineering: prompt priming, prompt weighting, or even chaining different prompts. So prompt priming is really the most basic form of prompt engineering, which is just taking an input or a form of instructions and passing that to a foundation model. Sometimes you can even pass specific examples or tasks to the foundation model through the prompt, and that's known as in-context learning. Another approach, as I mentioned, is prompt weighting, which is giving more emphasis to certain elements of the prompt that you want the foundation model to really focus on, right? So if you tell the model, you know, definitely don't respond to something that you don't know about, right? Capitalizing that and putting 5,000 exclamation marks, like those things actually do bias and put emphasis on certain parts of your instructions. And lastly, there's prompt chaining, which is taking more complex prompts and breaking that down into more discrete parts where outputs of a certain prompt are then passed as an input into the next task. So those are just a few examples of prompt engineering. Secondly, there is retrieval augmented generation, which is all about leveraging external knowledge sources to improve the quality and accuracy of responses. And when I use the term external knowledge sources, it's likely that these knowledge sources are actually internal to your company, but it's external in terms of the knowledge of the pre-trained model, right? You're really helping bring new knowledge to the foundation models, hence the term external, right? It's external to the pre-trained foundation models. And we'll really deep dive into retrieval augmented generation throughout the presentation, but the basic steps being you're retrieving some form of text from a corpus of documents, and you're using that as context to a foundation model to ultimately generate a response that's grounded in the knowledge of your company's data, right? Which is extremely powerful when you think about using the advanced reasoning capabilities of foundation models, but really steering that towards the knowledge specifically from your company data. So these two forms of customization are really about augmenting a foundation model. We're not actually going in and changing anything in the foundational model itself, but there are approaches that allow you to do that, such as model fine-tuning, and fine-tuning allows you to really adapt a foundational model on a specialized, task-specific data set. And this is a supervised approach, meaning that you're training the foundation model on labeled examples of tasks and you've specified the expected output and outcome through those examples, allowing you to really, you know, train this model on a specialized task.
And through fine-tuning, you're actually updating the weights of the model, right? The parameters of the model are actually being adjusted based on this customization. And lastly, and arguably the most costly and time-intensive approach, is training the foundation model from scratch. And it's an approach where you really want to be in complete control of the data that's used to train the model. You may want to remove any inherited bias that might exist from some of the other contexts that the model is trained on, and it's giving you complete control and building a domain-specific model. But obviously this requires an extensive amount of task-specific data and a lot of compute resources, making it obviously one of the more, you know, complex and time-intensive and costly approaches. And so while we talked about a few approaches to customization, again, this is not exhaustive, but these are some of the more common approaches that you'll see in model customization, and today we'll focus specifically on RAG. So now that we know why you should customize and common approaches on how to customize, let's look at a mental model for when you should use certain methodologies, so it's easier to make a decision on what approach to really take. And it all starts by thinking about the task that you want these foundation models to execute. And does this task require context from external data is kind of that first decision point. And if the answer is yes, you really then have to think about: is that data access needed in real time? And if the data is relatively static, where it's not changing on a real-time basis, such as frequently asked questions or policies, that's a classic use case for retrieval augmented generation. But just because I use the term relatively static doesn't mean that this data isn't changing, right? And so I don't want that to be misleading, because you can, you know, have data changing in this construct as well, but it's not changing in real time. However, if the data is changing in real time and you also need the ability to connect to different tools, meaning that I'm fetching or querying data from databases or I'm interacting with APIs or applications and tools, that's a use case for Agents for Amazon Bedrock. And we'll also cover how agents and knowledge bases can be brought together to really build, you know, really powerful capabilities, so these aren't mutually exclusive by any means. Next, if you have a use case that's leveraging historical data, right? Like a snapshot of information, and it's a relatively simple task that might already perform really well with a pre-trained foundation model, that's where prompt engineering can really help, right? I'm passing some specific context or task or instructions, right, as part of my prompt engineering, and that many times can be extremely effective. And lastly, if I have historical data that's maybe a bit more complex, as I mentioned, that is task-specific and needs a little bit more task training, that's where model fine-tuning, you know, serves a really important purpose. And you might have heard today in the keynote that we've announced the ability to fine-tune all the Bedrock foundation models, with support for fine-tuning Anthropic Claude coming soon. And so let's deep dive into RAG and really understand what is retrieval augmented generation.
- So, so far Ruhaab has covered why we need to customize and provided us some really good prescriptive guidance on when to customize and how you can work backwards from your particular use case. But now let's understand what is retrieval augmented generation? And as the name suggests, the first part is retrieval, where you have to retrieve the relevant information and then augment it with your original query, and pass that to the foundation model to generate an accurate response. Now there are so many aspects to it. Just imagine if you have this large amount of information, right? And then you say, okay, let me just add everything to the model, everything. What will happen? Multiple things might happen. First of all, the input size that the model can take, which we call the context length, might not be enough and you might get errors. Second, just imagine if somebody throws a lot of information at us; as human beings, we'll also be like, "Oh, let me pick out the relevant part to answer the question." How do I do that? It takes us time, right? The same goes for the model. So what we need to do is provide relevant information, and that's where the retrieval part becomes super important. So we retrieve the relevant information from our large knowledge data and then provide that relevant context to the model, right? That relevant context, we augment it with our original query so that the model knows the question as well, and then we feed that to the model, and that helps the model to generate responses. And prompt engineering also plays a very important role over here, because we might want to add more instructions to the model based on our use case. So let's take a look at the use cases for retrieval augmented generation. So the first use case that comes to mind when we think about RAG is really improving the content quality. How are we improving this content quality using RAG? By reducing hallucinations. So for example, as Ruhaab mentioned, right? These models, when we are talking about pre-trained models, they're really big, they're trained on a really big amount of data, but that data was from some point in time, right? That might not be very recent, so that's number one: the model doesn't have the recent data, and the model can act super intelligent and provide you with an incorrect answer if you ask about recent information it was not trained on. So in order to improve the quality of the answers or the responses and reduce hallucinations, that's where we can use the retrieval augmented generation technique, right? Now we have covered that part, but what if I want the model to answer only based on my knowledge or on my enterprise data? I don't want it to provide me answers from its own knowledge. I want to use the intelligence of this model and channel it, make it focus only on my data. That's where, you know, applications such as context-based chatbots and Q&A come into play, right? So you can use the RAG technique to build those applications. The third one is personalized search. Why do we want to limit this to question answering? Why not use this technique, since we are anyway retrieving the relevant content, to maybe augment our recommendation engine and create that type of application based on my profile, the persona that I might have, my preferences, or my previous history.
For example, if I'm on the retail side, I bought certain products, there is a history which is already there; what if I want to use that along with my preferences and show recommendations to my users? So you can do that using the RAG technique as well. And the last one is super close to me just because of the way it works. So I wrote a book on applied machine learning and high performance computing; it was published in December 2022, and you know, at that time generative AI was also getting popular, so somebody posted a review using generative AI trying to summarize the book, which was approximately 400 pages. Now just imagine if you do that; it was a really cool thing, I really liked it by the way, but it was missing the key points. So how about using RAG techniques to do the text summarization as well? Or maybe I just want a summary of a particular chapter that I'm interested in, right? And make sure that it has all the key points, so we can use RAG techniques to do that as well. And now that we have talked about the use cases, how do we use different types of data, right? You might be dealing with different types of data sets, or different types of retrievals can also happen, right? So what technique should I use? The first one can be simply based on certain rules or keywords or phrases: I fetch the documents, it works for me, yes, let's use it, right? So we always have to work backwards from our use case or the data that we have on hand. The second one is I might have a lot of structured data. Maybe imagine a use case, and this is actually something that we have already built with some of our customers. Imagine a use case where there is a natural language question, but I have my data in, let's say, an analytical database or a data warehouse or transaction data, it can be anything, right? And then based on that natural language question we use the foundation model to generate a query, and that query runs on your database, gives the results back, and then we use the foundation model to synthesize those results to provide you the answer to your original query, right? So you as a user get a full experience: I asked a question, I got the result, but behind the scenes, you know, so many things were going on. So that can be one approach. The third one is semantic search. And this I would like to explain with an example, because it really takes me back to high school, or even before that actually, elementary school. So when I was in school, there used to be reading comprehension, where I was given this passage and then there were certain questions that we had to answer based on the passage, right? So as a kid I was like, oh, I'm smart, I'm not going to read the whole passage, I'm going to use the keywords in the question. I'd look up those keywords in the passage, it's just two, three, four paragraphs, and I'll be fine. So I used to get 10 on 10 every time, up until elementary school and some part of middle school, and by the time I reached high school that 10 on 10 actually reduced to 3 on 10, 4 on 10, based on how lucky I was. The reason was, as I was growing, these passages that were provided to us were becoming complicated. The questions that were asked were tricky; they were no longer based on the keywords. I literally had to understand the question, I had to understand what the author was trying to say at a high level, before I could even attempt an answer, right? And that's where semantic search for machines comes into play.
So understanding the meaning of the text and then providing you the answer, right? So that's the third kind of retrieval, and we'll be mostly focusing on this third kind of retrieval today. So doing semantic search on the meaning of the text sounds lovely, right? But I'm not the one doing it, the machine is doing it, right? So then what do we need to do in order to enable the machine to do it? What we really need to do over here is convert our text into numerical representations. Now why do we need to convert the text into numerical representations? Because we want to find the similar documents or text based on the question that is coming in, and I'll double click on the numerical part in a moment, but we have to create the numerical representations in such a manner that they're able to retain the relationship between the words, right? If they're unable to retain the relationship between the words, then it won't be meaningful to me, or the machine, right? Then the purpose is not served. So the selection of the embeddings model matters, because you're not going to do it yourself, right? You will use an embeddings model, feed in the text, and it will convert it into numerical representations that maintain the meaning and the features and the relationship between these words. So that's how it works: if you have to do semantic search, you need an embeddings model, you need to convert the text into numerical representations, your query will come in, and then it will fetch the relevant results based on that. So how is it helping me? Briefly, if we have to summarize, it is helping me fetch the results based on the meaning, and it is helping me because I'm getting accurate context. If I have accurate context and I'm feeding accurate context to the model, I'm getting accurate results, right? So look at how we are connecting each and every dot over here, right? So first you have your data, then you have to split the data into chunks so that you can create embeddings, and the quality of the model which creates the embeddings will influence the retrieval, and the retrieval will influence your response, right? So that's the reason why embeddings are important. So which model to select? And for that I'll hand it over to Ruhaab. - Thanks Mani, so embeddings might seem like a complicated process and, you know, the thought of actually building an embeddings layer seems a little daunting, which is exactly why we launched the general availability of the Titan Text Embeddings model. And we actually launched this in September, and the Titan embeddings model is optimized for text retrieval use cases and RAG use cases and is available in over 25 different languages such as English, Spanish, and Chinese. And because Titan Text Embeddings is accessible via the Amazon Bedrock serverless experience, you can access this model with a single API without having to manage any infrastructure, right? It's so easy, right? You think about just pointing towards a model, passing it the context and getting the embeddings built, right, it's so incredibly easy to use through a single API. Okay, so now that we know what embeddings are and have some foundational knowledge of RAG, let's really understand what's happening underneath the hood, right? Like what enables this from a technological perspective.
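For reference, that single embeddings API call looks roughly like the following from Python with boto3. This is a minimal sketch rather than anything shown on stage; the region and the sample text are assumptions.

import json
import boto3

# Bedrock runtime client for model invocation (region is an assumption)
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed_text(text):
    # Convert a piece of text into a vector with Amazon Titan Text Embeddings
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["embedding"]

vector = embed_text("How do Knowledge Bases for Amazon Bedrock work?")
print(len(vector))  # Titan Text Embeddings v1 returns a 1536-dimension vector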
And before you can actually ask questions about, you know, your data, the data actually has to be optimized for a RAG use case. And this is the data ingestion layer, and there's a workflow corresponding to that layer, and we'll go right to left in this workflow. So it starts with your external data sources, right? Your company data. This could live in S3 and it could be in different file formats, or it could even be PDF documents and unstructured data. We take this data and then go through a process called chunking. And chunking really is just the splitting of that data into different segments, which is really useful for optimizing for things like relevancy. And then these chunks are passed into an embeddings model, such as Titan Text, and then ultimately stored in a purpose-built vector database, which is really optimized for indexing and retrieval of embeddings and can maintain the relationship and semantic meaning that you get through an embeddings model. And once you go through this data ingestion workflow, you're now ready to ask questions and really see the true power of RAG. This brings us to the text generation workflow, and it starts with a user asking a question. So that question or query then also goes through that same embeddings model to turn that question into a vector. And that vector is then searched in that same vector data source, which allows us to do things like vector similarity search, right? So you don't have to ask questions in that same rigid keyword context; we can actually extract meaning and look at similar aspects of that question and how it might relate to documents. And that's the real power of semantic search, right? It's really looking at those relationships and understanding meaning more deeply. So once we get that search result, that's the retrieval part, right? We're retrieving that data from the vector database and then we're passing that context into the prompt for a foundational model. So we're augmenting the prompt with these returned passages, and that's the augmentation part, right? That's the A, we're augmenting the prompt, and then ultimately this large language model, the foundation model, is generating that final response, right? And that's the G part. And this workflow, as you might imagine, can be fairly cumbersome, right? And there's so much inherent complexity associated with building a, you know, a complete RAG application. You have to manage multiple data sources, you have to think about which vector database to use, how do I make incremental updates to that vector database? And it requires, actually, a lot of different disciplines, right? You need help from data scientists and data engineers and infrastructure engineers, and know-how about scaling and DevOps, and a lot of this can seem daunting. Open source frameworks such as LangChain have made this a little bit easier, but it still requires a considerable amount of development and coding. And so how might we completely abstract away all this complexity? - And that's where we have Knowledge Bases for Amazon Bedrock, which lets you implement RAG or build applications based on this RAG architecture that we just saw, but in a fully managed way, so that you can focus on solving the business use cases or the problems you're working on, and we take away all the undifferentiated heavy lifting from you. So how is Knowledge Bases for Amazon Bedrock going to help you? So first of all, it provides you with the data ingestion part that we just saw, right?
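To make that do-it-yourself plumbing concrete, here is a rough sketch of the manual workflow just described: chunk and embed documents, embed the question, find similar chunks, augment the prompt, and generate. It uses a plain in-memory list instead of a real vector database, placeholder document text, and the 2023-era Claude prompt format, so treat it purely as an illustration of the steps this service automates.

import json
import math
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed_text(text):
    # Titan Text Embeddings turns a chunk or a question into a vector
    body = json.dumps({"inputText": text})
    resp = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1", body=body, contentType="application/json"
    )
    return json.loads(resp["body"].read())["embedding"]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Ingestion: chunk your documents and keep (chunk, vector) pairs in memory
# (a real system would use a purpose-built vector database here)
chunks = ["...a chunk of your FAQ document...", "...a chunk of your policy document..."]
index = [(chunk, embed_text(chunk)) for chunk in chunks]

# Retrieval: embed the question and pick the most similar chunks
question = "If I work remotely, which state do I owe taxes to?"
q_vec = embed_text(question)
top = sorted(index, key=lambda item: cosine_similarity(q_vec, item[1]), reverse=True)[:3]

# Augmentation and generation: pass the retrieved chunks as context to the foundation model
context = "\n\n".join(chunk for chunk, _ in top)
prompt = f"\n\nHuman: Answer using only this context:\n{context}\n\nQuestion: {question}\n\nAssistant:"
resp = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-v2",
    body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 500}),
    contentType="application/json",
)
print(json.loads(resp["body"].read())["completion"])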
So it will automate a lot of those things, and we'll see that in a moment. The second part is it will securely connect the foundation models and even Agents for Bedrock with these knowledge bases or your data sources, right? The third is retrieval, right? How we can easily retrieve the relevant data and then augment our prompts. So it will help you do that. So we have features and recent announcements, and we'll be doing a deeper dive on those. And the last one is source attribution. I don't trust anyone, to be honest. I'm just kidding, I trust a lot of people. (laughs) But when it comes to machines, we need proof, and that's where source attribution comes into play. How do I know that my foundation model is giving me the right response? Because the response is based on these data sources that I was providing, right? So let's take a look. Let's dive deep into the data ingestion workflow first, because if you don't have the data in the vector DB, you cannot really do the retrieval, augmentation and generation. So the first part is the data ingestion workflow that we just saw, right? In this case we are moving from left to right. So you have new data and then the data sources, chunking, embedding models, storing into the vector DB, right? Imagine you have to implement each of these things on your own. First of all, you would need resources who can code really well. Second, once you code it, you have to do the maintenance of the code. You might want to use open source frameworks, which is great, but sometimes then you have to think about the versioning piece of it, right? So there's a lot that goes into it. And then you also have to learn specific APIs for the vector store that you're using. What if we change all of that by reducing it to a few choices? What if we say: choose your data source, and in this case we support Amazon S3 as a data source. So you select your bucket in which you have your documents, right? And we provide support for incremental updates; as in, when your new documents are coming in, all you have to do is start the ingestion job, sync, right? And then multiple data formats. You don't have to really, you know, worry about the different data formats, because with Knowledge Bases for Amazon Bedrock we provide support for PDFs, comma-separated value (CSV) files, Excel documents, Word documents, HTML files, Markdown files, and text files as well, right? And the list may grow as we move along. So we have support for a lot of these file formats. So you can literally have your data, upload it to S3, and add it as a data source. Then we provide you an option where you can do chunking, like splitting your documents. You might say, you know what, I don't want to choose anything because I might not be aware of those things. That's fine too, so we have a default chunking option, which is defaulted to 300 tokens with 20% overlap. So you don't have to choose if you don't want to, right? But if you want particular fixed chunks that you are interested in, you can provide those as well. So the second option that we have is fixed chunking: you provide the number of tokens for each chunk of text that you want and an overlap, and we recommend having it between 10 and 20%. And then choose your embedding model; right now we support the Amazon Titan embeddings model, and Ruhaab has already covered that, so I will not repeat it, but it's important.
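On the SDK side, those data source and chunking choices map to a configuration block on the data source. The following is a rough sketch of the boto3 bedrock-agent call; the knowledge base ID, data source name, and bucket ARN are placeholders, and the exact field names should be verified against the current API reference.

import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

# Attach an S3 data source to an existing knowledge base, using fixed-size chunking
# (512 tokens per chunk with 20% overlap), mirroring the console options described here
response = bedrock_agent.create_data_source(
    knowledgeBaseId="KBID1234",  # placeholder knowledge base ID
    name="tax-documents",        # placeholder data source name
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::my-example-docs-bucket"},  # placeholder bucket
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {"maxTokens": 512, "overlapPercentage": 20},
        }
    },
)
print(response["dataSource"]["dataSourceId"])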
Just one thing that I want to double click on over here: when we say it supports 25 languages, that's a very important aspect, because remember when I was talking about the embeddings, these embeddings, these numerical representations, are maintaining the relationship. If the model doesn't understand the language, it won't be able to maintain the relationship between the words, right? So it is important, if your text is in a different language, that your embeddings model knows about it and is able to maintain that relationship. The next part is the vector store. So we are providing you options over here, whether you want to use OpenSearch Serverless, Redis or Pinecone, right? So we have options over here, and with all of this, you make the choices, you click the create knowledge base button, or if you're using the SDK, that's the create knowledge base API, and everything is taken care of for you. It's automated and fully managed data ingestion using Knowledge Bases for Amazon Bedrock. So now we have our data ingested and ready to use. So the next step is, what does my architecture look like now? It looks something like this, right? We have knowledge bases now, the data is ready, but we still have to provide the query, create the embedding, retrieve the context, augment the prompt, provide it to the foundation model, still do the prompt engineering, and then get the response, right? So we still have to do a lot of work. What if we eliminate that and take away some of that heavy lifting as well? So with that, we recently announced two more features, or APIs. One is retrieve and generate, which will literally retrieve the relevant documents, feed them to the model, and give you the response. The second one is the retrieve API, if you need more customization options. So let's take a look. This is how your whole architecture will look. The user asks a question, you call the retrieve and generate API, and you get the response, and this retrieve and generate API does the work for you. It'll take your query, create the embeddings with the embeddings model, retrieve the relevant context and augment your prompt with it, and then it will feed that to the model that you select. Currently we support two models, Claude Instant and Claude version 2 by Anthropic. So we support these two models that you can select and get the generated response, right? Pretty cool, and then if you say, "This is good, but I need more control," right? "I don't want to do the heavy lifting but I still want control, I still want to customize a bit." That's where we have our second API, which is the retrieve API, where we enable you and provide you flexibility as well. Over here we are still helping you: you have your query, the retrieve API will automatically use the embeddings model, create the embedding for your query, and provide you the relevant data or the relevant documents, right? Now what you have to do once you get the relevant documents is the prompt augmentation. You have flexibility in what instructions you want to provide in your prompt based on the model, and you can literally use any model provided by Amazon Bedrock, or maybe, you know, you might have a custom or fine-tuned model that you were working with in the Bedrock system that you want to use with the retrieve API; you can do that, right?
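As a rough sketch, calling retrieve and generate from the SDK looks something like the following. The knowledge base ID is a placeholder and the model ARN assumes Claude Instant in us-east-1, so check the names against the current bedrock-agent-runtime documentation.

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "If I work remotely, which state do I owe taxes to?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBID1234",  # placeholder knowledge base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-instant-v1",
        },
    },
)

print(response["output"]["text"])                  # the generated answer
for citation in response.get("citations", []):     # the source attribution behind the answer
    for ref in citation.get("retrievedReferences", []):
        print(ref.get("location"))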
So we have options, and we still want you to take full control of your application and your decision points, which really impact the answers that you are getting, right, from your application. So these are very important concepts. Enough talk, right? Let's see something in action. Let's see how it looks in the console. So the demo part now. And I'll share my screen; now you can see my screen. So where I am is basically the console: I've searched for Amazon Bedrock, and this is Bedrock. And then I have to go to knowledge bases, which is literally under orchestration, so we click over there, and then it talks about what you can do from the console. You can create a knowledge base, test the knowledge base and then maybe use it, right? So we'll go through that. Because you know we have limited time, I've already created a knowledge base, but I'll still walk you through what you will need to do if you have to create one. So the first part, when you see over here, is you have to create a knowledge base. So you click on the create knowledge base button. And by the way, whatever I'm showing you over here, you can do it via the SDK as well. And then you provide a name. I would suggest that you provide a very meaningful name over here, because you might end up having a lot of knowledge bases and you don't want any confusion; also add a meaningful description. And then you need permissions in the role, right? Because when we were talking about knowledge bases, knowledge bases will be accessing your data in S3 and creating the embeddings, so they need access to the embeddings model as well. And they will also be storing the embeddings in the vector DB, so they need access to that as well. So make sure that your Amazon Bedrock execution role for the knowledge base has all those permissions. And if you're unsure how to do that, simply select the create and use a new service role option so that it's automatically created for you. And then we go next: data source. So provide a meaningful data source name, provide an S3 location. I'm just going to type one in; this is not an existing S3 bucket to be honest, I just provided the name for demo purposes. And then additional settings: this is where you get to select your chunking strategy. So you can select from three options as I mentioned earlier: default, fixed size, or no chunking, right? So you have options over here as well. Let's do fixed size, and then I can select, maybe I want to do 512. And typically your overlap should be around 10 to 20%, that's what our recommendation is. Since right now we only support the Titan embeddings model, that's there. And then if you say, you know what, I don't want to create a vector DB, I want you to create a vector DB on my behalf because we attended that talk and it said fully managed RAG, right? That's where we have the option that you can select this quick create, which will automatically create a vector DB, and it will create an OpenSearch Serverless one. So you can choose that. But you know what, again, we have to give you options. What if you have an existing vector database or an index that you want to populate? So if you have an index in OpenSearch Serverless, Pinecone or Redis Enterprise Cloud, you literally provide the details about those and then go next, that's it.
- [Ruhaab] And you might have heard in the announcement today that we will be supporting new vector database types soon, including Aurora and MongoDB, with likely more vector database options coming. - Yeah, so stay tuned. Okay, and then you review your setup and click on create knowledge base, right? So because we want to be cognizant of the time, we already have a knowledge base. Now this knowledge base is based on text documents. So when you have created the knowledge base, you will actually land over here, not there; it's only when you go back to knowledge bases that you can see the list. So once you create, you will be here. And then the most important point is, once you create a knowledge base, you have to click on the sync button; this is very important. Because when we say we created a knowledge base, that was good, but we have to sync. Sync is the actual thing: when you do that, it will look up all your data that you have in S3, it will pre-process those documents, extract the text out of them, split it with the chunking strategy that you provided, then pass it through the embeddings model and store it in the vector DB. So that's the sync thing. And when you have, let's say, new files in your S3, you press the sync button again or call the start ingestion job API from the SDK, right? So it will literally make sure that everything is in sync. So you need to do that; I've already done that. And then you want to test it, right? So if you have generate responses on, meaning we are using the retrieve and generate API, or when you untoggle this, then it's retrieval only. So let's start with generate responses. I first need to select my model, and I can select either the Claude Instant model or Claude v2; we have 2.1 as well, that was also recently announced. Okay, hold on. Yeah, it was too zoomed out. (laughs) And then you select the model and then you can ask the question. Now since my documents are based on tax data, my knowledge base has all the tax-related data, and I can ask a question. So what I'm asking is "If I work remotely, which state do I owe taxes to," right? I mean, I just selected that because a lot of us, you know, during the pandemic we were working from home. So I was like, why not ask something like that? And I know a lot of us are now back in the office, which is also cool. Okay, now you click on the show result details and notice some important things. First of all, it is giving me the response very quickly. Second, I can literally see the source attribution right on my screen, right away. So, important points: if you work remotely but your employer is located in a particular state, you may owe income taxes to that state, and I'm not going to read the entire thing. And then if you have to look it up, the source that the model used is basically right over here, right? And if there are multiple sources, you will see multiple tabs over here, and I'll show you in a moment. You can literally go to the location of that document as well. So this was about how we were doing retrieve and generate; what if I just want to retrieve, right? Let me ask the same question because it will just make it easier for us to go through. By default it'll retrieve the top five documents, the top five most relevant documents. And then I can go to show details and look at it. So I'm seeing this particular chunk from this p17.pdf, and then another chunk from another PDF, and another chunk from another PDF, and this retrieve API also gives you a score.
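From the SDK, the sync button corresponds to starting an ingestion job, and the retrieve-only test corresponds to the retrieve API. A minimal sketch with placeholder IDs and the same question as the demo might look like this; the returned fields mirror what the console shows, namely chunk text, source location, and a relevance score.

import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# "Sync": start an ingestion job so new or changed S3 objects get chunked, embedded, and indexed
bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KBID1234",  # placeholder knowledge base ID
    dataSourceId="DSID1234",     # placeholder data source ID
)

# Retrieve only: fetch the top five most relevant chunks for a query, each with a score
results = bedrock_agent_runtime.retrieve(
    knowledgeBaseId="KBID1234",
    retrievalQuery={"text": "If I work remotely, which state do I owe taxes to?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)
for result in results["retrievalResults"]:
    print(result["score"], result["location"], result["content"]["text"][:200])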
So the score is based on which vector DB you are using and which similarity measure that vector DB uses: for example, if you're using cosine similarity, the score will be based on that; if you're using Euclidean distance, it will be based on that, right? So the score option is also there. So this was about how you can, you know, use it in the console. And we also have another demo where I will show you the APIs and how we can do the integration with LangChain. But the important point is, if I have to build retrieval augmented generation applications with Knowledge Bases for Amazon Bedrock and use those APIs, I can literally do that end-to-end using the features that we just talked about. But what if I have some dynamic information that I need to fetch in addition to what I have in my knowledge bases? Maybe I have a knowledge base which has a lot of order details, but I also want to call some order API, which gives me the status of my existing order, which is in transit, right? Or do multiple things around that, right? So what do I need to do if I want to integrate knowledge bases with, let's say, agents or other parts of the Amazon Bedrock ecosystem? So Ruhaab, over to you, please walk us through it. - Great, if we could just go back to the slides please. So as Mani mentioned, you saw how easy it was to get started in terms of uploading documents into a vector database and immediately beginning to interact with them through questions, right? Just in a matter of a few steps, you had a fully functioning RAG application. And if you recall earlier, the information that we're storing in those documents is relatively static, and even though it's syncing behind the scenes on some cadence, there are going to be times where you need an application to interact in real time with databases or even other tools and systems. And this is what Agents for Amazon Bedrock is really built to do, and a knowledge base can work directly with an agent to enable that use case. And if you think about it, the real power of an agent is that they're very specific, you know, models that are used for planning and executing tasks, leveraging reasoning capabilities such as chain-of-thought processing. And these state-of-the-art approaches are great when you want your application to interact with an API and automatically generate the dialogue with the user to collect the information needed to execute that API or action. You're not having to define the conversational flow; the parameters of the API can be automatically collected, with this model asking for those bits of information to fulfill the required arguments for calling an API or interacting with the tool and orchestrating those actions. And agents, as I mentioned, can also be combined with knowledge bases. And you'd want to do this when you're looking to combine actions with those information retrieval type use cases, where you're simply fetching context from a document and using that as supplemental information when interacting with the tools. And lastly, with agents and knowledge bases, again, all of these workflows are completely abstracted away as a fully managed service built directly into Amazon Bedrock. So let's take an example of how this might work when knowledge bases and agents need to work together. So as I mentioned, agents can orchestrate the user-requested task, so let's take an example where you are asking this application to send a reminder to all your policyholders with pending documents, okay?
And what happens is that this model can really plan for the execution of that task by breaking it down into smaller subtasks, such as, you know, getting the claims from a certain time period. It may have to identify what paperwork is even required in this process, which might be, as I mentioned, the knowledge that's in your knowledge base. And then ultimately sending that reminder: the agent can determine the right sequence of steps, facilitate the dialogue, collect the information, and even handle error scenarios along the way. So it's an incredibly powerful ability to orchestrate a dynamic sequence of actions across, you know, knowledge bases and APIs and tools, and really offer a seamless experience. So for something, you know, as simple as asking a machine to send a reminder to all policyholders with pending documents, you can see the complexity that's abstracted away to, you know, really make that use case possible. And now we'd like to show you how you can also use open source generative AI frameworks like LangChain to build with knowledge bases. And I'll have Mani walk us through another demo. - Yes, and for that I'll be sharing my screen, so okay. In this particular demo I'll be using the APIs, because a lot of us here might love the APIs and the SDK experience in addition to the console experience as well. And also LangChain provides you with a lot of wrappers which are pre-built. And why do we need to reinvent the wheel when we have something out there that we can reuse, but we want to reuse it with the latest features that we just showed you, right? So let me take you on this quick journey. So first of all, make sure that you have the latest Boto3 version and the latest LangChain version. It has to be equal to or greater than the versions that I'm showing over here: for LangChain it's 0.0.342, for Boto3 it's 1.33.2, right? So make sure you have versions equal to or greater than these. Now the first thing that you need to do is basically the setup, right? So as with any AWS service, when you want to use the APIs you first have to create the client. So for Bedrock, in this particular case we need two clients: one is the Bedrock runtime client, which helps us call invoke model. The second is the Bedrock agent runtime client, using which we will call the retrieve API for Knowledge Bases for Amazon Bedrock. So this is what we are doing over here, providing some model parameters, because remember, this is the retrieve API and you can connect to any model provided by Amazon Bedrock. So that's what we are doing: you provide the model parameters, you select your model. Now the actual retrieval. So if you are planning to use the retrieve API with LangChain, you will need to first initialize a retriever object. So you have to import the AmazonKnowledgeBasesRetriever from LangChain and then use it. So what do I need to pass? I need to pass the number of results, the relevant documents that I want, right? So that's what I'm providing over here, and the knowledge base ID, because how else will this know which knowledge base to get information from? Super important, right? And let me show you how you will get the knowledge base ID: if you are using the SDK, then you will automatically get it as a response from the API and you can leverage it. If you're using the console, then you click on the knowledge base and that's where you get the knowledge base ID. So you can literally copy it, and I'm using the same knowledge base over here as well; a condensed sketch of this whole setup follows below.
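Putting the pieces of this walkthrough together, a condensed sketch of the LangChain integration might look like the following, including the RetrievalQA wiring that is described next. It assumes the langchain 0.0.342-era imports mentioned above, a placeholder knowledge base ID, and Claude v2 as the generation model, so treat it as an illustration rather than the exact notebook shown on screen.

import boto3
from langchain.llms.bedrock import Bedrock
from langchain.retrievers import AmazonKnowledgeBasesRetriever
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# Bedrock runtime client for the language model; the retriever talks to
# the bedrock-agent-runtime retrieve API on our behalf
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

llm = Bedrock(
    model_id="anthropic.claude-v2",
    client=bedrock_runtime,
    model_kwargs={"max_tokens_to_sample": 500, "temperature": 0},
)

retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id="KBID1234",  # placeholder: copy this from the console or the create API response
    retrieval_config={"vectorSearchConfiguration": {"numberOfResults": 4}},
)

# Model-specific prompt: tell the model to answer only from the retrieved documents
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "\n\nHuman: Use only the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n"
        "{context}\n\nQuestion: {question}\n\nAssistant:"
    ),
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},
    return_source_documents=True,
)

answer = qa("If I work remotely, which state do I owe taxes to?")
print(answer["result"])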
So a quick thing, I just wanted to point that out, right? So now you have the relevant documents. Now, if I'm building a Q&A application, LangChain provides you with a RetrievalQA chain, and all I need to do is this: I've already declared the large language model, I've already declared my retriever, so let's use the RetrievalQA chain, pass everything in together, and then keep asking questions, right? Let's move to that part now. And this is just showing that, you know, if you just want to retrieve the documents, get relevant documents, you can do that. But if you're integrating it with the RetrievalQA chain, you don't need to do that to be honest; all you need is this retriever object. So let's take a look at how we integrate. Okay, so that's where we have the RetrievalQA chain. Now I provide my language model, which will give me the response, then I provide my retriever object, which will give the relevant documents, and I also provide the prompt. So now I have the flexibility to provide my own prompt, my own instructions, and this RetrievalQA chain will automatically augment my prompt with the relevant documents. And just so that you are aware, I wanted to show you the prompt template as well. So you can provide specific instructions and model-specific prompting. It's very important: you can literally tell the model to only provide information based on the documents, right? Based on these relevant documents. So based on your use case, you can provide instructions. And then once you have integrated it with your RetrievalQA chain, you literally provide the query to this QA object that you have created and it will keep giving you answers. So you don't have to initialize it over and over again. You can literally ask multiple queries, get the answers, multiple queries, get the answers, and now you have a running Q&A application with just three things: initializing your model, initializing the Amazon Knowledge Bases retriever with LangChain, and then the RetrievalQA chain, passing everything in together, and we have the application ready, right? And you can use the same pattern if you want to build a context-based chatbot with the conversational chains that LangChain provides, right? So do explore, and if you are interested in looking through the same code, we have it on GitHub; we'll share the resources with you. So Ruhaab, can you do a recap for us? - Yeah, absolutely, thanks Mani. And you know, it's incredible how quick it is to get started, both using LangChain, but also if you prefer using the Bedrock console; you saw that there's flexibility of choice and you really get to the same output. And so yeah, if we could go back to the slides, please, let's just quickly recap kinda what we covered today. It seems like a lot and it was, so thank you for attentively listening. We first covered, you know, why customization is important and the different approaches for customization, both augmentation and other approaches that actually change the parameters and weights of the model. We talked a little bit about retrieval augmented generation, what the specific use cases are for RAG, and then all of the different components from data ingestion to the query workflow, and how a lot of that is just completely abstracted away using Knowledge Bases for Amazon Bedrock.
And lastly, we talked about how knowledge bases can be further extended when you need them to interact with real time data and databases and APIs where agents and knowledge bases together can really help enable that capability. And if you want to take a quick picture of this, a lot of the notebooks that Mani showed you earlier and a few more examples will be published in GitHub for you to take a look and use as inspiration for your work as well as the documentation which deep dives further into Knowledge Bases. So we hope you check that out. So we just wanna say thank you for attending. I hope this was useful. Our LinkedIn handles are here, we would love to hear from you and see how you're using Knowledge Bases and what feedback you have. And don't forget to take the survey in your app so that Mani and I can get invited again next year to give a talk at re:Invent and really appreciate you coming today. Thank you and have a great conference. (audience applauds)
Info
Channel: AWS Events
Views: 23,028
Keywords: AWS reInvent 2023
Id: N0tlOXZwrSs
Length: 58min 50sec (3530 seconds)
Published: Mon Dec 04 2023