AWS re:Invent 2023 - Generative AI: Architectures and applications in depth (BOA308)

Captions
- Hello, welcome, everybody, thank you for joining us for this session on generative AI. I know that there are no other sessions about generative AI at re:Invent this year, so I'm grateful for you joining us in this one. So in this session, we're talking about architectures and applications in depth. Hello, my name's Mike Chambers, I'm a developer advocate for Amazon Web Services and pretty much specializing in generative AI. - And I'm Tiffany Souterre, also a developer advocate at AWS, specializing in AI as well. - Awesome. Should we get into the agenda of what we're gonna cover today? - Sorry. - Oh no, that was, yeah. We've both got clickers, what could possibly go wrong? - I just, oh dear, sorry. So for the agenda today, we have a generative AI quick recap, so hopefully you have some understanding of what generative AI is. Then we will go through questions like what is retrieval augmented generation, RAG? What are agents? We'll go through security, audit, and compliance for your applications with generative AI and a little bit of predictions for the future of generative AI in the tech world, yeah. - It's predictions where we hallucinate, probably- - Yeah, exactly. - All right. - So we have a poll question for you guys if you want to flash the QR code. The question is, are you using generative AI in your workloads or workflow today? Are you using generative AI? - I am, strangely enough, yes, because I put presentations like this together, but yeah, absolutely. I think generative AI has had a huge impact on my life and the life of my family as well. So yeah, using CodeWhisperer, for example, to help me write code, what about yourself? - Yeah, well, I mean, even in the workflow, like maybe you guys also use, oh- - We're in the wrong room. - No? Why are you even here? - No, well, we're here to talk about it, I guess, right? - Or maybe trying to understand why- - Look, and we've got some more questions coming up, which I think may sort of reveal this in a little bit more depth. I don't know if you find this interesting, but I think that polling was kind of interesting. And also, we welcome the other rooms that are joining us as well via the stream 'cause other people up and down the Strip are watching this. So yeah, okay. So let's dig into this in a little bit more detail. So I guess we start off here, we wanted to start off with a quick recap of generative AI. I think we're making the assumption at the moment in this session that everybody has some kind of familiarity with generative AI. Let's go for the really simple quick start. So generative AI, it's machine learning, it's artificial intelligence that generates content. A lot of the machine learning that we've dealt with back in the old days, like last year, was all about making predictions about what data was, so here's a picture of a dog, here's a picture of a cat, great stuff like that. But in this case, we're able to type in prompts to models that can do these generative tasks. And I'm sure we've all had a go with putting in a prompt and asking for a picture to be generated. A golden retriever wearing glasses and a hat in a portrait painting, I did type that and it did make that. And also going to chat bots and typing in prompts and getting answers back, so in this case, this is one of the open source large language models, asking it, what is generative AI? It came back with its answer there. So generative technology, maybe we're all on the same page and we've all had a go with this stuff. 
So at the center of all of this is the foundation model. Certainly, at the center of the, you know, the applications that we build as builders and developers on top of this technology, there is a foundation model. And these foundation models are machine learning models that have been trained with a transformer architecture. We talked a little bit more about this in a 200-level session that we had just before this morning. So we're not doing that again, but it'll be up online at some point, so BOA 210 if you want to have a look at that once it's up online. But the transformer architecture was really the pivotal moment when, using this technology, we were able to train really big models, so we're able to take data that we have. And a lot of the time in these presentations, we're probably gonna end up speaking about large language models, LLMs, because that's really where, at the moment, the majority of the enterprise and business and social and real value is. It's great that I can go into a prompt and get a picture of a golden retriever wearing a hat, but there are so many more things that we can do with this technology, and large language models can do those things. So we put all of our data in and train our foundation models with compute, I've said GPU here, this could be other kinds of PUs, but we're talking about the types of architectures which are optimized for training the transformer architectures. And we may be aware of this, we're probably familiar with this, these are not small things, this isn't a rainy weekend project, which is, do you know what, I'm gonna make myself an LLM. That's not really how these work, not the big ones anyway. And so there is a lot of data and a lot of compute power required to generate these models. And this has really been one of the transformative things and why we're talking about generative AI now, as opposed to a couple of years ago, when far fewer of us were talking about it. And it's because of this gradual change in size, this growth in models, it's almost not worth trying to keep this slide up to date. The point of it is that models used to be really small, like back in the old days and very, very old days, we dealt with tiny little models with like a few hundreds of parameters, then some millions of parameters. But that line on the graph, that's in the 2016-2017 frame, that 2017 point is when transformers emerged and when we discovered this ability to make much larger models. And that's really been the key to this: as the models have got bigger, so their emergent properties have been discovered, like we've made these big models and said, oh my goodness, they can do all of these things. And so we are finding that models are growing, not exponentially in the strict mathematical sense, but colloquially exponentially, getting much, much bigger, and what's gonna happen in the future is a bit of a guessing game for everybody. In reality, what's happening is that this is fanning out, right? So models are getting much, much bigger, and there is a push to get more and more capabilities from the models by making them much bigger. But they're also finding that we can train smaller models which do other tasks quite well. So I know this isn't, you know, there are new versions of Llama 2 and Falcon since this was put together, but there are those smaller models too. So we're getting a lot of capability, a lot of spread of models. Generally, especially compared to the old days, they're all much, much bigger than we've dealt with before. 
Now, I said at the beginning, we're talking largely about large language models here, large language models, they are a type, of course, of foundation model, but they're text-based, so they're the ones that we're gonna be using for our chat bots and for making text completions. But a lot, lot more underlying this, and I think this is one of the things that, for me anyway, it just absolutely blows me away, I mean, even when we were talking about this this morning, it still just amazes me that this technology works. But really, all these things are doing is making next word prediction. Next token prediction, I guess we would say, we understand that they work with tokens, not words, but that's all they're doing. We're sending these prompts in and the models are figuring out what the next word might be and then cycling through that again and again and again, we get some kind of output we can use. And that is where the, it's the root of all of this amazing emergent properties that we've found. So we enter a prompt and the LLM adds a completion. But it's such a simple concept, very complicated from a mathematical perspective, but just by predicting the next word, we're able to create all of this quite astonishing capability. So the large language models that we produce, one of the things which sort of sets them apart from models that we've worked with in the past, again, apart from size alone, is that in the past, with natural language processing, we would create models to do specific jobs. So we would create models to do text generation, we would create models to do text summarization, et cetera, et cetera. Translation, I love translation, like we could translate from English to French, but you can also translate from natural language to a computer language, so from natural language to Python, which I find super useful and maybe some of you have used that, sentiment analysis, et cetera. What we're finding is that we can use the large language models as general-purpose models, they can do all of these things and a lot more. We say more, dot dot dot, that's important, we'll be coming back to that. But they can do all of those things, we don't necessarily have to take a single-purpose model. And we'll see what the future brings, there will be specialized models, et cetera, but these large language models are super, super capable. We're not gonna throw quiz questions at you, well, they're not quiz questions, or survey questions at you the entire time that we're on stage here. But we've got another question for you here today. And I'm super interested in this one. Of all the questions I've probably asked any audience, I'm very interested in this one, but what is stopping you from building with generative AI today? There's some multiple options on there that aren't gonna come, they'll come up on the screen when we get the answer. - Do you have a guess? - I have an idea, I don't know. I don't wanna predict anything. What's stopping you from working with generative AI? Not much, we are working with it. - We are. If we could not work with generative AI, we wouldn't be working at all. - That's right, that's true, yes. - But- - We'll take some time. - I guess we're- - All right. Ah, interesting, oh, interesting, oh, it's changing and it's live, this is good. - Privacy and security. - Privacy and security. Okay. There's more votes coming in. 
So all right, there's a whole cohort of people for whom there's nothing stopping you and I guess, ah, the devil's in the detail of this, we'd just like to have so much more interactivity. There's nothing stopping you and you are doing it, or there's nothing stopping you but still, you are not doing it, I don't know. But privacy and security, that's cool. We'll definitely be talking about that in this session. - Yeah. And so for the next slide, one of the things that actually, and some of you answered this in the poll, could prevent you from using foundation models is hallucination. Some of you might have experienced that with LLMs: sometimes, you ask a question, and because the model really wants to give you an answer, sometimes it gives you something that is not accurate, something that sounds like a probable answer but is not factually right. And this is something that, if you use LLMs in your use case, for your business, you don't want; you don't want some of the answers to be wrong. You want this to be accurate all the time. And so you need more control over your LLMs, you need to customize them, they have to be tailored for your business to make sure that the answers the AI is giving you are accurate. And the other limitation is that, once you've trained your foundation models, their knowledge is limited to the time at which you trained them. So let's say you have trained your LLM back in 2021, then it will only know- - For example. - For example. - For example, I don't know where you got that date from. - Yeah. Well, it will only know the things that you've given it as a dataset for training up until that date. But everything that has happened after that, it would not know factually. I mean, it would be able to give you answers that are probabilistically right, but the facts are not there. - Yeah, I think a really interesting way to put this is that large language models at that point, they have a good sense of the world. So they know that up is up, down is down, the sky is blue, but they don't know about you, your business, your social enterprise, whatever it is you're doing, they don't have those facts. - Exactly. So one of the things that you could do to gain more control over your AI is using retrieval augmented generation and agents. We're gonna see in a minute what this actually means. So we're going to show you the general idea of what retrieval augmented generation is. So basically, you have your LLM and you give it a query. This is your prompt, this is the question you ask your LLM, and what you wanna do is to give it more data. You want to inject data that will help your LLM understand your question better and make sure that it has all the facts so that the answer is accurate. And the source for the data could be different things, it could be a vector database, it could be an API that helps the agent to retrieve information and then put this in the prompt. So basically, it's not very complicated, what happens here is that you just augment your prompt with your initial query plus the data that is relevant to the question. - And I think a real good call-out here is this is what we're not doing. So the first thing is, we're not going immediately to the step of fine-tuning, we're not immediately going to creating our own foundation model or doing continuation training. 
All of these things you absolutely can do, but the first things that you should look at are prompting, prompt engineering, and this, retrieval augmented generation, which is not as complicated as I think most people make out. - Yeah, exactly, 'cause you might have heard of fine-tuning, but this requires a lot of compute power and you need data and it takes time and it's costly. When you do retrieval augmented generation, you only need to update the data, like the source of the data that you send to your LLM, and this is how your data stays fresh. So now your foundation model has been trained up until 2021. But then in the data, in the vector database or the API, you give it more information that is up-to-date, and this is how it's able to generate a more accurate answer. So the mental model. We know that those concepts can be very complicated, and we like to use metaphors to kind of wrap your head around what RAG is. So an analogy that we found with Mike very useful is- - [Mike] Don't blame me. - No, no, is the wizard analogy. So basically, the foundation model here is the wizard student. So at the beginning, you have a wizard, a student, who doesn't know exactly how to cast spells, doesn't know much about magic, and what you can do is send the wizard to school. So they have pre-training. This is where they learn about the world of magic, like general knowledge about everything that is magical. But it's unable to cast specific spells. Like right after school, if you want to specialize in a, let's say a domain of magic- - [Mike] There's a troll in the dungeon. - Yeah, exactly- - We all know what it is- - For example, you would not be able to kill the troll- - I don't know how to do it. - So what you would need at that moment is probably to have a book of spells with you. And that book of spells is the data that you need to be able to, basically, kill the troll or answer a question, or also specialize in a certain area of magic. Like we said before, maybe you want to specialize in transformation, like you want to transform into an animal, or you want to specialize in, what's the other areas of magic like- - I don't know, I think you're digging a hole. - Right. So that's basically what it is: the student is your LLM and the spells are your data, your vector database. And, oh yeah, that's the fine-tuning part. - Yeah. So what might happen if the wizard goes and looks at the spells and doesn't understand the terminology of those new spells? - Then you can send the student back to school to refine the training, like to refine the general knowledge of it. And that's basically how you can customize your LLM. And now, going back to a little bit more code, 'cause we know it's a 300-level session, we're going to show you how this works. - [Mike] The wizard's coming back, by the way, so if you wanna take pictures and say, I was at a 300-level talk and they were showing me pictures of wizards, you'll have your opportunity to do that. - [Tiffany] Of accurate architecture. - [Mike] Yes. So, and somewhat it's in this demonstration too. So what I've got here is some code. When we've been talking with people and exploring retrieval augmented generation, there has been some confusion, I'll be completely clear with you, in looking at the relationship between, especially when we're talking about vector databases and vectorized information, and the actual large language model itself. 
And in this demonstration, I'm gonna show you some code. It's not necessarily the prettiest code in the world, and I'm not suggesting you'd use this in production either, by the way, but I'm going to step through an example of retrieval augmented generation so we can actually see it. Because very often, you'll be implementing this with a library such as LangChain or with products and services that do all the undifferentiated heavy lifting for you. This is a very hands-on, no libraries, kind of some libraries, but no real libraries at work. So we can see exactly how it's working and how the large language model and the vector database are separate. - [Tiffany] Yeah, just to make sure, don't do this in production. This is just to explain the concepts to you, but then you have tools to, you know, manage everything for you. So this is a very hands-on example of it, but don't do this. - [Mike] They said don't do this- - [Tiffany] Don't do this in production. - [Mike] What's the point then? Okay, so it's an example, and I'm wanting to illustrate the point in Python. So at the beginning of this notebook, so I've got a notebook of code, at the beginning of this notebook, I'm importing some libraries, and we'll obviously step through and see where these libraries are being used. The key thing here is that we're using Faiss, F-A-I-S-S, the open source in-memory vector database from Facebook, or Meta, and that's just being used for convenience's sake. It's used a lot in examples for this type of thing. And then we're using Boto3, the SDK for AWS, and Jinja to do some templating. So the first thing that I'm going to do is create for myself a Bedrock client. So I'm actually using Amazon Bedrock behind the scenes to do this. So I'm going to be using a couple of models from Amazon Bedrock, the embeddings model and then one of the large language models as well. But you could use any embeddings model, any large language model, and experiment around in a similar kind of way. But getting myself this client means I can move forward. So the next thing I'm going to do is load up my documents. And so I quickly last night thought, do you know what, I feel like we should make this about spells. So we've got a whole bunch of spells here. There's a special secret one in there somewhere if you can find it, we'll find it in a moment. So these are basically just text strings, and these represent documents, essentially. So the documents that you might have in your document store. Small, simple, something that we can just run here today. Remind me as I scroll through this to run these cells because otherwise, we will have a bad day. Okay, so I've run everything so far, that's cool. So the next thing we're going to do is vectorize all of that information. So what that does for us is it takes all of those text documents, spells in this case, and it's going to convert them into a vector space. So we use an embeddings model to do that, and that basically takes the text string and converts it into an embedding space of 1,000 and something dimensions. And it would do that whatever the length: if it said hello world, it would be that big, and if it was an entire 10-page article, it would also be that big. So there's nuance and careful crafting when you're doing this in production about how large a text string you wanna put into the vector. But for now, we're gonna put the entire spell in. And so this is just a simple function that I've got, which is gonna help me do that. 
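A minimal sketch of the setup just described, assuming the libraries named above (Faiss, Boto3, Jinja2, NumPy); the region, variable names, and the sample spell texts are illustrative, not the presenters' exact notebook:

```python
import json

import boto3
import faiss
import numpy as np
from jinja2 import Template

# Bedrock runtime client, used for both the embeddings model and the LLM.
bedrock_runtime_client = boto3.client("bedrock-runtime", region_name="us-east-1")

# A tiny "document store": each string stands in for a document (the demo had 21 spells).
spells = [
    "To light a candle, snap your fingers and whisper the word flicker.",
    "To become a fish, puff out your cheeks and say bloop bloop bloop, I'm a fish.",
    "To get started with any AWS service quickly, open the console and use Amazon Q.",
]
```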
So it's going to take the text, the spell in this case, and we're going to run it through the Titan embeddings model. So these are just some keyword arguments here, which help me to call the model from Amazon Bedrock. So Amazon Bedrock has a fairly standard interface, an SDK-level interface. And so this construct here will look pretty similar for each time we're calling the model, whether we're generating an image with Stable Diffusion with my dog and a hat and whatever, whether we're doing text generation, or whether we're doing this embedding. And this line here where we just call the client that we got, so bedrock_runtime_client, and call invoke_model, this line is exactly the same for any of those generations that we're doing or any of the embeddings that we're doing, just passing in those keyword arguments there. So we've got that set up, we've got our function set up, ready to go, and it's just going to essentially return for us the embeddings, just extracting that out. And so from this section here, which is very long, lots of Enters there, we can now go and do that for all of our spells. So all I'm basically doing here is creating a NumPy array where I grab all of those embeddings that will be generated and put them into that NumPy array, so let's run that. And any suggestions for code improvements, please feel free to let me know, not now, don't shout out. So that took a little while to run, you probably noticed that. And that's because it was running through each of those spells, sending it off to the model, getting the embeddings, bringing it back. So creating embeddings is not gonna be instant, but it's gonna be pretty fast and it's gonna be much faster than trying to fine-tune a model or do continuation training on a model using especially that limited size of a dataset. So we've got that done now, so I should have my spell embeddings all set up. Let's do this, let's risk everything by writing more code. Let's just literally just print out the spell embeddings to the thing, so yeah, you can see. So numeric data, the embedding space for those different spells. It's a one-way process, we can convert it into embeddings, we can't bring it back again. So we are using this or we will use this to find the data from our database. The original documents essentially need to be stored somewhere, which they are, of course, further up in the notebook, we have the original list. We still need to have that because we're gonna use this as an index and we are gonna get our data back from the original list. So we've got our embeddings, what I need to do is I need to go and put that inside of the actual vector database itself. So by doing that, I have the capability then to be able to perform queries on that. So this is a simple line, we're gonna create a magic bookshelf. I haven't shown you all of this code, have I? This is called the magic bookshelf and this is the index, essentially, you would call this index if you're writing proper code. With that index now, we've gone and got that. And I can just run this, oh, sorry, that's creating the index now, we're gonna populate the index with all of our data, and just outputting the length of that. So we can see we've got 21 spells, documents, pieces of text, vectors, inside of our vector database. All right, so up until this point, we've done nothing with large language models. We've only dealt with vector, sorry, with embeddings models, and we're gonna carry on like that just for a moment. 
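Continuing the sketch, here is roughly what the embedding and indexing step might look like, assuming the Titan embeddings model (amazon.titan-embed-text-v1) behind invoke_model; the helper name embed_text and the flat L2 index choice are illustrative:

```python
def embed_text(text: str) -> list:
    # Keyword arguments for Bedrock's standard invoke_model interface; only the
    # modelId and body change between the embeddings model and the text model.
    kwargs = {
        "modelId": "amazon.titan-embed-text-v1",
        "contentType": "application/json",
        "accept": "application/json",
        "body": json.dumps({"inputText": text}),
    }
    response = bedrock_runtime_client.invoke_model(**kwargs)
    response_body = json.loads(response["body"].read())
    return response_body["embedding"]

# Embed every document into a single (n_documents, n_dimensions) float32 array.
spell_embeddings = np.array([embed_text(spell) for spell in spells], dtype="float32")

# The "magic bookshelf": a flat L2 (Euclidean distance) Faiss index over the embeddings.
magic_bookshelf = faiss.IndexFlatL2(spell_embeddings.shape[1])
magic_bookshelf.add(spell_embeddings)
print(magic_bookshelf.ntotal)  # how many vectors (spells) are in the index
```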
So we want to ask a magical question, and I know I'm wanting to know this, but how can I become a fish seemed like a good idea at the time. So how can I become a fish, that's the question I'm going to ask, and I know that the answer's in there. And obviously, in a larger system, you'd be able to ask more nuanced and interesting questions than that. But how can I become a fish? The first thing that we need to do with that question then is turn the question itself into some embeddings, because what we want to do is get the database to do a similarity search across all of the vectors that it's got, which are all the representations of the spells we've got, and find the ones which are closest, like in Euclidean distance, closest to the vector of our question. So I'm gonna embed my question, and just for the sake of being able to see what that looks like, let's just take a quick look at that. So there it is, there's my embedded question, very long. It's 1,000 and something places in size, numbers in size. How big that vector is depends on the embeddings model that's being used. So I've got that now, and so now I can go and query my index. So I'm going to say k of four, so I'm looking for the four nearest facts, spells, documents, whatever they are, the four nearest snippets of things that are closest to the question that I have. So I do that here, so I have k as four, I have my embedded query in there, and I'm searching this index that I have. So I can just press run on that and it's pretty quick, it's in memory, it's obviously also very small, this isn't enterprise size. And I get a couple of things back. So I get this first list of integer values. So these are the index positions of the facts, of the spells, inside of our vector database which most correlate to this. Now, clearly, I'm setting myself up for success, so I am asking something specific about a specific spell we've got, so I am expecting 11, the thing at index position 11, to be correct, and I'm printing out here as well the distances. So the distances between the vectors: inside of the vector database, we've got a big cloud of vectors, which are our spells, and we've got our query, it sits in the middle, what are the distances to the four closest, and so that's what those values are there. So we could do something more sophisticated with those if we wanted to. Okay, so that's basically it, obviously, there's more, but that's basically it, we've now performed a query of our vector database, we've embedded all of our documents, and we've performed a query, we've got some answers back, and the most likely answers are clustered there. No large language models have been used at this point at all. We've used an embeddings model, but that's all. Where does a large language model come in? Well, it only comes in if you want it to in your application, so we do want it to, so let's carry on. So we're going to now construct a prompt and we're going to prompt our large language model to help us answer the question. And so I'm using Jinja here as a templating language, a templating mechanism, to be able to put together a prompt which contains our information. There's a really crucial part in here as we run this. So let's run this, first of all, so we've now loaded the string, that's basically it. 
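A sketch of that similarity search, embedding the question with the same embeddings model and asking the index for the four nearest documents (variable names are illustrative):

```python
question = "How can I become a fish?"
embedded_question = np.array([embed_text(question)], dtype="float32")

k = 4  # how many nearest neighbours to retrieve
distances, indices = magic_bookshelf.search(embedded_question, k)

# indices[0] holds positions into the original spells list (closest first);
# distances[0] holds the corresponding L2 distances (smaller means closer).
retrieved_spells = [spells[i] for i in indices[0]]
print(indices[0], distances[0])
```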
And I'm saying, you know, given the spells provided in the spells tags, find the answer to the question written in the question tags. There's lots of different kinds of prompt engineering best practices; you can read a lot from the different model providers about what works really well. This is sort of taking a play from Anthropic and the way that you work with Claude, but we're actually gonna end up using the Amazon Titan model. And you can see in here with Jinja templating, we're gonna put all the spells in here, and the question will be put in here. Let's go and fill that out though, so I'm literally just using Jinja, if you're not familiar with that, it's essentially a templating library mechanism that you can use quite easily in Python. And it's basically just gonna smush the data together with our template. So now let's go and print out the actual prompt so we can see what it is, let's go and do that. I think I might need to just put that into there. Let's run that. And so this is now my prompt, text prompt, that I'm gonna send to my large language model. Key, key concept here: the spells, the facts, the things have been entered into this as text. We are no longer talking about embeddings. So the embeddings model that was used to vectorize our data was only there to help us do that similarity search. And then we took the text that we got back from that similarity search and we've put it as text into this. The large language model, yeah, the large language model has its own embeddings. - [Tiffany] Oh, well, just to make sure that this is clear. Before having RAG, we would just send the question, like how can I transform into a fish, to the LLM. Now everything that Mike has just shown you is how he retrieved relevant data from the vector database to augment this prompt. And this is like prompt engineering, basically. He is putting more information into the query to the LLM to make the LLM able to answer the question accurately. - Absolutely, it's what, again, was on that slide that you looked at before. And so, yeah, so now we're going to go ahead and put this into the large language model. The large language model also has embeddings built into it, inside of the multi-headed self-attention and all that kind of other stuff, completely separate from the embeddings model that's used to get the data into the database. So we have our prompt, it's just a text string, that's the secret of prompt engineering, it's just text strings. And we're going to now send that into the model. So this is now something more specific to Amazon Bedrock. I've got my keyword arguments that I'm gonna set up again here. So in a similar way that I did when I called the embeddings model, this is now how I call this particular model, so this is Titan Text Express v1. So a model that I think became generally available yesterday. So it's now available in Amazon Bedrock. You can enable the model access, and if you're not sure how to do that, come see me afterwards and I can help you do that. And we've got all our keyword arguments that we can send in with max tokens and all that kind of stuff. Interestingly here, I've turned the temperature down to zero. So if you're familiar with temperature, it's about how creative you want the answer to be. And in this case, because we're dealing with specific data, which actually is very likely to be in the prompt, we don't need it to be very creative. We want it to be quite factual. We wanna avoid hallucination if we can. 
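A sketch of the prompt assembly and the generation call, assuming a Jinja2 template along the lines described and the amazon.titan-text-express-v1 model; the exact template wording and inference parameters are illustrative, not the presenters' values:

```python
prompt_template = Template(
    "Given the spells provided in the <spells> tags, find the answer to the "
    "question written in the <question> tags.\n"
    "<spells>\n{% for spell in spells %}{{ spell }}\n{% endfor %}</spells>\n"
    "<question>{{ question }}</question>"
)
# Smush the retrieved documents and the question together into one text prompt.
prompt = prompt_template.render(spells=retrieved_spells, question=question)

kwargs = {
    "modelId": "amazon.titan-text-express-v1",
    "contentType": "application/json",
    "accept": "application/json",
    "body": json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {
            "maxTokenCount": 512,
            "temperature": 0,  # low creativity: stick to the facts already in the prompt
        },
    }),
}
# Exactly the same invoke_model call as before, just with a different model and body.
response = bedrock_runtime_client.invoke_model(**kwargs)
response_body = json.loads(response["body"].read())
print(response_body["results"][0]["outputText"])
```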
And I'm not gonna tell you that this architecture can eliminate hallucination, but it will definitely help move away from it. So we've got, I think I ran that already, but let's run that, so we've got our keyword arguments there, and I can just run it through this section of code here being, it's the sort of boilerplate kind of streaming body object response from Python if you're familiar with using the Boto3 SDK. So we can basically call again this line, exactly the same line from before, if you remember, to invoke model, we're gonna get back our body response, and we're gonna load out of that the generation, press play. Pray to the demo gods. Yes, I'm running this live. Yes, they said you should just record a video of doing it just in case it doesn't work, but I have faith. And you obviously have faith too. So to become a fish, are you ready for this? To become a fish, you need to puff out your cheeks and say bloop bloop bloop I'm a fish. Thanks. I was hoping you'd do that. I have another question which I'm gonna ask as well. So here's the, ooh, so the special spell that we had in the middle of here as well. Let's go and grab this one. How can I get started with any AWS service quickly? So if I just scroll back up to this just to do one more, where does it set? Here is our ask a magical question, asking you about the magic of AWS. So I'm gonna set my question, I'm just going to run these cells again, Shift + Enter all the way down. Here is my new set of potential answers with number 10 looking like it's, or the one at index position number 10 looking like it's likely. Let's redo our prompt. And maybe we'll skip past that, otherwise, you see the answer, it's very easy, as I said, this is a very simple demo, but I really just wanna highlight how all this works. And if I wanted to do that, to get started with AWS's services quickly, I open up the console and use Amazon Q. Have you had a go with Amazon Q? A little bit? Is anyone there? Okay, a little bit. Have a go with Amazon Q, Adam talked about it in the keynote yesterday. Okay, so thanks for watching that, this basically was to highlight hopefully underneath the surface how you can work and how retrieval augmented generation works. And I think that's really important. There are different ways that you can do this very easily, but that's what's happening under the surface. And I think having that intuitive mental model will really help to be able to debug applications that you're writing if you write them in other ways. - Yeah. So seeing all that helps you understand what's happening. But now, oh yeah, I'm going to talk about agents before- - [Mike] Yes, yes, yes, yes. - Yeah, okay, and so now with our agents. - [Mike] It's back. - To have the, finally the technical architecture complete. The agents are basically, they can do actions for you, if you give like APIs to agents, they will be able to perform tasks for you, so it's not just like you have your foundation model and just ask it a question. Now, suddenly, with agents, you are giving your foundation models arms and hands. So basically, they can, for example, if you ask them for an item to buy on a retail, like randomly, whatever retail company you can think of and then at the end, you ask like, search for this item for me on this website and then you can basically ask, buy it for me. And then the agent will be able to do the action for you. And depending on whatever action you want it to do, like you said a good example earlier, you said retrieve the time. 
- [Mike] Yes, super, super simple one. - Super simple one but very useful. If you want your LLM to be able to retrieve the time, then an agent would be the way to do it, 'cause then you would give the API to the agent and then the agent would do the action for the LLM. - Yeah, ask an LLM what the time is, some of them will tell you what the time is, but of course, they don't actually know. - And maybe it needs the time to do other stuff, like other tasks after that, so this is a way to make your LLMs more capable, to enhance their capabilities. So basically, going back to a more- - Bye-bye, wizard. - Yeah, to a more serious architecture, the wizard is the foundation model. And in your example in the code, it was Titan. The vector database replaces the book of spells, but in your example, it was the list of spells, yeah, that you vectorized and put in a vector database. And then we have the agents and, oh yeah, the pre-training and fine-tuning, so that could be another model. But then basically, how those pieces interact together is that you send a query to an agent, then the agent will interrogate the foundation model and ask, basically, do you need more information to answer that question? If the foundation model doesn't have enough information, it will say, yes, retrieve some information for me. The agent will then retrieve data from the vector database and then, as we showed with the code, enhance the prompt with the query and the necessary data to answer the question. And now we can finally talk about Amazon Bedrock, because before that, it was all just, you know, theory, like we were talking about general architecture. But what Amazon Bedrock does for you is that it is a fully managed solution that creates everything for you behind the scenes, so you don't need to vectorize your data, you don't need to create a vector database, you don't need to create agents, everything is taken care of for you. So basically, how it shows here in the console is you go to Amazon Bedrock, then you can find your foundation model, so here, you can choose between Jurassic, Titan, Claude, Command, Llama, and Stable Diffusion. If you want to know exactly what each foundation model does, then you have a description of what they're good at doing. Basically, if you wanna use a multi-language model because your use case is to speak with different languages and answer in different languages, then you would use Jurassic. If you just wanna do text generation, then probably Claude is the best option for you. If you wanna generate images, then Stable Diffusion is your pick. Then, depending on your use case, this is how you will choose your foundation model. Then here, we have the knowledge base. So basically, this is where you put your data. What it is in practice is that, practically, whatever data you have, it could be HTML pages, it could be text, it could be JSON, whatever you have that you want to use to enhance the knowledge of your LLM, you put it in an S3 bucket and then you give the URL of your S3 bucket here. And behind the scenes, Bedrock is going to do the vectorization, the embeddings, and put everything in a vector database for you. And then you add the agents if you want to give more capabilities to your LLMs to perform other tasks, and here are all the APIs that you can give it so your agents can perform the tasks for you. - [Mike] Awesome stuff. - Yeah. - So now I'll put my pocket protector in and my tie on, and we'll talk about security, audit, and compliance. 
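Before the security discussion, a minimal hand-rolled sketch of that "retrieve the time" agent idea (this is not how Amazon Bedrock Agents is implemented): the application owns the tool, the model only decides, in text, whether the tool is needed, and the result is fed back into the prompt. The ask_llm helper, tool name, and prompt wording are made up for illustration:

```python
import json
from datetime import datetime, timezone

import boto3

bedrock_runtime_client = boto3.client("bedrock-runtime", region_name="us-east-1")

def get_current_time() -> str:
    # The "arms and hands": a real action the application can perform for the model.
    return datetime.now(timezone.utc).isoformat()

TOOLS = {"get_current_time": get_current_time}

def ask_llm(prompt: str) -> str:
    # Same Bedrock invoke_model pattern as in the RAG demo above (Titan Text Express).
    response = bedrock_runtime_client.invoke_model(
        modelId="amazon.titan-text-express-v1",
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"inputText": prompt,
                         "textGenerationConfig": {"temperature": 0}}),
    )
    return json.loads(response["body"].read())["results"][0]["outputText"]

user_question = "What time is it right now?"

# Step 1: ask the model whether it needs a tool to answer.
decision = ask_llm(
    "You can call one of these tools: " + ", ".join(TOOLS) + ".\n"
    "If a tool is needed to answer the question, reply with only the tool name; "
    "otherwise reply ANSWER.\nQuestion: " + user_question
).strip()

# Step 2: if so, the agent performs the action and hands the observation back.
if decision in TOOLS:
    observation = TOOLS[decision]()
    print(ask_llm(f"Question: {user_question}\nTool result: {observation}\nAnswer:"))
else:
    print(ask_llm(user_question))
```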
And of course, that was something that was super important to you when we talked to you earlier on and we did that poll before. And so I'm gonna jump into another architecture diagram at this point, and it sort of covers sort of what we did before, but there's no wizards this time, sorry. So we have on the left-hand side the apps that we are generating. And I just, I'll pick up on this point as well, we just talked there about Amazon Bedrock and how it can do all of those things. If you've already been experimenting around prior to the release yesterday of those services in the keynote, we also have integration with things like LangChain. So if you've been using the open source LangChain project, which is amazing project and very, very active, very busy, a little bit complicated, you can build those applications, and there is an Amazon Bedrock LLM library that you can use with LangChain as well. So let's for a moment imagine that that's the kind of thing that we're doing here. So we've got an app, our apps, our applications, which are running code, which are interacting with a large language model and they're also interacting with data sources, so very typical kind of thing that you might do with LangChain. If we see a typical flow, so we're thinking with our security hats on now about where connections are being made and where data specifically is flowing, where is the data? So someone asks our app a question that relates to data that sits in the data sources that we have. And so our app will take that question likely, if this is how it's been architected, but this is common. The app will take that query, form a prompt around it, it'll do some prompt engineering with a template like we've seen, and it'll send it to the large language model. Basically, I have some natural language, I'm a Python application, I don't know what to do with that, large language model, please help me out, tell me what's happening. And the large language model, again, depending on how it's been prompted, can respond back. And it might respond back, say, for example, with a SQL query, I really wanna make sure we also understand this, like RAG doesn't have to be a vector database, it's just what everybody likes to talk about. You can do RAG with SQL database, with a CSV file, a text file, whatever. So it could return back this is how to get the data that will answer that question, this is how to work with your SQL database. And then the application will say, excellent, thank you, I know what to do with a SQL query and run that over the data source, get some information back, and it's a new number or it's a table of data or whatever it is. So it could send that data back to the large language model so that the large language model can then process that in context and say, okay, well, based on the question that was asked, you've now run that query, thank you for that, now we've got this data, and send a natural language response back to the application so that the application can do what it needs to do next. And maybe this is responding to a user inside of a chat session, whatever you might be wanting to do. So that's a pretty typical flow, especially if you're using something like LangChain. A lot of this will happen behind the scenes if you're using something like Amazon Bedrock with agents. What's really important to consider here is what's happening here. 
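A sketch of that text-to-SQL round trip done by hand rather than with LangChain, to make the data flow visible; the table, schema, helper name, and prompt wording are all illustrative:

```python
import json
import sqlite3

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def ask_llm(prompt: str) -> str:
    # Plain Bedrock invoke_model call (Titan Text Express), temperature 0.
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-text-express-v1",
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"inputText": prompt,
                         "textGenerationConfig": {"temperature": 0}}),
    )
    return json.loads(response["body"].read())["results"][0]["outputText"]

# A stand-in data source: a tiny in-memory SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "Anya", 19.5), (2, "Ben", 42.0)])

user_question = "What is the total value of all orders?"
schema = "CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)"
sample_rows = conn.execute("SELECT * FROM orders LIMIT 3").fetchall()

# 1) App -> LLM: the question plus the table schema and a few sample rows.
#    Note what is travelling here: table structure and real row data.
sql_query = ask_llm(
    f"You have access to this SQLite table:\n{schema}\n"
    f"Sample rows: {sample_rows}\n"
    f"Write a single SQL query, and nothing else, that answers: {user_question}"
).strip()

# 2) The app (not the model) runs the query against its own data source.
#    A real system would validate the generated SQL before executing it.
result = conn.execute(sql_query).fetchall()

# 3) App -> LLM: the query result, so the model can phrase the final answer.
print(ask_llm(f"Question: {user_question}\nSQL: {sql_query}\n"
              f"Result: {result}\nAnswer in one sentence:"))
```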
So all of this data which is flowing to and fro, I sort of talked about it at a high level just then, but I want you to consider what data is included, from what data sources, inside of these transactions which are happening with the large language model. And in actual fact, especially, I mean, if you're using, and again, I mentioned LangChain and I do, I like LangChain, I think it's really cool. It's a little bit difficult sometimes to see what's happening in the backend. And so if you do peer into the logs as you're working with this, what you can see, and I'm specifically talking about an experimental SQL chain that I've been working with, is that when it sets up that initial prompt, the very, very first one that sends a query over to the large language model, it will say, I'm a large language model, sorry, no, you are a large language model, you don't know anything about the answer to this question, but you can get the answer from this database that I have access to. And it does say this in natural language, it's freaky how we program these days. But it also says, I have access to this database. This is what this database looks like, here is the table, a show-table construction, depending on what kind of table it is, and here are the first three or four or 10 rows of data from that database, all of it, so that the large language model has the context and can understand how to perform the query so it can create a syntactically correct SQL query. It's the only way it can do it. So let's all just be clear about what information is traveling from the application to the large language model: sensitive data potentially, customer records, not necessarily that filtered. And so that's fine as long as we know that's happening, as long as we know we're keeping that secure. So the security perimeter that we have around the large language model is important, and where it is and how it's operated and whether it's in a trusted zone within our architectural makeup is important. And the idea here is that we really wanna wrap all of this, not just the application and the data sources, like we've always done forever in security, audit, and compliance, but also the large language model as well, that is absolutely part of this. It's not just providing interesting chat responses, it actually is seeing, and the system that it runs on sometimes sees, that sensitive data, which is fine as long as we know that and we're putting the appropriate security controls around it. So in response to that, I mean, you need to look at the security of the architectures you're putting together; if you're hosting your own model, then you can look at which server that's on. If you are connecting with services, you need to know where those services are and make sure you've got the necessary compliance in place that meets your requirements. Inside of Amazon Bedrock, there are services, there are capabilities to help us with this. So part of this is the logging, the audit logging of the models. So you can turn on logging for all of the invocations and responses that you're gonna get back from these foundation models. So we can turn on S3 logging, CloudWatch logging, or both. And so we can end up with all of that data saved, which is maybe something that's really useful for a compliance thing, and I've gotta tell you, from a debug perspective, it's amazing, it's great. 
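A sketch of turning on that invocation logging from code rather than the console, assuming the put_model_invocation_logging_configuration operation on the Bedrock control-plane client; the bucket name, log group, and role ARN below are placeholders:

```python
import boto3

# Note: this is the "bedrock" control-plane client, not "bedrock-runtime".
bedrock = boto3.client("bedrock", region_name="us-east-1")

bedrock.put_model_invocation_logging_configuration(
    loggingConfig={
        # Every prompt and completion is delivered to S3, CloudWatch Logs, or both.
        "s3Config": {
            "bucketName": "example-bedrock-invocation-logs",  # placeholder
            "keyPrefix": "bedrock/",
        },
        "cloudWatchConfig": {
            "logGroupName": "/bedrock/model-invocations",  # placeholder
            "roleArn": "arn:aws:iam::123456789012:role/BedrockLoggingRole",  # placeholder
        },
        "textDataDeliveryEnabled": True,
        "embeddingDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
    }
)
```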
If you are working with LangChain and you are a little bit frustrated by not being able to see exactly what's going on, use it with Amazon Bedrock, hook up this logging, and you'll see everything which is going on; all of the prompts and the templates and the combinations of things get put into there. So that's super useful, and this combines with your existing security posture and your existing logging solutions. It either goes into S3, where you can take it out and put it somewhere else, or it goes into CloudWatch, and again, you can take it out and put it somewhere else or keep it there or do whatever you want with it. So that's the sort of logging and audit side. Then from the perspective of privacy, and more to the point if you're working with compliant workloads: if you are working with stuff that is sensitive to you and you are highly sensitive about where it goes, or if you're working in a regulated work, a regulated industry where you can't send data over the public internet, even if you wanted to, even if you had all of the SSL layers and everything and you were pretty comfortable about it, maybe you still can't. And so if that's the case, then we have this option for you as well. So let's just be clear about it, when you are connecting to AWS services like S3 and DynamoDB and all those things, the default position is that you're connecting to a public endpoint, right? It's still authenticated and it's still secure and you've still got encryption and it's fine. But there are these architectures that you may be familiar with, when you have an application inside of a private VPC, which doesn't have any access to the internet, you are able to use these gateway endpoints so that you can get through to these services. So with S3, the easy one, it's a free one as well, you can have an S3 endpoint inside of your VPC so that your traffic goes directly to it. And you can do exactly the same thing with Amazon Bedrock as well, or a very similar thing with Amazon Bedrock: it's called PrivateLink, and it's available for other services as well as Amazon Bedrock. So you can have your application sat inside of your private VPC, you can have no internet gateway there, so it's got no access to the internet because it's your regulated workspace. And you can have a PrivateLink endpoint that allows you to go directly to the large language model inside of Amazon Bedrock. And then that connection is not going over the public internet, and it opens up the possibilities for you to be able to host regulated workloads. And, of course, IAM, identity and access management, is there for you to be able to define your security perimeters. So we're basically having this architecture with the enormous power of an enormous large language model there, and you are completely in control of the security picture. I spoke for a long time. Thank you. - No, that's fine. So a last poll for you, what do you want to explore next in the realm of generative AI? - Yeah, and then we're gonna talk about predictions about what's coming up. Super interested to see what's gonna happen here. - Yeah. - Because you can select more than one, so maybe you wanna select all of them or none of them. We can see. We can see what the results of that are as they come up. And we're gonna have not really any time for questions, I'm very conscious of your time, and some people will need to get to the next session, but I'm more than happy, Tiffany, and I'm sure I can speak on your- - Yes, you can speak on my behalf. 
- We will both be here, so you can speak for yourself. We'll both be here afterwards and we can take questions and talk to you in a moment. How are we doing with this poll's results? Oh, it's very balanced. - It's, yeah, it's surprisingly balanced. Building agents, using RAG. - Building with agents. I think, I'm not surprised to see that building with agents is sort of like that leading one, if I'm honest. I think that's probably one of the most exciting things that's going to be happening. - Getting capabilities to it next year, yeah. - Over the next, particularly the next year. - Yeah. - Maybe we can move to predictions. But if I press this button, nothing happens. Can you press a button? Thank you. All right. Where to focus? - Where to focus? - I think my wizarding skills, maybe? No. Let's go. - So yeah, as we said, and you answered that also in the poll, so customizing solutions with agents and RAG, obviously. But also, it was your concern at the beginning, security. And this is actually a big topic, so focus on governance and security. - I think so, yeah. I think we've experimented a lot with large language models and generative AI, especially this year. And I think that we should continue to do so because I think we're still unlocking the capabilities of it and we're still figuring out how it works for us. But it's now kind of time to switch to production mode, and I think a lot of what we're talking about here fits into that space. - Yeah. Models will get more sophisticated and add modalities for sure, I mean, we've seen like the exponential curve of LLMs becoming more capable of answering questions, and for sure we're gonna see breakthroughs in the following years, even months. Generative AI will unlock projects that were not previously feasible. - Yes, I mean- - We were talking about this yesterday. - We were, and I think, I mean, this is something as well that I took a little bit of a lead from, I'm not gonna pretend like this is necessarily my own thought. So Andrew Ng, who is a very famous luminary in the machine learning world, has a quite interesting talk actually on YouTube where he talks about what's currently happening in the space of AI and generative AI. And I think one of the messages from that is about the capabilities of these models, we talked about it before, about how they're capable of doing so many things; there is not just one thing that they're trained to do. So we are now in a position where we can start to look around our enterprise, social enterprise, projects, whatever we're working on, and find those datasets that actually have value in them, but where it wasn't economical to get that value out before. Now we have these pre-trained models, someone's done a lot of the heavy lifting for us to make these models. And I think now we're able to do things which weren't feasible before, economically feasible before- - Economically and just, like, technically, it's much easier now- - Absolutely. - To do those things. - All right. - That's a wrap-up. Thank you very much for coming. - [Mike] Thank you so much. - [Tiffany] Yeah.
Info
Channel: AWS Events
Views: 3,277
Keywords: AWS reInvent 2023
Id: aEA6X_IElpc
Length: 52min 21sec (3141 seconds)
Published: Sun Dec 03 2023