AWS re:Inforce 2023 - Securely build generative AI apps & control data with Amazon Bedrock (APS208)

Captions
- Well, good afternoon, and thank you for coming to the session APS208, so this session is all about generative AI, so I hope you're actually in the right place. Now, as you may have noticed, gen AI has taken the world by storm over the last few months, and everyone's actually talking about it. Every organization wants to look at it and try and figure out how they can best leverage it to make a difference to their organization, but they do have some concerns, as I'm sure everyone here has concerns as well. First one is where is the gen AI model actually located? Where is it? Where am I sending my data actually to? Who can actually see the data? Will they use the data to actually train other models? And will the results from these models be full of offensive content? How can we stop that from happening? So what if I could tell you that on AWS you can actually go and build and deploy your own gen AI models within your account that follow your encryption and security policies, where you don't have to worry about managing or scaling any infrastructure whatsoever? So my name is Andrew Kane, and today, we're gonna talk about Amazon Bedrock. - And I'm Mark Ryland. I'm a member of the AWS security team, so I had the opportunity to join in this talk and share some of the presentation, preparation and presentation duties here this morning, so it's very nice to be with you. Let's look at our agenda, and we'll go from here. We're gonna talk what is generative AI? Obviously, a hot topic these days. We'll give a overview of that and the underlying technological shift, which has gone on in the industry over the last year or two of the foundation models, so these are models now with billions and billions of parameters as opposed to our previous layers of technology or levels, which were measured more in the millions. We'll introduce Bedrock as a service, kinda give you that overview. We'll talk about some of the critical topics around Bedrock for this audience, the re:Inforce audience, around data privacy and security, tenancy, how client connectivity will work, sort of the networking perspective on the service, and access management as well. We'll talk briefly about the security in the model challenges. You know, a lot of this talk is about the security of the model, like, this is a workload. It has to be run and operated in a secure fashion, and we'll talk about how you're able to do that, but there's also interesting issues that arise for the use of the technology and some of the security things. We'll touch on that as well, and then, we'll conclude with some talk about other ways you can approach foundation models in the AWS platform, and especially around SageMaker. Take it away. - So the first question to actually ask is quite an obvious one and not really stupid at all. What, actually, is generative artificial intelligence? Well, the clue is really in that first word of generative. The whole point behind it is it can actually create new content and ideas. This could include conversations, stories, images, music, video, all sorts, and like all AI, it's actually powered by machine learning models. In this case, can only really say very large models behind the scenes. They've been pretrained on a corpora of data that essentially is huge, and they are referred to essentially as foundation models, so recent advancements in ML technologies have basically meant that has led to the rise of FMs. 
They now contain billions, tens of billions, even hundreds of billions of parameters, so clearly they can be quite complex and expensive things to build, so why are they so popular? The important thing to note is that, at their core, these generative AI systems are leveraging the latest advances in machine learning. They're not magic. They just look like they might be because it's hard to differentiate them from the older models; really, they're the latest evolution of technology that's been developing for many years now. It's only recently that it's become mainstream, really big, and really powerful. The key, the reason they're really special, is that a single foundation model can perform many different tasks, not just one. By training it across billions and billions of parameters, an organization can teach it to do lots of different things, essentially at the same time. You can instruct them in different ways and make them perform different tasks, but you're pushing all of those tasks through the same single foundation model, and this works because it was trained on, essentially, Internet-scale data, so it has seen all the different forms and the myriad patterns of data you find on the Internet, and the FM has learned to apply that knowledge across the entire data set. The possibilities here are really quite amazing, and customers are getting very excited, because these generally capable models can now do things they just couldn't think of before, and they can also be customized to perform really specific operations for an organization and enhance its product offerings to the marketplace. You can do that customization with just a small amount of data to fine-tune the models, which takes a lot less data, a lot less effort to generate and create, and a lot less time and money in compute than building a model from scratch, so the size and general-purpose nature of FMs make them really different from traditional models, which generally perform specific tasks. On the left-hand side of the slide, say there are five different tasks you want to perform in your organization. For each of those tasks, you'll collect, collate, and label a lot of data to help a model learn that particular task, you'll build that model, you'll deploy it, and suddenly you can do text generation. Then you do it all again, and you can do text summarization, and so on and so forth, and you have teams building, collating, cleaning, changing, and updating those data sets and models to cover those five tasks. Then along came foundation models. What they do quite differently is that instead of gathering all that labeled data and partitioning it into different tasks and different subsets for summarization, generation, et cetera, you take the unlabeled data and build one huge model, and this is why we're talking Internet-scale data.
You're really feeding it everything that you can find, but by doing that, they can then use their knowledge and work out how to do different tasks when you ask them, so the potential is very, very exciting where they're actually going, but we're still really in very early, early days of this technology, so customers do ask us quite a lot, how can they actually quickly get, well, start taking advantages of foundation models and start getting generative AI into their applications. They wanna begin to using it and generate, basically, generate new use cases, generate new income streams, and just become better than their competitors at everything that they actually do, so there are many ways of actually doing foundation models on AWS, and as Mark says, we'll touch on those other models, other methods later on in this session, but what we've found really from customer feedback is when most organizations want to do foundation models and want to do generative AI, we found that they don't really want to manage a model. They don't really want to manage infrastructure either, and those of you who worked lots in Lambdas and on containers, you know that that feeling is quite strong across AWS anyway, but what they want to do is they want AWS to perform all the undifferentiated heavy lifting of building the model, creating the model environment, deploying the model, and having all the scaling up and scaling down of those models so they don't have to do anything other than issue an API call that says, "Generate some text from that model "based on my question or based on my instructions." That's all they want to do, so Amazon Bedrock. This was talked about a few months ago in April when we preannounced the service, and we talked about what we're going to be doing in the generative AI space as a service over the rest of this year. It really has a service- or API-driven experience. There's absolutely no infrastructure to manage. You use Bedrock to find the model that you need to use for your use case. You can take those models, you can, (clears throat) excuse me, you can fine-tune some of them as well to make them more specific to your business use case and easily integrate them into your applications because in the end, it's just an API call, like any other AWS service, so all your development teams already know how to call AWS services in their various languages in their code. This actually is no different, so you can start taking advantage of all the other code-building systems that we have such as, excuse me, (clears throat) experiments within SageMaker to start building different versions of the models to see how they perform against each other and start using all the MLOps and pipelines to make sure these things are being built at scale in a timely and correct fashion, and you can do all of this without managing anything, so this is really it at the high level. It's really we see as the easiest way for any customers to build and use generative AI in their applications. Because Bedrock is really a fully managed experience, there's nothing for you to do to get started other than download the libraries for your programming environment, for your IDE, and just call the APIs. It is really that simple. We've taken a problem of deploying a model securely. 
We're making sure that you can privately customize them, which we'll go through later on the architecture diagrams, and you can do it all without having to manage anything at all, so we're really excited, because Bedrock is going to be the first service supplying models from multiple different vendors: Amazon, Anthropic, Stability AI, and AI21 Labs. All of those models are available within Bedrock through essentially the same API. If you want to generate text, you supply the instructions to generate text and basically say "Anthropic," "Titan," or "AI21 Labs," and you'll get your response. There's nothing else, as a developer, you have to do or worry about. You don't even need to know where those models live, where they are, or how big they are. You just have to know, "I want to call that vendor's model. Go." That's all you have to do, and we're also applying all of AWS's standard security controls to this environment, so you can rest assured that everything is encrypted in flight with TLS 1.2 as a bare minimum, and everything is encrypted at rest. Depending on what you actually store at rest, which is not a lot, when it's there, it's all encrypted by KMS, and you can use your own customer managed keys as well, so you can make sure everything there is safe and secure. Now, responsible AI is also key in these situations for all generative AI, and all of our third-party model providers take this really seriously because it is a big issue. In the end, those third-party providers are responsible for how their models handle it, but they take it very seriously, and they're going to do a good job. For Amazon Titan, which is the one built by ourselves, we're going to use that to keep inappropriate content away from users: we're going to reject that content going in, so you can't fine-tune a model on horrible things, and we're going to filter the outputs as well, so if there's inappropriate content like hate speech, incitement to violence, profanity, or racist speech, that gets filtered out too. We're trying to make sure those models start in a good place and that you can't fine-tune them away to an irresponsible place. That's what we're building into the Titan models in Amazon Bedrock, and it's hopefully going to make everyone's life a lot nicer, clearer, and easier. The models we have are the four on screen, the four big ones. I'll talk about Amazon Titan first because that one is ours, and at this point it's only going to be available within Amazon Bedrock. It's really a text-based model, or two text-based models, and they can do all the usual text-based NLP tasks you'd expect, such as text generation, summarization, classification, open-ended Q&A, and information retrieval, but they can also generate text embeddings, which is useful for many other use cases, and they're the ones that we're deploying as part of Bedrock.
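[Editor's note: the "same API, different vendor" idea above can be sketched with the AWS SDK. Bedrock was still in preview at the time of this talk, so the client name, operation, and model IDs below are assumptions based on later public documentation, and the request body schema actually differs per provider; treat this as a minimal sketch rather than the definitive call pattern.]

```python
import json
import boto3

# Assumed client name; check current Bedrock documentation for your region.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def generate_text(model_id: str, prompt: str) -> str:
    """Send a text-generation request; only model_id changes between vendors."""
    body = json.dumps({"inputText": prompt})  # request schema varies per provider
    response = bedrock_runtime.invoke_model(
        modelId=model_id,
        contentType="application/json",
        accept="application/json",
        body=body,
    )
    return response["body"].read().decode("utf-8")

# Illustrative model IDs only; switching vendors is just a different identifier.
print(generate_text("amazon.titan-text-express-v1", "Summarize our Q3 report."))
print(generate_text("ai21.j2-mid-v1", "Summarize our Q3 report."))
```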
Now, the third-party ones, they've already got different use cases, different nuances, and so when you start to look for or to choose the model you want to use, really look at your use case in more detail to work out which one is better because the next two on the list, AI21 Labs and Anthropic, are also text-based LLMs, so what's the difference? So Jurassic family of models, which is from AI21 Labs, they're really multilingual, by their very nature, and so if you're looking for text-based systems that are really naturally able to handle things like French and Spanish and German, so naturally, without thinking, then those models are really well tuned for those use cases. Anthropic is slightly different with their Claude models. They're really the usual LLMs for conversational and text-based processing, but Anthropic has done an awful lot of research into how to build and develop sort of honest and truthful generative AI systems, and their models are really strong and really powerful. The last one is from Stability AI, which I'm sure everyone's used, everyone's children has used, and even everyone's grandparents have probably used as well. It's probably the most powerful image generation model that is actually out there. Everyone knows about it, so as part of Bedrock, we're using Stability AI, and we're embedding, (clears throat) excuse me, their Stable Diffusion suite of models into Bedrock, so if you want to do text image generation, then that's what you can actually use on us. You too can generate images that can be then used in a high-resolution fashion for things like logos, artwork, product designs, et cetera, prototyping, and all of these things just come out of the box, and so there the models that we're actually doing at this point in time, and hopefully, we're adding more at some point in the future. - So the message is clear. I'll reiterate it, and we'll talk after that on some of the more details, but really, the key value proposition of Bedrock is to quickly integrate some of this technology into your applications, into your business or government agency or other organization applications using tools you're familiar with, using technologies you're familiar with and familiar controls and security controls, privacy controls, making this as easy to access for you as possible, so that's really one of the key takeaways from this overall presentation. Now let's get into some additional details. This is a really important point. We'll say this several times. This comes up in every single customer conversation and, you know, understandable concern is, will you take my inputs, whether those are customizations of the model or my prompts or whatever I'm doing to utilize the model, what will you do with that information? And the very simple and clear answer is we won't do anything with that information because that will be isolated on a per-customer basis for your use, stored securely, et cetera. We'll talk, again, more details on that, but the key takeaway there is this is not going back into the model for further improvements, so that's a very clear customer commitment, and it will enable lots of use cases that otherwise might be difficult for organizations to decide because they'd have to make some trade-offs that we don't want you to have to make. Let's talk a little bit more about sort of the security and privacy aspects, so essentially, as mentioned, you're in control of your data in the Bedrock environment. We don't use your data to improve the model. 
We don't use it for further model generation. We don't share with any other customer. We don't share it with other foundation model providers, so they're in the same boat we're in, right? We don't use your data for Titan improvements. Other model providers will not see any of your data and will not be used in their foundation models. All of this applies to all of the things that customers input into the system, right? There's many ways that you interact with the system. We'll talk in some detail about kind of multi-tenancy versus single-tenancy model, but in all those circumstances, the things that you provide to the system in order to use the system are not going to be included in the system's behavior outside of your particular context, your customer context. Data security. Obviously, we'll build and operate this in the way we do with a lot of our services, all our services with things like using, you know, encryption of all data in transit, TLS 1.2 or higher, as you may have noticed, those of you who pay attention to our detailed blog posts, we're actually enabling TLS 1.3 on a number of our services going by the end of the year, majority of our services will be willing to negotiate the latest version of TLS, which has a little, some nice performance improvements. We're also supporting QUIC, which is another type of network encryption and speed-up technology for many services, so that's for your data in transit. For data at rest, we'll use AES-256, state-of-the-art symmetric encryption, and again, like with other kinds of services where we're storing customer data, we'll integrate this into the KMS system, so hopefully, everyone's familiar with KMS, but in a nutshell, KMS is a envelope, a hierarchical encryption technology with the notion of envelope encryption, so what that means is that there is a customer-managed key or a service-managed key that's inside the KMS service. Never access the service, is completely unavailable to anyone, including all AWS privileged operators. That base key is used to encrypt a set of data keys, and those data keys are what's actually used for data encryption outside the service, but those data keys are never stored outside the service, except in encrypted form, and what that means is whenever data needs to be decrypted in any of our services, the service has in its possession, if you will, a bunch of cipher text, which is the data that was encrypted with the data key, and it has a cipher text copy of the data key, the encrypted copy of the data key, so when it needs to read and send the data back to you, the service will take the encrypted data key, reach out to the KMS service on your behalf, and you set up permissions, by the way, and you'll see these accesses by the service in your CloudTrail because it's doing work on your behalf. Take those encrypted data keys. Ask KMS to decrypt that data key. Send it a decrypted copy. When it gets that back in the response, it will then use that, decrypt the data key in memory to decrypt the data and send it back to you, and when that operation is done, it'll throw away that data key, or in the case of S3, there's some nuances there. 
There's a model you can use where the data key gets cached for a while to increase performance, decrease costs, but in general, the data key gets thrown away, and now you're back to where you were before, but by using this method, you get super-high performance, but still ultimate control in things like crypto-shredding where you can literally just manage that upper-level key in the hierarchy, and by getting rid of that, you've actually gotten rid of all access to all the data because the only thing that exists outside the service is encrypted copies of data keys and encrypted data, and that exact same model will be used in the Bedrock service to do this really critical security operation. As noted before, CloudTrail is gonna be logging these API calls, again, all your tools, all your familiarity, these things, you know, these access can be streamed to Security Lake, analyzed with existing tools. That's just, again, a general part of using, utilizing a service built around our core kind of API competency, and all the customization that you do of the models, again, exists in exactly the same fashion: per customer, per tenant, completely isolated, encrypted, and maintained completely separate from the models themselves or any third-party access. Now, there is some configurability. As with lots of things in security, sometimes you wanna have a few knobs and dials. Some things are just off, so this kind of data privacy control, that one's just locked. This is actually different than some of our existing machine learning-based services. You may, those of you who are familiar with our, some of our existing kind of API-based machine learning services, services like Rekognition, Textract, other things, they have the property that we do use data input from customers to improve the models, and that's explicit. It's in the documentation. It's in the terms. You can disable that, and we give you a mechanism for doing that. In fact, we give you a, if you're in an organization, we give you an organization management policy, which is you can declare, like, "I want every account in this whole organization "to not share data back with the service," or, "I want this OU to not do that." You can have a lot of control over that particular setting, but in those more traditional ML services, the default is data is shared to improve the models. In the case of foundation models, we've made a decision, I'd say a strategic decision. We're not just not gonna do that. In fact, it's not even an option. It's not a matter of being the default. It's a matter of not even having the option of the share-back, and so that all the customization you do and all of the inputs that you do remain private to your environment. You do have some other choices, though. We'll talk more about single-tenancy versus multi-tenancy kinds of use cases, which essentially amounts to the degree of customization that you can do. KMS encryption. You don't have to use customer-managed keys. You can use service-managed keys if you like. That would be kind of the simple default if you prefer that or you have the choice. Obviously, model fine-tuning will have certain, you're gonna have a lot of control over the fine-tuning elements and a lot of choices that you're gonna be able to make with how you control and operate that process in terms of the content of your fine-tuning, and then, finally, like any of our services, you'll have access management decisions you need to make. 
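[Editor's note: the envelope-encryption flow Mark walks through above maps directly onto the KMS data-key APIs, which exist today. A minimal sketch, assuming a customer-managed key already exists; the key ARN is a placeholder, and the `cryptography` package stands in for whatever symmetric cipher a service actually uses.]

```python
import base64
import boto3
from cryptography.fernet import Fernet  # pip install cryptography

kms = boto3.client("kms")
KEY_ID = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE"  # placeholder CMK

# 1. Ask KMS for a data key: we get a plaintext copy and an encrypted copy.
data_key = kms.generate_data_key(KeyId=KEY_ID, KeySpec="AES_256")
plaintext_key = data_key["Plaintext"]
encrypted_key = data_key["CiphertextBlob"]  # safe to store next to the ciphertext

# 2. Encrypt data locally with the plaintext key, then discard the plaintext key.
cipher = Fernet(base64.urlsafe_b64encode(plaintext_key))
ciphertext = cipher.encrypt(b"fine-tuning example record")
del plaintext_key, cipher

# 3. Later, send the *encrypted* data key back to KMS to get the plaintext again.
#    This call shows up in CloudTrail, and it fails once the CMK is disabled or
#    deleted, which is what makes crypto-shredding work.
decrypted_key = kms.decrypt(CiphertextBlob=encrypted_key)["Plaintext"]
recovered = Fernet(base64.urlsafe_b64encode(decrypted_key)).decrypt(ciphertext)
print(recovered)
```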
You'll use IAM controls and SCPs and all our normal capabilities around controlling access to APIs to make decisions about who can access what and when and how. Let's talk briefly, then, about the tenancy models, and essentially, what the tenancy models boil down to is really the customization element. In a single-tenant endpoint, you have a deployment of the model that's available to you, and that's true essentially, in the multi-tenants case, essentially, you're accessing a model, but it's being shared across multiple tenants, but that's essentially, think of it as a read-only object. You're not modifying it. No one else is modifying it, so sharing is a perfectly safe thing in that case. In a single-tenant model, however, you can actually fine-tune the model, and that isn't required, but it's an option you have in that singleton is a modality, and you're gonna be doing that for just your data, just your customizations, and that, essentially, becomes your own copy of this overall, the behavior of the model. The combination of the base model and the customizations are something that now you're creating and provisioning and managing, or it's being managed on your behalf by the service. In the multi-tenant endpoint model, you're not doing those customizations, so there'll be some cost benefits, some, you know, operational benefits and simplicity here, but a lack of customizability and tunability in this type of approach. In both cases, the same promises apply, we've already mentioned and we'll continue to mention because this does become kind of one of the front-of-mind or continues to be a front-of-mind question for customers, and that is your inputs and the outputs will remain completely private to your environment. All of these models are deployed and managed within service accounts with all the controls we have around lots of isolation and protection from all kinds of possible threats, and then, finally, importantly, not only do we protect your data from our first-party model, but we're protecting data from the third-party models as well, so that means that you have that level of isolation that you want and that you'll depend on. Okay, let's talk a little bit about networking. This is, you know, access always involves both identity aspects, network aspects, or combined in our kind of zero-trusty world, so let's talk a little bit about that so we'll set up a basic environment, you know, notionally here we have a region. We have a client account, which you can think of as a kind of container, although not a network container, and then, of course, VPCs is kind of our fundamental networking container construct, and you have that environment in AWS. You also, obviously, often have a corporate network outside of AWS, and on the right side of this slide, as you can see, the Bedrock service is represented to you as an API endpoint, just as if you were using S3 or any other, or DynamoDB or any other API-driven service. When you wanna access that API, you have a couple of options. You can go over public address space, if you like, either Internet from your corporate network or using a NAT gateway or an IGW, what have you, the sort of standard technologies in AWS, and you can reach that API endpoint available to you from the Bedrock service. Now, I will note that, you know, sometimes there's a misconception that that upper yellow path from, say, a NAT gateway to an AWS service, people say, "Oh, the traffic's going over the Internet." This is not true. 
It's going over public address space in the same region. It never exits our private network or our border network, and we encrypt all the traffic between facilities in all our regions, so even for traffic going down a public road in the same availability zone, if the fiber optic is outside of our physical control, we're encrypting all that data all the time with a technology we call Project Lever, so this is actually a super-safe and secure path. It does use public address space, which, in many people's imagination, is a source of risk, so if you don't want to do that, you don't have to, but I want to point out that there's really no risk there compared to the risk you might assume if you were doing true Internet-based connectivity. The other path, of course, is the Internet, and although you're using TLS and you're probably fine, there is a certain set of additional risks there, but they're pretty manageable. However, none of this is required, because you can do all of this through private paths as well. You can set up PrivateLink connectivity to the API endpoint. These are also called VPC endpoints, so the service will have a VPC endpoint. You connect to this abstract network object we call an ENI, and all of your traffic will essentially be tunneled from your VPC to the API endpoint of the service. You can backhaul traffic to and from your corporate network over Direct Connect and Transit Gateway and all the existing networking constructs and essentially create a private path to the Bedrock service, and you can even write things like service control policies or IAM policies which limit access to only certain network paths, which is a very useful feature if you want, for example, to block all access from non-private paths; all the existing options will apply to this service. - Okay, thank you. Thank you again for clarifying that public address space does not mean the Internet. I've had that question every day for what must be eight years. So, on the left-hand side of the diagram, you can now abstract away everything Mark just said: this is where all the traffic is coming in. It's going to hit the endpoint no matter what the source is, whether that was the corporate data center, Direct Connect, or the Internet; it doesn't matter. It all hits there, so let's talk about how some of the data flows work within the service itself. We'll start with multi-tenancy inference. On the right-hand side, you'll see there's a model provider escrow account, which Mark mentioned on the previous slide. We have one of these per model provider per region, and each one contains a bucket to hold the base models for that provider, plus anything that's been fine-tuned for that provider, just to set the scene before we get going. When the request comes in, it hits the API endpoint and gets to the Bedrock service, and then, IAM permitting, of course, it gets passed to the runtime inference service. Its job is to decide which of these model provider escrow accounts holds the endpoint for this multi-tenant request. It'll find it, send the data over, again, TLS connections, pick up the response from the model, and return it to the user. All nice and simple, nice and straightforward.
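[Editor's note: the PrivateLink path Mark describes above is created the same way as for other AWS services. A sketch, assuming the endpoint service name shown, which is an assumption; the VPC, subnet, and security group IDs are placeholders.]

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create an interface VPC endpoint so SDK calls to Bedrock stay on a private path.
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    ServiceName="com.amazonaws.us-east-1.bedrock-runtime",  # assumed service name
    VpcId="vpc-0123456789abcdef0",
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,  # SDK calls resolve to the ENI instead of public space
)
print(response["VpcEndpoint"]["VpcEndpointId"])
```

A condition key such as aws:SourceVpce in an SCP or IAM policy can then restrict access to that private path, along the lines Mark mentions.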
IAM's in play, encryption's in play, nothing gets stored in the escrow account to record what happened, none of the model vendors can access the account anyway to look at data that doesn't exist, and none of that data will get used by any vendor to train anything else. Again, we're going to keep repeating this. We also see, at the bottom of the main service account, something called the prompt history store. That's because we have a playground in the AWS Management Console, which you've probably seen from every other gen AI vendor on the Internet, where you type in your queries, you get some prompt responses, and they're cached somewhere so you can go back, edit your prompt, and submit another variation until you get the result you're looking for as you craft your query. The console allows you to store those queries as well, so the service account, if it gets a console-based request, will store it in the encrypted prompt history store just for your account, which you can delete if you wish at some point in the future, but it's there just to make your life in the console and the playground that little bit easier. So, essentially, that's multi-tenancy. Single-tenancy is quite similar, in fact; if you flick back and forward a few times, it's extremely similar in the way it works. We have the same model provider escrow account on the right-hand side, but this time the model on the endpoint is deployed either from the base model bucket, so you have a private version of one of those models, or from a fine-tuned model bucket instead, one that you've built, created, and tuned, and it deploys that instead. When the request comes in on the left through the API endpoint, it hits the service, again IAM permitting, goes to the runtime inference service, which again picks the right escrow account, picks the right endpoint, sends the request, picks up the response, and passes it back, and, again, we'll have stored that information in the prompt history store if relevant, because the request came from the console, and again we've got the same caveats on data storage and encryption. Everything is still TLS 1.2 across the board, left to right. Nothing is stored within the escrow account as part of the inference. None of the providers can get to it, therefore none of the data can be used to train other models. As we say, nothing is stored, nothing is accessible, and nothing can be used by anyone else, so those two really are very much the same, which is quite important for developers, because essentially the difference between these two approaches, the single- and multi-tenancy approach, is that in the API call you're changing literally one parameter that says, "I'm calling Anthropic this time. Okay, I'm going to call Titan this time," and that's essentially the only change developers have to make. There is nothing else. You're probably going to use very similar prompt text. You're going to be calling it in the same part of the application for the same use case, and you're just changing one thing, and, from the service team's point of view, conceptually this all makes sense. It's all very consistent, so even internally for us, it makes a lot of sense to do it this way.
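[Editor's note: to make the "literally one parameter" point concrete, a tiny sketch reusing the generate_text helper from the earlier example; the identifier format for a customized, single-tenant model is an assumption here.]

```python
# Same call, same code path; only the model identifier changes.
BASE_MODEL = "amazon.titan-text-express-v1"  # shared, multi-tenant base model
CUSTOM_MODEL = (
    "arn:aws:bedrock:us-east-1:111122223333:custom-model/my-tuned-titan"
)  # illustrative ARN for your own fine-tuned copy

for model_id in (BASE_MODEL, CUSTOM_MODEL):
    print(generate_text(model_id, "Draft a product description for our new widget."))
```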
We're trying to remove all the complexity, from the customer perspective and also from our perspective, to make this as simple as possible. Moving on to possibly the more interesting one, model fine-tuning: on the right-hand side you'll see that, this time, the customer account has appeared, which we'll talk about in a second, but again this starts on the left, as you'd imagine. The request to do fine-tuning comes in to the endpoint and hits the service, IAM permitting, of course. It will then call the training orchestration piece. What that does, in the relevant escrow account for the model provider whose model you're about to fine-tune, is start an Amazon SageMaker training job. Behind the scenes, that job will load the particular base model you want to tune from the base model S3 bucket, and then it will reach into an S3 bucket that you nominate in your account to read the training data. That could just be the S3 address, if that's all you want to provide, but you can also provide VPC information, such as subnets and security groups, and have it essentially drop an ENI into your VPC so it reaches your S3 bucket via your VPC, so if you have S3 endpoints in your VPC, or bucket policies that say only this VPC can access my bucket, great, that all still applies. The service is reaching down into your account and using whichever policies you've set up in that account or in that VPC to access your bucket. Once the model is trained, it's encrypted again and dropped into the relevant fine-tuned model bucket, and it can then be deployed later as a single-tenancy endpoint, but through all of this, none of the data from your S3 bucket is stored in the escrow account. The model that's built and deployed is, of course, encrypted with your keys and stored in the bucket. The model providers don't get to see that data either, so no one has any idea what you're actually doing in terms of training that model. Could a model provider take that data, look at your use case, and think, "That's excellent. Let's go and steal your data. Surely we can access it"? No, they can't, so everything is safe, secured, and encrypted, and even the access path for S3, as shown on screen, is entirely under your control, so again it makes the whole thing really safe and really secure. And this is the whole thing in one go. Conceptually, it is really simple, although in this case we're just showing one model provider escrow account; we know there are many per region, one per model provider, but this is how the whole thing works. You can see all the pathways in one place. You can really see clearly what's happening. The one thing we haven't really called out is at the bottom: as I think Mark mentioned before, CloudWatch and CloudTrail are definitely in play. Anything that's used or touched by the service is going to be put out to CloudTrail, and any metrics we define for CloudWatch will be output to CloudWatch in your accounts. Just for simplicity, we took them off the diagram to keep it focused on the flows themselves, but hopefully this all makes sense. - And speaking of IAM, just to talk very briefly about it: this should be familiar to you if you're an AWS person or an engineer or someone who does security work in AWS. We'll follow the standard model that we follow with identity and access management.
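[Editor's note, before the IAM discussion continues: kicking off the fine-tuning flow Andrew describes above might look roughly like the following. The operation name, field names, and all identifiers are assumptions based on later public documentation and are shown as placeholders; the talk predates the public API.]

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Start a model customization (fine-tuning) job. The service reads training data
# from *your* S3 bucket, optionally through an ENI in the VPC you specify, and
# writes the resulting custom model encrypted with the KMS key you choose.
bedrock.create_model_customization_job(
    jobName="titan-tuning-demo",
    customModelName="my-tuned-titan",
    roleArn="arn:aws:iam::111122223333:role/BedrockTuningRole",  # placeholder
    baseModelIdentifier="amazon.titan-text-express-v1",          # illustrative
    trainingDataConfig={"s3Uri": "s3://my-training-bucket/data.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-training-bucket/output/"},
    hyperParameters={"epochCount": "2"},
    vpcConfig={  # optional: forces S3 access through your VPC endpoints/policies
        "subnetIds": ["subnet-0123456789abcdef0"],
        "securityGroupIds": ["sg-0123456789abcdef0"],
    },
    customModelKmsKeyId="arn:aws:kms:us-east-1:111122223333:key/EXAMPLE",
)
```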
There'll be identity-based policies, so that means all the principals who want to use or access the Bedrock service will need to have the right permissions in a policy associated with their role or their other principal, and in those policies, again, you'll have the normal capabilities. You can define the actions. You can define resources, so you can specify which models, for example, are accessible for this particular principal. We'll support what's called attribute-based access control, ABAC, which means that you can also write permissions in terms of tags associated with principals and tags associated with some of the resources and objects. This gives you some additional flexibility that many people desire, and it's generally a trend in AWS to move to ABAC-based access control, so all this should be familiar to you, but it's, again, gonna be present and sort of standardized in the Bedrock service as well. A very simple example of a policy that one might write, in this case, it's a deny statement, which actually would work as a service control policy as well. You might have, for example, a principal who has access to most of the models in the system, but there's one in particular that's special in some regard, and you want to deny access for invoking that model, so you can write a deny policy, apply it to either the principal or to the account or the OU or the organization, and you can exclude that particular access from the general permissions that you've granted to a principal or a set of principals, just as one example but again, those of you who know AWS, this is old stuff, and you would understand how this works, and will just continue to apply that. Now, we've been talking a lot about what I'll call security of the model. That is, it's a workload, right? So you need to secure the workload, and you do that with a lot of the technologies we've invented with some cool new innovations, like the ability to access directly from your VPC and have control over even some network flows that you may not in traditional AWS services, but there's a whole 'nother aspect to this whole world, which we'll only touch on today, a very interesting aspect, and CJ talked about it in the keynote, I hope you saw that yesterday. There are challenges in using these technology. There are security concerns and other types of concerns in the security of them in the model, like, how do people use it? How can they potentially misuse it? These are all concerns that have come along with any new technological invention or innovation. I think of technology as an amplifier, right? It amplifies human capacity and capability, and when you can amplify something, you can usually do that for good or you can do it for ill, and so there undoubtedly will be attempts to use this technology for malicious purposes or, let's say, illegitimate purposes. Maybe the two are synonymous; maybe not. Maybe not quite the same to try to hack something versus, you know, have a little help with my homework, but we're all aware that that's going on out there, and this is gonna be a challenge that we all face and work on together. The model builders will do their utmost to protect the use of the models from certain kinds of malicious activities, but, again, abusive uses are possible, and people will certainly try to work around limitations or work around filters, and we see that today in the industry. 
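[Editor's note: the deny-statement example Mark walks through above might look like the following in practice. The bedrock:InvokeModel action name and the model ARN are assumptions based on later documentation; the policy is shown being created as an identity policy, but the same statement works in an SCP.]

```python
import json
import boto3

iam = boto3.client("iam")

# Deny invoking one specific foundation model while leaving other access intact.
deny_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": "bedrock:InvokeModel",  # assumed action name
            "Resource": (
                "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2"
            ),  # illustrative model ARN
        }
    ],
}

iam.create_policy(
    PolicyName="DenyInvokeSpecificModel",
    PolicyDocument=json.dumps(deny_policy),
)
```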
We'll continue to use the technology to enhance the technology, and we'll learn from mistakes and problems and continue to improve the securities models, but it is something that we, as a community, have to be aware of and be building the kinds of protections and utilizing, creating the use cases that can account for the possible risks that are created in these environments. One of the things that I find interesting, and I've been reading, like, again, like many of you, about these topics and trying to learn what I can, and even the way that you, so you can think of a... These are probabilistic systems, right? And so they are amazing at what they can do, but it's not, by definition, like, literally error-free. You can't ever say there's no errors that result from a probabilistic system like this. One of the interesting details that I've learned, and perhaps you know this as well, is that although they're probabilistic systems, they're, by nature, they're deterministic systems, so unless you do some magic, if you enter the same prompt, you always get the exact same result, but why don't they do that? Well, because if that's how the, let's say, consumer version of a foundation model worked, people would quickly think, "Oh, this isn't particularly intelligent. "It's just a computer doing what computers do," so what happens, there's actually a param, there's a set of parameters in the models. It's called the model temperature in which the designer of the model can turn a knob of randomness, and so the same prompt will result in different outputs each time, creating the illusion that there's some real kind of creativity or intelligence there which might not be created if the same prompt always had the same output. Just a little point that helps people to grasp and understand, helped me to grasp and understand super-powerful technology, but maybe not quite as magical as it first appears, and another reasons I bring that up is in a lot of business use cases, put aside the consumer and, like, amazing stuff you can do use case, in business use cases, deterministic responses could be very useful, right? You might not have a temperature setting. You might want it to always give the same answer to the same question because for the business use or the more focused use, that's exactly what you want, so that will be, that's the kind of option that you can enable in a business or enterprise-oriented version of these kinds of services in a way that hasn't so far kind of hit the public consciousness because, again, we're sort of being amazed and entertained by these what I'll call more consumer use cases. Another thing that I find particularly interesting is how do you characterize the error of the outputs? Some kinds of errors are, for example, someone says, "This model contains... "It's biased," so it shows, for example, racial bias. Completely agree that's a problem, but it's a kind of error, right? It's something about the input is not matching the desired output, and we need to train and tune and filter in order to get the socially desirable outputs from a system which, again, has no intrinsic moral compass, if you will. Other kinds of errors I've seen declared it's a vulnerability. That's a security vulnerability, and I look at it and I say, "Well, I can see why you might say that "because it has a security implication, "that particular error, "but it's just another kind of error, "and I'm not sure "that characterizing it as a vulnerability is very helpful. 
"Maybe it is; maybe it isn't," and I think these are the kinds of discussions that, as a community, we need to have, but I think the key thing here is to recognize that we have to come to grips with the probabilistic nature of the models, the incredible value they create, but also the risks that are created through, you know, some of the manipulations and potentially malicious use of these systems. Now, what are the opportunities we have? Well, first of all, there's really clear low-hanging fruit that I've already seen, and you'll see quickly in a lot of products, including our products and others, is much better user experiences around things like query. Like, think about, you know, query languages. I have to learn SQL today in order to do... So, I mean, the natural language processing capability, we already have this on our QuickSight service, is just amazing. You can literally ask very normal human questions and get really good results using this type of technology. The fact that using domain-focused foundation models, and I'll give an example in a minute, is really useful because now, if I can scope down, like, it's amazing that you can do all kinds of broad range of things, but if I'm willing to scope down the desired outputs to a particular domain, there's really cool things you can do that are hard to do in the very general use cases, and another thing I think that will be common, at least in this first, you know, few, this first time period as we kind of learn and adapt to this new technology, will be supporting human judgment, so you'll ask advice of these systems. You'll get input from systems. You'll get very good help in solving some problem, but you probably won't do a full closed-loop automation because if there's an error in the output, and that results in a change in, say, a security setting in your environment, that could be a problem, so I think you're gonna, you know, I've been actually pretty impressed in the security community, where I tend to live, is that people are impressed with this technology, but they're a little bit skeptical that it will immediately solve a bunch of problems because they recognize that even, like, a 3 or 5% error rate is a problem if it means, like, shutting down a production system accidentally because you changed a firewall rule that kind of would normally make sense but didn't under those circumstances, but that doesn't mean it's not super-useful to get advice that's normally correct and then apply human judgment to that, so I think those are some issues that we, as a community, will continue to work on, but within the Bedrock framework, you can think of your ability to, again, customize and tune these systems to meet your business needs or the needs of your government agency, and I'll give, I think, a really cool example of that, and that is Amazon CodeWhisperer, which you've heard talked about this week already, but really think about what this tool is doing, so it's a pretty focused use case. It's gonna provide you with source code in languages of your choice. It supports a lot of languages. You know, in response to a human prompt, it'll write code for you. 
It doesn't write it so much as generate code for you, and it will embed that in your IDE and give you some information about that code, but think about what that means. Because it's focused on that domain, it takes the generated output and compares it to its corpus: does the generated output sufficiently resemble any inputs in my giant database of source code, such that it could reasonably be seen as the same or a closely derivative work? If it does, there may be a licensing issue, because it might be under an open source license that's not acceptable in your organization, or maybe it is, or maybe it isn't, but what the tool will do is say, "Look, this code is closely related to this code, here's the URL for where that code came from, and here's the license it's under," and it will stick that in a comment in your source code, and you can decide, as a developer, under the policies of your organization, whether to use the code or not and in what way to use it. In a general-purpose system that would be very difficult to build, but in a special-purpose system it's super-valuable, and so, again, I think these kinds of enterprise-type use cases will be where we see a ton of value and success for foundation models and generative AI. CodeWhisperer also does security scanning of the code it generates, using both ML-based and more traditional rules-based approaches, so whether you're writing the code or asking it to generate code for you, you still get a bunch of security protections looking for all the standard OWASP top 10 issues and other kinds of static code analysis capabilities. Super-useful. Off we go. - Okay, so we did mention earlier that there are other ways to run large language models and foundation models on AWS apart from Bedrock, although, personally, I'm a bit more biased towards Bedrock; that's just where I am at the minute. We give you the ability to be quite flexible in your choice of models and your choice of platforms, and we let you build your own models from scratch if you want to, or use prebuilt, pretrained models, to make sure you're doing the right thing for your use case. Amazon SageMaker JumpStart is a great example of this. It's an ML hub that offers a number of algorithms and models that you can deploy yourself within your account, so you can use it to discover all sorts of different LLMs and FMs, including things that aren't in Bedrock. As examples, you can look at the OpenLLaMA models, the FLAN-T5 models, or the BLOOM models, which aren't in Bedrock but are in JumpStart, so if you've read about those models and seen how powerful they are at what they do, perhaps for particularly niche use cases, it's worth a look and worth trying them out to see if they're more suitable for what you're trying to achieve, and, of course, we're adding more and more models to JumpStart.
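[Editor's note: deploying one of the JumpStart models Andrew mentions into your own account can also be done from the SageMaker Python SDK rather than the console notebook. A sketch, with the model ID, instance type, and payload shape as illustrative assumptions; check JumpStart for current identifiers and each model's request schema.]

```python
from sagemaker.jumpstart.model import JumpStartModel

# Illustrative JumpStart model ID; browse JumpStart for current FLAN-T5 /
# OpenLLaMA / BLOOM identifiers. The endpoint runs in *your* account, so its
# logs and metrics land in your CloudWatch.
model = JumpStartModel(model_id="huggingface-text2text-flan-t5-xl")
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")

# Payload schema depends on the model; this is the text2text style.
print(predictor.predict({"text_inputs": "Summarize: Amazon Bedrock is a managed service for foundation models."}))

# Remember to tear the endpoint down when you're done experimenting.
predictor.delete_endpoint()
```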
I think we've added twice the number of models this year already to what we had last year, and so the growth there of what we support is just getting bigger and bigger and bigger, and it's a mixture of open source models or proprietary models, and so we're really giving you as much choice as possible to find the right sort of gen AI-type platform that you can use within your AWS environment. Now, some customers, they actually do need to build their own model from scratch, and as you can imagine, as you've kind of alluded to, it's quite a large, lengthy process. You have to collect all of the data that's relevant, get it all reviewed, get it into a useful form for the models, get it built, which takes time, but you can do all these things within Amazon SageMaker. The Amazon SageMaker tools let you do all these things at scale. It lets you build very reliable, very stable, very scalable models. It lets you do distributed training in certain cases so you can really reduce the training time, and you can use things like the debugging tools to find issues perhaps with the training in mid-run so you can correct those errors, and you can also do things to just analyze other metrics as part of that training situation, and really, really helps you do that work. I mean, you still have to know what you're doing, understand the models, understand your data, but SageMaker itself makes it a really straightforward thing to actually do, so if that suits your use case, and for some customers it absolutely does, you can do that as well, and because SageMaker also supports the human-in-the-loop process, when you're collecting your data, you can actually apply that sort of human knowledge and human judgment to the data that's coming in to make sure you're training your models on the right and relevant data for your use case and your domain, so the difference between proprietary and publicly available, for most customers, it's quite confusing. The licensing situation definitely comes to play because each of these, even if they are open source and public models, they still have a license condition. You will have to adhere to whatever they say, so that must take part of your choice when you select the models, but one thing to think about is proprietary models, may, I stress the may, be more accurate than the open source ones or the publicly available ones, but they also may be more expensive in comparison to them if they have, say, a similar model size, and so there's pros and cons of going with either way, so it really is down to you to look at each of the models. Look at the licensees. Decide on how many different model sizes do we have? Do we just have one model in this particular family? Or do we have a huge number, like Jurassic for AI21 Labs? There's quite a lot of variations in size and complexity and speed, so once you've looked at the license conditions, complexity, and speed, you've pretty much got an idea of which ones are gonna work, but the next thing to really think about as you get onto this is different models also support different languages, and we're not talking Python here; we're talking French, German, Spanish, Italian, et cetera, so you'll find that some, well, most of them support English, anyway. Some, such as AlexaTM, will support a big bunch of languages, including things like Arabic and Japanese, which are less common in some of these environments. 
You'll find some really concentrate on Central European languages, and some, like LightOn I think, support quite a lot in French; they're very, very powerful in French, so you have that extra thing to look at as well. There is a lot of choice, so when you come to do your POCs there's a lot to think about, but actually testing these models is really straightforward, and before Bedrock was available, this is what I was doing to play with Stability AI and AI21 Labs, just going via JumpStart. Once you've gone through the model list and decided, "I want to try this one or that one, give them a go," you'll find that most of them have a playground as part of the AWS Management Console, so you go into JumpStart, you pick AI21 Labs because it's at the top of the list, and you get a playground option, so you can go straight in and start typing queries, typing prompts, and giving it extra one-shot data for extra context to your query, and it just works. There's nothing to build, nothing to deploy. Of course, it is a playground; it's not a production environment you can use, but if that works well for you, you can then click another button, essentially, and it gives you the code or the notebook you need to launch an endpoint, and it will deploy the relevant AI21 Labs Jurassic model to SageMaker in your account, so all the (indistinct) are going to come from your account, and all of the logs it generates are going to be available in your account, so you're essentially deploying your own private foundation model. The big difference, in that sense, is that it's a bit more work, but the number of models available compared to Bedrock is just bigger. The way I worked was with AI21 Labs and Stability AI on JumpStart; now that they're in Bedrock, I use Bedrock, because the API experience is so much simpler, and for me that's the thing I'm looking for, and again that's one of the powerful things about Bedrock. So how do you actually get started? The first way to get started is not to use one of our models in JumpStart or Bedrock; it's CodeWhisperer, which sounds a bit silly, in that sense, but if you deploy that, you instantly have large language models available in your development environments, and developers can start generating code that suits what they're trying to achieve, and once they start using this, they realize, "I can describe my function in one line." Say I want a function that writes this JSON structure out into a different format and then writes it to this storage object; that's the only thing you have to describe to CodeWhisperer, and it will generate the code, as well as asking, "Do you want the read function as well?" and generating that for you too, and there's nothing you have to do except look at the code and double-check that it passes all the rules, regulations, and checks that Mark mentioned and that it suits your in-house style, and that code is good to go, but it really gives you that early experience of using LLM prompts, because essentially that's what it's doing. You're writing a prompt.
You're writing a query to generate the code, and once you really get that feel, which you can have within, I think, 10 minutes to get it installed in PyCharm for Python is really, really quick. You can start doing this and getting a feel for it and then realize, "Actually, I can think of a lot of use cases to use this," so once you've got that in place, you can start then looking at Bedrock or JumpStart, depending on which model you're looking at trying out, and they're the obvious places to go to start your gen AI journey because either Bedrock as API or JumpStart as a single, maybe a double-click to get going, and it means you can have these things available, again, within seconds if that's what you're trying to achieve, so once you looked at these things and you think, "Actually, "gen AI could do my business a lot of good. "There's a lot of benefit we can achieve "with these models and this new technology," so what do we do? You gotta get a POC, but what we have found in early conversations is that a customer comes to a meeting, and they say, "We've got these top three use cases," and you talk through them for a few minutes, and you realize they're not your top three use cases. Actually, it's these six things over here you just hadn't realized you could now do, so this is often a revelation to customers, so we actually have a program that's called the AWS Generative AI Incubator program, which is really sort of a applied scientist who will come on site and help build those initial discovery workshops for you and actually help you find out what, actually, are my top five, top six use cases? And they'll take them and then help you do those early POCs and get you to a stage where you can actually think about can this go into production? Is this actually gonna add the value that I want? Hopefully, yes, but that first decision point of working out what use case to use is actually quite tough because it is a completely different way of thinking, and that program team, they can actually do it quite well and help you get to the POC, hopefully, much faster, and if you're using an API system such as Bedrock, you could start that POC within minutes of the first meetings on use cases finishing. It's really, really straightforward, so that's really it for the discussion today, so hopefully, we'll talk to you about what Amazon Bedrock actually is and what it actually does, where it sits in the ecosystem compared to things that Amazon JumpStart, and Mark's gone through some of the security concerns that we really have to think about if you're putting these things into production 'cause if you don't think about them now, your security teams will think about them really quickly once these things get anywhere near a production in live state, so thank you. - Thank you for your time, and we'll be around later for questions if you're welcome to come up. Thanks. - Thank you. (audience clapping)
Info
Channel: AWS Events
Views: 20,317
Keywords: AI/ML, AWS, AWS Cloud, AWS re:Inforce, Amazon Cloud, Amazon Web Services, application security
Id: 5EDOTtYmkmI
Length: 54min 48sec (3288 seconds)
Published: Mon Jun 19 2023