- Well, good afternoon, and thank you for coming
to the session APS208, so this session is all
about generative AI, so I hope you're actually
in the right place. Now, as you may have noticed, gen AI has taken the world by storm over the last few months, and everyone's actually talking about it. Every organization wants to look at it and try and figure out how
they can best leverage it to make a difference
to their organization, but they do have some concerns, as I'm sure everyone here
has concerns as well. First one is where is the gen
AI model actually located? Where is it? Where am I
sending my data actually to? Who can actually see the data? Will they use the data to
actually train other models? And will the results from these models be full of offensive content? How can we stop that from happening? So what if I could tell you that on AWS you can actually go and build and deploy your own gen AI models within your account that follow your encryption
and security policies, where you don't have to worry about managing or scaling any
infrastructure whatsoever? So my name is Andrew Kane, and today, we're gonna
talk about Amazon Bedrock. - And I'm Mark Ryland. I'm a member of the AWS security team, so I had the opportunity
to join in this talk and share some of the preparation and presentation
duties here this morning, so it's very nice to be with you. Let's look at our agenda,
and we'll go from here. We're gonna talk about what generative AI is. Obviously, a hot topic these days. We'll give an overview of that and the underlying technological shift, which has gone on in the industry
over the last year or two of the foundation models, so these are models now with billions and billions of parameters as opposed to our previous
layers of technology or levels, which were measured more in the millions. We'll introduce Bedrock as a service, kinda give you that overview. We'll talk about some
of the critical topics around Bedrock for this audience,
the re:Inforce audience, around data privacy and security, tenancy, how client connectivity will work, sort of the networking
perspective on the service, and access management as well. We'll talk briefly about the security in
the model challenges. You know, a lot of this talk is about the security of the model, like, this is a workload. It has to be run and
operated in a secure fashion, and we'll talk about how
you're able to do that, but there are also interesting issues that arise from the use of the technology and some of the security things. We'll touch on that as well, and then, we'll conclude with some talk about other ways you can
approach foundation models in the AWS platform, and
especially around SageMaker. Take it away. - So the first question to actually ask is quite an obvious one and
not really stupid at all. What, actually, is generative
artificial intelligence? Well, the clue is really in
that first word of generative. The whole point behind it is it can actually create
new content and ideas. This could include conversations, stories, images, music, video, all sorts, and like all AI, it's actually powered by
machine learning models. In this case, you can only really say very large models behind the scenes. They've been pretrained on corpora of data that are essentially huge, and they are referred to essentially as foundation models, so recent advancements in ML technologies have basically led to the rise of FMs. They now contain billions,
tens of billions, even hundreds of billions
of parameters and variables to go into their actual makeup, so clearly, they sound like
they could be quite complex. These could be quite difficult things and expensive things to build, so why are they just so popular? And so the important
thing to note, really, is that at their core, generative AI systems are leveraging the latest advances in machine learning. An important thing to also note is they're not magic. They just look like they might well be magic because it's hard to differentiate them from the older models and what they actually do. They're really just the latest evolution of a technology that's been evolving for many years now. It's only recently that it's become really mainstream and really big and really powerful. The key, why they're really special, is that a single foundation model can actually perform many
different tasks, not just one, and so it's possible for an
organization, by training across those billions and billions of parameters, to teach it to do lots of different things, essentially at the same time. You can instruct them in different ways and make them perform different tasks, but you're pushing all these tasks through the same single foundation model, and this can happen because you trained it on, essentially, Internet-scale data, so it's really been exposed to all the different forms of data, all the myriad patterns of data you see on the Internet, which is really quite huge, and the FM has learned to apply that knowledge to the entire data set, so while the possibilities of these things are really, really quite amazing, customers are getting very, very excited because these generally capable models can now do things that they
just couldn't think of before, and they can also be customized to perform really specific
operations for the organization and really enhance their product offerings to the marketplace, so they can do this customization as well by just using a small amount of data, just a small amount to
fine-tune the models, which takes a lot less data, a lot less effort to generate and create, and a lot less time and money in terms of compute to actually create the models than if you did them from scratch, so the size and general-purpose nature of FMs make them really different
from traditional models, which (indistinct) generally perform specific tasks, so on the left-hand side you can see some slides that basically say there were five different tasks that you want to perform in an organization, so for each of those tasks, you'll collect, collate, and label a lot of data that's gonna help that model learn that particular task. You'll go, and you'll build that model, and you will deploy it, and you can suddenly do text generation. You do it again. You can then do text summarization and so on and so forth, and you have teams building, collating, referencing, feeding, watching, changing, and updating these data and these models to create those five tasks, and along came foundation models, so what these do quite differently is instead of gathering all that labeled data and partitioning it into different
tasks and different subsets to do summarization,
generation, et cetera, you basically take the unlabeled data and build a huge model, and this is why we're
talking Internet-scale data. You're really feeding it
everything that you can find, but by doing that, they can
then use their knowledge and work out how to do different
tasks when you ask them, so the potential is very, very exciting where they're actually going, but we're still really
in very early, early days of this technology, so customers do ask us quite a lot, how can they actually quickly get, well, start taking advantages
of foundation models and start getting generative
AI into their applications. They wanna begin to using it and generate, basically,
generate new use cases, generate new income streams, and just become better
than their competitors at everything that they actually do, so there are many ways of actually doing
foundation models on AWS, and as Mark says, we'll
touch on those other models, other methods later on in this session, but what we've found really
from customer feedback is when most organizations
want to do foundation models and want to do generative AI, we found that they don't
really want to manage a model. They don't really want to
manage infrastructure either, and those of you who worked lots in Lambdas and on containers, you know that that feeling is quite strong across AWS anyway, but what they want to do is they want AWS to perform all the
undifferentiated heavy lifting of building the model, creating the model environment,
deploying the model, and having all the scaling up and scaling down of those models so they don't have to do anything other than issue an API call that says, "Generate some text from that model "based on my question or
based on my instructions." That's all they want to do, so Amazon Bedrock. This was talked about a
few months ago in April when we preannounced the service, and we talked about what
we're going to be doing in the generative AI space as a service over the rest of this year. It really has a service-
or API-driven experience. There's absolutely no
infrastructure to manage. You use Bedrock to find the model that you
need to use for your use case. You can take those models, you can, (clears throat) excuse me, you can fine-tune some of them as well to make them more specific
to your business use case and easily integrate them
into your applications because in the end, it's just an API call, like any other AWS service, so all your development teams already know how to call AWS services in their various languages in their code. This actually is no different, so you can start taking advantage of all the other code-building
systems that we have such as, excuse me, (clears throat) experiments within SageMaker to start building different
versions of the models to see how they perform against each other and start using all
the MLOps and pipelines to make sure these things
are being built at scale in a timely and correct fashion, and you can do all of this
without managing anything, so this is really it at the high level. It's really we see as the
easiest way for any customers to build and use generative
AI in their applications. Because Bedrock is really
a fully managed experience, there's nothing for you
to do to get started other than download the libraries for your programming
environment, for your IDE, and just call the APIs. It is really that simple. We've taken the problem of deploying a model securely. We're making sure that you
can privately customize them, which we'll go through later
on the architecture diagrams, and you can do it all without really having to
manage anything at all, so we're really excited because what Bedrock's going to be doing, it's going to be the first system that's gonna be supplying models from multiple different vendors: Amazon, Anthropic, Stability AI, and AI21 Labs. All of those models are
available within Bedrock through essentially the same API. If you want to generate text, you supply the instructions to generate text and just basically say Titan, Anthropic, or AI21 Labs, and you'll get your response. There's nothing else, as a developer, you actually have to do or worry about. You don't even really need to know where those models live, where they are, how big they are. You just have to know, "I want to call that vendor's model. Go." That's all you actually have to do.
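As a rough illustration of what that single call can look like, here's a minimal sketch using the AWS SDK for Python (boto3). The model ID and request body below are assumptions for illustration; each provider defines its own body schema, so check the Bedrock documentation for the exact format.

```python
# Minimal sketch of calling Bedrock through boto3. Model ID and body are illustrative.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")  # the inference (runtime) API

def generate_text(model_id: str, prompt: str) -> str:
    response = bedrock.invoke_model(
        modelId=model_id,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"inputText": prompt}),  # Titan-style body; other vendors differ
    )
    return response["body"].read().decode("utf-8")

# Switching vendors is just a different model ID; the calling code stays the same.
print(generate_text("amazon.titan-text-express-v1", "Write a two-line product blurb."))
```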
We're also making sure we apply all of AWS's standard security controls to this environment, so you can rest assured that everything is encrypted in flight with TLS 1.2 as a bare minimum, and everything's gonna be encrypted at rest. Depending on what you actually do store at rest, which is not a lot, when it's there, it's all encrypted by KMS, and you can use your own customer managed keys as well, and so you can make sure everything there is safe and secure. Now, responsible AI is also
key in these situations for all generative AI, so all of our third-party model providers take this really, really seriously because it is a big issue, but in the end, those third-party model providers are responsible for how their models handle the situation, and they take it very seriously, so they're going to be doing a good job. For Amazon Titan, which is the one that is built by ourselves, essentially, we're gonna make sure that we keep inappropriate content away from the users, so we're gonna reject that content going in to make sure you can't fine-tune a model with just horrible things, and we're gonna be filtering the outputs as well to make sure that if there's inappropriate content like hate speech, incitement to violence, and things like that, profanity, racist speech, that gets filtered out as well, so we're gonna try to make sure those models start, essentially, in a good place, and that you can't fine-tune them away to an irresponsible place, so this is what we're gonna be building into Amazon Bedrock in the Titan models, and it's gonna make everyone's life, hopefully, a lot nicer and clearer and easier, but the models we have
are these four on screen, so these are the four big ones. Talk about Amazon Titan first
because that one is ours, and it's only gonna be
available, at this point, within Amazon Bedrock, and so it's really, at this
point, it's a text-based model, or two text-based models, and they can do all the
usual text-based NLP tasks that you expect, such as text generation, summarization, classification, open-ended Q&A, and information retrieval, but they can also generate text embeddings, which is useful for many other use cases, and they're the ones that
we're actually deploying as part of Bedrock. Now, the third-party ones, they've already got different
use cases, different nuances, and so when you start to look for or to choose the model you want to use, really look at your
use case in more detail to work out which one is better because the next two on the
list, AI21 Labs and Anthropic, are also text-based LLMs,
so what's the difference? So Jurassic family of models,
which is from AI21 Labs, they're really multilingual,
by their very nature, and so if you're looking
for text-based systems that are really naturally able to handle things like French
and Spanish and German, so naturally, without thinking, then those models are really
well tuned for those use cases. Anthropic is slightly different
with their Claude models. They're really the usual LLMs for conversational and
text-based processing, but Anthropic has done
an awful lot of research into how to build and develop sort of honest and truthful
generative AI systems, and their models are really
strong and really powerful. The last one is from Stability AI, which I'm sure everyone's used, everyone's children have used, and even everyone's grandparents have probably used as well. It's probably the most powerful image generation model that is actually out there. Everyone knows about it, so as part of Bedrock, we're using Stability AI, and we're embedding their Stable Diffusion suite of models into Bedrock, so if you want to do text-to-image generation, then that's what you can actually use on AWS. You, too, can generate images that can then be used in a high-resolution fashion for things like logos, artwork, product designs, prototyping, et cetera, and all of these things just come out of the box, and so those are the models that we're actually offering at this point in time, and hopefully, we're adding more at some point in the future. - So the message is clear. I'll reiterate it, and we'll talk after that
on some of the more details, but really, the key value
proposition of Bedrock is to quickly integrate
some of this technology into your applications, into your business or government agency or other organization applications using tools you're familiar with, using technologies you're familiar with and familiar controls
and security controls, privacy controls, making this as easy to
access for you as possible, so that's really one of the key takeaways from this overall presentation. Now let's get into some
additional details. This is a really important point. We'll say this several times. This comes up in every
single customer conversation, and, you know, the understandable concern is: will you take my inputs, whether those are
customizations of the model or my prompts or whatever I'm
doing to utilize the model, what will you do with that information? And the very simple and clear answer is we won't do anything with that information because that will be isolated
on a per-customer basis for your use, stored securely, et cetera. We'll talk, again, more details on that, but the key takeaway there is this is not going back into the model for further improvements, so that's a very clear
customer commitment, and it will enable lots of use cases that otherwise might be difficult for organizations to decide because they'd have to
make some trade-offs that we don't want you to have to make. Let's talk a little bit more about sort of the security
and privacy aspects, so essentially, as mentioned, you're in control of your data
in the Bedrock environment. We don't use your data
to improve the model. We don't use it for
further model generation. We don't share with any other customer. We don't share it with other
foundation model providers, so they're in the same
boat we're in, right? We don't use your data
for Titan improvements. Other model providers will not see any of your data, and it will not be used in their foundation models. All of this applies to all of the things that customers input
into the system, right? There's many ways that you
interact with the system. We'll talk in some detail about kind of multi-tenancy
versus single-tenancy model, but in all those circumstances, the things that you provide to the system in order to use the system are not going to be included
in the system's behavior outside of your particular
context, your customer context. Data security. Obviously, we'll build and operate this in the way we do with
a lot of our services, all our services with
things like using, you know, encryption of all data in
transit, TLS 1.2 or higher, as you may have noticed, those of you who pay attention
to our detailed blog posts, we're actually enabling TLS 1.3 on a number of our services; by the end of the year, the majority of our services will be willing to negotiate the latest version of TLS, which has some nice performance improvements. We're also supporting QUIC, which is another type
of network encryption and speed-up technology for many services, so that's for your data in transit. For data at rest, we'll use AES-256, state-of-the-art symmetric encryption, and again, like with
other kinds of services where we're storing customer data, we'll integrate this into the KMS system, so hopefully, everyone's
familiar with KMS, but in a nutshell, KMS is an envelope, hierarchical encryption technology built around the notion of envelope encryption, so what that means is that there is a customer-managed key or a service-managed key that's inside the KMS service. It never leaves the service and is completely unavailable to anyone, including all AWS privileged operators. That base key is used to encrypt a set of data keys, and those data keys are what's actually used for data encryption outside the service, but those data keys are never stored outside the service except in encrypted form, and what that means is whenever data needs to be decrypted in any of our services, the service has in its possession, if you will, a bunch of ciphertext, which is the data that was encrypted with the data key, and it has a ciphertext copy of the data key, the encrypted copy of the data key, so when it needs to read and send the data back to you, the service will take the encrypted data key and reach out to the KMS service on your behalf, and you set up permissions, by the way, and you'll see these accesses by the service in your CloudTrail because it's doing work on your behalf. It takes those encrypted data keys, asks KMS to decrypt the data key, and gets back a decrypted copy. When it gets that back in the response, it will then use that decrypted data key in memory to decrypt the data and send it back to you, and when that operation is done, it'll throw away that data key. Or, in the case of S3, there are some nuances there. There's a model you can use where the data key gets cached for a while to increase performance and decrease costs, but in general, the data key gets thrown away, and now you're back to where you were before. But by using this method, you get super-high performance but still ultimate control in things like crypto-shredding, where you can literally just manage that upper-level key in the hierarchy, and by getting rid of that, you've actually gotten rid of all access to all the data, because the only thing that exists outside the service is encrypted copies of data keys and encrypted data, and that exact same model will be used in the Bedrock service to do this really critical security operation.
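To make that flow concrete, here's a minimal sketch of envelope encryption with KMS in Python. The key alias is a placeholder, and services like Bedrock do all of this on your behalf rather than you writing it yourself.

```python
# Sketch of the envelope-encryption flow described above. Key alias is a placeholder.
import os
import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

kms = boto3.client("kms")
KEY_ID = "alias/my-bedrock-cmk"  # hypothetical customer managed key

# 1. Ask KMS for a data key: you get a plaintext copy and an encrypted copy of it.
data_key = kms.generate_data_key(KeyId=KEY_ID, KeySpec="AES_256")
plaintext_key = data_key["Plaintext"]       # used only in memory
encrypted_key = data_key["CiphertextBlob"]  # the only form ever stored with the data

# 2. Encrypt the payload locally with the data key (AES-256-GCM), then discard the key.
nonce = os.urandom(12)
ciphertext = AESGCM(plaintext_key).encrypt(nonce, b"customer data", None)
del plaintext_key

# 3. To read the data back, ask KMS to decrypt the encrypted data key (this call shows
#    up in CloudTrail), decrypt in memory, and throw the plaintext key away again.
plaintext_key = kms.decrypt(CiphertextBlob=encrypted_key)["Plaintext"]
recovered = AESGCM(plaintext_key).decrypt(nonce, ciphertext, None)
```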
As noted before, CloudTrail is gonna be logging these API calls, so again, all your tools, all your familiarity: these accesses can be streamed to Security Lake and analyzed with existing tools. That's just, again, a general part of utilizing a service built around our core kind of API competency, and all the customization
that you do of the models, again, exists in exactly the same fashion: per customer, per tenant,
completely isolated, encrypted, and maintained completely separate from the models themselves
or any third-party access. Now, there is some configurability. As with lots of things in security, sometimes you wanna have
a few knobs and dials. Some things are just off, so this kind of data privacy control, that one's just locked. This is actually different than some of our existing
machine learning-based services. Those of you who are familiar with some of our existing kind of API-based machine learning services, services like Rekognition, Textract, and others, they have the property that we do use data input from customers to improve the models, and that's explicit. It's in the documentation.
It's in the terms. You can disable that, and we give you a
mechanism for doing that. In fact, if you're in an organization, we give you an organization management policy with which you can declare, like, "I want every account in
this whole organization "to not share data back with the service," or, "I want this OU to not do that." You can have a lot of control
over that particular setting, but in those more traditional ML services, the default is data is
shared to improve the models. In the case of foundation models, we've made a decision, I'd
say a strategic decision. We're just not gonna do that. In fact, it's not even an option. It's not a matter of being the default. It's a matter of not even having the option of the share-back, and so all the customization you do and all of the inputs that you do remain private to your environment. You do have some other choices, though. We'll talk more about
single-tenancy versus multi-tenancy kinds of use cases, which essentially amounts to
the degree of customization that you can do. KMS encryption. You don't have
to use customer-managed keys. You can use service-managed
keys if you like. That would be kind of the simple default if you prefer that, or you have the choice. Obviously, with model fine-tuning, you're gonna have a lot of control over the fine-tuning elements and a lot of choices that
you're gonna be able to make with how you control
and operate that process in terms of the content
of your fine-tuning, and then, finally, like
any of our services, you'll have access management
decisions you need to make. You'll use IAM controls and SCPs and all our normal capabilities around controlling access to APIs to make decisions about who can access
what and when and how. Let's talk briefly, then,
about the tenancy models, and essentially, what the
tenancy models boil down to is really the customization element. In a single-tenant endpoint, you have a deployment of the model that's available just to you. In the multi-tenant case, essentially, you're accessing a model, but it's being shared across multiple tenants; essentially, think of it as a read-only object. You're not modifying it. No one else is modifying it, so sharing is a perfectly safe thing in that case. In a single-tenant model, however, you can actually fine-tune the model. That isn't required, but it's an option you have in that single-tenancy modality, and you're gonna be doing that for just your data, just your customizations, and that, essentially, becomes your own copy of the overall behavior of the model. The combination of the base
model and the customizations are something that now you're creating and provisioning and managing, or it's being managed on
your behalf by the service. In the multi-tenant endpoint model, you're not doing those customizations, so there'll be some cost benefits, some, you know, operational
benefits and simplicity here, but a lack of customizability
and tunability in this type of approach. In both cases, the same promises apply that we've already mentioned and we'll continue to mention, because this continues to be a front-of-mind question for customers, and that is your inputs and the outputs will remain completely
private to your environment. All of these models are
deployed and managed within service accounts with all the controls we
have around lots of isolation and protection from all
kinds of possible threats, and then, finally, importantly, not only do we protect your
data from our first-party model, but we're protecting data from the third-party models as well, so that means that you have
that level of isolation that you want and that you'll depend on. Okay, let's talk a little
bit about networking. This is, you know, access always involves both identity aspects, network aspects, or combined in our kind
of zero-trusty world, so let's talk a little bit about that so we'll set up a basic
environment, you know, notionally here we have a region. We have a client account, which you can think of
as a kind of container, although not a network container, and then, of course, VPCs are kind of our fundamental networking container construct, and you have that environment in AWS. You also, obviously, often
have a corporate network outside of AWS, and on the right side of
this slide, as you can see, the Bedrock service is represented to you as an API endpoint, just as if you were using S3 or any other, or DynamoDB or any other
API-driven service. When you wanna access that API, you have a couple of options. You can go over public address space, if you like, either Internet from
your corporate network or using a NAT gateway
or an IGW, what have you, the sort of standard technologies in AWS, and you can reach that API endpoint available to you from the Bedrock service. Now, I will note that, you know, sometimes there's a misconception that that upper yellow path from, say, a NAT gateway
to an AWS service, people say, "Oh, the traffic's
going over the Internet." This is not true. It's going over public address
space in the same region. It never exits our private
network or our border network. We encrypt all the traffic between facilities in all our regions, so even for traffic going down a public road in the same availability zone, if the fiber optic is outside of our physical control, we're encrypting all that data all the time with a technology we call Project Lever, so this is actually a super-safe and secure path, but it does use public address space, which many people imagine is a source of risk, so if you don't wanna do
that, you don't have to, but I wanna just point
out that there's actually, there's really no risk there in terms of the risk you might assume if you're doing true
Internet-based connectivity. The other path, of
course, is the Internet, and although you're using
TLS, and you're probably fine, there are a certain set
of additional risks there, but they're, you know, pretty manageable. However, none of this is required, because you can also do this through private paths as well, so you can set up a
private link connectivity to the API endpoint. These are also called VPC endpoints, so the service will have a VPC endpoint. You can connect to this
abstract network object we call an ENI, and all of your traffic
will essentially be tunneled from your VPC to the API
endpoint of the service. You can backhaul traffic to
and from your corporate network over Direct Connect and TGW and all existing networking constructs and essentially create a private path to use the Bedrock service, and you can even write things like service control policies or IAM policies, which limit access to only certain network paths, which is also a very useful feature if you wanna, for example, block all access from non-private paths, (indistinct) all existing options, which will apply to this service.
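As a rough sketch of what that can look like, the snippet below creates an interface (PrivateLink) VPC endpoint for the Bedrock runtime API and shows a policy statement (as a Python dict) that denies model invocation from anywhere other than that endpoint. The service name, IDs, and ARNs here are placeholder assumptions.

```python
import boto3

ec2 = boto3.client("ec2")

# Create a PrivateLink (interface) endpoint for the Bedrock runtime API in your VPC.
endpoint = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",                        # placeholder
    ServiceName="com.amazonaws.us-east-1.bedrock-runtime",  # assumed service name
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,
)
vpce_id = endpoint["VpcEndpoint"]["VpcEndpointId"]

# An IAM/SCP-style statement that blocks model invocation unless the request
# arrived through that specific VPC endpoint (i.e., a private path).
deny_unless_private_path = {
    "Effect": "Deny",
    "Action": "bedrock:InvokeModel",
    "Resource": "*",
    "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
}
```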
- Okay, thank you, and thank you again for clarifying that public address space does not mean the Internet. I've had that question every day for what must be eight years. So on the left-hand side of the diagram now, you can basically abstract away everything Mark just said, which is that this is the way all the traffic is coming in. It's gonna come and hit the endpoint no matter what the source is, whether it was the corporate data center, Direct Connect, or the Internet; it doesn't matter. It's all gonna hit there, so let's talk about how
some of the data flows work within the service itself, so we'll start with
multi-tenancy inference, so on the right-hand side, you'll see there's a model
provider escrow account, which Mark mentioned on the previous slide. We have one of these per model provider per region, and each one contains a bucket to hold the base models for that provider, and also anything that's been fine-tuned for that provider, just so you know, to set the
scene before we get going, so when the request comes in, it's gonna come and hit the API endpoint and get to the Bedrock service, and then, IAM permitting, of course, if they can actually make that request, it'll get passed to the
runtime inference service. Its job is then to decide which of these model
provider escrow accounts holds the endpoint I'm looking for for this multi-tenant request. It'll find it, send the data over, again, TLS connections, obviously, pick out the response from the model, and return it back to the user. All nice and simple, nice and straightforward. IAM's in play, encryption's in play, and nothing gets stored in the escrow account to record what happened, and none of the model vendors can access the account anyway to actually look at data that doesn't exist, and none of that data will
get used by any vendor to train anything else. Again, we're gonna keep repeating this. We also see, at the bottom
of the main service account, there's something called
the prompt history store. Now, this is because we have a playground in the AWS Management Console, which you've probably seen on every other gen AI
vendor on the Internet, where you can type in your queries, you get some prompt responses, and they've cached it somehow somewhere so you can go back and edit your response and submit another variation until you get the right
result you're looking for as you're crafting your query, so the console allows you to also store those queries as well, and so the service account, if it gets a console-based request, will store it in the
encrypted prompt history store just for your account, which you can delete if you so wish at some point in the future, but it's there really
just to make your life in the console and in the
playground that little bit easier, so essentially, that's multi-tenancy. Single-tenancy is quite similar, in fact. If you flick back and forwards a few times, you'll see it's extremely similar in the way that it actually works. We have, again, we have the same model
provider escrow account on the right-hand side, but this time, the model on
the endpoint is being deployed either from the base model bucket, so you have, like, a private
version of one of those models, or it comes from a fine-tuned
model bucket instead, and it's one that you've
built, you've created, you've tuned, and it deploys that instead, so when the request comes in on the left through the API endpoints, hits the service, again, IAM permitting, goes to the runtime inference service, which, again, picks the
right escrow account, picks the right endpoint, sends a request, picks up the response, and passes it back, and also, again, we've stored that information in the prompt history store, if relevant, because the (indistinct) came from the console, and again, we've got
the same caveats again on data storage and on encryption. Everything's still TLS 1.2
across the board left to right. Nothing is stored within
the escrow account as part of the inference. None of the providers can get to that, therefore none of the data can be used to train other models. It's, as we say, nothing is stored, and nothing is accessible. Nothing can be used by anyone else, so those two really are quite the same, which is quite important for developers because essentially, the difference between
these two approaches, the single- and multi-tenancy approach, is that in the API call, you're changing literally one parameter that says, "I'm calling Anthropic this time. Okay, I'm gonna call Titan this time," and that's essentially the only change that developers have to make. There is nothing else. You're probably gonna use
very similar prompt text. You're gonna be calling it in the same part of the
application for the same use case, and you're just changing one thing, and you also get the point of view from the service team, of course, that, conceptually, this all makes sense. It's all very consistent, so even internally for us, it makes a lot of sense to do it this way. We're trying to remove
all the complexities from the customer perspective
and also from our perspective to make this as simple to do as possible. Moving on to possibly the
more interesting one is the model fine-tuning, and so on the right-hand side, you'll see, again, this time, the customer
account has appeared, which we'll talk about in a second, but again, this starts off
on the left, as you'd imagine. The request comes in to do fine-tuning to the endpoint, hits the service, IAM
permitting, of course. It will then call the
training orchestration piece. Now, what that does is in the relevant escrow account
for that model provider whose model you're about to fine-tune, it will start an Amazon
SageMaker training job. What that will do behind the scenes, it will load the particular
base model you want to tune from the base model S3 bucket, and then it will reach into an S3 bucket that you nominate in your account to read the training data. This could just be the S3 address if that's all you wanted, but you could also provide it the VPC information, such as subnets and security groups, and then you can make it essentially drop an ENI into the VPC so it will reach out to your S3 bucket via your VPC, so if you have S3 endpoints in your VPC or bucket policies that say only this VPC can access my bucket, great, that all still applies, and so the service is actually reaching down into your account and using whichever policies you've set up in that account or in that VPC to access your bucket.
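Here's a hedged sketch of what kicking off that kind of fine-tuning job can look like with boto3. The role, bucket names, and VPC IDs are placeholders, the hyperparameter keys vary per base model, and this is a sketch of the control-plane call rather than a complete, production-ready request.

```python
import boto3

bedrock = boto3.client("bedrock")  # control plane, as opposed to bedrock-runtime

bedrock.create_model_customization_job(
    jobName="my-fine-tuning-job",
    customModelName="my-tuned-model",
    roleArn="arn:aws:iam::123456789012:role/BedrockTuningRole",   # placeholder
    baseModelIdentifier="amazon.titan-text-express-v1",           # illustrative model
    trainingDataConfig={"s3Uri": "s3://my-training-bucket/data.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-output-bucket/"},
    hyperParameters={"epochCount": "1"},  # illustrative; valid keys vary per base model
    # Optional: reach your S3 bucket through your own VPC, so S3 endpoint and
    # bucket policies scoped to that VPC still apply.
    vpcConfig={
        "subnetIds": ["subnet-0123456789abcdef0"],
        "securityGroupIds": ["sg-0123456789abcdef0"],
    },
)
```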
fine-tuned model bucket and can then be deployed later
as a single-tenancy endpoint, but through all this process, none of the data from your S3 bucket is then stored in the escrow account. The model is, of course,
that's built and deployed, encrypted with your keys,
and stored in the bucket. The model providers don't
get to see that data either, so no one has any idea
what you're actually doing in terms of training that model, so then we can take that data, and then we can see your use case and think, "That's excellent. "Let's go and steal your data "because we're the model provider. "Surely we can access it." No, you can't, so everything is safe,
secured, and encrypted, and even the access path
for S3, as shown on screen, is entirely under your control, so again, it makes the whole thing really safe and really secure, and this is the whole thing in one go, and so, conceptually, it is really simple, although, in this case, we're just showing one model
provider escrow account. We know there's many pair region, based on the one-pair model, but this is how the whole
thing actually works. You can see all the pathways in one place. You can really see
clearly what's happening. The one thing we haven't really
called out is at the bottom, think Mark mentioned before, that CloudWatch and CloudTrail
are definitely in play. Anything that's used by the servers or touched by the servers is gonna be put out to CloudTrail. Any metrics that we want to
be defined for CloudWatch will be output to
CloudWatch in your accounts, so just for simplicity, we
took them off the diagram to make it more focused
on the flows themselves, but hopefully, this all makes sense. - And speaking of IAM, just
to talk very briefly about... Again, this should be familiar to you if you're an AWS person or an engineer or someone who does security work in AWS. We'll follow the standard model that we follow with identity
and access management. There'll be identity-based policies, so that means all the principals who want to use or access
the Bedrock service will need to have the right permissions in a policy associated with their role or their other principal, and in those policies, again, you'll have the normal capabilities. You can define the actions. You can define resources, so you can specify which
models, for example, are accessible for this
particular principal. We'll support what's called attribute-based access control, ABAC, which means that you can
also write permissions in terms of tags
associated with principals and tags associated with some
of the resources and objects. This gives you some additional flexibility that many people desire, and it's generally a trend in AWS to move to ABAC-based access control, so all this should be familiar to you, but it's, again, gonna be present and sort of standardized in
the Bedrock service as well. A very simple example of a
policy that one might write, in this case, is a deny statement, which actually would work as a service control policy as well. You might have, for example, a principal who has access to most of the models in the system, but there's one in particular that's special in some regard, and you want to deny access for invoking that model, so you can write a deny policy, apply it to either the principal or to the account or the OU or the organization, and you can exclude that particular access from the general permissions that you've granted to a principal or a set of principals, just as one example.
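Expressed as a policy document (shown here as a Python dict purely for illustration; the model ARN is a placeholder), that kind of statement might look like this:

```python
# Hypothetical deny statement for one "special" model; the resource ARN is a placeholder.
deny_special_model = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInvokeOfSpecialModel",
            "Effect": "Deny",
            "Action": "bedrock:InvokeModel",
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/some-special-model",
        }
    ],
}
```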
But again, those of you who know AWS, this is old stuff; you understand how this works, and we'll just continue to apply that. Now, we've been talking a lot about what I'll call
security of the model. That is, it's a workload, right? So you need to secure the workload, and you do that with a lot of the
technologies we've invented with some cool new innovations, like the ability to access
directly from your VPC and have control over
even some network flows that you may not in
traditional AWS services, but there's a whole 'nother
aspect to this whole world, which we'll only touch on today, a very interesting aspect, and CJ talked about it in the keynote; I hope you saw that yesterday. There are challenges in using this technology. There are security concerns and other types of concerns around the security in the model: like, how do people use it? How can they potentially misuse it? These are all concerns
that have come along with any new technological
invention or innovation. I think of technology
as an amplifier, right? It amplifies human
capacity and capability, and when you can amplify something, you can usually do that for
good or you can do it for ill, and so there undoubtedly will be attempts to use this technology
for malicious purposes or, let's say, illegitimate purposes. Maybe the two are synonymous; maybe not. Maybe not quite the same
to try to hack something versus, you know, have a
little help with my homework, but we're all aware that
that's going on out there, and this is gonna be a challenge that we all face and work on together. The model builders will do their utmost to protect the use of the models from certain kinds of
malicious activities, but, again, abusive uses are possible, and people will certainly try
to work around limitations or work around filters, and we see that today in the industry. We'll continue to use the technology to enhance the technology, and we'll learn from mistakes and problems and continue to improve the security of the models, but it is something
that we, as a community, have to be aware of and be building the kinds of protections and utilizing, creating the use cases that can account for the
possible risks that are created in these environments. One of the things that I find interesting, and I've been reading, like, again, like many of
you, about these topics and trying to learn what I can, and even the way that you,
so you can think of a... These are probabilistic systems, right? And so they are amazing
at what they can do, but it's not, by definition, like, literally error-free. You can't ever say there's no errors that result from a
probabilistic system like this. One of the interesting
details that I've learned, and perhaps you know this as well, is that although they're
probabilistic systems, they're, by nature, they're
deterministic systems, so unless you do some magic, if you enter the same prompt, you always get the exact same result, but why don't they do that? Well, because if that's
how the, let's say, consumer version of a
foundation model worked, people would quickly think, "Oh, this isn't particularly intelligent. "It's just a computer
doing what computers do," so what happens, there's actually a param, there's a set of parameters in the models. It's called the model temperature in which the designer of the model can turn a knob of randomness, and so the same prompt will result in different
outputs each time, creating the illusion that there's some real kind of creativity or intelligence there which might not be created if the same prompt always
had the same output. Just a little point that helps people to grasp and understand, helped me to grasp and understand
super-powerful technology, but maybe not quite as
magical as it first appears, and another reasons I bring that up is in a lot of business use cases, put aside the consumer and, like, amazing stuff
you can do use case, in business use cases, deterministic responses
could be very useful, right? You might not have a temperature setting. You might want it to always give the same
answer to the same question because for the business
use or the more focused use, that's exactly what you want, so that will be, that's the kind of option
that you can enable in a business or
enterprise-oriented version of these kinds of services in a way that hasn't so far kind of hit the public consciousness because, again, we're sort of
being amazed and entertained by these what I'll call
more consumer use cases. Another thing that I find
particularly interesting is how do you characterize
the error of the outputs? Some kinds of errors are, for example, someone says, "This model contains... "It's biased," so it shows,
for example, racial bias. Completely agree that's a problem, but it's a kind of error, right? It's something about the input is not matching the desired output, and we need to train and tune and filter in order to get the
socially desirable outputs from a system which, again, has no intrinsic moral
compass, if you will. Other kinds of errors I've seen declared it's a vulnerability. That's a security vulnerability, and I look at it and I say, "Well, I can see why you might say that "because it has a security implication, "that particular error, "but it's just another kind of error, "and I'm not sure "that characterizing it as a
vulnerability is very helpful. "Maybe it is; maybe it isn't," and I think these are
the kinds of discussions that, as a community, we need to have, but I think the key thing here is to recognize that we
have to come to grips with the probabilistic
nature of the models, the incredible value they create, but also the risks that are
created through, you know, some of the manipulations and potentially malicious
use of these systems. Now, what are the opportunities we have? Well, first of all, there's
really clear low-hanging fruit that I've already seen, and you'll see quickly
in a lot of products, including our products and others, is much better user experiences
around things like query. Like, think about, you
know, query languages. I have to learn SQL
today in order to do... So, I mean, the natural
language processing capability, we already have this on
our QuickSight service, is just amazing. You can literally ask very
normal human questions and get really good results
using this type of technology. The fact that using
domain-focused foundation models, and I'll give an example in a minute, is really useful because
now, if I can scope down, like, it's amazing that you can do all kinds
of broad range of things, but if I'm willing to scope
down the desired outputs to a particular domain, there's really cool things you can do that are hard to do in the
very general use cases, and another thing I think
that will be common, at least in this first, you know, few, this first time period as we kind of learn and
adapt to this new technology, will be supporting human judgment, so you'll ask advice of these systems. You'll get input from systems. You'll get very good help
in solving some problem, but you probably won't do a
full closed-loop automation because if there's an error in the output, and that results in a change in, say, a security setting
in your environment, that could be a problem, so I think you're gonna, you know, I've been actually pretty impressed in the security community,
where I tend to live, is that people are impressed
with this technology, but they're a little bit skeptical that it will immediately
solve a bunch of problems because they recognize that
even, like, a 3 or 5% error rate is a problem if it means, like, shutting down a
production system accidentally because you changed a firewall rule that kind of would normally make sense but didn't under those circumstances, but that doesn't mean
it's not super-useful to get advice that's normally correct and then apply human judgment to that, so I think those are some issues that we, as a community,
will continue to work on, but within the Bedrock framework, you can think of your ability to, again, customize and tune these systems
to meet your business needs or the needs of your government agency, and I'll give, I think, a
really cool example of that, and that is Amazon CodeWhisperer, which you've heard talked
about this week already, but really think about
what this tool is doing, so it's a pretty focused use case. It's gonna provide you with source code in languages of your choice. It supports a lot of languages. You know, in response to a human prompt, it'll write code for you. Doesn't write it, but it'll
generate code for you, and it will help, you know,
embed that in your IDE and give you some
information about that code, but think about that, so because it's focused on that domain, what it does is it takes
the generated output, and it compares it to its corpus and says, "Does the generated output "sufficiently resemble any inputs "in my giant massive
database of source code "such that that could be reasonably seen "as the same or closely derivative work?" And if it does, there may be a licensing issue there because it might be under
an open source license that's not acceptable
in your organization, or maybe it is, or maybe it isn't, but now, what the tool will do is to say, "Look, this code is closely
related to this code, "and here's the URL for
where that code came from, "and here's the license that it's under," and it will, like, stick that in a comment
in your source code, and you can decide, as a developer, under the policies of your organization, whether to use the code or not and in what way to use the code. Again, that is in a general-purpose system would be very difficult to build, but in a special-purpose
system is super-valuable, and so, again, I think these kind of
enterprise-type use cases will be where we see a
ton of value and success for foundation models in generative AI. You know, CodeWhisperer
also does security scanning using more traditional, both ML-based but also
kind of rules-based, of the code that it generates, and so whether you're writing the code or you're asking it to
generate code for you, you'll still get a bunch
of security protections looking for all the standard,
you know, top 10 OS things and other kinds of static code analysis types of capabilities. Super-useful. Off we go.
- 'Kay, so we did mention earlier that there are other ways
to do large language models and foundation models on
AWS apart from Bedrock, although, personally, I'm quite biased towards Bedrock; that's just where I am at the minute, so we give you the ability to be quite flexible in your choice of models, your choice of platforms, and let you build your
own models from scratch if you want to or use some prebuilt pretrained
models if you want to, just to try and make sure you're doing the right
thing for your use case, so Amazon SageMaker JumpStart
is a great example of this. It's an ML hub that offers you a number of (indistinct) algorithms and models, et cetera, that you can just deploy yourselves within your account, so you can use that to discover all sorts of different LLMs or FMs within the environment, including things that aren't in Bedrock. As an example, you can look at the OpenLLaMA models, the FLAN-T5 models, or the Bloom models, which aren't in Bedrock, but they are in JumpStart, and so if those models,
if you've read about them and you see how powerful
they are at what they do, perhaps for particularly niche use cases in some situations, it's worth a look. It's worth trying those out to see if they're actually more suitable for what you're trying to actually achieve, and, of course, we're adding more and
more models to JumpStart. I think we've added twice the number of
models this year already to what we had last year, and so the growth there of what we support is just getting bigger
and bigger and bigger, and it's a mixture of open source models and proprietary models, and so we're really giving
you as much choice as possible to find the right sort
of gen AI-type platform that you can use within
your AWS environment. Now, some customers, they actually do need to build
their own model from scratch, and as you can imagine, and as we've kind of alluded to, it's quite a large, lengthy process. You have to collect all of
the data that's relevant, get it all reviewed, get it into a useful form for the models, get it built, which takes time, but you can do all these
things within Amazon SageMaker. The Amazon SageMaker tools let you do all these things at scale. They let you build very reliable, very stable, very scalable models. They let you do distributed training in certain cases so you can really reduce the training time, and you can use things like the debugging tools to find issues with the training mid-run so you can correct those errors, and you can also do things to analyze other metrics as part of that training run, and it really, really helps you do that work.
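As a hedged sketch of what that looks like with the SageMaker Python SDK, assuming you have a PyTorch training script of your own; the script name, role, instance choices, and data location are placeholders, not a recommended configuration.

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",                                   # your training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",      # placeholder role
    instance_count=2,                                         # distributed training
    instance_type="ml.p4d.24xlarge",
    framework_version="2.0.0",
    py_version="py310",
    distribution={"torch_distributed": {"enabled": True}},    # launch via torchrun
)
estimator.fit({"training": "s3://my-training-bucket/corpus/"})
```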
I mean, you still have to know what you're doing, understand the models, and understand your data, but SageMaker itself makes it a really straightforward thing to actually do, so if that suits your use case, and for some customers it absolutely does, you can do that as well, and because SageMaker also supports the
human-in-the-loop process, when you're collecting your data, you can actually apply that sort of human
knowledge and human judgment to the data that's coming in to make sure you're training your models on the right and relevant data for your use case and your domain. Now, the difference between proprietary and publicly available models, for most customers, is quite confusing. The licensing situation definitely comes into play because each of these, even if they are open source and public models, still has a license condition. You will have to adhere to whatever they say, so that must be part of your choice when you select the models, but one thing to think
about is that proprietary models may, and I stress the may, be more accurate than the open source ones or the publicly available ones, but they also may be more expensive in comparison if they have, say, a similar model size, and so there are pros and cons of going either way, so it really is down to you to look at each of the models. Look at the licenses. Decide how many different model sizes there are. Do we just have one model in this particular family, or do we have a huge number, like Jurassic from AI21 Labs? There's quite a lot of variation in size and complexity and speed, so once you've looked at the license conditions,
complexity, and speed, you've pretty much got an idea
of which ones are gonna work, but the next thing to really think about as you get onto this is different models also
support different languages, and we're not talking Python here; we're talking French, German,
Spanish, Italian, et cetera, so you'll find that some, well, most of them
support English, anyway. Some, such as AlexaTM, will support a big bunch of languages, including things like Arabic and Japanese, which are less common in
some of these environments. You'll find some really concentrate on some Central European ones, and some, like LightOn, I think, will also support quite a lot in French; it's very, very powerful in French, and so you have that extra
thing to look at as well, so there is a lot of choice, and so when you come to do your POCs, there's a lot of things to think about, but to actually test these
is really straightforward, and before Bedrock was actually available, this is what I was doing to play with Stability AI and AI21 Labs, just going via JumpStart, so once you've gone through the model list and decided, "I want to
try this one or that one, "give them a go," you'll find that most of them
actually have a playground as part of the AWS Management Console, so you go into JumpStart. You pick AI21 Labs because
it's top of the list there, and you get a playground option, so you can go straightaway
and start typing in queries, start typing in prompts, start giving it sort
of extra one-shot data for your extra context to your query, and it just works. There's nothing to
build, nothing to deploy. Of course, it is a playground. It's not a production
environment you can use, but if that works well for you, you can then click another
button, essentially, and it gives you the code or
the notebook that you need to go and launch an endpoint, and it will then go and deploy the relevant AI21 Labs Jurassic model to a SageMaker endpoint in your account, and so all the (indistinct) are gonna come from your account, and all of the logs generated by it are gonna be basically available in your account, and so you're essentially building and deploying your own private foundation model.
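For reference, here's a hedged sketch of that same flow using the SageMaker Python SDK rather than the console button. The model ID and request payload are illustrative, and the exact request format varies per model, so treat this as a sketch rather than the definitive path.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Deploys the chosen JumpStart model to an endpoint in your own account.
model = JumpStartModel(model_id="huggingface-text2text-flan-t5-xl")  # illustrative ID
predictor = model.deploy(initial_instance_count=1)

# Payload schema varies per model; this key is just an example.
print(predictor.predict({"text_inputs": "Summarize: Amazon Bedrock is ..."}))
predictor.delete_endpoint()  # clean up when you're finished experimenting
```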
The real big difference, in that sense, is that it's a bit more work, but the number of models available compared to Bedrock is just bigger, and so the way I worked before was with AI21 Labs and Stability AI on JumpStart. Now they're in Bedrock, I use Bedrock because the API
experience is so much simpler, and for me, that's the
thing I'm looking for, and again, that's one of the powerful things about Bedrock. So how do you actually get started? The first way to get started is not to use one of our models in JumpStart or Bedrock. It's CodeWhisperer, which sounds a bit silly, in that sense, but if you deploy that, you instantly get large language models available to you in your development environments, and developers can start generating code that suits what they're trying to actually achieve, and once they start using
this, they start realizing, "I can describe my function in one line." So say I want a function that will, let's say, write this JSON structure out into this different format and then write it to this storage object. That's the only thing you have to describe to CodeWhisperer, and it would generate the code, as well as asking you, "Do you want the read function as well?" And it would generate that for you as well.
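To give a feel for that interaction, here's the kind of one-line prompt a developer might write and, purely for illustration, the sort of function that could come back. The generated body below is hand-written for this sketch, not actual CodeWhisperer output.

```python
# write this dict out as pretty-printed JSON to an S3 object
import json
import boto3

def write_json_to_s3(data: dict, bucket: str, key: str) -> None:
    s3 = boto3.client("s3")
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(data, indent=2).encode("utf-8"),
        ContentType="application/json",
    )
```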
There's nothing you have to do except look at the code and just double-check that it passes all the rules and regulations and checks that Mark mentioned, and make sure it suits your in-house style, and that code is good to go, but it really gives you
that early experience of actually using LLM prompts 'cause essentially that's what it's doing. You're writing a prompt. You're writing a query
to generate the code, and once you really get that feel, which you can have within, I think, 10 minutes (getting it installed in PyCharm for Python is really, really quick), you can start doing this and getting a feel for it and then realize, "Actually, I can think of a lot of use cases to use this," so once you've got that in place, you can start then looking
at Bedrock or JumpStart, depending on which model
you're looking at trying out, and they're the obvious places to go to start your gen AI journey, because it's either Bedrock as an API or JumpStart as a single, maybe a double-click, to get going, and it means you can have
these things available, again, within seconds if that's what
you're trying to achieve, so once you've looked at these things and you think, "Actually, gen AI could do my business a lot of good. There's a lot of benefit we can achieve with these models and this new technology," what do we do? You've gotta get a POC, but what we have found
in early conversations is that a customer comes to a meeting, and they say, "We've got
these top three use cases," and you talk through
them for a few minutes, and you realize they're not
your top three use cases. Actually, it's these six things over here you just hadn't realized you could now do, so this is often a
revelation to customers, so we actually have a program called the AWS Generative AI Incubator program, which is really sort of an applied scientist who will come on site and help build those initial
discovery workshops for you and actually help you find out what, actually, are my top
five, top six use cases? And they'll take them and then
help you do those early POCs and get you to a stage where
you can actually think about can this go into production? Is this actually gonna
add the value that I want? Hopefully, yes, but that first decision point of working out what use case to use is actually quite tough because it is a completely
different way of thinking, and that program team, they can actually do it quite well and help you get to the
POC, hopefully, much faster, and if you're using an API
system such as Bedrock, you could start that POC within minutes of the first
meetings on use cases finishing. It's really, really straightforward, so that's really it for the discussion today. Hopefully, we've talked to you about what Amazon Bedrock actually is and what it actually does, where it sits in the ecosystem compared to things like Amazon SageMaker JumpStart, and Mark's gone through some of the security concerns that we really have to think about if you're putting these things into production, 'cause if you don't think about them now, your security teams will think about them really quickly once these things get anywhere near a production or live state, so thank you.
- Thank you for your time, and we'll be around later for questions; you're welcome to come up. Thanks.
- Thank you. (audience clapping)