(upbeat music) - Hi, Kimberly, how are you? - Hi, Bratin, I'm doing
well. How about you? - Good, thank you. Can you believe it's re:Invent
season all over again? - I cannot. Time really flies, doesn't it? - Really. So how are we doing on our slides? - Well, Bratin, if you remember, you made a really big bet last year. - I did? - And here's your money. - Really? I don't win bets. Are you sure I won the bet?
- I'm sure. (Kimberly laughs) - Wow. So this is about the one
where we can use generative AI to make my slides like you
will just type in some text and out will come the slides? - Yep, that's exactly right. I'd like to show you what
I did using generative AI in Amazon Bedrock to work on your deck. I think it's pretty cool.
- [Bratin] Nice. - So first, I go into Amazon Bedrock, and I just click Get started, and I'm gonna go into the Text playground to generate some talking points for you. In this case, I'm going to
select the Amazon Titan model and Titan Express. And if you remember,
we've been working hard on building some messaging, and I'm gonna use that as
context for the Titan model. - From the marketing messages
to the slides, that's amazing. - Yeah, that's right. So put all this context in, which is the messaging
we've been working on for generative AI, and I'm gonna ask Titan to
come up with five key themes that you should highlight in your talk. (bright music) Let's see how it does. (bright music) - [Bratin] Wow. This is as good as what
you and I would've done. - [Kimberly] I agree, these
talking points are right on. I think we can use them for your talk. - [Bratin] Amazing. - Your slide deck is ready to go. I think we can take it to Vegas. (upbeat music)
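As an illustrative aside: the console demo above can also be reproduced programmatically. Below is a rough sketch of what an equivalent Amazon Bedrock call with Titan Text Express might look like using boto3; the region, prompt wording, and generation settings are placeholders, not the exact demo configuration.

```python
# Hypothetical sketch of invoking Amazon Titan Text Express via the Bedrock
# runtime API to draft talking points from marketing messaging supplied as
# context. Region, prompt, and generation settings are illustrative only.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

messaging = "<paste the generative AI messaging used as context here>"
prompt = (
    "Using the following messaging as context, list five key themes "
    f"to highlight in a re:Invent talk:\n\n{messaging}"
)

response = bedrock.invoke_model(
    modelId="amazon.titan-text-express-v1",
    body=json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {"maxTokenCount": 512, "temperature": 0.5},
    }),
)

result = json.loads(response["body"].read())
print(result["results"][0]["outputText"])
```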
- Good afternoon, everyone. Welcome, and thank you for being here. A year back, using generative AI to create slides for my presentation might have seemed fanciful, but here we are. Not only can it generate
slides, but a whole lot more. And in today's talk, I'll talk about what it takes to build and scale generative
AI for the enterprise. Because when you're
building for the enterprise, it's important to pay attention
to some key considerations. This slide generated by Amazon Bedrock lists those key considerations. And in my talk, I'll discuss why these
considerations are important and how AWS helps you
address these considerations. We'll also have customer speakers come and talk to these key considerations. So for example, we will have Ryanair, one of the largest airlines in the world, talk about choice and
flexibility of models. We'll have Fidelity, one of the largest financial
companies in the world, talk about differentiating with your data. We'll have Glean, one of the most popular AI-driven enterprise search assistants, talk about responsible
AI in the enterprise. We'll also have TII talk about machine
learning infrastructure. And finally, we'll have Netsmart, one of the largest healthcare
community providers, talk about using
generative AI applications. But first, let me take a step back and discuss why generative
AI is so transformational. Over the last five years, the pace of innovation
in compute, in data, and in machine learning has accelerated, and this acceleration
was driven by the cloud. The cloud made massive amounts of compute and massive amounts of
data available to everyone, and as a result, practitioners in industry and academia were able to innovate
rapidly on machine learning. In fact, almost every frontier
machine learning model was born in the cloud. Now, let me give you a
little bit more context on the pace of innovation that has been driven by the cloud. Over the last six years, the amount of compute that
we use for machine learning has grown by more than 100,000 times, the amount of data that we use for training
machine learning models has grown by more than 100 times, and the size of models has
grown by more than 1,000 times. This is a pace of innovation
that we have never before seen in the history of information technology, and this pace of innovation
has allowed us to create models that are trained on internet scale data, the so-called foundation models. Let me give you a little bit
of a feel on what it takes to build one of these foundation models. A human being, you and me, in the course of a lifetime, in the course of our entire lifetime, a human being listens to about
1 billion to 2 billion words. Now, when we train
these foundation models, we are training these models
with trillions of words. In other words, we train these models with thousands of times more information than a human being will listen
to in their entire lifetime. There's another way to look at this. When we train these foundation models, we train them with terabytes of data that is thousands of times more than the information
contained in Wikipedia. And so when you pack so much
information into these models, they start having very
interesting properties. But the question that most
customers care about is: How do I build applications
out of these models? How do we put these models to work? And so I'm going to talk
about how we, at AWS, have been building
generative AI applications because I think the
lessons we have learned and the considerations
that we had to keep in mind will also apply when you want to build and scale your own
generative AI applications. Earlier this week, we launched Amazon Q. Amazon Q is a generative AI application that uses generative AI to transform how employees
access the company's data. So you can use Q to ask
questions about a company's data. You can use Q to create
content on your company's data. You can also use Q to act on your behalf
on your company's data. And so I'm going to use Q to illustrate the considerations that
we had to keep in mind and how we went about building an enterprise-scale
generative AI application. So the first question that
we had to ask ourselves, and I suspect you'll have
to ask yourselves is: Where do I get started? How do I choose a foundation model to build an application with? And this is not an easy question to answer because every model has its
own strengths and weaknesses, and so it's important to
do a lot of experimentation to figure out which model to use. Let me illustrate with some examples. Suppose I asked this question to two different foundation models, and these questions and answers are actually real
questions and real answers that we asked many
different foundation models. So I ask: What is your shoe return policy? Model 1 gives me a quick concise answer, free returns within 30 days. Model 2 gives me a longer,
more complete answer. Both of them are accurate answers. Which model do you want to start with? That actually depends on the
application you have in mind. For example, if you want
to build an application to generate ad copies, you want Model 1 because you
want brief, concise statements. On the other hand, if you're looking to build
a customer service chat bot where you actually want to
have a verbose interaction with the customer, then you want Model 2. Let me give you another example. Suppose I asked this question: What is your checked bag policy? Model 1 gives me a quick accurate answer. Model 2 also gives me a correct answer. It's more complete, but it
takes longer to generate and it takes longer to generate because it has to do a lot more compute. And because Model 2 has
to do a lot more compute, it's an order of magnitude more expensive. And so now, you have to ask the question: Do I really want to pay
an order of magnitude more for Model 2, or am I better off paying
a fraction of the cost and using Model 1 because it gives the answers
that customers care about? And so it's very important as you think about building
and scaling generative AI that you run a whole set of tests and you work back from your use case. And so let me show you the results of running a whole set
of tests for Amazon Q on many different models. But first, let me talk
about some of the parameters that we use for evaluating these models. So there's cost effectiveness. How expensive is it to use this model? There's completeness that
I talked about before. There's low hallucination. So when a model has low hallucinations, it's a lot more accurate. Then, there's conciseness that I talked about. And finally, there's latency. How quickly do I get an answer back? Now, when we did our actual evaluations, we used a lot more parameters, but I put up five here because
they get the point across. And so now, let's look at the results from the first two models. And what we'll notice here is that Model 1 is not as cost-effective, but Model 1 is a lot more complete and now it's not clear which model to use. And so we said, "You know what? Let's go and try a few other models." So we took another model and
ran the whole set of tests, and the results were, again, the same. The models are good in some dimensions, they're not so good in other dimensions. And we ran our tests against
many different models, and the results were always the same. Models have strengths and
models have weaknesses. And I can bet you that as you build
generative AI applications for the enterprise, you too will likely have to
go through a similar process where you'll end up with some models that
are good for some things and others that are good for other things. So where did we end up? Here is where we ended up. We picked a model that's good
on the cost axis and said, "Let's go and optimize it
on the other dimensions." And it's very likely that
as you build applications, you too will probably have
to make a similar choice where you pick something
that's good on some dimensions and then go in and optimize
it on other dimensions.
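As a rough illustration of that kind of bake-off, here is a small, hypothetical scoring harness: candidate models are rated on the criteria discussed above, weighted according to the use case you are working back from, and compared. The model names, scores, and weights are invented; they are not the actual Amazon Q evaluation results.

```python
# Hypothetical model bake-off: score each candidate on the evaluation
# criteria (0-1, higher is better) and rank by a use-case-specific weighting.
CRITERIA = ["cost_effectiveness", "completeness", "low_hallucination", "conciseness", "latency"]

candidates = {
    "model-1": {"cost_effectiveness": 0.4, "completeness": 0.9, "low_hallucination": 0.8, "conciseness": 0.5, "latency": 0.6},
    "model-2": {"cost_effectiveness": 0.8, "completeness": 0.5, "low_hallucination": 0.7, "conciseness": 0.9, "latency": 0.8},
}

# Weight the criteria according to the application you have in mind.
weights = {"cost_effectiveness": 0.35, "completeness": 0.2, "low_hallucination": 0.25, "conciseness": 0.1, "latency": 0.1}

def weighted_score(scores: dict) -> float:
    return sum(weights[c] * scores[c] for c in CRITERIA)

for name, scores in sorted(candidates.items(), key=lambda kv: weighted_score(kv[1]), reverse=True):
    print(f"{name}: {weighted_score(scores):.2f}")
```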
So what were the optimizations that we had to do? When we started building Q, we thought we would use
a single large model, we thought we would take the
largest model and run with it. Turns out, that's not where we ended up. We actually ended up using
many different models. Each one of them somewhat
specialized to the task. Let me explain why this was the case. So when a user sends a query to Q, Q has to do a bunch of things, it has to first understand
the intent of the query. What is the user trying to get done? It then needs to retrieve
the right data for the query. It then needs to formulate
the answer for the query. It then needs to do a
bunch of other things. And so it turns out that using a single model
wasn't the optimal experience. Using multiple different
heterogeneous models ended up giving a better experience.
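A minimal, hypothetical sketch of such a heterogeneous pipeline is shown below; every function and model behavior here is a placeholder standing in for a task-specialized model, not the actual Amazon Q architecture.

```python
# Hypothetical heterogeneous pipeline: a small model for intent, retrieval for
# data, and a larger model to formulate the grounded answer. All placeholders.
def classify_intent(query: str) -> str:
    # In practice: a small, fast model tuned for intent classification.
    return "calendar_lookup" if "meeting" in query.lower() else "general_question"

def retrieve(intent: str, query: str) -> list[str]:
    # In practice: enterprise connectors and a retrieval model pick the right data.
    return [f"document relevant to {intent}"]

def generate_answer(query: str, context: list[str]) -> str:
    # In practice: a larger generative model formulates the grounded answer.
    return f"Answer to '{query}' based on {len(context)} retrieved document(s)."

def answer_query(query: str) -> str:
    intent = classify_intent(query)          # step 1: understand the intent
    context = retrieve(intent, query)        # step 2: retrieve the right data
    return generate_answer(query, context)   # step 3: formulate the answer

print(answer_query("Tell me about my customer meeting tomorrow at 10:00 AM"))
```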
Now, we thought this was counterintuitive, and you may also think this was counterintuitive, until we realized there's really
a very interesting analogy to how the human brain works. It turns out that the human brain is
not one homogeneous thing. It actually has multiple
different heterogeneous parts that are each specialized
to different tasks. So for example, the frontal cortex that deals with reasoning
and logical thinking is constructed differently
than the limbic system that deals with fast,
spontaneous responses. Even the neural structures are different. And so it's probably not surprising that when we considered all of the tasks that Amazon Q has to do, we ended up with the
heterogeneous model architecture. What were some of the other optimizations that we had to do for Q? Once we took care of the models, we actually had to spend a lot of time on the data engineering. Let me explain why. Suppose I asked this question to Q. Tell me about my customer
meeting tomorrow at 10:00 AM. Notice that Q now has to
access multiple data sources. It needs to first go
and look at my calendar to figure out what meeting I have. It then needs to look at my CRM system, my customer relationship
management system, to figure out details about the customer. It then needs to look at
other company documents to understand how we are
interacting with the customer. And so what Q has to do is it has to aggregate data
from multiple data sources to be able to give me a helpful answer. And so we spent a lot of time on building enterprise data connectors on data processing, data pre-processing, data post-processing, data quality checks, to ensure that Q had the right
data quickly and efficiently.
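To make the data-engineering point concrete, here is a small, hypothetical sketch of connectors pulling from a calendar, a CRM system, and company documents, then aggregating everything into one context; the connector functions and sample records are invented for illustration, not actual Amazon Q data connectors.

```python
# Hypothetical enterprise connectors feeding a single aggregated context.
def calendar_connector(user: str) -> str:
    return "10:00 AM tomorrow: meeting with Example Corp"

def crm_connector(customer: str) -> str:
    return "Example Corp: enterprise customer, renewal due next quarter"

def document_connector(customer: str) -> str:
    return "Latest account notes: evaluating our analytics product"

def build_context(user: str, customer: str) -> str:
    # Aggregate, then normalize into a single prompt context for the model.
    pieces = [
        calendar_connector(user),
        crm_connector(customer),
        document_connector(customer),
    ]
    return "\n".join(pieces)

print(build_context("bratin", "Example Corp"))
```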
Now, once we got done with machine learning model design and once we got done with
the data engineering, we thought we were done. Turns out, that was not the case. Let me explain why. Suppose I asked this question to Q. What is the expected revenue
of products this quarter? This is company confidential information. What this means is that some people should
have access to this answer, but not everyone. And so in this case, if the software engineer
is asking this question, Q should say, "Sorry, I
can't give you the answer." But if the CEO is asking this question, Q should be able to give some answer. In other words, Q or any enterprise application needs to respect the access control policies on the data. It should only give answers that a user is entitled to have. And so we had to spend a lot of time on building access management, on blocking topics and sensitive topics in general, on building responsible AI capabilities.
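Here is a minimal, hypothetical sketch of that entitlement check: the application consults the access-control policy on the underlying data before it answers. The roles, documents, and ACL mapping are invented, and this is not Amazon Q's actual implementation.

```python
# Hypothetical entitlement-aware answering: check the user's access rights on
# the underlying data before generating a response.
DOCUMENT_ACL = {
    "q3-revenue-forecast": {"ceo", "finance"},   # confidential financial data
    "employee-handbook": {"all-employees"},      # broadly readable data
}

def user_groups(user: str) -> set[str]:
    # In practice this would come from the company's identity provider.
    return {"software-engineer": {"all-employees"}, "ceo": {"ceo", "all-employees"}}[user]

def answer(user: str, doc: str) -> str:
    if not (DOCUMENT_ACL[doc] & user_groups(user)):
        return "Sorry, I can't give you the answer."
    return f"[generated answer grounded in '{doc}']"

print(answer("software-engineer", "q3-revenue-forecast"))  # access denied
print(answer("ceo", "q3-revenue-forecast"))                # answer generated
```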
Now, to build all of these, we also needed a performant and low-cost machine learning infrastructure. And this leads me to
the key considerations for accelerating your
generative AI journey. First, you want to have choice
and flexibility of models. Second, you want to be able to use and differentiate with your data. Third, you want to
integrate responsible AI into your applications. Next, you want to have
access to a low cost and performant machine
learning infrastructure. And finally, in many cases, you want to get started with
a generative AI application. Let me now dig deeper
into each one of these, starting with choice and
flexibility of models. In fact, this is why we
launched Amazon Bedrock. Amazon Bedrock is the easiest way to build scalable applications
using foundation models. It gives you a range of state
of the art foundation models that you can use as is, or you can customize them with your data. You can also use Bedrock agents
that can act on your behalf. And so to talk more about how customers are innovating with Amazon Bedrock, please welcome John Hurley, the Chief Technology Officer at Ryanair. (audience applauding) (upbeat music) - Thank you, Bratin. - Hello, everybody. My name is John Hurley,
I'm the CTO of Ryanair. Who is Ryanair? We're Europe's
favorite and largest airline. We will fly 185 million
passengers this year, and that will grow to 300 million
over the coming years as the new aircraft orders come in. Two key stats I love about Ryanair are that we fly 3,300 flights per day and carry 600,000 passengers
all on 737 aircraft. A very efficient operation,
high-volume, high-energy. And the IT department which I work in has to go at the same
speed as the business. COVID came. We actually had a chance to breathe. Unlike other people, we
saw it as a positive. We took the bucket list and tackled projects we've been
trying to tackle for years. For example, we use
SageMaker for dynamic pricing. We've now dynamically
priced every single fare and every single ancillary
product on our website. That's over a million
different price points getting calculated continuously 24/7. We use SageMaker for
predictive maintenance. It's been interesting, some early positive
prototypes have gone well, with a lot to do in that space. It's very interesting. It'll help us in our operational efficiency, and we look forward to taking that further going forward. We use SageMaker for
packing our fresh food. It wasn't all about SageMaker, unfortunately, but we did other projects there as well. For example, we got rid of the paper copies using Textract. And during COVID, we had 30 odd different
European governments who had different regulations
being thrown on us all about safety, and procedures, and information being shared, and we're constantly doing that. And if we didn't have the
likes of Lambdas and AWS being fully in the cloud, we'd have been snookered in that world. For example, one government, I think it was the Italians, gave us three days to build a COVID wallet. And we only did it because of the power of the technology, of Lambdas, that could do it in that space. And while we were at it, in case you thought we were bored, we were also processing refunds to over 20 million passengers. So it was a very busy time, we did a lot. Let me circle back to that project about the fresh food. We call it the panini predictor, which is a catchy title. The idea was how to give
packing plans to the business, so it could actually
have the right fresh food on every single flight. We did it, and it's a very interesting example here of where the theory was wonderful on paper, it was brilliant with the data scientists, they were over the moon. We handed over these packing plans and it was a car crash. It was absolutely impossible to have 550 different packing plans across 93 bases at 4:00 in the morning. So we were stuck, so we spoke to Amazon, sorry, our AWS partners, and they put us in contact with Amazon, who actually gave us a tour of their fulfillment center to show us what good looks like. It was brilliant, I loved it. Their robots were absolutely everything you would've imagined, which is my favorite part. On the way back, I was
talking to the Head of Inflight, and I asked her what was her favorite robot. And she goes, "Robots? Did you not see the Amazon A to Z app? I want an A to Z app." And I was like, "What? Did you not see the robots?" But when we got back to Dublin, I rang our contacts, and they put us in contact with the Amazon team again to go through their Amazon A to Z app. And we did a working backwards session, got it going, and six months later we actually released, with a very catchy title, the Ryanair employee app. This is for our cabin crew
and pilots across the network. It gives you your roster, gives you your schedules, it'll give you the ability to book time off, based on transfer systems. Everything in one location has gone very well, it's been very positive, but it didn't fix all our problems. Our cabin crew had concerns over training, how to upsell products, grooming guidelines, where all this documentation was, and it was spread right across our network in different places. It was in YouTube, it was in
PowerPoint, it was everywhere. We worked with AWS, we used Bedrock, and we actually built an employee bot. So suddenly you could ask questions like, when selling a coffee, how would you upsell a bar of chocolate to go with that, or can I have a tattoo on my forehead? You can't, by the way, in case you wanted to check. But it allowed people to ask these questions without having to search through documentation; it was on your phone, in your pocket while you were traveling, so the information was right at hand. It has gone very well. We hope to actually roll out the Bedrock part to the business early next year once it's finished internal testing with our senior cabin crew staff. Other areas where we're using Bedrock: well, we had a great plan for an in-app assistant for employees, but after the announcement, having tested with Q, we might actually have a bit of a refresh and see if we have the right tech stack in place. We're also using CodeWhisperer. It's been interesting so far, and we're excited about that. Of the projects, the one that excites me the most is definitely gonna be
customer experience. About 10 to 15% of our daily calls are people ringing in with random questions that aren't actually related to their actual site. They're like, "Can I bring rollerblades on a plane?" Unusual questions like that. We have agents answering the phone, and queuing times. All these things can be done through gen AI, and that's where we see huge excitement and a huge area of improvement. Also, thank you, Bratin; I saw in his presentation at the very start the question about the checked bag. So I'll be back to him with five and a half thousand other questions and model recommendations to make that go forward
and make that go faster. And with that, I'll hand you back to Bratin. Thank you. (audience applauding) (upbeat music) - Thank you, John. We are so glad to be partnering with you on your generative AI journey. Let me now get to the
next key consideration, and that is using and
differentiating with your data. In fact, every machine
learning system we have built, and this predates generative AI. Every machine learning
system we have built uses data as a critical ingredient. And so it's really important for customers to be able to
build a robust data platform to drive their machine learning. To that end, AWS provides you with the
most comprehensive services to store, query, and analyze your data, and then to act on it with
business intelligence, machine learning, and generative AI. We also provide you services
to implement data governance and to do data cataloging. And best of all, you can use
these services in a modular way so you can use the services that you need. And I'm happy to announce
yet another data capability, the Amazon S3 Connectors for PyTorch. These connectors make
foundation model training a lot more efficient, and they do this by accelerating some of the key primitives that are used in foundation model training, like saving and restoring checkpoints.
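As a short sketch of what that looks like in practice, the snippet below saves and restores a PyTorch checkpoint directly to and from S3 using the connector's documented checkpoint interface; the bucket name, key, and region are placeholders.

```python
# Sketch: write and read training checkpoints straight to and from S3 with the
# Amazon S3 Connector for PyTorch, instead of staging them on local disk.
import torch
import torch.nn as nn
from s3torchconnector import S3Checkpoint

checkpoint = S3Checkpoint(region="us-east-1")
model = nn.Linear(10, 1)

# Save a checkpoint directly to S3.
with checkpoint.writer("s3://my-training-bucket/checkpoints/step-1000.pt") as writer:
    torch.save(model.state_dict(), writer)

# Restore it later, streaming directly from S3.
with checkpoint.reader("s3://my-training-bucket/checkpoints/step-1000.pt") as reader:
    model.load_state_dict(torch.load(reader))
```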
Now, many customers use AWS to build the data platform to drive their machine learning. And so I'm pleased to welcome Vipin Mayar, the Head of AI Innovation at Fidelity, to talk more about how
they build a data platform to drive the machine learning. (upbeat music) (audience applauding) (upbeat music) - All right, good afternoon. I am Vipin Mayar from
Fidelity Investments. We are a large financial services company. Data, AI is really important to us, and I believe you can only be good in AI if you have a very good data
strategy, data platforms, and data quality. Now, you're hearing
all this from everyone, and I thought we should unpack it a bit, and I'll tell you a little
bit about our journey and what's really important to us. Okay, we started seven years
ago in partnership with AWS. We've done a lot. A lot still remains to be done, and I could talk about many things, but I'll talk about three things that I feel are really important now. The first one is unstructured data. How well do you have it collected? How well do you have it organized? We started collecting
it five, six years ago. We started digitizing calls. We started streaming
all unstructured text, built features around them, gave access to end users
through Query tools so that over the years, they
have become familiar with text, which now, with LLMs, is
a critical capability. The second thing that I
believe is really important, especially with large companies, is to have an enterprise taxonomy. Very easy to say, very hard to do 'cause it requires getting
consistency of KPIs and a semantic layer to instrument it. We have been working at it, we've got a lot of KPIs in one place. That enables dashboards to
be spun off very easily. The third piece, which is an investment in
democratization of data, we've enabled querying
into our data platforms so people on the business
side can discover data and even have a social
interaction with other people regarding the data elements. Okay, those are the three things I would single out. Pipelines, we've worked with AWS. The backend works pretty
well for us, okay. So now that you have sound data, let's quickly fast
forward to generative AI. There are four things we
are doing in generative AI. Conversational Q&A pairs,
especially for service reps. On the coding technical side, developer assist plus
looking at migration of code, translation of code, things
that I think many of you know. The third piece, perhaps the one that
gets talked about a lot in these conferences, is RAGs. Search, semantic search rendered through a
conversational interface. A lot of work in that and all the announcements
around vector stores. Really, all that work we
are doing in the third lane. And lastly, content generation
with a human in the loop. Okay, easy to say. But the challenges we face, let's talk about them for a minute or so. LLMs' pace of innovation, incredible. If you go to Hugging Face, they add 1,000 new models every day. Claude 2.1, excellent. Big models, great. But we've gotta balance the large models with smaller, fit-for-purpose,
task-specific models. Doing that rapid experimentation
quickly is a challenge. As you do this, getting capacity and managing cost is again a challenge. Guarding against hallucinations, another challenge for us, okay. So with that, let me go to my last slide, which is: What is our approach? With classic machine learning, we don't talk much about
it, but you need your factory; we use SageMaker. We are now excited about Bedrock, but also SageMaker, and being able to test and experiment with all these things. RAG tuning, prompting, being able to look at evaluation metrics. And really critical for us, a lot of work in that space. But let me end with where I began, which is that all this can take a lot of time and can distract you from where I began, which is data. At the end of the day, there's a greater premium now on data quality, and that's where we are still focused, and a lot more to be done in that space. (audience applauding)
(upbeat music) - Thank you, Vipin. Incredibly important insights into how you build a robust data platform because without that, it's very hard to innovate
with machine learning. Let me now get to the
next key consideration, and that is integrating responsible AI. Any powerful capability needs to have the appropriate guardrail so it can be used in the right way. And if machine learning and
generative AI have to scale, it's incredibly important that we integrate responsible
AI into the way we work. And to that end, I'm pleased to announce that you can now use SageMaker Clarify to evaluate foundation models, and you can also get
the same functionality on Amazon Bedrock. So here is how it works. As a user, you come in
and select a few models. You choose the responsible
AI and quality criteria that you want to evaluate them on. If you want human evaluation, you can also choose a
specialized workforce, and then Clarify goes off, and
does the evaluation for you, and then it generates the report. So all of that work that
we had to do for Amazon Q, all of those evaluations and criteria, that was months of hard work. All of that gets a lot easier now. To talk more about responsible
AI in the enterprise, please welcome Arvind
Jain, the CEO of Glean. (audience applauding) (upbeat music) - Thanks, Bratin. It's great to be here. Glean is a modern work assistant that combines the power
of enterprise search and generative AI and helps employees in your company find answers to their questions using your company knowledge. It's like having an expert who's been at a company since day one, who has read every single
document that has been written, who's been part of every conversation that has happened in the company, who knows about every
employee's expertise, and then they're ready to assist you 24/7 with all of that
knowledge and information. That's what Glean does, and we're so excited
to be here at re:Invent and to announce our partnership with AWS. Today, I'm going to walk you through how we address the
challenge of responsible AI with our customers. Customers are really
excited about generative AI, but they want to know if they can trust the answers
they get back from AI. Here are the three main
questions on their mind. First, how do I know the
answers I receive are accurate? Everybody knows LLMs can hallucinate, and actually even more importantly, you have to provide
information to the LLMs. The input that you give to the LLM is going to decide how accurate
the output is going to be. And oftentimes in an enterprise, information can be out of date, and that can make the job of an LLM hard. The second challenge is: How do I know that I'm
using the best model? The market is evolving quickly. Each customer has different needs, priorities, and constraints. Glean needs to guide them
through this complex ecosystem and make it easy for them to get the LLM that works best for them. And third, how do I make sure
that my company data is safe? Glean indexes all of your company's data, so we take this problem very seriously. We need to make it easy for our customers to keep their information safe and not have them worry about data leaks. Let's go dig a little bit
deeper into each one of these. So first, let's talk about
how we address the concerns around accuracy. The output of an LLM is
going to be only as good as the input you provide to it. To make the LLMs provide good answers, you need to use retrieval
augmented generation to provide it both the
right knowledge to work on, as well as to constrain its
output to that knowledge. A really good search engine
is the key to LLM accuracy, and that's at the core of Glean. Our search uses
technologies like SageMaker to train our semantic language models and LLM models from Bedrock to provide accurate answers to our users. After the LLM generates an answer, we apply post-processing to provide in-line citations
for everything in the answer. If a piece of information
doesn't have a citation, we exclude it from the response. All of this put together, a RAG system, backed by a powerful
enterprise search engine, and post-processing LLM responses are how we address customer
concerns around accuracy. Let's talk about model selection. Each customer has their own
needs and unique constraints that may require using different LLMs. So long as a model is able to pass our internal tests for accuracy, we want to enable customers to use it to power Glean Assistant
for their employees. Bedrock is awesome for this because it's easy to select from its large repository of models and pick the one that works
best for our customers. And finally, on the topic of: How do you make sure as an enterprise that your data is safe and secure? Bedrock is great because of
its compliance certifications and support for end-to-end encryption. It makes it easy for our
customers to feel confident that their data is secure and not being used for other purposes. Each Glean customer, in addition, gets their own proprietary AWS project running within their own
secured environment. And none of your company
data leaves that environment, including the customized models that we've trained using
SageMaker inside that project. So as our customer, you get to use the latest search technologies
and AI technologies while making sure that all of your data resides within your own
premises, within your own VPC. And finally, the way Glean works is we connect with hundreds
of different applications and make sure that as
users are asking questions, the answers that they get back are limited to the knowledge
that they have access to. This is what we are
showing in action here. The user came and asked a question: How do I set up Glean on AWS? And the system actually does a search using our core search engine, assembles the right pieces of information and knowledge, and then uses the RAG technique to take all of that knowledge, give it to an LLM powered by Bedrock, and synthesize an answer and response for the end user. When the answer comes back, we show the citations to the users on where the information came from.
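As a rough, hypothetical sketch of that pattern, with permission-aware retrieval, answers grounded in the retrieved snippets, and only citable statements kept, consider the following; the documents, ACLs, and matching logic are invented for illustration and are not Glean's implementation.

```python
# Hypothetical RAG flow: retrieve only documents the user may see, stitch a
# grounded answer, and attach a citation to every statement that is kept.
DOCS = [
    {"id": "aws-setup-guide", "text": "Install the connector, then configure SSO.", "allowed": {"it-admins"}},
    {"id": "welcome-wiki", "text": "New hires get access on day one.", "allowed": {"all-employees"}},
]

def query_matches(query: str, doc: dict) -> bool:
    return bool(set(query.lower().split()) & set(doc["text"].lower().split()))

def retrieve(query: str, user_groups: set[str]) -> list[dict]:
    # Only return documents the asking user is entitled to see.
    return [d for d in DOCS if d["allowed"] & user_groups and query_matches(query, d)]

def answer(query: str, user_groups: set[str]) -> str:
    sources = retrieve(query, user_groups)
    # In practice an LLM synthesizes the answer from the sources; here we just
    # stitch the snippets together and keep only statements with a citation.
    cited = [f'{d["text"]} [{d["id"]}]' for d in sources]
    return " ".join(cited) if cited else "No citable answer found."

print(answer("How do I configure SSO?", {"it-admins", "all-employees"}))
```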
So this is how it works, and we are really excited to be partnering with AWS. These are the first steps of our journey with AWS, and we're so excited to be here and bring the power of Glean
to more companies worldwide. Our entire team is excited to explore more services in future, like SageMaker Clarify,
Trainium, and Inferentia. And if you wanna learn more about Glean, you can visit our booth
in the exhibit hall or see our website at glean.com. Thank you so much. (audience applauding) (upbeat music) - Thank you, Arvind. We are really looking forward
to a partnership with Glean to take Glean to a lot more AWS customers. Let me talk next about
the fourth consideration for building and scaling your
generative AI applications, and that is having access to a low cost and highly-performant machine
learning infrastructure. Our hardware infrastructure
starts with the GPU instances, where we have the G5 instances that provide you the fastest inference and the P5 instances that provide you the fastest training. In addition, we also
have custom accelerators for generative AI, AWS Inferentia for doing inference and AWS Trainium for doing
training of generative AI models. And in fact, these custom accelerators
better cost performance. Now, at AWS, hardware infrastructure
is just part of the story. We complement our hardware instances with a software infrastructure, and that is where SageMaker
provides you a fully managed end-to-end machine learning service that you can use to build, train, tune, and deploy all kinds of models: generative models, classical models, and deep learning models. And now, SageMaker has a number of purpose-built capabilities to help with generative AI. So earlier today, we
launched SageMaker HyperPod. Now, SageMaker HyperPod accelerates your generative
AI training by almost 40% due to its optimized
distributed training libraries. It also provides you automatic
self-healing clusters. Now, it's obvious why
performance is better. Customers get to train
their models faster. But why do we need to provide
self-healing clusters? Let me illustrate with an example. Before generative AI, customers would use small-scale clusters. So you would use maybe eight or 16 nodes and you would train your
models for a few days. At that small scale, the
probability of faults is negligible. Now when you get to generative AI, customers use tens of thousands of nodes, and they're training for months on end. At that scale, fault tolerance is critical because the probability of faults is very high. And in fact, if your
software infrastructure is not resilient, it's going to be very
hard to train your models because it'll become a
start and stop exercise. And therefore, we are now
providing self-healing clusters. Let me illustrate how they work. So as a user, when you
use SageMaker HyperPod, the first thing that happens is that your model and
data get distributed to all the instances in the cluster. And this makes sure that the
training can happen in parallel so that the training can get done quickly. Once that happens, SageMaker then also
automatically checkpoints your applications. It's saving the state of your training job at regular intervals. At the same time, SageMaker also monitors all of
the instances in the cluster, and if it finds an unhealthy instance, it removes it from the cluster, it replaces it with a healthy instance, it then goes and resumes from
the last saved checkpoint. So it resumes the training job from the last saved checkpoint and then runs it to completion. All of this without the user having
to worry about resiliency or fault tolerance.
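To make the idea concrete, here is a minimal, hypothetical illustration of the checkpoint-and-resume pattern that self-healing clusters automate, written in plain PyTorch rather than the HyperPod API: training state is saved at regular intervals, and after an interruption the job resumes from the last saved checkpoint instead of starting over.

```python
# Minimal checkpoint/resume loop: save state periodically; on restart, resume
# from the last saved checkpoint. Illustration only, not the HyperPod API.
import os
import torch
import torch.nn as nn

CKPT = "checkpoint.pt"
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
start_step = 0

# Resume from the last saved checkpoint if one exists (e.g., after a repair).
if os.path.exists(CKPT):
    state = torch.load(CKPT)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    start_step = state["step"] + 1

for step in range(start_step, 1000):
    optimizer.zero_grad()
    loss = model(torch.randn(32, 10)).pow(2).mean()  # placeholder objective
    loss.backward()
    optimizer.step()
    if step % 100 == 0:  # checkpoint at regular intervals
        torch.save({"model": model.state_dict(),
                    "optimizer": optimizer.state_dict(),
                    "step": step}, CKPT)
```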
I'm also pleased to announce that SageMaker is now launching a number of optimizations to make inference more efficient. So it's reducing the cost of
large language model inference by almost 50% and reducing the latency by almost 20%. Here is how it works. So today, when customers
deploy foundation models for inference, they deploy models on a single instance. And what happens is that that instance
is often underutilized, and that increases the
cost for the customer. So what SageMaker allows now is that you can allocate multiple different foundation
models onto the same instance, and you can control the resources that you're allocating
for each foundation model. Like, you can auto scale
on a per model basis. Not just that, it also
does intelligent routing, so it looks at the load of
the different instances, and then it directs incoming requests to the instance that is
the most lightly loaded. And as a result, it can reduce
inference latency by 20%.
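Here is a small, hypothetical sketch of the least-loaded routing idea: track how busy each instance is and send the next request to the one with the fewest requests in flight. The in-memory counters stand in for real load metrics; this is not the SageMaker routing implementation.

```python
# Hypothetical least-loaded routing across model-serving instances.
import random

class LeastLoadedRouter:
    def __init__(self, instances):
        self.in_flight = {name: 0 for name in instances}  # simulated load

    def route(self, request: str) -> str:
        # Send the request to the instance with the fewest in-flight requests.
        target = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[target] += 1
        return target

    def complete(self, instance: str):
        self.in_flight[instance] -= 1

router = LeastLoadedRouter(["instance-a", "instance-b", "instance-c"])
for i in range(6):
    target = router.route(f"request-{i}")
    print(f"request-{i} -> {target}  (load: {router.in_flight})")
    if random.random() < 0.5:  # some requests finish before the next arrives
        router.complete(target)
```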
It's optimizations like this that make SageMaker the best place to build, train, tune, and deploy foundation models. And to talk more about this, please welcome Dr. Ebtesam Almazrouei, the Chief AI Researcher and
Executive Director at TII. (audience applauding) (upbeat music) - Good afternoon, everyone. Thank you for joining us today. One of the most important things in advanced technology is that when you are developing a technology, you have to think about the Sustainable Development Goals. Advanced technology has improved access to transportation and communication, facilitated sustainable energy solutions. Not only this, but also transformed
agriculture and healthcare, and promoted innovation and advanced technology infrastructure. However, with all of this advanced technology, it's very important to address the digital divide, ethical considerations, and privacy concerns. It's crucial to ensure equitable distribution of technology's benefits for all of us. We believe that openness is the key to harnessing technology's potential while safeguarding human rights and achieving sustainable development for all of us. Open large language models are a step forward to achieving this goal. LLMs, or large language models, are forging a golden era of possibilities, from personalizing learning experiences to summarizing massive amounts of documents. Not only this, but these
algorithms have proven that they can crack the code of NLP. By harnessing language, LLMs are helping us not only to solve our daily life tasks, but also to contribute to the most pressing issues of our time. That's why at the Technology Innovation Institute, we invested in building our Falcon LLMs. We started in 2022 by building NOOR, one of the largest Arabic NLP models in the world. Leveraging the power of the cloud
made it all possible for us. AWS-accelerated compute infrastructure allowed us to process massive amounts of data and train models with billions of parameters on trillions of tokens. Not only that, but it significantly reduced the operational overhead. To take you through our journey, we leveraged SageMaker to pre-process petabyte-scale data to generate approximately 12 terabytes of data, representing about 5 trillion tokens. To put it in context, 5 trillion tokens, that's
about 3 million books, each book with an average of 400 pages. Can you imagine the amount of data? Then, we used all of this data set to train all our Falcon LLMs: 7B, 40B, and 180 billion parameters. On large-scale high-performance compute clusters, we managed to achieve up to 166 teraFLOPS, thanks to the optimized AWS infrastructure. And again, to give you a sense of that scale, if a single person solves a math problem every five seconds, then to reach 166 teraFLOPS, they would need 22,000 years to solve what that cluster can solve in only one second. Then going from Falcon-7B to 40B, all the way to
Falcon with 180 billion parameters, we also needed to scale
our compute capacity. So SageMaker was able to
seamlessly scale up to 4,000 GPUs. After that, we, of course,
did our model evaluation using SageMaker real-time endpoint. Not only this, but we did
our own human evaluations. This rigorous evaluation
process is to ensure that Falcon is not just a technological advancement, but also particularly effective
and also ethically sound. So what we did, as a team, is we assembled a serverless architecture and leveraged a Slack channel to evaluate all the model's answers. Finally, I am glad to let you know that all our Falcon LLMs today are now available as
part of SageMaker JumpStart, and you can start deploying them and fine-tuning them with only a single click. In terms of adoption, Falcon-180B is now the largest and top-performing open-source model in the world on Hugging Face. It has been downloaded over 20 million times. And what that can tell you, what it showcases, is the strong desire and interest for open-source LLMs. Now, I want to share some of
the best practices that have enabled our AI innovation. First, you want to foster visionary thinking at all levels. So we encourage all our researchers to continuously explore new ideas and also challenge all the assumptions. Second, we also wanted to ensure adequate capacity for our experimentation. So it is very crucial to provide access to large-scale compute, not only to do the necessary steps, but also to empower unconstrained exploration and experimentation. Third, you have to institute rigorous evaluation protocols. We thoroughly benchmark all new methods, testing and also validating them. This prevents overoptimistic results and also ensures real-world viability. In summary, embracing
visionary thinking, scaled experimentation,
rigorous evaluation, and collaboration with vendors like AWS, we are committed to continue
applying these best practices from a seed of an idea to
a garden of opportunities to deliver groundbreaking innovation. Let's all shape the future of AI. Thank you. (audience applauding) (upbeat music) - Thank you, Dr. Almazrouei. It's really amazing work going on in TII on foundation models. Now, SageMaker is also focused on making machine learning accessible to people who may not be
experts at machine learning or who may not be experts at coding. And that is why two years back, we launched SageMaker Canvas, a no-code interface for building and deploying your
machine learning models. And now, with generative AI, I'm pleased to announce that SageMaker Canvas's no-code interface is also being extended
to foundation models so you can build, and customize,
and train, and tune models, all with the no-code interface. And so data analysts, business
analysts, finance analysts, citizen data scientists who may
not be proficient at coding, who may not be proficient
with machine learning can still build generative AI. Let me now get to the
final key consideration for accelerating your
generative AI journey, and that is using generative
AI-powered applications. Many customers tell us
that they would like AWS to provide generative AI applications for important enterprise workflows, like in the contact center,
like for personalization, like document processing,
or even healthcare. Earlier this week, we launched a general
availability of AWS HealthScribe that uses generative AI to accelerate clinical productivity. Today, when a patient
has to go to a physician, that patient-physician interaction has to be scribed manually, and doctors can spend
almost 40% of the time, 40% of the time on this manual work. That is time that's not
being spent on patient care. And so AWS HealthScribe uses AI to automatically analyze that
patient-physician conversation and then uses generative AI
to create a clinical summary that can be uploaded to your
electronic health records. And so software vendors,
healthcare software vendors, can now use generative AI to
enhance clinical productivity. To talk more about this, please welcome Tom Herzog, the Chief Operating Officer at Netsmart. (audience applauding) (upbeat music) - Thank you. (upbeat music) Tom Herzog. Grateful for the opportunity
to represent the cause and communities that we serve because at the end of the day,
that's what healthcare is. Healthcare is about people helping people. We've been digitizing
healthcare for decades now. It's been about more and more data. And the questions we're all asking now: What are we gonna do with that data? Whether we're a provider,
all of us are consumers, and healthcare is absolutely
a universal language. I want to introduce this notion that these tools that we're talking about that we've all now arrived at,
that we're so excited about, it truly is about addition
through subtraction. See, I believe that less is more. And as we talk about HealthScribe and we talk about Bedrock, what we're really talking about is: How can we be more efficient,
less task, less input, so that caregivers can see more people at the right time when they need it most? The challenge, we all know the demand
far outpaces the supply. That when we schedule
our own appointments, we're limited with the number of options because of the need that's out there. I'm gonna talk about
that here in a second. Let me get to a very
pragmatic idea and solution. Providers spend over 40% of
their time, two days a week, those in telehealth sessions, just doing documentation. That's two days that they're unable to see someone. And if you ask them, while what they're using is really good, they have only 15 to 45% of the information they need; they need more, more contextual awareness, not just for when they're talking to you right then and there, but for things that may
have been known weeks ago, months ago, years ago, that contextual awareness, if you will. Let's frame the challenge
in how this is impacting us as a society in our communities. We know that over 50 million
people will be challenged with a mental health illness
or crisis in a given year. We know that over 60% of our youth do not receive treatment for things that they may be suffering with like depression or anxiety. And we know that nearly 25% of adults, their needs go unmet for the
treatment that they're seeking or that they're not even
aware that they need. This creates an opportunity for us to do something different. This is the team, this is the cause and communities
that we proudly serve. This is also the team of innovators and designers who are working together to change the healthcare
landscape as we know it. We serve over 754,000 providers who are touching over 133 million lives and beyond what we know
as traditional medicine of acute or primary care. We're talking about community
services, public health, intellectual development
and disability needs, those who have foster
or family care services, long-term care, hospice care. This is a real opportunity for all of us. Simply, as we look at the
things that we're doing, here's what we need to
focus on as a solution. Not usability, not less clicks. We need extreme usability to
reduce the burden on providers, so that they can accelerate, improve, and optimize the outcomes for the people that they are seeing. We have a unique opportunity using tools, solutions like HealthScribe
and Amazon Bedrock, to do something simple. Let's give those two
days back to caregivers so that they can see more people. Let's streamline discharge so that as you need to
connect with other people, that information is relevant to you right then and right there. And let's transform
collaboration as we know it and take manual processes away to introduce how this system can cohesively follow
you anywhere, anyhow. Why did we choose these tools? Quite simple. HealthScribe and Bedrock produce ready-built,
purpose-built solutions that we can plug into
our systems right now, they're able to scale with us from a performance standpoint, and they have the ability to integrate across the ecosystem very uniquely. And lastly, let me bring you back to the solution, the notion that we started with. Imagine a telehealth session, if you will, where you're not only just
capturing the information, you're doing it systematically, you're doing it with a
great degree of accuracy. But using tools within Bedrock to pull forward that information so that as I am interacting with you, I can look back a week, six months, a year to have relevant information to suggest the right
treatment plan going forward. And while we often talk about
the tools and the technology, and I love it and I'm a geek at heart, what this really takes is for
all of us working together. Our relationship with AWS just isn't about how we can
use these in a systematic way. Beyond partnership, it's
about collaboration. 'Cause the things that we're talking about in healthcare today isn't about tomorrow. It's happening right here, right now. And we're deeply grateful and appreciative for that partnership. Dr. Saha, appreciate the time and the opportunity to share our story. Thank you. (audience applauding) (upbeat music) - Thank you, Tom. It's amazing how Netsmart is
embedding AI and generative AI into the healthcare space. Let me now summarize the
key points of my talk. At AWS, we are focused
on helping customers build and scale generative
AI for the enterprise. And when you're building
for the enterprise, it's important to pay attention
to some key considerations. This is what we learned from
building our own applications, and I believe this will be applicable when you build your own applications. First, you want choice
and flexibility of models. Second, you want to use and
differentiate with your data. Next, you want to integrate responsible AI into your applications. You also need to have access to a low cost and highly-performant machine
learning infrastructure. And finally, in many cases, you want to get started with
the generative AI applications that we provide for contact
centers, personalization, document processing,
healthcare, and others. Thank you for coming and please
enjoy the rest of re:Invent. Please don't forget to fill out
the survey for this session. Thank you. (audience applauding)