AWS re:Invent 2023 - Innovate faster with generative AI (AIM245)

Captions
(upbeat music) - Hi, Kimberly, how are you? - Hi, Bratin, I'm doing well. How about you? - Good, thank you. Can you believe it's re:Invent season all over again? - I cannot. Time really flies, doesn't it? - Really. So how are we doing on our slides? - Well, Bratin, if you remember, you made a really big bet last year. - I did? - And here's your money. - Really? I don't win bets. Are you sure I won the bet? - I'm sure. (Kimberly laughs) - Wow. So this is about the one where we can use generative AI to make my slides like you will just type in some text and out will come the slides? - Yep, that's exactly right. I'd like to show you what I did using generative AI in Amazon Bedrock to work on your deck. I think it's pretty cool. - [Bratin] Nice. - So first, I go into Amazon Bedrock, and I just click Get started, and I'm gonna go into the Text playground to generate some talking points for you. In this case, I'm going to select the Amazon Titan model and Titan Express. And if you remember, we've been working hard on building some messaging, and I'm gonna use that as context for the Titan model. - From the marketing messages to the slides, that's amazing. - Yeah, that's right. So put all this context in, which is the messaging we've been working on for generative AI, and I'm gonna ask Titan to come up with five key themes that you should highlight in your talk. (bright music) Let's see how it does. (bright music) - [Bratin] Wow. This is as good as what you and I would've done. - [Kimberly] I agree, these talking points are right on. I think we can use them for your talk. - [Bratin] Amazing. - Your slide deck is ready to go. I think we can take it to Vegas. (upbeat music) - Good afternoon, everyone. Welcome, and thank you for being here. A year back, using generative AI to create slides for my presentation might have seemed fanciful, but here we are. Not only can it generate slides, but a whole lot more. And in today's talk, I'll talk about what it takes to build and scale generative AI for the enterprise. Because when you're building for the enterprise, it's important to pay attention to some key considerations. This slide generated by Amazon Bedrock lists those key considerations. And in my talk, I'll discuss why these considerations are important and how AWS helps you address these considerations. We'll also have customer speakers come and talk to these key considerations. So for example, we will have Ryanair, one of the largest airlines in the world, talk about choice and flexibility of models. We'll have Fidelity, one of the largest financial companies in the world, talk about differentiating with your data. We'll have Glean, one of the most popular AI-driven enterprise search assistance, talk about responsible AI in the enterprise. We'll also have TII talk about machine learning infrastructure. And finally, we'll have Netsmart, one of the largest healthcare community providers, talk about using generative AI applications. But first, let me take a step back and discuss why generative AI is so transformational. Over the last five years, the pace of innovation in compute, in data, and in machine learning has accelerated, and this acceleration was driven by the cloud. The cloud made massive amounts of compute and massive amounts of data available to everyone, and as a result, practitioners in industry and academia were able to innovate rapidly on machine learning. In fact, almost every frontier machine learning model was born in the cloud. 
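The demo Kimberly describes above uses the Bedrock console's Text playground. For readers who want to script the same kind of call, a minimal sketch using boto3 and the amazon.titan-text-express-v1 model might look like the following; the region, prompt text, and generation settings are placeholders, not what was used on stage:

```python
import json
import boto3

# Placeholder region; Bedrock must be enabled and model access granted in the account.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder prompt standing in for the marketing messaging used as context in the demo.
prompt = (
    "Context: <paste the generative AI messaging here>\n\n"
    "Based on the context above, list five key themes to highlight in a keynote talk."
)

response = bedrock.invoke_model(
    modelId="amazon.titan-text-express-v1",
    body=json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {"maxTokenCount": 512, "temperature": 0.5},
    }),
)

# Titan Text returns a JSON body with a list of results; print the generated text.
result = json.loads(response["body"].read())
print(result["results"][0]["outputText"])
```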
Now, let me give you a little bit more context on the space of innovation that has been driven by the cloud. Over the last six years, the amount of compute that we use for machine learning has grown by more than 100,000 times, the amount of data that we use for training machine learning models has grown by more than 100 times, and the size of models has grown by more than 1,000 times. This is a pace of innovation that we have never before seen in the history of information technology, and this space of innovation has allowed us to create models that are trained on internet scale data, the so-called foundation models. Let me give you a little bit of a feel on what it takes to build one of these foundation models. A human being, you and me, in the course of a lifetime, in the course of our entire lifetime, a human being listens to about 1 billion to 2 billion words. Now, when we train these foundation models, we are training these models with trillions of words. In other words, we train these models with thousands of times more information than a human being will listen to in their entire lifetime. There's another way to look at this. When we train these foundation models, we train them with terabytes of data that is thousands of times more than the information contained in Wikipedia. And so when you pack so much information into these models, they start having very interesting properties. But the question that most customers care about is: How do I build applications out of these models? How do we put these models to work? And so I'm going to talk about how we, at AWS, have been building generative AI applications because I think the lessons we have learned and the considerations that we had to keep in mind will also apply when you want to build and scale your own generative AI applications. Earlier this week, we launched Amazon Q. Amazon Q is a generative AI application that uses generative AI to transform how employees access the company's data. So you can use Q to ask questions about a company's data. You can use Q to create content on your company's data. You can also use Q to act on your behalf on your company's data. And so I'm going to use Q to illustrate the considerations that we had to keep in mind and how we went about building an enterprise-scale generative AI application. So the first question that we had to ask ourselves, and I suspect you'll have to ask yourselves is: Where do I get started? How do I choose a foundation model to build an application with? And this is not an easy question to answer because every model has its own strengths and weaknesses, and so it's important to do a lot of experimentation to figure out which model to use. Let me illustrate with some examples. Suppose I asked this question to two different foundation models, and these questions and answers are actually real questions and real answers that we asked many different foundation models. So I ask: What is your shoe return policy? Model 1 gives me a quick concise answer, free returns within 30 days. Model 2 gives me a longer, more complete answer. Both of them are accurate answers. Which model do you want to start with? That actually depends on the application you have in mind. For example, if you want to build an application to generate ad copies, you want Model 1 because you want brief, concise statements. On the other hand, if you're looking to build a customer service chat bot where you actually want to have a verbose interaction with the customer, then you want Model 2. 
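To make the "work back from your use case" idea concrete, here is a small, hypothetical harness for comparing candidate models on the same question by latency and verbosity. The callables would wrap real model invocations (for example, Bedrock's invoke_model for each model family); the stand-in answers below echo the shoe-return example and are not real model output:

```python
import time

def compare_responses(question, models):
    """Ask every candidate model the same question and record latency and verbosity.

    `models` maps a label to a callable that sends the question to that model
    and returns the answer as a string.
    """
    results = {}
    for label, ask in models.items():
        start = time.perf_counter()
        answer = ask(question)
        results[label] = {
            "latency_s": round(time.perf_counter() - start, 3),
            "word_count": len(answer.split()),
            "answer": answer,
        }
    return results

# Hypothetical stand-ins: a concise model and a more verbose one.
fake_models = {
    "model_1": lambda q: "Free returns within 30 days.",
    "model_2": lambda q: (
        "You can return shoes within 30 days of purchase in unworn condition "
        "with the original receipt, and the refund is issued to your original "
        "payment method within 5 to 7 business days."
    ),
}

print(compare_responses("What is your shoe return policy?", fake_models))
```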
Let me give you another example. Suppose I asked this question: What is your checked bag policy? Model 1 gives me a quick, accurate answer. Model 2 also gives me a correct answer. It's more complete, but it takes longer to generate, and it takes longer to generate because it has to do a lot more compute. And because Model 2 has to do a lot more compute, it's an order of magnitude more expensive. And so now, you have to ask the question: Do I really want to pay an order of magnitude more for Model 2, or am I better off paying a fraction of the cost and using Model 1 because it gives the answers that customers care about? And so it's very important as you think about building and scaling generative AI that you run a whole set of tests and you work back from your use case. And so let me show you the results of running a whole set of tests for Amazon Q on many different models. But first, let me talk about some of the parameters that we use for evaluating these models. So there's cost effectiveness. How expensive is it to use this model? There's completeness that I talked about before. There's low hallucination. When a model has low hallucinations, it's a lot more accurate. Then, there's the conciseness that I talked about. And finally, there's latency. How quickly do I get an answer back? Now, when we did our actual evaluations, we used a lot more parameters, but I put up five here because they get the point across. And so now, let's look at the results from the first two models. And what we'll notice here is that Model 1 is not as cost-effective, but Model 1 is a lot more complete, and now it's not clear which model to use. And so we said, "You know what? Let's go and try a few other models." So we took another model and ran the whole set of tests, and the results were, again, the same. The models are good in some dimensions, they're not so good in other dimensions. And we ran our tests against many different models, and the results were always the same. Models have strengths and models have weaknesses. And I can bet you that as you build generative AI applications for the enterprise, you too will likely have to go through a similar process, where you'll end up with some models that are good for some things and others that are good for other things. So where did we end up? Here is where we ended up. We picked a model that's good on the cost axis and said, "Let's go and optimize it on the other dimensions." And it's very likely that as you build applications, you too will probably have to make a similar choice, where you pick something that's good on some dimensions and then go in and optimize it on other dimensions. So what were the optimizations that we had to do? When we started building Q, we thought we would use a single large model, we thought we would take the largest model and run with it. Turns out, that's not where we ended up. We actually ended up using many different models, each one of them somewhat specialized to the task. Let me explain why this was the case. So when a user sends a query to Q, Q has to do a bunch of things. It has to first understand the intent of the query. What is the user trying to get done? It then needs to retrieve the right data for the query. It then needs to formulate the answer for the query. It then needs to do a bunch of other things. And so it turns out that using a single model wasn't the optimal experience. Using multiple different heterogeneous models ended up giving a better experience.
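One simple way to turn evaluations like the ones described above into a decision is to score each candidate on the criteria named in the talk (cost effectiveness, completeness, low hallucination, conciseness, latency) and weight them by the use case you are working back from. The scores and weights below are invented for illustration only:

```python
# Hypothetical scores between 0 and 1 (higher is better) for two candidate models.
scores = {
    "model_1": {"cost": 0.9, "completeness": 0.5, "low_hallucination": 0.8,
                "conciseness": 0.9, "latency": 0.9},
    "model_2": {"cost": 0.3, "completeness": 0.9, "low_hallucination": 0.8,
                "conciseness": 0.4, "latency": 0.5},
}

# Weights encode the use case: an ad-copy generator might favor conciseness and cost,
# while a verbose support chatbot would weight completeness more heavily.
weights = {"cost": 0.35, "completeness": 0.20, "low_hallucination": 0.25,
           "conciseness": 0.10, "latency": 0.10}

def weighted_score(model_scores):
    return sum(model_scores[criterion] * weight for criterion, weight in weights.items())

for model, model_scores in sorted(scores.items(), key=lambda kv: -weighted_score(kv[1])):
    print(f"{model}: {weighted_score(model_scores):.3f}")
```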
Now, we thought this was counterintuitive, and you may also think this was counterintuitive, until we realized there's really a very interesting analogy to how the human brain works. It turns out that the human brain is not one homogeneous thing. It actually has multiple different heterogeneous parts that are each specialized to different tasks. So for example, the frontal cortex that deals with reasoning and logical thinking is constructed differently than the limbic system that deals with fast, spontaneous responses. Even the neural structures are different. And so it's probably not surprising that when we considered all of the tasks that Amazon Q has to do, we ended up with the heterogeneous model architecture. What were some of the other optimizations that we had to do for Q? Once we took care of the models, we actually had to spend a lot of time on the data engineering. Let me explain why. Suppose I asked this question to Q. Tell me about my customer meeting tomorrow at 10:00 AM. Notice that Q now has to access multiple data sources. It needs to first go and look at my calendar to figure out what meeting I have. It then needs to look at my CRM system, my customer relationship management system, to figure out details about the customer. It then needs to look at other company documents to understand how we are interacting with the customer. And so what Q has to do is it has to aggregate data from multiple data sources to be able to give me a helpful answer. And so we spent a lot of time on building enterprise data connectors on data processing, data pre-processing, data post-processing, data quality checks, to ensure that Q had the right data quickly and efficiently. Now, once we got done with machine learning model design and once we got done with the data engineering, we thought we were done. Turns out, that was not the case. Let me explain why. Suppose I asked this question to Q. What is the expected revenue of products this quarter? This is company confidential information. What this means is that some people should have access to this answer, but not everyone. And so in this case, if the software engineer is asking this question, Q should say, "Sorry, I can't give you the answer." But the CEO is asking this question, Q should be able to give some answer. In other words, Q or any enterprise application needs to respect the access control policies on the data. It should only give answers that a user is entitled to have. And so we have to spend a lot of time on building access management, block topics, sensitive topics in general, on building responsible AI capabilities. Now, to build all of these, we also needed a performant and low-cost machine learning infrastructure. And this leads me to the key considerations for accelerating your generative AI journey. First, you want to have choice and flexibility of models. Second, you want to be able to use and differentiate with your data. Third, you want to integrate responsible AI into your applications. Next, you want to have access to a low cost and performant machine learning infrastructure. And finally, in many cases, you want to get started with a generative AI application. Let me now dig deeper into each one of these, starting with choice and flexibility of models. In fact, this is why we launched Amazon Bedrock. Amazon Bedrock. Amazon Bedrock is the easiest way to build scalable applications using foundation models. 
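The access-control requirement described above, that Q should only answer from data the user is entitled to see, can be illustrated with a small, hypothetical sketch that filters retrieved documents against the caller's group memberships before any text is sent to a model. The document fields and group names here are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    doc_id: str
    text: str
    allowed_groups: set = field(default_factory=set)

def filter_by_entitlement(documents, user_groups):
    """Keep only the documents whose access list overlaps with the user's groups."""
    return [doc for doc in documents if doc.allowed_groups & user_groups]

docs = [
    Document("q3-forecast", "Expected revenue of products this quarter is ...",
             {"finance-leadership"}),
    Document("eng-handbook", "Our deployment process is ...", {"all-employees"}),
]

engineer_groups = {"all-employees", "engineering"}
ceo_groups = {"all-employees", "finance-leadership", "executives"}

print([d.doc_id for d in filter_by_entitlement(docs, engineer_groups)])  # ['eng-handbook']
print([d.doc_id for d in filter_by_entitlement(docs, ceo_groups)])       # ['q3-forecast', 'eng-handbook']
```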
It gives you a range of state-of-the-art foundation models that you can use as is, or you can customize them with your data. You can also use Bedrock agents that can act on your behalf. And so to talk more about how customers are innovating with Amazon Bedrock, please welcome John Hurley, the Chief Technology Officer at Ryanair. (audience applauding) (upbeat music) - Thank you, Bratin. - Hello, everybody. My name is John Hurley, I'm the CTO of Ryanair. Who is Ryanair? We're Europe's favorite and largest airline. We will fly 185 million passengers this year, and that will grow to 300 million over the coming years as the new aircraft orders come in. Two key stats I love about Ryanair are that we fly 3,300 flights per day and carry 600,000 passengers, all on 737 aircraft. A very efficient operation, high-volume, high-energy. And the IT department, which I work in, has to go at the same speed as the business. COVID came. We actually had a chance to breathe. Unlike other people, we saw it as a positive. We took the bucket list and tackled projects we'd been trying to tackle for years. For example, we use SageMaker for dynamic pricing. We now dynamically price every single fare and every single ancillary product on our website. That's over a million different price points getting calculated continuously, 24/7. We use SageMaker for predictive maintenance. It's been interesting, some early positive prototypes have gone well, with a lot more to do in that space. It's very interesting. It'll help us in our operational efficiency, and we look forward to that going forward. We use SageMaker for packing our fresh food. It wasn't all about SageMaker, though; we did other projects there as well. For example, we got rid of the paper copy using TraceNet. And during COVID, we had 30-odd different European governments who had different regulations being thrown at us about safety, and procedures, and information being shared, and we were constantly dealing with that. And if we didn't have the likes of Lambda and AWS, being fully in the cloud, we'd have been snookered in that world. For example, I think it was the Italian government that gave us three days to build a COVID wallet. And it was only because of the power of technology like Lambda that we could do it in that timeframe. And while we were at it, in case we thought we were bored, we were also refunding... refunding and processing refunds to over 20 million passengers. So it was a very busy time, we did a lot. Circling back to that project about the fresh food, we call it the panini predictor, which is a catchy title. The idea was how to give packing plans to the business, so it could actually have the right fresh food on every single flight. We did it, and it's a very interesting example of where the theory was wonderful on paper, it was brilliant, the data scientists were over the moon. We handed over these packing plans and it was a car crash. It was absolutely impossible to handle 550 different packing plans across 93 bases at 4:00 in the morning. So we were stuck. We spoke to Amazon, sorry, our AWS partners, and they put us in contact with Amazon, who actually gave us a tour of their fulfillment center to show us what good looks like. It was brilliant, I loved it. Their robots were absolutely everything you would have imagined, which is my favorite part. On the way back, I was talking to the Head of Inflight, and I asked her what was her favorite robot. And she goes, "Robots? Did you not see the Amazon A to Z app?
I want an A to Z app." And I was like, "What? Did you not see the robots?" But when I got back to Dublin, I rang our contacts, and they put us in contact with the Amazon team again to go through Amazon's A to Z app. And we did a working backwards session, got it going, and six months later we actually released, with a very catchy title, the Ryanair employee app. This is for our cabin crew and pilots across the network. It gives you your roster, gives you your schedules, and gives you the ability to book time off and request base transfers. Everything you need in one location. It has gone very well, it's been very positive, but it didn't fix all our problems. Our cabin crew had concerns over training, how to upsell products, grooming guidelines, where all this documentation was, and it was spread right across our network in different places. It was in YouTube, it was in PowerPoint, it was everywhere. We worked with AWS, we used Bedrock, and we actually built an employee bot. So suddenly you could ask questions like, when selling a coffee, how do you upsell a bar of chocolate to go with it, or can I have a tattoo on my forehead? You can't, by the way, in case you wanted to check. But it allowed people to ask these questions without having to search through documentation. It was on your phone, in your pocket while you were traveling, so the information was right at hand. It has gone very well. We hope to roll out the Bedrock part to the business early next year once it has finished internal testing with our senior cabin crew staff. Other areas where we're using Bedrock: we had a great plan for the in-app assistant for employees, but after the Amazon Q announcement, we'll test with Q and might actually do a refresh to make sure we have the right tech stack in place. We're also using CodeWhisperer. It's been interesting early on, and we're excited about that. Of the projects, the one that excites me the most is definitely going to be customer experience. About 10 to 15% of our daily calls are people ringing in with random questions that aren't actually related to their booking. They're like, "Can I bring rollerblades on a plane?" Unusual questions like that. We have agents answering the phone and queue times to manage. All these things can be done through gen AI, and that's where we see huge excitement and a huge area of improvement. Also, thank you, Bratin; I saw at the very start of his presentation the checked-bag question. So I'll be back to him with five and a half thousand other questions and model recommendations to make that go forward and make it go faster. And with that, I'll hand you back to Bratin. Thank you. (audience applauding) (upbeat music) - Thank you, John. We are so glad to be partnering with you on your generative AI journey. Let me now get to the next key consideration, and that is using and differentiating with your data. In fact, every machine learning system we have built, and this predates generative AI, uses data as a critical ingredient. And so it's really important for customers to be able to build a robust data platform to drive their machine learning. To that end, AWS provides you with the most comprehensive services to store, query, and analyze your data, and then to act on it with business intelligence, machine learning, and generative AI. We also provide you services to implement data governance and to do data cataloging.
And best of all, you can use these services in a modular way, so you can use just the services that you need. And I'm happy to announce yet another data capability, the Amazon S3 Connectors for PyTorch. These connectors make foundation model training a lot more efficient, and they do this by accelerating some of the key primitives that are used in foundation model training, like saving and restoring checkpoints. Now, many customers use AWS to build the data platform to drive their machine learning. And so I'm pleased to welcome Vipin Mayar, the Head of AI Innovation at Fidelity, to talk more about how they built a data platform to drive their machine learning. (upbeat music) (audience applauding) (upbeat music) - All right, good afternoon. I am Vipin Mayar from Fidelity Investments. We are a large financial services company. Data and AI are really important to us, and I believe you can only be good at AI if you have a very good data strategy, data platforms, and data quality. Now, you're hearing all this from everyone, and I thought we should unpack it a bit, and I'll tell you a little bit about our journey and what's really important to us. Okay, we started seven years ago in partnership with AWS. We've done a lot. A lot still remains to be done, and I could talk about many things, but I'll talk about three things that I feel are really important now. The first one is unstructured data. How well do you have it collected? How well do you have it organized? We started collecting it five, six years ago. We started digitizing calls. We started streaming all unstructured text, built features around it, and gave access to end users through query tools, so that over the years, they have become familiar with text, which now, with LLMs, is a critical capability. The second thing that I believe is really important, especially for large companies, is to have an enterprise taxonomy. Very easy to say, very hard to do, because it requires getting consistency of KPIs and a semantic layer to instrument it. We have been working at it, and we've got a lot of KPIs in one place. That enables dashboards to be spun off very easily. The third piece is an investment in democratization of data. We've enabled query into our data platforms so people on the business side can discover data and even have a social interaction with other people regarding the data elements. Okay, beyond those three things, I would single out pipelines. We've worked with AWS, and the backend works pretty well for us, okay. So now that you have sound data, let's quickly fast forward to generative AI. There are four things we are doing in generative AI. Conversational Q&A pairs, especially for service reps. On the coding and technical side, developer assist, plus looking at migration of code, translation of code, things that I think many of you know. The third piece, perhaps the one that gets talked about a lot at these conferences, is RAG: search, semantic search rendered through a conversational interface. A lot of work in that and all the announcements around vector stores. Really, all that work we are doing in the third lane. And lastly, content generation with a human in the loop. Okay, easy to say. But the challenges we face, let's talk about them for a minute or so. LLMs' pace of innovation, incredible. If you go to Hugging Face, they add 1,000 new models every day. Claude 2.1, excellent. Big models, great. But we've got to balance the large models with smaller, fit-for-purpose, task-specific models. Doing that rapid experimentation quickly, a challenge.
As you do this, getting capacity and managing cost gain a challenge. Guarding against hallucinations, another challenge for us, okay. So with that, let me go to my last slide, which is: What is our approach? With classic machine learning, we don't talk much about it, but you need... Your factory, we use SageMaker. We are now excited with Bedrock, but also SageMaker and being able to test and experiment all these things. RAG tuning, prompting, being able to look at evaluation metrics. And really critical for us a lot of work in that space. But let me end with where I began, which is all this can take a lot of time and can distract you from where I began, which is data. At the end of the day, there's a greater premium now to data quality, and that's where we are still focused and a lot more to be done in that space. (audience applauding) (upbeat music) - Thank you, Vipin. Incredibly important insights into how you build a robust data platform because without that, it's very hard to innovate with machine learning. Let me now get to the next key consideration, and that is integrating responsible AI. Any powerful capability needs to have the appropriate guardrail so it can be used in the right way. And if machine learning and generative AI have to scale, it's incredibly important that we integrate responsible AI into the way we work. And to that end, I'm pleased to announce that you can now use SageMaker Clarify to evaluate foundation models, and you can also get the same functionality on Amazon Bedrock. So here is how it works. As a user, you come in and select a few models. You choose the responsible AI and quality criteria that you want to evaluate them on. If you want human evaluation, you can also choose a specialized workforce, and then Clarify goes off, and does the evaluation for you, and then it generates the report. So all of that work that we had to do for Amazon Q, all of those evaluations and criteria, that was months of hard work. All of that gets a lot easier now. To talk more about responsible AI in the enterprise, please welcome Arvind Jain, the CEO of Glean. (audience applauding) (upbeat music) - Thanks, Bratin. It's great to be here. Glean is a modern work assistant that combines the power of enterprise search and generative AI and helps employees in your company find answers to their questions using your company knowledge. It's like having an expert who's been at a company since day one, who has read every single document that has been written, who's been part of every conversation that has happened in the company, who knows about every employee's expertise, and then they're ready to assist you 24/7 with all of that knowledge and information. That's what Glean does, and we're so excited to be here at re:Invent and to announce our partnership with AWS. Today, I'm going to walk you through how we address the challenge of responsible AI with our customers. Customers are really excited about generative AI, but they want to know if they can trust the answers they get back from AI. Here are the three main questions on their mind. First, how do I know the answers I receive are accurate? Everybody knows LLMs can hallucinate, and actually even more importantly, you have to provide information to the LLMs. The input that you give to the LLM is going to decide how accurate the output is going to be. And oftentimes in an enterprise, information can be out of date, and that can make the job of an LLM hard. The second challenge is: How do I know that I'm using the best model? 
The market is evolving quickly. Each customer has different needs, priorities, and constraints. Glean needs to guide them through this complex ecosystem and make it easy for them to get the LLMs that works best for them. And third, how do I make sure that my company data is safe? Glean indexes all of your company's data, so we take this problem very seriously. We need to make it easy for our customers to keep their information safe and not have them worry about data leaks. Let's go dig a little bit deeper into each one of these. So first, let's talk about how we address the concerns around accuracy. The output of an LLM is going to be only as good as input you're going to provide to it. To make the LLMs provide good answers, you need to use retrieval augmented generation to provide it both the right knowledge to work on, as well as to constrain its output to that knowledge. A really good search engine is the key to LLM accuracy, and that's at the core of Glean. Our search uses technologies like SageMaker to train our semantic language models and LLM models from Bedrock to provide accurate answers to our users. After the LLM generates an answer, we apply post-processing to provide in-line citations for everything in the answer. If a piece of information doesn't have a citation, we exclude it from the response. All of this put together, a RAG system, backed by a powerful enterprise search engine, and post-processing LLM responses are how we address customer concerns around accuracy. Let's talk about model selection. Each customer has their own needs and unique constraints that may require using different LLMs. So long as a model is able to pass our internal tests for accuracy, we want to enable customers to use it to power Glean Assistant for their employees. Bedrock is awesome for this because it's easy to select from its large repository of models and pick the one that works best for our customers. And finally, on the topic of: How do you make sure as an enterprise that your data is safe and secure? Bedrock is great because of its compliance certifications and support for end-to-end encryption. It makes it easy for our customers to feel confident that their data is secure and not being used for other purposes. Each Glean customer, in addition, gets their own proprietary AWS project running within their own environment, secured environment. And no, none of your company data leaves that environment, including the customized models that we've trained using SageMaker inside that project. So as our customer, you get to use the latest search technologies and AI technologies while making sure that all of your data resides within your own premises, within your own VPC. And finally, the way Glean works is we connect with hundreds of different applications and make sure that as users are asking questions, the answers that they get back are limited to the knowledge that they have access to. This is what we are showing it in action here. The user came and asked a question: How do I set up Glean on AWS? And the system actually does a search using our core search engine, assembles the right pieces of information and knowledge, and then uses the RAG technique to take all of that knowledge and give it to an LLM powered by Bedrock and synthesize answer and response for the end user. When the answer comes back, we show the citations to the users on where the information come from. So this is how it works, and we are really excited to be partnering with AWS. 
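Glean's internal implementation isn't public, but the citation post-processing idea Arvind describes, dropping any statement that can't be tied to a retrieved source, can be sketched roughly as below. The [doc-N] citation format and the example sentences are assumptions made for illustration:

```python
import re

def keep_cited_sentences(answer: str, retrieved_ids: set) -> str:
    """Keep only sentences whose [source-id] citations all refer to retrieved documents."""
    kept = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        cited = set(re.findall(r"\[([^\]]+)\]", sentence))
        if cited and cited <= retrieved_ids:
            kept.append(sentence)
    return " ".join(kept)

answer = (
    "Install the connector from the admin console [doc-1]. "
    "No additional license is required. "
    "Then grant it read access to the S3 bucket [doc-2]."
)
# The uncited middle sentence is dropped from the final response.
print(keep_cited_sentences(answer, {"doc-1", "doc-2"}))
```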
These are the first steps of our journey with AWS, and we're so excited to be here and bring the power of Glean to more companies worldwide. Our entire team is excited to explore more services in the future, like SageMaker Clarify, Trainium, and Inferentia. And if you want to learn more about Glean, you can visit our booth in the exhibit hall or see our website at glean.com. Thank you so much. (audience applauding) (upbeat music) - Thank you, Arvind. We are really looking forward to a partnership with Glean to take Glean to a lot more AWS customers. Let me talk next about the fourth consideration for building and scaling your generative AI applications, and that is having access to a low-cost and highly performant machine learning infrastructure. Our hardware infrastructure starts with the GPU instances, where we have the G5 instances that provide you the fastest inference and the P5 instances that provide you the fastest training. In addition, we also have custom accelerators for generative AI: AWS Inferentia for doing inference and AWS Trainium for doing training of generative AI models. And in fact, these custom accelerators provide you up to 50% better cost performance. Now, at AWS, hardware infrastructure is just part of the story. We complement our hardware instances with a software infrastructure, and that is where SageMaker provides you a fully managed end-to-end machine learning service that you can use to build, train, tune, and deploy all kinds of models: generative models, classical models, and deep learning models. And now, SageMaker has a number of purpose-built capabilities to help with generative AI. So earlier today, we launched SageMaker HyperPod. Now, SageMaker HyperPod accelerates your generative AI training by almost 40% due to its optimized distributed training libraries. It also provides you automatic self-healing clusters. Now, it's obvious why performance is better: customers get to train their models faster. But why do we need to provide self-healing clusters? Let me illustrate with an example. Before generative AI, customers would use small-scale clusters. So you would use maybe eight or 16 nodes, and you would train your models for a few days. At that small scale, the probability of failures is negligible. Now, when you get to generative AI, customers use tens of thousands of nodes, and they're training for months on end. At that scale, fault tolerance is critical because the probability of failures is very high. And in fact, if your software infrastructure is not resilient, it's going to be very hard to train your models because it'll become a start-and-stop exercise. And therefore, we are now providing self-healing clusters. Let me illustrate how they work. So as a user, when you use SageMaker HyperPod, the first thing that happens is that your model and data get distributed to all the instances in the cluster. And this makes sure that the training can happen in parallel so that the training can get done quickly. Once that happens, SageMaker then also automatically checkpoints your applications. It's saving the state of your training job at regular intervals. At the same time, SageMaker also monitors all of the instances in the cluster, and if it finds an unhealthy instance, it removes it from the cluster, it replaces it with a healthy instance, and it then resumes from the last saved checkpoint. So it resumes the training job from the last saved checkpoint and then runs it to completion. All of this without the user having to worry about resiliency or fault tolerance.
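SageMaker HyperPod handles this checkpoint-and-resume cycle for you across the cluster. As a rough, single-process illustration of the pattern it automates, a training loop that checkpoints periodically and resumes from the last saved checkpoint might look like the following; the file path, interval, and toy model are assumptions, and a real job would also shard the model and data across nodes:

```python
import os
import torch
import torch.nn as nn

CKPT_PATH = "checkpoint.pt"   # placeholder path; durable storage such as S3 in practice
CHECKPOINT_EVERY = 100        # save frequency in steps, chosen arbitrarily here

model = nn.Linear(128, 1)     # toy stand-in for a foundation model
optimizer = torch.optim.Adam(model.parameters())

start_step = 0
if os.path.exists(CKPT_PATH):                     # resume after a restart or node replacement
    state = torch.load(CKPT_PATH)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    start_step = state["step"] + 1

for step in range(start_step, 1000):
    batch = torch.randn(32, 128)                  # stand-in for a real data loader
    loss = model(batch).pow(2).mean()             # stand-in for a real loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if step % CHECKPOINT_EVERY == 0:              # save state so little work is lost on failure
        torch.save({"model": model.state_dict(),
                    "optimizer": optimizer.state_dict(),
                    "step": step}, CKPT_PATH)
```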
I'm also pleased to announce that SageMaker is now launching a number of optimizations to make inference more efficient. So it's reducing the cost of large language model inference by almost 50% and reducing the latency by almost 20%. Here is how it works. So today, when customers deploy foundation models for inference, they deploy models on a single instance. And what happens is that that instance is often underutilized, and that increases the cost for the customer. So what SageMaker allows now is that you can allocate multiple different foundation models onto the same instance, and you can control the resources that you're allocating for each foundation model. Like, you can auto scale on a per-model basis. Not just that, it also does intelligent routing, so it looks at the load of the different instances, and then it directs incoming requests to the instance that is the most lightly loaded. And as a result, it can reduce inference latency by 20%. It's optimizations like this that make SageMaker the best place to build, train, tune, and deploy foundation models. And to talk more about this, please welcome Dr. Ebtesam Almazrouei, the Chief AI Researcher and Executive Director at TII. (audience applauding) (upbeat music) - Good afternoon, everyone. Thank you for joining us today. One of the most important things in advanced technology is that when you are developing a technology, you have to think about the Sustainable Development Goals. Advanced technology has improved access to information and communication and facilitated sustainable energy solutions. Not only this, but it has also transformed agriculture and healthcare, and promoted innovation and advanced technology infrastructure. You can see here, however, that with all of this advanced technology, it's very important to address the digital divide, ethical considerations, and privacy concerns. It's crucial to ensure equitable distribution of technology's benefits for all of us. We believe that openness is the key to harnessing technology's potential while safeguarding human rights and achieving sustainable development for all of us. Open large language models are a step forward to achieving this goal. LLMs, or large language models, are forging a golden era of possibilities, from personalizing learning experiences to summarizing massive amounts of documents. Not only that, but these algorithms have proven that they can crack the code of NLP. By harnessing language, LLMs are helping us not only to solve our daily life tasks, but also to contribute to the most pressing issues of our time. That's why, at the Technology Innovation Institute, we invested in building our Falcon LLMs. We started in 2022 by building NOOR, one of the largest Arabic NLP models in the world. Leveraging the power of the cloud made it all possible for us. AWS accelerated compute infrastructure allowed us to process massive amounts of data and train models with billions of parameters on trillions of tokens. Not only that, but it significantly reduced the operational overhead. To take you through our journey, we leveraged SageMaker to pre-process petabyte-scale data and generate approximately 12 terabytes of data, representing about 5 trillion tokens. To put it in context, 5 trillion tokens is about 3 million books, each book with an average of 400 pages. Can you imagine the amount of data? Then, we used all of this data set to train all our Falcon LLMs: 7B, 40B, and 180 billion parameters.
On large-scale, high-performance compute clusters, we managed to achieve up to 166 teraFLOPS, thanks to the optimized AWS infrastructure. And to give you a sense of that scale, if a single person solves a math problem in five seconds, then to match 166 teraFLOPS, they would need 22,000 years to solve what that cluster can solve in only one second. Then, going from Falcon-7B to 40B, all the way to the Falcon-180-billion-parameter model, we also needed to scale our compute capacity. So SageMaker was able to seamlessly scale up to 4,000 GPUs. After that, we, of course, did our model evaluation using SageMaker real-time endpoints. Not only this, but we did our own human evaluations. This rigorous evaluation process is to ensure that Falcon is not just a technological advancement, but also practically effective and ethically sound. So what we did, as a team, is build a serverless architecture and leverage a Slack channel to evaluate all the model's answers. Finally, I am glad to let you know that all our Falcon LLMs are now available as part of SageMaker JumpStart, and you can start deploying them and fine-tuning them with only a single click. In terms of adoption, the Falcon-180-billion-parameter model is now the largest and top-performing open-source model in the world on Hugging Face. It has been downloaded over 20 million times. And what that can tell you, what it showcases, is the strong desire and interest for open-source LLMs. Now, I want to share some of the best practices that have enabled our AI innovation. First, you want to foster visionary thinking at all levels. So we encourage all our researchers to continuously explore new ideas and to challenge all the assumptions. Second, we also want to ensure adequate capacity for experimentation. So it is very crucial to provide access to large-scale compute, not only to do the necessary steps, but also to empower unconstrained exploration and experimentation. Third, you have to institute rigorous evaluation protocols. We thoroughly benchmark all new methods, testing and validating them. This prevents overoptimistic results and also ensures real-world viability. In summary, by embracing visionary thinking, scaled experimentation, rigorous evaluation, and collaboration with vendors like AWS, we are committed to continuing to apply these best practices, from a seed of an idea to a garden of opportunities, to deliver groundbreaking innovation. Let's all shape the future of AI. Thank you. (audience applauding) (upbeat music) - Thank you, Dr. Almazrouei. It's really amazing work going on at TII on foundation models. Now, SageMaker is also focused on making machine learning accessible to people who may not be experts at machine learning or who may not be experts at coding. And that is why, two years back, we launched SageMaker Canvas, a no-code interface for building and deploying your machine learning models. And now, with generative AI, I'm pleased to announce that SageMaker Canvas's no-code interface is also being extended to foundation models, so you can build, customize, train, and tune models, all with the no-code interface. And so data analysts, business analysts, finance analysts, and citizen data scientists who may not be proficient at coding or with machine learning can still build generative AI. Let me now get to the final key consideration for accelerating your generative AI journey, and that is using generative AI-powered applications.
Many customers tell us that they would like AWS to provide generative AI applications for important enterprise workflows, like in the contact center, like for personalization, like document processing, or even healthcare. Earlier this week, we launched a general availability of AWS HealthScribe that uses generative AI to accelerate clinical productivity. Today, when a patient has to go to a physician, that patient-physician interaction has to be scribed manually, and doctors can spend almost 40% of the time, 40% of the time on this manual work. That is time that's not being spent on patient care. And so AWS HealthScribe uses AI to automatically analyze that patient-physician conversation and then uses generative AI to create a clinical summary that can be uploaded to your electronic health records. And so software vendors, healthcare software vendors, can now use generative AI to enhance clinical productivity. To talk more about this, please welcome Tom Herzog, the Chief Operating Officer at Netsmart. (audience applauding) (upbeat music) - Thank you. (upbeat music) Tom Herzog. Grateful for the opportunity to represent the cause and communities that we serve because at the end of the day, that's what healthcare is. Healthcare is about people helping people. We've been digitizing healthcare for decades now. It's been about more and more data. And the questions we're all asking now: What are we gonna do with that data? Whether we're a provider, all of us are consumers, and healthcare is absolutely a universal language. I want to introduce this notion that these tools that we're talking about that we've all now arrived at, that we're so excited about, it truly is about addition through subtraction. See, I believe that less is more. And as we talk about HealthScribe and we talk about Bedrock, what we're really talking about is: How can we be more efficient, less task, less input, so that caregivers can see more people at the right time when they need it most? The challenge, we all know the demand far outpaces the supply. That when we schedule our own appointments, we're limited with the number of options because of the need that's out there. I'm gonna talk about that here in a second. Let me get to a very pragmatic idea and solution. Providers spend over 40% of their times two days a week, those in telehealth sessions, just doing documentation. That's two days that they're unable to see someone. And if you ask them, 15 to 45% of the information they have while what they're using is really good, they need more, more contextual awareness, not just for when they're talking to you right then and there, but for things that may have been known weeks ago, months ago, years ago, that contextual awareness, if you will. Let's frame the challenge in how this is impacting us as a society in our communities. We know that over 50 million people will be challenged with a mental healthcare illness or crisis in a given year. We know that over 60% of our youth do not receive treatment for things that they may be suffering with like depression or anxiety. And we know that nearly 25% of adults, their needs go unmet for the treatment that they're seeking or that they're not even aware that they need. This creates an opportunity for us to do something different. This is the team, this is the cause and communities that we proudly serve. This is also the team of innovators and designers who are working together to change the healthcare landscape as we know it. 
We serve over 754,000 providers who are touching over 133 million lives and beyond what we know as traditional medicine of acute or primary care. We're talking about community services, public health, intellectual development and disability needs, those who have foster or family care services, long-term care, hospice care. This is a real opportunity for all of us. Simply, as we look at the things that we're doing, here's what we need to focus on as a solution. Not usability, not less clicks. We need extreme usability to reduce the burden on providers, so that they can accelerate, and prove, and optimize the outcomes for the people that they are seeing. We have a unique opportunity using tools, solutions like HealthScribe and Amazon Bedrock, to do something simple. Let's give those two days back to caregivers so that they can see more people. Let's streamline discharge so that as you need to connect with other people, that information is relevant to you right then and right there. And let's transform collaboration as we know it and take manual processes away to introduce how this system can cohesively follow you anywhere, anyhow. Why did we choose these tools? Quite simple. HealthScribe and Bedrock produce ready-built, purpose-built solutions that we can plug into our systems right now, it's able to scale with us from a performance standpoint, and it has the ability to integrate across the ecosystem very uniquely. And lastly, give you back to the solution, the notion that we started with. Imagine a telehealth session, if you will, where you're not only just capturing the information, you're doing it systematically, you're doing it with a great degree of accuracy. But using tools within Bedrock to pull forward that information so that as I am interacting with you, I can look back a week, six months, a year to have relevant information to suggest the right treatment plan going forward. And while we often talk about the tools and the technology, and I love it and I'm a geek at heart, what this really takes is for all of us working together. Our relationship with AWS just isn't about how we can use these in a systematic way. Beyond partnership, it's about collaboration. 'Cause the things that we're talking about in healthcare today isn't about tomorrow. It's happening right here, right now. And we're deeply grateful and appreciative for that partnership. Dr. Saha, appreciate the time and the opportunity to share our story. Thank you. (audience applauding) (upbeat music) - Thank you, Tom. It's amazing how Netsmart is embedding AI and generative AI into the healthcare space. Let me now summarize the key points of my talk. At AWS, we are focused on helping customers build and scale generative AI for the enterprise. And when you're building for the enterprise, it's important to pay attention to some key considerations. This is what we learned from building our own applications, and I believe this will be applicable when you build your own applications. First, you want choice and flexibility of models. Second, you want to use and differentiate with your data. Next, you want to integrate responsible AI into your applications. You also need to have access to a low cost and highly-performant machine learning infrastructure. And finally, in many cases, you want to get started with the generative AI applications that we provide for contact centers, personalization, document processing, healthcare, and others. Thank you for coming and please enjoy the rest of re:Invent. 
Please don't forget to fill out the survey for this session. Thank you. (audience applauding)
Info
Channel: AWS Events
Views: 9,019
Keywords: AWS, Amazon Web Services, AWS Cloud, Amazon Cloud, AWS re:Invent, AWS Summit, AWS re:Inforce, AWS reInforce, AWS reInvent, AWS Events, generative AI, AI, artificial intelligence, ML, machine learning, cross industry, innovation on AWS, customer stories, Amazon SageMaker, Amazon Bedrock, Amazon Transcribe
Id: edPF6ItZsnE
Length: 61min 4sec (3664 seconds)
Published: Fri Dec 01 2023