[music playing] Please welcome the Vice President
of Data and Machine Learning at AWS, Dr. Swami Sivasubramanian. [music playing] Welcome to day three
of re:Invent, everyone. You know, this past summer,
my seven-year-old daughter, who wants to grow up
to be an inventor and a scientist, among 20 other things, asked me a question, "Dad, how do scientists come up with these amazing new inventions? How do they come up with new ideas?" To answer the question, I didn't want to just make up an answer within like 10 seconds. I actually said, "Now, maybe
let's watch a few documentaries behind some of the greatest
inventions that changed humankind." And here I am, several months
into this exploration, still very fascinated by how
great inventions are born. We like to think that the genesis for every great idea happens with the spark of a random thought, or the lightbulb moment, or simply a stroke of genius. And as history would dictate, it always seems to happen so suddenly. With a flash of realization,
ancient mathematician Archimedes uncovered the physical law
of buoyancy in his bathtub. Isaac Newton developed
his theory of gravitation after observing
an apple fall from a tree. Percy Spencer discovered the earliest microwave oven when a candy bar accidentally melted in his pocket while he was standing in an MIT lab, next to an active magnetron. These are the vacuum tubes
used in early radar systems. But is that really how these
light bulb moments work? Are they really as instantaneous
as we have been led to believe? These Aha! moments are actually preceded by ingesting hundreds, if not thousands, of pieces of information that our minds assimilate over time. Let's revisit the microwave
oven example. Percy Spencer had more than 20 years of experience working with magnetrons, leading up to that moment in the MIT lab. And before that, he was an expert in radio technology while working for the US Navy. In fact, it actually took Spencer
more than 30 years to arrive
at his microwave epiphany. He just had to connect the dots,
just like we do with our data. Researchers Dr. Mark Beeman
and Dr. John Kounios wanted to explore
this phenomenon even further. They measured participants' brain activity through a specific set of tasks and found that creativity follows a real scientific process in which our brains produce these big ideas. That research demonstrated that
human beings concentrate, analyze, and find correlations
in different lobes of our brain, when we are making
sense of new information, and they even process it
when we sleep. And this all happens before the creative spark occurs in this lobe, right above the right ear. To put it simply, they proved that insights can occur when our observations are paired with
the power of analytical processing. Really cool, right? I am fascinated by this research
for several reasons. If you look closely, the human mind
shows us how we can harness the power
of data to drive creativity. The same process
can apply to organizations. However, applying the neuroscience of creativity to a modern-day organization is not always perfect. Our environments in organizations present many specific challenges,
and several important caveats. Within a business we refer
to the knowledge or information
we acquire as data points. But unlike the human brain, there isn't one centralized
repository to collect all our data, which often means
it leads to data silos and inconsistencies
across an organization. It takes a considerable
amount of time and effort to clean your data and store it
in accessible locations. Unlike the human brain, data isn't automatically
processed when we sleep. We have to work hard to build automation into our data infrastructure to avoid manual replication and costly updates after working hours. Data doesn't naturally flow
within our organization, like the neural pathways
in our brain. We have to build complex pipelines
to move data to the right place, and set up mechanisms
for the right individuals to get access to the data
when and where they need it. And finally, data isn't always
easy to analyze or visualize, which can make it
really difficult for you to identify critical correlations
that spark these new ideas. If you step back, you need all of these elements to come together in order for these sparks, your new products or new customer experiences, to come to life. So while this theory of neuroscience
can be applied to the principles
of data science, we must acknowledge
that the processes required to maximize the value of our data
are far from innate. I strongly believe data is the genesis for modern invention. To produce new ideas with our data, we need to build
a dynamic data strategy that leads to new customer
experiences as its final output. And it is absolutely critical
that today's organizations have the right structures
and technology in place that allows new ideas
to form and flourish. While building a data strategy
can really feel like a daunting task, you are not alone. We have been in the data business long before even
AWS came into existence. In fact, Amazon's early leaders
often repeated the phrase that data beats intuition. We built our business on data. They enabled data-driven decision making with [PH] Babbler, our internal A/B testing suite, to produce the earliest book recommendations on amazon.com. Since then, we have used data to develop countless products
and services, from two-day shipping to local grocery delivery, and many more. We also used data to anticipate our customers' expanding storage needs, which paved the way for the development of AWS. And for more than 15 years
we have solved some of the most complex
data problems in the world with our innovations in storage,
databases, analytics, and AI and ML. We delivered the first scalable
storage in the cloud with S3, the first purpose-built database
in the cloud with DynamoDB, the first fully managed cloud data
warehouse with Redshift, and many more. Since introducing several of these firsts, we are continuing to launch
new features and services that make it easy
to create, store, and act on data. And we have seen recognition
for many of our services. This year, AWS received a 95
out of 100 score in the Gartner Solution Scorecard
for Amazon RDS, including Amazon Aurora. These types of achievements
are why more than one and a half million customers are
counting on AWS for their data needs. We work with some of the biggest brands in the world, like Toyota, Coca-Cola, and Capital One, to build comprehensive end-to-end data strategies. And our customers are using
these strategies to transform their data
into actionable insights for their businesses every day. For example, organizations
like Bristol Myers Squibb use AWS data services
to advance the application of single cell technologies in drug
development and clinical diagnosis. Nielsen built a data lake capable
of storing 30 petabytes of data, expanding their ability
to process customer insights from 40,000 households to 30
million households on a daily basis. And in the race to launch
autonomous vehicles, Hyundai leverages AWS
to monitor, trace, and analyze the performance
of their machine learning models, achieving a 10x reduction
in their model training time using Amazon SageMaker. By working with leaders
across all industries, and of all sizes, we have discovered
at least three core elements of a strong data strategy. First, you need a future-proof
data foundation supported by core data services. Second, you need solutions
that weave connective tissue across your entire organization. And third, you need the right
tools and education to help you democratize your data. Now, let's dig in starting with
the future-proof data foundation. In the technology industry, we often hear the phrase,
future-proof, thrown around a lot to market all
sorts of products and technologies. But my definition of a future-proof foundation is clear. It means using the right services to build a foundation that you don't need to heavily rearchitect, or incur technical debt on, as your needs evolve and the volume and types of data change. Without a data strategy
that is built for tomorrow, organizations won't be able
to make decisions that are key
to gaining a competitive edge. To that end, a future-proof
data foundation should have four key elements. It should have access to the
right tools for all workloads and any type of data so you can adapt
to changing needs and opportunities. It should be able to keep up
with the growing volume of data by performing
at really high scale. It should remove the
undifferentiated heavy lifting for your IT and data team
so you can spend less time managing and preparing your data
and more time getting value from it. And finally, it should have
the highest level of reliability and security
to protect your data stores. For the first element of a future-proof data foundation, you will need the right tools
for every workload. We believe that every customer
should have access to a wide variety of tools
based on data types, personas, and use cases,
as they grow and change. A one-size-fits-all approach
simply does not work in the long run. In fact, our data supports this. 94% of our top 1,000 AWS customers use more than 10 of our
databases and analytics services. That's why we support
your data journey with the most comprehensive
set of data services out of any cloud provider. We support data workloads
for your application with the most complete set
of relational databases like Aurora, and purpose-built databases like DynamoDB. We offer the most comprehensive
set of services for your analytics workloads,
like SQL analytics with Redshift, big data analytics with EMR, business
intelligence with QuickSight, and interactive log analytics
with OpenSearch. We also provide
a broad set of capabilities for your machine learning workloads,
with deep learning frameworks, like PyTorch and TensorFlow,
running on optimized instances, and services like Amazon SageMaker that makes it really easy
for you to build, train, and deploy ML models end to end, and AI services with built-in machine
learning capabilities with services like Amazon Transcribe
and Amazon Textract. All of these services together come to form your end-to-end data strategy, which enables you to store and query your data in your databases, data lakes, and data warehouses. Act on your data with analytics,
BI, and machine learning. And catalog and govern
your data with services that provide you with
centralized access controls, with services like Lake Formation
and Amazon DataZone, which I will dive into later on. By providing a comprehensive
set of data services, we can meet our customers
where they are in their journey, from the places they store
their data to the tools and programming languages
they use to get the job done. For example, take a look
at Amazon Athena, our serverless
interactive query service, which was designed
with a standard SQL interface. We made Athena really easy to use. Simply point to your data in S3, define your schema, and start querying to receive insights within seconds. Athena's SQL interface and ease of use is why it's so popular among data engineers, data scientists, and many other developers. In fact, tens of thousands of AWS customers use Amazon Athena today.
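To make that flow concrete, here is a minimal sketch of the same idea using the Athena API from boto3; the database, table, result bucket, and workgroup names are hypothetical placeholders rather than anything referenced in this talk.

```python
import boto3

# Minimal sketch: run a SQL query against data sitting in S3 with Amazon Athena.
# The database, table, result bucket, and workgroup below are hypothetical.
athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString=(
        "SELECT order_id, total "
        "FROM sales_db.orders "
        "WHERE order_date = DATE '2022-11-30' "
        "LIMIT 10"
    ),
    QueryExecutionContext={"Database": "sales_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/queries/"},
    WorkGroup="primary",
)
print("Started query:", response["QueryExecutionId"])
```

From there, you would poll get_query_execution until the query succeeds and fetch rows with get_query_results.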
While we have made it really easy to leverage SQL on Athena, many of our customers
are increasingly using open-source frameworks
like Apache Spark. Apache Spark is one of the most
popular open-source frameworks for complex data processing, like regression testing,
or time series forecasting. Our customers regularly use Spark
to build distributed applications with expressive languages
like Python. However, our Athena customers told us that they want to perform this kind of complex data analysis using Apache Spark to build interactive applications, but they do not want to deal with all the infrastructure setup and keep up all these clusters for interactive analytics. They wanted the same ease of use
we gave them with SQL on Athena. That's why today I'm thrilled
to announce Amazon Athena for Apache Spark-- [applause] -- which allows you to start
running interactive analytics on Apache Spark
in just under one second. Amazon Athena for Apache Spark enables you to spin up
Spark workloads up to 75 times faster than other serverless
Spark offerings. You can also build Spark applications with a simplified notebook interface in the Athena console or using Athena APIs.
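As a rough illustration, this is the kind of PySpark cell you might run in one of those Athena notebook sessions; the spark session is provided by the notebook, and the S3 path and column names are hypothetical.

```python
# Illustrative PySpark cell for an Athena for Apache Spark notebook session.
# The notebook provides a ready-to-use SparkSession named `spark`;
# the S3 path and column names here are hypothetical.
df = spark.read.parquet("s3://my-data-lake/clickstream/2022/11/")

top_pages = (
    df.groupBy("event_date", "page")
      .count()
      .orderBy("count", ascending=False)
)
top_pages.show(10)
```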
Athena is deeply integrated with other AWS services like SageMaker and EMR, enabling you to query your data
from various sources and you can chain
these calculations together and visualize your results. And with Athena, there is
no infrastructure to manage and you only pay
for what you use. We are thrilled to bring
Apache Spark to our Athena customers,
but we are not stopping there. Just yesterday, we announced
Amazon Redshift Integration for Apache Spark, which makes it easier to run Spark applications on Redshift data from other AWS analytics services. This integration enables EMR applications to access Redshift data and run up to 10x faster compared to existing Redshift-Spark connectors. And with a fully certified Redshift connector, you can quickly run analytics and ML without compromising on security. With these new capabilities, AWS is the best place to run
Apache Spark in the cloud. Customers
can run Apache Spark on EMR, Glue, SageMaker, Redshift, and Athena
with our optimized Spark runtime, which is up to 3x faster
than open-source Spark.
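For a sense of what reading Redshift data from a Spark application looks like, here is a sketch using Spark's generic JDBC data source; the new integration layers an optimized, certified connector on top of this pattern, and the endpoint, credentials, and table below are hypothetical.

```python
# Sketch: loading a Redshift table into a Spark DataFrame over JDBC.
# The cluster endpoint, database, credentials, and table are hypothetical;
# the announced integration provides an optimized connector for this workflow.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:redshift://my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev")
    .option("dbtable", "public.orders")
    .option("user", "analytics_user")
    .option("password", "<redacted>")
    .option("driver", "com.amazon.redshift.jdbc42.Driver")
    .load()
)
orders.groupBy("region").count().show()
```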
We're pleased to bring these new integrations to our customers. So, we've discussed how critical
it is to have a variety of tools at your
fingertips when you need them. But these tools should also
include high performing services that enable you to grow your
businesses without any constraints. That brings me to the second element
of our future-proof data foundation, performance at scale. Your data foundation
should perform at scale across your data warehouses,
databases, and data lakes. You will need industry-
leading performance to handle inevitable
growth spurts in your business. You will need it when you want to quickly analyze and visualize your data, and you will need it to manage your costs without compromising on your capacity requirements. Our innovations have helped
our customers at scale, right from day one. And today, Amazon Aurora auto scales up to 128 terabytes per instance at 1/10 the cost of other legacy enterprise databases. DynamoDB processed more than
100 million requests a second across trillions of API
calls on Amazon Prime Day this year. With Amazon Redshift, tens of thousands of customers collectively process exabytes of data every day, with up to five times better price performance than other cloud data warehouses. Redshift also delivers up to
seven times better price performance on high-concurrency, low-latency workloads like dashboarding. And DocumentDB, our fully managed document database service, can automatically scale up to 64 terabytes of data per cluster with low latency, serving millions
of requests per second. Tens of thousands of AWS customers,
including Venmo, Liberty Mutual, and United Airlines
rely on Document DB to run their JSON
document workloads at scale. However, as our DocumentDB
customers experience growth, they have asked us for easier ways
to manage scale without performance impacts. For example, they said it's really difficult to handle throughput beyond the capacity of a single database node. They told us that scaling out, or sharding, their data sets across multiple database instances is really, really complex. You've got to actually build special
application logic for sharding, you've got to manage the capacity, and you've got to reshard your database live without any performance impact. In such a distributed setting, even routine tasks can become
increasingly cumbersome, as the application scales
across hundreds of instances. They also wanted the ability to
auto scale to petabytes of storage. And the only alternative options that exist either scale slowly or are really expensive. So they asked us for an easy button
to scale reads and writes. That's why I'm pleased to announce the general availability of
Amazon DocumentDB Elastic Clusters, a fully-managed solution
for document workloads of virtually
any size and scale. [applause] Elastic Clusters automatically scale to handle virtually
any number of reads and writes with petabytes
of storage in just minutes, with little to no downtime
or performance impact. You don't have to worry
about creating, removing, upgrading, or managing,
or scaling your instances. Elastic Clusters takes care of
all these underlying infrastructure. This solution will save developers
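To picture what that looks like from an application's point of view, here is a minimal sketch using a standard MongoDB-compatible driver; the cluster endpoint, credentials, database, and shard key are hypothetical, and the shardCollection step reflects how elastic clusters distribute a collection across shards as we understand it.

```python
from pymongo import MongoClient

# Sketch: talking to a DocumentDB Elastic Cluster with a standard MongoDB driver.
# The endpoint, credentials, database, and shard key below are hypothetical.
client = MongoClient(
    "mongodb://appuser:<password>@my-elastic-cluster.docdb-elastic.us-east-1.amazonaws.com:27017",
    tls=True,
)

# Elastic clusters spread data across shards; here we (hypothetically) shard the
# collection on a hashed customer_id before writing to it.
client.admin.command("shardCollection", "store.orders", key={"customer_id": "hashed"})

orders = client["store"]["orders"]
orders.insert_one({"customer_id": "c-1001", "total": 42.50})
print(orders.count_documents({"customer_id": "c-1001"}))
```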
This solution will save developers months of time building and configuring all
these custom scaling solutions. I am proud to share this
new capability with you today. That is just one example
of how we are helping you scale. In the leadership session today,
Jeff Carter, our VP of Database
Services and Migration Services, will explain
how fully-managed databases can help you build faster
and scale further than ever before. So, when our organizations are backed
by high performing services, they can deliver better
experiences than ever before. And we are helping
our customers perform at scale across a variety
of AWS data services. Netflix provides a quality customer
experience using S3 and VPC Flow Logs
to ingest terabytes of data per day, enabling them to respond
to events in real time across billions of traffic flows. Philips uses Amazon SageMaker to apply machine learning
to over 48 petabytes of data, making it easy for clinicians using
its digital health suite platform to identify at risk patients. And with AWS data services, Expedia is able to deliver
more scalable products and online experiences
to their travelers. But I won't steal their thunder. Let's welcome Rathi Murthy,
Expedia Group CTO and President of Expedia Product
and Technology. [music and applause] Good morning. I'm super excited to speak
to an audience that actually understands
the power of data. When Expedia Group started
almost 25 years ago, it disrupted the travel space. Online travel was truly
a groundbreaking innovation. Today, we connect over
168 million loyalty members, over 50,000 B2B partners
with over 3 million properties, 500 airlines, car rentals,
and cruise lines. Expedia is one of the world's
largest online travel companies powering travel
in over 70 countries. But at our core,
we are a technology company. We have gathered decades' worth of data on travel behaviors, booking patterns, traveler
preferences, and partner needs. When I joined Expedia Group
last year, I was super excited
to work for a company that brought together my passion
to lead technology, my love for travel together,
with customer centricity at its core. It felt like a perfect marriage
between technology, travel,
and transformation. Today, we are mastering the art
of transformation on two fronts, one, transforming
our own company, and two, transforming
the travel industry. Like many companies
in the room here today, Expedia Group
scaled through acquisitions. And as technologists,
we all know, this means multiple stacks,
and added complexity. And as you bring in more partners, you need to reconfigure, which can be
costly and time consuming. And like AWS, we are also
a customer-first company. We understand the power of data, and that data is key
to drive our innovation and our long-term success. And we've continued to invest
in our AI/ML to drive those great experiences
across our platform. Just to give you an idea
of our scale today, we process over 600 billion
AI predictions per year powered by over
70 petabytes of data. We also use AI/ML to run
over 360,000 permutations of one page
on one of our brand sites, which means that every time
a traveler comes to our site, they see what is
most relevant to them. To help us with this
massive transformation, we've been working with AWS
on a few fronts. One, helping us modernize
our infrastructure to stay highly available
for our travelers, by helping us migrate
our applications to a container-based solution by leveraging Amazon EKS
and Karpenter. Two, to help us render
relevant photos and reviews for our travelers
at sub-millisecond latency with over 99% accuracy by leveraging Amazon DB
and SageMaker. And last, but not the least, also helping us
self-serve our travelers by hosting
our conversation platform, which has powered over
29 million virtual conversations, saving us over
8 million agent hours. But before I continue,
let's just take a moment and think about
a truly great holiday. What made it great? Was it the places you visited, the people you were with,
the things you saw? When my children were seven
and five years old, we decided as a family that we would
visit a new country every year. This is a picture from one of our trips to Paris. It doesn't look like it. But yes, it was a family trip. What touched me most was when I read in their college essays that they learned more
from these trips about the world
and the culture and life than any textbook
had taught them thus far. Travel is so much more
than just a transaction. Some of our best memories
are from a trip away. And the best way to broaden
our understanding of the world is to actually go out
there and experience it. This is the reason I love being
a technologist working in travel, where we can innovate products that bring joy to so many people
all over the world. And… data
is our competitive advantage. And we want to leverage the immense
amount of data we've hosted on AWS to innovate products
and create those memories. Now, knowing when to book
a flight truly seems like dark art. Earlier this year, we launched
Price Tracking and Predictions. This uses machine learning
and our flight shopping data to map past trends
and future predictions for the prices
for your flight route so that you understand the best time
to book your flight with confidence. Equally, comparing hotel rooms
is also super complex. With our smart shopping,
we have the ability now to compare different hotels easily. We leverage AI to read through
billions of room descriptions and pull out attributes like room features, upgrades, and amenities, all together on one page, so you can easily compare different hotel types side by side and make the right choices. Every time a traveler interacts
with us, we collect more data,
our models become smarter, and our responses
become more personalized. 2022 was a transformative year
for us at Expedia Group. Earlier this year, we launched our Open World vision to power partners
of all sizes with the technology and supply needed to thrive
in the travel market, a first in the travel sector. At its core, it's truly rebuilding
a platform in an open way, in a way taking all
of our big capabilities, breaking it up into microservices
or small building blocks that are configurable,
extensible and externalizable so that we can accelerate anyone
in the travel business or even help someone
enter the travel market. So if you're an airline wanting to
expand its offerings with hotels, or if you're an influencer wanting to
make it easy for your followers to book that same amazing trip, we can provide you
with the building blocks to create everything
from the basic payment portal to the complete travel store. So just as we opened the world
to travel 25 years ago, we are now making travel
as a business more open and accessible to all. Thank you. [music playing] Thank you, Rathi. So as you saw with the Expedia story, when customers are backed by tools
that enable them to perform at scale, they can analyze their data
and innovate a lot faster, and all with less manual effort. This brings me to the third element
of a future-proof data foundation: removing heavy lifting. We are always looking for ways
to tackle our customers' pain points by reducing manual tasks through
automation and machine learning. For instance, DevOps Guru uses
machine learning to automatically detect and remediate database issues
before they even impact customers, while also saving database
administrators time and effort to debug the issues. Amazon S3 Intelligent-Tiering
reduces ongoing maintenance by automatically placing
infrequently accessed data into lower-cost storage classes, saving users up
to $750 million to date. And with Amazon SageMaker,
we are removing the heavy lifting associated with machine learning so that it's accessible
to many more developers. Now, let's take a closer
look at Amazon SageMaker. As I mentioned earlier,
SageMaker enables customers to build, train and deploy ML models
for virtually any use case, and with tools for every step of
your machine learning development. Tens of thousands of customers
are using SageMaker ML models to make more than a trillion
predictions every month. For example, Dow Jones & Company
created an ML model to predict the best time of day
to reach their customers of Wall Street
Journal, Barron's and Market Watch subscribers, improving their customer
engagement rate by up to 2x their previous strategies. Many of our customers are solving
complex problems with SageMaker by using the data to build ML models,
right from optimizing driving routes for rideshare apps
to accelerating drug discovery. Most of these models are built
with structured data, which is really well organized
and quantitative. However, according to Gartner,
80% of all new enterprise data is now unstructured
or semi-structured, including things like images
and handwritten notes. Preparing and labeling
unstructured data for ML is really, really complex and labor-intensive. For this type of data, we provide features like SageMaker
Ground Truth and Ground Truth Plus that helps you lower your costs
and make data labeling a lot easier. However, customers told us that
certain types of data are still too difficult to work with,
such as your geospatial data. Geospatial data can be used
for a wide variety of use cases, right from maximizing harvest yield
in agricultural farms, to sustainable urban development, to identifying a new location
for opening a retail store. However, accessing high-quality
geospatial data to train your ML models requires working with multiple data sources
and multiple vendors. And these data sets are typically
massive and unstructured, which means time-consuming data
preparation before you can even start writing a single line of code
to build your ML models. And tools for analyzing and
visualizing data are really limited, making it harder to uncover
relationships within your data. So not only is this
such a complicated process, but it requires such a steep learning
curve for your data scientists. So today, we are making it easier
for customers to unlock the value
of the geospatial data. I'm really excited to announce that
Amazon SageMaker now supports new geospatial
ML capabilities. [applause] With these capabilities, customers can access geospatial data on SageMaker from different data sources with just a few clicks.
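As a rough sketch of what that looks like programmatically, the geospatial capabilities are exposed through a dedicated SageMaker geospatial client; the call below simply lists the available raster data collections, and the response field names should be treated as assumptions.

```python
import boto3

# Rough sketch (treat names as assumptions): browsing the raster data
# collections, such as satellite imagery, exposed by the SageMaker
# geospatial capabilities.
geo = boto3.client("sagemaker-geospatial", region_name="us-west-2")

collections = geo.list_raster_data_collections()
for summary in collections.get("RasterDataCollectionSummaries", []):
    print(summary.get("Name"), summary.get("Arn"))
```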
To help you prepare your data, our purpose-built operations enable you to efficiently process
and enrich these large datasets. It also comes with
built-in visualization tools, enabling you to analyze your data
and explore model predictions on an interactive map
using 3D accelerated graphics. Finally, SageMaker also provides
built-in pre-trained neural nets to accelerate model building
for many common use cases. Now, let's see how it works. Please welcome Kumar Chellapilla,
our GM for ML and AI services at AWS, who will demonstrate these
new capabilities in action. [music playing] Thanks, Swami. Imagine a world where when
natural disasters such as floods, tornadoes,
and wildfires happen, we can mitigate
the damage in real time. With the latest advances
in machine learning, and readily available
satellite imagery, we can now achieve that. Today we have the ability to not
only forecast natural disasters, but also manage our response
using geospatial data to make life-saving decisions. In this demo, I'm going to take
on the role of a data scientist who's helping first responders with
relief efforts as the flood occurs. Using the geospatial capabilities
in Amazon SageMaker, I can predict dangerous
road conditions caused by rising water levels,
so that I can guide first responders on the optimal path
as they deliver aid, send emergency supplies
and evacuate people. In such a scenario,
I want to move as quickly as I can because every minute counts. I want to get people to safety. Without SageMaker,
it can take a few days to get access to data
about real-world conditions and even more time
to make predictions because the data is scattered
and difficult to visualize. And there's no efficient way
to train and deploy models. Now, let me dive into the demo, and show how to access
geospatial data, build and train a model
and make predictions using the new geospatial
capabilities in Amazon SageMaker. To build my model, I need
to access geospatial data, which is now readily
available in SageMaker. Instead of spending time
gathering data from disparate sources and vendors,
I simply select the data I need. In this case, I select open-source
satellite imagery from Sentinel-2
for the affected area. In order to understand
where the water spread, I apply land classification,
a built-in SageMaker model, which classifies the land
as having water or not. Looking at images before
and after the flood occurred, I can clearly see how the water
is spread across the entire region and where it caused
the most severe damage. Knowing where floodwaters
are spreading is super helpful. But I still need to zoom in to see
which roads are still there and help first responders
navigate safely. Next, I add high-resolution
satellite imagery from Planet Labs, one of the third-party data
providers in SageMaker. These visualizations allow me
to overlay the roads on the map so I can easily identify
which roads are underwater, and keep first responders up to date
as conditions unfold on the ground. Now that I understand my data,
I start making predictions. With SageMaker, I don't have
to spend weeks iterating on the best model
for my data. I simply select one of the
pre-trained models in SageMaker, in this case, road extraction, which makes it easy for me
to train the model on my data and send directions
to the first aid team. Once the model is ready,
I can start making predictions. In this case, the model
I built identifies which roads are still intact
and not underwater. Using the visualization
tools in SageMaker, I can view the predictions
in an interactive map so that I have full visibility
on what's happening on the ground. I can see that the red-colored
roads are flooded, but the green color roads are still
available and safe to drive on. Similar to satellite imagery
from Planet Labs, I can add point-of-interest data
from Foursquare to see where the nearest hospitals,
medical facilities and airports are. For example, I can see
that the airfield on the left is surrounded by water,
so I must use the temporary helipad or the international airport
on the right instead. With this information in hand, I can now give clear directions
within minutes so that they know the best path
for sending emergency aid, directing medical staff and routing
people out of the flood zone. We've covered flood path predictions. But SageMaker can support
many different industries. In fact, later today
during the AI/ML leadership session with Bratin Saha, you will hear how BMW uses
geospatial machine learning. As Swami mentioned,
it's not just automotive. Customers use
geospatial machine learning for a variety of use cases in retail, agriculture and urban planning, and the list goes on. We can't wait to hear what you
will do with geospatial data. Head over to the console today and try the new geospatial
capabilities in Amazon SageMaker. Thank you. [music playing] Thank you, Kumar. These types of innovations
demonstrate the enormous impact that data can have for
our customers and for the world. It's clear that data
is extremely powerful. And today it is critical to almost
every aspect of your organization, which means you need to put
the right safeguards in place to protect it from costly disruptions
and potential compromises. This brings me to the last element
of the future-proof data foundation: reliability and security. AWS has a long history of building
secure and reliable services
to help you protect your data. S3 was built to store your data
with 11 9s of durability, which means
you can store your data without worrying about backups
or device failures. Lake Formation helps you build
a secure data lake in just days with
fine-grained access control. And our core database services like DynamoDB, Aurora, and RDS were architected with multi-AZ capabilities to ensure seamless failovers in the unlikely event an AZ is disrupted, thereby protecting our customers'
mission-critical applications. But today, our customers' analytics applications on Redshift
are mission-critical as well. While our Redshift customers
have recovery capabilities like automated backups, and the ability to relocate
their cluster to another AZ in just minutes, they told us that sometimes
minutes are simply not enough. Our customers told us
they want their analytics applications to have the same
level of reliability that they have with their databases
like Aurora and Dynamo. I'm honored to introduce
Amazon Redshift multi-AZ, a new multi-AZ configuration that delivers the highest levels
of reliability. [applause] This new multi-AZ configuration
enhances availability for your analytics applications
with automated failover in the unlikely event
an AZ is disrupted. Redshift multi-AZ enables
your data warehouse to operate in multiple AZs simultaneously
and process reads and writes without the need
for an underutilized standby sitting idle in a separate AZ. That way, you can maximize your return on investment, with no application changes or other manual intervention required to maintain business continuity. But high availability is just
one aspect of a secure and reliable data foundation. We are making ongoing investments
to protect your data from the core to the perimeter. While these security mechanisms
are critical, we also believe
they should not slow you down. For example, let's take a look
at security as it relates to Postgres. Postgres on RDS and Aurora has become our fastest-growing engine. Developers love Postgres extensions because they enhance the functionality of their databases. And with thousands of them
in a managed database. However, extensions provide
super user access to your underlying file systems, which means they come with a huge
amount of organizational risk. That's why they must be
tested and certified to ensure they do not interfere
with the integrity of your database. This model is like imagine you're
building an impenetrable fortress only to leave the keys
on the front door. To solve this problem
for our customers, we have invested
in an open-source project that makes it easier to use certified
Postgres extension in our databases. Today, I'm excited to announce Trusted Language
Extensions for Postgres, a new open-source project that allows
developers to securely leverage Postgres extensions
on RDS and Aurora. [applause] These Trusted Language Extensions
help you safely leverage Postgres extensions to add
the data functionality you require for your use cases without waiting for
AWS certification. They also support popular
programming languages you know and love, like
JavaScript, Perl, and PL/pgSQL.
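To give a feel for the developer experience, here is a heavily hedged sketch of registering and installing a tiny trusted language extension with the open-source pg_tle project; the connection details and the sample extension are hypothetical, and the exact pgtle function signatures should be treated as assumptions.

```python
import psycopg2

# Rough sketch, assuming the pg_tle extension is enabled on an RDS or Aurora
# PostgreSQL instance; endpoint, credentials, and the sample extension are hypothetical.
conn = psycopg2.connect(
    host="my-aurora-cluster.cluster-abc123.us-east-1.rds.amazonaws.com",
    dbname="appdb",
    user="admin_user",
    password="<redacted>",
)
cur = conn.cursor()

# Register a tiny SQL-function extension as a trusted language extension,
# then install it like any other Postgres extension.
cur.execute("""
SELECT pgtle.install_extension(
  'hello_tle', '1.0', 'sample trusted language extension',
$_tle_$
  CREATE FUNCTION hello() RETURNS text AS $$ SELECT 'hello from a TLE'; $$ LANGUAGE sql;
$_tle_$
);
""")
cur.execute("CREATE EXTENSION hello_tle;")
cur.execute("SELECT hello();")
print(cur.fetchone()[0])
conn.commit()
```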
With this project, our customers can start innovating quickly without worrying about
unintended security impacts to their core databases. We will continue to bring value
to our customers with these types
of open-source tools while also making
ongoing contributions back to the open
source community. So now that we have talked about
protecting your data at the core, let's look at how we are helping
your customers protect their data
at the perimeter. When you leverage your
database services on AWS, you can rely on us to operate, manage and control
the security of the cloud, like the hardware, software
and networking layers. With our shared responsibility model,
our customers are responsible for managing the security
of their data in the cloud, including privacy controls
for your data, who has access to it,
how it's encrypted. While this model eases
a significant portion of the security burden
for our customers, it can still be
very difficult to monitor and protect against these
evolving security threats to your data year round. To make this easier
for our customers, we offer services
like Amazon GuardDuty, an intelligent
threat detection service that uses machine learning to monitor your AWS accounts
for various malicious activity. And now we are extending the same
threat detection service to our fastest-growing database. Built for Amazon Aurora,
I'm very excited to announce the preview
of GuardDuty RDS Protection... [applause] ...which provides intelligent
threat detection in just one click. GuardDuty RDS Protection
leverages ML to identify potential threats
like access attacks for your data stored
in Amazon Aurora. It also delivers detailed
security findings so you can quickly locate
where the event occurred and what type of activity
took place. And all this information
is consolidated at an enterprise level for you. Now that we have explored the elements of a future-proof data foundation, we will dive deep into how you can connect the dots across your data stores. The ability to connect your data is as instrumental as the foundation that supports it. For the second element
of a strong data strategy, you will need a set of solutions that help you weave the connective
tissue across your organization from automated data pathways
to data governance tools. Not only should this connective
tissue integrate your data, but it should also integrate
your organization's departments,
teams and individuals. To explain the importance
of this connective tissue, I wanted to share an analogy
that is really close to my heart. This is a picture of a Jingkieng Jri
in the northeastern part of India. It is a living bridge
made of elastic tree roots in the state of Meghalaya. These bridges are built
by the Khasi tribe, indigenous farmers and hunters
who trek through dense valleys and river systems
just to reach nearby towns. Every year, the monsoon season
means the forest rivers become almost impassable,
further isolating their villages that are sitting on top
of the foothills of Himalayas. That is until these living bridges
came to be. So you might be asking yourself, why is Swami talking about
these ancient root bridges when he is supposed
to be talking about my data? Well, I wanted to share this story
because we can apply many valuable engineering lessons
from the Khasi on how we can build connective
tissue with our data stores. First, they use quality tools
that enable growth over time. The Khasi built the structures
with durable root systems that were able to withstand some of
the heaviest rainfall in the world, and these bridges can last up
to 500 years by attaching and growing
within their environment. Similarly, your connective tissue
needs both quality tools and quality data
to fuel long-term growth. Second, they leveraged a governance
system of cooperation. Over a period of decades,
and sometimes even centuries, tribal members cooperated
and shared the duty of pulling
these elastic roots one by one, until a passable bridge was formed. With data, governance enables
safe passage for disconnected teams and disconnected data stores
so your organizations can collaborate
and act on your data. And finally, they created strong
pathways to their vital resources. These bridges protected
the region's agricultural livelihood by providing a pathway
from remote villages to nearby towns. The Khasi were engineers
of connection because their success
depended on it. Today, one of the most
valuable assets in our organization is connected data stores. Connectivity, which drives ongoing innovation, is also critical for our survival as an organization. Now let's revisit the importance
of using high-quality tools and high-quality data
to enable future growth. When our customers want
to connect their structured and unstructured data
for analytics and machine learning, they typically use a data lake. Hundreds of thousands of data
lakes run on AWS today, leveraging services
like S3, Lake Formation, and AWS Glue,
our data integration service. Bringing all this data together
can help you gather really rich insights,
but only if you have quality data. Without it, your data lake
can quickly become a data swamp. To closely monitor
the quality of your data, you need to set up quality rules. And customers told us building these data quality rules
across data lakes and their data pipelines
is very, very time consuming, and very error prone
with a lot of trials and errors. It takes days, if not weeks
for engineers to identify and implement them, plus additional time needs to be
invested for ongoing maintenance. They asked for a simple and automated
way to manage the data quality. To help our customers do this, I'm pleased to share
the preview of AWS Glue Data Quality, a new feature of AWS Glue. [applause] Glue Data Quality helps you
build confidence in your data so that you can make
data-driven decisions every day. Engineers can generate automated
rules for specific data sets in just hours, not days, increasing the freshness
and accuracy of your data. Rules can also be applied
to your data pipelines. So poor quality data does not
even make it to your data lakes
in the first place. And if your data quality
deteriorates for any reason, Glue Data Quality alerts you
so you can take action right away. Now, with high-quality data, you will be able to connect the dots
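For illustration, here is a minimal sketch of defining such rules with boto3 against a hypothetical Glue Data Catalog table; the rule expressions use Glue Data Quality's rule language, and all of the names are placeholders.

```python
import boto3

# Sketch: creating a data quality ruleset for a (hypothetical) catalog table
# named "orders" in the "sales_db" database, using DQDL rule expressions.
glue = boto3.client("glue")

glue.create_data_quality_ruleset(
    Name="orders-quality-rules",
    TargetTable={"DatabaseName": "sales_db", "TableName": "orders"},
    Ruleset='Rules = [ IsComplete "order_id", ColumnValues "total" > 0, RowCount > 1000 ]',
)
```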
with precision and accuracy. But you also need to ensure that the right individuals
within your organization are able to access this data
so you can collaborate and make these connections happen. This brings me to the second lesson
we learned from the Khasi, creating a system of governance to unleash innovation
within your organization. Governance was historically viewed
as a defensive measure, which meant really
locking down your data silos. But in reality, the right
governance strategy helps you move and innovate faster
with well defined guardrails that give the right people
access to the data when and where they need it. As the amount of data
rapidly expands, our customers want
an end-to-end strategy that enables them
to govern their data across their entire data journey. They also want to make it
easier to collaborate and share their data while
maintaining quality and security. But creating the right
governance controls can be complex
and time-consuming. That's why we are reducing
the amount of manual efforts required to properly govern
all of your data stores. As I mentioned earlier, one of the ways we do this today
is through Lake Formation, which helps you govern and audit
your data lakes on S3. Last year, we announced new row-
and cell-level permissions that help you protect your data by giving users access to the data
they need to perform their job. But end-to-end governance
doesn't just stop with data lakes. You also need to address data
access and privileges across more of our customers'
use cases. Figuring out which data
consumers in your organization have access to what data
can itself be time-consuming. From manually investigating
data clusters to see who has access to designating
user roles with custom code, there is really simply
too much heavy lifting involved. And failure to create these types
of safety mechanisms can mean unnecessary exposure,
or quality issues. Our customers told us they want
an easier way to govern access and privileges
with more of our data services, including Amazon Redshift. So today, I'm pleased to introduce
a new feature in Redshift Data Sharing,
Centralized Access Controls that allow you to govern
your Redshift data shares using the Lake Formation console. [applause] With this new feature
in Redshift Data Sharing, you can easily manage access
for data consumers across
your entire organization from one centralized console. Using the Lake Formation console,
you can designate user access without complex querying
or manually identifying who has access
to what specific data. This feature also improves
the security of data by enabling admins
to granular role level and cell level access
within Lake Formation. Now, Centralized Access Controls
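As a rough sketch of the underlying permission model, here is what a Lake Formation grant looks like through boto3; the account, role, database, and table names are hypothetical, and the console flow described above layers the Redshift data-share governance on top of this kind of grant.

```python
import boto3

# Sketch: granting a consumer role SELECT access to a (hypothetical) table
# through Lake Formation's centralized permission model.
lakeformation = boto3.client("lakeformation")

lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/MarketingAnalyst"
    },
    Resource={"Table": {"DatabaseName": "sales_share_db", "Name": "orders"}},
    Permissions=["SELECT"],
)
```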
Now, Centralized Access Controls are critical to helping users access
siloed data sets in a governed way. One of the key elements
of an end-to-end data strategy is machine learning, which is really
critical for governance as well. Today, more companies are
adopting ML for their applications. But governing this end-to-end process
for ML presents a unique set of challenges
very specific to ML, like onboarding users
and monitoring ML models. Because ML model building requires
collaboration among many users, including data scientists
and data engineers, setting up permissions requires time-consuming customized
policy creation for each user group. It's also challenging to capture and share model information
with other users in one location, which can lead to inconsistencies
and delays in approval workflows. And finally, custom instrumentation
is needed to gain visibility into the model performance,
and that can be really expensive. To address this for our customers,
we are bringing you three new machine-learning governance
capabilities for Amazon SageMaker, including SageMaker Role Manager,
Model Cards and Model Dashboards. [applause] These are really powerful
governance capabilities that will help you build
ML governance responsibly. To address permission sharing, Role Manager helps you define
minimum permissions for users in just minutes
with automated policy creation
for your specific needs. To centralize the ML
model documentation, Model Cards create
a single source of truth throughout your entire
ML model lifecycle and auto-populate model
training details to accelerate your
documentation process. And after your models are deployed, Model Dashboard increases visibility with unified monitoring of the performance of your ML models.
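To give one concrete flavor of this, here is a hedged sketch of registering a model card through boto3; the model card name and the content keys are assumptions used purely for illustration.

```python
import boto3
import json

# Rough sketch (names and content keys are assumptions): registering a model
# card so that model documentation lives in one governed place.
sagemaker = boto3.client("sagemaker")

card_content = {
    "model_overview": {
        "model_description": "Hypothetical demand-forecasting model for daily orders."
    }
}

sagemaker.create_model_card(
    ModelCardName="demand-forecast-card",
    Content=json.dumps(card_content),
    ModelCardStatus="Draft",
)
```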
With all these updates, we have now covered governance for your data lakes, data warehouses, and machine learning. But for true end-to-end governance, you will need to manage data access
across all of your services, which is the future state
we are building towards. As Adam announced yesterday--
[applause] -- we are launching Amazon DataZone,
a data management service that helps, catalog discover,
analyze, share, and govern data
across your organization. DataZone helps you analyze
more of your data, not just what's in AWS,
but also third-party data services while meeting your security
and data privacy requirements. I have had the benefit of being
an early customer of DataZone. I leverage DataZone to run the AWS
weekly business review meeting, where we assemble data
from our sales pipeline and revenue projections
to inform our business strategy. Now to show you DataZone in action, let's welcome our Head of Product
for Amazon DataZone, Shikha Verma, to demonstrate how quickly
you can enable your organization to access and act
on your data. [music playing] Thanks, Swami. Wow! It's great to see you
all out here. I am so excited to tell all
the data people over here. Now you don't have to choose
between agility and getting to
the data you need, and governance to make sure
you can share the data across your enterprise. You can get both. We have built Amazon DataZone
to make it easy for you to catalog, organize, share, and analyze your data
across your entire enterprise with the confidence
of the right governance around it. As we know, every enterprise
is made up of multiple teams that own and use data
across a variety of data stores. And to do their job, data people,
like data analysts, engineers, and scientists have to pull
this data together but do not have an easy way to access
or even have visibility to this data. Amazon DataZone fills this gap. It provides a unified environment, a zone, where everybody in
your organization from data producers to consumers can go
to access, share, and consume data
in a governed manner. Let's jump into how this works. I'm going to use a very
typical scenario that we see across
our customers. This may seem familiar
to many of you. In this scenario, a product
marketing team wants to run campaigns
to drive product adoption. Sounds familiar? To do this, they need to analyze
a variety of data points, including data that they have
in the data warehouse in Redshift, data that they have in their data
lake around their marketing campaigns, as well as third-party sources
like Salesforce. In this scenario,
Julia is a data engineer. She is a rock star. She knows the data in and out, and often gets requests
to share the data in the data lake with other users. To share this more securely
with a variety of users across her enterprise, she wants to catalog
and publish it in Amazon DataZone. She is our data producer. And Marina is a rock star
marketing analyst. She's a campaign expert
who wants to use the data in the data lake
to run the marketing campaigns. She is our data consumer. Let's see how Amazon DataZone
helps them connect. Let's start with Julia and see how
she publishes data into the DataZone. She logs into the DataZone portal
using her corporate credentials. She knows the data sources
that she wants to make available, so she creates an automated
sales publishing job. She provides a quick name
and description, selects a publishing agreement, which is essentially like a data
contract that tells the consumers how frequently
she'll keep this data updated, how to get access, who will authorize
access and things like that. She then selects the data sources,
and the specific tables, and the columns that she wants
to make available in Amazon DataZone. She also sets the frequency
of how quickly this data will be kept into sync. Within a few minutes,
the sales pipeline data and the campaign data
from the data lake will be available in Amazon DataZone. Now, Julia has the option to enrich
the metadata and add useful information to it so that data consumers like Marina
can easily find it. She adds a description,
additional context, any other information that would
make this data easier to find. We also know that for large datasets, adding and curating all of this
information manually is laborious, time-consuming
and even impossible. So, we are making this much easier
for you. [applause] Thank you. We are building machine
learning models to automatically generate
business names for you. And then, Julia will have the option
at a column level to select the recommendation that we came up
with or edit it as you see please. How awesome is that? [applause] Thank you. I think so too. Once Julia has created this
particular data asset in DataZone, she wants to make it available to the data consumers. In this scenario,
since Julia is a data expert, and she knows this data very well,
she also functions as a data steward. And she could publish
this directly into the DataZone. But we also know that many of you
have set up data governance frameworks or want to set up data
governance frameworks, where you want to have
business owners and data stewards managing your domain
the way you'd like to. For this, we also have that option. Now that the data is published
and available in Amazon DataZone, Marina can easily find it. Let's see how easy this is. Marina goes back to Amazon DataZone, logs in using her
corporate credentials, uses a search panel
to search for sales. A list of relevant assets is returned and she learns more about the data
and where it comes from. She can see a bunch of domains
in there. She sees sales, marketing, finance. She can also see that there is data
from all kinds of sources. She notices Redshift, data lake,
and Salesforce. And you also saw that there was
a variety of assets that she could have sorted
the search results on. It's really easy peasy. Now, to perform the campaign
analysis, Marina wants to work with
a few of her team members, because they want
the same access as her. So now, she creates a data project. Creating a data project is a really
easy way for her to create a project where she wants
the team members to collaborate with, they will get the same access that
she wants, to the right datasets, as well as the right tools such as
Athena, Redshift, or QuickSight. Marina knows the data she's after,
so she subscribes to it or gets access to it
using the identity of the project. And after this, any of our team
members can use the deep links
available in Amazon DataZone to get to the tools that they want. Using the deep links, they can
get to the service directly without any additional configuration
or individual permissions. In this particular case,
they choose Athena. And now Marina and her team
members can query the data that Julia wanted
to make available for them using the project context
and using the tools that they wanted. So, I know this went by quick, but hopefully,
you can see how easy this is. And the entire data discovery,
access, and usage lifecycle is
happening through Amazon DataZone. You get complete visibility
into who is sharing the data, what data sets is being shared,
and who authorized it. Essentially, Amazon DataZone
gives your data people the freedom
that they always wanted, but with the confidence
of the right governance around it. As Adam mentioned yesterday,
there is really nothing else like it. So, I can't wait to see how you use
it and come find out more in our dedicated
breakout session later today. Thank you. [music] Thank you Shikha, it's really
exciting to see how easy it is for customers to locate the data
and collaborate with DataZone. We'll continue to make it even easier for customers to govern their data with this new service. We're just getting started. So, I shared how governance can help weave
the connective tissue by managing data sharing
and collaboration across individuals
within your organization. But how do you weave a connective
tissue within your data systems to mitigate data sprawl
and derive meaningful insights? This brings me back to the
third lesson from the Khasi's living bridges: driving data connectivity for innovation and, ultimately, survival. Typically, connecting data across
silos requires complex ETL pipelines. And every time you want to ask
a different question of your data, or you want to build a different
machine learning model, you need to create yet another data pipeline. This level of manual integration is simply not fast enough to keep up
with the dynamic nature of data and the speed at which
you want your business to move. Data integration needs
to be more seamless. To make this easier, AWS
is investing in a zero-ETL future where you never have to
manually build a data pipeline again. [applause] Thank you. We have been making strides
in the zero-ETL future for several years
by deepening integrations between our services that help
you perform analytics and machine learning without the need
for you to move your data. We provided direct integration
with our AWS streaming services, so you can analyze your data
as soon as it's produced and gather timely insights
to capitalize on new opportunities. We have integrated SageMaker with
our databases and data warehouses so you can leverage
your data for machine learning without having
to build data pipelines or write a single line
of ML code. And with federated querying
on Redshift and Athena, customers can now
run predictive analytics across data stored
in operational databases, data warehouses, and data lakes
without any data movement. While Federated Query
is a really powerful tool, querying and analyzing data
stored in really different locations isn't optimized
for maximum performance when compared
to traditional ETL methods. That's why this week
we are making it easier for you to leverage your data without creating and managing ETL pipelines. Yesterday, we announced that Aurora now supports zero-ETL
integration with Amazon Redshift, thereby bringing
your transactional data sitting in Aurora and the analytics capabilities of Redshift together. This new integration is
already helping customers like Adobe to spend less time
building Redshift ETL pipelines and more time gathering insights to enhance their core services like Adobe Acrobat. We are also removing the heavy
lifting from ETL pipeline creation for customers who want to move data
between S3 and Redshift. For example, imagine you're
an online retailer trying to ingest terabytes
of customer data from S3 into Redshift
every day to quickly analyze how your shoppers are interacting
with your site and your application, and how they are making
these purchasing choices. While this typically requires
creation of ETL pipelines, what if you had the option to
automatically and continuously copy all of your data
with a single command? Would you take it? Today, I'm excited to announce
Amazon Redshift now supports auto copy
from S3 to make it easier
to continuously ingest your data. With this update, now customers
can easily create and maintain simple data pipelines
for continuous ingestion. Ingestion rules are
automatically triggered when new files land
in your S3 bucket, without relying on custom solutions
or managing third-party services. This integration also makes it easy
for analysts to automate data loading without any dependencies
on your critical data engineers.
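(Here is a rough sketch of what that single-command setup could look like. The bucket, table, role, and job names are hypothetical, and the COPY JOB clause is an approximation of the auto-copy syntax, which may differ in the released service.)

```python
import boto3

# Hypothetical identifiers; the COPY JOB syntax below is an approximation.
client = boto3.client("redshift-data", region_name="us-east-1")

auto_copy_sql = """
COPY public.clickstream
FROM 's3://my-retail-bucket/clickstream/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS JSON 'auto'
JOB CREATE clickstream_auto_copy
AUTO ON;
"""

client.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="dev",
    DbUser="admin",
    Sql=auto_copy_sql,
)
# From here on, new files landing under the S3 prefix are ingested automatically.
```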
With these updates I have shared today,
integration with Redshift, auto copying from S3, as well as integration
of Apache Spark with Redshift, we are making it easy for you
to analyze all of your data with Redshift, no matter where it resides. And I didn't even cover all of our
latest innovations in this space. To learn more, make sure to attend
this afternoon's leadership session with G2 Krishnamoorthy,
our VP of AWS Analytics. With our zero-ETL mission,
we are tackling the problem of data sprawl by making it easier for you
to connect to your data sources. But in order for this to work, you can't have connections just
to some of your data sources. You need to be able to seamlessly
connect to all of them, whether they live in AWS or in external
third-party applications. That's why we are heavily investing in bringing your data
sources together. For example, you can stream data
in real-time from more than 20 AWS and third-party sources
with Kinesis Data Firehose, a fully managed serverless solution
that enables customers to automatically stream
the data into S3, Redshift,
OpenSearch, Splunk, Sumo Logic, and many more with
just a few clicks.
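(As a minimal sketch of the producer side, the snippet below sends a single JSON event to an existing Firehose delivery stream. The stream name and event fields are hypothetical, and the stream is assumed to already have S3, Redshift, or OpenSearch configured as its destination.)

```python
import json
import boto3

# Hypothetical delivery stream, assumed to already exist with a destination configured.
firehose = boto3.client("firehose", region_name="us-east-1")

event = {"user_id": 42, "action": "add_to_cart", "ts": "2022-11-30T09:15:00Z"}

firehose.put_record(
    DeliveryStreamName="clickstream-to-s3",
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)
# Firehose handles buffering, batching, retries, and delivery to the destination.
```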
Amazon SageMaker Data Wrangler, our no-code visual data prep tool for machine learning, makes it easy to import data from a wide variety of data sources
for building your ML models. And Amazon AppFlow, our no-code
fully-managed integration service offers connectors to easily
move your data between your cloud-based
SaaS services and your data lakes and data warehouses. Because these connectors are
fully-managed and supported by us, you can spend less time building and maintaining these connections
between your data stores and more time maximizing
business value with your data.
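(For teams that want to drive those managed connectors programmatically rather than through the console, here is a minimal sketch using the AppFlow API. The flow name is hypothetical and assumed to have been set up already, for example a Salesforce-to-S3 flow.)

```python
import boto3

appflow = boto3.client("appflow", region_name="us-east-1")

# List the flows already defined in the account.
for flow in appflow.list_flows()["flows"]:
    print(flow["flowName"], flow["flowStatus"])

# Trigger an on-demand run of a hypothetical, pre-configured flow.
appflow.start_flow(flowName="salesforce-accounts-to-s3")
```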
Our customers tell us they love the no-code approach to our connector library. However, as expected,
they have continued to ask for even more connectors to help them
bring their data sources together. That's why today I'm pleased
to share the release of 22 new AppFlow connectors
including popular marketing sources like LinkedIn Ads
and Google Ads. With this update,
our AppFlow library now has more than 50 connectors in total
from data sources like S3, Redshift, and Snowflake, to cloud-based
application services like Salesforce, SAP,
and Google Analytics. In addition to offering
new connectors in AppFlow, we are also doing the same
for Data Wrangler in SageMaker. While SageMaker already supports
popular data sources like Databricks
and Snowflake, today, we are bringing you more than
40 new connectors through SageMaker
Data Wrangler, allowing you to import even more of your data for ML model building
and training. With access to all of
these data sources, you can realize the full value of
your data across your SaaS services. Now, looking across
all of our services, AWS connects to
hundreds of data sources, including SaaS applications,
on-premises systems, and other clouds so you can leverage
the power of all of your data. We are thrilled to introduce
these new capabilities that make it easier to connect
and act on your data. Now, to demonstrate the power
of bringing all your data together to uncover
actionable insights, let's welcome Ana Berg Asberg, Global Vice President R&D IT
at AstraZeneca. [music playing] Good morning. I know it's really early,
but I need your help. Can I ask you to raise your hand if you or any of your loved ones
have been impacted by lung disease? Keep them up and now add
to those hands if you or anyone you know
has been impacted by heart failure
or heart disease. And add to those hands
if you know anyone or any of your loved ones
has been impacted by cancer. Look around. These touch so many of us. It's important. You can take the hands down. Thank you very much. We at AstraZeneca, a global
biopharmaceutical company, are breaking the boundaries
of science to deliver life-changing
medicines. We use data, AI, and ML with
the ambition to eliminate cancer as a cause of death and protect the lives of patients
with heart failure or lung diseases. In order to understand how we are
breaking the boundaries of science, we need to zoom in and start
really small with the genome, the transcriptome,
the proteome, the metabolome. Say that fast with a Swedish accent
three times, it's quite hard. The genome is the complete
set of our DNA in every single cell
in the body. It contains a copy
of the 3 billion DNA base pairs. Mapping the genome uncovers new
insights into disease biology and helps us discover new
disease therapies. Today, our Center of Genomics Research
is on track to analyze up to 2 million
whole genomes by 2026. The scale of our genome
database is massive. And it's really hard to manage
a database at that scale, but we do it together with AWS. Together, we have moved 25 petabytes
of data across the AWS global network. We process whole genomes
across multiple regions, generating 75 petabytes of data
in the intermediate process. At a high level,
we use AWS Step Functions and AWS Lambda for orchestration, AWS Batch to provision
optimal compute, and Amazon S3 for storage.
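(The sketch below is illustrative only and is not AstraZeneca's actual workflow; it shows the general pattern just described, with a Step Functions state machine submitting an AWS Batch job per genome sample. All names, queues, and ARNs are hypothetical.)

```python
import json
import boto3

# Illustrative orchestration pattern: Step Functions runs a Batch job and waits
# for it to finish; the job itself writes its results to S3.
definition = {
    "StartAt": "ProcessGenome",
    "States": {
        "ProcessGenome": {
            "Type": "Task",
            "Resource": "arn:aws:states:::batch:submitJob.sync",
            "Parameters": {
                "JobName": "genome-sample",
                "JobQueue": "arn:aws:batch:us-east-1:123456789012:job-queue/genomics",
                "JobDefinition": "arn:aws:batch:us-east-1:123456789012:job-definition/align-and-call:1",
                "Parameters": {"SampleUri.$": "$.sample_s3_uri"},
            },
            "End": True,
        }
    },
}

sfn = boto3.client("stepfunctions", region_name="us-east-1")
sfn.create_state_machine(
    name="genomics-batch-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsBatchRole",
)
```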
So, the list is important, we all know that, but the impact
is so much more critical. We can now run 110 billion
statistical tests in under 30 hours, helping us provide genetic
input to our AstraZeneca projects. Genomics gives us the DNA blueprint. But as you know, it's not
the only ome. Beyond the genome are largely
untapped repositories of rich data that, if connected, could
give us valuable insights, and we bring them together
with AWS into multi-omics. We bring the multi-omics data
together and make it available to mine for actionable insights
by the scientists. Having the bandwidth to process and maintain the multi-omics data
gives us the possibility
to take a step back. We add to
the understanding of disease by looking small
at the data at hand,
and patient data, and we pull it together
to detect patterns. For example, in lung cancer studies, we need to measure the tumor scans,
the CT scans, and we use a deep learning
technology similar to what self-driving cars use to understand
the 3D environment around them. Today, we use this
in the clinical trials, but in the future, this technology
could be used to inform how doctors make treatment decisions
with the prediction of what's going to happen next. As you can imagine, the quantity of the data at hand
has grown exponentially. And together with AWS,
we are accelerating the pace at which a scientist
can unlock patterns by democratizing
ML using Amazon SageMaker. We use AWS Service Catalog
to stand up templated end-to-end MLOps
environments in minutes. And we take every single step
with extra care as we're managing patient data
in a highly regulated industry. We can now run hundreds of concurrent
data science and ML projects to form insights into science. So, we looked small
at the multi-omics data, we looked at the data at hand, but one of the most exciting
advancements in the industry right now is that patients can choose in
clinical trials to share the data with us
from their own homes. Today, digital technology is able
to collect the data from the patient's home on a daily
or even continuous basis. And the data collected is as reliable
as data that could only be collected
in clinical settings before. The data adds value and enables
us to collect data from underdeveloped regions
and remote locations. This moves us toward early diagnosis
and disease prediction for all people, because our future
depends on healthy people, a healthy society,
and a healthy planet. AWS helps us to pull the data
together, the multi-omics, the data at hand
with the medical images, the tumor scans,
the remote data collection, and helps us to accelerate insights
to science through data, AI, and ML. Today, I raised my hand
at the beginning. I've been impacted by cancer.
I lost my father in 2018. This is my father and my mother
in the year before he passed. And he reminds me every day
that every data point we handle is a patient, a loved one. I work at AstraZeneca
and with my thousands of colleagues so you can spend every day
possible with your loved ones. And it's my privilege to do so. Thank you. [music playing] Wow, what a heartfelt
and inspirational story. Thank you, Ana. I'm truly amazed by how AstraZeneca was able to democratize data
and machine learning to enable these types
of innovation in healthcare. This brings me to the third and final
element of a strong data strategy, democratizing data. Since I joined Amazon 17 years ago, I have seen how data
can spur innovation at all levels, right from being an intern
to a product manager, to a business analyst
with no technical expertise. But all of this can happen only
if you enable more employees to understand
and make sense of data. With a workforce that is trained
to organize, analyze, visualize
and derive insights from your data, you can cast a wider net
for your innovation. To accomplish this, you will need
access to educated talent to fill the growing number
of data and ML roles. You need professional development
programs for your current employees. And you need no code tools
that enable non-technical employees to do more with your data. Now, let's look at how AWS
is preparing students, the future backbone
of our data industry to implement these types of solutions
we have discussed here today. This may surprise some of you
but I grew up in the outskirts of the southern part of India,
outside the city, where we had one computer
for the entire high school. Since I didn't come
from an affluent family, I learned to code on this computer with only 10 minutes of access
every week and I was fascinated. But you don't have to grow up
in a rural Indian village to experience limited access
to computer science education. It's happening here every day
in the United States. In fact, the U.S.
graduates only 54,000 CS students each year and that is the dominant pathway
to roles in AI and ML, yet the AI workforce is expected
to add 1 million jobs by 2029. This creates quite the gap, and the graduation pipeline
is further hindered by a lack of diversity. This is where community colleges
and minority-serving institutions can really help. They are the critical access point
to higher education in the U.S. with more than 4 million
students enrolled just last year. While data and ML programs are
available in many universities, they are really limited
in community colleges and MSIs where lower-income and underserved students
are more likely to enroll. And the faculty members
with limited resources simply cannot keep up
with the skills necessary to teach data management, AI, and ML. If we want to educate the next
generation of data developers, then we need to make it easy
for educators to do their jobs; we need to train the trainers. That's why today I'm personally
very proud to announce a new educator program
for community colleges and MSIs through AWS and MLU. [applause] This new train-the-trainer program
includes the same content we use to train Amazon engineers,
as well as the coursework we currently offer
to institutions like UC Berkeley. Faculty can access free
compute capacity, guided curriculum, and ongoing support
from tenured science educators. With all these resources,
educators are now equipped to provide students with AI/ML
courses, certificates, and degrees. We have currently onboarded
25 educators from 22 U.S. community colleges and MSIs. And in 2023, we expect to train
an additional 350 educators from up to 50 community colleges
across the United States. We were able to bring an early
version of this program to Houston Community College. Our team worked with HCC
to create a tailored sequence of content for their students. And now, they are the first
community college to have this coursework accepted
as a full bachelor's degree. [applause] With continued feedback
from educators, we will continue to remove barriers
they face in this arena. My vision is that AWS
will democratize access to data education programs, just like we do across
our organizations, and we are making progress. We are building on years
of programmatic efforts to make student data
education more accessible. Last year, we announced AWS AI
and ML scholarship programs to provide $10 million to underserved
and underrepresented students, and we have awarded 2,000
scholarships to date. We also provided students with hands-on training opportunities
with AWS Academy, SageMaker Studio Lab,
and AWS DeepRacer, our 1/18th-scale race car driven
by reinforcement learning. I hope that these programs
can enable students to create sparks of their own,
just like I did. So democratizing access
to education through data and ML programs
is really critical. But it's clear, we won't be able
to fill this skills gap through student
education alone. That's why in addition to educating those entering the workforce,
organizations must also focus on how to leverage
their existing talent pool to support the future growth. Through our training programs,
we are enabling organizations to build data literacy
through ML tools, classroom training,
and certifications. As I mentioned, AWS DeepRacer
helps us train students on ML through reinforcement learning. But DeepRacer is not just
for students. In fact, more than 310,000 developers
from over 150 countries have been educated on ML
with AWS DeepRacer. It continues to be the fastest way
to get hands-on with ML, literally. In addition, we now offer customers more than 150
professional development courses related to data analytics and ML with 18 new courses
launched in 2022 and we'll continue to add more. Now, while closing the cloud
skills gap is critical, not every employee
needs to have the technical expertise to drive data-driven innovation. In fact, you need individuals
in your organization without coding experience to help
you connect the dots with your data. That's why we provide low-code
and no-code tools that help data analysts
and marketers, typically known as your data
consumers, to visualize and derive insights from your data. QuickSight is our ML-powered
BI solution that allows users to connect
to data sources like S3, Redshift,
or Athena, and create interactive
dashboards in just minutes. We are continuing to add
new capabilities in QuickSight at a rapid clip, with more than 80 new
features introduced in the past year alone. And this week, Adam touched
on a new capability called QuickSight Paginated Reports, which makes it easier
for customers who use multiple reporting systems to
create print-friendly, highly formatted
reports in QuickSight. He also shared new features
for QuickSight Q, which allows users to query
the data in plain language without writing
a single line of code. With these new capabilities,
business users can ask "why" questions to better understand the factors that are impacting
their underlying data trends. They can also forecast metrics by asking something like "Forecast
sales for the next 12 months" and get an immediate response based on information
like your past data and seasonality. With Amazon QuickSight, you can
enable more employees to create
and distribute insights. And now more than 100,000 customers
use QuickSight to help them act on data. For example, Best Western,
a global hotel chain, uses QuickSight to share data with 23,000
hotel managers and employees across more than
4,600 properties, enabling them to elevate
their guest experience and drive
ongoing business value. Another tool we offer in this arena
is SageMaker Canvas, a no-code interface to build
ML models with your data. Using Canvas, analysts can import
data from various sources, automatically prepare data,
and build and analyze ML models with
just a few clicks. And we are continuing to invest
in low-code and no-code tools with features that
enhance collaboration across technical
and non-technical roles. With all these services,
access is no longer relegated to just one department
in your organization. If you want to expand the number
of ideas within your organization, you have to expand across
different types of employees so that sparks
can come from anywhere. Let's see how one customer,
Warner Brothers Games did just that. The current landscape of gaming
is way more free-to-play, which means that
the data processing needs just grow
and grow over time. Warner Brothers Games
has worked with AWS since 2014. Because we work with AWS, it's meant our business
could scale easily. Peak volumes on a launch day, we pull in about
3 billion events per day. There's no reason to guess or just go
purely off gut instinct anymore. It's data. Data drives all the decisions. AWS is such an important partner because we don't have
to worry about scale. We've tested up to 300,000 data
events a second. We know when we launch a game
it's not going to fall down on us. A specific example of how we use data to influence
our strategy is MultiVersus. MultiVersus is a 2v2 brawler game featuring all the best characters
from Warner Brothers. People know these characters,
they love these characters. And if the design team
doesn't nail the visceral feel
of these characters, it's going to show up
in the data. We can find that through the data, see how many people
are getting impacted, and then propose solutions
that will make the game better. One of the biggest lightbulb moments that I encountered is bringing
our analytics data back into our partner's line
of business tools, whether it's a game designer,
designing a character or a world and seeing telemetry around
how that world behaves, or bringing data into our CRM systems so folks that are marketing
and interacting with our players can see what their experience
with us has been historically. Anytime that we make a suggestion
and that's changed in the game, I know why that change happened
and what drove that decision-making, making the game better
for the players. That's what it's really all about. So, the three elements
of a modern data strategy I shared with you this morning: building future-proof
data foundations, weaving connective tissue, and democratizing data
across your organization. All of them play a critical role in helping you
do more with your data. But if I can leave you
with only one thing today, please remember, it's individuals
who ultimately create these sparks. But it is the responsibility
of leaders to empower them with a data-driven culture
to help them get there. Now, go create
the next big invention. Thank you. [applause]