[music playing] Please welcome the Vice President
of Data and Machine Learning at AWS, Dr. Swami Sivasubramanian. [music playing] Welcome to day three
of re:Invent, everyone. You know, this past summer,
my seven-year-old daughter, who wants to grow up
to be an inventor and a scientist, among 20 other things, asked me a question, "Dad, how do scientists come up with these amazing new inventions? How do they come up with new ideas?" To answer the question, I didn't want to just make up an answer within like 10 seconds. I actually said, "Now, maybe
let's watch a few documentaries behind some of the greatest
inventions that changed humankind." And here I am, several months
into this exploration, still very fascinated by how
great inventions are born. We like to think that the genesis for every great idea happens with the spark of a random thought, or the lightbulb moment, or simply a stroke of genius. And as history would dictate, it always seems to happen so suddenly. With a flash of realization,
ancient mathematician Archimedes uncovered the physical law
of buoyancy in his bathtub. Isaac Newton developed
his theory of gravitation after observing
an apple fall from a tree. Percy Spencer discovered the earliest microwave oven when a candy bar accidentally melted in his pocket while he was standing in an MIT lab, next to an active magnetron. These are the vacuum tubes
used in early radar systems. But is that really how these
light bulb moments work? Are they really as instantaneous
as we have been led to believe? These Aha! moments are actually preceded by ingesting hundreds, if not thousands, of pieces of information that our minds assimilate over time. Let's revisit the microwave
oven example. Percy Spencer had more than 20 years of experience working with magnetrons, leading up to that moment in the MIT lab. And before that, he was an expert in radio technology while working for the US Navy. In fact, it actually took Spencer
more than 30 years to arrive
at his microwave epiphany. He just had to connect the dots,
just like we do with our data. Researchers Dr. Mark Beeman
and Dr. John Kounios wanted to explore
this phenomenon even further. They measured participants' brain activity through a specific set of tasks and found that creativity follows a real scientific process in which our brains produce these big ideas. That research demonstrated that
human beings concentrate, analyze, and find correlations
in different lobes of our brain, when we are making
sense of new information, and they even process it
when we sleep. And this all happens before the creative spark occurs in this lobe, right above the right ear. To put it simply, they proved that insights can occur when our observations are paired with
the power of analytical processing. Really cool, right? I am fascinated by this research
for several reasons. If you look closely, the human mind
shows us how we can harness the power
of data to drive creativity. The same process
can apply to organizations. However, applying the neuroscience of creativity to a modern-day organization is not always perfect. Our environments in organizations present many specific challenges,
and several important caveats. Within a business we refer
to the knowledge or information
we acquire as data points. But unlike the human brain, there isn't one centralized
repository to collect all our data, which often means
it leads to data silos and inconsistencies
across an organization. It takes a considerable
amount of time and effort to clean your data and store it
in accessible locations. Unlike the human brain, data isn't automatically
processed when we sleep. We have to work hard to build automation into our data infrastructure to avoid manual replication and costly updates after working hours. Data doesn't naturally flow
within our organization, like the neural pathways
in our brain. We have to build complex pipelines
to move data to the right place, and set up mechanisms
for the right individuals to get access to the data
when and where they need it. And finally, data isn't always
easy to analyze or visualize, which can make it
really difficult for you to identify critical correlations
that spark these new ideas. If you step back, you need all of these elements to come together in order for these sparks, your new products or new customer experiences, to come to life. So while this theory of neuroscience
can be applied to the principles
of data science, we must acknowledge
that the processes required to maximize the value of our data
are far from innate. I strongly believe data is the genesis for modern invention. To produce new ideas with our data, we need to build
a dynamic data strategy that leads to new customer
experiences as its final output. And it is absolutely critical
that today's organizations have the right structures
and technology in place that allows new ideas
to form and flourish. While building a data strategy
can really feel like a daunting task, you are not alone. We have been in the data business long before even
AWS came into existence. In fact, Amazon's early leaders
often repeated the phrase that data beats intuition. We built our business on data. They enabled data-driven decision making with [PH] Babbler, our internal A/B testing suite, to produce the earliest book recommendations on amazon.com. Since then, we have used data to develop countless products
and services, from two-day shipping to local grocery delivery, and many more. We also used data to anticipate our customers' expanding storage needs, which paved the way for the development of AWS. And for more than 15 years
we have solved some of the most complex
data problems in the world with our innovations in storage,
databases, analytics, and AI and ML. We delivered the first scalable
storage in the cloud with S3, the first purpose-built database
in the cloud with DynamoDB, the first fully managed cloud data
warehouse with Redshift, and many more. Since introducing several of these firsts, we are continuing to launch
new features and services that make it easy
to create, store, and act on data. And we have seen recognition
for many of our services. This year, AWS received a 95
out of 100 score in the Gartner Solution Scorecard
for Amazon RDS, including Amazon Aurora. These types of achievements
are why more than one and a half million customers are
counting on AWS for their data needs. We work with some of the biggest brands in the world, like Toyota, Coca-Cola, and Capital One, to build comprehensive end-to-end data strategies. And our customers are using
these strategies to transform their data
into actionable insights for their businesses every day. For example, organizations
like Bristol Myers Squibb use AWS data services
to advance the application of single cell technologies in drug
development and clinical diagnosis. Nielsen built a data lake capable
of storing 30 petabytes of data, expanding their ability
to process customer insights from 40,000 households to 30
million households on a daily basis. And in the race to launch
autonomous vehicles, Hyundai leverages AWS
to monitor, trace, and analyze the performance
of their machine learning models, achieving a 10x reduction
in their model training time using Amazon SageMaker. By working with leaders
across all industries, and of all sizes, we have discovered
at least three core elements of a strong data strategy. First, you need a future-proof
data foundation supported by core data services. Second, you need solutions
that weave connective tissue across your entire organization. And third, you need the right
tools and education to help you democratize your data. Now, let's dig in starting with
the future-proof data foundation. In the technology industry, we often hear the phrase,
future-proof, thrown around a lot to market all
sorts of products and technologies. But my definition of a future-proof foundation is clear. It means using the right services to build a foundation that you don't need to heavily rearchitect, or incur technical debt on, as your needs evolve and the volume and types of data change. Without a data strategy
that is built for tomorrow, organizations won't be able
to make decisions that are key
to gaining a competitive edge. To that end, a future-proof
data foundation should have four key elements. It should have access to the
right tools for all workloads and any type of data so you can adapt
to changing needs and opportunities. It should be able to keep up
with the growing volume of data by performing
at really high scale. It should remove the
undifferentiated heavy lifting for your IT and data team
so you can spend less time managing and preparing your data
and more time getting value from it. And finally, it should have
the highest level of reliability and security
to protect your data stores. For the first element of a future-proof data foundation, you will need the right tools
for every workload. We believe that every customer
should have access to a wide variety of tools
based on data types, personas, and use cases,
as they grow and change. A one-size-fits-all approach
simply does not work in the long run. In fact, our data supports this. 94% of our top 1,000 AWS customers use more than 10 of our
databases and analytics services. That's why we support
your data journey with the most comprehensive
set of data services out of any cloud provider. We support data workloads
for your application with the most complete set
of relational databases like Aurora, and purpose-built databases like DynamoDB. We offer the most comprehensive
set of services for your analytics workloads,
like SQL analytics with Redshift, big data analytics with EMR, business
intelligence with QuickSight, and interactive log analytics
with OpenSearch. We also provide
a broad set of capabilities for your machine learning workloads,
with deep learning frameworks, like PyTorch and TensorFlow,
running on optimized instances, and services like Amazon SageMaker that makes it really easy
for you to build, train, and deploy ML models end to end, and AI services with built-in machine
learning capabilities with services like Amazon Transcribe
and Amazon Textract. All of these services together come to form your end-to-end data strategy, which enables you to store and query your data in your databases, data lakes, and data warehouses. Act on your data with analytics,
BI, and machine learning. And catalog and govern
your data with services that provide you with
centralized access controls, with services like Lake Formation
and Amazon DataZone, which I will dive into later on. By providing a comprehensive
set of data services, we can meet our customers
where they are in their journey, from the places they store
their data to the tools and programming languages
they use to get the job done. For example, take a look
at Amazon Athena, our serverless
interactive query service, which was designed
with a standard SQL interface. We made Athena really easy to use. Simply point to your data in S3, define your schema, and start querying to receive insights within seconds. Athena's SQL interface and ease of use is why it's so popular among data engineers, data scientists, and many other developers. In fact, tens of thousands of AWS customers use Amazon Athena today.
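To make that flow concrete, here is a minimal sketch of the same idea using the Athena API from boto3; the database, table, result bucket, and workgroup names are hypothetical placeholders rather than anything referenced in this talk.

```python
import boto3

# Minimal sketch: run a SQL query against data sitting in S3 with Amazon Athena.
# The database, table, result bucket, and workgroup below are hypothetical.
athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString=(
        "SELECT order_id, total "
        "FROM sales_db.orders "
        "WHERE order_date = DATE '2022-11-30' "
        "LIMIT 10"
    ),
    QueryExecutionContext={"Database": "sales_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/queries/"},
    WorkGroup="primary",
)
print("Started query:", response["QueryExecutionId"])
```

From there, you would poll get_query_execution until the query succeeds and fetch rows with get_query_results.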
While we have made it really easy to leverage SQL on Athena, many of our customers
are increasingly using open-source frameworks
like Apache Spark. Apache Spark is one of the most
popular open-source frameworks for complex data processing, like regression testing,
or time series forecasting. Our customers regularly use Spark
to build distributed applications with expressive languages
like Python. However, our Athena customers told us that they want to perform this kind of complex data analysis using Apache Spark to build interactive applications, but they do not want to deal with all the infrastructure setup and keep up all these clusters for interactive analytics. They wanted the same ease of use
we gave them with SQL on Athena. That's why today I'm thrilled
to announce Amazon Athena for Apache Spark-- [applause] -- which allows you to start
running interactive analytics on Apache Spark
in just under one second. Amazon Athena for Apache Spark enables you to spin up
Spark workloads up to 75 times faster than other serverless
Spark offerings. You can also build Spark applications with a simplified notebook interface in the Athena console or using Athena APIs.
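As a rough illustration, this is the kind of PySpark cell you might run in one of those Athena notebook sessions; the spark session is provided by the notebook, and the S3 path and column names are hypothetical.

```python
# Illustrative PySpark cell for an Athena for Apache Spark notebook session.
# The notebook provides a ready-to-use SparkSession named `spark`;
# the S3 path and column names here are hypothetical.
df = spark.read.parquet("s3://my-data-lake/clickstream/2022/11/")

top_pages = (
    df.groupBy("event_date", "page")
      .count()
      .orderBy("count", ascending=False)
)
top_pages.show(10)
```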
Athena is deeply integrated with other AWS services like SageMaker and EMR, enabling you to query your data
from various sources and you can chain
these calculations together and visualize your results. And with Athena, there is
no infrastructure to manage and you only pay
for what you use. We are thrilled to bring
Apache Spark to our Athena customers,
but we are not stopping there. Just yesterday, we announced
Amazon Redshift Integration for Apache Spark, which makes it easier to run Spark applications on Redshift data from other AWS analytics services. This integration enables EMR applications to access Redshift data and run up to 10x faster compared to existing Redshift-Spark connectors. And with a fully certified Redshift connector, you can quickly run analytics and ML without compromising on security. With these new capabilities, AWS is the best place to run
Apache Spark in the cloud. Customers
can run Apache Spark on EMR, Glue, SageMaker, Redshift, and Athena
with our optimized Spark runtime, which is up to 3x faster
than open-source Spark.
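For a sense of what reading Redshift data from a Spark application looks like, here is a sketch using Spark's generic JDBC data source; the new integration layers an optimized, certified connector on top of this pattern, and the endpoint, credentials, and table below are hypothetical.

```python
# Sketch: loading a Redshift table into a Spark DataFrame over JDBC.
# The cluster endpoint, database, credentials, and table are hypothetical;
# the announced integration provides an optimized connector for this workflow.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:redshift://my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev")
    .option("dbtable", "public.orders")
    .option("user", "analytics_user")
    .option("password", "<redacted>")
    .option("driver", "com.amazon.redshift.jdbc42.Driver")
    .load()
)
orders.groupBy("region").count().show()
```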
We're pleased to bring these new integrations to our customers. So, we've discussed how critical
it is to have a variety of tools at your
fingertips when you need them. But these tools should also
include high performing services that enable you to grow your
businesses without any constraints. That brings me to the second element
of our future-proof data foundation, performance at scale. Your data foundation
should perform at scale across your data warehouses,
databases, and data lakes. You will need industry-
leading performance to handle inevitable
growth spurts in your business. You will need it when you want to quickly analyze and visualize your data, and you will need it to manage your costs without compromising on your capacity requirements. Our innovations have helped
our customers at scale, right from day one. And today, Amazon Aurora auto scales up to 128 terabytes per instance at 1/10 the cost of other legacy enterprise databases. DynamoDB processed more than
100 million requests a second across trillions of API
calls on Amazon Prime Day this year. With Amazon Redshift, tens of thousands of customers collectively process exabytes of data every day, with up to five times better price performance than other cloud data warehouses. Redshift also delivers up to
seven times better price performance on high-concurrency, low-latency workloads like dashboarding. And DocumentDB, our fully managed document database service, can automatically scale up to 64 terabytes of data per cluster with low latency, serving millions
of requests per second. Tens of thousands of AWS customers,
including Venmo, Liberty Mutual, and United Airlines
rely on Document DB to run their JSON
document workloads at scale. However, as our DocumentDB
customers experience growth, they have asked us for easier ways
to manage scale without performance impacts. For example, they said it's really difficult to handle throughput beyond the capacity of a single database node. They told us that scaling out, or sharding, their data sets across multiple database instances is really, really complex. You've got to actually build special
application logic for sharding, you've got to manage the capacity, and you've got to reshard your database live without any performance impact. In such a distributed setting, even routine tasks can become
increasingly cumbersome, as the application scales
across hundreds of instances. They also wanted the ability to
auto scale to petabytes of storage. And the only alternative options that exist either scale slowly or are really expensive. So they asked us for an easy button
to scale reads and writes. That's why I'm pleased to announce the general availability of
Amazon DocumentDB Elastic Clusters, a fully-managed solution
for document workloads of virtually
any size and scale. [applause] Elastic Clusters automatically scale to handle virtually
any number of reads and writes with petabytes
of storage in just minutes, with little to no downtime
or performance impact. You don't have to worry
about creating, removing, upgrading, or managing,
or scaling your instances. Elastic Clusters takes care of
all these underlying infrastructure. This solution will save developers
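To picture what that looks like from an application's point of view, here is a minimal sketch using a standard MongoDB-compatible driver; the cluster endpoint, credentials, database, and shard key are hypothetical, and the shardCollection step reflects how elastic clusters distribute a collection across shards as we understand it.

```python
from pymongo import MongoClient

# Sketch: talking to a DocumentDB Elastic Cluster with a standard MongoDB driver.
# The endpoint, credentials, database, and shard key below are hypothetical.
client = MongoClient(
    "mongodb://appuser:<password>@my-elastic-cluster.docdb-elastic.us-east-1.amazonaws.com:27017",
    tls=True,
)

# Elastic clusters spread data across shards; here we (hypothetically) shard the
# collection on a hashed customer_id before writing to it.
client.admin.command("shardCollection", "store.orders", key={"customer_id": "hashed"})

orders = client["store"]["orders"]
orders.insert_one({"customer_id": "c-1001", "total": 42.50})
print(orders.count_documents({"customer_id": "c-1001"}))
```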
This solution will save developers months of time building and configuring all
these custom scaling solutions. I am proud to share this
new capability with you today. That is just one example
of how we are helping you scale. In the leadership session today,
Jeff Carter, our VP of Database
Services and Migration Services, will explain
how fully-managed databases can help you build faster
and scale further than ever before. So, when our organizations are backed
by high performing services, they can deliver better
experiences than ever before. And we are helping
our customers perform at scale across a variety
of AWS data services. Netflix provides a quality customer
experience using S3 and VPC Flow Logs
to ingest terabytes of data per day, enabling them to respond
to events in real time across billions of traffic flows. Philips uses Amazon SageMaker to apply machine learning
to over 48 petabytes of data, making it easy for clinicians using
its digital health suite platform to identify at risk patients. And with AWS data services, Expedia is able to deliver
more scalable products and online experiences
to their travelers. But I won't steal their thunder. Let's welcome Rathi Murthy,
Expedia Group CTO and President of Expedia Product
and Technology. [music and applause] Good morning. I'm super excited to speak
to an audience that actually understands
the power of data. When Expedia Group started
almost 25 years ago, it disrupted the travel space. Online travel was truly
a groundbreaking innovation. Today, we connect over
168 million loyalty members, over 50,000 B2B partners
with over 3 million properties, 500 airlines, car rentals,
and cruise lines. Expedia is one of the world's
largest online travel companies powering travel
in over 70 countries. But at our core,
we are a technology company. We have gathered decades' worth of data on travel behaviors, booking patterns, traveler
preferences, and partner needs. When I joined Expedia Group
last year, I was super excited
to work for a company that brought together my passion
to lead technology, my love for travel together,
with customer centricity at its core. It felt like a perfect marriage
between technology, travel,
and transformation. Today, we are mastering the art
of transformation on two fronts, one, transforming
our own company, and two, transforming
the travel industry. Like many companies
in the room here today, Expedia Group
scaled through acquisitions. And as technologists,
we all know, this means multiple stacks,
and added complexity. And as you bring in more partners, you need to reconfigure, which can be
costly and time consuming. And like AWS, we are also
a customer-first company. We understand the power of data, and that data is key
to drive our innovation and our long-term success. And we've continued to invest
in our AI/ML to drive those great experiences
across our platform. Just to give you an idea
of our scale today, we process over 600 billion
AI predictions per year powered by over
70 petabytes of data. We also use AI/ML to run
over 360,000 permutations of one page
on one of our brand sites, which means that every time
a traveler comes to our site, they see what is
most relevant to them. To help us with this
massive transformation, we've been working with AWS
on a few fronts. One, helping us modernize
our infrastructure to stay highly available
for our travelers, by helping us migrate
our applications to a container-based solution by leveraging Amazon EKS
and Karpenter. Two, to help us render
relevant photos and reviews for our travelers
at sub-millisecond latency with over 99% accuracy by leveraging Amazon DB
and SageMaker. And last, but not the least, also helping us
self-serve our travelers by hosting
our conversation platform, which has powered over
29 million virtual conversations, saving us over
8 million agent hours. But before I continue,
let's just take a moment and think about
a truly great holiday. What made it great? Was it the places you visited, the people you were with,
the things you saw? When my children were seven
and five years old, we decided as a family that we would
visit a new country every year. This is a picture from one of our trips to Paris. It doesn't look like it. But yes, it was a family trip. What touched me most was when I read in their college essays that they learned more
from these trips about the world
and the culture and life than any textbook
had taught them thus far. Travel is so much more
than just a transaction. Some of our best memories
are from a trip away. And the best way to broaden
our understanding of the world is to actually go out
there and experience it. This is the reason I love being
a technologist working in travel, where we can innovate products that bring joy to so many people
all over the world. And… data
is our competitive advantage. And we want to leverage the immense
amount of data we've hosted on AWS to innovate products
and create those memories. Now, knowing when to book
a flight truly seems like dark art. Earlier this year, we launched
Price Tracking and Predictions. This uses machine learning
and our flight shopping data to map past trends
and future predictions for the prices
for your flight route so that you understand the best time
to book your flight with confidence. Equally, comparing hotel rooms
is also super complex. With our smart shopping,
we have the ability now to compare different hotels easily. We leverage AI to read through
billions of room descriptions and pull out attributes like room features, upgrades, and amenities, all together on one page, so you can easily compare different hotel types side by side and make the right choices. Every time a traveler interacts
with us, we collect more data,
our models become smarter, and our responses
become more personalized. 2022 was a transformative year
for us at Expedia Group. Earlier this year, we launched our Open World vision to power partners
of all sizes with the technology and supply needed to thrive
in the travel market, a first in the travel sector. At its core, it's truly rebuilding
a platform in an open way, in a way taking all
of our big capabilities, breaking it up into microservices
or small building blocks that are configurable,
extensible and externalizable so that we can accelerate anyone
in the travel business or even help someone
enter the travel market. So if you're an airline wanting to
expand its offerings with hotels, or if you're an influencer wanting to
make it easy for your followers to book that same amazing trip, we can provide you
with the building blocks to create everything
from the basic payment portal to the complete travel store. So just as we opened the world
to travel 25 years ago, we are now making travel
as a business more open and accessible to all. Thank you. [music playing] Thank you, Rathi. So as you saw with the Expedia story, when customers are backed by tools
that enable them to perform at scale, they can analyze their data
and innovate a lot faster, and all with less manual effort. This brings me to the third element
of a future-proof data foundation: removing heavy lifting. We are always looking for ways
to tackle our customers' pain points by reducing manual tasks through
automation and machine learning. For instance, DevOps Guru uses
machine learning to automatically detect and remediate database issues
before they even impact customers, while also saving database
administrators time and effort to debug the issues. Amazon S3 Intelligent-Tiering
reduces ongoing maintenance by automatically placing
infrequently accessed data into lower-cost storage classes, saving users up
to $750 million to date. And with Amazon SageMaker,
we are removing the heavy lifting associated with machine learning so that it's accessible
to many more developers. Now, let's take a closer
look at Amazon SageMaker. As I mentioned earlier,
SageMaker enables customers to build, train and deploy ML models
for virtually any use case, and with tools for every step of
your machine learning development. Tens of thousands of customers
are using SageMaker ML models to make more than a trillion
predictions every month. For example, Dow Jones & Company
created an ML model to predict the best time of day
to reach their customers of Wall Street
Journal, Barron's and Market Watch subscribers, improving their customer
engagement rate by up to 2x their previous strategies. Many of our customers are solving
complex problems with SageMaker by using the data to build ML models,
right from optimizing driving routes for rideshare apps
to accelerating drug discovery. Most of these models are built
with structured data, which is really well organized
and quantitative. However, according to Gartner,
80% of all new enterprise data is now unstructured
or semi-structured, including things like images
and handwritten notes. Preparing and labeling
unstructured data for ML is really, really complex and labor-intensive. For this type of data, we provide features like SageMaker
Ground Truth and Ground Truth Plus that helps you lower your costs
and make data labeling a lot easier. However, customers told us that
certain types of data are still too difficult to work with,
such as your geospatial data. Geospatial data can be used
for a wide variety of use cases, right from maximizing harvest yield
in agricultural farms, to sustainable urban development, to identifying a new location
for opening a retail store. However, accessing high-quality
geospatial data to train your ML models requires working with multiple data sources
and multiple vendors. And these data sets are typically
massive and unstructured, which means time-consuming data
preparation before you can even start writing a single line of code
to build your ML models. And tools for analyzing and
visualizing data are really limited, making it harder to uncover
relationships within your data. So not only is this
such a complicated process, but it requires such a steep learning
curve for your data scientists. So today, we are making it easier
for customers to unlock the value
of the geospatial data. I'm really excited to announce that
Amazon SageMaker now supports new geospatial
ML capabilities. [applause] With these capabilities, customers can access geospatial data on SageMaker from different data sources with just a few clicks.
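As a rough sketch of what that looks like programmatically, the geospatial capabilities are exposed through a dedicated SageMaker geospatial client; the call below simply lists the available raster data collections, and the response field names should be treated as assumptions.

```python
import boto3

# Rough sketch (treat names as assumptions): browsing the raster data
# collections, such as satellite imagery, exposed by the SageMaker
# geospatial capabilities.
geo = boto3.client("sagemaker-geospatial", region_name="us-west-2")

collections = geo.list_raster_data_collections()
for summary in collections.get("RasterDataCollectionSummaries", []):
    print(summary.get("Name"), summary.get("Arn"))
```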
To help you prepare your data, our purpose-built operations enable you to efficiently process
and enrich these large datasets. It also comes with
built-in visualization tools, enabling you to analyze your data
and explore model predictions on an interactive map
using 3D accelerated graphics. Finally, SageMaker also provides
built-in pre-trained neural nets to accelerate model building
for many common use cases. Now, let's see how it works. Please welcome Kumar Chellapilla,
our GM for ML and AI services at AWS, who will demonstrate these
new capabilities in action. [music playing] Thanks, Swami. Imagine a world where when
natural disasters such as floods, tornadoes,
and wildfires happen, we can mitigate
the damage in real time. With the latest advances
in machine learning, and readily available
satellite imagery, we can now achieve that. Today we have the ability to not
only forecast natural disasters, but also manage our response
using geospatial data to make life-saving decisions. In this demo, I'm going to take
on the role of a data scientist who's helping first responders with
relief efforts as the flood occurs. Using the geospatial capabilities
in Amazon SageMaker, I can predict dangerous
road conditions caused by rising water levels,
so that I can guide first responders on the optimal path
as they deliver aid, send emergency supplies
and evacuate people. In such a scenario,
I want to move as quickly as I can because every minute counts. I want to get people to safety. Without SageMaker,
it can take a few days to get access to data
about real-world conditions and even more time
to make predictions because the data is scattered
and difficult to visualize. And there's no efficient way
to train and deploy models. Now, let me dive into the demo, and show how to access
geospatial data, build and train a model
and make predictions using the new geospatial
capabilities in Amazon SageMaker. To build my model, I need
to access geospatial data, which is now readily
available in SageMaker. Instead of spending time
gathering data from disparate sources and vendors,
I simply select the data I need. In this case, I select open-source
satellite imagery from Sentinel-2
for the affected area. In order to understand
where the water spread, I apply land classification,
a built-in SageMaker model, which classifies the land
as having water or not. Looking at images before
and after the flood occurred, I can clearly see how the water
is spread across the entire region and where it caused
the most severe damage. Knowing where floodwaters
are spreading is super helpful. But I still need to zoom in to see
which roads are still there and help first responders
navigate safely. Next, I add high-resolution
satellite imagery from Planet Labs, one of the third-party data
providers in SageMaker. These visualizations allow me
to overlay the roads on the map so I can easily identify
which roads are underwater, and keep first responders up to date
as conditions unfold on the ground. Now that I understand my data,
I start making predictions. With SageMaker, I don't have
to spend weeks iterating on the best model
for my data. I simply select one of the
pre-trained models in SageMaker, in this case, road extraction, which makes it easy for me
to train the model on my data and send directions
to the first aid team. Once the model is ready,
I can start making predictions. In this case, the model
I built identifies which roads are still intact
and not underwater. Using the visualization
tools in SageMaker, I can view the predictions
in an interactive map so that I have full visibility
on what's happening on the ground. I can see that the red-colored
roads are flooded, but the green color roads are still
available and safe to drive on. Similar to satellite imagery
from Planet Labs, I can add point-of-interest data
from Foursquare to see where the nearest hospitals,
medical facilities and airports are. For example, I can see
that the airfield on the left is surrounded by water,
so I must use the temporary helipad or the international airport
on the right instead. With this information in hand, I can now give clear directions
within minutes so that they know the best path
for sending emergency aid, directing medical staff and routing
people out of the flood zone. We've covered flood path predictions. But SageMaker can support
many different industries. In fact, later today
during the AI/ML leadership session with Bratin Saha, you will hear how BMW uses
geospatial machine learning. As Swami mentioned,
it's not just automotive. Customers use
geospatial machine learning for a variety of use cases in retail, agriculture and urban planning, and the list goes on. We can't wait to hear what you
will do with geospatial data. Head over to the console today and try the new geospatial
capabilities in Amazon SageMaker. Thank you. [music playing] Thank you, Kumar. These types of innovations
demonstrate the enormous impact that data can have for
our customers and for the world. It's clear that data
is extremely powerful. And today it is critical to almost
every aspect of your organization, which means you need to put
the right safeguards in place to protect it from costly disruptions
and potential compromises. This brings me to the last element
of the future-proof data foundation: reliability and security. AWS has a long history of building
secure and reliable services
to help you protect your data. S3 was built to store your data
with 11 9s of durability, which means
you can store your data without worrying about backups
or device failures. Lake Formation helps you build
a secure data lake in just days with
fine-grained access control. And our core database services like DynamoDB, Aurora, and RDS were architected with multi-AZ capabilities to ensure seamless failovers in the unlikely event an AZ is disrupted, thereby protecting our customers'
mission-critical applications. But today, our customers' analytics applications on Redshift
are mission-critical as well. While our Redshift customers
have recovery capabilities like automated backups, and the ability to relocate
their cluster to another AZ in just minutes, they told us that sometimes
minutes are simply not enough. Our customers told us
they want their analytics applications to have the same
level of reliability that they have with their databases
like Aurora and Dynamo. I'm honored to introduce
Amazon Redshift multi-AZ, a new multi-AZ configuration that delivers the highest levels
of reliability. [applause] This new multi-AZ configuration
enhances availability for your analytics applications
with automated failover in the unlikely event
an AZ is disrupted. Redshift multi-AZ enables
your data warehouse to operate in multiple AZs simultaneously
and process reads and writes without the need
for an underutilized standby sitting idle in a separate AZ. That way, you can maximize your return on investment, with no application changes or other manual intervention required to maintain business continuity. But high availability is just
one aspect of a secure and reliable data foundation. We are making ongoing investments
to protect your data from the core to the perimeter. While these security mechanisms
are critical, we also believe
they should not slow you down. For example, let's take a look
at security as it relates to Postgres. Postgres on RDS and Aurora has become our fastest-growing engine. Developers love Postgres extensions because they enhance the functionality of their databases. And with thousands of them
in a managed database. However, extensions provide
super user access to your underlying file systems, which means they come with a huge
amount of organizational risk. That's why they must be
tested and certified to ensure they do not interfere
with the integrity of your database. This model is like imagine you're
building an impenetrable fortress only to leave the keys
on the front door. To solve this problem
for our customers, we have invested
in an open-source project that makes it easier to use certified
Postgres extension in our databases. Today, I'm excited to announce Trusted Language
Extensions for Postgres, a new open-source project that allows
developers to securely leverage Postgres extensions
on RDS and Aurora. [applause] These Trusted Language Extensions
help you safely leverage Postgres extensions to add
the data functionality you require for your use cases without waiting for
AWS certification. They also support popular
programming languages you know and love, like
JavaScript, Perl, and PL/pgSQL.
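To give a feel for the developer experience, here is a heavily hedged sketch of registering and installing a tiny trusted language extension with the open-source pg_tle project; the connection details and the sample extension are hypothetical, and the exact pgtle function signatures should be treated as assumptions.

```python
import psycopg2

# Rough sketch, assuming the pg_tle extension is enabled on an RDS or Aurora
# PostgreSQL instance; endpoint, credentials, and the sample extension are hypothetical.
conn = psycopg2.connect(
    host="my-aurora-cluster.cluster-abc123.us-east-1.rds.amazonaws.com",
    dbname="appdb",
    user="admin_user",
    password="<redacted>",
)
cur = conn.cursor()

# Register a tiny SQL-function extension as a trusted language extension,
# then install it like any other Postgres extension.
cur.execute("""
SELECT pgtle.install_extension(
  'hello_tle', '1.0', 'sample trusted language extension',
$_tle_$
  CREATE FUNCTION hello() RETURNS text AS $$ SELECT 'hello from a TLE'; $$ LANGUAGE sql;
$_tle_$
);
""")
cur.execute("CREATE EXTENSION hello_tle;")
cur.execute("SELECT hello();")
print(cur.fetchone()[0])
conn.commit()
```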
With this project, our customers can start innovating quickly without worrying about
unintended security impacts to their core databases. We will continue to bring value
to our customers with these types
of open-source tools while also making
ongoing contributions back to the open
source community. So now that we have talked about
protecting your data at the core, let's look at how we are helping
your customers protect their data
at the perimeter. When you leverage your
database services on AWS, you can rely on us to operate, manage and control
the security of the cloud, like the hardware, software
and networking layers. With our shared responsibility model,
our customers are responsible for managing the security
of their data in the cloud, including privacy controls
for your data, who has access to it,
how it's encrypted. While this model eases
a significant portion of the security burden
for our customers, it can still be
very difficult to monitor and protect against these
evolving security threats to your data year round. To make this easier
for our customers, we offer services
like Amazon GuardDuty, an intelligent
threat detection service that uses machine learning to monitor your AWS accounts
for various malicious activity. And now we are extending the same
threat detection service to our fastest-growing database. Built for Amazon Aurora,
I'm very excited to announce the preview
of GuardDuty RDS Protection... [applause] ...which provides intelligent
threat detection in just one click. GuardDuty RDS Protection
leverages ML to identify potential threats
like access attacks for your data stored
in Amazon Aurora. It also delivers detailed
security findings so you can quickly locate
where the event occurred and what type of activity
took place. And all this information
is consolidated at an enterprise level for you. Now that we have explored the elements of a future-proof data foundation, we will dive deep into how you can connect the dots across your data stores. The ability to connect your data is as instrumental as the foundation that supports it. For the second element
of a strong data strategy, you will need a set of solutions that help you weave the connective
tissue across your organization from automated data pathways
to data governance tools. Not only should this connective
tissue integrate your data, but it should also integrate
your organization's departments,
teams and individuals. To explain the importance
of this connective tissue, I wanted to share an analogy
that is really close to my heart. This is a picture of a Jingkieng Jri
in the northeastern part of India. It is a living bridge
made of elastic tree roots in the state of Meghalaya. These bridges are built
by the Khasi tribe, indigenous farmers and hunters
who trek through dense valleys and river systems
just to reach nearby towns. Every year, the monsoon season
means the forest rivers become almost impassable,
further isolating their villages that are sitting on top
of the foothills of Himalayas. That is until these living bridges
came to be. So you might be asking yourself, why is Swami talking about
these ancient root bridges when he is supposed
to be talking about my data? Well, I wanted to share this story
because we can apply many valuable engineering lessons
from the Khasi on how we can build connective
tissue with our data stores. First, they use quality tools
that enable growth over time. The Khasi built the structures
with durable root systems that were able to withstand some of
the heaviest rainfall in the world, and these bridges can last up
to 500 years by attaching and growing
within their environment. Similarly, your connective tissue
needs both quality tools and quality data
to fuel long-term growth. Second, they leveraged a governance
system of cooperation. Over a period of decades,
and sometimes even centuries, tribal members cooperated
and shared the duty of pulling
these elastic roots one by one, until a passable bridge was formed. With data, governance enables
safe passage for disconnected teams and disconnected data stores
so your organizations can collaborate
and act on your data. And finally, they created strong
pathways to their vital resources. These bridges protected
the region's agricultural livelihood by providing a pathway
from remote villages to nearby towns. The Khasi were engineers
of connection because their success
depended on it. Today, one of the most
valuable assets in our organization is connected data stores. Connectivity, which drives ongoing innovation, is also critical for our survival as an organization. Now let's revisit the importance
of using high-quality tools and high-quality data
to enable future growth. When our customers want
to connect their structured and unstructured data
for analytics and machine learning, they typically use a data lake. Hundreds of thousands of data
lakes run on AWS today, leveraging services
like S3, Lake Formation, and AWS Glue,
our data integration service. Bringing all this data together
can help you gather really rich insights,
but only if you have quality data. Without it, your data lake
can quickly become a data swamp. To closely monitor
the quality of your data, you need to set up quality rules. And customers told us building these data quality rules
across data lakes and their data pipelines
is very, very time consuming, and very error prone
with a lot of trials and errors. It takes days, if not weeks
for engineers to identify and implement them, plus additional time needs to be
invested for ongoing maintenance. They asked for a simple and automated
way to manage the data quality. To help our customers do this, I'm pleased to share
the preview of AWS Glue Data Quality, a new feature of AWS Glue. [applause] Glue Data Quality helps you
build confidence in your data so that you can make
data-driven decisions every day. Engineers can generate automated
rules for specific data sets in just hours, not days, increasing the freshness
and accuracy of your data. Rules can also be applied
to your data pipelines. So poor quality data does not
even make it to your data lakes
in the first place. And if your data quality
deteriorates for any reason, Glue Data Quality alerts you
so you can take action right away. Now, with high-quality data, you will be able to connect the dots
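For illustration, here is a minimal sketch of defining such rules with boto3 against a hypothetical Glue Data Catalog table; the rule expressions use Glue Data Quality's rule language, and all of the names are placeholders.

```python
import boto3

# Sketch: creating a data quality ruleset for a (hypothetical) catalog table
# named "orders" in the "sales_db" database, using DQDL rule expressions.
glue = boto3.client("glue")

glue.create_data_quality_ruleset(
    Name="orders-quality-rules",
    TargetTable={"DatabaseName": "sales_db", "TableName": "orders"},
    Ruleset='Rules = [ IsComplete "order_id", ColumnValues "total" > 0, RowCount > 1000 ]',
)
```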
with precision and accuracy. But you also need to ensure that the right individuals
within your organization are able to access this data
so you can collaborate and make these connections happen. This brings me to the second lesson
we learned from the Khasi, creating a system of governance to unleash innovation
within your organization. Governance was historically viewed
as a defensive measure, which meant really
locking down your data silos. But in reality, the right
governance strategy helps you move and innovate faster
with well defined guardrails that give the right people
access to the data when and where they need it. As the amount of data
rapidly expands, our customers want
an end-to-end strategy that enables them
to govern their data across their entire data journey. They also want to make it
easier to collaborate and share their data while
maintaining quality and security. But creating the right
governance controls can be complex
and time-consuming. That's why we are reducing
the amount of manual efforts required to properly govern
all of your data stores. As I mentioned earlier, one of the ways we do this today
is through Lake Formation, which helps you govern and audit
your data lakes on S3. Last year, we announced new row-
and cell-level permissions that help you protect your data by giving users access to the data
they need to perform their job. But end-to-end governance
doesn't just stop with data lakes. You also need to address data
access and privileges across more of our customers'
use cases. Figuring out which data
consumers in your organization have access to what data
can itself be time-consuming. From manually investigating
data clusters to see who has access to designating
user roles with custom code, there is really simply
too much heavy lifting involved. And failure to create these types
of safety mechanisms can mean unnecessary exposure,
or quality issues. Our customers told us they want
an easier way to govern access and privileges
with more of our data services, including Amazon Redshift. So today, I'm pleased to introduce
a new feature in Redshift Data Sharing,
Centralized Access Controls that allow you to govern
your Redshift data shares using the Lake Formation console. [applause] With this new feature
in Redshift Data Sharing, you can easily manage access
for data consumers across
your entire organization from one centralized console. Using the Lake Formation console,
you can designate user access without complex querying
or manually identifying who has access
to what specific data. This feature also improves
the security of data by enabling admins
to granular role level and cell level access
within Lake Formation. Now, Centralized Access Controls
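As a rough sketch of the underlying permission model, here is what a Lake Formation grant looks like through boto3; the account, role, database, and table names are hypothetical, and the console flow described above layers the Redshift data-share governance on top of this kind of grant.

```python
import boto3

# Sketch: granting a consumer role SELECT access to a (hypothetical) table
# through Lake Formation's centralized permission model.
lakeformation = boto3.client("lakeformation")

lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/MarketingAnalyst"
    },
    Resource={"Table": {"DatabaseName": "sales_share_db", "Name": "orders"}},
    Permissions=["SELECT"],
)
```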
Now, Centralized Access Controls are critical to helping users access
siloed data sets in a governed way. One of the key elements
of an end-to-end data strategy is machine learning, which is really
critical for governance as well. Today, more companies are
adopting ML for their applications. But governing this end-to-end process
for ML presents a unique set of challenges
very specific to ML, like onboarding users
and monitoring ML models. Because ML model building requires
collaboration among many users, including data scientists
and data engineers, setting up permissions requires time-consuming customized
policy creation for each user group. It's also challenging to capture and share model information
with other users in one location, which can lead to inconsistencies
and delays in approval workflows. And finally, custom instrumentation
is needed to gain visibility into the model performance,
and that can be really expensive. To address this for our customers,
we are bringing you three new machine-learning governance
capabilities for Amazon SageMaker, including SageMaker Role Manager,
Model Cards and Model Dashboards. [applause] These are really powerful
governance capabilities that will help you build
ML governance responsibly. To address permission sharing, Role Manager helps you define
minimum permissions for users in just minutes
with automated policy creation
for your specific needs. To centralize the ML
model documentation, Model Cards create
a single source of truth throughout your entire
ML model lifecycle and auto-populate model
training details to accelerate your
documentation process. And after your models are deployed, Model Dashboard increases visibility with unified monitoring of the performance of your ML models.
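To give one concrete flavor of this, here is a hedged sketch of registering a model card through boto3; the model card name and the content keys are assumptions used purely for illustration.

```python
import boto3
import json

# Rough sketch (names and content keys are assumptions): registering a model
# card so that model documentation lives in one governed place.
sagemaker = boto3.client("sagemaker")

card_content = {
    "model_overview": {
        "model_description": "Hypothetical demand-forecasting model for daily orders."
    }
}

sagemaker.create_model_card(
    ModelCardName="demand-forecast-card",
    Content=json.dumps(card_content),
    ModelCardStatus="Draft",
)
```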
With all these updates, we have now covered governance for your data lakes, data warehouses, and machine learning. But for true end-to-end governance, you will need to manage data access
across all of your services, which is the future state
we are building towards. As Adam announced yesterday--
[applause] -- we are launching Amazon DataZone,
a data management service that helps, catalog discover,
analyze, share, and govern data
across your organization. DataZone helps you analyze
more of your data, not just what's in AWS,
but also third-party data services while meeting your security
and data privacy requirements. I have had the benefit of being
an early customer of DataZone. I leverage DataZone to run the AWS
weekly business review meeting, where we assemble data
from our sales pipeline and revenue projections
to inform our business strategy. Now to show you DataZone in action, let's welcome our Head of Product
for Amazon DataZone, Shikha Verma, to demonstrate how quickly
you can enable your organization to access and act
on your data. [music playing] Thanks, Swami. Wow! It's great to see you
all out here. I am so excited to tell all
the data people over here. Now you don't have to choose
between agility and getting to
the data you need, and governance to make sure
you can share the data across your enterprise. You can get both. We have built Amazon DataZone
to make it easy for you to catalog, organize, share, and analyze your data
across your entire enterprise with the confidence
of the right governance around it. As we know, every enterprise
is made up of multiple teams that own and use data
across a variety of data stores. And to do their job, data people,
like data analysts, engineers, and scientists have to pull
this data together but do not have an easy way to access
or even have visibility to this data. Amazon DataZone fills this gap. It provides a unified environment, a zone, where everybody in
your organization from data producers to consumers can go
to access, share, and consume data
in a governed manner. Let's jump into how this works. I'm going to use a very
typical scenario that we see across
our customers. This may seem familiar
to many of you. In this scenario, a product
marketing team wants to run campaigns
to drive product adoption. Sounds familiar? To do this, they need to analyze
a variety of data points, including data that they have
in the data warehouse in Redshift, data that they have in their data
lake around their marketing campaigns, as well as third-party sources
like Salesforce. In this scenario,
Julia is a data engineer. She is a rock star. She knows the data in and out, and often gets requests
to share the data in the data lake with other users. To share this more securely
with a variety of users across her enterprise, she wants to catalog
and publish it in Amazon DataZone. She is our data producer. And Marina is a rock star
marketing analyst. She's a campaign expert
who wants to use the data in the data lake
to run the marketing campaigns. She is our data consumer. Let's see how Amazon DataZone
helps them connect. Let's start with Julia and see how
she publishes data into the DataZone. She logs into the DataZone portal
using her corporate credentials. She knows the data sources
that she wants to make available, so she creates an automated
sales publishing job. She provides a quick name
and description, selects a publishing agreement, which is essentially like a data
contract that tells the consumers how frequently
she'll keep this data updated, how to get access, who will authorize
access and things like that. She then selects the data sources,
and the specific tables, and the columns that she wants
to make available in Amazon DataZone. She also sets the frequency
of how quickly this data will be kept into sync. Within a few minutes,
the sales pipeline data and the campaign data
from the data lake will be available in Amazon DataZone. Now, Julia has the option to enrich
the metadata and add useful information to it so that data consumers like Marina
can easily find it. She adds a description,
additional context, any other information that would
make this data easier to find. We also know that for large datasets, adding and curating all of this
information manually is laborious, time-consuming
and even impossible. So, we are making this much easier
for you. [applause] Thank you. We are building machine
learning models to automatically generate
business names for you. And then, Julia will have the option
at a column level to select the recommendation that we came up
with or edit it as you see please. How awesome is that? [applause] Thank you. I think so too. Once Julia has created this
particular data asset in DataZone, she wants to make it available to the data consumers. In this scenario,
since Julia is a data expert, and she knows this data very well,
she also functions as a data steward. And she could publish
this directly into the DataZone. But we also know that many of you
have set up data governance frameworks or want to set up data
governance frameworks, where you want to have
business owners and data stewards managing your domain
the way you'd like to. For this, we also have that option. Now that the data is published
and available in Amazon DataZone, Marina can easily find it. Let's see how easy this is. Marina goes back to Amazon DataZone, logs in using her
corporate credentials, uses a search panel
to search for sales. A list of relevant assets is returned and she learns more about the data
and where it comes from. She can see a bunch of domains
in there. She sees sales, marketing, finance. She can also see that there is data
from all kinds of sources. She notices Redshift, data lake,
and Salesforce. And you also saw that there was
a variety of assets that she could have sorted
the search results on. It's really easy peasy. Now, to perform the campaign
analysis, Marina wants to work with
a few of her team members, because they want
the same access as her. So now, she creates a data project. Creating a data project is a really
easy way for her to create a project where she wants
the team members to collaborate with, they will get the same access that
she wants, to the right datasets, as well as the right tools such as
Athena, Redshift, or QuickSight. Marina knows the data she's after,
so she subscribes to it or gets access to it
using the identity of the project. And after this, any of our team
members can use the deep links
available in Amazon DataZone to get to the tools that they want. Using the deep links, they can
get to the service directly without any additional configuration
or individual permissions. In this particular case,
they choose Athena. And now Marina and her team
members can query the data that Julia wanted
to make available for them using the project context
and using the tools that they wanted. So, I know this went by quick, but hopefully,
you can see how easy this is. And the entire data discovery,
access, and usage lifecycle is
happening through Amazon DataZone. You get complete visibility
into who is sharing the data, what data sets is being shared,
and who authorized it. Essentially, Amazon DataZone
gives your data people the freedom
that they always wanted, but with the confidence
of the right governance around it. As Adam mentioned yesterday,
there is really nothing else like it. So, I can't wait to see how you use
it and come find out more in our dedicated
breakout session later today. Thank you. [music] Thank you Shikha, it's really
exciting to see how easy it is for customers to locate the data
and collaborate with DataZone. We'll continue to make it even easier for customers to govern their data with this new service. We're just getting started. So, I shared how governance can help weave
the connective tissue by managing data sharing
and collaboration across individuals
within your organization. But how do you weave a connective
tissue within your data systems to mitigate data sprawl
and derive meaningful insights? This brings me back to the
third lesson from the Khasi's living bridges: driving data connectivity for innovation and, ultimately, survival. Typically, connecting data across
silos requires complex ETL pipelines. And every time you want to ask
a different question of your data, or you want to build a different
machine learning model, you need to create yet another data pipeline. This level of manual integration is simply not fast enough to keep up
with the dynamic nature of data and the speed at which
you want your business to move. Data integration needs
to be more seamless. To make this easier, AWS
is investing in a zero-ETL future where you never have to
manually build a data pipeline again. [applause] Thank you. We have been making strides
in the zero-ETL future for several years
by deepening integrations between our services that help
you perform analytics and machine learning without the need
for you to move your data. We provided direct integration
with our AWS streaming services, so you can analyze your data
as soon as it's produced and gather timely insights
to capitalize on new opportunities. We have integrated SageMaker with
our databases and data warehouses so you can leverage
your data for machine learning without having
to build data pipelines or write a single line
of ML code. And with federated querying
on Redshift and Athena, customers can now
run predictive analytics across data stored
in operational databases, data warehouses, and data lakes
without any data movement. While Federated Query
is a really powerful tool, querying and analyzing data
stored in really different locations isn't optimized
for maximum performance when compared
to traditional ETL methods. That's why this week
we are making it easier for you to leverage your data without creating and managing ETL pipelines. Yesterday, we announced that Aurora now supports zero-ETL
integration with Amazon Redshift, thereby bringing
your transactional data sitting in Aurora and the analytics capabilities of Redshift together. This new integration is
already helping customers like Adobe to spend less time
building Redshift ETL pipelines and more time gathering insights to enhance their core services like Adobe Acrobat. We are also removing the heavy
lifting from ETL pipeline creation for customers who want to move data
between S3 and Redshift. For example, imagine you're
an online retailer trying to ingest terabytes
of customer data from S3 into Redshift
every day to quickly analyze how your shoppers are interacting
with your site and your application, and how they are making
these purchasing choices. While this typically requires
creation of ETL pipelines, what if you had the option to
automatically and continuously copy all of your data
with a single command? Would you take it? Today, I'm excited to announce
Amazon Redshift now supports auto copy
from S3 to make it easier
to continuously ingest your data. With this update, now customers
can easily create and maintain simple data pipelines
for continuous ingestion. Ingestion rules are
automatically triggered when new files land
in your S3 bucket, without relying on custom solutions
or managing third-party services. This integration also makes it easy
for analysts to automate data loading without any dependencies
on your critical data engineers.
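(Here is a rough sketch of what that single-command setup could look like. The bucket, table, role, and job names are hypothetical, and the COPY JOB clause is an approximation of the auto-copy syntax, which may differ in the released service.)

```python
import boto3

# Hypothetical identifiers; the COPY JOB syntax below is an approximation.
client = boto3.client("redshift-data", region_name="us-east-1")

auto_copy_sql = """
COPY public.clickstream
FROM 's3://my-retail-bucket/clickstream/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS JSON 'auto'
JOB CREATE clickstream_auto_copy
AUTO ON;
"""

client.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="dev",
    DbUser="admin",
    Sql=auto_copy_sql,
)
# From here on, new files landing under the S3 prefix are ingested automatically.
```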
With these updates I have shared today,
integration with Redshift, auto copying from S3, as well as integration
of Apache Spark with Redshift, we are making it easy for you
to analyze all of your data with Redshift, no matter where it resides. And I didn't even cover all of our
latest innovations in this space. To learn more, make sure to attend
this afternoon's leadership session with G2 Krishnamoorthy,
our VP of AWS Analytics. With our zero-ETL mission,
we are tackling the problem of data sprawl by making it easier for you
to connect to your data sources. But in order for this to work, you can't have connections just
to some of your data sources. You need to be able to seamlessly
connect to all of them, whether they live in AWS or in external
third-party applications. That's why we are heavily investing in bringing your data
sources together. For example, you can stream data
in real-time from more than 20 AWS and third-party sources
with Kinesis Data Firehose, a fully managed serverless solution
that enables customers to automatically stream
the data into S3, Redshift,
OpenSearch, Splunk, Sumo Logic, and many more with
just a few clicks.
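(As a minimal sketch of the producer side, the snippet below sends a single JSON event to an existing Firehose delivery stream. The stream name and event fields are hypothetical, and the stream is assumed to already have S3, Redshift, or OpenSearch configured as its destination.)

```python
import json
import boto3

# Hypothetical delivery stream, assumed to already exist with a destination configured.
firehose = boto3.client("firehose", region_name="us-east-1")

event = {"user_id": 42, "action": "add_to_cart", "ts": "2022-11-30T09:15:00Z"}

firehose.put_record(
    DeliveryStreamName="clickstream-to-s3",
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)
# Firehose handles buffering, batching, retries, and delivery to the destination.
```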
Amazon SageMaker Data Wrangler, our no-code visual data prep tool for machine learning, makes it easy to import data from a wide variety of data sources
for building your ML models. And Amazon AppFlow, our no-code
fully-managed integration service offers connectors to easily
move your data between your cloud-based
SaaS services and your data lakes and data warehouses. Because these connectors are
fully-managed and supported by us, you can spend less time building and maintaining these connections
between your data stores and more time maximizing
business value with your data.
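(For teams that want to drive those managed connectors programmatically rather than through the console, here is a minimal sketch using the AppFlow API. The flow name is hypothetical and assumed to have been set up already, for example a Salesforce-to-S3 flow.)

```python
import boto3

appflow = boto3.client("appflow", region_name="us-east-1")

# List the flows already defined in the account.
for flow in appflow.list_flows()["flows"]:
    print(flow["flowName"], flow["flowStatus"])

# Trigger an on-demand run of a hypothetical, pre-configured flow.
appflow.start_flow(flowName="salesforce-accounts-to-s3")
```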
Our customers tell us they love the no-code approach to our connector library. However, as expected,
they have continued to ask for even more connectors to help them
bring their data sources together. That's why today I'm pleased
to share the release of 22 new AppFlow connectors
including popular marketing sources like LinkedIn Ads
and Google Ads. With this update,
our AppFlow library now has more than 50 connectors in total
from data sources like S3, Redshift, and Snowflake, to cloud-based
application services like Salesforce, SAP,
and Google Analytics. In addition to offering
new connectors in AppFlow, we are also doing the same
for Data Wrangler in SageMaker. While SageMaker already supports
popular data sources like Databricks
and Snowflake, today, we are bringing you more than
40 new connectors through SageMaker
Data Wrangler, allowing you to import even more of your data for ML model building
and training. With access to all of
these data sources, you can realize the full value of
your data across your SaaS services. Now, looking across
all of our services, AWS connects to
hundreds of data sources, including SaaS applications,
on-premises systems, and other clouds so you can leverage
the power of all of your data. We are thrilled to introduce
these new capabilities that make it easier to connect
and act on your data. Now, to demonstrate the power
of bringing all your data together to uncover
actionable insights, let's welcome Ana Berg Asberg, Global Vice President R&D IT
at AstraZeneca. [music playing] Good morning. I know it's really early,
but I need your help. Can I ask you to raise your hand if you or any of your loved ones
have been impacted by lung disease? Keep them up and now add
to those hands if you or anyone you know
has been impacted by heart failure
or heart disease. And add to those hands
if you know anyone or any of your loved ones
has been impacted by cancer. Look around. These touch so many of us. It's important. You can take the hands down. Thank you very much. We at AstraZeneca, a global
biopharmaceutical company, are breaking the boundaries
of science to deliver life-changing
medicines. We use data, AI, and ML with
the ambition to eliminate cancer as a cause of death and protect the lives of patients
with heart failure or lung diseases. In order to understand how we are
breaking the boundaries of science, we need to zoom in and start
really small with the genome, the transcriptome,
the proteome, the metabolome. Say that fast with a Swedish accent
three times, it's quite hard. The genome is the complete
set of our DNA in every single cell
in the body. It contains a copy
of the 3 billion DNA base pairs. Mapping the genome uncovers new
insights into disease biology and helps us discover new
disease therapies. Today, our Center of Genomics Research
is on track to analyze up to 2 million
whole genomes by 2026. The scale of our genome
database is massive. And it's really hard to manage
a database at that scale, but we do it together with AWS. Together, we have moved 25 petabytes
of data across the AWS global network. We process whole genomes
across multiple regions, generating 75 petabytes of data
in the intermediate process. At a high level,
we use AWS Step Functions and AWS Lambda for orchestration, AWS Batch to provision
optimal compute, and Amazon S3 for storage.
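(The sketch below is illustrative only and is not AstraZeneca's actual workflow; it shows the general pattern just described, with a Step Functions state machine submitting an AWS Batch job per genome sample. All names, queues, and ARNs are hypothetical.)

```python
import json
import boto3

# Illustrative orchestration pattern: Step Functions runs a Batch job and waits
# for it to finish; the job itself writes its results to S3.
definition = {
    "StartAt": "ProcessGenome",
    "States": {
        "ProcessGenome": {
            "Type": "Task",
            "Resource": "arn:aws:states:::batch:submitJob.sync",
            "Parameters": {
                "JobName": "genome-sample",
                "JobQueue": "arn:aws:batch:us-east-1:123456789012:job-queue/genomics",
                "JobDefinition": "arn:aws:batch:us-east-1:123456789012:job-definition/align-and-call:1",
                "Parameters": {"SampleUri.$": "$.sample_s3_uri"},
            },
            "End": True,
        }
    },
}

sfn = boto3.client("stepfunctions", region_name="us-east-1")
sfn.create_state_machine(
    name="genomics-batch-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsBatchRole",
)
```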
So, the list is important, we all know that, but the impact
is so much more critical. We can now run 110 billion
statistical tests in under 30 hours, helping us provide genetic
input to our AstraZeneca projects. Genomics gives us the DNA blueprint. But as you know, it's not
the only ome. Beyond the genome are largely
untapped repositories of rich data that, if connected, could
give us valuable insights, and we bring them together
with AWS into multi-omics. We bring the multi-omics data
together and make it available to mine for actionable insights
by the scientists. Having the bandwidth to process and maintain the multi-omics data
gives us the possibility
to take a step back. We add to
the understanding of disease by looking small
at the data at hand,
and patient data, and we pull it together
to detect patterns. For example, in lung cancer studies, we need to measure the tumor scans,
the CT scans, and we use a deep learning
technology similar to what self-driving cars use to understand
the 3D environment around them. Today, we use this
in the clinical trials, but in the future, this technology
could be used to inform how doctors make treatment decisions
with the prediction of what's going to happen next. As you can imagine, the quantity of the data at hand
has grown exponentially. And together with AWS,
we are accelerating the pace at which a scientist
can unlock patterns by democratizing
ML using Amazon SageMaker. We use AWS Service Catalog
to stand up templated end-to-end MLOps
environments in minutes. And we take every single step
with extra care as we're managing patient data
in a highly regulated industry. We can now run hundreds of concurrent
data science and ML projects to form insights into science. So, we looked small
at the multi-omics data, we looked at the data at hand, but one of the most exciting
advancements in the industry right now is that patients can choose in
clinical trials to share the data with us
from their own homes. Today, digital technology is able
to collect the data from the patient's home on a daily
or even continuous basis. And the data collected is as reliable
as data that could only be collected
in clinical settings before. The data adds value and enables
us to collect data from underdeveloped regions
and remote locations. This moves us toward early diagnosis
and disease prediction for all people, because our future
depends on healthy people, a healthy society,
and a healthy planet. AWS helps us to pull the data
together, the multi-omics, the data at hand
with the medical images, the tumor scans,
the remote data collection, and helps us to accelerate insights
to science through data, AI, and ML. Today, I raised my hand
at the beginning. I've been impacted by cancer.
I lost my father in 2018. This is my father and my mother
in the year before he passed. And he reminds me every day
that every data point we handle is a patient, a loved one. I work at AstraZeneca
and with my thousands of colleagues so you can spend every day
possible with your loved ones. And it's my privilege to do so. Thank you. [music playing] Wow, what a heartfelt
and inspirational story. Thank you, Ana. I'm truly amazed by how AstraZeneca was able to democratize data
and machine learning to enable these types
of innovation in healthcare. This brings me to the third and final
element of a strong data strategy, democratizing data. Since I joined Amazon 17 years ago, I have seen how data
can spur innovation at all levels, right from being an intern
to a product manager, to a business analyst
with no technical expertise. But all of this can happen only
if you enable more employees to understand
and make sense of data. With a workforce that is trained
to organize, analyze, visualize
and derive insights from your data, you can cast a wider net
for your innovation. To accomplish this, you will need
access to educated talent to fill the growing number
of data and ML roles. You need professional development
programs for your current employees. And you need no code tools
that enable non-technical employees to do more with your data. Now, let's look at how AWS
is preparing students, the future backbone
of our data industry to implement these types of solutions
we have discussed here today. This may surprise some of you
but I grew up in the outskirts of the southern part of India,
outside the city, where we had one computer
for the entire high school. Since I didn't come
from an affluent family, I learned to code on this computer with only 10 minutes of access
every week and I was fascinated. But you don't have to grow up
in a rural Indian village to experience limited access
to computer science education. It's happening here every day
in the United States. In fact, the U.S.
graduates only 54,000 CS students each year and that is the dominant pathway
to roles in AI and ML, yet the AI workforce is expected
to add 1 million jobs by 2029. This creates quite the gap, and the graduation pipeline
is further hindered by a lack of diversity. This is where community colleges
and minority-serving institutions can really help. They are the critical access point
to higher education in the U.S. with more than 4 million
students enrolled just last year. While data and ML programs are
available in many universities, they are really limited
in community colleges and MSIs where lower-income and underserved students
are more likely to enroll. And the faculty members
with limited resources simply cannot keep up
with the skills necessary to teach data management, AI, and ML. If we want to educate the next
generation of data developers, then we need to make it easy
for educators to do their jobs; we need to train the trainers. That's why today I'm personally
very proud to announce a new educator program
for community colleges and MSIs through AWS and MLU. [applause] This new train-the-trainer program
includes the same content we use to train Amazon engineers,
as well as the coursework we currently offer
to institutions like UC Berkeley. Faculty can access free
compute capacity, guided curriculum, and ongoing support
from tenured science educators. With all these resources,
educators are now equipped to provide students with AI/ML
courses, certificates, and degrees. We have currently onboarded
25 educators from 22 U.S. community colleges and MSIs. And in 2023, we expect to train
an additional 350 educators from up to 50 community colleges
across the United States. We were able to bring an early
version of this program to Houston Community College. Our team worked with HCC
to create a tailored sequence of content for their students. And now, they are the first
community college to have this coursework accepted
as a full bachelor's degree. [applause] With continued feedback
from educators, we will continue to remove barriers
they face in this arena. My vision is that AWS
will democratize access to data education programs, just like we do across
our organizations, and we are making progress. We are building on years
of programmatic efforts to make student data
education more accessible. Last year, we announced AWS AI
and ML scholarship programs to provide $10 million to underserved
and underrepresented students, and we have awarded 2,000
scholarships to date. We also provided students with hands-on training opportunities
with AWS Academy, SageMaker Studio Lab,
and AWS DeepRacer, our 1/18th-scale race car driven
by reinforcement learning. I hope that these programs
can enable students to create sparks of their own,
just like I did. So democratizing access
to education through data and ML programs
is really critical. But it's clear, we won't be able
to fill this skills gap through student
education alone. That's why in addition to educating those entering the workforce,
organizations must also focus on how to leverage
their existing talent pool to support the future growth. Through our training programs,
we are enabling organizations to build data literacy
through ML tools, classroom training,
and certifications. As I mentioned, AWS DeepRacer
helps us train students on ML through reinforcement learning. But DeepRacer is not just
for students. In fact, more than 310,000 developers
from over 150 countries have been educated on ML
with AWS DeepRacer. It continues to be the fastest way
to get hands-on with ML, literally. In addition, we now offer customers more than 150
professional development courses related to data analytics and ML with 18 new courses
launched in 2022 and we'll continue to add more. Now, while closing the cloud
skills gap is critical, not every employee
needs to have the technical expertise to drive data-driven innovation. In fact, you need individuals
in your organization without coding experience to help
you connect the dots with your data. That's why we provide low-code
and no-code tools that help data analysts
and marketers, typically known as your data
consumers, to visualize and derive insights from your data. QuickSight is our ML-powered
BI solution that allows users to connect
to data sources like S3, Redshift,
or Athena, and create interactive
dashboards in just minutes. We are continuing to add
new capabilities in QuickSight at a rapid clip, with more than 80 new
features introduced in the past year alone. And this week, Adam touched
on a new capability called QuickSight Paginated Reports, which makes it easier
for customers who use multiple reporting systems to
create print-friendly, highly formatted
reports in QuickSight. He also shared new features
for QuickSight Q, which allows users to query
the data in plain language without writing
a single line of code. With these new capabilities,
business users can ask "why" questions to better understand the factors that are impacting
their underlying data trends. They can also forecast metrics by asking something like "Forecast
sales for the next 12 months" and get an immediate response based on information
like your past data and seasonality. With Amazon QuickSight, you can
enable more employees to create
and distribute insights. And now more than 100,000 customers
use QuickSight to help them act on data. For example, Best Western,
a global hotel chain, uses QuickSight to share data with 23,000
hotel managers and employees across more than
4,600 properties, enabling them to elevate
their guest experience and drive
ongoing business value. Another tool we offer in this arena
is SageMaker Canvas, a no-code interface to build
ML models with your data. Using Canvas, analysts can import
data from various sources, automatically prepare data,
and build and analyze ML models with
just a few clicks. And we are continuing to invest
in low-code and no-code tools with features that
enhance collaboration across technical
and non-technical roles. With all these services,
access is no longer relegated to just one department
in your organization. If you want to expand the number
of ideas within your organization, you have to expand across
different types of employees so that sparks
can come from anywhere. Let's see how one customer,
Warner Brothers Games did just that. The current landscape of gaming
is way more free-to-play, which means that
the data processing needs just grow
and grow over time. Warner Brothers Games
has worked with AWS since 2014. Because we work with AWS, it's meant our business
could scale easily. Peak volumes on a launch day, we pull in about
3 billion events per day. There's no reason to guess or just go
purely off gut instinct anymore. It's data. Data drives all the decisions. AWS is such an important partner because we don't have
to worry about scale. We've tested up to 300,000 data
events a second. We know when we launch a game
it's not going to fall down on us. A specific example of how we use data to influence
our strategy is MultiVersus. MultiVersus is a 2v2 brawler game featuring all the best characters
from Warner Brothers. People know these characters,
they love these characters. And if the design team
doesn't nail the visceral feel
of these characters, it's going to show up
in the data. We can find that through the data, see how many people
are getting impacted, and then propose solutions
that will make the game better. One of the biggest lightbulb moments that I encountered is bringing
our analytics data back into our partner's line
of business tools, whether it's a game designer,
designing a character or a world and seeing telemetry around
how that world behaves, or bringing data into our CRM systems so folks that are marketing
and interacting with our players can see what their experience
with us has been historically. Anytime that we make a suggestion
and that's changed in the game, I know why that change happened
and what drove that decision-making, making the game better
for the players. That's what it's really all about. So, the three elements
of a modern data strategy I shared with you this morning: building future-proof
data foundations, weaving connective tissue, and democratizing data
across your organization. All of them play a critical role in helping you
do more with your data. But if I can leave you
with only one thing today, please remember, it's individuals
who ultimately create these sparks. But it is the responsibility
of leaders to empower them with a data-driven culture
to help them get there. Now, go create
the next big invention. Thank you. [applause]