- Good morning, everybody, welcome to Poor Use
Cases for Generative AI, hopefully you're in the right room. No, don't worry, don't worry. We're in this beautiful
room with a lovely audience, being all together is such a treat. And to the people in the simulcast room somewhere else in Vegas, hello to you and thank you for joining us, we're with you in spirit and if you're watching
the recording afterwards, well, high five to
everybody in the future, thanks so much for watching. My name's Julian Wood,
I'm a developer advocate in the Serverless Team, I love teaching and helping people build
serverless applications and also act as your voice internally to make sure we're building
the best products and features. And I'm joined by the one and only... - Oh, that's me, Chris Munns, I used to lead developer advocacy for Serverless here at AWS, these days I work as
part of our Startup team as the tech lead for North America. - Cool, Chris is gonna be
talking about lots of that later, but for now, so this is a big topic and talking about best
practices for Serverless, we unfortunately don't have
all four days of re:Invent, so there is a bit of a small warning, we are gonna be covering a lot and we're gonna be going quickly. So to give you as many possible
best practices as we can and give you as many jumping
off points with other links to more content and
information to dive even deeper. So the slides and the recording
will be available later, it means you don't necessarily
have to take pictures if you don't want to, this resources page, which I'm gonna share again at the end, already has the slides and lots of other links to best practices, so you can have all the things you need to build your serverless applications. I've actually done two
previous talks on this topic at re:Invent in previous years,
if you haven't seen them, the links are also in the resources page, lots of best practices over
there also worth considering, we just wanna cover some new stuff today and have some more things to think about. So Chris and I just need to take a breath, you probably need to take a breath, it's gonna be a lot
today, everybody ready? - Woo.
- Let's go, okay. So a question we may
initially be thinking about is, well, all the serverless
stuff, is it a fad or actually is it the future? Well, we commonly think
of the start of serverless at Lambda like nine
years ago at re:Invent. But S3 and SQS, some of
our foundational services and they're very much serverless services, launched way before Lambda in 2006, and even before EC2 in 2008. The initial fundamental
building blocks of AWS were built before we
even introduced the idea of being able to rent servers. And in fact, you could actually say that the cloud was born
serverless in the same way and in fact, it's only gonna
get more and more serverless as time progresses, as (mumbles) announcements
even after re:Invent come along and as we are gonna be
providing easier and easier ways to run and operate your applications. So when Lambda was launched in 2014, then the industry sort of created this weird term, serverless,
it was designed to help people with this mental model of running code without managing servers
or infrastructure. But over the past, sort
of nearly decade now that we've been doing this, that sort of has evolved
a little bit more. And what we try and think of serverless as now is more in terms of building applications, beyond just running code, to many more things. And today, many people I speak to have a sort of mental model of serverless being closer to delivering value for customers
without having to manage complex infrastructure capabilities. And then actually what that translates to on a day-to-day basis, is
you're delegating the outcomes of building on the cloud, to people who are experts
on those outcomes. And if you think about what
development in the cloud looks like today, well,
you need to understand how to develop for distributed services. How do you manage failures at large scale and manage availability
and of course performance. And you've got other complexities of managing maybe large
fleets of ephemeral compute and storage and networking
that come in and out in a virtual capacity, and
your network connectivity between various resources,
these all need to be managed with permission constructs and everything that also comes with that. And all of this requires, of course, a certain level of expertise. And over the nearly decade or so we've been doing the serverless thing, that expertise has become the norm. But learning all this cloud expertise isn't the actual value you
get from doing the cloud work. The actual value is delivering
value to your customers and being able to deliver and
build cool things for them. And what we see more and more is builders leveraging AWS's expertise in delivering these best practices, and that's what we call the
term well-architected outcomes. And these are things
like security and scale and performance and availability, so the builders can focus their efforts on the differentiated work that they need to do for their customers. And when building serverless applications, we sort of actually
evolve our building blocks from infrastructure perimeters, these are things like load
balances and instance types, networking and storage, to
rather application constructs like databases, functions,
queues, workflows, and many more things. And this distinction is
actually where I think some people sort of miss the full value proposition of serverless, when we talk about less
infrastructure to manage. And AWS really has a broader
selection of services to offer those application constructs. EventBridge, Step Functions, Lambda, DynamoDB, and all our managed services including Redshift, ElastiCache, managed Kafka and others. And they're all offering or moving to a more serverless model, where we bake in the
well-architected goodness for you. And so I'd like you to sort
of rather consider serverless as a strategic mindset and approach to how you can build applications. And certainly the events
over the past years and the economic environment we're in, has universally sharpened
the focus on business value. So what Werner spoke about
in his keynote this morning, was about cost and value
and efficiency and speed in enabling real customer value. So today's serverless
can be thought of more as that operational model
of being able to run or build applications
without having to focus on the undifferentiated
muck, we like to call it, of managing low level infrastructure. And this allows you to
build within the cloud, taking advantage of all the features, security, agility and scale,
not just building on the cloud, on top of the cloud with having to use a whole
bunch of abstractions that maybe make you do a lot more work. And the benefits are clear, getting apps faster from
prototype into production, fast feedback loop, which
helps you iterate quickly for your business. And we do need to measure things. I really actually like the DORA metrics from the team who've
really had a huge influence on making DevOps successful. And there are four metrics for working out how well you can release application software: how often you release to production, the time from commit to running in production, the percentage of deployments causing a production failure, and then how long it takes to recover
from a production failure if you do have one. And the two top metrics
then correlate to speed, how quickly to get features into the hands of your customers, but then also importantly, quality, how good those changes are
because I mean, the reality is there's no point being
really, really quick and rushing things into production, when you have to then go
back out and redo the work to restore functionality. But Dave Farley also of big DevOps fame says if you want high speed, you must build high quality systems, and if you want high quality systems, you must build them quickly
as a series of small changes. And this actually means excitingly that there isn't a trade off
between speed and quality. They actually work together
when you can do it right. And Serverless is really a
great way to achieve this and improve your DORA metrics, iterating with small changes quickly. And Serverless brings you
agility and cost benefits, which you can expect when
you build on top of AWS. And the thought process
I'll leave you here is innovation comes from speed
and speed means doing less and so to do less, go serverless. The next topic to cover
is service-full serverless using configuration rather than code and using managed services
and features where possible. When we often talk about
a serverless application, one maybe with Lambda,
which we know can be written in a number of languages or of course you can bring
your own, with an event source, which then triggers your Lambda function based on a change or a request, and the function then performs some actions on that request and sends it to another service. This is a very common
Lambda-based application. But what if the event source directly talks to a destination service? You don't have to then
maintain your own code and this is a direct service integration, what is called being service-full. And a great quote from one of the fathers of Lambda, Ajay Nair, who says, use Lambda when you need to transform data, not just to transport data. If you're just copying data around, well, of course there are gonna
be other ways to do that. And another thing to think about is how much logic you're then
squeezing into your code. Are you adding more and more
functionality into your code, into a Lambda function, doing
everything possible in code, if/thens, decision trees, all those kinds of things, and it becomes what we call a Lambda-lith, getting a little bit large and unwieldy. Or another way, how little
code are you actually running in your Lambda function
when your function runs? You've got a whole lot
of code in your function that isn't doing much, well, it's gonna be adding complexity, it means you've gotta
have tests against that, you've gotta secure that and you're not actually using that code. And this often does come
from good intentions when you're moving to the cloud, well, you've got an application
that sits on premises maybe in a container or a VM, and a lot of the components
and functions on the app are in a single place,
that was a good thing. And so you move it to... you move it to the cloud and of course, wisely you
think I'm gonna choose Lambda for your compute, stick
an API in front of it and maybe S3 for some of the storage, but all those components
and a lot of that complexity is just moving into your Lambda function. And when you ultimately really
what should be migrating all those components into
different discreet services as it shows here, using the
best service for the job, move your front end to S3, get API Gateway to handle your auth, your
caching, your routing, and maybe your throttling,
and then you can use your various messaging
services asynchronously, offload transactions to a workflow, use the native service
error handling and retries and then also split your Lambda functions into more discrete targeted components. And this all helps you
scale your application, provides high resilience,
improved security, and hopefully even better costs. And as part of this, it does help to make your functions modular and single purpose if you can, instead of having a huge,
big single Lambda function that does a whole bunch of things, rather have multiple functions
that each do a single thing. For example, if you have a
single image processing function that changes the format,
creates a thumbnail and adds it to a database,
that's what's going on here, think about it maybe having
three different Lambda functions that do each process. This also improves performance as you don't have to load extra
code that you don't need to and you can improve security as each function can be scoped down to only what it needs to do. But to unpack the Lambda-lith
a little bit that people use, it might seem reasonable to have an app where API gateway then is
gonna catch all the requests and route them downstream
to a single Lambda function. And then the Lambda function
itself can contain logic to branch internally, to
fit the appropriate code and to run the appropriate code. And that can be based on
the inbound event method, the URL or the query parameters. And this works, and yes, it
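As a rough sketch of that pattern (the routes and helper functions here are hypothetical, not from the talk), a Lambda-lith handler branching on an API Gateway proxy event might look like:

```python
import json

def handler(event, context):
    # API Gateway (REST, proxy integration) passes the HTTP method and path in the event.
    method = event.get("httpMethod")
    path = event.get("path", "")

    # Branch internally to the appropriate code for each route.
    if method == "GET" and path == "/orders":
        body = list_orders()
    elif method == "POST" and path == "/orders":
        body = create_order(json.loads(event.get("body") or "{}"))
    else:
        return {"statusCode": 404, "body": json.dumps({"message": "Not found"})}

    return {"statusCode": 200, "body": json.dumps(body)}

def list_orders():
    # Placeholder business logic for the sketch.
    return [{"orderId": "123"}]

def create_order(payload):
    return {"orderId": "124", **payload}
```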
And this works, and yes, it does scale operationally, but it means that the security, the permissions, the resources, the memory allocation and performance are applied to the whole function, and so you want to think about splitting this up into more granular functions. Now, this is an extreme example where every single API route
is a separate Lambda function, which of course does have the benefits of very granular IAM permissions and being able to manage your scaling per individual function, but operationally, this is gonna be a lot to manage. And so particularly for web apps like this and other scenarios, it does make sense to be pragmatic about how you group your web routes and your functions. Too many functions can be an operational burden and too few can have too broad security and resource issues. So grouping your Lambda functions based on maybe bounded context or permission groups or common dependencies or maybe your initialization time, can give you the best of both worlds, effective permissions
and resource management and operational simplicity. So more on using services. When building distributed applications, another aspect to think about is how you can effectively
use orchestration and choreography as communication methods for your workflows and rather
than writing your own code, manage this as configuration. Now, in orchestration you
have a central coordinator which is gonna manage the service-to-service communication and coordinate the interactions and the ordering in which the services are used. Now, choreography is slightly different and it communicates without tight control. Events flow between the services without any central coordination and many applications
use both choreography and orchestration for different use cases. Step Functions is an example, it's an orchestrator doing
that central coordination and ordering to manage a workflow. And EventBridge is a great choreographer when you don't need strict
ordering and workflows, and need events to flow seamlessly without centralized control. And at re:Invent this year,
we've been demoing a great app which I've loved doing, which shows how these
two can work together. ServerlessVideo is a live
video streaming application built with serverless technologies. Make sure to take a look. We are bringing you live
broadcasts from AWS experts all throughout re:Invent,
and after the live broadcast, you can watch the content on-demand and all of this is managed
with a serverless backend. There are a number of microservices
managing the channels, video streaming and publishing, and doing post-processing of videos, which has got a really cool
flexible plugin architecture where different builders
can build functionality to do a whole bunch of different things. And it could be transcribing
the speech-to-text, generating the video titles
based on generative AI and doing some optimized
integration with Amazon Bedrock and also doing some content moderation. And this uses an EventBridge event bus and Step Functions to work effectively together. It uses EventBridge to pass information between the microservices, and each microservice then
does what it needs to do, and asynchronously places a
finished event on the event bus. An individual microservice like the video processing service, then uses Step Functions
to do its orchestration. It's got decision logic
like whether to use Lambda or Fargate for the compute
depending on the video length and then Step Functions
then makes a decision, does the orchestration parts, and when it finishes,
emits an event when done so the rest of the microservices
can react, very powerful. The plugin manager service
also uses Step Functions to handle the video processing timeline using various lifecycle hooks. And so the text-to-speech and
the Gen AI title generation, all work in a particular order. Again, when finished, the
plugin manager service puts an event back on the event bus and the other services can react. Extremely flexible, and
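As a minimal sketch of that choreography step (the bus name, source and detail-type here are made up, not the actual ServerlessVideo schema), a microservice putting its finished event on the bus with the AWS SDK for Python might look like:

```python
import json
import boto3

events = boto3.client("events")

def publish_video_processed(video_id: str) -> None:
    # Put a domain event on a custom event bus; other microservices react
    # via EventBridge rules rather than being called directly.
    events.put_events(
        Entries=[
            {
                "EventBusName": "video-events",   # hypothetical bus name
                "Source": "video.processing",     # hypothetical source
                "DetailType": "VideoProcessed",   # hypothetical detail-type
                "Detail": json.dumps({"videoId": video_id}),
            }
        ]
    )
```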
Extremely flexible, and of course super scalable. With Step Functions, there are great
opportunities to remove code, the state machine on the left
is doing quite a lot of logic with Lambda functions, but
they're pretty much just invoking other AWS services. So you can optimize this
with direct SDK integrations like this, implementing
the same business logic without running and paying
for a Lambda function. And obviously you can mix and match and transition gradually,
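To illustrate the idea (the table name, item shape and role are invented for this sketch), a task state can call DynamoDB directly from the workflow definition, so there's no function to run, patch or pay for:

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# One task state that calls DynamoDB PutItem via the optimized service integration
# instead of invoking a Lambda function that only wraps the same SDK call.
definition = {
    "StartAt": "RecordResult",
    "States": {
        "RecordResult": {
            "Type": "Task",
            "Resource": "arn:aws:states:::dynamodb:putItem",
            "Parameters": {
                "TableName": "VideoResults",          # hypothetical table
                "Item": {
                    "videoId": {"S.$": "$.videoId"},  # taken from the workflow input
                    "status": {"S": "PROCESSED"},
                },
            },
            "End": True,
        }
    },
}

# Assumes a role_arn that allows dynamodb:PutItem on the table.
# sfn.create_state_machine(name="record-result", definition=json.dumps(definition), roleArn=role_arn)
```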
And obviously you can mix and match and transition gradually, you have complete control over what your workflow contains. So the story of how choreography and orchestration work together, as we see when running ServerlessVideo, each of the microservices
can act independently and within each microservice, the bounded context then
decides what happens. Step Functions can help
with any orchestration within the service and then
use events to communicate between the bounds,
between the microservices, which is a very effective way to build distributed applications. If you didn't know, there
are actually two parts of Step Functions, standard workflows, which run for up to a year and are asynchronous, and express workflows on the right, which are fast and furious and built for high throughput, and they can run for a max of five minutes and can also be synchronous. Standard workflows and express workflows have different sort of pricing models, and the cool thing is, with
the different cost structures, express workflows can also
be significantly cheaper. And here in this example we've got here, you can use express flows as the workflow runs
synchronously under five minutes, and this is actually half a second faster to run per execution, so
also a performance boost. And then a million standard
workflow executions would cost $420, but
using express workflows, this is $12.77. So that is seriously quite
a big cost benefit too. But even better story is how
they can both work together nesting express workflows
within standard workflows, allowing you to run
long-running standard workflows that can support callbacks
and other kinda things. And then you nest the express
workflows for high speed and high scale, which return
to the parent standard workflow when they're complete and a great way to get
the best of both worlds, which is also happy for your budget. Now when building the Step Functions, you don't have to start from scratch, our team has to put together the serverless workflows collection, prebuilt open-source
Step Functions workflows on serverlessland.com, for
a whole bunch of patterns that you can literally just pick up and get going as soon as you need to. Also other options for reducing code, same goes for API Gateway. Do you have Lambda functions that serve only as a proxy between API Gateway and
downstream services? Well, you can optimize them as well. You can configure API Gateway to connect directly to
multiple AWS services, such as Dynamo, SQS, Step
Functions and many more. Once again, no need to use
Lambda just as a proxy. There are many other ways to reduce code and use native service integrations, it's a common pattern to
consume DynamoDB streams using a Lambda function to parse the events and then put them on EventBridge maybe for a downstream service
to, I dunno, take some action when a new customer is added
to a database, for example. Well, EventBridge pipes,
if you're not aware, is another part of the EventBridge family and allows you to do just this, but with configuration rather than Lambda code. You configure the pipe to read from Dynamo and then there's a built-in integration to send the event to an event bus. And the pipe actually uses the same polling mechanism under the hood as the Lambda event source mapping, but the code to move the data is just handled for you. You just manage your configuration, which doesn't need security patching or any maintenance. It's a winner in my book.
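A hedged sketch of that setup with the AWS SDK for Python (the ARNs and names are placeholders, and it assumes the pipe's role can read the stream and put events on the bus):

```python
import boto3

pipes = boto3.client("pipes")

# Connect a DynamoDB stream directly to an EventBridge event bus:
# the polling and transport are handled by the pipe, not by your code.
pipes.create_pipe(
    Name="orders-table-to-bus",                                   # hypothetical name
    RoleArn="arn:aws:iam::111111111111:role/orders-pipe-role",    # placeholder role
    Source="arn:aws:dynamodb:us-east-1:111111111111:table/Orders/stream/2024-01-01T00:00:00.000",
    SourceParameters={
        "DynamoDBStreamParameters": {"StartingPosition": "LATEST"}
    },
    Target="arn:aws:events:us-east-1:111111111111:event-bus/orders-events",
)
```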
So with all of this service-full stuff, remember, the best performing and cheapest Lambda function is the one you actually remove and replace with a built-in integration. But don't get too excited about that, that's not the whole story, over to Chris. - Thanks, Julian. So Julian's obviously just covered here a whole bunch of ways that you can build serverless applications without thinking about Lambda. Well, Lambda was obviously the thing that kind of started the world of serverless here for us at AWS, we actually didn't call Lambda a serverless product when we first launched it, but obviously we've seen this concept and this world kinda grow around it. Now, Julian talked a
little bit about Lambda, the model that we have of
you have an invoke source, you have a Lambda function, you have the things that
your Lambda function does, and one of the unique
things that Lambda brought to the industry that we didn't have before was an ability to directly
invoke application code behind an API as a service. Now, today there are over
140 different services that can invoke Lambda
functions on your behalf, and there are three ways that they do that: synchronously, asynchronously, or via what we call a stream or poll-based model, otherwise known as event source mappings. Now, these different services, again, do this on your behalf, and you can also use the API directly to invoke these functions.
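To make the first two models concrete (the function name and payload are hypothetical), this is roughly how a synchronous and an asynchronous invoke differ when you call the Lambda API yourself:

```python
import json
import boto3

lambda_client = boto3.client("lambda")
payload = json.dumps({"orderId": "123"})

# Synchronous: the caller waits for the function to finish and gets its response back.
sync_response = lambda_client.invoke(
    FunctionName="process-order",          # hypothetical function
    InvocationType="RequestResponse",
    Payload=payload,
)
print(json.load(sync_response["Payload"]))

# Asynchronous: Lambda queues the event and returns immediately with a 202.
async_response = lambda_client.invoke(
    FunctionName="process-order",
    InvocationType="Event",
    Payload=payload,
)
print(async_response["StatusCode"])  # 202
```

The third, poll-based model isn't an API call you make per event; you configure an event source mapping and the service polls the stream or queue on your behalf.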
Then one of the things that we did, based on feedback and hearing from our customers for many years, was back in April of 2022, we announced Lambda function URLs. So this gives you the ability
to invoke a Lambda function directly from HTTP endpoint, essentially looking very
similar to a webhook. So you've got a couple different ways now that you can invoke Lambda functions, again, integrated with the
platform via its API directly or via the webhook model. Now when it comes to thinking
about performance in Lambda, we kinda say that there's
one kind of primary knob that you can turn, and again,
maybe to lean a little bit on Werner's joke from
his keynote earlier today of kinda cranking the
knob for a performance, essentially what we do is
we give you the ability to configure the memory
of a Lambda function, and what comes with that is
a proportional amount of CPU and then essentially
networking throughput. So today you can configure
Lambda functions anywhere from 128 megabytes up to 10 gigabytes, and again, that gives you this
proportional amount of CPU and network bandwidth. Now, customers often ask or
they're trying to understand when they are performance-bound, again, how do I get more access to CPU? And again, that is the
primary way that you do it. This is an example here, this diagram is not entirely accurate, it doesn't completely go
linearly as you scale, there are some stepping actions to it, but essentially as you
increase the amount of memory, you then of course get that
proportion amount of CPU, and so at 10 gigabytes
you get up to six cores. Now, technically before this we start exposing the cores to you, but essentially what we're
doing behind the scenes is we're limiting the power of those cores up until you get up to the
maximum memory configuration. And so you do end up again at some point with six cores that you can make use of, but the key aspect of this that
makes it successful for you is that your code has
to support the ability to run across the cores. So technically a Lambda function tops out at a single core performance somewhere between about 1.5
and 1.8 gigabytes of memory, and so if your core, if
your application code is not multi-threaded, that's where you're gonna see
basically the maximum payoff in terms of CPU performance. Again, you might need
the additional memory for your function for other
needs, but when it comes to CPU that's gonna kinda be where you top off. So ways to think about this, right? Let's assume that I have
two different functions, one is configured for
two gigabytes of memory, it runs for one second, another one is configured
for one gigabyte of memory, it runs for two seconds. Effectively these are the exact
same when it comes to cost. So running for half as much
time, twice as much memory, is the same as running for double as long with half as much memory. Now how about this one here? So I have a function that's
configured for 128 megabytes, it runs for 10 seconds, and then I have a function
configured for one gigabyte and it runs for one second. The answer in this case is
that the one that runs for... the one that has one gigabyte configured, is the lower cost one, right? So why is this happening? Typically as you're
So why is this happening? Typically as you're getting more CPU power, you're able then to have your
application code run faster, where do you see this? Almost any place that you
have a Lambda function calling out to another service. So a number of years ago,
HTTPS across the industry, the TLS certificate key sizes increased from 1,024 to 2,048 to 4,096 bits that you see sometimes these days, and that actually required a significant increase in CPU in order to handle the encryption of the traffic back and forth between the source and destination. So even if all your function does is talk to a single HTTPS endpoint, more memory will give
you a faster function. Now, I said there's
basically just one knob that we give you for
performance, it's kind of a lie, there's another toggle that we give you, which is the type of CPU that you can run your Lambda functions on. So we've launched back in 2014
with x86, 64-bit processors, today we also have Graviton2. So Graviton2, you do get a
better price performance, again, you do wanna test your application depending on what your code does, whether or not it's gonna
be supported on Graviton, but generally speaking,
when we see customers move to Graviton, they find success in being able to all save money and have functions run faster. Now you don't have to blindly
stumble into doing this. We've got a number of ways
that you can explore it, one is with the Lambda Power Tuning tool, this is an open source project that was started by a
member of our community who's now a member of (murmurs) staff, but it's been wonderfully
supported by the community for many years, and
what it allows you to do is take a function configuration and then push a bunch of test invocations at it, and then you can change or have it test for different types of configurations. So we see here in this diagram that I've got a number of
different memory configurations that it's testing, what
it can come back basically and tell me, or I can
deduce from the data here is this is the lowest cost function, this is the fastest function. The lowest cost and the fastest
may not always be the same, and so it depends on
what you're looking for. If I have a synchronous invocation, I probably care a lot
more about performance. If I have an asynchronous invocation or one of the event mapping
or poll-based functions, I probably care a little
bit more about cost. Generally speaking, I'm
not looking for things that are consuming SQS
to be fast necessarily. And the same thing goes for when you're working with Graviton. So you can basically take your
function, deploy it on x86, run it through Power Tuning, you can then take your
function, deploy it on Graviton, run it through Power Tuning, and the Power Tuning tool allows
you to compare or contrast those two runs that you have. And so we can see here that the
Graviton configured function ran 27% faster and 41% cheaper
again, for the workload that was used in this test. And so free and easy tool for you to use that gives you the ability to test these different configurations. We also have another tool inside of AWS, which is called AWS Compute Optimizer, this gives you a whole
bunch of information, it's constantly kind of
looking at your functions and how they perform over time, again, it gives you the
ability to kind of look at the different options for tuning memory based on performance and what you need. And so again, another tool
in the toolbox that you have for when your functions are
actually running in production, to see, hey, does this seem
like it's configured well? Should I think differently
about configuring it? The next thing I wanna talk about here is the AWS Lambda execution
environment lifecycle, I'm gonna talk about
everyone's favorite topic here, which is cold starts and I know we've got AJ
somewhere in the room down the front here, who gave a great talk on demystifying cold
starts earlier this week. So cold starts, what is this? So I've been talking about cold starts, I feel like for half of
my life here at Amazon, but essentially what this is is that when Lambda needs to
create a new worker environment to run your code, there's a period of time when we have to bring up that environment and make it available to you. Now, there are a couple places where this happens due
to actions that you take, and there's a couple
places where this happens due to actions that we have to take. But the real key thing that
I want you to understand here is the line that's here in purple, which is that our data shows that cold starts impact
less than a percent of all production function invokes. So again, if you have
a production workload and you have any sort of
consistency or normalcy of traffic, generally speaking, cold
starts should be pretty far out on the tailend of your traffic. Now, for some of you that are running against synchronous-based
workloads, you've got APIs, you've got consumers on the
other end of that maybe, that 1% might be not acceptable to you so we'll talk about how you gonna overcome some of these challenges later today. Other times that you'll see code starts if you deploy new function versions, if you deploy new code
to your Lambda functions, that's gonna cause us to
have to basically swap out the environments for you,
and then they'll spin back up as traffic comes into them. Again, we'll talk about how
you can get past that as well. On our side, so again, Lambda
is a managed compute service. We take care of a bunch of
things under the hood for you and that's part of the
magic of what Lambda does, so from time to time we actually do have to what we call reap these environments, take them back away from
you for various reasons, keep the instances fresh, give the operating system
various code patches, stuff like that, again for the managed runtime
configurations of Lambda, we're taking care of a lot of
these things on your behalf and so we have to take
care of those things. Another is failure, right? As Werner has always
said for many years now, everything fails over time, and so eventually you
have a problem potentially and so again, you could see environments get kinda swapped out from under you. Now, if we break down, again,
here this functional lifecycle and we look at where the
cold start does happen, so what happens inside of
this are a number of things. One, again, we have to create that new execution environment. We basically have to find in
our pool of resources a host that we wanna run your code on, we then have to download your code or the OCI image if you're using container packaging, we have to then kick up your runtime, again, whether it's a managed runtime, a custom runtime or the OCI image, and then we have to run what's called your
function pre-handler code. Then after that point,
your function is warm and it's ready to execute upon the event that's been sent into it. Now basically, in a managed
runtime world on Lambda, this is where there's
kind of a demarcation here between what you can control
and what you can't control. And so essentially everything that comes before the init of the runtime is on us. The Lambda team spends a lot
of time over the years here, shaving down milliseconds,
nanoseconds, improving jitter and trying to make everything that comes on our side of this line here
faster and faster and faster. Let's talk a little bit how the composure of a Lambda
function impacts things. So this is some kind of
example pseudo code here, nothing really kind of
amazing going on here although Julian did beat me up for having a dash in a function and saying that that wasn't clean Python, so this is apparently clean Python, but what we see here is I've
got kinda two sections here that are part of my
initialization of my function, this is code that's gonna
run in that init period during a cold start before
my actual invocation, and then have my handler function, again, the handler function
is where we look to execute your business logic and
we pass the event into during an actual event invoke, and then if you follow
a best practice of ours, one of the things that
we encourage you to do is to take your kinda core business logic, not have that in the handler, but have it in separate functions
or separate parts of code inside of your application, some of you will wrap up
your own business logic into other packages
that you might include, some of you might use layers
and containers for this, and it really helps with
portability, testing, keeping the handler nice and kinda short and clean and concise, and so again, a general kind of best practice that we recommend overall for Lambda.
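Roughly what that separation looks like (the table and function names are just for illustration):

```python
import os
import boto3

# Init / pre-handler code: runs once per execution environment during the cold start,
# so one-time setup like creating SDK clients belongs here.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ.get("TABLE_NAME", "orders"))  # hypothetical table

def handler(event, context):
    # Keep the handler thin: pull what you need from the event and delegate.
    return record_order(event["orderId"])

def record_order(order_id: str) -> dict:
    # Business logic lives outside the handler so it's easy to unit test and reuse.
    table.put_item(Item={"orderId": order_id})
    return {"status": "stored", "orderId": order_id}
```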
Now, there are some things that you could do
again, for your functions, one here is that you
only really wanna import the things that you need, so some of the various SDKs libraries that you might be using will
allow you to selectively import just certain aspects. So don't import a huge, huge library that's got tens of
thousands of lines of code if you only really need
a small subset of that. Another thing that you can do is basically to lazy
initialize various libraries based on need. So you might have something
where inside of a function, let's say that I have potentially
two different logic paths inside of my function, one might use S3, one might use DynamoDB, and I can essentially, at the needed time inside of my code, decide to further initialize those aspects. Again, with the way that Lambda works, once they've been initialized in a warm environment, that will stick around going forward.
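A minimal sketch of that lazy initialization idea (the branches and client choices here are hypothetical):

```python
import boto3

# Module-level placeholders: nothing is created until a code path actually needs it.
_s3 = None
_dynamodb = None

def get_s3():
    global _s3
    if _s3 is None:
        _s3 = boto3.client("s3")  # created on the first invoke that needs S3, then reused
    return _s3

def get_dynamodb():
    global _dynamodb
    if _dynamodb is None:
        _dynamodb = boto3.client("dynamodb")
    return _dynamodb

def handler(event, context):
    # Each branch pays its client-creation cost once per warm environment.
    if event.get("action") == "archive":
        return [b["Name"] for b in get_s3().list_buckets()["Buckets"]]
    return get_dynamodb().list_tables()["TableNames"]
```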
So again, it depends on what you're trying to do. Are you looking to get
through init really fast, are you looking to get
through Lambda invocations really fast, that's something
that you should check. Someone should wake up
'cause their alarm went off and I'm sorry if I put you to sleep now. Now, again, we've got a
bunch of other guidance here for what to think about in
that init prehandler code, again, I'm not gonna go
bullet point by bullet point through this, but this is stuff that you kinda wanna loosely be aware of, don't load it if you don't need it, again, we see lots of people
bringing lots of tools and things into their Lambda
functions, lots of excess code, try to keep that as minimal as you can, try to lazy initialize shared libraries, try to think about how
you establish connections. So sometimes establishing
connections in init makes sense, sometimes you're better off waiting for during the first kind
of invoking your function and connecting at the time of need and then also reestablishing connections inside your handler, think
about how you use state during your function. So sometimes people like
to bring state way early and then they maybe don't
need it for every invoke, and so it kind of sits around again and can slow them down early on, and then we'll talk a little bit more here about provisioned concurrency
and SnapStart in a bit, but these are ways to effectively pre-warm or pre-initialize your functions. Now X-Ray can help you identify this as well as a number of other tools that we have here in the
industry, partners of AWS's, so we see here, as I've highlighted, where the initialization is,
I could go further with X-Ray and actually tool inside
of my Lambda functions, the individual things that
are happening inside of it so that I get really good deep data on what's happening inside
of an initialization, and again, I think this is something that is part of your testing environments, it's part of you testing your functions, you really wanna measure this
to understand the impact. Now, there are a couple other variations of the Lambda function
lifecycle that we see, one is if you use a
capability called extensions, extensions give you the
ability to plug in code that exists outside of the
actual execution of your function and respond to events or things that are happening inside your function. So we have many partners here at AWS that are released extensions
that allow you to do things like inspect an event,
inspect performance, look at what's happening, say,
on the wire over the network, provide things like
access to parameter stores or key value stores for various things, different logging tools
and agents and so forth. When you have those in your code, it shifts the optimization
line over a bit. It shifts that line of
shared responsibility because the extension performance
then becomes something that the third-party partner or your team has to think about. And so that's something
that you end up owning and we have seen some
of these partners need to tweak things over time where the extension hasn't
been as optimized as it could. The next model that we have
is what we see that happens with SnapStart for Java function. So SnapStart was released last
year, and what SnapStart does is it basically goes and it completes the full init of your code for you ahead of time, and then it takes a snapshot
or effectively like an image of that running, of that
execution environment, and then makes it available going forward for your Lambda function. And so basically then what it does is for every new invoke that happens for a function with SnapStart, it starts from that
pre-inited environment. And so this could be
a really great benefit for Java-based functions, the only language that
will support init today, it's also a language that
historically has struggled within init performance. Now, SnapStart again, I'm pretty much just encouraging customers that are using Java just to use it. It works really, really well, the couple of nuances of
things you wanna think about how you connect to databases
because that will be frozen in the image then that the rest of the
environment we use over time, but beyond that, there's no
additional cost for this, there's no other tooling that
you need to use for this, there's no special
packaging you need to do, nothing changes in your CI/CD pipelines, you basically toggle it in the config and it helps make your Java functions much, much, much, much
faster on that init. Now, there are other optimization
things that you could do across pretty much all the
different runtimes that we have, and there's a bunch of talks
that have happened this week covering optimizations inside of Lambda and general kinda coding best practices, and what I would say
is that a lot of these are just general best practices,
either SDK best practices or again, best practices
for the given runtime that you're working with. Now, one really cool hack that I like to see every now and then, this is a super secret
trick that I've learned over a couple decades of working in IT, is upgrade your stuff. So one of the best things
that you could do sometimes, one of the cheapest, easiest,
like dumbest, laziest win is like just run the latest
version of something. And there have been a lot of examples over the years where moving to a new minor version of a runtime gets you a performance win
and you're like, "Oh my God, all I did was deploy a new
version and I'm saving money and my stuff runs faster,"
and that's awesome. And so keep on top of version updates, keep on top of your dependency updates. Like yes, Dependabot can be
annoying for security things, but there's a lot of stuff that happens, especially when you're including
code, where minor tweaks, minor new versions, whatever it might be, could lead to graphs like this where you see a major drop off just by moving to a new runtime, and so I love to see wins like this. Now one thing that isn't
necessarily a performance win but can help logically
inside your functions, how you think about logging, and we've just had a bunch
of new stuff come out with logging here in
the last couple weeks, both pre-re:Invent and
then during this week here and certain observability tools, one of the things that you
could do is now have the ability to control log levels for
your Lambda functions, now, not necessarily a performance thing, but definitely a cost thing
that we see with Lambda, people who are aggressively logging, it's gonna lead to higher cost. So now you can set the log
level, control the log format and the outputs of those,
you also have the ability to use the infrequent access
log class in CloudWatch, again, it's gonna help you save money and just make overall things better. Now, one thing that I'm
also a huge, huge fan of is the Powertools for Lambda, this basically helps
automate a whole bunch of best practices
guidance in your function, how you think about
coding for your functions, how you think about how you
handle and process events, that team has been cranking on full steam, they became an official team inside of AWS under Andrea (murmurs) earlier this year, who's an incredible member
of our team here at AWS, and so the Powertools are something that we're seeing really great adoption of inside of AWS and outside of AWS, and it's kind of just best practices in a box, and so I definitely encourage you to look at Powertools.
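For example, the Powertools for AWS Lambda (Python) logger gives you structured, levelled logging in a couple of lines; the service name here is an assumption, and the log level is typically driven by the POWERTOOLS_LOG_LEVEL environment variable:

```python
from aws_lambda_powertools import Logger

logger = Logger(service="orders")  # hypothetical service name

@logger.inject_lambda_context
def handler(event, context):
    logger.debug("received event", extra={"event": event})  # only emitted if the level allows it
    logger.info("processing order")
    return {"statusCode": 200}
```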
The last thing I'll talk about here is another thing that you can do, which is turn on CloudWatch Lambda Insights. Lambda Insights gives you more data about how your Lambda functions perform. Now, this is good for testing and good for production and good for diagnosing things, maybe you don't wanna
keep it on all the time, you are gonna be paying for this data that CloudWatch collects, but it gives you a bunch
of metrics and information that you wouldn't otherwise get with the default CloudWatch
metrics for Lambda functions, such as the CPU usage, the network usage, and it can also then in those
cases when you see those, help you think about, okay,
I should tune up memory, I should think about my
configuration for functions a little differently. Now, we've got a great
learning guide to this on Serverless Land, cost
optimization for AWS Lambda, a whole bunch of stuff in here that can help you think about cost, and cost, again, very much aligns to performance in this world. So let's talk about another
fun topic here, concurrency, concurrency is basically the number of your execution environments that are running at any point in time. And this is definitely another topic that I find that people have
struggled with over the years, we're not talking about a per-second rate typically, but concurrency in Lambda, again, works a little bit differently. One of the other aspects here about Lambda is that a Lambda worker environment
or execution environment can only process a single
event at a time, right? We do not today support the ability for you to have multiple
events inside of that. Now, this is different
if you're using Lambda to process things like SQS
messages where we do batching. So batching still comes
through a single event, but again, it's not multiple
events in individual messages. So again, regardless of invocation model, regardless of everything that might be sitting in front of it, you do have, again, a single environment processing a single
effectively request or event at any point in time. So we think about this, let's
take a scale of time here in a window, I have a function that has just gotten an invocation, it's gonna have that
little bit of cold start go through my init code, and then it's going to
execute and run my logic, and so again, this is only
processing that single request. So all of a sudden I start
to get more requests in because that first environment
is essentially locked on that first event, what does it do? It causes some cold starts, these new environments are
spun up, and so we see here that I have these two new
function invocations that came in, they both cause a cold start and then they start processing the event. While those three are still running, I get two more that come in, again, all three of
those first environments are still basically busy or tied up, and so essentially I now
have two more environments that have to go through that cold start and begin invoking the
event that's coming to it, however, I can see here that at some point my first environment becomes free. And so another invoke comes in and the Lambda service behind
the scenes is able to say, "Aha, I have an environment
that's already warmed and up and running, I'm gonna
pass the event to that." And so essentially here now
we now have effectively, depending on where you're
looking on this timeline, three concurrency, four
concurrency and so on. And so eventually as more
of these events come in, the service says, "I have
warmed environments," we keep warmed environments
around for a period of time based on idleness, based
on scale of your function, a number of other factors, and again, without some
of the other things I'll talk about here in a moment, you really don't get to control this, it's something that we take care of and try to optimize on your behalf. And so as we see events seven,
eight, nine and 10 come in, they're able to use these
warmed environments, however, nine had come in during a time when there
was no environment free, and so again, that caused an init. So in thinking about how
concurrency actually works over a period of time here, and this is kind of a loose
scale of time, what we see here is that during the time
period for point one, I have one concurrency, during the time period two,
still have one concurrency, and eventually at time point
three, still one concurrency. But then as this expands over time, you see again, where these
function environments are active, where they're being utilized, is how you think about the concurrency at that point in time. So again, this is not necessarily a request per second type of model, this is just a point in time way of thinking about what's
happening with my functions. Now, one thing that can happen that you might see from time to time if you're using tools like X-Ray or other observability tools,
is you might see a disconnect in between a cold start
and a function invocation inside of an environment. And so one thing that
could actually happen is we see up top here that environment, the worker environment on that top level for the first function
came in, did its init, did its invocation,
and then at some point, I got a second invocation that came in. And the first environment was tied up, and so we started to do a cold start for a new worker environment. However, before that init finished, the first function
environment became free again and it said, "I'm
available for an invoke," and so behind the scenes
we sent the invoke to that first worker. Now, at this point, the second
worker becomes available at some point in time
for a future invocation, but if you're using tools
like X-Ray, you might see then that you had a cold start
and then you had no execution that happened for a really
long period of time, and then all of a sudden
you see the actual invoke of the Lambda function
happen kind of detached. And so this shows up as a
gap in tools like X-Ray, but again, know that what happened here is that we're optimizing for the performance of your application. And so we're basically giving the freshest worker environment
of the invoke when we can. Now, talking a little bit here about TPS, TPS starts to play a role when you talk about
downstream systems, right? If I'm talking to a relational database, I only have so much capacity and ability to work with that database, or maybe I have a third-party
API that I'm working with, again, I might be constrained by some aspects of that third party, effectively transactions per second are relative to the concurrency that you have and the time that it takes
your functions to run. So we could see here that
if I have 10 invocations and they each take a second to run, effectively I have 10 TPS. Then if my functions took half as long, I would be able to fit up to
20 of them in that time period. And so again, I don't actually have more than 10 concurrency, it's
just that that 10 concurrency is working faster because
it takes a shorter duration. And so again, performance here
as a factor of concurrency, the time duration of
your function is running and that's what leads to the combination of what looks like TPS. So if I have a downstream
service that I'm talking to and they're gonna maybe start
throttling me back at 15 TPS, I then have to think about that as a factor of the amount of concurrency that I wanna allow to go
to that downstream service. And so we have options for
how you can control this, we have a concept called
reserved concurrency, this allows me basically
to set a threshold of how much concurrency I
wanna allow a function to have. And again, this can gimme the ability to protect downstream services
without having to worry about overwhelming or causing errors down to that other service
that I might be talking to, and again, there's a bunch of cool things that you could do with treating it as like an off switch in case of times when you've got downstream issues or impacts.
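A quick sketch of what setting that looks like (the function name and limit are hypothetical):

```python
import boto3

lambda_client = boto3.client("lambda")

# Cap this function at 15 concurrent executions to protect a downstream system;
# setting the value to 0 acts as the "off switch" mentioned above.
lambda_client.put_function_concurrency(
    FunctionName="write-to-legacy-db",       # hypothetical function
    ReservedConcurrentExecutions=15,
)
```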
Now, one thing that we do have in Lambda that can help you avoid cold starts is a capability called provisioned concurrency. What provisioned concurrency
does is it comes in and you configure it for a
certain value, and then we go and we effectively prewarm
those environments for you and we try to keep those
warmed environments available for you always. So if we have, say, we need to reap or take back a worker environment or a piece of hardware
fails, we'll then go back and re-preprovision
those functions for you. And so we see here that you
configured your function, you turned on provisioned
concurrency for concurrency of 10, we go then and run all those inits. So you see all those
inits happen in parallel, at some point your events come in and they land on those
already warmed environments and so you won't see a cold start in front of those environments. Now again, you are setting
this initially for a value, so you're setting it for 10. So if I had an 11th request
come in at this point in time and all my workers are busy, that's then going to just be an on-demand effectively invoke and that would cause a cold
start for that function. One of the other cool things that we do with provisioned concurrency, is it does have a slightly
different cost model for it but if you use it really, really well, you actually save money
with provisioned concurrency over the on-demand pay model for Lambda. And so we see, and it does vary slightly by region, but somewhere around about 65% utilization of your function, or 60% in us-east-1 these days, when you've utilized at least 60% of your provisioned concurrency for a given function, it actually becomes cheaper for you to run that Lambda function versus the on-demand. So again, this is both a
cost and a performance knob that we give you for
your Lambda functions. And so you can kind of think
of how we could apply this on top of our workload, let's assume that we have
some concept of the traffic that's gonna come to our application that's backed by Lambda
at some point in time, we've looked at this workload, what we could do is essentially
establish a baseline of provisioned concurrency that we always wanna have
configured for this function, we could then use tools like auto-scaling to actually turn provisioned concurrency up and down against that demand. And so as
long as this environment is at least 60% utilized, again,
US-east-1, I'm saving money and increasing the performance
effectively in my application by removing cold starts from it. And so again, you wanna
keep on top of this, it would be a mistake for me to set the provisioned
concurrency for this application at 100, which is kind
of at the top bar here 'cause then I might have
environments for periods of the day that we're not getting invoked, and you really do wanna
try to find this model where you can leverage, again, the ability to either set a baseline that covers the majority of your traffic over time, or, again, let it fluctuate based on the need of the day.
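As a hedged sketch (the function, alias and number are made up), setting that provisioned concurrency baseline on a published alias looks something like this, and you can layer Application Auto Scaling on top to move it with demand:

```python
import boto3

lambda_client = boto3.client("lambda")

# Provisioned concurrency applies to a published version or alias, not $LATEST.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="checkout-api",           # hypothetical function
    Qualifier="live",                      # hypothetical alias
    ProvisionedConcurrentExecutions=10,    # the always-warm baseline
)
```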
Now, another thing to talk about when we talk about concurrency is also how fast Lambda functions can scale over time. Now, previously we had a model
of account concurrency quota where based on a given Region, you had a total amount of concurrency and then you had a burst
rate inside of that. We've kind of gone now and
changed the burst model, so this came out just
about two weeks ago now, and this applies to basically
all functions that exist. So now what you end up with is a maximum increase in
concurrency of 1,000 instances or 1,000 worker environments, over a period of 10
seconds for every function. To help visualize this
a little bit better, I'll show you the old model that we had. So previously and burst
rate depended on the Region, what you had is an initial burst rate, so in many Regions we had
initial burst rate of 3,000, so that means you could go
from zero workers to 3,000 pretty much almost instantaneously, and then we had this kind
of stepping scale over time that we could add 500 per minute to that inside of your account. So basically what it looked like is that at some point
in time over 12 minutes, you could get to 10,000 concurrency. This is starting at zero, right? For a production workload, you're typically not starting at zero unless you're doing things
like deploying a new function, which is where using provisioned concurrency helps. With the new model, it looks like this. So what we're able to do,
is actually in 90 seconds, get to that 10,000 concurrency. Essentially this is the fastest way at AWS to get a whole lot of compute
power behind an application and it got faster with this. So again, really interesting change, a lot of interesting stuff with scale happening behind the scenes. With this I'm gonna hand
it back off to Julian to take us home. (Chris murmuring)
- Thanks, Chris, wow, I love our Lambda,
so flexible and scalable, anybody like Lambda? - [Audience] Woo. - Excellent, you can run your
code in the best possible way, well-architected way. So in this section, I'm going to talk about the software lifecycle. You have your services,
you have your code, well, how does this all fit
together from your workstation out into the world? Now if you are building
serverless applications, just please use a framework, it's gonna make your life so much easier. There are serverless-specific infrastructure-as-code ones to define your cloud resources, from AWS we've got AWS SAM, the Serverless Application Model, and also you can use CDK, which allows you to build CloudFormation in familiar programming languages, both generate CloudFormation. There are a number of great third-party tools here and even others, but you really want to be using a framework to build your serverless applications and get into the habit of starting with infrastructure as code rather than in the console. But if like me visual is your thing,
jump to from the Lambda and Step Functions console, and announced today in Werner's keynote, it's also available in
your IDE with VS Code. And this has got a great
drag and drop interface to build applications. And not just serverless ones, it works with all CloudFormation resources and you can import existing stacks to see what they look like, which is great for understanding
what you already have. It actually syncs with
your local file system so you can build visually
in the console or IDE, and then generate the infrastructure as code at the same time. Two-for-one best practices
built in, isn't that good? And you don't also have
to start from scratch. As with Serverless Workflows, we've got the Serverless
patterns collection on serverlessland.com, more than 700 sample infrastructure-as-code patterns across many languages,
across many services, and with different service integrations. And I'm sure there's likely
one for your use case which you can just copy and
use in your applications, and because it's all open source, you can even submit your own and why not help out your
fellow other builders? So a traditional developer workflow is often done on your local
machine to get fast feedback while you're developing your applications. And developers then think, well, they need to have their entire app locally and run everything locally. However, when you're
building cloud applications, this works slightly differently because, sure, you've got the code that you're developing, but there's also a lot of other
stuff that you're connecting to, integrations with other services. You're gonna be sending
messages and events, or maybe connecting to other APIs or talking to other databases. And so it can be tempting to try to emulate all
these things locally, to build all these services
locally on your laptop so you can do everything,
but this is hard. This is really gonna be hard
to get everything working and also critically to
keep things up-to-date. So try to avoid doing this if you can. Now, locally, you can do some stuff. Sparingly, you can use some mock frameworks: for example, if you've got some complex logic and you want to do some testing for that, you can mock your event payloads, so you can then provide that input and check your outputs, and that's a really good thing to do.
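As a small illustration, here's what a mocked-event unit test might look like in Python (pytest style). The handler module, the event shape, and the assertions are all assumptions for the example, not anything specific from the talk.

```python
# Unit-testing a Lambda handler locally with a hand-rolled event payload.
import json

from src.app import handler  # hypothetical module containing the handler under test

def test_create_order_returns_200():
    # Stand-in for an API Gateway proxy event
    event = {
        "httpMethod": "POST",
        "path": "/orders",
        "body": json.dumps({"orderId": "123", "quantity": 2}),
    }

    response = handler(event, context=None)

    assert response["statusCode"] == 200
    assert json.loads(response["body"])["orderId"] == "123"
```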
But ideally, we want the best of both worlds. We do want this sort of
local quick iteration and also in the cloud, you wanna iterate locally
on your business logic, and then also run your
code in a cloud environment as quickly as possible. And so SAM has SAM Accelerate, which helps you with just this. While you're developing, you
can iterate against the cloud with the speed of local development and CDK Watch also does a similar thing. And this allows you really cleverly to work in your local IDE and sync the changes to cloud resources. It actually really quickly updates code, to test against other
resources in the cloud without waiting for
CloudFormation to deploy. And also you can use SAM
logs to get aggregated logs and traces directly in your terminal, so you don't have to
jump into the console, and this makes what developers
call the inner loop really super quick, using both cloud and local resources. And this really does change the way you build serverless
applications in the cloud, giving you the best of both worlds, fast local development experience and using real cloud resources. Now, just linking back
to those DORA metrics I was talking about earlier, about getting things
into production quicker. Now, remember, we want both speed and quality. Well, automated testing
is the way to get there. Good testing is an
investment in that speed and in that quality
and it'll help ensure that your systems are developed
efficiently, accurately, and of course with high quality. You want to have good test coverage from your code all the way
through your CI/CD pipelines, so you can confidently get
features into production. Now, of course, there
are a number of places where tests are important, you should of course unit
test your Lambda function code when developing locally, and
then automatically in the cloud through your pipelines. You can use test harnesses, these are super useful to generate inputs and then to receive the outputs. And then you want to be
testing service integrations in the cloud as quickly as possible. Maybe you're gonna define some integration tests, maybe you're gonna pick two or three services, and then develop your full end-to-end testing for the whole application.
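As a sketch of what an in-the-cloud integration test can look like, here's a pytest-style test that invokes the deployed function directly with boto3; the function name and the response shape are hypothetical.

```python
# Integration test against the real, deployed function rather than an emulator.
import json
import boto3

lambda_client = boto3.client("lambda")

def test_create_order_against_deployed_function():
    event = {
        "httpMethod": "POST",
        "path": "/orders",
        "body": json.dumps({"orderId": "123", "quantity": 2}),
    }

    response = lambda_client.invoke(
        FunctionName="my-app-dev-CreateOrderFunction",  # hypothetical deployed function name
        Payload=json.dumps(event).encode("utf-8"),
    )
    payload = json.loads(response["Payload"].read())

    assert response["StatusCode"] == 200   # the invoke call itself succeeded
    assert payload["statusCode"] == 200    # the handler returned OK
```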
And then, of course, you want to move towards also testing in production to prioritize speed. This isn't only testing in production, it's also testing in production, on top of all your other tests. So you can use things
like canary deployments and this allows you to
develop things locally, push it to the cloud and
introduce changes more slowly in defined increments, rather than having a big
bang, all-at-once approach. Feature flags also help you to introduce code effectively, and then back it out really quickly if you do have a problem.
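As a small sketch of the feature-flag idea, here's a handler that guards a new code path behind a flag. Reading the flag from an environment variable is just one simple option (a managed service like AWS AppConfig is another); all the names here are illustrative.

```python
# Feature-flagged handler: the new behaviour can be switched off again without
# rolling back the whole deployment.
import json
import os

def handler(event, context):
    use_new_pricing = os.environ.get("USE_NEW_PRICING", "false").lower() == "true"

    price = new_pricing(event) if use_new_pricing else legacy_pricing(event)
    return {"statusCode": 200, "body": json.dumps({"price": price})}

def new_pricing(event):
    # new behaviour being introduced behind the flag
    return 42

def legacy_pricing(event):
    # the existing, known-good behaviour
    return 40
```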
And observability, this is absolutely key: observability tooling is super critical to measure what's happening, and also to understand if things are changing. And good rollback procedures then allow you to reduce the risk
and also increase your agility. Again, another jumping off
point: there's plenty more to say about testing, more
than we've got time for today, so Dan Fox and some other experts have written a superb learning
guide on serverlessland.com, which has loads of information and the link is in the resources page, and there are examples for
various programming languages, super helpful. But let's just switch again and look a little bit at ops. The biggest barrier to agility
when building applications, is often a lack of time spent
on the things that matter. CIOs want development teams
to focus on innovation and move with speed, but today what are most developers doing? They're spending a lot of time on operations and maintenance. So then you ask the question,
what does ops actually do in a serverless world? Well, I think there's a lot. With serverless, the
operational management required to run and scale applications
is handled with you by AWS and the cloud, so not
only is no ops not a reality, operators are actually
more important than ever. Ops is different, but the role isn't any less important. But the cool thing is, it
actually becomes less manual, it's more strategic, taking on a wider role
within the organization so you can actually operate
safely and with speed. And there are two approaches
to ops. There's free-for-all: now of course, this isn't a reality for production applications, but at that extreme end, it lets devs go as fast as they can, with everyone doing things their own way as quickly as they can. But obviously that's gonna risk bad code, you're gonna have poor code going out, you're gonna have reliability issues, and it could even be as bad as legal issues. But then on the other end of the spectrum, you've got everything
being centrally controlled. You've got a central team that is gonna take control
of the release pipeline, maybe it's gonna do all the
provisioning of the resources, it's gonna handle all the security and all of the troubleshooting. It's gonna be lower risk, of course, because it's very understandable, but it's obviously gonna be a lot slower due to the dependencies and the time lags. So we actually want it both ways. We want to get features out fast, really fast iteration, but we also want it to be safe, with a low risk to the business, and this is why we use
this concept of guardrails. And these are processes and practices that reduce both the occurrence and impact of undesirable application behavior, and these are rules that you can define to stop the bad stuff happening, and obviously, you wanna
express them as code. Now, there are many examples, things like enforcing your pipelines and maybe not making things public, logging and tracing, looking at that, whether you need access to
a VPC, tags, log groups, encryption settings,
a whole bunch of stuff in the list here, and these
are things you want to ensure actually get done. And these need to be checked at various stages: as much as you can, of course, while building your application, the so-called shift left, so you can catch those things early on, but also at various stages during your automated pipeline and while your apps are running. And so you have proactive controls
check and block resources before they're deployed,
and you can use linting and CloudFormation Guard, super useful. AWS Config if you haven't
used that, is super helpful to get a view of your cloud resources, and you can define the rules to check the compliance of those resources before they're actually
deployed into production. And then on the other side, you've got the detective controls
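To show the guardrails-as-code idea in its simplest form, here's a rough sketch of a home-grown check you could run in a pipeline before deployment. In practice you'd more likely reach for cfn-lint, CloudFormation Guard, or AWS Config rules; this just illustrates the shape of a proactive control, and the template path and rule are assumptions.

```python
# Fail the build if any S3 bucket in the template has no encryption configured.
# Uses a JSON-format CloudFormation template to keep parsing trivial.
import json
import sys

def unencrypted_buckets(template_path):
    with open(template_path) as f:
        template = json.load(f)

    failures = []
    for name, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::S3::Bucket":
            continue
        if "BucketEncryption" not in resource.get("Properties", {}):
            failures.append(f"{name}: no BucketEncryption configured")
    return failures

if __name__ == "__main__":
    problems = unencrypted_buckets(sys.argv[1] if len(sys.argv) > 1 else "template.json")
    for problem in problems:
        print(problem)
    sys.exit(1 if problems else 0)
```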
And then on the other side, you've got detective controls while your app is running, to ensure that your app
still stays in compliance, checking for vulnerabilities
and config issues on a continual basis. And AWS Config is still helpful, and you can also use the
cool Amazon Inspector for ongoing vulnerability management. Again, another jumping-off point: there's a learning guide on this, implementing governance in depth for serverless applications, the link is in the resources page, so do take a look. Now, DevOps has been superb at fostering organizational collaboration, but we're asking a lot
for developers to take on more and more in building applications, particularly when we're
building serverless applications and we're using application constructs rather than infrastructure primitives. And giving developer teams
full control over everything is intimidating and it's gonna be complex, especially with the governance. And so the concept of platform
engineering recognizes that and a lot of the operational
and governance issues actually don't need to be surfaced to the developers directly, and you absolutely want to
increase developer productivity and the pace of software delivery. Developers wanna get their stuff done using self-service enablement and great tooling to work
with your applications. A central platform team can provide some of the best practices across your whole organization, to manage and to govern and
to run your applications. But I also wanna caution
you just a little bit about building one huge
platform to rule them all. The job of a platform team should not be about building a platform, it should be about enablement and integrating other platforms. And you probably wanna have
maybe many teams doing this, enabling many platforms
that your devs can use, from security platforms,
logging platforms, dev tooling, and to integrate with other things. You want your platform
teams to work closely with your dev teams, to understand how the platforms
are actually being used within the business, to better enable people to use them. Because if you don't do that, it's just gonna become
another isolated silo and probably a very expensive one. And here are just some
examples of the kind of things that platform teams can get involved with to help your developers
working, look at all this list. Observability, CI/CD pipelines,
deployment strategies, cost management, security, so if anybody says that
serverless doesn't require ops, they certainly don't know what they're talking about and you can send them my
way, I'll tell them what to do. So in our time today, we
said we would cover a lot, is everybody still okay, still breathing? Good, good, well, you
probably need some time to digest this all, so that
just was part of the plan and what we thought. So we've got this resources link, we talked about how
Serverless lets you focus and concentrate on your customers, first of all, how you can build with great service-full serverless, connecting different services together, obviously the awesome power of Lambda that Chris was talking about and then talking about the whole
software delivery lifecycle and how you can get
things into productions. But of course, we haven't even started, we don't have another hour. There are many more best practices and optimizations available,
the link to resources page includes all the links
in this presentation and a whole bunch more, so we'd suggest you have
a look at that as well. Of course, you can
continue your AWS learning, you can do Skill Builders
and Ramp-Up Guides and digital badges, just some cool things you can learn more about
Serverless development, and of course we mentioned
it a few times today, ServerlessLand.com, your best resources for all things Serverless on AWS. So with a deep breath, from Chris and I, we really appreciate you joining us today, it's your fourth day of re:Invent, you've still survived this far, hopefully we've given you
some things to think about, and also, if you really
like deep technical content, please rate in the
survey in the mobile app, five-star rating, it lets us know that you're
absolutely hungry for more, and our contact details are here and we will be around a bit in the foyer if you do have any questions,
enjoy the rest of your day. (audience applauding)