AWS re:Invent 2023 - Best practices for serverless developers (SVS401)

Captions
- Good morning, everybody, welcome to Poor Use Cases for Generative AI, hopefully you're in the right room. No, don't worry, don't worry. We're in this beautiful room with a lovely audience, being all together is such a treat. And to the people in the simulcast room somewhere else in Vegas, hello to you and thank you for joining us, we're with you in spirit, and if you're watching the recording afterwards, well, high five to everybody in the future, thanks so much for watching. My name's Julian Wood, I'm a developer advocate in the Serverless team, I love teaching and helping people build serverless applications and also acting as your voice internally to make sure we're building the best products and features. And I'm joined by the one and only... - Oh, that's me, Chris Munns, I used to lead developer advocacy for serverless here at AWS, these days I work as part of our Startup team as the tech lead for North America. - Cool, Chris is gonna be talking about lots of that later, but for now, this is a big topic, and talking about best practices for serverless, we unfortunately don't have all four days of re:Invent, so there is a bit of a small warning: we are gonna be covering a lot and we're gonna be going quickly, to give you as many best practices as we can and as many jumping-off points, with links to more content and information to dive even deeper. The slides and the recording will be available later, which means you don't necessarily have to take pictures if you don't want to; this resources page, which I'm gonna share again at the end, already has the slides and lots of other links to best practices, so you can have all the things you need to build your serverless applications. I've actually done two previous talks on this topic at re:Invent in previous years, if you haven't seen them, the links are also in the resources page, lots of best practices over there also worth considering, we just wanna cover some new stuff today and have some more things to think about. So Chris and I just need to take a breath, you probably need to take a breath, it's gonna be a lot today, everybody ready? - Woo. - Let's go, okay. So a question we may initially be thinking about is, well, all this serverless stuff, is it a fad or actually is it the future? Well, we commonly think of the start of serverless as Lambda, like nine years ago at re:Invent. But S3 and SQS, some of our foundational services, and very much serverless services, launched way before Lambda, back in 2006, and even before EC2 in 2008. The initial fundamental building blocks of AWS were built before we even introduced the idea of being able to rent servers. And in fact, you could actually say that the cloud was born serverless, and it's only gonna get more and more serverless as time progresses, as (mumbles) announcements even after re:Invent come along and as we provide easier and easier ways to run and operate your applications. So when Lambda was launched in 2014, the industry sort of created this weird term, serverless, and it was designed to help people with this mental model of running code without managing servers or infrastructure. But over the past, sort of, nearly decade now that we've been doing this, that has evolved a little bit more. And what we try and think of serverless as now is more in terms of building applications, beyond just running code, to many more things.
And today, many people I speak to have a sort of mental model of serverless as being closer to delivering value for customers without having to manage complex infrastructure capabilities. And what that actually translates to on a day-to-day basis is that you're delegating the outcomes of building on the cloud to people who are experts on those outcomes. And if you think about what development in the cloud looks like today, well, you need to understand how to develop for distributed services. How do you manage failures at large scale, and manage availability and of course performance? And you've got other complexities: managing maybe large fleets of ephemeral compute, storage and networking that come in and out in a virtual capacity, and your network connectivity between various resources; these all need to be managed with permission constructs and everything that also comes with that. And all of this requires, of course, a certain level of expertise. And over the nearly decade or so we've been doing the serverless thing, that expertise has become the norm. But learning all this cloud expertise isn't the actual value you get from doing the cloud work. The actual value is delivering value to your customers and being able to deliver and build cool things for them. And what we see more and more is builders leveraging AWS's expertise in delivering these best practices, and that's what we call the term well-architected outcomes. And these are things like security and scale and performance and availability, so the builders can focus their efforts on the differentiated work that they need to do for their customers. And when building serverless applications, we actually evolve our building blocks from infrastructure primitives, things like load balancers and instance types, networking and storage, to application constructs like databases, functions, queues, workflows, and many more things. And this distinction is actually where I think some people miss the full value proposition of serverless, when we talk about less infrastructure to manage. And AWS really has a broad selection of services to offer those application constructs: EventBridge, Step Functions, Lambda, DynamoDB, and all our managed services, including Redshift, ElastiCache, managed Kafka and others, that are offering serverless services. And they're all offering or moving to a more serverless model, where we bake in the well-architected goodness for you. And so I'd like you to rather consider serverless as a strategic mindset and approach to how you can build applications. And certainly the events over the past years and the economic environment we're in have universally sharpened the focus on business value. So what Werner spoke about in his keynote this morning was about cost and value and efficiency and speed in enabling real customer value. So today's serverless can be thought of more as that operational model of being able to run or build applications without having to focus on the undifferentiated muck, as we like to call it, of managing low-level infrastructure. And this allows you to build within the cloud, taking advantage of all the features, security, agility and scale, not just building on top of the cloud with a whole bunch of abstractions that maybe make you do a lot more work. And the benefits are clear: getting apps faster from prototype into production, and a fast feedback loop, which helps you iterate quickly for your business.
And we do need to measure things. I really like the DORA metrics from the team who've had a huge influence on making DevOps successful. There are four metrics for working out how well you can release application software: how often you release to production, the time from commit to running in production, the percentage of deployments causing a production failure, and then how long it takes to recover from a production failure if you do have one. And the two top metrics correlate to speed, how quickly you get features into the hands of your customers, but then also, importantly, quality, how good those changes are, because, I mean, the reality is there's no point being really, really quick and rushing things into production when you have to then go back and redo the work to restore functionality. But Dave Farley, also of big DevOps fame, says if you want high speed, you must build high quality systems, and if you want high quality systems, you must build them quickly as a series of small changes. And this actually means, excitingly, that there isn't a trade-off between speed and quality. They actually work together when you do it right. And serverless is really a great way to achieve this and improve your DORA metrics, iterating with small changes quickly. And serverless brings you the agility and cost benefits which you can expect when you build on top of AWS. And the thought process I'll leave you with here is: innovation comes from speed, and speed means doing less, and so to do less, go serverless. The next topic to cover is service-full serverless: using configuration rather than code and using managed services and features where possible. We often talk about a serverless application, one maybe with Lambda, which we know can be written in a number of languages or of course you can bring your own, with an event source which triggers your Lambda function based on a change or a request, and then you perform some actions on that request and send it to another service. This is a very common Lambda-based application. But what if the event source directly talks to a destination service? You don't have to then maintain your own code, and this is a direct service integration, what we call being service-full. And a great quote from one of the fathers of Lambda, Ajay Nair, who says you use Lambda when you need to transform data, not just to transport data. If you're just copying data around, well, of course there are gonna be other ways to do that. And another thing to think about is how much logic you're squeezing into your code. Are you adding more and more functionality into your code, into a Lambda function, doing everything possible in code, if-thens, decision trees, all those kinds of things, so it becomes what we call a Lambda-lith, getting a little bit large and unwieldy? Or, another way to think about it, how little of the code in your Lambda function actually runs when your function is invoked? If you've got a whole lot of code in your function that isn't doing much, well, it's gonna be adding complexity, it means you've gotta have tests against it, you've gotta secure it, and you're not actually using that code. And this often does come from good intentions when you're moving to the cloud: well, you've got an application that sits on premises, maybe in a container or a VM, and a lot of the components and functions of the app are in a single place, and that was a good thing. And so you move it to...
you move it to the cloud and of course, wisely, you think, I'm gonna choose Lambda for the compute, stick an API in front of it and maybe S3 for some of the storage, but all those components and a lot of that complexity just move into your Lambda function. And ultimately what you really should be doing is migrating all those components into different discrete services, as it shows here, using the best service for the job: move your front end to S3, get API Gateway to handle your auth, your caching, your routing and maybe your throttling, and then you can use your various messaging services asynchronously, offload transactions to a workflow, use the native service error handling and retries, and then also split your Lambda functions into more discrete, targeted components. And this all helps you scale your application, provides higher resilience, improved security, and hopefully even better costs. And as part of this, it does help to make your functions modular and single purpose if you can; instead of having a huge, big, single Lambda function that does a whole bunch of things, rather have multiple functions that each do a single thing. For example, if you have a single image processing function that changes the format, creates a thumbnail and adds it to a database, which is what's going on here, think about maybe having three different Lambda functions that each do one process. This also improves performance, as you don't have to load extra code that you don't need, and you can improve security, as each function can be scoped down to only what it needs to do. But to unpack the Lambda-lith that people use a little bit: it might seem reasonable to have an app where API Gateway catches all the requests and routes them downstream to a single Lambda function. And then the Lambda function itself can contain logic to branch internally, to find and run the appropriate code. And that can be based on the inbound event method, the URL or the query parameters. And this works, and yes, it does scale operationally, but it means that the security, the permissions, the resources, the memory allocation and performance are applied to the whole function. So you then think about splitting this up into more granular functions. Now, this is an extreme example where every single API route is a separate Lambda function, which of course does have the benefits of very granular IAM permissions and being able to manage your scaling per individual function, but operationally, this is gonna be a lot to manage. And so particularly for web apps like this and other scenarios, it does make sense to be pragmatic about how you group your web routes and your functions. Too many functions can be an operational burden and too few can have too-broad security and resource issues. So grouping your Lambda functions based on maybe bounded context, or permission groups, or common dependencies, or maybe your initialization time, can give you the best of both worlds: effective permissions and resource management, and operational simplicity. So, more on using services. When building distributed applications, another aspect to think about is how you can effectively use orchestration and choreography as communication methods for your workflows and, rather than writing your own code, manage this as configuration.
Now, in orchestration you have a central coordinator which is gonna manage the service-to-service communication and coordinate the interactions and the ordering in which the services are used. Now, choreography is slightly different, and it communicates without tight control. Events flow between the services without any central coordination, and many applications use both choreography and orchestration for different use cases. Step Functions is an example, it's an orchestrator doing that central coordination and ordering to manage a workflow. And EventBridge is a great choreographer when you don't need strict ordering and workflows, and need events to flow seamlessly without centralized control. And at re:Invent this year, we've been demoing a great app, which I've loved doing, which shows how these two can work together. ServerlessVideo is a live video streaming application built with serverless technologies. Make sure to take a look. We are bringing you live broadcasts from AWS experts all throughout re:Invent, and after the live broadcast, you can watch the content on-demand, and all of this is managed with a serverless backend. There are a number of microservices managing the channels, video streaming and publishing, and doing post-processing of videos, which has got a really cool, flexible plugin architecture where different builders can build functionality to do a whole bunch of different things. And it could be transcribing the speech to text, generating the video titles with generative AI and doing some optimized integration with Amazon Bedrock, and also doing some content moderation. And this then uses the EventBridge event bus and Step Functions to work effectively together. It uses EventBridge to pass information between the microservices, and each microservice then does what it needs to do, and asynchronously places a finished event on the event bus. An individual microservice, like the video processing service, then uses Step Functions to do its orchestration. It's got decision logic, like whether to use Lambda or Fargate for the compute depending on the video length; Step Functions makes that decision, does the orchestration parts, and when it finishes, emits an event so the rest of the microservices can react. Very powerful. The plugin manager service also uses Step Functions to handle the video processing timeline using various lifecycle hooks. And so the speech-to-text and the gen AI title generation all work in a particular order. Again, when finished, the plugin manager service puts an event back on the event bus and the other services can react. Extremely flexible, and of course super scalable. With Step Functions, there are great opportunities to remove code: the state machine on the left is doing quite a lot of logic with Lambda functions, but they're pretty much just invoking other AWS services. So you can optimize this with direct SDK integrations, like this, implementing the same business logic without running and paying for a Lambda function. And obviously you can mix and match and transition gradually, you have complete control over what your workflow contains. So that's the story of how choreography and orchestration work together, as we see when running ServerlessVideo: each of the microservices can act independently, and within each microservice, the bounded context then decides what happens.
Step Functions can help with any orchestration within the service, and then you use events to communicate between the bounds, between the microservices, which is a very effective way to build distributed applications. If you didn't know, there are actually two parts of Step Functions: standard workflows, which run for up to a year and are asynchronous, and express workflows on the right, which are fast and furious, built for high throughput; they can run for a max of five minutes and can also be synchronous. Standard workflows and express workflows have different sorts of pricing models, and the cool thing is, with the different cost structures, express workflows can also be significantly cheaper. And in this example we've got here, you can use express workflows, as the workflow runs synchronously in under five minutes, and it's actually half a second faster to run per execution, so also a performance boost. And then a million standard workflow executions would cost $420, but using express workflows, this is $12.77. So that is seriously quite a big cost benefit too. But an even better story is how they can both work together, nesting express workflows within standard workflows, allowing you to run long-running standard workflows that can support callbacks and other kinds of things. And then you nest the express workflows for high speed and high scale, which return to the parent standard workflow when they're complete. A great way to get the best of both worlds, which also keeps your budget happy. Now, when building with Step Functions, you don't have to start from scratch: our team has put together the Serverless Workflows Collection, prebuilt open-source Step Functions workflows on serverlessland.com, for a whole bunch of patterns that you can literally just pick up and get going with as soon as you need to. Also, other options for reducing code: the same goes for API Gateway. Do you have Lambda functions that serve only as a proxy between API Gateway and downstream services? Well, you can optimize them as well. You can configure API Gateway to connect directly to multiple AWS services, such as DynamoDB, SQS, Step Functions and many more. Once again, no need to use Lambda just as a proxy. There are many other ways to reduce code and use native service integrations; it's a common pattern to consume DynamoDB streams using a Lambda function to parse the events and then put them on EventBridge, maybe for a downstream service to, I dunno, take some action when a new customer is added to a database, for example. Well, EventBridge Pipes, if you're not aware, is another part of the EventBridge family and allows you to do just this, but with configuration rather than Lambda code. You configure the pipe to read from DynamoDB and then there's a built-in integration to send the event to an event bus. And the pipe actually uses the same polling mechanism under the hood as the Lambda event source mapping, but the code to move the data is just handled for you. You just manage your configuration, which doesn't need security patching or any maintenance. It's a winner in my book. So with all of this service-full stuff, remember, the best performing and cheapest Lambda function is the one you actually remove and replace with a built-in integration. But don't get too excited about that, that's not the whole story, over to Chris. - Thanks, Julian. So Julian's obviously just covered here a whole bunch of ways that you can build serverless applications without thinking about Lambda.
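To make that Pipes pattern concrete, here's a minimal sketch of wiring a DynamoDB stream to an EventBridge event bus with boto3. The pipe name, role and ARNs are hypothetical placeholders, not values from the talk, and you'd normally define this in your infrastructure-as-code template rather than calling the API directly.

```python
import boto3

# A hedged sketch: one EventBridge Pipe replaces the "Lambda as glue" pattern.
# All names and ARNs below are hypothetical placeholders.
pipes = boto3.client("pipes")

pipes.create_pipe(
    Name="customers-table-to-bus",
    RoleArn="arn:aws:iam::123456789012:role/pipe-role",  # needs stream read + PutEvents permissions
    Source="arn:aws:dynamodb:us-east-1:123456789012:table/Customers/stream/2023-11-27T00:00:00.000",
    SourceParameters={
        "DynamoDBStreamParameters": {
            "StartingPosition": "LATEST",
            "BatchSize": 10,
        }
    },
    Target="arn:aws:events:us-east-1:123456789012:event-bus/customers",
    TargetParameters={
        "EventBridgeEventBusParameters": {
            "DetailType": "CustomerChanged",
            "Source": "customers.table",
        }
    },
)
```

The pipe then owns the polling, batching and retries that the glue Lambda function would otherwise have to handle.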
Well, Lambda was obviously the thing that kind of started the world of serverless here for us at AWS, although actually we didn't even call Lambda a serverless product when we first launched it, but obviously we've seen this concept and this world kinda grow around it. Now, Julian talked a little bit about Lambda, the model that we have where you have an invoke source, you have a Lambda function, and you have the things that your Lambda function does, and one of the unique things that Lambda brought to the industry that we didn't have before was an ability to directly invoke application code behind an API as a service. Now, today there are over 140 different services that can invoke Lambda functions on your behalf, and there are three ways that they do that: synchronously, asynchronously, or via what we call a stream or poll-based model, otherwise known as event source mappings. Now, the different services, again, do this on your behalf, and you can also use the API directly to invoke these functions. Then one of the things that we did, based on feedback and hearing from our customers for many years, was back in April of 2022, we announced Lambda function URLs. So this gives you the ability to invoke a Lambda function directly from an HTTPS endpoint, essentially looking very similar to a webhook. So you've got a couple of different ways now that you can invoke Lambda functions: again, integrated with the platform, via its API directly, or via the webhook model. Now, when it comes to thinking about performance in Lambda, we kinda say that there's one primary knob that you can turn, and again, maybe to lean a little bit on Werner's joke from his keynote earlier today of kinda cranking the knob for performance, essentially what we do is we give you the ability to configure the memory of a Lambda function, and what comes with that is a proportional amount of CPU and then, essentially, networking throughput. So today you can configure Lambda functions anywhere from 128 megabytes up to 10 gigabytes, and again, that gives you this proportional amount of CPU and network bandwidth. Now, customers often ask, or they're trying to understand when they are performance-bound, again, how do I get more access to CPU? And again, that is the primary way that you do it. This is an example here, and this diagram is not entirely accurate, it doesn't go completely linearly as you scale, there are some stepping actions to it, but essentially as you increase the amount of memory, you then of course get that proportional amount of CPU, and so at 10 gigabytes you get up to six cores. Now, technically before this point we start exposing the cores to you, but essentially what we're doing behind the scenes is limiting the power of those cores up until you get to the maximum memory configuration. And so you do end up, again, at some point with six cores that you can make use of, but the key aspect of this that makes it successful for you is that your code has to support the ability to run across the cores. So technically a Lambda function tops out on single-core performance somewhere between about 1.5 and 1.8 gigabytes of memory, and so if your application code is not multi-threaded, that's where you're gonna see basically the maximum payoff in terms of CPU performance. Again, you might need the additional memory for your function for other needs, but when it comes to CPU that's gonna kinda be where you top off. So, ways to think about this, right?
Let's assume that I have two different functions, one is configured for two gigabytes of memory and it runs for one second, another one is configured for one gigabyte of memory and it runs for two seconds. Effectively these are the exact same when it comes to cost: two gigabytes for one second and one gigabyte for two seconds are both two GB-seconds. So running for half as much time with twice as much memory is the same as running for double as long with half as much memory. Now how about this one here? I have a function that's configured for 128 megabytes and it runs for 10 seconds, and then I have a function configured for one gigabyte and it runs for one second. The answer in this case is that the one that has one gigabyte configured is the lower cost one, right? 128 megabytes for 10 seconds is 1.25 GB-seconds, versus just one GB-second for the one-gigabyte function. So why is this happening? Typically, as you're getting more CPU power, you're able to have your application code run faster. Where do you see this? Almost any place that you have a Lambda function calling out to another service. So a number of years ago, for HTTPS across the industry, the TLS certificate key sizes increased from 1,024 to 2,048 bits, to the 4,096 that you see sometimes these days, and that actually required a linear increase, or sorry, a logarithmic increase, in CPU in order to handle the encryption of the traffic back and forth between the source and destination. So even if all your function does is talk to a single HTTPS endpoint, more memory will give you a faster function. Now, I said there's basically just one knob that we give you for performance; it's kind of a lie, there's another toggle that we give you, which is the type of CPU that you can run your Lambda functions on. So we launched back in 2014 with x86 64-bit processors, and today we also have Graviton2. So with Graviton2 you do get a better price performance; again, you do wanna test your application, depending on what your code does, whether or not it's gonna be supported on Graviton, but generally speaking, when we see customers move to Graviton, they find success in being able to both save money and have functions run faster. Now, you don't have to blindly stumble into doing this. We've got a number of ways that you can explore it. One is with the Lambda Power Tuning tool, this is an open source project that was started by a member of our community who's now a member of (murmurs) staff, but it's been wonderfully supported by the community for many years, and what it allows you to do is take a function configuration and then punch or push a bunch of test invocations at it, and then you can change or have it test for different types of configurations. So we see here in this diagram that I've got a number of different memory configurations that it's testing, and what it can basically come back and tell me, or I can deduce from the data here, is: this is the lowest cost, this is the fastest. The lowest cost and the fastest may not always be the same, and so it depends on what you're looking for. If I have a synchronous invocation, I probably care a lot more about performance. If I have an asynchronous invocation or one of the event mapping or poll-based functions, I probably care a little bit more about cost. Generally speaking, I'm not looking for things that are consuming SQS to be fast necessarily. And the same thing goes for when you're working with Graviton. So you can basically take your function, deploy it on x86, run it through Power Tuning, you can then take your function, deploy it on Graviton, run it through Power Tuning, and the Power Tuning tool allows you to compare or contrast those two runs that you have.
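If you haven't used the Power Tuning tool before, it's deployed as a Step Functions state machine and you kick off a run with an input document roughly like the one below. This is a hedged sketch: the state machine and function ARNs and the payload are placeholders, and the field names are as I recall them from the project's documentation, so check the repo before relying on them.

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Which memory sizes to test, how many invocations per size, and what payload to send.
tuning_input = {
    "lambdaARN": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
    "powerValues": [128, 256, 512, 1024, 2048, 3008],
    "num": 50,                    # invocations per memory configuration
    "payload": {"test": True},    # a representative event for your function
    "parallelInvocation": True,
    "strategy": "cost",           # or "speed" / "balanced"
}

sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:powerTuningStateMachine",
    input=json.dumps(tuning_input),
)
```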
And so we can see here that the Graviton-configured function ran 27% faster and was 41% cheaper, again, for the workload that was used in this test. And so it's a free and easy tool for you to use that gives you the ability to test these different configurations. We also have another tool inside of AWS, which is called AWS Compute Optimizer, and this gives you a whole bunch of information, it's constantly kind of looking at your functions and how they perform over time, and again, it gives you the ability to look at the different options for tuning memory based on performance and what you need. And so again, another tool in the toolbox that you have for when your functions are actually running in production, to see, hey, does this seem like it's configured well? Should I think differently about configuring it? The next thing I wanna talk about here is the AWS Lambda execution environment lifecycle, and I'm gonna talk about everyone's favorite topic here, which is cold starts, and I know we've got AJ somewhere in the room down the front here, who gave a great talk on demystifying cold starts earlier this week. So cold starts, what is this? So I've been talking about cold starts, I feel like, for half of my life here at Amazon, but essentially what this is, is that when Lambda needs to create a new worker environment to run your code, there's a period of time when we have to bring up that environment and make it available to you. Now, there are a couple of places where this happens due to actions that you take, and there are a couple of places where this happens due to actions that we have to take. But the real key thing that I want you to understand here is the line that's here in purple, which is that our data shows that cold starts impact less than a percent of all production function invokes. So again, if you have a production workload and you have any sort of consistency or normalcy of traffic, generally speaking, cold starts should be pretty far out on the tail end of your traffic. Now, for some of you that are running synchronous-based workloads, you've got APIs, you've got consumers on the other end of that maybe, that 1% might not be acceptable to you, so we'll talk about how you can overcome some of these challenges later today. Other times that you'll see cold starts: if you deploy new function versions, if you deploy new code to your Lambda functions, that's gonna cause us to have to basically swap out the environments for you, and then they'll spin back up as traffic comes into them. Again, we'll talk about how you can get past that as well. On our side, again, Lambda is a managed compute service. We take care of a bunch of things under the hood for you, and that's part of the magic of what Lambda does, so from time to time we actually do have to, what we call, reap these environments, take them back away from you for various reasons: keep the instances fresh, give the operating system various patches, stuff like that; again, for the managed runtime configurations of Lambda, we're taking care of a lot of these things on your behalf, and so we have to take care of those things. Another is failure, right? As Werner has always said for many years now, everything fails all the time, and so eventually you potentially have a problem, and so again, you could see environments get kinda swapped out from under you. Now, let's break down, again, this function lifecycle and look at where the cold start does happen; what happens inside of this are a number of things.
One, again, we have to create that new execution environment. We basically have to find, in our pool of resources, a host that we wanna run your code on, we then have to download your code or the OCI image if you're using container packaging, we have to then kick up your runtime, again, whether it's a managed runtime, a custom runtime or the OCI image, and then we have to run what's called your function pre-handler code. Then after that point, your function is warm and it's ready to execute upon the event that's been sent into it. Now basically, in a managed runtime world on Lambda, this is where there's kind of a demarcation between what you can control and what you can't control. And so essentially everything that comes before the init of the runtime is on us. The Lambda team has spent a lot of time over the years here shaving down milliseconds, nanoseconds, improving jitter and trying to make everything that comes on our side of this line here faster and faster and faster. Let's talk a little bit about how the composition of a Lambda function impacts things. So this is some kind of example pseudo code here, nothing really amazing going on here, although Julian did beat me up for having a dash in a function name and saying that that wasn't clean Python, so this is apparently clean Python, but what we see here is I've got kinda two sections here that are part of the initialization of my function, this is code that's gonna run in that init period during a cold start before my actual invocation, and then I have my handler function; again, the handler function is where we look to execute your business logic, and it's what we pass the event into during an actual invoke, and then, if you follow a best practice of ours, one of the things that we encourage you to do is to take your core business logic, not have that in the handler, but have it in separate functions or separate parts of code inside of your application. Some of you will wrap up your own business logic into other packages that you might include, some of you might use layers and containers for this, and it really helps with portability, testing, and keeping the handler nice and kinda short and clean and concise, and so again, a general kind of best practice that we recommend overall for Lambda. Now, there are some things that you can do to help make the init be faster, again, for your functions. One here is that you only really wanna import the things that you need, so some of the various SDK libraries that you might be using will allow you to selectively import just certain aspects. So don't import a huge, huge library that's got tens of thousands of lines of code if you only really need a small subset of that. Another thing that you can do is basically to lazily initialize various libraries based on need. So you might have something where, inside of a function, let's say that I have potentially two different logic paths, one might use S3, one might use DynamoDB; I can essentially, at the needed time inside of my code, decide to further initialize those aspects. Again, with the way that Lambda works, once they've been initialized in a warm environment, that will stick around going forward. So again, it depends on what you're trying to do. Are you looking to get through init really fast, are you looking to get through Lambda invocations really fast? That's something that you should check. Someone should wake up 'cause their alarm went off and I'm sorry if I put you to sleep now.
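A minimal sketch of that structure in Python, assuming a hypothetical function with an S3 path and a DynamoDB path like the one Chris describes; the event shape, environment variables and names are placeholders (imagine this saved as app.py):

```python
import os
import boto3

# Pre-handler (init) code: runs once per execution environment, during the cold start.
# Only create what every invoke needs; leave the rest for lazy initialization.
s3 = boto3.client("s3")

# Lazily initialized on first use, only on the code path that needs it.
_dynamodb_table = None


def get_table():
    """Initialize the DynamoDB table resource on demand and reuse it on warm invokes."""
    global _dynamodb_table
    if _dynamodb_table is None:
        _dynamodb_table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])
    return _dynamodb_table


def save_record(record):
    """Core business logic lives outside the handler for easier testing and reuse."""
    get_table().put_item(Item=record)


def archive_payload(key, body):
    s3.put_object(Bucket=os.environ["BUCKET_NAME"], Key=key, Body=body)


def handler(event, context):
    """Thin handler: unpack the event, call the business logic, return a result."""
    if event.get("action") == "save":
        save_record(event["record"])
    else:
        archive_payload(event["key"], event["body"])
    return {"ok": True}
```

The S3 client is created once in the pre-handler code, the DynamoDB table is only initialized on the code path that needs it, and the business logic lives outside the handler so it's easy to test on its own.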
Now, again, we've got a bunch of other guidance here for what to think about in that init, pre-handler code; again, I'm not gonna go bullet point by bullet point through this, but this is stuff that you kinda wanna loosely be aware of. Don't load it if you don't need it; again, we see lots of people bringing lots of tools and things into their Lambda functions, lots of excess code, so try to keep that as minimal as you can. Try to lazily initialize shared libraries. Try to think about how you establish connections. So sometimes establishing connections in init makes sense, sometimes you're better off waiting for the first invoke of your function and connecting at the time of need, and then also re-establishing connections inside your handler. Think about how you use state during your function. So sometimes people like to bring in state way early and then they maybe don't need it for every invoke, and so it kind of sits around and can slow them down early on, and then we'll talk a little bit more here about provisioned concurrency and SnapStart in a bit, but these are ways to effectively pre-warm or pre-initialize your functions. Now, X-Ray can help you identify this, as well as a number of other tools that we have here in the industry from partners of AWS's; so we see here, as I've highlighted, where the initialization is, and I could go further with X-Ray and actually instrument, inside of my Lambda functions, the individual things that are happening, so that I get really good, deep data on what's happening inside of an initialization. And again, I think this is something that is part of your testing environments, it's part of you testing your functions, you really wanna measure this to understand the impact. Now, there are a couple of other variations of the Lambda function lifecycle that we see. One is if you use a capability called extensions; extensions give you the ability to plug in code that exists outside of the actual execution of your function and respond to events or things that are happening inside your function. So we have many partners here at AWS that have released extensions that allow you to do things like inspect an event, inspect performance, look at what's happening, say, on the wire over the network, provide things like access to parameter stores or key value stores for various things, different logging tools and agents and so forth. When you have those in your code, it shifts the optimization line over a bit. It shifts that line of shared responsibility, because the extension performance then becomes something that the third-party partner or your team has to think about. And so that's something that you end up owning, and we have seen some of these partners need to tweak things over time where the extension hasn't been as optimized as it could be. The next model that we have is what happens with SnapStart for Java functions. So SnapStart was released last year, and what SnapStart does is it basically goes and completes the full init of your code for you ahead of time, and then it takes a snapshot, effectively like an image, of that execution environment, and then makes it available going forward for your Lambda function. And so basically then, for every new execution environment that's needed for a function with SnapStart, it starts from that pre-inited snapshot.
And so this could be a really great benefit for Java-based functions, the only language that SnapStart supports today, and it's also a language that historically has struggled with init performance. Now, SnapStart, again, I'm pretty much just encouraging customers that are using Java to use it. It works really, really well; there are a couple of nuances, things you wanna think about, like how you connect to databases, because that will be frozen in the image that the rest of the environments then use over time, but beyond that, there's no additional cost for this, there's no other tooling that you need to use for this, there's no special packaging you need to do, nothing changes in your CI/CD pipelines, you basically toggle it in the config and it helps make your Java functions much, much, much, much faster on that init. Now, there are other optimization things that you can do across pretty much all the different runtimes that we have, and there's a bunch of talks that have happened this week covering optimizations inside of Lambda and general kinda coding best practices, and what I would say is that a lot of these are just general best practices, either SDK best practices or, again, best practices for the given runtime that you're working with. Now, one really cool hack that I like to see every now and then, this is a super secret trick that I've learned over a couple decades of working in IT: upgrade your stuff. So one of the best things that you can do sometimes, one of the cheapest, easiest, like dumbest, laziest wins, is to just run the latest version of something. And there have been a lot of examples over the years where moving up a minor version on a runtime gets you a performance win and you're like, "Oh my God, all I did was deploy a new version and I'm saving money and my stuff runs faster," and that's awesome. And so keep on top of version updates, keep on top of your dependency updates. Like, yes, Dependabot can be annoying for security things, but there's a lot of stuff that happens, especially when you're including code, where minor tweaks, minor new versions, whatever it might be, could lead to graphs like this, where you see a major drop-off just by moving to a new runtime, and so I love to see wins like this. Now, one thing that isn't necessarily a performance win but can help logically inside your functions is how you think about logging, and we've just had a bunch of new stuff come out with logging here in the last couple of weeks, both pre-re:Invent and then during this week here, and with certain observability tools. One of the things that you now have the ability to do is control log levels for your Lambda functions; again, not necessarily a performance thing, but definitely a cost thing that we see with Lambda: people who are aggressively logging, it's gonna lead to higher cost. So now you can set the log level, control the log format and the outputs of those, and you also have the ability to use the Infrequent Access log class in CloudWatch; again, it's gonna help you save money and just make overall things better.
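As a rough sketch of what those two toggles look like through the API, assuming the standard boto3 Lambda client; the function name and chosen log levels are placeholders, so treat the exact parameter shapes as something to verify against the current docs:

```python
import boto3

lambda_client = boto3.client("lambda")

# Turn on SnapStart (snapshots are taken when you publish a version) and the new
# logging controls (structured JSON logs plus per-function log levels) in one call.
lambda_client.update_function_configuration(
    FunctionName="my-java-function",               # hypothetical function name
    SnapStart={"ApplyOn": "PublishedVersions"},
    LoggingConfig={
        "LogFormat": "JSON",
        "ApplicationLogLevel": "WARN",  # only emit WARN and above from your code
        "SystemLogLevel": "INFO",       # verbosity of the platform's own log lines
    },
)
```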
Now, one thing that I'm also a huge, huge fan of is Powertools for Lambda; this basically helps automate a whole bunch of best practices guidance in your function: how you think about coding for your functions, how you think about how you handle and process events. That team has been cranking on at full steam, they became an official team inside of AWS under Andrea (murmurs) earlier this year, who's an incredible member of our team here at AWS, and so the Powertools are something that we're seeing really great adoption of inside of AWS and outside of AWS, and it's kind of just best practices in a box, and so I definitely encourage you to look at Powertools. The last thing I'll talk about here is another thing you can do, which is turn on CloudWatch Lambda Insights. Lambda Insights gives you more data about how your Lambda functions perform; now, this is good for testing and good for production and good for diagnosing things. Maybe you don't wanna keep it on all the time, you are gonna be paying for this data that CloudWatch collects, but it gives you a bunch of metrics and information that you wouldn't otherwise get with the default CloudWatch metrics for Lambda functions, such as the CPU usage and the network usage, and it can also then, in those cases when you see those, help you think about, okay, I should tune up memory, I should think about my configuration for functions a little differently. Now, we've got a great learning guide to this on Serverless Land, cost optimization for AWS Lambda, with a whole bunch of stuff in here that can help you think about cost, and cost, again, very much aligns to performance in this world. So let me tell you about another fun topic here, concurrency. Concurrency is basically the number of your execution environments that are running at any point in time. And this is definitely another topic that I find people have struggled with over the years; we're not talking about a per-second rate typically, concurrency in Lambda, again, works a little bit differently. One of the other aspects here about Lambda is that a Lambda worker environment, or execution environment, can only process a single event at a time, right? We do not today support the ability for you to have multiple events inside of that. Now, this is different if you're using Lambda to process things like SQS messages, where we do batching. But a batch still comes through as a single event, with multiple messages inside it, not as multiple separate events. So again, regardless of invocation model, regardless of everything that might be sitting in front of it, you do have, again, a single environment processing effectively a single request or event at any point in time. So as we think about this, let's take a window of time here. I have a function that has just gotten an invocation, it's gonna have that little bit of cold start, go through my init code, and then it's going to execute and run my logic, and so again, this is only processing that single request. So all of a sudden I start to get more requests in; because that first environment is essentially locked on that first event, what does that do? It causes some cold starts, these new environments are spun up, and so we see here that I have these two new function invocations that came in, they both cause a cold start and then they start processing the event.
While those three are still running, I get two more that come in; again, all three of those first environments are still basically busy or tied up, and so essentially I now have two more environments that have to go through that cold start and begin invoking the event that's coming to them. However, I can see here that at some point my first environment becomes free. And so another invoke comes in and the Lambda service behind the scenes is able to say, "Aha, I have an environment that's already warmed up and running, I'm gonna pass the event to that." And so essentially now we have, depending on where you're looking on this timeline, three concurrency, four concurrency and so on. And so eventually, as more of these events come in, the service says, "I have warmed environments"; we keep warmed environments around for a period of time based on idleness, based on the scale of your function, and a number of other factors, and again, without some of the other things I'll talk about here in a moment, you really don't get to control this, it's something that we take care of and try to optimize on your behalf. And so as we see events seven, eight, nine and 10 come in, they're able to use these warmed environments, however, nine had come in during a time when there was no environment free, and so again, that caused an init. So in thinking about how concurrency actually works over a period of time here, and this is kind of a loose scale of time, what we see is that during time period one, I have one concurrency, during time period two, I still have one concurrency, and eventually, at time point three, still one concurrency. But then as this expands over time, you see again that where these function environments are active, where they're being utilized, is how you think about the concurrency at that point in time. So again, this is not necessarily a requests-per-second type of model, this is just a point-in-time way of thinking about what's happening with my functions. Now, one thing that can happen, that you might see from time to time if you're using tools like X-Ray or other observability tools, is you might see a disconnect between a cold start and a function invocation inside of an environment. And so one thing that can actually happen is, we see up top here that the worker environment on that top level for the first function came in, did its init, did its invocation, and then at some point I got a second invocation that came in. And the first environment was tied up, and so we started to do a cold start for a new worker environment. However, before that init finished, the first function environment became free again and it said, "I'm available for an invoke," and so behind the scenes we sent the invoke to that first worker. Now, at this point, the second worker becomes available at some point in time for a future invocation, but if you're using tools like X-Ray, you might see that you had a cold start, and then you had no execution happen for a really long period of time, and then all of a sudden you see the actual invoke of the Lambda function happen, kind of detached. And so this shows up as a gap in tools like X-Ray, but again, know that what happened here is that we're optimizing for the performance of your application. And so we're basically giving the invoke to the first warm worker environment available when we can. Now, talking a little bit here about TPS: TPS starts to play a role when you talk about downstream systems, right?
If I'm talking to a relational database, I only have so much capacity and ability to work with that database, or maybe I have a third-party API that I'm working with, and again, I might be constrained by some aspects of that third party. Effectively, transactions per second are relative to the concurrency that you have and the time that it takes your functions to run. So we can see here that if I have 10 invocations and they each take a second to run, effectively I have 10 TPS. Then if my functions took half as long, I would be able to fit up to 20 of them in that time period. And so again, I don't actually have more than 10 concurrency, it's just that that 10 concurrency is working faster because each invoke takes a shorter duration. And so again, performance here is a factor of concurrency and the duration your function runs for, and that combination is what looks like TPS. So if I have a downstream service that I'm talking to and they're gonna maybe start throttling me back at 15 TPS, I then have to think about that as a factor of the amount of concurrency that I wanna allow to go to that downstream service. And so we have options for how you can control this. We have a concept called reserved concurrency, and this allows me basically to set a threshold of how much concurrency I wanna allow a function to have. And again, this can gimme the ability to protect downstream services without having to worry about overwhelming or causing errors in that other service that I might be talking to, and again, there are a bunch of cool things that you can do with treating it as like an off switch in case of times when you've got downstream issues or impacts. Now, one thing that we do have in Lambda that can help you avoid cold starts is a capability called provisioned concurrency. What provisioned concurrency does is you configure it for a certain value, and then we go and effectively pre-warm those environments for you, and we try to keep those warmed environments available for you always. So if, say, we need to reap or take back a worker environment, or a piece of hardware fails, we'll then go back and re-provision those environments for you. And so we see here that you configured your function, you turned on provisioned concurrency for a concurrency of 10, and we then go and run all those inits. So you see all those inits happen in parallel, and at some point your events come in and they land on those already warmed environments, and so you won't see a cold start in front of those environments. Now again, you are setting this for an initial value, so you're setting it for 10. So if I had an 11th request come in at this point in time and all my workers are busy, that's then going to be, effectively, an on-demand invoke, and that would cause a cold start for that function. One of the other cool things that we do with provisioned concurrency is it does have a slightly different cost model, but if you use it really, really well, you actually save money with provisioned concurrency over the on-demand pay model for Lambda. And so we see, and it does vary slightly by Region, but at somewhere around 60% utilization, at least in this case us-east-1 these days, when you've utilized at least 60% of your provisioned concurrency for a given function, it actually becomes cheaper for you to run that Lambda function versus on-demand. So again, this is both a cost and a performance knob that we give you for your Lambda functions.
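Here's a minimal sketch of that arithmetic and those two controls through boto3; the function name, alias and numbers are placeholder assumptions, not values from the talk:

```python
import math
import boto3

lambda_client = boto3.client("lambda")

# Back-of-the-envelope: concurrency needed ≈ requests per second × average duration (seconds).
expected_tps = 15          # what the downstream service can tolerate
avg_duration_s = 0.5       # how long each invoke runs
required_concurrency = math.ceil(expected_tps * avg_duration_s)   # 8 environments here

# Reserved concurrency: cap the function so it can't overwhelm that downstream service.
lambda_client.put_function_concurrency(
    FunctionName="orders-writer",                       # hypothetical function name
    ReservedConcurrentExecutions=required_concurrency,
)

# Provisioned concurrency: keep a baseline of pre-warmed environments for a published alias.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="orders-writer",
    Qualifier="live",                                    # alias or version
    ProvisionedConcurrentExecutions=10,
)
```

Setting ReservedConcurrentExecutions to 0 is the "off switch" Chris mentions, since it throttles all invocations of the function until you raise it again.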
And so you can kind of think of how we could apply this on top of our workload. Let's assume that we have some concept of the traffic that's gonna come to our application that's backed by Lambda; we've looked at this workload, and what we could do is essentially establish a baseline of provisioned concurrency that we always wanna have configured for this function, and we could then use tools like auto scaling to actually turn provisioned concurrency up and down against that demand. And so as long as this is at least 60% utilized, again in us-east-1, I'm saving money and effectively increasing the performance of my application by removing cold starts from it. And so again, you wanna keep on top of this; it would be a mistake for me to set the provisioned concurrency for this application at 100, which is kind of at the top bar here, 'cause then I might have environments for periods of the day that are not getting invoked, and you really do wanna try to find this model where you can leverage, again, the ability to either set a baseline that covers the majority of your traffic over time, or, again, fluctuate based on the need of the day. Now, another thing to talk about when we talk about concurrency is also how fast Lambda functions can scale over time. Now, previously we had a model of an account concurrency quota where, for a given Region, you had a total amount of concurrency and then you had a burst rate inside of that. We've now gone and changed the burst model, so this came out just about two weeks ago, and it applies to basically all functions that exist. So now what you end up with is a maximum increase in concurrency of 1,000 instances, or 1,000 worker environments, over a period of 10 seconds, for every function. To help visualize this a little bit better, I'll show you the old model that we had. So previously, and the burst rate depended on the Region, what you had was an initial burst rate; in many Regions we had an initial burst rate of 3,000, so that means you could go from zero workers to 3,000 pretty much almost instantaneously, and then we had this kind of stepping scale over time where we could add 500 per minute to that inside of your account. So basically what it looked like is that at some point in time, over 12 minutes, you could get to 10,000 concurrency. This is starting at zero, right? For a production workload, you're typically not starting at zero unless you're doing things like deploying a new function version, which is where provisioned concurrency can help. With the new model, it looks like this. So what we're able to do is actually, in 90 seconds, get to that 10,000 concurrency. Essentially this is the fastest way at AWS to get a whole lot of compute power behind an application, and it got faster with this. So again, a really interesting change, and a lot of interesting stuff with scale happening behind the scenes. With this, I'm gonna hand it back off to Julian to take us home. (Chris murmuring) - Thanks, Chris, wow, I love our Lambda, so flexible and scalable, anybody like Lambda? - [Audience] Woo. - Excellent, you can run your code in the best possible way, a well-architected way. So in this section, I'm going to talk about the software lifecycle. You have your services, you have your code, well, how does this all fit together, from your workstation out into the world?
Now, if you are building serverless applications, just please use a framework, it's gonna make your life so much easier. There are serverless-specific infrastructure-as-code frameworks to define your cloud resources: from AWS we've got AWS SAM, the Serverless Application Model, and you can also use the CDK, which allows you to build CloudFormation in familiar programming languages; both generate CloudFormation. There are a number of great third-party tools here and even others, but you really want to be using a framework to build your serverless applications and get into the habit of starting with infrastructure as code rather than in the console. But if, like me, visual is your thing, also have a look at Application Composer, which you can now also jump to from the Lambda and Step Functions consoles, and, announced today in Werner's keynote, it's also available in your IDE with VS Code. And this has got a great drag-and-drop interface to build applications. And not just serverless ones: it works with all CloudFormation resources, and you can import existing stacks to see what they look like, which is great for understanding what you already have. It actually syncs with your local file system, so you can build visually in the console or IDE and generate the infrastructure as code at the same time. Two-for-one best practices built in, isn't that good? And you also don't have to start from scratch. As with the serverless workflows, we've got the Serverless Patterns Collection on serverlessland.com, more than 700 sample infrastructure-as-code patterns across many languages, across many services, and with different service integrations. And I'm sure there's likely one for your use case, which you can just copy and use in your applications, and because it's all open source, you can even submit your own, and why not help out your fellow builders? So a traditional developer workflow is often done on your local machine to get fast feedback while you're developing your applications. And developers then think, well, they need to have their entire app locally and run everything locally. However, when you're building cloud applications, this works slightly differently, because sure, you've got code that you're developing, but there's also a lot of other stuff that you're connecting to: integrations with other services. You're gonna be sending messages and events, or maybe connecting to other APIs or talking to other databases. And so it can be tempting to try to emulate all these things locally, to build all these services locally on your laptop so you can do everything, but this is hard. This is really gonna be hard to get everything working and also, critically, to keep things up to date. So try to avoid doing this if you can. Now, locally, you can do some stuff. So, sparingly, you can use some mock frameworks, for example; if you've got some complex logic and you want to do some testing for that, you can mock your event payloads so you can then provide that input and check your outputs, and that's a really good thing to do, there's a small sketch of that just below. But ideally, we want the best of both worlds. We do want this sort of local quick iteration, and also the cloud: you wanna iterate locally on your business logic, and then also run your code in a cloud environment as quickly as possible. And so SAM has SAM Accelerate, which helps you with just this. While you're developing, you can iterate against the cloud with the speed of local development, and CDK Watch also does a similar thing.
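For instance, a hedged sketch of that mocked-event approach, assuming the hypothetical app.py handler sketched earlier and a pytest-style test; the event shape and assertions are placeholders, not a real service payload:

```python
# test_handler.py -- unit test the handler locally with a mocked event payload.
import os
from unittest.mock import MagicMock, patch

os.environ.setdefault("AWS_DEFAULT_REGION", "us-east-1")  # lets module-level boto3 clients be created offline
import app  # the hypothetical handler module from the earlier sketch


def test_save_action_writes_record():
    # Mock the event payload you would normally receive from the event source.
    event = {"action": "save", "record": {"id": "123", "name": "test"}}

    # Patch the service call so nothing talks to the cloud during this unit test.
    with patch.object(app, "get_table") as mock_get_table:
        mock_table = MagicMock()
        mock_get_table.return_value = mock_table

        result = app.handler(event, context=None)

    mock_table.put_item.assert_called_once_with(Item={"id": "123", "name": "test"})
    assert result == {"ok": True}
```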
And this allows you, really cleverly, to work in your local IDE and sync the changes to cloud resources. It really quickly updates code to test against other resources in the cloud without waiting for CloudFormation to deploy. You can also use SAM logs to get aggregated logs and traces directly in your terminal, so you don't have to jump into the console, and this makes what developers call the inner loop really super quick, using both cloud and local resources. This really does change the way you build serverless applications in the cloud, giving you the best of both worlds: a fast local development experience using real cloud resources. Now, just linking back to those DORA metrics I was talking about earlier, about getting things into production quicker. Remember, we want both speed and quality. Well, automated testing is the way to get there. Good testing is an investment in that speed and quality, and it'll help ensure that your systems are developed efficiently, accurately, and of course with high quality. You want to have good test coverage from your code all the way through your CI/CD pipelines, so you can confidently get features into production. Now, of course, there are a number of places where tests are important. You should of course unit test your Lambda function code when developing locally, and then automatically in the cloud through your pipelines. You can use test harnesses, these are super useful to generate inputs and then receive the outputs. Then you want to be testing service integrations in the cloud as quickly as possible. Maybe you're gonna define some integration tests, maybe you're gonna pick two or three services, and then develop your full end-to-end testing for the whole application. And then of course you want to move towards also testing in production; to be clear, this isn't testing only in production, it's also testing in production. So you can use things like canary deployments, which allow you to develop things locally, push them to the cloud, and introduce changes more slowly in defined increments, rather than having a big-bang, all-at-once approach. Feature flags also help you introduce code effectively, and then back it out really quickly if you do have a problem. And observability, this is absolutely key. Observability tooling is super critical to measure what's happening and to understand if things are changing. And good rollback procedures then allow you to reduce the risk and increase your agility. Again, another jumping-off point, there's plenty more to talk about with testing, more than we've got time for today, so Dan Fox and some other experts have written a superb learning guide on serverlessland.com, which has loads of information, the link is in the resources page, and there are examples for various programming languages, super helpful. But again, let's switch and look a little bit at ops. The biggest barrier to agility when building applications is often a lack of time spent on the things that matter. CIOs want development teams to focus on innovation and move with speed, but what are most developers doing today? They're spending a lot of time on operations and maintenance. So then you ask the question, what does ops actually do in a serverless world? Well, I think there's a lot.
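As a companion to the local unit test sketch earlier, here is a hedged sketch of an integration test that runs against a real deployed function, for example from a CI/CD pipeline stage after deployment, along the lines of the in-the-cloud testing just described. The function name and expected response shape are assumptions for illustration only.

```python
# Minimal sketch of an integration test against real cloud resources,
# intended to run from a pipeline stage after deployment.
# The function name and response shape are hypothetical.
import json
import boto3

lambda_client = boto3.client("lambda")

def test_deployed_function_end_to_end():
    payload = {"body": json.dumps({"quantity": 3, "unit_price": 2.5})}
    response = lambda_client.invoke(
        FunctionName="my-function",         # hypothetical deployed function
        InvocationType="RequestResponse",   # synchronous, so we can assert on it
        Payload=json.dumps(payload),
    )
    assert response["StatusCode"] == 200
    result = json.loads(response["Payload"].read())
    assert json.loads(result["body"])["total"] == 7.5
```

Because this invokes the real function with its real IAM permissions and integrations, it catches the kind of configuration and permission issues that local emulation tends to miss.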
With serverless, the operational management required to run and scale applications is largely handled for you by AWS and the cloud. So not only is "no ops" not a reality, operators are actually more important than ever. Ops is different, but the role isn't any less important. And the cool thing is, it becomes less manual and more strategic, taking on a wider role within the organization so you can operate safely and with speed. There are two approaches to ops. There's the free-for-all. Now of course, this isn't a reality for production applications, but at the extreme end it lets devs go as fast as they can, each doing things their own way to move as quickly as possible. But obviously that risks bad code, you're gonna have poor code going out, you're gonna have reliability issues, and it could even be as bad as legal issues. Then on the other end of the spectrum, you've got everything being centrally controlled. You've got a central team that takes control of the release pipeline, maybe it does all the provisioning of the resources, it handles all the security and all of the troubleshooting. It's lower risk, of course, because it's very understandable, but it's obviously gonna be a lot slower due to the dependencies and the time lags. So we actually want it both ways. We want to get features out fast, with really fast iteration, but we also want it to be safe, with low risk to the business, and this is why we use the concept of guardrails. These are processes and practices that reduce both the occurrence and the impact of undesirable application behavior, rules that you can define to stop the bad stuff happening, and obviously you wanna express them as code. Now, there are many examples: things like enforcing your pipelines, not making things public, logging and tracing, whether you need access to a VPC, tags, log groups, encryption settings, a whole bunch of stuff in the list here, and these are things you want to ensure actually get done. And these need to be checked at various stages. As much as you can, of course, while building your application, the so-called shift left, so you catch those things early on, but also at various stages during your automated pipeline and while your apps are running. And so you have proactive controls, which you can use to check and block resources before they're deployed, and you can use linting and CloudFormation Guard, super useful. AWS Config, if you haven't used it, is super helpful to get a view of your cloud resources, and you can define rules to check the compliance of those resources before they're actually deployed into production. And then on the other side, you've got the detective controls while your app is running, to ensure that your app stays in compliance, checking for vulnerabilities and config issues on a continual basis. AWS Config is still helpful here, and you can also use the cool Amazon Inspector for ongoing vulnerability management. Again, another jumping-off point, there's a whole learning guide on this, implementing governance in depth for serverless applications, the link is in the resources page, so do take a look.
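As one hypothetical illustration of a proactive control expressed as code, the sketch below scans a JSON CloudFormation template before deployment and fails the pipeline if it finds a Lambda function without active tracing or a log group without retention. It's only meant to show the "rules as code, checked before deployment" idea; real guardrails would more typically use CloudFormation Guard rules, linting, or AWS Config as mentioned above, and the specific checks here are illustrative.

```python
# Minimal sketch of a proactive guardrail: fail a pipeline stage if a
# (JSON) CloudFormation template violates simple, illustrative rules.
import json
import sys

def check_template(path):
    with open(path) as f:
        template = json.load(f)

    violations = []
    for name, resource in template.get("Resources", {}).items():
        props = resource.get("Properties", {})
        if resource.get("Type") == "AWS::Lambda::Function":
            # Illustrative rule: require X-Ray tracing on every function
            if props.get("TracingConfig", {}).get("Mode") != "Active":
                violations.append(f"{name}: X-Ray tracing is not Active")
        if resource.get("Type") == "AWS::Logs::LogGroup":
            # Illustrative rule: require explicit log retention
            if "RetentionInDays" not in props:
                violations.append(f"{name}: no log retention configured")
    return violations

if __name__ == "__main__":
    problems = check_template(sys.argv[1])
    for p in problems:
        print("GUARDRAIL VIOLATION:", p)
    sys.exit(1 if problems else 0)  # non-zero exit blocks the pipeline stage
```

Run as a pipeline step against the synthesized template, a check like this blocks non-compliant resources before they ever reach an account, which is exactly the shift-left behavior described above.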
Now, DevOps has been superb at fostering organizational collaboration, but we're asking developers to take on more and more when building applications, particularly when we're building serverless applications and using application constructs rather than infrastructure primitives. And giving developer teams full control over everything is intimidating and complex, especially with the governance. The concept of platform engineering recognizes that, and a lot of the operational and governance issues don't actually need to be surfaced to developers directly, while you absolutely want to increase developer productivity and the pace of software delivery. Developers wanna get their stuff done using self-service enablement and great tooling to work with their applications. A central platform team can provide best practices across your whole organization, to manage, govern, and run your applications. But I also wanna caution you a little bit against building one huge platform to rule them all. The job of a platform team should not be about building a platform, it should be about enablement and integrating other platforms. And you probably wanna have many teams doing this, enabling many platforms that your devs can use, from security platforms to logging platforms to dev tooling, and integrating with other things. You want your platform teams to work closely with your dev teams, to understand how the platforms are actually being used within the business, to better enable people to use them. If you don't do that, it's just gonna become another isolated silo, and probably a very expensive one. And here are just some examples of the kinds of things platform teams can get involved with to help your developers, look at this list: observability, CI/CD pipelines, deployment strategies, cost management, security. So if anybody says that serverless doesn't require ops, they certainly don't know what they're talking about, and you can send them my way, I'll tell them what to do. So in our time today, we said we would cover a lot. Is everybody still okay, still breathing? Good, good. Well, you probably need some time to digest it all, and that was part of the plan. We've got this resources link, and we talked about how serverless lets you focus and concentrate on your customers; how you can build with great service-full serverless, connecting different services together; obviously the awesome power of Lambda that Chris was talking about; and then the whole software delivery lifecycle and how you can get things into production. But of course, we've barely scratched the surface, and we don't have another hour. There are many more best practices and optimizations available, the link to the resources page includes all the links in this presentation and a whole bunch more, so we'd suggest you have a look at that as well. Of course, you can continue your AWS learning, you can do Skill Builder courses, Ramp-Up Guides, and digital badges, just some cool ways to learn more about serverless development, and of course, as we've mentioned a few times today, ServerlessLand.com is your best resource for all things serverless on AWS.
So, with a deep breath from Chris and me, we really appreciate you joining us today. It's your fourth day of re:Invent and you've survived this far, and hopefully we've given you some things to think about. Also, if you really like deep technical content, please rate the session in the survey in the mobile app, a five-star rating lets us know you're absolutely hungry for more. Our contact details are here, and we'll be around in the foyer for a bit if you have any questions. Enjoy the rest of your day. (audience applauding)
Info
Channel: AWS Events
Views: 24,821
Keywords: AWS reInvent 2023
Id: sdCA0Y7QDrM
Length: 59min 50sec (3590 seconds)
Published: Sun Dec 03 2023