AWS re:Invent 2019: [REPEAT 1] Serverless at scale: Design patterns and optimizations (SVS335-R1)

Video Statistics and Information

Captions
Okay, we'll get started. This is Serverless at Scale: Design Patterns and Optimizations. My name is Roberto, thank you all for coming. If you find this topic interesting, or even just the title of today's talk, here are some other sessions you may also find valuable. There are certain considerations or optimizations, such as Java-specific guidance for Lambda, that I did not include in today's talk because I knew they were covered elsewhere. To level set on the content I have for you today: this is a 300-level talk, so you will see some real-world design patterns, some small optimizations that may seem trivial when you see them but have a significant impact on either performance or cost at scale, and some pseudocode. You will not, however, see me defining what serverless is at AWS or giving general overviews of the core services of the serverless platform, such as AWS Lambda, Amazon API Gateway, or Amazon SQS. Hopefully that's what you're all expecting; if that's a surprise to any of you, you won't hurt my feelings if you decide there's somewhere else you'd rather be.

So let's walk through a journey a customer may go through with an early serverless application as that application starts to experience scale. This is a pretty common early serverless architecture, especially for applications migrating to serverless for the first time: maybe an existing application that was running on another style of compute, on premises or on EC2, that is being moved to serverless. The characteristics of this architecture: it is a synchronous API, so Amazon API Gateway is hosting a REST endpoint that fronts a Lambda function, and that Lambda backend integration responds synchronously to requests from the front end. It uses a relational database for persistent storage, here Amazon RDS, with any of the database engines RDS supports. Credentials to access that database are stored in a secret store such as AWS Secrets Manager; we don't recommend hard-coding secrets such as database credentials in your application code, so it's best to externalize them, and you may have your credentials stored there. Logs from the logic running in your Lambda functions, whatever you write to standard out, get streamed to CloudWatch Logs, as do any custom metrics you want to emit from your code: Lambda emits some default metrics, but it's common for customers to have additional metrics with business-specific meaning and to write those directly to CloudWatch Metrics from within your Lambda code. Examples of architectures like this are write-heavy applications such as order submission or transaction submission workflows, or read-heavy applications where a user interacts with a front end, submits a query against a database, and the results are painted on the web page as a response.

So we design this architecture, we migrate the application to serverless, and at first things look great. Scale is increasing a little bit, we're getting some traffic, we're high-fiving, everything's wonderful. And we didn't load test, because this is serverless, I've seen it live:
it scales by itself, it's elastic, nothing could go wrong, why would I even test it? It's going to work great, there can be no problems. But then scale increases and we start seeing some strange behavior we didn't see at lower scale, and we say, wait a minute, maybe we overlooked some details or nuances of our design, or of the services we're using, that are now being exposed at higher scale.

So what might some of those observations be, things you didn't see at lower scale but are seeing now? First, maybe you're seeing timeouts. There are a few places in the architecture I showed where timeouts apply. Amazon API Gateway has a maximum integration timeout of 29 seconds: the integration sitting behind your API Gateway endpoint, in this example the Lambda function, has up to 29 seconds (and it's configurable below that) to respond to a synchronous call. Maybe you weren't hitting that timeout before and now you are. Similarly, AWS Lambda has a maximum execution time for a given function invocation of 15 minutes, also configurable, with 15 minutes as the maximum; maybe your function wasn't timing out at lower scale but now it is.

Maybe you're now having connectivity issues to your relational database: as traffic has increased, you're noticing connection errors, or various other flavors of errors trying to connect, and you're overwhelming that downstream in a way you weren't at lower scale. Maybe you're seeing throughput dilution: for the same traffic, you're not getting the throughput through your application that you used to. Even though Lambda scales beautifully, the downstream dependencies your application code interacts with may not be able to scale along with it, so you end up swarming your downstreams. In this case it was a relational database, but it could be anything; sometimes folks move applications like this to serverless while they still connect to on-premises resources, so Lambda scales up beautifully and overwhelms an on-prem resource that was never built to handle the level of scale Lambda can achieve.

Lambda also has a couple of ways it can throttle you, and we'll talk about them in more detail. Perhaps you weren't seeing Lambda throttles at lower traffic or at lower spikes in traffic, and now that your traffic has increased and your spikes are steeper, you're seeing flavors of Lambda throttles you hadn't seen before. And, for those who have worked with us for a while, most if not all of the AWS APIs have some level of request throttling built into them. In our architecture we were interacting with Secrets Manager and CloudWatch Metrics directly from our Lambda code, and maybe now you're being throttled by those services where you weren't at lower scale. This is a funny one, because if you're using the AWS SDKs to interact with any of our services, by default the SDKs retry with backoff when they find they're being throttled. They're intelligent enough, by default (this is configurable), to notice: ah, I got a throttling error calling Secrets Manager, I'm going to retry; if I get throttled again, I'll wait a little longer and retry, and wait a little longer, until I get through. So the symptom here may not be errors thrown in your Lambda code; the symptom may just be latency. That one line of code calling Secrets Manager may start taking longer and longer to respond as your traffic increases and you're being throttled by Secrets Manager, or CloudWatch Metrics, or any other of our APIs. If you enable debug logging on the SDKs you can see when this is happening, so it doesn't have to be silent, but sometimes folks hear this and think, "I'm not seeing errors thrown by my Lambda functions." That's not necessarily the symptom with the default behavior.
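For reference, a minimal sketch of tuning that retry behavior and turning on the debug logging that surfaces it, using boto3 (the retry values here are illustrative assumptions, not recommendations from the talk):

```python
import logging
import boto3
from botocore.config import Config

# Surface the SDK's retry/backoff activity, which otherwise shows up only as latency.
boto3.set_stream_logger("botocore", logging.DEBUG)

# The default retry-with-backoff behavior is configurable per client.
secrets_client = boto3.client(
    "secretsmanager",
    config=Config(retries={"max_attempts": 5, "mode": "standard"}),  # illustrative values
)
```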
And maybe your cost is growing in a way you didn't expect based on your early forecasts from your smaller-scale tests. The two pricing dimensions for AWS Lambda are invocations and execution time. Maybe your functions are now taking longer to return, because they're being throttled or because they're overwhelming a downstream that isn't responding as quickly as it used to, so your functions take longer and longer to execute, which leads to additional cost, because that's one of Lambda's pricing dimensions. Similarly, there are some suboptimal ways your code could be written to interact with services like CloudWatch Logs and CloudWatch Metrics that can likewise make your costs grow in a way you may not have expected from your early tests; we'll dive into that in more detail.

So what happened? Let's look back at that early architecture and figure out what its stress points were and why it started exhibiting these characteristics. Here's that same diagram again. As Lambda scales out and we start hammering that relational database with more queries than before, the load on the database increases, which impacts query performance, and now Lambda is waiting longer for the same queries to respond because the database is under more significant load than it was at lower scale. That makes the execution time for a given invoke longer, which means higher cost: the same code running the same query is now taking longer to respond because the database is under higher load. And not only does longer execution time lead to higher cost, it also makes you more likely to bump up against those timeouts. Maybe you weren't timing out before, when the database wasn't under the load it is now, but now you're hitting timeouts you weren't hitting before. My architecture here has API Gateway as the trigger for Lambda, but the same holds for whatever is triggering your Lambda function: if Lambda is overwhelming its downstreams, your function may take longer to execute and be more likely to hit its function timeout. And when Lambda throttles its caller, and we'll talk in a minute about the dimensions of throttling for Lambda functions, which in this case is Amazon API Gateway, that manifests as an error back to the API caller: API Gateway experiences a throttle on Lambda and returns a 502 to the caller, so now the client has to retry.
The client is experiencing errors it maybe wasn't seeing before, so perhaps the symptom you see is occasional errors thrown from your API, and it's actually because that function is throttling API Gateway. And again, maybe there was some suboptimal code in our Lambda function interacting with some of those other services in not the most optimal way, causing the code to be throttled more often than it needs to be; we'll dive into that in a minute as well.

To better understand how, let's take a little detour into how Lambda measures concurrency. Folks who've used Lambda know there's a default concurrency limit, and when you exceed it you get throttled. Let's look at the Lambda documentation and see what it teaches us about how concurrency is measured. The first time you invoke your function, Lambda creates an instance of the function; that makes sense. When the function returns a response, it (meaning the execution environment) sticks around to process additional events; this is the warm execution context you've probably heard about. If you invoke your function again while the first event is being processed, Lambda creates another instance of that function. So a given instance is single-entrant: if one is currently doing work on behalf of a request and another request comes in, we create another instance of the function, and on and on. Here's the key bit: as more events come in, Lambda routes them to available instances, ones that are not currently taking traffic, and creates new instances as needed. Your function's concurrency is the number of instances serving requests at a given time.

What does that mean? Let's figure it out. Imagine I have a function that takes on average 200 milliseconds to respond. One invocation, 200 milliseconds: that one instance can actually handle a throughput of 5 requests per second, because as soon as it returns, every 200 milliseconds, it can take the next request. So it can handle 5 requests per second with a concurrency of 1. This is a common misconception a lot of folks have: they think, OK, the default concurrency for Lambda in a given region is, say, a thousand concurrent executions, so that means I can only do a thousand requests per second. Nope. You have to understand how long your function takes to execute, because a given execution context, depending on how long it takes to return, can handle multiple requests per second. Here we have one function instance handling five requests per second because of how long it takes to execute.

So here's the formula if you want to estimate roughly how much concurrency you'll consume for a given workload: take your average function execution time in seconds, multiply it by your average requests per second, and you get a rough estimate of the concurrency you'll consume. For us the math was pretty straightforward: 200 milliseconds, so 0.2 seconds, times 5 requests per second equals 1, so we'll be using on average one unit of concurrency.
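As a quick illustration of that arithmetic, a minimal sketch (the function name is just for illustration):

```python
# concurrency ≈ average execution time (seconds) × average requests per second
def estimate_concurrency(avg_execution_seconds: float, avg_requests_per_second: float) -> float:
    return avg_execution_seconds * avg_requests_per_second

print(estimate_concurrency(0.2, 5))      # 1.0  -> the example above
print(estimate_concurrency(3.0, 3000))   # 9000 -> the example later in the talk
```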
The other thing that was in the documentation is that you can reuse some aspects of the execution environment and take advantage of the fact that you're getting a warm execution context across those invokes: anything outside of your Lambda handler, the entry point for a given invoke, can be reused across executions on a warm container. Now, you can't count on it: sometimes we're cycling our fleet or doing other things that are our responsibility as part of managing the infrastructure and the servers beneath Lambda. But you can do things like move static data or connections, such as database connections, outside of the handler function to try and take advantage of that execution context reuse.

Something else I want you to observe about this little bit of arithmetic: given steady traffic, imagine you're always getting five, or more realistically five thousand or some much higher number of requests per second, if your Lambda takes a little bit longer and a little bit longer to respond, maybe because it's overwhelming a downstream like we were talking about a few slides ago, then your consumed concurrency is going to increase. Or maybe you pushed out some new code that responds a little slower than it used to: steady traffic, but you'll consume more concurrency if it takes you longer to respond. So again, maybe because you're overwhelming downstreams in a way you weren't before, you're consuming more concurrency, and when you hit your concurrency limit, you get throttled.

Let's scroll down the docs a little and keep reading what they tell us about how Lambda scales, specifically about bursts of traffic. For an initial burst of traffic, your function's concurrency can reach an initial level of between 500 and 3,000, which varies by region; the value for each region is in our documentation. After the initial burst, your function's concurrency can scale by an additional 500 instances each minute, and this continues until there are enough instances to serve all the requests, or a concurrency limit is reached. What does that mean?

Let me set up a scenario. Imagine we're starting out with just the regional defaults in the Northern Virginia region, us-east-1: the defaults for Lambda in us-east-1 are a concurrency limit of 1,000 and a burst limit of 3,000. And imagine we have a function that takes on average one second to respond, just to keep the math easy. You can think of the y-axis on this graph as either the amount of concurrency we're consuming or the number of requests per second, because with a one-second execution time they're the same, which makes the graph a little easier to read. We'll move left to right across the graph and see what happens. Initially we have a steady 1,000 requests per second, so we're using 1,000 concurrent invocations. Then we get a burst of traffic to 4,000 requests per second, and that burst is sustained for several minutes; let's see how we behave after one minute, two minutes, three minutes. On the initial burst to 4,000 requests: throttled. What that means is there are 1,000 concurrent instances working away, responding every second, freeing up, and handling as many requests as they can, but when all 1,000 instances are busy handling other requests, all the other requests that come in are getting throttled,
and the clients retry, trying to land on one of those thousand instances, hoping to get lucky next time they make a request. Same story as time goes on: we've hit our concurrency limit; our burst limit is higher by default, but we only scale until we hit the concurrency limit, which we hit at 1,000, so the client has to keep retrying and we keep throttling. So depending on how bursts of traffic come into your API Gateway API, for example (my example uses API Gateway as the trigger for the Lambda function, but none of this is specific to that architecture; this is just the throttling and bursting behavior of Lambda), these throttles are going to be felt by API Gateway as errors returned back to the caller.

Let's make a little change to the scenario: we increase our regional concurrency limit in us-east-1, the Northern Virginia region, to 4,000, and we still have a 3,000 burst limit. The burst comes in and we get immediate access to 2,000 more instances: that's our burst limit, we can burst to 3,000, we were at 1,000, so we get 2,000 more right away. Everything above that throttles, just like before, and again these instances are handling one request per second each, so they're freeing up every second and taking requests as they can, but requests that don't happen to land on an available instance get throttled and try again. After a minute we can create 500 more instances, and so on; after minute two we have the full 4,000, that continues through minute three, and now we're serving our 4,000 requests per second. This is what the documentation means by the burst behavior of Lambda: there's an initial burst, and then there's the subsequent ramp you can do until you're sustaining traffic and no longer experiencing throttles, or until you hit a limit.

Now let's see what's happening in the code we wrote to talk to Secrets Manager: why are we getting throttled there? Here's some sample Python code; it's pretty straightforward. On the first three lines we're reading environment variables configured on our Lambda function: first the name of the database we want to connect to, second the username to use to connect to that database, and third the Amazon Resource Name (ARN) of where to find the password in Secrets Manager. Then, in the handler, we go to Secrets Manager and say: get me that secret, based on the ARN from line three, and that gets us the password, assuming our function has permission to access that secret, which it does. Then we use that to create a connection to the database, run our query, and do whatever we need to do. That call to Secrets Manager (this is Python code, but under the covers it's using the GetSecretValue API call) is subject to Secrets Manager's throttling limit of 1,500 requests per second. That means that once the traffic to our function exceeds 1,500 requests per second, subsequent calls are getting throttled by Secrets Manager. And again, this may not surface as errors being bubbled up; it may just be additional latency on that call for any invocations that happen to be throttled by Secrets Manager. The key point is that the line of code calling Secrets Manager runs on every single function invocation, because it lives inside our handler function.
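The code being described looks roughly like this; a hedged reconstruction rather than the exact slide (the database client, here pymysql, the DB_HOST variable, and the placeholder query are assumptions for illustration):

```python
import os
import boto3
import pymysql  # assumption: any client for your RDS engine would do

DB_NAME = os.environ["DB_NAME"]
DB_USER = os.environ["DB_USER"]
SECRET_ARN = os.environ["SECRET_ARN"]

secrets_client = boto3.client("secretsmanager")

def handler(event, context):
    # Anti-pattern: this GetSecretValue call runs on EVERY invocation because it
    # lives inside the handler, so above ~1,500 requests/second it gets throttled.
    password = secrets_client.get_secret_value(SecretId=SECRET_ARN)["SecretString"]
    connection = pymysql.connect(host=os.environ["DB_HOST"], user=DB_USER,
                                 password=password, database=DB_NAME)
    with connection.cursor() as cursor:
        cursor.execute("SELECT ...")  # placeholder for the business query
        return cursor.fetchall()
```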
It's a very similar story for CloudWatch Metrics. Imagine we have some sample code, and we scroll down past the code we looked at a minute ago: further down we execute the query using the connection we established further up, and maybe it's important for us to keep track of how many results are returned by each query, for some reason that matters to our business, so we use the CloudWatch PutMetricData call to write that value to CloudWatch Metrics as a custom metric.

Now let's look for a second at how CloudWatch Metrics prices that API call. The PutMetricData API call, which is what sits beneath that put_metric_data line of Python code, if you adjust the units given in the pricing documentation, works out to about a penny per thousand requests. For reference, Lambda prices execution time in gigabyte-seconds: a Lambda function with a gigabyte of memory allocated that runs for half a second will also cost you about a penny per thousand invocations. What that means is that if that function, a gigabyte of memory, half a second on average, writes one metric per invocation like this, you're paying as much for the metric write as you are for the function to run. If it writes two metrics like this, you're paying twice as much to write the metrics as you are for the function to do everything else it does.

There's a pretty easy way around this: the PutMetricData API supports batching. You can put up to 20 different metrics per call, and lots of data points; there's a payload size restriction, but you can fit quite a bit of data in there, so you can jam a lot more into each PutMetricData call. The real problem here is that we're putting one metric at a time, which is a really inefficient way to write metrics to CloudWatch Metrics. And, similar to the previous slide, CloudWatch Metrics has API throttling: 150 transactions per second on the PutMetricData call. So, just as we saw with Secrets Manager, with the code written this way, once you get above 150 requests per second you'll find the requests are being throttled by CloudWatch, and again the symptom may not be an error, it may just be latency on that call.
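A minimal sketch of what batching those writes could look like with boto3 (the namespace, metric names, and buffering approach are illustrative assumptions, not the slide's code):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")
_metric_buffer = []

def put_metric(name, value, unit="Count"):
    """Buffer metrics and flush them in batches of up to 20 per PutMetricData call."""
    _metric_buffer.append({"MetricName": name, "Value": value, "Unit": unit})
    if len(_metric_buffer) >= 20:
        flush_metrics()

def flush_metrics():
    if _metric_buffer:
        cloudwatch.put_metric_data(Namespace="MyApp", MetricData=list(_metric_buffer))
        _metric_buffer.clear()
```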
So at scale, our architecture really looks like this: as that function scales out and out, it exposes inefficiencies in how we authored our code and interacted with these services. Imagine a pretty reasonable function that executes on average in three seconds and gets 3,000 requests per second: by the arithmetic I showed you several slides ago, that results in about 9,000 concurrent invocations. That could be 9,000 database connections; that could be 9,000 functions trying to grab secrets or trying to write to CloudWatch Metrics. It's really important to understand the interplay between the traffic you receive, how long your function runs, the resulting concurrency, how you've written your code, and what that concurrency is going to do to your downstreams and the other resources you interact with.

So what could we do about it? How could we have done things differently? Let's do the easy things first. To start with, here is that same code we were using to interact with Secrets Manager to fetch the password for our relational database: we're just going to move the fetch of the secret out of the handler. Let's take advantage of that execution context reuse we talked about a few slides ago. Now, whenever a new function instance is initialized, the code outside the handler function is executed once and can then be reused by any subsequent invocation of that handler on the warm execution context. For some of you, you may want to refresh your secrets more frequently than whenever a new function instance is created; for use cases like that, use an in-memory cache with an expiry policy that fetches from, in this case, Secrets Manager at a more frequent interval than just whenever a new instance is spun up. I've been talking about Secrets Manager here, but the practices are the same if you store your secrets in Systems Manager Parameter Store, for example.

Now, for metrics, we have some new features that can help you out. A few weeks ago the CloudWatch team released a feature called the embedded metric format, to optimize how customers publish metrics to CloudWatch Metrics. The way it works is that you log your metrics to standard out, which for a Lambda function goes to the log stream, in a standardized format, the embedded metric format. The format has a specification you can read all about in our documentation to ensure your logs go out correctly, and when they do, CloudWatch watches the log stream for your Lambda function, picks out the metrics written in the embedded metric format, and writes them to CloudWatch Metrics on your behalf. Before this came out, customers were doing something pretty similar themselves: the idea was, I want to write my metrics asynchronously and do some batching and optimization outside of my functions, so they would write metrics in a well-known format to CloudWatch Logs, stream that to Kinesis, for example, and have a Lambda function consuming off that Kinesis stream doing the work themselves. We don't want you to have to do any more work than you need to, so we listened to the feedback and rolled that right into the CloudWatch service. If you want to see what this looks like, we have helper libraries available for a couple of programming languages to make it much easier. In this example Python code, I'm importing the embedded metrics library, which is open source on GitHub, decorating my handler to get access to write metrics this way, and then just putting metrics using that library, which ensures they get written out to my logs in the standardized format so they're picked up after the fact.
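Putting those two fixes together, a sketch of what the reworked function could look like, assuming the open-source aws-embedded-metrics Python library; the TTL value and the cache and query helpers are illustrative assumptions, not the talk's code:

```python
import os
import time
import boto3
from aws_embedded_metrics import metric_scope  # open-source EMF helper library

secrets_client = boto3.client("secretsmanager")
SECRET_ARN = os.environ["SECRET_ARN"]

# Fetched outside the handler, refreshed on a TTL (illustrative: 15 minutes).
_cached_secret = None
_cached_at = 0.0
SECRET_TTL_SECONDS = 900

def get_db_password():
    global _cached_secret, _cached_at
    if _cached_secret is None or time.time() - _cached_at > SECRET_TTL_SECONDS:
        _cached_secret = secrets_client.get_secret_value(SecretId=SECRET_ARN)["SecretString"]
        _cached_at = time.time()
    return _cached_secret

@metric_scope
def handler(event, context, metrics):
    password = get_db_password()          # no longer a per-invocation Secrets Manager call
    results = run_query(password, event)  # assumed helper doing the database work
    # Logged to stdout in embedded metric format; CloudWatch extracts it asynchronously.
    metrics.put_metric("ResultCount", len(results), "Count")
    return results
```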
OK, so we got the low-hanging fruit. But what about the overwhelming of our downstream, the overwhelming of my database, and the timeouts on my Lambda function? The question I want you to ask yourself is: does the response from your API need to be synchronous? And I mean this seriously. There are customers I talk to who say, "of course it does," and then you look at what they're doing: they're writing orders, and all they really need to know is that the message they posted to the API was durably stored. They just need a 200 back saying, yes, we have it, it's stored, it's being processed. It's not as if you're executing a query against the database and need the results back right away; you're just writing data. For write-heavy workloads, where you just need to know the data is stored and safe and the processing can continue asynchronously, you can look at an architecture like this. Amazon API Gateway has the ability to write payloads directly to SQS, directly to a queue; there's no hidden Lambda there, it's a direct API Gateway to SQS integration. So, say you're posting orders or transactions, or just writing data you need to know is persisted: you can decouple the traffic hitting your API from the business logic in your function by putting a queue in between, and now you have a buffer that keeps surges of traffic from your API from overwhelming your Lambda function and whatever its downstreams are. You can have a dead-letter queue on that SQS queue to catch failures, and you can use a feature of AWS Lambda called reserved concurrency, which does what it sounds like: it reserves, and puts a cap on, the number of concurrent execution contexts allowed for that function, so you rein in the scale and don't let it overwhelm the downstream. The flow works like I was describing: API Gateway's direct integration writes to SQS, and as soon as the message is stored we return; SQS is an event source for Lambda, so there's a direct integration between Lambda and SQS. Something else worth calling out: if you need guaranteed ordering, you can build a similar architecture going to Kinesis instead, because SQS will not guarantee order; same idea, and you can use the concurrency limits I mentioned to control the scale on the backend.

Now, since I'm showing you SQS, here's one other quick optimization I don't see as often as I would like in my customer conversations: take advantage of batching in SQS. SQS is fully serverless, and if you look at how it's priced, it's essentially priced on the number of API calls you make, and many of those APIs support batching, most importantly the gets and the puts. So take advantage of that: batch your writes and batch your reads. It's right there in the console: if you're configuring SQS as an event source for Lambda, you can say, hand me up to ten messages in a batch. Not only does that improve throughput, because you're pushing more messages through your queue, it also improves cost, because you're pulling off ten messages at a time and the unit of metering for SQS is API calls. There is a payload-size dimension worth considering as well, and a lot of folks forget that SQS supports both string payloads and binary payloads, which means you can push compressed data through SQS to further optimize your cost and throughput.
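A minimal sketch of batching on both sides with boto3 (the queue URL, message shapes, and the process helper are illustrative assumptions):

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"  # illustrative

def submit_orders(orders):
    # Batch the writes: up to 10 messages per SendMessageBatch call.
    entries = [{"Id": str(i), "MessageBody": json.dumps(order)}
               for i, order in enumerate(orders[:10])]
    return sqs.send_message_batch(QueueUrl=QUEUE_URL, Entries=entries)

def handler(event, context):
    # Batch the reads: with SQS as the event source and a batch size of 10,
    # a single invocation receives up to 10 records.
    for record in event["Records"]:
        order = json.loads(record["body"])
        process(order)  # assumed business-logic helper
```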
And if, after all of these optimizations, you're still seeing timeouts in your Lambda functions at scale, I want you to look at your functions for something like this: a place where you're doing orchestration within your code, kicking off one thing, kicking off another thing, waiting for them both to complete, and then doing a third. Don't do that. Don't orchestrate inside your function; you never want to find your function sleeping. You're paying for execution time, so don't pay us for your code to sleep and wait for things to happen. Instead, use a service that knows how to do that orchestration: try to carve that code up, maybe into three separate functions, and use AWS Step Functions to orchestrate those three smaller Lambda functions. Let Step Functions handle not only the orchestration; it has really rich support for retry logic and error handling, and there's a lot of really nice stuff right out of the box that you get by using Step Functions for orchestration instead of doing it inside your function code. And once you're using Step Functions and you've carved up that code, you may find that one of those chunks of code that is now its own function isn't doing much more than talking to one of these services, and Step Functions has native integrations with several other AWS services. That means you may be able to take some of those new, smaller functions and replace them entirely with a native integration in Step Functions: one less function to worry about, a lot fewer lines of code to deal with, and Step Functions handles the interaction with that service for you.

Now some of you are saying: OK, Roberto, that's great, but for that question you asked me before, my answer is no, I can't just fire and forget these things, I actually do need the response from the API call. So what can we do for you? We have a few different options. API Gateway has native support for throttling at the API level and at the method level; you could use that to throttle traffic right at the API layer, but that lowers your overall throughput, requires work on the clients to implement retry and backoff, and generally impacts the client, so I'd love to not do that if I don't have to. I could look at other database types: Amazon Aurora Serverless has a feature called the Data API, which allows you to interact with it via REST calls instead of a traditional database connection. That requires you to be using Aurora Serverless, and there are some limitations there, so it may not be a good fit for you. You can also look at DynamoDB, which is our NoSQL database: it scales tremendously and can handle all the traffic you can throw at it, but that's a shift from a relational database to a NoSQL database, and that's a non-trivial change. I don't want to stand up here and say, hey, just get rid of all your relational databases and go to DynamoDB; that's not an easy ask. It's something to consider down the road, but I wonder if there are asynchronous patterns we could look at instead, to get the behavior we need without just sitting on an open HTTP connection through an API Gateway REST API, waiting for the response to come back from Lambda. Before I talk about the async patterns, there actually is one other option that I didn't put up here.
It launched yesterday in public preview: a new offering called Amazon RDS Proxy. Let's talk about what it does. It pools and shares your database connections for you in the proxy: your Lambda connects to the proxy and the proxy connects to your database. The proxy preserves the connections during database failover, and by that I mean the connections from your Lambda function: the Lambda may not even know the backing database behind the proxy is failing over, it still has a steady, never-interrupted connection while the database behind the proxy fails over. The proxy will also manage the credentials for you: you can point it at Secrets Manager, where you have the credentials stored, so the proxy gets the credentials from Secrets Manager, and you can connect to the proxy via IAM permissions. You can say this Lambda function has access to this database proxy, and now you're using IAM to manage permissions and the function has no awareness of the database credentials, only the proxy does, and the proxy manages getting them. You can also connect to it using the same database credentials you were using for the backing database, but now you have more flexibility. And it's fully managed: no provisioning, no patching, no management; it's just an endpoint you connect to. It is in public preview, so you'll see some limitations in the documentation, specifically that it's only available for RDS MySQL or Aurora MySQL versions 5.6 and 5.7, so it may not be the silver bullet for many of you until it comes to GA. Also be aware that long-running queries are still going to take a long time. Maybe you had long-running queries because you were overwhelming the database, and maybe that gets addressed for you to some extent, at least from a connection pooling perspective, by this service, but if you have queries that take minutes to respond you'll still hit that API Gateway timeout. And second, at sufficient scale, this may buy you some more time as you scale up, but you may still need to consider going to a NoSQL database, just for the tremendous scale you can get there, after you've scaled your relational databases up and out as best you can.

So let's talk through some async patterns you may be able to use. Before I get to the patterns, a quick refresher on S3 presigned URLs for folks who aren't familiar. A feature of S3 is that, using your IAM permissions, whether via a user or a role, you can generate a presigned URL for a given S3 object key: a presigned GET or a presigned PUT. You sign the GET or PUT with your credentials, and the URL you generate is only valid for as long as your credentials are valid, or until the expiration time you set on the URL when you generated it. So if you use an IAM role to generate the presigned URL and the role's credentials expire, the link expires too, and when generating the URL you can specify an expiration time and it will expire when you set it. The big warning here is that whoever you give this link to can download the data from S3, or write data to that S3 location, so be careful what you do with it. But it has some really nice use cases, and I'm going to use it in some of the patterns I show you here.
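A minimal sketch of generating those URLs with boto3 (the bucket, keys, and expiry are illustrative assumptions):

```python
import boto3

s3 = boto3.client("s3")

# Presigned GET: lets the holder download this object until the URL expires.
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-results-bucket", "Key": "results/request-123.json"},
    ExpiresIn=900,  # 15 minutes
)

# Presigned PUT: lets the holder upload to this key until the URL expires.
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-results-bucket", "Key": "uploads/request-123.json"},
    ExpiresIn=900,
)
```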
So the first, and maybe most straightforward, async pattern to understand is a simple polling pattern. Here, you have a few different API endpoints for the long-running work you used to do synchronously. You have a "do work" endpoint, where the client submits the work to be done, maybe sending the query they want the data for, and they get back a request ID right away, as soon as we've durably stored their request for work to be done. Here I'm actually using Step Functions to orchestrate the work on the backend, maybe Lambda functions, maybe something else, and you can do a direct integration: again, there's no hidden Lambda there, it's a direct integration between Amazon API Gateway and AWS Step Functions. The backend service starts doing the work asynchronously, and now it's up to the client to hit a status endpoint asking, how's my request, are you done yet, with the request ID you handed them, and API Gateway can check directly against Step Functions to see how that workflow execution is doing and whether it's done. When the work is done, you write the output somewhere persistent, maybe here to S3, and when the client sees from the status endpoint that the work is complete, they go to the "get results" endpoint with that request ID and you return the output of that long-running query or job that you used to do synchronously, but that, given these timeouts, forced you to consider other patterns.

Since this is a talk about scale, there are a few considerations to be aware of. The first is execution time: how long it takes the backend work to run will drive whether that's a backing Lambda function orchestrated by Step Functions, multiple Lambda functions, or another service that lets you run containerized workloads, like AWS Batch. There are throughput considerations as well: in this architecture API Gateway is calling Step Functions directly, and Step Functions has rate limits on how fast you can kick off new workflow executions. And API Gateway itself has a payload limit of 10 MB, so if you know your payload will always be smaller than that, you can return the payload right back through the "get results" endpoint; if you know it can be bigger, use an S3 presigned URL to return the payload, and the client then has some amount of time to download their results.

A benefit of an architecture like this is that it can require pretty minimal changes on your callers. The code that's calling your API synchronously today, with some blocking code that calls your API, waits however long it takes you to generate the results, and then continues processing when the results come back, can similarly just block by polling, polling, polling, seeing when the data is ready, and proceeding after that, so it can be relatively easy to wrap an existing backend with an architecture like this. The downsides are that there can be a delayed response getting the data back to the caller, depending on how frequently they poll: if they're not polling very frequently, the data may be ready but they don't know it yet because they haven't polled to see if it's ready. It's also more work for them and more work for you: compute-intensive for them to keep pinging you, compute-intensive for you to keep checking whether their work is done.
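On the client side, the polling loop could look something like this minimal sketch (the endpoint paths, response fields, status values, and backoff settings are illustrative assumptions):

```python
import json
import time
import urllib.request

API = "https://api.example.com"  # illustrative base URL

def _call(url, body=None):
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def submit_and_wait(work, poll_interval=2.0, timeout=300):
    request_id = _call(f"{API}/do-work", work)["requestId"]     # assumed response field
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = _call(f"{API}/status/{request_id}")["status"]  # assumed response field
        if status == "SUCCEEDED":
            return _call(f"{API}/results/{request_id}")
        if status == "FAILED":
            raise RuntimeError(f"request {request_id} failed")
        time.sleep(poll_interval)
    raise TimeoutError(f"request {request_id} not finished after {timeout}s")
```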
Let's look at another pattern, using webhooks. For folks who aren't familiar, a webhook is essentially a callback pattern: you call back to an HTTP endpoint hosted by the caller to let them know, in this case, that the work is done. The flow here is, and I'll come back to this step, that there's some optional trust you can establish with the caller so you know who they are. Let's assume for the moment that you have established trust with them: you've had them register themselves as a known caller of your API and handed them a token they use to demonstrate that trust. They provide that token to you to demonstrate who they are, and their request gets put directly, in this case, onto a queue to be processed asynchronously, and API Gateway returns immediately, as soon as the request has been durably stored on SQS in this example. The backend service does its work, here a Lambda function churning away, and the backend can call back to the client when the work is complete. This is where the trust is important. If you've already established trust, and by that I mean you know who they are and you've confirmed their endpoint is valid and owned by them, you can configure an SNS topic that has already completed the handshake with that HTTP endpoint, so you know they own it. Then you can look at their client ID, perhaps baked into a JSON Web Token they provide when they first call you, and you know this client ID maps to this SNS topic: I'll publish all their responses there and know they'll go to the endpoint owned by that caller. You can either have a dedicated SNS topic per caller, or have multiple clients on a single SNS topic and use subscription filters based on that client ID: I publish all my results to this topic, and client ID 123 has a subscription filter such that when a message carries that client ID it goes to them, so messages get routed to different callers based on the client ID. If you don't want to do that, you can have self-sign-up endpoints where callers register themselves and their endpoints, or you can choose not to do any validation on the URL, but be careful with that: they could give you any callback URL in the request they send, and you don't want to send their data to the wrong place if they give you the wrong endpoint. The considerations at scale are pretty similar here as well, both around execution time, 15 minutes being the timeout for a Lambda function, and payloads: SNS has a 256-kilobyte payload limit, so depending on what you know your payload sizes to be, you either send the payload right back through SNS or use a presigned URL again. There's no polling here, which is a benefit: it's less resource-intensive for them and for us, and we send back the results as soon as we have them.
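A minimal sketch of that completion callback via SNS, using a message attribute for the client ID so a subscription filter can route it to the right webhook subscription (the topic ARN, attribute name, and message shape are illustrative assumptions):

```python
import json
import boto3

sns = boto3.client("sns")
RESULTS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:work-complete"  # illustrative

def notify_work_complete(client_id, request_id, results_url):
    sns.publish(
        TopicArn=RESULTS_TOPIC_ARN,
        Message=json.dumps({"requestId": request_id, "resultsUrl": results_url}),
        # Subscription filter policies on the topic can match on this attribute, so each
        # caller's HTTPS (webhook) subscription only receives its own messages.
        MessageAttributes={
            "clientId": {"DataType": "String", "StringValue": client_id}
        },
    )
```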
A benefit to using SNS here is that SNS has a well-documented retry policy, and it recently launched support for dead-letter queues, so you can keep track of the payloads that were retried, and retried again, and exceeded that retry policy so delivery stopped being attempted. Now, if you don't use SNS, or rather the downsides here: first of all, this is a potential non-starter for a lot of use cases because it requires the caller to host a webhook endpoint, and there are plenty of architectures where that's not feasible or not reasonable to ask of your callers. You need to document your retry policies so your callers know what their uptime needs to be to ensure they get the response to their request. And, as I mentioned earlier, for untrusted clients you're responsible for deciding how you want them to demonstrate trust.

OK, one more pattern I want to show you: a WebSocket pattern. Amazon API Gateway supports REST endpoints, which is what we've been talking about so far today, and it also supports WebSocket endpoints. For folks less familiar with WebSockets, this is a persistent, bidirectional, open communication channel you can establish between client and server, and here we're actually using both API Gateway's REST support and its WebSocket support. Here's what the flow looks like. The client submits the request and receives back quite a bit more than they did in some of the previous examples: they get a bunch of details about the Step Functions workflow execution we kicked off behind the API. The client then uses those details to hit the WebSocket endpoint and say: hey, I just kicked off some work, here are the details, keep me posted on how it's doing; so they establish a WebSocket connection with those details. There's a little Lambda function that runs for a WebSocket-based API whenever a new client connects, and it can inspect the payload they submitted on connect. On the backend there's the Step Functions workflow that was kicked off when the work was first submitted, and what you're seeing beneath that little yellow dot is a parallel state in Step Functions: one side of it is doing the work, and the other side is waiting for the client to connect and start listening. When the client does connect over WebSockets and the on-connect Lambda function fires with the details the client provided, it finds that execution and records that the client is now listening. That means that only when the work is done and the client is listening do we call back to let them know the work is done; that way we avoid the edge case of the work completing very quickly and us not calling back because they weren't listening when we finished. Again, there are some scale considerations here: the throughput considerations we saw on earlier slides for Step Functions, but also the payload limit for WebSockets, and similarly to the other architectures, that will determine whether you send the response directly back over the WebSocket connection or give them a presigned download URL to grab the results.
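Pushing that "work is done" message back over the open connection uses API Gateway's Management API; a minimal sketch (the endpoint URL, the source of the connection ID, and the message shape are illustrative assumptions):

```python
import json
import boto3

# The endpoint is the WebSocket API's https:// connection URL (api id, region, and stage are illustrative).
apigw = boto3.client(
    "apigatewaymanagementapi",
    endpoint_url="https://abc123.execute-api.us-east-1.amazonaws.com/prod",
)

def notify_client(connection_id, request_id, results_url):
    # connection_id would be captured by the on-connect Lambda and stored alongside the execution.
    apigw.post_to_connection(
        ConnectionId=connection_id,
        Data=json.dumps({"requestId": request_id, "status": "SUCCEEDED",
                         "resultsUrl": results_url}).encode(),
    )
```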
The benefits: we have an open connection with the client, so not only can we tell them immediately as soon as the results are done, we can do richer eventing over that open connection. Web and mobile clients are very accustomed to WebSocket connections for exactly this reason, so if the front end executing these long-running queries is a web or mobile client, they'll be very happy using a WebSocket connection for notifications of status updates on the backing work, or to be notified when the data is available for them to fetch. The flip side of that coin is that you have to be familiar with WebSockets, and your clients or callers have to be comfortable with that protocol; that could be a non-starter for some of your use cases. You may have clients calling your long-running synchronous API today for whom it wouldn't make sense to learn WebSockets just for a use case like this, not all browsers support WebSockets, and the Amazon API Gateway WebSocket support has a rate limit on how fast new connections can be established, which is worth being aware of. The link I put there goes to a blog post that goes very deep into this architecture, and that blog post links to source code on GitHub showing exactly how this architecture works, so you can dig deeper into how it's designed.

So, to wrap up. First of all, understand the scaling behavior of the services you choose. I understand our documentation can be pretty verbose sometimes, but I still encourage you to have a look and understand what the docs say about how these services scale. And load test: don't take our word for it, test the load that you need. These are serverless services, you only pay for the traffic you drive to them, so drive some big traffic to make sure it works the way you expect, and as soon as the load test is done, you're not paying anymore. Try to decouple your components as best you can so they can scale independently. Implement retry with backoff wherever you can; we do it in our SDKs, and you should do it as well for interactions with anything that can throttle you or has some throttling behavior. Leverage batching wherever you can: the example I showed was SQS, and it's a very similar story for architectures that use Kinesis, where you can really maximize your throughput by taking better advantage of the payload and rate limits they have there. Don't orchestrate your code inside a single Lambda function: not only does that increase cost, it also adds complexity; use Step Functions, for example, to orchestrate that for you. And use asynchronous patterns wherever you can, again to decouple your architecture. If you want to learn more, we have heaps of free training on aws.training, including a brand-new serverless architecture course that just launched; there's some great stuff there. I'll be down here to take some questions. Thank you all for coming, and please remember to submit your session surveys. [Applause]
Info
Channel: AWS Events
Views: 13,491
Keywords: re:Invent 2019, Amazon, AWS re:Invent, SVS335-R1, Serverless, AWS Lambda
Id: dzU_WjobaRA
Length: 52min 41sec (3161 seconds)
Published: Fri Dec 06 2019