AWS re:Invent 2019: [REPEAT 3] Serverless architectural patterns and best practices (ARC307-R3)

Captions
All right, hello everyone. Actually, I'm surprised by how many people made it after the party; I can see some people are still alive, which is good. So, to kick off, just to make sure you're in the right room: this is not about containers or Kubernetes or anything like that. This is serverless architectural patterns. Different from last year, where we covered tons of different patterns (which you can find on YouTube), this time we decided to cover the ones that work for the vast majority of customers; about 80 percent of customers are doing something like this. At the end I'm also going to show you some of the examples we announced recently, especially if you haven't seen the larger application we open sourced in multiple languages.

My name is Heitor Lessa, and I'm a Principal Serverless Lead on the Well-Architected team. My job is to help define best practices, and to aggregate the practices and lessons learned from engineering and also from customers and partners. We're going to do a quick recap on the best practices, and this year, to make sure you're still alive and still paying attention, instead of calling things "webhooks" or "REST API" we're going to change the names: instead of "webhook", we're just going to say "Call Me Maybe", right? So you're going to see some of those funky names.

It's also important to call out what we're not going to be covering. This is a 300-level session, so I have some expectations of your knowledge as well. It's not a 400-level where we go too deep, but it's deep enough that most people should be able to follow. I expect that you already know what serverless means, that you know what Lambda is, and some of the services we're going to cover at the end as well. There are some patterns that are very specific and would take about an hour to cover on their own, especially when you run services at a very large scale: when you're dealing with hundreds of thousands of transactions per second, the patterns will be different. We have one session specifically for that purpose, including all the guardrails and caveats you should be aware of when designing for large scale.

Just to set the scene: when we talk about serverless, what do we actually mean? Depending on who you talk to, people may have a different understanding of what serverless means. The best way I've found to explain serverless succinctly is to think of it as a spectrum. It's not necessarily a destination you have to reach. On the very far left-hand side you have EC2 compute, very traditional virtual machines (VMs), and on the far right-hand side you see the complete opposite: everything fully managed, you don't have to deal with servers, everything is patched, you don't have to think about scaling; all of that is handled for you. Most of the time, from working with customers for the past four and a half years on serverless and about six or seven years at AWS, you'll see a mix of those technologies in use at any point in time. It's rarely the case that a customer or partner is doing serverless 100 percent; you're going to see a mix. For this presentation I'm taking that into account, and I'm going to show you some of the customers I worked with more recently, which technologies they adopted and how they evolved, to give you a more realistic view beyond the best practices.
Some of the key practices I expect you to know already, but I just want to recap. We announced this about a month, month and a half ago: if you're doing serverless for the first time and you've never done CI/CD, this is where you start. In the Lambda console there's something called "create an application", and that will create a basic CI/CD pipeline for you with the common practices we expect you to start with. It covers some of the testing, a README, and an example of how to organize and structure your code, including how you evolve it. However, if you're not starting now, if you've been using serverless for more than a year or three years, or if you're in a large enterprise, this may not be a good fit for you; do this instead. This is something we worked on with Trek10, an AWS Advanced Consulting Partner on serverless. They've built what most enterprises do, and what I've seen as well, especially when you work across multiple teams and multiple microservices and you already have some CI/CD in place: you might want things like feature branches and feature toggles, auto-discovering branches as you create them, creating preview environments and then deleting them on merged pull requests. This is exactly that. It covers most of the best practices on CI/CD, but it could also be overkill if you're just starting out, prototyping and developing; it's about the point where you move to multiple accounts and things like that.

The next one, which this talk is heavily based on, comes from one of my friends, Jeremy Daly. If you haven't seen it, he covered about 17 patterns, if I'm not mistaken, more in depth. There are some patterns I'm not covering here, so if you're interested in other serverless patterns, especially serverless microservices, dealing with connection pooling, all those sorts of things, Jeremy covers them in a fantastic, well-written article that he keeps up to date, including event-driven and a bunch of other emerging patterns I decided not to cover in this session. Please check it out.

OK, next piece. If you're already doing serverless and you've passed the point of having ten or fifty functions, if you're on the path to having hundreds of functions, you know how hard it can be to figure out the correct memory setting for Lambda. As you increase the memory, we allocate CPU, networking, and I/O proportionally. If all your function does is call HTTP endpoints, don't use the default memory; there are better ways to do it. More memory doesn't mean it's going to be more expensive; in fact, in some cases it's going to be cheaper. What this tool, Lambda Power Tuning, does is deploy a state machine in your own AWS account, and you tell it which Lambda function to test, how many times to invoke it, and what the payload is. It will reconfigure your Lambda function across the available memory settings and keep invoking it, and you can say whether you're optimizing for cost, for performance, or something in between. Then it gives you a web graph you can investigate to see what the performance was, so you can find the best memory setting for you. It's a more systematic approach. To show why this is important, take a look at the difference just from switching the memory from 128 megabytes to 512 megabytes.
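If you're curious what kicking off the tuner looks like, here's a minimal sketch using boto3 with placeholder ARNs; the input schema (lambdaARN, powerValues, num, payload, strategy) follows the aws-lambda-power-tuning README at the time of writing:

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Start the power-tuning state machine against one of your functions.
response = sfn.start_execution(
    stateMachineArn="arn:aws:states:eu-west-1:123456789012:stateMachine:powerTuning",  # placeholder
    input=json.dumps({
        "lambdaARN": "arn:aws:lambda:eu-west-1:123456789012:function:loyalty-get",  # placeholder
        "powerValues": [128, 256, 512, 1024, 1536, 3008],  # memory settings to test
        "num": 50,                              # invocations per memory setting
        "payload": {"customerId": "test-123"},  # sample event to invoke with
        "strategy": "cost",                     # or "speed" to optimize for performance
    }),
)
print(response["executionArn"])
```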
The one I highlighted here is API Gateway calling a Lambda function that calls DynamoDB and does some calculation on loyalty points; I'll cover the whole use case at the end of the session. The tool is Gatling, an open source load-testing tool that gives us beautiful graphs and reports on how latency is distributed, success versus error rates, that sort of thing, and the areas you can go and improve. We only did two things here to get that performance boost: we tuned the Lambda function, and we did something you may already know about API Gateway. When you launch API Gateway using CloudFormation, it deploys what we call an edge-optimized endpoint. If all you're doing is service-to-service communication, or all your customers are in the same region you're deploying to, don't use the default; use regional APIs. A regional API essentially removes the CDN from the top of API Gateway, which improves the networking connection, and it also uses HTTP/2, which gives you some additional benefits. So if you're doing Lambda to Lambda, EC2 to Lambda, or container to Lambda through an API Gateway, there's no need for edge-optimized. You would only use edge-optimized when your customers are geographically dispersed and you actually want the CDN on top. Nothing stops you from adding your own CDN, either; if you have your own CDN on top of API Gateway, use a regional API, otherwise you have a CDN on top of a CDN on top of API Gateway. And this is just a switch you have to flip.

The next one, which I expect you to know a bit and will talk about more, is what we call the saga pattern: coordinated transactions within a state machine. I'll cover this particular serverless airline in more detail at the end, but the idea is this: if you were to run a complex booking process, where you search for a flight, pay for the flight, reserve seats, and so on, there's a bunch of transactions that have to happen either sequentially or in parallel, and if they fail, you want to make sure everything goes back to where it's supposed to be: clean up the state and tell the customer "I couldn't book your flight" or "I couldn't collect your payment; I'll try again three times, or for the next 72 hours, and if I still can't collect your payment, I'm sorry, I'm going to cancel your booking, and this is how you can talk to us."

On the right-hand side is what we call the happy path: when everything works, when the network is reliable, everything goes straight through, super fast. But when it doesn't, each of those tasks has retries with exponential backoff and jitter, catching failures with try/catch, and when you detect certain types of failures you go to the left-hand side; you see those red states, that's where it goes. One thing I've noticed from working with customers in production for quite some time: they almost always miss the last task at the bottom left-hand side. When you use Step Functions, by default there's no integration with a DLQ, a dead-letter queue. If you want to capture the message that failed the whole state machine execution, you can add an additional step that integrates directly with SQS as a queue, and send the entire input event to that queue so you can do something with it later. It's actually only about four lines to do this; I can give you the source code at the end as well.
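Here's a sketch of that catch-all step in Amazon States Language, written as a Python dict; the queue URL is a placeholder, and the direct SQS service integration is what keeps it to roughly four lines:

```python
# Point your states' Catch clauses at this state so a failed execution
# forwards its entire input to SQS for later inspection.
notify_booking_failed = {
    "NotifyBookingFailed": {
        "Type": "Task",
        "Resource": "arn:aws:states:::sqs:sendMessage",  # direct integration, no Lambda needed
        "Parameters": {
            "QueueUrl": "https://sqs.eu-west-1.amazonaws.com/123456789012/booking-dlq",  # placeholder
            "MessageBody.$": "$",  # ship the whole state input to the queue
        },
        "End": True,
    }
}
```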
All right, now that you know all the basics, we're going to start with the Comfortable REST. The Comfortable REST, as you can imagine, is a REST API; it's where most customers start with serverless, with Lambda in particular and API Gateway. Across the session you're going to see an archetype of where almost everyone starts. This is kind of the POC, right? The beautiful state where everything looks so simple it's amazing; it just feels like the dream. At the bottom I'm using my team's Well-Architected five pillars to help you apply best practices and better designs. No matter what pattern you come up with or use, the pillars at the bottom will help you create guidelines and processes around whether you're doing the right thing or not. Technologies come and go, patterns come and go; the pillars give you a better north star in that sense.

OK, first things first. From an operations perspective, if you're going into production, at the very minimum you want some tracing, because you're going to have multiple functions and you'll want to know the downstream latency, the distributed latency across multiple services, which customer it was, that sort of thing. You also want custom metrics and structured logging. By structured logging I mean configuring your logging output as JSON and clearly defining the standard keys in your JSON output that will always be there, no matter which service is logging. By doing this you get a much better search experience, whether you use CloudWatch Logs Insights or Elasticsearch and Kibana; it doesn't matter, JSON is really well understood.

The one piece of advice that's new: creating custom metrics from within your Lambda function was actually a bit challenging before last week. Some customers would try to do it synchronously in their code, using the AWS SDK and calling PutMetricData. Don't do this unless that metric is highly critical. Instead, use the new CloudWatch embedded metric format. Using the same structured logging, all you do is log a single JSON blob in a format we expect, and CloudWatch will create those metrics for you automatically; we're talking about microseconds now, and you can collect up to 100 metrics in a single log line.
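Here's a minimal sketch of what one of those embedded metric format log lines can look like; the namespace, dimension, and metric names are illustrative:

```python
import json
import time

def log_emf_metric(name, value, unit="Count"):
    # One structured-JSON log line that CloudWatch turns into a metric
    # asynchronously; no synchronous PutMetricData call required.
    print(json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "ServerlessAirline",  # illustrative namespace
                "Dimensions": [["service"]],
                "Metrics": [{"Name": name, "Unit": unit}],
            }],
        },
        "service": "loyalty",  # dimension value
        name: value,           # the metric value lives at the top level
    }))

log_emf_metric("SuccessfulBooking", 1)
```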
The next one might seem basic. Serverless is awesome because it scales for you; it handles a bunch of things you would otherwise have to do yourself. But that doesn't mean your business needs 100,000 requests per second, or 10,000 requests per second, all the time. As you understand your customer behavior, you want to make sure you don't leave yourself open to abuse from bots and the like. At a very basic level, you want to regulate the access rates. Just because API Gateway and Lambda can scale to a large number of requests doesn't mean your traditional database, or anything else behind them, can. Sometimes you're also paying for requests you don't actually want; by regulating access rates and enforcing authorization, you don't pay for the requests that fail, so that helps a lot as well. From an authorization perspective, that's the minimum you want to do, unless you're running a public API and deliberately don't want authorization in place. In this case we're using Cognito, but it could be anything else as a custom authorizer.

We're also using Secrets Manager for sensitive data. If you don't have sensitive data, just use Parameter Store; it may work better for you. But if you're doing around 1,000 transactions per second fetching parameters and configuration, the default behavior will not work for you: you have to go to the console, the CLI, or the SDK and tell Parameter Store you want to bump the transactions per second. If you have about 100 functions or more, you have to do this; it's pretty much mandatory. Secrets Manager, on the other hand, gives you about 1,500 transactions per second for fetching secrets, which helps a lot there.

From a performance perspective, I already touched on the regional endpoints: you don't want edge-optimized if your customers are regional or if you have microservices talking to your microservices through an API. DynamoDB as well: right now with SAM, or even the Serverless Framework for that matter, on-demand is already the default as a best practice. If you're starting out and prototyping, you don't know your access patterns yet; there's no point paying for something you don't know yet, so use on-demand, it's going to be a lot cheaper for you. Unless you know exactly what your request pattern is: if it's consistent reads and consistent writes all the time, you'll want to switch to provisioned capacity with auto scaling, because that will be cheaper for you.

Lastly, as I already mentioned, use Lambda Power Tuning to find the correct memory setting. There's even a chalk talk I can recommend: look for Lambda optimization in the re:Invent session catalog, or on YouTube afterwards. It's by one of my friends, Alex Casalboni, who actually built Lambda Power Tuning. He showcases every possible detail, including cases where people thought lower memory was cheaper when in fact it was sometimes five times more expensive.

All right, the next one is my new favorite. It's called the Cherry Pick: for when you're building a new web application or mobile app and you're looking for more real-time capabilities. This is the GraphQL piece. We're going to start very basic and keep adding features so you can see how it grows. When you start with GraphQL, in tutorials and hello worlds, it's like the dream, right? You have an API on top, and all you have to do is create, update, read, and delete, sorting and filtering flight data; to do all of this is about five lines. You don't need to do anything else, no code involved. This changed the way I see applications right now, and even the way I talk to customers. So AppSync manages the GraphQL API for us.

Now let's say you want some more complex logic. If you're just inserting some data, listing some data, or fetching some data, Apache VTL (the Apache Velocity Template Language) will work for you; GraphQL makes this easy. However, as you start troubleshooting, as you start doing something more complex, adding tracing or custom logic, don't do that in VTL; use a Lambda function specifically for it. In this case I'm trying to get loyalty for customer X, and my Lambda function has to calculate all the points they have, and also tell them what the next tier is, how many points they are behind the next tier, the percentage, and so on. You're not going to do that in Apache VTL.
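A sketch of what that Lambda resolver could look like; the event shape assumes your AppSync request mapping template forwards the caller's identity, and the table name and tier thresholds are made up for illustration:

```python
import os
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ.get("LOYALTY_TABLE", "loyalty"))  # placeholder table

TIERS = [("bronze", 0), ("silver", 50000), ("gold", 100000)]  # illustrative thresholds

def handler(event, context):
    customer_id = event["identity"]["sub"]  # Cognito user forwarded by AppSync

    # Sum all loyalty point records for this customer
    items = table.query(KeyConditionExpression=Key("customerId").eq(customer_id))["Items"]
    points = sum(int(item["points"]) for item in items)

    # Work out the current tier and how many points remain to the next one
    current = next(name for name, floor in reversed(TIERS) if points >= floor)
    upcoming = [(name, floor) for name, floor in TIERS if floor > points]
    remaining = upcoming[0][1] - points if upcoming else 0

    return {"points": points, "tier": current, "remainingPoints": remaining}
```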
One of my own mistakes, which I learned the hard way when starting with GraphQL: there's a feature in AppSync called pipeline resolvers. Imagine, in a REST API, you make one single call, and that call does some chained actions: call service number one, call service number two, and if both succeed, it returns. In theory it's awesome, super helpful, super simple. In this case we were trying to do a booking process using multiple Lambdas. But a pipeline resolver doesn't give you the ability to say: if it fails, what do I do? How do I retry? How many times do I retry, and when do I stop? What exactly failed? Tell me these things. AppSync can work directly with other AWS services as well, so in this case, when someone initiates a booking in a mutation, we spin up that saga pattern you saw earlier.

This next one is one of my favorites, because it looks simple in the architecture but it's one of the most powerful things today. Say you have a booking, and inside the booking you have the booking reference, the outbound flight, the status of the booking, whatever you want. Now say you decide to add additional fields that only certain people, in a travel agency for instance, can access, but not the customer. When you enable Cognito in AppSync, or any JWT for that matter, you can not only require that callers be authorized to make the API call; AppSync, and GraphQL, take it to the next level without you coding anything. You can say, for that particular field of my flight or my booking: not only do they have to be authorized to get this data, they also have to be part of the group "agency" or "admin". And you can mix and match at a per-field level. If they're not in the group, that field simply returns as null, and your front end will work with that.

One of the biggest benefits of GraphQL, beyond subscriptions and real time, is that you can swap databases and still keep a consistent model and contract with your front end. For instance, I have my booking reference inside my booking, and everything comes from DynamoDB. If I decide I now need Elasticsearch for full-text search on the booking reference, I just swap that one field to go to Elasticsearch, and when I fetch my booking, everything comes from DynamoDB except the booking reference, and both run in parallel. It's a much more powerful mechanism; it lets you keep swapping those things without the client having to know, the contract remains the same, and you get the benefit of starting to use purpose-built databases. The same goes for the single-table design we discuss a lot: single-table design is awesome for performance in many ways, but if you don't have that performance need and you're just starting out, not yet knowing all your access patterns, AppSync can handle all of these pieces for you with GraphQL and DynamoDB, and once you feel ready, you can change the implementation. Otherwise there's no need. It gives you a much more progressive experience and lets you evolve your architecture and change your decisions as you understand your customer behavior better, without the drawbacks.

And this came out last week: one of the hardest engineering pieces we've done for AppSync, beyond the Amplify DataStore for offline data. For REST, caching is quite simple: you know the URL and you know the payload that comes back, right? I wouldn't say it's easy, but
it's relatively OK to do, it's not that hard. GraphQL is hard to cache because it gives the front end, the client, the flexibility to say: I want the booking, but I don't want all the data from the booking; only send me these fields, because I want to show pagination, say, how many bookings you have; then I can make another call and ask for something else. Because you have this flexibility, it's hard to cache, especially when you look at the API call being made: it's always just /graphql, you don't see anything else; the control is in the payload. With the new server-side caching, we now give you two types of caching. When you make queries, for instance getLoyalty, getFlights, or listFlights, we can cache per authorization session: as the customer authenticates, if they make more than one request using the same parameters, we'll know it and cache it for you. Or, better yet, when you don't want to cache everything (you don't want to cache a basket, for instance, but you do want to cache the data everyone fetches), here's what you can do. On the left-hand side you see resolvers, queries, and mutations; in GraphQL terms, a resolver is where the compute actually happens, where you go and fetch some data. You can now enable caching at the resolver level, so you can say: for this full-text search I want that field to be cacheable, and everything else I don't want cached. It's a much more powerful concept than on the REST side of things.

If you fancy a practical example: from now on, I'm going to show the pattern, and where there's a customer I've worked with, I'll show how they implemented something similar. This is the serverless airline you saw in the screenshot. We built it on Twitch over about three months, 14 hours of recorded episodes; I'll give you the link so you can go and watch. We did everything from scratch: authorization and authentication, defining our first data model, defining the schema, making some mistakes live, realizing a pattern was wrong and then fixing it. We made multiple PRs; the memory tuning I keep talking about, the edge-to-regional API switch, the load testing, it all comes from there. I'm now documenting all the tech debt I incurred and the trade-offs and decisions I had to make as we built it.

In a nutshell: the client is a mobile app or web app (it's a progressive web app), and as I search for a particular flight, that goes straight to AppSync; there's no Lambda involved just to fetch a flight, it goes straight to catalog, and catalog fetches the data from DynamoDB. As you're about to make a booking, we talk directly to payments at the bottom, going to Stripe through an API Gateway. We then go straight to booking afterwards if we were able to do a payment pre-authorization, and Step Functions calls multiple functions: collect payment, confirm booking, yada yada. Then it calls SNS to publish a message to loyalty saying "this booking has been confirmed; calculate the points and ingest them into the database." Later we do something I couldn't easily do before, but which is now possible: you can use AppSync as an API hub, one big graph in front of any other AWS service. More importantly, if you notice, loyalty has an API Gateway and payment has its own API Gateway; we can connect AppSync to those API Gateways
directly, just by using IAM authorization, and the client doesn't need to know about it. The client continues to use their Cognito JWT, or whatever authorizer they have to use, and we do that securely.

Now, Call Me Maybe. Call Me Maybe is nothing less than a webhook, right? What's a webhook? You're dealing with Slack, with GitHub, with some background job that's going to take a while, and once it finishes you want to be notified. In theory, again, super simple. To make things different, I decided not to use DynamoDB and to give some love to relational databases, because they're awesome as well. As you start doing this, you've probably already noticed (before the recent announcements, which I couldn't update the slides for, but I'll walk you through them) that you had an issue: Lambda scaling too fast, and the relational database having issues with memory, connections, and things like that. When people hit this, the first reaction is usually to limit the Lambda function's concurrency with reserved concurrency. That is great and really bad at the same time. Why? It's great when you have a low volume of requests; it works really well when you don't need more than five. However, if you do need more than five, you're going to be throttled: your Lambda will not go beyond five. Reserved concurrency reserves that concurrency, but it also caps it.

If you don't want that, there's a much better way, something customers have been using for the past couple of years; it's battle-tested and works really well when the work is asynchronous. Use Kinesis. Kinesis as an event source will, by default, limit the concurrency of your function without you having to cap the function's concurrency itself. As you ingest more data: a Kinesis stream by default has a single shard, which aggregates and batches as much as possible and does a single invocation per second. If you're receiving too much and want to flush the stream more rapidly, you can increase your shards, or use the new feature called parallelization factor, where you say: even with a single shard, I have too much data now, spin up more concurrency. That doesn't cap your Lambda function, and you still get all the benefits you're looking for, including the new failure-handling features, like custom retries and sending failed Kinesis batches to an SQS queue.

Even though I wrote DLQ at the bottom of the slide: don't do DLQs anymore. With a DLQ, when something fails on an asynchronous event source, you basically say "try three times, then send a message to the dead-letter queue." We announced something called Lambda Destinations; I've been trying it, and I definitely don't recommend plain DLQs anymore. Why? With Lambda Destinations, a piece of infrastructure outside your Lambda function construct, you say: if my Lambda function fails, send every possible piece of contextual data (why it failed, how many times it retried, what the event source was, which function it was) to a queue. Before this, if a message failed to be processed, you would receive the payload of that failure, but you wouldn't know why, where it came from, or from which function.
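Here's a sketch of wiring that up with boto3 instead of a classic DLQ; ARNs and names are placeholders:

```python
import boto3

lambda_client = boto3.client("lambda")

# On failure, the event arrives at the queue wrapped with context
# (error, retry count, request details) instead of as the bare payload.
lambda_client.put_function_event_invoke_config(
    FunctionName="webhook-processor",  # placeholder
    MaximumRetryAttempts=2,            # retries for asynchronous invocations
    DestinationConfig={
        "OnFailure": {
            "Destination": "arn:aws:sqs:eu-west-1:123456789012:webhook-failures"  # placeholder
        }
    },
)
```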
Authorization becomes a classic by now: at a very minimum, have authorization in place. But there's something else you can do. If you're doing Node.js, look up (I'll show it at the demo, but in case I forget) a library from the company DAZN, the DAZN Lambda Powertools. Among other things, it gives you utilities to work with Kinesis and other event data and easily obfuscate sensitive data. Imagine you receive a big JSON blob, and before it goes to the database or any other processing, you want to make sure part of that JSON, part of the data, is obfuscated. You import this library and it handles that for you. You can also do it yourself; the idea is that you do this on the stream itself, with a Lambda function whose only job is to obfuscate the data, so by the time it reaches the Lambda that writes to DynamoDB or RDS, in this case, it's already obfuscated.

Because we're dealing with webhooks, you often don't need to process immediately. In fact, for cost and performance, you want to batch as much as you can, aggregate as much as you can, and do a single invocation, so you get a better cost-per-invocation ratio. How? Kinesis has a feature called batch window, where you say how long to wait before invoking, and you can wait up to five minutes. That gives you very low concurrency and much better value per invocation.

And because we're dealing with webhooks: if you were building this new right now, why not just use API Gateway straight to SQS, then Lambda and DynamoDB? You still buffer: if requests fail, if Lambda has concurrency issues or your code fails, whatever it is, you can hold that data for up to 14 days if you wish, and with DynamoDB you don't have to deal with connections and the like. And we announced, I think yesterday if I'm not mistaken, for relational databases with MySQL: RDS Proxy. It's basically a SQL proxy: it proxies all those database connections to the database, and you no longer have to think about connection pooling yourself. Think of it as PgBouncer if you use Postgres, or ProxySQL for other databases.
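To pull the Kinesis knobs from this pattern together (batch window, parallelization factor, and the new failure handling), here's a sketch using boto3 with placeholder ARNs:

```python
import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:eu-west-1:123456789012:stream/webhooks",  # placeholder
    FunctionName="webhook-processor",    # placeholder
    StartingPosition="LATEST",
    BatchSize=500,
    MaximumBatchingWindowInSeconds=300,  # batch window: wait up to five minutes
    ParallelizationFactor=2,             # more concurrent batches per shard
    BisectBatchOnFunctionError=True,     # split a failing batch to isolate bad records
    MaximumRetryAttempts=3,
    DestinationConfig={
        "OnFailure": {"Destination": "arn:aws:sqs:eu-west-1:123456789012:webhook-failures"}  # placeholder
    },
)
```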
The next one might be familiar, but there's a big difference, which I'll show in a minute: the Big Fan-Out. It's the classic fan-out: you have a REST API, a Lambda in the middle, and SNS, and you fan out to one or more consumers. At first it looks like this, but it can be a lot simpler. One of the interesting challenges to watch for: as you start using Lambda and get comfortable, you see Lambda everywhere; sometimes the architecture barely fits the diagram, with so many Lambda functions you don't need. Why have so many functions? If all you're trying to do is REST API to Lambda to SNS, connect API Gateway, or AppSync, directly. Those managed APIs can do it: if you're not doing any custom processing on the payload and all you want is to send a message to an SNS topic, or a stream, or DynamoDB, or S3, you name it, API Gateway connects to it directly, and that's fine.

Next, you can also use SQS on the consumer side so you can batch as much as possible: SNS to SQS as opposed to SNS to Lambda. If you haven't seen this before, SNS to Lambda has a one-to-one ratio: you send one payload, it's one invocation. SNS to SQS will keep batching and aggregating and deliver up to ten messages at once, so again, a better cost and performance ratio. What's also new: if SNS fails to deliver a message, it will keep retrying, but if you want it to stop, or you want to receive the failed payload, you can now add a DLQ directly on SNS as well; that's a new feature that came out last week. Authorization is the same here, with one exception: if you're doing public APIs, or you don't trust whoever is sending the message, or you want to know whether the message has been tampered with, notice that the SNS payload contains a key you can use to verify that it was indeed SNS that published the message to you; use that.

As you move to multiple consumers, make it more efficient. By default, as you increase the number of consumers, every message you send to SNS is broadcast to every consumer. Sometimes all you want is to process messages only when the booking status is "created" or "confirmed", or when an order is now being processed. Use the message filtering feature to make sure you pay less and invoke Lambda more efficiently. From a cost perspective, you can also compress and aggregate messages as much as possible on the client side before sending them. Alternatively, if you have high throughput with large payloads, don't use SNS; use Kinesis instead. Kinesis gives you about 1 megabyte of payload in and about 2 megabytes read on the back of it per shard; if you have consistent throughput, Kinesis will be more cost-efficient for you. But if you don't have large payloads (256 kilobytes at most, even at high throughput), SNS is just fine; it works really well, battle-tested, super simple.

But what does this look like in real life? When would you use something like this? This is a customer I worked with: we were migrating a whole e-commerce platform to serverless, and this is what we came up with as we went step by step, one component at a time, before going fully serverless. Dunelm is a leading furniture retailer in the UK. First, they needed insight into the furniture and business transactions they sell online, and all of those business transactions ended in SAP. Before trying to change every possible piece of the architecture, we did one piece at a time: every business transaction goes from SAP to an API Gateway REST API, straight to SNS, no Lambda function involved, as you know by now, and then we start fanning out to multiple consumers. We started with one, then kept adding consumers to the same data, and that's up to about 300 or 400 queues today, if I'm not mistaken, just to give you an idea. It also works the same way for Fargate and other container implementations.
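Here's a minimal sketch of that message filtering with boto3; the topic and queue ARNs and the attribute name are illustrative:

```python
import json
import boto3

sns = boto3.client("sns")

# The loyalty queue only receives bookings whose 'status' message attribute
# matches, so you pay for and invoke fewer consumers.
sns.subscribe(
    TopicArn="arn:aws:sns:eu-west-1:123456789012:bookings",       # placeholder
    Protocol="sqs",
    Endpoint="arn:aws:sqs:eu-west-1:123456789012:loyalty-queue",  # placeholder
    Attributes={
        "FilterPolicy": json.dumps({"status": ["CREATED", "CONFIRMED"]})
    },
)
```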
All right, this one is new for me; I've seen it in many different forms, especially in data lakes. I call it I'm a Streamer: streaming data reimagined. I saw this pattern at a customer, fell in love with the simplicity of how they did it, and decided to include it here because I think it's a much better way to do it. The situation: you have clients, web clients, clickstream, whatever the data is, and all you want is to send that data very fast to an API, knowing it will be stored somewhere; you'll do something with the data later. That's when you use the streaming pattern. At first you start with something like this and it works fine; as your use case grows, it may no longer be the best fit.

On the operations side, at the very least: if you're trying to do event sourcing, instead of adding Lambda functions, turn on the backup; the feature is already there for you. You can say "whatever record goes into my stream, I want a backup of it in a separate bucket", and you can use that for many reasons you probably know better than I do.

Now, the next step. This is perfect when you have a high volume of streaming records coming in that belong to different services or different functionalities of your application. This customer in particular, as they grew the number of services and features offered to their clients, started adding a different Firehose for each business domain's data, each ending up in its own bucket. As you've probably noticed, some of these best practices repeat no matter which pattern you use, so keep that in mind: I already talked about authorization and obfuscation of the data, and that doesn't change. From a performance perspective, if you're going to use Athena or some other way to crawl and query the data, you can convert the JSON blob you're sending into API Gateway to either ORC or Parquet in the stream itself; no Lambda function necessary, unless you want custom logic, in which case, yes, you need a Lambda function. So what happens is: all these business events go to separate Firehoses and separate S3 buckets, a crawler automatically detects the schema of that clickstream data, and then they use Athena to actually run the queries.

If you're more of an expert on this, you can also use SNS, and the Lambda function becomes much, much smaller; you barely need to maintain it anymore. All the Lambda function does now is route the message, based on the payload tag (the message filtering), to a particular Kinesis Firehose. It's something you can deploy to many different accounts, and you don't have to manage that complexity inside the code anymore: with API Gateway and SNS, the payload is already tagged, you know exactly which Firehose it should go to, and Lambda's only job is putting the data in. And if you're doing this across global regions at high throughput, where API Gateway in that particular case can get a bit expensive, compare it with this approach: use CloudFront and Lambda@Edge and go straight to Kinesis Data Firehose. We've actually blogged about this; if you search for global data ingestion on AWS, you'll find the post.
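A sketch of that tiny routing function; the stream names and the tag field are illustrative:

```python
import json
import boto3

firehose = boto3.client("firehose")

# Hypothetical mapping from a business-domain tag to a delivery stream
STREAMS = {
    "clickstream": "clickstream-events",
    "orders": "order-events",
}

def handler(event, context):
    for record in event["Records"]:           # SNS-to-Lambda event shape
        message = json.loads(record["Sns"]["Message"])
        stream = STREAMS[message["domain"]]   # route on the payload tag
        firehose.put_record(
            DeliveryStreamName=stream,
            Record={"Data": (json.dumps(message) + "\n").encode("utf-8")},
        )
```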
But what does this look like in real life? That's why we're here. The CTO of LifeWorks was kind enough to share their architecture just for this bit. They run this in multiple regions at the same time, the same idea, and it's super effective how they do it. LifeWorks provides employee well-being services for large enterprises, and all of this SNS and Firehose traffic is business events. Think of it as a people-portal type of scenario: there are different employee services you offer inside your company, and based on how people use the platform, those services generate different business events that you can crawl and do something with, including the obfuscation of data, which isn't shown here, but you get the point.

All right, so this is the Strangler. We're not going to strangle anyone here, just to be clear; that's why I put it in quotes. The strangler pattern is for when you want to migrate to serverless and you've never done it, or never done serverless, before. Instead of a big-bang approach, which can look interesting at first, the strangler pattern is a more conservative approach that takes into account not only the technology change but also the employees' cognitive load: going from on-premises, or from a place where you haven't yet implemented many of the microservices or distributed-systems ideas, can be a toll on everyone on the team. The strangler helps you do this in a phased approach.

Say you have a data center with a load balancer, multiple servers, and obviously a single database (drawn that way on purpose, because of the monolith joke). You could put an API on top of it and slowly start adding new functionality without having to change your clients: the contract is now established, you hide the implementation details, and you slowly keep adding pieces. That's exactly what we did at Dunelm, slowly adding features in the backing systems. So first you have a VPC, where API Gateway sends the traffic to a Network Load Balancer, which sends it to the private IP of that server and on to whatever database you have. But there's a mistake here: going from the Network Load Balancer to the private IP address of the server. The server could go away, the private IP could change, and that would fail. Don't do this; but first, let's fix something else.

Going back to this: since you're now going to have two different systems, you want to make sure the logs and metrics are centralized. You could use CloudWatch or your own system; the whole point is to centralize them, otherwise you'll be jumping between multiple systems to see what's going on with your customers. X-Ray, again, for tracing. Then, instead of using the private IP: F5, Citrix, and many other load balancers already give you a cluster IP or virtual IP today. Use that instead, because that IP will always be there even as servers change. Again, authorization here, but what changes in this pattern is that most customers already have Active Directory or some other identity provider, so the custom authorizer here would be a Lambda function talking to your on-premises identity provider to decide whether clients should be allowed to connect. I didn't draw it here, but most of the time you also want the API Gateway to be private, a private API Gateway; it depends on what you're trying to do. Because you now have this API Gateway, this contract, on top, you slowly shift what you have. Sometimes it's apples to apples, sometimes you refactor some of it or replatform some of it. Some databases could now be RDS if it's the same database engine; sometimes you can just containerize it, using Amazon ECS or Fargate, you name it.
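Since the custom authorizer came up a moment ago, here's a minimal sketch of a Lambda authorizer that checks a token against your own identity provider; validate_with_idp is a stand-in for whatever on-premises check you would actually make:

```python
def validate_with_idp(token):
    # Placeholder: call your on-premises IdP here (LDAP bind, OIDC token
    # introspection, and so on) and return a principal ID or None.
    return "user|alice" if token == "valid-token" else None

def handler(event, context):
    # TOKEN authorizer: API Gateway passes the token and the method ARN.
    principal = validate_with_idp(event.get("authorizationToken", ""))
    effect = "Allow" if principal else "Deny"
    return {
        "principalId": principal or "anonymous",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": event["methodArn"],
            }],
        },
    }
```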
But sometimes, well, we know what happens: not every piece of code can be refactored, and not everything can be lifted and shifted. Sometimes it stays a virtual machine, or it stays on-premises for a certain time and you retire it afterwards. Or you do the scream test: shut down the server, and if anyone screams, now you know who owns that piece. Then, for new functionality, you can start using serverless, Lambda and DynamoDB, for the new features, and for cost reasons as well.

So what does this look like in real life? Last year I had the pleasure of working with another customer in the UK: HSBC. You've probably heard of them. HSBC did this in different parts, so I'll show you the networking foundation first. It's absolutely necessary for customers in large enterprises with lots of traffic, regulations, and huge constraints; I've seen it in multiple places, not only in fintech but in retail, news media, all those areas. How does it work? You normally have a VPC, right, and you may have some Lambda functions there that have to talk to something on-premises. You already know there's going to be an API that calls a Lambda that does something on-premises, and sometimes it goes both ways. One of the regulations, especially at HSBC when we were discussing this, was that all internet traffic, including the outbound traffic to on-premises, had to go through a proxy so they could enforce the security controls they already had in place. What we did was create a separate, super-small VPC just for those proxies. And for the Lambda functions, because clearly the Lambda fleet starts very small and will grow to hundreds if not thousands of functions, we used VPC endpoints: to get traffic to on-premises, a VPC endpoint goes to a Network Load Balancer, then to Direct Connect, and on to HSBC. The opposite also works: another VPC endpoint goes out to the internet, through the proxy, then through the IGW, the internet gateway. Same idea. One of the interesting pieces, which they cover in their video, is that they can now take that Lambda VPC design and duplicate it across the business without having to change the network anymore, and that's something they still do today. I'll give you the session ID, or just search for "HSBC serverless re:Invent"; the session was last year. They covered not only this but also what I'm going to cover next, which was even more interesting if you're doing distributed systems or want to go from mainframe to serverless.

Similar to the strangler pattern idea: once they had the networking foundation, they wanted a new service, in this case a completely new feature. Their mobile customers would be able to receive push notifications, and they would be able to slide a push notification to only then fetch the information they wanted, as opposed to receiving a push notification with the entire data in it, for security reasons: you don't always want to see your cash withdrawal in a push notification. So first we needed a way for customers to open their mobile app or web app and set preferences for exactly what they want to receive as a notification. Super simple; you're already an expert here, you can see it's a classic recipe, it just
works. Next, we had the mainframe operations happening all over the place, and we needed to convert what was in the mainframe into something that could be consumed on AWS. First there was Kafka, and then something amazing that I love now, called Apache NiFi. Apache NiFi is nothing more than a data workflow tool: you get data in one particular format and turn it into another. NiFi was also responsible, through a connector, for converting that data from Avro to JSON and sending it to a Kinesis stream. This is quite similar to the streaming pattern in some ways, with one difference: we couldn't use Lambda here. They already had a process for doing heavy computation on the mainframe data with Spark, which they were already comfortable with. Keep that in mind as well: if you're in a place where you know what you're doing and already have something that works really well, start with that, and then slowly see whether something new gives you more benefits.

The next challenge: you've got the data out of the mainframe and into JSON, but now you have so much data that you don't know which business event matches the customer's preferences. Are they looking at credit? Are they looking at a withdrawal? What exactly is happening there? This data service ended up becoming a central piece of HSBC from a serverless perspective, because it provides the data service for every other microservice: any other team inside can go and fetch data from the mainframe through it if they want to. The data service, apart from storing the data in DynamoDB (in their video they explain how they also use Aurora to deal with duplicates and handle idempotency in the same kind of situation), transforms all this JSON into business events, stores them in DynamoDB, and exposes a REST API for other services to work with.

The next piece: EventBridge wasn't available back then, so they created their own event engine, making sure that once the data comes in and becomes a business event, they know what it is, and they check whether the customer is actually interested in it before sending them a push notification. That's the filter: it sees all these events and asks, "is this what the customer wants?", similar to the SNS message filtering we discussed. Once they know the event is what the customer wants, they push it to the message service; same thing again, API Gateway, Lambda, and so on. The API Gateways are there to make sure you always have a contract other services can talk to, while the event-driven data always flows through Kinesis as the entry point. Once they know what it is, they say "I want to send this as a push notification", and the notification service sends it. But the message service is the only one that knows the content of the message, so the notification just tells the particular customer's mobile application ID "there's a message; come back to me to fetch it", and only then can the content be seen, for security reasons. That's exactly what happens, and that's why you see API Gateways all over the place: as an entry point, as a contract, whether between services, as reusable components across the business, or for the mobile apps you interact with. In this case, we're only setting preferences, retrieving preferences, and also making
sure we get that message out when someone slides that button on their mobile phone. That's currently working for over twenty million customers right now.

OK, so now that you're ready: before I close, I just want to show you some quick links you should all know. This is the serverless airline I mentioned. I'm documenting the patterns we introduced and some of the trade-offs I mentioned, and you can already deploy it in your account. We're now doing ETL in one pull request and load testing in another. Let me switch my laptop and show you what this project is, how you read it, and how you get those patterns; and if you're doing Python, the best language in the world, I have a treat for you as well.

All right, so this is the repo I mentioned. You can go to the pull requests and see a bunch of them; in fact, if you haven't watched the Twitch sessions, go to the closed ones and you'll see multiple pull requests per episode, including optimizations: the edge-to-regional switch, the Lambda memory optimization, it's all in the pull requests. There's a bunch more in there, including application composition and the other things we tell customers to do, but actually done in practice.

How do you read it? If I open GitHub: this is the folder I used for Amplify to manage all the GraphQL authorization and things like that. Don't touch it. Don't touch that folder; it can seriously break things, so leave it to CI. Inside the source you have not only the front end (we're using Vue.js, again the best framework in the world for me; all right, React lovers?) but also end-to-end tests using Cypress: we basically spin up a Docker container, open a browser, search for a flight, book a flight, make the payment, and make sure I got the booking and the points. Then inside the backend I have the booking service, the catalog service, everything you've seen. If you're using CloudWatch Logs today, have a look at this: it has the common practices I've been telling customers about for quite some time. In one click you apply log retention, asynchronous custom metrics, and a bunch of other pieces, all in a single template. Inside each of those backends, if you've ever been curious how a project would normally be structured, whether it's Python or Node.js or whatever (I'm using Python here, again the best language, but it could be Java, or Go for instance), the same template is there, in booking and in every microservice, for every function. We also do something differently: once you have more than a couple of CloudFormation stacks, the natural question is what the pattern is for sharing resources between stacks; we cover all of that inside those templates as well. So go and check it out.

The next one: if you do Java, this is a much better example for you. Look up the Realworld Serverless Application. If you've ever seen the AWS Serverless Application Repository, we open sourced part of its functionality to showcase how we develop internally. Some of the patterns are available in the Builders' Library, and inside the repo you'll see some of those patterns
implemented, using Java, with some annotations for a REST API as well. The best part: go to the wiki. All of the patterns, and the whys, are there. Why CloudFormation, and why the CloudFormation is structured the way it is in this project; why DynamoDB; why REST; in fact, why they use a single Lambda function as opposed to multiple functions. There's always a why, a reason, including alerts, pagination, and strategies you can use.

Now, this one is for you: what if you're doing Python? If you're doing Python, you're the lucky one, and I need your feedback. I quietly published this just before re:Invent to capture feedback, so I can go on holiday, come back, get your feedback, and keep working on it. Remember the operations pillar at the bottom of every architecture? I'm trying to automate all of it with a few lines. What this does, and I'll show you, is structured logging, tracing, and custom metrics, in a way that's very simple to use. The idea is to make it easier to follow the conventions we usually only tell you about in chats and sessions. We started with Python first, and once we lock it down (it's beta right now), we can start working on other languages as well.

So how does it work? Say you want the tracing I've been telling you to do. You import the tracer and write tracer = Tracer(), and that alone traces cold starts. If you run your function locally with the CLI, we know you're running locally and disable tracing, so there's no performance impact. If you add @tracer.capture_method or @tracer.capture_lambda_handler, it captures the metadata, errors, and traces, and adds them to the trace itself. If you're doing something real in production, that's an absolute must. And if you don't add annotations, or labels, to your tracing, you're absolutely missing out on visibility. What do they give you? Imagine you have 100 functions or more, and you want to know whether a payment was successful, which transactions failed, where the latency was over five seconds, in which microservice, and whether the downstream being called was DynamoDB or not. That's what labels and annotations give you. An annotation is nothing more than a key/value that you can sort and group all your traces by across AWS, and you can apply logic like AND and OR. Better yet, you can create much more composite alarms: for instance, "show me all the traces where the customer is premium tier, it went through loyalty, and the downstream latency was over a second, because they shouldn't have that experience; and if that ever happens, create a CloudWatch metric for me." All of that is done through annotations, so don't miss that feature; we don't charge you for it, it's free indexing.

Next, structured logging. You call logger_setup(), and we hook into the Lambda logging and convert all the Python logging output to JSON format. The @logger_inject_lambda_context decorator picks up the classic keys we always tell customers to put in their logs. So if I log something like "collecting payment", or send an entire object for whatever reason, this is how CloudWatch will look: the structured logging I've been showing on the slides, in practice. You'll see whether the invocation was a cold start or not, along with the standard keys.
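Putting those pieces together, here's a sketch using the beta API names from the talk; the library is in beta, so treat module paths and signatures as illustrative rather than final:

```python
from aws_lambda_powertools.logging import logger_setup, logger_inject_lambda_context
from aws_lambda_powertools.tracing import Tracer

tracer = Tracer(service="payment")        # traces cold starts; disabled when running locally
logger = logger_setup(service="payment")  # hooks into Lambda logging, emits JSON

@tracer.capture_lambda_handler
@logger_inject_lambda_context  # adds cold start flag, memory, function name, request ID
def handler(event, context):
    # Annotations are indexed key/values you can search, group, and alarm on
    tracer.put_annotation("PaymentStatus", "SUCCESS")
    logger.info({"operation": "collect_payment", "charge_id": event.get("chargeId")})
    return {"status": "ok"}
```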
Next, what if you want to do structured logging? You do the logger setup and we hook into Lambda's logging, converting your Python logs to JSON format, and the logger's inject-Lambda-context decorator picks up the classic keys we always tell customers they should have in their logs. So if I log something like "collecting payment", or send an entire object for whatever reason, this is how it looks in CloudWatch Logs — the structured logging I've been showing on slides, in practice. You'll see whether the invocation was a cold start or not, and you'll also see things like the request ID, correlation ID, memory size, function name, service, line number, and so on (see the first sketch below).

This next piece is the only one I'm sure is going to change, because — like you — I didn't know CloudWatch Embedded Metric Format was being announced, so I implemented this beforehand, and I'm now going to change it. This is how you create a custom metric today with this library: you call log_metric, give it the name of the metric, the unit, the value, the namespace, and dimensions like customer ID, and we do all of it behind the scenes, asynchronously, with no performance impact (see the second sketch below). That's how we're doing it right now; this is just the start.
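A minimal sketch of the structured-logging piece, again written against the API the library later stabilized on (the beta exposed this as a logger-setup helper, as described in the talk); the service name and log message are made-up examples:

```python
from aws_lambda_powertools import Logger

logger = Logger(service="payment")  # service name is a made-up example

@logger.inject_lambda_context  # adds cold_start, function name, memory size, request ID, etc.
def handler(event, context):
    # Emitted as a single JSON object in CloudWatch Logs, so every
    # key becomes a queryable field rather than free text.
    logger.info("Collecting payment")
    return {"status": "ok"}
```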
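And the custom-metrics piece. Rather than guess at the exact signature of the beta log_metric call he names, here is the same idea with the Metrics class the library stabilized on — which, as he predicted, moved to emitting CloudWatch Embedded Metric Format and flushes asynchronously at the end of the invocation. Namespace, metric name, and dimension are made-up examples:

```python
from aws_lambda_powertools import Metrics
from aws_lambda_powertools.metrics import MetricUnit

# Namespace and service are made-up examples
metrics = Metrics(namespace="PaymentApp", service="payment")

@metrics.log_metrics  # serializes buffered metrics as EMF and flushes on return
def handler(event, context):
    metrics.add_dimension(name="customer_tier", value="premium")
    metrics.add_metric(name="SuccessfulPayment", unit=MetricUnit.Count, value=1)
    return {"status": "ok"}
```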
There's also DAZN Lambda Powertools, for Node.js, which is where we got the idea from and which is much more comprehensive for now; it provides a number of different utilities, but I'll show you just this picture. It not only gives you utilities to obfuscate data and sample debug logs in production — an amazing way of doing things, especially if you're concerned about cost — it also prevents infinite loops, captures data before timeouts so that if something happens you still get the data, adds correlation IDs across all those event sources, and gives you instrumented AWS SDK clients as well. This is Node.js only right now, but the idea is that once we lock down the API we can give you something similar across other languages; we know those conventions, so we can make them easier — and we will. Again, this is called DAZN Lambda Powertools — amazing company, fantastic engineering — it's all open source, and it's what they use in production with a couple of thousand functions.

All right, let me switch back and close this out. If I had two hours, I would probably have given you the same patterns and talked about why more patterns wouldn't help you. Don't obsess over patterns; don't treat them as the only true way of doing things, because they're not. Patterns are great because you can learn from other people running things in production in ways that work for them, but that doesn't mean it's the best way for you. Learn from the pattern, see what makes sense, experiment, and test. What I propose instead is to use the five pillars of the Well-Architected Framework as a guide for whether what you're doing makes sense. When we announce a new feature or a new service, use the operations pillar to ask: with whatever open-source project or service I use, do I get enough visibility if something fails? How exactly do I know when it fails? Do I have metrics for this? Do I have enough information? How do I handle it, and how do I do resiliency? The same goes for the reliability pillar: serverless will most likely be used side by side with containers and other technologies as you progress, and that's super important, because even though serverless can scale, everything else may not scale in the same pattern — so you need to regulate the access rate to protect those resources, or else queue everything; don't learn that the hard way like I did. Security is a bit different from what you may be used to: for the infrastructure, most of it is authorization here and there, but serverless doesn't change anything in how you do security inside your application. In fact, what changes is that you now have much more focus on application security — that's what matters. And here's what hasn't changed: code reviews are still as important as they were before; the difference is that serverless handles all the infrastructure pieces for you, so now you have more time for these things. Performance: as I already mentioned, you have high scale but you also have low scale, so see what you need and use different features accordingly. And cost: as I've mentioned many times, we like to talk about TCO and the other value we provide by scaling the infrastructure for you.

What I've seen over the past four and a half years with serverless in production — six years as an AWS employee, about ten years in cloud — is that there is an operational toll and a cognitive load on people when you adopt new technologies. Serverless is fantastic, but what really works in the enterprise, what's really effective, is to start small: about four or five people. Same with HSBC, same with DAZN, same with many other customers I've worked with, not only in the UK but across EMEA and Asia as well. Start small, make sure it works, make sure everyone understands what it is, create your own design patterns, your own blueprints, your own reusable components, and then take it to the rest of the company.

To close up, as always we give you related breakouts in the last slides; even though they've already happened, you can find them on YouTube, and I'll tell you my favorites, because I think I'm allowed. At the bottom: if you're doing serverless at scale, the patterns may be different — there are gotchas and operational levers you have to know how to use, and that last session covers them: concurrency at scale, Kinesis in much more depth, and many other pieces. If you're doing Java — again, I like to make jokes about Java and Python, Python being the best language and so on — the reality is that Java is super fast, except with Java Spring, which is something else. With Spring your cold start can be high, even though it's really fast after that, and there's a much better way to do Spring if you have to use it. That session covers a Java Spring application that was taking about 10 or 11 seconds to cold start, and in the end improved that cold start — and the whole application — by 80% with a few tricks. It's a very deep-dive, 400-level session that is a must if you're doing Java. And if you're still learning some of these patterns, or better yet if you're starting right now or want a refresher, at the bottom right-hand side we now have a specific course, a specific learning path, covering some of those patterns and the pillars we just discussed — not only for the certification, but for anything else. Go check it out; there's always something to learn.

Okay — apart from saying please fill in the survey, I'd appreciate it if you reach out and talk to me directly: my DMs, my direct messages, are open on Twitter, so ping me at any time. If I don't know the answer — and I don't know everything — I'll probably find someone who does and can help you. So feel free to follow me on Twitter or ping me at any time, and if you're doing Python and want to give me feedback on that library, send it to the repo or message me directly. Thank you so much for having me. [Applause]
Info
Channel: AWS Events
Views: 52,003
Keywords: re:Invent 2019, Amazon, AWS re:Invent, ARC307-R3, Architecture, AWS Lambda, Amazon S3, Amazon API Gateway
Id: 9IYpGTS7Jy0
Length: 63min 34sec (3814 seconds)
Published: Tue Dec 10 2019