AWS re:Invent 2016: Serverless Architectural Patterns and Best Practices (ARC402)

Captions
Hey guys, good afternoon. Welcome to Serverless Architectural Patterns and Best Practices. My name is Drew Dennis; I'm a Solutions Architect with AWS, based in Texas. Any Texans in the audience? All right. With me today I have my colleague Maitreya Ranganath, and also, from BMC, Ajoy Kumar with the R&D group at BMC.

Our agenda today: we're going to focus on four common patterns that are well adopted and in use today by existing AWS customers. I'm going to take you through the first couple of patterns, which are a three-tier web architecture and batch processing, and then Maitreya is going to come up and take you through stream processing and operations automation. Our hope is that you can take these patterns back with you to help identify potential use cases and workloads in your environments where they might be applicable. Next, Ajoy will come up and talk to you about BMC's specific use case of one of these patterns. I'm sure you've all heard of BMC and are familiar with some of their products, and I think it's really a great use case and a great example of how a large customer like BMC can modernize existing products and applications for their customers, as well as add new features around security operations and compliance auditing.

Before we get into those items, though, I want to do some brief level-setting around serverless applications: what they are, and what makes them different from traditional applications. As the name implies, and as I'm sure you're all aware, serverless applications are comprised of services where servers, operating systems, and containers are completely removed and abstracted from you. Unlike deploying applications on EC2, with a serverless service you don't have to manage the operating system, or make sure it's healthy and performant, or deal with any of those types of concerns. There's also a class of services at AWS that we call managed services ("we" being the three of us, for this presentation). These are services like Elasticsearch and ElastiCache, RDS, or Redshift, where servers are still very much a part of the service and the model: when you deploy within those products, you are concerned with the number of servers you deploy, potentially the role of those servers and where they're located. So servers are still very much there, and you need to be concerned with them from a scaling perspective as well. Typically, with serverless services, you don't have to worry about any of that.

Secondly, serverless services automatically, or inherently, sit at a regional level within the AWS infrastructure. What that means is they're automatically aware of all Availability Zones in the region where they're deployed, so you never have to worry about high availability and fault tolerance; those concerns are taken care of for you, and that's one of the big reasons serverless applications are becoming so popular. The third important distinction I want to point out is that the unit of scale with serverless applications is different. It's not servers; it's functions, and particularly Lambda functions. Lambda functions are very much at the heart of all four patterns we're presenting today; they really are the unit of scale. So before we proceed into the four patterns, I want to briefly talk about Lambda, because all of the patterns will leverage it.
A Lambda function is a unit of work, comprised of your code, that responds to individual requests and events. As those events and requests grow, the Lambda invocations grow as well, and as they shrink, the invocations shrink, so you never run the risk of over-provisioning or under-provisioning Lambda functions for your application. That's a really nice thing: economically, it means you never pay for idle. What I really like about Lambda functions is that they make it easy not to worry as much about some of the mundane and boring aspects of an application, like logging and operational monitoring. There are facilities built into Lambda to handle a lot of those things for you, and it's very extensible if you want to add functionality in those areas. I think Lambda functions also take care of some of the really, really difficult things about applications. Horizontally scaling your applications, building that out yourself, is extremely difficult to do, and it's something Lambda provides essentially out of the box. You could have a Lambda function that's as simple as a single line of code, and as a single request comes in, that line of code executes; but if a thousand simultaneous requests come in, it scales horizontally to handle all thousand of them. So it really allows you to skip the boring parts and skip the hard parts, if you know what I mean.

Lambda is also stateless, and there's a lot of conjecture about this in the industry right now. At a very high level, what it means is that you should always store state outside of Lambda. For any state that a Lambda function creates or needs in order to run, use one of our persistent data store services, like DynamoDB or S3 as examples, to interact with that state. There's no affinity with the underlying hardware in Lambda, or at least you shouldn't assume there is. If you have a Lambda function that interacts with the local file system on the host where it's deployed, or spawns a child process, or even executes custom binaries you've packaged with the function (all of which are possible with Lambda), you should never assume that subsequent invocations of that function will be able to reuse those things. That's what we mean by "Lambda is stateless." Lambda functions are deployed inside a container; when that container is deployed, the function executes, and the container remains for a period of time. We don't publicize or provide information about how long that container will be available, and that's why you can't plan on state or on reusing any of the artifacts we mentioned.

When these Lambda functions are deployed inside containers, there's an initialization phase that happens: things like downloading the code and all its dependencies from S3, attaching ENIs if you want your Lambda function to run inside a VPC, and running your initialization code. We call this the cold-start process. Now, when you define a Lambda function, one of the things you have to define is the handler, a function in your code that each invocation of the Lambda function will execute; an invocation just runs that particular function. The slide showed some example code; don't try to reproduce it, because I removed some things so it would fit on the slide. In that example, we import some Python modules at the top of the code and then establish a connection with a relational database, and we do those things outside of the handler, so they execute during a cold start, as the container is initially deployed onto the host. Then, for every invocation, only the code inside the handler executes.
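To make that cold-start split concrete, here's a minimal sketch in the spirit of that slide (the slide's actual code isn't in the transcript, so the MySQL database, the pymysql client, and all names here are assumptions for illustration):

```python
# A minimal sketch in the spirit of the slide's example; the MySQL
# database, the pymysql client, and all names here are assumptions.
import os
import pymysql

# Everything at module level runs once per container, during the
# cold start, and is reused by later invocations of that container.
connection = pymysql.connect(
    host=os.environ["DB_HOST"],
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
    database="example",
)

def handler(event, context):
    # Only the code in here runs on every invocation.
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT name FROM widgets WHERE id = %s", (event["id"],)
        )
        row = cursor.fetchone()
    return {"name": row[0] if row else None}
```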
So there are sometimes concerns about cold starts and whether they'll affect your application, and the answer is really "maybe"; it depends on your application. If your application is fairly consistent in the number of Lambda functions it executes, and the frequency at which it executes them is reasonable, then cold starts may not be much of a concern for you, because they'll be a very small percentage of your overall invocations. But if you have a lot of disparity, say a hundred invocations, then a thousand right after that, and then a long period of inactivity, then cold starts can be an issue. So you want to make sure you're aware of how cold starts work. If you want to improve your cold-start times: minimize the code outside of the handler, obviously; make your package as small as possible and remove any unnecessary dependencies, so the download from S3 happens as quickly as possible; and for VPC support with ENIs, only use that if you actually need VPC support. It's optional, and we've certainly seen customers attach ENIs to Lambda functions without needing them, which just adds to the latency of your cold starts. For customers who really need to manage cold-start times, one strategy is to keep the functions warm with CloudWatch Events. I'm sure many of you are familiar with CloudWatch Events; you can schedule invocations of your functions with it, and that's a really good approach to keep them warm if cold starts are a concern for you.

From a local file system perspective, Lambda functions have access to /tmp on the host. It's 512 MB of scratch space that you can pretty much use however you want. I was working with a customer a couple of months ago who wanted to take documents (PDF files, Office documents, whatever) and convert them to text, so we downloaded each document to /tmp before sending it off to the service that converts it to text. That was really nice because, if there's an error, we don't have to re-download from S3; we can retry for the duration of that Lambda function. The particular example on this slide shows a Node.js function that uses ffmpeg, taking a video file and extracting a frame to a JPEG, storing it locally before uploading it to a service.

Another best practice with Lambda is to use custom metrics with CloudWatch. This is often overlooked, and it's really very simple: creating CloudWatch custom metrics is just one API call, PutMetricData. You can absolutely store any kind of information that's important to your application in a CloudWatch custom metric; it could be business-related, or it could be related to the operations of that function. It's completely up to you, but it's a very useful thing to do.
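As a rough sketch of that one-call approach (the namespace and metric name below are made up for illustration):

```python
# A rough sketch of the one-call approach; the namespace and metric
# name are made up for illustration.
import boto3

cloudwatch = boto3.client("cloudwatch")

def record_order_value(value):
    # A single PutMetricData call publishes one custom metric point.
    cloudwatch.put_metric_data(
        Namespace="MyApp/Business",
        MetricData=[{
            "MetricName": "OrderValue",
            "Value": value,
            "Unit": "None",
        }],
    )
```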
Keep in mind that the scale of your application may dictate how you send those custom metrics into CloudWatch. On this slide I've mentioned a few account limits that are the defaults for putting metric data into CloudWatch; if your scale exceeds them, they can be increased by opening a request with tech support. Another approach, for very large-scale ingestion of CloudWatch metrics, is to send the information to Kinesis and aggregate it before you deliver it into CloudWatch. That's something we've seen a lot of customers adopt.

Okay, now I'm going to go through the first of our patterns, which is a three-tier web application. Here's an example. Static content is up at the top; you can see, in red, Amazon S3 and CloudFront for content delivery of the static artifacts that live in S3. Then the API calls the client makes go through API Gateway, which in turn can do a lot of things and interact with various AWS services. Commonly it will invoke Lambda functions that do a variety of work; in this example they leverage DynamoDB as the persistent data store. Now, there are many variations on this particular pattern. We're talking about a web application here, and that can take a lot of different forms. It could be a mobile backend, for example, and you could add SNS to this architecture to send push notifications to mobile devices. It could be a microservices application, an approach where you have hundreds of API Gateway methods defined and backed by Lambda functions; you can apply the same pattern to that use case as well.

Now, when you think about deploying a serverless web application within your organization, somebody might come up to you and ask: what about security? How do you handle operations, or deployments and versioning in that environment? Those things are a little different with a serverless application from what you might traditionally be used to. For example, in a traditional three-tier web application, we usually have subnets with firewalls between them to restrict access to the different layers. Things are a little different with serverless applications, so let's talk about that.

Here's that same pattern from a security perspective. With S3, you have bucket policies and ACLs, as I'm sure you're all aware, to control access to the artifacts that exist in S3; you can limit access to within a VPC, for example, if it's an internal application. CloudFront has a feature called origin access identity, which ensures that only CloudFront can get access to those S3 resources. There are also geo-restriction capabilities with CloudFront, and the ability for CloudFront to deliver private content to your users through signed URLs or signed cookies, and CloudFront has inherent DDoS protection built in. From an API Gateway perspective, there's throttling, which can certainly help, and if you're actually using this to deliver an API, there are great features like quotas and usage plans that you can leverage as well.

API Gateway also has some really nice authorization features. If you want to require some sort of authorization to get access to the methods delivered by API Gateway, there are three ways to do it. One of them is IAM, using IAM credentials: if you're coming in as a federated user, for example, you can leverage those IAM credentials to get access to those methods. Another is Amazon Cognito user pools, and Cognito can also be used with web identity federation providers to provide IAM credentials so you can get access to your methods. And lastly, you can use custom authorizers with API Gateway: if you want to include something in the authorization header of a request and then validate it somehow, maybe as a JSON Web Token, API Gateway can provide security that way as well.
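Here's a minimal sketch of the shape a TOKEN-type custom authorizer takes; the token check itself is just a placeholder, where a real authorizer would validate a JWT signature against your identity provider:

```python
# A minimal sketch of a TOKEN-type custom authorizer; the token check
# is a placeholder, and a real authorizer would validate a JWT against
# your identity provider.
def handler(event, context):
    token = event.get("authorizationToken", "")
    effect = "Allow" if token == "allow-me" else "Deny"  # placeholder
    # API Gateway enforces (and can cache) the IAM policy returned here.
    return {
        "principalId": "user",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": event["methodArn"],
            }],
        },
    }
```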
When you put a Lambda function behind API Gateway, a few other things come into play. Custom domain names can absolutely be used here, at both levels: AWS Certificate Manager supports CloudFront, and you can import your custom TLS certificates into Amazon API Gateway, so you can have custom domain names across the board for both paths your traffic takes. There's also a trust policy so that only API Gateway has the privilege to invoke the Lambda function you designate. And Lambda functions, as you probably know, run with execution roles, so you can ensure that the functions being called only have access to the services they need. So security is handled a little differently here.

If you need additional application-layer security at the API Gateway layer, one common strategy is to create an additional CloudFront distribution, put it in front of API Gateway, and attach AWS WAF to it, for cross-site scripting attacks, SQL injection attacks, and attacks of that sort. To do that, CloudFront obviously needs to be configured to use HTTPS to interact with API Gateway on the backend, and depending on how you're handling custom domain names, you want to make sure the Host header that's part of the request is not delivered through to API Gateway, because the API Gateway host name will most likely not be the host name being requested.

From a monitoring perspective, let's talk about logging and monitoring with this pattern. CloudFront and S3 both provide access logs in S3, so you can get full logs of all the requests that come in to the artifacts in that static layer. API Gateway and Lambda log to CloudWatch Logs, so you end up with two separate locations for log files in this pattern, but that's easily remedied: you can send both locations to Elasticsearch, which I think is a great destination for log files. Amazon's Elasticsearch Service will give you a Kibana dashboard and immediate indexing of those log files, and there are certainly a lot of good third-party products out there that can consolidate log files as well. DynamoDB has a feature called DynamoDB Streams, which you can use to provide triggers for your application. That's really useful: depending on your application and the data it writes to DynamoDB, you might want to act on that data and send out notifications or the like, so DynamoDB Streams can be an important thing to leverage here. There are also CloudWatch custom metrics, as we discussed with Lambda, and CloudTrail for auditing. I don't know if you caught it, but a couple of weeks ago we announced that S3 now provides data-access auditing events in CloudTrail. So if you want to see every time a file in S3 is accessed through this pattern, or even written to, those events can show up in CloudTrail now, and you can act on those as well.
How about from a deployment perspective? Well, this landscape has grown a lot over the last year. I'm sure you're familiar with frameworks like Serverless, Apex, and Sparta; there are a lot of really great third-party frameworks out there and available. Well, a couple of weeks ago we announced our own. It was code-named Flourish, and the official name is the AWS Serverless Application Model. It allows you to define a more complex serverless application, comprised of multiple Lambda functions, multiple API methods, DynamoDB tables, and IAM credentials, in a much more efficient syntax that's YAML- or JSON-based. The way this works: locally, on a development machine, I define the template, which is represented in yellow on the far left of this slide, and I package it together with the code for my Lambda functions, maybe a Swagger file for my API definitions, and any dependencies, and deploy that into S3 through this service. When the service does that, it produces a new version of your template with specific references to an S3 object for that specific version of the deployment. That's really important: you get a marriage between a new version of that template and an S3 object that represents all your artifacts. Each additional deployment gets its own S3 object, so it's very easy to revert back and forth between previous versions. That serverless template can then be submitted to CloudFormation. One of the really nice things about this framework is that it's built on top of CloudFormation; it's just an extension of it. So if you have CI/CD tools today that interact with CloudFormation or the AWS CLI, they'll work with this framework. We actually added a couple of commands to the AWS CLI for this: aws cloudformation package, a single command that takes care of all that packaging and produces the new serverless template file you see in green here, with the references to the specific S3 object, and aws cloudformation deploy, which creates the change set for CloudFormation and deploys it. And as you'd expect, you can get more complex with this by leveraging CI/CD tools: if you want to trigger all of this based on a change in a code repository, or if you want the change set to be approved before it's deployed, CI/CD tools can certainly be used to achieve more complex workflows around this.

I also want to talk briefly about a couple of features of API Gateway, since it's showcased in this pattern. Best practice: use mock integrations if you can, especially in the design phase of your API. Mock integrations are a very easy way to model out your API and allow the front end to start its development and its interaction with your methods. A few weeks ago we also announced API Gateway support for binary payloads, and that's really important, but the size limit there is ten megabytes. So if you're going to be interacting with API Gateway for binary transfers beyond that 10 MB limit, maybe really large images being uploaded or downloaded through API Gateway, then the best pattern is to leverage signed URLs with S3: hand those off to your application, and let the end user of the application interact with S3 directly for those large binary transfers.
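A short sketch of that hand-off, with a hypothetical bucket name; the Lambda function returns a presigned URL and the client PUTs the object directly to S3:

```python
# A short sketch of the hand-off; the bucket name is hypothetical.
# The client takes the returned URL and PUTs the object directly
# to S3, bypassing the 10 MB API Gateway payload limit.
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "my-upload-bucket", "Key": event["filename"]},
        ExpiresIn=900,  # the URL is valid for 15 minutes
    )
    return {"uploadUrl": url}
```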
Use asynchronous calls for Lambda work that takes longer than 30 seconds. API Gateway is actually built on top of CloudFront, so in this model you still have CloudFront's limit of a 30-second transaction. If you have a transaction that extends beyond 30 seconds, we probably need to talk about why it's taking so long, but if you genuinely have that need, just make sure you make an asynchronous call to that API method, so it can return right away and you can check back later from your client to see if the work is done.

Now, there are a few new API Gateway features that I think enable some new patterns for deploying web applications through API Gateway. First of all, there's something called a greedy path, or catch-all path variable, shown in this case by {proxy+}. It's essentially a catch-all variable for any child path you specify in your request, and it can be located anywhere in your API; in this case it's at the root. We also have an ANY method in API Gateway, which stands for any HTTP verb you might be calling on your method, so that's a really easy way to accept any type of request that comes in. And there's also proxy integration now with API Gateway, which essentially means your request and your response don't have to be transformed: we take the headers and the body of the request as it comes in and just pass it on to the backend. In this slide, the backend is a Lambda function that processes the request, but it doesn't have to be in order to use these features; it can certainly be just an HTTP endpoint if you want, an EC2 instance or something like that. Maybe you have an existing API that's deployed on an operating system, and you could certainly use this to front-end it and just pass the requests through. Specifically, we have a framework called aws-serverless-express; you can download it from GitHub in the AWS Labs account, and it allows you to take existing Express applications written in Node.js and deploy them directly into Lambda to take advantage of these features. It's probably the simplest way to get a web application up and running if you have an existing Express.js application.

Okay, let's talk about pattern two, which is batch processing. A few characteristics of batch processing: very large datasets, typically handled on a periodic basis, maybe nightly or hourly, to process your data. And "processing your data" is a broad term: it could be performing calculations on your data, querying your data, maybe enriching your data in some respect. It's very common in ETL workloads, and this pattern can certainly be used in lieu of, or in addition to, a lot of ETL workloads, though certainly not as a wholesale replacement for them. Batch jobs are usually not very interactive, and they're long-running. And this is very similar to the Hadoop ecosystem's MapReduce programming model: for the things you might use MapReduce for, you might consider this pattern as well, to complement or be used instead of some of those.

Let's look at what the pattern is. Say you have an object that's going to be delivered into S3, and it's a really big object. Lambda has an execution timeout of five minutes, so if I give, let's say, a one-terabyte file to a Lambda function, it's probably going to take more than five minutes to process.
So in this pattern, we create a splitter function. As that large file is dropped in the S3 bucket, this Lambda function's sole job is to take the file and split it up somehow, maybe by size, maybe by lines, whatever your case requires, and then deliver the pieces to cascading mapper functions. Those subsets of the file are delivered to the mapper functions, which process them, doing whatever you want the batch-processing application to do, and deliver the results into DynamoDB. Then you have a reducer function that collects all of the results from all of the mapper functions and stores the final output somewhere, in this case S3. The key to this pattern, as we said before with Lambda functions, is the cascading mapper functions: they are really the unit of scale in this particular pattern.

When should you use this pattern over a Hadoop MapReduce-type solution? Well, I think a lot of it depends on where your areas of expertise are. If you have expertise in SQL and Hadoop and Spark and Presto and those types of services, absolutely use those. But if you don't, and your expertise is more around the Lambda languages, like Python or Node, this can be a really great way to achieve the same levels of efficiency. The speed of this pattern is directly proportional to the number of concurrent Lambda executions you have set in your account, and we'll talk about that in an example here in just a minute. You can really use any of our persistent data store services as the intermediate storage that collects all the results of your mapper functions. And we've actually started a new project on our AWS Labs GitHub site, the Lambda MapReduce reference architecture; I've provided a link here on this slide. It builds upon this same pattern and model. That particular reference architecture doesn't use a splitter function, so that's one difference, but it does use S3 as the storage location that all the mapper functions deliver to and the reducer functions work from. So if you need to run all of this inside a VPC, that can be a really good approach.
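Here's a minimal sketch of what the splitter step could look like, assuming newline-delimited input; the mapper function name and chunk size are made up, and each mapper is invoked asynchronously so the chunks process in parallel:

```python
# A sketch of the splitter Lambda, assuming the input object is
# newline-delimited text; the mapper function name and chunk size
# are hypothetical. Mappers are invoked asynchronously ("Event")
# so the chunks are processed in parallel.
import json
import boto3

s3 = boto3.client("s3")
lambda_client = boto3.client("lambda")
CHUNK_LINES = 10000  # tune so each mapper stays well under 5 minutes

def dispatch(lines):
    lambda_client.invoke(
        FunctionName="mapper-function",  # hypothetical mapper
        InvocationType="Event",          # asynchronous invocation
        Payload=json.dumps({"lines": lines}),
    )

def handler(event, context):
    # Triggered by the S3 put event for the large input object.
    record = event["Records"][0]["s3"]
    body = s3.get_object(
        Bucket=record["bucket"]["name"],
        Key=record["object"]["key"],
    )["Body"]
    chunk = []
    for line in body.iter_lines():
        chunk.append(line.decode("utf-8"))
        if len(chunk) == CHUNK_LINES:
            dispatch(chunk)
            chunk = []
    if chunk:
        dispatch(chunk)
```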
Lastly, I'll leave you with an example we ran based on this pattern. We took the 200-gigabyte Google Ngram trigram dataset, normalized it a little, stored it in an S3 bucket, and then processed it with this pattern. So, 200 gigabytes; we did it from an AWS account that had a thousand concurrent Lambda executions, and it took nine minutes to process that entire dataset, do some calculations and summations on the data, and return the results. Really, really efficient, and absolutely on par with the kinds of performance you'd see with some of the in-memory Hadoop-ecosystem solutions like Spark and Presto. And it all totaled up to a cost of seven dollars and six cents, which I think is fairly compelling. Also keep in mind this is a pay-per-execution model: if you're doing this on a nightly basis, rather than running continuously, you won't pay for anything in between. Okay, I'm going to hand it over to Maitreya to take you through the third pattern.

Thank you, Drew. So let's look at some of the common characteristics of stream processing applications. Imagine that you have a fleet of sensors out in the field, generating measurements every few seconds. What you can expect is messages, readings, measurements arriving at a very high data ingest rate, and your requirement might be to take those messages and analyze them in near real time: you want the time between ingest and analytics to be as short as possible. You may also have spiky traffic. Imagine that field of devices is not always connected to the network, or that it produces traffic that is spiky in nature; you want to architect your solution to make sure you can handle those transient spikes and valleys. Oftentimes, when you're dealing with streaming applications, you want to make sure every message that comes in is stored durably until you have time to handle it, so that every message is accounted for. And very often, but not always, message ordering is important: imagine you have a stream of transactions, where it's important to handle those transactions in a sequential manner. Those are characteristics of stream processing applications that we see across our customers.

So how do you handle that in a serverless fashion? Here's an example architecture that puts it together. The use case we're imagining is that same fleet of sensors in the field, generating some sort of physical measurement; imagine it's temperature. Our requirement is to aggregate those temperature measurements, average them over a period like one minute, and generate the aggregated results every five minutes. The way we realize that is with a producer using the KPL, the Kinesis Producer Library, to submit those measurements to a Kinesis stream. We've defined the Kinesis stream as the event source for our processor Lambda function. That processor function performs one pass through the measurements coming in and stores the intermediate results in S3; again, we're using S3 as the persistence store, but you could have used DynamoDB or ElastiCache or any other service like a database for that. Then we have a separate path at the bottom: a scheduled path using CloudWatch Events, scheduled to run every five minutes, which triggers a scheduler function whose sole job is to spin off a number of parallel dump functions that take the intermediate results and produce the final aggregated results in another S3 bucket. So this example meets the requirements we had.

A couple of considerations to remember when you're using Kinesis to trigger Lambda functions. It's important to note that the number of parallel Lambda invocations you get is equal to the number of shards in your stream. Remember that the unit of scale for Kinesis is a shard, and a shard has a certain amount of capacity, so if you have five shards, you get five concurrent Lambda functions. That means each concurrent Lambda function needs to keep up with the capacity of a shard, and if you find it can't, you might want to consider what's called the fan-out pattern. You split the logic into two parts: the first function's job is to pick messages off the shard as fast as it can, split those messages into chunks, and invoke a second set of Lambda functions in parallel, so you can increase your throughput. But what we've lost as a result is message ordering, so it's a trade-off; whether this is a good pattern depends on your application.
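Going back to the processor function in that pipeline, here's a rough sketch; Kinesis delivers records to Lambda base64-encoded, and the payload fields used here (deviceId, timestamp as epoch seconds, temperature) are illustrative assumptions:

```python
# A sketch of the processor function; Kinesis hands Lambda records
# base64-encoded, and the payload fields used here (deviceId,
# timestamp as epoch seconds, temperature) are assumptions.
import base64
import json
from collections import defaultdict

def handler(event, context):
    # (minute, device) -> [sum of temperatures, count of readings]
    accum = defaultdict(lambda: [0.0, 0])
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        msg = json.loads(payload)
        minute = msg["timestamp"] - msg["timestamp"] % 60
        entry = accum[(minute, msg["deviceId"])]
        entry[0] += msg["temperature"]
        entry[1] += 1
    # In the talk's pipeline these intermediate results would be
    # written to S3 for the scheduled dump functions to aggregate;
    # printing stands in for that persistence step here.
    for (minute, device), (total, count) in accum.items():
        print(minute, device, total / count, count)
```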
A little more about the fan-out pattern. The capacity of a shard in a Kinesis stream comes out to 1,000 records per second, or up to 1 MB per second of data. So really there are two envelopes here, and it's important to see which of these envelopes is the consideration for your app, and whether Lambda is keeping up with that peak capacity. If you find you can't keep up, that's a good place to introduce the fan-out pattern. If you're looking at implementing the fan-out pattern, consider using synchronous invocations of Lambda in parallel; if you're using Node.js, there's a great open-source project that lets you do this quite easily and handle errors. And again, it's important to handle errors, because you care about every message being processed. One pattern you can use for that is what's called a dead-letter queue: any messages that could not be handled go into a queue, and you handle them in an offline or separate process. That's a common pattern we see out there.

A couple of best practices when you configure Kinesis to trigger Lambda. There's also a setting called the batch size, which defines the number of messages or records that will be batched together and submitted to Lambda in one invocation. The default is 100, but for some high-throughput use cases you might want to increase the batch size. That sends more records per invocation, and since one of the ways we charge for Lambda is by the number of invocations, doing this can reduce your Lambda cost quite a bit. Another thing to remember is to tune the memory settings of your Lambda function. When you increase the memory setting of a Lambda function, you also correspondingly give that function more CPU, so this can be a good way to deal with a function that can't keep up with the load: you increase the CPU by increasing the memory, and you might be able to keep up with the rate of messages coming in and avoid the fan-out pattern. And finally, as I've mentioned, it's always a great idea to use the Kinesis Producer Library when you're sending messages to Kinesis, because it packs multiple messages into one record and lets you efficiently use the capacity of your stream to the fullest extent. If you have very small messages, you can pack multiple of them into one record and increase your throughput.

How do you monitor the stream processing pipeline we talked about? Of course you have the standard tools at your disposal, and one of the important metrics to keep an eye on is the Kinesis iterator age (milliseconds) metric. If you're doing things right, the metric will look something like a steady state of zero, with very transient spikes that indicate some intermittent problem handling the data. But if you see your metric climb like a staircase function, it's a surefire indication that you are not keeping up with your message rate, and that's when you want to look at optimizing the work your Lambda function is doing, or consider the fan-out pattern.
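One way to act on that is a CloudWatch alarm on the stream's iterator age. A sketch follows, where the stream name, threshold, and SNS topic are made up; the metric used here, the Kinesis stream's GetRecords.IteratorAgeMilliseconds, is my reading of the metric named in the talk:

```python
# A sketch of alarming when the consumer falls behind; the stream
# name, threshold, and SNS topic ARN are made up. A sustained high
# iterator age means records wait longer and longer to be read.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="stream-consumer-falling-behind",
    Namespace="AWS/Kinesis",
    MetricName="GetRecords.IteratorAgeMilliseconds",
    Dimensions=[{"Name": "StreamName", "Value": "sensor-stream"}],
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=5,   # five consecutive minutes over threshold
    Threshold=60000,       # one minute of lag, in milliseconds
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)
```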
The pattern we presented so far has Lambda code that you write and execute. But what if you say: look, I don't want to write that code? Then an alternative pattern is Kinesis Analytics. Kinesis Analytics is the simplest way for you to analyze data that's streaming in, and the way it works is you define an application in Kinesis Analytics, you define the data source, in this case the same Kinesis stream we had earlier, and you define the destination, which is an S3 bucket. You represent your processing logic in SQL, as you can see in this example. So we have the same kind of time-windowing functions, represented by the yellow text: you take the timestamp and floor it to get the minute the measurement belongs to. Then you do the aggregation, the blue text: you calculate the sum, you calculate the number of messages you've seen (so you can later compute an average), and you group by the device ID. This pipeline essentially does the same thing as the first pipeline, but it does it without your having to manage and worry about things like how to fan out and how to scale to match the load. It takes on more of the work for you.

We wanted to do a cost comparison between a serverless approach and a server-based approach, and in order to do that we had to send some sort of representative traffic. We settled on this traffic model: a six-hour model that peaks at 50,000 messages per second, with a steady-state baseline of 10,000 messages per second, and we extrapolated from that six-hour run to see what the cost would be over a 30-day month. In this example, you see how the cost broke out. On the serverless side, we used a Kinesis stream with five shards, which was good enough to handle the 50,000 messages per second because we were packing multiple messages into one record, and the cost came out to a little over $400 a month. On the server-based side, we used Kafka to collect the messages and ZooKeeper to manage the cluster, so a three-node cluster of each, plus one consumer processing it, and the On-Demand cost comes out to a little over $730; if you change to the one-year Reserved Instance purchasing model, that drops to roughly $430 to $450. The key thing to remember is that with the serverless model, your unit of scale is quite different from servers: you scale based on the traffic that comes in, so if your traffic pattern is variable, you come out ahead. With the server-based side, your unit of scale is servers, so you have to worry about utilization, about whether you're using the servers to the fullest extent possible, and you also have to worry a lot about operations. I didn't actually account for the operational cost, but Ajoy will talk later about how that changes in a serverless world.

AWS provides a number of related services. We've talked a bit about Kinesis, but there are also SQS and SNS, which can let you process incoming messages, and this chart here attempts to compare certain attributes of them. I'd like to highlight two of those. One is message ordering: Kinesis guarantees that messages in a shard will be strictly ordered. With SQS, you now have two choices: standard queues, which do best-effort ordering, and FIFO queues, which actually guarantee message ordering within what's called a message group. With SNS, of course, you don't get message ordering at all. The other aspect I want to highlight is how the messages are processed. Kinesis and SQS both need you to write a consumer, a reader: with Kinesis that can be Lambda, and with SQS you need to write something that reads the messages.
SNS, on the other hand, comes built in with a number of destination types: you can send messages to SMS, email, even SQS, and of course Lambda. Which of these is suitable for your use case really depends, so use this chart as a way to decide which one is best for you. If you're familiar with big data processing, there's a concept called the lambda architecture; it's not the same as AWS Lambda, but the gist of it is that you take your data processing pipeline and split it into batch and speed layers. The batch-processing concept Drew talked about and the streaming concept I talked about can be combined to give you a big data lambda architecture.

Automation, now, is the next pattern, and it's really a pattern that covers a number of different types of use cases. Examples would be: you need to respond to alarms or events; you need to schedule periodic jobs; you need to audit and notify on anomalies that happen in your environment; or you want to extend AWS functionality, something we haven't built but that you want to control yourself. And you want to do all of this while making sure you're highly available and scalable. So how do you do that? I'm going to show you a couple of examples of automation patterns. They cover the gamut of the use cases we talked about here, and the idea is that you get inspired by them and apply them to the problems that crop up in your environment.

The first example is extending AWS functionality. You know that when you launch an EC2 instance, you get a DNS name, but it's not really very friendly, because it has an IP address embedded in it. What if you wanted a friendly name associated with that EC2 instance, and you want that name to be resolvable when the EC2 instance is running, but not resolvable when the instance is not running? The way you achieve that dynamic DNS capability is by setting up an EC2 instance state-change event that triggers a CloudWatch Events rule. That rule in turn triggers a Lambda function, and the code in the Lambda function calls Route 53 APIs to add an A record with the name and the private IP of the EC2 instance that just started or changed state. You can take that name from a tag on the EC2 instance; in this example I used a tag named "cname", and whatever its value is makes it into Route 53 and becomes resolvable. When the EC2 instance changes state, so it goes from running to stopped, or running to terminated, you go ahead and remove that entry from Route 53, and you've achieved the use case. This example also shows saving some state in DynamoDB; the idea there is to store some metadata so we can clean up that Route 53 entry when the instance stops.
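A minimal sketch of that Lambda function's Route 53 call, with a hypothetical hosted zone ID; the event shape is the EC2 instance state-change notification delivered by CloudWatch Events:

```python
# A sketch of the dynamic DNS function; the hosted zone ID and domain
# are hypothetical. The incoming event is the EC2 instance state-change
# notification from CloudWatch Events.
import boto3

ec2 = boto3.client("ec2")
route53 = boto3.client("route53")
HOSTED_ZONE_ID = "Z123EXAMPLE"  # hypothetical zone

def handler(event, context):
    instance_id = event["detail"]["instance-id"]
    state = event["detail"]["state"]
    reservations = ec2.describe_instances(InstanceIds=[instance_id])
    instance = reservations["Reservations"][0]["Instances"][0]
    tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
    name = tags.get("cname")  # the tag from the talk's example
    if not name:
        return
    # UPSERT the A record while running; DELETE it otherwise. A DELETE
    # must match the existing record exactly, which is where metadata
    # saved in DynamoDB (as in the talk) would come in.
    action = "UPSERT" if state == "running" else "DELETE"
    route53.change_resource_record_sets(
        HostedZoneId=HOSTED_ZONE_ID,
        ChangeBatch={"Changes": [{
            "Action": action,
            "ResourceRecordSet": {
                "Name": name,
                "Type": "A",
                "TTL": 300,
                "ResourceRecords": [
                    {"Value": instance["PrivateIpAddress"]}
                ],
            },
        }]},
    )
```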
Another example is what we call an S3-driven data flow. In this example, the use case is that users are uploading images to an S3 bucket, and what you need to do is trigger off that upload and process those images: resize them, generate thumbnails, or whatever. In this example, you create an event notification in S3 that triggers a Lambda function upon an object upload (the PutObject call). That Lambda function does the work required to process the image that just got uploaded, so it resizes the image and stores the final output right back into S3. This is a common pattern, and it can be used for many other use cases; for example, Twitter's Periscope uses this to analyze content uploaded by users for appropriateness, and it rejects anything that should not be published to the final streams. So that's an example of using S3 data-driven workflows.

And finally, I'm going to talk to you about the audit-and-notification use case, and here I'd like to highlight a great open source project called Cloud Custodian. This is a project sponsored by Capital One. They essentially saw that they were managing lots of scripts to audit their environment, so they felt it would make a lot of sense to create one tool where you could describe all the rules for your environment. An example rule would be: let's say we want to enforce that all EBS volumes that are ever created are encrypted. You can define that in a template, and what Cloud Custodian does is take that template and create one or more Lambda functions in your environment. Those Lambda functions are triggered on various events: for example, they could trigger on CloudWatch Events, on CloudTrail events, and on CloudWatch Logs entries. The Lambda function continuously checks against the compliance rule you defined, and if anomalies are found, you can define the actions it should take. For the EBS volume encryption use case, you might define the action as "terminate the instance," because we don't want any instances with unencrypted EBS volumes; you could also define an action that says "notify me," and that can be done through SNS. So if you find yourself creating a lot of scripts, you might want to step back and see whether Cloud Custodian is a good match for your use case.

A couple of best practices when you're dealing with automation. Do document how to stop the event flow for your automation, how to disable it, so that you can troubleshoot your automation if something goes wrong. Oftentimes when you're doing automation, you're calling AWS APIs, so do be aware that we throttle APIs; you want to handle those API throttle responses in a graceful manner. The usual advice we give people is to use an exponential backoff algorithm, and if you're using the AWS SDKs, they take care of this automatically, so it's a great idea to use our SDKs. And finally, as Drew mentioned, it's important, especially in the case of automation, to publish custom CloudWatch metrics that make operational sense for you. If you have an automation that does periodic snapshotting of EBS volumes, it would be a great idea to publish how many volumes were successfully snapshotted and how many failed, and you can then alarm on those and notify people if something goes wrong.
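For cases where you're not going through an SDK's built-in retries, a sketch of exponential backoff with jitter around a throttled AWS call might look like this:

```python
# A sketch of exponential backoff with jitter; the AWS SDKs already
# retry throttled calls for you, so something like this is only
# needed when you implement custom retry behavior.
import random
import time

from botocore.exceptions import ClientError

def call_with_backoff(fn, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return fn()
        except ClientError as err:
            if err.response["Error"]["Code"] not in (
                "Throttling", "ThrottlingException"
            ):
                raise
            # Sleep up to 2^attempt seconds; the jitter spreads out
            # retries from many concurrent callers.
            time.sleep(random.uniform(0, 2 ** attempt))
    raise RuntimeError("still throttled after %d attempts" % max_attempts)

# Usage: call_with_backoff(lambda: ec2.describe_instances())
```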
Now I'd like to invite Ajoy to talk about how, at BMC, they realized the serverless pattern for security and DevOps automation.

Thank you. It feels great to be here and be able to tell you the story of our cloud journey. As you know, BMC has been a pretty well-known player in data center automation and cloud automation, but mainly with on-premise solutions. Next month we're about to release a new cloud service, which is going to be security and automation as a service in the cloud. I'm Ajoy Kumar, and I'm going to talk about our journey in terms of how we built this cloud service in a very agile manner, and how serverless architecture and Lambda services were a key factor in really doing this. I'm going to talk about three things: our use case, our architecture, and then key learnings.

So let's dive into our journey. Earlier this year, we started thinking about what a cloud security and compliance solution would look like, based on talking to a few of our customers. We had four goals, essentially, and these are very typical goals for any SaaS service: support rapid iteration and rapid change, like weekly deliveries; scale in terms of number of tenants and the amount of data we're consuming; support sophisticated application logic with which you can extend the product; and, obviously, be economically scalable.

So let me talk about the specific use case: CI/CD integration of compliance checks into the DevOps pipeline. Let me start with what a typical DevOps pipeline looks like. You have a build phase, a test phase, and a deploy phase, and typically, in each of these phases, you're creating a lot of artifacts. In the build phase, for example, you could be creating Docker containers or infrastructure as code, like CloudFormation templates, and you really need to understand how you're governing these, how you ensure that whatever artifacts you're generating are secure. Is my Docker container secure if I'm building it through a build process? If I'm creating CloudFormation templates, am I using the right instance types for EC2? There are all these compliance checks that need to be done right after the build happens; that's part of the automation use case we wanted to build into this service. The third phase is really when you deploy your software to the cloud. Again, here, after you've deployed the code, you want to check: is my cloud secure? Did somebody make a change in the infrastructure as code and leave a firewall port open? These are all questions that really need to be tackled in the DevOps pipeline, and not, as is typical today, only in production. So that's really the key use case we're solving: we want to consume any type of data from a DevOps pipeline, whether it's metadata about containers, the cloud, the test cases or test results, and so on, apply some application business logic to that data, and then tell you the results in terms of which rules passed and which didn't. That's really the key goal of the use case we're building.

If you look at the common pattern across all of this from an architecture perspective, essentially you want to collect the data, and collect it at massive scale, because you're talking about a SaaS service collecting from hundreds of tenants, each of them running dozens of pipelines. You want to collect all this data, then build applications on top of it, do some processing, and show the results to the customers. So we started thinking about building this architecture in a traditional way, which would be to build out some of these clusters: nginx, Kafka, Cassandra, Elasticsearch, some of the ZooKeeper world. And what we started realizing was that we were spending a lot of time building infrastructure and very little time focusing on the application. The green box there is our real application, the thing that actually does our compliance and security checks, but we were busy figuring out how to build these clusters and how we would operate them in production.
While we were busy doing that (and this is really true), product management came back and told us: hey, there are some customer validations we need to do; can you get it done in a month? That's when we pivoted on our architecture. We said, let's do something more disruptive here; let's see what we can do with Lambdas and the serverless architecture. What you see here is the architecture we built in a month, and we're now continuously evolving it.

Let me describe it to you very quickly. On the far left you see collectors, which collect data from your DevOps pipelines, from your cloud, all kinds of data getting ingested into API Gateway. You see an ingest Lambda that pushes all that data into Kinesis; that's the path where data flows into Kinesis. Note that this is slightly different from what Maitreya described: we're not using the KPL, but a Lambda function, essentially so we can do a lot of data enrichment and add tenant context in that Lambda. Once all the data is available on Kinesis, you see on the far right the green pieces, which are applications, like policies and rules, that run on that data. Once the rules are evaluated, the results get stored in Amazon DynamoDB and S3, and then there's another Lambda function that runs against those data stores, continuously indexing them and pushing into Elasticsearch. So the entire data pipeline is really a series of Lambda functions: the ingest Lambda, and the applications, which are also Lambdas triggered as data comes in. And finally, I want to make sure I talk about the APIs: this is a completely API-driven system. What you see at the bottom left is that all our public APIs also follow the pattern of API Gateway with Lambda functions behind it, and we defined all our APIs in Swagger. That essentially completes the core architecture of our product.

The benefits of this are obvious. We were able to do this, amazingly, in less than four weeks. That's the beauty of Lambda: instead of thinking about infrastructure, Kafka and Cassandra, managing and operating clusters, worrying about security patches and monitoring, we're not doing any of that. We have a small team of developers thinking about how to really add business value, what new services and applications to focus on, and how to deliver that functionality continuously. That was our aha moment when we built this architecture. Over the last six months we've run a number of trials, and next month we'll actually be going live.

So let me review some of the patterns. The top two patterns I'm thoroughly impressed with: first, the API pattern, having API Gateway with Lambdas behind it, is amazing; we basically churn out APIs at an amazing speed. The second most valuable, or equally valuable, is the ability to build new applications: we have data coming into the Kinesis stream and multiple Lambdas getting invoked off it. That's also an amazing pattern we've really leveraged in our architecture.
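A sketch of the shape such an ingest function could take; the stream name, the tenant header, and the enrichment step are assumptions about the payloads, not BMC's actual code:

```python
# A sketch of the ingest function's shape; the stream name, the
# tenant header, and the enrichment step are assumptions about the
# payloads, not BMC's actual code.
import json
import boto3

kinesis = boto3.client("kinesis")

def handler(event, context):
    # With API Gateway proxy integration, the request body arrives
    # as a string on the event.
    message = json.loads(event["body"])
    # Enrich with tenant context before the record hits the stream.
    tenant_id = event["headers"].get("X-Tenant-Id", "unknown")  # hypothetical header
    message["tenantId"] = tenant_id
    kinesis.put_record(
        StreamName="ingest-stream",   # hypothetical
        Data=json.dumps(message),
        PartitionKey=tenant_id,       # keeps one tenant's data ordered
    )
    return {"statusCode": 202, "body": json.dumps({"status": "accepted"})}
```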
I'll skip some of the other points; they're fairly obvious. On scaling, you already heard that we use fan-outs, and the scale is really amazing: we have several hundred Lambdas simultaneously ingesting data, as we scaled from a few tenants to 50 to 200 tenants in our load testing, and similarly all our applications scaled up to close to 400 to 500 Lambdas. So essentially we're achieving massive scalability with this architecture, and again, I want to point out, without really doing anything at the infrastructure level. If you did this with servers in a traditional EC2 world, you'd be doing a lot of things like Auto Scaling groups and thinking about metrics and all that; here, it just happens automatically. That's really the core message I wanted to bring here.

We've been talking about application architecture. On the left side of the screen is everything we've used beyond Lambda; these are all managed services. Again, the core principle is the same: we don't want to deal with infrastructure, we want to leave that to Amazon, and we want to focus on our business use cases. What I also want to point out is that, in addition to the application architecture, our operations architecture is pretty much serverless too. Essentially, we're using CloudWatch, we're using Lambda, we take the logs from Lambda and build application metrics, we push them into CloudWatch, and then we generate SNS and SES events if some activity doesn't align with what we expect.

Let me switch to the operations side. I've been saying we do no infrastructure operations, but there's a myth about "NoOps," and I want to clarify: you still need to do operations, even though you're using serverless. Essentially, you're now doing more application operations. For example: are your Lambdas and all your services up and running whenever data is fed in? Are you getting 5xx errors in your API calls? How is my latency doing? These are all application metrics you do want to continuously measure. So you do have operations, but they're mostly at the application level and none at the infrastructure level, and that's really the key benefit we've been deriving. Our operations team at this point is very, very small, essentially because when we run our DevOps pipelines every week, or every few days, and push new versions of our Lambda functions, that's all we're really pushing into production.

I want to close with a few key learnings. Using serverless patterns in your architecture is really great, and the biggest value in my mind is the agility, the speed of innovation. In some sense, all our conversations now are about what the customer wants and how we bring the next feature in, by making a new Lambda version and then using our DevOps CI/CD automation to push the new Lambda. That's where the whole innovation angle of Lambda really helps, and that's what I see as the most useful, most impactful feature. On operations, as we discussed, we're really doing application ops and nothing at all on infrastructure ops: no servers, no security patching, none of that really matters. Security does matter, so we are actually doing security, but again at the application level, at the API Gateway level, and so on. On cost, we're getting huge savings, because we have a very small DevOps team, we're not managing the six or eight clusters I talked about before with a different architecture, and we don't have any servers. That's the beauty of this.
If there's no data, we're not incurring any costs; as customers commit to their pipelines, we see more data coming in, these Lambdas kick in, and we generate our results. So it works really nicely, given how we've designed the serverless architecture. Finally, I just want to mention that we're really impressed with what we've achieved. We've done a lot of load testing, next month we'll be going live with this service to do compliance checks in the DevOps pipeline, and we're looking at building a lot more additional services based on our learnings from the last seven or eight months of putting this architecture together. Thanks, guys; that's pretty much it.

I want to thank you all very much for sticking it out past 5:00 p.m.; I know you have a pub crawl to get to. These are a few related sessions around serverless technologies that I wanted to point out. We're going to be doing a repeat of this one on Friday. The Chalice session is a really good one; Chalice is our serverless Python-based web framework that you can use to deploy web applications. And there's also a workshop on building a serverless application that I would strongly recommend, running on Thursday at a couple of different times. Thank you very much. Please feel free to complete your evaluations, and if you have any questions, we're going to hang around up here; feel free to come on up. Thank you.
Info
Channel: Amazon Web Services
Views: 26,445
Keywords: AWS, Amazon Web Services, Cloud, cloud computing, AWS Cloud, AWS re:Invent 2016, aws reinvent, reinvent2016, aws, cloud, amazon web services, aws cloud, re:Invent, ARC402, Drew Dennis, Maitreya Ranganath, Architecture
Id: b7UMoc1iUYw
Length: 60min 17sec (3617 seconds)
Published: Thu Dec 01 2016