All about the health and lifecycle of your serverless apps | Operational Excellence WAF Pillar #1

Captions

Hello world, and welcome to another episode of FooBar. Today we are going to continue talking about the Well-Architected Framework, and in this episode we are going to talk about the first of the pillars: the operational excellence pillar. If you don't know what the Well-Architected Framework is and you want to find out, I have done a video getting you started on it; the link is in the description box, so I will not cover all that now. Let's go directly to the pillar, because it's a long one. So what is this pillar about? Every pillar has a different reason to exist, and this one is all about developing, running, and maintaining your operations effectively: in your organization, in your products, in whatever you're doing. In this video we are going to talk a lot about DevOps practices, infrastructure as code, automation, and observability. That's what is going to happen today. Some design principles to keep in mind for this pillar: perform operations as code. We all know that, but we will go more in depth into it. Make frequent, small, reversible changes. That's one of the bases of continuous integration: you want to keep pushing things to the repo, building them, and testing them. Then, review and refine your operations frequently. That's very important, and you will see that this pillar is all about iteration, feedback, and learning. And then, learn from and anticipate failure. "Everything fails all the time," as Werner Vogels says, and this is about learning from your failures and looking at what can fail in the future. This pillar is organized in four areas, or best practices, or, as I like to call them, steps, because I think they need to happen in this order. The first step, or area as the document calls it, is the organization area, and basically here you will get your
things together. You will understand what your business priorities are and what the responsibilities of every team and every member of the organization are, and you will make sure that everybody is aware of what they are responsible for. That's the first one. The second one is getting ready: the prepare area, or the prepare step. Here you basically design your workloads and operations so that you get understandable output from them. When you are planning your workloads, you want them to be able to tell you something, not just execute and leave you guessing. That's the prepare part. Then we have the operate part, and this is the step where I will be doing a lot of talking; I will show you demos and how to implement everything, so stay tuned for that. The last step, area, or whatever you like to call it is evolve: learn from what you have experienced, iterate, share, and continuously improve. This video is organized around these four areas, but to be honest, three of the four are quite short. Organize, prepare, and evolve are quite short; the one we are going to focus on a lot is the operate part. You will find chapters marking where each of these areas starts and what is inside it. So let's start with the first area, organization: getting everybody on the same page. You will see that all the areas are organized in a similar way. They have some basic questions that you need to ask, and then some design principles or ideas that will help you address those questions. When we talk about organization, everybody needs to know what they are doing and everybody needs to understand what
your business is doing. Everybody needs to understand the business priorities, and everybody needs to know their responsibilities, at the team level and at the organization level. Here the questions are: How do you determine what your priorities are? How do you structure your organization to support your business outcomes? And how does your organizational culture support your business outcomes? These questions can be addressed by different strategies listed in the Well-Architected Framework, but here is a summary of some of the most important. Make sure that you have owners for every component, application, and process, whatever you do. Having owners is critical to improving a component. If nobody owns it, that component is kind of dead, and if too many people own it, nobody takes responsibility. So have one clear owner for every piece of your application or your organization. Make sure that the owner, and the people around your organization, understand the business value of each component and why you own it. I have seen this a lot in different organizations: as engineers we want to build everything from scratch, because it's fun and it's a challenge. But it doesn't make sense to own those components, because they require maintenance and a lot of other work, and they are not aligned with the business you are in. Examples are hosting your own logs and building your own log system, or building your own authentication system, when you can use a library or a third-party service. You need a very clear reason for owning something. Is it so crucial that your application cannot operate at all if it's unavailable? Then maybe you need to own it. If it's not that important, maybe you need to think about it. This is about making you think about those things. Another one is
evaluation of risks, and here it is also important to weigh those risks. So again we go back to the reasons why you own what. Sometimes you hear engineering teams say, "I want to build my own authentication component, because what happens if the third party goes down?" What they don't realize is this: if, for example, they are hosted in the cloud and the whole cloud provider is down, or the service is having issues at a bigger level, then authentication will be the least of their problems. Also look at your providers' SLAs and reports: how often are they really down, and is it a real risk? You need to evaluate that whoever your provider is, whether a cloud provider or a third-party provider. Try to figure out whether it's a real risk that they go down, or whether it's riskier that you go down. Another thing I have heard is, "What if we want to migrate? Imagine that the price goes up; we will be locked in." Well, is that a real risk? Or does it just annoy you a little because you really don't like being all in on one platform? If you really cannot be all in on one platform, state the risk and why you are not doing it. Some organizations cannot be all in on one provider or platform, or they need to be able to migrate in a really short time. You may have your own restrictions and your own situation, so this is the time to address those situations. Understand what is a real risk and what is just your engineering team wanting to implement everything from scratch. Here it's also important to create a safe place for experimentation, trying new things, learning, and education. This is the cultural part: building a culture that is safe and that embraces failure, experimentation, and learning, where people are not looked at weirdly for trying new things and suggesting new things
and this is where you are going to develop those things. So this is more a cultural change than a technology change. The technology in the operate step will help you create a technically safer environment for your developers and architects to experiment. But this part is about the culture, about embracing failure and having processes. For example, if something goes wrong, how do you manage the situation? What do you do when there is a failure? Do you do a postmortem? You try to make sure that you are not blaming anybody, that everybody feels safe when that happens, and that they learn something from it. That's what this part is about. Now we go to the preparation phase. Here we are thinking about our workloads and getting more into the technical aspects. We want to make sure that our workloads can provide all the information we will need later in the operate part: information about their state, how the application is doing, and how it's performing. This is where you prepare your workload for that. The questions this area asks are: How do you design your workload so that you can understand its state? How do you reduce defects, ease remediation, and improve flow into production? How do you mitigate deployment risks? How do you know that you are ready to support a workload? This is all about the design and planning of your workload. A lot of these things get solved with a CI/CD pipeline, which we will see in the next phase. Basically, we are after fast feedback: you want to go from an error being discovered to the error being fixed in production super fast, and that's something we measure in this stage, for example. Then we want to make small changes that are easy to reverse, easy to test, and go to production fast. We want to have
a consistent way to make changes in production, ideally with an automated process. We also want a place where we record the changes that go into production. It can be a remote repository, tickets, or whatever you use, but have a process for that. Other things in this stage get addressed with DevOps practices. Have runbooks for normal routines and use infrastructure as code. Keep playbooks for when things go wrong and record what you learn from them, so that if something happens again you can check the playbook and see what is going on. Have multiple environments, and make them easy to create so your developers can experiment. All this is in the preparation phase. If you're still here, about to hear me rant about the operate part, which is the most interesting one, and you have not yet clicked the like button, what are you waiting for? I'm waiting... thank you! So let's continue. Now I'm going to talk about the operate phase, which in my opinion is the core of the operational excellence pillar. But if you have not done the pre-work in the organize and prepare phases, it's hard to get this completely right. The questions that we are going to answer here are: How do you understand the health of your workload? How do you understand the health of your operations? How do you manage workload and operations events? I will add two more questions here, and these are the ones I'm going to answer, because I think they also answer the others in a way. They come from the Serverless Lens, because we want to focus on serverless applications; I told you that in the previous video and forgot to tell you again. The questions that the Serverless Lens asks us are: How do you understand the health of your serverless application? And how do you approach application lifecycle management? These are the two questions I want to address, and I will start with the first one: how do you understand the
health of your serverless application? Here we need to look at the observability triad: metrics, logs, and traces. These are the pillars of observability, and we need to implement them in our applications and workloads so that they tell us something about their state. Logs are basically timestamped records that we write at a given moment, with whatever information we want to record. They are very discrete, like "now we put this in the database," and we decide what to write in the log. Then we have metrics, which usually gather numerical data that helps us track our application's health. Some are operational: how many Lambda invocations succeeded, how many were throttled, how many failed, or how many 404s we returned. Others are about your application's business logic: how many people registered today, how many views, how many sessions. Traces are end-to-end transactions. We don't get all of them: you sample your executions and collect some traces that go end to end across your whole application. That way you can understand the transaction journey, the bottlenecks, and how everything connects together. If we want to implement this on AWS, for logs we have CloudWatch Logs and CloudWatch Logs Insights. I show you how to do this in one of my videos, and all the links are in the description box. There will be lots of links because I'm not going to repeat myself. I have made a video about structured logs and CloudWatch Logs Insights, covering how to use them and how to implement them in your serverless application. Then for metrics we have CloudWatch metrics, the dashboards, and the alarms.
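Here is a minimal sketch of the logs and metrics pieces above. It assumes a Python Lambda function whose standard output is shipped to CloudWatch Logs. The sketch writes one structured JSON log line. It also writes one business metric in CloudWatch Embedded Metric Format (EMF), which CloudWatch can extract from the log line as a metric. The namespace `MyApp`, the metric `Registrations`, the `service` dimension, and the handler are hypothetical names chosen for illustration.

```python
import json
import time


def log_event(level: str, message: str, **fields: object) -> str:
    """Print one structured (JSON) log line and return it.

    In Lambda, anything printed to stdout ends up in CloudWatch Logs, where
    JSON fields can be queried with CloudWatch Logs Insights.
    """
    record = {"timestamp": int(time.time() * 1000), "level": level, "message": message}
    record.update(fields)
    line = json.dumps(record)
    print(line)
    return line


def emf_metric(namespace: str, name: str, value: float, unit: str, dimensions: dict) -> str:
    """Print one metric in CloudWatch Embedded Metric Format (EMF) and return it."""
    payload = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [list(dimensions)],  # a single dimension set
                    "Metrics": [{"Name": name, "Unit": unit}],
                }
            ],
        },
        name: value,  # the metric value lives at the top level of the document
    }
    payload.update(dimensions)  # dimension values also live at the top level
    line = json.dumps(payload)
    print(line)
    return line


def handler(event: dict, context: object = None) -> dict:
    """Hypothetical sign-up handler that logs and counts registrations."""
    user_id = event["user_id"]
    log_event("INFO", "user registered", user_id=user_id)
    emf_metric("MyApp", "Registrations", 1, "Count", {"service": "signup"})
    return {"statusCode": 201}


if __name__ == "__main__":
    handler({"user_id": "user-123"})
```

The metric travels inside the log line, so the function needs no extra API call to publish it. Dashboards and alarms can then use `Registrations` like any other CloudWatch metric.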
The alarms are critical for telling you when something is wrong, so you don't need to watch that dashboard all the time. I also made a video on how to create alarms and metrics and how to view them in a dashboard; the video is in the description box. Finally, for traces we have AWS X-Ray, which helps us add traceability to our application. We can see distributed traces going from synchronous to asynchronous processes, like EventBridge or queues along the way. We can then put logs, metrics, and traces in one place using CloudWatch ServiceLens. All that is in a video on how to do it with AWS serverless, so go and check those videos. Now you have the videos and the question they answer, so let's go to the other question that the Serverless Lens asks us: how do you approach the lifecycle management of your application? Applications in general have a similar lifecycle: we code, we build, we deploy, we test (I hope), and then we go to production. You might have 1, 2, 3, or 27 different environments in place, but at the end of the day it is the same. If you prototype or experiment, you might build and test, and that might never go to production. Or you have a very brave team that pushes things to source and goes directly to production. It differs from team to team, depending on their needs and the applications they build, but these four phases are present everywhere. The key here is to automate as much as we can in the lifecycle of our application. We want to focus on coding our business logic, not on anything else. So we are going to talk about infrastructure as code and CI/CD pipelines, meaning continuous integration and continuous delivery or deployment pipelines. We will also talk about environments: having multiple environments like development, testing, production, one environment per developer, whatever you're
needing. Then we are going to talk about application configuration management, which is fundamental for doing infrastructure as code and having multiple environments in a sane way. After that we will talk about testing. I will share a lot of links again, because I have already talked about everything. First, I want to tell you that I covered all of this, the whole lifecycle management of serverless applications, in my 2020 conference talk; the link is in the description box. It's a 30-minute talk, and it's all about CloudFormation. That's the drawback, because at that time that's what I was using. I will share more links on how to do it with CDK and other tools as well. But that talk covers basically everything, and it tells a story of how things move from one part to another. So it's my first recommendation if you only watch one video after this one. When we talk about infrastructure as code, there are so many options nowadays: CloudFormation, SAM, CDK, Amplify, the Serverless Framework, and others. For all of those I have beginner videos you can check in the playlist linked in the description box, so you can learn a little about each of these infrastructure-as-code tools. If you don't know when to use one or the other, I owe you a video on that. I have one that is a couple of years old and I want to make a new one, so stay tuned. When it comes to CI/CD pipelines, again there are multiple ways of building them: CodePipeline, GitHub Actions or GitLab, Jenkins, whatever you can imagine. The important thing is not the tool you use but that you have one in place. And when we talk about CI/CD pipelines, I also want to talk about environments, I want to talk about
having multiple environments in these pipelines. Here we can talk again about what kind of branching and deployment strategy you use. One example is a multi-branch strategy, where you have a development branch and a production branch. You deploy to the dev branch, and then you merge the code and it goes to production. I call it Git Flow, though I don't know if that's the right terminology. Another is an all-in strategy: you push to one branch and it goes to a development stage, fully automated. It runs the integration tests and then goes to production without you doing anything. Another way is to do everything manually; please don't do that. But again, it depends on your needs, and you have to build the strategy that fits them. I have videos on all these strategies and on many of these technologies; the links are down there in the description box. The different deployment strategies for your code are also important here. For serverless applications, we talk about strategies like canary or linear, which are supported by CodeDeploy. With these, you deploy your Lambda functions and run integration tests before traffic shifts. Traffic then moves gradually into production, so you can monitor that everything is right, and you run integration tests when the shift finishes. This is a great way of doing deployments very safely. A lot of the patterns in CI/CD and multiple environments rely on the fact that building an environment is very cheap for serverless applications, because you only pay for what you use. So if you create an environment for Marcia to test the whole application, that's okay. While Marcia is testing you might pay something, but then you remove the environment as easily as you created it, and you don't pay
anything else. And if you leave it up and nobody uses it, you might not pay for more than the storage. Application configuration management is critical to learn. If you're doing infrastructure as code and multiple environments without application configuration in place, you will go nuts. You will create a lot of duplicated code in your infrastructure as code, saying "this is production, this is dev, this is that." You end up hardcoding names, attributes, and a lot of other values that can be resolved with environment variables. Basically, every environment has its own set of configuration values, but they all run exactly the same code. The basic services to get to know for this are Parameter Store; KMS, because you might want to encrypt some of those values in Parameter Store; and Secrets Manager. Secrets Manager is great and super easy to use, but it's a bit more expensive, which is why I encourage KMS and Parameter Store. If you really need to do more interesting things with your application configuration, you might want to check the AppConfig service, which will do a lot of this for you. I'll leave you a playlist with videos about these topics in the description box. We are reaching the end, and now we will talk about testing; congrats to everyone who stayed until here. Testing is one of those recurring topics that all we developers want to learn more about and never implement. With serverless applications you can test in so many places, so it's important to be aware that there is more than one type of test you can do. You can do code reviews, and you can add some machine learning to them: if you're using Java, you can use CodeGuru, for example. Or you can have libraries running and checking your code; there are many, many libraries that are open source
or services that you can integrate. Or you can have good senior developers look at your code and review it. Then, for unit tests and the build phase itself, you can automate running the unit tests. Unit tests matter, but you don't need millions of them. If your Lambda functions are simple enough, you don't need many unit tests, because they are less important than in a traditional application. You can also test your infrastructure: with CDK, for example, you can write tests to make sure that you are creating the right infrastructure. You can also add linting to check that your application is correctly formatted. For example, if you have a YAML template, you might want to run a linter before deploying, because the deployment can fail if the file is not correctly formatted. You can do all these things in the build phase as an automated process. Then, in the integration part, you deploy your application and run integration tests. You can run them against the real cloud services, in environments built for testing. You can also use sandbox or fake accounts, or mocks if your third party doesn't support that type of environment. Sometimes you need to mock those third parties. For example, if you're handling payments, the payment provider might offer you a mock system, and sometimes it doesn't, so you need to think about these things. For running integration tests, I like to use Lambda functions. Lambda functions are great: you can integrate them directly into CodePipeline as an action within the pipeline, and you can pass parameters in and out. So it's a really easy way to do integration tests without needing a server or anything running. So basically the pipeline finishes deploying, building
the CloudFormation stack and putting everything up. The next step is a Lambda function that runs some integration tests to make sure that things work and everything is good, and it can also tear things down. Another thing we sometimes forget with serverless applications is load testing. Load testing is an interesting one. There are many libraries; I like one called Artillery, and I have a video on how to set it up. You can run it from an EC2 machine if you don't want to run it on your own machine, or add it to your CI/CD, or build a script; you can do a lot with it. If you want to know how to do load testing in your CI/CD, let me know in the comments. Load testing is interesting because we think serverless scales infinitely, and the answer is yes and no. There are soft limits. Some services scale one way and others might not scale that well. And some third-party integrations might not like your serverless applications. So having load testing in place to make sure that your application can support the load you expect is fundamental. Load tests also help you know how much it will cost. So I recommend having a suite of load tests. Maybe you don't need to run it on every deployment, but whenever you make major changes to your application, run it and make sure everything is good. And finally, when you go to production, use CodeDeploy with these kinds of safe deployments: linear, canary, or even all at once. Run integration tests before deploying to make sure your environment is stable, deploy, and then run integration tests afterwards to make sure your changes didn't break anything. Again, integration tests can be written as Lambda functions, so it should be quite simple. That's it, that's all for the operate part. Ah, I forgot: I have videos on testing, links
in the description box. So now you have videos on observability, CI/CD, infrastructure as code, application configuration management, and testing; you can do this pillar like a boss. The last phase is the evolve phase, and it's a short one. It addresses how you evolve your operations: what you're learning and how you're going to improve. All these videos I showed you contain a lot of information. If you are starting from zero, you will not implement everything right away, and you might not need to. This is the part where you evaluate whether what you have in place is right for your current situation and decide what you want to do next. Then you create a plan: organize, prepare, operate, and evolve. It is very important to have feedback mechanisms. Keep metrics on your operations that tell you how fast you're deploying, how many deployments you're doing, and how many errors you have when deploying. These help you make better decisions and understand whether your operations are working well. It's also good to practice, and for that I like game days. Game days are exercises you run every now and then: you break something in your organization and then have to figure out who is responsible, how to fix it, what the process is, what the playbook is, and how to run it. You can then automate all this with chaos engineering exercises, but that's for another video. I have talked about chaos engineering with Adrian Hornsby, so I have a video on that too. That's it for me today. I will stop talking, because you might have a headache right now. I hope you enjoyed this video. Let me know if you liked it, and if you didn't, let me know what you would like to see in the security pillar video. That's coming next and I have not made it yet, so I would like to know how I can make it better for serving your
purposes. So thank you again, and have a great day!

Info

Channel: FooBar Serverless

Views: 425

Keywords: foobar, well architected review, well architected framework, operational excellence, aws waf, amazon well architected, aws well architected, well architected, cloud computing, amazon web services, testing serverless applications, observability in serverless applications, deploying serverless applications, developing serverless applications

Id: jyz0WBNVhkA

Length: 32min 8sec (1928 seconds)

Published: Thu Nov 11 2021