AWS re:Invent 2015: DevOps at Amazon: A Look at Our Tools and Processes (DVO202)

Captions
ROB BRIGHAM: Welcome everyone, my name's Rob Brigham, and this is Clare Liguori. We're from the AWS Developer Tools group, where we build the tools that developers inside of Amazon use, as well as a new set of AWS code services that all of our customers can use. Today, we're going to talk about DevOps at Amazon and give you an inside peek at how Amazon develops our web applications and web services. We're going to divide the talk into two parts. First, I'm going to tell you the story of how Amazon made its own DevOps transformation and how we became more agile at delivering software. After covering that history, we're going to come back to the present and I'll introduce you to three new services: AWS CodeCommit, CodePipeline, and CodeDeploy. After that, Clare's going to come up and give us a great in-depth demonstration of how to use these new code services to create your own DevOps processes. But before we get started, I have both an apology and a confession to make. First, I apologize for my voice; it's not normally this hoarse. I promise I went to bed early last night, I did not stay up late, but still, I'm losing it a little bit. So, I might have to take some breaks and have some water during the talk. Now for my confession: I used to hate the term DevOps. It just really, really bugged me. And it bugged me because it's fuzzy. People would use it in many different ways to mean many different things. So, whenever anyone said DevOps, other people weren't really sure what they were talking about. But earlier this year I finally caved to the pressure and started using DevOps in my talks. It really is the best term out there that captures this new, modern style of software development and delivery. So, since I'm using it in my talk, I'm going to have to define it. And I'm not going to define it directly, but I'm going to relate it to something that we're all very familiar with, and that is the software development lifecycle. This is a typical lifecycle for any kind of web application or web service. On one side you have your customers, and on the other side you have the developers. Every new feature that you build in that application is going to go through this lifecycle. Developers are going to come up with the idea, they're going to implement it, they're going to take the code that they write, then they're going to build it, test it, and put it through the release process until it finally gets out into production where your customers can use it. After your customers get their hands on it, you can start to learn from it. Your development team can look at the customer usage data of the application, they can get feedback directly from the customers, and they can start to make educated decisions on what they want to do next. So, they might choose to refine that feature to improve it, or they could choose to build a whole new feature, and then this whole loop starts again. There are two important things to note about this development lifecycle. The first is that the speed at which you're able to complete this loop, to build a new feature, get it into the hands of your customers, and learn from how they use it, determines your business agility. The faster you're able to go through it, the more responsive you'll be to customers, and the quicker you'll be able to innovate. At Amazon, we focus intently on completing this loop as quickly as we can.
The second thing to note is that your developers are only adding value in the eyes of your customers when they're working over on the left side, writing new code. Any time that your developers spend in the middle, either building the delivery pipeline or handholding changes through that pipeline, is lost in the eyes of your customers. Your customers will only see the new features that your developers are writing. So, what you want to do is maximize the amount of time that your developers spend writing new features and minimize the amount of time that they're spending in the middle. And it's really those two things that make up the heart of DevOps to me. DevOps is any efficiency that you can drive into this process that helps you move through this loop faster. And this is why it's so confusing: there are many things you can do here. You can make organizational changes, cultural changes, process changes, tool changes, and I think that's okay. To me, any of those improvements that help you move faster through this lifecycle count as DevOps. So, to make this more concrete, I'm going to tell you the backstory of Amazon's own transformation to DevOps. And like most companies, we did not start out that way. In fact, if you go back to 2001, the amazon.com retail website was a large architectural monolith. Now, don't get me wrong, it was architected in multiple tiers and those tiers had many components in them, but they were all very tightly coupled together and behaved like one big monolith. Now, a lot of startups, and even projects inside of big companies, start out this way. They take a monolith-first approach because it lets you get moving quickly, but over time, as that project matures, as you add more developers to it, as it grows and the codebase gets larger and the architecture gets more complex, that monolith is going to add overhead into your process, and that software development lifecycle is going to begin to slow down. So, to depict how this was affecting Amazon, I've re-rendered the software development lifecycle for a monolithic application. What we had was a very large number of developers working on this one big monolithic website, many more than what we have up on this slide. And even though each one of these developers is only working on a very small piece of that application, they still need to take on the overhead of coordinating their changes with everyone else on the project. If they're adding a new feature or making a bug fix, they need to make sure that change is not going to break someone else on that project. If they want to upgrade a shared library to take advantage of a new feature, they need to convince everyone else on that project to upgrade to the new shared library at the same time. And if they want to push a quick fix out to their customers, they can't just do it on their own schedule; they're going to need to coordinate that with all the other developers who have in-process changes at the same time. And this leads to the effect of something like a merge Friday, or maybe even a merge week, where all the developers take their in-process changes, merge them together into one version, resolve all their conflicts, and finally create a master version that's ready to move out into production. And even when you have that large new version, it still adds a lot of overhead on the delivery pipeline.
That whole new codebase needs to be rebuilt, all of the test cases need to be rerun to make sure that there are no regressions, and then you need to take that entire application and deploy it all to your full production fleet. At Amazon in the early 2000s, we had a single engineering group whose sole job it was to take these new versions of the application and manually push them across our production environment. So, this was not only adding a lot of overhead to our delivery process and frustrating our developers; most importantly, it was slowing down our software development lifecycle, it was slowing down our ability to innovate. So, we made some changes, and we made a couple of big ones. The first was architectural. We went through that monolithic application and we teased it apart into a service-oriented architecture. We went through the code and we pulled out functional units that served a single purpose, and we wrapped those with a web service interface. Some examples of these single-purpose services that we pulled out: one's sole job was to render the buy button correctly on the product detail pages, and another's sole job was to calculate the tax correctly in the checkout process. When we created these single-purpose services and pulled them out individually, we also had a rule that they could only talk to each other through their web service APIs. There was no backend shared data access allowed. What this enabled us to do is create a highly decoupled architecture where these services could iterate independently from each other, without any coordination between those services, as long as they adhered to that standard web service interface. To give you an idea of what this architecture looked like, I've included this graphic. What this represents is the amazon.com retail website circa 2009, and all of the individual services that made up that experience. So, back then when we made this architectural shift we didn't have this term, but today we call this a microservices architecture. In addition to that architectural change we also made an organizational change. Before, we had one central, hierarchical product development team. We ended up teasing that apart as well and breaking it down into small teams, what we call two-pizza teams. And the idea behind that name is that we wanted the teams to be small enough that you could feed them with just two pizzas. Now, full disclaimer here, we're really targeting around six to eight developers per team, so depending on how hungry your teammates are, your mileage may vary on that name. So, each of these two-pizza teams was given full ownership of one or maybe a few of these microservices, and when I say full ownership I mean everything. They owned talking with their customers, whether they be internal or external, they owned defining their feature roadmap, designing their features, implementing them, writing the tests for them, deploying those services into production, and also operating those services. So, if anything went wrong anywhere in that full lifecycle, they were the ones accountable for fixing it. If they chose to skimp on their testing and unknowingly released bad changes into production that were breaking in the middle of the night, it was those same engineers that were paged and had to wake up to fix it. So, what that did is properly align incentives, so the engineering team was fully motivated to make the entire end-to-end lifecycle operate efficiently.
So, again, we didn't have this term back then, but today we'd call this a DevOps organization, because we took those responsibilities of development, and test, and operations, and merged them all onto a single engineering team. So, after we made these two changes, the architectural and organizational change, we dramatically improved the front end of that development lifecycle. These small two-pizza teams were able to quickly make decisions and quickly crank out new features for their microservice, but when they went to deploy those features to their customers, we had a problem. That old model, where we had a central engineering group that manually deployed the entire application out to the production fleet, just would not fit this model where we had thousands of these microservices all wanting to deploy on their own schedule. So, we had a tools gap. And to fix that, we started a new tools group that built a new breed of developer tools. These tools had some unique characteristics. The first is that they had to be self-service. There's just no way one tools group would be able to onboard thousands of these different two-pizza teams if it involved any handholding. So, what they did is they created a web portal where these teams could come learn about these new developer tool services, figure out how to get started, and also provision whatever resources they needed to start using them. You could say this was very AWS-like, even before AWS had started. Second, these tools had to be technology-agnostic. We had given these two-pizza teams full autonomy to make whatever decisions they wanted, and they took full advantage of that. They chose different operating systems, different programming languages, different architectures and app frameworks to implement their services. So, the tools were going to have to be adaptable enough to work with all of these different technologies. Third, we wanted these tools to encourage best practices. Even though we split that central product development team up into all of these autonomous two-pizza teams, we still wanted them to share their learnings. And we found the most effective way of doing that is that if a team learned a best practice, we would take that and bake it into the toolset, which made it very easy for other teams to both discover the new best practice and adopt it themselves. And finally, you could say we drank the microservices Kool-Aid: just as we were teasing apart the website architecture, we didn't want to deliver an end-to-end toolchain that was very tightly coupled together. We wanted to deliver it as functional building blocks, so the teams could pick the pieces that worked best for them and then tie them together in the way they wanted. So, I want to talk about a couple of these building block tool services that we use internally at Amazon. The first is Apollo, which is our deployment engine. Its job is essentially to get bits onto a box. We've been using Apollo to deploy the retail website for over a dozen years now, and we also use it to deploy our Amazon Web Services. And over that time we've learned a lot about how to do deployments well. And we've taken those learnings and baked them back into the tool. One of those features is the ability to deploy without any downtime. As you can imagine, we're not allowed to take down the retail website any time we want to push a code change, so we came up with a feature called rolling updates.
And what Apollo will do, when it's updating a fleet of application servers, is only update a small fraction of those at a time and then incrementally work its way across the fleet until it brings the whole fleet of servers up to the new version of the application. Another feature that we added was health tracking, and that's because it happens rarely, but occasionally a bad code change can make its way through testing and roll out into production. And what we want to make sure is that that bad code change is not going to take down the entire fleet. So, as Apollo is doing this rolling update, if it detects errors or failures on the servers that it deployed to, it will automatically cut off that deployment and stop it from progressing further. And since Apollo versions these deployments, it makes it really easy for a developer to roll back to a past known-good version of that application. The next service I want to talk about is Pipelines, which is our internal continuous delivery engine. Even after we built Apollo and had automated deployments, we still noticed that it took a long time for a code change to go from a developer check-in to running in production where customers could use it. So, being a data-driven company, we did a study on that, and we measured the amount of time it took a code change to make its way through that deployment lifecycle across a number of teams. When we added up that data, looked at the results, and saw the average time it took, we were frankly embarrassed. It was on the order of weeks. So, we dug into that data and we looked at detailed breakdowns of exactly how long it was taking at the different steps. And what we saw is that it wasn't so much the duration of any one of those actions, it wasn't the duration of a build, or the duration of a test run, or the duration of a deployment; it was all of this dead time in between. We had a bunch of inefficient manual handoffs where, after one task was run, a person would take that and then notify another person that the next job was ready to run. And that usually happened in the form of an email, or it could be cutting a ticket, but these requests were sitting in queues, sitting idle, for a very long time. And for a company like Amazon that prides itself on efficiency, a company that uses robots inside of our fulfillment centers to move around physical goods, a company that wants to deploy packages to your doorstep using drones, you can imagine how crazy it was that we were using humans to pass around these virtual bits in our software delivery process. So, we had to fix that, and we did that using Pipelines. Pipelines allowed these teams to model out their complete end-to-end release process. They could specify how they wanted their source code changes to be automatically built and unit tested, how they wanted those to then be deployed to their test environments, what tests they wanted to run in those environments, and then how they wanted those changes to move out into a production deployment. After they modeled out that release process, Pipelines would automatically handle all those code changes and marshal them through the release process for them. So, it automatically triggered the builds, automatically checked the results, and automatically moved changes off to the next step for the development teams. So, when we implemented Pipelines and it began to be adopted internally, we saw a dramatic improvement in the speed of these software releases.
But we also saw another improvement that we didn't expect. We saw that the teams that had fully automated pipelines actually had more reliable releases than those that had manual tests or manual steps involved. And that was a little unintuitive at first. We thought that if you put a human, who can make intelligent decisions and apply extra scrutiny, on your release, that might help make the release more reliable, but what we saw was the opposite. What we saw is that the teams that fully dedicated themselves to making sure every validation step they wanted was baked into an automated test had more reliable releases, with fewer rollbacks and fewer deployment errors. So, with these two advantages of faster and more reliable releases, Pipelines has been incredibly successful inside of Amazon and is used pervasively across the different teams. Now, after adding these two tools for automated deployments and continuous delivery, we fully unblocked these two-pizza teams so they could operate independently. Now these teams can decide what features they want to work on, implement those features on their own schedule, and push those changes to their customers through their own delivery pipeline, completely unblocked. When customers ask me how it is that Amazon is able to move so fast, this is the answer I give them: even though from the outside Amazon might look like a large organization that might have some internal overhead, on the inside we're really structured like a bunch of small startup teams that are all organized very efficiently and moving as fast as they can. So, there are a lot of different ways that we can measure success here. One of the ways, one that we've talked about publicly, is how many deployments we do each year. When you have thousands of these two-pizza teams working on these small microservices, practicing continuous delivery across multiple dev, test, and production environments, that multiplies out to an insane number of deployments. So, last year in 2014, just during a 12-month period, we did over 50 million deployments. That averages out to a deployment and a half every second. So, that's an incredible number, and it just shows you how quickly Amazon is turning the crank on that software development lifecycle. Now, when we tell customers this story, typically the next question they ask is how they can do it themselves. And I'm not going to oversimplify things here, because it is a very complex answer. A company needs to look at cultural changes, organizational changes, process changes, and there's not one right answer for every company. Everyone's going to have their own twist on a solution adapted to their particular needs and environment. But there is one common building block that every DevOps transformation needs, and that is an efficient and reliable continuous delivery pipeline. And that's what I want to talk about for the rest of this talk. So, what does it take to set up a continuous delivery pipeline? Well, there are a few requirements for it. The first is the most important, and that is that you absolutely need to have fully automated deployments. That's because you're going to deploy a lot.
For every release that you get to customers, you're going to have to deploy multiple times to your testing environment while you're debugging and iterating, then you're going to need to deploy it to your staging environment to run tests and make sure there are no regressions, and then you're going to deploy it to production where your customers can finally use it. If there are any manual steps in this process, it's not only going to slow you down, it's going to incent you to deploy and release less often. So, after you set up automated deployments, the next thing you're going to do is try to tie together and automate your software release process. You want your source code changes to automatically be built, automatically be deployed to your test environments, automatically trigger all of your test runs, and finally move into a production deployment. Now, it's going to be okay if you have some manual steps here. Almost all customers, when they're starting off, are going to have manual steps, but over time, as you mature, you're going to want to take any manual actions and manual validations that you have and convert those into automated steps as much as you can. So, we have a few tools and services that can help you out here. We have CodeDeploy, which you can use for automated deployments, CodePipeline, which you can use for end-to-end release automation, and then, if you want to move your source code to the cloud so that you have your entire pipeline, everything from source, to build, to your test stages, to production, hosted in AWS, you can use CodeCommit to store your source code as well. So, what I'm going to do now is give you a very quick introduction to these three services, and I want to do that very quickly because I want to save time for Clare to come up here and give you a nice, in-depth tour of what it's like to use these through a demonstration. The first service I'm going to talk about is CodeDeploy. And CodeDeploy works just like Apollo. You're going to specify what version of your application you want to deploy to which target group of servers, and it's going to handle that rollout for you. It has the same features as Apollo: rolling updates so that you can deploy without downtime, and health tracking to cut off bad deployments before they take down your entire application. When we launched CodeDeploy we only supported deploying to Amazon EC2 instances, but earlier this year we released support for on-premises deployments. This allows you to deploy your application to servers in your own private data center, and it also allows you to deploy to VMs in other clouds. This means that you can now use CodeDeploy as your central tool to manage all your deployments, for all your different applications, in all of your different environments. The next service I want to talk about is CodePipeline, which was inspired by our internal Pipelines service. It's going to work in much the same way. You're going to specify how you want your release process to work, how you want to tie your source code changes into your build stage, what test environments you want to deploy to and what tests you want to run in those environments, and ultimately how you want it to deploy into production. This service was designed to be very extensible and pluggable, so you not only have control of that workflow, you have control over what systems you connect to each step of this process.
If you want to use an AWS service like CodeDeploy or Elastic Beanstalk for your deployments, you can do that. If you want to use an integrated partner tool like GitHub for source control, you can do that. And if you have your own servers, maybe on-premises servers that you want to integrate into this process - you might be using Jenkins for build or test - you can hook those in as well. So, after you define your own custom release process, CodePipeline is going to manage all of your code changes for you: it will automatically trigger each step of your process along the way and make sure that every change goes through the validations that you define. The last service that I want to introduce is CodeCommit. CodeCommit is Git: Git source control re-implemented on top of S3 storage. On the front end it works like any other Git source control system out there. You use the same Git tools and issue the same Git commands, so there's nothing new there. But the backend is where it's really unique. We've implemented Git on top of S3 and DynamoDB. So, this brings us the advantages of that cloud-scale storage, plus a few interesting bonus features. One of those is that CodeCommit will automatically encrypt your repositories using customer-specific keys. That means that every customer will have their repository encrypted differently when it's stored in S3. So, I think I lived up to my promise, I gave you a really quick introduction to those three. And now, I'm going to turn things over to Clare for the rest of the talk, and she's going to give us a hands-on tour of what it's like to actually use these services to set up your own DevOps processes. Thank you.

CLARE LIGUORI: Thanks, Rob. So, I'm Clare Liguori, I'm an engineer on the Code Services team, and I want to give you a live demo using a simple app that I've created that gives a little bit of a flavor of what Rob was talking about. It's got two microservices owned by two independent teams - Rob and I are going to be the independent teams for this scenario - and it has that microservices architecture. For a simple calculator app that does add and subtract, I have a web service written in Go that is my calculator API. That's what I'm going to be using to add and subtract the given inputs. The front-end service is going to be a website, a very simple HTML website, that's going to call into that calculator API on the web service. Rob worked on the web service in Go, and I worked on the website. I'm also going to give you a little bit of a flavor of some of the options that you have for managing your release process on top of CodeDeploy and CodePipeline, and some of the features that you can mix and match. So, the scenario that we're going to run through is that recently Rob added multiply and divide to that web service, and so we're going to go into the website and actually add that as a feature for our customers. But it turns out that Rob has a bug in his code, bad Rob. He has a divide-by-zero error. So, we'll look at what you can do to stop those types of bugs from getting out to production. We'll first start with that change to the website that I talked about, adding multiply and divide. I'm using GitHub for my source, and I'm going to use CodeDeploy to deploy that change into the test environment. We'll do a little bit of smoke testing on it and then promote it to production with CodeDeploy as well.
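For reference, a calculator API along the lines of the Go web service described here might look roughly like the sketch below. The endpoint path, query parameter names, and port are assumptions for illustration, not the actual demo code.

```go
// Minimal sketch of a calculator API similar to the Go web service in the demo.
// Assumed interface: GET /calc?op=add&a=2&b=3 returns the result as plain text.
package main

import (
	"fmt"
	"log"
	"net/http"
	"strconv"
)

func calculate(w http.ResponseWriter, r *http.Request) {
	// Parse the two operands from the query string.
	a, errA := strconv.ParseFloat(r.URL.Query().Get("a"), 64)
	b, errB := strconv.ParseFloat(r.URL.Query().Get("b"), 64)
	if errA != nil || errB != nil {
		http.Error(w, "a and b must be numbers", http.StatusBadRequest)
		return
	}

	// Only add and subtract exist at this point in the story;
	// multiply and divide come later.
	var result float64
	switch r.URL.Query().Get("op") {
	case "add":
		result = a + b
	case "subtract":
		result = a - b
	default:
		http.Error(w, "unsupported operation", http.StatusBadRequest)
		return
	}
	fmt.Fprintf(w, "%g", result)
}

func main() {
	http.HandleFunc("/calc", calculate)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```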
For the web service, it's a little bit more complicated because it's written in Go, so I need to compile that code in order to get build artifacts that I can actually deploy. We're going to use CodePipeline to hook that entire process up, from the source in GitHub, to a build that compiles that Go code, to some automated tests so we can catch Rob's bugs, and then deploy it with CodeDeploy as well. And then finally, I'm going to show you how you can move those repositories into CodeCommit and start managing that source code in the cloud. So, let's first go and make this change for the website: add multiply and divide. This is my very, very simple calculator application. Again, it only has addition and subtraction right now, and it will call back into that web service, that calculator API, to get the result and display it. This is my website running in my test environment, and this is my website running in my production environment; today they're running the same version of the application. The production environment is a little bit different because I want it to be available and reliable, so I actually have three EC2 instances running behind an Elastic Load Balancer. So, I've created this heads-up view to show you exactly what version of the application is running on each of the EC2 instances. Today we have addition and subtraction running on all three instances. The first step for using CodeDeploy in your release process is going to be packaging up your application for CodeDeploy in order to deploy it onto your instances. Like I said, I have my calculator website source stored in GitHub, and I'm using this repository to package it up for CodeDeploy. I have all of my application artifacts; today it's just a simple HTML page, but if it was more complex, if it had separate JavaScript files or image assets, those would go into this repository as well. I have a scripts folder that's going to hold all of the installation and configuration tools and scripts that I need to deploy this onto a server. And then the most important part for CodeDeploy is going to be the application specification file. That's going to define how CodeDeploy needs to install this on a single machine. We call that AppSpec for short. Let me show you a little bit about the AppSpec file and break this down. The first section is the files section. That's going to tell CodeDeploy what files in this application bundle it needs to copy onto your instance and where they need to go. I only have my single index.html file, and it's going to go into the root web server content directory, but you could add many more for more complex applications. And then the hooks section defines what CodeDeploy needs to do at each stage of the deployment lifecycle. So, we're going to stop the application: CodeDeploy is going to bring down the web server and take it out of the load balancer. Before installation, we want to make sure that we have all of the dependencies, in this case just a simple web server package. CodeDeploy is going to start the web server back up after it's copied the files onto that machine, and then finally it's going to validate that everything's working correctly, that we can actually get a good response from the web server. Again, my example is very, very simple, but you can grow this to more complex applications, with more complex scripts and steps.
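A minimal AppSpec file along the lines of what's described here might look like the sketch below; the script names and the destination directory are assumptions for illustration, not the actual demo files.

```yaml
version: 0.0
os: linux
files:
  # Copy the single page into the web server's content directory.
  - source: /index.html
    destination: /var/www/html/
hooks:
  ApplicationStop:
    # Bring the web server down before the new files are installed.
    - location: scripts/stop_server.sh
      timeout: 30
  BeforeInstall:
    # Make sure dependencies (here, just the web server package) are present.
    - location: scripts/install_dependencies.sh
      timeout: 300
  ApplicationStart:
    # Start the web server back up with the new files in place.
    - location: scripts/start_server.sh
      timeout: 30
  ValidateService:
    # Confirm the web server is actually serving a good response.
    - location: scripts/validate_service.sh
      timeout: 60
```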
Let me show you a couple of my simple scripts. We're looking now at my dependency script. I'm using Yum here just to make sure that a web server package is installed, but you can grow this to any configuration management tool that you're using today. You can use Chef, or Puppet, or Ansible, and hook that right into this script. Let's also take a look at my test script. I want to make sure that the web server was actually started properly before I add traffic back to it, and I want to make sure that it's actually serving pages. This is a really simple example, of course; I'm just making sure that we're getting a successful response code. You could extend this to make sure that you're getting the right content back, that you have actually copied the file that you expect to copy onto that machine. This scripts model works really well inside of Amazon because, as Rob mentioned, the teams there are very independent. As Andy said this morning, they're determining their own destiny. And so we want to make sure that they can choose the technologies that work for their application. This scripts model is very flexible and powerful: you can hook into any software that you're using, any programming language, any scripting language, really anything that you need to get your application onto the instance and up and running.
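Minimal sketches of those two scripts might look like the following; the package name and file names are assumptions for illustration.

```bash
#!/bin/bash
# install_dependencies.sh (hypothetical): make sure a web server package is installed.
yum install -y httpd
```

```bash
#!/bin/bash
# validate_service.sh (hypothetical): fail the deployment unless the local
# web server answers with an HTTP 200.
status=$(curl -s -o /dev/null -w '%{http_code}' http://localhost/)
if [ "$status" -ne 200 ]; then
  echo "Expected HTTP 200 from the local web server, got $status" >&2
  exit 1
fi
```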
The next step for deploying with CodeDeploy is going to be choosing the instances that you want to deploy to. Over here on the left-hand side, you're going to see the deployment groups. A deployment group is simply the group of servers that you want to deploy to. That can be defined by an EC2 instance tag, a tag for your on-premises servers, or simply the name of an Auto Scaling group. In this example, I'm using two Auto Scaling groups. I have my test environment with a fixed size of one, and then my production environment with a fixed size of three. And then on the right-hand side you'll see this list of revisions. What that is: in order to register a new revision of my application, you zip up all of the files that we just looked at in the repository and register that as a new revision with CodeDeploy. So, you can see I've been playing through a couple of different revisions here. So, let's go and try to kick off a deployment. A deployment can be kicked off through the AWS CodeDeploy console, through the CodeDeploy API with the AWS CLI or the SDK, or through one of our partner integrations. GitHub just happens to be one of our great partner integrations, so we're going to use that. And let me show you what that looks like. I'm now looking at the service hooks that I have configured on the repository for my website, and the first one at the top is GitHub auto-deployment. What that's going to do is configure my repository to trigger a deployment every time I make a change to that repository. So, every time I push a new commit to the repository, it's going to kick off a deployment. And then the CodeDeploy service hook is going to define what application I want to deploy to, that's going to be my website application that I've configured within CodeDeploy, and what group of servers, what deployment group, I want to deploy to. I probably don't want to automatically deploy into production, that sounds wrong, so I've configured it to deploy automatically into test when someone pushes a new commit. So, let's go back to the website, and at this point we're actually going to enable multiply and divide for our customers, because it's already available in the calculator API. So, we'll do that, and let's announce to our customers that we have a new feature for them. And then I'm going to change the font color and the background color just so it's really clear to us which version we're running. And commit those changes. So, now what GitHub is doing is taking the updated files in my repository, registering that as a new revision in CodeDeploy, and kicking off a deployment. We can go back to the CodeDeploy console, and it's already registered this as a new revision. And we can go into deployments; this is the central dashboard for all deployments across all of your applications, where you can see all of your deployment activity. I'm going to drill into this deployment in progress. This is the deployment that GitHub kicked off for my multiply and divide change, and you can see the instance right here that it's actually deploying to in that Auto Scaling group. So, let's give it just a second to get started. What I want to show you is the individual deployment that goes onto that specific instance. You might recognize some of these event names from the AppSpec file that we looked at. ApplicationStop has already brought down the web server, it has installed my dependencies, my web server package, and it has installed those files, copying my index.html to the web server content directory. So, right now what it's doing is starting up the web server, and then it will actually call my test script and make sure that it's getting a 200 successful response. So, everything's succeeded at this point. Let's refresh. Alright. So, we're looking at the test page, and I can actually do some multiplication. Very exciting. I can do divide. So, at this point we've done a little bit of smoke testing, and we feel pretty confident in getting this out to production. We want to get it in front of our customers. In order to do that, I'm going to go back to the CodeDeploy console and go to my application, the HTML website. CodeDeploy makes it really easy to go into the console and manually deploy a revision that you've already deployed somewhere else, in this case test, to a different deployment group, in this case production. So, we already know that it's deployed to test right now, so let's go and deploy this revision. And deploy. In this release process, this is obviously a manual step: someone has to decide at this point that, yes, we've done all the tests that we need to do, and go and manually promote that to production. So, that deployment is running now out to the three instances that I have.
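That same manual promotion could also be done outside the console, for example from the AWS CLI, roughly as sketched below; the application, deployment group, and repository names here are hypothetical.

```bash
# Redeploy the revision that is already running in test to the production
# deployment group (names are placeholders, not the actual demo resources).
aws deploy create-deployment \
  --application-name SimpleCalculatorWebsite \
  --deployment-group-name Production \
  --github-location repository=my-org/simple-calculator-website,commitId=<commit-sha>
```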
I'm going to let that run for a little bit, and I want to go back and talk about some of the features that Rob mentioned and give you a better visual of those. Rob talked a little bit about how we do rolling updates with Apollo for the amazon.com website. CodeDeploy has the same feature, so you can actually deploy without downtime. We talked about bringing down the web server, installing all those files, and then bringing it back up; we obviously don't want any downtime for our production site. That's probably fine for test, where we have that single instance, but we want our customers to always be able to access our website. So, here's how that's going to work. I mentioned that I have three EC2 instances behind a load balancer. What CodeDeploy is doing right now in production is taking a single instance out of that load balancer, so at this point all of the customer traffic is flowing to the two that are still behind the load balancer, still running version one of the application. On that single instance, it's now upgrading to version two. Once it's finished with that, it's going to put that instance back in the load balancer and roll on to the next one, taking it out of the load balancer. Now we have our customer traffic going to roughly 50% V2 and 50% V1, but our customers are always able to access the website. It will roll to the next one, and finally everything's back in the load balancer and everything's on version two. And for this feature, we actually have some sample scripts for you in the AWS Labs organization on GitHub that you can use in your own application, and I'm using a couple of them. Deregister from ELB is what's going to take that instance out of the load balancer, and then register with ELB is going to put it back in the load balancer and start serving traffic again. The next feature I want to talk about is our Auto Scaling integration. I've mentioned a couple of times that I'm using Auto Scaling for my test and production environments. Auto Scaling is an AWS service that helps you scale up or scale down your fleet of servers. You can scale it all the way down to one, just like I'm doing in my test environment, or you can scale it up to thousands in your production environment. Obviously, when you're scaling up your environment and adding new instances into your deployment group, you want to make sure that they have the latest application version. You don't want them to start serving customer traffic with some old version of the application that you had baked into your AMI, maybe. So, CodeDeploy is actually able to hook into the Auto Scaling lifecycle and catch that instance before it's added. CodeDeploy is going to deploy the latest version of your application that's configured, and then it's going to add the instance to the load balancer. So, as you're scaling up your fleet, you don't need to worry about your customers getting anything other than the latest version of your application with all its great features. The final feature that I want to talk about is health tracking. Rob mentioned this a little bit: we want to make sure with CodeDeploy that we're going to catch any deployment problems before they're able to get out to the entire fleet. We know that problems happen, bugs get out into production, but we want you to be protected from a bad version getting out to your entire fleet and causing a complete outage of your website. So, what's going to happen is, let's say we have a new version of our application, and for some reason that causes the web server not to give successful responses. Maybe we accidentally deleted the HTML file and now it's giving 404s. We want to make sure that that definitely does not get out to the rest of the fleet. So, based on that ValidateService script, whatever test script you define as success for your application, CodeDeploy is actually going to stop the deployment at that point.
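How aggressively CodeDeploy stops a deployment is controlled by a deployment configuration's minimum healthy hosts setting. A rough sketch of creating one from the AWS CLI, with a hypothetical name, for a fleet where at least eight instances must stay healthy:

```bash
# Hypothetical deployment configuration: with a nine-instance fleet, a single
# failed instance drops below the eight-host minimum and fails the deployment.
aws deploy create-deployment-config \
  --deployment-config-name AtLeastEightHealthyHosts \
  --minimum-healthy-hosts type=HOST_COUNT,value=8
```

The configuration is then referenced from the deployment group, or per deployment, so CodeDeploy knows when to stop rolling out.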
When a deployment is stopped like that, you have the option to either deploy a new fix as a new revision of your application out to your deployment group, or you can choose to roll back to a known-good version of your application, one where you know you'll get a successful response from the web server, and put that back in the load balancer. So, I actually have a failure set up. In this case, in CodeDeploy I have a fleet of nine instances, and I have it configured to make sure that I have a minimum of eight out of nine instances that are healthy. What that means is that if there's a single failure, it's going to stop that deployment and fail it. So, here I can see the step that failed, and I can drill directly into the logs for the failure. CodeDeploy is going to declare failure if any of the scripts that you define in the AppSpec file give an error response. In this case it was ValidateService that gave an error response. I don't have to SSH into that instance and grep through all the logs; I can actually see the last little bit of that script's output directly in the CodeDeploy console, so I can figure out what happened there. So, at this point, let's go back and check on the production deployment. Everything looks good; we have success across the board. We can go and refresh our production website. We have the new version of the application running, and we can look at the dashboard and see that across the board we now have the new version of the application running. So, now that we've gone through the release process that I've set up for my website, I want to take you through the release process for the Go web service, the calculator API. As I said before, this is a little bit more complicated. I have a build step that I need to do to compile that Go code in order to have artifacts to deploy, and then I also want to add a few more automated tests to make sure that no bugs get out to production. For the calculator web service, I have a GitHub repository very similar to the website's. It's already packaged for CodeDeploy: we have an AppSpec file and we have script files ready to install this. I just have a single Go language file that defines my web service, and then I've already created an application in CodeDeploy for this web service. So, just like for my website, I have a test environment and a production environment, and you can see some of the revisions that have been flowing through. I mentioned that I need a build step. I don't want my engineers to have to build this on their desktops and copy the artifacts somewhere; I want to make this an automated process. I have Jenkins set up on an EC2 instance - this is a very popular build server - and to integrate it with CodePipeline, I have the CodePipeline plugin installed in Jenkins. What that's going to do is: CodePipeline is going to notify Jenkins when there's new source available to compile, and then Jenkins is going to notify CodePipeline when the build is done and there are artifacts ready to move on to the next step, the deploy. And then I just have a simple build job that builds my Go code. So, now that I have the build step and the deploy step set up, I want to link all of these together in an automated release process. So, let's jump over to CodePipeline - that's not CodePipeline - there we go. This is a really simple pipeline that I've created with the pipeline wizard in the CodePipeline console. You can see I have a source step that's going to pull directly from GitHub.
Anytime there's a new change, CodePipeline is going to automatically pull that new change from my repository and move it into the build stage. In the build stage, I have the Jenkins server configured. What that's going to do, using the CodePipeline plugin, is build my Go code, get my deploy artifacts ready, and then hand them off automatically to the beta stage, which is where I have my test environment set up with CodeDeploy. So, with a developer pushing a single change into my repository, it's going to flow through the entire process right into a test deployment. Now, you'll notice that I don't have my production deployment here, and I don't have any tests. I'm not very comfortable yet deploying directly, automatically, into production, because I don't have any automated tests. I don't quite trust Rob to always check in a working change. So, let's go into the pipeline, edit it, and add some tests. This is the edit view in the CodePipeline console. This is where you can freely edit and model your entire release process. You can add new stages, and you can add new actions into each stage. So, let's edit the beta stage and add a new action. We're going to add a test, and I want to add an API test to make sure that the behavior of each of those APIs for the calculator web service is correct. In the test provider dropdown, you can see just a couple of our great partner integrations. It makes it really easy, if you're using one of these existing tools for your applications today, to start using it with CodePipeline. So, I'm going to choose Runscope, since I want to do an API test, and I'll click connect. That's going to take me right to Runscope, where I already have a test set up, so I'll create that integration and add the action. And that's all it takes to add this automated API test into my pipeline. Let's also go in and add a load test. I want to make sure that any change is going to perform properly under the higher load of production. So, let's choose BlazeMeter and add that action - connect. That's going to take me right into BlazeMeter; I already have a load test set up with BlazeMeter, and we'll add that action. You'll notice that I added these two actions side by side in this stage. I like to add them side by side because that means they're going to run in parallel. I love running tests in parallel: it means I'm going to catch any problems that happen with the change that's flowing through the pipeline, but for any successful change, I'm going to get it out to production faster and into the hands of my customers. So, now that I have a couple of tests, I feel pretty confident that they're going to catch any major errors before they get into production. So, let's add the production stage. This time we'll add a deploy action and choose CodeDeploy. I already have that application that I showed you for the Go web service, and this time we'll choose the production deployment group. So, at this point, I have modeled out my entire release process.
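In CodePipeline's JSON structure, the stages of a pipeline like this look roughly like the excerpt below. The names are hypothetical, the Source and Jenkins build stages and each action's provider-specific configuration are omitted, and the third-party provider IDs may differ from what the partners actually register; the point to notice is that the two test actions share a runOrder, which is what makes them run in parallel.

```json
{
  "name": "calculator-api-pipeline",
  "stages": [
    {
      "name": "Beta",
      "actions": [
        {
          "name": "DeployToTest",
          "actionTypeId": { "category": "Deploy", "owner": "AWS", "provider": "CodeDeploy", "version": "1" },
          "configuration": { "ApplicationName": "CalculatorAPI", "DeploymentGroupName": "Test" },
          "inputArtifacts": [ { "name": "BuildOutput" } ],
          "runOrder": 1
        },
        {
          "name": "APITest",
          "actionTypeId": { "category": "Test", "owner": "ThirdParty", "provider": "Runscope", "version": "1" },
          "runOrder": 2
        },
        {
          "name": "LoadTest",
          "actionTypeId": { "category": "Test", "owner": "ThirdParty", "provider": "BlazeMeter", "version": "1" },
          "runOrder": 2
        }
      ]
    },
    {
      "name": "Production",
      "actions": [
        {
          "name": "DeployToProduction",
          "actionTypeId": { "category": "Deploy", "owner": "AWS", "provider": "CodeDeploy", "version": "1" },
          "configuration": { "ApplicationName": "CalculatorAPI", "DeploymentGroupName": "Production" },
          "inputArtifacts": [ { "name": "BuildOutput" } ],
          "runOrder": 1
        }
      ]
    }
  ]
}
```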
I want to show you what it looks like when we're trying to catch some of the errors that Rob has added in; I'm really picking on Rob today. I have a pre-baked web service that has a failure in it, so it looks just like the one we just created, but it has a failure. One of the great things about the CodePipeline console is that you can link directly to that failure and drill in and see what's going on. So, we can scroll down to the failure, and we're not getting the behavior we expected for divide by zero. At this point, what you can do is link directly into that GitHub repo, go in, fix the change, and push that through, and then that will go from source to build, we'll get artifacts deployed into test, hopefully it will pass the API test this time, and then it will go straight into production after that. So, now that I've shown you CodeDeploy and CodePipeline, I want to show you how we can move these repositories into CodeCommit. Let's add the simple calculator website repository and copy the URL. I have this repository on my desktop already, so I'm just going to add that as a new remote and push everything up. Now that my repository has moved into CodeCommit, I can drill into the repository. This is a new feature that we just launched on Monday, code browsing in the console, so I can actually see all of my code right in the CodeCommit console. So, to wrap up, I showed you a couple of our partner integrations. We looked at GitHub, and Runscope, and BlazeMeter. This is our full list of partner integrations. They have some great end-to-end solutions, and some of them are here today, so you have the opportunity to really see how they could benefit your cloud development. If you go to the AWS DevOps kiosk in the AWS booth, you can pick up a partner passport. If you get three of the partners who are here at re:Invent to stamp it and bring it back to the DevOps kiosk, you'll get a little gift of some AWS credits. I also want to give a little bit of a plug for some of the related sessions here at re:Invent. Right after this talk you can dive deeper into CodeDeploy and learn more about automating your software deployments. Like I said, please check us out at the DevOps kiosk; you'll have a great opportunity to see how our partners can benefit you. Rob and I and some of our partners are here to answer questions; we'll be taking questions up here in the front and out in the hall. And definitely please fill out your evaluations. Thank you.
Info
Channel: Amazon Web Services
Views: 68,656
Keywords: AWS, Amazon Web Services, Cloud, cloud computing, AWS Cloud, aws-reinvent, reinvent2015, aws, cloud, amazon web services, aws cloud, DevOps, DVO202, Rob Brigham - Amazon Web Services, Clare Liguori - Amazon Web Services, Introductory (200 level), service-oriented architecture, code commit, code pipeline, cloud computing event
Id: esEFaY0FDKc
Length: 53min 34sec (3214 seconds)
Published: Thu Oct 15 2015