Deep Dive with Amazon EC2 Systems Manager [ENT401]

Captions
Good afternoon, everybody, and thank you very much for joining me for today's session, which is a deep dive on Amazon EC2 Systems Manager. My name is Maitreya Ranganath and I'm a solutions architect. I am based out of the Dallas office, and today it's my pleasure to go deep on Systems Manager with you. Just by a show of hands, how many of you are actually using or evaluating Systems Manager, or using Run Command? I'd say about 10%. So my objective today is to address that, to show you how you can use it, and my hope is that you come away from this session understanding some cases where you can use Systems Manager and understanding how it can help you.

So what can you expect from today's session? You'll get an overview of Systems Manager capabilities and components. We will walk you through some use cases of each component. I'll also break off and show some demonstrations to show you some of those components in action. And finally, we will talk about how you can bring all of those components together to realize end-to-end use cases. I have some demos, so it probably will take almost the entire hour, but I'll hang around after this to answer any questions. If we finish early, I can take a few questions.

So we've been listening to customers. We've had customers come up and speak about how cloud has become the new normal, in places like the keynote that we had yesterday. What's happening is that customers of all sizes, enterprises as well as small customers, are moving to AWS, and they're taking advantage of the benefits of AWS: increased agility, reduced costs, as well as global reach. But there's been a refrain that we've heard. Our customers are also trying to manage the systems that they deploy on AWS, and when they do that, the first step they sometimes take is to bring their traditional on-premises tools to manage those systems on AWS. They also try to do this management so as to have a single pane of glass, a single tool that they can use to manage servers both
in AWS as well as on-premises. And what happens then is that they've shared challenges with us. What they've told us is that many of those traditional tools are not really designed to handle the scalability that comes with AWS, or the dynamic nature of the environment in AWS. When systems can come up and go down within a matter of minutes, many of these tools really can't keep up with that pace of change. What happens then is that you have to create two different systems: a set of tools, perhaps, that work well with AWS, and a different set of tools that work with on-premises. When you do that, you don't get a single pane of glass; you don't get visibility across all your resources. What also happens is that when you try to integrate these kinds of tools together, you end up with significant complexity. And finally, many of these on-premises tools have licensing costs, and you kind of have to pay by managed server, and that can quickly become difficult when you try to manage servers that don't exist for very long periods of time, ephemeral servers. So in summary, customers have shared with us that managing cloud and hybrid environments using traditional tool sets is complex and costly.

So we heard that feedback loud and clear, and then we said, what can we do about that? EC2 Systems Manager came out as a result of those discussions. In summary, EC2 Systems Manager is a set of capabilities; you'll see it comes in seven capabilities today, and we'll dive deep on each of those. It allows you to do ongoing operations and management of systems at scale. This addresses the responsibility area of managing operating systems and above. If you look at the shared responsibility model that AWS has, if you are using a service like EC2, that is your responsibility. So Systems Manager is a tool that can help you deliver on that responsibility. It helps you automate all the actions that you need to do for that. It works across both Linux and Windows Server, so it's
cross-platform, and it can also work across both servers in EC2, EC2 instances, as well as servers that you have on-premises, so it's hybrid in nature. The name is EC2 Systems Manager, but it's not just EC2; it can be both on-premises as well as EC2. And finally, to address that cost equation, it doesn't cost anything to use the facilities we are going to talk about. Systems Manager is offered to you at no additional cost; you pay for the resources you use. So for example, for EC2 instances, you pay for those. You might pay a little bit for storage of the output, and you might pay for things like spinning up servers when you want to take images, and we will talk about that. But the services themselves come to you at no additional cost, both on-premises as well as in AWS.

So why should you care about Systems Manager? It really comes down to the benefits it provides you. Number one, it supports hybrid architectures: you can use Systems Manager as a uniform way to manage servers both in AWS as well as in on-premises data centers. It's cross-platform: Windows and Linux, several flavors of Linux and several versions of Windows. It's inherently scalable: it can handle tens or thousands of servers without any additional complexity that you need to introduce. It's inherently secure: like every other AWS service, Systems Manager integrates with tools like CloudTrail and with Identity and Access Management, so the same familiar mechanisms you use to manage access to resources in AWS can be used to manage how actions can be done in Systems Manager. Systems Manager makes it really easy for you to write your own automation and extend that to do additional actions, and we'll see examples of that. And finally, a combination of all of that results in lower total cost of ownership. You don't have to pay for licenses, and it also reduces the amount of manual work that you need to do. So anything that you can automate, that's effort that you can save on and
you can focus on higher-order functions, on what actually makes your business unique. These we consider undifferentiated heavy lifting; you can kind of elevate yourself and do something which is unique to your business.

So let's take a look at the Systems Manager components and use cases. The core of Systems Manager is the Systems Manager agent. This is a piece of software that you install on the operating system of the servers that you want to manage. It supports a bunch of platforms: several flavors of Linux, including Red Hat, CentOS, and SUSE Linux, and several versions of Windows. In fact, if you are running on EC2 and you have launched a Windows AMI dated after about mid-2016, the Systems Manager agent is already included in the AMI that you've launched, so you are already enabled to be managed by Systems Manager if you're using Windows on EC2. If you're using Linux, you have to download and install the agent. It's a few steps to get it from an S3 URL, expand it, and run that command, and you can script that as part of your user data bootstrap script so every instance that you launch will be ready to be managed by Systems Manager. The agent itself is open source; we make the source available on GitHub, so you can examine it to see what it's doing under the hood. Once you have an agent installed on a server, be that on EC2 or on-premises, you are now able to take advantage of the capabilities which are on the next slide.

So there are seven capabilities, and I won't read them all right now, but let's see how they hang together, and that's the next diagram. The core of Systems Manager is a concept called documents. Documents are JSON files that you can author which contain a list of commands or actions that you want Systems Manager to do for you. What you can then do is deliver that document to a service. So in this example, let's say you go towards the right and you deliver the document to a service or a facility called Run Command, and Run
Command will execute that document on the entire suite of servers that you define. There are also services like State Manager, Patch Manager, and Automation that can take documents of different types and execute them; we'll go deeper into each of these. And there are three other services as well: Parameter Store, which lets you store parameters; Inventory, which lets you get a full inventory of your software; and Maintenance Window, which, as the name suggests, lets you define maintenance windows and execute them on your schedule.

So you might be thinking, what's a document, really? Let's look at an example. A document is a JSON document, like we said. This is an example of a document which you can deliver and have executed. It has a schema, so it has a schema version; we have updated the schema a couple of times, so it's schema version 2.0 right now. It has a description, and this document takes some input parameters. In this example, this is a document that installs a particular Windows feature, so the parameter represents the name of the feature that we want to be installed. You can have a Windows feature like IIS or .NET Framework, and this document will install that on all the servers you specify. Down below are the main steps. This is one or more steps that you want the document to execute, and here we are taking the example of running a PowerShell script. This is a Windows PowerShell cmdlet that installs the Windows feature that we've named with the feature parameter. You can see an example here of parameters being expanded, because those double brackets represent the value of the feature that you're going to provide when you run this command. So what we can do now is take this document, deliver it to one of those features we saw on the previous page, and have it executed.

The first feature I want to dive deeper on is Run Command. This is the core service; we launched it actually a little over two years ago. The concept with this is that it lets
you securely and remotely manage your servers at scale, both in EC2 as well as in on-premises environments. You represent the actions that you want Run Command to do in that document, in the format we just saw, and then you decide on how many servers, or which servers, this particular document should be run. Run Command will execute that; the Systems Manager agent will run those commands; and the output and the success or failure state come back to Run Command. You can see the results in the service console or through the CLI or an API call. The output of those commands is also stored: it can be sent to S3 or viewed directly in the console. Run Command also has features to control the rate. So if you have a fleet of, say, tens of servers, you can say that Run Command will run at most two of those commands in parallel, so it spreads that workload over time. It also has error handling, so you can define how many errors should happen before Run Command decides that that particular command has failed.

It's AWS native. What that really means is that it's integrated with several other AWS services. CloudTrail is a great example, which lets you know which commands were run and who ran them, so you can have a very clear audit of all the commands that were executed on your fleet. It is also integrated with IAM, so you can control access using Identity and Access Management policies to decide who is allowed to run a command and under what conditions. So you can scope the permissions down.

So what are some of the use cases that people are using Run Command for? The first one that jumps out is that because you now have the ability to remotely execute commands without having to SSH or RDP into servers, you can now turn those services off. So rather than having to have a whole fleet of managed users on each server, and having to worry about who has access to what, and I'm
going to have to rotate their keys and things like that, you can eliminate that entire class of problem and say to those users that they can now use Run Command to run those commands instead of having to log into boxes. Now, because Run Command has documents, that's the curated set of actions that you want to allow your users to do. They can't deviate from that; all they can do is run that. So they can't actually log into a box, surf to some site, and download some malicious code, because they don't have the ability to do that, which is a nice thing to have. Again, you can run arbitrary Bash and PowerShell scripts, and you can use that for various different use cases; it's really up to your imagination to see how you can use Run Command to do your work.

Some of the key use cases that we've seen customers use it for are operating system changes. A very common use case is automated domain join. If you use the EC2 console, there's a drop-down in the wizard that lets you automatically join a domain. Behind the scenes, what's really happening is that a Run Command document is being executed to join your Windows server to a Directory Service domain. You can manage application changes, and you can also integrate Run Command with configuration management tools like Ansible, Salt, or PowerShell DSC, so you can use that to trigger actions to keep your configuration up to date.

Here's a blog post link, as well as a diagram, that shows you how to replace a bastion host. This is the bastion host, or jump box, that traditionally we've had to maintain because users want to go to servers and administer them. What we can do, by taking all those actions and making curated lists of documents, is eliminate that entire set of bastion hosts. That's an entire infrastructure that we don't need to maintain and don't need to worry about securing, because Run Command gives you the curated set of commands as well as very good auditing in terms of who
executed what, when. So you don't have to have the set of bastion hosts, and the other thing is, since you do not allow SSH in, that's another vector that you eliminate from consideration.

What I'm going to show you now is a walkthrough, so I'm going to switch to a demo and show you how Run Command works. What you're seeing here is that I've logged into my AWS account and I'm in the EC2 dashboard. Where you find Systems Manager and all its related services and capabilities is on the left side. It comes in two parts: you see Systems Manager Services and Systems Manager Shared Resources. What I'm going to do now is click on Managed Instances, and it brings up a list of servers that have the SSM agent installed. These are servers where I've either installed the SSM agent, such as the Linux servers, or enabled the SSM agent on Windows. In order to let the SSM agent talk to the Systems Manager service, I need to give those instances an IAM role which lets it do that. There's a predefined policy document that you can associate with that IAM role which lets the server communicate with the SSM APIs and do things like register and appear online. Scrolling down, I also have an Ubuntu virtual machine that's running on my laptop, which kind of represents the on-premises use case. What you can see here is that it looks identical to all the other instances, except that its instance ID is mi-something rather than i-something. So from the perspective of Systems Manager, an on-premises server is the same as an EC2 instance. What I can also see here is the agent version, so we can keep track of agent versions. So that is Managed Instances; these are servers on which I can now run Systems Manager commands.

I'm going to show you documents now. These are the documents, and if you just go into Documents, you'll see that there's a long list of documents that AWS provides. These are documents that we've created for common
use cases. For example, in order to be able to patch the SSM agent, which is a common task, we have a document here called update SSM agent. This is a document that's predefined; all you have to do is run it if you want to run this particular command. Now, you can also author your own document, so I'm going to show you an example of that: a really simple document which lists the open ports. Essentially it runs a netstat for me. If I look at the content of the document, you can see the JSON. I'm going to scroll down to show you that it's very simple. It has the basic preamble here and the main steps; it doesn't take any parameters. What it does is, we have two steps here. The first one is a PowerShell script which runs netstat -a, and I've scoped that down with a precondition to say that this should run only for Windows. I have a second command, called run shell script, scoped down to run on Linux servers, to run netstat with a slightly different set of parameters. So this is a multi-platform, or cross-platform, document that I can run on a fleet of servers.

So let's go and try to run that and see what happens. I go to Run Command and click on Run a command, which brings me to this view that lets me select the document. I'll scope it down to look for documents owned by me. This is the same document we talked about earlier, list open ports. Now I need to choose the set of servers on which I want to apply this document. I can choose the servers in two ways: I can get a full list of managed servers and select them one by one, or I can choose them by a tag. Choosing by a tag is really powerful, because now I am not constraining it to a set of individually identified servers; it's any server that happens to have that tag when I'm running the command, and this will come in useful when I have scheduled commands later. For now, let me just go and select all of the servers that I have in my list. I can set a couple of other parameters, like how
many servers I need to run in parallel, so let's say I put that as two. I want to stop after one error. And I can, in Advanced Options, write the output to S3, but I'm not interested in that right now; I just want to look at the output. What I get back is a command ID, and I can view the result. What I see here is the result of running that command on all the servers that I have in my list. Here's an example. What I'm going to do is refresh that and check what happened to the run of this particular command on this particular server. It conflicted with another patching system, so that's what happened here, but that's fine; we can find the output of that here. So this is an example of a run. I can look at the output, and this is an example of a Windows server, so it skipped the Linux step. If I go back to the Windows server, I can see the output here, and this is the step and the output from the Windows command. So arbitrary commands can be run, and I can capture the output; I can see it in the console, or I can pipe those outputs to S3 where I can look at them further. At its basic level, what we've done here is use Run Command to select a command to run, set the number of servers on which I want it to be run, and collect the results back.

All right, let's switch back to the presentation now and go through some other components. The other component I want to talk to you about is State Manager. We saw an example of Run Command, which is: run this particular command right now. What State Manager does is let you do that on a repeatable basis. The idea with State Manager is that with the document you can define the rules and the commands that you want to run on a periodic basis, and with those repeated commands you're going to be able to deal with configuration drift. You can make sure that your instances and servers are in a defined state based on the rules that you define in that State Manager document.
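
Editorial aside on the Run Command walkthrough above: the following is a minimal sketch, not the presenter's actual document, of what a cross-platform "list open ports" document and the matching Run Command request could look like. Several things here are assumptions: the document name `ListOpenPorts`, the step names, the Linux `netstat -tln` flags, the placeholder instance IDs, and schema version 2.2 (the talk mentions 2.0; 2.2 is the version I know to support `precondition`). The request dictionary uses parameter names from the boto3 `send_command` call, but the sketch only builds the data and does not contact AWS.

```python
import json


def build_list_ports_document():
    """Build a two-step command document, one step per platform."""
    return {
        "schemaVersion": "2.2",  # assumption: the talk says 2.0
        "description": "List open ports on Windows and Linux servers.",
        "mainSteps": [
            {
                # Runs only where the agent reports a Windows platform.
                "action": "aws:runPowerShellScript",
                "name": "listPortsWindows",  # hypothetical step name
                "precondition": {"StringEquals": ["platformType", "Windows"]},
                "inputs": {"runCommand": ["netstat -a"]},
            },
            {
                # Runs only where the agent reports a Linux platform.
                "action": "aws:runShellScript",
                "name": "listPortsLinux",  # hypothetical step name
                "precondition": {"StringEquals": ["platformType", "Linux"]},
                "inputs": {"runCommand": ["netstat -tln"]},  # assumed flags
            },
        ],
    }


def build_send_command_request(document_name, instance_ids):
    """Build keyword arguments in the shape boto3's send_command accepts."""
    return {
        "DocumentName": document_name,
        "InstanceIds": list(instance_ids),
        "MaxConcurrency": "2",  # at most two servers in parallel, as in the demo
        "MaxErrors": "1",  # error threshold entered in the demo; check the API
                           # documentation for the exact stop semantics
    }


if __name__ == "__main__":
    print(json.dumps(build_list_ports_document(), indent=2))
    request = build_send_command_request(
        "ListOpenPorts",  # hypothetical document name
        ["i-0123456789abcdef0", "mi-0123456789abcdef0"],  # placeholder IDs
    )
    # With boto3 installed and credentials configured, this could be sent as:
    #   boto3.client("ssm").send_command(**request)
    print(json.dumps(request, indent=2))
```

Keeping the request as plain data makes it easy to review the rate and error settings before anything is sent to a fleet.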
So instead of having to say list ports, let's say I require an action to update the SSM agent every 30 seconds, every 30 minutes rather. That's an action that I could deliver to State Manager and have it execute every 30 minutes. The second feature here is Inventory, and the role of Inventory is to collect a software inventory of all my servers and present that in a queryable fashion. So I'm able to query Inventory to know what software versions are on my instances, and I can use that information to find out if I have any server with a particular vulnerability, for example.

So let's look at some use cases. State Manager is all about keeping a consistent configuration, and the second thing is making sure that configuration drift is avoided. An example of that: let's say that I have a large Auto Scaling group of instances. Those instances come and go, but they happen to be tagged with the Auto Scaling group ID; that's done automatically. I can set up a State Manager action that says: run this on all the servers that belong to this Auto Scaling group and perform an action, for example, make sure that the SSM agent is updated. State Manager will now execute that on the schedule and remove that configuration drift.

What are some of the use cases of Inventory? Obviously you can discover and audit your software. You're getting detailed information about all the software versions and packages that are installed, including Windows hotfixes, and you can use that information for security and incident analysis. Let's say that there's a high-priority CVE that comes out and you want to quickly tell, am I running any of those impacted, vulnerable versions of software? By going to Inventory, you can type a query and quickly answer that. For Inventory we also recently announced a new feature called Resource Data Sync. The idea here is that rather than having all that data just in Inventory, you can also enable that data to be dumped into S3, and
in a certain format which lets it be queried easily with tools like Amazon Athena, as well as visualized in a tool like Amazon QuickSight. So here's an example of a QuickSight dashboard that queries Inventory and shows that inventory graphically. I can see here the count of instance IDs by publisher, how many servers are from different publishers, and just below that it shows you a kind of heat map of the versions of software that are installed. So you can slice and dice the data that is delivered by Inventory in any way that you want. Certainly you can use our console to query, but you can now bring that into Athena and QuickSight to do any further analysis that you want. This is really, really useful if you have hundreds of thousands of servers and you want to slice and dice them to find out what you need to do in operational work.

Let's do a walkthrough now of State Manager and Inventory. I'm going to go back to my console now. You're not seeing that yet... okay, there we go. So I'm going to look at State Manager, and this shows me that there are a few associations that are already present. I'm going to walk you through how you would create such an association, so let's click on Create Association. It looks very similar to Run Command. The first thing I need to do, back to the core concept, is choose a document. Let's take the example I was talking about earlier: let's say I want to make sure that the SSM agent is updated on all the servers and kept up to date. We've created a document just for that, which is called update SSM agent. I choose that predefined document, and I note that it can be run on both Windows and Linux, so it's a cross-platform document. Now I need to choose instances by tag, so I'm going to choose instances that are tagged with environment equals production.
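
Editorial aside: the association being set up here (a predefined agent-update document, targets chosen by tag, repeated every 30 minutes as in the State Manager example earlier) can be sketched as a request in the shape boto3's `create_association` accepts. This is a hedged illustration rather than what the console does internally. `AWS-UpdateSSMAgent` is my reading of the "update SSM agent" document named in the talk, and the tag key `Environment` with value `Production` assumes a particular capitalization. The sketch only builds the request and does not call AWS.

```python
def build_association_request(document_name, tag_key, tag_value, every_minutes):
    """Build keyword arguments for a tag-targeted, rate-scheduled association."""
    if every_minutes <= 0:
        raise ValueError("every_minutes must be positive")
    unit = "minute" if every_minutes == 1 else "minutes"
    return {
        "Name": document_name,
        # Tag targeting: any instance carrying this tag when the schedule
        # fires is included, so instances that come and go are still covered.
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "ScheduleExpression": f"rate({every_minutes} {unit})",
    }


if __name__ == "__main__":
    request = build_association_request(
        "AWS-UpdateSSMAgent",  # assumed name of the predefined document
        "Environment",         # assumed tag key capitalization
        "Production",          # assumed tag value capitalization
        30,
    )
    # With boto3 installed and credentials configured, this could be sent as:
    #   boto3.client("ssm").create_association(**request)
    print(request)
```

Targeting by tag rather than by instance ID is what lets the same association keep applying as instances are launched and terminated.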
And after that, I need to set the schedule. Here let's say I want this to be run every 30 minutes; I set that as my schedule and I create the association. What happens from now on is that any instance that is tagged with environment production will get this action every 30 minutes, and you can see the result of applying that in the status here. It takes a few seconds to actually get scheduled and do that. What I'm going to do is switch back and look at another one of these that I had created earlier and see what has happened. You can see here that the association was run at the top of the hour, two o'clock, and it was successful on all these instances. So these were the three production instances, and I can click on an instance ID, switch to the association side, and see the output of that. Again, it says that one plugin, which is the update SSM agent plugin, was present, and one was successful. This means that that particular update operation either was not necessary or the package was updated, and the SSM agent version is now at the latest. So that is State Manager in a nutshell. The use cases really depend upon the document that you want to run. Your document in this case was updating software; it could also be things like checking for a particular file in a particular directory and undoing any changes that could happen. So think of it as something that continuously runs and undoes any drift that could happen in configuration.

Let's look at Inventory. So back to Managed Instances, and here's a wizard way of setting up inventory. What happens really is that inventory is another type of SSM document which is scheduled to run. So what really happens when I use the wizard view is that the document is chosen to be the gather software inventory document, and I can now choose which instances to run it on. Again, let's go with environment equals, let's say in this case, development. What sort of schedule do I want that inventory to be
gathered on? For development servers, where a lot of change happens, I might want to do this every 30 minutes. For production servers, where I expect less change, I might want to do this every day or every week, or on a different schedule depending on what you want. These are the parameters in terms of what type of inventory I want to collect. I want to make sure that I collect information about applications, AWS components that AWS installs, network configuration, Windows updates for Windows servers, detailed information about the instance, and things like custom inventory. So you also have the ability to push custom information about your servers and have that tracked by Inventory. When I click on Setup Inventory, what really happens is that there is a State Manager association that now schedules that gather inventory job to happen on the schedule that I set.

The end result of all of that is that when I click on the Inventory tab here in the console for Managed Instances, I can see the inventory by filtering on application. For example, let's say that I'm interested in which apps are installed on this particular Red Hat server. This is a full list of all the apps. I can see the app, the version, the publisher, when it was installed, architecture, URLs, and things like that. That metadata is what Inventory collects on my behalf. I can also filter down and say, okay, let's say I'm interested in finding out what version of OpenSSL I have. That shows me the versions of OpenSSL that I have, so 1.0.1e is what's installed on this particular Red Hat server. What I'm doing here is querying the inventory by selecting a particular instance. That's great, but I can also unclick this and go back and query the inventory on an entire fleet basis, and this is what I'm going to do now. Let's say I want to answer the question: which are all the servers that have .NET
Framework installed? That's a question we can answer here. I'm saying: what are the application names that begin with that? It shows me that there's one server, which happens to have Visual Studio and is probably a dev box, that has .NET Framework there. What if we have a rule in our enterprise that says we should not have any system with .NET Framework less than 4.6, as an example? We can add another filter here that says application version less than 4.6, and it's the same server, so this is the one that we need to look at a little more closely. When I come back here, I can do the same kind of filtering, and I can see that this particular server happened to have both 4.5 as well as 4.6. Not a big concern, since we are probably using this for development, but it is something that we can now take action on if it was of interest to us.

Another thing I want to show you; I'll clear out these filters. What we've done so far is query the inventory information as it is right now. That's useful, but what if I want to know about the historical view? Let's go back to the Red Hat server and take a look on the right side: it links directly to a service called AWS Config. If you use AWS Config for other AWS resources, you know that it's a service that tracks changes in your resources over time. So now software inventory also becomes a first-class citizen of AWS Config in terms of how it's tracked. I'm going to click on the timeline button and let the AWS Config UI come up. What we are looking at here is the inventory changes over time. You can see here that I launched this server on the 23rd, I made a few changes over time, and then this is the inventory view, with the timeline view on the left. Let me click on something here, which is five changes that happened on the 24th. What were they? You can see here that I had installed the Apache set of programs, so there were a couple of
packages that got installed, httpd-tools and apr-util, and you can see here that on that particular day it went from nothing to that, which means that a new package was installed. I can also see patching happening, so you have changes where a particular version changes. Let's take an example here: you can see that in this case the SSM agent was patched by my scheduled scripts to go from 847 to 879. So this gives you a great timeline view and can help you answer the question of not just what's the version now, but what was it three weeks ago, and what were the changes that happened, because maybe something has happened and I'm troubleshooting something. So it's a great way for you to do that. That was, in summary, a look at both State Manager and Inventory.

So back to the presentation. We looked at how State Manager can be used to schedule jobs to keep your configuration drift in check, and how Inventory can be used to collect information about all your servers and present that in a queryable fashion. Two more features: there's Maintenance Window and Patch Manager. We all have situations where we need to make changes to systems, but we want to do this at periods of time when the disruption is allowed, for example low-traffic periods, because maybe we are patching systems and they need to be rebooted. Traditionally we've had to define these as maintenance windows; we have runbooks, we have run plans, and someone is going to wake up at night and do that work. Maintenance Window really lets you define those rules in the service itself and schedule those tasks to happen at the time that you define, so potentially disruptive actions can be scheduled to happen on the schedule that you define. And Patch Manager is a service that lets you patch the operating system of your servers based on rules that you define as well. We will dive deeper into both of those.

So, some of the use cases. With Maintenance Window, obviously, you can use that to
automatically perform actions during the periods that you define. It has an inherent concept of priority and a list of tasks, so you can define high-priority tasks to be executed first and then low-priority tasks. You can define information like the duration of the window and how long before the window closes it should stop executing more tasks, so you can incorporate a lot of those rules within the logic that you define with Maintenance Window. Patch Manager, obviously, is used for patching and managing your operating system versions. You use it to manage a core concept called a patch baseline. A patch baseline is a set of rules about which patches supplied by the operating system vendor are approved to be applied to servers automatically. You can select those patches by severity, you can select those patches by classification, and you can define a period of time that the patch baseline should wait before applying a patch to the operating system. An example of that would be: when Microsoft issues critical updates for Windows, wait for seven days before applying them to my production servers, but wait for zero days before applying them to my dev servers. That's an example of a patch baseline. I'll show you some examples of patch baselines, and you can always customize those as well. Once you have applied those patches, Patch Manager also lets you look at patch compliance. This helps you answer the question: once I have defined the rules, how are my servers doing as regards those rules? Have those patches been applied? Which patches are supposed to be applied but are missing, and which patches should have been applied but failed to apply? Patch compliance lets you query that and see it on the console itself. Let's look at Patch Manager and go back to the demo now. Patch Manager is found here. I'm going to start by looking at patch baselines. What we've defined here are AWS-provided patch baselines, so these
represent common logic. The logic I talked about in terms of taking Windows updates seven days later is represented by this predefined patch baseline called the default patch baseline. This default patch baseline applies to all Windows servers if you don't do anything else, so this is the default for Windows. I'm going to click on the approval rules, and you can see here it says: take all critical and security updates as long as their severity is critical or important (this severity comes from the operating system vendor), and after that wait for seven days before applying them onto the servers. So this sets up that rule, and we can now have this patching execute in a maintenance window if we want. Let's take a look at a similar rule for Red Hat, which is another one of those pre-built rules. It looks very similar: it says take any security or bug fix as long as severity is critical or important, and again apply it after seven days. So these are examples of rules that are pre-built. You could go with these rules, or you can of course create your own rule. Here's an example of a rule I created for Windows development, which says I want to take all security updates and all critical updates no matter what the severity, and I don't want to wait any time before approving. This gives me much more of a fast-follow approach; the idea here is that I can test these patches in dev before deciding that they get approved to be applied in prod. Rules also have the ability to whitelist and blacklist patches. If there are patches that match these rules but you want to blacklist them so they're never installed, because maybe they conflict with something else that you're using, you can blacklist those and they'll never be installed. Those rules are represented in patch exceptions. I have not defined any right now, but you can define both approved patches and rejected patches, which are never going to be installed even if they are approved by
the rules. Once we define a patch baseline, we now need to schedule the patching, and that can be done by running a maintenance window. Let's go back and look at maintenance windows and see how we set that up. When I create a maintenance window I'm required to provide a name here. I'm going to choose unregistered targets, so any target that matches the criteria defined below will be a candidate for the maintenance window. I'm now required to choose a schedule. I can choose the schedule to be every 30 minutes, or I can say Sunday mornings at 1:00 a.m., as an example. Let me go ahead and do that. So this is Sunday morning, 1:00 a.m. UTC, every week. The duration of the patch maintenance window is 4 hours, and I want it to stop executing things at the third hour, so one hour before the maintenance window closes. That's all I need to do to create a maintenance window: just define the schedule and the properties about it. Once I've defined that, I need to tell Patch Manager what exactly to do, and I do that by defining the tasks. There's one task that we have here which is called apply, or run, patch baseline. The idea is that the moment I run this predefined document, the patch baselines that are defined for those operating systems are executed, those rules are evaluated, and the operating system is patched to the level that is approved by the rules. So this predefined document does that, and this is what I want to be done. I choose that, I say task priority is 1, and I choose the option for selecting registered target groups. I have not done that yet, but let's say that I had a set of target groups; I could also choose specific instances and have that run here. Based on the document, I can also choose scan, which simply scans for patches that should be applied, or I can choose install, which actually installs those patches. Then I can register that task, and I need to choose the role. This is a role
that Patch Manager uses on my behalf to actually run those patching commands. There is a maintenance window role that I have, and I choose to execute on two at a time in parallel and stop after one error. These are the same kind of semantics as State Manager. Then I register the task. I think I skipped a step, so let me go back and define the targets. You define which servers it should run on, and I do that by using tags, so this runs on development servers. I've set the task here to run the patching task, and I'm done. Once I've registered the task, the maintenance window is going to wake up on Sunday at 1:00 a.m. and start executing the tasks I defined, which is Run Patch Baseline, and Run Patch Baseline is going to discover the rules that I've set up in the patch baseline for that particular operating system and start executing those patching operations. I can come back to managed instances and take a look at the patching actions that have taken place. Let's go back to the Windows server here. There's a Patch tab that shows me the patches that have been installed. These are 45 patches; some of those were installed ahead of time, as they were already part of the AMI, and some of them were installed by Patch Manager based on the severity. If I scope down to severity, I can see here that there was this July roll-up update, which was installed on July 22nd, and this was because it was selected by the rules that I had defined. If I do the same thing with Red Hat, I can see again that there were two critical patches that Patch Manager decided needed to be installed, and those were installed. The status of installed here says that these patches were installed on the server. There's another state called missing, which means that those patches are not present on the server. So this is a server-by-server view. If I want to look at compliance
across my fleet, I can come to patch compliance, go to instances here, choose all the instances that I have, and see how they stack up in terms of the rules that I have defined. I'm in a good place here: all my five instances are up to date, there are no missing updates, and there are no servers in an error state. What that means is that Patch Manager is running and all the instances have been patched up to the level that I want, in terms of the rules that I have defined. I can click on these instances and see the output of all of those actions to see which patching actions were completed. So to summarize, what we've done here is use Patch Manager to define patch baselines, and we've also used Maintenance Window to execute those patches on a schedule that we defined, in this case once a week on Sunday mornings, but we could have chosen a much more frequent execution if we wanted to. All right, so we switch back now to the presentation and talk about one more feature, which is the final capability that comes with Systems Manager; actually, two more capabilities. So far we've talked about actions and changes that are happening to running instances: we ran Run Command on a set of running instances. Automation takes that problem to the next step, which is updating and managing Amazon Machine Images. We can use Automation to make sure that Amazon Machine Images are up to date from a patching perspective as well as from a software installation perspective. A common task we see customers do is to bake a common set of security or monitoring agents into their AMIs, but over time those agent versions change, and when you also want to patch the operating system you can combine both of those actions in Automation. The way Automation works is that you define all the actions that you want to take, starting from an AMI, launching an EC2 instance, performing actions on that EC2 instance, and then finally taking an image of that EC2 instance. That entire sequence can
be represented in one Automation document and executed periodically if you want, or on demand if you want. Once you have this capability, what are some of the use cases? Obviously you can use it to maintain and update your Amazon Machine Images. You can also include additional applications, so your own custom code, your monitoring agents, and your security agents can be bundled into your AMIs automatically. Instead of manually executing Automation, you can integrate it with your CI/CD pipeline. Imagine a use case where you have your application being built by your CI/CD pipeline, and the output of that is a binary version of your latest application code. Your requirement is to burn that into an AMI. The final step in your build process could trigger the Automation flow, which picks up that latest package, installs it on an EC2 instance, and takes an image of that, and now you have an Amazon Machine Image with the latest version of your code. You can use that to deploy to additional servers, or you can update your Auto Scaling group to deploy it when new servers are needed, as an example. One of the ways you can extend Automation is through AWS Lambda. Automation steps can also have a call-out to a Lambda function where you can do arbitrary actions. Let's say that at the end of our process we want to take our Amazon Machine Image, update all the Auto Scaling groups associated with it, and deploy it in such a way that a rolling update happens. That's an action that you could script in a Lambda function as one of the steps in your Automation workflow. The last feature is Parameter Store, and this is a service that can be used across all the components we just talked about as well as with other services. Parameter Store, at its base, solves the problem of securely storing and sharing information, and that could be secrets, across your fleet of servers. So rather than having things like database passwords or passwords to
other systems embedded in configuration files, where they can be lost or inadvertently committed to public repositories, what you can do instead is use Parameter Store to store those parameters. Parameters can be simple key-value pairs, that is key and string pairs, or encrypted strings, and encrypted strings are encrypted using Key Management Service. If you're familiar with KMS, or Key Management Service, it's a service that lets you store keys and use them to encrypt and decrypt your secrets, and Parameter Store uses that under the hood to encrypt and decrypt those encrypted string parameters. You can reference those parameters in documents: you can use a kind of parameter markup to look up SSM parameters and have the value be retrieved from Parameter Store and used as a value in any of your documents. You can go further and integrate with other services. Certain other services, like ECS or AWS Lambda, have native integration with Parameter Store, and they can pull parameters and read them into the function if you want. If you are just writing a script, you can also do a direct GetParameter API call to get a parameter, decrypt it, and use it. So this solves the problem of storing secrets: you don't want to store your secrets in a place where they can be lost, so instead store them in Parameter Store and use the API to get at them. The beauty of that is that every API call again goes to CloudTrail, so you know exactly who made a parameter request, who was accessing a parameter, and at what time. You also have the ability to scope that down, because every call is going to be checked against IAM policies: you can decide that these users, or these EC2 instances with IAM roles for EC2, have access to those parameters. A very classic use case is if you're not using Directory Service but you want to join directly to Active Directory. In order to do a domain join you need the domain admin password, which is a
sensitive piece of information. It's all-powerful and you want to secure it, so Parameter Store is a great place to put that parameter. You can pull the value of the parameter on the fly, use it to join the domain, and forget it, so it won't be persisted anywhere, and you have a full audit of who accessed that parameter and what it was used for. Here's an example of using Parameter Store with another service called CodeDeploy. On the left here we have that example of a database password, which is ABCD in this example, stored in Parameter Store, and we have a fleet of EC2 instances to which code is being deployed using AWS CodeDeploy. What we have in the deployment is an AppSpec file, which defines pre and post steps. One of the steps in that file is to do a GetParameter call, decrypt the value of that parameter, get the password ABCD, use it to configure my connection string, and forget it from then on. The way authentication happens is through an EC2 role with specific permissions, so we are giving it permissions to do just those two particular calls, GetParameter and Decrypt, because we need to decrypt the encrypted parameter. So you can define exactly who is able to get that parameter, and they have to have the correct IAM credentials to be able to do that. We've talked about seven features, or seven capabilities, of Systems Manager today. Let's look at how it can be integrated with other AWS services; these are some examples. The first service is CloudWatch Events. Many of the capabilities we talked about, like Run Command and Parameter Store, can be sources of events. If you've used CloudWatch Events, it's a service that lets you take events from different sources (an example would be EC2 events, systems going up, systems going down) and wire those events up to targets where you can react to them. An example of a target would be a Lambda
function that is triggered upon the launch of an EC2 instance and does something to make that EC2 instance ready for use. You can use the same philosophy with events coming from Systems Manager. An example of that would be: you take an event source, let's say the fact that a Run Command has completed, and you can tie that event to something like a Lambda function to do some work. So that's the Systems Manager as a source idea. Systems Manager can also be a target, so right now you can run commands as a result of events coming into CloudWatch Events. Let's take that same situation: an EC2 event happened and you want to react to it by running a Run Command. You can set that up directly in CloudWatch Events and have that happen. Let's take a look at a concrete example. Here's an integration with Lambda. What I am doing here is setting up a CloudWatch Events rule to trigger a Lambda function. Let's walk through a little bit of this Lambda function. It retrieves the event parameter, it lists the output of the command that this event pertains to (so it basically calls the Run Command list-output command), and then it writes the output right into CloudWatch Logs. The philosophy here is that rather than having to go through each command in the console, I want to look at all the output in CloudWatch Logs. The way you set that up is that you create a rule which defines the source event, which is Systems Manager Run Command, and then you set up the statuses on which it should be triggered. In this case it's a status change notification, which means the Run Command status has changed from, say, pending to completed or pending to failed. Then we set up the target, which is the function that we define, in this case SSM email output, but you could also have it as CloudWatch output. That wires it all up so that every time the Run Command state changes, that function is going to be invoked, and the end result looks like
this: you have all the output going to CloudWatch Logs. This centralizes all the output from all the Run Commands that have taken place in one place that I can now query, and I can have additional things: for example, I can set up metrics to alert me when many Run Commands are failing. If the number of failures exceeds, say, five, I can have an alert and take a look at that. So that's an example of taking output from Systems Manager Run Command, passing it through CloudWatch Events, sending it to Lambda, and finally having it end up in CloudWatch Logs. Here's another example. A common case is that we want to scan our environments for common vulnerabilities and best practices. Amazon Inspector is a service that does that, and the way it works is that you run Amazon Inspector on a fleet of servers and it produces findings. Those findings are things like: I discovered a vulnerable version of a software package, and this is the CVE, or Common Vulnerabilities and Exposures, ID for that particular issue. Those findings can be delivered to SNS topics, and you can have a Lambda function triggered by that which, in a closed loop, goes back and runs Run Command or runs a patch to go and patch the particular vulnerability that was found. So instead of having to work with vulnerability software that produces lots and lots of PDF documents, and having lots of meetings to decide what to do about it, you could script the entire process and have that closed loop where a vulnerability is found and patched almost immediately. This can be used for operating systems, for application patching, and for other best-practice findings that Amazon Inspector surfaces. We've been working really hard on Systems Manager, so this is a list of recent launches. The Systems Manager agent announced support for SUSE Linux in the past few weeks. We have hierarchy, tagging, and notification support for Parameter Store, so what that lets you do is
basically organize your parameters in a tree-based structure. Rather than having a flat namespace, you have a hierarchical, tree-like namespace that you can use to organize your parameters together. Back to that DB example: you might want to have a DB parameter for dev and a parameter for prod, so rather than having some sort of naming convention, you can have a path that says prod/DB password and dev/DB password, as an example. Notification support is really cool; this lets you track parameter life cycles. Back to the DB example, let's say that the DB password has been updated. Now we have a problem on our hands, because we need to update all the servers that were using that parameter. What we can do is have a notification on the change of that DB parameter value and trigger a function, such as a Lambda function, to go and update all the servers so that they restart or reconnect to the database, pulling down the new value of the parameter. So again you don't have to do any of that manually; we can set up a closed-loop system to do it. We announced cross-platform and multi-step document support. We saw examples of that, where a single document, such as the list open ports document we saw, can be used for both Windows and Linux with variants of the commands for each of them. This is a big one here: Patch Manager supports Linux patching. This came out about two and a half weeks ago, so we're really proud of that. This was a very common ask, and now you can use Patch Manager, as we saw, and compliance as a single pane of glass to look at patching of both Windows and Linux servers, both in AWS and on-premises. The final feature was syncing the inventory data. We saw an example of QuickSight analyzing that data sitting in S3, and you can think of many different use cases. The moment you have the data in a queryable form, you can represent your inventory query in SQL, and basically
it's up to you to query that, join it with other information, and produce your reports, as we saw. We continue to innovate and we continue to listen to your feedback, so if you have used Systems Manager or if you have some interesting use cases, please feel free to reach out; we're really keen to hear about them. Here are some customers that are using Systems Manager. Obviously this is not an exhaustive list; these are some customers who are public about having used it. So if you have any interesting use cases, do see me after this. We are always looking for use cases and areas in which we can improve. So to summarize: where is SSM? Systems Manager doesn't have its own console; it's found in the EC2 console, like I showed you, so do remember that. And in summary, why would you use Systems Manager? It's about managing hybrid infrastructure. It's cross-platform, Windows and Linux. It's inherently scalable. It's secure, because it's protected by IAM policies as well as logging in CloudTrail. It makes it easy for you to write automation; we saw examples of that. And the end result that follows is reduced total cost of ownership: you don't pay for the usage of the service itself, and through automation the idea is that you can reduce the amount of manual work. I also put a link here to the Management Tools blog, where we have several articles; some of the articles I pointed out earlier appear in the Management Tools blog. So keep an eye on that and let us know your interesting use cases. I hope this was useful. Thank you very much, and I will hang around here for some questions.
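The Parameter Store pattern described in the talk (a script making a direct GetParameter call with decryption, and parameters organized under paths for prod and dev) could be sketched in Python roughly as follows. This is a minimal illustration, not code from the session: it assumes the boto3 SDK, and the helper names `parameter_name` and `fetch_secret`, as well as the key `db-password`, are hypothetical.

```python
def parameter_name(environment, key):
    """Build a hierarchical parameter name such as /prod/db-password.

    The environment/key layout is an assumed convention for illustration.
    """
    parts = (environment, key)
    return "/" + "/".join(part.strip("/") for part in parts)


def fetch_secret(ssm_client, name):
    """Return the decrypted value of an encrypted string parameter.

    ssm_client is expected to behave like boto3.client("ssm"); the caller's
    IAM identity needs permission for GetParameter and for KMS Decrypt.
    """
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]
```

With AWS credentials available, a script might call `fetch_secret(boto3.client("ssm"), parameter_name("prod", "db-password"))`, use the value to build its connection string, and avoid writing it to disk. Passing the client in as an argument keeps the helper easy to exercise with a stub.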
Info
Channel: Amazon Web Services
Views: 40,813
Rating: 4.9326925 out of 5
Keywords: AWS, Amazon Web Services, Cloud, cloud computing, AWS Cloud, ENT401, Chicago Summit Series 2017
Id: BmpxZsk9N48
Length: 55min 33sec (3333 seconds)
Published: Mon Aug 07 2017