AWS re:Invent 2018: Augmenting Security & Improving Operational Health w/ AWS CloudTrail (SEC323)

This is a 300-level course, so we will be going deeper. It does assume you have some degree of technical knowledge about AWS in general, and some knowledge of CloudTrail is helpful. Let's take a minute to understand where everyone in the crowd is as far as your knowledge of CloudTrail. How many folks here have actually used AWS CloudTrail before, for any purpose? Now, how many people have actually looked at a CloudTrail event directly themselves? Okay, good, so we have a good mix of experience in the room today.

While we will be going deeper, some people may not be familiar with the service, so I'm going to start with a few slides on what CloudTrail is and how it's positioned in the management tools suite of products, and we'll talk a little bit about the fundamental ideas of CloudTrail. This should take about 10 minutes; then we'll get into deeper concepts. We'll skip a couple of our typical high-level discussions today because we have a lot of content to cover, so we will skip the discussions of what operational health is, what security is, and what compliance is. I'm going to assume you know why these things are important to your company.

One thing I forgot to mention: we have two new features to discuss today, and we also have two demos. In the course of this session we will talk about how you can simplify your compliance story, how you can improve your operational troubleshooting activities, and how you can improve your security posture. We'll start by looking at CloudTrail features specifically. Then we'll get into how to secure your event logs, how to set them up, and best practices and recommendations for that. We'll also look at two use cases, contrived examples of how you can explore CloudTrail events to derive insights into your account, and then we'll talk about some more advanced topics like monitoring and automated response workflows. Now, we are having a bit of an issue with a session timer
here, so you may see some variation in how fast we go through this. Please pardon us as we work the process here.

So let's talk about the product suite that CloudTrail calls home: the AWS management tools product suite. AWS gives you a set of management tools that enable you to programmatically provision, automate, and monitor every component of your environments, and when you use these tools you can maintain control in your accounts without sacrificing development velocity. These tools are integrated with each other, and they're also integrated with every aspect of the AWS platform. There are four categories of tools, so let's touch briefly on each one. If you're not familiar with these, I would highly suggest you get to know them, because whenever you need to do something in your accounts that requires management, these are the first places you should start.

We have AWS CloudFormation, which allows you to author a simple text template describing the resources in your accounts; you can use the template to provision resources in a safe, transactional way. We have AWS Service Catalog, which allows you to safely deploy applications to your accounts, and this could be anything from images to full application architectures. There's also AWS OpsWorks, which is a configuration management service; it takes care of hosting and scaling Chef and Puppet servers on your behalf. For monitoring, we have Amazon CloudWatch. CloudWatch enables you to monitor your systems by providing metrics you can watch and by letting you define alarms that notify you when there are problems. You can also use it to respond in real time to events. Then, for operations and compliance management, we have CloudTrail, which is what we'll discuss in more detail today. It allows you to capture information from across your accounts and gain insights. We also have AWS Config, which enables you to look at the configuration of the resources in your accounts and maintain
awareness and inventory. And finally, we have AWS Systems Manager. Systems Manager enables you to view and monitor your resources, and it also gives you the ability to automate some common tasks like patch management and state management. So that's a brief introduction to a very broad set of services, and we won't go further on them today. Please do make sure you're aware of all these tools; used together, they can give you control of every aspect of your AWS accounts.

So now let's dive into CloudTrail specifically. CloudTrail is a service that enables compliance, governance, operational auditing, and risk auditing in your accounts, and it does this by helping you track activity across teams, users, services, accounts, and even across your entire organization. I had a conversation earlier this week with a gentleman at our booth, and he indicated that we need to be a little bit crisper about defining activity. So for our purposes today, we'll define activity in the account as public AWS API activity. I'm going to use the word activity a lot, and that's what I mean.

CloudTrail is integrated with the vast majority of services in AWS. It collects information from these services every time a public API is called, and it brings this into a central location where you can view and control the information it contains. It's easy to set up and very easy to configure, and we give you a set of tools that you can use to interact with the event logs. The other thing is that, since this is managed by AWS, you don't have to worry about coverage: as AWS continues to evolve and new services are added with new features, AWS adds support in CloudTrail for each of these new things. If you're familiar with CloudTrail, you may have noticed that I mentioned organizations. I have some good news to share with you on that topic, and we'll get to that shortly.

So, last in this segment, let's talk about a couple of use cases that CloudTrail can
enable for you. The first is simplifying your compliance workflows. Because CloudTrail pulls all this information into one single location, it makes it much easier for you to prove that you are compliant. If you are subject to compliance requirements like HIPAA or PCI, you can very quickly demonstrate everything that's happened in your account, and it doesn't take much work to consolidate this information. It's also useful because all of the information presented by CloudTrail is at a single, consistent level of granularity.

You can also enhance your security analysis. One of the hardest parts of a security analysis is sometimes just establishing what real usage patterns look like: how are your users really using their accounts? With CloudTrail it's very easy to jump in and slice and dice the information however you want. You can aggregate it by region or look at specific users, so it becomes very easy to establish whether the usage patterns in your account have any security risks associated with them.

You can also monitor data exfiltration risks. CloudTrail enables you to carefully monitor, at the object level, what is happening with objects in S3 buckets. So if you have particular areas where you're concerned about data exfiltration, you can monitor these closely, and you can also react automatically to close data exfiltration risks.

And finally, operational troubleshooting is just much easier with CloudTrail. You can dive into specific sequences of events, and when you don't really know the root cause of a problem, CloudTrail is a great place to start to quickly get to a hypothesis, which helps reduce your mean time to resolution. These are a few examples of the use cases CloudTrail can help you with, just to get your mind started.

So that's it for understanding the layout of the environment, the product suite, and what CloudTrail is. Let's take four slides, and we'll talk about
the basic features of CloudTrail and some terminology that we'll be using throughout our discussion today.

Let's talk about CloudTrail events. Events are the fundamental unit of useful information that CloudTrail provides to you. An event corresponds to a single invocation of a public AWS API, and it contains not only the name of the event but a lot of context about the invocation of the API. This includes the caller, the source of the call, and the request and response information, and all of this enables you to determine who made the call and at what time, and to debug the information that was passed to and received from the call. CloudTrail gathers information from over 130 AWS services at present, and that list continues to grow. CloudTrail also collects this activity automatically; there's nothing you really have to do other than set up the initial configuration. Once all this information is pulled from the individual services, it's brought into a central location so you can access it and use it as a single source of truth.

So that's what an event is. Now, there are different types of events. We categorize events in two different ways, and this is very important for our discussion today, so keep in mind the difference between management events and data events. These are the two types of events we support. Management events correspond to activities using AWS APIs that control resources in your account: think of things like creating an EC2 instance or deleting an RDS instance. These are usually relatively infrequent compared to data events. Data events are finer-grained actions, things like interacting with individual objects or, say, executing a specific Lambda function, and they are much higher volume; an example is reading a specific object in an S3 bucket. Now, because these have different typical volumes associated with them, we'll treat them
differently when we think about how to design our CloudTrail setup. Also, each event, whether a management event or a data event, can be categorized as either a read event or a write event, and this is also important for data volume purposes. A read event doesn't materially alter anything, for example DescribeInstances when you're interacting with EC2, whereas a write event does, for example updating an RDS instance. Read events are typically much more frequent than write events, and data events are typically much more frequent than management events.

So let's talk about how these events come to you. We deliver the events we capture to S3 buckets of your choice. You can specify any S3 bucket you want, and you can use this to bring information from multiple accounts into a single location. You can also opt in to delivering events to CloudWatch Logs, and this is important because it enables a number of CloudWatch features that we'll explore today.

Now, a note about our delivery speed. At present, the time from when an API is called to when the CloudTrail event is delivered is typically less than five minutes, and our 99th percentile, our effective worst case, is about 15 minutes. That has changed recently for some services: we've increased the delivery speed for these services, and you'll see their events delivered in under 30 seconds, with a 99th percentile of 5 minutes. This is important for folks who are building notifications on top of CloudTrail, because it affects how rapidly you can become aware of information sourced from CloudTrail. I can also tell you that the number of services that have received this upgraded integration has increased recently.

So we now understand what events are, and we understand the mechanics of how they're delivered to you for consumption. Now, how do you control it? And that's where
we come to the concept of a trail. A trail is a configuration object that you create using the CLI, the API, or the console. Creating one is a very easy, quite user-friendly process, and a trail enables you to define a specific set of events that you want captured and delivered to a specific location. You can also have more than one trail, which is important when thinking about different use cases; we'll talk in more detail about why having multiple data sets can be very advantageous for you.

Okay, so that's our foundational discussion. We understand what events are, and that they're pulled from multiple services. We understand that they're delivered to you in S3 buckets and, optionally, to CloudWatch Logs log groups. We understand that you can configure a trail to control what CloudTrail does. Now let's talk about some best practices for setting up these trails so you can maximize CloudTrail's potential.

Before we get into this, it's important to think about your design. One of the challenges with CloudTrail is that if you make certain decisions and later regret them, there's no way to go back in time, change your configuration, and capture the information you missed. So it's important to think up front: how are you going to use CloudTrail? Who is going to consume your data, and what are they going to do with it? The answers will significantly affect how you set up your trails. Do you have multiple teams? Can they see the same information? Is it okay for everyone to see all of the CloudTrail information emitted by every account? Usually the answer to that is no, so you need to think about how you're going to partition it, and I'll show you how to do that. You also need to think about what types of information you need, and this is where event volume starts becoming important. If you have a compliance scenario where you just want to store everything and archive it, that can generate quite a bit of data.
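The trail concept described above, a configuration object that names which events to capture and where to deliver them, can be made concrete as request parameters. What follows is a minimal, hypothetical sketch, assuming the dictionaries are passed as keyword arguments to boto3's CloudTrail client methods create_trail and put_event_selectors. The trail names, bucket name, and prefixes are placeholders, and the bucket is assumed to already carry a policy that lets CloudTrail write to it. It describes two trails with different event selectors, one capturing all management events and one capturing only write management events, as one way of illustrating multiple trails and their differing event volumes.

```python
"""Hypothetical sketch: two CloudTrail trails described as request parameters.

Assumption: each dict mirrors the keyword arguments of boto3's CloudTrail
client methods create_trail() and put_event_selectors(). Names, the bucket,
and the prefixes are placeholders, not values from the talk.
"""
from typing import Any, Dict, List

VALID_READ_WRITE = ("All", "ReadOnly", "WriteOnly")


def trail_params(name: str, bucket: str, prefix: str) -> Dict[str, Any]:
    """Keyword arguments for create_trail(): where the trail delivers events."""
    return {
        "Name": name,
        "S3BucketName": bucket,       # central delivery bucket (placeholder)
        "S3KeyPrefix": prefix,        # separate folder per trail inside the bucket
        "IsMultiRegionTrail": True,   # capture activity from every region
    }


def management_selectors(name: str, read_write: str) -> Dict[str, Any]:
    """Keyword arguments for put_event_selectors(): which events the trail captures.

    Only management events are selected; no DataResources entry is given,
    so no S3 object or Lambda invocation (data) events are recorded.
    """
    if read_write not in VALID_READ_WRITE:
        raise ValueError(f"unsupported ReadWriteType: {read_write!r}")
    return {
        "TrailName": name,
        "EventSelectors": [
            {"ReadWriteType": read_write, "IncludeManagementEvents": True}
        ],
    }


def plan_trail(name: str, bucket: str, prefix: str, read_write: str) -> Dict[str, Any]:
    """Bundle both parameter sets for one trail."""
    return {
        "trail": trail_params(name, bucket, prefix),
        "selectors": management_selectors(name, read_write),
    }


# A full record of management activity plus a lower-volume, write-only trail.
PLAN: List[Dict[str, Any]] = [
    plan_trail("all-management", "example-cloudtrail-bucket", "compliance", "All"),
    plan_trail("write-only", "example-cloudtrail-bucket", "troubleshooting", "WriteOnly"),
]

if __name__ == "__main__":
    for step in PLAN:
        selector = step["selectors"]["EventSelectors"][0]
        print(step["trail"]["Name"], step["trail"]["S3KeyPrefix"], selector["ReadWriteType"])
```

With AWS credentials configured, each step could then be applied with client = boto3.client("cloudtrail"), followed by client.create_trail(**step["trail"]), client.put_event_selectors(**step["selectors"]), and client.start_logging(Name=step["trail"]["Name"]). Those calls are left out of the sketch so it runs without AWS access. Keeping each trail under its own S3 key prefix is what later makes per-trail access control in a shared bucket possible.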
But if you want to use that same data set to drive an actual application workflow, the volume of that data may choke it. So when you think about the volume of data you're using, you can architect each system for the smallest data set possible, which makes you more resilient to scale as you continue to grow. You also need to think about where you want your events delivered. Are you okay with a centralized source of truth across your entire company, or do you want to partition things by division due to security concerns? This is the time to think about it. And finally, think about which regions you want to be in. I'm going to make the argument that CloudTrail is actually a security backplane, and you should think about it from a security perspective, not a pure compliance perspective. In that case, it's important to think about putting it in all of the regions you have available.

So if all of this sounds good, you may say: that's wonderful, but where do I start? We do have a set of recommendations for you, accumulated from talking to customers again and again and seeing what works for some companies and what doesn't. This may not be the perfect recommendation for you, but it's a good place to start your thinking.

The first thing we recommend is that you create a trail in each account that captures all management events, both read and write events. This is your compliance backplane, your record of everything that's happened in your account. You can use it for multiple things; you can even use it for operational debugging. Then we recommend you create additional trails for use cases that have specific requirements. One thing that comes up a lot is that people like to create a secondary trail that captures only write management events, because that makes it very easy, when they're doing operational troubleshooting, to go through events individually and not have to filter out a bunch of
noisy read events that they may not care about in that scenario. You may also have two different teams whose needs are different and changing. You can have two different trails, one for each team, and then give each team control over the data being logged so they can manage it independently. These are some examples of use cases where you might want multiple trails.

We also recommend that you use a centralized delivery location for events. As I mentioned, you can partition this, but we find that most customers get the most benefit from having a single location with a single source of truth for CloudTrail events.

Next, you need to think about your bucket policy: who gets access to what? If you're delivering all of your events into a single bucket, we break them out into a folder structure that allows you to give different people access to different subsets. That folder structure is broken down by account ID, and also by trail if you configure a prefix on the trail, so you can literally control who gets access to the output of which trail in which account. It's up to you to decide how you want to partition it, but we do recommend that you set it up explicitly and don't just give everyone access to everything.

Next, it's a good idea to actively monitor data exfiltration and critical function usage for the S3 buckets and Lambda functions that are critical to your business. We do not recommend that you enable data events for everything by default. Look at your S3 buckets and ask which ones hold mission-critical information or are at high risk of data exfiltration. Do the same with your Lambda functions: which ones are so dangerous to execute that you really can't have them running unmonitored? Then you can turn on data events for these so you can track activity, and that will also set you up to build automated remediation workflows or even
notifications on top of this. So with all that, you now have a pretty good setup; if you follow these steps, you'll have a really good understanding of what's going on in your system. I would recommend that you then let it run for a couple of days or a week and spend some time getting to know your activity patterns. This is probably one of the most overlooked things we see with CloudTrail: people never stop and take the time to dig in and figure out what's actually happening in their accounts. You can learn a lot, and you're almost guaranteed to come away with follow-up questions you'll need to act on.

So now that we've talked about how to set up your event logging system, let's talk about some mistakes we see people making again and again. The specific mistakes matter less than the thinking behind them, so let's take a look at a couple of examples.

One thing we hear a lot is that people turn on CloudTrail only in the regions where they have stacks deployed. They say, hey, I'm logging in all the important regions. Then a malicious user logs in, launches an internal attack from an unused region, and they have absolutely no visibility into what happened. So again, start to think about CloudTrail not only as a compliance tool but as something you can use for security and forensics, for going in and determining retrospectively what happened, and even for setting up monitoring and remediation. In that case you need to think about every place an attack could come from, and that's why we recommend you set it up in all regions.

Another thing we hear a lot is people saying, hey, I don't need all the events; CloudTrail produces tons of data, and that's too much for me. So they capture just write events. But there are things you can learn from read events. For example, you can determine whether or not somebody's
probing your resources looking for weaknesses. You can learn whether employees exercised due diligence before making a change that caused an operational outage. So again, think about the broader picture of what you can do with CloudTrail before you try to prematurely optimize the data set size. Those are just some tips: make sure you think through your planning assumptions before you implement them, because once CloudTrail has started logging, if you later wish you could see events you didn't capture, they're not necessarily available.

So next, let's talk about securing your events. It's important to secure your event logs because they provide you a layer of protection from malicious users. Event logs aren't just about being able to dig into what happened if you're challenged from a compliance perspective; you also want to be able to catch people who are doing things on purpose, or to determine whether something was accidental and understand where it came from. I'll show you a situation where this is very easy. But you need to make sure that people can't go in and wipe your event logs clean. This is a very common attack pattern: somebody wants to internally damage an account, and the first thing they want to do is make sure no one can track what they've done, so they'll try to wipe out the CloudTrail event logs. So we'll talk about four steps you can take that make it very difficult for anyone to interfere with CloudTrail's ability to log events in your accounts.

The first recommendation is to turn on log file validation. This CloudTrail feature causes CloudTrail to create a hash of every log file it delivers to you. It collects these hashes, puts them in a digest file, and periodically delivers that file to your delivery destination in a different folder. CloudTrail digitally signs this digest file so that you can tell if it's
been tampered with. With these things in place, you have a very strong chain of ownership that lets you tell whether anyone has materially modified the event logs in any way, including deleting any of them: you can tell whether they're complete and whether they're in their original form. Now, if you find that this has happened, you don't need to build your own remediation strategy. You can contact AWS Support, indicate that your CloudTrail event logs have been compromised, and we can regenerate them for you. So you don't need to worry about building a remediation strategy here, or about the security posture you'd need to keep a segregated copy.

Next, we recommend you use your own KMS key. This adds another barrier for malicious users, because in order to read your event logs they would need decrypt permissions on that key.

We also recommend that you enable access logging for the S3 buckets where CloudTrail event logs are delivered. This makes it hard for anyone to even try to access the event logs without being tracked: you'll see if people even try, and if they don't have access, you'll see that they got an access denied. This is an additional level of protection against probing.

And finally, turn on multi-factor authentication (MFA Delete) for the S3 buckets where your AWS CloudTrail data is stored. This yet again makes it harder for an attacker to tamper with your event logs without revealing their identity.

If you follow these steps, you will have a strong defense against people interfering with your event logs, and this will start to form a good security backplane that you can use for establishing accountability in your accounts. Now, because we're thinking about CloudTrail from a security perspective, we need to change our way of thinking and remember that every security system is only as strong as its weakest link. One problem we see a lot is that people secure their event logs but then don't think carefully about how they're deploying CloudTrail. So you need to
think about your strategy from a security perspective. You can take a manual approach: it's easy and it's the fastest ramp-up, but it's also prone to human error. Or you can use an automated process: a bit more investment up front, but less prone to human error. Once you've chosen your approach, you need to make sure that all the appropriate policies are set up for CloudTrail, the S3 buckets, and the CloudWatch Logs log groups, so that things are buttoned down. We recommend you do this in a way that's repeatable and maintainable.

On that topic, it is my pleasure to introduce a new component of CloudTrail: organization trails. Historically, whenever you defined a trail, it applied only to the account in which it was defined; you could decide whether it applied in a specific region or in all regions. We now give you the ability to create a trail in one account and apply it across all of the accounts in an AWS organization. You do this by defining the trail in the organization master account, and it's the same experience you have when defining any trail, so it's very intuitive. When you define a trail in an organization master account and indicate that it's an organization trail, CloudTrail copies the trail definition to all the accounts in the organization automatically. CloudTrail takes care of all the heavy lifting of configuring roles, making sure everything's set up, and keeping everything in sync across the organization, so you have a single point of control in the master account.

There are a couple of benefits to this. The first is that, because of the way this is implemented, you cannot change the copies of the trail that are present in a member account from that member account. The only place you can make changes to an organization trail is the organization master account. This reduces the surface area both for errors and for malicious attacks, because another thing some malicious users like to do is simply turn off CloudTrail, do
their thing, and then turn it back on. That becomes much harder. Another benefit is that CloudTrail listens for changes to the organization, so if you add accounts to your organization, CloudTrail will intercept that and immediately configure, in each new member account, every organization trail defined in the organization master account. I've been shopping this around with some customers, and they indicated to me that this is somewhat of a game changer.

So I really want to drive home how easy this is. This is the AWS CloudTrail console. We won't go into detail about what's happening here, but you can see the different types of information you specify when creating a trail, like the trail name and the types of events you might want. If you look at the red arrow, that's really all you have to do to specify that a trail is an organization trail: just opt in, say yes, and CloudTrail does the rest. It's all automatic. One thing I forgot to mention is that organization trails always share the same delivery destination, so by default you get that same centralized source of truth for all the delivered events across the accounts. And do remember that in order to do this, you need to create the trail in the organization master account.

So next, let's talk about maintaining awareness of costs. CloudTrail can incur charges, though it does not always: we have a no-charge tier and a charged tier. If you are not incurring charges, you may want to protect yourself to make sure that nobody accidentally starts incurring them. If you are incurring charges, you want to make sure there's no sudden spike in charges that you're not aware of. For this, we recommend that you use AWS Budgets. With AWS Budgets, it's very easy to set up a budget for CloudTrail that sets a certain maximum expected amount for each billing cycle. If the amount charged by CloudTrail ever exceeds that, or is forecast to exceed that,
you'll receive a notification. If you set your threshold correctly, then when there is a problem or a change, the financial impact can be almost negligible, and you can still be aware of it and proactively take care of it. So this allows you to identify changes in trends. For example, if you're not incurring charges, someone may do something that starts incurring them. It can also capture changes that result from configuration changes in unrelated services; we'll talk in a little bit about how changing other services can change your event volume and, by consequence, the volume of data you see and the charges you see if you are paying for CloudTrail.

And again, just to give you some idea of how easy this is, this is the AWS Budgets console. You can see at the first arrow that you can simply enter whatever budgeted amount you want. At the third arrow, you can see historical time-series information about the costs incurred, so you can use that to calibrate a good starting point. And at the second arrow, notice that you can easily select which services to include in your budget. In this case I've selected CloudTrail only, but you can select multiple services, and this is useful not only for CloudTrail but broadly for a number of services. So please do check this out; there's a lot of protection you can build for yourself. I've heard of several cases where customers came to us, and if they had had this set up, it would have saved them a great deal of pain. So please look at this very carefully.

So we are about halfway through. Let's take a quick break and recap what we've talked about. We looked at the management tools all up, and briefly at the different services in that space. We looked at what CloudTrail is. We looked at some foundational information about
CloudTrail and the kinds of things it can do. We set up a secure foundation for tracking and exploring activity, and we also explored how to maintain awareness of cost. And I will say it again, because I talk to customers frequently and they come to me a lot when there's a problem: I cannot overemphasize how important it is to maintain awareness of cost across your services.

So now that we understand a little bit about CloudTrail and some of the ways to configure it, let's talk about how we can actually use it. To do this, we're going to take two slides and look briefly at what an event looks like, for those of you who haven't seen one. CloudTrail events are JSON documents, so they're very human-readable, and they're exposed in a standardized format. All the information is exposed as attributes, including the source identity, meaning the calling party that called the API, and the request and response that were sent to and received from the API. You can use a lot of this information for debugging, and you can also use it to establish accountability.

Let's dive into two areas where we have more complex objects. The source identity contains a lot of additional information. You can tell not only the principal and the ARN of the identity that invoked the call, but also things like whether that identity was MFA-authenticated at the time of the call. You can also determine whether the call was made with an assumed role, and if so, you can correlate back to other events and determine which user actually assumed the role before making the call. This is good to keep in mind when you start establishing the usage patterns in your account: you can use CloudTrail to determine how people are using assumed roles, which is often an area of security concern. Then there's the request and response information; there you can just see
some examples. This is useful when you get into debugging operational issues, especially when you need to identify specific resources; one of the things I use it for a lot is pulling out the instance IDs of EC2 instances that I need to debug. Again, I'm not showing you this so that you'll parse all of this information right now, but more to whet your appetite: there's a wealth of information in CloudTrail events, so do explore them on your own time and take a look at what's available. I can almost guarantee that if you look in detail, it will change how you use CloudTrail, because you'll discover use cases based on the information in these events. Now let's talk about the tools we can use to access CloudTrail events. Out of the box, CloudTrail provides a set of search and browse tools that you can use to access the last 90 days of management events. Let's take a quick aside on that point. When you create an account, we turn on CloudTrail for you automatically, and we keep a private copy of the last 90 days of management events by default. This does not cost you anything, and you can use these tools to access that data. If you want more than 90 days, you need to set up a trail that delivers to an S3 bucket. But if you're looking only at management events within this slice of time (this applies only to management events), you can use the event history in the AWS console. It lets you browse through your events and apply a single filter, which can help you find specific events or sequences of events, and you can also specify a time range. Examples of the filter options include filtering by event name or by the resource that was affected by a particular API
call. We also have the lookup-events command in the CLI, which has similar functionality, and the LookupEvents action in the API. These are good tools to get started if you're just poking around or you have a quick action you need to take against management events. If you want something more powerful for performing more complex queries, we've integrated CloudTrail with Amazon Athena. Amazon Athena is an interactive query service that lets you query datasets using standard SQL. Through this integration, it's very easy to select a CloudTrail event log and create an Athena table over it, and then perform ad hoc queries of your choice against that table. We've simplified this to the point where it's literally just a few clicks in the console; for folks who are familiar with Amazon Athena, with this integration you do not need to define a schema, because we take care of that for you. With this integration you can perform much more complex queries: multi-attribute searches, and even advanced things like building time-series views of your event log data. This is where we expect people to go when they're doing more detailed security analysis, like looking at behavioral patterns across their accounts and doing formal analysis. Now, if you're looking for something in between, with a broader scope than the out-of-the-box tools but not as complicated as a full SQL implementation, and something that operates against live data, the CloudWatch team announced a new feature a couple of days ago that is very useful for exploring CloudTrail events: CloudWatch Logs Insights. It's very useful for lightweight queries, and it provides you with a
simplified query language that's reminiscent of a stripped-down version of SQL, with some concepts from command-line utilities integrated. It's very easy: I'm a product manager, not a technical person, and I learned this query language in 20 minutes flat with almost no help, so I'm sure it won't be a problem for anyone here. You can do simple or complex searches and execute them in seconds, and you can author queries and then elaborate on them, which lets you slice through your CloudTrail event logs iteratively. It also supports a set of data aggregation commands, so you can summarize information across your accounts. So we've now looked at how to access CloudTrail event data. Our next step is to ask how we can take this information and do real work that's useful to our businesses. I have two demos that show how to actually use CloudTrail event data: first we'll use it to understand usage patterns in an account to support security analysis, and then we'll look at a contrived example of exploring an operational troubleshooting situation. Before we dive into the demos, let's talk about this general space. If I'm looking at a new account with CloudTrail, I can perform a variety of analyses very quickly. I can baseline my IAM user activity: I can quickly find out who my top users are, what they're doing, which users are using which services, and what my top services are. I can find specific sequences of behavior for users of interest, and I can profile my use of assumed roles. These are just a few examples; you can also look at console sign-in patterns and even understand the source IP address distribution for access to your account. The list goes on and on. One very interesting thing is that you can use this to very quickly
investigate a suspicious user activity pattern. Hopefully you'll never be in a situation where you have to deal with an internal bad actor, but if you do, or if someone's account has been compromised and you need to quickly check what that person has been doing, you can check in seconds with CloudTrail. We'll take a look at this scenario right now; please give me one moment while I switch accounts. Okay, in this demo we're going to use CloudWatch Logs Insights, so I'll give you a quick introduction to the interface. I'm personally madly in love with this interface, but that's just my opinion. At the very top (let me highlight this for you), you can select the CloudWatch Logs log group that you're pulling information from, which is important if you're using multiple trails that deliver to different log groups. You can quickly select the time range you want to look at, and here you specify the custom query language, which is very simplified. We won't go through the language in detail right now, but for your information, I'm using a command called stats, which basically does the same thing as a SQL GROUP BY statement, and you can see that I'm grouping by AWS region. Below this, CloudWatch emits a time-series chart for you by default, and below that you can see the query results. So with a very short query, I've said: show me the count of activity aggregated by AWS region, so I can see which regions I have activity in. In this contrived example, I'm working for a company where I expect activity to be focused in a few regions, specifically in Europe and in the United States, and I see that that is indeed the case: I have activity in eu-central-1 and in us-east-1. But I also see quite a bit of activity in other regions, and I want to understand what that is. We'll come back to that.
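The region breakdown in this demo can also be run programmatically. The following is a minimal sketch, not the presenter's code: it assumes Python with boto3's CloudWatch Logs client (`start_query` and `get_query_results`), assumes the trail delivers to a log group called `CloudTrail/DefaultLogGroup` (a hypothetical name; substitute your own), and uses the `awsRegion` and `eventSource` fields from the CloudTrail event schema. The helper function names are illustrative only.

```python
import time
from typing import Dict, List, Optional

# Hypothetical log group name: replace with the group your trail delivers to.
LOG_GROUP = "CloudTrail/DefaultLogGroup"


def build_activity_query(group_by: str = "awsRegion", region: Optional[str] = None) -> str:
    """Build a Logs Insights query that counts CloudTrail events grouped by a field.

    With no region, this mirrors the demo's first query (activity per region);
    with a region and group_by="eventSource", it mirrors the per-service drill-down.
    """
    parts = []
    if region is not None:
        parts.append(f'filter awsRegion = "{region}"')
    parts.append(f"stats count(*) as eventCount by {group_by}")
    parts.append("sort eventCount desc")
    return " | ".join(parts)


def rows_to_dicts(results: List[List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Flatten get_query_results rows (lists of field/value pairs) into dicts."""
    return [{cell["field"]: cell["value"] for cell in row} for row in results]


def run_insights_query(log_group: str, query: str, start: int, end: int,
                       poll_seconds: float = 1.0,
                       timeout_seconds: float = 60.0) -> List[Dict[str, str]]:
    """Start a Logs Insights query and poll until it finishes (start/end are epoch seconds)."""
    import boto3  # imported here so the pure helpers above work without the AWS SDK

    logs = boto3.client("logs")
    query_id = logs.start_query(
        logGroupName=log_group, startTime=start, endTime=end, queryString=query
    )["queryId"]
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] in ("Complete", "Failed", "Cancelled", "Timeout"):
            return rows_to_dicts(response["results"])
        if time.monotonic() > deadline:
            raise TimeoutError(f"query {query_id} still {response['status']}")
        time.sleep(poll_seconds)
```

As a usage example, with `now = int(time.time())`, calling `run_insights_query(LOG_GROUP, build_activity_query(), now - 86400, now)` would return one dict per region for the last 24 hours, and `build_activity_query("eventSource", "eu-central-1")` produces the per-service drill-down. For the later user-level pivot, passing `"eventName, userIdentity.arn"` as `group_by` groups by both fields, since the Logs Insights `by` clause accepts a comma-separated list.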
Next, let's say I want to drill into a specific region. It's as easy as adding a filter and changing my aggregation to event source, so now I'm looking at this by service. I can see all my services in that region listed along with their call volume, and I can immediately see that S3 is my top service; these are the kinds of insights you get with almost no effort. S3 is at the top, EC2 is also near the top, and as I go down the list, these all make sense. In this contrived example, I notice that number 14 is Redshift. Let's say we have a company policy that we're not using Redshift, so I'm immediately concerned that someone is using it. Using the same strategy, I can drill down and look at who's using Redshift and what the call flows are, find the users, connect with them, and understand what's going on. Now let's say I want to take this a level further and understand the users who are active in my account. Again, I just add another filter criterion and change my aggregation to be by event name and user identity, and I can immediately see the APIs being called, the users calling them, and how frequently this is happening. These queries may seem a little intimidating at first, but if you spend 10 minutes with them, you'll be able to pivot through this very quickly. You can see right away that I have an IT support user who is one of my most active people. It also stands out very clearly that I have a finance user who is executing EC2 activity, and that concerns me. I forgot to mention that I've also filtered this to EC2 usage specifically, so at line eight of the results, there's the finance user calling EC2 APIs. Now I have a follow-up: I need to talk to this user and understand why they're doing this, because I don't believe anybody in the
finance team should be creating or using EC2 instances directly. Here we're building a very basic understanding of how we can start at the top, peel the onion, and explore our accounts, gaining insight upon insight; you can go as deep as you want, building an exploration graph. As an example of going a different route, let's go back to our original query and figure out what's actually going on in the regions where we believe there should be no activity. I alter my query again to look at just that specific region, and I see a single user performing something related to EC2: it looks like he's created an EC2 instance and is manipulating it. You'll notice that you can't see who created the instance, and that's because I turned on logging after this user created it. This is the problem you'll run into if you don't set up your trails proactively, before usage starts: you'll see something like this and say, okay, I know there's a problem, but I don't have all the information. So think about that. In this case, I see it's a product manager who is using EC2 in a region where that's not supposed to happen, which sounds like a process issue to me, so I can connect with this user and ask what they're doing in this region and why. That's a really rudimentary example of how you can start to explore information in your account. Notice how easy it is to look at things across services; that's because of CloudTrail's design, where every event is reported in a single schema that's consistent across all services. The only service-specific variations are the request and response information in individual events, and that's useful for debugging. Now let's pivot a completely different way and look at it from the other end of the spectrum, where we're
saying: I have an operational problem, and, using a contrived example again, how can we go about diagnosing it? Let's think about diagnosing operational problems all up. If you've done a good job of establishing an operational baseline in your accounts and you're familiar with the activity patterns from CloudTrail, it's very easy to identify trends or changes that are unusual. You can understand differences in event volume and differences in the error code patterns coming back from API invocations. You can also drill into areas where things look unusual and go to a very fine degree of detail, understanding exactly what's causing changes in the call volume patterns. The best way to explore this is with an example, so let's look at a situation that I've contrived. In this case, I'm going to pretend that I'm an operations manager at a company and I've just received a notification that there's been an operational outage. Someone suspects that a particular EC2 instance has failed in some way, but when people went in to understand what was wrong, they noticed that it seemed to be running. So now I have a report of a transient failure; I'm concerned there may be an intermittent outage, or I may be looking at a large-scale event of some kind, and I need to understand what happened fast. The first thing I'm going to do is go into CloudTrail and look at everything that happened in my account in that period of time. If you have sharp eyes, you may notice that I'm looking at a different log group here, and this goes back to what we discussed about why it makes sense to set up different trails for different purposes. I'm listing all of the activity in this account, and it's fairly digestible because it's coming from a different trail, a trail
that's set up to capture only write management events. Therefore, in a fairly short period of time, I'm likely to have a digestible amount of information that I can just scan. So let's do that. Looking through here, we can see a variety of activity: some S3 bucket activity, some activity interacting with EC2 instances, and right away, something of interest: a call to StopInstances. CloudWatch Logs Insights does the work of parsing the log data and showing it to you in a view you can explore; in the column on the right, you can see all the fields it has discovered, and you'll recognize those from the event schema. As we look through this particular event, we can see in the request parameters that it's the same instance ID we heard about. So now we know that this is not a critical problem: the outage was caused by someone explicitly stopping an instance, so we can downgrade our level of concern. When we look at who actually executed the call, we see that it was again a product manager, so I'm again wondering why product managers are roaming around in my accounts creating EC2 instances and stopping them. My next question is: show me everything this user has been doing. I can make a very simple query that spans the last week and, without aggregating, selects all the information for this particular user. Stepping through, I can see everything they've been up to in one simple view; it's that easy. As I step through, I see some activity related to managing EC2 instances, and then I see that they created an instance and immediately stopped an instance afterward. When I look at the instance IDs, though, the instance they stopped is a different instance from the one they created. So, putting on my thinking cap, I
say: I have someone who logged in, created an EC2 instance, and then stopped a different instance. This sounds like human error; it sounds like somebody made a mistake. So I want to see what else they've done. I look, and they've done some other things with S3, and then they logged back in later and started an instance out of the blue. I look at this StartInstances call, and the instance ID shows that it is in fact the instance in question. So with two queries, I can go through this in a few minutes. I went from zero knowledge, worried about a large-scale event, to realizing I have a human error, and not only that, I have somebody who made a mistake, realized it, logged back in, fixed it, and didn't tell the operations crew. So I now go to this user and we have a wonderful conversation about what's appropriate to do in an account, and also about processes, and I accomplished all of this in a few minutes. Again, these are contrived examples; they're not meant to reflect real-world situations, just to give you an idea of the kinds of things you can do with CloudTrail and the kinds of insights you can glean when you explore CloudTrail events. Okay, so we talked about how to explore CloudTrail events, and we also talked about using them to diagnose operational failures. Let's get into some more advanced use cases. It's one thing to diagnose the origin of an operational problem; it's another to be notified immediately. We want to be notified of every operational problem as soon as possible, to minimize the surface area for damage to customers or business interests. Because CloudTrail has a very broad coverage area spanning so many services, it's a great source of information to drive monitoring activities, and out of the box we provide two different ways to perform monitoring: integration
with CloudWatch and integration with AWS Config, and you can choose what makes sense for you. With CloudWatch, assuming you've configured your trails to deliver events to CloudWatch Logs, you can set up CloudWatch metrics on these logs and then create CloudWatch alarms on the metrics. This lets you detect unusual changes in event volume, changes in error code characteristics, or changes in a particular user's API usage; basically, any way you can filter CloudTrail event information with a metric filter, you can set up alarms specific to it, and when an alarm is tripped, it sends you a notification. If you're more interested in looking at things from a configuration perspective, we also support integration with AWS Config. If you've turned on AWS Config, then any time a resource is changed, Config will listen for the CloudTrail information about the change, automatically inspect the resources that changed, and notify you if it's important. Because we're talking about monitoring, I also want to mention both Amazon GuardDuty and Amazon Macie. We won't talk about them at length because of time constraints, but whereas CloudTrail goes very broad across AWS services, these services focus on the security aspect of things: they bring a great deal of domain knowledge to detecting threat vectors, and they keep incorporating new knowledge as new threat vectors emerge. If you're concerned about these areas, I suggest you take a look at these services to see if they're right for your company. Now, we talked about receiving notifications whenever there is a problem; the next step would be to actually fix problems, and you can do this with CloudTrail through its integration with CloudWatch Events. Every event that comes through CloudTrail is also sent to CloudWatch Events, and as a result you can set up custom remediation
Lambda functions that process the CloudTrail events. You can build your own remediation solutions based on criteria of your choice, and what you do in those functions is really up to you; the sky's the limit: if you can do it from Lambda, you can do it as a remediation function. When these events come in, you can filter them, determine whether specific changes in call volume are a problem, determine whether an action is not compliant with policy, and even use AWS APIs to automatically fix the problem. Let's look at a couple of use cases, starting with a hypothetical data exfiltration scenario that we're going to remediate automatically. In this situation, I'm a company that has set up a Lambda function to watch my S3 buckets. It watches for changes to the access control lists, or ACLs, and if they don't conform to company policy, it automatically reverts them. Now I have a bad actor, an internal disgruntled user, who wants to change the access policy on a particular S3 object so the whole world can see it, to disclose some data for who knows what reason. When they do this, CloudTrail emits an event that gets delivered to CloudWatch Events, and this in turn invokes the company's custom remediation function. The function detects that this is a non-compliant action and immediately calls an S3 API to revert the change. We've seen remediation actions like this execute in seconds, so it greatly reduces the time frame in which data exfiltration risk exists. You can set something like this up for S3 buckets, and you can also do it for Lambda functions. So let's look at an example of protecting a Lambda function, which is a different situation. I have a developer who's become very excited about some changes he made, and he goes to deploy and test his function and accidentally logs into a production
environment. When he changes this Lambda function, it emits a CloudTrail event, which again goes through CloudWatch Events and executes a custom remediation function. This function guards Lambda functions, making sure that functions in this category can only be invoked by service principals. Having determined that this is an invalid permission change to the Lambda function, the custom remediation function built by this company immediately changes the permissions on the function back, limiting the damage that could be caused by using the function in an uncontrolled way. It could also do something like send an email to a central compliance authority, or even email the developer saying: hey, what are you doing? You're in the production environment, and this is not allowed. This is a good example of how you can build protections into your accounts against human error, which is becoming a more and more important idea. So we've covered a variety of topics today: we've talked about how to look at CloudTrail events and how to get directly at the data and use it for a variety of use cases. I'd also like to highlight that some third-party products are doing pretty amazing things with CloudTrail that give you additional value. Everything you've seen us do here can be done by any tool that understands CloudTrail, not just our first-party tools. Due to time constraints, we won't go into the details of any of the partners building applications on CloudTrail, but I recommend that you take a look at these partners and understand their offerings. We've seen a variety of interesting things, from custom solutions to complex queries to aggregated views of operationally important data; in one case, we saw someone combining VPC Flow Logs with CloudTrail data, which is very
interesting. We've also seen cases where people have built applications that automate a lot of the security analysis work you might need to do in your account, which just makes your life easier, so please do check out our partners. Our last major topic today is CloudTrail's pricing model, and this is extremely important. CloudTrail can be a charged feature, and our pricing model charges based on the number of events delivered as well as the number of times they're delivered. The second part is where you need some scrutiny: you can think of each trail as a delivery action, so if you have three trails, every event that comes in would be delivered three times. That's not strictly true, because it depends on each trail's filters, but you can think of it that way. For management events, we charge $2 for every 100,000 deliveries. The caveat with management events is that the first delivery is not charged, so in an account that has no trail, you can set up one trail that captures all management events and you will not be charged; many of our customers fall into the bucket where that's how they approach this. If you set up a second, third, or fourth trail, those additional trails will be charged at $2 per 100,000 management events. Data events are charged at $0.10 per 100,000 events. We had to get creative here because of the difference in event volume between management events and data events, so we built a whole new architecture that we can run at a lower cost for data events; they're charged less, but there is no free tier for data events. So please keep this in mind, and make sure your organization is aware of it, so that people setting up trails don't accidentally incur charges. We have just a few minutes left, so let's start to wind down. I've personally collected some suggestions based on
the conversations I've had with customers, and I want to share three of them with you. First, as we discussed, be aware of when people are creating secondary trails in your account. If you ever have more than one trail, be aware of the pricing model and make sure everyone involved understands the cost implications; there are many good reasons to do this, just make sure you're doing it on purpose. Second, when you change the configuration of some services that are completely unrelated to CloudTrail, you can cause changes in CloudTrail event volume. A good example: say you're using a service that has an encryption option and you turn on encryption. If that's integrated with KMS, it can cause a significant increase in the number of KMS events emitted to CloudTrail, which can increase the size of your data set and also your costs. For both of these first two points, a really good defense is, again, to use AWS Budgets to maintain a very tight awareness of your costs; you don't have to watch them all the time, because the notifications will tell you quickly if there's a problem. And finally, beware, beware, beware of Lambda infinite loops if you're writing custom remediation workflows. If you write a Lambda function that responds to a particular CloudTrail event by calling an API that causes the same event to be emitted, you're going to create an infinite loop that will spin out of control at unbelievable speed, and you can incur unwanted charges. So be aware of that. It's very easy to mitigate; it's really just a question of development discipline and awareness, and people just need to know about it. All right, let's quickly rehash what we covered today. We took a survey of CloudTrail's features, we set up a secure foundation, a secure backplane for maintaining awareness in your account, we talked about managing costs, we
also learned how to explore events with the different tools available, and we performed some sample security analysis and operational troubleshooting. We talked about how to implement active monitoring, we covered automating response workflows, and finally we discussed third-party tools. One thing I didn't include on this slide was our pricing model. Please do remember to fill out the session survey in your mobile app; this is very important to us for making sure we deliver focused content for you in future sessions. Other than that, thank you very much for joining me; it's been a pleasure speaking with you. Please enjoy the rest of re:Invent, and enjoy the party tonight. [Applause]
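The pricing arithmetic described in the talk can be made concrete with a small estimator. This is a hedged sketch under stated assumptions: it uses the rates quoted in this 2018 session ($2.00 per 100,000 management-event deliveries after the first free copy, $0.10 per 100,000 data events with no free copy), which may since have changed, and, as the speaker notes, it simplifies by treating every trail as receiving every event, ignoring per-trail filters. The function name and parameters are hypothetical.

```python
# Rates as quoted in this 2018 session (USD per 100,000 deliveries); check current pricing.
MANAGEMENT_RATE_PER_100K = 2.00  # charged only for copies beyond the first trail
DATA_RATE_PER_100K = 0.10        # charged for every delivery; no free copy


def estimate_cloudtrail_cost(management_events: int, management_trails: int,
                             data_events: int, data_trails: int) -> float:
    """Estimate CloudTrail delivery charges, treating each trail as one delivery of every event.

    Simplification: ignores per-trail filters (for example, write-only trails),
    so real delivery counts may be lower.
    """
    # The first copy of management events is free, so only additional trails are billed.
    paid_management_copies = max(management_trails - 1, 0)
    management_cost = (management_events * paid_management_copies / 100_000
                       * MANAGEMENT_RATE_PER_100K)
    # Data events have no free tier: every trail that logs them is billed.
    data_cost = data_events * data_trails / 100_000 * DATA_RATE_PER_100K
    return round(management_cost + data_cost, 2)
```

For example, 1,000,000 management events captured by three trails give two paid copies (2,000,000 deliveries, or $40.00), and 5,000,000 data events in one trail add $5.00, so `estimate_cloudtrail_cost(1_000_000, 3, 5_000_000, 1)` returns 45.0, while a single trail with management events only costs nothing.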
Info
Channel: Amazon Web Services
Views: 5,458
Keywords: re:Invent 2018, Amazon, AWS re:Invent, Security, Identity, and Compliance, SEC323-R1, AWS CloudTrail, AWS Lambda
Id: YWzmoDzzg4U
Length: 58min 37sec (3517 seconds)
Published: Fri Nov 30 2018