AWS re:Inforce 2019: Threat Detection on AWS: An Introduction to Amazon GuardDuty (FND216)

Captions
All right, thank you everyone for coming to today's session, Threat Detection on AWS: An Introduction to Amazon GuardDuty.

Before we dive really deep into the specifics of GuardDuty itself, it's often instructive to understand how it fits in with all of our other security services. As you can see, we have quite a number of different security services at AWS, and one way to help visualize them is to apply them to the NIST model of identify, protect, detect, respond, and recover. GuardDuty fits in as a detective control. However, as we go through the presentation, especially towards the end when we look at how we respond to some of the detections that come out of GuardDuty, we'll start to leverage some of the adjacent services, specifically on the protect and respond side.

So what is GuardDuty? From a very high level, GuardDuty is a fully managed threat detection service for your workloads and your AWS accounts themselves. We're specifically looking at identifying threats that are known malicious, or that represent behavior sufficiently unusual that it should be looked at. Out of curiosity, is there anyone here already using GuardDuty on their account? All right, we've got a good number, and it doesn't really surprise me. We released GuardDuty at re:Invent 2017, so just about 18 months ago. It's been one of our fastest growing services, and we have quite a number of customers using it. We are monitoring primarily API calls as well as the network traffic coming to and from your instances. We decided to build this service because we needed a purpose-built threat detection service, whereas a lot of legacy tools or other
detection methods were really only network focused. We needed something for AWS that looked not only at the network side of things but also understood that you need some ability to detect malicious activity through API activity, in addition to looking at network traffic.

So from a very high level, how do we detect threats with GuardDuty? Our detection starts with three primary data sources. The first data source is VPC Flow Logs. One thing I do want to point out is that you don't have to turn any of these log sources on. We get access to the log data through an independent channel behind the scenes, directly from the service that creates it, so you don't have to administer, enable, or configure any of these log sources. We automatically get that data into the GuardDuty service as soon as you enable GuardDuty itself. So we look at flow logs, and we also look at your DNS logs. The DNS logs are all of the domains that instances within your environment query, and those help us really understand the network traffic: we can see who you're communicating with as well as what domains you're looking up. And then lastly we look at CloudTrail. CloudTrail, if you're not familiar with it, is essentially an audit trail of all of the API activity that takes place within your account.

The feeds from all these log sources go into our back end. Our back end consists of two different sets of fleets: what we call a stateless fleet and a stateful fleet. The stateless one handles things that we can detect immediately from one particular piece of data, such as an IP match on a threat list. The stateful one handles the behavioral detections. These are detections where we have to maintain state for that particular instance, or that particular IAM user or role, to understand whether or not that behavior is
normal or not. The output of that is what we call a finding. With our findings, we try to aggregate them, we try to minimize the amount of noise, and we try to make severity very simple. We have a kind of stoplight method: low, medium, and high. We'll talk about how we can respond to those later on in the presentation.

Let's dig into these three data sources a little deeper. With flow logs, as I mentioned, you don't have to turn them on. That is sometimes a concern for customers, because in large environments flow logs can get expensive. Again, we get access to these without you having to enable them, which also means you don't have to configure anything. That's really helpful, especially across large numbers of accounts: you don't have to worry about whether somebody has disabled one of the services that we depend on. What we get from flow logs is information very similar to NetFlow, if you're familiar with that. Essentially we get the source IP address and port, the destination IP and port, the protocol in use, and how much data was transferred in that session. With this information we're able to apply the threat intelligence feeds to look for communication with known malicious IPs, either inbound or outbound, as well as to build behavioral models for your individual instances to determine whether the amount of data being sent by an instance is normal, or whether it's communicating outbound on a port that is unusual.

DNS logs are somewhat of a special log source. Currently there is no customer-facing way to get access to these logs, so today GuardDuty is the only way to have the DNS query logs inside your VPC inspected. These are separate from the external Route 53 query logs; these are the domains being queried by the instances within your environment, and this is really
useful because if you have an instance that gets compromised, it might query domains associated with things like Bitcoin mining or other threats, so we can detect compromised instances in that fashion.

The last one, of course, is CloudTrail. Most everyone will have CloudTrail turned on, but again, we get access to CloudTrail through an independent stream. CloudTrail gives us a history of all of the API activity that takes place in your account, including all the console logins. What we're looking for here is things like compromised credentials: unusual logins to the console, use of APIs from locations that are unusual, or users performing or trying to perform activities that they don't ordinarily perform. The exfiltration of instance-based credentials is yet another thing we can detect with CloudTrail, so that if somebody compromises an instance and gets its IAM instance credentials, we can flag their use outside of AWS.

One other thing to note is that we don't store any of this data, which is one of the ways we keep costs down for GuardDuty. All of this data is streamed to us from the internal source, we process it in memory, and then we destroy it. We don't maintain any of this data as part of the service. The only exception, of course, is if one of those log entries creates a finding: the information about that finding goes into the finding itself, and we store that for you.

Now some of the key benefits, or selling points, of GuardDuty itself. One of the first is its simplicity. To get GuardDuty turned on there really is only a single button; there's nothing to configure and no administration to do. One of the things customers tell us they like the most is that there are absolutely no changes to your environment. One of the challenges with security tools a lot of the time is that you end up having to install agents or
put appliances somewhere in the path. Because we're doing all of our analysis on log data that we get from outside of your environment, there are no changes to anything you're running: nothing to install, no architectural changes. As soon as you turn on the GuardDuty service with that single button, we start collecting that data into our account on the back end and you start getting protection for your services. We've talked to a number of security teams who enabled GuardDuty on their application teams' accounts, and the application teams didn't even realize it was turned on. Because we're doing everything out of band, security teams can turn on the service without having to do a lot of coordination with the application owner, or worry about whether it's going to impact performance or cause challenges with the application itself.

The other benefit of being out of band is that it allows us to do continuous monitoring. As you create new VPCs or new instances, there's no agent that you have to deploy or keep up to date, and you don't have to make sure that a new VPC is added to scope or anything of that nature. When you enable GuardDuty on an account in a particular region, we immediately start protecting all of the assets that are part of that account in that region, and that happens continuously, regardless of whether you create new ones.

Third, it is a global service. We are in all of the commercial regions, and we have also stated that we will be available in new regions relatively shortly after region launch, if not at region launch. We think this is important because we don't want customers to have to operate in regions without the protection that GuardDuty can provide.

As for the types of threats we detect, we can detect known threats, which we look at through threat intelligence, and we'll dive into that in a minute, as well as unknown threats
through monitoring the behavior of your users and your instances.

The last box down there is yet another thing that customers tell us they really like: a feature called master and member accounts in GuardDuty. In the console you can associate a GuardDuty account with a master, and you can choose any account to be the master; it's not tied to any other service. Generally speaking this is a security account or some sort of centralized logging account. When you do that, the member accounts are still able to see all of their own findings, but the master account gets to see findings across the entire organization. This is really useful because the individual line-of-business owners, the people who own those applications, can still go into the console, see their own findings, and be responsible for the security events happening in their account, while the security team gets a global view across all of your accounts. You can also send all of those findings to a downstream system from one single spot. If you have a hundred accounts, you don't need to collect events from a hundred different accounts: you make them all members of a single master, and as a security team you only have to look at the master account. It also allows you to prevent member accounts from taking certain actions, like creating whitelists or suppression filters, so it puts that control into the centralized security team's account.

So let's look at the different types of threats that we can detect. The first type is what we consider known threats. We detect known threats by applying threat intelligence feeds, and "threat intelligence feeds" is really a fancy way of saying IPs and domains that we know do bad things or are involved in unwanted activity. These feeds are constantly being updated, so for the fidelity on these you have to constantly
make sure that you're getting updates, otherwise they get stale. We get our threat intelligence from three different sources. First off, as you might imagine, at AWS we have some of our own threat intelligence that we generate by seeing a lot of the traffic that happens on the internet. Customers have been asking us for a long time to externalize that. Unfortunately it's not an easy thing to do in a one-off situation, but GuardDuty gives us a secure and scalable way to share that intelligence with customers through a service.

The second piece is our partners. We partner with Proofpoint and CrowdStrike, so when you turn on GuardDuty you get Proofpoint's and CrowdStrike's threat intelligence feeds at no additional cost. We apply both of those, and we manage the updates to those feeds, so there's nothing you have to update and no need to make sure you're using the latest version; it just happens automatically.

Then there's customer-provided intelligence. We allow customers to upload and apply their own threat intelligence. We see this with some of the larger customers, or customers in certain industries with very specific threat intelligence that is important to them. You can upload it from an S3 bucket, either through the API or in the console, and we will apply it only to your account. A lot of times there are agreements that don't allow you to share some of that intelligence, so we apply those detections only to your specific accounts. If you have the master/member setup, we will apply it to all of your member accounts as well, but we don't share it across customers.

These IPs and domains give us the ability, based on the DNS logs or the flow logs, to identify communication with known infected hosts and command-and-control servers. Lately we've been seeing a lot of Bitcoin activity from infected machines, and this allows us to raise findings based on those.

The other
thing that we're looking at is unknown threats. Unknown threats are important because we don't have a pattern to match them against, but within AWS, if we do all of the baselining and apply machine learning to those logs, we're able to identify behavior that is sufficiently outside the norm that it warrants investigation. We're looking at four things here: unusual ports, unusual API activity, unusual logins, and unusual data transfer out that could indicate data exfiltration.

One piece of caution that we've learned over time: when you enable the GuardDuty service, the threat intelligence findings start working automatically, because we're applying the feeds directly to the logs. The behavioral detections do take some training time. When you first enable the service you're not going to start getting behavioral findings immediately; it takes up to 14 days for us to build a proper baseline so that we can generate these findings without causing significant false positives. So one word of warning, and we've seen people do this: don't turn on the service and then start doing unusual things. One, you're not going to get a finding for it, and two, you're now putting that into our models. Make sure you wait a while before you try to trigger any of these findings.

When we look at the types of findings from a compromise standpoint, they fall into three broad categories. The first one is reconnaissance. These are things that take place before a successful attack. Recon would be port probes from known scanners or known malicious IP addresses on the internet; instances within your environment that are doing port scans, looking for other instances that they might be able to move laterally to; or network permission or resource permission discovery. That last one is API activity where somebody is trying to figure out what permissions they have or what the network security group
or network ACL rules are set to. Again, this is activity you would do if you weren't familiar with an environment but wanted to figure out what you could do with the credentials you've obtained.

The second two are really more post-compromise activity. The first one is reconnaissance, before something bad has happened; with the last two, something bad has already happened and we're detecting behavior that is indicative of an actual compromise. From an instance perspective it could be things like denial-of-service traffic. It's important to note that this is not inbound denial of service; if you are looking for inbound denial-of-service protection, that would be the AWS Shield service. What we're looking for here is traffic from your instance that looks like it's part of a botnet participating in a denial-of-service attack. There's command-and-control activity, meaning communication with IPs or domains that we know are hosting active C&C for botnets. There's Bitcoin activity, again one that we see quite a bit of. Another one I'll mention is outbound Tor traffic. The Tor network is a way to anonymize your browsing. If you're running a public website, having inbound Tor traffic might be perfectly okay for you. However, there are not usually a lot of good reasons to make outbound Tor connections from within the Amazon environment. If somebody is on that box trying to communicate outbound in a way that's anonymizing, that's generally something you want to look at.

The third category is account compromise. This is activity that is generally going to be based on the machine learning models, so again, make sure you give us time to build these. These are changes to the network rules, such as security group changes, network ACL changes, or IAM permission changes, that are unusual or come from an unusual user or an unusual location, or changes to the security of the account, like changing your password
policy to make it weaker, or trying to disable things like CloudTrail. These are all things an attacker might do to try to hide their tracks, so we flag them as a potential account compromise and something you would want to evaluate.

We create the findings under what we call purpose classes. The threat class tells you pretty much what that particular attack is trying to do, whether it's a backdoor, cryptocurrency, recon, and so on. These classifications make it really easy to classify a finding when you get one. The taxonomy we use starts with the class, then the resource type that's affected, and then the actual finding name. This is useful to break down because you might have certain types of findings that you want to prioritize, for example, or send to different teams, and you can do that based on the filters that we'll talk about in a minute.

You can consume all of these through our console. One thing you'll notice in the GuardDuty console is the finding type, where you can see the threat class and then what type of resource was affected. On the right-hand side you get all of the details that are in the finding itself, and anything that has a hyperlink will bring you directly to that resource. Another important thing: in the far column you'll see a count, and that is one of the aggregations we do to help reduce the amount of noise. Anyone who has used a traditional network IDS is probably familiar with alert fatigue, where it just sends out constant streams of alerts. We really try to reduce the noise by doing aggregation. I'll give one example. You see the SSH brute force that has happened 36 times. Well, the simple fact of the matter is,
if you leave SSH open to the internet, it's going to happen a whole lot more than 36 times. It's going to happen pretty much continuously, and there's not a lot of point in us telling you every five seconds that someone new has come along and tried to hit that port. It's happening because you left the security group open. So instead of constantly streaming alerts to you, we send you one when it first happens, and then we let you decide how frequently you want us to tell you if it happens again. The default is six hours, and you can change it down to 15 minutes. It's much more useful for us to tell you once and give you time to fix it. If you haven't changed that security group, it'll still be happening, and we'll send another notification after six hours, one hour, or 15 minutes, depending on how you have that set up.

The other thing I'd call out in the console is the filter criteria. For most of our consoles, filters are just a UI experience. One thing that's unique to GuardDuty filters is that you can save them as auto-archive rules; that's why you can see it says "save filters / auto-archive". The auto-archive capability lets you decide that there are certain circumstances where you don't want to be alerted to something. For example, suppose I wasn't going to correct that security group or close that port down. Let's say that's my bastion host, I've hardened it, I know it's going to be open to the internet, and I'm accepting the fact that it's going to constantly get brute forced. I can create a filter based on the tags, the AMI, the VPC (you can get very specific) and the finding type, and save that as an auto-archive filter. When you save an auto-archive filter, we still create the finding, and you can view it by switching the console from current to archived, but importantly we won't send out a
CloudWatch event, so it will not go to your downstream alerting systems, your ticketing system, or whatever you have consuming the GuardDuty findings. You won't get those alerts. It's essentially a way of suppressing a particular finding based on a set of criteria, and you can be very specific with the filter bar.

The findings themselves are also available in JSON through our API, so you can call our API and consume them in code yourself. Most of the time, however, we see customers retrieve them through CloudWatch Events. CloudWatch Events, if you're not familiar with it, is a sub-service under CloudWatch, but it's essentially an event bus. GuardDuty, as well as other services, sends events to CloudWatch Events, and CloudWatch Events sends them on to other services; in this case we mostly use Lambda or Step Functions. This again is all aggregated, so when those events happen multiple times we don't send additional alerts except at the six-hour, one-hour, or 15-minute interval that you've chosen.

From a testing standpoint, you can also send sample findings to CloudWatch Events. In the console there is a way to generate a sample set of findings, and that can be really useful if you have automation on the other side of this CloudWatch event stream. You might want to test what every single finding will do in your automation, and generating the sample findings in the console lets you do that without having to manually trigger every single type of event.

So this is essentially what we end up looking at to take action on all of these findings. The typical scenario we see is GuardDuty sending to CloudWatch Events, which triggers Lambda or Step Functions. Those are pretty much the two typical ones. Some partners use SNS, but Lambda gives you a little more freedom as to what type of automation
you want to perform.

Typical integrations that we have are upstream, for SIEMs or partner tools. You might have seen in the console that we had a link to a partner page; we have over 20 different partners that have built integrations with GuardDuty. Some of those consume or visualize findings, some provide additional context for findings, and some integrate with blocking technologies. A lot of partners have built really cool tools, and most of them are out on the show floor, so if you have time you can talk to them after the session.

It also gives you the ability to do some automated remediation or response. We'll look at a few different ways you can take these findings, consume them, and then take some sort of action to help mitigate or prevent a future finding from occurring. When we do this, we're going to look primarily at three other adjacent services. The first is AWS Lambda. If you're not familiar with Lambda, it is our functions-as-a-service offering: it allows you to run function code with zero infrastructure in your account. You essentially give us a piece of code and we execute it for you. The second one we're going to use is Systems Manager. Systems Manager allows us to take action on and get information off of an instance, because it has an agent that runs on the box: you can send it commands, ask it to query information, things of that nature. The third one is Inspector. Through Lambda we can trigger Inspector to do a vulnerability assessment, which could tell you, for example, whether an instance is missing a bunch of patches.

The way we tie these three services together is as follows. First, Amazon GuardDuty creates the findings. We do our aggregation and our noise reduction to make sure that we're not sending continuous streams of alerts. That goes into CloudWatch Events, and we're going to
trigger Lambda functions. Again, you can also use Step Functions to trigger Lambda functions for more complex integrations. From there we have all of our partner integrations, some automated responses, and really anything you want, because Lambda executes whatever code you give it. Anything you can express in code, you can take as an action.

Here are some examples that we'll go through; these are some of the more popular ones. For this first one we have also written a blog post on this use case, with some of the sample code that's used in this diagram. The concept here is that if you have an application running behind AWS WAF, there are two different types of threats we can think about. If I'm getting a port probe, let's say on HTTP or HTTPS, and the port probe is coming from a malicious actor on the internet, I might want to be able to block them. When GuardDuty sees that this is a known malicious IP, maybe one involved in scanning on the internet, we create that finding. The finding goes to Step Functions, which, because it's an inbound connection and it's a port probe, sends it over to an IP block list in AWS WAF, which then prevents that actor from any further communication. There's also a DynamoDB table on the back end that holds the set of IPs and the time. You configure how long you want to shun that IP address for, and the Lambda function will then remove those block list entries after they time out.

Another thing you might want to look at is modifying network ACLs. Security groups don't support negative rules, but network ACLs do. So if we see traffic going from an instance outbound to a command-and-control server, I might want to sever that command-and-control traffic by putting an outbound block on my network ACL and prevent any of the
instances inside that subnet from communicating with that IP address. This could also be done for outbound Tor traffic or outbound Bitcoin traffic; anything that's outbound traffic from an instance you can block using your network ACL. You can also block inbound traffic from those addresses if you want to block both sides.

If you're not ready to do blocking actions, another very useful thing we can do is use some of this automation to gather information for a human incident responder. If, instead of putting blocks in place, you want somebody to investigate what's happening on that instance, they're generally going to have a set of commands or activities that they perform immediately when they go look at a host. We can automate a lot of those and give that information directly to the SOC analyst or incident responder as part of the finding itself. When the finding is created, we have a Lambda function that does a couple of things. In this case, first, I might want to take a snapshot of that volume immediately. That gives me a point-in-time snapshot of the volume, which I can then attach offline to another instance that I use for investigation, so it's not running. I can also use Systems Manager to collect information off that instance for me. I could run scripts that give me the list of recently changed files, what processes are listening on which ports, netstat output to see what outbound connections are open, and the full process list. You can even use something like LiME to get a full memory capture. All of this can then be uploaded to an S3 bucket or sent to a queue, with that information associated with the finding. That way, when it gets to somebody who's going to do an investigation, a lot of the activities they would otherwise have done manually have been
performed for them, and they have that in front of them straight away.

The perfect end state, in a completely stateless environment, is even more automated. This is one where we see a lot of customers wanting to get to, but it's very difficult to do, and it relies on a fully stateless, auto-scaling architecture. If you have an architecture that is completely stateless, meaning there's no information on a host that is important to the application itself, and you deploy 100% with Auto Scaling, you can automatically remediate and replace instances. In this case, if I had a finding about an instance that was potentially compromised, I could detach it from its Auto Scaling group, which means that instance gets replaced by Auto Scaling, so you keep availability up. Next, you might want to check whether it has an IAM role, because if the instance is potentially compromised and has an IAM role attached, there's a potential that somebody could use those credentials maliciously. So we can remove the role from the instance. Then snapshot the volume again, so you have a forensic copy of what was on the instance. You can then replace its security group with one that denies all traffic, so that any process running on it can no longer communicate with or accept commands from the outside. Then attach a forensics network interface. The idea here is that you have a subnet inside your environment with a hardened host on it, and you attach an ENI in that subnet to the instance that only allows SSH traffic, for example. That way you can SSH safely into the instance, but it can't send any traffic back out; you only allow inbound SSH from the host that's going to be used to identify what was wrong with it. When you're all done, you can simply terminate it, and you're okay doing that because Auto Scaling would have already
replaced it. Again, this requires a very specific type of architecture before you can get to that state.

One thing I will say before you try to enable automatic remediation: a few things are really important. First and foremost, not every one of those actions is going to work for everything. You have to pick the right type of action based on the affected resource and the type of finding, especially for findings that are behavior-based: something that's unusual might not necessarily be malicious. So there are different actions you might want to take based on the finding type, but also on the resource that's actually affected. Pick the right one; don't try to apply the same action against every single account in your environment. I will say most of our customers are in the notify-and-ticket phase: in most cases you want to notify the resource owner or the IAM user, or open a ticket for somebody to investigate further. Isolation and terminate-and-replace are, again, much more aggressive actions.

But no matter what, one thing that is key to enabling all of these is having a good tagging strategy. A lot of times, especially when a security team is getting access to all of these findings and they're not the application owners themselves, if the tagging is missing or inconsistent they won't have the context they need to decide which action to take. Also, if the tagging is not consistent, you can't use code or automation to determine how you're going to route the finding. Even if you're only going to do ticketing, you have to know who to ticket: you need to know who owns that resource or what it is used for in order to properly route any sort of notification. So this is very important, and it has more uses than just GuardDuty. In general you will be a lot happier from both the security and
an operational standpoint if you have a well-defined and consistently enforced tagging strategy.

The third one: it's incumbent upon us as security professionals to work with those application owners to understand whether we can ticket them and whether we can isolate those instances. To understand what type of action to take, or how you should respond to individual findings, the security team needs at least some interaction with the application owners to understand what that instance does, or what that particular role or user is supposed to do. This is something we should do anyway; it's just good practice.

Lastly, I would say again: start with notifications and ticketing. That is where we see the bulk of our users right now. If you are going to move into an automated remediation framework, do so deliberately and slowly, to make sure you don't accidentally cause any availability problems.

So thank you for coming today. We have some time left, so if there are any questions, I will be over here on the side and happy to answer them. Please also make sure that you complete the session survey in the mobile app. Thanks. [Music]
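The routing advice in the talk (choose the action from the finding type and the affected resource, use tags to find the owner, and default to notification and ticketing) could be sketched roughly as below. This is only an illustration, not something shown in the session: the tag keys `Owner` and `AutoRemediate`, the `security-team` queue name, and the choice to treat only `Behavior:` finding types as behavioral are all assumptions, and the field names follow the GuardDuty finding event format as I understand it (`type`, `resource.instanceDetails.tags`).

```python
from dataclasses import dataclass

# Hypothetical tag keys; the talk does not prescribe any particular tag names.
OWNER_TAG = "Owner"
REMEDIATION_TAG = "AutoRemediate"

# Finding-type prefixes (the part before ":") treated here as behavioral,
# i.e. unusual but not necessarily malicious. This split is an assumption.
BEHAVIORAL_PREFIXES = {"Behavior"}


@dataclass
class Decision:
    action: str  # "isolate", "ticket", or "notify_security"
    target: str  # who receives the ticket or notification
    reason: str


def instance_tags(finding: dict) -> dict:
    """Flatten the finding's instance tags into a plain {key: value} dict."""
    details = finding.get("resource", {}).get("instanceDetails", {})
    return {t["key"]: t["value"] for t in details.get("tags", [])}


def decide(finding: dict, security_queue: str = "security-team") -> Decision:
    """Pick a response for one finding, defaulting to the least aggressive one."""
    tags = instance_tags(finding)
    owner = tags.get(OWNER_TAG)
    prefix = finding.get("type", "").split(":", 1)[0]

    # Without an owner tag there is nobody to route to, so the finding
    # falls back to the security team's own queue.
    if owner is None:
        return Decision("notify_security", security_queue, "no owner tag")

    # Behavioral findings always go to a human first.
    if prefix in BEHAVIORAL_PREFIXES:
        return Decision("ticket", owner, "behavioral finding, needs review")

    # Isolation only happens when the owner has opted in through a tag.
    if tags.get(REMEDIATION_TAG, "").lower() == "isolate":
        return Decision("isolate", owner, "owner opted in to isolation")

    return Decision("ticket", owner, "default to ticketing")
```

In a Lambda function triggered by a GuardDuty finding event, `decide(event["detail"])` would return the decision, and the handler would then open the ticket or start the isolation steps. Keeping the decision separate from the AWS calls makes it easy to review the policy with application owners before anything is automated.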
Info
Channel: Amazon Web Services
Views: 14,774
Rating: 4.9534883 out of 5
Keywords: AWS, Amazon Web Services, Cloud, cloud computing, AWS Cloud, AWS re:Inforce, AWS re:Inforce 2019, security, identity, compliance, cloud security, AWS security, cloud security community, learning conference, Detective Controls, Infrastructure Security, Data Protection, Incident Response, Governance, Risk, Compliance, security best practices, The Foundation, AWS re:Inforce 2019 Sessions, Session, FND216, Ryan Holland, 200 - Intermediate
Id: czsuZXQvD8E
Length: 36min 29sec (2189 seconds)
Published: Wed Jun 26 2019