Deep Dive on Amazon GuardDuty - AWS Online Tech Talks

Captions

So hey, this is Nate, and we're going to start talking about how to find important information in the sea of wonderful information that you get from GuardDuty. We're going to go through this quickly, the intro at the very least. We're going to talk about some of the services used, and then we're going to get into actually looking at the different pieces and parts of the system.

Amazon GuardDuty was released at re:Invent last year, 2017. The goal of Amazon GuardDuty is continuous monitoring of your AWS environment. The intent is to be able to quickly and succinctly remediate issues that occur in your AWS environment for you. We'll get to some of the auto-remediation and whatnot; you saw that in the itinerary, I'm sure. What that actually means is that we have GuardDuty, a system that continuously monitors the environment through a number of pieces of information (we'll talk about that a little bit too), and from there we start to use CloudWatch Events and CloudWatch rules, and in this demo we'll also see a CloudWatch dashboard. To do that we have to use Lambda. So what's going to happen here is that we're going to have a GuardDuty finding, which is what we call our needles. The GuardDuty finding will kick off, we'll see an event occur, and when that event occurs the Lambda function will fire and put some information into our CloudWatch dashboard. So we're going to talk a little bit about GuardDuty now. Let's move on.

Enabling GuardDuty is really horribly simple. It really is a one-click type of thing. You know, it ends up being a couple of clicks if we're being entirely honest: you've got to go to the services screen, you've got to click on GuardDuty, and then you've got to click Enable GuardDuty. It is really an amazing service to enable. Keep in mind that you do get a 30-day free trial when you click on
GuardDuty, and there are a couple of reasons for that. Initially, when we talk about GuardDuty, we're going to talk about some defined pieces of information that it will use to figure out whether or not there's an issue with your account, whether or not it's finding a threat. These pieces of information generally are called definitions, and we'll talk about some of those definitions in the middle of this presentation. But it's important to know that after that 30 days is really where the machine learning kicks in. So initially we're looking at those predefined items, and after that first 30 days we're looking at some of the machine learning types of things. We'll talk a little bit about that later too.

So, continuous analysis: totally cool idea, really great implementation. What we're really working to do is to watch CloudTrail events, VPC Flow Logs, and DNS query logs all the time, and we're trying to get that information back to you as quickly as we can. Now think about how, historically, we looked at these things: maybe we were looking at some sort of mirrored flow of data between two NICs, maybe you were looking at your DNS query logs coming out of BIND, maybe you were looking at some different things happening in your environment. But what we look at in GuardDuty specifically is a limited subsection of information that gives us the best chance we think we can find to give you the information you're going to need to determine whether or not there's a threat in your environment.

So we're going to use VPC Flow Logs first. It's important to know what VPC Flow Logs are. VPC Flow Logs are not a packet capture. VPC Flow Logs are basically a concatenation of a session, a connection between two points. That can be two ENIs in your VPC, that can be two ENIs in two different subnets, a
whole bunch of things, or it can be an ENI, an elastic network interface, connecting to an outside system. That can be important, obviously, because if somebody's trying to exfil your data from your VPC, that's a problem and we need to know about it.

On that note, we're going to start looking at some DNS logs too, because as things start to change in an environment that's production, that's supposed to be appropriately static, that environment is going to start making DNS calls out to outside hosts. As those calls get made, we can actually see some data potentially be exfiltrated just through the DNS call. It's kind of an interesting way to exfil data. We're going to see some of the GuardDuty findings as we look through this that will actually show that data is being exfilled through the DNS query itself.

And finally, CloudTrail events. Now, CloudTrail events is a different switch, right? We've got VPC Flow Logs, we've got DNS logs; those are both networking things. And now we're going to talk about CloudTrail events, which is an AWS account sort of thing, so it's not really a networking thing. So we get this kind of weird dichotomy going on here (a triptych, I suppose), but still, you get the point. CloudTrail events are going to tell us everything that happens inside of your AWS account: someone making a new ENI, launching an EC2 instance, making an S3 bucket public, making a new user potentially, and some of the API calls that that user might be making.

As we look at the combination of these three things, we start getting into some really interesting stuff. For instance, let's pretend that we have an outside person who's trying to hack into your account, and somehow they managed to get hold of an AWS key or password. We're going to see a new IP show up in our VPC Flow Logs, there on the left,
that is making API calls showing up in our CloudTrail events on the right, and then we're going to see a new instance pop up, and that new instance is going to start doing things that our static production environment has never done before: making DNS calls. So you can see how all three of these things begin to flow together and help us make findings that we can then make intelligent responses to.

And that brings us to detecting threats. Just like we did in our little game a second ago, we detected a threat, or at least we pretended we did. There are some basic categories that we can talk about as we go through this, and it's important for you to understand that each of these items is unique. We'll talk about backdoor threats, pretty straightforward; it could be malware, it could be something inappropriate. Behavior: something that is different than normal, deltas in account activity. Cryptocurrency, which is my favorite; if you ever suddenly see that you have a cryptocurrency finding, that's probably a bad thing. Pentesting, which is an interesting one, frankly; we'll talk a little bit about that, not much. Recon: you'll see a lot of ports being probed. If anybody's ever run an intrusion detection system before, I'm sure you've all been probed by multiple IPs in different countries. Stealth attacks, always interesting. When we start talking about stealth, we're really looking at some of that crossing of boundaries; we're looking at something that's trying to hide its actions, so maybe somebody's turning off CloudTrail. Trojans, obviously being exactly what it sounds like. And some unauthorized activity; we'll see some Tor node questions there in a moment.

So we're going to talk specifically about findings and what they look like. As we look at a finding, keep in mind that we're looking at
something that's happened in your environment, and we need to talk about what that finding actually is. If we look at the left, we're going to see a piece of a GuardDuty screen. This is an actual finding inside of GuardDuty, and we're going to see that instance i-f, et cetera, is performing an outbound scan, which is probably a bad thing: it's trying to do a port scan against a remote target. If your system has never done a port scan before and you are not a penetration testing company, this is probably not a good thing. We're going to note that the severity of this is medium. It could be an OK thing, maybe it's not a horrible thing, but this isn't normal activity, right?

We're also going to see how many times this happened. As we talk about doing things that would help us detect threats, if something happens once, maybe it's a fluke. I think we've all heard this saying before: if it happens once, shame on you; if it happens twice, shame on me. So if we start seeing things happening more than once, that can be something we can act on as well. We go back through the threat types and different items there; we talked about that on the last screen.

But really the interesting thing here, and frankly the cool thing about GuardDuty, is that it's set up to output these events to a second system. So if you have a SIEM that's running on-site, or a SIEM that's running in the cloud, we're totally cool with taking these JSON-formatted findings, pulling them into that, and letting you deal with them both inside of that and also in GuardDuty, keeping in mind that GuardDuty is going to be able to link up and deal with some of that automated response that we'll get to at the end of this talk.

So let's talk about severity levels. We get to our red needle, finally. As we go back to that initial discussion about what this talk was going to be about, we talked about what to do about that red needle,
and as you can see, as with most intrusion detection systems, we've got low, medium, and high, high being red. We'll talk a little bit about this after we look at the demo; it's an interesting question how to deal with some of this. Low findings are going to happen. You're always going to have suspicious activity in a system, and that's OK. If you've got a web front end, for most systems you're always going to have somebody scanning your ports. That's one of those things you're going to have to be OK with. There are things we can talk about as far as architecture to remediate some of that, and you can talk about some of that risk, if you'd like to, with your SI most likely.

However, as we get into medium and high, we start to see some really interesting things. We start looking at activity that is not normal for a production environment to show. Maybe something happened, and now we've got a virus or malware inside of an OS that wasn't there before. How did that get there? What is that going to do? What kind of action should we take? That's an interesting question, and the answer is going to be unique to everybody. We'll get into what actions you can take in a minute in the demo, but keep in mind that actions for all of these really depend on what you're looking at and what you're doing. I'm going to tell you as a consultant that if you see cryptocurrency mining, that's probably going to be a high, and chances are you ought to burn that machine and make a new one. However, if you're a crypto mining company, that's probably a normal finding, and that's OK.

So these are what the threats look like when we look at programming against the API or programming in Lambda to begin to actually remediate those threats, keeping in mind (and we'll get into this real quickly) that we can filter on all of these different groups. As we look at these, some of them are really obvious here,
keeping in mind that the ones in blue generally are the ones that are actually going to be controlled or dealt with under machine learning. We've got some obvious ones, like I said. We've got Tor IP calls: if we start seeing some IAM users coming from Tor IPs, obviously that's probably not OK. You will pop up a medium finding on that one and probably need to go ahead and whitelist that individual's IP. If you're seeing stuff coming from a Tor IP, that's probably going to rotate, but you could if you needed to for the moment. Some of these are pretty straightforward: if you're getting SSH brute forcing, it doesn't really matter where it's coming from; that's going to be an obvious hey-we've-got-a-problem scenario.

The most interesting thing on this slide specifically is the fact that I just got off the phone with the GuardDuty GM, and he told us that we could go ahead and show off the bottom three here. GuardDuty's findings are always increasing, and we're doing our best to add findings as quickly as we can. We've got phishing domain requests, which is basically an email scam scenario. We have black hole traffic coming from DNS systems that are inappropriate. And we have some domain requests that are also that DNS thing we talked about before, where we're looking at some of the data going back and forth and being somewhat not OK.

As we go forward, we're going to start talking about the fun thing here: leveraging actionable alerts. We've talked about what the findings actually look like; you're going to see some of these basic things coming up. But now what are we going to do? We talked about this right at the very beginning: it goes from GuardDuty to CloudWatch Events to Lambda. Pretty straightforward stuff. We've all done a lot of this. I'm sure a lot of the operations guys in the audience have used CloudWatch Events in the past to automatically remediate something in their environment. We've seen all sorts of things, from patch
management to many other different SSM-style scenarios, where having a CloudWatch event kick off and then having a Lambda function actually go and do something is a great idea.

So these are our usual suspects. We have Amazon GuardDuty at the top left, and we see CloudWatch, we see Lambda, we see Systems Manager, previously known as SSM. I have left some of the specific names out of this because I want it to be more of a generalized review of what's going to happen. As GuardDuty kicks off, we're going to kick off a rule in Amazon CloudWatch, and it's going to say, hey, we have indeed something that's going on, and that something is going to be of interest or not.

As we talk about this, it's important for you to understand that this is generally what you would put into CloudWatch to find everything that's coming out of GuardDuty. That's fine, you can do this; this basically catches everything coming out of GuardDuty. Or you can do something like this, where we're only catching specific finding types. Here we're looking for malicious IP caller, and as we think about that table I showed a little bit ago, that's only one of the many finding types that we can match. That's kind of neat, because if we have a specific need we could go ahead and program a number of them. As you look at this finding type here, you see "type", a colon, and then a square bracket. That square bracket means it's a list, so I can put a comma here and put a whole bunch of different finding types here and act on those specific finding types. That's really nice, because then I'm really just using one rule, one event, to deal with what's going on.

So let's talk about what we do next. Next we're going to go to a Lambda function, right? Because that's what we ought to do. You'll note in the last screen that I showed a second ago that we also might have an SNS topic to email somebody and say, hey, we've got something
that's going on, and we'll get to what choices you might want to make there or not. But generally speaking, if you find a red needle, it's probably not a bad idea to inform someone that it happened.

So now that we're at the Lambda function, what are we going to do? Well, probably we're going to look at the description for that instance and see what that instance needs to do. We might find out that the instance has two network adapters; that's pretty normal, not a big surprise. It also has an EBS volume, of course, because it has to. Sure enough, the primary network adapter is looking at 80 and 443, and it's going out to data SG, so that means it would only talk to data SG, the database portion of its existence. That's really cool; that's what you'd expect. However, it also has 3389 open to the world. That's probably not what you were expecting. Shouldn't be, anyway.

So what would we do as incident response individuals? We've now got a finding in GuardDuty that's kicked off a rule, and we can do something with our Lambda function. What could we do? Well, we could wipe out that second NIC. Maybe that second NIC wasn't supposed to be there. It's definitely something we could do. It's probably not the best choice in this scenario, though, specifically because this instance still exists. So let's look at this scenario and pretend that we've got some sort of API finding that's saying, hey, we've got a Tor IP that's controlling the API, and all of a sudden we have an instance that's communicating on an ENI that it didn't typically have, or an SG has changed so that it's open to the world. Do we want to continue to treat this instance as if it's a normal instance? What do we want to do with it? Well, in this scenario we could just delete that ENI, or we could kick off something in Systems Manager, and what Systems Manager is
going to do for us is really jump into that instance and start looking at how that instance is functioning on the inside. We're not really doing anything to it right now; for the moment it may indeed continue to run like it normally would. We can run top, we can run all sorts of different Linux commands to get a list of the currently running processes and where things are. We can do a pcap: we can capture some of the packets coming and going from this machine and see what's going on. We can run LiME: we could do a memory capture of the system and figure out if there's anything interesting. And where would we put that? Well, we could put it on a forensics volume. That's one of those interesting things that comes up here: because this is a totally virtualized environment, we can pop an EBS volume on for a moment, dump all of our forensics data onto that EBS volume, and then make it go away.

So now we still have a functioning EC2 instance. We've removed the NICs from it, potentially, we've taken inventory of what's going on inside of it, and that instance continues to run. Obviously we can't let that happen; bad things could be happening on it. So let's burn it. We can kill that instance, and we can make a snapshot of that volume.

Now it's an interesting question: what do we do with this snapshot? I think it's a cool idea to have a snapshot; it's always a good choice. But generally speaking, we don't really want to leave it in the same account. Why? Because we'd have an EBS volume hanging out for somebody to accidentally find and do something with that they probably shouldn't. So what I generally would suggest is taking this EBS volume and moving it over to a forensics account. It's always a good choice, always best practice, to try to segment things that you think might be in some way, shape, or form contaminated to a
different place, a different account, so that only people who know what they're doing in the forensics account can interact with that potentially contaminated volume. So we'll copy it over there, and that's where this would end, right? Effectively, as things would go, we might go ahead and delete that forensics volume. One thing to note here is that we do have a forensics volume in both Availability Zone A and B, and there's a reason for that: we need to keep in mind that volumes exist in only one Availability Zone, so we're going to need to make sure that we have our resources split up.

Now let's get into the demonstration. A couple of things go into this that we need to make sure happen. You are going to need to be able to do something like CloudWatch put metrics, right? That's something you have to do. And we're going to need some sort of IAM role specifically for the Lambda policy. Now, there's a problem with this Lambda policy: I wrote it for the lab, I did not write it for production. One of the things you will notice is that the resource down there is a star. There are a lot of things we can do with this star that we probably shouldn't be doing. In a lab, not a big deal. In a production system, we might want to call out what it's allowed to do, or we might want to have another Lambda function that changes the IAM policy so that it's only able to interact with a specific volume or a specific ARN at one time. Just a thought as we go forward.

So this is the point at which we're going to jump to the demonstration. Here we find ourselves in the GuardDuty screen. GuardDuty looks exactly like this; this is just what it is: life is a bunch of findings. Keep in mind that GuardDuty was originally made so that this is not the only screen you would see. GuardDuty was made so that you would export these events to your
SIEM, to your CloudWatch dashboard, to somewhere else. So what are we going to do today? Well, we're going to look at these findings and figure out what we should do with them. The goal of this talk was, again: needle, needle, needle. It's just a whole bunch of needles.

So what happens? Well, we talked about this: they go to CloudWatch, right? So let's go over to CloudWatch and take a look at our events and our rules. You'll see that we have two rules here. We have one that actually deals with things, and we have one, written by Michael Chan, based on findings. It's going to catch everything that's going on and then dump it to the metric system.

As we get into our Lambda here, it's relatively straightforward. We're going to use some basic libraries that are available to us, Boto3 of course. We're going to set up some basic logging here at the top; always a good idea. We also have X-Ray that we could turn on here if we wanted to. You'll note that we're not going to do that for the demo, but it's always a good idea. We're going to set our logging level to INFO for now; we're just trying to keep track of what's going on.

And here we start getting really interesting. We can start looking at the different types of things coming through. We can start deciding whether we want to deal with a Recon, a Trojan, or an UnauthorizedAccess, and this is where we start wondering what we want to do. I feel creative when I see this part of the code, because it's a question of how we want to respond. Let's say I had a Trojan: what would you want to do with a Trojan? That's kind of a unique question. I'd probably want to burn it and build another one. And how do we want to deal with some of those other items? There's another slide at the end of this deck, actually the next slide after this one, where we'll talk about maybe splitting this up into groups, but
in this scenario what we're actually doing is building a set of items that we're then able to put into metrics, and these metrics then show up on our dashboard. If we go back to CloudWatch, we can see our metrics over the past week. If we look at just today, we see a relatively normal day. We had a little bit of a spike here at what looks like about noon Eastern, but other than that life has been pretty normal.

The cool thing about our metrics and the code that we have written is that as we see attacks, we start to see these counts go up. Keep in mind that if we just look at GuardDuty, at just our findings screen, it's kind of hard to tell exactly how bad something is. We've got a count number over here, and that's definitely something we need to look at, but it's hard to figure out whether some of these findings are really important or not. That's one of the coolest things about CloudWatch dashboards: as we start putting this together, we can look at this dashboard and figure out that, hey, we do indeed have a huge issue and we need to react to it.

So how do we want to react to it? What do we want to do? If we go back to the CloudWatch Events rules section, we've got the second one here. This is one of our lab items, and you'll note that there also is an SNS topic associated with this one, because, good lord, we're going to go do something to the environment. Please, please, please always communicate what you're going to do. We launch a Lambda function that's going to do a whole bunch of other things. This function is basically going to figure out whether this CloudWatch item is indeed important. Should we actually act on it? Is it indeed GuardDuty? If it's not, we're going to say, hey, this rule may not apply for this event. And if it is, what are we actually going to do then? We're going to kick
off a bunch of other things. We're going to look at that threat, we're going to decide whether or not there actually is a threat, then we're going to kick off this specific action, and then we're going to move on to a bunch of other things too.

So let's look at this other action real quickly. We have a couple of different options here. You saw that this is the one where we're pulling in that specific code, and as we look at this one, what are we doing? Well, we're going through and saying, hey, we're going to put in the forensics security group, because that seems like a good idea. We're actually not doing much else other than that: we're looking at this forensics group, we're creating the group, and then we're going to attach it to some specific ENIs.

Generally speaking, when I give this talk I would be talking to a roomful of people, and I would give somebody bonus points for figuring out why we did this to all of the ENIs, why we're doing a loop here, and why we're looking at all of the security groups. Keep in mind that each instance can have multiple ENIs, and each ENI also has the ability to have its own security groups on it. So we actually need to loop through all of our ENIs to make sure that we get all of our security groups, remove all of them, and then put on our forensics group. It's always a good idea to loop through all of the pieces and parts and not just assume that you have a primary one. Once we do that, we'll see that our instance no longer has the same security group that it had before.

So we can go ahead and do that real quickly. I just showed you the configured test event, and you'll see here that we have the instance ID right here. Keep in mind that this is totally a contrived event; it's something that I actually took out of the GuardDuty event that it sent over,
so this is actually a GuardDuty event; it's just customized for this particular test. What we'll see is that when we click Test, we'll actually attach the volumes to this system over here. If we look at our running instances, neither of them now has the block device. You've got to be kidding me. Let's go ahead and rerun that. The demo gods have decided that they don't like us today, and they are going to give us a failure. What is the failure? We're getting an "invocation does not exist". So what we're going to do for right now is comment this out and figure that out later. There we go. All right, one more time, because we just love pain. The forensics volume is currently unattached. I am going to make sure it's actually going to attach the volume.

All right, so we're going to take it from the concept of, hey, we're looking at this, and if we go down to the very bottom here, you can see the instance ID and a couple of other informational pieces. This is actually the GuardDuty event that comes out of GuardDuty, goes through CloudWatch Events, and then into the Lambda, so this is copied and pasted effectively from that. You can get that from GuardDuty or you can get it from CloudTrail. If we go over here and hit Test, we'll see that it will run, we'll get our success, and we should see that we've attached the volume and kicked off the SSM command. In a real environment, what we'd want to continue doing is monitoring the SSM commands and validating that they've actually run.

If we look over here in our services and go to EC2, we should then see our maliciously infected system over here and our compromised instance over here, and we should see that it does indeed not have a block device attached to it, even though the test passed and said that it did. The forensics volume did not attach to it. We can't win today, guys, I'm sorry. OK, I feel a little bit like
Groundhog Day. I would swear we just did this. That's really scary. All right, going back to configuring test events; we talked about this, so I'll cancel. As you can see here, what we're going to do is attach the volume, then we're going to print the result of that attachment, and then we're going to kick off an SSM command, and this is really the memory dump SSM script. You can see that we've also got a packet capture script there too; we're not going to do that today. As we hit Test, it goes ahead and sends out the information, we get a success, and we can actually see where we've attached the volume. We've got zero retries, and it did give us a 200, so we know it attached this time. We can see that we have parameters. All of this is good stuff to see in CloudWatch, or CloudTrail rather. As we go over here to our EC2 instances and take a look at the compromised instance, you will see that we now have the block device actually attached to it.

Now it's an interesting question why we're attaching the block device in Lambda, as opposed to using the API or the CLI from this instance. The reason is that, realistically, we don't want that instance to have anything other than the limited amount of stuff that it needs for SSM. We've got an SSM connection to it, and that's about it. The SSM profile here, this IAM role, is not allowed to add this specific forensics volume. The only thing that's allowed to add this specific forensics volume is the Lambda function itself. We don't want to elevate the privileges of a compromised instance; keep that in mind. We want to use the permissions of the Lambda function to attach that specific volume to this instance.

Once we've done that, we can go down here and take a look at Run Command, and we'll see what commands we had running. We'll see that it failed, and that's OK. We really didn't expect it to work in this demo; the point was
that we could make it work. You'll note that in this scenario the file doesn't exist and we've got the wrong FS type, and that's all right. This is really just a test scenario; we just wanted to prove that we could do it. We talked about remediation actions. So if we can do stuff like this, if we can do security groups, if we can do memory dumps, what do we want to do? If we wanted to do this, what would it really look like? As we look at this, one of my colleagues, Josh du Lac, suggested doing it this way, and I think this is probably a good concept. He's got it split up into specific organizations. We have things that you have to do immediately: we're going to remediate these regardless of what happens next. So if you have a Trojan and you're doing all sorts of nasty DNS data exfiltration, we need to take care of that right away, and that probably means doing something to the tune of burning that instance immediately and spinning up a new one. There are some items in here that you may want to handle differently. If you're more about attribution, if you need to know who actually did it, maybe you want to do some of the forensic stuff we just looked at prior to this slide. We have some other things as well. We have investigate before we remediate: we have some unusual port traffic, or abnormal traffic volume. Maybe we're getting more bandwidth out of a specific production server than usual; maybe somebody's rerunning something, or something's wrong with yum update and it's just spinning out of control. You don't necessarily need to remediate that EC2 instance; maybe you need to figure out why that happened. As we move down this list, we start looking at even less important things, or things that are different; let's call it different. Keep in mind that when we talked about this way back in the beginning, we talked about there being two levels of review in GuardDuty: we had items that were
happening on the network, and you kind of see that in column A, whereas in column C over here, or section C, we actually see things that happen in the AWS account. We don't necessarily want to remediate an instance if we see something happen in the AWS account; that might be something more along the lines of remediating keys, taking care of users that shouldn't be there, or getting rid of API keys that we need to roll. There are other issues in the C section that are going to be more along the lines of that scenario. We also obviously have instances being launched in unusual regions. That's probably something more along the lines of one of the individuals in your company launching something in a region that maybe they didn't mean to go into; it doesn't necessarily mean you need to rotate all of your API keys. At the very bottom here we have ports being probed. If you don't want your systems to be probed, then you probably need to look at adding some ELBs in front of your systems so that you can clean up some of that probing. So that's pretty much the talk as we've had it so far. The call to action here is: keeping in mind that GuardDuty is free for the first 30 days, why not just turn it on? GuardDuty is one of those things that really will help you see what's going on inside your systems. It'll help you understand where that red needle actually is and help you react to it more cleanly. I'd suggest you use it; I use it on all of my personal stuff, and I suggest that all my clients use it. This was Nathan Case. I hope you enjoyed this, and have a good day.
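
The triage idea from the end of the talk (remediate immediately, investigate first, or treat the finding as an account-level issue) and the Lambda-side forensics step can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code shown in the demo: the tier names, the mapping table, the `classify` and `plan_forensics` helpers, the device name, and the script path are all hypothetical. The only parts taken from AWS itself are the event fields (`detail.type`, `detail.resource.instanceDetails.instanceId`), the finding type strings, and the parameter names of boto3's `attach_volume` and `send_command`. The sketch only builds the request parameters, so it runs without AWS credentials.

```python
# Hypothetical tier names; the talk describes these groups but not any code for them.
REMEDIATE_NOW = "remediate_immediately"
INVESTIGATE_FIRST = "investigate_before_remediating"
ACCOUNT_LEVEL = "account_level_review"
HARDEN = "reduce_exposure"
MANUAL_REVIEW = "manual_review"

# Assumed mapping from GuardDuty finding type to tier (illustrative, not exhaustive).
TIER_BY_FINDING_TYPE = {
    "Trojan:EC2/DNSDataExfiltration": REMEDIATE_NOW,
    "Behavior:EC2/NetworkPortUnusual": INVESTIGATE_FIRST,
    "Behavior:EC2/TrafficVolumeUnusual": INVESTIGATE_FIRST,
    "Recon:EC2/PortProbeUnprotectedPort": HARDEN,
}


def classify(event):
    """Return the tier for a GuardDuty finding delivered via CloudWatch Events."""
    finding_type = event.get("detail", {}).get("type", "")
    if finding_type in TIER_BY_FINDING_TYPE:
        return TIER_BY_FINDING_TYPE[finding_type]
    # Findings about IAM users concern the account, not a single instance,
    # so they point toward rotating keys or reviewing users instead.
    if ":IAMUser/" in finding_type:
        return ACCOUNT_LEVEL
    return MANUAL_REVIEW


def instance_id(event):
    """Extract the affected instance ID, or None if the finding has none."""
    resource = event.get("detail", {}).get("resource", {})
    return resource.get("instanceDetails", {}).get("instanceId")


def plan_forensics(event, volume_id, device="/dev/sdf"):
    """Build the parameters for the two calls the Lambda function would make.

    The calls would run under the Lambda function's own role, so the
    compromised instance never needs permission to attach the volume.
    """
    target = instance_id(event)
    if target is None:
        raise ValueError("finding does not reference an EC2 instance")
    return {
        # Would be passed as ec2_client.attach_volume(**plan["attach_volume"])
        "attach_volume": {
            "Device": device,
            "InstanceId": target,
            "VolumeId": volume_id,
        },
        # Would be passed as ssm_client.send_command(**plan["send_command"])
        "send_command": {
            "InstanceIds": [target],
            "DocumentName": "AWS-RunShellScript",
            # Hypothetical script path; the demo's memory dump script is not shown.
            "Parameters": {"commands": ["/opt/forensics/memdump.sh"]},
        },
    }
```

In a real function, the attach call's response and the later status of the SSM command would both need checking, since the demo shows a run that reported success while the volume was still unattached.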

Info

Channel: AWS Online Tech Talks

Views: 32,182

Rating: 4.8327527 out of 5

Keywords: security, guardduty, intrusion detection, threat intelligence, analysis, siem, compromise, vpc, flow logs, cloudtrail, AWS, Webinar, Cloud Computing, Amazon Web Services

Id: o2YaIsps5LY

Length: 35min 35sec (2135 seconds)

Published: Tue Feb 27 2018