(eerie music) (audience claps) - Uh, I'm Jonathon Poling. I'm a Principal Consultant
with SecureWorks. When I saw the agenda come out, I immediately looked for
where Hal was presenting. And I saw that I was
going right before him, and I was like, oh, thank God. Hal is a hard act to follow. He's done a lot for
people in this community. He likes to say he is
successful, you know, by standing on the
shoulders of giants. And I like to say I'm
partially successful by standing on the
shoulder of a giant who stands on the
shoulders of giants. So thank you, Hal, for a lot
of the work that you've done. I'm already off script. I'm like a minute
into this thing. Be cool, play it cool. All right, here we go,
let's jump right in. So, I am a huge proponent
of the Treat Yo Self model. All right. So, what we're gonna do today is give you a short little
overview of AWS Security Model, which is a little bit
of a boring thing, but is rather important
to understanding why we care about
this stuff in general. We'll go through a few items on how to set yourself
up for success, right. By logging, monitoring,
environment prep. And then, once you're
set up for success, you can then. Oh my God, that was weak. You've had a break. You should have your second
cup of coffee in you by now. Targeted Response,
Log Collection,
System Data Analysis. Right, so you want to be
prepared for this stuff, right? You don't want to go
in, and, you know, we have customers
come into us and say, you know, something happened
and we're not prepared at all. And it's even
worse in the cloud. So, let's start with the
Shared Responsibility Model. So, you can see there's
a clear delineation here. When we talk about
incident response in AWS, with clients, they say,
well, why should I care? Like Amazon secures the cloud. Isn't it secure by default? They're responsible for
security of the cloud, right. But we're not
operating the cloud, we're operating in the cloud. So, we are responsible
for everything basically above infrastructure. And this is super important. All right, so they provide
the infrastructure, infrastructure as a service. And this is really a
critical model to follow, which builds kind
of into the why. So, why should we care? Why should we care about doing incident response in AWS? Amazon's awesome. They do everything
amazingly well. They offer so many things. They give you so many tools
to do so many awesome things. But, they're really set up there to help facilitate
you doing response. They say, here's the tools. We'll help facilitate
you doing security, but we don't do
security for you. So, why would we
care about this? Why do we care about
doing incident response, and why would we have
a presentation on this? Isn't this all taken care of? No. Right, so does anyone
here use OneLogin? Their company's at
an enterprise level. Right, so. This is relatively commonplace, but from the recent
OneLogin attack, right. So attackers came in through
the AWS infrastructure and the attack started at 2 am PST. Right, so they're
not doing this in the middle of the day
when everyone's at work. Right. Created several instances
to do reconnaissance. They detected it and they
saw that within minutes they were able to shut it down and determine that
the keys were stolen. So within a couple of minutes, they were able to
instrument themselves and see what was going on. But, how many people
here operate in AWS? At all. Four, five. I don't want to ask this, but, how many people
operate in Azure? Okay, good. Wanted to make sure we
weren't outnumbered. So, how many of
you can say yeah, if an attack happened at 2 am I would be able to detect
it within minutes? Can anyone here detect
an attack at all? Hey, one, all right. So these are things you need
to be thinking about, right. There's a lot of stuff
that goes on in AWS. It's not on prem. It may not have all
the same notifications you have for on prem stuff. But this stuff's
happening, right. This is why we care about it. This is not some ethereal thing. So, this is the
core mantra here. Security is my
responsibility in AWS. All right. So, how is AWS
different from on prem? Like what are the differences, what are the core things
we need to look at? There's four major log sources that I'm gonna talk about today. And there's others. There's things for like
application load balancing, elastic load
balancers, cloud front, if you're hosting
websites, blah, blah, blah. But, the four main
essentials are CloudTrail, CloudWatch,
Config, and S3. And CloudTrail's essentially
API activity logs. Almost everything in
AWS is an API call that is recorded in these logs. CloudWatch is essentially
system performance monitoring. So you have, you know, your CPU/IO metrics, and
there's also OS and application logging
if you instrument it with the right agent. The Config logs are basically
resource configuration logs. So, what's been configured, how it's been
configured over time, what that looks like over time, how that historically
has changed. And S3 is essentially the
data storage area for AWS. So, these are called
buckets, the main things that you put data into in
S3 are called buckets. And everything else is
essentially an object. And then there's
server access logs, which we'll get to a little
later in the presentation. So we'll start with CloudTrail. So this logs essentially
every API call, every action that's
taken in your account. Like this is your
main log source. You can essentially think of
this as the Windows event log. Or syslog, or you
know, dmesg for Linux. This is the critical
core component of AWS where you're gonna get the
most bang for your buck, and almost essentially
everything is gonna be recorded. So, what might this help you do? You can figure out
who did what when, you know, where they took place. It has source IP, date, time,
region, things like that. Who made these calls, user
accounts, things like that.
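Just as a sketch of that kind of lookup from the CLI, assuming the AWS CLI is configured (the user name and times here are illustrative):

```
# Who did what, when, from where: look up CloudTrail events for a user
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=Username,AttributeValue=Alice \
  --start-time 2017-03-18T00:00:00Z --end-time 2017-03-19T00:00:00Z
```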
CloudWatch, performance metrics. So you can use this to say, oh, you know, have I seen a CPU
spike, you know, was there a spike at 2 am, why? We're not operating
during that time. That sort of thing. Various high level metrics
from a performance perspective. CloudWatch also
has an agent that you can install on
each host or instance, as they're referred to in AWS, which will also allow
you to extract and analyze
host level logs. So, CloudWatch
itself isn't going to be seeing your
Windows event logs, your syslogs or
anything, but if you instrument them with
an additional agent, they can be forwarded to
CloudWatch for monitoring.
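Just to sketch both sides of that, assuming an agent is forwarding OS logs (the instance ID, times, and log group name are illustrative):

```
# Performance side: pull CPU utilization around the window of interest
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time 2017-03-18T01:00:00Z --end-time 2017-03-18T04:00:00Z \
  --period 300 --statistics Average

# Host log side: search forwarded OS logs for failed SSH logins
aws logs filter-log-events \
  --log-group-name /ec2/linux/secure \
  --filter-pattern '"Failed password"'
```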
Config, again, this is the configuration history for, you
know, instances, VPC is essentially a VLAN, if you
want to think about it that way. EBS volumes are basically
the hard drives, and Security Groups
are essentially the
firewalls of AWS. So this captures periodic
snapshots of config status: everything is configured this way at this point in time, and then, after some period, it'll take another. You can also do them manually to get them quicker. And so, you can
capture snapshots of resources in time,
you can diff these. And then you can see
kind of over time how things have
been changed, right. So this would be
super useful, right? If something's changed
in your infrastructure, Config can be a great
resource to look at.
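Just to sketch what pulling that history looks like from the CLI (the resource ID is illustrative):

```
# Grab the recorded configuration history for a security group,
# then diff the returned configurationItems over time
aws configservice get-resource-config-history \
  --resource-type AWS::EC2::SecurityGroup \
  --resource-id sg-0123456789abcdef0
```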
S3 is again, the main data storage area in AWS for most of
your data and info. So, there's
bucket-level logging, which is essentially
have you created any of these major
buckets to put data into, and they can live in
different regions. And there's additional logging. So, there's object
level logging. And again, everything
is an object in S3. So. If you ever see S3
and you get a link to something in S3
or it's hosted in S3, you'll see something that
looks like a file path. Like, you know,
bucket/this/that, and you're like, okay so it's
kind of like a file system. I'm like, no, it's objects. And it's like, but they're files and folders. Yeah, they look like
folders, but you know, they're essentially objects. And it's like, okay, well can I do a make-dir
on the command line? And you know, make
folders in there? And they're like, no,
they're not directories. They're not folders. They're objects. It's like, okay, I get
it they're objects. And then you log
into the S3 console, and all the, you know,
the high level objects are displayed as folders,
and I'm like okay. I don't follow you there. But, as long as you
can get past that, you can kind of understand
these a little bit better. And there's server access logs. So, there's things that
are hosted in there, and then there's
also the ability to have the server access logs. So, if the JSON logs, which
is the default format, isn't enough for you
for the log level, or the bucket level logging
or the object level logging, there's also server access logs, which are probably what
more people are used to for, you know, the HTTP,
Apache-like server access logs. Which can be super useful. So those are the
main log sources that we're gonna be
talking about today. Again, there's
others, but these are kinda the four core for
this high level overview of starting incident
response in AWS. So, what should we
monitor in general? I started getting
into specifics, and then I realized
there's basically an entire presentation
I can do on specifics on what to monitor,
and so, that's probably gonna be a follow on
presentation at some point. But, for now, this is just
kind of a general overview of okay, here's
kind of what things break down to and what you'd
want to be looking for, and what to monitor. So, environment
enumeration/recon. Like, attackers want to find out
what's happening in there, and how do they find out
that information, right? In CloudTrail these APIs
are gonna look like, you know get things, list
things, blah, blah, blah. List permissions, get
environment resources, things like that. Resource event data collection. These are gonna
be the same gets. They're gonna be describe, list,
lookup, that sort of stuff. You want to maybe monitor
for resource creation, mod, and deletion, right? You know, those things
are pretty critical. And also log tampering
and modification. Which we've seen
attackers do as well, which can be pretty nasty. So, this is kind of a
general overview of, if you're looking
at CloudTrail logs, and you want to know which API
should I be monitoring for, this is kind of how
they break down into what these prefixes
are associated with. So. Again, I started going
down a rabbit hole of, I had like seven slides
just for these specifics, and it was just
gonna take too long. So, essentially
it boils down into the Quadfecta of
Monitoring, right. So you have Config, which is a configuration
history of your resources. You have CloudTrail, which
is your main resource for all the API calls
that happen there. You have CloudWatch, which
is system performance, and also OS and
host level logging. And then you have Lambda. So Lambda is essentially
like serverless: run this function, you know,
in various programming languages. You can do it in Python
and Node.js or whatever. Run these things, you
know, as something happens. Be like the catalyst
to do something once you see something change. So, with this quadfecta,
this is basically what you're gonna be focusing on for doing real time
monitoring and analysis and alerting on stuff. So, you know, there's
tons of services and API calls,
but my best advice just to start at least at
this high level review, is you know, and this
isn't unique to AWS, but know how your systems work. Know what's normal. And know what's not normal. Read the AWS docs. I spent probably six
months just reading through the AWS docs. It's some of the
best documentation I've ever seen in my life. Absolute kudos to
the Amazon team. But it is an absolute bear. There is a ton of
material, and as soon as you understand it, they
add some new feature. Which is great, but you
gotta be on your toes to keep up with this stuff. So, think play, test. There's a lot of stuff
that goes on in AWS as you start getting
into the environment and saying, oh, I just did this, I'm gonna pull the
CloudTrail logs and look and see what happens. There's gonna be
tons of things that you didn't realize were
happening in there. And there's gonna be information you expect to find
that might not be. There's all sets of nuances of, you know, if you assume a role, then that data used
to not be associated with the target
account, that used to only be associated
with the source account, and blah, blah, blah, anyways. There's a lot of
stuff to go into. But, play, test, you know,
think like the attacker. Knowing my environment,
if I came in here, and I did x, y, and z, how
would I detect that, right? So, here we are in the setup
for success phase still. We're trying to
say how do we set ourselves up for success,
how do we prepare to respond should
something happen? And one of the things
that I like to have our customers set up is
an isolation environment. And this is one of
the awesome things about operating in
the cloud, and AWS makes it super easy. What I like to do
is have customers create an isolation
VPC, and it's basically a VLAN that's totally segmented
off from everything else. And you know, give it a
route to the internet, and that's it, right. So then, when something happens, you can take a
system and place it in this environment and
have containment, right. Not so easily done on prem. But, a lot easier done in AWS. There's also this thing
called Flow Logs in AWS which is basically netflow
that is enabled by, like, point and click
which is an absolute Godsend
in a lot of situations. So you create an
isolation VPC environment, set up Flow Logs,
set and forget it. Just leave it there. It's not gonna take
much resources. Not gonna fill up any sort of space in your S3 bucket. But it's set up there so
that when something happens it's a space where
you can put stuff, and you know, have
containment done. And you're not cutting
off network connections, you're not severing
anything, right? Pretty awesome resource
and capability for AWS.
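Just as a sketch, turning Flow Logs on for that isolation VPC is basically a one-liner (the IDs, log group, and role ARN are illustrative):

```
# Enable netflow-style logging for everything in the isolation VPC
aws ec2 create-flow-logs \
  --resource-type VPC --resource-ids vpc-0123456789abcdef0 \
  --traffic-type ALL \
  --log-group-name isolation-vpc-flow-logs \
  --deliver-logs-permission-arn arn:aws:iam::111122223333:role/flow-logs-role
```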
What I also recommend is creating Responder accounts, and these are separate from
your main operating accounts. These are essentially
the IR roles that you're gonna use to
respond to events, and that don't necessarily
have admin access to stuff. The base setup is probably just good to have a read-only admin, so you can at least read data, you can get the logs,
you can collect stuff, but not necessarily
make changes. Unless you need to. Again it depends on
your environment. But a good base to
start is just read-only.
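A minimal sketch of wiring that up, using AWS's managed read-only policy (the role name here is illustrative):

```
# Give the dedicated responder role read-only access across the account
aws iam attach-role-policy \
  --role-name IR-Responder \
  --policy-arn arn:aws:iam::aws:policy/ReadOnlyAccess
```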
And then create a dedicated S3 bucket for all this collected data. So, have like a,
you know, a lab, or acquisition, or
whatever bucket set up, with the appropriate
permissions. Say there's three
responders on your team. You have a responder account. These accounts all have
this responder role for read-only access
and you have a bucket that's for, you
know, investigations. Read-only for these people. No one else can see it. Pretty good setup. And if you recall,
the Security Groups is basically the
firewalls setting. So, for these instances, and for these
accounts, basically allow access only
in from where you know people are coming from. So, for us, we work remote. So I know the IP or
IPs that I'm usually coming from when I'm
doing investigations. So for my instances I set it up that I allow SSH in, but
only from my known IPs, right, you want to
lock stuff down. And super easy to do
this sort of stuff. So, highly recommend locking
it down to source IPs. You don't want people
brute force attacking your response instances
and somehow getting in, and then having admin
read access to everything. That's a bad setup, right?
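Locking that down is, again, basically one call (the group ID and source IP are illustrative):

```
# Allow SSH into the responder instances only from a known source IP
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 22 --cidr 203.0.113.10/32
```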
Then I recommend also having a couple responder instances. And these are the
ones from where you're gonna do your
forensic analysis, or your investigations, right. So, have them up and running. Have like three or
four ready to go. Maybe keep them shut down,
or keep them running, whatever you want to do. You know, they
cost when they run. But have them set
up and ready to go, so that you can SSH
directly into them, and start pulling data as
soon as something happens. So, create a couple volumes,
one for your root data, one for data, data drive,
depending on how large, or how much data
you're gonna deal with. If you like doing stuff locally, and you want to
pull all the data down locally to the data volume, you know, create appropriately. Then add all your
favorite tools. Personally, I do most, if
not all my investigations with free, open
source security tools. So, it's rather
easy to instrument an instance for me
in AWS, with Fedora and the CERT Forensic
Repo, and I'm good to go. The problem starts coming when you have, say you want
to operate from Windows, and you use EnCase. Like, it's not trivial
to go and set up, you know, EnCase
instrumented in instances in Amazon with the
proper licensing, right. It's not easy to do that. So if you are going to do that, please spend the
time ahead of time. The last thing you
want to be doing is trying to spin
up an instance. You have a compromise
and you're like, uh, I can't figure out how to attach this physical dongle
to an AWS instance, right. So, having set yourself
up for success, you can enter the
treat yourself phase. Targeted Response, right. So we have all this set up, we have the capabilities,
we're ready to go. Now we're ready to collect
memory and disk images. Amazon does a lot of
things really well, and they provide
capabilities to do what used to be complicated
things super easily. Memory collection
is not one of them. It surprises me. There are tools to
help you automatically
SSH into instances, load kernel objects for the
right kernel and collect it. But, it's not, it's not pretty. So, unfortunately,
memory collection is gonna be not too dissimilar from how you do it on prem. It can be automated with those managed instances/run
command, if these instances are
already instrumented with an agent that allows you
to do these sort of things. So that's something to
think about ahead of time as you're deploying
your infrastructure. Do you want to
have agents running on these systems so that you can reach thousands of them at once? Or are you gonna do this
manually, SSH into them. Or if you're using Windows,
you're gonna be RDPing into them, what have you.
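Just to sketch the agent route, assuming the instances already have the SSM agent on them (the instance IDs and the LiME-based capture command are illustrative):

```
# Fan a memory-capture command out to many instances at once
aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --instance-ids i-0123456789abcdef0 i-0fedcba9876543210 \
  --parameters 'commands=["insmod /opt/lime/lime.ko path=/tmp/mem.lime format=lime"]'
```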
A really cool thing though is you can image directly to an S3 bucket. So, you don't have to
image local to the machine. You don't have to
necessarily connect anything, unless you are in
Windows, then you have to use some sort of third party
tool to map your S3 drive. In Linux you can just
use the AWS CLI tools to image directly to it.
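A sketch of what that streaming looks like on Linux (the device, bucket, and key are illustrative):

```
# Stream an acquisition straight to S3 without staging it locally
sudo dd if=/dev/xvdf bs=1M | aws s3 cp - s3://ir-evidence/case-001/evidence.img
```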
Disk image is super easy. No longer do you have to log in, or do a DD, or anything like that. You literally just
right-click on the volume that you care about
and do create snapshot, and it creates a bit for
bit image of the drive. It is absolutely fantastic.
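The CLI version of that right-click is just (the volume ID and description are illustrative):

```
# Snapshot the volume of interest; EBS snapshots are block-level copies
aws ec2 create-snapshot \
  --volume-id vol-0123456789abcdef0 \
  --description "Case 001 evidence - web server root volume"
```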
Right, so once we've collected images, now we can isolate
and move these to the VPC, right. So, just to kind
illustrate this. We're not talking, okay, you
have an incident on prem. Okay, where is the
system located? Is the user onsite? Can we reach them? How are we gonna
image this machine? Oh, it's a laptop, do we have to have them come back
into the office? Or, oh, it is a desktop,
and it's down here. Okay, bring your
Write Blocker down, we're gonna go image
it live, and make sure we have the tools
and instrument it. I mean, you're talking orders
of magnitude difference. You're talking like hours
and days to response for this on prem sort of stuff. And you're talking minutes with disk and memory image
collection from AWS. And isolation, right. I mean, how hard is it to
achieve containment on prem? It's an absolute nightmare. And here, with the proper setup, you collect the images,
you move it into isolation, and by the way, it's still
talking to the internet, so you haven't severed
internet connections, you're still monitoring
what it can do. You can collect IOCs,
you have Netflow logging. You can then take those IOCs, have Netflow logs for
the rest of your network, see if it's happening on
the rest of your network. Really awesome
capabilities here. So aside from the actual
system data itself, how do we collect these
logs that we talked about? So, most of these
logs are stored in S3. So CloudTrail, the Config
logs, the CloudWatch logs. So, if it's all
instrumented correctly, you log into a single S3 bucket and all you're doing
is just logging in and copying all of
the relevant files for the days of interest. So CloudTrail is
broken down by day. You're like, oh, it was
March 18th or something, log in, go to 2017,
3/16, blah, blah, blah, collect 3/17,
3/18, and then it's just a matter of
doing the analysis.
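As a sketch, with CloudTrail's standard bucket layout that collection is just a couple of copies (the bucket name and account ID are illustrative):

```
# Pull the days of interest; CloudTrail keys are broken down by date
aws s3 cp s3://my-trail-bucket/AWSLogs/111122223333/CloudTrail/us-east-1/2017/03/17/ . --recursive
aws s3 cp s3://my-trail-bucket/AWSLogs/111122223333/CloudTrail/us-east-1/2017/03/18/ . --recursive
```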
So, when you create a snapshot from a disk image, remember it creates
a bit for bit image, but then how do
you analyze that? So, you have that sitting
somewhere, but now what? Well, remember we created
those responder instances, we had those up and running. Well, all we need
to do is again, right-click on the
snapshot you created, create a volume. Right-click, attach
to an instance. Then image out, if you
want an actual image to store somewhere or
download to your network for storage or chain of custody, or whatever you want to do. But it's that simple. I mean, it's directly
attached to your machine. You've made a couple of clicks. And it's immediately attached
and ready for analysis. Like, not walking a
drive down anywhere. Not transferring an image
over the network, right. It is immediately available.
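Those couple of clicks, sketched from the CLI (the IDs and availability zone are illustrative; the volume has to be created in the same AZ as the responder instance):

```
# Turn the evidence snapshot into a volume, then hang it off a responder
aws ec2 create-volume --snapshot-id snap-0123456789abcdef0 \
  --availability-zone us-east-1a
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
  --instance-id i-0123456789abcdef0 --device /dev/sdf
```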
One thing that is not initially evident, and that can be a royal pain, is that when you first attach an image or a volume created
from a snapshot it may perform at less than
50% of expected performance. And we find ourselves
in a conundrum here when we basically try
to take a DD image, say I attach it and I
want to create an image elsewhere for other
people to work on, or something like that. Because Amazon's advice is. Well, their statement is
it needs to be initialized. So, every block on the
volume has to be touched before it can reach
its full performance. For whatever reason, whatever
they do on the back end, okay. And their
recommendation is well, we recommend FIO or DD on Linux to touch all of those
blocks and do that. But, we're doing DD. So it's kind of a
catch-22, right. So, be aware that you may
have an absolutely fast SSD, highly optimized IOPs
volume that you attached, and it's just giving
you crap performance for imaging or
accessing it, right. And it will, until you've
touched all the blocks. Just a pro tip that
wasn't evident for a long, long time, but now they
actually have it in their docs.
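Their documented initialization looks something like this (the device name varies by instance):

```
# Touch every block so the volume reaches full performance: dd...
sudo dd if=/dev/xvdf of=/dev/null bs=1M
# ...or fio, per the AWS docs
sudo fio --filename=/dev/xvdf --rw=read --bs=128k --iodepth=32 \
  --ioengine=libaio --direct=1 --name=volume-initialize
```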
And memory analysis isn't dissimilar from traditional on prem. You can read it
directly from S3, or copy it to your local
drive for analysis. Analyze it like you
would any other image. So log analysis options. So, you've got the
disk and the memory, we walked through how to do
all those for log analysis. So, for the manual
command line gluttons in a lot of us, or some of
us, or maybe very few of us. I don't know how
we break down here. Personally, I love it. So for me it's just a
combo of just gunzip, zcat, sift, and jq. So how many people
have heard of jq? Man none. Okay. If you've worked with JSON
in any length of time, jq is your absolute best friend. So, it can take a JSON
blob and intelligently transform it with
filter queries. So, here's kind of an example. Here's kind of a
snippet of a log I took from
CloudTrail, and this is what it would look like. So here's a stop
instance API call. This is what you'd
see in CloudTrail for when someone
right-clicks or does from the command
line stop instance. So. All we're doing is running JQ. You can see you have
everything as a record. So it's a record. The first one,
identity and user name, and boom, we get Alice. Or we want to extract
the source IP address. So, if there were
a bunch of these, and a bunch of
records, it would be record one, two, three,
four, blah, blah, blah. This is gonna be your best
friend from the command line, for parsing CloudTrail logs.
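Just to sketch those exact pulls, assuming a gzipped CloudTrail file (the file name is illustrative):

```
# Who ran the StopInstances call? First record, user identity, user name
zcat cloudtrail.json.gz | jq '.Records[0].userIdentity.userName'
# "Alice"

# And where did it come from? Extract the source IP address
zcat cloudtrail.json.gz | jq '.Records[0].sourceIPAddress'
```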
Or, for analysis courtesy of AWS, meaning courtesy of your wallet, you can extract into
Elastic Search for analysis. You know, if you have a cluster, or you have a Kibana
interface, no different. You can use Kinesis to send logs directly to your cluster,
that's what some people do. Something came
out called Athena, relatively recently, I
think a couple months ago. So before you used to have
to send the data somewhere, take it somewhere out of S3,
do something to work on it. But, Athena actually
allows you to perform SQL queries
against data right in S3. No taking it anywhere. No having to copy it off, put
it in another tool, nothing. Pretty awesome. So, query to your
heart's discontent, depending how much you like SQL. But it is limited
to a few regions.
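A sketch of kicking one of those queries off from the CLI, assuming you've already defined a table over your CloudTrail bucket (the table, fields, and output location are illustrative):

```
# Ask Athena who stopped instances, querying the data right in S3
aws athena start-query-execution \
  --query-string "SELECT eventtime, useridentity.username, sourceipaddress FROM cloudtrail_logs WHERE eventname = 'StopInstances'" \
  --result-configuration OutputLocation=s3://ir-evidence/athena-results/
```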
So, what do we look for when we're doing analysis? And this is another
talk that I think I'll have to give
completely separate, because there's a lot of
specific things to look for. But, overall, you know,
start with CloudTrail. Something happened, okay. So, the OneLogin attack. So something happened. Instances were created,
recon was done, they can start looking
for who's looking for
the identity, right. So there's an API call, GetCallerIdentity, that gets the current identity
of who I am. Very few people are
gonna be doing that. Now, some tools do it. So that's where you'll
have to separate the wheat from the chaff,
but think like an attacker. Like, when I get in,
I'm like who am I? Right, it's the equivalent
of the who am I. So, look for the calls
that are requesting what identity am I. Look for a lot of
successive gets or describes, or
list sort of stuff. And with that,
you'll be like okay, I see a user name
associated with it. Or I see a certain role, or
something that was created. Or you know, I have
a source IP, use it. It's a star pattern
approach, right. You know, same DFIR
analysis skills, but the CloudTrail
is gonna be kinda your main source of doing that.
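Sketching that hunt with the same jq approach from before (the file glob is illustrative, and GetCallerIdentity is the actual CloudTrail event name for that "who am I" call):

```
# Surface "who am I?" calls and a few pivot fields across a day of logs
zcat 2017/03/18/*.json.gz | jq '.Records[]
  | select(.eventName == "GetCallerIdentity")
  | {eventTime, sourceIPAddress, user: .userIdentity.userName}'
```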
CloudWatch, if you had alarms set up for resources, look for any
anomalies and spikes. If you had OS host logs
instrumented with CloudWatch, check those out. For Config see how
things transpired over time, chronologically,
what happened. If you have a time frame of when you started seeing
anomalous activity, well, what was changed and when? Right, this is a
great tool to say, oh, this config was
changed, or this volume was created or added
to this instance, or these instances were set up. Use it to your advantage,
identify anomalies there. And S3, review object accesses. So, maybe they want
to get data out. Maybe they set up another bucket and that's how
they're doing exfil. They made it publicly available, and now they're doing
all sorts of stuff. Well, see what
buckets were created. What did they get? What did they put there? What did they delete? Right. So that's kind of a high
level overview of everything. I could do a ton
more material here. I could talk for days about it. But, suffice to
say, AWS provides a lot of these
native capabilities for performing DFIR, right. I mean, so. You know, I was asked
a couple months ago at our company, is AWS complicated? Why should people move to it? And I'm like, I wish
everyone would move to it, just because I would take
a default AWS install any day over a default
on prem install for doing any sort of incident
response or forensics. Because by default,
CloudTrail logs are enabled. You can see what's going
on in the infrastructure. Like for any of us
that are consultants, or even in your own
company, like how often does something happen
and you're like, oh, we never got that set
up or that wasn't enabled. Or those logs weren't enabled, sorry, we didn't
get around to it. Could be a nightmare, right? But there is a
non-trivial ramp up. Like I said, it's easy
to spend six months down in the docs getting to
know everything really well. But at a certain
point, you know, DFIR methodologies for AWS
and on premises converge. So there's these different
logging sources involved. Well, once you get the data out, or once you have
the instances set up and you're doing analysis,
you know, nothing changes. You still have logs. You still have disk images. You still have
memory images, right. So, there is an initial
ramp up that's different. But, you know, at some
point they converge, and you're just being
your regular DFIR rockstar self after a while. So, what did we maybe learn? Maybe we learned a
couple new words. I don't know. Maybe you already
knew those words. But, like I said,
there's a couple more presentations I think that
are coming out of this, and I didn't realize
it until I started putting all this together. 'Cause there's a ton
of I think really useful material for everyone. But, hopefully this just
kind of whets our appetite and kinda gives you an idea into why we care about
it, what's going on, how super useful this can be, whether you're on the
fence and moving in, or not, or you're
already in there, and you just want to figure out how do I know if an
attacker is using my infrastructure right now? Because I never really looked. You know, maybe this is
the time to start looking. If you have a
$60,000 a month bill, it's gonna be really
hard to identify that $1,000 of that
was like because an attacker spun up an instance and started moving
data around, right? Like, these are the things
to start thinking about. And why we should care. Because this stuff's happening. (deep suspenseful music)