(eerie music) (audience claps) - Uh, I'm Jonathon Poling. I'm a Principal Consultant
with SecureWorks. When I saw the agenda come out, I immediately looked for
where Hal was presenting. And I saw that I was
going right before him, and I was like, oh, thank God. Hal is a hard act to follow. He's done a lot for
people in this community. He likes to say he is
successful, you know, by standing on the
shoulders of giants. And I like to say I'm
partially successful by standing on the
shoulder of a giant who stands on the
shoulders of giants. So thank you, Hal, for a lot
of the work that you've done. I'm already off script. I'm like a minute
into this thing. Be cool, play it cool. All right, here we go,
let's jump right in. So, I am a huge proponent
of the Treat Yo Self model. All right. So, what we're gonna do today is give you a short little
overview of AWS Security Model, which is a little bit
of a boring thing, but is rather important
to understanding why we care about
this stuff in general. We'll go through a few items on how to set yourself
up for success, right. By logging, monitoring,
environment prep. And then, once you're
set up for success, you can then. Oh my God, that was weak. You've had a break. You should have your second
cup of coffee in you by now. Targeted Response,
Log Collection,
System Data Analysis. Right, so you want to be
prepared for this stuff, right? You don't want to go
in, and, you know, we have customers
come into us and say, you know, something happened
and we're not prepared at all. And it's even
worse in the cloud. So, let's start with the
Shared Responsibility Model. So, you can see there's
a clear delineation here. When we talk about
incident response in AWS, with clients, they say,
well, why should I care? Like Amazon secures the cloud. Isn't it secure by default? They're responsible for
security of the cloud, right. But we're not
operating the cloud, we're operating in the cloud. So, we are responsible
for everything basically above infrastructure. And this is super important. All right, so they provide
the infrastructure, infrastructure as a service. And this is really a
critical model to follow, which builds kind
of into the why. So, why should we care? Why should we care about doing incident response in AWS? Amazon's awesome. They do everything
amazingly well. They offer so many things. They give you so many tools
to do so many awesome things. But, they're really set up there to help facilitate
you doing response. They say, here's the tools. We'll help facilitate
you doing security, but we don't do
security for you. So, why would we
care about this? Why do we care about
doing incident response, and why would we have
a presentation on this? Isn't this all taken care of? No. Right, so does anyone
here use OneLogin? Their company's at
an enterprise level. Right, so. This is relatively commonplace, but from the recent
OneLogin attack, right. So attackers came in through
the AWS infrastructure and the attack started at 2 am PST. Right, so they're
not doing this in the middle of the day
when everyone's at work. Right. Created several instances
to do reconnaissance. They detected it and they
saw that within minutes they were able to shut it down and determine that
the keys were stolen. So within a couple of minutes, they were able to
instrument themselves and see what was going on. But, how many people
here operate in AWS? At all. Four, five. I don't want to ask this, but, how many people
operate in Azure? Okay, good. Wanted to make sure we
weren't outnumbered. So, how many of
you can say yeah, if an attack happened at 2 am I would be able to detect
it within minutes? Can anyone here detect
an attack at all? Hey, one, all right. So these are things you need
to be thinking about, right. There's a lot of stuff
that goes on in AWS. It's not on prem. It may not have all
the same notifications you have for on prem stuff. But this stuff's
happening, right. This is why we care about it. This is not some ethereal thing. So, this is the
core mantra here. Security is my
responsibility in AWS. All right. So, how is AWS
different from on prem? Like what are the differences, what are the core things
we need to look at? There's four major log sources that I'm gonna talk about today. And there's others. There's things for like
application load balancing, elastic load
balancers, cloud front, if you're hosting
websites, blah, blah, blah. But, the four main
essentials are CloudTrail, CloudWatch,
Config, and S3. And CloudTrail's essentially
API activity logs. Almost everything in
AWS is an API call that is recorded in these logs. CloudWatch is essentially
system performance monitoring. So you have, you know, your CPU/IO metrics, and
there's also OS and application logging
if you instrument it with the right agent. The Config logs are basically
resource configuration logs. So, what's been configured, how it's been
configured over time, what that looks like over time, how that historically
has changed. And S3 is essentially the
data storage area for AWS. So, these are called
buckets, the main things that you put data into in
S3 are called buckets. And everything else is
essentially an object. And then there's
server access logs, which we'll get to a little
later in the presentation. So we'll start with CloudTrail. So this logs essentially
every API call, every action that's
taken in your account. Like this is your
main log source. You can essentially think of
this as the Windows event log. Or syslog, or you
know, dmesg for Linux. This is the critical
core component of AWS where you're gonna get the
most bang for your buck, and almost essentially
everything is gonna be recorded. So, what might this help you do? You can figure out
who did what when, you know, where they took place. It has source IP, date, time,
region, things like that. Who made these calls, user
accounts, things like that.
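Just as a sketch of that kind of lookup from the CLI, assuming the AWS CLI is configured (the user name and times here are illustrative):

```
# Who did what, when, from where: look up CloudTrail events for a user
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=Username,AttributeValue=Alice \
  --start-time 2017-03-18T00:00:00Z --end-time 2017-03-19T00:00:00Z
```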
CloudWatch, performance metrics. So you can use this to say, oh, you know, have I seen a CPU
spike, you know, was there a spike at 2 am, why? We're not operating
during that time. That sort of thing. Various high level metrics
from a performance perspective. CloudWatch also
has an agent that you can install on
each host or instance, as they're referred to in AWS, which will also allow
you to extract and analyze
host level logs. So, CloudWatch
itself isn't going to be seeing your
Windows event logs, your syslogs or
anything, but if you instrument them with
an additional agent, they can be forwarded to
CloudWatch for monitoring.
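Just to sketch both sides of that, assuming an agent is forwarding OS logs (the instance ID, times, and log group name are illustrative):

```
# Performance side: pull CPU utilization around the window of interest
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time 2017-03-18T01:00:00Z --end-time 2017-03-18T04:00:00Z \
  --period 300 --statistics Average

# Host log side: search forwarded OS logs for failed SSH logins
aws logs filter-log-events \
  --log-group-name /ec2/linux/secure \
  --filter-pattern '"Failed password"'
```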
Config, again, this is the configuration history for, you
know, instances, VPC is essentially a VLAN, if you
want to think about it that way. EBS volumes are basically
the hard drives, and Security Groups
are essentially the
firewalls of AWS. So this captures periodic
snapshots of config status: everything is configured this way at this point in time, and then, after some period, it'll take another. You can also do them manually to get them quicker. And so, you can
capture snapshots of resources in time,
you can diff these. And then you can see
kind of over time how things have
been changed, right. So this would be
super useful, right? If something's changed
in your infrastructure, Config can be a great
resource to look at.
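Just to sketch what pulling that history looks like from the CLI (the resource ID is illustrative):

```
# Grab the recorded configuration history for a security group,
# then diff the returned configurationItems over time
aws configservice get-resource-config-history \
  --resource-type AWS::EC2::SecurityGroup \
  --resource-id sg-0123456789abcdef0
```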
S3 is again, the main data storage area in AWS for most of
your data and info. So, there's
bucket-level logging, which is essentially
have you created any of these major
buckets to put data into, and they can live in
different regions. And there's additional logging. So, there's object
level logging. And again, everything
is an object in S3. So. If you ever see S3
and you get a link to something in S3
or it's hosted in S3, you'll see something that
looks like a file path. Like, you know,
bucket/this/that, and you're like, okay so it's
kind of like a file system. I'm like, no, it's objects. And it's like, but they're files and folders. Yeah, they look like
folders, but you know, they're essentially objects. And it's like, okay, well can I do a make-dir
on the command line? And you know, make
folders in there? And they're like, no,
they're not directories. They're not folders. They're objects. It's like, okay, I get
it they're objects. And then you log
into the S3 console, and all the, you know,
the high level objects are displayed as folders,
and I'm like okay. I don't follow you there. But, as long as you
can get past that, you can kind of understand
these a little bit better. And there's server access logs. So, there's things that
are hosted in there, and then there's
also the ability to have the server access logs. So, if the JSON logs, which
is the default format, isn't enough for you
for the log level, or the bucket level logging
or the object level logging, there's also server access logs, which are probably what
more people are used to for, you know, the HTTP,
Apache-like server access logs. Which can be super useful. So those are the
main log sources that we're gonna be
talking about today. Again, there's
others, but these are kinda the four core for
this high level overview of starting incident
response in AWS. So, what should we
monitor in general? I started getting
into specifics, and then I realized
there's basically an entire presentation
I can do on specifics on what to monitor,
and so, that's probably gonna be a follow on
presentation at some point. But, for now, this is just
kind of a general overview of okay, here's
kind of what things break down to and what you'd
want to be looking for, and what to monitor. So, environment
enumeration/recon. Like, attackers want to find out
what's happening in there, and how do they find out
that information, right? In CloudTrail these APIs
are gonna look like, you know get things, list
things, blah, blah, blah. List permissions, get
environment resources, things like that. Resource event data collection. These are gonna
be the same gets. They're gonna be describe, list,
lookup, that sort of stuff. You want to maybe monitor
for resource creation, mod, and deletion, right? You know, those things
are pretty critical. And also log tampering
and modification. Which we've seen
attackers do as well, which can be pretty nasty. So, this is kind of a
general overview of, if you're looking
at CloudTrail logs, and you want to know which API
should I be monitoring for, this is kind of how
they break down into what these prefixes
are associated with. So. Again, I started going
down a rabbit hole of, I had like seven slides
just for these specifics, and it was just
gonna take too long. So, essentially
it boils down into the Quadfecta of
Monitoring, right. So you have Config, which is a configuration
history of your resources. You have CloudTrail, which
is your main resource for all the API calls
that happen there. You have CloudWatch, which
is system performance, and also OS and
host level logging. And then you have Lambda. So Lambda is essentially
like serverless: run this function, you know,
in various programming languages. You can do it in Python
and Node.js or whatever. Run these things, you
know, as something happens. Be like the catalyst
to do something once you see something change. So, with this quadfecta,
this is basically what you're gonna be focusing on for doing real time
monitoring and analysis and alerting on stuff. So, you know, there's
tons of services and API calls,
but my best advice just to start at least at
this high level review, is you know, and this
isn't unique to AWS, but know how your systems work. Know what's normal. And know what's not normal. Read the AWS docs. I spent probably six
months just reading through the AWS docs. It's some of the
best documentation I've ever seen in my life. Absolute kudos to
the Amazon team. But it is an absolute bear. There is a ton of
material, and as soon as you understand it, they
add some new feature. Which is great, but you
gotta be on your toes to keep up with this stuff. So, think play, test. There's a lot of stuff
that goes on in AWS as you start getting
into the environment and saying, oh, I just did this, I'm gonna pull the
CloudTrail logs and look and see what happens. There's gonna be
tons of things that you didn't realize were
happening in there. And there's gonna be information you expect to find
that might not be. There's all sets of nuances of, you know, if you assume a role, then that data used
to not be associated with the target
account, that used to only be associated
with the source account, and blah, blah, blah, anyways. There's a lot of
stuff to go into. But, play, test, you know,
think like the attacker. Knowing my environment,
if I came in here, and I did x, y, and z, how
would I detect that, right? So, here we are in the setup
for success phase still. We're trying to
say how do we set ourselves up for success,
how do we prepare to respond should
something happen? And one of the things
that I like to have our customers set up is
an isolation environment. And this is one of
the awesome things about operating in
the cloud, and AWS makes it super easy. What I like to do
is have customers create an isolation
VPC, and it's basically a VLAN that's totally segmented
off from everything else. And you know, give it a
route to the internet, and that's it, right. So then, when something happens, you can take a
system and place it in this environment and
have containment, right. Not so easily done on prem. But, a lot easier done in AWS. There's also this thing
called Flow Logs in AWS which is basically netflow
that is enabled by, like, point and click
which is an absolute Godsend
in a lot of situations. So you create an
isolation VPC environment, set up Flow Logs,
set and forget it. Just leave it there. It's not gonna take
much resources. Not gonna fill up any sort of space in your S3 bucket. But it's set up there so
that when something happens it's a space where
you can put stuff, and you know, have
containment done. And you're not cutting
off network connections, you're not severing
anything, right? Pretty awesome resource
and capability for AWS.
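Just as a sketch, turning Flow Logs on for that isolation VPC is basically a one-liner (the IDs, log group, and role ARN are illustrative):

```
# Enable netflow-style logging for everything in the isolation VPC
aws ec2 create-flow-logs \
  --resource-type VPC --resource-ids vpc-0123456789abcdef0 \
  --traffic-type ALL \
  --log-group-name isolation-vpc-flow-logs \
  --deliver-logs-permission-arn arn:aws:iam::111122223333:role/flow-logs-role
```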
What I also recommend is creating Responder accounts, and these are separate from
your main operating accounts. These are essentially
the IR roles that you're gonna use to
respond to events, and that don't necessarily
have admin access to stuff. The base setup is probably just good to have a read-only admin, so you can at least read data, you can get the logs,
you can collect stuff, but not necessarily
make changes. Unless you need to. Again it depends on
your environment. But a good base to
start is just read-only.
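A minimal sketch of wiring that up, using AWS's managed read-only policy (the role name here is illustrative):

```
# Give the dedicated responder role read-only access across the account
aws iam attach-role-policy \
  --role-name IR-Responder \
  --policy-arn arn:aws:iam::aws:policy/ReadOnlyAccess
```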
And then create a dedicated S3 bucket for all this collected data. So, have like a,
you know, a lab, or acquisition, or
whatever bucket set up, with the appropriate
permissions. Say there's three
responders on your team. You have a responder account. These accounts all have
this responder role for read-only access
and you have a bucket that's for, you
know, investigations. Read-only for these people. No one else can see it. Pretty good setup. And if you recall,
the Security Groups is basically the
firewalls setting. So, for these instances, and for these
accounts, basically allow access only
in from where you know people are coming from. So, for us, we work remote. So I know the IP or
IPs that I'm usually coming from when I'm
doing investigations. So for my instances I set it up that I allow SSH in, but
only from my known IPs, right, you want to
lock stuff down. And super easy to do
this sort of stuff. So, highly recommend locking
it down to source IPs. You don't want people
brute force attacking your response instances
and somehow getting in, and then having admin
read access to everything. That's a bad setup, right?
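Locking that down is, again, basically one call (the group ID and source IP are illustrative):

```
# Allow SSH into the responder instances only from a known source IP
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 22 --cidr 203.0.113.10/32
```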
Then I recommend also having a couple responder instances. And these are the
ones from where you're gonna do your
forensic analysis, or your investigations, right. So, have them up and running. Have like three or
four ready to go. Maybe keep them shut down,
or keep them running, whatever you want to do. You know, they
cost when they run. But have them set
up and ready to go, so that you can SSH
directly into them, and start pulling data as
soon as something happens. So, create a couple volumes,
one for your root data, one for data, data drive,
depending on how large, or how much data
you're gonna deal with. If you like doing stuff locally, and you want to
pull all the data down locally to the data volume, you know, create appropriately. Then add all your
favorite tools. Personally, I do most, if
not all my investigations with free, open
source security tools. So, it's rather
easy to instrument an instance for me
in AWS, with Fedora and the CERT Forensic
Repo, and I'm good to go. The problem starts coming when you have, say you want
to operate from Windows, and you use EnCase. Like, it's not trivial
to go and set up, you know, EnCase
instrumented in instances in Amazon with the
proper licensing, right. It's not easy to do that. So if you are going to do that, please spend the
time ahead of time. The last thing you
want to be doing is trying to spin
up an instance. You have a compromise
and you're like, uh, I can't figure out how to attach this physical dongle
to an AWS instance, right. So, having set yourself
up for success, you can enter the
treat yourself phase. Targeted Response, right. So we have all this set up, we have the capabilities,
we're ready to go. Now we're ready to collect
memory and disk images. Amazon does a lot of
things really well, and they provide
capabilities to do what used to be complicated
things super easily. Memory collection
is not one of them. It surprises me. There are tools to
help you automatically
SSH into instances, load kernel objects for the
right kernel and collect it. But, it's not, it's not pretty. So, unfortunately,
memory collection is gonna be not too dissimilar from how you do it on prem. It can be automated with those managed instances/run
command, if these instances are
already instrumented with an agent that allows you
to do these sort of things. So that's something to
think about ahead of time as you're deploying
your infrastructure. Do you want to
have agents running on these systems so that you can reach thousands of them at once? Or are you gonna do this
manually, SSH into them. Or if you're using Windows,
you're gonna be RDPing into them, what have you.
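Just to sketch the agent route, assuming the instances already have the SSM agent on them (the instance IDs and the LiME-based capture command are illustrative):

```
# Fan a memory-capture command out to many instances at once
aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --instance-ids i-0123456789abcdef0 i-0fedcba9876543210 \
  --parameters 'commands=["insmod /opt/lime/lime.ko path=/tmp/mem.lime format=lime"]'
```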
A really cool thing though is you can image directly to an S3 bucket. So, you don't have to
image local to the machine. You don't have to
necessarily connect anything, unless you are in
Windows, then you have to use some sort of third party
tool to map your S3 drive. In Linux you can just
use the AWS CLI tools to image directly to it.
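A sketch of what that streaming looks like on Linux (the device, bucket, and key are illustrative):

```
# Stream an acquisition straight to S3 without staging it locally
sudo dd if=/dev/xvdf bs=1M | aws s3 cp - s3://ir-evidence/case-001/evidence.img
```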
Disk image is super easy. No longer do you have to log in, or do a DD, or anything like that. You literally just
right-click on the volume that you care about
and do create snapshot, and it creates a bit for
bit image of the drive. It is absolutely fantastic.
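The CLI version of that right-click is just (the volume ID and description are illustrative):

```
# Snapshot the volume of interest; EBS snapshots are block-level copies
aws ec2 create-snapshot \
  --volume-id vol-0123456789abcdef0 \
  --description "Case 001 evidence - web server root volume"
```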
Right, so once we've collected images, now we can isolate
and move these to the VPC, right. So, just to kind
illustrate this. We're not talking, okay, you
have an incident on prem. Okay, where is the
system located? Is the user onsite? Can we reach them? How are we gonna
image this machine? Oh, it's a laptop, do we have to have them come back
into the office? Or, oh, it is a desktop,
and it's down here. Okay, bring your
Write Blocker down, we're gonna go image
it live, and make sure we have the tools
and instrument it. I mean, you're talking orders
of magnitude difference. You're talking like hours
and days to response for this on prem sort of stuff. And you're talking minutes with disk and memory image
collection from AWS. And isolation, right. I mean, how hard is it to
achieve containment on prem? It's an absolute nightmare. And here, with the proper setup, you collect the images,
you move it into isolation, and by the way, it's still
talking to the internet, so you haven't severed
internet connections, you're still monitoring
what it can do. You can collect IOCs,
you have Netflow logging. You can then take those IOCs, have Netflow logs for
the rest of your network, see if it's happening on
the rest of your network. Really awesome
capabilities here. So aside from the actual
system data itself, how do we collect these
logs that we talked about? So, most of these
logs are stored in S3. So CloudTrail, the Config
logs, the CloudWatch logs. So, if it's all
instrumented correctly, you log into a single S3 bucket and all you're doing
is just logging in and copying all of
the relevant files for the days of interest. So CloudTrail is
broken down by day. You're like, oh, it was
March 18th or something, log in, go to 2017,
3/16, blah, blah, blah, collect 3/17,
3/18, and then it's just a matter of
doing the analysis.
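As a sketch, with CloudTrail's standard bucket layout that collection is just a couple of copies (the bucket name and account ID are illustrative):

```
# Pull the days of interest; CloudTrail keys are broken down by date
aws s3 cp s3://my-trail-bucket/AWSLogs/111122223333/CloudTrail/us-east-1/2017/03/17/ . --recursive
aws s3 cp s3://my-trail-bucket/AWSLogs/111122223333/CloudTrail/us-east-1/2017/03/18/ . --recursive
```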
So, when you create a snapshot from a disk image, remember it creates
a bit for bit image, but then how do
you analyze that? So, you have that sitting
somewhere, but now what? Well, remember we created
those responder instances, we had those up and running. Well, all we need
to do is again, right-click on the
snapshot you created, create a volume. Right-click, attach
to an instance. Then image out, if you
want an actual image to store somewhere or
download to your network for storage or chain of custody, or whatever you want to do. But it's that simple. I mean, it's directly
attached to your machine. You've made a couple of clicks. And it's immediately attached
and ready for analysis. Like, not walking a
drive down anywhere. Not transferring an image
over the network, right. It is immediately available.
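Those couple of clicks, sketched from the CLI (the IDs and availability zone are illustrative; the volume has to be created in the same AZ as the responder instance):

```
# Turn the evidence snapshot into a volume, then hang it off a responder
aws ec2 create-volume --snapshot-id snap-0123456789abcdef0 \
  --availability-zone us-east-1a
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
  --instance-id i-0123456789abcdef0 --device /dev/sdf
```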
One thing that is not initially evident, and that can be a royal pain, is that when you first attach an image or a volume created
from a snapshot it may perform at less than
50% of expected performance. And we find ourselves
in a conundrum here when we basically try
to take a DD image, say I attach it and I
want to create an image elsewhere for other
people to work on, or something like that. Because Amazon's advice is. Well, their statement is
it needs to be initialized. So, every block on the
volume has to be touched before it can reach
its full performance. For whatever reason, whatever
they do on the back end, okay. And their
recommendation is well, we recommend FIO or DD on Linux to touch all of those
blocks and do that. But, we're doing DD. So it's kind of a
catch-22, right. So, be aware that you may
have an absolutely fast SSD, highly optimized IOPs
volume that you attached, and it's just giving
you crap performance for imaging or
accessing it, right. And it will, until you've
touched all the blocks. Just a pro tip that
wasn't evident for a long, long time, but now they
actually have it in their docs.
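Their documented initialization looks something like this (the device name varies by instance):

```
# Touch every block so the volume reaches full performance: dd...
sudo dd if=/dev/xvdf of=/dev/null bs=1M
# ...or fio, per the AWS docs
sudo fio --filename=/dev/xvdf --rw=read --bs=128k --iodepth=32 \
  --ioengine=libaio --direct=1 --name=volume-initialize
```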
And memory analysis isn't dissimilar from traditional on prem. You can read it
directly from S3, or copy it to your local
drive for analysis. Analyze it like you
would any other image. So log analysis options. So, you've got the
disk and the memory, we walked through how to do
all those for log analysis. So, for the manual
command line gluttons in a lot of us, or some of
us, or maybe very few of us. I don't know how
we break down here. Personally, I love it. So for me it's just a
combo of just gunzip, zcat, sift, and jq. So how many people
have heard of jq? Man none. Okay. If you've worked with JSON
in any length of time, jq is your absolute best friend. So, it can take a JSON
blob and intelligently transform it with
filter queries. So, here's kind of an example. Here's kind of a
snippet of a log I took from
CloudTrail, and this is what it would look like. So here's a stop
instance API call. This is what you'd
see in CloudTrail for when someone
right-clicks or does from the command
line stop instance. So. All we're doing is running JQ. You can see you have
everything as a record. So it's a record. The first one,
identity and user name, and boom, we get Alice. Or we want to extract
the source IP address. So, if there were
a bunch of these, and a bunch of
records, it would be record one, two, three,
four, blah, blah, blah. This is gonna be your best
friend from the command line, for parsing CloudTrail logs.
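Just to sketch those exact pulls, assuming a gzipped CloudTrail file (the file name is illustrative):

```
# Who ran the StopInstances call? First record, user identity, user name
zcat cloudtrail.json.gz | jq '.Records[0].userIdentity.userName'
# "Alice"

# And where did it come from? Extract the source IP address
zcat cloudtrail.json.gz | jq '.Records[0].sourceIPAddress'
```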
Or, for analysis courtesy of AWS, meaning courtesy of your wallet, you can extract into
Elastic Search for analysis. You know, if you have a cluster, or you have a Kibana
interface, no different. You can use Kinesis to send logs directly to your cluster,
that's what some people do. Something came
out called Athena, relatively recently, I
think a couple months ago. So before you used to have
to send the data somewhere, take it somewhere out of S3,
do something to work on it. But, Athena actually
allows you to perform SQL queries
against data right in S3. No taking it anywhere. No having to copy it off, put
it in another tool, nothing. Pretty awesome. So, query to your
heart's discontent, depending how much you like SQL. But it is limited
to a few regions.
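A sketch of kicking one of those queries off from the CLI, assuming you've already defined a table over your CloudTrail bucket (the table, fields, and output location are illustrative):

```
# Ask Athena who stopped instances, querying the data right in S3
aws athena start-query-execution \
  --query-string "SELECT eventtime, useridentity.username, sourceipaddress FROM cloudtrail_logs WHERE eventname = 'StopInstances'" \
  --result-configuration OutputLocation=s3://ir-evidence/athena-results/
```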
So, what do we look for when we're doing analysis? And this is another
talk that I think I'll have to give
completely separate, because there's a lot of
specific things to look for. But, overall, you know,
start with CloudTrail. Something happened, okay. So, the OneLogin attack. So something happened. Instances were created,
recon was done, they can start looking
for who's looking for
the identity, right. So there's an API call, GetCallerIdentity, that gets the current identity
of who I am. Very few people are
gonna be doing that. Now, some tools do it. So that's where you'll
have to separate the wheat from the chaff,
but think like an attacker. Like, when I get in,
I'm like who am I? Right, it's the equivalent
of the who am I. So, look for the calls
that are requesting what identity am I. Look for a lot of
successive gets or describes, or
list sort of stuff. And with that,
you'll be like okay, I see a user name
associated with it. Or I see a certain role, or
something that was created. Or you know, I have
a source IP, use it. It's a star pattern
approach, right. You know, same DFIR
analysis skills, but the CloudTrail
is gonna be kinda your main source of doing that.
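Sketching that hunt with the same jq approach from before (the file glob is illustrative, and GetCallerIdentity is the actual CloudTrail event name for that "who am I" call):

```
# Surface "who am I?" calls and a few pivot fields across a day of logs
zcat 2017/03/18/*.json.gz | jq '.Records[]
  | select(.eventName == "GetCallerIdentity")
  | {eventTime, sourceIPAddress, user: .userIdentity.userName}'
```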
CloudWatch, if you had alarms set up for resources, look for any
anomalies and spikes. If you had OS host logs
instrumented with CloudWatch, check those out. For Config see how
things transpired over time, chronologically,
what happened. If you have a time frame of when you started seeing
anomalous activity, well, what was changed and when? Right, this is a
great tool to say, oh, this config was
changed, or this volume was created or added
to this instance, or these instances were set up. Use it to your advantage,
identify anomalies there. And S3, review object accesses. So, maybe they want
to get data out. Maybe they set up another bucket and that's how
they're doing exfil. They made it publicly available, and now they're doing
all sorts of stuff. Well, see what
buckets were created. What did they get? What did they put there? What did they delete? Right. So that's kind of a high
level overview of everything. I could do a ton
more material here. I could talk for days about it. But, suffice to
say, AWS provides a lot of these
native capabilities for performing DFIR, right. I mean, so. You know, I was asked
a couple months ago at our company, is AWS complicated? Why should people move to it? And I'm like, I wish
everyone would move to it, just because I would take
a default AWS install any day over a default
on prem install for doing any sort of incident
response or forensics. Because by default,
CloudTrail logs are enabled. You can see what's going
on in the infrastructure. Like for any of us
that are consultants, or even in your own
company, like how often does something happen
and you're like, oh, we never got that set
up or that wasn't enabled. Or those logs weren't enabled, sorry, we didn't
get around to it. Could be a nightmare, right? But there is a
non-trivial ramp up. Like I said, it's easy
to spend six months down in the docs getting to
know everything really well. But at a certain
point, you know, DFIR methodologies for AWS
and on premises converge. So there's these different
logging sources involved. Well, once you get the data out, or once you have
the instances set up and you're doing analysis,
you know, nothing changes. You still have logs. You still have disk images. You still have
memory images, right. So, there is an initial
ramp up that's different. But, you know, at some
point they converge, and you're just being
your regular DFIR rockstar self after a while. So, what did we maybe learn? Maybe we learned a
couple new words. I don't know. Maybe you already
knew those words. But, like I said,
there's a couple more presentations I think that
are coming out of this, and I didn't realize
it until I started putting all this together. 'Cause there's a ton
of I think really useful material for everyone. But, hopefully this just
kind of whets our appetite and kinda gives you an idea into why we care about
it, what's going on, how super useful this can be, whether you're on the
fence and moving in, or not, or you're
already in there, and you just want to figure out how do I know if an
attacker is using my infrastructure right now? Because I never really looked. You know, maybe this is
the time to start looking. If you have a
$60,000 a month bill, it's gonna be really
hard to identify that $1,000 of that
was like because an attacker spun up an instance and started moving
data around, right? Like, these are the things
to start thinking about. And why we should care. Because this stuff's happening. (deep suspenseful music)