(upbeat music) - My name is Eric Brandwine, and I'm a distinguished engineer
with the AWS Security Team. One of the most common questions
that I get from customers is, how do you do that? I'm an engineer, and most
of the people that I talk to at our customers are engineers. Engineers like technology, which is good. That's why they're engineers. And usually when I get this question, what they're really asking is more about the tools that we have, the things that we've
purchased or that we've built, the technical mechanisms that we use to run the AWS Security Team. But I've been at this for a while, and that means two things: One, I've made a bunch of mistakes, and I've learned a lot of
things that don't work; and two, the scale is daunting. Exponential growth is
implacable and impressive. I've had to completely change how I think about getting
the security job done. I've realized that the
single most important thing that we have is our organization,
our humans, our people. Scaling as a leader, even
scaling as an engineering leader, is a very people-intensive process. My job is still technical. I still dive deep and get
involved in the details, but the most important
thing that I work on, that I build, isn't built using computers. So as is the style these
days on the Internet, we're gonna talk about
scale using bananas. And so at first I built tools. When we started AWS Security,
it was me and three managers. I was literally 100% of
our engineering bandwidth. If it happened, I did it. This is the job that I
thought I was gonna have when I was in school, and
I was super happy with it. But then the banana turned
into a bunch of bananas. As things got larger, I helped
other people build tools. This was great. We were getting a lot more done, and I had to learn a bunch of new skills. It still kind of matched my expectations. We were a team, I was becoming
a leader, and it was awesome. But remember exponential growth: the banana bunch is now
an entire banana tree. The cloud got bigger. The team got bigger. It got to the point where no single person could even be aware of every single tool, every development effort
that was underway. The goal now was to build the
org such that the right tools were built at the right speed
with the right quality bar. Even more new skills to learn,
a bunch of new challenges, and still tons of fun. We were a team of teams, and I figured out that
the most important thing that I was building was the
AWS Security Organization. But remember, exponential
growth keeps on marching. The only analogy that I
have for this kind of scale is an entire banana plantation. It got to the point that no single person could even be aware of every
single hiring decision, every single headcount allocation, every single roadmap trade-off. I really had this problem
at the banana tree stage, but it wasn't until the
universe rubbed my nose in it that the dime dropped for me, that I really understood the challenge that I had to work on. I couldn't build the AWS
Security Organization. It was too large. It changed too quickly. I couldn't keep up. My mechanisms for scaling
myself kept breaking down. I was forced to personally
confront a lesson that large-scale leaders
throughout history have had to learn. No longer could I stretch
myself such that I had a link, no matter how tenuous, to even the big things that were going on. Even if I ignored the
actual security work, I couldn't keep up with the pace of building the organization. What now? We had to build an organization that not only built the right tools with the right speed
and the right quality, it had to, on its own, build
more of that organization. It had to be self-perpetuating
and mostly autonomous. Now what? How do we build this machine that builds itself using humans, which are notoriously non-standardized and difficult to predict? So just a caveat here, I used the word I an
awful lot on this slide. This was my story about my growth alongside the growth of AWS Security. I'm one of many people
who have helped grow and shape this organization. There may be a stage that
comes after banana plantation, but this is as far as I've
gotten in my career growth. I'll let you know if I get there. Anyway, the answer to
building an organization that builds more of itself
is large and complicated, and I won't claim to
really know all of it. But an important part of
the answer is culture. We are, all of us, members of
a bunch of different cultures. I'm an American, I'm a
Jew, I'm an Amazonian. Each of these groups has their
own customs, their own norms, and they tend to be self-perpetuating. I can walk into a synagogue,
even one I've never been in, and I know what to say,
what to do, what not to do, even though I've never been there before, never met these people. Amazon has gotten large, and there are teams I've
never heard of before. But when I meet a new team, we're working off the same playbook. We've talked a bunch in various fora about elements of our Amazon-wide culture, like our leadership principles,
our use of bar raisers, working backwards
documents, things like that. Those are incredibly important mechanisms for the Amazon-wide culture, but AWS Security isn't all of Amazon. We've got our own peculiarities,
our own priorities. And we have to make sure that our team is working on the right
stuff in the right direction and that our new hires are
brought into this culture. How do you make a culture? And again, I don't claim
to have the entire answer, but one of the mechanisms that we use for intentionally building and steering our culture is tenets. Effectively, tenets can be
viewed as rules for a culture, what we want people to do,
how we want them to act, what is important to us as a group. Good tenets are hard. Often, you've got a
culture that has evolved. Hopefully, you mostly like it. It's really tough to
think about your culture, to step outside of it and
look at it objectively and to write down the rules that capture what you like about it. It can be harder to write down rules that address what you don't like about it. Often cultures feel
instinctive and subconscious. Not only do you have to be able to think about these all
but automatic behaviors, you have to worry about
the unintended consequences of the changes that you're trying to make. You're not gonna get this
right the first time, and that's okay. Good tenets are often in
tension with each other. They're not just simple
declarative statements that can be just followed. They're guideposts, ways of thinking that help people make good decisions in unforeseen situations. If you're gonna write down tenets, if you're gonna try to make them an element of your culture, you
have to take them seriously. As leaders, you have to follow the tenets, to use them in conversation, or nobody in your organization is gonna take them seriously, either. So enough lead-up here,
let's get to our tenets. And this is how tenets are
always presented at Amazon: Our tenets, unless you know better ones. And it's an honest offer. I've taken feedback on our tenets. I've given other teams
feedback on their tenets. Literally everyone is
invited to speak up here, and it can be difficult for
junior people, new people. It can be very uncomfortable for them to challenge tenets, and it's our job as leaders
to give them the space and the comfort to do so. Our tenets are posted publicly,
publicly within Amazon on the AWS Security Wiki page. Tenets can't be limited access. They can't be need-to-know.
They can't be restricted. And so literally anyone in Amazon that wants to read our tenets
can go to our Wiki page, and they can read our tenets and they can understand what we value and how the team is going
to prioritize their work. So one, we lead in preventing
unauthorized access to AWS resources, our customers' or ours. We continually assess our
systems, identify exposures, evaluate risks, and
relentlessly drive mitigations. Our first tenet seems pretty
obvious for a security team, but there are some nuances here. Just writing this tenet down changes it from an implicit assumption into an explicit expectation of our team. At pretty much every re:Invent, Andy Jassy would say security is job zero at some point during his keynote, and he's serious about that. Every team owns the
security of their services, which is great, because at this scale, we have to have everyone pulling with us, but we're out in front. We lead. This is our focus. But it doesn't say we lead
Amazon or we lead AWS. We expect our team to be out in front, not just inside the company,
but outside as well. If there's a security issue to be found, we're the ones that should find it. Our customers or ours, this
scopes our responsibility. Of course, the AWS services,
infrastructure, data centers, et cetera, and all of
our internal usage of AWS is within our charter. But this tenet tells us
that if our customers are not getting the right
outcomes, that we need to engage. We have the shared responsibility model. Our customers are responsible
for their own security, and they have deeper knowledge of what is and is not acceptable for
their applications than we do. However, this tenet tells
us that we need to care about the actual resulting
customer experience. It's the bit of our culture
resulting from this tenet that led to the launch of our customer-facing security services, like GuardDuty and Security Hub. And relentlessly. Security
can be exhausting. We're here for the long haul. We have to work with the
service teams for years, and we have to have a good
working relationship with them. This tenet tells everyone
in the Security Organization and everyone else that reads our tenets that we expect our team
to doggedly drive issues until they are done, done, done. The fact that one of our
engineers won't drop something isn't them being annoying. It's them doing exactly
what they should be doing. Two, we constantly provide
visibility to senior leadership into the biggest potential risks backed up with data and
carefully prioritized. I've talked in the past about the culture of rapid escalation that we have at AWS, and this tenet has elements of this. Constantly. We are expected not only to report up to our leadership, we're expected to do so all the time. There are plenty of companies
where you avoid escalation, where bothering your executives is a sign that you failed to do your
job and that you need help. Here, we not only expect people to keep our leaders informed, we expect them to do it all the time. Backed up with data. Security is inherently
dealing with the unexpected, unique, unforeseen events, but even so, security at AWS
is a data-driven discipline. We may not have all the
data or even a lot of data, but anytime we engage, we
bring the data we have. When we have the data, when
it's available, we have it. We're familiar with it
and we can speak to it. And carefully prioritized. What we're saying is
that we're gonna call it like we see it, no matter how
uncomfortable that may be. If we think that flagship
launch at re:Invent isn't ready and won't be ready for six months, that's what we're going to say. If the security thing that Adam or Andy has been asking about literally every week for the past couple of months is what we believe to
be the third or fourth or 12th priority, that's
what we're going to say. Our most constrained resource,
would anyone care to guess what our most constrained
resource is across AWS and Amazon? It's our builders, our engineers. Every day that they spend
working on a security effort, they're not working on
new features or services. They're not even working
on other security tasks. Security teams inevitably
run into tough trade-offs, and we don't punt on that problem. We own it, we dig in, and we bring forward our best suggested prioritization. Three, we escalate
appropriately yet aggressively to ensure that security
issues are resolved promptly and with high judgment. If in doubt, we will escalate. Right here, clearly articulated, we've got our culture of escalation. I could highlight the entire tenet but will refrain from doing so. If you hang out at Amazon long
enough, you hear people talk about making high-quality,
high-velocity decisions. That's what the promptly
and with high judgment is getting to. We're gonna do it fast, and
we're gonna do it right. We reject the false
choice between the two. And the way we do this is by escalation. If a group of us is not
confident in the decision that we're making, or if we
can't converge on a decision, we don't yet have the
right people engaged, and we need to escalate. It's easy to escalate aggressively. I could just page Adam or
Andy or Jeff for every issue, and that would qualify
as aggressive escalation, but it wouldn't be appropriate. But again, we're going to do this well, and we're gonna do this quickly. We're gonna eat our cake and have it too. We do this by calibrating
our team members, by encouraging escalation,
and by giving them cheap, low-risk escalation paths. It can be really uncomfortable
for a more junior engineer to escalate to a general
manager or vice president that they've never met. Honestly, it's unfair
to expect them to do so. Instead, we make it clear that escalation within the AWS Security
Organization is free. Everyone has a manager. Everyone has teammates that they trust. Those are great first
points of escalation, and those people can help
calibrate the escalation. They have a broader network that can help bring the right people in. And everyone in our team knows that they can call on the
leaders of AWS Security at any time, day or night,
if they need our help. And one of the things that we help with is calibrating escalations. We have a bunch of on-call rotations. There's a dedicated pager carrier for these internal escalations
and I'm on that rotation. But Steve, CJ, and I, as
well as other leaders, are always available. I will underline this bit here. If in doubt, we will escalate. This is super clear, in
plain, unambiguous English. This is an example of a tenet
as declarative instruction. This tenet came out of
a review of our tenets with Andy Jassy. He's the one that added this sentence. It captures a clear expectation of our most senior leadership
from the CEO on down. One of the common concerns that I hear in response to talking about
this culture of escalation is that our pagers must
be going off all the time. Do we ever sleep? Isn't it exhausting being
on-call all the time? In reality, no, it's not a problem. The number of inappropriate escalations that I've been involved
in is stunningly low. I've been at the company
for 13 years. Very few. Almost every time I've
been pulled into an issue, it's been the right call. The times when someone
else could have handled it or there was a better way to escalate just serve feedback to me
and to the other leaders on our training and our tooling to make sure that there's better
escalation paths next time. When we've dug in and something
turns out to be no issue, people often apologize to us. It's natural. You've got someone. This is
their first security event. They're not sure what to do. They hesitantly push the Page
Someone Right Now button, and it comes back and it
says, there's no issue. There's nothing wrong. And the natural reaction there is to say, I'm so sorry for paging you
in the middle of the night for something that was no issue. And people across AWS
Security at every level always respond, "Nope, that
was the right thing to do." I would rather have a
mountain of no issues than a single missed issue. It's wonderful. It thrills me so much, and it's a sign of the culture in action. Four, we are guardians of
customer privacy and trust. We advocate for our customers
in all security engagements. This tenet is pretty straightforward. It tells us who we're working for. Privacy and trust, this
bit clarifies our charter. Our relationship at AWS with
our customers is deep and rich. They trust us not just with their data, but with the computations on that data. It doesn't matter how much vetting you do when selecting a partner. It doesn't matter how
many compliance controls and audits they can produce evidence for. It doesn't matter how many security or encryption features they offer. At the end of the process, you have to make a decision
to trust this partner. It is our job to make
sure that AWS is worthy of the truly humbling amount of trust that our customers have placed in us. But this tenet doesn't
just talk about trust. It also talks about privacy. We just had the Fireside
Chat about privacy. Privacy is a foundational
part of what we do. We ensure that the data that
customers have trusted us with is used and retained in accordance
with their expectations. And all, I like this word here. It doesn't matter if you're involved in an application security
review, compliance audit, design review, high-severity
security ticket, or literally any other
activity, the answer to, "Is now the time to speak
up for our customers?", is always yes, always. Our customers can't be
present in these meetings, these engagements, these tickets, so we're there to speak for them. This can come across as
corny and perhaps trite, but I actually find it really empowering. One of our leadership principles
is customer obsession. It's a pillar of the Amazon-wide culture. In my role as a security engineer, I regularly take unpopular, uncomfortable, and even borderline heretical positions, but I've never had a problem
as long as I could show that I was doing so from a
position of customer obsession. This tenet is one of the ties
between our team's culture and the broader Amazon culture. Five, we own security for all of AWS, including third-party
and open-source software. We take nothing as a given and extensively test
all of our components, even those built by other
parts of the company. If something doesn't work for
us, we will move off of it. At re:Invent in 2017, I gave a talk about
normalization of deviance. In the talk, I go through the
tragic story of a plane crash. Highly trained pilots failed to follow the approved procedures, leading to overrunning
the end of the runway, a crash, and the deaths of all onboard. I'm not gonna recount
that entire story here. It's literally a different talk, but it's a topic that I
worried about four years ago, and it's one that's still
front of mind for me today. When you dig in, you realize that these
highly trained pilots, they go through a tremendous
amount of training. The airplane is incredibly expensive. You don't entrust it to just anyone, yet these highly trained
pilots made these mistakes that led to their deaths and those of all of their passengers. And when you dig in, you learn that this was
not a one-time failure. These pilots didn't come to
work one morning and say, "Let's get sloppy today." Slowly, likely over years,
their discipline slipped. There were no negative
consequences for their actions, and so their discipline slipped some more. The local community of pilots, perhaps it was just these
two that worked together, perhaps there was a larger
group that all worked together, all acted alike. Had an outsider come into that group, they would have been appalled. And this is called
normalization of deviance. Our application security
process is constantly evolving, improving as we get better at our jobs and as our tools get better. Yet, the services that
we launched a year ago or three years ago or five
years ago were pretty good. Customers liked them. And that older application
security process was a lot easier. This team is under a lot
of pressure to launch, and just a couple of years ago, we didn't have to do
this step or that step. Can't we skip them just this once? And perhaps most frustrating, it's hard to know when
your security efforts have made a difference. "We fuzzed that piece of code for weeks. We fixed a dozen bugs. It's been running flawlessly for a year." Is it running cleanly
because we fuzzed it, or would it be doing just
fine without the fuzzing? If you look back on some of
the largest IT security flaws in the industry, things like Heartbleed, EternalBlue, Spectre, and Meltdown, one thing that they all have in common is that they were present in
the code or in the hardware for years before they became public. Is my security work making
a difference at all? The feedback loop between what
we do and the ensuing results can often be very long and very lossy. In some cases, it's possible
to make mistakes in security that have no negative
consequences for years. You can see how it would
be easy in security for your discipline to slip. You relax a bit, nothing bad
happens. Teams move faster. You relax a bit. Nothing
bad happens. 20 GOTO 10. At the end of this process lies a loss of customer privacy and trust. We cannot go down this path. So five, we own security for all of AWS, including third-party
and open-source software. We take nothing as a given and extensively test
all of our components, even those built by other
parts of the company. If something doesn't work for
us, we will move off of it. This tenet is one of our efforts to prevent normalization of deviance. All of AWS. Here's another bit that
scopes our charter. We own security for AWS, full stop, from the cement slab in the data center through the power and
cooling gear of the servers, the network, the services
we build, everything. It's a daunting task, but it makes the ticket routing
flow chart really simple. You've got a security problem. We're it. It doesn't matter who wrote it, where we got it, who runs it. If it affects the security
of AWS, it's ours. Across the Internet, there
are de facto standards. If you need a library for parsing XML, for terminating TLS, for any of a huge number of common tasks, there's a preferred choice. Everyone's using it. It's the most popular
library for this task, pretty much everywhere. The obvious assumption is that it's good and that we should use it too. We're not allowed to make that assumption. It may be good. It may not be good. We have to go and actually
find out, get actual facts, make an informed decision. We take nothing as a given. This tenet is telling us to
make our assumptions explicit and then to question them. This is incredibly hard to do, but you get better at it with practice. We can't become complacent. We can't allow our discipline to slip. We will move off of it. This last clause tells us
that there's yet another way, that not only is it okay to put ourselves in an
uncomfortable situation, we're expected to do so. Amazon has a rich technical legacy. We've got decades of tools
built by really smart people across the company. But many of these tools
were built, for example, for a single-tenant e-commerce
site: massive, scaled, secure for their intended
purpose, but very different from the low-level multi-tenant
infrastructure services that AWS started with. It may be that everyone
else in the company is using some tool, but
if it's not right for AWS, we're not going to use it. If we're already using
it, we will migrate. These migrations are expensive, and it can be difficult
to tell the service team, your service that you're
perfectly happy with and really proud of needs rework. Instead of that cool feature,
you need to do this migration. This tenet tells us that
not only are we allowed to speak up here, we're
obligated to do so. Six, we are the one-stop shop for all security questions within Amazon. In cases where we don't own the answer, we own getting the question answered. Amazon is a large distributed company. Teams are good at navigating
Amazon within some radius, and commonly performed tasks converge to some level
of reasonable efficiency. But outside that radius,
it can be very difficult to navigate, to find the right owner. And so one pattern that I've seen here is that someone cuts a ticket to their best guess for the right owner. That on-call engages and
says, "Nope, that's not us. Try team two." Team two engages and says,
"Nope, that's not us. Try team three." This can continue for quite a while. And in the most frustrating cases, you wind up looping back to a
team you've already talked to. I call this ticket ping pong. This is a waste of everyone's time. It doesn't move us any closer
to resolving the issue, but it's rational behavior
for each of those on-calls. Each of those on-calls is trying to limit how much time they spend on a problem that isn't their problem. They're being helpful, but they're only being locally helpful. For us, security is normal. The quotidian pager tickets, the constant risk management decisions, it's what we're trained for,
and it's familiar to us. For our service teams, an urgent security issue is
unusual, unfamiliar, unsettling. Getting lost in a twisty maze
of security passages all alike is exactly the wrong outcome. If we've got an urgent security issue, we can't waste time on ticket ping pong. As large as our team is it's way too small to secure AWS alone. We own security, but
security is everyone's job. If we're gonna drive the
right outcomes for customers, we need all the service teams, everyone pulling along with us. And to be clear, the right
outcomes for customers don't mean Security always gets its way. The service teams have deeper
business context than we do. And in order to get to those
high-quality decisions, we need to have a productive
relationship with them. Even if the issue isn't urgent, tickets that bounce around
from queue to queue, email threads where nobody owns the issue, unproductive meetings
with the wrong people are frustrating and disappointing and they erode that relationship. If you have a security problem
and you get ahold of someone in AWS Security, it's sticky. You may have found the wrong person. It may not be their job to
help you, but per this tenet, our answer has to be, "That's not us. You probably want Team X. I will reach out to them
and find an owner for this." We're gonna spend more
of our time right now, locally suboptimally, so that we get a more
globally optimal result. It's inherent in security that you're gonna have
plenty of uncomfortable, unexpected discussions that put a strain on your relationships. You've got to take any opportunity you can to invest in those relationships, to build strength that you
can rely on when you need it. We found that this intake
process, this first impression, the time between having
a question or an issue and finding someone to help
is surprisingly important. Seven, we drive our work to focus on the most critical security
risks for the business. They will be prioritized
first for the business and then for the service teams. We will ensure each
expectation is well-understood, actionable, and supported
by appropriate tooling. Security is the art and
science of risk management. There's no organization on earth that has zero security risk. The only way to drive
security risk to zero is to not offer any useful services. The old joke is that
a pair of wire cutters is the best network security tool. Security engineers like to fix things. This is a natural human reaction. You get that hit of dopamine,
the sense of accomplishment. It's great. And speaking as a security engineer, one of the hardest things to do is to walk past a problem
that you know you can solve. A team is struggling, a customer isn't getting
the right experience, and you can help, but you must not. Because if you spend your
time helping out here, you're not gonna be spending your time working on the bigger, more ambiguous, more important security challenge that you should be working on. This tenet is telling us to stay focused. You can escalate, you can phone a friend. You can make the argument
that this new issue is more critical to the
business than that other one, but we've gotta keep working
on the most important things. On the last slide, I talked
about building relationships with the service teams. These can't be tiny little badge pictures next to correspondence and tickets. They need to be colleagues, real human three-dimensional beings. As these relationships grow, we're gonna empathize with our
friends on the service teams. Another perfectly natural
human reaction is to think, we've cut them a lot of tickets recently. That last on-call shift was really tough. Do I really need to
page them for this one? This highlight here,
first for the business and then for the service teams, reminds us who we're working for. It is the goal of the business to drive long-term customer value, leading to a virtuous cycle
of deeper relationships, more usage of our services,
and greater customer value. So in this case, the business
is a proxy for our customers. We have to do what's right for customers. That doesn't mean that we don't empathize with the service teams. In our hypothetical example, you have to cut the pager ticket, but then you can immediately
pick up the phone and reach out to the on-call and make sure they're doing okay, that they have everything they need. You can reach out to the
service's general manager, and you can express your
concern about the ticket load and offer to help. But regardless, the prioritization
and urgency of the asks coming from our team, our expectations of the service teams are going to be driven by customer risk, customer expectations, and customer needs. And actionable. In security, it's easy to tell people what to do. We all know what the right things are. Use least privilege, revoke
old keys, a whole bunch more. Saying it doesn't help much. Anyone that's read more than three or four
security blog posts, they know this stuff. If they care about security,
and security is everyone's job, and if they're competent,
then why aren't they doing it? It's because it's hard, and they don't even
know how to get started. If we expect a team to do something, it has to be actionable. There has to be a clear next step, a path for them to follow. We've talked a bunch about escalation, and this applies to escalation. Eric, the grumpy security engineer that just wants to get stuff done is gonna send an email to a VP that says, "Your team hasn't finished deprecating that old TLS protocol yet,
and you should feel bad." This isn't helpful. It
erodes the relationship. It's not actionable. It's gonna
get a head shake, a shrug. It's gonna get deleted. If instead, Eric, the
AWS Security engineer, sends this VP an email that says, "Of 100 load balancers,
your team has migrated 87. There are 13 remaining. Of those 13, four are
blocked on this feature that's due to be released on this date. That leaves you with nine
actionable load balancers that you should move right now. Here's the list of the nine, and here's a link to the instructions that you should follow." Then I'm helping them do the right thing. I'm guiding them down the path. That second email is actionable. Supported by appropriate tooling. If you have a narrow problem, then you want the owners of
that problem to own the tooling. For example, our hypervisor
team owns their own build tools, their own test tools, and
their own patching tools because our standard tooling
doesn't work for them, and they're the experts. But if you have a broad problem such as patching general
purpose Linux boxes or managing IAM policies, you need to have centrally-owned tooling. If there are a hundred
teams that need to patch, not only is it gonna be really expensive to have all 100 of them build
their own patching tooling, you're gonna have 100 different sets of bugs, 100 sets of subtly different behaviors. It's gonna be a disaster. It will be cheaper and better to invest in a single centrally-owned set of tools. And it's gonna be faster. My nephew was a U.S. Marine
and at the firing range, they taught them that slow is
smooth, and smooth is fast. And that really captures how I think about a lot of things in security. At our scale, you have to learn
how to panic strategically. Slow is smooth, and smooth is fast. If you announce, "Okay,
everyone upgrade TLS, go!", then it's gonna be satisfying. A few teams are gonna
figure it out quickly. Your numbers are gonna start to move. There's gonna be a lot of
activity. You're doing security! And it's gonna be a mess. You've successfully panicked,
but it's not strategic. If instead you dive deep
on the TLS upgrade problem, on how services are using TLS, which libraries or which
services they're using to terminate TLS, how
customers are connecting, what impact this migration
is gonna have on customers, and then you plan and build tooling to support the common use
cases, it's frustrating. Your numbers sit at zero.
There's no progress. Most of the teams are doing nothing, but then the tools become available. They get validated by the early adopters. They get rolled out broadly,
and all of a sudden, there's this tidal wave of progress. As hard as it is to wait for the tooling, this path is faster
than the okay-go method. This is moving with urgency, but it's doing so strategically. It's panicking strategically. One consequence of this is that we're a builder organization. We have more software developers than we do security engineers. That's not to say that we
build all of these tools. For example, our patching tools are owned by our builder
tools organization because patching is a software change, just like any other software deployment, and the same safety testing and
availability concerns apply. Sometimes these tooling
efforts are small and internal, and no one ever hears of them. Sometimes they're major investments that we launch publicly. "Delete old, unused keys"
is one of the reasons that we built access-key-last-used
in our IAM service. "Use least privilege" is part of why we have IAM Access Analyzer and
VPC Reachability Analyzer. "Rotate all your passwords" was
a driver for our SSO service, our single sign-on service. Rather than making it
easier to rotate passwords, just get rid of as many
of them as you can. There are still urgent security issues where we page people in and figure out the path
forward in real time. But this tenet tells us
that we always dive deep, and we provide clear, actionable guidance and support any broad efforts with tools. So the goal of this set of tenets, and really of any set of tenets, is to equip a set of people
to make good decisions, to allocate their time well, and to prioritize the
things that are, to us, the most important. And to be clear, it's
not, how do I train people to make decisions the way that I would, or even the way that Steve would? But how do we give a growing
group of people a framework, a core set of shared values and beliefs? To lend some breadth to this talk, to show how tenets can be used by teams that aren't security teams
and other teams within AWS, I've chosen a couple of tenets
from other teams to share. The AWS Cryptography team
owns services like KMS, our Key Management Service; ACM, the AWS Certificate
Manager; and more. They're also our internal
experts on cryptography. We chatted with Ken
Beer earlier in the day. I love their tenets. And today, we're gonna
look at two of them. Trust is hard to earn and easy to lose. To maintain trust, we prioritize security, durability, and
availability, in that order, over building new features. This tenet is exemplary. It is a super clear expression
of what the team values. A developer, a product manager, a general manager who
faces a tough decision, a summer intern, a new
hire can just read this and know how to make their decisions, can know what is important
to this organization. Durability: we never lose a key, but we will delete a customer's key when the customer asks us to do so. And this tenet, in a
single sentence, says, we're gonna tackle one
of the hardest problems in computer science. Making data like keys durable
is a heavily studied problem. Most of the solutions revolve around keeping multiple copies, ideally on multiple systems
on multiple types of media in multiple locations. This works, but all those techniques also make it harder to delete data. Real deletion means not only
is the key not accessible, it's no longer recoverable
from any media anywhere. Doing either one of these things is hard. Doing both of them in a single
system is a real challenge, and this tenet keeps
the team focused on it. I think that this tenet
is one of the reasons why the KMS team has been as
successful as they have at solving this problem. It keeps the entire team focused on solving both of these
challenges simultaneously. One of the reasons that
I love these tenets is that the team uses
them in conversation. "Our trust tenet says that
we should do this first," or "that would be awesome, and I bet customers would love it, but I don't know how to square that with our durability tenet." They're a part of the daily conversation. They're influencing the
members of the team, spreading from team member to team member. As I expect pretty much everyone
watching this already knows, S3, the Simple Storage Service, is our highly durable,
scalable object store. It's one of our oldest services, and it's a foundational building block. And so most of our customers are using it. And as a result, it's one of the largest
distributed systems on earth. At its core, S3 is really simple. Just put and get over HTTP,
pay-as-you-go storage. But real customers with real applications have interesting requirements, and S3 has become so much more than that. scalable, we scale
availability, speed, throughput, capacity, and robustness to
support an unlimited number or variety of web-scale applications. We design our systems to
use scale as an advantage, so that system growth
increases, not decreases, our availability, speed, throughput, capacity, and robustness. This tenet is non-obvious, but once you wrap your brain around it, it is clearly the right
way to think about S3. If you're building for
S3, it's gonna get large. At S3 scale, even seemingly trivial jobs are distributed systems, rather than tiny little Perl scripts. Most systems lose
efficiency as they scale. Clearly N-squared scaling is out, but even N log N scaling can be an issue. You have challenges with
multi-machine coordination, distributed knowledge, network throughput. It's a big, big problem. This tenet tells the S3 team that their designs not only need to scale, but they need to get
better as they get larger. As a simple example, a web server is a single point of failure. If you lose that web server,
you no longer have a website. So you run two web servers. It's great. Now your app can survive
the loss of one of them, but you can't load them up past 50%, because if you lose one, you
lose half of your capacity. As your fleet of web servers
gets larger and larger, the portion of your capacity
that any one server represents gets smaller and smaller. The cost of losing a
single host goes down. And as you scale up, you can load the boxes
closer and closer to 100%. It's easier and easier
to take individual hosts out for maintenance to perform upgrades, software deployments, or
even to handle failures. Now, this is a borderline trivial solution because we're dealing with
an inherently very scalable share-nothing web server. It's a completely stateless service, but now apply that thinking
to every layer of S3. Using this tenet, every engineer on S3 immediately thinks about any
of their designs scaling to 10, 100, or 1,000 times as large as proposed and how they're gonna get
better as they get larger. When you're working on S3,
that's the right thing to do. First use matters. Our largest customer tomorrow
may be using our services for the first time today. We balance our investments and
aren't afraid to self-disrupt to ensure our services
remain differentiated and compelling for these customers. S3 is the simple storage
service, and it still is, but it also supports an ever-growing set of rich functionality, like
cross-region replication, storage class tiering,
information lifecycle management, encryption, permissions,
retention, and more. This tenet is another
rejection of a false choice. We aren't gonna choose between our large,
experienced, mature customers and our small customers
who are just trying out S3 for the first time. We're gonna delight both of them. So here, for your
screenshotting convenience, are all of our tenets on a single slide. These are not the tenets
that we started out with, and they're not gonna be the tenets that we have in the future. Even now, we're talking
about changes to them. It took a few iterations and some discussions with
our senior leadership to get them to where they are now, and we're always open to changing them. These tenets work for us. They express the peculiar
way that AWS Security thinks and engages with our service
teams and with our customers. They may or may not work for you. They may or may not be
a good starting point for your own tenets should
you want to try this out. But I suggest that you do. I'm an engineer. I have
a very technical job. Success in my role involves
getting deep in the details, understanding the gritty
reality of implementations, the internals of our services. Yet, this was an entirely
non-technical talk because as my career has
progressed, as the team has grown, I've realized, as have
many, many before me, that the single longest
lever that I have to pull is to make the team itself more effective, more consistent, and self-perpetuating. If we're gonna keep up with the innovation of our service teams and the
innovation of our customers, we have to make AWS Security
make more AWS Security. When customers ask us how
we do what we do, the tools, the systems, the processes
are all interesting. It's all fun to talk about, but the real underlying
bedrock of our success at scale is our internal culture that keeps us working as a single team making high-quality, high-velocity
distributed decisions. And our tenets are a visible mechanism defining that culture. Thank you so much for the opportunity to
talk with you today. Have a great day.
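
The web server example from the S3 scalability tenet can be put into numbers. What follows is a minimal sketch and not something shown in the talk: it assumes identical servers with load spread evenly across them, and the function name `max_safe_utilization` and its parameters are made up for illustration. The idea is that if the fleet has to survive the loss of some number of hosts, each host can only be loaded up to the fraction of capacity that remains after those hosts are gone.

```python
def max_safe_utilization(servers: int, tolerated_failures: int = 1) -> float:
    """Return the highest per-server load fraction that still lets the
    surviving servers carry all traffic after `tolerated_failures` hosts
    are lost. Assumes identical servers and evenly spread load
    (an illustrative simplification)."""
    if tolerated_failures < 0:
        raise ValueError("tolerated_failures must not be negative")
    if servers <= tolerated_failures:
        raise ValueError("need more servers than tolerated failures")
    # After the failures, (servers - tolerated_failures) hosts must absorb
    # the load that was previously spread across all `servers` hosts.
    return (servers - tolerated_failures) / servers


if __name__ == "__main__":
    for fleet_size in (2, 10, 100, 1000):
        limit = max_safe_utilization(fleet_size)
        print(f"{fleet_size:>5} servers: load each up to {limit:.1%}")
```

With two servers the limit is 50%, which matches the figure in the talk, and it climbs toward 100% as the fleet grows. That is the sense in which growth improves the system rather than degrading it.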