[MUSIC PLAYING] SIRUI SUN: Welcome to Best Practices for Privacy and Security in GCE. My name is Sirui. I'm a product manager on the Compute Engine team. CACHE MCWHERTER: I'm Cache. I'm an IAM engineer, and I'm ready for a 400-level talk on the theory of securitrons. SIRUI SUN: Well,
Cache, you're going to have to wait just a little
bit because this is a 200 level talk. But we still have a lot of
great content for you today. So I wanted to get started. So over the past years,
we've learned a lot here at Google Cloud in securing
both our internal cloud and the clouds
for our customers. And the goal of
the next 50 minutes is for us to share
some of those learnings with you, to share some
of the best practices we've picked up so
that you don't have to make the same
mistakes that we did, and that you can take
off running when it comes to securing your cloud. So often when we
talk to customers, they'll ask us a question that
sounds something like this. They'll say, hey, how do I
check off that security box? How do I make sure my cloud is
completely, completely secure? And to that we say, well,
that's a good question. And Cache and I got in a
room we thought about it for a little bit, and it turns
out it's the craziest thing, you can actually
secure your cloud with just this one weird trick. It's pretty crazy. Are you ready for it? Just kidding,
that's not a thing. It'd be crazy. Wouldn't that be crazy, though? Like this whole talk would
be over in like two minutes. In reality what we
found over the past years is that
security is a lot more of a dynamic contextual thing. So if you think
about it in terms of securing a physical space
like securing Moscone Center, the correct level of security
or the correct security posture for Moscone Center is going to
depend on a number of factors. It's going to depend
on what part of Moscone you're trying to secure. Are you trying to secure
the front sidewalk or are you trying to
secure the generator room? It also is going to depend on what's happening at Moscone Center. Is this an open house that anyone can attend, or is the president in town giving a speech? And even more, if you are in charge of the team that's securing Moscone Center, is it your responsibility to just secure this building, or is it to secure the entire complex, or is it to secure a nationwide chain of convention centers? Each of those factors enters into the calculation, and each is going to change how you approach security. And so when you look at
security posture, first of all, it lies on a spectrum. Regardless of what
you're trying to secure, it lies on a spectrum,
and we should recognize that it can be too loose. No matter what you're trying
to secure it can be too loose. It can be the Wild West. Anyone could do anything
and for sure bad things are going to happen there. But it can also be too strict. You can have too many rules,
too many things locked down. And that's bad too, because that
will lead to loss of agility. It might lead to an
antagonistic relationship between your security
teams and everyone else, and a number of other things. And so to start the talk I want
to set out some principles to help you try to understand
and navigate the spectrum and get to the right spot there. So the first principle
I want to talk about is centralized administration. And this is the idea that there should be a central security team that has full awareness of everything that's going on, or at least the ability to be fully aware of what's going on, and that also has the responsibility and authority to govern everything that's going on. Without that centralized governance -- if you don't have the ability to know what all is going on or the authority to enforce what's going on -- you're almost certainly too loose. But what we found is if
you just have centralized administration, you
may end up falling on the opposite end
of the spectrum, especially in a
larger organization. If you have a team of
hundreds or thousands and the responsibility for
securing and tracking down everything that's
happening is falling on the shoulders
of very few people, you may end up on the
opposite side of the spectrum where you're forced to
lock down everything or you're forced to
implement policies that you can't keep track
of because there's just so much going on. And so to that we introduce
a second principle called delegation. And this is the idea
that if you have a centralized administrative team, they can still delegate administrative
tasks to a separate sub team or some other division
or some other set of folks, and that they can move much
quicker having delegated authority. And what we found is that
between those two principles you're able to get
closer to the center and get into what we call
the Goldilocks zone-- It's a technical term. --where you basically
have the right balance between too loose and too strict. And notice what we've done
here is we've called it a zone. This is not a Goldilocks point,
and that's in recognition that the right place to
pick your security posture is going to be very
contextually relevant. It's going to depend on
the context for your cloud just as it depended on the
context for Moscone Center. And so what we're going to do
within this presentation is we're going to start with the
building blocks for security, start with some of
the tools there, and we're going to build our
way up to some best practices that we've learned. But by doing so we'll allow
you to kind of pick up the best practices,
some of which you should do all the time regardless, and some of which will depend on what your security context is and
what's the right security posture for your organization. OK. So the first building
block, starting simple, are the identities. This is the set of
users and groups and services that you have
that Google Cloud knows about. Google Cloud knows about them
because Google Cloud Identity, which is our identity
service, knows about them. The quick thing I'll say
here is that we understand that some of you may have your
identities mastered elsewhere. In fact that might
be most of you. A lot of organizations
that we've talked to, they master their identities
in Active Directory. And I just want to call
out that it is possible. We do support you
syncing your identities from wherever you master them
into Google Cloud Identity so that you can
keep mastering them wherever they currently exist. Next, I want to talk
about resources. So these are the things
that you're ultimately trying to protect. When it comes to
Compute Engine, these might be your VMs and your
disks and your subnets and your images. But the resource concept also applies to any non-Compute Engine resources inside GCP as well. Resources, of course, roll up
to one and only one project. These are our base
level of grouping. So all resources have
to belong to one project and projects also encapsulate
some other properties like the billing account,
quotas, permissions, and things like that. The next level up
is the organization. So this is an optional concept. You don't have to
have an organization, but we highly, highly
encourage it for reasons that I'll get into for a second. But these will act as the root
node for all GCP resources. So if you set up an
organization everything should roll up into
the organization. And finally, we have folders. These are an additional
optional grouping mechanism. They can contain other
projects or folders, and we found that this is a
great way to organize and make sense of your policies. So we're going to dive a little
bit more abstract in a sec, so bear with me,
but we'll then make it more concrete with some
examples and some best practices to put them together. So once you have the
resource hierarchy in place you can apply policies to
this resource hierarchy. Policies can be applied anywhere
in the resource hierarchy, and once they're
applied, a policy is then passed from the
parent to the child. So for example, if I set a policy on the department X folder, it'll pass down to all the sub folders under department X, and from there it'll get passed down into the projects and into the resources. There we go. There are two types of policies. The first one is
called IAM policies. So these are in the business
of granting permissions to particular identities
for particular resources. The second type is called
organization policies. These are in the business
of creating constraints. Let's talk about
IAM policies first. IAM policies control who is
allowed to do what on which resources. So it takes a set of identities,
like a user or a group or a service account, and then
it takes a set of resources, like a folder or a project
or individual resources, and it says these identities
will have these permissions on these resources. And it does that through
a particular role. So in this particular example, we might give, for example, Alice the compute instance admin role on an individual VM. And what that'll do is it'll give Alice permission to take instance administration actions on that VM that's already created, like deleting it, starting it, or stopping it.
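As a rough sketch, the same grant can be made from the command line too. The VM name instance-1000, the zone, and the user here are hypothetical:

    gcloud compute instances add-iam-policy-binding instance-1000 \
        --zone=us-central1-a \
        --member="user:alice@example.com" \
        --role="roles/compute.instanceAdmin.v1"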
What I find is that it helps make things a little more concrete if we show what this looks like in the UI. You can of course set
permissions in G Cloud and the APIs as well. But for the purposes
of this presentation, we'll go in the UI. So here I am in the instances
list in Compute Engine. I can select a particular VM. Here I've selected instance-1000, and now that I've specified the resource I just
need to say who can do what on that resource. And so if I click the
Add Members button over in the top right there, I
can answer those two questions. So it'll ask me who do you
want to have this policy, and also what role do you
want to apply to this user? So in this particular
case, I'm going to give Alice this
particular role, and the role I'm going to give
is the instance administrator role. So that's just walking through
what the previous example concretely looked like. Next, the second type
of policy is called the organizational policy. So these are in the business
of placing restrictions on your resources. And some examples
of that are you might want to restrict
the specific types of APIs or a specific set
of services that can be used in a particular
folder or a project. You might want to restrict a set
of VMs that have external IPs, or the third example
there is you might want to restrict a
set of users that can be added to IAM policies. Concretely what
this looks like is if you go into a project or
a folder or an organization and you go into the IAM
section and then the org policy subsection, you can see the full
list of organizational policies that we support. And the important thing here
is we offer a lot of them. We're adding to that
list all the time. And so one thing we recommend
is that when you're starting your cloud deployment, or even if you've already gotten going, if you haven't looked through all of these, just take a look. It's kind of like flipping
through your iPhone settings when you first set it up. Here's a concrete
example of that. So suppose I'm an administrator
and I want to do my job well, and part of that is
preventing data exfiltration. So one org policy
that we find is very effective in helping to do
that is the domain restricted sharing org policy. What I can use this for is to dictate that my developers are not able to share resources with users outside of my organization unless I give them explicit access to do so.
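As a sketch, enforcing that from the command line might look like this, assuming a hypothetical organization ID 123456789 and a hypothetical G Suite customer ID C0abc123 (the constraint takes customer IDs, not domain names):

    gcloud resource-manager org-policies allow \
        constraints/iam.allowedPolicyMemberDomains C0abc123 \
        --organization=123456789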
Similarly, another vector for data exfiltration might be external
attacks that go and look for VMs in my organization
with external IP addresses. And so another organization
policy that I can set is one that restricts
the set of VM instances in my organization with
external IP addresses. And so I will have knowledge of all of the VMs in my organization that have external IPs. OK. So let's put this all together. We're back to the hierarchy. Resources inherit the
policies that we just talked about from their parent. And so we can see
how we can start to apply some of the principles
that we talked about before. So the first principle
I talked about was centralized administration. You need a way for some central administrative team to understand what all is going on and to have some broad authority over that. So what you can
do, for example, is you can have an IAM policy
which gives the compute viewer role-- this is view
only access to compute resources. --to your
organizational admin group, and then once you apply that
policy to the organization node you'll see that it
flows down to all the folders and projects and resources
within that node automatically. And this will apply to
all the present resources in your organization,
but it'll also apply to all the future
resources in your organization as well. And so you can start to see how this gives you an easier time in doing centralized administration, because you can imagine that if you didn't have this organization node, you might be chasing after thousands of projects that get spun up in your organization.
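As a minimal sketch, that org-wide grant might look like this in gcloud, assuming a hypothetical organization ID 123456789 and an admin group gcp-org-admins@example.com:

    gcloud organizations add-iam-policy-binding 123456789 \
        --member="group:gcp-org-admins@example.com" \
        --role="roles/compute.viewer"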
Secondly, we can achieve the second principle as well, which is that of delegation. So if you imagine you
have team A over here. Team A has an SRE team and as
a centralized administrative group you want to delegate
instance administration actions to that particular team so you
can set the policy on team A's folder to give instance
admin role to the entire team A SRE group, and that
will also propagate down to all the projects and
resources under team A. And so now you've effectively delegated permissions for that particular team to that particular group.
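A sketch of that delegation, assuming a hypothetical folder ID 111111111 for team A's folder and a hypothetical group team-a-sre@example.com:

    gcloud resource-manager folders add-iam-policy-binding 111111111 \
        --member="group:team-a-sre@example.com" \
        --role="roles/compute.instanceAdmin.v1"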
The same thing applies with org policies. So suppose I have a folder that's supposed to categorize a set of workloads that are internal only. I don't want these workloads to ever have an external IP address, which I'm showing here as the internal folder. I can set an org policy on that particular folder, which restricts those VMs from having external IP addresses. And in doing so I'll find that the org policy automatically flows down to all the resources and projects in that particular folder.
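A rough sketch of that folder-level constraint, assuming a hypothetical folder ID 222222222 for the internal folder. Because vmExternalIpAccess is a list constraint, a deny-all is easiest to express as a policy file:

    # policy.yaml
    constraint: constraints/compute.vmExternalIpAccess
    listPolicy:
      allValues: DENY

    gcloud resource-manager org-policies set-policy policy.yaml \
        --folder=222222222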
And so here again I'll have a much easier time governing my organization. OK. So that was some of
the building blocks. I'm going to turn
it over to Cache, who's going to start building
them up into best practices. CACHE MCWHERTER: Thanks. Now, as you might expect we
talk to a lot of customers. Day in and day out
this is all we do. We try to help them understand
where they're coming from and how to apply the security
tools that we're developing to best solve their
security problems and manage their cloud. And one of the common
problems that we find customers have is that they don't know how to
most effectively apply the resource
hierarchy to organize their resources in our cloud. We find that customers will
sometimes not really know what to do and they'll end up
with a sprawl of hundreds
projects, all of which have different
policies and settings and so forth where
they have to manage each of these individually. Likewise, other companies think
that maybe the best thing to do is to take their org chart and
sort of push it into the folder structure and use that. We find that the best
hierarchies don't do either of those, and
instead they sort of build off of two concepts. The first concept is delegating
authority as Sirui mentioned, and the second is grouping
resources of like risk together where you can apply
similar security policies to the same sets of resources. To show you what that
looks like, here's a simple organization. A simple company that's decided
to build a couple applications inside of Google
Cloud, app1 and app2. In order to provide delegated
authority they created folders to represent each of
their applications and underneath
the applications-- and this is a representation
of delegating authority to those teams. --they granted the service teams that manage app1 and app2 access to those folders. Below the folders there
are separate projects, each of them indicating whether
the resources within are pre-production development
environment resources or whether they're production
resources with critical data. And this is, in a sense, grouping resources in accordance with their security profile and security risk. The pre-production environments don't need to be locked down super hard. All the developers
can have access to deploy code,
access data, whatever they need to debug and
develop their system. The production environment
on the other hand is going to be locked
down much more tightly. Only the on-callers are going to have access to fiddle with these projects, and their production systems are going to be responsible for pushing code and data to these environments. Once an application's
scale is large enough you can imagine continuing
to apply the hierarchy to break down the application
and manage the complexity therein. Here, for instance, application two has broken itself into two sub-components, a front end and a back end, and the administrators of application two were responsible for delegating authority to the front end team and the back end team to manage those
projects independently. And they'll set those things up without having to come back to the administrator to configure it.
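As a quick sketch, carving out that kind of hierarchy from the command line might look like this; the organization ID 123456789 and the folder IDs are hypothetical:

    gcloud resource-manager folders create \
        --display-name="app2" --organization=123456789

    # Nest a sub-folder under app2's folder ID (hypothetical 444444444)
    gcloud resource-manager folders create \
        --display-name="frontend" --folder=444444444

    # Create a production project under the frontend folder (hypothetical 555555555)
    gcloud projects create app2-frontend-prod --folder=555555555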
Now, to look at a more complicated example using both IAM policy and
organization policy, you can do some
pretty cool things. So here we have a
large organization, fairly complicated. There are numerous teams, numerous organizations, and numerous applications, with hundreds of projects, potentially thousands. As a security administrator
this makes me nervous. They're all deploying
virtual machines. I have no idea what
they're deploying. I want to make sure that
my company is making sure that the only thing running inside those VMs is trusted code. I need to put down an
operating system standard. I need to make sure that it's
running all the daemons, all the logging, all the security
protection infrastructure that I need into those
operating systems. I need to make sure
that all the developers inside my organization
are using those images. So what I'm going to do is I'm
going to hire an image team. I probably have one already. I'm going to pull off a
section of my policy hierarchy and I'm going to grant
access to my image team to manage those images. They're going to create some
images, test them, vet them, and then they're going to
publish them to this project and share those resources with
the rest of my organization. Now I can apply an
organizational policy called Trusted Image Projects to
the rest of my organization to lock down and make sure that the only boot disks that can be used in my compute VMs are created and pushed by the Golden Images Project. And this allows me to have faith and confidence in the software that's running in my cloud. I'll wait for a photo.
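A sketch of that constraint in gcloud, assuming a hypothetical golden images project named golden-images and the hypothetical organization ID 123456789:

    gcloud resource-manager org-policies allow \
        constraints/compute.trustedImageProjects projects/golden-images \
        --organization=123456789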
So now that we have a policy hierarchy, the next big question that
a lot of customers have is, how do I figure
out how to assign the roles to my employees and
my services, and grant access? Now, this is a hard problem. This is fundamentally
the most difficult part of managing access control
and security in our system. You need to essentially
have someone who's capable of making
sense of the system and granting access that's
appropriate to the business requirements of the
service or the user that's getting such access. I can't just make
that problem go away. I'm sorry. But we can help
you a little bit. We can help you build a set of
standards for deploying access grants that make sense for you. So as you come to
Google Cloud, you've probably encountered what we
call the primitive roles called owner, editor, and viewer. And these come out of the box. They're extremely
broad grants of access. We don't recommend
using them for anything in a production system. They're only there
to get you started and to help you kick
the tires essentially. And the reason that
they're not recommended is because they
don't satisfy what we call the principle
of least privilege, wherein you grant
access that's commensurate with the minimum
requirements for each employee and service in your system. Now, when you're
designing roles it's important to remember again that
you don't want to specifically identify every single permission
and every single object that the employee is
going to have to access. In general we find there's
like a Goldilocks zone where you are going to grant
a reasonable level of access that contains all of the sets
of things that are in line with the business
requirements of the user, of the engineer or the
service that you're deploying. You don't want to be
too strict and you don't want to be too loose. Now, nine out of 10
security engineers love our pre-defined roles
to achieve this purpose. They're designed
by Google engineers and PMs based on feedback we've gotten from all of our customers. They're representative of the kinds of roles and segregations of access and responsibility that our customers have been asking for and applying
to their own resources. And so, for instance,
here you can see compute has provided
you with a number of roles such as instance admin, network
admin, and load balancer admin, because these are roles
that are naturally managed by separate
individuals in a large number of organizations. The network administrator is
responsible for making sure that the firewalls are locked
down, and so forth and so on. Whereas the instance admin is
responsible for making sure that the Hadoop cluster is
up and running, for instance. Now, that final
security engineer who wasn't quite happy
with the predefined roles is usually satisfied and made
happy with our custom roles. They're used in
special circumstances where the predefined roles
don't line up exactly with your business requirements. For instance, we find
that a lot of customers will combine a number
of predefined roles into a single
role, a custom role so that they can
grant it more easily and make sure that
all sets of accesses are present on any resource
which they grant this access. Another common pattern is
that a security engineer sees a predefined role that exactly
matches their requirements, but there's that
one weird permission inside that they really don't want, so they want to take that away from their engineers.
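As a sketch, a custom role might be created from the command line like this; the role ID, title, and permission list are purely illustrative:

    gcloud iam roles create securityAuditor \
        --organization=123456789 \
        --title="Security Auditor" \
        --permissions=compute.instances.get,compute.instances.list,logging.logEntries.list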
Now, when you're granting access and assigning roles to your
employees and services it's quite easy to figure out
when you've gotten it wrong and you've under provisioned
the access to your resources. The engineers and
the services are going to complain in
one way or another, and you're going to get
a ticket or something to fix the assignment of
authority, fix the grants. But if you've over provisioned
you don't get any notice. No one comes along and tells
you I have too much power. I'm so sorry. Could you take some away? I'm happy to let you know
that we're rolling out tools that are going to help you
address these kinds of problems. One is, for instance, called
the IAM role recommender which looks at the access patterns
of users and services in your system, in your cloud,
and based on historical traffic gives you a prediction of what roles you might want to grant the user instead. And this allows you to make access control more fine-grained and apply the principle of least privilege in your cloud at scale. So now another kind of
problem that we see-- and it happens to everyone. It happens to us. It happens to me. It happens to our customers. --they often will get into a
situation in which they'll find a policy that looks like this. And it's hard to make sense
of why this policy looks the way it looks. Here you can see a number of
engineers have different levels of access to this project. I have no idea why any
of these roles exist. Some companies have
constructed spreadsheets to try to keep track of
the historical reasons about why every one of these
grants have been created. That's one strategy. The strategy we
recommend instead is to use what we
call a security-group-based model, which sort of models the RBAC philosophy. Instead of granting
roles to users individually in an ad hoc
way, what we recommend is you create security
groups that represent the conceptual roles
and responsibilities each engineer might
have to a resource. Here, for instance,
team grilled cheese has been broken down into
data scientists, debuggers, and on-callers. Now I can grant access to that
project to each of these roles and responsibilities
based on what they should have access to. Data scientists get to run
through the prediction engine, debuggers get to
run some queries and access some other
data, and on-callers get to do everything they want. Yeah.
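A sketch of those group-based grants, with hypothetical group and project names; the roles are illustrative stand-ins for whatever each responsibility actually needs:

    gcloud projects add-iam-policy-binding grilled-cheese-prod \
        --member="group:gc-data-scientists@example.com" \
        --role="roles/automl.predictor"

    gcloud projects add-iam-policy-binding grilled-cheese-prod \
        --member="group:gc-oncallers@example.com" \
        --role="roles/compute.instanceAdmin.v1"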
And then I can independently manage the sets of users
and services that are members of the
security groups, and then I can audit each of
these categories independently. It's usually much easier
to understand whether Cache belongs to team grilled cheese
than it is to decide whether he should have AutoML predictor
on some random project in your organization. Before I go on from role
construction and role assignment I'd like to talk
about one other pattern that we've been seeing
a lot of benefits from. That's break glass access. So one of the
common failure modes that organizations
both inside of Google and outside of Google
encounter with Google Cloud is that senior engineers
and on-call engineers end up getting
lots of privileges on production resources. Often they're an owner or often
they have administrator rights and they carry this as what
we call ambient authority. What this means is that it's
quite easy for an on-caller
wrong project, for instance, and do something destructive
to that project thinking they might have been
working on another project. This is not a great
situation, especially when your business is on the line. What we recommend instead
is creating some sort of break glass pattern
for this kind of privileged access. One way you can do this is
with predefined IAM roles such as a project IAM Admin. Project IAM Admin does
not give you the ability to do anything on a project
except for set policy and change the policy. So instead of wandering
over to the Cloud console and deleting some VMs, they have
to contemplatively first grant themselves access to delete a
VM and then go and delete a VM.
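A minimal sketch of that break glass escalation, with a hypothetical on-caller and project; the on-caller holds only Project IAM Admin ambiently and grants themselves the destructive role just in time:

    # Break the glass: grant yourself instance admin
    gcloud projects add-iam-policy-binding prod-project \
        --member="user:oncaller@example.com" \
        --role="roles/compute.instanceAdmin.v1"

    # Repair the glass once the incident is resolved
    gcloud projects remove-iam-policy-binding prod-project \
        --member="user:oncaller@example.com" \
        --role="roles/compute.instanceAdmin.v1"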
We find that the break glass pattern is actually enhanced and made
a lot more secure by application of automation. And you can use automation such
as Jenkins, Terraform, Cloud Functions, anything
that you have in house to facilitate both the
breaking of the glass, possibly with a ticketing
system or even an audit and review process to make
sure that the escalation of privileges is appropriate,
and then also to repair the glass, to take away the
access when the incidents are resolved. Now that we've dealt
with roles let's deal with securely accessing
my cloud, my VMs. So securely accessing
data to and from my VMs involves three
categories of access. The first thing
I need to do is I need to be able to SSH
into and connect to my VMs and figure out what's
going on sometimes. Second, I need to be able to
get securely from my VMs-- workloads in my VMs
need to securely get to Google APIs to access
storage and workflow engines and all the other
products that we offer. And finally, your VMs
will need to phone home and make calls to your on
prem services or to other VMs or to VMs running in other
clouds, and do so securely. I'm going to go through
each of those in turn here. Oh, wait. Did you feel that, Sirui? It was like a disturbance, as
if a million credentials were just created and
suddenly silenced because they were published to a
GitHub repository in the cloud. Breaks my heart. This is what we
want you to avoid. In particular,
unmanaged credentials are one of the greatest
security threats that we've seen in almost
all cloud activity. When we work with our
customers we strongly encourage you to use
fully managed credentials in every circumstance that you
can, where the platform takes care of rotating credentials,
keeping them in escrow and protecting those
credentials from loss or leakage or publishing to a
repository in the sky. And to do this, first, let's
look at the SSH'ing to my VM scenario. I just deployed a
10,000 VM cluster. It was doing something cool,
trust me, but five of those VMs stopped working and
I need my on-callers to be able to get into
those VMs and figure out what was going wrong. So some customers will
go and push some SSH keys around into those VMs, but
the downside of that approach is that when those
keys are lost or leaked or an employee who had access
to a key leaves your company, you're now stuck with a leaked key that an employee who left your company might still be able to use to SSH into your VMs. Instead, we'd like to recommend
that our customers use what we call OS Login. It's a product in
the Compute system, which you can enable when you create your VMs by setting metadata on your virtual machine. You enable OS Login
and enable OS Login two-factor authentication,
which requires that your engineers provide
a security key whenever they SSH into a VM. And then you can control
access to your virtual machines using IAM permissions, using the
OS Login or the OS Admin Login roles. And the difference is whether you let them log in as themselves or as root users.
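Putting that together as a sketch, with a hypothetical VM and user:

    # Enable OS Login and two-factor via instance metadata
    gcloud compute instances add-metadata my-vm --zone=us-central1-a \
        --metadata enable-oslogin=TRUE,enable-oslogin-2fa=TRUE

    # Grant SSH access through IAM instead of managing keys
    gcloud compute instances add-iam-policy-binding my-vm \
        --zone=us-central1-a \
        --member="user:oncaller@example.com" \
        --role="roles/compute.osLogin"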
And if you put this all together, you have complete assurance
that all access to that VM is tied to the user lifecycle. When my on-caller leaves my company, for whatever reason, the access to the key that they have doesn't matter anymore. They can take that key
and publish it to GitHub and it still doesn't give
them access to my VMs. Next, we want to talk about
authenticating my VM securely to my Google APIs and
my Google resources. To achieve this
task, what we do is what we call binding a service
account to a virtual machine. The first step is to
create a service identity in the Google Cloud console. The second step is to
grant that service identity access to all the things that I
think my workload needs access to. Third and finally, when I'm
creating the virtual machine I select which
service account I want that virtual machine to run as. Every access that's made with
client libraries running inside of that virtual
machine authenticates to the cloud as the service account identity. The long-term keys are held in escrow, and the only thing that the workload in the virtual machine gets is a short-term access token, which can be applied to API calls.
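A sketch of those three steps in gcloud, with hypothetical names throughout; the storage role is just an example of a workload permission:

    # 1. Create the service identity
    gcloud iam service-accounts create my-workload

    # 2. Grant it what the workload needs
    gcloud projects add-iam-policy-binding my-project \
        --member="serviceAccount:my-workload@my-project.iam.gserviceaccount.com" \
        --role="roles/storage.objectViewer"

    # 3. Run the VM as that service account
    gcloud compute instances create my-vm --zone=us-central1-a \
        --service-account="my-workload@my-project.iam.gserviceaccount.com" \
        --scopes=cloud-platform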
Now, one of the common difficulties with using this feature is
something called access scopes. It's something that
you're required to select when using the feature. It is a great source of
confusion to customers, we find. The default is set to allow
default access, which isn't always the easiest to use. We recommend instead
using "allow full access to cloud APIs" and what we
call the Cloud Platform scope. This ensures that you have
easy access to call APIs. It's a tool to effectively
make client library integration simpler. And then use IAM
permissions and IAM roles granted to the service account
as your security feature. Now, talking about VMs
that need to phone home. A lot of times, you're
going to be running only part of your workload
on Compute and some of it's going to be running elsewhere. To address that
particular problem, we use the same approach
that we used for calling into Google APIs previously. You'd create a service account,
you attach it to the VM when you've constructed it. The code in the VM can then
get an OIDC token, which is an industry-standard
compliant OpenID Connect token that can be
verified by verifying the signature in the token with
the OIDC endpoint for Google. And the token contains
all the information that you could
ever possibly need to authorize access to this VM. It contains the
service identity, it contains the project ID, the
zone, the VM name, the creation date, everything you
possibly ever need. Again, it's
standards-compliant so it works with all open source
OIDC-compliant libraries. It gives you strong
proof of identity, and again, you don't have
to manage credentials. So everyone's safe.
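As a sketch, code in the VM can fetch that token from the metadata server; the audience value is a hypothetical example:

    # Run inside the VM; returns a signed OpenID Connect identity token
    curl -s -H "Metadata-Flavor: Google" \
        "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience=https://my-service.example.com&format=full"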
Next, I'd like to talk about other kinds of threats. Other kinds of
organizational threats to your virtual machine fleet. In particular,
rootkits and bootkits. When you have people
who can SSH in or malicious software
that can hack its way into the
front door, you always have a risk that the software
that you're running in your VM fleet is not the
software you think it is unless you have a strategy
to make sure that it is. And the approach to solve
that-- what you can use is a product we call Shielded
VMs, which just launched to GA. It allows for a
trusted chain of boot from the virtual TPM of the
VM through the firmware, through the operating
system, to make sure that all the software running
in the operating system has been vetted and vouched for. If you have a rootkit that
installs random malware into a kernel module in
your operating system, we'll detect it and we'll
notify you and tell you which VMs have
been hacked so you can take remedial action. To use this feature, you
can go to the boot disk operating system
images on Compute and you can filter
by the Shielded VM provided operating
system images. And then when you
create your VM, you turn on the
Shielded VM option.
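A sketch of that in gcloud, with a hypothetical VM name and a Shielded-VM-capable image family:

    gcloud compute instances create my-shielded-vm --zone=us-central1-a \
        --image-family=ubuntu-1804-lts --image-project=ubuntu-os-cloud \
        --shielded-secure-boot --shielded-vtpm \
        --shielded-integrity-monitoring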
And you can notice that this is an option, and so you might worry that maybe your developers won't create Shielded VMs. And that's why we recommend that you combine it with the trusted image pattern that we showed you before and organizational policies. You can apply the same pattern
that we showed you before, where you can have your
operating system images being vetted and curated by your
images team, published to your organization. And then you can apply
an organizational policy to the rest of your
organization to both require that all VMs be created with
the Shielded VM feature. And also, to ensure that
the only ones that are used are the ones that you've
vetted and approved. And again, to turn
that on you just go to the organization policy
control panel in your Cloud console and turn on Shielded
VMs and your trusted image projects settings. Should I go back? Got to take a picture. [INAUDIBLE] OK.
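The same settings can be applied from the command line; a sketch, with the hypothetical organization ID from before (requireShieldedVm is a boolean constraint, so it's a simple enable-enforce):

    gcloud resource-manager org-policies enable-enforce \
        compute.requireShieldedVm --organization=123456789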
Finally, I'd like to talk about setting up secure-by-default
behaviors for your Cloud. Like I said before, we are
sometimes in a tough spot because we're trying
to sell a Cloud. And to sell a Cloud, we need
to make it easy to use and also secure at the same time. To achieve this,
oftentimes the defaults will come out of the box
with things easy to use. You'll get the owner permissions
when you create a project so you don't have to worry about
figuring out the permission model just to deploy a function. We do recommend that you
secure those projects after you've created them,
especially when they're production resources. Now, to achieve
this, what we find is a factory pattern
works wonders. And every one of our customers
that cares about security seems to apply this mechanism
in one way or another. The basic idea is, again, to
set up a workflow using Jenkins or Terraform or Cloud Functions or whatever it is, where your
employees, your developers, can click on a button
hosted by you that creates the resource in GCP. And then sets up all the
configuration exactly as you need it. So for instance,
to achieve this, you can first go to your organization's IAM policy. There, you'll find
a policy grant that enables everyone
in your domain to be able to create a project. You can disable that. Take that permission away. Instead, you can grant that
permission to your workflow engines that are allowed to
create vetted, secure projects. Again, you can use
Jenkins and Terraform. Anything you want.
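As a sketch of the factory's job, here's roughly what that automation might run for each new project; every name and ID here is hypothetical:

    # Create the project under a vetted folder and attach billing
    gcloud projects create team-x-prod --folder=444444444
    gcloud beta billing projects link team-x-prod \
        --billing-account=000000-AAAAAA-BBBBBB

    # Attach it to the organization's secured shared VPC host project
    gcloud compute shared-vpc associated-projects add team-x-prod \
        --host-project=shared-vpc-host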
And for us, some starting points for thinking about how
make your default projects-- you could, for instance, delete
the default Compute service account. You can neuter its
permissions, which start out with fairly
broad editor permissions. You can delete the VPC network
that is created on the project and instead connect that
project to a secured shared VPC that you use for your
entire organization or for sections of
your organization. You can attach billing
accounts to make sure that all the cost accounting goes to the right billing account. You can set up audit
logs and billing exports to make sure that you have
full visibility as to what's going on in those projects
when you unleash it to your developers. And again, this gives you
centralized governance and control and guardrails for your developers. And it removes the friction of getting your developers easy, rapid access to an environment to develop their code, or to create a production environment. And with that, I'll
or to create a production environment. And with that, I'll
hand it over to Sirui to tell you more about
keeping track of what your developers are doing. SIRUI SUN: Thanks, Cache. So, another type of question
we get from customers a lot is, how do I more easily start
to understand everything that's going on in my organization? And this is a key part of
centralized administration because in the
beginning, we talked about how you can
extend permissions to your organization,
to your org admin team, for example, to view everything. But that's not the
same thing as being able to look over that
content in an easy way to analyze what all is going on. And so I want to start by
talking about some of the tools that we offer to start giving
you that kind of 360 degree view of your organization. So, the first tool I
want to talk about-- this has been
around for a while-- are audit logs. And I don't have time to get too
deep into the nuances of audit logs, but I will talk
about a few types that are really helpful for
this particular use case. So, the first type
I want to talk about are admin activity logs. So, these logs are
generated whenever an administrative
action is taken on both the GCE resource but
also most other types of GCP resources. So for example, if I
had a user named Alice and she created a VM in
a particular project, we'll create an admin activity
log automatically for you. And that will include
what type of VM it was, what time it
happened, and other details that give you more
visibility into what exactly was happening. These audit logs are enabled
by default in all projects. In addition, they are
generated in real time. So they really give
you those up-to-date, up-to-the-second details about what's going on in your organization.
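As a sketch, those admin activity entries can be pulled from the command line too; the project and filter here are illustrative:

    gcloud logging read \
        'logName:"cloudaudit.googleapis.com%2Factivity" AND protoPayload.methodName:"compute.instances.insert"' \
        --project=my-project --limit=5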
A second type of audit log that's important to know at this point
is the data access audit log. So, these talk about data
reads and data accesses. These tend to be a little bit
more verbose so they're not turned on by
default. You actually have to go and turn
them on yourself but when you do so,
that information will start flowing in. And so in the use
cases where you're really interested in knowing
even about the reads, we highly suggest that
you turn these on. So, with audit logs you get kind
of this moment-by-moment set of events for your
entire organization. And we release these and
we go to our customers and they say,
well, this is good, but it's difficult sometimes
to understand the precise state of your world at a
particular moment in time if I just have a set
of events leading up to that particular
moment, right? And it's possible to
construct that from audit logs but it gets a little tricky. And so in response
to that feedback, we've actually recently
released a second tool that we call the
Cloud Asset Inventory. And this tool answers
the question of, at a particular moment in time,
what resources and policies were there in my organization,
in a particular folder, or in a particular project? This tool, this particular
API, will actually look back in the last five
weeks so it has knowledge of the last five weeks. And so you can ask a
question like, hey, last Saturday at 2:00 AM when
I had a security incident or something like that, what
were the VMs that were running in my organization? And what were the policies
applied on those VMs? And so you can start to do those
forensics-type investigations or you can start--
you could imagine you could use it for any
other number of use cases. And it's great for
printing out that state. Because it has knowledge of your
environment for the last five weeks, it can also
tell you about how a particular resource changed
over the past five weeks. So, kind of like a
scrubbing over time, you can see how a VM
might have evolved. You can see who gained or
lost access to that VM. And so you can think of these
as the two building blocks to start getting an
understanding of what's going on in your organization. I won't delve too much
deeper here, other than to say that we'll be
going much deeper into this at tomorrow's session called
Best Practices for GCE Enterprise Deployments. Now, having released
both of these tools, what we see a lot of times
is that customers will go through and build
very similar solutions on top of these building blocks. They'll build solutions
to go and list out all of the assets in
their organization and they might also scan
the events and the assets in their organization for
things that they don't want to have happen, right? Scan for vulnerabilities,
scan to verify that the right set of firewall rules are
set up, et cetera. And so in response
to that, we wanted to deliver some of that
functionality out of the box. And so we recently released to
beta the Cloud Security Command Center. This is going to provide a
lot of that functionality out of the box. It's going to be a single
pane of glass that's going to let you understand
what resources are in your organization, let you
explore what resources are in your organization, and also,
it'll proactively be scanning for vulnerabilities and
proactively informing you of, for example, if you have
suspicious traffic going on, if you have VMs with insecure
firewalls and things like that. So based on the feedback we
got from what type of solutions our customers built, based
on our building blocks, we've built some of that
out of the box for you. And if you want to
go further if you want to modify that
or change something based on your
security posture, you can always go back to
your building blocks and build that out from there. Last section. I want to talk a little
bit about the tools that we provide you
as our customers to ensure your
privacy and to ensure that you're in full control
over your data in Google Cloud. So one thing we hear
a lot from customers, especially those in more
tightly regulated industries like banking, is that they want
full control over their data that they upload to the cloud. It might be very sensitive. They might be beholden to
regulatory needs to do so. And one way that this translates
is, most of the time, customers ask for more control over
how their data is encrypted in the cloud so that they
can manage that encryption and so that they could,
for example, take away the encryption keys protecting
their data so that no one has access to that data anymore. Not even us, Google. So, the first thing I want to say before I proceed is that by default all
data in Google Cloud is encrypted at rest
regardless of configuration. So even if you go in
and you don't configure any of this data, any
of this encryption, we'll still do it for
you automatically. But we do give you the
tools to go further. A best practice that's
quickly developing is that customers can use
another managed service that Google Cloud provides
called Google Key Management Service. This is a managed service that
allows you to manage keys. So, you can create keys, rotate
keys, delete them at any time. And then you can
apply these keys to your disks, to your
images and snapshots, and any other content
that you store in GCE. And we'll start using
that key that you've managed to go ahead and
start encrypting your data. I should also note that
this whole flow is supported with-- it's integrated with
IAM and audit logging, so all the tools that
we just talked about. So you can see who's
doing the encryption and where these keys are. And Cloud KMS has broad support for the set of security standards that we've been hearing are
required from our customers. So, what this looks like. You would start by creating
a key and a key ring in Cloud KMS. That's a very quick task. It should just take
you a few minutes. And you can specify what the rotation period of this key is, what encryption standard it uses, et cetera. And then when you go and
create a GCE disk or image or snapshot, you will just
point GCE to that particular key that you've just created. And just like that, we'll
start using that particular key to protect your data.
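A sketch of that flow in gcloud, with hypothetical names; the location, rotation schedule, and disk are all illustrative:

    # Create a key ring and a rotating key
    gcloud kms keyrings create my-keyring --location=us-central1
    gcloud kms keys create my-key --keyring=my-keyring \
        --location=us-central1 --purpose=encryption \
        --rotation-period=90d --next-rotation-time=2019-07-01T00:00:00Z

    # Point a new GCE disk at that key
    gcloud compute disks create my-disk --zone=us-central1-a \
        --kms-key=projects/my-project/locations/us-central1/keyRings/my-keyring/cryptoKeys/my-key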
And so what that means is that, as needed later, you can go revoke or
disable access to that key. This is what it looks
like both in the UI and in the G Cloud
command line tool. And from there, once you've
disabled access what you'll find is that GCE will no longer
have access to that information at all. So if you try to attach a disk
that's protected by the revoked key to a VM you'll see
that that action fails and that will fail until you
restore access to the key. And so in this way
you have full control over who has access to
your data on Google Cloud. Another question that we get
from customers is, well hey, you're Google, how do I
know that you're not using my data for other means, right? How do I know you're not
using it to train some-- yeah, you know, to advertise
or something like that. Now, here I want to say
again that at Google Cloud, we do not access your
data for any other reason than those necessary to fulfill
our contractual obligations to you. And in fact, when we
looked at the data internally, over 99% of these data accesses are to fulfill support requests. So it's where customers are actually asking us to take these actions. But you don't have to
take our word for it. We have a feature called
Access Transparency logs. They're another type of the audit logs that we just talked about. And what they do
is they'll tell you in near-real-time and
without cost any time Google accesses your
data for any reason. All right. The one caveat
here is that these are available for only a certain
subset of our support packages. And you can find out
more details online. But they're very easy to enable. So as long as you have the right
permissions, that's step one-- we have a particular role
for this Access Transparency admin-- you can just go into a
particular project folder organization. And you'll find a little
toggle in IAM admin settings. And from there you
can go and you'll just see that however you're
getting your logs, maybe it's through the
Stackdriver Logging Portal, you'll start seeing these
Access Transparency logs. So, what does that look like? Here's an example of an
Access Transparency log. It will be generated
once for every action that we take on your data. And it will tell
you-- in yellow, you can see it tells
you the reason, so why did we access your data? Here in particular,
we're doing this because of a customer-initiated
support request. And we'll attach the
case number for you. And then in green you can
see what exactly Google did. So you can see the resource
that we were operating on, the API calls that we
made when we did this, and a number of other
details relevant to the case. And so, you don't have to take
our word for what I just said. You can actually go in and verify it yourselves. And so with that, that brings
us to the end of our talk. We started by laying
down some principles, and hopefully we've
been able to build up to a set of best practices
to go apply those principles. Thank you very much. [APPLAUSE]