[MUSIC PLAYING] SIRUI SUN: Welcome to Best Practices for Privacy and Security in GCE. My name is Sirui. I'm a product manager on the Compute Engine team. CACHE MCWHERTER: I'm Cache. I'm an IAM engineer, and I'm ready for a 400-level talk on the theory of securitrons. SIRUI SUN: Well,
Cache, you're going to have to wait just a little
bit because this is a 200 level talk. But we still have a lot of
great content for you today. So I wanted to get started. So over the past years,
we've learned a lot here at Google Cloud in securing
both our internal cloud and the clouds
for our customers. And the goal of
the next 50 minutes is for us to share
some of those learnings with you, to share some
of the best practices we've picked up so
that you don't have to make the same
mistakes that we did, and that you can take
off running when it comes to securing your cloud. So often when we
talk to customers, they'll ask us a question that
sounds something like this. They'll say, hey, how do I
check off that security box? How do I make sure my cloud is
completely, completely secure? And to that we say, well,
that's a good question. And Cache and I got in a
room we thought about it for a little bit, and it turns
out it's the craziest thing, you can actually
secure your cloud with just this one weird trick. It's pretty crazy. Are you ready for it? Just kidding,
that's not a thing. It'd be crazy. Wouldn't that be crazy, though? Like this whole talk would
be over in like two minutes. In reality what we
found over the past years is that
security is a lot more of a dynamic contextual thing. So if you think
about it in terms of securing a physical space
like securing Moscone Center, the correct level of security
or the correct security posture for Moscone Center is going to
depend on a number of factors. It's going to depend
on what part of Moscone you're trying to secure. Are you trying to secure
the front sidewalk or are you trying to
secure the generator room? It also is going to depend on what's happening at Moscone Center. Is this an open house that anyone can attend, or is the president in town giving a speech? And even more, if you are in charge of the team that's securing Moscone Center, is it your responsibility to just secure this building, or is it to secure the entire complex, or is it to secure a nationwide chain of convention centers? Each of those factors enters into the calculation, and each is going to change how you approach security. And so when you look at
security posture, first of all, it lies on a spectrum. Regardless of what
you're trying to secure, it lies on a spectrum,
and we should recognize that it can be too loose. No matter what you're trying
to secure it can be too loose. It can be the Wild West. Anyone could do anything
and for sure bad things are going to happen there. But it can also be too strict. You can have too many rules,
too many things locked down. And that's bad too, because that
will lead to loss of agility. It might lead to an
antagonistic relationship between your security
teams and everyone else, and a number of other things. And so to start the talk I want
to set out some principles to help you try to understand
and navigate the spectrum and get to the right spot there. So the first principle
I want to talk about is centralized administration. And this is the idea that there should be a central security team that has full awareness of everything that's going on, or at least the ability to be fully aware of what's going on, and that also has the responsibility and authority to govern everything that's going on. Without that centralized governance -- if you don't have the ability to know what all is going on or the authority to enforce what's going on -- you're almost certainly too loose. But what we found is if
you just have centralized administration, you
may end up falling on the opposite end
of the spectrum, especially in a
larger organization. If you have a team of
hundreds or thousands and the responsibility for
securing and tracking down everything that's
happening is falling on the shoulders
of very few people, you may end up on the
opposite side of the spectrum where you're forced to
lock down everything or you're forced to
implement policies that you can't keep track
of because there's just so much going on. And so to that we introduce
a second principle called delegation. And this is the idea
that if you have a centralized administrative team, they can still delegate administrative
tasks to a separate sub team or some other division
or some other set of folks, and that they can move much
quicker having delegated authority. And what we found is that
between those two principles you're able to get
closer to the center and get into what we call
the Goldilocks zone-- It's a technical term. --where you basically
have the right balance between too loose and too strict. And notice what we've done
here is we've called it a zone. This is not a Goldilocks point,
and that's in recognition that the right place to
pick your security posture is going to be very
contextually relevant. It's going to depend on
the context for your cloud just as it depended on the
context for Moscone Center. And so what we're going to do
within this presentation is we're going to start with the
building blocks for security, start with some of
the tools there, and we're going to build our
way up to some best practices that we've learned. But by doing so we'll allow
you to kind of pick up the best practices,
some of which you should do all the time regardless, and some of which will depend on what your security context is and
what's the right security posture for your organization. OK. So the first building
block, starting simple, are the identities. This is the set of
users and groups and services that you have
that Google Cloud knows about. Google Cloud knows about them
because Google Cloud Identity, which is our identity
service, knows about them. The quick thing I'll say
here is that we understand that some of you may have your
identities mastered elsewhere. In fact that might
be most of you. A lot of organizations
that we've talked to, they master their identities
in Active Directory. And I just want to call
out that it is possible. We do support you
syncing your identities from wherever you master them
into Google Cloud Identity so that you can
keep mastering them wherever they currently exist. Next, I want to talk
about resources. So these are the things
that you're ultimately trying to protect. When it comes to
Compute Engine, these might be your VMs and your
disks and your subnets and your images. But the resource concept also applies to any non-Compute Engine resources inside GCP as well. Resources, of course, roll up
to one and only one project. These are our base
level of grouping. So all resources have
to belong to one project and projects also encapsulate
some other properties like the billing account,
quotas, permissions, and things like that. The next level up
is the organization. So this is an optional concept. You don't have to
have an organization, but we highly, highly
encourage it for reasons that I'll get into for a second. But these will act as the root
node for all GCP resources. So if you set up an
organization everything should roll up into
the organization. And finally, we have folders. These are an additional
optional grouping mechanism. They can contain other
projects or folders, and we found that this is a
great way to organize and make sense of your policies. So we're going to dive a little
bit more abstract in a sec, so bear with me,
but we'll then make it more concrete with some
examples and some best practices to put them together. So once you have the
resource hierarchy in place you can apply policies to
this resource hierarchy. Policies can be applied anywhere
in the resource hierarchy, and once they're
applied, a policy is then passed from the
parent to the child. So for example, if I set a policy on the department X folder, it'll pass down to all the sub folders under department X, and from there it'll get passed down into the projects and into the resources. There we go. There are two types of policies. The first one is
called IAM policies. So these are in the business
of granting permissions to particular identities
for particular resources. The second type is called
organization policies. These are in the business
of creating constraints. Let's talk about
IAM policies first. IAM policies control who is
allowed to do what on which resources. So it takes a set of identities,
like a user or a group or a service account, and then
it takes a set of resources, like a folder or a project
or individual resources, and it says these identities
will have these permissions on these resources. And it does that through
a particular role. So in this particular example, we might give, for example, Alice the compute instance admin role on an individual VM. And what that'll do is it'll give Alice permission to take instance administration actions on that VM that's already created, like deleting it, starting it, or stopping it.
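As a rough sketch, the same grant can be made from the command line too. The VM name instance-1000, the zone, and the user here are hypothetical:

    gcloud compute instances add-iam-policy-binding instance-1000 \
        --zone=us-central1-a \
        --member="user:alice@example.com" \
        --role="roles/compute.instanceAdmin.v1"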
What I find is that it helps make things a little more concrete if we show what this looks like in the UI. You can of course set
permissions in G Cloud and the APIs as well. But for the purposes
of this presentation, we'll go in the UI. So here I am in the instances
list in Compute Engine. I can select a particular VM. Here I've selected instance-1000, and now that I've specified the resource I just
need to say who can do what on that resource. And so if I click the
Add Members button over in the top right there, I
can answer those two questions. So it'll ask me who do you
want to have this policy, and also what role do you
want to apply to this user? So in this particular
case, I'm going to give Alice this
particular role, and the role I'm going to give
is the instance administrator role. So that's just walking through
what the previous example concretely looked like. Next, the second type
of policy is called the organizational policy. So these are in the business
of placing restrictions on your resources. And some examples
of that are you might want to restrict
the specific types of APIs or a specific set
of services that can be used in a particular
folder or a project. You might want to restrict a set
of VMs that have external IPs, or the third example
there is you might want to restrict a
set of users that can be added to IAM policies. Concretely what
this looks like is if you go into a project or
a folder or an organization and you go into the IAM
section and then the org policy subsection, you can see the full
list of organizational policies that we support. And the important thing here
is we offer a lot of them. We're adding to that
list all the time. And so one thing we recommend
is that when you're starting your cloud deployment, or even if you've already gotten going, if you haven't looked through all of these, just take a look. It's kind of like flipping
through your iPhone settings when you first set it up. Here's a concrete
example of that. So suppose I'm an administrator
and I want to do my job well, and part of that is
preventing data exfiltration. So one org policy
that we find is very effective in helping to do
that is the domain restricted sharing org policy. What I can use this for is to dictate that my developers are not able to share resources with users outside of my organization unless I give them explicit access to do so.
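As a sketch, enforcing that from the command line might look like this, assuming a hypothetical organization ID 123456789 and a hypothetical G Suite customer ID C0abc123 (the constraint takes customer IDs, not domain names):

    gcloud resource-manager org-policies allow \
        constraints/iam.allowedPolicyMemberDomains C0abc123 \
        --organization=123456789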
Similarly, another vector for data exfiltration might be external
attacks that go and look for VMs in my organization
with external IP addresses. And so another organization
policy that I can set is one that restricts
the set of VM instances in my organization with
external IP addresses. And so I will have knowledge of all of the VMs in my organization that have external IPs. OK. So let's put this all together. We're back to the hierarchy. Resources inherit the
policies that we just talked about from their parent. And so we can see
how we can start to apply some of the principles
that we talked about before. So the first principle
I talked about was centralized administration. You need a way for some central administrative team to understand what all is going on and to have some broad authority over that. So what you can
do, for example, is you can have an IAM policy
which gives the compute viewer role-- this is view
only access to compute resources. --to your
organizational admin group, and then once you apply that
policy to the organization node you'll see that it
flows down to all the folders and projects and resources
within that node automatically. And this will apply to
all the present resources in your organization,
but it'll also apply to all the future
resources in your organization as well. And so you can start to see how this gives you an easier time in doing centralized administration, because you can imagine that if you didn't have this organization node, you might be chasing after thousands of projects that get spun up in your organization.
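As a minimal sketch, that org-wide grant might look like this in gcloud, assuming a hypothetical organization ID 123456789 and an admin group gcp-org-admins@example.com:

    gcloud organizations add-iam-policy-binding 123456789 \
        --member="group:gcp-org-admins@example.com" \
        --role="roles/compute.viewer"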
Secondly, we can achieve the second principle as well, which is that of delegation. So if you imagine you
have team A over here. Team A has an SRE team and as
a centralized administrative group you want to delegate
instance administration actions to that particular team so you
can set the policy on team A's folder to give instance
admin role to the entire team A SRE group, and that
will also propagate down to all the projects and
resources under team A. And so now you've effectively delegated permissions for that particular team to that particular group.
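A sketch of that delegation, assuming a hypothetical folder ID 111111111 for team A's folder and a hypothetical group team-a-sre@example.com:

    gcloud resource-manager folders add-iam-policy-binding 111111111 \
        --member="group:team-a-sre@example.com" \
        --role="roles/compute.instanceAdmin.v1"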
The same thing applies with org policies. So suppose I have a folder that's supposed to categorize a set of workloads that are internal only. I don't want these workloads to ever have an external IP address, which I'm showing here as the internal folder. I can set an org policy on that particular folder, which restricts those VMs from having external IP addresses. And in doing so I'll find that the org policy automatically flows down to all the resources and projects in that particular folder.
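A rough sketch of that folder-level constraint, assuming a hypothetical folder ID 222222222 for the internal folder. Because vmExternalIpAccess is a list constraint, a deny-all is easiest to express as a policy file:

    # policy.yaml
    constraint: constraints/compute.vmExternalIpAccess
    listPolicy:
      allValues: DENY

    gcloud resource-manager org-policies set-policy policy.yaml \
        --folder=222222222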
And so here again I'll have a much easier time governing my organization. OK. So that was some of
the building blocks. I'm going to turn
it over to Cache, who's going to start building
them up into best practices. CACHE MCWHERTER: Thanks. Now, as you might expect we
talk to a lot of customers. Day in and day out
this is all we do. We try to help them understand
where they're coming from and how to apply the security
tools that we're developing to best solve their
security problems and manage their cloud. And one of the common
problems that we find customers have is that they don't know how to
most effectively apply the resource
hierarchy to organize their resources in our cloud. We find that customers will
sometimes not really know what to do and they'll end up
with a sprawl of hundreds
projects, all of which have different
policies and settings and so forth where
they have to manage each of these individually. Likewise, other companies think
that maybe the best thing to do is to take their org chart and
sort of push it into the folder structure and use that. We find that the best
hierarchies don't do either of those, and
instead they sort of build off of two concepts. The first concept is delegating
authority as Sirui mentioned, and the second is grouping
resources of like risk together where you can apply
similar security policies to the same sets of resources. To show you what that
looks like, here's a simple organization. A simple company that's decided
to build a couple applications inside of Google
Cloud, app1 and app2. In order to provide delegated
authority they created folders to represent each of
their applications and underneath
the applications-- and this is a representation
of delegating authority to those teams. --they granted the service teams that manage app1 and app2 access to those folders. Below the folders there
are separate projects, each of them indicating whether
the resources within are pre-production development
environment resources or whether they're production
resources with critical data. And this is, in a sense, grouping resources in accordance with their security profile and security risk. The pre-production environments don't need to be locked down super hard. All the developers
can have access to deploy code,
access data, whatever they need to debug and
develop their system. The production environment
on the other hand is going to be locked
down much more tightly. Only the on-callers are going to have access to fiddle with these projects, and their production systems are going to be responsible for pushing code and data to these environments. Once an application's
scale is large enough you can imagine continuing
to apply the hierarchy to break down the application
and manage the complexity therein. Here, for instance, application two has broken itself into two sub-components, a front end and a back end, and the administrators of application two were responsible for delegating authority to the front end team and the back end team to manage those
projects independently. And they'll set those things up without having to come back to the administrator to configure it.
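As a quick sketch, carving out that kind of hierarchy from the command line might look like this; the organization ID 123456789 and the folder IDs are hypothetical:

    gcloud resource-manager folders create \
        --display-name="app2" --organization=123456789

    # Nest a sub-folder under app2's folder ID (hypothetical 444444444)
    gcloud resource-manager folders create \
        --display-name="frontend" --folder=444444444

    # Create a production project under the frontend folder (hypothetical 555555555)
    gcloud projects create app2-frontend-prod --folder=555555555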
Now, to look at a more complicated example using both IAM policy and
organization policy, you can do some
pretty cool things. So here we have a
large organization, fairly complicated. There are numerous teams, numerous organizations, and numerous applications, with hundreds of projects, potentially thousands. As a security administrator
this makes me nervous. They're all deploying
virtual machines. I have no idea what
they're deploying. I want to make sure that
my company is making sure that the only thing running inside those VMs is trusted code. I need to put down an
operating system standard. I need to make sure that it's
running all the daemons, all the logging, all the security
protection infrastructure that I need into those
operating systems. I need to make sure
that all the developers inside my organization
are using those images. So what I'm going to do is I'm
going to hire an image team. I probably have one already. I'm going to pull off a
section of my policy hierarchy and I'm going to grant
access to my image team to manage those images. They're going to create some
images, test them, vet them, and then they're going to
publish them to this project and share those resources with
the rest of my organization. Now I can apply an
organizational policy called Trusted Image Projects to
the rest of my organization to lock down and make sure that the only boot disks that can be used in my compute VMs are created and pushed by the Golden Images Project. And this allows me to have faith and confidence in the software that's running in my cloud. I'll wait for a photo.
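A sketch of that constraint in gcloud, assuming a hypothetical golden images project named golden-images and the hypothetical organization ID 123456789:

    gcloud resource-manager org-policies allow \
        constraints/compute.trustedImageProjects projects/golden-images \
        --organization=123456789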
So now that we have a policy hierarchy, the next big question that
a lot of customers have is, how do I figure
out how to assign the roles to my employees and
my services, and grant access? Now, this is a hard problem. This is fundamentally
the most difficult part of managing access control
and security in our system. You need to essentially
have someone who's capable of making
sense of the system and granting access that's
appropriate to the business requirements of the
service or the user that's getting such access. I can't just make
that problem go away. I'm sorry. But we can help
you a little bit. We can help you build a set of
standards for deploying access grants that make sense for you. So as you come to
Google Cloud, you've probably encountered what we
call the primitive roles called owner, editor, and viewer. And these come out of the box. They're extremely
broad grants of access. We don't recommend
using them for anything in a production system. They're only there
to get you started and to help you kick
the tires essentially. And the reason that
they're not recommended is because they
don't satisfy what we call the principle
of least privilege, wherein you grant
access that's commensurate with the minimum
requirements for each employee and service in your system. Now, when you're
designing roles it's important to remember again that
you don't want to specifically identify every single permission
and every single object that the employee is
going to have to access. In general we find there's
like a Goldilocks zone where you are going to grant
a reasonable level of access that contains all of the sets
of things that are in line with the business
requirements of the user, of the engineer or the
service that you're deploying. You don't want to be
too strict and you don't want to be too loose. Now, nine out of 10
security engineers love our pre-defined roles
to achieve this purpose. They're designed
by Google engineers and PMs based on feedback we've gotten from all of our customers. They're representative of the kinds of roles and segregations of access and responsibility that our customers have been asking for and applying
to their own resources. And so, for instance,
here you can see compute has provided
you with a number of roles such as instance admin, network
admin, and load balancer admin, because these are roles
that are naturally managed by separate
individuals in a large number of organizations. The network administrator is
responsible for making sure that the firewalls are locked
down, and so forth and so on. Whereas the instance admin is
responsible for making sure that the Hadoop cluster is
up and running, for instance. Now, that final
security engineer who wasn't quite happy
with the predefined roles is usually satisfied and made
happy with our custom roles. They're used in
special circumstances where the predefined roles
don't line up exactly with your business requirements. For instance, we find
that a lot of customers will combine a number
of predefined roles into a single
role, a custom role so that they can
grant it more easily and make sure that
all sets of accesses are present on any resource
which they grant this access. Another common pattern is
that a security engineer sees a predefined role that exactly
matches their requirements, but there's that
one weird permission inside that they really don't want, so they want to take that away from their engineers.
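As a sketch, a custom role might be created from the command line like this; the role ID, title, and permission list are purely illustrative:

    gcloud iam roles create securityAuditor \
        --organization=123456789 \
        --title="Security Auditor" \
        --permissions=compute.instances.get,compute.instances.list,logging.logEntries.list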
Now, when you're granting access and assigning roles to your
employees and services it's quite easy to figure out
when you've gotten it wrong and you've under provisioned
the access to your resources. The engineers and
the services are going to complain in
one way or another, and you're going to get
a ticket or something to fix the assignment of
authority, fix the grants. But if you've over provisioned
you don't get any notice. No one comes along and tells
you I have too much power. I'm so sorry. Could you take some away? I'm happy to let you know
that we're rolling out tools that are going to help you
address these kinds of problems. One is, for instance, called
the IAM role recommender which looks at the access patterns
of users and services in your system, in your cloud,
and based on historical traffic gives you a prediction of what roles you might want to grant the user instead. And this allows you to make access control more fine-grained and apply the principle of least privilege in your cloud at scale. So now another kind of
problem that we see-- and it happens to everyone. It happens to us. It happens to me. It happens to our customers. --they often will get into a
situation in which they'll find a policy that looks like this. And it's hard to make sense
of why this policy looks the way it looks. Here you can see a number of
engineers have different levels of access to this project. I have no idea why any
of these roles exist. Some companies have
constructed spreadsheets to try to keep track of
the historical reasons about why every one of these
grants have been created. That's one strategy. The strategy we
recommend instead is to use what we
call a security-group-based model, which sort of models the RBAC philosophy. Instead of granting
roles to users individually in an ad hoc
way, what we recommend is you create security
groups that represent the conceptual roles
and responsibilities each engineer might
have to a resource. Here, for instance,
team grilled cheese has been broken down into
data scientists, debuggers, and on-callers. Now I can grant access to that
project to each of these roles and responsibilities
based on what they should have access to. Data scientists get to run
through the prediction engine, debuggers get to
run some queries and access some other
data, and on-callers get to do everything they want. Yeah.
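A sketch of those group-based grants, with hypothetical group and project names; the roles are illustrative stand-ins for whatever each responsibility actually needs:

    gcloud projects add-iam-policy-binding grilled-cheese-prod \
        --member="group:gc-data-scientists@example.com" \
        --role="roles/automl.predictor"

    gcloud projects add-iam-policy-binding grilled-cheese-prod \
        --member="group:gc-oncallers@example.com" \
        --role="roles/compute.instanceAdmin.v1"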
And then I can independently manage the sets of users
and services that are members of the
security groups, and then I can audit each of
these categories independently. It's usually much easier
to understand whether Cache belongs to team grilled cheese
than it is to decide whether he should have AutoML predictor
on some random project in your organization. Before I go on from role
construction and role assignment I'd like to talk
about one other pattern that we've been seeing
a lot of benefits from. That's break glass access. So one of the
common failure modes that organizations
both inside of Google and outside of Google
encounter with Google Cloud is that senior engineers
and on-call engineers end up getting
lots of privileges on production resources. Often they're an owner or often
they have administrator rights and they carry this as what
we call ambient authority. What this means is that it's
quite easy for an on-caller
wrong project, for instance, and do something destructive
to that project thinking they might have been
working on another project. This is not a great
situation, especially when your business is on the line. What we recommend instead
is creating some sort of break glass pattern
for this kind of privileged access. One way you can do this is
with predefined IAM roles such as a project IAM Admin. Project IAM Admin does
not give you the ability to do anything on a project
except for set policy and change the policy. So instead of wandering
over to the Cloud console and deleting some VMs, they have
to contemplatively first grant themselves access to delete a
VM and then go and delete a VM.
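A minimal sketch of that break glass escalation, with a hypothetical on-caller and project; the on-caller holds only Project IAM Admin ambiently and grants themselves the destructive role just in time:

    # Break the glass: grant yourself instance admin
    gcloud projects add-iam-policy-binding prod-project \
        --member="user:oncaller@example.com" \
        --role="roles/compute.instanceAdmin.v1"

    # Repair the glass once the incident is resolved
    gcloud projects remove-iam-policy-binding prod-project \
        --member="user:oncaller@example.com" \
        --role="roles/compute.instanceAdmin.v1"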
We find that the break glass pattern is actually enhanced and made
a lot more secure by application of automation. And you can use automation such
as Jenkins, Terraform, Cloud Functions, anything
that you have in house to facilitate both the
breaking of the glass, possibly with a ticketing
system or even an audit and review process to make
sure that the escalation of privileges is appropriate,
and then also to repair the glass, to take away the
access when the incidents are resolved. Now that we've dealt
with roles let's deal with securely accessing
my cloud, my VMs. So securely accessing
data to and from my VMs involves three
categories of access. The first thing
I need to do is I need to be able to SSH
into and connect to my VMs and figure out what's
going on sometimes. Second, I need to be able to
get securely from my VMs-- workloads in my VMs
need to securely get to Google APIs to access
storage and workflow engines and all the other
products that we offer. And finally, your VMs
will need to phone home and make calls to your on
prem services or to other VMs or to VMs running in other
clouds, and do so securely. I'm going to go through
each of those in turn here. Oh, wait. Did you feel that, Sirui? It was like a disturbance, as
if a million credentials were just created and
suddenly silenced because they were published to a
GitHub repository in the cloud. Breaks my heart. This is what we
want you to avoid. In particular,
unmanaged credentials are one of the greatest
security threats that we've seen in almost
all cloud activity. When we work with our
customers we strongly encourage you to use
fully managed credentials in every circumstance that you
can, where the platform takes care of rotating credentials,
keeping them in escrow and protecting those
credentials from loss or leakage or publishing to a
repository in the sky. And to do this, first, let's
look at the SSH'ing to my VM scenario. I just deployed a
10,000 VM cluster. It was doing something cool,
trust me, but five of those VMs stopped working and
I need my on-callers to be able to get into
those VMs and figure out what was going wrong. So some customers will
go and push some SSH keys around into those VMs, but
the downside of that approach is that when those
keys are lost or leaked or an employee who had access
to a key leaves your company, you're now stuck with a leaked key that an employee who left your company might still be able to use to SSH into your VMs. Instead, we'd like to recommend
that our customers use what we call OS Login. It's a product in
the Compute system, which you can enable when you create your VMs by setting metadata on your virtual machine. You enable OS Login
and enable OS Login two-factor authentication,
which requires that your engineers provide
a security key whenever they SSH into a VM. And then you can control
access to your virtual machines using IAM permissions, using the
OS Login or the OS Admin Login roles. And the difference is whether you let them log in as themselves or as root users.
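Putting that together as a sketch, with a hypothetical VM and user:

    # Enable OS Login and two-factor via instance metadata
    gcloud compute instances add-metadata my-vm --zone=us-central1-a \
        --metadata enable-oslogin=TRUE,enable-oslogin-2fa=TRUE

    # Grant SSH access through IAM instead of managing keys
    gcloud compute instances add-iam-policy-binding my-vm \
        --zone=us-central1-a \
        --member="user:oncaller@example.com" \
        --role="roles/compute.osLogin"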
And if you put this all together, you have complete assurance
that all access to that VM is tied to the user lifecycle. When my on-caller leaves my company, for whatever reason, the access to the key that they have doesn't matter anymore. They can take that key
and publish it to GitHub and it still doesn't give
them access to my VMs. Next, we want to talk about
authenticating my VM securely to my Google APIs and
my Google resources. To achieve this
task, what we do is what we call binding a service
account to a virtual machine. The first step is to
create a service identity in the Google Cloud console. The second step is to
grant that service identity access to all the things that I
think my workload needs access to. Third and finally, when I'm
creating the virtual machine I select which
service account I want that virtual machine to run as. Every access that's made with
client libraries running inside of that virtual
machine authenticates to the cloud as the service account identity. The long-term keys are held in escrow, and the only thing that the workload in the virtual machine gets is a short-term access token, which can be applied to API calls.
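A sketch of those three steps in gcloud, with hypothetical names throughout; the storage role is just an example of a workload permission:

    # 1. Create the service identity
    gcloud iam service-accounts create my-workload

    # 2. Grant it what the workload needs
    gcloud projects add-iam-policy-binding my-project \
        --member="serviceAccount:my-workload@my-project.iam.gserviceaccount.com" \
        --role="roles/storage.objectViewer"

    # 3. Run the VM as that service account
    gcloud compute instances create my-vm --zone=us-central1-a \
        --service-account="my-workload@my-project.iam.gserviceaccount.com" \
        --scopes=cloud-platform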
Now, one of the common difficulties with using this feature is
something called access scopes. It's something that
you're required to select when using the feature. It is a great source of
confusion to customers, we find. The default is set to allow
default access, which isn't always the easiest to use. We recommend instead
using "allow full access to cloud APIs" and what we
call the Cloud Platform scope. This ensures that you have
easy access to call APIs. It's a tool to effectively
make client library integration simpler. And then use IAM
permissions and IAM roles granted to the service account
as your security feature. Now, talking about VMs
that need to phone home. A lot of times, you're
going to be running only part of your workload
on Compute and some of it's going to be running elsewhere. To address that
particular problem, we use the same approach
that we used for calling into Google APIs previously. You'd create a service account,
you attach it to the VM when you've constructed it. The code in the VM can then
get an OIDC token, which is an industry-standard
compliant OpenID Connect token that can be
verified by verifying the signature in the token with
the OIDC endpoint for Google. And the token contains
all the information that you could
ever possibly need to authorize access to this VM. It contains the
service identity, it contains the project ID, the
zone, the VM name, the creation date, everything you
possibly ever need. Again, it's
standards-compliant so it works with all open source
OIDC-compliant libraries. It gives you strong
proof of identity, and again, you don't have
to manage credentials. So everyone's safe.
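As a sketch, code in the VM can fetch that token from the metadata server; the audience value is a hypothetical example:

    # Run inside the VM; returns a signed OpenID Connect identity token
    curl -s -H "Metadata-Flavor: Google" \
        "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience=https://my-service.example.com&format=full"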
Next, I'd like to talk about other kinds of threats. Other kinds of
organizational threats to your virtual machine fleet. In particular,
rootkits and bootkits. When you have people
who can SSH in or malicious software
that can hack its way into the
front door, you always have a risk that the software
that you're running in your VM fleet is not the
software you think it is unless you have a strategy
to make sure that it is. And the approach to solve
that-- what you can use is a product we call Shielded
VMs, which just launched to GA. It allows for a
trusted chain of boot from the virtual TPM of the
VM through the firmware, through the operating
system, to make sure that all the software running
in the operating system has been vetted and vouched for. If you have a rootkit that
installs random malware into a kernel module in
your operating system, we'll detect it and we'll
notify you and tell you which VMs have
been hacked so you can take remedial action. To use this feature, you
can go to the boot disk operating system
images on Compute and you can filter
by the Shielded VM provided operating
system images. And then when you
create your VM, you turn on the
Shielded VM option.
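A sketch of that in gcloud, with a hypothetical VM name and a Shielded-VM-capable image family:

    gcloud compute instances create my-shielded-vm --zone=us-central1-a \
        --image-family=ubuntu-1804-lts --image-project=ubuntu-os-cloud \
        --shielded-secure-boot --shielded-vtpm \
        --shielded-integrity-monitoring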
And you can notice that this is an option, and so you might worry that maybe your developers won't create Shielded VMs. And that's why we recommend that you combine it with the trusted image pattern that we showed you before and organizational policies. You can apply the same pattern
that we showed you before, where you can have your
operating system images being vetted and curated by your
images team, published to your organization. And then you can apply
an organizational policy to the rest of your
organization to both require that all VMs be created with
the Shielded VM feature. And also, to ensure that
the only ones that are used are the ones that you've
vetted and approved. And again, to turn
that on you just go to the organization policy
control panel in your Cloud console and turn on Shielded
VMs and your trusted image projects settings. Should I go back? Got to take a picture. [INAUDIBLE] OK.
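The same settings can be applied from the command line; a sketch, with the hypothetical organization ID from before (requireShieldedVm is a boolean constraint, so it's a simple enable-enforce):

    gcloud resource-manager org-policies enable-enforce \
        compute.requireShieldedVm --organization=123456789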
Finally, I'd like to talk about setting up secure-by-default
behaviors for your Cloud. Like I said before, we are
sometimes in a tough spot because we're trying
to sell a Cloud. And to sell a Cloud, we need
to make it easy to use and also secure at the same time. To achieve this,
oftentimes the defaults will come out of the box
with things easy to use. You'll get the owner permissions
when you create a project so you don't have to worry about
figuring out the permission model just to deploy a function. We do recommend that you
secure those projects after you've created them,
especially when they're production resources. Now, to achieve
this, what we find is a factory pattern
works wonders. And every one of our customers
that cares about security seems to apply this mechanism
in one way or another. The basic idea is, again, to
set up a workflow using Jenkins or Terraform or Cloud Functions or whatever it is, where your
employees, your developers, can click on a button
hosted by you that creates the resource in GCP. And then sets up all the
configuration exactly as you need it. So for instance,
to achieve this, you can first go to your organization's IAM policy. There, you'll find
a policy grant that enables everyone
in your domain to be able to create a project. You can disable that. Take that permission away. Instead, you can grant that
permission to your workflow engines that are allowed to
create vetted, secure projects. Again, you can use
Jenkins and Terraform. Anything you want.
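As a sketch of the factory's job, here's roughly what that automation might run for each new project; every name and ID here is hypothetical:

    # Create the project under a vetted folder and attach billing
    gcloud projects create team-x-prod --folder=444444444
    gcloud beta billing projects link team-x-prod \
        --billing-account=000000-AAAAAA-BBBBBB

    # Attach it to the organization's secured shared VPC host project
    gcloud compute shared-vpc associated-projects add team-x-prod \
        --host-project=shared-vpc-host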
And for us, some starting points for thinking about how
make your default projects-- you could, for instance, delete
the default Compute service account. You can neuter its
permissions, which start out with fairly
broad editor permissions. You can delete the VPC network
that is created on the project and instead connect that
project to a secured shared VPC that you use for your
entire organization or for sections of
your organization. You can attach billing
accounts to make sure that all the cost accounting goes to the right billing account. You can set up audit
logs and billing exports to make sure that you have
full visibility as to what's going on in those projects
when you unleash it to your developers. And again, this gives you
centralized governance and control and guardrails for your developers. And it removes the friction of getting your developers easy, rapid access to an environment to develop their code, or to create a production environment. And with that, I'll
or to create a production environment. And with that, I'll
hand it over to Sirui to tell you more about
keeping track of what your developers are doing. SIRUI SUN: Thanks, Cache. So, another type of question
we get from customers a lot is, how do I more easily start
to understand everything that's going on in my organization? And this is a key part of
centralized administration because in the
beginning, we talked about how you can
extend permissions to your organization,
to your org admin team, for example, to view everything. But that's not the
same thing as being able to look over that
content in an easy way to analyze what all is going on. And so I want to start by
talking about some of the tools that we offer to start giving
you that kind of 360 degree view of your organization. So, the first tool I
want to talk about-- this has been
around for a while-- are audit logs. And I don't have time to get too
deep into the nuances of audit logs, but I will talk
about a few types that are really helpful for
this particular use case. So, the first type
I want to talk about are admin activity logs. So, these logs are
generated whenever an administrative
action is taken on both the GCE resource but
also most other types of GCP resources. So for example, if I
had a user named Alice and she created a VM in
a particular project, we'll create an admin activity
log automatically for you. And that will include
what type of VM it was, what time it
happened, and other details that give you more
visibility into what exactly was happening. These audit logs are enabled
by default in all projects. In addition, they are
generated in real time. So they really give
you those up-to-date, up-to-the-second details about what's going on in your organization.
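As a sketch, those admin activity entries can be pulled from the command line too; the project and filter here are illustrative:

    gcloud logging read \
        'logName:"cloudaudit.googleapis.com%2Factivity" AND protoPayload.methodName:"compute.instances.insert"' \
        --project=my-project --limit=5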
A second type of audit log that's important to know at this point
is the data access audit log. So, these talk about data
reads and data accesses. These tend to be a little bit
more verbose so they're not turned on by
default. You actually have to go and turn
them on yourself but when you do so,
that information will start flowing in. And so in the use
cases where you're really interested in knowing
even about the reads, we highly suggest that
you turn these on. So, with audit logs you get kind
of this moment-by-moment set of events for your
entire organization. And we release these and
we go to our customers and they say,
well, this is good, but it's difficult sometimes
to understand the precise state of your world at a
particular moment in time if I just have a set
of events leading up to that particular
moment, right? And it's possible to
construct that from audit logs but it gets a little tricky. And so in response
to that feedback, we've actually recently
released a second tool that we call the
Cloud Asset Inventory. And this tool answers
the question of, at a particular moment in time,
what resources and policies were there in my organization,
in a particular folder, or in a particular project? This tool, this particular
API, will actually look back in the last five
weeks so it has knowledge of the last five weeks. And so you can ask a
question like, hey, last Saturday at 2:00 AM when
I had a security incident or something like that, what
were the VMs that were running in my organization? And what were the policies
applied on those VMs? And so you can start to do those
forensics-type investigations or you can start--
you could imagine you could use it for any
other number of use cases. And it's great for
printing out that state. Because it has knowledge of your
environment for the last five weeks, it can also
tell you about how a particular resource changed
over the past five weeks. So, kind of like a
scrubbing over time, you can see how a VM
might have evolved. You can see who gained or
lost access to that VM. And so you can think of these
as the two building blocks to start getting an
understanding of what's going on in your organization. I won't delve too much
deeper here, other than to say that we'll be
going much deeper into this at tomorrow's session called
Best Practices for GCE Enterprise Deployments. Now, having released
both of these tools, what we see a lot of times
is that customers will go through and build
very similar solutions on top of these building blocks. They'll build solutions
to go and list out all of the assets in
their organization and they might also scan
the events and the assets in their organization for
things that they don't want to have happen, right? Scan for vulnerabilities,
scan to verify that the right set of firewall rules are
set up, et cetera. And so in response
to that, we wanted to deliver some of that
functionality out of the box. And so we recently released to
beta the Cloud Security Command Center. This is going to provide a
lot of that functionality out of the box. It's going to be a single
pane of glass that's going to let you understand
what resources are in your organization, let you
explore what resources are in your organization, and also,
it'll proactively be scanning for vulnerabilities and
proactively informing you of, for example, if you have
suspicious traffic going on, if you have VMs with insecure
firewalls and things like that. So based on the feedback we
got from what type of solutions our customers built, based
on our building blocks, we've built some of that
out of the box for you. And if you want to
go further if you want to modify that
or change something based on your
security posture, you can always go back to
your building blocks and build that out from there. Last section. I want to talk a little
bit about the tools that we provide you
as our customers to ensure your
privacy and to ensure that you're in full control
over your data in Google Cloud. So one thing we hear
a lot from customers, especially those in more
tightly regulated industries like banking, is that they want
full control over their data that they upload to the cloud. It might be very sensitive. They might be beholden to
regulatory needs to do so. And one way that this translates
is, most of the time, customers ask for more control over
how their data is encrypted in the cloud so that they
can manage that encryption and so that they could,
for example, take away the encryption keys protecting
their data so that no one has access to that data anymore. Not even us, Google. So, the first thing I want to say before I proceed is that by default all
data in Google Cloud is encrypted at rest
regardless of configuration. So even if you go in
and you don't configure any of this data, any
of this encryption, we'll still do it for
you automatically. But we do give you the
tools to go further. A best practice that's
quickly developing is that customers can use
another managed service that Google Cloud provides
called Google Key Management Service. This is a managed service that
allows you to manage keys. So, you can create keys, rotate
keys, delete them at any time. And then you can
apply these keys to your disks, to your
images and snapshots, and any other content
that you store in GCE. And we'll start using
that key that you've managed to go ahead and
start encrypting your data. I should also note that
this whole flow is supported with-- it's integrated with
IAM and audit logging, so all the tools that
we just talked about. So you can see who's
doing the encryption and where these keys are. And Cloud KMS has broad support for the set of security standards that we've been hearing are
required from our customers. So, what this looks like. You would start by creating
a key and a key ring in Cloud KMS. That's a very quick task. It should just take
you a few minutes. And you can specify what the rotation period of this key is, what encryption standard it uses, et cetera. And then when you go and
create a GCE disk or image or snapshot, you will just
point GCE to that particular key that you've just created. And just like that, we'll
start using that particular key to protect your data.
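A sketch of that flow in gcloud, with hypothetical names; the location, rotation schedule, and disk are all illustrative:

    # Create a key ring and a rotating key
    gcloud kms keyrings create my-keyring --location=us-central1
    gcloud kms keys create my-key --keyring=my-keyring \
        --location=us-central1 --purpose=encryption \
        --rotation-period=90d --next-rotation-time=2019-07-01T00:00:00Z

    # Point a new GCE disk at that key
    gcloud compute disks create my-disk --zone=us-central1-a \
        --kms-key=projects/my-project/locations/us-central1/keyRings/my-keyring/cryptoKeys/my-key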
And so what that means is that, as needed later, you can go revoke or
disable access to that key. This is what it looks
like both in the UI and in the G Cloud
command line tool. And from there, once you've
disabled access what you'll find is that GCE will no longer
have access to that information at all. So if you try to attach a disk
that's protected by the revoked key to a VM you'll see
that that action fails and that will fail until you
restore access to the key. And so in this way
you have full control over who has access to
your data on Google Cloud. Another question that we get
from customers is, well hey, you're Google, how do I
know that you're not using my data for other means, right? How do I know you're not
using it to train some-- yeah, you know, to advertise
or something like that. Now, here I want to say
again that at Google Cloud, we do not access your
data for any other reason than those necessary to fulfill
our contractual obligations to you. And in fact, when we
looked at the data internally, over 99% of these data accesses are to fulfill support requests. So it's where customers are actually asking us to take these actions. But you don't have to
take our word for it. We have a feature called
Access Transparency logs. They're another type of the audit logs that we just talked about. And what they do
is they'll tell you in near-real-time and
without cost any time Google accesses your
data for any reason. All right. The one caveat
here is that these are available for only a certain
subset of our support packages. And you can find out
more details online. But they're very easy to enable. So as long as you have the right
permissions, that's step one-- we have a particular role
for this Access Transparency admin-- you can just go into a
particular project folder organization. And you'll find a little
toggle in IAM admin settings. And from there you
can go and you'll just see that however you're
getting your logs, maybe it's through the
Stackdriver Logging Portal, you'll start seeing these
Access Transparency logs. So, what does that look like? Here's an example of an
Access Transparency log. It will be generated
once for every action that we take on your data. And it will tell
you-- in yellow, you can see it tells
you the reason, so why did we access your data? Here in particular,
we're doing this because of a customer-initiated
support request. And we'll attach the
case number for you. And then in green you can
see what exactly Google did. So you can see the resource
that we were operating on, the API calls that we
made when we did this, and a number of other
details relevant to the case. And so, you don't have to take
our word for what I just said. You can actually go in and verify it yourselves. And so with that, that brings
us to the end of our talk. We started by laying
down some principles, and hopefully we've
been able to build up to a set of best practices
to go apply those principles. Thank you very much. [APPLAUSE]