VPC Deep Dive and Best Practices (Cloud Next '18)

Captions
[MUSIC PLAYING]

NEHA PATTAN: Hello, everyone. Welcome to GCP Next 2018. I'm really glad that all of you could make it. My name is Neha Pattan and I'm a software engineer at Google. I work on VPC. I'm going to be joined by Emanuele and Kamal today, and we are going to do a deep dive into Google VPC.

So to start off: what is a VPC? VPC stands for Virtual Private Cloud, a concept that I'm sure most of you are familiar with. A virtual private cloud is a logically isolated section of your cloud. It is the cloud equivalent of the traditional network that you would run in your own data center, but with the extremely important added benefit of running on infrastructure that is highly optimized for scale, performance, and reduced network costs. We have a 50-minute talk today. I'm going to introduce a few core VPC concepts, walk you through some of the work that we've been doing over the past year, and mention some best practices that we recommend. Emanuele, who is a networking product specialist at Google, will walk us through a few example VPC topologies. And Kamal, who is our guest from PayPal, will walk us through PayPal's experience of using Google VPC.

Now, what is the first thing that you do when you want to check whether your internet connectivity is working correctly? If you're a networking expert, the answer may be slightly different. But for most of the world, the answer is to open a web browser, navigate to google.com, and check whether that web page loads correctly. Using Google Cloud gives you the unique opportunity of using the same underlying network infrastructure that powers all of Google's services, like Search, Maps, YouTube, and Gmail. VPC is a really simple and intuitive interface to this infrastructure, and it provides programmable, distributed networking functions.

So what's unique about Google VPC? Most importantly, it uses the highly optimized, low-congestion, high-bandwidth, high-quality-of-service global network backbone. This is important for many reasons. It gives you the capability of creating virtual machines anywhere in the world, in many regions, and having private connectivity between all of these virtual machines by default, so you don't need to set up expensive VPN connections or peering connections. You don't need to give your virtual machines public IPs in order for them to be able to communicate with each other across regions. It allows you to peer with GCP at a single location and get access to all of your workloads running globally. Using Google Cloud Load Balancer, you can create a globally load-balanced application, and you can rest assured that the load balancer will send requests to healthy backends that are closest to the end user sending the request. This is really important because it improves the perceived performance of your application. Using a single global VPC that is shareable in nature allows you to administer your network centrally, while allowing thousands of developers and development teams in your organization to create their own projects in which to create and manage their compute resources. Through private access, you can enable virtual machines in your VPC to have access to Google APIs, like your GCS buckets, without having to assign public IPs to these VMs. So the traffic that goes from the virtual machines to your storage APIs, for example, would use the network backbone and would not use the internet.
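As a rough sketch, the setup described above, a global custom-mode VPC with regional subnetworks and Private Google Access, might look like this in gcloud; all names and ranges here are hypothetical:

    # Create a custom-mode VPC: a single global routing domain.
    gcloud compute networks create corp-vpc --subnet-mode=custom

    # Add regional subnetworks with non-overlapping ranges. VMs in them
    # get private cross-region connectivity by default.
    gcloud compute networks subnets create us-subnet \
        --network=corp-vpc --region=us-west1 --range=10.128.0.0/20 \
        --enable-private-ip-google-access
    gcloud compute networks subnets create eu-subnet \
        --network=corp-vpc --region=europe-west1 --range=10.132.0.0/20 \
        --enable-private-ip-google-access

    # --enable-private-ip-google-access lets VMs without public IPs reach
    # Google APIs, such as GCS, over the Google backbone rather than the internet.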
But all of these things sound very 2017, right? What is new about VPCs in 2018? We've been working really hard over the past year to make the VPC model more secure, scalable, performant, and easy to use. All the data traffic in GCP gets authenticated and encrypted when it exits a physical boundary that is owned by Google or operated on behalf of Google. We use the same DDoS protection system in the cloud as we do for the rest of Google, so you get the benefit of perimeter security. We built and announced a product around this called Cloud Armor, which gives you fine-grained control over the rules that you can specify for perimeter security. These rules can be IP blacklists or whitelists, WAF rules, or geo-specific rules for allowing traffic into your load-balanced applications. We provide network layer security through stateful, connection-tracked firewall rules that are implemented in a distributed fashion and so are very scalable.

We use Andromeda, which is our SDN virtualization stack, and we have made several improvements in the Andromeda control plane infrastructure over the past year, which have enabled us to more than double the number of virtual machines that you can create in a single VPC. We take performance very seriously. We launched Andromeda 2.1 late last year, which gave a 40% reduction in cross-VM latency within a single zone over Andromeda 2.0. It does this by bypassing the hypervisor threads for network traffic going from the virtual machine to the software switch on the host machine. We also made several changes that enable fine-grained IAM access control and integration with third-party services using VPC peering.

Now, one of the more exciting initiatives we have had within the VPC team is to take a Kubernetes-first approach. And so we have made several changes in VPC itself in order to better support Kubernetes as a first-class citizen through GKE.

Now, whether you are cloud-native, also running workloads in your on-prem data center, or running workloads in another cloud as well, you may start small in a single region or a small set of regions. But as your application grows and you need to expand globally, you may expand to multiple regions. The configuration that you did initially just seamlessly works, and so the overhead of network administration and management doesn't grow with the number of regions that you need to put your compute resources in.

Now, let's walk through the different aspects of VPC. The first aspect that I'm going to talk about is that of a network. A network maps to a global virtual network that you can create virtual machines in, and it basically consists of regional subnetworks that have non-overlapping IP ranges.
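A minimal sketch of the Cloud Armor perimeter rules mentioned above; the policy, address range, and backend service names are hypothetical:

    # Create a security policy for perimeter filtering at the edge.
    gcloud compute security-policies create edge-policy

    # Deny a blacklisted range at priority 1000; other traffic falls
    # through to the policy's default rule.
    gcloud compute security-policies rules create 1000 \
        --security-policy=edge-policy \
        --src-ip-ranges=203.0.113.0/24 \
        --action=deny-403

    # Attach the policy to a global load balancer's backend service.
    gcloud compute backend-services update web-backend \
        --global --security-policy=edge-policy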
Now, before I explain the concept of a Shared VPC, I'm going to step back a bit and talk to you about a project. A project is a granular unit in GCP: a container that you can put resources inside. So if you want to create resources like firewall rules, routing policies, a network, virtual machines, or VPN connections, you create them inside of a project. A project is also the basic unit that you can attach billing accounts to, that you can specify IAM policies on, and that you can manage [INAUDIBLE] on. So it gives you a container that you can completely manage, and you have full control over the lifecycle management of your resources.

Now, with Shared VPC, you get the ability to create a VPC centrally in your organization, in a project called the host project, and to share it with thousands of other projects in your organization, called service projects, that you can create virtual machines in and have those virtual machines be connected to the Shared VPC. This is really important because it allows you to scale your organization to thousands of developers or development teams, while centrally managing the firewall rules, the routing policies, and the VPN connections, without having to create these every time a new project is spun up.

So what's new in this space? You can now administer Shared VPC in folders. For example, if you want to isolate your billing-pipeline VPC from your web-app VPC, you can do so by having separate folders for these two projects, with separate Shared VPC administrators who can set up the respective Shared VPCs. One of the things that we recommend doing here is to put all the service projects for a given Shared VPC setup inside the same folder, so that you can associate org-level policies with that folder at a later time. Another thing that we recommend is to use subnetwork-level permissions, so that you can specify which service projects have access to create virtual machines in which specific subnetworks. That gives you fine-grained access control.

Another feature that we have launched to GA over the past year is private IP reservation. You can reserve RFC 1918 IP addresses and use them for creating virtual machines or internal load balancers in your service projects.

Another feature that we have launched to GA over the past year is alias IP ranges. Alias IP ranges give you the ability to create alias, or secondary, IPs and associate them with the network interfaces of your virtual machines. You can allocate alias IPs from primary or secondary ranges on your subnetwork. Now, if you use GKE to manage workloads running in containers on your GCE VMs, then alias IP ranges give you the ability to centrally manage all of your RFC 1918 IP addresses. So your pod IPs, node IPs, and cluster IPs are allocated from the primary and secondary ranges of your subnetworks. It also gives native support for routing, and so it scales better and you can create more nodes in the cluster. If you use GKE to create an Ingress load balancer with a network endpoint group, then you can rest assured that the front end of the load balancer will load balance directly to the pods that are the backends of the load balancer, rather than load balancing to virtual machines, which then need iptables rules to route the packet to the backend pods. So using network endpoint groups, we get rid of this extra hop in the request path, which optimizes the performance of the load-balanced application. It also gives native support for health checking, so we are able to health check the pods directly rather than the virtual machines.

So as I mentioned before, alias IPs can be allocated from primary or secondary ranges in a subnetwork. If you're running out of IP space, you can add more secondary ranges on the subnetwork, or you can expand the existing primary or secondary ranges.
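A sketch of the Shared VPC and subnetwork-range operations described above, assuming hypothetical project, subnet, and group names:

    # Make a central project the Shared VPC host, then attach a service project.
    gcloud compute shared-vpc enable host-project
    gcloud compute shared-vpc associated-projects add team-a-project \
        --host-project=host-project

    # Subnetwork-level permission: only team A may create VMs in this subnet.
    gcloud compute networks subnets add-iam-policy-binding team-a-subnet \
        --project=host-project --region=us-west1 \
        --member="group:team-a@example.com" \
        --role="roles/compute.networkUser"

    # Secondary ranges (for example, for GKE pods and services), plus
    # in-place expansion of a primary range when space runs out.
    gcloud compute networks subnets create gke-subnet \
        --project=host-project --network=shared-vpc --region=us-west1 \
        --range=10.0.0.0/20 \
        --secondary-range=pods=10.4.0.0/14,services=10.8.0.0/20
    gcloud compute networks subnets expand-ip-range gke-subnet \
        --project=host-project --region=us-west1 --prefix-length=16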
Now, one of the things that we recommend doing when you are planning your IP space is to plan it in such a way that you leave enough room for growth. Now, if you're running an application on your VM and you're using an alias IP for the application, you may want to move the IP from one VM to another if the pod that you're running the application in dies or the application crashes. You can do this by deleting the alias IP on VM1, adding it to VM2, creating a pod there, and starting your application on VM2. This is basically how you migrate an IP from one VM to another.

Now, the next aspect I will be talking about is security. As I mentioned before, we provide network layer firewall rules. But the way we implement these firewall rules is not through a middlebox or a proxy, and so there are no single choke points. Firewall rules are programmed on every single host machine that has virtual machines running in your VPC, so they are truly distributed in nature. The rules themselves are flexible, allowing ingress or egress, allow or deny, using priorities. You can use tags to easily group the virtual machines that you want to apply specific firewall policies to. Now, when a developer adds a tag to their virtual machine, there is no ACL check that happens by default. This is because if you own a virtual machine and you have access to set tags on it, then you are able to associate any tag with the virtual machine. If the service provided by your organization is security-sensitive, internally or externally, this may be problematic. We solve this problem using firewalls with service accounts, and we are happy to announce that this is now available in GA. You can use a service account as the source or the target of a firewall rule, and you can rest assured that when a developer associates a service account with their VM, an ACL check does happen: they need to have IAM permission to use that service account.
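Two of the operations described above, a firewall rule keyed to service accounts and an alias IP migration between VMs, might look like the following; the accounts, zone, and address are hypothetical:

    # Allow only VMs running as the frontend service account to reach
    # VMs running as the backend service account on port 8443.
    gcloud compute firewall-rules create allow-fe-to-be \
        --network=prod-vpc --direction=INGRESS --action=ALLOW --rules=tcp:8443 \
        --source-service-accounts=frontend@prod-project.iam.gserviceaccount.com \
        --target-service-accounts=backend@prod-project.iam.gserviceaccount.com

    # Migrate an alias IP: remove it from vm-1, then add it to vm-2.
    gcloud compute instances network-interfaces update vm-1 \
        --zone=us-west1-a --aliases=""
    gcloud compute instances network-interfaces update vm-2 \
        --zone=us-west1-a --aliases="10.128.0.50/32"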
Now, you've configured your VPC topology, you've carved out the IP space, you've created the subnetworks in different regions, and you've created the firewall rules to ensure that the traffic within the VPC is secure. The next thing you would like to do is connect your VPC running in the cloud with your on-prem data center. There are multiple ways of doing this. If bandwidth is not a major concern for you, and you are OK with the overhead of IPsec encryption-- it basically uses the internet-- then you can use VPN. And there are two ways of using VPN connections as well. You can either create static routes to your VPN tunnel, which let you statically specify the prefixes that should be routed to your on-prem, or you can use Cloud Router, which runs a BGP daemon, to exchange routes with your on-prem dynamically. Now, one thing worth mentioning about Cloud Router is that, unlike what the name might suggest, it's not a traditional router. Packets going to your on-prem, or coming from your on-prem to the VPC, do not pass through the Cloud Router. Cloud Router is a control plane component: it is responsible for programming the routes that it learns from the on-prem onto the host machines that run your VPC's virtual machines. Now, if bandwidth is a major concern for you, then you can use Interconnects.

If you would like to directly peer with Google-- if you have a public ASN and you want to connect with us at one of our POPs, or points of presence-- then you can use Dedicated Interconnect. If you are doing the same thing using a carrier's infrastructure and paying the carrier for the service, then you would use Partner Interconnect. We're really excited that both Dedicated and Partner Interconnect are now generally available. There's a talk at 11:40 on hybrid connectivity that I would highly recommend checking out.

Now, a couple of features that are new in the dynamic routing space are global BGP routing and custom route advertisements. Using global BGP routing, you can announce prefixes that belong to subnetworks in other regions using a Cloud Router in one region. We recommend doing this to enable redundancy in your connection to your on-prem. So you can have Cloud Routers in two regions in your global VPC and enable global BGP routing on your VPC. This gives you local routes as well as remote routes through the other region. Cloud Router will add weight to the remote routes, so it will always prefer the local routes. But if your local connection-- your local VPN connection or Interconnect-- goes down, then the remote route will be used, and you can fall back to using the routes in a different region. So this gives you higher availability. You can use custom route advertisements to choose the prefixes that you want to announce to your on-prem, and so you can leave out prefixes that you don't want to announce. You can also use custom route advertisements to announce prefixes that don't belong to subnetworks in your VPC. These may be RFC 1918 IP ranges, or non-RFC 1918 ranges as well.

Now, the next thing I'm going to talk about is using third-party services. Shared VPC was designed with the intention of having private connectivity between virtual machines in one organization. But if you want to connect two VPCs in two different organizations-- the typical consumer and producer scenario-- then you can use VPC peering. VPC peering is also not implemented through a proxy, and it is truly distributed in nature, so there is no single point of failure with VPC peering either. When you peer two VPCs, you get full connectivity to all the virtual machines in the peer VPC, and you also get access to all the services that the peer VPC exposes through internal load balancers. Now, one thing I want to mention about both Shared VPC and VPC peering is that in both cases you get high throughput and bandwidth-- the same throughput and bandwidth as if you had created those two virtual machines in the same network in the same project. This is because that aspect of network topology is pretty much transparent to the SDN data plane.

If you're a service provider and you would like to have a management interface to the virtual machines that are serving traffic to your consumers, you can do so by creating a virtual machine with multiple network interfaces. You can create up to eight network interfaces on your virtual machine, and the traffic on those network interfaces will be completely isolated. So your data plane network will be fully isolated from your management network.
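The global BGP routing and custom route advertisements covered above can be sketched as follows; the network, router, and ranges are hypothetical:

    # Switch the VPC to global dynamic routing, so a Cloud Router in one
    # region can announce subnets that live in other regions.
    gcloud compute networks update corp-vpc --bgp-routing-mode=global

    # Custom advertisements: announce only chosen prefixes, including
    # ranges that are not subnets of this VPC.
    gcloud compute routers update cr-us-west1 --region=us-west1 \
        --advertisement-mode=custom \
        --set-advertisement-ranges=10.128.0.0/20,192.168.100.0/24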
Now, the last aspect I'll be talking about is that of monitoring. As you know, GCE is integrated with Stackdriver. And so if you use features like global load balancing or the internal load balancer, you'll be able to view the logs for these features in Stackdriver. We're really excited to now announce the general availability of VPC Flow Logs. VPC Flow Logs allow you to view your VPC flows, aggregated over five seconds by connection, and you can enable this feature on a per-subnetwork basis. On the console, you can choose the log type as VPC Flows and then view the connections that are logged. Each log will show you the [INAUDIBLE] of the connection, along with other meta information like the source and destination VPC, the source and destination subnetwork, and other annotations, like the geo annotations that we add on top of the logs. You can export this to BigQuery or Pub/Sub. And you can use this feature for network forensics, for troubleshooting your network, and for analyzing the different traffic patterns in your network across geographies, and then use that information to improve the topology of your network and reduce costs.
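Enabling the VPC Flow Logs feature announced above, and exporting the logs, might look like this; the subnet, project, and dataset are hypothetical, and the log filter is only an approximation:

    # Flow logs are enabled per subnetwork.
    gcloud compute networks subnets update prod-subnet \
        --region=us-west1 --enable-flow-logs

    # Export flow logs to BigQuery for analysis.
    gcloud logging sinks create vpc-flows-sink \
        bigquery.googleapis.com/projects/my-project/datasets/vpc_flows \
        --log-filter='resource.type="gce_subnetwork"'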
Now, you may have started small in a single region or a small set of regions, but the configuration that you did initially-- creating the network, creating the firewall rules, setting up VPC peering-- just seamlessly works as you grow to more regions. With the VPC design, we make sure that the cost and overhead of network administration and management doesn't grow linearly with the scale of your network footprint. We really think of this model of VPCs as the true next generation of how VPCs are defined in the cloud, and we're really excited to see how you use them. We've been working closely with some of the top names in education, banking, software, technology, mobile labs-- you name it-- and they've been giving us some excellent feedback on how to improve our features. It's really been a fun ride so far.

Now, three things that I want you to take away from this presentation. First, VPCs are global in nature. They use the same underlying network infrastructure as the rest of Google, which allows you to do amazing things at scale. Second, they are highly performant. We take performance very seriously. As I mentioned earlier, we have been making a lot of progress in improving the performance of our network, and we're also looking for new ways of doing this. We have a lot of ongoing investment, so you can expect to see more performance improvements in the future. And lastly, VPCs are easy to use. As you can imagine, the network is very powerful, but also very complex, and VPCs provide a simple and intuitive interface to this complex infrastructure that allows you extensibility, access control, and integration with third-party services. And with that, I would like to invite Emanuele onstage to share a few example VPC topologies.

EMANUELE MAZZA: Thanks, Neha.

[APPLAUSE]

So we're going to run through some example topologies that use all the concepts that Neha just explained. Before going over that, I just want to quickly recap a couple of things. If you look at it from the routing point of view, a VPC is really a routing domain. That's what you get when you create a VPC. It provides internal IP connectivity across all your subnetworks-- it doesn't matter in which region they exist-- without the traffic leaving the Google network. So all the traffic stays inside the Google network.

Neha mentioned that we have a project concept. The project is a unit of billing. It's a unit of permissions. It's a unit of resources. And yes, you can have multiple VPCs in the same project. If you do that, you get two completely disjoint routing domains, which means, like in the example here, you can have two VPCs with overlapping IP subnets. In this picture, you can see that VPC1 and VPC2 overlap in us-west2. This is possible because they are disjoint: they do not talk to each other by design, unless you want them to.

And then the Cloud Router. As Neha was saying, it's an eBGP speaker process. Think about it like the route processor in the physical-router world. It does not do data plane. The whole goal in life of this guy is to get routes from outside and program them into the VPC, and to get routes from the VPC and send them outside. As you can see in the picture, it can program routes across all regions. In this example, the Cloud Router exists in us-west1, but it is also able to advertise a subnetwork that exists in us-west2, which is 10.240.1.0/24. And it is also able to advertise subnetworks that do not exist in your VPC-- in this example, 10.239.2.0/24.

OK, now that these two concepts are clear, let's start looking at some potential topologies. In this one, we're using VPC peering in a producer-consumer scenario. We have, on top, a provider project in the blue org. And at the bottom, we've got two consumer projects-- one in the green org and one in the red org. Please note that these three projects are in three different organizations. Now, what the provider project wants to offer is common services that must be used by both the consumer project in the green org and the consumer project in the red org. The way I can do this is through VPC peering. So I peer from the green org's VPC to the leftmost VPC in the provider project, and the red one peers with the rightmost one. Note that here I have fully overlapping IPs across the consumer projects-- both consumer projects are using the same subnetworks. That's why I need two different VPCs in the provider project. Tenant isolation in the provider project is there by default, because those two VPCs are separate routing domains, so no connectivity between them is possible. You don't need to use firewall rules for that.
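A sketch of the producer-consumer peering in this topology; peering is configured from both sides, and all project and network names are hypothetical:

    # From the provider project: peer provider-vpc-1 with the green org's VPC.
    gcloud compute networks peerings create to-green \
        --project=provider-project --network=provider-vpc-1 \
        --peer-project=green-project --peer-network=green-vpc

    # From the consumer project: create the matching half of the peering.
    gcloud compute networks peerings create to-provider \
        --project=green-project --network=green-vpc \
        --peer-project=provider-project --peer-network=provider-vpc-1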
Shared VPC-- I'm going to show you a simple example of how Shared VPC can be used with a multi-tier application. The goal of this topology is to centralize networking and security configuration and operations in the host project, while, by using Shared VPC and giving different users permissions to use different subnets-- and we'll see that-- having each team-- the team of the front-end service project and the team of the back-end service project-- manage their own compute by themselves. And this is, of course, fully compatible with load balancers and flow logs.

So how does that work? First, in the host project, I create a subnetwork-- the orange subnetwork-- and it's connected to the Shared VPC router. And I give the orange user permission to use the subnetwork. A subnetwork is an object, so you can grant permissions on this object. Now what happens is that this orange user, who exists in the service project, logs into the service project, creates instances, and sees that subnetwork as available to connect instances to, even though that subnetwork exists in the host project. From here on, from the networking point of view, those two instances behave just as if they were connected to the host project. From the compute point of view, they exist in the service project, so billing for the instances goes to the service project. Now from there, the orange user creates a global load balancer, protecting the front end with Cloud Armor, for example, and [INAUDIBLE] traffic from the internet.

Now, I'm also creating another subnetwork in the host project, and this time I'm giving permission to use this subnetwork to the blue user. The blue user exists in the back-end service project. So what this user does is log into his service project, create an ILB, and create instances. And like the orange user previously, he sees the subnetwork in the host project as available to connect his instances to, and available to create the VIP of an internal load balancer in. And from here, I have full IP connectivity between front end and back end, exactly as if those two sets of instances were connected to the same VPC in the host project.

Same idea, but with a different concept: I want to use a network virtual appliance to provide security and networking services to multiple projects at the same time. So what I can do is create a network virtual appliance, which is a multi-NIC appliance. And as Neha was saying, in a multi-NIC instance-- an appliance, in this case-- each interface needs to be connected to a different VPC. So here in this example, I have those two VPCs-- let's say they are two internal VPCs. And I'm using the same concept of Shared VPC, so I'm sharing [INAUDIBLE] with the orange user and with the blue user. And they create their instances and the internal load balancer exactly like before. And then the network virtual appliance also has a northbound-- well, I should say eastbound in this picture-- external interface that is connected to the external VPC-- the one on the right-- that does BGP, with the Cloud Router, with your on-prem. So in this example, all the instances-- VM1, VM2, VM3, and VM4-- get connectivity with on-prem and leverage networking and security services provided centrally by the network virtual appliance in the host project.
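A sketch of the multi-NIC network virtual appliance from this topology; VPC, subnet, and zone names are hypothetical:

    # Each NIC attaches to a different VPC; --can-ip-forward lets the
    # appliance forward traffic that is not addressed to itself.
    gcloud compute instances create nva-1 \
        --zone=us-west1-a --can-ip-forward \
        --network-interface=network=internal-vpc-1,subnet=int-sub-1,no-address \
        --network-interface=network=internal-vpc-2,subnet=int-sub-2,no-address \
        --network-interface=network=external-vpc,subnet=ext-sub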
The next one is a more complex one, I would say, but not that much-- Shared VPC with a shared Interconnect. So what's the goal? The goal is that I want to leverage the same Dedicated Interconnect across multiple projects. Why? Because a Dedicated Interconnect is a precious resource: it provides [INAUDIBLE] connectivity and it has a cost, so I want to use it as much as possible. And I want to keep the operational and management advantages of the Shared VPC model that I showed before. I also want to allocate cost across projects in a meaningful way. So how can we meet these goals? First, we create one project where the Dedicated Interconnect is created. Then you have one or more Shared VPC host projects, each one with its own Cloud Router and its own Interconnect attachment. As we will see, those Interconnect attachments will use the Dedicated Interconnect that lives in a different project-- because, again, everything in GCP is an object. So the Dedicated Interconnect is also an object and has permissions. How do you do security? The best practice Neha recommended: use firewall rules based on service accounts. And VPC Flow Logs will give you complete visibility of traffic, including the service VPC and the host VPC.

So visually, how does that look? If you look here first, this is the physical world. These are [INAUDIBLE] sites and you have your router here. And this is our peering fabric device. So this is a site where we provide Dedicated Interconnect, and this DI is the Dedicated Interconnect. Physically it's a port channel, so that's where the connection is built. Now, if you forget about the physical side and think about the cloud as an object model, the Dedicated Interconnect is an object that lives in a project. So you go into a project and you say, I want a Dedicated Interconnect. Now, we have in this example two Shared VPCs, and the structure is the same as the one I was showing you before. What is to be noticed here is that each of those two Shared VPC host projects has a VLAN attachment and a Cloud Router. Note that the VLAN attachment is an object that lives in the host project, but it is using the Dedicated Interconnect that lives in the Dedicated Interconnect project. So you can have a single project with the Dedicated Interconnect, and you can use it to share the bandwidth across multiple-- in this case-- host projects of a Shared VPC configuration. Of course, if you want, you can have full overlapping IP support because, from our side, the tenancy is provided by the fact that we have two different VPCs. On your side, the customer side, you have to terminate [INAUDIBLE] that is overlapping-- you have to terminate them into different VRFs, for example, and then bring the tenancy inside your network with MPLS VPNs or whatever you are used to. So we basically met all these goals.

And a couple of considerations. How about the billing? The Dedicated Interconnect gets billed against the project where it exists. The VLAN attachment gets billed against the host project. The egress traffic from the instances gets billed on the service project where those instances live, not on the host project. So if you have multiple teams-- different teams that manage the service projects-- you can know how much traffic and egress cost each of them is generating. There's a common DNS space spanning each Shared VPC domain; this is a native capability of Shared VPC. And VPC Flow Logs, as I was saying, contain information about both the service project and the host project. If you look at the flow logs, in the source-instance and destination-instance fields you will see the service project where the instance actually lives, and in the source-VPC and destination-VPC fields you will see the host project to which the instances are attached.

So I have two more. One is a classical high-throughput VPN connection. In this example, you have two VPN gateways in your VPC-- here, they are in two different regions-- and you have multiple VPN tunnels for each VPN gateway. Since we can do equal-cost multipath per flow from GCP to on-prem, this basically gives you high-throughput connectivity between your GCP project and the on-prem. There are multiple ways you can set this up. The suggestion is, for sure, do route-based to start with, because it's more flexible. But the thing I always suggest to customers is to start, if possible, with the Cloud Router and BGP. Why? Because the moment you switch to Dedicated Interconnect, the routing will be exactly the same. So you can test and get confident with your routing setup over VPN, then move to the Dedicated Interconnect, and you don't change a thing.
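Because the routing carries over from VPN to Interconnect unchanged, path preference lives on the Cloud Router's BGP peers. A sketch of preferring the Interconnect and demoting the VPN to a backup by advertising a worse MED over it; the router and peer names are hypothetical:

    # Advertise subnets to on-prem over the Interconnect peer with MED 120...
    gcloud compute routers update-bgp-peer cr-us-east1 --region=us-east1 \
        --peer-name=interconnect-peer --advertised-route-priority=120

    # ...and over the VPN peer with a higher (worse) MED, so on-prem routers
    # use the VPN only when the Interconnect path is withdrawn.
    gcloud compute routers update-bgp-peer cr-us-east1 --region=us-east1 \
        --peer-name=vpn-peer --advertised-route-priority=300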
You can actually use VPN as a backup for Dedicated Interconnect, and this is what I'm showing you in the last slide. What I'm leveraging here is the global routing feature of the Cloud Router. So let's say I want to give reachability to the us-east subnets. I send the advertisements from the us-east Cloud Router, in the green region, with MED 120-- MED is a BGP attribute, basically, that you can set on the Cloud Router. The Cloud Router in us-west also knows about the us-east1 subnet, but we automatically add an inter-region cost to that route-- here, 201. That means that when the Cloud Router in the orange region advertises those subnetworks, they automatically get a higher MED than 120, meaning 120 plus 201. And that 201 is just an example; it's dynamically calculated by the platform. This way, you're sure that both R1 and R2 use the Interconnect by default, and when those Interconnects are down, you use the VPN as a backup. You can do the same thing on the reverse side. And on the reverse side, it's actually easier, because even if you advertise the same MED from on-prem to your Cloud Routers, we automatically add that inter-region cost when the Cloud Routers that live in the orange region program the routes into the VPC. We do that automatically because the subnets live in a different region-- the green one. So that gives you automatic load balancing and usage of the Dedicated Interconnect as the primary, leaving the VPN only as a backup. And with that, we go to Kamal, who will tell you how PayPal is using VPC in their production network. Thank you.

[APPLAUSE]

KAMAL CONGEVARAM MURALIDHARAN: Good afternoon, everyone. My name is Kamal. I'm part of the core platform and infrastructure team at PayPal. At PayPal, we started our public cloud journey a couple of years back. When we started looking at how PayPal could expand into the public cloud-- and we have one of the largest private data centers-- we started looking at what is needed and what we need to do to get into the public cloud. We soon realized two important things. One is that it is not a one-year or two-year journey where we can just jump into a public cloud. It is a multi-year journey, because it involves a culture change, a mindset change, a toolset change, and even some application rewrites as we move into it. The second thing we realized was that none of the public clouds had all the features for supporting enterprises to run in a secure and scalable way. That means we needed a partner to journey with us, and we would travel along with them over this multi-year journey to get to the state where we can run a big enterprise on the public cloud.

So with that, one of the key things we started looking at is how to build building blocks in the public cloud, so that we could start figuring out how to move things in a phased manner, because we cannot move everything at once. One of the key pieces we started looking at is: how do you build this VPC in the public cloud? So with that, I'll go to one of the use cases we started with and how we went through the process of moving into the public cloud.

The first use case we took was a dev use case. PayPal has thousands of developers, and each of those developers needs their space to go and do creative things-- creating their VMs, deploying their services, running applications-- which is something we want to provide the space for. So when you look at that, what they need is autonomy in doing that.
I would say that DevOps is a very widely used term, but the meaning of DevOps varies from company to company and person to person. What we really think is: give the autonomy to the developers so that they can do what they need to do, and we don't step into their thing. At the same time, there are a lot of other things which developers don't want to take care of. For example, when we started on Google Cloud, there was nothing called a Shared VPC. The model which existed was: you create a project, and inside the project there is a network, which means that every project has its own network. That means that for every developer-- if I have to give them a project, then we also have to create a network inside that project, which means we need to allocate an IP range. And we are talking about hundreds of projects-- even more than that, thousands of projects. It was a big nightmare for us to manage IP ranges for each of these projects.

The second thing is that another main principle we talk about for the public cloud is that everything should be automated, which means that when we set something up, it's not just the automation to create the initial project-- any maintenance after that should be automated as well. So if we are creating a project for a developer and the network is within that project, then any interconnectivity configuration or gateway configuration exists within that network. So if you have to make a change, it's not making a change in one project-- we are talking about going and making changes in thousands of projects. It made it pretty difficult for us to scale in that model. And that's where we worked with Google and partnered up and said, hey, these are the problems. And that's where Google introduced this concept of Shared VPC. I think Neha and Emanuele explained all the topologies and everything, so I'm not putting any diagram here. But I'm going to explain what problems we faced when we started initially and how we solved them in the Shared VPC model.

So as I said, in the current industry, there are two schools of thought. One is a complete DevOps model, where you give complete control to the developer from start to end. That means you create an account and give everything to the developer, and the developer manages everything from the network to compute to deploying applications. And there's another extreme school of thought, which is basically completely anti-DevOps, where everything is done by somebody else, and the developer is given support. At PayPal, what we did is take a hybrid approach. We really figured out what each team should be responsible for and where we should draw the line, and said: these are your developer responsibilities-- compute, storage, your code, and everything above; you manage it. But anything below, we want to really standardize. Especially in a big enterprise, when you have thousands of developers and thousands of projects, standardization becomes very critical in terms of security and also auditing. If you want to change something, you need to know where to change it. If it is in many places, it becomes very tricky.

So if you look at it, as I said, IP management was one of the problems. Allocating IP ranges, assigning them, and, when a project is deleted, reassigning them somewhere else-- it was becoming a challenge. With Shared VPC, the model we arrived at was based on the fact that we have many security zones at PayPal.
Each security zone is carved out based on what data can be stored, what data can be accessed, what services can exist, and how the services can interact. There are many criteria based on which we carve out zones. For example, there may be a zone where the data resides, which means that nobody can enter that zone except through a single channel, and that channel would be protected with firewalls, data loss prevention, everything. We have this concept already built into our private cloud, and we wanted to bring a similar concept to the public cloud. That's where this model of VPC became very useful. We said that every VPC we create is going to be a security zone, which means that within that security zone, it's open-- if there are two VMs, they can communicate. But for any traffic to come into that security zone, it has to go through a firewall, or through a VPC peering with more security and scrutiny.

The second thing is that, as I said, once we have Shared VPC, the management becomes much simpler, because we just need to manage the connectivity at the Shared VPC level. Every service project we create doesn't need to worry about it. As far as the developers are concerned, they just keep creating VMs, storage, and everything. We started with a VPN-based approach, and now we are moving to a Dedicated Interconnect, and there's no impact on the developers, because everything is centrally managed at the Shared VPC level.

So this is a functional representation of how we divided the functionalities, and we felt this is the optimal way: we give developers enough freedom to play around with the public cloud and do what they need, but at the same time we don't overburden them with responsibilities they don't need to take care of. If you look at it, we have an Enterprise Cloud team which comes up with best practices for how many VPCs have to be created, how the VPCs have to be interconnected-- peered or not-- how they have to connect back to the on-prem, and where the routing should be configured. All of these are centralized and automated through one single mechanism, so we manage everything there. And at the same time, we have automation to create projects, which sets a project up, puts the right developers there, and grants permission to the network. That's it. Once they get the project, within the project they have the autonomy to do anything. With this mechanism, we were able to scale to thousands of projects, and we are able to manage it without any impact from future changes-- because we all know that things evolve. The cloud is going to evolve. That means we should also move along with it. So we built the foundation in such a way that we can make more changes in the future with less impact.

There's another use case, which is a test zone-- basically running PayPal.com inside PayPal, which means it is a complete replication of PayPal inside PayPal. It is used for our test environments, integration testing, and everything. Now, when we came up with the model I showed previously, our goal was to make sure the same model could be applied to any use case that comes later, like test. Tomorrow it could be production or, later, something else. In this case, if you look at the use case, it's at a smaller scale in terms of number of projects-- we are talking about only a few tens of projects. But it's a huge amount of resources.
It's like thousands of VMs and thousands of gigabytes of storage, because it's the complete PayPal inside. But if you go into the model, it's the same. We didn't change anything. We created a VPC, which was already automated and has all the features. We created the projects in the same way. We just needed to assign different teams to these projects, and within that, they were able to do everything they wanted. So the model is scalable to any use case you bring to it. And we are in the process of, next year, trying to get production use cases onto the public cloud. We don't need to change much-- the same automation we created is going to help us do this again.

As I said, it's a multi-year journey. We all realize that not everything is available today, and we have to keep this going. We have seen that VPC is common, but, again, not every service is routed through the VPC. For example, we wanted to create databases with private IPs assigned from the VPC range; as that becomes available, we'll start adopting it. IP aliasing is a great feature that was released. We have a container networking technology where we assign an IP to each container, so for that, IP aliasing is a great thing, and we are going to start using it. And similarly, as this technology evolves, our tools also evolve. For example, we have events which have to be consumed. Now, since we have this host and service project isolation, we need to figure out how to collect events from different service projects in a single place, because we don't want to deploy something in every service project. So we are also evolving as the technology evolves. And as Google matures this further, we will be moving along with it. Thank you.

[APPLAUSE]

[MUSIC PLAYING]
Info
Channel: Google Cloud Tech
Views: 39,352
Rating: 4.8694639 out of 5
Keywords: type: Conference Talk (Full production); pr_pr: Google Cloud Next; purpose: Educate
Id: wmP6SQe5J7g
Length: 49min 42sec (2982 seconds)
Published: Tue Aug 14 2018