[MUSIC PLAYING] NEHA PATTAN: Hello, everyone. Welcome to GCP Next 2018. I'm really glad that all
of you could make it. My name is Neha Pattan and I'm
a software engineer at Google. I work on VPC. I'm going to be joined by
Emanuele and Kamal today. And we are going to do a
deep dive into Google VPC. So to start off
with what is a VPC? A VPC stands for Virtual
Private Cloud, a concept that I'm sure most of
you are familiar with. A virtual private cloud
is a logical isolation of a section of your cloud. And it is a cloud equivalent
of your traditional network that you would run in
your own data center, but with the extremely important
added benefits of running on infrastructure that is
highly optimized for scale, performance, and
reduced network costs. We have a 50-minute talk today. I'm going to be introducing
a few core VPC concepts and walking you through
some of the work that we've been doing
over the past year, and also mentioning some best
practices that we recommend. Emanuele, who is a networking
product specialist at Google, will be walking us through a
few example VPC topologies. And Kamal, who is our
guest from PayPal, will be walking us through
PayPal's experience of using Google VPC. Now what is the
first thing that you do when you want to check
whether your internet connectivity is working
correctly or not? If you're a networking
expert, the answer may be slightly different. But for most of the
world, the answer is to open a web browser,
navigate to google.com, and check whether that web
page loads correctly or not. Using Google Cloud gives
you the unique opportunity of using the same underlying
network infrastructure that powers all of Google's services,
like Search, Maps, YouTube, Gmail. And VPC is a really simple
and intuitive interface to this infrastructure. And it provides to you
programmable distributed networking functions. So what's unique
about Google VPC? Most importantly, it uses
the highly optimized, low congestion and high bandwidth,
high quality of service global network backbone. Now this is important
for many reasons. It allows you the capability
of creating virtual machines anywhere in the
world in many regions and having private
connectivity between all of these virtual
machines by default, so you don't need to set up
expensive VPN connections or peering connections. You don't need to give your
virtual machines public IPs in order for these
virtual machines to be able to communicate with
each other across regions. It allows you to peer with
GCP at a single location and to get access to all of
your workloads running globally. Using Google Cloud
Load Balancer, you can create a global
load balancing application. And you can rest assured
that the load balancer will send requests to the closest
backends that are healthy and that are closest
to the end user that is sending the request. Now this is really
important because it improves the perceived performance of your application. Using a single global VPC
that is shareable in nature allows you to administer
your network centrally, while allowing thousands of
developers and development teams in your
organization to create their own projects,
to create and manage their compute resources in. Through private access, you
can enable virtual machines in your VPC to have
access to Google APIs, like your GCS buckets,
without having to assign public IPs to these VMs. So the traffic that goes
from the virtual machines to your storage
APIs, for example, would use the network backbone
and would not use the internet.
But all of these things sound very 2017, right? What is new about VPCs in 2018? We've been working really
hard over the past year to build the model of VPCs
to be more secure, scalable, performant, and easy to use. All the data traffic in GCP
gets authenticated and encrypted when it exits the
physical boundary that is owned by Google or
operated on behalf of Google. We use the same DDoS
protection system in the cloud as we do for the rest of
Google, so you get the benefit of perimeter security. We built and announced a product
around this called Cloud Armor, which allows you fine-grained
control over the rules that you can specify for
the perimeter security. And these rules can be
IP blacklists or whitelists, or WAF rules, or you can
specify geo-specific rules for allowing traffic into your
load balanced applications. We provide network
layer security through stateful connection
tracked firewall rules that are implemented in a
distributed fashion and so are very scalable. We use Andromeda, which is
our SDN virtualization stack. And we have made several
improvements in the Andromeda control plane infrastructure
over the past year, which have enabled us to more
than double the number of virtual machines that you
can create in a single VPC. We take performance
very seriously. We launched Andromeda
2.1 late last year, which gave a 40% reduction
in the cross-VM latencies within a single zone
over Andromeda 2.0. And it does this
through an optimization by bypassing the
hypervisor threads for network traffic that is
going from the virtual machine to the software switch
on the host machine. We also made
several changes that enable fine-grained
IAM access control and integration with third party
services using VPC peering. Now, one of the more
exciting initiatives we have had within
the VPC team is to take a
Kubernetes-first approach. And so we have made
several changes in VPC inherently, in order to
better support Kubernetes as a first-class
citizen through GKE. Now, whether you
are cloud-native or you are also running your
workloads in your on-prem data center or you may be running
workloads in another cloud as well, you may start
small in a single region or in a small set of regions. But as your application
grows and as you need to expand globally, you
may expand to multiple regions. But the configuration that
you did initially just seamlessly works. And so the overhead of network
administration and management doesn't grow over
the number of regions that you need to put your
compute resources in. Now, let's walk through the
different aspects of VPC. The first aspect that I'm
going to be talking about is that of a network. Now, a network maps to
a global virtual network that you can create
virtual machines in. And it basically consists
of regional subnetworks that have non-overlapping IP ranges. Now before I explain the
Now, before I explain the concept of a Shared VPC, I'm going to step back a bit
and talk to you about a project. Now, a project is
a granular unit in GCP, which is a container
that you can put resources inside. So if you want to
create resources like firewall rules,
routing policies, a network, or virtual machines,
or VPN connections, you can create them
inside of a project. A project is also a
basic unit that you can attach billing
accounts to, that you can specify IAM policies
on, and that you can manage [INAUDIBLE] on. So it gives you a container
that you can completely manage. And then you can
have full control over the lifecycle
management of your resources. Now with Shared VPC,
you get the ability to create a VPC centrally
in your organization in a project called
the host project, and to share that with
thousands of other projects in your organization
called service projects, that you can create virtual
machines in and have these virtual machines be
connected to the Shared VPC. Now, this is really
important because it allows you to scale your
organization to thousands of developers or
development teams, while centrally
managing the firewall rules, the routing policies,
the VPN connections, and not having to
create these every time a new project is spun up.
So what's new in this space? Now, you can administer Shared VPC in folders. And so for example, if you
want to isolate your billing pipeline VPC from
your web app VPC, then you can do so by having
separate folders for these two projects and having separate
Shared VPC administrators who can setup the
respective Shared VPCs. One of the things that
we recommend doing here is to put all the service
projects for a given Shared VPC set-up inside the same
folder so that you can associate org level policies
on that folder at a later time. Another thing that
we recommend doing is to use subnetwork
level permissions so that you can specify
which service projects have access to create virtual
machines in which specific subnetworks. That gives you fine-grained access control.
Another feature that we have launched to GA over the past year is the ability to create private IP reservations. And so you can reserve
RFC 1918 IP addresses. And you can use these for
creating virtual machines or internal load balancers
in your service projects. Another feature that
we have launched to GA over the past
year is alias IP ranges. Now, alias IP ranges
give you the ability to create alias or secondary
IPs and associate these with network interfaces
of your virtual machines. You can allocate alias IPs
through primary or secondary ranges on your subnetwork.
Now if you use GKE to manage workloads running in containers on your GCE VMs, then alias
IP ranges just gives you the ability to centrally
manage all of your RFC 1918 IP addresses. So your port IPs,
your node IPs, cluster IPs are allocated from the
primary and secondary ranges of your subnetworks. It also gives native
support for routing, and so it scales better and
you can create more nodes in the cluster. If you use GKE to
create an Ingress load balancer with a network endpoint
group, then you can rest assured that the front
end of the load balancer would load balance directly
to the pods that are back ends of the load balancer
rather than having the front end of
the load balancer load balance to
virtual machines, which then have iptables rules to route the packet to the back end pods. So using network
endpoint groups, we get rid of this extra
hop in the request path. And so it basically optimizes load balancing performance for the application. It also gives native
support for health checking. And so we are able to health
check the pods directly rather than health check
the virtual machines. So as I mentioned
before, alias IPs can be allocated from
primary or secondary ranges in a subnetwork. If you're running
out of IP space, you can add more secondary
ranges on the subnetwork or you can expand the existing
primary or secondary ranges.
Now, one of the things that we recommend doing when you are
planning your IP space is to plan it in
such a way that you leave enough room for growth. Now if you're running an
application on your VM and you're using an alias
IP to run the application, then you may want to move
the IP from one VM to another if the pod that you're running
the application in dies or the application crashes. And you can do this by
deleting the alias IP on VM1, adding it to VM2, creating
a pod there, and starting your application in VM2. So this is basically
how you migrate an IP from one VM to another.
Now, the next aspect I will be talking about is security. As I mentioned before, we
provide network layer firewall rules. But the way we implement
these firewall rules is not through a
metal box or a proxy, and so there are no
single choke points. Firewall rules are programmed
on every single host machine that has virtual machines
running in your VPC and so they are truly
distributed in nature. The rules themselves
are flexible in allowing ingress, egress, allow or
deny rules using priorities. You can use tags to easily group
the virtual machines that you want to apply specific
firewall policies on. Now, when a developer adds a
tag to their virtual machine, there is no [? ACL ?] check
that happens by default. This is because if you
own a virtual machine, if you have access
to set tags on it, then you are able
to associate any tag with the virtual machine. Now, if the service provided
by your organization is security-sensitive,
internally or externally, then this may be problematic. And we solve this
problem using firewalls with service accounts. So we are happy to announce that
this is now available in GA. You can associate
a service account and you can create
that as a source or a target of a firewall. And you can rest assured that
when a developer associates a service account
with their VM, then there is an [? ACL ?]
check that happens. They need to have IAM permission
on using that service account.
Now, you configured your VPC topology, you carved out the IP space,
you created the subnetworks in different regions, and you
created the firewall rules to ensure that the traffic
within the VPC is secure. Now, the next thing
you would like to do is connect your VPC
running in the cloud with your on-prem data center. Now, there are multiple
ways of doing this. If bandwidth is not a major concern for you, and if you are OK with the overhead of IPsec encryption-- VPN basically uses the internet-- then you can use VPN. And there are two ways of
using VPN connections as well. You can either
create static routes to your VPN tunnel,
which basically allow you to statically
specify the prefixes that should be routed to
your on-prem or you can use Cloud Router,
which runs as a BGP daemon, to exchange routes with
your on-prem dynamically. Now one thing worth
mentioning about Cloud Router is that unlike what
the name might suggest, it's not a traditional router. And so packets that are
going to your on-prem or coming from your
on-prem to the VPC are not going through
the Cloud Router. Cloud Router is a
control plane component. And it basically is responsible
for programming the routes that it learns from the on-prem
onto the host machines that have virtual machines in your
VPC running on those host machines.
Now if bandwidth is a major concern for you, then you can use Interconnects. If you would like to
directly peer with Google, if you have a public ASN and
you want to basically connect with us at one of our POPs,
or Points Of Presence, then you can use
Dedicated Interconnect. If you are doing the same
thing using a carrier's infrastructure
and you are paying the carrier for the
service, then you would use Partner Interconnect. We're really excited that
both Dedicated and Partner Interconnect are now
generally available. There's a talk at 11:40
on hybrid connectivity that I would highly
recommend checking out. Now, a couple of
features that I would like to talk about that are new
in the dynamic routing space are global BGP routing and
custom route advertisements. Using global BGP
routing, you can announce prefixes that
belong to subnetworks in other regions using a
Cloud Router in one region. And we recommend doing this
for enabling redundancy in your connection
to your on-prem. And so you can have Cloud
Routers in two regions in your global VPC. And you can enable global
BGP routing on your VPC. And this would enable you
to have local routes as well as remote routes through
the other region. Now, Cloud Router will add
weight to the remote routes, so it will always
prefer the local routes. But if your local connection--
if your local VPN connection or Interconnect-- goes
down, then the remote route would be used. And so you can fall
back to using the routes in a different region. So this gives you
higher availability. You can use custom
route advertisements to choose prefixes that you want
to announce to your on-prem. And so you can choose
prefixes that you don't want to announce. You can use custom route
advertisements also to announce prefixes
that don't belong to subnetworks in your VPC. And these may be RFC 1918 IP ranges or non-RFC 1918 ranges as well.
Now, the next thing I'm going to talk about is using third party services. Now, Shared VPC was
designed with the intention of having private connectivity
between virtual machines in one organization. But if you want to
connect two VPCs in two different organizations-- the typical consumer
and producer scenario-- then you can use VPC peering. VPC peering is also not
implemented through a proxy and it is truly
distributed in nature. And so there is no
single point of failure with VPC peering either. When you peer two VPCs, you get full connectivity
to all the virtual machines in the peer VPC. And you also get access
to all the services that the peer VPC is exposing
through internal load balancers.
Now one of the things that I want to mention about both Shared
VPC as well as VPC peering is that in both cases, you get
high throughput and bandwidth. And you get the same
throughput and bandwidth as if you were
to create those two virtual machines in the same
network in the same project. And this is because that
aspect of network topology is pretty much transparent
to the SDN data plane. If you're a service
provider and you would like to have a
management interface to the virtual machines
that are serving traffic to your consumer,
then you can do so by creating a virtual
machine with multiple network interfaces. You can create up to
eight network interfaces on your virtual
machine and the traffic basically on those
two network interfaces will be completely isolated. So your data plane network
will be fully isolated from your management network.
Now, the last aspect I'll be talking about is that of monitoring. As you know, GCE is
integrated with Stackdriver. And so if you use features
like global load balancing or internal load
balancer, then you'll be able to view the logs for
these features in Stackdriver. We're really excited
to now announce the general availability
of VPC Flow Logs. VPC Flow Logs allow
you to view the VPC flows, which are aggregated
over five-second intervals by connection. And you can enable this feature
on a per-subnetwork basis.
On the console, you can choose the log type as VPC Flows and you can then view the
connections that are logged. Each log will show
you the [INAUDIBLE] of the connection
along with other metadata like source/dest VPC, source/dest subnetwork, and other
annotation information like geo annotations that we add on
top of that on the logs. You can export this to
BigQuery or Pub/Sub. And you can use this feature
for network forensics, for troubleshooting
your network, as well as for analyzing
the different traffic patterns in your network
across geographies, and to use that information
to better improve the topology of your
network to reduce costs. Now, you may have started small
in a single or a small set of regions, but the
configuration that you did initially by
creating the network, by creating the firewall rules,
by setting up VPC peering-- these things just
seamlessly work as you grow to more regions. With the VPC design, we make
sure that the cost and overhead of network administration
and management doesn't grow linearly with
the scale of your network footprint. We really think of
this model of VPCs as being the true
next generation of how VPCs are defined in the cloud. And we're really excited
to see how you use them. We've been working
closely with some of the top names in education,
banking, software, technology, mobile labs, you
know, you name it. And they've been giving
us some excellent feedback on how to improve our features. You know, it's really
been a fun ride so far. Now, three things that
I want you to take away from this presentation are
that VPCs are global in nature. They use the same underlying
network infrastructure as the rest of Google
that allows you to do amazing things at scale. They are highly performant. We take performance
very seriously. As I mentioned earlier, we have
been making a lot of progress in improving performance
of our network, but we're also looking for
new ways of doing this. We have a lot of
ongoing investment, and so you can expect to see
more performance improvements in the coming future. And lastly, that
VPCs are easy to use. As you can imagine, the network
is very powerful, but also very complex. And VPCs provide a simple
and intuitive interface to this complex infrastructure
that allows you extensibility, access control, and integration
with third party services. And with that, I
would like to invite Emanuele onstage to share a
few example VPC topologies. EMANUELE MAZZA: Thanks. NEHA PATTAN: Thanks. EMANUELE MAZZA: Thanks, Neha. [APPLAUSE] So we're going to run some
through example topologies that uses all the concept
and that Neha just explained. Before going over that, I
just want to quickly recap a couple of things. So if you look at it from
the routing point of view, a VPC is really
a routing domain. That's what you get
when you create a VPC. It does provide you
internal IP connectivity across all your subnetworks-- doesn't matter in which
region they exist-- and without the traffic actually
leaving the Google network. So all the traffic stays
inside the Google network. Neha mentioned that we
have a project concept. So the project is
a unit of billing. It's a unit of permissions. It's a unit of resources. And yes, you can have multiple
VPCs in the same project. If you do that, you get two
completely disjoint routing domains, which means,
like in the example here, you can have two VPCs with
overlapping IP subnets. So in this picture, you
can see that VPC1 and VPC2 as overlapping in us-west2. Of course, this is possible
because they are disjoint and they do not speak
across themself by design unless you want it to. And then the Cloud Router. So as Neha was saying, it's
the EBGP speaker process. Think about it like a route
processor in a physical router world. It does not do data plane. The whole goal in
life of this guy is to get routes from outside
and program them into the VPC, and gets route from the
VPC and send them outside. As you can see in
the picture, it can programs route
across all regions. So in this example, the Cloud
Router exists in us-west1, but its able also to advertise
subnetwork that exists in us-west2, which
is the 10.240.1/24. And they also are able to
advertise subnetworks that do not exist in your VPC-- in
this example, the 10.239.2/24. OK, now that these two
concepts are clear, let's start looking at some
potential topologies. So in this one, we're using VPC
peering in a producer consumer scenario. So we have on top, a provider
project in the blue org. And at the bottom, we've
got two consumer projects-- one in the green org
and one in the red org. So please know that these
three projects are in three different organizations. And now, what the provider
projects want to provide is common services that must
be used from both consumer project in the green
org and the consumer project in the red org. So the way I can do this, I
do this through VPC peering. So I do VPC peering
from the green org VPC to the leftmost VPC in
the provider project. And the red one is peering
with the rightmost one. Note that here I got
full overlapping IP across the consumer projects. Both consumer project are
using the same subnetworks. That's why I need two different
VPCs in the provider project. Tenant isolation in
the provider project is done by default
because those two VPCs are separated routing
domains, so there is no connectivity possible. You don't need to use
firewall rules for that. Shared VPC-- so I'm
going to show you a simple example
of how shared VPC can be used with a
multiple tier application. And the goal of this topology
is to basically centralize networking and security
configurations and operations in the host project. And by using shared VPC
and giving different users different permissions to
use different subnets-- and we'll see that-- and then I will be able
to have each team-- the team of the front
end service project and the back end
service project-- manage their own
compute by themselves. And this is, of course,
fully compatible with load balancers
and flow logs. So how does that work? So basically, first,
in the host project, I create a subnetwork--
an orange subnetwork. And it's connected to
the shared VPC router. And I give permission
to the orange user to use the subnetwork. Subnetwork is an object,
so you can give permission to this object. Now what happens
is this orange user that exists in the service
project logs into the service project, creates instances, and he sees that subnetwork as available to connect instances onto, even though that subnetwork exists in the host project. From here on, from the
networking point of view, it's as if those two instances are connected to the host project. From the compute
point of view, they exist in the service project. So billing for the instances
is on the service project. Now from there, the orange user
creates a global load balancer, protecting the front end with
Cloud Armor, for example, and [INAUDIBLE] traffic
from the internet. Now, I'm also creating another
subnetwork in the host project. And this time, I'm
giving the blue user permission to use this subnetwork. The blue user is a user that
exists in the back end service project. So what this user does is
logs into his service project and creates an ILB and instances. And as the orange
user previously, he sees the subnetwork
in the host project as available to connect
his instances to and available to create the VIP
of an internal load balancer. And from here, I got
full IP connectivity between front end
and back end, exactly as if those two sets of instances were connected to the same VPC in the host project. Same idea, but on a
different concept. So I want to use a network
virtual appliance to provide security and networking
services to multiple projects at the same time. So what I can do is create
a network virtual appliance which is a multinic appliance. And as Neha was saying,
a multinic instance-- an appliance, in this case-- each interface needs to be connected to a different VPC. So here in this example,
I got those two VPCs. Let's say they are
two internal VPCs. And I'm using the same
concept of shared VPCs, so I'm sharing [INAUDIBLE]
with the orange user and with the blue users. And they create their instances
and the internal load balancer exactly like before. And then the network
virtual appliance also has a northbound-- well, I should
say eastbound in this picture-- external interface that is
connected to the external VPC-- so the one on the right-- that does BGP with the Cloud
Router with your on-prem. So in this example,
all the instances-- VM1, VM2, VM3, and VM4-- do get connectivity with on-prem
and do leverage networking and security services provided
centrally by the network virtual appliance
in the host project. The next one is a more
complex one, I would say, but not that much-- Shared VPC with
shared Interconnect. So what's the goal? The goal is that I want to
leverage the same Dedicated Interconnect across
multiple projects. Why? Because Dedicated Interconnect
is a precious resource. It provides [INAUDIBLE]
connectivity and it has a cost. So I want to use it
as much as possible. And I want to keep the
operational and management advantages of the Shared VPC
model that I showed before. And also I want to allocate
cost across projects in a meaningful way. So how can we meet these goals? So first, we create one project where the Dedicated Interconnect is created. And then you have one or
more Shared VPC host projects, each one with its
own Cloud Router and each one with its own
Interconnect attachment. As we will see, those
Interconnect attachments will use the Dedicated Interconnect that lives in a different project. Because again, everything
in GCP, it's an object. So also the Dedicated
Interconnect is an object and has permissions. How do you do security? The best practice
Neha recommended is to use firewall rules
based on service account. And VPC Flow Logs will give you
complete visibility of traffic, including the service
VPC and the host VPC. So visually, what does that look like? So if you look here first, so
this is the physical world. So these are [INAUDIBLE] sites
and you have your router here. And this is our
peering fabric device. So this is a site where we
provide dedicated interconnect. And this DI is the
Dedicated Interconnect. Physically it's a port
channel, so that's where the connection is built. Now if you forget
about the physical and think about the
cloud as an object model, the Dedicated Interconnect is an
object that lives in a project. So you go into a
project and you say, I want a Dedicated Interconnect. Now, we've got two Shared VPCs in this example. And the structure is
the same as the one that I was basically
showing you before. And what is to be noticed here is that each of those two Shared VPCs-- the host projects-- has a VLAN attachment
and a Cloud Router. So note that the VLAN attachment is an object that lives in the host project, but it is using the Dedicated Interconnect that lives in the Dedicated Interconnect project. So you can have a single
project with the Dedicated Interconnect. And you can use it to
basically share the bandwidth across multiple, in this
case, host projects, of a shared VPC configuration. Of course, if you want, you
can have full overlapping IP support because, from
our side, the tenancy's provided by the fact that
we have two different VPCs. And on your side, the customer side, of course, you have to terminate [INAUDIBLE] that is overlapping. You have to terminate them into different VRFs, for example, and then bring the tenancy inside your network with MPLS VPN or whatever you are used to. So we basically met
all these goals. And a couple of considerations--
so how about the billing? So the Dedicated
Interconnect gets billed against that project,
the project where it exists. The VLAN attachment gets billed
against the host project. The Egress traffic
from the instances gets billed on the
service project where those instances live,
not on the host project. So if you have multiple
teams, different teams that manage the service
project, you can know how much traffic and Egress
cost they are basically using. There's a common DNS space
spanning each Shared VPC domain. This is a native
capability of Shared VPC. And VPC Flow Logs, I
was saying, contains information, both about the
service project and the host projects. So if you look at the flow
logs, in the source instance and destination
instance field, you will see the service project
where the instances actually live. And if you look in source
VPC and destination VPC, you will see the
host project to which the instances are attached. So I have two more. One is a classical high
throughput VPN connection. So in this example, you have
two VPN gateways in your VPC. In this example, they are
in two different regions. And you have multiple VPN
tunnels for each VPN gateway. Since we can do equal cost
multipath per flow, of course, from GCP to on-prem, this
basically gives you high throughput connectivity
between your on-prem-- sorry, your GCP project
and the on-prem. There are multiple ways
you can set this up. The suggestion is, for sure,
do route-based to start with because it's more flexible. But the thing I always
suggest to customers is to start, if possible,
with the Cloud Router and BGP. Why? Because the moment you switch
to Dedicated Interconnect, the routing will be
exactly the same. So you can test
and get confident of your routing
setup, VPN, and then move to the dedicated
interconnect, and you don't change a thing. You can actually
use VPN as a backup of Dedicated Interconnect. And this is what I'm showing
you in the last slide. So what I'm leveraging here
is the global routing feature of the Cloud Router. So let's say that I want
to give reachability to us-east subnets. I send two advertisements
from the us-east Cloud Router on the green
region with MED 120, so MED is a BGP attribute,
basically, that you can set on the Cloud Router. And also, the Cloud
Router in us-west knows about the us-east1 subnet. But we automatically
add an inter-region cost to that route, which is 201 here, which means that when the Cloud Router in the orange region advertises subnetworks, they automatically get an additional MED, higher than 120, meaning 120 plus 201. And that 201 is
just an example. It's dynamically
calculated by the platform. And this way, you're sure
that both R1 and R2 use the Interconnect by default. And when those Interconnects are down,
you use the VPN as a backup.
The same thing you can do on the reverse side. And on the reverse side,
it's actually easier because even if you advertise
the same MED from on-prem to your Cloud Routers,
we automatically add that inter-region cost
when the Cloud Routers that live in the orange region
program the routes on the VPC. We automatically do
that because the subnets are living in a different
region, in the green one. So that gives you automatic
load balancing and usage of the Dedicated Interconnect
as primary, leaving VPN only as a backup. And with that, we
go to Kamal, who will tell you how PayPal is
using VPC in their production network. Thank you. [APPLAUSE] KAMAL CONGEVARAM MURALIDHARAN:
Good afternoon, everyone. My name is Kamal. I'm part of the core
platform and infrastructure team at PayPal. So at PayPal, we started
our public cloud journey a couple of years back. So when we started
looking at how PayPal can expand into
public cloud-- and we have one of the
largest private data centers-- we started looking
at what is needed and what we need to do
to get into public cloud. We soon realized two
important things. One is that it is not a
one-year or a two-year journey where we can
just jump into a public cloud. It is a multi-year journey
because it involves a culture change, mindset
change, toolset change, and even some application
rewrite as we move into that. The second thing we
realized was that none of the public clouds had all the features for supporting enterprises to run in a secure and scalable way. So it means that we needed a
in this multi-year journey to get to the state where
we can run a big enterprise on public cloud. So with that, one of the key
things we started looking at is that how do you build building
blocks in the public cloud so that we can start figuring
out how to move things in a phased manner because
we cannot move everything. So one of the key pieces
which we started looking at is, how do you build
this VPC in public cloud? So with that, I'll go
to one of the use cases which we started with and how
we went through the process of moving into public cloud. So the first use case we
took was a dev use case, which is basically-- PayPal has thousands
of developers, and each of those
developers needs their space to go and do creative things,
like creating their VMs, deploying their services,
running applications, or do things, which
is something we want to provide the space for them. So when you look at
that, what they need is autonomy in doing that. I would say that DevOps is
a very widely used term, but the meaning of DevOps
varies from company to company, person to person. But what we really think
is that give the autonomy to the developer so that they
can do what they need to do, and we don't step
into their thing. At the same time, there
are a lot of other things which developers don't
want to take care of. For example, when we
started on Google Cloud, there was nothing
called a Shared VPC. So the model which existed
was, you create a project. And inside the
project, there will be a network, which means
that every project has its own network, which means
that every developer-- if I have to give a
project, then we also have to create a network
inside the project, which means that we need to
allocate an IP range. So we are talking about
hundreds of projects-- even more than that, thousands
of projects. And it's a big nightmare
for us to manage IP ranges for each of these projects. The second thing is that-- another main principle we
talk about for public cloud is, everything
should be automated, which means that when
we set up something, it's not just the automation
to create the initial project, but any maintenance after
that should be automated. So if we are creating a
project for a developer and if the network is within
that-- so any interconnectivity configuration or a
gateway configuration all exist within the network. So if you have to
make a change, it's not making change
in one project. We are talking about
going and making changes in thousands of projects. So it became pretty difficult
for us to scale in that model. And that's where
we worked with Google and partnered up and said, hey, these are the problems. And that's where we
started looking at-- Google introduced this
concept of shared VPC. So I think Neha and
Emanuele explained all the topologies and
everything, so I'm not putting any diagram here. But I'm going to explain
what problems we faced when we started initially and
how we solved them in the shared VPC model. So as I said, in the
current industry, there are two
schools of thought. One is a complete
DevOps model, where you give complete control to
the developer from start to end. That means that you
create an account and give everything
to the developer, and developer manages from
network to compute or deploying applications and everything. And there's another extreme
thought, which is basically completely anti-DevOps,
where everything is done by somebody else,
and developer gives support. So at PayPal, what we did is
that we took a hybrid approach. We really figured out what each
team should be responsible for and where we should
draw the line and say that, these
are all your developer responsibilities,
like compute, storage, your code, and everything. You manage it. But anything below, we want
to really standardize it. And especially in
a big enterprise, when you have thousands of
developers and thousands of projects,
standardization becomes very critical in terms of
security and also auditing. If you want to
change something, you need to know where to change it. If it is in many places,
it becomes very tricky. So if you look at it,
as I said, IP management was one of the problems. Allocating IP ranges,
assigning them, and when the project is deleted,
re-assigning it somewhere-- it was becoming a challenge. With Shared VPC, the
model we are after was that we have many
security zones in PayPal. Each security zone
is carved out based on what data can be
stored, what data can be accessed, what
services can exist, how the services can interact. There's many criterias based
on which we carve out zones. For example, there may be a
zone where the data resides, which means that nobody
can enter into the zone except through a single
channel, and that would be protected
with firewalls, data loss, everything. So we have this concept already
built into our private cloud. We want to bring a similar
concept into public cloud. That's where this model
of VPC became very useful, where we said that
every VPC we create is going to be a
security zone, which means that, within that
security zone, it's open, that if there are two
VMs, they can communicate. But any query to come
into that security zone-- it has to go through
a firewall or it'll have a VPC peering with more
security and scrutinization. The second thing is that, as I
said, once we have Shared VPC, the management becomes much
simpler because we just need to manage the connectivity
at the Shared VPC level. Every service project
we just created doesn't need to worry about it. So as far as the
developers are concerned, they just keep creating
VMs, storage, and everything. We started with a VPN-based approach, and now we are moving to Dedicated Interconnect, which means that there's no impact
on the developers because everything is centrally
managed at the Shared VPC level. So this is a functional
representation of how we divided
the functionalities and felt that this is the
optimal way so that we give enough freedom for
developers to play around with public cloud, do
things what they need. But at the same
time, we don't need to overburden them with
responsibilities which they don't need to take care of. So if you look at it, we
have an Enterprise Cloud team which comes with best
practices of how many VPCs have to be created, how the VPCs have to be interconnected, like peered or [? peered, ?]
how they have to connect back to the on-prem, where should
be the routing configured. All these are
centralized and automated through one single mechanism
so we manage everything down. And at the same time,
we have automation to create projects which
sets a project up, puts the right developers there,
and grants permission to this network. That's it. Once they get the project,
within the project, they have the autonomy
to do anything. So in this mechanism,
we were able to scale it to thousands of
projects, and we are able to manage it without any
impact on the future changes because we all know that
things evolve. The cloud is going to evolve. That means we should also move
along with that. So we built the foundation in such a way that we can make more changes
in the future with less impact. Also, there's another
use case, which is a test zone, which is
basically running PayPal.com inside PayPal, which means that
it is a complete replication of PayPal inside PayPal. That means that it
is used for our test environments, the integration
testings, and everything. Now, when we came up with the model,
which I showed previously, our goal is to see
that the same model can be applied for any use cases
which comes later, like test. Tomorrow it could be production
or later, something else. So in this case, if you
look at the use case, it's at a lesser scale. In terms of number
of projects, we are talking about only
a few tens of projects. But it's huge in resources. It's like thousands
of VMs, thousands of gigabytes of
storage and everything because it's complete
PayPal inside. But if you go into the
model, it's the same. We didn't change anything. We created a VPC, which
was already automated, which has all the features. We created the project
in the same way. We just need to assign different
teams to these projects. And within that, they were able
to do everything they wanted. So the model is scalable to any
use case you bring to it. And we are in the process
of, next year, trying to get production
use cases into public cloud. So we don't need to change much. The same automation
which we created is going to help
us do this again. As I said, it's a
multi-year journey. We all realize that not
everything is available today, and we have to get this going. So we have seen
that VPC is common, but, again, not every service is routed through the VPC. For example, we wanted
to create databases with private IPs assigned from the VPC range; when that becomes available, we'll start adopting it. IP Alias is a great
feature, which was released. For us, we have a container
networking technology where we assign IPs to containers. So for that, IP Alias
is a great thing. So we are going
to start using it. And similarly, as this
technology evolves, our tools also evolve. For example, we have events which need to be consumed. Now, since we have this host and service project isolation, we need to figure out how to collect events from different service projects from a single place because we don't want to deploy
something on every service project. So we are also evolving as
that technology evolves. And as Google matures more with this, we will also be moving
along with that. Thank you. [APPLAUSE] [MUSIC PLAYING]