[MUSIC PLAYING] BABI SEAL: So today
we are going to do a deep dive into our Cloud
load balancing family. On the global side, we have
our HTTP, HTTPS load balancers, and our proxies for SSL and TCP. And on the regional, we have
our external facing network load balancer and our internal
layer 4 load balancing. By regional, I mean
that the back end instances are constrained to
a specific region, and global-- we can
have back end instances across multiple regions. It's important to note that
Google Cloud load balancing is built on the same
underlying infrastructure that Google uses to deliver its
services to billions of users daily. It also underpins
many of the services that you use on Google Cloud-- be it storage, App Engine,
Kubernetes, managed, or native. So now by a show of
hands, how many of you consider yourselves
Cloud native? Yay! Cloud native in the house. OK. And how many of you consider
yourselves Enterprise folks moving to the Cloud? Awesome. I have good news. Today, we are going to show
you how Google Cloud load balancing can help
you build applications that, one, scale, make the
most use of your resources, are secure, and optimized
for latency and cost. We're going to do
this in four parts. We're going to do a deep
dive into our load balancers. We're going to secure your edge. We're going to optimize
for latency and cost, and Oz is going to pull it all
together with best practices. So let's get started. You know, our
first load balancer was our network load balancer. Three years ago, we
published a paper on Maglev, which is actually our in-house developed software load balancer. It runs on standard Google hardware, with the code optimized for throughput and latency. And this has been in
production since 2008, and it literally load
balances all traffic coming into Google's data centers. It also distributes traffic to
our front end engines located in our points of
presence, and now we make them available to
you as our network load balancer for Google Cloud. So a network load balancer
essentially load balances layer 4 TCP or UDP traffic via
hashing to regional back end instances. The Maglevs tunnel
the packet directly to the back end instance, so
the source IP is preserved. The back end instances return
the packet to the client directly. It's direct server
return, so it does not go through the Maglevs. In this slide, you see three regional
load balancers-- test, my app, and travel. What's interesting is that
these regional load balancers-- the virtual IP addresses-- they
come from a regional block, but when you're using
the premium tier, we advertise them globally. So if Shen in Singapore
wanted to access the load balancer located in
the US West, myapp.com, their traffic would ingress
at the point of presence in Singapore, go through our Google
global network, and be load balanced at US West. So even though it's a
regional load balancer, you have global access with
network load balancing. So what's under the hood? Remember I was
talking about Maglevs? So compare and contrast. When you have traditional
load balancers, they're typically deployed
in an active standby fashion, and they typically need
to be pre-warmed to deal with spikes in traffic. That's the one on the left. And on the right, you
have your Maglevs. And these Maglevs,
they're actually deployed in Active/Active
scale-out fashion. The way it works is
that the Maglevs-- they advertise the virtual
IP address of your load balancers-- all of them
to the peering router. The peering router then
equal-cost multi path distributes the
flows to the Maglevs. What do you want out
of a load balancer? You want to evenly
distribute your workloads, and you want a stable
connection from your client to your back end instance. And the Maglevs do this
primarily in two ways. One, it does connection tracking
of existing connections. So if other back end
instances come up or go down, your current connection stays tracked and stable. Second, it does something
called consistent hashing, which means that regardless of
the set of Maglevs changing, the back end is always
selected consistently. So that's essentially Maglev under the hood.
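To make those two ideas concrete, here is a minimal Python sketch of hashing a connection's 5-tuple onto a consistent-hash ring. This is just the general technique, not Google's actual Maglev hashing algorithm, and the back end addresses are made up.

```python
import hashlib
from bisect import bisect

def five_tuple_key(src_ip, src_port, dst_ip, dst_port, proto):
    # The flow identity used for hashing (2-, 3-, or 5-tuple are all options).
    return f"{src_ip}:{src_port}-{dst_ip}:{dst_port}-{proto}"

class ConsistentHashRing:
    """Toy consistent-hash ring: adding or removing one back end
    only remaps a small fraction of flows."""

    def __init__(self, backends, vnodes=100):
        self.ring = sorted(
            (self._hash(f"{b}#{i}"), b)
            for b in backends
            for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def pick_backend(self, flow_key):
        idx = bisect(self.keys, self._hash(flow_key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["10.0.0.2", "10.0.0.3", "10.0.0.4"])
key = five_tuple_key("203.0.113.7", 40312, "35.1.2.3", 443, "TCP")
print(ring.pick_backend(key))  # Same back end every time for this flow.
```

In the real system, connection tracking additionally pins flows that are already established, so existing connections stay on their back end even while the set of Maglevs or back ends changes.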
So just to recap, what does a network load balancer get you? It's a regional load balancer. It's a layer 4 load balancer. You don't get layer 7. You can do 2, 3,
or 5-tuple hashing, and it's very performant. You get almost a million
queries per second. But what does it not provide? It does not provide you global
IPv4 and IPv6 load balancing. You cannot route on the
basis of layer 7 headers, and you cannot do SSL proxy. So Google needed to come up
with a global load balancer. Now just to recap, think of
traditional load balancers and public clouds. They are regional. They have a regional VIP
address and regional set of back end instances. So let's say you have a
service that is distributed across three different regions. You have three different VIPs. Now if you wanted to globally
load balance across those three regions, you need
a DNS load balancer that would map the client
request to one of these VIPs, but there are several
challenges with this approach. Imagine an instance in one
of the regions going away. The load balancing and
the DNS infrastructure has to know of that change. Let's say your client
caches that IP address, and that could result in
a suboptimal selection. Last, your capacity is siloed. Resources in one region cannot
be used in another region. Now that would simply
just not work for Google, so when we had to come
up with a load balancer we did not use the
DNS-based approach. What we did was we went and
pushed the load balancing to the edge of our
global network, and we came up with a single
global anycast virtual IP address that front ends
worldwide capacity. If you run into availability
or capacity constraints, you can do cross region
failover and fallback. It's very nicely tied to our
auto scaling infrastructure, so when demand rises,
we can scale it up to meet our requirements. And when demand ebbs,
we can scale it down. It's also really nice. It gives you the
single point where you can apply global policies,
and it's very, very performant because there's no single point instance that's doing the load balancing. And you can easily support a
million queries per second. So how do our customers
typically deploy this, right? You would typically
actually start out deploying your services
in a single region, but what you would do is you'd
configure a global forwarding rule with a global anycast
virtual IP address. And you'd make your customer,
Maya in California, very happy. But let's say your service
grows in popularity, and now you have a
customer, Bob in New York, that also wishes to
access the service. Similarly, you would deploy
back end instances in US East, but you don't have to
reconfigure the VIP. You don't have to
modify any DNS entries. You're a global success. You have customers in Asia
that are now using back end instances in Singapore. Unfortunately, something suddenly happened-- you had instances
in Singapore go down. At the same time,
demand in US West spiked, and you need to
address those events. What will happen is, without
requiring any intervention, your user requests
will seamlessly be directed to the region with
the most available capacity. Now when the US West
scales up, the traffic will again fall back to US West. Again, this is
happening all seamlessly without any intervention. Now what is it under the hood? This, too, is
software-based load balancing. The load balancing happens
at these things called Google Front End engines, or GFEs. These GFEs are located at
the edges of our network. Now the edge itself is tiered. Some of the load balancing
happens in the tier one edge. Some of the load balancing
happens in the tier two. The tier one edge is where our
points of presence are located. The GFEs effectively
proxy the client traffic. And they load balance
by working in concert with each other-- working with our software
defined networking plane and also our global
control and config plane. How do we distribute traffic? Now, this slide is really dense,
but I have one key takeaway, which is as long as you have
capacity in the region closest to your users, we will
send the traffic there. But should you run into capacity
or availability constraints, we will then direct
the traffic spill-over to the next closest region. We call this our waterfall
model for load balancing. A region has multiple zones. We will populate each
zone proportional to the capacity that you
have allocated to each zone, and in each zone we will load balance evenly among the instances in a round robin fashion.
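As a rough illustration of that waterfall behavior (a simplified sketch with made-up capacities, not the actual GFE algorithm), region spill-over and proportional zone weighting might look like this:

```python
def pick_region(regions_by_proximity, capacity, demand):
    """Waterfall model sketch: send traffic to the closest region that
    still has capacity; spill the remainder to the next closest one."""
    remaining = demand
    assignment = {}
    for region in regions_by_proximity:          # ordered closest-first
        take = min(remaining, capacity.get(region, 0))
        if take > 0:
            assignment[region] = take
            remaining -= take
        if remaining == 0:
            break
    return assignment

def split_across_zones(region_rps, zone_capacity):
    """Within a region, populate each zone proportionally to its capacity."""
    total = sum(zone_capacity.values())
    return {z: region_rps * c / total for z, c in zone_capacity.items()}

# Example: 1,200 RPS of demand from users closest to us-west1.
regions = ["us-west1", "us-east1", "asia-southeast1"]
capacity = {"us-west1": 1000, "us-east1": 800, "asia-southeast1": 500}
per_region = pick_region(regions, capacity, demand=1200)
print(per_region)                       # {'us-west1': 1000, 'us-east1': 200}
print(split_across_zones(per_region["us-west1"],
                         {"us-west1-a": 60, "us-west1-b": 40}))
# Within a zone, requests are then spread round-robin across instances.
```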
Now, that's quite a lot, and I don't expect you to remember it all. Oz is going to show
this to you live, but before we go to
the demo, I think it's important
that we cover what the data model for a
configuration looks like. In a load balancing, you have
something called the front end and you have something
called the back end. In the front end, you have
that global anycast VIP. That VIP is associated
with the forwarding rule. That forwarding rule
points to a target proxy. Now this target proxy could
be HTTP, HTTPS, TCP, or SSL. The target proxy is also the
location where you have URL maps, and a URL map essentially maps a client URL, based on host and path rules,
to a specific back end service. Now this is where the
back end portion begins. A back end service is
made up of back ends. Back ends could be managed
instance groups or network endpoint groups. The back end service
is also the location where we associate
health configuration and serving capacity.
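Before the demo, here is a rough sketch of how those resources chain together, written as a plain Python dictionary. The names and fields are illustrative only (this is not an actual API payload), but the chain-- forwarding rule, target proxy, URL map, back end service, back ends-- matches the data model just described.

```python
# Illustrative only: hypothetical names, not an actual GCP API payload.
load_balancer = {
    "frontend": {
        "forwarding_rule": {
            "vip": "203.0.113.10",          # global anycast VIP
            "port": 443,
            "target_proxy": "my-https-proxy",
        },
        "target_proxy": {
            "type": "HTTPS",                 # HTTP, HTTPS, TCP, or SSL
            "url_map": "my-url-map",
        },
        "url_map": {
            # host and path rules select a back end service
            "default_service": "web-backend-service",
            "path_rules": {"/secure/*": "secure-backend-service"},
        },
    },
    "backend": {
        "web-backend-service": {
            "health_check": "https-health-check",
            "backends": [                    # managed instance groups or NEGs
                {"group": "mig-us-west1", "max_rate_per_instance": 100},
                {"group": "mig-us-east1", "max_rate_per_instance": 100},
            ],
        },
    },
}
```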
Oz is going to show all of this to you right now, so let me just hand it off to Oz to show it to you live. OSVALDO COSTA: Thanks, Babi. So can we switch to
the demo, please? All right. So here, what we
have on the screen-- there are three load balancers
that I have configured-- three HTTP, HTTPS load balancers. The first two we're not going to cover in this demo; it will be the next one, so let's drill down and take a look
at the third one. So the third load balancer,
what I have configured here-- and let's see now how that data
model that Babi explained maps to the configuration. So the first part
is the front end. So as you can see
here on the screen, I have exposed both VIPs-- one v4 and another one v6. And remember that
these are global VIPs, so irrespective of
your client location, it's going to target the
same VIP, either v4 or v6. And they're both exposed
using HTTPS, which is part of our best practices-- I mean TLS everywhere. So that's the first
session that is established between the client and the VIP. Then I have also added
here-- have configured-- a feature that is available
as beta, which is the Google managed certs. So we are going to take
care of that for you and do the renewal and
revocation of the certs, and we'll see that in detail
also with the SSL policy. And the last config of the
front end is the network tier. What we have here is the
premium tier that Babi touched on, which means a cold-potato type of strategy, where the packet
stays within our network as much as possible. So it's not handed
off to the internet, so it traverses
over our backbone. Now the second section is
the host and path rules. What I have defined are
two simple rules here. So the first one-- or rather the last one-- is based on the URL: everything that is being sent to /secure/ is going to be sent
to one back end. And everything else,
the default path then is going to be sent
to a separate back end. And then last here is how do
we map those hosts and path rules through the back ends? So I have two back
ends here in this case. The first one, which is going to
handle all the default traffic, I have configured as
the endpoint protocol HTTPS, which again is
also our best practice. So make sure that-- we recommend having
TLS everywhere. So the first session,
just to recap, between the client and
the VIP uses HTTPS. The second one for this back
end also relies on HTTPS. I have also a named
port, which is a port that you can define
for the communication between the proxy
layer and the back ends. It doesn't have to be the
same one as the front end, and we're going to see that
configuration as well when we take a look at the
managed instance group. The next one is a
timeout, which is also for the connection between
proxy and the back end, so here you want to take
a look at your application requirements. So if you have any requirements
of using long-lived sessions, you may want to relax
the timer a little bit in order to not kill your
sessions before you want. Cloud CDN is also here. You can enable cloud CDN
on the back end level, and it's a very
simple configuration. It's a checkbox to
enable Cloud CDN. And just a note here--
that Cloud CDN can use a VM as a back end
service but also a bucket, so you could also
do CDN on a bucket. And that traffic is going-- that cachable content is
going to be cached also at the POP layer-- so right at the edge. Also another
configuration that we have there is the health
check, which is what is the mechanism
that is going to be used to check the
liveness of that instance. In this case, I'm
using, here, HTTPS. But you can also configure
using HTTP, using HTTP/2, configure the timers and do
some other customizations like setting, for example,
what type of response you would expect from the application to consider it healthy.
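As a small, assumed example of what the back end side of that looks like, a health endpoint might simply return the status code and body the check is configured to expect (the /healthz path and "ok" body here are hypothetical choices):

```python
# Minimal sketch of a back end health endpoint (hypothetical path /healthz),
# assuming the load balancer health check is configured to expect an HTTP 200.
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            body = b"ok"                      # the response string the check expects
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```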
Now we also have, as part of the back end, session affinity. So I have a cookie
here, but it can also be configured as client IP. So client IP would mean that all
the sessions from a certain IP will be sent to that back
end to maintain the affinity. Cookie, on the other hand, will
allow the HTTP load balancer to send a cookie
back to the client. So if you have multiple
sessions from the same client, it would allow a
better distribution across our back ends. I have also, here, the
connection draining timeout, which is the timer
that you configure-- in this case, the default-- for the sessions to persist or
just to live in that instance when an instance is
removed from the back end. So we want to configure
that as well to make sure that, when you remove the
session from the group-- sorry, when you remove an
instance from the group-- those sessions are not
killed automatically. And last but not least here,
I'm using also a new feature I have configured-- which
is available in beta-- which is user defined headers. So let me magnify here. So I have configured these
custom request headers that can be added on the HTTP
layer, on the proxy layer, and sent back to your back end. In this case, I'm adding here
those options, client region, city, rtt. So the back ends can take
a look at that information on a per-session basis and
make smarter decisions, but it's better if
I show you how that looks in the web page. So I have configured-- this is
the landing page for that VIP, and here we can see some of the
information that I mentioned. So the first one
is the client IP, which is the public IP that
my client is using-- probably a NATed IP in this case. The second one is the load
balancer IP, which is the VIP. That's the same one that we
saw in the configuration. The third one is
the proxy IP, which is for the second session-- what's going to be the source
IP of that second session. The server information. And here we can see
the user defined headers that are inserted. So my connection is
coming from San Francisco, so I would be a little
bit alarmed if it was coming from a different place. Here's my RTT-- in this
case, 3 milliseconds. The lat, long, the TLS
version, and ciphers. So this is custom. I mean you don't need to
add all the variables, but if you want to add some
of them, that's how you do it.
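On the back end side, consuming those user-defined headers is just reading ordinary request headers. The sketch below uses Flask and assumes header names like X-Client-Region and X-Client-RTT-Ms, which would have to match whatever names you configured on the back end service:

```python
# Illustrative sketch: the header names below are assumed; they must match
# whatever custom request headers you configured on the back end service.
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def index():
    client_region = request.headers.get("X-Client-Region", "unknown")
    client_city = request.headers.get("X-Client-City", "unknown")
    rtt_ms = request.headers.get("X-Client-RTT-Ms")

    # Example of a per-session decision based on the injected metadata:
    # serve a lighter payload to clients with a high round-trip time.
    lightweight = rtt_ms is not None and float(rtt_ms) > 100
    return {
        "region": client_region,
        "city": client_city,
        "rtt_ms": rtt_ms,
        "lightweight_response": lightweight,
    }

if __name__ == "__main__":
    app.run(port=8080)
```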
Now going back here, I just want to show, also, the second back end
that I have configured. We are going to use it for
the network security demo, but the only thing that
I want to highlight is I'm using a
different protocol here. So for this back
end, I'm using HTTP/2 instead of HTTPS, which is
also another feature that is released in beta. So for the communication
between the proxy layer and the back end,
you can actually leverage HTTP/2 for better
latency of your application. Let's take a look at
the instance group-- at one of those. Actually before we do that,
just one thing that I wanted to explain to make
sure it's clear is when you configure
those back ends, you add those instance groups. That could be managed or
unmanaged, as Babi explained. On those instance groups,
you have the option to set the capacity-- so what we
can see here on the right side. So if we go back there. So what we can see here is the
max RPS that I have configured. In this case, the way it works
is I configure, on a managed instance group
basis, the amount of requests that each VM can handle-- in this case, 100 RPS,
Requests Per Second. And then the max
capacity that that VM can handle in this case-- 100%. Now if I combine
those two, it means each VM can handle 100
RPS, and if I set the auto scaling target at 80%, it means that at 80% of that capacity, 80 RPS, it will auto scale up and start a new instance. Also note that in
the second instance group that I have defined
here, auto scaling is off, and I have set that
to a capacity of 10%. So this is a great use case
to use for canary deployments. So if you have a new
web server that you want to deploy and just
send just a small fraction of your traffic, you can do that
and have a new instance group handling just part of that traffic.
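The arithmetic behind that is simple; here is a back-of-the-envelope sketch using the demo's numbers (it is not the actual autoscaler or load balancer logic):

```python
# Assumed numbers from the demo; a back-of-the-envelope sketch only.
max_rate_per_instance = 100      # RPS each VM is rated for
capacity_scaler = 1.0            # 100% for the primary group
autoscaler_target = 0.80         # scale out at 80% utilization

scale_out_threshold = max_rate_per_instance * capacity_scaler * autoscaler_target
print(scale_out_threshold)       # 80.0 RPS per VM triggers a new instance

# Canary group: capacity scaler at 10% with autoscaling off means the
# load balancer only offers it roughly 10% of the per-VM rate, which is
# handy for testing a new web server version on a small traffic slice.
canary_scaler = 0.10
canary_share_per_instance = max_rate_per_instance * canary_scaler
print(canary_share_per_instance)  # ~10 RPS per canary VM
```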
And you're going to see that in action. But just before we do that,
I just wanted to show you, very quickly, the instance
group configuration. So the first part is the
named port that I mentioned. In this case, I'm using
443 encrypted traffic, but you could also use a
different port if you wanted. Then the next one is
the instance template, which is the template being used here to create that VM, so you
can customize that and load with your own software
and the required software that you need. Auto scaling is on,
as we mentioned. And I just wanted to highlight
the cooldown period, which is the time that your application needs to
be ready to process traffic. So in this case, I
changed to 30 seconds because it's a
simple web server, but you may want to customize
according to your application requirements. And the last one
is auto healing, so you can enable auto
healing on the group here, and what it's going to do is health check that
instance based on the criteria that you set. And you get also
another timeout. Now just note that,
with the auto healing, if for some reason that
instance is marked as unhealthy, it's going to be
destroyed and recreated. So it will take more time
as compared to just pulling an instance from the group. Now I'm also sending
traffic here, so let's take just a quick look
at how the monitoring works in this case. So I have three different
load generators-- one in Asia, another one in North America,
another one in Europe. They are sending
traffic here, and you can see that for the
European one the traffic-- the target is the
back end that is used in Europe because it's close. And since I'm
sending a little bit below 200 RPS,
which is the max, I can see here that it's
being alerted that usage is approaching capacity. Now another behavior
that we can see here also is traffic being spilled
over from the Asian and North American load generators, being
sent to my new instance group-- just a small fraction
of that traffic just like I wanted,
because I just wanted 10% of the
traffic being sent there. And also another thing that
I wanted to show you is I have created-- there's a new dashboard
here, showing also the protocol distribution. So this is based on
a new feature that is available in alpha,
which is the logging and monitoring of the
HTTP load balancer, so we are going to
talk a little bit more about what is available. But we can see, here, the number
of requests being generated by my load generators-- the
same ones that you saw-- and how is the
protocol distribution? So each one of
those is sending-- is using a different
HTTP version. Now back to Babi. BABI SEAL: Can you go back
to the slides, please? So some of you have
internal services that you run on Google Cloud. You want to scale and grow
them behind a virtual IP address that's only accessible
from your internal instances. And for these use cases, we
have the internal load balancer. So for the layer 4
internal load balancer, you're effectively load
balancing behind an RFC 1918 private virtual IP address. You are load balancing TCP and
UDP, similar to a network load balancer, and doing the
2, 3, or 5-tuple hashing. Your client IP is preserved,
and you can have TCP, HTTP, or HTTPS health checks. Now the key takeaway is this. There is really no middle proxy. Effectively, there's
really no load balancer. The way it works is we have
our underlying software defined networking layer, Andromeda. That takes care of doing
the connection tracking-- the consistent hashing. And what it does
is it sends traffic directly from the client
VM to the back end. That's how the client
IP is preserved. But what's even more
important to note is that this is a very
scale out architecture. Because there is
no choke point, you can have a very performant
load balancer. If you look, the data model
seems vaguely familiar. Yes, it is pretty consistent
with our global data model. Except over here,
the forwarding rule, which is regional
forwarding rule as opposed to a global
forwarding rule, has an RFC 1918 VIP. You specify the
protocol-- TCP or UDP. And then you can have
up to five ports, and it points directly to a
regional back end service. Now one thing we went
and did this year is to give you the flexibility
to specify all ports. Now think of use cases such
as a Pivotal Cloud Foundry where they have
multiple applications, each on a separate TCP port. Well, you can load balance
them with really good entropy behind a single load
balancer, as opposed to creating, you
know, forwarding rules for different ports. You can have firewall rules
to protect access to ports that you don't want traffic on. Now a lot of our customers
also have requirements for running certain applications
in active standby mode, and I'm happy to
announce that we just did a beta for something we
call our failover groups. And what it allows
you to do is you can designate certain
instance groups as primary and certain
instance groups as secondary. When the health of the
instances in the primary group go below a certain
threshold, the load balancer will just automatically
failover the traffic to the standby instance group.
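Conceptually, the failover decision is just a health-ratio threshold; here is a tiny illustrative sketch (the 0.5 ratio is a made-up example, not a documented default):

```python
# Sketch of the failover idea (threshold value is illustrative, not a default).
def choose_serving_group(primary_healthy, primary_total, failover_ratio=0.5):
    """Serve from the primary instance group unless the fraction of healthy
    primary instances drops below the configured failover ratio."""
    if primary_total and primary_healthy / primary_total >= failover_ratio:
        return "primary"
    return "failover"

print(choose_serving_group(primary_healthy=1, primary_total=4))  # 'failover'
print(choose_serving_group(primary_healthy=3, primary_total=4))  # 'primary'
```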
You know, it's in beta. We'd love for you to try it out. Give your feedback, and we can go from there. Now in my initial
slide, I mentioned load balancers deep dive. There was a sixth load
balancer in there. We do have a layer 7
internal load balancer. It is in alpha. It does support both Compute
as well as Kubernetes-- container-based Kubernetes. Under the hood, it uses the
Envoy-based sidecar proxy. That's performant, feature
rich, as well as open source. You don't have to worry
about the management of it, because GCP-- Google Cloud manages the
load balancer for you. It's currently a
regional service, but making it globally available
is something on our roadmap. So now I'm going to
hand it over to Oz to talk about container
native load balancing. OSVALDO COSTA:
Thanks again, Babi. So now we're going to talk a
little bit about the network endpoint groups that
are used, or leveraged, by the container
native load balancing. So on the left side here,
the blue one-- what we have is the traditional
way to load balance across Kubernetes
clusters, which would be targeting the nodes. So in that case, the
instance group would have-- or the load balancer
would have the visibility of two different nodes only,
and doesn't know about the pods. So it will load
balance and distribute the traffic across those
different nodes only. When the traffic gets to
those nodes, then kube-proxy-- that will be running
on that node-- will use, if needed, IP masquerading to source NAT the traffic, and may send it to a different node-- to a different pod. That different node may
be in a different zone, so you may have inter-zone
communication in that case. So for this behavior, then
you have two different things that apply here. The first one is the
source NAT, so you don't have the source client IP anymore because it's hidden by the source NAT. And the second one, you have
a higher network utilization, because now you have IP
tables being applied, and traffic that
gets to one port-- to one node-- being
sent to another node-- to a different pod. Then now we introduced the
network endpoint groups concept which is also available in beta. So the network endpoint
groups concept-- the load balancer will now have
visibility to the pod layer-- to the pod level-- by using IP and port pairs. So in this example here,
instead of targeting two different nodes, it will
target five different pods. So you have better latency. You can expect better latency-- better network utilization
because we don't have traffic getting to one
node, and then being translated to a different one-- and now also better health
checking, because instead of health checking the node
and then getting translated, it just health checks the pods directly. And that is also available
by adding that annotation that I had added
here to the screen, but let's see this live. So can we go back
to the demo, please? Thank you. So I have created two different
clusters in this case-- identical clusters--
so each one of them with three nodes and six pods. And then I have exposed
both of them using ingress-- and one of them with the network
endpoint group's annotation, as I mentioned. And just to make
sure that we know how the pods are distributed
here, if we use this command-- this kubectl
command-- then we can see that Kubernetes
allocated those pods across different nodes a
little bit differently. So if we take a look at the
network endpoint, the neg cluster, then the
pods got allocated-- three pods in one node,
then two in the second one, and one in the third one. And the non-negs was a
little bit different, because it got four pods
in one node and then one in each additional node. Now I am also
sending traffic here. I have traffic generators. So if we go back to
the load balancer, let's see how this traffic
is being distributed. So first let's take a
look at the non-neg one, so the first thing is
the instance groups. So it sees three different
instance groups, and that's it. And it just has node awareness. Now if you go to Monitoring, I'm
sending around 5,500 requests per second, and since it
does not have visibility into the number of
nodes, those requests are just evenly distributed
across the nodes irrespective of the number of pods
that are healthy there. Now if we take a look at
the neg one, since the neg-- since the load balancer now
has visibility into the pods, then we can see that
the traffic is actually distributed according to
the number of pods there. So the first node is
getting traffic here for around 1,200
requests per second. The second one is
getting three times that, and the third
one-- two times that.
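The difference comes straight from the arithmetic. Using the demo's rough numbers (about 7,000 RPS total and pods spread 3, 2, and 1 across the three nodes), node-aware versus pod-aware distribution works out as follows-- a sketch, not the load balancer's actual algorithm:

```python
# Rough arithmetic sketch using the demo's numbers (3, 2, 1 pods per node,
# ~7,000 RPS total); not the load balancer's actual algorithm.
total_rps = 7000
pods_per_node = {"node-1": 3, "node-2": 2, "node-3": 1}

# Instance-group (node-aware) mode: traffic is spread evenly across nodes,
# regardless of how many pods each node actually runs.
node_aware = {n: total_rps / len(pods_per_node) for n in pods_per_node}

# NEG (pod-aware) mode: traffic is spread evenly across pods (~1,200 RPS
# per pod), so each node receives traffic proportional to its pod count.
total_pods = sum(pods_per_node.values())
pod_aware = {n: total_rps * p / total_pods for n, p in pods_per_node.items()}

print(node_aware)  # {'node-1': 2333.3, 'node-2': 2333.3, 'node-3': 2333.3}
print(pod_aware)   # {'node-1': 3500.0, 'node-2': 2333.3, 'node-3': 1166.7}
```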
Now if we take a look also at our custom dashboard-- that I also built using the same
logging and monitoring feature that is available in alpha-- we can see the number of
requests there reaching-- and also the latency. And break the latency
based on the first session and the second session. So we can see here that,
even though the same amount of resources are dedicated
to both clusters, the neg-enabled one
is able to handle 7,000 requests-- so 2,000
more than the non-neg one. And in terms of latency,
what this graph here means is, at the 99th percentile-- so for 99% of the sessions-- the neg one took around 57 milliseconds. Let me magnify-- so 57 milliseconds. Now the non-neg took
106 milliseconds in terms of latency, so it took
almost twice the amount of time to respond to those sessions. And just to make sure
that there are no tricks, I have also added here
the front end latency, so it's not significant. So the latency is
really on the back end. And also just to
highlight the point that I mentioned in
terms of network traffic, even though it's a very simple
web server that I have running here, the neg one consumes
half of the amount of traffic as compared to the non-neg. So if you had those nodes
just like in this case across different
zones, you would be paying for more traffic
as well to cross those zones. Back to Babi. BABI SEAL: So now
you've built a service, and it's now very important
that you secure your edge to protect it against DDoS
attacks, either volumetric or application layer. Before we get
started, network security at Google Cloud is a shared responsibility
between you and Google. We strive our best to
protect our network as much as possible, but we also
give you tools to protect it. We give you tools. And our best practice
that we recommend is take a defense
in depth approach. Secure it at all layers. So please secure traffic within your VPC, access to your VPC, and access to Google services from your VPC. And secure your edge. We're going to talk about
securing your edge soon, but a lot of these
individual topics have very good coverage
in sessions here at Next. And I would strongly recommend
that you go check them out, and do your due
diligence in terms of figuring out
the best security solution for your requirements. Now at Google, we strongly
recommend, as a best practice, that you run TLS
everywhere that you can. This is for your data privacy
and your data integrity. So we don't charge
extra for encrypted versus unencrypted traffic. We give you HTTPS
and SSL proxies. We'll talk about
our manage certs that help you apply certs
to talk to your clients. We also recommend that
you can apply self-signed certs on your back end instances for the connection to your load balancer, so,
net net, as this diagram illustrates, have HTTP
and TLS running everywhere and have better privacy
and data integrity. If you want to run
multiple domains behind a single virtual IP
address and port of the load balancer, we support that. Managed certs-- we
will help you reduce your toil of procuring
the cert and managing the lifecycle of the cert. All you have to do, as
this slide illustrates, is just specify the domain
that you want to secure, and Google will procure the
certificate in very short order and keep you up and running. You can use SSL policies to
specify the minimum TLS version and the SSL features
that you want to enable on your HTTPS load
balancer and your SSL proxies. You can use Google
preconfigured profile for some of these
features, or you can choose to have your own custom. Let's say you have a
strict requirement where you want to be very
explicit in the ciphers, and the TLS version that you
want to deploy in your network. You can use custom
SSL policies for that. Now securing your edge. And again, take a
layered approach. Use a global load balancer. Use it in concert
with Cloud Armor. Use it in concert with
Identity-aware Proxy, and layer it with
firewall rules. So with the Google global
network and our global load balancer, we are able to
absorb, mitigate, and, you know, dissipate a lot of the
volumetric layer 3 and layer 4 attacks. So that, you get with
the global load balancer. Now for application
layer attacks, we recommend that you take a
look at a Cloud Armor solution. There's actually a lot of
really good talks over here at Next with Cloud Armor. But essentially
with Cloud Armor, you can specify security
policies such as IP allow/deny lists and geo-based
access control, in addition to
protection against cross site scripting
or SQL injection attacks. Now layer that with
Identity-aware Proxy. Now this is the whole BeyondCorp story, so based on the identity and
context of the user, you can authenticate the user
and authorize their access to back end services. So, layering security
levels on top. And all of this work in
concert with each other. So if your Identity-aware
proxy says no, and your Cloud Armor says
yes, it will default to no. In addition, because our load
balancer proxies come from very well-defined IP addresses,
why don't you lock it down-- you know, your firewall should be configured to only allow
traffic from the load balancer. Don't open it up
to the internet. So take a defense
in depth approach, apply multiple solutions
that we provide for you, and protect your edge. I'm going to give it
to Oz to kind of show all of this in action. OSVALDO COSTA: Thanks, Babi. So here I'm going to show a
little bit of the security features that I have configured. Some of them I actually
showed already. So the first one is
the managed certs-- the Google managed certs
that I have enabled here. So as we can see, it's active,
and it's a super simple feature to deploy. The way to configure
it is you just have to enable it. And then once you do that, in your DNS,
make sure that you add the A record-- the quad-A record-- pointing to your VIPs, and
those VIPs are configured pointing to your forwarding-- to our back ends
and are healthy. Once you do that,
then the certs should take a few minutes just
to be alive and ready. I have also attached, here, an
SSL policy restricting the TLS version, so let's take a look
how that is actually working. So I have my default
policy, and I have defined a new one which
restricts the minimum TLS version to 1.2 and the set of ciphers to a restricted profile. So if we actually Edit here, we
could even select a custom list and select only the
ciphers that I want. So I am exposing
that here on my VIP, and this is attached
to the front end. Now if we go here
to my cloudshell, and I try to connect
using TLS 1.2, we can see that HTTP/2
responded with a 200. It was fine. Now if I change
to 1.1, then I get an error-- the connection is not accepted.
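The demo used curl from Cloud Shell; as an alternative, the same check can be reproduced from Python's standard library by pinning the client's TLS version. The hostname below is a placeholder for the demo VIP's domain:

```python
# Sketch: verify that the load balancer's SSL policy rejects TLS 1.1.
# "www.example.com" is a placeholder for the demo VIP's domain.
import ssl, socket

def try_tls(host, version):
    ctx = ssl.create_default_context()
    ctx.minimum_version = version
    ctx.maximum_version = version
    try:
        with socket.create_connection((host, 443), timeout=5) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return f"{version.name}: negotiated {tls.version()}"
    except ssl.SSLError as exc:
        return f"{version.name}: rejected ({exc.__class__.__name__})"

for v in (ssl.TLSVersion.TLSv1_2, ssl.TLSVersion.TLSv1_1):
    print(try_tls("www.example.com", v))
```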
And the last part that I wanted to show you here is Cloud Armor. In this case, I'm using
Cloud Armor with the allow and deny lists, so
basically what I'm doing is setting the
default behavior-- so setting the default
behavior to deny, and I'm customizing the
response-- in this case, a 403. But it could be, also,
a different response. It could be a 502, for example. And then I'm setting
an allow list-- in this case, setting here the
ranges from the [INAUDIBLE], so everyone connected can
actually reach this VIP. And then I'm targeting
that second back end that I have configured,
so everything that goes through to
slash secure slash is where I'm attaching it. So if I go to my landing
page and just type secure, you can see it's a
different landing page. And now if I try from
cloudshell, I get a 403, so it's being blocked.
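Conceptually, the policy is a priority-ordered list of rules with a default action at the end-- the lowest matching priority wins. Here is a minimal sketch of that evaluation logic with made-up ranges (not Cloud Armor's implementation):

```python
# Conceptual sketch of priority-ordered allow/deny evaluation with a
# default action; not Cloud Armor's actual implementation.
import ipaddress

rules = [
    # (priority, action, source ranges) -- lowest priority number wins first.
    (1000, "allow", ["203.0.113.0/24"]),       # e.g. the conference ranges
    (2147483647, "deny-403", ["0.0.0.0/0"]),   # default rule: deny with a 403
]

def evaluate(client_ip):
    ip = ipaddress.ip_address(client_ip)
    for _priority, action, ranges in sorted(rules):
        if any(ip in ipaddress.ip_network(r) for r in ranges):
            return action
    return "deny-403"

print(evaluate("203.0.113.9"))   # allow
print(evaluate("198.51.100.7"))  # deny-403
```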
Babi? BABI SEAL: So now you've secured your edge and deployed your services.
comes to your mind is how do I optimize
my services? And you'd like to
optimize them for latency, and then optimize them for cost. We've already talked about our
distributed edge infrastructure and the latency benefits
that you get out of that. A few additional
takeaways-- when you do SSL termination at the
edge, you get better latency. When you have TCP
retransmissions from the edge, you get much better
latency performance. Talking about TCP optimizations,
after the first connection is established from the Google
front end to the server, we can then optimize for
subsequent connections to round trip latencies. We can eliminate most of
those round trip times and optimize for a
handshake latency. Now a really nice way
to optimize for latency, especially if you have
content that's cachable, is cache it at the edge. Use Cloud CDN, and
all you have to do is enable a checkbox on
your back end service. And you'll be serving content
and delighting your users with their latency experience. In fact, there
was a public study done by cedexis.com that
compared Google Cloud CDN with other providers. You can refer to that link. And it showed that
Google Cloud CDN had lower latency, better
throughput, and much better availability. How do you make the web faster? Now this is something we at
Google spend a lot of time working on and thinking
about. My two takeaways from this session are: please go take
a look at QUIC. Take a look at HTTP/2. And see if it can help
provide you latency benefits. So QUIC stands for Quick
UDP Internet Connections. If you've used Chrome,
or you've used YouTube, your traffic has
gone on top of QUIC. QUIC is a UDP-based
transport that's encrypted, and it's optimized for HTTPS. Among its many features, the
three that help improve latency are quicker connections
setup times, multiplexing of
different streams, and no head-of-line blocking. So when cedexis was doing the
study of CDN throughput, when we went and turned on
QUIC, the throughput increased significantly. That was because
with lower latency we were able to set
up more connections, and with more
connections we were able to get a higher throughput. Now the diagram on the
right shows the advantages that QUIC has in terms of
quicker connection setup times. When you think of HTTP
running over TCP and TLS, consider all the handshakes
that are involved. Compared with that, QUIC is a lot quicker. It's even quicker if the server has been seen before, when the handshake latency is just zero round trips. We also support HTTP/2 not just
from the client to the load balancer, but also from the
load balancer to the back end. Doing this enables
us to support gRPC, and what it enables us to do
is support a stream-based load balancing of gRPC streams
to the back end service. Now optimizing for cost. You know, when you
egress the data center, you incur egress
bandwidth charges. But if you're optimizing
for performance, we recommend that
your return traffic take Google's performant and well-provisioned network and exit at the
closest point of presence to the user to give the
best user experience. But let's say you have
certain workloads that are cost sensitive. They are not mission
critical, and you want to optimize for cost. Well, we then give you the
option of the standard tier, and with the
standard tier, it'll take standard ISP transit to
egress the data center just like other Cloud providers. And you'll save on cost. You can turn it on
per load balancer, or you can turn
it on per project. And user-defined headers. So Oz had shown a really
cool demo of this. So what you can do is-- at the load balancer
level, you can configure custom request headers
to be added to those requests. You can configure for collecting
the geolocation, the TLS parameters, the smooth round
trip latency between the load balancer and the client. Capture all that information
from the load balancer. Send it to your
back end instance, then the application
logic in the back end can use that
information and then use it for the
optimization purposes that you are looking for. So now I'm going to
request Oz to pull it all together for this session. OSVALDO COSTA: Thanks, Babi. So just to summarize a little
bit of the best practices that we have, the first point
is how to pick the right load balancer for you? We have this chart which is
available in our public site, but some of the main points
that I would highlight here is make sure you understand
what type of traffic you want-- external versus internal? What type of client,
v6 versus v4? What type of protocol? Do you have HTTP or HTTPS? Do you need host and path rules? Is it TCP? Is it UDP? How the sessions are
being distributed-- do you need TLS termination? So this helps you to pick the
best load balancer for your use case. Now once you have selected
that, a few points that I wanted to highlight-- the first one is secure
your service, right. So we have seen several
of the best practices here in the demo, and
also we talked through it. So TLS everywhere,
that's the first point. So do TLS on your front
end using your cert-- or leveraging the
Google managed cert, so you don't have to be
concerned about renewals. Use self signed
certs, for example, on your instances for
that communication between the proxy and the back ends. Leverage the VPC firewall. So as Babi mentioned, make sure
you harden your VPC firewall and do not allow traffic that you are not expecting. Use the SSL policies as well-- very important for compliance. Then optimize for latency, so
pick the right regions closer to your clients. Leverage the cross region
global load balancer, so make use of the
waterfall algorithm-- that it's going to
send the traffic to the closest available
region based on the RTT. If you're using
Kubernetes, take a look at the container native load balancing with network endpoint groups, because that may
also help you with latency. Also, optimize for auto scaling. Our load balancers, they don't
need to be warmed up, right. They are ready to
process traffic. However, your back
end instances, they need to be ready. So make sure that you
configure those instances with the right auto scaling
criteria that you need. The timers that you need for
that instance should be ready. And be careful also
with health checks. They can be chatty, and the
way that the architecture works is that there is a
back end manager that is responsible for doing
all those health checks. So you can set those timers
and customize those timers based on your
application requirements. And another thing
is auto healing. It is an option
that you can use. Just be aware that if
auto healing detects that your application
is not ready, it's going to destroy
that VM and recreate, so it will take a longer time. If you are using the
global load balancers-- HTTP load balancers,
TCP, or SSL proxy-- combine that
with Cloud Armor-- be it allow and deny
lists or at the WAF layer for all the other features
that Babi mentioned. Also use the capacity scaler. So we talked about it, right? So the capacity, you can
set to a percentage-- a small percentage. So when you have
a new version, you can then deploy it to just that instance group and send a small percentage of traffic to test it. And by the way, it doesn't
need to be in the same VPC, so it can also be
on a separate VPC. Set the appropriate
connection timeouts, so make sure you understand
your application-- what do you expect? How long do those flows need to be up and running? And use cookies or
IP-based, as we talked. Now for the Layer 4 load
balancer and network load balancer, the default
is the 5-tuple, so you can get a
better distribution. However in some cases, you
may have fragmented packets, or even sessions that you
need to do stickiness based on the source and client IP. So you can customize that. Also on the ILB-- Babi mentioned that the ILB
actually does not exist. So the way it works in the
architecture under the hood is that those instances, they
are configured to respond to packets sent to the VIP. So if you're using your custom
VM image, that's a common gotcha. Make sure that you add that configuration to your VM as well, and allow that VM to respond
to packets sent to the VIP. BABI SEAL: I think
we're actually at the close of our session. But before we go,
please take a moment to say a big thanks to
the production crew. They were awesome. They work really hard, and I
totally appreciate their work. [APPLAUSE] [MUSIC PLAYING]