Cloud Load Balancing Deep Dive and Best Practices (Cloud Next '19)

Captions
[MUSIC PLAYING] BABI SEAL: So today we are going to do a deep dive into our Cloud load balancing family. On the global side, we have our HTTP and HTTPS load balancers, and our proxies for SSL and TCP. And on the regional side, we have our external-facing network load balancer and our internal layer 4 load balancing. By regional, I mean that the back-end instances are constrained to a specific region; with global, we can have back-end instances across multiple regions. It's important to note that Google Cloud load balancing is built on the same underlying infrastructure that Google uses to deliver its services to billions of users daily. It also underpins many of the services that you use on Google Cloud-- be it storage, App Engine, or Kubernetes, managed or native.

So now by a show of hands, how many of you consider yourselves Cloud native? Yay! Cloud native in the house. OK. And how many of you consider yourselves Enterprise folks moving to the Cloud? Awesome. I have good news. Today, we are going to show you how Google Cloud load balancing can help you build applications that scale, make the most use of your resources, are secure, and are optimized for latency and cost. We're going to do this in four parts. We're going to do a deep dive into our load balancers. We're going to secure your edge. We're going to optimize for latency and cost, and Oz is going to pull it all together with best practices.

So let's get started. Our first load balancer was our network load balancer. Three years ago, we published a paper on Maglev, which is actually our in-house developed software load balancer. It runs on standard Google hardware, with the code optimized for throughput and latency. It has been in production since 2008, and it literally load balances all traffic coming into Google's data centers. It also distributes traffic to our front-end engines located in our points of presence, and now we make it available to you as our network load balancer for Google Cloud.

So a network load balancer essentially load balances layer 4 TCP or UDP traffic via hashing to regional back-end instances. The Maglevs tunnel the packet directly to the back-end instance, so the source IP is preserved. The back-end instances return the packet to the client directly-- it's direct server return, so it does not go back through the Maglevs. In this slide, you see three regional load balancers-- test, myapp, and travel. What's interesting is that these regional load balancers' virtual IP addresses come from a regional block, but when you're using the premium tier, we advertise them globally. So if Shen in Singapore wanted to access the load balancer located in US West, myapp.com, their traffic would ingress at the point of presence in Singapore, go through our Google global network, and be load balanced at US West. So even though it's a regional load balancer, you have global access with network load balancing.

So what's under the hood? Remember I was talking about Maglevs? Compare and contrast. Traditional load balancers are typically deployed in an active-standby fashion, and they typically need to be pre-warmed to deal with spikes in traffic. That's the one on the left. And on the right, you have your Maglevs, which are deployed in an active/active scale-out fashion. The way it works is that the Maglevs advertise the virtual IP addresses of your load balancers-- all of them-- to the peering router.
The peering router then distributes the flows to the Maglevs using equal-cost multipath (ECMP). What do you want out of a load balancer? You want to evenly distribute your workloads, and you want a stable connection from your client to your back-end instance. And the Maglevs do this primarily in two ways. One, they do connection tracking of existing connections, so if other back-end instances come up or go down, your current connection stays tracked and stable. Two, they do something called consistent hashing, which means that even if the set of Maglevs changes, the back end is always selected consistently. So that's essentially Maglev under the hood.

So just to recap, what does a network load balancer get you? It's a regional load balancer. It's a layer 4 load balancer-- you don't get layer 7. You can do 2-, 3-, or 5-tuple hashing, and it's very performant-- you get almost a million queries per second. But what does it not provide? It does not provide global IPv4 and IPv6 load balancing. You cannot route on the basis of layer 7 headers, and you cannot do SSL proxying. So Google needed to come up with a global load balancer.

Now just to recap, think of traditional load balancers in public clouds. They are regional: they have a regional VIP address and a regional set of back-end instances. So let's say you have a service that is distributed across three different regions. You have three different VIPs. Now if you wanted to globally load balance across those three regions, you'd need a DNS load balancer that would map the client request to one of these VIPs, but there are several challenges with this approach. Imagine an instance in one of the regions going away: the load balancing and the DNS infrastructure have to know of that change. Your client might cache that IP address, and that could result in a suboptimal selection. Last, your capacity is siloed-- resources in one region cannot be used in another region.

Now that would simply not work for Google, so when we had to come up with a load balancer, we did not use the DNS-based approach. What we did was push the load balancing to the edge of our global network, and we came up with a single global anycast virtual IP address that fronts worldwide capacity. If you run into availability or capacity constraints, you can do cross-region failover and fallback. It's very nicely tied to our auto scaling infrastructure, so when demand rises, we can scale up to meet requirements, and when demand ebbs, we can scale down. It also gives you a single point where you can apply global policies, and it's very, very performant, because there is no single instance acting as a choke point for the load balancing-- you can easily support a million queries per second.

So how do our customers typically deploy this? You would typically start out deploying your services in a single region, but you would configure a global forwarding rule with a global anycast virtual IP address. And you'd make your customer, Maya in California, very happy. But let's say your service grows in popularity, and now you have a customer, Bob in New York, who also wishes to access the service. Similarly, you would deploy back-end instances in US East, but you don't have to reconfigure the VIP. You don't have to modify any DNS entries. You're a global success. You have customers in Asia that are now using back-end instances in Singapore.
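In gcloud terms, that growth story is roughly one extra command: the second region's instance group is added as another backend on the same global backend service, and the anycast VIP and DNS records stay untouched. This is only an illustrative sketch-- the resource names, zone, and rate are hypothetical, and the full front-end data model is described next.

```bash
# Existing global backend service already serves a MIG in us-west1.
# Growing into us-east1 is just another backend on the same service;
# the global anycast VIP and DNS records do not change.
gcloud compute backend-services add-backend my-app-backend \
    --global \
    --instance-group=my-app-mig-east \
    --instance-group-zone=us-east1-b \
    --balancing-mode=RATE \
    --max-rate-per-instance=100
```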
Unfortunately, it then happened that instances in Singapore went down. At the same time, demand in US West spiked, and you need to address those events. What will happen is, without requiring any intervention, your user requests will seamlessly be directed to the region with the most available capacity. Then, when US West scales up, the traffic will fall back to US West. Again, this all happens seamlessly without any intervention.

Now what is under the hood? This, too, is software-based load balancing. The load balancing happens at these things called Google Front End engines, or GFEs. These GFEs are located at the edges of our network. Now the edge itself is tiered. Some of the load balancing happens in the tier one edge, and some happens in tier two. The tier one edge is where our points of presence are located. The GFEs effectively proxy the client traffic, and they load balance by working in concert with each other, with our software-defined networking plane, and with our global control and config plane.

How do we distribute traffic? Now, this slide is really dense, but I have one key takeaway: as long as you have capacity in the region closest to your users, we will send the traffic there. But should you run into capacity or availability constraints, we will spill the traffic over to the next closest region. We call this our waterfall model for load balancing. A region has multiple zones. We will populate each zone proportionally to the capacity that you have allocated to it, and within each zone we will load balance evenly among the instances in a round-robin fashion.

Now that's quite a lot, and I don't expect you to remember all of this. Oz is going to show it to you live, but before we go to the demo, I think it's important that we cover what the data model for a configuration looks like. In this load balancing model, you have something called the front end and something called the back end. In the front end, you have that global anycast VIP. That VIP is associated with a forwarding rule. That forwarding rule points to a target proxy, which could be HTTP, HTTPS, TCP, or SSL. The HTTP(S) target proxy also points to a URL map, and the URL map essentially maps a client URL, based on host and path rules, to a specific back-end service. Now this is where the back-end portion begins. A back-end service is made up of back ends. Back ends can be managed instance groups or network endpoint groups. The back-end service is also where we associate health check configuration and serving capacity. Oz is going to show all of this to you right now, so let me hand it off to Oz to show it to you live.

OSVALDO COSTA: Thanks, Babi. So can we switch to the demo, please? All right. What we have on the screen are three HTTP(S) load balancers that I have configured. We're not going to cover the first two in this demo, so let's drill down and take a look at the third one. For the third load balancer, let's see how the data model that Babi explained maps to its configuration. The first part is the front end. As you can see here on the screen, I have exposed both VIPs-- one v4 and one v6. And remember that these are global VIPs, so irrespective of your client's location, it's going to target the same VIP, either v4 or v6.
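As a rough illustration of how that data model chains together in gcloud-- forwarding rule to target proxy to URL map to backend service to backends-- the commands look roughly like this. All names are hypothetical (my-cert would be an existing SSL certificate resource and my-anycast-vip a previously reserved global address); this is a sketch of the object model, not the exact demo configuration.

```bash
# Back-end half: health check, backend service, and a MIG backend.
gcloud compute health-checks create https my-hc --port=443
gcloud compute backend-services create my-backend \
    --protocol=HTTPS --health-checks=my-hc --global
gcloud compute backend-services add-backend my-backend --global \
    --instance-group=my-mig --instance-group-zone=us-west1-a \
    --balancing-mode=RATE --max-rate-per-instance=100

# URL map: host and path rules resolve to backend services.
gcloud compute url-maps create my-map --default-service=my-backend

# Front-end half: target proxy and global forwarding rule on the anycast VIP.
gcloud compute target-https-proxies create my-proxy \
    --url-map=my-map --ssl-certificates=my-cert
gcloud compute forwarding-rules create my-fr --global \
    --address=my-anycast-vip --target-https-proxy=my-proxy --ports=443
```

The same chain applies to the SSL and TCP proxy load balancers, just with a different target proxy type in the middle.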
Both VIPs are exposed using HTTPS, which is part of our best practices-- TLS everywhere. So that's the first session, the one established between the client and the VIP. Then I have also configured a feature that is available in beta, which is Google-managed certs. We are going to take care of that for you and do the renewal and revocation of the certs, and we'll see that in detail along with the SSL policy. And the last config of the front end is the network tier. What we have here is the premium tier that Babi touched on, which means a cold-potato strategy: the packet stays within our network as much as possible, so it's not handed off to the internet and instead traverses our backbone.

Now the second section is the host and path rules. I have defined two simple rules here. The first is based on the URL: everything that is sent to /secure/ is going to be sent to one back end, and everything else-- the default path-- is going to be sent to a separate back end.

And the last part is how we map those host and path rules to the back ends. I have two back ends here. For the first one, which is going to handle all the default traffic, I have configured the endpoint protocol as HTTPS, which again is our best practice-- we recommend having TLS everywhere. So the first session, just to recap, between the client and the VIP uses HTTPS, and the second one, to this back end, also relies on HTTPS. I also have a named port, which is a port that you can define for the communication between the proxy layer and the back ends. It doesn't have to be the same one as the front end, and we're going to see that configuration as well when we take a look at the managed instance group. The next one is a timeout, which also applies to the connection between the proxy and the back end, so here you want to take a look at your application requirements. If you have long-lived sessions, you may want to relax the timer a little bit so you don't kill your sessions before you intend to.

Cloud CDN is also here. You can enable Cloud CDN at the back-end level, and it's a very simple configuration-- a checkbox to enable Cloud CDN. And just a note that Cloud CDN can use a VM as a back-end service but also a bucket, so you could also do CDN on a bucket. That cacheable content is going to be cached at the POP layer-- so right at the edge. Another configuration that we have here is the health check, which is the mechanism used to check the liveness of that instance. In this case, I'm using HTTPS, but you can also configure it using HTTP or HTTP/2, configure the timers, and do some other customizations like setting, for example, what type of response you expect from the application to consider it healthy.

We also have, as part of the back end, session affinity. I have cookie here, but it can also be configured as client IP. Client IP means that all the sessions from a certain IP will be sent to the same back end to maintain affinity. Cookie, on the other hand, lets the HTTP load balancer send a cookie back to the client, so if you have multiple sessions from the same client, it allows a better distribution across the back ends.
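The per-backend-service settings walked through here-- endpoint protocol, proxy-to-backend timeout, Cloud CDN, and session affinity-- correspond roughly to gcloud flags like the following. This is a hedged sketch; the backend service name is hypothetical.

```bash
# Timeout applies to the proxy-to-backend connection: relax it for
# long-lived sessions. Session affinity can be GENERATED_COOKIE or
# CLIENT_IP, and Cloud CDN is a single flag on the backend service.
gcloud compute backend-services update my-backend --global \
    --protocol=HTTPS \
    --timeout=60 \
    --enable-cdn \
    --session-affinity=GENERATED_COOKIE
```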
I also have here the connection draining timeout, which is the timer you configure-- in this case, the default-- for how long sessions are allowed to persist on an instance when that instance is removed from the back end. You want to configure that as well to make sure that, when you remove an instance from the group, those sessions are not killed immediately.

And last but not least, I'm using another new feature I have configured-- available in beta-- which is user-defined request headers. Let me magnify here. I have configured these custom request headers that can be added at the HTTP proxy layer and sent on to your back end. In this case, I'm adding the client region, city, and RTT, so the back ends can look at that information on a per-session basis and make smarter decisions, but it's better if I show you how that looks in the web page. This is the landing page for that VIP, and here we can see some of the information that I mentioned. The first one is the client IP, which is the public IP that my client is using-- probably a NATed IP in this case. The second one is the load balancer IP, which is the VIP-- the same one that we saw in the configuration. The third one is the proxy IP, which is the source IP of that second session. Then the server information. And here we can see the user-defined headers that are inserted. My connection is coming from San Francisco, so I would be a little bit alarmed if it were coming from a different place. Here's my RTT-- in this case, 3 milliseconds. The lat/long, the TLS version, and the ciphers. So this is custom-- you don't need to add all the variables, but if you want to add some of them, that's how you do it.

Now going back here, I just want to show the second back end that I have configured. We are going to use it for the network security demo, but the only thing that I want to highlight is that I'm using a different protocol here. For this back end, I'm using HTTP/2 instead of HTTPS, which is another feature released in beta. So for the communication between the proxy layer and the back end, you can actually leverage HTTP/2 for better latency for your application.

Let's take a look at one of the instance groups. Actually, before we do that, one thing that I wanted to make clear is that when you configure those back ends, you add instance groups, which can be managed or unmanaged, as Babi explained. On those instance groups, you have the option to set the capacity-- what we can see here on the right side. What we can see here is the max RPS that I have configured. The way it works is that I configure, on a per-managed-instance-group basis, the number of requests that each VM can handle-- in this case, 100 RPS, requests per second-- and then the maximum fraction of that capacity to use-- in this case, 100%. If I combine those two with autoscaling, it means each VM can handle 100 RPS, and at 80% of that capacity-- 80 RPS-- we auto scale up and start a new instance. Also note that in the second instance group that I have defined here, autoscaling is off, and I have set the capacity to 10%. So this is a great fit for canary deployments.
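The two features just shown-- user-defined request headers and the 10% canary backend-- can be sketched in gcloud roughly as follows. Resource names and header names are hypothetical, chosen for illustration.

```bash
# Ask the proxy layer to insert geography and RTT headers for the back ends.
gcloud compute backend-services update my-backend --global \
    --custom-request-header='X-Client-Region: {client_region}' \
    --custom-request-header='X-Client-City: {client_city}' \
    --custom-request-header='X-Client-RTT: {client_rtt_msec}'

# Canary: a second instance group capped at 10% of its configured capacity.
gcloud compute backend-services add-backend my-backend --global \
    --instance-group=canary-mig --instance-group-zone=us-west1-b \
    --balancing-mode=RATE --max-rate-per-instance=100 \
    --capacity-scaler=0.1
```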
So if you have a new web server that you want to deploy and you only want to send it a small fraction of your traffic, you can do that with a new instance group handling just part of the traffic, and you're going to see that in action. But before we do that, I just wanted to show you, very quickly, the instance group configuration. The first part is the named port that I mentioned-- in this case, I'm using 443, encrypted traffic, but you could use a different port if you wanted. The next one is the instance template, which defines the instance used to create each VM, so you can customize it and load it with the software you need. Autoscaling is on, as we mentioned, and I just wanted to highlight the cooldown period, which is the time your application takes before it is ready to process traffic. In this case, I changed it to 30 seconds because it's a simple web server, but you may want to customize it according to your application requirements. And the last one is auto healing. You can enable auto healing on the group, and what it's going to do is health check each instance based on the criteria that you set, with its own timeout. Just note that, with auto healing, if an instance is marked as unhealthy, it's going to be destroyed and recreated, so it will take more time compared to just pulling an instance out of the group.

Now I'm also sending traffic here, so let's take a quick look at how the monitoring works. I have three different load generators-- one in Asia, one in North America, and one in Europe. They are sending traffic, and you can see that for the European one, the traffic targets the back end in Europe because it's closest. And since I'm sending a little bit below 200 RPS, which is the max, I can see an alert that usage is approaching capacity. Another behavior that we can see here is traffic from the Asian and North American load generators being spilled over to my new instance group-- just a small fraction of that traffic, as intended, because I only wanted 10% of the traffic sent there. Another thing that I wanted to show you is a new dashboard I created showing the protocol distribution. This is based on a new feature that is available in alpha, which is logging and monitoring for the HTTP load balancer, and we are going to talk a little bit more about what is available. We can see the number of requests being generated by my load generators-- the same ones that you saw-- and the protocol distribution, since each one of them is using a different HTTP version. Now back to Babi.

BABI SEAL: Can you go back to the slides, please? So some of you have internal services that you run on Google Cloud. You want to scale and grow them behind a virtual IP address that's only accessible from your internal instances. And for these use cases, we have the internal load balancer. With the layer 4 internal load balancer, you're effectively load balancing behind an RFC 1918 private virtual IP address. You are load balancing TCP and UDP, similar to a network load balancer, doing the 2-, 3-, or 5-tuple hashing. Your client IP is preserved, and you can have TCP, HTTP, or HTTPS health checks.
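A minimal sketch of an internal layer 4 load balancer of this kind, with hypothetical names and region (the data model details follow):

```bash
# Regional internal backend service behind an RFC 1918 VIP.
gcloud compute health-checks create tcp ilb-hc --port=80
gcloud compute backend-services create ilb-backend \
    --load-balancing-scheme=INTERNAL --protocol=TCP \
    --region=us-west1 --health-checks=ilb-hc
gcloud compute backend-services add-backend ilb-backend --region=us-west1 \
    --instance-group=ilb-mig --instance-group-zone=us-west1-a

# The regional forwarding rule carries the internal VIP, protocol, and ports
# (an RFC 1918 address from the subnet is assigned, or pass --address).
gcloud compute forwarding-rules create ilb-fr \
    --load-balancing-scheme=INTERNAL --region=us-west1 \
    --network=default --subnet=default \
    --ip-protocol=TCP --ports=80 \
    --backend-service=ilb-backend
```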
Now the key takeaway is this: there is really no middle proxy-- effectively, there's really no load balancer appliance. The way it works is that our underlying software-defined networking layer, Andromeda, takes care of the connection tracking and the consistent hashing, and it sends traffic directly from the client VM to the back end. That's how the client IP is preserved. What's even more important to note is that this is a very scale-out architecture: because there is no choke point, you can have a very performant load balancer.

If you look at the data model, it seems vaguely familiar-- yes, it's pretty consistent with our global data model. Except here, the forwarding rule is a regional forwarding rule, as opposed to a global forwarding rule, and it has an RFC 1918 VIP. You specify the protocol-- TCP or UDP-- and you can have up to five ports, and it points directly to a regional back-end service. Now, one thing we did this year is give you the flexibility to specify all ports. Think of use cases such as Pivotal Cloud Foundry, where you have multiple applications, each on a separate TCP port. You can load balance them with really good entropy behind a single load balancer, as opposed to creating forwarding rules for different ports, and you can use firewall rules to protect access to ports where you don't want traffic.

Now a lot of our customers also have requirements for running certain applications in active-standby mode, and I'm happy to announce that we just released a beta for something we call failover groups. It allows you to designate certain instance groups as primary and certain instance groups as secondary. When the health of the instances in the primary group goes below a certain threshold, the load balancer will automatically fail the traffic over to the standby instance group. It's in beta-- we'd love for you to try it out, give your feedback, and we can go from there.

Now, in my initial slide I mentioned a load balancers deep dive, and there was a sixth load balancer in there. We do have a layer 7 internal load balancer. It is in alpha. It supports both Compute as well as container-based Kubernetes. Under the hood, it uses the Envoy-based sidecar proxy, which is performant, feature-rich, and open source. You don't have to worry about managing it, because Google Cloud manages the load balancer for you. It's currently a regional service, but making it globally available is on our roadmap. So now I'm going to hand it over to Oz to talk about container-native load balancing.

OSVALDO COSTA: Thanks again, Babi. So now we're going to talk a little bit about the network endpoint groups that are leveraged by container-native load balancing. On the left side here, the blue one, what we have is the traditional way to load balance across Kubernetes clusters, which targets the nodes. In that case, the load balancer only has visibility of the two nodes and doesn't know about the pods, so it will distribute the traffic across those nodes only. When the traffic gets to a node, kube-proxy, running on that node, will-- if needed-- use IP masquerading to source-NAT the traffic and may send it to a pod on a different node. That different node may be in a different zone, so you may have inter-zone communication in that case.
So with this behavior, two things apply. The first one is the source NAT: you don't have the client source IP anymore because of the source NAT. And the second is higher network utilization, because now you have iptables rules being applied, and traffic that gets to one node may be sent on to another node-- to a different pod.

So now we've introduced the network endpoint groups concept, which is also available in beta. With network endpoint groups, the load balancer has visibility down to the pod level, using IP and port pairs. In this example, instead of targeting two different nodes, it targets five different pods. So you can expect better latency, better network utilization-- because we don't have traffic getting to one node and then being forwarded to a different one-- and better health checking, because instead of health checking the node and relying on translation, it health checks the pods directly. That is enabled by adding the annotation that I had on the screen (there's a sketch of it after this demo walkthrough), but let's see this live. So can we go back to the demo, please? Thank you.

I have created two identical clusters in this case, each with three nodes and six pods, and I have exposed both of them using ingress-- one of them with the network endpoint groups annotation, as I mentioned. And just to see how the pods are distributed, if we use this kubectl command, we can see that Kubernetes allocated the pods across the nodes a little differently. For the NEG cluster, the pods got allocated three in one node, two in the second, and one in the third. The non-NEG one is a little different: it got four pods in one node and one in each additional node.

Now I am also sending traffic here with traffic generators, so if we go back to the load balancer, let's see how this traffic is being distributed. First let's look at the non-NEG one. It sees three instance groups, and that's it-- it only has node awareness. If you go to Monitoring, I'm sending around 5,500 requests per second, and since the load balancer has no visibility into the pods, those requests are just evenly distributed across the nodes irrespective of the number of healthy pods on each. Now if we take a look at the NEG one, since the load balancer now has visibility into the pods, we can see that the traffic is actually distributed according to the number of pods: the first node is getting around 1,200 requests per second, the second one is getting three times that, and the third one two times that.

Now if we take a look at our custom dashboard-- which I also built using the same logging and monitoring feature that is available in alpha-- we can see the number of requests and also the latency, broken down by the first session and the second session. We can see that, even though the same amount of resources is dedicated to both clusters, the NEG-enabled one is able to handle 7,000 requests per second-- around 2,000 more than the non-NEG one. And in terms of latency, what this graph means is that at the 99th percentile-- 99% of the sessions-- the NEG one took around 57 milliseconds, while the non-NEG one took 106 milliseconds, so almost twice the time to respond to those sessions. And just to make sure there are no tricks, I have also added the front-end latency here, and it's not significant-- the latency is really on the back end. And just to highlight the point I mentioned about network traffic: even though it's a very simple web server running here, the NEG one consumes half the amount of traffic compared to the non-NEG one. So if you had those nodes spread across different zones, as in this case, you would be paying for that cross-zone traffic as well.
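For reference, the annotation mentioned above is set on the Kubernetes Service that the Ingress exposes. A hedged sketch, with a hypothetical Service name (in practice the annotation is typically set before the Service is first exposed through the Ingress):

```bash
# Annotate the Service so the Ingress-created load balancer targets
# network endpoint groups (pod IP:port pairs) instead of node instance groups.
kubectl annotate service web --overwrite \
    cloud.google.com/neg='{"ingress": true}'

# One common way to see how the pods are spread across nodes
# (the exact command used in the demo was not shown).
kubectl get pods -o wide
```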
Back to Babi.

BABI SEAL: So now you've built a service, and it's very important that you secure your edge to protect it against DDoS attacks, either volumetric or application layer. Before we get started: network security on Google Cloud is a shared responsibility between you and Google. We do our best to protect our network as much as possible, but we also give you tools to protect your deployments. And the best practice that we recommend is to take a defense-in-depth approach-- secure it at all layers. So please secure traffic within your VPC, access to your VPC, and access to Google services from your VPC, and secure your edge. We're going to talk about securing your edge soon, but a lot of these individual topics have very good coverage in sessions here at Next, and I would strongly recommend that you go check them out and do your due diligence in figuring out the best security solution for your requirements.

Now at Google, we strongly recommend, as a best practice, that you run TLS everywhere you can. This is for your data privacy and your data integrity, and we don't charge extra for encrypted versus unencrypted traffic. We give you HTTPS and SSL proxies. We'll talk about our managed certs that help you apply certs for talking to your clients. We also recommend that you apply self-signed certs on your back-end instances for the connection from the load balancer, so net net, as this diagram illustrates, have HTTPS and TLS running everywhere and get better privacy and data integrity. If you want to run multiple domains behind a single virtual IP address and port of the load balancer, we support that.

Managed certs will help reduce the toil of procuring a cert and managing its lifecycle. All you have to do, as this slide illustrates, is specify the domain that you want to secure, and Google will procure the certificate in very short order and keep you up and running. You can use SSL policies to specify the minimum TLS version and the SSL features that you want to enable on your HTTPS load balancer and your SSL proxies. You can use a Google preconfigured profile for these, or you can choose your own custom profile. Let's say you have a strict requirement where you want to be very explicit about the ciphers and the TLS version deployed in your network-- you can use custom SSL policies for that.
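In gcloud terms, a Google-managed certificate and a custom SSL policy of the kind described here can be sketched roughly as follows; the domain, resource names, and target proxy are hypothetical.

```bash
# Google-managed certificate: specify the domain; Google handles
# procurement, renewal, and revocation.
gcloud compute ssl-certificates create www-cert \
    --domains=www.example.com --global

# SSL policy pinning a minimum TLS version and a restricted cipher profile,
# attached (with the cert) to the load balancer's HTTPS target proxy.
gcloud compute ssl-policies create strict-tls \
    --min-tls-version=1.2 --profile=RESTRICTED
gcloud compute target-https-proxies update my-proxy \
    --ssl-certificates=www-cert --ssl-policy=strict-tls
```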
Now, securing your edge-- and again, take a layered approach. Use a global load balancer. Use it in concert with Cloud Armor. Use it in concert with Identity-Aware Proxy, and layer it with firewall rules. With the Google global network and our global load balancer, we are able to absorb, mitigate, and dissipate a lot of the volumetric layer 3 and layer 4 attacks-- that protection you get with the global load balancer. For application layer attacks, we recommend that you take a look at Cloud Armor; there are actually a lot of really good talks here at Next on Cloud Armor. Essentially, with Cloud Armor you can specify security policies such as IP allow/deny lists and geo-based access control, in addition to protection against cross-site scripting or SQL injection attacks. Now layer that with Identity-Aware Proxy-- this is the whole BeyondCorp story-- so based on the identity and context of the user, you can authenticate the user and authorize their access to back-end services. So you're layering security levels on top of each other, and all of these work in concert: if your Identity-Aware Proxy says no and your Cloud Armor says yes, it will default to no. In addition, because our load balancer proxies come from well-defined IP addresses, your firewall should be configured to only allow traffic from the load balancer-- don't open it up to the internet. So take a defense-in-depth approach, apply the multiple solutions we provide for you, and protect your edge. I'm going to hand it to Oz to show all of this in action.

OSVALDO COSTA: Thanks, Babi. So here I'm going to show a little bit of the security features that I have configured-- some of them I actually showed already. The first one is the Google-managed certs that I have enabled here. As we can see, it's active, and it's a super simple feature to deploy. To configure it, you work through your DNS: make sure that you add the A record and the quad-A record pointing to your VIPs, and that those VIPs are configured pointing to your back ends and the back ends are healthy. Once you do that, the certs should take a few minutes to become live and ready.

I have also attached an SSL policy restricting the TLS version, so let's take a look at how that is actually working. I have my default policy, and I have defined a new one which restricts the minimum TLS version to 1.2 and the set of ciphers to a restricted profile. If we actually edit here, we could even select a custom list with only the ciphers that I want. I am exposing that on my VIP, attached to the front end. Now if we go to my Cloud Shell and I try to connect using TLS 1.2, we can see that HTTP/2 responded with a 200-- it was fine. Now if I change to 1.1, I get an error: the connection is not accepted.

And the last part that I wanted to show you here is Cloud Armor. In this case, I'm using Cloud Armor with allow and deny lists. Basically, I'm setting the default behavior to deny and customizing the response-- in this case, a 403, but it could also be a different response, a 502 for example. Then I'm setting an allow list-- in this case, the ranges from the [INAUDIBLE], so everyone connected here can actually reach this VIP. And I'm targeting that second back end that I have configured, so everything that goes to /secure/ is where I'm attaching it. If I go to my landing page and just type /secure, you can see it's a different landing page. And now if I try from Cloud Shell, I get a 403, so it's being blocked. Babi?
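A hedged sketch of the same style of Cloud Armor allow/deny policy, plus the TLS check from a shell; the IP range, resource names, and URL are placeholders, not the demo's actual values.

```bash
# Cloud Armor: the default rule (priority 2147483647) denies with a 403,
# an explicit rule allows a trusted range, and the policy is attached to
# the backend service that serves /secure/.
gcloud compute security-policies create edge-policy
gcloud compute security-policies rules update 2147483647 \
    --security-policy=edge-policy --action=deny-403
gcloud compute security-policies rules create 1000 \
    --security-policy=edge-policy \
    --src-ip-ranges=203.0.113.0/24 --action=allow
gcloud compute backend-services update secure-backend --global \
    --security-policy=edge-policy

# Verify the SSL policy from a shell: TLS 1.2 succeeds, TLS 1.1 is rejected.
curl -sI --tlsv1.2 https://www.example.com/
curl -sI --tlsv1.1 --tls-max 1.1 https://www.example.com/
```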
BABI SEAL: So now you've secured your edge and deployed your services. The next thought that comes to mind is: how do I optimize my services? You'd like to optimize them for latency, and then for cost. We've already talked about our distributed edge infrastructure and the latency benefits that you get out of that. A few additional takeaways: when you do SSL termination at the edge, you get better latency, and when TCP retransmissions happen from the edge, you get much better latency performance. Talking about TCP optimizations: after the first connection is established from the Google front end to the server, we can optimize the round-trip latencies of subsequent connections-- we can eliminate most of those round trips and reduce handshake latency.

Now a really nice way to optimize for latency, especially if you have content that's cacheable, is to cache it at the edge. Use Cloud CDN-- all you have to do is enable a checkbox on your back-end service, and you'll be serving content and delighting your users with their latency experience. In fact, there was a public study done by cedexis.com that compared Google Cloud CDN with other providers-- you can refer to that link-- and it showed that Google Cloud CDN had lower latency, better throughput, and much better availability.

How do you make the web faster? This is something we at Google spend a lot of time working on and thinking about, and my two takeaways from this session are: please go take a look at QUIC and at HTTP/2, and see if they can provide you latency benefits. QUIC stands for Quick UDP Internet Connections. If you've used Chrome or YouTube, your traffic has gone over QUIC. QUIC is a UDP-based transport that's encrypted and optimized for HTTPS. Among its many features, the three that help improve latency are quicker connection setup times, multiplexing of different streams, and no head-of-line blocking. When cedexis was doing the study of CDN throughput and we turned on QUIC, the throughput increased significantly: with lower latency we were able to set up more connections, and with more connections we were able to get higher throughput. The diagram on the right shows the advantage that QUIC has in terms of connection setup times. When you think of HTTP running over TCP and TLS, consider all the handshakes involved, and compare how, with QUIC, it's a lot quicker-- even more so if the server has been seen before, when the setup latency is effectively zero round trips. We also support HTTP/2 not just from the client to the load balancer, but also from the load balancer to the back end. Doing this enables us to support gRPC, with stream-based load balancing of gRPC streams to the back-end service.
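Both of these are toggles on existing load balancer resources; a rough sketch with hypothetical proxy and backend service names:

```bash
# Allow clients to negotiate QUIC with the HTTPS target proxy.
gcloud compute target-https-proxies update my-proxy --quic-override=ENABLE

# Speak HTTP/2 from the load balancer to the back ends (enables gRPC end to end).
gcloud compute backend-services update my-backend --global --protocol=HTTP2
```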
Now, optimizing for cost. When you egress the data center, you incur egress bandwidth charges. If you're optimizing for performance, we recommend that your return traffic take Google's performant, well-provisioned network and exit at the point of presence closest to the user, to give the best user experience. But let's say you have certain workloads that are cost sensitive-- they are not mission critical, and you want to optimize for cost. We give you the option of the standard tier. With the standard tier, traffic takes standard ISP transit to egress the data center, just like with other Cloud providers, and you save on cost. You can turn it on per load balancer, or per project.

And user-defined headers-- Oz showed a really cool demo of this. At the load balancer level, you can configure custom request headers to be added to requests: the geolocation, the TLS parameters, the smoothed round-trip latency between the load balancer and the client. Capture all that information at the load balancer, send it to your back-end instance, and the application logic in the back end can use it for the optimizations you are looking for. So now I'm going to ask Oz to pull it all together for this session.

OSVALDO COSTA: Thanks, Babi. So just to summarize some of the best practices that we have: the first point is how to pick the right load balancer. We have this chart, which is available on our public site, but some of the main points I would highlight are: make sure you understand what type of traffic you have-- external versus internal? What type of client-- v6 versus v4? What type of protocol-- HTTP or HTTPS? Do you need host and path rules? Is it TCP? Is it UDP? How are the sessions distributed, and do you need TLS termination? This helps you pick the best load balancer for your use case.

Once you have selected that, a few points that I wanted to highlight. The first one is: secure your service. We have seen several of the best practices here in the demo, and we also talked through them. TLS everywhere-- that's the first point. Do TLS on your front end using your own cert, or leverage the Google-managed cert so you don't have to worry about renewals. Use self-signed certs, for example, on your instances for the communication between the proxy and the back ends. Leverage the VPC firewall: as Babi mentioned, make sure you harden your VPC firewall so it does not allow traffic that you don't expect. And use SSL policies as well-- very important for compliance.

Then optimize for latency: pick regions closer to your clients, and leverage the cross-region global load balancer to make use of the waterfall algorithm, which sends traffic to the closest available region based on RTT. If you're using Kubernetes, take a look at container-native load balancing with network endpoint groups, because that may also help you with latency.

Also, optimize your autoscaling. Our load balancers don't need to be warmed up-- they are ready to process traffic. However, your back-end instances do need to be ready, so make sure that you configure those instances with the right autoscaling criteria and the timers your application needs. And be careful with health checks: they can be chatty, and the way the architecture works, there is a back-end manager that is responsible for doing all those health checks, so you can set and customize those timers based on your application requirements.
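A hedged sketch of those autoscaling and health check knobs on a managed instance group; the group, zone, health check name, and values are illustrative only:

```bash
# Scale on load-balancing serving capacity (80% of the configured max rate)
# and give new instances a 30-second cool-down before they count.
gcloud compute instance-groups managed set-autoscaling web-mig \
    --zone=us-west1-a --max-num-replicas=10 \
    --target-load-balancing-utilization=0.8 \
    --cool-down-period=30

# Keep health checks from being chattier than the application needs.
gcloud compute health-checks update https my-hc \
    --check-interval=10s --timeout=5s --unhealthy-threshold=3
```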
Auto healing is another option that you can use. Just be aware that if auto healing detects that your application is not ready, it's going to destroy that VM and recreate it, so it will take longer. If you are using the global load balancers-- the HTTP(S) load balancers or the TCP or SSL proxies-- combine them with Cloud Armor, be it the allow and deny lists or the WAF layer with the other features that Babi mentioned. Also use the capacity scaler. We talked about it: you can set the capacity to a small percentage, so when you have a new version, you can deploy it with just that instance group and send it just a small percentage of traffic to test. And by the way, it doesn't need to be in the same VPC-- it can also be in a separate VPC. Set the appropriate connection timeouts-- make sure you understand your application and how long those flows need to stay up-- and use cookie- or client-IP-based affinity, as we discussed.

Now for the layer 4 network load balancer, the default is the 5-tuple hash, so you get a better distribution. However, in some cases you may have fragmented packets, or sessions where you need stickiness based on the client source IP, so you can customize that. Also, on the ILB: Babi mentioned that the ILB does not really exist as a middle proxy. The way it works under the hood is that the back-end instances are configured to respond to packets sent to the VIP. So if you're bringing your own custom VM image, make sure you configure that on your VM as well, so it can respond to packets sent to the VIP.

BABI SEAL: I think we're at the close of our session. But before we go, please take a moment to say a big thanks to the production crew. They were awesome, they work really hard, and I totally appreciate their work. [APPLAUSE] [MUSIC PLAYING]
Info
Channel: Google Cloud Tech
Views: 27,863
Rating: 4.9141107 out of 5
Keywords: type: Conference Talk (Full production); pr_pr: Google Cloud Next; purpose: Educate
Id: HUHBq_VGgFg
Length: 50min 48sec (3048 seconds)
Published: Wed Apr 10 2019