[UPBEAT MUSIC PLAYING] PRAJAKTA JOSHI: Hi everyone. It's a pleasure to be here
at Next with all of you. I'm Prajakta. I'm a product manager
in Cloud Networking. MIKE COLUMBUS: Hey everyone
thanks so much for being here. I'm Mike Columbus. I am a manager in the
Network Specialist team. So we work with
customers like yourselves for all things networking on
your journey to Google Cloud. PRAJAKTA JOSHI: So
what we wanted to do was to start this talk by
introducing you to our Cloud Load Balancing family. We are going to spend most
of this session getting to know five members: global HTTP(S) Load Balancing, SSL Proxy, and TCP Proxy, and then we've got two other flavors that are regional, which are Network LB and Internal L4 LB. And one of the things that many
of you know but some of you may not, is that these load
balancing flavors actually underpin a lot of things,
especially load balancing on the Compute Engine side, or whether it's Cloud Storage, or the GKE side, which is the managed Kubernetes side, or even self-managed Kubernetes on GCP. So we thought, let's start
with the basics, which was the first flavor that we have. And then Mike, give us a tour
of Network Load Balancer. MIKE COLUMBUS: Sure, thanks. So around two years
ago, we published a paper that revealed
the architectural details of the Maglev. Now, the Maglev has
been around for a while. It's our in-house
developed, all software load balancer that runs on
standard Google hardware. This load balancer
has been handling the majority of Google's
traffic since 2008 and is a key part
of our architecture. Maglevs handle traffic as it comes into our data centers. They direct traffic across
our front end infrastructure. And they're also exposed
to you as our Network Load Balancer for Google Cloud. So when you create a Network Load Balancer, we use forwarding rules that direct TCP and UDP traffic, using hashing, across regional back ends that can be autoscaled. So it's a network-based
load balancer. We look at the source and destination IP, port, and protocol. And you can, you
know, manipulate the hashing algorithm
as you see fit. When you create a
forwarding rule, you get a VIP, or a
Virtual IP address, that originates from a regional network block in the region where you tie that forwarding rule to your back ends. So this VIP is anycast out
our global points of presence, but the back ends are regional,
meaning you cannot have back ends that span multiple regions. This is a pass
through load balancer, so your back ends are going
to see the original client request. It's not doing any TLS
offload or proxying. So traffic gets directly
routed to your VMs. And that's where you
can use the Google firewall to control or filter
access to the back end VMs. So in this diagram, you see we
have three regional back ends, three forwarding rules. We have Maya in California,
Bob in New York, and Shen in Singapore. And they're all connecting into
their back end resources--my app, test, and travel. But if Shen were to connect into the US-West back end, again, we would ingress that traffic closest to Shen in Singapore, because that range is anycast out our network. And then we would
route that traffic to the regional back end. So let's take a little
bit of a deeper look under the hood of the Maglev. So on the left here, you see a
traditional middle proxy or edge proxy load balancing design, which is typically deployed in active-standby pairs, or highly available pairs. They're scaled up as
more capacity is needed. And typically, if you have
massive spikes in traffic, this needs to be
pre-warmed to prepare for the influx of traffic. On the right hand, you can
see the Maglev architecture. So again, this is an all
software-based solution that runs on our
standard servers and it's horizontally scalable. The way this works is,
in a traditional sense, a load balancer is primary
or active for its VIP, right? In the Maglev architecture,
we announce all VIPs to our edge routers via ECMP. So we'll distribute
flows based on hashing-- so equal cost routing-- to our Maglevs. The Maglevs will then
use consistent hashing. What this means is that as Maglevs, or load balancers, are added to the pool, we'll get consistent results when that back end is selected. The Maglevs then encapsulate that traffic based on the virtual network ID for the customer and send it to the back end service. And then we use direct server return, so the back ends send the return traffic directly to our edge routers. So the Maglevs are not in the path for the return flow.
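To make that consistent-hashing property concrete, here is a minimal Python sketch using rendezvous hashing over a connection 5-tuple. It only illustrates the idea of stable back-end selection as the pool changes; it is not Google's actual Maglev lookup-table algorithm, and the back-end names are made up.

```python
# Minimal sketch of consistent hashing over a connection 5-tuple.
# Illustrative only; not Google's Maglev algorithm.
import hashlib

BACKENDS = ["maglev-1", "maglev-2", "maglev-3"]

def five_tuple(src_ip, src_port, dst_ip, dst_port, proto):
    return f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{proto}"

def pick_backend(flow, backends):
    # Rendezvous (highest-random-weight) hashing: every backend scores the
    # flow and the highest score wins.  Adding or removing one backend only
    # remaps the flows that backend would have won, so most existing
    # connections keep hashing to the same machine.
    def score(backend):
        return int(hashlib.sha256(f"{backend}|{flow}".encode()).hexdigest(), 16)
    return max(backends, key=score)

flow = five_tuple("198.51.100.7", 52311, "203.0.113.10", 443, "tcp")
print(pick_backend(flow, BACKENDS))                 # stable pick for this flow
print(pick_backend(flow, BACKENDS + ["maglev-4"]))  # most other flows unaffected
```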
So I want to recap what features we have with our Network Load Balancer. Again, it's a regional solution. So we have high availability by deploying your back ends in multiple zones. It does not look at Layer 7; it load balances TCP and UDP. The back ends will see the
original client request. So there's no need to pass
an X-Forwarded-For header or use proxy protocol to
inform the back ends of what the original client request was. And you can control this
access with our VPC firewall. We balance based on
2-, 3-, or 5-tuple hashing. And you can get affinity by using 2- or 3-tuple hashing, so that if you hash on source IP and destination IP, a given client will always choose the same back end. We health check the back ends. And this is a very high
performance solution. So we published a paper. We handled 1 million requests
per second with no pre-warming. As for what it does not provide: there are no global back ends for the Network Load Balancer. It also only supports IPv4, so there's no IPv6 support. There's no Layer 7 traffic routing, and no TLS termination or offload. So with that, I'll hand
it over to Prajakta. PRAJAKTA JOSHI: Thanks, Mike. So that is how-- so we had
the network load balancer. And then, given
all of the reasons that Mike spoke
through, that's how we decided to go on and
build a Global Load Balancer. Now, most of you are familiar
with the kind of load balancing that you see in
other public clouds. And the load balancers
are regional. And what that means
is, per region, you need a virtual IP
for your load balancer. And then your back ends are
simply in that region as well. So now imagine you
wanted to set up your service in three regions. And then you would need
to go put three VIPs. And then, obviously, you
need a DNS Load Balancer or a Global Server Load Balancer
to go pick one of those VIPs for your given client. And this has actually
several issues. So as an example, if
something were to go down, first of all, your
DNS load balancer has to go and understand
something went down. The second thing is your
local DNS to the client could go and cache the
IPs, and that could result in suboptimal selection. And the other thing
is, imagine you ran out of capacity in one region. The regions are silos. So the load balancer in
one region cannot leverage the capacity in another region. This obviously would not have
worked for Google Services. And so this is when
we thought, why don't we rethink how global
load balancing is done? And the idea was to not depend
on the DNS load balancer to go do load balancing for you. So now imagine,
first of all, you don't build out
the load balancer using managed VMs, or
instances, or servers. Instead, we take all of Google's global edge infrastructure and we put the load balancing logic there. And this was along with a
bunch of different systems that interoperate to do load
balancing for your clients. Which means it doesn't matter
where you put your application instances, you get a
single front end VIP. You get worldwide
capacity because now that VIP is fronting
all of your capacity across all of your deployments. And then you get
cross region failover and so on and so forth. And one of the really nice
things about the Load Balancer is that it's very tightly
coupled with our auto scaler, which means
as your traffic grows, it will scale up your
back end instances. And then as your traffic ebbs,
it'll scale them down as well. It's also-- one really
nice side effect of it is the VIP becomes,
or our global VIP becomes a very nice place to go apply
all your global policies. And the other thing is you
don't have a choke point, because these are not just
point implementations with VMs and so on. So a really nice
way to think about, you know, what does this
Global Load Balancing really look like? So often we have customers
who will start off with a single region and they
will go deploy back ends there. They will configure
the load balancing VIP. Now, it's very likely
that as your service grows in popularity, you will
need to bring up application instances in another region. And so as you can see here, once
Bob in New York comes on board, all you need to do is go
add application instances in a region in US-East. You don't need to
change the VIP. You don't need to put a DNS
load balancer, because you still have only one VIP. In this case, it's 200.1.1.1. And then you can
just expand wherever Google has data centers by
simply putting instances there. But as you can notice, there
was no change to the VIP. There was no change
to your DNS server. And you definitely did not
need any of the things related to DNS load balancing. One of the other things
which I spoke about was the capacity that you get. So in this case, for
instance, let's say your instances in
Singapore are overloaded. So Shen in Singapore would
be very seamlessly directed to any of the other
available instances. And it's also possible that
the California instances for your app are overloaded. And so your users could be
overflowed back into New York. And once the capacity
is available-- because, basically, the
autoscaler scaled up-- then the clients would
be brought back. So a lot of people ask us,
like, what does this look like under the hood? So because of all of our
mental models of load balancers are these boxes,
this is actually what the global load balancer
looks like under the hood. We have a tiered edge. So we do part of the load
balancing and termination at our Tier 1. And then we do some
of the load balancing, the remainder of the load
balancing logic at the Tier 2. So that's what you see, the
two tiered architecture. And then your
application instances are in the last, which
says instances there. That's where you would be
running your application software. The other thing
that a lot of people ask us is, so you have
all of these instances, and how does the load balancer
go distribute traffic? What it does is it fills
up the local region to capacity for a given client. Then it'll fill up basically
all the zones within a region. Then, if you're out of
capacity in a region, it will overflow to
the closest region. And then, within the zone
itself, it round-robins. You can totally
forget all of this because you don't need
to worry about this. This happens automatically. But the key takeaway is as long
as you have available capacity in the region that's
closest to your user, your user will land there. You will overflow or fail over only if there's some issue--a capacity
issue, or availability issue in the region
closest to the user. The other thing that
we wanted to walk you through is, this is
what's under the hood. So what does the data model
for the Global Load Balancer look like? So you have essentially the VIP, which is the global forwarding rule. Then you have the proxy. So if you're running HTTP traffic, your proxy will be HTTP, or it's HTTP(S), or SSL, or TCP, and so on and so forth. Then you've got the back end side of it. This is where you're actually putting your application instances. And so a service maps to a back end service. And then that could have application instances within instance groups in one or more regions, depending on where your users are.
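As a rough sketch of how those objects chain together, the following Python outlines the gcloud calls that create each layer, from health check up to the global forwarding rule (the VIP). Resource names are hypothetical and the flag set is trimmed to the essentials; treat it as an outline rather than a production script.

```python
# Minimal sketch of the global HTTP(S) load balancer object model, expressed
# as the gcloud calls that create each layer.  Names are hypothetical.
import subprocess

def run(*args):
    print("+ gcloud compute " + " ".join(args))
    subprocess.run(["gcloud", "compute", *args], check=True)

# Back-end side: health check -> backend service -> instance group backends.
run("health-checks", "create", "http", "web-hc", "--port=80")
run("backend-services", "create", "web-backend-service",
    "--protocol=HTTP", "--health-checks=web-hc", "--global")
run("backend-services", "add-backend", "web-backend-service",
    "--instance-group=web-ig-uswest1", "--instance-group-zone=us-west1-a",
    "--global")

# Front-end side: URL map -> target proxy -> global forwarding rule (the VIP).
run("url-maps", "create", "web-map", "--default-service=web-backend-service")
run("target-https-proxies", "create", "web-proxy",
    "--url-map=web-map", "--ssl-certificates=www-cert")
run("forwarding-rules", "create", "web-vip", "--global",
    "--target-https-proxy=web-proxy", "--ports=443")
```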
So this was a whirlwind tour. But what I wanted to do--Mike, why don't you take us on a tour of this through a demo? MIKE COLUMBUS: I'd be happy to. Thanks, Prajakta. So today we're going
to be interfacing with an HTTP(S) Global
Load Balancer Deployment. It's live up and running. And I wanted to
take a few minutes to talk through the components
and how they're configured, and really solidify the object
model that Prajakta just spoke of. And what's really cool is if
you're one of the select few who are online during this
conference on the network, you can connect to this fancy
front end we built for you. So if we can switch to the demo. All right. So here we're in
Google's Cloud Console. We're going to navigate
to Network Services. And this is where
we'll configure load balancing DNS and CDN. In this exercise,
we'll be taking a look at our load balancers. You can see we have
three in this project. The first two
we'll get to later. The third one is what we're
going to go through today. This is an HTTP(S)
Load Balancer. So we're going to drill
in a little bit on what's configured. So in the Cloud Console, it
really breaks this object model down into three components. The first is the
front end, right? Host and path rules,
which is basically like, we'll look at the URL
request and route that to the appropriate
back end service. And then the back end service. I'll go through each component
in its configuration here. So we just spent a bunch of
time talking about a single VIP. There's two here. You might be saying,
Mike, what gives? It's because the
first VIP is for IPv4. The second VIP is for IPv6. So basically, for the domain
name, www.gcpnetworking.com, we're going to take
in HTTP(S) traffic and direct this to
our target proxy. So when you create an HTTP(S)
front end global forwarding rule, you need to
have a certificate. So we support SNI. But I kind of cheated here. This is a little
bit of a sneak peek. I used a Google-managed cert. So this is going to manage the
lifecycle of the cert for me-- super easy. I just, you know, when
I deployed the cert, I configured the domain. And using Cloud
DNS, I pointed an A and quad A record to these IP
addresses, and it deployed. SSL policy--we'll get to that in a little bit. Further on in this talk, we're going to additively do some additional things to this environment. We'll secure the environment. And the network tier is Premium. This is our default tier. This means that
these VIP ranges come from a global pool of
addresses that will anycast out our global pops, right? So kind of back to the DNS thing
that Prajakta mentioned, right? One A record, one quad A record,
route clients via anycast, just BGP routing to
the closest front end. So from there, we'll talk
about host and path rules. So in this environment,
it's pretty simple. You don't need to configure
these for the HTTP(S) Load Balancer. It's kind of important, right? That's why you might want
a Layer 7 Load Balancer. What these do is that we
look at the incoming URL and direct the traffic to
the appropriate back end service based on this matcher. So you can do path matches. In this environment,
it's pretty simple. Let me zoom in here further. Make it a little easier to see. So anything unmatched will go
to our Web back end service. Anything slash secure is going
to go to our Secure back end Service. The back end
services themselves. So these are made up of a
collection of instance groups. The instance groups
can be deployed in any one of our 20 regions. In this environment,
I picked US-West 1-- it's fairly close--
and Europe-West 1. But in addition to adding
your instance groups into your back end
service, it's also a collection of configuration. So you're going to want to group
your applications appropriately into back end services
for this reason. So the first thing that's
configured in the back end services is your
endpoint protocol. So we pitched TLS everywhere. I was a little bit lazy. I did HTTP. But it's fairly common
to use self-signed certs in the back end
because, you know, your clients are connecting
into the front end proxy. You'll notice the named port. Where this is important--
and this is a configuration that's part of the
instance group-- is that this is
basically a key pair. So if you wanted the Load
Balancer to direct traffic to your instances on the
back end on, you know, a nonstandard port, you'll set
a named port in the instance group, associate
that to the port that you want to
receive traffic on. And the back ends will receive
traffic from the front end or from the Google Load
Balancer on that port.
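A minimal sketch of that key pair, with hypothetical group and service names--the instance group maps the port name to 8080, and the backend service refers to the name rather than the number:

```python
# Named ports: the instance group maps a name ("http") to a serving port,
# and the backend service targets the name.  Names are hypothetical.
import subprocess

subprocess.run([
    "gcloud", "compute", "instance-groups", "set-named-ports",
    "web-ig-uswest1", "--zone=us-west1-a", "--named-ports=http:8080",
], check=True)

subprocess.run([
    "gcloud", "compute", "backend-services", "update", "web-backend-service",
    "--port-name=http", "--global",
], check=True)
```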
or long lived HTTP connections, you're probably going
to want a timeout of greater than 30 seconds. This is a fairly static page so
I didn't muck with this at all. We enabled Cloud CDN. It's a simple check box in your
back end service configuration. So if you had, you
know, dynamic content that you didn't
want to cache, you could leave it
off for a back end service associated with that. And then, you know, enable
it for static content. The other thing I
forgot to mention is in your host
and path rules, you can also direct to a back end service or to a back end bucket. So I wanted to just
mention that real quick. The back end
service health check is configured on the
back end service. So we're using a basic HTTP
check for our back end services to make sure that
they're healthy. And then there's some
advanced configuration. So we can configure session
affinity and the connection draining timeout--so basically, if we're going to be removing a VM, how long do we want to keep those existing connections active. We can associate
a security policy. It's a little bit
of a teaser there. We'll do that later. And also identity aware
proxy, which we'll talk about. The other thing is, you can
also add custom headers. So if we wanted
to insert a header or have the HTTP(S)
Load Balancer inform us of say, geolocation
or some custom header, we can insert those on a per-back-end-service basis. So this back end
service consists of three instance groups. You'll see everything's
healthy, which is great. You've got to love live demos. For back ends, we have an instance group in Europe-West 1 and US-West 1. You'll notice we
set the auto scaling to 80% of our target, which is
300 requests per second per VM. So auto scaling
will be triggered, in this case, 80% of
300, which is 240, assuming that capacity is 100%. This is an additional
knob that's really important for
Canary or A/B releases. So you might notice this new
web, you know, instance group. This is really like if
I want to spin up a test and direct a subset of traffic
to this instance group. I set the capacity for 10%. So 10% of 300, so about
30 requests per second will go to this
new instance group. So it's great for canary releases and additional control.
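A minimal sketch of those knobs, using hypothetical group names--RATE balancing at 300 requests per second per VM, with the capacity scaler dialed to 10% on the canary group:

```python
# Backend capacity knobs from the demo: RATE balancing mode with a per-VM
# max rate, and a capacity scaler for the canary group.  Names are made up.
import subprocess

def add_backend(group, zone, capacity_scaler):
    subprocess.run([
        "gcloud", "compute", "backend-services", "add-backend",
        "web-backend-service", "--global",
        f"--instance-group={group}", f"--instance-group-zone={zone}",
        "--balancing-mode=RATE", "--max-rate-per-instance=300",
        f"--capacity-scaler={capacity_scaler}",
    ], check=True)

add_backend("web-ig-uswest1", "us-west1-a", 1.0)  # full-capacity backend
add_backend("new-web-ig", "us-west1-a", 0.1)      # canary: ~10% of its rate
```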
Now, the next thing I want to do is contrast the load balancing configuration with the
configuration of an instance group itself. So we'll pick an instance
group, which is US-West 1. And let me just zoom
out a little bit. And we'll edit this group. So in the instance
group itself, here is where you specify
the key pair. So we're mapping
HTTP to port 80. We could do, you know, 8080 on
the back end if we wanted to, or 8443. This instance group is built
off a template, which basically calls in a custom
image, trying to keep the boot time relatively short. We have auto scaling on based
on HTTP(S) Load Balancing usage. We can also use CPU
usage or custom metrics. But we want the auto scaler. Because we see the
requests per second, we want the auto scaler
to trigger auto scaling. We have a minimum of one
instance, a maximum of 10. And the cool down period
is relatively important. What this means is
it's the time we're going to wait before we collect statistics on a new instance. So if you have long-running
startup scripts, you're going to want to know
how long your instance takes to boot and set appropriately
in your cool-down period.
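A minimal sketch of that autoscaling setup, with a hypothetical group name; the 60-second cool-down here is an assumption, so set it to match your actual boot time:

```python
# Autoscale a managed instance group on HTTP(S) load balancing utilization,
# 1-10 instances, 80% target.  Group name and cool-down are placeholders.
import subprocess

subprocess.run([
    "gcloud", "compute", "instance-groups", "managed", "set-autoscaling",
    "web-ig-uswest1", "--zone=us-west1-a",
    "--min-num-replicas=1", "--max-num-replicas=10",
    "--target-load-balancing-utilization=0.8",
    "--cool-down-period=60",
], check=True)
```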
The other thing is auto healing. So auto healing is the ability to have an additional health check that will recreate
the VM if it goes unhealthy or it's unresponsive. Typically, your auto
healing health check, you want to relax a
little bit because it's a more expensive
operation to reboot a VM or rebuild a VM versus
just taking it out of the healthy available
pool for your back end. But that's there. And it's turned on here with
an initial delay of 30 seconds. This is similar to
the cool down period. We're going to wait 30
seconds before we health check this back end VM. So we've gone through
the configuration. Another thing I have set
up here is some load. So I want to show just a little
bit of monitoring our Load Balancer. So jump back in here. So here's our web
back end service. I have two load generators,
one in North America, one in Europe. Both are sending about
1,000 requests per second. And what this graph shows
us is that traffic arriving at our North American front end locations is always being directed to our US-West 1 back end service, or instance groups. Now, this goes to the
distribution, right? We're always going to select
the healthiest back end with available capacity. You can see a small
subset of traffic is being directed to
our Canary release site, which is our new web
back end instance group. And then in Europe,
all those requests are being served by the
Europe-West 1 instance group. And then the last thing
we'll do is, let's just-- we can connect to the demo site. So hopefully some of you
were able to access this. So it's a pretty
simple web page. The client IP is the
original client IP so originating from
probably some NAT Gateway here at the Moscone. Load Balancer IP is
the Load Balancer VIP. The proxy IP is
basically the connection, so the actual IP address that
the VM back end will see once that connection is proxied. And these are known
ranges of addresses that you're going to want to
allow through your firewall, your VM firewalls. You can see the host name
we're connecting into. We're in San Francisco. We're connecting to US-West 1. I'd be a little alarmed
if this was Europe. And you can see the
region and zone. So with that, let's
jump back to the slides. Let's talk a little bit
about container networking. So for those running
containerized workloads on Google Cloud, I
want to talk about how we've enhanced the experience
and really made containers first class citizens with
regard to our Load Balancing offerings. So I'm really excited to talk
about Network Endpoint Groups. So what Network
Endpoint Groups are, they're a collection
of network endpoints. So fundamentally, this
maps to a port and IP pair. So as a precursor to
Network Endpoint Groups, VMs can have multiple
ranges associated to them. So we use alias IP ranges. And in that sense, a VM can
host multiple containers with VPC routeable IP addresses,
along with serving ports. So now, with Network
Endpoint Groups, we can target these IPs
along with the serving port, as an endpoint, and health check and load balance accordingly--much more accurately than without Network Endpoint Groups. So the diagram on
the left here-- I apologize, it's
a little choppy-- shows a Kubernetes deployment
without Network Endpoint Groups. And one of the
things you will see is that the Load Balancer
targets nodes, not endpoints, not containers themselves. So what happens is traffic
arrives at the Load Balancer. We select a back end node. And then, with
Kubernetes Engine, we have kube-proxy, which basically configures a bunch of iptables rules almost as a secondary load balancer to distribute
traffic to the back end pods. So depending on how many
pods-- so let's assume you have an equal distribution
of pods across your nodes. Well, you have only a 1-divided-by-n chance, where n is the number of nodes, that the pod selected is on the local node your traffic arrives at, so most traffic gets forwarded on to another node. So in this case, traffic
comes into the node. iptables will do some masquerading and send that traffic to a pod located on another node. It also source-NATs. So you have these extra hops. Then the traffic returns
from the pod back to the original node, and
then back to the Load Balancer and then back to the client. So it's fairly suboptimal. Now, you can turn
on a mode which is known as Local
Only, which will just target the pods
local to the node that the traffic arrives on. But even then, you're not going
to get an even distribution. On the right hand diagram, we
show Network Endpoint Groups. In this model, the load
balancer sends traffic directly to the port and IP
pairs for the containers. This is enabled with a
simple annotation, right? This was an alpha feature. But with one simple
annotation, you can eliminate a lot of hops. And I'm going to show a
quick demo on this later.
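A minimal sketch of that annotation on a GKE Service--the cloud.google.com/neg annotation used for container-native load balancing. The Service and app names are made up, and the manifest is emitted as JSON, which kubectl apply -f also accepts:

```python
# One-line change that turns on container-native load balancing (NEGs)
# for a GKE Service: the cloud.google.com/neg annotation.
import json

service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {
        "name": "neg-demo-svc",
        "annotations": {
            # Ask GKE to create zonal Network Endpoint Groups so the HTTP(S)
            # load balancer can target pod IP:port pairs directly.
            "cloud.google.com/neg": json.dumps({"ingress": True}),
        },
    },
    "spec": {
        "type": "ClusterIP",
        "selector": {"app": "neg-demo"},
        "ports": [{"port": 80, "targetPort": 8080}],
    },
}

print(json.dumps(service, indent=2))  # pipe to: kubectl apply -f -
```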
The next bit I want to talk about is multicluster ingress.
application running on Kubernetes Engine
in different regions-- so you have different
regional clusters setup but the same
application-- you can now deploy a
multicluster ingress. Which basically, you can
register your container clusters to and create
a unified ingress that can load balance client traffic
using all that global load balancing magic to the
appropriate container cluster. So this is an enhancement to our kubectl interface. So with kubemci, you can create this new ingress that will map multiple clusters
to a single Load Balancer. All right. So here's the demo
environment again. I want to show Network
Endpoint Groups. And I'll prove this. There's no smoke and
mirrors going on here. So I have one cluster, which
is Cluster A. That is deployed with Network Endpoint Groups. So we have a simple
web application that's just returning its host name. We have load
generators, so we're going to blast a bunch
of traffic to it. And we're going to observe some
monitoring metrics on this. The other cluster is deployed
in the same zone, US-West 1A-- same number of nodes,
same application, same load testing
characteristics. So we're sending 50
concurrent clients connecting into this environment. But this is not using
Network Endpoint Groups. All right. So let's jump back
into our demo. All right. So back in the Cloud
Console, I just want to give you a quick
tour of the environment so you can't call me out later. So here you can see
our two clusters. They both have three nodes. They're basically identical. As far as the deployments
we have in these clusters, you can see our Network
Endpoint Group demo app has five replicas in our
Cluster A, which is our Network Endpoint Group cluster. The non-NEG or, you know, the
traditional app is running in Cluster B. And then if we look
at the services, both of these clusters are,
again, running five pods. And then we have ingress of
the HTTP(S) Load Balancer. So if we jump over to the
HTTP(S) Load Balancer screen, you'll see that in our network
endpoint group cluster, you notice that the back end
service is targeting a network endpoint group. So these are built zonally. So if we had a
regional cluster, you would see a network
endpoint group per zone. But again, these are
port and IP pairs. And if we jump back to
the non-NEG cluster, you can see that
the Load Balancer is targeting the nodes, right? So you know, IP tables is
involved, masquerading packets. So if we look at
Stackdriver monitoring, there's a couple of
interesting observations. So I'm capturing or
charting two metrics. The first is
requests per second. And the other is
aggregate network traffic on the back end nodes. So even though the load
testing is the exact same, you'll notice we're
able to send about 10% more connections to the
Network Endpoint Group cluster. I think this is-- don't take my word for
this-- but I believe this is because the transactions
are finishing faster so the performance-- the traffic generator is
able to pump more traffic to the Network
Endpoint Group cluster. This is key here, though. You'll notice that the
Network Endpoint Group cluster has about half
of the aggregate traffic of the non-Network
Endpoint Group cluster. So this is substantial, right? This is all a factor of
eliminating those hops. But the other thing
is, like, imagine if you're running a
regional cluster where you have nodes in different zones. This could amount
to huge cost savings because you're not getting
charged zonal egress between the nodes, right? So pretty quick demo there. Hopefully that was helpful. I'll hand it back
over to Prajakta. PRAJAKTA JOSHI:
Great thanks, Mike. So what we did is
so far, we wanted to provide you a very
close look at Global Load Balancing and then our
Container Load Balancing. The next half of this
presentation we'll just quickly walk you
through all of the features. Many of these actually
have their own sessions. But the idea is
that you're exposed to a lot of these features. So the first one of those
is Internal Load Balancing because not every
service of yours needs to have a public-facing VIP. So that's the first one. You can see a very traditional
internal Load Balancer where you've got a
client in RFC 1918 space, Load Balancer VIP in RFC
1918 space, and a back end. This is also a traditional
3-tier app where the first tier is essentially public facing. And then the internal
load balancer is used for rest of the tiers. The most interesting
bit, actually, about the internal load
balancer is that there is really no internal load balancer. So between your clients
and your back ends, there is really no middle
proxy sitting and doing load balancing. We have Andromeda, which is our
network virtualization stack. And then that programs the client, along with health information, so that the client itself can
go pick the optimal back end. So that way there's
no choke point in your network with the internal load balancer.
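A minimal sketch of standing up an internal load balancer--region, subnet, and resource names are hypothetical placeholders:

```python
# Internal TCP load balancer: regional backend service plus an RFC 1918
# forwarding rule with the INTERNAL scheme.  Names are illustrative.
import subprocess

def gcloud(*args):
    subprocess.run(["gcloud", "compute", *args], check=True)

gcloud("health-checks", "create", "tcp", "ilb-hc", "--port=80")
gcloud("backend-services", "create", "ilb-backend-service",
       "--load-balancing-scheme=INTERNAL", "--protocol=TCP",
       "--health-checks=ilb-hc", "--region=us-west1")
gcloud("backend-services", "add-backend", "ilb-backend-service",
       "--instance-group=app-ig", "--instance-group-zone=us-west1-a",
       "--region=us-west1")
gcloud("forwarding-rules", "create", "ilb-vip",
       "--load-balancing-scheme=INTERNAL", "--region=us-west1",
       "--network=default", "--subnet=default",
       "--ip-protocol=TCP", "--ports=80",
       "--backend-service=ilb-backend-service")
```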
The other thing is, you know, when you start with private
services, the next set of things that you're
going to go tackle is essentially securing
the edge, because all of your internet-facing services
are going to get DDoSed. And these would be either
infrastructure DDoS attacks, or these would be even
application level attacks. So we have a bunch. Like, when you actually
secure your network, you want to do it all the way
from VPC to your edge network. And we have several great
sessions on each of these. One thing I want to point
out is one best practice is to put TLS wherever you can. We don't charge extra for TLS,
so that should be your default. The second thing is, if you have
this need for multiple certs, we have a feature that lets you
put multiple certs on a given load balancer. And then many of you
complained about the toil of procuring certs. So we do have managed
certs which are in alpha. They will be available in
beta this year at NEXT. [APPLAUSE] The other thing is, many of you
said, can I turn off TLS 1.0 because I need to pass
my PCI compliance test. So we also have custom SSL
policies available for you. Either you can outsource-- you
can select what kind of policy you want, and you can
outsource management to Google. Or then you can go create
your own custom ones if your IT department
has certain regulations around those. The other feature that
I also want to call out is as a best
practice, you should be turning on Global LB with
TLS, all of the features we've talked about, Cloud Armor. And then if you are
controlling access based on user identities,
you would also layer in Identity-Aware Proxy. So all of these work
together pretty well. And then also, if
Cloud Armor says drop traffic and Identity-Aware
Proxy says allow traffic, it's a combined thing. So it will essentially
drop the traffic. So they all work pretty
nicely along with each other. What I wanted to do now is--and of course, we're not going to deep dive into the features
of Cloud Armor. We had several
sessions on those. It goes all the way
from IP blacklist, whitelist to sophisticated
Layer-7 type defense. But we wanted to do a
quick demo with Mike. MIKE COLUMBUS: All right. Back again. So today for this
session, we're going to be tying some security
policy to our existing demo that we just went through. So I pinned network
security here. So we're going to be configuring
an SSL policy as well as a Cloud Armor policy. And I'll show you how that gets
attached to your Load Balancers to provide additional
protection. So we'll start with Cloud Armor. You can see I have a Cloud
Armor Security Policy here. This is a simple policy, but you
can always add to these later. So this is titled, Allow
Conference Net Block. So the idea here is that this
is an IP Whitelist Blacklist policy. So the rule set
is fairly simple. We're going to allow any-- so I got the Moscone
network ranges. Hopefully these are accurate. Hopefully you get
online to test these. So basically, we're
going to allow this. So you can see some
IPv4 ranges in there, along with an IPv6 range. And then we're going to
deny everything else, right? So if you have an
application that, you know, you want to make sure you're
explicit as far as which IP addresses are
allowed to access that, this allows you
to obviously filter that at the edge. And don't worry about
it on your back end VMs. We take care of it. So once the rule's created, you
know, we've added our rule set, which is fairly simple here
with the deny at the end. The order that the rules are
evaluated is from low to high. So low is the first. So we created a
priority 100 rule here so we have some room to grow. We attached this
policy to targets. So you can attach this
policy to a single target or multiple targets. I already attached this policy
to our secure back end service. So the idea is if you are
on the local network here, you should have
access to www.https, www.gcpnetworking.com/secure/,
and make sure there's a tailing slash there too for access. If you're not, you're
not going to get access. So one way I can prove this is
I'm on the network right now. Let's open up a new tab and
connect to our Edge Security demo. And you can see we got there. The page looks a
little bit different. It's about the extent of
the customizations I did. But again, this is connecting
into the other back end service. One way to test that
this doesn't work is to use our Cloud Shell. So Cloud Shell is
basically a jump host pre-installed with our
software development kit. So basically, it's an easy way
to interface with your project programmatically. It's a Linux host. So we'll just cURL to the
secured back end service. And who thinks I'm going
to be able to connect? I hope not. And you can see I got a 403. So the actual error
response is configurable. You can send a 403, a
404, Not Found, or a 502. So in this rule, I
had configured a 403. So that's Cloud Armor. A very simple example
of Cloud Armor. And the key takeaway here
is that it's attached to your back end service. So if we wanted to
connect to, you know, the kind of the naked URL
here, it would work just fine. It's because we don't have the
policy tied to that back end service. SSL policies, as
Prajakta mentioned, is a powerful tool to
enforce compliance. So say if we wanted to
follow PCI guidance and say, all right, I only want
my clients to connect into my environment with
a minimum of TLS 1.2. It's kind of like
Cloud Armor, right? You create the policy,
but in this case, you attach it to the target
proxy and the global forwarding rules. So we created a policy that
specifies minimum TLS version is 1.2 with the
restricted profile. But, you know, you can
create custom profiles and explicitly allow
or disallow ciphers. We also have a modern profile to
support a wide set of clients. Then we also have a
compatible profile, which is a little bit more loose. Or you can create your own. So for this, we created a
minimum TLS version of 1.2 and we attached this to the Load Balancer.
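A minimal sketch of that policy and its attachment, reusing the hypothetical proxy name from earlier:

```python
# SSL policy: minimum TLS 1.2, RESTRICTED profile, attached to the proxy.
import subprocess

def gcloud(*args):
    subprocess.run(["gcloud", "compute", *args], check=True)

gcloud("ssl-policies", "create", "min-tls-1-2",
       "--min-tls-version=1.2", "--profile=RESTRICTED")

gcloud("target-https-proxies", "update", "web-proxy",
       "--ssl-policy=min-tls-1-2")
```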
So now, if we do a cURL command and specify the TLS version--let's try to connect
with version 1.0. You see we get an SSL
protocol version error. If we try 1.1, same error, right? We only support 1.2 or above. So when we specify 1.2 for the TLS version, it works, no problem. All right. So with that, I'll hand
it back to Prajakta. PRAJAKTA JOSHI: So again, the
next set of features are-- I'm going to put some
of the work on you in that, after the session,
do go and check out all of these features
because they do bring really good benefits to your services. The main thing: after you're done putting your services in place and then you've
secured the services, the next thing you
want to do is optimize. And I think the first thing that
people talk about optimizing is for latency. So the first, we already
spoke about the Global LB, we spoke about the two
tiered architecture. But the main takeaway
is your SSL termination happens at the edge. And that saves a lot
in terms of latency. The second thing is that once
your first handshake is done, then essentially after that,
the connections are maintained. And so your next handshake,
like you're not truly doing a handshake all
the way to the end. But for all of your
subsequent connections, you're essentially
saving on round trips. So you got this just by putting
the Global Load Balancer. One of the really good
ways to just shave down more of your latency is, if your content is cacheable, to turn on CDN. So we have Cloud CDN, which
is our native offering. And you can turn it on
with a single checkbox. So if you have a
Load Balancer, you can just turn on a checkbox,
as you can see there. And then you can enable it. There is also a public
report, at www.cedexis.com/google-reports/, where Cedexis measured Cloud CDN against other providers. And then you can see
that it does really well in terms of low
latency, high throughput, and availability. The other thing is
something that we've been thinking about at
Google for a long time, like, how do you go and
make the Web faster? And some of the key points
I wanted to call out. So we spoke about
the infrastructure, we spoke about the Global LB. There are two additional things
that I wanted to look at. One is HTTP2, which essentially
goes and improves HTTP. And then there is QUIC, which
is a protocol which is over UDP. And in fact, if you're using a
Chrome browser during the day, you've already used QUIC. And, you know, it does
some really cool things. So if you think of why
your latency is there, it's because of things
like head of line blocking. So there's a bunch of features
built into QUIC, which actually stands for Quick UDP Internet
Connections, that help you bring the latency down. I'm going to show you how
you would configure it. A lot of times, people
will say we support HTTP2. The first thing you need to
see is from where to where. So we support HTTP2 from
client to the Load Balancer, and then Load Balancer
to the back end. And you would need
that to run gRPC. Same way with QUIC. We support from client
to the Load Balancer. And we're the first
Cloud to actually support that, the first major public Cloud to do that.
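Enabling QUIC negotiation on the client-to-load-balancer leg is a single setting on the target HTTPS proxy. A minimal sketch, reusing the hypothetical proxy name (depending on your gcloud version this setting may sit behind the beta track):

```python
# Let the load balancer negotiate QUIC with clients that support it.
import subprocess

subprocess.run([
    "gcloud", "compute", "target-https-proxies", "update", "web-proxy",
    "--quic-override=ENABLE",
], check=True)
```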
And what it really gives you is this. So when we turned
on QUIC for our CDN, you can see the sharp
jump in throughput because you've brought
down latencies. You can essentially
have more connections. And then what you see on the
right is how did it bring down this latency? And you can see
that the handshake looks much smaller when you
have QUIC versus without QUIC. So we have a blog
post that we posted which describes all of
these details as well, but that is one of the features
you should be looking at to bring down your latency. Now, these are some of the
features that Mike already spoke about, which
is network tiers. So when your traffic
egresses out of Google Cloud, you're paying the outbound
data transfer costs. You should by default
use the Premium Tier, because that carries your return traffic over Google's global network. But you may have services
which, for instance, are a free tier for you, or
you may have services that are not mission critical. And if you want to bring
down the cost there, you can actually go configure
the Load Balancer to use, just like other public
Clouds, just standard ISP transit, which is
not Google's network, to go back to your user. And so the main
thing we want to do is that we wanted you to
have choice per workload. So you can actually have this
configuration per Load Balancer if you want to. Or, if you just want to put
it at the project level, which is our thing which holds
all of the load balancers, you can do that as well. This one is, again, something
that Mike spoke about, which is when you configure
your application instances, we have a feature called user
defined request headers which lets you put things such as
client geolocation, or RTT, or TLS information, and then
send it to your back end. So then your back
ends can use that, and they can use it in their application logic to make optimized decisions.
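A minimal sketch of those headers on a backend service--the header names are illustrative, and the {client_city} and {tls_version} placeholders are the load balancer's substitution variables for those fields (depending on your gcloud version this flag may require the beta track):

```python
# User-defined request headers: the load balancer fills in client geo and
# TLS details before forwarding to the back ends.  Names are illustrative.
import subprocess

subprocess.run([
    "gcloud", "compute", "backend-services", "update", "web-backend-service",
    "--global",
    "--custom-request-header=X-Client-City:{client_city}",
    "--custom-request-header=X-TLS-Version:{tls_version}",
], check=True)
```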
So, Mike, recap everything we spoke about. MIKE COLUMBUS: All right. So we tried to--as she walks off the stage. [LAUGHTER] All right. Some of these might actually
be new, but just kind of a cheat sheet, right? So the first thing
in general, right? Choose the optimal
Load Balancer. We have a few
different solutions. And there's not one
size fits all, right? So factors you want to consider,
do I need IPv6 support, right? Do I need TLS offload? Is it an internal versus
external application? Regional versus global? Do your back ends need
true client IP, right? These are going to guide you
to the right Load Balancer. Make sure you don't just start
deploying without knowing what you're getting yourself into. The other thing is optimize
for latency, right? So we have 20 regions. Deploy your workloads in the
region closest to your users. And if you're using a
Global Load Balancer, deploy to multiple regions
closest to your users. And use modern protocols, right? Like QUIC, HTTP2, right? All these things are
great to, you know, improve performance and also
help in lossy environments where the connections
might not be great. Secure your service. So there's a few
bits here, right? As Prajakta mentioned,
use TLS everywhere. Standardize on that. Use the VPC firewall to
augment your security policies. So health checks come
from known ranges. Any of our Proxy Load
Balancers, traffic comes from known ranges. Lock that down. These things are predictable. Secure at the edge with
Cloud Armor, right? Make sure you're using that functionality in addition to your other security measures--have a defense-in-depth strategy and offload some of that to our edge. The other thing is--and this is a common
one-- so your VMs do not need public IP addresses in any
of our Load Balancer solutions, right? Even though, you know,
that GFE IP, if you recall, when it connected back
into the back end, lock your firewall down
to that range, right? There's no need to expose your
back ends ends with a public IP address. Optimize for auto
scaling, right? Know the profile of
your application. Know how you want to auto scale. If you have a
dual-headed service, if you have an
instance group that's a back end of two
load balancers, really think through how you
want to scale that, right? Do you want to use
a custom metric? Because the load balancers
will scale independently. Understand and optimize
health checks, right? Try to health check as close
to the application as you can. And, you know, as I
mentioned earlier, it's a less expensive operation to remove a VM from a back end service's pool and then use the auto healer to reboot that VM should it
become unresponsive, but relax those
timers a little bit. The other thing I'll say here
is that our health checks can be quite chatty. So if you're doing logging
on your back end VMs, make sure you're
aware of our ranges so you can treat them
accordingly on your logs. For the ILB or Network
Load, Balancer, there's UDP considerations. So these load
balancers for UDP-- so if you think you're going
to get fragmented UDP packets, be careful with 5-tuple hashing. The reason is you can lose the
destination port as packets arrive on our load balancer, and
using a different load balancing algorithm. The other thing is with UDP--
this is a common thing I see-- make sure you're listening
on the actual VIP. So what happens is if you
put-- with the Network Load Balancer, the ILB, the
VIPs are known by the VMs. They get installed
as a Linux route. You want to listen on
that VIP on your back end VMs for UDP traffic. Use 5-tuple hashing
for the best entropy. A lot of times, people will load
test the Network Load Balancer, the ILB, and they'll send
the same source, dest port IP and protocol. And they're like, why isn't
this stuff getting distributed across my back ends? You know, you would want to throw a variety of traffic at it to see traffic get balanced, because it's using hashing. Use 2- or 3-tuple hashing
for session affinity. For the Global Load
Balancers, secure at the Edge. Use Capacity Scaler
for A/B deployments. It's a very powerful tool
where you could, you know, set capacity at
additional measure to start directing
traffic to new builds. Set the appropriate
connection time out. Use cookie or IP based affinity
for session of persistence. So with that, I'll have
Prajakta take us home. PRAJAKTA JOSHI: Thanks, Mike. So you know, we spoke about
everything that exists today. And the next question is,
what's next in load balancing? And we had an entire session
on it just prior to this one. But if you attended any of the
sessions in this conference, there were two main trends. One was monoliths
to microservices. And then the second
one is almost everybody has at least an on-prem deployment and Google Cloud, or Google Cloud and another cloud, or maybe all three. And delivering services
that span this, or even taking your
monolith--when you chop it into microservices, sure, you get
agility and so on and so forth. But it's still really hard
to deliver those services. And so Cloud Services
Platform, which was an announcement in
our keynote yesterday, is attempting to solve
those problems, where instead of giving
you point products, we're putting them all
together into a platform so you can actually
go and then build these hybrid or
multi-cloud services. I wanted to add two new
bits that we announced in the previous session. One is the Traffic
Director and then how it operates with Envoy,
which is a service proxy. I'm not going to go
into the details of it. But the simple thing I wanted
to show is this: we brought in the notion of a service mesh, and that's where a lot of this load balancing is headed. Just like in the old days, when you took hardware boxes and you separated the control and the data plane, you're doing exactly the same to services, where your data plane
is the service proxy and then your control plane
is your service management infrastructure. One of the most popular service
proxies that we see in use is Envoy. And our previous session
was centered around that. So if you get a chance, do
look at the Envoy proxy. And we like it because, first
of all, it's high performance. It provides a bunch
of features that would be required for your services. And then the third thing is
that also, it is very modern. So it's supporting things
like HTTP2 from the get go. So the moment you
can put a proxy next to your existing
service or application logic, the way
you build services can be fairly uniform because
now you can do it the same way. This is the
announcement that I want you to go and look at later,
which is Traffic Director. So what exactly does
Traffic Director do? Let me show you a picture here. You're so used to global load
balancing with our global load balancer. We spent a big chunk of
time talking about that. We want to bring that same
capability to microservices. And so what you can see here
is there's a shopping cart app. You've got the front end,
the shopping cart app, and the payment. But your instances
for your microservices are in two different regions. So one is in US Central, so
our data centers are in Iowa. And then the other one
is actually in Singapore. And what this means is that it's
not that they are two services. They're literally one service
with instances in both regions. And then the Traffic
Director is the one that's managing those
proxies that you see sitting on each of those instances. And that's the one that's
feeding intelligence about global load
balancing to those proxies. So it's a departure
from the traditional way where the server side
did the load balancing. Now the intelligence is
with the Traffic Director, which is feeding
your client side proxy to then go pick
the optimal back end for the next service. So this is what you see. And then the reason
it's important is this. This exactly matches
what I spoke about for global load balancing,
but in the context of micro services. So you can overflow,
you can do failovers, and your services will be much
more resilient in this case. And like everything
else, I mean, if there was one takeaway
from this conference, you already know that Google
is big on Hybrid Cloud. And so in the future,
we do want to provide Traffic Director
for your work loads which are not in Google Cloud. And then the last thing
that we also want to do is with all of this, we fit
into the CSB architecture. So you do have this really
neat and nice platform that lets you go
and build services at scale and without much toil. So it was a whirlwind deep dive. But thank you for coming, and
we hope you got something out of this session. Thank you. [APPLAUSE] [UPBEAT MUSIC PLAYING]