Starting with Kubernetes Engine: Developer-friendly Deployment Strategies (Cloud Next '18)

Captions
[MUSIC PLAYING] WILLIAM DENNIS: I'm William Dennis, a product manager on Google Kubernetes Engine. I'm going to cover three broad topics today. The first is, why Kubernetes? The second is best practices for application deployment on Kubernetes. And lastly, planning for growth.

So let's get started with, why Kubernetes? And I guess it helps to actually explain what Kubernetes is, for those not familiar. People call Kubernetes a container orchestrator, which is really just a fancy way of saying that it manages your containers. The application developer is responsible for containerizing their application and describing properties of that application, like the memory and CPU requirements. And then an operator, who could actually be the same person, is responsible for deploying that application to the cluster. Kubernetes then takes that and manages the heavy lifting for you. So basically, it works out where to put those containers -- a very simplified overview.

What makes it popular? Well, one of the top reasons is that you get the language flexibility of containers. There's no requirement on what language you're using, or what versions of the libraries, or anything like that. There's less vendor lock-in with Kubernetes. Kubernetes is based on open source, and in fact, Google runs certified Kubernetes. This is a version that's been certified by the Cloud Native Computing Foundation using a certification program that I actually participated in creating. And there are about 50 other vendors that are also offering certified Kubernetes. So you can be assured that, whatever you do on Kubernetes, you can run it in a number of different places. It's also highly resource optimized, so you can really pack a lot of containers onto the same pool of compute resources. You can build your own pipelines. So let's say you have a particular opinion about how you want to deploy software, whether that's a blue-green type deployment or whatever preference you have for deploying things, I guarantee there's a way you can do it on Kubernetes. There are also just tons of open source add-ons. Because the project is open source, there's this massive community, and you get all these great open source add-ons. You name it, there's probably an open source add-on that can help you. And finally, it lets you deploy a whole bunch of different types of applications beyond just a 12-factor stateless application.

I mentioned at the outset that containers are one of the most important reasons, so maybe it's worth giving a quick recap on why containers are so good. If we look back at how you might have deployed multiple applications on a single machine in the past, it probably would have looked like the one on the left there, with a shared machine. We have a whole bunch of different apps sitting on the one machine. There was no isolation between the apps, so if one of them started using all the resources, it would impact all of the apps. There were common shared libraries between them, so it was a real pain if you had to have different versions of the dependencies. Then we moved to VMs, where you got some isolation between the apps, and isolation in the sense that there were no common libraries. But it was a lot more expensive and inefficient, because the kernel is getting duplicated in each of these VMs.
Which finally brings us to containers, which give you the isolation properties of VMs, particularly when you add in things like gVisor, some new open source software that Google released a couple of months ago. You get the same property as the VM in that the libraries are not shared, but there's a lot less overhead. And we should know, because Google has actually been running containers in production for over 12 years. And for about three years prior to that, we were heavily developing this technology, including contributing some of the underpinnings, like cgroups, to Linux.

So back to Kubernetes. Kubernetes is a workload-level abstraction. What I mean by that is that you describe your resources in terms of the workload. Like, I want 10 of container A, and I want 20 of container B. I want them grouped together on adjacent nodes, or I want them spread around the world. Or maybe something different -- maybe I want to render a movie, one frame at a time as a batch job, and I want to use the cheapest compute resource available, like preemptible VMs. It's a workload-level abstraction, so you just define those workloads, and its job is to handle that for you.

I believe it sits at the ideal level of abstraction, as well. It's not so low level that you have to manage individual machines, but it's not super high level either, in the sense that it doesn't tell you how to design your apps. It really sits nicely in the middle. To me, this contrasts with the Platform as a Service, or PaaS, style, where the layers are a little more blended: the application and the PaaS layer that sits on top of the IaaS, the Infrastructure as a Service. And you can see this, because the PaaS is typically changing some code in the application. This can be very convenient, because it can do helpful things for you, but it can also constrain you. Kubernetes, I think, has a very nice layer separation, keeping all these concerns quite separate.

And the main point about Kubernetes, I think, is that it really prioritizes making hard tasks possible over making simple tasks easier. I guarantee you, if you have a task, and it's already quite easy, and you want to make it easier, you can probably find a way to do that. There's got to be someone that can help you make it even easier. And why not? It's always good to optimize things. But you have to ask yourself, I think, am I actually making the hard tasks harder by pursuing this route? With Kubernetes, people probably won't say that it makes simple tasks easier. I would say it doesn't make simple tasks particularly hard. But where it really excels is that it can capture just a broad range of things and do it in a fairly streamlined way.

And why Kubernetes Engine? This is Google Kubernetes Engine, the hosted product that runs Kubernetes, which my team manages. It's actually the same team that writes and contributes the majority of the open source that brings you Kubernetes Engine. So if you're looking for people that have real expertise in Kubernetes, definitely take a look at our product here. And maybe more important than the software, I think, is the attitude that we have toward open source. This is a quote from Eric Brewer at the keynote last year, where he said, "We're going to be the open cloud. We're going to give you the freedom to join and to leave. And I'll take my chances." And I think this really aligns us with your objectives, as well.
By making it so easy for you to come and leave, we're incentivized to make it as good as possible so that you stay because it's good, not because you're locked in. So I think our objectives are quite well-aligned there.

So before I jump into a demo, let me cover a few key Kubernetes concepts to help explain the demo. I covered containers already. Pods -- in Kubernetes, there's an object called a pod, which is actually just a collection of containers. It could be one or many. This is the smallest schedulable unit in Kubernetes. And the reason we have a pod is so that, if you have tightly coupled containers that need to be deployed together, you can represent them like this. But often, a pod will just be one container.

Then we have nodes. Nodes are just a way of saying machines. We call them nodes because they can either be VMs or bare metal. As you saw at the keynote yesterday morning, GKE now works on-prem, so this will be more and more common.

And then there's a deployment. The deployment is the real driver in Kubernetes. It's a statement of the desired number of pods that you want to be running. So in this example, we have deployment A, which says, I want two of pod A, and we have deployment B, which says, I want one of pod B. Kubernetes uses what's called an operator pattern, where it will observe the state of the cluster, look at what you have specified -- the desired state -- and drive that observed state to the desired state for you. So in this case, the user has said, I want two of pod A and one of pod B, and that is what Kubernetes is observing. But what would happen if that node became unhealthy? Well, Kubernetes would observe that there's actually now only one of pod A and none of pod B. It would redeploy those pods onto a node that was fine, after which the observed state, again, matches the desired state, and everything is right in the world. That's deployments.

The last one I want to cover is services. A service is really just a group of identical pods that host, say, an HTTP service, or a TCP service, or whatever you are trying to provide. It's just a way of collecting all the pods and addressing them together.
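For reference, the deployment and service objects just described look roughly like this as configuration. This is a minimal sketch rather than an exact manifest; the names, labels, image path, and ports are placeholder assumptions.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hellonext
spec:
  replicas: 2                    # desired state: two copies of the pod
  selector:
    matchLabels:
      app: hellonext
  template:
    metadata:
      labels:
        app: hellonext
    spec:
      containers:
      - name: hellonext
        image: gcr.io/my-project/hellonext:v1   # hypothetical image path
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: hellonext
spec:
  type: LoadBalancer             # provisions an external load balancer IP
  selector:
    app: hellonext               # routes traffic to pods carrying this label
  ports:
  - port: 80                     # port exposed on the load balancer
    targetPort: 8080             # port the container listens on

Applying a file like this with kubectl apply -f has roughly the same effect as the console's Deploy and Expose steps used in the demo that follows.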
OK, so with that, let me talk a little bit now about how to actually deploy things into Kubernetes. And it all starts by dockerizing and deploying to Kubernetes. Let's switch to the demo. OK, so I have a very simple Node.js app here. I added a 10-second startup time for reasons that will be apparent in a moment -- it's just to emulate an app that takes a little while to boot. Let's take a look at this app. It's extremely simple. So that's running not in containers; it's just running on my machine. Let's take this app, put it into a container, and put it into Google Kubernetes Engine. Now, I could build this locally with the docker build command, but actually, the best practice is to use a cloud build tool, because you don't know -- maybe I had some changed files here on my machine that weren't in version control, and I probably don't want those changes going into production without code review. So let's not deploy a container I built locally; let's set up a Cloud Build environment. To use Cloud Build, I'll need a source repo. I probably should pick the project. OK, I'm going to create a repo called Next Demo. By the way, that thing that just happened -- that was me tapping my security key. It just helps keep everything safe, a little second-factor device. So the repository that's created is called Next Demo, and I'm going to push the code up.

OK, so while it's getting pushed up, let me go to Cloud Build now. I'm going to create a build trigger that's going to build this code into a container. We'll pick Google Cloud Source Repository -- you can also connect this to Bitbucket and GitHub. I'm going to pick that repo that I just created, and I'm literally going to leave everything as the default. We want to build this from a Dockerfile. That all looks good, so let's just create the trigger. Normally, this would be triggered when I push code; I'm just going to trigger it manually so you can see what happens. And we should see -- here we go. OK, so that container is now building in the cloud, very similar to when I just built it locally. I'll wait for that to finish. It only takes a few seconds -- it was just running my npm install there. This is a little Node app. And we're done.

All right, so let's get this into Kubernetes Engine. I go over to Kubernetes Engine, and I'm going to pick the myapp cluster that I created earlier. I'm going to click this Deploy button here -- this is kind of the simplest way to get started, by the way. I'm going to select the container that I just built, give it a name -- let's call it hellonext -- and deploy. And that's basically it. It's quite simple. There is one more thing I need to do: I need to actually expose the service so that we can view it online. Because my container is listening on port 8080, I'm going to map port 80 to target port 8080. I'm going to pick the service type LoadBalancer so that you can all take a look, and click Expose. If I now go to the Services tab there, you can see that there's one service here, which is creating endpoints. Since that takes a minute or so -- because that IP address gets broadcast to every point of presence that Google has around the world -- I'm just going to look at this one that I created earlier. And that is the application that I just demonstrated locally, now running in the cloud. Pretty straightforward, right? Let's go back to the slides.

All right, but one of the advantages of Kubernetes is that it can really help you automate a whole bunch of different things in your cluster. For all these great things to actually work, though, you need to give it some hints and let it know how it can help you. One of the most important things that you need to specify are liveness and readiness checks. These are probes that run at frequent intervals, checking the health of the container. They look very similar, but they have an important difference. The liveness check is intended to just check the integrity of the container: is the container running? Now, if the container actually crashes, Kubernetes is smart enough to restart it. But if your application hangs, Kubernetes doesn't know whether the application hung or whether it's running just fine. That's why you need to have a probe, and that's what the liveness probe is for: is my container still operating? Readiness is a slightly different check -- although it uses the same mechanism -- where you indicate to Kubernetes that the container is ready to receive traffic. So it's a slightly different state to whether it's actually running. Your container may have an external dependency, like Cloud SQL, in which case it could be alive and running but not have that database connection yet. In that case, it would pass the liveness check but not the readiness check. Perhaps a diagram would help.
So we have the liveness probe, which goes just from the control plane to the container, and a readiness probe, which covers the container and its external dependencies. The reason you don't want to mix these two is that Kubernetes will actually restart the container if it doesn't respond to a liveness probe. So if your liveness probe depends on an external service which goes down, you don't necessarily want your container to restart at that moment. That's why we separate them. Of course, maybe you do want that after a while, because maybe there's something kind of funky with that connection. In that case, you can configure the liveness probe to cover that dependency and still restart your container, but typically at a slower pace so that you don't churn the containers.

There are three options for these probes. The first is a command, where Kubernetes will use the exit status. The second is a TCP port, where it will just try to open a socket; if the socket opens, then everything's good. And the third is an HTTP request, where it will make an HTTP request and look at the HTTP response code. It's the third one that I'm going to demo today. So let's switch back to the demo.

All right, and first up, I'm actually going to show you what happens if you don't have a readiness check. To do that, I'm just going to have a little while loop here that goes: while true, curl that endpoint that I just created. You can see it's just printing out this app that I created and the current time, so that you can see there's actually something changing. All right, let's add something to this. I'm just going to add a little smiley face here -- just some change that we can demo. And we're going to push this up. As you know already, this is going to build the container for me. So I'm going to deploy that container. But before I do, just to make the demo run a little better, I'm going to take a look at my deployment here and remove that little thing there. The app that I deployed actually has an automatic horizontal pod autoscaler attached, which I don't want for this demo -- but I will be showing you that later, so fear not.

OK, so now that we're ready to go, let's see if that container was built. It looks like it was. I'll grab the new container name and go back to Kubernetes Engine. This is also showing you how to update your app. So I'm just going to do a rolling update here: I'm going to paste in the new container image and click Update. And let's switch over and see what happens. So you notice there are a few errors on the top right here. Those errors are happening because, as I mentioned, this particular container has a 10-second boot time. While it was booting, because we didn't give it a readiness check, all the containers were receiving traffic, even the ones that weren't ready. That's why you see a bunch of errors there.

So let's go back and add a readiness check. This is what the readiness check looks like -- I'll explain what it all means in a second. Below the container in the configuration, we're adding a readiness probe. We're saying the failure threshold is three, so if this probe fails three times, Kubernetes assumes there's a problem and stops sending that container traffic. We'll do an HTTP GET request with an initial startup delay of three seconds, we'll run it every three seconds, and if it succeeds once, we'll count that as a success.
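In the deployment's pod template, the readiness probe just described sits roughly like the fragment below, using the values stated in the demo; the health-check path, port, and image are assumptions, and a liveness probe with illustrative values is shown alongside for comparison.

containers:
- name: hellonext
  image: gcr.io/my-project/hellonext:v2   # hypothetical image path
  ports:
  - containerPort: 8080
  readinessProbe:              # gates traffic until the app reports ready
    httpGet:
      path: /                  # assumed health-check endpoint
      port: 8080
    initialDelaySeconds: 3
    periodSeconds: 3
    failureThreshold: 3
    successThreshold: 1
  livenessProbe:               # restarts the container if it stops responding
    httpGet:
      path: /
      port: 8080
    initialDelaySeconds: 15    # assumed: comfortably past the 10-second boot time
    periodSeconds: 10
    failureThreshold: 3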
So let's save that. I'm just going to force update that. It looks like I've got a few terminating pods here that are taking a little bit of time. So let me update the app while we wait for that, and I can demo what the rollout looks like when you have a readiness check. OK, we go back to Cloud Build -- same process. I think we're in a good state now, so let's roll this out, and as we do, let's observe what happens. Oh, by the way, we're still waiting for the image to be built, so that's actually building in the background -- I was a little bit quick there. One of the good things about Kubernetes is -- you see where it says ImagePullBackOff? It's actually smart enough not to try to boot the container that doesn't exist yet. It's going to keep retrying, which is pretty neat. And it just got it. If you observe here where it says ready, you can notice that it's actually not ready yet, and we're still routing traffic to the previous container. Once that flips over to ready, you can see that it's flipping between the two different expressions there -- both containers are receiving traffic. Then it's going to terminate the old container and route all the traffic to the new one. So unlike previously, when there was an error state as traffic got routed to containers that weren't ready, once you have the readiness check, that error doesn't happen, and you have a very smooth update. That's what makes readiness checks so important. All right, back to the slides, please.

All right, now you may have seen that I was using a lot of UI there, and I was kind of just patching the configuration fairly willy-nilly. Maybe it doesn't surprise you that that is not the production best practice. So let's talk a little bit about what is. As I've covered, Kubernetes is built on a declarative-style architecture, where you declare the state that you want, and it drives the observed state to the state that you declared. What this means is that you end up with a lot of different configuration files -- the workloads and the services are all specified with configuration. So what do you do with all that configuration? Well, at Google, we've actually had this problem for quite some time, so we have a fair amount of experience with how to deal with it. This is a quote from the Google "Site Reliability Engineering" book, which you can read online for free. It says that, "At Google, all configuration schemes that we use for storing this config involve storing that configuration in our primary source code repository and enforcing a strict code review requirement." That is, the same code review requirement for configuration as for code.

The benefit of this model -- storing configuration in Git, along with the code or in a separate Git repo -- is that you get all the benefits that you get from storing code in version control. You get rollbacks: if you rolled out the wrong release, you can just roll back the configuration, and that will effectively roll back that release. You get versioning and auditability. If someone asks you, hey, what version of our code was running at 9:00 AM on Monday? Well, if you're just patching it by hand, you probably don't know. But if everything's stored in Git, you can just look back through the commit history and see what was running at exactly that time. You can also utilize code reviews, so you can try to avoid that situation where someone fat-fingers the replica count down to one by accident and takes down the servers.
Disaster recovery -- while this is not the primary mechanism for making your Kubernetes application highly available (on Kubernetes Engine, we support regional clusters and a whole bunch of other things), it is, of course, good to have that configuration stored in version control as a backup. And finally, whatever identity and authorization model you have on your code, you can apply that to the configuration as well. So you could potentially restrict it to an SRE-type role if you wanted to.

So what does that look like in a Git-plus-Kubernetes world? This is just one way to do it, but it's the one I recommend: you have one configuration repository per app. In this example, there are two repos at the top there, one for the app Foo and one for the app Bar. And then you use branches for each environment of the app. So the master branch likely maps to production, and you could have a staging branch. These configuration branches map one-to-one to namespaces in Kubernetes. The advantage of that is that you don't have to change the object name. If you have an app called Foo Deployment, you don't have to call it Foo Deployment Staging. You can just call it Foo Deployment, because you're going to deploy it into a namespace. It's up to you whether you want to run this on two clusters or one. I've just shown here, as an example, two different clusters and a bunch of different namespaces. That part is totally up to you.

How do you get from the state that I was demoing, though, into this new, more production-grade system? Well, fortunately, it's pretty easy. You can just extract all the configuration resources that you have so far using this command: kubectl get <resource> -o yaml. Once you've done that, you can also set up some very nice things like continuous deployment. So we have a product, Cloud Build -- I demoed it before, building the container, but you can actually use it to roll out config changes as well. And all you need is this very simple configuration file that basically says: I want to use the kubectl task, I want to run the command at the bottom, which is apply -f . -- which just means apply the current directory of config -- and I'll give it two environment variables so it knows where to run that.
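That Cloud Build configuration file looks something like the following sketch, which uses Cloud Build's standard kubectl builder; the zone and cluster name are placeholders rather than the values used in the demo.

# cloudbuild.yaml: apply every manifest in the config repo to the cluster
steps:
- name: gcr.io/cloud-builders/kubectl      # Cloud Build's kubectl builder image
  args: ['apply', '-f', '.']               # apply the current directory of config
  env:
  - 'CLOUDSDK_COMPUTE_ZONE=us-central1-a'  # placeholder zone
  - 'CLOUDSDK_CONTAINER_CLUSTER=myapp'     # placeholder cluster name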
Let's switch to the demo, and I'll show you how I can migrate this app that I've created into that new environment. First, let's look at the deployment objects. We have one deployment object, hellonext, that I created in the UI. Now I'm going to download the config for that. I'm going to use the export option here as well, so that it skips some extra config that I don't really need. OK, and that's what it looks like. As you saw briefly before, that's the same config that I was editing on the website. I mentioned that we can hook this up to Cloud Build, so let's take a look at how that will work. To use Cloud Build, once again, I'm going to create another source repository. This one I'll call Next Demo Config. I'll go to Cloud Build and, again, go to the build triggers. I'll add a new trigger, again on the Google Cloud Source repository, and we'll pick the config one here. But instead of using the Dockerfile, I'm going to point this at the Cloud Build config file, which is also in this directory, which I can show you. That's the same file that I had up on the slide; it's in a subdirectory. So that's all -- one change. We're just going to point this at the Cloud Build file.

And we're going to create that trigger, which will sit alongside the container build trigger that I already have. All right, so let's get this in. Push that up, and we'll take a look at what happens in the build history. You can see that, just like we were building the container before, we're now actually executing this config build. Instead of building a container, it's just running this command, kubectl apply -f ., which basically rolls out all the config in that Git repo onto the cluster.

So let's demo an app update using continuous deployment. Change that one more time. And this time, I'm not going to use the UI, because I know what the image is going to be named based on the SHA. Let me just look at that. And here in my config, I'm going to open up and edit the container image. I'm just going to paste in the new SHA, because I know that Cloud Build is going to create a container with that name, so I don't have to go back to the UI anymore. All right, this time, let's just observe what happens. In theory, Cloud Build will, in a few seconds, pick up that commit automatically and start rolling it out to the cluster. Let's get my little curl while loop back up and running. Yes, there we go. So Google Cloud Build picked up the change and has applied it. We're once again waiting for that readiness check to pass. Very soon, on the right, you're going to see the change I just made get rolled out. We're splitting the traffic, and we're going to terminate the old container in just a moment.

So that is kind of the start of a proper production-grade pipeline on GKE. Obviously, you can take this a lot further. There are also various other ways of doing this: we have a product called Spinnaker, which is fairly advanced, and we have some partners in this space like GitLab, Weaveworks, and Codefresh. But that's just one way to do it. Back to the slides, please.

So of course, it goes without saying, I think -- but I'll say it anyway. Once you've moved to configuration as code, don't go editing the configuration or applying it outside of version control. Don't be doing what I was doing before in the UI once you've moved to this system, because then, of course, you're violating the guarantee that everything went through code review.

OK, so what have we covered? We've covered how to configure your cluster so that Kubernetes knows when your application is running and when it's ready. We've talked about how to export that configuration and get it into a close-to-production-grade pipeline. There are a couple more things that you need to do to help Kubernetes help you, and one of those is telling it what resources you need for your container. This comes in two forms, kind of like the probes before, but again they serve two very specific purposes. Requests are how you reserve resources for your pod, and limits constrain how many resources your pod can use. So if you request all the resources in the cluster, then the scheduler will actually stop scheduling new containers, whereas limits just constrain your container. I think a diagram is probably more helpful here. This is showing one VM and all of the resource allocation of that VM. We have container A that requested 20% of one CPU, container B that requested 50%, and container C that requested 20%. And then let's look at what's actually being used. So that was just the resource allocation -- it doesn't actually constrain what the container can use.
So here we have container C using a lot more than it said it would, container B using a bit less than it said, and A using a bit less as well. This is actually really good, because it means you're not wasting those resources. Even though they were allocated to container B, container C can use them as long as container B isn't. So it's kind of a way of guaranteeing a minimum amount of resources without actually constraining the maximum. Now, if you do want to constrain the maximum, then you can set a limit as well. So I could have limited container C at 25%, in which case it won't go over that amount. What this means, though, is that if you have a container -- like container D here -- that is requesting 20%, it's actually not going to get scheduled. Even though the CPU might have 20% of its capacity actually free, because that capacity has already been allocated, container D won't be scheduled.

This is how you set it up in the deployment -- you put it in pretty much exactly like I did with the probes. I'm not actually going to demo that one, but it's fairly easy to set up; there's a sketch of what it looks like below. And you can look at Google Stackdriver to see what resources your pods are currently using in order to pick good values. Obviously, you want to put the request somewhere above the minimum that you need, but don't make it excessive -- don't make it too high. You also kind of want to catch cases where you might have a memory leak. That's where the limit can come in handy: you just limit the container to a certain amount so that it gets restarted if it ever uses too much memory.

Another thing to consider when you set the limits and requests on the container is how many forked processes you actually want inside that container, because this obviously affects the memory and CPU usage. Before Kubernetes, before containers, it was generally pretty good practice to have a lot of concurrency in each instance of your app. Because it was such a pain to deploy the app, you wanted to make that app highly concurrent. But in Kubernetes, it's very easy to create multiple replicas of the app, such that you don't have to have a huge amount of concurrency within the app. So what's the right balance? I tend to think of two or three forked processes per container as about the right amount, because you do get some memory savings by forking a process and making it internally a little bit concurrent. But the advantage of having many replicas is that, if one of them were to crash, you haven't removed all the capacity. Whereas if you had only two replicas and a huge amount of concurrency inside them, one of them going away would be quite problematic. It's kind of the same trade-off for machines: large machines do offer slightly greater efficiency with Kubernetes -- you can pack more containers in, so there's less wastage -- but more machines give you more availability, just as more replicas give you more availability. So again, balance the trade-offs of machine size and machine count to meet your desired availability goal.

And what about rapid iteration? One of the nice things about not using containers is that I could just run node server.js whenever I wanted and bring up the application. It was really straightforward. So for Kubernetes, we have a tool called Skaffold that aims to give you that kind of quick iteration that you're used to while still using Kubernetes. Skaffold observes your directory of configuration and code, and whenever something changes, it redeploys your app to Kubernetes, so you can very quickly see any changes you make deployed.
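Here is that sketch of requests and limits: a minimal fragment of the container spec in a deployment, with purely illustrative values that roughly mirror the percentages in the example above (200m is 20% of one CPU).

containers:
- name: hellonext
  image: gcr.io/my-project/hellonext:v2   # hypothetical image path
  resources:
    requests:
      cpu: 200m         # reserve 20% of one CPU for scheduling purposes
      memory: 128Mi     # illustrative value; base it on observed usage
    limits:
      cpu: 250m         # cap usage at 25% of one CPU
      memory: 256Mi     # the container is restarted if it exceeds this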
Right. So that was about some best practices for describing your application to Kubernetes in a way that lets it keep everything running for you: it can restart the container if there's a problem, and it can split the traffic accordingly.

What about growth, though? What about planning for growth? This is something where Kubernetes Engine really excels. We have three world-class automation tools that we offer as part of Kubernetes Engine. The first is cluster autoscaling. When you schedule a container and there are not enough resources, the Cluster Autoscaler can automatically provision more resources to handle those extra replicas. But perhaps just as importantly, it can also remove resources that are not being used. Here at Google, we don't want you to pay for compute resources that you don't need, so we'll actually help you scale back in the quiet times to save money. The second one is Node Auto Repair. Occasionally, VMs can become unhealthy. Node Auto Repair is constantly watching the VMs, just as Kubernetes is watching your containers, and it's going to replace any VMs that have a problem. The last one is Node Auto Upgrade. As you've probably seen from the last couple of years of news, there are a lot of vulnerabilities that the world discovers in software releases, and you probably don't want to stay up late at night worrying about this. So if you turn on Node Auto Upgrade, we'll keep you up to date and apply all these patches right onto the nodes for you. And these three features are, to the best of my knowledge, not available in other clouds. These are very much specific to Google, and they're part of what makes Kubernetes Engine, I think, such an amazing product.

To really take advantage of these things, though, you also need to autoscale the workload. You need some way of detecting when more users are arriving than you can serve, and automatically adding replicas of your pod to handle them. For that, what we recommend is using a requests-per-second metric. Again, this is kind of a best-in-class thing to do. You can use CPU and memory usage as a proxy to scale up your container -- for example, you could say, if my container is running really hot and is using 90% of the CPU, add another one. But not every application is CPU bound; in fact, that's fairly uncommon. Often there's a queue length, or a number of requests per second that the app can handle. So on Kubernetes Engine, we make this fairly easy for you to set up. In conjunction with Stackdriver monitoring, we have a Kubernetes Horizontal Pod Autoscaler component that can read the requests-per-second metric that's written to Stackdriver by the HTTP load balancer, such that, when new users arrive, we can automatically provision more containers to handle the increase in requests per second from those users. So what you do is measure how many requests per second your application can handle, just as you measured how much RAM and CPU it used -- it's fairly easy to measure what the throughput of your application is. And then you can just say to Kubernetes, look, I want each of the replicas to have a maximum of, say, six requests per second, or 100, or whatever it is.

Setting up a horizontal pod autoscaler is really straightforward. You just have one more piece of config -- I know, Kubernetes is all about the config. There are really only two lines that matter there; they're the ones I've highlighted in blue. My metric is the request count coming from the load balancer, and my target value per container, in this example, is six requests per second -- I want one container running for every six requests per second.
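One possible shape for that config is sketched below, assuming the cluster has the Stackdriver custom/external metrics adapter installed and scales on the HTTP load balancer's request count metric; the metric name, replica bounds, and target value are assumptions rather than the exact configuration from the slide.

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hellonext
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hellonext            # scale the deployment from earlier
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: loadbalancing.googleapis.com|https|request_count   # assumed Stackdriver metric
      targetAverageValue: 6    # aim for about six requests per second per pod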
Let's switch over to a demo, and I'll show that in action. All right, so on the bottom right there, I have Stackdriver monitoring showing the requests per second on the service, and you can see there are just under two requests per second happening right now. Let me add a couple of while loops to generate some load on this. That should be about 20 RPS, I believe. After about 30 to 60 seconds, that RPS should show up in Stackdriver, and we'll see if the cluster reacts to handle it. I'm going to set some watches here, so we can watch how many replicas we have of that container and how many nodes we have. And for good measure, I can also watch the status of my horizontal pod autoscaler. Now, as this takes a few seconds to show up, I'm actually going to switch to a video and show you a 30-minute window of what happens when a whole bunch of traffic arrives on this example application. But I'll switch back later, so you know I wasn't cheating.

All right, so this is the video I recorded on Monday -- exact same demo. Here we've just added an extra five RPS, and I'm going to speed up the video a little bit. You can see in the chart on the bottom right that the QPS is just about to arrive; we should clock up to about eight or nine there. And because I set a target of six RPS per container -- it looks like we're currently at about seven -- you can see that Kubernetes has already scheduled a second container at the top left there to handle that extra traffic. I'm going to speed up this demo again, and let's try adding another 10 QPS of traffic. You can see that the 10 QPS registered in Stackdriver, and the Horizontal Pod Autoscaler successfully scheduled two more containers. But it looks like there's a problem: those containers are pending, not running yet. The reason is that this cluster is actually out of resources, so those containers are stuck in a pending state. Well, the good news is, as I mentioned, Kubernetes has a cluster autoscaler as well. So what's going to happen is that GKE, Kubernetes Engine, is watching for this state and has realized that there are pending containers here that it doesn't have the capacity to run. So it has automatically provisioned one more node -- you can see it in the second pane from the top left. Once that node becomes ready -- let me just play the demo -- those pending containers get created, and they're now running and serving traffic.

All right, so for a bit of fun, let me just add another 20 QPS, and we'll speed up the demo quite fast and see what happens. Once again, it schedules some more containers and some more nodes -- I think we're at about five nodes now. But I told you that you're going to save money as well, right? So what we've seen here is a whole bunch of users arriving. You might have been sleeping -- I don't know what was happening; it could have been the weekend. A whole bunch of users arrived and were successfully served by your app. So they weren't disappointed; they didn't see a failure on the screen.
They were successfully served while you were sleeping, or on vacation, or whatever. But what about saving you money? All right, if I drop the load all the way off here, back to five QPS, let's see what happens. I'm going to run the demo a little bit fast, because as with a lot of autoscaling systems, scaling down is a little more conservative than scaling up. The reason it's typically set that way is that you don't want to quickly deprovision resources in case the users come back straightaway. So it looks like we've already deprovisioned the pods in the top left -- they're no longer running. But we still have some nodes running that are probably not needed now that we only have one container. So let's see what happens. And look at that: the nodes have been removed and deprovisioned, and we're back to our starting two-node cluster. So that is Kubernetes horizontal pod autoscaling using an RPS metric, plus the cluster autoscaler, scaling the cluster up and down. Back to the slides.

OK, so what did I show you? To recap, we looked at readiness and liveness probes and how important they are; the importance of defining requests and limits so that Kubernetes knows what resources you need; and the idea that you should be treating configuration as code, using a CI/CD pipeline to deploy so that you don't accidentally deploy things that just happen to be sitting on your machine. Using a CI system means that you only deploy the code that was checked in, and with a CD system, you only deploy the configuration that was checked in. You should turn on all the automation settings in Kubernetes Engine so you can sleep easier at night. And then finally, autoscale using RPS. [MUSIC PLAYING]
Info
Channel: Google Cloud Tech
Views: 9,337
Rating: 4.9268293 out of 5
Keywords: type: Conference Talk (Full production); pr_pr: Google Cloud Next; purpose: Educate
Id: 2ZP4M6UdH8s
Length: 43min 52sec (2632 seconds)
Published: Thu Jul 26 2018