[MUSIC PLAYING] WILLIAM DENNIS: I'm William
Dennis, a product manager on Google Kubernetes Engine. I'm going to cover three
broad topics today. The first is, why Kubernetes? The second is best practices
for application deployment on Kubernetes-- and
lastly, planning for growth. So let's get started
with, why Kubernetes? And I guess it helps to
actually explain what Kubernetes is for those not familiar. So people call Kubernetes
a container orchestrator, which is really just a
fancy way of saying that it manages your containers. The application developer is
responsible for containerizing their application and describing
properties of that application, like the memory and
CPU requirements. And then an operator is
responsible for deploying that application to the
cluster, which could actually be the same person. Kubernetes then takes
that and manages the kind of heavy lifting for you. So basically, it
just works out where to put those containers--
a very simplified overview. What makes it popular? Well, one of the top reasons
is that you get the language flexibility of containers. So there's no requirement of
what language you're using, or what versions of the libraries,
or anything like that. There's less vendor
lock-in with Kubernetes. Kubernetes is based
on open source. And in fact, Google runs
certified Kubernetes. This is a version that's been
certified by the Cloud Native Computing Foundation
using a certification program that, actually, I
participated in creating. And there are about 50
other vendors that are also offering certified Kubernetes. So you can be assured that,
whatever you do on Kubernetes, you can run it in a number
of different places. It's also highly
resource optimized. So you can really pack
a lot of containers on the same pool of
computer resources. You can build your
own pipelines. So let's say, if you have
a particular opinion about how you want to deploy software,
you want a blue-green type deployment or
whatever preference you have for deploying
things, I guarantee there's a way you can
do it on Kubernetes. There are also just tons
of open source add-ons. So because the project
is open source, there's this massive community. And you get all these
great open source add-ons. You name it. There's probably an open source
add-on that can help you. And finally, it lets
you deploy a whole bunch of different types
of applications beyond just a 12-factor
stateless application. So I mentioned at the outset
that containers are one of the most important reasons. Maybe it's worth just
giving a quick recap on why containers are so good. So if we look back at how
you might have deployed multiple applications on a
single machine in the past, it probably would have looked
like the one on the left there with a shared machine. We have a whole bunch
of different apps sitting on the one machine. There was no isolation
between the apps. So if one of them started
using all the resources, it would impact all of the apps. There were common shared
libraries between them. So it's a real pain if you had
to have a different version of the dependencies. Then we moved to VMs, where
you got some isolation between the apps. And you had isolation,
in the sense that there were no common libraries. But it was a lot more
expensive and inefficient, because the kernel
is getting duplicated in each of these VMs. Which finally brings us to
containers, which kind of gives you the isolation
properties of VMs, particularly when you add
in things like gVisor, some new open source that Google
released a couple of months ago. You get the same
property of the VM in that the library
is not shared. But there's a lot less overhead. And we should know. Because Google's actually
been running containers in production for over 12 years. And for about three
years prior to that, we were heavily developing
this technology, including contributing some of
the underpinnings like cgroups to Linux. So back to Kubernetes. Kubernetes is a
workload-level abstraction. And what I mean by that is that
you describe your resources in terms of the workload. Like, I want 10 of
container A, and I want 20 of container B. I
want them grouped together on adjacent nodes. Or I want them spread
far around the world. Or maybe something different--
maybe I want to render a movie. And I want to render that one
frame at a time as a batch job. And I want to use the cheapest
compute resource available, so like the preemptable VM. It's a workload-level
abstraction. So you just define
those workloads. And its job is to
handle that for you. I believe it sits at the ideal
level of abstraction, as well. So it doesn't do a
super low level such that you have to manage
individual machines. But it's not super high
level either, in the sense that it doesn't tell you
how to design your apps. It really sits
nicely in the middle. To me, this contrasts
with the Platform as a Service or the PaaS style,
where the layers were a little bit more blended. There's the application and
the PaaS layer that sits on top of the IaaS, the
Infrastructure as a Service. And you can see this blending, because
the PaaS is typically, like, changing some
code in the application. This can be very
convenient, because it can do helpful things for you. But it can also constrain you. Kubernetes, I think, has a
very nice layer separation, keeping all these
concerns quite separate. And the main point about
Kubernetes, I think, is that it really
prioritizes making hard tasks possible over making
simple tasks easier. So I guarantee you,
if you have a task, and it's already quite easy,
and you want to make it easier, you can probably find
a way to do that. There's got to be
someone that can help you make it even easier. And why not? It's always good
to optimize things. But you have to ask
yourself, I think, am I actually making
the hard tasks harder by pursuing this route? Kubernetes, people
probably won't say that it makes
simple tasks easier. I would say it doesn't make
simple tasks particularly hard. But where it really
excels is that you can capture just a
broad range of things and do it in a fairly
streamlined way. And why Kubernetes Engine? So this is Google
Kubernetes Engine, the hosted product
that runs Kubernetes that my team manages. It's actually the same team
that writes and contributes the majority of the
open source that brings you Kubernetes Engine. So if you're looking
for people that have real expertise
in Kubernetes, definitely take a look
at our product here. And maybe more important
than software, I think, is the attitude that
we have to open source. So this is a quote
from Eric Brewer at the keynote last year,
where he said, "We're going to be the open cloud. We're going to give you the
freedom to join and to leave. And I'll take my chances." And I think this really aligns
us with your objectives, as well. By making it so easy for
you to come and leave, we're incentivized to make
it as good as possible so that you stay
because it's good, not because you're locked in. So I think our objectives there
get quite well-aligned. So before I jump
into a demo, let me cover a few key
Kubernetes concepts to help explain the demo. I covered containers already. Pod-- in Kubernetes,
there's an object called a pod, which
is actually just a collection of containers. It could be one or many. This is the smallest
schedulable unit in Kubernetes. And the reason we
have a pod is so that, if you have
tightly-coupled containers that need to be deployed
together, you can represent them like this. But often, a pod will
just be one container. Then we have nodes. Nodes are just a way
of saying machines. We call them nodes, because
they can either be VMs or bare metal. As you saw at the keynote
yesterday morning, GKE now works on-prem. So this will be more
and more common. And then there's a deployment. The deployment is the
real driver in Kubernetes. It's a statement of the
desired number of pods that you want to be running. So in this example, we have
deployment A, which says, I want two of pod A. And we
have deployment B, which says, I want one of pod B. Kubernetes uses what's called
an operator pattern, where it will observe the
state of the cluster. It will look at what you have
specified, the desired state. And it will drive
that observed state to the desired state for you. So in this case, the user
has said, I want two of pod A and one of pod B. And that is
what Kubernetes is observing. But what would happen if
that node became unhealthy? Well, Kubernetes would observe
that there's actually now only one of pod A and none of
pod B. It would redeploy those pods onto a
node that was fine, after which the
observed state, again, matches the desired state. And everything is
right in the world. That's deployment. Last one I want to
cover is service. So a service is really just
a group of identical pods that host like an HTTP
service, or a TCP service, or whatever you are
trying to provide. It's just a way of collecting all the pods and addressing them together.
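(To put those concepts together, a deployment and its service might be declared roughly like the following YAML; this is a sketch, and the names, labels, and image below are hypothetical placeholders rather than config from the talk.)

```yaml
# A rough sketch of the deployment and service concepts just described.
# Names, labels, and the image are hypothetical placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-a
spec:
  replicas: 2                  # desired state: two replicas of pod A
  selector:
    matchLabels:
      app: pod-a
  template:
    metadata:
      labels:
        app: pod-a
    spec:
      containers:
      - name: container-a
        image: gcr.io/example-project/app-a:1.0   # hypothetical image
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: service-a
spec:
  selector:
    app: pod-a                 # groups all pods carrying this label
  ports:
  - port: 80
    targetPort: 8080
```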
OK, so with that, let me talk a little bit now about how to actually deploy
things into Kubernetes. And it all starts by dockerizing
and deploying to Kubernetes. Let's switch to the demo. OK, so I have a very
simple Node.js app here. And I added a
10-second startup time for reasons that will
be apparent in a moment. It's just to emulate an app that
takes a little while to boot. Let's take a look at this app. It's extremely simple. So that's running
not in containers. It's just running on my machine. Let's take this app, put it
into Google Kubernetes Engine, and put it into a container. So I mean, I can build this
locally with the Docker Build command. But actually, the best practice
is to use a Cloud Build tool. Because you don't know-- maybe I had some
changed files here on my machine that weren't
in version control. I probably don't want those
changes going into production without code review. So let's not deploy that
container I built locally. Let's set up a Cloud
Build environment. To use Cloud Build,
I'll need a source repo. I probably should
pick the project. OK, I'm going to create
the repo called Next Demo. By the way, that thing
that just happened-- that was me tapping
the security key. It just helps keep everything
safe, a little second-factor device. So the repository that's
created is called Next Demo. And I'm going to
push the code up. OK, so while it's getting pushed
up, let me go to Cloud Builder now. And I'm going to
create a build trigger that's going to build this
code into a container. So we'll pick Google
Source Repository. You can also connect this
to Bitbucket and GitHub. I'm going to pick that
repo that I just created. And I'm literally going to
leave everything as the default. So we want to build
this from a docker file. That all looks good. Let's just create the trigger. So normally, this would be
triggered when I push code. I'm just going to
trigger it manually, so you can see what happens. And we should see-- here we go. OK, so that container is
now building in the cloud-- very similar to when
I just built it then. I'll wait for that to finish. It only takes a few seconds. It's just running my npm install there. This is a little Node app. And we're done. All right, so let's get this into Kubernetes Engine. So I go over to
Kubernetes Engine. And I'm going to pick the myapp
cluster that I created earlier. And I'm going to click
this Deploy button here. This is kind of the simplest
way to get started, by the way. I'm going to select the
container that I just built. I'm going to give it a name-- let's call it hellonext-- and deploy. And that's basically it. It's quite simple. There is one more
thing I need to do. I need to actually expose the
service so that we can actually view this online. So because my container
is serving on port 8080, I'm going to target
port 8080 from port 80. I'm going to pick
the service type LoadBalancer so that you can all take a look, and click Expose.
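(For reference, the service that the Expose step creates corresponds roughly to the sketch below, using the values from the demo -- port 80 targeting 8080, type LoadBalancer. The selector label is an assumption about what the UI generates for a deployment named hellonext.)

```yaml
# A sketch of roughly what that Expose step creates; the selector label is an
# assumption about what the UI generates for the hellonext deployment.
apiVersion: v1
kind: Service
metadata:
  name: hellonext-service
spec:
  type: LoadBalancer        # provisions an external load balancer with a public IP
  selector:
    app: hellonext
  ports:
  - port: 80                # port exposed externally
    targetPort: 8080        # port the container is serving on
```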
So if I now go to the Services tab there, you can see that there's
one service here, which is creating endpoints. Since that takes
a minute or so-- because that IP address gets broadcast to
every point of presence that Google has
around the world-- I'm just going to look at this
one that I created earlier. And that is the application that
I just demonstrated locally now running in the cloud-- pretty straightforward, right? Let's go back to the slides. All right, but one of the
advantages of Kubernetes is that it can really
help you automate a whole bunch of different
things in your cluster. For all these great
things to actually work, though, you need to
actually give it some hints, and let it know how
it can help you. One of the most important
things that you need to specify are these liveness
and readiness checks. These are probes that run at
frequent intervals checking the health
of the container. They look very similar, but they
have an important difference. So the liveness check is
intended to just check the integrity of the container. Is the container running? Now if the container
actually crashes, Kubernetes is smart
enough to restart it. But if your application
hangs, Kubernetes doesn't know if the
application hung or if it's running just fine. So this is why you
need to have a probe. And that's what the
liveness probe is for. Is my container just operating? Readiness is a slightly
different check-- although it uses
the same function-- where you indicate to
Kubernetes that the container is ready to receive traffic. So it's a slightly
different state to whether it's
actually running. Your container may have
an external dependency like on cloud SQL. In which case it could
be alive and running. But it may not have that
database connection yet, in which case, it would pass
the liveness check but not the readiness check. Perhaps, a diagram would help. So we have the liveness probe
that goes just from the control plane to the container
and a readiness probe that goes to the container
and the external dependencies. The reason why you don't
want to mix these two is that Kubernetes will
actually reboot the container if it doesn't respond
to a liveness probe. So if your liveness
probe is dependent on an external service
which goes down, you don't actually necessarily
want your container to restart at that moment. That's why we separate it. Of course, maybe you
do after a while. Because maybe there's
something kind of funky with that connection. So you can configure
the liveness probe to still restart your container,
but typically at a slower pace so that you don't
churn the containers. There are three options
for these probes. One is a command
where Kubernetes will use the exit status. The second is a TCP
Port where it will just try to open a socket. If the socket opens,
then everything's good. And the third one
is an HTTP request where it will make
an HTTP request and look at the
HTTP response code. It's the third one that
I'm going to demo today. So let's switch
back to the demo. All right, and first
up, I'm actually going to show you what happens
if you don't have a readiness check. So to do that, I'm just going
to have a little while loop here that's just going
to go, while true, I'm going to curl that
endpoint that I just created. So you can see, it's just
printing out this app that I created and
the current time so that you can see that there's
actually something changing. All right, let's add
something to this. All right, I'm going just add
a little smiley face here. So it's just some
change that we can demo. All right, and we're
going to push this up. And as you know
already, this is going to build the container for me. So I'm going to
deploy that container. But before I do, just to make
the demo run a little better, I'm going to take a look
at my deployment here. And I'm just going to remove
that little thing there. The app that I deployed actually
has an automatic horizontal pod autoscaler attached, which I
don't want for this demo. But I will be showing
you later, so fear not. OK, so now that
we're ready to go, let's see if that container was
built. It looks like it was. I'll grab the new
container name, go back to Kubernetes Engine. And this is also showing
you how to update your app. So I'm just going to do
a rolling update here. I'm going to paste in the new
container image, and click Update. And let's switch over,
and see what happens. So you notice, there's a few
errors on the top right here. Those errors are happening,
because as I mentioned, this particular container
has a 10-second boot time. So while it was booting, because
we didn't give it a readiness check, all the
containers were receiving traffic, even the ones
that weren't ready. That's why you see a
bunch of errors there. So let's get back to here,
and add a readiness check. So this is what the
readiness check looks like. I'll explain what it
all means in a second. So below the container
in the configuration, we're adding a readiness probe. We're saying the failure
threshold is three. So if this probe fails three times, Kubernetes will assume there's a problem and mark the container as not ready. We'll do an HTTP GET request
with an initial start up delay of three seconds. We'll run that
every three seconds. And if it succeeds once, we're going to count that as a success.
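(Putting those values together, the probe section of the container spec looks roughly like the sketch below; the path is an assumption, since the demo app serves from its root on port 8080.)

```yaml
# A sketch of the readiness probe just described; the path is an assumption
# (the demo app serves from its root on port 8080).
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /
    port: 8080
  initialDelaySeconds: 3
  periodSeconds: 3
  successThreshold: 1
```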
So let's save that. I'm just going to force update that. It looks like I've got a
few terminating pods here that are taking a
little bit of time. So let me update the app
while we wait for that. And I can demo what the
rollout looks like when you have a readiness check. OK, we go back to
Cloud Builder-- same process. I think we're in
a good state now. So let's roll this out. And as we do, let's
observe what happens. Oh, by the way, we're
still waiting for the image to be built. So that's actually
building in the background. I was a little bit quick there. One of the good things
about Kubernetes is-- you see where it
says ImagePullBackOff? It's actually smart
enough not to try and boot the container that
doesn't exist yet. So it's actually
going to keep retrying that, which is pretty neat. And it just got it. And if you observe here
where it says ready, you can notice that it's
actually not ready yet. And we're still routing traffic
to the previous container. Once that flips over
to ready, now you can see that it's
flipping between the two different expressions there. Both containers are
receiving traffic. Then it's going to terminate
the old container, and route all the traffic to that one. So unlike previously, when
there was an error state as traffic got routed to
containers that weren't ready, once you have the
readiness check, that error doesn't happen. And you have a
very smooth update. That's what makes readiness
checks so important. All right, back to
the slides, please. All right, now you may have seen
that I was using a lot of UI there. And I was kind of just
patching the configuration fairly willy-nilly. Maybe it doesn't surprise
you that that is not the production best practice. So let's talk a little
bit about what is. So as I've covered, Kubernetes
is built on a declarative-style architecture, where you declare
the state that you want. And it drives that
observed state to the state that you declared. What this means
is that you end up with a lot of different
configuration files. The workloads and
the services are all specified with configuration. So what do you do with
all that configuration? Well, at Google, we've
actually had this problem for quite some time. So we have a fairly
good experience on how to deal with this. This is a quote from the Google
SRE, the "Site Reliability Engineering" book, which you
can read online for free. And it says that, "At Google,
all configuration schemes that we use for
storing this config involve storing
that configuration in our primary source
code repository and enforcing a strict
code review requirement." That is the same code
review requirement for configuration as for code. The benefits of this model-- so
if you're storing configuration in Git, either along with the code or in a separate Git repo-- is you get all the benefits that you get with storing code in version control. You get rollbacks. So if you rolled out
the wrong release, you can just rollback
the configuration. And that will effectively
rollback that release. You get versioning
and auditability. If someone asks you, hey,
what version of our code was running at
9:00 AM on Monday? Well, if you're just
patching it by hand, you probably don't know. But if everything's
stored in Git, you can just look back
through the commit history and see what was running
at exactly that time. You can also utilize
code reviews. So you can try and
avoid that situation when someone fat fingers the
replica count down to one by accident and takes
down the servers. Disaster recovery-- so while
this is not the primary motive in making your Kubernetes
application highly available-- on Kubernetes, we support
regional clusters and a whole bunch of different things-- it is, of course, good to
have that configuration stored in version control as a backup. And finally, whatever
identity and authorization model you have on your
code, you can apply that to the configuration, as well. So you could
potentially restrict it to an SRE-type
role if you wanted to. So what does that look like
in a Git Kubernetes world? This is just one way to do it. But it's kind of the
one I recommend-- where you would have
one configuration repository per app. So in this example, there are
two repos at the top there, one for the app Foo,
and one for the app Bar. And then you use branches for
each environment of the app. So the master branch
likely maps to production. You could have a staging branch. And these branches of
configuration map one-to-one to namespaces in Kubernetes. The advantage of that
is that you don't have to change the object name. If you have an app
called Foo Deployment, you don't have to call it
Foo Deployment Staging. You can just call
it Foo Deployment, because you're going to deploy it into a namespace.
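(As a small sketch of that idea, the staging branch of the Foo config repo would be applied into a matching namespace; the namespace name here is hypothetical.)

```yaml
# A small sketch of the branch-to-namespace mapping; the name is hypothetical.
# The staging branch of the Foo config repo gets applied into this namespace,
# e.g. with: kubectl apply -f . --namespace foo-staging
# so object names like foo-deployment stay the same across environments.
apiVersion: v1
kind: Namespace
metadata:
  name: foo-staging
```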
It's up to you whether you want to run this on two clusters or one. So I've just shown
here, just for example, say, having two different
clusters and a bunch of different namespaces. That part's totally up to you. How do you get from
the state, though, that I was demoing into this new
more production-grade system? Well, fortunately
it's pretty easy. You can just extract all
the configuration resources that you had so far
using the command kubectl get <resource> -o yaml. Once you've done
that, you can also set up some very nice things
like continuous deployment. So we have a
product, Cloud Build. I demoed it just now building the container. But you can actually use it
to roll out the config change, as well. And all you need is this
very simple configuration file that basically says, I
want to use the kubectl task. I want to run the command
at the bottom, which is kubectl apply -f ., which just means apply the current directory of config. And I'll give it two environment variables, so it knows which cluster to run that against.
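(That configuration file is roughly the sketch below. The zone value is an assumption; the cluster name matches the myapp cluster from earlier in the demo.)

```yaml
# cloudbuild.yaml -- a sketch of the config-deploy build described above.
# The zone is an assumption; the cluster name matches the demo's myapp cluster.
steps:
- name: 'gcr.io/cloud-builders/kubectl'
  args: ['apply', '-f', '.']
  env:
  - 'CLOUDSDK_COMPUTE_ZONE=us-central1-a'
  - 'CLOUDSDK_CONTAINER_CLUSTER=myapp'
```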
Let's switch to the demo, and I'll show you how I can migrate this app that I've created
into that new environment. First, let's look at
the deployment objects. We have one deployment
object, hellonext, that I created in the UI. Now I'm going to download
the config for that. I'm going to use the --export flag here, as well, so that it kind of
skips some extra config that I don't really need. OK, and that's
what it looks like. If you saw briefly
before, that's the same config that I was
editing on the website. I mentioned that we can hook
this up to Cloud Builder. So let's take a look
at how that will work. To use Cloud
Builder, once again, I'm going to create
another source repository. This one, I'll call
Next Demo Config. I'll go to Cloud Build. And again, go to
the build triggers. I'll add a new trigger again
on the Google Cloud Source repository. We'll pick the config one here. But instead of using the Dockerfile, what I'm going to do is I'm going to point this
at the Cloud Build config file, which is also in
this directory, which I can show you. That's that same file that
I had up on the slide. It's in a subdirectory. So that's all, one change. We're just going to point
this at the Cloud Build file. And we're going to create
that trigger, which will sit alongside the container build trigger that I have. All right, so let's get this in. Push that up. And we'll take a look at what
happens in the build history. So you can see that, just
like we were building the container before, we're
now actually executing this config build. Instead of building
a container, it's actually just running
this command, kubectl apply -f ., which basically just rolls out all that config in that
Git repo onto the cluster. So let's demo an app update
using continuous deployment. Change that one more time. And this time, I'm
not going to use the UI, because I
know what the image is going to be named based on the
SHA. Let me just look at that. And here in my config,
I'm going to open up. And I'm going to edit
the container image here. I'm just going to paste in the new SHA, because I know that Container
Builder is going to create a container with that name. So I don't have to go
back to the UI any more. All right, this time, let's
just observe what happens. So in theory, Cloud Build will, in a few seconds, pick up that commit automatically, and start rolling it out onto the cluster. Let's get my little curl
while loop back up running. Yes, there we go. So Google Cloud Build
picked up the change, has applied the change. We're once again waiting for
that readiness check to pass. Very soon, on the
right, you're going to see that change I just made get rolled out. We're splitting the traffic. We're going to terminate the
old container in just a moment. So that is kind of the start
of a proper production-grade pipeline on GKE. Obviously, you can take
this a lot further. The team also has various
different ways of doing this. We have a product
called Spinnaker, which is fairly advanced. And we have some partners
in this space like GitLab, Weaveworks, Codefresh. But that's just
one way to do it. Back to the slides, please. So of course, it
goes without saying, I think-- but I'll
say it anyway. Once you've moved to Configuration as Code, don't go editing the configuration or applying it outside of version control. So don't be doing what I was doing before in the UI once you're on this system. Because then, of course,
you're violating the constraint that everything went
through a code review. OK, so what have we covered? We've covered how to
configure your cluster so that Kubernetes
knows when your application is running, when it's ready. We've talked about how to export
that configuration and get it into a close-to-production-grade
pipeline. There's a couple
more things that you need to do to help
Kubernetes help you. And one of those is
telling it what resources you need for your container. So this comes in two forms,
kind of like the probes before. But again, they serve two very distinct purposes. Requests are how you
reserve resources for your pod. And limits constrain how many
resources your pod can use. So if you request all the
resources in the cluster, then the scheduler will actually stop scheduling new containers. Whereas, limits just constrain your container. I think a diagram is
probably more helpful here. This is showing one VM
and all of the resource allocation of that VM. We have container
A that requested 20% of one CPU, container B that
requested 50%, container C that requested 20%. And then let's look at
what's actually being used. So that was just the
resource allocation. But it doesn't actually restrain
what the container can use. So here we have container
C using a lot more than it said it would. We have container B using
a bit less than it said, and A using a bit less, as well. This is actually really good, because
this means that you're not wasting those resources. Even though they were
allocated to container B, container C can use them as
long as container B isn't. So it's kind of like
a way of guaranteeing a minimum amount of resources
without actually constraining the maximum. Now if you do want to
constrain the maximum, then you can set
a limit, as well. So I could have
limited container C at 25%, in which case, it
won't go over that amount. What this means, though, is
that, if you have a container-- like here, container D-- that is requesting 20%, it's actually not going to get scheduled. So even though the CPU does have capacity available in terms of actual usage, because 90% of it has already been allocated, container D is not going to get scheduled. This is how you set it
up in the deployment. So you just put
this in, pretty much exactly like I did with the probes. I'm not actually going to demo that one, but it's fairly easy to set up.
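(For reference, a sketch of what that section of the container spec might look like; the specific numbers here are illustrative, not values from the talk.)

```yaml
# A sketch of resource requests and limits on a container; the numbers are
# illustrative, not values from the talk.
resources:
  requests:
    cpu: 200m         # reserve 20% of one CPU for scheduling
    memory: 128Mi
  limits:
    cpu: 500m         # never allow more than half a CPU
    memory: 256Mi     # restart the container if it exceeds this (e.g. a memory leak)
```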
And you can look at Google Stackdriver to see what resources your pods
are currently using in order to get good values for this. Obviously, you want
to put the request somewhere above the
minimum that you need. But obviously, don't
make it excessive. Don't make it too high. Because you also kind of
want to catch cases where you might have a memory leak. That's where the limit
can come in handy. And you just limit the
container at a certain amount so that Kubernetes restarts it if it ever uses too much memory. Another thing to consider when
you set limits and the requests on the container
is how many forked processes you want actually
inside that container. So this obviously affects
the memory and the CPU usage. Before Kubernetes,
before containers, it was generally
pretty good practice to have a lot of concurrency
in each instance of your app. Because it was such a
pain to deploy the app, you wanted to make that
app highly concurrent. But in Kubernetes,
it's very easy to create multiple
replicas of that app such that you don't have to have
a huge amount of concurrency within the app. So what's the right
balance, though? I tend to think about
two or three forked processes per container
as about the right amount. Because you do get
some memory savings by just forking a process
and making it internally a little bit concurrent. But the advantage of
having so many replicas is that, if one of them
were to crash, then you haven't removed
all the capacity. Whereas, if you had only two
replicas and a huge amount of concurrency inside them,
one of them going away would be quite problematic. Kind of the same
trade-off for machines-- so large machines do offer a
slightly greater efficiency with Kubernetes. You can pack more containers in. There's less wastage. But of course, more machines
give you more availability. So kind of the same as before,
with more replicas giving you more availability. So again, just balance the
two trade-offs of machine size and machine count to get your
desired availability goal. And what about rapid iteration? So one of the nice things
about not using containers is that I could just go Node
server.js whenever I wanted, bring up that application. It was really straightforward. So for Kubernetes,
we have a tool called Skaffold that
aims to give you that kind of quick
iteration that you're used to while still
using Kubernetes. Skaffold observes your directory of code and configuration. And whenever something
changes, it's going to redeploy your
app to Kubernetes. So you can very quickly see any changes you make deployed.
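(A minimal Skaffold config is a sketch like the following, assuming a hypothetical image name and a k8s/ directory of manifests; the schema version shown is one of several that Skaffold accepts.)

```yaml
# skaffold.yaml -- a minimal sketch. The image name and the k8s/ manifests
# directory are assumptions; the schema version is one of several Skaffold accepts.
apiVersion: skaffold/v2beta29
kind: Config
build:
  artifacts:
  - image: gcr.io/example-project/hellonext   # hypothetical image name
deploy:
  kubectl:
    manifests:
    - k8s/*.yaml
```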
Right. So that was some best practices for how to describe your
application to Kubernetes in a way that it can
help you kind of keep everything running for you. It can restart the container
if there's a problem. It can split the
traffic accordingly. What about growth, though? What about planning for growth? So this is something where
Kubernetes Engine really excels, actually. We have three world-class
automation tools that we offer as part
of Kubernetes Engine. The first is cluster
auto-scaling. So when you schedule a
container and there are not enough resources, the
Cluster Auto-Scaler can automatically
provision more resources to handle those extra replicas. But perhaps as
importantly, it can also remove resources that
are not being used. Here at Google,
we don't want you to pay for compute resources
that you don't need. So we'll actually help you
scale it back in the quiet times to save money. The second one is
Node Auto Repair. So occasionally, VMs
can become unhealthy. Node Auto Repair is
constantly watching the VMs, just as Kubernetes is
watching your containers. And it's going to replace
any VMs that have a problem. The last one is
Node Auto Upgrade. So as you've probably seen
from the last couple of years of news, there's a
lot of vulnerabilities that the world
discovers in releases. And you probably don't
want to stay up late at night worrying about this. So if you turn on
Node Auto Upgrade, we'll keep you up to date. And we'll apply all these
patches to Kubernetes right onto the nodes for you. And these three
features are actually, to the best of my knowledge,
not available in other clouds. These are very much specific
to Google and what makes, I think, Kubernetes Engine
just such an amazing product. To really take advantage
of these things, though, you actually also need
to auto scale the workload. So you need some
way of detecting when there are more users
coming than what you can serve and automatically
adding replicas of your pod to handle them. For that, what we
recommend is using requests per second metrics. Again, this is kind of like
a best-in-class thing to do. You can use CPU and
memory usage as a proxy to scale up your container. So for example, you could
say, if my container is running really hot and
is using 90% of the CPU, add another one. But not every
application is CPU bound. In fact, that's fairly uncommon. More often, there's
a queue length. There's often a request per
second that the app can handle. So on Kubernetes Engine, we
make this actually fairly easy for you to set up. In conjunction with
Stackdriver monitoring, we have a Kubernetes Horizontal
Pod Auto-Scaler component that can read the requests
per second metric from Stackdriver that's written
to Stackdriver from the HTTP load balancer, such that,
when new users arrive, we can automatically
provision more containers to handle the increase
in requests per second from these users. So what you do is you measure
how many requests per second can your application
handle, just as you measured how much
RAM and how much CPU it used. It's fairly easy to
measure how many requests-- what the throughput is
of your application. And then you can just
say to Kubernetes, look, I want each of the replicas
to have a maximum of, say, six requests per second,
or 100, or whatever it is. Setting up a horizontal
pod auto-scaler is really straightforward. You just have one more piece of config. I know, Kubernetes is all about the config. There are really only two lines that matter there. It's the ones I've
highlighted in blue. And that is, my metric
is the request count coming from the load balancer. And my target value
per container, in this case, for this example, is going to be six requests per second. I want to have one container running for every six requests per second.
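(The config for that looks roughly like the sketch below. It assumes the Stackdriver custom metrics adapter is installed in the cluster, and the exact metric name and autoscaling API version are assumptions rather than the talk's exact file.)

```yaml
# A rough sketch of a Horizontal Pod Autoscaler driven by a requests-per-second
# metric from Stackdriver. It assumes the Stackdriver custom metrics adapter is
# installed; the metric name and API version are assumptions.
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hellonext
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hellonext
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: loadbalancing.googleapis.com|https|request_count
      targetAverageValue: 6      # aim for ~six requests per second per replica
```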
Let's switch over to a demo, and I'll show that in action. All right, so on the bottom right there, I have Stackdriver monitoring
showing the request per second on the service. And you can see there's just
under two requests per second happening right now. Let me add a couple
of while loops to generate some load on this. That should be about
20 RPS, I believe. After about 30
seconds to 60 seconds, that RPS should show
up in Stackdriver. And we'll see if the
cluster reacts to handle it. I'm going to set
some watches here. So we're going to
watch how many replicas we have of that container,
how many nodes we have. And for good measure,
I can also watch the status of my
horizontal pod auto-scaler. Now as this takes a
few seconds to show up, I'm actually going
to switch to a video and show you a
30-minute window of what happens when a whole
bunch of traffic arrives on this
example application. But I'll switch back, so you
know I wasn't cheating later. All right, so this is the
video I recorded on Monday-- exact same demo. And here we've just
added an extra five RPS. And I'm just going to speed
up the video a little bit. So you can see it in the
chart on the bottom right that the QPS is just
about to arrive. We should clock up there
to about eight or nine. And because I set a target
of six RPS per container-- it looks like we're
currently at about seven-- you can actually see that
Kubernetes has already scheduled a second container
at the top left there to handle that extra traffic. I'm going to speed
up this demo again. And let's try adding
another 10 QPS of traffic. And you can see that the 10
QPS registered in Stackdriver. The Horizontal Pod
Auto-Scaler successfully scheduled two more containers. But it looks like
there's a problem. So those containers are pending. They're not running yet. And the reason is that
this cluster is actually out of resources. So those containers are actually
stuck in a pending state. Well, the good news
is, as I mentioned, Kubernetes has a cluster
auto-scaler, as well. So what's going to happen
is GKE, Kubernetes Engine, is watching for this state and
has realized that there are pending containers here that it doesn't have the capacity to run. And so it's actually
automatically provisioned one more node there,
you can see, in the second from the top left. Once that node becomes ready--
and let me just play the demo-- those pending
containers get created. And they're now running
and serving traffic. All right, so for a bit of fun,
let me just add another 20 QPS. And we'll speed up
the demo quite fast. And we'll see what happens. Once again, it schedules
some more containers. It scheduled some more nodes. I think we're at about five nodes now. But I told you that you're going
to save money, as well, right? So here you've seen a
whole bunch of users arriving. You might have been sleeping. I don't know what was happening. It could be the weekend. A whole bunch of users arrived
and were successfully served by your app. So they weren't disappointed. They didn't have a fail
character on the screen. They were successfully served
while you were sleeping, or on vacation, or whatever. But what about saving you money? All right, if I drop the
load all the way off here, drop it back to five QPS. Let's see what happens. And I'm going to run the
demo a little bit fast. Because as with a lot
of auto-scaling things, the scaling down is
a little bit more conservative than scaling up. The reason that
it's typically set that way is that you don't
want to quickly deprovision resources in case the users
come back straightaway. So it looks like we've
already deprovisioned the pods in the top left. They're no longer running. But we still have some
nodes that are running now. They're probably not
needed now that we only have one container. So let's see what happens. And look at that. The nodes have been
removed, deprovisioned, and we're back to the
starting two-node cluster. So that is Kubernetes
Horizontal Pod Auto-scaling using an RPS metric and
the cluster autoscaler to scale up and
down the cluster. Back to the slides. OK, so what did I show you? The recap is we looked at
readiness and liveness probes and how important they were,
the importance of defining requests and limits so that
Kubernetes knows what resources that you need, the idea
that you should be treating configuration as code using a
CI/CD pipeline to deploy code so that you're not going
to accidentally deploy things that would just happen
to be sitting on your machine. Using a CI system
means that you're only going to deploy the code
that was checked in. And with a CD
system, you're only going to deploy the
configuration that was checked in. You should turn on
all the automation settings in Kubernetes
Engine, so you can just sleep easier at night-- and then finally,
auto-scale using RPS. [MUSIC PLAYING]