ANDREW JESSUP:
Thank you, everyone, for coming down today. First of all, my name is Andrew. I've been working in Google
Cloud for about four years now. I think this is the third
time we've run a GCP Next event in San Francisco. And every year
we've run it, it's really been a step
change increase in the number of partners,
the number of attendees, the number of launches, and
more importantly, the energy that we see at
these conferences. It's really gratifying to see
it grow so much every year. So truly, thank you
for coming down today, and thank you for being
a part of that energy and a part of this conference. Particularly, thank
you for coming at this, which is probably
the danger hour for a tech conference-- right
after lunch on a Friday. So hopefully we'll be able
to keep you entertained. What we're talking
about today is building high-performance microservices with Go, Kubernetes, and gRPC. And we actually
realized, when we were working through the
structure of this talk, we kind of started
with a title, and we realized that you could
almost pick any single word in that sentence and
quite comfortably make an hour-long talk out of it. And in fact, a number of
my colleagues are doing so. Even right after
the session, we've got Tim Hockin, who is going to be diving into the details of Kubernetes networking and the latest there. Dan Ciruli and Sep are going to
be talking about a thing called Cloud Endpoints-- which I'll
talk about a little bit, too. But they'll be going deep on that. And then, across
the week we've had a bunch of sessions
on these topics that have usually picked
individual slices of this and gone deep. So we were trying to
think of what we could do to make this interesting. And we realized that what we could do as a complement to some of these deep-dive sessions was something that is a little broader and a bit more of an overview. So what we want to do today is show how you can use these technologies. Hopefully, you've heard
of some, or all of them. But we want to show
how you can use them in a real world, practical,
grounded application. So that's what we're
going to do today. Today, we're going to build
and deploy an application to the Cloud Platform
that uses all of these different services. And we're going to
show you a little bit about that process
and some techniques and some strategies you
can use to turn that into a robust workflow. We're going to talk a
little bit about some of these technologies as well
and how they help us get there. So before we get
into the app itself, it's worth just recapping
a little bit of motivation. Hopefully, microservices is a term that you've heard before. It's not a new concept
for the most part. Fundamentally, it's
about taking apart what might have previously been
a large monolithic codebase and breaking it out into
discrete components that can be deployed independently. There are a number of advantages to doing this. But it's particularly
advantageous when you are building
an application that has a large number
of developers on it. Being able to independently
deploy and manage and implement each of these
different components decouples the development teams from each other so that they can
work independently. It can be an enormous
productivity boost. At Google, we're a very heavily
microservice-driven company, or service-driven company. When you interact with
a site like Gmail, for example, when you go
and view mail.google.com, you are triggering
a cascade of RPCs across Google's data
centers in order to retrieve the information
and serve you that page. There's a whole ton of work
going on behind the scenes, and as you do that, your RPC is
passing down through the code paths that dozens and, in many
cases, hundreds of engineers have worked on. So for us being able to deploy
these things independently is hugely powerful. So we've seen, with cloud as well, something of a resurgence in this design pattern. Again, for those who
were around in the '90s, service-oriented
architecture was another way of kind of
saying the same thing. The general design pattern is not new, but a combination of elastic workload provisioning and some of these technologies has helped accelerate it and push it a little bit further back into the mainstream. So it's becoming a
very popular design pattern, a very
useful one, especially in large organizations. But it comes with a couple
of important challenges. And perhaps the most immediate and obvious challenge is that, when you're taking apart a codebase where previously you were passing objects around in memory, now you're as often as not passing them across the wire. So to send a message,
we have a bunch of work to do: take a request, serialize it into a form whose bits we can actually send down the wire. That request then needs
to get to a network card. That needs to get
translated into packets. Those packets need to get
sent, and again, there's many hours of discussion
you could have about that. All of this, then, needs
to happen in reverse on the other end. The message needs to be
deserialized, and then turned into something that an
application developer can use, some kind of in-memory
object that they can actually work with. Doing all of that can impose a significant performance penalty. It's obviously a lot more expensive, from a compute-resource point of view, to do all of this work every time you send a message.
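To make that concrete, here's a toy sketch of that round trip in Go, using the protobuf library of this era (github.com/golang/protobuf). The Request message and its import path are hypothetical stand-ins for any generated protobuf type, not code from this talk:

```go
package main

import (
	"fmt"
	"log"

	"github.com/golang/protobuf/proto"

	pb "example.com/demo/proto" // hypothetical generated package
)

func main() {
	req := &pb.Request{Caption: "hello"} // an ordinary in-memory object

	// Serialize: turn the object into bytes that can go down the wire.
	data, err := proto.Marshal(req)
	if err != nil {
		log.Fatal(err)
	}

	// ...the bytes cross the network...

	// Deserialize: rebuild an in-memory object on the other end.
	var got pb.Request
	if err := proto.Unmarshal(data, &got); err != nil {
		log.Fatal(err)
	}
	fmt.Println(got.Caption)
}
```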
And particularly, when you look at modern microservice architectures, they may be composed of many tiers and many different operations, like fan-outs. As you push on this
design pattern, this cost of serialization
becomes more and more acute. Excuse me. So that's the first problem. In concert with that
is another problem, which is just
network contention. Part of this is just simply
the amount of bytes and bits you have to send over
the wire when you're dealing with significant
volumes of scale, or when you're dealing with low-latency requirements or poor-reliability networks. Network contention
itself can be a problem. And if you're not careful,
not having a good model of your network architecture can impede your ability
pattern like this. And then, finally, as you're
breaking these things down into multiple
components, usually, you're breaking them down
into multiple processes, multiple running programs. And when you take traditional
models of application isolation, application
containments, and apply that to
microservices, they often carry a certain degree of overhead that may be acceptable for something that requires a whole machine or several machines to run. But when you have a small
service that just takes up a couple of megabytes of
memory, maybe only needs a small amount of spindle usage,
some of the traditional models of managing the compute
resources that these services need aren't necessarily
as robust or as elastic as we would like them to be. And just to emphasize
that last point, it's worth mentioning
Bin Packing. And so this is a somewhat
simplified view of the problem. But in a traditional model
of building applications, we might have separated our architecture out into a few services,
but we will typically have at least one machine
dedicated to each service, or maybe a cluster
dedicated to each service. And again, that's fine when
the services are quite large. It can often justify several
machines to run those. But when you have a large
number of small services, this model becomes problematic. It's much more convenient if we can do two things: firstly, run services
across multiple machines, and secondly, run
multiple services within a single machine. At Google, this
is a real problem. This is something we started encountering a long time ago because we had a large shared
bulk set of compute resources. We've got a large number of
developers deploying services into them. We wanted a way of being
able to efficiently manage how services were being allocated to compute resources, without necessarily having to get into the business of assigning individual machines to individual services. So this fine-grained Bin
Packing, if you can do it, turns out to be a real
advantage when deploying these kinds of systems. Again, not strictly necessary,
not strictly required to couple this kind of
deployment to microservices, but it definitely
helps you, if you can. So, since we've been doing
so much of this at Google, and we've been doing
it for a while, we've spent a number of
engineering cycles building out technologies that can
help us in this process. We're going to talk about
three of them today. In many cases, what
you're looking at are the second, third, or
fourth generation attempts at solving these
kinds of problems. The first, and one
of the most exciting, is a thing that we open-sourced
reasonably recently called gRPC. It stands for Google
Remote Procedure Call. And gRPC is actually
a collection of libraries and
tools that allow you to create APIs, client
and servers, in a number of different languages. And it relies on
Protobufs, which are a strongly typed and very efficient binary mechanism for serializing and deserializing messages. And it's also built on HTTP/2 as a transport, so it's able to take advantage of things like bi-directional streaming. And then, on top of that, there's a robust library for things like flow control, rate management, retries-- all of the work that ends up
being necessary in building a robust, powerful client. But it usually takes a few
attempts and a few mistakes, in order to get right. The other thing about it
is it's all open-source. The second thing that we're going to use is Go. Go is the language
that absolutely has the cutest mascot-- I'm sorry, smallest
runtime footprint. And this, again, turns
out to be important, especially when your
services are small, and you have a lot of them. It's not, again,
strictly necessary to have a small
runtime footprint, but when you can run your
process as efficiently as possible, with
as little memory and as little system
overhead as possible, this allows you to pack more
and more of that capacity into a finite amount of
compute, which saves you money. The other nice thing about Go,
when it comes to microservices, it has a ton of interesting
features in the language. But one of the things
it has that's important when it comes to
microservices is it has a modern and
efficient networking library. Go is a relatively new
language, which means, again, it's been able to learn some
of the lessons of the past in terms of just
how networking can be performed and abstracted. And it's been used
very, very heavily at Google for
networking applications. And so we've had a lot of
time and a lot of energy spent into making these
libraries as efficient and as optimal as possible. They're also nice and
easy to work with. And again, Go is open-source, so you don't just get it for free-- there's a community out there that you can engage with. And I don't want to make this
too academic a discussion, or too academic a
talk, but it is worth talking a little bit
about performance. The graph here is actually
from a public benchmark that the gRPC team published. They rerun this benchmark
with every release of gRPC. It's a thing called the
Ping Pong Test, where you have two VMs, and you just take a message, serialize it in a client, send it to a server, deserialize it, serialize it again, send it back, and repeat-- I think it's about 100 times-- measuring the latency of that process. So it's a really good
measure of the performance of the language and the runtime
at dealing with this message passing. And you can see Go
does really well here. It's approaching the kind of
performance you get out of C++, and also, although it's
not on this list, Java. So, while gRPC is very much a
language-neutral technology, it works really well with
Go, which is why we like it. The last key piece in
this is Kubernetes. Again, there's a lot of talks
right now in Kubernetes, so, hopefully, you've
heard something about it. Kubernetes is the spiritual
successor, in many ways, to a technology that we've
been working on at Google for a long time, called Borg. Borg is our hosting
platform that allows us to, again, solve the Bin Packing problem-- taking a large number of tasks that large numbers of Googlers are submitting to a system. And it effectively
acts as a giant solver, figuring out where is the
best place in my fleet to run that at any given time. And when these
workloads are not just being deployed all the
time, but they're also dynamic in terms
of the resources that they need to
have access to, being able to have a dynamic,
automatic system that's able to handle this
allocation is hugely powerful. With Kubernetes, we've taken
that exact same philosophy, and many of the same ideas,
and we've packaged that up into an open-source project. Interestingly, it is
itself written in Go. Although, like gRPC,
it works perfectly well with any language. It's very much a
language-agnostic piece of infrastructure. Also, open-source and a very
thriving community of people building around that, as well. And I think, more contributions
of Kubernetes work commits have now come from outside
Google, rather than inside. So we want to build an app and
show how all of these things can fit together. If we can cut to
my laptop quickly-- which I may need to wake up. So we're going to build an app. The app does this-- it creates 3D animated
GIFs Super practical. I hope you can see
how this applies to your day-to-day
lives already. So it seems practical. This is one of those apps that is actually a little deceptive-- very simple to use, very simple to show, but there's a lot going on in order to make this happen. I'll try this live and do my
ritual prep for the demo gods. Well, let's see what it actually looks like to use this app. So a very simple form. I'm going to put in a name. I'm going to choose a mascot. You've seen the Go one. Again, cutest, right? I'll pick something
else, say gRPC. We go create an app. So we've now submitted a job. We're using a technique,
called Path Tracing, to actually generate
that 3D GIF, which is a particularly
computationally-intensive exercise. And so, in order to
make all of this happen, we've actually had
to do a ton of work. The first thing we had to do is
create a series of scene assets to render that image-- things like materials files, object files, and a bunch of metadata about light sources and camera placement. And we actually generate that dynamically for every single GIF that gets requested. So we've gone and created
done is create a task to render each and
every frame of that GIF. There's about 10 of them. And so we then submitted that. And we're done. It's actually faster than it
took when I practiced this. So we had to submit a task for
every frame of this animation, and then we've then gone
and dispatched that out to our cluster. Our cluster has then gone and
run that rendering process. It's a pretty slow
library that we use. So it takes about 15 to 20
seconds to render each frame. So it's doing that
in the background. It pulls them all together and synthesizes them into an animated GIF. So quite a bit of
work there, just to handle all of that overhead. And if we switch
back to the slides, we can dive into the
architecture of this thing and take a look about
how we actually built this using these technologies. So what we're seeing here is,
basically, the set of services that we needed in order
to build this out. Would you believe five
services are needed to build 3D animated GIFs? So let's break this
down a little bit. Let's start with
the first service, the bottom-most service,
which is the Renderer. So the Renderer is actually a
very general-purpose service, which is nice if we ever
want to reuse animated GIF creation for any other purpose. What the Render service does
is take a request that includes references to all of those
assets that we talked about-- so, again, objects, materials,
light sources, and so forth-- and perform the task of actually generating an image. It will store that image
in Google Cloud Storage, and then it will return in the response, back to any upstream caller, the path to where that single frame, that single image, lives in Google Cloud Storage. On the other side of this, we
have the front-end service. The front-end service
is the service that we were actually
interacting with directly in the browser there. This is the,
essentially, HTTP server. So it's handling the form, it's
handling the form submission. If we were doing
user authentication, it would probably do that. And it does all
the kinds of things you would normally expect
to see in a web front-end. Handle it and serve static
assets, and things like that. There's not a lot of logic
in the front-end, though. We really wanted to keep the
logic out of the front-end, as much as possible. Because we never know if we
need to be building front-end ends for other things. Maybe there's a mobile
kiosk we want to build, or a mobile app that we want to
build that doesn't necessarily need to talk to an HTTP API. So we really wanted
to try and encapsulate as much of the business logic,
if you will, of the GIF creator into some other service. So we called this, of course,
the GIF Creator Service. GIF Creator does everything
in between our web front end and our Renderer. GIF Creator has
an API of its own. It takes an API request that
looks a lot like that form that you saw before. So there's a caption. You can pick a mascot. There's a couple of other
little bits of metadata. And when it receives that, it
will then do all of the work that we talked about. It will be creating
the scene assets. It will be storing them
in Google Cloud Storage. It will also be
responsible for calling out to the Render Service. So it will call the Renderer's own API and perform the fan-out in order to make that happen.
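As a rough sketch of what that fan-out might look like in Go-- the RenderClient, message types, and field names here are illustrative assumptions, not the talk's actual code:

```go
package gifcreator

import (
	"context"

	"golang.org/x/sync/errgroup"

	pb "example.com/gifcreator/proto" // hypothetical generated package
)

// renderFrames issues one Render RPC per frame, concurrently, and
// collects the Cloud Storage path returned for each rendered image.
func renderFrames(ctx context.Context, client pb.RenderClient, scenes []*pb.Scene) ([]string, error) {
	paths := make([]string, len(scenes))
	g, ctx := errgroup.WithContext(ctx)
	for i, scene := range scenes {
		i, scene := i, scene // capture loop variables for the goroutines
		g.Go(func() error {
			resp, err := client.Render(ctx, &pb.RenderRequest{Scene: scene})
			if err != nil {
				return err
			}
			paths[i] = resp.GcsPath
			return nil
		})
	}
	// Wait for every frame; fail if any single render fails.
	return paths, g.Wait()
}
```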
We want to do that robustly, so we've actually split this into a server and a worker. And we use Redis as our
State Management Service to track the status
of all of these jobs. GIF Creator handles all of that. Its RPC is also interesting in its response: it doesn't send back the GIF, because we expect this process to take a while. Instead, we send back a job status, which the upstream front-end can poll, as necessary, in order to see where
this work is at. So GIF Creator is really the
workhorse of all of this. Of course, GIF Creator also does
that synthesis, the compositing of a set of frames, once
they've all been rendered, into a final image. So a lot of work going on here. Let's zoom in on the
APIs for a second. There's really two
APIs that we have for the two different services. The Render API and
the GIF Creator API. We've talked a little bit
about what those APIs do. But let's jump into
a little bit of code. We wanted to write
these APIs in gRPC. Again, we like the
fast serialization, the efficiency on the wire. We like all of the
retry and flow control that gRPC gives you. gRPC APIs typically
start with a document that looks a bit like the one
you've got on screen here. This is called a Protofile. A Protofile is actually
a language-agnostic domain-specific language
for describing an API. And it's describing two things-- both the RPCs, which endpoints you can call, and, most crucially, the messages that get passed in the request and the response. And the takeaway, really, from this screen is that this document is quite strongly typed. You can see we've got things like enums here. We can describe quite complex data structures. Although we don't have them here, we can have things like repeating fields and arrays. And so it's a robust language for describing message passing.
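I can't reproduce the exact file from the slide here, but a minimal sketch of a Protofile in this shape-- the service, message, and enum names are illustrative guesses-- looks like this:

```proto
syntax = "proto3";

package gifcreator;

// An enum: the set of mascots the form lets you pick.
enum Mascot {
  MASCOT_UNSPECIFIED = 0;
  GOPHER = 1;
  GRPC = 2;
}

// Strongly-typed request and response messages.
message CreateGifRequest {
  string caption = 1;
  Mascot mascot = 2;
}

message CreateGifResponse {
  string job_id = 1;
}

// The RPCs -- the endpoints you can call.
service GifCreator {
  rpc CreateGif (CreateGifRequest) returns (CreateGifResponse);
}
```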
As I said, this document is language-agnostic. Although it looks a bit
like Go code, it isn't. But what it can be
used for is two things. One is that it's a fairly
human readable contract for describing what
the API should be. That in itself is
actually pretty useful. Two different teams who
need to talk via API can use this document to
effectively negotiate on what RPCs they need
from each other and what information they
need from each other. Especially when they work
in different languages, having a language-neutral
document like this can actually be a
really nice idea. The second thing that we
get out of this document, though, is an input for
a tool called Protoc. So Protoc is the
protocol buffer compiler. And what Protoc does
is take this document, or a document like it, and can
turn it into clients and server libraries in your
language of choice. I think, from memory, we support
about seven different languages at this point. And certainly Go is one of them. I won't demo it to you here. But when I run Protoc over a file like this, I then get a library generated, which I can then use in my code.
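For reference, assuming the Go plugin (protoc-gen-go) is installed and the file is named gifcreator.proto-- both assumptions on my part-- the invocation looks something like this:

```sh
# Generate a Go package containing both the message types and the
# gRPC client and server stubs.
protoc --go_out=plugins=grpc:. gifcreator.proto
```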
So this is some actual code from the GIF Creator Service. I've cut a bit out. But this gives you
an idea about how this generated code
from a Protofile can be used in my service. And if you've
worked with, really, any kind of client
server interface, this will look fairly familiar. There's a lot of
options and things that you can play around with
that we don't necessarily need to deal with by default.
But you can certainly get into things like
injecting middleware, things like supporting different
forms of authentication. But fundamentally, it's a very
simple, particularly in Go, construct to work with. And, crucially in
this, you can see that the objects that I'm
dealing with in my Go code are Go objects. I have Go structs. I have Go arrays. I have Go strings. So I'm working in a form that's
pretty idiomatic and pretty natural. I haven't had to deal with
any of the serialization and deserialization. gRPC has handled that for me.
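To give a feel for it, here's a rough sketch of wiring up a gRPC server in Go, reusing the hypothetical names from the Protofile sketch earlier rather than the talk's actual code:

```go
package main

import (
	"context"
	"log"
	"net"

	"google.golang.org/grpc"

	pb "example.com/gifcreator/proto" // hypothetical generated package
)

// server implements the generated GifCreatorServer interface.
type server struct{}

// CreateGif receives an ordinary Go struct; gRPC has already done the
// deserialization, and it will serialize the response on the way out.
func (s *server) CreateGif(ctx context.Context, req *pb.CreateGifRequest) (*pb.CreateGifResponse, error) {
	// Kick off the rendering work and hand back a job ID to poll.
	return &pb.CreateGifResponse{JobId: "job-123"}, nil
}

func main() {
	lis, err := net.Listen("tcp", ":8080")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	s := grpc.NewServer()
	pb.RegisterGifCreatorServer(s, &server{})
	log.Fatal(s.Serve(lis))
}
```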
We'll get back into the code in a second. But one of the things that we
found really trips people up-- it tripped me up a lot when I
was first starting to work with these tools-- is actually building
the pipeline that gets us from code,
like you saw before, to something that's actually running inside Kubernetes. It's complex because,
frankly, there's quite a few different
steps involved. And there's many different ways
of accomplishing those steps. There's not necessarily
one single blessed path that everybody uses. So we wanted to talk
about that a little bit, not to show the canonical way, necessarily, but a way where we can
get from a code base to something that actually runs. As I said, there's
a lot of steps. The two high-level
steps, if you will, are the Build and
Packaging Phase-- this is about taking
code and turning it into an image of binary assets. Typically, that's
a Docker image, but you can pick your
packaging format of choice. And you'll usually store that
in an asset registry somewhere. Maybe it's Artifactory or Docker Hub. Or we're going to show Google
Container Registry, which comes with every
Google Cloud project. Second stage is
about deployment. I've built this
asset that describes everything I need to run. I then need to get it into
Kubernetes and deploy it. So let's double click
on that first stage. This is what the build
pipeline looks like-- going from source
to an image for this app that we just showed you. And there's actually quite a
few different steps in there. And this is especially
true when you think about compiled languages,
like Go or C or Java. It's a lot more than
just getting some code up and running on a server. We really actually
rely on these pipelines in order to be able to do it. The first step is getting
the dependencies involved. Pretty straightforward. For those of you who
are Go developers, you'll know there's plenty of
dependency management tools. In this case, we used
Glide, although I encourage folks to
check out Dep, which is gaining a lot of traction. But you use a package
management tool of some kind, to bring in all the
necessary dependencies that you have in your
application, anything you haven't checked in. This is particularly important if it's on, say, a CI server, or it's a clean-room
build of some kind. The next step is you need to
actually build your binaries. You need to run Go Build. Now, in our case, we have three
services that we're actually building ourselves. So we have three binaries,
one for each of them. In our case, all
of these binaries come from the same code base. And we're going to package them all into the same image. We're just going to invoke
that image differently for each service. Now, you don't have
to do it this way. Again, you could have separate
binaries, separate images, even separate languages for
every different service here. But for convenience, we've
packaged it into one. And we're at Google. We like the monorepo thing, I guess. But you have three binaries. And then, we have
these binaries-- we need to package
them into an image. This image is a hermetically-sealed unit that contains everything
we need in order to be able to run our app. So obviously we need
the binaries in there. We don't need the source code. We don't need the build tools. But we do need a
few other assets. The web server, for example,
needs images, CSS files, templates, a bit of extra
stuff in order for it to work. The GIF Creator Service
needs canonical copies of all of its asset files. So there are a few resources
in there that are necessary, in addition to the binaries,
in order for this thing to run. And then, once
we've packaged it, we need to get this thing
into a registry, somehow. Now, when most people
start off with this, usually what they end up doing is building some kind of, I don't know, shell script, or some other tool to help
coordinate all of these steps and bring them in together. And that works pretty well. Although, you do run into some
challenges with that model. One of them being,
you actually need copies of all of
these tools running, wherever you're running
that script in order to run the build process. So we found, at
Google, a lot of teams have this problem as well. A lot of our customers
have this problem. So we've come up with
a tool that helps. It's called Container Builder. We shipped it a couple
of days ago, actually. Container Builder does exactly
what it says on the tin. It takes source
code and packages it into a Docker image. And it supports a
language for describing these kinds of pipelines
that we've talked about. It's called, the YAML Syntax. The file is usually
called Cloudbuild.yaml. And again, I won't
dive into the details here, except to say that we
have a Cloud build YAML file. For this project, we've checked
it into the same repository that we have all
of our other code. And this describes the three
steps that are necessary in order to build our app. Everything is
included in this file. The way we describe steps in Container Builder is actually with Docker images themselves.
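I won't walk through our actual file, but a sketch of a cloudbuild.yaml with three steps of that shape-- the builder images and paths here are assumptions, not our real configuration-- might look like this:

```yaml
steps:
# 1. Bring in dependencies (we used Glide; this builder image is hypothetical).
- name: 'gcr.io/cloud-builders/glide'
  args: ['install']
# 2. Compile the binaries.
- name: 'gcr.io/cloud-builders/go'
  args: ['build', '-o', 'bin/frontend', './frontend']
# 3. Package the binaries and static assets into a Docker image,
#    tagged with this build's unique ID.
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/gifcreator:$BUILD_ID', '.']
images: ['gcr.io/$PROJECT_ID/gifcreator:$BUILD_ID']
```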
So the nice thing about this file is that it can actually
be taken and run by anyone else on my team, even
if they haven't necessarily installed all the tools like
Glide and Go and Docker before. So it's a pretty neat way of
describing a build process, and it's pretty general purpose. Again we chose things like Glide
and Go Build and Docker Build to package everything. But you can pick up other
tools if you wanted to. Practically, how do you actually
trigger a build like this? You've got two ways. One is, of course, there's a gcloud command to do it. And that can be done from any source tree that's sitting on your laptop.
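At the time of this talk, that command looks something like this, run from the root of your source tree:

```sh
# Upload the current directory and run the steps in cloudbuild.yaml.
gcloud container builds submit --config cloudbuild.yaml .
```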
launched a couple of days ago-- an extension to
contain a registry, called Build Triggers. Which allows you to watch
GitHub, any Bitbuckets, any Google source repository
that you might have set up. And every time you do a
commit to a particular branch, or a particular tag, or one that follows a particular pattern, you can automatically
trigger a new build in Container Registry. So I'll give you a
quick demo of that now. Hopefully, you can see this. I'm looking at Google
Source Repositories here. This is the source code of
all three of those services. Let's jump into front-end,
just to give you a sense of it. This is the front-end,
this is the Go code that produces the binary
for the front end. Some static assets in there. And some templates and a
few other little things. If you jump back out and
look at the root folder here, we're looking at
the master branch. We've also got some pretty
familiar-looking files that help us with the build process. So here's our Glide
file that describes all of our dependencies. Here's the Cloud Build file, which we showed you a couple of seconds ago. Let's also look at the Dockerfile here, too. I mentioned there
was a packaging step. We're using Docker to
perform that packaging, and so it'll pull
from this Dockerfile. And you can see there-- if you're not familiar with Dockerfiles, this probably won't mean much to you. But if you are, the interesting takeaway from this Dockerfile is there's really not
that much going on. All it's doing is
copying in the binaries that we built
upstream and adding the templates, and
a few other things. It's adding some root certificates. But otherwise, it's really not doing very much. It's a last-mile packaging step. And this is really nice. It allows us to keep all of our build tools out of our final image. And it allows us to create something that really only has the bare minimum we need in order to run it.
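As a sketch-- the base image and paths here are assumptions, not our actual file-- a last-mile Dockerfile of that shape might look like this:

```dockerfile
FROM alpine:3.5
# Root certificates, so outbound TLS calls (e.g., to Cloud Storage) work.
RUN apk add --no-cache ca-certificates
# Just the binaries built upstream, plus the templates and static assets.
COPY bin/ /app/
COPY templates/ /app/templates/
ENTRYPOINT ["/app/frontend"]
```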
So down here in Cloud Shell-- hopefully, you can all see this-- this is a local VM where I've checked out that same repository. So I'm going to try and
make a quick change here. And for the sake
of this demo, I'm going to make a boring change. I'm just going to jump
into the front-end. And what I'm going to do is take the form that you fill out to create a GIF and just change the color of it, just to prove this deployment flow to us. So if we look at that,
it's form.html. OK. We have a header here. So I'm going to do
something really trivial. I'm just going to add
something like that. So I made the change-- simple. Let's commit that
back to our repo. OK. So we've pushed that. So now, if we jump over
to Container Registry-- let me give you some room. If we jump over to Container
Registry, you see here, we have this list
of Build Triggers. So we've set up
a trigger already to watch that repository,
and every time we push to the master branch, it's
going to automatically trigger a build for us. If we jump into Build History here, we can see that build
is actually kicking off. And I won't bore you with
too much of the detail here, except to say, here's
the logs of the process. Right now, it's
running Glide to bring in all of the
dependencies, and then it'll continue on with the
build and the remaining steps. Something interesting
to note here is every build is
given a Build ID. This is just a UUID. And in our Cloud Build file-- actually, let me show
you that really quick. In our Cloud Build
file, we've actually configured it to generate our
Docker image with a tag based on that Build ID. So we've asked to insert the
Build ID into the final image. So let's jump back. That might be done by now. And it is. So there's our build. If we jump into
Container Registry-- and look. And we have a new image. Great. And just a quick look at
the size of that image. It's only 15 megabytes. And if you're used to dealing
with your Docker images built from Ubuntu that have a
ton of tooling in them, 15 megabytes is nice and small. So now, we have an image. Next, we want to get
it into production. So now, we want to
talk about deployment. Before we do that, though-- so you'll notice,
again, this thing has a tag based on its Build ID. For Kubernetes, this
can be really useful. Because we're going to use
these tags in our deployment manifests. Now, we could use the UUID
that I got created here, but that's a lot to
remember and a lot to type. So I'm actually going to add a second tag here, called Demo Fix, which we can use a little later when we're doing a deployment.
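One way to do that from the command line-- the image name here is illustrative, and this assumes the gcloud container images add-tag command available at the time:

```sh
# Point a friendlier tag at the same image the build just pushed.
gcloud container images add-tag \
    gcr.io/my-project/gifcreator:<build-id> \
    gcr.io/my-project/gifcreator:demofix
```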
So if we can jump back to the slides, let's just talk about what this
deployment actually looks like. So what does this thing actually
look like inside Kubernetes? Kubernetes has a number
of constructs that are really useful in this process. And again, it could be
easily an hour talk, just describing the different
moving parts of Kubernetes that come into play when
you're running a real service. We're really only going
to talk about two of them. The first is deployments. So a deployment is a composite object in Kubernetes that
actually, in turn, describes a number of other
underlying constructs-- a replica set and pods
and pod configurations. And it provides a really
nice declarative way of describing how all of
these things are constructed. Fundamentally, what
a deployment is doing is telling Kubernetes how
to run this image, how to run this code. And we actually have
an example of what a deployment looks like. This is a slightly
simplified deployment spec. What this deployment
is describing is, take an image that
we have sitting in a registry somewhere. Run that image
using this command. So this is the
front-end service here. So we want to call
that Front-End Binary, that we built before. Describe any of the metadata
that this thing needs. So what ports does it need? What environment
variables does it need? You can describe
other constraints, like what's the minimum amount
of CPU this thing needs? How much memory does it need? But fundamentally, it's pretty simple. And how many replicas do we need? How many copies of this process do we want running across our cluster?
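A simplified spec of that shape-- the names, ports, and image tag here are illustrative, not our real manifest-- looks something like this:

```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 2                    # how many copies we want running
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: gcr.io/my-project/gifcreator:demofix  # the image to run
        command: ["/app/frontend"]                   # how to invoke it
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 100m       # the minimum CPU this thing needs
            memory: 64Mi    # and the minimum memory
```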
The other component is services. So it's going to get a little meta, but we're going to build
services for services. So a Kubernetes service
is a way of providing a simple static endpoint that helps us in talking to all of these different components. With deployments, we
might be describing a number of different processes
running all over our cluster. We need a way of being able
to discover where they are and being able to
get traffic to them. So a service in Kubernetes
helps us do that. The way you can use
a service really depends on your deployment. It's a very flexible descriptor. In our case, though, the three
services we're building all work basically the same way. We want to have a
stable, internal DNS name for each of them, so that if we need
to refer to them from another service
inside Kubernetes, we can just look it
up by its DNS name. And then, what we want
is a load balancer which can dynamically route traffic
to any of the pods, i.e. the running versions of this
process, at any one time. So this manifest file describes everything we need in order to be able to do that. And we have one of these for each of our services. We also have one for the Render Service that we're running inside Kubernetes as well.
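A sketch of one of those service manifests-- again with illustrative names and ports:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: renderer      # becomes the stable DNS name inside the cluster
spec:
  selector:
    app: renderer     # routes traffic to any pod carrying this label
  ports:
  - port: 8080
    targetPort: 8080
```

With that in place, another service in the cluster can reach the Renderer by name-- in Go, something like grpc.Dial("renderer:8080", grpc.WithInsecure())-- and Kubernetes takes care of getting the traffic to a running pod.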
So now, we're going to run another quick demo. So if we can switch-- great. So we've built this image. Let's get that image
into Kubernetes. So in our Source Tree, we
have this directory k8s. And in that k8s directory,
we have a set of YAML files that describe these deployments. Basically, the same thing
you just saw on screen. Now, we have three services. And each of those has
a deployment manifest and a service manifest. So in our case, what we want
to do is update the front-end. And really, all we want to do
is tell that front-end service, instead of pointing
to the old image, now we want you to be
running the new image. So if I increase this a little
bit and jump into that file, you can see here is the file. I like environment variables. So I've created a bunch
of them for this project. And this here is
the image that's being used to render
the front-end. You can, incidentally, run multiple different images inside a single pod, as it's called in Kubernetes parlance. But in our case, we're
just going to do one. And we're going to pull
from a different tag than the one we did previously. And I think-- what was it again? Was it Demo Fix? Yeah, Demo Fix. OK. So we've updated that file. Now we need to tell
Kubernetes to deploy it. So let's go do that. OK. Deployment replaced. Now, it's worth unpacking that
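The command behind that "Deployment replaced" message is something like the following-- the manifest file name is an assumption based on the k8s directory we just saw:

```sh
# Ask Kubernetes to match the state described in the updated manifest.
kubectl replace -f k8s/frontend-deployment.yaml
```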
Now, it's worth unpacking that last section a little bit, because a lot just happened. We didn't tell Kubernetes that
we wanted to replace an image. What we did was tell
Kubernetes that we would like the state of
the system to change, to match this new file. And one of the things that's
different in that new file is the image has changed. It's kind of important that
we described it that way. We didn't say, make a change. We just said, we have a
new idea of what we want the state of our system to be. Kubernetes, go and figure
out how to make that happen. Now we can provide
hints as to how Kubernetes can make that happen,
like deployment policies. But, at the end of the day, this
is Kubernetes's responsibility to figure out how
to make it work. And this is, again, a really
powerful idea in Kubernetes, that you can
declaratively describe the state of your system and
have Kubernetes figure out all the phony details about
how it should actually work. So let's actually-- a
moment of truth here-- see if this actually worked. So I'm going to jump into
our Kubernetes Dashboard. You can see, again,
all of the deployments. This is our running cluster. This is running
across 10 nodes, which might be a little excessive
for a GIF Creator. We have a front-end
deployment here. And we can see, a
minute ago, this thing received a new replica set. So what Kubernetes
did, in the background, was create a new
replica set that describes the set of
running pods that we need. This new replica set would
have the new pod configuration in it, i.e., with the new image. And it will gradually increase
the size of that replica set, while spinning down the old one. And so what you get is
a graceful deployment as we shift from one
mode of the system to another mode of the system. So that actually happened. In our case, it's a
pretty simple system, and we're not really doing much
in the way of health checking, or any other thing that can
potentially slow this down. So in our case, it just
happened in a second or so. But we have now a
new replica set. And you can see here,
it has two pods in it. The pods aren't doing very much yet. But if we jump into
one of them, we can see that this is now
running with a new image. And it looks like it
started perfectly well. So now, if we go back
to our GIF Creator, we go back to that entry form. Our text is in red. So our change went through. [APPLAUSE] That is, hands down,
the biggest applause I've ever got for
changing a bit of CSS. So thank you for that. So now, we have a running
app, running in a cluster. We've only got a little bit of time left, and I want to save some for Q&A. So I'm going to do a whirlwind tour
of some of the things that you get when you run this
stuff inside Google Cloud. Everything I've shown you up
until now, for the most part, with the exception of
Google Cloud Storage, perhaps, you can run
basically anywhere. In fact, most of it, you can run
on your laptop, if you want to. And again, all of this
stuff is open-source. That's pretty nice. It means, you don't really
need to pick Google, in order to get started
with this stuff. But if you do happen to be
running it on the platform, you get a couple of
interesting advantages. The first, of course, is you get
Google Container Engine, which takes away a lot of the
headaches of actually running a Kubernetes cluster. And it can do a lot
of things for you, including automatically
patching nodes, automatically standing up,
say, Google Cloud Platform load balancers in order to make your service run. It can do dynamic auto-scaling. A bunch of other nice
plug-in features there. And it takes a couple
of clicks to use. But there's a couple
of other features that turn out to be pretty
powerful on the Cloud Platform that are worth
touring through briefly. We saw a Container Builder. We saw Source Repositories. We saw the Container Registry. We also have a built-in tracing
system, which works really well with Kubernetes and gRPC. So I won't bore you with
a full tour of this, but this allows us
to do distributed tracing across a cluster. Distributed tracing is
actually really, really useful in situations like this. One of the first
things, that I wanted to do once I deployed
this app, was figure out why it
took so long for us to render a bunch of frames. So by jumping into
tracing, you are able to understand, even
as a job progresses, where the time is being taken. Is it being taken
up in rendering, or in passing things to Cloud Storage? Or is it in a Go binary? Distributed tracing allows
you to instrument your code. And when it's distributed,
the nice artifact of this is you can pass trace context across multiple running processes. gRPC, in particular, makes
it really easy to do that. But by doing so, you can
actually trace a request, all the way from when the user requested it on our front-end, down through to the fanned-out tasks that were being run in the Renderer-- and all the way back
up to the final job. I won't demo that now, but it would make for a good follow-up talk. Incidentally, what you're
looking at right now is the breakdown of
latency over time. And you can see there's a few
experimental tasks there, which took a really long time to run. And generally, you've
got a clustering that looks a little bit
like what you'd expect. Some of these dots represent
our front-end service, which is really fast. It responds to requests very quickly and terminates very quickly. Some of these represent
spans of longer jobs-- e.g., the entire process of
rendering an image end to end. Or just calling
the Render Service, which takes a bit
more time, too. So we have tracing. Of course, logging
is centralized. If we jump back to
the slides, I'll cover a couple of other quick,
nice features of the Cloud Platform. The other one
is the Live Debugger. So if you want to do real-time
introspection of application states, the Stackdriver Debugger
now allows you to do that. It works really well with Go,
and it works with Kubernetes. The other thing is
Google Cloud Endpoints. If you haven't heard
of Cloud Endpoints, and you're interested
in this, Dan Ciruli is giving a talk on
this, right afterwards. What Cloud Endpoints
allows you to do is to provide some additional
robustness and automation-- and centralization
around your APIs. Happens to work
really well with gRPC. And this gives you features
like a dashboard for detailed logging of RPC traffic,
for detailed monitoring of your RPCs, and latencies--
again, across the system, if necessary-- and things like
centralized access control. So if you ever get to a point where you're thinking about publishing one of these APIs to make them consumable to the rest of the world, there are all of those things that you might need to think about, like user-based quotas and ACL management. Cloud Endpoints greatly simplifies the process of building that. And again, it works
really well with gRPC. [MUSIC PLAYING]