[MUSIC PLAYING] ALEXIS
MOUSSINE-POUCHKINE: Hello. Hello, everybody. Thank you for coming. My name is Alexis, and this is
DevOps in a Serverless World. I work as a developer
advocate at Google in the cloud organization. And before we get started,
let me tell you about the tool that we have to take questions. Hopefully we'll have some time
for questions towards the end. In the conference
app, there's a section for this talk where
you can ask questions. You can even see other
people's questions, and you can vote them up, or I
think even down if you'd like. I hope we have time for this. If we don't, I and my colleagues will provide answers right in the tool. So with that out
of the way, I do need to spend just a little bit
of time defining serverless. One of the buzzwords in
the title that I used is serverless. And I really believe that there
are real and profound reasons why you would want to
use serverless today when writing new applications or
maybe even taking existing applications to the cloud. From an operational model,
there are no servers to provision, no cluster to set up, no cluster to administer or secure. And the important
thing is that you are billed for the actual usage
via optimized and transparent auto scaling. That's the operational model. From a developer's
perspective, this is really focused on consuming
and exposing services. It's a very service-focused
environment. You're also reacting
and producing events. The loose coupling between your
modules, between your services, is probably done
through eventing. And finally, we'd
like, as developers, probably to have an environment
that's as open as possible so we're not locked into
any particular solution. Now, let's go back to
the word serverless. There are many jokes about serverless. Yes, there are servers. This is not Google's new and
latest serverless data center. But there's sometimes
some confusion, and it's kind of weird to
define an area by what it's not, you know, serverless. So let me suggest that we try to
find a new word, a better word. How about no-ops? I mean, this term is
trying to capture the fact that developers can focus
on delivering great apps and focus on code, rather
than on the infrastructure. So who here thinks this is a better term than serverless? A few hands. OK. I think I liked it before
I really realized that this is not a whole lot better. Well, first of all,
here's another term that's defining something
by what it is not. But actually, if you
consider operations as the practice of
keeping the tech that runs your business
going, then no-ops is not really a good term. So implying that operations is
something that might go away or is something
that you don't need is, I think, a misunderstanding
of ops in general, and maybe DevOps in particular. So I'm sorry, I don't
have a better term. So we're stuck with serverless. What I do have for you are
some of the best practices we've developed at
Google over the years. But before we get
into that, let's talk about the
challenges, specifically in the context of
serverless when it has to do with managing
those serverless workloads and what makes it a bit
trickier than other workloads. Well first, by definition,
you don't have access to the servers. You cannot install your favorite
agents on the machines to call home to some administration console or to some monitoring service. There is no SSHing
into the machine, and that's a good thing. We don't want you to
log into this machine and start creating snowflakes. The second thing is that
this is an environment that has multiple services
interacting with one another, or with service buses,
or through messages. And microservices
are great because you can operate or scale each
service independently, but tracing calls
through the system can be both more critical and
trickier to actually achieve. And things like
cascading failures become a lot harder to diagnose. Third, cloud and serverless
work really best when auto scaling is implemented. And auto scaling is all
about spinning up instances to take the load but also taking
them down so we save on cost. And that's great, but it
increases the overall system's entropy and makes monitoring
somewhat more complicated, because the system
keeps on changing shape. And finally, serverless
compute workloads are often event triggered. So the asynchronous
nature of events makes it more complex to
understand how an application got into a certain state. So we do have a
number of challenges here specific to serverless
when it comes to monitoring, managing your environment. Now let me take a moment to
go through the GCP serverless compute options to get
everyone on the same page. First of all, we
have Cloud Functions. This is functions
as a service where you deploy a short amount of
code, a list of dependencies, and the event that
triggers this code to run. We have App Engine,
which is more suited for front
end applications with multiple services,
multiple modules. And this one comes with built
in versioning and traffic splitting. And finally, if
you'd rather have the flexibility and the freedom
of container-based applications where you can choose
your stack, anything you'd like to put
in that container and still want the
serverless benefits, well then, Cloud Run is the new
product we're announcing today that has several
sessions that will go into the details of this. But we will dive a
little bit into it and certainly do a little
demo of monitoring Cloud Run. So Cloud Functions is really
there to react to events. And those events can be as simple as HTTP requests, but they can also be files uploaded to a bucket, messages posted to
a pub/sub topic. They could be data
changing in Firestore. They could be metadata changing
in an object in cloud storage. And there are many more coming. It has all the
serverless qualities discussed a second ago, and it
offers a choice of runtimes. And as you might see here, we're adding modern new runtimes and updating existing ones. In particular here,
we're announcing this week the support
for Java 8 for Functions.
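A minimal sketch of what such a Pub/Sub-triggered function might look like in Python -- the function name, topic binding, and processing step are made up for illustration; only code like this plus a requirements.txt listing its dependencies gets deployed:

```python
# main.py -- hypothetical Pub/Sub-triggered Cloud Function (Python background function).
import base64

def annotate_image(event, context):
    """Runs each time a message is published to the configured Pub/Sub topic."""
    # Pub/Sub delivers the message payload base64-encoded in event["data"].
    text = ""
    if "data" in event:
        text = base64.b64decode(event["data"]).decode("utf-8")
    print(f"Processing caption {text!r} (event_id={context.event_id})")
    # ...download the picture, add the logo and caption, store the result...
```

App Engine, as I mentioned, has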
been around for a little while and is a solution to
deploy code as well and have the cloud scale
the application for you based on the load. There are no clusters
to set up or manage, and it comes with traffic
splitting, multiple versions of your app, and it offers
a wide variety of run times as well. This week we're announcing Ruby
2.5 as a new supported runtime, and there's a dedicated
session for that. And there should
be links to those later in this presentation. And as announced
earlier this morning, we also now have
this thing called Cloud Run, a fully
managed compute platform that enables you to run
stateless containers that are invocable via HTTP requests. Cloud Run is serverless, which
means that it abstracts away all infrastructure
management so you can focus on building great
apps with any technology stack you'd like. It is built with
Knative, letting you choose your containers from
either a fully managed Cloud Run or a Google
Kubernetes Engine cluster with Cloud Run on GKE.
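A minimal sketch of that contract, using nothing but the Python standard library -- the container simply has to answer HTTP requests on the port passed in the PORT environment variable; any language or framework that does the same would work just as well:

```python
# server.py -- a minimal sketch of a stateless, HTTP-invocable container.
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello from a stateless container!\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    port = int(os.environ.get("PORT", "8080"))  # injected by the platform
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

So if you're still confused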
about the various options we have to offer,
think of it this way. You can think of it in
terms of which artifact you'd like to give us to run
in this serverless fashion. If it's a function or a
collection of functions, well, Cloud Functions clearly is the option that you should look at. If you'd like to provide
something that's more granular, that has multiple modules
that have dependencies, well, consider App Engine,
because what you're giving us is really an app. And if you'd like to
give us a container because you want total freedom
in terms of what you're using inside that
container, well, Cloud Run now becomes an amazingly
interesting option. Let me just mention, to add to
the serverless compute picture, Cloud Pub/Sub, Cloud
Tasks, and Cloud Scheduler. These offer very
elegant solutions to actually use multiple
products together if you'd like to orchestrate
several functions or have different parts of your application built using those different technologies. Now let's go back here
to Cloud Functions, and let's get into a
small demo and see, in the DevOps and serverless
topic of this talk, how you can actually have
a logging set of features to enable you to
see what's going on in a serverless and a
Cloud Function application. So here-- let me go back. Oops. My apologies. This is Cloud
Console, and this is an application in which I can
choose from a set of pictures. And I can say, "Hello
Next Attendees." And what this will do is
post a message to Pub/Sub to which four different
functions written in four different languages
will react and manipulate that image by adding logos and
the text that I just entered.
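A rough sketch of that first step, assuming the google-cloud-pubsub client library -- the project, topic, and attribute names here are invented for illustration, not the demo's actual code:

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-demo-project", "image-requests")

# The caption goes in the payload; the chosen picture travels as an attribute.
future = publisher.publish(
    topic_path,
    data="Hello Next Attendees".encode("utf-8"),
    picture="gs://my-demo-bucket/picture.jpg",
)
print(f"Published message {future.result()}")  # blocks until the publish succeeds
```

Now, if you look at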
the back end of this and you look at the
console, everything here is written with Cloud Functions. We have the few
helper functions. And we have one that processes
the image in the Go language, another one in Java, a couple
in Node, and one in Python. So if I look at, let's say,
this Cloud Function, I can see, in the course of 24 hours,
how many invocations came in-- and there
weren't that many-- how much time these
took on the average, and I can see the
various data that's here. And I can, of course,
look at something over the course of
two days, for example. And what's interesting
here is that I can get to the logs of
that specific function. I can see all the
logs aggregated here. I might have multiple
instances of this running. This might be running on
physically different servers. I really don't have to care. This is a unified,
consolidated, fully managed view of the environment. I can filter by log level. In this case, we only have
two, which are info, typically the things that I put out
in the standard output, and the system events, which
are-- the function completed, this is how long it took,
or, the function started. And these are Cloud
Function events. Now what I can do here is
look at this guy, which is what we call the execution ID. And I would like to show all the
matching entries, so every log for this function, the
Node function, which has this execution. So typically, this is
one request for Node. So I could see
the entire process starting with the function
starting, the request ID, the name of the
file, the output file, downloading files
from Cloud Storage, processing the image
using ImageMagick to do the transformation,
and eventually storing the new image, and finishing. Now, that's really nice. You can see here the filter
that was populated for me. Maybe what I would like
to do at this point is remove the filter that restricts me to this particular Node function and actually look
at all the ones that have this
specific execution ID. So I can see that
for one event, I have the Node function we just saw executing, a Node 10 function, a Go function,
a Java function, as well as a Python function. And if I take this one
little step further, I can also see things such
as how long they took. So if I add this "execution
took" additional filter here, I could see that, indeed,
I had five functions, and they took anywhere between
a few hundred milliseconds and a few seconds.
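That same drill-down can also be scripted against the Logging API. A hedged sketch with the google-cloud-logging Python client -- the project and execution ID are placeholders, and the exact label layout is an assumption:

```python
from google.cloud import logging

client = logging.Client(project="my-demo-project")

# Pull every entry that belongs to one logical execution, across all five
# functions, keeping only the "execution took" summary lines.
log_filter = (
    'resource.type="cloud_function" '
    'labels.execution_id="4ac68nlbqcxt" '
    'textPayload:"execution took"'
)

for entry in client.list_entries(filter_=log_filter, order_by=logging.DESCENDING):
    print(entry.timestamp, entry.payload)
```

So that's a quick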
demo of what you can expect in Cloud Functions
when it comes to logging. Of course, there's
way more to logging and to DevOps than
just logging, and we'll get into that in a second. Oh, and this demo is
available on the show floor if you'd like to play
with it, understand it, and talk in greater detail about how it was built. Now, I said that I did not have a better term than serverless, but what I do have are
some DevOps lessons learned from Google that have recently
materialized under the name SRE. So SRE stands for site
reliability engineering. And in fact, you can
state the following-- and the developers in the
room will understand this-- SRE implements DevOps. SRE is really an opinionated
implementation of DevOps. There are other
valid implementations of DevOps principles,
Google just uses SRE. When you say DevOps,
we really think SRE. In fact, the industry seems
to have adopted SRE as a term, and companies such as Microsoft,
Apple, Twitter, Facebook, Dropbox, Amazon, and
others all have SRE teams that go by that name. So let's dive into this
and see what, really, there is behind this acronym. At Google, we deliver
services to billions of users. And if you look
at those services, they're really managed by
a fairly small set of SREs. We have multiple
products that serve a billion or more users
each, and all of them have a few SREs. And if you do the math,
that's hundreds of thousands of users per single SRE. And if you've ever
carried a pager and been in the ops world,
this is very different. You cannot be pulled into
hundreds or thousands of
directions trying to put out fires between the different
projects that you manage. Instead, the SRE model
can keep your users happy and your business running without
blowing your operations budget. You cannot scale the number of
SREs with the number of users that you have, or else you would
be burning out a small group of people, or you'd be relying
on some heroic individual actions, which is really
not the point of SREs. So SREs really, at
the end of the day, try to balance two
competing needs. The first one is reliability. Is my service available? Is it returning 200 response codes, or is it returning 3xx, 4xx, or 5xx errors? Second, agility-- you can make
something almost perfectly stable if you never
allow changes. But Google is really about
experimenting, speed, and innovation. So how do SREs balance both
needs and keep their sanity? So the answer lies
really in an iceberg. It all starts with the culture. There are a number of books that
have been written by Googlers, and I encourage
you to read those. Just Google them. They're available online. They're also available
from O'Reilly. The goal of an SRE is really
to automate him or herself out of their job. The engineering
of reliability needs to be built into the
product, and those SREs participate in the
development of the product. And in particular,
they try to make things as observable as possible. One important thing
to note here is that they can refuse to carry
the pager if they consider the app or the service to not be well instrumented. SREs are responsible for
on-calls but also for things like blameless post-mortems,
which is really an important part of the SRE culture, but
also things like incident management, testing, CI/CD-- continuous integration,
continuous deployment. The other part here
that's really important is the infrastructure. SREs are completely outnumbered
by software engineers, so opinionated infrastructure
is the only approach. Every service needs to have
a name to be discoverable, to have quota,
to have permissions, and it needs to come with
base telemetry no matter what. So that platform, that
infrastructure here, brings observability. It captures the data, typically
in a time series database, and it makes it
available for querying and for running analysis. Now, there are a number
of open-source projects we contribute to,
but what we use and we expose to Google Cloud
Platform users is Stackdriver, which is both the
technology, the platform, as well as the tools
that sit on top of it. But really, the tooling
is the tip of the iceberg, and you really need the
bottom layers for all of this to be extremely useful. Now, going back to the
culture aspect of SREs, rather than playing
whack-a-mole all day trying to put out fires, SREs
look at what impacts customers and users directly. So this is called a
service level indicator. Let's take a bad example of SLI. A CPU goes to 80%. That is not a good
service level indicator. It could affect the customer,
but in reality, we've done the obvious thing of
implementing auto scaling. So at some point,
the problem just goes away, so there's no
real point in paging somebody in the middle of the
night because that CPU has hit that threshold. An SLI, a service
level indicator, is fairly close to what other
people refer to as KPIs, and it really needs to impact
your customer or your business. So obvious examples of that,
and probably better examples, are things around availability. In other words, how
many of my requests are coming back with
200 response codes? And the second one is latency. What percentage of
my requests come back with responses in less
than, say, 300 milliseconds? These are two obvious ones.
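A small illustrative sketch (not Google's implementation) of computing those two SLIs from a window of request records:

```python
from dataclasses import dataclass

@dataclass
class Request:
    status_code: int
    latency_ms: float

def availability_sli(requests):
    """Fraction of requests answered with a 2xx response."""
    good = sum(1 for r in requests if 200 <= r.status_code < 300)
    return good / len(requests)

def latency_sli(requests, threshold_ms=300):
    """Fraction of requests answered faster than the threshold."""
    fast = sum(1 for r in requests if r.latency_ms < threshold_ms)
    return fast / len(requests)

window = [Request(200, 120), Request(200, 480), Request(500, 350), Request(200, 250)]
print(availability_sli(window), latency_sli(window))  # 0.75 0.5
```

Of course, you can come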
up with other things that have impact on your
customer or your business. Now, moving on to SLOs, or
service level objectives. These are promises
you make to yourself. At Google SLOs are actually--
and this might come as a surprise to some of you-- they're never 100%, mostly
because this would mean you can't ever change anything. And again, we try to
deploy new features, try new things out, and
innovate fairly quickly. So the interesting thing here is
that when you subtract the SLO from 100, you get something we
refer to as an error budget, and this is defined for
a given period of time.
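As a quick worked example of that subtraction, a 99.9% availability SLO over a 30-day window leaves a concrete, spendable budget:

```python
slo = 0.999            # the promise you make to yourself
period_days = 30

error_budget_fraction = 1 - slo                     # 0.001
budget_minutes = period_days * 24 * 60 * error_budget_fraction

print(f"{budget_minutes:.1f} minutes of allowed unavailability per {period_days} days")
# roughly 43.2 minutes to spend on experiments and rollouts -- or, if it's
# already gone, a signal to stop launching and work on reliability instead.
```

This is critical and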
a key part of SRE practices, managing that budget. If you burn through
your budget, this means you really should
stop adding new features and rolling out new experiments
and go back and work on your technical debt and
make your system more reliable. On the other hand, if you
never use your budget, that means you're not
experimenting enough. So we call this a
budget for a reason. We manage this as a budget. We have a budget. We actually consume that. So the bottom line
is that you really want to make your service as
reliable as your customers need and spend your error budget
on adding new features. So when do you
actually page somebody in the middle of the
night if you have to? Well, that's when your error
budget is dropping rapidly, and that's really something
you should be monitoring. And finally, SLA,
it's important, but it's really a
business concept. And it's just the
business aspect of SLOs. It doesn't really fit into
the SRE description of things. Now, another best practice
is versioning your services, and better than
that, actually having multiple versions of those services active at the same time. If you're able to deploy
multiple active versions of your services, you could
decide to deploy them, maybe, from your promoted
builds, and you can carry the build version all
the way through to deployment. And in this case,
you can then start doing very interesting
things, such as A/B testing. Maybe things like
Canary and Blue/Green deployments, deploying
a new version and exposing only a small
subset of your users to it. And then increasing the number of users that are exposed to it, and eventually moving to the new version entirely if all goes well, or rolling back to a previous version. So the ability to
have those versions opens up a lot of
possibilities when deploying so that you have
both flexibility and confidence in your deployments. So there are other
sessions that fall under CI/CD, which I am not
really talking about here. And I encourage you to
attend some of these. There's one today, one
tomorrow, and one on Thursday. Now let's talk about
Stackdriver, both the platform and the tools. Stackdriver really is the
technology that we use and that implements that
platform that I talked about. This is where we
collect, of course, logs, metrics of all sorts. This is where we extract errors. This is where we
infer relationships. This is something that we
provide dashboards with. There's a bunch of tools that
are built into the Stackdriver offering. But there's also
a set of APIs that are available for
others to build upon. Google Stackdriver is the
externalization of Google's SRE best practices in a way. It has the infrastructure, the
platform, and the observability for GCP applications that
I've mentioned before. So let's quickly go through
some of these tools, and we'll be demoing a
number of these next. So the built-in
products-- you've already seen the Logging
in my short demo, with centralized, fully managed
logs across all products. I used it to show
Cloud Functions, but this works equally well
across App Engine and Cloud Run. And of course, you look at
all those logs regardless of the physical location
of the application running. Cloud Trace takes the execution ID you saw in that quick demo a bit farther
by providing a view of calls across services. How much time does a call
to cloud storage take? How long is ImageMagick taking
to manipulate that image? How long is it before I
can write the response to cloud storage? This is a tool
that will actually help you compare
different traces, so it's a very powerful one. Error Reporting is probably
one of my favorites because it's really
easy to set up, because there is no setup. This offers alerting
and summaries for application errors. As your application generates
errors, typically 500 errors, it writes stack traces
to the standard output. It will capture all of this,
and it will provide you a view in which you
can see when was the first time and the last
time this error occurred, how often does it occur, and,
of course, the stack trace. So it will group
those together instead of showing you too
much data, and I think it's very useful and
very easy to take advantage of.
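The "no setup" part comes down to the fact that a stack trace written to the logs is enough. A hedged Python sketch, with a made-up handler just to force an error:

```python
import logging
import traceback

def process(req):
    # Stand-in for real work; fails on purpose for the example.
    raise ValueError(f"could not process {req!r}")

def handle_request(req):
    try:
        return process(req)
    except Exception:
        # Writing the full stack trace to the logs is enough for it to be
        # picked up, grouped with similar errors, and counted over time.
        logging.error("Request failed:\n%s", traceback.format_exc())
        return "error", 500

print(handle_request({"file": "picture.jpg"}))
```

Cloud Profiler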
and Cloud Debugger really take all of
this to the next level because they bring
what is traditionally development time tooling
to production workloads. So through open-source
and minimal overhead, Cloud Profiler offers CPU
and memory cost analysis down to function and method levels. Cloud Debugger lets you
inspect the state of a running application with live
snapshots, and it even lets you add log statements
without a redeploy. So how often have you redeployed
just to add a log statement? Well, if you use Cloud
Debugger, you probably will not have to do this again. And we'll see this in
action in a short moment. So Stackdriver also
exposes data with APIs. So one can use this to build,
for example, SLO monitoring dashboards, which are really
important because this is where it all starts. Then you need to have the
tools to drill in to understand what the problem is. But the monitoring of
your SLO is probably where it all starts.
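As a rough sketch of what that can look like, pulling a request-count time series out of the Monitoring API with the google-cloud-monitoring client might go something like this -- the project and metric are examples, and a real SLO dashboard would aggregate these points and compare them to the target:

```python
import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/my-demo-project"

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
)

# Request counts over the last hour, one series per monitored resource.
series = client.list_time_series(
    request={
        "name": project_name,
        "filter": 'metric.type = "run.googleapis.com/request_count"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for ts in series:
    print(ts.metric.labels, len(ts.points))
```

So speaking of exposing metrics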
via the Stackdriver API, I would like now to introduce
Daniel Langer from Datadog to talk about the
integration they've done with our newly released product, Cloud Run. Over to you, Daniel. DANIEL LANGER: Cool. Thank you, Alexis. So like Alexis just said, my name is Daniel. I'm a product
said, my name is Daniel. I'm a I'm a product
manager at Datadog. And really quick
for those of you who don't know what Datadog is,
we're a monitoring platform. So we collect
infrastructure metrics, traces, application-level metrics, and logs. We centralize it all in one SaaS platform and let you create alerts, build dashboards, and do root cause analysis. We're an operations
tool to make sure that everything in
your environment is running smoothly. We have an open-source agent
and over 250 integrations and an open API, so
that no matter where you're running or what you're running, you can monitor it within Datadog. We have over 7,500 customers. We're processing trillions
of data points a day, so we've got some
scale behind us. We've been a partner with
Google Cloud for many years now, and we have tons of
different integrations. So whatever you're
running in Google Cloud, you can monitor it in Datadog. So we collect metrics via
Stackdriver and other sources. And today we are
excited to announce a brand new integration
with Stackdriver logs, so that whatever logs
you're generating that are being stored
in Stackdriver, you can now view
them in Datadog. What's more, with the
announcement of Google Cloud Run today, you can
send both metrics and logs from that service
into Datadog as well. Like I mentioned, we
also have an agent, this open-source program
that you can deploy where you have servers running. If you're running
Compute Engine or GKE, you can deploy the Datadog
agent right onto those to get insights into
custom metrics, traces, and logs as well. So really exciting Cloud
Run came out today. So I want to do a quick demo
of how you can monitor this within Datadog. So I've set up a pretty
simple Cloud Run service, and I've had a couple
of revisions going. In this example, I
allocated a low amount of memory and a low concurrency setting, just to get it going. Behind the scenes is a
super simple Flask app. I have a couple
routes I've set up-- a "hello world" route, a slash
"Daniel rocks" route, and then a route that's going to eat up
the memory of this Cloud Run service. So super simple. And if we go over
to my terminal, we can see that I
have a script running that's hitting the "hello
world" and the "Daniel" route. It's running smoothly. It's fetching over and over
again, nice and smooth. And if we pop on
over to Datadog, I want to show you
a couple of things. So this is a Datadog timeboard, where you can drag and drop
widgets and plot metrics and information across
your entire infrastructure. And in this one, we
have a few things. You can see that we have 200
responses for this revision. Everything is looking nice. Our latency is smooth, and
our CPU and memory allocation are pretty constant as well. So as this is running, you
can see that the service is performing nicely. But let's throw a
wrench into that. So let's go back to
terminal, and let's run this eat memory
script that's going to hit that other endpoint. And as we can see,
it's now spitting out that we're out of memory. This is a print statement
that I actually made. If we hop on over back
to my other script, we can see it's
acting up as well. Hitting this new endpoint
that's eating up memory is causing things to
get a little funky. So if we hop on
back over to Datadog and we take a look at
our updated dashboard, we can see this in Datadog. The number of 200 has gone down. The request latency has gone up. And in this case, we know why. I started running
the script that's hitting this eat memory thing. But let's say you were
the owner for this service and you had no idea why this
request latency was going up. You just saw this graph,
or you received an alert that latency had gone up. What do you do? Well, this is where the
power of logs comes in; they're especially important for understanding why something's happening. So what we can do
is, right in Datadog, we can click on that
spike, and we can go down to view related logs. And what that's going
to do is take us to the Datadog Log Explorer
scoped to that exact time frame with the context in frame. So we can see we
have a three minute window of logs right
before and after I clicked, and it's scoped to the revision that the time series plot referred to. So we have a nice scoping here. However, as you can see, we
still have over 1,000 log lines for those couple minutes,
so it's still pretty broad. But we can quickly
narrow that down. We can say let's only
look at the error logs. So we've scoped it down. We only have 90 now. And we could also use a
feature called Patterns, which will automatically
group similar logs, so that it's easy to find
needles in a haystack. In this case, we only have
a couple of types of logs, so it's not super exciting. But now we can quickly see
that the cause of our issue is that we're hitting that memory limit. Not super exciting, we knew this was why. But if you were a developer
trying to debug it, this would be
really, really vital. It will automatically parse out
important attributes from logs. So you can easily see them,
create facets, search for them, query for them. Whatever might be
important to you, you can search for
them in Datadog Logs. I want to quickly go back to the
Datadog dashboard and highlight one
thing that Alexis touched on that's
really important, and that is SLOs and SLIs. So this widget you see here
is called the Datadog Uptime widget, and it lets you track
those exact things, those SLOs, right in Datadog. So in this example, we have
an application latency. That's our service
level indicator. And we see over
different time periods if we are meeting our SLOs. Sadly, in this example, it
doesn't appear that we are. So alongside your service
metrics, application metrics, you can understand
your SLOs as well. So that was just a brief
foray into Datadog and what we're doing
with Cloud Run already. We are super excited
by the launch today. We expect great
things, and we're looking forward to learning
more as you all begin to use it. I'm going to pass it back
on over to Alexis now. Thank you. [APPLAUSE] ALEXIS MOUSSINE-POUCHKINE:
Thank you, Daniel, for the demo. Thank you for the plug for SLOs. I think it's really important
to be monitoring those SLOs. That's really something that's
been working well for us. And also, there was no
setup on the client side. The people writing
the applications didn't have to integrate with
Datadog or with Stackdriver. Again, this is something
that you get for free, and then you have, of
course, a number of tools that you can use
to monitor things. Now let's switch gears
a little bit and talk about the Cloud Debugger. And for that, let me introduce
Ludovic, a software engineer working on an App
Engine standard, who will walk you through some
demos on App Engine and Cloud Debugger. LUDOVIC CHAMPENOIS: Yep. Thank you, Alexis. So we'll try to do a demo
of the Cloud Debugger. Where is-- OK. I see it. So to run the Cloud Debugger
in an App Engine application, we will run an application
called Pic-a-Daily, which can upload-- well, let's switch to the
second laptop, please. So it's pic-a-daily.appspot.com. And you can upload a
picture, and we scan it, and we try to find tags. So here, this picture
was taken yesterday, and we can see it's a vehicle,
a car, a road, lane, parking, street, whatever. OK. Or a bowl of strawberries,
we can find that it's strawberries using the Google Cloud Vision API. So this application is a
collection of microservices. The front end is written in Java
as an App Engine application. So here you see the
App Engine console with the current
service running. And it's running on Java 8. So this is the
list of applications that have been deployed. So it's a Java application. You deploy it as a web app. And it would be
nice if you could look at all the logs
of this application even after this application
has been deployed. OK, so for that,
we're going to use a tool which is fully
integrated with App Engine. So if I go to the Tools
menu, you have two views. One is Log, and one is Debug. So the Debug view will
switch the Cloud Console to the Cloud Debugger,
which is configured for you out of the box for all
the App Engine applications. So here, to debug
an application, you need source code usually. And what you deploy
to App Engine is JAR file or a collection
of JARs and frameworks. So to make the console
aware of your source code, you can either connect
your application with a cloud-sourced resource,
a cloud-sourced repository, or you can upload a
file from your laptop. The only thing we need is source
code to map to a line number in a dodge .java file. So here the App
Engine application has been connected to
a cloud repository, and I can now navigate
to Java source code. So you can see we are using
cloud APIs for storage. We are using a Spark
Java framework, which is really cool. You can do two things
with this Cloud Debugger. You can set a breakpoint, like
you would do on your laptop, for local debugging. And it's very interesting
because in the cloud an App Engine application can
scale to millions of JVMs. OK, so which one do you
attach a debugger to? It's quite complicated. So the Cloud
Debugger will listen for a snapshot, which is more or less a breakpoint, without stopping
your application. And the first JVM
reaching this line will capture the
state of the memory and dump it somewhere so
that you can analyze it. So it is very cool,
because let's say I want to set a breakpoint-- not here-- in line-- So here we have a
collection of pictures here, which is an array list. We populate it, so let's
put a breakpoint after. So I just click on
the line number-- well, snapshot and line number. And now the debugger [INAUDIBLE]
is waiting for the application to be triggered. So let's reload the application. Oh, somebody took a picture. And now, maybe
we'll see it here, because in this
debugger view, I can see the state of the
memory of my JVM. So I can look at all the
variables-- request, response, the pictures, which
is an array list. And the first one,
that might have been taken right away with
a call-- or two minutes ago. OK. So I can debug live my
application in production with the Cloud Debugger. So you have seen in the previous
presentation all the logging capabilities of Google Cloud. It would be nice if, after you
deploy your Java application, you could inject a new log entry point in your log viewer. OK. So Cloud Debugger can do that
with what we call a log point. So instead of
setting a breakpoint in your application, what you
want to add after deployment is an entry in your log. So here in line 184, I
would type an expression, which is the content I want
to add every time this line is reached in my application. So here I can put
some text or pictures. OK. "Pic equals--" and
to put a variable, you put it in brackets, so "{pictures.get(0)}". So I will just dump
the first picture, the first entry in the array. Sorry, I put a typo,
so I can edit it here. Good point. I'm very bad at typing. Thank you. You are debugging my debugger. [LAUGHTER] I apply, and then-- so apparently the log
point has been entered. So let's reload the
application here. OK, another picture. If I go back to the debugger,
I set the log level to Info, so now in my log
viewer in the console, I should find somewhere-- I don't know
exactly where it is. But I should find-- we'll load more, or reload-- yeah here, my new
line in the log, which can now be analyzed
with all the other log viewers and Stackdriver
capabilities of our integration. So this is, in a few minutes,
a demo of the Cloud Debugger running live in App Engine. So it's enabled by
default. It has zero cost. It doesn't slow down
your application. And when you use
it, it's very handy because we all make mistakes,
and we need debuggers, even in production. ALEXIS MOUSSINE-POUCHKINE:
And you did well on stage. Thank you. Thank you, Ludovic. LUDOVIC CHAMPENOIS: Thank you. And thank you for
helping me on the typo. [APPLAUSE] ALEXIS MOUSSINE-POUCHKINE: If we
go back to this machine please. So this was a demo of Cloud
Debugger, again, set up for every application that's
running already in App Engine. So this was the quick
demo, and thank you for participating and
posting a few pictures. This is the Vision
API pulling data for each of these pictures. And we do have an admin
for this just in case, so we can remove any
pictures that were submitted. And this is the
brief architecture that we have put in place that
uses both Functions and App Engine and orchestrates
some of that through Pub/Sub and through
Cloud Scheduler as well. Now this was [INAUDIBLE]. So let me close out here
with a few thoughts. So first of all, I think
a move to serverless, or an increase of serverless
in your production workloads, is an amazing opportunity to
actually define or redefine your SLIs and SLOs. I think this is where you
focus on the developer, and this is where
you set, really, things that matter to
your business as things that you should be monitoring. The second step probably,
especially given the announcement that we've
made around Cloud Run, is trying to understand what
is the best cloud serverless tool for the job. If you'd like to
give us functions because functions
as a service is really how you think you want to
build your applications, that's all good and fine. Maybe you'd like to have
something more granular, and maybe that's an application
made of multiple app modules. And that's App Engine. Maybe it's a
container because you want full freedom in
terms of what languages and frameworks you use. And that would
probably be Cloud Run. So spend a little
bit of time, probably in the sessions
describing and getting into the details of Cloud
Run, to understand which serverless tool is best
for the job, all of which come supported with
logging and monitoring. And then there are features in
App Engine, such as the Cloud Debugger, and other features in other products. And profit. So with that, let me give you
a list of related sessions. There are some interesting
ones happening today. There are more tomorrow, and
there's more on Thursday. Some of them have to do with
trying to find and choose those best runtimes. Others dive more into
CI/CD, as I mentioned earlier. Some are focused on
specific languages because a number of developers
identify themselves by the language of their choice. So I would really encourage you to look at these. There's also that session, SVR303, about running Cloud Run on an existing GKE cluster, which is a very interesting topic, I believe. So with that, let
me pull up our tool and see if we have
any questions. And if you do have questions,
we have microphones here in the aisles, and you're free
to step up to one of them. And I believe they're
now open for questions. Thank you for coming. And this session will be made
available online real soon. AUDIENCE: Hey. Hi. Thanks for the session. This was very helpful. My name is Vin. I have a quick question on
the Anthos announcement today. How does this integrate
with some of the migration that we could do with
the hybrid cloud? ALEXIS MOUSSINE-POUCHKINE:
That's a very good question. We're not really-- so
Anthos doesn't really qualify as serverless,
because serverless probably means scaling to zero. It has a number of
things around billing as you go, as opposed to paying
for the cluster when it's up. So if it's Kubernetes per se,
it's not really serverless. If you add Knative, and you add
things such as Cloud Run on GKE on top of it, it
becomes serverless. So that's why I
didn't cover this, and we didn't cover
this in the session. But Anthos comes with a really
advanced set of features to monitor. And all the things I've
said about SREs hold true. And all the tools
that they have that are specific to GKE go
probably just as far as what we've seen here. There's a lot that
comes with it. And there's a console
which is really dedicated for DevOps for Anthos
that's made available. And you should probably go down to the show floor or check out some sessions. There's definitely a
lot to look at there. AUDIENCE: Hi. My question is also
about integration. How do these things
work with Apigee to expose those microservices
using API manager? ALEXIS MOUSSINE-POUCHKINE:
So you're talking-- Apigee, is that-- AUDIENCE: Apigee. ALEXIS
MOUSSINE-POUCHKINE: Right. So Apigee for a while has
been somewhat separate from Google Cloud. It's all about
enabling businesses to expose their
services as APIs. I am not very familiar with
Apigee, to be honest. They have a set of tools. There is quite a
bit of integration that's going on
with Apigee, where everything we've
discovered here, everything we've talked about will
apply equally to Apigee in the fairly near future. But I can't comment
very much more on the existing tools
that are available. Let me-- I did say that I
wanted to check questions. Yes, there are a few actually. So Jeff asks, how
do you recommend securing Cloud Functions that
need access to GCP resources? So that's typically where we
have work going on with Apigee. I think Apigee is where
the answer lies for this, because they have some fairly
advanced gateway features that can play that role of
securing Cloud Functions. We're also working on making
functions protected as a base feature so you don't have to
pull in things like Apigee. Let me try to get to a last one. How do cloud containers relate
to App Engine Flex Environment? It scales to zero
is the short answer. If you want more, we
have dedicated sessions, both for the fully managed and
the Cloud Run on GKE solutions. And with that, I'm sorry, I'm
told that we're out of time. But we're sticking around. Please come to us if
you have questions. And thank you for coming. [MUSIC PLAYING]