DevOps in a Serverless World (Cloud Next '19)

Captions
[MUSIC PLAYING] ALEXIS MOUSSINE-POUCHKINE: Hello. Hello, everybody. Thank you for coming. My name is Alexis, and this is DevOps in a Serverless World. I work as a developer advocate at Google in the cloud organization. And before we get started, let me tell you about the tool that we have to take questions. Hopefully we'll have some time for questions towards the end. In the conference app, there's a section for this talk where you can ask questions. You can even see other people's questions, and you can vote them up, or I think even down if you'd like. I hope we have time for this. If we don't, I will and my colleagues will provide answers right in the tool. So with that out of the way, I do need to spend just a little bit of time defining serverless. One of the buzzwords in the title that I used is serverless. And I really believe that there are real and profound reasons why you would want to use serverless today when writing new applications or maybe even taking existing applications to the cloud. From an operational model, there are no services to provision, no cluster to set up, no cluster to administer or secure. And the important thing is that you are billed for the actual usage via optimized and transparent auto scaling. That's the operational model. From a developer's perspective, this is really focused on consuming and exposing services. It's a very service-focused environment. You're also reacting and producing events. The loose coupling between your modules, between your services, is probably done through eventing. And finally, we'd like, as developers, probably to have an environment that's as open as possible so we're not locked into any particular solution. Now, let's go back to the word serverless. Many jokes are about serverless. Yes, there are servers. This is not Google's new and latest serverless data center. But there's sometimes some confusion, and it's kind of weird to define an area by what it's not, you know, serverless. So let me suggest that we try to find a new word, a better word. How about no-ops? I mean, this term is trying to capture the fact that developers can focus on delivering great apps and focus on code, rather than on the infrastructure. So who thinks here this is a good, better approach than serverless? A few hands. OK. I think I liked it before I really realized that this is not a whole lot better. Well, first of all, here's another term that's defining something by what it is not. But actually, if you consider operations as the practice of keeping the tech that runs your business going, then no-ops is not really a good term. So implying that operations is something that might go away or is something that you don't need is, I think, a misunderstanding of ops in general, and maybe DevOps in particular. So I'm sorry, I don't have a better term. So we're stuck with serverless. What I do have for you are some of the best practices we've developed at Google over the years. But before we get into that, let's talk about the challenges, specifically in the context of serverless when it has to do with managing those serverless workloads and what makes it a bit trickier than other workloads. Well first, by definition, you don't have access to the servers. You cannot install your favorite agents on the machines to call home to some administration or to some service providing monitoring. There is no SSHing into the machine, and that's a good thing. We don't want you to log into this machine and start creating snowflakes. 
The second thing is that this is an environment that has multiple services interacting with one another, or with service buses, or through messages. And microservices are great because you can operate or scale each service independently, but tracing calls through the system can be both more critical and trickier to actually achieve. And things like cascading failures become a lot harder to diagnose. Third, cloud and serverless work best when autoscaling is implemented. And autoscaling is all about spinning up instances to take the load but also taking them down so we save on cost. And that's great, but it increases the overall system's entropy and makes monitoring somewhat more complicated, because the system keeps on changing shape. And finally, serverless compute workloads are often event-triggered. So the asynchronous nature of events makes it more complex to understand how an application got into a certain state. So we do have a number of challenges here specific to serverless when it comes to monitoring and managing your environment.

Now let me take a moment to go through the GCP serverless compute options to get everyone on the same page. First of all, we have Cloud Functions. This is functions as a service, where you deploy a short amount of code, a list of dependencies, and the event that triggers this code to run. We have App Engine, which is more suited for front-end applications with multiple services, multiple modules. And this one comes with built-in versioning and traffic splitting. And finally, if you'd rather have the flexibility and the freedom of container-based applications, where you can choose your stack and anything you'd like to put in that container, and still want the serverless benefits, well then, Cloud Run is the new product we're announcing today, and there are several sessions that will go into the details of it. But we will dive a little bit into it and certainly do a little demo of monitoring Cloud Run.

So Cloud Functions is really there to react to events. And those events can be as simple as HTTP requests, but they can also be uploads to a bucket, messages posted to a Pub/Sub topic. They could be data changing in Firestore. They could be metadata changing on an object in Cloud Storage. And there are many more coming. It has all the serverless qualities discussed a second ago, and it offers a choice of runtimes. And as you might see here, we're adding new, modern runtimes and updating existing ones. In particular, we're announcing this week the support for Java 8 for Functions.

App Engine, as I mentioned, has been around for a little while and is a solution to deploy code as well and have the cloud scale the application for you based on the load. There are no clusters to set up or manage, and it comes with traffic splitting, multiple versions of your app, and it offers a wide variety of runtimes as well. This week we're announcing Ruby 2.5 as a new supported runtime, and there's a dedicated session for that. And there should be links to those later in this presentation.

And as announced earlier this morning, we also now have this thing called Cloud Run, a fully managed compute platform that enables you to run stateless containers that are invocable via HTTP requests. Cloud Run is serverless, which means that it abstracts away all infrastructure management so you can focus on building great apps with any technology stack you'd like.
It is built with Knative, letting you choose your containers from either a fully managed Cloud Run or a Google Kubernetes Engine cluster with Cloud Run on GKE. So if you're still confused about the various options we have to offer, think of it this way. You can think of it in terms of which artifact you'd like to give us to run in this serverless fashion. If it's a function or a collection of functions, well, Cloud Function clearly is the option that you should look at. If you'd like to provide something that's more granular, that has multiple modules that have dependencies, well, consider App Engine, because what you're giving us is really an app. And if you'd like to give us a container because you want total freedom in terms of what you're using inside that container, well, Cloud Run now becomes an amazingly interesting option. Let me just mention, to add to the serverless compute picture, Cloud Pub/Sub, Cloud Tasks, and Cloud Scheduler. These offer very elegant solutions to actually use multiple products together if you'd like to orchestrate several functions or have different parts being built in your application using those different technologies. Now let's go back here to Cloud Functions, and let's get into a small demo and see, in the DevOps and serverless topic of this talk, how you can actually have a logging set of features to enable you to see what's going on in a serverless and a Cloud Function application. So here-- let me go back. Oops. My apologies. This is Cloud Console, and this is an application in which I can choose from a set of pictures. And I can say, "Hello Next Attendees." And what this will do is post a message to Pub/Sub to which four different functions written in four different languages will react and manipulate that image by adding logos and the text that I just entered. Now, if you look at the back end of this and you look at the console, everything here is written with Cloud Functions. We have the few helper functions. And we have one that processes the image in the Go language, another one in Java, a couple in Node, and one in Python. So if I look at, let's say, this Cloud Function, I can see, in the course of 24 hours, how many invocations came in-- and there weren't that many-- how much time these took on the average, and I can see the various data that's here. And I can, of course, look at something over the course of two days, for example. And what's interesting here is that I can get to the logs of that specific function. I can see all the logs aggregated here. I might have multiple instances of this running. This might be running on physically different servers. I really don't have to care. This is a unified, consolidated, fully managed view of the environment. I can filter by log level. In this case, we only have two, which are info, typically the things that I put out in the standard output, and the system events, which are-- the function completed, this is how long it took, or, the function started. And these are Cloud Function events. Now what I can do here is look at this guy, which is what we call the execution ID. And I would like to show all the matching entries, so every log for this function, the Node function, which has this execution. So typically, this is one request for Node. 
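The demo's functions aren't shown on screen, but as a rough sketch of the pattern just described-- a function that reacts to a Pub/Sub message and writes its progress to the logs-- here is a minimal Python example. The function name, topic name, and log text are made up for illustration; they are not the demo's actual code.

```python
import base64

def process_image(event, context):
    """Background Cloud Function triggered by a Pub/Sub message.

    For Pub/Sub triggers, the message payload arrives base64-encoded in
    event['data']; context carries metadata such as the event ID.
    """
    text = ""
    if "data" in event:
        text = base64.b64decode(event["data"]).decode("utf-8")

    # Anything printed here ends up in Stackdriver Logging, labeled with an
    # execution ID, which is what the console demo filters on.
    print(f"processing image with caption '{text}' (event {context.event_id})")

# A function like this could be deployed with something along the lines of:
#   gcloud functions deploy process_image --runtime python37 --trigger-topic new-caption
```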
So I could see the entire process, starting with the function starting, the request ID, the name of the file, the output file, downloading files from Cloud Storage, processing the image using ImageMagick to do the transformation, and eventually storing the new image, and finishing. Now, that's really nice. You can see here the filter that was populated for me. Maybe what I would like to do at this point is remove the restriction that I'm looking at this specific Node function and actually look at all the entries that have this specific execution ID. So I can see that for one event, I have the Node function, which we just saw executing, a Node 10 function, a Go function, a Java function, as well as a Python function. And if I take this one little step further, I can also see things such as how long they took. So if I add this "execution took" additional filter here, I can see that, indeed, I had five functions, and they took anywhere between a few hundred milliseconds and a few seconds. So that's a quick demo of what you can expect in Cloud Functions when it comes to logging. Of course, there's way more to DevOps than just logging, and we'll get into that in a second. Oh, and this demo is available on the show floor if you'd like to play with it and talk in greater detail about how it was built.

Now, I said that I did not have a better term than serverless, but what I do have are some DevOps lessons learned from Google that have recently materialized under the name SRE. SRE stands for site reliability engineering. And in fact, you can state the following-- and the developers in the room will understand this-- SRE implements DevOps. SRE is really an opinionated implementation of DevOps. There are other valid implementations of DevOps principles; Google just uses SRE. When you say DevOps, we really think SRE. In fact, the industry seems to have adopted SRE as a term, and companies such as Microsoft, Apple, Twitter, Facebook, Dropbox, Amazon, and others all have teams they call SRE teams. So let's dive into this and see what, really, there is behind this acronym.

At Google, we deliver services to billions of users. And if you look at those services, they're really managed by a fairly small set of SREs. We have multiple products that serve a billion or more users each, and all of them have a few SREs. And if you do the math, that's hundreds of thousands of users per single SRE. And if you've ever carried a pager and been in the ops world, this is very different. You cannot be pulled in hundreds of thousands of directions trying to put out fires between the different projects that you manage. Instead, the SRE model can keep your users happy and your business running without blowing your operations budget. You cannot scale the number of SREs with the number of users that you have, or else you would be burning out a small group of people, or you'd be relying on heroic individual actions, which is really not the point of SRE.

So SREs, at the end of the day, try to balance two competing needs. The first one is reliability. Is my service available? Is it returning 200 response codes, or is it returning 4xx and 5xx errors? Second, agility-- you can make something almost perfectly stable if you never allow changes. But Google is really about experimenting, speed, and innovation. So how do SREs balance both needs and keep their sanity? The answer really lies in an iceberg. It all starts with the culture.
There are a number of books that have been written by Googlers, and I encourage you to read those. Just Google them. They're available online. They're also available from O'Reilly. The goal of an SRE is really to automate him or herself out of their job. Reliability engineering needs to be built into the product, and those SREs participate in the development of the product. And in particular, they try to make things as observable as possible. One important thing to note here is that they can refuse to carry the pager if they consider the app or the service not to be well instrumented. SREs are also responsible for on-call, but also for things like blameless postmortems, which are a really important part of the SRE culture, and for things like incident management, testing, and CI/CD-- continuous integration, continuous deployment.

The other part here that's really important is the infrastructure. SREs are completely outnumbered by software engineers, so opinionated infrastructure is the only approach. Every service needs to have a name to be discoverable, to have quota, to have permissions, and it needs to come with base telemetry no matter what. So that platform, that infrastructure, brings observability. It captures the data, typically in a time series database, and it makes it available for querying and for running analysis. Now, there are a number of open-source projects we contribute to, but what we use and expose to Google Cloud Platform users is Stackdriver, which is both the technology, the platform, as well as the tools that sit on top of it. But really, the tooling is the tip of the iceberg, and you really need the bottom layers for all of this to be extremely useful.

Now, going back to the culture aspect of SRE, rather than playing whack-a-mole all day trying to put out fires, SREs look at what impacts customers and users directly. This is called a service level indicator. Let's take a bad example of an SLI: a CPU goes to 80%. That is not a good service level indicator. It could affect the customer, but in reality, we've done the obvious thing of implementing autoscaling. So at some point, the problem just goes away, so there's no real point in paging somebody in the middle of the night because that CPU has hit that threshold. An SLI, a service level indicator, is fairly close to what other people refer to as KPIs, and it really needs to impact your customer or your business. So obvious examples of that, and probably better examples, are things around availability. In other words, how many of my requests are coming back with 200 response codes? And the second one is latency. What percentage of my requests come back with responses in less than, say, 300 milliseconds? These are two obvious ones. Of course, you can come up with other things that have an impact on your customer or your business.

Now, moving on to SLOs, or service level objectives. These are promises you make to yourself. At Google, SLOs are actually-- and this might come as a surprise to some of you-- never 100%, mostly because this would mean you can't ever change anything. And again, we try to deploy new features, try new things out, and innovate fairly quickly. So the interesting thing here is that when you subtract the SLO from 100%, you get something we refer to as an error budget, and this is defined for a given period of time. Managing that budget is critical and a key part of SRE practice.
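To make the arithmetic behind SLIs, SLOs, and error budgets concrete, here is a small sketch with made-up numbers; the request data, traffic figures, and thresholds are purely illustrative.

```python
# Made-up request data: (HTTP status, latency in milliseconds).
requests = [(200, 120), (200, 310), (500, 95), (200, 180), (200, 260)]

# Availability SLI: fraction of requests answered with a 200.
availability = sum(1 for status, _ in requests if status == 200) / len(requests)

# Latency SLI: fraction of requests answered in under 300 ms.
latency_sli = sum(1 for _, ms in requests if ms < 300) / len(requests)

print(f"availability SLI: {availability:.1%}, latency SLI: {latency_sli:.1%}")

# Error budget: whatever the SLO leaves on the table for a given window.
# With a 99.9% availability SLO over 30 days, the budget is 0.1% of requests
# (or, seen as downtime, roughly 43 minutes out of the month).
slo = 0.999
monthly_requests = 10_000_000            # assumed traffic for the example
error_budget = (1 - slo) * monthly_requests
print(f"allowed bad requests this month: {error_budget:.0f}")
```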
If you burn through your budget, this means you really should stop adding new features and rolling out new experiments and go back and work on your technical debt and make your system more reliable. On the other hand, if you never use your budget, that means you're not experimenting enough. So we call this a budget for a reason. We manage this as a budget. We have a budget. We actually consume it. So the bottom line is that you really want to make your service as reliable as your customers need and spend your error budget on adding new features. So when do you actually page somebody in the middle of the night, if you have to? Well, that's when your error budget is dropping rapidly, and that's really something you should be monitoring. And finally, the SLA-- it's important, but it's really a business concept. It's just the business aspect of SLOs, and it doesn't really fit into the SRE description of things.

Now, another best practice is versioning your services, and better than that, actually having multiple versions of those services active at the same time. If you're able to deploy multiple active versions of your services, you could decide to deploy them, maybe, from your promoted builds, and you can carry the build version all the way through to deployment. And in this case, you can then start doing very interesting things, such as A/B testing, or things like canary and blue/green deployments-- deploying a new version and exposing only a small subset of your users to it, then increasing the number of users that are exposed to it, and eventually moving entirely to the new version if all goes well, or rolling back to a previous version. So the ability to have those versions opens up a lot of possibilities when deploying, so that you have both flexibility and confidence in your deployments. There are other sessions that fall under CI/CD, which I am not really talking about here, and I encourage you to attend some of these. There's one today, one tomorrow, and one on Thursday.

Now let's talk about Stackdriver, both the platform and the tools. Stackdriver really is the technology that we use and that implements that platform I talked about. This is where we collect, of course, logs and metrics of all sorts. This is where we extract errors. This is where we infer relationships. This is something that we provide dashboards with. There's a bunch of tools built into the Stackdriver offering. But there's also a set of APIs that are available for others to build upon. Google Stackdriver is, in a way, the externalization of Google's SRE best practices. It has the infrastructure, the platform, and the observability for GCP applications that I've mentioned before.

So let's quickly go through some of these tools, and we'll be demoing a number of these next. So the built-in products-- you've already seen Logging in my short demo, with centralized, fully managed logs across all products. I used it to show Cloud Functions, but this works equally well across App Engine and Cloud Run. And of course, you can look at all those logs regardless of where the application is physically running. Cloud Trace takes the execution ID you saw in that quick demo and takes it a bit farther by providing a view of calls across services. How much time does a call to Cloud Storage take? How long is ImageMagick taking to manipulate that image? How long is it before I can write the response to Cloud Storage?
This is a tool that will actually help you compare different traces, so it's a very powerful one. Error Reporting is probably one of my favorites because it's really easy to set up, because there is no setup. This offers alerting and summaries for application errors. As your application generates errors-- typically 500 errors-- and writes stack traces to the standard output, Error Reporting will capture all of this, and it will provide you a view in which you can see when was the first time and the last time this error occurred, how often it occurs, and, of course, the stack trace. So it will group those together instead of showing you too much data, and I think it's very useful and very easy to take advantage of.

Cloud Profiler and Cloud Debugger really take all of this to the next level because they bring what is traditionally development-time tooling to production workloads. So, through open source and with minimal overhead, Cloud Profiler offers CPU and memory cost analysis down to the function and method level. Cloud Debugger lets you inspect the state of a running application with live snapshots, and it even lets you add log statements without a redeploy. So how often have you redeployed just to add a log statement? Well, if you use Cloud Debugger, you probably will not have to do this again. And we'll see this in action in a short moment. So Stackdriver also exposes data with APIs. One can use this to build, for example, SLO monitoring dashboards, which are really important because this is where it all starts. Then you need to have the tools to drill in to understand what the problem is. But the monitoring of your SLOs is probably where it all starts. So speaking of exposing metrics via the Stackdriver API, I would now like to introduce Daniel Langer from Datadog to talk about the integration they've done with our newly released product, Cloud Run. Over to you, Daniel.

DANIEL LANGER: Cool. Thank you, Alexis. So like Alexis just said, my name is Daniel. I'm a product manager at Datadog. And really quick, for those of you who don't know what Datadog is, we're a monitoring platform. So we collect infrastructure metrics, traces, application-level metrics, and logs. We centralize it in one SaaS platform, let you create alerts and dashboards, and do root cause analysis. We're an operations tool to make sure that everything in your environment is running smoothly. We have an open-source agent, over 250 integrations, and an open API, so that no matter where you're running or what you're running, you can monitor it within Datadog. We have over 7,500 customers. We're processing trillions of data points a day, so we've got some scale behind us. We've been a partner with Google Cloud for many years now, and we have tons of different integrations. So whatever you're running in Google Cloud, you can monitor it in Datadog. We collect metrics via Stackdriver and other sources. And today we are excited to announce a brand new integration with Stackdriver logs, so that whatever logs you're generating that are being stored in Stackdriver, you can now view them in Datadog. What's more, with the announcement of Google Cloud Run today, you can send both metrics and logs from that service into Datadog as well. Like I mentioned, we also have an agent, this open-source program that you can deploy where you have servers running. If you're running Compute Engine or GKE, you can deploy the Datadog agent right onto those to get insights into custom metrics, traces, and logs as well. So it's really exciting that Cloud Run came out today.
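Going back for a moment to the Error Reporting feature Alexis described: the only integration an application typically needs is to get a stack trace into its logs. A minimal sketch in Python, with a made-up failing function, might look like this:

```python
import traceback

def risky_operation():
    # Stand-in for a real failure somewhere in the application.
    raise ValueError("something went wrong")

try:
    risky_operation()
except Exception:
    # On the serverless products discussed here, writing the stack trace to
    # standard error is typically enough: Error Reporting scans the logs,
    # groups identical traces, and shows first/last seen times and counts.
    traceback.print_exc()
```

There is also a google-cloud-error-reporting client library for reporting errors explicitly, but log-scanned stack traces are the zero-setup path described in the talk.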
So I want to do a quick demo of how you can monitor this within Datadog. I've set up a pretty simple Cloud Run service, and I've had a couple of revisions going. In this example, I allocated a low amount of memory and a low level of concurrency, just to get it going. Behind the scenes is a super simple Flask app. I have a couple of routes I've set up-- a "hello world" route, a slash "Daniel rocks" route, and then a route that's going to eat up the memory of this Cloud Run service. So super simple. And if we go over to my terminal, we can see that I have a script running that's hitting the "hello world" and the "Daniel" routes. It's running smoothly. It's fetching over and over again, nice and smooth. And if we pop on over to Datadog, I want to show you a couple of things. So this is a Datadog timeboard, where you can drag and drop widgets and plot metrics and information across your entire infrastructure. And in this one, we have a few things. You can see that we have 200 responses for this revision. Everything is looking nice. Our latency is smooth, and our CPU and memory allocation are pretty constant as well. So as this is running, you can see that the service is performing nicely.

But let's throw a wrench into that. Let's go back to the terminal, and let's run this eat-memory script that's going to hit that other endpoint. And as we can see, it's now spitting out that we're out of memory. This is a print statement that I actually made. If we hop on over back to my other script, we can see it's acting up as well. Hitting this new endpoint that's eating up memory is causing things to get a little funky. So if we hop back over to Datadog and we take a look at our updated dashboard, we can see this in Datadog. The number of 200s has gone down. The request latency has gone up. And in this case, we know why. I started running the script that's hitting this eat-memory endpoint. But let's say you were the owner of this service and you had no idea why this request latency was going up. You just saw this graph, or you received an alert that latency had gone up. What do you do?

Well, this is where the power of logs comes in, and it's especially important for understanding why something's happening. So what we can do is, right in Datadog, we can click on that spike, and we can go down to view related logs. And what that's going to do is take us to the Datadog Log Explorer, scoped to that exact time frame with the context in frame. So we have a three-minute window of logs right before and after where I clicked, and it's scoped to the revision that that time series plot referred to. So we have a nice scoping here. However, as you can see, we still have over 1,000 log lines for those couple of minutes, so it's still pretty broad. But we can quickly narrow that down. We can say, let's only look at the error logs. So we've scoped it down. We only have 90 now. And we could also use a feature called Patterns, which will automatically group similar logs, so that it's easy to find needles in a haystack. In this case, we only have a couple of types of logs, so it's not super exciting. But now we can quickly see that the cause of our issue is that we're hitting that memory limit. Not super exciting-- we knew this was why. But if you were a developer trying to debug it, this would be really, really vital. It will automatically parse out important attributes from logs. So you can easily see them, create facets, search for them, query for them. Whatever might be important to you, you can search for it in Datadog Logs.
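Daniel's service isn't shown in code, but a minimal reconstruction of the kind of Flask app he describes might look like the following. The route paths, the memory-hogging behavior, and the use of the PORT environment variable (which Cloud Run sets for the container) are assumptions based on the narration, not his actual code.

```python
import os
from flask import Flask

app = Flask(__name__)
hog = []  # grows on purpose so the small memory allocation eventually runs out

@app.route("/")
def hello():
    return "hello world"

@app.route("/daniel-rocks")
def daniel():
    return "Daniel rocks"

@app.route("/eat-memory")
def eat_memory():
    # Deliberately allocate and hold memory on every request so the revision,
    # configured with a low memory limit, starts failing under load.
    hog.append(bytearray(10 * 1024 * 1024))  # roughly 10 MB per call
    print(f"holding about {len(hog) * 10} MB; running out of memory soon")
    return "nom nom"

if __name__ == "__main__":
    # Cloud Run tells the container which port to listen on via PORT.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```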
I want to go back to the Datadog dashboard really quick and highlight one thing that Alexis touched on that's really important, and that is SLOs and SLIs. So this widget you see here is called the Datadog Uptime widget, and it lets you track those exact things, those SLOs, right in Datadog. So in this example, we have application latency. That's our service level indicator. And we see, over different time periods, whether we are meeting our SLOs. Sadly, in this example, it doesn't appear that we are. So alongside your service metrics and application metrics, you can understand your SLOs as well. So that was just a brief foray into Datadog and what we're doing with Cloud Run already. We are super excited by the launch today. We expect great things, and we're looking forward to learning more as you all begin to use it. I'm going to pass it back on over to Alexis now. Thank you. [APPLAUSE]

ALEXIS MOUSSINE-POUCHKINE: Thank you, Daniel, for the demo. Thank you for the plug for SLOs. I think it's really important to be monitoring those SLOs. That's really something that's been working well for us. And also, there was no setup on the client side. The people writing the applications didn't have to integrate with Datadog or with Stackdriver. Again, this is something that you get for free, and then you have, of course, a number of tools that you can use to monitor things. Now let's switch gears a little bit and talk about the Cloud Debugger. And for that, let me introduce Ludovic, a software engineer working on App Engine standard, who will walk you through some demos on App Engine and Cloud Debugger.

LUDOVIC CHAMPENOIS: Yep. Thank you, Alexis. So we'll try to do a demo of the Cloud Debugger. Where is-- OK. I see it. So to run the Cloud Debugger in an App Engine application, we will run an application called Pic-a-Daily, which can upload-- well, let's switch to the second laptop, please. So it's pic-a-daily.appspot.com. And you can upload a picture, and we scan it, and we try to find tags. So here, this picture was taken yesterday, and we can see it's a vehicle, a car, a road, lane, parking, street, whatever. OK. Or a bowl of strawberries-- we can find that it's strawberries using the Google Cloud Vision API. So this application is a collection of microservices. The front end is written in Java as an App Engine application. So here you see the App Engine console with the current service running. And it's running on Java 8. So this is the list of applications that have been deployed. So it's a Java application. You deploy it as a web app. And it would be nice if you could look at all the logs of this application even after it has been deployed. OK, so for that, we're going to use a tool which is fully integrated with App Engine. So if I go to the Tools menu, you have two views. One is Log, and one is Debug. So the Debug view will switch the Cloud Console to the Cloud Debugger, which is configured for you out of the box for all App Engine applications. So here, to debug an application, you usually need source code. And what you deploy to App Engine is a JAR file or a collection of JARs and frameworks. So to make the console aware of your source code, you can either connect your application with a Cloud Source repository, or you can upload files from your laptop. The only thing we need is source code to map to a line number in a given .java file.
So here the App Engine application has been connected to a Cloud Source repository, and I can now navigate the Java source code. So you can see we are using the cloud APIs for storage. We are using the Spark Java framework, which is really cool. You can do two things with this Cloud Debugger. You can set a breakpoint, like you would do on your laptop for local debugging. And it's very interesting, because in the cloud an App Engine application can scale to millions of JVMs. OK, so which one do you attach a debugger to? It's quite complicated. So the Cloud Debugger lets you set snapshots, which are more or less breakpoints, without stopping your application. And the first JVM reaching this line will capture the state of the memory and dump it somewhere so that you can analyze it. So it is very cool, because let's say I want to set a breakpoint-- not here-- in line-- So we have a collection of pictures here, which is an ArrayList. We populate it, so let's put a breakpoint after. So I just click on the line number-- well, a snapshot, on the line number. And now the debugger [INAUDIBLE] is waiting for the application to be triggered. So let's reload the application. Oh, somebody took a picture. And now, maybe we'll see it here, because in this debugger view, I can see the state of the memory of my JVM. So I can look at all the variables-- request, response, the pictures, which is an ArrayList. And the first one, that might have been taken right away-- or two minutes ago. OK. So I can debug my application live in production with the Cloud Debugger.

So you have seen in the previous presentation all the logging capabilities of Google Cloud. It would be nice to be able, after you've deployed your Java application, to inject a new log entry into your log viewer. OK. So Cloud Debugger can do that with what we call a logpoint. So instead of setting a breakpoint in your application, what you want to add after deployment is an entry in your log. So here in line 184, I would type an expression, which is the content I want to add every time this line is reached in my application. So here I can put some text-- "Pic equals"-- and to put a variable, you put it in curly brackets, so "{pictures.get(0)}". So I will just dump the first picture, the first entry in the array. Sorry, I made a typo, so I can edit it here. Good point. I'm very bad at typing. Thank you. You are debugging my debugger. [LAUGHTER] I apply, and then-- so apparently the logpoint has been entered. So let's reload the application here. OK, another picture. If I go back to the debugger, I set the log level to Info, so now in my log viewer in the console, I should find somewhere-- I don't know exactly where it is. But I should find-- we'll load more, or reload-- yeah, here, my new line in the log, which can now be analyzed with all the other log viewing and Stackdriver capabilities of our integration. So this is, in a few minutes, a demo of the Cloud Debugger running live in App Engine. It's enabled by default. It has zero cost. It doesn't slow down your application. And when you use it, it's very handy, because we all make mistakes, and we need debuggers, even in production.

ALEXIS MOUSSINE-POUCHKINE: And you did well on stage. Thank you. Thank you, Ludovic. LUDOVIC CHAMPENOIS: Thank you. And thank you for helping me on the typo. [APPLAUSE] ALEXIS MOUSSINE-POUCHKINE: If we go back to this machine, please. So this was a demo of Cloud Debugger, again, already set up for every application that's running in App Engine.
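Ludovic mentions that Pic-a-Daily's tags come from the Cloud Vision API. The demo's front end is Java, but as a small illustration of the same call, here is a label-detection sketch using the Python client library as it existed around the time of the talk; the bucket and file names are made up.

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Point the API at an image that is already in Cloud Storage (made-up URI).
image = vision.types.Image()
image.source.image_uri = "gs://my-bucket/strawberries.jpg"

# Ask for labels ("tags") and print them with their confidence scores.
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(f"{label.description}: {label.score:.2f}")
```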
So this was the quick demo, and thank you for participating and posting a few pictures. This is the Vision API pulling data for each of these pictures. And we do have an admin for this, just in case, so we can remove any pictures that were submitted. And this is the brief architecture that we have put in place, which uses both Functions and App Engine and orchestrates some of that through Pub/Sub and through Cloud Scheduler as well. Now this was [INAUDIBLE].

So let me close out here with a few thoughts. First of all, I think a move to serverless, or an increase of serverless in your production workloads, is an amazing opportunity to actually define or redefine your SLIs and SLOs. I think this is where you focus on the developer, and this is where you set the things that really matter to your business as the things you should be monitoring. The second step, probably, especially given the announcements that we've made around Cloud Run, is trying to understand what the best cloud serverless tool for the job is. If you'd like to give us functions, because functions as a service is really how you think you want to build your applications, that's all good and fine. Maybe you'd like to have something more granular, and maybe that's an application made of multiple app modules. And that's App Engine. Maybe it's a container, because you want full freedom in terms of what languages and frameworks you use. And that would probably be Cloud Run. So spend a little bit of time, probably in the sessions describing and getting into the details of Cloud Run, to understand which serverless tool is best for the job, all of which come supported with logging and monitoring. And then there are features such as the Cloud Debugger in App Engine and other features in other products. And profit.

So with that, let me give you a list of related sessions. There are some interesting ones happening today. There are more tomorrow, and more on Thursday. Some of them have to do with trying to find and choose those best runtimes. Others go more into CI/CD, as I mentioned earlier. Some are focused on specific languages, because a number of developers identify themselves by the language of their choice. So I would really encourage you to look at these-- among others, that session, SVR303, on running Cloud Run on an existing GKE cluster, which is a very interesting topic, I believe. So with that, let me pull up our tool and see if we have any questions. And if you do have questions, we have microphones here in the aisles, and you're free to step up to one of them. And I believe they're now open for questions. Thank you for coming. And this session will be made available online real soon.

AUDIENCE: Hey. Hi. Thanks for the session. This was very helpful. My name is Vin. I have a quick question on the Anthos announcement today. How does this integrate with some of the migration that we could do with the hybrid cloud? ALEXIS MOUSSINE-POUCHKINE: That's a very good question. We're not really-- so Anthos doesn't really qualify as serverless, because serverless is really about things like scale to zero and billing as you go, as opposed to paying for the cluster while it's up. So if it's Kubernetes per se, it's not really serverless. If you add Knative, and you add things such as Cloud Run on GKE on top of it, it becomes serverless. So that's why I didn't cover this, and we didn't cover this in the session. But Anthos comes with a really advanced set of features to monitor.
And all the things I've said about SRE hold true. And all the tools that are specific to GKE probably go just as far as what we've seen here. There's a lot that comes with it. And there's a console dedicated to DevOps for Anthos that's made available. And you should probably go down to the show floor or check out some sessions. There's definitely a lot to look at there.

AUDIENCE: Hi. My question is also about integration. How do these things work with Apigee to expose those microservices using an API manager? ALEXIS MOUSSINE-POUCHKINE: So you're talking-- Apigee, is that-- AUDIENCE: Apigee. ALEXIS MOUSSINE-POUCHKINE: Right. So Apigee for a while has been somewhat separate from Google Cloud. It's all about enabling businesses to expose their services as APIs. I am not very familiar with Apigee, to be honest. They have a set of tools. There is quite a bit of integration going on with Apigee, where everything we've discussed here, everything we've talked about, will apply equally to Apigee in the fairly near future. But I can't comment much more on the existing tools that are available.

Let me-- I did say that I wanted to check questions. Yes, there are a few, actually. So Jeff asks, how do you recommend securing Cloud Functions that need access to GCP resources? So that's typically where we have work going on with Apigee. I think Apigee is where the answer lies for this, because they have some fairly advanced gateway features that can play that role of securing Cloud Functions. We're also working on making functions protected as a base feature, so you don't have to pull in things like Apigee. Let me try to get to a last one. How does Cloud Run relate to the App Engine flexible environment? It scales to zero is the short answer. If you want more, we have dedicated sessions, both for the fully managed and the Cloud Run on GKE solutions. And with that, I'm sorry, I'm told that we're out of time. But we're sticking around. Please come to us if you have questions. And thank you for coming. [MUSIC PLAYING]
Info
Channel: Google Cloud Tech
Views: 5,157
Rating: 4.8823528 out of 5
Keywords: type: Conference Talk (Full production); pr_pr: Google Cloud Next; purpose: Educate
Id: G_OeE92R29U
Length: 49min 52sec (2992 seconds)
Published: Wed Apr 10 2019