Introduction to Cloud SQL and Kubernetes (Cloud Next '18)

Captions
[MUSIC PLAYING] BRETT HESTERBERG: Today we're going to talk first a little bit about the big picture, and then we'll get into building our application. What we're going to attempt to do today is build a web app that is fault tolerant and scalable. And we're going to do it using Kubernetes Engine and Cloud SQL as our backend database. What you see in this diagram is that we're going to opt to deploy Kubernetes Engine and Cloud SQL across multiple failure domains, where these are represented by Google Cloud zones within a region; a zone is essentially a failure domain within a region. So we're going to deploy these across zones. We're going to see how things scale. And then we'll inject some faults and simulate disaster striking, so to speak. Before we get there, we'll do a little introduction on Kubernetes Engine itself, as well as Cloud SQL. I think, by show of hands at least, there are a number of folks in the audience who haven't had experience with Kubernetes Engine and potentially not with Cloud SQL.

ALLAN NAIM: Yeah, thanks, Brett. So back in the early 2000s, Google was off indexing the whole world wide web, and its goal was to build a service at planetary scale, where anyone around the world could get a response back in milliseconds, right? But when you go back 12, 14 years, Google was like any company you see today. Applications were built running on top of infrastructure, and there was this dependency on the underlying infrastructure. And we knew we needed to scale, and really scale quickly. So we went off and tried to solve two very discrete problems. Number one, how do you take a workload, deploy it on a machine predictably, and then deploy a variety of workloads on the same machine so you get better efficiency? And number two, how do you take this machine and attach it to a pool of compute, so that as developers, you just write code and don't really care where that application ends up running? That's what we went off to solve back then. And that was really the advent of containers. Google has been using containers for many years-- over a decade. About 14 years ago, we open sourced a lot of the work around cgroups, which really helped with running containers in isolation and so forth. So as you look at where Google is today, everything at Google actually runs inside a container, all our services, from Ads to Search. As a matter of fact, for Google Cloud, our virtual machines are actually running inside a container. And we spin up over four billion containers per week. A lot of the work that we did around containers really led to the creation of an internal system Google uses called Borg. All of our developers that write applications submit their jobs to Borg, and Borg schedules all these workloads across Google's fleet. A few years back, with the popularity of containers, Google decided to take the various best practices associated with running, operating, and managing containers internally with Borg and create this open source project called Kubernetes. And really, that was the genesis behind Kubernetes, which we announced around four years ago and open sourced. The goal of Kubernetes is really to manage and operate your container workloads, right? It was inspired by the best practices that Google had around running, managing, and operating containers internally. And the design premise behind Kubernetes was that we wanted it to run anywhere, right?
And that was really the goal of open sourcing it, and then working with the community to provide support. Whether you're running it on Google Cloud, on any other cloud, on bare metal, or on any virtualization technology, the whole goal with Kubernetes is to make it easy, open, flexible, and able to run anywhere you want. It was written in Go. And as I mentioned, from day one we open sourced it. Kubernetes has now become the de facto standard for running and managing containers in the market. Really, the success comes down to the community behind it and the API it brought forward to the world. When you're getting started with Kubernetes-- just a show of hands, how many people attended the keynote this morning? Great. So you've probably heard the announcements around GKE and bringing the GKE experience on-prem and things of that nature. Before this announcement, getting Kubernetes up and running outside of Google Cloud really involved using a mix and match of different tooling. It was an inconsistent experience depending on where you were running it. There were different tools, and we went off and tried to figure out, how do we make this consistent? At Google, we have this notion called separation of concerns, right? And although DevOps is a very important function, oftentimes when you give a developer a double-edged sword, where on one hand you're asking them to write code, and on the other hand you're asking them to worry about debugging, troubleshooting, and deployments, it's not a very optimal thing to do. At Google we have different functions. We have our developers, then we have a specialization called the application ops person, and then we have another specialization called the cluster ops person. In the case of Kubernetes, a cluster ops person would come along and create the environment. That's your day zero, right? Getting this environment up and running-- provisioning nodes, configuring networking, setting up all the various agents and containerized services in the cluster-- is not a simple process, and the cluster administrator is focused on getting all of that up and running. From there, you have an API that you can use to deploy your containerized applications in a consistent fashion. And once that's available, the application ops person can come along and deploy their containerized applications in such a way that they're resilient, scalable, and so forth. So the role of the developer ends where the role of the application ops person starts, and the role of the application ops person stops where the role of the cluster admin starts. Now, you might be wondering, how do I do all this in the cloud? Well, at Google, we introduced, three years ago, a managed Kubernetes service called Google Kubernetes Engine. It's been GA now for three years. It's very much managed Kubernetes. It's an API call to create a cluster. We're actually going to show you a demo of this. Our demo is going to go through everything that we're talking about, and we're going to do it live. But it's really an API call to create a cluster. You specify how many nodes you need, the region you want your nodes in, and which zones you want them deployed in. You can specify a whole bunch of other parameters that we'll go through. But that's it. After a few minutes, you have a cluster, and you have an API that you can use to actually run and manage your containerized applications.
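For reference, that "API call to create a cluster" can be sketched with the gcloud CLI. This is a minimal illustration, not the exact command from the demo; the cluster name, zone, and node count are placeholders:

```sh
# Create a small zonal GKE cluster; GKE provisions the control plane
# and the worker VMs for you.
gcloud container clusters create demo-cluster \
    --zone us-central1-b \
    --num-nodes 3

# After a few minutes, the new cluster shows up here.
gcloud container clusters list
```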
On top of that, Kubernetes Engine provides you a level of automation around node auto-repair and upgrades. For the worker nodes where you're running your applications: every three months there's a new version of Kubernetes, and we give you the option of specifying whether you want your nodes auto-upgraded and whether you want your master auto-upgraded. If there's an issue with one of your nodes, we'll go in and repair the node. There's integrated auto-scaling with the underlying virtual machines, so you can configure everything from your container auto-scaling all the way down to your node auto-scaling. And again, we'll show you how all that works in our demo.

BRETT HESTERBERG: With that, let me ask the audience, how many of you have experience with Google Cloud SQL? OK, about a third or so. I'll give a little introduction, and of course you'll see this as we begin to demonstrate our application deployment. Let me start with where Cloud SQL fits in Google's overall database portfolio. The database portfolio in Google Cloud is large and ever-growing. Rightmost on the slide, you see a number of database offerings from our partners. Google Cloud has a large partner ecosystem, and we list here a handful of database partners who have offerings on Google Cloud. This is not an exhaustive list. Leftmost on the slide, you see our in-memory database. We pretty recently launched a managed Redis service, so if you have caching needs, take a look at Cloud Memorystore, which recently launched Redis into beta. Just right of center is our relational portfolio, comprised of Cloud Spanner and Cloud SQL. If you haven't taken a look at Cloud Spanner, there are a number of talks going on this week at the conference, and I encourage you to get to know that product. Cloud SQL, in the relational portfolio, has one core commitment, and that is compatibility. We offer as close to unmodified MySQL and Postgres as we can, so that if your application works with MySQL or Postgres, it just works with Cloud SQL. In today's demonstration, we're going to be using Cloud SQL for Postgres. But the compatibility story is essentially Cloud SQL's core promise. In addition to compatibility, Cloud SQL aims to reduce the mundane database administration tasks commonly associated with running MySQL and Postgres. When we think about the Cloud SQL stack and what it can do, I think Tim Kelton from Descartes Labs does a good job of summing this up in terms of how he sees Cloud SQL for his team. He notes here that his team is using Cloud SQL at Descartes Labs because it allows them to focus their time on other things, things that are more valuable to them and to their customers. When I think of the Cloud SQL stack, I picture this diagram, which in my mind is what it takes to run a database yourself. Leftmost on the slide is a column that shows essentially the hardware: racking and stacking servers, thinking about power and cooling for those servers. As you move right, we get to the operating system, where we're worried about installing, maintaining, and updating that operating system. Continuing to move right, we finally get to the database itself. We're installing that database. We're keeping it patched. We're keeping it secure. We're setting up things like regular backups for data protection. And finally, rightmost on the slide, we get into some more advanced concerns.
Here we're talking about high availability: setting up replication for scale-out, and making decisions about when a database is unhealthy-- should we fail over automatically, or fail over manually? And across the breadth of this stack, we have to plumb everything through to monitoring systems so we can keep tabs on our environment. The goal of Cloud SQL is to turn this technology stack into an API call, so that if you're a DevOps engineer or a developer, you can set up, with a single API call, a very simple development database or a much more advanced production database that includes automatic failover for high availability and replication for read scale-out. Again, the goal here is to simplify this picture, such that it turns into an API call or a few UI clicks, as you'll see in our demo. What isn't covered on this slide, and what's not shown in this diagram, is what backs Cloud SQL. Jason from RealMassive points this out. He notes that Cloud SQL, and managed services generally in Google Cloud, give him and his team peace of mind because they are backed by dedicated 24-by-7 SRE teams. For Cloud SQL, this is the team writing the automation and the monitoring to make sure that our fleet of databases is healthy. And when needed, they're the folks who get on the keyboard to bring an unhealthy database back to life. Hopefully that gives you a feel for what Cloud SQL is. Let's now talk a little less and start building. I'll remind everybody that what we're building today is a fairly simple web application. It's about 200 lines of Python code. The language doesn't matter so much. We're taking a web application that we'll test and then finally deploy in a way that uses Kubernetes Engine and Cloud SQL to make sure it's both fault tolerant and scalable. Again, the picture here illustrates that we're going to span our application across two zones so that we're fairly fault tolerant. And if all goes well, we're going to throw some faults at it. Before we get into this, I want to note two additional key components we're using. We've talked a lot about Kubernetes Engine and Cloud SQL. We're also going to introduce the notion of the Cloud SQL proxy. What you see in this diagram is that the proxy is a client-side installable that makes your Cloud SQL database look local. So if you were to install the proxy on your laptop and fire it up at your favorite coffee shop, your Cloud SQL database would appear to be local on that laptop. Behind the scenes, the Cloud SQL proxy is setting up a secure tunnel to your database automatically, and the application you're developing can behave as if it's talking to a local database. You'll see that in the demo. This is really important for Kubernetes Engine, because we're going to be starting up pods and bringing them down rapidly, and we want to make sure we don't have to deal with things like whitelisting IP addresses on a firewall. We let the proxy do all of that authentication and connection for us. The other key concept here is Google Cloud Shell. How many of you have used Google Cloud Shell? That's great. So about half the room. That's actually more than we thought. We joke that Cloud Shell is sometimes one of our best kept secrets in GCP. Cloud Shell is a VM that you get for free, provisioned to you with your Google Cloud account. We're going to be using it today as our development environment because it includes a lot of useful tools for a variety of languages, including Python. It also includes a Docker environment that we're going to use as we start to build and test our application.
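Before diving in, the proxy pattern just described can be made concrete with a minimal sketch. The instance connection name and key file are placeholders; the flags shown are the standard interface of the v1 Cloud SQL proxy binary:

```sh
# Download the Cloud SQL proxy and make it executable.
wget https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64 -O cloud_sql_proxy
chmod +x cloud_sql_proxy

# Tunnel the instance to localhost:5432. "my-project:us-central1:my-db"
# is a placeholder instance connection name; key.json is a service
# account key with Cloud SQL Client permissions.
./cloud_sql_proxy \
    -instances=my-project:us-central1:my-db=tcp:5432 \
    -credential_file=key.json
```

With the proxy running, anything that can talk to a local Postgres on port 5432 reaches the Cloud SQL instance, with the proxy handling encryption and authentication.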
OK, with that, Allan, let's get into the actual build itself. We're going to start with a Cloud SQL instance. And so you see my project here. And we can adjust the font as needed so folks can see. I'm going to start by creating a Cloud SQL instance. And let me just zoom in a little bit. When you're creating a Cloud SQL instance-- I'm going to choose Postgres in this case-- we answer a few fairly straightforward questions. We'll say this is our instance name. I knew I'd typo right away. All right, this is our instance name. I'm going to use a password that I highly do not recommend. You can all see it here. This is way too short and very prone to dictionary attack. Please don't use this. We're going to deploy Cloud SQL and Kubernetes Engine in US Central. Obviously, you can deploy in any region you'd like. And I'm going to pick Zone B, as in Bravo, for our instance. Now let's ask Cloud SQL to take care of some administration for us. As a starting point, I'm going to size Cloud SQL here. If you haven't used Cloud SQL, it starts at very small, inexpensive instances, less than $10 a month, and right now it scales way up to 64-core machines. That number is ever increasing; you'll see higher numbers in the future. But for today, we'll give Cloud SQL, say, eight compute cores. And we'll max out RAM, which is always a good idea for a database. We expect some traffic on our web application. The nice thing about Cloud SQL is you can change these at any time. We can scale up later or scale back down if we need to. One thing that's a bit less intuitive when I'm thinking about starting up or scaling Cloud SQL is scaling my storage performance. In Google Cloud and in Cloud SQL, performance scales by adding storage capacity. And you see that Cloud SQL includes a performance calculator here, which right now is showing me that we have just a sliver of the performance we could get for our instance. I can increase that performance by increasing capacity. And I'll give it a little more because, again, we think our web app is going to be a popular one. I mentioned that Cloud SQL aims to help manage mundane database administration tasks, and a good example is the automatic storage increase. This is a simple checkbox in the Create flow. What it does is tell Cloud SQL, hey, keep an eye on my disk. Make sure I don't run out of disk space, and before I do, increase it a little bit, and continue to do so, so that I never run out. This, I think, is a great example of what we're talking about when we say managing the mundane. I'll leave this checked so that, just in case, we don't run out of disk today during the demo. As we look a bit more, we'll ask Cloud SQL to automate our backups. We'll have it do so pretty late at night to keep it out of the way of our production runs during the day. And as a key point, I'm going to tell Cloud SQL that we want to be highly available. When I do this, Cloud SQL gets the message that it has some extra work to do to make sure that this instance is a bit more fault tolerant. OK, with those options selected, I'm not going to bother authorizing any networks. We're going to let Cloud SQL handle our connection security for us. And I won't bother setting up any more configuration. We'll just go ahead and create our instance. A Cloud SQL instance takes about three minutes to create.
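A rough CLI equivalent of the UI choices just made might look like the following. This is an unverified sketch: the instance name, sizes, and backup window are illustrative, and the flag spellings follow the gcloud sql surface rather than anything shown in the talk:

```sh
# 8 vCPUs with RAM maxed for that core count, extra SSD capacity for
# IOPS, automatic storage increase, nightly backups, and regional HA.
gcloud sql instances create memegen-db \
    --database-version=POSTGRES_9_6 \
    --region=us-central1 \
    --cpu=8 --memory=52GB \
    --storage-size=250GB \
    --storage-auto-increase \
    --backup-start-time=23:00 \
    --availability-type=REGIONAL
```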
So fortunately for all of you, I have, very Martha Stewart style, one fresh from the oven ready for us to use. And with that-- thank you, by the way, for the few laughs. With that, I'm going to fire up Cloud Shell, as I mentioned, potentially one of our best kept secrets. And let's get to developing our application. As Cloud Shell comes up, remember, this is the VM that's provisioned for you in your project. It comes with a number of tools built in. I've done just a little bit of setup work. What I did was install the Cloud SQL proxy and set up a service account so that I can authenticate and connect to any Cloud SQL instance in my project. So let me go ahead and start up the proxy, and then I'll show you what a connection looks like. OK, so we'll start up the proxy. This looks good. And then I'm going to go ahead and connect using Cloud Shell's built-in Postgres client. What you'll notice here as I type this command is that I'm going to tell my Cloud Shell VM that the database is running locally. You see that I'm using localhost as the address. When I do that, the Cloud SQL proxy intercepts the connection and asks for my password, which I'll put in-- my very secure password here. And we've connected. Let me make sure that we have an empty database. We just created this database, so let's make sure. Let's ask Postgres if it's empty. Whoops. OK, Postgres says I've got nothing in my tables, no tables at all, which is a good thing. We've got a brand new database, and we're ready to use it for our application. What you've seen here is that I've got good connectivity now from my Cloud Shell VM to my database. So with that, let's get our app started. One thing to note-- you're more than welcome to take pictures as we go along. Everything Allan and I do today is captured in a public code lab, so you can go through this yourself, when you're ready, on your own PC. What I will show you is the application that we've pre-built for you as part of the code lab. As I mentioned, it's a Python application, a pretty small web app. And I'll download it now to our Cloud Shell. This is hosted right now on GitHub, and again, it's available for you as part of the code lab. We're writing this application in Python, but the language itself doesn't matter all that much. For those who are familiar with Python, I'm going to go ahead and set up a virtual environment so we can get our development started. We'll then activate the environment so we can download some necessary modules, dependencies for our application. So I've got my environment activated now. Let me go get some dependencies. The pre-built app comes with a requirements list that you'll see as one of the files, and I can pip install a bunch of dependency modules from it. For those who are familiar with Python and web frameworks, you'll notice that most of these dependencies relate to Flask. We're running a Flask web app today. So we've downloaded these. Let's go ahead and test our application. I'm going to launch this directly within the Cloud Shell VM. A really useful feature here is that Cloud Shell builds in a local web host environment, so I can test my web application right here from the Cloud Shell VM. Let's bring this up. And what you see is our app is at least deployed as a test application. We're creating a Meme Gen application, which is very important at Google. And I will go ahead and create the first meme. Let me put in our meme here. We'll see if our application can happily talk to our database. And there we go. We've got our first meme of the day.
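Stepping back, the test-setup commands from this segment can be sketched as follows. The app entry point and file names are assumptions in the spirit of the codelab, not a transcript of what was typed:

```sh
# With the proxy from earlier listening on localhost:5432, connect with
# the built-in Postgres client (it prompts for the password).
psql -h 127.0.0.1 -U postgres
# Inside psql, \dt lists tables; a fresh database reports no relations.

# Set up a Python virtual environment and install the Flask dependencies.
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

# Run the app locally (the entry point name is illustrative) and view it
# with Cloud Shell's built-in web preview.
python main.py
```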
Let me navigate back. We'll just see that the app's working. We should have just our one meme in Recent. OK, so we know now our application is working, at least for test purposes. Let's navigate back to our Cloud Shell, where I will go ahead and kill our web host real quick. So now, if I look at our other tab and do a refresh, we should see that, yep, our web app is down. So we don't have our test running anymore, which is what we want. What we're going to do now is start to use Docker. We're going to containerize our application now that we've tested it and know that it's running. What you see here is that our Cloud Shell VM has some Docker tools built in. And I'm creating now a container that we will go ahead and deploy using Docker within our Cloud Shell VM, as a way to test again and get ready for our deployment to GKE. As this is building, Allan, can you talk a little bit about the general steps to take an application like we have now and get it to GKE?

ALLAN NAIM: Yeah, so the first step is containerizing the application. Once it's containerized, we're going to push the image into Google Container Registry. Google Container Registry is the private repo for you to store your container images. Keep in mind that you can use Google Container Registry not only with Google Kubernetes Engine or Google Cloud, but from outside Google Cloud as well. So once this image is stored in Google Container Registry, we'll be in a position to specify and build our Kubernetes manifest to pull the images we want deployed in our pods from Container Registry.

BRETT HESTERBERG: That's great. So it looks like our container has built. I'm going to go ahead and run it in Docker in our Cloud Shell VM. We've got that running. What this should mean is that as I navigate back to our local web app and refresh, the now-containerized application should work. There's our Meme Gen, back again. We stored our first meme in the Cloud SQL database, so it's been persisted. And we have a live containerized web app. Let me navigate back to our Cloud Shell, where we can stop our Docker image and get a little bit of cleanup done before we move to Kubernetes. So we've tested our application. We containerized it. Now, as Allan said, we can get to the fun part: let's get this deployed to Kubernetes Engine. Let me go ahead and stop our Docker runtime here. There we go. And we'll just double check that, indeed, yep, we've got this turned off. OK, so let me kill off this tab. I'm going to also stop our Cloud SQL proxy. So we've gotten cleaned up here, Allan. Let me now start to get ready to deploy our application.

ALLAN NAIM: All right, now comes the fun part. So we're actually going to go in and create a Google Kubernetes Engine cluster.

BRETT HESTERBERG: And as part of this, Allan, we're starting to set up a bit of credentials, and we're going to get ready to deploy our image to Container Registry first.

ALLAN NAIM: Yes. So what's happening here is we're tagging the image that we're going to push into Container Registry. Then we're going to issue a command to push this image into GCR with those associated tags. And the step before that was really giving the Docker runtime the credentials to be able to communicate with Google Container Registry.

BRETT HESTERBERG: So what you see here is that we have a Container Registry that's empty. This is a new project for us. We don't have anything that we've uploaded to Container Registry yet. But let's change that right now, and we'll start the push to Container Registry.
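The Docker test-and-push workflow being run here, sketched end to end. The image name, tag, port, and PROJECT_ID are placeholders:

```sh
# Build the image from the app's Dockerfile, then test it locally.
docker build -t gmemegen .
docker run -d --name gmemegen -p 8080:8080 gmemegen
# (the published port is an assumption; to reach a proxy running on the
# host, an option like --net=host may be needed)

# ...refresh the web preview to test, then clean up.
docker stop gmemegen && docker rm gmemegen

# Let the Docker runtime authenticate to Container Registry using your
# gcloud credentials, then tag and push the image.
gcloud auth configure-docker
docker tag gmemegen gcr.io/PROJECT_ID/gmemegen:v1
docker push gcr.io/PROJECT_ID/gmemegen:v1
```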
Allan, as this starts to work, can you tell us, in general, what's the benefit of using Container Registry?

ALLAN NAIM: Right. So Container Registry is really a secure way for you to store your images, and it can integrate with any CI/CD tooling that's available today on the market. It gives you a consistent experience for managing the lifecycle of your container images. And as I mentioned earlier, this is not just a Google Kubernetes Engine service. You can use it from pretty much anywhere outside of Google Kubernetes Engine, including outside of Google Cloud. We've done a bunch of work around giving you the ability to sign images, to really have attestation steps around the lifecycle of images coming out of Google Container Registry. So it's really about giving you a consistent, verifiable, and secure way of running and managing your images as you look at pushing them into different environments.

BRETT HESTERBERG: And I think, yeah, the important point you mentioned there is that while Container Registry is very useful when you're using Google Cloud products, it can also be useful when you're managing things well outside of Google Cloud.

ALLAN NAIM: Correct, yes.

BRETT HESTERBERG: OK, so we're close now to getting this pushed to our Container Registry. Let's see, as this command finishes up, if indeed we have it uploaded. We've got our Meme Gen app. And let's just take a quick look. Looks like we uploaded it just now. OK, great. The next step, then, Allan, is to actually create our Kubernetes cluster. Before I do that, we'll just double check-- I don't think we've created anything in this project before. Looks like we've got a pretty brand new project here. Let me go ahead and issue this command. And as this is running, let's talk through a few of these parameters. I'll bring up the UI. One thing to note here is that while we are pretty CLI-heavy in this demonstration, all of these things can be done via the UI, or even the API, depending on your preference. I'll go ahead and bring up the UI so we can talk through a few of the parameters.

ALLAN NAIM: Great. So what we're doing here is we're actually creating a Kubernetes cluster that spans multiple availability zones. Think of an availability zone as a fault domain, right? The whole goal here is that our nodes will span availability zones, so when we deploy our applications, the replicas of the application will be spread across the region. This gives us higher availability and resiliency for our application. And let's just walk through some of the configurations associated with the cluster. When you go and create a cluster, by default it's zonal, so your control plane and the worker nodes will all be in the same zone. But you have an option to change it to regional. And when you change it to regional, you now have the ability for your control plane, which is your master, to be replicated across three zones within the region. Then you specify how many nodes you want for this particular cluster, and those nodes get deployed across the various zones within that particular region. I mentioned earlier that Google Kubernetes Engine has been a GA product for three years now, and we feel we have the most sophisticated, most mature managed Kubernetes offering out there on the market, across any cloud.
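The regional, multi-zone cluster being created in this segment can be sketched as follows. The name and node counts are placeholders; with --region, --num-nodes applies per zone, so two nodes across three zones yields the six VMs seen later in the demo:

```sh
# Regional cluster: the control plane is replicated across three zones,
# nodes are spread across those zones, and the cluster can add or
# remove nodes itself under load.
gcloud container clusters create memegen-cluster \
    --region us-central1 \
    --num-nodes 2 \
    --enable-autoscaling --min-nodes 2 --max-nodes 6 \
    --enable-autorepair
```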
Typically, when a new version of Kubernetes is released-- there's one every three months-- Google Kubernetes Engine has support for that upstream version within a couple of weeks. So part of the process is that you specify how many CPUs you want for each machine type. By default, the node operating system is Container-Optimized OS. This is a Chromium OS-based operating system that's very secure, which Google created and maintains. If you want more flexibility from a host operating system standpoint, we also provide support for Ubuntu. Now, if you scroll down a bit more, you provide information about how many nodes you want in your particular cluster. And that's really the day zero experience. Then there's a whole bunch of parameters associated with the day one experience. You can specify, for example, I want Google to automatically upgrade my nodes when there's a new version, or I want to be notified. If something happens to a particular node and it fails, I'd like Google Kubernetes Engine to repair and fix that node. Same thing with patching. We have an SRE team that manages the service, and if anything happens to your cluster, that SRE team will fix the problem. There's built-in integration with Stackdriver logging and monitoring. Scroll down, click on More. You can label your cluster. You can assign additional zones that you want your nodes to be distributed across. There's auto-scaling not only at the pod level, but at the node level. So you can integrate with the underlying managed instance group framework from Google Compute Engine to initiate a scale-out from your pod level all the way down to your cluster level; we would add additional nodes to your cluster if there's resource contention. You can configure persistent disk storage for your cluster and map the persistent disk resources depending on where your pods are running, which zone they're running in. And then one of the really cool features of Google Kubernetes Engine is the integration with Cloud networking. Because Google Cloud networking is a global service, you can assign a VPC to your cluster, and that cluster now has the ability to take advantage of a global network backbone. So you can actually have clusters running in different regions, using the same VPC, that can talk to each other. Pods can talk to each other. That's a very cool feature that's only available in Google Cloud.

BRETT HESTERBERG: Those are great points, Allan. We picked out a few parameters specifically for this demonstration, but I think walking through the UI here shows some of the flexibility you get when you're creating clusters. Again, if you're new to Kubernetes, have a pass at our code lab. We think we've picked a few good options, but obviously there are more to choose from. It looks like, Allan, our cluster has been created. I see it now in the UI.

ALLAN NAIM: Yep. Can you click on Compute Engine?

BRETT HESTERBERG: Yeah. Let's take a look here, because you mentioned that underpinning all of this is a set of VMs.

ALLAN NAIM: Yes. So you see these VMs that have now been created? There are six in total, two in each zone. This gives you the resiliency for your application: depending on how many replicas you specify, they'll be scheduled across the availability zones of the cluster.
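For reference, the VM view Allan just described is also visible from the CLI:

```sh
# The cluster's six worker VMs, two per zone across the region, show up
# as ordinary Compute Engine instances.
gcloud compute instances list
```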
BRETT HESTERBERG: OK, so let's continue on. We promised some scalability, et cetera, so let's get that set up.

ALLAN NAIM: So the next thing we're going to do is get the credentials for that cluster and store them in our kubeconfig file. It's a local file. Kubernetes has a command line interface that comes with the project called kube control, or kubectl. So once you have the credentials, you can use kube control to manage your cluster and operate against any of the Kubernetes APIs that are available. And that's what we're going to do. From here forward, we're going to use kube control to manage, operate, and deploy applications on our cluster.

BRETT HESTERBERG: That's a great point. I'll get started with kube control by setting up a couple of secrets. I'm going to set up two secrets here. The first makes sure our application can use the Cloud SQL proxy to connect to our Cloud SQL instance. The second secret, you'll notice, is my username and database password, so that our application can log in. So we've got our two secrets created. What I did before we started was modify the YAML file that we're going to use for deployment. So let me just get the new YAML file put in place here. And then we can take a quick look at the YAML file, Allan, just to show folks what's inside of this one.

ALLAN NAIM: Yeah, so this is a deployment file. This is a Kubernetes artifact, right? It's a specification that you define that describes your application. And what we're doing here is deploying a pod that contains the Meme Gen container, as well as the Cloud SQL proxy as a sidecar container. A pod is really about packaging together containers that have a strong dependency on each other, right? Any time you have multiple containers that need to live and die together, or perhaps need to share local IPC or share storage, you put those in a pod, and it becomes a unit of scale. When you create a pod in Kubernetes, every pod is assigned an IP address. So we have this notion of IP-per-pod. These are internal addresses that are pulled out of the CIDR block ranges assigned to every node within a Kubernetes cluster. By default, every node gets a /24 CIDR range, and when a pod comes up on that particular node, it takes an IP from that range.

BRETT HESTERBERG: That's great. So I just used our YAML file to start the creation. And what we'll look for is that our pods get created as we'd expect, as you mentioned, with two containers.

ALLAN NAIM: Correct. So think of a deployment as a Kubernetes controller, right? With the deployment, Kubernetes brings this notion of a declarative model. Within a deployment, you specify your desired state-- what you'd like the world to look like for your application, such as how many instances-- and you can specify certain labeling associated with your application. A controller is constantly monitoring the actual state of your application. So if something bad happens-- a node goes down, a pod goes down-- the controller will look at the actual state, compare it with the desired state you specified in the deployment file, and always ensure that your actual state equals your desired state. This is the beauty of being able to deploy an application that's highly available and resilient, without having to worry about manually keeping the application at the right number of instances. If something goes wrong, Kubernetes will always kick in, with health checks, to ensure that your actual state matches the state your application needs.
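A condensed sketch of the credentials step, the two secrets, and a deployment manifest with the Cloud SQL proxy sidecar as just described. The image path, secret names, instance connection name, and the environment variable names the app reads are all assumptions, not the demo's actual files; the sidecar pattern itself follows Google's documented Cloud SQL proxy setup:

```sh
# Fetch cluster credentials into kubeconfig so kubectl can operate.
gcloud container clusters get-credentials memegen-cluster --region us-central1

# Secret 1: the service account key the proxy uses to reach Cloud SQL.
kubectl create secret generic cloudsql-instance-credentials \
    --from-file=credentials.json=key.json

# Secret 2: the database credentials the app logs in with.
kubectl create secret generic cloudsql-db-credentials \
    --from-literal=username=postgres --from-literal=password='...'

# Deployment: one pod template holding the app container plus the Cloud
# SQL proxy sidecar, so the app simply connects to 127.0.0.1:5432.
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: memegen
spec:
  replicas: 1
  selector:
    matchLabels:
      app: memegen
  template:
    metadata:
      labels:
        app: memegen
    spec:
      containers:
      - name: memegen
        image: gcr.io/PROJECT_ID/gmemegen:v1
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 200m        # CPU request gives the autoscaler a baseline
        env:                 # assumed variable names
        - name: DB_USER
          valueFrom:
            secretKeyRef:
              name: cloudsql-db-credentials
              key: username
        - name: DB_PASS
          valueFrom:
            secretKeyRef:
              name: cloudsql-db-credentials
              key: password
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.11
        command: ["/cloud_sql_proxy",
                  "-instances=my-project:us-central1:memegen-db=tcp:5432",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
        volumeMounts:
        - name: cloudsql-instance-credentials
          mountPath: /secrets/cloudsql
          readOnly: true
      volumes:
      - name: cloudsql-instance-credentials
        secret:
          secretName: cloudsql-instance-credentials
EOF
```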
BRETT HESTERBERG: It's really a notion of infrastructure by configuration file in this case.

ALLAN NAIM: Yep. So what we're doing now-- we talked about the notion that a pod gets an internal IP address. But let's say you want that pod to be accessed from the outside world. What Brett just did is expose the deployment as a load-balanced service. What this actually does is create a service object; it creates a GCE load balancer, assigns it an external IP address, and creates the routes from that external IP address back to the internal pods running in your cluster. In our case here, we only had one pod running. But think of a case where you have 10 pods running across multiple machines. Once you hit that external IP, requests get round-robined to the appropriate pods running across your cluster. So as you can see here, an external IP address has been assigned, and Brett can now hit it, and it should show the application. And for those of you that have phones and want to hit that address, please go in and meme.

BRETT HESTERBERG: Yeah, I think this is really the moment of truth. We're going to find out right now if we deployed this app publicly. It looks like on our end that this app is good. We're going to double check our database here. Looks like our lone meme is still there. So as Allan mentioned, if you have mobile devices and a pretty good set of eyes to read this URL-- we can read it out to you. Let me zoom this in, if we can. Here, let's just see. I'm struggling to zoom the IP address. I'll read it quickly: 35.226.10.149. I suspect there are many great memes in this audience, far better than mine. And to sweeten the deal a bit, Allan and I have come armed with $500 Google Cloud credit coupons. We're happy to dole those out to folks who create some of the best memes and hit us up after the event. One thing I will note is that this session is being recorded and will be broadcast on the internet, so let's please keep the memes as tasteful as we can. But I look forward to seeing what you come up with. Allan and I talked at the beginning about auto-scaling, about building a scalable application. Let's start by adding an auto-scaler to our configuration. And then we'll see if, between the audience and maybe ourselves, we can generate some load.

ALLAN NAIM: Let's do that.

BRETT HESTERBERG: All right. And I'm looking forward, by the way, to-- whoa. Some of the memes coming in. We're going to go back to that tab. [LAUGHTER] OK, so we just added an auto-scaler.

ALLAN NAIM: Yeah, so what we're doing right now is creating a horizontal pod auto-scaler object. What this does is define a certain set of thresholds. Here we're saying, if the average CPU utilization across all my pods increases above 50%, I'd like you to auto-scale. So if we take a look at our HPA object right now, there's really no load.

BRETT HESTERBERG: There's no load at this point. We should see this number at almost zero. So we'll let this get initialized here. And then, oh, it looks like we've got a little bit of load, courtesy of the audience. Thank you. Let's push this a bit. Allan and I have set up a VM that has an application called Hey installed. Hey is a web load generator.
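The expose and autoscale steps in CLI form, a minimal sketch using the placeholder names from the manifest above:

```sh
# Create a Service of type LoadBalancer; GKE provisions a GCE load
# balancer with an external IP that round-robins across the pods.
kubectl expose deployment memegen --type=LoadBalancer \
    --port=80 --target-port=8080
kubectl get service memegen   # wait for EXTERNAL-IP to be assigned

# Horizontal pod autoscaler: scale between 1 and 10 replicas, targeting
# 50% average CPU utilization across the pods.
kubectl autoscale deployment memegen --cpu-percent=50 --min=1 --max=10
kubectl get hpa
```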
Let me bring this up. Bear with me just a moment. We'll add our load to the mix. And please keep your memes coming. We'll see if we can knock down our application here. Let's just take a look. I'm going to bring up a terminal to this VM, and we'll get our load kicked off. And then we'll see how Kubernetes handles things. And by the way, I'm excited to get back to the tab to see what you all are creating.

ALLAN NAIM: So it'll take a minute or so for this to come up. What will actually happen is the horizontal pod auto-scaler will be monitoring the load coming in, and as soon as average CPU utilization goes above 50%, you should see the pods auto-scale. We're talking about auto-scaling here at the pod level. But keep in mind, we've also enabled cluster auto-scaling for the Kubernetes Engine cluster. So if, for some reason, we exceeded the resources available from a node standpoint, Kubernetes Engine would actually provision a new node, or a new set of nodes, in our zones as well.

BRETT HESTERBERG: OK, so let's take a quick look and just see how many pods we have. We still have our lone pod running. Hopefully, our load generator has started to take effect. Let's just see here. So we're getting close to our target. We'll see if, between the audience and our load generator, we can ratchet this up. What we're looking for is a load number in excess of our 50% target. When that happens, it tells Kubernetes Engine, Allan, to start to scale by adding more pods, more web front ends behind our load balancer, so that we can scale to meet the load of our now very popular application. Let's take a quick look and see where we're at. Still at 50. We'll see if we can get our load gen tool spinning up a bit more.

ALLAN NAIM: And these thresholds-- here we're using CPU, but you can also specify memory. And we also give you the option to define custom metrics-- number of requests, queue length, things of that nature-- to trigger horizontal pod auto-scaling. So there we go, 266%. We've exceeded the threshold, and Kubernetes is now auto-scaling the pods. And these pods are being scaled out across the availability zones. And there we go. Now we have four pods running within our region. This is the actual state the application is running in, and the controller is monitoring it.

BRETT HESTERBERG: It looks like our app is still up and pretty healthy, even under this load. Let's give it a little more to work with. I'm going to say that something bad happens to, let's just say, our original pod here.

ALLAN NAIM: Yep. So if we kill a pod, the controller will check and see, hey, I've got only three pods running, I should have four, and it will automatically create a new pod, so that you constantly have enough capacity to serve your application.

BRETT HESTERBERG: All right, it looks like we fired up a replacement. I'm going to see if I can give it a little more work. We'll kill one more pod just to see what happens. And while we kill that, let's go take a look at our web app. It looks like our web app is still pretty healthy. Wow. We've got-- did someone script this? [LAUGHTER] That may be worth a coupon in and of itself. Looks like our web front ends are handling this load pretty well.
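The load-and-fault drill from this segment, sketched. The external IP is whatever `kubectl get service` reported, and the pod name comes from `kubectl get pods`; the hey parameters are illustrative:

```sh
# Generate load: 50 concurrent workers for 5 minutes against the app.
hey -z 5m -c 50 http://EXTERNAL_IP/

# Watch the HPA react and the replica count climb.
kubectl get hpa
kubectl get pods

# Simulate a pod failure; the deployment controller notices the actual
# state (3 pods) differs from the desired state (4) and replaces it.
kubectl delete pod POD_NAME
kubectl get pods -w
```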
Let's imagine now that we have a problem on the database side. So I'm going to go over to our Cloud SQL instance, and let's give it a problem. I'll start by noting that our Cloud SQL instance is happily existing in Zone B of US Central 1, Zone Bravo. Let's ask Cloud SQL to fail over this instance. I'm going to initiate a failover here. We'll start this, and Cloud SQL says the operation is starting. So what we've done here is simulate a failure. Typically, Cloud SQL will be health checking our instance, and if for some reason the instance becomes unhealthy, Cloud SQL will trigger this failover automatically. Here, because we just want to show what this looks like for our application, we're testing a failover by triggering it manually. So Cloud SQL thinks the failover operation is in progress. When this happens, our application attempts to connect to the database, and it fails. What's going on right now is that Cloud SQL had previously provisioned our primary instance in Zone B. It also provisioned a standby instance in some other zone within the region; it always makes sure the standby is in a different zone, a different failure domain. Cloud SQL is making sure that our old primary instance has come down and won't serve any more write requests, and then it's bringing up our standby instance. After doing so, it moves the name and the IP address of our primary instance over to the standby, making it the new primary. This is important because, to your application, a failover looks simply like a database going down and then coming back up. There's no need to build failover logic into the application; Cloud SQL orchestrates those mechanics itself. After bringing our new primary instance to life, Cloud SQL goes off and builds another standby instance in yet another zone, to make sure our new primary instance is protected. Usually this whole failover scenario with Cloud SQL for Postgres takes about 45 seconds. So let's check back in and see what Cloud SQL has to say. Cloud SQL looks like it's healthy again. And note now: our primary instance is no longer in Zone B. We're now in Zone F, as in Foxtrot. So we failed over from Zone B to Zone F. Let's see if our application is happy again. With any luck, we'll do a refresh here. And it looks like we're back. Ooh, wow. Well done.
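The manual failover trigger, sketched with the placeholder instance name from earlier; the describe format fields are standard Cloud SQL instance attributes:

```sh
# Note which zone the primary is in, then trigger a failover to the standby.
gcloud sql instances describe memegen-db --format="value(gceZone)"
gcloud sql instances failover memegen-db

# Roughly 45 seconds later, the instance serves again from the new zone,
# with the same name and IP, so the app just sees a brief restart.
gcloud sql instances describe memegen-db --format="value(state,gceZone)"
```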
So what we've seen here, Allan, is that we've injected some faults into our application, and it looks like we've successfully recovered. We've coped with the load from our VM, as well as from our audience here. With that, let me switch back to our slides, and we'll do a little bit of wrap-up. So again, a couple of key points in our architecture: you saw the auto-scaling nature of Kubernetes Engine. As we increased load, Kubernetes automatically added pods on our behalf. You also saw how Kubernetes Engine and Cloud SQL handle failures. Our application was down for about 45 seconds in total while Cloud SQL was failing over. It wasn't down at all when Kubernetes pods failed. The upshot for Allan and me is that we didn't get a text or email about any of this. It happened without any intervention from us, and you all are happily creating memes again. So this, I think, really speaks to the fault tolerant nature of this architecture. We want to wrap up today with just a few next steps. As a starting point, please try this out yourself with the code lab that's available online. If you haven't used Google Cloud before, there's a free trial for new sign-ups: you get $300 in free trial credit. If you created a great meme and want to show it to Allan and me after the presentation, we'd love to give you $500 more to play with Kubernetes Engine, Cloud SQL, and other products. We're early in the conference this week, so I have a few presentation picks for you. First is a selfish plug for a presentation tomorrow about migrating to Cloud SQL with low downtime. If you have databases and you want to get to Cloud SQL, take a look at that presentation. Two others are on the list, Allan. One is computing with power not previously possible, with BigQuery and Compute Engine. A bit of a tongue twister, but I think it's going to be a great presentation. And lastly, if you enjoyed the nature of this presentation, building an app together, we think you'll really like Cross Platform Mobile App Build with Firebase. That's Dev 222 later in the week. [MUSIC PLAYING]
Info
Channel: Google Cloud Tech
Views: 6,687
Keywords: type: Conference Talk (Full production); pr_pr: Google Cloud Next; purpose: Educate
Id: OHyNlAckQCc
Length: 48min 16sec (2896 seconds)
Published: Wed Jul 25 2018