Google Cloud Platform (GCP) Essentials (Google I/O'19)

Captions
ALEXIS MOUSSINE-POUCHKINE: Hello, and good afternoon, Google I/O. Thank you for being here. Thank you to everybody on the live stream, as well. The cloud is what powers an endless number of web and mobile apps, and that makes them magical. The easy access to potentially huge amounts of compute, storage, and to machine learning APIs is what makes the cloud so special, and something that you probably should try to leverage in your applications whenever possible. My name is Alexis, and I'm a developer advocate with Google Cloud. Now, a few questions for you. How many Android developers do we have in the room? How many web developers? Has anybody here used Firebase before? A good number, that's great. Now, to be fair, server side-- or otherwise, back-end development-- can look tedious to some folks, or just not your cup of tea. And that's fine. Luckily, Google has this rich, open, and developer-friendly cloud, Google Cloud Platform, or GCP for short. Let's take a look and start with Firebase. You might know Firebase for all the amazing features it offers to enhance your client code-- authentication, crash reporting, analytics, A/B testing, and the fairly recently announced ML Kit. Here, I'd like to cover everything that is not a client-side feature. And that's because Firebase is a wonderful way for Android and web developers to leverage Google Cloud services without exposing them to all the knobs of GCP. So first, there's Cloud Storage. This will let you manipulate user-generated files and application files, such as pictures and videos. Think of it as a way to save files to the cloud for later retrieval from any authorized device or service. Next, Cloud Firestore is the JSON database for all your application data with real-time notifications. This means that if a specific value changes in Firestore, all your connected users can be notified. This is truly amazing. Cloud Functions is the easy way to add server-side logic without having to manage any server or any cluster.
Simply provide a piece of code and specify the event that triggers the execution of that code. And that could be, for example, an image uploaded to a bucket, or some data changing in Firestore. Finally, ML Kit for Firebase offers the ability to add machine learning features to your apps using APIs that run either in the cloud or on device. So this includes text recognition, face detection, barcode scanning, image labeling. And as announced at the technical keynote yesterday, we now also have something called AutoML Vision Edge, as well as on-device translation. So why am I talking about Firebase, here? Well, it's simple. These features that I've just described are all based on Google Cloud Platform, or GCP. And if you only remember one thing from this presentation, it should be that you should start with Firebase-- if you haven't done so already-- and then grow to use GCP. When you create a Firebase project, it is, in fact, a GCP project in every aspect: resource grouping, identity management, and billing. This gives you a great way to easily use some cloud services-- before you graduate into using GCP-- when you need to. So let's talk about Google Cloud Platform and give you a sense of how it can extend what we've just briefly seen. So Google Cloud Platform, GCP, is big. I mean, really big. We recently had a large event in San Francisco called Cloud Next with tens of thousands of participants, with some sessions just focusing on one specific feature. So I can't possibly cover everything here. Instead, I'll try to cover what I believe might be useful to you and give you a few tips along the way. Maybe you already have some sort of back-end software, and you're just looking for a place to host it, potentially, in the cloud. So that will help you spend less time on infrastructure management, get potentially better security, and essentially spend more time writing your app. If you are familiar with virtual machines, Compute Engine is a great environment for you.
Compute Engine, or GCE, offers virtual machines, disks, and networking. As you can see, we can go from very small to very large amounts of CPU and memory. And that can be done using sliders, as shown here, to get the perfect configuration. You can also customize your virtual machines by adding GPUs and Cloud TPUs, tensor processing units. There are many delightful features about GCE, but one of my favorites is the little time that it takes to provision and boot a VM. But it's probably best if I show you this in action. So with that, let's switch to the demo machine, and let's create a VM. Let's take most of the defaults. I will call this instance one, just to make it more fun. Create this in Europe. And you can have different presets. You can see the sliders that I just talked about. You can change the image, the OS that you're using. There are many to choose from. You can set identity and API access settings and set up a firewall to allow or disallow-- which is the default-- any HTTP or HTTPS traffic. So let's actually go ahead and create that instance. So we're actually provisioning the instance here. We're creating this, in this case, with a Debian image. With all the defaults, this one has one CPU and, I think, four gigabytes of memory. And it is being created, as I said, in Europe. And that's it. We have the VM that's available, that has an external IP address. And we can, at this point, SSH into the machine. And notice, I didn't have to install anything here. I'm just using a web browser, clicking a button, and here I am, actually connected to that machine that didn't exist a few seconds ago. So I can look at it, and I can do things such as apt-get update. And I have a fully working machine, which I can resize, which I can delete, which I can recreate very easily. So, with that, let's go back to slides. This is just a few seconds to get a new machine running in a Google data center.
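The console demo above has a scriptable counterpart in the gcloud CLI. As a minimal sketch, the Python below only assembles and prints a `gcloud compute instances create` command; it does not run it. The zone, machine type, and image family are assumptions chosen to roughly match the demo (the talk only says "Europe", "one CPU", and a Debian image), and the helper name is hypothetical.

```python
import shlex


def build_create_instance_command(name, zone, machine_type, image_family, image_project):
    """Return the argument list for a `gcloud compute instances create` call."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        f"--image-family={image_family}",
        f"--image-project={image_project}",
    ]


# Assumed values: none of these exact identifiers are stated in the talk.
command = build_create_instance_command(
    name="instance-1",
    zone="europe-west1-b",
    machine_type="n1-standard-1",
    image_family="debian-12",
    image_project="debian-cloud",
)

# Print a shell-safe rendering so the command can be reviewed before running it.
print(shlex.join(command))
```

Keeping the command as a list makes it straightforward to hand to `subprocess.run` in an automation script once gcloud is installed and authenticated.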
You can also do this with the command line, which means you can easily automate the process of spinning up a machine, or multiple machines, when you need them, and also shut them down when you no longer need them. There are many features in Compute Engine. Some that I would like to point out are live migration, which is really a unique feature to migrate running applications from one machine to another. Preemptible VMs offer you short-lived, low-cost virtual machines, something that I would recommend that you use if your jobs, your running processes, can be interrupted. We also have some automation features, such as instance groups, to administer virtual machines in batches, to achieve some level of scalability, and even to do auto repair on production instances. Now, let's switch gears a little bit and talk about storing data, and databases in particular. Storing data is likely to be vital for your application. You could, of course, spin up a VM-- as we just did-- and install your favorite database there. But here, we'll go through some cloud, or GCP native, storage and database options, just a few to give you a sense of what's available. Chances are you have files that you would like to store. And Cloud Storage, in this case, is a pretty obvious choice here. You can put all of your files into one or multiple buckets, and prices for storing these vary depending on what we call the storage class. To give you a sense of the cost, you can host all of the Internet Archive-- and that's about 10 petabytes-- for less than $100 per month, all with no disk to manage. No capacity issue to worry about. So Cloud Storage offers different classes, as I said, of storage. These classes range from having data as close as possible to the end user-- this is multi-regional-- to archival needs. The regional option here is good for when data needs to be close to the machine that processes it.
The long-term storage classes, Nearline and Coldline, are technologies that come with a retrieval cost but are ideal for archival data. But really, best of all here is that there's a single API to manage the lifecycle of your objects, regardless of their storage class. And you can move objects from one class to the other, so data that's no longer used can be pushed back to something like, let's say, the Coldline storage. Now, for user data, such as their profile information, the transactions associated with those users, you may want to use a relational database. And rather than setting up, managing, and securing your own installation, you can use Cloud SQL instead. This will free you up from monitoring uptime of that database, from managing backups, and from applying security patches. Cloud SQL also offers the ability to define replicas for a highly available setup. The currently supported databases are MySQL and Postgres, with SQL Server coming soon. Now, NoSQL databases are known to scale, regardless of the amount of data that you throw at them. And Cloud Firestore is no exception. In fact, this database offers a JSON structure with no schema. It is strongly consistent. It is indexable. And it is serverless, meaning that there is no infrastructure to size, provision, or manage. And it comes with great SDKs for mobile and web development. If you've ever used the Realtime Database for Firebase, it's all the good things you know with added strong consistency, better querying capabilities, multiple data centers around the world, and still the great notification and offline capabilities that you know. So which one do you choose? Well, for files such as images, PDF documents, it's pretty easy. Cloud Storage is most likely what you need. For user data, transactional information, good old relational databases offered in a Cloud SQL package are probably a good choice.
If you need horizontal scalability, a schema-less database, change notifications, mobile offline support, you're probably looking at Cloud Firestore, which you can use, by the way, regardless of any other Firebase features. Now, let's go beyond VMs and into cloud native solutions. Now, don't get me wrong. Virtual machines are great. But management remains your responsibility-- provisioning, patching the OS, updating it, and all the security is left for you to do, which means time not spent developing your app. And also, the scalability is pretty much vertical. You can make the VM bigger, but you can't really play on the horizontal axis with clustering technologies. And availability is also something that is hard to add after the fact. So here, we'll talk about cloud native approaches to running your code-- Kubernetes, Cloud Functions, App Engine, and the newly released Cloud Run product. So maybe by a show of hands here, how many people know and use Docker containers? Great. So containers are solving the "works on my machine" problem by packaging an app with all of its dependencies-- including its runtime-- into a container. But if you use containers, you also probably know that that does not solve all problems. You still need to schedule and scale those containers. You need to manage their health, monitor them, and more. This is where schedulers, such as Kubernetes, come into the picture. As the inventor of Kubernetes, Google offers Google Kubernetes Engine, or GKE, a fully managed Kubernetes service. If you were starting from virtual machines, or from your own servers, setting up Kubernetes clusters would mean that you have to deal with, well, actually creating the virtual machines, attaching some storage, installing the actual Kubernetes software, setting up some networking and security. And that's a lot of work. Instead, with Google Kubernetes Engine, GKE, it takes one command line and just a few minutes.
Once created, the cluster is ready to host your containerized applications. Kubernetes version upgrades, auto repair of failing nodes, and other features are provided out of the box with GKE. Now, while GKE offers a wonderful, portable platform, it comes with a requirement to first containerize your application. It also still requires creating and managing a cluster with worker nodes. So this isn't really serverless, which we define as something that has no server management, which is fully secure by default-- this is not your problem-- and that has pay-per-use through auto scaling, including scale to zero. This all sounds nice. So let's take a quick look at some GCP products that actually qualify for this definition. So an obvious and popular example of serverless is Functions as a Service, or FaaS. And Cloud Functions is a great implementation of that paradigm. Simply upload some code written in Python, in Go, in Node, or even in Java, along with its dependencies, and define the event that triggers the execution of that code. The events can be anything from an HTTP request coming in, to a file being uploaded to a bucket, to data changing in Firestore, to messages being posted to a Pub/Sub topic. Cloud Functions has been used by customers to implement everything from what we call glue code to fully fledged microservices-based applications. But enough talking. Let's see Cloud Functions in action. If we could go back to the demo machine-- This is a function written in Node that will be triggered when a file-- a picture, in this case-- is uploaded to a specific bucket. So we have an event for a given bucket, for a given file that has a name. So the first thing we do here is we download that file locally, and then we use a library called ImageMagick to actually do the resize of that picture, and to resize it to a width of 256 pixels while preserving the ratio. We write that to a local file.
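The core logic of the function just described can be sketched without any cloud dependency. The Python below is not the Node code from the demo; it is a hedged illustration of two steps: computing the output size for a 256-pixel width while preserving the aspect ratio, and naming the resized copy. The `resized-` prefix and both function names are assumptions. The check that skips already-resized objects is an addition the talk does not mention, included because a function that writes back into the bucket that triggers it would otherwise fire again on its own output.

```python
RESIZED_PREFIX = "resized-"  # assumed prefix; the talk does not give the exact string


def resized_dimensions(width, height, target_width=256):
    """Scale (width, height) to target_width while preserving the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target_width / width
    # Never return a zero height, even for extremely wide images.
    return target_width, max(1, round(height * scale))


def output_name(object_name):
    """Return the object name for the resized copy.

    Returns None when the object is already a resized copy, so that the
    upload of the result does not trigger another resize.
    """
    directory, _, filename = object_name.rpartition("/")
    if filename.startswith(RESIZED_PREFIX):
        return None
    new_filename = RESIZED_PREFIX + filename
    return f"{directory}/{new_filename}" if directory else new_filename
```

For example, `resized_dimensions(1024, 768)` gives `(256, 192)`, and `output_name("photos/paris.jpg")` gives `"photos/resized-paris.jpg"`.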
And if everything goes well, we upload the result with a prefix called resize to that same bucket. So this is the code. There's some actual metadata in terms of dependencies, where we declare dependencies on Cloud Storage, which we listen to, and the ImageMagick version that we use. We also define what triggers the execution of this. And in this case, it's an upload to this bucket, which I can click on. This bucket is empty, and I suggest that we actually upload a picture there. So as we do this, we can look at the picture that was uploaded. And we can go back here. And hopefully-- as I refresh the bucket-- we have a second picture that's there. And that is the resized version of the initial one. So here you go. These are Cloud Functions in action. And maybe we can move now back to slides so I can tell you that these are available in multiple languages, as I said. This was Node, but you can use Python, Go, and Java. And there are many events that can trigger them, not just file uploads. I think Cloud Functions is the easiest way to access one of the many powerful GCP services, from machine learning APIs, to other storage and processing solutions. Now, as a developer, you may want to have even more freedom in the languages and the frameworks that you use. And most importantly, you may want to hand over a carefully crafted Docker image instead of source code. Cloud Run was announced last month at the Cloud Next conference. And it is here to offer you a truly serverless experience for your stateless HTTP container images. So the events are HTTP, and there needs to be no state preserved by the container for this to work. But if that is something that works for you, well, simply build your image, upload it to a registry, and create a Cloud Run service using that container. At that point, your app is now deployed and running in the cloud. And you can forget about the provisioning and managing of servers. Cloud Run does that for you.
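To make "stateless HTTP container" concrete, here is a minimal sketch of the kind of process such a container image could run, using only the Python standard library. It assumes the Cloud Run convention that the listening port arrives in the `PORT` environment variable, with 8080 as the fallback. The handler and helper names are illustrative, and nothing is kept in memory between requests.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def port_from_env(env):
    """Read the listening port from the PORT variable, defaulting to 8080."""
    return int(env.get("PORT", "8080"))


def render_greeting(path):
    """Build the response body from the request path alone (no stored state)."""
    name = path.strip("/") or "world"
    return f"Hello, {name}!\n".encode("utf-8")


class GreetingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = render_greeting(self.path)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main():
    """Serve until the process is stopped; intended as the container entry point."""
    server = HTTPServer(("0.0.0.0", port_from_env(os.environ)), GreetingHandler)
    server.serve_forever()
```

Calling `main()` starts the server. Because every response is derived from the request itself, any number of identical containers can be started or stopped without coordination, which is what allows scaling with traffic.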
It will automatically, and quickly, scale up and down based on the incoming traffic. It will even scale to zero, meaning, no traffic, no cost. But even more, you can use the same container on your own GKE cluster. If you really want to understand and master the underlying infrastructure, you can do so since this is all actually written on top of an open source technology called Knative, which provides an abstraction layer on top of Kubernetes to provide a serverless environment. So choice is good, right? Cloud Functions. Cloud Run. Well, guess what? There's even more choice. App Engine is the mother of serverless at Google. And it offers the ability to build and host entire web applications with multiple services while still retaining the source deployment approach, all of that, obviously, with serverless benefits. It comes with versioning built in. It provides out-of-the-box traffic splitting to implement things such as canary deployments or A/B testing, all at the click of a button. It also supports a long list of languages, including recent versions of Python, Java, Go, PHP. And we recently announced Ruby, as well. So at this point, you might be confused about which one to use. Let me suggest that you think of it this way. Which artifact would you like to deploy? Would you like to give me a function? Would you like to give me an app that has multiple services? Or would you like to give me a container? All of these are serverless. All of these will scale to zero. And we will manage the entire infrastructure for you. OK, so intelligence. Now, I'm not trying to imply here that your apps are not built by smart developers, but instead, that there is some amazingly low-hanging fruit to make those applications even smarter. I'm talking here mostly about machine learning, and specifically about easy-to-use APIs, which any developer can call, regardless of their ML skill set. We call these AI building blocks.
And they can be grouped into the following categories-- the Vision API and the Video Intelligence API, this is the sight category. We have the language category, with natural language and translation APIs. And we have the conversation category, with speech-to-text, text-to-speech, and Dialogflow APIs. I mentioned ML Kit for Firebase earlier. What you have here is the server-side machine learning APIs that ML Kit actually uses. So these APIs are available via RESTful endpoints, making them easy to call from any part of your application or your architecture. Certainly, if you're building mobile Android or iOS apps, ML Kit for Firebase is a great way to use these. So let's look at the Vision API, which is one of those APIs. And if we switch back to the demo machine, I can actually test this API right from the browser. I can upload a picture, same picture. Did I click on it? All right, let me refresh this. And I am not a robot, I hope. So I'm sending this to the Vision API, asking it to return and tell me everything it finds about that picture, from entities to landmarks, to text, to web properties and entities. So it has detected that this is indeed a landmark. This is Notre Dame, in Paris. There are a bunch of labels that it found. This is machine learning working for you with a pre-trained model. There are web entities with all the things that it finds on that picture. It even finds text. If you were to zoom in, you could see that the barge here is called Nouvelle Seine. There are some image properties, such as dominant colors. And last but not least, there's what we call Safe Search. Is this picture safe from an adult, spoof, medical, violence, or racy point of view? If you have a website that has user-generated content, you probably owe it to yourself to use something, such as the Vision API, to make sure everything that's uploaded is actually something you can then show to other users.
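As a rough sketch of what such a call involves, the Python below builds the JSON body for the Vision API's `images:annotate` REST method and applies a simple moderation rule to a Safe Search result. It sends nothing over the network. The function names and the choice of which likelihood levels block an upload are assumptions, not recommendations from the talk.

```python
import base64

# REST endpoint for batch image annotation (needs an API key or OAuth token to call).
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

# Assumed policy: block content rated LIKELY or VERY_LIKELY.
BLOCKING_LIKELIHOODS = {"LIKELY", "VERY_LIKELY"}


def build_annotate_request(image_bytes, feature_types):
    """Build the JSON-serializable body for one image and a list of features."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": feature} for feature in feature_types],
            }
        ]
    }


def is_safe_to_publish(safe_search_annotation, categories=("adult", "violence", "racy")):
    """Return True when none of the chosen categories is rated likely or worse."""
    return all(
        safe_search_annotation.get(category, "UNKNOWN") not in BLOCKING_LIKELIHOODS
        for category in categories
    )


# Example body asking for three of the features shown in the demo.
body = build_annotate_request(
    b"not-a-real-image",  # placeholder bytes; a real call would send a JPEG or PNG
    ["LANDMARK_DETECTION", "TEXT_DETECTION", "SAFE_SEARCH_DETECTION"],
)
```

In a real response, the Safe Search ratings would be read from `responses[0]["safeSearchAnnotation"]` before being passed to `is_safe_to_publish`.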
All of this is actually the result of a request, asking for landmark detection, face detection. We don't have any in this one. Object localization, we don't have any. But image properties, crop hints, web detection. And the result is a JSON document, which is actually parsed and presented here in this UI, but which you typically would be using in your application to enhance your application. So if we move back to slides, AI building blocks, in the form of API calls, can be extremely powerful. And they're really easy to set up. Again, they're really just an API call. So those APIs, such as the Vision API, are great. And we call these pre-trained models, meaning that Google did the heavy lifting of training a model, leaving you with the easy prediction part. Send an image. Get a result back. But what if you wanted to build your own model from your own data to better fit your business needs? This is where Cloud AutoML comes in. This is another part of our AI building blocks. AutoML lets you create your own custom machine learning models with an easy-to-use graphical interface. These models can be specific to your business needs and trained with your own data with minimal effort and little to no coding. Now, if you're an ML developer, a data scientist, you may want to have complete control over the training and prediction phases. But you probably do not want to have too much infrastructure overhead. And this is where the Cloud AI Platform comes in. This is Google's data science development environment. We offer AI Platform Notebooks. These are managed JupyterLab notebooks integrated with all the big data products you find in GCP. Cloud TPUs, and the newly announced Cloud TPU Pods, are hardware accelerators designed to speed up machine learning workloads for training, and prediction, and inference programmed with TensorFlow. Deep Learning VM images are pre-configured GCE virtual machines for deep learning applications that use TensorFlow, PyTorch, scikit-learn.
And it's trivial to add Cloud TPUs or GPUs to these virtual machines. So Cloud AI Platform offers tools and products for probably the entire lifecycle of machine learning development if you'd like to control everything. Now, switching gears a little bit here. We've talked about storing the data. Let's talk now about processing your data at scale. Chances are your apps, web apps, mobile apps, back-end apps, generate some valuable data. And you would like to turn this into insights. This takes potentially massive amounts of compute power, data processing resources in general. And the good news is that GCP is a really great and amazing place to do just that. BigQuery is an amazing product to make sense of your data. Simply send your data-- as much as you like-- along with an SQL query. The data will be processed in just a few seconds, thanks to BigQuery's unique and really massive back-end architecture. Cloud Dataflow is Google's implementation of the Apache Beam programming model. And it offers to process and transform massive amounts of data both in batch and in streaming modes. Cloud Dataproc is a hosted Apache Hadoop and Spark version that will spin up a fully managed cluster in less than 90 seconds. It will resize it dynamically and offer, overall, great cost performance for any Hadoop or Spark job that you have. Let's take a look at a quick BigQuery demo and move back to the demo machine, please. So 400,000 GitHub repositories, one billion files, and one question. Spaces or tabs? Well, we might have the answer today. We look at all of those files. We just ignore those that are less than 10 lines long. And for every file, we give a plus one if it's tabs or a plus one if it's spaces. And if it's a mix, we just decide to vote only one for whichever comes more often, spaces or tabs. This is GitHub, the real data.
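The voting rule just described is easy to state in code. The Python below is a small local sketch of that rule, not the BigQuery SQL used in the demo. It assumes that a line counts for tabs or spaces according to its first character, and that a file with an exact tie casts no vote; the talk specifies neither detail.

```python
from collections import Counter


def file_vote(text, min_lines=10):
    """Return "tabs", "spaces", or None for one file's contents."""
    lines = text.splitlines()
    if len(lines) < min_lines:
        return None  # files shorter than 10 lines are ignored
    tab_lines = sum(1 for line in lines if line.startswith("\t"))
    space_lines = sum(1 for line in lines if line.startswith(" "))
    if tab_lines == space_lines:
        return None  # assumed: a tie (or no indentation at all) casts no vote
    return "tabs" if tab_lines > space_lines else "spaces"


def tally(files):
    """Count one vote per file across an iterable of file contents."""
    votes = Counter()
    for text in files:
        vote = file_vote(text)
        if vote is not None:
            votes[vote] += 1
    return votes


# Tiny made-up inputs, only to exercise the rule.
tab_file = "\n".join(["\tx := 1"] * 12)
space_file = "\n".join(["    int x = 1;"] * 12)
mixed_file = "\n".join(["\ta"] * 4 + ["  b"] * 8)
short_file = "\tonly one line"
result = tally([tab_file, space_file, mixed_file, short_file])
```

The interesting part of the demo is that BigQuery applies the same per-line check to a billion files in seconds; the rule itself is this simple.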
And what we have in this query is the ability to run a query against that table and to actually-- for every single line in every single file-- run a regular expression, counting the number of tabs and the number of spaces, and summing up all of this. So what I suggest is we actually run this. This will process 133 gigabytes. It shouldn't take more than, say, 10, 12, 13 seconds, maybe, and hopefully give you an answer. Up to 16 seconds. 18. That's not actually bad. And so for every language, every popular language-- and we base the query on the extension of the files we found in the repos-- we calculate a ratio. Does it have more spaces or more tabs for every language? And you can see that Java tends to be more about spaces whereas Go is all about tabs, clearly. So there you have it. We actually know the answer to one of the most crucial and important questions we've had in this industry. Spaces. Now, the amazing thing here about the query is that we've analyzed each file, again, with a regular expression, for 133 gigabytes of code, in 10-ish seconds, in interactive time, as we call it. There is no need to go and grab a coffee, or come back the next morning to get the answer. So you can iterate on your queries. And this is what the graph looks like. And credits should all go to Felipe. And you can query this. All the details are there. You can run the query yourself. And you can look at all the data and see how it evolves through time. But the answer is spaces. So, to recap, we've talked about how Firebase is a great foray into Google Cloud Platform. We discussed virtual machines and databases to bring the software stacks that you love and know to GCP. Next, we talked about cloud native and serverless and how to choose the right solution. And finally, we covered adding machine learning and data processing to your apps and architectures. Before I close, let me share a few tips and tricks as you get started with GCP.
Google Cloud Console is where you will likely spend a fair amount of time exploring and using the platform. This is where you configure billing accounts. You create and you manage projects. You manage all your GCP resources, regardless of the data center location. Every product and every service has its own section in the console. It has dashboards, detailed configuration, and settings. There is Cloud Identity and Access Management. If it's a team of people working, or an entire group of people, you set people up with the right permissions. And there's even a mobile app for monitoring and managing your apps and resources on the go. So while the console is super powerful and flexible, you can also do everything with the gcloud command line. So for every action in the console, there is a gcloud equivalent. So gcloud is our scriptable and almighty CLI. Cloud Shell is a shell environment hosted on GCP for managing your projects and resources. It's accessible from a web browser. And it's powered by a small virtual machine with persistent disk space, and up-to-date software-- Git, Docker, containers-- I mean, compilers, all of these things for all your development needs. And it even comes with a web code editor. So GCP resources are the fundamental components that make up Google Cloud services. Typical examples include Compute Engine virtual machines, Cloud Pub/Sub topics, Cloud Storage buckets, Cloud Functions, and so forth. And those resources can be organized into projects and folders. This means, for instance, that once you delete a project, all the resources attached to it can also be deleted, which is a great way to keep a clean environment and to keep your costs under control. So what do you need to get started? Well, the first thing you need is a Google account. You can create a new Google account, or you can use an existing one, such as your Gmail account.
I would recommend that you enable billing for your project and that you sign up for the $300 free trial to get started at no charge. If you do not sign up for the free trial, you can still benefit from the fairly generous, always free tier that GCP offers. So $300 is actually quite a bit of money, and enough to kick the tires of GCP in a number of ways. You could have six VMs running for one year-- they're fairly small, but that's six of them-- or a four-node cluster of bigger VMs for a Container Engine, or GKE cluster, for three months non-stop. Or you could store ten terabytes in a multi-regional storage bucket, which is the best performing storage class, for one month. The billing section of the Cloud Console is where you manage billing accounts. And you link projects to those billing accounts. And a billing account is really a payment method, one or more credit cards or bank account details. You can change the billing account for a given project at any point in time, and you can set budget alerts that help you manage cost, and set up triggered actions for projects or accounts, as well. You can also generate billing exports as well as reports to better understand your spend. This is important when you start using Cloud at scale. So beyond the web console and the command line tools, GCP also comes with a number of built-in additional tools. To start with, every project comes with a private-by-default Git repo called Cloud Source Repositories, which is free for up to five users, and 50 gigabytes of storage. So staying with resources that are private to your projects and teams, GCP also comes with a container repository to store your container images. Once your container image is in the repo, that means it's on Google's network, which means, in turn, that deployments to GKE and Cloud Run are really fast. And when it comes to building container images, or any code, for that matter, there's Cloud Build, a fully managed CI/CD platform.
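The three $300 free-trial examples given earlier in this passage can be turned into rough per-unit budgets with simple division. The Python below only does that arithmetic on the figures quoted in the talk. The results are implied ceilings, not published prices, and the helper name is hypothetical.

```python
TRIAL_CREDIT_USD = 300  # free-trial credit quoted in the talk


def monthly_budget_per_unit(units, months, credit=TRIAL_CREDIT_USD):
    """Spread the credit evenly over `units` resources running for `months` months."""
    return credit / (units * months)


# Six small VMs for one year.
per_small_vm = monthly_budget_per_unit(units=6, months=12)

# A four-node GKE cluster of bigger VMs for three months.
per_cluster_node = monthly_budget_per_unit(units=4, months=3)

# Ten terabytes of multi-regional storage for one month.
per_terabyte = monthly_budget_per_unit(units=10, months=1)

print(f"small VM:     up to ${per_small_vm:.2f} per month")
print(f"cluster node: up to ${per_cluster_node:.2f} per month")
print(f"storage:      up to ${per_terabyte:.2f} per terabyte per month")
```

Read this way, the claims imply a small VM costing at most about $4 a month, a cluster node at most $25 a month, and multi-regional storage at most $30 per terabyte per month.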
This includes building-- as the name implies-- but also deploying to VMs, to GKE, and to serverless products. Now, product naming, for us and for everybody, is hard. But remembering all of these names can be overwhelming for you, as well. So here's a cheat sheet with concise definitions of all products. GCP in four words or less. And this covers every product I've talked about, but also every one I didn't talk about. Now, as I come to the end of the session, I'd like to do a shameless plug for a series of videos on the Google Cloud YouTube channel called-- no surprise-- GCP Essentials. The goal here is to cover-- in fairly short episodes-- something that is helpful for people who are actually getting started with Google Cloud Platform. Check out the videos, please. And if you like them, do subscribe. So that was a lot of ground covered. I'm leaving you with some links, including the one for the four words or less. I hope this was time well spent for you. I hope you will consider bringing the awesomeness of GCP to your existing and upcoming apps. Thank you to everybody on the live stream. And for everybody, feel free to hit me up on Twitter. Thank you very much. [APPLAUSE] [MUSIC PLAYING]
Info
Channel: Google Cloud Tech
Views: 18,734
Rating: 4.778656 out of 5
Keywords: type: Conference Talk (Full production);, pr_pr: Google I/O, purpose: Educate
Id: h4NJdvUcq2c
Length: 39min 23sec (2363 seconds)
Published: Wed May 08 2019