Getting Started with Deep Learning Models in R using Google Cloud and RStudio (Cloud Next '18)

Captions
[THEME MUSIC PLAYING] MIKHAIL CHRESTKHA: Good afternoon, everyone. Thank you for joining Deep Learning in R with Google Cloud and RStudio. My name is Mikhail Chrestkha. I'm a machine learning specialist within Google Cloud's Customer Engineering team. And I'm really excited to also have Andrie here with me. He's a solutions engineer at RStudio, the co-author of the book "R for Dummies," and a very, very avid contributor to the Stack Overflow community. I think I saw you were in the top 400 there, the top 0.1%. And Andrie is also a gentle reminder that I have work to do to fill the white space to the right of my picture for next time, so a stretch goal for 2020 is to maybe author a book. Our agenda today: first, a little bit of motivation for this session. We're going to talk about the R ML ecosystem with Google Cloud and RStudio. Andrie is going to walk you through the deep learning steps in R and sprinkle in three demos. And then we're going to close with a summary.

So, why are we here today? I started using R in 2005 as an operations research student. At that time, outside of academia, R was found in small pockets within industry. I then spent seven years in data science and analytics consulting with a Big 4 firm, and I slowly saw R being evaluated and adopted across data science teams. Now in 2018, as a customer engineer here at Google talking to a lot of IT and data science organizations, we're seeing R in every single conversation when we talk about machine learning. That's my personal story, but the data backs it up. When we look at various indices and rankings, whether that's Stack Overflow, GitHub, Google Trends, or various search engines, R usage continues to rise. Kaggle, which joined the Google family last year, also released a survey in 2017 where R is still the top tool of choice for business analysts, data analysts, and statisticians.

Why is this so important? You'll hear the theme of democratizing ML and AI across today's session as well as the conference. And this is important because there are really only a few million data scientists and ML practitioners in the world today. There's definitely a skill shortage. But when we talk about developers, business analysts, data analysts, and statisticians, we can expand that number to the tens of millions. We have a lot of ways of democratizing this. During Fei-Fei's keynote yesterday, you saw Cloud AutoML, a way to build natural language, vision, and translation machine learning models without coding. We have the Kaggle community, which gives you access to various public data sets and best practices. Now, for the coders, TensorFlow is at the center of this ecosystem. We've slowly expanded this ecosystem to include TensorFlow for mobile and TensorFlow for JavaScript to run in browsers. And today, we're very excited to bring TensorFlow for R, from RStudio, to this ecosystem.

Now, everyone in the room here, can I get a show of hands? How many have used R before? How many of you are R users? Great. So we have a packed house here. So why should you care about deep learning? A quick primer: there are a lot of sessions around deep learning and TensorFlow here, and we'll go through some of the suggested sessions that are still available later today and tomorrow. But deep learning is another machine learning technique alongside regression, classification, and clustering.
The real nuance behind this is that we have a lot of hidden layers within artificial neural networks that allow you to model a lot more complexity. Why is this fundamentally different from traditional ML techniques? The more data you give it, the better it gets. Other techniques tend to plateau, but with deep learning, the more data you collect and the more examples you feed it, we really are seeing breakthroughs in accuracy. And there are really two applications you want to think about. Number one is new domains for traditional R usage, breaking into what we call perception services: vision, natural language, speech. Think about the ability to diagnose diseases within medical imagery, identifying product quality defects on manufacturing lines, or being able to classify product reviews automatically. There's a whole set of new use cases there. But second, let's not forget about our traditional structured use cases. I really believe deep learning has a place in specific niches around sequential data to drive more value and squeeze out additional accuracy. And for very heavy feature engineering, also known as variable enrichment, deep learning can really help speed that up as well.

Now, if you're convinced about deep learning, why TensorFlow? Just a couple of quick bullets on it. It is a numerical computation library that allows you to run operations in parallel. This really allows you to distribute large machine learning training jobs across machines. As R users, we're usually constrained by the RAM of a single machine. With TensorFlow's framework, we really are able to leverage big data in the machine learning space. And finally, TensorFlow is a growing community and ecosystem where we're open sourcing not just algorithms but actual reference architectures that you can start using immediately without architecting these neural networks from scratch.

And then the third and final piece is why Google Cloud? We want you to focus on your R code. We don't want you to worry about spinning up infrastructure. We don't want you to maintain these clusters. We really want you to focus on the code and deploy it into managed services. Whether that means you're trying to store millions of images, audio files, and machine logs, or you're trying to query petabytes of data, we really want you to use Google Cloud Storage or Google BigQuery, which you've heard about in other sessions as well. And then the last piece, which is something I'm really passionate about, is speeding up the time to operationalize models. The traditional data science workflow really treats R as an experimentation environment, and then you need to work with IT to productionize it. We'll go through some examples where we're able to deploy those models directly for consumers and developers to consume in API form. And finally, I think the most important piece is that we're excited to bring the R community to the deep learning world. Here's a great quote from JJ Allaire, who's the CEO of RStudio. It's really the strong foundational background in statistics and applied mathematics that the R community can bring to educate the machine learning community. I had a couple of great conversations with Andrie over the last two days, and I wanted to see if you could add a couple of thoughts around this theme.

ANDRIE DE VRIES: Thanks, Mikhail. So yes, I think the heritage of TensorFlow has traditionally been in computer science.
And the fact that we have a full port of TensorFlow into R, and you can use the full TensorFlow library in R, makes it accessible to people who have traditionally been statisticians first rather than computer scientists. And statisticians care less about black-box models and more about inference, and standard errors, and what is the uncertainty I have here? So I think there is a lot of scope for statisticians to contribute to this field, a lot of green field where we can contribute and make this deep learning experience much more meaningful for statisticians and consumers.

MIKHAIL CHRESTKHA: Great. Thanks, Andrie. Let's dive right into the ML ecosystem with Google Cloud and RStudio. First, a bird's eye view. The top layer is our favorite IDE, RStudio, and really being able to use that in an internet browser such as Chrome. The middle layer is the R session, the interface. This can be on your local machine, it could be on a virtual machine, it could be on a cluster; but really, this is where all your R libraries are managed. And now, when we talk about extending your R toolkit for cloud computing and deep learning: first, data. We talk about BigQuery, our data warehousing and analytics solution that can process petabytes of data in seconds and minutes, and Cloud Storage, for working with hundreds of millions of images and log files. Then the model layer, TensorFlow and Keras. And then, on that last theme of minimizing time-to-market and scalability, how can I train a model very quickly as a managed service on demand and then deploy it as an API? That's where Cloud Machine Learning Engine fits into the picture. I'm going to drill into this a little bit, but these are just the moving parts.

So let's talk about the overall reference architecture. We're starting with your development environment. Currently, RStudio Server Pro is actually available as a one-click deployment on Google Cloud Platform's new Marketplace. This spins up a machine that has all the pre-installations and libraries required. This is where you install TensorFlow and Keras. We're also seeing a convergence of DevOps and data science; we really want to manage code effectively. A lot of you in the audience probably use GitHub or GitLab for a lot of your code, and we have the ability to quickly set up a private Git repository using Google Cloud Source Repositories to manage that code. Again, we talked about being able to access data. This really minimizes the dependency on your environment: you can spin up your R environment on a laptop, on a very light Chromebook, and push all the hard, heavy lifting to these managed services on demand as needed, and not have to worry about procuring all this hardware and new servers. Now we get into the training piece. Training small sandbox or experimental models on your local machine is fine to make sure your code is working, but you really want to derive the most insight and value from deep learning; as I mentioned, we need more and more data in the vision and natural language space. And this is where Cloud Machine Learning Engine comes in: training basically uploads all the required R packages and TensorFlow code into a cluster of machines and runs that for you. We also have deployment and serving.
Google Cloud Machine Learning Engine also has an API service for you to register your model in a central repository for your entire organization, whether it be applications, developers, or analysts, to use that model as a simple REST API call. RStudio also has a great new product, RStudio Connect, to really manage that entire ecosystem of models. And then finally, how do we consume that? And this is the great piece: this box is deliberately a little bit gray with respect to Google Cloud. It could be on-premise, you might have applications on App Engine, you might be using R Shiny, which is a great visualization front-end tool from RStudio as well, or mobile devices. This opens up all these models that you've built in R for consumption across the company, across your consumer products and internal processes. At this point, I'm going to hand it over to Andrie to talk about the R libraries that make this possible.

ANDRIE DE VRIES: That was a great introduction to why you would want to use TensorFlow as an R user. If you want to do that, you should know about a couple of packages that are available on CRAN, which I list here. The first one, bottom left, is TensorFlow. That's the most famous one. And the tensorflow package on CRAN is actually a full wrapper around everything that's in the Python base layer of TensorFlow. Everything you can do in Python with TensorFlow you can do in R, 100% coverage. But TensorFlow itself is quite a low-level programming environment. You basically have to write some mathematical equations to make use of it. I see some people nodding their heads. The much more sensible thing to do as a practitioner is to use Keras, which is a higher-level wrapper library around TensorFlow. Now, the keras package, also available on CRAN, again has 100% coverage of the Keras library in Python. So again, everything you can do in Keras on Python you can do in R, and that's the one I would recommend you use in most of your day-to-day exploratory data science work. TF Estimators is a package that is much more targeted at a use case where you have large amounts of data, you have simple models, and you want to take that into production very quickly. That's basically the type of thing you would use as a computer scientist when you want to embed some machine learning into a physical device. It's unlikely that as an R user you will touch TF Estimators very much. But then we also have supporting tools to make it possible to get your data into the required format. TF Datasets gives you scalable input pipelines. TF Runs I will talk about in a little bit more detail; it gives you a great way of running your TensorFlow experiments in a systematic way. And TF Deploy enables you to publish your trained model onto either RStudio Connect or the Google Cloud ML service. And cloudml is a package that gives you a great way of accessing the Cloud ML services on Google. And I would like to demonstrate some of that for you live.

So, why would you do this? If you're short on ideas about why you should care about this, we have some great examples in our gallery at RStudio. The classical examples of TensorFlow are for complex perceptual problems, as Mikhail said earlier: image classification, research in cancer immunotherapy, credit card fraud detection, machine translation, these types of complex perceptual problems. And typically, people will tell you that you need a very large amount of data for that to be sensible.
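For readers following along at home, getting those CRAN packages onto a machine is a short exercise. A minimal sketch, assuming you want the keras, tfruns, and cloudml packages mentioned above (exact installation options vary by machine and are not part of the talk):

```r
# Install the R packages from CRAN
install.packages(c("tensorflow", "keras", "tfruns", "cloudml"))

library(keras)
# install_keras() provisions the underlying Python TensorFlow and Keras
# libraries; on a machine with an NVIDIA GPU you can request the GPU build,
# e.g. install_keras(tensorflow = "gpu")
install_keras()
```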
I will actually demonstrate much more of a toy example, something that fits on my laptop very, very easily, just to illustrate the point that you don't need a million images for TensorFlow to make sense. You can actually use TensorFlow on traditional machine learning problems. I'm not saying that TensorFlow is going to outperform XGBoost; if you have structured data, that's probably not going to be the case. But it has a place in these mixed environments.

So, let's talk briefly about the steps in building a Keras model. And unsurprisingly, these steps are exactly the same steps you would take for pretty much every machine learning problem anyway; maybe the compilation step is a bit different. I'll show some code, but here are the highlights. First of all, you define your model. Typically that will be a sequential model, where your layers follow sequentially one on the other; the majority of examples you'll see are layers that just follow on sequentially. But there's also a functional model that allows you to combine different neural networks if you have more complicated problems. And Keras allows you to use multiple GPUs, so you can run your code not just on a single machine, but also on clusters of GPUs very easily. Once you've set up the model, there's a very simple compilation step that compiles the code via Python into native C++. For that step, you'll define your optimizer, your loss function, and the metrics you want to measure; typically, that will be your validation accuracy or something similar. Then you will actually fit the model. In R we would normally call this training; in Keras, it's called fit. You'll do your evaluation of how well your accuracy is doing, and maybe you'll do some plots to evaluate your accuracy at several intervals. And then you'll predict either your classes or probabilities. We have a cheat sheet; if you search for "Keras Cheat Sheet" at RStudio, you'll find it, but there's the link as well. And now I think it's time for a quick demo. Hopefully that works.

All right. So first of all, you are actually looking at an instance of RStudio Server running on a virtual machine in Google Cloud, so standard RStudio Server on the back end. And Mikhail and I spent some time on Monday to attach an NVIDIA GPU to this machine, so we have a GPU available. And I'm going to run some code just so you can see how the integration works and just to make sure that there's no jiggery-pokery. Let me just restart my session. Hold on, clean slate. So, let me set the scene. I have a bit of code here that takes some time series data. The data originates from 15 people wearing a chest-mounted accelerometer, and this accelerometer measures acceleration in the x, y, and z directions. They were then told to do different activities: walking, and running, and sitting, going up and down stairs, et cetera. You can find the data on the UCI Machine Learning Repository website. The original task was to predict what activity this person is doing. We've slightly flipped it in this example, and I'm saying, I know that this person is walking; from the trace from the accelerometer, can I determine which person is wearing the device? It's a small set, only 15 people. I'm not claiming that this will work if you have a million people; that probably won't work. But in this case, a toy example, I think it's quite nice.
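For orientation, the four steps Andrie just listed map onto only a few lines of Keras R code. A minimal sketch with hypothetical layer sizes and assumed data objects (this is not the demo script itself):

```r
library(keras)

# 1. Define: a sequential model where layers follow one on the other
model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu", input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 10, activation = "softmax")

# 2. Compile: choose the optimizer, loss function, and metrics to track
model %>% compile(
  optimizer = "rmsprop",
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)

# 3. Fit: train the model, holding out part of the data for validation
# (x_train and y_train are assumed to be prepared already)
history <- model %>% fit(
  x_train, y_train,
  epochs = 20, batch_size = 128,
  validation_split = 0.2
)

# 4. Evaluate on held-out data and predict classes or probabilities
model %>% evaluate(x_test, y_test)
model %>% predict_classes(x_test)
```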
I'm not going to walk through the mechanical steps of the code right here; I'm going to just run it in one consecutive session. And I want you to observe just two things. One is that there will be some red text that floats up momentarily. If you look very carefully, it's TensorFlow communicating back to the R session saying, I'm running on a GPU. You may just see that flashing past. And then once the training starts, we have interactive visualization in the RStudio IDE. So for every epoch, for every full iteration through all of the data, it will update a plot in the IDE that gives you instantaneous feedback on what's happening. So let's see if this works. There we go, it's starting. There are the TensorFlow messages saying, I'm running on a GPU. And now it's starting to train. And there we have the interactive plot flashing up; it updates about once every second or so. And I can look at this plot very briefly. The top plot shows me my loss, and the bottom plot shows me my accuracy. Blue is my training accuracy, and green is my validation accuracy. There's no big discrepancy between validation and training, so this is a nicely behaved model.

So if I now switch back to the slides, I can briefly give you a bit more flavor on what's happening. If you want to actually write some R code, I just want to give some pointers about things you should be careful about. The first is this funny operator here. I'm not even sure what you call it; it's a reverse pipe, right? It looks like a magrittr pipe, for those of you who are familiar with magrittr or dplyr, but it points the other way. What this operator does is give you a way to mimic simultaneous assignment of objects in R. This is something you can do in Python very easily: you can say x comma y equals 1 comma 2, and you assign x and y simultaneously. This operator allows me to simultaneously assign x_train, y_train, x_test, and y_test, which are embedded objects in this data set list object. So that's the first thing to take note of. The second important thing is array_reshape. This is important in the context of Python versus R. Pretty much everybody here said I'm an R user, so you will know that in R, matrices are columnar, column-major. But in most other programming languages, including Python, the arrays are row-major. So you have to use array_reshape to get your data into the format that TensorFlow understands. Do not try to use the dim function in R; that will not work. Top pro tip: just use array_reshape. Then the next interesting line I have here is dividing x_train by 255. I'm just rescaling all my values to a range of 0 to 1. It's very important in TensorFlow and Keras to have your input values scaled to the same range, and that range should be minus 1 to plus 1, or 0 to 1. If you don't do that, you may get numerical convergence problems. That's something that, in most R packages, the algorithm will take care of for you; in Keras, you have to do a bit more work yourself. And then there's the to_categorical function, and some other functions, that help you convert your factor levels to what in the computer science world is called one-hot encoding; in statistics, we call it dummy encoding. So you have to use this to_categorical function to make that work.
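As a concrete illustration of those data-preparation points, here is a hedged sketch using the MNIST digits that ship with the keras package rather than the accelerometer data from the demo (the %<-% operator comes from the zeallot package and is re-exported by keras):

```r
library(keras)

# Simultaneous assignment: unpack the dataset list into four objects at once
mnist <- dataset_mnist()
c(c(x_train, y_train), c(x_test, y_test)) %<-% mnist

# Use array_reshape(), not dim<-(), so the data is laid out row-major
# the way TensorFlow expects
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test  <- array_reshape(x_test,  c(nrow(x_test), 784))

# Rescale inputs to the 0-1 range to avoid numerical convergence problems
x_train <- x_train / 255
x_test  <- x_test / 255

# One-hot (dummy) encode the class labels
y_train <- to_categorical(y_train, num_classes = 10)
y_test  <- to_categorical(y_test, num_classes = 10)
```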
Next, I'm defining a model. My model, in this case, has just four layers: a dropout layer, a dense layer, and so on. I'm not going to explain that right now; you can find that in any [INAUDIBLE] tutorial. But it's fairly simple to do. Again, just notice the pipe operator that we have in R; it is a very natural way to code in R. Then you have to compile. A simple step, but one thing you have to be careful about here: do not assign the value back to your model. If you're interested in the technical detail, this is because this Keras object is an R6 class, so it's doing modification by reference. If you accidentally reassign the value at this step, you'll get some very strange results. So pro tip, don't do that. And now we're ready to train. The function is called fit. Here we do assign the result to an object, which I've called history, which means that I can plot it and inspect it. And if you simply call plot on that history object, you get a nice ggplot with the same information that I showed you earlier in the dynamic plot. That was a very, very quick overview of how Keras works.

Let me introduce you to one of the other packages. Remember the table I had earlier on the right-hand side? We have some supporting tools. TF Runs, short for TensorFlow Runs, is, I think, a fantastic way to manage your experiments. Each experiment is a run. And really, the only thing you need to remember is that there's a function called training_run, which is similar to source in R. So in R, you would say source, and it runs the entire script. If you do training_run, it will also source the entire script file, but it will do some bookkeeping for you. It will remember every run: what the hyperparameters were that you used, what was the exact code. It will put that into a small local version control so you can go back and compare. And it also captures all your output, your validation accuracy, your training accuracy, and so on. So you can easily inspect after the fact what happened by just querying a data frame with that information. So top tip, go and use TF Runs.

And actually, at this point, I can give a very quick demo of TF Runs in practice. The file I ran through earlier was called "Walking Experiments." And I'm going to run just this one line of code, training_run on "Walking Experiments," and that is going to source that file. I'll just make sure I have all the correct libraries, the packages installed. Now observe what's happening here. It's running through the same code. You still get your interactive training plot; this is exactly what you saw earlier. But once this is done, with some luck, it will pop up a window that shows me a summary of what is in this run. And there we go. So this is a browser window that popped up that has my plots. It has all my metrics and my model specification, et cetera. And this is something I can query later.
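A hedged sketch of that tfruns workflow; the script name is assumed from the demo, and the exact columns recorded depend on what your script reports:

```r
library(tfruns)

# training_run() works like source(), but records the code, flags, and
# metrics of every run so experiments can be compared later
training_run("walking_experiments.R")   # script name assumed

# All recorded runs come back as a data frame you can query
runs <- ls_runs()
head(runs)

# Open the browser summary (plots, metrics, model specification) for the
# most recent run
view_run()
```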
Back to Mikhail, and then I'll give some more demos later.

MIKHAIL CHRESTKHA: Great. Thanks. So far we've covered how to build a model and experiment, maybe on a local machine. But how do you really scale and deploy? We're going to talk about the two components of Cloud ML Engine, the training and the serving pieces, and dive a little bit into the code. First of all, Cloud ML Engine, again, is a managed machine learning service. We're essentially also giving you on-demand access to GPUs. Andrie mentioned that he and I worked together over the last few days to actually install the NVIDIA P100 GPU. Our TPUs are also in beta, so now, directly from your R console, you'll be able to access TPUs through this interface. When we talk about training, what does that really mean? We have a cloudml_train function that takes all the R code, uploads it into our cluster of servers, installs all the dependencies, and now uses the cloud for massive scale. Andrie also talked about TF Runs to create a systematic approach around experimenting, that concept of champion versus challenger models, and keeping track of all of those. That's traditionally done using grid search techniques. Another value of Cloud ML Engine is that we actually have hyperparameter tuning using Bayesian optimization, which does it automatically for you with an input file where you give it some guidelines on what evaluation metric you want to maximize or minimize. So you can see it's fairly simple: you package everything into a .R file. The other side of it is really around how you deploy these models. Again, a few simple SDK functions, exporting the saved model. One great thing about TensorFlow models is that they are language- and platform-agnostic; they're binary files that can be consumed by any type of library and converted into REST APIs. So in this case, we're going to use the deploy function to publish, or register, this model into a cloud registry. And now any number of developers and analysts can consume it as a REST API call, whether that be through R, where in this case we use the predict function, or a Python developer, a Java application, or a mobile application, all consuming the same model with the appropriate input provided and the response returned in the appropriate format. So we're now going to jump into an actual demo of Cloud Machine Learning Engine, what it looks like, and really open up the Google Cloud Platform console to look at monitoring those scalable machine learning training jobs and what a portfolio of models looks like managed in a central location.

ANDRIE DE VRIES: Thanks, Mikhail. I'm back in my RStudio Server session on the VM, as we discussed earlier. As Mikhail suggested, the point of Cloud ML as a service is that I can send my models over to Cloud ML for training or for hyperparameter tuning. And we have a package called, wait for it, cloudml that gives you really great integration to do exactly that. So I'm going to step you through some of the functions to do that. Let me just make sure I'm in the right place; so I'm in the right folder. The configuration of this package is actually very straightforward. Once you've installed cloudml from CRAN, you do library(cloudml), and then there's a function gcloud_install. This will install the Cloud SDK on the machine you are working with. Once installation is done, it will step you through an interactive session where you authenticate in your browser to your Cloud ML session, where you can specify which workspace you're using, et cetera. So you can cache your credentials on any machine that you're using. I did that last night, so I don't need to do that again, and I can simply proceed to training my models. So, I'm setting my working directory, I'm loading the cloudml package, and now cloudml_train. And as you can see there, this is actually familiar, because you've already seen TF Runs earlier, where the concept is that you're not stepping through your code directly. You're submitting your script file that contains your model to TF Runs. In this case, you're submitting that same file, or a similar file, to the Cloud ML service. The only thing that's different now is that I'm specifying, as a user, that I'm going to use a standard GPU; but you can use bigger machines. And I'm specifying that I have a configuration in a YAML file, tuning.yml.
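A hedged sketch of that setup and submission path; the script name is assumed, and the master_type value follows the standard-GPU machine Andrie mentions:

```r
# One-time setup: install the cloudml package from CRAN, then install and
# authenticate the underlying Google Cloud SDK
install.packages("cloudml")
library(cloudml)
gcloud_install()

# Submit the training script to Cloud ML Engine; dependencies are
# discovered and installed on the service automatically
cloudml_train(
  "walking_cloudml.R",          # script name assumed from the demo
  master_type = "standard_gpu", # machine type for the master worker
  config = "tuning.yml"         # hyperparameter tuning configuration
)
```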
So, let's have a very quick look at what the walking Cloud ML script looks like and what that YAML file looks like. The script in R is pretty much similar to the file I had earlier. I have my flag definitions, and then I have some layers: a convolution layer, then a max pooling layer, then a dropout layer, and so on. But at the top of this script, I've set up an object of flags. Basically I'm saying, create a flag called conv1_filters and give it the value of 16. And just a little bit lower down in my actual script, I can find that. There we go: flags, dollar, conv1_filters. So when the code gets to this point, it will look up the value of the flag. So it's very simple: I've set up a list of some values, and I'm now just referring to those values in the list. So far, so what, right? If I just run the script as is, it will take that value of the flag and just use it. But if I use cloudml_train with this tuning YAML file, then some magic happens. This YAML file is not particularly difficult to decipher. I have some hyperparameter information at the top, and I'm telling it to run for 25 trials, so run 25 models, and run three of those models in parallel. But then I have my parameter called conv1_filters, which is exactly the same flag we just looked at. I say that it's a discrete value, and to search over this value with the increments that I specify: 16, 32, 64, 128, et cetera. So basically what you can see here is that I'm setting up a grid that Cloud ML will search through. But this is not quite a plain grid search, because in Cloud ML you get the benefit of some Bayesian optimization; you will typically see that over time the candidate models get better. So what's going to happen is that I'm going to run 25 trials on the Cloud ML service, sampling some combination of these flags every single time.
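A hedged sketch of the two pieces being described. The flag name and trial counts mirror what is said on stage; the layer sizes, input shape, and the metric tag in the YAML are assumptions for illustration:

```r
library(keras)
library(tfruns)

# In the training script: declare tunable hyperparameters as flags, then
# refer to them wherever the value is needed
FLAGS <- flags(
  flag_integer("conv1_filters", 16)
)

model <- keras_model_sequential() %>%
  layer_conv_1d(filters = FLAGS$conv1_filters, kernel_size = 3,
                activation = "relu", input_shape = c(128, 3)) %>%
  layer_max_pooling_1d() %>%
  layer_dropout(rate = 0.3) %>%
  layer_flatten() %>%
  layer_dense(units = 15, activation = "softmax")
```

```yaml
# tuning.yml (sketch): 25 trials, 3 in parallel, searching discrete values
# of the conv1_filters flag with Bayesian optimization
trainingInput:
  hyperparameters:
    goal: MAXIMIZE
    hyperparameterMetricTag: val_acc   # assumed metric name
    maxTrials: 25
    maxParallelTrials: 3
    params:
      - parameterName: conv1_filters
        type: DISCRETE
        discreteValues: [16, 32, 64, 128]
```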
Now, before I press the Go button, I want to show you, if you've not seen this before, the Cloud ML interface on Google. So this is the project that Mikhail and I have been working on. You can see a history here. We had some failed experiments; they show up in red. I can click through into a log to try and understand what went wrong; typically it was because I misspelled some variable or didn't set something up properly. And then I started to have some runs that were successful, and the last two were successful. So I think if I kick off another job right now, it should just work, with some luck. OK, so let's try it. What I want you to observe is that, first of all, there is going to be some feedback directly in the console. And then once the job starts running, I'll get feedback not in the console, but in the terminal window in RStudio. So the terminal window, this is, I think, the result of a previous job. Let's see what happens. The terminal window gives me a view onto the Linux that's running on this machine. OK, so I'm submitting, and it will take just a few seconds for Cloud ML to respond. Then I should start getting information and instructions on what to go and do next. And while we're waiting, there we go.

It responded with some information. But did you notice that it switched to the terminal window, where I'm now collecting results from the Cloud ML logs? So there we go. And it's the same log that I can now go and inspect online; this shows you the integration. If I just click back to the console, it tells me that this is my job number, ending in 008. I can go to this URL to inspect what's happening with my job or in the logs, and I can run this command in R to find out what's happening. Also notice that R itself is not blocked. So this job is running in Cloud ML, but I can do my normal R code, and I get an answer straight away. So if I switch back to the Cloud ML console and just click Refresh, with some luck I'll get that job number ending in 008. There we go. It says it's been running for a minute, and I can go and view the logs; these are the same logs I just showed you earlier. Once that wakes up, it just takes a minute normally. There we go. It says it's now queued and waiting to be provisioned. And from experience I know this job, which is pretty much the same as my second job, will now continue to run for the next hour and a half. And the beauty of this way of interacting with Cloud ML is that this is a service; it's not a virtual machine. I had to do zero installation on that machine. I did not have to go and configure or install anything at all. The cloudml package in R will discover all my dependencies. I'm using dplyr, I'm using Keras; it will discover those packages, and it will use Packrat to get the corresponding packages and install those on the Cloud ML service. At the end of the run, that machine will come back with results, and the clock stops, so I'm not getting charged any more for that machine running. That's Cloud ML. Let's have a look: if I actually copy this instruction, job_status, and tell R, OK, tell me what's happening there, I can inspect what's happening. There we go. I get back an R list that tells me when this job was started, all the values it's going to choose, and also where it is. So that's about as much as I want to do for the demo.

MIKHAIL CHRESTKHA: Yeah. I wanted to show the Cloud Machine Learning Engine models.

ANDRIE DE VRIES: Ah, good point, yes.

MIKHAIL CHRESTKHA: Yeah. Sorry, can you switch over to that?

ANDRIE DE VRIES: Yes.

MIKHAIL CHRESTKHA: Yeah, the one thing I wanted to talk about, and this is a topic I'm passionate about, is deploying those models for consumption. Again, the UI from the Google Cloud Platform is very simple, but the idea is that once you register those models, they're available in a central repository for your organization to manage and maintain. You can follow your own taxonomy. Here, I've just published a few sample models. They might be relevant to certain functions, certain business lines. I have a couple here: maybe the digital marketing team has a model to predict click-through rates; we have the computer vision department that's trying to recognize images in the product catalog; some text classification. So all these models are now available for REST API consumption. I ran a loop of calls to the image classification model last night.
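For reference, the deploy-and-consume path Mikhail describes roughly corresponds to the following hedged sketch; the model name and the input format are assumptions for illustration, not the models shown in the console:

```r
library(keras)
library(cloudml)

# Export the trained Keras model as a language- and platform-agnostic
# SavedModel, then register it with Cloud ML Engine as a served version
export_savedmodel(model, "savedmodel")
cloudml_deploy("savedmodel", name = "walking_classifier")  # name assumed

# Any client can now hit the REST endpoint; from R the same model is
# reachable through cloudml_predict()
instances <- list(as.vector(x_test[1, ]))   # one example, format assumed
cloudml_predict(instances, name = "walking_classifier")
```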
And what this allows you to do from a monitoring standpoint is take a look at, say, the last 12 hours. It really gives you a lot of information: how often is the model being used by your applications and end users? What are the predictions per second? From a performance standpoint, you can also track latency and issues, along with the logging, and have a nice way to see what's happening. You can see that a lot of the time we have pretty low latency, but there are some peaks during the night here, when I ran the loop, where that really could be a problem. You can debug and make sure these predictions are being served correctly, whether you need real-time predictions or batch. I think we wanted to show the Shiny example as a front end, and then we're going to wrap it up.

ANDRIE DE VRIES: Thanks, Mikhail. OK, so as Mikhail suggested, we want to deploy this. And as he said earlier, you can deploy these models as APIs in the service. You can do the same thing in RStudio Connect: you can deploy TensorFlow models as an API in RStudio Connect, which is an [INAUDIBLE] product that gives you a publication platform to publish your Shiny apps, R Markdown reports, et cetera. And I want to show you one example of that in RStudio Connect live. This is a small toy example I wrote. It uses a pre-trained Keras model, so I didn't do any training myself. I used a publicly available model that is fairly sophisticated, though it's a number of years old; you can get better models these days. And what it does is, I can tell it to upload a small image, perhaps of my dog. And then once the image is uploaded, it will try and tell me what's there. It thinks my dog is a Malamute, or a Collie, maybe an Eskimo dog. Let's tell it to give me more categories, 10 categories. It definitely thinks it's some kind of dog. It's none of those; it's actually a Finnish Lapphund. But if you know what a Malamute looks like, it's a pretty good guess. The code itself is very simple; it's only about 50 lines of code in total. And the key element of that code is in this line here: application_resnet50, which is a built-in function in Keras. Keras comes with some pre-trained models, including [INAUDIBLE], Xception, Inception, and MobileNet. And all I did was say, use that model and just score on my own image. So there's my scoring function that predicts on the image. It's basically just model, then predict, and then I decode the predictions. And that plot prediction, that's the bar chart. And that's it; that's the entire application. If I upload a second image, maybe of my cat, let's see what it thinks. Well, it thinks it's some kind of cat: a tiger cat, or an Egyptian cat. I think it's just a tabby cat, really. [ANDRIE LAUGHS] So this is a little toy example just to illustrate that you can use TensorFlow in your Shiny app. And actually, in this case, I'm just using a CPU in the background; I don't even have a GPU to serve the scoring function. So you may get away with not having a very sophisticated machine. I see some laughs. That's probably because it thinks, well, maybe it's a bucket or a plastic bag. [ANDRIE LAUGHS] Go figure. The perils of deep learning: don't assume that these things are intelligent; they're not. They will tell you based on what your training data set was.
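The scoring path Andrie describes, loading a pre-trained ImageNet model and decoding its predictions for an uploaded image, might look roughly like this outside of the Shiny wrapper (the image path is illustrative):

```r
library(keras)

# Load a pre-trained ImageNet model; no training of our own is required
model <- application_resnet50(weights = "imagenet")

# Prepare a single image the way ResNet50 expects it
img <- image_load("dog.jpg", target_size = c(224, 224)) %>%  # path illustrative
  image_to_array() %>%
  array_reshape(dim = c(1, 224, 224, 3)) %>%
  imagenet_preprocess_input()

# Score the image and decode the top 10 ImageNet class predictions
preds <- model %>% predict(img)
imagenet_decode_predictions(preds, top = 10)[[1]]
```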
MIKHAIL CHRESTKHA: So just to close out, the real takeaway is that you, as the R community, now have a set of libraries both to access TensorFlow and Keras and to scale and deploy using Google Cloud. And deep learning, as a new tool in your toolkit, really allows you to open up new applications and tackle new domains and challenging business problems. And what we're most excited about, again, is what you bring to the broader ML community with your applied math and [INAUDIBLE] background, to really teach the broader community how to build ML models more effectively.

A couple of things. I know RStudio has a booth in Moscone West on level 2; we have a couple of folks there, and you can stop by. The Keras cheat sheet is available. We're also excited to announce a new Kaggle competition, in partnership with Google Cloud and RStudio, so give it a shot. That will be published very soon, and you can try out different techniques, including TensorFlow. We mentioned RStudio Server Pro is a one-click deployment that is available on Google Cloud Platform's Marketplace. And a couple of books; again, "Deep Learning with R" is something that I found really useful in my journey of relearning.

Now just a couple of suggestions. For those who want a little bit more exposure to the new domains around deep learning, there's a great computer vision session around satellite imagery with one of our customers. Andrie touched upon this, and it's a great question from the community: deep learning versus support vector machines versus XGBoost. There's a great session around scikit-learn and XGBoost that talks a little bit about the trade-offs between deep learning and our traditional statistical techniques. There are two encore sessions that I very much recommend. For folks who want a bit of a deep dive into TensorFlow, there's "TensorFlow, Deep Learning, and Convolutional Neural Nets without a PhD"; that was packed, and a lot of people couldn't get in yesterday, so I would definitely recommend seeing that. And if you're interested in the broader ML/AI spectrum within Google Cloud, there's "From Zero to ML on Google Cloud Platform," covering everything from REST APIs that you can access as an analyst or a developer, all the way to deploying and coding in TensorFlow. So hopefully you get a notification around surveys for this session; please fill them out and provide us great feedback. And we're lucky that we have lunch coming up, so anyone who wants to stay back and ask us questions, we'll be here for the next 30 minutes. Thank you, everyone. [THEME MUSIC PLAYING]
Info
Channel: Google Cloud Tech
Views: 6,771
Rating: 5 out of 5
Keywords: type: Conference Talk (Full production); pr_pr: Google Cloud Next; purpose: Educate
Id: y6vPAe9Z7QI
Length: 46min 34sec (2794 seconds)
Published: Wed Jul 25 2018