[THEME MUSIC PLAYING] MIKHAIL CHRESTKHA: Good
afternoon, everyone. Thank you for joining
Deep Learning in R with Google Cloud in RStudio. My name is Mikhail Chrestkha. I'm a machine
learning specialist within Google Cloud's
Customer Engineering team. And I'm really excited to
also have Andrie here with me. He's a solutions
engineer at RStudio, the co-author of the
book "R for Dummies," and a very, very avid
contributor to the Stack Overflow community. I think I saw you were in the
top 400 there, the top 0.1%. And Andrie is also
a gentle reminder that I have work to do to fill
the white space to the right of my picture for next time,
so stretch goals for 2020 to maybe author a book. So our agenda today is really a little bit about the motivation behind this session. We're going to talk
about the R ML ecosystem with Google Cloud in RStudio. Andrie's going to walk you
through the deep learning steps in R and sprinkle in three demos. And then we're really going
to close with a summary. So, why are we here today? I started using R in 2005 as
an operations research student. At that time
outside of academia, R was found in small
pockets within the industry. I then spent seven years in
data science and analytics consulting with a Big
4 firm, and I slowly saw R being evaluated
and slowly adopted across data science teams. Now in 2018 as a customer
engineer here at Google, and talking to a lot of IT and
data science organizations, we're slowly seeing R in
every single conversation when we talk about machine learning. That's my personal story. But the data backs it up. When we talk about various indices and rankings-- whether that's Stack Overflow, GitHub, Google Trends, or various search engines-- R usage continues to rise. Kaggle, which joined the
Google family last year, also released a survey
in 2017 where R is still the top tool of choice for
business analysts, data analysts, and statisticians. Why is this so important? You'll hear the theme
around democratizing ML and AI across today's session
as well as the conference. And this is important
because really there's only a few million data
scientists and ML practitioners in the world today. There's definitely a skill-- a skill shortage. But when we talk about
developers, business analysts, data analysts, statisticians,
we can really expand that number to the tens of millions. We have a lot of ways
around democratizing this. During Fei-Fei's
keynote yesterday, you saw Cloud AutoML, a way
to build natural language, vision, and translation machine
learning models without coding. We had the Kaggle
community, which allows you access to
various public data sets, best practices. Now for the coders,
TensorFlow is in the center of this ecosystem. We've slowly expanded
this ecosystem to include TensorFlow
for mobile, TensorFlow for JavaScript
to run in browsers. And today, we're very excited
to bring TensorFlow for R, from RStudio to this ecosystem. Now everyone in the room here,
can I get a show of hands? How many have used R before? How many of you are R users? Great. So we have a packed house here. So why should you care
about deep learning? A quick primer, there's a lot
of sessions around deep learning and TensorFlow here. We're going to go through some
of the suggested sessions that are still available
later today and tomorrow. But deep learning
is another machine learning technique alongside
regression, classification, and clustering. The real nuance
behind this is that we have a lot of
hidden layers within artificial neural
networks that allow you to model a lot more complexity. Why is this fundamentally
different from traditional ML techniques? The more data you give
it, the better it gets. Other techniques tend
to have a plateau. But with deep learning,
the more data you collect, the more examples you
feed it, we really are seeing breakthroughs
in accuracy. And really two applications
you want to think about. Number one is new domains
for traditional R usage, breaking into what we
call perception services-- vision, natural
language, speech, when we talk about the
ability to diagnose diseases within medical imagery, when we
talk about identifying product quality defects on
manufacturing lines, being able to classify
product reviews automatically. There's a whole new
use cases for that. But second, let's not forget
about our traditional structure use cases. I really believe
deep learning has a place for specific niches
around sequential data to actually drive more
value and squeeze out additional accuracy. And then also for very heavy feature engineering, also known as variable enrichment,
deep learning can really help to
speed up that as well. Now, if you're convinced about
deep learning, why TensorFlow? Just a couple of
quick bullets on it. It is a numerical computation
library, allows you to run operations in parallel. This really allows you to
distribute large machine learning training
jobs across machines. As R users, we're usually constrained to the RAM of a single machine. With TensorFlow's
framework, we really are able to now leverage big
data in the machine learning space. And then finally, TensorFlow
is a growing community and an ecosystem where
we're open sourcing not just algorithms,
but actual reference architectures that you can
start using immediately without architecting these
neural networks from scratch. And then the third and final
piece is why Google Cloud? We want you to focus
on your R code. We don't want you to worry about
spinning up infrastructure. We don't want you to
maintain these clusters. We really want you
to focus on the code, deploy it into managed
services, whether that means you're trying to
store millions of images, audio files, machine
logs, whether you're trying to query
petabytes of data, we really want you to use
Google Cloud Storage or Google BigQuery that you've heard
about in other sessions as well. And then the last
piece, which is really what I'm really close to is
really speeding up the time to operationalize models. The traditional data science workflow really treats R as a data science experimentation space, and then you need to really work with IT to productionize it. We'll go through
some examples where we're able to deploy those
models directly for consumers and developers to consume
your models in API form. And finally I think the most-- really the most
important piece is we're excited to bring the R
community to the deep learning world. Here's a great quote
from JJ Allaire, who's the CEO of RStudio. And it's really the strong foundational background in statistics and applied mathematics that the R community can bring to educate the machine learning community. I had a couple of
great conversations with Andrie over
the last two days. I wanted to see if you could
add a couple of thoughts around this theme. ANDRIE DE VRIES: Thanks Mikhail. So yes, I think the
heritage of TensorFlow has traditionally been
through computer science. And the fact that we have both a
full port of TensorFlow into R, and you can use the full
TensorFlow library in R, that makes it accessible to
people who have traditionally been probably
statisticians first rather than computer scientists. And statisticians
care more about-- or less about black
box models and more about inference,
and standard errors, and what is the
uncertainty I have here? So I think there
is a lot of scope for statisticians to
contribute to this field, a lot of green field that
we can contribute to and making this deep learning
experience much more meaningful for statistics and consumers. MIKHAIL CHRESTKHA: Great. Thanks Andrie. Let's dive right in into the
ML ecosystem with Google Cloud and RStudio. First, a very bird's eye view. The very first top
layer, our favorite IDE, RStudio, really
being able to use that in an internet
browser such as Chrome. The middle layer, the R
session, the interface, this can be on your local machine. This could be on
a virtual machine. It could be on a cluster. But really, this is where all
your R libraries are managed. And now, when we talk about
extending your R toolkit, again, for cloud computing and
deep learning, first, data. We talk about BigQuery,
our data warehousing analytics solution that can
process petabytes of data in seconds and minutes, cloud
storage, working with hundreds of millions of images and
log files, the modern layer TensorFlow and Keras. And then really
on that last theme around minimizing
time-to-market, scalability, how can I train a
model very quickly as a managed service on-demand
and then deploy it as an API? That's really where cloud
machine learning engine fits into the picture. I'm going to drill
into this a little bit, but these are just
the moving parts. So let's talk about the
overall reference architecture. We're starting with your
development environment. Currently, RStudio
Server Pro is actually available as a one-click
deployment on Google Cloud Platform's new marketplace. So this really installs a-- spins up a machine that has
all the pre-installations and libraries required. This is where you install
TensorFlow and Keras. We also are seeing a convergence
of DevOps and data science. We really want to
manage code effectively. A lot of you in the
audience probably use GitHub or GitLab
for a lot of your code. We really have the
ability to quickly set up a private Git using Google
Cloud Source Repositories to manage that code. Again, we talked about
being able to access data. This really minimizes the
dependency on your environment. You can spin up your R
environment in a laptop, in a very light Chromebook. And now you can just-- you can push all the
hard, heavy lifting to these managed services
on demand as needed and not have to really
worry about procuring all this hardware and new servers. Now we get into
the training piece, training small sandbox models, or experimental models, on your local machine first to make sure your code is working. But you really want to derive
the most insight and value from deep learning. I mentioned, we need more and
more data in the vision space, natural language space. And this is where training
Cloud Machine Learning Engine basically uploads
all the required R packages and TensorFlow
into a cluster of machines and runs that for you. We also have
deployment and serving. Google Cloud Machine
Learning Engine also has an API service for
you to register your model in a central repository for
your entire organization now, whether it be applications,
developers, analysts, to use that model as a
simple REST API call. RStudio also has a great new
product, RStudio Connect, to really manage that
entire ecosystem of models. And then finally, how
do we consume that? And this is the great piece. This box is specifically
a little bit on the gray within Google Cloud. It could be on-premise. You might have applications
with App Engine. You might be using
R Shiny, which is a great visualization
front-end tool from RStudio as well, or mobile devices. This opens up all these
models that you've built in R for consumption
across the company, across your consumer
products, internal processes. At this point, I'm going
to hand it over to Andrie to talk about what R libraries there are to make this possible. ANDRIE DE VRIES: That
was a great introduction about why you would want to
use TensorFlow as an R user. If you want to do that, you
should know about a couple of packages that are available
on CRAN, which I list here. The first one is, well, bottom left-- let's start with TensorFlow. That's the most famous one. And the TensorFlow package on CRAN is actually a full
wrapper around everything that's in the Python
base layer in TensorFlow. Everything you can do in
Python with TensorFlow you can do in R, 100% coverage. But TensorFlow itself is
quite a low-level programming environment. You basically have to write
some mathematical equations to make use of that. So I see some people
nodding their heads. The much more sensible thing
to do as a practitioner is to use Keras, which
is a higher-level wrapper library around TensorFlow. Now, the Keras package also
available on CRAN, again, is 100% coverage of the
Keras library in Python. So again, everything you
can do in Keras on Python you can do in R. And that's
the one I would recommend you use in most of your day-to-day
data science exploratory work. TF Estimators is a package that
is much more targeted at a use case where you have
large amounts of data, you have simple models,
and you want to take that into production very quickly. That's basically
the type of thing that you would use as
a computer scientist, and you want to embed
some machine learning into a physical device. It's unlikely as
an R user you will touch TF Estimators very much. But then we also
have supporting tools to make it possible to get your
data into the required format. TF Data Sets gives you
scalable input pipelines. TF Runs I will talk about
in a little bit more detail. It gives you a great way
of running your TensorFlow experiments in a systematic way. And TF Deploy enables you
to publish your train model onto either RStudio Connect
or into the Google Cloud ML service. And Cloud ML is a port-- a great way of accessing the Cloud ML services on Google. And I would like to demonstrate
some of that for you live. So, why would you do this? So if you're short on ideas
about why you should care about this, we have
some great examples on our gallery at RStudio. The classical
examples of TensorFlow are for complex
perceptual problems as Mikhail said earlier, so
image classification, research in cancer, immunotherapy, credit
card fraud detection, machine translation, these types of
complex perceptual problems. And typically,
people will tell you that you need very large amounts of data for that to be sensible. I will actually
demonstrate much more of a toy example of something
that fits in my laptop very, very easily. And just to illustrate
the point that you don't need to have a million
images for TensorFlow to make sense. You can actually use TensorFlow
on traditional machine learning problems. I'm not saying that TensorFlow
is going to outperform Xgboost. Or if you have structured
data, that's probably not going to be the case. But it has a place in
these mixed environments. So, let's talk briefly
about the steps in building a Keras model. And unsurprisingly,
these steps are exactly the same steps you would take
for pretty much every machine learning problem anyway. Maybe the compilation
step is a bit different. I'll show some code, but
here's the highlight. First of all, you
define your model. Typically that'll be
a sequential model where your layers follow
sequentially one on the other. That is the majority
of examples you'll see are layers that just
sequentially follow on. But there's also
a functional model that allows you to combine
different neural networks if you have more
complicated problems. And Keras allows you
to have multiple GPUs. So you can run your code on
not just a single machine, but also on clusters of GPUs very easily. Once you've set up the model,
then there's a very simple step of compilation that compiles the
code via Python into the native C++. And for that step, you'll
define your optimizer, your loss function, and the metrics
you want to measure. Typically, it'll be
your validation accuracy or something similar. Then you will actually
fit the model. Traditionally in R we would call this just train. In Keras, it's called fit. You'll do your evaluation on how well your accuracy is doing. And maybe you'll do some plots to evaluate your accuracy at several intervals. And then you'll predict either your classes or probabilities. We have a cheat sheet. And if you search for "Keras Cheat Sheet" at RStudio, you'll find it. But there's the link as well.
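In R code, those steps look roughly like this (a minimal sketch, not the demo's actual script; the layer sizes, shapes, and the x_train/y_train objects are illustrative):

```r
library(keras)

# Define: a sequential model, layers following one on the other
model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu", input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 10, activation = "softmax")

# Compile: choose the optimizer, loss function, and metrics
model %>% compile(
  optimizer = optimizer_rmsprop(),
  loss = "categorical_crossentropy",
  metrics = c("accuracy")
)

# Fit (Keras's name for training), then evaluate and predict
history <- model %>% fit(x_train, y_train, epochs = 30,
                         batch_size = 128, validation_split = 0.2)
model %>% evaluate(x_test, y_test)
predict(model, x_test)
```

And with that sketch in mind,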
it's time for a quick demo. Hopefully that works. All right. So first of all,
you are actually looking at an instance of
RStudio Server running on a virtual machine in Google Cloud. So standard RStudio Server in the back end. And Mikhail and I spent
some time on Monday to install and point to the NVIDIA GPU processors on this machine. So we have a GPU available. And I'm going to run
some code just so you can see how the
integration works and just to make sure that
there's no jiggery-pokery. Let me just restart my session. Hold on, clean slate. So, let me set the scene. I have a bit of code that takes some time series data. And the data was-- originates from
15 people wearing a chest-mounted accelerometer. And this accelerometer
measures acceleration in x, y, and z direction. And they were then told to
do different activities-- walking, and
running, and sitting, going up and down
stairs, et cetera. The original task,
which you can then-- you can find the data on the UCI Machine Learning Repository website. The original task
was to predict what activity is this person doing? We've slightly flipped
it in this example. And I'm saying, I know that
this person is walking. From the trace from
the accelerometer, can I determine which person
is wearing the device? A small set, it's
only 15 people. So I don't think-- I'm not claiming
that this will work if you have a million people. Like this probably won't work. But in this case, a toy example,
I think it's quite nice. I'm not going to run
through the code right here. Rather than walk through the mechanical steps of the code, I'm going to just run it in
one consecutive session. And I want you to
observe just two things. One is there will be some
red text that just floats up momentarily. And if you look
very carefully, it's TensorFlow communicating
back to the R session saying, I'm running on a GPU. You may just see
that flashing past. And then once the
training starts, we have interactive
visualization in the RStudio IDE. So for every epoch, for
every full iteration through all of the data, it
will update a plot in the IDE that gives you instantaneous
feedback on what's happening. So let's see if this works. There we go-- it's starting. There's the TensorFlow messages
saying, I'm running on GPU. And now it's starting the train. And there we have the
interactive plot flashing up. I think it's about once every second or so that it will update. And I can look at this
plot very briefly. The top plot shows me my loss. And the bottom plot
shows me my accuracy. Blue is my training accuracy. And green is my
validation accuracy. So this is a nicely
behaved model. There's no big discrepancy
between validation and training. So this is a nicely
behaved model. So if I now switch
back into the slides, I can just briefly give
you a bit more flavor on what's happening. So if you want to actually
write some R code, I just want to give some
pointers about things you should be careful about. The first is that we have-- there's this funny
operator here. I'm not even sure
what you call it. It's a reverse pipe, right? See, it looks like a
magrittr pipe for those of you who are familiar with
magrittr or dplyr. But it points the other way. And what this operator
does it gives you a way to mimic simultaneous
assignments of objects in R. So this is something you can
do in Python very easily. You can say, x comma
y equals 1 comma 2. And you assign x and
y simultaneously. This operator allows me
to simultaneously assign x_train, y_train, x_test, and y_test, which are embedded objects in this data set list object. So that's the first thing you
should just take note of that. The second important
thing is array_reshape. This is important in the context of Python versus R. So you all-- pretty much everybody said I'm an R user. So you will know that
in R, vectors are-- or matrices are column-major, column first. But in most other programming languages, including Python, the arrays are row-major. So you have to use array_reshape
to get your data in the format that TensorFlow understands. Do not try and use
the Dim function in R. That will not work. So it just-- top pro tip,
just use Array Reshape. Then the next line I have
here that's interesting is dividing x_train by 255. I'm just rescaling all my
values to the range 0 to 1. It's very important in TensorFlow and Keras to have your input values scaled to the same range. And that range should be minus 1 to plus 1, or 0 to 1. If you don't do that, you
may get numerical convergence problems. So that's something
that's in most R packages that that algorithm
will take care for you. In Keras, you have to do
a bit more work yourself. And then there's the to_categorical function, and some other helper functions, that help you convert your factor levels to what, in the computer science world, is called one-hot encoding. In statistics, we call it dummy encoding. So you have to use this to_categorical function to make that work.
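Those data preparation steps can be sketched like this (using the keras package's MNIST-style shapes for illustration, not the accelerometer demo's actual code):

```r
library(keras)

# %<-% performs the simultaneous (multiple) assignment described above,
# unpacking a nested list into several objects at once
c(c(x_train, y_train), c(x_test, y_test)) %<-% dataset_mnist()

# array_reshape(), not dim(): R fills arrays column-major, but
# TensorFlow expects NumPy-style row-major ordering
x_train <- array_reshape(x_train, c(nrow(x_train), 784))

# Rescale the inputs to the range 0 to 1
x_train <- x_train / 255

# One-hot (dummy) encode the labels
y_train <- to_categorical(y_train, num_classes = 10)
```

With the data in that shape, the next step is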
defining a model. My model, in this
case, I have just four layers-- a drop-out layer,
a dense layer, and so on. I'm not going to
explain that right now. You can find that in any
[INAUDIBLE] tutorial. But that's fairly simple to do. Again, just notice the pipe
function that we have in R. It is a very natural way to code
in R. Then you have to compile. A simple step, one thing you
have to be careful about here, do not assign the value
back to your model. So if you're interested
in the technical detail, this is because this Keras
object is an R6 class. So it's doing
modification by reference. If you accidentally reassign the value at this step, you'll get some very strange results. So pro tip, don't do that.
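That caution in code (a sketch; the optimizer and loss shown are just common choices):

```r
# Right: compile() modifies the Keras model (an R6 object) in place
model %>% compile(
  optimizer = optimizer_rmsprop(),
  loss = "categorical_crossentropy",
  metrics = c("accuracy")
)

# Wrong: do NOT reassign the result back to the model.
# Because the object is modified by reference, reassigning
# here can produce very strange results:
# model <- model %>% compile(...)
```

With the model compiled, now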
we're ready to train. The function is called fit. Here we do assign the result to an object-- I've called it history-- which means that I can plot it, and I can inspect it. And if you simply call plot on that history object, you get a nice ggplot object
with the same information that I showed you earlier
in the dynamic plot. That was a brief, very quick overview of how Keras works. Let me introduce you to
one of the other packages. Remember the table I had
earlier on the right-hand side? We have some supporting tools. TF Runs-- short for TensorFlow Runs-- is, I think, a fantastic way
to manage your experiments. And each experiment is a run. And really, the only
thing you need to remember is that there's a function
called training_run, which is similar to source in R. So in R, you would say source and it runs the entire script. If you do training_run, it will source the entire script file. But it will do some
bookkeeping for you. It will remember every run,
what the hyperparameters were that you used. What was the exact code? It will put that into a
small local version control so you can go back and compare. And it also captures all
your output, your validation accuracy, your training
accuracy, and so on. So you can easily inspect after the fact what happened by just querying a data frame with that information. So top tip: go and use TF Runs.
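The whole tfruns workflow just described fits in a few lines (the script name here is hypothetical):

```r
library(tfruns)

# Like source(), but records each run: the code, the flags and
# hyperparameters used, and the resulting metrics
training_run("walking_keras.R")

# Query the bookkeeping afterwards: one row per run, as a data frame
runs <- ls_runs()

# And compare, say, the two most recent runs side by side
compare_runs()
```

And actually, at this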
point, I can just give a very quick demo
of TF Runs in practice. The file I ran
through earlier was called "Walking Experiments." And I'm going to run just this
one line of code, training_run("Walking Experiments"). And that is going to source that file. And I'll just make sure I have
all the correct libraries, the packages installed. Now observe what's
happening here. It's running through
the same code. You will still get your
interactive training set up. I mean, this is exactly what you saw earlier. But once this is
done, with some luck, it will pop up a window that
shows me a summary of what is in this run. And there we go. So this is a browser window that
popped up that has my plots. It has all my metrics and my
model specification, et cetera. And this is something
I can query later. Back to Mikhail, and then I'll
give some demo of it later. MIKHAIL CHRESTKHA: Great. Thanks. So so far we've covered
how to really build a model and experiment maybe
on a local machine. But how do you really
scale and deploy? So really we're going
to talk about the two components of Cloud ML
Engine, the training and serving piece and dive
a little bit into the code. So first of all,
Cloud ML Engine again is a managed machine
learning service. We are-- we're essentially also
giving you on-demand access to GPUs. Andrie mentioned that he and I worked together the last few days to actually
install the NVIDIA P100 GPU. Our TPUs are also in beta. So now directly
from your R console, you'll be able to access
TPUs through this interface. When we talk about training,
what does that really mean? What we're doing is we have
a cloudml_train function. That's really taking
all the R code, uploading it into our
cluster of servers, installing all the dependencies,
and now using the cloud for massive scale there. Andrie also talked about
TF Runs to really create a systematic approach
around experimenting. And really there's that concept of champion versus challenger models, and keeping track of all those. And that's traditionally
using grid search techniques. Another value of Cloud
ML Engine is we actually have hyperparameter tuning
using Bayesian optimization that does it automatically for
you with an input file where you give it some
guidelines on what evaluation metric you want to
maximize or minimize. So you can see
it's fairly simple. You package everything
into a .R file. Now on the other
side of it really is around how do you
deploy these models? Again, a few simple
SDK functions, exporting the saved model. The one great thing about TensorFlow models is that they are language and platform agnostic. They're binary files
that can be consumed by any type of libraries and
converted into REST APIs. So in this case, we're
going to use the Deploy function to really publish
or register this model into a cloud registry. And now a number of
developers and analysts can now consume it
as a REST API call, whether that be through R-- in this case, we use
the Predict function-- but it could be a Python
developer, a Java application, a mobile application that
all consume the same model with the appropriate
input provided and the response provided
in the appropriate format.
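Sketched in R, that deploy-and-serve path looks something like this (the model name is hypothetical, and exact arguments may vary by cloudml package version):

```r
library(keras)
library(cloudml)

# Export the trained model as a language- and platform-agnostic
# TensorFlow SavedModel
export_savedmodel(model, "savedmodel")

# Publish (register) it to the Cloud ML Engine model registry
cloudml_deploy("savedmodel", name = "walking_classifier")

# Any client can now hit the REST API; from R, for example:
cloudml_predict(list(new_obs), name = "walking_classifier")
```

So now we're going to jump into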
an actual demo around Cloud Machine Learning Engine, how
that looks like, and really open up the Google
Cloud Platform console around monitoring those scalable
machine learning training jobs and what may be a
portfolio of models look like managed in a
central location. ANDRIE DE VRIES: Thanks Mikhail. I'm back in my RStudio
Server session on the VM as we discussed earlier. But as Mikhail suggested,
the point of cloud ML as a service is that
I can send my models over to Cloud ML for training or
for hyperparameter tuning. And we have a package called-- wait for it-- Cloud
ML that gives you really great integration
to do exactly that. So I'm going to step you
through some of the functions to do that. Let me just make sure
I'm in the right place. So I'm in the cloudml folder. The configuration
of this package is actually very
straightforward. Once you've installed
Cloud ML from CRAN-- library(cloudml)--
and then there's a function gcloud_install. This will install
your Cloud ML SDK on the machine you
are working with. It will then-- once
installation is done, it will step you through an
interactive session where you authenticate in your
browser to your Cloud ML session where you can specify which
workspace I'm using, et cetera. So you can cache your
credentials on any machine that you're using. So I did that last night. I don't need to do that again. So I can simply proceed
to training my models. So, I'm setting my
working directory. I'm loading the Cloud ML
package and now cloudml_train. And as you can see there,
this is actually familiar because you've already
seen TF Runs earlier where the concept is that you're
not stepping through your code directly. You're submitting
your script file that contains your model to TF Runs. In this case, you're submitting
that same file or similar file to the Cloud ML service. The only thing
that's different now is that I'm specifying
in this case as a user, I'm going to use a standard GPU. But you can use bigger machines. And I'm specifying that I have a configuration in a YAML file, tuning.yml.
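That submission looks roughly like this (a sketch; the script name is hypothetical, and argument details may differ by cloudml package version):

```r
library(cloudml)

# One-time setup: install the Google Cloud SDK and authenticate
# gcloud_install()

# Submit the training script to Cloud ML Engine on a GPU worker,
# with hyperparameter tuning driven by tuning.yml
cloudml_train("train.R",
              master_type = "standard_gpu",
              config = "tuning.yml")

# Later, check on the submitted job from the same session
job_status()
```

So, let's have a very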
quick look at what the walking Cloud ML script looks like and what that YAML file looks like. So, that script in R is pretty much very similar to the file I had earlier. I have my flag definitions, and then I have some layers. I have a convolution layer, then a max pooling layer, then a dropout layer, and so on. But at the top of
then a maximum pulling layer, then a drop out layer and so on. But at the top of
this script, I have-- I've set up an optical flags. And basically I'm saying,
create a ref-on object called convolution 1 filters and
give it the value of 16. And just a little bit lower
down in my actual script, I can find that. There we go,
convolution 1 flags, dollar convolutional 1 filters. So when the code
gets to this point, it will look up the
value of the flag. So it's very simple. I've set up a list
of some values. And I'm just now referring
to those values in the list. So, so far, so-- so what, right? If I just run the
script as is, it will take that value the
flag and just use it. But, if I use cloudml_train
with this tuning at YAML file, then some magic happens. This YAML file is
not particularly difficult to decipher. I have some hyperparameter
information at the top. And I'm telling it
to run for 25 trials. So run 25 models. Run three of those
models in parallel. But then I have my
parameter called conv 1 filters, which
is exactly the same flag I've just looked at. I'd say that's a
discrete value and do a grid search of this value
with increments that I specify-- 16, 32, 64, 128, et cetera. So basically what
you can see here is that I'm setting up a
grid that Cloud ML will search through. So this is not quite
a random grid search because in Cloud ML, you get
the benefit of some Bayesian optimization. You will typically see
that over time, the models, the candidates get better. But that's just
what's going to happen is I'm going to run for 25
trials on the Cloud ML service and sampling some combination of
these flags every single time. Now, before I press the Go
button, I want to show you, if you're not seen this
before, the Cloud ML interface on Google. So this is the project
that Mikhail and I have been working on. You can see it a history. We had some failed experiments. They will show up in red. I can click through into a
log to try and understand what went wrong in this case. Typically it was because
I misspelled some variable or I didn't set
something up properly. And then I started to have
some runs that were successful. And the last two
were successful. So I think if I kick off
another job right now, it should just work
with some luck. OK, so let's try it. And what I want you to
observe is that first of all, there is going to be
some feedback directly in the console. And then once the
job starts running, I'll get feedback
not in the console, but in the terminal
window in RStudio. So the terminal
window, this is-- I think this is the
result of a previous job. Let's see what happens. So the terminal
window gives me a view on to Linux that's
running on this machine. OK, so I'm submitting. And this it will take just
a few seconds for Cloud ML to respond. And then I should start getting
information and instructions on what to go and do next. And while we're waiting-- there we go. It responded with some information. But did you notice that it
switched to the terminal window where I'm now collecting
results from the Cloud ML logs? So there we go. And these logs I can go in--
it's the same log that I can now go and inspect online. This shows you the integration. If I just click
back to the console, it tells me that this
is my job number. Ends in 008. I can go to this URL to inspect
what's happening on my job or in the logs. And I can run this command
in R to figure out-- to find out what's happening. So I also notice that R
itself is not blocked. So this is a job
running in Cloud ML, but I can do my normal R code. And I get an answer
straight away. So if I switch back to
the Cloud ML console, and I just click Refresh, with
some luck I'll get that job number ending in 008. There we go. And it says it's just
running for a minute. And I can go and view the logs. And these are the same logs
I just showed you earlier. So once that
wakes up-- it just takes a minute normally. There we go. It says now it's queued, and
is waiting to be provisioned. And this job, from
experience, I know is pretty much the
same as my second job. This will now continue to run
for the next hour and a half. And the beauty of this way
of interacting with Cloud ML is that this is a service. It's not a virtual machine. I had zero installation
on this machine. I did not have to go and
configure, install anything at all. And the Cloud ML
package in R will discover all my dependencies. I'm using dplyr. I'm using Keras. It will discover those packages. It will use Packrat to get
the corresponding packages and install those on
the Cloud ML Service. At the end of the
run, that machine will come back with results. And the clock stops, so I'm
not getting charged any more for that machine running. That's Cloud ML. Let's have a look: if I
actually copy this job status instruction and tell R-- OK, tell me what's
happening there. I can inspect what's happening. There we go. I get back an R
list that tells me information about when
this job was started, all the values it's
going to choose, and also tells
you where you are. So that's about as much as
I want to do with the demo. MIKHAIL CHRESTKHA: Yeah. I wanted to show the Cloud
Machine Learning Model. ANDRIE DE VRIES:
Ah, good point, yes. MIKHAIL CHRESTKHA: Yeah. Sorry, can you switch that over? ANDRIE DE VRIES: Yes. MIKHAIL CHRESTKHA: Yeah, the one
thing I wanted to talk about-- and this is a topic I'm
passionate about-- again, deploying those models
for consumption. Again, the UI from the-- oops-- from the Google Cloud
Platform, it's very simple. But the idea is once you
register those models, they're available in
a central repository for your organization
to manage and maintain. Again, you can follow
your own taxonomy. Here, I've just published
a few sample models. They might be relevant
to certain functions, certain business lines. I have a couple here that
maybe the digital marketing team has a model to predict
click-through rates. We have the computer
vision department that's trying to recognize
images from the product catalog, and some text classification. So all these models are
now available for REST API consumption. I ran a loop to
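Publishing a model this way, and then calling it, can be sketched with the cloudml package (the model name and paths are illustrative):

```r
library(cloudml)

# Deploy an exported TensorFlow SavedModel directory under a model
# name; it becomes available to the whole project as a REST API.
cloudml_deploy("savedmodel", name = "image_classifier")

# Any client -- an application or another R session -- can then
# request online predictions against the deployed model:
# instances <- list(...)  # model-specific input payload
# cloudml_predict(instances, name = "image_classifier")
```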
actually-- oh, sorry-- I ran a couple of calls to
the image classification model last night. And what this allows you to do
from a monitoring standpoint is maybe take a look
at the last 12 hours. And it really gives you a lot
of information around, how often is the model being
used by your applications, by end users? What are the
predictions per second? From a performance
standpoint, also being able to track latency and
issues, along with the logging, to have a nice way to see--
you can see, a lot of times we have a pretty low latency. But there are some
peaks during the night here when I ran the loop where
that really could be a problem. You can debug and make
sure these predictions are serving correctly, whether you
need real-time or batch predictions. I think we wanted to show the
Shiny examples as a front end, and then we're
going to wrap it up. ANDRIE DE VRIES:
Thanks, Mikhail. OK, so as Mikhail suggested,
we want to deploy this. And he said earlier, you
can deploy these models as APIs in the service. You can do the same thing, you
can deploy TensorFlow models as an API in RStudio Connect,
which is an [INAUDIBLE] product that gives you a publication
platform to publish your Shiny apps or monitor
reports, et cetera. And I want to show you
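Publishing content to RStudio Connect from the IDE comes down to the rsconnect package; a sketch, assuming the Connect server has already been registered (server name and paths are illustrative):

```r
library(rsconnect)

# One-time setup: register the Connect server and your account
# (the URL here is illustrative).
# addConnectServer("https://connect.example.com", name = "connect")

# Publish a Shiny app (for instance, one wrapping a TensorFlow model)
# from its project directory:
deployApp("path/to/app", server = "connect")
```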
one example of that in RStudio Connect live. This is a small toy
example I wrote. It uses a pre-trained
Keras model. So I didn't do any
training myself. I used a publicly
available model that is fairly
sophisticated, though it's a number of years old. You can get better
models these days. And what it does
is I can tell it to upload a small image
perhaps of my dog. And then once the
image is uploaded, it will try and tell
me what's there. It thinks my dog is a
Malamute, or a Collie, maybe an Eskimo dog. Let's tell it to give me more
categories, 10 categories. It definitely thinks
it's some kind of dog. It's none of those. It's actually a Finnish Lapponian. But if you know what
a Malamute looks like, it's a pretty good guess. The code itself is very simple. It's only about 50
lines of code in total. And the key element of that
code is in this line here: the application_resnet50
function, which is built into Keras. Keras comes with some
pre-trained models, including [INAUDIBLE], and
Xception, and Inception, and MobileNet. And all I did was to say
use that model and just score on my own image. So there's my scoring
function, the product image. It's basically-- it's
just model then predict, and then I decode
the predictions. And that plot prediction,
that's the bar chart. And that's it. That's the entire application. If I upload a second
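That scoring path -- pre-trained model, predict, decode -- looks roughly like this in the keras package (a sketch based on his description; `predict_image` is a hypothetical name, not the app's actual source):

```r
library(keras)

# Pre-trained ResNet-50 with ImageNet weights: no training required.
model <- application_resnet50(weights = "imagenet")

# Hypothetical scoring helper: load and preprocess the image, then
# model %>% predict, then decode the class probabilities.
predict_image <- function(path, top = 5) {
  x <- image_load(path, target_size = c(224, 224)) %>%
    image_to_array() %>%
    array_reshape(c(1, 224, 224, 3)) %>%
    imagenet_preprocess_input()
  preds <- model %>% predict(x)
  imagenet_decode_predictions(preds, top = top)[[1]]
}

# predict_image("dog.jpg", top = 10) gives a data frame of the top
# classes and scores, which the app turns into the bar chart.
```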
image maybe of my cat, let's see what it thinks. Well, it thinks it's
some kind of cat-- Tiger cat, or Egyptian cat. I think it's just
a Tabby cat really. [ANDRIE LAUGHS] So this is a little
toy example just to illustrate that you can use
TensorFlow in your Shiny app. And actually in
this case, I'm just using CPU in the background. I don't even have a GPU to
serve the scoring function. So you may get away
with not having a very sophisticated machine. I see some laughs. That's probably because it
thinks, well, maybe it's a bucket or a plastic bag. [ANDRIE LAUGHS] Go figure. The perils of deep
learning: don't assume that these things are
intelligent. They're not, right? They will tell you based on
what your training data set was. MIKHAIL CHRESTKHA: So just
to close out, I think the real takeaway is that you,
as the R community, now have a set of libraries to access TensorFlow and Keras, and to scale and deploy using Google Cloud. I really see deep learning
as a new tool in your toolkit that allows you to
open up new applications, tackle new domains and
challenging business problems. And what we're
most excited about, again, is what you bring to the broader
ML community, with your applied math and [INAUDIBLE] backgrounds,
to teach the broader
community how to build ML models
more effectively. A couple of things,
I know RStudio has a booth in Moscone
West on level 2. We have a couple of folks there. You can stop by. The Keras cheat
sheet is available. We're also excited to announce
a new Kaggle competition. This is in partnership with
Google Cloud and RStudio. So give it a shot. That'll be published
very soon where you can try out different
techniques, including TensorFlow. We mentioned RStudio Server Pro
is a one-click deployment that is available on Google Cloud
Platform's Marketplace. And a couple of books,
again-- "Deep Learning with R" is something that I
really found very useful in my journey of relearning. Now just a couple
of suggestions. I think for those who maybe
want a little bit more exposure to the new domains
around deep learning, there's a great
computer vision session around satellite imagery
with one of our customers. Andrie touched
upon this, which is a great question from the
community around deep learning, versus support vector
machines, versus XGBoost. There's a great session around
scikit-learn and XGBoost where it talks a little
bit about the trade-offs around deep learning as well
as more traditional statistical techniques. There are two encore
sessions that I very much recommend for folks who want
a little bit of a deep dive into TensorFlow:
"TensorFlow, Deep Learning, and
Convolutional Neural Nets Without a PhD"-- that was packed. A lot of people couldn't
get in yesterday. I would definitely
recommend seeing that. And if you're interested
in the broader ML/AI spectrum within Google
Cloud, really From Zero to ML on Google Cloud
Platform, everything from REST APIs that you can access as
an analyst or a developer, all the way to really deploying
and coding in TensorFlow. So hopefully you
get a notification around surveys for this. Please fill them out and
provide us with great feedback. And we're lucky that we have
lunch coming up, so anyone who wants to stay back
and ask us questions, we'll be here for
the next 30 minutes. Thank you everyone. [THEME MUSIC PLAYING]