♪ (music) ♪ Hello, everyone. First, thanks, everyone, for coming to the Dev Summit. And second, thanks
for staying around this long. I know it's been a very long day. And there has been a lot of information
that we've been throwing at you. But we've got much, much more
and many more announcements to come. So please stick with me. My name is Clemens, and this is Raz. We're going to talk about
TensorFlow Extended today. But before we do this,
I'm going to do a quick survey. Can I get a quick show of hands? How many of you do machine learning
in a research or academic setting? Okay. Quite a big number. Now how many of you do
machine learning in a production setting? Okay. That looks about half-half. Obviously, also a lot of overlap. So for those of you who do
machine learning in a production setting, how many of you agree with this statement? Yeah? Some? Okay. I see a lot of hands coming up. So everyone that I speak with
who's doing machine learning in production agrees with this statement: "Doing machine learning
in production is hard," and it's too hard. Because after all, we actually want
to democratize machine learning and get more and more people
to allow them to deploy machine learning in their products. One of the main reasons why
it's still hard is because in addition to the actual machine learning. So this small orange box
where you actually use TensorFlow, you may use Keras
to put together your layers and train your model. You need to worry about so much more. There's all of these other things
that you have to worry about to actually deploy machine
learning in a production setting and serve it within your product. Now the good news
is that this is exactly what TensorFlow Extended is about. TFX, inside Google, is an end-to-end machine learning platform that allows our developers to go all the way from data to production and serving machine learning models as fast as possible. Now, before we introduced TFX, we saw that going through this process of writing some of these components,
some of which didn't exist before, gluing them together, and actually getting to a launch took anywhere from six to nine months, sometimes even a year. Once we deployed TFX and allowed developers to use it, in many cases people could pick up this platform, get up and running with it in a day, and actually get to a deployable model in production on the order of weeks, or just a month. Now, TFX is a very large
system and platform that consists of a lot of components
and a lot of services, so unfortunately I can't talk about all of this in the next 25 minutes. We're only going to be able to cover a small part of it, but we're talking about the things that we've already open-sourced and made available to you. First, we're going to talk
about TensorFlow Transform and show you how to apply
transformations on your data consistently between training and serving. Next, Raz is going to introduce you
to a new product that we're open sourcing called TensorFlow Model Analysis. We're going to give a demo of how
all of this works together end to end and then make a broader announcement
of our plans for TensorFlow Extended and sharing it with the community. Let's jump into
TensorFlow Transform first. So, in a typical ML pipeline that you may see in the wild, during training you usually have a distributed data pipeline that applies transformations to your data. Because you usually train on a large amount of data, this needs to be distributed; you run this pipeline and sometimes materialize the output before you actually put it into your trainer. Now at serving time, we need to find a way to somehow
replay those exact transformations online, as each new request comes in, before it is sent to your model. There are a couple of challenges with this. The first one is that usually those two
things are very different code paths. The data distribution systems
that you would use for batch processing are very different from the libraries
and tools that you would use to-- in real time transform data
to make a request to your model. Now we have two different code paths. Second, in many cases,
it's very hard to keep those two in sync. I'm sure a lot of you have seen this. You change your batch processing pipeline
and introduce a new feature or change how it behaves and you somehow
need to make sure that the code that they actually use in your
production system is changed at the same time and is kept in sync. The third problem is,
sometimes you actually want to deploy your TensorFlow machine learning
model in many different environments. You want to deploy it in a mobile device;
you want to deploy in a server; maybe you want to put it on a car;
now suddenly you have three different environments
where you have to apply these transformations,
and maybe there's different languages that you use for those,
and it's also very hard to keep those in sync. And this introduces something
that we call training serving skew, where the transformations that you do
at training time may be different from the ones in serving time,
which usually leads to bad quality of your serving model. TensorFlow Transform addresses this by helping you write your data processing job at training time, so it actually helps you create those data pipelines that do those transformations, and at the same time it emits a TensorFlow graph that can be inlined into your training model and also your serving model. Now what this does is,
it actually hermetically seals the model, and your model takes
a raw data request as input, and all of the transformations
are actually happening within the TensorFlow graph. This has a lot of advantages. One of them is that you no longer have any code in your serving environment that does these transformations, because they're all being done in the TensorFlow graph. Another one is that wherever you deploy this TensorFlow model, all of those transformations are applied in a consistent way, no matter where this graph is being evaluated. Let's see what that looks like. This is a code snippet
of a pre-processing function that you would write with TF Transform. I'm just going to walk you
through what happens here and what we need to do for this. First thing we do
is normalize this feature. As all of you know, in order to normalize a feature, we need to compute the mean and the standard deviation, and to actually apply this transformation, we need to subtract the mean and divide by the standard deviation. So what has to happen is, for the input feature x, we have to compute these statistics, which is a trivial task if the data fits on a single machine; you can do it easily. It's a non-trivial task if you have a gigantic training dataset and actually have to compute these statistics effectively. Once we have these statistics,
we can actually apply this transformation to the feature. This is to show you that the output
of this transformation can then be, again, multiplied with another tensor, which is just a regular TensorFlow transformation. And then, in order to bucketize a feature, you again need to compute the bucket boundaries to actually apply this transformation. And again, this is a distributed data job to compute those boundaries, here over the result of an already transformed feature, which is another benefit: you can then apply this transformation on top. The next examples just show you
that in the same function you can apply any other tensor-in, tensor-out function, and there are also some of what we call mappers in TF Transform that don't require this analyze phase. N-grams, for example, don't require us to actually run a data pipeline to compute anything.
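A rough sketch of what such a preprocessing function, together with the Apache Beam analyze-and-transform step described next, can look like with tensorflow_transform; the feature names, toy data, and temp directory below are illustrative, not taken from the talk:

    import tempfile
    import tensorflow as tf
    import tensorflow_transform as tft
    import tensorflow_transform.beam.impl as tft_beam
    from tensorflow_transform.tf_metadata import dataset_metadata, dataset_schema

    def preprocessing_fn(inputs):
        """Mixes full-pass analyzers with plain TensorFlow ops."""
        # Analyzer-backed: mean and standard deviation are computed over the
        # whole dataset, then the feature is normalized (z-score).
        x_scaled = tft.scale_to_z_score(inputs['x'])
        # The result is an ordinary tensor, so regular TensorFlow ops can follow.
        x_doubled = x_scaled * 2.0
        # Analyzer-backed: quantile boundaries are computed over the whole dataset.
        y_bucketized = tft.bucketize(inputs['y'], num_buckets=10)
        # Pure mapper: n-grams need no analyze phase at all.
        terms = tf.string_split(inputs['text'])
        text_ngrams = tft.ngrams(terms, ngram_range=(1, 2), separator=' ')
        return {
            'x_doubled': x_doubled,
            'y_bucketized': y_bucketized,
            'text_ngrams': text_ngrams,
        }

    # Toy in-memory data and schema, just to keep the sketch self-contained.
    raw_data = [
        {'x': 1.0, 'y': 10.0, 'text': 'hello world'},
        {'x': 2.0, 'y': 20.0, 'text': 'goodbye world'},
    ]
    raw_data_metadata = dataset_metadata.DatasetMetadata(
        dataset_schema.from_feature_spec({
            'x': tf.FixedLenFeature([], tf.float32),
            'y': tf.FixedLenFeature([], tf.float32),
            'text': tf.FixedLenFeature([], tf.string),
        }))

    # The analyze phases run as an Apache Beam pipeline (the direct runner here);
    # the returned transform function is a TensorFlow graph with the computed
    # statistics baked in as constants.
    with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
        (transformed_data, transformed_metadata), transform_fn = (
            (raw_data, raw_data_metadata)
            | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))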
Now, what happens here is that these orange boxes are what we call analyzers. We realize those as actual data pipelines that compute those statistics over your data. They're implemented using Apache Beam, and we're going to talk
about this more later. But what this allows us to do is actually
run this distributed data pipeline in different environments; there are different runners for Apache Beam. And all of the transforms are just simple
instance to instance transformations using pure TensorFlow code. What happens when you
run TensorFlow Transform is that we actually run these
analyze phases, compute the results
of those analyze phases, and then inject the result
as a constant in the TensorFlow graph-- so this is on the right--
and in this graph, it's a hermetic TensorFlow graph
that applies all the transformations, and it can be in-lined
in your serving graph. So now your serving graph
has the transform graph as part of it and can play through
all of these transforms wherever you want to deploy
this TensorFlow model. What can be done
with TensorFlow Transform? At training time for the batch processing,
really anything that you can do with a distributed data pipeline. So there's a lot of flexibility here
with types of statistics you can compute. We provide a lot
of utility functions for you, but you can also
write custom data pipelines. And at serving time because we generate
a TensorFlow graph that applies these transformations--
we're limited to what you can do with a TensorFlow graph,
but for all of you who know TensorFlow, there's a lot of flexibility
in there as well. Anything that you can do
in a TensorFlow graph, you can do with your transformations. Some of the common use cases
that we've seen, the ones on the left I just spoke about, you can scale
a continuous value to its <i>z-score</i>,
or to a value between 0 and 1. You can bucketize a continuous value. If you have text features,
you can apply Bag of Words or N-grams, or for feature crosses,
you can actually cross those strings and then generate
vocabs of the result of those crosses. As mentioned before,
TF Transform is extremely powerful in actually being able to chain together
these transforms, so you can apply a transform on the result
of a transform and so on. Another particular interesting
transform is actually applying another TensorFlow model. You've heard about the saved model before? If you have a saved model that
you can apply as a transformation, you can use it in TF Transform. Let's say you have an image
and you want to apply an inception model as a transform,
and then use the output of that inception model maybe to combine it
with some other feature or use it as an input feature
to your model. You can use any other
TensorFlow model that ends up being in-lined
in your transform graph and also in-lined in your serving graph. All of this is available today
and you can go check it out on <i>github.com/tensorflow/transform</i>. With this I'm going to hand it
over to Raz who's going to talk about TensorFlow Model Analysis. Alright, thanks Clemens. Hi, everyone. I'm really excited to talk about TensorFlow Model Analysis today. We're going to talk
a little bit about metrics. Let's see, next slide. Alright, so we can already
get metrics today right? We use TensorBoard.
TensorBoard's awesome. You saw an earlier presentation
today about TensorBoard. It's a great tool--
while you're training, you can watch your metrics, right?
you can save yourself a couple of hours of your life, right? Terminate the training, fix some things... Let's say you have
your trained model already. Are we done with metrics? Is that it? Is there any more to be said about metrics after we're done training? Well, of course, there is. We want to know how well
our trained model actually does for our target population. I would argue that we want to
do this in a distributed fashion over the entire data set. Why wouldn't we just sample? Why wouldn't we just save
more hours of our lives, right? And just sample,
make things fast and easy. Let's say you start with a large dataset. Now you're going to slice that dataset. You're going to say, "I'm going to look at people at noon time." Right? That's a feature. From Chicago, my hometown. Running on this particular device. Each of these slices reduces the size of your evaluation dataset by a factor; this is an exponential decline. By the time you're looking at the experience for a particular set of users, you're not
left with very much data. And the error bars on your
performance measures, they're huge. I mean, how do you know that
the noise doesn't exceed your signal by that point, right? So really you want to start
with your larger dataset before you start slicing. Let's talk about a particular metric. I'm not sure-- Who's heard of the <i>ROC Curve?</i> It's kind of an unknown thing
in machine learning these days. Okay. We have our ROC Curve,
and I'm going to talk about a concept that you may or may not be familiar with which is <i>ML Fairness</i>. So what is fairness? Fairness is a complicated topic. Fairness is basically how well
does our machine learning model do for different segments
of our population, okay? You don't just have one ROC Curve, you have an ROC Curve for every segment. You have an ROC Curve
for every group of users. Who here would run their business based on their top line metrics? No one! Right? That's crazy. You have to slice your metrics;
you have to go in and dive in and find out how things
are going so that lucky user, that black curve
on the top, great experience. That unlucky user, the blue curve? Not such a great experience. When can our models be
unfair to various users? One instance is if you simply
don't have a lot of data from which to draw your inferences. Right? We use stochastic optimizers, and if we retrain the model, it does something slightly different every time. You're going to get a high variance
for some users just because you don't have a lot of data there. We may be incorporating data
from multiple data sources. Some data sources are more
biased than others. So some users just get
the short end of the deal, right? Whereas other users
get the ideal experience. Our labels could be wrong. Right? All of these things can happen. Here's TensorFlow Model Analysis. You're looking here at the UI hosted
within a Jupyter Notebook. On the X-axis, we have our loss. You can see there's some
natural variance in the metrics. We're not always going to
get spot-on the same precision and recall for every segment of the population. But sometimes you'll see... what about those guys at the top there, experiencing the highest amount of loss? Do they have something in common? We want to know this. The users that get the poorest experience are sometimes
our most vocal users, right? We all know this. I'd like to invite you
to come visit <i>ml-fairness.com</i>. There's a deep literature about the mathematical side of ML Fairness. Once you've figured out how
to measure fairness, there's a deep literature
about what to do about it. How does TensorFlow Model Analysis
actually give you these sliced metrics? How do you go about getting these metrics? Today, you export a saved model for serving. It's kind of a familiar thing. TensorFlow Model Analysis is simple, and it's similar: you export a saved model for evaluation. Why are these models different?
Why export two? Well the eval graph that
we serialize as a saved model has some additional annotations that allow our evaluation batch job to find the features,
to find the prediction, to find the label. We don't want those things mixed in
with our serving graphs, so you export a second one.
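A hedged sketch of that second export, assuming a trained Estimator called estimator, a feature_spec, an output directory eval_model_dir, and the 'tips' label from the taxi example; the helper used is the one shipped with TensorFlow Model Analysis:

    import tensorflow as tf
    import tensorflow_model_analysis as tfma

    def eval_input_receiver_fn():
        # Parse raw tf.Examples so that evaluation sees the same raw data as serving.
        serialized = tf.placeholder(dtype=tf.string, shape=[None])
        features = tf.parse_example(serialized, feature_spec)  # feature_spec assumed defined
        return tfma.export.EvalInputReceiver(
            features=features,
            receiver_tensors={'examples': serialized},
            labels=features['tips'])  # label column assumed from the taxi example

    # The usual serving export stays as it is; this writes the extra eval SavedModel
    # with annotations telling the evaluation job where features, predictions,
    # and labels live.
    tfma.export.export_eval_savedmodel(
        estimator=estimator,
        export_dir_base=eval_model_dir,
        eval_input_receiver_fn=eval_input_receiver_fn)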
So this is the <i>GitHub</i> repo. We just opened it, I think last night at 4:30 p.m. Check it out. We've been using it internally
for quite some time now. Now it's available externally as well. The GitHub has an example that kind of puts it all together so that you can try all these components
that we're talking about from your local machine. You don't have to get an account anywhere. You just git clone the repo, run the scripts, and run the codelab. This is the Chicago Taxi Example. So we're using
publicly available data to determine which riders
will tip their driver and which riders, shall we say, don't have enough money to tip today. What does fairness mean in this context? So our model is going
to make some predictions. We may want to slice these
predictions by time of day. During rush hour we're going
to have a lot of data so hopefully our model's going to be fair
if that data is not biased. At the very least it's not
going to have a lot of variance. But how's it going to do
at 4 a.m. in the morning? Maybe not so well. How's it going to do when the bars close? An interesting question. I don't know yet,
but I challenge you to find out. So this is what you can run
using your local scripts. We start with our raw data. We run TF Transform; TF Transform emits a transform function
and our transformed examples. We train our model. Our model, again, emits two
saved models as we talked about. One for serving and one for eval. And we try this all locally,
just run scripts and play with the stuff. Clemens talked
a little bit about transform. Here we see that we want
to take our dense features, and we want to scale them
to a particular Z-Score. And we don't want to do that
batch by batch because the mean for each batch
is going to differ, and there's going to be fluctuations. We may want to do that
across the entire data set. We may want to normalize
these things across the entire data set. We build a vocabulary; we bucket
for the wide part of our model, and we emit our transform function,
and into the trainer we go. You heard earlier today about TF Estimators, and here is a wide and deep estimator that takes our transformed features and emits two saved models.
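As a minimal sketch of such a wide-and-deep Estimator (the column names below are placeholders rather than the exact taxi schema):

    import tensorflow as tf

    # Deep side: dense, transformed numeric features.
    deep_columns = [tf.feature_column.numeric_column('trip_miles_scaled')]
    # Wide side: bucketized / categorical features produced by the transform step.
    wide_columns = [tf.feature_column.categorical_column_with_identity(
        'trip_start_hour_bucket', num_buckets=24)]

    estimator = tf.estimator.DNNLinearCombinedClassifier(
        linear_feature_columns=wide_columns,
        dnn_feature_columns=deep_columns,
        dnn_hidden_units=[100, 50])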
Now we're in TensorFlow Model Analysis, which reads in the eval saved model and runs it against all of the raw data. We call render_slicing_metrics from the Jupyter Notebook, and you see the UI.
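Roughly, that step looks like the sketch below; the file locations are placeholders, and the exact module path of the slice spec class varies a bit between TFMA releases:

    import tensorflow_model_analysis as tfma

    # Evaluate the eval SavedModel over the raw data, overall and sliced by hour.
    eval_result = tfma.run_model_analysis(
        model_location=eval_model_dir,      # eval SavedModel from the trainer (assumed path)
        data_location=raw_examples_path,    # raw tf.Example records (assumed path)
        slice_spec=[
            tfma.SingleSliceSpec(),                             # overall metrics
            tfma.SingleSliceSpec(columns=['trip_start_hour']),  # per-hour slices
        ])

    # Inside the Jupyter notebook, this renders the interactive slicing UI.
    tfma.view.render_slicing_metrics(eval_result, slicing_column='trip_start_hour')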
The thing to notice here is that this UI is immersive, right? It's not just a static picture
that you can look at and go, "Huh" and then walk away from. It lets you see your errors broken down by bucket or broken down by feature, and it lets you drill in
and ask questions and be curious about how your models
are actually treating various subsets of your population. Those subsets may be
the lucrative subsets you really want to drill in. And then you want to serve
your models, so our example has a one-liner here that you can run to serve your model. Then we make a client request. The thing to notice here is that we're making a gRPC request to that server: we're taking our feature tensors, serializing them into the gRPC request, sending them to the server, and back comes a probability.
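A hedged sketch of that client call against TensorFlow Serving; the host, port, model name, and signature name are assumptions, serialized_example stands in for your serialized tf.Example, and the exact tensorflow-serving-api module names differ a little between releases:

    import grpc
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    channel = grpc.insecure_channel('localhost:9000')
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'chicago_taxi'        # assumed model name
    request.model_spec.signature_name = 'predict'   # assumed signature
    # Pack the serialized tf.Example into the request's input tensor.
    request.inputs['examples'].CopyFrom(
        tf.make_tensor_proto([serialized_example], shape=[1]))

    response = stub.Predict(request, 10.0)  # 10-second timeout; back comes the probability
    print(response.outputs)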
But that's not quite enough, right? We've heard a little bit of feedback about this server. The thing that we've heard is that gRPC is cool, but REST is really cool. I tried. This is actually one
of the top feature requests on GitHub for model serving. You can now pack your tensors
into a JSON object, send that JSON object to the server, and get a response back [inaudible]. It's much more convenient,
and I'm very excited to say that it'll be released very soon. Very soon. I see the excitement out there. Back to the end to end. You can try all of these pieces
end to end, all on your local machine, because they're using Apache Beam direct runners, and direct runners allow you to take your distributed jobs and run them all locally. Now if you swap in
Apache Beam's Dataflow runner, you can now run against
the entire dataset in the cloud. The example also shows you how to run the big job against the cloud as well.
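For example, something along these lines; the project, bucket, and job name are placeholders, and the pipeline code itself stays exactly the same:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Same Beam pipeline as before; only the runner and its options change.
    options = PipelineOptions(
        runner='DataflowRunner',
        project='my-gcp-project',
        temp_location='gs://my-bucket/tmp',
        job_name='chicago-taxi-preprocess')

    with beam.Pipeline(options=options) as pipeline:
        ...  # the same TF Transform / TFMA steps shown earlier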
We're currently working with the community to develop a runner for Apache Flink
and a runner for Spark. Stay tuned to the TensorFlow blog and to our GitHub, and you can find the example
at <i>tensorflow/model-analysis</i> and back to Clemens. Thank you, Raz. (applause) Alright, so we've heard
about Transform. We've heard how to train models,
how to use model analysis and how to serve them. But I hear you say you want more. Right? Is that enough? You want more? Alright. You want more. And I can think of why you want more. Maybe you read the paper
we published last year and presented at KDD about TensorFlow Extended. In this paper we laid out
this broad vision of how this platform works within Google
and all of the features that it has and all the impact
that we have by using it. Figure one shows these boxes and describes what TensorFlow Extended actually is. Although overly simplified,
this is still much more than we've discussed today. Today, we spoke about
these four components of TensorFlow Extended. Now it's important to highlight
that this is not yet an end-to-end machine learning platform. This is just a very small piece of TFX. These are the libraries
that we've open-sourced for you to use. But we haven't yet
released the entire platform. We're working very hard
on this because we've seen the profound impact
that it had internally-- how people could start using this platform to apply machine learning in production using TFX. And we've been working
very hard to actually make more of these components available to you. So in the next phase, we're actually
looking into our data components and looking to make those available so that you can analyze your data,
visualize the distributions, and detect anomalies
because it's an important part of any machine learning pipeline to detect changes and shifts
in your data and anomalies. After this we're actually looking
into some of the horizontal pieces that help tie all of these
components together because if they're only
single libraries, you still have to glue them together yourself. You still have to use them individually. They have well-defined interfaces,
but you still have to combine them by yourself. Internally we have a shared
configuration framework that allows you to configure the entire pipeline
and a nice integrated frontend that allows you to monitor
the status of these pipelines and see progress and inspect
the different artifacts that have been produced
by all of the components. So this is something
that we're also looking to release later this year. And I think you get the idea. Eventually we want to make
all of this available to the community because internally,
hundreds of teams use this to improve our products. We really believe that this
will be as transformative to the community
as it is at Google. And we're working very hard
to release more of these technologies, and eventually the entire platform,
to see what you can do with them for your products
and for your companies. Keep watching the TensorFlow blog posts
for a more detailed announcement about TFX and our future plans. And as mentioned, you can already use some of these components today. Transform is released. Model Analysis was
just released yesterday, Serving is also released, and the end-to-end example is available under the shortlink and you can find it
on the model analysis GitHub. So with this, thank you
from both myself and Raz, and I'm going to ask you
to join me in welcoming a special external guest, Patrick Brand,
who's joining us from Coca-Cola, who's going to talk
about applied AI at Coca-Cola. Thank you. (applause) ♪ (music) ♪