[MUSIC PLAYING] ASHISH TENDULKAR: So
what is TensorFlow? Let's start with that. OK, so the goal of this talk is
an introduction to the main TensorFlow concepts. It will be useful to have a
mental model of how the system behaves. When the system
behaves, mostly you can forget everything and
use the high-level wrapper if your job is just to use
machine learning, right, machine learning algorithms
off the shelf. If you're a researcher or
developer of machine learning algorithms, you should also
know low-level TensorFlow. OK, so what is TensorFlow? So TensorFlow is an open source
machine learning library. I'm sure all of
you know about it. It was originally
developed by Google. And it has been open
source since November 2015. It is one of the highest rated
repositories on GitHub right now, with 65,000-plus stars. TensorFlow is especially
useful for deep learning. But how many of you are
non-computer science people? OK, quite a large number. If you want to run some of
the mathematical simulations or any complex
mathematical calculations, you can also use
TensorFlow for that. That's something that
I want to emphasize on. Otherwise people don't
really feel that TensorFlow-- people generally associate it
with only machine learning. There are machine
learning libraries built on top of TensorFlow,
but core TensorFlow is not really tied
to machine learning. So you can define any
mathematical computation as a data flow graph, as we
will see in a few minutes. And then you can run
it under TensorFlow. You can use TensorFlow
for research, as well as in a production environment. So if you're doing
a startup and are thinking of using
machine learning, you can think of TensorFlow. You can use TensorFlow. TensorFlow is equally
useful in research, where I have seen a trend
where a lot of researchers are actually writing
TensorFlow code and making it available
to the community so that people can experiment
with that code later. And TensorFlow is open source
with Apache 2.0 license. Right, so there is some
meaning in the name TensorFlow. Do you ever wonder why
this is named TensorFlow? It's actually a combination
of two words, tensor and flow. What really is a tensor? Have you taken an advanced
mathematics course? So a tensor, in this context,
is a multidimensional array. A tensor is nothing but
a multidimensional array. And flow is nothing but
a graph of operations. Every mathematical
computation is expressed as a graph of
operations in TensorFlow. So the nodes in the
graph are operations, and the edges are
nothing but tensors. So tensors flow into operations. So imagine doing a
matrix multiplication operation or a vector
matrix multiplication. So the vector and the matrix
will be the inputs. And multiply is the operation. OK, so Google is using
TensorFlow internally very heavily. Do you recognize
all these logos? So Google uses TensorFlow
heavily in all these products. So you can see that the
number of directories containing model
description files has gone up very steeply. So let's start with
the high-level design of TensorFlow,
what exactly it is. So TensorFlow defines a general
purpose computation graph. It achieves ease of expression. And then it creates
tools for running it in different environments. The really good part
about TensorFlow is this. So TensorFlow can
execute operations on various hardware platforms. So you have CPU, you have
GPU, you have Android, iOS, or Raspberry Pi. So even if you are writing some
kind of an embedded system, you can still think
of using TensorFlow and put it on an embedded system. How does TensorFlow
achieve that? So there is something
called the TensorFlow distributed execution engine. So we generally write our code
in a frontend such as Python, C++, Java, or Go. So this TensorFlow
distributed execution engine takes the code written
in Python and converts it into the underlying
hardware instruction set. So it's very useful
to understand this layered API of TensorFlow. It will help you to know exactly
what is useful in your work. So this is what I
was talking about. So this is core TensorFlow. So you can write your
mathematical operation as a data flow graph in
this Python frontend. Then TensorFlow distributed
execution engine takes it. And using this, you can run it
either on CPU, GPU, Android, iOS, Raspberry Pi. So one really good
thing about TensorFlow against the competing
libraries is that if you write programs in
some other competing library, like, let's say, scikit-learn. Do you like scikit-learn? What do you do to
distribute a program? What do you do to
run scikit-learn on a very large amount of data? Right. So essentially when you
have large data that is distributed across
systems, obviously there is nothing in the
scikit-learn API for that. So you have to essentially do
all the work of distributing the computation to
10 different nodes, and then collect
the result of that and learn, do training
in a distributed fashion. Using TensorFlow, you don't
have to do any of this. You can simply specify
your data flow graph and let TensorFlow handle the
complexity of the distributed system. So TensorFlow, once you
specify your cluster, once you give cluster
specification in TensorFlow, TensorFlow will decide
which operations to schedule on what kind of device. Got it? So that's one of the awesome
things about TensorFlow. OK. So I'll repeat the question
for the benefit of everyone. So your question
is whether there are different APIs written
for these different hardware platforms. You don't have to
worry about it. That is handled
internally by TensorFlow. So I think it might have
some kind of layer. Some kind of layer will be there
that takes the high-level code. And then there'll be a layer
to write instructions that can be executed on the CPU. If you have written a
program in Java, for example, you can think of
this as a compiler or a Java virtual machine. So a virtual machine
takes the Java code, and then it can execute it on
different operating systems-- Windows, Linux. You don't write
different Java programs. So for GPU, there
is no Java analogy. So what happens
is that your data flow graph that is written here,
so you specify your operation in the form of a data flow graph. TensorFlow takes
this data flow graph and it generates some
kind of instructions in order to run it on
CPU, GPU, Android, iOS. And internally TensorFlow will
have all the information. So that's part of the
TensorFlow internals. We are not worried about
that as developers. So TensorFlow
will have a way of converting your instructions
so that they run on CPU, GPU, and all these kinds of devices. So let's say I give you two jobs. Add two numbers,
add A, B. Will you be able to write that
function, add A, B? Pretty easy, right? Now I ask you a slightly
harder question. I give you a photograph. And I ask you to detect the face of
a person in the photograph. Can you write the
program for that? Yeah, so maybe you can
construct features by hand, and then write rules,
or something like that. But rule-based
systems are brittle. If I take a photograph of
a person from a slightly different angle,
probably it will not work if you write rules that are
hand-coded for a straight face, straight, camera-facing face. So what's the difference? I want you to think,
take a step back, and think about the difference
between these two tasks. The first task is write a
program to add two numbers. And the second task is find the
human face in the photograph. What is the difference
between these two tasks? So essentially, in the first
task, you know the function. You know the exact mapping,
how to take two numbers and perform addition. You know the function,
F of A, B is A plus B. But in the other task,
unfortunately, we do not know the function. So F of a photograph,
and I want, let's say, output of 1 if
face is present, 0 otherwise. I do not know the
function, correct? But as a human, you can
easily recognize it, right? So why are we able
to recognize that? Because our brains are
trained to recognize those kinds of images. Now, can we do a similar
thing with computers? Yes, you can. So that's what machine
learning is all about. So what do we need to
do machine learning? What is the first and
foremost important thing? Data, right? Training data. And you need a lot
of training data, so you need to
have training data. In this case, you need to
have labeled training data. So you need to have a photograph
and the label associated with it, which will tell whether
there is a face or not. What do we do with
training data then? We need training data. That's the first thing. What else? What else is required in
order to build a machine learning model? Training data
is covered, right? You need the model itself. Model is the first thing. What else? Cost function, right? You need to have a
way of evaluating whether a machine learning
algorithm is performing correctly. So cost function
is another thing. What is the third thing? Once you have cost
function, you need to have optimization algorithm
to optimize that cost function. And fourth, you need some kind
of evaluation criteria. So one very, very simplified
or componentized view of machine learning
is putting a machine learning algorithm as a point
in this four-dimensional space, right? You have model, you have cost
function, your optimization objective, and you have
evaluation criteria, right? So think about any machine
learning algorithm, and you will be able to
put it as a point in four dimensional space. So what are some of
the examples of models? It's important in this context. I'm actually building
up the context so that you understand what
is in the layers API, right? So what are the example
models in machine learning? AUDIENCE: Decision trees. ASHISH TENDULKAR: Decision
tree is one model. AUDIENCE: Neural networks. ASHISH TENDULKAR:
Neural networks. Linear regression. Theta transpose X.
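As a minimal sketch of the linear model just mentioned, theta transpose X is simply the dot product of a weight vector and a feature vector. The function name `linear_model` and the numbers below are illustrative assumptions; they are not from the talk and not part of TensorFlow's API.

```python
# Illustrative only: a linear model computes theta^T x,
# the dot product of a weight vector and a feature vector.

def linear_model(theta, x):
    """Return theta^T x for two equal-length lists of numbers."""
    if len(theta) != len(x):
        raise ValueError("theta and x must have the same length")
    return sum(t * xi for t, xi in zip(theta, x))

# Hypothetical weights and features; x[0] = 1.0 plays the role of the bias input.
theta = [0.5, 2.0, -1.0]
x = [1.0, 3.0, 4.0]
prediction = linear_model(theta, x)
print(prediction)  # 0.5*1.0 + 2.0*3.0 + (-1.0)*4.0 = 2.5
```

A vectorized library would do the same thing for many examples at once as a matrix multiplication of examples and weights.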
Logistic regression, where we take
a linear combination and put it through
the logistic function. So these are all models. Then what are the
examples of cost function? What do we use as
a cost function in regression kind of problems? AUDIENCE: Least squares error. ASHISH TENDULKAR: Least
squares error, very good. What about classification? AUDIENCE: Cross-entropy. ASHISH TENDULKAR:
Cross-entropy, right? So these are cost functions. What kind of optimization
algorithms do we use? AUDIENCE: Adam optimizer. ASHISH TENDULKAR:
Adam optimizer. The most common is gradient
descent, right? Stochastic gradient descent. Yeah. There are many of them, right? So these are all
optimization algorithms. And what are the
evaluation metrics? So accuracy is one. AUDIENCE: F-measures. ASHISH TENDULKAR: F-measure. Is accuracy good every time? AUDIENCE: No. No. ASHISH TENDULKAR:
Where does it fail? Whenever you have imbalanced
kind of problems, right? Accuracy is a bad measure. Let's say you are asked
to write a program to detect spam emails. And let's say spam
emails are just 1%. I write a classifier which
says that everything is good. Accuracy is 99%. Are you getting what I'm saying? So accuracy is a bad
measure in such a case. So we care about what is
called precision and recall. Do you know all these
things, precision and recall? Have you heard about it? Have you heard about
the confusion matrix? AUDIENCE: Yes. ASHISH TENDULKAR: Yes. So all these are
the bare nuts and bolts of machine learning, right? These are all reusable
components in machine learning. You take any machine
learning algorithm and it will have
these components. So I'll give you one secret
mantra of knowing machine learning. When somebody comes
to you and says that I've invented a new
machine learning algorithm, don't get worried. Ask five questions. What is the training data? What is the model? What is the cost function? What is the
optimization objective? And how do we evaluate
this algorithm? If you ask these five questions,
everything will be sorted out, got it? OK. So these are all reusable
machine learning components that are defined in layers API. It's very important to know
what is where in TensorFlow, because it's very
easy to get lost. So we talked so far
about Python Frontend. This is all about writing
your mathematical operation as a data flow graph in layers. So these are now machine
learning specific APIs, right? So in layers, we have
all the reusable machine learning components. Then people said, you
know, why should I care about building models
using layers all the time? So people said that,
you know, I don't always want to use this layers API. I need some higher
level abstraction to define a new
machine learning model. So if you're writing,
if you're performing, if you're doing your MS project
research, or even a PhD, where you need to write your own
machine learning algorithm, write a new algorithm,
you can also think of using estimator APIs. You can think of estimators
as abstract classes. Do you know abstract class? So it defines some
kind of a framework. So for any machine
learning algorithm, if you want to use
estimator, it says that you have to implement what is
called a train function, model function, and test function. So estimator is a
framework to build any machine learning model,
new machine learning algorithm. And then people
said, why should I keep writing the same
linear regression? Why should every developer
write the same linear regression using estimator? So that's why TensorFlow came
up with canned estimators. So in canned
estimator, TensorFlow is supporting linear
regression, logistic regression, and neural network
out of the box. So if you want to use
the TensorFlow just like scikit-learn,
just like scikit API, canned estimator is
your answer, got it? So that's what core
TensorFlow supports. So there are some other
third-party canned estimators for decision tree,
SVM, and for clustering. Random forest is also there. There are four canned
estimators supported in
core TensorFlow: linear regression,
logistic regression, deep neural network
for classification and for regression. These four are supported by
TensorFlow implementation by Google. In addition to that, there are
third party implementations for random forest, SVM,
and for clustering. OK, so this is essentially
a data flow graph. This is a data flow
graph for calculating, for performing ReLU, a basic
operation in a neural network. What is the basic operation
in a neural network? We have examples here. There are weights on
each of the features. So we do theta
transpose X, right? So we do matrix multiplication
of examples and weights. This is the vectorized implementation
of a neural network, for example. So we take examples,
and we take weights. We do matrix multiplication. So this is a
mathematical operation. I want you to look at
the nodes in the graph. Each node is a
mathematical operation, and an edge is the channel along
which we send the tensors, whatever information there is. So weights are
coming on this edge, examples are coming
on this edge, and there is a matrix
multiplication happening here. And the result of the
matrix multiplication is fed into the
addition operation, where it is added to the biases. And the result of this
entire add operation is fed to ReLU,
Rectified Linear Units. And then we do another Xent, that is, cross-entropy,
operation between labels and whatever you
get out of ReLU. So why does TensorFlow
define a data flow graph? So in classic
TensorFlow-- now we have what is called
TensorFlow Eager, in which you can define operations
and run them as you go. But in classic TensorFlow,
as it was when we started, there are two distinct steps. The first step is about
defining a data flow graph. And second step is execution
of the data flow graph. Why do you think TensorFlow
has this design philosophy? Why are these two separate
things in TensorFlow? Let's try to understand that. That will help you understand
why TensorFlow first insists on laying
out a data flow graph and then executing it. So what are the
advantages that TensorFlow will get by doing this? Now, you can see
that I can perform this matrix multiplication
operation on, let's say, a GPU. Let's say GPUs are good
at matrix multiplication. I can take this piece of
graph and schedule it on GPU. Then I can perform
this addition on CPU. So it knows exactly
the dependency between the operations,
and it can decide which operations can be parallelized. And when TensorFlow--
we'll see it in a moment-- yeah, we talked about it. Edges are n-dimensional
arrays that are tensors. And computation is
a data flow graph. Yeah, this is what happens. Now TensorFlow can
assign parts of the computation to Device
B and Device A, right? And now you'll wonder-- now there are two devices. What is required? Some way of
communicating, right? You need a notion
of communication. So that is handled
by TensorFlow. So TensorFlow will
include send/receive nodes automatically. You don't have to do that. Are you getting what I'm saying? So imagine
yourself implementing such distributed training. You'd have to do all these
things that TensorFlow is doing for you. That may be partially answering
what your questions are. OK? Got it? Any questions on this? So this-- look at it. Send/receive nodes
are put there. Then it also puts some other
mathematical operators, something like differentiation,
automatic differentiation. Where do you use differentiation
in machine learning? AUDIENCE: [INAUDIBLE] ASHISH TENDULKAR: In
optimization, right? Whenever you are trying to
optimize the parameters. In gradient descent, we take
the partial derivative of what? Of the loss function or the
cost function, right? So if TensorFlow
didn't do it, you would have to supply a function
to perform the differentiation operation. But TensorFlow has
an in-built facility to do the differentiation. And it actually inserts
this node automatically. Plus, TensorFlow also has a
specialized linear algebra operation compiler
that also optimizes some of the mathematical
functions for you. You can think of this
as a compilation phase. And all kinds of optimizations
are done at that stage. These are the basic
send/receive implementations. Right. Now, this is where
the extensibility comes in. There are a number of standard
operations and kernels. You can also define your
own operators and kernels in TensorFlow. If you want to extend
TensorFlow to support a new kind of hardware,
TensorFlow gives you the facility to write your
own operations and kernels. These are device-specific
implementations of operations and kernels. This is where you can explore
if you want to support a new kind of hardware. So this is a single-process
configuration. So there is a client. So what happens is that first
we define a data flow graph, and then we run it in
a specific context. So we create a session,
and every data flow graph is executed in the
context of that session. So if it's a single-process
configuration, we say session.run,
which then executes a subgraph on the worker. In the distributed
configuration, what happens is that the master
process spawns multiple workers and gets the work done. And what you see here,
this is a TensorFlow graph for neural network. So you have a logit layer. You have a ReLU layer. You have input. You have reshape operation here. And this is stochastic
gradient descent trainer. You can see that this is
basically a neural network graph. In practice, it can be
very complex, with hundreds to thousands of nodes and edges. But you are insulated. If you're writing
with the layers API, you don't have to deal with
writing a data flow graph of the low-level
operations because layers will have already implemented
some of these subgraphs. For example, this SGD trainer
might be already implemented by the layers API. In the same manner,
there is a logit layer that is already implemented,
a ReLU layer that is already implemented for you. In the course of the
workshop, we will look at building
TensorFlow model right from canned estimator
to the low level API. All right, let's peep
into the second one-- Python Frontend. So we already talked
about data flow graph. So that's the core
of TensorFlow. So this is how you build
your data flow graphs. Import tensorflow as tf
is the standard import. Then, this is how you define
your session, tf.Session. So we define a session. And then we are now
defining a data flow graph-- so tf.constant. So this is a way of specifying
a constant tensor, that is, a constant multidimensional array. And I have a
two-dimensional array. The first row is 5, 6. Second row is 7, 8. And then I'm performing
multiplication of this x tensor with itself. Now, it is only
defining a tensor. It does not, right now,
have any value in it. So we will see actually in
the lab, when we say, print x, it will not actually print
the content of this tensor. It will just print that x is
a tensor of type constant. And it holds probably
numeric values. So that's the information
you'll get. In the same manner,
there is another tensor which performs the first
multiplication, then adds another
multiplication to it. So this is multiplication
of the matrix x with itself. And then we are
doing multiplication of x with an identity
matrix, a diagonal matrix which has ones on the diagonal. Getting it? And then there is
an addition operation. So now you can see how
this graph is actually represented here. So you can see that
there is a constant x, and there is another constant,
which is this constant matrix. These are my input tensors. So this constant is
multiplied to itself through MatMul operation. And then there is another
matrix multiplication happening between this constant matrix
and the diagonal matrix, got it? That is represented in
this MatMul operation. And then I'm performing
addition of these two. Result of this matrix
multiplication operation is added to result of
this matrix multiplication operation. And when I do z.eval,
that is the time when we are executing
this data flow graph. And we are saying
that I should evaluate z, which is this particular
node, this particular thing. In the same manner,
I can say evaluate x. Then it will evaluate only
this part of the graph. So you can evaluate
any part of the graph. You can lay down
your complete graph, and then evaluate any subgraph
of that particular graph. That is OK. Got it? But you have to run
it under tf.Session. So let's do this first Basics
of TensorFlow practical. This is something that
I'll run on my screen. So we use Python. You people are
familiar with Python. You write your program
and then execute it. That's how you work, right? So Jupyter Notebook is a very
nice way of building demos where you interleave the
documentation and the code. Otherwise, what do you do? Are you research scholars?
Are you running your company? What is it? Research scholars? You must
be writing demos, right? Let's say you want to
quickly try some method. You read a paper and you
want to try it and show it to your advisor. What do you do? You will write the program
in some separate file, and then you'll have paper that
will be in a separate document. And then you have to
constantly refer to the paper and document to explain that. The Jupyter Notebook solves
that problem for you. So Jupyter Notebook
gives you a way where you can write
some document, and then you can
write the code block. So what you can do is,
when you're writing, when you're implementing
some research paper, you can start with
the Jupyter Notebook. You can run this
locally as well, and you can write
documentation like this. So I have written
some documentation on basics of TensorFlow. And then I can run now-- the basic unit in a Jupyter
Notebook is a cell. There are two types of cells. So this is a text cell. And then the second type
is the code cell. I can run this code cell by
pressing this Run button. So now this code cell is run. Let's say if I want to do
something-- print finished import. You can see this, right? So you can run this Jupyter
Notebook cell by cell and see the output. So let's say you want to
demonstrate two competing approaches. So you write your code for first
approach and second approach. In the same document
itself, you will be able to get all
the comparisons done. Is the idea clear about
Jupyter Notebook for those who are not familiar with it? You can run it cell by cell, OK? Let's print "hello, TensorFlow." This is our tradition to print
"hello, world" in any language that we learn, right? So let's see how to
do that in TensorFlow. All right, I have some
output that I'll clear first. Current output, clear. So what is it that
I'm doing here? I'm defining a hello tensor,
which is holding a constant, which is holding a TensorFlow
constant tensor with value "Hello, TensorFlow!" Now, let's print this hello
and see what is there in that. We already talked about
it, so I have already leaked the question. No point asking
this question again. And then I define this
session, tf.Session. And I say session.run, hello. OK, let's see what
comes out of this. OK, so the first
print has printed. See this, what it has printed? This is a description
for the tensor. It says that hello is a tensor. Const:0, this is a
name automatically generated by the TensorFlow program. The second most important
thing is the shape of a tensor. Since we are talking about
a scalar, the shape is-- we use numpy. How many of you use numpy? It's the same concept as numpy. And the type is string. The type of object that
this instance will hold will be of type string. And the second print, the
session.run print, actually prints "Hello, TensorFlow!". Got it? This is the point that I was
trying to emphasize earlier, that if you just do print
hello without running your node in the
context of session, you don't see
"Hello, TensorFlow!" but you see the
description of the tensor. And when I run hello in
the context of a session, I see "Hello, TensorFlow!". Got it? OK. So this is how you
can define a constant. You can define a scalar. You can use shape and rank. Let's print them and see what
comes out as the shape and rank. So the rank is zero in this case. So for a scalar, the rank is zero and the shape is empty. Now, let's look at what
is the shape for a list. So you can see tf.constant. I can define a list also. So this is a rank one tensor. So let's print the shape again. Apart from this basic one, I'll
give all the notebooks to you to execute so you
will not be bored. You can keep writing
in this notebook and see what comes out of this. So here the shape is empty
because it's a scalar. When I have a list, you can see
that the shape is three here because I have three
elements in the list. I can also define 2D-- I can also define a matrix. This is a matrix which has
two rows and three columns. Let's look at the shape of that. You can see that
the shape is 2, 3. There are two rows
and three columns. You can also define
a rank three tensor here. So I can say print
x tensor here. You can see that the
shape is 2, 1, 3. I'm defining a 3D array here. OK, got it? Any questions? OK. We'll shift gears now. Now, this is how
you define a plus b. So I'm defining two
arrays, two lists. And I can add these two lists
just using a simple addition operation. Let's try to do this. So this is defining
data flow graph. That's why you see the
information of the tensors. And now I'll define a
session and run total, and you will see that
three plus four is seven. That you see here. And two plus one is three. That you see here. Got it? OK, so that was about constant. Constants are not interesting. We talked about it. What is the most important
thing in machine learning? We talked about
training data, right? And you cannot have your
training data in memory. I cannot really write training
data as in-memory objects and hope to input
them to a tensor, right? So your training data will
normally be in a database, or you might have a file,
a big, big file, correct? And now you need to
have a way of inputting this file in TensorFlow. So tf.placeholder is
your friend for that. So you should use
tf.placeholder. So this is more like a promise. I am promising TensorFlow
that I will input an object or a bunch
of values of type float32, so hold a place for these
float values for me in tensor x. Then in the same manner, I'll
define another place holder for y. So in x, in this case, I'm
going to hold all my inputs. In y, I'll hold my
labels, for example. And then I can do
z equals x plus y. Now, since I'm
doing placeholder, there is a specific way in
which you have to feed data into this placeholder. So we use a mechanism
called the feed dictionary. It's more like a
dictionary where I can initialize values
for the placeholder. Are you getting what I'm saying? In the back? OK. So here I have initialized
my x with value 3 and y with value 4.5. Instead of saying that
x and y are constant, I'm saying that now
they're placeholders, and I'm inputting them
through feed dictionary. And I will run the z
under session object. I'll run this. In the second
instance, I'm actually initializing x and y to lists. One has 1, 3, and
second is 2, 4. And when I run it, I see
the first run statement has printed the sum of 3 and 4.5. That is 7.5, correct? And second print
has printed the sum of two vectors, which is 3, 7-- 3 and 7. No rocket science. I'm just demonstrating how you
can use TensorFlow to perform these basic operations. Yeah, feed dictionary is a
mechanism of putting values into the placeholder. So tomorrow-- we'll see
this in the workshop. When I'm reading
data from files, I'll read it into memory
first from the file, and then I'll use the feed
dictionary mechanism to initialize the placeholders. I think TensorFlow Eager helps
you to do operations as you lay out your data flow graph. We're not going to cover
TensorFlow Eager today. In classic TensorFlow,
the way it started, and this was the case through
API version 1.6, you have to first define a
data flow graph like this. And then it's important
to run it in a session. So let's say if you
have two graphs-- I'll put it another way-- you can define two
data flow graphs. And then you can define
a session for a graph and then run the graph
within that session. So you have to associate a graph
with a session and then run it. If you don't specify any
graph, then the default graph is run under the session. You will have to start
a session and specify what graph you want to use
for that particular session. We can also define variables. So for example, in a machine
learning algorithm, we also need variables, right? Variables-- like variables
to hold weights, biases. These are classical variables
as in a programming language. Variables that can
take different values in the course of the program. So let's go back. OK, so that was the
first practical, which I walked you through, right? From this point on,
whatever practicals come up, you'll
be doing them on your own, OK? So I'm going to start
slightly in a reverse manner. So we looked at here, right? Now we'll go all
the way to the top. This is for people who do not
know much about TensorFlow and people who just
want to use TensorFlow like the scikit-learn API. So for the benefit of those
people, we'll start from here. And then we will
go into deeper-- then we'll go to layers
API where you'll actually be writing your own
machine learning models using layers API, OK? So the first practical will be
pretty straightforward and easy for many of you. So let's look at
canned estimator. How many of you have
used the scikit-learn API? Have you written your machine
learning code in scikit-learn? So not many of you have written
machine learning programs. How many of you have written
machine learning programs in some language or the other? OK, so what language do you use? AUDIENCE: Python. ASHISH TENDULKAR: Python. But in Python, what do you do? AUDIENCE: PyTorch. ASHISH TENDULKAR:
PyTorch, OK, fine. PyTorch, OK. So if you're using
PyTorch, you should also try TensorFlow Eager. It's very, very similar. OK, fine. So let's look at
canned estimator. So this is the code sample
for canned estimator. If you have written
a scikit-learn program, you'll see that this is very
similar to scikit-learn. So in TensorFlow,
there's another concept called feature columns. So feature column is a concept
to give input to the TensorFlow canned estimator. So you have to use
what is called as-- you have to specify
what is your data type. So I define one real
valued column square foot. Then rooms, which is
another real valued column. And then there's a zip code. I'm saying that
is a sparse column with integerized feature. There can be multiple types of
features in machine learning, right? So how do you handle
discrete values in a machine learning algorithm? Let's say you have a
feature called color, and color red, blue, green. How will you represent
this red, blue, green so that computer understands it? So one possible way to represent
red, blue, green is using, let's say, three
binary variables. So when red is on, I will
put the first variable as 1. When the blue is on, I'll
put second variable as 1. And when green is there,
I'll put third variable as 1. It's called one hot encoding. That means out of
these three variables, only one will be on at
a given point in time. So this is a standard way of
handling your categorical data, but it's not the only way. So sometimes what
happens is that if you're doing natural
language processing, and then there are words, right? So you will have this
one hot encoding running into millions of dimensions. And you want, let's
say, some kind of low dimensional
representation of this thing. So there is something
called as embeddings, which is more like a continuous
representation of the words in much lower dimension. Because what happens
with one hot encoding is that you get large
amount of sparse features. There's a lot of sparsity
that gets introduced in your training data. And embeddings is one way
of reducing that sparsity and getting a denser representation. And then I just use this
LinearRegressor function and feed the feature columns
into it, and that's it. I have a regressor. And then I say regressor.fit
and regressor.evaluate. So fit will do the
training, and evaluate will do the evaluation. AUDIENCE: [INAUDIBLE] ASHISH TENDULKAR: Yeah. There are more such
kind of functions to handle your different
type of variables. AUDIENCE: But these
kind of functions are only for the [INAUDIBLE]? ASHISH TENDULKAR: No. They are available also for
your standard any other things. Not a problem. OK. I just change the regressor
to DNNRegressor, right? So you can see that I just
had to change it to DNNRegressor and put in the hidden units. I have a neural network model. Nothing else changes. With only these two changes,
I get a new regressor. It's that easy. People who are starting
in machine learning, people do not know
much about it, want to use it as a black box. This is a great way of doing it. So I talked about
embedding columns. So you can also use embeddings. Embeddings are done through
neural networks, a single layer neural network. And then let's look
at TensorBoard. Now, you're doing
training, right? So what are you generally
interested in training? You're interested in
how the convergence is happening, right? Correct? Whether model is
learning or not. So bear with me. There are a few of our friends. They have not done
gradient descent. So I'll try to explain
what is gradient descent. Can everybody see this? So we have what is called as-- so this is our parameter, right? I'll keep it simple
with one parameter. And this is loss. So there is a graph between
the value of the parameter and the loss. And let's say I
have a convex loss. I'll take a simple example. But gradient
descent you can also use for non-convex
functions, although convergence to global
minima is not guaranteed. So the way gradient
descent algorithm works, it's very, very intuitive
and very interesting idea. So do you do trekking? You go to the hill top, right? And now you want to come down. What do you do? So now I look around,
and I figure out the direction of
the steepest descent and follow that direction. So exactly the same idea is
applied in gradient descent, right? I want to find out
optimal value of theta that minimizes my loss. Obviously, we can see that
this is the point, right? So we have an
iterative algorithm. We start here
somewhere, anywhere. Let's say I start here. And at this point, I'll find out
gradient of the loss function. And let's say this is the
gradient of the loss function. And then I take steps in the
direction of gradient, correct? Now, learning rate
defines how far I want to move in the
direction of loss function, in the direction of gradient. Even though this is
a very steep slope, I don't want to jump and
reach the end of the slope, something like that. So I want to decide how much I
want to move in that direction. So a typical update
in gradient descent is you set your theta to
theta minus alpha times del del-theta of J theta. So this is a partial
derivative, right? This is the gradient
that I find, and I multiply that gradient
with a learning rate. Now, is gradient descent,
even for convex function, is it guaranteed to
converge all the time? AUDIENCE: [INAUDIBLE] ASHISH TENDULKAR: It depends
on the learning rate, right? So if you take small steps,
you'll probably reach here. But what if you decide
to take big steps? AUDIENCE: You oscillate. ASHISH TENDULKAR: Yeah. You'll oscillate, right? You might go like this,
come back like this. Correct? So choosing the
right learning rate is very important, both
from efficiency perspective and from getting a
convergence perspective. So you have this tool. So you want to
probably look at-- so this is my epoch,
and this is my loss. So for every epoch
there is some loss. As I start training, this
should ideally decrease. If you're getting something
like this, you're good. Then you have right
learning rate. But if you get something like
this, it's a red flag for you, right? This is a signature
of oscillation, right? Loss is coming down, going up-- very bad. So if you find this,
reduce your learning rate. If you find this
goes too flat, that means it's learning very slowly. And then you have to scope up
improving the learning rate. Put slightly higher
learning rate, and then you will get
faster to the optima. Why learning is possible? Why is learning possible? AUDIENCE: [INAUDIBLE] ASHISH TENDULKAR: What are
the assumptions that we make? We said that training
data and test data comes from the same
distribution, IID, Independent and Identically Distributed. So there is one source which is
giving you training examples. That assumption is true. Why do I really need to
go through all the data again and again? Instead of going
through the entire data, I batch the examples
into smaller batch size, let's say of 1,000. And I process batch of
1,000, get one update. That's stochastic
gradient descent. And when I'm doing
stochastic-- no, that's a mini-batch
gradient descent. When I'm doing mini-batch
gradient descent, there is a possibility that
you might see some such kind of template there. And other extreme is I
take one example at a time and make update. That's stochastic
gradient descent. And one example--
it's not so stable. That's why we always
use mini-batch gradient descent for practical purposes. And if you know
gradient descent, you can optimize any learning
objective of any machine learning algorithm. That may not be the most
optimal thing to do, but then you have one tool that
is applicable for all the cost functions. Have you at least trained one
machine learning algorithm-- each one of you? Did you look at the loss? If you're training, let's
say, linear regression, did you look at what
is the squared error after every iteration? So you want to look
at that, right? So TensorBoard is
a place to look at. TensorBoard is a tool that
comes with TensorFlow where all the statistics are gathered
and shown to you. So you can see, actually,
some of the summaries, also of the objects. You can also see how
different variables are changing across epochs. So this shows how
bias is changing. It also has graphs for
accuracy and other metrics, like precision,
recall, F-measure. So you have all those
graphs in TensorBoard. So this is a very powerful tool. You don't have to, let's say,
write other visualization routines in Python to visualize
your learning statistics. OK, so now it's your time. Have you cloned this
repository from GitHub? Has each one of you
cloned a GitHub repository? OK. So go to the workshops notebook
that is for canned estimator. I will first walk you through
this canned estimator notebook, and then you can
try it on your own. Feel free to make changes,
explore, print various things, and experience
TensorFlow yourself. So in this particular exercise,
what we're going to do is we are going to
write a classifier to detect handwritten digits to
recognize handwritten digits. So there are 10
classes, 0 to 9, right? So there is a standard MNIST
data set for handwritten digit recognition. We'll be using that. So we'll be using a keras data
set function to load the data. It will return a
training and test set. And then we'll be using
numpy input function to input my training
data and test data. And finally, I'm going to
have a feature specification. Each of my feature
columns is numeric. So I have a numeric column. So essentially, each
image in this data set is 28 by 28 pixels. So I linearize it, and I'll make
it into a 784-dimensional vector. And then I specify
a linear classifier. So this linear classifier
is nothing more than a logistic regression
classifier, which takes the feature specification
and number of classes, which is 10. And then I train
using 1,000 steps. And then I will look at what
is the accuracy of that. Yeah, let's start, OK? And we'll also train-- we'll also build our deep
neural network classifier and check the accuracy with
deep neural network classifier. And you'll compare
accuracies of two classifiers on the MNIST digit data set. OK? Right, if you have
any questions, you can feel free to ask us. OK. Let's take about 15
minutes or take 20 minutes to finish this lab. Let's look at standard
process of machine learning. So now, the first exercise
we did was with the images. Now let's look at
structured data. Structured data you would
encounter at different places. That's a data that is stored
in a database or in some files. And we are often required to
build machine learning models on it. So we'll take one such data set. This is a housing data set. I will show you a
couple of tools. One is this Facets. Let's talk about it. So Facets is a tool that you can
use to visualize and explore your data. So the data set that
we are going to look at is adult data set where we
will try to predict what is the individual's income. And this will be solved as
a classification problem where we will say that we want
to predict whether income is more than $50,000 or not, OK? OK, so this requires
us to download data from the internet. And again, we'll use
keras.utils package to get training
data and test data. And the data set is
there in UCI repository. And it has got multiple
columns, like age, work class, education,
occupation, and so on. We'll load data using Pandas. And then we'll apply the
first pre-processing. So any missing values,
if they're there, we will drop them. And then we'll
separate the labels from the feature in this. So we will get the
income out of the vector. And we do one sanity check. We find out what are the shapes
of training data and label data. So you can see that we have same
number of examples and labels, same with test
examples and labels. And the head command, the
Pandas head command-- do you know Pandas? How many of you know Pandas? OK. Pandas is another
dataframe package. Do you use Python? Have you seen the
dataframes in Python? Dataframe is kind of a versatile
data structure that can store data of different types. So you can think of it as a
collection of dictionaries or a collection of lists. So you can have float
values in the first column. You can have character value
in the second column and so on. So that is dataframe. So the head command will
print first five rows. So you can see
that there is age. There is work class,
education, and so on. And we can also,
in the same manner, look at first few entries
of the label columns. So you can see that there is
a false and true labels here. And this false and true we got
by applying a lambda function. Where the income is
greater than $50,000, we said that we want true,
or else we want false. OK? And now we have training
input function and test input function using the Pandas
input function. Because we already did
that now in Pandas, we'll use Pandas input
function from estimator to get the input. One of the very important
things in machine learning is feature engineering. So you have raw features. Sometimes those raw
features are not sufficient. You want to, let's say, do
one hot encoding, right? Or you want to
combine two features. These are examples of
feature engineering. So we'll do feature engineering. And feature columns is-- we talked about feature column. Feature column is a
preferred way of taking input into TensorFlow programs. So we are going to define
our feature columns. For every input column,
we have to define a corresponding feature column. So we have a numeric
column for age. If you fill that
now, look at the age. So what happens is that
whenever we are kids, we are not earning
anything, right? So our income is zero. Then as you start a career,
our income starts going up. And then there is a peak. And then we retire, and then
there is no income there. So you can see that
there is some kind of a non-linear relationship
between income and age. So we decided to bucketize,
let's say, age. So I can take the age and
define a bucketized column. So bucketization will
happen automatically. So you have to specify what kind
of buckets you want to create. So you have to specify
bucket boundaries. So age less than 31, age less than 46, 60, 75, 90, and so on for the other ranges.
Based on the age value, a particular bucketized
column will be created for me. And I'm appending the bucketized
column to feature column. Then sometimes you also
have categorical data with vocabulary list-- for example, degree. So there are bunch of degrees,
like bachelor, 11th pass, HS grad, masters, doctorate. All these degrees, right? So there's a fixed
dictionary of vocabulary. Based on the vocabulary,
you want to assign values to the categorical data. So categorical column
with vocabulary list is the function
you should be using where you can specify
your vocabulary list and get a value
for your education. Then, in the same manner, you
can hash your categorical data into finite buckets. And cross column is
very interesting. In linear regression,
we fit a line. We fit a line as a function. Let's say if I want to fit
the second order polynomial, how do we do it with
linear regression? AUDIENCE: [INAUDIBLE] ASHISH TENDULKAR: Yes,
I do feature crosses. Let's say x is my feature. I'll square that particular
feature, x squared, x1 squared. If I have two
features, x1 and x2, I'll do x1 squared, x2
squared, and x1 times x2. So essentially taking the
cross product of the features. So if you say a crossed
column between age bucket and education
bucket, it will automatically create a cross between
age bucket and education. And then I can append that. And then you can create
a canned linear estimator using LinearClassifier. And then you know
what to do, right? We train the classifier. And then we evaluate the
estimator on training data and on test data, and we
look at various statistics. And once we have
the model ready, you can use it to predict
for the test data. So go through this code lab. And in the same manner, I can
define a deep neural network model where I can do
feature embeddings. So that is the second
part of this lab. So I would request you to
go through the code lab, read the documentation,
do the experimentation. And we are around to
help you out anyways, OK? So after this, this concludes
the basic part of the workshop. What we learned is we learned
about the basics of TensorFlow. And we also talked about
how to use TensorFlow like scikit-learn APIs. So people who want
to learn basics, I would ideally like everyone to
stay until end of the workshop. But people who just
want to learn basics, we are done with basics. In the second half
of the workshop, we will try to write our
own machine learning models using layers API. That might be interesting
for many of you who are doing research and
who aspire to building machine learning algorithms and models. So the second part
will be extremely relevant for those people. [MUSIC PLAYING]
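
The gradient descent update described in this session (set theta to theta minus alpha times the derivative of the loss) and the warning about large learning rates can be illustrated with a small sketch. This is not code from the workshop notebooks. The quadratic loss, the function names, and the two learning rates are assumptions chosen only to make the behaviour easy to see in plain Python.

```python
# Minimal sketch of the update rule theta := theta - alpha * dJ/dtheta.
# The loss J(theta) = (theta - 3)^2 and every name below are illustrative
# assumptions, not taken from the workshop material.


def loss(theta):
    """Convex loss with its minimum at theta = 3."""
    return (theta - 3.0) ** 2


def gradient(theta):
    """Derivative of the loss with respect to theta."""
    return 2.0 * (theta - 3.0)


def gradient_descent(theta, learning_rate, steps):
    """Run `steps` updates; return the final theta and the loss after each step."""
    history = [loss(theta)]
    for _ in range(steps):
        # Move against the gradient, scaled by the learning rate.
        theta = theta - learning_rate * gradient(theta)
        history.append(loss(theta))
    return theta, history


# Small steps: the loss shrinks on every update.
theta_small, losses_small = gradient_descent(theta=0.0, learning_rate=0.1, steps=50)

# Steps that are too big: theta jumps back and forth across the minimum
# and the loss grows instead of shrinking.
theta_big, losses_big = gradient_descent(theta=0.0, learning_rate=1.1, steps=50)

print("learning rate 0.1: final theta = %.4f" % theta_small)
print("learning rate 1.1: final loss = %.3e" % losses_big[-1])
```

Plotting `losses_small` and `losses_big` against the step number would give the two kinds of curve discussed for TensorBoard: a steadily decreasing loss for a suitable learning rate, and a loss that fails to come down when the learning rate is too high.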