Basics of TensorFlow - TF Workshop - Session 1

[MUSIC PLAYING] ASHISH TENDULKAR: So what is TensorFlow? Let's start with that. OK, so the goal of this talk is an introduction to the main TensorFlow concepts. It will be useful to have a mental model of how the system behaves. Mostly you can forget everything and use the high-level wrapper if your job is just to use machine learning, right, machine learning algorithms off the shelf. If you're a researcher or developer of machine learning algorithms, you should also know low-level TensorFlow. OK, so what is TensorFlow? So TensorFlow is an open source machine learning library. I'm sure all of you know about it. It was originally developed by Google. And it has been open source since November 2015. It is one of the highest rated repositories on GitHub right now, with 65,000-plus stars. TensorFlow is especially useful for deep learning. But how many of you are non-computer science people? OK, quite a large number. If you want to run some mathematical simulations or any complex mathematical calculations, you can also use TensorFlow for that. That's something that I want to emphasize. Otherwise people don't really feel that TensorFlow-- people generally associate it with only machine learning. There are machine learning libraries built on top of TensorFlow, but core TensorFlow is not really tied to machine learning. So you can define any mathematical computation as a data flow graph, as we will see in a few minutes. And then you can run it under TensorFlow. You can use TensorFlow for research, as well as in a production environment. So if you're doing a startup and are thinking of using machine learning, you can think of TensorFlow. You can use TensorFlow. TensorFlow is equally useful in research, where I have seen a lot of researchers actually writing TensorFlow code and making it available to the community so that people can experiment with that code later. And TensorFlow is open source with the Apache 2.0 license.
Right, so there is some meaning in the name TensorFlow. Do you ever wonder why this is named TensorFlow? It's actually a combination of two words, tensor and flow. What is really a tensor? Have you taken an advanced mathematics course? So a tensor, in this context, is a multidimensional array. Tensor is nothing but a multidimensional array. And flow is nothing but a graph of operations. Every mathematical computation is expressed as a graph of operations in TensorFlow. So the nodes in the graph are operations, and the edges are nothing but tensors. So tensors flow into operations. So imagine doing a matrix multiplication operation or a vector-matrix multiplication. So the vector and the matrix will be the input. And multiply is the operation. OK, so Google is using TensorFlow internally very heavily. Do you recognize all these logos? So Google uses TensorFlow heavily in all these products. So you can see that the number of directories containing a model description file has just gone up very heavily. So let's start with the high-level design of TensorFlow, what exactly it is. So TensorFlow defines a general purpose computation graph. It achieves ease of expression. And then it creates tools for running it in different environments. The really good part about TensorFlow is this. So TensorFlow can execute operations on various hardware platforms. So you have CPU, you have GPU, you have Android, iOS, or Raspberry Pi. So even if you are writing some kind of an embedded system, you can still think of using TensorFlow and put it on the embedded system. How does TensorFlow achieve that? So there is something called the TensorFlow distributed execution engine. So we generally write our code in a Python, C++, Java, or Go kind of frontend. So the TensorFlow distributed execution engine takes your code written in Python and converts it into the underlying hardware instruction set.
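To make the "tensors flow into operations" picture concrete, here is a minimal plain-Python sketch of the vector-matrix multiplication mentioned above. It does not use TensorFlow; nested lists stand in for tensors, and the function name `vec_mat_mul` and the input values are illustrative assumptions, not anything from the talk's slides.

```python
# Plain-Python sketch (no TensorFlow): a "tensor" here is just a nested list.

def vec_mat_mul(vector, matrix):
    """Multiply a length-n vector by an n x m matrix, giving a length-m vector."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    if len(vector) != n_rows:
        raise ValueError("vector length must match the number of matrix rows")
    return [
        sum(vector[i] * matrix[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]

# Two input tensors travel along edges into a single operation node.
vector = [1, 2]              # rank-1 tensor (hypothetical values)
matrix = [[5, 6], [7, 8]]    # rank-2 tensor (hypothetical values)
result = vec_mat_mul(vector, matrix)
print(result)
```

In graph terms, `vector` and `matrix` are the edges and `vec_mat_mul` is the node; the output edge carries another tensor that a later operation could consume.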
So it's very useful to understand this layered API of TensorFlow. It will help you to know exactly what is useful in your work. So this is what I was talking about. So this is core TensorFlow. So you can write your mathematical operation as a data flow graph in this Python frontend. Then the TensorFlow distributed execution engine takes it. And using this, you can run it either on CPU, GPU, Android, iOS, Raspberry Pi. So one really good thing about TensorFlow against the competing libraries is that if you write programs in some other competing library, like, let's say, scikit-learn. Do you like scikit-learn? What do you do to distribute a program? What do you do to run scikit-learn on a very large amount of data? Right. So essentially when you have large data that is distributed across systems, obviously there is nothing in the scikit-learn API. So you have to essentially do all the work of distributing the computation to 10 different nodes, and then collect the result of that and learn, do training in a distributed fashion. Using TensorFlow, you don't have to do any of this. You can simply specify your data flow graph and let TensorFlow handle the complexity of the distributed system. So TensorFlow, once you specify your cluster, once you give the cluster specification in TensorFlow, TensorFlow will decide which operations to schedule on what kind of device. Got it? So that's one of the awesome things about TensorFlow. OK. So I'll repeat the question for the benefit of everyone. So your question is whether there are different APIs written for these different hardware platforms. You don't have to worry about it. That is handled internally by TensorFlow. So I think they might have written some kind of layer. Some kind of layer will be there that takes the high-level code. And then there'll be a layer to write instructions that can be executed on the CPU. If you have written a program in Java, for example, you can think of this as a compiler or a Java virtual machine.
So a virtual machine takes the Java code, and then it can execute it on different operating systems-- Windows, Linux. You don't write different Java programs. So for GPU, there is no Java analogy. So what happens is that your data flow graph that is written here, so you specify your operation in form of a data flow graph. TensorFlow takes this data flow graph and it generates some kind of an instruction in order to run it on CPU, GPU, Android, iOS. And internal TensorFlow will have all the information. So that's part of the TensorFlow internal. We are not worried about that as a developer. So our TensorFlow will have a way of converting your instruction so that they run on CPU, GPU, and all this kind. So let's say if I ask you to-- I give you two jobs. Add two numbers, add A, B. Will you be able to write that function, add A, B? Pretty easy, right? Now I ask you a slightly harder question. I give you a photograph. And I ask you detect face of a person in the photograph. Can you write the program for that? Yeah, so maybe you can construct features by hand, and then write rules, or something like that. But rule-based systems are brittle. If I take a photograph of a person from a slightly different angle, probably it will not work if you write rules that are hand-coded for a straight face, straight, camera-facing face. So what's the difference? I want you to think, take a step back, and think about the difference between these two tasks. The first task is write a program to add two numbers. And the second task is find the human face in the photograph. What is the difference between these two tasks? So essentially, in the first task, you know the function. You know the exact mapping, how to take two numbers and perform addition. You know the function, F of A, B is A plus B. But in the other task, unfortunately, we do not know the function. So F of a photograph, and I want, let's say, output of 1 if face is present, 0 otherwise. I do not know the function, correct? 
But as a human, you can easily recognize it, right? So why are we able to recognize that? That our brains are trained to recognize those kinds of images. Now, can we do a similar thing with computers? Yes, you can. So that's what is machine learning all about. So what do we need to do machine learning? What is the first and foremost important thing? Data, right? Training data. And you need a lot of training data, so you need to have training data. In this case, you need to have a label training data. So you need to have a photograph and the label associated with which will tell whether there is a face or not. What do we do with training data then? We need training data. That's the first thing. What else? What else is required in order to build machine learning model? Training data as data is covered, right? You need model itself. Model is the first thing. What else? Cost function, right? You need to have a way of evaluating whether a machine learning algorithm is performing correctly. So cost function is another thing. What is the third thing? Once you have cost function, you need to have optimization algorithm to optimize that cost function. And fourth, you need some kind of an evaluation criteria. So one very, very simplified or componentized view of machine learning is putting machine learning algorithm as a point in this four dimensional space, right? You have model, you have cost function, your optimization objective, and you have evaluation criteria, right? So think about any machine learning algorithm, and you will be able to put it as a point in four dimensional space. So what are some of the examples of models? It's important in this context. I'm actually building up the context so that you understand what is in the layers API, right? So what are the example models in machine learning? AUDIENCE: Decision trees. ASHISH TENDULKAR: Decision tree is one model. AUDIENCE: Neural networks. ASHISH TENDULKAR: Neural networks. Linear regression. Theta transpose X. 
Logistic regression, where we take a linear combination and put it through the logistic function. So these are all models. Then what are the examples of cost functions? What do we use as a cost function in regression kind of problems? AUDIENCE: Least squares. ASHISH TENDULKAR: Least squares, very good. What about classification? AUDIENCE: Cross-entropy. ASHISH TENDULKAR: Cross-entropy, right? So these are cost functions. What kind of optimization algorithms do we use? AUDIENCE: Adam optimizer. ASHISH TENDULKAR: Adam optimizer. The most common is gradient descent, right? Stochastic gradient descent. Yeah. There are many of them, right? So these are all optimization algorithms. And what are the evaluation metrics? So accuracy is one. AUDIENCE: F-measures. ASHISH TENDULKAR: F-measure. Is accuracy good every time? AUDIENCE: No. No. ASHISH TENDULKAR: Where does it fail? Whenever you have imbalanced kind of problems, right? Accuracy is a bad measure. Let's say you are asked to write a program to detect spam emails. And let's say spam emails are just 1%. I write a classifier which says that everything is good. Accuracy is 99%. Are you getting what I'm saying? So accuracy is a bad measure in such a case. So we care about what is called precision and recall. Do you know all these things, precision and recall? Have you heard about it? Have you heard about the confusion matrix? AUDIENCE: Yes. ASHISH TENDULKAR: Yes. So all these are the bare nuts and bolts of machine learning, right? These are all reusable components in machine learning. You take any machine learning algorithm and it will have these components. So I'll give you one secret mantra of knowing machine learning. When somebody comes to you and says that I've invented a new machine learning algorithm, don't get worried. Ask five questions. What is the training data? What is the model? What is the cost function? What is the optimization objective? And how do we evaluate this algorithm?
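The spam example above can be checked with a few lines of plain Python. This is a minimal sketch under stated assumptions: 1,000 emails of which 10 are spam (the 1% from the talk), a classifier that labels everything as not spam, and the convention that precision is reported as 0.0 when nothing is predicted positive. The function names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    # Assumed convention: no positive predictions at all -> report 0.0.
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0

# 1 = spam, 0 = good mail; 1% of 1,000 emails are spam.
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000          # the "everything is good" classifier

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, fn, tn)
prec = precision(tp, fp)
rec = recall(tp, fn)
print(acc, prec, rec)
```

Accuracy comes out at 99% while recall is zero: the classifier never catches a single spam email, which is exactly why accuracy alone misleads on imbalanced data.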
If you ask these five questions, everything will be sorted out, got it? OK. So these are all reusable machine learning components that are defined in the layers API. It's very important to know what is where in TensorFlow, because it's very easy to get lost. So we talked so far about the Python frontend. This is all about writing your mathematical operation as a data flow graph. Now, layers. So these are now machine learning specific APIs, right? So in layers, we have all the reusable machine learning components. Then people said, you know, why should I care about building models using layers all the time? So people said that, you know, I don't always use this layers API. I need some higher level abstraction to define a new machine learning model. So if you're doing your MS project research, or even a PhD where you need to write your own machine learning algorithm, write a new algorithm, you can also think of using the estimator APIs. You can think of estimators as abstract classes. Do you know abstract classes? So it defines some kind of a framework. So for any machine learning algorithm, if you want to use an estimator, it says that you have to implement what are called a train function, a model function, and a test function. So estimator is a framework to build any machine learning model, any new machine learning algorithm. And then people said, why should I keep writing the same linear regression? Why should every developer write the same linear regression using estimator? So that's why TensorFlow came up with canned estimators. So in canned estimators, TensorFlow is supporting linear regression, logistic regression, and neural network out of the box. So if you want to use TensorFlow just like scikit-learn, just like the scikit-learn API, canned estimator is your answer, got it? So that's what core TensorFlow supports. So there are some other third party canned estimators for decision tree, SVM, and for clustering. Random forest is also there.
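The "estimators as abstract classes" idea can be sketched in plain Python with the standard `abc` module. This is only an analogy: the class and method names below (`SketchEstimator`, `train`, `predict`, `evaluate`, `MeanRegressor`) are hypothetical and are not the TensorFlow Estimator API. The base class fixes the framework, and a "canned" subclass fills in a ready-made model so users do not have to rewrite it.

```python
from abc import ABC, abstractmethod

class SketchEstimator(ABC):
    """Hypothetical framework: subclasses must supply training and prediction."""

    @abstractmethod
    def train(self, features, labels):
        """Fit the model to the training data and return self."""

    @abstractmethod
    def predict(self, features):
        """Return one prediction per feature row."""

    def evaluate(self, features, labels):
        """Shared evaluation step: mean squared error of the predictions."""
        predictions = self.predict(features)
        return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)

class MeanRegressor(SketchEstimator):
    """A deliberately trivial 'canned' model: always predicts the label mean."""

    def __init__(self):
        self.mean = 0.0

    def train(self, features, labels):
        self.mean = sum(labels) / len(labels)
        return self

    def predict(self, features):
        return [self.mean for _ in features]

features = [[1.0], [2.0], [3.0]]   # made-up training data
labels = [2.0, 4.0, 6.0]
model = MeanRegressor().train(features, labels)
mse = model.evaluate(features, labels)
print(model.mean, mse)
```

Swapping in a different model means writing another subclass; the calling code (`train`, then `evaluate`) stays the same, which is the convenience the canned estimators are described as providing.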
There are four canned estimators supported in core TensorFlow. One is linear regression, then logistic regression, and deep neural network for classification and for regression. These four are supported by the TensorFlow implementation by Google. In addition to that, there are third party implementations for random forest, SVM, and for clustering. OK, so this is essentially a data flow graph. This is a data flow graph for performing ReLU, a basic operation in a neural network. What is the basic operation in a neural network? We have examples here. There are weights on each of the features. So we do theta transpose X, right? So we do matrix multiplication of examples and weights. The vectorized implementation of a neural network, for example. So we take examples, and we take weights. We do matrix multiplication. So this is a mathematical operation. I want you to look at the nodes in the graph. Each node is a mathematical operation, and each edge is the path on which we send the tensors, whatever information there is. So weights are coming on this edge, examples are coming on this edge, and there is a matrix multiplication happening here. And the result of the matrix multiplication is fed into the addition operation. And there is an addition with biases. And the result of this entire add operation is fed to ReLU, Rectified Linear Units. And then we do another Xent (cross-entropy) operation between labels and whatever you get out of ReLU. So why does TensorFlow define a data flow graph? So now we have what is called TensorFlow Eager, in which you can define operations and run them as you go. But in classic TensorFlow, as when we started, there are two distinct steps. The first step is about defining a data flow graph. And the second step is execution of the data flow graph. Why do you think TensorFlow has this design philosophy? Why are these two separate things in TensorFlow? Let's try to understand that.
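The graph described above (matrix-multiply examples by weights, add biases, apply ReLU) can be traced by hand with a short plain-Python sketch. It does not use TensorFlow, it leaves out the final cross-entropy node, and all numeric values are made up for illustration.

```python
def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def add_bias(matrix, biases):
    """Add one bias per column to every row."""
    return [[v + b for v, b in zip(row, biases)] for row in matrix]

def relu(matrix):
    """Rectified Linear Unit: negative entries become zero."""
    return [[max(0.0, v) for v in row] for row in matrix]

examples = [[1.0, 2.0], [-1.0, 0.5]]   # 2 examples x 2 features (made up)
weights = [[1.0, -1.0], [0.5, 2.0]]    # 2 features x 2 units (made up)
biases = [0.0, -1.0]

# Each call is one node of the graph; each intermediate value is an edge.
product = matmul(examples, weights)
pre_activation = add_bias(product, biases)
activations = relu(pre_activation)
print(activations)
```

Here the calls run immediately, one after another. Classic TensorFlow instead records the same three nodes first and executes them later, which is the two-step design the talk goes on to motivate.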
That will help you understand why TensorFlow first insists on laying out a data flow graph and then executing it. So what are the advantages that TensorFlow will get by doing this? Now, you can see that I can perform this matrix multiplication operation on, let's say, GPU. Let's say GPUs are good at matrix multiplication. I can take this piece of graph and schedule it on GPU. Then I can perform this addition on CPU. So it knows exactly the dependency between the operations, and it can decide which operations can be parallelized. And when TensorFlow-- we'll see it in a moment-- yeah, we talked about it. Edges are n-dimensional arrays that are tensors. And computation is a data flow graph. Yeah, this is what happens. Now TensorFlow can distribute the computation across Device B and Device A, right? And now you'll wonder-- now there are two devices. What is required? Some way of communicating, right? You need a notion of communication. So that is handled by TensorFlow. So TensorFlow will include send/receive nodes automatically. You don't have to do that. Are you getting what I'm saying? So imagine you yourself implementing such distributed training. You'd have to do all these things that TensorFlow is doing for you. That may be partially answering your question. OK? Got it? Any questions on this? So this-- look at it. Send/receive nodes are put there. Then it also puts in some other mathematical operators, something like differentiation, automatic differentiation. Where do you use differentiation in machine learning? AUDIENCE: [INAUDIBLE] ASHISH TENDULKAR: In optimization, right? Whenever you are trying to optimize the parameters. In gradient descent, we take the partial derivative of what? Of the loss function or cost function, right? So if TensorFlow didn't do it, you would have to supply a function to perform the differentiation operation. But TensorFlow has an in-built facility to do the differentiation.
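The send/receive idea above can be illustrated with a small plain-Python sketch: given a graph as a list of edges and an assignment of nodes to devices, every edge that crosses a device boundary is rerouted through a send node on the source device and a receive node on the destination device. This is only a toy model of the idea; the function name, node names, and the GPU/CPU placement are assumptions for illustration, and TensorFlow's actual partitioning logic is not shown here.

```python
def insert_send_recv(edges, placement):
    """Return (new_edges, new_placement) with send/recv pairs on cross-device edges."""
    new_edges = []
    new_placement = dict(placement)
    for src, dst in edges:
        if placement[src] == placement[dst]:
            new_edges.append((src, dst))     # same device: edge is unchanged
            continue
        send = "send_{}_to_{}".format(src, dst)
        recv = "recv_{}_to_{}".format(src, dst)
        new_placement[send] = placement[src]  # send lives with the producer
        new_placement[recv] = placement[dst]  # recv lives with the consumer
        new_edges.extend([(src, send), (send, recv), (recv, dst)])
    return new_edges, new_placement

# Matrix multiplication on the GPU, addition and ReLU on the CPU.
edges = [
    ("examples", "matmul"),
    ("weights", "matmul"),
    ("matmul", "add"),
    ("biases", "add"),
    ("add", "relu"),
]
placement = {
    "examples": "gpu", "weights": "gpu", "matmul": "gpu",
    "biases": "cpu", "add": "cpu", "relu": "cpu",
}

new_edges, new_placement = insert_send_recv(edges, placement)
for edge in new_edges:
    print(edge)
```

Only the `matmul` to `add` edge crosses devices, so exactly one send/receive pair is added. Doing this rewrite is possible only because the whole graph is known before anything runs.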
And it actually inserts this node automatically. Plus, TensorFlow also has a specialized linear algebra compiler that also optimizes some of the mathematical functions for you. You can think of this as a compilation phase. And all kinds of optimizations are done at that stage. These are the basic send/receive implementations. Right, now, this is where the extensibility comes in. There are a number of standard operations and kernels. You can also define your own operators and kernels in TensorFlow. If you want to extend TensorFlow to support a new kind of hardware, TensorFlow gives you the facility to write your own operations and kernels. These are device-specific implementations of operations and kernels. This is where you can explore if you want to support a new kind of hardware. So this is a single-process configuration. So there is a client. So what happens is that first we define a data flow graph, and then we run it in a specific context. So we create a session, and every data flow graph is executed in the context of that session. So if it's a single-process configuration, we say session.run, which then executes a subgraph on the worker. In the distributed configuration, what happens is that the master process spawns multiple workers and gets the work done. And what you see here, this is a TensorFlow graph for a neural network. So you have a logit layer. You have a ReLU layer. You have input. You have a reshape operation here. And this is the stochastic gradient descent trainer. You can see that this is basically a neural network graph. In practice, it can be very complex, with hundreds to thousands of nodes and edges. But you are insulated. If you're using the layers API, you don't have to deal with writing a data flow graph of the low-level operations because layers will have already implemented some of these subgraphs. For example, this SGD trainer might be already implemented by the layers API.
In the same manner, there is a logit layer that is already implemented, a ReLU layer that is already implemented for you. In the course of the workshop, we will look at building TensorFlow models right from canned estimators to the low-level API. All right, let's peep into the second one-- the Python frontend. So we already talked about the data flow graph. So that's the core of TensorFlow. So this is how you build your data flow graphs. import tensorflow as tf is the standard import. Then, this is how you define your session, tf.Session. So we define a session. And then we are now defining a data flow graph-- so tf.constant. So this is a way of specifying a constant tensor, a constant multidimensional array. And I have a two-dimensional array. The first row is 5, 6. Second row is 7, 8. And then I'm performing the multiplication of this x tensor with itself. Now, it is only defining a tensor. It does not, right now, have any value in it. So we will see actually in the lab, when we say print x, it will not actually print the content of this tensor. It will just print that x is a tensor of type constant. And it holds probably numeric values. So that's the information you'll get. But in the same manner, there is another tensor which first performs a multiplication, then it adds another multiplication to it. So this is multiplication of the matrix x with itself. And then we are doing multiplication of x with an identity matrix, a diagonal matrix which has ones on the diagonal. Getting it? And then there is an addition operation. So now you can see how this graph is actually represented here. So you can see that there is a constant x, and there is another constant, which is this constant matrix. These are my input tensors. So this constant is multiplied with itself through the MatMul operation. And then there is another matrix multiplication happening between this constant matrix and the diagonal matrix, got it? That is represented in this MatMul operation.
And then I'm performing addition of these two. The result of this matrix multiplication operation is added to the result of this matrix multiplication operation. And when I do z.eval, that is the time when we are executing this data flow graph. And we are saying that I should evaluate z, which is this particular node, this particular thing. In the same manner, I can say evaluate x. Then it will evaluate only this part of the graph. So you can evaluate at any part of the graph. You can lay down your complete graph, and then evaluate any subgraph of that particular graph. That is OK. Got it? But you have to run it under tf.Session. So let's do this first Basics of TensorFlow practical. This is something that I'll run on my screen. So we use Python. You people are familiar with Python. You write your program and then execute it. That's how you work, right? So Jupyter Notebook is a very nice way of building demos where you interleave the documentation and the code. Otherwise, what do you do? Are you research scholars? Are you running your company? What is it? Research scholars must be writing demos, right? Let's say you want to quickly try some method. You read a paper and you want to try it and show it to your advisor. What do you do? You will write the program in some separate file, and then you'll have the paper, which will be in a separate document. And then you have to constantly refer to the paper and the document to explain that. The Jupyter Notebook solves that problem for you. So Jupyter Notebook gives you a way where you can write some documentation, and then you can write the code block. So what you can do is, when you're implementing some research paper, you can start with the Jupyter Notebook. You can run this locally as well, and you can write documentation like this. So I have written some documentation on basics of TensorFlow. And then I can run now-- the basic unit in Jupyter Notebook is a cell. There are two types of cells. So this is our text cell.
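The define-then-evaluate behaviour of the example graph (x multiplied by itself, x multiplied by the identity matrix, and the two results added to give z) can be imitated in plain Python. This is a toy deferred-evaluation sketch, not TensorFlow code: the `Node` class, the `evaluate` function, and the node names are hypothetical, and only the matrix values 5, 6, 7, 8 come from the talk.

```python
def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def add(a, b):
    return [[u + v for u, v in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

OPS = {"matmul": matmul, "add": add}

class Node:
    """A graph node: either a constant (value set) or an operation on other nodes."""
    def __init__(self, name, op=None, inputs=(), value=None):
        self.name = name
        self.op = op
        self.inputs = tuple(inputs)
        self.value = value

def evaluate(node, trace):
    """Evaluate only the subgraph needed for `node`, recording visited node names."""
    trace.append(node.name)
    if node.op is None:
        return node.value
    args = [evaluate(inp, trace) for inp in node.inputs]
    return OPS[node.op](*args)

# Step 1: define the graph. Nothing is computed yet.
x = Node("x", value=[[5, 6], [7, 8]])
identity = Node("identity", value=[[1, 0], [0, 1]])
xx = Node("xx", op="matmul", inputs=(x, x))
xi = Node("xi", op="matmul", inputs=(x, identity))
z = Node("z", op="add", inputs=(xx, xi))

# Step 2: execute. Asking for z runs everything; asking for x touches only x.
trace_z = []
z_value = evaluate(z, trace_z)
trace_x = []
x_value = evaluate(x, trace_x)
print(z_value, trace_x)
```

The traces show the point made in the talk: evaluating `z` visits the whole graph, while evaluating `x` visits a single node.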
And then the second cell is the code cell. I can run this code cell by pressing this Run button. So now this code cell is run. Let's say if I want to do something-- print finished import. You can see this, right? So you can run this Jupyter Notebook cell by cell and see the output. So let's say you want to demonstrate two competing approaches. So you write your code for the first approach and the second approach. In the same document itself, you will be able to get all the comparisons done. Is the idea clear about Jupyter Notebook for those who are not familiar with it? You can run it cell by cell, OK? Let's print "hello, TensorFlow." This is our tradition to print "hello, world" in any language that we learn, right? So let's see how to do that in TensorFlow. All right, I have some output that I'll clear first. Current output, clear. So what is it that I'm doing here? I'm defining hello, a tensor which is holding a TensorFlow constant with value "Hello, TensorFlow!" Now, let's print this hello and see what is there in that. We already talked about it, so I have already leaked the answer. No point asking this question again. And then I define this session, tf.Session. And I say session.run(hello). OK, let's see what comes out of this. OK, so the first print has printed. See this, what it has printed? This is a description of the tensor. It says that hello is a tensor. Const:0, this is a name automatically generated by TensorFlow. The second most important thing is the shape of a tensor. Since we are talking about a scalar, the shape is-- we use numpy. How many of you use numpy? It's the same concept as numpy. And the type is string. The type of object that this instance will hold will be of type string. And the second print, the session.run print, actually prints "Hello, TensorFlow!". Got it?
This is the point that I was trying to emphasize earlier, that if you just do print hello without running your node in the context of a session, you don't see "Hello, TensorFlow!" but you see the description of the tensor. And when I run hello in the context of a session, I see "Hello, TensorFlow!". Got it? OK. So this is how you can define a constant. You can define a scalar. You can use shape and rank. Let's print them and see what comes out as shape and rank. So rank is zero in this case. So for a scalar, rank is zero. Now, let's look at what is the shape for a list. So you can see tf.constant. I can define a list also. So this is a rank one tensor. So let's print the shape again. Except for this basic thing, I give all the notebooks to you to execute so you will not be bored. You can keep writing in this notebook and see what comes out of this. So here the shape is empty because it's a scalar. When I have a list, you can see that the shape is three here because I have three elements in the list. I can also define 2D-- I can also define a matrix. This is a matrix which has two rows and three columns. Let's look at the shape of that. You can see that the shape is 2, 3. There are two rows and three columns. You can also define a rank three tensor here. So I can say print x tensor here. You can see that the shape is 2, 1, 3. I'm defining a 3D array here. OK, got it? Any questions? OK. We'll shift gears now. Now, this is how you define a plus b. So I'm defining two arrays, two lists. And I can add these two lists just using a simple addition operation. Let's try to do this. So this is defining the data flow graph. That's why you see the information of the tensors. And now I'll define a session and run total, and you will see that three plus four is seven. That you see here. And two plus one is three. That you see here. Got it? OK, so that was about constants. Constants are not interesting. We talked about it. What is the most important thing in machine learning?
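The shape and rank walk-through above follows the same convention as numpy: a scalar has an empty shape and rank 0, a three-element list has shape (3,), and so on. A minimal plain-Python sketch of that convention, using nested lists as stand-ins for tensors and assuming every nested list is rectangular, looks like this. The helper names `shape_of` and `rank_of` and the element values are illustrative.

```python
def shape_of(tensor):
    """Shape of a rectangular nested list; a bare number has the empty shape ()."""
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        if not tensor:          # empty list: nothing further to descend into
            break
        tensor = tensor[0]
    return tuple(shape)

def rank_of(tensor):
    """Rank is the number of dimensions, i.e. the length of the shape."""
    return len(shape_of(tensor))

scalar = 3.0
vector = [1.0, 2.0, 3.0]
matrix = [[1, 2, 3], [4, 5, 6]]          # two rows, three columns
cube = [[[1, 2, 3]], [[4, 5, 6]]]        # shape 2 x 1 x 3

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("cube", cube)]:
    print(name, shape_of(value), rank_of(value))
```

The four cases mirror the ones shown in the notebook: scalar, list, 2 x 3 matrix, and 2 x 1 x 3 array.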
We talked about training data, right? And you cannot have your training data in memory. I cannot really write training data as in memory objects and hope to input them to tensor, right? So your training data will normally be in a database, or you might have a file, a big, big file, correct? And now you need to have a way of inputting this file in TensorFlow. So tf.placeholder is your friend for that. So you should use tf.placeholder. So this is more like a promise. I am promising TensorFlow that I will input or tie an object or a bunch of values of float32, and hold a place for these float values for me in tensor x. Then in the same manner, I'll define another place holder for y. So in x, in this case, I'm going to hold all my inputs. In y, I'll hold my labels, for example. And then I can do z equals x plus y. Now, since I'm doing placeholder, there is a specific way in which you have to feed data into this placeholder. So we use a mechanism through feed dictionary. It's more like a dictionary where I can initialize values for the placeholder. Are you getting what I'm saying? In the back? OK. So here I have initialized my x with value 3 and y with value 4.5. Instead of saying that x and y are constant, I'm saying that now they're placeholders, and I'm inputting them through feed dictionary. And I will run the z under session object. I'll run this. In the second instance, I'm actually initializing x and y to lists. One has 1, 3, and second is 2, 4. And when I run it, I see the first run statement has printed the sum of 3 and 4.5. That is 7.5, correct? And second print has printed the sum of two vectors, which is 3, 7-- 3 and 7. No rocket science. I'm just demonstrating how you can use TensorFlow to perform these basic operations. Yeah, feed dictionary is a mechanism of putting values into the placeholder. So tomorrow-- we'll see this in the workshop. 
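The placeholder and feed dictionary mechanism above can be mimicked in plain Python to show the idea of "promise a value now, supply it at run time". This is a toy sketch, not the TensorFlow API: the `Placeholder` and `Add` classes and the `run` function are hypothetical stand-ins, while the fed values (3 and 4.5, then the lists 1, 3 and 2, 4) are the ones used in the talk.

```python
class Placeholder:
    """A named slot in the graph whose value is supplied only at run time."""
    def __init__(self, name):
        self.name = name

class Add:
    """An addition node over two other nodes."""
    def __init__(self, left, right):
        self.left = left
        self.right = right

def run(node, feed_dict):
    """Evaluate `node`, looking up placeholder values in `feed_dict`."""
    if isinstance(node, Placeholder):
        if node not in feed_dict:
            raise KeyError("no value fed for placeholder " + node.name)
        return feed_dict[node]
    left = run(node.left, feed_dict)
    right = run(node.right, feed_dict)
    if isinstance(left, list):                 # element-wise sum for vectors
        return [a + b for a, b in zip(left, right)]
    return left + right

# Define the graph once, with no data attached.
x = Placeholder("x")
y = Placeholder("y")
z = Add(x, y)

# Run it twice with different feeds.
first = run(z, {x: 3, y: 4.5})
second = run(z, {x: [1, 3], y: [2, 4]})
print(first, second)
```

The same graph `z` is reused for both runs; only the dictionary changes, which is what makes the mechanism convenient for feeding training data read from a file.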
When I'm reading data from files, I'll read it into memory first from the file, and then I'll use the feed dictionary mechanism to initialize the placeholders. I think TensorFlow Eager helps you to do operations as you lay out your data flow graph. We're not going to cover TensorFlow Eager today. In classic TensorFlow, the way it started, this was through API 1.6, where you have to first define a data flow graph like this. And then it's important to run it in a session. So let's say if you have two graphs-- I'll put it another way-- you can define two data flow graphs. And then you can define a session for a graph and then run that graph within the session. So you have to associate a graph with a session and then run it. If you don't specify any graph, then the default graph is run under the session. You will have to start a session and specify what graph you want to use for that particular session. We can also define variables. So for example, in a machine learning algorithm, we also need variables, right? Variables-- like variables to hold weights, biases. These are classical variables as in a programming language. Variables that can take different values in the course of the program. So let's go back. OK, so that was the first practical, which I walked you through, right? From this point on, whatever practicals come, you'll be doing them on your own, OK? So I'm going to start slightly in a reverse manner. So we looked at here, right? Now we'll go all the way to the top. This is for people who do not know much about TensorFlow and people who only use TensorFlow like the scikit-learn API. So for the benefit of those people, we'll start from here. And then we will go deeper-- then we'll go to the layers API, where you'll actually be writing your own machine learning models using the layers API, OK? So the first practical will be pretty straightforward and easy for many of you. So let's look at canned estimators. How many of you have used the scikit-learn API?
Have you written your machine learning in scikit-learn? So not many of you have written machine learning programs. How many of you have written machine learning programs in some language or the other? OK, so what language do you use? AUDIENCE: Python. ASHISH TENDULKAR: Python. But in Python, what do you use? AUDIENCE: PyTorch. ASHISH TENDULKAR: PyTorch, OK, fine. PyTorch, OK. So if you're using PyTorch, you should also try TensorFlow Eager. It's very, very similar. OK, fine. So let's look at canned estimators. So this is the code sample for a canned estimator. If you have written a scikit-learn program, you'll see that this is very similar to scikit-learn. So in TensorFlow, there's another concept called feature columns. So a feature column is a concept to give input to the TensorFlow canned estimator. So you have to use what is called-- you have to specify what your data type is. So I define one real valued column, square foot. Then rooms, which is another real valued column. And then there's a zip code. I'm saying that is a sparse column with integerized feature. There can be multiple types of features in machine learning, right? So how do you handle discrete values in a machine learning algorithm? Let's say you have a feature called color, and the colors are red, blue, green. How will you represent this red, blue, green so that the computer understands it? So one possible way to represent red, blue, green is using, let's say, three binary variables. So when red is on, I will put the first variable as 1. When blue is on, I'll put the second variable as 1. And when green is there, I'll put the third variable as 1. It's called one-hot encoding. That means out of these three variables, only one will be on at a given point in time. So this is a standard way of handling your categorical data, but it's not the only way. So sometimes what happens is that if you're doing natural language processing, and then there are words, right?
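The red, blue, green encoding described above is small enough to write out directly. A minimal plain-Python sketch follows; the function name `one_hot` is illustrative, and the ordering of the colors in the vocabulary is simply the order used in the talk.

```python
def one_hot(value, vocabulary):
    """Return a list with a single 1 at the position of `value` in `vocabulary`."""
    if value not in vocabulary:
        raise ValueError("unknown category: {!r}".format(value))
    return [1 if value == candidate else 0 for candidate in vocabulary]

colors = ["red", "blue", "green"]

encoded = {color: one_hot(color, colors) for color in colors}
for color, vector in encoded.items():
    print(color, vector)
```

Each vector has exactly one entry switched on. With three categories that costs three dimensions; with a vocabulary of words it would cost one dimension per word, which is the sparsity problem that motivates embeddings.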
So you will have this one hot encoding running into millions of dimensions. And you want, let's say, some kind of low dimensional representation of this thing. So there is something called embeddings, which is more like a continuous representation of the words in a much lower dimension. Because what happens with one hot encoding is that you get a large number of sparse features. There's a lot of sparsity that gets introduced in your training data. And embeddings are one way of reducing that sparsity and getting a denser representation. And then I just use this LinearRegressor function and feed the feature columns into it, and that's it. I have a regressor. And then I say regressor.fit and regressor.evaluate. So fit will do the training, and evaluate will do the evaluation. AUDIENCE: [INAUDIBLE] ASHISH TENDULKAR: Yeah. There are more such functions to handle your different types of variables. AUDIENCE: But these kind of functions are only for the [INAUDIBLE]? ASHISH TENDULKAR: No. They are also available for any of your other standard things. Not a problem. OK. I just change the regressor to DNNRegressor, right? So you can see that I just had to change it to DNNRegressor and put in the hidden units. I have a neural network model. Nothing else changes. Only these two changes, and I get a new regressor. It's that easy. For people who are starting in machine learning, who do not know much about it and want to use it as a black box, this is a great way of doing it. So I talked about embedding columns. So you can also use embeddings. Embeddings are done through neural networks, a single layer neural network. And then let's look at TensorBoard. Now, you're doing training, right? So what are you generally interested in during training? You're interested in how the convergence is happening, right? Correct? Whether the model is learning or not. So bear with me. There are a few of our friends who have not done gradient descent. So I'll try to explain what gradient descent is. Can everybody see this? 
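The one hot encoding of the color feature described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper name `one_hot` and the vocabulary size of one million are made up for the example and are not part of TensorFlow or of the workshop code.

```python
def one_hot(value, vocabulary):
    """Return a list with a 1 at the position of `value` and 0 everywhere else."""
    if value not in vocabulary:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if item == value else 0 for item in vocabulary]


# The three colors from the talk; exactly one position is "on" per value.
colors = ["red", "blue", "green"]
for color in colors:
    print(color, one_hot(color, colors))

# Hypothetical vocabulary size for a text problem: each word would switch on
# one position out of a million, which is the sparsity embeddings try to reduce.
vocabulary_size = 1_000_000
nonzero_fraction = 1 / vocabulary_size
print("fraction of non-zero entries per word:", nonzero_fraction)
```

With three colors the vectors are tiny, but the same scheme applied to a word vocabulary gives vectors that are almost entirely zeros, which is the motivation for the embeddings mentioned in the talk.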
So we have what is called as-- so this is our parameter, right? I'll keep it simple with one parameter. And this is the loss. So there is a graph between the value of the parameter and the loss. And let's say I have a convex loss. I'll take a simple example. But you can also use gradient descent for non-convex functions, although convergence to the global minimum is not guaranteed. So the way the gradient descent algorithm works, it's a very, very intuitive and very interesting idea. So do you do trekking? You go to the hill top, right? And now you want to come down. What do you do? So now I look around, and I figure out the direction of the steepest descent and follow that direction. So exactly the same idea is applied in gradient descent, right? I want to find out the optimal value of theta that minimizes my loss. Obviously, we can see that this is the point, right? So we have an iterative algorithm. We start here somewhere, anywhere. Let's say I start here. And at this point, I'll find out the gradient of the loss function. And let's say this is the gradient of the loss function. And then I take steps in the direction of the gradient, correct? Now, the learning rate defines how far I want to move in the direction of the gradient. Even though this is a very steep slope, I don't want to jump and reach the end of the slope, something like that. So I want to decide how much I want to move in that direction. So a typical update in gradient descent is you set your theta to theta minus alpha times del by del-theta of J of theta. So this is a partial derivative, right? This is the gradient that I find, and I multiply that gradient by a learning rate. Now, is gradient descent, even for a convex function, guaranteed to converge all the time? AUDIENCE: [INAUDIBLE] ASHISH TENDULKAR: It depends on the learning rate, right? So if you take small steps, you'll probably reach here. But what if you decide to take big steps? AUDIENCE: You oscillate. ASHISH TENDULKAR: Yeah. 
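The update rule just described, theta set to theta minus alpha times the gradient, can be tried on a one-parameter convex loss. The sketch below assumes the toy loss J(theta) = theta squared, a starting point of 4.0, and two learning rates picked only to show the contrast; none of these values come from the workshop. Note that the minus sign in the update means each step moves against the gradient, that is, downhill.

```python
def loss(theta):
    """Toy convex loss J(theta) = theta^2, with its minimum at theta = 0."""
    return theta ** 2


def gradient(theta):
    """Derivative of the toy loss: dJ/dtheta = 2 * theta."""
    return 2 * theta


def gradient_descent(theta, learning_rate, steps):
    """Apply theta <- theta - learning_rate * gradient(theta) repeatedly."""
    history = [theta]
    for _ in range(steps):
        theta = theta - learning_rate * gradient(theta)
        history.append(theta)
    return history


# Small steps: theta shrinks steadily towards the minimum at 0.
small = gradient_descent(theta=4.0, learning_rate=0.1, steps=50)

# Big steps: theta jumps across the minimum, flips sign on every step,
# and moves further away each time.
large = gradient_descent(theta=4.0, learning_rate=1.1, steps=50)

print("small learning rate, final loss:", loss(small[-1]))
print("large learning rate, first steps:", large[:4])
```

For this particular loss, each update multiplies theta by (1 - 2 * alpha), so any alpha above 1.0 makes the iterates grow in magnitude while alternating in sign, which is the oscillation the audience points out.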
You'll oscillate, right? You might go like this, come back like this. Correct? So choosing the right learning rate is very important, both from an efficiency perspective and from a convergence perspective. So you have this tool. So you want to probably look at-- so this is my epoch, and this is my loss. So for every epoch there is some loss. As I start training, this should ideally decrease. If you're getting something like this, you're good. Then you have the right learning rate. But if you get something like this, it's a red sign for you, right? This is a signature of oscillation, right? Loss is coming down, going up-- very bad. So if you find this, reduce your learning rate. If you find this goes too flat, that means it's learning very slowly. And then you have scope for improving the learning rate. Put a slightly higher learning rate, and then you will get to the optimum faster. Why learning is possible? Why is learning possible? AUDIENCE: [INAUDIBLE] ASHISH TENDULKAR: What are the assumptions that we make? We said that training data and test data come from the same distribution, IID, Independent and Identically Distributed. So there is one source which is giving you training examples. If that assumption is true, why do I really need to go through all the data again and again? Instead of going through the entire data, I batch the examples into smaller batches, let's say of 1,000. And I process a batch of 1,000 and get one update. That's stochastic gradient descent. And when I'm doing stochastic-- no, that's mini-batch gradient descent. When I'm doing mini-batch gradient descent, there is a possibility that you might see some such kind of pattern there. And the other extreme is I take one example at a time and make an update. That's stochastic gradient descent. And with one example-- it's not so stable. That's why we always use mini-batch gradient descent for practical purposes. 
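The three variants just described (the whole data set per update, a mini-batch per update, one example per update) differ only in the batch size. The following is a minimal sketch on made-up data: 100 noise-free points on the line y = 3x, a single weight, and a batch size of 10 standing in for the 1,000 mentioned in the talk. The function names and hyperparameters are assumptions for illustration only.

```python
import random


def make_batches(examples, batch_size):
    """Split the list of examples into consecutive chunks of `batch_size`."""
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]


def train(examples, batch_size, learning_rate=0.5, epochs=20, seed=0):
    """Fit y = weight * x by gradient descent on the mean squared error."""
    rng = random.Random(seed)
    weight = 0.0
    updates = 0
    for _ in range(epochs):
        shuffled = examples[:]
        rng.shuffle(shuffled)
        for batch in make_batches(shuffled, batch_size):
            # Gradient of the mean squared error over this batch only.
            grad = sum(2 * (weight * x - y) * x for x, y in batch) / len(batch)
            weight -= learning_rate * grad
            updates += 1
    return weight, updates


# Made-up training data on the line y = 3x, with x between 0.01 and 1.0.
examples = [(i / 100, 3 * i / 100) for i in range(1, 101)]

full_batch = train(examples, batch_size=len(examples))  # one update per epoch
mini_batch = train(examples, batch_size=10)             # ten updates per epoch
single_example = train(examples, batch_size=1)          # one update per example

print("full batch:    ", full_batch)
print("mini-batch:    ", mini_batch)
print("single example:", single_example)
```

Because this toy data has no noise, all three settings end up close to the true weight of 3; the visible difference is how many updates each one gets from the same number of passes over the data. On real, noisy data the single-example updates would also jump around more, which is the instability mentioned above.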
And if you know gradient descent, you can optimize the learning objective of any machine learning algorithm. That may not be the most optimal thing to do, but then you have one tool that is applicable to all the cost functions. Have you at least trained one machine learning algorithm-- each one of you? Did you look at the loss? If you're training, let's say, linear regression, did you look at what the squared error is after every iteration? So you want to look at that, right? So TensorBoard is the place to look. TensorBoard is a tool that comes with TensorFlow where all the statistics are gathered and shown to you. So you can see, actually, some of the summaries, also of the objects. You can also see how different variables are changing across epochs. So this shows how the bias is changing. It also has graphs for accuracy and other metrics, like precision, recall, F-measure. So you have all those graphs in TensorBoard. So this is a very powerful tool. You don't have to, let's say, write other visualization routines in Python to visualize your learning statistics. OK, so now it's your time. Have you cloned this repository from GitHub? Has each one of you cloned the GitHub repository? OK. So go to the workshop notebook that is for canned estimator. I will first walk you through this canned estimator notebook, and then you can try it on your own. Feel free to make changes, explore, print various things, and experience TensorFlow yourself. So in this particular exercise, what we're going to do is write a classifier to recognize handwritten digits. So there are 10 classes, 0 to 9, right? So there is a standard MNIST data set for handwritten digit recognition. We'll be using that. So we'll be using a keras data set function to load the data. It will return training and test sets. And then we'll be using the numpy input function to input my training data and test data. And finally, I'm going to have a feature specification. 
Each of my feature columns is numeric. So I have a numeric column. So essentially, each image in this data set is 28 by 28 pixels. So I linearize it, and I'll make it into a 784 dimensional vector. And then I specify a linear classifier. So this linear classifier is nothing more than a logistic regression classifier, which takes the feature specification and the number of classes, which is 10. And then I train for 1,000 steps. And then I will look at what the accuracy is. Yeah, let's start, OK? And we'll also train-- we'll also build our deep neural network classifier and check the accuracy with the deep neural network classifier. And you'll compare the accuracies of the two classifiers on the MNIST digit data set. OK? Right, if you have any questions, feel free to ask us. OK. Let's take about 15 to 20 minutes to finish this lab. Let's look at the standard process of machine learning. So now, the first exercise we did was with images. Now let's look at structured data. Structured data you would encounter at different places. That's data that is stored in a database or in some files. And we are often required to build machine learning models on it. So we'll take one such data set. This is a housing data set. I will show you a couple of tools. One is Facets. Let's talk about it. So Facets is a tool that you can use to visualize and explore your data. So the data set that we are going to look at is the adult data set, where we will try to predict the individual's income. And this will be solved as a classification problem where we will say that we want to predict whether income is more than $50,000 or not, OK? OK, so this requires us to download data from the internet. And again, we'll use the keras.utils package to get training data and test data. And the data set is there in the UCI repository. And it has got multiple columns, like age, work class, education, occupation, and so on. We'll load the data using Pandas. 
And then we'll apply the first pre-processing. So any missing values, if they're there, we will drop them. And then we'll separate the labels from the features. So we will get the income out of the vector. And we do one sanity check. We find out what the shapes of the training data and label data are. So you can see that we have the same number of examples and labels, same with test examples and labels. And the head command, the Pandas head command-- do you know Pandas? How many of you know Pandas? OK. Pandas is a dataframe package. Do you use Python? Have you seen dataframes in Python? A dataframe is kind of a versatile data structure that can store data of different types. So you can think of it as a collection of dictionaries or a collection of lists. So you can have float values in the first column. You can have character values in the second column and so on. So that is a dataframe. So the head command will print the first five rows. So you can see that there is age. There is work class, education, and so on. And we can also, in the same manner, look at the first few entries of the label column. So you can see that there are false and true labels here. And this false and true we got by applying a lambda function. Where the income is greater than $50,000, we said that we want true, or else we want false. OK? And now we have a training input function and a test input function using the Pandas input function. Because we already have the data in Pandas, we'll use the Pandas input function from estimator to get the input. One of the very important things in machine learning is feature engineering. So you have raw features. Sometimes those raw features are not sufficient. You want to, let's say, do one hot encoding, right? Or you want to combine two features. These are examples of feature engineering. So we'll do feature engineering. And feature columns-- we talked about feature columns. Feature columns are the preferred way of taking input into TensorFlow programs. 
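The pre-processing steps described above (drop rows with missing values, separate the label, turn the income into true or false with a lambda, then check the shapes) can be sketched without Pandas. The rows below are made up for illustration, and the column name `income_bracket` with the values `>50K` and `<=50K` is an assumption about how the label is stored; the notebook itself works on a Pandas dataframe.

```python
# Made-up rows in the spirit of the adult data set; None marks a missing value.
rows = [
    {"age": 39, "workclass": "State-gov", "education": "Bachelors", "income_bracket": "<=50K"},
    {"age": 52, "workclass": "Private", "education": "Masters", "income_bracket": ">50K"},
    {"age": 28, "workclass": None, "education": "HS-grad", "income_bracket": "<=50K"},
    {"age": 44, "workclass": "Private", "education": "11th", "income_bracket": "<=50K"},
]

# Step 1: drop any row that has a missing value.
complete_rows = [row for row in rows if all(value is not None for value in row.values())]

# Step 2: turn the income column into a true/false label with a lambda.
is_high_income = lambda bracket: ">50K" in bracket
labels = [is_high_income(row["income_bracket"]) for row in complete_rows]

# Step 3: keep everything except the label as the features.
features = [
    {name: value for name, value in row.items() if name != "income_bracket"}
    for row in complete_rows
]

# Sanity check: the number of examples must match the number of labels.
print(len(features), len(labels))
print(features[0], labels[0])
```

The final check mirrors the shape comparison in the talk: after dropping the incomplete row, there is exactly one label per remaining example.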
So we are going to define our feature columns. For every input column, we have to define a corresponding feature column. So we have a numeric column for age. If you think about that now, look at the age. So what happens is that when we are kids, we are not earning anything, right? So our income is zero. Then as we start a career, our income starts going up. And then there is a peak. And then we retire, and then there is no income there. So you can see that there is some kind of a non-linear relationship between income and age. So we decided to bucketize, let's say, income. So I can take the age and define a bucketized column. So bucketization will happen automatically. But you have to specify what kind of buckets you want to create. So you have to specify bucket boundaries. So age less than 31, age less than 46, 60, 75, 90. Based on the age value, a particular bucketized column will be created for me. And I'm appending the bucketized column to the feature columns. Then sometimes you also have categorical data with a vocabulary list-- for example, degree. So there are a bunch of degrees, like bachelor, 11th pass, HS grad, masters, doctorate. All these degrees, right? So there's a fixed dictionary, a vocabulary. Based on the vocabulary, you want to assign values to the categorical data. So categorical column with vocabulary list is the function you should be using, where you can specify your vocabulary list and get a value for your education. Then, in the same manner, you can hash your categorical data into a finite number of buckets. And the crossed column is very interesting. In linear regression, we fit a line. We fit a line as a function. Let's say I want to fit a second order polynomial; how do we do it with linear regression? AUDIENCE: [INAUDIBLE] ASHISH TENDULKAR: Yes, I do feature crosses. Let's say x is my feature. I'll square that particular feature, x squared, x1 squared. If I have features x1, x2, I'll do x1 squared, x2 squared, and x1 x2. 
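The feature transformations described above can be mimicked in plain Python to see what each one produces. This is a sketch, not the TensorFlow feature column API: the helper names are hypothetical, the age boundaries are the ones read off the talk and may not match the notebook exactly, and the education strings are illustrative spellings of the degrees mentioned.

```python
from bisect import bisect_right

# Bucket boundaries as mentioned in the talk (assumed, not checked against the notebook).
AGE_BOUNDARIES = [31, 46, 60, 75, 90]

# Illustrative vocabulary for the education column.
EDUCATION_VOCABULARY = ["Bachelors", "11th", "HS-grad", "Masters", "Doctorate"]


def bucketize(value, boundaries):
    """Return the bucket index for `value`; a value equal to a boundary
    goes into the bucket that starts at that boundary."""
    return bisect_right(boundaries, value)


def vocabulary_index(value, vocabulary):
    """Map a category to its position in the vocabulary, or -1 if unknown."""
    return vocabulary.index(value) if value in vocabulary else -1


def second_order_features(x1, x2):
    """Expand two features into the terms of a second order polynomial,
    so that a linear model over them can fit a quadratic function."""
    return [x1, x2, x1 * x1, x2 * x2, x1 * x2]


print(bucketize(25, AGE_BOUNDARIES))   # below 31: first bucket
print(bucketize(50, AGE_BOUNDARIES))   # between 46 and 60
print(vocabulary_index("Masters", EDUCATION_VOCABULARY))
print(second_order_features(2, 3))
```

Five boundaries give six buckets, and each bucket index could then be one hot encoded so that a linear model can learn a separate weight per age range, which is how bucketization captures the non-linear relationship between age and income.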
So essentially, you are taking the cross product of the features. So if you define a crossed column between age bucket and education, it will automatically create the cross between age bucket and education. And then I can append that. And then you can create a canned linear estimator using your classifier. And then you know what to do, right? We train the classifier. And then we evaluate the estimator on the training data and on the test data, and we look at various statistics. And once we have the model ready, you can use it to predict for the test data. So go through this code lab. And in the same manner, I can define a deep neural network model where I can do feature embeddings. So that is the second part of this lab. So I would request you to go through the code lab, read the documentation, do the experimentation. And we are around to help you out anyway, OK? So this concludes the basic part of the workshop. What we learned is the basics of TensorFlow. And we also talked about how to use TensorFlow like the scikit-learn APIs. I would ideally like everyone to stay until the end of the workshop. But for people who just want to learn the basics, we are done with the basics. In the second half of the workshop, we will try to write our own machine learning models using the layers API. That might be interesting for many of you who are doing research and who aspire to build machine learning algorithms and models. So the second part will be extremely relevant for those people. [MUSIC PLAYING]
Info
Channel: Google Developers India
Views: 96,585
Keywords: machine learning, tensorflow, deep learning, tensorboard, custom estimator, API, DNN, android, google machine learning, android developer, google developers, web developer, mobile developer, app developer, developers, developer news, google event, google developer conference, web developer conference, mobile developer conference, developer products, developer platforms, devops
Id: F_uuqfgdZZw
Length: 61min 9sec (3669 seconds)
Published: Fri Jun 08 2018