Lecture 19 : Introduction to Neural Network

Captions
Hello, welcome to the NPTEL online certification course on Deep Learning. In our previous lecture we talked about the various non-linearity functions. In today's lecture we are going to talk about the neural network. We will first see how different logic functions, simple functions like the AND function, the OR function, or the XOR function, can be implemented using a neural network. Then we will talk about the feed forward neural network, or multi-layer perceptron, and about the learning or training mechanism of the feed forward neural network, which is known as back propagation learning.
So, before we go to the neural network, let us quickly recapitulate the different types of non-linearities, or non-linear functions, that we discussed in our previous lecture. We talked about a very simple type of non-linearity, the threshold non-linearity: if y is a function of x, then y is equal to 1 if x is greater than or equal to 0, and y is equal to 0 if x is less than 0. So, this is a simple threshold function, a non-linear function, where the threshold value is equal to 0. I can also have a threshold function where the threshold value is nonzero; say I take the threshold value to be 5. In that case y will be 1 if x is greater than or equal to 5, and 0 if x is less than 5. So, the threshold function is the simplest kind of non-linearity.
The other kind of non-linearity that we can have is what is known as the sigmoidal function, which is used in logistic regression. The sigmoidal function is given by 1 upon 1 plus e to the power minus S, which is a sigmoidal function of the argument S.
Now, since we are talking about classification or machine learning techniques, where we will frequently be talking about the dot product of 2 vectors W and X, where W is the weight vector and X is the sample vector, our argument S becomes W transpose X. So, the sigmoidal non-linearity, or logistic regression, is given by sigma of W transpose X equal to 1 upon 1 plus e to the power minus W transpose X. On the right hand side the sigmoidal function is shown graphically. You see that at W transpose X equal to 0 the value of the sigmoidal function is half, and as W transpose X goes on increasing, the sigmoidal function asymptotically approaches a value equal to 1. Of course, it will never reach the value 1, but asymptotically you can say that it reaches 1. And as W transpose X becomes negative and goes on reducing on the negative side, the sigmoidal function asymptotically approaches a value equal to 0. So, this logistic function puts a limit on the output: the output is limited between 0 and 1. And between 0 and 1 we have a smooth transition, where at the center, that is at W transpose X equal to 0, the sigmoidal function passes through 0.5. So, this is another type of non-linearity, which we will see is widely used in the implementation of neural networks.
The other kind of non-linearity is what is known as the rectified linear unit, or ReLU, which is given by y equal to the maximum of 0 and x. So, if x is greater than 0, then the value of y is equal to x; if x is 0 or less than 0, then the value of y is equal to 0. The graphical representation of this ReLU function is also shown on the right hand side in this figure.
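The three non-linearities recapitulated above can be written down in a few lines. This is only a minimal sketch in plain Python; the function names and the `theta` parameter for the threshold value are chosen here for illustration and do not come from the lecture.

```python
import math

def threshold(x, theta=0.0):
    """Threshold non-linearity: 1 if x >= theta, otherwise 0."""
    return 1 if x >= theta else 0

def sigmoid(s):
    """Sigmoidal (logistic) function: 1 / (1 + e^(-s))."""
    return 1.0 / (1.0 + math.exp(-s))

def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)

# A few sample evaluations
print(threshold(-2.0), threshold(0.0), threshold(5.0, theta=5.0))
print(sigmoid(0.0))          # the sigmoid passes through 0.5 at s = 0
print(relu(-3.0), relu(2.5))
```

In the classifier setting of the lecture, the argument passed to these functions would be the dot product W transpose X.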
So, ReLU is also a non-linearity, which is widely used in modern neural networks, particularly when we talk about deep neural networks or deep learning. We will come across all these different types of non-linearities as we proceed in our discussion.
So, let us come to the neural network now. The heart of the neural network is the neuron. The concept of the neural network is actually inspired by the way we believe our brain works. Of course, till now nobody has been able to say with certainty how the brain actually functions, but this is what we believe till now about how our brain functions. So, in our brain we have a network of neurons, and if you look at every neuron, as is shown in this figure on the right hand side, the neuron consists of a cell body. The cell body, this is the center of it, the cell body or the nucleus, collects information; it receives information through a number of connectors coming to the cell body, which are known as dendrites. The cell body processes this information, and the information is outputted through a connection which is known as the axon. The axon finally branches out and connects to other neurons through synaptic connections.
And it is believed that as the information passes through the axon, branches out, and is passed on to other neurons in the network through the synaptic connections, there is a multiplicative interaction in this process. What is that multiplicative interaction? If the signal outputted by the cell body is X, then when it reaches the other neurons through the synaptic connections, the value which reaches the other neurons is W times X. So, this is the kind of multiplicative interaction which takes place in the network in the brain.
So, when you talk about neural networks, we will see that neural networks are derived from this particular concept. So, what do we have in neurons? In neurons we have cells, which receive signals through dendrites and, after processing, pass these signals to the other neurons in the network through synaptic connections. The processing is done in a unit in the cell which is known as the soma, and the axon is the connecting path which transmits the signal from one neuron to another neuron. So, this is the concept of a neuron in the human brain.
So, when we talk about a neuron in our neural network, you find that here also every neuron consists of a functional unit, which is the cell body, given by this unit. This collects the information X, or the vector X, through a number of inputs, which are equivalent to dendrites. And when these inputs come to the cell body, they pass through a weighting function given by W, or weight values given by the weight vector W. The output of the neuron is some function of W transpose X, where W is the weight vector and X is the input vector; the output y of the neuron will be a function f of W transpose X. And when you talk about neural networks, this function f is in most cases a non-linear function, like the non-linear functions that we have discussed before. We will come to the use of those non-linear functions in neural networks in our discussions.
So, given this model of the neuron, a neural network is nothing but an interconnection of all those neurons. In this figure we have shown a number of neurons which collect the information X, that is our information vector or sample vector. And at every level it is multiplied by a weight vector W. So, I will have a set of weight vectors over here.
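The neuron model just described, y = f(W transpose X), can be sketched as a small function. This is only an illustration: the names `neuron` and `relu` and the sample numbers are made up here and are not values from the lecture.

```python
def neuron(x, w, f):
    """Single neuron: apply the non-linearity f to the dot product w^T x."""
    s = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return f(s)

def relu(s):
    """One possible choice for f: the rectified linear unit."""
    return max(0.0, s)

# Hypothetical input vector and weight vector, for illustration only
x = [1.0, 2.0, -1.0]
w = [0.5, -0.25, 1.0]

# w^T x = 0.5 - 0.5 - 1.0 = -1.0, so ReLU gives 0.0
print(neuron(x, w, relu))
```

Passing the non-linearity in as an argument keeps the weighted sum and the choice of f separate, which mirrors the two parts of the neuron in the lecture.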
This processed information from every neuron is passed to the other neurons through the dendrites or synapses, and while passing through these dendrites or synapses it is also multiplied by another set of weights, or weight vectors W, and so it continues. And finally, you get the output. The output of every neuron, or every unit, in this neural network is given by W transpose X, usually passed through a non-linear function f of this W transpose X; that is the output of every neuron. So, this is how the architecture of a neural network looks.
So, given this, let us now see how these neural networks can be used to implement various functions. The first function, which is a very simple function, is the AND function. The AND operation that we are going to consider is a logical operation on 2 inputs; the inputs are X 1 and X 2 and, obviously, these inputs are binary inputs. So, I have the input vector, which is given by X 1 X 2, and my output function, which is the AND function, is given by y, as shown in the table on the left hand side. As you all know, if the input vector is 0 0, then for the AND function the output is obviously 0; if the input is 0 1 the output is also 0; if the input is 1 0 the output is 0; only when the input is 1 1, that is both the binary variables on the input are 1, is the output of the AND logic equal to 1.
So, if I consider X 1 and X 2, which are inputs to this AND gate, to be the features, then X 1 X 2 taken together become a binary feature vector, and I have a 2 dimensional feature space. If I plot these outputs in this 2 dimensional feature space, as given on the right hand side of this figure, you see that when X 1 is 1 and X 2 is 0 the output is 0, which is shown here. If X 1 is 0 and X 2 is 1, then also the output is 0.
If X 1 is 0 and X 2 is 0 the output is 0; only when both X 1 and X 2 are 1 is the output 1. So, this is how the functional values are distributed in the feature space given by the features X 1 and X 2.
Now, I can consider this to be a classification problem: when I consider the input to be a binary feature vector, I can consider the input vectors to belong to 1 of 2 classes, one class which is class 1 and the other class which is class 0. So, the feature vectors 0 0, 0 1 and 1 0 will belong to one class, where the output should be equal to 0, and only the feature vector 1 1 belongs to the other class, where the output will be equal to 1. Those distributions of the feature vectors are as shown in this plot on the right hand side.
Now, considering this to be a binary classification problem, I have to find a classifying boundary, or a classifier, which separates these 2 classes. And as you see, this is a linear problem, as I can separate these 2 classes using a linear boundary. Multiple boundaries are possible: I can have this line as a linear separator between the 2 classes, and this can also be a linear separator between the 2 classes, but one of the options is as shown over here. You find that the equation of this straight line in the 2 dimensional space is X 1 plus X 2 minus 1.5 equal to 0. As I said, this is one of the many possible linear boundaries that I can have between these 2 classes.
So, considering this, I can take my feature vector to be 1 X 1 X 2, and I have a weight vector which is given by minus 1.5 1 1. The equation of this straight line in that case is written in terms of W transpose X, or X transpose W, whichever way I put it, because the value of W transpose X and X transpose W is the same.
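Written out, the augmented feature vector and the weight vector described above give back the left hand side of the line equation. The lowercase bold symbols are notation assumed here for the written form; the lecture speaks of X and W.

```latex
\[
\mathbf{x} = \begin{bmatrix} 1 \\ x_1 \\ x_2 \end{bmatrix}, \qquad
\mathbf{w} = \begin{bmatrix} -1.5 \\ 1 \\ 1 \end{bmatrix}, \qquad
\mathbf{w}^{\top}\mathbf{x} = \mathbf{x}^{\top}\mathbf{w}
  = (-1.5)(1) + (1)\,x_1 + (1)\,x_2
  = x_1 + x_2 - 1.5 .
\]
```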
So, the equation of this straight line is given by W transpose X equal to 0, or X transpose W equal to 0. The feature vectors 0 0, 0 1 and 1 0 will fall on one side of the straight line, and the feature vector 1 1 will fall on the other side of the straight line. And incidentally, if you analyze it, you find that the classifier that I get with this particular equation is nothing but a 2 class support vector machine, or a binary support vector machine, because it maximizes the margin. As I said, there are many possible straight lines that I can draw, but for them the margin will be less than the margin which is given by this one. So, this is also a support vector machine.
Now, given this, let us see how I can implement this using a neural network. As I said before, I can put all the feature vectors taken together in the form of a matrix, and we are also putting this in unified form, that is, I am adding to each of the feature vectors 1 additional element, which is equal to 1. So, 1 is added as an additional element, as we have shown over here: 1 0 0 is one of the feature vectors, 1 0 1 is another feature vector, 1 1 0 is another feature vector, and 1 1 1 is the fourth feature vector. All these feature vectors are put together in the form of a matrix. Out of these, we know that the first 3 feature vectors belong to, say, class omega 1, for which the output will be equal to 0, and the last one belongs to class omega 2, for which the output will be equal to 1. And I also have this weight vector W, which is minus 1.5 1 1.
So, given this representation, with all the feature vectors in the form of a matrix and that weight vector, how will my classifier work? Let us see. I can put it in the form of X transpose W; I will just put it this way.
Here, instead of writing this as X, I will write this as X transpose, because whenever we talk about a vector we usually take the vector to be a column vector. So, this 1 0 0 which is written over here is actually the column vector 1 0 0. So, instead of writing this matrix as X, let us put it as X transpose, so that every row in this matrix is the transpose of one of our feature vectors. With this understanding, the way the classifier will actually work is that I compute X transpose W, where X transpose is this matrix and W is this weight vector; the output of this matrix multiplication is minus 1.5, then minus 0.5, minus 0.5 and 0.5.
Now, if you remember, we said that non-linearities, or non-linear functions, are widely used in neural networks. So, if I pass this vector through a non-linearity which is a threshold function, my output becomes 0 0 0 1. The threshold function is such that when the input is less than 0 the output should be 0, and if the input is greater than 0 the output will be 1. Here it is minus 1.5, which is obviously less than 0, so I will have an output 0; here it is minus 0.5, so again I will have an output 0; here it is minus 0.5, so again an output 0; and here it is plus 0.5, which is greater than 0, so here I get an output equal to 1. So, you find that this matrix multiplication followed by the threshold operation actually performs an AND operation, which is a logical operation.
So, given this, how can I design a neuron to perform this particular task? In the case of a neuron, it takes the feature vectors as input, and I said that the feature vectors are inputted through the dendrites. One of the inputs I will set to 1.
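The matrix multiplication followed by the threshold can be checked with a few lines of plain Python. This is a minimal sketch: the names `X_T`, `W`, `scores` and `outputs` are chosen here for illustration, and the threshold uses "greater than or equal to 0" as in the recap at the start of the lecture, which makes no difference for these four inputs.

```python
# Rows are the augmented feature vectors (1, X1, X2), i.e. the matrix X transpose.
X_T = [
    [1, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 1, 1],
]
W = [-1.5, 1, 1]  # weight vector of the line X1 + X2 - 1.5 = 0

def threshold(s):
    """Threshold non-linearity with threshold value 0."""
    return 1 if s >= 0 else 0

# X transpose times W: one dot product per row
scores = [sum(x_i * w_i for x_i, w_i in zip(row, W)) for row in X_T]
outputs = [threshold(s) for s in scores]

print(scores)   # [-1.5, -0.5, -0.5, 0.5]
print(outputs)  # [0, 0, 0, 1]
```

The outputs reproduce the AND truth table for the inputs 0 0, 0 1, 1 0 and 1 1.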
The feature vector X 1 X 2 we are converting to 1 X 1 X 2: we are adding an additional component and making it equal to 1, which is the unified representation. So, I have 1 over here, X 1 over here, and X 2 over here, which are the components of my input vector. Then the weight vector I put as minus 1.5 here, 1 here, and 1 here. The function of the neuron I can split into 2 parts. The first part computes W transpose X, and what is this W transpose X? W transpose X is nothing but X 1 plus X 2 minus 1.5. The second part of the neuron gives you the threshold function, and at the output what I get is y equal to f of W transpose X. I can write this either in the form of W transpose X, or as the sum of W i X i, with i varying from 0 to 2, and the function f of this. In this particular case, this function with this weight vector will be an AND function.
So, this is one of the ways in which I can implement the AND logic, which I can pose as a binary classification problem with binary inputs. We have 2 dimensional binary inputs X 1 and X 2, and that classifier can easily be implemented by a single neuron. I do not need multiple neurons, or a neural network, for that purpose. So, a simple single neuron having a threshold non-linearity can implement the AND logic.
Let us now consider whether some other logical functions can be implemented in the same manner. I consider here the other logical function, which is the OR function. Again, in the case of the OR function, I consider the inputs to be 2 dimensional binary vectors having components X 1 and X 2. So, my function is as follows: consider first the case when X 1 and X 2 are both 0.
Then the output should be 0, that is, it belongs to one class; in all other cases, that is, when X 1 X 2 is 0 1, or 1 0, or 1 1, the output should be equal to 1, indicating that these feature vectors belong to the other class. Again, as before, if I plot these feature vectors in the 2 dimensional feature space, as given over here, you will find that when X 1 and X 2 are both 0 the output is 0, and in all other cases the output is 1. Here again it becomes a linearly separable problem, and again you can see that I can have an infinite number of straight lines which separate these 2 classes. One of these straight lines is given by X 1 plus X 2 minus 0.5 equal to 0.
Here you can easily verify that if both X 1 and X 2 are 0, the value of X 1 plus X 2 minus 0.5 becomes minus 0.5. However, if either or both of X 1 and X 2 are equal to 1, the value becomes positive: when X 1 and X 2 are both 1 it is 1.5, and if one of them is 1 and the other one is 0 it is 0.5. So, when the input is 0 0 the value is negative, and in all the other 3 cases the value is positive. This therefore becomes a simple linear classifier, and you find that a single straight line in the feature space can separate these two classes.
How can I implement it in the form of a neuron? As we have shown before, I can do it in the same manner as X transpose W, where X transpose is the matrix formed from all those 2 dimensional feature vectors in unified form, and W is the weight vector which indicates the separating plane between the two classes. If I compute X transpose W, the output vector that I get is minus 0.5, 0.5, 0.5 and 1.5; again you pass this through the threshold non-linearity. So, the output becomes 0 1 1 1.
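The same computation for the OR function can be sketched by wrapping the weighted sum and the threshold in one helper. The helper name `neuron_outputs` is hypothetical, and the threshold again uses "greater than or equal to 0", which none of these four inputs touches.

```python
def neuron_outputs(W, inputs):
    """Threshold neuron applied to each augmented input (1, X1, X2)."""
    results = []
    for x1, x2 in inputs:
        s = W[0] * 1 + W[1] * x1 + W[2] * x2   # W transpose X
        results.append(1 if s >= 0 else 0)      # threshold non-linearity
    return results

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
W_or = [-0.5, 1, 1]  # weight vector of the line X1 + X2 - 0.5 = 0

or_outputs = neuron_outputs(W_or, inputs)
print(or_outputs)  # [0, 1, 1, 1]
```

Only the first weight differs from the AND case (minus 0.5 instead of minus 1.5); calling the helper with `[-1.5, 1, 1]` gives the AND outputs instead.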
So, when my input vector is 0 0 the output is 0; when the input vector is 0 1 the output is 1; for 1 0 again the output is 1; and for 1 1 the output is 1. So, this simple operation implements the OR logic. And how do I implement it in a neural network? Again it is very simple. I put the input vector as 1 X 1 X 2 and the weight vector as minus 0.5 1 1; this neuron computes W transpose X, and I have the threshold non-linearity, which computes f of W transpose X, where this f is nothing but the threshold non-linearity. At the output what I get is an OR function. So, again you find that using a single neuron I can implement an OR function.
So, it has been possible to implement these logical functions using a single neuron because the problems that we have considered are linearly separable problems; both the AND and OR functions are linearly separable. But if the problem becomes non-separable, what will our situation be, and how can we solve those problems using neurons or neural networks? That we will explain in our next lecture. Thank you.
Info
Channel: IIT Kharagpur July 2018
Views: 8,933
Id: QlhHqMnd9Wo
Length: 29min 14sec (1754 seconds)
Published: Thu Aug 15 2019