Tensorflow Tutorial for Beginners | Tensorflow on Neural Networks | Intellipaat

Captions
Hey guys, welcome to this session by Intellipaat. So you guys would have heard of the Google self-driving car, a car which drives by itself. Now that is truly amazing, isn't it? This revolutionary technology is possible because of artificial intelligence, and to implement artificial intelligence you would need some tools. So this is where deep learning frameworks come in, and in today's session we'll work with the TensorFlow deep learning framework. Now, before we start with the class, do subscribe to our channel so that you don't miss out on our upcoming videos. Also, if you are interested in doing an end-to-end certification course in artificial intelligence, then Intellipaat provides the right course to gain all of the requisite skills. So let's take a quick glance at the agenda: we'll start by understanding the terms artificial intelligence, machine learning, and deep learning, and then we'll comprehensively learn about artificial intelligence. After that, we will look at some deep learning frameworks and have an introduction to TensorFlow. Going ahead, we'll learn about neural networks, and after that we'll be implementing a demo on the MNIST dataset using TensorFlow. So without much delay, let's start off with the class. I'll start off by asking you a very simple question: tell me, what is it that makes humans intelligent? Well, we as humans can think, learn, and make decisions, and that is what makes us intelligent. Now imagine if machines could show human-like intelligence: a machine which can think and make decisions like humans. That is truly amazing, isn't it? So artificial intelligence is basically that field of computer science which emphasizes the creation of intelligent machines which can work and react like humans. So now that we know what artificial intelligence is, let's see where machine learning and deep learning fit in. You can consider artificial intelligence to be the broader umbrella and machine learning and deep learning to be subsets of it. Or you can also say that
machine learning and deep learning are a means to achieve artificial intelligence. Now let's see what machine learning is. So machine learning is basically the subset of artificial intelligence where we teach a machine how to make decisions with the help of input data. We'll understand machine learning with this little example. So what do you see over here, what is this exactly? Well, it's a bird. And what about this? This again is a bird. And this? Well, this too is a bird. Now, how do you know all of these are birds? Well, as a kid you might have come across a picture of a bird, and you would have been told by your kindergarten teacher or your parents that this is a bird, and your brain learned that anything which looks like that is a bird. That is how our brain functions. But what about a machine? Now, if I take this image of the bird and feed it to a machine, will it be able to identify it as a bird? So this is where machine learning comes in. What I'll do is I'll take all of these images of birds and keep on feeding them to the machine until it learns all the features associated with them, and once it learns all the features associated with them, I'll give it new data to determine how well it has learned. Or in other words, first I'll feed in training data to the machine so that it can extract or learn all the features associated with the training data, and once the learning is done, I'll give it new data, or the test data, to determine how well the learning is done. And this is the underlying concept of machine learning. Now let's head on to deep learning. Deep learning is the subset of machine learning where we develop intelligent algorithms which mimic the human brain. So now the question which arises over here is: how do we mimic the human brain? Well, to answer that, let me ask another question: what is the brain composed of? A brain is primarily composed of neurons, isn't it, and these neurons send and receive electrochemical signals. So we have a neuron over here, and the electrochemical
signals are received through the dendrites, the processing of these signals is done in the cell body, and the output of these input signals is sent to other neurons through the axon. And if our task is to mimic the human brain, all we have to do is create artificial neurons, and these artificial neurons work the same way as biological neurons. So to implement deep learning, we have to create artificial neural networks, and these artificial neural networks comprise an input layer, a hidden layer, and an output layer. All of the inputs are received through the input layer, the processing is done in the hidden layer, and the final output is received through the output layer. To sum it up: artificial intelligence is the broader umbrella, machine learning is the subset of artificial intelligence, deep learning is the subset of machine learning, and machine learning and deep learning are basically methods to achieve artificial intelligence. So let's understand why we should study artificial intelligence. Why AI? Well, AI is actually everywhere; it's omnipresent, and AI applications are present in every single industry, from banking and finance to medical science, and also in aerospace. Now, it is actually a known fact that many banks have numerous activities on a day-to-day basis which need to be done accurately, and most of these activities take up a lot of time and effort from the employees, and at times there is also a chance of human error in these activities. Some of the work that banks and financial institutions handle is investing money in stocks, financial operations, managing various properties, and so on, and with the use of AI systems in these processes, the institutions are able to achieve efficient results in a quick turnaround time. Right guys, just a quick info: if you are interested in doing an end-to-end certification course in artificial intelligence, then Intellipaat provides the right course for you to master all of the concepts with respect to this field, and you
can check out the course details in the description below. So let's continue with the class. So the strategic implementation of artificial intelligence in banks helps them to focus on every customer and provide them quick resolutions, and similarly, in the medical science field as well, it has wide applications. AI has completely changed the way medical science was perceived just a few years ago. There are numerous areas in medical science where AI is used to achieve incredible value. With the help of AI, medical science was able to create virtual personal healthcare assistants, and these are used for research purposes. There are also many efficient healthcare bots introduced in the medical field to provide constant health support to patients. And AI is also used in the aerospace industry. In aerospace, there are a lot of features, from booking the tickets to the takeoff and operation of the flights, that AI takes care of. AI applications make air transport efficient, fast, and safe, and also provide a comfortable journey to the passengers. And it has also changed the face of gaming: these days we're able to play TV and computer games on a whole new level, all thanks to artificial intelligence applications. So these are just some of the applications where artificial intelligence is used, and all in all, AI is being used to reinvent the world. Scientists are riding on the back of AI, expecting that machine intelligence will surpass human intelligence; scientists believe that once AI systems start working at their full capacity, they will reinvent the world that we know today. So think of a world where all the menial tasks, such as garbage disposal, construction, digging, and so on, will be taken care of by AI applications. There will also come a time when no hierarchical order dictates the limits of humans; it will be a world where no one will be looked down upon and every human will be considered equal. This way, humans can then focus their strengths on higher levels of work to accomplish a
lot more, and keep taking technology to new heights. So now that we've understood the importance of artificial intelligence, let's learn more about AI. AI is basically a field of computer science which emphasizes the creation of intelligent machines which can work and react like humans. I am reiterating it: AI is that field of computer science where we create machines which can work and react like humans. Let's keep this definition at the back of our heads, and using this definition, let's actually look at some applications of AI which currently exist in today's world. We have simple chatbots like OK Google and Siri, which assist us with whatever we want. Let's say I want to know the current time; all I need to do is ask Siri: "Siri, tell me, what is the current time?" Similarly, if I want to know the distance between India and Malaysia, I'll ask: "Siri, tell me, what is the distance between India and Malaysia?" Again, let's say I'm on the other side and I want to listen to a simple joke; I'll ask: "Siri, tell me a simple joke." Right, so these are some of the applications of artificial intelligence. And then we have Sophia. Sophia is the first humanoid robot who can actually speak to us like a natural human. Sophia can show a wide range of emotions exhibited by humans, but she is actually a robot. Another application of artificial intelligence is the self-driving car. You have self-driving cars by Google and Tesla which actually drive by themselves; they do not need any external driver. These cars work by themselves. Similar to self-driving cars, you also have self-flying drones which do not need any human intervention and can navigate by themselves. Right guys, just a quick info: if you're interested in doing an end-to-end certification course in artificial intelligence, then Intellipaat provides the right course for you to master all of the concepts with respect to this field, and you can check out the course
details in the description below. So let's continue with the class. So now let's actually get a bit deeper and understand what intelligence is. Intelligence can be defined as one's capacity for understanding, one's capacity for self-awareness, one's capacity for learning, and one's capacity for problem-solving. That is: how well is something or someone able to understand, how well is someone able to learn new things, and how well is someone able to solve problems by themselves? So now that we know what intelligence is, let's understand what artificial intelligence is. When you apply the same intelligence to machines, this is known as artificial intelligence. Let us imagine there's a machine which can understand things which are normally understood by humans, a machine which is self-aware, and a machine which can solve problems by itself. That's just amazing, isn't it? This is the artificial intelligence which I am talking about. So now, to further understand intelligence, I'll ask another question: tell me, what is it that makes humans intelligent? Well, we as humans can reason, we can learn, we can perceive, we can solve problems, and we also have linguistic intelligence, that is, we can figure out what someone else is saying, and we can also understand the grammatical intricacies of different languages. Again, my question would be: what if a machine could exhibit all of these traits normally shown by a human? Again, that's just amazing, isn't it? So this is what is known as artificial intelligence: a machine which can show traits normally shown by a human is what is known as artificial intelligence. All right, so now that we're clear on artificial intelligence, let's segregate AI, ML, and DL. Normally, most people get confused between artificial intelligence, machine learning, and deep learning, so this is where I'm going to help you out in understanding the difference between these three. We have AI at the top, and you can consider machine learning and
deep learning to be subsets of AI. So again, machine learning and deep learning are just ways to achieve artificial intelligence; I'll restate it: machine learning and deep learning are just ways to achieve artificial intelligence. Now, machine learning is that part of artificial intelligence which aims to teach computers the ability to do tasks with data, without any explicit programming. Right, so we don't need to do any explicit programming, and the algorithms do tasks by themselves, and in ML we mostly use numerical and statistical approaches to achieve artificial intelligence. And then we have deep learning, which is actually a subset of machine learning. So first we have AI, then we have ML, and then we have DL. Deep learning comes under the machine learning field, and we apply deep learning through something known as artificial neural networks, which we will obviously learn about later. All right, so now let's understand artificial intelligence in a bigger sense. As I previously told you, artificial intelligence is the superset, under which comes machine learning, under which comes deep learning, and machine learning and deep learning are basically ways to achieve artificial intelligence. Now, these are the different areas of research in artificial intelligence. You have ML, and again, a part of ML is deep learning. Then we have natural language processing, where we basically understand what is spoken or written by a human. And then we have speech, where we either translate speech to text or we translate text to speech. The next subfield is robotics, and then we have autonomous vehicles under robotics; the Google self-driving car is an example of this over here. Right guys, just a quick info: if you're interested in doing an end-to-end certification course in artificial intelligence, then Intellipaat provides the right course for you to master all of the concepts with respect to this field, and you can check out the course details in the description below. So let's
continue with the class. So now that we've also understood the difference between artificial intelligence, machine learning, and deep learning, let's see different examples of machine learning around us. Most of you would have shopped on Amazon. Now, when you go onto Amazon, you see that there are some products recommended to you. How do you think that happens? This is something known as a recommendation engine, and a recommendation engine is nothing but a component of machine learning. Let's say you and your friend buy similar products: your friend buys five products and you buy three products, and all three products you buy are the same as ones your friend buys. Let's say the common products are an iPhone, a back cover for the iPhone, and a Bluetooth headset, and the other two products bought by your friend are a MacBook and a mouse. Now, since there are three products which are the same between you two, the other products which your friend has bought are the products which will be recommended to you. So, on the basis of the commonality between you and your friend, you will be recommended a MacBook and a mouse as well. This is nothing but a concept of machine learning. And then we have Amazon Alexa. Amazon Alexa is a really good example of speech recognition: when you say "Alexa, turn on the lights", it'll turn on the lights; when you say "Alexa, book a ride for me", it will do exactly that; when you say "Alexa, order a cheese pizza", that is exactly what Amazon Alexa will do. Now, Alexa is just a machine, right? But when you say "do something", "order a pizza", "book a cab for me", "turn on the lights", how is the machine able to understand all of this? The idea behind this is speech recognition, and that is again a component of machine learning. And then we have Netflix's movie recommendations. Let's say you watch two TV series: the first TV series is Friends and the next TV series is The Big Bang Theory, and since you watch
these two TV series, which belong to the comedy genre, maybe you will be recommended How I Met Your Mother, or you could be recommended Silicon Valley or some other TV series belonging to the comedy genre. So this again is machine learning. And then we also have Google's traffic prediction. Let's just say you're traveling in your car and there is huge traffic, and you desperately want to get out of the traffic, so you turn on Google Maps, and Google Maps tells you the best route on which the traffic would be the least. Now, how does Google Maps do this? This again is machine learning. So now that we've looked at different real-world applications of machine learning, let's actually understand what exactly machine learning is. As I've already told you, machine learning is a subset of artificial intelligence which gives the machine the ability to learn without being explicitly programmed. Over here, data is the key; in other words, you basically teach a machine how to learn without any explicit programming, and the machine learns with the help of data. So now that we know what exactly machine learning is, let's also understand how machine learning works. As I have already told you, machine learning depends totally on data. First we take in a dataset and divide it into two parts: the first part would be the training set and the second part would be the testing set, and we will train the model on top of the training set. Now, once we train the model, we will give it new data and check its accuracy on top of that new data, and if the accuracy on that new data comes out to be good enough, then we will go ahead and use that machine learning model. On the other hand, if the accuracy of the model we built is not good enough, then we'll go ahead and fine-tune that model till we get the desired accuracy. This is the basic premise behind machine learning. Now let's look at the subcategories of machine learning: we have supervised learning, unsupervised learning, and reinforcement learning.
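The train/test workflow just described can be sketched in a few lines of plain Python. Everything here is illustrative and not from the video: the one-dimensional "dataset" (a number with the label "bird" when it exceeds 0.5), the 80/20 split ratio, and the tiny threshold model are all stand-ins for a real dataset and a real learning algorithm.

```python
import random

# Made-up toy dataset: (feature, label) pairs.
random.seed(42)
data = [(x, "bird" if x > 0.5 else "not bird")
        for x in (random.random() for _ in range(100))]

# Step 1: split the dataset into a training set and a testing set (80/20).
random.shuffle(data)
train_set, test_set = data[:80], data[80:]

# Step 2: "train" a tiny model on the training set only — the learned
# parameter is a threshold halfway between the two class means.
birds = [x for x, y in train_set if y == "bird"]
others = [x for x, y in train_set if y == "not bird"]
threshold = (sum(birds) / len(birds) + sum(others) / len(others)) / 2

# Step 3: give the model new (test) data and check its accuracy.
correct = sum((x > threshold) == (label == "bird")
              for x, label in test_set)
accuracy = correct / len(test_set)
print(f"accuracy on unseen data: {accuracy:.2f}")
```

If the accuracy on the held-out test set were too low, the fine-tuning step the transcript mentions would mean adjusting the model (here, the threshold) and re-evaluating.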
So in supervised learning, you can consider that the learning is guided by a teacher: we have a dataset which actually acts as the teacher, and its role is to train the model, or the machine. Once the model gets trained, it can start making a prediction or decision when new data is given to it. Let's take this example. Over here, we are training this machine by giving it samples of data, and the data is nothing but different images of apples, and along with each image of an apple, we are also giving it the label of the image. So this image goes with its label, which is "apple"; again, this image goes with its label, which is "apple"; and the same with these two. We are teaching this machine that whenever it sees an image of something like this, it is nothing but an apple, and afterwards, when we give it new data, from whatever learning it has done, it will predict whether it's an apple or not. So, on the basis of its learning, this machine predicts that there is a good possibility, actually a 97% possibility, that the image which has been fed to the machine is nothing but an apple. One use case of supervised learning could be a spam classifier. A spam classifier basically tells whether an email which we get is spam or not, and that is done on the basis of different textual parameters. Let's see: a genuine email wouldn't contain too many exclamation marks, it wouldn't contain a catchy headline, and so on; but on the other hand, if it's a spam email, it may contain a lot of exclamation marks, maybe a lot of numbers, and it'll have statements like "hey, congrats, you've won a lottery" or "hey, could you help me out?". So spam classification is basically an example of supervised learning. Then we have unsupervised learning. In unsupervised learning, the model learns through observation and finds structures in the data: once the model is given a dataset, it automatically finds patterns and relationships in the dataset by creating clusters on it.
Now, what it cannot do is add labels to the clusters: it cannot say "this is a group of apples" or "mangoes", but it will still separate the apples from the mangoes. So over here we have this set of images, and the unsupervised model which is applied to it will segregate these fruits on the basis of similar characteristics. Over here, we have segregated these four into one cluster, these three into a second cluster, and these three into a third cluster. Now, even though unsupervised learning does not have any labels, it has still segregated these into three clusters. The machine over here does not know that these are apples, or these are oranges, or these are bananas, yet it has segregated them on the basis of similarity of characteristics. It found out that these four objects are similar to each other, and that there is quite a bit of variability between these four objects and those three objects. Similarly, the machine was able to figure out that these three objects are quite similar to each other, but when compared with those three objects, they are very dissimilar. This is the underlying concept of unsupervised learning. A good example of unsupervised learning would again be Netflix's movie recommendations. Over here, the movies are segregated on the basis of different genres: TV series like Friends, How I Met Your Mother, and Silicon Valley are clustered into one group because they come under the same category; similarly, movies such as Secret Superstar and Dangal could come under the same category because they have the same lead actors. So over here we are segregating the movies on the basis of similar characteristics, even though there are no labels. And it's finally time for the third machine learning type, which is reinforcement learning.
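Before moving on, the fruit-clustering idea above can be sketched as a one-dimensional k-means, grouping made-up fruit weights by similarity alone. The algorithm never sees a label like "apple" or "banana" — it only discovers that some numbers sit close together.

```python
# Made-up fruit weights in grams — say apples around 150 g and
# bananas around 118 g, though the algorithm is never told that.
weights = [150, 160, 155, 148, 118, 120, 115]

# One-dimensional k-means with k=2: start from two extreme guesses,
# then alternately assign each point to the nearest centre and
# recompute each centre as the mean of its group.
c1, c2 = min(weights), max(weights)
for _ in range(10):
    g1 = [w for w in weights if abs(w - c1) <= abs(w - c2)]
    g2 = [w for w in weights if abs(w - c1) > abs(w - c2)]
    c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)

print(sorted(g1), sorted(g2))  # two unlabelled clusters
```

The output is just "group 1" and "group 2" — attaching the names "bananas" and "apples" to those groups is exactly the labelling step that unsupervised learning does not do.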
So over here, there is an agent and there is an environment, and the agent interacts with the environment and finds out what the best outcome is for it. It basically follows the concept of the hit-and-trial method: the agent is rewarded or penalized with a point for a correct or a wrong action, and on the basis of the positive reward points gained, the model trains itself. Let's take this example. Over here, this self-driving car would be our agent, the road is the environment, and this car is interacting with this environment. It will observe the environment, and it has two choices over here: either to go straight or to turn right. Now, let's say this agent, or the self-driving car, decides to go straight; then what happens? It goes and bangs straight into this barricade. It then realizes that the action taken by it was not in its best interest, and that is why it is penalized. Since it is penalized, it realizes that the action taken by it was wrong, and that is why, from the next time onwards, it will do the opposite action: instead of going straight, it will take the right turn, and when it takes the right turn, it realizes that the road is correct, and the agent is rewarded. So this is how reinforcement learning basically works: the agent interacts with the environment and takes an action, and if the action turns out to be incorrect, it is penalized, and if the action turns out to be correct, it is rewarded, and this cycle goes on and on till it learns its environment completely and properly. The best use case of reinforcement learning is again the self-driving car; companies such as Tesla and Google are working on these self-driving cars. So just to sum it up, these are the three different types of machine learning algorithms: we have supervised, unsupervised, and reinforcement machine learning.
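The reward-and-penalty loop just described can be sketched as a tiny value-learning agent. All the numbers here are illustrative assumptions (learning rate, exploration rate, reward values), and the two actions mirror the barricade example: "right" is the correct turn, "straight" leads to the crash.

```python
import random

# Action values start at zero; they are nudged up on a reward and
# down on a penalty, so the agent gradually prefers the rewarded action.
random.seed(0)
values = {"straight": 0.0, "right": 0.0}

def reward(action):
    return 1.0 if action == "right" else -1.0  # penalise the crash

for episode in range(200):
    # Mostly exploit the best-looking action, occasionally explore.
    if random.random() < 0.1:
        action = random.choice(["straight", "right"])
    else:
        action = max(values, key=values.get)
    # Nudge the chosen action's value toward the reward received.
    values[action] += 0.1 * (reward(action) - values[action])

best = max(values, key=values.get)
print(values)
print("learned action:", best)
```

After a few penalized crashes, the value of "straight" drops below that of "right", and the agent settles on the rewarded turn — the same hit-and-trial cycle the transcript describes.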
Under supervised learning we have regression and classification; in unsupervised learning we have clustering techniques, association analysis, and the hidden Markov model; and then the third is obviously reinforcement learning, which works on the trial-and-error method. And if you want to do some really cool machine learning projects, you can check these sites out. Now let's look at some limitations of machine learning. Machine learning algorithms require massive stores of training data. Again, as I've told you, machine learning is totally based on the data which it has, so only if you have a large amount of data will it be able to give correct accuracy. Let's say you take in a very small amount of data; then there's a good possibility that the results which you're getting are very biased or very incorrect. Also, error diagnosis is quite difficult when it comes to machine learning, because the amount of data is very huge, and wherever there's a mistake, you'd have to go through the entire algorithm which you've written and then find that particular mistake by yourself, which is very difficult. Also, machine learning algorithms are not really that creative: these ML algorithms are built only for one specific purpose. Let's say I build a machine learning model which will predict whether it'll rain or not today; now, if I want to use the same model to predict stock prices, it'll not work, right? Basically, one model is built only for one particular task, and this is the lack of creativity that I'm talking about when it comes to machine learning. Also, there are a lot of time constraints, as the model has to learn through a lot of historical data. So that was everything about machine learning; now let's start off with deep learning. Deep learning is a subset of machine learning where the model learns through data representations, as opposed to task-specific algorithms. We saw that the drawback in machine learning models is that the models are specific to only one particular
task, but this is not the case with deep learning models, as these deep learning models are based on data representations, and these deep learning models are mostly built with something known as deep neural networks. So this is how a deep neural network looks. These deep neural networks completely learn the data which is fed to them. So this is the data: let's say this image of a woman is fed as the data to the deep learning model; then it will completely extract all the features of this data by itself. Again, the difference between ML and deep learning over here is that the feature extraction in machine learning is manual, but when it comes to deep learning, the feature extraction is automatic. The deep learning model automatically extracts all of the features associated with the image, and when new images are fed to it, it is automatically able to tell whether the new image is similar to this one or not. So over here we have a graph which tells us how performance varies with respect to the amount of data. What happens in machine learning is that as we keep on increasing the data, the performance increases only up to a particular threshold; after that, if we add any more data, there is no increase in performance. So this is another problem when it comes to machine learning. But on the other hand, when it comes to deep learning, the more data you give it, the better its performance will be, and that again is because deep learning is based on learning data representations: the more data you give it, the more it will automatically learn all those features of the data by itself, and it will keep on increasing its performance gradually. Now let's look at some applications of deep learning. Speech recognition is one application of deep learning. Now, you need to understand that you cannot build speech recognition applications with machine learning; this is where machine learning fails, and deep learning comes in and helps you to build speech recognition
applications. Also, another application of deep learning is self-driving cars. We see over here that the person is just sitting, not even touching the steering wheel, and the car is driving by itself; just an amazing application of deep learning. And then we have language translation over here: this again is the power of deep learning. Over here, we are typing something in Spanish and it is being automatically converted into English. We also have visual translation over here: this text, or this board, is in some random language, and this app over here, which uses deep learning, automatically converts this visual into English. So those were some applications of deep learning; now let's actually understand how deep learning works. Most deep learning methods use neural network architectures, and that is why deep learning models are often referred to as deep neural networks. A deep neural network basically has these three layers: an input layer, the hidden layers, and an output layer, and the term "deep" usually refers to the number of hidden layers in the neural network. Traditionally, neural networks contain only two or three hidden layers, while deep networks can have as many as 150 hidden layers; now that's a very huge number, isn't it? Deep learning models are trained by using large sets of labeled data and neural network architectures that learn features directly from the data without the need for manual feature extraction. All of the input data is given to this input layer, and this input layer automatically extracts the features by itself; that data is then sent to the hidden layer, which performs all sorts of processing tasks, and then the final result is given out through the output layer. So now let's also understand what exactly a neural network is. A neural network is a computing model whose layered structure resembles the network structure of neurons in the brain, with layers of connected nodes. It can learn from the data and can be trained to recognize patterns, classify data, and forecast future events.
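The input-to-hidden-to-output flow just described can be sketched as a single forward pass through a tiny feed-forward network. All the weights, biases, and input values below are made-up numbers; in a real network (for instance, one built in TensorFlow), training would learn them from data.

```python
import math

def sigmoid(z):
    # A common activation function, squashing any number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# Made-up network: 3 input nodes, 2 hidden nodes, 1 output node.
inputs = [0.5, 0.9, 0.1]
hidden_weights = [[0.2, 0.8, -0.5],   # weights into hidden node 1
                  [-0.3, 0.4, 0.9]]   # weights into hidden node 2
hidden_bias = [0.1, -0.1]
output_weights = [1.0, -1.2]
output_bias = 0.05

# Hidden layer: weighted sum of the inputs plus bias, then activation.
hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
          for ws, b in zip(hidden_weights, hidden_bias)]

# Output layer: uses the hidden layer's outputs as its inputs.
output = sigmoid(sum(w * h for w, h in zip(output_weights, hidden))
                 + output_bias)
print(f"network output: {output:.3f}")
```

Signals flow strictly left to right with no loops, which is exactly the feed-forward topology discussed next in the session.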
So the neural network is based on the biological neural network of our brain; that is why it has been given the name "neural network". The layers are interconnected with nodes, or neurons, with each layer using the output of the previous layer as its input. Its main function is to receive a set of inputs, perform calculations, and then use the output to solve the problem. Now, as I've already said, these artificial neural networks are based on something known as a biological neural network. Our biological neural network has dendrites, a cell body, and an axon: the dendrites are where input is taken, the cell body is where the processing is done, and the axon is where the message is transferred to other neurons. The same thing happens in an artificial neural network as well: first we give it the data, that data is processed, and then the final processed result is given out as the output. So over here, let's say we train the model with images of cats, and the labels would be either "cat" or "not cat"; after that, we give it a new image of a cat, and then we basically check whether the model correctly classifies this as "cat" or "not cat", and since the model has learned the data properly, it correctly classifies this image as a cat. Now, there are many ways of knitting the nodes of a neural network together, and each way results in a more or less complex behavior. Possibly the simplest of all topologies is the feed-forward network. In a feed-forward neural network, signals flow in one direction, without any loops in the signal paths. Typically, artificial neural networks have a layered structure: the input layer picks up the input signals and passes them on to the next layer, known as the hidden layer (and there can be more than one hidden layer in a neural network), and at last comes the output layer, which delivers the result. Now, the first question to pop into your head would be: what is the inspiration behind these artificial neural networks? Well, the
answer to that is the biological neural network of the brain, so let us first understand the architecture of a biological neuron. As you can see in the slide, a biological neuron has three main components: the dendrites, the cell body, and the axon. The dendrites receive the signals, the cell body processes the signals, and the axon finally sends these signals out to other neurons. Just like the biological neuron, the artificial neuron has a number of input channels, a processing stage, and one output that can fan out to multiple other artificial neurons. So now let's understand these artificial neurons in detail. Artificial neurons are the most fundamental units of a deep neural network: an artificial neuron takes an input, processes it, passes it through an activation function, and returns the output if the condition is met, or else it processes it again until we get the correct output. This type of artificial neuron is called a perceptron, and it is basically a linear model which is used for binary classification. As the figure shows, we have x1, x2, x3, and so on up to xn as inputs in the input layer, to which we add the weights and the bias, which are randomly selected; here we have w1, w2, w3, and so on up to wn as weights. We multiply these weights with the corresponding inputs, add all the values together, and finally add the bias to that sum. This final sum is passed through an activation function, which finally gives us the output. So let us see this in detail. Here we have three arrows, which correspond to the three inputs coming into the network, and for these three inputs we have corresponding weights associated with them: input 1 is associated with a weight of 0.7, input 2 is associated with a weight of 0.6, and input 3 is associated with a weight of 1.4. Now, these inputs are multiplied with their respective weights and their sum is taken, so with the three inputs x1, x2, and x3, the sum would be x1 * 0.7 + x2 * 0.6 + x3 * 1.4. To this sum we add an offset, which is called the bias; this bias is just a constant, used for shifting purposes. Now let's understand the concept behind these weights. These weights basically determine the relative importance of the inputs. Let's say we have two inputs, humidity and wearing a blue shirt: here we can see that wearing a blue shirt has almost no correlation with the possibility of rainfall, so that is why the weight assigned to input x2 would be low, in order to bring down its importance. Now let us see why we need activation functions. Consider a scenario where you have two different classes: one class is represented with triangles and the other class is represented with circles. Now, let's say I asked you to draw a linear decision boundary which can separate these two classes. Is that really possible? Can we draw a straight line which can segregate these two classes? Well, the answer is obviously no, isn't it? So let me tell you how we can do this: we'll have to add a third dimension to create a linearly separable model, which is easy to deal with. The logic here is that when you go from 2D to 3D, you make your equation nonlinear, so with the third dimension I have introduced nonlinearity into our data, which helps in creating a linearly separable model. In real-world situations you don't always get linear problems, so you should know how to deal with nonlinear problems as well, and this is where activation functions help us convert the linear equation to a nonlinear form. These activation functions bring in nonlinear functional mappings between the input and the response variable; their main purpose is to convert the input signal of a node in an artificial neural network to an output signal, and if we did not apply an activation function, then the output signal would just be a simple linear function. Now, there are many types of activation functions, and today we'll be discussing some of the widely used ones.
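To make this concrete, here is a minimal pure-Python sketch of the perceptron computation just described. The weights 0.7, 0.6, and 1.4 come from the example above; the input values, the bias of 0.5, and the use of a sigmoid activation are my own illustrative choices:

```python
import math

def perceptron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, squashed by a sigmoid activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))  # sigmoid maps the sum into (0, 1)

# Three inputs, with the weights from the example above and an assumed bias
out = perceptron([1.0, 1.0, 1.0], [0.7, 0.6, 1.4], bias=0.5)
print(round(out, 4))  # sigmoid(3.2) ≈ 0.9608
```

Swapping in a different activation function only changes the last line of `perceptron`; the weighted sum plus bias stays the same.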
Let's start with the identity function. The identity function gives out the same output as the input, so no matter how many layers we have, if all the activations are identity functions then the final output of the last layer would be the same as the input given to the first layer; the range of the identity function goes from minus infinity to plus infinity. After that we have the binary step function. The binary step function is usually denoted by H or theta and is a discontinuous function: if the input is less than 0 then the output is 0, and if the input is equal to or greater than 0 then the output is 1, and this is why the binary step function is used to solve binary classification problems. After that we have the sigmoid function. The formula for the sigmoid function is 1 / (1 + e^-x); the sigmoid function basically scales the values between 0 and 1, so if the input is a large negative number it is scaled towards 0, and similarly, if the input is a large positive number it is scaled towards 1. Then we have the tanh function, which is a hyperbolic trigonometric function that scales the values between -1 and 1; one advantage of tanh over sigmoid is that it can deal more easily with negative numbers. After that we have the ReLU function, which stands for rectified linear unit. This function gives out 0 if the input is less than 0, and on the other hand, if the input is equal to or greater than zero, it acts as an identity function and gives out the same value as the input. The ReLU function is the most widely used activation function and is primarily implemented in the hidden layers of the neural network. Then we have the leaky ReLU, which is just a modified version of ReLU: instead of completely removing the negative part, the leaky ReLU just lowers its magnitude. And finally we have the softmax function, which is ideally used in the output layer for classification problems. The softmax function basically gives a set of probability values for each class of the output, and the particular class which has the maximum probability will be our output class. So that was all about activation functions; now let us learn more about perceptrons. Just like we were taught how to behave in certain conditions, perceptrons also require training, so they have a learning algorithm through which they produce the output. By training a perceptron, we try to find a line, plane, or some hyperplane which can accurately separate the two classes by adjusting the weights and biases. Consider this image, where we give dogs and horses as input. Here, after the first iteration, the error value is 2, since a horse has been classified as a dog and there is one dog which has been placed in the horses' class. In the second iteration the error value is reduced to 1, as it is just the one dog which is classified as a horse, and finally, in the third iteration, we get the correct output, as the perceptron has been trained well, with no error: all the dogs have been placed in one class and all the horses have been placed in the other class. Now let's understand the perceptron training algorithm. The perceptron receives multiple inputs, and each input is initialized with a random weight. After this, we multiply these weights with their corresponding inputs and take the sum; this sum is then passed through the activation function, which gives us a nonlinear output. This process until here is known as feed-forwarding. Now, if the output which we get is not optimal, we calculate the error in prediction, then go back and update the weights and bias. This process, where we go from the output back towards the input, is known as backpropagation, and we keep on backpropagating until we get the desired output. So that was the perceptron training algorithm; now let's have a look at the benefits of using artificial neural networks.
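The training loop just described, feed forward, measure the error, then go back and update the weights and bias, can be sketched with the classic perceptron learning rule. The toy dataset (logical AND), the learning rate, and the epoch count below are my own illustrative choices:

```python
def train_perceptron(samples, labels, lr=0.1, epochs=20):
    """Classic perceptron rule: nudge weights and bias by the prediction error."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            total = sum(xi * wi for xi, wi in zip(x, w)) + b   # feed forward
            pred = 1 if total >= 0 else 0                      # binary step activation
            err = y - pred                                     # error in prediction
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]   # update weights
            b += lr * err                                      # update bias
    return w, b

# A linearly separable toy problem: logical AND
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
preds = [1 if sum(xi * wi for xi, wi in zip(x, w)) + b >= 0 else 0 for x in X]
print(preds)  # → [0, 0, 0, 1]
```

Because the data is linearly separable, the loop converges to a separating line after a few epochs, exactly the "keep iterating until the error reaches zero" behavior described above.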
So, artificial neural networks can learn organically, which means that an artificial neural network's outputs aren't limited entirely by the inputs and results given to them initially by an expert system. Artificial neural networks also have the ability to generalize their inputs, and this ability is valuable for robotics and pattern recognition systems. Artificial neural networks also help in nonlinear data processing: nonlinear systems have the capability of finding shortcuts to reach computationally expensive solutions, and these systems can also infer connections between data points, rather than waiting for records in a data source to be explicitly linked. This nonlinear shortcut mechanism is built into artificial neural networking, which makes it valuable in commercial big-data analysis. Artificial neural networks also have high potential for fault tolerance: when these networks are scaled across multiple machines and multiple servers, they are able to route around missing data, or servers and nodes that can't communicate. These artificial neural networks can also self-repair: if they're asked to find specific data in a node that is no longer communicating, they can regenerate large amounts of data by inference and help in determining the node that is not working. This trait is useful for networks that need to inform their users about the current state of the network, and it effectively results in a self-debugging and self-diagnosing network. Now, to implement these artificial neural networks you would need the help of a deep learning framework, so the first question to pop into your head would be: what are the different deep learning frameworks available? Today we'll cover the main ones, so let's start with TensorFlow. TensorFlow is arguably one of the best deep learning frameworks that we have today. It is an open-source software library developed by the researchers and engineers of the Google Brain team for high-performance numerical computation. One well-known use case of TensorFlow is Google
Translate. Google Translate is coupled with capabilities such as natural language processing, text classification, forecasting, and tagging. TensorFlow basically comes with two tools, TensorBoard and TensorFlow Serving. Building massive deep neural networks can be complex and confusing, and this is where we can use TensorBoard to visualize our TensorFlow graph and plot quantitative metrics. Then we have TensorFlow Serving, which is a flexible, high-performance serving system that can be used for rapid deployment of new algorithms while retaining the same server architecture and APIs. Now let's look at the next deep learning framework, which is Keras. Keras is actually a high-level API which can run on top of other deep learning libraries such as TensorFlow, Theano, or CNTK, and with the help of Keras you can implement both convolutional neural networks as well as recurrent neural networks. The best thing about Keras is that model building is extremely easy; it's like stacking layers on top of each other. Next we have PyTorch, which is a scientific computing framework developed by Facebook. We can tell from the name itself that PyTorch is Pythonic in nature, that is, it can leverage all the services and functionalities offered by the Python environment, and it also smoothly integrates with the Python data science stack. Another great feature of PyTorch is that it offers dynamic computational graphs which can be changed during runtime; this is highly useful when we have no idea how much memory will be required for creating the neural network. The next deep learning framework is DL4J. Unlike the deep learning frameworks which we saw till now, which were all based on Python, Deeplearning4j is a deep learning programming library written for Java and the Java Virtual Machine, and the biggest advantage of DL4J is that it includes built-in integration with Apache Hadoop and Spark. It helps in getting state-of-the-art results on image recognition tasks, and it shows matchless potential for image recognition, fraud detection, text mining, part-of-speech tagging, and also natural language processing. And finally we have MXNet. MXNet is a deep learning framework developed by the Apache Software Foundation specifically for the purpose of high efficiency, productivity, and flexibility, and the beauty of MXNet is that it gives users the ability to code in a variety of programming languages, such as Python, R, Julia, and Scala. This means that you can train your deep learning models in whichever language you're comfortable with, without having to learn something new from scratch, and this deep learning framework is known for its capabilities in imaging, speech recognition, forecasting, and NLP. So, when you hear the term TensorFlow, the first question to pop into your head would be: what exactly is a tensor? In TensorFlow, data is represented in the form of tensors. Simply put, a tensor is a multi-dimensional array in which data is stored, so you can consider these tensors to be the building blocks of TensorFlow, and these very tensors are given as the input to the neural network. As I've said, a tensor is nothing but an n-dimensional array, and the number of dimensions used to represent the data is known as its rank. If a tensor has just one element, in other words, if it has just magnitude and no direction, then its rank will be zero; if a tensor has magnitude and direction in one plane, then its rank will be 1; similarly, if a tensor has magnitude and direction in two planes, then its rank will be 2, and this goes on to higher orders. Now, TensorFlow gets its name as a combination of two words, "tensor" and "flow": the data is stored in tensors, but the execution is done in the form of a graph. This is not like your traditional programming, where we just write a bunch of lines and everything gets executed in sequence; first you have to prepare the computational graph, and then this computational graph is executed inside something known
as a session. In this computational graph, all the mathematical operations are depicted as the nodes, and all the tensors are represented on the edges. The entire computation process is done in two stages: in the first step, the code is laid out as the computational graph, and in the second step, a session environment is started and the graph is executed inside this session. So that was all about the computational graph; now let's look at the program elements in TensorFlow. We have three program elements: constant, placeholder, and variable. Let's start with constants. Constants are program elements whose value does not change, or in other words, the value is fixed, so let's head on to the Jupyter notebook and work with these constants. My first task would be to import the TensorFlow framework, so I'll type import tensorflow as tf, click Run, and wait till the import is done. Right, so we have successfully imported the TensorFlow framework; now let's go ahead and start working with the constants, so let me just type in "constants" over here. I'll create the first constant and name it con1, and this is how we can create constants in TensorFlow: I'll use tf, then a dot, and then type constant, and inside this I'll give the value of the constant. Let's say the value is 10, so this is an integer-type constant. Similarly, I'll also create a floating-type constant and store it in con2, so I'll type tf.constant, and the floating value would be 3.14. After this I'll create a string-type constant, so again this would be tf.constant, and the string which I'll be giving is "this is Sparta". And finally we have a boolean-type constant, which I'll store in con4, so this will be tf.constant, and let's say the value is False. Now I'll hit Run, and let me print all of these values: I'll use the print function and print con1, con2, con3, and con4. Right, so we see that the first constant is a tensor of type integer, the second constant is a tensor of type float, the third constant is a tensor of type string, and the fourth constant is a tensor of type boolean. Now, you see that we only have the data types of all of these tensors, but we don't have their values. This is because, as I've already told you, we have to create a computational graph and then execute that computational graph inside a session, but till now we have not started a session. So let's go ahead and start a session first: I'll write sess = tf.Session() and hit Run. Now I'll run all of these inside this session, so I'll type sess.run and run con1, con2, con3, and con4. This time we have the values of all of these tensors: the value of constant one is 10, the value of constant two is 3.14, the value of constant three is "this is Sparta", and finally, the value of constant four is False. Right, so first we have to create all of the constants, then we have to create a session, and inside this session we have to run all of these constants. Now let me go ahead and perform some simple operations on these constants, so let me just type in "operations" over here. I'll do a simple addition operation: I'll type "addition" over here, and let's say the value of the first constant is 20, then I'll put a plus symbol and take in the next constant, whose value would be 30. So I am basically adding two TensorFlow constants, where the value of the first constant is 20 and the value of the second constant is 30, and I'm storing the result in addition. Similarly, I'll now multiply these two constants: multiplication = tf.constant(20), then the asterisk symbol, and then tf.constant(30), so this time I'm multiplying these two values. I'll hit Run.
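Collecting the constants walkthrough into one runnable script. This is a sketch assuming TensorFlow is installed; I use tf.compat.v1 (not used in the video, which runs TensorFlow 1.x directly) so that the same session-style code also runs under TensorFlow 2.x, where tf.Session was removed:

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # switch TF 2.x into graph/session mode

# Constants of different types, as in the walkthrough
con1 = tf1.constant(10)                  # integer
con2 = tf1.constant(3.14)                # float
con3 = tf1.constant("this is Sparta")    # string
con4 = tf1.constant(False)               # boolean

# Operations are also just nodes in the graph; nothing runs yet
addition = tf1.constant(20) + tf1.constant(30)
multiplication = tf1.constant(20) * tf1.constant(30)

with tf1.Session() as sess:
    values = sess.run([con1, con2, con3, con4])
    add_val, mul_val = sess.run([addition, multiplication])

print(values)            # note: the string comes back as bytes
print(add_val, mul_val)  # 50 600
```

Printing the tensors before the session shows only their dtype and shape; only sess.run produces the actual values, which is exactly the two-stage graph-then-session model described above.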
Now again, to see the resulting addition and multiplication values, I'd have to run these two inside a session, so I'll type sess.run and put in these two values, addition and multiplication. Right, so we see that 20 plus 30 gives us an addition value of 50, and similarly, when we multiply 20 with 30, we get a result of 600. So that was a basic operation with scalars, and we already know that tensors can have higher dimensions, so let's go ahead and perform addition and multiplication with these higher-dimensional tensors. Again, I'll just type in "addition" over here and take in the first constant, and inside this I'll give a list of values; let's say I'll take 1, 2, 3, 4, and 5, and I'll add this list to the next constant, and this time the second constant has the values 5, 4, 3, 2, and 1. Similarly, I'll also multiply: multiplication = tf.constant with the same values 1, 2, 3, 4, and 5 (let me put a comma over here), then the asterisk symbol, then tf.constant with the list of values 5, 4, 3, 2, and 1, and I'll hit Run. Again, I need to run these two inside a session, so sess.run, with addition over here, and after that I'd also need the multiplication value. So this is the result: when we add 1 plus 5 we get 6, when we add 2 plus 4 we get 6, and similarly, when we add each of the corresponding elements of these lists, we get all sixes over here. Now let's take the multiplication result: when we multiply 1 with 5 we get 5, when we multiply 2 with 4 we get 8, 3 times 3 gives us 9, 4 times 2 gives us 8, and again, 5 times 1 gives us 5. So that was addition and multiplication with respect to lists; now let me also do a simple operation on strings. Let me take the first string and name it str1; this is a constant, so tf.constant, and let's say I type "I love " over here, with a space at the end. Then I'll take the second string, which would be str2, and again this would be a constant, so tf.constant, and the value of this constant would be "tensorflow". Now I'll run this and execute it inside a session: sess.run(str1 + str2), and the result we get is "I love tensorflow". The first string is "I love " with a space, and the second string is "tensorflow", so when I add these two strings, the result is "I love tensorflow". So that was all about constants, and next we have placeholders. When it comes to placeholders, we don't have to provide an initial value; we can specify it during runtime, and this allows us to build our computational graph without needing the data. This is how we can create placeholders: tf.placeholder is the syntax, and inside that we just give the data type of the value which we will substitute later on during execution. So let's head back to Jupyter and work with these placeholders, and let me just type in "placeholder" over here. Let me create my first placeholder: I'll name it a, and it is tf.placeholder, and this would be of integer type, so tf.int32. Now I'll create another variable, which would be b, and the value of b would be a times 2, so let me run this. Now I'll run these inside a session: sess.run, and I want the result of b, so I'll put in b over here. Since we know that a placeholder takes in a value during runtime, this is when I'll feed the value to this placeholder a. To do that, I'd have to create something known as a feed dictionary: feed_dict equals, let me create a dictionary over here, with a as the key, and the value which I'll be giving to a would be, let's say, 5. Now let me run this and see what we get.
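The placeholder demo above, collected into one script (again a sketch via tf.compat.v1, assuming TensorFlow is available; the video itself uses the plain tf.placeholder of TensorFlow 1.x):

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

# A placeholder has no initial value, only a dtype; the actual
# value is supplied at run time through feed_dict.
a = tf1.placeholder(tf.int32)
b = a * 2

with tf1.Session() as sess:
    scalar = sess.run(b, feed_dict={a: 5})                # feed a single value
    vector = sess.run(b, feed_dict={a: [1, 2, 3, 4, 5]})  # feed a list of values

print(scalar)        # 10
print(list(vector))  # [2, 4, 6, 8, 10]
```

The same graph node b serves both runs; only the fed value changes, which is the whole point of building the graph without the data.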
Here, a has been assigned the value of 5 during execution time, and when we multiply this 5 with 2, we get the value of b, which is 10. All of this is happening during runtime, because with the help of a placeholder we can assign a value during execution. Similarly, let me give a list of values: instead of 5, let me give the list 1, 2, 3, 4, and 5, and I'll run this. So over here, 1 times 2 gives us 2, 2 times 2 gives us 4, 3 times 2 is 6, 4 times 2 is 8, and 5 times 2 is 10, so we get an array of values during runtime. Similarly, let me also create a placeholder for strings, so I'll type "string placeholder" over here. Let me create this placeholder: I'll name it str_name, and since it's a placeholder, I need to use tf.placeholder, and I'll be assigning a string to it during execution time, so the type is tf.string. Now I'll also create another string, which would be my_name, and the value of this string is "I am " with a space, and I'll be adding it to str_name. So let me hit Run, and I'll execute this inside a session: for that I'll type sess.run, and I want the result of my_name, so I'll put in my_name over here, and then I'll type the feed dictionary, feed_dict equals, let me put in the dictionary over here, and I'll assign the values of str_name: the values would be "Sam", "Bob", and "Charlie". Now let me hit Run and see what we get. So what we are basically doing is adding "I am " to the placeholder values, which we are giving during execution time, so we get "I am Sam", "I am Bob", and "I am Charlie"; these three values are coming from the feed dictionary during runtime. So that was all about placeholders, and finally we have variables. A variable is a program element which allows us to add new trainable parameters to the graph, and this is the syntax to create a variable: tf.Variable, then we give or initialize the value, and then we specify the data type of that variable. So let's head back to Jupyter, and I'll just type in "variables" over here. Let me create my first variable, and the name of that variable would be var1; we can create a variable like this: tf.Variable, and guys, you need to keep in mind that this V over here is actually capital. Now, after this, I'd have to assign a value to it: let's say I assign this variable a value of 20, and this is of integer type, so tf.int32, and I'll run this. Another thing to keep in mind is that whenever we declare variables in TensorFlow, they have to be initialized, and this is how we can initialize them: we have something known as the global variables initializer, and when we invoke this function, all of the variables which we have declared get initialized. I'll hit Run. Now let me execute this inside a session, sess.run(init), and with this I have initialized the variable over here. Now let me also go ahead and run that variable, sess.run(var1); right, so we have the result of var1, which is 20.
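The variable steps just shown, as one runnable sketch (same tf.compat.v1 assumption as before):

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

var1 = tf1.Variable(20, dtype=tf.int32)  # note the capital V in Variable

# Unlike constants, variables must be explicitly initialized
init = tf1.global_variables_initializer()

with tf1.Session() as sess:
    sess.run(init)          # initialize all declared variables
    value = sess.run(var1)  # now the variable can be evaluated

print(value)  # 20
```

Running sess.run(var1) before sess.run(init) would raise an uninitialized-variable error, which is why the initializer step matters.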
Now, since this is a variable, its value can actually be updated, so let me go ahead and update it. I'll name the result updated_var1, and the function would be tf.assign, which takes two parameters: the first parameter is the variable we want to update, and the second parameter is the value to which we are updating it, so I want to change this value of 20 to 25. I'll hit Run. Now let me run this inside a session, sess.run, and I need to pass in the variable, which would be updated_var1 (let me make it a small v over here), and I'll run this. So, initially the value of var1 was 20, but we have updated it and made its value 25. Now let's also go ahead and create a small linear model, so let me just type in "linear model" over here. This is how our linear model would look: W * x + b, where W and b would be variables and x would be a placeholder. Let me start by creating W: W is a variable, so W would be equal to tf.Variable, and I am initializing it with a value of, let's say, 10, and this is of integer type, so tf.int32. Similarly, I'll also assign the value for b: b is also a variable, its initial value would be 5, and this is also of integer type. And finally we have x, which is a placeholder, so x is equal to tf.placeholder, and since a placeholder does not take an initial value, just a data type, the data type is again tf.int32. I'll run this, and now what I'll do is multiply W with x and add b to it, so the equation would be W * x + b, and I'll store it in a variable named linear_model, and run this. Again, if I have to execute this, I have to run it inside a session, and since I have also created two new variables, I'd have to initialize them first: init1 = tf.global_variables_initializer(), and I'll hit Run. Now I'll create a session, sess.run, and execute init1 first, so I have successfully initialized the two variables W and b. Now I can go ahead and run this linear model: sess.run, and I want the result of linear_model, so I'll put in linear_model and then use the feed dictionary. Inside this feed dictionary I'd have to assign a value to the placeholder x, so x equals, let's say, a list of values, and the list would be 1, 2, 3, 4, and 5. Now I'll run this and see what we get. So when the value of x is 1, we get 15, which basically means 10 into 1 plus 5, which is 15. After that, when the value of x is 2, this would mean 10 into 2 plus 5, which is 25. Similarly, if x is 3, that would mean 10 into 3, which is 30, plus 5, which is 35, and the same is the case for 4 and 5. Now, let's go back to the 1980s: there was this researcher, Dr. Geoffrey Hinton.
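The whole linear-model demo above, W * x + b, condensed into one script (same tf.compat.v1 assumption as the earlier sketches):

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

W = tf1.Variable(10, dtype=tf.int32)  # weight (a variable)
b = tf1.Variable(5, dtype=tf.int32)   # bias (a variable)
x = tf1.placeholder(tf.int32)         # input (a placeholder, fed at run time)

linear_model = W * x + b

init = tf1.global_variables_initializer()
with tf1.Session() as sess:
    sess.run(init)  # initialize W and b before evaluating the model
    result = sess.run(linear_model, feed_dict={x: [1, 2, 3, 4, 5]})

print(list(result))  # [15, 25, 35, 45, 55]
```

This is the same pattern a real training loop would use: variables hold the trainable parameters, and the placeholder feeds in batches of data.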
He proposed that we could somehow mimic the way the human mind works, so that whatever our mind is doing, we could also do artificially. He proposed a paper, and everybody said he was doing something wrong, that this was not right; nobody liked the way things were going. But almost 20 years later he came back to this idea, and this time he brought it with a proof, and he went on to win a contest called ImageNet, and this time everybody had to listen. Now, ImageNet is actually a competition, held by a joint division of Stanford and Princeton University. These guys collected a corpus of images; they have a very large collection, around 1 million images, covering a thousand categories (you can just click on "explore" and see the category tree). What you have to do is build a model out of it, so that you can classify the images. People used to build hand-coded classifiers, and they worked up to a point, but at some stage they had to fail. Say you are trying to detect a fish: a fish will not always appear in the same pose for which you built the hard-coded rules. Or say you are trying to detect a cat, and your rule is: if the eyes are at the top right and top left and there is a whiskered mouth, then it's a cat; but the cat's face will not always be presented in that particular fashion. It might be a top view, a side view, a bottom view, and that is where these approaches used to fail. But then, in 2012,
Geoffrey Hinton came back with his paper and won the competition with around 94% accuracy, something that had never been achieved before, and it even surpassed human accuracy. That was when people really started listening to him, and he proposed that an artificial neural network is nothing but an attempt to mimic what is happening in our own mind. So what Geoffrey said about the neural network is this: an artificial neural network is a computational model inspired by the way the biological neural networks in our brain process information. Most of you know that our nervous system is made up of neurons, with axons and dendrites; I'll show you the picture. This is the most basic diagram of a neuron in the human nervous system, and every one of you might have seen it. How this works is that there is an exact replica of this structure chained together: the axon terminals of one neuron are connected to the dendrites of the next, and this is how the signal flows. So imagine that at your fingertips you have dendrites, and these neurons are connected in a chain all the way to your brain; there are millions of them, all connected. (By the way, one more feature of the Jupyter notebook that you may already know: you can add a markdown cell by pressing Escape and then M, and you can write HTML inside it, so I write img src to embed this image.) Right, let's now look at it one by one, slowly. So these are called
dendrites. The dendrites are the ones responsible for catching all the incoming information and sending it forward to the nucleus; the nucleus processes it and sends it down the axon, which throws it out through the axon terminals. Now imagine again that an exact copy of this neuron is attached at the end, so these axon terminals are connected to the next neuron's dendrites, and the signal keeps flowing, neuron after neuron.

Now let's say you touch a very hot glass. The moment you touch it, the dendrites at your fingertips sense that something is very hot, and they send this information toward your brain: dendrites to nucleus to axon terminals, then to the next neuron's dendrites, and on and on until it reaches the brain. The brain decides: this is very hot, you should remove your hand. And that decision flows all the way back through the same chain of neurons to the original dendrites at your fingertips, and only then do you remove your fingertips from the glass. When I say this information is traveling from fingertip to brain and back, remember there are millions of these neurons connected along the way, passing it on.

But in real life, when you touch the glass and pull away, how much time do you take? A tiny fraction of a second; your reflexes are very fast, you just pull the hand off. But imagine what is happening when you do this: it is not
your hand that is taking the decision to pull away. You might think your hand decides that, but no: the hand is like a slave of the brain. Every part of the body is like a slave of the brain; whatever we do, the instruction is sent by the brain, and only then does the body part react, and it all happens really fast.

Similarly, Geoffrey said: this is how I will also build a neural network. Let me show you how a basic neuron looks; here it is, this is the image, a basic neuron, but an artificial one. So Geoffrey said: this is how I propose it should work. Think again about the biological neuron: imagine it is doing nothing except passing information along. But how is it passing the information? By activating itself. When you touch a hot glass or something, the neuron activates itself, and only then does it transmit anything. If it had not activated, it would have sent nothing. And this plays a very crucial role. See, the brain is sitting millions of neurons away; it doesn't know what is happening to your hand until and unless these neurons send the information. What information could the brain process if it never receives any? So these neurons are very much responsible for sending the information, and they send it only when they are activated: the nucleus becomes activated, and only then does it send the information further. And there is also a basic kind of information processing going on inside; it doesn't mean the dendrites are just sending blank
signals. Let me go one level deeper. The dendrites collect the information and send it to the nucleus. The nucleus then decides: should I process this and send it forward, yes or no? And how does the nucleus decide? The nucleus has a particular threshold value: after processing the input, if it is activated beyond that threshold, it sends the signal further; if not, it keeps it to itself.

This is also how a basic artificial neuron, the one we are building right now, is built. Look at the picture: imagine the round thing you are seeing is the nucleus, and these incoming lines, x1 and x2, are the dendrites; this is how we are imitating the biology. The tail on the right is the axon, and it connects onward to other neurons, so you get a chain of multiple neurons.

Now, the mathematical model of what is happening inside is nothing but a linear combination of all the inputs, plus a bias. Let's say you have two inputs, x1 and x2. To every input you give some weightage: call the weight w1 for x1 and w2 for x2, correct? The neuron multiplies each weight by its corresponding input, so you get w1 x1 + w2 x2, and then it adds a bias to it. Now you will ask: what is the bias? Imagine there is another input called x0, and the value of x0 is always equal to 1, and there is a weight
associated with it called w0. So that bias w0 gets added in: the neuron multiplies all the weights by their corresponding inputs, sums all these values, adds the bias, and gives this output to something called an activation, which we'll come to. That is all a neuron does. Makes sense? Any questions? Please ask; it's very important to build intuition about what is happening inside. This is the smallest part of a neural network, and the smallest part of a neural network does exactly this: it takes all the inputs coming into it, gives each some weightage, multiplies each weight by the corresponding input, sums all of these, and then adds a small bias.

Now, how are this weightage and bias introduced, and by whom? Okay, so x1 and x2 come from outside: someone in the environment is giving us, say, the temperature of the glass. But who is giving us this w1 and w2? The answer is that in an artificial neural network, at the very first instance, they are randomly allocated. See, you all have learned ML, right? ML says: I know the output, and depending on the output I have to assign some weightage to these inputs so that they can give me the output. You know the output is going to be y, and you want to make some equation like y = x1 + x2; but that is not the correct answer, x1 + x2 alone will never give you y. So you add some random w1 and some random w2; those will not give you the answer either, so you make corrections to w1 and w2, again and again, until ultimately the equation gives you y. That is what happens: at the very first instance you take w1 and w2 at random, and then they get corrected step by step.
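As a quick sketch, the affine step just described, the weighted sum plus a bias, is only a few lines of Python. The numbers here are made-up stand-ins for the randomly allocated weights, not anything from the lecture:

```python
import numpy as np

def neuron(x, w, b):
    """One artificial neuron's affine step: w1*x1 + w2*x2 + ... + bias.

    The bias b is the same as having an extra input x0 = 1 with weight w0 = b.
    """
    return float(np.dot(w, x) + b)

# Hypothetical inputs and weights (weights start random and get corrected in training)
x = np.array([2.0, 3.0])    # x1, x2: values coming from the environment
w = np.array([0.5, -1.0])   # w1, w2
b = 0.1                     # bias, i.e. w0 * x0 with x0 = 1

y = neuron(x, w, b)         # 0.5*2.0 + (-1.0)*3.0 + 0.1 = -1.9
```

In the lecture's terms this is only the affine part; the activation function discussed later squeezes this raw y into a fixed range.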
Now someone is asking: does the process also reverse? Yes, it reverses as well: the model will not only go forward from x1, x2 to y, it can also come back from y to x1, x2. And does it process recursively? Just imagine there are thousands of these neurons; let me draw it again to show you. What you saw was a very basic one: x1 and x2 going in and producing an output. Now imagine another neuron next to it, also taking x1 and x2 and giving its own output; and this output y1 then becomes an input for some other neuron further along. Can you picture how it will look? Each output becomes an input for the next one, so they are all connected, throwing information from one to another to another. This is how a small network looks.

If I write down the theory, it says: the basic unit of computation in a neural network is a neuron, often called a node, or you can say a unit. It receives input from other nodes or from an external source, as I was saying: x1 and x2 might be inputs coming from another neuron, you don't know, or they might be given directly from the external environment. Each input has an associated weight, which is assigned on the basis of its relative importance to the other inputs, and the node applies a function f to the weighted sum of these inputs.

Say we have student data: based on their marks we have to assign a grade and decide whether the student has passed or failed. That is a rather hard-coded problem, I think, so let me take another example: suppose you are working on a machine learning problem where you are trying to predict loan defaults in a bank.
You have multiple people who take loans, and the bank wants to know: will this person repay the loan or not? You already have historical data with you; the bank gives it to you. In that data you have the features of each person, their credit history, and alongside that the label: he repaid, he didn't repay, and so on. Based on that you adjust your model; so there too you have your x and your y with you at that instance.

So everybody gets this function f now? This function f is called an affine function, and it is doing nothing fancy: you saw that it does the linear thing, multiplying each weight by its corresponding input x1, x2. You can now think of the weights as feature importance, and everything we talk about from now on is with respect to y. The more important x1 is, the higher the value of w1 will be; if x2 doesn't contribute much sense to y, the value of w2 will be very small; if y depends strongly on x1, w1 goes higher. And then there is your bias, which is the static part, because the x0 attached to it is always equal to 1.

Now let's look at what happens further. You get y out of this affine step, but notice: there is no bound on this answer. It might be greater than 1 or less than 1. If w1 and w2 are very small, the value of y might be almost equal to 0; or if w1 is very high, the value of y might also get very high. So this might bring some
ambiguity into our model: this y might sometimes be next to infinite, and it might sometimes be next to zero, depending on the weights, and we don't want that. We want our y to live in some bounded range. This is where the next players come in, whose name is activation functions.

So we now know how this y is made, how it is ultimately computed; and to adjust, or squeeze, the value of y, we need something that can control it, and these are called activation functions. What an activation function does is take the value of y and squeeze it into some certain range.

Shivam and Sanjay are saying: can you please go over this activation function again, we're not getting it properly. Sure, let's come back to it. You understand how the value of y is calculated; it is now totally dependent on w1 and w2. If the values of w1 and w2 are very small, say 10 to the power of minus 23, then y is going to be very small, next to zero. Or if w1 and w2 are like 10 to the power of plus 48, the value of y might be very high. There is no standard controlling this value; mathematically it can go from negative infinity to positive infinity, and that is not a good range for producing a particular output. So we introduce something called activation functions, and what they do is make sure that the output y you have got
stays in some particular range. Now, what range, and who decides it? See, if I ask for the mathematical purpose of introducing an activation function, it is to squeeze the value of y into some range, the majority of the time between 0 and 1. So the purpose of the activation function is to bound y in some range, and depending on the function this range might be minus 1 to 1, 0 to 1, or sometimes 0 to infinity, different things.

That was the mathematical reason; but let me tell you statistically why we are doing this. Notice that up to this point your model is a linear model; everybody agrees with me? All the things we have done up to this step are linear, nothing nonlinear has happened yet. And linear models are a little bad at classifying when there are more than two classes. Let's say there are three classes: one is crosses, one is circles, and one is triangles. Can you draw a single straight line that divides these three groups of data? No. If it had been two classes, you might have drawn a single line and divided the data into two, but with three-class data, how will you divide it? You would ultimately have to make a triangle or some similar shape; only then can you divide this data.

Let me show you a plot; let me give you an example first, then I'll show you the whole picture. Let me hide this. Now, there is data in front of you: red dots and blue triangles on the screen; everybody
can see this. Now, can you make a linear model that separates these? In two dimensions you cannot. Anybody wants to try? Okay, here is what I did: I introduced a third dimension, and now I can cleanly separate them by introducing a hyperplane: everything above this hyperplane is red, everything below it is blue. Makes sense to everyone?

So this is also why we use activation functions: to make sure we can classify things better, to make things cleaner. Mathematically, we are doing it because we want to squeeze the value of y into some range; but statistically, we are introducing nonlinearity into a linear model, because linear models are very much not capable of classifying things well as the number of classes increases. And we just saw that as soon as we introduce a new dimension into the data, as soon as we introduce nonlinearity, it increases the chances of finding a separating model. So let me say it three ways: first, mathematically, we are squeezing y; second, graphically, we are introducing a new dimension; and third, statistically, we are introducing nonlinearity.

Now you might be asking what these functions actually are; very good question, we have activation functions of several kinds. See, this is how a sigmoid looks, this is how tanh looks, this is how ReLU looks. Notice everything is bounded: sigmoid is bounded between 0 and 1, tanh is bounded between minus 1 and 1, and ReLU is bounded from 0 to infinity; it lets through everything that goes above zero. We also have the step function, which says we can only have two outputs, either 0 or 1: if something is negative it gives 0, and if
something is positive it gives 1. So step, sigmoid, and tanh each do, in their own way, exactly what we want from an activation function. ReLU you can understand like this: if something is negative it outputs zero, and if something is positive it lets through whatever the value is. Sigmoid, on the other hand, squeezes whatever comes in to a value between 0 and 1.

Why do we want this? Let's say you have these outputs y1 and y2, and their values are very large; then y3 and y4 computed from them will ultimately become very large too, and we know that is not what we expect. We always expect our output to be between 0 and 1 if it is a classification problem, and if it is a regression problem, then the output is often kept in some range from 0 to 1 with a multiplicative factor. You all agree with me, right? In machine learning you did the same thing: whenever you got a classification problem, you one-hot encoded it. You never said the output is going to be literally "cat" or "dog"; you said, okay, if 0 comes it is a cat, if 1 comes it is a dog. So the same thing happens here: the values of y3 and y4 must never go beyond a limit, and to control that limit we have to squeeze the value. And see, when I say squeezing the value, that is the mathematical view; if you look at it statistically, what we are doing is introducing nonlinearity, because linear equations are not good at separating nonlinear data. You saw that the data became easily separable with just a simple hyperplane once we did this.
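To make these shapes concrete, here is a small sketch of the four activation functions just discussed, in plain NumPy rather than anything TensorFlow-specific:

```python
import numpy as np

def step(z):
    """Hard threshold: 0 for negative input, 1 otherwise."""
    return np.where(z < 0, 0.0, 1.0)

def sigmoid(z):
    """Squeezes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Squeezes any real number into the range (-1, 1)."""
    return np.tanh(z)

def relu(z):
    """0 for negatives; lets positive values through unchanged (range 0 to infinity)."""
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
# step(z) -> [0., 1., 1.]    relu(z) -> [0., 0., 2.]
# sigmoid and tanh stay strictly inside their bounds
```

Applying each function to the same inputs and plotting the results is a quick way to see the bounded ranges the lecture is describing.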
So, coming back to the feed-forward neural network. See, this was the input x1 and this was the input x2. What a feed-forward neural network expects is that a hidden node should be connected to all the nodes in the previous layer. This layer here is called the input layer; where your hidden neurons sit is called the hidden layer; and where your outputs come out is called the output layer.

When you are building this feed-forward network, your responsibility is just this one rule: whichever neuron you are standing on, go to the previous layer, and all the neurons in that previous layer should be connected to this particular neuron. Say we are standing on this hidden neuron: the previous layer is the input layer, so all the neurons in the input layer should be connected to this hidden neuron. The output node expects the same: its previous layer is the hidden layer, so all the neurons in the hidden layer should be connected to the output node. Makes sense?

So what are you doing? You have stacked all your neurons layer by layer in a fixed format. Whatever you built in the earlier picture, the x1 and x2, and it might be x3, x4, up to x10, those are your inputs; you collect them all and put them inside a layer. Then the next stack of neurons becomes a hidden layer, and you will not have only one neuron there, you will have multiple neurons. Also, what is happening is that
rather than having just one neuron, you have made a multi-neuron system; this is a feed-forward multi-neuron model. In the previous photo I showed you, there was just one neuron with multiple inputs, but now you have multiple neurons, stacked into layers, and you can have an image like this one.

Look at it from the left: these are your inputs x1, x2, x3, x4, forming the input layer. Then comes hidden layer one, where all your affines and activations happen; then hidden layer two, where again the affines and activations happen; and then your output layer. Let's call this first neuron n1. This n1 is connected to x1, and x2, and x3, and also to some bias. So inside n1 the affine happens: w1 x1 + w2 x2 + w3 x3 plus some bias b gives some output y1, and there is a very small activation function residing right there as well. Similarly for n2: this n2 is also connected to all the inputs, so n2 will also do its own affine, bring in its own small activation function, and give some output y2; and y3 will behave the same way.

Now y1, y2, y3 become inputs for the neurons in the next layer; let's call one of them n4. This n4 is connected to y1, y2, and y3; again an affine happens, and let's say it gives an output, call it O1.
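The layer-by-layer picture just described, every neuron doing its own affine plus activation over all the previous layer's outputs, collapses neatly into one matrix multiplication per layer. A minimal sketch, with made-up random weights standing in for the untrained network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_layer(x, W, b):
    """One fully connected layer: each row of W holds one neuron's weights,
    so W @ x + b performs every neuron's affine step at once, then the activation."""
    return sigmoid(W @ x + b)

rng = np.random.default_rng(0)   # weights start out random, as in the lecture

x = np.array([1.0, 2.0, 3.0])                          # inputs x1, x2, x3
W1, b1 = rng.normal(size=(3, 3)), rng.normal(size=3)   # hidden layer: n1, n2, n3
W2, b2 = rng.normal(size=(2, 3)), rng.normal(size=2)   # output layer: two neurons

h = dense_layer(x, W1, b1)   # y1, y2, y3 from the hidden layer
o = dense_layer(h, W2, b2)   # O1, O2 from the output layer
```

Each row of `W1` is one hidden neuron's set of weights, which is why every neuron sees all the inputs yet produces a different output.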
O1 is here, and similarly O2 is here. Now O1 and O2 will in turn become inputs for the next neuron; just as a moment ago we had outputs y4 and y5 feeding a neuron, here O1 and O2 feed the output neuron and give you the final output. This is how a multi-layer neural network works.

"How will we know how many layers?" Correct, this is one of the very good questions: first of all, how many layers do you want? Here you have two, l1 and l2. And you might have another question: in one layer, how many neurons can I fit? There are currently four; I can fit five, six, ten also. So who is going to tell me how many neurons and how many layers? This is exactly the design work you are going to do here. Everything else, this whole neuron mechanism, is already built for you; the only thing you have to do is come up with the optimal number of neurons and the optimal number of layers for your neural network. And who decides this? You. How? The first answer, and everybody will tell you this, and I am also giving it flat out: trial and error. Say you are doing a very new problem and have no idea; you go with a trial-and-error approach. Second, as you become a bit more of an expert, you start learning how your model reacts to a particular architecture, and then you look at model capacity.

That is a very good question, but also an advanced one, so let's count. When neuron n1 is connected to x1, x2, x3, there are weights w1, w2, w3 on those lines. When n2 is connected to the same inputs, it has w4, w5, w6; the line from x1 to n2 carries another weight, w4. So imagine how many weights there are: for 3 inputs and 3 neurons, there will be 9 weights. Then 3 neurons feeding 2 gives 6 more, and 2 feeding 2 gives 4 more, so 9 + 6 + 4 = 19 weights. And then you have the 3 biases here, so 19 + 3 is 22. So for this small neural network you have 22 values to find, because w1, w2 and the biases were all taken as random, and you have to adjust them in an optimal way to get the output y. Not predict, exactly, but optimize.

Now imagine your data has only 10 rows. Does that justify having 22 things to adjust from 10 rows of data? This is what capacity checking is: you have to check whether the data is even capable of pinning down these values. Let's say you are doing simple addition: given two numbers, the model has to output their sum, and you have a thousand examples for it; but you build a network of a thousand layers with a thousand neurons each. Does the model require that much? No; the signal will get lost somewhere in the third or fourth layer. And see, what I just said in the last couple of sentences will come to you slowly, with experience.
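The weight-counting exercise above is easy to automate. One note: the lecture adds 3 biases for a total of 22; with the more common convention of one bias per hidden/output neuron you would count 3 + 2 + 2 = 7 biases instead, and that is what this sketch assumes:

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes = [3, 3, 2, 2] means 3 inputs, hidden layers of 3 and 2, 2 outputs.
    Assumes one bias per non-input neuron (the lecture counts one per layer instead).
    """
    weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights, biases

w, b = count_parameters([3, 3, 2, 2])   # w = 9 + 6 + 4 = 19 weights, b = 7 biases
```

Running this on larger shapes makes the capacity point vivid: the parameter count grows with the product of adjacent layer sizes, so deep wide networks need far more data to pin down.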
This will come slowly, with experience. Let me tell you what I do whenever a new problem comes to me: first of all, I read research papers around it. Because, to be frank with you, we are not researchers who invent these things; we are deep learning engineers, and we have to implement and solve a business problem. So first you read some papers, you get the intuition of how to build a neural network around the problem, then you do some trial and error: you make changes, you slowly understand which change affects what, and you accumulate prior knowledge, like the capacity idea I just told you about. Say you have a very large dataset but you build just one layer: then the data is underfitted. Or say you have very little data but you build a very big neural network: this is overfitting; you have far too many neurons for what your model needs to be predictive. Slowly, slowly, you will build that intuition; when you do your first neural network problem, you will understand.

You might be thinking that I am not taking you to the practical, that I am not showing you any example code yet, but this theory is really important. I could jump directly to the code, but that would make no sense to you after two or three weeks, because code anybody can write; the point is knowing why. After two or three months, when you are making your own neural network and it fails, the question of which activation function to use is exactly what will come up. Okay, so see, let me show all of you what he is asking about.
Let's say you want to add more layers: currently there are only two hidden layers, and you can keep adding more. If you want, you can also increase the neurons per layer, or reduce them, say down to six, or down to four. This is how your neural network will look: these are the two inputs, every input is connected to every neuron in the first hidden layer, then the second, then the third and fourth in the same way, and you can watch how the data flows through it.

"If we put every x into each neuron, does every neuron give the same y?" Very good question, but no. It would only be the same if they had the same weights. When x1 is connected to n1, that edge has a weight w11; but when x1 is connected to n2, that edge has something called w12. So there are different weights: the value comes from the same x1, but because it is connected to different neurons through different weights, each neuron gives a different output. Understandable? For everyone else, what KM is asking is: okay, I understand you are doing the affine, but since the values of x1 and x2 are constant, won't the output always be constant? No, because y1 is made by this equation, and this equation is also driven by w1 and w2, and those values are different in each case, so the value of y will also be different in each case. Makes sense?

And then you are asking about a real-world problem, how a neuron behaves there; I have that too, don't worry. "You mentioned overfitting and underfitting: how do we come to know, based on the y?" Again a very good question; this is actually quite intuitive.
So the question is: I used these two words, overfitting and underfitting; how do we address this problem? What we do is divide our data into two parts; you have done this in ML also: training data and validation data. And to be really proper, three parts: training, validation, and testing. Whenever you are training your model, you slice off a piece of the data, called the validation data, and you never show this data to the model. The model never trains on it, so it never learns anything about it. Once your model is made, you test it on this held-out data, look at the predictions it gives, and compare them with the true answers: okay, this is not what I wanted, so is the model underfitting or overfitting? Concretely, you build a confusion matrix and look at it: say your data had ten classes, and the model is pushing everything toward just one class; then the model is overfitted, or fitted at a very poor level.

Then we have another question from KM about the dog image. When we expand this image mathematically, it comes out as inputs, say x1, x2, and x3. Now this x1, x2, x3 is sorted out from our end, and it is connected to neuron n1: x1 connected to n1 carries the weight w11, but when it is connected to n2, it will have another random weight called w12. Makes sense? And the number of neurons you can increase as much as you want: n3, n4, n5, seven, it doesn't matter, but you have to make sure that this w11 is not the same weight across n1, n2, and n3. Good, thank you.
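The hold-out idea just described can be sketched in a few lines; the split fraction and seed here are arbitrary choices for illustration, not anything from the lecture:

```python
import random

def train_val_split(rows, val_fraction=0.2, seed=42):
    """Shuffle, then slice off a validation set the model must never train on."""
    shuffled = rows[:]                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]   # (train, validation)

data = list(range(100))
train, val = train_val_split(data)      # 80 rows to train on, 20 held out
```

Comparing the model's scores on `train` versus `val` is how you catch the overfitting the lecture describes: a model that aces the training slice but fails the held-out slice has memorized rather than learned.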
The number of neurons you can increase as much as you want — n3, n4, n5, n7, it doesn't matter — but you have to make sure that w11 is not the same weight for all of n1, n2, n3. Good, thank you. Now, inside the matrix you can see there are multiple values. The values range from 0 to 255, where 255 stands for utter black and 0 is for white: the more black a pixel is, the higher its value. So these are the multiple inputs you will have inside your images, and all these inputs go into the network. In this particular example you are seeing 3, but in reality you will have more: all the values in the first row, the second, third, fourth, fifth, down to the 28th row, are stacked one by one in the input layer. Ultimately 28 × 28 = 784, so you will have 784 inputs just for this small image. And you were all asking how this works in the real world. Malik said: suppose you have an image of 500 × 500 — that becomes around 250,000 inputs for a single image. And if you have ten thousand images, multiply that by ten thousand. These inputs go in one by one: one image goes through the first time, the second image the second time, then the third, and so on. The model does some calculations in the hidden layer, predicts, and gives you an output. Okay, so now we turn the page and move to the multi-layer perceptron. The multi-layer perceptron architecture simply means you might have multiple hidden layers: apart from the input layer and the output layer, you might have hidden layers in between them.
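Backing up to the pixel arithmetic for a moment: stacking 28 rows of 28 pixels into 784 inputs can be checked directly (the random image here is just a stand-in for a real MNIST digit):

```python
import numpy as np

# A stand-in 28x28 grayscale image with pixel values 0..255.
img = np.random.default_rng(1).integers(0, 256, size=(28, 28), dtype=np.uint8)

# Stack the rows one after another: this is the 784-value input layer he describes.
flat = img.reshape(-1)
print(flat.shape)    # (784,)

# A 500x500 image would give 250,000 inputs for a single image.
print(500 * 500)     # 250000
```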
Whereas a single perceptron is only a linear function, a multi-layer perceptron will have multiple linear and nonlinear functions — a combination of them. If you were not able to follow along today, don't worry: I'll push the notebook, and then you can catch up. I also have an image of a multi-layer perceptron — this is how it looks; let me minimize my screen a little for you. Just to summarize things: this is your input — x1 and x2 are your two inputs. You have a single hidden layer, and here you have one neuron. You already know the calculation going on inside: the affine function. What the affine function does is take input from all the neurons of the layer behind it. So if this is hidden layer h1, it takes inputs from all the neurons in the input layer: x1 is connected, and x2 is also connected. Whenever one neuron is connected to another, it is connected through some weight corresponding to that edge — let's say w1 and w2. You might also have an x0, and this x0 is always equal to 1, so you can say it is just a bias — call it b. This b is a constant attached to the neuron as well. Now, when x1 and x2 are connected to the next neuron down, they are connected again through some weights, but those weights might or might not be equal to the w1 I described here — in fact, they are never going to be equal. These are different, different weights. To make it simple, you can think of it like this: whenever there is a line connecting two neurons, that connection carries its own weight.
Every connecting line carries a weight, and this weight is different for each edge. You can also see the equation written down here: w0·x0 + w1·x1 + w2·x2. You know that x0 is equal to 1, so we replace x0 with 1, and the w0 term on its own is what people like to call b — the bias of the neural network. The w's are called the weights. Then you do a summation here, and that gives the final output. Also remember that after every neuron there is an activation function sitting there — here, here, and here — and they are present on the output section as well. And there are different types of activation functions, from the step function to sigmoid to softmax; I will slowly go through each of them, and you'll understand how they interact and how you might change them according to your needs. Visible now? So, once you are done with this perceptron layer — this whole thing is called a feed-forward network: you start from here, go through this neuron, some calculations happen, then you come here, which takes connections from those neurons, and then you get the output. Now let's say the output comes out to be some y1, while the actual output should have been y. There is an error associated with this: the error is y minus y1. y was the actual answer that should have come, but you are saying the answer is y1.
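The single-neuron calculation above — affine function first, then activation — can be written out in a few lines (the input values, weights, bias, and the choice of sigmoid here are all invented for illustration):

```python
import numpy as np

def sigmoid(z):
    # One common activation function; a step or softmax could sit here instead.
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0])   # inputs x1, x2
w = np.array([0.8, 0.3])    # weights w1, w2 -- one per connecting edge
b = 0.1                     # bias: the x0 = 1 input times its weight w0

z = np.dot(w, x) + b        # affine function: w1*x1 + w2*x2 + b
a = sigmoid(z)              # activation applied on top of the affine output
print(round(float(a), 4))
```

Every other neuron in the layer runs the same calculation, but with its own `w` and `b`, which is why the same inputs produce different outputs per neuron.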
So y minus y1 is actually the error — the error made by your model. You say to the model: you have made some kind of mistake. If you go back inside your session, you might see some errors, and that error equals y minus y1 — so go back and make the changes. Now, look at the function we have computed so far, up to this point y1: how many dynamic, non-static things have we taken? x1 is always going to be static; no one is going to change x1 and x2, because these are given by the data, and the data never changes itself — it is you who has to change. So what have we assumed? The values of w1, w2, and w0 — all the w's here are values you have assumed. So the model says: the values you assumed — w0, w1, w2, up to wn — might need some kind of correction, because the current values of w are giving some error, y minus y1. To reduce this error, you go back, check your values of w, and shift them to such an extent that they minimize this error. Does what I just said make sense to everyone, statistically? Now, this process is called back propagation. Back propagation. "Can we try different activation functions?" Yes, you can try whatever activation function you want, but it depends on the problem — on which function you want to choose.
On choosing functions, I'll tell you about a particular example. I was working on a colorization library (DeOldify), where we are trying to restore colors to images that don't have color. If you talk about images from before 1924 — we have some German artworks, artifacts from before 1924 — those are totally black-and-white images. So what we have to do is fill the color back in. I'll tell you, it's a complex process, but there is a format for images called LAB images; if any of you has an interest in photography, or if you understand how images work, you might have heard this term — LAB images. Now, the range of LAB values is from −128 to 128. If you remember from the previous class, all our activation functions had a range of something like 0 to 1, or −1 to +1. But here I wanted something that goes from −128 to +128. So accordingly, you can change your activation function and make your own function out of it. Makes sense? "Yes — in the last lecture you said that whenever you are trying to bound your values within a certain range, you have different activation functions to use; that just reminded me." Correct, totally correct. So this is all about feed-forward: in one line, feed-forward is going from input to output. If you want to come back from output to input to make corrections, that is called back propagation. Now, back propagation is one of the most used — not just used, it is one of the most obvious things in AI. And why is it famous?
Because this is the main backbone of AI — this is how AI learns by itself. People say that AI is learning from itself; what it really learns from is its errors: whatever error it makes, it goes back and checks. Think of the human tendency of error-checking: say you are doing some calculation and you make a mistake. What do you do? You go back through the calculation, see "okay, I made this mistake here," and then you correct it. This is also how the machine understands. The machine does some calculation, finds that there is some error, and then, based on the assumptions it took, it thinks: these are the things I assumed, and this is now the error. So it goes back and says: I will tweak my assumptions, and then I'll see what the error is again. This is an iterative process. So in back propagation, what will it do? It will try to change the values of w1 and w2, then go in the forward direction again, and again there will be some error — but this time the error will be less than the previous error. Then it goes back again, comes forward, backward, forward, backward, forward — this happens multiple times, and at some point the error saturates. The error graph will look almost like this: error is on the y-axis, and on the x-axis is time, or the number of epochs, or steps — steps meaning how many times you have gone forward and backward. You can easily see that the error was very high at this particular point, but then it slowly, slowly decreased; it decreased till here, and now it has almost saturated.
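The forward/backward cycle and the saturating error curve can be simulated on a toy problem. The data point, target, learning rate, and the squared-error gradient below are all assumptions made for this sketch; the lecture's y = w1·x1 + w2·x2 model is kept:

```python
import numpy as np

x = np.array([1.0, 2.0])    # fixed inputs x1, x2
ya = 3.0                    # desired (ground-truth) output
w = np.array([0.0, 0.0])    # initial guessed weights
alpha = 0.05                # learning rate

errors = []
for step in range(30):
    e = float(np.dot(w, x) - ya)   # forward pass, then the signed error
    w = w - alpha * e * x          # backward pass: nudge the weights
    errors.append(abs(e))

print([round(v, 3) for v in errors[:5]], "...", round(errors[-1], 4))
```

Each pass shrinks the error by a constant factor here, which is exactly the flattening curve drawn on the board: a steep drop at first, then saturation near zero.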
It is an almost saturating trend: at one point it will be like parallel to the x-axis — it will become parallel to the x-axis. This is how it looks. "So how will this program go back and forth? Is it automatic, or do we have to do this manually each time we run the program, running it in the reverse order?" Good question. First of all, we will do it by hand — we will do it manually first. But the good news for you is: no, you don't have to do it manually; all these things are handled by the framework itself. "How many times do we have to go back and forth for the minimizer?" That, you have to decide. You will watch the error graph continuously and then decide: okay, now I want to stop — or, okay, I still want to continue. That is in your hands. I mean, if frameworks could even tell you where to stop, then the problem would be solved entirely — we would be doing nothing; we would just open them and say, "here is the problem, read it and do it yourself." No: the framework will do all the calculations for you, but the judgment of where to stop, and where to push further with the problem, is still in your hands. Makes sense? Okay, let's move forward. I have one more chart for you — let me paste it. Don't read this now; I will walk you through it. This is just for your own reference, so that on a weekend, when you go back and want to revise, you don't have to rewatch the whole video or scroll through it — you can just read through the notebooks. That's why I am copy-pasting the theory here as well. So, what happens — I have already told you about all the edges.
All these lines that you see here are called edges; all the connecting things are edges. For every corresponding edge you have a weight, and these weights are by default randomly assigned. They are randomly assigned at the very first point, when the problem starts — at that particular instant they are random. Now, for every input in the training data, the ANN is activated, and this is what I told you: it will do all the calculations, the feed-forward pass, check what the output is, and then compare that output with the desired output that we already know. You might have heard this term again and again: this is called the ground truth. See, deep learning is not the same as traditional programming. In traditional programming you are given the input and asked for the output; here, it's not like that. Here they give you the input and the output both, and then you have to define some way to convert the input into the output. This is how machine learning works: they give you the input, and they give you the output as well. Take image classification as the first example: it's not like they hand you an image at the very first instant and say, "now tell me what it is." No — they give you a training dataset where they provide the output along with each image; they also tell you, "okay, this one is a 9, this one is a 5," written right there. So you have the ground truth with you. Then what you do is compare your result with the ground-truth output, and the error is propagated back to the previous layer.
The error is noted and the weights are adjusted accordingly; this process is repeated till the output error is below a predetermined threshold value, or till the point where you judge that it has saturated. This is what I told you as well. Now let me show you how it goes, one by one, mathematically — it's pure math, so just a little attention and we'll be through. Actually, let me pick up a smaller problem and show you with that. Let me write it down: error = y actual − y predicted. Call the error E. Now, how is this y calculated? Let's make it as simple as possible — I'll take just this, and that will make you understand how things work. So we have inputs x1 and x2, and we have an output y. Let the actual, desired output be ya — this is what we require. And the value of y is: y = w1·x1 + w2·x2. Everybody agrees with this, right? Then let's define the error: E = ya − y — or however you want to write it; y − ya doesn't make much difference. So if I say E = y − ya, then instead of y I can write: E = w1·x1 + w2·x2 − ya. This doesn't look like much yet, but let me write it down.
So: E = w1·x1 + w2·x2 − ya. Now the error is this, and the final equation of the error is in front of you. What do you want to do next? You have to see how to minimize this equation. How will you minimize it? Now, you might be thinking: this equation is just a simple linear equation, I can solve it linearly. But imagine that instead of x1 and x2 you have a thousand inputs, up to xn. Then this is no longer a 2-D equation — it is a thousand-dimensional equation. You cannot just simply work things out and check by eye; you cannot just plot a graph and look at it. This is a little more complex than we thought. Let me show you how the graph might look. (Don't read this part yet — for the time being, just ignore these details.) So here is your error. One quick point: I can also write the error as J(W) — you might see this term in the literature, J(W), for the error. Now, when you plot this, what can you see? x1 is a constant, x2 is a constant, and ya is a constant — you all agree, right? I'll put a star and note: x1, x2, and ya are constants. You cannot change them; they have no effect on E. The ones that can change E are w1 and w2. And none of you pointed it out, but we also missed the bias b — the bias b belongs here too. It's not going to make a big difference, but still, for completeness, let's keep it. Any doubts so far? Please ask me questions.
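That E depends only on the weights — with x1, x2, and ya fixed by the data — is easy to see by evaluating the error equation at a few weight values (all the numbers here are invented for illustration):

```python
# The error equation from the board: E = w1*x1 + w2*x2 - ya.
x1, x2, ya = 1.0, 2.0, 3.0      # fixed by the data: constants
errors = {}
for w1, w2 in [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]:
    errors[(w1, w2)] = w1 * x1 + w2 * x2 - ya
print(errors)   # only the choice of weights moved E
```

Same data, three different errors: the weights are the only knobs, which is why the error gets plotted against the weights and not against the inputs.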
So now we can see what is happening: E, the error, is dependent only on w1, w2, and b — nothing else changes E. So what we will do is plot this error with respect to the weights, and you see this is how the graph looks. And what do we want to do with the error? If the error is y − ya, our aim is to make y equal to ya. If y equals ya, then the error is equal to zero — or at least tending towards zero. But if I speak statistically, this error is never actually going to be zero; it is only ever tending towards zero. So our main focus is this: wherever E currently is, our main motto is to minimize the current error — whatever the error is, we have to minimize it. The value of this error can be read off here: you can simply look at the graph, say "okay, this is the value of E," take the values of W, and plot them. But as I told you, these errors don't come from just two inputs; we might have a thousand inputs, so we might have a thousand w's and one b as well. We cannot just plot that — it would become a thousand-and-one-dimensional graph. So now we have to think mathematically about how we can minimize the error E. For minimizing it we have the equation of E — don't think about the graph; just think that we have the equation of E. This is what the equation of E is, and this is what we have to minimize. Now, think back to differential calculus.
In differential calculus we learned: if you want to minimize any curve, any polynomial curve of order greater than two, what is the process? Anyone, tell me — yes: take the first derivative. Correct, very much correct. As Omar said, you take the first derivative. See what the first derivative tells you: it tells you wherever there is a change in the slope of the graph. From what we learned in our higher education about differentiation: it gives you the coordinates of the points where the graph changes its slope from positive to negative, or from negative to positive. So if you take the first derivative, it will tell you: see, here it was at a negative slope and then it turned positive; here it was positive and then turned negative; positive, negative, positive, negative — it will tell you all the places where the slope changed sign. "And for a global minimum or maximum you have to take the second derivative." Correct, correct. That is the view from differential calculus; if you go by the geometric description, it says: where is the derivative of a function zero? Where the tangent is parallel to the x-axis. And where will the tangent be parallel to the axis? Wherever the slope equals zero — whether the slope was going from positive to negative or from negative to positive. So both mathematical perspectives tell you the same thing: at those points there is a change in the slope. Similarly, we can find the positions where the slope changes.
Now, as Omar rightly suggested, we will get all the points where the curve is at a maximum and also where it is at a minimum — and we only want the points where it is a minimum. Those are the points we are interested in: this one, this one, and this one. But there is still a problem: some of these have a very high error. This one, which I have marked as E1, is a very high level of error. We don't want a high level of error, we want a low one — we want E3; we want E3 to be our error. So what do we do? We take the double differentiation: first the first derivative, then the second. Omar, can you tell me the condition — greater than or less than? With the first derivative you know where the slope changes, but if you want to know whether a point is a maximum or a minimum, you find out by taking the double differentiation and checking its sign. If the double derivative of your function at a point is greater than zero, that point is a minimum; if the double derivative is less than zero, it is a maximum. Those are the conditions; you check accordingly, and you go for the minima. Now, again: you will discard these points — you might discard this one, this one, and this one as well — you remove all the blue points, and you are left with only the green points. You want this E3 to be the one. But we don't have any mechanism through which we know whether we have reached the global minimum or not — this is the dilemma in deep learning.
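The two derivative tests — the slope changing sign, then the sign of the second derivative separating minima from maxima — can be checked numerically on a made-up error curve (the function, the grid, and the step sizes here are all invented for illustration):

```python
import numpy as np

def f(w):
    # A toy 1-D "error curve" with several local minima and maxima.
    return np.sin(3 * w) + 0.3 * w ** 2

def d1(g, w, h=1e-5):
    # Numerical first derivative (central difference).
    return (g(w + h) - g(w - h)) / (2 * h)

def d2(g, w, h=1e-4):
    # Numerical second derivative.
    return (g(w + h) - 2 * g(w) + g(w - h)) / h ** 2

# Scan for points where the slope changes sign: the candidate extrema.
ws = np.linspace(-3, 3, 20001)
slopes = d1(f, ws)
extrema = []
for i in np.where(np.sign(slopes[:-1]) != np.sign(slopes[1:]))[0]:
    w = ws[i]
    kind = "minimum" if d2(f, w) > 0 else "maximum"  # the double-derivative test
    extrema.append((round(float(w), 2), kind))

print(extrema)
```

The scan finds every point where the slope flips sign, and the second derivative then sorts them into minima and maxima — but nothing in the list tells you which minimum is the global one, which is exactly the dilemma described above.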
What we do — this whole mechanism that I am showing you right now — is called gradient descent, or GD. You might also have heard of SGD, stochastic gradient descent: when you do gradient descent in batches, it is called stochastic gradient descent. Okay, I will take a pause here; anyone who has not understood this, please ask — I know I might have gone through it very quickly. And one more point, for the people who are not math-savvy: this math is just for your understanding. You won't be doing all this math in your programs or in your deep learning work — you will be writing the direct code, and all this mathematics is already done by the framework. So don't worry about it. But again, my point is: please try to internalize this, because it is necessary — if we do not understand what the framework is doing, then there is no sense in the framework doing it. So please ask if you have any doubt; do not hesitate — I know it took me three days to understand this thing. Okay, I am assuming everybody understands so far. So what I will do is take the first derivative of E. You know what E equals — let us write it down. You have seen that E depends on either w1 or w2. So take the derivative with respect to w1: what happens? It gives you x1, and everything else goes to zero, because the remaining terms are constants with respect to w1 — differentiating w1·x1 gives x1 times 1, totally correct, and w2·x2 is again a constant; constant plus constant contributes nothing. And if you take dE/dw2, it will give you x2.
Don't worry — okay, just a second, it's in the chat. Someone is asking whether this is really required: no, it is not required; this is just for your reference, don't worry. "Sharon: don't you need to add a constant term to the derivative?" Ah, yes — you would add constant terms, but here we know those constants differentiate to zero; yes, there could be some c1, c2, but as of now they are constants for us and don't bring much value. Let me put up this picture. See, that was just the derivative; now for adjusting the new values. What will the adjusted values of w1 and w2 be? We already have the old values of w1 and w2, so the new values will be: w1 = w1 − dE/dw1, and w2 = w2 − dE/dw2. But there is a special term called alpha associated here — it gets multiplied in; I'll tell you what alpha is (I don't know how to type alpha in LaTeX here). So now you might ask: okay, these are the new values of w1 and w2 — what happens now? The old values of w1 and w2 get replaced by the new ones, and then you go forward again, get the error again, come back, change the values, go forward, backward, forward. But there is one thing you might ask me: what is this "a" here? It is actually alpha, and it is also called the learning rate — you might see people calling it LR. Now, what is the learning rate? Just pay a little attention here. Say the previous values of w1 and w2 give you some error at this point — the one I drew very small here — and the next one will be here, then here and here. So it varies with how gradually you move your w. Let's say you are making a very small change in w.
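One update from the formula on the board — w1_new = w1 − α·dE/dw1 — with concrete numbers. The data point, target, starting weights, and alpha are invented; I have also used the squared error E = (y − ya)²/2, so the gradient (y − ya)·x1 carries the sign of the mistake — a small assumption layered on top of the blackboard derivation:

```python
# One back-propagation update, mirroring the blackboard formula
#   w1_new = w1 - alpha * dE/dw1,   w2_new = w2 - alpha * dE/dw2
# with E = (y_pred - ya)**2 / 2, so dE/dw1 = (y_pred - ya) * x1.
x1, x2, ya = 1.0, 2.0, 3.0      # invented data point and target
w1, w2, alpha = 0.5, 0.5, 0.1   # old weights and learning rate

y_pred = w1 * x1 + w2 * x2      # forward pass: 0.5*1 + 0.5*2 = 1.5
e = y_pred - ya                 # signed error: 1.5 - 3.0 = -1.5
w1 = w1 - alpha * e * x1        # 0.5 - 0.1*(-1.5)*1.0 = 0.65
w2 = w2 - alpha * e * x2        # 0.5 - 0.1*(-1.5)*2.0 = 0.80
print(round(w1, 2), round(w2, 2))
```

After this single step the new prediction (0.65·1 + 0.8·2 = 2.25) is already closer to the target 3.0 than the old 1.5 was — exactly the "go back, tweak, go forward again" cycle.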
So — okay, where is it — let's say you have taken some initial value of w1 such that the error is here, at this particular point. Now you make a very small change, from w1 to w1′. What will happen? The error will shift very slowly as well, because you made only a very small change in w1. Or, alternatively, suppose your w1 originally was here and you made a very big change in w: then your new error might land at this point, or it might even come over here. So the behaviour changes with how gradually you move your w. There is this idea, then, that says you should move your w very slowly, so that you do not miss your minimum. Again, suppose you slowly, slowly moved until here — your w1 is here now — and then all of a sudden you made a change in w1 so large that you jumped to this point. That is bad, because you missed the global minimum in between. So what do people say? They say you should move the value of w very slowly — you should make all the changes gradually — and your dE/dw should be controlled by some other, external factor, and that factor is called alpha. If you go down the screen, you will see I have multiplied the dE/dw term by this thing called alpha. Now, this alpha is with you: it is a constant, and it is in your hands — you can change it any time you want. So what you will do is select the value of alpha such that the weights move very slowly. But there is a disadvantage as well, because if you move very, very slowly, then say you were here: you will inch along — here, then here, then here, then here.
So you might reach your global minimum only after a very long time; it may take a lot of time. So you have to pick a fairly critical value of alpha — one that moves slowly, but not too slowly. "Is this how you tune up your model?" This is not tuning exactly — it is adjusting how the model reacts to the changes we are making; it's like we are controlling our own process, the flow of the gradient, by keeping a certain value of alpha. But yes, you can say this is one of the hyperparameters — hyperparameter tuning, HPT — this is one small piece of that tuning. A value of alpha you could keep is 0.001 — but careful: if someone tells you "this is the alpha you should keep," they are wrong, and so would I be. It depends on your data and on how slowly you see your model moving; 0.001 is just an initial value you can take. And if you look at the major frameworks, like Keras or TensorFlow, you will see that they have also put 0.001 as the default. Makes sense so far? Anyone having doubts? If you want to do this for multiple layers, this is how it looks — I don't want to scare you, but this is how the mathematics goes backwards, and then you make all the changes slowly, slowly. If you want to see the weights being adjusted, I have a picture for that as well — here it is, all going slide by slide. Anyone, any doubts? No? No questions — then I think we should move forward and I should show you some things. Okay, now I will take a question from the previous class: someone asked me what an image looks like, and I showed you this GIF. An image is nothing but numbers — numbers ranging from 0 to 255, where 255 is the highest value and 0 is the lowest — and these are called pixel values.
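Stepping back to the learning rate for a moment: the trade-off described above — alpha too small crawls, too large overshoots — shows up even on the simplest error curve E(w) = w². The starting point, step count, and the three alphas below are invented; 0.001 mirrors the framework default he mentions:

```python
def descend(alpha, steps=50, w=5.0):
    # Gradient descent on E(w) = w**2, whose gradient is 2*w.
    for _ in range(steps):
        w = w - alpha * 2 * w
    return w

for alpha in (0.001, 0.1, 1.1):
    print(f"alpha={alpha}: w after 50 steps = {descend(alpha):.4f}")
```

With alpha = 0.001 the weight has barely moved after 50 steps, 0.1 settles neatly into the minimum at w = 0, and 1.1 jumps past the minimum on every step and blows up.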
Right, these are pixel values — this is how images work. But the image that you are seeing on your screen is a single-channel image. You might have multi-channel images also; let me show you a demo. Okay, one last time: anyone in doubt about backpropagation? Because this ends the theory portion for neural networks — the intro to deep learning is done. MNIST was the dataset where the digits were written on images and we need to predict what is written on each image. I told you there are two things: one is the image, and second is what is actually written on it — that is called the label. So we got the dataset, and last time we worked on it with Keras, but today we will do it in TensorFlow. Why TensorFlow and not Keras? Because I want to give you a basic idea of how TensorFlow code is written and how Keras is helping you — how the lower-level work happens. Otherwise you won't get to know how many things Keras is doing by itself. I'll do the basics first — and if you think this is not visible, I can switch to this view also. import numpy as np — and what else can we import? matplotlib. And guys, I am requesting you: I know many of you are working and you don't get time to do all these things, but please take out time and do at least some revision. I'll do these three imports, then check the version of TensorFlow we are using — I am currently on 1.8.0. If you are using anything else, don't worry; our code might differ in some places. How do I get the data? I get the data from tensorflow.examples.tutorials.mnist. TensorFlow has already bundled this — not the data itself, but the loading code is inside their package. If you go into these tutorials you
will get a lot of examples — let me show you how many they have; I have never looked at all of these myself. Okay, so currently they have MNIST there; they are slowly adding other things also. So we have MNIST, and then we import input_data. Does anybody remember the function we call on it? read_data_sets. What read_data_sets expects from you: first is train_dir — where do you want to keep the data. You can see all the arguments written here: fake_data=False — yes, we do not want to fake it; one_hot — everybody remembers one-hot encoding, right? We did it last time also; if you are in doubt, please ask, let me know. The default dtype of the data is float32; reshape=True; validation_size=5000, that's correct; seed=None; and source_url, which is the URL from where it is going to download — so you do not have to tell it to download explicitly, you just say where you want to keep the data. I'll keep the data in MNIST_data — it will create a folder named MNIST_data — and I want the one-hot encoding to be on. So what has happened here is that, if I go into my data section, I now have the MNIST dataset in my data folder. I will show you: I have a data folder, and see, there is this MNIST data and all four files are here. Which four files? If you remember, we saw these same four core files on the MNIST website — I have shown you that website and what the dataset is. So now we have downloaded the data, and it is a one-hot encoded dataset.
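Since one_hot=True keeps coming up, here is a minimal NumPy sketch of what one-hot encoding does to a label array (the sample labels are invented):

```python
import numpy as np

labels = np.array([5, 0, 4, 1])          # made-up raw digit labels
num_classes = 10

# One-hot: row i gets a single 1 in column labels[i], zeros everywhere else.
one_hot = np.zeros((labels.size, num_classes))
one_hot[np.arange(labels.size), labels] = 1.0

print(one_hot[0])   # digit 5 → [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```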
Now, I made one mistake here: I should have first said import warnings and warnings.filterwarnings('ignore'). Why am I doing this? Because I don't want to see all these warnings here — you can see these are just warnings, and I don't want them displayed every time. So now you will see — no warnings, correct. Now let's see what is inside mnist. If we check the type of it, it reports a particular type; let's see what it can give us. It actually has three things: test, train, and validation. Let's go inside — what is this test and train? You can see that there is .index and .count, which means it is actually derived from a tuple — a namedtuple, in fact. Go inside train: inside train there are images and labels. Let's go to images and then look at the first one. The first image looks like something — let's do one thing, let's plot this image. How do you plot an image? plt.imshow. But first you need to make sure your image has valid dimensions — if you check the shape of the current image, it is 784, and 784 is not a valid image shape: an image should have a height and a width. So what you have to do is reshape it. We know 784 is the square of 28, and we know these are 28-by-28 images, so you reshape to 28 × 28. We could also add a 1 at the end, because I want to show that this is a one-channel image. Now if I do a plot — there. What we can do more is make the colormap gray so that you can see the actual shades. This is how the first image looks. What I can do next is print the corresponding label for this image: I'll
say print(mnist.train.labels[0]) — I have to make sure I am giving it the same index, so I have given zero. Because this is one-hot encoded now, the data is in one-hot form, and you can see the positions 0 1 2 3 4 5 6 7. So I can make a function out of it: I'll call it load_image, and it will accept a number from you. Let's say we want to print image 599 — count the digits with me: zero, one, two, three, four, five. I'll take a pause there; if you have any questions or queries, please ask. "In that definition, can you please explain what mnist.train.labels is?" Sorry, can you ask again? Okay, so what MNIST is giving you is three types of data — if you pay a little attention here, it says three things: test, train, and validation. Now what we did is we went into train, and inside train there are actually two things: images and labels. Imagine them as two very long arrays, where for every corresponding image there is a number stored. So what I am doing is asking MNIST: inside the train dataset, from the images, give me the image at index num — num is the function's argument here, and num is currently 599. So what is the 599th image? We try to plot it. Now, if you check the shape of mnist.train.images, each image is actually of length 784 — it is a flattened image, flattened just to save space. So what we did is reshape this image, because we already know from Yann LeCun's site — if you go to his website, you can see they have clearly written that the images are actually 28 by 28; if you read it carefully, it says 28 × 28 — so we know where that number 784 came from.
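The flattened-to-square step just described can be sketched with a stand-in array (dummy values, not a real MNIST image):

```python
import numpy as np

flat = np.zeros(784, dtype=np.float32)   # stand-in for one flattened MNIST image

# 784 = 28 * 28, so the flat vector reshapes into a height x width grid...
img = flat.reshape(28, 28)
print(img.shape)        # → (28, 28)

# ...and an explicit trailing 1 marks it as a single-channel image.
img1 = flat.reshape(28, 28, 1)
print(img1.shape)       # → (28, 28, 1)
```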
So what we are doing is converting it with .reshape(28, 28), and then asking matplotlib not to render the normal colormap but the grayscale one, and for the label I give the same index that I gave for the image. Makes sense? Thank you. Okay, any other person with doubts or concerns? Don't be in a rush. Once we have got all the data, what we are going to do now is make TensorFlow variables. To store these images, weights, and biases, we are going to make placeholders, variables, and constants — if you remember those definitions. First we make our x placeholder; this placeholder will hold all the images, and we also have a y that will contain all the labels — for every image there is a label. The model will try to learn all the detail from the image and try to map it to one of 0 through 9. So we say tf.placeholder: the first argument is the dtype, which is tf.float32; the second thing it asks for is the shape. shape=None? No, no — we give it the shape of the flattened image; we are not going to do the unfolding of the image, we will let it remain flattened. Now we make the weights: weights = tf.Variable, and by default we initialize them with zeros. How many zeros? 784, the number of inputs, by the number of outputs, which will be 10. Can anybody explain how I came up with this number of weights, 784 × 10? Anybody? I'll tell you. What is happening here is that we are creating a variable called W. Now what is W going to connect? So our neural network will
look somewhat like this — sorry, I cannot draw so many connections. This is the input layer: the input layer has 784 nodes, as we already know. And this is the output layer. So what is happening here is that you have the 784 inputs of an image in your input layer, and then you have 10 outputs. Now, how many weights will there be? This one input has to connect to every one of the 10 outputs — 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 — and that is just one input; you have 783 others as well. So what will the total number be? Everybody understands — 784 × 10 = 7840; that is the number of weights we will have inside this layer. And I made a mistake earlier: I wrote 100 where I should have written 10. These are the number of weights. Now can anybody tell me how many biases there will be? Anybody? Correct — 10. Why 10? Because you connect these 784 inputs to these 10 outputs, and a bias is attached at each of those 10 output ends. "So the biases will be connected according to the number of neurons in the layer?" Correct. I'll explain it again, since other people wanted to hear it too — let me bring out my drawing again. See, these are your inputs, and let me change the colour: this is your output. How many outputs do you have? Only 10 — currently I am drawing just 5. How are these connected? Weights connect every input node to every output node — everything is connected to everything. So how many weights will there be? The number of inputs times the number of outputs equals the number of weights.
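The counting argument above — inputs × outputs weights, one bias per output neuron — expressed as NumPy shapes:

```python
import numpy as np

n_inputs, n_outputs = 784, 10

# Every input connects to every output, so the weight matrix is 784 x 10.
W = np.zeros((n_inputs, n_outputs))
# One bias per output neuron -- think of each as the weight of a
# constant input x0 = 1, as in the class explanation.
b = np.zeros(n_outputs)

print(W.size)   # → 7840 weights
print(b.size)   # → 10 biases
```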
But how is the bias connected? A bias is connected to every output neuron, like this: b1, b2, b3, b4. Just imagine that a bias is nothing but an input whose value is always 1. Let's call it x0, and you have the real inputs from x1 up to x784. Now imagine just one thing: if you have the outputs here and you want to connect this x0 to every output, how many lines will you have? It will be the number of outputs, because you are connecting just x0 to the outputs. So the number of outputs is the number of biases — and how many outputs do we have? 10, so there are 10 biases. "If we consider only one layer?" Yes, in this case we are considering only one layer — an input layer and an output layer. Correct. What we are currently doing is building this for one layer; if you are building a multi-layer network, you have to do it for every layer — from here to here, then from here onward. Did I make sense? Yes. And this also answers where the concept of biases comes from: a bias is nothing but an input called x0 whose value is just 1. Cool — any little query, please ask; whatever you have in mind, let me know. Okay, I think I suggested you guys a book called Neural Networks and Deep Learning by Michael Nielsen. That is a very, very genuine book; please read it, it is very helpful. "I was reading that biases are a threshold value." A threshold value of what? I didn't get it. Okay — you are saying you did not get the book? I
understand — you didn't get the book URL, or the content? The whole book is free; I am posting the URL again here — I think I posted it in the chat in the last class. This is the book; do read it. It is pure theory, and the people who want to go deep into the theory of neural networks and deep learning should definitely read this book — there is no escape from it. Okay, should I go back? "Actually, the bias is for simplifying the mathematical calculations, right? Am I right?" Correct, totally correct. What happens is, suppose you are in a binomial, or binary, classification problem where there is only one output — true or false. At that time we face an issue when calculating the cost function, and that is why we have to introduce this thing called bias; the bias has a weight of its own. "I was reading that a neuron's output, zero or one, is determined by whether the weighted sum is greater than some sufficient value, and that value they call the bias." Correct, correct. Why have they written it that way? Because let's say your weighted sum comes to nothing — it is going nowhere, say it is zero — then who is going to decide your output? The bias is a safe-side calculation that we do to avoid exactly that. Whenever you do a binary classification you have only one output, true or false; when you calculate the cost function you can hit an error there — how will you produce a zero and how will you produce a one? That is why we need to introduce this thing called bias. So just consider the bias as another weight, connected to an input whose value is equal to one. Now I am saying: just consider — just for simplifying the process, consider
the bias as a weight only — the weight of an input called x0 whose value is equal to one. So what is the operation we are doing? In the affine step we are multiplying every input with its weight — you remember the affine equation, x1·w1, that we discussed in the last class. So if we do x1·w1 and we keep the value of x1 equal to 1, the weight itself plays the role of the bias. "Is this called the perceptron model?" The perceptron model, yes — it is the same model that I showed you on the very first day: just a single neuron, where there is only one circle, an affine step is happening and an activation is happening. That is what a single neuron of a neural network represents. Now, we have built three things — x, W, and b — these are the three things we wanted to declare. We also want to make a y. See how I am going to declare y; this is very important. I will say tf.nn.softmax. Do you guys remember softmax? Softmax was an activation function. What was it doing — can anybody tell me? Okay, how many of you can relate to this thing that I've written, this simple equation? And how many of you can relate it to the last class — just give me a second, where is our last class, the intro — how many of you can relate this with the sentence I have written here at the top? "Affine function" — yes, KM is totally right, it is called an affine function. And someone is talking about deciding which neuron to turn off and on, so I will address that: someone said softmax decides which
neurons are turned off or on, on the basis of weights and biases. Not exactly, but you are somewhat right — that is a special case you are talking about, and it is called binary classification, where there is just true and false. There softmax has no option, so it goes with the maximum probability and says: because the weights are higher, you are allowed through. But what is the softmax function in general? Can anybody tell me its equation? It is e to the power z divided by the summation of e to the power z over all the outputs. What it is actually doing is taking the scores of all the outputs and turning them into probabilities whose sum is equal to 1. Let's say we are doing the MNIST problem — what are the different outputs we can have? 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 — so we can have ten outputs. Let me show you: imagine these are the 10 outputs, and you gave an image of the digit 4 — you gave this image that looks like a four. Now the neural network has done all its calculations and it thinks it is the fourth class. So what will it do? It will give a probability for every class — the probabilities might be high or low, but the model will output some probability for each one: the probability of it being a one is this, of a two is this, of a three is this, and of a four is, say, 98%. It gives a probability for all of them, and then you pick the maximum one.
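A minimal NumPy version of the softmax the class writes as tf.nn.softmax — the input scores are made up, with the fourth class deliberately the strongest:

```python
import numpy as np

def softmax(z):
    # Subtracting the max is a standard numerical-stability trick;
    # it does not change the resulting probabilities.
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([1.0, 2.0, 0.5, 5.0])   # invented raw output scores
probs = softmax(scores)

print(round(probs.sum(), 6))   # → 1.0, the probabilities always sum to 1
print(probs.argmax())          # → 3, the class with the highest score wins
```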
Now who is giving these probabilities? Softmax — softmax is distributing them. "So it is just converting the data into probabilities?" Yes, totally correct: it is converting the raw numbers into a probability distribution, where the sum of all the probabilities it gives out, across all the outputs, is equal to 1 — or 100 if you are thinking in terms of percentage. Makes sense? Making sense to you also? Cool, cool, cool. So yes, someone was asking whether this picture is saying the same thing that I have written in this highlighted cell — yes, we have converted the operation shown in this image into this one line. Look at the softmax function: it is e to the power z upon the summation of e to the power z; if you want, I can send you a sheet with all these formulas in one place. So first we did the affine step — the matrix multiplication of x and W — then we added the bias into it, and then we took the softmax of the whole thing. This is the y that is being calculated. But we already have an actual y with us as well — that is called y_actual — so we have to create a y_actual too. That is again a tf.placeholder; the dtype will be float32, and the shape — can anybody tell me what the shape will be? The shape will be [None, 10]. Why 10? Each image has exactly one answer — if this is the image, the output will be 4 — but I am giving 10 because we did one-hot encoding — remember, on the
page above, we did this one-hot encoding at this point. What does one-hot encoding do? It says your data is no longer in the form of 1, 2, 3, 4; it has been converted into zeros and ones, and you can see it here very clearly: whenever you write a five, you write it in this format. Correct — totally correct, it is two-dimensional. So for every image, the answer will be a vector of length 10 — that is why I am giving this shape. Makes sense to everyone? Now I am going to compute the cross-entropy. What is this cross-entropy? It is how I tell the neural network what it has to propagate back. What am I doing? I am doing a tf.reduce_mean of a tf.reduce_sum — I'll tell you what I am doing, don't worry. A reduce-sum of what? Of minus y_ times tf.log of y, with reduction_indices equal to 1, and then the reduce_mean on top of that. I know this is a bit tricky, but does any one of you want to try to explain what is happening here? Okay, I'll tell you. At this particular step we are calculating the error for backpropagation. You remember what error was — in the last class we discussed this too: error was essentially the mismatch between y_actual and y_predicted, and here it is minus y_actual times the log of y_predicted. Now, y_actual and y_predicted are both vectors over the outputs. Let's say these are the outputs for all the cells, and it is an image of a four: then y_actual will have a one here and zeros elsewhere — 0 0 0 1 and then 0 0 0 — and y_predicted will give some probabilities, say 0.01, 0.02, 0.1, and at the fourth position it will give 98%. So what you have
to do is combine these, position by position, and then add them up — that is exactly what I did: minus y_ times tf.log of y, then a reduce_sum of it along the first axis, index 1, and then I took the mean of it, because cross-entropy is actually a mean — you divide by the number of observations. That is called cross-entropy. So now you have the cross-entropy with you. Does everybody get the idea? I'll take it up here — have a look at the screen, just this reduction_indices part. Okay, see what is actually happening — I wanted to go through this with NumPy as well, but okay. These are the output neurons, and let's say the image is of the number four. One-hot encoding will say that y_actual is 0 0 0 1 and then 0 0 0 for all the rest — a one at the fourth position, zeros everywhere else. And y_predicted will give values for everything: say 0 here, then 0.1 here, 0.1 here, then 0.2 here, and let's say it gives 0.9 at the fourth position — 0.02, then 0.9, and so on. So you have to combine these two vectors first — this is what the minus-y-log-y part does — and then we take the sum of it. But this is only for one image, and we want to do it for all the images. So I am telling the model: do the sum along axis 1. Now, if you look at the shapes properly, what is the first dimension in every shape? None — and this None is the number of images that are going to come inside. Making sense to you? I can tell you again.
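The reduce_sum-over-axis-1-then-reduce_mean computation can be mirrored in NumPy; the labels and predicted probabilities below are invented for illustration:

```python
import numpy as np

# One-hot actual labels and made-up predicted probabilities for 2 images.
y_actual = np.array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],    # an image of a 4
                     [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]],   # an image of a 1
                    dtype=np.float64)
y_pred = np.full((2, 10), 0.02)
y_pred[0, 4] = 0.82     # fairly confident about the 4 (each row sums to 1)
y_pred[1, 1] = 0.82     # fairly confident about the 1

# Mirror of tf.reduce_mean(tf.reduce_sum(-y_ * tf.log(y), reduction_indices=[1])):
per_image = np.sum(-y_actual * np.log(y_pred), axis=1)  # axis 1: sum over the 10 classes
cross_entropy = per_image.mean()                        # mean over the None axis (the images)

print(per_image.shape)   # → (2,), one number per image after the axis-1 sum
print(cross_entropy)     # -log(0.82) for both images, so about 0.198
```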
See, I did this sum — the sum part is fine, yes? — but we want the sum to happen over this axis called 1. What is on axis 1? The 10 class positions, so the sum collapses each image to a single number. And what is the other axis? That is the None axis, and None is the number of images we are going to supply; the reduce_mean then averages over all those images. Is that okay with everyone? Okay. Now what we have to do is minimize this thing: cross-entropy is actually our error, and what do we have to do? We have to minimize it. That is called a training step. Does anybody remember the name of the algorithm that is used for backpropagation? Anybody? Gradient descent — remember? So what are we going to ask of gradient descent? We will say: gradient descent, bro, we have this thing called cross-entropy — please, please, please reduce this cross-entropy for us; minimize the cross-entropy. Okay to all of you? Now, whatever work we are going to do — what is the golden rule of TensorFlow? It says that whenever you want to run something, you have to run it inside a session. sess = — today we are going to use a new kind of session: an InteractiveSession. It is actually a session only, but you can handle it more conveniently; it is made for your Jupyter notebooks, so you can do an InteractiveSession rather than tf.Session. "What did you get with the InteractiveSession?" I'll tell you: with tf.Session you have to close it explicitly, but an InteractiveSession will make sure that once the process is done, it closes itself; and InteractiveSessions
should be used in Jupyter notebooks only — that is what they are meant for. Once you make the session, what is the first thing you always do? You will always remember: global variable initialization. There are three things — placeholders, variables, and constants. You can run placeholders and constants directly inside a session, but if you want to run any variable inside your session, you have to initialize it first. Remember, I do the same — I'll run it, and now all the variables are declared inside the session and they are ready. Now I am going to train in batches. Let's say at first I make it fifty only, so I will run fifty training steps. What am I doing? I am asking for the data to be given in batches. How will I get them? I will say mnist.train.next_batch — it will automatically hand me batches. Then, how many training samples do we have? Just one second — mnist.train.images.shape — we have fifty-five thousand, so I have to make it 550. Did you get the calculation of what I am actually doing here? I am asking MNIST: please don't give me all the data at once; give me 100 images at a time, and I will take them at every step. And if I am going to take a hundred images at a time, how many times do I have to run the loop? 550 times, because I have fifty-five thousand images. Making sense to everyone? Because we have 55,000 images and I am taking only 100 at a time, I have to run this 550 times to cover all the images. So what happens now: it gives me batch_x and batch_y — batch_x will contain the images and batch_y will contain the corresponding labels. Then I'll just say sess.run of the train step, and I will
pass a feed_dict, because I have to feed the dictionary too — I have taken two placeholders, one is this y_ and the other is this x, so I have to feed these x and y as well. I know that x is nothing but batch_x, and my y_actual is batch_y. Now if I run this, it will run the session for me — and done. So what is happening: the session asks, what is train_step? train_step says, you have to do a gradient descent minimization of cross_entropy. It goes to cross_entropy; cross_entropy says, I am a reduce_mean of a reduce_sum over y_ and y. So it goes back and asks: y_, you are a placeholder, I get it, but who is y? y says, I am actually a calculation of x, W, and b. It asks, what is x? x is again a placeholder — you get its value from your feed_dict — but who are W and b? They are variables. Does everybody get the idea of how sessions work in TensorFlow? Please ask your questions right now, otherwise it will become a little messy afterwards. And if you are not getting all of this, don't worry — Keras is much simpler. "Hello — actually, I want to know how we decide the batches. If we take a huge number — and in some image-processing examples we have to reduce the size of the image, and after that train the neural network — then the size of the image and the number of batches affect the training, I mean the processing speed. So how do we pick this range?" Very good question — very, very good question. I'll answer slowly, one by one. First of all, why do we take batches at all? The concept of batches came up because we don't have as much RAM as we would like — we have a limited RAM resource. Let's say we have 1 million images.
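The batching arithmetic from the walkthrough — 55,000 training images consumed 100 at a time over 550 steps — can be sketched as follows (the sess.run call is left as a comment, since this sketch has no live TensorFlow session):

```python
num_images, batch_size = 55000, 100
num_steps = num_images // batch_size      # 550 steps to see every image once

seen = 0
for step in range(num_steps):
    start = step * batch_size
    end = start + batch_size              # this batch covers images[start:end]
    # in the real loop: batch_x, batch_y = mnist.train.next_batch(batch_size)
    # then: sess.run(train_step, feed_dict={x: batch_x, y_: batch_y})
    seen += end - start

print(num_steps)   # → 550
print(seen)        # → 55000, every image used exactly once per epoch
```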
Can you just put 1 million images through the forward pass of the neural network in one go? No, never. You have to give them in chunks — a thousand or ten thousand images at a time — because our RAM is limited; we still don't have processing power that can take it all at once. Now your question: how do we decide this batch number? Let's say I take a very large batch number. All the images try to go through the network together — it will definitely go fast, but the RAM will be exhausted at some limit. Now let's say we reduce it to only one image at a time. Let me ask you one simple question — you decide and tell me. Say I ask you to classify two animals you have never seen — some African animals you are not familiar with — and the animals' names are animal A and animal B. I show you only one image at a time, and I say, "Son, this is animal A." Then the next time I show you an image, I ask: what is the probability that it is animal B? You will say to me, "Bro, I have never even seen animal B!" Right? You will say that to me. That is why we cannot shrink the batch size too far either. We have to pick a batch size such that the model also gets generalized a little. Are you getting my point, what I am trying to say? The theory is: if you are giving very small batches, it is difficult for the model to learn general patterns; but if you are giving a very large batch, you are forcing the model to learn many features in just one go. That is why you have to decide the number carefully, based on how many features your images have and how difficult your images are to categorize. Or let's say you are
Or say you are training on just two classes of images: how big is the difference between them? Is it significant and easily caught, or is it very difficult to catch? That's how we settle on this batch number.

[Student] Sorry, one more thing. You said we have to choose precisely, but how can one actually choose? If you know the number of images and the number of features that differentiate them, how can one decide by considering those parameters? Actually, I had built an image classification model, but I wasn't sure about the batch size; I was doing trial and error, and resizing the images was also a little tricky.

[Instructor] I totally understand, and I'll answer that one too. Kam is writing, "more data, more model overfitting"; yes, that can certainly be true. But first, since you keep coming back to this resizing thing: the main reason we resize is that giving a full HD image to a model doesn't make sense.

[Student] Actually, resizing itself is one of the factors. If I reduce a thousand-by-thousand-pixel image to twenty by twenty pixels, the processing will be fast enough, but it will minimize the features in it. If I reduce it only to five hundred by five hundred, it takes more processing time; but with a twenty-by-twenty image I can use a much larger batch size, right?

[Instructor] Right, you've got it: it's a trade-off against the RAM you have to accommodate. This is a very
good mathematical calculation you are doing: if you resize the image down to something small, you can use a very large batch number, but if you keep a very high-definition image, you have to keep your batch size very limited. But you also have to make sure you don't shrink the image so far that even the model cannot make anything of it. If you go and look at how cv2.resize, or any image-resizing algorithm, works, you'll see that some features are inevitably lost, so you have to decide to what extent you want to resize your image. And here is why I stress this: when you later make predictions with this model, you have to reduce the incoming images in exactly the same way. I'll say it again: suppose you reduce a thousand-by-thousand image to two hundred by two hundred and then train your model. Then, while predicting, you also have to make sure the images are reduced to two hundred by two hundred; only then can you do the prediction.

[Student] So is there any mathematical formula, some kind of calculation, by which one can decide an approximate batch size?

[Instructor] Okay, I'll tell you. First: it depends entirely on the data you are working with. Let me give you two different use cases. One: you are classifying nuts and bolts; nuts are circular, cylindrical even, and bolts are hexagonal in shape. Two: you are working on a very minute wiring classification, where multiple wires go into a small slot.
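The rule that the same resize must be applied at training and prediction time can be sketched as a single shared preprocessing function. Block averaging is used here as a dependency-free stand-in for the cv2.resize call the speaker mentions, and the 200-pixel target and random images are illustrative assumptions:

```python
import numpy as np

TARGET = 200  # chosen once; the SAME value must be used at train and predict time

def preprocess(img, target=TARGET):
    """Shrink a square image to target x target by block averaging.
    (A stand-in for cv2.resize; some fine features are lost either way.)"""
    factor = img.shape[0] // target
    side = target * factor
    return img[:side, :side].reshape(target, factor, target, factor).mean(axis=(1, 3))

train_img = preprocess(np.random.rand(1000, 1000))  # used while training
new_img = preprocess(np.random.rand(1000, 1000))    # same pipeline at prediction time
```

Routing both training and prediction through one function makes it impossible to accidentally feed the model a size it was never trained on.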
Have you seen the picture where multiple LAN wires go into a server rack? Or you could say a PCB, a motherboard connection; that kind of image. So now you are working on this minute thing, and you have to ask how the model is going to differentiate. It depends on the data: bolts and nuts are quite distinguishable at first glance, but wires are very fine; it is difficult even for humans to tell them apart, so how can the model? So the first thing you can use to decide this number is your data. Second: people have built generic models like VGG and ResNet, trained on ImageNet, so what I generally do is take inspiration from those models and their papers, because we don't have much time to experiment with all of this ourselves. Third: I use the elbow method. It is a brute-force thing. You draw a graph with the error of your model on the y-axis and the batch size on the x-axis. You start with a batch size of, say, 1, then go 10, 50, 100, 200, and so on up to the maximum. You will see the curve bend at some point, almost like the elbow of an arm, and that is your optimized batch size; that is the batch size you should always take. So first, your data tells you how much you have to resize; and as I told you, I take inspiration from other models.
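The brute-force elbow search described above can be sketched numerically. The (batch size, error) pairs below are hypothetical measurements, and the flattening threshold is an assumed heuristic, not a standard formula:

```python
import numpy as np

# Hypothetical results of training once per batch size and recording the error.
batch_sizes = np.array([1, 10, 50, 100, 200, 500, 1000])
errors = np.array([0.90, 0.45, 0.20, 0.12, 0.11, 0.105, 0.10])

# The "elbow" is where the curve flattens: pick the first batch size
# beyond which the error improvement drops below a small threshold.
improvements = -np.diff(errors)                 # error reduction per step up
elbow_index = int(np.argmax(improvements < 0.02))
elbow_batch = int(batch_sizes[elbow_index])     # -> 100 for these numbers
```

In practice one would plot errors against batch_sizes and eyeball the bend, exactly as described; the threshold just automates that judgment.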
For example, I have seen that VGG works on a 224 dimension: they resize images to 224 by 224, so you can resize your image to 224 as well. I have also seen bigger models; a new model I was working with yesterday took images of 500, and I have never seen a model with more than a 500-by-500 input. But again, it depends on your data. And see, batch size is just one number; you also have to decide the number of layers, and then how many neurons in each layer, so there is a lot of trial and error you will do. Believe me, I asked this same question almost five years ago to my mentor, and his answer was the same: you will decide with experience. The more models you build and the more data you deal with, the more you will get the feel: "Okay, this one is quite difficult for the model to train on, so I have to restrict it to a very specific area." So you will reduce the image to that specific area, crop it carefully, and then give a sensible batch size. Right now what we are doing here is giving a random batch size, and the model is picking random images from the data pool. But sometimes it is trickier. Let me tell you about an example I worked on: watermark detection. I had to build a classifier for whether a watermark is present in an image or not. Now tell me, how would you do this problem? Any idea? Yes, visible watermarks, the ones we see on images. So here is the setting. Imagine you are a real-estate portal; let's say you are housing.com, and people are also uploading photos on magicbricks.com.
What agents are doing is uploading the same images from magicbricks.com to your portal, housing.com, as well, and this is a violation of the rules, because magicbricks.com has its own proprietary images with its watermark on them. Agents upload those same magicbricks images to your portal to save time, and now magicbricks can sue you, because you are using its watermarked images. So you have to restrain users from using those images. How will you do it? The first step is to identify whether there is any watermark: given an image, you have to tell whether it contains a watermark, yes or no. I was working on this problem around two years back, and here is the catch. Imagine an image, and an identical copy of it with a watermark at a very small scale. Now, watermarks are not colored in nature; they are transparent, and they work on the alpha channel, so they do not even have a color of their own. That means you cannot simply convert your image into a single-channel image. So beyond your resizing question, here is one more problem: rather than converting the image to a single channel, you have to keep it as a four-channel image. And when I reduced the image size, the watermark was getting squeezed out, so I could not resize the image either; I had to build the model with the image at a constant size. So it depends on the problem you are working on. Second, let me come back to that wiring classification; it was actually a car wiring system.
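Why a transparent watermark forces you to keep all four channels can be shown with a toy example. This is purely illustrative and not the speaker's actual pipeline; the pixel values and watermark region are made up:

```python
import numpy as np

# A 64x64 RGBA image: the watermark lives only in the alpha channel.
photo = np.zeros((64, 64, 4), dtype=np.uint8)
photo[..., :3] = 200           # the visible picture (uniform RGB here)
photo[..., 3] = 255            # fully opaque everywhere...
photo[20:30, 20:50, 3] = 180   # ...except a semi-transparent watermark region

# Collapsing to grayscale from RGB throws the alpha channel away entirely.
gray = photo[..., :3].mean(axis=-1)
watermark_visible_in_gray = bool((gray != 200).any())
watermark_visible_in_alpha = bool((photo[..., 3] != 255).any())
```

The grayscale view is identical with or without the watermark, while the alpha channel still carries the evidence; hence the four-channel requirement.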
The engine was also in the photo, there were multiple things in the frame, and there was one very small slot into which the wires were going. If you give the model the whole photo, with all that noise, it is not going to understand anything. So what I had to do was crop that small section out first, and only then give it to the model; only then could I make it work. So your question is completely valid, and I can totally relate to it; I had the same questions and could not get answers either. But believe me, slowly, slowly you will get interesting problems in AI. And the best part about AI now is that we are out of the bubble phase, so everybody understands that things work in their own ways; yet people still come with very interesting use cases and ask, "Can you do this with AI? Can you solve this with AI?" And believe me, you will come up with some way or another to do it, so don't worry at all. Sanket, I know I could not give you an exact answer, but did I make sense to you? Because you can read multiple blogs and multiple books and you will not find a single correct answer anywhere. Even if someone gives you an answer, say, "take a resize of 300 by 300," they are giving it for one particular problem. That's why, if you want something to generalize from, go with these models, like VGG.
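The crop-first step described above, cutting the small wiring slot out of the cluttered engine photo before training, is just array slicing. The coordinates here are hypothetical placeholders for wherever the slot actually sits in the frame:

```python
import numpy as np

# Full frame with the engine and other clutter (random stand-in pixels).
photo = np.random.rand(1080, 1920, 3)

# Assumed location of the small wiring slot within the frame.
top, left, height, width = 400, 800, 200, 300
roi = photo[top:top + height, left:left + width]  # only this region is fed to the model
```

Training on the cropped region of interest means the model never has to explain away the surrounding noise.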
What is VGG? VGG is actually a model made by Oxford University, trained on ImageNet. I also browse blogs; when I am stuck, I take help from these things and then work on the problem. And you should definitely follow the pyimagesearch blog; the author is very good, and if you are especially interested in images, that blog is your savior. So these are the models: VGG, ResNet, Inception. You should definitely go with these, because they are trained to be very generic in nature: they classify almost a thousand categories and are trained on over a million images. I take inspiration from them because they are built by some of the greatest people in the deep learning and AI field, and because they are generic in nature.

Okay. Now, what we have done up to this step is take a batch of 100 images and train the model. Let's see how we get the accuracy. acc equals tf.equal of what and what? Of tf.argmax(y_a, 1) and tf.argmax(y, 1). Will acc come out to be the accuracy? No, wait, don't run it; just a second. Sorry, my mistake, I have to cast it: I should write this as tf.reduce_mean of tf.cast of this. And this is not actually acc, this is actually correct_prediction; sorry for that. Then sess.run of acc: what do I have to give? I have to give a feed_dict, and I have given only y_a, so I have to give x as well. Okay, so x equals mnist.test.images and y_a equals mnist.test.labels.
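The accuracy computation being assembled above can be sketched end to end on a tiny hand-made example instead of the MNIST test set, so the arithmetic is easy to check. This assumes TensorFlow 2 with the compat.v1 shim:

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

y_a = tf1.placeholder(tf.float32, [None, 10])  # actual one-hot labels
y = tf1.placeholder(tf.float32, [None, 10])    # predicted probabilities

# tf.equal gives booleans per example; cast to float and average for accuracy.
correct_prediction = tf.equal(tf.argmax(y_a, 1), tf.argmax(y, 1))
acc = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

labels = np.eye(10, dtype=np.float32)[[3, 1, 4]]
preds = np.eye(10, dtype=np.float32)[[3, 1, 5]]  # 2 of the 3 predictions match

with tf1.Session() as sess:
    acc_val = float(sess.run(acc, feed_dict={y_a: labels, y: preds}))
```

With two of three predictions matching, acc_val comes out to 2/3; on the real run the feed_dict would carry mnist.test.images and mnist.test.labels instead.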
Oh, I have a placeholder shape error here. Okay, sorry, I see what I did: I was feeding the train set here where I should give the test set; let me feed test. Okay, again, sorry. It comes out to be 0.09. It's not good; it should be much higher. But then we haven't done much here: it is just one perceptron, and still it gives this accuracy. This is just to show you how things work and how you can build multiple things out of it. What else can you do? You can get your weights too: simply do sess.run and fetch W, and you get all your weights out; the weight shape should be 784 by 10. Similarly, you can get your biases here: sess.run of b, and there should be only ten of them; you can go and see. This is just to give you an intuition of how you will work with these things, since this is very important.

Just a quick info: if you are interested in doing an end-to-end certification course in artificial intelligence, then Intellipaat provides the right course for you to master all of the concepts in this field, and you can check out the course details in the description below. So this brings us to the end of the session. I hope that you have comprehensively learnt about TensorFlow. Don't forget to subscribe to our channel so that you don't miss out on our upcoming videos. Thank you, and happy learning.
Info
Channel: Intellipaat
Views: 22,157
Keywords: tensorflow tutorial for beginners, tensorflow tutorial, tensorflow training, tensor flow, what is tensorflow, tensorflow basics, tensorflow, introduction to tensorflow, google tensorflow, tensorflow python, deep learning tensorflow, deep learning with tensorflow, learn tensorflow, tensorflow explained, tensorflow 2.0, tensorflow image classification, tensorflow for beginners, tensorflow example, tensorflow image recognition, tensorflow model, Intellipaat, what are tensors
Id: 5pG9HYdFd8M
Length: 239min 29sec (14369 seconds)
Published: Tue Oct 01 2019