Tutorial 9- Drop Out Layers in Multi Neural Network

Video Statistics and Information

  • Original Title: Tutorial 9- Drop Out Layers in Multi Neural Network
  • Author: Krish Naik
  • Description: After going through this video, you will know: Large weights in a neural network are a sign of a more complex network that has overfit the training data.
  • Youtube URL: https://www.youtube.com/watch?v=XmLYl17DbbA
Captions
Hello all, my name is Krish Naik and welcome to my YouTube channel. Today we are going to discuss dropout and regularization. We need to understand one thing: whenever we have an artificial neural network that is very deep, it will have many weight and bias parameters, and with a huge number of weight and bias parameters the network tends to overfit the dataset for a particular use case. So we should try to find a way to fix that overfitting problem. Understand that with a multi-layered neural network, underfitting will rarely happen, because we have multiple layers; if you have just a one-layer neural network, that is when underfitting usually happens. So always remember: for a multi-layer neural network we will generally not face underfitting, but overfitting will be a problem, because as we build the network deeper, with each additional weight parameter the weights try to fit the training data perfectly. In that case you will face a high-variance problem, which basically means an overfitting problem.

There are two basic ways to solve an overfitting problem. The first I would term regularization; we have discussed regularization in machine learning, and there are two types, L1 and L2. But in today's session we will discuss the second technique, which is called dropout; by implementing dropout we get an effect similar to regularization. The dropout paper was written around 2014 by two people: one is Nitish Srivastava, and the second is my favourite, Geoffrey Hinton. Nitish Srivastava's thesis was all about dropout; he was a student of Geoffrey Hinton, and the paper came out somewhere around 2013-14. I will also put the paper's URL in the description box so that you can go and read it, because after this explanation I think it will be much easier for you to read the paper and understand all the techniques we discuss here.

To begin with, let us take a very small example of how we will implement a dropout layer in a neural network. First I would like to revise a machine-learning concept called the random forest algorithm. I hope everybody knows random forests: in a random forest we create multiple decision trees, and each decision tree is grown to its complete depth. When a decision tree is grown to complete depth, it leads to an overfitting problem; it tries to overfit the data. But remember that a random forest has one more technique: we do not use all the features. We just use a sample, a subset, of the features.
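A minimal sketch, assuming TensorFlow/Keras is available, of the two overfitting fixes just mentioned: L1/L2 weight regularization versus dropout. The layer sizes and the 0.01 / 0.5 values are illustrative choices, not taken from the video:

```python
import tensorflow as tf

# Option 1: L2 regularization penalizes large weights directly.
l2_model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Option 2: dropout randomly deactivates a subset of neurons each training step.
dropout_model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),  # drops ~50% of this layer's outputs while training
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```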
When we use a subset of features, that subset selection is our regularization method; it reduces the overfitting problem and improves accuracy. If you don't know about random forests, I suggest you go through my playlist, where I have uploaded a lot of videos on random forests, both the practical implementation and the theoretical part. The reason I have discussed random forests here is that you should know this idea of a subset of features: we create each decision tree with a subset of features, and we create multiple decision trees, so every time we create a new decision tree we get a different subset of features.

Similarly, to implement dropout, suppose this is my neural network: the first layer is my input layer, the second is my hidden layer 1, then hidden layer 2, and finally my output layer. To implement the dropout layer, we first select a dropout ratio; suppose I denote it as p. Usually the dropout ratio lies in the range 0 < p ≤ 1. The dropout ratio plays the same role as the subset of features we selected for the random forest: here we will select a subset of features from the input layer, and similarly a subset of the activation functions, the hidden neurons, in each hidden layer. We will not select everything; we will select a subset of neurons in each hidden layer along with a subset of the input features.

So here you can see that I have selected two features and two features are inactive, so I can say that my p value for the first layer is basically 0.5; this p value is nothing but my dropout ratio. Similarly, in my hidden layer 1 two neurons are activated and the remaining are deactivated, so again I may select a p value of 0.5 (approximately, because here I have five nodes, and of those two are activated and three are deactivated). Similarly, in the next layer two are deactivated and two are activated, so there also my value will be 0.5. How to select p, I will tell you shortly; for now, just understand that I have created a neural network and selected a dropout ratio of p = 0.5 in each and every layer.

Now you should understand what happens while the forward and backward propagation are going on. When I select a p value of 0.5, dropout selects some features and deactivates them. Suppose in this case my second and fourth nodes are deactivated; then my input gets passed through the remaining activation functions. In my first hidden layer, again with a p value of 0.5, it randomly selects some of the neurons and deactivates them. All the processing stays the same, but based on the p value some of the neurons in each layer are simply deactivated, as the small sketch below shows.
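A small NumPy sketch of the per-layer masking just described. Here p is the fraction of units dropped (the video's "dropout ratio"); the function name, shapes, and values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(activations, p=0.5):
    """Randomly deactivate a fraction p of the units in one layer."""
    keep_mask = rng.random(activations.shape) >= p  # True for units kept active
    return activations * keep_mask, keep_mask

hidden = np.array([0.8, 0.1, 0.5, 0.9, 0.3])   # five hidden-layer outputs
dropped, mask = dropout_forward(hidden, p=0.5)
print(dropped)  # roughly half the units are zeroed on this forward pass
```

Each call draws a fresh random mask, which is exactly why a different subset of neurons is active on every iteration.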
The same then happens at the next layer, where again a p value applies, and finally we get the output. That is with respect to the forward propagation. The backward propagation will be almost the same as I discussed in my previous video: whichever neurons were activated, their weights get updated. Then, in the next iteration, again with a p value of 0.5, half of the nodes in the input features will be activated; they are selected as a subset, and they are selected randomly. Every time, features get selected randomly with respect to this probability value, and that is the whole crux of the idea behind the dropout layer. Again, it is very similar to a random forest, where you select a subset of features, create the decision trees, and finally take the majority vote as your output.

So in this case, in order to fix the overfitting problem, instead of using regularization like L1 and L2 we are using a dropout ratio. With the dropout ratio, you are deactivating some of the neurons and activating others (you can also think of them as activation functions), and you are deactivating some of the input features and activating others, and that is how the whole process goes on.

Now the next question arises: if my training keeps deactivating and activating neurons, what about my test data? A simple trick applies to the test data. Whenever I want to predict on my test data, all the neurons get connected; everything is connected for the test data, with no deactivated neurons or features. Once everything is connected, there is one additional piece of work to do: for all the weights that were learned during training, the probability gets multiplied in, so w multiplied by p happens for each and every weight in each and every layer. That is the simple trick applied to the test data: to compute the predicted output, we just multiply the weights by the probability value that was selected, which is basically my dropout ratio. That is how it works for the test data.

Now the next question is: how do we select the p value? I will give you a small and simple trick. One way is to use hyperparameter optimization to find the exact p value. The general guideline is that whenever a deep neural network is overfitting, your p value should be a little higher; when I say a little higher, it should be at least greater than 0.5. But if you want to find the exact p value suitable for your use case, you can apply hyperparameter optimization; you can use cross-validation and many more methods that we have already implemented in machine learning.
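A sketch of the test-time trick described above: train with random masks, then at prediction time keep every unit connected and scale the trained weights. In the Srivastava et al. (2014) paper the scaling factor is the probability that a unit was kept; with the video's p = 0.5 the keep probability and the dropout ratio happen to coincide. Weights and inputs here are illustrative:

```python
import numpy as np

p_drop = 0.5                 # dropout ratio used during training
keep_prob = 1.0 - p_drop     # probability a unit stayed active

W_trained = np.array([[0.4, -1.2],
                      [0.7,  0.3]])   # illustrative trained weights
x_test = np.array([1.0, 2.0])

# No units are dropped at test time; the weights are scaled instead.
W_test = W_trained * keep_prob
output = x_test @ W_test
print(output)
```

Note that modern frameworks typically implement "inverted dropout" instead, scaling activations by 1/keep_prob during training so that no test-time rescaling is needed; Keras and PyTorch both do this internally.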
I will show you this particular example, and everything else, in the upcoming classes on the practical implementation, because I would love to show you that if I do not apply a dropout ratio, the whole artificial neural network tends to overfit the data, but after applying the dropout ratio you will see that the gap between the training error and the validation error is not that high; it will be a little less. I hope you understood what the dropout ratio is all about; you can also call it a dropout layer. You can basically think of the dropout layer as a layer created between every pair of layers: if I take my input features and the first hidden layer, there will be a dropout layer created in between which deactivates and activates the input features, and similarly in the hidden layers, as sketched below. I hope you like this particular video. Please do let me know if you have any questions. See you in the next video, have a great day, and please do subscribe to the channel if you have not already done so, and please share it with all your friends who may need this kind of help. Thank you one and all, I will see you all in the next video.
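A minimal Keras sketch, assuming TensorFlow, of the layer placement just described: a dropout layer between the input features and the first hidden layer, and between each pair of hidden layers. Sizes and rates are illustrative only, and note that Keras's Dropout rate is the fraction of units dropped:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dropout(0.5),                 # randomly deactivates input features
    tf.keras.layers.Dense(5, activation="relu"),  # hidden layer 1
    tf.keras.layers.Dropout(0.5),                 # deactivates hidden-layer-1 neurons
    tf.keras.layers.Dense(4, activation="relu"),  # hidden layer 2
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Keras applies these masks only during training; model.predict(...) runs with
# every neuron connected, matching the test-time behaviour described above.
```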
Info
Channel: Krish Naik
Views: 77,486
Keywords: dropout neural networks, dropout layer keras, dropout in convolutional layers, lstm dropout keras, dropout linear regression, convolutional neural network, dropout vs l2 regularization, pytorch dropout
Id: XmLYl17DbbA
Length: 11min 31sec (691 seconds)
Published: Wed Jul 24 2019