Backpropagation in Convolutional Neural Networks (CNNs)

Captions
In this video we are going to look at the mathematics behind backpropagation in a convolutional neural network. The purpose of this video is to get a basic understanding of how backpropagation works in a convolutional neural network so that it becomes easier to implement it in code from scratch. If you are using a typical AI framework such as TensorFlow, PyTorch, etc., you will not even have to bother with the backpropagation, as it is already implemented. This video excludes the bias term in order to make the backpropagation process easier to understand. You should know the basics of how forward propagation works in a convolutional neural network; if not, this video will give you a brief introduction.

We start with the input layer and the kernel. The kernel is often referred to as the filter. During a regular forward propagation we start off in the top-left corner of the input layer and multiply the weights in the kernel with the corresponding values in the input layer. The output is Z1, and the formula can be seen below. This is the first step in the convolution. Then we slide the kernel two steps to the right and multiply the weights with the corresponding values. Notice that we use a stride of two; this simply means that we move the kernel two steps at a time. This outputs Z2, and it is the second step in the convolution. Since there are no more input values to the right, we move the kernel two steps down and all the way back to the beginning, and repeat the multiplications. This is the third step in the convolution, and the output is Z3. We now move the kernel two steps to the right and repeat the multiplications. This is the fourth step in the convolution, and the output is Z4. After the convolution we end up with an output matrix, which we can call layer 1.
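To make the forward pass concrete, here is a minimal NumPy sketch of the strided convolution described above. The 5x5 input, the all-ones kernel values, and the function name conv2d_forward are illustrative assumptions; the video only fixes a 3x3 kernel of weights W1 to W9, a stride of two, no bias term, and a 2x2 output Z1 to Z4.

import numpy as np

def conv2d_forward(a, w, stride=2):
    # Valid "convolution" (cross-correlation, as in most CNN code) with no bias term.
    k = w.shape[0]
    out_h = (a.shape[0] - k) // stride + 1
    out_w = (a.shape[1] - k) // stride + 1
    z = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the kernel weights with the corresponding input values and sum.
            patch = a[i * stride:i * stride + k, j * stride:j * stride + k]
            z[i, j] = np.sum(patch * w)
    return z

# Illustrative numbers: a 5x5 input and a 3x3 kernel with stride 2 give a 2x2 output,
# i.e. the matrix [[Z1, Z2], [Z3, Z4]] that we call layer 1.
a = np.arange(25, dtype=float).reshape(5, 5)
w = np.ones((3, 3))
layer1 = conv2d_forward(a, w, stride=2)
print(layer1.shape)  # (2, 2)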
Now we flatten out layer 1 and output a prediction, which we will denote as y hat. Y hat can then be used to calculate the loss. We will not go into the depths of how to calculate y hat or the loss in this video; we will simply focus on the kernel and how we can update it. So let's get started.

In order to update the weights we will use this formula here. The updated weights are denoted with an asterisk. The i simply means that for a given i value in the range from 1 to 9, the formula changes accordingly. So, for example, when i is equal to 8, the updated W8 can be calculated by subtracting the learning rate alpha multiplied with the partial derivative of the loss with respect to W8 from the original W8. Now that we understand the formula, let's look at the terms. Wi is just the kernel, and the learning rate alpha is a constant we choose ourselves. The unknown in this formula is the partial derivative of the loss with respect to the weights. Since there are as many weights as there are partial derivatives, we can represent this term the same way we represent the kernel: in matrix form.

So the next step is to calculate the partial derivative of the loss with respect to the weights. Let's get an intuition for how we can set up this equation. To make everything easier to digest, let's look at it one term at a time, starting with W1. A change in W1 will cause a change in all the Z values, because W1 appears in all of the equations for the Zs. The change in the Z values will in turn cause y hat to change, which in turn will cause the loss to change. Armed with this knowledge, we can set up the equation for the partial derivative of the loss with respect to W1.

Let's simplify things further and look at one Z value at a time, starting with Z1. Our objective is to calculate the partial derivative of the loss with respect to W1. A derivative is basically a term that measures a rate of change. We know that a change in W1 will cause Z1 to change; to measure the rate of change in Z1 when W1 changes, we can calculate the partial derivative of Z1 with respect to W1. A change in Z1 will in turn cause y hat to change; to measure this rate of change, we take the partial derivative of y hat with respect to Z1. The change in y hat will then cause a change in the loss; to figure out the rate of change in the loss when y hat changes, we take the partial derivative of the loss with respect to y hat. The last two terms of this equation can be simplified to the partial derivative of the loss with respect to Z1.

We follow the same logic for Z2. First we take the partial derivative of Z2 with respect to W1 to measure the rate of change in Z2 when W1 changes. Then we take the partial derivative of y hat with respect to Z2 to measure the rate of change in y hat when Z2 changes. Finally, we take the partial derivative of the loss with respect to y hat to measure the rate of change in the loss when y hat changes. We then simplify the last two terms as we did earlier. If we do the same thing for Z3 and Z4, we end up with this equation for the partial derivative of the loss with respect to W1. The same logic can be applied to all of the weights, and that gives us this general formula.
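Written out in symbols, the weight update rule and the general formula we just arrived at are shown below. The summation over k = 1 to 4 is simply a compact way of writing the four Z terms that the video lists one by one.

w_i^{*} = w_i - \alpha \frac{\partial L}{\partial w_i}, \qquad i = 1, \dots, 9

\frac{\partial L}{\partial w_i}
  = \sum_{k=1}^{4} \frac{\partial z_k}{\partial w_i}\,\frac{\partial \hat{y}}{\partial z_k}\,\frac{\partial L}{\partial \hat{y}}
  = \sum_{k=1}^{4} \frac{\partial z_k}{\partial w_i}\,\frac{\partial L}{\partial z_k}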
But let's continue looking at the partial derivative of the loss with respect to W1; we can try to simplify that equation further. Looking at the terms where we take the partial derivative with respect to W1, we notice that we can solve them using the equations from earlier. A quick recap if you don't remember how to do partial differentiation: since we are taking the partial derivative with respect to W1, we look at W1 as the only variable and consider everything else as constants. Therefore the solutions are as follows, and we can now replace the partial derivative terms with the respective solutions. Let's do the exact same thing for the partial derivative of the loss with respect to W2. Since we take the partial derivative with respect to W2, we look at W2 as the variable and everything else as constants. That gives us the following solution for the partial derivative with respect to W2. If we do the exact same thing for all the weights, we end up with these nine equations.

By looking at these equations we see some repeating terms, so let's identify them. First, we start with the partial derivatives with respect to the Zs. By looking at the forward propagation, we notice that these terms turn out to be the partial derivatives of the loss with respect to the terms in layer 1. Next we can look at the a values in front of the partial derivatives with respect to Z1. By mapping them out to the input layer, we notice that they look strikingly similar to the a values from the first step of the convolution. Let's copy these values from the input layer and multiply them with the partial derivative of the loss with respect to Z1; these are the first terms in our equations. Let's move on to the a values in front of the partial derivatives with respect to Z2. Mapping these values to the input layer gives us the a values from the second step of the convolution. We copy these values and multiply them with the partial derivative of the loss with respect to Z2; these are the second terms in our equations. Doing the same thing for the a values in front of the partial derivatives with respect to Z3 maps out like this in the input layer, essentially the third step of the convolution. We copy these values and multiply them with the partial derivative of the loss with respect to Z3; these are the third terms in our equations. Finally, let's do the exact same thing for the a values in front of the partial derivatives with respect to Z4. This maps out to be the fourth and final step in the convolution. We copy these values and multiply them with the partial derivative of the loss with respect to Z4; these are the fourth terms in our equations.

Multiplying and adding these matrices together gives us the matrix containing the partial derivatives of the loss with respect to the weights. Multiplying this matrix with the learning rate alpha and subtracting it from the kernel gives us the updated weights, exactly as in the formula we looked at earlier. Beautiful, isn't it?
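Putting the nine equations together: the gradient with respect to the kernel is itself a strided "convolution" of the input patches with the matrix of partial derivatives of the loss with respect to the Zs. The sketch below assumes those dL/dZ values have already been computed from the flatten and loss layers (which this video deliberately skips); the specific numbers, the learning rate of 0.01, and the helper name conv2d_backward_weights are hypothetical and used only for illustration.

import numpy as np

def conv2d_backward_weights(a, dL_dz, kernel_size=3, stride=2):
    # dL/dW: for each convolution step k, take the input patch that the kernel saw
    # in that step and scale it by dL/dZk, then sum over the four steps -- exactly
    # the nine equations derived above, written in matrix form.
    dL_dw = np.zeros((kernel_size, kernel_size))
    for i in range(dL_dz.shape[0]):
        for j in range(dL_dz.shape[1]):
            patch = a[i * stride:i * stride + kernel_size,
                      j * stride:j * stride + kernel_size]
            dL_dw += patch * dL_dz[i, j]
    return dL_dw

# Weight update: w* = w - alpha * dL/dw, as in the formula from the start of the video.
alpha = 0.01                                      # learning rate, chosen by us
a = np.arange(25, dtype=float).reshape(5, 5)      # same illustrative 5x5 input as before
w = np.ones((3, 3))                               # illustrative 3x3 kernel
dL_dz = np.array([[0.1, -0.2],                    # made-up dL/dZ values, normally
                  [0.3, 0.05]])                   # backpropagated from the loss
w_updated = w - alpha * conv2d_backward_weights(a, dL_dz)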
Info
Channel: far1din
Views: 35,320
Keywords: 2d animation, Animation, Backpropagation, Backpropagation calculus, Backpropagation in cnn, Backpropagation math, Backpropagation visualized, Calculus, Cnn backpropagation, Cnn explained, Convolutional neural network, Convolutional neural networks, Convolutional neural networks visualized, Deep learning, Manim, Training neural network, Understanding backpropagation, backpropagation algorithm, backpropagation in neural networks, machine learning, neural network
Id: z9hJzduHToc
Length: 9min 21sec (561 seconds)
Published: Sun Dec 11 2022