4.3 Exponential Linear Units | Gated Linear Units | ELU & GLU | ReLU | Activation Functions | Notes

Video Statistics and Information

Captions
Hello everyone, welcome back to my channel. In the previous video we learned about ReLU and its variants: how ReLU is better than the sigmoid and tanh activation functions, how it solves their problems, and how the dying ReLU problem is in turn solved by its variants, PReLU and Leaky ReLU. In this video we will cover a third variant of ReLU, Exponential Linear Units (ELU), and after that another activation function, GLU, or Gated Linear Units. So without wasting any time, let's get started.

Let us begin with ELU. The formula of ELU is: for positive inputs the output is the same as the input, f(x) = x, and for every other case (negative inputs and zero) we use f(x) = alpha * (e^x - 1). Here alpha is a parameter that we can choose, and x is the value you get after the pre-activation step, that is, once you feed the inputs to the neuron, the inputs are multiplied by their respective weights and a bias term is added.

If we look at the graph, for positive inputs you get the straight line y = x, and for negative inputs you get a curve that smoothly approaches -alpha. The advantage here is that plain ReLU outputs a flat line with slope zero for negative inputs, which is what causes the dying ReLU problem, whereas ELU still produces a non-zero output (and a non-zero slope) in the negative region, so the dying ReLU problem we encountered earlier is avoided.

The advantages of ELU: first, it has all the advantages of ReLU. Second, there is no dying ReLU (dead neuron) issue. Third, since some outputs are positive and some are negative, the mean of the outputs lies somewhere close to zero (not exactly zero, but close), so the output is roughly zero-centered, which positively affects the weight update efficiency.

Now the disadvantages of ELU: the first is that because the formula involves an exponential term, it is computationally more expensive and takes more time. The second disadvantage is that although ELU is theoretically better than ReLU, in practical implementations the accuracy of ELU turns out to be quite similar to that of ReLU, so there is not much difference between them, and therefore we generally just use ReLU.
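As a supplement to the ELU description above, here is a minimal NumPy sketch (the function names and example values are my own illustration, not from the video) that applies the ELU formula and compares it with plain ReLU on a few pre-activation values:

```python
import numpy as np

def elu(x, alpha=1.0):
    # x > 0: identity; x <= 0: alpha * (exp(x) - 1), which approaches -alpha for very negative x
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def relu(x):
    # Plain ReLU for comparison: zero output (and zero gradient) for all negative inputs
    return np.maximum(x, 0.0)

z = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])  # example pre-activation values
print(relu(z))  # [0. 0. 0. 1. 3.]
print(elu(z))   # approximately [-0.95 -0.63  0.    1.    3.  ]
```

Note how ReLU maps every negative input to exactly 0, while ELU keeps small negative outputs that approach -alpha; that non-zero response in the negative region is what avoids the dying ReLU problem.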
Now let us discuss our next activation function, GLU, or Gated Linear Units. Its working is slightly different from the rest of the activation functions. In the layer where we want to apply the activation function, we first double the count of the neurons, that is, we double the dimensionality of that layer, and split it into two halves. The first half contains the regular neurons that were present earlier; to the second half we attach a sigmoid activation function. We then feed the input to both halves: each half computes the inputs multiplied by their respective weights plus a bias term, but in the second half that result is additionally passed through the sigmoid, which scales it down to a value between zero and one. Finally, we do an element-wise multiplication between the output of the first half and the sigmoid output of the second half, and that product is the final output of the activation function. Because the sigmoid output acts as a gate on the first half's output, we can also say that GLU is a gated activation function.

So the steps involved in GLU are: double the dimensionality of the neuron layer and divide it into two halves, apply a sigmoid activation function to the second half, do the element-wise multiplication between the outputs obtained from the two halves, and the value you get after that multiplication is the final value from the GLU activation.

That was all for today, guys. In this video we learned everything you needed to know about ELU (Exponential Linear Units) and GLU (Gated Linear Units). In the next video we will discuss the Softplus, Maxout, and Swish activation functions. We will meet in the next video, until then, bye.
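As a supplement to the GLU walkthrough above, here is a minimal NumPy sketch of a single GLU layer. The names (W, V, b, c, glu_layer) and the toy dimensions are assumptions for illustration, not taken from the video:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glu_layer(x, W, b, V, c):
    linear = x @ W + b          # first half: plain linear output (inputs * weights + bias)
    gate = sigmoid(x @ V + c)   # second half: linear output squashed to (0, 1) by the sigmoid
    return linear * gate        # element-wise product of the two halves = gated output

rng = np.random.default_rng(0)
d_in, d_out = 4, 3                      # doubling the layer means one (W, b) pair per half
x = rng.normal(size=(2, d_in))          # a small batch of 2 example inputs
W = rng.normal(size=(d_in, d_out))
V = rng.normal(size=(d_in, d_out))
b, c = np.zeros(d_out), np.zeros(d_out)
print(glu_layer(x, W, b, V, c).shape)   # (2, 3)
```

The two weight matrices W and V correspond to the two halves of the doubled layer: W produces the plain linear output, while V produces the value that the sigmoid turns into a gate between 0 and 1.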
Info
Channel: The Ai Genome with Saksham Jain
Views: 155
Keywords: machine learning, machine learning algorithms, deep learning, Saksham Jain, The FutrCamp, deep learning basics, futre camp, future camp, Artificial Neural Networks (ANN), ANN, Activation Functions, what is relu activation function, relu activation function, problem of relu activation, dying relu, ReLU fixed vanishing Gradient problem, PReLU, Leaky ReLU, parametric ReLU, Rectified Linear Unit, Varients of Relu, Exponential Linear Units, ELU, ReLU, GLU, Gated Linear Units
Id: 08S46Bv1K7E
Length: 7min 10sec (430 seconds)
Published: Fri Jun 11 2021