Activation Functions In Neural Networks Explained | Deep Learning Tutorial

Captions
In this video we are going to learn about activation functions. We go over the definition of activation functions and why they are used, then we have a look at different kinds of activation functions, and at the end I also show you how to use them in your code. Don't worry, because deep learning frameworks like PyTorch and TensorFlow make it extremely easy to apply them. This video is part of the Deep Learning Explained series by AssemblyAI, a company that creates a state-of-the-art speech-to-text API. If you want to use AssemblyAI for free, grab your API token using the link in the description below. And now let's get started.

So what are activation functions, and why do we need them? Activation functions apply a non-linear transformation and decide whether a neuron should be activated or not. Let's take a step back and see what this means. In a previous video we learned how neural networks work. In a neural network we have the input layer, where we accept an input, and an output layer that gives the actual prediction, the outcome of the network. In between we have the hidden layers. All of these layers consist of neurons, and each neuron applies a linear transformation: it multiplies the input with some weights and maybe adds a bias. This is fine as long as we have a simple problem where we can model the predictions with a linear function. But say we have a more complex problem. One thing we can do, of course, is add more layers to our network, but here's the big problem: without activation functions we only get linear transformations stacked after each other, so our whole network is basically just a stacked linear regression model that cannot learn complex patterns. This is exactly why activation functions come into play. After each layer we want to apply an activation function, which applies a non-linear transformation and helps our network solve complex tasks.

Now let's have a look at different kinds of activation functions. There are many activation functions you can choose from, so we'll take a look at the most popular ones: the step function, sigmoid, hyperbolic tangent (tanh), ReLU, leaky ReLU, and softmax.

The step function just outputs 1 if the input is greater than a threshold and 0 otherwise. This perfectly demonstrates the underlying concept that the activation function decides whether a neuron will be activated: if the input is greater than the threshold, the neuron is activated, and otherwise it is not. While this transformation is easy to understand, the step function is a little too simple and not used in practice.

A very popular choice in practice is the sigmoid function. The formula is sigmoid(x) = 1 / (1 + e^(-x)), which outputs a probability between 0 and 1. If the input is a very negative number, sigmoid outputs a number close to 0; for a very positive number, sigmoid transforms it to a number close to 1; and for inputs near 0 we have a rising curve between 0 and 1. This again means that the more positive the input number is, the more our neuron will be activated.
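As a quick illustration of the step function and sigmoid described above, here is a minimal NumPy sketch (not from the video; the zero threshold for the step function is an assumed default, since the video only says "a threshold"):

```python
import numpy as np

def step(x, threshold=0.0):
    # Outputs 1 where the input exceeds the threshold, 0 otherwise.
    # The threshold of 0 is an assumed default.
    return np.where(x > threshold, 1.0, 0.0)

def sigmoid(x):
    # sigmoid(x) = 1 / (1 + e^(-x)), squashing any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-5.0, -0.5, 0.0, 0.5, 5.0])
print(step(x))     # [0. 0. 0. 1. 1.]
print(sigmoid(x))  # close to 0 for very negative inputs, close to 1 for very positive
```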
The sigmoid function is sometimes used in hidden layers, but most of the time it is used in the last layer for binary classification problems.

Until now we have only seen activation functions that output numbers between 0 and 1, but this is not a requirement for activation functions, so in the next examples you will see transformations that can also output numbers in a different range.

The hyperbolic tangent (tanh) is a common choice for hidden layers. It is basically a scaled and shifted sigmoid function that outputs a number between -1 and +1.

ReLU is probably the most popular choice in hidden layers. The formula is rather simple: it just takes the maximum of 0 and the input x. So if the input is negative it outputs 0, and if the input is positive it simply returns the input without modification. It does not look that fancy, but it can actually improve the learning of our neural network a lot. The rule of thumb is: if you are not sure which activation function to use in your hidden layers, just use ReLU. There is only one problem that sometimes happens during training, the so-called dying ReLU problem: after many training iterations a neuron can reach a dead state where it only outputs 0 for any given input, which means there will be no more updates for its weights.

To avoid this problem you can use a slightly adapted function, the leaky ReLU. The leaky ReLU is the same as the regular ReLU for positive numbers, where it just returns the input, but for negative numbers it does not simply return 0; instead it applies a small scaling factor, a * x. The factor a is usually very small, for example 0.001, so the output is close to zero, but it avoids the neuron becoming completely dead. This is also a very good choice for hidden layers, so whenever you notice that your weights won't update during training, try using leaky ReLU instead of the normal ReLU.

The last function I want to show you is the softmax function. The softmax squashes the input numbers into output numbers between 0 and 1 so that you get probability values at the end: the higher the raw input number, the higher the resulting probability. It is usually used in the last layer in multi-class classification problems. After applying the softmax, you then decide for the class with the highest probability.

Now that we've seen different activation functions in theory, let's have a look at how we can use them in TensorFlow and PyTorch. It is quite easy with both frameworks. In TensorFlow I recommend using the Keras API. With this we have two options: for each layer we can specify the optional argument activation and just use the name of the activation function, or we can leave the activation argument out and instead create the activation as its own layer; all the functions I just showed you are available as layers in tensorflow.keras.layers. In PyTorch we also find all activation functions as layers under torch.nn. In the init function of our neural network we can create instances of the activation function layers and then call these layers in the forward pass; or, as a second option, we can use the functions directly in the forward pass via torch.nn.functional. And that's basically all we have to do to use activation functions in our code.
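To make the remaining functions concrete, here is a minimal NumPy sketch of ReLU, leaky ReLU, and softmax as described above (tanh is built into NumPy; the slope a = 0.001 follows the example given in the video, and the input values are made up):

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x): negative inputs become 0, positive inputs pass through.
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.001):
    # Like ReLU for positive inputs, but scales negative inputs by a small
    # factor a instead of zeroing them, so the neuron never goes fully "dead".
    return np.where(x > 0, x, a * x)

def softmax(x):
    # Exponentiate and normalize so the outputs are positive and sum to 1.
    # Subtracting the max first is a standard trick for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([-2.0, 0.0, 3.0])
print(np.tanh(x))     # values between -1 and +1
print(relu(x))        # [0. 0. 3.]
print(leaky_relu(x))  # [-0.002  0.  3.]
print(softmax(x))     # probabilities summing to 1; the largest input dominates
```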
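Here is a minimal sketch of the two Keras options mentioned above, assuming a tiny binary classifier; the layer sizes and model structure are invented for illustration:

```python
import tensorflow as tf

# Option 1: pass the activation name directly to each layer.
model_a = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Option 2: leave the activation argument out and add the activation
# as its own layer from tf.keras.layers.
model_b = tf.keras.Sequential([
    tf.keras.layers.Dense(16),
    tf.keras.layers.ReLU(),
    tf.keras.layers.Dense(1),
    tf.keras.layers.Activation("sigmoid"),
])

x = tf.random.normal((4, 8))                # hypothetical batch: 4 samples, 8 features
print(model_a(x).shape, model_b(x).shape)   # both (4, 1)
```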
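And a corresponding PyTorch sketch showing both options, again with hypothetical layer sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(8, 16)  # hypothetical sizes for illustration
        self.linear2 = nn.Linear(16, 1)
        self.sigmoid = nn.Sigmoid()      # option 1: activation created as a layer

    def forward(self, x):
        x = F.relu(self.linear1(x))        # option 2: torch.nn.functional call
        x = self.sigmoid(self.linear2(x))  # option 1: call the layer instance
        return x

net = Net()
out = net(torch.randn(4, 8))  # hypothetical batch: 4 samples, 8 features
print(out.shape)              # torch.Size([4, 1])
```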
Alright, I hope you now have a clear understanding of what activation functions are and how you can use them. If you have any questions, let me know in the comments. If you enjoyed this video, please hit the like button and consider subscribing to the channel for more content like this. Before you leave, don't forget to grab your free API token using the link in the description below. I hope to see you in the next video. Bye!
Info
Channel: AssemblyAI
Views: 40,123
Keywords: Deep Learning, Activation Functions, Sigmoid, Softmax, ReLU, Leaky ReLU, TanH, Deep Learning Explained, Deep Learning Tutorial, Neural Nets
Id: Fu273ovPBmQ
Length: 6min 43sec (403 seconds)
Published: Mon Dec 06 2021