Illustrated Guide to Recurrent Neural Networks: Understanding the Intuition

Video Statistics and Information

Reddit Comments

Thanks for this! Lmao I remember reading the paragraph of what a Vanishing Gradient was a million times and not really getting it. Your video was an aha moment for me not only for RNNs but for that as well.

👍︎︎ 4 👤︎︎ u/Blytheway 📅︎︎ Sep 04 2018 🗫︎ replies

Subscribed. Waiting for more :)

👍︎︎ 2 👤︎︎ u/rohit1999 📅︎︎ Sep 05 2018 🗫︎ replies
Captions
Hi, and welcome to an illustrated guide to recurrent neural networks. I'm Michael, also known as Learned Vector. I'm a machine learning engineer in the natural language processing and voice assistant space. If you're just getting started in machine learning and want to get some intuition behind recurrent neural networks, this video is for you. If you want to get into machine learning, recurrent neural networks are a powerful technique that's important to understand. If you use smartphones and frequently surf the internet, odds are you use applications that leverage RNNs. Recurrent neural networks are used in speech recognition, language translation, and stock prediction; they're even used in image recognition to describe the content in pictures. I know there are many guides on recurrent neural networks, but I want to share illustrations along with an explanation of how I came to understand them. In this video I'm going to avoid all the math and focus on the intuition behind RNNs instead. By the end of this video you should have a good understanding of RNNs and hopefully have that light bulb moment.

So RNNs are neural networks that are good at modeling sequence data. To understand what that means, let's do a thought experiment. Say you take a still snapshot of a ball moving in time, and you want to predict the direction the ball is moving. With only the information you see on the screen, how would you do this? You could take a guess, but any answer you come up with would be just that, a random guess. Without knowledge of where the ball has been, you wouldn't have enough data to predict where it's going. If you record many snapshots of the ball's position in succession, you will have enough information to make a better prediction. So this is a sequence: a particular order in which one thing follows another. With this information, you can now see that the ball is moving to the right. Sequence data comes in many forms. Audio is a natural sequence; you can chop up an audio spectrogram into chunks and feed that into RNNs. Text is another form of sequence; you can break text up into a sequence of characters or a sequence of words.

Okay, so RNNs are good at processing sequence data for predictions, but how do they do that? By having a concept I like to call sequential memory. To get a good intuition behind what sequential memory means, I'd like to invite you to say the alphabet in your head. Go on, give it a try. That was pretty easy, right? If you were taught this specific sequence, it should come easily to you. Now try saying the alphabet backward. I bet that was much harder. Unless you've practiced this sequence before, you'll likely have a hard time. Here's a fun one: start at the letter F. At first you'll struggle with the first few letters, but after your brain picks up the pattern, the rest will come naturally. So there's a very logical reason why this can be difficult: you learned the alphabet as a sequence. Sequential memory is a mechanism that makes it easier for your brain to recognize sequence patterns.

All right, so RNNs have this abstract concept of sequential memory, but how the heck do they replicate that concept? Well, let's look at a traditional neural network, also known as a feed-forward neural network. It has an input layer, a hidden layer, and an output layer. How do we get a feed-forward neural network to be able to use previous information to affect later ones? What if we add a loop in the neural network that can pass previous information forward? That's essentially what a recurrent neural network does. An RNN has a looping mechanism that acts as a highway to allow information to flow from one step to the next. This information is the hidden state, which is a representation of previous inputs.
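As a rough sketch of that looping idea (this is not code from the video, and the weight names are purely illustrative), one recurrent step can be written as a function of the current input and the previous hidden state:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One pass through the loop: the new hidden state combines the
    current input x_t with the previous hidden state h_prev."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Toy sizes: 4-dimensional inputs, 3-dimensional hidden state.
rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(4, 3)), rng.normal(size=(3, 3)), np.zeros(3)

h0 = np.zeros(3)                                         # hidden state before any input is seen
h1 = rnn_step(rng.normal(size=4), h0, W_xh, W_hh, b_h)   # after the first input
h2 = rnn_step(rng.normal(size=4), h1, W_xh, W_hh, b_h)   # after the second; h2 carries both forward
print(h2)
```

Because each new hidden state is computed from the previous one, information from earlier inputs keeps flowing forward through the sequence.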
Let's run through an RNN use case to get a better understanding of how this works. Say we want to build a chatbot; they're pretty popular nowadays. The chatbot should classify intentions from the user's inputted text. To tackle this problem, first we encode the sequence of text using an RNN, then we feed the RNN output into a feed-forward neural network, which classifies the intent. Okay, so a user types in "what time is it?". To start, we break up the sentence into individual words. RNNs work sequentially, so we feed it one word at a time. The first step is to feed "what" into the RNN. The RNN encodes "what" and produces an output. For the next step, we feed the word "time" and the hidden state from the previous step. Remember that the hidden state represents information from all previous steps, so the RNN now has information about the words "what" and "time". We repeat this process until the final step. You can see that at the final step the RNN has encoded information from all the words in the previous steps. Since the final output was created from the rest of the sequence, we should be able to take that final output and pass it to the feed-forward layer to classify the intent.

For those of you who like looking at code, here is some Python showcasing the control flow. First, you initialize your network layers and the initial hidden state; the shape and dimension of the hidden state will depend on the shape and dimension of your recurrent neural network. Then you loop through your inputs, passing a word and the hidden state into the RNN. The RNN returns the output and a modified hidden state. This modified hidden state should now contain information from all your previous steps. You continue to loop until you're out of words. Last, you pass the output to the feed-forward layer, and it returns a prediction. And that's it: the control flow of doing a forward pass of a recurrent neural network is a for loop.
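The video's own code isn't reproduced in the captions, so here is a minimal sketch of that control flow under toy assumptions: the `rnn` and `feed_forward` functions, the tiny `vocab`, and the two-intent classifier are all illustrative names, not the video's code.

```python
import numpy as np

hidden_size = 3
vocab = {"what": 0, "time": 1, "is": 2, "it": 3}       # toy vocabulary

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(len(vocab), hidden_size))      # input-to-hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size))     # hidden-to-hidden weights (the loop)
W_hy = rng.normal(size=(hidden_size, 2))               # hidden-to-intent weights (2 toy intents)

def rnn(word_id, hidden):
    """One recurrent step: return (output, modified hidden state)."""
    x = np.eye(len(vocab))[word_id]                    # one-hot encode the word
    hidden = np.tanh(x @ W_xh + hidden @ W_hh)
    return hidden, hidden                              # here the output is the hidden state itself

def feed_forward(output):
    """Classify the final RNN output into an intent index."""
    return int(np.argmax(output @ W_hy))

hidden = np.zeros(hidden_size)                         # initialize the hidden state
for word in "what time is it".split():                 # loop through the input words
    output, hidden = rnn(vocab[word], hidden)          # pass the word and hidden state in

prediction = feed_forward(output)                      # feed the final output to the classifier
print(prediction)
```

The key point is the shape of the loop: the only thing carried from step to step is the hidden state, and the final output summarizes the whole sentence before classification.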
Okay, now back to our visualization. You may have noticed the odd distribution of colors in the hidden states. This is to illustrate an issue with RNNs known as short-term memory. Short-term memory is caused by the infamous vanishing gradient problem, which is also prevalent in other neural network architectures. As the RNN processes more steps, it has trouble retaining information from previous steps; as you can see, the information from the words "what" and "time" is almost non-existent at the final step. Short-term memory and the vanishing gradient are due to the nature of the backpropagation algorithm used to train and optimize neural networks. To understand why, let's take a look at the effects of backpropagation on a deep feed-forward neural network.

Training a neural network has three major steps. First, it does a forward pass and makes a prediction. Second, it compares the prediction to the ground truth using a loss function; the loss function outputs an error value, which is an estimate of how badly the network is performing. Last, it uses that error value to do backpropagation, which calculates the gradients for each node in the network. The gradient is a value used to adjust the network's internal weights, allowing the network to learn: the bigger the gradient, the bigger the adjustments, and vice versa. Here's where the problem lies. When doing backpropagation, each node in a layer calculates its gradient with respect to the effects of the gradients in the layer before it. So if the adjustments in the layer before it are small, then the adjustments in the current layer will be even smaller. This causes gradients to shrink exponentially as they backpropagate down, and the earlier layers fail to do any learning because their internal weights are barely adjusted due to extremely small gradients. That's the vanishing gradient problem.

Let's see how this applies to recurrent neural networks. You can think of each time step in a recurrent neural network as a layer. To train a recurrent neural network, you use an application of backpropagation called backpropagation through time. The gradient values shrink exponentially as they propagate back through each time step. Again, the gradient is used to make adjustments in the neural network's weights, thus allowing it to learn, and small gradients mean small adjustments. This causes the early layers to not learn. Because of the vanishing gradients, the RNN doesn't learn the long-range dependencies across time steps. This means there is a possibility that the words "what" and "time" are not considered when trying to predict the user's intention; the network then has to make its best guess with "is it?", which is pretty ambiguous and would be difficult even for a human. So not being able to learn on earlier time steps causes the network to have short-term memory.

Okay, so RNNs suffer from short-term memory. How do we combat that? To mitigate short-term memory, two specialized recurrent neural networks were created. One is called the long short-term memory, or LSTM for short; the other is the gated recurrent unit, or GRU. LSTMs and GRUs essentially function just like RNNs, but they're capable of learning long-term dependencies using mechanisms called gates. These gates are different tensor operations that can learn what information to add to or remove from the hidden state. Because of this ability, short-term memory is less of an issue for them.

To sum this up, RNNs are good for processing sequence data for predictions but suffer from short-term memory. The short-term memory issue with vanilla RNNs doesn't mean you should always skip them in favor of the more evolved versions like LSTMs or GRUs: RNNs have the benefit of training faster and using less computational resources, because there are fewer tensor operations to compute. You could use LSTMs or GRUs when you expect to model longer sequences with long-term dependencies. If you're interested in digging deeper, I've added links in the description to amazing resources explaining RNNs and their variants. I had a lot of fun making this video, so let me know in the comments if this was helpful or what you would like to see in the next one. Thanks for watching.
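As a hedged illustration of that trade-off (not something shown in the video), frameworks such as PyTorch ship drop-in recurrent layers, so comparing a vanilla RNN with an LSTM or GRU is roughly a one-line change; the sizes below are arbitrary.

```python
import torch
import torch.nn as nn

seq = torch.randn(1, 5, 8)           # (batch, sequence length, input features)

rnn  = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
gru  = nn.GRU(input_size=8, hidden_size=16, batch_first=True)

out_rnn,  h_n        = rnn(seq)      # vanilla RNN: fewest tensor operations, short-term memory
out_lstm, (h_l, c_l) = lstm(seq)     # LSTM: gates plus a cell state for long-range dependencies
out_gru,  h_g        = gru(seq)      # GRU: gated like the LSTM, with fewer parameters

print(out_rnn.shape, out_lstm.shape, out_gru.shape)   # each is (1, 5, 16)
```

The gated variants cost more compute per step, which is the speed-versus-memory trade-off described above.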
Info
Channel: The A.I. Hacker - Michael Phi
Views: 157,361
Rating: 4.950284 out of 5
Keywords: machine learning, deep learning, neural networks, recurrent neural networks, rnn, lstm, gru
Id: LHXXI4-IEns
Length: 9min 50sec (590 seconds)
Published: Sat Aug 25 2018