What is LSTM (Long Short Term Memory)?

Video Statistics and Information

Captions
Imagine you're at a murder mystery dinner. Right at the start, the lord of the manor abruptly keels over, and your task is to figure out whodunit. It could be the maid; it could be the butler. But you've got a problem: your short-term memory isn't working so well, and you can't remember any of the clues past the last ten minutes. In that sort of situation, your prediction is going to be, well, nothing better than a random guess.

Or imagine you have the opposite problem, where you can remember every word of every conversation you've ever had. If somebody asked you to outline your partner's wedding vows, you might have some trouble doing that; there are just so many words you'd need to process. It would be much better if you could just remember, well, the memorable stuff. And that's where something called long short-term memory, also abbreviated as LSTM, comes into play. It allows a neural network to remember the stuff that it needs to, to keep hold of context, but also to forget the stuff that is no longer applicable.

So take, for example, this sequence of letters, where we need to predict what the next letter is going to be. Just by looking at the letters individually, it's not obvious: we have two M's, and they each have a different letter following them. So how do we predict the next letter? Well, if we go back through the time series to look at all of the letters in the sequence, we can establish context, and we can clearly see: ah yes, it's "my name is". And if, instead of looking at letters, we looked at words, we can establish that the whole sentence here says "my name is Martin".

Now, a recurrent neural network is really where an LSTM lives; effectively, an LSTM is a type of recurrent neural network, or RNN. RNNs work in the sense that they have a node, and this node receives some input. That input is then processed in some way, there's some kind of computation, and that results in an output. That's pretty standard stuff. But what makes an RNN node a little bit different is the fact that it is recurrent, meaning that it loops around: the output of a given step is provided alongside the input of the next step. So step one has some input, it's processed, and that results in some output. Then step two has some new input, but it also receives the output of the prior step. That is what makes an RNN a little bit different, and it allows it to remember previous steps in a sequence.

So when we're looking at a sentence like "my name i", we don't have to go back too far through those steps to figure out what the context is. But RNNs do suffer from what's known as the long-term dependency problem, which is to say that over time, as more and more information piles up, RNNs become less effective at learning new things. While we didn't have to go too far back for "my name i", if we were going back through an hour's worth of clues at our murder mystery dinner, that's a lot more information that needs to be processed. So the LSTM provides a solution to this long-term dependency problem, and that is to add something called an internal state to the RNN node.
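Before opening up that internal state, it may help to see the recurrent loop described above in code. The video shows no code, so what follows is a minimal sketch of a single plain tanh RNN step, with illustrative sizes and random weights chosen here for demonstration:

```python
import numpy as np

# A minimal sketch of the recurrent loop described above, assuming a
# plain tanh RNN. The sizes and random weights are illustrative only.
rng = np.random.default_rng(0)
input_size, hidden_size = 4, 8

W_x = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input weights
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # recurrent weights
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One step: the new input x_t arrives alongside the previous
    step's output h_prev, which is how the node 'loops around'."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Walk a dummy 5-step sequence: each step sees its own input plus
# the output of the step before it.
h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h = rnn_step(x_t, h)
```

The single vector h is the only memory this node has, which is exactly why long sequences give it trouble.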
Now when an RNN input comes in, it receives the state information as well: a step receives the output from the previous step, the input of the new step, and also some state information from the LSTM state. Now, what is this state? Well, it's actually a cell, so let's take a look at what's in there.

This is an LSTM cell, and it consists of three parts; each part is a gate. There is a forget gate, there's an input gate, and there's an output gate. The forget gate says what information stored in this internal state can be forgotten because it's no longer contextually relevant. The input gate says what new information we should add or update into this working storage state. And the output gate says, of all the information that's stored in that state, which part of it should be output in this particular instance.

These gates can be assigned numbers between zero and one, where zero means the gate is effectively closed and nothing gets through, and one means the gate is wide open and everything gets through. So we can say forget everything, or just forget a little bit; we can say add everything to the input state, or add just a little bit; and we can say output everything, output just a little bit, or output nothing at all.

So now, when we're processing in our RNN cell, we have this additional state information that can provide us with some additional context. If we take an example of another sentence, like "Martin is buying apples", there's some information that we might want to store in this state. "Martin" most likely refers to a male, so we might want to store that, because it might be useful. "Apples" is a plural, so maybe we store that it is a plural for later on. Now, as this sentence continues to develop, it starts to talk about Jennifer: "Jennifer is...". At this point we can make some changes to our state data. We've changed subjects from Martin to Jennifer, so we don't care about the gender of Martin anymore; we can forget that part, and we can say the most likely gender for "Jennifer" is female and store that instead.

And really, that is how we can apply this LSTM to any sort of series where we have a sequence prediction that's required and some long-term dependency data to go alongside it. Now, some typical use cases for LSTM: machine translation is a good one, and another is chatbots, specifically Q&A chatbots, where we might need to retrieve some information from a previous step in the conversation and recall it later on. Those are all good examples of where we have a time sequence of things and some long-term dependencies. And had we also applied LSTM to our murder mystery dinner, we probably could have won first prize by having it forecast for us whodunit: it was the butler! If you have any questions, please drop us a line below, and if you want to see more videos like this in the future, please consider liking and subscribing.
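To make the three gates concrete, here is a minimal sketch of one LSTM step, again with illustrative sizes and random weights rather than anything from the video. Each gate is a sigmoid, so its values land between zero and one, the "closed to wide open" range described above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 8

def init_weights():
    # Each gate looks at the new input and the previous output together.
    W = rng.normal(scale=0.1, size=(hidden_size, input_size + hidden_size))
    return W, np.zeros(hidden_size)

W_f, b_f = init_weights()  # forget gate
W_i, b_i = init_weights()  # input gate
W_o, b_o = init_weights()  # output gate
W_c, b_c = init_weights()  # candidate state

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(W_f @ z + b_f)        # 0..1: how much old state to keep
    i = sigmoid(W_i @ z + b_i)        # 0..1: how much new info to add
    o = sigmoid(W_o @ z + b_o)        # 0..1: how much state to reveal
    c_tilde = np.tanh(W_c @ z + b_c)  # proposed new information
    c = f * c_prev + i * c_tilde      # forget a bit of old, add a bit of new
    h = o * np.tanh(c)                # output a gated view of the state
    return h, c

# The internal state c rides along from step to step, next to the
# usual output h.
h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h, c = lstm_step(x_t, h, c)
```

The forget-then-add update of c is what lets the cell drop Martin's gender and store Jennifer's instead, as in the sentence example above.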
Info
Channel: IBM Technology
Views: 147,649
Keywords: LSTM, Long Short Term Memory, RNN, Recurrent Neural Networks, AI, Artificial Intelligence, IBM, IBM Cloud, IBM Watson, murder mystery
Id: b61DPVFX03I
Length: 8min 19sec (499 seconds)
Published: Mon Nov 22 2021