Recurrent Neural Networks (RNN) - Deep Learning w/ Python, TensorFlow & Keras p.7

Captions
What is going on everybody, and welcome to part 7 of the Deep Learning with Python, TensorFlow and Keras tutorial series. In this part we're going to be talking about the recurrent neural network.

The idea of a recurrent neural network is that the order of some data carries significance and importance. Two areas where this tends to be the case are time series data, where data is organized temporally or chronologically, and natural language, where the order of words in a sentence carries a lot of the meaning of that sentence. For example, say you have the sentence "some people made a neural network" and you feed it through some sort of natural language processing algorithm, say a plain deep neural network. Usually we tokenize the data, which in this case just means splitting it by words, so each word is its own feature. The problem with a plain deep neural network is that this sentence would carry the same "meaning" as "a neural network made some people," and obviously those two sentences have widely varying meanings and impacts. So the order of things in these sentences matters, and that's the idea of a recurrent neural network.

So how do they actually work? I've got some beautiful, beautiful drawings; I don't know who drew those, but man, they're beautiful. The idea here is that these green boxes are recurrent cells. You could have a basic recurrent cell, but very rarely do those get used; usually it's an LSTM, which is short for long short-term memory. There's also the GRU, or gated recurrent unit, but in all honesty I pretty much see everybody use LSTM cells. The way these work is that your sequential data is passed in, and each cell outputs to two locations: one output goes to the next layer, which could be another recurrent layer, a dense layer, or the output layer, and the other goes down to the next node in that same recurrent layer. The output doesn't necessarily have to go to the next node down, and it doesn't only have to go in one direction either; you can have bidirectional recurrent layers, for example, and you can get really crazy with the order of things and where data gets passed, but in general this is a super basic one.

Another way to look at it is to focus on one cell, say the second one in the layer. Some data has come from the previous cell, wrapped around, and is coming in, and the LSTM cell we're looking at chooses: okay, what do we want to forget from the previous node? Then there's input data coming in: based on this input data, what do we want to add to this bundle of info that we've got? And finally: based on this bundle of info, what do we want to output, both to the next layer and to the next node? That's the task of the LSTM cell. Now obviously that's pretty complicated, especially when the whole point is to eventually just pass along a scalar value.
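To make that forget / add / output story a bit more concrete, here is a minimal NumPy sketch of a single step of the standard LSTM equations. The function name, weight layout, and shapes are illustrative assumptions for the sketch, not the exact variables Keras uses internally.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x_t, h_prev, c_prev, W, b):
        # One LSTM time step (standard equations; names are illustrative).
        # x_t:    input at this time step, shape (input_dim,)
        # h_prev: hidden output from the previous step, shape (units,)
        # c_prev: cell state (the "bundle of info"), shape (units,)
        # W:      weights, shape (input_dim + units, 4 * units)
        # b:      biases, shape (4 * units,)
        units = h_prev.shape[0]
        z = np.concatenate([x_t, h_prev]) @ W + b
        f = sigmoid(z[:units])                # forget gate: what to drop from c_prev
        i = sigmoid(z[units:2 * units])       # input gate: how much new info to add
        g = np.tanh(z[2 * units:3 * units])   # candidate values to add
        o = sigmoid(z[3 * units:])            # output gate: what to expose
        c_t = f * c_prev + i * g              # updated cell state
        h_t = o * np.tanh(c_t)                # goes to the next layer AND the next time step
        return h_t, c_t

In Keras you never write this yourself; the LSTM layer handles all of it internally, but this is roughly what each green box is doing at every step of the sequence.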
How do we do that? Very challenging stuff. If you want to learn more about how LSTM cells work specifically, there is a great, timeless write-up on the LSTM cell; I've linked to it in the text-based version of the tutorial, so definitely go check that out if you want. Otherwise, let's get into writing our own basic recurrent neural network.

What we're going to do in this video is just a really simple example, so we can see how easy the recurrent neural network itself is. The hard part with recurrent neural networks is getting your dataset structured the right way, because the type of data you typically use doesn't come with targets; it's just time series, just raw sequential data, and you have to decide what you're actually trying to predict. So there's usually a huge amount of pre-processing involved when you work with recurrent neural networks. What we're going to do here is just use a simple MNIST example, and in the next tutorial we'll do a realistic example with some time series data, specifically cryptocurrency prices.

Anyway, I'm going to minimize this and let's begin with some imports. We'll import tensorflow as tf (Sublime's autocomplete always does this to me; let me fix that so it's tf). Then from tensorflow.keras.models we're going to import the Sequential type of model. From tensorflow.keras.layers we're going to import the Dense layer, partly because the output layer itself needs to be a dense layer, but also because we're going to end with a regular dense layer before the output layer, which is a pretty common thing to do. We'll also import Dropout, just so nothing gets overly weighted, and the LSTM cell. I'm going to leave a comment here: if you are on the GPU version of TensorFlow, check out the CuDNNLSTM cell. Even on the GPU version of TensorFlow this regular LSTM cell will indeed run on your GPU, but the CuDNNLSTM cell is even more optimized, something like five-plus times faster, even though both run on your GPU. If you're on the CPU version of TensorFlow, well, this model is going to take something like five hours to train.

Okay, so those are the things we need, and now we need the dataset, so we'll just grab MNIST real quick: mnist = tf.keras.datasets.mnist, and then we just need to unpack it, since it's just these two tuples: (x_train, y_train) and (x_test, y_test) come from mnist.load_data(). Again, my apologies for using MNIST, but I think it's better to make this one as basic as possible, because honestly the next one is probably going to take us two, three, maybe even four parts just to get to the point where we feed the data through the neural network, and that's the easy part; feeding it through is really not too hard.

So, for example, let's just print x_train.shape, and, even though we should be able to deduce this from the above, also print the shape of the first sample, x_train[0].shape, just so we get an idea of what we're dealing with. We can see we've got 60,000 examples of 28-by-28 images, and of course the zeroth element is just 28 by 28. So take a minute, pause, and think about it: this is our training data, and it's already in sequences for us, right? And what are the sequences?
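Pulled together, the setup described so far looks roughly like this (these are the TF 1.x-era tensorflow.keras imports the video uses):

    import tensorflow as tf
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Dropout, LSTM  # on GPU, CuDNNLSTM is much faster

    # Grab MNIST and unpack the two (features, labels) tuples
    mnist = tf.keras.datasets.mnist
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    print(x_train.shape)     # (60000, 28, 28): 60,000 examples of 28x28 images
    print(x_train[0].shape)  # (28, 28): each sample is 28 rows of 28 pixels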
Well, what is this 28 by 28? It's 28 rows of 28 pixels per row. So we could, in theory, say each row is one step of a sequence, and if you were to take the first row and then kind of scan downwards, could you figure out what the number is? Probably. So the hope is that a recurrent neural network could actually do this, and maybe we'll even see that it can plausibly do better than a deep neural network, but we'll see; the more important thing here is just to show the recurrent net. So that's the sequence: a sequence of rows of pixels that we're going to be feeding through this neural network.

Now we want to build the actual model itself. We'll say model = Sequential, give ourselves some space, since it's just going to be that Sequential type of model. Then we do model.add and add an LSTM layer. How many cells do we want in this LSTM layer? We'll just use 128 for now. We'll specify the input shape, which is 28 by 28; we already know that, but we can make it somewhat dynamic by using x_train.shape[1:]. Then we can pass an activation function, and we'll use our good friend, rectified linear. And again, do check out CuDNNLSTM; I'm just not going to use it yet, I'll probably change to it before we run, because it is really, really fast, but for now we'll keep the regular LSTM. The next thing we want is one more parameter, return_sequences, and that's going to be True. Do we want this layer to return sequences, like the sequences that were input, or do we want it to return something flat? If we were going to a dense layer next, we wouldn't want to return sequences, because the dense layer isn't going to understand what the heck is going on, but since we're going to another recurrent layer, we definitely want to return those sequences. This is just a long line; anyway, that's one of our layers.

Then we do model.add and throw in a Dropout, because that's what you do; we'll do a 20% dropout there. Then another layer: model.add another LSTM with 128 cells. We don't need to specify the input shape on this one, since Keras infers it from the previous layer, and we'll pass activation equals rectified linear again, then another dropout, which I'm just going to copy and paste. Then model.add a Dense layer, we'll make it 32 nodes, and the activation, gosh, can you guys think of what activation? Oh yes, that's right, rectified linear. Then a dropout, and then the final Dense layer: I'm just going to copy this, paste it, and hard-code in 10, which is just how many classes we have. The activation here shouldn't be rectified linear, it should be softmax at this point. Okay, I think we're done there, at least with the model structure.
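Written out, the model as just described looks roughly like this (it assumes the imports and x_train from the earlier snippet):

    model = Sequential()

    # First recurrent layer: return_sequences=True because another LSTM follows
    model.add(LSTM(128, input_shape=x_train.shape[1:], activation='relu', return_sequences=True))
    model.add(Dropout(0.2))

    # Second recurrent layer: returns a flat output for the dense layers that follow
    model.add(LSTM(128, activation='relu'))
    model.add(Dropout(0.2))

    model.add(Dense(32, activation='relu'))
    model.add(Dropout(0.2))

    # Output layer: one node per class, softmax for class probabilities
    model.add(Dense(10, activation='softmax'))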
Now all we need to do is the compile and then the fitment. For the compile we need an optimizer: opt = tf.keras.optimizers.Adam, and we'll specify a learning rate of 1e-3 and a decay; we haven't really talked about decay, but we'll use 1e-6 or so; probably that's too small, maybe 1e-5. Decay just means that over time the learning rate shrinks: you generally want to start with a larger learning rate to take those big steps, but over time you kind of settle into a local spot, and if you took smaller steps you could do even better. If you keep taking big steps you'll keep bouncing around that little u-shape, but with smaller steps you can plausibly do better. So that's what decay is about: every batch it decays the learning rate a little bit, so the learning rate keeps shrinking and you take smaller and smaller steps.

Then model.compile, and we'll say the loss equals sparse_categorical_crossentropy. I do want to say they added a shorthand for mean squared error, so you can now just use 'mse' for that loss; I'd really like it if they added shorthands for all of them (someone tell me below if they have and I just don't know about it), because I'd love to be able to say 'scc', I hate typing that out. So we've got our loss, the optimizer equals opt, really simple, and then the metrics we're going to track, we'll just go with accuracy. Finally, model.fit: x_train, y_train, epochs equals 3, and validation_data, so we can watch it over time, will be (x_test, y_test).

Okay, that should be good to go. Hopefully I can run this and not crash the computer, because I know Sublime likes to really get me. We'll go ahead and run that, and I'll probably just go through maybe one epoch... oops, invalid syntax at the dropout. What line is that, 14? Did we not fully close this off? We didn't. Try again... okay, things are looking good so far. Alright, we are off, we are learning, but the accuracy does not look very good, and you can see how slow this is; this is extremely slow. I would expect accuracy to be a little better even by now. I wonder why this is going so painfully slow, and I wonder if I've made a mistake, because I don't even see the loss going down; it almost looks like loss is going up: 14, 3, 9, 3, 8... hmm. At first I thought the final dense layer was our problem, but no, its activation is softmax. So what did we screw up? We must have screwed up something; it should be learning much more by now. Oh, I don't know that we've ever shown this before, but I've at least brought it up: what aren't we doing here? We are not normalizing this data. We could use the Keras normalization function, or we can just say x_train = x_train / 255 and x_test = x_test / 255. I'm going to save that.

The other thing I'm going to do while I have you guys is show you CuDNNLSTM, so let's go ahead and import that as well, and I'll change this LSTM to CuDNNLSTM, and this one too. The CuDNNLSTM cell uses a tanh activation function (you can Google it if you want to see the shape), so you don't pass an activation, but with rectified linear you should still be able to get a good accuracy here. Actually, I almost want to keep the regular version first, because I know some people on CPU aren't going to be able to run CuDNNLSTM and they just want to learn, so I guess I'll just rerun it as is. We can see that the previous run simply did not train, and hopefully that was the problem; I'm going to be really sad if that's not it, but almost certainly the reason this didn't train is because we didn't scale the data between 0 and 1. So let's see.
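In code, the scaling, compile, and fit steps described above look roughly like this. The lr and decay keyword arguments match the TF 1.x-era Keras optimizer API used in the video; newer TensorFlow versions spell these differently (learning_rate and a schedule), so treat this as a sketch of that setup rather than the one true signature.

    # Scale pixel values into the 0-1 range; without this the model barely learns
    x_train = x_train / 255.0
    x_test = x_test / 255.0

    # Adam with a small learning rate that decays a little every batch
    opt = tf.keras.optimizers.Adam(lr=1e-3, decay=1e-6)

    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer=opt,
                  metrics=['accuracy'])

    model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))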
And yeah, look how much faster it's learning now. All we did was scale the data between 0 and 1; that's how huge of an impact it makes, so it's super important. Okay, I'm going to go ahead and break this, well, I guess I don't have to break it, but I'm going to show you guys CuDNNLSTM now, because it's so fast. Paste, paste, then delete this activation and delete this one (whoops, we do want to keep that). You can see the regular run is already at like 41, 40, and I'm sure we'll get into the 90s or something, but the ETA is two to three minutes, so now let's run the CuDNNLSTM one. Hopefully I'm not going to crash the GPU; Sublime is pretty bad, and I probably should be running this in a console instead, but whatever. Look at this, look how fast this is; this is the best thing ever, what a great little addition. Recurrent nets are so slow, and this CuDNNLSTM cell is just glorious: we're already at an accuracy of about 96%, and before, epochs were taking, I don't know, we were pretty deep into that one and it still had like two minutes and thirty seconds to go, whereas these entire epochs are taking 13 seconds. That's crazy. We're only running three epochs, so I'll just let this one finish... good, done.
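For reference, the GPU-only swap looks roughly like this. CuDNNLSTM is part of the TF 1.x tensorflow.keras API used in the video (in later TensorFlow 2 releases the plain LSTM layer picks the cuDNN kernel automatically when its arguments allow it); note there is no activation argument, because it always uses tanh internally.

    from tensorflow.keras.layers import CuDNNLSTM  # GPU-only layer in TF 1.x

    model = Sequential()

    # No activation argument: CuDNNLSTM always uses tanh internally
    model.add(CuDNNLSTM(128, input_shape=x_train.shape[1:], return_sequences=True))
    model.add(Dropout(0.2))

    model.add(CuDNNLSTM(128))
    model.add(Dropout(0.2))

    model.add(Dense(32, activation='relu'))
    model.add(Dropout(0.2))

    model.add(Dense(10, activation='softmax'))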
So, accuracy here ended at 97.73%, and validation accuracy actually did better, 98.11%. I don't know if I can scroll back to the other epochs or not, but on something like this, I think the reason validation accuracy is higher than training accuracy is that the training accuracy is an average over the entire epoch, whereas the validation accuracy is measured at the end of the epoch. That's also why, after the next epoch starts, the average often jumps quite a bit. Even so, in this case the training accuracy is still lower than the validation accuracy, which tells me we probably could have continued to train, because by the end of the epoch we were doing much, much better. We could also look in TensorBoard to see whether the loss is still falling and all that.

Anyways, that is all for now. In the next tutorial we'll go over a far more complex example, doing this on a realistic time series dataset, because that just takes so much more work, and I didn't want to bog us down with the pre-processing before this; actually feeding data through a recurrent net itself, as long as the data is already in sequential form and already has a target, is pretty simple. A quick shout-out to the most recent channel members: Taj Jagdish, Dana Larsen, and Julio. Sorry if I mispronounced anybody's names; it's pretty crazy how many names I've never heard before and have no idea how to pronounce, but my audience comes from everywhere, which is really cool to see, and again, thank you guys so much for your support, it's really appreciated.

One last note: if I did screw something up, in general it's a red flag when validation accuracy is higher than training accuracy, but I do believe my explanation above covers why that's the case here, and I'm very confident the data is prepared correctly. If that ever does happen to you, though, treat it as a massive red flag; keep training and make sure it doesn't stay that way. Anyways, that's it for now. If you have questions, comments, concerns, whatever, feel free to leave them below, and if you've got any requests for things you'd like to see in this series, let me know. Otherwise, I will see you guys in another video.
Info
Channel: sentdex
Views: 178,449
Keywords: Recurrent Neural Network, rnn, TensorBoard, TensorFlow, Keras, Deep Learning, tutorial, neural network, machine learning
Id: BSpXCRTOLJA
Length: 21min 20sec (1280 seconds)
Published: Fri Sep 07 2018