[RNN] Applying and Understanding Recurrent Neural Networks in Python

Video Statistics and Information

Captions
Hey guys, what's up? Spencer here. Today I'm going over the theory and the applications of recurrent neural networks. What type of data would you use with a recurrent neural network, you might ask? Largely sequential data, so time series or even text. In this video I'll cover the theory and what you can use this model for, but before you go into the realm of recurrent neural networks I strongly recommend you check out my other video on the theory of artificial neural networks, and maybe an application video as well, because in general it all fits under the same umbrella. Also help out this video by liking it and maybe even subscribing, and check out my other videos and future content. But let's get back to the video.

I've basically combined the theory into a neat slide deck with visualizations throughout to illustrate what the overall architecture of a recurrent neural network is doing, so hopefully it makes everything a bit easier to conceptualize.

Let me start with the drawbacks of the recurrent neural network. One major drawback is the vanishing gradient: your model is trying to update its weights, but the weights barely move because the gradient is so close to zero, so the model is essentially not learning anything. The other is the exploding gradient, where each update adds a massive number to the weights and the model blows up exponentially, so it can't capture the details or predict reasonable results. There are other models that deal with this, like LSTMs or GRUs, but those will be a different video, a very future one.

The other drawback of the recurrent neural network is that it can't retain long-term memory. Say you're predicting text and training on a 300-page book; by the time you're at page 200 and trying to predict the next page, the model won't remember what it trained on back on the very first page, even if that was the most applicable part of the book. Other models like LSTMs and GRUs have an advantage there. Also, using the ReLU activation function won't by itself solve the vanishing-gradient issue I mentioned earlier; it usually boils down to which optimizer and which activation function you use, and ReLU is not the be-all and end-all.

Now the architecture. Over here is the input you plug into your model, and next we have an initial state, which is initialized as a vector full of zeros. Those zeros will ideally take on different values, with different weights associated with them, as training proceeds over however many epochs you use. Each of these states is always passed through an activation function, and this will be spelled out in a bit more detail a little later in the slides.
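The vanishing/exploding behavior is easy to see numerically. In a simple RNN, backpropagation through time multiplies the gradient by roughly the same recurrent factor at every step; the toy function below is my own illustration (not code from the video) of how a factor below 1 collapses toward zero and a factor above 1 blows up over 50 steps:

```python
def backprop_factor(w: float, steps: int) -> float:
    """Product of identical per-step gradient factors, as in
    backpropagation through time with a fixed recurrent scale w."""
    g = 1.0
    for _ in range(steps):
        g *= w
    return g

vanishing = backprop_factor(0.5, 50)   # ~8.9e-16: weights barely update
exploding = backprop_factor(1.5, 50)   # ~6.4e8: updates blow up
print(vanishing, exploding)
```

Real networks also multiply by activation derivatives and varying weights, but the compounding effect is the same.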
The output of the first state feeds into the next hidden state, h_1, and the initial hidden state h_0 feeds into it as well. Let me step back and label the weights: there are weights from x to a, where a stands for activation, and weights from a to a. The initial state h_0 always has an activation function, and we use the output of that activation as an input to the next state and as an input to the output. This will be a little clearer with a few equations.

For the very first state, we plug our vector of zeros into the activation function; it's very popular to use tanh or ReLU, though it really depends on what you're trying to optimize and what range you want. Concretely, you take the a-to-a weights and matrix-multiply them by the activation output of the initial state, add the x-to-a weights matrix-multiplied by the input, and then add a bias, which acts like the intercept term for that given hidden state. That sum goes through the activation function, and the activated value is then multiplied by the output weights of hidden state one to produce the output, y-hat with superscript one.

That is essentially the overall idea of the recurrent neural network, so let me take it a step further. Once you have that first activation a^1, the cycle just continues: you matrix-multiply a^1 by the a-to-a weights, add the matrix multiplication of the next input with its own weights, plug that into the activation function, and it outputs both the next activation and, through the output weights, the next predicted value. It keeps going recursively. Note also that each hidden state can, in principle, have a different activation function associated with it; that's just a little more customization you can play around with.

So that is essentially the algorithm of the recurrent neural network, with visuals. If it doesn't quite hit the spot, the coding demonstration should help. There are going to be two parts to this: I'll do single-step forecasting as well as multi-step forecasting, along with all the architecture that's involved. Everything is in a Jupyter notebook, in Python, for all the Python lovers.
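The update just described can be sketched in NumPy. The weight names (W_ax, W_aa, W_ya) and the toy inputs below are my own illustrative choices, not code from the video:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 1, 3                      # one input feature, three hidden units

W_ax = rng.normal(size=(n_hid, n_in))   # input-to-hidden weights (x to a)
W_aa = rng.normal(size=(n_hid, n_hid))  # hidden-to-hidden weights (a to a)
W_ya = rng.normal(size=(1, n_hid))      # hidden-to-output weights
b_a = np.zeros((n_hid, 1))              # bias ("intercept") for the hidden state
b_y = np.zeros((1, 1))                  # bias for the output

h = np.zeros((n_hid, 1))                # h_0: the initial state, all zeros
xs = [np.array([[0.1]]), np.array([[0.2]]), np.array([[-0.1]])]

for x_t in xs:
    h = np.tanh(W_aa @ h + W_ax @ x_t + b_a)  # new hidden state
    y_hat = W_ya @ h + b_y                    # one-step output for this step
print(y_hat.shape)   # (1, 1)
```

Every time step reuses the same three weight matrices, which is exactly why a single gradient factor gets compounded across the sequence.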
The very first step is loading all of our packages. I have the standard math and data-science libraries, plus the stock-scraping package I'll be using; in a previous video I went over scraping financial data in more detail, so make sure you check that out if you're interested. I've also got a plotting package, some scaling utilities, and Keras, the most recent version; I'll put the exact version somewhere on screen (a `pip list` will tell you). I'm also setting a random seed so you can definitely follow along.

For the data, I'm pulling roughly five years of gold prices; it's Sunday, December 20th, 2020 right now, so my interval covers about five years up to today. Let's load that in using the stock scraper. My gosh, I really like using financial data because it's so clean and so easy to import; there are no headaches of further cleaning. We'll be predicting the adjusted close using the other values: high, low, open, close, and volume. So it's going to be a multivariate process eventually, but the very first step is a univariate time series, where I calculate the percentage changes of the adjusted close and predict based on its own history, just to keep it simple before we get more complex.

Let's plot the adjusted close. If you had invested in gold five years ago at about a thousand dollars, you'd have roughly a 100% return, which is excellent over a five-year time frame. Over here I'm also doing some cleaning and scaling of the data I'll use later for the multivariate part, which is going to be a little bit different.

This is the data for the univariate example: the returns, i.e., the percentage change from day to day. Note that the very first observation is NaN, because there's no previous day to compute a change from, so it's nulled out. These values read as percentages: 0.65%, 0.5%, and so on. Plotting the returns, it looks like the biggest one-day loss was about 4% and the biggest gain about 6%. Take a quick look at the histogram too: it looks roughly normal, almost a nice bell curve, which is really cool.

Next, a little more cleaning: I reshape the whole series into basically an n-by-1 matrix. I take the returns and call .values, which keeps just the observations and drops the index and column labels.
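The returns calculation is just a day-over-day percentage change, and the MinMax scaling squeezes the result into the (0, 1) range. A pure-Python sketch with toy prices (the numbers are illustrative, not the gold data):

```python
# Toy stand-in for five days of adjusted-close prices.
prices = [100.0, 101.0, 100.5, 102.0, 101.0]

# Daily percentage change: (p_t - p_{t-1}) / p_{t-1}.
# The first observation has no prior day, which is the NaN
# the video drops before modeling.
returns = [(b - a) / a for a, b in zip(prices, prices[1:])]

# MinMax scaling to the (0, 1) range, as MinMaxScaler does.
lo, hi = min(returns), max(returns)
scaled = [(r - lo) / (hi - lo) for r in returns]
print(returns)
print(scaled)
```

In the notebook this is a one-liner with pandas (`pct_change`) and scikit-learn (`MinMaxScaler`); the arithmetic is the same.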
I index from the first element to skip the NaN, and then reshape. The -1 in reshape tells the Python interpreter to infer that dimension automatically: given one fixed dimension, it recalculates the other from the original shape, so the data is squeezed down to a single column with the row count unchanged. I print out the NumPy array, and last but not least I scale it. The (0, 1) range is the default, but I spell it out for clarity, and then call scaler.fit_transform on the array. Printing the length gives 1330 values, which makes sense: initially we had 1331, but we took out that very first NaN value, so now we have 1330.

The next step is to build the time-series dataset. Everything has to be sequential for training, and each window leads to one specific output: you're given 10 x-values and you try to predict the one value that follows. As a smaller visual, say I'm predicting one day ahead using the past two days: I use the first two values to predict the third, then slide forward and use the next two values to predict the fourth, and so on. That's essentially what a supervised time-series setup is: the values right before the target predict the target. During training these are all true values; it will be a little more complicated when it comes to actually predicting out-of-sample data.

In more detail, I'll be using 10 historical data points to predict one step ahead. Over here we create our train x and train y values: for each i, I append series[i : i + samples], which on the first pass is the values at indices 0 through 9, as an x sample, and series[i + samples], the value at index 10 (remember Python starts at 0), as the corresponding y. So it's predicting one day ahead using the past 10 values; I just for-loop it and append to x and y.

Let's run this; here's a snippet of what it looks like: ten values, one through ten, and then the eleventh value that we're predicting from them. It's basically like a train: you use the body to predict the head, and it keeps on going. After some reshaping, notice there are 1320 observations instead of 1330; that's the minus-samples term, since the last window has to leave room for its one-step-ahead target. So we are expecting 1320.
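The windowing loop just described can be sketched as a small helper (the function name is mine; the video builds the lists inline):

```python
import numpy as np

def make_windows(series, n_samples):
    """Split a 1-D series into (X, y) pairs: n_samples consecutive
    values predict the single value that follows them."""
    X, y = [], []
    for i in range(len(series) - n_samples):
        X.append(series[i:i + n_samples])   # e.g. indices 0..9
        y.append(series[i + n_samples])     # index 10 (Python is 0-based)
    return np.array(X), np.array(y)

series = np.arange(15, dtype=float)
X, y = make_windows(series, 10)
print(X.shape, y.shape)   # (5, 10) (5,)
```

The `len(series) - n_samples` bound is exactly why 1330 scaled returns yield 1320 training windows.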
I'll also use a threshold of 90%: the first 90% of the data is training data, and the latter 10% is testing data. The threshold works out to 1188, so I take the first 1188 values for training and the remaining 1330 minus 1188 for testing. That's what's being done here.

The next step is to create your architecture, and this is probably where it gets really interesting in terms of tuning model complexity for your best results. It takes a lot of thinking about what type of layers you want, how many units, which activation functions, and so on; this is where the expensive cost analysis and nifty math come into play. I'm starting really simple with the Keras Sequential API, which lines everything up and is so easy to use.

First, I add one recurrent layer with three hidden units; remember the PowerPoint, with each hidden state pointing to the next hidden state and the inputs feeding in — we have three units of those. That's our hidden layer. Next is a dropout layer, which helps with overfitting: during training it randomly ignores, in this case, 20% of that layer's activations, so the model hopefully does not just memorize the past data. We don't want memorization; we want it to recognize patterns and act on them. Last, I add a dense layer with one unit, because this is essentially a regression problem, so we only need one unit in the output layer.

One of the more important pieces: I'm using mean squared error as the loss function and the Adam optimizer. We don't really have to worry about the learning rate, because the intricacies of the Adam optimizer handle it well; the default is 0.001, so we don't need to hyper-tune that. Then I print the architecture: a Sequential model with the RNN of three units, a dropout layer, and a dense layer — 19 parameters in total. Pretty simple.

Then we train on this. I fit on the first 1188 values, our threshold. I don't need to shuffle; in fact, we don't want to shuffle time-series data, though I included the parameter because it's worth knowing: shuffling randomizes the order of your observations for you, so for non-sequential data you don't have to randomize before calling fit — it's a really neat option. I'm using 100 epochs, a batch size of 32, and a validation split of 20%. I go into more detail on these in my previous artificial-neural-network video, but at a high level: epochs is how many passes the training loop makes over the data, batch size is how many observations go into each weight update, and the validation split is the slice of the training data held out to judge how well the model is doing in the middle of its training session. verbose=1 prints progress as it runs; set it to 0 and you get no output, and it runs behind the scenes. So over here we get the training loss and the validation loss as it runs.
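The architecture just described can be sketched with the Keras API. This is a reconstruction from the walkthrough, not the notebook's exact code, and it assumes the TensorFlow-bundled Keras:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dropout, Dense

model = Sequential([
    SimpleRNN(3, input_shape=(10, 1)),  # 3 hidden units; 10 time steps, 1 feature
    Dropout(0.2),                       # randomly zero 20% of activations in training
    Dense(1),                           # single output unit for the regression target
])
model.compile(loss="mean_squared_error", optimizer="adam")
model.summary()  # 19 trainable parameters: 15 in the SimpleRNN, 4 in the Dense
```

Fitting then follows the parameters mentioned in the video, along the lines of `model.fit(train_x, train_y, epochs=100, batch_size=32, validation_split=0.2, shuffle=False, verbose=1)`.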
Training ran really fast, which is to be expected because the model is so simple. Here's what our training loss and validation loss look like. A good rule of thumb, and I've put a few notes on screen: if your training loss is much greater than your validation loss, you're underfitting, and vice versa; if your training loss is much lower than your validation loss, you're overfitting. In our case the validation loss sits a bit above the training loss, which means we're overfitting: the model is essentially memorizing the training data and leaning on it to produce the test predictions, rather than really understanding the patterns. You have to play around with it to get a robust model, and it usually comes down to the data you're using. In our case there are so many other factors involved in predicting the next price of gold — usually politics — but we don't have to go there; that can be a future, very large topic.

Now for something many other guides haven't really covered, and which many people do wrong: the multi-step forecast. What happens if we want to predict a few days in advance instead of just one step ahead, continuing to step into the future beyond the data set for a longer-range prediction? Starting here, we have our true y values: from index 1188 onward, the rest of the data, which serves as ground truth. We also set up an empty array to collect our predicted values. The number of forecasts will be 132, and the latest input is the most recent x window inside the testing set; we plug that into the model to get one step ahead.

Here's where the tricky part comes into play: we append each prediction to pred_y, but we then update the latest input with that same prediction. It's somewhat of a rolling window, or sliding window as you might call it: we chop off the oldest observation and append the newest one, so it keeps rolling along, so to say. If you have a computer-science background you can think of it as a circular buffer; it's like a conveyor belt that's always moving, with older values dropping off one end and newer values coming in the other — first in, first out, for the operating-systems or industrial-engineering folks. At a high level, you're using your own predictions as inputs to generate your next predictions; that's essentially how you do a multi-step forecast for a time series.

Let's run that; it goes quickly through all 132 values. And here's what the result looks like: the forecast is pretty much a flat line, which is pretty much to be expected — the model just keeps using the previous value as its next prediction, over and over.
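The rolling-window loop can be sketched generically; `predict_fn` stands in for the trained model's one-step prediction, and the naive window-mean below is only for illustration:

```python
import numpy as np

def multi_step_forecast(predict_fn, last_window, n_steps):
    """Recursive multi-step forecast: each prediction is appended to
    the window and the oldest value is dropped (a FIFO sliding window)."""
    window = np.asarray(last_window, dtype=float).copy()
    preds = []
    for _ in range(n_steps):
        y_hat = predict_fn(window)
        preds.append(y_hat)
        window = np.append(window[1:], y_hat)  # slide: drop oldest, append newest
    return preds

# Stand-in for model.predict: the mean of the window (illustration only).
naive = lambda w: float(w.mean())
print(multi_step_forecast(naive, [1.0, 2.0, 3.0], 4))
```

Because each step feeds on the last, errors compound; that compounding is exactly why the 132-step forecast flattens out.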
That flat line tells you the model is essentially not good enough; it isn't really capturing the trends. So now the multivariate case. I'll be using the pct_c_gold data, which has five feature columns — high, low, open, close, and volume — and we'll be predicting the adjusted close. Notice that everything is normalized between zero and one using the MinMaxScaler; I set that up earlier, all the way up above, before the univariate time-series forecasting.

We do essentially the same thing: create the time-series dataset we'll train on, with our x and y, appending windows as before. The only difference is the number of columns: since it's multivariate, we use multiple variables in each prediction input, so we take the five columns from high through volume as our x-variables, and the last column, the adjusted close, as the target. Note that slicing columns 0:5 does not include column five itself — that's just how Python slicing works.

Running that, each observation's rows hold the five feature values of our independent variables, which is fine. Converting everything to an array, the dimensions come out to 1321 samples, each with 10 rows and five columns inside. And of course each y is just a single scalar value, because we're only trying to predict one output.

Then some additional cleaning: the threshold is again 90% of X.shape, and that acts as the splitter for our train x, train y, test x, and test y. I went ahead and ran the recurrent neural network; it's essentially the same architecture, just a little more complex to match the additional complexity of the input. I have 30 internal units, the same activation function, the same bias setup, and the input shape refers to the number of time steps and the number of features, which here is (10, 5). Then the dropout layer, the dense layer for the regression, and the model summary, which is what the output shows.

Let's fit the data; it takes a little longer, but notice how similar the overall structure and architecture are to the univariate time series — plugging in additional data for future predictions is essentially all it really is. Once it's done, the training loss looks all right; the curves aren't that different from each other, though time series can be a little wonky.

And this is the one-step-ahead forecast. It initially looks pretty good, but you have to remind yourself: if it were really this good in the overall financial realm, nobody would be so inclined to give away models like these RNNs, because they'd be a millionaire or billionaire. If you look a little closer, you're essentially using your historical data to predict the very next increment of your data — and look at how far off it often is.
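The multivariate windowing works the same way as before, with each sample now a (time steps × features) block; a sketch with toy data (the shapes, names, and random stand-in values are mine):

```python
import numpy as np

def make_multivariate_windows(features, target, n_samples):
    """features is (n_rows, n_cols); each X sample is an
    (n_samples, n_cols) block, y is the target at the next row."""
    X, y = [], []
    for i in range(len(features) - n_samples):
        X.append(features[i:i + n_samples, :])
        y.append(target[i + n_samples])
    return np.array(X), np.array(y)

rng = np.random.default_rng(1)
feats = rng.random((30, 5))   # toy stand-in: high/low/open/close/volume, scaled
target = rng.random(30)       # toy stand-in: scaled adjusted close
X, y = make_multivariate_windows(feats, target, 10)
print(X.shape, y.shape)   # (20, 10, 5) (20,)
```

The resulting (samples, 10, 5) shape is exactly what the RNN's `input_shape=(10, 5)` expects.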
In prices, that can mean the difference between a 50% loss and a 50% gain, so there's definitely a lot more research to be done; then again, I'm not a financial expert, so take it as you will. That's essentially a one-step time forecast. If you also want the multi-step forecast, which I did earlier for the univariate time series, you'd have to come up with a very clever way to generate predictions for the independent variables as well: at each step forward you need new values for all five of them — high, low, open, close, and volume — to feed back in alongside the adjusted close you're predicting and appending to your inputs. That can definitely be a little tricky, but it's the same idea as the univariate multi-step forecast; there are just additional models you might have to build to predict each input one step ahead. You could probably run parallel recurrent neural networks, one per variable, which would be sort of interesting.

But that's essentially all you really need to know for a simple recurrent neural network: you have the theory and you have the applications, so whatever you might use this model for, have at it.
Info
Channel: Spencer Pao
Views: 7,278
Keywords: Neural Networks, RNN, Recurrent Neural Networks, Python, How To, Theory, Code, Cool
Id: FBlPZJrJt9g
Length: 32min 16sec (1936 seconds)
Published: Mon Dec 21 2020