Amazon Stock Forecasting in PyTorch with LSTM Neural Network (Time Series Forecasting) | Tutorial 3

Video Statistics and Information

Captions
Hey guys, how's it going? My name is Greg Hogg, and in this video I'm going to show you how to do LSTM stock forecasting in PyTorch. It's a really important skill and a great way to improve your PyTorch knowledge, so don't click away; it's going to be a really helpful video, and I'm going to explain every step of the way, super simple. Let's get started.

Firstly, I'm just going to get a couple of imports out of the way. Really standard stuff: pandas, NumPy, and matplotlib for graphing and DataFrame usage, and torch is PyTorch, where we grab its neural network module for later. Then we load our data as a pandas DataFrame. Mine is called amazon.csv, and it's all of Amazon's daily stock history. You can get your own from Yahoo's website, or if you want the exact same copy I'm using, which is what I recommend, you can find it in the video description and download it from my Google Drive. If you're using Colab, you can upload it there like I did.

Okay, let's look at our dataset. If we open our DataFrame: 6516 rows by seven columns of daily Amazon stock history. We have the open value, the highest value for that day, the lowest value for that day, the closing value, which is what we generally care about, an adjusted close, and the volume of shares traded. We don't care about almost all of this; basically we're just going to look at the Close column.

Now, something that might look a little funny to you: why was Amazon worth, what is this, nine cents back in '97? No, they weren't worth nine cents. If you look up Amazon's IPO, it was priced at $18 per share, and we don't see $18 anywhere in this history, even though the stock is at $101 at the very end. That's because this is an adjusted series. Stocks do this split thing where every now and then a company says, let's divide our stock price by two, so a share that was $200 is now worth $100. They do that to keep the price in a standard range relative to other companies; you don't want it to look like Apple is worth ten times as much as Google if it actually isn't. So the historical values are transformed so that each one is representative of how much bigger the price is now than it was then. If you didn't follow that, don't worry; I recommend reading up on stock splits because it's genuinely interesting for your own daily life, but you don't need it for this tutorial.

As I said, we only really care about the closing value, so we take a subset of the DataFrame with just the date and the closing value for each date. This is what we're going to be forecasting.

Now remember, PyTorch uses the CPU by default, but if you want to use a GPU, which you probably do, you should do something like this: device is 'cuda:0' if torch.cuda.is_available(), otherwise 'cpu'. If we output device right now, it depends on what runtime I'm on; at the moment I don't have a GPU, so it says cpu. Later, when we get to the training part, I'll restart and switch to a GPU runtime, and we'll see it switch to cuda:0.
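If you want to follow along, the setup might look roughly like this; a sketch, assuming the file is named amazon.csv with the standard Yahoo Finance columns:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn

# daily AMZN history: Date, Open, High, Low, Close, Adj Close, Volume
data = pd.read_csv('amazon.csv')
data = data[['Date', 'Close']]  # keep only the date and the closing price

# use a GPU if one is available, otherwise fall back to the CPU
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
print(device)
```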
Next, some simple transformations. We make the Date column a pandas datetime type, which is pretty simple, and then we plot the date against the closing value over time. There it goes, boom. As you can see from before, it's not that Amazon was worth a dollar or 18 cents at this point; it's the adjustment showing that it actually grew this much and then came back down later. Again, don't worry about that too much.

Now I want to set up the DataFrame in a way that's close to how the model is going to take the input and train off it. An LSTM looks at history: for one particular date, it wants to know the closing value for that date, the date before that, the day before that, and so on. We can do that with this big block of code, and I'll explain how it works shortly; for now, just run it and take a look at the DataFrame it produces. For any particular date we have the closing value for that date, under Close, and then T-1, meaning the previous day: on the previous day, which would be the 26th, this was the closing value. T-2 would be the 25th, and so on, all the way to T-7, a week beforehand, the 20th. This runs from the original earliest value all the way up to the end date I downloaded, where it's the same deal: T-1 is the value for the 4th, 103.9499, and you can see that same value one row up. The table has this diagonal structure: step back two columns at T-2, and you land on the same closing value from two rows earlier.

The reason our model likes this format is that it gives us an input/output pair: an input matrix X and an output vector y. The input matrix X is everything from T-1 onward, all of that history, and the output vector y is the Close column. Why is that the case? Because this row of T-1 through T-7 values is what we use to predict that day's close; we look at the last week of history to try to predict what the next day will be. Of course, we already know the answer, because it happened in the past, but that's exactly how you train the model: hey, this was the last week, and this is what the next day turned out to be. For the next date, this is what its last week of history was and this is the day we're predicting, all the way up to our very last data point: this is the last week of history, and this is what it was then.

The reason this is informative: look at the graph. If you give the model a week of history where the price was clearly going up, it's probably going to predict it keeps going up; in a week where parts were going down, it's probably going to predict down. We're trying to learn the sequential pattern, which is why we're using a sequence-based model, a recurrent network. It keeps checking: okay, the price went down again, down again, down again, I'm probably going to predict the price goes down again; or if it was going up, the recurrent network keeps updating: it's going up, it's going up, okay, I'm going to predict that it goes up.
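The date conversion and plot are one-liners, and this toy example (with made-up prices, not real data) illustrates the shift() trick that builds the T-1 to T-7 table:

```python
# make Date a real datetime type, then plot close price over time
data['Date'] = pd.to_datetime(data['Date'])
plt.plot(data['Date'], data['Close'])
plt.show()

# toy illustration of the lookback idea: shift() pushes a column down,
# so each row carries the previous days' closes as extra features
toy = pd.DataFrame({'Close': [10.0, 11.0, 12.0, 13.0]})
toy['Close(t-1)'] = toy['Close'].shift(1)
toy['Close(t-2)'] = toy['Close'].shift(2)
print(toy)
#    Close  Close(t-1)  Close(t-2)
# 0   10.0         NaN         NaN
# 1   11.0        10.0         NaN
# 2   12.0        11.0        10.0
# 3   13.0        12.0        11.0
```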
Now, I don't want to focus too much on the code here, and actually I realize we won't need this duplicate line again. Basically, this function prepares a DataFrame for an LSTM: it takes a simple DataFrame like the one we had above, with just the date and the closing value, plus a number of steps, basically a lookback window. We passed 7, hence we got T-1 up to T-7. If you instead pass a lookback of, say, 2 and run it again, you see the same kind of thing: the closing value for each date, but only up to T-2. I'm going to stick with 7 for this example; it's easy to explain, and it just makes sense for the model to have a week of information. Don't worry too much about the fact that stocks don't trade on weekends; this actually handles that, as you can see from the jump here. Well, that's April Fool's Day, which would be kind of funny, but there is no April 1st or April 2nd in the data because that was a weekend.

What the function does is make a deep copy of the DataFrame, set its index to the date in place, and then do a series of shifts: for each step in the lookback window it keeps shifting the column, and it turns out that if you step through it this many times, you end up with the DataFrame above. I would just copy the code; you could dig it apart if you really wanted to, which would be good practice for your brain, but it's not really necessary.

Now that we have that DataFrame, I'm going to convert it to NumPy. It's the same thing as an array, and it doesn't carry the date column, since that's set as the index right now, so the close is the first column. This is our big matrix, the shifted DataFrame as NumPy. Remember that each row's history is used to predict that row's close: the first column is our vector of outputs y, and the rest is our matrix of inputs X, lined up so that each example's last week of history predicts its close. We're also going to take a subset of this and make it the train set, the first portion, around the first 95 percent of rows, and the last rows will be for the test.

Before we worry about X and y and train and test, though, we're going to run a scaler on all the data. We'll use sklearn.preprocessing's MinMaxScaler with a feature range of negative one to one, overwrite the matrix we have above by fitting and transforming it, and we should see something that looks very similar to the above, except the features are all between negative one and one.
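Here's a minimal sketch of what that preparation function and the scaling step plausibly look like; the exact column names like Close(t-1) are my labeling, not necessarily what's in the notebook:

```python
from copy import deepcopy as dc
from sklearn.preprocessing import MinMaxScaler

def prepare_dataframe_for_lstm(df, n_steps):
    df = dc(df)                      # deep copy so the original is untouched
    df.set_index('Date', inplace=True)
    for i in range(1, n_steps + 1):  # one shifted column per lookback step
        df[f'Close(t-{i})'] = df['Close'].shift(i)
    df.dropna(inplace=True)          # first n_steps rows lack full history
    return df

lookback = 7
shifted_df = prepare_dataframe_for_lstm(data, lookback)
shifted_df_as_np = shifted_df.to_numpy()

# squash every column into [-1, 1] before splitting into inputs and targets
scaler = MinMaxScaler(feature_range=(-1, 1))
shifted_df_as_np = scaler.fit_transform(shifted_df_as_np)
```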
Now that our data is scaled, we make X and y. X is all the rows of that scaled matrix from the second column onward, that big chunk of history columns, and y is all the rows of the very first column, our target. x.shape and y.shape should have the same number of rows, but X has seven features per row.

Sorry, this is Greg from the future, fixing a mistake in my first recording: at this spot you have to do X is a deep copy of np.flip(X, axis=1). This DataFrame currently runs T-1 to T-7, but for an LSTM you want it going T-7, T-6, up until T-1, because the network recurrently refines its answer as it gets closer and closer to the close value. So you flip the matrix to mirror it in the horizontal direction. You won't see that in the upcoming cells, because I forgot it in the first recording, but make sure you do it.

Now we split into train and test. We get a split index, the integer of the length of X, which is 6509, times 0.95. This means we use the first 95 percent as train and the last 5 percent as test; the split index, the row number at which we're splitting, comes out to 6183, maybe off by one. Then X becomes X_train and X_test, and y becomes y_train and y_test: the train parts are everything up to the split index, the test parts from the split index onward, and these shapes should match up perfectly.

It's a requirement for PyTorch LSTMs to have an extra dimension at the end, so we do a bit of reshaping: the matrices get another dimension at the end, and the y vectors get another dimension as well. So far all of this is in NumPy, but since we're using PyTorch, it makes sense to wrap everything in PyTorch tensors and make sure the values are floats; that's really all the next cell does, and we still see the same shapes, just as PyTorch shapes now.

Generally, when training models in PyTorch, you use datasets rather than raw tensors, so we make a dataset object. You can't just call Dataset on these directly; you make your own custom class, and it's really easy. We define a class, call it whatever we want, subclass Dataset, and specify three simple things: an __init__ that takes a matrix X and an output vector y and stores them on self under the same names; a __len__ method, so calling len() on the object returns the length of X, which is also the length of y; and __getitem__, the square-bracket indexing, which just returns the tuple of X at that index and y at that index. From that, we make our train dataset, passing in X_train and y_train, and the test dataset with X_test and y_test. We run that, and they won't output much if you look at them, but each is a dataset.
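Putting the pieces together, a sketch of the X/y construction, the flip fix, the split, the reshaping, and the custom Dataset class:

```python
from torch.utils.data import Dataset

X = shifted_df_as_np[:, 1:]  # the lookback columns, t-1 ... t-7
y = shifted_df_as_np[:, 0]   # the close value we want to predict

# the fix from "future Greg": reverse each row so the sequence runs
# oldest to newest (t-7 ... t-1), the order an LSTM should consume it in
X = dc(np.flip(X, axis=1))

split_index = int(len(X) * 0.95)  # first 95% train, last 5% test
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]

# PyTorch LSTMs want a trailing feature dimension
X_train = X_train.reshape((-1, lookback, 1))
X_test = X_test.reshape((-1, lookback, 1))
y_train = y_train.reshape((-1, 1))
y_test = y_test.reshape((-1, 1))

# wrap everything in float tensors
X_train = torch.tensor(X_train).float()
y_train = torch.tensor(y_train).float()
X_test = torch.tensor(X_test).float()
y_test = torch.tensor(y_test).float()

class TimeSeriesDataset(Dataset):
    def __init__(self, X, y):
        self.X = X
        self.y = y

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        return self.X[i], self.y[i]

train_dataset = TimeSeriesDataset(X_train, y_train)
test_dataset = TimeSeriesDataset(X_test, y_test)
```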
One more thing: we don't use the raw tensors or even the datasets directly; we wrap the datasets in DataLoaders to get our batches. We specify a batch size of 16; our train loader is a DataLoader over the train dataset with that batch size, shuffling every time, and the test loader is the same thing except shuffle is false. We run that, and these loaders are what we'll iterate over to get batches and make updates to our model.

This next part isn't strictly necessary, but it's quite helpful for visualization: we loop over the train loader, grab one batch, and split it into the X part, batch[0], and the y part, batch[1], putting both on the device we're using (later, when we make the model, we'll make sure the model goes to that device as well, so everything is on either the CPU or the GPU together). Then we look at the X batch shape and the y batch shape. What are we expecting? The batch size is 16, each input should be 7 by 1, and each output should just be 1. Sure enough, we see a size of 16 by 7 by 1: 16 examples, each with the seven-day lookback window and that extra trailing dimension, and for each of those windows one predicted output value.
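A sketch of the loaders and the batch-shape sanity check:

```python
from torch.utils.data import DataLoader

batch_size = 16
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# sanity check one batch: inputs should be (16, 7, 1), targets (16, 1)
for _, batch in enumerate(train_loader):
    x_batch, y_batch = batch[0].to(device), batch[1].to(device)
    print(x_batch.shape, y_batch.shape)
    break
```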
Here's where we create our model. In a moment I'm going to paste a ton of code on the screen, so don't feel bombarded; I'm going to explain it every step of the way. But let's be honest, it's a confusing model, and LSTMs in PyTorch are just more annoying than they are in TensorFlow. Like most trade-offs between the two, TensorFlow is a little easier for beginners, while PyTorch is a little more flexible for people really in the field. It's going to look pretty confusing, but let me try my best.

So here's our code. We have a class LSTM subclassing nn.Module. It takes an input size, which is our number of features (we'll pass 1); a hidden size, which is whatever dimension we want in the middle, could be 1, 2, 4, 32, the same kind of thing as in a linear layer; and a number of stacked layers. You can stack LSTMs because, as they recurrently run through themselves, they produce a sequence of hidden states, so you can feed one LSTM's sequence into another. If that didn't make sense, I'd check out Andrew Ng's courses; otherwise, just take it as: the more of these you have, the more capacity your model has. We do our standard super().__init__(), store self.hidden_size and self.num_stacked_layers, and we don't need to keep track of the input size, because it's only used where we create the LSTM part. The LSTM part of the model takes the input size, again just the number of features we're looking at, which is 1 because we're only looking at the closing value; the hidden size, for which we'll end up passing 4; and the number of stacked layers, which I'm giving 1, because this model is going to fit well anyway. And batch_first=True means that, in some later pieces, you'll see it returns the batch as the first dimension. Finally, after the LSTM does the recurrent part over the sequence, you want a fully connected layer mapping from the hidden size, whatever that hidden dimension was, down to one, because at the end the model needs to predict just the final closing value. That's why it's a one.

Then the forward function, which looks extra confusing if that wasn't confusing enough. We have the batch size, x.size(0); you have to get this dynamically from the input, which is important. Then we have h0 and c0. Man, what the heck are these things? If you look inside an LSTM, it has a couple of these gate-vector-like things, and I don't want to get into the details, but basically you have to initialize the LSTM by passing in default h0 and c0 tensors; they have to have this particular shape, and they have to be on the device you're working on. You initialize these two things and pass them as a tuple when you call the LSTM, and you get back the output you actually care about plus an updated tuple of h0 and c0, which we ignore because we don't need it. And the output isn't quite the output you'd expect, either: of course you apply the fully connected layer, but first you take out[:, -1, :], the last time step. I'm not digging into the details of why that is; trust me, it works.

We create the model with 1 feature, just the closing value; a hidden size of 4, which you could play around with; and 1 stacked layer, which you could also play around with, but 1 will be fine. We create the model and put it on the device.

In the same format as the first tutorial, I want to first show you the training loop using functions we haven't written yet, train_one_epoch and validate_one_epoch; in a moment we'll write them, though I won't go into as much detail, because I covered them a lot in the first video and we played around with them in the second, so check those out if it's confusing. We specify our learning rate, starting at 0.001; the number of epochs, 10; the loss function, mean squared error, which might look a little surprising, but this is basically a regression problem, we're predicting a continuous value, so we minimize MSE; and for the optimizer we use standard Adam, passing the model's parameters and the learning rate we set. Then, for each epoch in the range of the number of epochs, we train for an epoch and validate for an epoch.
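Here's a sketch of the model and the training loop skeleton as described; in the notebook, the two epoch functions just below are defined before this loop runs:

```python
class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_stacked_layers):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_stacked_layers = num_stacked_layers
        # batch_first=True: inputs arrive as (batch, seq_len, features)
        self.lstm = nn.LSTM(input_size, hidden_size,
                            num_stacked_layers, batch_first=True)
        # map the final hidden state down to one predicted close value
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        batch_size = x.size(0)
        # default initial hidden and cell states, on the right device
        h0 = torch.zeros(self.num_stacked_layers, batch_size,
                         self.hidden_size).to(device)
        c0 = torch.zeros(self.num_stacked_layers, batch_size,
                         self.hidden_size).to(device)
        out, _ = self.lstm(x, (h0, c0))
        # keep only the output at the last time step, then project to 1
        return self.fc(out[:, -1, :])

model = LSTM(1, 4, 1)  # 1 input feature, hidden size 4, 1 stacked layer
model.to(device)

learning_rate = 0.001
num_epochs = 10
loss_function = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    train_one_epoch()
    validate_one_epoch()
```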
Now let's write our train function, making sure to put it above the training loop. train_one_epoch sets the model to training mode, prints the epoch, and starts accumulating a running loss. Then we loop over the train loader, which gets a batch and keeps track of the index; we pull the X batch and y batch out of the batch and put them on the device we're using; we get our output by putting the X batch through the model; and our loss is the loss function comparing that output to the ground truth. Then running loss accumulates loss, and this should really be loss.item(): the loss is just a tensor with one value, and .item() actually pulls the number out of the tensor, so running loss plus-equals loss.item() accumulates it properly. Then we zero out the gradients, do a backward pass through the loss to calculate the gradients, and take a slight step in the direction of the gradient to make our model just a little bit better. Every 100 batches, when the batch index mod 100 is 99, we get the average loss across those hundred batches, the running loss divided by 100; we print the batch and that average (the print just says "loss", but it's the average loss across those batches), and reset the running loss to zero so that stays correct. We execute that to create the function.

Now let's make our validate function, again above the training loop, below train, although it doesn't really matter. We set model.train(False), basically putting it in evaluation mode, start the running loss at zero, and enumerate the test loader to get batches, pulling out the X batch and y batch. With no gradients, because we're not doing any model updates and don't need them, we get the output by putting the X batch through the model, get the loss by comparing the outputs to the ground truth, and accumulate the running loss (again, I would accumulate loss.item()). The average loss across batches is the running loss divided by the length of the test loader, so across all the batches we get our average loss value and print it down there.

We run that, and let's quickly check that it works. It is working, and we should probably train with the GPU, although, actually, it's pretty fast; no, okay, fine, we're sticking with CPU for this lesson, because the model already fits well. Take a look: the loss was very high at first, but it very quickly figured things out; the training loss goes down, the validation loss goes down, and this trained just fine, actually perfectly. And that was on purpose, because I had set this to 10 epochs. If you were training your own model, you would of course have to play around with the learning rate, the batch size, and the model architecture, but I set this up so it works right away.
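A sketch of those two functions, accumulating loss.item() as recommended; note that train_one_epoch reads the epoch variable from the surrounding loop:

```python
def train_one_epoch():
    model.train(True)
    print(f'Epoch: {epoch + 1}')  # epoch comes from the loop above
    running_loss = 0.0

    for batch_index, batch in enumerate(train_loader):
        x_batch, y_batch = batch[0].to(device), batch[1].to(device)

        output = model(x_batch)
        loss = loss_function(output, y_batch)
        running_loss += loss.item()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch_index % 100 == 99:  # every 100 batches
            avg_loss = running_loss / 100
            print(f'Batch {batch_index + 1}, Loss: {avg_loss:.3f}')
            running_loss = 0.0

def validate_one_epoch():
    model.train(False)  # evaluation mode: no updates, no gradients needed
    running_loss = 0.0

    for batch_index, batch in enumerate(test_loader):
        x_batch, y_batch = batch[0].to(device), batch[1].to(device)

        with torch.no_grad():
            output = model(x_batch)
            loss = loss_function(output, y_batch)
            running_loss += loss.item()

    avg_loss = running_loss / len(test_loader)
    print(f'Val Loss: {avg_loss:.3f}')
```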
Let's carry on and look at our model. This next cell is just some code to do some plotting. With no gradients, we get our predictions for X_train: we're putting all of the training inputs through the model to get predictions for the first 95 percent of the data; remember, the model has already seen this stuff during training, so it should do pretty well. Make sure the input is on the device when you put it into the model, and then you have to move the result to the CPU in particular. That's not a typo: after you convert to NumPy, well, NumPy doesn't use the GPU, so whatever you're converting had better be located on the CPU, so we put it there. So we get our predictions for the first 95 percent, and then we plot y_train, the actual close, against our predictions for it, where the orange is the predicted and the blue is the actual. The model has seen this before in training, so it should be pretty good. But look at the y-axis: whoa, the closing value is a number between negative one and one. That's just our scaler; we scaled this data on purpose so it's all in a small range for the network to use. We're definitely going to have to apply an inverse transform in a moment, because we don't want to look at these values; it just looks weird.

This next piece might be a little confusing, so I'm going to take it really carefully. We have values between negative one and one, and we need to do the opposite of what our transform did so we get our original scale back: real dollar values that are actually useful to us. It's not too bad, but there's a little trick. train_predictions is predicted.flatten(); that's not the trick, I'm just making sure it's one axis, which is important. Then, dummies. Why am I calling it dummies? It's a zeros matrix, and we'll look at it in a moment, but basically I don't care what most of it looks like; it just needs the proper shape, and then we adjust the first column. So dummies is zeros with shape X_train.shape[0], the number of examples we have, by lookback plus one; it's lookback plus one because that's the width of the whole original table: the lookback is 7, plus one makes 8, the same width the scaler saw. Why make this weird dummy matrix? So that I can set its first column to our predictions and then apply the inverse transform, because the transform is used to taking that whole matrix. Maybe you guys know an easier way to do this, maybe there is one, but basically I made a matrix with the same shape the scaler is used to, inverse-transformed the whole thing, and then only cared about the first column, which got transformed properly. Then train_predictions on the right scale is a deep copy of that transformed first column: early values should be very small numbers, like we saw, and later values should be high numbers. Now we're going to do something very similar for the ground truth: that was for the predictions, but y_train is also currently on the wrong scale, so we build the same kind of dummies matrix for it.
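A sketch of the dummy-matrix inverse-transform trick, applied to both the predictions and the ground truth:

```python
with torch.no_grad():
    predicted = model(X_train.to(device)).cpu().numpy()

# the scaler expects a full (n, lookback + 1) matrix, so build a zeros
# matrix of that shape, drop our values into the first column, and
# inverse-transform the whole thing; only that first column matters
train_predictions = predicted.flatten()
dummies = np.zeros((X_train.shape[0], lookback + 1))
dummies[:, 0] = train_predictions
dummies = scaler.inverse_transform(dummies)
train_predictions = dc(dummies[:, 0])

# same trick for the ground truth
dummies = np.zeros((X_train.shape[0], lookback + 1))
dummies[:, 0] = y_train.flatten()
dummies = scaler.inverse_transform(dummies)
new_y_train = dc(dummies[:, 0])

plt.plot(new_y_train, label='Actual Close')
plt.plot(train_predictions, label='Predicted Close')
plt.xlabel('Day')
plt.ylabel('Close')
plt.legend()
plt.show()
```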
For the ground truth, the first column of dummies becomes y_train.flatten(); we update that first column, apply the inverse transform to the whole matrix, and new_y_train is a deep copy of that updated first column. Then we make the same graph as before, exactly the same thing, except with new_y_train and train_predictions; oh, sorry, those axis labels were backwards, it should be day on the x-axis and close on the y-axis. The graphs shouldn't really change or look any different; they're just on the proper scale now.

Now we need to do the same for our test data, and I'm going to do it all in one chunk. Before, we looked at that intermediate graph on the wrong scale; here I'll just show you how to get the test predictions and convert them to their proper scale directly, and I'm purposely using slightly different PyTorch code, just to get practice with different functions that do similar things. That's a lot of wording, but take a look. For test_predictions, you again take X_test and put it on the device the model's on, get the outputs, and then call .detach(); you'll notice there's no torch.no_grad() here, because this is kind of an alternative in this situation, where we detach the result from the computational graph. Then we put it on the CPU, convert to NumPy, and flatten it as well. It's a very long chain of calls, and that's unfortunately not uncommon in PyTorch, but it gets the job done: it's just a vector of our test predictions. If you're using TensorFlow, you'd genuinely just run them through the model, but that's what it is. dummies is the same trick as before, so I'm really not going to explain it again: we convert our test predictions to the right scale, run that, and you see the proper scale on the test predictions. Then we get the ground truth on the proper scale, this time for y_test; I won't explain this either, because it's basically the same as it was for y_train, and we get new_y_test on the new scale.

Let's finish this off with our final graph: new_y_test, the ground truth that the model has kind of seen, well, it hasn't actually trained on it, it just evaluated through it, against our test predictions, both on the proper scale, and with the right labels this time, day and close. And there you go.
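And a sketch of the test-side version, with the detach() chain in place of torch.no_grad():

```python
# alternative to torch.no_grad(): detach the result from the graph,
# move it to the CPU, convert to NumPy, and flatten in one chain
test_predictions = model(X_test.to(device)).detach().cpu().numpy().flatten()

dummies = np.zeros((X_test.shape[0], lookback + 1))
dummies[:, 0] = test_predictions
dummies = scaler.inverse_transform(dummies)
test_predictions = dc(dummies[:, 0])

# and the same for the test ground truth
dummies = np.zeros((X_test.shape[0], lookback + 1))
dummies[:, 0] = y_test.flatten()
dummies = scaler.inverse_transform(dummies)
new_y_test = dc(dummies[:, 0])

plt.plot(new_y_test, label='Actual Close')
plt.plot(test_predictions, label='Predicted Close')
plt.xlabel('Day')
plt.ylabel('Close')
plt.legend()
plt.show()
```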
After this, you could optionally do some extra work to recover the actual dates for these points, although it's a little bit annoying, so I'm not going to bother; good practice if you want it. Looking at what the model did: you can see it's kind of just a lag behind the real series, following it pretty well. It's hard to do stock forecasting well; if you could actually do a really good job of this, then, to be honest, you should go to Wall Street. That's uncommon, and what you should not do is think that the model we just created is for predicting stocks the next day. You will almost certainly lose money if you do that, even with a more complex model; of course people do it, but it is kind of crazy. Most importantly, if you are doing stock forecasting, think about the intrinsic value of a company and the current state of the economy. If there's been a lot of negative news and a stock has been down a lot recently, it's most likely going to go the opposite way, and that would be a time for it to go up; conversely, if it's been going up for a long time, it's probably just going to come down. And most of these stocks move together: occasionally companies have fallouts and issues, but for the most part, if Amazon is doing well, so are Netflix, Facebook, Google, and Apple; they all kind of move together. That's the sort of stock analysis you should actually do.

Nonetheless, here is our LSTM stock forecasting model. I hope this was helpful; it's a really, really useful skill to be able to do stuff like this. Feel free to try out other models, maybe a GRU, maybe a linear model, whatever you'd like. If you're not subscribed to the channel, I'd recommend doing that, and drop a like on the video if this was helpful. And, yeah, I'll see you later guys, bye!
Info
Channel: Greg Hogg
Views: 51,824
Keywords: Machine Learning, Data Science, Python, Deep Learning, ML, DL, Py, Jupyter, Colab, Tutorial, Step by Step, Spark, PySpark, Big Data, Data, Neural Networks, Data Scientist, Sklearn, Scikit-Learn, Keras, NumPy, Pandas, PyTorch
Id: q_HS4s1L8UI
Length: 31min 53sec (1913 seconds)
Published: Sat Apr 08 2023