Multivariate Time Series Forecasting with LSTM using PyTorch and PyTorch Lightning (ML Tutorial)

Captions
Hey guys, in this video we are going to continue with the Bitcoin price forecasting example. More specifically, we are going to build a simple LSTM model that predicts the next-minute price of Bitcoin. Let's get started.

We have exactly the same notebook that we left off with last time. First, I'm going to change the runtime type to a GPU, save that, reconnect to the instance, and check what type of GPU we got. In our case it's a K80. Let me try that one more time... still a K80, unfortunately, but that's all right. Next, I'm going to install PyTorch Lightning and a recent version of tqdm, and to the imports I'm going to add ModelCheckpoint and EarlyStopping from PyTorch Lightning, because we are going to use those. After that, I'll run the cell.

Loading the data is exactly the same as before, so I'm going to complete the preprocessing steps. The one thing I'm changing is the sequence length: I found that the longer the sequence, the better the predictions get, so in this example we're going to use 120 minutes of history as the input to the model. I'm going to run all of this, and we'll be back after everything has executed.

All right, the execution is done. Building the sequences took about 40 to 50 seconds, and this is the number of training and test sequences that we have.

Next, I'm going to show you how to convert all of this into a PyTorch dataset. To do that, I'm going to create a new class called BTCDataset, which extends the PyTorch Dataset class, and override the two required methods: the length method and the get-item method. The constructor takes as a parameter the sequences it is going to operate on. The length method simply returns the number of sequences. The get-item method takes the index we are interested in; recall that we are storing the data as tuples in which the first item is the sequence and the second item is the label, so I'm going to unpack the tuple at the current index. Finally, I'm going to return a dictionary containing two tensors. The first is the sequence data: I convert the sequence from a pandas DataFrame to a NumPy array and wrap it in the PyTorch tensor constructor. The second is the label, the Bitcoin price we are interested in predicting, which I convert to a float because we are doing regression and predicting a floating-point number.
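As a rough sketch, here is what that dataset class might look like (names like BTCDataset follow the narration; the (DataFrame, label) tuple layout is assumed from the description above):

    import torch
    from torch.utils.data import Dataset

    class BTCDataset(Dataset):

        def __init__(self, sequences):
            # each element of `sequences` is a (features, label) tuple
            self.sequences = sequences

        def __len__(self):
            return len(self.sequences)

        def __getitem__(self, idx):
            sequence, label = self.sequences[idx]
            return dict(
                # pandas DataFrame -> NumPy array -> float tensor
                sequence=torch.Tensor(sequence.to_numpy()),
                # regression target as a float tensor
                label=torch.tensor(label).float(),
            )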
Next, I'm going to wrap all of this into a PyTorch Lightning data module. I'm going to create a class called BTCPriceDataModule, which extends pl.LightningDataModule; it takes the training and test sequences, creates a dataset for each, and then builds data loaders on top of those (see the sketch below).

In the constructor I pass in the train sequences, the test sequences, and the batch size, with a sample value of 8 that we are going to change in a bit. I call the super constructor of the LightningDataModule and store the sequences and the batch size as fields of the class. Next, I create a setup method, which we are going to call right after creating an instance of this class; it converts the train sequences into a BTCDataset and does the same for the test sequences.

Then we override three methods: train_dataloader, which of course returns the data loader for the training part of the process, plus val_dataloader and test_dataloader. In our example, the validation and test loaders contain exactly the same data. You are free to change that and create separate validation and test sets for your own projects; here I'm more interested in giving you an engineering approach to structuring a whole project when you have to do time series forecasting with an LSTM or another model.

The train_dataloader returns a DataLoader that takes the training dataset as its first parameter and uses the batch size that we passed in. I don't want the data to be shuffled, because time series data has a temporal component that might matter while the model is learning. There are some arguments against shuffling and some arguments for it; you should do research of your own, and probably run an experiment or two on which works best for your project, but I'm not going to shuffle the data. I'm also specifying the number of workers, which lets us load the data a bit faster compared to a single worker; the right value depends on the machine you have, and currently we have this Google Colab instance (we're still not using the GPU at this point). The validation loader again uses a DataLoader, this time with the test dataset and a batch size of 1, which is very similar to how we do inference in the real world, where we most likely pass in just a single example. The test_dataloader is exactly the same.

After this is done, I'm going to execute the cell, so we have the BTCPriceDataModule and the BTCDataset, and everything is looking all right. Next, I define the number of epochs for which we are going to train the model, and the batch size. All of these parameters should be chosen based on the project and the machine you have; you can start with these numbers and then fine-tune them for your project.
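Here is roughly what the data module might look like, again a sketch based on the narration; the num_workers values are placeholders you would tune for your machine:

    import pytorch_lightning as pl
    from torch.utils.data import DataLoader

    class BTCPriceDataModule(pl.LightningDataModule):

        def __init__(self, train_sequences, test_sequences, batch_size=8):
            super().__init__()
            self.train_sequences = train_sequences
            self.test_sequences = test_sequences
            self.batch_size = batch_size

        def setup(self, stage=None):
            self.train_dataset = BTCDataset(self.train_sequences)
            self.test_dataset = BTCDataset(self.test_sequences)

        def train_dataloader(self):
            # shuffle=False keeps the temporal ordering of the sequences
            return DataLoader(self.train_dataset, batch_size=self.batch_size,
                              shuffle=False, num_workers=2)

        def val_dataloader(self):
            # batch size of 1 mimics single-example inference
            return DataLoader(self.test_dataset, batch_size=1,
                              shuffle=False, num_workers=1)

        def test_dataloader(self):
            return DataLoader(self.test_dataset, batch_size=1,
                              shuffle=False, num_workers=1)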
I'm going to create an instance of the data module, passing in the train sequences, the test sequences, and the BATCH_SIZE constant we just defined, and then call its setup method, which again creates those datasets.

Next, I want to show you what a single item from this dataset contains, so you get a better view of what is in the data. I create the train dataset, start iterating over it with a break right after the first item, and print the shape of the item's sequence. This works because, if you recall, we are returning a dictionary from our dataset class that contains a sequence and a label, and both of those are tensors. I do the same for the label, and I print the label itself but not the sequence, since that would be a very large output. As you can see from the label shape, this is a tensor containing just a single value, which we'll have to deal with a bit later on. Also recall that the label is the scaled price of Bitcoin, not the raw, real-world price.
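A quick sketch of that inspection (the constant values here are examples, not necessarily the notebook's exact numbers):

    N_EPOCHS = 4        # example values; tune for your project and machine
    BATCH_SIZE = 64

    data_module = BTCPriceDataModule(train_sequences, test_sequences,
                                     batch_size=BATCH_SIZE)
    data_module.setup()

    train_dataset = BTCDataset(train_sequences)
    for item in train_dataset:
        print(item["sequence"].shape)  # e.g. torch.Size([120, 9])
        print(item["label"].shape)     # a tensor holding a single value
        print(item["label"])           # the scaled close price
        break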
Next, I want to show you how to build a simple model that uses an LSTM. The constructor takes the number of features, the number of hidden units for the LSTM in particular, and the number of layers, again for the LSTM. The first thing I do is call the init method of the parent class and save the number of hidden units. Then I initialize the LSTM layer of our model.

If you are not familiar with what LSTMs are, let me show you the documentation in PyTorch. LSTM is short for long short-term memory, a type of recurrent neural network. This is in no way an intro to LSTMs or RNNs, but in essence these models let you input a sequence of data and get either a single output, as in the regression example we're dealing with, or multiple outputs, for example when translating text: you input English and the output is the French translation. Initially LSTMs were primarily used for text, but they are very general and can handle all types of sequences, so in our example we're going to use one for time series data. That said, I don't think LSTMs are state of the art anymore across time series benchmarks. Some time ago people started applying Transformers to time series data, as they did to text and images, and more recently it looks like Transformers are overtaking pretty much everything: text, images, and probably time series data too. In this example we're going to stick to the basics, and hopefully in one of the next videos I'll show you how to use Transformers for time series data, so stay tuned for that. The PyTorch documentation for the LSTM is very good and well developed; it also has an example of what the LSTM takes as input and produces as output.

Initializing the LSTM takes quite a few parameters. The input size is the number of features, train_df.shape[1], which is nine in our case. The hidden size is n_hidden, passed as a parameter. Next, I specify batch_first=True, so the batch size comes first in the input shape; this is handy because we want a data loader that can pass in a batch of sequences, and each sequence contains 120 data points (let me just check where we set this... yes, 120 examples per sequence) with nine features per data point. I'll show you this exact shape later on. The LSTM also takes a number of layers, which stacks LSTMs on top of each other, and I specify a dropout, which is a way to regularize them. Using dropout in LSTMs is a bit tricky, but with PyTorch it seems you just pass this parameter and everything is sorted out for you. Finally, I want to output a single number, the prediction of our model, so I add a linear layer that takes the number of hidden units as its input size and has a single output feature. That's the initialization of our model.

Next, we override the forward method, which takes a sequence containing the data for the current example. The first thing I do is call lstm.flatten_parameters(). I think this should work fine even without the call, but it sorts out the GPU memory layout when you're doing distributed training, so this is a rather defensive style; PyTorch Lightning makes distributed and multi-GPU training very easy, and this should keep you covered there. For the output, let me return to the documentation: an LSTM returns, first, the features of the last layer for every time step, then the hidden state for each layer, and then the cell state at some position, or time, t. These are a bit hard to understand, and depending on the problem you have, you might want to dive a bit deeper into what you want to achieve. Essentially, what I do is call the LSTM, take out the hidden state, and keep the state of the last layer, because, as you can recall, we are passing in two layers, for example, and we want the output of the last one. That layer contains the features we want to pass to the final regressor layer, so I return the regressor's output from the forward method. So this is the model that we have; let me run this.
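Putting that together, a minimal sketch of the model; the default hidden size, layer count, and dropout value are my own placeholders:

    import torch.nn as nn

    class PricePredictionModel(nn.Module):

        def __init__(self, n_features, n_hidden=128, n_layers=2):
            super().__init__()
            self.n_hidden = n_hidden
            self.lstm = nn.LSTM(
                input_size=n_features,   # 9 features per time step
                hidden_size=n_hidden,
                num_layers=n_layers,     # stacked LSTM layers
                batch_first=True,        # input shape: (batch, sequence, features)
                dropout=0.2,             # regularization between stacked layers
            )
            self.regressor = nn.Linear(n_hidden, 1)  # predict a single price

        def forward(self, x):
            # defensive call; helps with memory layout in (multi-)GPU training
            self.lstm.flatten_parameters()
            _, (hidden, _) = self.lstm(x)
            out = hidden[-1]             # hidden state of the last stacked layer
            return self.regressor(out)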
Next, we're going to create a Lightning module that uses this model. In the constructor I pass in the number of features; when I have something that is client-facing or user-friendly, I like, most of the time at least, to type-hint the parameters, which I think is a good practice, especially in Python. I call init, create the model, pass in the number of features, and leave the other parameters of our model, the number of hidden units and the number of layers, at their default values. I also initialize the loss function: since we're doing regression, I'm going to use mean squared error loss.

We have to define a couple of methods. The first is forward, which takes the sequence and, optionally, the labels; you'll see why in a second. I take the output of the model, which is the predicted price, and I initialize the loss as 0. If we have labels, I call the loss function, but first I add another dimension at the end of the labels so that their dimensions match the dimensions of the model's output. The method returns the loss and the output.

Next, I define a training step, which takes a batch of examples and the batch index. From the batch I take the sequence (this is just the dict we return from our dataset) and the labels, calculate the loss and the outputs, log the result as the training loss, and return the loss so Lightning can track it. In the interest of time, I'll copy and paste the validation step and the test step; they're pretty much exactly the same, except that we log the validation loss and the test loss, and again just return the loss. I think I could actually return the outputs here as well, but you will see.

The final method we need to override is configure_optimizers, where PyTorch Lightning requires us to return an optimizer. I'm going to use AdamW, basically Adam with the weight decay fix, which is very popular when using Hugging Face Transformers and the Hugging Face library in general. It takes the parameters of the model, and I pass in a learning rate, which is going to be pretty low; you'll hopefully see why when we start training. Now I initialize this class with the number of features, which, again, is 9, the second element of train_df.shape. (The whole module is collected in the sketch below.)

Now I'm going to do something a bit different: I want to create a data loader to show you what is happening under the hood. Let me run over the data module's train_dataloader method, break after the first batch, and print the item... this is not defined... ah, I have an error, let me go back and check that. All right, here we have the sequence and the labels, and as you can see we have the batch size, which is 64, then the number of examples, or data points, in each sequence, and finally the number of features as the last position in the shape. Everything seems to be doing all right, and this is the shape you need to pass in when you want to train this particular model; the batch_first parameter that we set to True means the first dimension of this tensor is the number of elements in the batch. So this works out pretty well.

Next, I want to start TensorBoard, and I specify the directory that Lightning writes its logs to.
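Collecting the pieces described above, the Lightning module might look roughly like this (the learning rate value is an assumption on my part):

    import torch.nn as nn
    import pytorch_lightning as pl
    from torch.optim import AdamW

    class BTCPricePredictor(pl.LightningModule):

        def __init__(self, n_features: int):
            super().__init__()
            self.model = PricePredictionModel(n_features)
            self.criterion = nn.MSELoss()

        def forward(self, x, labels=None):
            output = self.model(x)
            loss = 0
            if labels is not None:
                # unsqueeze so the labels match the (batch, 1) output shape
                loss = self.criterion(output, labels.unsqueeze(dim=1))
            return loss, output

        def training_step(self, batch, batch_idx):
            loss, outputs = self(batch["sequence"], batch["label"])
            self.log("train_loss", loss, prog_bar=True, logger=True)
            return loss

        def validation_step(self, batch, batch_idx):
            loss, outputs = self(batch["sequence"], batch["label"])
            self.log("val_loss", loss, prog_bar=True, logger=True)
            return loss

        def test_step(self, batch, batch_idx):
            loss, outputs = self(batch["sequence"], batch["label"])
            self.log("test_loss", loss, prog_bar=True, logger=True)
            return loss

        def configure_optimizers(self):
            # AdamW: Adam with the weight decay fix; lr is a guess
            return AdamW(self.parameters(), lr=0.0001)

    model = BTCPricePredictor(n_features=train_df.shape[1])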
Now for the training setup (sketched below). First, a checkpoint callback: I'm going to use ModelCheckpoint from PyTorch Lightning, specify the checkpoints folder, and use this callback to save the best model during training. I want to keep only the best one, so save_top_k is 1; I want it to be verbose, so we know what is happening; and I want it to monitor the validation loss and look for its minimum value, because that is what we are trying to minimize. Next, I initialize a TensorBoard logger and give the experiment the name btc-price. Then I add one more callback, the early stopping callback: it watches the validation loss, and if we haven't made any improvement for the last two epochs, training is stopped; the patience parameter is the number of epochs for which we allow no improvement before stopping.

Then I create the Trainer from PyTorch Lightning and pass all of this in as parameters. For max_epochs we use the number of epochs we defined already; I specify that we're going to use a single GPU; and the final parameter, the progress bar refresh rate, is needed because we are training on Google Colab. This gives us something useful: it says that we have a GPU available, which is correct, and that it will be used for training.

Next, I start the training by passing the model and the data module to trainer.fit, and we'll wait and see what happens. This kicks off the entire training process, and Lightning runs a validation sanity check at the start so that all the parameters and all the dimensions check out. Unfortunately, this will not be very fast, because we have this K80 in our example, so I'm going to stop it and instead upload a checkpoint containing a model that I've pre-trained. I restore it by calling load_from_checkpoint on our BTCPricePredictor with the best checkpoint I've just uploaded, and I specify the number of features, which is train_df.shape[1], again nine features. I also want this model frozen, so it is a bit faster at creating predictions.

I can actually show you the training progress that produced this checkpoint: I trained the model for around three epochs that went well, and the validation loss kept decreasing during those. At epochs three and four the validation loss was no longer improving, so the early stopping callback kicked in and training stopped. An epoch takes about two and a half minutes on the T4 GPU I was working with there.
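For reference, here is roughly how the callbacks, logger, and trainer fit together. The argument names match Lightning releases from around the time of this video (the 1.x line); newer versions moved checkpoint callbacks into the callbacks list and replaced gpus with accelerator and devices. The checkpoint filename is an assumption:

    from pytorch_lightning.callbacks import ModelCheckpoint, EarlyStopping
    from pytorch_lightning.loggers import TensorBoardLogger

    checkpoint_callback = ModelCheckpoint(
        dirpath="checkpoints",
        filename="best-checkpoint",
        save_top_k=1,          # keep only the best model
        verbose=True,
        monitor="val_loss",
        mode="min",            # lower validation loss is better
    )

    logger = TensorBoardLogger("lightning_logs", name="btc-price")

    early_stopping_callback = EarlyStopping(monitor="val_loss", patience=2)

    trainer = pl.Trainer(
        logger=logger,
        checkpoint_callback=checkpoint_callback,
        callbacks=[early_stopping_callback],
        max_epochs=N_EPOCHS,
        gpus=1,
        progress_bar_refresh_rate=30,  # keeps Colab's output manageable
    )

    trainer.fit(model, data_module)

    # later: restore and freeze the best model for faster inference
    trained_model = BTCPricePredictor.load_from_checkpoint(
        "checkpoints/best-checkpoint.ckpt",
        n_features=train_df.shape[1],
    )
    trained_model.freeze()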
Now that we have the trained model, I initialize the test dataset and get the predictions from it: I iterate over it, take the sequence and the label from each item, ignore the loss our model returns and keep just the output, unsqueeze the sequence to add a batch dimension (and squeeze the output back down), and append each prediction to the predictions list, doing the same for the labels. I'm going to run this, and it will take some time, so I'll see you when the process has completed.

All right, now that this is done (it took about six minutes), the first thing we do is check the number of predictions and compare it to the amount of test data we had. These numbers are different because the number of sequences is the length of the data frame minus the sequence length; you'll see why we need this in a bit. Let's also look at what we have in the test data frame: recall that we did some scaling with MinMaxScaler, so the close price here is the scaled close price, and the predictions are, of course, in the same scaled range. We want to invert this scaling.

To do that, I'm going to use a technique I found on Stack Overflow; I believe the answer is from a user named something like Yeshu Vishnalia, and I think it was the best answer to the question of how to invert the scaling for a single column. The trick is to use the scaler itself: it stores per-column min_ and scale_ values. Here are the minimum values for each of the columns we use as features, and we have the same thing for the scale_ parameter of the scaler. To take the last values of those, I index with [-1], and that gives us the scale of the close price, which is great. Based on this, I create a descaler, a MinMaxScaler with the same parameters as our original scaler, and overwrite its min_ and scale_ attributes (yes, min_ and scale_, with trailing underscores) with the close-price values.

After that, I write a simple function called descale, which takes the descaler and some values. Unfortunately, MinMaxScaler works only with 2D data, so we have to convert the values to a NumPy array and add a new axis, or dimension, which guarantees we have that extra dimension. You might want to make this function more robust, but I know the values are currently 1D. Then I call inverse_transform on the descaler and flatten the result, so we pass in a one-dimensional array, or vector, and we get a vector back. I descale the predictions and the labels, and if we look at, for example, the first five predictions, they look like non-scaled values. The same goes for the labels: these no longer look like the scaled prices we had before. So this is very good.
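A sketch of the prediction loop and the descaling trick, assuming the fitted scaler is named scaler and that the close price is its last column, as described above:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler
    from tqdm.auto import tqdm

    test_dataset = BTCDataset(test_sequences)

    predictions, labels = [], []
    for item in tqdm(test_dataset):
        sequence = item["sequence"]
        label = item["label"]
        # add a batch dimension, ignore the loss, reduce the output to a scalar
        _, output = trained_model(sequence.unsqueeze(dim=0))
        predictions.append(output.item())
        labels.append(label.item())

    descaler = MinMaxScaler()
    # borrow just the close-price component of the fitted scaler
    descaler.min_, descaler.scale_ = scaler.min_[-1], scaler.scale_[-1]

    def descale(descaler, values):
        # MinMaxScaler expects 2D input, so add an axis and flatten afterwards
        values_2d = np.array(values)[:, np.newaxis]
        return descaler.inverse_transform(values_2d).flatten()

    predictions_descaled = descale(descaler, predictions)
    labels_descaled = descale(descaler, labels)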
Next, I want to create a plot that compares the predictions to the labels (you'll find a sketch of this step at the end of the transcript). To do that, I take the original test data, because I need the datetime column from it, and I redo the test split; I check that the lengths of the test data frame and the test data are exactly the same, which they are. But recall that the test sequences exclude the first SEQUENCE_LENGTH rows, so to get the matching rows, I take everything after the initial sequence length. This guarantees that we have the same number of data points as there are dates in the test sequences themselves. I'll show you this test sequences data here, which again is the original, unscaled data, so this looks pretty good, and I'm going to take the date column from it.

For the plot itself, I first use Matplotlib to convert the dates to numbers: I take the date column, convert it to a list, and run the date conversion, which gives us numbers we can plot. Matplotlib's pyplot has a plot_date function, to which I pass the dates and the predictions, with a "-" format string so it draws a line, labeled as the predicted prices. I pass in the real prices the same way, rotate the ticks on the x-axis, and show a legend. If we plot this, we see the result of the model.

Here, in yellow, we have the real price of Bitcoin for these dates, basically this exact month, March, and these are the predicted prices. As you can see, at least at the beginning, the model is doing quite an all-right job, and as time progresses it is still kind of getting the trend, but the magnitude of the jumps is nowhere near the real prices. To me, the predictions look like they capture the general trend lines of the real values. You might now think you have a Bitcoin price predictor, but this is in no way financial advice: don't take this model, use it in production, and buy or sell based on what it says. It might work, but do that at your own risk. This is pretty much the final result of our LSTM. It might be the case that I'm leaking some of the data somewhere, but I think this general approach to time series forecasting is very good, and these are the results you might expect to get.

There you have it: a model that can predict Bitcoin prices. It's not the most perfect thing in the world, but at least you now know how to take some time series data and convert it into a set of sequences, do some scaling of the data, convert all of it into a format suitable for PyTorch and PyTorch Lightning training, use an LSTM, and evaluate the results to get a reasonably well-trained model. Of course, you might want to tweak the hyperparameters, for example the number of hidden units, the dropout, or the number of LSTM layers; you might look at different architectures, try attention with an LSTM, or try whole new models altogether. But in the end, this is a simple way of approaching time series forecasting projects.

Thanks for watching, guys. Please like this video and subscribe to the channel. In the next one, I'm going to show you how you can reuse this same framework for classifying time series data: we're going to take some input sequence and classify it into predefined classes. Thanks for watching, bye!
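As promised, a sketch of the plotting step. The test_data frame and the date column name are assumptions based on the narration, and plt.plot_date is deprecated in newer Matplotlib releases but matches what is described here:

    import matplotlib.pyplot as plt
    from matplotlib import dates as mdates

    # drop the warm-up window so the dates line up with the sequences
    test_sequences_data = test_data.iloc[SEQUENCE_LENGTH:]
    dates = mdates.date2num(test_sequences_data["date"].tolist())

    plt.plot_date(dates, predictions_descaled, "-", label="predicted")
    plt.plot_date(dates, labels_descaled, "-", label="real")
    plt.xticks(rotation=45)
    plt.legend()
    plt.show()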
Info
Channel: Venelin Valkov
Views: 8,215
Rating: 4.98 out of 5
Keywords: Machine Learning, Artificial Intelligence, Data Science, Deep Learning, Time Series, LSTM, Forecasting, Bitcoin, PyTorch, Python
Id: ODEGJ_kh2aA
Length: 46min 3sec (2763 seconds)
Published: Tue Mar 30 2021