Multivariate Time Series Forecasting Using LSTM, GRU & 1d CNNs

Hey everybody, how's it going? Last time we looked at LSTM time series forecasting, which is an essential video, and if you haven't seen it and you're trying to watch this one, I'd highly recommend checking it out. I'm going to link it up above; you can click that card. This is part two. Last time we were predicting the temperature, and although I was a little lazy about labeling the graphs, one of these two lines is the predicted temperature over time and the other is the actual temperature. You can see the predictions are very close. Each of these time steps is an hour, so we use the previous five hours to try to predict the next hour. We can bump up what we call the window: say, make it the last seven hours of temperature data to predict the next hour, and so on.

So we ended with this graph of our test predictions compared to our actuals, and it's awesome. Here's the pandas data frame that shows all of the data used to make it, the predictions and the actuals. There's no problem with this; it's great. There is a lot more to do in time series, though, and I'm not even going to cover it all in this video. Something you definitely might want to do is use more variables to help predict, because in neural networks, or whatever type of model, we don't generally look at just one type of variable. Technically we're using five data points to make a prediction, the last five hours of temperature data, but there's no reason we have to look only at temperature. We can look at all these other variables, and if you recall we have a bunch of them. If I scroll to the very top, we have all of this information. This is pressure. We were using temperature just to
predict temperature. We could use pressure to help predict the temperature, and all this other stuff if we wanted to, although we're not actually going to do that. Something else we didn't even consider was using the time itself as a variable. You might think: it's a timestamp, why is that important? Well, we have a lot of periodicity here. If we graph the temperature over the years, as we did, we can see a very obvious pattern: it goes up and down like this. That makes sense for temperature, and it would for many other variables as well: it's probably going to get hotter in the summer, depending on where you are and assuming you have a summer like most places do. So over years we have this pattern.

There's also another pattern we didn't think about, or maybe you did: within each day it's generally hotter in the middle of the day. I'm not sure this is true for every location, but I assume it is. It's generally hottest in the middle of the afternoon or so, colder at night and in the morning, and it goes up and down in the same kind of way it does over years. So we can use that information, and all of it comes from just what time it is, from that timestamp. It takes a bit of a transformation to get something useful out of it. We're going to translate it to seconds, just a number of seconds since some fixed time, and after that we'll do a sine and cosine transformation, which I don't want to talk about too much. The point is that we can use that sort of periodicity, which makes sense graphically if you look at it like that,
as variables as well, and we can include something like the pressure too, just to show we can use other variables. Also, there's no reason we have to predict only one variable. If you want to predict just temperature, that's great, but why not predict the pressure at the same time? If we're trying to make a forecasting model for the future climate, why can't we predict pressure as well? We totally can, and it turns out we can do it accurately.

So there's all of that to consider. Something else we haven't talked about: we covered LSTM, and I believe I mentioned GRU last time, though I'm not sure I got to it. These recurrent neural networks are generally thought of as the go-to for forecasting and predicting time series data. They're very common in natural language processing, because that is sequence data as well. But they're not the only models you can use. We have convolutional neural networks, which people often associate with two dimensions, meaning a filter that goes across a picture, but that doesn't have to be the case. We can do one-dimensional CNNs, which go across a sequence like that. We didn't talk about that in the first video. All we did was make that one model and show its results, and there they are: train, test and val. We're going to be making that today in this next video. Sorry, I'm just going to scroll to my other thing to make sure I'm in the right spot.

I don't know why people don't associate 1D CNNs with forecasting as much as they should, because they're a lot lighter weight: as we'll find out, there are a lot fewer parameters to learn, which can make your model a lot faster both in training and at inference time, meaning how quickly it can predict.
OK, so there are a lot of interesting things to think about in this video, and we're going to cover a lot of them, so let's get started. At this point I'm going to assume you're familiar with our code and what we did last time. If you don't want to watch that and you want to try to follow along, that's OK, but if you get confused I would watch that one first.

To start, we're going to make a function. I should say what it does before I start typing it wrong: we want to automate the predictions, the actuals and this graph all at the same time. I'm taking a matrix of inputs, making a data frame out of the results and then plotting those columns, and I want to do that all in one function. I probably should have done that last time, but it makes a nice warm-up to remember what we were doing. So we import mean squared error as mse, and we define something called plot_predictions1, which takes a model (a model in memory, not a model path, although that's an option too), X and y, and start and end indices. We make those defaults, with start at zero and end at 100, which say which range you actually want to plot. That's just for plotting. We're also going to use mse, because I want to report the mean squared error, but I'm going to compute that MSE over the entire set, while the plot covers only the indices specified.

So we get the predictions: predictions is model.predict(X), and we flatten that as before. This is really just copying from what's above and putting it into a function. We make a data frame out of this with pd.DataFrame with those two columns, so for data we make this dictionary. We set data equal
to it: Predictions is made up of predictions, and Actuals is made up of the actuals, which is y, that vector there. Then we plot these things with plt.plot. We plot both columns of df: first the Predictions column. How much do we actually want to plot? Only start to end. We copy and paste that, because it's basically the same thing, and plot Actuals as well. We return a couple of things: the data frame itself, as well as the mean squared error over the whole thing, so of the actuals, which is y, and the predictions.

If we call plot_predictions, most notably we care about the test set, so we pass it model1, our first model, and X_test1 and y_test1. If you literally just did the other tutorial, you'll probably notice these variable names are ever so slightly different. I changed them a little to match up with the rest of the code, but it shouldn't be anything crazy. You probably noticed that I spelled predictions wrong somewhere. Oh no, I'm just missing a 1 there: plot_predictions1. There you go. That's how it shows a data frame when it's plotting other stuff as well: Predictions and Actuals. You can see the values are very close, and the lines are very close together. Again, I was pretty lazy about the graph. As practice, you can absolutely give it a title, x and y axis labels and a legend to show which line is which, but I don't really care for the most part. I just want to show that they're pretty close to each other. That's the main part of this. So that was one thing I wanted to do, and we'll use it a couple of times.
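The helper described above can be sketched roughly as follows. This is a minimal sketch, not the notebook's exact code: it assumes model is any object with a predict method (such as a Keras model), it computes the mean squared error with NumPy instead of an imported mse function, the default end of 100 is an assumption, and the plot flag is a hypothetical addition so the function can also run where no plotting backend is available.

```python
import numpy as np
import pandas as pd


def plot_predictions1(model, X, y, start=0, end=100, plot=True):
    """Compare predictions with actuals; plot rows start..end, score all rows."""
    # model is assumed to expose predict(X), as a Keras model does.
    predictions = np.asarray(model.predict(X)).flatten()
    actuals = np.asarray(y).flatten()
    df = pd.DataFrame({"Predictions": predictions, "Actuals": actuals})
    if plot:
        # Imported here so the function still works where matplotlib is absent.
        import matplotlib.pyplot as plt

        plt.plot(df["Predictions"][start:end], label="Predictions")
        plt.plot(df["Actuals"][start:end], label="Actuals")
        plt.legend()
    # The error is computed over the entire set, not just the plotted slice.
    mean_squared_error = float(np.mean((actuals - predictions) ** 2))
    return df, mean_squared_error
```

Called as plot_predictions1(model1, X_test1, y_test1), it would return the data frame and the MSE over the whole test set while drawing only the first 100 points.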
Now we're going to make a convolutional neural network model, so we're basically just going to replace the LSTM part. Just to show you how easy this is, and it's how I would do it if I were making it from scratch as well: I take the earlier model, because a lot of it is going to be largely the same. You could play around with it yourself; feel free to do whatever you like with it. We turn the ones into twos. You're going to see a lot of that today. I think we go all the way up to model 7, and you can optionally do more yourself.

We're happy with an input of five by one. Recall that the five is the window length: we chose the last five hours. The one is often just a formality. It's being picky: it wants each value wrapped in a list, saying we're only using one variable. Today we will be changing that, which is going to be exciting, but not yet. This is still our input, and it only takes the temperature. That's the only variable, and we're looking at it five times.

Now the interesting part: we replace LSTM with a Conv1D, and this is basically the sliding window thing. We slide this window around. How many windows, which are all going to be the same length, do we want to slide across? That's this first number. We can stick with 64.
That's fine. Then kernel size. It's kind of confusing that I'm using the term window twice here. The window for the input has length five: we look at the last five hours. But then we also slide this network window across the input. I should really use the word kernel instead of window, since it's less confusing, although people often say window here. We slide this kernel across, and it goes through the five inputs in order. With kernel size 2, it says: take these two values and turn them into one, take the next two and turn them into one, and so on. With kernel size 2 and a window of five, it fits into the first and second spots, the second and third, the third and fourth, and the fourth and fifth. That's four locations where it can fit. It does that 64 times, so the output is 64 by 4.

After that we need a Flatten, model2.add(Flatten()), because we don't want the output in this two-dimensional form. We flatten it so that we can pass it to the eight ReLUs as before. Our summary is as we said: it takes five by one and turns it into 4 by 64, or 64 by 4, which is really the same thing. And total parameters is 2,257. Isn't that incredible? If you recall, we like low parameter counts for the most part. Of course, complexity in the model helps it be accurate, but this is way fewer parameters than the LSTM: about 2,200 compared to about 17,000, even though I chose 64 for both. GRU, which we'll see afterwards as a replacement for LSTM, is another recurrent neural network, and its parameter count falls between the CNN and the LSTM.
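The shape and parameter arithmetic above can be checked by hand. The sketch below assumes the layer stack as the captions describe it (a 64-unit first layer on a 5 by 1 input, followed by 8 ReLU units) plus a final single-unit output layer, which is an assumption that happens to reproduce the 2,257 total. It uses the standard parameter-count formulas for convolutional, dense, LSTM and GRU layers; the GRU formula assumes the two-bias variant that Keras uses by default. All function names are hypothetical.

```python
def kernel_positions(window, kernel_size):
    """Start/end index pairs a kernel covers as it slides with stride 1."""
    return [(i, i + kernel_size - 1) for i in range(window - kernel_size + 1)]


def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases


def conv1d_params(kernel_size, channels_in, filters):
    return kernel_size * channels_in * filters + filters


def lstm_params(features, units):
    # Four gates, each with input weights, recurrent weights and a bias.
    return 4 * (units * (features + units) + units)


def gru_params(features, units):
    # Three gates; assumes two bias vectors per gate (Keras reset_after=True).
    return 3 * (units * (features + units) + 2 * units)


window, features, units = 5, 1, 64
positions = kernel_positions(window, kernel_size=2)
head_after_rnn = dense_params(units, 8) + dense_params(8, 1)

# The CNN output is flattened: (number of positions) x (number of filters).
cnn_total = (conv1d_params(2, features, units)
             + dense_params(len(positions) * units, 8)
             + dense_params(8, 1))
lstm_total = lstm_params(features, units) + head_after_rnn
gru_total = gru_params(features, units) + head_after_rnn

print(positions)
print(cnn_total, lstm_total, gru_total)
```

Under these assumptions the kernel fits in four positions, the CNN total is 2,257, the LSTM total is 17,425, and the GRU total of 13,393 sits between the two.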
Anyway, for now, it's really good that the parameter count is low, because that makes the model run faster and easier to train, among other things. After we've made that model, I'm going to be copying from this other notebook. It's actually copied from the same notebook as above; I just don't want to give you the headache of scrolling up and down. We take the checkpoint code, which says cp2, which is kind of confusing because we haven't made a cp2 yet. So let me show you cp1, and we'll convert that into cp2. I just lied: I said I wasn't going to scroll up, and I did. I'm going to be less picky about this later on, because we're going to do it a bunch of times, but all I'm doing is changing cp1 to cp2, so we're saving model 2. Then model2.compile: mean squared error is fine, the optimizer with this learning rate is fine, and so on. Compile that. I'm going to make this exactly as it is in my other notebook, so you can follow along pretty easily: model2.fit with X_train, y_train, the validation data X_val and y_val, and the callbacks.

So we fit this convolutional model, and we can see it trains very quickly. The loss is going down, as expected and desired, and the root mean squared error as well, which is just the square root of it. That's going to run for a little while, and I don't want to keep talking for the next minute or so.

OK, I'm back. The val loss went down to about 0.49, which is great. If we compare that to our earlier fit above (I try not to do this, because I know it's a headache to scroll up and down), the best val loss we got before was about 0.487. So they have pretty much the same validation loss, but the complexity of this model is significantly lower. This model would most likely run a lot faster.
So we would probably prefer that. A CNN is a great option to think about when forecasting. Just another warm-up before we move on to more variables: replacing it all with GRU. GRU stands for gated recurrent unit, and it's basically a simpler LSTM. I just copied this code in because it's really the exact same thing as the LSTM code, and I called it model 3. It takes the same input, except that LSTM is replaced with GRU. Our summary should look very similar, except here we have about 13,000 parameters, which as I said is between the CNN and the LSTM. It's much closer to the LSTM, though, and fundamentally a lot more like the LSTM. It's another recurrent neural network, and it's a decent option. The reason you might want to use it is that complexity is not always a good thing: it makes the model heavier and slower to use, and you're more prone to overfitting if you have more parameters and more complexity. So you might want to use GRU every now and then.

Again we copy in the checkpoint, changing everything to three, and I'll train model 3 as well. I apologize for pulling code out of thin air, but I think you can follow along pretty easily if you've seen the other stuff. I'm going to let that go, and the loss is going down, down, down. On epoch 10 the validation loss is about 0.48, pretty much the same as the others. So again we would prefer this model over the LSTM. For this problem I would actually prefer the 1D CNN most, but we definitely prefer the GRU over the LSTM, because it has pretty much the same accuracy with fewer parameters, and that's generally better if we can get it.

Now we're going to move on to basically two ideas in one. The first is using multiple variables: we only used temperature before. We looked
at five steps of it, but we only used one variable. So we're going to use other variables, as well as this idea of translating our timestamp into usable signals. From this one timestamp we're going to get four columns of very important features: a day signal, both sine and cosine, so two columns there, day sin and day cos, as well as year sin and year cos.

To do this, we first add a seconds column next to the temperature. To remind you, temp is this thing, and we can make a new data frame: temp_df is equal to a pd.DataFrame made out of temp. It's a little weird that we're making a data frame with one column, which is kind of like a series, but there's a reason for it. If you want to see what that is, it's the same thing, but it's output like a data frame, because it is a data frame. We can see the index is still this timestamp, which is going to be very useful. So we add temp_df['Seconds'], equal to temp_df.index, which is that datetime index, mapped with pd.Timestamp.timestamp. It's a little odd that that's how it works, but if we look at this new column, we have Seconds, which looks like some really big value: 1.23 times 10 to the 9 or so, roughly one billion seconds, if I have that right. Why isn't this zero seconds, given that it's the first time we have? Because it's all counted from some earlier reference time. But it doesn't really matter. The point is the relative time: this is just some
increasing value that follows the timestamp. If you wanted to, you could subtract some value to bring this down to zero, if that makes you feel better, but it doesn't really matter. So we now have this as seconds, and that's a big stepping stone to converting it to the day sin, day cos, year sin and year cos signals. I know if you haven't heard of that before you don't really know what I'm saying, but it doesn't matter.

I'll write this out by hand so you can get an idea of it. We need two numbers, day and year: the number of seconds in a day and the number of seconds in a year. For day, we have 60 seconds in a minute, 60 minutes in an hour and 24 hours in a day, so 60 times 60 times 24 is the number of seconds in a day. Given day, we can say year is equal to the number of days in a year, which turns out not to be 365 but more like 365.2425, multiplied by day. That variable should really be called number of seconds in a day, but it's just day for short, so year is the number of seconds in a year.

It turns out we can use these to build our signals. In temp_df we add a column, first the day sin signal, and this is going to be equal to something involving sine, unsurprisingly. I'm not going to explain how this works too much; go ahead and read about it on your own if you want to. Basically we're converting time into periodic sine and cosine signals. We take np.sin of temp_df['Seconds'], multiply this by 2 times pi, where np.pi is stored in NumPy, and then
divide that by the number of seconds in a day, and it turns out that this gives you a nice sine signal. Sorry for the mess-up there: I tried to do something off screen and it didn't work, so I'm just not going to do it. Anyway, that's our sine signal for the day, and we can look at it with temp_df, once I've actually run that cell. We can see 0.25, 0.5, 0.7, 0.86, 0.96, and then it starts to go down, and then up, and then down again. So we've turned the time into this nice periodic sine signal.

We can do the same for the others. I'm just going to copy the other three in, because I don't see the point of writing them out for you, but I'll explain: day cos is the same except with cos, sticking with day; year sin is the same with sin, but with day replaced by year; and year cos uses cos and year. If you output the whole data frame, you can see they're very nice signals. What's also nice is that they won't need any other preprocessing. We didn't have to do any preprocessing before, because we were only using one variable. We will have to now, but these signals are already in the kind of range we want, so we're going to be very happy with those variables. I'll just put .head() so we don't have the whole thing in the notebook, and carry on.

Now that we have this new data frame, we don't really want the Seconds column, because it isn't going to be very useful to us. You probably could include it, but there'd be some weird collinearity, because these signals are based on it, so let's not. We do temp_df = temp_df.drop('Seconds', axis=1), and now the column is gone, which reminds me of something.
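The timestamp features built above can be sketched as follows. This is a minimal sketch on a small synthetic hourly series, since the real data is not reproduced here: the temperature values are made up, and the column names (Temperature, Seconds, Day sin, Day cos, Year sin, Year cos) are assumptions based on how the captions read.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the temperature series: 48 hourly readings.
index = pd.date_range("2009-01-01 01:00:00", periods=48,
                      freq=pd.Timedelta(hours=1))
temp = pd.Series(np.linspace(-8.0, -3.0, len(index)), index=index)

temp_df = pd.DataFrame({"Temperature": temp})
# Seconds since the Unix epoch for each timestamp in the index.
temp_df["Seconds"] = temp_df.index.map(pd.Timestamp.timestamp)

day = 60 * 60 * 24        # seconds in a day
year = 365.2425 * day     # seconds in an average year

temp_df["Day sin"] = np.sin(temp_df["Seconds"] * (2 * np.pi / day))
temp_df["Day cos"] = np.cos(temp_df["Seconds"] * (2 * np.pi / day))
temp_df["Year sin"] = np.sin(temp_df["Seconds"] * (2 * np.pi / year))
temp_df["Year cos"] = np.cos(temp_df["Seconds"] * (2 * np.pi / year))

# The raw seconds are no longer needed once the periodic signals exist.
temp_df = temp_df.drop("Seconds", axis=1)
print(temp_df.head())
```

Because each sine column is paired with a cosine column, every moment in the daily or yearly cycle gets a unique pair of values, and the same time on the next day maps back to the same pair.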
Why weren't we actually happy with just the Seconds column in general? It's a positive value, and we could have done some preprocessing, like dividing by the max value to get values between 0 and 1. The reason is that this data is periodic: as time goes on, the temperature goes up and down and up and down, whereas the Seconds column is just an ever-increasing value. It should really wrap around. If you have a seconds value that corresponds to late December, and another that corresponds to early January of the next year, the Seconds column doesn't show that they're close. It just shows that one number is bigger than the other, when really they should have very similar temperature values. More importantly, if we go three or four years into the future, the value for a time in December should be very close to its value in this December, because the point is that we're reflecting periodic data: what time it is in the year, or what time it is in the day, not how many seconds have passed. Today's value should be roughly the same as the value two years in the future at the same time of year, or tomorrow at the same time of day. We have to reflect that somehow, and we do it by translating seconds into day sin, day cos, year sin and year cos. I hope that makes sense. The intuition is that this should wrap around, not be some ever-increasing value.

Anyway, we've got that, so let's move on. The next thing we have to do is change the function we used to make our windows. It made our inputs: it basically turns this from this
whole data frame into: here's five values, here's the next five, and so on, plus our labels as well. So we'd have our matrix of fives and then the whole vector of corresponding outputs. To make these input-output pairs we used this function here, and we called it like that. We'll see that soon. Going all the way down, sorry for the headache, we copy that in.

This is the idea, in case you don't remember. Each of these rows is one row of information, and this is the answer, the next temperature. We have the temperature at one o'clock, two o'clock, three o'clock, four o'clock and five o'clock, and we use that to predict six o'clock. Then two, three, four, five and six o'clock, and we use that to predict seven o'clock. So this function is about turning the data frame into this format. This whole thing is X and this whole thing is y. I was going to get rid of that since we already have it. Now we call this df_to_X_y2, just for our next model, and window_size equal to 5 should be fine as well. But the difference here is in what we're appending. Actually, I just got rid of the example, and I want to keep it for now. What we have here is really t1, the temperature at one o'clock, then t2, t3, t4, t5, and we use that to predict t6. Likewise we use this row to predict t7 and this one to predict t8. This is all temperature data, so I'll add t's here. It's going to be the same idea, except instead of using just the temperatures we're adding these other columns. We're actually adding all four of them, but I'm only going to show you one: ds for day sin. So we're going to want to
put this here: day sin at one o'clock, then day sin at two o'clock, and so on. I'll just copy that in. Skip ahead if you're annoyed by this, but I think it's worth the time for beginners to really understand the format of the data, and once you get the hang of this you're good to go. So we have day sin at one o'clock, two o'clock, three o'clock, four o'clock and five o'clock. You could try to predict these other values, although that's kind of pointless, because it's time and you know what the time is. We could use this to predict whatever we want, but we're only predicting temperature. We use all this stuff to predict t6, t7 and t8. So we're not going to change the label: it's still the temperature. But in the input we want to add this whole row of data rather than just the list of temperatures. In the next row this would be ds2, then ds3, and so on; it always matches the number on the temperature.

So in each row we want the following type of data. We want a list: thing one, thing two, thing three, thing four, thing five, where each thing is a certain time step, and each item is itself a list, where the first entry is the first variable of interest, the second is the second variable of interest, and so on. For all of them we'd also have day cos: dc1 here, dc2 here, dc3 here. We'd also have ys, year sin: ys1, ys2 and so on. So we want to make this function, and I'll finally get rid of this. Hopefully that made sense, and if that was boring, I hope you skipped through it.

OK, let's write this function. I quickly changed the window size here to six, because otherwise we would have five (one, two, three, four, five)
variables and a window size of five as well. I want those to be different values, because I want to see how the shape turns out. We're building up X and y, where the shape of X will be the number of training examples by the number of time steps by the number of variables. If we recall the other function, we have for i in range: we iterate through the data frame with an index, so we go through once, then twice, then three times, and so on. In each iteration we grab df_as_np from i until i plus window_size. What that does is, wherever our starting point happens to be, say i is zero, we grab the next window_size rows of information. We're left with this matrix, some slice of the full matrix, where the rows are the time steps in order.

It turns out we can just replace this with r. It doesn't matter what variable name we use, but it's r because each of these is a row of information. We grab that slice of the matrix and say row is the whole of it. So our row is not made up of just values; it's made up of lists of values. Each entry is itself a list, because it's one of these time steps. So r is all of that, and since this is in a for loop, the next r is all of that, and then all of that, and so on. Basically it makes this whole list of lists, where each of the inner lists is going to
be one of these. It's a little confusing, and that's why I'm taking this really slowly, so you get how the input is made. Each time, we append this row. Then the label: df_as_np at i plus window_size is the next step, the corresponding output. But since this is now a matrix, that grabs the whole row, so we need to pick out the variable of interest. If we wanted to forecast multiple variables, we could take multiple entries here, as we will later, but for now we take index 0 so that we append the right label, just the temperature value.

So we run that, and I'll just write the call out for you; it's pretty easy. We're sticking with the twos from before: X2, y2 = df_to_X_y2, and we pass in the data frame, which is temp_df. The window size of six is fine. Then if we ask for X2.shape and y2.shape, for X2 we should see however many training samples by six by five, because we're using six time steps and five variables. For y2.shape, since we're just appending a single value, it should be just the number of training samples. And here X2.shape is 70,085 by 6 by 5, and y2.shape is 70,085. Again: number of training samples, number of time steps, number of variables. Just to check, we could switch the window to four. Here I'll specify that window_size is 4, and now we're using four steps to predict the next value, and so we change the training number of
examples a little bit. Okay, I'm going to stick with six and just remove that. So there's our matrix of inputs (I'm calling it a matrix, which is kind of weird; it's really a tensor) and our output as well.

Now that we have that, we're really just going to copy the exact same thing that we have way above, and to spare you the headache I'll just copy it in. We're splitting into train, validation and test sets, and I'll remind you how that's done. We're also naming these with twos, so X2_train is X2 up until 60,000, an arbitrarily picked number. But note that we're doing this in chronological order: we train on the earliest data, validation is in the future relative to that, and the test set is even further in the future. The reasoning is that in practice, when we actually go to use this stuff, we don't have data from the future, so we want our model to be good at predicting future data; how well it predicts past data we don't really care about. So anyway, X2_train is X2 up until 60,000.
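A minimal sketch of the chronological split described above, using NumPy slicing. The cut points 60,000 and 65,000 are the ones named in the video; the helper name chronological_split is hypothetical, and the arrays are zero-filled stand-ins whose sample count is an assumption chosen only so that it exceeds both cut points.

```python
import numpy as np

def chronological_split(X, y, train_end=60_000, val_end=65_000):
    """Split windowed data in time order: train first, then validation, then test."""
    train = (X[:train_end], y[:train_end])
    val = (X[train_end:val_end], y[train_end:val_end])
    test = (X[val_end:], y[val_end:])
    return train, val, test

# Stand-in data (assumed size): windows of 6 time steps x 5 variables.
X2 = np.zeros((70_085, 6, 5), dtype=np.float32)
y2 = np.zeros(70_085, dtype=np.float32)

(X2_train, y2_train), (X2_val, y2_val), (X2_test, y2_test) = chronological_split(X2, y2)
print(X2_train.shape, X2_val.shape, X2_test.shape)
```

No shuffling happens anywhere, so every validation window comes after every training window, and every test window comes after both.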
That's the first 60,000 rows, and y2_train is the first 60,000 labels. Val is going to be 60,000 to 65,000, the same for input and output, and test is just the rest after that point, for X and y. If we look at the shapes, everything should match.

Okay, so now we have to do something a little bit tricky and annoying. It's mostly that the syntax looks kind of weird since we're working with tensors, but hopefully you'll be able to picture it, and I'll try to explain: it's preprocessing for this type of data. You'll notice that temperature is in Celsius, going from something like negative 30 or negative 40 up to 30 or 40, and that's a much different scale than all of these other values. So we have to bring temperature onto the same type of scale as them, roughly between negative three and three.

How do we do that? We want to apply the exact same function, with the same numbers, whether it's the training set, the validation set or the test set, and what we use to build that function is the training set information. I always get the two names confused, but this is standardization rather than normalization: subtract the mean and divide by the standard deviation, so the values end up centered on zero with unit spread. For that we need the training set mean and the training set standard deviation, and just for the temperature, because that's the only column we need to preprocess. When we use other variables that are on different scales we'll need to do it for those as well, but for this example it's just
going to be processing the temperature. Now, I want to do this the NumPy way, so we're getting the mean as well as the standard deviation of the temperature. temp_training_mean is equal to np.mean of X2_train[:, :, 0], and I'll explain after writing it. Try not to get too bogged down in this; I know it most likely looks a little bit weird. Remember, the first dimension is the number of examples, and we want all of them, so I chose a colon. The next one is the time steps, and we want all the time steps too: the first hour, the second hour, the third hour and so on. Then in each of those spots we have five values, the five variables. Which variable do we want? Just temperature, which is index 0. If you really think about it, this whole thing is all of the temperature values that are in the training set, and we get the mean of those, which is what we want for making our preprocessing function. Then temp_training_std is equal to np.std of exactly the same thing, X2_train[:, :, 0].
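The slicing described above can be sketched as follows. This is an illustration under assumptions, not the notebook's exact code: the training array is random stand-in data with temperature in column 0, and the preprocess function shows one way the training statistics could then be applied in place.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for X2_train: (samples, time steps, variables), temperature in column 0.
X2_train = rng.normal(loc=9.0, scale=8.0, size=(1000, 6, 5))

# Every sample (:), every time step (:), variable 0 only.
temp_training_mean = np.mean(X2_train[:, :, 0])
temp_training_std = np.std(X2_train[:, :, 0])

def preprocess(X):
    """Standardize the temperature column in place using the training statistics."""
    X[:, :, 0] = (X[:, :, 0] - temp_training_mean) / temp_training_std
    return X

preprocess(X2_train)
# After standardizing, the training temperatures have mean ~0 and std ~1.
print(X2_train[:, :, 0].mean(), X2_train[:, :, 0].std())
```

The same two numbers would be reused for the validation and test arrays, so all three splits pass through an identical transformation.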
Okay, and now we can make our preprocess function based on these values. We define preprocess of X, so any matrix, whether it's training, validation or test, and we set X[:, :, 0] equal to itself minus the temp_training_mean that we calculated, divided by temp_training_std. So first we take all of the temperature values and subtract the mean, and then we divide by the standard deviation. If you want you can return X here, but it actually changes the array you passed in. So that is our preprocessing. (I got an invalid syntax error at first because I'd missed a bracket.)

So that's the mean, that's the standard deviation, and that's how we make a preprocess function. Now we can just call preprocess on X2_train, preprocess on X2_val and preprocess on X2_test. Note that because I wrote it like this, even though we're returning X and not using the return value, this is totally fine, since the arrays are modified in place. We can see the result right there; it looks a little bit weird in that form, but I promise you it'll be all right.

Okay, so now we're going to make a model, which is really the exact same thing as above, the LSTM model. I'm just going to copy it in and show the difference. The only difference is the input layer: this used to be five and one, but now we change it to six and five, because we have six time steps and five variables of interest, and this
happens to work just fine with our input. I don't like how it's calling the layer 3 when we're calling the model 4; if we really wanted to change that we could, but I don't remember how. As above, we do exactly the same thing, so really don't worry about this: another checkpoint and another compile, and it works exactly the same way. We fit it with the X2 data: model4.fit with X2_train, y2_train, validation data X2_val and y2_val, and epochs like so. As we can see, the loss is going down very rapidly, and I'll talk to you at the end of the epochs.

Well, it's the end of the epochs, and fantastically we have a validation loss of 0.41. If you remember, we were getting about 0.48 or 0.49 with the other ones, so that's definitely a significant improvement, with honestly not very much increase in the parameters: it was about 17,000 before and now it's 18,500 or so. So it's not a much more heavy-duty model, but it's a lot better. That is awesome.

Now that we understand how to do that and we've made a model, we're just going to use our plot predictions function. I write plot_predictions1 with model4 (is that the right model? yes it is), X2_test and y2_test, our input and output for the test set. And here we go: we can see the predictions compared to the actuals are very close, every single time, so even better than before. The loss here, the test loss I guess, is 0.49, which is really awesome.

So we are pretty darn happy with that model, but I want to add one more thing in here, which is the pressure. We're going to add in the pressure as a predictive variable, but we're also going to
try and predict it as well. So we're not just predicting temperature; we're going to predict the pressure at each time step as well. It's going to be almost exactly the same as before, and that's why we're just kind of leveling this up more and more each time. Except instead of a 1 here, this is going to be a 2: our model is going to be outputting two things. The input will be the same, except that we'll have another variable. If you're predicting pressure you really do want to use it as input too, so we're going to put it there as well, and that changes our data set a little bit: it has another variable. But that's really it. And the output y2 used to be a long vector, shape (n,), but now it's going to be (n, 2), because we have two different targets.

Okay, so knowing those changes, we're just going to go down and make them one at a time. We start by adding pressure to the data frame. p_temp_df, which is just the pressure and temperature data frame in one, is equal to pd.concat: we're concatenating df['p (mbar)'], which happens to be what pressure is called in the data frame all the way at the top, onto temp_df, and we put it at the front. We're concatenating horizontally, so axis=1, which makes p_temp_df look like this. It's just some code that adds the column in. I don't really like showing all of that, it's too much information all at once, so I'll just show the head. So we're going to take that
function from above, the window-making one. We want to get our input and output pairs back, but we're going to modify it for our use case here. I'm also going to change the window size, because with six variables a window of six would give six by six; we'll do seven by six instead so the two dimensions are different. So what actually changes in this function? I'm going to call it 3 to make our lives easier, and it's going to be very similar. It's still r for r in this slice, and it turns out that, the way we wrote it, we don't have to change the input part at all, because it's still just appending the row, and then the row, and then the row. The only change is the output: we want to append a list made up of that row's first element and that row's second element. There are other ways you could write this a little more cleanly, but this is totally fine. Then we append the label and make the arrays.

Okay, so then we get our input-output pairs. (I'm pressing this button a bunch of times because somehow I never have a code cell ready; I know Krishna does that a lot as well.) So X3 is 70,084 by 7 by 6: the number of training samples, by the length of our window, since we're using the last seven hours to predict the next hour, by six variables. Our output matches up, except we're predicting the temperature and the pressure at the same time.

We're going to copy and paste this next part in because it's really boring: it's just splitting it exactly the same way into train, val and test. We could have written a function for this if we wanted to, but copy and paste is honestly fine for this kind of thing,
and all it is is a change from 2 to 3, and that's literally it. The shapes are as follows: training samples by window length by variables for the inputs, and training samples by the two target variables, pressure and temperature, for the outputs. It's like that for all of the splits.

Now we're going to do something very similar for our preprocessing; we'll write it out, but it's mostly copy and paste from above. What we'll see shortly is that we probably also want to preprocess our output, which has some complications, though it's not that big a deal. The reason is that if we look at the values of temperature and pressure, they're very different: the pressure values are a lot bigger. Our loss function may push the network to care more about the pressure errors than the temperature errors, because the larger values contribute more to the loss. So we do want to put these on the same type of scale, and later we'll be able to restore the real values from the predictions with what we'll call postprocessing. Since we're basically telling the network to predict values between about negative three and three for both targets, we want to be able to recover the actual prediction as well.

For now, though, we're just going to set up preprocessing for the input. For this we need p_training_mean3 (we'll put a 3 there), which is equal to np.mean of X3_train[:, :, 0]. Pressure took over as the first column, so that's why that's the zero. We copy that line but replace mean with std for the standard deviation. So we need those as well as the temperature ones, and I'm just going
to copy and paste the temperature ones, because that's not overly interesting. Okay, so we have both sets of values, and now we can write a new preprocess function. I'll write it out to make it clear: define preprocess3 of X. This one just preprocesses the input. It does exactly the same thing as before, except for both columns. We set X[:, :, 0] equal to itself minus p_training_mean3, divided by p_training_std3; that's the pressure. Then we do exactly the same thing for the temperature column, which is index 1 now, using temp_training_mean3 and temp_training_std3.

By the way, we didn't really have to compute the temperature values again. They're pretty much just the mean and standard deviation of all the training temperatures, which we already did earlier, but it's a little bit cleaner to have them here as well. (And yes, I did mess something up: I'd put the p values on the temperature line. It should be temp_training_mean3 and temp_training_std3.)

So that's our preprocessing function for the input, and the one for the output is really the same thing, so I'll just copy this one. We just index differently based on how the output looks. If we remember, the output looks like 5,084 by 2, for example, so we say all the rows and then the first column, which corresponds
to the pressure, so we subtract the pressure mean from all of those and divide by the pressure standard deviation, and then do the same thing for the temperature column. It's a little bit weird how we're doing that, and it might not be totally necessary, but it's probably a good thing to do.

Okay, a lot of talking, so my voice is getting a little bit sore. We want to preprocess the X3 information for all of the splits: X3_train, X3_val and X3_test. Those are done, and I'm going to hide the output. We also want to preprocess the output, and it's the same idea there: the train, the validation and the test targets.

Happy with that? No, we're not: preprocess_output is not defined. The name definitely looked the same to me, so I ran the earlier cells again to check, which gets a little bit weird if you do have to rerun things. Those variables look right... oh, it's because I wasn't calling it with the 3; that should be a 3. Sorry about this part. I like to leave it in to show you what happens if you do mess something up a little bit; I don't like to pretend everything is totally pretty. So I'll run it all again: p_temp_df, the window function, X3 and y3, X3_train and so on. I really should call that one preprocess_output3, because that's a
little bit weird without that. Okay, preprocess3 and preprocess_output3 should definitely be defined here, and I'll be confused if that doesn't work. Yes, that's totally fine. Sorry about that.

Now let's make a new model. It's going to be basically the exact same thing as above, so I'm very happy copying and pasting. In the input layer, the first number is again the window length, and I change that to 7 to match the 7 that we were using, and the second value is the number of variables, which is 6, as we can see from this 6 here. That should make a model, because, off screen, we changed the final layer to 2. That says the first output is for pressure and the second is for temperature.

So much talking, and I'm getting really tired, but that's what we're doing. We copy and paste the checkpoint here; cp5 is fine, to match up with model5. Then model5.fit, and luckily I can take a bit of a break while that runs with the X3 information as above. Just to make sure that's working: yes, the loss seems to be going down.

Okay, that is our model5, and we can see we have a very tiny validation loss. Recall that we did this preprocessing on the output, which makes those values a lot smaller, and therefore the loss is a lot smaller as well. That doesn't mean anything good or bad; it just means the values are smaller. What we care about is whether it's going up or down, and the validation loss starts out at 0.4, then 0.04, then around 0.01 and lower; it's
clearly going down and down like the other ones were, so I think we're pretty happy with that.

Now I want to make a function that plots pretty much all of this at once. I don't want to write it out with you, so I'm just going to paste it in and explain what it does. It's very similar to our earlier function for plotting results: it takes a model, some input, some output, a start index and an end index. We get the predictions, whose shape matches y here: the first column is all of the pressure predictions and the second column is all of the temperature predictions. So the pressure predictions are just the first column and the temperature predictions are the second column, and for the actuals, since we passed in y, the pressure actuals are the first column and the temperature actuals are the second. Then we put all of this into a data frame and plot it all at the same time. I also return the data frame, from start to end, and I'll show you why we choose to do that.

So if I call the plot predictions function with model5, X3_test and y3_test, the input and the output, we can see these numbers are very close. For temperature the predictions are very similar to the actuals, which is great, and for pressure they're very close as well, so that's fantastic. Here are just the first hundred results, and this also plots the first hundred. Again, I don't really know which one is the orange or the blue, or which one is the red
and the green, but the point is that clearly one pair of curves is tracking one variable and the other pair is tracking the other, and within each pair they're very close to each other. To tell temperature and pressure apart: temperature goes from 0.4 at the beginning to 0.64, so these curves are the temperature, and our predictions are very close. Pressure goes from negative 0.76 to negative 0.37, and the predictions are very close there as well. If you want to write a better function for graphing this, then by all means go ahead.

Okay, so those are our results, and they're very promising, but these are not numbers that we actually care about, because we need to be able to undo this mapping. We can of course undo it for the actuals, because we have the actuals, but the predictions are the model saying "this is my guess", and we have to be able to undo the scaling on those with some function. We do that with the inverse of our preprocessing, which we'll call postprocessing.

postprocess_temp is just for the temperature, and I'm calling the argument arr because it doesn't really matter what the shape of this thing is: it's an elementwise operation, applied to every single element of the array. So arr is equal to arr times temp_training_std3 (I typed the mean there first; it should be the standard deviation), plus
temp_training_mean3, and then we return arr. Note that this doesn't change the array you pass in; it returns a different one. The reason this works: before, we subtracted the mean and then divided by the standard deviation, so now we first multiply by the standard deviation and after that add the mean. That is the exact inverse of the standardization function, so it gets back our prediction on the original scale. I can write the exact same thing for pressure: postprocess_p of arr multiplies the array by p_training_std3 and adds p_training_mean3.

So we have those postprocessing functions, and now I'll show you how to adapt the exact plotting function we had before so that it returns the results postprocessed. The first column is the pressure, so we wrap it in postprocess_p to undo the scaling there, and the second column is the temperature, so that gets postprocess_temp, and we do the same for the actuals: postprocess_p on the first column and postprocess_temp on the second (I typed p there at first, which wasn't right). I'm sure there's a better way to do this with less code, but it returns everything postprocessed.

We'll call it like this: post_processed_df is equal to plot_predictions2, and this takes in model5, X3_test and y3_test. It's fine that this returns the df, which is why I'm grabbing it, and
there's a bigger reason for doing this, which we'll get to shortly. So it plots this, and you can't even tell that there are four curves here; it looks like there are just two. Actually, maybe you can see a tiny bit of blue down there. When we scale them back to what they actually are, the predictions and actuals end up so close to each other that we really can't graph them all in the same plot, or at least I don't know a nice way to do that offhand. So we're going to split them up, and I'll do this quickly because I don't think it's too relevant.

Since we grabbed post_processed_df here, I can say start, end equal to 0, 100 (we need this outside the function because we're not doing it in the function anymore), and then plot just the temperature predictions and actuals from the df, from start to end. There it is: that's just for temperature. We can do the same just for the pressure, and it's nice how it picks up this really jagged pattern, which is incredible.

So that is mostly the end, and it really is the end if you want it to be, but I encourage you to stay for just a few more minutes, because I'm going to show you some interesting things about time series and LSTMs. Let me grab model5 (so much code here). There's something interesting we can do to increase the complexity of the model if we want to, and you'll see people do this a lot. So here we have model5, which I'll quickly switch to model6. I'm not going to fully train it; I'll just train it up to the point that shows you it works. Here we have LSTM 64.
If you want, you can chain LSTM layers, which makes it kind of like a square model: you pass things upward from layer to layer rather than only recurring through a single layer. I know it's hard to picture if you're not super familiar with this stuff, so let's just do it in code. We do model6.add with another LSTM, say 32 units (the value doesn't matter too much), with return_sequences equals True. This makes sense, roughly, because an LSTM takes a sequence, a time series, as input. If we tell one LSTM to return sequences and feed its output into the next LSTM, we're passing an already processed, learned sequence into that next layer, which is very interesting.

That model would work. I'll quickly grab the rest from my other notebook so I don't have to bug you with it: the model6 checkpoint is exactly the same thing. (It said model6 is not defined because I hadn't run that cell; now it's fine.) There's the model, so the shapes are 7 by 32 now, and then 64.
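To make the stacked shapes concrete, here is a small parameter-count sketch in plain Python using the standard LSTM and dense-layer formulas. The layer order (a 32-unit LSTM returning sequences, feeding a 64-unit LSTM) is read off the "7 by 32, then 64" summary above; the Dense(8) then Dense(2) head is an assumption carried over from the earlier models in the series, not something this section states.

```python
def lstm_params(input_dim, units):
    """Trainable parameters of a standard LSTM layer: four gates, each with
    input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units

n_vars = 6  # variables per time step

first = lstm_params(n_vars, 32)   # return_sequences=True: output shape (7, 32)
second = lstm_params(32, 64)      # consumes that sequence: output shape (64,)
head = dense_params(64, 8) + dense_params(8, 2)  # assumed Dense(8) -> Dense(2) head

print(first, second, head, first + second + head)
```

Under these assumptions the total comes to 30,362, most of it in the second LSTM, whose input is now 32 features per step instead of 6.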
Okay, very interesting: a lot more parameters for sure. It's a heavier model; 30,000 is definitely approaching a big model. We're going to fit that just to the point of showing you that it works. I don't really care about the accuracy here, but I'm telling you it's probably going to be pretty good. It might overfit a little bit, that's quite possible, but it's going to work. I'm not going to make you watch that; if you want to check out my notebook, it's in the description like always.

Model7 is something else, which I'm just going to paste here for you. It takes in seven by six and then has a Conv1D with 64 filters and a kernel size of two. Let me show you something: if we go all the way back up to our other convolutional model and put them side by side, you can see the difference. Back then the input was like that because the window was five and we were just using one variable. Here we just specify the new input, which you always have to do anyway: the window is 7 now, with 6 variables, and then Conv1D 64.
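As a rough picture of what a 1D convolution does with a 7 by 6 window, here is a hand-rolled NumPy sketch. It is not the Keras layer itself: the weights are random stand-ins, the function name conv1d_relu is hypothetical, and a ReLU is applied at the end as a common choice of activation.

```python
import numpy as np

def conv1d_relu(window, kernels, bias):
    """Slide a kernel along the time axis of one input window.

    window:  (time_steps, n_vars)
    kernels: (kernel_size, n_vars, n_filters)
    bias:    (n_filters,)
    Returns: (time_steps - kernel_size + 1, n_filters)
    """
    k = kernels.shape[0]
    steps = window.shape[0] - k + 1
    out = np.empty((steps, kernels.shape[2]))
    for t in range(steps):
        patch = window[t:t + k]  # k consecutive time steps, all variables at once
        out[t] = np.tensordot(patch, kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)  # ReLU

rng = np.random.default_rng(0)
window = rng.normal(size=(7, 6))       # 7 hours x 6 variables
kernels = rng.normal(size=(2, 6, 64))  # kernel size 2, 64 filters
bias = np.zeros(64)

features = conv1d_relu(window, kernels, bias)
print(features.shape)             # 6 positions x 64 filters
print(kernels.size + bias.size)   # parameters in this one layer
```

Each filter sees two adjacent hours of every variable together, which is why adding variables changes only the kernel depth and not the sliding itself.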
I guess I should also set activation equals relu there; I don't know why I didn't have that earlier, since ReLU is pretty common. So it is exactly the same thing: you just specify that the input looks a little bit different, and it doesn't change the convolution part at all. This works totally fine. You can picture it working on this matrix of input, where each row is a time step and each column is a particular variable. It's a little bit tough to picture, so I wouldn't worry about it too much; basically you're sliding this window across time over multiple variables at once now. That works as well, and it can be a much lighter model. We also changed the output, but nothing else had to change. Here it's only about 4,000 parameters, and it's totally happy with that. Just to prove it, I'll quickly grab the compile code, with cp7 right here, and fit that model, and it works.

Okay, so pretty fantastic. Excellent job if you got this far; I'm guessing most people didn't, because it's a pretty heavy-duty video, but I hope you learned a lot along the way. Drop a thumbs up if you did get this far, and please subscribe if you haven't already. We're probably going to make one more time series video at some point. There's other interesting stuff we could look at, like forecasting further into the future all at once, and at some point I'll definitely make a video on forecasting stocks; the stock market is a very common one. But temperature is pretty interesting too, and it teaches the LSTM idea nicely, which is the most important thing for jobs,
the job market: just understanding how this forecasting input window works and how you can build these models. This teaches that pretty well. So yeah, like, subscribe, and I'll see you in the next one.
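To recap the forecasting input window mentioned above, here is a minimal NumPy sketch of turning a table of hourly readings into (window, next-step target) pairs. The function name make_windows and its target_cols parameter are hypothetical, and the data is a small synthetic array rather than the weather data set.

```python
import numpy as np

def make_windows(data, window_size, target_cols):
    """data: (time, variables).

    Returns X of shape (n, window_size, variables) and
    y of shape (n, len(target_cols)), the target columns one step after each window.
    """
    X, y = [], []
    for i in range(len(data) - window_size):
        X.append(data[i:i + window_size])             # the previous window_size rows
        y.append(data[i + window_size, target_cols])  # the row right after the window
    return np.array(X), np.array(y)

# Synthetic stand-in: 20 time steps, 6 variables.
data = np.arange(20 * 6, dtype=float).reshape(20, 6)

# Predict columns 0 and 1 (pressure and temperature in the video's layout).
X3, y3 = make_windows(data, window_size=7, target_cols=[0, 1])
print(X3.shape, y3.shape)
```

Passing a single target column gives the one-output setup, and passing two gives the pressure-plus-temperature setup, with no change to how the inputs are built.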
Info
Channel: Greg Hogg
Views: 108,365
Keywords: LSTM, Time Series, Time Series Forecasting, Time-series forecasting, Time series using deep learning, cnn, 1dcnn, 1d cnn, time series prediction, time-series prediction
Id: kGdbPnMCdOg
Length: 68min 14sec (4094 seconds)
Published: Fri Oct 08 2021