Two Effective Algorithms for Time Series Forecasting

Captions
Hi everyone, thanks for coming. What I'm going to talk about today is two algorithms we actually use in production to do time series forecasting. The first one I'm going to talk about is forecasting with the fast Fourier transformation. It sounds like a mouthful, but what I want to convince you of is that if you've just started in this business and don't know much about it, this is a really good algorithm to start with. In fact, when I started my forecasting project I knew very little about forecasting. Actually, I knew none, and when my boss asked me to do this kind of forecasting I was panicking. I had to find, within two weeks, something simple, something easily understandable, and something easily implemented, and this is what we chose as the first iteration of our forecasting pipeline. So let's dive into it.

The key idea: look at a time series as a single variable. There's time as your variable, and there are the values. The key idea is, how do we decompose that? If we can decompose a time series into something simple, then we can forecast easily. Before I go into details, here is, I promise, the only equation that I have, which is a sine function: y(t) = A sin(2πt/T + φ). Just as a refresher, a sine function has an amplitude A, a phase φ, and a frequency (or period T). That's it. Why does it matter? Because a great person, Fourier, proved this theorem: a reasonably continuous and periodic function can be expressed as a sum of a number of sine functions. That's the key to the algorithm we use.

Let's look at an example. In this time series data, the x-axis is time and the y-axis is a value. Can you predict what's going to happen later? It looks like a pretty irregular time series, right? But is it? If you decompose it into a series of sine functions, it turns out to be regular. This is the first component, the one with the largest amplitude and smallest period; then you can decompose into the second one, the third one, the fourth one. When you combine all of them, just by summation, you get the original time series back. Now, every single sine function is very regular, very periodic, so you can apply forecasting to each one, which is quite trivial, high school math, and when you combine your forecasts you get your forecasting result. That's the idea behind it.

I'll give you the reverse example. I take two sine functions from the previous example, and when I sum them up I get an approximation of the original series. You can see that it is already close to the original time series, just with a much smoother curve. If you apply more and more sine functions, you get a more and more accurate approximation.

With this example in mind, I can claim that FFT is actually simple. The algorithm is: you run an FFT decomposition on your input data, then you filter out low-amplitude or high-frequency components. If a component has very high frequency and very low amplitude, it is most likely noise, because it happens very frequently and irregularly. Once you have this decomposition, you have a bunch of sine functions. You pick the first few that are most significant and apply forecasting to them, which basically means moving their phase forward. Recombine them and you get your result. As simple as that.
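To make the algorithm concrete, here is a minimal sketch in Python using NumPy's FFT. The talk shows no code, so the function and parameter names here (fft_forecast, n_keep) are my own illustration: decompose the series, keep the strongest components, and evaluate their sum at future time steps.

```python
import numpy as np

def fft_forecast(values, n_keep=5, horizon=48):
    """Sketch: keep the n_keep largest-amplitude FFT components and
    evaluate their sum past the end of the observed series."""
    n = len(values)
    coeffs = np.fft.rfft(values)        # decompose into sinusoidal components
    freqs = np.fft.rfftfreq(n)          # frequency of each component (cycles/sample)

    # Filter: keep only the most significant components; everything else
    # is treated as low-amplitude, high-frequency noise and dropped.
    keep = np.argsort(np.abs(coeffs))[-n_keep:]
    amp = np.abs(coeffs) / n
    phase = np.angle(coeffs)

    # Recombine the kept sine waves, evaluated at future time steps too
    # (this "moves the phase forward" past the observed data).
    t = np.arange(n + horizon)
    forecast = np.zeros(n + horizon)
    for k in keep:
        scale = 1.0 if k == 0 else 2.0  # one-sided spectrum (ignoring the Nyquist edge case)
        forecast += scale * amp[k] * np.cos(2 * np.pi * freqs[k] * t + phase[k])
    return forecast[n:]                 # the extrapolated part

# Usage: two weeks of hourly data with daily periodicity plus noise
hours = np.arange(24 * 14)
series = 10 + 5 * np.sin(2 * np.pi * hours / 24) + np.random.normal(0, 0.5, len(hours))
print(fft_forecast(series, n_keep=3, horizon=24))
```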
Here's the final example: the black curve is the original data, and the blue curve is the smoothed and forecast result. You can see that they are quite close to each other, as long as the original data has a certain periodicity.

But what if there's an outage? We live in an imperfect world. When there's an outage, our metric has a huge drop, as in the highlighted box, and all of a sudden our forecast will be off. If you look at these boxes, you can see a big chunk of error around the peaks that follow. But there's a way to fix that automatically. The solution is to iteratively adjust the input based on the output of our algorithm. Let me illustrate. If you look at these two boxes, this is where there is a huge difference, and at the very bottom there is a red line, which is the error. Whenever there is a huge difference between your forecast and the original curve, the error spikes. So what if you subtract this spike, the error, from your original input and then reapply your algorithm? What you get is a result that becomes more and more accurate over time, until you hit a certain threshold, and then you stop. When you combine these two ideas, the decomposition and this iterative approximation, we get our first algorithm in production to predict really simple time series.

So where should we use FFT? One case is when there is periodicity. On the right-hand side, we divide our cities into large areas. Each area has a large enough number of trips, a large enough number of riders and drivers, supply and demand, and so on. When we can aggregate data into a certain quantity, we get the right-hand side, which is a periodic function, and we can apply this algorithm to achieve a certain level of accuracy. And it did a great job.

There are two advantages to this algorithm. One is that it's really simple to implement. The fast Fourier transformation is a standard algorithm that has been studied intensively by many people for many years, so you pretty much get prepackaged libraries for it in any language, and you can just use them. It's also reasonably fast to run. And it's parallelizable: in our case we divide our cities into multiple regions, and when we multiply that by six-hundred-plus cities, we get thousands and thousands of time series to forecast. But each time series is independent of the others, so we can easily run in a distributed, parallel environment and have each core handle only a subset of the time series without interfering with the others.

I also want to emphasize that decomposition is really powerful. I'll give you another example, not with FFT but with a different decomposition called STL. Again, this is the original series, which I took from a reference you can check out later. You can decompose it into the periodic part, the trend, and the noise. Once you have this, you can predict with the trend, and you can predict with the periodic functions, and then you can either decide to ignore the noise or simulate noise for the future. Recombine the results and you get your forecast. That's actually the idea behind the seasonal ARIMA methods. ARIMA stands for autoregressive integrated moving average; it's a very popular traditional time series forecasting algorithm, and it's also based on this essential idea of decomposition.
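As a rough illustration of the STL idea, here is a minimal sketch using statsmodels' STL on a toy hourly series. The linear trend continuation and the "repeat the last seasonal cycle" step are my own assumptions about how to forecast each piece, not the talk's exact method.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Toy hourly series: trend + daily seasonality + noise (assumed shape).
idx = pd.date_range("2018-01-01", periods=24 * 21, freq="H")
y = pd.Series(0.01 * np.arange(len(idx))
              + 3 * np.sin(2 * np.pi * np.arange(len(idx)) / 24)
              + np.random.normal(0, 0.3, len(idx)), index=idx)

res = STL(y, period=24).fit()   # decompose into trend + seasonal + residual

horizon = 24
# Predict the trend: naive linear continuation of the last day's slope.
slope = (res.trend.iloc[-1] - res.trend.iloc[-25]) / 24
trend_fc = res.trend.iloc[-1] + slope * np.arange(1, horizon + 1)
# Predict the seasonal part: a periodic component simply repeats.
seasonal_fc = res.seasonal.iloc[-24:].to_numpy()
# Ignore the noise (or simulate it) and recombine for the forecast.
forecast = trend_fc + seasonal_fc
print(forecast[:5])
```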
But we decided to move on, so there had to be a bottleneck, right? What's the real bottleneck? It's that it's really not easy to incorporate new signals. For example, what if I want to incorporate the weather? What if there is a huge game in the city that will jack up the supply and the demand, this kind of thing? What if I want to capture patterns that are not so purely periodic but still regular, and incorporate those too? In that case we need more powerful methods, which brings us to the next algorithm we tried: forecasting with deep learning.

Now, deep learning is a huge topic, so I'm only going to talk about the intuition. Let's look at an example. Here is some time series data, some kind of time series of the number of queries. It looks regular enough, but how can we deal with it? First of all, even though it looks continuous, by nature all the time series data we collect is not continuous. It looks continuous only because we perform interpolation; if we don't do any interpolation, it's just a number of discretized data points. Which brings us to the key idea: time series are actually sequences.

Why does that matter? Because once we discretize a time series into a sequence, we can apply a very powerful technique called sequence-to-sequence. It was first published back in 2014 by Google to solve machine translation problems, but it turns out the sequence-to-sequence technique is very good at modeling time series forecasting as well.

I'll start with an example. Let's say we discretize time, with the time axis on the bottom, and we have a time series like this. This picture shows the unfolded structure of a recurrent neural network. By unfolded I mean that for every single input, input 1, input 2, and so forth, each one is a data point for a given timestamp, and each one is processed by a neural cell that generates some kind of hidden state. With this we can perform forecasting per input, and the forecast itself can then be an input to the next forecast. So now we can combine some kind of history, some kind of sequence, with this neural network to forecast something.

But we're not done yet. The real power of this approach is that not only can we input the time series data itself, we can also encode a lot of different signals. In this particular case, the time of week: if we have a certain periodicity, most likely the same time of the same week will give similar results, right? So what if we encode this signal into our forecasting pipeline? And the same thing with weather. Weather data contains temperature, humidity, precipitation, wind, weather types. Some of them may be relevant to our forecast and some of them may not, but the key is that we want to enter everything, so that we can train our model to assign weights that signify the most impactful factors in the weather. The beautiful part is that the weather itself is just a vector, which is naturally an input to our neural network; a sketch of such an input vector follows below.
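Here is one hypothetical way to build the input vector for a single timestep: the observed value, a cyclical time-of-week encoding, and a raw weather vector, concatenated. The feature layout is an illustrative assumption of mine, not the actual schema from the talk.

```python
import numpy as np

def build_input_vector(value, hour_of_week, weather):
    """One timestep's input: the observation, a cyclical time-of-week
    encoding (168 hours per week), and a raw weather vector, concatenated
    into one vector that a neural cell can consume."""
    tow = [np.sin(2 * np.pi * hour_of_week / 168),
           np.cos(2 * np.pi * hour_of_week / 168)]
    # weather: e.g. [temperature, humidity, precipitation, wind]; training
    # assigns weights that pick out whichever of these actually matter.
    return np.concatenate([[value], tow, weather])

x_t = build_input_vector(value=42.0, hour_of_week=37,
                         weather=[21.5, 0.63, 0.0, 3.2])
print(x_t.shape)  # (7,)
```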
Then another key piece: what about our recent context? Usually time series data does not consist of independent data points; the data points are dependent on each other, especially along the time axis. That's why, if you read the literature, there is a key concept called autoregression, meaning that the value at the current time may depend on the values at past times, and you can extend the past however long you like. So how can we encode this kind of information into our algorithm? Well, some very smart people have already come up with a really elegant solution. Essentially there are two components, and they're both neural networks. The first piece, as highlighted by the box, is used to encode just the past: we go back M time units, where M can be any unit, for example M months. In our particular case, for our longer-range forecasting, we use hours, so it could be M hours; if the unit is weeks, then it's 168 hours times however many weeks. We take the most recent historical data, use this particular component to train on it and get certain weights, and then use the result as the input to the original neural network that we discussed. When we combine these, we get a much more accurate result. This is the idea called the encoder-decoder architecture.

I have just one minute left, and that's it, so here's the summary. If you take away two things: one, decomposition is a really powerful tool in time series forecasting, so use it; the other, time series forecasting can be modeled as a sequence-to-sequence problem. Thank you. [Applause]
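To ground that second takeaway, here is a minimal encoder-decoder sketch in PyTorch. It's my own illustration, since the talk doesn't specify the actual architecture or framework: a GRU encoder summarizes the past M hours of feature vectors into a hidden state, and a GRU-cell decoder rolls forward one step at a time, feeding each prediction back in as the next input, which is the autoregressive loop described above.

```python
import torch
import torch.nn as nn

class Seq2SeqForecaster(nn.Module):
    """Sketch: the encoder compresses the past M steps into a hidden state;
    the decoder predicts step by step, feeding each output back in."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)
        self.decoder = nn.GRUCell(1, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, past, horizon):
        # past: (batch, M, n_features), the recent context window
        _, h = self.encoder(past)        # encode the past into h
        h = h.squeeze(0)
        y = past[:, -1, :1]              # seed with the last observed value
        outputs = []
        for _ in range(horizon):
            h = self.decoder(y, h)       # decode one step
            y = self.head(h)             # predict the next value
            outputs.append(y)            # prediction feeds the next step
        return torch.cat(outputs, dim=1) # (batch, horizon)

# Usage: 168 hours of context (one week), 7 features per step, 24-hour forecast
model = Seq2SeqForecaster(n_features=7)
past = torch.randn(32, 168, 7)
print(model(past, horizon=24).shape)     # torch.Size([32, 24])
```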
Info
Channel: InfoQ
Views: 273,485
Rating: 4.9122171 out of 5
Keywords: Forecasting, Algorithms, Time Series Forecasting, Uber, Fast Fourier Transformation, Artificial Intelligence, Machine Learning, Neural Network, Deep Learning, QCon, QCon.ai, InfoQ
Id: VYpAodcdFfA
Length: 14min 19sec (859 seconds)
Published: Fri May 11 2018