Time Series Forecasting with Machine Learning

Captions
Hello everyone! Today I wanted to talk about a concept that data scientists tend to tackle in their day jobs: time series analysis. Now, I don't talk too much about my personal or even my professional life outside of YouTube, but now that I've worked as a full-time associate data scientist for about a year, I think I can spice up my content with some experience. This could be useful to a lot of you, since, as I've seen, many of you are not just ML-curious; you also want to get a full-time job in the field, and some of you, at least from the comments that I read, are already established professionals. But regardless, I'm always excited to talk to new people, and everyone is welcome equally.

Looking at resources across the internet that talk about time series problems, they almost always take a traditional approach: eyes closed, throw all the data into some ARIMA model, and some magic happens. Unfortunately, these models can be very difficult to tune if you aren't an expert. But luckily, you really don't need to be an expert at dissecting time series concepts to get usable results from a time series model. In fact, you can take a machine learning approach instead and also get results that are just as good. In this video, we are going to walk through the typical flow to solve time series problems. We're going to see what different approaches we can take to solve such problems, and also highlight the differences between traditional approaches and machine learning approaches, so you know what techniques to use and when to use them.

But before we continue, this video is sponsored partially by Kite. They provide a code completion service for machine learning code. It integrates super well with your editors and even Jupyter notebooks, so click the link in the description to try Kite for free. Now, back to the video.

Let's first define a concrete problem where time series is useful, so that's step one. Think about your grandma. She started this laptop repair line two years ago, and it's a hit. The way it works is: a customer places an order request online, the customer then ships the broken laptop to Grandma, she and her workers fix it, and the laptop is sent back. The problem here is that her workers are paid by the hour. If there are more laptops to repair, Grandma calls in more workers, but as you can imagine, it's hard to know how many workers we need without knowing the number of laptops that we get per day. Now, Grandma hired you as a data scientist, and you think that what could be useful for Grandma to know is: how many laptops are we going to receive tomorrow? This way, we can call in the required number of workers for tomorrow. So now that we have this defined problem, I think it's easier to move forward.

So, step two: what data do we have? For now, let's say that we store every work order in an orders table. When a customer makes an order request online, a row is added to the orders table. This table has information like the order ID, the price, and the timestamp when the order was made. For simplicity, let's say that there are no logging errors, no missing values, and no sparse data.

So, step three: what is the data telling us? Well, from here we do some exploratory data analysis, or EDA, and come up with a few approaches. The natural structure of this problem is a time series problem: from the orders table, we can aggregate the data at the daily level to get a time series of the number of order requests per day. When doing our exploratory analysis, I'd like to understand the bread and butter of the time series, that is, trend and seasonality. So first of all, does this data exhibit seasonality? For this case, let's say that we see some weekly seasonality in the data. Basically, we are hit relatively hard on Mondays, low on Thursdays, and so on, so weekly seasonality exists. Now, question two: do we see a trend in this data over time? We'd probably want to know if the orders have been increasing or decreasing in volume. We also want to explain the trend changes that we see. For example, in the middle of July we see a huge increase in the trend of sales, and this probably happens because we launched a promotional event that month. We see another trend change in October, mostly because we changed our payment strategy.

Now that we have a basic understanding of our data, how do we make predictions? If we think of this as a traditional time series forecasting problem, there are several approaches, like ARIMA, Prophet, NeuralProphet, and vector autoregression. Let's briefly talk about these. To make a prediction with ARIMA, we need to identify the trend and seasonality components and transform our data accordingly. This Kaggle notebook here does a good job of following the step-by-step procedure to use ARIMA, but all of this processing and testing is only good for a basic toy data set like the one used there. It might be a little tough to get these predictions right and tuned for more complex problems, especially if you aren't an expert with time series in general.

As an alternative, we could use Facebook's Prophet model. Prophet handles missing data better, it can take data with seasonality and trends, and it produces stellar results that even rival a tuned ARIMA and its other flavors, like seasonal ARIMA. We can get even better results using NeuralProphet, which increases the forecasting accuracy of the Prophet model by using a neural network. Scrolling down here, I can see some of the main features that are added to this model. But a big disadvantage of Prophet models, at least from what I've seen, is that we cannot add regressors for which we don't have future values.

Okay, so let me explain this with Grandma's laptop company. We want to know how many laptops we will get tomorrow. To make this prediction, we use the number of laptops we got yesterday, two days ago, three days ago, and so on. Aside from that, we can pass in a day-of-week predictor to capture some seasonality, and we can pass this in because we already know what day tomorrow will be, even though it hasn't happened yet. But you know what else could be useful in predicting the number of laptops tomorrow? Something like the number of orders that were placed online over the past few days. Laptops come in about three to four days after the orders are placed online, so order information could come in quite handy to determine the number of laptops you'll receive. However, we cannot add this order information directly to the Prophet model, since we cannot feed it a predictor for which we don't have the future value: I can't tell you the number of orders that will be placed tomorrow, since tomorrow hasn't happened yet. In fact, the number of orders per day forms another time series of its own.

Prophet and ARIMA models fall into the category of univariate models: we forecast, and only deal with, a single time series. But with multivariate time series models, we can deal with, and even forecast, the output of multiple time series. An example of this is vector autoregression (VAR) models. The input could be the past inbound volume time series, the past order time series, and some day-of-week predictors, and the output could be the forecasted inbound volume and the forecasted order volume. The main downside of vector autoregression is that we might need a lot more quality data to come up with reasonable predictions than with the univariate approaches, since we're forecasting multiple series here. But depending on your situation and your data, one or the other may be useful.

Now, all the methods that we've discussed so far are approaches where we treat the problem as a time series problem, but we could very well convert this into a traditional machine learning regression problem. So in Grandma's laptop example, if we're making a prediction today, the features could be something like: How many laptops did we get on this day last week? What was the standard deviation of the inbound volume over the last week? How many orders have been placed but haven't been fulfilled yet? We can also add in day-of-week predictors to account for the weekly seasonality, and almost any other predictor that you can think of. The label is what we want to predict, and in this case it is the inbound volume for tomorrow. We'd want to frame our training data in this way: a set of Xs and ys.

So what kind of models are we talking about here? Let's paint this picture as a hierarchy in the form of a chart. All the models with which we can do time series modeling can be classified as traditional time series models and machine learning models. Time series models can be further classified as univariate and multivariate, depending on the number of series we are forecasting. Univariate time series models predict the output of one time series. They can be ARIMA models; they can be SARIMA, or seasonal ARIMA, models, which are basically ARIMA models that take seasonality into account; but they can also be Prophet models, Facebook's solution to time series forecasting that you can use without being an expert in time series analysis. Then we also have NeuralProphet, which is the neural network version of Prophet. And then we have multivariate time series models, with which we can forecast multiple time series. An example of this would be the vector autoregression models that I talked about. VAR, along with ARIMA and SARIMA, has good implementations in Python's statsmodels library (Prophet and NeuralProphet ship as their own packages), so if you're looking to code this out, check out statsmodels.

Now, on the machine learning front, we can basically use any type of regression model to do the time series job. It could be a neural net regressor, that is, a neural network with one output neuron that determines the label of your regression. We can even use something like CatBoost regressors. This is pretty cool because it allows for better feature engineering, like random forests in scikit-learn. It also gives really cool diagrammatic representations of feature importance, and it's really easy to check if there's overfitting. But unlike, you know, random forests or decision trees, it's often a better model altogether, since it's based on gradient-boosted decision trees. So it's totally worth checking out CatBoost, and honestly, any other regression model could work here too.

Now that we've outlined some examples, let's look at the core differences between traditional time series models and machine learning models for time series forecasting. First, traditional time series forecasting is recursive. In Grandma's company, we thought making predictions for tomorrow would be enough, but it looks like the workers need more of a heads-up than just a single day, so she now wants to know the number of laptops that we will receive three days from now. To determine the inbound volume three days from now, the traditional time series way would be to determine the inbound volume one day from now, use that to determine the two-day-out prediction, and then use that to determine the three-day-out prediction. With the machine learning approach, though, we can forecast this directly, so we would get the three-day-out forecast directly if our model is trained to do so.

Here's the second difference: time series models are easily extendable. Now, in Grandma's warehouse, we also need to determine the long-term space arrangements, but to do this we need to know the inbound volume ten days in advance. Well, that's okay with our traditional time series model, because we just need to keep recursively making predictions until we get the ten-day-out volume: no change in training data, no change in the model. If we want to do that with a machine learning model, though, we need to modify our training data; we need to train the model to predict ten days out too, in addition to the three-days-out model, and this could scale the training data linearly as we have more horizons to predict.

And the third difference: traditional time series approaches can be pretty tough to get right unless you're an expert with time series models, while the machine learning approach is a lot more tractable for people who don't know much about time series forecasting, although I will say the Prophet models are an exception here. Now, a fourth difference: with time series models, at least the univariate ones, we can't add regressors for which we don't know the future values, while with machine learning models we can add these regressors, which allows us to better fine-tune the models.

So clearly each has its advantages and disadvantages, and depending on the problem you're solving, the data that you have, and the hardware capacity, one of these solutions may be preferable to the other. I hope this gives you a better intuition for different approaches to time series forecasting. You don't need to be an expert: I have a year in the field, and I'm just getting my feet wet. Everything in this video is my experience dealing with time-series-esque problems, so let me know in the comments below what your experiences are with time series forecasting; I would love to hear them. I'll see you in the next one. Take care, and till then, bye-bye!
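Step three of the video aggregates the orders table to the daily level and checks for weekly seasonality. A minimal pure-Python sketch of that step follows, assuming hypothetical orders rows of (order_id, price, timestamp); the helper names daily_counts and weekday_profile are illustrative only, and in practice a pandas groupby or resample would usually do the same job.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical rows from the orders table: (order_id, price, timestamp).
orders = [
    (1, 120.0, datetime(2021, 1, 4, 9, 30)),    # Monday
    (2, 80.0, datetime(2021, 1, 4, 14, 5)),     # Monday
    (3, 95.0, datetime(2021, 1, 5, 11, 0)),     # Tuesday
    (4, 60.0, datetime(2021, 1, 7, 16, 45)),    # Thursday
    (5, 150.0, datetime(2021, 1, 11, 10, 15)),  # Monday
    (6, 70.0, datetime(2021, 1, 11, 12, 0)),    # Monday
    (7, 90.0, datetime(2021, 1, 11, 17, 30)),   # Monday
]


def daily_counts(rows):
    """Aggregate order timestamps into a gap-free list of (date, orders that day)."""
    counts = Counter(ts.date() for _, _, ts in rows)
    first, last = min(counts), max(counts)
    days = [first + timedelta(days=d) for d in range((last - first).days + 1)]
    # Days without any orders are kept with an explicit count of 0.
    return [(day, counts[day]) for day in days]


def weekday_profile(series):
    """Mean daily count per weekday (0 = Monday); a large spread hints at weekly seasonality."""
    by_weekday = {}
    for day, count in series:
        by_weekday.setdefault(day.weekday(), []).append(count)
    return {wd: mean(values) for wd, values in sorted(by_weekday.items())}


series = daily_counts(orders)
profile = weekday_profile(series)
```

With real data, a Monday average well above the Thursday average would be the "hit hard on Mondays, low on Thursdays" pattern described in the video.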
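The video says that ARIMA requires identifying the trend and seasonality components and transforming the data accordingly; in ARIMA-style modeling that transformation is usually differencing. The sketch below uses a made-up series with a linear trend plus a weekly pattern (busiest on Mondays, quietest on Thursdays). First differencing removes the trend, and lag-7 seasonal differencing removes both the trend and the weekly cycle, leaving a constant. The difference helper is hypothetical; statsmodels' ARIMA and SARIMAX models apply the same idea through the d order and the seasonal D order.

```python
def difference(series, lag=1):
    """Return [y[t] - y[t - lag]]: lag=1 removes a linear trend, lag=7 removes a weekly cycle."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Made-up daily volumes for four weeks starting on a Monday:
# a linear trend of +2 laptops per day plus a fixed weekly pattern
# (busiest on Mondays, quietest on Thursdays).
weekly_pattern = [30, 22, 18, 5, 15, 10, 8]
volumes = [2 * t + weekly_pattern[t % 7] for t in range(28)]

first_diff = difference(volumes, lag=1)     # trend gone, weekly wiggle remains
seasonal_diff = difference(volumes, lag=7)  # trend and season gone: constant 2 * 7 = 14
```

Real series are noisy, which is why choosing the differencing orders (and the remaining AR and MA terms) is the part that the video describes as hard to get right without time series expertise.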
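The regression framing in the video (features such as "this day last week", last week's standard deviation, orders placed but not yet fulfilled, and day of week; label = inbound volume some days ahead) can be sketched as a function that turns daily series into (X, y) rows. This is a minimal sketch under stated assumptions: make_training_rows and the feature names are hypothetical, and "open orders" is approximated by the orders placed over the last few days, because the video only says that laptops arrive three to four days after ordering.

```python
from datetime import date, timedelta
from statistics import pstdev


def make_training_rows(start, inbound, orders, horizon=1, open_window=4):
    """Turn daily series into (features, label) rows for any regression model.

    start:       calendar date of index 0 in both series
    inbound:     laptops received per day
    orders:      online orders placed per day (same length as inbound)
    horizon:     how many days ahead the label lies (1 = tomorrow)
    open_window: days of recent orders treated as "placed but not yet received"
                 (a hypothetical proxy for unfulfilled orders)
    """
    if len(orders) != len(inbound):
        raise ValueError("inbound and orders must cover the same days")
    if not 1 <= open_window <= 8:
        raise ValueError("open_window must be between 1 and 8 days")
    X, y = [], []
    # t is "today": start at 7 so a full week of history exists, and stop
    # early enough that the label day t + horizon is still inside the data.
    for t in range(7, len(inbound) - horizon):
        target_day = start + timedelta(days=t + horizon)
        X.append({
            "inbound_7_days_ago": inbound[t - 7],                     # "this day last week"
            "inbound_std_last_week": pstdev(inbound[t - 6 : t + 1]),  # last 7 days incl. today
            "open_orders": sum(orders[t - open_window + 1 : t + 1]),  # recent, likely unreceived
            "target_weekday": target_day.weekday(),                   # known in advance (0 = Monday)
        })
        y.append(inbound[t + horizon])  # label: inbound volume `horizon` days ahead
    return X, y


# Usage with made-up data: 20 days starting Monday 2021-01-04, forecasting 3 days out.
X, y = make_training_rows(date(2021, 1, 4), list(range(20)), [3] * 20, horizon=3)
```

Every feature uses only data up to day t, so nothing from the future leaks into X, and the order-based feature is exactly the kind of regressor the video says a univariate model cannot take. The dicts could then be vectorized (for example with scikit-learn's DictVectorizer) and passed to any regressor, such as a random forest or CatBoost.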
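To make the recursive-versus-direct distinction from the video concrete, here is a toy sketch in which a single-lag linear model fitted by ordinary least squares stands in for both kinds of model. That is an assumption for illustration only; real ARIMA, Prophet, or CatBoost models are much richer. The recursive forecaster trains once and feeds its own predictions back in, while the direct forecaster trains a separate model for each horizon, which is why training work grows with the number of horizons. All function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # constant input: fall back to predicting the mean
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def recursive_forecast(series, horizon):
    """Traditional style: one 1-step-ahead model, fed its own predictions `horizon` times."""
    a, b = fit_line(series[:-1], series[1:])  # trained on pairs (y[t], y[t + 1])
    value = series[-1]
    for _ in range(horizon):
        value = a + b * value  # the previous prediction becomes the next input
    return value


def direct_forecast(series, horizon):
    """ML style: a model trained specifically on pairs (y[t], y[t + horizon])."""
    a, b = fit_line(series[:-horizon], series[horizon:])
    return a + b * series[-1]


def direct_forecasts(series, horizons):
    """One separately trained direct model per horizon, e.g. 3 days and 10 days out."""
    return {h: direct_forecast(series, h) for h in horizons}


# Usage on a made-up, perfectly linear history: 10, 12, ..., 68.
history = [10 + 2 * t for t in range(30)]
three_day_recursive = recursive_forecast(history, 3)
three_day_direct = direct_forecast(history, 3)
```

On this noise-free linear history both strategies give 74 for three days out. On real, noisy data they generally disagree: recursive forecasts compound their one-step errors, while direct forecasts need one trained model per horizon.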
Info
Channel: CodeEmporium
Views: 20,705
Rating: 4.8907437 out of 5
Keywords: Machine Learning, Deep Learning, Data Science, Artificial Intelligence, Neural Network
Id: _ZQ-lQrK9Rg
Length: 13min 52sec (832 seconds)
Published: Tue Jan 19 2021