Python 🐍 Autoregressive Forecasting 📈 Model | Step-By-Step tutorial ‼️ | Simplified code 🤩 #python

Video Statistics and Information

Captions
Hello everyone, it's Alex. Today we'll look at a real-life operational use case: using Python to create a simple forecast of sales quantities for 13 weeks beyond the existing 52-week horizon. For this we will use a univariate Excel data set containing weekly entries representing quantities sold. We will apply simple preprocessing, exploratory data analysis, detection of skewness and stationarity, as well as model training, evaluation, and finally the Python autoregressive forecasting method. As always, please expect to find the link to the source code on my GitHub page in the description below. Now, without further ado, let us jump straight into the code.

As always, we begin by importing the necessary libraries. We will need the following: pandas for handling data in general, matplotlib to create visualizations, and NumPy for mathematical operations. Then, from statsmodels, we will import ar_model to enable autoregressive modeling in general. From sklearn.metrics we will import the mean absolute error, mean squared error, and R-squared score, all of which are commonly used in regression analysis to assess the performance of a machine learning model. Next, we will also need the adfuller function from statsmodels.tsa.stattools for the ADF test itself. Then we'll require the skew function from scipy.stats for computing skewness, and finally the tabulate function from the tabulate library for tabular formatting.

The code begins by reading the specified file path to an Excel file containing sales quantities data and loads it into a pandas DataFrame called sales_data. While the Excel date format is suitable for Excel's internal calculations and formatting, it is not directly compatible with pandas' full spectrum of datetime functionality. Therefore, converting the Excel-native date values to the pandas-specific datetime format ensures consistent data types, extends the range and resolution of dates, and enables the use of pandas' powerful datetime operations and functions for time series analysis.

In this step
of the code, we set the frequency of the DataFrame index to weekly, in our specific example starting on Mondays. By setting the index to a weekly frequency, we establish a consistent time interval for our data. This ensures that all observations in the data set align with the same regular time intervals, allowing for consistent time-based calculations and operations. Weekly frequency is a very common level of temporal aggregation for many business and economic time series data sources. It allows us to capture weekly patterns, trends, and seasonality in data when analyzing things like sales, customer behavior, inventory, or any other time-dependent variables that might exhibit weekly patterns and/or fluctuations.

In this section of the code, we create a histogram plot of the sales column using the matplotlib library. The resulting plot shows the distribution of sales data values, helping to visually assess the skewness and symmetry of the data distribution prior to any further analysis.

To further validate our initial assessment of the skewness of the sales quantity values, we move on to calculating the skewness of the sales column using the skew function from the scipy.stats module. The code checks the computed skewness value and prints a corresponding message regarding the degree of symmetry and/or skewness of the data, complementing our initial visual assessment.

This section of the code performs the augmented Dickey-Fuller test for stationarity on the sales column using the adfuller function from the statsmodels.tsa.stattools module. The ADF test helps determine whether a time series is stationary or not. Our code then prints the result of the ADF test, including the ADF statistic, p-value, number of observations, and critical values. The results are displayed in a formatted table using the tabulate function from the tabulate module. Next, the code interprets the ADF result by comparing the ADF statistic with the critical value at the 5% significance level. Based on the result of the
comparison, a corresponding message is then printed regarding the stationarity or non-stationarity, as the case might be, of the data.

These lines of the code split the sales data into train and test sets to prepare for model training and evaluation. The first 40 weeks of data are assigned to the train_sales variable, while the remaining data, from week 41 onwards, are assigned to the test_sales variable.

This part of the code iterates over a range of lag values and fits an autoregressive model using the AutoReg class from the statsmodels.tsa.ar_model module. The loop calculates prediction performance metrics, such as mean absolute error, mean squared error, root mean squared error, and R-squared, for each lag value and selects the lag value that minimizes a combined score. The best lag value found is then stored in the variable called best_lags. The code then prints the auto-detected optimal lags parameter for the given data set. This value represents the number of lag terms used as predictors in the AR model.

In this step, the AR model is fitted using the optimal lags value determined in the previous step. The model is fit to the train data using the fit method. Predictions are made on the test data by calling the predict method on the fitted model. Next, metrics that measure the accuracy and goodness of fit of the model's predictions compared to the actual test data are computed to evaluate the model's performance. These are then printed in tabulated form using the tabulate function.

In this section, the fitted AR model is used to forecast sales for a future period beyond the existing data horizon of 52 weeks. The predict method is called on the fitted model, specifying the start and end indices of the forecast period.

The final section of the code creates a figure with two subplots to visualize the actual sales quantities, the train and test data sets, and the forecasted sales quantities. The first of the subplots displays the existing sales data, train and test data
sets, and predicted sales quantities against the test data to provide a visual breakdown of the elements of the model. The second subplot, however, shows the existing sales data along with the forecasted sales quantities only. In an everyday scenario, plot 2 is the one that would be shared with stakeholders and/or used in a presentation. As always, we use various formatting options such as axis labels, titles, legend, and gridlines to enhance the clarity and interpretability of the plots.

Overall, our code performed data preprocessing, exploratory data analysis, stationarity testing, model training, evaluation, and finally forecasting. It provided insights into the data's distribution, skewness, stationarity, and model performance, and finally a visualization of the sales data and forecasted values.

In this subset of the code, we create a DataFrame called forecasted_13_weeks, which will contain the sales values forecasted by our AR model and captured in the forecast_sales array. The forecast_sales array is rounded up to the nearest whole number using the np.ceil function and is assigned to the sales column of the DataFrame. Next, the for loop in list comprehension format adds a date column to the forecasted_13_weeks DataFrame, labeling each row with a week number from 1 up to the length of the forecasted sales values, where "F.W" is short for forecasted week.

In the subsequent step, the pd.concat function concatenates the sales_data DataFrame and the selected columns, date and sales, from the forecasted_13_weeks DataFrame along the vertical axis. This operation appends the forecasted values to the end of the source sales_data DataFrame. Finally, the to_excel function saves the updated sales_data DataFrame to the Excel file specified by source. The index=False argument ensures that the index column is not saved to the file.

Finally, having successfully implemented the last subset of the code, let us check the
content of the source file for the presence of the existing 52 weeks' worth of sales quantity values together with the appended 13 weeks of forecasted sales values at the end. It would have worked.
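The date conversion and weekly-frequency steps described in the captions could look roughly like the sketch below. It assumes the workbook stores dates as Excel serial numbers and that the columns are named Date and Sales; neither detail is confirmed in the video, and the small in-memory frame stands in for the real file.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet; the real code would use pd.read_excel.
sales_data = pd.DataFrame({
    "Date": [44928 + 7 * week for week in range(6)],  # Excel serial numbers (assumed)
    "Sales": [120, 135, 128, 150, 142, 160],          # made-up quantities
})

# In the Excel 1900 date system, serial numbers count days from 1899-12-30.
sales_data["Date"] = pd.to_datetime(sales_data["Date"], unit="D", origin="1899-12-30")

# Index by date and declare a weekly frequency anchored on Mondays.
sales_data = sales_data.set_index("Date").asfreq("W-MON")

print(sales_data.index.freqstr)
print(sales_data.head())
```

If any week were missing from the sheet, asfreq would insert a row of NaN for it, which makes gaps visible before modeling.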
Info
Channel: Alex.J "JAX"
Views: 1,564
Keywords: #python, #forecasting, #tutorial, #analysis, #data, #analyst, #sales, #dataanalysis, #dataanalyst
Id: U1RmZKLJeLo
Length: 19min 0sec (1140 seconds)
Published: Wed Jun 14 2023