Mod-02 Lec-03 Forecasting -- Linear Models, Regression, Holt's, seasonality

Video Statistics and Information

Captions

We continue our discussion on forecasting models. In the last lecture, we developed forecasting models for this data. We assumed that this data represented a constant model, and we looked at the simple average, weighted average, moving averages and simple exponential smoothing. We tested all of them on this data; we assumed that the data represents a constant, and we were trying to find out that constant. The fact that each of these values does not equal the constant comes from our belief that there is an inherent variation or noise, something which cannot be assigned to a specific cause or reason. Therefore, we assumed that this represents a constant and we tried to fit a model that gives us that constant. Today, we will look at a different type of data and we will move to this data. If we look at this data and ask anyone to forecast it, we will end up with a forecast which is greater than 35, because as we move along these numbers, we realise that there is an increasing trend. We use the term trend to describe a certain behaviour, and in this case there is an increase. We will not, and should not, fit a constant model for this data. For example, if we took the simple average, the total of these 6 numbers happens to be 181 and the simple average will be 30.17. A simple average is always within the range of values and therefore will be somewhere here, while a meaningful forecast would be a number which is more than 35. So, we do not use the earlier models, like simple average, moving average and simple exponential smoothing, directly to give the forecast for this. We have to build different forecasting models that are capable of capturing the trend which this data is exhibiting.
So, let us first look at a very simple model that captures trend, and then we will look at a more advanced model that can capture trend. Before we move to any model, let us try and plot these values with respect to time. Let us call these time equal to 1, 2, 3, 4, 5 and 6. We plot time on the x axis and the data on the y axis, so here we have time and here we have D, which is the demand or sale for this particular product. On the y axis we mark the values starting from 24: 24 26 28 30 32 34 36 and so on. At t equal to 1, the value is 26, which is here; at t equal to 2, the value is 28; at t equal to 3, the value is 29; at t equal to 4, the value is 31; at t equal to 5, the value is 32; and at t equal to 6, the value is 35, which is somewhere here. So, this is how the 6 points are plotted, and one look at these 6 points conveys to us that there is an increasing trend; if we try to draw a line, the line will go like this. When we try to draw a line, we ask ourselves a couple of questions, because obviously all these 6 points are not collinear. Are we going to draw a line which passes through the maximum number of points, or are we going to draw a line which best represents these 6 points by being as close to each of them as possible? The answer is that we do the second: we try to fit a line which is as close to these 6 points as possible. Let me first draw the line free hand and then explain a few things; let us assume that this line is going to be like this.
Now, if we draw this line, then what we are going to do is now this point is represented by this point on the line, say this point is represented by this point, this is represented by this point, this is represented by this point, this is represented by this point and this is represented by this point. Now right now though, I have drawn this line, I do not know the equation corresponding to this line, because I have made a free hand sketch. So, I would like to now find the equation corresponding to this line, one of the ways to do that is by, if we consider a point like this point is going to be represented by this point on the line. And therefore, there is going to be some kind of a gap between the actual point and the point, which is there on this line and let us say, we call this gap as the error, we call this as the error. Now, if we want this line to be as close to these 6 points as possible, then we will have to draw the line in such a manner, that the errors are as small as possible. The moment we define that the errors have to be as small as possible, we also have to observe that in some situations, the error can be positive while in some other situation, the error can be negative in the other direction. So, we obviously, do not want these positives and negatives to cancel out each other, therefore a better way to look at it is to say that, we want to draw this line, in such a manner that the squares of these errors are minimized or essentially, we try to minimize the what is called the error sum of squares. Try to minimize the error sum of squares, which is essentially, we will try to minimize e square, where e is the error corresponding to each point. So, there will be 6 points here, there will be 6 error terms, now we want the sum of the squares of these 6 errors to be minimized or if this line is going to have coordinates a plus b t, where a and b are to be determined, then the error term will be Y minus a minus b t is the error term. 
Error term specific to a particular point say the t-th point can be written as e t is equal to Y t minus a minus b t, now sum of squares of these errors will become sigma e t square is equal to sigma Y t minus a minus b t square. Now, we want to minimize the error sum of square, so we minimize error sum of squares, which gives us minimizing Y t minus a minus b t the whole square, now we want to find out the a and b, which represents b represents the slope and a represents the Y ordinate or the value on the Y axis for t equal to 0. So, a and b have to be found out, such that this error sum of squares is minimized, now let us try and derive a very simple expression to find out a and b. So, we want to minimize Y minus a minus b t square here, I have represented it as Y t minus a minus b t here, I am writing it as Y minus a minus b t, we could also write it as Y t minus a minus b t. Now, this has to be minimized and a and b have to be found out, so the condition for minimization is the first derivative equal to 0 with respect to the variables. So, we try to partially differentiate this with respect to a and then with respect to b and then try and get the values. So, partially differentiating with respect to a we would get 2 times Y minus a minus b t with a minus sign summation, so summation equal to 0. And partially differentiating with respect to b we would get sigma 2 times Y minus a minus b t into minus t is equal to 0, observe that when you differentiate with respect to b, so it becomes 2 times Y minus a minus b t and then differentiating with respect to b. So, we get a minus t, so the minus is written here and the t is written here, now we expand this summation and we therefore, write 2 equations the minus becomes irrelevant here, because the right hand side is 0. 
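The minimisation described above can be written out compactly. This is only a restatement of the spoken derivation in symbols; the name $S(a, b)$ for the error sum of squares is an assumption introduced here for convenience, and $n$ is the number of data points.

```latex
\begin{align*}
S(a, b) &= \sum_{t=1}^{n} e_t^2 = \sum_{t=1}^{n} \left( Y_t - a - b\,t \right)^2 \\
\frac{\partial S}{\partial a} &= -2 \sum_{t=1}^{n} \left( Y_t - a - b\,t \right) = 0 \\
\frac{\partial S}{\partial b} &= -2 \sum_{t=1}^{n} t \left( Y_t - a - b\,t \right) = 0
\end{align*}
```

Dropping the factor $-2$ (the right-hand side is 0) and expanding each summation leaves two linear equations in the two unknowns $a$ and $b$.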
So, we get the first equation: sigma Y is equal to n a plus b sigma t. The minus and the 2 both go, sigma Y is taken to the other side, sigma a is n times a, where n is the number of data points, and b is a constant that can be taken outside the summation, which gives b sigma t. The second equation will be sigma of Y into t is equal to a sigma t plus b sigma t square. Again the minus and the 2 go; multiplying through by t gives sigma of Y into t on one side, a sigma t on the other, and b sigma t square, because this t is multiplied by the other t. So, we have 2 equations and 2 unknowns, a and b, and we can solve them directly. It is also possible to simplify these to write explicit expressions for a and b, but right now we are not going to do that; we are going to keep these equations as they are and solve them. Now consider this data: 26 28 29 31 32 35. This is your Y t, or Y as you may call it, so these are Y 1, Y 2, Y 3, Y 4, Y 5 and Y 6. We need sigma Y, which is the sum of all of them, which is 181. For the term b sigma t, t is 1 2 3 4 5 6, so sigma t is 21. Then we need sigma of Y into t: 26 into 1 is 26, 28 into 2 is 56, 29 into 3 is 87, 31 into 4 is 124, 32 into 5 is 160 and 35 into 6 is 210, and the sum is 663. Now we need sigma t square. We write t square as 1 4 9 16 25 36, and this total is 91. So now, these 2 simultaneous equations become 181 is equal to 6 a plus 21 b, and 663 is equal to 21 a plus 91 b.
So, we have 2 equations with 2 unknowns and we need to solve them for a and b. The easiest thing to do is to multiply the first equation by 7 and the second by 2, so that we can cancel a and get the value of b, and then substitute to get the value of a. When we solve this, we get a equal to 24.2667 and b equal to 1.6857. This means that the line intersects the Y axis at a value of 24.2667, and this line of best fit has a slope of 1.6857. But we are interested in finding a forecast for the 7th period, which is what led us to draw a line and find its equation. For the 7th period, the forecast that we make is the point corresponding to t equal to 7 on this line, and that is given by a plus 7 b. So, F 7, the forecast for the 7th period, is 24.2667 plus 7 times 1.6857, which is 36.066. The model that we have just seen is the linear regression model, which gives us the line of best fit for this set of points. The forecast for the 8th period can be given by F 8 is equal to a plus 8 b, and so on. This is a very intuitive and nice method to forecast when the data shows a certain amount of trend. Now let us go back and look at this method again: what are the assumptions behind linear regression for time series data with trend? First, we have considered all the data points. Second, we have given equal weightage to all the data points; even though we squared the errors and tried to minimize them, we did not associate a weight with each of them.
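The regression arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not the lecturer's own procedure: the function name `fit_line` and the closed-form elimination used to solve the two equations are choices made here for illustration, and the time index is assumed to be t = 1, 2, ..., n as in the lecture.

```python
# Demand data from the lecture, observed at t = 1..6.
demand = [26, 28, 29, 31, 32, 35]


def fit_line(y):
    """Return (a, b) minimising sum((y_t - a - b*t)^2) for t = 1..n.

    Hypothetical helper: solves the two normal equations by elimination.
    """
    n = len(y)
    t = range(1, n + 1)
    sum_t = sum(t)                                  # sigma t
    sum_y = sum(y)                                  # sigma Y
    sum_ty = sum(ti * yi for ti, yi in zip(t, y))   # sigma of Y into t
    sum_tt = sum(ti * ti for ti in t)               # sigma t square
    # Normal equations:
    #   sum_y  = n * a     + b * sum_t
    #   sum_ty = a * sum_t + b * sum_tt
    b = (n * sum_ty - sum_t * sum_y) / (n * sum_tt - sum_t ** 2)
    a = (sum_y - b * sum_t) / n
    return a, b


a, b = fit_line(demand)
forecast_7 = a + 7 * b   # point on the fitted line at t = 7
print(f"a = {a:.4f}, b = {b:.4f}, F7 = {forecast_7:.4f}")
```

With the lecture's data this reproduces a of about 24.2667 and b of about 1.6857; the forecast comes out near 36.067, which the lecture quotes truncated as 36.066.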
For example, we did not minimize w t into Y minus a minus b t the whole square, which means that we have given equal weightage to all the data points. Third, we have tried to minimize the error sum of squares. Now, let us go back a little and see whether we used the same assumptions in any of the models for the constant data that we have here. Suppose we make the same 3 assumptions on the constant data: we fit a model which uses all the data points, which gives the same weight to all the data points, and which minimizes the error sum of squares. The only difference is that, because this data is constant, we simply fit Y equal to C, whereas because the other data shows trend, we fit Y equal to a plus b t. If we fit Y equal to C, the error sum of squares will be sigma Y minus C the whole square, and we want to find C such that this is minimized. The answer for C will be simply sigma Y by n, which is the simple arithmetic mean. So, if we fit a model Y equal to C by minimizing the error sum of squares for constant data, considering all the data points with equal weightage, we get the simple arithmetic mean. With the same assumptions, considering trend and fitting a plus b t, we get the linear regression model. When we were looking at the constant data, we also said that even though it is a good idea to use all the data points, we would rather give a little more weightage to the recent points and a little less weightage to the older points, and that thinking led us towards moving averages and particularly towards exponential smoothing, which we saw in the previous lecture.
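The claim that the constant model Y equal to C leads to the arithmetic mean follows from one differentiation. A short worked version, using the same symbols as the lecture, with $n$ the number of data points:

```latex
\begin{align*}
\frac{d}{dC} \sum_{t=1}^{n} \left( Y_t - C \right)^2
  &= -2 \sum_{t=1}^{n} \left( Y_t - C \right) = 0 \\
\Rightarrow \quad \sum_{t=1}^{n} Y_t &= n\,C
  \quad \Rightarrow \quad C = \frac{1}{n} \sum_{t=1}^{n} Y_t
\end{align*}
```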
So, if we are convinced that exponential smoothing is a very good model for forecasting that type of data, then the obvious question is: can we have models based on exponential smoothing for this data with trend? We know that the simple exponential smoothing that was used there was very effective, but simple exponential smoothing would only give us a forecast which is within the range, just as a simple average gives us a forecast which is within the range. So, simple exponential smoothing will obviously not work here; it will give a value closer to the mean, which means it will lag behind the last value. We need a model which uses ideas from exponential smoothing and at the same time gives different weights to different data points, with more weightage to the recent points and less weightage to the old points. We will now discuss the Holt's model, which does exactly that: it considers all the data points, it uses ideas from exponential smoothing, and it gives progressively more weight to recent points. In the Holt's model, we fit F t plus 1 is equal to a t plus b t. Here F t plus 1 is the forecast for period t plus 1; a t is called the level, which represents the smoothed value up to and including the last data point; and b t is the slope of the line that we are fitting at the point t. So, a t is the representative value of the level or the constant, and we add a slope to it to get the forecast for the next period: the level at the end of the present period plus the slope, a t plus b t. Both a t and b t are updated for every data point using exponential smoothing. The level a t, up to and including the last point, is given by alpha into D t, plus 1 minus alpha into the sum of a t minus 1 and b t minus 1, where a t minus 1 and b t minus 1 are the level and the slope at period t minus 1.
Actually, this is our very familiar exponential smoothing equation. The normal exponential smoothing equation has 2 parts, one is the demand part and the other is the forecast part: alpha D t plus 1 minus alpha F t. Alpha is the weight and D t is the demand during the last period. I had used F t in the last lecture, but I had also mentioned that there are 2 versions of the notation. Now we will use the version alpha D t plus 1 minus alpha F t minus 1, where F t minus 1 is the forecast made at the end of period t minus 1 for period t. So, in some sense D t and F t minus 1 are comparable; F t minus 1 is the estimate of D t. Just as the forecast for period t plus 1 is represented by a t plus b t, the forecast for period t is represented by a t minus 1 plus b t minus 1. So, the level equation is the same exponential smoothing equation, alpha D t plus 1 minus alpha into the forecast made for period t, where that forecast is a t minus 1 plus b t minus 1. Now, b t, which is the slope, also has 2 components: one is the component that comes out of the level, and the other is the component that comes out of the previous slope. Beta is another exponential smoothing constant like alpha. The first component is beta into a t minus a t minus 1: if you take the present level value a t and the previous level value a t minus 1, the difference between them in some sense represents a slope. The other component is 1 minus beta into b t minus 1, the slope that was computed at the previous point. So, b t has 2 components that are related through exponential smoothing, and a t also has 2 components that are related through exponential smoothing.
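Written in symbols, the three relations of the Holt's model described so far are collected below. The subscripted notation is a transcription of the spoken equations, with $D_t$ the demand in period $t$, $a_t$ the level, $b_t$ the slope, and $\alpha$, $\beta$ the two smoothing constants.

```latex
\begin{align*}
F_{t+1} &= a_t + b_t \\
a_t &= \alpha\, D_t + (1 - \alpha)\left( a_{t-1} + b_{t-1} \right) \\
b_t &= \beta \left( a_t - a_{t-1} \right) + (1 - \beta)\, b_{t-1}
\end{align*}
```

Since $a_{t-1} + b_{t-1}$ is the forecast made for period $t$, the level update has the same form as simple exponential smoothing: a weight $\alpha$ on the new demand and $1 - \alpha$ on the previous forecast.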
Now, let me explain this further. When we fitted Y equal to a plus b t through linear regression, these were the 6 points; the white dots are the points, and we tried to fit one line that is as close to all 6 points as possible. In the Holt's model, we do not do that. For example, we are here at the 6th point and we are interested in forecasting the 7th point. This white dot is your D 6, the demand at the 6th point, which is data that we know. The level value at t equal to 6 is this point, which represents it. The level is obtained through exponential smoothing, considering all the values up to and including D 6 in a certain manner, so we have a level, which is here. Once we know this level, we also need the slope at this point. In the linear regression model, after we fit the line, the slope is the same at every point on the line; in the Holt's model, we redefine the slope at every point up to and including the last point. So, we will do some calculations here to find the slope, and a 6 plus b 6 will give us F 7. How do we find a 6 and b 6? Both have 2 components. If we look at a t, it is alpha D t plus 1 minus alpha into a t minus 1 plus b t minus 1. So, a t has 2 components: one that comes out of this demand, and the other, a t minus 1 plus b t minus 1, which is the previous level plus the slope there. The two are related through an exponential smoothing equation. Similarly, the slope at this point also has 2 components: one is the slope at t minus 1, and the other is essentially the slope between the previous level and this level, which comes from a t minus a t minus 1.
So, in some sense, in the Holt's model, for every point t we find a level and a slope and add them, so that we get the forecast for the next period. Then the next demand point becomes known, and we keep iterating this till we get to the final answer. Let us explain the computations in the Holt's model for the same set of data: 26 28 29 31 32 35. We have already seen that we get a forecast of 36.066 when we fit the regression line; F 7 became 36.066. Let us see what happens when we fit using the Holt's model. Like in all exponential smoothing, we need some initial values and we also need to fix the values of the smoothing constants. Note that alpha is the exponential smoothing constant when we compute a t, and beta is the exponential smoothing constant when we compute the slope. We have assumed alpha equal to 0.2 and beta equal to 0.3 in our calculations. We also need some initial values. The very first value, D 1, is 26. The white point represents D 1 and the other colour point represents a 1. We assume that the level value for the first point is the same as the demand, so a 1 is 26; that is one of the assumptions. We also need to initialise a slope here. Since we know these 6 points, we take the initial slope to be the slope of the line joining the first point and the last point. The initial value of the slope is therefore 35 minus 26, which is 9, divided by 5, which is 1.8. So, the initial values of a 1 and b 1 are 26 and 1.8, and from our equation for the Holt's model, F 2 is equal to a 1 plus b 1, which is 27.8.
Now, we need to find a 2. By our previous equation, a t is alpha D t plus 1 minus alpha into a t minus 1 plus b t minus 1, so a 2 will be alpha D 2 plus 1 minus alpha into a 1 plus b 1. Alpha is 0.2 and D 2 is 28, and a 1 plus b 1 is what we calculated as F 2, so a 2 is 0.2 into 28 plus 0.8 into 27.8, which becomes 27.84. So, at this point the white one is 28, which is the demand point, and the level is 27.84 when we apply the Holt's model. The slope at the first point was assumed to be 1.8; the slope at this point is now calculated using exponential smoothing. It is beta into a 2 minus a 1, plus 1 minus beta into b 1. Beta is 0.3, a 2 minus a 1 is the difference between the computed 27.84 and 26, which is 1.84, and 1 minus beta is 0.7, multiplied by the previous slope, which is 1.8. So b 2 becomes 1.812, and F 3 is a 2 plus b 2, which is 29.652. Now, D 3 was 29 and F 3 is 29.652. To find F 4, we need a 3 plus b 3, and a 3 and b 3 have to be calculated using these. We can progressively calculate the rest of the numbers, and this table shows them: for period 4, D 4 is 31, a 4 is 31.2352, b 4 is 1.755, and the forecast F 4 is 31.294. If we move on, we finally find that F 7 is 36.32, so let me write this 36.32 here; this is from the Holt's model. In the Holt's model, finally a 6 is 34.592 and b 6 is 1.7272. So, the representative value here was 34.592 while the actual value is 35; the slope started at 1.8 and has come down to 1.7272; and the final forecast is 36.32.
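The iteration above is easy to reproduce in code. The following is a minimal sketch under the lecture's stated assumptions: alpha = 0.2, beta = 0.3, the first level set equal to the first demand, and the initial slope taken from the line joining the first and last points. The function name `holt_forecasts` and the row layout are hypothetical choices made for this illustration.

```python
def holt_forecasts(demand, alpha, beta):
    """Run Holt's level/slope updates over the data.

    Returns a list of rows (t, a_t, b_t, F_{t+1}), one per period.
    Initialisation follows the lecture: a_1 = D_1 and
    b_1 = (D_n - D_1) / (n - 1).
    """
    n = len(demand)
    level = demand[0]
    slope = (demand[-1] - demand[0]) / (n - 1)
    rows = [(1, level, slope, level + slope)]
    for t in range(2, n + 1):
        prev_forecast = level + slope          # F_t = a_{t-1} + b_{t-1}
        new_level = alpha * demand[t - 1] + (1 - alpha) * prev_forecast
        slope = beta * (new_level - level) + (1 - beta) * slope
        level = new_level
        rows.append((t, level, slope, level + slope))
    return rows


demand = [26, 28, 29, 31, 32, 35]
table = holt_forecasts(demand, alpha=0.2, beta=0.3)
for t, a_t, b_t, f_next in table:
    print(f"t={t}  a={a_t:.4f}  b={b_t:.4f}  F{t + 1}={f_next:.4f}")
```

The second row gives a 2 = 27.84, b 2 = 1.812 and F 3 = 29.652, and the last row gives a forecast for period 7 of about 36.32. Later rows can differ from the lecture's table in the third or fourth decimal place, presumably because the table was computed with intermediate rounding.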
Now, we observe that the forecast given by the Holt's model is slightly higher than the forecast that we got using linear regression. It is not absolutely necessary that the Holt's forecast will be higher than the regression forecast every time, but for data like this it is very likely. One of the reasons is that, if we look at this data, the value keeps increasing, but in the last few numbers the increase is slightly higher than what it was before. When we fitted the line, we gave equal importance to all these points, and therefore the smaller earlier increases pulled the forecast back a little bit and we got 36.066. In the Holt's model, because we are giving higher weightage to recent points, the 35, which shows a larger increase, tends to pull the forecast slightly towards itself, and therefore the value became slightly more than this. One of the advantages of Holt's method is that it captures the idea of exponential smoothing, which means we are giving more weight to the more recent points while doing the forecast. In some sense, we are calculating the level, the trend and the forecast at every point through an iterative process that involves exponential smoothing, unlike linear regression, where we compute the line in one shot by giving equal importance to all the points. So, in some sense, what simple exponential smoothing did to the constant data, Holt's method tries to do to this type of data. This is how we normally forecast when the data exhibits trend. We will now see one more aspect of forecasting. Let us assume that we are looking at this kind of data, where the year is divided into 4 quarters and the data is given for each quarter for 3 years. If we take year 1, the quarterly data are 53 22 37 and 45.
Now, when we start looking at this table, we observe that there seems to be a trend in this direction for all of the quarters, and the total also seems to go up. But the quarters within a year do not show a constant kind of data; there is a variation among them, and this is called seasonality. Each quarter represents a season, and the data is showing a certain seasonality in this case, so we need forecasting models that capture seasonality. Seasonality is important because several products exhibit it. Common examples are that the sale of ice creams is higher in summer than in winter, the sale of air conditioners is higher during summer than during winter, in some places the sale of heaters is higher in winter than in summer, the sale of sweaters is higher in winter than in summer, and so on. So, products exhibit a certain seasonality, and the next question is whether we can capture that seasonality and then make the forecast. We will first build a simple seasonality model, and then we will explain a seasonality model that uses ideas from exponential smoothing. The simple seasonality model is like this. First, let us find the total demand for each of the 3 years; this is 157, 173 and 189, so these are the total demands in the previous years. Now we associate a seasonality index with every season. The seasonality index associated with the first quarter is the proportion of the total demand met by the first quarter. For the first year, the indices will be 53 divided by 157, 22 divided by 157, 37 divided by 157 and 45 divided by 157. Again, the seasonality indices in the second year will be 58 divided by 173, 25 divided by 173 and so on. So, we compute the seasonality indices for the 3 years and for the 4 quarters: 53 divided by 157 is 0.34, 22 divided by 157 is 0.14, and so on.
So, the seasonality indices are calculated this way. Within each year, the indices have to add up to 1, and from these seasonality indices we can find the average index for each quarter. The averages of these seasonality indices are given here: 0.337, 0.14, 0.233 and 0.293; these are averages across the years, this way. The sum of the averages also has to be equal to 1, but sometimes there could be some rounding-off errors in the averages, and therefore the sum may be slightly less than 1 or slightly more than 1. Many times it will add up to exactly 1; if it does not, we can normalise so that the sum of the indices is 1. So now we have captured these indices, and based on this average over 3 years, we can say that on average quarter 1 accounts for 33.7 percent of the total demand, quarter 2 accounts for 14 percent, quarter 3 accounts for 23.3 percent and quarter 4 accounts for 29.3 percent. Now, the totals are 157, 173 and 189, and the totals show an increasing, linear trend, so we can fit a model to forecast the total demand for the 4th year. For this particular example there are only 3 data points, so it will be difficult for us to fit an exponential smoothing based model or a Holt's model; we would rather go for linear regression and fit a model of the type Y is equal to a plus b t. When we fit this model, we get Y is equal to 141 plus 16 t, and the forecast of the demand for the 4th year, at t equal to 4, becomes 205. Since quarter 1 accounts for 33.7 percent of 205, quarter 2 accounts for 14 percent of 205, and so on, we simply multiply by the proportions to get the 4 forecasted values, which are 69.08, 28.7, 47.77 and 60.06 for the 4 seasons. So, 62 moves to a forecast of 69.08, 27 moves to 28.7, and so on.
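The last two steps, forecasting the yearly total and splitting it by quarter, can be sketched in Python as follows. The yearly totals and the four average indices are taken as quoted in the lecture (the full quarterly table is not reproduced in the transcript, so the averaging step itself is not recomputed here). The names `fit_line`, `yearly_totals` and `avg_index` are hypothetical, chosen for this illustration.

```python
def fit_line(y):
    """Least-squares fit of y_t = a + b*t for t = 1..n; returns (a, b)."""
    n = len(y)
    t = range(1, n + 1)
    sum_t, sum_y = sum(t), sum(y)
    sum_ty = sum(ti * yi for ti, yi in zip(t, y))
    sum_tt = sum(ti * ti for ti in t)
    b = (n * sum_ty - sum_t * sum_y) / (n * sum_tt - sum_t ** 2)
    a = (sum_y - b * sum_t) / n
    return a, b


# Values quoted in the lecture.
yearly_totals = [157, 173, 189]             # total demand, years 1..3
avg_index = [0.337, 0.14, 0.233, 0.293]     # average seasonality index, Q1..Q4

a, b = fit_line(yearly_totals)
total_year4 = a + 4 * b                     # forecast of the year-4 total

# Split the forecast total across quarters using the average indices.
quarter_forecasts = [idx * total_year4 for idx in avg_index]

print(f"trend line: Y = {a:.1f} + {b:.1f} t, year-4 total = {total_year4:.1f}")
for q, f in enumerate(quarter_forecasts, start=1):
    print(f"Q{q}: {f:.3f}")
```

The fitted line is Y = 141 + 16 t, giving a year-4 total of 205 and quarterly forecasts close to the 69.08, 28.7, 47.77 and 60.06 quoted in the lecture. The quoted indices sum to 1.003; dividing each by their sum before multiplying would be one way to apply the normalisation the lecture mentions.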
So, this is a very rudimentary algorithm for seasonal forecasting, where we de-seasonalize the data: we compute the seasonality indices, average the seasonality indices, separately take the total demands, forecast the total demand, and multiply by the seasonality indices to get the new forecasts for all the 4 seasons. There are a couple of questions about this rudimentary method, and we will see them today before we close this lecture. The first question that is commonly asked is: why can I not take only these 3 values for one quarter, independently fit Y equal to a plus b t, and get the forecast? Mathematically, it is possible and doable; if you do that, you will even get a value which is very close to 69.08. Normally that is not done, because it means 2 things. It means you are treating this quarter as independent of the other 3 quarters, whereas when you fit a seasonality model you are capturing the dependency amongst the quarters. It is also the equivalent of saying that there are 12 data points and I am picking up data points number 1, 5 and 9 to forecast the 13th data point, which is not normally done. So, we do not normally do that; we de-seasonalize the data and then bring the seasonality back. The second question that is often asked is: what if the index itself shows a trend, for example going from 29 to 29 and then 30? Should I average this, or should I fit another trend model right here, assuming that there is an increasing trend? The normal answer is that you average it, unless you really have several years of data to show that in a particular quarter the index itself is showing a trend. If the index itself is showing a trend and you are convinced, you can fit a trend model there. That also means that if one index is showing an increasing trend, then somewhere else, in some other quarter, the index will show a decreasing trend, since the indices add up to 1. But ordinarily, the average is taken.
The next model, which is called Winters' model, tries to capture the level, the trend and the seasonality index using exponential smoothing; we will see it in the next lecture.

Info

Channel: nptelhrd

Views: 80,156

Keywords: Forecasting -- Linear Models, Regression, Holt's, seasonality

Id: e1yUVLKhcko

Length: 53min 40sec (3220 seconds)

Published: Mon Jun 25 2012