Time Series Talk : Autocorrelation and Partial Autocorrelation

Captions
In this video we're gonna be talking about the autocorrelation function (ACF) and the partial autocorrelation function (PACF) and, most importantly, the big differences between them. Honestly, when I first started learning time series in economics, this was a really big challenge for me: understanding the intuition behind ACF versus PACF, understanding real-world examples where they both arise, and understanding how to derive them both mathematically. So I'm hoping to make some of those challenges a little bit easier for you guys.

To start off, we're gonna use a toy example instead of going into a bunch of math theory in the beginning. What we're trying to do is predict the average monthly price of salmon, maybe in our city. So here's salmon, and we want to predict what the average monthly price is gonna be this month, compared to last month or the month before and all the months prior.

Here's a bit of notation: S sub t is the average price of salmon this month. S sub t minus 1 is the average price of salmon last month; notice it's just minus 1, so it's the month prior. And S sub t minus 2 is the average price of salmon two months prior. Of course we can keep going: we could do S sub t minus 3, minus 4, and however far we want, but for the purposes of this video we're gonna stick to just these three.

Now, a big concept in time series, maybe one of the most important concepts, is that the measurement of some value at a time period depends on the measurement of that value at the previous time period, at the time period before that one, and on and on into the past. That makes a lot of sense, right? There are a lot of things that could affect the price of salmon, such as the weather, maybe fishing regulations, and stuff like that, but arguably the most intuitive determiner of the price of salmon this month is just: hey, what was the price of salmon last month? If it was high last month, maybe we
expect it to be high this month. If it was high one year ago, maybe we expect it to be high again this year, for example. So that's the idea behind time series, one of the big concepts in time series, and we'll get into that a lot more in future videos. But for our purposes here we just need to fill in some blanks in this kind of diagram.

Here we have three boxes: the price of salmon this month, the price of salmon last month, and the price of salmon two months ago. Now let's draw a couple of very intuitive arrows that tell us what's gonna be correlated with what, what might cause what. To make it even more concrete, let's say two months ago was January, last month was February, and we're currently in March. The price of salmon in January is definitely gonna have some kind of effect on the price of salmon in February, so we denote that by this arrow leading from January to February. Similarly, the price of salmon in February will have an effect on the price of salmon in March.

Now there's one more arrow we can draw here, and it might seem weird to draw it at first, but it does make sense. The price of salmon in January has an effect on the price of salmon in March through February, right? Because we have an arrow going to February and an arrow going on to March, so there is some indirect effect of the price of salmon in January on the price of salmon in March. But there's also possibly a direct effect, where we skip over February altogether and just say there's some kind of mechanism where the price of salmon in January directly affects the price of salmon in March or, to make it more abstract for a second, where the price of salmon two months prior affects the price of salmon today.

Why might that happen? To give a real-world example, so it's not just abstract: maybe there's some big food festival that happens in your city every two months. So that food festival happens in
January, March, May, and so on, every other month, right? And of course during that food festival the price of salmon might change, because maybe the city wants to make more money off of the big festivities and stuff like that. So the price of salmon in January might directly affect the price of salmon in March, because the food festival happened in both of those months. Okay, that's one concrete example; there are several others you can think of.

Now let's get into the actual meat of this video. First, how do we calculate the autocorrelation function? I've written "corr", as in correlate, but this is the same thing as ACF, the autocorrelation function. I want to find the autocorrelation between the price of salmon in January and the price of salmon in March, so that is S sub t minus 2 and S sub t. How would I find that? Well, I can find it really easily mathematically by lining up all the prices with the prices from two months earlier and finding the correlation. Here we're talking about the regular Pearson correlation you might have learned in high school or college. So, for example, going further in time, I could take the price in January and March, then February and April, then March and May, and so on, and I would find the correlation between all these different data points, treating this as my X variable and this as my Y variable. I think you guys know how to find the correlation between two data sets just like that.

But at a more theoretical level, and this is going to help us understand PACF a lot better (let me switch over here to a different color), this correlation between January and March, or more abstractly between the price of salmon two months ago and the price of salmon in the current month, is going to be made up of two pieces. We can see that very easily graphically in these boxes we've drawn here, because with the arrows leading from two months ago to the
current month, there are two ways to get there. I can get there directly, so one effect comes from going from S sub t minus 2 directly to S sub t; that's the direct arrow. And of course there's the indirect route, which is S sub t minus 2 going to S sub t minus 1 going to S sub t. So hopefully you guys can see that: the direct route goes from two months ago to the current month, and the indirect route goes from two months ago to last month to the current month. Both of these together form the ACF, the autocorrelation between the price of salmon two months ago and the current month.

Now how does that contrast with PACF, the partial autocorrelation? You might already see where this is going. Let me switch sheets here. For PACF we only care about the direct effect; we don't care about the effect as it comes through other time periods. So we only care about the effect of S sub t minus 2 going to S sub t.

Why might we only care about that? Why would we sometimes care about ACF and sometimes care about PACF? Well, ACF tells you the correlation between the price of salmon some number of periods ago and the price of salmon today, but there are different components of that: the direct component and the indirect component. We might only care about the direct component because we want to see whether the price of salmon two periods ago, so two months ago, is a good predictor of the price of salmon today. Based on ACF it might seem like a good predictor if that correlation, remember that's the Pearson correlation, is really high. But that correlation might be high only because of the indirect effects. It might be the case that the direct effect has little to no correlation and will barely help us at all with predicting the price of salmon today. That's why PACF is very, very important: PACF tells us, taking all those indirect effects away, just getting rid of them, what is
the direct effect of the price of salmon some number of periods ago on the price of salmon today? So that's what PACF is: PACF is the direct effect, and ACF includes the direct effect and all the indirect effects through the intermediary time periods.

Now the last thing we'll do in this video is: how would I find PACF? It's pretty easy to find ACF: you literally just do a Pearson correlation, lining up your data set, the first column of which is two months ago, or however many months ago, and the second column of which is today. PACF seems a little bit more challenging, right? So here's a way to find PACF: you write a regression model. Let's say we're trying to find the PACF at 2, so we're at k equals 2, k being our lag; you can substitute whatever k value you want. Here we write a regression where the price of salmon today, S sub t, is equal to some coefficient, phi sub 2,1, times the price of salmon last month, plus some other coefficient, phi sub 2,2, times the price of salmon two months ago, and of course we have our error term.

Now this coefficient right here, phi sub 2,2, is going to give us the direct effect of the price of salmon two months ago on the price of salmon today. Why is it the direct effect? Why is there no more confounding going on with this intermediary S sub t minus 1? Because we already took that into account in our model: we have a term here which already captures that effect. Therefore phi sub 2,2 gives us the direct effect of the price of salmon two months ago on the price of salmon today, and it is exactly this phi sub 2,2 which is the PACF for k equals 2. If I want to find the PACF for k equals 3, I need to build a new model where I include another term with S sub t minus 3, and the coefficient of that term in the regression is going to be my PACF for k equals 3, and so on.

Okay, so the last thing I want to do is draw a plot of PACF. We'll be looking at more of
these plots in the future as we do more time series videos. Let's say we find the PACF for k equals 1, 2, 3, 4, 5, 6, 7, on and on; these are our lags. And let's say this is the plot we get. Of course PACF can be negative, right? If the price of salmon two months ago negatively impacts the price of salmon today, then it should be negative. These red bars I've drawn here are error bands; you'll see this a lot going forward. Basically you can think of it right now as: anything within the error bands, from 0 going out to the error bands, is no different from 0. We don't have any evidence to say that it's actually different from 0. So think statistical significance. We see that lag 1 has a nonzero PACF, lag 2 has a nonzero PACF, and so do 3, 4, and 5. But for 6 and 7 there's not really any correlation between the price of salmon six months ago or seven months ago, and I imagine all further lags, and the price of salmon today.

So what could a good model look like here? Remember what PACF tells us: PACF tells us the coefficient of the price of salmon that many months ago on the price of salmon today, and if that coefficient is different from zero, as indicated by it being outside these red error bands, then it's a good factor to put into a model because it can help us make that prediction. So for example, here the model might look like: price of salmon today is equal to, and I've switched to betas here, beta naught, plus beta 1 times S sub t minus 1, so the price of salmon one month ago, plus beta 2 times the price of salmon two months ago, and then we keep going for 3, 4, and 5 months ago. I won't write them all out; the last one will be beta sub 5 times S sub t minus 5, plus of course our error term. So a good model here might look like a coefficient plus all these other coefficients, each times the price of salmon from one month ago, two months ago, all the way to five months ago
because that's what the PACF plot tells us. So the PACF plot is super powerful in helping us identify a good time series model to predict the price of salmon today based on the price of salmon in some number of past periods. Okay, so that is a PACF plot.

Of course you might be wondering why we didn't draw an ACF plot. That is also useful, for a different type of model; we'll get there in the future. Just as a kind of teaser, this type of model we've drawn here, where you predict something based on past values of that thing, is called an AR, or autoregressive, model: "regressive" because it's a regression, "auto" because it's based on values of itself in the past.

Okay, so I hope that was a good clarification for you all on the fundamental difference between autocorrelation and partial autocorrelation, and also how to find the autocorrelation through the regular Pearson method and how to find the partial autocorrelation by fitting your regression and reading off the coefficient of that term. Okay, so until next time.
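The two recipes described in the captions (Pearson correlation of lined-up prices for ACF, and the last regression coefficient for PACF) can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not code from the video: the function names `acf_at_lag` and `pacf_at_lag` are made up here, the "prices" are simulated rather than real salmon data, and the regression includes an intercept, which the spoken model does not mention. The simulated series has only a direct one-month effect (coefficient 0.7), so any lag-2 correlation arises purely through the indirect route.

```python
import numpy as np

def acf_at_lag(series, k):
    """Pearson correlation between the series and itself k periods earlier (k >= 1)."""
    s = np.asarray(series, dtype=float)
    # First column: values k periods ago; second column: current values.
    return np.corrcoef(s[:-k], s[k:])[0, 1]

def pacf_at_lag(series, k):
    """Coefficient on the lag-k term when regressing s_t on s_{t-1}, ..., s_{t-k}."""
    s = np.asarray(series, dtype=float)
    n = len(s)
    y = s[k:]  # s_t for t = k, ..., n-1
    # Column j holds s_{t-j}, aligned row by row with y.
    lags = [s[k - j : n - j] for j in range(1, k + 1)]
    # Assumption: an intercept column is added so the series need not be centered.
    X = np.column_stack([np.ones(n - k)] + lags)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[-1]  # the coefficient on s_{t-k}

# Hypothetical data: each value depends directly only on the previous one.
rng = np.random.default_rng(0)
n = 5000
prices = np.zeros(n)
for t in range(1, n):
    prices[t] = 0.7 * prices[t - 1] + rng.normal()

print("ACF  at lag 2:", acf_at_lag(prices, 2))   # clearly nonzero (indirect route)
print("PACF at lag 2:", pacf_at_lag(prices, 2))  # close to zero (no direct effect)
```

On this simulated series the lag-2 ACF comes out well away from zero while the lag-2 PACF sits near zero, which is the situation the captions warn about: a high correlation that exists only because of the indirect effect through last month. Library routines for ACF typically use the full-sample mean and variance rather than a plain Pearson correlation of the two columns, so their values can differ slightly from this sketch.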
Info
Channel: ritvikmath
Views: 166,529
Rating: 4.9632387 out of 5
Keywords: machine learning, time series, data science
Id: DeORzP0go5I
Length: 13min 16sec (796 seconds)
Published: Wed Apr 10 2019