Talks S2E7 (Konrad Banachewicz): Time Series Analysis - Vintage Toolkit For Modern Times

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
[Music] hello everyone and welcome to this brand new episode of talks and today i'm very excited because uh today conrad is talking about time three analysis and i don't know much about time series so hopefully there's a lot for me to learn today and conrad is also a very good friend uh we we used to catal together nowadays we don't find much time to candle together right so uh you're working in adavanta and what's this company about previously you were in ebay now you have moved to a new company yeah well it's more like the company hi first of all glorious day to you sir thanks for having me thank you i uh uh i work at other india it's not so much that i moved to a new company as the company moved around me because the part of ebay i was in we changed hands it was it was sold to adevinta so it was the classifiers group well used to be ebay now it's now it's in terms of what i do um i'm a lead data scientist in the central data science team uh which in practice means well being the guy with enough seniority that when i say this is not gonna work people actually listen uh well doing enough to prove then i can just you know not just talk the talk but also walk the walk i think americans have this kind of saying uh what do we do well it's an e-commerce company so primarily we are concerned with the fact that when you come to our website you click what we hope you will click and best way to achieve it is well it's like free objectives that you have to reconcile if people sell through our platform they want to sell as expensive as possible if people buy for our platform they want to buy as cheaply as possible and we are sitting in the middle and we would like to make some money while we're doing it so those objectives don't exactly go very well together and balancing them is well that's that that's the part where data science comes in in terms of more specific stuff uh recommenders ranking we started doubling a little bit into reinforcement learning lately oh let me think what else well obvious run-of-the-mill stuff like fraud detection but user detection uh chat moderation what about time series i am working my way into this one i'm working my way into this one because you know the funny thing about time series uh well i've been doing time series like way back in university god helped me close to a close to quarter of a century ago and then i was under the impression that nobody really cared about time series anymore so i was like yeah well it was a i kid you not because nobody did like it peaked when i was in finance with financial models and i don't think i've actually used it at work for the last i don't know five years easily and then slowly getting to another and i was just like actually those people wanna you know wanna talk about time series and those people are interested and it slowly started coming back er undoubtedly it was partly because you know deep learning and lstm being the revolution that it was but not everybody has enough data to have deep learning not infrastructure for that matter i mean okay these days it's a little more commoditized because if nothing else works you can just fire up collab in your browser but you and i both know that was not really the case like five six years ago true well back in the day at least like financial analysis financial math and whatnot r ruled the field and if there's one thing where i still to this day think that r dominates over python it's it's time series functionality okay okay well i which by the way that's doesn't contradict the fact that i uh am giving code examples in python in my presentation because you know i'm not gonna fight the global tide that's just the way things are going this is what everybody is using in terms of data related modeling sure and uh you have also made a series of notebooks on kaggle on time series so i will be sharing the link of all all the all the notebooks in that series as a pinned comment to this video so if the audience is interested and it's a very good series so please go through it go through each and every kernel and there's a lot to learn conrad has invested a lot of time in those notebooks so um a fifth one but unfortunately events of you know this thing called life that's happening when you're not cuddling uh prevented me from from finishing the fifth one uh but i hope to have it soonish okay so it's an ongoing oh very much so very much so there's i think four out and i have ideas for at least another six okay that's very nice cool cool so yeah looking forward to the presentation so the screen is all yours now i think we are already in sharing mode uh and [Music] full screen so i'm fairly certain what you should be seeing right now yes excellent then we're ready to begin well thanks everyone for joining me on abhishek and myself on this lovely saturday afternoon where i'm sitting what can we do about time series i mean brief overview first uh general first of all general comment time series is an absolutely huge field so the overview i'm giving you in this presentation it doesn't begin to scratch the surface this is like a really really high level to point you to some useful stuff that you can already achieve with fairly elementary techniques although as my probability lecture at uni used to say elementary ladies and gentlemen is not the same as trivial so the four main topics we will touch in this presentation is well the basic groundwork so we have a couple of definitions that we can work with or simple tools forecasting so let's be honest that's like nine out of ten cases what people want to know about when they hear the phrase time series i'm gonna dig a little deeper to have a bit of an introduction to what you can do in situations where you want to find out what's actually going on and not merely have a good prediction of the of next observation and then depending on how it goes time wise we'll go a little bit into okay for lack of anomaly detection although you may have encountered the term outlier detection or some such so what can we say about time series first of all it's been around for a very very long time if you go through well earlier stuff around it the two guys i believe they were at stanford julen walker in 1920s they started building models which later ended up as the first massively used time series models there's gonna be a lot of definitions you might encounter uh you open a proper math book someone might be starting it by saying that yeah a time series is a specific realization of a stochastic process defined on the probability space blah blah in reality in practice unless you're working in the math department everything you measure over time is a time series period anything beyond that that's just uh well algebraic bookkeeping what can we do if we have a set of measurements of a certain phenomenon over time well we might want to interpret it to find out what's actually going on and that's something that we will be touching upon in the in the digging deeper section we can try to filter smooth or smooth depending on what you use our observations to remove the noise the good a good example to think of is say you are observing a trajectory of a car from the outside and you want to know which what kind of movements the driver was making i mean yes there obviously is some correspondence if the current car went left and the driver probably turned left it's not a 1-1 there's some sort of noise which prevents you from seeing the real signal and that's what filtering and smoothing is about uh forecasting this is what everybody is looking in like i mentioned earlier what nine out of ten cases what people think about when they hear the phrase time series and finally simulation there are certain applications where you can't really estimate anything because you just don't have enough data so you need a good reliable model from which you can simulate stuff i think risk calculations in a bank uh points to address already at this stage uh i've been known to use the phrase deep learn everything in a somewhat sarcastic manner that doesn't mean by any stretch of the imagination that i bash criticize or what or whatever deep learning i think the planning is awesome i think it's one of the greatest breakthroughs in the history of machine learning it allows you to solve a huge amount of problems however what it isn't it it isn't cheap you need a lot of data to get deep learning to work and it's that problem is particularly acute if you're dealing with time series because no matter what your domain where you are dealing with text or images there's always a chance you can find out some more unlabeled data in terms of time series say your company started gathering monthly data i don't know 10 years ago that means 120 observations on the series of interest that's not exactly a lot to write home about to wrap it up it's useful to have deep learning to understand deep learning and it is indeed one of the things i intend to address in the later parts of the of my of my notebook series but it is quite important also to have uh tools that you can use where you have you know 300 observations and that's it uh every domain that you're dealing with has a text has a pretty much a reference data set that you use to demonstrate things like you want to explain a regression to someone chances are the boston housing data set figured somewhere you want to explain classification uh well when i was a student it was the iris data set these days it's probably cipher or or something like this in time series one of those classic data sets that everyone use is number of passengers flying out of in and out of australia over the yeah 20 years of the post-war period this is what you see here uh the reason this is such a nice data set is that it exhibits a bunch of characteristics in a fairly regular manner you see the level is increasing which means there is some sort of trend there is a repetitive pattern occurring which means there's probably some kind of seasonality and there's not a lot of noise what can we do about it and this is where we get our first encounter with one of the primary tools to be used in time series analysis and namely the composition so you can you can you might encounter it under name structural decomposition or something like this the basic idea is any process or time series you can decompose it into a trend some sort of long-term progression say you look at the number of people flying say by airplanes well there's more people over time most countries grow in size chances are under even if it's the same proportion there's physically more people going to the airport to fly uh st the seasonal component uh well think sales sales of ice cream is gonna be heavily seasonal sales of uh christmas articles it's coming to be quite seasonal as well uh those kinds of things the important bit to to remember the uh period over which the uh the seasonality occurs is fixed it's deterministic and it's not a priori occasionally people especially if you're talking about uh economic time series also add a cycle component which is pretty much seasonal component but where the period is undetermined like it i mean it's only known approximately like for instance in economics you can have the would you call it the condrative cycles which in which can be like 11 years but they can be 13 but occasionally 10. so ballpark we are there but it's not fixed i decided to skip it from the overall exposition because it just confuses things but be prepared that you might encounter it and so you have trend you have seasonality and you have pretty much everything else people might call it sometimes irregular component residual component model error everything else that doesn't fit does it that is not captured by trend and seasonality we are saying okay that's noise or at least that's what we hope for so what can we what happens when we apply this decomposition to our uh passengers data series well there's a clear growing trend very monotone close to linear this is a repetitive seasonal seasonal pattern that's all right and then we have the residuals sorry uh and this is where we are less happy because if you look conrad we do we do have a question coming in sorry to interrupt you yes so the question is how is trend and seasonality different can you an example or two of course uh trend well seasonality is everything where you know that this is or you have reasonable reason to believe is in bodies and i'm so sorry english is not my first language you have reason to believe that this will the phenomenon will repeat itself over the same interval like uh yeah well sales of heavily seasonal articles in the year those kinds of things uh ice cream uh i don't know bathing suits in the summer assuming well when whenever summer happens in you wherever you live uh christmas articles this is something say christmas articles they will always peak like throughout december maybe some people carry it okay over the moon into january and then it's going to flood like the rest of the year or in the context of uh the seasonality here in flight can we map this to months i think there's a solid chance that the big peak is probably something around christmas that's when a lot of our holiday when a lot of people travel and the smaller ones that also repeat are situations where people are flying throughout the year i don't know some minor holidays that are not so global in nature so that would be seasonality trend is pretty much everything that you can reasonably approximate as deterministic say you look at number of uh users of an uh of a certain website or a platform uh chances are if we know that in a given country 80 percent of people use the internet and the population of the country is growing at 5 annually taking a ridiculous number obviously then the number of users will also keep growing at that rate and it's pretty deterministic i mean it's not ideally deterministic of course but it's something that we can kinda capture that way i hope that answers the question if not feel free to think back shall we continue or is there another one yes yes let's continue there is another question but i i think we should take it towards the end because um it's about how to approach a problem so again i i promise to take that in at the end sure okay uh the the relevant bit here if you look at the lowest panel residuals they are quite they the magnitude of the jumps is quite high at the third at the early part of the sample and towards the late part of the sample it's almost flat if you go from like 60 to 100 observations which means very variance of your observations changes over time that's not noise one of the crucial things about noise means the variance is constant over time which means in this instance the additive decomposition fails to capture something important about the process oh biggie we can reformulate it into multiplicative decomposition which means effectively you can think of it as slapping logarithm on everything and then decomposing because well if it's multiplicative then it's additive in logarithmic space lo and behold well i'm not that surprised because i chose this example to demonstrate my point the behavior of the process itself hasn't changed that much neither has the trend but residuals are much better behaved far less variation i'm sorry to interrupt again but would you mind clicking on the hide button that says yes of course it's uh it's on the graph okay better yeah much better thank you very much of course yeah i was sorry i didn't realize this was showing uh as well uh okay wrapping up the groundwork part more more fun with the basics uh two important uh concepts that are kind of outside the scope of this one is autocorrelation stationarity stationarity translated to plain english is pretty much saying uh yes stuff is random but the randomness tomorrow is similar to the randomness today and it or and if it if it's not then how does it change autocorrelation is an approximation to the idea in order to be able to predict something the process must have some kind of memory which means there must be some relationship between what happened today and what happened yesterday because if we're trying to build daily forecasts and every day is completely independent from everything else yeah we can't predict anything because there's just no there's just no in no past information for us to exploit for more details part one of the of the notebook series okay so we'd like to do some forecasting enter exponential smoothing uh kind of like if you're looking at statistical models the method that you start with is linear regression then in time series forecasting that's going to be exponential smoothing this is like the uh i don't know swiss army swiss army pocket knife or something it just works pretty much everywhere it's been introduced million years ago well 80 really but you see my point uh it's the vintage method to refer to the basic idea each prediction is a combination of past observations and those weights the further you go back into the past the faster they did they shrink i they are exponential decay in the weights uh the brilliant part of this of this minimal requirements minimal to set up minimal to estimate meanwhile to implement in practice like if you are really in that kind of position in life you can you can and i have done it uh if you have to not by choice trust me you can implement exponentials moving in excel and it actually works uh exponential smoothing models had a kind of interesting cycle life cycle because they were their great thing for the 50s and 60s then they kind of fell into absolution because you know better computers became available and people started being able to estimate better time series models where we didn't have to worry about having minimal memory footprint but then they came back with a vengeance because if you are well ecommerce website like a de vinta to use a good example and you need to generate predictions quickly you don't have time to make an expensive call to well unless you are google facebook or nsa unless you have that kind of infrastructure you don't have the time to make an expensive call to a big deep learning model and then reevaluate it you need something that can be done quickly ideally within the database or glorified database like elasticsearch and this is where exponential smoothing models come back in because the forecast that we'll see in a sec is a combination of two things that are either available at execution or can be pre-computed which means you can bake it into a sql query with zero concern for full-blown engineering data science infrastructure behind it etc and that that's what enabled their comeback especially in contexts like uh algorithmic trading uh the actual formula that i was referring to assuming uh by the way disclaimer throughout everything i'm saying xt denotes the time series that we are actually interested in [Music] xd is our time series st is our smooth version using single exponential smoothing the aka brown method and the idea is if you look at the formula here says alpha is a constant between zero and one alpha times the current value of the time series is one minus alpha times the previous smooth version or you can think of it as current smooth observation is the previous smooth observation plus some portion of the error from the previous step namely how different our smooth prediction was from the actual observation the reason this is the reason this is important this allows you to balance essentially how much you care about recency versus smoothing higher alpha you're more focused on the recent observations lower alpha you focus more on the on the smooth pass this will become more clear from a graph or two in a sec yeah simple way the average of the past in the present important bit here this allows you to produce an automatic forecast arbitrarily far into the future just edge steps into the future if you keep iterating formula free on itself recursively then once you run out of observations then your most recent smooth version that's your flat observation going forward uh the recency versus moving if we have a very small alpha as i mentioned earlier well as you can see blue is the original series red is the smooth one pretty much gets rid of most of the variation follows kind of the general behavior and then from a certain point onward it flat lines if we increase the alpha it follows this series a bit more closely so there is less focus on this previous smooth version and more on the more recent observations and if we push the alpha to 0.9 then this then this practically not entirely because it's still 0.9 and not not one replicates the behavior of the series but if you notice there's a lag this is because this observation depends the new smooth one depends on the previous smooth one and rescaling and that's why we don't replicate it immediately replicated with a lag what can we do about predicting something with a uh single exponential smoothing by the way i'd like to say i apologize from the bottom of my heart uh for the way this slide looks but i'm totally out of shape when it comes to magnum and uh i was like yeah this is gonna work i just did screenshots and then i looked at it and i realized i don't remember how to set up a code block anymore so i decided so i apologize for that promise to do better next time we'll keep working with the passengers data set how can we do it in stats models stats models a moment of introduction stats models is pretty much the goal to package when it comes to classic statistical methods well up until three moments three months ago or so anyway when cass was introduced spoiler alert and i'll mention that later which on the one hand introduced incorporates lstm on the other it also has exponential some exponential smoothing uh but until that point stats bundles was pretty much their place the place to go which uh kind of explains why the syntax is not exactly the most obvious at times at least it wasn't for me i remember reading documentation and going like good gracious 1990s called they want this there they won their style guide back so what we got here we read the data the usual we plot the series we instantiate an object with this data set as an argument we call fit with usual parameters prediction number of steps ahead plot tada this is what it looks like uh two things to observe one as same as earlier there is the lag going on b from a certain point onwards if it flat lines uh it's nice that it replicates the behavior a little even if it's progressively the higher the peaks go the more it's missing them but let's be honest a completely flat forecast going forward that's not super useful so what can we do about it well it worked for the level of the process the exponential's moving so mr holt was like well let's see if we can just move the trend and then see what happens low and behold that's precisely what it did uh for double exponential smoothing it's the halt method you don't have a single equation you have two one for the series itself and a separate one just for the trend the smooth level and smooth trend exactly still there because i had some noise on the line for a moment so there is a question related to uh smoothing exponential smoothing so uh house one i'm sorry this one or the previous one oh actually it was asked a little bit uh before okay so i'll just jump back sure the question is how safe is exponential smoothing or window shifting methods for irregular time series or the ones with s signal where noise might be high so safe by safe the person means foolproof okay there's nothing foolproof there is nothing foolproof in this world no i kid you not i kid you not every i'm not trying to be cute i i really mean it uh the reason this is not complete this is not full proof is the following you have a bunch of methods in statistics machine learning you name it where if you do something wrong things are going to blow up because your objective function will just reach to infinity you'll get segmentation fault and whatnot uh exponential smoothing is not one of those methods you can always fit some sort of constant to to to mimic it as closely as possible so that's regarded me so in that sense if you know that when you do something wrong uh well the whole thing is gonna crash it is pretty foolproof because when things crash in front of you you know something went wrong uh on the other hand uh if if they don't crash they're not guaranteed to crash they it's not foolproof and in terms of irregular series do you mean there are missing observations in between or that the noise is high yeah so so uh the ones with less signal and a high noise noise might be high they will get you somewhere i mean everything all this everything where every single situation when you are dealing with high proportion of noise is about getting rid of that noise in a manner that you know you get rid of as much as you want to but not more than that so if you clean up the signal a little bit and this is a form of cleaning up so yeah i think it should be fine like i said with the cave ad that have a look for to start well things like is your data set big enough like it's not exactly proper math but the kind of rule of thumb is that i mean keep in mind exponential smoothing it's it's a recursive method which as usual is the case with recursive methods it's a it needs a little bit of time to catch up which means your series cannot be too and the kind of rule of thumb is that for a given uh for a given alpha you need three over alpha observations for this thing to stabilize uh yeah i know it's it's a bunch of heuristics but let's be honest that's the best we can do in a situation like this i hope that answers the question yeah i think it does so let's let's carry on okay uh forecast uh eight steps going forward is the most recent smooth observation plus h times the most recent version of the smooth trend what does it look i'm so sorry what's it look like in practice uh well this is pretty this is pretty much copied from the previous block so i'm not going to discuss this one an awful lot and then we have the double exponentials moving part so hold method similar syntax although why did they call one simple exponentials moving and the other hold and not double as beyond me [Music] and this is what you get this is the picture which you already saw and you're familiar with the forecast that just flat lines from a certain point onward and then this is what happens when we apply double exponentials moving so good news we are not undershooting anymore with respect to the variation uh bad news number one we are overshooting more and more substantially even more importantly if you care at least a little about reality what this method does it extrapolates the most recent no sorry yeah extrapolates for example the most recent trend down in indefinitely which means pretty sure we're gonna get negative number of passengers not good not good so what do we do if the trick worked once why not try to make it work twice we go from double to triple exponentials moving aka hold winter's method if memory serves winters was holds phd student or something like this and they worked on this one together uh well the first one is kind of similar as before a smooth version of the series corrected by the seasonal component and here comes the trend component then there's the equation for the trend which is unchanged and the new new one compared to double is this one equation number nine for the seasonal component seasonal capital l that's the period that's the seasonality that we are expecting in the data nine out of ten cases if you see this in practical applications this stuff is going to refer to monthly data since we're talking annual seasonality which means l is 12. uh occasionally if quarterly then l is 4 but that's about it this is not a this is not at times this is not an exponential smoothing consideration per se but more a general remark about any model where you are trying to fit seasonality please please pretty please in the name of all that's good and beautiful in this world if you want to estimate seasonality of pattern say l make sure you have at least two full cycles in your data because otherwise bad things are just lurking waiting for you to have working waiting to happen to you uh forecast from this model going forward most recent uh smooth version of the level plus extrapolation of of the trend of the most recent trend plus the appropriate seasonal component what this transform translates to in practice this is what we started with this is this this was the improvement we got with um double exponential smoothing and this is what we get when we get triple exponentials moving so we take care of the level the trend and the seasonality i mean it's not perfect but it's kind of getting you at least in the you know socially acceptable direction with the previous slide uh does it start over fitting at this point uh it might it might yes well the question the thing is uh well how much data do we have if we but yes in general the thing to keep in mind here uh unlike machine learning methods or other time series methods say arima and whatnot this one does not really have a model this is effectively a glorified exercise in curve feeding so yes yes there is a risk that this will start overfitting at some point it's not that apparent here yet uh because well the past australian passengers data set is kind of selected to to make a point but yes yes that's why you kind of have to be careful also with exponential smoothing the problem is not as big as you know with overfitting as you would with you get with i don't know uh non-regularized linear model with collinear variables it's not that bad but yeah the issue is there okay and another question that we have now is is it because of the nequist limit that we demand sample size double the size of the period is it because of what i'm sorry so sorry i might not be pronouncing it right but nyquist limit uh probably probably it's been a while since i've been it's been a while since i've been looking it from that angle uh but from what i do recall yeah although the less formal explanation i always like to use is well if you want to know whether something that happens say throughout i don't know january to february is a seasonal effect or not then you need at least two cycles to sit to see because if i only observe uh i don't know 10 months of a year i have absolutely no way of knowing say taking as an example i live in the netherlands which is northern europe hence northern hemisphere as far as we're concerned summer is what happens well even for dutch standards uh starting with june which means it gets a little warmer before that how do i know that well because i've been living here a while however if i had only lived here for 10 months and i only saw one spring i have no way of knowing whether the change in temperature that's okay is it a seasonal pattern in this period or is it a ongoing trend and by november i should start panicking the global warming is going to kill me i have no way of knowing it because i have not seen a full cycle materialize at least twice but yes i mean i had i admit talking about nucleus in this context is probably a more elegant explanation okay uh another another question that we have sorry i i have a couple of questions here so okay so biggest nightmare for every presenter is that you're talking and talking and there's nothing but that silence yeah true so uh is it advisable to use cross validation for hyper parameter estimation in exponential algorithms uh cross validation out of the box no i'm always that's my default answer cross validation in time series no not a validation by all means if you can if you can be bothered yes if you have time to mess around with that sure find the number of parameters and treat it the way you would in the other hyper parameter yes so you're saying some kind of holdout set yes okay very much uh i mean if i remember correctly i have i have this mapped out the validation methods for time series i think for module 7 or something like this the only tiny issue in the meantime i just need to find the time uh pun intended uh but yes i i am aware that this is something that's not super obvious to people i mean just to give a trailer preview cross validation out of the box is cool if you are sure that time doesn't matter which is fine if you are for instance talking about i don't know uh segmentation of segmentation or classification of images those kinds of things then the time dimension isn't really germane to your problem however in type series problems by its very nature practically it is and so if you are crossed by this is always a risk that you are effectively looking into the future and then using it to back that to evaluate the past uh of course you're gonna get fantastic results there's also a long list of bad things that will happen to you but yeah that's that's why i say i mean sometimes you can get away with it but those are very very special cases and exceptions in general validation times yourself say holdout said better still if you have time uh which is what's it hindman called it like a rolling holdout set pretty much so you have 100 observations you feature model on 1 to 80 and then validate on the last 20 and then you shift it and you fit on observations 2 until 81 and then validate on the rest and keep rolling keep moving your training set and your hold outside ahead of it forward yeah like sliding window the person mentions thank you thank you that's the phrase that all with me thank you so before we move on there's uh also one one one more question and uh it's about level trend and seasonality and the person wants to know what does level refer to uh a constant think of it if you think of it in relationship to no actually no the analogy to calculus is is is a horrible one uh let's see if i can go back to some uh yeah that's kind of horrible but it will have to do level you can think of it like well for for something that wouldn't that didn't have a trend a level would oh no i know yes that's what i wanted level uh this series more or less oscillates around zero it has a which means it's not super formal explanation but we are talking about intuition here which means it has a constant level of zero and this level doesn't change uh a little later assuming we manage to squeeze it time wise we're also going to talk about uh changing level in the series uh but this is a level because there's no well think of it like this if you take an average of the series over this interval and this one and this one and this one it's probably going to be pretty similar okay this one might be a little different because there's huge swings but up to this point actually no average will probably will be fine volatility will go through the roof but on average there's like variations around zero which means the level is constant here by contrast if you look at the triple one an average here would probably be something like 150 android year ballpark 250 350 etc which means the level of your series changes it keeps going upward which means it's not constant or at least it's only locally constant but overall it's increasing and that's where the trend comes in trend pretty much says okay something like this that's an upward trend kinda linear or very very flat quadratic depending on how you want to look at it uh so that would be level versus trend yeah and then seasonality yeah i mentioned that one that's a that's a repetitive pattern that you have across the entire period of the sample i kind of hope i answered that one anymore or do we do we proceed let's proceed we do have a few more questions but we do have a lot of things to cover too so after some time fair enough in that case let me speed it up a little bit wrapping up this part uh exponential smoothing works out of the box are pretty much out of the box with stats models if you can swallow this in the slightly outdated syntax all the methods in this chapter those are special cases of the class called state-based models uh if you heard of kalman filter this is like the poster child of state space models more details and what not to run this one in part two of my notebook series it's nice to forecast stuff it's nice to look into the future but sometimes we'd also like to know what the hell is actually going on and this being 2021 the era of instant gratification most people want to have that answer quickly well in all fairness maybe some of them have just suffered enough in i don't know sas or something because i sure want quick results if i have to spend some time in sas by the way if you don't know what sas is in the context of statistical software count your blessings there's everybody's well the company that everybody loves to hate namely facebook has the list over the last few years two libraries specifically dedicated to time series one is profit which probably most people have talked with have heard about and the other is cuts cuts came out well ts i think stands for time series i have no idea what ka is for this one came out like in june that's what part five of my series is going to be about so once i get to it point is those libraries are designed to get you fast accurate and interpretable models uh pretty much out of the box the only major dependency prophet has is uh stun to get hamilton monte carlo uh advantages compared to exponential smoothing you can have multiple seasonal patterns so that's super useful you can you are more flexible with your shapes of your trend in terms of what you manage to capture because unless something you see well monotone throughout the sample exponential smooth double triple exponentials moving are going to struggle and for the more mathematically inclined people uh listening to this this is based on the gums so the general generalized additive models which is pretty much a crazy powerful if not that easy to get into branch of nonparametric statistics uh profit uh everything is just a function of time that if that formula looks familiar to you compared to what was at the beginning good it's more or less the point uh we decompose our time series x into a trend uh and i intentionally change the subscript to strat to stress the fact that this is not a statistical model this is a curve fitting exercise so this is literally a deterministic function of time we're just trying to fit in the best way possible or with that or that we can st is a seasonal pattern combination thereof a so those two so far so good residuals at the end namely the part where we throw everything that we don't know what to do with and we hope that it's in the indistinguishable from a white noise yeah the more interesting part hd you can call it holiday special days irregular days whatever essentially this is a functionality in profit that's supposed to capture things like christmas is fixed but easter is not and yet in christian countries easter is going to be important uh i mean you know when it is going to happen before a gear starts but you don't know it a prior i mean it's not fixed every given year dito for the start of the ramadan in muslim countries those kinds of setups you know it's deterministic you know it's a priori but the difference the intervals between them are not the same but you can also use it to handle things like outliers you can handle use it to handle uh historical events say you are looking at the performance of the american stock exchange and then you get to something like september of 2001 but obviously all hell broke loose but it's not exactly a normal day so this is something that you might want to incorporate into your model to correct yeah that was that one was kind of different uh so that doesn't impact your entire uh model going forward very easy to customize it's literally a bunch of lego blocks just add and and go on from there uh it's a nice extension of double exponential smoothing because it takes care of multiple seasonal patterns which we'll see in a sec uh yeah curve fitting that i mentioned that thing that made me fall in love with profit you don't care about missing values you don't i mean it's missing it's remote it's an outlier it's suspicious it was a data collection error you don't care you just dump it why because this is a regression on time which means unlike exponential smoothing and also arima which we are not talking about today by the way uh we the the observations that we use they don't need to be regularly spaced which is an unbeliev which makes life unbelievably easier uh yeah probabilistic aspects hamiltonian monte carlo so kids when you try this at home i uh well it kinda works fine on linux and on mac if you have a windows machine make sure you read the instructions on compiling profit or specifically compiling compiling pi stand for profit because it's not exactly trivial but trust me based on the performance of the monte carlo methods it's more than worth it and those bits i am talking about in parts 4 and the upcoming part 5 of my of my notebook series again with apologies for the style of the slide so so um sorry to interrupt again so uh yeah there is a question on dealing with missing values in time series so profit as you mentioned it doesn't require any kind of special attention to missing values but what about traditional methods uh in terms of traditional methods in general let's see uh out of the box exponential's moving car is going to crash you're gonna have a problem there uh arima if you implement arima uh what would you call it from scratch this is going to have a problem as well if on the other hand you reformulate arima as a state space model which you can uh then you don't have a problem because state space models can handle i'm so sorry i hate slack so much and i hate it even more when it gives me updates all the time state-space models can handle missing values because they just effectively it's a filtering problem so to summarize exponential smoothing trouble state space models zero trouble arima depending on your implementation whether someone took care of it or not so would you go ahead with filling these missing values when dealing with time series problems or just ignore them uh personally i prefer to ignore them mostly because if i ignore them then then i know something well i know that what i'm looking in those instances is is is interpolation of some sort because if i decide to interpolated a priori myself that kinda makes it sensitive to the choice of interpolation method i use what do i do do i carry over the last known value do i take an average of the two adjacent ones same assuming i only have a single missing one do i fit a whole separate model just to fit just to interpolate the missing values honestly three out of four cases i just don't feel like going down that rabbit hole so i'm trying to use methods where i can get away with that also that's that's kind of another advantage of profit because you get the interpolation in sample of your missing values for free as a uh yeah as a byproduct of what you're doing really in practice i i tried in so far as i can i try not to not to feel missing values because it's a subjective choice and i'm not a huge fan of subjective choices in statistical modeling i hope that answers the question sure it does let's move on okay uh reading data in profit disclaimer for all the brilliance of the creators of prophet what possessed them to hard code the required column names for the timestamp and the series itself is beyond me but they did so the timestamp has to be called the s and the series has to be called y and as a data example let's call you know what else is everybody else talking about other than kovit for the last 18 months let's look at the series of new cases i believe it's from new york or at least that's a data set i think gathered by the new york times uh why is it important if you look at the data set see that well obviously the number of positive cases in disease cannot fall below zero also if you look here it's well with the exception of this sudden jump up to a point and look it looked like it was kind of flat lining last but not least if you look a bit closer at the curve the curvature here here here and in this part it's a little different we can formalize that intuition excuse me by looking at change so-called change points in the in the trend and this is a functionality that's that's built that comes pre-built in in profit uh essentially as part of the estimation of the model it marks and this is what we get here of the dashed lines the points where the slope in the trend changed above a certain threshold if you think this is still too much doesn't look to me that way then you can mess with the parameter change point prior scale effectively what it says on the box it's a prior distribution that you slap on top of it if you want to regularize it uh you can also start thinking about well series where you don't want to predict too too much stuff i mean again sticking with the covered example i remember april last year there were some people who were claiming that at the current rate of growth there is going to be 11 million people sick in i think it was portugal if i remember correctly by end of summer trouble is portugal only has 9 million people so i think a lot of people don't exactly understand what exponential growth means uh and it's useful to take that into account when you're modeling real life data lest you end up looking like a tv presenter how do we do it we have a expert estimate what's the highest level that our process can reach or as well as the lowest one so we add it to our original data frame fit the model as before with the usual profit syntax make future data frame etc add to this the cup and the floor run prediction and voila this is what we are getting a prediction that has the ink already incorporated the floor sorry the floor and the cap well as you can see i was a bit too conservative with my cup because the series kind of does stick out but i think you're getting general idea uh seasonality uh accelerating bit quickly because i think we have like 18 minutes left uh this is gonna be two for the price of one uh i still apologize about the layout of the slide we're looking at hourly data so we are we fitted the model we specify the frequency number of periods ahead and we want to see the confidence interval and this is where the hamiltonian monte carlo comes in because the way profit constructs um confidence intervals and the like is just by by stimulation by monte carlo simulation and this is what we get when we fit doesn't matter actually the specific model itself this one and this one i'm demonstrating just to show that if you are working with hourly data then automatically you get an intraday seasonality fixed as estimated i'm so sorry with a uncertainty band around it uh this is what i mean this is daily for some reason they're doing the daily yearly weekly i don't fully understand i think daily weekly yearly would make more sense in decreasing frequency order but whatever weekly same thing and the annual pattern i have three years so i can get away with that relevant bits here multiple seasonal patterns smoothly accommodated into a single model a sideline for the more mathematically inclined fourier expansion just you you keep or leave components relating to different frequencies uh and if you incorporate them the simulation component which you just do by keeping by setting the mcmc samples on markov chain monte carlo number of samples above zero then you also get the uncertainty intervals holiday special days as i mentioned earlier i kind of like this one because i think this functionality is is just cool uh actually started norway you you actually you'll like that uh because i started looking at country holidays in pandas also and in baseline once in norway and if you can believe that they have all the other stuff listed as public holidays but they don't have christmas uh so that's why i was like actually you know what that's a good example to use here so i added manually the data frame describing christmas over my sample period you also get a nice functionality lower window upper window what does this mean you have a whole special day holiday you think this has some sort of effect on your series on date occurs but you can and that's fine that works out of the box but you know what you can also do you can say uh okay i think the effect is gonna start a day earlier before the date as well that's lower window minus one and it's gonna persist for something like a week afterwards which probably in the context of christmas kinda makes sense hence upper window seven combine the two yada yada and this is the kind of picture you get this is not perhaps super fascinating if you are only interested in prediction but if you are interested in finding out what's actually going on that i would argue then i would argue having those three things the actual series the predicted one example and the holidays and holiday effects in one graph is useful especially that you can zoom in a little bit on the periods of interest and then you know okay this point this was our predicted the red line the blue is the actual which means we overshot widely and this happened to coincide with this particular holiday and with this kind of thing it makes it fairly easy to to analyze uh well what actually happened along the way uh any questions there is a question but uh not exactly related to seasonality okay so let's take it uh towards the end it's related to nutrition models yeah okay and then uh last but not least at least for for today uh anomaly detection the reason to think of anomaly detection or as i mentioned earlier uh outlier detection outlier analysis what's in the name to gauche experian is the following trend and seasonal pattern are things that change uh they are random but we know that they are random and we know that they change so to borrow my favorite news villain of the last two decades mr ramsfeld those are known unknowns beyond that however all sorts of other things can happen we can have a change in the level of the of the series we can have unexpected events happening or we have events that we didn't even know happened but only find out about it after the fact because all of a sudden something in our data starts looking really weird uh this is where anomaly detection comes in and this is the point at which we circle back to exponential smoothing because what's exponential moving at the end of the day it's variation around the theme of a moving average and if you understand how moving average means and moving standard deviation then as a bonus you can get uh probably the simplest anomaly detection score ever namely the z score the basic idea is like this assuming x t is our original series x m is our rolling moving average over a period of m excuse me and sigma m instead only standard deviation then this thing has approximately normal distribution and that's dictated by one of the coolest results in probability namely central limit theorem in practice this means that if something has a normal distribution that think that then things further than plus minus three standard deviations from the mean are extremely unlikely uh extreme example of that is that the people who are building the atomic bomb in the manhattan project their computers did not allow them to do simulations that that much so they had to assume that normal distribution has a finite support so they literally assumed that beyond three standard deviations plus minus from the mean the probability of everything is zero i mean the bomb did explode when it was supposed to it did work so there is something to be said for crude approximation sometimes in our context if something sticks further away in the z-score that means it's very unlikely compared to what we consider normal behavior at the moment uh yeah anomalous the disease correlable free we read the data we have a look at it nothing nothing that's special so far that's literally yours you pick a window size which is effectively a parameter that you decide based on expert judgment you create a rolling object rolling mean running standard deviation translate into a z-score plot yeah sorry and then you can quickly find out which observations are out of range because you just pick the ones where the z-score was above free voila it's not a super foolproof method because it's well sensitive to a bunch of things number one uh well your subjective choice on the size of the window how big is it uh number two just about just about everything after suitable normalization becomes gaussian so you can use this heuristic but not everything there are distributions which do not conform to that and you might be in for a nasty surprise so it's not at all utterly foolproof but it's in the ballpark this part is literally working progress namely outlier detection if you try to do it the cuts way which is the one i'm currently writing uh you don't have to think an awful lot uh how to define stuff yourself just import the outlier detector class instantiate an object and then you feed it and voila you get a list of timestamps where suspicious things occur uh the second thing is looking at change points which i have mentioned earlier and this was actually something that i found that well that that and ensembling time series is what i really really like about cuts uh because this allows us to look at change points change points are in general you can think of it as distribution changes in your time series this can be a change in the mean it can be a change in the slope or direction completely of your linear trend linear trend those kinds of things it's a notorious problem if you are looking at real life data stuff changes all over very very few things are genuinely stationary pretty unpredictable over a longer time time horizon and the massive advantage of cuts is that it works pretty much out of the box uh riffing off to the question that was asked earlier but what does it mean i mean if the level changes where for instance something like this over this period the uh process oscillates around something say whatever 1.35 collapses than here but if you abstract away from the mean the type of dynamics is the same it's roughly equally volatile it's just that it happens around different levels needless to say if we try to estimate something like this with exponential smoothing we're gonna have massive problems until it catches up to a new level around here which means it's probably quite useful to be able to approximately at least catch those points when things change voila cuts again biocpd is one of i beautypiece is i think one of three or four change point detection algorithms in in cuts uh if you're impatient feel free to read up yourself if you can spur an extra week or so then wait for python wait for part five uh same as before we imported the detector object fitted voila approximately identified what's going on uh on a closing note as i mentioned up earlier this is not even a crash introduction this is an intro to an overview that's that's as best because the time series is a field has been around literally for a hundred years or almost uh there's a lot to talk about in terms of applications uh in the context of the notebook series as i mentioned earlier the things i have mapped out states-based models which i'm a huge fan because of how many things pop up as special cases from state space validation strategies for time series in particular why cross-validation is by by definition almost or almost always a bad idea for time series machine learning methods and obviously that includes deep learning hashtag lstm and also something that for some reason is not uh that well covered and not that many people looking to it which i think is a bloody shame personally our multivariate time series and specifically hierarchical models essentially hierarchical models is the fastest way to think about it say you have you are trying to predict uh gdp of european countries if you take into account the fact that let's just say even within the eu you can have aggregate output on eu level and you also have individual ones and you try to incorporate the dependencies between them you can then do a much better job in your prediction than you would have been able to do if you had only used each series individually despite the fact that from a setup and data amount etc point of view the problem would have been way way simpler so stay tuned and vote give up votes for continued motivation uh that's it from me thank you awesome awesome talk conrad thank you very much it was very informative and uh also very funny so i i loved it and uh there's there's still a lot of questions we have if you don't mind staying a bit over time i don't yeah no problem if you want you can stop sharing your screen now uh okay ah stop screen i think stop now yeah it's fine and uh okay let's take let's take some of them and um i if you have time later probably you can don't worry i i am not in any extreme hurry okay great so uh one of the questions that we have here is what if we don't have two full cycles for of data for forecast like fashion products uh you can still have a model just don't just don't do the seasonality just just don't do seasonality simple as that you can still have a legit model capturing a bunch of things just don't do seasonality i mean listen it's not like things will always go wrong because you might be right and something actually is a seasonal effect it's just that the risk that something is horribly wrong and you're gonna confuse a part of the cycle uh with a trend that's that's just too big okay uh the next one is what methods do we use for forecasting rare events which are still modeled by time like like natural disasters oh probably some probably some variation of anomaly detection i mean if you have this the my approach is there is a slight black humor component to it or you know gloomy component that's what it comes down to it's kind of horrible when disasters happen but it gives one more data point which means you can do something about it uh so what but that being said there's still usually very very few so what you can do is you can and that's something i plan to cover in a separate module on anomaly detection is you try to construct some sort of score that describes whether things are normal or not and then you monitor the score and your as your time series progresses and usually when there's something weird that starts going on the anomaly that means the distribution is gonna depart from what was normal behavior and you should be able to capture it i mean doing a super high level spoiler one idea is something like this if you have say two time series uh and you know a period where things are normal on both you can do a bunch of time series for that matter uh you can do time series you can do lump them together as a matrix and do principal components to the analysis to the pca the composition i can decompose i can value the composition that's the phrase i was looking for uh and the idea is the normal behavior that's gonna load to the high eigenvalues the ones that are representing what's normal what is the weird stuff that happens that's gonna load to the small to the tiny ones towards the end of the spectrum so the trick that you can do is you do an eigenvalue decomposition you drop the low eigenvalues and then you reconstruct the original matrix and then you start tracking the composition error reconstruction error and the trick is the normal stuff was loading to the eigenvalues that are still there which means the reconstruction error will be pretty low the weird stuff was loading to the tiny ones which are gone and that's where the construction error is going to explode but this is coming the this this this is coming because uh it's a problem that you encounter quite frequently and i kind of like the elegance of the idea that you can solve the modern problem with such a classic technique like pca great i was i was actually looking for one of the answers given from one of one of the audience but i'm not able to find it but if i do during the next question i i will okay i will share it and we can discuss that so the next question is should we handle missing data so this is something that you didn't discuss today time series as integration no no not so like if missing data percentage is large i mean 45 percent is quite large yes so uh should we handle missing data in those cases we might listen the thing about missing data there's no such thing as a universally good answer like everybody knows there's the no free launch theorem in the context of model performance uh there is a variation on the same theme if you look at uh missing data depending on a problem sometimes we can afford not to care sometimes we can afford with just you know slap a linear interpolation between and never look back we can do it using boosting but i guess it's a little bit of a matter it's a little bit a matter of like a philosophical or most perspective like i think for myself of missing values as just a step that i need to take care of and my ultimate goal is is a predictive model but if 50 of values are missing and i want to handle it using boosting then effectively i have another problem to solve i need to build another predictive model to even use the time series one but then you know if i built a boosting model to handle my missing values well then i might just as well say that the stuff i was going to predict is missing as well and be done so i kind of reformulated the problem completely which is legit it's it's a legit approach uh just just usually not my first choice okay but are there do you have any suggestions on what kind of uh like handling missing value techniques you can apply when like it's a mostly time series analysis like in terms of regression or it's just basic machine learning techniques that you always use carry over the last value essentially you know what it doesn't really matter that much what you apply as long as you as long as you distinguish between what you what was original observation and what is feeling right now at least in my experience like say you had a bunch of zeros and then you decided to fill out the missing values with zeros as well it's fine as long as you keep a column an extra column with an indicator that allows you to distinguish this was an original value so this is a genuine zero and this is just my interpolated zero that i find more important than than the choice of a specific method okay jumping on to the next question um [Music] it's about what kind of algorithms work well for demand forecasting or supply forecasting oh intermediate intermittent demand oh yes one yes one of my favorite horror stories uh there is this oh good gracious croston i think it's called crosstown method or something like this hold on let me let me check quickly yes uh just to get started uh crosstalk method which is effectively a cousin of exponential smoothing except you have two equations running in parallel uh one for the non-zero periods and the other for when will the non-zero periods actually occur and then you're trying to combine it so that's one way uh yeah the other i would probably cast it as a regression problem in this in the in the you know classic male sense and then uh because then it collapses to something like uh intermittent demand it's a bit like we insure with non-life insurance most of the time most things are not on fire but when they are a lot of things get on fire and the losses are very bad what does this mean in practice there's there are long periods of zero zero zero and then you have huge spikes in the series so people have been dealing this problem insurance for like next to forever and uh what frequently works you have two mod you kinda have two models one predicts is it a zero or a non-zero observation and then conditional on the result of this one you have a second one which is saying uh what the actual non-zero value will be i'm pretty sure something like this was applied the last time there was a little demand prediction on cargo competition cargo so i would suggest looking up that one okay and uh yeah sure i mean i'm not so much aware with this method but i would take a look after this talk you didn't spend you know like in biblical terms seven miserable years in corporate finance and for me that included insurance no never i've i've never dealt with time series data so which honestly not so much my head around how many people say that to me because seriously i spent a better part of the last 15 years being absolutely sure that like yeah come on it's a cute skill i picked up along the way and that was it like there's no application for it in 2020 boy little did i know yeah true but yeah everyone is so interested in learning about time series including me and then we have the next question sorry we have to move a little bit so it's uh as as you can read the question on your screen how do you feel them uh depending on what are you feeding them into if you are fitting them into a machine learning algorithm or something like well like boosting like abhishek mentioned then you don't care you can drop those observations i mean or you can just backfill them with something and have an extra column uh that that indicates whether this was a genuine zero genuine zero or a or interpolated one that that's the most important bit honestly should i drop them you absolutely do not need to drop them a priori but if you have like uh five million observations well okay that's an overkill on monthly data if you have 500 observations and you're creating logs then yeah probably dropping three rows doesn't hurt you in a whole lot if you can get away with it as a proportion of the data dump it and don't look back okay this question is about data snooping leading to finding random particles in time series data absolutely you stay long enough you'll find anything you stand long enough until you'll find anything because there is no such thing as geneva convention that prohibits torture for data so you you drill long enough you'll find everything in there and that's exactly like i remember when i was a student we once got an exercise the exercise was literally find the flaw in the reasoning because we were presenting with full-blown written in all seriousness exercise and like analysis you know data graphs you name it proving that storks bring children like okay that's the european variant whatever it is you're part of the word i'm sure there is some variant of that european variant when you explain to really little children where the babies come from you say storks come and drop them in the field and you're looking looking looking for this analysis and you're like what the hell the variable of stork presence is important it is number one predictor until you know after going back and forth a few times we're like wait a minute is there a variable controlling whether it's urban or rural and the moment you edit the variable are we talking about a city or the countryside the variable on presence of storks became irrelevant so you're staring the data long enough especially if something was not fully uh you know analyzed an exhaustive manner you'll find just about anything which is exactly the reason why i'm a huge fan of regularizing data regularizing the daylight out of anything that works yeah okay good funny answer um so next one so given a time series would you advise making dual analysis one to analyze the whole time series and also smaller parts of it to understand time series better as long as you don't spend too much time on the analysis of the sub parts absolutely in fact i don't view it i i view it as a sort of part of the same process really like if i wanna find out what's actually going on in the series it's just more or less stable over time this is most certainly one of the things i'm gonna do i'm gonna look at small i probably wouldn't look at the random segments in a time series but uniform like contiguous ones a window here a window there most certainly most certainly gives you a better understanding worst case scenario you will come away with a confident knowledge yeah stuff is stable i'm good that's the worst that can happen which is not a bad outcome let's be honest okay the next question is from the same person and he wants to know about the the difference between i'm not able to find the question so what is the difference between profit and cats what is the real difference between prophet is one of the models incorporated in cats prophet uh for pro prophet is uh um prophet is an ak-47 of time series models it's supposed to be the one tool that you will use to solve just about any problem you that falls in your lab that's related to daily modeling daily data cuts on the other hand it's a bit like a toolbox what cuts it has is it has profit as one of the possible models you can select to to well for for prediction so that's on the upside because i think there's like eight or nine more and that stuff ranges from like tata model which is uh you know ugly cousin of exponential smoothing to things like lso to lstm so literally from vintage to modern uh but on the downside it doesn't have uh the kind of detailed diagnostics or the ability to actually drill into the model to find out what's going on uh it has the functionality on ensembling it has more elaborate functionality on detecting change points so yeah they kind of overlap they come from the same source one encapsulates parts of the other but other than that it's it's yeah i don't think it's really that meaningful a comparison now that i think about it i have not used any of them so i would never know um so this question uh we will still take a couple of more questions if that's okay i'm cool yes probably as long as you are careful not to not to leak if you are careful with with you know with not introducing leakage then yes by all means by all means in fact something i've been doing on a fairly regular basis is if you want to build a decent predictive model in a machine learning sense sorry using machine learning approach uh then yes of course you can use light gbm or xg booster your favorite uh ensemble of trees uh with one big disclaimer they do not extrapolate which means if there's a possibility of that of data in the future being in a range that has not materialized before or visualized manifest or manifested manifested that's the way i was looking for hasn't manifested before er light gbm is going it can't can't capture that which is why a trick that can be used is you build a sort of parallel model linear so of course it's going to be dumber it's not going to be as accurate and whatnot which you then predictions from this model you plug as features into your lgbm the point of a linear model on the side is not to have good quality predictions it's just to have something operating in the broader range because lgbm purely by itself does not extrapolate from the range that was in the data linear models do so with hold winters i'm fairly certain you can you can achieve something similar like i said just be super super careful about introducing leakage okay the questions keep coming there's still so many questions so i i haven't really can i just make one tiny quest literally a minute break i need to go get some water because my voice so while conrad is taking getting some water i i think it's a good idea so like we have the pole and in that pole there are 205 votes and it i've asked if you're a beginner in time series and 80 percent of you have said yes so um and at the end of the presentation today i will also be sharing uh the links to conrad's tutorials and uh you will you can go to these kaggle notebooks and learn a lot more about time series yes ah i have water i'm re-energized okay i think i think we have had all already a lot of questions um let's take a couple more not more than that and then let's enjoy the weekend no that's fine best approaches to the multivariate forecasting out of the box if you have no better idea do a bunch of univariate models slightly more and then combine them slightly more slightly smarter version uh vector autoregression for lack of a better time it's like well yeah vector version of an autoregressive model if your taste goes more in the deep learning department then tubnet okay conrad really we still have a lot of questions but i'm not i'm not going through all of them anymore we i i can do another 15. that's okay i think i think it's fine now so um like i i made a poll while you were gone for water i was talking about the polls i made a poll and i asked are you a beginner in time series and eighty percent people say yes and twenty percent no two more than uh i think it's 206 people who voted during your presentation and yeah so this question i always ask people like if they're presenting some something new or some some some topic um uh how would beginners go about learning about time series so do you recommend any books where to start from should they start from your own books well i think i've done enough so for self-promotion for one day let me quickly check something in terms of the uh books to read yes okay this is going to be horrible self-promotion again but the fastest way just to get to the recommendations is the the first of my notebooks in the series which towards the end of the groundwork section has something called like saying some useful references include the thing about useful references the the question is what is it that you want to do because there's a bunch of ways you can go about it if you just want to understand from a very very practical point of view then pretty much anything that rob hindman writes because the only important point to make hindman does all his analysis was that he publishes in r so if you are very religious in the r versus python for statistics conflict that this might be and then this might be an issue for you i'm not i believe in the best tool for the job uh but that might be a stumbling block for some people because r is not as widespread as widespread as it used to be like a couple years ago in terms of the other thing that i have here uh durbin and koppman time series analysis by state space methods if you want to understand f uh really good and general class of moles in state space models pretty much you understand this you understand common filter and then you're getting exponential smoothing arima and a bunch of other things pretty much for free as special cases and the um the only price for that is you go have to have to go literally like once through two pages of matrix algebra matrix algebra which let's be honest ain't that much uh there's this and if you want like a more academic treatment version then the first position on the list in there broccoli and davies broccoli and davis is really good it's it's a little older but it will give you a solid for the solid grasp of the foundation great i think i should get some of them uh okay so it's it's been more than one and a half hours and uh i think this is the longest talk we had um ever but also one of the most interesting talks and very funny talk i i had so much to learn people still have questions coming in so i i just wanted to ask will you be sharing your presentation will you be sharing your linkedin or twitter sure how do i share a presentation is a question i mean you can give me the link later and i will attach to the video if if the idea of just passing it to you and then you share it in the channel then yes by all means yeah if it's like yes connor do it right now and figure it out i'm like not do it right now and uh yeah sure so conrad is going to share the presentation and is it fine if people connect with you and ask you some questions they might have related time series later on after the talk sure i'll you know what i think it might be just fastest if you throw it if you throw my twitter handle in in the channel sure i think this might be most efficient uh because yeah the alternative is like what i don't have a youtube channel and doing all of it for linkedin probably isn't the most efficient manner so yeah i have to think that we do need one more session with you maybe some other name brother you know what just give me i think that just two things would need to happen one is uh i think we need to make it more focused so i can go into a little bit more detail on something and uh maybe just maybe give me another like two months so i can speed up a little bit on my schedule regarding uh the notebooks because then then it's just easier to to refer to something and also i need to learn how to do code blocking markdown so they don't look so horrible next time i think it was readable so it's fine i'm trying to strive for a little more than just merely readable it was not a lot of code so that's that's good so thank you conrad thank you once again for taking the time out and amazing talk uh thanks to all the audience who joined today yes thanks nice thanks for the questions i wish we had managed to answer more but i always do and yeah enjoy the rest of your weekend and see you next week thank you see you bye
Info
Channel: Abhishek Thakur
Views: 3,069
Rating: 4.9796953 out of 5
Keywords:
Id: cKzXOOtOXYY
Channel Id: undefined
Length: 95min 54sec (5754 seconds)
Published: Sat Sep 25 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.