Time, Interrupted: Measuring Intervention Effects with Interrupted Time-Series Analysis - Ben Cohen

Captions
We're good to go. My name is Reuben Kumar, and I want to welcome Ben Cohen, who will be presenting "Time, Interrupted: Measuring Intervention Effects with Interrupted Time-Series Analysis." Please switch your cell phones off and turn the volume down on your laptops so we don't have any interruptions. All of the talks and tutorials are being recorded and PyData will post them online, so you can also watch the recorded PyData talks from New York or Cordoba. Please help me give a warm welcome to Ben Cohen.

Hi everyone, and welcome. Sorry for the AV difficulties, but it's the first talk of the day. This talk is "Time, Interrupted: Measuring Intervention Effects with Interrupted Time-Series Analysis." A little bit about myself: my name is Ben Cohen and I'm a data scientist at Warby Parker. For those of you who may not be familiar with Warby Parker, this is a bit of our mission statement from our website, but essentially we're a company that sells eyeglasses and sunglasses. We were founded about eight years ago as an e-commerce company, and we've since expanded a lot; we now have around 80 retail stores in the United States and two in Canada. A big part of our business model is that we do a lot of things in-house, from designing our glasses to running our own customer service, and that extends to our tech infrastructure. While we do use some off-the-shelf software, we've also written internal software for a lot of our business, from supply chain management to order processing to our retail point of sale. What this means is that we have a lot of data. The data science team at Warby Parker supports stakeholders throughout the company by helping them make better decisions with data, and we work with many different kinds of data, but this talk is about some work we did with the marketing team.

So what is interrupted time-series analysis? It's a form of causal inference, designed for situations where traditional experiments are impossible. I'll give some examples of those situations in a bit, but first I want to go over what I mean by traditional experiments. This comes up a lot in digital performance marketing, an area where we often apply experimental design. A lot of tools are built around digital marketing to help us, and what they give us is the ability, revolutionary in advertising, to do direct attribution: when you look at an ad online, we know who saw it, who clicked on it, what they did later on our website, whether they became a customer, and so on. A lot of our industry is built around this, and one of the core tools it enables is A/B testing.

Let's talk about A/B testing. I'm sure a lot of people here are familiar with it, but I'll give a brief overview. The idea is that we have some population of customers or individuals, we randomly bucket them into two or more groups, which could be called treatment and control, and we show each group a different version of something: different versions of our website or of an advertisement. For each of those groups we measure target metrics like conversion rate or clicks on the website, and by comparing the metrics for the groups we can use a statistical model to make an inference about the effect of the differences between what we showed them. That could be a traditional null-hypothesis significance test or a Bayesian model; there are lots of different approaches to A/B testing, but that's the basic idea.
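As a rough illustration of the kind of A/B-test readout he describes, here is a minimal sketch comparing conversion rates between two buckets with a two-proportion z-test. The visitor and conversion counts are invented, not Warby Parker numbers, and a Bayesian comparison would be an equally valid choice.

```python
# Hypothetical A/B test readout: a two-proportion z-test comparing
# conversion rates between a control and a treatment bucket.
# All counts below are made up for illustration.
from statsmodels.stats.proportion import proportions_ztest

conversions = [480, 532]    # converters in control, treatment
visitors = [10_000, 10_000]  # visitors randomly bucketed into each group

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
lift = conversions[1] / visitors[1] - conversions[0] / visitors[0]
print(f"absolute lift: {lift:.3%}, z = {z_stat:.2f}, p = {p_value:.4f}")
```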
Now, the problem that came up recently is that we wanted to run a national TV campaign. We hadn't done much of this before, and our marketing team came to us for help in figuring out how to measure its impact: we're showing national TV ads, do we make money from this? There's a big problem here, which is that with TV there's no way to randomize exposures, and beyond that, there's no way to even know who saw the ad. You may have aggregate ratings for a TV program, but when an individual person comes to your website there's no way of knowing whether or not they were exposed to a TV commercial. One thing you could do to roughly replicate the idea behind an A/B test is geo-targeting: run the ad in certain cities or markets that are similar to each other and compare across geographies. That captures some of the idea behind an A/B test, although it wouldn't be strictly randomized, but it has one big drawback: it's really expensive. It turns out to be far more expensive, on a per-impression basis, to run locally targeted TV ads than national ones, and in our case we really wanted to run ads nationally to get the larger reach. So we needed a different approach.

I want to take a step back and ask a fundamental question: how can we know whether something we did had an effect? The big problem is that there are many different causal factors that can affect how many people visit a website or buy a product. It could be the ad we showed them, but it could also be something about who they are, their demographics, other ads they saw, the time of day, all kinds of things. You want to isolate the factor you care about, in this case exposure to the TV ad, and control for all the other ones that you don't. In science we have a gold standard for this: randomized controlled trials. This is the classic experimental design behind, for example, clinical drug trials. You randomly assign people from your target population into treatment and control groups, and the randomization controls for confounding factors: if you start with the same group of people and randomize them, you can be pretty sure that the only systematic difference between the groups is your treatment. If this sounds familiar, it's because A/B testing is precisely a randomized controlled trial, typically in a digital medium. And again, the problem with our TV example is that we can't have a control group, because we don't control the assignment process and we don't know who's in which group. We aren't able to observe people who weren't exposed to the intervention.
We know that some people saw the ad and some people didn't, but we don't know which ones are which. Stating the problem this way leads to a useful reformulation: what would have happened if we hadn't shown the ad to anybody? We know that some fraction of the population saw the ad, even though we don't know who, but we can ask what would have happened if we simply hadn't run the campaign at all. This is an example of a counterfactual. I put the definition up here, but basically a counterfactual is a scenario that didn't happen. The idea is that if we can guess the counterfactual, we can estimate the effect by comparing it with what actually did happen. Guessing the counterfactual turns this into a prediction problem, predicting the counterfactual, and fortunately for us as data scientists, prediction is our bread and butter.

This brings us to interrupted time series. There are many ways of constructing counterfactual models, and there's a whole field of causal inference that deals with this, but I'm going to discuss one particular technique. It relies on two important properties that this scenario has, as do many similar scenarios. Interrupted time series (ITS) is a counterfactual model that works when, first, the intervention begins at a specific, known point in time: we know exactly what day and what time we started showing TV ads, because we control that. And second, we have sequential observations from both before and during the intervention: we have time-series data on whatever metrics we care about going back well before we started showing TV ads, and we continue collecting them afterwards.

To capture the idea, take a look at this plot from a hypothetical campaign. What we have is a time series, which could be sales or some other metric you care about, and the dashed line indicates where in time the campaign starts. Just looking at this plot, I'm sure everyone here would intuitively conclude that the campaign had a positive effect. What's going on is that, based on our understanding of the world, we have a strong prior that the data-generating process would have continued in a similar way absent the intervention, so what you're probably doing mentally is filling in the counterfactual like this. It's precisely this intuition that ITS seeks to formalize. The approach goes by various names; sometimes the counterfactual is referred to as a baseline prediction. I like the term interrupted time series because it highlights that what we're looking for is a discontinuity in the time series, and in fact ITS can be considered a special case of another suite of causal-inference techniques called regression discontinuity, special because here the discontinuity happens in time. This technique is frequently used in econometrics and social science, and it has a lot of other potential applications, particularly in public health, whenever you make some population-level policy change, or in education or other public policy.
Another potential application is an outage postmortem: say your website goes down and you want to assess the impact on your business, but you have no data for the period when it was down. Another is when you have to make a website change but can't A/B test it; depending on your industry, there may be a regulatory reason or a corporate policy reason why you have to make some change to your website or product and you're not allowed to test it. Yet another is when there's a press piece about your company or some PR event, something out of the ordinary, and you want to measure its impact. I'm going to continue using the TV campaign as a motivating example, but I want to highlight that this is a very general technique with a lot of potential applications.

So how do you go about building a time-series counterfactual? Let's look again at this simple example, and when I say simple, I deliberately made it as simple as I could: the data here is just drawn from a normal distribution, white noise with a mean that jumps up when the intervention happens. Our counterfactual in this case is simply that the mean would have stayed the same. In a case like this, all you have to do is look at the summary statistics, or a histogram of the data from before and after: you just have two normal distributions, and you can use your standard toolbox of statistical tests and models to calculate the lift.
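For that simple white-noise case, a plain before/after comparison really is enough. Here is a minimal sketch, with invented numbers, using Welch's t-test on simulated pre- and post-intervention data:

```python
# Minimal sketch of the "simple" case: white noise whose mean jumps at the
# intervention, so an ordinary before/after test suffices.
# All numbers here are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
before = rng.normal(loc=100.0, scale=5.0, size=90)  # 90 days pre-intervention
after = rng.normal(loc=110.0, scale=5.0, size=30)   # 30 days post-intervention

t_stat, p_value = stats.ttest_ind(after, before, equal_var=False)  # Welch's t-test
print(f"estimated lift: {after.mean() - before.mean():.1f}, p = {p_value:.2g}")
```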
But of course, in the real world our data will seldom if ever look like this, and in particular it will tend to differ in two key respects, which I'll discuss in turn. The first is non-stationarity. In the previous example both the mean and the variance of the time series stay the same over time, but that's not necessarily going to be the case. In this example the mean is changing over time: sales were already trending up even before the intervention. If we just naively summarize the data from before and after, as we did earlier, we get nonsensical and misleading results; if you naively average the values here, the variances of the two distributions look totally different, and that doesn't capture what's really going on. Even if you fix the mean at its most recent value, you're still going to significantly overestimate the lift, because you're not capturing the upward trend. And if you think about this as a business problem, overestimating the effect of your ad spend is the worst-case scenario, because it means you're losing money and you won't know it: you'll be wasting money on ads that don't work. What you really want is a model that understands how the distribution is changing over time and can extrapolate that into the future, and this is where time-series models come into play.

The other big problem, besides non-stationarity, is autocorrelation. You can have a mean that stays roughly the same over time while the values meander around it. When you have autocorrelation, the value at any particular time step is correlated with the values at previous time steps, so you're more likely to see small changes locally than large jumps. You can diagnose this with an autocorrelation plot, and by the way, there was an excellent time-series tutorial yesterday that went over some of these things. Unfortunately, these two properties knock out many standard statistical tests and regression models, because a lot of the techniques we're used to using for A/B testing make an IID assumption about the data, that it's independent and identically distributed. Autocorrelation violates the independence assumption, and non-stationarity violates the identically distributed assumption. This is why we need a dedicated time-series model for this kind of prediction.

You may have heard of some of the different types of models out there: exponential smoothing, ARIMA, and so on. If you look online at how many time-series forecasting packages exist, there are literally hundreds, maybe thousands, so I'm not going to go into all of them, but I do want to talk about what they have in common, which is that they all try to make predictions by explicitly modeling temporal structure in the data. You can think of this time dependence as a nuisance, because it means you have to throw a lot of the standard toolbox out the window, but another way of thinking about it is that it's actually really useful: the temporal structure carries a lot of information about the underlying process, and therefore predictive power, and that is what time-series models try to take advantage of.

So I'm not going to cover all the different ways you could do time-series modeling, and you really could use any of them for this technique, but I want to talk about the types of features that can exist in real-world data, the things your model might have to account for, and in doing so explain how we decided what tool to use. Starting off simply, you can have random noise, which may or may not be autocorrelated. You can have trend, and it may be nonlinear: the slope can change at different points in time. You can have day-of-week seasonality, which is very common in business data, since people shop on certain days of the week. On top of that you can have a lower-frequency periodic effect, annual seasonality, where things happen at different times of the year, and holiday effects, where specific days of the year tend to be outliers because of something that happens on that day; in this example I used National Donut Day and National Bagel Day as the holidays in the simulated data. Lastly, you can have other covariates that have some sort of causal effect on the data. In this example there's another time series, your spend across other marketing channels: somebody might have seen some of your online ads, and that could affect how they behave, in addition to whether or not they saw a TV ad.
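To make that list of features concrete, here is a hedged sketch that simulates a series with those ingredients: trend, weekly and annual seasonality, a couple of made-up "holidays", and an other-channel spend covariate. All dates, magnitudes, and column names are assumptions for illustration only; the later sketches reuse this `df`.

```python
# Sketch of simulating the kinds of structure described above: trend,
# weekly and annual seasonality, holiday spikes, and a spend covariate.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.date_range("2016-01-01", "2018-06-30", freq="D")
t = np.arange(len(dates))

trend = 100 + 0.05 * t                                   # slow upward trend
weekly = 8 * np.sin(2 * np.pi * dates.dayofweek / 7)     # day-of-week pattern
annual = 15 * np.sin(2 * np.pi * dates.dayofyear / 365)  # yearly seasonality
holidays = np.where(dates.strftime("%m-%d").isin(["06-02", "01-15"]), 25, 0)  # invented "donut day" / "bagel day"
other_spend = rng.gamma(shape=2.0, scale=50.0, size=len(dates))  # other-channel spend
noise = rng.normal(0, 5, size=len(dates))

y = trend + weekly + annual + holidays + 0.1 * other_spend + noise
df = pd.DataFrame({"ds": dates, "y": y, "other_spend": other_spend})
```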
Another thing you really want from a model, especially in this context, is prediction intervals. It's very important in this application to have a way of quantifying uncertainty, because if you think about the purpose of doing this kind of attribution or lift modeling, it's really, in some sense, risk mitigation: you want to make sure your spend has a positive return on investment, and you want a sense of how confident you are about that. If you only have a point estimate, it's very difficult to know how much to trust it. For example, if our model says the ad generated a 5% lift, it matters a lot from a business perspective whether the interval around that is 2% to 7% or negative 5% to positive 15%, because it affects how you decide to allocate resources. For this type of problem, having reliable prediction intervals can be just as important as, if not more important than, the accuracy of the point prediction itself.

But there's a big problem with generating prediction intervals, or uncertainty bars, for time-series forecasts, and it's summed up in this quote from Hyndman's textbook on forecasting: typically, time-series models have very poor empirical coverage of their prediction intervals. There are a number of reasons for that, some of which I'll get to later, but in large part it happens for a technical reason: a lot of traditional models like ARIMA incorporate a random error term into the model, but they don't necessarily incorporate uncertainty in the parameter estimates, and for all but the simplest time-series models there's no closed-form, analytical way of incorporating that kind of uncertainty. There are really two primary options people use. One is bootstrapping, where you essentially resample from your training data; this is a little perilous for time series because of the autocorrelation and non-stationarity (how do you resample from a time series? it's actually pretty complicated). The other is Bayesian inference, where instead of fitting point estimates you fit a posterior distribution over your parameters.

Ultimately we decided to go with a library called Prophet. It's an open-source project released by Facebook, and it uses a Bayesian inference engine, Stan, on the back end. One thing that's really nice about it is that it has APIs in both R and Python that are essentially feature-equivalent, and it's built with daily and sub-daily business data in mind: it supports all the things I mentioned before, like changing trend, multiple levels of seasonality, and external regressors. I'm not being paid by Facebook to pitch this, and your mileage may vary, but we found it a good fit for this problem.
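A minimal Prophet sketch along these lines, reusing the synthetic `df` from the earlier simulation; the settings, column names, and the 2018-06-01 "campaign start" cutoff are assumptions rather than the configuration used in the talk. Depending on your installed version, the package may be importable as `fbprophet` instead of `prophet`.

```python
# Minimal Prophet counterfactual sketch: fit on pre-campaign data only,
# then predict over the full date range, including the campaign period.
from prophet import Prophet

model = Prophet(
    weekly_seasonality=True,
    yearly_seasonality=True,
    interval_width=0.95,  # width of the reported uncertainty intervals
    mcmc_samples=0,       # >0 switches to full Bayesian sampling via Stan
)
model.add_country_holidays(country_name="US")  # built-in holiday effects

# Prophet expects a datestamp column `ds` and a target column `y`;
# the "2018-06-01" campaign start is a hypothetical cutoff.
model.fit(df[df["ds"] < "2018-06-01"])

future = df[["ds"]]               # predict over the full range
forecast = model.predict(future)  # yhat, yhat_lower, yhat_upper per day
```

With `mcmc_samples=0`, Prophet's uncertainty reflects trend and observation noise but not full parameter uncertainty; setting it to a positive value enables full Bayesian sampling at the cost of much longer fit times.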
Now let's talk about how you actually go about training and prediction for an interrupted time-series model, going back to the simulated dataset I was showing you before. This is the same artificial dataset, but I've added a simulated campaign period with roughly a 10% lift after the dashed line. This illustrates that, because of the complexity of the data, it's very hard to eyeball the effect, which is why we need a model. Recall that our counterfactual is that, had we done nothing, the data-generating process would have continued in essentially the same way. So what we're going to do is use the historical data up until the beginning of the campaign as our training data, and then make a prediction for each time step during the campaign.

Of course, as good data scientists, we also need to test our model, and we want to do that on out-of-sample data. This is another tricky thing to do, but we went with a simple approach: we cut the data off a little while before the point where our real predictions will start and use that final stretch as our test set. We train a model on all the data up to that point and then measure the accuracy of its predictions on the test set. A word of warning here: you want to be careful about choosing accuracy metrics for a time-series model, because some traditional regression metrics like root mean squared error are not scale-invariant, which is a problem if the variance changes over time. One popular choice is mean absolute percentage error, and there are others, including information criteria and various other ways of measuring how good your model is. You also want to look at autocorrelation in the residuals of your model, because if your errors are correlated in time it means you're leaving predictive power on the table: you're not accounting for all of the temporal structure in the data. And lastly, you really want to look at the empirical coverage of your prediction intervals: if you have a 95% interval, check that it actually contains the true value roughly 95% of the time. Again, that's because of the known problem with prediction-interval coverage; you want to calibrate your uncertainty estimates so you can trust them.
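Here is a sketch of that holdout check, again on the synthetic `df` and with invented cutoff dates: hold out the last stretch of pre-campaign data, forecast it, and report MAPE along with the empirical coverage of the nominal 95% intervals.

```python
# Holdout evaluation sketch: train on the earlier pre-campaign data,
# test on the last ~2 pre-campaign months, check MAPE and interval coverage.
import numpy as np
from prophet import Prophet

pre = df[df["ds"] < "2018-06-01"]       # all pre-campaign data
train = pre[pre["ds"] < "2018-04-01"]   # training window
test = pre[pre["ds"] >= "2018-04-01"]   # held-out test window

m = Prophet(weekly_seasonality=True, yearly_seasonality=True, interval_width=0.95)
m.fit(train)
pred = m.predict(test[["ds"]])

mape = np.mean(np.abs((test["y"].values - pred["yhat"].values) / test["y"].values))
coverage = np.mean(
    (test["y"].values >= pred["yhat_lower"].values)
    & (test["y"].values <= pred["yhat_upper"].values)
)
print(f"MAPE: {mape:.1%}, 95% interval coverage: {coverage:.1%}")
```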
Once you're satisfied with your model and hyperparameters, you can retrain the model using all of the data up to the beginning of the intervention and make your forecast. Here the blue line is the counterfactual prediction and the green line is the actual observed data; it may be hard to see in this view, but the blue line is slightly below the green line, meaning our model thinks there was some positive effect from the campaign.

So let's talk about how we can actually analyze and quantify that lift. One really nice thing about using a Bayesian framework like Prophet is that it gives you samples from the posterior predictive distribution. The best way to think about this is as a probability distribution over possible futures, possible outcomes, and the nice thing about the samples is that they're really easy to work with empirically to generate all kinds of summary statistics. For example, this is how we get prediction intervals: you can generate them by looking at the quantiles of your posterior samples, and you can generate them both pointwise, day by day, and for cumulative metrics. This is really useful because, as you can see in this case, on any given day there appears to be a small positive effect, but it may not be outside your margin of error; you actually get more power when you look cumulatively. Intuitively, a small lift on any particular day could easily be a fluke, but if you see that same small lift every day over a long period of time, it becomes less and less likely to be a coincidence. What you can see in the cumulative view on the right is that fairly quickly the counterfactual diverges from the actual observed values and moves outside the margin of error.

Another thing you can do is get intervals around the lift itself, simply by subtracting the posterior samples from the actuals. In these views I'm showing lift as a percentage, and here you can see even more clearly how the cumulative uncertainty tends to shrink over time: for many individual days we're not 95% sure there was a positive effect at all, but the margin of error is quite a bit smaller when you look cumulatively. It's still useful to look at multiple time scales, because in real life your TV campaign or other intervention may not have the simple step-change effect that this example has: there can be some combination of a change in level or a change in slope, a delayed effect, or decay, and looking at multiple time scales can give you insight into that. Lastly, another useful feature of a Bayesian package like Prophet is that it makes it easy to answer probability-based questions, like: what is the probability that the cumulative lift was greater than or equal to five percent, or some other cutoff that's meaningful to the business? That's a really useful way of communicating results to stakeholders.
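Here is a hedged sketch of that kind of lift analysis using Prophet's posterior predictive samples, assuming `model` was fit only on pre-campaign data as in the earlier sketch and that the observed campaign-period data lives in the same `df`; the 5% threshold is just an example cutoff.

```python
# Lift analysis sketch: compare observed campaign-period totals against
# posterior predictive samples of the counterfactual.
import numpy as np

campaign = df[df["ds"] >= "2018-06-01"]
samples = model.predictive_samples(campaign[["ds"]])["yhat"]  # shape: (days, n_samples)

actual_total = campaign["y"].values.sum()
counterfactual_totals = samples.sum(axis=0)  # one cumulative total per posterior sample

cumulative_lift = (actual_total - counterfactual_totals) / counterfactual_totals
lower, upper = np.percentile(cumulative_lift, [2.5, 97.5])
prob_at_least_5pct = (cumulative_lift >= 0.05).mean()

print(f"cumulative lift: {np.median(cumulative_lift):.1%} "
      f"[{lower:.1%}, {upper:.1%}], P(lift >= 5%) = {prob_at_least_5pct:.2f}")
```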
In the last section of the talk, I'm going to discuss some threats to validity. I've walked through interrupted time-series analysis and maybe painted a slightly rosy picture of it; the truth is that it's a very useful technique, but as always there are caveats. In the literature this is often referred to as a quasi-experimental design, and that's to emphasize that it's not as good as a real experiment. Your counterfactual is not the same as a control group, because it's something you didn't actually observe; it's something you're making an inference about, so you can only have a probabilistic idea of it, and you can't make nearly as strong a causal claim as you could with a randomized experiment.

The first threat to validity is a change in the underlying process: the data-generating process changes in some fundamental way during the forecast period. The model assumes that any lift you see during the campaign period is attributable to your intervention, but that might not be a valid assumption. Something could change globally, the economy could tank, a competitor could release a product, some unobserved thing could happen that changes the fundamental dynamics of the system. There are two ways you can mitigate this. One is to look both before and after the intervention period: if your intervention has a scheduled end, then under the counterfactual assumption you would expect things to go back to roughly the way they were before, and if that doesn't happen, it's a warning sign that something else changed that you weren't accounting for. Second, you can look at other similar time series that you think follow a similar process but should be unaffected by your intervention. For example, if you have access to market-level data for your industry, you would expect the same underlying global factors to affect it, and if your model is effectively predicting that your ad boosted your competitors' sales, that's probably a sign that something else is going on that your model doesn't know about.

Sometimes these kinds of changes can be modeled explicitly as confounding variables. The problem could simply be that there are known causal factors your model doesn't account for: somebody else in the business increased spend on some other marketing channel and didn't tell you, or there was bad weather and nobody came into your stores. In both of those cases you now have an identification problem in your model, but if you can think of those factors and you have data on them, you can add them as covariates. In the example I gave earlier, we include our spend across different channels in the model, so we can control for that.

Lastly, even if there isn't some causal factor you're failing to account for, your model could still be misspecified; it could fail to capture reality. There could be convergence problems, it could be overconfident, it could be overfitting, all the typical issues we deal with in every kind of modeling. All models are misspecified to some degree, so you can never escape this entirely, but there are things you can do to build trust. One is to compare several models. A practical example: for the TV campaign we worked with an outside agency partner, and they have their own team of data scientists who built their own in-house attribution model, so we can compare the answers from two independently developed models, get a sense of whether they agree or disagree, and gain some insight into how much to trust either one. A similar approach is to assess multiple outcome variables: you can look at sales, click-through rate, unique visitors, retail versus online. These outcomes are correlated with each other, but not perfectly, so by modeling them separately you can see whether one of them gives a spurious result or whether they all agree.
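If you do have data on a known confounder, a sketch of folding it in as an extra regressor might look like this, using the synthetic `other_spend` column from the earlier simulation; note that any added regressor must also be known over the forecast period.

```python
# Sketch of adding a known confounder (e.g. other-channel spend) as a
# covariate, so the counterfactual can control for it.
from prophet import Prophet

m = Prophet(weekly_seasonality=True, yearly_seasonality=True)
m.add_regressor("other_spend")            # spend across other marketing channels
m.fit(df[df["ds"] < "2018-06-01"])        # pre-campaign data only

future = df[["ds", "other_spend"]]        # regressor values must be supplied
forecast = m.predict(future)
```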
I'm going to conclude with some folk wisdom: it's difficult to make predictions, especially about the future. This is not a silver bullet; it's just one of many ways to look at marketing data, or any other kind of data, to assess the performance of whatever you're doing. If you have the choice, it's always best to run a controlled experiment, which will always give you the most reliable causal inference. But there are many cases in the real world where that's not possible, and in those cases a counterfactual method like interrupted time-series analysis can often be the next best thing. Thank you very much, and I'll take questions. [Applause]

Q: Thank you. Could you speak more to the business application of this at Warby Parker, in particular what kinds of metrics you look at for the time series, and how you define the after-intervention period? I imagine people buy glasses relatively infrequently and may not interact much with Warby Parker otherwise.

A: That's a good question, and you make a good point that different metrics can have different time lags. The first place we'll see an effect is when somebody comes to our website; that happens fairly soon after seeing the ad, although sometimes people need to see the ad several times, or they think of it later. In terms of our business, one of the really unique things about our company is our Home Try-On program, where you can try on five frames at home for free, with free shipping and return shipping, and pick which one you like. We look at the number of people who sign up for a Home Try-On, and we have lots of historical data on how long it takes people to convert from a Home Try-On. We also now have a large chain of retail stores, so we can look at metrics from those to see whether a campaign is affecting whether people come into the stores.

Q: Can you share an example of a situation where this worked well, where you had high confidence in your counterfactual, and were there any situations where you tried this and just couldn't find a good counterfactual prediction?

A: I don't know if I can give a specific example, but one type of thing that can happen, and that can be confusing, is when your actual metrics come in below the counterfactual prediction. There isn't a plausible causal mechanism by which doing more advertising causes people to shop less, so that's a pretty good indication that there's something you're missing, that something was wrong with your prediction. In terms of when it works well, again, it's when you have agreement between the different KPIs, or between different independent models built with slightly different assumptions: the more agreement you have between different methods, the more trust you can have in the result.
Q: I have a question about comparing different advertising channels and figuring out which one was more effective. Do you do any of that, comparing the value of different channels, say TV advertising versus digital?

A: That's an interesting question. One of the things we're doing here is modeling the other channels, and this is a fairly unique situation in that we recently started doing TV ads, so there's a clear demarcation in time of when something starts and stops. As far as comparing other channels, I think that's a whole other topic; there are all kinds of regression-based approaches you can use for it. One reason it can be tricky is that you often have a lot of multicollinearity between different channels, so it's a fraught topic with a lot of identification issues.

Q: Could you talk a little about the data preparation for an interrupted time series? You kind of have this clean segment of data leading up to your intervention, but I can imagine there are times when different ad campaigns run in sequence, sometimes one right after another, which might influence your after period and also affect your attempt to do this on the next campaign.

A: Totally. As I said, this is pretty new to us as a company, so we haven't had to deal with too many situations like that, but ultimately there's no free lunch. Even though it's not a randomized experiment, this is still an experiment in the sense that we're deciding to start doing ads at a certain time, so being intentional about how you design it, and communicating across the business about how you plan to launch these things and how you plan to measure them, is really key to getting a good outcome.

Q: My name is Ravi, and I have a question about the ramp-up period. In one of your slides you show a time series, then the intervention starts, and then you have a different time series. But in most businesses there's a ramp-up period, because the website change isn't rolled out to all users, or the product isn't rolled out to all stores. How do you analytically determine the correct ramp-up period before you start measuring the effect?

A: That's a good question. If you have a slow ramp-up period, this may not be the best design, because if you have the option of comparing subgroups, some of whom were exposed and some of whom weren't, then you can use other experimental or quasi-experimental designs. Interrupted time series is really targeted at situations where you don't have the ability to control the rollout in that way.

Q: You mentioned the cases where we can't do A/B testing and we build a counterfactual instead. How do we actually assess whether our counterfactual is correct, or how does the counterfactual forecast compare to the time series that actually occurred after the intervention took effect?
A: That's why it's vulnerable to the threat to validity I mentioned: you have to make a fairly strong assumption that the underlying dynamics of the system, of the business, are going to be relatively stable during the window you're interested in. Given that assumption, you can use your typical suite of model-checking tools, and it really is important to test these models on some kind of out-of-sample holdout set. The same toolbox you'd use in general for assessing how much to trust a model is applicable here, but again, it is vulnerable in that you have to make that assumption about the underlying process staying the same.

All right, thank you everyone. If there are more questions, find me afterwards; I'm happy to talk to people in person. Thank you.
Info
Channel: PyData
Views: 10,514
Rating: 4.9720278 out of 5
Id: uuo8SwA1HO8
Length: 44min 33sec (2673 seconds)
Published: Mon Dec 03 2018