Inferring the effect of an event using CausalImpact by Kay Brodersen

Captions
It's a great pleasure and an honor to be here. Let's begin with a quick show of hands: how many of you remember the 1985 movie trilogy Back to the Future? Anyone? Almost everyone, okay, fantastic. Well, for those of you who were born after 1985, Back to the Future is about time travel, and if you're excited by time travel, you will love the field of causal inference. Causal inference is the branch of statistics that's concerned with effects, with the consequences of our actions. That's really important, because identifying one causal law in your data can be more powerful than dozens and dozens of correlational patterns that you might find. It's why causal inference has been at the heart of statistics for a long time, and it's why no big-data approach and no modern approach to data science can be successful without an understanding of causal effects. So in this session I want to make sure that each and every single one of you walks out with at least a basic intuition of what it means to estimate a causal effect and what our main tools are.

Now, the gold-standard method for estimating causal effects is a randomized experiment. It's pretty obvious to us nowadays, though it was only formalized about 100 years ago. But in lots of situations we can't run a randomized experiment, because it's too expensive, or too difficult, or unethical, or because it simply wasn't done. In all those situations we still want a handle on estimating causal effects even in the absence of an experiment. My name is Kay Brodersen, and in this session I'd like to focus on a particular tool that we've been developing and using at Google, called CausalImpact.

Now let's start with a simple example. Some of you might remember what happened on the 15th of January last year. On that day, in the morning at around 9:00 a.m., the Swiss National Bank decided to release the peg between the Swiss franc and the euro. And what happened? An exchange rate that had been incredibly stable for years, all of a sudden, within minutes, became subject to the free forces of the foreign exchange markets. Now, from a causal inference point of view, the question that we want to answer is: what was the effect of publishing that press release? And the answer is pretty simple, because we would probably all agree that this is what would have happened to the exchange rate had that press release not been published. In the language of causal inference, we have our observed data y1, and all we need to do to understand the causal effect is to estimate the counterfactual y0: what would have happened in the absence of the action whose effect we're interested in. Abadie and colleagues introduced and popularized the term "synthetic control": since we don't really have an experiment, there isn't really a control in the experimental sense, and yet we want to be able to estimate something that looks just like a control. That's important, because the difference between the counterfactual and the actual observed data is our causal effect estimate. And as a rule of thumb, whenever you draw a causal effect in a time-series chart, you want to draw it as a vertical arrow; whenever you see horizontal or diagonal arrows, they're almost always wrong.

Now, in real life things don't always look as clean as they do here. Your time series of clicks or conversions or sales or downloads might look more like this: something with erratic patterns, trends, day-of-week patterns, month-of-year seasonal variations. And yet the same thing holds: we want to be able to estimate what would have happened to our clicks or downloads or sales in the absence of the action that we're interested in. So again, we want to be able to compute that blue counterfactual that you see here, so that we can look at the difference between what actually happened and what we think would
have happened.

The tool I'd like to acquaint you with is called CausalImpact, and it's something we've been developing and using for a while at Google. What I'd like to convince you of is that you can run the kind of analyses you saw on the previous slide in just three lines of code. We'll work our way towards that.

Before getting there, let's take a step back and remind ourselves of what a causal effect really is. The problem of causal inference begins with a treatment. This could be a product release, or the beginning of an advertising campaign, or a change in your terms and conditions: anything that interacts or interferes with the market that you care about. Now, there are two things you can do. You can either administer the treatment (take the action, make that change, release that product), in which case we get to observe the potential outcome under the treatment; or you could decide not to take the action, in which case we get to observe the potential outcome under no treatment. The causal effect of the action is the difference between the two potential outcomes. And as you can already intuit from this slide, we can't take both actions at the same time: we can't release the product and not release the product, we can't change our terms and not change them at the same time. That means we can't ever observe both potential outcomes at the same time, and that means we can't ever observe a causal effect. We can never, ever observe a causal effect, because we can't observe both potential outcomes at the same time. So the only options, really, are to either run an experiment, where we observe some of the potential outcomes under the treatment and some under no treatment, or to use what statisticians refer to as observational analysis methods to try to understand and estimate causal effects in the absence of an experiment. That's going to be the focus today.

There's one exception, perhaps, and that returns to the Back to the Future movie that we started with. If we could travel back and forth in time, all of this wouldn't be a problem. We could go into the future (in the movie, by the way, in case you're wondering whether any of this came true, just look at your photographs from last year; that's the year that was the future in Back to the Future). So we could go into the future, purchase something like a sports almanac containing the sports betting results of the past 50 years, go back into the past, and observe both potential outcomes. The first one is the original, where Biff Tannen is not particularly successful, sort of getting by, versus the alternative potential outcome, where Biff Tannen is an ultra-rich real-estate tycoon sitting in his Tannen Tower. Again, the causal effect of taking that almanac back into the past is the difference between these two potential outcomes, and the movie shows us that if only we could travel back and forth in time, we could observe both of them and therefore get an exact observation of the causal effect.

Now, a helpful framework for thinking about causal effects, on our way to making this slightly more formal, is the so-called potential outcomes framework. It's an idea that's been developed and popularized by Neyman, Rubin, Holland, Imbens, and many others, and it starts with a simple table. If there's one slide for you to remember from this session, let it be this one. In this table, every row is an experimental unit: in a clinical trial this would be a patient, in a psychology experiment this would be a healthy subject, in a web-traffic experiment this would be a cookie or a user, and in an advertiser experiment this could be an advertiser. Now, the first thing we'll note down for all of our units is the treatment status: one for the treatment group and zero for the control group. In a randomized experiment, that column would be randomized; you would have allocated units to the treatment group in a randomized fashion. And now the trick really is
to distinguish between these two potential outcomes: the outcome under the treatment, y1, and the outcome under no treatment, y0. You can see from this table that we know what the treatment group looks like under the treatment (those are the ticks), but we'll never, ever know what the outcome under the treatment would have looked like for the control group, had they been treated (those are the question marks). And vice versa: we'll never, ever know what the treatment group would have looked like had they not received the treatment. Then, finally, there are often other variables, fixed characteristics of our units, which I'm just collectively referring to as covariates here. These are things that we often measure during the pre-period, before the experiment.

Now, sometimes you might see people analyzing an experiment by running something like a t-test or an ANOVA. What they're doing in that case is comparing the observed outcomes in the treatment group to the observed outcomes in the control group, and that's fine. However, oftentimes we can get more flexible and statistically much more powerful estimates by taking a different approach: by actually estimating each of the potential outcomes that are missing. In other words, if we complete that table with an approach known as imputation, we get an estimate for all the missing potential outcomes, and we can then just read off our causal effect estimate directly from that table.

Now, this is something that works beautifully well in a randomized experiment, but as we said at the beginning, there are lots of situations where you can't run an experiment, because it's too expensive, or too difficult, or unethical, or it simply wasn't done. In those cases our world looks a bit more like this: we have one market that we care about, and that market is treated. We've already shipped our product, we've already changed our terms and conditions, or we've already launched our advertising campaign everywhere. So we know what happened; what we don't know is what would have happened had we not taken this action. Our strategy here is going to be to estimate that counterfactual, yi0, and then just compare the two. All of the machinery is going to go into estimating that counterfactual, using statistical models, deep learning models, you name it. Inference is going to proceed by doing this repeatedly: on the first iteration we get one estimate, and we repeat that, so that across all of these iterations we get a distribution of the causal effect. Because that's really what we're interested in: a distribution that tells us, here's our posterior mean, a point estimate of the causal effect, and we can use that same distribution to quantify something like a credible interval or a confidence interval for our effect.

Now let's make all of this concrete by going back to our example. In this time series, the only thing that we know is that we did something to the market on the first of January 2012; that's where we launched the product or the campaign. That splits our time series into two parts, the pre-period and the post-period, and what we want to do is compute an estimate of the counterfactual, an estimate of what would have happened had we not taken that action on the first of January. A proper statistical approach demands that we compute not just a point estimate but, through these iterations, through these repeated simulations, a credible interval around those predictions, telling us the degree of uncertainty inherent in our predictions.

Now, you might ask how we would ever come up with such a specific prediction of the counterfactual, and the trick really is to use other time series which are related to our outcome of interest. Here are two examples: red and green are other time series which are themselves not affected by the treatment but which are predictive of our outcome, for example web searches for our industry or web searches for our competitors' products, or the stock market, or even the weather. All of these can be useful time series that are correlated with our outcome of interest, yet not themselves directly affected by the treatment. That makes them great predictors. For example, in this case here you can see that whenever the green time series goes up, so does the black one, and similarly the red time series has this holiday spike just before the end of the year, and again you can see that in the black time series as well.

So here's our strategy, then, in a nutshell: we want to train a model (a statistical model, a machine learning model, any kind of model really) to learn the relationship in the pre-period, to learn how we can explain black as a function of red and green. Then we're going to apply that model in the post-period, and that prediction is going to give us our counterfactual estimate. We train our model in the pre-period and then apply the model in the post-period. That's the entire idea behind synthetic control estimators.

Now I'm pushing that plot to the top of the slide, and what I'm showing at the bottom, in the second panel, is the difference between the observed data and the counterfactual. You can see that this represents our pointwise causal effect: it's hovering around zero in the pre-period, then shooting up right after the action was taken, and decaying back down. Now, in this case I injected a causal effect just for illustration, and the true causal effect in this example is what you can just about see by the green solid line in the second panel. All of that goes to show that in this case the method correctly recovered the true causal effect from the noisy data. As I mentioned, you can use any model here; the particular modeling approach that we prefer is a family of models known as Bayesian structural time series models, but this could be any other model. So the model that I've used here on
the previous slide is what you see here as a graphical model representation, distinguishing between observed data and latent states. This is the kind of model implemented in the CausalImpact tool, but you could use any other model as well.

Now let's take a look at a concrete example. Google AdWords connects users and businesses in all those cases where the best answer to a search query might be a business, like your business. An obvious application for a tool like this, then, is to ask: how did my advertising campaign perform, how well did we do, was it worth the spend? So here's an example of an advertiser who started an advertising campaign in week zero of this time-series chart. The black solid line is the number of clicks this advertiser got on every day in the US, and the blue dotted line is our counterfactual estimate; it's telling us how many clicks this advertiser would have gotten every day had they not run this advertising campaign. If you sum this up over the six weeks during which the advertising campaign was run, you end up with about eighty-five thousand clicks. That's the incremental effect of the campaign.

Now, you might ask: how do we know that these are accurate estimates? That's an important question, and we need to make sure that we validate these sorts of methods against cases where we have ground truth. One way of doing that is to run a randomized experiment, to go back to that world where everything is clean and controlled and simple. In fact, that's what this advertiser did: only half of the US was targeted by this campaign, the other half was exempt. So the second plot here shows how you would traditionally have analyzed this sort of data, where the black solid line shows the number of actual clicks and the blue dotted line shows the number of clicks in the control group, which didn't receive any advertising. Now, the fact that these two panels look almost identical tells us that in this case the method
did an incredible job at estimating how many clicks we would have had without the campaign, even without any access to a control group.

CausalImpact is a tool that we found really helpful in our own analyses, and that's why we decided to make it available as open-source software. CausalImpact is available on GitHub; there's a blog post describing it, and there's a paper that goes into a lot more detail about the methods underlying the tool. But let me show you in practice how this works, and how each and every single one of you can run their own CausalImpact analysis in a couple of lines of R code.

Okay, so in this case I'm looking at RStudio here, and I've prepared a little toy data set which I'm just going to load into my R session. This is a data set of 100 observations with an outcome variable called y and a single predictor time series called x1. In practice you would typically have a handful, or perhaps even dozens, of these predictor variables, and the tool uses a spike-and-slab prior to automatically find out which of these predictors are useful. Let's take a quick look at these data. So this is what the data look like: the black solid line here is the actual observed data over time, and the red dotted line is our predictor variable. You can see that the two are really correlated in the pre-period, so that's a great setup for this analysis, and then you can see that they diverge in the post-period, after time point seventy. Time point seventy, that's the point where we took our action. Now, we could probably all roughly draw what we think would have happened to black in the absence of the treatment; perhaps it would have gone down slightly, since that's what the dotted red line is doing. But we really want to formalize that, so that we can not just convince ourselves but also convince our colleagues or customers with this kind of analysis.

So the only thing I'm going to specify here is the pre-period and the post-period, indicating that time point seventy was the date when we started our action. That's really the only thing that's needed to run this most basic kind of analysis. You could go a lot further and use more complex models, but I think it's important that the energy barrier is extremely low. So that's the only thing we need to run a CausalImpact analysis: providing the data and indicating the pre- and the post-period. In the time that just elapsed, what the tool did is it inspected the data, constructed a Bayesian structural time series model, used a Gibbs sampler for posterior inference, and summarized all of the results in this impact object.

So let's take a look at this object. I can just run the plot function on it, and this returns a ggplot object, so I can interact with it in the usual way, for example to increase the font size. Okay, so this plot shows us three panels. The top one is the original time series and our counterfactual estimate, and it maps onto our intuition that the time series would have meandered down slightly in the absence of the treatment at time point 70. The second panel shows the difference between the observed data and the counterfactual estimate; that's really our pointwise causal effect, and you can see it's around ten. And then, finally, the third panel is something that makes sense for summable flow quantities like clicks, conversions, or sales, where we're just adding up these causal effects over time, so we get a cumulative causal effect that's around 300 by the end of the treatment period.

If you care about a quantitative summary of what we just saw in these plots, you can call the print method on this object, and you get a table that tells us what the actual activity was like and what the activity would have been like without our action. Those are really the kind of numbers that you'd want to summarize in a report. Now, if you're anything like me and you have a sort of healthy disregard for actually writing down these
reports in prose, then I recommend this command over here, which I'll just copy and paste, and which gives you a nice prose summary telling you what you just did, what the data were, what the results look like, and how to interpret them.

With that, let me take a moment to thank the many colleagues and friends who've been involved in this work at various stages during the development cycle, in particular Steve Scott, who's the author of the bsts package that CausalImpact is based on, as well as Lars, Nicolas, Penny, Fabian, Simon, Rowan, and others, all of whom have been really strong supporters of and contributors to this work. And with that, finally, let me thank you for your attendance and attention. Thank you.

[Moderator] Okay. I don't know how you manage to make such a challenging problem sound so simple. Any questions for Kay? Please, yes.

The question is: where do we get these other time series from, these predictor time series? That's really where you as an analyst want to focus your energy, rather than on the modeling mechanics themselves; that's what this tool tries to enable. In practice, other countries are often a great source of control time series: markets where you didn't take your action. The stock market can be a great resource, as can indices like employment or labor-market indices. The weather can be a fantastic predictor, because it's arguably one of the things that is definitely unaffected by whatever we do to our markets. Google Trends is a fantastic source of time series: Google Trends is something that you can access to look up the number of search queries received for any concept you like over time, and often we see Google searches as a sort of unmoved mover in our analyses, things that indicate, for example, an interest in our industry or our products without being directly affected or moved by the actions that we take.

[Audience] It seems to me that being able to measure the impact of an investment, in this case an advertising campaign, makes it pretty trivial to derive or calculate the return on investment of any campaign, for instance. Have you seen any interest, following the publication of your paper, in building tools around this library?

I think it's a great question. Extending this analysis, taking the results and then basically dividing your impact by your investment, is the natural step that you almost always want to take based on the results. We've been really excited and amazed by the breadth of applications we've seen, in fields that we had never thought about, after we open-sourced this tool, and return-on-investment analyses are a great example of that. Okay, anyone else? Yes.

[Audience] On the topic of the previous question: how do you prevent finding spurious correlations among all those control time series? Are there some best practices?

It's a great question: how do you prevent finding spurious correlations? Imagine you stick in hundreds and hundreds of time series; there are always going to be some that are perfectly correlated with the outcome variable in the pre-period. The best safeguard against this is to back-test your method in the past. For example, just before you actually run the analysis on your period of interest, test your analysis on a past period where you didn't take any actions, and verify that the tool correctly tells you that there was no effect. Back-testing the analysis is a great way of validating that your method works and doesn't pick up spurious correlations. Anyone else? I think over here we have a question.

[Audience] Hi. Do you have a method to decide how many time series you need to understand the impact? Because in the first example you had two time series, and then in the R example you had one. Do you have some way to decide whether one is enough? I don't know if the question is clear.

It's a great question: how many predictor time series do you typically want to use? I think, as a rule of thumb, the best analyses I've seen use something between 5 and 20 time series. That's an amount that's manageable: you still understand what each of these time series means, rather than throwing in lots and lots of potentially spuriously correlated series. At the same time, the toy example that uses just one time series is really a toy example; you wouldn't want to do that in practice, because you'll be completely at the mercy of that one time series. What often happens is that in one of those time series there might be spikes and dips, and without other time series to guard against that, you'll be relying on exactly what that one time series did. So in practice, using a handful, up to a dozen or two dozen, of time series is typically the thing that works best. Yes?

[Audience] Okay, can you use this method to calculate the impact of multiple events, even if they overlap in time?

Can we use this method to analyze multiple events that overlap in time? That's a really interesting and, I think, mostly open research question. I'm really interested in this question; I'd love to chat with you afterwards. Very good.

[Audience] Sorry, are there tools for analyzing what is contributing to, say, the width of the confidence interval during the treatment period, and maybe getting in and trying to reduce it?

It's a great question. Statistical power is really directly related to how wide your confidence intervals are. If your confidence intervals are too wide, for example, if we go back to this case here: at some point, the longer you predict into the future, the wider your confidence intervals get, reflecting the intuition that the further our predictions are from the actual observed data, the more uncertain we should be. There are lots of things that contribute to this. The better your predictor time series explain your outcome, the tighter your CIs will be. The less state noise and observation noise there is in your outcome time series, the more stable your predictions will be. Typically, the more
predictor time series you have, the tighter your CIs will get. So there's a bunch of factors that go into this. On the flip side, if you don't have any predictor time series with strong predictive power, then your results will just reflect that: they will tell you there's nothing we can say, it's just too uncertain.

[Moderator] Okay, very good. Lots of questions. Thank you very much, Kay, that's been awesome. Okay, we have 45 minutes for lunch break. Thank you.
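The potential-outcomes table and the imputation idea described in the talk can be made concrete with a tiny simulation. This is a hypothetical sketch, not code from the talk or from the CausalImpact package, and all names and numbers are made up: every unit has two potential outcomes, we only ever observe one of them, and randomized assignment makes the simple difference in observed group means an unbiased estimate of the average causal effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Two potential outcomes per unit: y0 (no treatment) and y1 (treatment).
# The true average causal effect here is 2.0 by construction.
y0 = rng.normal(10.0, 1.0, n)
y1 = y0 + 2.0 + rng.normal(0.0, 0.5, n)

# Randomized treatment assignment: we observe y1 for treated units
# and y0 for control units, never both for the same unit.
w = rng.integers(0, 2, n)
y_obs = np.where(w == 1, y1, y0)

# Difference in observed group means: an unbiased estimate of the
# average treatment effect, despite the missing potential outcomes.
ate_hat = y_obs[w == 1].mean() - y_obs[w == 0].mean()
ate_true = (y1 - y0).mean()
print(round(ate_true, 2), round(ate_hat, 2))
```

With a large n the two printed numbers agree closely, illustrating why the t-test-style comparison the talk mentions is a valid (if not the most powerful) analysis of a randomized experiment.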
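The synthetic-control strategy (train a model on the pre-period, predict into the post-period, read the effect off the difference) can be sketched in a few lines. This is an illustrative toy, not the CausalImpact implementation: CausalImpact uses Bayesian structural time series with a spike-and-slab prior, while this sketch substitutes ordinary least squares, and all series and numbers here are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T, t0 = 100, 70          # 100 time points; the action happens after t0
t = np.arange(T)

# Two control series that are predictive of the outcome but themselves
# unaffected by the treatment (e.g. industry searches, the weather).
x1 = 10 + 0.05 * t + rng.normal(0, 0.3, T)
x2 = 5 + np.sin(t / 6.0) + rng.normal(0, 0.3, T)

# Outcome: a fixed combination of the controls, plus an injected
# true causal effect of +10 per time point in the post-period.
true_effect = 10.0
y = 2.0 * x1 + 3.0 * x2 + rng.normal(0, 0.5, T)
y[t0:] += true_effect

# Train on the pre-period: fit y ~ [1, x1, x2] by least squares ...
X = np.column_stack([np.ones(T), x1, x2])
beta, *_ = np.linalg.lstsq(X[:t0], y[:t0], rcond=None)

# ... then apply the model everywhere to get the counterfactual.
y_hat = X @ beta
pointwise = y - y_hat              # pointwise effect: near 0 pre, near 10 post
cumulative = pointwise[t0:].sum()  # cumulative effect over the post-period

print(round(pointwise[t0:].mean(), 1), round(cumulative, 1))
```

The pointwise series plays the role of the second panel in the CausalImpact plot, and its running sum plays the role of the third (cumulative) panel for summable quantities like clicks or sales.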
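The back-testing safeguard from the Q&A can be sketched the same way: place a placebo intervention date inside the known pre-period and check that the estimated effect there is roughly zero. Again a hypothetical sketch with simulated data, with least squares standing in for the real model; the helper name `estimated_effect` is made up for this example.

```python
import numpy as np

def estimated_effect(y, X, t0):
    """Mean pointwise (observed - counterfactual) after a cutoff t0,
    with the counterfactual fit by least squares on data before t0."""
    beta, *_ = np.linalg.lstsq(X[:t0], y[:t0], rcond=None)
    return float((y - X @ beta)[t0:].mean())

rng = np.random.default_rng(2)
T, t0 = 100, 70
x = 10 + np.linspace(0, 3, T) + rng.normal(0, 0.3, T)  # one control series
X = np.column_stack([np.ones(T), x])
y = 3.0 * x + rng.normal(0, 0.5, T)
y[t0:] += 8.0                      # the real effect starts at t0

# Placebo run: pretend the action happened at t=50, inside the true
# pre-period. A trustworthy setup should report roughly zero here.
placebo = estimated_effect(y[:t0], X[:t0], 50)
real = estimated_effect(y, X, t0)
print(round(placebo, 2), round(real, 2))
```

If the placebo estimate came out far from zero, that would be a warning that the control series are spuriously correlated with the outcome and the real estimate should not be trusted.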
Info
Channel: Big Things Conference
Views: 45,399
Rating: 4.9956522 out of 5
Id: GTgZfCltMm8
Length: 30min 38sec (1838 seconds)
Published: Tue Dec 13 2016