Inferring the effect of an event using CausalImpact

Captions
So, welcome back. Thanks, Angie. It's great to be back in London, it's great to be here, and it's really great to see how the conference has evolved so much. Now, to get started: you've probably all been there. You've just finished some project which you hope had a positive impact on your business, and now the only thing that's left to do is to evaluate whether your project was a success. That sounds easy, but in practice, if you think about it, it means you need to understand what actually happened and what would have happened if you had not run your project, and then look at the differences between the two. That's exactly what puts your question right at the heart of causal inference. Causal inference is the branch of statistics that's concerned with the effects of our actions, with the consequences of what we do, and that's why causal inference is one of those key areas of data science: if you can manage to extract one causal law from your data, that can be more powerful than all the dozens and dozens of correlations you might find otherwise. So in this session I'd like to focus on causal inference techniques that you can use in your daily work, starting today. I'm Kay Brodersen, and I'm a data scientist at Google, and the techniques I'd like to share with you today are things we've been developing and using for a long time at Google. Especially for those of you whose main area of expertise is in areas other than causal inference, my goal for today is to make sure that each and every one of you leaves this conference with a better intuition than before about how you can use causal inference in the strategic parts of your work. Okay, so let's jump right into an example. Two and a half years ago, on the 15th of January, in the early morning, the Swiss National Bank published a press release in which they declared that they would no longer support the peg, or
the fixed exchange rate, between the Swiss franc and the euro. Within minutes of that press release being published, the exchange rate between the two currencies, which had been rock solid for years, all of a sudden became subject to the free forces of the market. As causal inference practitioners, what we'd like to ask is: what was the effect of publishing that press release? The answer in this case is pretty easy, because everybody in this room would probably agree that this is what would have happened to the exchange rate if the press release had not been published. Now, in causal inference lingo, we're comparing the observed data, y1, to what we think would have happened otherwise: the counterfactual, an estimate of the counterfactual y0. Sometimes people refer to this as a synthetic control, a useful term that's been popularized by Abadie and colleagues to indicate that there is no experiment here. We don't have a control group, yet we want to create a control: an estimate of what we think would have happened. Creating that counterfactual is important because the causal effect that we're interested in is always the distance between the two, what actually happened and what would have happened. As a nice rule of thumb, whenever you look at a time-series chart, causal effects are always the vertical arrows; they never go across time, horizontally or diagonally. That's an easy one. In practice, though, our cases might look a bit more like this one here, where you've got some time series (this could be sales, clicks, conversions, anything of interest) and you did something at a certain point in time. Say you launched a marketing campaign, released a new feature, or changed your terms and conditions: something you did to your business. You want to figure out what would have happened had you not taken that action; you want to estimate that blue line. In this session we're going to focus on ways of doing
that. Once you have that estimate, the difference between the two is going to be the causal effect of your action, and one of the things I'd like to convince you of in this session is that you can run this kind of analysis in no more than three lines of code. So we'll see how we get there. Let's take a step back, though, and remind ourselves what we're really dealing with in this field of causal inference. Causal inference starts with a treatment. There are two things we can do: we can either take the treatment, in which case we get to observe the potential outcome under treatment, or we can choose not to take the treatment, in which case we get to observe the potential outcome under no treatment. Importantly, we can't do both of these things at the same time, which means we can only ever observe one of those two potential outcomes. That's a problem, because the causal effect of the treatment is the distance between those two potential outcomes. Since we can't observe both at the same time, we can never, ever observe a causal effect; we can only hope to estimate it through clever data acquisition and statistical modeling. Well, with one exception. How many of you remember the Back to the Future trilogy, which started in 1985? Quick show of hands. Everyone. For those of you born after 1985, here's a quick recap. What happens in Back to the Future Part II is that a sports almanac from the future (by the way, that was 2015, not quite so futuristic anymore) is taken from 2015 back into the past. That's a handy thing to have in the past: it allows you to bet and win just about every single time. Doing so turns Biff Tannen, originally a rather unsuccessful nobody, into an ultra-rich real-estate tycoon, shown here on the slide. So in that movie we actually get to observe both of these potential outcomes, and that means we actually observe the causal effect of taking something back into the past.
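The two ideas above (the causal effect is the vertical distance between what happened and what would have happened, and only one potential outcome is ever observed) can be made concrete in a minimal Python sketch. `PotentialOutcomes` and `pointwise_effects` are hypothetical names chosen for illustration; they are not part of CausalImpact, and the numbers are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PotentialOutcomes:
    """Hypothetical record of one unit's two potential outcomes."""
    y1: Optional[float]  # outcome under treatment (None if never observed)
    y0: Optional[float]  # outcome under no treatment (None if never observed)

    def effect(self) -> Optional[float]:
        # y1 - y0 needs both outcomes; in real data one is always missing.
        if self.y1 is None or self.y0 is None:
            return None
        return self.y1 - self.y0


def pointwise_effects(observed: List[float], counterfactual: List[float]) -> List[float]:
    """Vertical distances, one per time point, between data and a counterfactual estimate."""
    if len(observed) != len(counterfactual):
        raise ValueError("series must cover the same time points")
    return [y - y_hat for y, y_hat in zip(observed, counterfactual)]


treated_unit = PotentialOutcomes(y1=12.0, y0=None)  # we only ever see y1
print(treated_unit.effect())  # None
print(pointwise_effects([12.0, 13.5, 11.0], [10.0, 10.5, 10.0]))  # [2.0, 3.0, 1.0]
```

The only case where `effect()` returns a number is when both outcomes are filled in, which is the Back to the Future scenario; with real data, the counterfactual side has to come from a model, and `pointwise_effects` then yields an estimated effect rather than an observed one.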
In practice we can't do that, and so we'll have to resort to statistics to solve this problem. If there's one thing you remember from this session, then let it be this slide here, and for those of you taking pictures, I'll let you know when the slide is complete. Whenever you have a causal inference problem, my advice is to start with a table that looks like this. Every row in this table represents a unit: for example, this could be a participant in a psychology experiment, a user in a web traffic experiment, or a customer in a sales experiment. The first thing we write down for each of these units is their treatment status: one for the treatment group, zero for the control group. Just the standard setup; let's say that the treatment is randomized. Now, what the table makes explicit is the idea that there are two potential outcomes. It's not just the observed data; we're thinking in terms of things that we could potentially observe. The first one is y1, which we only observe for the treatment group: that's what happens under the treatment, and we'll never, ever know what y1 would have been for the control group, since they weren't treated. The other one is the potential outcome under no treatment, y0, which we observe for the control group, but we'll never know what y0 would have been for the treatment group, since the treatment group was treated. And then there are other variables. These are typically things that we record before the treatment started: pre-period covariates, variables that describe the units. They turn out to be very useful. I won't go into too much detail on those; let's just say that we have other variables, collected before the treatment started. Now, when you look at yourself or your colleagues analyzing a randomized experiment, typically you'll see people do something like this: you focus just on the observed outcomes, and you might run something like a t-test to compare your treatment group to the control group
to answer the question of whether there is a statistically significant difference, and that's the end of the analysis. So you'd derive your causal effect estimate from the observed outcomes alone. But it turns out there is often a statistically much more powerful approach, and that is to take a detour and first estimate the missing potential outcomes explicitly, for example using some sort of model, say a regression model. Once you have estimates of these missing potential outcomes, you can read off your causal effect directly. It's almost as if you had a complete data set, where everything that could be observed is now either observed or estimated. I'll quickly go back for those of you who'd like to take notes on this slide. This is the classical case of a randomized experiment, where things are easy. In practice, though, we often have situations where we don't have that luxury. Think of big events, big actions that your business might take: running a campaign, changing a feature, releasing a new product version. All of these are the kind of things where you don't have an alternative version of your company sitting around that you could treat as a control group. Think of big political and economic decisions: there was only one UK, and there was only one Brexit decision. There isn't a control group where you can see what would have happened otherwise. These are all the kind of situations where our table looks rather more like this one here, where we have one unit, say our company or our country. The treatment has already happened, and we know what happened under the treatment; what we want to understand is what would have happened without the treatment. If we can find an estimate for that potential outcome under no treatment, then we can determine the treatment effect simply by comparing the two. Importantly, we don't want to just have a point estimate; we want to repeatedly estimate these missing potential outcomes.
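Here is a minimal sketch of the two routes for a randomized experiment: comparing the observed outcomes directly with a t-test, versus the detour of imputing each treated unit's missing y0 from a regression fitted on the controls. It assumes a single pre-period covariate `x`, made-up data with a true effect of +1, and a chance imbalance in `x` between the groups; the helper names are hypothetical, and the regression is plain least squares rather than any particular library's model.

```python
import math
from statistics import mean, variance
from typing import List, Tuple


def welch_t(treat: List[float], control: List[float]) -> float:
    """Welch's t statistic for a difference in means (unequal variances)."""
    se = math.sqrt(variance(treat) / len(treat) + variance(control) / len(control))
    return (mean(treat) - mean(control)) / se


def fit_ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope for y ~ a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Made-up data: y0 = 2 + 3x + small noise, true treatment effect = +1.
x_control = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_control = [5.1, 7.9, 11.0, 14.1, 16.9, 20.0]  # observed y0 (controls)
x_treat = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5]        # chance imbalance in x
y_treat = [7.5, 10.6, 13.4, 16.5, 19.6, 22.4]   # observed y1 (treated)

# Route 1: compare the observed outcomes directly.
naive_diff = mean(y_treat) - mean(y_control)
t_stat = welch_t(y_treat, y_control)

# Route 2: model y0 on the controls, fill in the treated units' missing y0,
# then read the effect off the completed table.
a, b = fit_ols(x_control, y_control)
y0_hat = [a + b * xi for xi in x_treat]
imputed_effect = mean(y1 - y0 for y1, y0 in zip(y_treat, y0_hat))

print(f"naive difference {naive_diff:.2f}, Welch t {t_stat:.2f}")
print(f"imputed effect {imputed_effect:.2f}")
```

On this made-up data the naive difference comes out around 2.5 and is not significant, because the spread of y across x swamps it, while the imputed estimate lands close to the true +1. With a balanced covariate both estimates agree on average; the imputation route typically pays off by absorbing the variance that the covariates explain.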
In a Bayesian context, you typically do this by sampling from the posterior distribution over your missing potential outcomes, and doing that repeatedly gives you samples of your causal effect estimate, tau. You can then summarize these samples and, for example, report the posterior mean or a credible interval to indicate the uncertainty in your causal effect estimate. Okay, so that's the general strategy that we use here. We're in the world of causal inference, where we'd like to estimate causal effects; to do that, we need to estimate the missing potential outcomes, and once we've done that, we can read off our causal effect. So let's go back to our example and see how we can actually do this. This is the time series that I showed at the very beginning, and let's say that we did something to the market on the 1st of January 2012. That event splits our time series into two parts, the pre-period and the post-period, and that's really all we need to know: the event that happened on the 1st of January could affect the time series at any time in the future. What we want to estimate is something like this, an estimate of the counterfactual. The blue solid line here shows the point estimate, and the light shaded area represents a 95% credible interval. You can see how this interval, which represents our uncertainty, widens over time, corresponding to our intuition that the further you predict into the future, the more uncertain you should get. These are the kind of properties that we want to build into our models. Now, you might ask how on earth we came up with such a specific prediction of what would have happened; what was it even based on? The answer is that the most powerful ingredient in creating such predictions is other time series: time series which are related to our outcome variable, the black one, but which we know, or at least can safely assume, not to be affected by the treatment themselves. So here I'm
showing two covariates, red and green. These could be, say, sales of some other product, queries for your competitors' brands, or clicks on unrelated entities: things that are possibly related to your outcome variable Y but which you know are not themselves affected by the thing that you did on the 1st of January. Given these kinds of time series (there could be one or two, or dozens, or hundreds), the idea now is to split this whole problem into two parts. First, we learn the relationship between red and green and black, that is, how red and green explain black. Then, in the second part, we simply apply that learned relationship to predict what should have happened to the black time series, given what red and green did in the post-period. So that's the strategy: we turn a causal inference problem into a model selection and prediction problem, simply converting one problem into another. We learn how the three time series are related in the pre-period and then predict what should have happened to the black time series in the post-period, given that we already know what happened to red and green. Once we've done that, the difference between what actually happened to our black variable and what we predicted should have happened, the blue line, is going to be our estimate of the causal effect. So let me push that plot up to the top of the slide. What I'm showing here at the bottom is the delta curve, the difference between the observed data and our counterfactual estimate, and that's our estimate of the causal effect of the action. You can see how it wobbles around zero in the pre-period; that's reassuring, since we didn't have an effect on the past. Then it shoots up and decays back down to zero. I generated these data, which means we know the true causal effect, and that true effect is shown by the green solid line, so you can see how the model accurately recovered the true effect that we injected into the data. What kind of
models could you use for this? In principle, this could be any model. We started out with simple regression models; we then moved on to Bayesian structural time-series models, which are an extremely rich and powerful class of time-series models on a Bayesian foundation; and currently we're also experimenting with deep learning methods. The method that's implemented in CausalImpact, the library that I'll cover in a second, is the model that you see here on this slide, in graphical-model notation: a time-series model with several different components stitched together. There is a component that describes the current level of the time series, there is a component that describes the current trend, and then there is a mechanism for placing a so-called spike-and-slab prior on the covariates. That's a way for us to automatically identify which covariates are useful in predicting our outcome variable. That's just to give you a flavor of the kind of models you could use; in principle it could be any kind of model, but we've done most of our analyses using the kind of model that you see here, Bayesian structural time-series models. I'll show you one more example of that before I try a demo on my laptop. So here's an example from Google AdWords. You all know Google AdWords: it's the system that decides whether a good answer to your query on google.com is to connect you with a business by showing an ad. The data I'm looking at here is from an advertiser who ran an advertising campaign on google.com for six weeks. Our outcome variable is the number of clicks that this advertiser got to their website, and the counterfactual, of course, is how many clicks this advertiser would have gotten had they not run the advertising campaign. So the black curve is the number of actual clicks, and the blue dotted line is our estimate. The covariates that I used for this analysis are queries for this advertiser's competitors, other brands in the same market. These are amazing covariates,
which you can get for free using Google Trends; it's a great source for collecting these kinds of other time series. Now, in this case the advertiser actually ran a randomized experiment, and that's great news, because it means we can repeat the analysis using the real control group, the best way of getting causal inferences from this experiment. When we do that, we get this kind of picture. The two plots are almost identical; they certainly give us the same conclusions about the effectiveness of this advertising campaign. Note that the bottom plot is based on a randomized experiment, where you've held out half of your market from your campaign. The top plot doesn't need that: you can target your entire market, yet you arrive at the same conclusion, which shows the potential that synthetic control methods have. There's a lot of documentation for those of you who'd like to read more about this; there is a paper that goes a lot further into the theory of how the method works. What I'd like to focus on in this session is a quick demo. Everything that you've seen so far is implemented in an R library called CausalImpact, and we open-sourced this library recently, so I'd like to give you a quick overview of what the library does. Okay, so I've got a little toy data set here. It's a very simple data set: it's got 100 observations and two variables. y is our outcome variable, and x1 is our single covariate. I've seen a lot of analyses in the wild where people use just one covariate; in practice you want to use at least a handful, or even dozens, of covariates to get a richer model. Let's take a quick look at these data. Okay, so this is what we have: the black line here is our outcome variable, and the red dotted line is our single covariate. You can already see how the two are correlated. It's a super simple toy example; things are nice and easy. We did something to that system at time point 71, and so our question now for
this method is: what do we think would have happened to that black time series after time point 71, given what we know happened to the red dotted line? Just by squinting at the time series, everybody can probably estimate that the black time series wouldn't have shot up quite as much; given how the two are correlated, it would have meandered slightly lower. Now, I can run the simplest possible type of analysis by specifying just two more things: the pre-period, which goes from time points 1 through 70, and the post-period, which goes from time point 71 to 100. That's all I need to specify for this simple kind of analysis. What's happening now in the background is that we're fitting the graphical model that you saw on the previous slide: building a model based on the pre-period, then using that model to predict what should have happened in the post-period, sampling from it, and getting a posterior distribution of our causal effects. Let's take a look at this object. This is a ggplot object, so I can work with it in the usual ggplot way. There are three panels here. The top one shows the original data in black and a counterfactual estimate as a blue dotted line, with 95% credible intervals around it. You can see that, as expected, our synthetic control says the time series would not have shot up as much; it would have stayed at a slightly lower level after time point 71. The second panel shows the delta between the two, telling us that we have a causal effect of around 10 at every time point, and the third panel simply adds up these individual daily causal effects over time, so across that period we end up with a total causal effect of around 300. To look at the actual numbers, I can take a look at the summary of the posterior inferences here. This, for example, tells us that the observed data had an average value of 117 in the post-period, but we would have expected a value of only 107, and so the difference between those
two is our causal effect estimate: our average causal effect in absolute terms and in relative terms. The analysis tells us that we are supremely confident that there indeed was a causal effect in our data. If our covariates weren't rich enough, or weren't at all predictive of our time series, then this analysis would probably tell us: we don't know; there might be a causal effect, but we're not 99.9 percent certain. That can often be just as useful an insight. Suppose you ran an advertising campaign and your analysis told you that there was absolutely no effect: you can take your budget and put it somewhere else. So null results in this sense can be just as useful as positive results. And finally, for those of you who have a healthy disregard for writing up the results of your analysis (once you've written the usual kind of report a few times, you kind of get bored of it), the package has a function that describes what your data was, what your model was, what results you found, and what they mean. You can copy and paste that into whatever you like; there's no copyright whatsoever here. Okay, let me quickly thank the many colleagues and friends who've been contributing to this work over the years, in particular Steve Scott, who is the author of the bsts library that implements the kind of Bayesian structural time-series models you saw; Alain Hauser, who is the current maintainer of the package; as well as [names partly inaudible], Fabian, Simon, and others. And finally, thank you, amazing audience, for your attendance and attention. Thank you. [Applause] Okay, we have a couple of minutes for questions. A while ago, maybe about a year ago, I was asked to find causal effects in a set of time series; not a very big set, maybe a hundred. The question on the business side was: there's a trend here, in this geographical region; could it cause the trend to blow up a few months down the line there, right?
And so I dug into this problem. I followed, you know, Granger causality, [inaudible], dynamic Bayesian networks, and looked into it. In this case there's no control and no pre-period; there isn't an intervention, you just want to find out the causal structure. I was very surprised by how difficult it was to go beyond a couple of time series, and how little documentation and work there is, except from some people in bioinformatics looking at brain signals, etc. So do you think there's going to be a movement towards creating such tools in the future? Might you guys create one? Okay, so the question is: suppose you have multiple time series and you want to understand the causal structure between them without a specific intervention. This is a whole subfield of causal inference in itself, identifying the causal graph from the data, and it's distinct from the situation that we're looking at here: here we have a specific event, and we want to understand the effect of that event. I can point you to a bunch of sources; perhaps you already know the literature. There's a huge literature on identifying causal structure from a set of time series. I'm glad you drew the biology comparison, because that's an important one: when we run experiments, we don't just withhold treatment; we give something that is identical to the thing that we're giving the patient except for one component. So we give them a placebo, because we know that doing stuff always has an effect, and we're interested not just in doing something; we want to know the difference between the thing that they're doing versus just doing stuff. So I just wonder if you had any thoughts on that, because obviously that's not catered for at this point: if you do something, it's going to have an effect. Right, it's a great question. In any real treatment there are always lots of things that you did. Take a real business
example: say I'm releasing a new device and seeing what effect that has, or releasing a new product, or a new pricing strategy. The important thing to keep in mind is that your interpretation hinges on your control set: whatever is captured in your covariates is going to be your baseline. Some of the best-known examples of this kind of approach, not in a Bayesian setting but from earlier times when models were simpler, look at policies and ask things like: Congress passed a new bill; what was the effect of that bill on the economy? There isn't really a classic control, but you can use a whole set of other countries, see how well they explained your country of interest in the past, and then use that same approach. And of course, in this case, the only option you have is to say that any difference between what I've observed and what I had expected, I can only attribute to the treatment. The speakers will be available just after the talks if you have last questions. So, thank you. [Applause]
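The demo's workflow (fit on the pre-period, predict the post-period, then report pointwise and cumulative effects plus a summary) can be sketched in plain Python. In the R package itself, the analysis is a call along the lines of `impact <- CausalImpact(data, pre.period, post.period)` followed by `plot(impact)` and `summary(impact)`, with `summary(impact, "report")` producing the written report. The sketch below is not that implementation: it swaps the Bayesian structural time-series model for ordinary least squares on one covariate, and swaps posterior sampling for a simple residual bootstrap, so its interval is only a rough stand-in for a credible interval. The toy data (an AR(1) covariate around 100, y = 1.2 * x1 plus noise, a lift of 10 from time point 71 on) and all function names are assumptions for illustration.

```python
import random
from statistics import mean
from typing import Dict, List, Tuple


def fit_ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope for y ~ a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def simulate(seed: int = 1, n: int = 100, start: int = 70,
             lift: float = 10.0) -> Tuple[List[float], List[float]]:
    """Hypothetical toy data: y tracks covariate x1; y gets `lift` from index `start` on."""
    rng = random.Random(seed)
    x1, level = [], 0.0
    for _ in range(n):
        level = 0.95 * level + rng.gauss(0.0, 1.0)  # AR(1) wander around 100
        x1.append(100.0 + level)
    y = [1.2 * xi + rng.gauss(0.0, 1.0) for xi in x1]
    y = [yi + (lift if t >= start else 0.0) for t, yi in enumerate(y)]
    return x1, y


def estimate_impact(x1: List[float], y: List[float], start: int,
                    draws: int = 1000, seed: int = 2) -> Dict[str, object]:
    """Pre/post analysis; index `start` is the first post-period time point."""
    # Stage 1: learn how x1 explains y, using the pre-period only.
    a, b = fit_ols(x1[:start], y[:start])
    residuals = [yi - (a + b * xi) for xi, yi in zip(x1[:start], y[:start])]

    # Stage 2: predict the counterfactual for the post-period from x1 alone.
    y_post = y[start:]
    counterfactual = [a + b * xi for xi in x1[start:]]
    pointwise = [yo - yc for yo, yc in zip(y_post, counterfactual)]
    cumulative, running = [], 0.0
    for d in pointwise:
        running += d
        cumulative.append(running)

    # Rough stand-in for posterior sampling (NOT what CausalImpact does):
    # perturb the counterfactual with resampled pre-period residuals.
    rng = random.Random(seed)
    totals = sorted(
        sum(yo - (yc + rng.choice(residuals)) for yo, yc in zip(y_post, counterfactual))
        for _ in range(draws)
    )
    return {
        "observed_mean": mean(y_post),
        "predicted_mean": mean(counterfactual),
        "average_effect": mean(pointwise),
        "relative_effect": mean(pointwise) / mean(counterfactual),
        "pointwise": pointwise,    # panel 2: vertical distances
        "cumulative": cumulative,  # panel 3: running total of panel 2
        "total_interval": (totals[round(0.025 * (draws - 1))],
                           totals[round(0.975 * (draws - 1))]),
        "prob_positive": sum(t > 0 for t in totals) / draws,
    }


x1, y = simulate()
res = estimate_impact(x1, y, start=70)  # time points 1-70 pre, 71-100 post
print(f"average: observed {res['observed_mean']:.1f}, predicted {res['predicted_mean']:.1f}")
print(f"average effect {res['average_effect']:.1f} ({res['relative_effect']:.1%})")
print(f"cumulative effect {res['cumulative'][-1]:.0f}, "
      f"95% interval {res['total_interval'][0]:.0f} to {res['total_interval'][1]:.0f}")
print(f"share of draws with a positive total effect: {res['prob_positive']:.3f}")
```

One design consequence worth noting: a per-time-point band built this way would have constant width, whereas the talk's structural time-series model lets the uncertainty widen with the forecast horizon, which is one of the properties that model adds over plain regression.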
Info
Channel: Kay Brodersen
Views: 2,436
Id: y3hLJnB6O7c
Length: 29min 48sec (1788 seconds)
Published: Sun Sep 03 2017