Causal Models in Practice at Lyft with Sean Taylor - #486

Captions
All right everyone, I am here with Sean Taylor. Sean is a staff data scientist at Lyft working on Rideshare Labs. Sean, welcome to the TWIML AI Podcast.

Thanks Sam, happy to be here.

I'm super excited to chat with you. It's been a long time in the works, and I'm really looking forward to our conversation. I like to get these interviews started by having you share a bit about your background and introduce yourself to our community. How did you get started working in data science?

Thrilled to be here. It's always fun to reminisce about how you ended up where you are. When I think about the journey to being a data scientist, it goes all the way back to college, working on real estate research with a professor I used to work with at Penn and getting into geospatial data there. But it's been a long process since then, and probably the most pivotal thing was working in grad school on large-scale experimentation. In my grad program I was studying to be a social scientist, studying how people influence their friends online, right around the era of big data, when people were getting really interested in Hadoop and Hive and running large-scale experiments. I got very lucky and got an internship at Facebook as the 30-year-old intern, sort of like that movie The Intern. I got a great set of mentors at Facebook, Dean Eckles and Eytan Bakshy were awesome, and they taught me everything I needed to know about being a data scientist there. Then I decided to stick around, so I was a data scientist at Facebook for about seven years, and about two years ago I switched over to Lyft, where I started working on marketplace experimentation and other kinds of work. Lyft has a very different set of problems than Facebook. You can trace that journey back 20 years if you want to, or maybe just 10, but it's still adding up to a lot of time at this point. Feeling old.

Nice, nice. So tell us a little bit about Rideshare Labs. What's the mission there?

That's a great question. At a company like Lyft, things are often broken into products, and products have a roadmap. We have teams that run certain algorithms, like our pricing algorithm, our dispatch algorithm, or ETA prediction, and even the product itself, the driver side of the app and the rider side of the app. A product roadmap has to be very reliable: you have to deliver progress on a certain schedule so you can meet your business requirements. But you'd also like to try newer and more innovative things, so we carve out some time and space for scientists to work on ideas that might not pan out. They're big bets, but you don't want to bet your whole company on them, so we can incubate them within Labs and try things that have maybe an under-50% or under-25% hit rate, but when they do hit we get a big boost out of them. We like to create that space for scientists, and we have an engineering team as well that helps us implement those ideas and get them into practice. The playbook is really to take the thing we work on, get it into production, and then do some kind of hand-off to a production team that can take the thing we've built and run it in production. So we create that space for innovation within the company.
Nice. And until recently you had more of a managerial role on that team, and it sounds like you kind of swapped jobs with someone to get more hands-on. I'd love to hear more about the story there and the motivation. It's not something you see a lot of.

Yeah, it is an unusual move, and people have asked me quite a bit about it. I think I was very lucky, number one lucky to get the job in the first place. I wasn't hired to be head of Rideshare Labs at Lyft, but I took the role last summer after the departure of the previous manager of the team, and so I took on this new role of trying to plan and coordinate research for a large team; we have about 13 people on Labs. It's a very different job. You're not doing a lot of hands-on science work anymore; you're helping people get unblocked and making sure they have what they need to be great researchers and do excellent work. I really enjoyed that, especially the mentorship side of things, where you get to see people really thrive and build awesome stuff, and you get to come along for the ride with what they're doing. But I had the itch to do some more hands-on work myself, and I think it's really hard to scratch that itch from a managerial role. There's the Paul Graham essay about the maker's schedule and the manager's schedule, and it's very true: a lot of your time gets eaten up by things that are really valuable, but they don't allow you to accumulate progress on projects. I got very lucky that there's this guy Nick Chamandy, a really excellent manager at Lyft who's been there a long time, who was willing to step in and take on the director role. He took on a lot of my management responsibility, and now I have a little bit of space to go work on some of those ideas that have been piling up in my brain for the last few months.

That's awesome. Is there any particular experience in your background, or an example, that gave you the, I don't know, courage, for lack of a better word, to make that leap? First to know exactly what it was that you wanted, and second to make it happen?

Well, I'm a big experimentation person, and I think that's a really important part of my philosophy both in business and in life. Trying new things is really important. You'll never learn whether you like something unless you try it; it's the same thing with foods as it is with careers, so you have to experiment a little bit. I have a bias toward action and trying new things, so that was one part of it. The other part was having a supportive company that would help facilitate something like that. In general it's pretty tough to shed responsibility at a big company; it's hard to find people who really want to take on the stuff you've been doing, the unglamorous stuff, so that you can go do the fun stuff, but I was lucky to be able to fill that gap. So those two things combined: the experimental mindset, and also, the thing that's been coming to mind a lot lately is this book called Flow. I talk to people about it all the time. I think getting into a flow state is really something you should try to make your work facilitate, and I really want to get back to doing stuff where I lose track of time and am able to make big progress on projects with a bit of time and space. So creating that space for myself became a big priority.
Yeah, at the risk of turning this into a productivity podcast, I'm curious if you've also read Deep Work, and how you compare Flow and Deep Work.

Oh no, I haven't read Deep Work yet. I have another Cal Newport book, what is it, the one about email, that I've been paging through, and I love all these ideas. I do think we've made a very distraction-filled environment for ourselves as workers, and the kind of work we do in data science really requires sustained attention. Some problems you just can't make any progress on unless you can make the space for them, both in terms of time and mental space. So it's a big goal for me personally, and it's a big goal for me as a manager when I manage people, to make sure they have that space. I even go through the preconditions for a flow state in the Flow book, and you can apply that as a management philosophy: are the people on my team able to get into a flow state, and what kinds of distractions are blocking them from making progress? I think it's just a really important part of being an effective scientist and researcher.

Nice, nice. So you mentioned experimentation and how core that is to your philosophy. I think anyone who follows you on Twitter, and folks should, knows that you're very excited about stats, and you have maybe more of a stats-oriented, experimentation-oriented bent than some other folks in the ML Twittersphere. I'd love to hear you riff on that orientation and how you think about the relationship between stats and your work and ML and AI.

That's a great question to reflect on. I have kind of branded myself as a statistician, and I like hanging out with the statisticians, because there's a really old lineage of ideas there to lean on, all the way back to Fisher and his work on experimentation. Then you have Savage, one of the original statisticians, and when he was thinking about statistics he was thinking about decision problems at its core: how do we make more effective decisions, how do we as humans make decisions optimally, how would we as a business, or someone working in agriculture in Fisher's case, make better decisions? I really like that pragmatism of statistics, geared toward a particular application, with some real-world problem you really want to solve. That's the way I think about AI and machine learning and statistics: they are tools we use to make something work better, to achieve some new capability, and there's a long tradition of that in statistics. That doesn't mean I'm a really rigorous statistician; in fact, I think I'm not good enough at math to be one of them. But there are a lot of great ideas to be borrowed from that old tradition that we're constantly reinventing and don't really need to. You can go back and read these old papers and they have all the same wisdom in them that you can read about today; maybe we just call the methods different things, and they're more flexible and more scalable, but at the end of the day we're trying to solve pretty similar problems to what people have been trying to solve for hundreds of years.
Nice, nice. I'd love to jump into some of the things you're working on at Lyft, and one of the things you're involved in is the forecasting effort. Can you tell us a little bit about that?

Yeah, I continue to be branded as a forecasting person. You write one lousy forecasting package and everybody wants you to work on forecasting forever. But it is a really interesting problem, because I think forecasting at its core is a really human-centered modeling problem. Forecasts are often consumed by humans: people look at them, they have some intuition for what they should look like, and they really want to use them to make better decisions. So you need a flexible system, because ultimately a forecast has to be a human-in-the-loop decision-making system to be effective. Humans have to help inject domain knowledge into forecasts, making them better through what they might know about what's likely to happen in the future. That's an interesting part to get right, and then there's using the forecasted information to make a better decision and closing the loop, which is another human-in-the-loop piece of it.

At Lyft, what we frequently need to do is plan our market management. There's a supply side to Lyft, which is drivers showing up and using the app to provide driver-hours to the marketplace, and there's the demand side, which is people requesting rides, and these two things can get out of balance pretty easily. If we grow the driver pool too quickly and have too many drivers, that could be bad: they wouldn't earn very much per hour if there were too many drivers on the road. Likewise, over-demanded situations, where there's too much demand for the number of drivers, are pretty disastrous. If you've ever opened the Lyft app and seen a 25-minute wait time, it just means we did a bad job of planning how many drivers we were going to need. But we have tools to address market imbalance: we can spend money on incentives on the supply side or the demand side of the market, and so the forecasts become really pivotal in deciding how we do that. We have to set these real-valued policy variables every week, and actually every hour, and we'd like to be able to plan that in advance and do a better job of it. So the planning really revolves around having a good forecast of our demand state in the future.

One of the really interesting bits about demand and supply is that we have control over those variables, so they're not purely exogenous variables we'd like to forecast, like the weather. We have to forecast not only what will happen if we don't do something, but what will happen if we do something. If we do something like raise prices, people will demand fewer rides and demand will go down, so we have to incorporate the effects of our previous decisions into the forecast. There's really a rich space of modeling problems just within forecasting, and it's never going to be as simple as taking a line and extrapolating it into the future. That's what's so exciting about it at Lyft: we really have to think about a system rather than just a particular model.
When you think about incorporating the potential decisions you could make into your forecasts, how do you close that loop? Do you end up using simulation techniques or other types of techniques to do that?

Yeah, in the end our forecasting system is designed around causal models, so it's unique in that way. We think of it as a causal model where we have certain nodes in the graph that we control. Anybody who reads Pearl's books will see a DAG and think, oh, those are cool, but how do I use them? Well, we do use them at Lyft, and we use them to model our business. The nodes that have no parents, the pure parent nodes, are variables that we control, things like price levels and how much we spend on driver incentives. Then there are nodes that are marketplace outcomes, the things that happen, and there are other pure parent nodes we don't control, like how many people would show up organically and request rides. We have to do this business modeling in advance to create this set of models, and they're all linked together as one big structural model. That's a pretty exciting thing to do: you couple your forecasts together into a joint system so they're all internally consistent with one another. Some of those variables are your policy variables, and the forecast for those is actually a plan. When you're going to set those future values, you can't forecast them, because it's something you're going to do, so you fill in that vector of values with a plan and then ask, okay, under this plan, what would happen to these other variables?

The really exciting thing happening these days is that with differentiable programming and autograd everywhere, we can build a model where we put a plan in and it tells us what will happen, or we can flip it on its head and say, well, we have an objective, just find me an optimal plan. That's what I'm really excited about: we can take a forecasting model and say the purpose of the forecasting model isn't to produce forecasts, it's to produce plans, and those plans should help make some business objective happen. They translate directly into something actionable for the business, rather than something from which we'd have to derive what we're supposed to do based on the forecast.
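The episode doesn't include any of Lyft's actual code, but the idea described here, a differentiable structural model whose controlled parent nodes (prices, incentive spend) can either be run forward from a plan or optimized against an objective with autograd, can be sketched in a few lines of PyTorch. Everything below is invented for illustration (the functional forms, the assumed elasticity of -1.2, the $10 fare, the incentive response curve); it is not Lyft's model.

```python
import torch

# Toy structural model: controlled parents (price multiplier, incentive spend)
# feed downstream marketplace outcomes. All coefficients are illustrative.
organic_demand = torch.tensor([100.0, 120.0, 150.0, 90.0])  # exogenous forecast per period

def marketplace(price, incentives):
    # Demand falls with price (constant-elasticity form, elasticity of -1.2 assumed).
    demand = organic_demand * price ** (-1.2)
    # Driver hours rise with incentive spend, with diminishing returns.
    supply = 80.0 + 15.0 * torch.log1p(incentives)
    rides = torch.minimum(demand, supply)   # completed rides limited by either side
    revenue = rides * 10.0 * price          # $10 base fare assumed
    cost = incentives
    return revenue.sum() - cost.sum()       # toy objective: profit over the horizon

# "Flip it on its head": start from an arbitrary plan and let autograd improve it.
log_price = torch.zeros(4, requires_grad=True)   # log-parameterized so price stays positive
log_incent = torch.zeros(4, requires_grad=True)
opt = torch.optim.Adam([log_price, log_incent], lr=0.05)

for _ in range(500):
    opt.zero_grad()
    loss = -marketplace(log_price.exp(), log_incent.exp())
    loss.backward()
    opt.step()

print("planned prices:", log_price.exp().detach())
print("planned incentives:", log_incent.exp().detach())
```

Running `marketplace` forward answers "what happens under this plan"; wrapping the same computation in an optimizer, as above, answers "find me a plan that serves this objective", which is the flip he describes.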
Are you using a particular causal modeling toolset to build these, or have you built up your own framework from the ground up?

I'm lucky to have a team that's very interested in tool building, so they had a lot of the technology we needed, but it's built on top of PyTorch. That was an interesting design decision we had to make early on, what we were going to build the models in. We wanted something really flexible with autograd built in, so we chose PyTorch and built a lot of scaffolding on top of it for how the models link together into a holistic system. So we built a way to stitch together a DAG of many models that's composable and that can admit joint training or training of the individual models.

On the causal side, the hard part is actually coming up with prior experimental evidence for the causal effects of things. How do we know what the effect of a price change will be on demand? The best way is to find historical times when we've changed prices and figure out what happened in those circumstances, because we can't just use the data under no price changes to estimate that. So a lot of the hard part of the model was finding all the evidence we needed to estimate the slopes of the different curves that are important to the counterfactual predictions the model makes. A lot of the task of building a model like this is building up business knowledge about what people have done in the past, what prior experiments have been run, what interventions we've done historically. It ends up looking a little bit like macroeconomics. If you hang out with macroeconomists for a while, they're obsessed with history, because the historical data provides the natural experiments they need to understand what's going to happen in the future. When we think about what's going to happen in a recession, the easiest way to figure it out is to go find historical recessions and see what was common about them. We apply a similar lens to that problem.
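As a toy illustration of mining history for natural experiments, here is a minimal sketch, with made-up numbers, of pooling several historical price changes into a single elasticity estimate under a constant-elasticity assumption. Lyft's actual estimation is certainly more involved; this only shows the shape of the exercise.

```python
import numpy as np

# Synthetic "historical intervention" data: episodes where the price level changed,
# with demand observed before and after. All values are invented.
price_before  = np.array([1.00, 1.00, 1.10, 0.95, 1.05])
price_after   = np.array([1.10, 0.90, 1.00, 1.10, 0.95])
demand_before = np.array([1000,  980,  900, 1050,  960])
demand_after  = np.array([ 890, 1095,  985,  880, 1060])

# A constant-elasticity model says log(demand) moves linearly in log(price),
# so each historical change gives one noisy estimate of the slope.
d_log_demand = np.log(demand_after) - np.log(demand_before)
d_log_price  = np.log(price_after) - np.log(price_before)

# Least-squares slope through the origin: a pooled elasticity estimate.
elasticity = (d_log_price @ d_log_demand) / (d_log_price @ d_log_price)
print(f"estimated price elasticity of demand: {elasticity:.2f}")  # roughly -1.1 here
```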
This question may be one you just answered, but when you think about applying causal models to the kind of forecasting you're doing, I'm imagining you have to dramatically simplify the model; it can't be so rich that it captures all of the actual causal relationships in the thing you're modeling. So I'm wondering about the interface. I guess the thing I'm thinking of is leaky abstractions: how does that manifest in trying to use causal models in a real-world scenario like this?

I love that question, because I think it's the hard part, and it's what I've been telling my team for the roughly two years we've been working on this: we're not building a model, we're building a system for building models that's going to evolve over time, to help us make agile adjustments to the way the business runs as we get new information. Our goal isn't to build the best model in the world; our goal is to be agile and to create a modeling framework that makes us good at making models better, which is really the goal of most teams building models. You're thinking a lot about this loop of proposing a new idea, testing it quickly, and folding the things that work well into your system quickly. The core piece of that is model checking, Bayesian model-checking type procedures, validating a model and saying whether it's better or worse than your old one, which is probably the hardest part of being a machine learning researcher or a statistician in general: when do you have a model that's better than your old one? It turns out that for our models, since they're so based on business knowledge, that's a piece where the human in the loop is very useful. People can inspect the output of the model and say, look, this doesn't make sense to me, and that's very useful information. What we're trying to think ahead about right now is what visualizations and plots and diagnostics we can create very quickly from fitted models that we can show to people who maybe don't even know how the model works or how it's fit, but who can understand it and say, this doesn't look realistic to me. That's important not only to improve the model, but also because building trust with the people who are ultimately responsible for the decisions the model makes is of first-order importance. If they don't trust the model, if they don't believe it, they'll ignore it and not make decisions using it. One of the big pillars for our team for a long time, we call it trust and understanding: do people trust this model enough to start betting money on it? That's where we'd like to get, and to get them there you really need to show them a lot of plots; that's kind of the takeaway. We've had to be very agile and modify the model a lot, so it can feel Sisyphean if you think of it as having some ultimate end goal, but really, at the end of the day, you're trying to build a good process, and that's what we've been focused on.

In theory, using causal modeling techniques should provide a level of trust or understanding built in, or at least that's what's written on the tin; it's what a lot of people are excited about causal models for nowadays. It sounds like there's still a lot of work that needs to be done, though.

I think the hardest part isn't the modeling; fitting the models is actually quite easy and straightforward. What you're really limited by is how many interventions you've had historically. This is why macroeconomics is hard: what's going to happen in a recession? Well, we've only had three or four recessions in the last 30 or 40 years, so there aren't many examples to draw on. Your sample size for interventional data is inherently very limited, especially for system-wide interventional data. When you zoom in to individual-level policies, like a user-level randomized experiment, or the time-split randomizations we do, those are cases where you can get very precise causal knowledge; you can estimate effects very precisely. But for these system-level estimates you really are limited by the available history and what you've done in the past. If you've been very experimental in the past and tried a lot of things, maybe you have the ability to get more traction on the problem. But yes, we need to become better experiment designers, and ultimately be more experimental, to make causal models better.

And do you use causal models more broadly than in forecasting there?

Oh yeah. I think every model is a causal model, or at least maybe they should be; that's a very strong perspective. But in a business setting you're almost always trying to make a decision differently, and if you don't make a decision differently then you didn't really have any effect on the business.
So some models are ultimately meant to drive some decision-making, either a very micro-level decision, like for us, which driver we will dispatch to you as a rider given your ride request. That's a decision we make, and there are counterfactuals around it: we could have dispatched this other driver, or that one, or we could not dispatch a driver at all, because we don't have enough of them and need to allocate them as a scarce resource. Those are all causal questions, and that's a very micro-level decision. Then there are the zoomed-out, macro-level decisions I was talking about, spending money at a weekly level of granularity on some incentives. They're both causal questions, and ultimately it's either an automated system that does these things on an ongoing basis without a human in the loop, or a fuzzier business process with some human in the loop, but either way you'd like to know what would happen if you did something differently.

Awesome. So you're also involved in efforts around marketplace experimentation there. Can you tell us a little bit about those?

Sure, yeah. One of the most interesting and challenging parts of a marketplace like Lyft is that it's very difficult to know when your business is functioning better, because something like revenue is affected by both supply and demand. We can have demand shocks that make us a ton of revenue where we didn't do anything to cause it, or drivers can show up in droves and everybody gets good experiences, and we didn't have any control over that; it could just be macroeconomic factors. But ultimately our business is the business of matching: can we match supply and demand effectively and make really good micro decisions that add up to good experiences for the participants on both sides of the marketplace? It's very difficult to know when you're doing a better job of that, because there's all this noise. So the thing we can do to improve decision-making there is to build better models of how the marketplace functions, models that allow us to partial out the noise and denoise the signal that's being transmitted. Ultimately what we want to do is try new algorithms in production, try new ways of matching riders to drivers, and then be able to detect whether that's a better outcome for the marketplace or not. The way we try things is through what we call time-split tests; other people in industry call them switchback tests. You switch algorithms on and off at random intervals and try to see what happens at the boundaries. When we switch from one to the other, you get this nice little experiment: the system changes state, and it can do better in the next hour than it did in the prior hour, and our job as statisticians is to be good at detecting that. How can we figure out if it really was better?

And is the idea of doing a switchback test, as opposed to a more traditional A/B test or sequential testing, that the granularity of the distribution shifts is so small that you kind of have to do them very quickly and in parallel?
Yeah, I think it's slightly more complicated, in that we have a problem of interference. It's not just the granularity of the intervention; it's that if we give 50% of users a big discount, they would soak up all the drivers, and the users in the other condition would have fewer drivers available. There are these spillovers in marketplaces that cause the treatment applied to some users or some drivers to spill over onto the others. Ultimately, experimentation is in a way a prediction problem: you're trying to predict what would happen in a counterfactual world where everybody in the whole marketplace was living with your new algorithm, and the best way to do that is just to do it. That's what a time-split test acknowledges: maybe the best way to test something is just to try it out and see, but you need a rigorous experimental design to get detectability. We're working on finer-grained versions of that, a little more zoomed in, where maybe we can circumscribe some time and space and give treatments in a more precise way, but it needs to be coarser than a user-level randomization in order to get something more faithful to what we really care about.
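A minimal sketch of what a time-split (switchback) analysis can look like, using simulated data: random hour-long intervals are assigned to the incumbent or the candidate algorithm, and the effect estimate is a difference in interval-level means. The seasonality, noise level, and effect size below are all invented; a real analysis would also adjust for hour-of-day effects and for carryover between adjacent intervals, which is exactly the detectability work described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Time-split ("switchback") design: each hour of a week is randomly assigned
# to the incumbent algorithm (0) or the candidate (1), and a marketplace
# metric is measured per hour. Everything here is simulated for illustration.
n_hours = 24 * 7
assignment = rng.integers(0, 2, size=n_hours)

hour_of_day = np.arange(n_hours) % 24
baseline = 100 + 30 * np.sin(2 * np.pi * hour_of_day / 24)  # strong daily seasonality
true_lift = 2.0                                             # candidate is slightly better
metric = baseline + true_lift * assignment + rng.normal(0, 5, size=n_hours)

# Interval-level difference in means, with a normal-approximation standard error.
treated, control = metric[assignment == 1], metric[assignment == 0]
effect = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / len(treated) + control.var(ddof=1) / len(control))
print(f"estimated effect: {effect:.2f} +/- {1.96 * se:.2f}")
```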
And when you talk about experimental design and the need to be rigorous, is that something that's human-in-the-loop, hands-on for every new experiment, or have you platformized some of this so you can do some of it in an automated way?

These are great questions, Sam. I wish... it almost sounds like I wrote these. [Laughter] One of my big philosophies is that we should always be running more experiments than we are; I tell people that all the time. When you think about the bottlenecks to running experiments, it really is the human that's the bottleneck, because we need humans to set them up and plan them, and then we need humans to analyze them and decide what to do. You can cut the human out of the loop for both of those steps if you really want to, but it's hard. On the planning side, it involves making experiments into changes in configuration instead of code. In a typical A/B test, an engineer writes some new code with an if statement, and that's something an engineer has to set up for it to be testable. But if you create a configuration-based system, which is more common at Facebook than it is at Lyft, though engineers everywhere like to move things into configuration when they can, then all the parameters in your configuration files are just experiments waiting to happen: numbers or categorical variables you'd like to try out sometime. So it's possible to generate ideas for experiments using machine learning, and Bayesian optimization is one approach, where you'd like to try out the parameters you're most uncertain about in terms of how they'll perform in an online test.

Then, to close the loop on the other side, how do you get a machine to decide whether you should launch an experiment or not? That's also a huge bottleneck, and I think it ultimately boils down to the fact that it's very difficult to get people to agree on what the objective of tests is in general: what would success look like, and is there just one variable we could use to decide whether this was successful or not? It turns out the answer is often no. There are usually trade-offs involved: more of one thing is good, more of another thing is good, but more of one makes the other go down. At Lyft we have a very clear set of trade-offs: things that improve the driver side of the market versus the rider side, things that improve Lyft's profitability but not outcomes for our riders and drivers, and then a short-term versus long-term set of trade-offs, a greedier set of outcomes versus long-term outcomes. Getting folks to agree on what Ronny Kohavi, formerly at Microsoft and then Airbnb, a guru of experimentation, calls an overall evaluation criterion is the key. If you have that number and you can compute it for every experiment, then the loop is closed and the system can propose new experiments and launch the ones that are good. That's really what you see with approaches like multi-armed bandits or fully Bayesian optimization. They're hard to get right, but if you do get them right, you can run a lot of experiments.
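Once a single overall evaluation criterion exists, "closing the loop" can look like a bandit over configuration values. Below is a hedged toy sketch of Thompson sampling with a binary OEC outcome; the three candidate parameter values and their "true" OECs are invented, and this is only a stand-in for the approaches mentioned, not Lyft's system.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy closed loop: three candidate values for one configuration parameter,
# rewarded by a single overall evaluation criterion (OEC). The true OECs
# below are unknown to the algorithm and invented for illustration.
true_oec = np.array([0.10, 0.12, 0.09])   # probability a session hits the OEC target
wins = np.ones(3)                         # Beta(1, 1) priors on each arm
losses = np.ones(3)

for _ in range(5000):
    # Thompson sampling: draw a plausible OEC for each arm, run the best-looking one.
    sampled = rng.beta(wins, losses)
    arm = int(np.argmax(sampled))
    reward = rng.random() < true_oec[arm]  # observe one binary OEC outcome
    wins[arm] += reward
    losses[arm] += 1 - reward

print("traffic per configuration:", (wins + losses - 2).astype(int))
print("posterior mean OEC:", wins / (wins + losses))
```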
From the sounds of it, your experimentation metrics, at least the ones you've mentioned, sound like business metrics as opposed to model metrics. Has it been easier to drive model development around business metrics with the class of models you're using, these causal models, as opposed to deep learning or some other technique, or is it just a discipline that you as a team have committed to?

The forecasting and planning side really resists experimentation in a lot of ways. Those models are hard to evaluate offline and hard to evaluate online. We do have approaches for that: we do things like simulated backtests to see whether the predictive, statistical performance of the model translates into better decisions, and I think that makes the model more faithful to the goal it was originally designed for. We would love to have much fancier models with better architectures and more bells and whistles, but I don't want to build fancier models just for the sake of fancier models; I'd like to do it in service of a specific task. Until you're very good at translating some offline performance metric into some online performance metric, I think it's dangerous to focus only on offline metrics. Once you've achieved that concordance, once you know your way of evaluating the model offline translates into better business value and that feedback loop is tight, then I think it's okay to do a total free-for-all on the modeling. This is a little different from the perspective a lot of people propose, because they gravitate toward offline metrics that can be measured very precisely and get really excited about improving them, but I think the burden of proof is on you as a scientist to show that it translates into some value in a way that other people believe, not just in a way that's consistent with your model.

With the marketplace experimentation, one or a couple of your inputs, I would imagine, are the forecasted supply and demand that go into the marketplace. Are there particular challenges associated with hierarchical models in that kind of environment?

I think Lyft's data is really fascinating in its structure, and it has a lot of structure that resists efficient modeling a lot of the time. We have a spatio-temporal process; take demand, for instance, which is pretty illustrative of our problems. Demand is a point process: people pick up the app and make a request for a ride, or just check the price, so we have a latitude, a longitude, and a timestamp, and that's a unit of demand. Now say we wanted to forecast that. There are a lot of ways to do it. The simplest and most common is to aggregate it into counts in some time and space buckets and then use a traditional time series model. But you might also reasonably want to predict the density, the rate of arrival of these points in time and space, and then aggregate the forecast up to whatever level you want. So there's a bias-variance trade-off across this whole spectrum of methods. How do you choose your buckets? If we choose time buckets or space buckets in various ways, we get different bias-variance trade-offs. If you use a regular grid on Lyft data, it will work terribly. There's a temptation to use image-type models, where you have pixels, but most of those pixels are empty, so you're using a very wasteful representation of the data. Same thing for time: there are many hours through the middle of the night where there's just not a lot of activity in the marketplace, so there are a lot of zeros, and your model isn't really able to do much with that. Ideally we'd have a forecasting system that gives consistent forecasts at all levels of granularity, so that if I forecast at a very fine level and add it up, I get the same thing as the very high-level forecast. It's still a very open-ended research problem for us; I think we're still trying to get it right. But that degree of coordination would be really excellent, because it would allow the micro-level marketplace algorithms to make their decisions based on the same information we're using for macro-level decisions, and have those two things be consistent. I hope we can get there someday, but it really is a challenging modeling problem, because it has this multi-resolution quality in both time and space.
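The simplest representation he mentions, bucketing the (latitude, longitude, timestamp) point process into counts, might look roughly like the sketch below with pandas. The bucket sizes are arbitrary and are exactly the bias-variance knob he describes, and the final check illustrates the aggregation-consistency property: summing the fine-grained counts reproduces the coarse series.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# A demand "point process": one row per ride request with a location and timestamp.
# Synthetic data standing in for real request logs.
n = 10_000
requests = pd.DataFrame({
    "ts": pd.Timestamp("2021-05-01")
          + pd.to_timedelta(rng.uniform(0, 7 * 24 * 3600, n), unit="s"),
    "lat": 37.77 + rng.normal(0, 0.05, n),
    "lng": -122.42 + rng.normal(0, 0.05, n),
})

# The simplest forecastable representation: counts in (hour, spatial cell) buckets.
# Bucket sizes (1 hour, ~0.01 degrees) are arbitrary and drive the bias-variance trade-off.
requests["hour"] = requests["ts"].dt.floor("60min")
requests["cell"] = list(zip((requests["lat"] / 0.01).round().astype(int),
                            (requests["lng"] / 0.01).round().astype(int)))
counts = requests.groupby(["hour", "cell"]).size().rename("requests").reset_index()

# Consistency across granularities: summing the fine-grained counts per hour
# reproduces the coarse hourly series exactly.
hourly = counts.groupby("hour")["requests"].sum()
assert hourly.sum() == n
```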
A moment ago you alluded to the tension between simple models and more complex models. You are doing some experimentation with neural networks for fine-grained decision-making; can you tell us a little bit about those efforts?

There's a really rich tradition at Lyft of using tools like LightGBM or XGBoost to solve problems, because those are great hammers to hit data with: they work very well without a lot of parameter tuning and they're reliable in production. So there's been a gradual process of figuring out whether neural networks could help us and do better, and I think the answer has been that they typically don't do much better in predictive performance than tree-based models; we might get something a little better, but maybe not worth all the extra headache of changing things. But neural networks have a couple of really big advantages. One is flexibility: we can change the loss function on a neural network very easily, change the link function at the end, make it predict count variables; it's very easy to swap in different loss functions. You can do that with trees too, but it's more challenging. Another is predicting multiple outcomes at the same time. A ride request could turn into many different outcomes: the rider could cancel, the driver could cancel, it could turn into a ride, it could turn into a report. Maybe we want to build a model that pulls all those outcomes into a single vector-valued outcome and build a shared representation, like a transfer learning idea. Some of those outcomes are sparser than others, and by pulling them into the same model we can do better. That's a very natural idea in neural networks and actually quite difficult to implement in tree-based models. And then the other, kind of hidden, advantage of neural networks is scalability: they actually train a lot faster, and we can go to much larger scale than we can with trees. So I think the hope is that eventually we'll be able to train models for our entire marketplace in one model, rather than having region-specific models, and that would be facilitated by being able to put everything into the same modeling architecture all at once. I'm very excited about that future, and we're on our way there. We're already seeing really promising results, particularly for things like heterogeneous treatment effect models, where there's just some more technology that allows us to do that.

And an example of a heterogeneous treatment effect model?

Great question; this is causal inference jargon I take for granted. In causal inference, everybody is concerned with treatment effects. With a single binary treatment, you take a pill and it either works or it doesn't, the average treatment effect is the average over the population. Heterogeneous treatment effects say, hey, maybe that pill works better for some people than for others, and we'd like the machine learning model to give us some idea of for whom it will have stronger treatment effects. If you think about this as a labeled-data problem, it doesn't work, because I would like to take a feature vector for you and predict an effect, but I'll never observe you both getting the treatment and not getting the treatment, so I can't actually estimate a per-observation treatment effect. You have to use a kind of interesting architecture to do that, but once you do, you get this very powerful model that uses all the available features to try to explain heterogeneity in the response to the treatments that you have.
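Dragonnet, which comes up later in the conversation, is one such architecture. As a simpler stand-in, here is a hedged sketch of a two-headed network with a shared representation, where the per-rider effect estimate is the difference between a treated head and a control head. The data, layer sizes, and training loop are all illustrative, not the team's production model.

```python
import torch
import torch.nn as nn

class TwoHeadHTE(nn.Module):
    """Shared representation with one outcome head per treatment arm.
    The estimated individual-level effect is the difference between heads."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.head_control = nn.Linear(hidden, 1)
        self.head_treated = nn.Linear(hidden, 1)

    def forward(self, x, treated):
        z = self.trunk(x)
        y0 = self.head_control(z).squeeze(-1)
        y1 = self.head_treated(z).squeeze(-1)
        # Only the head matching the observed treatment feeds the loss...
        y_obs = torch.where(treated.bool(), y1, y0)
        return y_obs, y1 - y0   # ...but both heads give a per-unit effect estimate.

# Toy training loop on synthetic data (features, binary treatment, outcome).
n, d = 2048, 10
x = torch.randn(n, d)
t = torch.randint(0, 2, (n,))
effect = 0.5 + x[:, 0]                           # effect varies with the first feature
y = x.sum(dim=1) + effect * t + 0.1 * torch.randn(n)

model = TwoHeadHTE(d)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(300):
    opt.zero_grad()
    y_hat, _ = model(x, t)
    loss = nn.functional.mse_loss(y_hat, y)
    loss.backward()
    opt.step()

_, cate = model(x, t)
print("estimated vs true effect correlation:",
      torch.corrcoef(torch.stack([cate.detach(), effect]))[0, 1].item())
```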
So treatments for us might be things like coupons or discounts for riders, or incentives for drivers, and by putting those into models and letting the models tell us how the response to the treatment might vary, we can do a better job of figuring out which people are going to benefit the most from the different treatments we can use.

Nice. It sounds in some ways analogous to the idea of getting the plan out of your forecasting models instead of the forecast itself.

Yeah, I do think that's a big theme of my last couple of years: thinking about models not as returning predictions but as returning decisions. It creates an end-to-end way of thinking about machine learning, where the part in the middle where you have an estimate or a prediction is a nuisance to the system. An automated system doesn't need to know there was some estimate of something; it just needs to know what it's supposed to do in the code, who it's supposed to give this treatment to, or which rider is supposed to be matched with which driver. It doesn't care about some estimate that happened along the way. I don't know if we'll get there anytime soon, but the layer of human interpretability within these models is something we have vestigially for a while, until maybe we end up with more end-to-end systems in the long run.

Nice, nice. I'd love to close this out by having you talk a little bit about the role of Rideshare Labs relative to classical Lyft data science. Just listening to you speak, it sounds like a lot of what you're doing is practical, today work that's in the short term of the business, as opposed to a traditional labs group pursuing moonshots that may or may not materialize. How do you think about that relationship?

I love that question. These are moonshots, but the problems themselves are the same as those of the teams that are working on them, and we partner with those teams quite closely, so we're always working with teams that are doing real work on actual day-to-day problems. We have this kind of collaborative model, but the moonshot part is that we're not sure the methods for solving those problems are going to work yet. We have a known working solution that we can pursue in small gradient steps toward improving, but if we want to take a jump in design space to a different solution, we need to incubate that somehow. The move from trees to neural networks for these systems is something that takes months to implement, and it's not something a product team would probably ever prioritize, because it would detract from getting more immediate business value. So the problems stay the same; what we're trying to do is mine the field for new solutions, going to conferences and reading all the latest research and asking, how does this apply to our business, is there an idea here that could really be a step-function improvement in how we do things? If there is, then it's our job to be experimental and try those things out in a limited-risk setting. That's my big idea about what the role of a labs team should be.
Awesome. And is there a result from a conference or paper that stands out as an example of that, something from another area, or orthogonal to what you're currently doing, where you had really interesting results trying to apply it?

Yeah, that heterogeneous treatment effect modeling idea, I didn't come up with that. What is it, good artists copy, great artists steal? It's the same thing with scientists. Claudia Shi, Victor Veitch, and David Blei had this paper, it's called Dragonnet, and you can go read it. It has a really great idea for a neural network architecture that can estimate heterogeneous treatment effects. I saw the paper, and you can see the diagram for the architecture, and we thought, wow, this is a great idea. They already had code available online, so we could go try it out on our data, and that's been an ongoing process of figuring out whether we can do better than our existing methodology, and when and where it works better. So yeah, we borrow very liberally, and that's actually the really fun part right now: everybody's inventing all kinds of new stuff, so it's fun to be someone who borrows and steals.

Awesome. Well, Sean, thanks so much for taking the time to share a bit about what you're up to. It's been wonderful catching up with you.

Yeah, thanks Sam for having me. This was super fun, great questions, looking forward to the next time.

Thank you. All right, see ya.

All right everyone, I hope you enjoyed that interview. I am here with my friend Robert Osazuwa Ness, and we're going to spend a few minutes chatting about the interview and how it relates to his causal modeling and machine learning course. Robert, welcome back.

Hey Sam, thanks for having me again.

Hey, why don't we get started by having you riff on the interview and how it ties into the themes you've covered in the course. We've been doing this joint altdeep.ai, that's your education company, and TWIML collaboration for about a year and a half now, maybe four or so cohorts so far.

Yeah, it's been a good collaboration, you and I, and I enjoyed the interview. Sean is somebody I go way back with. I think I first met him when I was running a meetup on probabilistic programming back in the Bay Area, and he came and talked about Prophet, which was the forecasting package he built on top of a probabilistic programming language called Stan.

Nice. And did any particular themes jump out at you from the interview? I think you prepared some clips, took some snippets.

You know, it's funny, it's actually tangential to some conversations I've had recently. One of the approaches we focus on in our course is a generative modeling approach to causal modeling, and some of those things came up here. For example, he talks about heterogeneous treatment effects: "Heterogeneous treatment effects say, hey, maybe that pill works better for some people than for others, and we'd like the machine learning model to give us some idea for whom it will have stronger treatment effects. And if you think about this as a labeled-data problem, it doesn't work, because I would like to take a vector for you and predict an effect, and I'll never observe you getting the treatment and not getting the treatment, so I can't actually estimate a per-observation treatment effect."
Yeah, so one interesting thing: if you take a generative modeling approach to causal modeling, what you're doing is modeling the distribution of the causal query. Say you want to know what the causal effect of this treatment is on that outcome: you're looking at the distribution of that outcome under the intervention, under the treatment. From a generative modeling approach, since you're directly modeling the distribution, heterogeneity is built in. It's a distribution, and its spread quantifies uncertainty; it quantifies the diversity of the population. What that allows you to do is take modern probabilistic modeling tools, many of which use cutting-edge deep learning and auto-differentiation architectures to do inference, and just ask: what's the probability distribution of this outcome under this treatment? What's that distribution if this person were over six foot five and likes to ride bicycles? We have modern tools in probabilistic modeling that we can apply directly to those problems, and that's the approach we take in the workshop, because many people actually already have some skills in using those tools, for example PyTorch.
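A tiny numerical sketch of that generative view: write down a toy structural model, and then a causal query is just the distribution of the outcome when the treatment node is set by intervention; subgroup (heterogeneous) effects fall out by conditioning the same simulation. All of the structural equations and coefficients below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# A toy generative (structural causal) model:
#   confounder  Z ~ Normal(0, 1)
#   treatment   T ~ Bernoulli(sigmoid(Z))            (higher Z, more likely treated)
#   outcome     Y = Z + (0.5 + 0.8 * Z) * T + noise  (effect is heterogeneous in Z)
def simulate(n, do_t=None, z=None):
    z = rng.normal(0, 1, n) if z is None else np.full(n, z)
    t = (rng.random(n) < 1 / (1 + np.exp(-z))) if do_t is None else np.full(n, do_t)
    return z + (0.5 + 0.8 * z) * t + rng.normal(0, 0.3, n)

# The causal query is just a distribution: simulate the outcome under do(T=1)
# and do(T=0) and compare. Its spread carries the population heterogeneity.
y1, y0 = simulate(100_000, do_t=1), simulate(100_000, do_t=0)
print("average treatment effect ~", (y1.mean() - y0.mean()).round(2))        # ~0.5

# Conditioning the same model gives subgroup ("six-foot-five cyclist") effects.
y1_hi, y0_hi = simulate(100_000, do_t=1, z=1.0), simulate(100_000, do_t=0, z=1.0)
print("effect for the z = 1 subgroup ~", (y1_hi.mean() - y0_hi.mean()).round(2))  # ~1.3
```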
But another thing that popped out to me in this interview was the talk about the joy of modeling, and that's something that really resonates with me. You two had a conversation about flow state: "I talk to people about it all the time. I think getting into a flow state is really something that you should try to make your work facilitate, and so I really want to get back to doing stuff where I lose track of time and am able to make big progress on projects with a little bit of time and space." I can think back in my career as a statistician, data scientist, machine learning engineer, to times where I learned some technical skill and it dramatically improved my experience at work. Say I learned a little bit of functional programming and how to apply it to exploratory data analysis, or I learned probabilistic programming and how you can compose smaller programs into bigger programs, using some ideas from category theory, to model complex systems. It was great: you can solve new problems, and more importantly for me personally, aside from the productivity gains and the ability to do new things, it was just more fun. I was able to get into that flow state. There are other sources of joy, ecstatic bliss, fiero, but I don't really care about those; I care about getting into a deep-work type of mindset and being able to do cool things with my data and cool tech.

And is the idea that causal modeling provided some of that for you?

The approach that we're working on in this workshop does. A lot of causal modeling, if you go to the textbooks, is: let's construct an estimator for this identifiable estimand, and there's a whole bunch of theory and math, and it's almost as though you have to learn a new degree just to be able to apply these methods. Whereas if you can take some of these concepts from generative modeling, from probabilistic programming, in some cases from functional programming, and connect them to your intuition and understand how bits of a causal model can compose together like programs, it actually makes it, frankly, much more fun than other ways of going about it. It also makes it easier to learn and easier to apply. That was one of the things I got to thinking about when I listened to this interview with Sean.

Nice, nice. I've often found it interesting when folks ask you what the prerequisites for the course are, and you reply, well, some basic probability, but it's not like you need a deep theoretical grounding in causal modeling or causality or statistics or anything like that.

Yeah, and obviously those things would help if you wanted to apply it to a specific applied problem, say you're working with some variable that has a lot of nuance in terms of statistically modeling it, or you need to understand a little bit about, I don't know, measure theory. That's possible for a specific class of problems, but it's completely separate from the problem of applying a causal model to the problem and reasoning about it causally.

Did you have any other clips that you wanted to share?

Here's the next bit. You two were talking about leaky abstractions, and I think it relates to what we were just discussing, in terms of reasoning about a causal model as a program that you can build iteratively, apply some unit testing to, have an explainable model, build a component of the model where one thing is working and then move on to another part, so that you can grow it into something that builds value for you and your organization over time: "And so I'm wondering about the interface. I guess the thing I'm thinking of is leaky abstractions: how does that manifest in trying to use causal models in a real-world scenario like this?" "Yeah, I love that question. I've been telling my team, and we've been working on this for about two years now, that we're not building a model; we're building a system for building models that's going to evolve over time, to help us make agile adjustments to the way the business runs as we get new information."

That resonates with me for two reasons. Number one, a lot of tools in machine learning and statistics encourage you to get really good at learning the tool; they black-box most of the workflow, and you just learn some kind of art of hyperparameter optimization. With these types of causal models you need domain knowledge, so you need to be thinking in very detailed ways about the data-generating process. If you make clear, composable abstractions of that process and focus on some small element of it to start, and then build it up over time, and also separate the model from the inference algorithm, you can perhaps use some cutting-edge deep learning techniques for probabilistic modeling, say stochastic variational inference, learn how to do programmable inference, and build that up over time. Not only do you have something more robust, because you can make sure each component works in isolation before you bring them together, but you also have something that builds IP over time for your company.
And this is a lot different from how causal inference is usually taught, because it's usually taught as though the problem you're presented with at the beginning is the only problem you're ever going to face, so if there's a new problem you have to start from scratch and build a new model, and in a production setting that's not a good approach.

Well, your target participant for the course is someone who's thinking about these problems from an engineering perspective, as opposed to a traditional statistical perspective, whether that's in the sciences or social sciences or what have you, which is where a lot of the folks thinking about causal models and causality have been. Is that right?

Yes. You two had this exchange: "So we have to incorporate the effects of our previous decisions into the forecast. There's really a rich space of modeling problems just within forecasting, and it's never going to be as simple as taking a line and extrapolating it into the future. We have to think about a system rather than just a particular model." "When you think about incorporating the potential decisions that you could make into your forecasts, how do you close that loop? Do you end up using simulation techniques or other types of techniques to do that?" "Yeah, in the end our forecasting system is designed around causal models." And that, at least currently, is the killer app for causal modeling in production. In most settings, if you're building some kind of predictive algorithm, it's going to generate a prediction that you're then going to use to make a decision, but often, in fact more often than not, the outcome of that decision impacts the data your algorithm will use to make future decisions, and that feedback loop can introduce a significant amount of bias into the algorithm. So yes, that is the main engineering use case. That said, we have lots of researchers in applied fields who are trying to get a global perspective on causal modeling, many people from public health and from economics, for example, who may have learned a few specific causal modeling or causal inference techniques that are popular in their domain but don't really have a global understanding of the theory.

Got it, got it. Let's maybe switch gears and talk a little bit about the course itself. What's the structure of the course?

It is a cohort-based course. We have online lectures and videos, we have programming assignments, and we have a weekly retrospective cohort meeting where we go online and talk about that week's lectures, answer questions, and maybe workshop some people's individual problems. Alongside that we have reading groups with previous students who are interested in adjacent topics, and then we have a project-based element to the course, so if you're trying to build a project where you can apply these ideas, we'll provide support with that, and you can team up with other members of the course to come up with a good project outcome.

Can you mention a couple of projects that students have taken on recently?

One student used a deep learning technique called normalizing flows to implement a counterfactual reasoning algorithm on images. Some other students were fans of soccer, football, and built a causal model that would predict the outcome of a trade.
Other students, prior to COVID, did a really cool project: they downloaded a bunch of Airbnb data and a bunch of real estate data from Redfin and composed them into a model that would predict, if you wanted to buy a house with the goal of maximizing revenue from Airbnb, which properties to consider. It basically created a search engine for Airbnb-optimal properties. Then, post-COVID, some people did really interesting things with epidemiological models, adjusting them so that they were using causal reasoning.

Nice, nice. All of that might sound intimidating to some folks who would be perfect fits for this course. In the past you've made a point of letting people know that they can adjust, that you get out of it what you put into it, and you can take it at different levels. Can you elaborate on that a little bit?

Yeah. People are busy; a lot of people taking the course are employed full-time, so we set it up so that you can move at your own pace. If you miss a week because of work, you can hop onto the cohort meeting and get fairly caught up in the review section. Obviously, the deeper you go into the videos and homework, the more you'll get out of it, but it's set up so that you're never in a position where, if life happens, you get left behind the rest of the course and don't have the time to continue. You'll always be able to catch up and get as much out of the course as you can.

Awesome, awesome. Well, there's a ton more we could go into. One of the things past students have raved about is the access they get to Robert and the one-on-one guidance he makes himself available to deliver, but really the next step is to check out more information about the course, and you can do that at twimlai.com/causal. Thanks for tuning in, and thank you, Robert, for joining us.
Info
Channel: The TWIML AI Podcast with Sam Charrington
Views: 716
Keywords: Ai, artificial intelligence, data, data science, technology, TWiML, tech, machine learning, podcast, ml, Sean Taylor, lyft rideshare labs, experimentation, causality, forecasting, statistics, pytorch, experimental design, model development, fine-grained decision making, heterogeneous treatment effect model, lyft, dragonnet, prophet, open source, neural networks, moonshot, hierarchical modeling
Id: 5wbLy4SDuo4
Length: 58min 51sec (3531 seconds)
Published: Mon May 24 2021