Using Machine Learning and Data Science to Solve Real Business Problems (DataEDGE 2018)

Video Statistics and Information

  • Original Title: Using Machine Learning and Data Science to Solve Real Business Problems (DataEDGE 2018)
  • Author: Berkeley School of Information
  • Description: Sourav Dey, Managing Director of Machine Learning, Manifold — AI and machine learning have the power to transform entire industries. Companies in ...
  • Youtube URL: https://www.youtube.com/watch?v=xo0bsiiQ9cM
Captions
Thanks for the introduction. Thank you for having me. It's nice to be back on a college campus. Quick background on me: this is going to be a more nuts-and-bolts talk about data science. I unfortunately did not go to Cal; I went to the Cal of the East Coast, I guess, MIT. I did a PhD in computer science there, and then I've been, first, what was called an algorithms engineer, then a data scientist, and now an AI engineer. I've been working as a data scientist for almost a decade now, and we founded Manifold with a couple of co-founders a couple of years ago.

Who are we? We call ourselves an AI studio, really using the buzzwords there, but we are a services company. We're a consulting company that helps companies accelerate their machine learning and/or data engineering solutions. Most of our clients are larger non-tech companies taking their first journey into becoming a more data-driven company, putting machine learning and data engineering in place as core assets of their business. We're actually right down the street, headquartered in Oakland, with an office in Cambridge, where a lot of alumni stayed.

So what is this talk going to be about? It was kind of hard to fit everything into one talk, but what I want to do is share some mental models for applying AI and make them real using some case studies from our work. At the top level, the main model is that we've adapted a lot of techniques from CRISP-DM, if you're familiar with that, and from human-centered design as practiced by IDEO, and we've come up with what we call our Lean AI Playbook: how to go into a new company, a new situation where the business is trying to get value out of AI, and really make sure that we deliver a win in six weeks, in 12 weeks, in 24 weeks, because that's really why they are hiring us. They want us to accelerate getting value out of machine learning.

I can't talk about all six of these steps, so I'll focus mostly on Understand, which is understanding the business and understanding the data, then a little bit on modeling and a little bit on user feedback. As machine learning people, as data scientists, we tend to focus a lot on the modeling stage. In ten years of practice, that is really a very small piece of a much larger puzzle; it's the work that happens before and the work that happens afterward that really makes the business get value out of machine learning, so I'll talk a little bit about that.

I'll jump right in with Understand. One of the mental models we have is something I like to put in LaTeX on slides, because PhD habits die hard, and I call it the AI uncertainty principle. The value that you get out of AI is upper-bounded by the business value of the problem you're solving, by the quality of the data you have to solve that problem, and lastly by the predictive signal in that data. Again, as data scientists we tend to focus a lot on how good we can get the AUC or how low we can get the error, but that is the one thing you can't know in advance; you have to actually do the data science to find out. So it's very important early in the project to focus up front on figuring out where to aim the AI and assessing how good the data quality is. Notice this is multiplicative, so if any one of these factors goes to zero, the value goes to zero, and that's bad for us, bad for our clients, and bad for the business.
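The slide formula itself isn't reproduced in the captions; a plausible rendering of the "AI uncertainty principle" as described above, with illustrative symbols rather than the speaker's own, is:

```latex
% Hedged reconstruction of the "AI uncertainty principle" described above:
% the value delivered by an AI project is upper-bounded by the product of the
% business value of the problem, the quality of the available data, and the
% predictive signal in that data. If any factor is ~0, the value is ~0.
\mathrm{Value}_{\mathrm{AI}} \;\lesssim\;
  \underbrace{V_{\mathrm{business}}}_{\text{worth of the problem}} \times
  \underbrace{Q_{\mathrm{data}}}_{\text{data quality}} \times
  \underbrace{S_{\mathrm{signal}}}_{\text{predictive signal}}
```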
You don't want that. It erodes trust, and you don't want that as a data scientist, so how do you prevent it from happening? Essentially, you have to do data understanding and business understanding.

The two techniques we use here are, first, a business understanding workshop, where we're really trying to surface the value: get all the stakeholders in the room, usually the CEO, the chief marketing officer, the CTO, potentially even finance and analytics people, to figure out, if we were to solve this problem better, what the ROI could be. What often happens is that we start an engagement with "Hey, we want to do preventative maintenance on these connected blenders we have that make milkshakes," and it turns out that was not the highest-value place to focus the AI. It's better focused on "Can I instead forecast which flavors will sell best at each of these stores, and lift revenue by four to five percent?" That's a much better place to aim our AI, and therefore a better ROI.

On the other side, we're working with the tech team, the data analysts, the CTO, the software engineers, to catalog the data sources: How clean is the data? How rare is the event you're trying to predict? Is it labeled well? Do we need to label more? Is the data joinable? In many of these large organizations the data has been siloed in various CRMs and data sources, and there's actually not even a join key, which is an issue we face many times. It turns out you can use machine learning to learn a join key; I won't talk about that, but these are the problems you should be thinking about.

So let's make it real. One of our customers was one of the leading baby registries in the United States, and their CEO hired us to help serve her customers better: "I know that most of my revenue comes from a very small fraction of my customers. How can I tailor my service to serve them better? How can I find these customers?" When we went in there, we did the business workshop and we did a data audit, and we came up with a spec. This was one of those stories where they originally had a spreadsheet where you could track cohorts and whether they activated, whether they bought anything off the registry or not. They kind of had the idea, but no one there had taken it to the next revision. When we did the business understanding, what we found was that they really cared about being much more data-driven in marketing and in product, but their biggest problem was that after people signed up, they didn't know whether a customer was going to be high LTV, high lifetime value, a valuable customer, until about nine months later with their baby registry. After a lot of surfacing, that turned out to be the biggest friction point in their organization. How could you make that shorter?

We then did a data audit, and they had a lot of data. They had been running the company for almost ten years, so there was data from the mobile app, mobile app clickstreams, web clickstreams, marketing data from Facebook and Pinterest, all of these things. They had data in their transactional database about how customers were using the product, all the way to demographics about their customers.
You can join it with census data and other creepy data sources where you put in people's emails and it tells you demographics about them. After cataloging everything, the spec we came up with was: we're going to build a model that predicts the final, nine-month customer lifetime value every day after sign-up. One day after sign-up, two days after sign-up, four days, 30 days: every day we would make a new prediction of what the expected customer lifetime value would be. In addition, we focused only on the transactional database, because it turned out the data quality was very high there, and there is a tax you have to pay for every new data source you bring online. After looking at it, we thought the transactional database had enough signal in and of itself. We didn't need to pull in a terabyte of Heap analytics logs or a terabyte of Mixpanel logs; we could just focus on the transactional database.
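To make that spec concrete, here is a minimal sketch of how the daily prediction problem might be framed from a transactional table. The column names (user_id, signup_date, txn_date, amount), the 270-day (roughly nine-month) horizon, and the use of a $500 cutoff as the label are assumptions for illustration; the talk does not describe the actual schema or code.

```python
# Hedged sketch of the "predict 9-month LTV every day after sign-up" framing.
# Schema, horizon, and threshold are illustrative assumptions only.
import pandas as pd

def build_training_frame(txns: pd.DataFrame, users: pd.DataFrame,
                         day_after_signup: int,
                         ltv_threshold: float = 500.0) -> pd.DataFrame:
    """One row per user: features observable `day_after_signup` days after
    sign-up, label = did first-9-month spend reach `ltv_threshold`."""
    df = txns.merge(users[["user_id", "signup_date"]], on="user_id")
    # assumes txn_date and signup_date are datetime columns
    df["days_in"] = (df["txn_date"] - df["signup_date"]).dt.days

    # Label: total spend within the first ~9 months (270 days) of each user's lifetime.
    label = (df[df["days_in"] <= 270]
             .groupby("user_id")["amount"].sum()
             .ge(ltv_threshold).rename("high_ltv"))

    # Features: only transactions visible as of `day_after_signup` (no leakage).
    feats = (df[df["days_in"] <= day_after_signup]
             .groupby("user_id")["amount"]
             .agg(txn_count="count", total_spend="sum", avg_spend="mean"))

    return (users.set_index("user_id")
            .join([feats, label])
            .fillna({"txn_count": 0, "total_spend": 0.0,
                     "avg_spend": 0.0, "high_ltv": False}))
```

Training one such frame (and model) per value of day_after_signup — day 1, day 2, day 4, day 30 — would reproduce the "new prediction every day after sign-up" behavior described here.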
Another problem; I'm going to use these two case studies throughout. Another client was an oil services company out of Oklahoma City, and their goal was, again, to be more data-driven about their maintenance operations. They have thousands of machines out in the field, and they break down. They don't actually sell the machines: they lease the machines and they sell uptime. This again came out of surfacing the business understanding. They sell uptime, so they're also selling the maintenance contract along with it, and if a machine goes down, they lose money. In addition, they have to roll trucks to go do the maintenance, so that was their burning problem.

We also did a data audit. This turned out to be one of the cases where they thought they had much better data than they really did. They had almost a decade's worth of service logs from the machines, where the maintenance techs type in notes and click little checkboxes about what parts they're using to fix these machines. They thought they could use that to predict whether a part was going to fail. It turns out, and this is very common with human-input data, that humans are inconsistent over time and across humans, so there was a lot of inconsistency in how the exact same failure was written up, diagnosed, and check-marked in these service logs. In addition, the service logs had a quickly changing lineage: how things were input three years ago is very different from how they were input a year ago and how they are input now. The schema keeps changing, and all of that makes it a very difficult data source to work with. Instead, what we chose to focus on was a huge historical log of sensor data. They have 54 sensors on these machines collecting all sorts of things: vibration, temperature, the states of different registers on the gas compressor equipment. We chose to focus on that because machine data is much more trustworthy, and the lineage didn't change as much, because once a machine is out in the field, it is out in the field.

So what we ended up focusing down on, posing the machine learning problem, was to forecast whether a major fault would occur, and we had to define what a major fault is; this is where the business value meets the data. We defined a major fault as the machine being down for more than two hours, because according to their maintenance organization, that's when they get many more calls, that's often when they have to roll a truck, and that's often indicative of a true major failure in the parts. These machines are kind of like computers too: sometimes you can just reboot one and it comes back up and it's okay, but if it's down for two hours, typically you can't just knock it on the side and have it be okay; there's actually something going on.

That's the data understanding part, and I can't emphasize its importance enough. It is such an important part of doing data science out in the field, because if you don't aim at the right place, at a problem that is solvable with the data you have, you will fail. The value is upper-bounded by zero, it's not going to be good for anybody, and you'll spin your wheels. It's all about reducing waste.

Let me move on. I'm not going to talk about engineering; I'm going to talk more about modeling, because it's the fun stuff. But one other mental model we use, and I don't know if you've seen this diagram, is by Monica Rogati. She's a data science celebrity from the original LinkedIn data science team about ten years ago, and she has a fantastic diagram that puts modeling right at the top. All the stuff we think is the sexy stuff is at the top, and all the "boring stuff" is at the bottom, but the boring stuff is not actually boring, because it's the foundation. If you don't do that stuff right, you will never get success out of the top. That's why engineering is really important.

That being said, once you get to modeling, I can't emphasize this enough either, and I'm sure you're being taught this in your courses: build a baseline model. No exceptions. We'll hire new people out of school and they'll want to go, "Oh, we should use WaveNet, use this pre-trained model, cut off these layers and put it in." And it's just like, "Dude, let's try division." That's often how a lot of these initial conversations go, and division is a great algorithm; it's been proven to work in a lot of places. [AUDIENCE chuckles] That's what I mean by a baseline model: do the simple thing first.

Some rules of thumb we've found useful over the years: if it's a regression problem, turn it into a classification problem. Quantize, even if that means multi-class, because a classification is easier to understand than a regression: you can look at AUCs, you can look at class errors, you can learn from that. Second is the usual progression. I don't always start with just division; usually we start with random forests, then go to gradient boosted trees, then go to deep learning. Random forests are awesome and are often the thing we put into production, because they're so easy to tune and so robust to overfitting. Lastly, on the feature engineering side: pick a few features and iterate from there. This is a part of the engineering I didn't talk about: we're often working with the client to get a prioritized list of which features they think have the most predictive signal, and also scoring those features against how difficult they would be to engineer. Some things are a single SQL query; some things require joining across 17 tables with 24 subclauses in the SQL query. That's hard. Let's do the simple thing first, but maybe that hard feature has a lot of predictive signal and I do have to build it. So it's really about judging that and then iterating.
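A minimal sketch of that quantize-then-classify baseline, assuming scikit-learn and a dollar-valued target; the feature matrix and the $500 cutoff (which echoes the LTV bin mentioned later in the talk) are placeholders, not details from the talk.

```python
# Minimal sketch of the baseline recipe above: quantize the continuous target
# into classes and fit a random forest before reaching for anything fancier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def baseline_auc(X: np.ndarray, y_dollars: np.ndarray,
                 threshold: float = 500.0) -> float:
    y = (y_dollars >= threshold).astype(int)   # regression target -> binary class
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0)
    model = RandomForestClassifier(
        n_estimators=300, class_weight="balanced", random_state=0)
    model.fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
```

Only if a baseline like this plateaus does it make sense, in the progression described above, to escalate to gradient boosted trees or deep learning.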
There are some things where I have  to join across 17 tables and have like 24 sub   clauses in my sequel query. That's hard. Let's do  the simple thing first, but maybe this thing has  a lot of predictive signal. I do have to do  that. So really judging that and then iterating. Everybody here is very familiar with evaluation  metrics. I like to classify them into two buckets.   One bucket is the Aggregate Metrics. This is the thing that you're actually seeing.   How well is it performing? These are the AUCs, ROCs,  TPR at some false positive rate, but oftentimes   we're looking, especially in the learning phase, is at  the individual metric, so the sample level metrics.   Here, this is a common plot that we look at. I'm  looking at the true negatives and   seeing what the model predicted on them. These are  the true positives and I'm seeing what the model   predicted on them. As you can see, some of the  true positives are doing really really well, but many of them are not doing so well. What's going on? Who are these guys where   it's predicting so low? What's up with them? We do  kind of this analysis of just looking at the   four corners in the middle and letting that guide you  on what features you should make next or perhaps   how you should change the architecture. Making  this real again, this is going back to the baby   registry problem, we started with the two simplest  features which were: What what platform did they   use to sign up? This is iOS or Android or  Mac or Windows. And where they came from.   Pinterest or Facebook. Just with that simple model  and literally no features about the usage pattern.   These are super easy to make. I can  just query it right out of the table. I got a 0.65. Then we added 11 more features  that was in our priority list. Write the sequel query, put it into the Python  model, you see a 0.90, so at that point,   we're killing it. We had thoughts about how we should do embeddings and   we should do a multiscale convolutional neural network. Forget it, 0.90 AUC, this is   amazing. Seven days after sign  up, I can predict whether or not this is actually   a lifetime value bin of $500 or  greater. I can predict that pretty accurately so   that's amazing. Let's move on to the next problem  that's a better use of a data scientist's time. Similarly, the oil services problem. This was  a much more difficult problem with a huge data sensor time series. We had to  sample it and do some sample rebalancing   because the things that we're looking for are rare events. We did some feature engineering,   went into the random forest. With a few features, got a not so great   AUC of 0.65. Added a few more features, got to an AUC of 0.78.This started to saturate a little bit.   For the custom feature engineering, we made five  or six more features. Not doing great, we're like   okay, let's move on to convolutional neural  networks. I think that could be really really   good. We did a multiscale convolutional neural  net where we look at a look-back window, similar   to a WaveNet, if you're familiar with it, to try  to predict out five days, whether a failure is   gonna happen. After two weeks of trying to tune  the hyper parameters and tune the architecture,   we weren't getting materially better performance.  The AUC is about the same and so at that point we   went on to the next stage and we were thinking  about doing some more advanced modeling and   mixed effects modeling. We were like "You know what? Forget it. Diminishing returns." 
That's the last step I'll talk about: user feedback. What do I mean by user feedback? Getting the predictions in front of the people who will actually use them. In this context, for the baby registry company, that was the marketing team using this to decide whether one campaign was doing better than another, and the product team. For the oil services organization, it was the maintenance organization that is actually triaging these predictive maintenance alerts. We do working sessions with them. This is more of a design philosophy, but it works really well in this space, because you don't know how the model is going to be used and you don't know how well you have to do for the business to get value out of it, so it's really important to get it in front of the user. Oftentimes we first go with nothing, just to see what their workflow is like right now. Then we dump predictions into Excel and give it to them. Then perhaps we build a Jupyter notebook where we can change a few parameters and play with it together. Eventually we'll build a web app or something like that around it, but this progression is very important.

What happens is, trust nobody, right? Nobody trusts models, especially black-box models. Even I don't trust black-box models: something comes out and my first instinct is "it's probably wrong." I'm an engineer by training, and it's magic that anything works at all, because you know the shortcuts we're taking. So what we do is sensitivity analysis. We have a package we've developed and use internally that can probe the model in different ways to see whether the intuitions the customer has match what's coming out of the model. For example: does the predicted failure rate for this cohort match the historical average failure rate? If sensor A goes above this psi, does the likelihood of failure go up? These are known heuristics in people's heads, and the model has to match them; otherwise, your model is likely wrong. In addition, this is what builds trust in the model.
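The internal sensitivity-analysis package isn't public, so the following is only a sketch of the idea described above: sweep one input while holding the others fixed, and check that the model's output moves the way the domain experts expect. The feature index, sweep range, and tolerance are hypothetical.

```python
# Hedged sketch of a sensitivity check: override one feature across a range of
# values and watch how the mean predicted probability responds.
import numpy as np

def sweep_feature(model, X: np.ndarray, feature_idx: int, values) -> np.ndarray:
    """Mean predicted probability as one feature is overridden across `values`."""
    curve = []
    for v in values:
        X_mod = X.copy()
        X_mod[:, feature_idx] = v                  # override one sensor everywhere
        curve.append(model.predict_proba(X_mod)[:, 1].mean())
    return np.array(curve)

# Example heuristic check (hypothetical names and numbers): failure probability
# should not fall as line pressure rises above its normal operating range.
# curve = sweep_feature(model, X_sample, feature_idx=PRESSURE_IDX,
#                       values=np.linspace(50, 400, 8))
# assert np.all(np.diff(curve) >= -0.01), "model contradicts the pressure heuristic"
```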
The second thing we often find is that the predictions are never enough. The raw predictions are never enough; they don't solve the business problem. You have to build a UI around the AI. What do I mean by that? I'll make it concrete. In the baby registry problem, the product team came back to us and said, "Okay, I want to change the product. I want a special promotion where, if people do these ten actions, they get a free box worth twenty-five dollars that I ship to them. I want to know if this will make my LTV better. Is this trade-off worth it? I'm going to have a higher customer acquisition cost, but will the final LTV be worth it? How do I answer that question?" That is the business problem they care about. It turns out the model can answer it, but just giving them raw predictions is not enough. We had to do a temporal A/B test, and we had to come up with some new math to do that: how do you do an A/B test with predictions? In addition, we had to retrain the model without the features that would be confounded by the experiment. In the end, we gave them a tool where you can take certain features out of the model, retrain it, and run it on two different cohorts. You pass in two different CSVs, it runs the model on the two cohorts, and it does a modified Welch's test to tell you whether the two predicted LTVs are really different from one another, and it even gives a p-value. That's the thing that gave the product team value, not just the raw predictions.

Similarly, on the oil services problem, we delivered the raw predictions and we were really happy with ourselves: "Hey look, there are all these units with a high probability of failure today." We were thinking, "Oh man, they're going to roll trucks and it's going to save the day. It's going to be fantastic." We took the Excel spreadsheet to the maintenance people. They took a look at it and double-clicked down into which units they were, and it was, "Oh yeah, those units? Man, we're driving those way out of range. We know that basin has really, really high line pressure. They break down all the time. We know that; we're driving them that way." In other words: "This means nothing to me, because I know I'm driving these machines in a way that will lead to more failures." At least it was a sanity check on the model that is predicting failure, but it didn't have value.

What we ended up doing to solve that first problem was to look at the differentials in the probabilities. The prediction comes out every day, and we now alarm on when the prediction changes: if it's 0.2 for a while and it jumps up to 0.6, that is what we alarm on. Secondly, there are so many features and sensors to look at, and so many different failure modes, that they wanted better direction in their triage. We ended up implementing what's called a tree interpreter: a way to interpret what is coming out of a random forest or a gradient boosted tree, which tells you why it's making the prediction — which features are driving the probability higher or lower. That sort of explainable AI was very useful in the final web app we delivered to this client, because it helped them direct their triage. Otherwise it was just taking too long to figure out what could potentially be wrong and what to throw on the truck before going out there.
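The talk doesn't name a specific library, but the open-source treeinterpreter package implements the tree-interpreter idea for scikit-learn forests. Here is a hedged sketch of both pieces described above: alarming on the change in probability rather than its level, and surfacing the top features driving each unit's prediction. The names, thresholds, and the choice of that package are illustrative assumptions.

```python
# Hedged sketch of delta-based alarming plus per-prediction feature
# contributions via the open-source `treeinterpreter` package.
import numpy as np
from treeinterpreter import treeinterpreter as ti

def should_alarm(prob_yesterday: float, prob_today: float, jump: float = 0.3) -> bool:
    """Alarm on the *change* in failure probability, not its absolute level."""
    return (prob_today - prob_yesterday) >= jump

def top_drivers(model, X_today, feature_names, k=3):
    """For each unit, the k features pushing today's failure probability up the most."""
    # For a classifier, contributions has shape (n_units, n_features, n_classes).
    _, _, contributions = ti.predict(model, X_today)
    toward_failure = contributions[:, :, 1]        # contributions toward the "fail" class
    results = []
    for row in toward_failure:
        top = np.argsort(row)[::-1][:k]
        results.append([(feature_names[i], float(row[i])) for i in top])
    return results
```

Per the talk, it was this kind of per-unit breakdown, surfaced in the web app, that helped the maintenance team decide what to investigate and what to load on the truck.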
So that's it. There are many other things I didn't talk about, like: use Docker; don't be a pirate, be the Navy, meaning be good about your software engineering practices; embed high-cardinality categorical variables. Hopefully I've been able to communicate a few of the mental models that I've found useful. Any questions?

[AUDIENCE APPLAUSE]

[HOST]: I can see that we are short on time. We can take a couple of questions, but we will have to move on from there. Will you be around after?

[SOURAV]: I will be around, yes.

[STUDENT]: You've probably run into a lot of interesting questions to answer every month, and I'm wondering, if we go back to the previous slide — I know modeling is kind of at the top of the pyramid, but what percentage of time are you really spending with your clients in the different stages?

[SOURAV]: Yeah, usually Understanding is one week. Engineering is where we spend a lot of time. We have tooling we've built up — "Hey, we're gonna deploy to the cloud, we're gonna share and collaborate" — but there are a lot of specifics of the problem that we need to understand: the scale, the velocity of the predictions that need to come out. We're using different things from the RISELab and the AMPLab, things like Spark and Clipper, and we spend a lot of time there. We want to get through the modeling phase quickly to get something in front of users for feedback. Then, and this is always unfortunate, there's a lot of time spent at Deployment because of the details of deploying it to the cloud, into their infrastructure. At the last stage we're not as involved anymore, but we're monitoring it with the client. Right now, that oil services company has been running validation for the past two quarters on the product we delivered to them.

[STUDENT]: Thank you for the presentation. We talked during the break, so I was excited to see it. One of the questions I had was about the slide on building trust in your model. How do you avoid possible goal-seeking — that is, when you have an answer you're expecting to see and you work toward it? How do you avoid doing that?

[SOURAV]: So, overfitting to expectations, in some sense? That's a very good question, and one I haven't thought about too much. Usually these are basic heuristics that we're looking at, and I don't think we are; nothing has been so specific that that antenna has gone up. I know this is probably an unsatisfying answer, but usually these heuristics are really reasonable things we're checking. It's never been some very specific thing like "all these points have to match" or "that has to be probability 0.71." These are much more global, aggregated checks.

[HOST]: Please join me in thanking Sourav. Thank you so much.
Info
Channel: Berkeley School of Information
Views: 15,651
Rating: 4.9764705 out of 5
Keywords: UC Berkeley, ischool, school, of, information
Id: xo0bsiiQ9cM
Length: 29min 6sec (1746 seconds)
Published: Fri Jun 15 2018