Michelangelo - Machine Learning @Uber

Video Statistics and Information

Captions
So I'm going to talk today about machine learning at Uber, and there are three phases to the talk. The first is to go over some of the interesting use cases of ML at Uber. The second is to look at the first version of the platform we built to support those use cases and many more. And the final section is more tailored to this track: developer experience, and how we're working right now to accelerate machine learning usage, adoption, and innovation at Uber through better tooling and a better end-to-end experience.

First, ML at Uber. I think Uber is one of the most exciting places to do machine learning right now, for a bunch of reasons. First, there isn't just one or two big use cases that consume all of the ML attention and horsepower; there's a wide array of projects of more equal weight across the whole company, and we'll go through a bunch of those. Second is the interestingness of the data: Uber operates in the physical world. The rider and driver-partner apps have GPS and accelerometers, we collect a lot of interesting data about the physical world, and of course the cars move around that world, so we're not just dealing with people clicking on web pages; we're dealing with things out there in the world. Third, Uber is a younger company, so in a lot of cases we're applying machine learning to an area for the very first time. You're not trying to grind out a few extra fractions of a percent of accuracy; you actually see giant swings the first time a new model is deployed into production for some use case. And finally, ML is really central and strategic to Uber at this point. The decisions and features we can base on the data we collect are very hard to copy, and ML is one of the things that helps Uber run the whole machine more efficiently; it's applied in lots of places to make the product and a lot of the internal operations run much more efficiently.

So, data at Uber. Uber has grown a lot in the last few years. We have 75 million riders now and 3 million drivers, we completed 4 billion trips last year (and it's even bigger this year), we operate in six hundred cities, and we're completing more than 15 million trips every single day. To give you a sense, this slide is from several years ago, so things have grown a lot since then, but it shows GPS traces from driver phones in London over the course of, I think, six hours. You can see how the cars very quickly cover a lot of the city, and this is all data we can use for machine learning.

There are over a hundred ML use cases, or problems being solved, at Uber right now. This is a small sampling, but you can see it really cuts across the whole company: from Uber Eats (and we'll talk about that one in more depth) to self-driving cars, customer support, pricing, and forecasting, and then things further removed from the product, like anomaly detection on system metrics in the backend services, and even capacity planning for data centers to make sure we have adequate hardware capacity both for the long term and for the shorter-term spikes we get on big holidays like New Year's Eve and Halloween.
Let me walk through a few interesting ones. Uber Eats: every time you open the Uber Eats app we score, I think, hundreds of different models to generate the homepage for you. We use models to figure out which restaurants you're most likely interested in ordering from, so we rank restaurants; within some of the screens there are actually meals, so we rank the meals as well, again trying to predict which ones you're most likely to want. All of the delivery times you see below the restaurants are ML models trying to predict how long it will take, once you place the order, for the order to be prepared, for the driver-partner to be notified, drive to the restaurant, get out of the car, walk in, pick up the meal, walk back to the car, and then drive it to your house. ML is used to model that whole problem and give pretty good delivery-time ETAs for the meals. And finally search ranking: when you search for a meal it doesn't just do prefix-based matching but also tries to predict your intent and what you're looking for. These things are A/B tested, and they make a real difference to Uber's business.

Self-driving cars: the cars have lidar and cameras to understand the world around them, and they use ML for object detection, figuring out where the streets go, and looking out for pedestrians and other cars, and also at other parts of the process, for planning and route finding and so forth; these days the cars are mostly deep learning.

ETAs: in the app, when you request a ride or are about to request one, it tells you how far away the driver is. This is super important for the product experience, because an incorrect ETA is quite frustrating and can put people off using the product, but it's also fed into lots and lots of other internal systems and drives pricing, routing, and a bunch of other things, so having accurate ETAs is super important to Uber, and it's a hard problem. Uber has long had a route-based ETA predictor that looks at the segments of road you'll travel over and the average speeds over those segments in the past, and uses that to predict an ETA. We found that those ETAs are usually wrong to some degree, but they're wrong in consistent, predictable ways, so we can fit models to the error and then use that prediction to correct the error, giving dramatically more accurate ETAs across the board.
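The talk doesn't show code for this, but the residual-correction idea is easy to sketch. Here's a minimal, hypothetical illustration using scikit-learn; the file name, column names, and features are invented for the example and are not Uber's actual pipeline.

```python
# Hypothetical sketch: correct a route-based ETA by modeling its residual error.
# Feature names and data loading are illustrative, not Uber's actual system.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

trips = pd.read_csv("historical_trips.csv")  # assumed columns used below
features = ["baseline_eta", "hour_of_day", "day_of_week",
            "trip_distance_km", "pickup_region_id"]

# The label is the error of the existing route-based predictor.
trips["eta_error"] = trips["actual_duration_s"] - trips["baseline_eta"]

X_train, X_test, y_train, y_test = train_test_split(
    trips[features], trips["eta_error"], test_size=0.2, random_state=0)

error_model = GradientBoostingRegressor(n_estimators=200, max_depth=5)
error_model.fit(X_train, y_train)

# At prediction time, the corrected ETA is the baseline plus the predicted error.
corrected_eta = X_test["baseline_eta"] + error_model.predict(X_test)
```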
Maps: Uber used to use Google Maps, and now we're building out our own mapping infrastructure. Part of the map-making process is a layering of evidence: you start with a base street map and then layer evidence on top of it to make it more accurate. One of the things we do is drive cars around with cameras on top, take pictures of all the buildings and street signs, tag those with the GPS coordinates of where each picture was taken, and then use ML models to find addresses and street signs so we can add them to the database and make the map itself more accurate and consistent. So you get a base map and you layer on evidence collected with sensors, cars, and machine learning: first we find the objects we're interested in, in this case street signs and addresses, then we apply text extraction algorithms to pull the text out of the image, and then the actual text, whether it's an address, a street sign, or a restaurant name, can be fed into the map database.

Destination prediction: when you open the app and start searching for where you want to go, ML is again used, much like in the Eats case, to help you find the place you're heading to.

Forecasting in the marketplace: Uber is a marketplace; we try to connect riders and drivers for rides, and for that to work it's very important that riders and drivers be close to each other in both space and time. If you request a ride and the driver is all the way across the city, it doesn't work, because it takes too long to get to you; if you request a ride and there are no drivers available at all, it doesn't work either. So proximity of supply and demand in space and time is quite important. You can contrast that with a business like eBay, which is also a marketplace, but you can order a futon today from LA and they can ship it next week, and it all works out even though the distance and time are spread out; for us the spatio-temporal aspect is critical. On Uber's maps you can see little hexagons; we divide the maps into hexagons, which is a more efficient way than a grid to organize them, and we use deep learning models to predict a variety of marketplace metrics at various points in the future: which drivers will be available and which riders will want rides. We can then identify gaps between supply and demand in the future and use that to encourage drivers to go where there will be demand, which helps keep ETAs low and utilization high.

Customer support: there are 15 million rides a day, people leave phones and backpacks in the back of cars, and they file customer support tickets, which get routed to contact centers; Uber spends a lot of money on people answering support tickets. When a ticket comes in, the agent has to read it, figure out what the problem is, and then pick the proper response from a big menu of responses for lost-and-found or whatever else. We can use deep learning models on the text of the message to predict what the actual problem was and reduce the menu options from, I think, thirty down to the three most likely response templates. That initially gave about a ten percent boost, and I think another model gave a further six percent boost, in the speed at which agents can answer tickets, which adds up to roughly sixteen percent off the cost of handling these tickets, and that's huge for us.
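The production system uses deep learning on the ticket text; as a stand-in with the same shape, here's a minimal sketch of suggesting the three most likely response templates for a ticket using TF-IDF plus logistic regression. The file name, columns, and template labels are invented for illustration.

```python
# Illustrative sketch: suggest the 3 most likely response templates for a ticket.
# A simple TF-IDF + logistic regression stand-in for the deep model described
# in the talk; the data file and template labels are hypothetical.
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tickets = pd.read_csv("support_tickets.csv")  # assumed columns: text, template_id

clf = make_pipeline(
    TfidfVectorizer(max_features=50_000, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(tickets["text"], tickets["template_id"])

def top_templates(ticket_text, k=3):
    """Return the k most likely response templates for a new ticket."""
    probs = clf.predict_proba([ticket_text])[0]
    top_idx = np.argsort(probs)[::-1][:k]
    return [clf.classes_[i] for i in top_idx]

print(top_templates("I left my backpack in the car after my last trip"))
```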
Another one that's quite similar, although a different application, is the one-click chat feature we released recently. The idea is that when a car is coming to pick you up, you often want to communicate with the driver to say exactly where you're standing, or that you're running down the block, but it's hard for drivers to type while they drive. So we have an NLP model that predicts the likely next response in a conversation, and you can very quickly communicate with the driver, and vice versa, by picking responses out of a menu instead of typing. I forget the exact accuracy rate, but it's quite high, and you're able to carry on pretty good conversations without actually typing any text, which is pretty cool.

All right, so that was about ten out of close to a hundred different use cases around Uber where ML is being used. Over the last three years we built a platform called Michelangelo which supports the majority of those use cases, and I'll now talk a bit about the philosophy of the platform and the first version of what it covers. The overall mission of my team is to build software and tools that enable data scientists and engineers around the company to own the end-to-end process of building, deploying, and operating these ML solutions, and to do it at full Uber scale. There's a big developer-experience component to that, because you want to empower the same person to own the process end to end, from the idea and the prototyping of the model all the way through deployment into production. The more you can have one person own that process, the faster you can move through it, and since modeling work is very iterative, moving faster has a compounding effect, because there are lots of cycles as you experiment with new models. We had a blog post recently about this: in addition to the technology, there have been a lot of organizational and process aspects to ML at Uber that have been quite important in making it work well at scale, both system scale and organizational scale, and that post describes a bunch of it.

V1 of ML at Uber was really just about enabling people to do it at all, and that's been quite successful and powerful, but it wasn't always the easiest thing. So v2 is more about improving developer productivity and experience and increasing the velocity of modeling and deployment work, again to facilitate innovation.

So this is a walkthrough of the platform: the first version, and then the things we're doing now to make it better and faster. One of the early hypotheses, or the vision, for the platform was that machine learning is much more than just training models; there's a whole end-to-end workflow you have to support to make it actually work well. It starts with managing data, which in most cases ends up being the most complicated part of the process. You have to manage the datasets used for training the model, which are the features and the labels, and they have to be accurate; you have to manage them for training and retraining; and then when you deploy the model you have to get that same data to the model in production. At Uber most models are deployed into a real-time prediction service for request-response predictions, and in many cases the data the model needs is sitting in Hadoop somewhere, so you have to wire up pipelines that run analytical queries against that offline data and then deliver the results into a key-value store where the model can read them at scoring time.
So there end up being a lot of fairly complicated pipelines just for getting the right data delivered to the right place at the right time for the model to use when scoring. Then training models: you obviously have to actually train the models, and we do a bunch there. Model evaluation: modeling work is very iterative, so you want good tools for comparing models and finding out which ones are good or not. Deployment: once you have a model you like, you want to be able to click a button or call an API and have it deployed across your serving infrastructure. Then making predictions, which is the obvious part. And monitoring is interesting, in that you train a model against historical data and evaluate it against historical data, but once you deploy it into production you don't actually know whether it's still doing the right thing, because it's seeing new data, so being able to monitor the accuracy of predictions going forward in time becomes quite important.

What we found is that the same workflow applies across essentially all of the ML problems we've seen: from traditional trees and linear models to deep learning, supervised and unsupervised, online learning where you're learning more continuously, whether you're deploying a model in a batch pipeline, online, or on a mobile phone, and, as we saw in the marketplace case, not just classification and regression but also time-series forecasting. For all of these, the same basic workflow holds true, so we spent our time building out the platform to support these steps.

Managing data: I touched on this already, but in most cases data is the hardest part of ML. We've built a variety of things, including a centralized feature store where teams can register, curate, and share features that are used across different models. That facilitates modeling work, because rather than having to write new queries to find your features, you can just pick and choose them from the feature store, and, as importantly, once your model goes into production we can automatically wire up the pipelines to deliver those features to the model at production time.
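Michelangelo's actual feature store API isn't shown in the talk, so the following is purely an illustrative pseudo-client; the class, method names, and feature identifiers are invented just to convey the register-once, reuse-everywhere idea.

```python
# Purely illustrative pseudo-client for a centralized feature store. The talk
# only describes the concept (register/share features, fetch them by entity key);
# every name here is hypothetical.
from dataclasses import dataclass

@dataclass
class FeatureDefinition:
    name: str          # e.g. "restaurant.avg_prep_time_2w"
    entity: str        # the key the feature is joined on, e.g. "restaurant_id"
    description: str
    owner_team: str

class FeatureStoreClient:
    def __init__(self):
        self._registry = {}   # name -> FeatureDefinition
        self._values = {}     # (name, entity_key) -> value

    def register(self, definition):
        """Register a feature once so any team or model can discover and reuse it."""
        self._registry[definition.name] = definition

    def write(self, name, entity_key, value):
        """Written by a batch or streaming pipeline (the 'double write' target)."""
        self._values[(name, entity_key)] = value

    def get_features(self, names, entity_key):
        """Fetched at training time (offline) or scoring time (online)."""
        return {n: self._values.get((n, entity_key)) for n in names}

# Usage: one team registers and populates a feature, another model reuses it.
store = FeatureStoreClient()
store.register(FeatureDefinition(
    name="restaurant.avg_prep_time_2w", entity="restaurant_id",
    description="Average meal prep time over the last two weeks (seconds)",
    owner_team="eats-etd"))
store.write("restaurant.avg_prep_time_2w", "r-123", 540.0)
print(store.get_features(["restaurant.avg_prep_time_2w"], "r-123"))
```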
Training models: we run large-scale distributed training on CPU clusters for trees and linear models, and on GPU clusters for deep learning models. For deep learning we base a lot of it on TensorFlow and PyTorch, but we built our own distributed training infrastructure called Horovod. I won't go into too much detail here, and I'll come back to it in the developer-experience section, but Horovod has two interesting aspects: it makes distributed training more efficient by getting rid of the parameter server and using a different technique, involving MPI and ring-allreduce, to shuffle data around more efficiently during training, and it also makes the APIs for managing distributed training jobs much easier for the people building models. So it's quite strong in terms of scale and speed, but also much easier to use.

Managing and evaluating models: you often train tens or hundreds of models before you find one that's good enough for your use case, so being able to keep a rigorous record of all of them as you go, including the training data, who trained them, and a lot of metrics, accuracy reports, and even debug reports, helps the modelers iterate and eventually find the model they want to use in production. We invested a lot of work here in collecting metadata about the models and exposing it in ways that are easy for developers to make sense of, so they can move the modeling process forward. For a regression model there are the standard error metrics and reports showing the accuracy of the model, the standard things data scientists are used to looking at; for a classification model there are different metrics, but again the things people need to hone in on the best model for their use case. And for all of the features that go into the model we show the importance of each feature to the model, as well as statistics about the data, such as the mean, the min, and the standard deviation, plus distributions, all things that help you understand the data and the model and accelerate the work.

For tree models we expose a tool that lets you dig into the structure of the learned trees, to help understand how the model works and explain why a certain set of input features generates a certain prediction. Across the top, in that grid, each column is one tree in the boosted forest and each row is a feature; as you click on a tree you see it at the bottom with all the split points and distributions, and you can fill in data on the left and it will light up the path through the trees so you can see how the tree handles that feature vector. So if a model isn't behaving correctly, you can pull up this screen and figure out exactly why it's generating a certain prediction for a certain input.

Then deployment and serving: once you've found the model you want, it's important to be able to deploy it. Uber does both batch predictions, meaning you run a job once a day or once an hour to generate lots and lots of predictions, and online deployment, where you deploy the model into essentially a web service, a container that receives network requests and returns predictions. Most models at Uber, and a lot of the ones I showed before, are of that online nature: you open your Eats app, it calls the backend services, and those score a bunch of models to render your homepage within a few hundred milliseconds. So we've built and operate the prediction clusters that scale out and are used across the company. As a quick architecture walkthrough: the client sends the feature vector in, we have some routing infrastructure, and within the prediction service you can have multiple models loaded; based on a header it finds the right model, sends the feature vector to that model, gets the prediction back, in some cases loads more data from Cassandra (which is the feature store we talked about), and then returns the prediction back to the client.
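The prediction service itself isn't shown in the talk, so here's a hypothetical, heavily simplified sketch of that request path: route by model ID, fetch stored features for the entity, merge them with the request features, and score. None of these class or function names come from Michelangelo.

```python
# Hypothetical sketch of the online prediction request path described above:
# header-based routing to a loaded model, a feature-store lookup, and a score.
# All names here are invented for illustration.

class DummyEtaModel:
    """Stand-in for a loaded model artifact."""
    stored_feature_names = ["restaurant.avg_prep_time_2w",
                            "restaurant.avg_prep_time_1h"]

    def score(self, features):
        # Pretend the model is just a weighted sum of two prep-time features.
        return (0.7 * features["restaurant.avg_prep_time_1h"]
                + 0.3 * features["restaurant.avg_prep_time_2w"])

class PredictionService:
    def __init__(self, models, feature_store):
        self.models = models                # model_id -> loaded model object
        self.feature_store = feature_store  # (feature_name, entity_key) -> value

    def predict(self, model_id, entity_key, request_features):
        model = self.models[model_id]       # routing based on the request header
        # Pull precomputed (batch/streaming) features for this entity.
        stored = {name: self.feature_store[(name, entity_key)]
                  for name in model.stored_feature_names}
        # Blend request-context features with stored features into one vector.
        return model.score({**request_features, **stored})

store = {("restaurant.avg_prep_time_2w", "r-123"): 540.0,
         ("restaurant.avg_prep_time_1h", "r-123"): 720.0}
service = PredictionService({"eats-etd-v7": DummyEtaModel()}, store)
print(service.predict("eats-etd-v7", "r-123", {"hour_of_day": 19}))
```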
I think right now we're running close to a million predictions a second across all the different use cases at Uber, which is quite a bit. For trees and linear models the scoring time is quite fast: typically less than five milliseconds at, I think, P95, if there's no Cassandra lookup in the path for online features; when you do have to call Cassandra it adds another five, ten, or twenty milliseconds, so all in all still quite fast, which is good. We're starting to work on more deep learning models, and those are trickier because, depending on the complexity of the model, the inference time can go up quite a bit, but for trees it's usually very fast.

The final bit is monitoring: you've trained against historical data and evaluated against historical data, so you know your model was good for last week's data, but now it's running in production and you want to make sure it's actually good for the data you're seeing right now. There are a few different ways to monitor your predictions, and we'll come back to a newer piece of this at the end. The ideal way is to log the predictions you make and then join them back to the outcomes you observe later as part of the running of the system, so you can see whether each prediction was right or wrong. For the Uber Eats case, we predict the ETA for a certain restaurant, you order the meal, twenty minutes later it's delivered, and we know the actual arrival time from one of our backend systems; if we log the prediction we made when you viewed the screen and join it back to the actual delivery time, we can see how right or wrong that prediction was. Collect those in aggregate and you can generate very accurate ongoing accuracy reports for your model in production. Because you have to wait for batch processes to run to collect the outcomes, you get good monitoring but with, I think, about an hour of delay before you know how correct a prediction was.

From an architecture perspective, along the top you have the different workflow steps, and then you have our offline batch systems at the bottom and our online systems at the top; I'll just walk through the stages. In the offline world we start in the lower left with our data lake: all of Uber's data funnels into Hadoop, into Hive tables, and that's the starting point for most batch data work. As part of the platform we let developers write either Spark or SQL jobs to do the coarse-grained joining, aggregation, and collection of feature data and outcome data, and those results are fed back into Hive tables that are used for training and batch prediction. In cases where you want those features available online at prediction time, the values calculated in those batch jobs can be copied into Cassandra for online serving.
For example, in the Uber Eats delivery-time case, one of the features is something like the average meal prep time for a restaurant over the last two weeks. That's computed via a Spark job, and because it's a two-week average it's fine if it only gets refreshed in Cassandra once or twice a day; two weeks plus or minus twelve hours doesn't make much difference for that kind of metric. So that one flows through the bottom batch path: it gets computed once, loaded into Cassandra, and we can use that same value for every prediction. However, there are cases where you want the features to be a lot fresher. In addition to the two-week meal prep time, which gives you a sense of how fast the restaurant is in general, you may also want to know how busy the restaurant is right now: what the meal prep time was over the last hour or the last five minutes. Obviously, with that freshness you can't afford to run offline jobs, so we have a streaming path across the top, where we take metrics coming out of Kafka, run a Flink job to aggregate across the stream of data, write those numbers into Cassandra, and then double-write them back to Hive so the exact same numbers are available later for training. Parity between online and offline is super important to get right, and the way we've generally solved it is to compute the feature only once and then double-write it to the other store.

Batch training then pulls data from those Hive tables, runs it through the algorithm, which could be a tree, a linear model, or a deep learning model, and writes the output, not into Cassandra this time, but into a model database that stores all of the metadata about the model we talked about: who trained it, when it was trained, and with which datasets, as well as the actual learned parameters, the artifacts of the model. If it's a tree model, that's all the split points we saw before; if it's a deep learning model, it's all of the learned weights in the network. So we capture all the metadata and configuration, plus the actual parameters of the model, and store that in the database. At deployment time, with a button click or through an API, you take one of the models you've trained and push it out either into the online serving container we talked about before, which does network-based request-response predictions, or into a batch job that runs on a cadence, generates lots and lots of predictions, and sends them somewhere else. The sketch below illustrates the compute-once, double-write idea for one of those fresher streaming features.
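The talk doesn't show the Flink job itself, so here's a toy Python stand-in for that one-hour meal-prep-time feature: aggregate events from a stream and write the same value to both the online store and the offline store, so training and serving see identical numbers. The event format and the two "stores" are invented for illustration.

```python
# Toy stand-in for the streaming feature path described above (the real system
# uses Kafka + Flink). Event fields and the two in-memory "stores" are hypothetical.
from collections import defaultdict, deque
import time

online_store = {}        # stands in for Cassandra: (feature, key) -> value
offline_rows = []        # stands in for the Hive double-write

WINDOW_S = 3600                 # one-hour rolling window
events = defaultdict(deque)     # restaurant_id -> deque of (timestamp, prep_time)

def handle_event(restaurant_id, prep_time_s, ts=None):
    """Consume one 'order prepared' event and refresh the one-hour feature."""
    ts = ts or time.time()
    window = events[restaurant_id]
    window.append((ts, prep_time_s))
    # Drop events that fell out of the one-hour window.
    while window and window[0][0] < ts - WINDOW_S:
        window.popleft()
    avg_1h = sum(p for _, p in window) / len(window)

    # Compute once, then double-write: once to the online store for scoring,
    # once to the offline store so training later sees the exact same value.
    online_store[("restaurant.avg_prep_time_1h", restaurant_id)] = avg_1h
    offline_rows.append({"restaurant_id": restaurant_id,
                         "feature": "restaurant.avg_prep_time_1h",
                         "value": avg_1h, "ts": ts})

handle_event("r-123", 700)
handle_event("r-123", 740)
print(online_store[("restaurant.avg_prep_time_1h", "r-123")])  # 720.0
```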
Finally, looking at how predictions actually happen along the top, take the real-time case again: you open the Eats app and want to see your meal delivery time estimates. The features coming from the phone are things like your location, the time of day, and other things relevant to the current context, and those go to the model. The model's configuration knows that, in addition to the features that come with the current request, it has to fetch a bunch of others waiting for it in the feature store, so the one-hour meal prep time, the two-week meal prep time, and probably a bunch of others are pulled out of Cassandra and joined to the features sent from the phone, and that whole feature vector is then sent to the model for scoring. So we're blending request-context features with ones computed either via streaming jobs or via batch jobs. A lot of the challenge here is setting the system up so that it's very easy for developers to wire up all these pipelines and not have to do it one-off each time, because without this, that's where most of the work in ML goes: getting these data pipelines set up.

For the monitoring case, for either real-time or batch predictions, we log the predictions back to Hadoop, join them to the outcomes once we learn about them as part of the regular processing of data, and then push those out to metrics systems for alerting and monitoring. Because these are batch jobs, I think we run them once an hour, so it's not super real-time yet, but we'll come back to that later. Zooming out, we have a sort of management plane: the monitoring data is pumped to central monitoring systems that drive dashboards; we have an API tier that orchestrates everything and is the brains of the system, and it also serves as the public API for the web UI used for a lot of the workflow management and deployment; and you can write Python or Java automation or integration code to drive the system from the outside.

We have a quick video here showing the UI. It's organized around projects (these are all dummy names); a project is a container for a modeling problem. You can connect to a Hive table to train your model on, and you can look at all of the models you've trained, in this case a bunch of boosted tree models, and drill into one of them. We'll drill in and see some of the visualizations and reports for one of these models that's already deployed: this one is a classification model, so you can see the confusion matrix and a bunch of the different metrics used to assess accuracy. This is the tree visualization we saw before; this model has, whatever it is, 162 features and a lot of trees, and you can see the actual split points in the trees. Here's the feature report for all the features in the model, with the distributions and so forth. And then we'll deploy a model and see how fast it goes out: there's a model that's not deployed, you click deploy, click OK, it spins for a few minutes while it packages up the model and pushes it out to the serving infrastructure, and then, boom, it's ready to go. And here you can see the history of all the different models you've deployed over time, with logs of who deployed what and when.

Cool. So that's the v1 of the platform, and that's what we built over the last few years to support ML use cases at scale. It's worked well, but in some cases things weren't as fast or as easy as they could be.
So the next wave of our efforts on the platform is about how to take the foundation we now have and make it faster and easier for people to go from an idea, through prototyping the first model, to deploying it, and then scaling that model up into production. We'll go through a few recent projects that we've either just finished building or are building right now to address those problems. On the right side are the things we're working on now to accelerate ML. PyML is a new Python ML project that brings the toolset to data scientists who prefer working in Python over web UIs or Scala. Horovod is our distributed deep learning system, which has a really elegant API. AutoTune is our first piece of AutoML, letting the system help you train good models instead of having the data scientist or engineer figure out all the right settings themselves. There are some new visualization tools to help understand why models are or aren't working well, and some newer features for understanding, more in real time, how a model is behaving in production; the thing I showed you earlier was refreshed once an hour, and now we have more real-time monitoring of models in production.

As we started looking at how to accelerate model development, and address the developer-experience problem in machine learning, we looked at a few things. ML is a long workflow from getting data to training models all the way through to production, and there are friction points in every single step, so we've been quite rigorous about identifying where those friction points are, grinding off the rough edges, and making the workflow faster. One of the guiding principles goes back in many ways to the DevOps philosophy: if you let the engineer own the code from prototype through hardening, QA, and production, you accelerate the loop of trying something out and getting it into production, and you also build better systems, because the engineers are on the hook to support the thing in production. We found the same applies to machine learning: if you can empower data scientists to own more and more of the workflow, ideally the whole thing, they're able to traverse the workflow faster and they have more end-to-end ownership of the problem. Bringing the tools to developers: we made a few mistakes early on by not embracing the tools data scientists were already very familiar with, i.e. Python, so we're bringing that back, and we're also investing more in visual tools to help understand and debug models.

PyML: the general problem here is that Michelangelo initially targeted the super-high-scale use cases, high-scale training on giant datasets and high-scale predictions at very low latency. That was great for the first couple of years of use cases and a lot of the highest-value ones. However, we found the system was not as easy to use or as flexible as many data scientists wanted, or as a long tail of more unique problems across Uber required. The solution was to support plain Python and the rich ecosystem of Python tools throughout the end-to-end workflow.
You accept somewhat limited scale, because you're dealing with Python and a non-distributed environment, but you make it scale and work as well as you possibly can. The basic idea is to let people build models using essentially any Python code and any Python libraries, implement a serving interface in Python, and then have packaging and deployment tools that treat it like any other model and push it out to our serving infrastructure. The trade-offs between PyML and the rest of the system are really flexibility versus resource efficiency, scale, and latency, but let me go through a quick example. I think it's a Kaggle case: we build a pandas DataFrame, train a logistic regression model, and then run some test predictions at the very bottom; it's a very simple, standard scikit-learn model. This is all happening in a Jupyter notebook (I didn't show the whole context, but it's all in Jupyter). You have a requirements file that lists all of your dependencies, you import your Python libraries, you train, and you save the model file out to a directory. Then there's the serving interface: you implement an interface that knows how to load that model back from the file and provides a predict method that can do simple feature transformations and then feed the data through the model for scoring. You can see how these pieces give you an interface to score the model; through the API you can test the model, and at the bottom you call upload_model, which packages up the model and all its dependencies and sends it to the Michelangelo backend, so it can be managed in our UI and deployed the same way other models are deployed. Once the model has been uploaded, you can see it in the UI alongside the models trained on the high-scale system, and then, either through the UI or the API, you can deploy it onto the exact same infrastructure for real-time request-response scoring. I don't have an example here, but you can also deploy it into a Spark job for batch scoring. So this is an attempt to embrace the flexibility of Python and the tools data scientists already like to use, and then provide the infrastructure and scaffolding to make them work at the highest scale those tools can support.
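The notebook from the slides isn't reproduced in the transcript, so this is a hedged reconstruction of the shape of such a model: train a scikit-learn logistic regression from a pandas DataFrame, save it, and wrap it in a small class with load and predict behavior. The class name and the implied upload step are placeholders, since PyML's real interface isn't spelled out here.

```python
# Hedged reconstruction of the PyML-style example described in the talk.
# The scikit-learn/pandas parts are standard; the serving-interface class and
# the upload step are placeholders, since the real PyML API isn't shown here.
import pickle
import pandas as pd
from sklearn.linear_model import LogisticRegression

# --- training (would live in the notebook) ---
df = pd.DataFrame({"x1": [0.1, 0.4, 0.9, 1.2, 1.8, 2.3],
                   "x2": [1.0, 0.8, 0.7, 0.3, 0.2, 0.1],
                   "label": [0, 0, 0, 1, 1, 1]})
model = LogisticRegression().fit(df[["x1", "x2"]], df["label"])
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# --- model.py: the serving interface (shape is illustrative) ---
class SampleModel:
    """Loads the saved artifact and scores incoming DataFrames."""

    def __init__(self, model_dir="."):
        with open(f"{model_dir}/model.pkl", "rb") as f:
            self._model = pickle.load(f)

    def predict(self, features):
        # Simple feature transformations could happen here before scoring.
        return pd.Series(self._model.predict(features[["x1", "x2"]]))

# Local test, roughly the "test the model, then upload_model(...)" step.
print(SampleModel().predict(pd.DataFrame({"x1": [2.0], "x2": [0.15]})))
```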
Architecturally, on one side is your environment, whether that's Jupyter or any other Python setup: you train your model and save it locally using whatever techniques you want, you write your model.py file, which is the one with the serving and predict interface in it, and you have your typical requirements and package files that tell the system about all the Python and system libraries you need. Then there's a packaging and build step that builds a Docker container including your module and all its dependencies, and that can be pushed out either to the offline system for batch predictions via a Spark job, or to the online system for online predictions via the request-response service we saw before.

Looking a little closer at the online side: there's the online serving of the high-scale models, which is the picture we saw before, and alongside it you can see that we actually deploy a nested Docker container holding all of the Python resources. Our existing prediction service acts as a sort of proxy or gateway and routes to a local Docker container containing the Python code, so you get all of the same monitoring, support, and networking of our high-scale prediction container, but the request is routed to the nested Python service for the actual scoring. You can use scikit-learn, you can use deep learning, you can write custom algorithms; it's super flexible. The trade-offs are that you'll have slightly higher latency, it doesn't scale as cheaply because it's Python running, and if you're using scikit-learn you can't train on giant datasets because it's not distributed, but in terms of developer friendliness and speed it's great. The way people are approaching it is to use this to very quickly get a model out, say for one city, and once they've proved that the model matters, it's an easier sell to go rebuild it on the high-scale system.

Horovod is our distributed deep learning system, and as I mentioned before it has two interesting facets: it scales more efficiently than other distributed deep learning approaches, and the API for it is much simpler, so it's much easier to go from a single-node training job to a distributed training job. In one slide we pulled an example from the TensorFlow documentation for setting up a distributed training job in TensorFlow using a parameter server, and you can see there's basically one little method in the middle that actually trains the model, and everything else is setting up the distributed training environment, which is not stuff a modeler should have to care about. In the Horovod case, you're able to do better distributed training with a lot less work: you have the train method in the middle, and around it a few API calls to set up Horovod, an initialization and a few other calls to set up the environment, but it's much easier and friendlier than many other approaches, and it's been quite popular in the community for both of those reasons.
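The slide with the original code isn't in the transcript, so here's a minimal sketch in the spirit of Horovod's published Keras examples, showing how few calls are needed around an ordinary training script: an init, GPU pinning, a wrapped optimizer, and a broadcast callback. The toy model and dataset are illustrative, not the example from the talk.

```python
# A minimal sketch of single-node-to-distributed training with Horovod's
# Keras API, following the pattern described in the talk. The model and
# dataset here are illustrative.
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()  # one call to join the allreduce ring

# Pin each worker to its own GPU.
gpus = tf.config.experimental.list_physical_devices("GPU")
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], "GPU")

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Wrap the optimizer so gradients are averaged across workers via ring-allreduce.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(0.001 * hvd.size()))
model.compile(optimizer=opt,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

callbacks = [
    # Make sure all workers start from the same initial weights.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]
model.fit(x_train, y_train, batch_size=128, epochs=1,
          callbacks=callbacks, verbose=1 if hvd.rank() == 0 else 0)
```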
Manifold: one of the challenges with the evaluation reports we showed before is that when you train models you tend to get a global accuracy metric, the AUC or the mean squared error for the whole model across the whole dataset. That's a good starting point, but often different segments of the data have very different characteristics and the model treats them very differently, and we've seen cases where a model works great on the whole but behaves very poorly on one slice of the data, which may be a very important slice. So with Manifold we're building visualization tools that let you dive deeper into the data and understand how a model works on smaller segments, and that try to help you identify segments that are anomalous, look different, or are problematic.

AutoTune: once you've figured out the features for your model, there's often a lot of work to find the right combination of hyperparameters that gives you the best accuracy. For a tree model you have the number of trees, the depth of the trees, the binning, probably six or ten different hyperparameters you can tweak, and it's impossible to know the right combination up front, so it's often a brute-force process of finding it. A common approach is a brute-force grid search, where you generate essentially a hypercube of all the different options and try every one; or a random search, where you do the same thing but only try random points and settle for a pretty good one; or you can use what's called black-box optimization, where you try different combinations more efficiently and experimentally, learn the shape of the space, and traverse more directly to a more optimal set of parameters. We collaborated with the research team at Uber to build this, and as you can see on the right, the line at the bottom is the Bayesian black-box-optimized hyperparameter search, which gets to a better model in far fewer iterations than a grid search does, because it learns as it goes and can search much more efficiently. This is getting into AutoML: how do you help developers build models faster and in a more automated fashion, and deploy the human intelligence where it's really needed rather than where it can be automated away?
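AutoTune uses Bayesian black-box optimization built with Uber's research team; as a simpler, widely available stand-in with the same flavor, here's a sketch that tunes a handful of boosted-tree hyperparameters with scikit-learn's randomized search (a Gaussian-process-based Bayesian optimizer would give the behavior described in the talk). The dataset and parameter ranges are illustrative.

```python
# Illustrative hyperparameter search over a boosted-tree model. The talk's
# AutoTune uses Bayesian black-box optimization; RandomizedSearchCV is used
# here only as a simple stand-in for the general idea.
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

param_distributions = {
    "n_estimators": randint(50, 500),      # number of trees
    "max_depth": randint(2, 8),            # depth of the trees
    "learning_rate": uniform(0.01, 0.3),
    "subsample": uniform(0.5, 0.5),        # values in [0.5, 1.0]
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions,
    n_iter=25,              # number of hyperparameter combinations to try
    scoring="roc_auc",
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```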
The final piece: once you have a model you like and you've deployed it, a big part of the job is making sure the model keeps behaving correctly in production. We talked about joining predictions back to outcomes to know whether the model is behaving well, but there are a couple of problems with that. One is that we run it as a batch job, so there's a delay, and in a case like credit card fraud it can take ninety days for the bank to send you the outcomes, so you often can't join to the outcomes as quickly as you'd want. The other approach is less precise, but much quicker: just look at the distributions of the feature data going in and the predictions coming out over time. For most models there should be a fairly regular distribution of features and predictions over time, maybe with some seasonality as the days of the week go by, but it's usually easy to identify big anomalies, and those often indicate problems; usually they're caused by bad data coming in from a broken service or a broken pipeline. In this example it's a classification model, so at the top we're looking at the distribution of true versus false predictions, which is pretty steady over time; there are a few other slices looking at the actual class probability over time, and a few histograms cast as time series so you can see how the bucketing of the data evolves, and in these the data all looks fine. In the next case we're looking at the prediction distribution at the top and the distributions of the model's features below: at the top the predictions deviated from what looks normal, and at the bottom you can see it was actually a feature with some bad data that was triggering the abnormal predictions. So, just as with software engineering, when you're running a production system, having the monitoring set up automatically for you is super important, and for ML that matters at the system level but also at the data and model level: you want to make sure not only that the system isn't erroring out, but that the predictions and data are correct and there are no breakages in the data pipelines, which is a common failure mode for ML models.

All right, so some key lessons learned recently. One I mentioned before, around ML productivity: bring the tools to the developers. We focused very early on high scale, and that was the right first choice for Uber, but now, as we focus on velocity, we're bringing the tools closer to the developers, even compromising on scale to make things easier and faster, while providing a path to scale up as a problem becomes more deeply understood. Data is generally the hardest part of ML, so having really good infrastructure, tooling, and automation around data management lets the modelers focus on the modeling problem and not on the plumbing. On the systems side, we've leveraged a lot of open source, but we've struggled in many cases, and it's taken a lot longer than expected, to make things actually work well at scale; nothing's free. And the last one is that real-time ML is quite challenging, hard to get right, and hard to empower modelers to own end to end, and we're investing a lot to make those systems essentially run themselves, so developers can focus on the modeling work and not worry about the systems.

All right, thank you.

[Applause]
Info
Channel: InfoQ
Views: 7,151
Rating: 5 out of 5
Keywords: Artificial Intelligence, Machine Learning, Data Science, Uber, Case Study, Michelangelo, InfoQ, QCon, QCon San Francisco, Transcripts
Id: iCpp5mqTeXE
Length: 46min 12sec (2772 seconds)
Published: Tue Apr 23 2019