H2O Driverless AI Demo

Video Statistics and Information

Captions
Welcome to H2O Driverless AI. This is the login screen; logging in takes me to a screen where I can add datasets from a number of sources, and we are continuously adding to this selection. The first step is to import a dataset. We are going to be using the credit card dataset from Kaggle. Once the dataset is imported, we can do a number of things. I can intelligently split my data: here we'll specify the names of the two datasets we're going to be using, cc1 and cc2, and I can choose the percentage of the split that I want to use. I'm going to be using an 80/20 percent split. You can now see that the datasets have been created. This is useful if you did not already have a test and train split ahead of time; you can generate these on the fly in Driverless AI without having to use other software like Python or Spark prior to ingesting the data into Driverless AI.

Once I have my dataset, I can use the AutoViz tool to automatically visualize it. It will automatically generate interesting plots based on the datasets provided, and I can navigate through the various plots via the carousel. This particular plot identifies outliers in each column, which I can highlight, select, and analyze. Other interesting plots include a correlation graph, heat maps, etc. The correlation graph shows correlations among variables and is interactive, as are all of our visualizations; in our credit card dataset we can see a high correlation between bill amount 1 and bill amount 2. I can use the Help button if at any point I do not understand what a visualization represents, and I can download any plot for later use.

Once I'm done visualizing a dataset, I can create an experiment by simply clicking Predict. We will need a training dataset and, optionally, a test dataset for our data. We'll be using a target of "default payment next month," trying to determine whether a customer is going to default on their payment in the following month based on their payment history. With the training data and target set, you will be presented with three knobs dictating how you prioritize accuracy, time, and interpretability; changes to the dials will be reflected in the panel on the left-hand side. Fine-tuning of an experiment can be performed from the expert settings: I can require certain algorithms or turn them off, set certain values, and choose whether or not to build the Python scoring pipeline or the Java scoring pipeline after the model is completed, etc. We will start the experiment with the default settings.

Now, Driverless AI is intelligent and recognizes that an ID column is present, and it automatically drops it. The upper left-hand corner describes the dataset; this particular dataset has 24,000 rows and 25 columns. On the upper right-hand side are my settings relative to accuracy, compute time, and interpretability, as previously defined. I'm able to see real-time metrics of the model being created in different charts, such as the ROC curve, precision versus recall, the lift and gains charts, and the K-S chart, and you'll also be able to see your actual resource consumption: CPU usage, memory usage, and GPU usage if the machine has GPUs. As the experiment runs, we can see that with each iteration new models are built and tested, and we can see the variable importance of the model right here.

When the experiment is done, you will receive a screen that looks like this, where typically the final model is the best model. You can do a couple of different things: you have the option of scoring on another dataset, transforming a dataset, downloading the predictions of both train and test datasets, downloading a scoring pipeline in Python, downloading a scoring pipeline in Java, or downloading the experiment summaries and logs. The experiment summary has something called the automatic documentation: an automatically generated document that will show you exactly what we did over the course of the experiment and why we did it. You can see the settings that we chose, the environment that it was running under, the version of Driverless AI, the data that was selected, and so on.
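The 80/20 split described earlier happens inside the Driverless AI UI; as a rough illustration of the same shuffle-and-cut idea (plain Python, not the Driverless AI API — the function name and defaults here are invented for the sketch):

```python
import random

def split_dataset(rows, train_fraction=0.8, seed=42):
    """Shuffle rows and cut them into train/test partitions,
    mirroring what the built-in dataset splitter produces."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = rows[:]                 # copy; leave the caller's data untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]   # e.g. cc1 (train), cc2 (test)

cc1, cc2 = split_dataset(list(range(1000)))
print(len(cc1), len(cc2))  # 800 200
```

In the product this is a single dialog, which is the point of the demo: no Python or Spark pre-processing step is needed.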
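Of the live charts mentioned above, the K-S chart is the least self-explanatory: it plots the Kolmogorov-Smirnov statistic, the maximum gap between the cumulative score distributions of the two classes. A minimal sketch of that statistic (illustrative only, not the product's implementation):

```python
def ks_statistic(scores, labels):
    """Maximum separation between the cumulative true-positive rate and
    false-positive rate as the score threshold sweeps from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Walk rows from highest to lowest score, accumulating class rates.
    tp = fp = 0
    best = 0.0
    for _, y in sorted(zip(scores, labels), key=lambda p: -p[0]):
        if y:
            tp += 1
        else:
            fp += 1
        best = max(best, abs(tp / pos - fp / neg))
    return best

# A perfectly separating model reaches KS = 1.0.
print(ks_statistic([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 1.0
```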
Most importantly, it shows the details of what we decided and what parameters were used for the model, which can be very useful, especially when you try to prove that this is production ready, or when you bring it to your executives as part of the explanation for why this model is useful.

For model deployment you have two options: the Python scoring pipeline, which is significantly better for batch predictions, or the Java MOJO scoring pipeline, which is great for real-time scoring. You can also diagnose this model on a new dataset, which is very helpful in lifecycle management. For example, as more data becomes available down the line, you can click on "diagnose model on new dataset" and see how this model performs against the new data.

When I interpret this model, I'll get a screen that looks like this. It is a summary page with plain-text explanations of what the MLI process did. You can also look at feature importance and Shapley explanations for the final model generated by Driverless AI, or at the surrogate models, which are proxies for the final model: for example K-LIME, decision tree and random forest feature importance, partial dependence plots, etc. You can also download any necessary resources from the resources tab, including the Python client, so that you can interact with Driverless AI from Python. We can also create automatic one-click deployments to compute platforms like AWS Lambda, EC2, or a local REST server, based on your experiments; you just have to provide the proper credentials.

We entered Driverless AI in a Kaggle contest: the BNP Paribas claims management prediction contest. In this more complicated problem we can see that I'm able to get Kaggle-level results on a dataset of about 114,000 rows and 133 columns. After running for a little over six hours, it generated 17,000 new features and trained 3,000 models. Had I entered this competition, I would have placed in the top 1%.
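The partial dependence plots mentioned above show how the model's average prediction moves as one feature is swept over a grid of values while everything else is held at observed data. A minimal sketch of that idea (plain Python with a toy model; the function and feature names are invented for illustration, this is not the MLI implementation):

```python
def partial_dependence(model, rows, feature, grid):
    """1-D partial dependence: for each grid value, force `feature` to that
    value in every row and average the model's predictions."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            probe = dict(row, **{feature: value})  # copy row, override one feature
            total += model(probe)
        curve.append(total / len(rows))
    return curve

# Toy model: probability of default rises with the first bill amount.
model = lambda r: min(1.0, r["bill_amt1"] / 100_000)
rows = [{"bill_amt1": b, "age": a} for b, a in [(20_000, 30), (50_000, 45)]]
print(partial_dependence(model, rows, "bill_amt1", [0, 50_000, 100_000]))
# [0.0, 0.5, 1.0]
```

Plotting `grid` against the returned curve gives the partial dependence plot; MLI draws these for the surrogate models as well as the final model.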
Info
Channel: H2O.ai
Views: 28,707
Rating: 4.7485714 out of 5
Keywords: automatic ml, automatic ML, artificial intelligence, machine learning, feature engineering, interpretability, Kaggle contest, BNP Paribas data set, shapley explanation, surrogate model, data visualization, mli, machine learning interpretability, driverless ai, driverless ai demo
Id: wcyMBRRLmqs
Length: 6min 15sec (375 seconds)
Published: Mon Apr 01 2019