Decision forests in TensorFlow | Session

Captions
Hey everyone, I hope you're doing well. In this talk I'm excited to share a new library with you that you can use to train decision forests in TensorFlow. I'll be giving this talk with Matthew, a knowledgeable and helpful software engineer, and you'll see him in a bit. Here's a quick outline of what we'll cover: I'll briefly introduce decision forests, then talk about the types of problems you can solve with them and the kinds of data they work best with. Then Matthew will walk you through a code example, and I think you'll be happy to see it's just a couple of lines.

If you're new to them, a decision forest is a family of machine learning models that includes random forests and gradient boosted trees. These can be easier to use than neural networks when you're getting started with machine learning, and they're powerful too; in fact, they can even outperform neural networks on certain types of data.

A decision forest is built from many decision trees, like this one, and a tree is simply a series of yes/no questions that classify an item from a dataset. For example, you could use this tree to classify an animal as a chicken, a kangaroo, or a cat by following the questions and their corresponding paths down the tree. This makes trees easy to interpret, especially in contrast to models like neural networks, so you can understand and explain exactly how your model works.

In addition to being more interpretable, trees can also be easier to use. For example, here's the code to create a decision forest in TensorFlow: you can create your model with a single line of code, and it's all set up and ready to go with no additional work needed. By contrast, here's the code to create a neural network. When you're working with neural networks, you often have to design the model yourself, which means thinking about things like the number and types of layers, so deep learning is a little more work to get started with.
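To make the "series of yes/no questions" idea concrete, here is a minimal plain-Python sketch of the animal example. This is not the library's API; the feature names (`num_legs`, `hops`) and the question order are invented for illustration.

```python
# A decision tree is just nested yes/no questions over an item's features.
def classify(animal: dict) -> str:
    """Classify an animal from two hypothetical features."""
    if animal["num_legs"] == 2:   # First question: does it have two legs?
        return "chicken"
    if animal["hops"]:            # Second question: does it hop?
        return "kangaroo"
    return "cat"

print(classify({"num_legs": 2, "hops": False}))  # chicken
print(classify({"num_legs": 4, "hops": True}))   # kangaroo
```

Each path from the root to a leaf corresponds to one chain of answers, which is why the model's decisions are easy to read off and explain.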
Importantly, in TensorFlow, both decision forests and neural networks use Keras. This is my all-time favorite API for neural networks, and I'm glad you can now use it with trees as well. This means you can use the same API to experiment with different types of models and find the one that's best for your data, and importantly, you can deploy both kinds of models using the same tools, like TensorFlow Serving.

So if trees are so great, why should you use neural networks at all, and when should you use one type of model versus the other? I'll show you a few examples, and if you can map the problem you want to solve to these, then trees are probably a good fit for you. The best type of model to use depends on your data, and basically there are just two types of data in the machine learning world to be aware of.

Structured data is where trees shine. That's a fancy way of saying tabular data, or anything you can fit inside a CSV file. Here's an example of structured data for a classification problem: each row represents an example, and each column represents a feature. If you're training a decision forest to classify this data, the features become questions in the tree. You typically have a small number of informative features that you can reason about instinctively; that's a fancy way of saying that the features describe concepts you understand. For example, if I asked you what kinds of animals have feathers, you might say birds. The last column represents the label, and as always, you're trying to predict the label from the features.

Here's another thing you can do: trees are great for regression. In this example, you're trying to predict the weight of an animal based on the same two features; the only thing that's changed is the label. One more cool thing you can do with decision forests is called ranking. Here we've added another column that tells you the type of animal, say a bird or a mammal, and now you can train a forest to rank the animals in each group by how fast they are, and this can be very powerful.
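The three tasks described above (classification, regression, ranking) differ only in which column plays the role of the label. A small hypothetical table makes this concrete; the column names and values below are invented for illustration, not taken from the talk's slides.

```python
import pandas as pd

# Hypothetical structured dataset: two features, plus one column per task.
df = pd.DataFrame({
    "num_legs":     [2, 4, 4],
    "has_feathers": [True, False, False],
    "species":      ["chicken", "kangaroo", "cat"],  # label for classification
    "weight_kg":    [2.0, 55.0, 4.5],                # label for regression
    "group":        ["bird", "mammal", "mammal"],    # grouping column for ranking
})

# The same feature columns serve all three tasks; only the label changes.
print(df[["num_legs", "has_feathers", "species"]])
```

This is what "mapping your problem to one of these examples" means in practice: decide which column you are trying to predict, and whether examples should be grouped for ranking.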
The way to use a decision forest is to think about how you can map your problem to one of these examples. Basically, if you can represent your dataset in a CSV like these, with simple features, then trees are the best place for you to get started.

On the other hand, neural networks are great for unstructured data. In unstructured data, the features are complex, and you typically have lots of them. This is a dataset for image classification, and you can see the features are pixel values from an image. Although it's also represented in a CSV, there's an important difference: you and I can't classify an image by reading the pixel values. Instead of simple features you can reason about, you have tons of complex ones you can't, and this is when you need neural networks to help you train models from these complex features. But if you have simple features that you can reason about, and you often do, then trees are the best place to start. And just so you know, trees aren't just for beginners: they tend to win a bunch of Kaggle competitions with structured data, despite being relatively easy to use. And with that, let's turn to Matthew for the code.

Thank you, Josh. Now let's take a look at some code, and I will show you how easy it is to train a decision forest in TensorFlow. You start by importing a dataset. Here your dataset is stored in a CSV file, and you use pandas to load it; imagine this is structured data, just like the one Josh showed earlier. Next, you convert the pandas DataFrame into a TensorFlow dataset. If you're new to it, know that tf.data is a solution for handling large datasets, and you can use the function pd_dataframe_to_tf_dataset to do the conversion. For larger problems you might operate directly with tf.data, but if your data fits in memory, this one line is all you need. Next, you create the model and learning algorithm; in this example, you use a random forest. Note that you don't specify any of the hyperparameters there, which means the default hyperparameter values will be used.
While not optimal, the defaults often give reasonable results and are a great way to start. Finally, you train the model. Not shown here are the training logs, which let you monitor the training and help you further improve the model; we will talk a bit more about that later.

Hey, did you notice that you didn't have to one-hot encode or normalize a feature? In fact, you did not apply any preprocessing, as is often done with neural networks; you didn't even list the input features. This is one of the advantages of decision forests: they natively handle numerical and categorical features, which saves you a lot of time.

After the model is trained, you can evaluate it on a test dataset using model.evaluate, or make predictions with model.predict; those are the classical Keras API methods. Finally, you save the model in the TensorFlow SavedModel format, which has one important benefit: you can serve the model just like any other TensorFlow model, using TensorFlow Serving. In other words, you are free to use the best model for the job, whether it is a decision forest, a neural network, or maybe a combination of both, and if you already have TensorFlow infrastructure in place, you can use it to test and eventually deploy your decision forests.

In the previous slide we trained a random forest; now let's train a different type of model and change its hyperparameters a bit. The main documentation sits on GitHub and tensorflow.org, but here I will show you an interesting trick: in Colab, you can write a question mark followed by a function or a class to get its documentation; I'm sure the R users will appreciate this. So, coming back to the example, we have a gradient boosted trees model, which is another popular decision forest algorithm, and we set a few of its hyperparameters, selected by following the documentation. You can instantiate the model and train it.

Now let's take a look at another important feature. A nice way to understand how a decision forest works, or to gain some insight into the problem, is to print its trees.
Of course, the library can do that. In this example, you print the structure of one tree, and you might find it interesting to walk through the first branches of the model to get an idea of what it is doing. The next example shows the variable importances: they tell you how much each feature matters to the model, and they are key for model interpretation. Going further, you can access a tree as a recursive Python object and directly inspect its structure. In the example, you first print the entire tree object, and on the second line you access a bit of its structure, namely the threshold value in a split.

This is the end of the coding part. There are many features we did not show you, and we encourage you to take a look at the Colab and the documentation on GitHub. If you have any questions about decision forests in TensorFlow, please ask them on our new discussion forum, linked on this slide. Feedback is welcome and very useful, including telling us what features you would like to see implemented next. As I already mentioned, we have resources for you to learn more, and the best place to start is the tutorials section of tensorflow.org, where you will find a new tutorial we created for you. You can also check our blog, YouTube channel, Twitter feed, and new GitHub repository to stay up to date. On that note, I'll thank you all for your time, and I hope you will enjoy using our library.
Info
Channel: TensorFlow
Views: 9,150
Rating: 4.9774013 out of 5
Keywords: purpose: Educate, type: Conference Talk (Full production), pr_pr: Google I/O, TensorFlow, TensorFlow at IO 2021, TensorFlow Decision Forests, Decision Forest, tree-based models, Keras APIs inside TensorFlow, decision tree, random forests, Tree-based models, #GoogleIO, Google I/O, Google IO, Google developer conference, Google announcement, Google conference, Google
Id: 5qgk9QJ4rdQ
Length: 9min 21sec (561 seconds)
Published: Wed May 19 2021