Cutting Edge TensorFlow: New Techniques (Google I/O'19)

Video Statistics and Information

Captions
[MUSIC PLAYING] ELIE BURSZTEIN: If you are new to our session on cutting edge TensorFlow, my name is Elie. And today with Josh, Mike, and Sofien, we have four exciting projects for you. First, I will talk about Keras Tuner. Then, Josh will talk about probabilistic programming. Next, Mike will talk about TF-Ranking. And finally, Sofien will show you how to use TensorFlow Graphics. Let me start by telling you about Keras Tuner, which is a framework we initially developed internally to bring new [INAUDIBLE] models faster to production. Before getting started, let me ask you a question. How many of you have ever spent time optimizing your model performance? Right, I see quite a few hands raised. This is not surprising, because getting the best model performance is not easy. Getting the best model performance is hard because there are many parameters, such as the learning rate, the batch size, and the number of layers, which influence the model performance. Moreover, those parameters are interdependent, which makes finding the optimal combination by hand very challenging. This is why we rely on hypertuning to automatically find the best combination. However, so far hypertuning has been known to be hard to use. So if hypertuning is essential to get the optimal performance, this begs the question, can we make it as easy as one, two, three? And the answer is yes. This is why today I am happy to introduce Keras Tuner, which is a tuning framework made for humans. Keras Tuner is a tuning framework designed to make the life of AI practitioners as easy as possible. It also helps hypertuner algorithm creators and model designers by providing them with a clean and easy to use API. For AI practitioners, which is most of us, Keras Tuner makes moving from a base model to a hypertuned model quick and easy. Let me show you how this is done by converting a basic MNIST model to a hypertuned one. We will only have to change a few lines of code.
Here's our basic MNIST model, written with the TensorFlow Keras API. This is something I'm sure all of you have seen already. As you can see on the side, all parameters are fixed. For example, our learning rate is set to 0.001. So let's transition it to a hypertunable model in three steps. First, we wrap our model in a function. Second, we define hyper-parameter ranges for the parameters that we would like to optimize. Here, for example, we're going to optimize the learning rate, which is one of the most important parameters to hypertune. Finally, as the last step, we replace our fixed parameters with our hyper-parameter ranges. And we're done. Our model is ready to be hypertuned. It is that easy. Besides offering an intuitive API, Keras Tuner will also provide you with state of the art hypertuning algorithms, tunable architectures which are ready to go, and automatic experiment recording, which makes it easy for you to analyze, share, and reproduce your results. Originally, I started developing Keras Tuner to quickly try out new models for [INAUDIBLE] purposes, including the one we developed to protect Gmail against malware and phishing. Brand detection is one of the core building blocks we need to protect your inbox against phishing emails that impersonate the brands you love. This is why today our [INAUDIBLE] example will be to show you how to build a simple and yet accurate brand logo classifier, as logo identification is one of the critical components for detecting brand-spoofing email. The first thing we need to do is to load our dataset. In this example, we have about 150 icons from various brands, including the ones displayed on the side. We also need to set a few variables, such as the batch size and the number of icons we're going to use [INAUDIBLE]. Next, our dataset is very small, so we'll rely on data augmentation to create enough training data.
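The three steps described above (wrap the model in a function, declare a range, use the sampled value in place of the constant) can be sketched without any tuning library. This is only an illustration under stated assumptions: the `HyperParameters` class, `log_uniform`, `model_fn`, and the toy loss are hypothetical names invented here, not the Keras Tuner API, and plain random search stands in for whatever algorithm the tuner actually uses.

```python
import math
import random


class HyperParameters:
    """Hypothetical stand-in for a tuner's hyperparameter container."""

    def __init__(self, rng):
        self.rng = rng
        self.values = {}

    def log_uniform(self, name, low, high):
        # Sample on a log scale, a common choice for learning rates.
        value = low * (high / low) ** self.rng.random()
        self.values[name] = value
        return value


def model_fn(hp):
    # Step 1: the model is wrapped in a function that receives `hp`.
    # Step 2: declare the range to search instead of a fixed 0.001.
    learning_rate = hp.log_uniform("learning_rate", 1e-4, 1e-1)
    # Step 3: use the sampled value where the constant used to be.
    # A real model would be built and trained here; this toy "validation
    # loss" is an assumption and is smallest at learning_rate = 0.01.
    return (math.log10(learning_rate) + 2.0) ** 2


def random_search(trials, seed=0):
    # Try `trials` random configurations and keep the best one.
    rng = random.Random(seed)
    best_loss, best_values = float("inf"), None
    for _ in range(trials):
        hp = HyperParameters(rng)
        loss = model_fn(hp)
        if loss < best_loss:
            best_loss, best_values = loss, hp.values
    return best_loss, best_values


best_loss, best_values = random_search(trials=50)
print(best_values, best_loss)
```

The point of the design is that the model code never hard-codes the value being tuned; the search loop owns the sampling, so the same `model_fn` can be reused with a different search strategy.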
This slide shows you a few examples of the augmented icons we're going to feed to the classifier as training input, the output being the brand names. We are going to save the real icons as our validation dataset to make sure that our classifier generalizes well. To establish a baseline, let's first train a ResNet101v2, which is one of the most common and well known model architectures. As you can see on the [INAUDIBLE] graph, our model did converge, but the accuracy on real icons is not great. We barely reached 79% accuracy. And it's also quite big, with 44 million parameters. Well, that's OK. We can use Keras Tuner to find a better model which is smaller and more accurate. So to do that, the first thing we need to do is to create, as with the MNIST model, a model function and import TunableResNet. TunableResNet is a tunable version of ResNet that we will provide with Keras Tuner as one of the architectures which are ready to tune. Next, you add a few layers on top of it and compile the model. Then, well, we initialize the tuner and we give it $500 to spend on tuning to find the best model. And we ask it to maximize evaluation accuracy. Finally, we have to launch the tuning and wait for the results. So did it work? Well, yes it did. Actually, Keras Tuner found a way better model. Our new model now has 100% accuracy and only takes 24 million parameters. So we get a faster and more accurate model thanks to hypertuning. Keras Tuner works with many of the tools you love, including TensorBoard, Colab, and BigQuery, with many more to come. One more thing. Hypertuning takes a long time, so to make everything more convenient, we will be releasing alongside Keras Tuner an optional cloud service that will allow you to monitor your tuning on the go, whether it's from your phone or from your laptop. Here is a screenshot of an early version of the mobile dashboard to give you a sense of what to expect.
So the design is not final, but the UI will show you how long before your tuning completes, as well as offer you a visual summary of the models trained so far, so you can know how your tuning is going. Thank you for attending today. We are really excited about Keras Tuner. And you can sign up today for the early access program by heading to g.co/research/kerastunereap. [APPLAUSE] Josh is now going to talk about [INAUDIBLE] programming and TensorFlow.
JOSH DILLON: OK. Hi, I'm Josh. And today, we'll be talking about everyone's favorite topic-- probability. So let's just start with a simple example. So suppose we're trying to predict these blue dots-- that is, the Y-coordinate from the X-coordinate. And Keras makes this pretty easy to do. As you can see here, we have a dense neural network with one hidden layer outputting one float. And that float is the predicted Y-coordinate. And we've chosen mean squared error as a loss, which is a good default choice. But the question is, how do we make our loss function better? What does a better loss function look like? And how would we even know? How would we evaluate the fit of this model? And so we would like to be able to specify the negative log likelihood as a generic loss, but then encode the distributional assumptions in the model. And furthermore, if we're doing that, then wouldn't it be nice to get back an object which we can query-- ask for the mean, the variance, the entropy, et cetera? And the answer is we can do this. Using a TensorFlow Probability distribution layer, we can say that we want the neural network to output a distribution, basically. And the way this works is, as you can see, the second to last layer here outputs one float. That float is interpreted as the location, or mean, of a normal distribution. And that's how we can implement linear regression. And as you see here, the fit is this sort of red line. But what's cool is, we're actually outputting a distribution. Right?
So you can take this output and just ask for the entropy or the variance of the prediction. So that's nice. But now that we've done this, there's sort of something that looks a little fishy here. We're learning the mean, but not the standard deviation. It seems like maybe a missed opportunity to improve our model. And that missed opportunity is self-evident now that we're learning a distribution directly-- sort of an idea that was hidden in what was otherwise the mean squared error. So to learn the standard deviation, it's just another one or two line change. Instead of outputting one float, we now output two. One is interpreted as the mean as before, the location. The other, when passed through a softplus function, is now interpreted as the standard deviation. And what you see is we're now able to get this sort of green line and the red line-- green being the standard deviation, red being the mean fit from before. As you can see, the green line sort of diverges as X increases. And so that suggests that our data-- the variability of Y, actually-- changes as a function of X. Statisticians call this heteroskedasticity. But I'd just like to think of it as learning known unknowns. There was variability present in our data. And because we took a more probabilistic view of our model, it was pretty easy to see how we should fit that. So that seems pretty cool. But I guess the question is, now that we're thinking about sort of known unknowns, or aleatoric uncertainty, what about unknown unknowns? Do we even have enough data to accurately make the claim that this is the standard deviation and mean of this regression problem? And if we don't, how might we get there? And the answer-- or an answer-- is to be Bayesian. Rather than just fitting the weights, if instead we think about the weights as being drawn from a distribution and try to find what might be the best posterior distribution, then we can actually capture some degree of unknown unknowns.
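To make the loss concrete, here is a minimal standard-library sketch of the negative log likelihood of a normal distribution whose mean and standard deviation both come from the network outputs. The function names are illustrative assumptions, not TensorFlow Probability calls. With the scale fixed at 1, the loss differs from half the squared error only by a constant, which is the sense in which the distributional assumption was hidden inside mean squared error.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)); the result is always positive,
    # which makes it usable as a standard deviation.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def normal_nll(y, loc, scale):
    # Negative log-density of Normal(loc, scale) evaluated at y.
    z = (y - loc) / scale
    return 0.5 * z * z + math.log(scale) + 0.5 * math.log(2.0 * math.pi)


def nll_from_outputs(y, outputs):
    # `outputs` plays the role of the two floats from the penultimate layer:
    # the first is the mean, the second is mapped through softplus.
    loc, raw_scale = outputs
    return normal_nll(y, loc, softplus(raw_scale))


# With scale fixed at 1, the NLL gap between two predictions is half the
# gap in squared error: 0.5 * ((3 - 1)**2 - (3 - 2)**2) = 1.5.
gap = normal_nll(3.0, 1.0, 1.0) - normal_nll(3.0, 2.0, 1.0)
print(gap)
```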
That is, keeping track of how much evidence we have or don't have to make the prediction we want to make. So in practice, this boils down to something that looks a lot like learning an ensemble. As you can see here, there are numerous random draws, each corresponding to a line. But what's cool is that computationally, we only pay a small additional overhead for fitting what is otherwise an infinite number of models. So that seems pretty cool. But we seem to have lost the aleatoric uncertainty-- the known unknowns. And so can we get it back? Yes. Since all of this is modular, you can simply specify whatever assumptions you want to make. We're back to fitting the standard deviation by outputting two floats from that penultimate layer. And yet, each of those dense variational layers is sort of representing an infinite number of possible weights. So now, we're starting to get a fairly sophisticated model. And to get here, all we had to do is just keep swapping out one sort of Keras layer for a probability layer: output a distribution, change weights to be distributions. So that's pretty powerful, and yet a sequence of simple changes. Of course, now we can ask, what if we are not even sure that a line is the right thing to be fitting here? What if we actually want to think about the loss function itself as being a random variable? In this framework where we're able to just encode ideas as random variables, everything's on the table, right? So what would that look like? And how would we do it? The answer in this case is to use the variational Gaussian process layer. And from that, we conclude the data wasn't even linear at all. In fact, it had this dampened sinusoidal structure. So no wonder we were having a hard time fitting it. We were using-- in some sense-- just fundamentally the wrong model.
And the way we got here is by just questioning every assumption in our model, but not having to think about sort of the relationship between different losses, which otherwise might seem arbitrary-- rather, a sequence of modeling assumptions. So how was this all so easy? With TensorFlow Probability. TensorFlow Probability is a toolbox for probabilistic modeling built on or in or using TensorFlow. Statisticians and data scientists will be able to write and launch the same model, and ML researchers and practitioners can make predictions with uncertainty. You saw just a small part of the overall TensorFlow Probability toolbox. More broadly, it offers tools for building models and for doing inference within those models. On the model building side, the lowest-level, most basic abstraction is distributions. You saw the normal distribution. It's exactly what you think-- gamma, exponential, et cetera. These are sort of the building blocks of your model. And they're all built to take advantage of vector processing hardware, in a way that sort of automatically takes advantage of it. Next, we have bijectors. This is a module for transforming distributions into still other distributions. Diffeomorphisms is the $10 word to describe these. And they can range from a simple sort of exponential or logarithm transform to more complicated transforms that combine neural nets with the diffeomorphism-- so for example, autoregressive flows, Real NVP. Fairly exotic neural densities can be built using bijectors. You saw layers-- a few examples of those. We also have a number of losses for making Monte Carlo approximations. And joint distribution is an abstraction for combining multiple random variables as one.
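The idea of a bijector, a transform that turns one distribution into another, rests on the change-of-variables formula. The sketch below applies it to the simple exponential transform mentioned above, turning a standard normal into a log-normal. The `Exp` class and its method names are a toy written for this illustration; they are assumptions and not the library's own classes.

```python
import math


def normal_log_prob(x, loc=0.0, scale=1.0):
    # Log-density of the base distribution, Normal(loc, scale).
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


class Exp:
    """Toy bijector y = exp(x), mapping the real line to positive numbers."""

    def forward(self, x):
        return math.exp(x)

    def inverse(self, y):
        return math.log(y)

    def inverse_log_det_jacobian(self, y):
        # The inverse is x = log(y), so dx/dy = 1/y and log|dx/dy| = -log(y).
        return -math.log(y)


def transformed_log_prob(y, bijector, base_log_prob):
    # Change of variables:
    #   log p_Y(y) = log p_X(f^{-1}(y)) + log|d f^{-1}(y) / dy|
    x = bijector.inverse(y)
    return base_log_prob(x) + bijector.inverse_log_det_jacobian(y)


# Log-density of a log-normal at y = 2, obtained from a standard normal.
print(transformed_log_prob(2.0, Exp(), normal_log_prob))
```

Because only `inverse` and the log-determinant term are needed to evaluate densities, more elaborate transforms can be swapped in without touching `transformed_log_prob`.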
On the inference side of the fence, we have Markov chain Monte Carlo-- no probabilistic modeling toolbox would be complete without it-- within which we have Hamiltonian Monte Carlo and a number of other transition kernels, which generally take advantage of TensorFlow's automatic differentiation capability. We also have tools for variational inference, which turns inference into an optimization problem. And finally, we have additional optimizers that are useful for probabilistic models-- for example, quasi second order methods like BFGS, as well as methods that don't use the gradient, for cases where that's computationally prohibitive. So TensorFlow Probability is widely used within Alphabet, including Google Brain and DeepMind. It also is used externally. One of the earliest adopters is Baker Hughes GE. And they use TFP to basically treat models as random variables for the purpose of detecting anomalies. So one problem that they're particularly interested in is detecting when jet engines will fail. And luckily, their dataset doesn't have failing jet engines. That would be a terrible thing. And so we have to be Bayesian and sort of infer the evidence that we don't have. So using TensorFlow Probability and TensorFlow, they're able to process an enormous amount of data-- six terabytes. They are able to explore over 250,000 different model architectures, and to great profit, seeing a 50% reduction in false positives and a 200% reduction in false negatives. And this sort of TensorFlow graph represents their pipeline-- the orange boxes you see here-- which heavily uses TensorFlow Probability to, as I said, treat the model as a random variable. So the question I want to leave you with is, who will be the next success story? TensorFlow Probability is an open source Python library built using TensorFlow, which makes it easy to combine deep learning and probabilistic models on modern hardware. You can pip install it right now. Learn more at TensorFlow.org/probability or shoot us an email.
If you're also interested in learning more about Bayesian techniques, or just TensorFlow and TensorFlow Probability, Google "Bayesian Methods for Hackers"-- the online version of this book, which we rewrote to use TensorFlow Probability. I think it's a great way to get started if you're interested. On our GitHub repository, you'll also find numerous examples, including the one you saw today. Thank you. And with that, Mike will talk to you about TF-Ranking. [APPLAUSE]
MICHAEL BENDERSKY: Thank you, Josh. Hello everyone. My name is Michael. And today, I'll be talking about TF-Ranking, a scalable learning to rank library for TensorFlow. So first off, I'll start by defining what learning to rank is, which is the problem we're trying to solve with TensorFlow Ranking. Imagine you have a list of items. And here, the green shades indicate the relevance levels of these items. The goal of learning to rank is to learn a scoring function, F, such that it takes a list of these items and produces an optimal ordering of these items in order of their relevance. So the greenest item would be at the top. This seems like a very abstract problem. However, it has a lot of practical applications. In search, we rank documents in response to user queries. In recommendation systems, we rank items for a given user. In dialogue systems, we rank responses for a user request. And similarly, in question answering systems, we rank answers in response to user questions. One very common application of ranking that requires a massive amount of data is click position optimization. In this setting, the function, F, takes as an input a ranked list where we have some clicks on the items in the list. The perfect ranking in this case assumes that the clicked documents should be placed at the top of the list. Later in this talk, I will show an example of this application. OK, so let's talk about TensorFlow Ranking, or TF-Ranking for short. It was first announced on the Google AI blog in December 2018.
And it's the first open source library that does learning to rank at scale with deep learning approaches. It's actively maintained and developed by the TF-Ranking team here at Google. And it is fully compatible with the entire TensorFlow ecosystem, including tools like TensorBoard and TensorFlow Serving. One interesting problem which we're trying to solve with TF-Ranking is that, unlike classification or regression metrics, ranking metrics are usually non-convex. In fact, most [INAUDIBLE] ranking metrics are either discontinuous or flat. I'll give an example. In this case, we see a step function. And the step here indicates what happens when we change the score of the items in the list and then a rank swap occurs. When the score changes and the swap occurs, we are basically becoming discontinuous in the function space. The rest of the function is flat, because we do not change the ordering of the items when we change the scores. These types of functions are very difficult or impossible to directly optimize [INAUDIBLE] gradient descent. Due to that, researchers in learning to rank have been developing different types of loss functions. One common loss function is the pointwise loss, where we basically take each item as an input and assign to it a probability of being relevant. This is very similar to a classification or regression case, but completely ignores the relationship between the different items in the list. So pairwise ranking losses were proposed. These use pair comparisons to order the list. So instead of learning a probability for each item, we learn probabilities of one item being preferable to another item. Again, this does not capture the entire list-- just pairs. So list-wise ranking losses were proposed instead, in which the function, F, takes one item at a time but tries to optimize the ordering of the entire list, producing pi star, which is the optimal permutation of the items.
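The step-function behaviour described above can be reproduced with a few lines of standard-library Python. This sketch computes NDCG for a two-item list while the score of the relevant item is swept past the score of the other item. The gain and discount used here (2^rel - 1 and log2(rank + 1)) are the common textbook choice and are an assumption about which variant is meant.

```python
import math


def dcg(relevances):
    # Gain 2^rel - 1, discounted by log2(rank + 1), with ranks starting at 1.
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(scores, relevances):
    # Order items by descending score, then compare against the ideal order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [relevances[i] for i in order]
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(ranked) / ideal if ideal > 0 else 0.0


relevances = [0, 1]  # the second item is the relevant one
# Sweep the relevant item's score past the other item's fixed score of 0.5.
sweep = [ndcg([0.5, s], relevances) for s in (0.1, 0.3, 0.49, 0.51, 0.9)]
print(sweep)
```

The metric stays at about 0.63 while the relevant item is ranked second and jumps to 1.0 the moment the swap occurs, so its gradient with respect to the scores is zero almost everywhere, which is why surrogate losses are used instead.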
One new development we propose in TensorFlow Ranking is this idea of multi-item scoring. So unlike in the previous slide, where the function, F, takes one item at a time, in the multi-item scoring scenario the function, F, takes all the items at once and produces the optimal ordering, pi star. This is really important for complex interdependent inputs, and allows us to use the context of other items to make better ranking decisions. One important thing to note is that we support many, many metrics in TensorFlow Ranking. So we support standard metrics like (N)DCG, ARP, Precision@K, and others. But it's also very easy to add your own metrics to TensorFlow Ranking. And once you have the metric you want to optimize and you use TensorBoard, you can easily visualize it while you train your model. So you can see how your (N)DCG or other metric progresses as your model trains across the epochs. So let me jump into describing how you can develop your own state of the art, ready to deploy learning to rank model in four simple steps using TensorFlow Ranking. First, you specify a scoring function. Then you specify the metrics that you want to optimize for. Then you specify your loss function. And finally, you build your ranking estimator using all three of these previous steps. Here is how it looks in code. First, we define the scoring function. And here, we use a scoring function with three hidden layers. Then we specify the evaluation metrics. In this case, we use (N)DCG metrics, and we use (N)DCG at top ranks. Finally, we need to specify the loss function and the ranking estimator. Here, we propose using a ranking head with a softmax loss. However, note that the softmax loss can be easily replaced by any other loss supported by TF-Ranking. So it's very easy to switch between losses by simply changing this parameter to something else. And finally, you build the ranking estimator. An interesting thing about the ranking estimator is that it takes as a parameter this group size.
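As a rough illustration of a scoring function paired with an interchangeable loss, the sketch below scores each item independently (the single-item case, group size one) and evaluates two losses on the resulting list of scores. Everything here is a standard-library stand-in: the linear `score_item` function and both loss functions are assumptions written for this example and do not reproduce the TF-Ranking API or its three-hidden-layer network.

```python
import math


def score_item(features, weights):
    # Hypothetical linear scoring function f, applied to one item at a time.
    return sum(w * x for w, x in zip(weights, features))


def softmax_loss(scores, labels):
    # Listwise softmax cross-entropy: -sum_i label_i * log softmax(scores)_i.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return -sum(label * (s - log_z) for s, label in zip(scores, labels))


def pairwise_logistic_loss(scores, labels):
    # For every pair where item i is more relevant than item j,
    # penalise the model when s_i is not well above s_j.
    return sum(math.log1p(math.exp(-(si - sj)))
               for si, li in zip(scores, labels)
               for sj, lj in zip(scores, labels) if li > lj)


items = [[1.0, 0.0], [0.0, 1.0]]   # two items with two features each
labels = [1, 0]                    # the first item was clicked
good = [score_item(x, [2.0, 0.0]) for x in items]   # ranks the clicked item first
bad = [score_item(x, [0.0, 2.0]) for x in items]    # ranks it last

for loss in (softmax_loss, pairwise_logistic_loss):
    # Swapping the loss is a one-name change; both prefer the good ordering.
    print(loss.__name__, loss(good, labels), loss(bad, labels))
```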
So this group size, if you set it to something that is greater than one, you're essentially enabling the multi-item scoring I was referring to before. If you set it to one, you fall back to using standard learning to rank approaches with single item scoring. So in less than 50 lines of code, you've already built your own learning to rank model. And now, you're ready to train it on your training data. OK, I'm just going to finish this by giving an example of how TF-Ranking works in practice. And I'm going to go back to the click position optimization problem I posed before. To remind you, in this case the perfect ranking is one where we take click data as an input and produce a ranked list [INAUDIBLE] the clicked items will be at the top of the list. We're using an internal dataset here of around one billion query document pairs. And for each query document pair, we extract the following-- some numerical features that we associate with this query document pair, the query and document text, and the position at which the document was displayed. And as labels, we use click or no-click information about this particular document for this query. Here, we compare the performance of TF-Ranking to LambdaMART, which is a state of the art learning to rank approach. It's interesting to note that when you use TF-Ranking with numerical features only, it is comparable to LambdaMART on this dataset. However, the more interesting thing is that when we add text as features, we achieve big improvements, especially when using TF-Ranking in an ensemble with LambdaMART. On this dataset, we achieve over 3% gain when we're adding sparse textual features into the model and ensembling LambdaMART and TF-Ranking, which is a very significant improvement for a dataset this size. All right, so I hope I got you all excited about using TF-Ranking. What should you do next? So you can go and read our paper on arXiv about TF-Ranking and all the work that went into developing it.
Then, check out our GitHub repository. It has all the information about TF-Ranking. And install TF-Ranking from there by using pip install. You can run through a simple Colab that we have on our GitHub repository. And that's it. You're ready to start building your next learning to rank application. I would like to extend huge thanks to everyone on the TF-Ranking team who made this project possible. And next up is Sofien to talk about TensorFlow Graphics. [APPLAUSE]
SOFIEN BOUAZIZ: Thanks, Mike. Thanks. Hi everyone. My name is Sofien. And today, I'm proud to announce the first release of a new library called TensorFlow Graphics. But before getting any further, let's start from the very beginning and define computer graphics. Computer graphics is a subfield of computer science which studies methods for digitally synthesizing and manipulating visual content. Most of you have been exposed to computer graphics through movies and video games, where amazingly beautiful synthetic scenes are rendered photo-realistically. And this is thanks to many advances in the computer graphics field. To give you some perspective, this is what computer graphics looked like in 1958, with the first interactive game, called Tennis for Two. As you can see, we have come a long way. To generate beautiful renderings, a computer graphics system needs as input a scene description. These often include transformations, which explain how the objects are placed in space; camera models, which describe from which point of view the scene needs to be rendered; light and material models, defining object appearances; and finally [INAUDIBLE] geometry. These parameters are then interpreted by a renderer to generate a beautiful image. Now in comparison to computer graphics, computer vision is concerned with the theory behind artificial systems that extract information from images. So we can see computer vision and computer graphics as a duality.
A computer vision system would start from an image and try to automatically extract a scene description, estimating the three dimensional position and orientation of objects, understanding the material properties, or just recognizing these objects based on their 3D geometry. Answering these types of questions about the three dimensional world is fundamental for many machine learning applications. Good examples are autonomous vehicles and robots that need to reason about three dimensional objects and their relationships in space. However, to train a machine learning system solving these complex 3D vision tasks, a large quantity of data is needed. Labeling data being a complex and costly process, it is important to have a mechanism to design machine learning systems that can reason about the three dimensional world while being trained without much supervision. So combining computer vision and computer graphics provides a unique opportunity to leverage a vast amount of readily available unlabeled data. This can be done by a technique called analysis by synthesis, where the vision system extracts the scene parameters and the graphics system renders back an image based on them. If the rendering matches the original image, the vision system has done a great job at extracting the correct scene parameters. In this setup, computer vision and computer graphics go hand in hand, forming a single machine learning system similar to an autoencoder. TensorFlow Graphics is being developed to solve this type of problem. And we are aiming at providing a set of differentiable graphics layers that can be used in your [INAUDIBLE] machine learning models. So for the sake of time, we'll focus on four useful components that can be included in deep learning models to solve these interesting 3D vision tasks. So during the next slides, you will see a QR code like this one. If you are interested, use your smartphone.
Point your smartphone toward the slide and you will be directed to the [INAUDIBLE] free resources. So get ready. These QR codes are going to come back in later slides. OK, so now let's jump right in and see how 3D transformations can be expressed using TensorFlow Graphics. One basic building block for manipulating 3D shapes is 3D rotation. In TensorFlow Graphics, we are providing a set of classical representations, and also implement functions to convert between them. One easy way to represent a rotation is by using an axis and an angle, defining how much the object rotates around this axis. This can be easily expressed in TensorFlow Graphics using our transformation module, where in this code we first load the vertices of the cube. We then define the axis of rotation and the angle. And finally, we apply the transformation to the cube using the rotate function of the axis-angle module. Now that we have seen how easy it is to express rotation in TensorFlow Graphics, let's move on to camera models. Camera models play a fundamental role in computer vision, as they greatly influence the appearance of three dimensional objects that are projected onto the camera plane. As you can see in this slide, the cube appears to be scaling up and down. But it is actually the camera focal length that is changing. Currently in TensorFlow Graphics, we propose two types of cameras-- an orthographic and a perspective camera model. So let's see how the perspective camera model works. One common operation used with a camera model is to project a set of 3D points onto a 2D camera plane. And this can also be expressed easily with TensorFlow Graphics, where in this code, similarly, we first load the vertices of the cube. We then define the intrinsic parameters of the camera, and this allows us to map the 3D vertices of the cube to 2D using the project function of the perspective camera module. So far, we [INAUDIBLE] 3D objects and projected them to 2D. But what about the appearances of these objects?
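The two operations described above, rotating points about an axis by an angle and projecting them with a perspective camera, come down to a few lines of arithmetic. The sketch below uses Rodrigues' rotation formula and a pinhole camera model with plain Python tuples. The function names and argument layout are assumptions made for this illustration and are not the TensorFlow Graphics signatures.

```python
import math


def rotate_axis_angle(point, axis, angle):
    # Rodrigues' formula for a unit-length axis k:
    #   p' = p cos(a) + (k x p) sin(a) + k (k . p) (1 - cos(a))
    px, py, pz = point
    kx, ky, kz = axis
    c, s = math.cos(angle), math.sin(angle)
    dot = kx * px + ky * py + kz * pz
    cross = (ky * pz - kz * py, kz * px - kx * pz, kx * py - ky * px)
    return tuple(p * c + cr * s + k * dot * (1.0 - c)
                 for p, cr, k in zip(point, cross, axis))


def project_perspective(point, focal, principal_point):
    # Pinhole camera looking down +z: u = fx * x / z + cx, v = fy * y / z + cy.
    x, y, z = point
    fx, fy = focal
    cx, cy = principal_point
    return (fx * x / z + cx, fy * y / z + cy)


# Rotate a cube corner a quarter turn about the z axis, then move it in
# front of the camera and project it with two different focal lengths.
corner = rotate_axis_angle((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2)
in_front = (corner[0], corner[1], corner[2] + 4.0)
near = project_perspective(in_front, (100.0, 100.0), (50.0, 50.0))
far = project_perspective(in_front, (200.0, 200.0), (50.0, 50.0))
print(corner, near, far)
```

Doubling the focal length doubles the projected offset from the principal point, which matches the observation that the cube appears to scale while only the focal length changes.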
In TensorFlow Graphics, we're providing a few simple material models that can be used to render 3D objects using TensorFlow. To be slightly more concrete, these material models define how the light reflects off the surface of an object. So given an incoming light direction, how much of the light will bounce off the surface toward a particular outgoing direction? So the input parameters needed for the material model are the surface normal of the 3D object, the incoming light direction, the outgoing direction-- for example, pointing toward the camera-- and the material parameters-- in this case, color and shininess. Given all these inputs, TensorFlow Graphics can evaluate how much light is reflected off the surface in the outgoing direction, which allows us to shade the camera pixel. And in case this is all new for you, we provide a Colab where you will be able to see this concept in more detail. OK, so now that we have seen some of the [INAUDIBLE] graphics functionality that TensorFlow Graphics introduces, let's talk about geometry-- and especially how TensorFlow Graphics can help in expressing convolutions on meshes and point clouds. Image convolution is a basic building block of deep learning. It has been extensively used in many learning tasks, such as image classification. The nice part of dealing with images is that they are represented by a uniform grid, which makes convolution [INAUDIBLE] easy to implement and, consequently, convolutional [INAUDIBLE] networks. In TensorFlow, image convolutions are readily available. However, things become a bit more complicated when dealing with three dimensional objects, which are often defined as meshes or point clouds and are not represented as a uniform grid. This makes convolutions hard to implement, and so are neural networks based on them. In recent years, sensors giving three dimensional point clouds have been becoming part of everyday life, from smartphone depth sensors to self-driving car [? radars ?].
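A material model of the kind described takes a surface normal, a direction toward the light, a direction toward the camera, a color, and a shininess, and returns how bright the pixel should be. The sketch below is a simplified Phong-style shading function written for illustration only: the unit-weight diffuse and specular terms and the clamping to 1.0 are assumptions, and the library's own reflectance functions may be normalized differently.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reflect(to_light, normal):
    # Mirror the direction pointing toward the light about the surface normal.
    d = dot(to_light, normal)
    return tuple(2.0 * d * n - l for n, l in zip(normal, to_light))


def phong_shade(normal, to_light, to_camera, color, shininess):
    # All directions are assumed to be unit length and to point away
    # from the surface.
    diffuse = max(dot(normal, to_light), 0.0)
    mirror = reflect(to_light, normal)
    specular = max(dot(mirror, to_camera), 0.0) ** shininess
    # Clamp each channel so the result stays a displayable intensity.
    return tuple(min(c * diffuse + specular, 1.0) for c in color)


up = (0.0, 0.0, 1.0)
red = (1.0, 0.0, 0.0)
# Light straight above, camera off to the side: only the diffuse term
# contributes, so the pixel takes the material color.
print(phong_shade(up, up, (1.0, 0.0, 0.0), red, shininess=10))
# Camera along the mirror direction: the highlight saturates the pixel.
print(phong_shade(up, up, up, red, shininess=10))
```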
It is therefore important to open source such functionalities so that developers and researchers are able to efficiently use 3D data and extract information from it. In TensorFlow Graphics, we propose a set of graph convolution operators which can almost be used as a drop-in replacement for image convolutions. So in this code, we first load the mesh in the form of vertices and connectivity. We then apply one of the graph convolution operators that TensorFlow Graphics provides. It is also important to note that we are providing the equivalent Keras layers. And finally, the convolution layer can be followed by a classic [INAUDIBLE], all the layers. To demonstrate how this can be used inside more complex neural networks, we also provide a Colab doing three dimensional semantic human part segmentation as an example. So during these few slides, we have seen a small set of the TensorFlow Graphics functionalities. And the good news is that we are providing many more of them. And we will also add more of these functionalities in the near future. But there's also one more thing. We are glad to announce a new TensorFlow plug-in allowing mesh and point cloud visualization. And I believe that this plug-in will be amazingly useful for developers and researchers that want to use TensorFlow to analyze three dimensional data. So get started today. We provide a pip package. And installing the library is as easy as doing pip install tensorflow-graphics. In case you are really excited about TensorFlow Graphics, we have a GitHub page from which you can pull our latest features. We are also providing comprehensive API documentation and multiple Colabs from which you can learn about some of the functionalities we are providing. Before ending the talk, I would like to thank the people that have really contributed to making this project happen. Thank you very much, everyone. I hope you enjoyed this presentation. [MUSIC PLAYING]
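To show why connectivity replaces the uniform grid, here is a deliberately small graph convolution over a mesh given as per-vertex features plus an adjacency list. It mixes each vertex's own feature with the mean of its neighbours' features. This mean-aggregation form, the scalar features, and the two scalar weights are simplifying assumptions chosen for the sketch; the operators shipped with the library are more elaborate.

```python
def graph_convolution(features, neighbors, w_self, w_neighbor):
    # features: one scalar feature per vertex.
    # neighbors: for each vertex, the list of vertex indices it shares an edge with.
    out = []
    for v, feat in enumerate(features):
        nbrs = neighbors[v]
        mean = sum(features[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
        out.append(w_self * feat + w_neighbor * mean)
    return out


def relu(values):
    # Standard non-linearity applied after the convolution.
    return [max(v, 0.0) for v in values]


# A single triangle: three vertices, each connected to the other two.
features = [1.0, 2.0, 3.0]
neighbors = [[1, 2], [0, 2], [0, 1]]
hidden = relu(graph_convolution(features, neighbors, w_self=1.0, w_neighbor=-1.0))
print(hidden)
```

Unlike an image convolution, the number of neighbours can differ per vertex, which is why the aggregation is written as a mean over an adjacency list rather than a fixed-size kernel.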
Info
Channel: TensorFlow
Views: 32,359
Keywords: type: Conference Talk (Full production);, pr_pr: Google I/O, purpose: Educate
Id: Un0JDL3i5Hg
Length: 37min 32sec (2252 seconds)
Published: Thu May 09 2019