Deep Learning to Solve Challenging Problems (Google I/O'19)

Captions
[MUSIC PLAYING] JEFF DEAN: I'm excited to be here today to tell you about how I see deep learning and how it can be used to solve some of the really challenging problems the world is facing. I should point out that I'm presenting the work of many, many different people at Google, so this is a broad perspective on a lot of the research we're doing; it's not purely my work.

First, as I'm sure you've all noticed, machine learning is growing in importance. There's a lot more emphasis on machine learning research, and a lot more uses of machine learning. This is a graph of how many arXiv papers -- arXiv is a preprint hosting service for all kinds of research -- are posted in the subcategories related to machine learning. What you see is that since 2009 the number of papers posted has been growing at a really fast exponential rate, actually faster than the Moore's Law growth rate of computational power that we got so nicely used to for 40 years but that has now slowed down. So we've replaced the nice growth in computing performance with growth in people generating ideas, which is nice.

Deep learning is a particular form of machine learning. It's in some sense a rebranding of a very old set of ideas around creating artificial neural networks: collections of simple trainable mathematical units, organized in layers, where the higher layers typically build higher levels of abstraction based on the things the lower layers are learning, and you can train these things end to end. The algorithms that underlie a lot of the work we're doing today were actually developed 35 or 40 years ago. In fact, my colleague Geoff Hinton just won the Turing Award this year, along with Yann LeCun and Yoshua Bengio, for a lot of the work they did over the past 30 or 40 years. So the ideas are not new. What's changed is that 30 or 40 years ago we got amazing results on toy-ish problems but didn't have the computational resources to make these approaches work on real, large-scale problems. Starting about eight or nine years ago, we started to have enough computation to really make these approaches work well.

So what are these things? Think of a neural net as something that can learn a really complicated function that maps from input to output. That sounds kind of abstract -- you think of functions as y equals x squared or something -- but these functions can be very complicated and can learn from very raw forms of data. You can take the pixels of an image and train a neural net to predict what is in the image as a categorical label: that's a leopard. That's one of my vacation photos. From audio waveforms, you can learn to predict a transcript of what is being said: "How cold is it outside?" You can take input in one language -- "Hello, how are you?" -- and predict the output being that sentence translated into another language. [SPEAKING FRENCH] You can even do more complicated things, like take the pixels of an image and create a caption that describes the image. It's not just a category; it's a simple sentence: "A cheetah lying on top of a car," which is kind of unusual anyway. Your prior for that should be pretty low.
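To make the "pixels in, label out" idea concrete, here is a minimal sketch of a small image classifier in TensorFlow/Keras. This is purely illustrative and not any model from the talk; the layer sizes, input resolution, and number of classes are placeholder assumptions.

```python
import tensorflow as tf

# A minimal sketch of "pixels in, category out" (illustrative only; the
# layer sizes, dataset, and number of classes are arbitrary placeholders).
NUM_CLASSES = 10  # e.g., 10 object categories, not the 1,000 of ImageNet

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# images: float32 tensor of shape (batch, 64, 64, 3); labels: integer class ids.
# model.fit(images, labels, epochs=5)
```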
In the field of computer vision, we've made great strides thanks to neural nets. In 2011, the winning entry in the Stanford ImageNet contest, which is held every year, did not use neural nets. That was the last year the winning entry did not use neural nets. It got 26% error, and that won the contest. And this is not a trivial task: humans themselves have about 5% error, because you have to distinguish among 1,000 different categories of things; for a picture of a dog, you have to say which of 40 breeds of dog it is. In 2016, for example, the winning entry got 3% error. That's a huge fundamental leap in computer vision. Computers went from basically not being able to see in 2011 to now being able to see pretty darn well, and that has huge ramifications for all kinds of things in the world -- not just computer science, but the application of machine learning and computing to perceiving the world around us.

OK. The rest of this talk I'm going to frame around a list from 2008, when the US National Academy of Engineering published 14 grand engineering challenges for the 21st century. They got together a bunch of experts across lots of different domains, and they collectively came up with this list of 14 things, which I think you'll agree are actually pretty challenging problems. If we made progress on all of them, the world would be a healthier place and a safer place, and we'd have more scientific discovery. All of these are important problems. Given the limited time, I'm going to talk about the ones in boldface. We have projects in Google Research focused on all the ones listed in red, but I'm not going to talk about the other ones. So that's the tour of the rest of the talk; we're just going to dive in, and off we go.

We start with restoring and improving urban infrastructure. The basic structure of cities was designed quite some time ago, but there are changes we're on the cusp of that are going to really dramatically change how we might want to design cities. In particular, autonomous vehicles are on the verge of commercial practicality. This is from our Waymo colleagues, part of Alphabet, who have been doing work in this space for almost a decade. The basic problem of an autonomous vehicle is that you have to perceive the world around you from raw sensory inputs, things like lidar, cameras, radar, and other kinds of sensors. You want to build a model of the world and the objects around you and understand what those objects are: is that a pedestrian or a light pole? Is it a car that's moving? You also want to predict a short time into the future, like where that car is going to be in one second, and then make a set of decisions about what actions to take to accomplish your goal of getting from A to B without any trouble. It's really thanks to deep-learning-based vision algorithms and the fusing of all the sensor data that we can build maps and understandings of the environment like this and actually have these things operate in the real world.

This is not some distant, far-off dream. Waymo is actually operating about 100 cars with passengers in the back seat and no safety drivers in the front seat in the Phoenix, Arizona area. So this is a pretty strong sign that this is close to reality. Now, Arizona is one of the easier self-driving car environments: it never rains, it's too hot so there aren't that many pedestrians, and the streets are very wide.
The other drivers are very slow. Downtown San Francisco is harder, but this is a sign that it's not that far off.

Obviously, if vision works, it's easier to build robots that can do things in the world. If you can't see, it's really hard to act; but if you can start to see, you can have practical robotic systems that use computer vision to make decisions about how they should act in the world. This is a video of a bunch of robots practicing picking things up, dropping them, and picking more things up -- essentially trying to grasp things. One nice thing about robots is that you can collect the sensor data and pool the experience of many robots, train on their collective experience, get a better model of how to grasp things, and then push that out to the robots. The next day they can all practice with a slightly better grasping model. Unlike humans, who get plopped on the carpet in the living room, robots get to pool their experience.

In 2015, the success rate on a particular grasping task -- grasping objects a robot has never seen before -- was about 65%. When we used this kind of arm farm (that's what that thing is called; I wanted to call it the armpit, but I was overruled) to collect a lot of experience, we were able to get a pretty significant boost in grasp success rate, up to 78%. And then with further work on algorithms and more refinement of the approach, we're now able to get a 96% grasp success rate. So that's pretty good progress in three years: we've gone from failing to pick something up a third of the time, which makes it very hard to string together a whole sequence of actions and have robots actually do things in the real world, to grasping working quite reliably. That's exciting.

We've also been doing a lot of work on how to get robots to learn to do things more easily. Rather than having them practice on their own, maybe we can demonstrate things to them. This is one of our AI residents at work; they do fantastic machine learning research, but they also film demonstration videos for these robots. What you see here is a simulated robot trying to emulate, from the raw pixels of the video, what it's seeing. On the right, you see a few demonstrations of pouring, and the robot uses those video clips, five or ten seconds of someone pouring something, plus some reinforcement-learning-based trials, to attempt to learn to pour on its own. After 15 trials and about 15 minutes of training, it's able to pour that well -- I would say at the level of a four-year-old, not an eight-year-old -- but getting to that level of success in 15 minutes of effort is a pretty big deal.

OK. One of the other areas in the grand challenges was advanced health informatics. You saw in the keynote yesterday the work on lung cancer. We've also been doing a lot of work on an eye disease called diabetic retinopathy, which is the fastest-growing cause of blindness in the world. There are 115 million people in the world with diabetes, and each of them ideally would be screened every year to see if they have diabetic retinopathy, a degenerative eye disease that is very treatable if you catch it in time, but that can cause full or partial vision loss if you don't. So it's really important that we be able to screen everyone who is at risk for this.
Regular screening really matters, and this is the image an ophthalmologist gets to look at. In India, for example, there's a shortage of more than 100,000 eye doctors needed to do the necessary amount of screening for this disease, and so 45% of patients suffer vision loss before they're diagnosed, which is tragic, because it's completely preventable if you catch it in time.

The way an ophthalmologist assesses this is by looking at these images and grading them on a five-point scale -- one, two, three, four, or five -- looking for things like the little hemorrhages you see on the right-hand side. And it's a little subjective. If you ask two ophthalmologists to grade the same image, they agree on the score 60% of the time. If you ask the same ophthalmologist to grade the same image a few hours later, they agree with themselves 65% of the time. This is why second opinions are useful in medicine: some of these assessments are actually quite subjective. And it's a big deal, because the difference between a two and a three is the difference between "go away and come back in a year" and "we'd better get you into the clinic next week."

Nonetheless, this is a computer vision problem. Instead of classifying a thousand general categories of dogs and leopards, you have five categories, the five levels of diabetic retinopathy, and you train the model on eye images and an assessment of what the score should be. If you do that, you can get each image labeled by several ophthalmologists, six or seven, to reduce the variance you already see between ophthalmologists assessing the same image: if five of them say it's a two and two of them say it's a three, it's probably more like a two than a three. And if you do that, you get a model that is on par with or slightly better than the average board-certified ophthalmologist at this task, which is great. This is work published by my colleagues at the end of 2016 in JAMA, which is a top medical journal.

We wanted to do even better, though. It turns out you can get the images labeled by retinal specialists, who have more training in retinal eye disease, and instead of getting independent assessments, you get three retinal specialists in a room for each image and ask them to come up with an adjudicated number they all agree on. If you train on the output of this consensus of three retinal specialists, you now have a model that is on par with retinal specialists, which is the gold standard of care in this area, rather than the not-as-good model trained on individual ophthalmologists' opinions. We've seen this borne out: when you have really high-quality training data, you can train a model on it and get the effect of retinal specialists into the model.
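To make the setup concrete, here is a minimal sketch of the kind of five-class grading model described above, using transfer learning from a generic image backbone. This is not the published model; the backbone choice, image size, and training details are placeholder assumptions.

```python
import tensorflow as tf

NUM_GRADES = 5  # diabetic retinopathy severity levels 1..5 (encoded here as 0..4)

# Placeholder backbone; the published work used its own architecture and data.
backbone = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3), pooling="avg")

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_GRADES, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# retina_images: (batch, 299, 299, 3) fundus photos;
# grades: integer labels, e.g. the majority or adjudicated grade per image.
# model.fit(retina_images, grades, epochs=10)
```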
The other neat thing is that you can make completely new discoveries. Someone new joined the ophthalmology research team, and as a warm-up exercise to understand how our tools worked, Lily Peng, who was on stage yesterday, said, oh, why don't you see if you can predict age and gender from the retinal image, just to see if you can get the machine learning pipeline going. Ophthalmologists can't predict gender from an eye image; they don't know how to do that. So Lily thought the AUC you'd see on this should be no better than flipping a coin: you'd see 0.5. The person went away, came back, and said, OK, I've got it done; my AUC is 0.7. And Lily said, hmm, that's weird, go check everything and come back. They came back and said, OK, I've made a few improvements; it's now 0.8. That got people excited, because all of a sudden we realized you can actually predict a whole bunch of interesting things from a retinal image. In particular, you can detect someone's self-reported sex, and you can predict a whole bunch of other things like their age, their systolic and diastolic blood pressure, and their hemoglobin level. It turns out that if you combine those things, you can get a prediction of someone's cardiovascular risk at about the same level of accuracy as a much more invasive test, where normally you have to draw blood, send it off to the lab, wait 24 hours, and get the lab results back. Now you can do that with a retinal image. So there's real hope that this could be a new thing: when you go to the doctor, you get a picture of your eye taken, and we have a longitudinal history of your eye and can learn new things from it. We're pretty excited about that.

A lot of the grand challenges were about understanding molecules and chemistry better. One is to engineer better medicines, but the work I'm going to show you might apply to some of the other ones as well. One of the things quantum chemists want to do is predict properties of molecules: will this thing bind to this other thing? Is it toxic? What are its quantum properties? The normal way they do this is with a really computationally expensive simulator. You plug in the molecule's configuration, you wait about an hour, and at the end you get the output the simulator produces. It's a slow process, so you can't consider as many different molecules as you might like. It turns out you can use the simulator as a teacher for a neural net, and then all of a sudden you have a neural net that can learn to do what the simulator does, but way faster -- about 300,000 times faster -- and you can't distinguish the accuracy of the output of the neural net from that of the simulator. That's a completely game-changing thing if you're a quantum chemist. All of a sudden your tool is sped up by 300,000 times, and that means you can do a very different kind of science. You can say, oh, while I'm at lunch I should probably screen 100 million molecules, and when I come back, I'll have 1,000 that might be interesting. That's a pretty interesting trend, and I think it's one that will play out in lots and lots of different scientific and engineering fields where you have a really expensive simulator: you can learn to approximate it with a much cheaper neural net or other machine-learning-based model and get a simulator that's much faster.
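Here is a minimal sketch of the simulator-as-teacher idea described above: generate training data from an expensive simulator, fit a fast neural network to approximate it, then use the network for cheap screening. The featurization, network shape, and the synthetic stand-in simulator are all assumptions for illustration, not the actual molecular property models.

```python
import numpy as np
import tensorflow as tf

def expensive_simulator(x):
    # Stand-in for an hours-long quantum chemistry calculation.
    # Here just a cheap synthetic function so the sketch runs end to end.
    return np.sin(x).sum(axis=-1, keepdims=True)

# 1) Use the simulator as a "teacher" to label a training set of inputs
#    (in reality, featurized molecule configurations).
inputs = np.random.uniform(-1.0, 1.0, size=(10_000, 32)).astype("float32")
targets = expensive_simulator(inputs).astype("float32")

# 2) Fit a fast neural-network surrogate to mimic the simulator's outputs.
surrogate = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1),
])
surrogate.compile(optimizer="adam", loss="mse")
surrogate.fit(inputs, targets, epochs=5, batch_size=128, verbose=0)

# 3) Screening new candidates is now a single fast forward pass.
candidates = np.random.uniform(-1.0, 1.0, size=(1_000_000, 32)).astype("float32")
predictions = surrogate.predict(candidates, batch_size=4096)
```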
OK. Engineer the tools of scientific discovery. I have a feeling this 14th one was a kind of vague catch-all that the panel of experts decided to include. But it's pretty clear that if machine learning is going to be a big part of scientific discovery and engineering, we want good tools for expressing machine learning algorithms. That's the motivation for why we created TensorFlow: we wanted tools we could use to express our own machine learning ideas and share them with the rest of the world, to have other researchers exchange machine learning ideas, and to put machine learning models into practice in products and other environments. We released it at the end of 2015 under an Apache 2.0 license. It has a graph-based computational model that you can optimize with a bunch of traditional compiler optimizations and then map onto a variety of different devices, so you can run the same computation on CPUs, GPUs, or the TPUs I'll tell you about in a minute. Eager mode, which is coming in TensorFlow 2.0, makes this graph implicit rather than explicit.

The community seems to have adopted TensorFlow reasonably well, and we've been excited by all the different things we've seen other people do, both contributing to the core TensorFlow system and making use of it to do interesting things. It's got some pretty good engagement stats: 50 million downloads for a fairly obscure programming package is a fair number, and seems like a good mark of traction. And we've seen people do things with it. I mentioned this one in the keynote yesterday, and I like it: a company building fitness trackers for cows, so you can tell which of your 100 dairy cows is behaving a little strangely today. There's a research team at Penn State and the International Institute of Tropical Agriculture in Tanzania building a machine learning model that can run on-device, on a phone in the middle of a cassava field without any network connection, to detect whether a cassava plant has disease and how it should be treated. I think this is a good example of how we want machine learning to run in lots and lots of environments. In lots of places in the world, sometimes you have connectivity and sometimes you don't, and in a lot of cases you want the model to run on-device. That's really going to be the future: machine learning models running on tiny microcontrollers, all kinds of things like that.

OK. I'm going to use the remaining time to take you on a tour through some research projects and then sketch how they might fit together in the future. I believe we want bigger machine learning models than we have today, but to make that practical, we want models that are sparsely activated. Think of a giant model, maybe with 1,000 different pieces, where you activate only 20 or 30 of those pieces for any given example rather than the entire set of 1,000. We know this is a property real organisms have in their nervous systems: most of their neural capacity is not active at any given point, which is partly how they're so power efficient.

Some work we did a couple of years ago along these lines is what we call a sparsely gated mixture-of-experts layer. The essential idea is that the pink rectangles here are normal neural net layers, but between a couple of them we insert another collection of tiny neural nets that we call experts, along with a gating network that learns to activate just a few of them -- it learns which of those experts is most effective for a particular kind of example. Each expert might have a lot of parameters, a pretty large matrix, and we have a lot of experts, so in total there are eight-billion-ish parameters, but we activate just a couple of the experts on any given example. When you learn to route things, you try to learn to use the expert that is most effective for this particular example, and sending an example to multiple experts gives you a signal to train the routing network, the gating network. It learns, for instance, that one expert is really good when you're talking about language about innovation and research-y things, like you see on the left-hand side; the center expert is really good at phrases about playing a leading role or a central role; and the one on the right is really good at quick, rapid, adverb-y kinds of words. So they really do develop very different kinds of expertise.

The nice thing is that if you compare this on a translation task with the baseline in the bottom row, you get a significant improvement in translation accuracy -- that's the BLEU score there, and a one-BLEU-point improvement is a pretty significant thing; we really like one-point BLEU improvements. And because the model has all this extra capacity, we can make the pink layers smaller than they were in the original model, so we shrink the amount of computation used per word by about a factor of two: 50% cheaper inference. The training time also goes way down, because we have all this extra capacity and it's easier to train a model with a lot of parameters, so we end up with about one tenth the training cost in terms of GPU days.
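Below is a self-contained toy sketch of a sparsely gated mixture-of-experts layer with top-k routing, in the spirit of the layer described above. The sizes, the top-k value, and the omission of details like load-balancing losses and efficient dispatching are simplifying assumptions; this is not the production implementation.

```python
import tensorflow as tf

class SparseMoE(tf.keras.layers.Layer):
    """Toy sparsely gated mixture-of-experts layer with top-k routing.

    Simplified sketch: real implementations add load-balancing losses,
    capacity limits, and efficient dispatching of examples across devices.
    """
    def __init__(self, num_experts=8, expert_dim=128, top_k=2, **kwargs):
        super().__init__(**kwargs)
        self.num_experts = num_experts
        self.top_k = top_k
        self.experts = [tf.keras.layers.Dense(expert_dim, activation="relu")
                        for _ in range(num_experts)]
        self.gate = tf.keras.layers.Dense(num_experts)

    def call(self, x):
        # The gating network scores every expert for every example...
        gate_logits = self.gate(x)                                 # (batch, E)
        top_vals, top_idx = tf.math.top_k(gate_logits, k=self.top_k)
        weights = tf.nn.softmax(top_vals, axis=-1)                 # (batch, k)
        # ...but only the top-k experts contribute to the output.
        # For clarity this toy version computes all experts and masks;
        # a real implementation evaluates only the selected experts.
        expert_outputs = tf.stack([e(x) for e in self.experts], axis=1)  # (batch, E, D)
        mask = tf.one_hot(top_idx, depth=self.num_experts)         # (batch, k, E)
        return tf.einsum("bk,bke,bed->bd", weights, mask, expert_outputs)

# Usage: drop the layer between two ordinary dense layers.
inputs = tf.keras.Input(shape=(64,))
h = tf.keras.layers.Dense(64, activation="relu")(inputs)
h = SparseMoE(num_experts=8, expert_dim=128, top_k=2)(h)
outputs = tf.keras.layers.Dense(10, activation="softmax")(h)
model = tf.keras.Model(inputs, outputs)
```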
OK. We've also been doing a lot of work on AutoML, the idea of automating some of the tasks that a machine learning researcher or engineer does. Currently, when you think about solving a machine learning problem, you have some data, you have some computation, and you have an ML expert sit down and run a bunch of experiments; they stir it all together with lots of GPU-days of effort, and hopefully you get a solution. So what if we could use more computation to replace some of the experimentation that someone with a lot of machine learning experience would actually do? One of the decisions a machine learning expert makes is what architecture, what neural network structure, makes sense for a problem: should I use a 13-layer model or a 9-layer model? Should it have three-by-three or five-by-five filters? Should it have skip connections or not? If you're willing to take this up a level and do some meta-learning, you can have a model that generates models and then try those models on the problem you actually care about.

The basic iteration of meta-learning here is: we have a model-generating model; it generates, say, 10 models; we train each of those models and see how well each works on the problem we care about; and we use the loss or accuracy of those models as a reinforcement learning signal for the model-generating model, so we steer away from models that didn't work very well and toward models that work better. Then we just repeat a lot, and as we repeat, we get more and more accurate models over time. And it works.
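Here is a highly simplified sketch of that search loop. The `train_and_evaluate` function is a placeholder for actually training a candidate model, and the "model-generating model" is reduced to independent categorical distributions updated with a REINFORCE-style rule, rather than the recurrent controller used in the real work; the search space is also invented for illustration.

```python
import numpy as np

# A tiny invented search space: a few discrete architectural choices.
SEARCH_SPACE = {
    "num_layers": [3, 6, 9, 13],
    "filter_size": [3, 5],
    "skip_connections": [False, True],
}

def train_and_evaluate(architecture):
    # Placeholder: in the real setting this trains a child model on the
    # target task and returns its validation accuracy (hours of work).
    rng = np.random.default_rng(hash(str(architecture)) % (2**32))
    return rng.uniform(0.5, 0.9)

# The "model-generating model": one categorical distribution per choice.
logits = {k: np.zeros(len(v)) for k, v in SEARCH_SPACE.items()}

def sample():
    arch, idx = {}, {}
    for k, options in SEARCH_SPACE.items():
        p = np.exp(logits[k]) / np.exp(logits[k]).sum()
        i = np.random.choice(len(options), p=p)
        arch[k], idx[k] = options[i], i
    return arch, idx

baseline, lr = 0.0, 0.5
for step in range(20):                        # each step proposes a batch of models
    proposals = [sample() for _ in range(10)]
    rewards = [train_and_evaluate(arch) for arch, _ in proposals]
    baseline = 0.9 * baseline + 0.1 * float(np.mean(rewards))
    for (arch, idx), r in zip(proposals, rewards):
        advantage = r - baseline
        for k, i in idx.items():              # push probability toward good choices
            p = np.exp(logits[k]) / np.exp(logits[k]).sum()
            grad = -p
            grad[i] += 1.0                    # REINFORCE gradient of log prob
            logits[k] += lr * advantage * grad

print("example architecture after search:", sample()[0])
```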
It produces models that are a little strange looking, a little more unstructured than a model a human might have designed. Here you see all these crazy skip connections, but they're analogous to some of the ideas that machine learning researchers themselves have come up with. For example, the ResNet architecture has a more structured style of skip connection. The basic idea is that you want information to be able to flow more directly from the input to the output without going through as many intermediate computational layers, and the system seems to have developed that intuition itself.

The nice thing is these models actually work pretty well. If you look at this graph, accuracy on the ImageNet problem is on the y-axis, and the computational cost of the models, represented by dots, is on the x-axis. Generally you see a trend where a more computationally expensive model gives you higher accuracy. Each of the black dots is something that took a significant amount of effort by a bunch of top computer vision or machine learning researchers, which they then published, advancing the state of the art at the time. If you apply AutoML to this problem, what you see is that you actually exceed the frontier of the hand-designed models the community has come up with. You do this both at the high end, where you care most about accuracy and less about computational cost, so you can get a model that's slightly more accurate with less computational cost, and at the low end, where you can get a model that's significantly more accurate for a very small amount of computational cost. That, I think, is a pretty interesting result. It says we should really let computers and machine learning researchers work together to develop the best models for these kinds of problems.

We've turned this into a product: Cloud AutoML, which you can try on your own problem. If you're a company that doesn't have a lot of machine learning researchers or machine learning engineers, you can take a bunch of images and the categories you want to distinguish -- maybe you have pictures from your assembly line and you want to predict which part each image shows -- and get a high-quality model for that. We've extended this beyond vision, so you can do video, language, and translation, and more recently we've introduced something that lets you make predictions from relational data, like predicting whether a customer will buy something given their past orders.

We've also obviously continued research in the AutoML field. We have work looking at the use of evolution rather than reinforcement learning for the search, learning the optimizer update rule, and learning the nonlinearity function rather than just assuming we should use ReLU or some other particular activation function. We have work on incorporating both inference latency and accuracy: say you want a really good model that has to run in seven milliseconds; we can find the most accurate model that will run within your time budget by using a more complicated reward function. We can learn how to augment data, so you can stretch the amount of labeled data you have in interesting ways, more effectively than with handwritten data augmentation. And we can explore lots of architectures in ways that make the whole search process a bit more efficient.
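One way to fold a latency target into the search reward is a simple multiplicative penalty, sketched below. This is an assumed formulation in the spirit of published latency-aware architecture search, not necessarily the exact reward function used internally; the target and exponent are placeholder values.

```python
def latency_aware_reward(accuracy, latency_ms, target_ms=7.0, beta=-0.07):
    """Combine accuracy and measured latency into a single search reward.

    Models faster than the target are barely penalized; slower models are
    penalized smoothly rather than rejected outright. The exponent beta
    controls how harsh the penalty is (an assumed value here).
    """
    return accuracy * (latency_ms / target_ms) ** beta

# Example: two candidate architectures found during the search.
print(latency_aware_reward(accuracy=0.80, latency_ms=6.5))   # fast and accurate
print(latency_aware_reward(accuracy=0.82, latency_ms=12.0))  # more accurate but too slow
```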
OK. But it's clear that if we're going to try these approaches, we're going to need more computational power. One of the truisms of machine learning over the last decade or so is that more computational power tends to get better results when you have enough data. And it's really nice that deep learning is such a broadly useful tool across so many different problem domains, because that means you can start to think about specializing hardware for deep learning and have it apply to many, many things.

There are two properties that deep learning algorithms tend to have. One is that they're very tolerant of reduced precision: if you do calculations to one decimal digit of precision, that's perfectly fine with most of these algorithms; you don't need six or seven digits of precision. The other is that all the algorithms I've shown you are made up of a handful of specific operations -- matrix multiplies, vector dot products, essentially dense linear algebra. So if you can build computers that are really good at reduced-precision dense linear algebra, you can accelerate lots of these machine learning algorithms quite a lot compared to more general-purpose computers with general-purpose CPUs that can run all kinds of things, or even GPUs, which tend to be somewhat good at this but have, for example, higher precision than you might want.

We started to think about building specialized hardware when I did a thought exercise in 2012. We were starting to see the initial success of deep neural nets for speech recognition and image recognition and starting to think about how we would deploy them in some of our products. And there was this scary moment when we realized that if speech started to work really well -- and at that time we couldn't run it on device, because the devices didn't have enough computational power -- and, say, 100 million users started talking to their phones for three minutes a day, which is not implausible if speech starts to work a lot better, then if we were running the speech models on CPUs we would need to double the number of computers in Google data centers. That is slightly terrifying, to launch one feature in one product. So we started to think about building specialized processors for the deep learning algorithms we wanted to run, and TPU v1, which has been in production use since 2015, was the outcome of that thought exercise. It's used in production on every search query you do, on every translation you do, in speech processing and image processing, and AlphaGo used a collection of these. These are the actual racks of machines that competed in the AlphaGo match; you can see the little Go board commemorating it on the side.

Then we started to tackle the bigger problem of not just inference, where you already have a trained model and you just want to apply it, but how to do training in an accelerated way. The second version of TPUs is for training and inference. That's one of the TPU devices, which has four chips on it. This is TPU v3, which also has four chips on it and has water cooling -- it's slightly scary to have water in your computers, but we do. And we designed these systems to be configured together into larger configurations we call pods. This is a TPU v2 pod, and this is a bigger TPU v3 pod with water cooling. You can actually see one of these racks in the machine learning dome.
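As a tiny illustration of the reduced-precision dense linear algebra point above, here is a sketch that compares a matrix multiply in float32 with one in bfloat16, the 16-bit format TPUs use internally. The shapes are arbitrary, and real training code would typically rely on a mixed-precision policy rather than manual casts.

```python
import tensorflow as tf

# Two random matrices in full 32-bit precision.
a = tf.random.normal([1024, 1024], dtype=tf.float32)
b = tf.random.normal([1024, 1024], dtype=tf.float32)

# Reduced precision: bfloat16 keeps float32's range but only about three
# decimal digits of mantissa precision, which is plenty for most deep
# learning math.
a16 = tf.cast(a, tf.bfloat16)
b16 = tf.cast(b, tf.bfloat16)

full = tf.matmul(a, b)
reduced = tf.cast(tf.matmul(a16, b16), tf.float32)

# The relative error is small compared to the speed and energy savings
# that specialized reduced-precision hardware can offer.
rel_err = tf.norm(full - reduced) / tf.norm(full)
print("relative error from bfloat16 matmul:", float(rel_err))
```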
These things really do provide a lot of computational power. The individual devices with four chips are up to 420 teraflops and have a fair amount of memory, and the pods themselves are up to 100 petaflops of compute. That is a pretty substantial amount of compute, and it lets you very quickly try machine learning research experiments and train very large production models on large data sets. These are also now available through our cloud products; as of yesterday, I think we announced that they're in beta.

One of the keys to performance here is that the network interconnect between the chips in a pod is a super-high-speed 2D mesh with wraparound links, which is why it's toroidal. That means you can essentially program the whole thing as if it were a single computer, and the software under the covers takes care of distributing the computation appropriately and can do very fast all-reduce and broadcast operations. So, for example, you can use a full TPU v2 pod to train an ImageNet model in 7.9 minutes; versus the same problem on eight GPUs, you get 27 times faster training at lower cost. The v3 pod is substantially larger still: you can train an ImageNet model from scratch in less than two minutes, at more than a million images per second of training, which is essentially the entire ImageNet data set every second. And you can train very large BERT language models, as I was discussing on stage in the keynote yesterday, in about 76 minutes on a fairly large corpus of data, which would normally take days. That really helps make our researchers and ML production systems more productive, because they can experiment more quickly. If you can run an experiment in two minutes, that's a very different kind of science and engineering than if the same experiment takes a day and a half; you just think about running more experiments and trying more things. And we have lots of models already available.
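To give a feel for the "program the pod as if it's a single computer" point above, here is a rough sketch of what distributed training looks like from user code with TensorFlow's distribution strategy API, roughly as of the TensorFlow 2.x era. The TPU name, model, and data pipeline are placeholders, and the exact API surface has shifted across releases.

```python
import tensorflow as tf

# Connect to a TPU (the address/name is environment specific; a placeholder here).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)

# The model is written once, as if for a single machine; the strategy
# replicates it across the TPU cores and handles gradient all-reduce.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# dataset: a tf.data.Dataset of (features, labels) batches.
# model.fit(dataset, epochs=5)
```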
OK. So let's take some of the ideas we've talked about and think about how they might fit together. I said we want really large models that are sparsely activated. I think one of the things we're doing wrong in machine learning is that we tend to train a model to do a single thing, and then when we have a different problem, we train a different model to do that other thing. Really, we should be thinking about how to train models that do many, many things and can leverage the expertise they have in doing many things to take on a new task and learn it more quickly and with less data. This is essentially multi-task learning, but multi-task learning in practice today usually means three or four or five tasks, not thousands or millions. I think we really want to be thinking bigger and bolder about, in the limit, one model for all of the things we care about. And obviously, we're going to try to train this large model using fancy ML hardware.

So how might this look? Imagine we've trained a model on a bunch of different tasks, and it has learned these different components, which are sometimes shared across tasks and sometimes independent and specialized for a particular task. Now a new task comes along. With AutoML-style reinforcement learning, we should be able to use an RL algorithm to find pathways through this model that get us to a pretty good state for that new task, because it hopefully has some commonalities with other things we've already learned. Then we might have some way to add capacity to the system, so that for a task where we really care about accuracy, we can add a bit of capacity, have that pathway become more specialized for that task, and therefore hopefully more accurate. I think that's an interesting direction: how can we think more about building a system like that, rather than the current kinds of models, where we tend to fully activate the entire model for every example and tend to use each model for just a single task?

OK. I want to close on how we should be thinking about using machine learning in all the different places we might consider using it. One of the things I'm really proud of as a company is that last year we published a set of principles by which we think about how we're going to use machine learning for different things. When we look at using machine learning in any of our products or settings, we think carefully about how we're fulfilling these seven principles. There's more on the principles website that you can go and find, but I think this is really, really important. And I'll point out that some of these are evolving research areas as well as principles we want to apply. For example, number two: avoid creating or reinforcing unfair bias. Bias in machine learning models is a very real problem that can come from a variety of sources. You could have biased training data, or you could be training on real-world data where the world itself is biased in ways we don't want. So there is research that we can apply and extend on how to reduce or eliminate bias in machine learning models, and this is an example of some of the work we've been doing on bias and fairness. What we try to do in our use of ML models is apply the best known practices in our actual production use, and also advance the state of the art in understanding bias and fairness and making it better.

So with that, in conclusion: deep neural nets and machine learning are really tackling some of the world's great challenges, I think. We're making real progress in a number of areas, and there are a lot of interesting problems still to work on. They're going to affect not just computer science but many, many aspects of human endeavor, like medicine, science, and other fields. So I think it's a great responsibility we have, to make sure we do these things right, and to continue to push the state of the art and apply it to great things. Thank you very much. [MUSIC PLAYING]
Info
Channel: TensorFlow
Views: 81,900
Keywords: type: Conference Talk (Full production); pr_pr: Google I/O; purpose: Educate
Id: rP8CGyDbxBY
Length: 40min 58sec (2458 seconds)
Published: Wed May 08 2019