Opening keynote - Jeff Dean

Captions
I'm really excited to be here. I think it was almost four years ago to the day that about 20 of us were sitting in a small conference room in one of the Google buildings. We'd woken up early because we wanted to time this for an early East Coast launch, where we were turning on the tensorflow.org website and releasing the first version of TensorFlow as an open source project. I'm really excited to see what it's become; it's just remarkable to see the growth and all the different ways in which people have used this system for all kinds of interesting things around the world.

One thing that's interesting is that the growth in the use of TensorFlow mirrors the growth in interest in machine learning and machine learning research generally. This is a graph showing the number of machine learning arXiv papers posted over the last ten years or so, and you can see it's growing quite rapidly, much more quickly than you might expect. That lower red line is the nice doubling-every-couple-of-years exponential growth rate we got used to in computing power, due to Moore's law, for so many years. That's now slowed down, but you can see that the machine learning research community is generating research ideas faster than that, which is pretty remarkable: we've replaced computational growth with growth of ideas, and we'll see that both together will be important.

Really, the excitement about machine learning is because we can now do things we couldn't do before. As little as five or six years ago, computers really couldn't see that well. Starting in about 2012 or 2013, people began using deep neural networks to tackle computer vision problems: image classification, object detection, and things like that. So now, using deep learning and deep neural networks, you can feed in the raw pixels of an image and fairly reliably get a prediction of what kind of object is in that image. You feed in the pixels, their red, green, and blue values at a bunch of different coordinates, and you get out the prediction "leopard."

This works for speech too. You can feed in audio waveforms, and by training on lots of audio waveforms and transcripts of what's being said in them, we can take a completely new recording and produce a transcript of what is being said. You can even combine these ideas and have models that take in pixels and, instead of just predicting a classification of what object is in the image, actually write a short sentence, a short caption that a human might write about the image: "a cheetah lying on top of a car." That's one of my vacation photos, which was kind of cool.

Just to show the progress in computer vision: Stanford hosts the ImageNet contest every year to see how well computer vision systems can predict one of a thousand categories in a full-color image. You get about a million images to train on, then a bunch of test images your model has never seen before, on which it needs to make predictions. In 2011 the winning entrant got 26 percent error, so you can kind of make out what an image is, but it's pretty hard to tell. We know from a human experiment that the error of a well-trained human, someone who has practiced at this particular task and really understands the thousand categories, is about five percent, so this is not a trivial task. In 2016 the winning entrant got three percent error. Just look at that tremendous progress in the ability of computers to resolve and understand imagery, computer vision that actually works. This is remarkably important in the world, because we now have systems that can perceive the world around us and do all kinds of really interesting things.
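As a concrete, heavily simplified illustration of that pixels-in, label-out step, here is just the final classification stage in plain Python. The label set and logit values below are invented for this sketch; in a real system a trained network (for example a TensorFlow CNN) would compute the logits from the raw pixel values, over a thousand categories rather than four.

```python
import math

# Hypothetical class labels; a real ImageNet classifier has 1,000 categories.
LABELS = ["leopard", "cheetah", "car", "aircraft"]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits, labels=LABELS):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]

# Made-up logits standing in for a trained network's output on one image.
label, prob = predict([4.1, 1.2, 0.3, -0.5])
print(label, round(prob, 3))
```

The network's job is everything before the logits; the final "prediction: leopard" is just an argmax over these normalized scores.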
We've seen similar progress in speech recognition, language translation, and things like that. For the rest of the talk, I'd like to structure things around a nice list of 14 grand challenges that the US National Academy of Engineering put out: things they felt were important for the science and engineering communities to work on for the next hundred years. They published this list in 2008 after some deliberation, and I think you'll agree these are pretty good, large, challenging problems. If we actually make progress on them, we'll have a lot of progress in the world: we'll be healthier, we'll be able to learn things better, we'll develop better medicines, we'll have all kinds of interesting energy solutions. I'm going to talk about a few of these.

The first one is restoring and improving urban infrastructure. We're on the cusp of the widespread commercialization of a really interesting new technology that's going to change how we think about transportation: autonomous vehicles. This is a problem that has been worked on for quite a while, but it's now starting to look completely possible and commercially viable. A lot of the reason is that we now have computer vision and machine learning techniques that can take in the raw forms of data that the sensors on these cars collect. They have spinning lidars on top that give them 3D point cloud data, cameras pointing in lots of different directions, and radar in the front and rear bumpers, and they can take all this raw information in and, with a deep neural network, fuse it together to build a high-level understanding of what is going on around the car: there's another car to my side, there's a pedestrian up ahead to the left, there's a light post over there that I don't really need to worry about moving. That really helps the car understand the environment in which it's operating, and then decide what actions to take in the world that are legal, safe, obey all the traffic laws, and get it from A to B.

And this is not some distant, far-off dream. Alphabet's Waymo subsidiary has been running tests in Phoenix, Arizona. Normally when they run tests they have a safety driver in the front seat, ready to take over if the car does something unexpected, but for the last year or so they've been running tests around suburban Phoenix with real passengers in the back seat and no safety driver in the front seat. Suburban Phoenix is a slightly easier training ground than, say, downtown Manhattan or San Francisco, but it's still something that is not really far off; it's actually happening, and it's really possible because of things like machine learning and the use of TensorFlow in these systems.

Another one I'm really excited about is advanced health informatics. This is a really broad area, and I think there are lots and lots of ways that machine learning on healthcare data can be used to make better healthcare decisions for people, so I'll talk about one of them. Really, the potential here is that we can use machine learning to bring the wisdom of experts, through a machine learning model, anywhere in the world, and that's a huge opportunity. Let's look at this through one problem we've been working on for a while: diabetic retinopathy. Diabetic retinopathy is the fastest-growing cause of preventable blindness in the world. If you have diabetes, or early symptoms that make it likely you might develop diabetes, you should really be screened for this every year, and there are 400 million people around the world who should be screened every year.
But the screening is really specialized; general doctors can't do it. You need an ophthalmologist's level of training to do this effectively, and the impact of this shortage is significant. In India, for example, there's a shortage of 127,000 eye doctors to do this sort of screening, and as a result 45 percent of patients diagnosed with this disease have actually suffered either full or partial vision loss before they're diagnosed and treated. This is completely tragic, because if you catch this disease in time it's completely treatable: there's a very simple, 99-percent-effective treatment. We just need to make sure the right people get treated at the right time.

So what can you do? It turns out diabetic retinopathy screening is also a computer vision problem, and the progress we've made on general computer vision problems, where you want to take a picture and tell if it's a leopard or an aircraft or a car, also works for diabetic retinopathy. You can take a retinal image, the raw data that comes off the screening camera, and feed it into a model that predicts a grade of one, two, three, four, or five. That's how these things are graded: one being no diabetic retinopathy, five being proliferative, and the other numbers in between.

It turns out you can collect a dataset of retinal images and have ophthalmologists label them. If you ask two ophthalmologists to label the same image, they agree with each other on the one-to-five grade sixty percent of the time. Perhaps slightly scarier, if you ask the same ophthalmologist to grade the same image a few hours apart, they agree with themselves sixty-five percent of the time. But you can fix this by getting each image labeled by a lot of ophthalmologists: if you get it labeled by seven ophthalmologists, and five of them say it's a two and two of them say it's a three, it's probably more like a two than a three. Eventually you have a nice high-quality dataset you can train on; like many machine learning problems, high-quality data is the right raw ingredient.

Then you can apply a basically off-the-shelf computer vision model, trained on this dataset, and you get a model that is on par with, or perhaps slightly better than, the average board-certified ophthalmologist, which is pretty amazing. It turns out you can actually do better than that. If you get the data labeled by retinal specialists, people who have more training in retinal disease, and change the labeling protocol so that three retinal specialists look at each image, discuss it amongst themselves, and come up with what's called a single adjudicated assessment, one number, then you can train a model that is on par with retinal specialists, which is the gold standard of care in this area. And that's something you can now take and distribute widely around the world.

One issue, particularly with healthcare kinds of problems, is that you want explainable models: you want to be able to explain to a clinician why we think this person has moderate diabetic retinopathy. So you can take a retinal image like this, and one thing that really helps is if the model's assessment can show why this is a 2 and not a 3. By highlighting parts of the input data, you can make this more understandable for clinicians and enable them to really get behind the assessment the model is making. We've seen this in other areas as well; there's been a lot of work on explainability, so I think the notion that deep neural networks are complete black boxes is a bit overdone. There are actually a bunch of good techniques being developed, with more all the time, that will improve this.

A bunch of advances depend on being able to understand text, and we've had a lot of really good improvements in language understanding in the last few years.
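The multi-grader labeling scheme described above (several ophthalmologists grade each image, and their grades are combined into one high-quality label) can be sketched as follows. The `adjudicate` helper and its fall-back-to-median rule are my own simplification for illustration, not the exact protocol used in the actual study.

```python
from collections import Counter
from statistics import median

def adjudicate(grades):
    """Combine several graders' 1-5 scores into a single label.

    Hypothetical rule for this sketch: take the majority grade if one
    exists; otherwise fall back to the median grade."""
    grade, votes = Counter(grades).most_common(1)[0]
    if votes > len(grades) / 2:
        return grade
    return median(grades)

# Five of seven graders say 2, two say 3 -> the image is treated as a 2.
print(adjudicate([2, 2, 3, 2, 2, 3, 2]))
```

The point is simply that aggregating many noisy expert opinions per image yields labels far more reliable than any single grader's sixty-percent self-consistency.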
This is a bit of a story of research and how research builds on other research. In 2017, a collection of Google researchers and interns came up with a new kind of model for text called the Transformer. Unlike recurrent models, where you have a sequential process in which you absorb one word or token at a time, update some internal state, and go on to the next token, the Transformer lets you process a whole bunch of text at once, in parallel, making it much more computationally efficient, and then use attention over the previous text: if I'm trying to predict the next word, what other parts of the context to the left are relevant to predicting it? That paper was quite successful and showed really good results on language translation tasks with a lot less compute. The BLEU scores in the first two columns, for English-to-German and English-to-French (higher is better), together with the compute cost of these models, show that it was getting state-of-the-art results at the time with 10 to 100x less compute than other approaches.

Then in 2018 another team of Google researchers built on the idea of Transformers (everything you see in a blue oval is a Transformer module) and came up with an approach called Bidirectional Encoder Representations from Transformers, or BERT, which is a little shorter and more catchy. BERT has the really nice property that, in addition to using context to the left, it uses context all around, the surrounding text, to make predictions about text. And the way it works is that you start with a self-supervised objective. The really nice thing about this is that there's lots and lots of text in the world, so if you can figure out a way to use that text to train a model to understand text better, that would be great.

So we take text and, to make the BERT training objective self-supervised, we drop about 15 percent of the words. This is actually pretty hard, but the model then tries to fill in the blanks, essentially to predict the missing words that were dropped, and because we actually have the original words, we know whether the model's guesses about what goes in each blank are correct. By processing trillions of words of text like this, you get a very good understanding of contextual cues in language and how to fill in the blanks in a really intelligent way. That's essentially the training objective for BERT: take text, drop 15 percent of it, and predict the missing words.

One key recipe that works really well: step one, you pre-train a model on lots and lots of text using this fill-in-the-blanks self-supervised objective. Step two, you take a language task you really care about, maybe predicting whether some hotel review is a five-star or a one-star review, where you don't have very much labeled text for the actual task; you might have 10,000 reviews and know the star count of each. You fine-tune the model, starting from the model trained in step one on trillions of words of text, using your paltry 10,000 examples for the task you really care about, and that works extremely well. In particular, BERT gave state-of-the-art results across a broad range of text understanding benchmarks in the GLUE benchmark suite, which was pretty cool, and people have been using it this way to improve all kinds of things across the language understanding and NLP space.

One of the grand challenges was to engineer the tools of scientific discovery, and I think it's pretty clear machine learning is going to be an important component of making advances in a lot of the other grand challenge areas.
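The fill-in-the-blanks corruption step described above can be sketched in a few lines of plain Python. This shows only the data-preparation half of the objective (the model that predicts the masked words is omitted), and details such as rounding of the 15 percent rate and special-token handling are simplified relative to the real BERT recipe.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, rate=0.15, seed=0):
    """Drop ~`rate` of the tokens, returning the corrupted sequence and a
    map of position -> original token that the model must learn to predict."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * rate))
    positions = rng.sample(range(len(tokens)), n_mask)
    corrupted = list(tokens)
    targets = {}
    for i in positions:
        targets[i] = corrupted[i]       # remember the answer
        corrupted[i] = MASK             # hide it from the model
    return corrupted, targets

sentence = "the model learns to fill in missing words from context".split()
corrupted, targets = mask_tokens(sentence)
print(corrupted)
print(targets)
```

Because the held-out `targets` come from the text itself, no human labeling is needed, which is what makes this objective scale to trillions of words.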
It's been really satisfying to see that what we'd hoped would happen when we released TensorFlow as an open source project has actually come to pass: lots of people have picked up TensorFlow, used it for all kinds of things, improved the core system, and applied it to tasks we would never have imagined. People have done all kinds of things, some inside Google, some in academic institutions, some scientists working on conserving whales or understanding ancient scripts, many kinds of things; the breadth of uses is really amazing. These are the 20 winners of the Google.org AI Impact Challenge, where people could submit proposals for how they might use machine learning and AI to tackle a local challenge they saw in their communities. They cover all kinds of things, ranging from trying to predict better ambulance dispatching to identifying illegal logging using audio processing, which is pretty neat, and many of them are using TensorFlow.

One of the things we're pretty excited about is AutoML, the idea of automating some of the process by which machine learning experts make decisions to solve machine learning problems. Currently, a machine learning expert sits down, takes data and computation, runs a bunch of experiments, stirs it all together, and eventually you get a solution to a problem you actually care about. We'd like to see if we could eliminate much of the need for the human machine learning expert to run these experiments, and instead automate the experimental process by which an expert arrives at a high-quality solution for a problem you care about. Lots and lots of organizations around the world have machine learning problems, but many of them don't even realize they have one, let alone have people in their organization who can tackle it.

One of the earliest pieces of work our researchers did in this space was something called neural architecture search. When you sit down and design a neural network for a particular task, you make a lot of decisions about the shapes of this and that: should you use 3x3 filters in layer 17, or 5x5? All kinds of things like this. It turns out you can automate this process with a model-generating model, trained on feedback about how well the models it generates work on the problem you care about. The way this works: we generate a bunch of models, which are just descriptions of different neural network architectures; we train each of them for a few hours; we see how well they work; and then we use the accuracy of those models as a reinforcement learning signal for the model-generating model, to steer it away from models that didn't work very well and toward models that worked better. We repeat this many, many times, and over time we get better and better by steering the search toward the parts of the space of models that work well.

It comes up with models that look a little strange, admittedly; a human probably would not sit down and wire up a computer vision model exactly that way, but they're pretty effective. This graph shows the best human machine learning experts, computer vision experts and machine learning researchers around the world, producing a whole bunch of different kinds of models over the last four or five years: ResNet-50, DenseNet-201, Inception-ResNet, all kinds of things. The black dotted line is the frontier of human-expert model quality, with accuracy on the y-axis and computational cost on the x-axis; as you move right along the x-axis you tend to get more accuracy, because you're applying more computational cost. The blue dotted line is AutoML-based systems, where we've done this automated experimentation instead of pre-designing any particular architecture, and you can see it's better both at the high end, where you care about the most accurate model you can get regardless of computational cost, and at the low end, where you care about a really lightweight model that might run on a phone.

In 2019 we've actually been able to improve on that significantly with a set of models called EfficientNet, which gives you a kind of slider for trading off computational cost and accuracy, but which sits well above the human-guided experimentation on that black dotted line. And this is true not just for image classification. It's true for object detection, where the red line is AutoML and the others are not. It's true for language translation, where the black line is various kinds of Transformers and the red line is what we got by giving the basic components of Transformers to an AutoML system and letting it fiddle with them and come up with something better. It's true for the computer vision models used in autonomous vehicles: in a collaboration between Waymo and Google Research, we were able to come up with models with significantly lower latency at the same quality, or, trading it off the other way, significantly lower error rate at the same latency. It even works for tabular data: if you have lots of customer records and you want to predict which customers are going to spend a thousand dollars with your business next month, you can use AutoML to come up with a high-quality model for that kind of problem.
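The generate-train-score-repeat loop described above can be sketched as follows. To keep the sketch self-contained, I've replaced the reinforcement-learned controller with plain random sampling, and replaced "train each candidate for a few hours and measure accuracy" with an invented toy scoring function, so this shows only the shape of the search loop, not the actual method or search space.

```python
import random

# A tiny, invented search space of architecture hyperparameters.
SEARCH_SPACE = {"filters": [16, 32, 64], "kernel": [3, 5], "layers": [2, 4, 8]}

def sample_architecture(rng):
    """The 'model-generating model', reduced here to random sampling."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}

def evaluate(arch):
    """Stand-in for 'train the candidate and measure its accuracy'.
    This scoring rule is invented purely so the loop runs end to end."""
    return (0.5 + 0.004 * arch["layers"] + 0.001 * arch["filters"]
            - 0.01 * (arch["kernel"] == 5))

def search(trials=20, seed=0):
    """Propose candidates, score them, and keep the best one seen so far."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(trials):
        arch = sample_architecture(rng)
        score = evaluate(arch)        # the feedback signal steering the search
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

best, score = search()
print(best, round(score, 3))
```

In the real systems, the sampler is itself a learned model updated with the accuracy signal via reinforcement learning, which is what steers the search toward promising regions instead of sampling blindly.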
Okay, so what properties do we want in a machine learning model? Today we tend to train a separate model for each problem we care about, and I think this is a bit misguided. Really, we want one model that does a lot of things, so that it can build on the knowledge of how it does thousands or millions of different things, and when the million-and-first thing comes along, it can use its expertise from everything else it knows how to do to get into a good state for the new problem with relatively little data and relatively little computational cost.

I have a kind of cartoon diagram of something I think might make sense. Imagine a model that is very sparsely activated: different pieces of the model have different kinds of expertise and are called upon when it makes sense, but are mostly idle, so it's relatively computationally and power efficient, yet it can do many things. Each component here is some piece of a machine learning model, with its own parameters and operations. Now a new task comes along, and you can imagine something like neural architecture search becoming, if you squint at it just right, a neural pathway search: we look for components that are really good for this new task, and maybe we find that a particular path through the model gets us into a pretty good state, because it goes through components already trained on related tasks. Maybe we then want the model to be more accurate on the purple task, so we add a bit more computational capacity, add a new component, start to use that component for this new task, and continue training it; that new component can also be used for solving other related tasks. And each component itself might be running some sort of interesting architecture search inside it. I think something like that is the direction we should be exploring as a community. It's not what we're doing today, but I think it could be a pretty interesting direction.

Okay, and finally I'd like to touch on thoughtful use of AI in society. As we see more and more uses of machine learning in our products and around the world, it's really important to think carefully about how we want to apply these technologies. Like any technology, these systems can be used for amazing things, or for things we might find detrimental in various ways. So we've come up with a set of principles by which we think about applying machine learning and AI to our products, and we made them public about a year and a half ago as a way of sharing our thought process with the rest of the world. I particularly like that many of these principles cover areas of research that are not yet fully understood: we aim to apply the best state-of-the-art methods, for example for reducing bias in machine learning models, but also to continue doing research and advancing the state of the art in these areas. And this is just a taste of the different kinds of work we're doing in this area: how do we do machine learning with more privacy, using things like federated learning? How do we make models more interpretable, so that a clinician can understand the predictions being made, as in the diabetic retinopathy example? How do we make machine learning more fair?

Okay, and with that, I hope I've convinced you (maybe you were already convinced) that deep neural nets and machine learning are helping make significant advances on a lot of hard computer science problems: computer vision, speech recognition, language understanding. The general use of machine learning is going to push the world forward. So thank you very much, and I appreciate you all being here.
Info
Channel: O'Reilly
Views: 5,842
Rating: 4.9523811 out of 5
Keywords: O'Reilly Media (Publisher), O'Reilly, OReilly, OReilly Media, Jeff Dean, Google, TensorFlow, TFWorld, keynote
Id: ZHoNF28Nj98
Length: 25min 45sec (1545 seconds)
Published: Fri Nov 01 2019