Yann LeCun - Power & Limits of Deep Learning

Video Statistics and Information

Captions
So now I am absolutely delighted to introduce our first speaker. I would like to invite you to help me welcome Yann LeCun, director of AI research at Facebook, professor at NYU, and founding father of convolutional neural nets. Yann has been a trailblazer for neural networks; his work has inspired thousands around the world to take on machine learning, and it has spurred breakthrough applications in diverse areas ranging from image recognition to speech recognition to medicine and self-driving vehicles. Yann has been an amazing advocate for AI throughout the decades, through the ups and downs of the field, which is the hallmark of a true scholar. Yann has made many contributions, but I would like to highlight a project he did in 2003 as part of a DARPA seedling project to create a first proof of concept for an end-to-end machine learning system for autonomous driving. This project was called DAVE, and it really served as an impetus for today's NVIDIA approach to self-driving vehicles and for some of our own projects here at MIT. I have had the privilege of knowing Yann and his work for many years, and I am so impressed by his ability to be such a brilliant visionary while retaining his down-to-earth sense of humor and warmth. So I was really not surprised when I recently learned that his name, LeCun, derives from the old Breton form "Le Cunff", which apparently means "the nice guy". We are so honored to have him today; please join me in welcoming Yann LeCun. [Applause]

Thank you, Daniela, it's a real pleasure to be here. I have to make a terrible confession: this is a conference about the future of work, and I'm certainly not an economist. But I hang out with a few economists, Erik among others, and I was at a conference in Toronto recently with a lot of very eminent economists. We talked about AI, and they described it as a general-purpose technology: a technology that is going to diffuse across all of society and the economy and transform the way business in most sectors is actually done. I had never thought of it this way, and I was really interested by the history of how this kind of technology percolated through society in previous centuries. Since I'm not an economist, I'm going to talk about the science and the technology of AI, give a little bit of the state of the art, and then perhaps go on to talk about the limitations and how we could possibly go beyond them.

You are all aware of the history of AI: there have been a number of waves of different approaches, many of which had a lot of hope behind them that was not necessarily fulfilled in the end, and when the hopes are not fulfilled, the wave of interest in that particular set of techniques dies down. There were several waves of machine learning and neural nets, several waves of more symbolic approaches, and other types of approaches, and right now we are pretty close to the peak of a new wave, started by the emergence of deep learning. Every time such a wave occurs, the set of techniques that was developed goes underground after a while and becomes part of the toolbox, and we don't talk about it in terms of AI anymore. If you go back to the history of the first wave of neural nets, it looks like all work in that domain stopped in the late 60s, partly because of a book written by MIT professors Marvin Minsky and Seymour Papert. But in fact the work just changed name: people kept doing what they were doing, except they called it adaptive filtering instead of machine learning or artificial intelligence or artificial brains. You see this in the various waves of AI, and so I'm kind of wondering what's going to happen to this wave: what is it going to be called five or ten years from now, and what's going to be the next wave?
What we can do with AI now that we couldn't do before, or not to the same extent, is perception, and that has a lot of applications everywhere in the economy: things like medical image analysis and self-driving cars, which are probably the two most visible applications that will pop up over the next few years that the public will be aware of. There are a lot of other applications: accessibility for the visually impaired, for example; translation, so connecting people, something that Facebook is really interested in; virtual assistants, although those are going to take a while; and of course search, content filtering, information retrieval, games, security, and so on. Science is actually being affected by deep learning as well.

All of those applications currently use supervised learning. Supervised learning, and I'm sure there are economists here in the room, is like regression: you have inputs and outputs, you have x's and you have y's, and you have to learn a function that maps the x's to the y's. You have a limited number of samples, and you hope that with that limited number of samples the machine will be able to learn a suitable mapping. You want, for example, to map speech to words, images to categories, portraits to names, photos to captions, text to topics, things of that type. So basically the process is this: the machine is a function with lots of parameters that can be tuned, symbolized by those knobs here. You show it an image, or whatever the input is, you wait for the machine to produce an answer, and if the answer is wrong, you tune all the parameters so that the output gets closer to the output you want. That's supervised learning, and it's quite successful: I don't want to say a percentage, but something pretty close to 100 percent of all the applications of machine learning you see today basically result from supervised learning.

So the next question you might ask is: how do you build this box? Of course it's not a physical box anymore, although it was in the fifties; it's a piece of software. Particular ways of building those boxes are things like convolutional nets or recurrent nets, and there is a whole menagerie now of particular types of architectures derived from recurrent nets, convolutional nets, and other mechanisms, assembled into blocks. That's really the substance of deep learning: you take functional blocks, each of which has tunable parameters, you assemble them into some sort of compute graph, and if all of those modules are differentiable, you can use backpropagation and gradient descent to train them.

There were early applications of neural nets, some relatively small-scale. One sizeable application I was involved in was character recognition, and those systems were really widely deployed in the mid-90s for reading checks and various other documents. But that didn't stop the field from losing interest in those techniques in the mid-90s, and I think the sociologists of science will have to figure that one out, because I was too much in the thick of it to really understand the dynamics. It is interesting to understand why that happened, so that we can perhaps figure out how to do it differently next time. What happened over the last five years, roughly, is that these ideas going back to the 90s were brought to the fore because computers became more powerful and datasets became bigger, and those methods thrive on big datasets. You need a large number of samples to train a neural net, a deep learning system, for a new task. Once the system is trained, if you want to add a category to what the system recognizes, you don't need that many samples, but the first pass of getting the system to learn anything requires a lot of samples.
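The "tuning the knobs" loop just described can be sketched in a few lines. This is a toy illustration with invented numbers, not anything from the talk: a machine with a single tunable knob fits the mapping y = 3x by gradient descent, the same procedure used to train the assembled differentiable blocks.

```python
# Toy illustration of supervised learning as described above: a machine
# with a tunable "knob" (here, a single weight w) is shown inputs x and
# desired outputs y, and the knob is nudged so the output gets closer
# to the target. All values here are invented for illustration.

# Training pairs generated by the "true" function y = 3x, unknown to the learner.
data = [(x, 3.0 * x) for x in [0.5, 1.0, 2.0, 3.0]]

w = 0.0          # the single tunable knob
lr = 0.05        # learning rate: how hard we turn the knob each time

for epoch in range(200):
    for x, y in data:
        pred = w * x                 # forward pass through the "box"
        error = pred - y             # how wrong the answer is
        grad = 2 * error * x         # gradient of the squared error wrt w
        w -= lr * grad               # gradient descent: tune the knob

print(round(w, 3))  # prints 3.0: the knob has found the true slope
```

A real deep learning system does exactly this, just with millions of knobs arranged in a compute graph, with backpropagation supplying the gradient for every one of them.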
And it's not until recently that our digital world was producing enough data for this to be possible, except for things like speech recognition and handwriting recognition. What happened about seven years ago is that datasets for image recognition became available that were big enough to train those systems, and GPUs appeared, which allowed my colleague at the University of Toronto, Geoffrey Hinton, and his students to implement very efficient convolutional nets on GPUs and then beat records for image recognition. That helped change the mind of the computer vision community, and later other communities, towards those methods; they had been kind of skeptical before that. Since then we've been seeing an inflation of the size and number of layers of those networks, and a kind of menagerie of different architectures; that's what makes it interesting.

Daniela mentioned one of the robotics projects I was involved in. I should say that the idea of learning a driving task end to end actually goes back to the 80s; it's not me, it's Dean Pomerleau at CMU, who trained a neural net, not a convolutional net, to drive a truck. It was somewhat successful; it was called ALVINN. What we did more recently was to use convolutional nets for that, and then DARPA decided to fund a project for off-road robot driving that allowed us to develop techniques, based on convolutional nets, to label every pixel in the image as to whether it's traversable, and it showed that you can drive robots this way; that worked pretty well. These are two of the students who worked on this project: Raia Hadsell, who is here; she leads the robotics group at DeepMind. And Pierre Sermanet, leaning on this poor robot, who is working on robotics at Google Brain. They're pretty confident that the robot is not going to run them over, because they actually wrote the code and they trained it.

So this was kind of the state of the art for convolutional nets for driving and image recognition around 2008, and what's happened since has been nothing short of astonishing, even for me. This is very recent work; in fact it was presented at the ICCV conference last week by a number of researchers from Facebook AI Research in Menlo Park; the lead author is Kaiming He. It's a particular type of convolutional net called Mask R-CNN. They combined several techniques, and what this system can do is what's called instance segmentation: it can take an image and then draw the mask, or the outline, and the bounding box of every individual object, and then label each one with a category. It knows the individual objects, so it doesn't label the whole blob here as "person"; it knows each individual person. The performance of this is really impressive; in fact this paper won the best paper award at ICCV for that reason. It can pick out small objects in images, objects that overlap each other, it can pick out backpacks, count sheep (there's always a sheep picture somehow in computer vision), and various other things like this, with partial views and occlusions. If you had asked a computer vision researcher five or ten years ago how long it would take to solve this problem to this extent, they probably would have refused to make any prognosis; it seemed completely unattainable a relatively small number of years ago. The same system can be trained for multiple tasks, things like estimating the pose of human bodies, and that's very useful for a lot of applications in augmented reality and virtual reality, but also for knowing what people do in images.

So this is perception; perception really works, and there is a huge number of applications this can be used for, medical imaging and self-driving cars being the first ones. But what's missing here? One thing
that's missing from perception is reasoning. Intelligence is not just perception: there is also reasoning, there is memory, there is planning, all those things; there is common sense. Daniela mentioned two things in her talk, common sense and fake news; I'm going to talk about the first one but not the second one. Here is an example of some work that addresses the problem of reasoning; this is visual reasoning. This is also a paper presented at ICCV; the lead author is Justin Johnson, who was an intern at Facebook a couple of summers ago. The problem here is to answer questions of the type "is there any matte cube that has the same size as the red metal object?", things like that, or perhaps a question of the type "are there more cubes than yellow things?". If you want the system to answer such a question, it has to kind of configure itself to compute the answer. What we figured we could do was feed the sentence, coded as a sequence of vectors, to a recurrent neural net, an LSTM, which then produces a vector representing the question, and then feed that to another LSTM that basically regurgitates it in the form of a program, a visual program if you want. This program produces a graph of operators that are meant to answer the question, and this graph is generated dynamically as a function of the question. This is not such a new thing in neural nets, but it is gaining popularity: the idea that the architecture of the neural net is not fixed but data-dependent. It's called dynamic graphs, and one of the things we've been working on at Facebook is deep learning frameworks that can deal with dynamic graphs; that's very important for natural language understanding, for a number of applications, and for things like reasoning. So what this system does is produce this graph of operators.
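As a toy sketch, one of the questions above, "are there more cubes than yellow things?", compiled into a graph of operators might look like this. The scene encoding and operator names are invented for illustration; the real system generates the graph with an LSTM and runs learned, differentiable modules, not hand-written filters.

```python
# Toy, non-differentiable sketch of a question compiled into a graph of
# operators: filter the cubes, filter the yellow things, count each set,
# compare the counts. The scene representation is invented.

scene = [
    {"shape": "cube",     "color": "yellow"},
    {"shape": "cube",     "color": "red"},
    {"shape": "sphere",   "color": "yellow"},
    {"shape": "cylinder", "color": "blue"},
]

def filter_color(objs, color):
    return [o for o in objs if o["color"] == color]

def filter_shape(objs, shape):
    return [o for o in objs if o["shape"] == shape]

def count(objs):
    return len(objs)

def greater_than(a, b):
    return a > b

# "Are there more cubes than yellow things?" as a graph of operators:
answer = greater_than(count(filter_shape(scene, "cube")),
                      count(filter_color(scene, "yellow")))
print(answer)  # 2 cubes vs 2 yellow things -> False
```

The point of the paper is that this pipeline is not hand-coded: the sequence of operators is generated from the question itself, so the architecture changes with every input.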
The first operator filters the yellow objects, if you want, producing a mask with one blob per yellow object; another operator filters the cubes; another counts each of those sets of objects; and another compares the two counts, and then you get the answer. The beauty of this is that you can basically train all of it with backprop. There is a discretization step here somewhere, so you have to use zeroth-order optimization there, but there are newer versions, by Aaron Courville's group at the University of Montreal for example, that have built on this work since it appeared on arXiv and that are completely differentiable, which is very interesting. So this idea that you can allow a deep learning system to reason, not just perceive, is, I think, a very interesting research avenue.

Now, there are a number of issues with supervised machine learning, and some of them were mentioned by Daniela before. Machine learning reflects the biases in the data, and that might have very dire societal consequences if the problem is not dealt with in appropriate ways: if the data is generated by people, and the people are biased, then the machine will just reflect those biases, or perhaps even amplify them. There is a body of work on trying to reduce the biases that exist in data, to make machines that are less biased than people. I'm personally really optimistic on that account, in the sense that I think it will be easier to remove bias from machines than it ever was to remove it from people, and we have methods that, to some extent, can do this. There are other questions, such as how you test for reliability when the system is trained rather than programmed. We can't use formal methods as in traditional computer science, where, when you write an autopilot for an airplane, you can write the program in three different ways and then run verification systems on it. Here you train
the machine, and so you know it's going to make some mistakes, and the question is how many. There is a question that's discussed a lot, which is whether decisions made by AI systems should be explainable. My position on this is that explainability is not as useful as most people think: the vast majority of decisions made by machine learning systems don't require explanations, and in fact no one would really look at those explanations. But there are certain decisions, when they concern people or when they are inputs to human decisions, where explanations are very useful, in the legal domain for example. There has also been a bit of a myth that neural nets are black boxes whose workings we can't really understand. That's not true: it's a program, we can look at all the variables, we can do sensitivity analysis, there are all kinds of techniques for this. The reason they are not used much is that they're not that useful, at least in the use cases we have today.

So let me switch gears and talk about the limitations of those methods and of supervised learning, and then about how we could go beyond them. There are really three modalities, or paradigms if you want, in the machine learning context: reinforcement learning, supervised learning, and unsupervised learning. Supervised learning we already saw: you tell the machine what the correct answer is, so every time it goes through a trial you give it a significant amount of information, a fairly large number of bits. Reinforcement learning is the process by which you don't tell the machine the correct answer; you only tell it whether it did good or bad. You basically give it just a single scalar once in a while, and the machine has to figure out by exploration what output would have maximized the score. So the feedback is much weaker, and the result is that the number of samples required for the machine to learn anything is much, much larger than in supervised learning. And then there is something called unsupervised learning, or, as I prefer to call it, predictive learning: the mode where the machine basically learns how the world works by observation, just looking at the world, kind of like babies and animals do.

Let me talk a little bit about reinforcement learning. It was brought to the attention of the public through things like machines that can play games: machines can basically train themselves to play a game by just trying to maximize the score, and for games that are simple to simulate at thousands of frames per second, like Doom or Go, the machine can play millions of games in a relatively short time, perhaps in parallel on multiple machines. It works because you're not limited by the number of trials: you can run games so fast, and on so many machines, that the machine can learn fairly quickly. But it doesn't quite work in the real world, because if you want to use reinforcement learning to train a car to drive itself, the car will have to drive off a cliff about fifty thousand times before it figures out that's a bad idea, and you can't do that. So what people try to do is use simulators and then transfer the skills learned in simulation to the real world, or find other ways. We seem to be able to learn to drive without crashing too many times, so how is it that we can do this? What kind of learning are we using? That's a question that has been bugging me for quite a long time, and it led me to a slightly obnoxious analogy, obnoxious at least if you are a true believer in pure reinforcement learning, based on measuring the amount of information you give to a machine.
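The weakness of scalar feedback can be illustrated with a tiny multi-armed bandit. This is a generic sketch, not from the talk: the machine is never shown the right action, only an occasional score, so it has to discover the best choice by trial and error, which is exactly why reinforcement learning needs so many samples.

```python
import random

# Toy illustration of the weak feedback in reinforcement learning: the
# machine only ever receives a single scalar reward per trial and must
# find the best of three actions by exploration. Payoffs are invented.

random.seed(0)
true_reward = [0.2, 0.8, 0.5]   # hidden payoff probability of each action
estimates = [0.0, 0.0, 0.0]     # the machine's running estimates
counts = [0, 0, 0]

for trial in range(5000):
    if random.random() < 0.1:                         # explore occasionally
        a = random.randrange(3)
    else:                                             # otherwise exploit
        a = max(range(3), key=lambda i: estimates[i])
    # The only feedback: one scalar, no "correct answer" is ever revealed.
    r = 1.0 if random.random() < true_reward[a] else 0.0
    counts[a] += 1
    estimates[a] += (r - estimates[a]) / counts[a]    # incremental average

print(max(range(3), key=lambda i: estimates[i]))  # index of the best arm found
```

Even for this trivial three-choice problem, thousands of trials are spent recovering information that a single supervised label ("the answer is arm 1") would have provided immediately.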
You ask the machine to predict things, and for the different modes of learning you count how much information that feedback carries. And if you think of intelligence as a cake (being a Frenchman, I kind of have to do that, although this one is actually a black forest cake, more of a German cake), then the bulk of the cake, in terms of the amount of information you ask the machine to predict, would be unsupervised, predictive learning; the icing on the cake would be supervised learning; and the cherry on the cake would be reinforcement learning. Now, this is a black forest cake, and the cherry is not optional: you need your cherry, you need something to drive the system to learn things that are useful for a task. So this is not saying that reinforcement learning is useless; it is saying that it's not sufficient, that supervised and reinforcement learning alone are not sufficient for learning complex things.

So how could machines learn like humans and animals, essentially by observing the world? This is really the question of whether machines can acquire common sense, and I think of common sense in very general terms. I think a cat has some level of common sense that machines don't have: cats understand the dynamics of their bodies and their environment much, much better than any learning model we have. In my mind, and this is a very personal opinion that not everybody agrees with, common sense is the ability to fill in the blanks. You have partial information about the state of the world through perception, and common sense allows you to fill in the blanks because you have background knowledge. If you only see my left profile, you can tell what my right profile roughly looks like, because most human faces are more or less symmetric. You have a blind spot in your retina, but your brain fills it in and you don't even notice it. You naturally predict the future; you have some idea of the consequences of your actions and of what is going to happen in any situation you encounter. So the ability to predict, to some extent, is the essence of intelligence: the ability to form models of the world, hopefully causal models. And what we need in order to get machines to learn how the world works and build models of the world is this idea of predictive, unsupervised learning.

Babies do this: babies learn things like object permanence, gravity, intuitive physics. Baby primates do it too. This is a baby orangutan being shown a magic trick, at the bottom here, where the object inside the cup is removed without its knowledge. It looks at the empty cup and rolls on the floor laughing, because the trick breaks its model of the world. People do experiments like this with human babies to figure out at what age they learn those basic concepts about the world: that the world is three-dimensional, how we learn to track faces, how we learn that an object that is not supported is going to fall, things like that. A colleague in Paris, Emmanuel Dupoux, a cognitive scientist at École Normale Supérieure, established this chart, which indicates at what age babies learn very basic concepts. Things like gravity, inertia, and conservation of momentum, basic intuitive physics, babies learn around the age of eight months or so; it's not something we're born with, it takes a while to figure out. Before eight months, if you show something like what's shown at the upper left, where a little cart is basically suspended in the air, babies say, well, that's the way the world works, fine. After eight months they react like the little girl at the bottom left: they open their mouths like the monkey, they open their eyes wide, they fixate, and they say "what's going on?". Well, they don't say "what's going on?", they think it. So we could ask the
question: how do we get machines to do this? First we have to think about the architecture of a complete AI system. It produces actions that act on the world; the world responds through percepts, which give a partial view of the world; and inside the machine there is an agent that has some sort of objective, and the objective essentially measures how happy or unhappy the agent is. We have something like this at the base of our brain, called the basal ganglia; it basically tells us whether we're happy or not, whether we're in pain, whether we're hungry; it computes all those things. What you would like to put inside the agent is something that contains a model of the world, which allows the system to predict what's going to happen, either just because the world is being the world or as a consequence of the agent's actions. If this model is accurate, the machine could use it to plan ahead. That's a very classical thing to do in optimal control theory; there are a lot of roboticists here, and they'd say, well yeah, of course, we do this every day, but machine learning people really don't think of it this way, which is kind of weird. So inside the intelligent agent there needs to be some sort of configurable world simulator, similar to something we have in our frontal cortex, together with something that generates action proposals and something that predicts the long-term value of the objective the machine is supposed to optimize. We know how to build the blue box and the red box; we have no idea how to build the green box. Well, it's not that we have no idea; we have ideas, they just don't work very well, and here is why. The problem with the world is that it's not entirely predictable: it's either stochastic, or not fully observable, or just very complicated.

Suppose I do a simple experiment: I want to train a machine to predict the future, and I show it segments of video of someone putting a pen on the table and then letting it go, and then I ask the machine to predict what's going to happen next. The machine is probably going to learn very quickly that the pen falls every time, but it can't really predict in which direction. That means I can't train the machine to predict a single value, "here is what happened, predict this", because it's basically impossible to predict exactly. There is a set of potential futures, represented by this red ribbon in the graph, and every time you run the experiment one particular outcome happens, but the machine may make a different prediction, and you don't want to punish the machine for a "wrong" prediction as long as it's inside the red ribbon of potential futures. The problem, of course, is that we don't know what this red ribbon is; it's determined by the data. So what we might have to do is train a second neural net to tell the first neural net whether it is on the red ribbon or outside it: if you're outside it, train yourself to produce something that's on the ribbon. That is the idea at the basis of what's called adversarial training, which I think is the best idea in machine learning in the last ten years. You train two neural nets against each other, or really they help each other; they're not totally adversaries, it's more like a teacher, or a critic if you want, that trains the predictor. I'm not going to go into the details, but basically the green box predicts, and the red box tries to tell whether a prediction is good or not. Initially it doesn't know how to tell the first network whether its prediction is good, but eventually it learns to make the distinction, and the generator that makes the prediction can use the discriminator to figure out how to change its output so that its predictions are good as far as the discriminator is concerned. What's really cool about this is that the generator can have a source of random vectors z, which allows the system to produce different outputs, different predictions if you want, for different values of z.

The first experiments were kind of astonishing a few years ago. This was a paper called DCGAN; one of the co-authors, Soumith Chintala, is at Facebook. What you see here are pictures of bedrooms that the system has been trained to produce from a bunch of random numbers; those are non-existent bedrooms produced by the system. There has been a lot of progress over the last few years, a huge amount of work on those generative models, and a lot of hope that they can be used in all kinds of applications in content creation and image generation. These are earlier attempts at training them to produce dogs; they produce Salvador Dali dogs; this work is a few years old. This is some work on video prediction: what you see at the upper right is what's produced by a video predictor trained on small segments of video using plain supervised learning, basically a least-squares loss that says "predict the future". Of course, because there are multiple possible futures, what it produces is an average of all of them, which is a blurry image. If you train with those adversarial techniques, you get much sharper images, which make sense. More recent work in this direction predicts not pixels directly but the masks of objects that can move around. These are videos taken from a car, and the system predicts that the car that starts turning left is going to keep turning left, and that the pedestrians who start crossing the street will keep crossing the street. It's very useful to be able to predict what's going to happen before it does.
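The "blurry average" problem just described can be made concrete in a few lines. This is a toy sketch with invented numbers, not a GAN implementation: a single-valued predictor trained with squared error on two equally likely futures converges to their mean, an outcome that never actually occurs, while a noise-fed generator, as in adversarial training, can commit to one sharp outcome per sample and still cover the whole ribbon.

```python
# Toy illustration of why least-squares prediction averages over futures.
# The pen can fall left (-1.0) or right (+1.0) with equal probability.

futures = [-1.0, 1.0]   # the "red ribbon" of equally likely outcomes

# A single-valued predictor trained with squared error converges to the
# mean of the possible futures: a prediction that never actually happens.
pred = 0.7   # arbitrary starting guess
lr = 0.1
for step in range(1000):
    grad = sum(2 * (pred - t) for t in futures) / len(futures)
    pred -= lr * grad   # gradient step on the average squared error

print(round(pred, 2))   # prints 0.0: the blurry average, not a real outcome

# A generator with a noise input z, as in adversarial training, can instead
# commit to one sharp outcome per sample and still cover both futures.
def generator(z):        # illustrative stand-in for the trained "green box"
    return 1.0 if z >= 0.5 else -1.0

samples = {generator(z) for z in (0.1, 0.9)}
print(samples == {-1.0, 1.0})   # prints True: both futures are reachable
```

The discriminator's job in a real adversarial setup is to push the generator's outputs onto the ribbon, so that each sample looks like one plausible future rather than a blend of all of them.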
That kind of anticipation matters if you're trying to build self-driving cars, so it might be a first hint that we will be using model-based systems, if you want, for autonomous machine learning systems.

I have a couple of predictions, and I'll end with this. One is that AI and machine learning are going to change the value we attribute to things. This is me playing an economist, and I'm not an economist, so take it with a grain of salt, or ask Erik if I say something stupid. Today you can go on Amazon and buy a Blu-ray player for 47 bucks. It's an incredibly complicated, very sophisticated piece of technology, built by robots, and it's astonishing that you can build it for that price. If you want to buy a handmade ceramic bowl, a technology that is 8,000 years old, it's going to cost you 750 bucks or so. It may be cheaper in some countries, but there is real human intervention and real human experience in it, and we give value to that. We're going to give more and more value to this, and less and less value to material goods that are built by robots. Similarly with music: you can download Mozart's Magic Flute for seven bucks or so, but if you want to go to a performance of the Magic Flute, it's 200, 400, 800 bucks if you want a good seat. So the prediction is that there is a bright future for creative and artistic professions that have authentic human communication in them: creation and the communication of emotion. We can build machines that will do jazz improvisation, but they're not going to communicate human emotions; they might fake it pretty well.

As I said before, AI is a general-purpose technology according to many economists, and it will affect many sectors of the economy, but it will take a long time to diffuse into all of them: 10 to 20 years is what economists say a technology takes to go everywhere and improve productivity. Automation, of course, leads to job displacement; that's not a new phenomenon, and I don't think AI is qualitatively different from previous technological revolutions in that respect. There is this question of technological unemployment, which concerns people who don't have the skills the new economy requires. You can imagine a gap between the set of skills that the economy requires and the skills that people actually have, and as technology moves, some people can be left behind. I was really worried about this, because I was thinking that as technology progresses faster and faster, there could be more and more people left behind. What economists are saying, and what I learned recently, is that in fact it's the existence of people trailing behind that limits the speed at which technology disseminates through the economy: there is a limit to how many unemployed people you can have, and that limits how fast technology progresses.

So, to end: I think we will have true AI when we solve this problem of getting machines to learn the way humans and animals do. That's predictive learning, which we are working on very actively; a lot of people at DeepMind and various other labs work on this. Discovering the basic principles behind it may take 2, 5, 10, 20 years, we don't know, and once we've discovered those principles it might take another 10 or 20 years to reduce them to practice and make them work effectively; that's always what it takes. Convolutional nets were first produced in 1988, and it took about 20 years for them to become pervasive. And there is a danger that perhaps intelligence is not that simple, that it's a kludge, and that it's going to take much longer to produce intelligent machines. I'm going to end here; thank you very much for your attention.
Info
Channel: The Artificial Intelligence Channel
Views: 82,858
Rating: 4.8915567 out of 5
Keywords: singularity, ai, artificial intelligence, deep learning, machine learning, immortality, anti aging, deepmind, robots, robotics, self-driving cars, driverless cars
Id: 0tEhw5t6rhc
Length: 36min 47sec (2207 seconds)
Published: Sun Nov 19 2017