Yann LeCun - How Does The Brain Learn So Quickly?

Captions
It's our third speaker of the morning session. It's my pleasure to introduce Professor Yann LeCun. Yann got an engineering degree in electrical engineering and his PhD in computer science from Université Pierre et Marie Curie in Paris, and he did a postdoc with Geoff Hinton at the University of Toronto. He's currently the director of Facebook AI Research and Silver Professor at NYU. He lists his primary fields as artificial intelligence and machine learning, and Yann is best known for his work in deep learning and really the invention of the convolutional neural network model, which is now widely used for image, video, speech and text understanding, and I'm sure you'll hear much about that. So with that, please welcome Yann.

Thank you, Jim, and thanks to the organizers for inviting me. I don't consider myself a neuroscientist; I don't actually consider myself a computer scientist either. I used to be an engineer; I'm not exactly sure what I am. I probably fly somewhere in the middle of that diagram Nico was showing earlier. I have a question that has a practical aspect and also an aspect in terms of how the brain works, which is: why does the brain learn so much, so quickly? The models we have, the AI models, the machine learning models, deep learning and so on, are not nearly as efficient as what we observe in biology: how fast animals can learn things, how few trials they need, and how much of an understanding of the world they get just by observing it or playing in it. We can't do this with machines today, which means we're missing an essential piece that evolution has figured out and we haven't. This is why AI researchers need neuroscience and cognitive science; I have certainly had a lot of inspiration from neuroscience over the years, as I'll tell you about a little bit. And I forgot to start my timer... there we go.

So there are a number of questions I'm interested in in this talk. I had the advantage of being able to modify my slides as Josh was talking. All these AI systems that we see, none of them is real AI, and this is one of an increasingly large number of things that Josh and I agree on; there are plenty of things we disagree on, but we agree on this. In fact, we tend to start our talks the same way nowadays, as you'll see in a minute. The brain learns with an efficiency that none of our machine learning algorithms can match. Our supervised learning algorithms require a lot of examples, and reinforcement learning algorithms require ridiculously large numbers of trials to learn anything, which is why they basically only work for games; they don't work in the real world. That's why we don't have robots that are as agile as cats, or rats, or even simpler animals, and why we don't have intelligent dialogue systems and control systems that are not frustrating and that have common sense. So what's missing? What I'm going to argue is that we need new learning paradigms that build models of the world from observation and action: not just through reinforcement learning, not just through supervised learning, but through another type of learning called predictive learning, or unsupervised learning, which is a bit of a generic term. It's not a very different argument from the main argument Josh made in his talk, so again, we agree on a number of things, and that's the most important one, I think.
So, supervised learning: we all know what it is. You have some sort of parameterized function, a classifier, a deep neural net, where you adjust the weights as learning takes place. You show it thousands of examples of cars and airplanes, and whenever you show it a car you tell it it's a car; if it doesn't answer "car", you measure some sort of error and figure out how to adjust the knobs so that the error decreases. It's nice because there are nice theories about this: those things can generalize to objects and views they've never seen before, to instances they've never seen before. There is transfer learning: if you train one of those networks on lots of examples, you can add a new category with just very few samples, which starts to look a little more like human and animal learning, but we're still very far from that.

The way those modern recognition systems, machine learning systems, are built is based on the idea of deep learning, which is just the idea that computation should have multiple steps, and that the representation of the world should be hierarchical in some way. That's all there is to deep learning, really. So you want to build a machine that is a cascade of parameterized nonlinear functions; they have to be nonlinear, otherwise there's no point having multiple layers. The next question you can ask yourself is what you put in the boxes, and that's where you can get inspiration from neuroscience. The modern way of building neural nets is to alternate two types of operations (sometimes three or four, but in the simple case just two): you take the activations of one layer as a list of numbers and multiply it by a matrix, basically computing weighted sums, and then you pass the result through a pointwise nonlinearity, which in modern neural nets is just a half-wave rectifier. In old neural nets it was a sigmoid, but we discovered that it's better to use half-wave rectification: you can have neural nets with many more layers without running into problems. Training takes place by optimizing some sort of objective function that computes the discrepancy between what the machine produces and what you want, using stochastic gradient descent: you estimate the gradient of this cost function with respect to the parameters, the connections between the virtual neurons (extremely simplified neurons), using the backpropagation algorithm, which is just efficient gradient estimation.

And of course you can get even more inspiration from biology: not just neurons with modifiable synapses and a thresholded nonlinearity, but also the kind of architecture, the connections between the neurons. Convolutional nets are a fairly direct inspiration from the classic work by Hubel and Wiesel, which was translated into computer models by Fukushima in the early 80s, and I got inspired by this to build models of this type trained with backpropagation. They're composed of essentially three types of operations: linear operations, which are a kind of filter bank, like receptive fields in V1 of the visual cortex; a nonlinearity; and then pooling, which is very similar to complex cells, Hubel and Wiesel's complex cells.
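Here is a minimal sketch of the recipe just described: a filter bank, a half-wave rectifier, and pooling, stacked and trained by stochastic gradient descent with backpropagation. This is PyTorch, and the layer sizes, depth and data are illustrative placeholders, not the networks from the talk.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5),     # filter bank, like V1 receptive fields
    nn.ReLU(),                           # pointwise half-wave rectification
    nn.MaxPool2d(2),                     # pooling, like complex cells
    nn.Conv2d(16, 32, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 5 * 5, 10),           # scores for, say, 10 categories
)

opt = torch.optim.SGD(model.parameters(), lr=0.01)   # stochastic gradient descent
loss_fn = nn.CrossEntropyLoss()          # discrepancy between output and label

x = torch.randn(8, 3, 32, 32)            # a fake batch of 32x32 RGB images
y = torch.randint(0, 10, (8,))           # fake labels
loss = loss_fn(model(x), y)
opt.zero_grad()
loss.backward()                          # gradients via backpropagation
opt.step()                               # one step down the error surface
```

The `nn.ReLU()` is exactly the half-wave rectification mentioned above; swapping it for a sigmoid reproduces the older style of network.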
This idea of stacking multiple layers, of course, a lot of people have had before, in conceptual models of how visual or auditory perception works. As I said, the first computational models that started doing something very useful were Fukushima's Neocognitron; I started working on this kind of network in the late 80s and got some success with character recognition. These networks could recognize characters very accurately, and pretty quickly we realized they had one quality that more classical pattern recognition systems at the time didn't have, which was the ability to recognize multiple objects at the same time and to essentially, partially, solve the binding problem. There is no explicit mechanism in those networks to bind features together to form objects; there are just many layers with nonlinearities between them, and that seems to be enough to solve the binding problem. There was a lot of work in theoretical neuroscience in the 90s about how the brain solves the binding problem through synchronization of spikes, fast-changing synapses, and things like that, and it turns out that doesn't seem to be required. That doesn't mean it's not there, doesn't mean it doesn't exist, doesn't mean it's not useful in general; but for this it doesn't seem to be required.

In the last few years there's been an explosion of work on convolutional nets, with the advent of fast computers, GPUs and large datasets. Those things require a lot of data to do a good job, and large datasets became available only in the last five years or so; that's when the use of neural nets for practical applications really took off, even though there were successes before. So we have convolutional nets with on the order of one to ten billion connections, a few million simplified neurons, anywhere from 8 to 20 layers; that was maybe four or five years ago. When you train them supervised, the filters you end up with (the video unfortunately is not playing properly) are very much like V1 receptive fields, at least qualitatively. Over the last few years there's been an explosion, an inflation, in the number of layers in those networks, and you don't want to think of the layers as physical layers so much as types of computation. If you think of the ventral pathway of the visual cortex, for example, as having recurrent connections within V1, within V2, within V4, then each pass around the recurrence can be counted as a layer (see the sketch below). The networks used in practice, deployed by Facebook, Google, Microsoft, IBM and everybody else, have on the order of 50 to 100 layers now, 50 to 100 steps in the computation if you want, and all of them are convolutional in some way. Those are used at a big scale, and a large amount of power is spent running them in data centers, Facebook's and Google's data centers. Facebook users upload something like 1.5 billion photos every day, and every single one of those photos goes through four of those networks within two seconds, so a lot of compute is spent there.
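As a side note on counting each pass around a recurrence as a layer: here is a tiny sketch of that view, a block with one set of weights applied several times, so that a single set of parameters yields many computational "layers". The module and sizes are hypothetical, purely for illustration.

```python
import torch
import torch.nn as nn

class UnrolledRecurrence(nn.Module):
    """One block, one set of weights, applied several times: each pass
    around the 'recurrence' behaves like an extra layer of computation."""
    def __init__(self, dim, steps):
        super().__init__()
        self.step = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.steps = steps

    def forward(self, x):
        for _ in range(self.steps):
            x = x + self.step(x)         # residual update per pass
        return x

h = UnrolledRecurrence(dim=64, steps=8)(torch.randn(4, 64))  # 8 'layers'
```

Unrolled this way, eight passes behave like an eight-layer network with tied weights.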
The advantage of using multiple layers is that you want a representation of the world that is hierarchical, and the reason you want it to be hierarchical is that you want it to be compositional: the world is compositional. In fact, there is a nice quote from Jason Eisner, a natural language processing researcher at Johns Hopkins, who says "the world is compositional, or there is a God". It's the only way we can think of the world as being understandable. Einstein said that the most incomprehensible thing about the world is that it is comprehensible, and it's probably because there is some compositional structure to it. Objects are formed of parts, parts are formed of subparts, subparts are made of motifs, and motifs of elementary edges, in the case of images; and you have the same kind of hierarchy in language and text, in audio signals, in just about any natural signal.

You can train those things supervised to do all kinds of tasks, including driving robots around. This is a very old project, almost 13 years ago now: you drive a robot around in busy environments, and the convolutional net just records the images from the camera and the steering angle, and then basically learns to avoid obstacles, through imitation learning essentially. You can be a little more sophisticated about this and drive bigger robots; I'll show you a video at the end if I have time, which I probably won't. And there are a lot of ideas for using convolutional nets not just to recognize objects but to label the entire environment. The classical convolutional net you can think of as the ventral pathway; this is more like the dorsal pathway: instead of precisely classifying everything, you basically tell where everything is, without necessarily very accurate classification, but with very good localization. You can imagine that this kind of thing is very useful if you want to drive cars around. This is a little video from a company called Mobileye, recently bought by Intel, that builds a lot of the vision systems for self-driving cars; NVIDIA, the company that makes the GPU chips, is also working very actively on this, and using those things on a grand scale for autonomous driving.

There's been some interesting work at Facebook which, in the context of computational neuroscience, can be interpreted as trying to merge ventral and dorsal pathway functionalities: enabling the machine not just to recognize objects but to localize them, and not just to localize people in general but individual instances of each object. It's kind of amazing that this works, but it works really very well, and it's conceptually very simple: a big convolutional net whose output is a bunch of masks, one per object, with associated categories, trained supervised or semi-supervised in some ways (a crude sketch of this kind of per-pixel output follows below). The results are nothing short of astonishing: you can see that every object is outlined, or colored in this case, and identified, including instances that overlap, like the sheep at the top right. Deep learning has brought about progress here that people did not expect would happen so fast. If you had asked a computer vision researcher five or ten years ago how long it would take to solve that problem, they might have said twenty years, or "maybe not within my career". So it's kind of great that we've been able to do this, and that's what created the big interest from industry in those methods.
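To make the "label everything" idea concrete, here is a loose sketch of a network whose output is a per-pixel class map. This is not the Facebook system described in the talk (which also separates object instances); it is just a minimal fully convolutional layout with made-up sizes and a hypothetical label set.

```python
import torch
import torch.nn as nn

num_classes = 21                          # hypothetical label set

# Fully convolutional: no flattening, so the output keeps the spatial
# layout of the input and assigns class scores at every location.
seg_net = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, num_classes, 1),        # 1x1 conv: per-pixel class scores
)

img = torch.randn(1, 3, 128, 128)
scores = seg_net(img)                     # shape (1, num_classes, 128, 128)
labels = scores.argmax(dim=1)             # one category for every pixel
```

Because there is no flattening step, the output keeps the spatial layout of the input, which is what gives the good localization mentioned above.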
All of this is great, but you need to collect a lot of data and have it labeled by humans to be able to do these things, to figure out what pose people are taking and so on. Supervised learning works when you have enough resources to collect the data. In fact, convolutional nets are used at Facebook for translation: when a post is made in one language and Facebook knows you don't understand that language, it translates automatically, and some of the translation systems use convolutional nets. They're applied to a text sequence, a sequence of words essentially, where words are represented by vectors; it goes through a convolutional net that produces an abstract representation of the meaning of the sentence, and then a deconvolution net, which is sort of a convolutional net backwards if you want, generates the translation. So it's a fairly general tool.

But there are lots of questions left in the air if we are interested in how the brain works, or in what intelligence is really all about. One question I have been asking myself, and getting different answers to from different people, is: how many learning "algorithms" does the brain implement? I put quotes around "algorithm" because it's not clear that what the brain does is best described by the word algorithm. Is it close to one, or maybe a few: one general algorithm for the cortex, say, and of course different ones for the hippocampus and the cerebellum and so on? That's okay; it's still a small number. But there are people who say no, it's just a kludge: evolution cobbled this thing together so that it would work, and there is no hope of finding any general principle behind its function. Gary Marcus, a colleague at NYU in the psychology department, is one of those; in fact he has a book whose title is Kluge that describes how the brain is a collection of hacks, essentially. If he's right, then the whole enterprise of trying to figure out how the brain works from first principles is not going to succeed. But whether it's true or not, I'd like to work under the hypothesis that there are general principles, and push that as far as we can.

How much prior structure does animal learning require, and human learning, and of course machine learning? That's another question Gary Marcus is interested in, and I assume many people in the audience are too. Convolutional nets are an example of imposing some structure on a machine learning system so that it learns the concepts that are relevant and represents them in an appropriate fashion. But how much such structure is required if you want to build an end-to-end intelligent system? That question is completely in the air. Is there something really special about the visual cortex, for example, compared to the frontal cortex? Is what's different actually relevant, or is the cortex just a kind of uniform piece of goo that learns whatever it's fed? It's the old nature-nurture debate. This is actually what got me into machine learning in the first place: reading about debates between psychologists and linguists, Chomsky and Piaget and people like that.
All of our learning algorithms are designed around the idea of minimizing some sort of objective function; that's what makes them well principled. You will have a hard time getting a paper accepted at a machine learning conference unless your algorithm minimizes some objective function. But does the brain minimize an objective function, even an implicit one? If it does, what would that function be? It would probably be a collection of multiple functions, but what would they be? And then the last question is: how does it minimize it? All the efficient algorithms we have evaluate gradients by backpropagation. When you have non-differentiable parts in your learning system, you have to estimate the gradient by perturbation; that sort of works, but it's really, really inefficient, so I don't think it's a viable principle for efficient learning. Gradients work really well, and I don't know any alternative. So does the brain do backprop? I'm not talking about supervised learning necessarily, but is there a mechanism like it? Yoshua is maybe going to talk about this later; he's been working on trying to find forms of gradient evaluation that could be biologically plausible. And one important question, which I'll come back to later, is how the brain handles uncertainty in prediction. This is where Josh immediately jumps to probabilistic models, and I'm wary of probabilistic models, because probability distributions in high-dimensional spaces are horrible objects that we can't represent. I'm perfectly ready to throw probability theory under the bus; on that, our positions are converging, yes.

So what is the obstacle to AI? Why is it that we don't have machines that can learn to navigate the world as well as a cat? Why is it that we don't have systems that can learn language, and how the world works, as well and as efficiently as kids, or as orangutans? We had an example of orangutans earlier: amazing animals. They're not social, they don't have language, they're solitary animals, and they're incredibly smart, almost as smart as we are, probably smarter in some ways. So the punchline, before I go into this, is that our AI systems need to learn models of the world, and they need to be able to use them to reason and plan; again, this is very similar to what Josh was saying. Common sense is a recurring holy-grail question in AI: how can machines get common sense? There is a subfield called common-sense reasoning, which doesn't mean there are methods to do it; it's a problem, not a solution. But common sense, if you reduce it to something concrete, is perhaps the ability to fill in the blanks: the ability to infer the state of the world from partial information, say visual information, where you don't see everything about the world but can still infer a lot about it from a partial description; to infer the future from the past and the present (predicting the future is very important for intelligence, in fact arguably it is the essence of intelligence); and to infer past events from the present state of the world, like what happened.
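Here is a minimal sketch of this "fill in the blanks" training signal: hide part of each input and train a network to reconstruct the hidden part from the visible part. Everything here (the two-layer predictor, the 50% mask, the sizes) is an illustrative assumption, not a system from the talk.

```python
import torch
import torch.nn as nn

predictor = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))
opt = torch.optim.SGD(predictor.parameters(), lr=0.01)

x = torch.randn(32, 128)                 # a batch of 'percepts'
mask = torch.rand(32, 128) < 0.5         # hide roughly half of each one
visible = x * mask.float()               # the machine only sees this part

pred = predictor(visible)
loss = ((pred - x)[~mask] ** 2).mean()   # scored only on the hidden part
opt.zero_grad(); loss.backward(); opt.step()
```

The point is that the supervision comes from the data itself: no human labels are needed, because the target is just the part of the percept that was withheld.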
There are low-level functions of this type, like filling in the blind spot in your visual field: we're not conscious of it because our brain fills it in at a relatively low, subconscious level, and that is probably based on some sort of prediction. We can predict how the world is going to look if we move our head twenty centimeters to the left, and that's because we've abstracted the notion of depth, of objects being in front of others, and things like that, and we probably learned this at a young age; it's an emergent property of being able to predict what the world will look like when you move your head. So predicting any part of the past, present or future percepts from whatever information is available is, I think, the underlying ability from which common sense can emerge. What we need is a way to train machines to make those predictions, and that's what a lot of machine learning people have called unsupervised learning, although that term means a lot of other things too, so I prefer another phrase for it: predictive learning, which doesn't necessarily mean predicting the future, but predicting whatever is missing.

Unsupervised learning, unfortunately, is kind of like the dark matter of AI, in the sense that all the stuff you see about AI is supervised learning and reinforcement learning, but really most of learning, most of animal learning, is unsupervised, and we don't know how to do it. It's like physicists who tell you that ordinary matter is about 5% of the mass of the universe, and for the other 95%, the dark matter and dark energy, we really have no idea what it is. We're in the same kind of embarrassing situation. We learn how the world works by observation. We learn object permanence, like this baby at the top right; we learn all kinds of basic things about the world very early on, in the first few months. This young orangutan here has figured out that an object cannot disappear by magic, so it's totally fooled by this simple little magic trick and rolls on the floor laughing when it realizes the object is not there. A break in your model of the world is either funny or scary or surprising, and in fact that's exactly how psychophysicists and child development psychologists test whether something is understood by a baby: you break the model of the world. You push this little truck off a ledge and it doesn't fall; below the age of six to eight months, babies just don't care, but after eight months they say "wait, this cannot be, this thing has to fall", and they stare at it, like the baby at the bottom left, and that's how you can measure whether a model is being violated. Emmanuel Dupoux, in Paris, has made this chart of the different stages, the ages in months at which we learn these basic concepts: recognizing biological motion and telling it apart from the motion of basic objects; object permanence, which we learn pretty quickly, within two months, but it still takes two months; rigidity, stability and support. The towers of cubes from my colleagues at Facebook that Josh was showing are that kind of thing.
And then gravity comes around eight months, and so on. Motor control comes later than perception; babies learn perception much, much earlier than motor control, so the idea that you somehow need to interact with the world to perceive it is not clear, or at least it's not clear how much of that you need. But there is another argument for unsupervised learning, which Geoff Hinton has been making for 30 years, and which I didn't quite believe at first because I didn't know how to define unsupervised learning. His argument goes like this: the brain has about 10^14 synapses and we only live for about 10^9 seconds, so we have a lot more parameters than data. This motivates the idea that we must do a lot of unsupervised learning, since the perceptual input is the only place where we can get 10^5 dimensions of constraint per second. In other words: predict everything you observe, because that's the only way you can constrain a large brain enough to learn all the parameters it has. If you only ask it to predict a couple of labels or a value function, you're going to require millions and millions of examples, and you're never going to learn anything in your lifetime.

This led me to an analogy that is obnoxious to certain people. If you want to quantify the amount of information a machine is given, the amount of feedback it is asked to predict, in the different modes or paradigms of learning: in reinforcement learning, the scenario where you don't tell the machine the correct answer but just wait for it to produce one and tell it whether it was good or bad, the feedback is essentially a single scalar per trial. In supervised learning you give the correct answer, and if it's one among a thousand categories, that's something like 10 bits, so you give it quite a bit more information and it will learn with less effort. In unsupervised learning you ask the machine to predict everything: tell me what your complete perception is going to be in the next second, minute, hour, day. The machine has to predict a lot, and it only needs to wait for the future to occur to get its feedback, so you get a lot of information from that. That's the only way you can get enough information to train a large system, and the only way a machine or an animal can learn models of the world, predictive models. So it has become obvious to me that this is something we need to be able to do with machines. I was saying that the dark matter of AI is unsupervised learning; in this cake analogy, the dark matter is actually a chocolate génoise, which is at least yummy.
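A back-of-the-envelope version of this information argument, using the rough numbers quoted in the talk:

```python
import math

# Feedback per sample under each learning paradigm (orders of magnitude):
rl_bits_per_trial = 1                    # a single scalar reward
supervised_bits = math.log2(1000)        # ~10 bits for a 1-of-1000 label

# Hinton's argument: far more parameters than seconds of life.
synapses = 1e14
lifetime_seconds = 1e9                   # roughly 30 years
constraints_needed = synapses / lifetime_seconds

print(round(supervised_bits, 1))         # 10.0
print(constraints_needed)                # 100000.0: ~1e5 constraints/second
```

Only a prediction target as rich as the perceptual stream itself gets anywhere near 10^5 constraints per second, which is the point of the cake analogy.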
That said, we do work on reinforcement learning; my colleagues at Facebook do traditional reinforcement learning for playing video games, which are very interesting vehicles for developing end-to-end AI systems that learn to act and perceive. But here is where we want to go. Rich Sutton, a number of years ago, in 1991, proposed an architecture he called Dyna, in which he makes this argument: the main idea of Dyna is the old, common-sense idea that planning is trying things in your head, using an internal model of the world. This suggests the existence of a more primitive process for trying things not in your head, but through direct interaction with the world. Reinforcement learning is the name we use for this more primitive, direct kind of trying, and Dyna is the extension of reinforcement learning to include a learned world model. Now, truthfully, all of this is now called reinforcement learning; it's just that what he called reinforcement learning with a model is called model-based reinforcement learning, and a lot of things get called reinforcement learning that don't really deserve the name.

This has some parallels with the classical idea of optimal control. In optimal control, when you want to plan, say, the trajectory of a rocket at NASA using simulation, you have an accurate numerical model of the rocket, and it gives you two functions: it knows how to compute the next state from the current state and the commands, and it also knows how to propagate gradients through that computation. It's basically backprop; in control theory they call it the adjoint state method, but it's really backprop through time, invented way before machine learning people got to it. So you have a simulator of the thing you want to control, a differentiable function for which you can compute the derivatives of the outputs with respect to the inputs. You simulate step by step in discrete time, feeding it a command at every time step (the position of the nozzle for a rocket, or whatever it is), and there's an objective: get to the space station while minimizing fuel consumption and not taking too long. By gradient descent, by backprop through time, you figure out a sequence of commands that minimizes the objective function; you can propagate gradients through all those modules very easily. This is also how classical robotics does trajectory planning: you have a perfect dynamical model of the robot you want to control and some specification of the constraints and the cost (you don't want to get too close to obstacles, you want to minimize jerk, and so on, while doing a particular task), and you do the planning that way.

This suggests an architecture for an intelligent system, which would be something like this. The intelligent system interacts with the world: it produces actions that possibly affect the state of the world, and in return it gets percepts, observations, which go through a perception module; we know how to build that part, although we have to use supervised learning to train it. The representation of the perceived world goes into an agent, and the agent is trying to optimize some sort of objective function that measures an overall cost. This objective function measures unhappiness, essentially: it takes the internal state of the agent as input and tells the agent whether it is happy or not. Now, if we believe in this idea of model-based prediction, of having a way of predicting what the world is going to do so that you can think ahead about what kind of action to take, then inside the agent there has to be some sort of world simulator, a world model that can predict what the world is going to do, either because the world is just being the world, or as a consequence of the actions of the agent.
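Here is a sketch of this "trying things in your head" loop: given a differentiable world model and a cost, optimize a sequence of actions by gradient descent, backpropagating through time the way the adjoint method does. The world model here is an untrained stand-in, and the dimensions, goal and cost terms are all assumptions for illustration.

```python
import torch
import torch.nn as nn

state_dim, action_dim, horizon = 8, 2, 10

# Stand-in world model: next state as a function of (state, action).
world_model = nn.Sequential(nn.Linear(state_dim + action_dim, 64),
                            nn.ReLU(),
                            nn.Linear(64, state_dim))

s0 = torch.randn(state_dim)              # the currently observed state
goal = torch.zeros(state_dim)            # hypothetical target state

actions = torch.zeros(horizon, action_dim, requires_grad=True)
plan_opt = torch.optim.SGD([actions], lr=0.1)

for _ in range(100):                     # planning happens "in the head"
    s, cost = s0, torch.tensor(0.0)
    for a in actions:                    # roll the world model forward
        s = world_model(torch.cat([s, a]))
        cost = cost + ((s - goal) ** 2).sum() + 0.01 * (a ** 2).sum()
    plan_opt.zero_grad()
    cost.backward()                      # backprop through time / adjoint
    plan_opt.step()                      # improve the action sequence
```

Nothing here touches the real world: the gradient flows through the model's rollout, which is exactly why the hard open problem the talk turns to next is how to train that world model in the first place.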
Inside the agent there is also an actor, so this is basically the actor-critic picture. The actor generates action proposals that are fed to the world simulator, which generates predicted percepts or world states, and those go into a critic, whose role is to predict the long-term expected value of the objective. So the critic is a predictor for the objective function: you don't just want to optimize the objective instantaneously, you want to predict the long-term consequences of your actions without actually having to take them. All of these modules are trainable: the world model has to learn to be an emulator of the world, the actor has to learn to generate action proposals that will minimize the objective, and the critic has to learn to predict the long-term expected value of the objective. These are, I wouldn't say completely classical, but relatively classical models in machine learning, and particularly in models of how the brain is organized, the reward mechanisms and all that. In a typical trial you would observe the state of the world, run a sequence of actions produced by the actor, predict their effect using the world simulator, perhaps do a little bit of optimization to figure out a sequence of actions that would lead to a good result as predicted by the critic, and then train the actor to produce the sequence that actually optimizes the objective. The main problem is that while we more or less know how to train the actor and the critic, we have no idea how to train the world simulator. That, I think, is the main technical issue we are facing in AI today; this is a very personal opinion, not everybody agrees, few people agree actually, but the main problem is: how do we train forward models of the world?

Josh showed this work by Adam Lerer, Sam Gross and Rob Fergus at Facebook on intuitive physics: training a convolutional net to predict how towers of cubes are going to fall. When there is uncertainty, the predictions of where the cubes will fall become fuzzy. Here, for instance, it's hard to predict where this yellow block is going to fall, so the prediction is blurry, because the network is forced to predict something like an average of all the possible futures (a tiny demonstration of this averaging effect appears below). We can actually train forward models to simulate various physical systems, which I'm going to skip because it would take a little too long, and use them to optimize policies and sequences of actions. There's also work, which I'll also skip, on using prediction to help train dialogue systems: work by Jason Weston and his co-workers shows that if you train a dialogue system to also predict the next sentences that are going to be typed or spoken, it does a better job when you then use reinforcement learning to make its answers satisfying to the user. There's quite a lot of work on this.

Okay, so let me jump, in the last four minutes, to adversarial training. This is a technique that was proposed by Ian Goodfellow when he was a student in Yoshua Bengio's lab (I'm sure Yoshua had something to do with it as well), and it's a very cool idea. One problem it might help us solve is the problem of predicting under uncertainty without necessarily appealing to probability theory, or to modeling distributions in general.
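Before the adversarial fix, here is a tiny demonstration of why plain least-squares prediction goes fuzzy when the future is multimodal: if the target is +1 or -1 with equal probability, a network trained with squared error learns to output roughly 0, the average of the two futures rather than either one. The one-unit network and the toy targets are, of course, made up for illustration.

```python
import torch
import torch.nn as nn

net = nn.Linear(1, 1)                     # a one-parameter 'predictor'
opt = torch.optim.SGD(net.parameters(), lr=0.05)

x = torch.ones(256, 1)                    # the same observed past every time
for _ in range(2000):
    # Two equally likely futures: the pen falls one way (+1) or the other (-1).
    y = (torch.rand(256, 1) < 0.5).float() * 2 - 1
    loss = ((net(x) - y) ** 2).mean()     # plain squared-error prediction
    opt.zero_grad(); loss.backward(); opt.step()

print(net(torch.ones(1, 1)).item())       # close to 0: the blurry average
```

Adversarial training, described next, replaces the fixed squared-error loss with a learned assessment of plausibility, so the predictor can commit to one future instead of averaging them.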
So, for example, let's say we let the machine observe a few frames of a video, like me putting a pen on the table; then I let the pen go, and the pen falls. It's very hard to predict in which direction it's going to fall. We're going to train a neural net here, G(X, Z), where X is the past, the observed frames, and Z is a source of random vectors, and the machine is going to make a prediction. The prediction may be, for example, that the pen falls to the left and to the back, but in fact, when we let the future play out, the pen falls to the back and slightly to the right. So the machine is not really wrong, it predicted qualitatively the right thing, but it is technically wrong, because the frame it predicted is not the one that actually occurred. What we would like is a way to tell the machine: your prediction was actually okay, it's part of the set of plausible futures, and I don't want to punish you for picking the wrong one within that set. So what you want is an objective function that tells the machine that any of those answers is okay, and only if you go outside of this set are things bad. The problem is that we have no idea how to characterize the set of correct possible futures, so what we're going to do is train a second neural net to learn that function. That's the idea of adversarial training: you have two neural nets, one that learns to predict, and one that is used to assess the predictions of the first, and they're both trained at the same time.

In the non-probabilistic version of it, your observed data is a bunch of points and you want to give low energy to them: you want the network that assesses whether something is plausible to output a low number for those points, and a high number for everything else that is not plausible. This is called energy-based learning in general. Here is how it works. You have a dataset that produces the actual future, and a generator, a neural net that looks at the past and predicts the future, and that currently does a really bad job. Both go to the discriminator, the second neural net, which is supposed to assess whether a prediction is good or bad. You train the discriminator by telling it to produce a low output whenever you feed it real data and a large output whenever you feed it predictions from the generator. But simultaneously, the generator trains itself to produce outputs that the discriminator cannot tell are fake: it gets the gradient of the output of the discriminator with respect to its inputs, and changes its parameters so as to produce outputs that the discriminator thinks are correct, or real. So the discriminator learns this kind of contrast function, which I can draw on the right here: the green points are produced by the generator, the blue points are the real ones, and eventually this function shapes itself and the green points move toward the middle. These things are amazing, because you can train them to generate images. These are non-existent bedrooms: you put a bunch of random numbers in and you get bedrooms out. You can do funny arithmetic in feature space, which I don't have time to explain, and you can generate funny-looking images that look surreal, like Salvador Dalí dogs or something.
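Here is a bare-bones sketch of this two-network game in the energy-based form just described: the discriminator D is pushed to output low energy on real data and high energy (up to a margin) on the generator's samples, while the generator G follows D's gradient downhill. The two-dimensional toy "data", the hinge margin of 1.0 and the little MLPs are all illustrative assumptions; the talk's version conditions G on past frames, which is omitted here for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))  # noise z -> sample
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))   # sample -> energy
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(5000):
    real = torch.randn(64, 2) + torch.tensor([3.0, 0.0])  # toy 'real data' cluster
    fake = G(torch.randn(64, 16))                         # predictions from noise

    # Discriminator: low energy on real data, high energy (up to a
    # margin of 1.0) on the generator's current samples.
    d_loss = D(real).mean() + F.relu(1.0 - D(fake.detach())).mean()
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: follow the gradient of the discriminator's output with
    # respect to its input, moving its samples toward low energy.
    g_loss = D(G(torch.randn(64, 16))).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Training G through D(G(z)) is exactly the "gradient of the discriminator's output with respect to its inputs" step described above.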
You can also do video prediction: in these short segments the first four frames are observed and the last two frames are predicted, and it predicts more or less what's going to happen; it gets bad really quickly if you keep going. Same here: this system has been trained on video segments of New York apartments, and as the camera rotates it has to predict what the apartment looks like in places it hasn't seen, so it continues the bookshelves and the couches and everything, and again it breaks down really quickly. This is another example where the prediction is not in pixel space but in the space of semantic segmentation, so we have categories for every region, and it predicts that if someone starts crossing the street they're going to keep crossing the street, and if a car starts to turn left it's going to keep turning left.

Let me see, I have one last slide. In this AI agent architecture, the problem is how we design the objective functions, and we can't just design them by hand, because they are complicated, so we're going to need to learn them. This is a big subject of discussion in the AI community, particularly among people worried about AI being dangerous: we're going to have to train our AI systems to behave, and that will have to be done by training the objective functions, not just hardwiring them. The three laws of robotics and so on, that's the hardwiring, and even that is difficult to do; for the rest, we're going to have to teach them good from evil, essentially, and it turns out that's very similar to how we raise children.

I want to end with this slide. I've been very inspired by neuroscience in my work, but by rather conceptual things from neuroscience, and there's a word of caution: if you try to stick to neuroscience too closely without really understanding the underlying principles, you might be hypnotized, or blinded, by the details without actually finding the underlying principles. So one question I've been asking myself is: is there a general principle behind intelligence, one that is used by biology at one end, in the animal world, but that we could also use for machines, and that would play a role similar to the one aerodynamics plays for aeronautics, or thermodynamics for heat engines? That's what I'm after. Thank you. [Applause]
Info
Channel: The Artificial Intelligence Channel
Views: 21,362
Keywords: singularity, transhumanism, ai, artificial intelligence, deep learning, machine learning, immortality, anti aging
Id: WUZhLzaD3b8
Length: 42min 51sec (2571 seconds)
Published: Wed Oct 11 2017