Lex Fridman, Yann LeCun and Yoshua Bengio | Inside the Lab Meta AI Clip

Video Statistics and Information

Captions
[Music]

Lex Fridman: I'm speaking with two of the greatest research minds in the history of artificial intelligence and computing: Yann LeCun, professor at NYU, Chief AI Scientist at Meta, and Turing Award winner; and also Yoshua Bengio, who's a professor at — I'm not going to say this with a French accent — the University of Montreal, founder and scientific director of Mila, the Quebec AI Institute, and also a Turing Award winner. I thought it would be good to start by asking: what is your broad vision for our path, our journey towards human-level intelligence? Maybe, Yoshua, you go first.

Yoshua Bengio: Sure — thanks, Lex. So I believe that we are still far from human-level AI, and one of the ways to think about this gap is to look at problems that humans are really good at and that machines are not. One important such problem is the ability to generalize well on new tasks, out of distribution, or in new settings. If you look at how humans do it: they think, they attend to the new situation, they reason about it, they take the time to think about the problem. That's something we need to integrate into our AI systems, and we can take inspiration from how brains do it — there's a lot that we know about conscious processing that we can actually integrate into machine learning. The way I think about this is that there are preferences — inductive biases, or architectural preferences — that we find in the way the brain works, that we can put in. For example: how knowledge is represented in a modular way, with pieces of knowledge that are reusable and can be composed on the fly to solve new tasks; how these pieces communicate with each other through a communication bottleneck; how the information that's communicated goes through a stochastic hard attention, and how what is selected looks like our thoughts; and that the things we think about often have a causal interpretation that is related to how we can act in the world — how interventions in the environment explain what we're seeing. And all of that, I think, can be done with neural nets, with maybe some different ways of thinking about them. That's the kind of thing I'm working on, and I would love to tell you more about it.
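To make the mechanism Bengio sketches here a bit more concrete, the following is a minimal illustrative sketch — PyTorch is assumed, and every class name and dimension is invented for the example, not taken from his work — of independent modules whose proposals compete, with a stochastic hard attention sampling which single module's output gets broadcast as the current "thought":

```python
# Minimal sketch of stochastic hard attention over independent modules:
# several modules each propose an output, and exactly one is sampled to
# "speak". All names and sizes here are invented for illustration.
import torch
import torch.nn as nn

class HardAttentionOverModules(nn.Module):
    def __init__(self, n_modules=4, dim=32):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(n_modules)
        ])
        self.scorer = nn.Linear(dim, n_modules)  # attention logits from the current state

    def forward(self, state):
        proposals = torch.stack([e(state) for e in self.experts], dim=1)  # (B, n, dim)
        logits = self.scorer(state)                                       # (B, n)
        choice = torch.distributions.Categorical(logits=logits).sample()  # discrete, stochastic
        selected = proposals[torch.arange(state.size(0)), choice]         # only one module is broadcast
        return selected, choice

selector = HardAttentionOverModules()
thought, which_module = selector(torch.randn(8, 32))
```

Note that training through the discrete sample needs an extra gradient estimator in practice (REINFORCE, Gumbel-softmax, or similar) — a point the conversation returns to below when discussing end-to-end learning.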
Lex Fridman: Okay, you said a lot of interesting words there: out of distribution, modular, composing the knowledge pieces together, the stochastic hard attention — and there's causality also in that picture. I would love to talk to you about all of these, but can you also just elaborate on what out of distribution means? Why is that a fundamental concept?

Yoshua Bengio: So first of all, it's a practical problem in industry. When you train a system with a dataset, the data is collected in a particular way, maybe in some country, and then you deploy the system in a different place, at a different time, and it kind of breaks down. So that's a symptom. As an example, humans are actually pretty good in a new setting. If you learned to drive in North America and then you rent a car in London, the first time you drive on the left side of the road it's a challenge, but you can survive it: you pay attention to what is going on. You're generalizing out of distribution; you're adapting, also out of distribution. So out of distribution means you learn in one city, and you have to be able to transfer that and operate successfully in another city. That is a fundamental aspect of human-level intelligence: humans are able to somehow do this kind of thing, take a leap into the unknown. Well, it's not completely unknown. The reason we are able to drive on the left side in London, if we drove all our lives in North America, is that there are lots of things in common — the laws of physics are the same, people are the same. It's just this one little thing that changed, one traffic rule, and our brain somehow has structured its information so as to separate these pieces of knowledge, so that we can change just this one thing — this rule — and somehow infer our way around it, and then gradually retrain our habits so that we can do well in London too.
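A toy numeric illustration of that point — plain NumPy, with the "cities" and the data-generating rule entirely invented for the example: a model fit on one distribution looks accurate in-distribution, but degrades when a single rule flips at deployment time while the rest of the "physics" stays the same.

```python
# Toy out-of-distribution evaluation: one rule (the "side of the road")
# changes between training and deployment; everything else is shared.
import numpy as np

rng = np.random.default_rng(0)

def drive_data(n, side=+1.0):
    # Shared structure: the response to curvature. Changed rule: the side offset.
    x = rng.uniform(-1, 1, size=(n, 1))
    y = 2.0 * x[:, 0] + side * 0.5 + rng.normal(0, 0.05, size=n)
    return x, y

x_train, y_train = drive_data(1000, side=+1.0)   # "North America"
x_test,  y_test  = drive_data(1000, side=-1.0)   # "London": one rule flipped

X = np.hstack([x_train, np.ones((len(x_train), 1))])   # least squares with bias
w = np.linalg.lstsq(X, y_train, rcond=None)[0]

def mse(x, y):
    return float(np.mean((np.hstack([x, np.ones((len(x), 1))]) @ w - y) ** 2))

print("in-distribution MSE:     ", mse(x_train, y_train))  # small
print("out-of-distribution MSE: ", mse(x_test, y_test))    # large: the flipped rule
```

A monolithic fit entangles the shared physics with the local rule; the modular story above amounts to factoring the model so only the one changed piece needs updating.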
Lex Fridman: So Yann — Yoshua laid some cool ideas out on the table: inductive biases, generalizing out of distribution. What are your thoughts? What's your vision for our journey towards human-level intelligence?

Yann LeCun: Okay, so Yoshua focused on the list of problems that we need to solve, and I agree with the list that he mentioned, but I'm more focused on solutions, actually. But first of all, we can clearly see that humans and animals can learn new skills, or acquire new knowledge, much, much faster than any of the artificial systems we have built so far. They can learn with fewer trials if it's a new kind of skill; they can learn with fewer examples if it consists in learning new concepts. So what kind of learning do humans and animals use that we are not currently able to reproduce in machines? That's the big question I'm asking myself. What is it that allows a teenager to learn to drive a car in about 15 or 20 hours of practice, whereas even with millions of hours of training in virtual environments we can't get cars to learn to drive themselves with the same degree of reliability? There is something we're missing in current approaches to AI, and I think what's missing is the ability of humans and animals to learn how the world works — to learn what I call world models; a lot of people call them that as well. The fact, as Yoshua mentioned, that a lot of the physics doesn't change if you move from North America to Britain — when you turn the wheel to the right, the car is still going to veer to the right, and the basic physics of momentum is still going to be the same — is what allows the teenager learning to drive to not have to drive off a cliff to see what happens, whereas a naive AI system would have to actually run off the cliff to figure out that it's a bad idea, and probably do it a few thousand times before figuring out how not to. So that's what we're missing: how do we get machines to learn world models — to learn how the world works, mostly by observation, and to accumulate the enormous amounts of background knowledge that baby humans accumulate in the first few weeks and months of life, where they figure out very basic things about the world? The world is three-dimensional; there are objects that are distinct from the background; there are objects that are static — inanimate — and objects that move on their own; objects whose trajectories are predictable: they fall when they're not supported, things like that. We learn all those things in the first few months of life, and that, in my opinion, is what constitutes the basis for what we call common sense. We don't know yet how to do this with machines, but we have a few ideas, like self-supervised learning and things of that type.

Yoshua Bengio: Actually, Yann, I'm very focused on solutions too.

Yann LeCun: I'm sure — I know you are.

Yoshua Bengio: And some of the things I talked about were sort of the early steps in thinking about the world model you were talking about. I agree it's very, very central, but one of the things I believe is that this world model needs to be structured. What does that mean, structured? It means, just like in the brain, that knowledge is somehow decomposed into pieces that are as independent from each other as possible. There are good reasons, from a theoretical perspective, why you want to do that — it would help out of distribution — and I'm currently thinking about new algorithms that are aimed precisely at allowing these kinds of things.

Lex Fridman: So for you, Yoshua, modularity is fundamental. Let me zoom out a little bit: for constructing these kinds of world models, for reasoning out of distribution, can you do that with one very big differentiable neural network? And if so, what properties does this neural network have? Maybe, Yann, take that one.

Yann LeCun: Well, I guess the two big questions are: what is the paradigm of learning that you have to use, and second, what is the architecture of the system that will learn this? There's no question in my mind that that system will have very much in common with what we currently call deep learning. It might be some giant neural net that we train with a gradient-based type of algorithm, because that's pretty much the only weapon we have at the moment for this kind of problem — at least the only one that is efficient enough. So deep learning is part of the solution, there is no question. Now, in terms of learning paradigms, I've been a big advocate of self-supervised learning, and self-supervised learning is nothing more than this idea that you have a piece of the input, some of which is currently observable, and another piece that is not currently observable — maybe because it's covered by an object or something, if it's vision, or perhaps because it's the future, and you have to wait a bit until you can see it. So training a world model consists in looking at the past and the present, maybe remembering what the state of the world is, then waiting for things to happen, and then training your world model to predict what just happened, given that you just took an action. Of course, now your world model knows how to predict the next state of the world from the previous state of the world and the action you took — so you have one of those causal models that Yoshua was referring to.
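As a rough sketch of the training signal LeCun is describing — observe the state, take an action, wait, and train on what actually happened — here is a minimal loop. PyTorch is assumed, and the toy dynamics, dimensions, and names are all invented for illustration:

```python
# Self-supervised world-model training: predict the next state of the world
# from the previous state and the action taken. No labels beyond observation.
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2
world_model = nn.Sequential(
    nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, state_dim)
)
opt = torch.optim.Adam(world_model.parameters(), lr=1e-3)

A = torch.randn(action_dim, state_dim)   # fixed stand-in "physics", unknown to the model

def env_step(s, a):
    # The dynamics the model must discover purely by watching outcomes.
    return 0.9 * s + 0.1 * (a @ A)

for step in range(1000):
    s = torch.randn(32, state_dim)               # current observation
    a = torch.randn(32, action_dim)              # action taken
    with torch.no_grad():
        s_next = env_step(s, a)                  # "wait and see what happens"
    pred = world_model(torch.cat([s, a], dim=-1))
    loss = nn.functional.mse_loss(pred, s_next)  # prediction error is the only supervision
    opt.zero_grad(); loss.backward(); opt.step()
```

This toy version sidesteps the two hard questions LeCun raises next — uncertainty in the prediction, and the level of abstraction at which prediction should happen.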
Yann LeCun: And the question is how you do that, and the technical question underneath is how you deal with the uncertainty in the prediction, and with the fact that the level of abstraction of the representation of the world you need to construct while doing so has to be high — it needs to be at a high level of abstraction. So if I want to predict what's going to happen next — what you are going to do next — I know you might say a word, or move your mouth in a particular way, or move your head in a particular way, but you're not going to suddenly disappear; you're not going to teleport from one place in the video to another, just like that; and your face is not going to morph into something else. There are constraints in the physical world that, I know, prevent those things from happening, and that's basically the basis of common sense — common sense is a collection of world models. So what kind of self-supervised learning might allow us to do this? There are a few ideas about this. It's probably not the kind of self-supervised learning that has been very popular until now, where you directly try to predict what's going to happen next in the space of observations. Say you want to do video prediction, for example: you see a piece of a video and you train the system to predict the next frames. It's a very, very hard problem, because you have to reconstruct all the details of the pixels and everything. It's very likely, in my opinion, that the type of architecture we need to build is one where the prediction doesn't necessarily take place at that level, but at a higher level of abstraction, where the useful information is present and the irrelevant stuff isn't — but where the abstraction is learned within the same system; it's not somehow separate.
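A minimal sketch of that idea — predicting in a learned representation space rather than in pixel space. PyTorch is assumed and all shapes are invented; note that a real system needs an extra mechanism (contrastive terms, variance regularization, or similar) to stop the encoder from collapsing everything to a constant, which this bare sketch does not include:

```python
# Prediction in latent space: encode two frames, predict the second embedding
# from the first. The error lives in representation space, not pixel space.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(),
                        nn.Linear(64 * 64, 128), nn.ReLU(), nn.Linear(128, 32))
predictor = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32))
opt = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-3)

frames_t  = torch.randn(16, 1, 64, 64)   # placeholder frames at time t
frames_t1 = torch.randn(16, 1, 64, 64)   # placeholder frames at time t+1

z_t = encoder(frames_t)
with torch.no_grad():
    z_t1 = encoder(frames_t1)            # target embedding (stop-gradient)
loss = nn.functional.mse_loss(predictor(z_t), z_t1)  # no pixel reconstruction anywhere
opt.zero_grad(); loss.backward(); opt.step()
```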
Lex Fridman: This idea of modularity that Yoshua is talking about is really interesting. So maybe, Yoshua, you could talk about what this big giant thing that achieves human-level intelligence looks like. Is it differentiable? Are there some discrete components? Is there a fundamental modularity or hierarchy that forms these high-level abstractions? What do you think?

Yoshua Bengio: If we again look at human cognition, the thoughts that we have involve a discrete choice among different alternatives. If you see the Necker cube one way, you don't see it the other way; it's not a mush of different options. So there is probably something that corresponds to a discrete kind of attention. Now, that makes it a little bit more difficult to do end-to-end learning — I think we can get around that, and I have many solutions in my pocket for this — but then, locally in each module: the communication between modules involves discrete decisions, but what is internal to each module could be fully differentiable. So it could be a mix of both.

Yann LeCun: The modularity question is an interesting one. Certainly the world model may have multiple modules, but you don't need multiple modules for the hierarchy — not any more than in a multi-layer neural net, where there is already a hierarchy of abstract concepts. But you need other modules than just a world model. First, you need a module that configures the world model for the situation at hand: if you have only one world-model engine, it needs to be configured for whatever task you are accomplishing. You need some sort of cost module — the behavior of the system is basically all directed towards optimizing that cost, and part of that cost can be hardwired, things that compute pain and pleasure and the like — and then cost modules that are learned, that define subgoals, perhaps. Then you need short-term memory, to maintain an estimate of the state of the world — in the brain of mammals this is the hippocampus, for example. And then you need some other module that figures out what sequence of actions to take to optimize the cost, given the world model. You also need perception. So all of those are modules. I don't think their structures are necessarily very different from each other, but there is some sort of macro-architecture of an autonomous intelligent system that you need to devise.

Yoshua Bengio: So I disagree on one point here. I think there may be a good reason why this modularity — which is really the story behind the global workspace theory from Baars — can help make the right abstractions emerge. One way to think about it is the way we program, where we use encapsulation: we try to divide code into little independent pieces, as independent as possible — of course not completely independent, because these pieces need to communicate through their arguments and return values. But what information do you want to put there? If you constrain the bottleneck of the communication, what goes through is going to be the most abstract aspects — the minimal information needed for these experts to collaborate. So I think the details of, for example, how a particular function is computed can be hidden inside each module, but the communication bottleneck helps force the emergence of these abstract concepts, which are what the modules exchange with each other.

Lex Fridman: So the constraints are a feature, not a bug — they force the abstraction.

Yoshua Bengio: Yes, yes — the constraints on communication. And it's interesting to consider why evolution would put such a bottleneck in our brains, and probably in many other animals', because a working memory of five or seven items seems so small — in fact, other animals have a larger one — and the brain is huge. So there must be evolutionary pressure that converged to this constraint.
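Bengio's encapsulation analogy can be sketched directly: modules with wide private state that may only exchange a very small message vector. PyTorch is assumed and the sizes below are invented — the seven-slot message is just a nod to the working-memory figure mentioned above, not a principled choice:

```python
# Encapsulated modules communicating through a narrow bottleneck: each keeps
# a 256-dimensional private state but may only emit a 7-dimensional message.
import torch
import torch.nn as nn

PRIVATE, MESSAGE = 256, 7

class Expert(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.GRUCell(MESSAGE, PRIVATE)   # private computation, hidden from others
        self.write = nn.Linear(PRIVATE, MESSAGE)   # only this narrow summary leaves the module
        self.state = torch.zeros(1, PRIVATE)

    def forward(self, message):
        self.state = self.body(message, self.state)
        return self.write(self.state)              # abstract summary, not internals

experts = [Expert() for _ in range(3)]
msg = torch.zeros(1, MESSAGE)                      # shared "workspace" content
for expert in experts:                             # round-robin for simplicity;
    msg = expert(msg)                              # a real system would attend/compete
```

The design point is the interface width: whatever the modules learn to pack into those seven numbers is, by construction, the most compressed, reusable description they can share — which is Bengio's argument for why the bottleneck forces abstraction.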
Lex Fridman: Yoshua and Yann, you've also talked about the source of consciousness in the human mind and how it might be useful for human-level intelligence in the machine, and you've talked about the constraints there — that it somehow might emerge from the constraints of the architecture. Can you maybe talk about why this is something you think about, and what is the source of consciousness?

Yoshua Bengio: Easy questions today — what is the source of consciousness in the human mind, and how might that be useful for our AI systems? Yeah, sure. Well, it's hard. Okay, so first of all, it's very much an open question; lots of people would like to understand consciousness, and there are a number of competing theories about it. Now, if I have to make a bet about something that fits my understanding and the data I know — from the brain and from machine learning — I would say that the global workspace theory, which is the one that proposed this bottleneck idea, is one important element, but there's probably something missing from it. And again looking at the neuroscience theories, there's another theory, a bit more recent, that I think could help with the illusion of consciousness — the impression that there is something different about the fact that we're experiencing something consciously, that it's not just computation. It is all computation, so it clearly is an illusion. Graziano's attention schema theory — this is what it's called — says there's another module, besides the ones we've been talking about, that is a little model of attention: of where we are going to put our attention next. It's able to plan that, and it's like a mini-model of the rest of the cortex, right? And because it has this — so it's a little bit like a homunculus, right? — it's not a very good model of what we actually do, but it's good enough to help plan the proper sequence of attention choices. And that might give rise to the Cartesian duality we seem to feel, which is probably just a side effect of this architecture.

Lex Fridman: Yann, do you have thoughts on consciousness — or maybe on attention and its role in this kind of system?

Yann LeCun: Yeah, I think it goes way beyond attention, but I have a bit of a strange opinion about consciousness, which is actually not disconnected from what Yoshua just said. I mentioned earlier that most of our intelligence comes from our ability to predict, and that comes from a world model. But of course, when we attend to a task, the world model we're using is the one specific to the task at hand, and that has to be, because the world model is a very complex thing — we only have one in our head. We have one engine that allows us to predict what's going to happen in a given situation, which is why we can only attend to one task at any one time, at least consciously. And so that suggests we have only one world model, and it is being configured by something that makes executive decisions — a sort of homunculus-like system, as Yoshua was mentioning, that essentially sits above it and configures all the other modules for the task at hand. And that gives us the illusion of consciousness, right? Because there is this sort of meta-observer that configures the rest of the brain to attend to a particular task. And it doesn't just configure our world model; it configures our perception system too. You ask people to attend to particular things going on in a scene, and they become blind to everything else that happens. So I think that configurator module — the thing that configures the other modules to do a particular thing — maybe that is what gives us the illusion of consciousness. And the interesting aspect of this is that consciousness would then not be a consequence of the fact that we are smart, but a consequence of the fact that our brain size is limited. If we had an infinite-sized brain, we could have a dedicated model for every situation we might encounter, and we wouldn't need a configurator to adapt one model to the task at hand. So that's consciousness.

Yoshua Bengio: Just to put a comment on the attentional blindness experiments that Yann talked about: there's also a related kind of experiment that helps in understanding this — blindsight. In the kind of experiment Yann described, where you focus on something and don't see things that are actually there, part of your brain actually does see them: they are processed at an unconscious level, and you can actually act accordingly. That's something weird — some people have this because of neurological problems: they say they don't see anything, but they will actually do the right thing to pick up a glass, and so on.
Lex Fridman: Okay, let me ask a big question. Looking back — you're among the most important, seminal researchers in the field of artificial intelligence, so you can look at the big scope — think back to the 1990s: how have you changed in terms of your view of what's required to achieve intelligence? Maybe your fashion choices too, your music choices — but just as an AI visionary, how has your view of what's required to achieve human-level intelligence changed, and how does that inform you about the coming decades of the evolution of the field? Maybe, Yann, can you go first?

Yann LeCun: The way I tend to operate is that I may have, in the back of my mind, a very long-term goal — making machines more intelligent, understanding human and animal intelligence — and then I try to work my way back. We don't know any intelligent entity that doesn't learn, so learning must be a very important part of intelligence, right? So: how do we do learning? Now, learning involves learning to represent things, so it has to be hierarchical, abstract. How can we do this? Learning is always formulated in terms of optimization, so we need some way of optimizing — through gradient descent, because that's the most efficient thing we know how to do — hierarchical architectures. That's what led to multi-layer neural nets, deep learning, and backpropagation. Now, working our way back: we need to be able to do perception, because an intelligent system needs to be able to estimate the state of the world, and that's what perception is about. So let's build a perception system to see if that idea of hierarchy works — and that led to convolutional nets. Okay, so now we are in the '80s and '90s, still working our way back. Now we know how to do perception — admittedly, in a supervised manner — and we can see that humans and animals do not learn in a supervised manner; they learn to represent the world by observation, as I was mentioning earlier. So a bunch of us, after having — not solved, but sort of passed — the stage of supervised learning, got interested in unsupervised learning, or what I now call self-supervised learning. Yoshua and I and Geoff Hinton basically got together and discussed this in the early 2000s — that's 20 years ago — and formed a little group of people interested in similar problems. On the way to figuring out how to train large neural nets in an unsupervised manner, we figured out how to train large neural nets in a supervised manner using GPUs, and that's what took off. But really, that's not what we were after; we were after unsupervised learning. And so, after the first few years of exploiting the fact that we can use supervised learning with deep neural nets on GPUs, we are now back to the original problem: how do humans and animals learn in a self-supervised, unsupervised manner? And the reason we need this is not just to learn representations of the world for perception, but also to learn predictive world models for planning, reasoning, et cetera. So that's the progression: you have a long-term goal, and then you figure out what the first problem is that you need to solve to move towards that goal. At least my last 35 years have basically been along those lines — if not more than 35, actually. Time flies when you're having fun.
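The "learning as optimization" formulation LeCun refers to, in its smallest possible form — a hand-written toy loss and a plain gradient-descent loop; nothing here is specific to neural nets, and the quadratic objective is just a stand-in:

```python
# Learning as optimization: pick a loss, follow its gradient downhill.
def loss(w):
    return (w - 3.0) ** 2

def grad(w):                 # dL/dw, written out by hand for this toy loss
    return 2.0 * (w - 3.0)

w, lr = 0.0, 0.1
for _ in range(50):
    w -= lr * grad(w)        # one gradient-descent step
print(w)                     # approaches 3.0, the minimizer
```

Backpropagation is the same idea applied to a deep, hierarchical function, with the gradient computed automatically through every layer.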
Lex Fridman: Yoshua, how has your view of AI changed — of what's required to achieve human-level intelligence — and maybe also, what do you think the next few years and decades look like in terms of the growth and development of the field?

Yoshua Bengio: I mean, what has changed mostly is that I have matured and I know a lot more stuff, which helps me supervise students better. If I remember the way I was thinking in the '90s — which is actually when Yann and I started to collaborate — I was very focused on a very small part of the field, and now lots of different aspects of research in AI and machine learning — and not just machine learning, but even classical AI — suddenly seem to fit better into a bigger picture, including knowledge about neuroscience, cognitive science, and other things. So I feel much better equipped to do what Yann was also talking about: to reason my way towards where we need to go next, in order to capture all of these constraints coming from what we know about brains and from our experience in machine learning and AI broadly.

Lex Fridman: Just to interrupt — since you mentioned grad students, is there advice you can give to those grad students, if they dream about pushing the field forward, about what they should work on? What are the exciting, difficult problems that might crack open this problem of human-level intelligence?

Yoshua Bengio: Well, that's what I'm working on.

Lex Fridman: What would you tell them?

Yoshua Bengio: Yes — I mean, advice number one would be to read the papers. But in general, I would say, more on the methodological side: try to understand what you're doing. Unfortunately, a lot of students — because it's easier — will just use the concepts that are around in our community, do plug-and-play, and do engineering with that, which can be very useful. But if you want to push the frontier here, you really need to ask the "why" questions, all the time, and keep asking — rather than trying to beat the benchmarks. I mean, of course, that will come as a side effect, right? So focus on understanding, on the science, on the "why" questions. That's the key to science in general.

Lex Fridman: Yann, do you have advice for grad students on how to take on this problem?

Yann LeCun: There are two things that have changed in the last 30 years or so — or more. It used to be, in the early '90s and late '80s, when Yoshua and I were already working on neural nets, that there was a community interested in similar problems that we could talk to. Then that community disappeared in the mid-'90s, and we were in a combination of fortunate and unfortunate situations: unfortunate because few people were interested in what we were interested in, but fortunate because we could do things nobody else could, and we were the only people in the world doing them. Whereas now — and this is a very good thing — there is an enormous number of people interested in the same stuff we are interested in, and they know more about our own stuff than we do, collectively at least. I mean, there are thousands of students and researchers who know more about the latest, greatest details of convolutional nets than I do, and it's fantastic — I really love that. So that means that if I want to contribute, I need to move on, right? To the next step. And so, for students: there are many different ways a researcher — a young researcher, particularly in AI — can contribute.
Some of these are applications of existing methods to new problems — and today's world doesn't lack a supply of problems, so that's an interesting avenue. The second is devising new methods that improve benchmarks or known applications: computer vision, translation, natural language understanding, this kind of stuff. Then there is devising new principles or algorithms, and then there is framing a problem in a new way. The problem, if you are a young researcher doing your PhD, is that if you want a job at the end of your PhD, you have to do things that may have an impact in a relatively short term — an intellectual impact, not necessarily a practical one, but at least an intellectual impact. The stuff that is very ambitious, you cannot afford to do, because you have to finish your PhD in five years if you're in North America, three years if you're in Europe. So it's a trade-off, right? You go for things that might be a little easier and more short-term, because you need a job at the end, you need publications on your resume — instead of the super-ambitious stuff, which only people like us, who have tenure and have won our prizes, can spend a lot of time doing, while trying to convince others to work on it as well.

Lex Fridman: So find a balance — because a little bit of ambition makes life exciting. Let me — it's very possible, since we're talking remotely, that I'm just a human-like avatar and there's an AI chatbot behind it generating the words I'm saying. That's an interesting Turing-test question we could talk about later. But if you look at the future — a world where there are AI systems that achieve human-level intelligence — what excites you about that world? Is it the human connection with chatbots and things like that, or is it very specific applications? What is cool to you about this world? Maybe, Yann, you go first.

Yann LeCun: I think it's the amplification of human intelligence — the fact that every human could do more stuff, essentially be more productive, more creative, perhaps spend their time on more fulfilling activities, things like that — which is really the history of technological evolution, right? So I think that's the exciting part, in my opinion. And then, of course, there are going to be specific things people will get interested in, like virtual assistants they can talk to, that can answer any question, and things like that. And when you have a difficult intellectual challenge or problem, you won't be alone in solving it; you'll have an AI assistant to help you with it. There's always the fear that technology is going to make us stupid or weak, but I don't believe in this at all. It's not as if calculators have made us bad at mathematics — on the contrary.

Lex Fridman: Yeah, they made us better. Yoshua — aside from making video games better, what do you think will be exciting about a world where we solve, or begin to solve, human-level intelligence?

Yoshua Bengio: Well, I agree very much with the picture Yann has drawn, but I will add a few things. That augmentation is going to help us become better at scientific discovery, and that's where there's a positive feedback loop, right?
We used to rely completely on human minds to understand, say, the outcomes of experiments and then propose new experiments — to make sense of the world — and now we're building machine learning tools that are essentially growing towards that capability, which is going to accelerate the progress of science. And that could have profound positive impacts on everyone. For example, in healthcare: understanding better how cells work, how cancer works, how viruses are able to get around our defenses. This is very complex, and up to now it has been very slow, in some sense, to make sense of these hard questions — whether in biology or astrophysics or whatever — but we are building up tools that I think will radically change how we do science, how we discover treatments, and everything like this.

Yann LeCun: Yeah, and of course, modeling the complex behavior of complex systems — things like materials, climate, energy storage and batteries, the production of hydrogen to store energy (which would be a big step towards solving climate-change issues), controlling plasma in fusion reactors, right? There are all kinds of things like this that have the potential to solve big problems in the world, and they're all collective phenomena, so the classical reductionist way of analyzing or modeling things doesn't quite work, because they are complex collective phenomena. What we need are phenomenological models that cannot be contained in our heads, cannot be contained in a few formulas on paper, and have to be basically implemented by machines — which will allow us to make predictions, perhaps without understanding to the same extent that we can understand simple physical phenomena, but which will help us a lot, indeed, with progress in science. I mean, this idea — that analyzing data using machine learning, and what we now call AI, will help with science — started emerging something like 15 years ago, certainly in genomics and things like this, and I created a Center for Data Science at NYU about ten years ago for that reason, for that purpose.

Yoshua Bengio: Of course. Let me just give an example, to make a bit more concrete what Yann is talking about, and also to connect with the earlier discussion about representation learning and self-supervised learning. Think about what physicists and chemists did when they invented abstractions like pressure and temperature. These are not things that exist at the low level of the physics; they are complete inventions of our minds that happen to have really nice abstraction properties: you can describe phenomena at an abstract level using very few variables, and predict things very reliably, at least at the aggregate level. This is the sort of thing that, up to now, only human minds could do, but I see it as learning the right abstractions, such that at that level of representation things become easier to model. So the kinds of world models that Yann was talking about need that kind of abstraction, that kind of structure, discovered by the learner — so that suddenly things make sense, and it becomes much easier to explain lots of things once you introduce these high-level abstractions.
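A toy version of that pressure-and-temperature point, in plain NumPy with invented numbers: a microstate with a million degrees of freedom gets summarized by one abstract variable that is stable and predictive at the aggregate level, even though every individual particle stays unpredictable.

```python
# One invented summary statistic ("temperature") describes a million-number
# microstate reliably, run after run — the payoff of the right abstraction.
import numpy as np

rng = np.random.default_rng(1)
velocities = rng.normal(0.0, 2.0, size=1_000_000)   # microstate: 10^6 numbers

temperature = np.mean(velocities ** 2)              # one abstract variable
print(temperature)                                  # ~4.0, stable across runs

# Predicting one particle is hopeless; predicting the aggregate is easy:
print(np.mean(rng.normal(0.0, 2.0, size=1_000_000) ** 2))   # ~4.0 again
```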
Yann LeCun: Absolutely — that's the underlying engine of intelligence: the construction of abstractions that allow you to make predictions, essentially.

Lex Fridman: And perhaps engineering such artificial intelligence systems will help us understand our own human minds further, which is something humans have dreamed of and striven towards for millennia. So it is truly an honor, Yann and Yoshua, to talk with you today. This is something that AI systems of the future will look back at with admiration. Thank you for your time today — this was awesome.

Yoshua Bengio: Thanks to you.

Yann LeCun: Thanks, Lex. Always a pleasure.

[Music]
Info
Channel: M.G. Mashiku
Views: 5,118
Id: BaM5WlIGWOY
Length: 40min 2sec (2402 seconds)
Published: Fri Dec 16 2022