Yoshua Bengio: Deep Learning | Lex Fridman Podcast #4

Video Statistics and Information

Captions
What difference between biological neural networks and artificial neural networks is most mysterious, captivating, and profound for you?

First of all, there's so much we don't know about biological neural networks, and that's very mysterious and captivating, because maybe it holds the key to improving our artificial neural networks. One of the things I studied recently, something that we don't know how biological neural networks do but that would be really useful for artificial ones, is the ability to do credit assignment through very long time spans. There are things we can in principle do with artificial neural nets, but it's not very convenient and it's not biologically plausible, and this kind of mismatch may be an interesting thing to study, to (a) understand better how brains might do these things, because we don't have good corresponding theories with artificial neural nets, and (b) maybe provide new ideas that we could explore about things the brain does differently and that we could incorporate into artificial neural nets.

So let's break credit assignment up a little bit. It's a beautifully technical term, but it could incorporate so many things. Is it more on the RNN memory side of thinking, or is it something about knowledge, building up common-sense knowledge over time? Or is it more in the reinforcement learning sense, that you're picking up rewards over time to achieve certain kinds of goals?

I was thinking more about the first two meanings, whereby we store all kinds of memories, episodic memories, in our brain, which we can access later in order to help us both infer causes of things that we are observing now and assign credit to decisions or interpretations we came up with a while ago, when those memories were stored. Then we can change the way we would have reacted or interpreted things in the past, and that's credit assignment used for learning.

So in which way are artificial neural networks, the current LSTMs, the current architectures, not able to capture that? Presumably you're thinking of the very long term.

Yes. Current recurrent nets do a fairly good job for sequences with dozens or say hundreds of time steps, and then it gets harder and harder, depending on what you have to remember and so on, as you consider longer durations. Whereas humans seem to be able to do credit assignment through essentially arbitrary spans of time: I could remember something I did last year, and now, because I see some new evidence, I'm going to change my mind about the way I was thinking last year, and hopefully not make the same mistake again.

I think a big part of that is probably forgetting: you're only remembering the really important things. It's very efficient forgetting.

Yes, so there's a selection of what we remember, and I think there are really cool connections to higher-level cognition here, regarding consciousness and emotions, deciding what comes to consciousness and what gets stored in memory, which are not trivial either.
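To make the long-horizon credit assignment problem from this exchange concrete, here is a minimal numeric sketch (an added illustration, not something from the conversation): in a plain recurrent update, the gradient that carries credit back across T steps picks up a factor of roughly w**T, which vanishes or explodes as T grows, which is one reason recurrent nets cope with dozens or hundreds of steps but not arbitrary spans.

```python
# Scalar linear recurrence: h_t = w * h_{t-1} + x_t.  Unrolled over T steps,
# the credit flowing from the loss at time T back to the input at time 0 is
# d h_T / d x_0 = w ** T, which vanishes for |w| < 1 and explodes for |w| > 1.
def credit_to_first_input(w, T):
    return w ** T

for T in (10, 100, 1000):
    print(T, credit_to_first_input(0.9, T), credit_to_first_input(1.1, T))
# With w = 0.9 the credit signal is ~2.7e-5 after 100 steps and ~1.7e-46 after
# 1000; with w = 1.1 it blows up instead.  Gating (as in LSTMs) and attention
# mitigate this, but arbitrary-horizon credit assignment remains hard.
```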
So you've been at the forefront there all along, showing some of the amazing things that neural networks, deep neural networks, can do in the field of artificial intelligence, broadly, in all kinds of applications, and we could talk about that forever. But since we're thinking towards the future, what, in your view, is the weakest aspect of the way deep neural networks represent the world? What is missing?

Current state-of-the-art neural nets trained on large quantities of images or text have some level of understanding of what explains those datasets, but it's very basic, it's very low-level, and it's not nearly as robust, abstract, and general as our understanding. That doesn't tell us how to fix things, but I think it encourages us to think about how we can maybe train our neural nets differently, so that they would focus, for example, on causal explanations, something that we don't do currently with neural net training. Also, one thing I'll talk about in my talk this afternoon is that instead of learning separately from images and videos on one hand and from text on the other hand, we need to do a better job of jointly learning about language and about the world to which it refers, so that both sides can help each other. We need to have good world models in our neural nets for them to really understand sentences which talk about what's going on in the world, and I think we need language input to help provide clues about what high-level concepts, semantic concepts, should be represented at the top levels of these neural nets. In fact, there is evidence that purely unsupervised learning of representations doesn't give rise to high-level representations that are as powerful as the ones we are getting from supervised learning, and so the clues we're getting just from labels, not even sentences, are already very powerful.

Do you think that's an architecture challenge or a dataset challenge?

Neither. I'm tempted to just end it there. Of course datasets and architectures are things you always want to play with, but I think the crucial thing is more the training objectives, the training frameworks: for example, going from passive observation of data to more active agents which learn by intervening in the world, the relationships between causes and effects; the sort of objective functions which could be important to allow the highest-level explanations to rise from the learning, which I don't think we have now; the kinds of objective functions which could be used to reward exploration, the right kind of exploration. So these questions are neither in the dataset nor in the architecture, but more in how we learn, under what objectives, and so on.

You've mentioned in several contexts the way children learn: they interact with objects in the world. It seems fascinating because, except for some cases in reinforcement learning, that idea is not part of the learning process in artificial neural networks. Do you envision something like an objective function saying, you know, poking this object in this kind of way would be really helpful for me to further learn? Almost guiding some aspect of learning.

Right. I was talking to Rebecca Saxe just an hour ago, and she was talking about lots and lots of evidence that infants seem to clearly take what interests them in a directed way. They're not passive learners; they focus their attention on aspects of the world which are most interesting, surprising in a non-trivial way, in a way that makes them change their theories of the world.
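One common way to formalize "rewarding the right kind of exploration" is to give an agent an intrinsic reward for transitions its world model cannot yet predict, loosely mirroring the infant behavior just described. The toy below is a generic curiosity-style sketch under that assumption, not a description of any specific objective Bengio has in mind; all names and the tiny environment are made up.

```python
# A toy novelty-seeking agent in a tiny deterministic world: the intrinsic
# reward is 1 whenever the agent's world model could not predict the
# transition it just experienced, so the agent is drawn to the parts of the
# environment it does not yet understand.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
# True (hidden) dynamics: next_state = dynamics[state, action]
dynamics = rng.integers(0, n_states, size=(n_states, n_actions))
# The agent's learned model; -1 marks transitions it has never observed
model = np.full((n_states, n_actions), -1)

state, total_intrinsic_reward = 0, 0.0
for _ in range(50):
    # prefer the action whose outcome the model cannot yet predict
    novelty = np.array([1.0 if model[state, a] == -1 else 0.0
                        for a in range(n_actions)])
    action = int(np.argmax(novelty + 1e-3 * rng.random(n_actions)))
    next_state = int(dynamics[state, action])
    total_intrinsic_reward += float(model[state, action] != next_state)  # surprise
    model[state, action] = next_state   # update the world model from experience
    state = next_state

print("transitions still unknown:", int((model == -1).sum()),
      "| total intrinsic reward:", total_intrinsic_reward)
```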
That's a fascinating view of future progress. But a maybe more boring question: do you think going deeper and larger, just increasing the size of the things that have been increasing a lot in the past few years, will also make significant progress on some of the representational issues that you mentioned, that they're kind of shallow in some sense?

Shallow in the sense of abstraction?

In the sense of abstraction, they're not getting it.

I don't think that having more depth in the network, in the sense of having ten thousand layers instead of a hundred, is going to solve our problem.

You don't think so? Is that obvious to you?

Yes. What is clear to me is that engineers and companies and labs and grad students will continue to tune architectures and explore all kinds of tweaks to make the current state of the art ever slightly better, but I don't think that's going to be nearly enough. I think we need some fairly drastic changes in the way we're considering learning, to achieve the goal that these learners actually understand, in a deep way, the environment in which they are observing and acting.

I guess the question I was trying to ask, which is more interesting than just more layers, is basically: once you figure out a way to learn through interacting, how many parameters does it take to store that information? Our brain is quite a bit bigger than most neural networks.

Oh, I see what you mean. I'm with you there. I agree that in order to build neural nets with the kind of broad knowledge of the world that typical adult humans have, the kind of computing power we have now is probably going to be insufficient. The good news is there are hardware companies building neural net chips, so it's going to get better. However, the good news, in a way, which is also bad news, is that even our state-of-the-art deep learning methods fail to learn models that understand even very simple environments, like some grid worlds that we have built. Of course, if you train them with enough examples, eventually they get it, but instead of the dozens of examples humans might need, these things will need millions, for very, very simple tasks. So I think there's an opportunity for academics who don't have the kind of computing power that, say, Google has, to do really important and exciting research to advance the state of the art in training frameworks, learning models, agent learning, in even simple environments that are synthetic, that seem trivial, but that current machine learning still fails on.

We've talked about priors and common-sense knowledge. It seems like we humans take a lot of knowledge for granted. So what's your view of these priors, of forming this broad view of the world, this accumulation of information, and how we can teach neural networks or learning systems to pick that knowledge up? Knowledge was a focus of artificial intelligence for a while, maybe in the 80s: there was a time of knowledge representation, knowledge acquisition, expert systems. Symbolic AI was an interesting problem set to solve, and it was kind of put on hold a little bit, it seems, because it doesn't work.

It doesn't work, that's right.

But the goals of that remain important.

Yes, they remain important.

How do you think those goals can be addressed?

First of all, I believe that one reason why the classical expert systems approach failed is that a lot of the knowledge we have, so you talked about common sense, intuition, a lot of knowledge like this is not consciously accessible. There are lots of decisions we're taking that we can't really explain, even if sometimes we make up a story.
That knowledge is also necessary for machines to take good decisions, and it is hard to codify in expert systems, rule-based systems, and classical AI formalisms. There are other issues, of course, with the old AI, like not having really good ways of handling uncertainty. And I would say something more subtle, which we understand better now but I think still isn't appreciated enough: there is something really powerful that comes from distributed representations, the thing that really makes neural nets work so well, and it's hard to replicate that kind of power in a symbolic world. The knowledge in expert systems and so on is nicely decomposed into a bunch of rules, whereas if you think about a neural net, it's the opposite: you have this big blob of parameters which work intensely together to represent everything the network knows, and it's not sufficiently factorized. So I think this is one of the weaknesses of current neural nets, and we have to take lessons from classical AI in order to bring in another kind of compositionality, which is common in language, for example, and in these rules, but which isn't so native to neural nets.

And on that line of thinking, disentangled representations.

Yes, so let me connect this with disentangled representations, if you don't mind. For many years I've thought, and I still believe, that it's really important that we come up with learning algorithms, whether unsupervised, supervised, or reinforcement, whatever, that build representations in which the important factors, hopefully causal factors, are nicely separated and easy to pick up from the representation. That's the idea of disentangled representations: transform the data into a space where everything becomes easy, where we can maybe just learn with linear models about the things we care about. I still think this is important, but I think it's missing out on a very important ingredient which classical AI systems can remind us of. Let's say we have these disentangled representations: you still need to learn about the relationships between the variables, those high-level semantic variables. They're not going to be independent, that would be too strong an assumption; they're going to have some interesting relationships that allow us to predict things in the future and to explain what happened in the past. The kind of knowledge about those relationships in a classical AI system is encoded in the rules. A rule is just a little piece of knowledge that says, oh, I have these two, three, four variables that are linked in this interesting way, and then I can say something about one or two of them given a couple of the others. In addition to disentangling the elements of the representation, which are like the variables in a rule-based system, you also need to disentangle the mechanisms that relate those variables to each other, like the rules. The rules are neatly separated: each rule is living on its own, and when I change a rule because I'm learning, it doesn't need to break other rules. Whereas current neural nets, for example, are very sensitive to what's called catastrophic forgetting, where after I've learned some things and then I learn new things, they can destroy the old things that I had learned. If the knowledge were better factorized and separated, disentangled, you would avoid a lot of that. Now, you can't do this in the sensory domain, in pixel space.
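Catastrophic forgetting is easy to reproduce in miniature. The toy below (an added illustration with made-up data, not from the conversation) fits one shared parameter to task A, then continues training on task B; because all the knowledge lives in the same undifferentiated parameter, the second phase destroys what was learned for the first.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
task_a_y = 2.0 * x    # task A: slope +2
task_b_y = -2.0 * x   # task B: slope -2

def sgd_fit(w, xs, ys, lr=0.1, epochs=50):
    """Fit y ~ w * x by gradient descent on mean squared error."""
    for _ in range(epochs):
        grad = np.mean(2.0 * (w * xs - ys) * xs)
        w -= lr * grad
    return w

def mse(w, xs, ys):
    return float(np.mean((w * xs - ys) ** 2))

w = 0.0
w = sgd_fit(w, x, task_a_y)
print("after task A: loss on A =", round(mse(w, x, task_a_y), 4))
w = sgd_fit(w, x, task_b_y)   # keep training, but only on task B
print("after task B: loss on A =", round(mse(w, x, task_a_y), 4),
      "(task A has been overwritten)")
```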
But my idea is that when you project the data into the right semantic space, it becomes possible to represent this extra knowledge, beyond the transformation from inputs to representations: how the representations act on each other and predict the future, and so on, in a way that can be neatly disentangled. So now it's the rules that are disentangled from each other, and not just the variables.

And you draw a distinction between semantic space and pixel space. Does there need to be an architectural difference?

Well, yes. There's the sensory space, like pixels, where everything is entangled: the information, the variables, are completely interdependent in very complicated ways, and also the computation, it's not just the variables, it's also how they are related to each other, is all intertwined. But I'm hypothesizing that in the right high-level representation space, both the variables and how they relate to each other can be disentangled, and that will provide a lot of generalization power.

Generalization power. But the distribution of the test set is assumed to be the same as the distribution of the training set.

Right. This is where current machine learning is too weak: it doesn't tell us anything, it is not able to tell us anything, about how we are going to generalize to a new distribution. And people may think, well, there's nothing we can say if we don't know what the new distribution will be. The truth is, humans are able to generalize to new distributions. How are we able to do that? Because these new distributions, even though they could look very different from the training distribution, have things in common. Let me give you a concrete example. You read a science fiction novel; the novel maybe brings you to some other planet where things look very different on the surface, but it's still the same laws of physics, and so you can read the book and understand what's going on. The distribution is very different, but because you can transport a lot of the knowledge you had from Earth about the underlying cause-and-effect relationships and physical mechanisms and all that, and maybe even social interactions, you can now make sense of what is going on on this planet where, visually, for example, things are totally different.

Taking that analogy further, and distorting it, let's enter a science fiction world, say 2001: A Space Odyssey with HAL, which is probably one of my favorite AI movies. And then there's another one that a lot of people love, maybe a little bit outside of the AI community: Ex Machina. I don't know if you've seen it.

Yes, yes.

What are your views on that movie?

There are things I like and things I hate.

Maybe you could talk about that in the context of a question I want to ask. There's quite a large community of people from different backgrounds, often outside of AI, who are concerned about the existential threat of artificial intelligence. You've seen this community develop over time; you have a perspective. So what do you think is the best way to talk about AI safety, to think about it, to have discourse about it, within the AI community and outside, grounded in the fact that Ex Machina is one of the main sources of information for the general public about AI?

I think you're putting it right.
There's a big difference between the sort of discussion we ought to have within the AI community and the sort of discussion that really matters in the general public. The picture of Terminator, of AI getting loose and killing people, of superintelligence that's going to destroy us whatever we try, isn't really so useful for the public discussion, because for the public discussion the things I believe really matter are the short-term and medium-term, very likely, negative impacts of AI on society, whether it's security, like Big Brother scenarios with face recognition, or killer robots, or the impact on the job market, or concentration of power and discrimination, all kinds of social issues, some of which could really threaten democracy, for example.

Just to clarify, when you say killer robots, you mean autonomous weapons?

Yes, autonomous weapon systems, not Terminator. So I think these short- and medium-term concerns should be important parts of the public debate. Now, existential risk, for me, is a very unlikely consideration, but still worth academic investigation, in the same way that you could say, should we study what could happen if a meteorite came to Earth and destroyed it. I think it's very unlikely that this is going to happen, or at least happen in a reasonable future. The sort of scenario of an AI getting loose goes against my understanding of at least current machine learning and current neural nets and so on. It's not plausible to me. But of course I don't have a crystal ball, and who knows what AI will be fifty years from now. So I think it is worthwhile that scientists study those problems; it's just not a pressing question as far as I'm concerned.

Before continuing down that line, a few questions there: what do you like and not like about Ex Machina as a movie? I actually watched it for the second time and enjoyed it. I hated it the first time, and I enjoyed it quite a bit more the second time, when I sort of learned to accept certain pieces of it and see it as a concept movie. What was your experience, what are your thoughts?

The negative is that the picture it paints of science is totally wrong, science in general and AI in particular. Science is not happening in some hidden place by some really smart guy, one person. This is totally unrealistic. This is not how it happens; even a team of people in some isolated place will not make it. Science moves by small steps thanks to the collaboration of a community, a large number of people interacting, and all the scientists who are experts in their field can know what is going on, even in the industrial labs. Information flows and leaks and so on, and the spirit of it is very different from the way science is painted in this movie.

Let me ask, on that point: it's been the case, to this point, that even if the research happens inside Google or Facebook, inside companies, it still kind of comes out.

Yes, absolutely.

Do you think that will always be the case? Is it possible to bottle up ideas to the point where there's a set of breakthroughs that go completely undiscovered by the general research community? Do you think that's even possible?

It's possible, but it's unlikely. It's not how it is done now, and it's not how I can foresee it in the foreseeable future. But of course I don't have a crystal ball, and so who knows; this is science fiction after all.

Ominously, the lights just went off during that discussion.

So the problem, again: one thing is the movie, and you could imagine all kinds of science fiction.
The problem for me, maybe similar to the question about existential risk, is that this kind of movie paints such a wrong picture of actual science and how it's going on that it can have unfortunate effects on people's understanding of current science, and that's kind of sad.

There is an important principle in research, which is diversity. In other words, research is exploration in the space of ideas, and different people will focus on different directions, and this is not just good, it's essential. So I'm totally fine with people exploring directions that are contrary to mine or orthogonal to mine. I am more than fine; I think it's important. I and my friends don't claim we have universal truth about what, well, especially about what will happen in the future. Now, that being said, we have our intuitions and we act accordingly, according to where we think we can be most useful and where society has the most to gain or to lose. We should have those debates and not end up in a society where there's only one voice and one way of thinking, and research money should be spread out.

So disagreement is a sign of good research, good science.

Yes.

The idea of bias, in the human sense of bias: how do you think about instilling in machine learning something that's aligned with human values? Intuitively, human beings have a concept of what bias means, of what fundamental respect for other human beings means, but how do we instill that into machine learning systems, do you think?

I think there are short-term things that are already happening, and then there are long-term things that we need to do. In the short term, there are techniques that have been proposed, and I think they will continue to be improved, and maybe alternatives will come up, to take datasets in which we know there is bias, and we can measure it; pretty much any dataset where humans are being observed taking decisions will have some sort of bias, discrimination against particular groups, and so on. And we can use machine learning techniques to try to build predictors, classifiers, that are going to be less biased. We can do it, for example, using adversarial methods to make our systems less sensitive to variables we should not be sensitive to. These are clear, well-defined ways of trying to address the problem. Maybe they have weaknesses, and more research is needed, and so on, but I think they are in fact sufficiently mature that governments should start regulating companies where it matters, say insurance companies, so that they use those techniques. Because those techniques will reduce the bias, but at a cost: for example, maybe their predictions will be less accurate, and so companies will not do it until you force them. All right, so this is the short term.
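A hedged sketch of the adversarial approach mentioned above: an encoder and task head are trained for the prediction task while an adversary tries to recover the protected attribute from the encoder's representation, and the encoder is additionally penalized when the adversary succeeds. The data, names, and hyperparameters below are invented for illustration; this is a generic recipe rather than any particular published system.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n = 1000
z = torch.randint(0, 2, (n, 1)).float()                    # protected attribute
x = torch.randn(n, 4) + 1.5 * z                            # features correlated with z
y = (x[:, :1] + 0.1 * torch.randn(n, 1) > 0.75).float()    # task label

encoder = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
task_head = nn.Linear(8, 1)
adversary = nn.Linear(8, 1)        # tries to recover z from the representation
bce = nn.BCEWithLogitsLoss()
opt_main = torch.optim.Adam(
    list(encoder.parameters()) + list(task_head.parameters()), lr=1e-2)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-2)
lam = 1.0                          # strength of the fairness penalty

for _ in range(500):
    # 1) adversary learns to predict z from the current (frozen) representation
    adv_loss = bce(adversary(encoder(x).detach()), z)
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()
    # 2) encoder and task head learn the task while making the adversary fail
    h = encoder(x)
    loss = bce(task_head(h), y) - lam * bce(adversary(h), z)
    opt_main.zero_grad()
    loss.backward()
    opt_main.step()

with torch.no_grad():
    task_acc = float(((task_head(encoder(x)) > 0).float() == y).float().mean())
    adv_acc = float(((adversary(encoder(x)) > 0).float() == z).float().mean())
print(f"task accuracy {task_acc:.2f}, adversary accuracy on z {adv_acc:.2f}")
# If the penalty works, the adversary's accuracy drifts toward chance (~0.5)
# while task accuracy usually drops a little -- the accuracy cost noted above.
```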
Long term, I'm really interested in thinking about how we can instill moral values into computers. Obviously this is not something we'll achieve in the next five or ten years. There's already work in detecting emotions, for example, in images and sounds and text, and also in studying how different agents interacting in different ways may correspond to patterns of, say, injustice, which could trigger anger. These are things we can do in the medium term, and eventually train computers to model, for example, how humans react emotionally. I would say the simplest thing is unfair situations, which trigger anger; this is one of the most basic emotions that we share with other animals. I think it's quite feasible within the next few years that we can build systems that detect these kinds of things, to the extent, unfortunately, that they understand enough about the world around us, which is a long time away. But maybe we can initially do this in virtual environments. So you could imagine a video game where agents interact in some ways, and some situations trigger an emotion. I think we could train machines to detect those situations and predict that a particular emotion will likely be felt if a human were playing one of the characters.

You have shown excitement about, and done a lot of excellent work with, unsupervised learning, but there's been a lot of success on the supervised learning side. One of the things I'm really passionate about is how humans and robots work together, and in the context of supervised learning that means the process of annotation. Do you think about the problem of annotation, or, put in a more interesting way, humans teaching machines?

Yes, I think it's an important subject. Reducing it to annotation may be useful for somebody building a system tomorrow, but longer term, the process of teaching, I think, is something that deserves a lot more attention from the machine learning community. There are people who have coined the term machine teaching: what are good strategies for teaching a learning agent, and can we design or train a system that is going to be a good teacher? In my group we have a project called the BabyAI game, where there is a game or scenario with a learning agent and a teaching agent. Presumably the teaching agent would eventually be a human, but we're not there yet. The role of the teacher is to use its knowledge of the environment, which it can acquire in whatever way, by brute force, to help the learner learn as quickly as possible. The learner is going to try to learn by itself, maybe using some exploration and whatever, but the teacher can have an influence on the interaction with the learner so as to guide it, maybe teach it the things that the learner has most trouble with, or things just at the boundary between what it knows and doesn't know, and so on. There's a tradition of these kinds of ideas in other fields, like tutoring systems in AI, and of course people in the humanities have been thinking about these questions, but I think it's time that machine learning people look at this, because in the future we'll have more and more human-machine interaction with a human in the loop, and I think understanding how to make this work better, and all the problems around that, are very interesting and not sufficiently addressed.
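A toy sketch of the teacher-learner interaction just described (a made-up illustration, not the actual BabyAI setup): the teacher knows the environment, here simply the true answers, and chooses which example to show next, the one the learner currently gets most wrong, rather than drawing examples at random.

```python
import numpy as np

rng = np.random.default_rng(0)
true_values = rng.uniform(0, 10, size=20)     # what the learner should end up knowing

def run(select, budget=100, lr=0.5):
    """The learner sees `budget` examples, chosen by `select`, and updates each time."""
    estimate = np.zeros_like(true_values)     # the learner starts out knowing nothing
    for _ in range(budget):
        i = select(estimate)                                 # which example to show
        estimate[i] += lr * (true_values[i] - estimate[i])   # learner's update
    return np.abs(true_values - estimate).mean()             # remaining error

teacher = lambda est: int(np.argmax(np.abs(true_values - est)))   # hardest item first
random_pick = lambda est: int(rng.integers(len(est)))             # no guidance

print("mean error with a guiding teacher:", round(run(teacher), 3))
print("mean error with random examples  :", round(run(random_pick), 3))
```

The teacher-chosen curriculum concentrates effort where the learner is weakest, which is the kind of influence on the interaction described above.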
You've done a lot of work with language too. What aspect of the traditionally formulated Turing test, a test of natural language understanding and generation, is in your eyes the hardest part of conversation to solve for machines?

I would say it's everything having to do with the non-linguistic knowledge which, implicitly, you need in order to make sense of sentences, things like the Winograd schemas, sentences that are semantically ambiguous. In other words, you need to understand enough about the world in order to properly interpret those sentences. I think these are interesting challenges for machine learning because they point in the direction of building systems that both understand how the world works and its causal relationships, and associate that knowledge with how to express it in language, either for reading or for writing.

You speak French?

Yes, it's my mother tongue.

It's one of the Romance languages. Do you think passing the Turing test, and all the underlying challenges we just mentioned, depends on language? Do you think it might be easier in French than it is in English, or is it independent of language?

I think it's independent of language. I would like to build systems that can use the same principles, the same learning mechanisms, to learn from human agents, whatever their language.

Well, certainly we humans can talk more beautifully and smoothly in poetry. I'm Russian originally; I know poetry in Russian is maybe easier for conveying complex ideas than it is in English, but maybe I'm showing my bias, and some people could say the same about French. But of course the goal ultimately is that our human brain is able to utilize any of those languages, to use them as tools to convey meaning.

Of course there are differences between languages, and maybe some are slightly better at some things, but in the grand scheme of things, where we're trying to understand how the brain works, and language and so on, I think these differences are minute.

So you've lived through an AI winter of sorts.

Yes.

How did you stay warm and keep your research going?

Stay warm with friends.

With friends. Okay, so it's important to have friends. And what have you learned from the experience?

Listen to your inner voice. Don't be trying to just please the crowds and the fashion. If you have a strong intuition about something that is not contradicted by actual evidence, go for it. I mean, it could be contradicted by people, but not by your own instincts based on everything you know. Of course, you have to adapt your beliefs when your experiments contradict those beliefs, but otherwise you have to stick to your beliefs. It's what allowed me to go through those years; it's what allowed me to persist in directions that took time, whatever other people thought, took time to mature and to bear fruit.

The history of AI is of course marked with technical breakthroughs, but it's also marked with seminal events that capture the imagination of the community. Most recently, I would say AlphaGo beating the world champion human Go player was one of those moments. What do you think the next such moment might be?

First of all, I think that these so-called seminal events are overrated. As I said, science really moves by small steps. What happens is you make one more small step, and it's like the drop that fills the bucket, and then you have drastic consequences, because now you're able to do something you were not able to do before, or now, say, the cost of building some device or solving a problem becomes cheaper than what existed, and you have a new market that opens up. So especially in the world of commerce and applications, the impact of a small scientific progress could be huge. But in the science itself, I think it's very, very gradual.

And where are these steps being taken now?

If I look at one trend that I like in my community, for example at Mila, my institute, the two hottest topics are GANs and reinforcement learning, even though in Montreal in particular reinforcement learning was pretty much absent just two or three years ago. There is really a big interest from students
and from people like me. So I would say this is something where we're going to see more progress, even though it hasn't yet provided much in terms of actual industrial fallout: even though there's AlphaGo, Google is not making money on this right now. But I think over the long term this is really, really important, for many reasons. In other words, reinforcement learning, or maybe more generally agent learning, because it doesn't have to be with rewards; it could be all kinds of ways that an agent learns about its environment.

So you're excited about reinforcement learning. Do you think GANs could provide something, some moment like that?

Well, GANs or other generative models, I believe, will be crucial ingredients in building agents that can understand the world. A lot of the successes in reinforcement learning in the past have been with policy gradient, where you just learn a policy, you don't actually learn a model of the world. But there are lots of issues with that, and we don't know how to do model-based RL right now, but I think this is where we have to go in order to build models that can generalize faster and better, to new distributions, models that capture, to some extent at least, the underlying causal mechanisms in the world.

Last question: what made you fall in love with artificial intelligence? If you look back, what was the first moment in your life when you were fascinated by either the human mind or the artificial mind?

When I was an adolescent, I was reading a lot, and then I started reading science fiction.

There you go.

That's it, that's where I got hooked. And then, you know, I had one of the first personal computers and I got hooked on programming.

So it started with fiction, and then you made it a reality.

That's right.

Yoshua, thank you so much for talking.

My pleasure.

Thank you.
Info
Channel: Lex Fridman
Views: 93,989
Id: azOmzumh0vQ
Length: 42min 18sec (2538 seconds)
Published: Sat Oct 20 2018