AI DEBATE: Yoshua Bengio | Gary Marcus

Captions
Vincent Boucher, founding chairman of Montreal.AI: Our participants tonight are Professor Gary Marcus and Professor Yoshua Bengio. Gary Marcus is a scientist, best-selling author, and entrepreneur. Professor Marcus has published extensively in neuroscience, genetics, linguistics, evolutionary psychology, and artificial intelligence, and is perhaps the youngest Professor Emeritus at NYU. He is the founder and CEO of Robust.AI and the author of five books, including The Algebraic Mind. His newest book, Rebooting AI: Building Machines We Can Trust, aims to shake up the field of artificial intelligence and has been praised by Noam Chomsky, Steven Pinker, and Garry Kasparov.

Yoshua Bengio is a deep learning pioneer. In 2018, Professor Bengio was the computer scientist who collected the largest number of new citations worldwide, and in 2019 he received, jointly with Geoffrey Hinton and Yann LeCun, the ACM Turing Award, the "Nobel Prize of computing." He is the founder and scientific director of Mila, the largest university-based research group in deep learning in the world. His ultimate goal is to understand the principles that lead to intelligence through learning.

This diagram shows the architecture of a two-layer neural network. According to Hinton, neurons are relatively simple processing elements that are very loosely modeled on biological neurons: they have connections coming in, and each connection has a weight on it that can be changed through learning. Deep learning uses multiple layers of processing units to learn higher-level representations. For Professor Marcus, expecting a monolithic architecture to handle abstraction and reasoning is unrealistic; for Professor Bengio, sequential reasoning can be performed while staying within a deep learning framework.

The format for the evening: an opening statement by Gary Marcus and by Yoshua Bengio, followed by responses; an interview with Yoshua Bengio and Gary Marcus; then questions from the audience here at Mila, followed by questions from the international audience. This AI debate is a Christmas gift from Montreal.AI to the international AI community. The hashtag for tonight's event is #AIDebate. Montreal.AI is grateful to Mila and open to the collaborative Montreal AI ecosystem. That being said, we will start with the first segment. Professor Marcus, you have 20 minutes for your opening statement. [Applause]

Marcus: And of course the A/V doesn't work. Hang on, we'll do it from this one, and that will be fine. Right before we started, Yoshua and I were chatting about how AI was probably going to come before A/V, and he made some excellent points about his work on climate change and how, if we could solve the A/V problem, it would actually be a good thing for the world. Are we good on sound?

Here's a photo of the two of us last week at a NeurIPS party, having a good time. I hope we will have a good time tonight; I don't think either of us is out for blood, but rather for truth. An overview of what I'm going to talk about today: I'm going to start with a bit of history and a sense of where I'm coming from; I'm going to give my take on Yoshua's views, which I think actually contain more agreements than disagreements, though the disagreements are important and we're here to talk about them; and then my prescription for going forward. The first part is about how I see AI, deep learning, and current machine learning, and how I got here. It's a bit of a personal history of cognitive science and how it feeds into AI; you might think of it as "what's a nice cognitive scientist like me doing in a place like Mila?"
Here's an overview of some of the things I've done that I think are relevant to AI; I won't go into all of it. An important point: I'm not a machine learning person by training, I'm a cognitive scientist by training. My real work has been in understanding humans and how they generalize and learn, and I'll tell you a little about that work, going back to 1992 and a little bit all the way up to the present.

But first I'll go back even a little before that, to a pair of famous books that people have called the "PDP Bibles." Not everybody will even know what PDP is, but it's a kind of ancestor to modern neural networks. Vincent showed one; Yoshua will be talking about many. The one I have on the right is a simplification of a neural network model that tried to learn the English past tense, and this was part of a huge debate. In these two books, the most provocative paper, certainly the one that has stuck with me for 30 years (which is pretty impressive for a paper), was a paper about children's overregularization errors. Kids say things like "breaked" and "goed" some of the time; I have two kids, and I can testify that this is true. This was long thought to be an iconic example of symbolic rules: you'd read any textbook up to 1985 and it would say children learn rules, and for example they make these overregularization errors. What Rumelhart and McClelland showed brilliantly was that you could get a neural network to produce this output without having any rules in it at all. This created a whole field that I would call eliminative connectionism, which was using neural networks to model cognitive science without having any rules at all, and the so-called great past tense debate was born from it. It was a huge war across the cognitive sciences; by the time I got to graduate school, it was all that a lot of people wanted to talk about.

On the one hand, up until that paper, most of linguistics and cognitive science was couched in terms of rules. The idea was that you learn rules like "a sentence is made of a noun phrase and a verb phrase"; if you've ever read Chomsky, a lot of his early work looked like that. And most AI was also all about rules: expert systems were mostly made up of rules. Here Rumelhart and McClelland argued we don't need rules at all, forget about them: even an error like "breaked" might in principle (they didn't prove it, but they showed in principle) be the product of a neural network, where you have the input on the bottom, the output on the top, you tune some connections over time, and you might get generalizations that look like what kids were doing. On the other hand, they hadn't actually looked at the empirical data. So I took myself off to graduate school to work with Steve Pinker at MIT, and what I looked at was these errors. I did, I think, the first big-data analysis of language acquisition, or one of the first, writing shell scripts on Unix SPARCstations, and looked at eleven and a half thousand child utterances. The argument Pinker and I made was that neural nets weren't making the right predictions about generalization over time, about particular verbs and so forth; if you care, there's a whole book we wrote about it. What we argued for was a compromise. We said it's not all rules, as Morris Halle, who was on my thesis committee, liked to argue, and it isn't all neural networks, as Rumelhart and McClelland said. We said it was a hybrid model that best captured the data.
A rule for the regulars, so "walk" gets inflected as "walked" when you add the "-ed"; a neural network for the irregulars, which is why you say "sing"–"sang" and might generalize that to "spling"–"splang," which sounds similar. And the reason children made overregularization errors, we said, is that the neural network didn't always produce a strong response: if you had a verb that didn't sound like anything you'd heard before, you'd fall back on the rule. So that was the first time I argued for hybrid models, back in the early 1990s.

In 1998, or even a little before, I started playing a lot with the network models. A lot had been written about them, but I wanted to understand how they worked, so I started implementing them and trying them out, and I discovered something about them that I thought was really interesting: people talked about them as if they learned the rule in the environment, but they didn't really always learn the rule, at least not in the sense that a human being might. Here's an example. Suppose I taught you the function f(x) = x (you can think of it as y = x + 0, or in different ways). You have an input like 0110, a binary number, and your output is the same thing, and you do this on a bunch of cases. The neural network learns something, but it also makes some mistakes: if you give it an odd number, which is what I have there at the bottom, after training it only on even numbers, it doesn't come up with the answer a human being would.

I described this in terms of something called a training space. Say the yellow examples are the things you've been trained on, and the green ones are things that are nearby in the space of the ones you've been trained on. The neural networks generally did really well on the yellow ones, and not so well on the ones outside the space: near perfect at learning specific examples, good at generalizing within a cloud of points around them, and poor at generalizing outside that space. I wrote it up in Cognitive Psychology, after some battles with reviewers we can talk about some other time, and the conclusion was that the class of eliminative connectionist models that was then popular couldn't learn to extend universals outside the training space. In my view this is the thing I am most proud of having worked on; some details for later.

This led me to some work on infants, where I tried to argue that even infants could make the kinds of generalizations that were stymieing the neural networks of that day. It was a direct, deliberate test of outside-the-training-space generalization by human infants. The infants would hear sentences like "la ta ta" and "ga na na" (I read these to my son yesterday; he's almost seven and thought they were hilarious), and then we tested them on new vocabulary: sentences like "wo fe fe" or "wo fe wo." One of those has the same grammar the kids had heard before, and the other has a different grammar. Because all the items were new, you couldn't use some of the statistical techniques people thought about, like transitional probabilities, and it was a problem for early neural networks. The conclusion was that infants could generalize outside the training space even where many neural networks could not, and I argued these generalizations should be characterized as learning algebraic rules. It's been replicated a bunch of times, and it led to my first book, The Algebraic Mind; the idea was that humans can do this kind of abstraction.
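A concrete rendering of the identity-function experiment Marcus describes above: a minimal sketch assuming a 4-bit binary encoding and scikit-learn's MLPRegressor. The encoding width, layer size, and library are illustrative choices of mine, not the original 1998 setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def to_bits(n, width=4):
    """Encode n as a list of 0/1 ints, most significant bit first."""
    return [int(b) for b in format(n, f"0{width}b")]

# Train only on even numbers: the rightmost bit is 0 in every training example.
X_train = np.array([to_bits(n) for n in range(0, 16, 2)])
net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
net.fit(X_train, X_train)  # learn the identity function f(x) = x

# Test on odd numbers, which lie outside the training space for the last bit.
X_test = np.array([to_bits(n) for n in range(1, 16, 2)])
print(net.predict(X_test).round(2)[:, -1])  # typically stays near 0, not 1
```

Because the last output bit is 0 in every training target, nothing pushes the network to copy that input bit, so it fails to generalize the identity mapping to the odd half of the space, which is Marcus's point.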
I argued that there were three key ingredients missing from multilayer perceptrons: the ability to freely generalize abstract relations, as the infants were doing; the ability to robustly represent complex relationships, like the complex structure of a sentence; and a systematic way to track individuals separately from kinds. We'll talk about the first two today, probably not the third. And I argued that this undermined a lot of attempts to use multilayer perceptrons as models of the human mind; I wasn't really talking about AI, I was talking about cognition. Such models, I argued, simply can't capture the flexibility and power of everyday reasoning.

The key components of the thing I was defending, which I would call symbol manipulation (I didn't invent it, but I tried to explicate it and argue for it), are variables, instances, bindings, and operations over variables. Think of algebra: you have a variable like x; you have an instance of it, like 2; you bind it, so you say "right now x equals 2" or "my noun phrase equals 'the boy'"; and then you have operations over variables, so you can add them together, concatenate them (if you know computer programming), compare them, and so forth. Together, these mechanisms provide a natural solution to the free generalization problem. Computer programs do this all the time: something like the factorial function, if you've ever taken computer programming, automatically generalizes to all instances of some class, say integers, once you have the code. Pretty much all of the world's software takes advantage of this fact, and my argument from the baby data was that human cognition appeared to do so as well, innately.

The subtitle of that first book, which you can't see that well here, was "Integrating Connectionism and Cognitive Science." I wasn't trying to knock down neural networks and say forget about them; I was saying let's take the insight of those things, that they're good at learning, and put it together with the insights of cognitive science, a lot of which have been about using symbols. So I said that even if I'm right that symbol manipulation plays an important role in mental life, that doesn't mean we shouldn't have other things in there too, like the multilayer perceptrons that were the predecessors of today's deep learning. I was largely ignored, in candor, until around a year or so ago, when people started paying attention to the book again, but it did inspire a seminal book on neurosymbolic approaches, which I hope some people will take a look at, called Neural-Symbolic Cognitive Reasoning. And I'm going to try to suggest it also anticipated some of Yoshua's current arguments.

I stopped working on these issues; I started looking at innateness, I learned to play guitar (that's a story for another day), and didn't talk about these issues at all until 2012, when deep learning became popular again and there was a front-page story in The New York Times about it. I thought, I've seen this movie before. I was writing for The New Yorker at the time, and I wrote a piece saying: realistically, deep learning is only part of the larger challenge of building intelligent machines. Such techniques lack ways of representing causal relationships (we'll have an interesting discussion about that today) and have no obvious ways of performing logical inference; they are still a long way from integrating abstract knowledge. And I once again argued for hybrid models, with deep learning as just one element in a very complicated set of machinery.
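To make the earlier point about operations over variables concrete, here is a minimal sketch: nothing in this function was tuned on examples, yet it applies to every integer instance bound to its variable, the "free generalization" that Marcus attributes to symbolic code.

```python
def factorial(n: int) -> int:
    """n is a variable; each call binds an instance to it."""
    return 1 if n <= 1 else n * factorial(n - 1)

print(factorial(5))   # 120
print(factorial(20))  # 2432902008176640000, far outside any "training set"
```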
Then in 2018, deep learning got more and more popular, but I thought people were missing some important points about it, so I wrote a piece (I was actually here in Montreal when I wrote it) called "Deep Learning: A Critical Appraisal," outlining ten problems for deep learning; I think it was on the suggested readings for tonight. The failure to extrapolate beyond the space of training was really at the heart of all of those problems. I got a ton of flak on Twitter; you can go back and search and see some of the history. I felt I was often misrepresented as saying we should throw away deep learning, which is not what I was saying, and I was careful in the paper's conclusion: despite all the problems I have sketched, I don't think we need to abandon deep learning, which is the best technique we have for training neural networks right now, but rather we need to reconceptualize it. The central conclusions of my academic work included the value of hybrid models, and the importance of extrapolation, of compositionality, of binding and representing relationships, of causality, and so forth.

Part two: some thoughts on Yoshua's views, how I think they've changed a bit over time, a little about how I feel misrepresented, and how our views are and are not similar. The first thing I want to say is that I really admire Yoshua. For example, I wrote a piece recently skewering the field for hype, and I said that a really good talk is one by Yoshua Bengio, a model of being honest about limitations. I also love the work he's doing, for example on climate change and machine learning. I really think he should be a role model, in his intellectual honesty and in his sincerity about making the world a better place.

My differences with him are mostly about his earlier views. We first met here in Montreal five years ago, and at that time I don't think we had much common ground. I felt he was putting too much faith in black-box deep learning systems and relying too heavily on larger data sets to yield answers. He'll talk about System 1 and System 2 later, and I guess I will as well; it felt like it was all on the System 1 side and not so much on the System 2 side. I went back and talked to some friends about that. A lot of people remember a talk he gave in 2015 to a bunch of linguists who didn't like Yoshua's answers to questions like how we would deal with negation, or with quantification words like "every." They felt that what Yoshua mostly did was say, well, we just need more data and the network will figure it out. If Yoshua were still in that position (I don't think he is) we'd have a longer argument. Recently, however, Yoshua has taken a sharp turn toward many of the positions I've long advocated: innate fundamental knowledge, limits on deep learning, the need for hybrid models, the critical importance of extrapolation, and so forth. I have some slides and camera shots that I took at his recent talk at NeurIPS, and I think they show some really interesting convergence.

So, disagreements. I'll talk about my position on the right way to build hybrid models, on innateness, on the significance of the fact that the brain is a neural network, and on what we mean by compositionality; I think we actually agree about most of the rest. The first one is the most delicate, but I think occasionally Yoshua has misrepresented me as saying deep learning doesn't work; he said that in IEEE Spectrum. I hope I've persuaded you that that's not actually my position: I think deep learning is very useful; I just don't think it solves all problems.
The second thing is that his recent work has really nailed what I think is the most important point, which is the trouble deep nets have in extrapolating beyond the data, and what that means, for example, that we might need hybrid models. I would frankly like him to cite me a little; I think not mentioning it devalues my contributions a little and furthermore misrepresents my background in the field.

What kind of hybrids should we seek? I think Yoshua was very inspired by Daniel Kahneman's book about System 1 and System 2. I imagine many people in the crowd have read it; you should if you haven't. It talks about one system that's intuitive, fast, and unconscious, and another that's slow, logical, sequential, and conscious. I actually think that's a lot like what I've been arguing for all along, and we can have some interesting discussion about the differences: are they even different, are they incompatible, how could we tell?

I want to remind people of what I think is one of the most important distinctions in cognitive science, drawn by the late David Marr, who talked about computational, algorithmic, and implementational levels. You could take an abstract computational notion, like sorting; you could pick a particular algorithm, like bubble sort; and then you could make it out of neurons, or silicon, or Tinkertoys. We need to remember this as we have these conversations: we want to understand the relation between how we're building something and what algorithm is being represented. I don't think Yoshua has actually made that argument yet (maybe he will today), and I think that's what we would need to do to make a strong claim that a system doesn't implement symbols. Yoshua has been talking a lot lately about attention, and what he's doing with attention reminds me of a microprocessor, in the way it pulls things out of a register and moves them into a register. In some ways it behaves at least a lot like a mechanism for storing and retrieving the values of variables from registers, which is really what I have cared about for a long time.

Then I've seen some arguments from Yoshua against symbols. Here's something in an email he sent to a student: "What you're proposing, a neurosymbolic hybrid, doesn't work; this is what generations of AI researchers have tried for decades and failed." I've heard this a lot, not just from Yoshua, but I think it's misleading. The reality is that hybrids are all around us. The one you use the most is probably Google search, which is a hybrid between the Knowledge Graph, which is classic symbolic knowledge, and deep learning, like a system called BERT. Some people will know AlphaZero, which is, or was until recently, the world champion of Go and chess, having just been succeeded by MuZero; AlphaZero is also a hybrid. OpenAI's Rubik's Cube solver is a hybrid. There's great work by Josh Tenenbaum and Jiayuan Mao that just came out this year; it's also a hybrid. Another argument Yoshua has given is that lots of knowledge isn't conveniently represented with rules. It's true that some of it is not conveniently represented with rules, and some of it is; again, Google search is a great example, where some knowledge is represented as rules and some is not, and it's very effective.
The third argument, a difference we might have (I don't fully know Yoshua's view), is about nativism. As a cognitive development person, I see a lot of evidence that a lot of things are built into the human brain. I think we are born to learn, and we should think about it as nature and nurture rather than nature versus nurture. And I think we should think about innate frameworks for things like understanding time, space, and causality, as Kant argued in the Critique of Pure Reason and as Spelke has argued in her cognitive development work. The argument I've made in the paper on the left is that richer innate priors might help artificial intelligence a lot. Machine learning has historically avoided nativism of this sort, and as far as I can tell Yoshua is not a real fan of nativism; I'm not totally sure why. Here's some empirical data showing that nativism in neural networks works. It comes from a great paper by Yann LeCun in 1989, where he compared four different models, and the ones with more innateness, in the form of a convolutional prior, for those who know what that is, were the ones that did better. And here, very quickly, is a picture of a baby ibex climbing down a mountain. I don't think anybody could reasonably say there's nothing innate about the baby ibex: it has to be born with an understanding of the three-dimensional world and how it interacts with it, in order to do the things it does. So nativism is plausible in biology, and I think we should use more of it in AI.

Now I turn to brains and neural networks. Some of you may know there was actually a cartoon about this debate, by Dileep George; it's worth looking up on Twitter. In the cartoon version, Yoshua wins by saying "your brain is a neural network," and everybody says, wow, I guess Yoshua was right after all. And Yoshua did, at least half in jest, make a similar argument to me on Facebook when he said the brain is a neural net all the way. Of course, deep neural networks aren't really much like brains; I've been arguing that for a while. There are many cortical areas, many neuron types, many different proteins in different synapses, and so on. I actually heard Yoshua make essentially the same argument at NeurIPS last week, so I think we pretty much agree about that; he made a beautiful argument about degrees of freedom in particular. But the critical question is really what kind of neural network the brain is. Going back to Marr's distinction, you can build anything you want, any computation you want, out of Tinkertoys or out of neurons. We really want to know whether the brain is a symbolic thing at the algorithmic level or not, and then ask how that is implemented in neurons. Simply knowing that the brain is a network made of neurons doesn't actually tell us that much; we really want to know what kind of network it is.

There's another argument: people say symbols aren't biologically plausible. I think this is a ridiculous argument. When my son learned long division last week and followed an algorithm, he was surely manipulating symbols. We do at least some symbol manipulation some of the time. Back in the 80s people knew this, and they said, well, symbols are the domain of conscious processing; they're just not what we do unconsciously. Pinker and I said, well, language isn't that conscious, and we use symbols in language too. The real question is not whether the brain is a neural network; it's how much of it involves symbolic as opposed to other processes.
Even if the brain never manipulated symbols, which is counterfactual in our world, why exclude them from AI? We can't prove they're inadequate; they have proven utility; most of the world's computer code is written with them; and, most importantly, lots of the world's distilled knowledge comes in the form of symbols. Everything in Wikipedia is symbolic, and we'd like to be able to use that in our machine learning systems.

Fifth, compositionality. Yoshua has been talking a lot about compositionality, and I think he will tonight, but I think he means something different by it than I do; I'll let him give his description later. I think his is partly about putting together different pieces of networks. I'm really interested in the linguists' sense, which is how you put different parts of sentences together into larger wholes. Here's a good example. Last week, I had been encouraging my friend Jeff Clune to go to UBC and encouraging UBC to hire him, and my friend Alan Mackworth said, "Good news, Jeff Clune accepts." I wrote back and said, "Awesome, he told me it was imminent but swore me to secrecy." [We have 30 seconds.] Can I have two extra minutes? Otherwise I'm not going to be able to do the recursion. [The gentleman from Montreal yields two minutes.] So I said "yep," Alan said "yep, I knew that you knew," and eventually we get to: everyone in this room now knows that Alan knew that Gary knew that Jeff was going to accept the job at UBC. I don't think we can represent that in today's neural networks; we can barely get a system to represent the difference between eating rocks and eating apples, and the famous quote "you can't cram the meaning of the entire effing sentence into a single vector" I think still stands. Compositionality is not just about language; it's also about learning different concepts and putting them together in different ways. Here are my kids inventing a new game; ten minutes later they've combined things that they know. Children can learn something in a few trials, and we haven't figured out how to do that yet.

Synthesis: what I hope people will take away from this. The biggest takeaway from this debate should be the extent to which two serious students of mind and machine have converged. We agree that big data alone won't save us; we agree that pure, homogeneous multilayer perceptrons on their own won't be the answer; we both think that, going forward, everybody should be working on the same things: compositionality, reasoning, causality, hybrid models, extrapolation beyond the training space; and we agree that we should be looking for systems that represent more degrees of neural freedom, respecting the complexity of the brain. At the same time, I hope to have convinced you that symbol manipulation deserves a deeper look (Google search uses it; maybe you should too), that we've rejected it prematurely, that hybrid neurosymbolic models are actually thriving, that there's nothing more than prejudice holding us back from embracing more innateness, and that the real action in compositionality is understanding complex sentences and ideas in terms of their parts. AI has had a lot of waves of things that come and go. In 2009 deep learning was down and out; a lot of people dismissed it. I have a friend who saw Geoff Hinton give a talk, a poster excuse me, when only one person came. Luckily Bengio, LeCun, and Hinton kept plugging away despite resistance, and I hope people doing symbols will keep plugging away too.
Here's my prediction, on my last slide: when Yoshua applies his formidable model-building talents, which I envy, to models that acknowledge and incorporate explicit operations over variables, magic will start to happen. Thank you very much. [Applause]

[Unintelligible moderator remarks while the next presentation is set up.]

Bengio: All right, so welcome to this debate, and thanks, Gary, for standing up and talking first. I took a lot of notes. The main points I want to make: I want to talk about out-of-distribution generalization, which is connected to some of the things Gary talked about and which I think is more than the notion of extrapolation (I'll get back to that). I want to talk about my views on how deep learning might be extended to deal with System 2 computational capabilities, rather than taking the old techniques and combining them with neural nets. I want to talk briefly about attention mechanisms and why they may provide some of the key ingredients Gary has been talking about, the ones that make symbolic processing able to do very interesting things, but within a neural net framework. And then I'll contrast that with some of the more symbolic approaches.

First I want to get a few things out of the way about the term "deep learning," because there's a lot of confusion, and especially when deep learning is a strawman it tends to be used to mean MLPs from 1989, just as Gary used the term a few minutes ago. If you open the last NeurIPS proceedings, you'll see it's much more than that. Deep learning is really not about a particular architecture, or even a particular training procedure. It's not about backprop; it's not about convnets or RNNs or MLPs. It's something that's moving; it's more of a philosophy that expands as we add more principles to our toolbox for understanding how to build machines that are inspired by the brain in many ways and use some form of optimization, usually a single objective, sometimes multiple objectives as in GANs, and in general a coordinated optimization of multiple parts. It takes advantage of early ideas from the 80s, like distributed representations, but also more modern ideas, like depth of representation, and like sharing computations and representations across tasks and environments, enabling multitask learning, transfer learning, learning to learn, and so on. And as I will argue, the tools to move forward include things like reasoning, search, inference, and causality. To connect to neuroscience, because Gary mentioned it: there's actually a very rich body of work in the last few years connecting modern deep learning research with neuroscience; we had a paper just published in Nature Neuroscience called "A deep learning framework for neuroscience," but I won't have time to talk about it today.

Out-of-distribution generalization means something different from the normal formalization, where we have data from one distribution and we worry about generalizing to new examples from that same distribution. When we talk about extrapolation, Gary, it's not clear whether we're talking about generalizing to new configurations coming from the same distribution or not, so you have to think about the notion of distribution to make that difference. For agents in the world this is very important, because what agents see changes in nature, through the interventions of other agents, through moving in time and space, and so on.
What I've been arguing for, for a little while now, certainly much less long than Gary, is the importance of compositionality. One of the things I did in the 2000s was help figure out why neural nets, even the ones from the 80s with distributed representations, have a powerful form of compositionality; I'm not going to go into the details, but that work dates back about five years, and similarly there is work on why composing layers brings in another form of compositionality. So my argument is that we already have these two forms in neural nets, and we can incorporate the form that Gary likes to talk about, and that I like to talk about these days, which is inspired a lot by the work of linguists but which I think is more powerful and more general than language alone: something we use in conscious reasoning, for example. Basically, it is about how one might combine existing concepts in ways that may have essentially zero probability under the training distribution. It's not just that it's a novel pattern; it's a combination that may be very unlikely under the training distribution, and yet our brain is able to come up with these interpretations, these novel combinations. At NeurIPS I gave the example of driving in a new city, where you have to be a little creative, combining the skills you have in novel ways in order to solve a difficult navigation problem.

This issue is not new in deep learning, in the sense that people have been thinking about it for at least a few years, and I would say it's one of the hardest open areas in deep learning. We haven't solved it, but people are starting to understand it better, and one of the ingredients that I and others have been thinking is crucial in this exploration is attention. Attention is interesting because it changes the very nature of what standard neural nets can do. It creates dynamic connections, created on the fly based on context, in a way that can favor the sort of thing Gary calls free generalization, which I think is important in language and in conscious processing. Why is that? Attention selects an element from a set of elements in the lower layer and sends the selected element upward, in a soft way, at least with the soft attention we typically use in deep learning. The receiver gets a vector, but it doesn't know where that vector comes from, so in order to really do a good job, the receiver needs information not only about the value being sent but also about where it comes from. The "where" is sort of a name. It's not a symbolic name; we use vectors, what we call keys in transformers, for example, and you can think of these as the neural net form of reference, because that information can be passed along and used again to match some element to some other element in further attention operations. This also changes neural nets from vector-processing machines to set-processing machines, which is something Gary talked about earlier, and which I think is important for conscious processing.
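A minimal NumPy sketch of the soft attention Bengio describes: a query softly selects one value vector out of a set, and the associated keys act as neural "names" for where the value came from. Shapes and dimensions here are illustrative choices, not any particular architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d = 4
keys   = rng.standard_normal((5, d))  # one "name" per element of the set
values = rng.standard_normal((5, d))  # the content of each element
query  = rng.standard_normal(d)       # what the receiver is looking for

scores  = keys @ query / np.sqrt(d)   # match the query against each name
weights = softmax(scores)             # soft, differentiable selection
attended_value = weights @ values     # what gets sent to the receiver
attended_key   = weights @ keys       # a soft "name": where it came from
```

Passing `attended_key` along with `attended_value` is one simple way to give the receiver the "where" as well as the "what," which is the point of the key-as-reference idea above.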
I've been talking a lot about consciousness in the last couple of years. There is of course a much richer body of research in cognitive neuroscience about consciousness, and the way I'm trying to look at this is to take some of the things that have been discussed in cognitive science and neuroscience about consciousness and other aspects of high-level processing, and frame them as priors, either structural priors or regularizers, for building different kinds of neural nets.

One of these priors is what I call the consciousness prior. It's implemented by attention, which selects a few elements of a large unconscious state into a smaller conscious state. In terms of priors, what it means is that instead of knowledge being in a form where every variable can interact with every variable, at this high level of representation there's a sparser dependency structure. There are dependencies, which you can think of like a sentence ("if I drop the ball, it will fall on the ground"), that relate only a few variables together. Of course each concept, like "ball," can be involved in many such sentences, so many dependencies can be attached to a particular concept, but each of these dependencies is itself sparse, involving few variables. We can represent that in machine learning as a sparse graphical model, a sparse factor graph. That's one of the priors. The reason such a prior would be interesting is that it's something we see in the kinds of high-level variables, the factors, that we communicate with language, so there's a strong connection between these notions and language. The reason is that the things we do consciously we are able to report through language, whereas the things that go on below the level of consciousness we can't report, presumably because they are just too complex to be put into a few simple words. What's interesting is that if we can put these kinds of priors on top of the highest-level representations of our neural nets, it will increase the chances of finding the same sort of representations that people use in language; I call them semantic factors.

Another prior I've been talking about has to do with causality and changes in distribution, because, remember, I started this discussion with how we can improve our deep nets so that they are more robust to changes in distribution. There's a fundamental problem with changes in distribution: if we let go of the i.i.d. hypothesis, that the test data comes from the same distribution as the training data, then we have to add something else. Something fundamentally extra is needed to cope with changes in distribution, because otherwise the new distribution could be anything, right? We have to make some sort of assumption, and I presume that evolution has put these kinds of assumptions into human brains, and probably animal brains as well, to make us better equipped to deal with those changes of distribution. What I'm proposing as a prior here, inspired a lot by the work of people like Schölkopf and Peters and others in causality, is that those changes are the result of an intervention on one or a few of the high-level variables, which we can call causes. So there's this prior that many of the high-level variables I'm talking about are causal variables: they can be causes, or effects, or related to how a cause changes an effect. And the assumption is that the change is localized: it's not that everything changes when the distribution changes.
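Stepping back to the sparse-factor-graph prior for a moment, a toy parameter count shows why it is such a strong assumption. The numbers here (20 binary variables, 25 factors of at most 3 variables each) are choices of my own, purely for illustration.

```python
# Full joint over n binary variables vs. a sparse factor graph whose
# factors each touch at most k variables.
n, k, num_factors = 20, 3, 25
full_joint = 2**n - 1            # free parameters of an unrestricted joint
sparse = num_factors * (2**k)    # one table of 2**k entries per factor
print(full_joint, sparse)        # 1048575 vs. 200
```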
If I close my eyes, or put on dark glasses, there's only one bit that changed; just one variable changed its value. We can exploit this assumption to learn representations that are more robust to changes in distribution, which is what I talked about in my NeurIPS presentation. We can exploit it by introducing a meta-learning objective that says: better representations of knowledge have the property that, when the distribution changes, very few parts of the model need to change in order to account for that change. So they can adapt faster; they have what's called smaller sample complexity; they need less data in order to adapt to the change.

Another thing we have explored, related to modularization and systematic generalization, is the idea of dynamically recombining different pieces of knowledge in order to address the particular current input. We have a recent paper called "Recurrent Independent Mechanisms," which is a first stab at that. I'm not going to go through the whole thing, but some of the main ideas are these: we have a recurrent net that is broken down into smaller recurrent nets, which you can think of as different modules, which we call independent mechanisms. They have separate parameters and are not fully connected to each other, so the number of free parameters is much less than in a regular big recurrent net. Instead they communicate through a channel that uses attention mechanisms, so they can basically only send these named vectors, these key-value pairs, in a way that makes the system more plug-and-play: the same module can take as input the output coming from any module, so long as they speak the right language, so long as they fill the right slots, if you want to think in a symbolic sense. But it's all vectors, and it's all trainable. And there's also a notion of sparsity in which modules get selected, in the spirit of global workspace theory, which comes from cognitive neuroscience.

All right, let me list a few of these priors; I've already mentioned a couple, and there are others I won't really have time to mention. There is the consciousness prior, the idea that the joint distribution of the high-level factors is a sparse factor graph. Another one I didn't talk about, which of course has nice analogues in classical, good old-fashioned AI and rules, is that the dependencies I've been talking about are not dependencies defined on instances. It's not that there's a rule about my cat and my cat food; there's a general rule that applies to cats and cat food in general. We do these kinds of things a lot in machine learning, in graphical models; this dates back even to convolutional nets and dynamic Bayes nets, which share parameters. Something like this needs to be there as well in the representation of the dependencies between the high-level factors. I mentioned the prior that many of the factors at the high level need to be causal variables, or to describe how causal variables interact with other causal variables. In the same spirit, and I won't have time to talk about it because it's really a whole other talk, but closely related to this, is agency. We are agents; we intervene in our environment; this is closely connected to the causality aspect. And the high-level variables, if you look at the ones we manipulate with language, often have to do with agents, objects, or actions, which mediate the relationship between agents and objects.
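To illustrate why localized interventions favor one factorization over another, here is a toy example under assumptions of my own choosing (two binary variables, a known joint), in the spirit of, but much simpler than, the meta-learning setup Bengio alludes to. With the causal factorization p(A)p(B|A), an intervention on the cause A moves one parameter; with the anti-causal factorization p(B)p(A|B), every parameter moves, so the causal model needs less data to adapt.

```python
import numpy as np

def factorizations(pA, pB1_A0, pB1_A1):
    """Return both factorizations of the joint over binary A -> B."""
    # joint[a][b] = P(A=a, B=b)
    joint = np.array([[(1 - pA) * (1 - pB1_A0), (1 - pA) * pB1_A0],
                      [pA * (1 - pB1_A1),       pA * pB1_A1]])
    causal = np.round([pA, pB1_A0, pB1_A1], 3)             # p(A), p(B|A)
    anti = np.round([joint[:, 1].sum(),                    # p(B=1)
                     joint[1, 0] / joint[:, 0].sum(),      # p(A=1|B=0)
                     joint[1, 1] / joint[:, 1].sum()], 3)  # p(A=1|B=1)
    return causal, anti

print(factorizations(0.3, 0.2, 0.9))  # before the intervention
print(factorizations(0.8, 0.2, 0.9))  # after intervening on A only:
# one causal parameter changed; all three anti-causal parameters changed
```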
There are already a few papers in the deep learning literature trying to use these priors to encourage the high-level representations to have those sorts of properties, and of course when you start doing things like reinforcement learning, and especially intrinsic rewards in reinforcement learning, these concepts come in very handy. [You have five more minutes, plus two.] Okay. Then there's the other prior I already mentioned, the idea that changes in distribution arise from localized causal interventions. And finally, one that is connected to that but different, and has been explored by my colleagues, for example Léon Bottou and Martin Arjovsky, and others before them: the idea that some pieces of knowledge, at the high level or even at the lower level, correspond to different time scales. There are things about the world that change quickly and things that are very stable: there's general knowledge that we're going to keep for the rest of our lives, and there are aspects of the world that change; we learn new faces, we learn new tricks. This fits well with the meta-learning framework, where you have fast learning inside slow learning, and I think it's another important piece of the puzzle.

Now, how is all this related to, and potentially different from, the symbolic AI program? We would like to build some of the functional advantages of classical AI, of symbol manipulation, into neural nets, but in an implicit way. We need efficient, coordinated, large-scale learning; we need semantic grounding in System 1 and the perception-action loop; we need distributed representations for generalization, which has been a big success of deep learning; we need efficient search in that space, guided by System 1; and we need to handle uncertainty. But we also want the other things I've been talking about, which really were explored first by people in classical AI: systematic generalization, factorizing knowledge into small exchangeable pieces, manipulating variables, instances, references, and indirection. This is connected to why I think just taking the mechanisms we know from good old-fashioned AI and applying them on, say, the top layer, on the output of a neural net, is not sufficient. We need deep learning in the System 2 component as well as in the System 1 part; we need those higher-level concepts to be grounded, with distributed representations, to achieve generalization; and we can't do brute-force search in the space of reasoning.

And then there's the question of how symbols should be represented. My bet is that we can get many of the attributes of symbols without the kind of explicit representation of them that has been the hallmark of classical AI. We can get categories, for example, by having multimodal distributions in the representations; we can use things like the Gumbel softmax, which encourages separation into different modes; we can get indirection with the keys I mentioned already; we can get recursion through recurrent processing; and we can get a form of context independence that still allows dynamically activating combinations of mechanisms in a context-dependent way. I'm done, thanks. [Applause]
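For reference, the Gumbel-softmax trick Bengio mentions above, in a minimal NumPy sketch; the temperature and logits are arbitrary illustrative values, and a real use would keep the operation inside a differentiable training graph.

```python
import numpy as np

def gumbel_softmax(logits, tau=0.5, seed=0):
    """Differentiable near-one-hot sample from a categorical distribution."""
    rng = np.random.default_rng(seed)
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0,1) noise
    y = (logits + g) / tau      # low tau -> sharper, more discrete output
    e = np.exp(y - y.max())
    return e / e.sum()

print(gumbel_softmax(np.array([2.0, 1.0, 0.1])).round(3))
```

Lowering `tau` pushes the output toward a one-hot vector, which is the soft, trainable route to the discreteness that Bengio contrasts with hard symbols.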
[Moderator: Professor Marcus, you have seven and a half minutes to answer.]

Marcus: So I don't think we disagree on all that much, except for your last set of slides. Let me get my slides back up; A/V again, after AI. I didn't quite understand your response to Google search. I offered Google search as an example of a hybrid system that works in the real world and that scales; it's massive.

Bengio: I don't talk about Google search, do I?

Marcus: Well, yes, exactly: your critique of the good old-fashioned AI hybrid system.

Bengio: Let me just say that good old-fashioned AI is symbols all the way; I'm not endorsing that.

Marcus: Right, and I'm arguing for symbols plus deep learning. I take Google search to be an existence proof for things that you just said couldn't exist; you said good old-fashioned AI is not going to be able to represent probabilities, and so on.

Bengio: I understand, but while you're talking about Google search, I'm not trying to emulate Google search; I'm trying to get at intelligence.

Marcus: Hang on just one second. Can we just go into back-and-forth instead of doing the seven and a half minutes, and be more freeform about it? Great. So you're not trying to build Google search, you're trying to build an intelligent system. Google search is in some ways an intelligent system and in some ways not, but you have two avenues here: you can either say it's so different from an intelligent system that it's not interesting, or you can say it's interesting and it does show a proof of concept.

Bengio: Look, I completely agree that a lot of current systems that use machine learning also use a bunch of handcrafted rules and code designed by people based on our understanding. This is how state-of-the-art systems work; dialogue systems are an even more obvious example, where current state-of-the-art systems combine machine learning with a lot of handcrafting. It's also true of autonomous vehicles these days; there's a lot of engineering on top of the computer vision. There's no question; I don't think we disagree on this. The question is where we go next, in order to build something closer to human intelligence.

Marcus: Okay, so I may have misunderstood your argument; let me recap to make sure I understand. You're not saying that one couldn't build hybrid systems. You're saying they're already built; that's what I was saying.

Bengio: No; I'm talking about how the brain works and how I would like to build AI in the future.

Marcus: Let's come back to the brain part. Why are you not satisfied that hybrids are part of the answer, if I read you correctly?

Bengio: It depends what you mean by the word "hybrid."

Marcus: At what point do you get off the hybrid train?

Bengio: I get off the hybrid train when it's about taking the good old algorithms, like production systems and ontologies and rules and logic, which have a lot of value and which I think can serve as inspiration, and basically gluing them to neural nets. People have been trying to do these kinds of things for a long time; in the 90s there was a lot of neural-symbolic work and so on. I tried to outline, in my last couple of slides (I guess I misunderstood that I had two more minutes left), the reasons why I think it can't work, and it's not just about how the brain works; there are machine learning reasons, practical computational reasons. One of them is search. What I mean by search is what we do when we have knowledge, say rules or pieces of neural nets, and we dynamically choose which parts go with which parts in order to come up with a new conclusion.
This is what reasoning and planning are essentially about. If you introspect a little about how humans plan and how humans reason, we don't explore a zillion different trajectories of possible ways of combining things and pick the one that works best according to some criterion. We essentially go and try one thing, and sometimes two, and if it really doesn't work we try three or four. Go masters go up to fifty, okay, but their brains are unusual, because they've been trained, like people who are really good at algebra. Normal behavior involves a very intuitive sense of where to search, and that's based on System 1, on something we don't have conscious access to, that knows where to search. That's one reason why we can't use the old algorithms. The other reason concerns the symbols themselves: we know we need to represent information in a richer way. The reason the connectionists really wanted to depart from symbolic processing is that they thought it wasn't a sufficiently rich kind of representation to get good generalization. You want to represent everyday concepts, like words in natural language, by these sub-symbolic representations that involve many attributes, which allows generalizing across similar things. I've read some of the things you wrote, and you could say, well, these attributes are like symbols themselves; sure, you could do that. But the important point is that now you have to manipulate these rich representations, which can actually be fairly high-dimensional; we need to keep that from the neural net world. And of course we need to keep the things that have worked well in machine learning, which include representing uncertainty, which some people are doing, like Josh Tenenbaum with probabilistic programming. So there are efforts going in those directions, but we need to keep these ingredients together.

Marcus: I'm going to mostly emphasize our agreements here. I agree, first of all, that classical symbol systems have search issues, and to the extent that one wants to preserve them, one wants to solve those problems. There are ways people have thought about it: for example, in Cyc, which is the classic, most enormous symbolic effort, there are microtheories to target reasoning in particular domains, and I think that's an idea worth exploring. But I absolutely agree that if you have unbounded inference, you're in trouble. I think AlphaGo is an example where you bound the search partly through a non-symbolic system, and then you use a symbolic system there as well, so it's kind of a hybrid.

Bengio: In what way is it a symbolic system? The Monte Carlo tree search is just search; there are no symbols.

Marcus: You have to keep track of the trees, and trees are symbols. That actually brings me to a separate line of discussion I'd like to get to.

Bengio: I think it's just a matter of words. Search: we will need search, obviously, some kind of search. If you want to call that symbols, fine, but symbols to me are of a different nature. Symbols have to do with the discreteness of concepts, and that is also something important, but as I mentioned quickly at the end of my presentation, we can get discreteness not necessarily in its hardest, purest form, as you have in symbols.
You can get discreteness by having lateral inhibition among units, creating a competition such that the dynamics converge to one mode or another. This is what you observe in the brain, by the way: when you take a decision, there's a sort of competition between different potential outcomes, and the dynamics choose one discrete option over another, but in a soft way, and the brain has access to all of that soft information.

Marcus: Let me lay something else on here. I think we both think the other side is strawmanning our baby. I think you're strawmanning symbols, because lots of people have put probabilities and uncertainty into symbols; and you think, and I think it's an interesting discussion point, that I'm strawmanning deep learning, that I'm attacking the models of the 1980s. There's some truth in that, and then there's a question of what the scope should be. For both symbols and neural networks there's a question about their proper scope, and we're actually pushing toward the same place from opposite sides. I would argue that the kind of deep learning that was straight out of the 80s, which in my view continued until about 2016 (we could argue about that), was: let's have a big multilayer perceptron, let's pile on a lot of data, and hope for the best. I don't think you believe that anymore, though maybe you did at one point. That's the canonical, prototype version of deep learning, and you want to open deep learning up to a whole lot of other things. At some level that's fine; at some level I think it's changing the game. With respect to symbols, you might feel I'm doing the same. I want the discreteness of symbols, but I'm very happy to add in probabilities, as in a probabilistic, stochastic grammar or something like that; I have no problem with that. I love a lot of Josh Tenenbaum's work, which is really symbolic programs plus uncertainty. So I want to expand the umbrella of symbols, and you want to expand the umbrella of deep learning. Why don't we say: let's build deep-learning symbolic systems that expand the scope of deep learning and expand the scope of symbol systems?

Bengio: Look, I don't care about the words you want to use. I'm just trying to build something that works, and that is going to require a few simple principles to be understood. I do agree that there's lots of interesting inspiration we can get today from the work being done in cognitive science and in symbolic AI, but I think some of that needs to be reinvented. And by the way, we started doing things like attention mechanisms, and people were doing reinforcement learning, already at the beginning of this decade; attention mechanisms even date from much earlier than that, so they have been around. Another thing to keep in mind: I've been working on recurrent nets since the 80s, and the various forms of recurrent nets, including the gated ones, use very similar principles, which have been around since the 90s. So it's not a completely new thing; there's an evolution. Of course, we're doing research, so it's not like we have one algorithm and we're stuck with it; we're constantly trying to expand the set of principles that we have found to work. There's nothing wrong with that.

Marcus: There's nothing wrong with it at all. I think we should actually yield to the questions from Vincent and the public.
Sure, sure. I'll stay here. So the first question is for Professor Gary Marcus. Steven Pinker said recently in a tweet that deep learning networks are in fact shallow: they soak up patterns but lack explanation, causality, and rule-based reasoning for novel and unique situations. What is the innate knowledge needed for deep understanding, and what needs to be learned along the way? What would be the innate knowledge necessary to have deep understanding, rather than deep-learning representations? Wouldn't we like to have deep understanding, meaning the ability to handle causality, reasoning, and so on?

Well, yes. Let me find this slide for a second; I think it will bring out something interesting. I made a slide that I did not have time to show. It has a picture of a great new paper by Yoshua that we had on the reading list and that is well worth reading. It's about causality, and it's a very mathematical paper; I took what I think is some of the core math of it at the bottom. I admit I didn't read the paper as carefully as I wish I had, but Yoshua is going after causality by making some clever observations about how distributions change over time relative to interventions, which is of course the classic thing we try to do when we run experiments. He has, I think, some very clever ways of going after that within neural networks, and god bless him, I think it's great work. It's not the work I would do, but I think it's terrific; I'm just going to draw a contrast.

I'm not sure I want God's blessing, but it's okay.

On the right, just so people can see the reference, I have something from a paper that Ernie Davis and I wrote; Ernie did almost all the hard work, but I helped a little. Most people in the field right now would find it repulsive, but I think we need it. Ernie very carefully created a logical formalism for understanding something very simple: containers. I have water in this cup; if I tilt it, the water will fall out. What happens if I drop the microphone in it? Well, maybe not the electrical part, but just the physical reasoning about it. The formalism Ernie came up with, which I think is responsive to your question, broke things into time, space, manipulation, facts about rigid objects, and the histories of objects. He did a very careful analysis of the knowledge one needs in order to do this basic thing. And it's not a trivial thing, because we use container metaphors for a large fraction of the things we talk about. I don't want to say it's fifty percent, but it's significant: we can think of a lake as a container, a cup as a container, the body as a container, and so forth. The argument of the paper was that in order to make inferences about these things you need prior knowledge. There's a question about whether that knowledge is innate or acquired experientially, but the argument is that you won't be able to make these inferences unless you have knowledge about sets and objects containing regions, and those kinds of axioms: axioms about what rigid objects can do.
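For flavor only, here is a toy, hypothetical rendering of one container "axiom" in code. This is emphatically not Davis's formalism (his is a careful first-order axiomatization of time, space, manipulation, and object histories); it only shows what it means to reify a piece of knowledge such as "tilting an unsealed container releases its loose contents" so that a system can make the inference.

```python
from dataclasses import dataclass, field

@dataclass
class Container:
    name: str
    contents: list = field(default_factory=list)
    sealed: bool = False
    upright: bool = True

def tilt(container: Container):
    """Toy axiom: tilting an unsealed container releases its contents."""
    container.upright = False
    if not container.sealed:
        spilled, container.contents = container.contents, []
        return spilled
    return []

cup = Container("cup", contents=["water"])
print(tilt(cup))   # -> ['water']: the tilt-the-cup inference from the talk
```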
So one possibility here is that we need the formalism on the left in order to acquire the knowledge on the right. Another possibility is that we never need the kind of knowledge on the right, that it never needs to be reified in the way Ernie Davis proposed. My view is that we should have people working on both sides of the spectrum. People often think they're in the minority; I feel like I'm in the minority, but we can do the sociology later. I'd like to see more people working on things like this, building broad frameworks for space, time, causality, and so forth, but I totally welcome the kind of work Yoshua is doing, even if I personally don't have the skills to do it. The empirical question is whether you could derive all of this from the bottom up. Although, I feel like maybe I've been straw-manning Yoshua: I thought he was more anti-nativist than he really is, because he acknowledges evolution. So I'll say one more sentence and then turn it over.

Of course I acknowledge evolution.

I'll say one more sentence and then take it away. In my view, part of what the field should be doing is asking: do we have any priors around things like this? This is the kind of work that Liz Spelke does in cognitive development, and Renee Baillargeon and so forth. Part of the field should be trying to reify that knowledge, and part of the field should be asking: given that knowledge, and knowing something about causality, how can we learn from it?

So let me be clear for a minute. It's not that I and others with similar thinking believe that learning has to start from a blank slate. In fact, we have theorems from the '90s, the no-free-lunch theorems, that clearly say you can't have learning without some priors. What we're saying is more subtle: we'd like to get away with as little prior as possible. Now, how is "little" measured? You can measure it in bits: think about how big a program that encodes those priors would be; you would zip that program, and that would be how big the prior is. The kinds of priors I was talking about in my presentation are priors that are not going to require many bits, and so it's going to be easier for evolution to discover them. I also know full well that evolution has discovered very specific, strong priors; in fact, if you look at evolution, most of it is about completely hard-coded behavior. But those are not the behaviors that are most adaptive; it's the other behaviors, the ones that allow a species to adapt, as humans have been able to do. So it's more interesting for me to think about the part of what evolution has discovered that is more general, the most generic priors. Of course, we also have priors that are very, very specific: we kind of know how to see, and to walk to some extent, when we're born, and many animals have a lot more at birth. It's a matter of what we care about: trying to squeeze the prior knowledge into as few simple, general principles as possible. Where the right line is, of course, we don't know.

So, to use your language, you have a soft prior, which is that you want as little innate stuff as possible, as well as a meta-prior, which is "I want as little prior as possible." This is a place where we at least disagree in taste, because I don't want a huge amount, but I think I want more than you. Of course, we don't actually have a number.
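The "measure the prior in bits" idea can be made concrete in a few lines. The two "programs" below are hypothetical stand-ins; the point is only the method: write the prior down as code, compress it, and compare sizes.

```python
import zlib

def prior_bits(program: bytes) -> int:
    # Rough minimum-description-length proxy: compressed size in bits.
    return 8 * len(zlib.compress(program, 9))

generic_prior = b"apply the same local filter at every image location"
specific_priors = b"\n".join(b"hand-written axiom #%d: ..." % i
                             for i in range(10_000))

print(prior_bits(generic_prior))    # a few hundred bits
print(prior_bits(specific_priors))  # orders of magnitude more
```

On this measure, a small generic principle is cheap for evolution (or an engineer) to specify, while a large library of specific axioms is expensive, which is the asymmetry being debated.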
But let me give you my intuition. Again, I wouldn't want to have to design the semantics of each of the boxes in an AI system like this if I could get away without it. Why not? The reason we want as little prior as possible is that it leads to more general-purpose machinery that can be applied to a wider spectrum of behaviors, environments, and problems. It's as simple as that.

Well, I have two things to say there, and one is actually from Yann's work, since you mentioned it; we argued about this very thing the other day on a panel at the last NeurIPS. In this particular empirical case, having more of a prior was actually better: having a convolutional prior made the system more robust. And how big is that prior? It's like three lines of code of difference. It's not a big change in the amount of information, compared with the classical computer vision done before convnets, where you had to design the features by hand completely. Those are really very brilliant lines, right? Yann shared the Turing Award with you in part for those three brilliant lines; they were hard-won, very clever, and very valuable to the world. So maybe, you know, I've got twenty-five boxes up there; if they're three lines each, we just need twenty-four more discoveries of that magnitude. Is the genome big enough to encode all of those?

Half, or sorry, ninety-five percent of our genes are involved in brain development. I think there's room in there to encode that many, maybe ten more. There's lots of room in the genome, but clearly not enough to encode the details of what your brain is doing. So it has to be that learning explains the vast majority of the actual computation done in the brain, just by counting arguments: something like 20,000 genes against 100 billion neurons and a thousand times more connections.

That's what this book was about; it's what I called the genome shortage argument. The idea was: we only have so many genes, let's say 20,000 (we thought it was 30,000 when the book was written), and we have 86 billion neurons, so what follows? The crude version of the genome shortage argument is that we have to learn it all, but nobody in their right mind says you have to learn it all from scratch. It's partly a question of what our bid is: in the debate that Yann and I had, I put ten priors on the board, and I wanted to have twenty.

The things you put on the board, I agree with most of them. In fact, most of them are small priors; they don't require a lot of bits to be specified.

Since I don't have the slide up: those were things like spatio-temporal continuity being innate.

You have that in convnets.

Well, not the part about being able to track an object over time and know that it still exists.

Spatial continuity is related.

I mean, in convnets it's really translational invariance, and actually a bit more than that because of the pooling. But yes. The other things I had on the board were things like symbolic priors, operations over variables; those, I think, you'd be less comfortable with.

I am totally comfortable with operations over variables. It's just that the meaning of "operations" and "variables" is different for me and for you.
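Before continuing the exchange: the "three lines of code" point can be seen directly. Here is a sketch in PyTorch (the layer sizes are invented for illustration). Swapping a fully connected layer for a convolution is a tiny edit, yet it injects a strong prior, locality plus translation invariance via weight sharing, and collapses the parameter count.

```python
import torch.nn as nn

# No spatial prior: every input pixel connects to every output pixel.
dense = nn.Linear(32 * 32, 32 * 32)

# Convolutional prior: one shared 3x3 filter slid across the image.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)

def n_params(module):
    return sum(p.numel() for p in module.parameters())

print(n_params(dense), n_params(conv))   # 1,049,600 versus 10 parameters
```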
I'm thinking of operations as little neural nets that do things that are not just discrete, that manipulate rich representations; and I think of variables as indirection, passing information about references, about keys, about natures and types that can be rich rather than symbolic. But besides that, I agree with the need for references.

In a way, that's all I really want, so you've made me a happy man. And I'll tell you another time you made me really happy earlier tonight: when you talked about having reference without knowing where it comes from.

No, I'm saying the reason you need reference is to be able to know where the value you're getting comes from. That's the reason these neural nets need to propagate not just values but also names, except those names are going to be vectors as well.

You're more symbolic than I thought. I'm finding it harder and harder to disagree with you.

We'll take another question. The second question is for both Yoshua and Gary. Jeff Clune said that maybe 99% of the machine learning community is focused on what he called the manual path to AI: we manually identify the building blocks of AI, with the assumption that one day we will somehow put them all together. Do you think a higher fraction of our collective effort should be reallocated to the alternative path of AI-generating algorithms that Jeff proposed, wherein we simultaneously meta-learn the architectures, meta-learn the learning algorithms themselves, and automatically generate the training environments themselves?

Well, I like this question very much, because I worked on it in the early nineties with my brother Samy; it was essentially the subject of his thesis proposal and of his thesis, and it was one of the first papers on meta-learning. We were trying to learn a synaptic learning rule. We didn't have enough computational power to do it then, and even now, I think that to realize the kind of ambitious program Jeff is talking about, we would need a lot more computational power. That being said, I think it's a very interesting and important investigation, and I was really impressed by a presentation on this subject at NeurIPS; I find it very exciting. Personally, I'm also drawn by the desire to understand the principles that would be discovered, and when I tried doing this meta-learning of learning rules, what I quickly realized is that you can't learn something like this completely in the abstract. It really helps a lot if you put in a bit of the right structure, and to do that you need the kind of experimentation we do normally in machine learning, where you design the learning algorithm completely; that helps you figure out the right building blocks and the right inputs and outputs needed for learning a learning rule, or a system like this. Science is an exploration; we don't know what's going to work, and these are two different directions that can coexist in a harmonious way.
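To give a feel for what "learning a learning rule" means, here is a minimal sketch in the spirit of that early work; the toy task, the two-coefficient rule, and the random outer-loop search are all simplifications of mine. An inner loop trains a weight with a candidate update rule, and an outer loop searches over the rule's coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

def inner_train(lr, momentum, steps=30):
    """Train w to minimize (w - target)^2 using the candidate rule."""
    target = rng.normal()
    w, vel = 0.0, 0.0
    for _ in range(steps):
        grad = 2 * (w - target)
        vel = momentum * vel - lr * grad   # the candidate learning rule
        w += vel
    return (w - target) ** 2               # final loss: the meta-objective

def meta_search(n_candidates=200, n_tasks=20):
    best, best_loss = None, np.inf
    for _ in range(n_candidates):
        lr, mom = rng.uniform(0, 0.5), rng.uniform(0, 1)
        loss = np.mean([inner_train(lr, mom) for _ in range(n_tasks)])
        if loss < best_loss:
            best, best_loss = (lr, mom), loss
    return best, best_loss

print(meta_search())   # coefficients of the best rule found, and its loss
```

Notice how much structure is already baked in (the rule's functional form, the inputs it sees), which is exactly the point made above: you can't meta-learn completely in the abstract.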
I pretty much agree with Yoshua's answer, but I'll answer in a slightly different way. In principle, we know that evolution is a mechanism powerful enough to evolve minds, because it evolved our minds, and having the machine do the work that stands in for evolution would be great. In practice, it matters what you're trying to evolve, and I think what has happened empirically in the evolution-of-neural-networks literature is that people start with too little in the way of priors, and so they end up recapitulating some of the journey up to bacteria, but not so much of our journey from, say, chimpanzees to human beings. In principle we know it can work; in reality, having a tightly constrained problem, and probably a bit of priors to help, might make it work even better than it does. I think it's totally worth exploring.

This one is for you, Yoshua. It's about the ethics of conscious reasoning systems: there will be ethical implications of conscious and reasoning systems. How do you approach that?

I think it's important in general to ask how our work as researchers will be used, or could be used. You don't need to go very far into the future: today we already see the misuse of AI in many ways, and I'm very concerned about how we are creating tools that can be destructive and can endanger democracy and human rights. Now, the specific question of consciousness deserves more time than this debate allows. Personally, I think the kind of conscious processing Gary and I are talking about adds more computational power and intelligence to the systems we can build, but I don't think it fundamentally changes the fact that we are building gradually more and more powerful systems. There's the question some philosophers ask about whether we should eventually grant personhood to intelligent, conscious machines; I don't think we're anywhere close to understanding these issues well enough to answer such questions.

Thank you, Yoshua. For the next segment, our participants will answer questions from the audience. Evan Miller? [Music]

Thank you very much for this interesting debate, because artificial intelligence is going to solve a lot of problems that matter very widely to many people. I'm not a computer programmer, and I have several questions, but I'll limit it to two. One is that Professor Marcus said something to the effect that Professor Bengio's approach to deep learning, and his belief in it, relies too heavily on large data sets to yield answers. Why is that necessarily bad? There are large data sets, and there are ways of constructing them.

You want me to ask both questions? Let's pause there and I'll address that one. First of all, that was my impression of Yoshua several years ago; it's not my impression of Yoshua now. I think he's doing a lot of exciting work, and he's right that some of it started a while ago. But my impression when I first talked to him, and I had friends at the linguistics conference who shared it, was that when we would come to him and say, "the kinds of systems we have right now can't solve this," his answer was often, "when we get a big enough data set, we'll be able to cover that." I had some quotes on the slides showing that. I think there are many people who are more extreme about this than Yoshua ever was: there's a branch of machine learning where people think the answer to a particular problem is really about getting the right data set. I think Tesla's approach to driverless cars is more or less like this. They say: we've got the most data, and we have very cool ways (and they do) of gathering data about a particular kind of accident when it happens. It is very focused on the data
and not so much on certain kinds of innovations in algorithm space that I would like to see. I have no objection to gathering more and more data. Getting clean data is really, really important; people often underestimate the value of good, clean databases, and the field was driven forward by having bigger databases. No problem with any of that, but the answers aren't just there. In Yoshua's terms, we need System 1 and System 2, and I would like more people working on System 2. Maybe we disagree a little about the execution, but I think we agree that we need some of that, and not just the System 1 stuff plus bigger databases.

I want to say a few things about data, because I didn't answer this quote you attributed to me. I'm interested in the small-data regime to the extent that we also have a lot of data before we get to that point. Humans learn a new task after they've seen a lot about the world; there's no chance you will be able to learn in a meaningful way without a lot of knowledge about the world acquired previously. So we need both. We need large data in some sense, a lot of examples if you like, for the baby AI to mature, and then it can face new tasks very quickly. That's one thing. Also, more on the industrial side: if today I were leading a company or a project, I would use as much data as I can, because that's the thing that works well right now. But if you're looking further down the road, a few years out, and asking what kind of improvement to our current algorithms would be most interesting for industry or for any application, then transfer-learning problems, where you face a new task with little data but you have pre-trained on many other things, are more where the research is right now. The two things are not incompatible; it depends on whether you're working on the short term or the long term.

I found just a tiny bit to disagree with, but mostly I agree with what Yoshua said. First let me explain what the small-data regime is, because not everybody will know what Yoshua meant by that. There are problems where people learn things with small amounts of data. Yoshua would say that's because they have a lot of experience elsewhere, and that's often the case. In any case, the small-data regime is: how do you learn something if you don't have ten million data points? If you're my kid and you learn a new game in five trials, how do you do that? Clearly, some of it is leveraging prior experience. The one thing I'll add, and the reason this is a half-disagreement, is that the reason I did that baby experiment back in 1999 was to show that there are some things little kids can learn without much direct experience. I made up the language, so they had no prior experience with it, and the habituation, the period where they learned the made-up language, was only two minutes. They got something like 45 examples of the made-up language, sentences like "la ti ti" and so forth, for two minutes, and yet they were able to do this. And then, as happens in developmental psychology, when you show that kids of a certain age can do X, somebody else says, "now I've got even younger kids doing it." Somebody later showed that newborns could do what I had shown in the 1999 Science paper: kids extracting rules.
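The structure of that result is easy to show in code. This toy is mine (the real experiment measured infants' listening times after two minutes of habituation), but it illustrates the operations-over-variables point: a learner that can check an identity relation between variables needs only a handful of examples and generalizes to syllables it has never heard.

```python
def fits_abb(sentence: str) -> bool:
    # The candidate rule: last two syllables identical, first one different.
    a, b1, b2 = sentence.split()
    return b1 == b2 and a != b1

familiarization = ["la ti ti", "ga na na", "wo fe fe"]  # a few examples
assert all(fits_abb(s) for s in familiarization)        # rule: A-B-B

# Generalization to entirely novel syllables:
print(fits_abb("ko ba ba"), fits_abb("ko ba ko"))       # True False
```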
It's not a perfect experiment; there's a control missing. But there's pretty good evidence that even newborns can do this. So in this particular case, what you have to draw on is not experience outside the womb but the experience we get indirectly from evolution. Some of the problems we solve in a small-data regime come about because we have priors for variables and things like that. Next question. Now I can't call on my friends, that's not fair; someone else, then.

Thank you for your presentations. Dr. Marcus, you talked about compositionality and the need to take it into account from a linguistic point of view. We have had debates and arguments about compositionality, and there was some progress in neural nets: the recursive neural nets for compositionality. However, those efforts have been abandoned; we don't do research on recursive nets anymore. I think the argument is that we need the parse tree, we need the knowledge to feed into the recursive network, to design the architecture and to form the network. I think there is a resistance here: the deep learning community is not willing to take in any external knowledge in the form of linguistic structure or parse trees. Dr. Bengio, would you please elaborate on that?

I don't think it's a resistance so much as an obsession with beating the benchmark, which could be good or bad, because these very large, fairly simple architectures have been working so well. A good example now is the success of transformers. Transformers work incredibly well, and they actually use these key-value pairs I was talking about; they operate on sets. Recursive nets were one attempt; there have been others that were more successful, and maybe recursive nets will come back. We don't know; the history of science is complicated, as we've seen with deep learning. So I don't read the sociology of the current deep learning field the way you do. There is actually a lot of interest in exploring how we can put architectural structure into neural nets to facilitate the manipulation of language and reasoning, so I'm much more optimistic than you seem to be.

I would say that historically there has been a resistance, and I think that's changing somewhat. It's partly a function of people having tools that are good at particular things, and we don't really have deep learning tools, except maybe in the extended sense, for dealing really well with recursion and compositionality in the sense I'm describing, yet. But there's been much more hunger in the field in the last two or three years to do it. In terms of transformers: I just gave a talk at NeurIPS on a new benchmark, on dynamic understanding (you can probably google for it online), and the basic point I made about transformers like GPT-2 is that they produce very fluent speech, or fluent text, but they don't understand what's going on. I have an example here from a slide. They're often plausible for the first few sentences of, basically, surrealist fiction. Across the street from NeurIPS, I fed one of these systems "two unicorns walk into a bar," and the system continued the passage (at least, that's what my picture shows) with: "I've never seen such a multicolored beautiful forest of sapphire eyes on the same corner of the street in a bar before."
It's, like, fabulous: it has created this surrealist prose. On the other hand, when I forced it into the nonfiction genre, it looks a bit ridiculous. The example I had on the right is: "Two lemurs walk on a road" (I was actually in a place with lemurs and roads recently) "and another joins. The total number of lemurs on the road is..." You're supposed to add up two and one and come up with three, and if you're a human you probably do that; but if you're a deep learning system, you might come up with something like 100 (it claimed about 80 or so). The system doesn't convert, and this goes directly back to your question, the statistical prediction it's making about plausible classes of words into a direct representation of the individual entities involved. If you want to watch my benchmark talk, it's full of examples like that; I actually have them on the next slide. I test things like conventional knowledge, definitions, transformations, atypical consequences, and so forth, and I have data from these models on the right; they're typically doing something like thirty percent, or ten percent, or thereabouts. So there are sharp limits, and I think those limits exist because we don't yet have anything like a parse tree on the output, and we need that.

Okay. My name is Mr. Barry. The Turing machine dates from some seventy-five years ago; it's a virtual machine, and with binary mathematics we achieved great things. Now, today, we have quantum computing and quantum computers, which are closer to the way the human brain thinks: something can be right or wrong, or both, almost at the same time. Could quantum computing and quantum computers represent the breakthrough we were waiting for to achieve artificial intelligence in the best way?

Maybe so, but you know, I'm a big fan of Occam's razor. If we can build intelligent machines, and explain how the brain works, without having to go quantum, I think it's very satisfying to go for the simpler solutions. In terms of neuroscience, most of the community thinks the brain can do its computation without requiring quantum effects; I mean, of course there is quantum computation in the sense that molecules operate in a quantum way, but if we abstract one level up, it's all computation that is not quantum in nature. Of course we don't know, and I don't have a crystal ball, but at this point I think the majority of the community, both in neuroscience and in computer science, is betting on traditional, non-quantum computing. Another thing I want to say: right now there are not many algorithms that can be efficiently accelerated by quantum computing, and no serious machine learning algorithms, like deep nets and so on, among them. If people find the right theoretical breakthroughs that enable us to implement things like deep nets in a way that takes advantage of quantum capabilities, then it would change the game. It hasn't happened yet, but it's something we can watch for.

I pretty much totally agree. My friend Sandy is going to ask a question.

I'm going to start my question with a little anecdote. When a bunch of journalists interviewed the scientists who created the nuclear bombs, one of the things they profoundly stated was that they were so involved in the science, they didn't even think of the ramifications.
So I'm listening to you two geniuses here, and I'm not even going to pretend that three-quarters of this isn't sailing right over me, but one thing that disturbs me is that I don't hear a single word about the checks and balances and the ethics that go into creating the algorithms behind all of this AI. As somebody who is not an AI, who is a human in this world, I find that incredibly disturbing. I'm sure Gary has heard me say things like this before, but I'm raising it again because I would love to hear you both address it.

The first thing I'll say is that of all the people in the field you could level that accusation against, I think Yoshua is the least appropriate target, because he thinks pretty deeply about this, and I'll let him speak to his own version. My own version, in the book Rebooting AI, was to argue that common sense could be a way of building a framework in which machines could represent values. Think about Asimov's laws: you want robots not to do harm to people, and one of the things we talk about is how you get a computer to even think about what harm to a person would be. It's one thing to get a computer to recognize a picture of an elephant after it has seen many other pictures of elephants; you can't really do the same trick for harm, because harm takes many, many different forms. It's not really about the way the pixels fall on the screen or the page. So a lot of the argument Ernie Davis and I made about this set of issues was that we need to rethink how we get knowledge into these systems, and the nature of knowledge, as a platform for programming in the values we want. That's how we thought about it. I don't think it's a full answer, but it is how we thought about it, and I'll turn it over to you.

Thanks for raising that question; it's very important. Gary and I have been talking about something maybe a little technical from your point of view, about where we think research should be going in terms of how we build smarter machines, but it is at least as important that our society invest even more in the question of how we are going to deploy these things: what the responsibility is of everyone in the chain, from the researcher to the engineer, to the people doing auditing, to governments drafting regulations, to make sure we steer our boat in a direction that's best for humanity, best for citizens. I'm very concerned that we're building tools that are too powerful for our collective wisdom, and I'm fine with slowing down the deployment of AI. I think governments are not yet ready to do the proper regulation, and we need to spend more time talking about how AI can be abused to influence people, to control people, to kill people. These are all very serious issues: discrimination, killer drones, advertising, social media, deep fakes. Basically, right now it's the Wild West, and we need to get our act together quickly. I'll mention one thing: here in Montreal, we have been working hard on this question, and last year, after two years of work involving not just scholars but also citizens, we came out with a thing we call the Montreal Declaration for the Responsible Development of AI. I invite you to check it out online; we're pushing these ideas to the Canadian government. Many frameworks have been developed around the world to try to think about the social norms we need for this deployment of AI.
Now, a lot of it is in the hands of governments and the agencies that oversee the specific sectors where this technology is being deployed; it's also in the hands of the UN if it involves, for example, military deployment. And for that to work we need the media, and we need people to voice their concerns.

I'll just add one thing, because I think we have to go to the online questions, but I want to amplify the point about the Wild West. A good way to think about it: right now, a driverless-car manufacturer can put basically anything on the road. We can sue them after the fact if they cause great harm, but there are essentially no regulations about what you can do with a driverless car. Compare that with how much trouble it is to introduce a new medical test or a new drug, and how much regulation there is there: it's an asymmetry that I don't think makes a lot of sense. And I'll give a shout-out to my friend Missy Cummings, who was on a podcast a few weeks ago, I think with Azeem (I'm blanking on his last name, forgive me; it begins with an A, Azhar or something like that; Exponential View, or something like that), talking about this issue, the asymmetry in regulation between what's required for health and what's required for AI. I think Yoshua and I agree that there needs to be a lot more there.

Are there questions from the international audience? Yes: for the last segment, our participants will answer questions from the internet audience. You want me to read the question? I think you're supposed to pick a question. So what we will do is project the questions from the international audience on this screen (stand by once again for AV), and I will give you a mouse to scroll and choose the questions.

All right, I'll just scroll down. I'll do this one on symbols very quickly, while Yoshua picks the next one. What's my definition of symbols? I don't think we should waste time arguing about that. From the perspective of symbol manipulation, the real question is: do we have operations over variables? You can define "symbol" in such a way that it encompasses everything or nothing, and I don't think that's where the debate should be.

There's a question about the chances of AI possessing self-consciousness. I think this is a very interesting question, but it's also very loaded, because we all have our own ideas of what consciousness means; we think we have something special. What I can say is that it's something that scientists in neuroscience, cognitive science, and machine learning are fortunately starting to think about, and hopefully we can remove some of the mystery and magic from it, so that we are better equipped to answer these kinds of questions later.

You can scroll down and up. (Yoshua is driving a mouse here; there's no mouse wheel, but that's okay.) These questions are too long. "What is the best way to reproduce the levels of conscious and unconscious thinking in AI?" Well, that's what we're actually arguing about, and the honest answer is: we don't know, and that's why we need many different researchers exploring different ways of doing this.

"Gary Marcus thinks that deep learning and symbolic AI are compatible and can provide the best of both worlds. Is there any evidence?" I think the best evidence we have for that is that there are
some people building actual hybrid models in the real world that do useful things. None achieves human-level intelligence; no deep learning system does, no symbolic system does, no hybrid system does. But systems like Google Search do things that genuinely help us, and they are very much hybrid systems. And then you have results like the Josh Tenenbaum results I showed briefly, where, at least in a very controlled environment, a hybrid system can beat a deep learning system or a symbolic system on its own. It's still an open argument. I don't think in the end that either Yoshua or I would say we have the answers here; we're trying to lay out what we think the territory is that people need to explore. The biggest take-home message, as I said on the slide, is that we actually agree a lot about what that geography is. We have some differences about where to go in the exploration, and neither of us thinks we've reached the destination, by any means.

I want to take the question about language understanding: is language understanding a form of intelligence? We clearly need better language understanding for AI, and there are really interesting connections between language understanding and reasoning, but they are really different things. I listened to a presentation at the last NeurIPS by Ev Fedorenko, a cognitive neuroscientist, and what she found with her colleagues is that there is a language area in the brain, and it processes everything connected to language, but it doesn't do the other things one might think are related to language, like reasoning; other areas do those. That's also connected to the bigger picture of language I've been talking about. Language has syntactic, structural aspects, but the semantics, what language refers to, is what people call common sense; grounded language understanding refers to general knowledge of how the world works. This is a very active area in machine learning: people, whether they work on language or not, are looking at how learning systems that interact with their environment can build better models of the world, and if we don't do that, we'll never have good language understanding. This connects with some of the things Gary said about the limitations of transformers. Transformers work incredibly well; they are the best things we know right now for processing language on quantitative benchmarks. But, as you said, they make what I call stupid mistakes, and I think one of the missing ingredients is that they don't have a world model. They might actually build quite a bit of a world model through reading text, but there's a lot about the world that I think you can't get just from reading text. Maybe this is a place where Gary and I disagree: I think there's a lot of knowledge about the world that is intuitive, for example intuitive physics, and difficult to put into words. Of course physicists do it, but babies don't, and yet they have an intuitive understanding of physics. So to do good language understanding, I think we need machines that understand how the world works in a physical, intuitive sense; these two things need to be tied together, and that's connected to the link between System 1 and System 2 that I was talking about.

Mmm. I think we could probably talk for the rest of the time about this one question;
there's a lot of interesting material there. The first thing I will agree with Yoshua on (I've lost track; this is tracking identity over time) is that language and reasoning are clearly separate things, but not fully separate. There's wonderful work, for example, from Mike Tanenhaus and John Trueswell showing experimentally that people reason about the world at the very moment they're processing language. If I give you an ambiguous sentence, you will look at the things out there in the world that can help you disambiguate it; you will reason, "is the cup on the table, or on the towel?" and put it all together into an understanding of the sentence. So it's hard to draw a sharp line; interesting work notwithstanding, there's certainly overlap. On the other hand, a very clear example of how important all the physical-reasoning stuff is would be any primate that's not a human. Think about all the physical reasoning a chimpanzee can do without any language at all (we could argue about the ape-language studies, but I don't find them very compelling). You have species that can navigate their way through trees, that have very complicated social interactions, exchange, and all of these things, without any language, and I think we would both be thrilled if, before we leave this mortal coil, we were able to build AI systems that could do what chimpanzees do. Now, I have a personal interest in language, having studied it for most of my career, so I'd really like to see us get language right. I think having the world model is a prerequisite, and it's really hard.

Let me take this question: how do you define reasoning, and what do you think deep learning will or will not be able to solve? Of course, people were tackling reasoning long before neural nets became hot and came to be considered potential tools for reasoning. The way I think about how deep learning can do reasoning is connected to what I described as dynamically recombinable pieces of knowledge that we can search through. We can search through which piece of knowledge can be combined with which other piece of knowledge in order to find an answer to a question, and those searches are heavily guided by our intuition, so we know where to search. Reasoning is about looking for coherent solutions to a problem, to a question. There is another way of thinking about reasoning that I find really appealing, which dates back to the early-eighties neural network work of Geoff Hinton on Boltzmann machines, and which you find again in modern graphical models. You can think of reasoning as finding a configuration of the random variables, the variables that may provide answers to your question, that is most compatible with everything you already know: the configuration with the highest probability given all the facts you've fed the machine. With Boltzmann machines, you try to find it through a Markov chain, which searches by looking for a low-energy configuration, gradually changing the configuration until you find something good. I think something like this could make sense for the kind of unconscious reasoning that we do.
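To make the "settle into a low-energy configuration" picture concrete, here is a tiny Boltzmann-machine-style sketch. The three units and their couplings are made up: units 0 and 1 support each other, unit 2 conflicts with both, and stochastic Gibbs updates drive the state toward the most mutually compatible, lowest-energy configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
W = np.array([[0.0,  2.0, -1.0],
              [2.0,  0.0, -1.0],
              [-1.0, -1.0, 0.0]])   # symmetric "compatibility" weights

def energy(s):
    # Low energy = the units' states are mutually compatible.
    return -0.5 * s @ W @ s

def gibbs_step(s, temperature=1.0):
    i = rng.integers(len(s))
    gap = W[i] @ s                   # net input from the other units
    p_on = 1.0 / (1.0 + np.exp(-gap / temperature))
    s[i] = 1.0 if rng.random() < p_on else 0.0
    return s

s = rng.integers(0, 2, size=3).astype(float)   # random initial "beliefs"
for _ in range(200):                            # background search
    s = gibbs_step(s)
print(s, energy(s))   # typically settles at [1, 1, 0], the lowest energy
```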
We all have the experience of asking ourselves a question, not getting the answer back right away, moving on to something else, and then maybe a day later, or the morning after, the answer comes to you. Something has been happening during those hours, in the background. It's harder to characterize, but it may plausibly be this kind of energy minimization. Now, the kind of reasoning we do consciously is very different: we don't consciously experience going through thousands and thousands of possible configurations; we immediately search through a few things that happen to be very relevant. So I think we need two kinds of reasoning.

I think, again, we agree on the two kinds of reasoning. Consider what you might call plain deep learning networks, what I call pure deep learning: I would say those can't do what you call System 2 reasoning. I would also say that right now the best system for doing System 2 reasoning is the much-maligned Cyc. People might want to look at an article in Forbes where Cyc is given Romeo and Juliet (not straight text, but put into a computer-interpretable well-formed representation; you could argue about whether that's fair) and it can make some interesting inferences about the characters' motivations and so forth. That's a symbolic system, and I would say that the richness of inference available to symbolic systems is, for now, ahead of deep learning. But I will also grant Yoshua that it doesn't work that well; I mean, in a narrow domain, actually in many narrow domains, it can work to some extent. I certainly don't want to say it's the answer, but it's a proof of concept of what you can do.

And (I'll give you a second to come back) it's very unlike how the brain does it. Your brain doesn't go through zillions of trajectories.

I will agree on that point. But hold on: the contrast I wanted to draw is that we have a system that I think doesn't really do reasoning at all, the pure deep net, the multi-layer perceptron with none of the attention mechanisms.

I disagree. I mentioned Boltzmann machines; they do just that.

They're not going to be able to reason over quantified statements and so forth, at least not that I've ever seen.

Well, they haven't been explored recently, but this is essentially what they were designed for.

We can place a bet (I'll look it up afterwards) about Boltzmann machines and their ability to deal with quantification, "every" and "some" and so forth. I would say, by and large, that the results of extant neural networks on reasoning are not as impressive even as that example from Cyc. But I would also say, and I was going to give you this point before we come back, that if you take the broader notion of deep learning Yoshua would like to defend, and you start putting in mechanisms for attention and indirection and so forth, which come at least a little bit close to the things I want, then all bets are actually open. We don't know yet what the boundaries are once you include mechanisms like indirection. We know some of the things we can do there, but there's a lot in classical reasoning that I don't think has really been addressed yet (other people are more expert in that than I am). Even just dealing with quantified sentences: how do you deal with "everybody loves somebody" and its ambiguity? We haven't really seen that yet.

There's a question here: what do you think of transferring structured rules, in the form of first-order logic, onto network parameters, as opposed to encoding the information in latent variables? Well, this is
actually the kind of thing people were trying to do in the '90s: creating a direct mapping between representations of knowledge in the weights between neurons and the rules of logic. Personally, I don't think that can work, for a number of reasons. On the other hand, what I think can work, and in a way we're already doing it, is neural nets that acquire knowledge by reading documents, just as humans do, or by reading databases and knowledge bases. This is something we could do a lot better; right now I don't think we have the right tools for it, but we're making progress, and the kinds of things people are using now, transformers, can I think be evolved into what we need, especially if we couple them with better world models. So it's clearly still an open question.

I think some of the most interesting work on that specific question right now is being done by Artur Garcez, who is trying to build hybrid models where you have explicit representations of logical formalisms and you can map them onto a deep learning system. It's still early days for that work, but it's interesting; it's another perspective.

There's another question here: how do you make sure that the initially injected assumptions still hold after training, and how do you overcome catastrophic forgetting? For the first part, I think we just gave, or at least I gave, a partial answer. The second part, about forgetting, is very important, and it's connected to some of the things I was talking about when I mentioned factorizing knowledge into pieces, so that when there's a change in the world, a change in task, a change in distribution, a new piece of knowledge gets added, and it doesn't require the whole system to be adapted, only the few parts of it that explain the change. If we're able to do that, which for now we've done only on a very small scale, then I think we can overcome catastrophic forgetting: we can build systems that adapt in just the necessary ways, without every neuron and every weight trying to take part in explaining the change that just happened, or the new task.

If I could push some of Yoshua's fantastic students, who are probably sitting in this room, to study one question they might not be studying so much right now, it would be that first question: how to inject knowledge into deep learning models and frameworks.

There are a lot of people in the field thinking about this. Actually, we had a paper a few years ago that does the kind of thing I was talking about earlier: a language model that, while it's reading text, while it's reading a question or trying to complete a sentence, looks things up in a structured knowledge base, with subject-relation-object triples, like standard relational databases. It looks for the words it has seen, or their equivalent synonym representations, in the knowledge base, and then uses an attention mechanism to pick the pieces of knowledge in the knowledge base that can help it predict what the next word should be. This was done with Sungjin Ahn, and it has been published. What it allows is models that do their normal neural thing, but as they are computing they are allowed to go and check for information that they don't already know, that is not already integrated into their inner brain, and use that information to answer questions or predict something.
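Here is a bare-bones sketch of that lookup idea. The vocabulary, random embeddings, and two-triple knowledge base are all invented (the published model learns its representations); the mechanism shown is just attention scoring each stored triple against the current context, so the best-matching object can be handed to the predictor.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["montreal", "canada", "paris", "france", "is_in"]
emb = {w: rng.normal(size=8) for w in vocab}   # made-up word embeddings

kb = [("montreal", "is_in", "canada"),   # tiny knowledge base of
      ("paris", "is_in", "france")]      # subject-relation-object triples
keys = np.stack([emb[s] + emb[r] for s, r, _ in kb])
objects = [o for _, _, o in kb]

def kb_attend(context_words):
    query = np.mean([emb[w] for w in context_words], axis=0)
    scores = keys @ query
    att = np.exp(scores - scores.max())
    att = att / att.sum()                # softmax over the stored triples
    return objects[int(att.argmax())], att

# "Montreal is in ..." -> attention should pick the Montreal triple.
print(kb_attend(["montreal", "is_in"]))
```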
I'm going to raise a technical issue and then make an advertisement. The technical issue (you don't have to answer it now) is: I wonder how well that works with quantified statements and negation, as opposed to triples.

At that time we were not looking at that, and I think it's only recently, with attention mechanisms in the form that involves indirection, that you can start thinking about quantification. Quantification, the way I interpret it in a neural-net sense, is essentially that you have these little modules, which in your world you would call rules, and that's fine, except they are not symbolic rules: they're more like little nets that are allowed to do inference on some variables given other variables, and the inputs of those rules don't always have to come from the same place. The inputs have types, just as functions in a programming language like C++ have types, and they expect their inputs to have the right type to match; when a rule matches what is in the data, it can be triggered, just as in production systems.

I'll get to the advertisement in a second, but if we could work on one thing together, that would be it. The advertisement is that Vincent and I are going to put together a set of readings after the debate so people can follow up on some of the issues we talked about today, and my first nomination is the paper you just mentioned.

Okay: if we are to build real-world AI systems, how feasible is the current practice of training deep learning networks end to end, knowing full well that they are going to be huge? For me, the end-to-end thing is mostly a problem when you consider biological plausibility, because there's a long delay for information to propagate from one part of your brain, say the back, to the front part, so the number of back-and-forth exchanges that can happen in the time you take to answer a question, about half a second, is very small: you can go back and forth a couple of times. So it would be reasonable to assume that, although there is coordination at a global level, a lot of the learning involves local effects. There has been a lot of interesting work in deep learning (I don't think we've solved this problem) where people try to predict the gradient that would eventually come back if you were doing end-to-end learning, and then use that prediction to start the learning locally. If you look at reinforcement learning systems, they use that kind of trick as well: predict the reward you will get and use the predicted reward as an intermediate, local reward. So I think there's something interesting in decentralizing this kind of learning. There are also more pragmatic explorations, things like federated learning, where people are trying to build deep learning systems that can learn on local nodes, on people's phones and things like that, without the data on those phones necessarily traveling to some central repository. This raises all kinds of interesting questions. And there are a lot of interesting connections to multi-agent learning: one of the hot topics in machine learning these days is how to have multiple neural nets interact with each other, each learning from its own objective function, but with a social thing going on where together they're trying to solve the problem. So I think that's another way you can decentralize the learning, into a kind of society.
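For the federated-learning point specifically, here is a bare-bones sketch (toy linear model, made-up data). Each node runs a few local gradient steps on data that never leaves it, and only the weights travel to be averaged, in the style of federated averaging.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

def make_node_data(n=50):
    # Each node's private data; it never leaves the node.
    X = rng.normal(size=(n, 2))
    return X, X @ true_w + 0.1 * rng.normal(size=n)

def local_sgd(w, X, y, lr=0.05, epochs=5):
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of squared error
        w -= lr * grad
    return w

nodes = [make_node_data() for _ in range(10)]
w = np.zeros(2)
for _ in range(20):                              # communication rounds
    local = [local_sgd(w, X, y) for X, y in nodes]
    w = np.mean(local, axis=0)                   # only weights are shared
print(w)                                         # approaches [2, -1]
```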
Of course. I think we're almost out of time, so I'll add one thing to that, and then I think Vincent goes next. I think that modularity, in the sense Jerry Fodor was talking about once upon a time, having individual components tuned to particular things, is at the heart of how the brain works. It's not fully modular, but I think the most amazing thing in the imaging literature, in the scans of people's brains, is the way the brain, to connect to a phrase of yours, dynamically reconfigures itself in the course of anything we do. You can tell someone coming into a scanner experiment, "okay, what you're going to do now is take the glasses and put them on your head every time you hear the word blue," and in the space of three seconds people dynamically reconfigure their brains to be able to do that. I don't think end-to-end deep learning captures that.

But some of the dynamic reconfigurability that Yoshua is adding is deep learning too; I mean, there's some deep learning there.

Is it seriously deep learning that allows you to do the reconfiguration?

Yeah. What is it? Deep learning with gates, and we've had gates since 1989 or something like that. We can argue about that; of course we've made progress, but gating computation is not a completely new idea. Neuroscientists have been talking about neuromodulators forever.

No. Just remove from your brain the idea that deep learning is a 1989 MLP with feedforward connections. That is not deep learning, sorry.

We can argue about the scope. I will end with an agreement, which is that I think the gates are the solution. The question was framed in terms of end-to-end deep learning, and end-to-end deep learning is typically the closest thing to the kind of thing I'm critiquing.

Not the state of the art; not today's deep-learning state of the art. That's not what it is.

I'm all for the gates.

Distinguished participants, ladies and gentlemen, we just had a hugely impactful AI debate. Many thanks to Gary Marcus, to Yoshua Bengio, and to Mila for hosting us.

I want to thank Vincent for even having the idea to do this, and Yoshua for being gracious enough not only to do it but to host it. [Applause]

The conversation will continue on social networks with the hashtag #AIDebate. A decade that has revived the field of AI is ending with this AI debate; a new decade will soon begin, with the best yet to come for AI. Stay tuned for the announcement of follow-up events and for the unveiling of the next Montreal.AI world-class event. Good night, everyone. [Music] [Applause]
Info
Channel: Montreal.AI
Views: 21,601
Id: EeqwFjqFvJA
Length: 122min 24sec (7344 seconds)
Published: Thu Dec 26 2019