#54 Prof. GARY MARCUS + Prof. LUIS LAMB - Neurosymbolic models

Captions
producing street talk for all this time now has given me a deep interest and understanding of what it means to think and what it means to be intelligent machine learning systems typically learn to approximate functions by relating input variables to output variables in a process that judea perl has likened to curve fitting programmers on the other hand define their algorithms independently of training data purely in terms of operations over variables the programmers have confidence that their programs will work in almost all situations humans can pick up new skills and assimilate new knowledge with a small amount of new information this is the so-called kaleidoscope effect which is to say being able to cast previous experience into many new types of situations in experience space we do this by building new models on the fly by extrapolating from abstract prior knowledge coming up later on in the show 60 minutes we're going to try to catch us out and prove that we uh secretly bugged jeff and tao what's kind of a workable definition of abstraction going also here beyond the buzzword of you know we need abstraction i i do abstraction you do abstraction like when does a system do abstraction and what would you accept like if i came to you and said i have gpt4 and it does abstraction what would you accept as a test that that is happening professor gary marcus says we want artificial intelligence we can trust in our homes on our roads in our doctor's offices and hospitals in our businesses and in our communities robust artificial intelligence while not necessarily superhuman or self-improving can be counted on to apply what it knows to a wide range of problems in a systematic and reliable way synthesizing knowledge from a variety of sources such that it can reason flexibly and dynamically about the world transferring what it knows in one domain into another context in the same way that we would expect of an ordinary adult if we cannot count on our ai to behave robustly then we shouldn't trust it gary concedes that we're missing some fundamental ideas about how to deal with the combinatorial complexity of reality and how to integrate knowledge that we have about the world with learning we use a lot of abstract knowledge all the time but all of the intelligent systems which have been created to date are narrow which is to say there's almost no adaptation to novelty imagine a chess algorithm that broke if you change the board size by one without having to retrain it from scratch all of the latest efforts from deepmind would fail on this simple hurdle i think gary really nailed it when he said transferring what an ai learns from one context into another this is the fundamental thing which ai systems today are missing professor gary marcus is a cognitive scientist he views cognition as a kind of cycle humans take in perceptual information from the outside they build internal and possibly incomplete cognitive models based on their perception of that information and then they make decisions with respect to those cognitive models gary passionately believes that if we don't do something analogous to this we won't succeed in our quest for robust intelligence though i was brutally repeatedly frequently attacked for it for saying that these systems don't abstract very well that there's a real problem with extrapolating beyond the training data that replicability was a problem that there's no real semantics there um etc i think all of those are now actually received wisdom they're now in fact if you watch 
benjio's recent talks they're basically the the introduction to his talks are those things that i said and you know lacoon has turned around even he has actually turned around and he he's the one who brought to the world's attention the gpt-3 suicide example so even like my fiercest critics have actually turned in the last couple of years the iannick said it kind of amused me a minute ago it's easy to make the argument about semantics well no it wasn't for 20 years every time i made it i was accused of being a terrorist and a bad person whatever gary advocates for a four-step program the initial development of hybrid neurosymbolic architectures followed by construction of rich partly innate cognitive frameworks and large-scale knowledge databases followed by further development of tools for abstract reasoning over such frameworks and ultimately more sophisticated mechanisms for the representation and induction of cognitive models he thinks that effective cognitive architectures are likely to look more heterogeneous with specialized modules he also thinks that we're going to have to redefine what we even mean by learning he thinks the new modality of learning is more abstract it will include more language like generalizations and would be relative to knowledge and cognitive models incorporating reasoning as a first-class citizen in the process he thinks that we've just been skipping too many steps we need to start with a bedrock of cognitive primitives like operations over variables and attention mechanisms and then we need to learn to recombine them gary says that the knowledge gathered by contemporary neural networks remains spotty and pointillistic arguably useful and certainly impressive but never quite reliable gary says that common sense is open-ended winning at a game of go is entirely different to solving an unexpected planning problem in self-driving i'm sure that gary would say that self-driving car research has been an egregious waste of time and money it's almost criminal gary also cites the lack of robustness of neural networks in respect of their training regime most human learners learn the same knowledge and language despite highly varied circumstances neural networks are sensitive even to the order in which the information is presented and many other features of the training regime there have been some really cool ideas lately in the neural network space but gary thinks that stronger medicine may be needed to acquire represent and manipulate abstract knowledge and to use that knowledge in the service of building updating and reasoning over complex internal models of the external world also joining us this evening is professor louis lamb perhaps the most recognizable name in the neurosymbolic space lewis thinks that we can integrate logical reasoning and neural networks he thinks that we can learn discrete structures and thus we can combine the symbolic and connectionist worlds lewis is a huge advocate of relational learning although discrete relations lead to gradients which are not easy to deal with in neural networks lewis points out that there are already hybrid models everywhere in production look no further than google search for example lewis argues that we need abstraction we need interpretation and rigorous semantics as a foundation instead of just focusing on these shallow correlations he says that formalizing core knowledge is still a very open problem in the field lewis has done a lot of really cool work with graph neural networks by the way he's also concerned 
about these tribal divisions in the machine learning space at the moment do you remember pedro domingos he spoke of the six tribes in artificial intelligence well it was actually five tribes i took the liberty of adding cybernetics as the sixth myself unfortunately a lot of this conflict is borne out of competition over obtaining grant money and powerful positions all this formal background is in symbolic ai you cannot ignore that when you look at the history of computer science as gary said even mcculloch and pitts when they provided perhaps one of the first neural network models one of the things that they showed in the paper was how the kind of networks they proposed carried out boolean reasoning logical reasoning gary marcus and luis lamb think that we can address the current problems in ai using a modern set of techniques they think that we can retain the benefits of deep learning but again promote rich cognitive models to be first-class citizens like they once were these days the focus is primarily on learning with these continuous geometric models but marcus and lamb think that it must be part of a broader coalition which is amenable to symbols core knowledge and reasoning there is simply something profoundly missing from deep learning right now even the most sophisticated natural language processing models fail to demonstrate a scintilla of understanding we need systems that do more than dredge immense data sets for subtler and subtler correlations in order to do better and to achieve safety and reliability we need systems which have a rich causal understanding of the world and that needs to start with an increased focus on how to represent acquire and reason with abstract causal knowledge and detailed internal cognitive models gary says that it's a fallacy to suppose what worked reasonably well for domains such as speech recognition and object labeling which largely revolve around classification will necessarily work reliably for language understanding and high level reasoning deep learning models fail to represent the richness of the world and lack even any understanding that an external world exists at all my main observation is that both of these gentlemen are far more moderate than i would have expected they both seem to be advocating for a hybrid approach but with connectionist models as the first class citizen the main reason for this seems to be the learning which is enabled with stochastic gradient descent this is the best thing about neural networks the thought of exhaustively searching a discrete program space is enough to put chills down even the most hardy spines i mean personally i'm quite excited about the work that josh tenenbaum has been doing going completely discrete first you know and searching in this type 2 space my worry is that combining symbols and neural networks or you know a so-called sub-symbolic or neurosymbolic approach might mean that we end up with all of the problems that we had with symbol systems and lose many of the benefits we already know from chollet that neural networks are only capable of type 1 or interpolative generalization on a learned smooth manifold they cannot perform type 2 or discrete or extrapolative generalization these models do seem to allow us to represent and embed discrete data in a learnable continuous model but what happens then do we lose the ability to reason unfortunately once you project data into a vector space it's irreversible and undecidable
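To make the McCulloch and Pitts point above concrete, here is a minimal sketch (my own illustration, not code from the show) of how threshold units over binary inputs carry out boolean reasoning, and how composing such units gives a small logical circuit:

```python
# A threshold unit in the spirit of McCulloch and Pitts: it fires (returns 1)
# iff the weighted sum of its binary inputs reaches the threshold.

def mp_neuron(inputs, weights, threshold):
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

def AND(a, b):
    return mp_neuron([a, b], weights=[1, 1], threshold=2)

def OR(a, b):
    return mp_neuron([a, b], weights=[1, 1], threshold=1)

def NOT(a):
    # inhibitory weight: the unit fires only when the input is 0
    return mp_neuron([a], weights=[-1], threshold=0)

def XOR(a, b):
    # needs two layers -- a single threshold unit cannot represent it
    return OR(AND(a, NOT(b)), AND(NOT(a), b))

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND", AND(a, b), "OR", OR(a, b), "XOR", XOR(a, b))
```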
now a few commentators said that there was an astronomical amount of jargon in the last episode it seems to be increasing with compound interest and this show might possibly be the worst ever so we're going to try and explain as much of it as possible a couple of years ago professor marcus and professor bengio had a really famous debate where they locked horns it was the symbolic approach versus the connectionist approach it even spawned this interesting article called marcus versus bengio the ai debate gary marcus is the villain we never needed i think that we both think the other side is straw manning our baby so i think you're straw manning symbols because lots of people put probabilities and uncertainty into symbols i would argue that the kind of deep learning stuff that was straight out of the 80s which is you know continued until like 2016 in my view but we could argue about that you know just let's have a big multi-layer perceptron let's pile a lot of data in and hope for the best which i don't think you believe anymore but maybe you did at one point that's one kind of deep learning that's the kind of i don't know prototype or canonical version of deep learning and you want to open deep learning to a whole lot of other things and i think at some level that's fine at some level i think it's changing the game so i want to expand the umbrella of symbols and you want to expand the umbrella of deep learning why don't we say let's build deep learning symbolic systems that expand the scope of deep learning and expand the scope of symbol systems look i i don't care about the words you want to use i'm just trying to build something that works and that is going to require a few simple principles to be understood neuroscientists have been talking about neuromodulators forever you know so just remove from your brain the idea that deep learning is a 1989 mlp with feedforward connections that is not deep learning sorry and i do agree that um there's uh lots of interesting inspiration we can get today in the work that's been done in cognitive science and in symbolic ai whenever we discuss symbols or symbolic approaches think of traditional programming code with variables and loops it's kind of like what powers the world's infrastructure already right deep learning models deliberately eschew traditional programming even though all the greatest technological achievements of mankind have relied on software utilizing symbolic computation deep learning has set the standard for learning from data but it is symbolic methods which have set the standard for representing and manipulating abstractions for discrete problems you wouldn't be able to pass a google interview or one of my interviews for that matter if you are not intimately familiar with symbolic approaches but how can we combine symbolic knowledge with perceptual data in the classical john mccarthy-esque world of ai practitioners were only concerned with knowledge internal models and reasoning but if you scan almost any of the modern day literature on ai you won't detect a single iota of these ideas not even a whiff of course you can count on françois chollet and christian szegedy and melanie mitchell to bring up common sense and reasoning but it really is slim pickings in general when it comes to citing rich cognitive models gpt-3 is the statistical language model which perhaps most characterizes our move away from these classical ai ideas since the resurgence of deep learning in the last decade as we discussed ad nauseam on our special edition episode on gpt-3
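As a toy illustration of what operations over variables buy you (an invented example, not one discussed in the episode): a rule defined symbolically applies to bindings it has never seen, whereas a purely extensional mapping memorised from examples cannot extrapolate outside its training data:

```python
# Symbolic rule: for any sequence X, return X followed by X (an "A A" pattern).
# It is defined independently of any training data, so it works on novel inputs.
def copy_pattern(xs):
    return list(xs) + list(xs)

# The same mapping "learned" purely extensionally, as a lookup table of examples.
training_examples = {(1, 2): (1, 2, 1, 2), (3,): (3, 3)}

def copy_pattern_lookup(xs):
    try:
        return training_examples[tuple(xs)]
    except KeyError:
        raise ValueError("outside the training data -- no way to extrapolate")

print(copy_pattern(["never", "seen", "tokens"]))  # rule binds to any values
print(copy_pattern_lookup([1, 2]))                # memorised case works
try:
    print(copy_pattern_lookup([9, 9, 9]))         # pure extension, no intension
except ValueError as err:
    print("lookup model:", err)
```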
but if you wanted to give medical advice and somebody tried to set this thing up as a suicide prevention uh counselor then it's a tragedy right you know there's an example out there on the web where someone asked gpt-3 or said i want to kill myself and gpt-3 said i think um that would be a good idea it doesn't have any common sense knowledge no explicit cognitive model which could be updated and it cannot perform any explicit extrapolative or abstractive reasoning what that means is that it wouldn't understand what john had sour grapes actually meant do you remember aesop's fable in which a fox tries to reach some delicious grapes when he failed though he declared that he didn't want them anyway because they were sour this is a linguistic embodiment of the abstract mental category of situations featuring something that is the object of someone's ardor but having proven out of reach is subsequently deprecated by the person who desired it this abstract quality often concisely called sour grapes is potentially recognizable in thousands of situations and this phrase could be thus used as the verbal label on any such situation gpt-3 doesn't understand the structure or the semantics of this abstract category remember when we said that true intelligence is about casting previous knowledge and new information into new situations the kaleidoscope effect which françois chollet mentioned a few episodes ago let's go one step further in breaking down the jargon the six most important words that you can wrestle with in today's conversation are intension spelt with an s extension reasoning knowledge semantics and understanding behind the scenes we actually spent hours and hours debating the semantics of these definitions which gives you some idea of how complex it is intension spelt with an s is the internal structure of an object it describes all the aspects of some object while the extension is just one attribute usually the output for example consider the following two statements the tutor of alexander the great maps to aristotle or the most famous student of plato maps to aristotle they both have the same answer or the same extension but the intension is different this is really important because in most cases statistical models either ignore the intension entirely or only approximate a latent and brittle representation of it the intensional attributes are the building blocks which can be used to extrapolate new knowledge and understand how a particular answer was derived the intensional attributes are the core cognitive primitives required to construct and extrapolate new knowledge in future novel situations given a tiny little bit of experiential information intension is about understanding the deeper reality so you can generalize it better extension is just the result of some computation but we don't know how the answer was derived reasoning is the act of deriving new knowledge from prior knowledge given new information using logical axioms and rules for example if i told you that my personal trainer had been milking me you will reason that i've been spending too much money on personal training or at least i hope that's what you'd reason it might just be simple deductive reasoning or it might be abstract reasoning which amounts to extrapolation what about knowledge knowledge is a justified true belief true means it's a fact justified means that it's been established or proven this is distinct from data which could be almost anything and information which is just curated data knowledge is the gold standard at the top of the pyramid
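A small sketch of the intension versus extension distinction using the Aristotle example above; the knowledge base and relation names are invented purely for illustration:

```python
# Two different intensions (descriptions, ways of picking out an object)
# can share the same extension (the value they both denote).
facts = {
    ("tutor_of", "alexander_the_great"): "aristotle",
    ("most_famous_student_of", "plato"): "aristotle",
    ("tutor_of", "nero"): "seneca",
}

query_1 = ("tutor_of", "alexander_the_great")
query_2 = ("most_famous_student_of", "plato")

# same extension ...
assert facts[query_1] == facts[query_2] == "aristotle"

# ... but the intension is what lets you reuse the knowledge in a new situation:
# the same relation applied to a new argument still means something.
def tutor_of(person):
    return facts.get(("tutor_of", person))

print(tutor_of("nero"))  # -> seneca: the relation transfers
# whereas the bare extensional answer "aristotle" tells you nothing
# about how it was derived or how to derive anything else.
```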
what about semantics people often argue over semantics when they pick apart the meaning of an utterance to draw a different conclusion if you try to get a clear definition of what semantics itself even means you might have a fierce debate on your hands most people will tell you that it's only relevant in the context of linguistics but that's not true semantics is about the interpretation of or mapping to an inner structure to assign meaning think of semantics as the raw building materials of meaning the bricks and mortar almost anything can have meaning an image for example everything has an infinite inner structure or set of attributes these are the bricks and mortar for example a string has a length how many a's are in the string or how many b's are in the string given a situation you interpret the meaning of an utterance by selecting the relevant attributes the relevant bricks and columns to construct your meaning any object has the structure or building materials which can be selected to build the interpretation or the meaning the building materials and the possible meanings are both infinite this goes back to chomsky's conception of universal grammar he said that everything is embedded in the structure you can't get something from nothing so syntax is the inner structure semantics is the content or what the structure is saying or what it means and for the sake of our purposes here you can think of the syntax or the inner structure as being the same thing as the intension spelt with an s what about understanding understanding is successfully ascertaining meaning by reconstructing the original intension or that inner structure from the syntax and any prior semantics that we have it allows the derivation of all of the possible new semantics we understood what was presented if we can describe and reapply the knowledge gleaned for example the beer fell off a table and splashed on the floor a person who understood that utterance could from the semantic map of the sentence and probable world models derive that now the floor is wet the floor is slippery and people might fall down on the floor all of which are new semantic maps from those new sentences to an updated world model now deep learning models they don't explicitly learn the inner intensional structure or not robustly um i mean they do kind of learn it to some extent you know they might have a latent dimension for fur on an animal for example but discrete problems or problems which require few shot extrapolation or abstract or step-by-step reasoning are bad for deep learning because no smooth learnable latent manifold exists for these problems deep learning models can only generalize for interpolation on the surface of a learned latent manifold mostly representing the statistical regularities of the extensional answers not the core building blocks or the semantics neural networks are giving you a fish whereas semantics are teaching you how to fish now you see i just used a metaphor there didn't i metaphors are the cousins of analogies it helped you understand didn't it by abstracting the information i allowed you to apply your existing knowledge to a new situation in experience space this is what we need ai systems to do to reapply existing knowledge in future novel situations using abstract analogies the only potential problem is that some analogies are cognitively laborious and they might not be understood by people so there's always a balance of the information conversion ratio of an analogy and the reliance on common knowledge and cognitive processing on the other end to make sense of it
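Here is a minimal forward-chaining sketch of the beer-falling-off-a-table example: a tiny hand-written world model (the rules and predicate names are purely illustrative) from which the new semantic facts can be derived and added back into the model:

```python
# Hand-written, illustrative rules: (set of premises, conclusion).
rules = [
    ({"liquid_spilled_on(floor)"}, "wet(floor)"),
    ({"wet(floor)"}, "slippery(floor)"),
    ({"slippery(floor)"}, "may_fall(people)"),
]

def forward_chain(facts, rules):
    """Keep applying rules whose premises hold until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# assumed parse of "the beer fell off a table and splashed on the floor"
initial = {"fell_off(beer, table)", "liquid_spilled_on(floor)"}
print(forward_chain(initial, rules))
# adds wet(floor), slippery(floor), may_fall(people): an updated world model
```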
now the problem with language is we don't yet understand the universal intensional or primitive logical structural building blocks of it not in a way which captures all situations does this lingua universalis actually exist surely it must do right the tragic thing for us is that a four-year-old kid can extrapolate using language semantic primitives but we haven't yet built an algorithm to do the same thing i recently read an article on linkedin which highlighted the lack of understanding of intension versus extension by deep learning folks in my opinion someone thought that they had used deep learning to learn the discrete fourier transform the discrete fourier transform can be thought of as a matrix multiplication where all of the cells in that matrix are a complex exponential which deconstructs your signal onto the complex unit circle representing integer oscillations of different frequencies using sines and cosines if you visualize this matrix it looks like a pretty picture which has all of these concentric and overlapping circles on it so this chap formulated an optimization problem to learn the dft by randomly initializing a matrix of values and optimizing with stochastic gradient descent on the squared loss of the true discrete fourier matrix and his learned matrix surprise surprise it quickly converged to the values of the discrete fourier matrix now did this guy actually learn the fourier transform or did he just learn the values of the fourier matrix because if you think about it it was only learning the extensional answers it wasn't learning the intensional structure of that discrete fourier transform the model has no idea how to re-derive those values from first principles this might be pointing out the obvious but nothing the model learned could be used to help it learn a conceptually similar problem so if we change one of the coefficients on that dft formula even by one if we changed n by one then nothing it learned would be transferable right you'd have to train the model again from scratch and this is exactly the same reason why gpt-3 doesn't understand how it arrived at an answer and it can't abstractly reapply any of its knowledge in a new situation so the main thing that these models are missing at the moment is the ability to abstractly reapply knowledge this is really what we need so the guy who wrote this article he went one step further and he learned the fourier transform via reconstruction which is to say not using an explicit fourier matrix even to test against only testing using the squared loss of the original signal against the reconstructed signal but when he reconstructed the signal he basically used the sines and the cosines of the original signal to do so so he biased the model by baking the intensional knowledge of the fourier transform into the optimization problem itself so what would rich sutton say about that he also unwittingly created a symbolic model i think this misunderstanding is a pretty good analogy for what is going on in the deep learning space at the moment
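A sketch of the experiment described above, reconstructed from the description rather than from the original LinkedIn post (the size, learning rate and number of steps are made up): regressing a randomly initialised matrix onto the true DFT matrix does converge to the right values, but nothing in the result encodes the intensional rule, so changing N means starting again from scratch:

```python
import numpy as np

N = 8
n = np.arange(N)
# the intensional definition: W[k, t] = exp(-2*pi*i*k*t/N)
dft_true = np.exp(-2j * np.pi * np.outer(n, n) / N)

W = np.random.randn(N, N) + 1j * np.random.randn(N, N)  # random init
lr = 0.1
for step in range(500):
    grad = 2 * (W - dft_true)  # gradient of the squared loss ||W - dft_true||^2
    W -= lr * grad             # plain gradient descent

print("converged to the extensional values:", np.allclose(W, dft_true, atol=1e-4))

# The learned W reproduces the values, but nothing in it encodes the rule
# exp(-2*pi*i*k*t/N); for N = 9 the optimisation starts over, whereas the
# intensional definition generates the new matrix immediately:
dft_9 = np.exp(-2j * np.pi * np.outer(np.arange(9), np.arange(9)) / 9)
print("N=9 matrix from the rule, no retraining:", dft_9.shape)
```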
professor rich sutton said that we should finally learn our lesson and recognize the appeal of what he thinks are our mistakes which is to say building systems the way we think we think doesn't work in the long run he thinks that building knowledge into our systems works kind of good in the short term but reaches plateaus in the long term and actually inhibits our progress he thinks that progress comes from scaling computation through searching and learning well i think i agree with him actually i mean if he's pointing out that our knowledge should not be hand crafted then you know god yeah in my opinion this is the most obvious bottleneck imaginable every single time in my career when i've been working with human knowledge engineering it's been a disaster the term knowledge acquisition bottleneck was coined for a reason in my opinion sutton is an anti-nativist anti-nativism is the theory that concepts mental capacities and mental structures are not innate and are acquired by learning it's the ultimate belief in the blank slate basically an intelligent system is one which can dynamically and efficiently discover new knowledge from increasingly abstract analogies the more of a conceptual stretch the higher level of intelligence is required arguably it's automated knowledge acquisition rather than robust ai which is the biggest challenge that we have right now a four-year-old knows that the larger than relation is transitive how is that i think people misattribute the bitter lesson as being an argument for big data actually it's an argument against human crafted and human introspected knowledge the biggest knowledge base experiment today is the cyc project created by doug lenat it was started in the 1980s and it has over one million rules in an epic common sense knowledge base pedro domingos referred to the project as a catastrophic failure even the most notorious gofai advocate marvin minsky he concluded that for each different kind of problem the construction of expert systems had to start over again from scratch because they didn't accumulate common sense knowledge he said that unfortunately the strategies that were most popular with ai researchers in the 1980s had just come to a complete dead end sutton also points out that the actual contents of minds are tremendously irredeemably and endlessly complex he thinks that attempts to model or conceptualize human minds using symmetries or objects or space or even multiple agents are flawed he thinks that we should instead only build the meta methods to capture this complexity prototypically which is to say only in its potential critically he thinks that we should build what we would call emergent ai you know so um it can discover like we can rather than have us tell it what we think we have already discovered so in some sense i agree with rich i don't think we would understand the structure of an ai system any more than we would understand the workings of our own brain but there's a good argument that common sense knowledge is or could be universal so it seems pointless to relearn it from scratch every single time what i think gary marcus is 100 percent correct on is that for us to create such an artificial intelligence from computers would involve symbols discrete world models and discrete extrapolation but i think the answer lies in a modular architecture which will likely be discrete first in my opinion involve neurally guided discrete program search and involve neural networks and other continuous models as perception modules the system must also possess these meta-learning priors that chollet mentioned in his measure of intelligence allowing it to consume prior knowledge and derive new knowledge and reason over that knowledge when a self-driving car arrives in a new city with a holiday parade it needs to rapidly adapt to the new situation
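As a hedged sketch of the kind of modular, discrete-first architecture described above (nothing here is from the episode, and the perception module is an untrained stand-in): a neural component maps raw input to discrete symbols, and a symbolic component reasons over them with ordinary operations over variables:

```python
import numpy as np

CLASSES = list(range(10))
rng = np.random.default_rng(0)
# stand-in for trained weights of a digit classifier; only the structure matters
W_perception = rng.normal(size=(10, 28 * 28))

def perceive(image):
    """Neural module: map a raw 28x28 image to a discrete symbol (a digit)."""
    logits = W_perception @ image.reshape(-1)
    return CLASSES[int(np.argmax(logits))]

def reason(symbol_a, symbol_b):
    """Symbolic module: operations over variables, no training involved."""
    total = symbol_a + symbol_b
    facts = {f"sum({symbol_a},{symbol_b})={total}"}
    if total > 9:
        facts.add("carry_required")
    return facts

img1, img2 = rng.random((28, 28)), rng.random((28, 28))
a, b = perceive(img1), perceive(img2)
print(a, b, reason(a, b))
# swapping in a different symbolic rule (comparison, planning, etc.) needs no
# retraining of the perception module -- that is the appeal of the hybrid split
```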
rich's bitter lesson article might seem insightful at first but there's not yet an existence proof of a reinforcement learning system which works well on open-ended domains right which learns efficiently and which generalizes rich and his friends just released a new paper called reward is enough the premise of the paper is that simply maximizing a reward indeed any reward given a sufficiently complex environment is all you need to develop strong and general intelligence in a similar vein to the bitter lesson they think that the agent will learn all the skills it needs even things like exploration and memory it's all an extension of the idea that humans should not preconceive how they think intelligence should be architected at a high level the root of why they can get away with shenanigans like this is their definition of intelligence is in my opinion wrong frankly the paper would have been invisible had it not been written by the godfathers of deep reinforcement learning their definition of intelligence doesn't even consider efficient abstract generalization or analogy making which according to douglas hofstadter and françois chollet is the absolute core of cognition this is a clip of douglas hofstadter talking at stanford if we are going to make any connection between analogy and a geographical situation we're going to liken it to the interstate freeway system and it links everything together analogy is the interstate freeway system of cognition it is not one little tiny zone somewhere off in the side so that's uh that's a kind of a way that i think about it i mean i don't really usually think about it that way i made that up yesterday so but it gives you the flavor categorization is the name of the cognition game but analogy is the mechanism that creates or that allows categorization to happen by categorization i mean uh deciding what something is what the essence of something is now one could sort of summarize this in a corny little analogy again analogy is the motor of the car of thought and then we can even write it down as this little thing analogy is to thinking as a motor is to a car a is to b as c is to d analogy making is the perception of common essence between two things and then a couple of footnotes to sort of hedge um i mean things don't have essences but what i mean i'm not a you know i'm not talking about some kind of abstract glowing philosophical essence i'm talking about the essence that you perceive at the particular time in the frame of mind that you happen to be in and uh and by when i say things it's tempting to think that the analogies are between the things in the external world but i really want to say that analogies happen inside your head so that they're connections between two mental representations they're connections between things inside your head uh which we project to the outside world and we say these things out there are analogous and that's very reasonable to do most of the lineage for this reward idea can be traced back to shane legg and marcus hutter i'd love to get them on the podcast by the way so if you know them please invite them on i think we'd have a great conversation and they're pretty cool guys but um in 2007 they released this paper called universal intelligence a definition of machine intelligence and it quickly became one of the articles of faith for modern deep reinforcement learning um certainly by the folks at deepmind and many practitioners um their conception of intelligence is basically an agent being able to gain a reward consistently and summed up
over the space of all of the computable environments and inversely scaled by the complexity of each environment conveniently their conception doesn't consider prior knowledge or experience or the information conversion efficiency or generalization we already know from the shortcut rule that you get exactly what you optimize for at the detriment of everything else if kenneth stanley was here right now he would have a field day if you haven't already by the way you should check out the episode we did with kenneth stanley it was my favorite episode of the show i mean a reward is just an objective at the end of the day stanley hates subjectives because of deception because they're convergent and because of the shortcut rule it might be a slightly cynical reading of stanley's work but at the end of the day stanley effectively figured out that clever objectives which is to say objectives which are impervious to shortcuts deception and convergence he honed in on interestingness and its proxy objectives novelty diversity preservation meta learning new objectives and information accumulation all of which could be optimized monotonically actually chalet figured out the same thing in a different way right the only objective in intelligence which can be optimized monotonically with no shortcuts is the generalization efficiency itself we already know that you can't optimize monotonically on a reward signal without deception why do you think there's an entire literature and reinforcement learning about exploration honestly i think the reward is enough paper is is just complete madness this is francois charles in order to build a general intelligence you need to be optimizing for generality itself so intelligence which is to say generalization power is literally sensitivity to abstract analysis and that's in fact all there is to it if you have a very high sensitivity to analogies you will be able to extract powerful abstractions from little experience so anyway this takes us full circle the story of intelligence started with minsky in the 70s who thought that intelligence would arise from kind of compiling several statically coded task-specific programs in the 80s the story evolved to the current position which is that intelligence lies in general learning ability you know being able to acquire new skills through learning this is still the dominant view now but we're seeing hints of the next generation of intelligence which is being able to efficiently learn new tasks or acquire new knowledge using what you already knew with abstract reasoning we should also realize that with intelligence the substrate and the context is important neural networks will never be able to extrapolate and will only ever achieve at best a glitchy representation of discrete problems intelligence itself might also be embodied or better thought of as an emergent process which is far bigger than any individual brain anyway i really hope you enjoy the show today we've had so much fun making it remember to like comment and subscribe we love reading your comments and we'll see you back next week welcome back to the machine learning street talk youtube channel and podcast with me dr tim scarf my two compadres mit ph.d dr keith duggar and dr yannick lightspeed culture now our guests today are so incredibly accomplished as scientists and entrepreneurs that i could probably spend about half an hour just enumerating their numerous and impressive achievements but here we go professor gary marcus is a scientist best-selling author and 
entrepreneur he is founder and ceo of robust ai and was founder and ceo of geometric intelligence a machine learning company acquired by uber in 2016. now kenneth stanley who's one of my heroes of ai was also one of the founders of geometric intelligence alongside gary which is super exciting by the way gary was also listed as one of the top 20 most influential people in ai last year now um gary is also the author of five books including the seminal book the algebraic mind kluge the birth of the mind and the new york times bestseller guitar zero as well as the editor of the future of the brain and the norton psychology reader now he's published extensively in fields ranging from human and animal behavior to neuroscience genetics linguistics evolutionary psychology and artificial intelligence often in leading journals such as science and nature he is perhaps the youngest professor emeritus at nyu and his newest book co-authored with ernest davis is called rebooting ai building machines we can trust i also highly recommend you read his recent paper the next decade in ai four steps towards artificial intelligence we can trust now also joining us today is professor luis lamb secretary of innovation for science and technology of the state of rio grande do sul in brazil his research interests are machine learning and reasoning neurosymbolic computing logic and computation and artificial intelligence cognitive and neural computation and also ai ethics and social computing he was formerly vice president for research at the federal university of rio grande do sul in brazil he was dean and director of the institute of informatics ex-officio and elected member of the university council at the federal university of rio grande do sul he's a member of the program and organizing committees of a large number of international conferences and workshops on artificial intelligence cognitive and social computing logic and computer science and embedded systems and formal methods luis released his new paper neurosymbolic ai the third wave at the end of last year it beautifully articulated the key ingredients needed in the next generation of ai systems integrating type 1 and type 2 approaches to ai and it summarized all of the achievements of the last 20 years of research now one thing i really want to get out of the show today is moving beyond the superficial you know how exactly are we going to achieve a hybrid approach conflating symbolic and connectionist methods let's take it as a given that a hybrid approach would give us models which have causal reasoning and are more interpretable more robust and more secure gary said in his next decade paper that without us or other creatures like us the world would continue to exist but it would not be described distilled or understood human lives are filled with abstraction and causal description i think this is so powerful françois chollet the other week said that intelligence is literally sensitivity to abstract analogies and that's all there is to it it's almost as if one of the most important features of intelligence is being able to abstract knowledge this drives the generalization which will allow you to mine previous experience to make sense of you know future novel situations anyway gary and luis welcome to the show it's such an honor to have you both on here where are we at with neurosymbolic methods uh well first it's a pleasure to be here it's good to be in an environment where people take for granted these questions because i spent a lot of the last 20
years almost or even more than 20 years trying to get people to recognize the importance of abstraction so i i came to this having worked in psychology on children learning rules and came into the first way or second wave depending on how you kind of of neural networks and and people trying to argue that there was no abstraction that it was all just basically memorization through multi-layer networks they were then three layer networks and it's been a long hard slog to get people to realize how important abstraction is and i i think that there's been a real sea change in the last couple of years i saw um you know correlated with that is a lot of hype around ai with people who didn't recognize how important it is to get abstraction right and thought well if we just have you know a billion parameters that would suffice and in fact even now you have a lot of naive people thinking the gpt3 represents more progress towards language than than it actually does so it's been really interesting recently to see people like yahshua benjio who were historically fairly hostile to abstract knowledge and things like that um say hey we need to have causality in there of course there have always been people like ideoperl who have seen the value and then to see andrew eng say hey there's too much hype in a.i a couple weeks ago kind of blew my mind because he's been one of the the biggest hypesters and he he famously said you know uh i i can get a computer i won't quote a verbatim but i can get a computer to do anything that a person can do in a second the reality is even in a second we use a lot of abstract knowledge right we use abstract knowledge about how the world works in in order to interpret essentially every other so i'm going to defer to lewis on the state of the art in large part about kind of technical details of um putting together neural and symbolic things in unified mathematical formalisms and things like that i think he knows better than i but i will emphasize one thing which is that it's not enough to have a technical apparatus to be able to put these things together and that we need to look at a larger context a lot of what the next decade article was about was that larger context and so you know there were four points in that paper one of them was we need to have hybrids and i actually cited an earlier version of the great paper by lewis that you just mentioned that the recent version wasn't out yet um but lewis has been a pioneer in that but then i also emphasized um abstraction in general and you know the value of it which is of course part of why you want to have those hybrid models um and then i emphasized having large databases of knowledge and detailed cognitive models so some of what i still see in the field is people working on the technical details where in some narrow domain i've got an enormous amount of data and i kind of crush it with all of that data but what's really interesting about human cognition is we can pick up new domains um with relatively small amounts of data and interpret them people are trying to crush these things with large amounts of data but they're still doing it in this kind of anti-nativist way where there's no prior knowledge aside from what is acquired in the course of that system and i think if you look at how people actually encounter things in the world they have prior knowledge about domains and they can quickly assimilate new knowledge by building a model of what is going on and by having this rich database of we'll call it common sense sometimes 
it's expert knowledge um and so for example if you watch a movie and you see somebody pick up a gun you know something about what guns can do it's not like you're watching 30 seconds of the movie and then you infer that guns might kill people you already know that and you know some of the parameters like you know you know the gun might get stuck it might not be loaded you might know the meta narrative um i think it's due to check off if there's a gun on the table then you know you've got to use it in the plot it's probably not just there for decoration so you use that meta knowledge in in the process of understanding things and i have not yet seen any really rich reasoning system working at that level where it's all integrated the closest thing is psych which doesn't have the neural side of things at all i mean i guess they're adding in a little bit but it fundamentally is a purely symbolic system um and not a learning system so there's still lots of things to put together i think lewis is working at the foundations of how can you consolidate these kinds of information at all and we shouldn't lose sight of the fact that the larger context in which we want to do that is we want to be able to put all this knowledge together or right now you know we're doing um a video or podcast together and so you have knowledge about like what's socially appropriate for that i shouldn't get up and take off my clothes it's not that kind of video so you have background knowledge about etiquette you have knowledge about what the audience they want we're on the call with knows what the audience out there in listener land might know and we're constantly integrating just this huge range of knowledge and we shouldn't forget that that's why we want to have the neurosymbolic integration is to put together background knowledge some of it's going to be innate and maybe we can talk about that later some of it's going to be culturally acquired and so forth but that's the reason that we're doing this so now i defer to louis to kind of give you the technical state of the art on how well we can do that with any knowledge at all thank you gary thank you so much for the introduction too and we are living at some i would say some exciting times in ai one of the reasons uh is not the hype as gary mentioned but i see that uh the times we're leaving are exciting because people like gary or like postmolinsky and even joshua benjo as gary has mentioned have now looked at what's going on and have noticed the need for abstraction the need for interpretation the need for better semantics as gary said we had some very large language models that uh look very promising that look very exciting when they go into the press the the media the printed press and so on to the social networks these days not the printed press anymore and people get really impressed by some of the results we can mention for instance the example uh the exercise that the guardian did in the uk by editing that small journal piece last year and the editor said well i just put the same amount of effort the same amount of work that i would dedicate to any kind of piece that the guardian publishes so that's the summary of what the guardian did in some way however as we have defended here for quite some time for a long time as gary said in his brilliant work on cognitive science and the the way children uh reason the way children learn he has some very impacting papers on that and some books that are referenced to us in ai these days we have to look at the foundations 
and also we have to consider what kind of abstraction in ai and in computer science by the end of the day that we have to use we cannot forget that computer science was born out of the symbolic school of reasoning the symbolic school of logics we we don't need to mention alan turing and the pioneers and the symbolic reason and symbolic ai has always been at the core of the kind of developments that we are dealing with these days so when we see this kind of uh disputes or this kind of uh disparities that uh or this kind of separations or device that we see in machine learning we don't think that that makes much sense or that or that this kind of division contributes to the development of science in general and of uh ai and machine learning in particular that are at the four of this discussion and um when you think in terms of the kind of developments that we have been looking at over the last years we are looking yes for a foundation of integrated logical reasoning and machine learning why this is important well i'm sure that gary will mention it later on um when we develop a system we in an ai system or a machine learning system we don't we do not want only to find correlations or label classifications or better interpretation from the label databases that we have we actually want to to provide appropriate semantics for this kind of inference this kind of reasoning that we are working on very large amounts huge amounts of data and this is only growing and however we cannot forget and uh gary has said that in in several of his papers we have also defended this position that formalizing notions like common sense reasoning or even combinatorial reasoning that is more related to the core to the foundations of computer science is still a very very open problem in the field so what we look at in neurosymbolic ai is exactly how to compute and how to learn with symbols inside or outside the neural network in the way that we can provide a proper semantics to what's going on in this very large language models for instance or in this very efficient let's say um image interpretation and computer vision systems but we do not have at the point that we are in ai these days a proper and a complete foundational semantics for the field so what we look now here is how to provide a better semantical foundation a better semantics for what's going on in machine learning and also to explain how the reasoning process goes on when one is dealing with this very large connectionist model with this very large uh neural network models because we have to consider that when when one deals with millions and millions of labels millions and millions of parameters with the number of hyper parameters that one has to set these days we look some site of the foundation some site of what actually is going on in terms of rigorous semantic foundation that computer science demands from every field so in terms of what we have achieved in technological uh impact this has been impressive there is no way that we can we cannot recognize that however when we see the hype in terms of claiming that the systems are better at language interpretation or language inference or language understanding or machine translation or language translation we are still very very far away from the kind of foundation that computer science and ai needs so in this sense i make a small comparison a small historical let's say uh backtracking here in the in the early 60s at the high of the the space race there was a lot of investment in computer science for the 
reasons that we know we need to put a man on the moon and bring him safely back so there was a lot of investment from darpa from nato from several bilateral government agreements in the western world and there was a lot of development in computer science and computer programming people needed to program computers much faster than using machine language or assembly language so there was a lot of development in the 60s and one of the developments that we had in the 60s was the search for semantics of computing the search for the semantics of programming languages what a program means what a program was actually doing this was extremely important if one considers and remembers the kind of hardware resources that we had at the time we needed to provide a very adequate semantics a very precise semantics because it was a very uh high risk project to put a man on the moon so we needed logical and mathematical foundations and at the time there was this famous discussion involving a very prominent computer scientist in the uk called christopher strachey who has a named professorship at oxford university to this day samson abramsky is the christopher strachey professor of computer science at oxford now and christopher strachey was trying to associate or to provide the formal semantics of programming languages using a formal system called the lambda calculus however dana scott was a u.s. logician who after that spent the next 10 years in oxford said well you are doing it all wrong the lambda calculus does not provide an adequate semantics for programming languages we need to develop this denotational semantics a new theory to provide semantics for computing so at the end of the day they were able to provide a better formal semantics for programming languages and now we have very effective programming languages that were very useful not only let's say during the 60s but also later on today we know how to program we give better semantics to programming languages and to the semantics of computing i expect to see the same kind of developments happening now in ai and in machine learning we need to provide better semantics better appropriate semantics and these kinds of results typically come from the logic community or the formal semantics community come from uh also they get feedback from cognitive science because we are dealing with learning here from neuroscience but we are not at the point that we have a solid uh founded semantics for machine learning and we need that if we want to provide not only a solid scientific background to our field but also to make sure that people who are actually using for instance machine learning tools or ai tools in medicine feel that this kind of tools and technologies are safe for massive usage so this is my introductory statement i hope we can keep going on this subject yeah i mean definitely covered a lot of uh territory there and i think something that uh you know gary marcus said in the beginning that i find somewhat ironic there's an irony in the fact that it took so long to convince people that abstraction was necessary i mean after all they're doing their research using abstract language they're building systems on computers operating at a level of abstraction far above you know the continuous wave functions that are governing the electrons etc i mean all around us there is symbolic reasoning there is abstraction and this is kind of a bizarre sort of
denialism that this is all actually an illusion and not only that but it's not a very useful illusion that we should just instead reduce all the way down to you know continuous kind of kind of functions but i think a lot of that that fear or resistance to to kind of accepting that or the move away from it hinges on something uh that professor lamb said which is that you know we have to be able to optimize we have to be able to search the space and kind of find workable solutions for it and of course you know we should all know that that say discrete optimization problems you know integer programming etc have this long history being very difficult to solve because they have this kind of combinatorial nature to them and i think it was um in a debate with uh you you know professor marcus where where joshua benjio said oh well you know i can get uh i can get kind of discreet reasoning by just having multi-modal distributions and you know some type of sort of interactions between these you know multimodal distributions but i'm wondering the closer that they drive those kind of differentiable systems to emulate discrete or symbolic reasoning in some way aren't they just going to then bring in all the difficulties and training and learning that we see if you try to just go ahead and operate at kind of a more discreet level of abstraction i i think we haven't fully solved these problems right so what we want is to be able to harness the ability of these large systems to search large parameter spaces with all the value that you get from discrete symbol systems if what you do is you take your neural network and you emulate a symbol system exactly which you could do we've known how to do for many years then you're stuck with all the problems that symbol systems have had um you know if all you do is you emulate you know a 6502 microprocessor and then you know build the programming language basic on top of that then you write spaghetti code and basically you know you haven't solved anything i wish we had the answer right now but i think the truth is we're sort of like you know physics pre-newton or something like that we're missing some basic ideas about how to wrestle with the combinatorial complexity of reality integrate the knowledge that we have about the world works with kind of learning systems so um right now like i could tell you there's a you know i have a glass of water on the table and you can set up in your mind the semantics of that and louis might want to say a little bit more about how he's thinking about semantics but i'll say as a cognitive psychology that when i tell you there's a glass of water on the table that there's something in your head now that says there's a table there's a glass the glass is containing water and you know you encode all of that and now you can do some things like if you see me flailing around my arms you might say i wonder if that glass of water is going to be in trouble and he might knock over that glass of water and then you can make inferences if he did there might be glass on the floor there'd certainly be water on the floor it might make a noise we might have to pause the recording while he cleaned it up and so you can make all of these inferences and you can do some of that statistically and the confusion has come because the fact that you can do some of it entails in some people's minds that you can do all of it so you know if i fed that whole scenario into gpt3 sometimes it would get it correct but it wouldn't reliably get it correct so if i said um 
and you can do some of that statistically and the confusion has come because the fact that you can do some of it entails in some people's minds that you can do all of it so you know if i fed that whole scenario into gpt-3 sometimes it would get it correct but it wouldn't reliably get it correct so if i said um you know he knocks the glass of water over with his hand and dot dot dot statistically a likely continuation would be like and there's water on the floor and so you'd say hey gpt-3 has the semantics it understands it but when we push these systems you quickly understand that what's really going on is it's like this giant library of cut and paste um with a lot of synonyms and it's just very superficial it's not really building up a representation of where the glass is where the water is and so forth um the symbol systems are really good at doing that and we've had things like you know blocks world for years and years they can make inferences i mean blocks world wasn't perfect mind you but the problem with those systems is like everything's hand-wired in advance in the next decade article i give this example of parsing the story of romeo and juliet with cyc and if you have programmers who translate everything in the story into symbolic propositions and you have background knowledge like people when they're dead are no longer alive like basic common sense but encoded well which is what cyc's project has been then you can make great inferences that are like really high level inferences about like what juliet might think romeo's going to think when she drinks a fake potion i mean a potion that fakes her death right it's like really fancy inference that people actually can make routinely like everybody goes to romeo and juliet and they understand the dynamic that's going on of like different people having different interpretations and they understand the tragedy of it all and so like cyc can actually do that but the problem is cyc can only do that when all this stuff is hand wired and we can't sit around paying programmers to you know pre-wire every narrative that we might encounter and so we need to bring these things together i think a lot of it comes back to luis's point about semantics the only way forward is to actually represent the semantics of whatever it is you're trying to absorb you know whether it's romeo and juliet or my moral tale of my glass of water or whatever it might be you need to be able to manipulate the entities that your world is structured around and to track them so i tell you there's a glass of water and there's also a cat on the table now you think about the cat and that's in your mental representation and then you can worry is the cat gonna knock over the glass of water and you make all these inferences we just don't really have a technology that can do that in the full we have technologies that can do that in the narrow so you know you can build some kind of well i mean like a gps navigation system does lots of inferences in the narrow right it understands you know where the beacons might be it can make shortest paths relative to some known constraints and so forth so we can do this in the narrow but we don't know how to do it in the general what's really interesting is that um in the good old-fashioned ai days we had human captured knowledge right and we had the knowledge acquisition bottleneck because i'm a huge believer in the kind of semantics you're talking about and i agree that they can extrapolate a lot better although after the conversation with chollet i was somewhat convinced that there's an appropriate kind of substrate for an appropriate problem so you know type 1 might be good for mnist although interestingly there's a type 2 abstraction for any situation so you know you can abstract mnist into a discrete categorization but um i want to get
to the nub of the difference between the gpt-3 fans and what you're talking about because we spoke to one of our friends earlier he was one of these differentiably minded connectionist you know gpt-3 loving types um connor and uh anyway they think that it is statistics all the way down and that we perform some kind of bayesian update to data you know with experience and they think that there exists a universal generating function of the universe and gpt-3 is somewhat akin to this they think that any symbolic reasoning is simply an emergent phenomenon and a form of error correction if you like and would indeed even emerge in a larger version of gpt-3 if it existed even if this were true in my opinion it's a bit low down in the taxonomy to expect our models to memorize all the permutations of everything it harks back to the ridiculous arguments about turing completeness and universal function approximation and the need for infinite amounts of data and training passes as you cited in your paper you know when humans hear the statement about water leaking from a broken bottle uh gary they can abstract and therefore extrapolate that information to work for ball bearings falling out of the bottle or dice falling out of the bottle you know there's something critically important about being able to abstract information and acquire knowledge automatically that seems to be the thing that we're missing right i mean absolutely it's what's missing i would say that you know your friend that you interviewed has a lot of buzzwords but that there's no connection between those buzzwords and a working system at the moment so it could be that many of those buzzwords really are part of the picture so bayesian updating i'm you know pretty positive about um differentiability is probably part of it i don't think it's the whole solution nobody's really gotten all that to work um and there was another buzzword oh emergence um and you know it's nice to say things emerge but like gpt-3 is a really honest test of the hypothesis that's been kicking around for depending on how you think about it 20 or 60 years that everything will emerge when i just feed in enough data i mean when i was in graduate school jeff elman was writing these papers about simple recurrent networks that in many ways foreshadow what's going on right now he's no longer with us i sparred with him many times he doesn't get nearly enough credit for having anticipated all of this stuff 20 some years ago the first thing i did after graduate school was to have a debate with him at mit um about simple recurrent networks and the same issues are still kicking around um but we have a mind-boggling amount of data compared to what he was working with he had like 600 sentences that maybe he had handwritten or not that he fed into this proto multi-layer you know neural network and it could predict certain things about grammaticality basically so subject-verb agreement it had its limits some of which i pointed out or whatever but um you know we had a little back and forth but now you know you have gigabytes of data instead of like a few k of data and it's gotten so much better in some ways and not at all better in others so which ways has it gotten so much better well his system was like cat drink water it was like you know it was like telegraphic speech or you know what we talk about in the child language literature it was nothing like real language but he was right to see in that that you could emulate language if you had more data and
that you could there were regularities that you could capture and he did you know interesting analyses about the clustering of sentences and i made the counter argument i said even when you have these clusters of sentences that doesn't mean that the system has truly abstracted if you have a new sentence it's going to have trouble and i did some formal demonstrations in a 1998 paper called rethinking eliminative connectionism that anticipated so much of what's going on and you know one of the things that i said was there are problems with same different there was just another paper that came out a few days ago i haven't read it yet showing the same different problems in the context of current neural networks so 20 years later even just the basic notion of same and different is hard if what you're doing is accumulating a lot of things just because they're similar that doesn't mean you actually understand the abstract principle that underlies them and if i can just give a little philosophical terminology philosophers sometimes talk about extension and intension and intension is i think close to what luis means by semantics so extension is like i show you i tell you what a pair is in cards let's say and i show you two twos two threes two fours two fives etc and you just have the list you look up in your table um a nine matches a nine great now i know um that's the extension of a pair in cards the intension would be like knowing i've got two cards that match and by and large what these neural networks do is they traffic in extension things that have appeared together and clustered together the intension is understanding the deeper reality that is causing it such that you can generalize it well we're still kind of stuck there it's just the extensions have become massive and so they become more compelling for certain things they become more compelling in this prediction thing but you can always break them so you know i had this example in technology review with ernie davis where you're drinking cranberry juice and you're really thirsty you don't quite have enough you pour in some grape juice and what happens and gpt-3 continues you drink it which is a plausible continuation and then it says you die which is not plausible because nobody dies from drinking cranberry grape juice right and so it doesn't really understand the you know the toxicology of human beings even though it pretends like it does when it says you die but it doesn't actually understand anything about those mechanisms and so the extension has led it astray in that case because um there's not much left you're really thirsty correlates in its database i'm kind of being a little bit loose here but um correlates in its database with sentences about you die so lots of times when you drink something out of desperation maybe you die from it because you know or in its database like people tell tales of you know adventurers who drank the you know the poison or whatever um that's a fact about its database and maybe that's another way to put all of this is these systems are totally driven by these kind of idiosyncrasies of the database because they don't represent the intension just the way with my pair example like if you happen not to have had two tens if all you did was a lookup table then you're in trouble when you get to the two tens you don't really know how to think for yourself um and semantics is about being able to think for yourself at that level and so you know what's better
is the the emulation you get from the extension is so much better because we have you know training sets that are approaching a terabyte and so you're much more likely to find something in the extension that's close to the thing you need and there are some places where that has practical value if you wanted to make an autocomplete system then that's awesome right it's going to make the best auto complete system that you can imagine but if you wanted to give medical advice and somebody tried to set this thing up as a suicide prevention uh counselor then it's a tragedy right you know there's an example out there on the web where someone asked gpt3 or said i want to kill myself and and gpt3 said i think um that would be a good idea like this is not what you want like it doesn't understand the the the underlying conceptual framework of a suicide prevention hotline it's just randomly putting together a sentence that seems to be close to this database and even with a terabyte it's not enough so you go all the way back to um john locke and the idea was hey if i just have enough sensation it will all and this ties back now it will all emerge i don't know if locke used that word i can't remember off hand but but your friend with the buzzwords is like it will all emerge when we have enough data well no it will emerge when we have enough data and we have a formalism that allows us to put together the symbolic knowledge with the kind of large-scale quantitative data in a way that can contact a cognitive model and allow you to do reasoning over that's when it will emerge but that is to say we need a bunch of tools there that we don't quite have it's not going to just come out by magic gpt3 was the test of the magic hypothesis put it all in with you know a reasonable structure and transformer network and hope for the best and what you get is a mess i think the argument to make of we need abstraction we need sort of semantics to describe it all is it's an easy argument to make but how do you how do you being in in this field arguing for this what's kind of a workable definition of abstraction like how do you you know going also here beyond the buzzword of you know we need abstraction i i do abstraction you do abstraction like when does a system do abstraction and what would you accept like if i came to you and said i have gpt4 and it does abstraction what would you accept as a test that that is happening i think the proper tests aren't being done right now and i think there's a related problem that people have finally gotten wise to which was actually at the core of the 1998 article which is about extrapolating beyond a training space so the core example that i used there was actually about same different it was i'm going to train you on the identity function for even numbers of a certain size and i'll give you some examples and then there's going to be two kinds of generalization you can do interpolations like you've got a cloud of points i've trained you on can you interpolate between them and yes neural networks can do that not perfectly there's a bunch of problems but you know reasonably well um and then there's extrapolations like i give you an odd number and it's represented as binary in digits well the odd number is outside of the extensional space um it's outside the training space of what you've seen or the training distribution and that kind of system can't generalize and humans do generalize or generalize in this particular way that i would say is the abstraction kind of way in the same 
circumstance so i followed the 98 article up with a 99 paper in science with seven-month-old infants and i showed infants do the kind of generalization or abstraction where they treat something as a class so i what i gave them was set of sentences like a lot and i had some collaborators we i should say uh my team and i um we gave um people examples like la tata ganana um to be of an abb structure let's say and we'd either give you something all new words with the same structure wolfe a or a different structure woe rothe which would be an aab structure and we we balanced a lot of stuff about phonetics and stuff like that so that you couldn't just rely on on phonetic overlap we did a bunch of controls and the study has replicated which is not always true in psychology but this one has um and what we found is the kids recognized change um to a new structure over new material and that's what you you know that's really the essence of it that's what it is to generalize extrapolate beyond the training uh distribution and i think that's really what abstraction is about it's to say i get it this is identity or identity over time um so you know the first one was identity with the eve i was literally describing identity so f of one one zero is one one zero and then the second is identity over time basically so i have a a b or a b b um and that is the course to be able to do something like that of course it's not just identity in fact what i described in the 2001 book the algebraic mind was a whole family of functions that i called you quotums um i don't know if i've ever said the word out loud or not um universally quantified one-to-one mappings u-q-o-u-t-o-m it's been a long time it's been 20 years um so so there's a whole family of things where each input has a unique output and it's not that i think those are the only things you can do with abstraction but those are the ones that are most cleanly just cut away from the things that you can do with other kinds of systems so they're the scientifically purest case are the ones that require you to take a one-to-one mapping every input has a unique output and generalize that outside the training space so we can do that so you can do f of x equals x for any number you can do f of x equals x plus 1 for any number x plus two right there's an infinite number of these things you can think of language in the same way so like i can add e d to any morpheme and i get a new past tense so i can say um you know this is glass clothing and i'm i've just glass clothed my laptop i don't know if you could see it so i've made up a new verb and you know that you're going to add ed um to that thing or maybe it's glasses cloth and then i've glasses clothed it and so you can play this game perpetually you don't need to have it in the extension of what you look at and there's a little cheat having to do with phonetics which is which we ruled out to show that it really is generalization outside the training space the kids could do that's a working definition of a form of abstraction it's not the only form but that's the one where the neural networks keep getting into trouble over and over again and it's really foundational there's so many things that we know um the latin for it is mutatus mutandus we know that this thing is true for you know but i'm going to change a little so i'm i'm going to say you know walk is walked talk is talked and mutatus mutandus glass is clothed you know or glasses cloth goes the glasses cloth right we know this kind of i'm going to change it with 
respect to this item that is a you know pretty clean version of abstraction that we don't have an adequate handle on i mean there's some technical machinery now that can do it in little narrow cases you want to be able to do that in general i mean here's one way we would know that we're making progress in ai there's this huge database free to which i frequently contribute money not time called wikipedia which of course you all know right human knowledge that is in written form so machines in principle could read it the reality is all the machines can read right now are the boxes right so what year was somebody born what is the population of canada and so forth we can get machines to do that that's not hard at all we you know we suck that up into the google knowledge graph and so forth but the unstructured text in wikipedia contains you know much more than any individual human knows and we should be able to suck it up if we really had ai the way the newspapers tell you we did then we would be leveraging that we would be able to use that to like make advances in material science or figure out new ways of manufacturing vaccines or whatever it is we wanted to do so there's all this information it's there for the taking and the proof that we don't have ai is nobody has any idea whatsoever how to actually take that beyond really superficial things like um you know playing jeopardy by finding the wikipedia title that most closely matches the jeopardy question because most jeopardy answers are titles of wikipedia pages so we can do that but we can't actually have something read this stuff and figure out how the world works whereas you know my seven-year-old can read you know wikipedia and learn something i think this is the key thing though i agree with you that abstraction is almost synonymous with generalization and you cite the example in your paper gary about the scalar identity function i think it's a beautiful example i cited it on the show the other week which is that you can't generalize even outside of the training range and you went a step further if you train on the even numbers you can't even generalize to the odd numbers it's a sorry state of affairs and as a human we would look at this it's a bit like an intelligence test you know you're just trying to find the obvious pattern that will work outside of the training range and maybe there's some occam's razor maybe there's some core knowledge or prior knowledge that needs to come into that but to us it's absolutely obvious what the pattern is but to a neural network it's not obvious now chollet i think has the best articulation of this concept yet but i think actually after i read your paper you were saying pretty much the same thing many many years before gary except when i tried to publish it um just as a brief historical footnote um one of the reviewers said i was doing a terrorist attack on neural networks and it was not relevant
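The even/odd version of the scalar identity example is easy to reproduce. Here is a minimal numpy sketch (my construction, not the actual 1998 experiments): a linear model trained to copy 4-bit binary codes, but only ever shown even numbers, so nothing in training constrains how it treats the lowest-order bit.

```python
# train f(x) = x on 4-bit binary codes of the even numbers, then test on the odd numbers
import numpy as np

rng = np.random.default_rng(0)

def bits(n, width=4):
    return np.array([(n >> i) & 1 for i in range(width)], dtype=float)

train = np.stack([bits(n) for n in range(0, 16, 2)])   # 0, 2, ..., 14: lowest bit always 0
test = np.stack([bits(n) for n in range(1, 16, 2)])    # 1, 3, ..., 15: lowest bit always 1

W = rng.normal(scale=0.1, size=(4, 4))                 # a simple linear "network"
for _ in range(2000):                                   # gradient descent on squared error
    grad = train.T @ (train @ W - train) / len(train)
    W -= 0.5 * grad

print(np.round(train @ W))   # the training evens are reproduced perfectly
print(np.round(test @ W))    # the weight for the lowest bit never got a training signal,
                             # so every odd number is mapped back to the even number below it
```

Rounding the test predictions shows every odd input copied incorrectly, which is the extrapolation failure being described: interpolation inside the training distribution works, generalization outside it does not.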
well no i think what's fascinating about this is the concept of um so the brittleness depends on the type of the problem right so you know neural networks are brittle for computing the digits of pi and symbolic methods are brittle for a computer vision task like mnist so you make the argument gary that we need to have a heterogeneous ai that kind of enmeshes these two things together and another thing i noticed reading both of your papers is that i mean you get a lot of um flack gary but i think you're a moderate you know the kind of recommendations you're making are already normal you know i mean if you look at a lot of undisclosed hybrid methods like alphago or even gpt-3 in my opinion has a discrete input a discrete output a discrete search on top of it you know so these are um continuous models in the inner loop with a kind of discrete search or some kind of discrete concept in the outer loop we're already in the paradigm of hybrid models aren't we you know i wrote this other paper we haven't mentioned um in 2018 called deep learning a critical appraisal and the take-home message was that deep learning was one tool among many and that we needed hybrid models and erik brynjolfsson the economist who i think has done the most for sort of thinking about consequences of ai immediately tweeted about the piece and he said that you know it was really provocative and yann lecun um flexed his weight and said well it might be provocative but it's mostly wrong and then his followers all attacked me on twitter you can go back into the archives um everything that i said in there i believe is now actually received wisdom three years later so i was brutally repeatedly frequently attacked for it for saying that these systems don't abstract very well that there's a real problem with extrapolating beyond the training data that replicability was a problem that there's no real semantics there um etc i don't remember all 10 off the top of my head um i think all of those are now actually received wisdom they're now in fact if you watch bengio's recent talks they're basically the introduction to his talks are those things that i said and you know lecun has turned around and you know even a year ago i was attacking or a year and a half ago i was attacking gpt-3 and or two excuse me gpt-2 saying that it doesn't reason very well and he tried to say you're just talking about number problems i said no it's a more general problem um and he said well you're fighting a rearguard action we solved this at facebook three years ago and he posted a link and then i could never even get the source code um for the paper and i mean it's not solved and now so even he has actually turned around and he's the one who brought to the world's attention the gpt-3 suicide example so he is now you know kind of he's all about common sense and how these big language models don't cut it so even like my fiercest critics have actually turned in the last couple of years what yannic said kind of amused me a minute ago it's easy to make the argument about semantics well no it wasn't for 20 years every time i made it i was accused of being a terrorist and a bad person whatever so it's not easy to make that argument it's easy to see it in hindsight and i always think of the leo szilard thing about the three stages of scientific truth which is it's wrong um it's not important and we knew it all along but we do need both approaches so um coming back to what you said uh for a second it is actually the moderate view to say that there is wisdom in these two traditions there isn't always so you know creationism versus darwinism i have no patience for creationism i do not think there's wisdom in that tradition sometimes when two sides see some value one of them i think is just off i don't think there's any you know there's a kind of emotional wisdom maybe in patience but there's no reality to it scientifically but a lot of times when two different sets of people see value in two different things that are trying to
approach the same problem there is actually value in both and that is the case here going back to 1940s um there has been clear value even earlier on the symbolic side there has been clear value in symbolic computer science right all of the world software pretty much is still built on it you read these like press releases about how we're going to get rid of programmers in favor of machine learning but this is i won't say it's but it's it's really far from reality that is not how you build a browser or a web app or what i mean sure you you might you do perception with a neural network but the basic logic of your system is in logic you know every computer language is just logic so you've got people who said hey this stuff might be really valuable for ai and they're right and then you've got these people that say there are these problems that are not amenable to that because there's all this kind of statistical stuff that isn't naturally captured although you can't actually integrate in the symbolic stuff um and you know there's this other way of thinking about things and it's really good for pattern recognition and they're right you know going back to mcculloch and pitts and and um rosenblatt and and so forth like rosenblatt was right he built a neural network it was featured in the new yorker in the 1950s it said it was going to change the world and it kind of did you know 65 years later so like there's something to both of those traditions but because like grant money is at stake people have been kicking each other for years and years saying no give me all the grant money don't let my friends have it they're doing something different and so there's been this hostility for literally 65 years that has not served the field well and so like people like louis are trying to like step beyond that and say yes there's something here but there is this historical context where people want you know their students to get the money or they want the you know the grant to get the big jobs or whatever it's led to this incredible amount of hostility and then you have people like hinton who once saw the value of the integration and then saw the value of like the hype and the i don't know i shouldn't dig too much into him but he wrote a great book lewis can tell me the year um or i can try to get in 1989 or something like that um on neurosymbolic integration it was an edited volume i think for the journal artificial intelligence and then in recent years hinton has had no patience for the symbolic side at all saying we don't need this his recent paper glom which actually said some nice things about the tech review is really interesting but it's like bending around in in crazy ways in some ways in order to not use symbols so the the question that the glom paper is really trying to deal with is how do you get a network to uniformly represent parts of things well the obvious answer is to use symbols and label those parts which is you know i mean has in some parts of the world been useful and he's doing everything he can to avoid doing that and maybe there'll be some success there i mean as he says at the beginning of the paper this is like a theory and not a model it's not actually an implemented model i think it's the first sentence of the abstract or something um but you know i actually think it's interesting to try to think through those questions because none of us have the answer right so looking outside the space of the answers that we have is the only way to succeed the the space of answers we have is not 
adequate it just isn't and so we gotta look somewhere else and so i i admire him for trying to do that and not getting sucked up in the can i beat endless you know by another tenth of a percent or imagenet by another tenth of a percent which is what most of the research does it does not take us to what is the new idea here that that we need and so um you know i like the paper in that sense on the other hand i think it's tying itself into not saying i don't want to use symbols here and he's got this like very weak paragraph at the end i sent him fan mail for the first time in my life and i said i agree with every everything in the paper or i find the whole paper you know really provocative except of course i do not believe for a minute the argument you gave against symbols which was hardly an argument at all i can't even remember what it was but um so you know he's still like pushing with all his might to try to exclude the symbols and i don't think you know there's value in that any more than i think there would be value in the symbol of saying you know give us us big computers and we too will you know succeed and we won't need the neural networks we actually need both and we need ideas that we don't have yet i absolutely agree yeah absolutely agree with uh gary on this point that we need both so tim that is the moderate view is that we need both lewis and i are the moderates but we're not viewed that way i mean we were viewed as like you know tankers curmudgeons holding on to the past um we're not viewed as the moderates who are trying to like you know find room for everybody it it's all extremely sensible as i said when i was surprised actually but i do want to stress though that there is a if intelligence is generalization there are types of generalization and abstraction which you can only achieve with neural networks i mean just look at the manifold of human faces for example try and create that generalization for one second um for a visual you you do this like kind of mtv style right video style you're going to cut it together um there's a there's a maybe i can dig it up for you a headline of uh reporting the bengio debate that says gary marcus is the villain ai never needed you you should uh throw that in there in some way excellent it's a cool title to have right i mean like that's something that's something you write on like on like your autobiography going back from the past a bit to the future and and lewis what you said before talking a lot about kind of um we need semantic lo sort of semantic the ability to semantically describe what we're doing and so on if i think of today really getting down to how do i implement a neural network right i say here's my input data right and then i do keras or something i import some module i do layer layer layer layer layer right and then i put a loss function i put an optimizer and i run go so in in your in if you can if you can outlook to a world that you would like to see um how do we need to change this in order for us to be able to better semantically describe what we want to do with uh with these systems that's a very good point yannick um let me mention something about uh relations and uh what's going on in terms of what you said in a very simplified way we touch we touch on this point in our paper we have this so-called embedding techniques these days we mentioned it in in our paper so we wrote about it where one transforms the symbolics representations into vector space but this is for implementational purposes we just discovered i 
don't know if gary agrees that it's very efficient to use tensor products and so on and linear algebra and these kinds of algorithms that have been very effective okay but this is an implementation strategy as i say this is not a proper semantic foundation for deep learning the promising approaches that people use these days technically they have the so-called embedding techniques where they seek to translate the symbolic representations or the symbolic information to vector spaces where the reasoning process or the implementation of what people call reasoning takes place via matrix computations over distance functions and in these systems um the embedding let's say is carried out using back propagation typically or some form of modification of back propagation which is still very popular which is still used these days and in this way what people are seeking is a kind of manual translation or manual way of representing relations in a distributed neural network that is seen as tensors or matrix computations and so on another thing that people are doing these days that perhaps i say perhaps will shed some light on how to have better abstractions is that people are starting to use or are using the so-called graph neural network models but at the end of the day a graph can be seen as a relation right you can translate graph knowledge or graph neural networks into relational forms and relational forms or relational representations are at the end of the day a form of predicate logic first order logic and then this can be translated or this can be related to the kind of work that people did for 40 years in databases in relational databases so there has been some very interesting work by alon halevy from uh facebook now he used to be at the university of washington in seattle he came from the database community from the foundations of databases community using datalog which is a kind of extension of prolog to deal with databases and he's now analyzing how to build these neural databases these deep learning databases and so on
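As a concrete picture of the "a graph is just a relation" point, here is a small sketch (my own toy example, not from the paper under discussion): the same edge set can be queried like a Datalog-style relation, by computing a transitive closure, or treated as an adjacency matrix for one round of graph-neural-network-style message passing.

```python
import numpy as np

edges = {("a", "b"), ("b", "c"), ("c", "d")}           # edge(X, Y) facts

# relational view: the Datalog rules
#   path(X, Y) :- edge(X, Y).
#   path(X, Y) :- edge(X, Z), path(Z, Y).
path = set(edges)
while True:
    new = {(x, w) for (x, y) in edges for (z, w) in path if y == z} - path
    if not new:
        break
    path |= new
print(sorted(path))                                     # transitive closure, includes ("a", "d")

# tensor view: the very same relation as an adjacency matrix, plus one step of
# message passing in which every node sums its neighbours' feature vectors
nodes = sorted({n for e in edges for n in e})
index = {n: i for i, n in enumerate(nodes)}
A = np.zeros((len(nodes), len(nodes)))
for x, y in edges:
    A[index[x], index[y]] = 1.0
H = np.eye(len(nodes))                                  # one-hot node features
print(A @ H)                                            # row i aggregates the features of i's neighbours
```

Real graph networks interleave this aggregation with learned weights and nonlinearities, but the underlying object being manipulated is still the relation.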
so in this sense i'm kind of optimistic in seeing that people are noticing the connections people who work on the foundations of science and the foundations of ai and these are very very serious people who have a lot of respect for the symbolic world they are seeing and analyzing the connections that we have now between say matrix computation linear algebra tensor algebra graph neural networks and relational work and the relational work in ai and when one looks at relational reasoning that is the reasoning and the learning that people are after in ai even in deep learning people want to discover relational information want to discover relations between objects in scenes in movies in language in image interpretation all this machinery all this formal background is in symbolic ai you cannot ignore that when you look at the history of computer science as gary said even mcculloch and pitts when they provided perhaps one of the first neural network models one of the things that they showed in the paper was how the kind of networks they proposed carried out boolean reasoning logical reasoning if you look at the paper gary right and they were cognitive scientists right they were looking at the ways that neural networks could in the future perform logical reasoning they were let's say the first neurosymbolic researchers the first ones that were neurosymbolic were mcculloch and pitts who are let's say the forefathers or the godfathers of neural networks if we adopt this definition so what i see now is an opportunity for people to realize that it doesn't make sense to say that symbols are useless because they are not we just celebrated 50 years of perhaps the most influential paper in the history of computer science which was the paper by stephen cook on np completeness the beginning of may was when the paper was published can we ignore symbolic computing in computer science absolutely not are we going to prove that p equals np using deep learning techniques probably not we are going to use symbolic machinery we are going to use combinatorial techniques combinatorial reasoning common sense reasoning in building the proofs the inference and so on so i don't see the reason for there to be a divide and i'm very happy that some people from cognitive science some people from the symbolic ai world and even some people that come from theoretical computer science are starting to work in this field and analyzing the connections between symbolic reasoning and machine learning symbolic reasoning and connectionist models deep neural networks we can also look at the work from people like as i mentioned alon halevy uh people like martin grohe who is analyzing the logic underlying graph neural network models which are quite popular these days and highly effective in representing some combinatorial problems and learning about combinatorial problems and so on so i believe that in a few years time maybe five years we'll have a much more adequate relationship between the communities let's say where people will respect each other more right and we're also going to see more and more connections between the results that we have in databases and the results that we have in database theory the results we have in logic in computer science logic and the results that we have in deep learning so in this sense i never saw the reason for this sometimes not very kind debate between the two communities and i believe that computer science the way computer science was born if you take for instance the work by john mccarthy the work by alan turing not to mention the work that the cognitive scientists mcculloch and pitts did already in the 40s the symbols were there i think one day a social psychologist really should examine kind of the history of this acrimonious uh relationship because i mean you know to professor marcus's credit we've been reviewing all this material right in preparation for today and he's been very consistent and one thing that i think was quite disingenuous of some of his opponents is not only do they really try to stay out of his side of the field you know at all possible contortions uh but they also move the goal posts to kind of subsume like his area and now claim it's what they've been talking about you know all along right i think that's quite disingenuous but you uh you mentioned a trigger word for me which is sort of np you know the problem space of np completeness and you know the space of np or np-hard problems often arise almost all the time when you have these sort of discrete problem spaces right like the traveling salesman where you have to make kind of discrete decisions or the 3-sat problem or uh integer programming so again we're
coming back to this duality right we have kind of almost the wave particle discrete continuous duality and as you both said nobody has solved the problem of how do we do optimal learning or even efficient pragmatic learning in the space when it involves this discrete sort of symbolic paradigm i'm kind of wondering what you both think right now are the most promising directions for that like we've talked to some of the program synthesis community like sort of dreamcoder you know professor lamb i know you have some recent papers on as you mentioned like the graph neural networks i mean i think there's a lot of options out there are you seeing any type of convergence or areas that we should be spending a lot more time on that are kind of being overlooked right now i think u2 wrote a song about this called i still haven't found what i'm looking for um what i'm looking for is not just i think in the end let me say it a different way um the right solution here is probably going to be both an algorithm and also a large knowledge database and i think what people are looking for is an algorithm and trying to duck the large knowledge database question so you know the person who most went after the large database question was doug lenat and nobody outside of cyc is particularly happy with what he came up with my view is that he was trying to do something great he didn't succeed but that what he was trying to do is vital and i wonder if we had the right algorithm but we didn't have knowledge in a machine interpretable form whether it would be that useful whether we would even know like maybe we even have it now but because we try to use it in a void in some sense in this very empiricist i'm gonna start learning everything from the data kind of way it never really amounts to that much so i often think about this paper um by rich sutton called the bitter lesson in sutton's paper the argument that he makes is every time we try something big data wins over knowledge and i think he is right as a historical matter that that has been the case but what i think he leaves out is that as a historical matter we've only solved a tiny fraction of the problems that we hoped that we would solve in ai and most of the problems that we hoped we would solve in ai have not been amenable to either a knowledge-based approach or to a big data based approach so for example the problem of language understanding is basically unsolved at this point whatever hype aside um it's basically unsolved and it does not appear to be amenable to big data and in fact i mean this is another way of making the hybrid argument it needs something beyond what we have and so if we had the right hybrid algorithm but we didn't have the knowledge to go with it and the kind of cognitive structures for how you build mental representations over time if it wasn't part of a fuller richer system i'm not sure it would make a difference and i'm not sure we would know um i'm trying to think of an analogy here like it could be that a three-month-old baby actually has the innate knowledge the right cognitive system but they don't have enough common sense knowledge acquired through those mechanisms yet so in fact the three-month-old infant has you know one of the best minds on the planet you know no non-human will ever match what that three-month-old is about to do its potential is enormous but if you sat there and did psychological studies you'd say okay great if i do a study right it has object permanence and piaget was wrong about that
but you'd also say it really doesn't understand beans about what a laptop is for and it doesn't understand you know how to assemble an ikea kit i guess it's not passing my benchmarks let's toss it out right and you know if you had that magic formula but it was uneducated you'd miss it we really need a full system and building that full system includes going after lenat's problem of what large-scale knowledge you need to have and what form it needs to be in and how it needs to go after the problem i keep talking about but not enough about which is the cognitive model problem which is as information comes in how do you assemble that into a database so yeah people talk a little bit about knowledge graphs and stuff like that but um we really need a rich answer to that so how do you build a mental model of let's say the people that are on this call and their motivations and stuff like that um until you have that you might actually not know um you know that you've solved this other piece of it so one theory is maybe it's already been solved but nobody notices and you could ask what are the most promising things i think in general people that are trying to say how do i use prior knowledge to affect learning are at least asking the right question i'm not sure they're asking it in the right way but i think that's the right question if you know something about something you shouldn't just start from scratch and of course convolution is the greatest version of that that we have or one of the greatest versions i know that the world is orderly that you know the thing over here and the thing over there are likely to be the same essentially right it is translation invariance that's a piece of knowledge that we integrate in a learning system but we don't have as far as i know a general answer to that a bunch of people in physics for example are thinking about that a little bit in narrow cases what if the prior bit of knowledge i want is that objects move on paths that are connected in space and time right so you can't just wink in and out of existence the star trek transporter doesn't really exist right um that's an example of a piece of prior knowledge that's relatively easily stated every physicist knows it is essentially true at the medium object level we won't get into quantum stuff how do you put that let's say into a neural network of some description that is going to watch the world and be able to tell me which are the individual entities here how are they going to leverage that in their segmentation system let's say to be able to parse out the world and understand paths and stuff like that should be a solvable problem and we should in 2021 actually have an off-the-shelf technology where i can you know somehow encode that knowledge and then put it into my learning system i would say i haven't found something that lets me do that in a comfortable and general way and so i'm not satisfied with what's out there maybe someone's written about it and you know it's not getting the attention it deserves yeah i think enmeshing the um the symbols and the deep learning models that's challenging but from my perspective the reason why gofai failed in the 80s was because of the knowledge acquisition bottleneck and the brittleness i even see this now that there are companies out there that are building large-scale knowledge graphs and there's just never enough knowledge you just need more and more knowledge and it's almost as if the knowledge we captured before didn't seem to work
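Convolution's weight sharing is the one built-in prior everyone already uses, and it is easy to see in a few lines. The sketch below is mine (assuming nothing beyond numpy): one shared 3-tap detector slides over a signal, so a pattern produces the same response wherever it occurs, which is the "thing over here and thing over there are likely to be the same" knowledge wired directly into the learner.

```python
import numpy as np

kernel = np.array([1.0, -1.0, 1.0])              # one shared 3-tap feature detector

def conv1d(signal, k):
    return np.array([signal[i:i + len(k)] @ k for i in range(len(signal) - len(k) + 1)])

pattern = np.array([1.0, 0.0, 1.0])
early = np.concatenate([pattern, np.zeros(5)])    # pattern at the start of the signal
late = np.concatenate([np.zeros(5), pattern])     # the same pattern shifted in time

print(conv1d(early, kernel))   # strong response at position 0
print(conv1d(late, kernel))    # identical response, just shifted to position 5
# a fully connected layer would have to re-learn the same pattern separately at every position
```

The open question in the passage above is how to give other priors, such as spatio-temporal continuity of objects, an equally convenient off-the-shelf encoding.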
we need more knowledge for a slightly new situation so the first problem i mean the first problem is that we need humans to capture the knowledge and i'm thinking well what are they doing well i think the best way to think about what they're doing is you know when you do computer programming and you do object-oriented design you design classes and you design inheritance hierarchies and you take two objects together and you say what do these objects have in common well they're both animals they both have fur they both have legs and you're designing these abstractions that generalize really well and we need to have an ai that can do that automatically right yeah i mean going back a sentence or two it takes a lot of knowledge to do much of what humans do so you think about i don't know a 13 year old watching a movie they've been on the planet 13 years acquiring knowledge constantly you know so i have a um well soon we'll have a seven and eight year old i have a six point nine nine year old and an eight-year-old and you know they spend most of their time acquiring knowledge of some form or another you know this week it's chess and spies those are their hobbies and so they're learning like you know what do spies do and what's a pin in chess and stuff like that they're sponges for abstract knowledge and concrete knowledge knowledge of all sorts some of it's metrical how do i do a cartwheel but they are constantly accumulating knowledge and by the time they get to the point of doing many of the things that we expect you know they've been doing that for years and years so like why should we think that it's a quick fix i don't think it is i think you know it's gonna be a lot of work the nice thing about software is and this was lenat's thesis which might still be true even though i don't think he succeeded um it might be worth like you know several thousand people hours or people years even um in order to do this once because once you've done it you can percolate it and everybody takes that argument for granted in driverless cars they all assume yes it's expensive but when we get there the value will be enormous so it's worth it and very few people talk about that with respect to knowledge like you know n of one lenat is the only person who still kind of says you know it's really worth it to accumulate this database and he's been doing it for 30 years i think he's put in 1200 person years or something like that back of the envelope maybe 1500 which is a lot but it's you know it's not that much compared to what it's taken to build i don't know google search is a lot more than that right and so like it's a lot compared to what an academic can do in their lab um or maybe what a startup can afford to do but it's not on the global scale that big um you know maybe we need to do that before you get any of this off the ground but maybe we need to do it in a more modern way using techniques that lenat hadn't thought about when he started this in the 1980s but maybe it's unavoidable i mean my strong suspicion is that it is a bullet that we have to bite and that nobody wants to bite it and so everything suffers as a consequence because what we do since we don't have machine accessible knowledge that would take a lot of work to develop what we do is we try to trade it off with these massive databases drawn from reddit which you know brings all the kind of bias and anti-vaxx nonsense and so forth with it it is not actually distilled enough to really be all that useful and you know
some ways sometimes the the harder approach actually turns out to be the right one it's like guitar instructors have this saying um slow is fast and fast is slow so the field is taking the fast approach of i'll get all the data i'll put it in this big thing and i'll be there but it doesn't really work it ends up you know in the long run it's not actually taking us to the deeper answers the slower approach would be let's do the hard work of one by one taking some of these important priors josh kennedy's the only person i know who's really trying to do this of trying to take a prior like spatial temporal continuity and figure out how would i put that in with the rest of my system and maybe if we can figure out a few of those we can accelerate the process and it'll get faster and faster but we might need to do some really hard work on some basics like that of you know how am i going to represent just the minimal kind of physical reasoning about the world and psychological reasoning and you know maybe each one of those pieces is going to be like a phd thesis to get one fact and that is going to be so depressing compared to i you know raised imagenet by one percent in six weeks of work by by you know twiddling this parameter of connectivity um nobody wants to do it sure yeah and you might win a turing award by figuring out to incorporate one form of invariance you know efficiently into into machine learning i really want to pull in uh professor lam on this this question too but i know that uh i believe professor marcus you have a hard stop in about four minutes correct or even two i think okay thank you so much for being here thank you thanks a lot for your questions great seeing you professor lamb i just wanted to follow up so we we've gotten kind of uh professor marcus's take on look we've got to build this knowledge base and also find ways to incorporate prior knowledge i'm really curious what your take is on on some of the most promising avenues and perhaps even overlooked avenues for finding a way to more seamlessly integrate you know continuous and discrete activity symbolic reasoning with perception yep that's a good point there's a very good question let's say when we think in terms of relational learning uh learning this structured relational domains like for instance let's think for a moment as a for instance of a database sometimes very hard because this kind of discrete relations uh technically uh they lead to to to gradients that are not very easy to deal with in uh in neural networks so there is a lot of challenge in representing relations because of this technical uh let's say this technical challenges that we have to deal with in neural networks and for instance when we confront let's say images uh or signals they are in a way they are very appropriate for for neural networks in terms of pattern recognition in terms of finding relations and correlations and associations between this this kind of data however uh when we do if relations in natural language it's uh is a kind of data where we find a lot of relations between the structure of the sentences in terms of the the grammatical structure of language this become this become a kind of a very challenging challenging problem and and this is why many people have uh have claimed have stated that in spite of having this uh very impressive large language models where is still a long a long road we're still uh very far away from having natural language understanding uh proper natural language semantics and someone was saying here how 
you define semantics how we define a formal semantics for language and this of course uses tools from logic tools from theoretical computer science and in linguistics and in formal semantics and in the philosophy of language we have to provide an account of the meaning of how you compose the parts of sentences the parts of texts that's why this is so difficult to do in very large language models because you find the correlations you find the proximity you find the approximations to the kinds of bits of text that are related to the queries but you have no formal definition of how one thing relates to the other so this is one thing that's very hard and this is a well-defined research subject and research community as well in logic and computer science but we have not been able so far to translate the results from this community into natural language understanding for instance into natural language inference and into deep learning when we think in terms of the successes that for instance deepmind has had with alphago and alphazero they have some symbolic component it's clear that they have some kind of symbolic component in their systems and they are doing as henry kautz said at aaai 2020 in new york about a year ago in february 2020 they are doing some form of neurosymbolic reasoning and neurosymbolic learning and this tension that we have between learning and reasoning is a tension that has this long history in computer science but i also like to state a sentence that professor les valiant who is a turing award winner from harvard university who is also a british computer scientist les valiant has said that the distinction between learning and reasoning goes as far back as at least aristotle because when you think in terms of the syllogistic arguments and the inductive arguments that aristotle used more than 2000 years ago we already had this tension between let's say deduction and induction and induction could be seen as a kind of learning procedure the interesting thing about semantics because we speak with walid saba who's a linguist and he says the biggest problem in language understanding is the problem of the missing text which is that almost everything that we mean we don't say uh so it's all very well having the semantics but you know understanding seems to be something it's another module that we need to have on top where we reason over world knowledge and we do a whole bunch of other stuff so where does the rubber meet the road between that understanding problem and the semantics that you're talking about that's a good point we have a lot of contextual information here you have to think let's say what i like about when we think in terms of the possible world semantics it's not very related to that but i'll get to the point here is that you start to analyze the similarities between the possible situations that we have in the world let's say so in this way when we are trying to interpret language and there are a lot of unknowns or things that were not said we have to think in terms of how to interpret the possible world in which this kind of utterance this kind of text this kind of sentence was written so this will be a component
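To make the talk of formal, model-theoretic semantics slightly more concrete, here is a toy compositional sketch in Python (my illustration, vastly simpler than Montague semantics or anything the speakers have built): word meanings are functions, the meaning of a sentence is computed from the meanings of its parts, and truth is evaluated against a small model of a possible world.

```python
# one tiny possible world: which entities exist, which are cats, which are asleep
domain = {"tom", "felix", "rex"}
world = {"cat": {"tom", "felix"}, "sleeps": {"tom", "felix", "rex"}}

cat = lambda x: x in world["cat"]                       # [[cat]] : entity -> bool
sleeps = lambda x: x in world["sleeps"]                 # [[sleeps]] : entity -> bool
every = lambda noun: lambda verb: all(verb(x) for x in domain if noun(x))
some = lambda noun: lambda verb: any(verb(x) for x in domain if noun(x))

print(every(cat)(sleeps))                               # "every cat sleeps" -> True in this world
print(some(cat)(lambda x: not sleeps(x)))               # "some cat does not sleep" -> False
```

Nothing here scales to real language, but it shows what it means for the meaning of a sentence to be composed from its parts and checked against a world, rather than retrieved by textual proximity.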
and here we have an advantage of these very large language models because when you bring a lot of information perhaps the next step is to bring also to the fore the context in which this kind of text was written historical information temporal information about the text about the language that we are about to interpret so the advantage here of having a lot of data is that not only do you bring the textual information or the sentences the utterances the questions or the summaries that we have you also bring contextual information and so in this way one could perhaps think in terms of let's say in a rough analogy here not only in terms of a montague semantics that is a very classical semantics for natural language but we could also bring about perhaps the kind of models that computer scientists brought to the fore that are based on possible world models in terms of branching time in terms of worlds that are more similar to one another worlds that are related to one another so if we can bring this kind of information i believe we are getting closer to having a more appropriate semantics and more appropriate semantic models for for instance large language models or other tools that will be developed that we cannot even imagine at this point so that's a very good point that you brought to the debate and uh another promising thing that i want to mention here is that if we think of the history of deep learning we say well in the early 2000s i used to go to the neurips conference and i remember that when i went to neurips which was called the nips conference in the early 2000s we once presented a paper in 2003 about neurosymbolic temporal reasoning and reasoning about knowledge and i remember that one guy came to our poster presentation and he said look you are one of the only guys here who is using artificial neural networks right at the neurips conference virtually nobody is presenting a paper on neural networks at neurips anymore why are you using neural networks apparently only you and geoff hinton here are presenting papers about artificial neural networks that was just three to five years before the deep learning revolution started in the academic community and let's say seven or eight years before the deep learning revolution came to the fore in industry and in the media so when we think in terms of symbolic ai or neurosymbolic approaches perhaps we're at the same point in history we showed in the late 90s and in the early 2000s that there were symbolic interpretations and there were kinds of symbolic reasoning that could be performed by connectionist models by neural network models i believe that with the recent developments in integrating the hybrid models the symbolic school and the connectionist school perhaps as gary said before he left perhaps the best results are in some of the papers that have already been published but that we have not been able to exploit to their full potential i was listening to a podcast with tomas mikolov you've probably heard of him he's the guy who created the skip-gram model and um he said that before the skip-gram model it was like the dark ages in natural language processing right and there are other perspectives so we know a lot of linguists and logicians and they say we're in the dark ages now they say that we're on a hiding to nothing with this vector-based approach to natural language processing it's antithetical to understanding he says that there are zero degrees of freedom natural language understanding is about uncovering the single human thought behind an utterance it's binary it's there you know what i mean um so they think that we're in the dark ages now and you are
and you are very much a pragmatist, you're halfway between the two positions. because language, to me, seems like a very discrete problem, it doesn't seem like we should be using vector spaces here at all, so what do you think about this? i think language, natural language processing, natural language understanding, and relational learning are, let's say, associated, they are related in the sense that they are both discrete, they are both structured in a way. we know that some people claim that language is not that structured, but we have a formal grammar for most western languages, let's say, or a formal semantics, a semantic understanding, for most western languages, so language in a way is structured, and the structure is sometimes very strongly correlated with discreteness, which is the point you are making. so in this sense the hardness of building deep learning approaches that are very good at relational reasoning, in the logical sense, in the symbolic sense, is akin to the challenge of building deep learning systems that can produce language interpretation and language reasoning in the same way. these problems are related. so what i believe, and what i guess here, and i'm not sure because they are both hypotheses, is that the neurosymbolic approach is promising, because we have better tools to translate, say, the knowledge that we have about language into a neural network model, we have better tools coming from the logic world, from logic in computer science, and we have better ways to build this hybrid approach where the network interacts, where the network provides information, or where the network learns discrete structures. so what we do in neurosymbolic ai, or neurosymbolic computing, is not to throw all the data at the same point, let's say not to feed a very large neural network with an even larger amount of data. what we do is some pre-processing: we provide knowledge representation tools to properly design a learning system where symbolic structures, discrete structures, are first learned by the machine learning mechanism, and then from this learning ability you verify, you do your cross-validation, you use your statistical approach to validate what you have learned. so the challenge here is to design these tools to combine the discrete nature of language, the discrete nature of relations, and the continuous characteristics of the neural network approaches. and we also have to add that we do not only have to think about neural network learning; we can consider reinforcement learning as well, as deep learning has done over the years, combining reinforcement learning and other learning procedures with neural learning to get better learning machinery, better learning results, and, at the end of the day, the very impressive technological tools that they have. and i firmly believe, and we have seen this over the years, that if you are able to have a better representation for your discrete structures, and you train your learning algorithms with this translation that you bring from the symbolic world into the neural network world, you have better learning procedures, and perhaps you will be able to learn about symbols, let's focus on symbols here, but also to learn about logics, to learn different kinds of logics and different kinds of reasoning.
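One common flavor of this kind of hybrid, sketched below only as an illustration and not as the specific systems discussed here, compiles a symbolic rule into a differentiable penalty that is added to the ordinary supervised loss, so the network is trained on data and on background knowledge at the same time. The task, the invented rule "square implies rectangle", and all parameter values are assumptions made for the example.

```python
# A sketch of "knowledge as a regularizer": a logical rule becomes a soft
# constraint on the network's output probabilities.
import torch
import torch.nn as nn

torch.manual_seed(0)

# toy multi-label classifier: outputs P(square) and P(rectangle) per input
net = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2), nn.Sigmoid())

x = torch.randn(64, 4)                       # fake features
y = (torch.rand(64, 2) > 0.5).float()        # fake labels
y[:, 1] = torch.maximum(y[:, 0], y[:, 1])    # make the labels respect the rule

bce = nn.BCELoss()
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

def rule_penalty(p):
    # rule: square(x) -> rectangle(x); under a fuzzy reading the implication
    # is violated by the amount max(0, P(square) - P(rectangle))
    return torch.relu(p[:, 0] - p[:, 1]).mean()

for step in range(200):
    p = net(x)
    loss = bce(p, y) + 0.5 * rule_penalty(p)   # data term + knowledge term
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss {loss.item():.3f}, rule violation {rule_penalty(net(x)).item():.4f}")
```

The design choice this illustrates is the one described above: the discrete, symbolic side is translated into something the continuous learner can optimize against, and the result is then validated statistically like any other model.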
let us not forget that we humans are not very good at logical reasoning, right? we are much better, let's say, at perceiving, at seeing, at informally understanding sentences, at informally understanding information about our world, but we are not the best at performing logical reasoning, logical inference. and if we can teach our computers, teach our deep learning machinery or our neurosymbolic ai systems to do this, then we will probably have solved one of our biggest problems, one of the biggest challenges, which is to provide, let's say, logical reasoning to the masses, logical reasoning to everyone, logical reasoning to make inferences about what we expect of the world, not only in terms of logic but to analyze hard problems, for instance when you have disputes between companies, disputes among social groups, disputes about the several kinds of problems that we have in business. perhaps if we can feed the ai systems with a lot of information, and then design the proper inference mechanisms, or make the computers learn the proper inference mechanisms, we will have tools that provide us with better logical information about how to take decisions. so that's another advantage of the neurosymbolic techniques. yeah, could i challenge that a little bit? because you said that human beings often aren't very good at reasoning, and by the way, you folks talk about robust ai being a good thing, but if you're talking about creating learning agents then in a way it's quite worrying if they might be acquiring knowledge and reasoning over new knowledge, and who knows where it might go. but in a sense you could argue it both ways, right? natural language understanding is about disambiguating, out of the fifty possible meanings of an utterance, to the intended meaning, and actually, when you say the corner table wants a beer, we know the corner table is a person. we hardly ever misunderstand each other, it's something so invisible that we almost never misunderstand each other. but then you could make the other argument that there are loads of situations where people really do have bad reasoning, especially if you're having political discussions on twitter, god, if only they thought the same as me. what we are able to do as humans is to consider a small set of arguments when we talk to other people; we typically do not add a lot of information when we're dealing with logical reasoning, with formal reasoning. however, when we are dealing with one of our fields of expertise, say music or football or any other field, we have, as gary said here, years and years and years of information that we have acquired, processed, memorized and revised, beliefs that we have formed and belief revisions that we have made on what we used to believe, and so on. we have years and years of information about our domain of expertise, a lot of information stored in our brains, in our minds, and of course we are able to reason about our field of expertise much better than about other things we don't know about. this is related to system one and system two from daniel kahneman, who explains in his book that we are very able to react to things that we have seen in the past, to react for instance to information related to our survival, like fear, and to other situations where we are threatened, and when we specialize in some field we
are also able to react very quickly. so there is an interplay between system one and system two, the quick, automatic one and the slower, more deliberate one. however, when we are dealing with logical reasoning about new information, about new propositions, new statements, for instance something that someone posts on a social network, we tend to do deductive reasoning that is sometimes not very elaborate, because we also mix in contextual information, we react to the provocation, let's say, we react to the emotions that are involved, and sometimes we are not able to do the proper logical analysis of the situation, or to find more information, more data, before taking a decision. so we are very good at doing logical reasoning about our domain of expertise, but we are not very good at doing logical reasoning about domains in which we are not experts. and since most people are not experts in mathematics, in logic, in natural language understanding or natural language semantics, we have also not been able to translate this kind of information, this kind of knowledge about language and about mathematics, into deep learning systems, into machine learning systems. what machine learning systems were quite effective at, very efficient, very good at, was using a lot of data, i mean a huge amount of data, and finding correlations, finding inferences about image interpretation, about language inference and so on, from that huge amount of data. but we cannot say, anthropomorphizing, that these machine learning systems are reasoning over the data and actually giving us information that has a solid foundation about, for instance, natural language understanding. so i compare the developments that we have in deep learning to the developments that we have in our formal education. people are very good in their domain of expertise, they accumulate a lot of data about that domain, and they can very quickly respond to questions about football, about music, about impressionism, about french painting and so on; however, when you ask them about other domains, we are stuck. when we compare deep learning, or the current ai tools, with this situation, it's more or less related. it's not exactly the same, but it's an analogy i use, for instance, with my students: why deep learning systems are very good at image processing, at image understanding, at finding correlations in images, finding relations in images. then i explain how labeling works, how the basic algorithms work, and i also explain the importance of having a lot of labeled data about particular domains. in this way a machine learning system, an ai system, will work very well, just as a human expert who has accumulated a lot of data, a lot of information, about his domain of expertise over two or three decades of hard work. yeah, do you feel that, because i really want to get to the root of this, is reasoning a first-class citizen? you know, we're talking about symbolic reasoning, is it a first-class citizen in the human brain? do you really believe that, or do you think it's some kind of... because when we talk about it, we project it as a kind of calculus, and it makes sense in that domain, but do you think it's first class in our brain? reasoning is a first-class citizen, but
reasoning is also based on experience, it is also based on learning, right? for instance, let's take a mathematician or a logician: he has years and years and years of practicing proofs over examples, he's like a deep learning system that has seen, say, ten million complete logical proofs, and then he becomes an expert in reasoning about proofs, reasoning about logics, reasoning about a particular domain of mathematics. his expertise comes not only from his biological composition, it comes also from a lot of experience, a lot of hard work, from seeing patterns, seeing examples, seeing sentences, seeing logical inferences, seeing different proofs in books, and developing these kinds of proofs himself. either way, reasoning is in a way built in, reasoning is first class, but it also depends on how you interact with the world and how you draw your own inferences from the world. of course, if you practice less, if you make less use of the learning that feeds your reasoning system, perhaps you are not very sophisticated in reasoning, but depending on how much you have developed your expertise and how many examples you have worked through over your life, you become better and better at reasoning. reasoning is a first-class citizen just as learning is a first-class citizen. well, we often have... i think i'm taking this a bit out of what keith usually says, which is that if we look at the human brain at a basic level it is neurons and cells, and we all agree that humans do reasoning, well, maybe not all of us agree, but we all here agree that humans do some sort of reasoning, we can manipulate symbols in our head. so i think maybe what tim was getting at is, do you think there is something special about the brain, an extra module that we have for reasoning, or can this type of reasoning actually be implemented on the same substrate that does the pattern recognition, or are these two different things? that's a great question. i believe we actually have only some small evidence about this, and the answer is that i don't know, i don't know how it works, because we don't have enough evidence at this point. as far as i understand from neuroscience, and i'm not a neuroscientist, we don't have enough evidence about whether they are integrated or separated. what machine learning people like to think and believe is that we have these distributed representations and distributed forms of reasoning going on, and this is a good point to consider, not only from a computer science approach but also from a cognitive science approach: that we actually do have distributed knowledge representation, distributed symbolic representations, and distributed reasoning going on in our minds and brains. that's a good point that needs more attention and more neuroscience research to confirm or refute the hypotheses we have at the moment. so i don't think we are a hundred percent sure about whether this is integrated or separated, or whether it is distributed, as many people from machine learning currently believe. yeah, so sometimes we expand on this question a little bit, because even if you leave aside the neurophysiology, we
can think just in terms of abstract properties. is there something missing from the artificial neural networks that we're playing around with today, something that, if we just borrowed it, abstracted it a little, and stuck it in there, might open up new paradigms or new capabilities? for example, some people think it has to do with asynchronous firing of neurons, that we need a kind of processing in the neural network where some neurons can fire over here and then, at a subsequent time, influence neurons further back in the network. or mechanisms that sometimes feed back and mimic what dendrites can do, where they build up a charge, if you will, that can cause a different type of firing. or, for example, maybe you want multiple activation functions that can trigger differently or even simultaneously. so you can speculate about a lot of different abstract capabilities that, if added to the artificial neural networks we have today, might enhance their computational capabilities or increase their efficiency at certain computational tasks. so i think we're just wondering, which abstract capabilities do you think might be the most promising, if any? well, there seem to be some very interesting results coming, for instance, from these new transformer models that use attention mechanisms and have been very effective in some applications, definitely in sequence modeling, and in language too. so this seems to be promising, but since it's relatively new, if you compare these models with the convolutional models that geoff hinton used in the early 2010s to, let's say, feed the deep learning revolution, we will need more results and more test beds coming from these new architectures. i like the idea of exploiting, extending and analyzing the graph neural network models, because graph neural network models are very good at representing relational structures, and in this way there is hope for us that we are going to be able to learn discrete or restricted relational domains that are considerably harder to handle these days. so, keith, i believe that the architectures that exploit the possibilities of graph neural networks, and as a consequence a kind of relational model, a relational neural network that someone will come up with by drawing inspiration from graph neural networks, can be very useful in terms of relational learning, and in the end they might be very useful in language understanding, in machine translation, and in having a better semantics for language, because once you have a relational neural network, or a relational representation of language, a relational representation of sentences and texts, perhaps we are getting closer to having a better semantics for natural language understanding. so i would bet, not too much money, but i would bet that these relational models derived from graph neural networks are a good avenue of research for the coming years, because of their proper representation of relations, and relations are a very important structure in computer science and in natural language too.
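To make the relational point concrete, the sketch below shows one generic message-passing step, the core operation of a graph neural network: each node updates its vector from the vectors of its neighbours over explicit edges, so the relational structure is represented directly rather than flattened away. The graph, dimensions and weights are toy values invented for the example; this is a generic sketch, not any specific published architecture.

```python
# Minimal message-passing step of a graph neural network in numpy.
import numpy as np

rng = np.random.default_rng(0)

# a tiny relational structure: 4 entities, edges as (source, target) pairs
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
num_nodes, dim = 4, 8

h = rng.normal(size=(num_nodes, dim))              # initial node features
W_self = rng.normal(scale=0.3, size=(dim, dim))    # transform of a node's own state
W_msg = rng.normal(scale=0.3, size=(dim, dim))     # transform of incoming messages

def message_passing(h):
    """One GNN layer: aggregate neighbour messages, combine with own state."""
    agg = np.zeros_like(h)
    for src, dst in edges:                 # a message flows along each edge
        agg[dst] += h[src] @ W_msg
    return np.tanh(h @ W_self + agg)       # updated node representations

h1 = message_passing(h)    # after one layer, a node reflects its neighbours
h2 = message_passing(h1)   # after two layers, neighbours of neighbours
print(h2.shape)            # (4, 8)
```

A relational reading of language along the lines suggested above would treat words or phrases as nodes and their grammatical or semantic relations as edges, which is why these models are seen as a possible route to a better semantics.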
excellent, well, professor luis lamb, thank you so much for joining us this evening, it's been an absolute pleasure. it's been fantastic, thank you yannick, thank you keith, thank you tim, and thank you to gary, who has just left us. it's been a pleasure, let's stay in touch, and let's stay in touch with developments in machine learning, ai and computer science. thank you so much for hearing someone far away in the south of brazil. thank you, it's been our pleasure.
Info
Channel: Machine Learning Street Talk
Views: 53,565
Id: nhUt6mKCPf8
Length: 144min 13sec (8653 seconds)
Published: Fri Jun 04 2021