What Kind of Computation Is Cognition?

- I am delighted to welcome Professor Josh Tenenbaum, who's a computational cognitive scientist at MIT in the Brain and Cognitive Sciences Department, here at Yale. Josh is an amazing and brilliant thinker, and you'll find that out, if you don't already know it. Many of us already know that, but you'll find it out as we have him here today to talk. And I guess the thing I wanna say about Josh is that his mind is amazing, he's a wonderful person, he's a preeminent computational cognitive scientist in, I think, the world, and his genius was recognized in 2019 by the MacArthur Foundation. For those of us that, as I said, already know him, we didn't need that to recognize his genius. But there's not much else I can say, just that we're very happy that you can visit us today, give the talk, and spend time with us afterwards. And you are gonna tell us about, what, I can't remember what. - [Josh] Kind. - What kind of computation is cognition? - [Josh] Yeah, it's a question, deliberately. So, I hope we'll have the discussion. - What kind of computation is cognition? - Thank you. - All right. Great, thank you so much for having me, and to Laurie and Brian for inviting me. It's a great honor and pleasure to be here. My understanding is this is one of the first in-person seminars that people have had here in a while. It's also the first in-person seminar that I've given in quite a long time. So, maybe in my enthusiasm and excitement I'll try to pack too much into this talk. So, I will talk about some thoughts on what kind of computation is cognition. And a lot of this is really designed to raise questions for discussion that I hope we'll have a few minutes for with the larger group, and also with the class afterwards. Okay, so, there are a few questions in this talk. That is the main one. But some questions in the background are these: what would constitute a meaningful answer to this question? And how do we know if it's right, or even on the right track? In the spirit of this seminar, which is an interdisciplinary meeting between cognitive scientists and people interested in metaphysics, I wanted to take this opportunity to raise these questions, which are essentially questions of metaphysics and epistemology about how we model the mind. I think there are really deep issues here to discuss. I'm gonna present one approach. I'm not really gonna talk about these questions directly. I'm gonna present one approach, which I call the reverse-engineering approach. And then I'd be very interested to talk with you about its pros and cons, what makes sense, what doesn't, and what are some other approaches, okay. I also think there are intriguing and deep parallels between the way we think about answering this question and the way we think about actually modeling the mind, and that reflects some of the special relationships between cognitive science and, more generally, philosophy of science and how we think about what makes meaningful scientific questions and models. But that will just be in the background, and maybe we can discuss it afterwards. So, what I mean by reverse engineering is trying to characterize how the human mind works in the same terms that we would use to engineer an intelligent machine, okay. But instead of doing artificial intelligence, we'll be doing, let's call it, natural intelligence, all right. But our goal is to build models that look like AI systems in a sense, okay. To me, a meaningful answer to this question is this: I would like to understand how the mind works in those terms.
And the way we know that we're on the right track, 'cause I think none of these models that we're gonna talk about are right, but hopefully they're, in some ways, on the right track, is the same thing we normally do in science: we compare our models to data. That means we try to capture qualitatively how human behavior works. We want our models to behave like people. But also quantitatively; the models often have a lot of moving parts, which I can only begin to gesture at here for you, but in order to test those in some rigorous way, it often helps to have quantitative data. So, I want to give you a feel for how we do that. But most distinctively, I think, our goal is not just to capture the data from laboratory experiments, which are great and the bread and butter, the gold standard, but they're limited in terms of the scope of what we can do. Our goal is also to build models that actually solve the problems that people do. So, in my work and in a lot of the work I'll talk about, we do things that basically look like AI. They are AI in some form; we implement our systems in robots, or we have them solve AI problems, the same kind of problems that humans face in the real world. And it's essential, we think, to know we're on the right track that the models both give a qualitative and hopefully quantitative account of human behavior, and also solve the problem that humans have to solve, which gives us some reason to think that they might be describing the actual computations in the mind and maybe even the brain, okay. But those are some of the hard problems. What does it really mean to describe the actual computations in the mind or the brain? That, I think, we'll only have time to discuss in the following session, all right. Now, to motivate this kind of approach, I think it's helpful to reflect on the current state of artificial intelligence, or AI. Whether you are in cognitive science or metaphysics or any number of other fields, including any of the humanities, you can't ignore AI. It's everywhere around us, for better and worse, I think. But I think it's also important to reflect on the state of what we mean by AI. The way I like to put it is to say, we have all these AI technologies, increasingly useful and perhaps, or not perhaps, actually increasingly dangerous, but increasingly powerful technologies, but we don't have any real AI, okay, at this point, effectively. What I mean by that is we have these systems that do things we used to think only humans could do, and now machines can do them to some extent, but we don't have anything like the flexible, general-purpose kind of common sense, the general notion of intelligence, that each and every one of you uses to do every one of these things for yourself without having to be programmed by some dedicated team of engineers at a big tech company or a hot startup, okay. You just do these things. And we don't have any machine that can just do all these things for itself. That was the original vision of the founders of the field, right, what's sometimes called artificial general intelligence. One of the most interesting things about cognitive science and the study of human cognition is: how might the mind work, such that we can do all these things and so many other things for ourselves? Now, reflecting on that gap, I think, helps to motivate, again, what are possible ways to answer the question of what kind of computation is cognition, and what I think are some of the more promising ones.
I wanna just dive a little bit deeper into one very useful but very imperfect AI technology, and that's self-driving cars, as they're sometimes called, or autonomous driving. This is an article which I believe is the first major press article on self-driving cars, particularly on the Google effort, which is now something called Waymo as part of the Alphabet corporate enterprise. I'm sure most people are aware of this. There are now many, many car companies, and companies of various sorts, AI companies of all sorts, working on trying to build cars that can drive themselves. And it's probably the closest that we've come to having actual robots with autonomous algorithms that don't have a human in the loop, except when they do, but basically don't have a human in the loop some of the time. And they're out there interacting with people in real-world settings that really have major consequences, including life-and-death consequences, okay. So, where are we in this enterprise? Well, this article was published in the New York Times in 2010, just sort of announcing these first efforts that Google was working on. It's now, well, 11 and a half years later; by some estimates, $100 billion have been invested in the enterprise of building self-driving cars, and we're still not there. This is a recent article from just a few months ago in 2021 about "The Costly Pursuit of Self-Driving Cars" going on and on and on, and what's sometimes called the long tail of problems: all the cases that you didn't really anticipate, that somebody didn't have data to learn from in their machine learning system, that only come up as you make some progress, and then you realize, nope, there are so many more and more problems that we just haven't solved, now 12 years later. Waymo, again, which is the Google or Alphabet company, well, by this recent article in Businessweek, "Is 99% of the Way There, But The Last 1% is the Hardest." I don't think we're 99% of the way there. I think they've been saying that for a while, and I don't think they would even say that. The weird echo thing for me. It's not just Waymo, but a number of companies that have been promising robotaxis. "Where are all the robotaxis we were promised?" Well, yeah, they're not here yet. It turns out that we need some fundamental advances in AI. This is actually quite a good article in terms of diagnosing, I think, some of the things that are missing. And I'm not gonna go into it, but it's gonna parallel what I'm talking about here. Or another recent article from the Wall Street Journal, "Self-Driving Cars Could Be Decades Away, No Matter What Elon Musk Says," because AI will have to get a lot smarter. So again, I don't wanna knock this technology. It's an amazing achievement, okay, that it works at all. And in some sense, it makes a lot of sense that it doesn't fully work yet, and it's hard to even know when it will work. What's going on? Well, there are lots of things that go on in autonomous driving. But there's a basic thing that's been driving a lot of the progress, which also drives most of the other progress in AI technologies, which is a certain way to answer the question of what kind of computation cognition might be. And you could say it's basically, oh, - Just wanna turn your speaker off. - Oh, sorry. - Turn my speaker off. - Yes. - There's no audio? Okay. There we go. Good. - Yeah. Okay. - Okay. Perfect. - Thank you.
- It's a certain kind of machine learning approach, which is often called deep learning and end-to-end neural networks. I'm sure most of you probably do know what neural networks are, and I mean the artificial kind of neural network that is an AI technology. These are functional mappings, functions that map from inputs to outputs. The inputs are usually some kind of sense data, like the sensors of the car, and the outputs are behavior. It might be conceptual behavior, like recognizing a pedestrian or another car, or making a decision, okay. And some complex function that is implemented in the machine has many, many parameters that can be trained. They're called neural networks because each little bit of that function, what's called a unit in a neural network, of which there could be millions or billions, it's often hard to keep track of how many there are in some of these models, is sort of an abstraction of a neuron or a nerve cell in the cortex, shown here from the famous early studies of Hubel and Wiesel, and early mathematical abstractions, what was sometimes called the perceptron, going back to the early 1960s. That work then, over a series of decades, turned into increasingly bigger models that represented bigger, more scalable attempts to basically build these large trainable functional mappings. So for example, in the work of Yann LeCun in the 1980s, this was one of the early deep convolutional neural networks that would, again, take in, for example, a letter or a digit and classify it, giving it a label, A or 3. A little more than 10 years ago, there was a big breakthrough moment of progress, where people started applying these to datasets of natural images and scenes in computer vision, like the famous ImageNet dataset. And this is the famous AlexNet architecture. And you can see the progression from a single nerve-like element, which takes inputs and weights and combines them with a linear sum and then a threshold, as neurons do, to one layer, multi-layer, many, many layered networks. At this point, these systems get deployed in all sorts of ways, including, for example, inside the Tesla AI system, not the current state of the art, but a couple of generations ago. But basically, the same kind of technology is used to detect all these things out there on the roads. This is a slide from some work of one of my MIT colleagues, Jim DiCarlo, and his collaborator Dan Yamins and others, where they've taken these models, which were originally abstracted from, not just how neurons in the brain work, but neurons in visual cortex, the part of the brain that actually is the front end to object recognition. And then they've gone back, and many others have done this, but their work is especially well known, and taken the same kind of models that have been used in computer vision and shown that they can be used to make pretty good quantitative predictions about the responses of neurons to images. So, it's a compelling story, in that these models have been used for practical applications. And though I'm not focusing here on human behavior, these models can, to some extent, capture some aspects of human behavior. They can also capture the behavior of neurons. And yet, they're also missing a huge amount, okay. And this is the gap that I really wanna focus on, that motivates the work that we're doing.
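To make the idea of a neural network as one big trainable input-to-output function a bit more concrete, here is a minimal sketch in Python/NumPy. It is not AlexNet, LeCun's convolutional networks, or anything in the Tesla or DiCarlo/Yamins pipelines mentioned above, just a toy one-hidden-layer network trained by gradient descent on made-up two-dimensional "sense data"; every number and name in it is illustrative.

```python
import numpy as np

# Toy "pattern recognition as function approximation": a one-hidden-layer
# network trained to map 2-D inputs to a binary label (e.g. "pedestrian or not").
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))               # toy sense data
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # toy labels

W1 = rng.normal(scale=0.5, size=(2, 8))     # input -> hidden weights
b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1))     # hidden -> output weights
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(2000):
    # Forward pass: each "unit" sums weighted inputs and applies a nonlinearity.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2).ravel()
    # Backward pass: gradients of the cross-entropy loss, used to adjust parameters.
    grad_out = (p - y)[:, None] / len(X)
    dW2 = h.T @ grad_out
    db2 = grad_out.sum(axis=0)
    dh = grad_out @ W2.T * (1 - h ** 2)
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

accuracy = ((p > 0.5) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

The point is only that everything here is one differentiable function with adjustable parameters; that is the paradigm the rest of the talk argues we have to go beyond.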
While some part of intelligence is probably about training functions to approximate some classifier that can recognize a pattern, like a pedestrian in an image, so much of intelligence goes beyond that: all these activities which I sort of lump under modeling the world. So, not just finding patterns in pixels, but being able to actually explain and understand what we see, or to be able to imagine things that we could see, maybe things that nobody's ever seen, and then think about what it would be like if the world had those things in it. Maybe set those as goals that we could make plans to achieve, and then solve the problems that come up along the way. This is intelligence, right. And then learning is actually building these models. So, that might mean adjusting or refining a preexisting model, or constructing whole new models, maybe by combining pieces of other models, or maybe even synthesizing a model from scratch in some ways. It's all these activities that, I think, many cognitive scientists are interested in. And in our work, we're trying to capture these things in computational terms. So the question for me here is, what computations can, in a formal way and in a quantitatively testable way, express these human capacities for modeling the world, and not just finding patterns in data, right. Now, we're far from being able to do these things at the scale that Silicon Valley is ready to invest in. Maybe they would, if they knew more about what we were doing, but it's much earlier in the, let's call it, the scaling curve, all right. What we've really been focused on in our work over the last few years is the earliest, most basic sense of how we model the world, which I like to call the common sense core. It's heavily influenced by a program of research that's sometimes referred to as core knowledge in human cognitive development. Liz Spelke, who is one of the leading developers of that whole research program, probably the world's leading expert in it, is gonna be one of your speakers, actually the last speaker. So, you'll see a lot of connection between what I'm talking about today and her work. She's a friend and collaborator also. And I think when I talk about the common sense core here, it's also deeply connected to the basic topics in metaphysics that I think the class is studying. It's this idea that Spelke and many others have developed in developmental psychology, as well as people studying the human brain and some amount of comparative work with other non-human species: that from the very beginning, in the earliest and youngest infants that we can study, even say three-month-old babies, and in ways that are to some extent, but probably not completely, shared with the brains of other non-human species, we are built to understand the world not just in terms of patterns in pixels, let's say, but in terms of basic kinds of concepts. So, we often talk about an intuitive physics and an intuitive psychology, or objects, actual physical things in the world, like this thing I'm picking up, which, even if you'd never seen a cell phone or an iPhone before, you know there's a thing I'm picking up, right. And you know that if I were to let go of it here, it would fall. Maybe there would be a sound. Maybe worse. I'm not gonna do that. But okay, that's an example of an understanding of physical objects and some of the intuitive physics that humans have, even at a very early age.
We talk about intuitive psychology, and there we mean agents, which could be humans or could be other animals, or maybe a robot someday, or a self-driving car, but an entity that has goals and some model of the world itself and acts in some way to achieve its goals, given its model, okay. And the idea that we understand the world in terms of physical objects and goal-driven intentional agents and their causal interactions is the heart of our common sense. And what I wanna talk about is how we can try to capture that set of concepts, those cognitive systems, in computational terms. I'm not gonna talk too much about the brain side of this, but there'll be maybe a little bit depending on time, or we can talk about it. So I'm mostly gonna be focusing on the cognitive aspect. But it's very interesting that these systems also seem to correspond to some of the large-scale architecture of the brain, okay. And I'm happy to talk about that and what the metaphysics of that might be telling us as well. The work here, for people who are familiar with the different subfields of cognitive science, one of the reasons why I like it is that it's really at the intersection of the traditional subdisciplines: perception, language, action planning. This is the basic stuff that brings all those together: the targets of perception, the substrate of action planning, and also the substrate of what we talk about, at least at the very beginning, when we're first learning language, right. We human adults have the ability to do what I'm doing right now and what you're doing, which is communicate, talk, and understand about things that are not part of our basic understanding of objects like this and agents that pursue goals in the world, okay. We can talk about talking about that, talk about modeling that, as well as quantum mechanics and the origin of the universe and the origin of life and metaphysics and so many other topics. But in some sense, this is where it starts, okay. So to get a little bit more concrete, by intuitive physics in young children, I mean the kind of thing that, say, this baby is doing here, this one-and-a-half-year-old, stacking up cups or playing with their toys. Now, we have robots that can do things like pick up objects and do simple kinds of stacking operations. But we don't have any robot that can do what this kid is doing, including the physical dexterity, the ability to solve the little problems that come up along the way when you're trying to achieve your goal. A little bit of debugging going on here. But even just to make the plan, which you can see that this kid is doing, right. Even if you haven't seen this video before, you can see the stack of three cups, and you can kind of guess where this is gonna go if I jump forward in the video, right. To conceive of the goal, to make the plan. You know, if we had robots that could do this, it would be quite amazing, all right. By intuitive psychology, I'll show one very famous video from a very famous study by Felix Warneken and Michael Tomasello. This is also with a one-and-a-half-year-old. But here the one-and-a-half-year-old is this little kid in the back, who's a participant in the experiment. So, he is watching an adult, that's Felix, do something that he's never really seen before. I mean, this is not a crazy action, but it's a little bit unfamiliar if you haven't seen this video. And now, watch what happens when the adult stops moving: the kid goes over, opens the door, yeah. (audience laughing) Yeah.
So you guys are totally with me, 'cause you smiled and laughed and even awed at all the best parts, right? Somehow that kid is able to figure out what this guy's doing, for an action that is not quite like anything he's seen before, and then even figure out how to help him, right. And you can see that helping goes on when he goes over, opens the door, and in some sense, the best part, both emotionally and I think scientifically, is when he steps back and looks up and makes eye contact with him, right, and then kind of looks down at his hands. Because what you're seeing when you see that, right, is at least a sign suggesting that the kid has understood what the adult is trying to do and is maybe almost even signaling, like, I think I've understood what you've done, and so I think I know what you're gonna do, which is now do something with your hands, okay. So again, we don't have robots that can do anything like that. But imagine if we did; they would be very helpful around the house, as well as in other situations. And it's not my goal to build those robots, particularly, although I work with colleagues who do, but the goal for me is to try to understand, in those kinds of engineering terms, what's going on inside that kid's head. Now, I'm gonna show one more video, which I know some of you have seen. It's one of the most famous videos in cognitive science. And I think it's not unreasonable to call this the most important two minutes in cognitive science. Brian might have other things, like maybe one of his experiments, but (laughing) this is the famous Heider and Simmel movie. I'm not even gonna show all one and a half or two minutes of it, but just a brief excerpt. But again, I'm sure many of you have seen this, but it really is like, if you wanted one short video that makes the questions of metaphysics and cognitive science compelling, as well as the computational challenges, this would be it. This is from a study in the 1940s in which people were just shown this video of some shapes, just moving around on a tabletop. And yet what they see is much more than just simple shapes moving in two dimensions, right. You see what looks like an interaction between agents, one that is also sitting on top of physics, one that is maybe a little bit not super positive, right, at this point. Maybe there would be some scary music right now, if this weren't a silent film. Yeah. Yes. What's gonna happen? All right. So there's a little bit more, but we'll stop there, yeah. So, it ends at least somewhat happily for several of the characters, okay. So what's going on here, right? I mean, again, you can describe this in geometric terms, but what we actually see is all these other things. We see physical objects as well as constraints, like what is solid, not penetrable, what's fixed, what's movable, that when the door locks you see it sort of get suddenly attached. You see events and causal interactions like collisions, one thing hits another and it moves, but also pushing and shoving. And that gives you an understanding in terms of agents that might have goals towards each other, trying to hurt or help each other, when one's trying to escape or trap another one. You see relationships, like you see those two as being kind of on the same side, they're friends or something, and the other guy's kind of an enemy. You experience, or you know, maybe your own emotions, but you experience them experiencing emotions, right, and probably make moral judgments as well.
So, in a sense, the full challenge of trying to understand common sense in computational terms is: could we build an algorithm that could look at a video like this, or any movie of any group of people interacting, and make sense of it in all these terms? Now, again, we're far from that, but that's the goal, that's what we're aiming towards. And so, what I'm gonna talk about today, primarily, is how we can represent these things computationally and a little bit about how we study them quantitatively. And then at least gesture at these questions of how they might be instantiated in the brain, which maybe gives a different perspective on what it might mean for the mind to really work this way, if its circuitry in some sense implements these computations, and then say a little bit, maybe, about learning, both how we learn, let's say, our intuitive physics and also how learning can take us beyond these things. So, the key computational ideas here, and I'm just gonna try to be mindful of the time and make sure that I get to at least enough of the content so we can have a reasonable discussion. And there are no clocks, so I'm gonna just keep referring to my watch here. But if I lose track of it, just let me know it's time to stop, basically, okay. So, to answer the question, what kind of computation is cognition? It's not just neural networks for pattern recognition and function approximation. Well, here's one proposal for some of the other things we need. In our work, we often use the phrase probabilistic programs or probabilistic programming, which is kind of one of these jargon terms, like neural networks, okay. It has something to do with probabilities and programming, just like neural networks have something to do with neurons. But what it really refers to is a whole, you could call it a computational paradigm. There's math, there are programming languages, there are systems and platforms that I know some of you are familiar with, some of you even use in your research. But you can think of it as ways of realizing, in practical computational systems, a synthesis of several good ideas, or several paradigms, or broad ways of thinking about cognition in computational terms. One is the neural network pattern recognition idea. Another is what is the oldest and arguably most important way to think about intelligence in computational terms, which is the idea of symbol manipulation, or having symbolic languages for abstract knowledge representation and reasoning. In the early days of AI, as well as in cognitive science, that's what everybody thought about. And as these fields have had their successes and their failures and gone up and down, often the so-called symbolic approach to AI has gotten kind of a bad rep, basically. People have said, "Well, early promises of AI didn't work because everybody thought we should be using symbols, and then we realized we had to use neural networks." But that is, let's call it, fake news, okay. That's the nice word for it. Because if you had to nominate one of these ideas to be the most important one, that's the most important. If we didn't have symbolic languages, we wouldn't have natural language, we wouldn't have mathematics, we wouldn't have all of computing, we wouldn't have programming languages, whether it's Lisp or C or Python or modern programming languages for deep learning, like TensorFlow or PyTorch; we wouldn't have anything, basically.
So, that's an absolutely central idea. And we need ways of thinking about common sense knowledge that integrate learning and symbols, okay. And then there's a third idea, which is the one, I guess, my work has been most associated with, and one that, when I was in grad school and in earlier times of my career, was kind of inarguably the dominant idea. Each of these ideas has had its moments in the sun as well as in the shade. But this is the idea of probabilistic inference, or Bayesian inference, by which we often mean inverting conditional probabilities, specifically in a causal setting where we have models of how some things are caused by other things, let's say effects are caused by underlying latent causes in the world. We observe the effects and wanna work backwards to make good guesses about the things that caused them, okay. And that idea is absolutely central, we think, in cognitive science for understanding how we make sense of sparse, uncertain, incomplete, and otherwise ambiguous patterns of data, and whether we're talking about perception or language or learning, you're always in that setting. And probabilistic programs, and the discipline of probabilistic programming and probabilistic programming languages, basically let us bring these ideas together to define models, well, to define scientific models, which are models of the models inside your head for these common sense domains, intuitive physics or intuitive psychology, using symbolic languages that support causal models, and then probabilistic or Bayesian inference, so that we can, in a sense, run these programs backwards, to infer the underlying things in the world from the effects that we observe, or the effects that those things in the world cause on our senses, all right. And to use neural networks or other machine learning tools to amplify and extend what we can do with probabilistic inference over these symbolic programs. If you want to learn more about this approach beyond what I can talk about in just high-level terms in this talk, I would encourage you to check out any of these various probabilistic programming languages or this web book, probmods.org, which was written by Noah Goodman and a number of other colleagues. Noah is a professor at Stanford. We've worked together for a long time. I co-wrote the first draft of this with Noah, but it's gone through many iterations since then. A lot of other people have contributed, and Noah's been the one mostly carrying it forward. But it's a nice introduction to probabilistic programming and cognitive models based on probabilistic programs, okay.
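As a minimal illustration of the "define a causal generative program, then run it backwards" idea behind probabilistic programming: probmods.org develops this in the WebPPL language, so the sketch below is only a crude analogue in plain Python, using rejection sampling over an invented rain/sprinkler scenario; all of the probabilities are made up for illustration.

```python
import random

def flip(p):
    return random.random() < p

# A tiny generative ("causal") program: hidden causes produce noisy effects.
def generative_model():
    raining = flip(0.3)                                   # latent cause
    sprinkler = flip(0.2)                                 # another latent cause
    grass_wet = flip(0.95) if (raining or sprinkler) else flip(0.05)
    sidewalk_wet = flip(0.9) if raining else flip(0.05)
    return raining, grass_wet, sidewalk_wet

# Bayesian inference as "running the program backwards": condition on the
# observed effects and see what that implies about the hidden cause.
def p_raining_given(obs_grass, obs_sidewalk, n=50_000):
    kept = []
    for _ in range(n):
        raining, grass_wet, sidewalk_wet = generative_model()
        if grass_wet == obs_grass and sidewalk_wet == obs_sidewalk:
            kept.append(raining)        # rejection sampling: keep consistent runs
    return sum(kept) / len(kept)

print("P(raining | wet grass, dry sidewalk) ~", round(p_raining_given(True, False), 2))
print("P(raining | wet grass, wet sidewalk) ~", round(p_raining_given(True, True), 2))
```

Conditioning on different observed effects changes the inferred probability of the hidden cause; that is the "working backwards from effects to causes" move described above, just in miniature.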
There's another important idea, another kind of computational tool for capturing common sense, which is the idea I describe with the slogan, "The game engine in your head." And this is the idea that we can use tools from game engines, which, again, probably many of you are familiar with, some of you probably even program with them. These are tools that were developed in the video game industry to allow somebody or a team to create a new video game much more easily than they would otherwise, in particular, often to create games that have some rich immersive experience for a player in some three-dimensional world, maybe it's outer space or under the sea or the wild west or dinosaurs, or something that has never existed except in the mind of the game designers, but could exist in some possible world. The point is to create a rich, immersive, interactive experience without having to write everything from scratch. So, without having to write all of computer graphics, but still make the world look really good, often nearly photorealistic, and respond to what the player does. So, as the player moves around the world, the images on the screen change in the appropriate way, so you feel like you're moving through this world, or as you interact with the world, it reacts accordingly. So if I pick up an object and drop it, it has to fall. If it's glass or something fragile, it might break, okay. So, these game engines often have graphics engines and physics engines, as well as what's sometimes called AI engines or game AI, by which they mean tools for simulating, it's not really AI, but tools for simulating the non-player characters in the game, so they react in some vaguely intelligent way rather than just a robotic way. For example, a guard at a base, instead of just shooting randomly into the air when the player is trying to invade, might see you and go after you, and you have to hide so they don't see you, for example, okay. Here's an illustration of a game physics engine. We use these a lot in our work. They, in some sense, just implement physics, right, whether it's Newtonian mechanics or other kinds of physics, fluid mechanics, soft-body cloth physics. But they do it in a way that is designed not to actually be what a physicist wants, in a sense, maybe, but to just look good, look good enough, to be very general, to be able to model all these different kinds of things, including simple systems of a few balls bouncing around and much more complex systems like hundreds or thousands of blocks, as in this wrecking ball case, and other very complex non-rigid or non-solid materials. And to look reasonable in what they do going forward, just on the scale of maybe one or a couple of seconds, all right. And to be able to run in real time, so it can be interactive with a player, or maybe even faster than real time, okay. Now, these tools have increasingly been used in many areas of AI as a training ground for an AI algorithm. So, a typical thing that somebody might do in a machine learning approach to AI is to have a reinforcement learning agent that is deployed into one of these simulators, does some things, and learns some input-output mapping, some policy as it's called, a mapping from sense data to behavior. And then maybe you'll deploy it in the real world, and it has what's called a sim-to-real problem: it has to figure out how to go from its training ground in the simulator to acting okay in the real world. That's a lot of what is going on in self-driving cars, the sim-to-real challenge, and the fact that people are trying to create in the simulator every possible thing that could happen in the real world, and that's not possible, okay. But the reason why we call this "the game engine in the head" is that the idea is that the game engine is a model of the model inside the head, okay. Not the training ground for a learning algorithm, but a model of the mental models that, when say a kid like this one here sees this stack of blocks and the toy bird on top and has the ball in his hand, let him imagine: well, what would happen if I roll the ball forward? Oh, maybe that would happen. Maybe it would knock over the blocks. Maybe the bird would fall.
I'm not sure. But imagining what might happen then allows me to decide if that's what I wanna do, or maybe it's not what I wanna do, or maybe I should do something differently, and so on, okay. So, the kind of models that we build are, we take these kinds of simulation programs, for say physics, or analogous ones effectively for agents, and we wrap them inside frameworks for probabilistic inference so that we can, sorry that the Zoom here is cutting off the bottom of the slides, but so, for example, we can build models like what I'm sketching here. These are sketches of the models we build, not of the physical world, but of the model inside the kid's head for the physical world, or the model inside that kid's head for what somebody is doing, okay. So, models that might take an image or a sequence of images and make an inference to what's the underlying state of the world and its dynamics, which I could then run forward to predict what I might see at future times, maybe conditioned on my own physical actions. Or this kind of thing here, which is not meant to be a picture of how your brain works, but how your brain thinks brains work, or what we sometimes call theory of mind, right. It's a standard sort of picture of an intuitive model of agents' minds, where agents have some kind of goals or desires, they have some beliefs about the state of the world and their state in it, which is a function of their perception system, they make plans to come up with actions that are reasonable, efficient ways to achieve their desires given their beliefs, and then they exert actions on the world, which change the state of the world, or change their own state, and so on. And so this is just the way that we are used to talking about how our minds work. That is the sort of standard model in cognitive science of how even a young child might understand somebody else's actions. And what might go on in a scene like that is observing actions, given also your observations of the world state and the agent state, and trying to work backwards and fill in these things, to infer the underlying mental states, or beliefs and desires, of an agent in order to make sense of their behavior. So our job here is to take these sketches and turn them into working, quantitatively testable computational models. So for example, in intuitive physics, this is how we've done this. And this is work that, in our lab, we started doing together with Pete Battaglia and Jess Hamrick, and their names are on these slides. And there's a lot of other work; especially Kevin Smith and Tomer Ullman did key things in this research program. But I'm just gonna tell you about one or two studies here that go back to the earlier work of Battaglia and Hamrick. I should say also, some key work was done by one of our colleagues here, Ilker Yildirim, who used to be in our lab and is now a professor in Psychology and other fields here at Yale, and continues to do really exciting work that relates to intuitive physics and perception. So, if you're interested in this topic, Ilker is one of the best people in the world who does it, and I highly recommend you check out some of his work. But here's one of the very first models we built, more than 10 years ago at this point. The idea is, sorry, I skipped over this, we give people these scenes of blocks; they're kind of simulations based on the game Jenga, which you're probably familiar with, but they're colored in this way.
And with these different stacks of blocks, some of them will look to you like they are stable, others will look like they should be falling over. We might have to make a judgment of, will it fall over or not? Or you can make a graded judgment, like on a scale of one to seven, and you can do it either fast or slow. It actually doesn't make that much difference in our experiments. Actually, in Brian's lab with Chaz Firestone, they've done versions of this where people respond extremely quickly to a brief presentation, and you get pretty similar results whether people are making very fast perceptual judgments or slower, more considered judgments. People are pretty good at this problem of being able to judge whether a stack of blocks is gonna fall over, or, as I'll show you, even to judge in a graded way how unstable it is. What it means to be pretty good is actually something that I'm gonna try to be more precise about. But the way we capture what's going on here is we say, you observe the image and somehow you have to work backwards to the underlying world state, which is the three-dimensional configuration of these blocks, their geometry, and whatever is enough about the physics, like mass and friction, the basic parameters of physics that the physics engine needs for its simulation, okay. And we think of perception as basically the inverse arrow to this one. So, given an image or a sequence of images, we wanna work backwards. This is sometimes called inverse graphics: figuring out what was the input to the graphics program, that is, the thing that would render the image from the underlying world state, okay. And I'm not gonna focus in this talk, or really tell you at all, how we do that. This is actually something that appropriately trained neural networks can be good at, learning this inverse mapping. You can also use other kinds of approximate Bayesian computation, more sort of top-down, guess-and-check Monte Carlo algorithms, if you're familiar with that. And again, Ilker actually has explored a number of those things in his work, very interestingly. Here, we're just gonna assume that somehow, given an image, you get a reasonable guess, not perfect, but a reasonable guess, of the positions of these three-dimensional blocks in the world and their sizes and shapes. And that's also the state of this physics simulator, where now you can run your simulation forward, your fast and rough-and-ready approximation to Newtonian mechanics, run it forward a few time steps and see what happens. And if you take this rendering here, which is not exactly the positions of the blocks shown here, but pretty close, and you run that forward, well, this is what you get after a few time steps. Think of that as one what's called posterior predictive sample in this probabilistic, approximate, simulation-based intuitive physics model. That's a lot of buzzwords, but hopefully you've seen enough of what I've been talking about to get some sense of what those words are getting at, all right. Here's another sample. I'll just flash back and forth between them. The difference is just the initial guess of where the blocks are. This is a less good guess. If you look at this image down here, hopefully you'd agree this isn't a crazy guess compared to lots of other positions of blocks, but it's a less good guess, all right? And we assume that our perceptual system is just giving us some approximation to the true three-dimensional scene structure.
But whether you made this guess or this guess, when you run it forward in your physics simulator, basically the same thing happens. In the fine-scale detail, very different things happen, okay; the final configuration of the blocks here is quite different from the one here. But intuitive physics doesn't care about that. What intuitive physics cares about is just that most of the blocks fell, okay. So, that's the basis of this model. You run a couple of those simulations, a relatively small number, maybe three, five, seven, it depends. We can try to measure this quantitatively; I'm not gonna show you how, but more than one, and not very many. And that yields a fit to behavioral data that looks like this here. Ignore the thing on the right for now, just look at the plot on the left. What that scatter plot is showing is, the vertical axis is plotting the average human judgment on a scale of one to seven, where the high end means people think it's very unstable, those blocks are gonna all fall over, and the low end means very stable, not gonna move, okay. Each plus represents one stimulus, or one block tower scene, such as the three shown here, but in this experiment there were 60, okay. The error bars are, I think, 95% confidence intervals, both for the human judgments and for the model. So, the model is shown on the horizontal axis, and that's from running a small number of the simulations, like I showed you, and just computing the average or expected number of blocks that fell. And what you can see is it gives a pretty good fit, okay. This is, when we say people are pretty good, that's what we mean here: they're pretty well captured by this model. But there's another sense in which they're not very good, and that's what's shown over here. This is, in a sense, the more correct model. This is the same physics simulator, but it doesn't have any uncertainty, as the first model does, in where the blocks are. So, this is what you would get if you could perfectly localize the position of every block, all right. That's why there are no error bars on this side, right. The vertical numbers and bars are the same, that's the human judgments, but we've re-plotted the data with a different x axis, showing the actual number of blocks that falls in each case when you run the simulation with the ground truth physics and the ground truth correct object positions. And it's a less good model of people, in the sense that the correlation, if you were to measure it, is about 0.9 on the left, which explains about 80% of the variance in the human data, whereas this is more of a 0.6 correlation, which is about 36% of the variance, right. So, it's a much less good model. But it also shows that in some sense, people aren't that good. If you judge by the actual correct answer that you would want on your physics exam, it would be this one, okay. But what I would argue is that actually, this is the more useful one. If you want to build a robot that's actually gonna do this in the real world, where there's always gonna be some uncertainty in its perceptual system, you want it to be robust to uncertainty. And in a sense, what you're seeing here is that this model, I would argue, is just not robust to uncertainty. In a sense you could say, and I'm sure you're familiar with visual illusions and what they tell us about perception, this model shows that people suffer from what you could call a stability illusion.
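The actual Battaglia and Hamrick models run a full 3-D game physics engine; as a stand-in for the contrast in these scatter plots, here is a deliberately cartoonish sketch: a one-dimensional "tower" with a toy stability rule, judged once with the true block positions (the deterministic, ground-truth model) and once by averaging a handful of simulations run from noisy guesses of the positions (the probabilistic model). The tower, the stability rule, and the noise level are all invented for illustration.

```python
import random

def blocks_fallen(centers):
    """Count how many blocks topple in a 1-D toy tower.

    centers[i] is the horizontal center of block i (all blocks have width 1,
    block 0 rests on the ground).  The stack above block k topples if the
    center of mass of blocks k+1..end overhangs block k by more than half a
    block width.  This is a cartoon of what a physics engine would compute.
    """
    n = len(centers)
    for k in range(n - 1):
        above = centers[k + 1:]
        com = sum(above) / len(above)
        if abs(com - centers[k]) > 0.5:
            return n - (k + 1)          # everything above the weak joint falls
    return 0

def intuitive_judgment(centers, position_noise=0.2, n_sims=5):
    """Probabilistic model: average a few simulations from noisy perceived positions."""
    total = 0
    for _ in range(n_sims):
        perceived = [c + random.gauss(0, position_noise) for c in centers]
        total += blocks_fallen(perceived)
    return total / n_sims

tower = [0.0, 0.35, 0.1, 0.45]          # objectively stable, but looks precarious
print("ground truth blocks fallen :", blocks_fallen(tower))
print("noisy-simulation judgment  :", intuitive_judgment(tower))
```

With the position noise turned up, even a tower that is objectively stable gets a nonzero expected number of fallen blocks, which is the toy analogue of the stability illusion just described.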
People see stimuli like this red dot, which corresponds to this red tower here, which I'm sure most of you think is unstable and should be falling over. But in fact, it's actually perfectly stable. That's why, in the ground truth model, it shows up as zero here. And there are many other stimuli, to some extent like this, where in the ground truth physics model they're either perfectly stable or maybe just one block falls over, but people think they're much more unstable, so the judgments are much higher. What we show in this work, right, is that that actually is captured pretty well by this probabilistic physics simulation. It's just that the deterministic one doesn't capture it. So, glass half full or half empty: you can say, well, this just shows the limitations in our intuitive physics, the ways in which we're not that accurate. Or you could say, well, this shows the way that our system is designed, in its inherent computational architecture, to be robust to inevitable uncertainties, okay. So, this is just a little mini microcosm of a much bigger pattern of experimental things that we've studied in our lab, and many others have studied at this point. I'm not gonna go into much more detail. But we can ask many other questions of the same model. Which way will the blocks fall? How far will they fall? What happens if one color is much heavier than the other? Notice how these two towers here have the same geometry, but they get colored differently and you make different predictions. By seeing towers here that maybe look like they should be falling, but they aren't, you can make judgments about how heavy or light some object might be. And we can do many other studies like this. I'll show you, since I'm just checking the time and I don't wanna run over, just one other kind of application of this, which I think has interesting connections to the metaphysics side of things that I think you've studied, which is work from Toby Gerstenberg and colleagues. Toby was a postdoc in our lab for a number of years. He's now an Assistant Professor at Stanford in Psychology. And he's looked systematically at how people make judgments about causal responsibility, looking at various kinds of events, especially dynamic events like these, literally looking at how people look at them. What you can see here is that blue dot, which you might recognize is an eye tracker. So it's a trace of where somebody looked as they were watching these movies. And a typical kind of Toby experiment is what's illustrated here, where people are asked in some kinds of trials to make a prediction. They see, this is like a billiard ball scene, I hope you can see. So this is a dynamic intuitive physics setting. And you might make a prediction of what's going to happen when A hits B. Will it go in the hole or not? Oh yeah, it just barely went in, okay. In other trials, like what's here, the question is not to make a prediction, but more of an explanation, or to say, well, in this scene, did A cause B to go in the hole, right? So, watch it and ask yourself that question. So, did A cause B to go in the hole? All right. Well, what Toby found, and the reason why I'm showing this here, is that you can see, just in how people look at these scenes, depending on the question they're asking, they look in a very different way. Look on the left here.
When they're making a prediction, okay, they first look at where these things are gonna collide and then they just extrapolate forward where B is going, okay. But when they're doing a causal judgment, so they're basically trying to judge in a graded way how responsible A was for making B go in, look at where they're looking. They're not just looking at where B is going to go, but where B would have gone if A hadn't been there. Do you see how they are trying to extrapolate B's motion? And that's because, well, there's a long tradition of work, some of which is actually the core work that Laurie first became known for in metaphysics, looking at the role of counterfactual analysis in causal relations, basically. And I know this is something you guys have talked about in the class, so I'm not gonna try to retell it in a limited way. But it's quite striking that you can see, in how people look at a scene, that they're doing this kind of counterfactual analysis, in dramatically different proportions, if and only if they're making a causal judgment. Toby can also model this with his probabilistic physics simulation. And this is just a sketch of how this works. It's very much like the block tower model that I showed, okay, except that what the system has to do is make guesses about the counterfactual: what would've happened if A hadn't been in the scene? You don't actually know, right. You can see those eye movements; in a sense, those are the guesses realized right there on the screen in front of you. They're not perfect. They're sort of noisy extrapolations. And the model, though you can't read it down here, is basically doing similar simulations with noisy estimates of the balls' velocities and positions, okay. So, the same idea of a noisy or probabilistic approximate physics simulation can be used to capture how people predict what's gonna happen, and also the counterfactual probabilities they have to compute in order to make a causal responsibility judgment. And I won't go into the details of how you study this experimentally. But Toby has done really wonderful experiments where he manipulates how close or far the counterfactual is, and how it separates from the actual one, and gets very beautiful fits to data. It's an advertisement for, basically, really elegant work that shows the quantitative power of these models for capturing a sense of causality and causal responsibility, and not just predicting what's gonna happen next, okay.
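As a cartoon of this counterfactual simulation idea (not Gerstenberg's actual model, which runs noisy simulations in a 2-D billiards physics engine), here is a sketch: responsibility for "B went in the hole" is read off by comparing noisy simulations of what actually happened against noisy simulations of the imagined world in which A is removed. The one-dimensional setup and all the numbers are invented.

```python
import random

HOLE = (9.0, 11.0)       # the pocket: B "scores" if it ends up in this interval

def simulate_B(collision_boost, velocity_noise=0.5):
    """Toy 1-D rollout: B starts at 0 with some base speed; if A is present,
    the collision adds a boost.  The noise stands in for the uncertainty in
    people's mental simulations."""
    v = random.gauss(4.0 + collision_boost, velocity_noise)
    final_x = v * 2.0                     # two "seconds" of frictionless motion
    return HOLE[0] <= final_x <= HOLE[1]

def p_score(collision_boost, n=10_000):
    return sum(simulate_B(collision_boost) for _ in range(n)) / n

p_actual         = p_score(collision_boost=1.0)   # world as observed: A hit B
p_counterfactual = p_score(collision_boost=0.0)   # imagined world: A removed

print("P(B scores | what actually happened) ~", round(p_actual, 2))
print("P(B scores | A had not been there)   ~", round(p_counterfactual, 2))
print("causal responsibility of A           ~", round(p_actual - p_counterfactual, 2))
```

The difference between the two probabilities plays the role of the graded causal responsibility judgment: the more A made a difference to whether B went in, the more responsible it is judged to be.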
One more advertisement, and I'll try to keep this even shorter, but this is recent work from Kelsey Allen and Kevin Smith, who I mentioned, where they're adding in the next step, which is, how do you use probabilistic intuitive physics to solve problems, okay? In this case, thinking about the human ability to make and use tools in novel, creative ways, right? Again, when we think about cognition, we think about creative problem solving as a core aspect of where our general-purpose intelligence comes in. Many people who study the evolution of human intelligence focus on the human ability to not just make tools, but to find and repurpose things, like this or this rock, or think of all the things I could use this phone for that aren't just making a call, right? Your phone's battery is dead. Is it useless? Well, it depends, right. Maybe unstick this or, you know, well, I won't go into it, okay. In this paper, which was published back in 2020, feels like just yesterday, like a lot of things in 2020, in PNAS, they had people playing this really cool virtual tools game, where they basically have to solve these problems. The problem is always: get the red ball into the green bowl. And there are many different things that happen in the different levels of the game. It's kind of inspired, speaking of phones, by these touch-physics phone games, which are a popular pastime. And what they show is that, basically, a probabilistic simulation-based model provides a good account of the internal process, we posit, of people trying out ways they could solve the problem by picking a tool and thinking about where they might place it in the scene. It can capture both where people choose to drop objects as well as the learning dynamics. And what this graph is showing is, there's sort of this rapid trial-and-error kind of learning here. Sometimes you might see people talk about reinforcement learning algorithms in AI and say, well, they learn like people or animals, by trial and error. But unlike what's typically called reinforcement learning in AI, people don't learn from thousands or millions of training examples. They do do trial-and-error learning, as we all often do when we're trying to solve problems, but the real trial-and-error learning is something that unfolds over like five or 10 trials. And that's what we see in this virtual tools game, as well as in the model, okay. So, I guess I don't have time to talk about intuitive psychology. But I do wanna advertise two things. One is, if you're interested in how these kinds of ideas play out in understanding intuitive psychology, you're again very lucky here at Yale to have one of the world's experts in Julian Jara-Ettinger and his research group. They do many things, but among other things, they have taken the kind of models that I've described here and shown all sorts of really cool things to do with them. And I'm not gonna go into the details of those. But I do want to show you the following work. And again, I guess we could just call it mostly an advertisement. But since I bothered to show you Heider and Simmel, I wanna emphasize the work that is completely hidden on the bottom because of the Zoom thing. I'm gonna just move that for this purpose here. It's been a dream of many cognitive scientists, and certainly of mine, to be able to really take that Heider and Simmel video and basically build models that can see everything that people see there. Now, we aren't there yet, okay. But maybe we're like 99% of the way. No, not even close. But maybe we're like 40% of the way. So, this is really exciting work by Aviv Netanyahu and Tianmin Shu, they're the co-first authors, which they published a version of at a recent AI conference. But honestly, it's even more exciting as cognitive science than as AI. It's this PHASE paper. And what they've done is they basically built this little domain that they call flatland, where you can see these agents here that are interacting with physical objects. They can pick up things, exert forces, throw them. And it allows us to capture, in a controlled, quantitatively studiable way, many of the things that are going on, not just when you see a single agent try to pursue a goal, but when you have multiple agents interacting, like in the Heider and Simmel video.
So in this setup, some agents are stronger, some are weaker. Agents can have different goals. An agent's goal, say the red one here, could be to get to the gold thing, or the agent could just have a goal to get to another entity. The agent here could have a goal to get the blue ball to the red space. These are possible goals that the agents could have. They also have relationships and social goals. So, agents could be helpful to each other, as you can see here, where it looks like the red agent is trying to help the green agent get the blue balls somewhere. They could be adversarial, like the way they seem to be fighting over where this ball is to go. Or they could just be independent, like the green one wants to get that ball up to the green space and the red one just wants to go down there. So, these are all possibilities. The cool thing is, these weren't made by humans. These were made by our models. So, we have a model which is a probabilistic generative model of these multi-agent interactions. It sits on top of a physics engine, 'cause you can see that these agents are interacting with physics. And each agent has its own goal and a representation of its social relationship and the other agent's goal. And then they do a fairly complex planning process to generate the sequence of behavior. So, the probabilistic program goes from these underlying physics and social variables and produces the movies. But then you can run it in reverse. It's not easy, but with the right kinds of inference algorithms, you can also see the movie and work backwards to infer what those goals are, both the individual ones and the social ones, and then also run it forward again and predict what's gonna happen next. And so, you can see in this world, with this generative model, it can produce quite interesting behavior. So for example, here, you have an agent that, well, notice what happens here. You can just see what happens. The green one goes and steals the blue one, and then the red one goes and fights him for it, okay, and successfully gets it back, okay. So there, it's an adversarial interaction, but because the red one didn't see the green one initially, he kind of left the blue one unprotected, okay. Notice this scene. The only difference is that the red one now can see the green one. So, he doesn't go and leave it unprotected. He stays there to protect it. The only difference is the partial perceptual observability that this system gets. Or you can try this. It's a little sort of Turing test here. So on the top, you see two adversarial interactions. Let me move this again. One is generated by the machine, another is actually generated by two people playing the game. Can you tell which one is the human and which is the machine? Let me give you another thing here. So, on the bottom, this is now a helpful interaction, where the green one is trying to help the red one, oh sorry, the red, yeah. They're trying to collaborate to get the blue ball into the yellow square. So, raise your hand if you think the ones on the left are the humans. Raise your hand if you think the ones on the right are the humans. Okay. Well, most of you were right. I should have asked you how confident you are. So, in our actual experiment, well, yeah, now you know the answer. So, when we have people judge these one at a time, people are about equally confident that they are natural human interactions. When you put them side by side, you can see some subtle cues, which you're clearly all seeing.
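The PHASE model itself couples a physics engine with fairly involved planners, so this is only the crudest sketch of the underlying "inverse planning" move: assume a noisily rational agent, compute how likely each observed action would be under each candidate goal, and apply Bayes' rule to get a posterior over goals. The grid world, the two candidate goals, and the softmax "rationality" parameter below are all invented for illustration.

```python
import math

GOALS = {"gold": (4, 0), "blue_ball": (0, 4)}      # candidate goal locations
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def action_probs(pos, goal, beta=2.0):
    """A noisily rational agent: prefers actions that reduce distance to its goal."""
    def dist_after(a):
        nxt = (pos[0] + a[0], pos[1] + a[1])
        return abs(nxt[0] - goal[0]) + abs(nxt[1] - goal[1])
    scores = [math.exp(-beta * dist_after(a)) for a in ACTIONS]
    z = sum(scores)
    return [s / z for s in scores]

def posterior_over_goals(trajectory):
    """Bayesian inverse planning: which goal best explains the observed moves?"""
    log_post = {g: math.log(1.0 / len(GOALS)) for g in GOALS}   # uniform prior
    for pos, action in trajectory:
        for g, loc in GOALS.items():
            log_post[g] += math.log(action_probs(pos, loc)[ACTIONS.index(action)])
    z = sum(math.exp(v) for v in log_post.values())
    return {g: math.exp(v) / z for g, v in log_post.items()}

# Observed behavior: the agent keeps stepping east, toward (4, 0).
observed = [((0, 0), (1, 0)), ((1, 0), (1, 0)), ((2, 0), (1, 0))]
print(posterior_over_goals(observed))    # posterior mass piles up on "gold"
```

The same run-it-forwards-and-invert-it logic, scaled up with physics, partial observability, and social goals on top of individual ones, is what generates and interprets the interactions above.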
And I'm happy to discuss afterwards what some of those cues might be. But hopefully, I think you'll agree that these are fairly natural kinds of interactions that really capture the ways agents might collaborate or compete, okay. Let me go back for the Zoom thing. Okay. So, there were a couple of things that I didn't get to talk about, but I will just remind you that I didn't talk about these things, which is, how does this work in the brain? And how might these things be learned? I'll just say the one-sentence answer is, there are particular parts of the brain, actually, which we can show with functional magnetic resonance imaging, given the same kinds of stimuli, that do respond to these sorts of things. And when it comes to learning, we can actually start to build models of how people learn these sorts of things. Laurie's gonna drag me off the stage. (all laughing) That's fine. And the one-sentence thing I wanna leave you with on learning is just the following idea, which is, if we wanna build learning algorithms that can learn something like a simulation program, the learning algorithms, in a sense, have to be what we call program-learning programs, okay. So, what kind of algorithm could take experience as input and produce as output another program, which is itself like a probabilistic approximate simulator, let's say of physics, okay. This is an interesting challenge, okay. It's not the sort of thing that you have when you're trying to learn a neural network. We sometimes contrast this with learning in a neural network, where there's a smooth error surface, and you can use these gradient descent algorithms, basically just multi-variable calculus, to optimize all the parameters, to train the system to produce the right input-output behavior. If the goal is to search through the space of all simulation programs, there's no nice topology or geometry; it's a much harder search problem. But somehow, you know, people are able to solve this, and we wanna understand how. What I will just say is, call this the future of the study of human learning in computational terms and of more human-like machine learning. And the advertisement here is for a recent opinion piece in "Trends in Cognitive Sciences" by Josh Rule, another former student and recent graduate, along with Steve Piantadosi, where we introduce this metaphor, perhaps an ill-chosen name, but "The Child as Hacker". You might have heard of the "Child as Scientist", the idea that children's learning is like a kind of scientific hypothesis testing and experimentation. By hacker here, we mean not the bad guys who break into your email and steal your credit card numbers, but the MIT notion of hacking, and I think maybe they have some of this at Yale too. Like creative exploration, whether it's a world of code or worlds of, you know, tunnels or whatever it is, but basically all the ways in which we can think of constructing knowledge the way we construct a body of code. So, it's not just taking an existing program with a bunch of parameters and tuning the parameters, but it's actually algorithms that write code. And there are all sorts of bubbling ideas, bubbling little examples of this in the cognitive AI literature at this point.
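As a caricature of what a program-learning program might look like, here is a toy sketch: instead of nudging continuous parameters down a smooth error surface, it searches a small, discrete space of symbolic programs for one that reproduces a handful of observations. The grammar, the data, and the scoring below are all made up for illustration and are not taken from any published system.

    import itertools

    # A toy "program-learning program": search a small space of symbolic programs
    # (one-line arithmetic rules) for one that explains a few observations.
    OPS = {
        "add": lambda x, c: x + c,
        "mul": lambda x, c: x * c,
        "square_plus": lambda x, c: x * x + c,
    }
    CONSTANTS = range(-3, 4)

    # A handful of "experiences": (input, observed output) pairs.
    DATA = [(1, 4), (2, 7), (3, 12)]          # consistent with x*x + 3

    def fits(op_name, c):
        return all(OPS[op_name](x, c) == y for x, y in DATA)

    def learn_program():
        # No smooth error surface here: we step through a discrete space of
        # candidate programs rather than adjusting continuous parameters.
        for op_name, c in itertools.product(OPS, CONSTANTS):
            if fits(op_name, c):
                return f"lambda x: {op_name}(x, {c})"
        return None

    print(learn_program())   # -> "lambda x: square_plus(x, 3)"

Real proposals in this direction have to search vastly larger program spaces, so they need much smarter search than this brute-force enumeration; that is exactly the hard problem the talk is pointing at.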
And I'll just say, check out this paper to see a review of some of those and how they might apply to thinking about what I'd say is the future of human learning, which is really algorithms that write algorithms, okay. Whether it's a physics engine or all the other things that we learn, okay. So that's it. This is the actual last slide. (audience applauding) Okay. I hope you've gotten at least a taste of how we can use these tools to capture some of the aspects of common sense. But I also think the hardest questions remain. So now, I'm happy to have you throw them at me. - [Brian] Very good. By the way, Josh, you said one sentence and I counted, it was 22 sentences. - Okay. (audience laughing) That's just what I deserve. - We have a dangerously small amount of time for questions now because Josh has a seminar to get to soon, but we will take at least several minutes for questions. Since we have a lot of students here in the audience, I think the first question for you, Josh, has to be: what kind of formative educational experience, like what kind of college did you go to, that could have led to such a brilliant, ambitious research program? - Oh well, what a great question. I think it might have been Ezra Stiles College. Or as we used to, yeah. It kind of looks like this, but different. Sort of. Yeah. So, yes. No. I was a Yale College undergrad. Before there was this great cognitive science program, which is in large part due to the efforts of Brian and a number of colleagues here at this point, so. - Fantastic. So let's open the floor up for questions here. Very good. - Yeah. - Thank you. - Oh, do you want. - Thank you a lot for your talk. So, I'm new to this, so this might be naive. So, I heard about probabilistic coding and I know about neural networks. So, can you build a framework like what we are currently doing with chips: when we are designing a chip, we have a very large piece of software where we can use, for example, probabilistic coding to design the architecture and then convert it to a neural network implementation. So, do you think that framework is possible to build? And if yes, where should we start? - Yeah, so if I understand you correctly, I think what you're asking is, can you implement these ideas, which I described, you know, let's say at the software level, right? I talked about programs and algorithms. Can you implement them at the hardware level? Could you imagine a chip, let's say, that implements these computations in some physical device? Much as we have chips that do something similar for neural networks: some of them are specially designed chips, others are ways that people have figured out how to use GPUs, or graphics processing units, which are effectively chips that were designed originally for computer graphics but can be repurposed for training and running neural networks. And what I hear you asking is, is there a similar way to design physical circuits that implement these computations? Is that what you're asking? - [Student] Yeah, basically. Because probabilistic coding is fantastic for human understanding, while a neural network is what's actually implemented in our brain. So I'm trying to figure out, - I see. So how does this fit with neural networks in the brain? - Basically. - Yeah. Okay. I mean, the first question was a good question. This is a related, but even better question, I think.
And it's one that, there's no way I can really do it justice here, but for those who are in the class, I hope we can discuss it some more. So again, what I think is lurking behind what you're saying, and I even reinforced this in the beginning of my talk, is that we have these things which we call neural networks, and that term actually means two different things. There's the original neural network, which is the brain, right? If anything in the world is a neural network, it's the brain, right, okay. But then there are these things called artificial neural networks, which are the tools that people in machine learning use, right? Now, they're related to each other in that the artificial neural networks, their basic units, their basic primitives, are in some way inspired by what we understand about how actual neurons in the brain work. But that inspiration is loose and limited, right. In the very early days of neural networks in machine systems, that was a relatively tight link. I mean, we just didn't know very much. We didn't have very much in computer technology and we knew less about how neurons in the brain work. At this point, what's called neural networks in machine learning, artificial neural networks, goes quite beyond anything that we actually understand about the brain, or maybe is also much less like it; I mean, basically there's only a loose relation at this point. There's a basic connection, but it's not right, I think, to take today's artificial neural networks and say that's how real biological neural networks are. Like, there is some relation, but we can't assume that the way to relate the software models of probabilistic programming that I'm talking about here to the brain is to go via an implementation in today's artificial neural networks. That's one possibility. And we and others have worked on that. Actually, I referred several times to Ilker Yildirim's work. He has some really nice work in his group looking at that kind of thing, okay. Mario Belledonne, who's here somewhere. There's Mario. Mario's worked on that also with Ilker and a number of others, okay. So, that is one thing you can do. You can try to effectively compile these probabilistic programming things into an artificial neural net and say, maybe that's how the brain works. I'm not saying that's wrong. I'm just saying we can't assume that is right. Another thing you could try to do, and other people have been working on this, is to say, well, maybe there's some mapping between the probabilistic computations that go on under the hood to give you the numbers which I showed you here, and the probabilistic spiking behavior of neurons. You could have networks of stochastic spiking neurons, which, if you've studied neural circuit modeling, is actually often the way computational neuroscientists describe real biological neural networks. And it's possible that those kinds of implementations could, in a more direct way, implement some of the probabilistic inferences that we do to produce the pictures I showed you here. Basically, the Monte Carlo principle says you approximate probabilities with sums, empirical expectations over stochastic simulations. That's what's behind a lot of what I showed you, and that can be implemented directly in networks of stochastic spiking neurons. So, that's another route to try to make this mapping.
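The Monte Carlo principle mentioned here is simple enough to state in a few lines of code: approximate a probability (or an expectation) by averaging over stochastic simulations. The toy simulator and its numbers below are invented purely for illustration; nothing about the code is specific to spiking neurons or to the models in the talk.

    import random

    def noisy_simulation():
        """One stochastic rollout: does a ball with noisy initial speed clear the ledge?"""
        speed = random.gauss(5.0, 1.0)      # invented numbers, purely illustrative
        return speed > 6.0

    def estimate_probability(n_samples=10_000):
        # The Monte Carlo principle: a probability is just the empirical average
        # of a success indicator over many stochastic simulations.
        return sum(noisy_simulation() for _ in range(n_samples)) / n_samples

    print(estimate_probability())   # roughly 0.16, i.e. P(speed > 6) under N(5, 1)

The suggestion in the talk is that networks of stochastic spiking neurons could play the role of the sampler here, with each rollout corresponding to one stochastic trajectory of neural activity.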
So, there's much to be done in the future to really see if any of those are going to work. But that's, I think, especially exciting: to try to relate these models of the mind to models of the brain. And I think that's also gonna be central to really having a fully satisfying answer to what kind of computation is cognition. - [Brian] That was a very satisfying answer. Not all of the answers can be that long. - Okay. - [Brian] Let's take some more questions. And since that was a computationally oriented question, I believe one of the people who had their hands up is still in the back; let's move to a philosophy-oriented question with a, we're not supposed to say our names because of releases, but this person's name may or may not be Michael. - Fortunately, we can do probabilistic inference. - Thank you, Brian, and thank you, Josh. A couple of times in the talk, you poo-pooed the idea that we're 99% of the way there; we're more like 40% of the way there. What would 100% of the way there be like? And would being 100% get us success of any kind? So, I'm trying to envisage what success would look like. - Right. So that, I did encourage you to ask that question. So, thank you for asking that difficult question. I think it depends. Well, wait, can you let him keep the mic for one second? Do you mean success towards, so I used those numbers like 99% both in somebody's estimate of how close we are to an AI goal, like self-driving cars, and also how close we are towards some kind of cognitive modeling goal. Which one were you asking about? - [Student] Well, I was thinking of success like you have something that can behave just like humans do. Could it pass a Turing test, for example? And why would that be success if this Turing test can be passed? That's interesting, but what does it tell us about the human mind? - Yeah. So, I don't know if I can answer that in a short answer. And Brian has, - A haiku. (audience laughing) Okay. This may be more like 22 syllables than 12. But this is about scientific model building, right. And I don't think I'm gonna be able to give you a fully, I don't know that I can give you a fully satisfying answer to that question at all, let alone the one you want me to answer right here, okay. But I think, as in most other areas of science, you know, how do we know when our models are on the right track? Well, you could call it some notion of coherence. And I think this is basic metaphysics and epistemology, which my philosophy colleague can tell me more about. But when you have a model that captures many different ways of getting at the same phenomena, all right, that can include different kinds of behavioral judgments that we can study in the lab, that can include acting like a person in the corresponding real-world scenarios. That's the sort of AI test of the model. And, as we're starting to be able to do, that can also maybe even predict some neural responses, okay. Which I didn't show you that these models are able to do, but they're starting to go in that direction, okay. If the same class of models can explain, or let's say can just fit, all those different sorts of data, and I would say even explain them in the sense of giving a functional explanation of why the behavior is the way it is, why the neurons are the way they are, because we would argue this computational approach is actually solving the real-world problems in some efficient, effective way.
I think that's a coherent kind of, at least, epistemology here and maybe even metaphysics. I think fundamentally, right, what is lurking behind this approach, and I can try to unpack this in less haiku-like fashion later on, is that, I think at least, and I think most of us as scientists think, the world is real, it is a really real thing. But the best we can do in our science is to build models of it. And that's true when we're doing science and it's true when we're doing cognition also, right. This view of the mind that I'm talking about here, and that is, I think, a canonical one in cognitive science, is this idea of the mind as a modeler, a model builder. And I think the same kinds of tools of probabilistic modeling and prediction, including counterfactual or hypothetical data, the same things that we think the mind does, we as scientists do; it's just that the data isn't the data coming in through our retinas. It's the data that we get when people press buttons and say things, when we put things through their retinas, right? And I think that basic notion of coherence and modeling is at the heart of how we, as humans, intuitively understand the world, and it's what makes for satisfying understanding in pretty much every area of science that I know. And I would love to explore more, trying to make that a more rigorous argument, I think. Let's just say, okay. - [Brian] Let's take one more question before Josh is shuttled off to the seminar that he'll talk to, and someone whose name may or may not be Jack. - [Student] Hi, I have a question about sort of the architecture here. When you introduced the game engine part of cognition, part of learning, is the claim there that that's sort of a separate, self-sufficient module? And if so, how does it interface with symbolic or semantic representations? - Right. Okay. Because time is short, there's a lot to unpack there for people who maybe don't know everything you were talking about. But I will just say, so these models, so, okay. So, there's I guess a standard cognitive science view that I think maybe you're referring to, like a sort of Fodorian modularity of mind, language of thought kind of set of questions, right? So to translate this picture into those terms, I would say there's not a single module in the mind or brain that's the game engine. There's actually a set of brain systems, and I can even tell you where they are based on some of my colleagues' fMRI work, that are, you know, somewhat functionally distinct subsystems. They also interact in various ways. And actually, a good chunk of the brain can be mapped onto these different parts of the game engine in this model. Ilker wrote a nice review of this in "Current Opinion in Neurobiology" a couple of years ago on at least the object physics part of that. So, I'll refer you to that paper for one part of it. And the same goes for some other colleagues who've been studying, basically, social cognition networks, whether it's theory of mind or other ones. So, you can actually look at the brain and see both modular structure and also interacting modules. I would say they have some interesting informational interfaces. They're not as encapsulated as Fodor would've said, but there is some functionally distinct sub-structure, okay. But you also asked about traditional symbolic and semantic knowledge. So, I think a really important part of this research program is that these models are what I think many cognitive scientists might call pre-semantic.
They're often more like perception. I mean, Brian and I will argue about this probably in a few minutes. And I think, no, it's really interesting, that in many ways, these models are more like perception than like cognition. Even if Brian might say sometimes, he might, I mean, I'll let you say. But they have some components that Brian, who draws a strong line between perception and cognition, would call perception, and others that are more like cognition. But they are certainly not, you know, traditional linguistic, semantic, verbally expressible knowledge. I think a lot of our early word meanings and some of our early syntax, the early syntax-semantics mappings and verb lexical semantics, actually sit on top of this kind of basic stuff. And I mean, I'm just channeling things that Ray Jackendoff or Tommy and many others would say, okay, or Lila Gleitman, for example. But the key thing here is that symbols should not be restricted to language-like semantic knowledge. It's essential that these models, as we build them, are symbolic, okay. Now, they also have to be neural. That's going back to the first question. But by far, at this point, by far our best models of these kinds of game engine things are symbolic representations that have object structure, compositionality, all the usual nice Fodor and Pylyshyn things; these have them in spades, okay. They also support probabilistic inference and they can also support learning. So one might be used to thinking, you know, oh, well, symbolic language, that's somehow distinct from a statistical inference or probabilistic inference module, and then somehow learning comes into this. If there's one note to maybe just take home from this and end on, it's that, I think, to really understand cognition, we have to go beyond that way of kind of parceling out responsibility in terms of different computational motifs and say, no, actually, to understand common sense, these all have to come together. You have to be able to have fundamentally symbolic representations of the structure of the world, but ones that support probabilistic inference and that also support learning. And I think, you know, we're starting to see some steps. It's not 99% and it's not even 40%, but there are significant steps towards seeing how that synthesis might play out, okay. - [Brian] With apologies for the rushed schedule. Let us thank Professor Josh Tenenbaum. (audience applauding) - Thanks. (students chattering)
Info
Channel: YaleUniversity
Views: 63,663
Keywords: Shulman Lectures, Metaphysics Meets Cognitive Science, Josh Tenenbaum, artificial intelligence, human intelligence, intuitive physics and intuitive psychology, Computational Cognitive Science, MacArthur Fellowship, Cognition, Brian Scholl, L. A. Paul
Id: NsID1iM8gRw
Length: 78min 9sec (4689 seconds)
Published: Tue Mar 15 2022